Case study: Bet365’s Platform in a Box

This article originally appeared in Computing magazine on 31 March 2023. You can view it here.

How bet365 overcame the challenge of the federated US regulatory environment

When we entered the US market in 2019, we knew we had taken on a big technical challenge.

Unlike in most countries, the US regulatory environment does not operate as a single entity. Instead, each state has its own regulator with its own specific requirements and compliance architecture. We therefore needed a delivery framework that could be tailored to each state regulator’s requirements and timescales.

It was a conundrum full of complexity: how do you split off the functional areas that exist in a large codebase, such as deposits and withdrawals, and decouple them so they can be deployed separately?

What resulted was the biggest project in the company’s history and the creation of a continent-spanning delivery framework we call Platform in a Box.

A Solution in Software or Architecture?

The Members part of our system, where we deal with personal data and user activity, is both tightly regulated and subject to high rates of change. Typically, we would manage all the requirements, regulatory domains, and the complex cadence with a single codebase from a single system.

The first idea was to split the codebase in two: a US codebase and a rest-of-the-world codebase. But two problems made this unworkable. First, it would have introduced huge complexity, as we’d have had to maintain two large monoliths of code. Keeping things simple is critical. You need one version of the truth, or you open the door to expensive errors.

Second, it might have been viable if each state could have existed as a separate entity that didn’t have to share any information with other states. However, it became obvious quite quickly that this wasn’t where the market was going.

For example, we found that people living in the tri-state area wanted to move quickly between states and take their account with them wherever they went, without having to swap to another site or app.

We then looked at solving the problem through architecture. We explored running Kubernetes while utilising automation through the cloud. In principle this would work but, unfortunately, not in practice.

Kubernetes theoretically offers an excellent solution to the complexity and risk inherent in a segregated model, but the cost was huge. It just wouldn’t scale. Our monolithic codebase would likely create the biggest container in history.

Putting it into one place is okay. Doing it repeatedly and federating it was close to impossible. There’s also the inherent risk of this approach: the wider the surface area of the code, the greater the risk of a change propagating into other areas.

Without the ability to develop in parallel, it becomes a management and logistical nightmare. The first resource you’d run out of would be change and compliance officers.

A Breakthrough in Our Thinking

While neither a code model nor an architecture model would work on its own, we realised that a hybrid of the two might. We looked at the deltas and realised that while there was a lot of change, a lot also stayed the same.

This prompted us to segregate the change and look at the model that resulted. We found that we could split the code into three pots: global, regional, and local. Combined, they provide a composite platform for our uses across the website.
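
To make the three pots concrete, here is a minimal sketch of how global, regional, and local layers could combine into one composite view, with the most specific layer winning when values clash. The setting names, the values, the "US" and "NJ" keys, and the override order are all assumptions made for illustration; the article does not describe how bet365 actually composes the layers.

```python
from collections import ChainMap

# Hypothetical settings for each pot. None of these names or values come
# from the article; they only illustrate the layering idea.
GLOBAL = {"currency": "GBP", "min_deposit": 5, "session_timeout_minutes": 30}
REGIONAL = {"US": {"currency": "USD", "min_deposit": 10}}
LOCAL = {"NJ": {"min_deposit": 20}}


def compose_platform(region: str, state: str) -> dict:
    """Combine the three pots; the most specific layer wins on a clash."""
    # ChainMap looks keys up left to right, so local is checked first,
    # then regional, then global.
    layers = ChainMap(LOCAL.get(state, {}), REGIONAL.get(region, {}), GLOBAL)
    return dict(layers)


if __name__ == "__main__":
    print(compose_platform("US", "NJ"))
```

In this sketch a state with no local entries simply inherits the regional and global values, which mirrors the observation that while a lot changes between jurisdictions, a lot also stays the same.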

In principle, we had everything we needed. All that was left was to work out how to architect it so it could be delivered efficiently. The challenge was to take the monolithic codebase and chop it up or rewrite it to conform to our architectural pattern.

We achieved one layer of granularity by using microservices to segregate functionality into smaller discrete services, each with its own individual responsibility. It didn’t solve all our problems, because you don’t want to work in a world with millions of microservices.

However, by putting everything into an entity service that could call the right code based on the global, regional, and local model, we were able to significantly reduce the number of operational services we had to build. We no longer had to worry about changes being made in other regions because they wouldn’t be called into the system.
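
The entity service idea can be sketched as a single entry point that looks for a local implementation first, then a regional one, then falls back to global. This is only an illustration under stated assumptions: the `EntityService` class, the `"state:NJ"` and `"region:US"` scope labels, and the deposit handlers are hypothetical and are not taken from bet365’s system.

```python
from typing import Callable, Dict, Tuple

# A handler takes an amount and returns a description of what it did.
Handler = Callable[[float], str]


class EntityService:
    """Hypothetical single entry point that picks the most specific code."""

    def __init__(self) -> None:
        # Key: (operation, scope), where scope is "global",
        # "region:<code>", or "state:<code>".
        self._handlers: Dict[Tuple[str, str], Handler] = {}

    def register(self, operation: str, scope: str, handler: Handler) -> None:
        self._handlers[(operation, scope)] = handler

    def call(self, operation: str, region: str, state: str, amount: float) -> str:
        # Try local first, then regional, then fall back to global.
        for scope in (f"state:{state}", f"region:{region}", "global"):
            handler = self._handlers.get((operation, scope))
            if handler is not None:
                return handler(amount)
        raise LookupError(f"no handler registered for {operation!r}")


def build_demo_service() -> EntityService:
    """Register one illustrative deposit handler per layer."""
    service = EntityService()
    service.register("deposit", "global", lambda amount: f"global deposit of {amount:.2f}")
    service.register("deposit", "region:US", lambda amount: f"US deposit of {amount:.2f}")
    service.register("deposit", "state:NJ", lambda amount: f"NJ deposit of {amount:.2f}")
    return service


if __name__ == "__main__":
    demo = build_demo_service()
    print(demo.call("deposit", region="US", state="NJ", amount=50))
    print(demo.call("deposit", region="US", state="CO", amount=50))
    print(demo.call("deposit", region="UK", state="", amount=50))
```

Because the lookup only ever considers the caller’s own state and region, a handler registered for another state is never reached, which is the property the paragraph above relies on.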

Any code that was needed at a regional or local level could then be injected through our existing and well-established system of hot releases using virtual machines.

An Unprecedented Business Impact

When taking on any new project, the questions for us are always the same. How do we build a bigger boat? How do we scale wider? How do we get more out of what we’ve already got? Typically, this sees us going through a renewal programme every 5 to 10 years.

However, we’ve now engineered an entirely new scale option. We can take the load off the central system and put it in another Platform in a Box (PiB) instance without going through a complete programme of change and upgrade. It saves money because you don’t have to extend the Superdome. You can use virtual machines to run the different instances.

I’ve been at bet365 for over a decade, and this must be the single biggest project I’ve ever worked on.

When you consider it in man-hours, it’s likely 10 times bigger than creating the original platform. To hit parity, you’d need to look at all the work that was done during the first 15 years of bet365’s lifetime combined.

Authors

Darren Waters, Head of Software Architecture & Alan Reed, Head of Platform Innovation, Hillside Technology (bet365’s Technology Business)