Erlang at bet365
Scale, scale, safety, and more scale.
At bet365, our systems are built and maintained by a large software development organisation comprising various specialised departments, including UI, relational databases, tooling, and more. These departments cater to specific functional areas, but most systems eventually need to integrate with platforms owned by the Core Systems department. These platforms serve as the backbone for essential business flows such as Placement, Settlement, Cashout, Push, and User Data.
The Core Systems platforms are large, distributed systems that have evolved over the past 10 years. In the earlier architecture, many systems resembled a ‘Multitier architecture’ with operational SQL databases at their core. However, this architecture presented limitations and scalability challenges as the business grew rapidly. To fuel the necessary scaling at near-startup pace, a strategic evaluation of emerging hardware and software technologies was undertaken.
Why Erlang?
This evaluation started with various emerging trends in hardware and software technologies. We built our own datacentres and operate our own infrastructure, so the two needed to evolve in lock-step for the business to grow as quickly as it needed to. Moore’s Law was slowing, and we needed to beat it. To do so, we had to capitalise on those emerging hardware trends, namely the rapidly rising number of cores available on an enterprise-level CPU. Myriad other factors were considered, but we’ll focus on the Core Systems angle, as we care a lot about compute and throughput.
During the evaluation process, two technologies emerged as top contenders: Erlang and Golang. Both offered excellent concurrency models and runtimes that could utilise the increasing number of cores available on enterprise-level CPUs: Erlang with its lightweight processes, and Golang with its ‘Goroutines’.
Erlang, specifically, proved to be a strategic choice due to its numerous advantages. It provides simplicity and consistency across the codebase through functional programming and pattern matching. Keeping most code free of side effects simplifies testing and makes issues easier to identify. Erlang’s lightweight processes allow millions of processes to be spawned quickly and efficiently, enabling the use of powerful OTP behaviours such as Gen Servers, Supervisors, and Finite State Machines.
In addition, Erlang’s code readability, the ability to let processes crash and automatically restart (when supervised), and its support for hot code loading were crucial factors that made it the preferred technology for rebuilding and inventing core platforms such as Cashout.
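As a rough illustration of the pattern matching and lightweight processes described above, here is a minimal sketch in Erlang. The module name `settle_sketch`, the `payout/1` clauses, and the tuple shapes are hypothetical and chosen only for the example; they are not taken from any bet365 system.

```erlang
-module(settle_sketch).
-export([payout/1, settle_all/1]).

%% Pattern matching in the function heads picks a clause by the shape
%% of the data. The tuple shapes here are assumptions for illustration.
payout({won, Stake, Odds}) -> Stake * Odds;
payout({void, Stake}) -> Stake;
payout({lost, _Stake}) -> 0.

%% Spawn one lightweight process per bet, then collect the replies
%% in the same order as the input list.
settle_all(Bets) ->
    Parent = self(),
    Refs = [spawn_settler(Parent, Bet) || Bet <- Bets],
    %% Ref is already bound here, so each receive waits for that reply only.
    [receive {Ref, Amount} -> Amount end || Ref <- Refs].

%% Creating a process is a single call to spawn/1.
spawn_settler(Parent, Bet) ->
    Ref = make_ref(),
    spawn(fun() -> Parent ! {Ref, payout(Bet)} end),
    Ref.
```

Calling `settle_sketch:settle_all([{won, 10, 2}, {lost, 7}, {void, 5}])` runs each calculation in its own process. The sketch does no failure handling: a malformed bet would crash its process and leave the caller waiting, which is the kind of case monitoring or supervision would normally cover.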
Developing Erlang at bet365
To explore the use of Erlang at bet365, we assembled a panel of developers from across the Core Systems teams. We’ll delve into various aspects of the developer experience and the benefits Erlang brings to system reliability, scalability, and safety.
Panellists:
- Matthew Downs, Lead Developer, Settlement Team
- Ashley Wyatt, Lead Developer, Placement Team
- Adam Baggaley, Software Developer, Placement Team
- Liam Jolley, Technical Lead, Push Systems Team
Developer Experience
We discussed the advantages of Erlang, including its functional style, pattern matching, simplicity, and code readability. The ability to let processes crash, clean up resources gracefully, and restart them in a known good/default state through supervision trees has proved invaluable. The portability of the BEAM VM, hot code loading, and Erlang’s supportive community were highlighted as significant strengths.
Adam: Erlang’s functional language and pattern matching provide simplicity and consistency throughout our codebase. Eliminating side effects simplifies testing and issue identification. Creating a process in one line and automatic process restart (when supervised) promote clean code structure. Erlang’s lightweight processes (pids) can be spawned quickly and efficiently, enabling the use of Gen Servers, Supervisors, and FSMs from the OTP library.
Matt: Once you get past the syntax, Erlang code conveys a clear narrative. Not having to pass type information around enhances readability and makes the code easier to understand.
Adam & Liam: Embracing the “let it crash” approach allows us to handle bad client behaviour by terminating the process and gracefully cleaning up its resources. Supervision trees bring the process back up in a known good/default state, preventing the spread of detrimental behaviour. Customising the supervision tree helps handle crashes and releases effectively.
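The restart behaviour described above can be pictured with the sketch below. It uses the standard OTP `supervisor` behaviour with a deliberately simple hand-rolled worker; the names `counter_sup` and `counter`, the restart limits, and the counter logic are all assumptions for illustration. A real worker would more likely be a `gen_server`.

```erlang
-module(counter_sup).
-behaviour(supervisor).

-export([start_link/0, add/1, value/0]).
-export([init/1, start_worker/0]).

start_link() ->
    supervisor:start_link({local, ?MODULE}, ?MODULE, []).

%% Supervisor callback: one permanent child, restarted whenever it exits,
%% allowing at most 5 restarts in 10 seconds (illustrative limits).
init([]) ->
    SupFlags = #{strategy => one_for_one, intensity => 5, period => 10},
    Child = #{id => counter,
              start => {?MODULE, start_worker, []},
              restart => permanent},
    {ok, {SupFlags, [Child]}}.

%% Called by the supervisor; every (re)start begins from the default state 0.
start_worker() ->
    Pid = spawn_link(fun() -> loop(0) end),
    true = register(counter, Pid),
    {ok, Pid}.

loop(Count) ->
    receive
        {add, N} when is_integer(N) -> loop(Count + N);
        {value, From, Ref} -> From ! {Ref, Count}, loop(Count);
        %% Let it crash: no attempt to repair state after a bad request.
        Bad -> exit({bad_request, Bad})
    end.

add(N) -> counter ! {add, N}, ok.

value() ->
    Ref = make_ref(),
    counter ! {value, self(), Ref},
    receive {Ref, Count} -> Count after 1000 -> timeout end.
```

Sending a bad request such as `counter_sup:add(oops)` terminates the worker, and the supervisor starts a fresh one with the count back at 0. Callers that send in the brief window before the new worker is registered would fail, so a production design would need to account for that.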
Liam & Matt: The BEAM VM offers portability, allowing code to be written once and run anywhere the BEAM VM is supported. We can connect to a live running BEAM VM and inspect it. Erlang’s hot code loading is a powerful feature that lets us swap modules without dropping data or state. It expedites bug fixes and even allows changes to function headers on the fly.
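A common way to write a plain process so that it picks up newly loaded code without losing its state is to make the recursive call fully qualified. The sketch below shows only that idea; the module name `hot_counter` and its messages are hypothetical, and OTP applications would more typically handle upgrades through behaviour callbacks such as `code_change/3`.

```erlang
-module(hot_counter).
-export([start/0, read/1, loop/1]).

start() ->
    spawn(?MODULE, loop, [0]).

%% loop/1 is exported and called as ?MODULE:loop(...), so after a new
%% version of this module is loaded, the next iteration runs the new
%% code. The state (Count) travels as an argument and is kept.
loop(Count) ->
    receive
        bump -> ?MODULE:loop(Count + 1);
        {read, From, Ref} -> From ! {Ref, Count}, ?MODULE:loop(Count);
        stop -> ok
    end.

read(Pid) ->
    Ref = make_ref(),
    Pid ! {read, self(), Ref},
    receive {Ref, Count} -> Count after 1000 -> timeout end.
```

With a process started by `hot_counter:start()`, recompiling and loading a changed `hot_counter` module (for example with `c(hot_counter)` in the shell) would leave the running process and its count in place while later messages are handled by the new code.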
Scalability and Safety
We discussed how Erlang’s built-in message passing, combined with the queue-worker-queue pattern, facilitates the easy addition of nodes and efficient distribution of work in highly distributed systems. We highlighted the safety measures provided by Erlang, such as the Heart functionality and the supervision tree architecture, which enable the recovery and compartmentalisation of failed applications or of the BEAM VM itself.
Ash: Erlang’s built-in message passing between processes and nodes simplifies the addition of nodes. The queue-worker-queue pattern works well in highly distributed systems. Dynamically defining synchronous and asynchronous behaviour facilitates distributing work across millions of processes efficiently.
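The post does not describe how the queue-worker-queue pattern is implemented, so the following is only one possible single-node sketch of the idea using plain message passing: an input queue process hands out jobs, workers pull from it, and an output queue process collects results. The module name `qwq_sketch` and the message shapes are assumptions.

```erlang
-module(qwq_sketch).
-export([run/3]).

%% Apply Fun to every item using NWorkers worker processes.
%% Results come back in completion order, not input order.
run(Fun, Items, NWorkers) when NWorkers >= 1 ->
    Parent = self(),
    Total = length(Items),
    InQ = spawn_link(fun() -> in_queue(Items, NWorkers) end),
    OutQ = spawn_link(fun() -> out_queue(Total, [], Parent) end),
    lists:foreach(
      fun(_) -> spawn_link(fun() -> worker(InQ, OutQ, Fun) end) end,
      lists:seq(1, NWorkers)),
    receive {OutQ, Results} -> Results end.

%% Input queue: hands one job to each worker that asks, then tells
%% every worker it is done before stopping.
in_queue([Item | Rest], Workers) ->
    receive {pull, W} -> W ! {job, Item}, in_queue(Rest, Workers) end;
in_queue([], 0) ->
    ok;
in_queue([], Workers) ->
    receive {pull, W} -> W ! done, in_queue([], Workers - 1) end.

%% Worker: pull a job, push the result to the output queue, repeat.
worker(InQ, OutQ, Fun) ->
    InQ ! {pull, self()},
    receive
        {job, Item} -> OutQ ! {result, Fun(Item)}, worker(InQ, OutQ, Fun);
        done -> ok
    end.

%% Output queue: collect the expected number of results, then reply.
out_queue(0, Acc, Parent) ->
    Parent ! {self(), Acc};
out_queue(N, Acc, Parent) ->
    receive {result, R} -> out_queue(N - 1, [R | Acc], Parent) end.
```

For example, `qwq_sketch:run(fun(X) -> X * 2 end, [1, 2, 3, 4], 3)` spreads four jobs over three workers. Because the same send operator works with pids on other nodes, a similar shape can in principle be stretched across nodes, although a distributed version would need to handle node failure and back-pressure.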
Pros and Cons
Erlang’s suitability for I/O-intensive applications, features like ‘Distributed Mnesia’ (OTP), and strong tooling support were highlighted as pros. However, we acknowledged challenges such as the limited availability of Erlang expertise (compared to other languages) and potential compatibility issues between major versions.
We’re Hiring
As the demand for Core Systems development capacity continues to grow, bet365 has numerous open positions across all levels. If you are interested in joining our teams, working on massive distributed systems, solving unique technical problems, and prioritising career progression, please reach out to our recruiting team or the individuals mentioned in this post.
We hope this post has provided valuable insights into our use of Erlang at bet365 and the benefits it brings to our Core Systems development.