Varnish Cache

Introduction

HTTP caching is a crucial part of our Content Delivery Network. It not only improves the performance of the customer-facing site, but also helps serve the vast amount of internal data associated with real-time applications supporting many thousands of concurrent sessions.

Cache mechanics

The aim of an HTTP cache is to reduce the load on the backend web servers as much as possible. To do this, HTTP responses are stored directly on the caching server. Each object has a Time To Live (TTL): a countdown that determines when the object becomes stale and should be refreshed from the backend. So that objects can be referenced, and to keep cache efficiency as high as possible, a hash is computed for each object.

Each incoming request has its hash value calculated and compared against the objects already in the cache. If an object matches and its TTL has not expired, that object is served directly, avoiding additional network hops and computational load on the backend system.
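To make this concrete, here is a minimal sketch in Varnish's VCL of how a default lifetime could be applied to responses the backend has not marked as cacheable. The 300-second value is purely illustrative, and the fragment assumes the rest of the VCL file (backends and so on) is defined elsewhere:

sub vcl_backend_response {
    # If the backend did not supply a usable TTL, fall back to an
    # illustrative five minutes so the object can still be cached.
    if (beresp.ttl <= 0s) {
        set beresp.ttl = 300s;
    }
}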

A cache hash can be made up of any number of request attributes and is key to performance. For example, by using only the URL as the hash, the same asset can be served from cache across multiple domains:

Request 1:

www.domain.com/assets/ilovecaching.jpg

Hash data: /assets/ilovecaching.jpg

This file is retrieved from a web server and stored in our cache.

Request 2:

www.domain.com/assets/ilovecaching.jpg

Hash data is still: /assets/ilovecaching.jpg

Previously retrieved object is returned directly from cache.

Request 3:

www.anotherdomain.com/assets/ilovecaching.jpg

Hash data is still: /assets/ilovecaching.jpg

Previously retrieved object is returned directly from cache.

Using a combination of request attributes to generate object hashes can satisfy the requirements of any caching opportunity whilst maintaining high efficiency. Typically, cache solutions and appliances will hash on both the Host header and the URL, which in our example would generate more backend requests than are actually needed. When you're dealing with a high request rate, it pays to be intelligent with hashing and save computational load and network bandwidth!
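In Varnish's VCL, a sketch of this URL-only hashing might look like the following. The built-in behaviour hashes the Host header as well as the URL; overriding vcl_hash keys the cache on the URL alone:

sub vcl_hash {
    # Key the cache on the URL alone, ignoring the Host header,
    # so /assets/ilovecaching.jpg is shared across domains.
    hash_data(req.url);
    return (lookup);
}

The trade-off is that this only holds while identical paths really do serve identical content on every domain; where that is not true, the extra attributes belong back in the hash.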

Our Problem

For our initial use cases we turned to a top-end commercial appliance, which also provided solutions to several of our other use cases; you could consider it a Swiss army knife. With our bursts of growth, this proved to be a short-term strategy, and we were starting to hit its limits.

We embarked on a programme of growth and capacity testing so that we could predict when the current solution would no longer meet our needs. We found that the appliance's hash table could support many hundreds of thousands of objects. That seemed like a large number, until we started ramping up data volatility and reducing object TTLs.

Time for Something New

We took the decision to decouple caching from our Swiss army knife appliance and replace it with a dedicated solution.

It was time for a re-think…

Further analysis in this space led us to explore open source technologies. At the time we had limited operational experience with open source and generic compute; however, we embraced the opportunity. Not only could we solve our capacity issues, we could also gain additional flexibility.

We purchased some “off-the-shelf” servers, installed our preferred Linux distribution and away we went to test our theory. It quickly became apparent that a dedicated solution was able to deliver better performance.

Challenges

Once we had verified the solution in our test environment, it was time to start introducing the technology into production. Being a technology that was brand new to us, it came with its challenges, both from a technical and a skills point of view.

On the technical side, we had to integrate the new cache with our existing business processes as seamlessly as possible. For example, on day one we didn't want to change the process or tooling our Operations team used to flush the cache. To cater for this, we wrote a small integration component that our existing tooling could hit, abstracting away the new technology. This proved extremely effective, allowing us to flush two different cache technologies in parallel, transparently!
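The integration component itself is internal tooling, but on the cache side the flush hook is straightforward. A rough sketch of a purge entry point in VCL might look like this (the ACL range and request method are illustrative assumptions, not our actual configuration):

acl purgers {
    # Illustrative range: only internal tooling may purge.
    "10.0.0.0"/8;
}

sub vcl_recv {
    if (req.method == "PURGE") {
        if (!client.ip ~ purgers) {
            return (synth(405, "Purging not allowed"));
        }
        # Drop the matching object from the cache.
        return (purge);
    }
}

The integration component simply translates the existing tooling's flush call into a request like this against each cache technology in turn.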

Handling such a high request rate, regularly over 20,000 requests per second, meant the default configuration was constantly creating request-handling threads to meet demand. Sometimes this happened at such a rapid rate that threads couldn't be created quickly enough! Thankfully, this turned out to be a simple fix, requiring us to start the cache with a much higher initial thread count.
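In Varnish's case, the worker-thread pools can be pre-sized at start-up rather than grown under load. The figures below are illustrative only, as are the listen address, VCL path and storage size, but the parameters themselves are standard varnishd tunables:

varnishd -a :80 -f /etc/varnish/default.vcl -s malloc,4G \
    -p thread_pools=2 \
    -p thread_pool_min=500 \
    -p thread_pool_max=5000

Starting each pool at a higher minimum means the threads already exist when a traffic spike arrives, instead of being created while under pressure.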

Because the technology was brand new to us and open source, we needed to make sure our team members were up to speed. The consensus was that formal training was required, so we reached out to a commercial vendor of the technology. Working with them, we were able to tailor a two-day, hands-on training course for our staff. This was a great success and demonstrates our passion for technology and the training opportunities available.

Results

As a result of this work, we now have a system that can operate at many times the capacity of the previous solution. Noticeable improvements were also made to the performance of the cache. Figure 1 shows the improvement in response time for a particular object, one that had a particularly poor response time on the legacy solution. Figure 2 shows the overall mean response time across all objects.

Figure 1: P95 cache response time before/after for a specific object.

Figure 2: Overall mean time to receive a full cache response.

Running on a standard server platform requires no proprietary components, providing a more transparent and flexible overall solution. As a side effect, the costs of delivering this solution are also significantly reduced.

Our cache efficiency has improved, due to the removal of cache leaks, and our ability to support continued growth is no longer a concern.