How to scale a web service efficiently: Caching

Meg's tech corner
7 min read · Sep 3, 2020

Part 2: Caching

This is the second part of the series How to scale a web service efficiently. The first part, which contains the introduction and patterns for scaling databases, can be found here.

Pattern 3: Cache

Cache is one of the most widely used approaches to scaling a system. Compared to the database, a cache provides smaller but faster storage. It takes advantage of locality: recently used data is likely to be used again. We can therefore store recently accessed data in the cache and save time by reading it from the cache the next time it is needed. A properly managed cache greatly reduces the load on the servers and databases, since many requests can be served directly from the cache.

Cache is not free though. Counter-intuitively, a poorly managed cache can degrade system performance or even cause wrong results to be returned! I will discuss the difficulties of maintaining cache consistency in a later section.

Cache can be used at virtually any layer of the system. In the example shown in the figure below, we insert a cache at both the frontend layer and the database layer. If the result is present in the frontend cache, we can return it to the user without dispatching the request to the servers. This type of cache works particularly well for popular, infrequently updated data, such as the landing page. A cache is inserted in front of the database as well: if the data is in the database cache, our servers can get it without requiring the database to execute the SQL query and fetch data from disk.
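
To make this concrete, here is a minimal cache-aside read path in Python. The `cache` and `db` clients and their method names are hypothetical placeholders; any key-value store and database driver with similar get/set/query calls would fit.

```python
CACHE_TTL_SECONDS = 60  # assumption: cached entries expire after a minute


def get_user_profile(user_id, cache, db):
    """Cache-aside read: try the cache first, fall back to the database.

    `cache` and `db` are hypothetical clients supplied by the caller.
    """
    key = f"user_profile:{user_id}"
    profile = cache.get(key)
    if profile is not None:
        return profile                                  # cache hit: no database work needed
    profile = db.query_user_profile(user_id)            # cache miss: go to the database
    cache.set(key, profile, ttl=CACHE_TTL_SECONDS)      # populate the cache for future reads
    return profile
```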

Roughly speaking, there are two types of cache, global cache and distributed cache, which are explained in the sections below.

Global cache

As its name implies, global cache puts all cached data in a centralized place, for instance a single server. This approach is very simple and can be quite effective in the early stages.

Global Cache

However, the usefulness of the cache can quickly decrease as the number of users increases. The amount of data we can cache is limited by the storage of a single machine, and the throughput of the cache is limited by the performance of that machine as well. The cache hit ratio can quickly drop, since data is likely to be evicted before it is accessed again.

Distributed cache

To scale the cache, we can partition it, just as we did with database partitioning in the first article.

Distributed Cache

There are a few partitioning strategies.

  • Static Partitioning

Static partitioning deterministically maps data to a cache server. For example, it could use a simple hash function to determine the cache location:

hash(data) % num_of_cache_servers

While very simple, static partitioning has some drawbacks. It is very hard to add or remove a server, since doing so requires moving virtually all existing data to its new destination. Consider the simple hash function data % num_of_cache_servers, where data is an integer. With 3 servers, the integer 3 is cached on cache server 0. When we add a fourth server, the integer 3 must be moved to cache server 3.

If the hash function is not chosen carefully, data can also end up unevenly distributed across cache servers, and some cache servers can become hotspots. A short sketch of this scheme follows below.
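
A minimal Python sketch of static partitioning; the server list is hypothetical:

```python
import hashlib

CACHE_SERVERS = ["cache-0:11211", "cache-1:11211", "cache-2:11211"]  # hypothetical hosts


def server_for_key(key: str, servers=CACHE_SERVERS) -> str:
    """Static partitioning: hash the key, then take it modulo the number of servers."""
    digest = hashlib.md5(key.encode()).hexdigest()
    return servers[int(digest, 16) % len(servers)]

# Adding a fourth server changes len(servers), so most keys suddenly map to a
# different server than before -- this is the rebalancing problem described above.
```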

  • Consistent Hashing

Consistent hashing improves on the static partitioning scheme. It requires rebalancing only about 1 / (number of servers) of the data when adding or removing a cache server, and it can effectively balance data across cache servers. There are many details to discuss about consistent hashing and many tricks to make it more efficient, so we will dedicate a separate blog to it. Please remember to follow me so that you don’t miss it.
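
For readers who want a feel for the idea before that post, here is a minimal sketch of a consistent-hash ring in Python. Real implementations add virtual nodes and replication; this version only shows how a key walks clockwise to the next server on the ring.

```python
import bisect
import hashlib


class ConsistentHashRing:
    """Minimal consistent-hash ring sketch (no virtual nodes or replication)."""

    def __init__(self, servers):
        self._ring = sorted((self._hash(s), s) for s in servers)
        self._hashes = [h for h, _ in self._ring]

    @staticmethod
    def _hash(value: str) -> int:
        return int(hashlib.md5(value.encode()).hexdigest(), 16)

    def server_for(self, key: str) -> str:
        # Walk clockwise to the first server positioned at or after the key's hash.
        idx = bisect.bisect(self._hashes, self._hash(key)) % len(self._hashes)
        return self._ring[idx][1]


ring = ConsistentHashRing(["cache-0", "cache-1", "cache-2"])
print(ring.server_for("user:42"))  # adding "cache-3" would only move the keys that fall on its arc
```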

Difficulty of maintaining cache consistency

As people say,

The two most difficult things in software engineering are cache invalidation and naming things.

It is extremely difficult to maintain cache consistency. If it is not well engineered, the performance of the system will degrade or incorrect results may be returned to end users! We will illustrate this with two of the most widely used caching patterns: write-through cache and write-back cache.

  • Write-through cache

Write-through cache means you update the cache and the database together within the same write request. The advantage of this paradigm is that it is easier to reason about the correctness of the system, since the cache and the database are always in sync. If you find the data in the cache, it is guaranteed to be up to date.

Write-through cache. Both database and cache are updated at the same time.
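
In code, a write-through update might look like the following sketch; `cache` and `db` are hypothetical clients, and the failure handling is discussed next.

```python
def write_through(key, value, cache, db):
    """Write-through sketch: the database and the cache are updated within the
    same write request, so a successful write leaves both in sync.

    `cache` and `db` are hypothetical clients supplied by the caller.
    """
    db.write(key, value)     # make the write durable
    cache.set(key, value)    # keep the cache in sync before replying to the user
    return "ok"
```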

Ideally, when the cache server crashes, our system should be able to continue to serve write requests. As illustrated in the figure below, when our server knows that the cache server has crashed, it only needs to update the database and can then reply to the user.

However, since our server can only communicate with the cache server via network requests, it cannot tell whether the cache server has truly crashed. Consider the following two scenarios:

  • The cache server crashed. Our server won’t receive any response, since a crashed server can’t handle requests.
  • The cache server is network-partitioned from our server. Our request can’t reach the cache server, so the cache server won’t send anything back either.

Given that our server can’t know whether the cache server has indeed crashed, we have to be conservative and fail the write request. Otherwise, if we only update the database, the database and the cache server will be out of sync under the second scenario.

This is an instance of the CAP theorem: write-through cache favors consistency (CP), so it cannot also achieve availability during a partition. I will have a blog on CAP later, so please remember to follow me.

Cache is supposed to be an add-on that improves performance. With this design, the cache can actually hurt write requests quite badly! Note that read requests can still proceed, though with higher latency: we can set a relatively low deadline on the cache read and, if it times out, fetch the data from the database.
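
A sketch of that read fallback is shown below. The `cache` and `db` clients, their `timeout` parameter, and the exceptions they raise are assumptions for illustration.

```python
import socket

CACHE_READ_DEADLINE_SECONDS = 0.05   # assumption: a tight deadline for cache lookups


def handle_read(key, cache, db):
    """Read path that survives a cache outage: try the cache with a low deadline
    and fall back to the database if the cache doesn't answer in time."""
    try:
        value = cache.get(key, timeout=CACHE_READ_DEADLINE_SECONDS)
        if value is not None:
            return value                     # fast path: served from the cache
    except (socket.timeout, ConnectionError):
        pass                                 # cache unreachable: degrade gracefully
    return db.read(key)                      # still correct, just slower
```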

  • Write-back cache
Write-back cache

Write-back cache means we update only the database and rely on the database to invalidate the cache.

If we can tolerate a stale cache, this design can be simple and effective. The database can, for instance, run a background job that sends requests to the cache servers telling them to invalidate stale entries.
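
A minimal sketch of such a background job, with hypothetical `db` and cache clients and a made-up `fetch_recently_changed_keys` helper:

```python
import time

INVALIDATION_INTERVAL_SECONDS = 5   # assumption: how often the job wakes up


def invalidation_job(db, cache_servers):
    """Hypothetical background job running next to the database: periodically
    look up recently changed keys and tell every cache server to drop them."""
    while True:
        for key in db.fetch_recently_changed_keys():   # hypothetical helper
            for cache in cache_servers:
                cache.delete(key)                      # invalidate rather than update
        time.sleep(INVALIDATION_INTERVAL_SECONDS)
```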

However, many real-life applications require stronger consistency:

  • If we publish a tweet or a post, we expect it to appear on our timeline
  • If we add an item to our shopping cart, we expect it to be inside our shopping cart

One naive way to support this is to have the database invalidate the cache before it saves the data. Note that we need to invalidate before persisting the data to the database’s disk; otherwise there will be a short interval when the data is inconsistent between the database and the cache. Again, if the cache server crashes or is network-partitioned from the database, all write requests will fail. So is it possible for the database to know that the cache server has crashed, or to invalidate the data in the cache server without even communicating with it?
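
To make the ordering concrete, the naive approach above looks roughly like this sketch (hypothetical `cache` and `db` clients):

```python
def write_with_invalidation(key, value, cache, db):
    """Naive stronger-consistency write: invalidate the cached entry first, then persist.
    If the cache server is unreachable, the whole write request has to fail."""
    cache.delete(key)       # must happen before the durable write, or readers could
                            # briefly see a stale cached value after the database commits
    db.write(key, value)
    return "ok"
```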

Another factor to consider is whether we really need to invalidate the cache for every write request. Imagine that we are updating a comparatively old blog post that has already been in the cache for a month. Instead of relying on the database to invalidate the cache on the next update, is it possible for the cache itself to identify infrequently accessed data, evict it, and inform the database, so that the next time we update some cold data we don’t need to wait for cache invalidation?

The solution to the two questions above, cache server crash detection and cold data invalidation, is the read lease. It is widely used by large tech companies, for example in DynamoDB from Amazon. This topic is very interesting and useful and deserves a complete blog of its own, so please remember to follow us to get notified when we discuss read leases.
