How to scale a web service efficiently : Database

Meg's tech corner
5 min readAug 30, 2020

Part 1: Scale the database layer

Regardless of working as a freelancer, starting a Startup, or being employed in the tech industry, we are likely to work with some web services. Web services normally start with a simple design and have a small number of users. As our web services become popular, we need to scale them. However, scaling up a web service is extremely complicated since the design of a web service targeting 1,000 users is dramatically different from that of a web service targeting 1,000,000 users. What’s worse, we often need to scale the web service within a very short time frame. For example, your Startup could become super popular overnight and driving 10x more users to your website. Under those circumstances, we need to be very conscious of the change we make. Sub-optimal decisions can bring side-effects over time, such as

  • Cost-ineffective. Without a proper design, we may need to invest 100X more resources to serve 10X more users.
  • Not system friendly. We may depend on many new frameworks or third-party services or introduce some `hacky` logics. This can make our system super complicated and hard to maintain. When the number of users grow again, it might even be impossible to scale the system up without rewriting it.

In this article, we will introduce some guiding principles and best practices adopted by large tech companies to scale their service to serve billions of users. We will also illustrate how to apply them with an example.

The initial application

Let’s imagine that we want to create a blog platform. It essentially provides two functionalities, posting a blog and reading blogs from other authors. Our initial system can be very simple.

Initial system design

As shown in the figure above, we would have an application server and a database. Users interact with the platform by sending request to the application server. For the request to post a blog, the application server persists the blog content to the database. For the request to fetch blogs based on some criteria, such as the title or the author, we can query the database for matched blogs.

However, as the number of our users grow, both the server and database can become the bottleneck. For the server, we may not have enough memory or processing power (CPU cycles), or the number of concurrent network connections is larger than that can be supported by a single machine. For the database, we may be short of disk storage or the database I/O speed may not be fast enough to process the ever-growing number of requests. In the next few sections, we will introduce some patterns to resolve those issues.

Pattern 1. Database partitioning

Database can quickly become the bottleneck of the whole system. Partitioning is a widely-adopted approach to scale the database layer.

With this approach, we will bring up multiple instances of the database. All database will have the same schema. Different databases will host a disjoint set of data. For instance, if we have three database instances, we can partition the data as following. The blogs belonging to the authors with user id starting with a to g can be stored in the first database. h to n can be saved in the second database and the rest in the third database.

The logic to retrieve the blogs belonging a particular author would then become something shown in the figure below.

Query partitioned database

QueryBlogInternal formulates the SQL query and uses the database connection to execute the query. QueryBlog checks the starting character of the user id and passes in the corresponding database connection.

With this, we are effectively storing the data in different database instances. However we still present a unified logical database to the higher level application logics or external users.

Other patterns

Note that partitioning relational databases is not the only solution to scale the database. Non-relational database or NoSQL is another solution with some limitations. There is also a new database type called NewSQL, which combines the scalability of NoSQL and the schema and transaction support of relational database. Of course, NewSQL has its own caveats. We will not discuss them in details here and will instead have another post dedicated to the comparison between, SQL, NoSQL and NewSQL. Remember to follow me so that you don’t miss this!

Pattern 2. Database replication

Users tend to expect high reliability from a web service. It is normally not acceptable to users that their blogs are lost because our database crashed. To improve the system reliability, we can replicate the database.

Database replication

As shown in the figure above, when the server writes to the primary database, the data is replicated to the backup database. With this, we at least have one additional copy of the data in case that the primary database crashes. We can have more instances of backup databases to improve the reliability.

There are a few ways to replicate data, which can be roughly categorized into two groups

  • Asynchronous replication. Primary database persists the data to the disk and replies back to the server. Primary database then sends the data to the backup database(s) in the background. The advantage of this approach is that write latency is low since we don’t need to wait for acknowledgement from backup databases. However there are some delays between primary databases and backup databases. If the primary database crashes, there could be some data loss.
  • Synchronous replication. When primary database receives the request from server, it persists the data to its local disks. In the meanwhile, it sends the data to backup databases. It doesn’t reply back to the server until it receives acknowledgement from (a majority of) backup servers. It is extremely unlikely to lose data with this approach at the expense of higher write latency.

Database replication improves the availability of the web service. When the primary database crashes, the system can promote the backup database to become the primary database. In the meanwhile, we can remove the crashed database and add in a new database instance to be the new backup database. With this, our system will behave like an always on system to our users!

In the next blog, we will share more patterns, such as cache, sharding the server, proxy and isolating read and write traffic. Remember to follow me so that you don’t miss it.

--

--