Introduction to Microservices
Microservices are not new; they have been around for quite a while. However, recent innovations have accelerated the adoption of the microservices architecture. Most importantly:
- Cloud-native architecture
- Cloud computing revolutionized how we handle hardware. Even large companies are now moving towards the cloud because it is much easier to manage and scale, despite the higher associated costs (costs eventually go down as companies cut down on resources spent on maintenance).
- Containers and Container management
- These are not new concepts either; the Linux kernel has provided the underlying capabilities natively for a long time. Still, systems such as Docker and Kubernetes changed how we deploy and manage applications only a few years ago, democratizing the whole process with more features and tooling. As a result, it is much easier to slice hardware to our requirements and to manage the resulting workloads, which opens up a whole new world of opportunities.
The idea of microservices comes from the service-oriented architecture (SOA) world. Although the two seem similar, they have many differences: SOA is an enterprise-level concept, whereas microservices is an architectural style that focuses only on the application under consideration.
Microservices offer several advantages.
- Cleaner separation of codebases. Each service will be in a separate codebase/repository.
- Separate code deployment/release pipelines. Teams do not need to coordinate releases, and breaking changes can be handled through API versioning and other strategies.
- Scaling of individual required services such as payments, order generation, etc.
- Polyglot programming is very much possible, since services mostly communicate using REST over HTTP, which is language agnostic.
While there has been a lot of emphasis on tooling and professional support for microservices, the database space has largely been ignored. In this article, we will see why approaching database design from a microservices perspective is wrong, how best to design our databases, and in particular how to split our database systems for horizontal scaling. The majority of this article focuses on traditional database systems such as PostgreSQL, MySQL, MariaDB, etc. However, the concepts do extend to other database systems, including NoSQL.
The hardest part of microservices is the data layer
Hardware Limitations of Traditional Databases
There are plenty of reasons why we should not approach database design with microservices principles in mind. These can be classified broadly into two categories: hardware and software.
Traditional RDBMS do not natively fit into a microservice architecture. Relational databases were designed for vertical scaling and do not really scale horizontally. In fact, many databases were not developed with containerization in mind, and it is not practical to force-fit an existing DBMS into a microservice backend.
Certain database queries that are OLAP in nature use much more CPU. Features like CPU pinning and low-level CPU optimizations are therefore much better suited to traditional VMs or even bare-metal machines, even though they can be configured in containers such as Docker.
Memory & I/O Tuning
Database systems use much more memory than most application software because they deal with more data. Scheduling such large, memory-heavy nodes on container orchestration tools is really hard and often interferes with other application workloads.
I/O is a whole other beast. Even though support is improving, optimizations such as software RAID and logical volume caching are much better suited to VMs/bare-metal machines.
Kubernetes and other systems slice the hardware and run different applications on the same nodes; an optimization made specifically for one service might prove detrimental to other services. It is therefore better to use VMs, managed database services such as RDS, or even cloud databases such as AWS Aurora.
Hardware limitations sometimes naturally translate to software limitations, but we also need to worry about other things.
Transactional Boundaries of ACID
ACID, by its very principle, cannot be scaled horizontally, simply because of strong consistency. Running transactions across multiple servers becomes impractical in terms of performance. This is the trade-off NoSQL systems make for scalability: they offer eventual consistency in place of strong consistency.
There is also potential for process duplication if we split our databases across multiple nodes/clusters. E.g., PostgreSQL runs a mandatory set of processes (for WAL logging, checkpointing, connections) per cluster. Although they consume resources according to need, they occupy a certain baseline, similar to the JVM, for performance reasons. If these resources are duplicated across many clusters, it leads to wastage and poor utilization.
Database Architecture for Microservices
With these limitations in mind, let's see some architectural concepts for database design on microservices. This section has both industry standards and some customized patterns from my own experience. First, we will take a standard well-known industry example of an E-Commerce application.
Shared-Nothing Architecture
This sub-section describes database systems that do not share anything with other databases serving the same application. They are clearly separated in terms of hardware and resources and only depend on functional needs.
Note: These individual approaches are complex and would require separate articles to be explained completely. But the goal is to give a high-level overview.
Database per Service
This is the simplest of them. Each microservice has its own database, and there is no sharing with any other service.
- Order Management.
- Product Catalog.
Under this architecture, each of these services has its own database, and there is no sharing, no matter how small the service is. The advantages of this approach are that failures in one service's database will not affect other services, and that the logical separation is cleaner.
The database-per-service approach might look simple in theory, but in real-world applications a business transaction might need to span different service boundaries. Sure, we could call the payments service once an order is placed, but depending on the result of the payments service, we need to store the order status appropriately. We cannot take the route of eventual consistency in this scenario, since other customers might order the same product, which could have limited stock. The Saga pattern attempts to solve this problem in a much more robust fashion.
Let's see an example of how we can handle it in the case of a product order placed and the sequence of events it follows.
A product order can only be prepared after consulting the catalog service to check stock availability, and the order can only be confirmed after a payment success from the payment service. We can think of the below ways to handle this.
- Two-Phase Commit
- This is very costly and involves a lot of database locks. It also does not scale across clusters.
- 2PC is not part of the Saga pattern itself; it is included here as motivation for why the Saga pattern is necessary.
- Event/Choreography based
- Each of these services listens to different channels. Once an event is published, the services pick these and then respond back with results.
- The Order service listens to these response events and then does the transactions based on the results.
- Orchestration based
- Instead of listening to response events, the order service takes central responsibility for orchestrating the communication between these services and then performs a local transaction in the order management service itself. It can choose to do this via API calls, messages, or even scheduled jobs.
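As a minimal sketch of the orchestration-based approach, the snippet below models the order service coordinating two other services. The `CatalogService` and `PaymentService` classes and their methods are hypothetical stand-ins for real remote calls (HTTP requests or messages), not any specific library's API.

```python
# Hypothetical in-process stand-ins for remote services.
class CatalogService:
    def __init__(self, stock):
        self.stock = stock                        # product_id -> units available

    def reserve(self, product_id, qty):
        if self.stock.get(product_id, 0) < qty:
            raise RuntimeError("out of stock")
        self.stock[product_id] -= qty

    def release(self, product_id, qty):           # compensating action
        self.stock[product_id] = self.stock.get(product_id, 0) + qty


class PaymentService:
    def charge(self, order_id, amount):
        # Illustrative failure rule: non-positive amounts are declined.
        if amount <= 0:
            raise RuntimeError("payment declined")


class OrderOrchestrator:
    """Order service acting as the saga orchestrator."""

    def __init__(self, catalog, payments):
        self.catalog, self.payments = catalog, payments

    def place_order(self, order_id, product_id, qty, amount):
        self.catalog.reserve(product_id, qty)       # step 1: reserve stock
        try:
            self.payments.charge(order_id, amount)  # step 2: take payment
        except RuntimeError:
            self.catalog.release(product_id, qty)   # compensate step 1
            return "REJECTED"
        return "CONFIRMED"                          # step 3: local commit
```

Each step is a local transaction inside one service; a failure triggers compensating actions rather than holding distributed locks, which is the key difference from two-phase commit.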
Channels can be modeled on message brokers such as RabbitMQ or ActiveMQ. Of course, each of these approaches has its own advantages/disadvantages, but we need to understand that these are established patterns in the industry. Furthermore, frameworks such as Spring Boot and Apache Camel even offer support for this pattern.
The Saga pattern is more about implementing transactions across services than about queries; the subtle differences depend on our implementation. In a real-world setup, we might need both. The API composition pattern uses individual API calls to fetch data from the respective services and then combines the results into a more unified view of the data. To implement the API composition pattern, we can take the help of cloud-native serverless technologies such as AWS Lambda, which can serve as the place where the data is combined.
An example of the API Composition pattern is described below.
Any shared data can be moved to a separate service-plus-database combination, and the API composition pattern can be used to fetch the data. Keep in mind that we are now doing transactions across different databases: whatever ACID functionality the database offered must now be handled in the application layer.
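A small sketch of the API composition pattern, under the assumption that an aggregator (for example, an AWS Lambda function or a thin gateway) calls each service's API and merges the results. The `fetch_order` and `fetch_product` functions and their URLs are hypothetical stand-ins for real HTTP calls.

```python
def fetch_order(order_id):
    # Stand-in for: GET https://orders.internal/orders/{order_id}
    return {"order_id": order_id, "product_id": "p42", "status": "CONFIRMED"}

def fetch_product(product_id):
    # Stand-in for: GET https://catalog.internal/products/{product_id}
    return {"product_id": product_id, "name": "Keyboard", "price": 35.0}

def order_summary(order_id):
    """Compose a unified view from two independently owned services."""
    order = fetch_order(order_id)
    product = fetch_product(order["product_id"])
    return {
        "order_id": order["order_id"],
        "status": order["status"],
        "product_name": product["name"],
        "price": product["price"],
    }
```

The composer owns no data of its own; it only joins what each service exposes, which keeps data ownership with the individual services.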
CQRS stands for Command Query Responsibility Segregation. It is another architectural concept that separates reads and writes into different components. With this pattern, we are leaving the relational database domain and entering the world of eventual consistency.
With reads and writes separated, the read view can even live on different hardware, with the write database updating it as and when required. Read and write clusters can be scaled individually based on requirements. Because of this separation, querying the view might not return the latest results; hence the eventual consistency.
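The staleness described above can be made concrete with a toy CQRS sketch: commands append to a write-side log, and a read view is rebuilt separately, so a query issued before the view catches up sees old data. All class and method names here are illustrative, not from any framework.

```python
class WriteModel:
    """Write side: accepts commands and records them as events."""

    def __init__(self):
        self.events = []

    def handle_command(self, product_id, delta):
        self.events.append((product_id, delta))   # durable write-side record


class ReadView:
    """Read side: a projection updated independently of the writes."""

    def __init__(self):
        self.totals = {}
        self.position = 0                         # index of last event applied

    def catch_up(self, write_model):
        # In a real system a background projector runs this, not the caller.
        for pid, delta in write_model.events[self.position:]:
            self.totals[pid] = self.totals.get(pid, 0) + delta
        self.position = len(write_model.events)

    def query(self, product_id):
        return self.totals.get(product_id, 0)
```

Until `catch_up` runs, queries return the previous state; that window is exactly the eventual-consistency gap, in exchange for which the two sides can be scaled and stored independently.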
Not all services require such complex separation of the data layer. Therefore, we can choose simpler strategies depending on the complexity of the application.
Traditional Database Separation
Each service can be separated using any of the following ways.
- Database per service. Each service has its own database, and a database cluster can host more than one database. This is different from the shared-nothing database per service, where each service essentially has its own cluster.
- Schema per service. Databases can be shared among different services, but each service has its own unique schema, which is not shared.
- Sharding by content. The classical technique of sharding database tables using range/table size.
- Shared schema and database. Tables are shared between different services and are accessed like any other application.
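The schema-per-service option from the list above can be sketched with SQLite, whose `ATTACH DATABASE` gives schema-like namespaces inside one connection (in PostgreSQL this would be `CREATE SCHEMA` plus a per-service `search_path`). The service and table names are illustrative.

```python
import sqlite3

# One shared "server" (connection), two per-service namespaces.
conn = sqlite3.connect(":memory:")
conn.execute("ATTACH DATABASE ':memory:' AS orders_svc")
conn.execute("ATTACH DATABASE ':memory:' AS catalog_svc")

# Each service owns its own schema; tables never cross the boundary.
conn.execute("CREATE TABLE orders_svc.orders (id INTEGER, product_id INTEGER)")
conn.execute("CREATE TABLE catalog_svc.products (id INTEGER, name TEXT)")

conn.execute("INSERT INTO catalog_svc.products VALUES (1, 'Keyboard')")
conn.execute("INSERT INTO orders_svc.orders VALUES (100, 1)")

# A service reads only from its own qualified namespace.
name = conn.execute(
    "SELECT name FROM catalog_svc.products WHERE id = 1"
).fetchone()[0]
```

The isolation is logical rather than physical: both schemas still share the cluster's hardware, which is exactly the trade-off against the shared-nothing database-per-service setup.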
Rather than separating databases by functionality, multi-tenant databases split them by customer range/demography. This is a natural way of handling database separation, and it helps to think not from a microservice perspective but from a database perspective. Tenants can even span multiple servers without being limited by ACID boundaries, since one tenant does not share data with another. The only limitation is that a single tenant's data cannot span multiple database clusters. Multi-tenancy can be implemented in the following ways.
- Database per tenant. Each tenant has its own database; a database cluster can host multiple such databases.
- Schema per tenant. Similar to schemas separated by services, we can give each tenant/customer range its own schema within a shared database.
- Separation by range. The schema and database are shared, but the records are partitioned using a column that indicates the range. E.g., ids 1-10000 might indicate one range, and the next 10k might indicate another range.
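The separation-by-range idea boils down to a routing function that maps a customer id to the shard holding its rows. The shard names and the 10k range size below are illustrative assumptions, not a prescription.

```python
RANGE_SIZE = 10_000
# Illustrative shard names: ids 1-10000 -> tenants_0, 10001-20000 -> tenants_1, ...
SHARDS = ["tenants_0", "tenants_1", "tenants_2"]

def shard_for(customer_id: int) -> str:
    """Route a customer id to its range-based shard.

    All of one tenant's data lands on exactly one shard, so local ACID
    transactions still work; only cross-tenant queries must fan out.
    """
    index = (customer_id - 1) // RANGE_SIZE
    if index >= len(SHARDS):
        raise ValueError("no shard provisioned for this id range")
    return SHARDS[index]
```

Because a tenant never crosses a shard boundary, this scheme scales horizontally without giving up per-tenant transactional guarantees.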
Database as a Service
As discussed in the previous sections, traditional relational databases are not suited for cloud-native/microservice-based architectures, and a newer database architecture has evolved in response. As a result, cloud-native database systems, popularly called database-as-a-service, have been gaining popularity lately. Let's see how AWS Aurora, a cloud database offering from AWS, provides many advantages that we, as end users, need not manage or worry about.
- Automatic upscaling and downscaling of storage space up to 128TB.
- Higher reliability and cross-AZ replication.
- "Pay as you use" billing model.
- Up to 15 read replicas for a single Aurora cluster.
This does not mean that everyone should jump onto these and start using them. They cost roughly 20% more than traditional RDS, so it really depends on the usage; what is important is that we now have such choices. A few years ago, commoditization of such databases was not even on paper. Cloud database systems are better suited for microservices as they offer higher storage, better performance, and a simpler cost model.
Real World Tradeoffs
We have covered quite a few topics, but I have tried to establish that it is beneficial to look at the data layer differently from the application layer. There are no right or wrong decisions; it is always about tradeoffs. This section tries to give a mental model for approaching data layer refactoring with respect to microservices.
- If you are a startup just trying out a prototype, or a small business trying to make it big in the market, you would probably start with a monolithic application, or maybe with one or two extra services for concerns like payments/document generation. You should stick to a single database and share the tables with the other services.
- As you scale from a startup, the newer services create more and more tables as the complexity increases. At this point, we can choose to separate the tables into different schemas but stick to the same database. This can give a fair amount of isolation, and refactoring tables can be a bit easier. Upgrading the hardware to handle more scale also makes sense. Another potential change is talking to the service responsible for those tables rather than directly talking to the tables themselves. This is also called data ownership by services.
- The startup now has grown to a mid-sized company, and there are different business units within your organization. So you start splitting the core monolith application into different services. At this point, depending on the scale, you can choose to go the database per service route or stick to a shared database with different schemas.
- From this point, handling the growth with shared database systems might be very difficult. Depending on their budget and planning, many companies will either go with some form of Saga pattern, API orchestration, or CQRS. Cloud databases such as AWS Aurora, Google Cloud SQL, and Azure SQL are also good choices as they provide managed SQL that scales with the size of the business.
It is much easier to think from an application perspective about which sections of the application share which tables/schemas. Once such clarity is achieved, we can split the tables and put them into their own databases, leading to easier management. Duplication of data is not all that bad if it helps provide isolation and cleaner separation.
- We saw how microservices are different and how they evolved for more complex architecture needs.
- The data layer cannot be treated the same way we treat our application software. Fundamentally, system software such as relational databases works differently and cannot be fit into the same principles that guide application software design.
- ACID, by its principle, cannot scale horizontally. It is not just a lack of tools or technology that prevents it; this is a distributed-systems problem constrained by the laws of physics. Replicated data can travel at most at the speed of light in theory (in reality, much slower). Thus, there will always be a delay if we want to scale horizontally.
- Several architectural patterns can help us overcome/mitigate these issues to a certain extent, but they come with their own complexity and cost. Therefore, one needs to be careful not to jump to prematurely optimized solutions before understanding and breaking down the problem space.
- Besides the technology involved, the cost is also a limiting factor for many of these solutions.
- Cloud-native technologies have evolved so that even smaller organizations can use these technologies on a cost-per-usage basis, thus providing easier ways for them to scale.
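The speed-of-light limit on replication mentioned in the points above can be quantified with a back-of-the-envelope calculation; the 4000 km distance below is an illustrative cross-country figure, and real networks are several times slower than this theoretical floor.

```python
SPEED_OF_LIGHT_KM_S = 299_792  # speed of light in vacuum, km/s

def min_round_trip_ms(distance_km: float) -> float:
    """Theoretical lower bound on a synchronous-replication round trip."""
    return 2 * distance_km / SPEED_OF_LIGHT_KM_S * 1000

# A write synchronously replicated to a replica ~4000 km away cannot be
# acknowledged in under ~27 ms of pure light travel time, before any
# switching, serialization, or disk latency is added.
cross_country = min_round_trip_ms(4000)
```

This is why strongly consistent writes across distant regions carry an irreducible latency cost, and why eventual consistency is often accepted instead.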
Hopefully, this article has shed some light on the paths a team/organization can take regarding database scaling. Several cloud providers have playgrounds to rapidly deploy and test our applications before they hit the real world, and we should make maximum use of these resources at our disposal.