One essential concept that every backend engineer should know is connection pooling. This technique can improve the performance of an application by reducing the number of open connections to a database. Another related term is "proxies," which help us implement connection pools.
In this article, we'll discuss connection pooling, implementing it in Postgres, and how proxies fit in. We'll do this while also examining some platform-specific considerations.
Challenges of managing user requests
Most applications have users constantly sending requests for different purposes and activities. Some applications might have many requests, while others might have fewer. Furthermore, the frequency and the intervals of these user requests also vary.
Whenever a user sends a request, the backend database or servers need to perform several activities to open a connection, maintain it, and close it. Several resources are used to perform these functions.
When multiple requests are coming simultaneously, performing these functions for every request becomes challenging. The database overhead is just too much to handle. As a result, when there are too many requests, the number of transactions per second plummets, and latency swells. In essence, the time to fulfill every request increases, and the performance of database servers becomes relatively poor.
Here's where the concept of connection pooling comes into play.
What is connection pooling?
Connection pooling is the process of having a pool of active connections on the backend servers. These can be used any time a user sends a request. Instead of opening, maintaining, and closing a connection when a user sends a request, the server will assign an active connection to the user.
Once the user has completed their process, the active connection will be added back to the pool of connections. This way, the number of overhead operations drops significantly, optimizing the servers' performance and solving the problem of low transaction frequency and high latency.
When should you use connection pooling?
The connection pooling mechanism can be used in the following scenarios:
- When the overhead for opening, maintaining, and closing connections is too much for servers to handle.
- When an application requires JDBC or JTA connection objects.
- When connections need to be shared among multiple users for the same transaction.
- When the application doesn't manage the pooling of its own connections for operations such as creating a connection or searching for a username.
The obvious benefit of connection pooling is that it improves server performance. Interestingly, a server using pooling can operate at speeds almost double that of a server that doesn't leverage this strategy.
Connection Pooling and Postgres
Postgres is an open-source relational database management system (RDBMS) that has been actively developed by volunteers from its community for more than 15 years.
Used by various applications on both web and mobile platforms, Postgres has various tools and offers features such as foreign keys, SQL sub-selects, hot standby (as of 9.0), views, and triggers.
Postgres also has tools for connection pooling. There are several tools in the ecosystem:
- pgpool provides connection pooling, load balancing, high availability, and replication abilities.
- pgbouncer is the go-to tool made for connection pooling only.
- pgcat is pgbouncer rewritten in Rust with support for load balancing between replicas, failover in case a replica fails a health check, and sharding at the pooler level.
You can use either of these tools based on your project requirements. However, Pgbouncer is the more lightweight connection pooler of the two. There are three main types of pooling supported by Pgbouncer:
- Session Pooling: This is the default method, where a connection is assigned to a client application for the lifetime of the client connection. Once the client application disconnects, the connection is added back to the pool.
- Transaction Pooling: In this method, the connection is assigned to the client application for every transaction. Once the transaction ends, the connection is added back to the pool.
- Statement Pooling: The connection is assigned for every statement. Once the statement is complete, the connection is released. In this method, multi-statement transactions are not supported.
Crunchy Data, Supabase, and Heroku all offer built-in solutions for using pgbouncer.
What is Yandex/Odyssey?
Odyssey, developed by Yandex, is a newer solution for pooling. While Pgbouncer is a single-threaded pooler, Odyssey supports multi-core and multi-threaded processing. The tool has the following features:
This feature enables one to set up multiple worker threads to process connections. With single-threaded processing, only one thread of instructions and requests could be processed simultaneously. Here, however, many threads can be processed simultaneously. Due to this, the processing capabilities of the backend databases are scaled multiple times.
Evolved pooling mechanisms
Odyssey's transactional pooling method is much more evolved than Pgbouncer. If the client disconnects unexpectedly in the middle of a transaction after assigning a connection, it'll automatically roll back the client to its previous state and cancel the connection.
Furthermore, Odyssey can remember its last server connection owner client, which means there's lesser overhead for the database to set up client options for every client-to-server assignment.
Besides the above two features, there are numerous others, such as full-featured SSL/TLS support, the ability to define connection pools as database and users, individual authentication for every pool, and more.
These more advanced features are only possible thanks to Odyssey's architecture. Let's see how it contributes.
Odyssey has an instance that manages the incoming client connection requests. Beneath it is the worker pool with multiple worker threads that can process incoming connections. Parallel to this worker pool is the system with routers, servers, consoles, and cron.
● The instance is an entry point for application requests and handles the initialization processes.
● The system then starts the routers, cron, and console. It listens to the incoming requests, and when there is an incoming connection, it notifies the worker pool.
● The router is responsible for the attachment and detachment of client-to-server and client pool queueing, among other operations.
● The worker thread is responsible for creating coroutines for clients when an incoming connection request is received on the queue and handling the complete lifecycle of the client connection. On the other hand, the worker pool is responsible for managing all the threads.
Sometimes the multi-threaded structure might have a lot of communication overhead, such as when there are fewer requests and the requests' time is relatively short. In this case, there is a single worker mode where, instead of creating separate threads and coroutines for every worker thread, only one worker coroutine is created inside the system thread.
While Odyssey and Pgbouncer function well as connection poolers with applications that have servers, there's another solution for implementing connection pools in serverless environments. These are proxy servers. Before I give you an overview of this solution, we'll need to make a detour to discuss how serverless applications function.
Not all applications require servers since maintaining them is quite a hassle, their costs are much higher, and there's a lot of waste. As a result, some applications go serverless.
In serverless applications, when a user makes a request, an instance is created, their request is fulfilled within that instance, and the instance is destroyed. So, in essence, these instances are tiny containers that run only the code needed to fulfill the client's request, and there's no other communication overhead.
Serverless environments present certain challenges for developers who want to use tools like Odyssey or Pgbouncer.
Odyssey's architecture is such that every instance manages the entire connection pool. In a serverless environment, numerous requests are made, and an instance is created and destroyed for each request.
This means when there are multiple requests, multiple instances will be created, and this means there will be multiple connection pools that wouldn't have communication between each other since it all doesn't fall under one server. Hence, they will not be able to coordinate and communicate if and when the database is overloaded.
Thus, Odyssey will not be able to offer its connection pooling services in a serverless environment.
Here's where Prisma Data Proxy comes into play.
What are proxies? How do they work?
Proxy servers are a solution where a data management platform acts as an intermediary between the client and the database. This proxy server can manage the traffic coming the database's way. In this article, we'll learn about proxies by studying Prisma Data Proxy.
This is a recent solution that is still in its early stages. However, it looks promising and even easy to handle, with a clean layout and a navigable UI.
By using the connection string, or clicking Create a new connection string, you can set up the Prisma Data Proxy for serverless applications and manage the connection pools.
Connection pooling is a helpful way to improve the performance of your application. This article showed what challenges server and serverless applications face when managing client requests and how they can be solved with Postgres tools like Pgbouncer, Odyssey, and Prisma Proxy Data. Consider implementing some of these solutions in your next project!