Relational databases distribute their data across many tables by normalization or according to business entities. This makes maintaining a growing database schema easier. Real-world queries often span across multiple tables, and hence joining these tables is inevitable.
PostgreSQL uses many algorithms to join tables. In this article, we will see how joins work behind the scenes from a planner perspective and understand how to optimize them.
Recap of query planning
Before we dive in, a little bit of preliminary reading is in order. This post is the third in a series of posts to help you understand different parts of a query plan. You may find the articles listed below to be helpful. This post covers the JOIN node and the different algorithms Postgres uses to perform joins.
Understanding the structure of a query plan:
Scans in a PostgreSQL query:
To understand how PostgreSQL prepares the plans for join queries, we need to set up data for experimentation.
Let's have two tables in our data that have the following structure.
These tables are primarily for testing and self-explanatory in what they store. The
id column connects them, and they have a one-to-one relationship. Queries for
CREATE TABLE are given below.
We are going to use the
faker library in Python to generate some fake user data. Below is the code used to generate the CSV files for data.
I created a million rows using the above script and loaded the data into PostgreSQL using the below commands.
Let's start querying the table where we have created and loaded the data. We will run several queries and use
EXPLAIN to inspect their query plan. Similar to the article on scan nodes, we will set max workers to zero to make our plans look simpler.
SET max_parallel_workers_per_gather = 0;
Finally, we will look at parallelization in joins in the dedicated section.
Let's run a simple query that combines the data from two tables.
SELECT * from user_info JOIN payment_info on user_info.id = payment_info.id LIMIT 10
explain for this query would generate a
As the name suggests, a hash join builds a hash table based on the join key. In our example, we will create the hash table on the column
id of the table
user_info. We'll then use this table to scan the outer table,
payment_info. Building the hash table is an added cost, but remember that the lookup (based on a good hash function) is effectively
O(1) in terms of asymptotic complexity. A planner usually decides to do a hash join when the tables are more or less the same size and the hash table can fit in memory. An index cannot really help much in this case, since the building of the hash table involves a sequential scan (a scan of all the rows present in the table).
Let's do a query where we want to select all data from both tables on the condition that the ID of the
user_info table is lesser than the
payment_info table. This query might not make sense in the real world, but you can draw parallels where you might join based on this condition. This is typically called a "cartesian product."
This would result in a nested loop join.
A nested loop is one of the simplest and hence the most naive of joins possible. It simply takes the join key from one table and runs through all of the join keys in the secondary table in a nested loop. In other words, a for loop inside a for loop:
A nested loop is chosen by the planner under the following conditions,
- The join condition does not use the
- The outer table to be joined is smaller than the inner table.
These scenarios can typically come up when relations are many-to-one and the inner loop iterations are small.
Let's create indexes on both the tables for the join key.
If we run the same query we used in the hash join example:
The plan might look familiar if you had read the scans blog linked earlier since it uses something called an
index scan, but let's focus on the join algorithm. Since we have indexes on both the tables and the
BTree index is sorted by default, the planner can fetch the rows in the sorted order from the index and do a merge as indicated in the
Merge Cond node in the plan. It is significantly faster than any of the other join methods because of the index.
Let's disconnect the database session (to unset the
max_parallel_workers_per_gather setting) and then drop all the indexes to run our original query. This will result in a parallel hash join which is a parallel version of the original hash join.
Similar to parallel scans, parallel joins make use of multiple cores to speed up execution. In the real world, the parallel joins/scans can be faster or even slower depending on a variety of factors. Parallel queries require a dedicated article in order for them to be explained in depth.
Understanding worker memory setting
work_mem is the memory space used by PostgreSQL for joins, sorting, and other aggregate operations. By default, this uses
4 MB of memory, and each join operation can use up to
4 MB. We have to be careful in setting this since there could be multiple concurrent joins and each of them can use the set amount in a production database. If this setting is higher than what is required, it can cause performance problems and can also bring down the database itself. As always, performance testing is recommended before setting this value.
There are other settings like
logical_decoding_work_mem which can impact join performance, but in a typical production setting all join key/columns should be indexed and the
work_mem setting should be set to a proper value to handle application workloads.
The data setup we did was extensive, so I highly encourage readers to take time to understand
Scans and the join nodes mentioned in this article. You should also feel free to experiment with different query patterns and see how the planner behaves for combinations of such queries.
Joins are a critical piece of functionality for any relational database and it is important to understand how the planner works behind the scenes. To summarize:
- You should create
BTreeindexes for join keys to speed up join operations.
- If indexing is not possible, then try to optimize the
work_memsetting so that hash joins can happen completely in memory and not spill to the disk.
- Nested loop joins should be the last resort and typically involve comparisons like
>on the join key.
- The planner can combine indexes for other operations like
Scansand significantly speed up queries.
BTreeindexes are very powerful and have a wide range of applications.
Hopefully, this article helped you understand more about joins. Stay tuned for more articles on PostgreSQL node types!