A journey down recursive CTEs.

2 years ago   •   5 min read

By Katya Delaney

We are fortunate to be able to experience the joys of being confronted with and solving problems with software. Often we leverage features of tools like SQL engines to help us implement our higher level solution. This is the story of a problem whose solution was a perfect fit for a specific SQL tool.

Postgres' common table expressions, or CTEs, can be a useful structuring tool to craft a SQL query in a more readable and friendly way. They begin as a with statement and allow you execute a query whose results will be available later to subsequent query. If you have a query that relies on the results of another query, a reference of sorts, the CTE will allow Postgres to materialize the results of the reference query, ensuring that the query is only run once and its results are readily available to other queries farther down the line. (Note that this can be a double-edged sword, as it prevents the Postgres query planner from running optimization outside of the reference query’s SQL.)

A CTE used in this scenario can replace an ungainly subquery, derived table, or temporary table. Repeatedly nested subqueries or derived tables can be more difficult to read (especially as their depth increases) versus a sequential list of blocks whose dependencies are always above themselves.

Derived Table

Temporary Table
This will require two separate queries.

CTE

Most people wouldn't consider the CTE to be more readable than the derived table in this example, but as a query gets more and more complex, organization becomes key to reducing errors. Note that the query depends on a, which is listed first with the CTE, but as a derived table it is listed inside the query. CTEs allow you to logically order queries that are dependent on others.

Recursive CTEs are a lot more interesting! Their function exceeds simple syntactic sugar. Once you add the recursive keyword, Postgres will now allow you to reference your CTE from within itself! This allows you to “generate” rows based on recursion.

Here’s an example using Postgres CTEs to implement the Fibonacci sequence:

Running this, we’d get the first 10:

It's probably not too often we find ourselves in a situation where we need Fibonacci numbers in SQL. Let’s continue on and take a look at a real problem I solved using this Postgres feature.

In the manufacturing industry, a bill of materials (or BoM) is a list of the raw materials, sub-assemblies, intermediate assemblies, sub-components, parts, and the quantities of each needed to manufacture an end product … BOMs are of hierarchical nature, with the top level representing the finished product which may be a sub-assembly or a completed item. BOMs that describe the sub-assemblies are referred to as modular BOMs

More simply, a BoM is a list of parts that a product is assembled from. Each part may consist of other parts, creating hierarchy. A car is a final product that is composed of many parts. A door is an example of a part of a car, its subparts being among a window, handle, various switches, and so forth. Recursive CTEs are great for hierarchical data! Let’s take a look at how we could model this hierarchy and use CTEs to generate a modular BoM sheet.

part table stores our parts.

part_hierarchy table stores a simple parent to child reference between parts, as well as a quantity if the part happens to need more than 1 of the child part for assembly. The check constraints prevent a part from referencing itself circularly.

Let's insert some parts

And some relations, approximately resembling the car example mentioned above

In our database we now have 5 parts and some relationships between them. If we wanted to see how many direct subassemblies a door has, we could query something like this, though it would only show us direct descendants:

Finally, let's now construct a query to list all of the parts and their subassemblies of our car.

The union all is the key part to this CTE. The first part of this UNION selects the base part, the second half selects any child parts by joining to the part_hierarchy table and calling itself (inducing recursion), then combines those results with the parent via the union. Without recursion, you'd have to add a join for every level of the hierarchy you needed to support.

The level value is introduced here to track how deep the recursion has run. It is hard coded at 0 to represent the parent. As the query "unwraps" and it calls itself, this increases by 1 to indicate how many generations far from the parent part the child part is.

The very last part that selects from the CTE pads the name with spaces based on the level, which is a quick and dirty way to visualize the hierarchy.

From here, you could use aggregate functions and rollup to calculate the costs of specific assemblies.

This example is definitely simplified from what I ended up using to actually solve this problem. Supporting part versioning and branching and other business requirements quickly complicates what this solution would look like in the real world. Recursive CTEs in SQL are not the only way to store and query hierarchical data, but for certain data types, sizes, and speed requirements, they are a quick and convenient feature to have in your wheelhouse.

Spread the word

Keep reading