Why Monitor PostgreSQL Database Health?
Just like monitoring our applications, it is very much necessary to monitor our database system. We want to monitor things substantially more granular than system-level processes—things such as:
- How many queries actually make use of the index?
- How effective is the database cache?
- The number of open connections.
And so on.
Hardware metrics are essential, but can only get us so far, and will not help us make concrete decisions—like which queries to optimize and how we are using our database in general. Since PostgreSQL is an open-source database, it offers a lot of insights for free without having to rely on costly tools. Most of the metrics we are going to track are available by default since they are required for the query planner, but there are other critical areas for which we have to enable certain extensions/contrib modules that ship along with PostgreSQL.
Introduction to the Arctype Tool
Arctype is a very neat database tool that can be used not just as an SQL editor but also as a platform upon which complex dashboards can be both custom-built and shared around with developers/users. To understand its true power, we are going to consider a practical use case—monitoring a PostgreSQL database health.
Getting Started with Arctype Dashboards
To get ourselves familiar with the dashboard capabilities we are going to build a very simple dashboard with just three components. The Arctype dashboard is built right into the Arctype tool itself. To access the functionality, just click on
Dashboards button as shown below.
Click on the plus icon and create a dashboard named -
Postgres Health Monitoring.
Now that the dashboard is created, let's create some components.
Creating a List of PostgreSQL Tables from Information_Schema
Adding a table component/chart component is covered in one of our previous articles. Drag and drop the
Table into the empty dashboard pane. Next, we will use a simple query that talks to the underlying database (PostgreSQL) and gets all of the tables with their schemas present excluding the
information_schema and the system tables.
SELECT table_schema, table_name FROM information_schema.tables WHERE table_schema <> 'information_schema' AND table_name NOT LIKE 'pg_%' ORDER BY table_schema, table_name;
Visualizing Row Counts of Tables in a Database Schema
Let's add a Chart component (similar to table) next to our table component which uses the below query:
SELECT relname as table_name, reltuples as rows FROM pg_class C LEFT JOIN pg_namespace N ON (N.oid = C .relnamespace) WHERE nspname NOT IN ('pg_catalog', 'information_schema') AND relkind = 'r' ORDER BY reltuples DESC;
This query simply the already assembled statistics for all the tables in our databases. In the right pane, select
table_name for X-Axis, and
rows for Y-Axis.
The Initial Dashboard
After adding all of the above steps, our dashboard will look like this. NOTE: adding images to dashboards has been deprecated, although if this is a feature you need please let us know in the Arctype chat server.
At this point, you should have a decent feel for dashboards, so now, let's build a real health dashboard and see how it can help us visualize our database health.
Identifying Critical Areas in the PostgreSQL Database
PostgreSQL has a lot of areas upon which we can focus in order to understand the database's health. In the below sections, we will see a handful of them and demonstrate how you can use Arctype to visualize them.
Monitoring Average Postgres Query Performance
For this part, we need to enable the
pg_stat_statements extension. It is a built-in extension/contrib module, and in some cloud providers, it is enabled by default. If not, it can be easily enabled using the
shared_preload_libraries. This extension helps us capture a ton of information regarding query performance. Using this extension, we are going to build a query that outputs the query performance based on the number of rows the query has scanned:
SELECT mean_time as "timing in ms", rows FROM pg_stat_statements ORDER BY mean_time DESC LIMIT 100;
Limiting the rows and doing the
order by gives us the top 100 most consumed queries and the
rows column indicates the total number of rows retrieved or affected by the statement. We can also get the top 10 worst queries by running time using the above table. This can be used as an alternative for slow log analysis, as this is more real-time.
Querying to Check PostgreSQL Open Connections
It is always a good idea to monitor how many open connections are currently present in our database cluster. Postgres has a fairly low limit and outside tools like pgBouncer may be needed to pool connections together. We can get this from the
SELECT COUNT(*) as connections, backend_type FROM pg_stat_activity GROUP BY backend_type ORDER BY connections DESC
Determining the Size of a PostgreSQL Database
As our database expands over time, it is important to keep the database size in check. We can get a list of databases and their respective sizes using the below query:
SELECT datname as database_name, pg_database_size(datname)/1024/1024 as size FROM pg_database WHERE datistemplate = false;
We could use the
pg_size_pretty instead of calculating the MB directly, but we want to plot this as a bar chart to see and compare the sizes. Keeping tabs on individual database sizes helps us to get an overall picture. Once the size of your database starts to grow, you can apply techniques like TOAST to regain control of your storage utilization.
Note: These sizes can vary from disk size since PostgreSQL does a lot of compression behind the scenes.
Calculating Database Cache-Hit Ratios in PostgreSQL
Any relational database/operational database serves most of its traffic from its cache. If that's not happening, we definitely need to look into it. The PostgreSQL cache area is known as
shared_buffers, which acts as the database layer cache on top of the cache provided by the operating system. We can use the below query to understand the cache hit ratio:
SELECT sum(heap_blks_read) as reads, sum(heap_blks_hit) as hits, ROUND( sum(heap_blks_hit) / (sum(heap_blks_hit) + sum(heap_blks_read)), 4 ) as hit_ratio FROM pg_statio_user_tables;
From the query plan blog, there are a couple of tuples that are coming from heap read and heap hits. The
pg_statio_user_tables has the information regarding this. In the above query, we are simply getting the values of hits (from the cache) and reads (from the disk) and calculating the ratio rounded off to four digits. It is important to note that in a healthy production database, this ratio should come to around 97+ or close to it—97% of the hits are coming from the cache.
Scanning for Unused Indexes
The statistics collector in PostgreSQL records which indexes were accessed how frequently. We can construct a query from one of the statistics tables called
SELECT COUNT(*) as count, relname as table_name FROM pg_stat_all_indexes WHERE idx_scan = 0 and schemaname = 'public' GROUP BY table_name
This kind of gives us an indication of tables that have the greatest number of unused indexes. Warning! Don't remove unused indexes blindly. Check to see if they are in use by any replicas before taking action. The
idx_scan=0 gives us the indexes that were never used even once. We can customize this value depending on our database traffic pattern.
Inspecting PostgreSQL Database Caches
In order to get more insight out of our cache, we need to enable another contrib module called
pg_buffercache. After enabling it and running a few queries we can now inspect the cache:
SELECT c .relname AS entity_name, count(*) AS buffers FROM pg_buffercache b INNER JOIN pg_class c ON b.relfilenode = pg_relation_filenode(c .oid) AND b.reldatabase IN ( 0, ( SELECT oid FROM pg_database WHERE datname = current_database() ) ) WHERE c .relname NOT LIKE 'pg_%' GROUP BY c .relname ORDER BY 2 DESC;
This query was taken from the official documentation and then modified a little bit. It shows how many pages in
shared_buffers are occupied by different tables and indexes. You can assume pages as a metric directly proportional to the number of rows—the larger table being queried, the more pages get moved to the cache. This can also mean that some tables can be moved to an application cache or a fast key-value store, such as Redis, for faster querying—and to free up the database cache for indexes.
Building a Dashboard with PostgreSQL Metrics
If we combine all of the queries in the mentioned chart types, it should look like a proper analytics dashboard that can be easily visualized/consumed:
The mapping for each of the components is shown in the table below. Arctype is not limited to the below charts and has an incredibly feature-rich platform upon which we can represent our visualization.
|Open Connections||Pie Chart|
|Cache Hit Ratio||Normal Table|
|Unused Indexes||Bar Chart|
|Database Size||Bar Chart|
|Shared Buffers||Bar Chart|
|Query Response Times||Scatter Chart|
Each widget here has its own description and the graphs have details on what is present in X-Axis and Y-Axis. It is important that the dashboard is self-explanatory, after all, a picture is worth a thousand words. These charts might look different depending upon the underlying data. For example, we chose a pie chart for Open Connections but we may use a bar chart because most often we will care about the total number of connections rather than their source. They can be expanded and reduced by simply dragging their borders while in edit mode, which can be very useful if the dataset is larger—charts can overflow.
The above dashboard was created in an example database that I had used before. Let's create the dashboard over a more realistic database such as the
Covid DB one that comes inbuilt with Arctype (available during the Signup flow). Since the dataset is huge, I have created the dashboard in two parts:
As we have more data points, the dashboard looks a lot richer and more meaningful. The goal is not to be a "silver bullet" solution, but rather to provide some perspective as to potential areas of focus for an operational database. There are a lot of tools out there that offer different functionalities, but at the end of the day, I believe database developers should know their databases in more depth. What's better than a tool that easily lets us translate queries into really nice charts and graphs with almost no learning overhead? Answer: probably not much.
Insights and Intelligence from the Health Dashboard
This dashboard allows us to monitor the database in real-time and helps us to more fully understand some features which are unique to PostgreSQL. In doing database optimizations, it is always a rule that "you cannot improve what you cannot measure," and this dashboard solves the first (yet often overlooked) step, which is actually measuring things. The
Covid DB demo database has a 99% cache hit rate, which means that almost all of the traffic is served from the cache (shared_buffers) and will respond very quickly. This is one good sign of a well-optimized database.
Monitoring our database during peak traffic times will give us a lot of insight into which areas can be improved and how to measure this improvement. Some of the ways in which we can improve database performance based upon these metrics are:
- Using a client-side/server-side connection pool.
- Code optimizations to release connections back to the pool/close the connection once the work is done.
- Doing proper database modeling and splitting the traffic between different clusters.
- Increasing the size of the shared buffers to accommodate more data/indexes in memory.
- Optimizing poorly written queries by either adding an index or re-writing them entirely.
- Removing unused indexes to speed up writes.
Whatever we have seen so far is only the tip of the iceberg. Database maintenance and improvement is an art of its own, and one which truly takes several years of hands-on experience. But is always important to use a toolchain that is friendly, cost-effective, and can help us achieve our goal easily.
Keeping the Dashboard Up-to-Date
After building it, it is important to keep the dashboard up-to-date. This can be done using various ways depending on our needs.
Manually Refreshing Arctype Dashboards
At the top right beside the Edit Mode button, there is a reload button, which we can click to reload the entire dashboard.
Automatically Refreshing the Entire Dashboard
We can also configure the whole dashboard to be refreshed periodically.
Refreshing Individual Components Automatically
By clicking on an individual component and clicking on
Advanced, we can configure the component in question to be refreshed at specified and automated intervals.
We can choose what kind of refresh type we want, but keep in mind that the queries to monitor the database itself may be resource-intensive and can therefore degrade performance.
Sharing the Dashboard (and Closing Thoughts)
Another very important feature that is missing amongst many database tools, but one that is present in Arctype, is the ability to share dashboards.
By clicking the
Share button on the right top screen in the dashboard viewer, we get various options to share.
Sharing an Arctype Workspace via Email
This would invite the user to be part of the workspace. For more advanced controls, check out the teams functionality.
Linking to Your Workspace
This is the same as above, except that anyone with the link can join the workspace.
Creating a Public Read-Only Link
This will send a read-only link that can be viewed in the browser as well. This can be very handy for sharing dashboards with different (perhaps non-technical) stakeholders on a team.
Note: You cannot share a locally-hosted database (or any affiliated dashboards)
These dashboards are incredibly useful when:
- Running a performance test.
- Monitoring a live database for a specified period.
- General database monitoring.
Fine-grained control provided by Arctype makes this dashboard easily shareable across colleagues, teams, and even to the general internet. Go ahead and download Arctype if you haven't already, and start creating these dashboards—knowing database health and monitoring them should no longer be limited to certain tools or people!