There is no SQL standard. Well, there is but no database fully implements it. Become familiar with the wide variety of SQL dialects to get the most out of your queries on each system.
What's a SQL dialect?
SQL has a long history, filled with anecdotes. Before we learn about dialects we need to understand how SQL itself evolved. If you're not too familiar with this history, you might want to check out the detailed article below that explains how SQL came to be and its significance in data processing:
The word "dialect" means a variant of a language. In the real world, there are around 160 dialects of the English language. Just as human languages evolved over time, SQL has also changed over the year, spawning its own dialects.
Dialects exist for their own special purposes. For example, the PostgreSQL dialect
PL/pgSQL was created to incorporate PostgreSQL-specific syntax, which might not make sense in other database systems. Regardless of these dialects, many databases have common syntax due to the
ANSI SQL specification. Similar to JDBC, REST, and many others,
ANSI SQL was created so that vendors (both open source and commercial) can co-operate in a meaningful way.
Why are SQL dialects important to understand?
Speaking from experience, it is rare to write SQL that will perfectly work across many database platforms. There will be variations that exist among different databases. This makes it all the more important to learn dialects of one or more database systems, just like how learning more than one programming language helps us become better developers.
Structure of an SQL query
The word "structured" is pretty evident in the name
Structured Query Language itself. The structured nature of SQL is the reason it became so popular and widely adopted. SQL can be roughly split into sections as shown below:
- Data Query Language.
- SQL queries that fetch data from tables.
- Data Definition Language.
- SQL queries that alter the structure of tables.
- Data Manipulation Language.
- SQL queries that alter data present in tables.
- Data Control Language.
- SQL to Grant/Revoke user permissions.
- Transaction Control Language.
- SQL statements to control transactions.
A typical SQL DQL query consists of
FROM and a
WHERE clause. There are optional
GROUP BY clauses as well.
Most queries, even in NoSQL database systems, more or less follow the above structure. In addition to this, some database systems have something called "procedural SQL," which is a lot more powerful. When we are talking about dialects it means we are talking about the SQL features as a whole, which will include SQL + Custom Functions + Procedural SQL.
Aside: From a language perspective, SQL has a fairly simple grammar when compared to programming languages. Nevertheless, you might want to do some more analysis on a query from time to time. If you are interested in parsing and understanding SQL queries, check out the ANTLR tool.
Different dialects in SQL and examples
Terms we generally know already, like
ORDER BY, are just the basics that most software developers use in day-to-day development. Let's take a look at how some of the most famous dialects handle these basics and some custom examples around them.
PL/Pg SQL is the SQL dialect that the PostgreSQL database system uses. Unlike normal SQL, PL/PgSQL is Turing complete, i.e it has the power of a full programming language. A basic structure of a PL/Pg SQL module will look like the example below:
Most procedural code will be declared as functions and can be called from regular SQL queries. Here is an example that fetches a user id from the username, taken from the documentation itself:
This function can then be called as follow:
This is just one simple example. There is a world of possibilities that exists with these powerful functions. The full documentation/guide can be found here.
TSQL or Transact SQL is an SQL dialect for the Microsoft SQL Server database. Instead of looking at a function example as we did for PostgreSQL, we will take an example for a normal DQL query.
Let's assume that we have a table called
sales and we want to get the top 10 sales by
amount. A normal SQL query for it would look like -
This same query can be simplified a lot by using
We might think this is a small improvement but we can add more complexity to this by selecting the top 2% of the rows.
You can explore more about the
TOP query and other proprietary syntax offered by SQL Server in the documentation.
TSQL is also Turing Complete similar to
PL/Pg SQL. There is even an example of a language interpreter being completely written in
TSQL, although the language itself is not as complicated as a full-blown programming language.
PL/SQL is Oracle's proprietary database procedural language. It works very similar to other procedural languages. A base structure for this is as follows,
Let's take a look at an example where this procedure calculates users that have salaries higher than
We are declaring the variables and then selecting the values of them from the
users table. This can then be wrapped into a function and then called in a manner similar to our
PL/Pg SQL example.
PL/SQL also qualifies as a Turing complete language and is popularly used in financial institutions, where entire business applications are written on top of it.
These are real examples, but you can syntactical differences in this great answer by Prolay Chaudhury on Quora. Now that you have a flavor of the different dialects, let's compare and contrast them.
Which SQL dialect should I learn? Consider the following...
The major differences and similarities between these dialects are listed below.
PostgreSQL is open source, while two are proprietary. This can be a deciding factor when deciding on database systems. The majority of open-source software has extensive documentation and a broad user base, while proprietary software has better features and support. With that said, even open-source databases like PostgreSQL have enterprise support and features.
PL/Pg SQL tends to be polyglot in nature, i.e its users live in various programming language communities. On the other hand, TSQL is heavily used by the Microsoft developer community and, similarly, PL/SQL developers typically inhabit the Oracle developer community. These considerations tend to affect which dialect developers choose.
Many of these dialects have been in the industry for a long time. While syntax and user experience are subjective, the feature sets between these dialects/databases tend to overlap a lot. If you are deciding on this factor alone, then it is very hard to distinguish between the different dialects, since most support the vast majority of use cases.
There are no explicit performance differences between the dialects, but they can vary depending upon how optimized our code is and the amount of data. Most of these functions get compiled to native
C code and hence do not incur any cost as far as compilation is concerned.
Lists of All SQL Dialects
The examples given above are just some of the most famous ones. There are plenty of others that are used in database systems of their own. Below we give more resources compiled by database ORMs like Hibernate, SQL Alchemy, and Microsoft Entity Framework.
Since ORMs interact with the database systems and provide wrappers on top of SQL, they need to understand these dialects from the ground up.
- Here is a list of dialects that Hibernate supports.
- A list supported by SQL Alchemy.
- Finally a list of SQL dialects supported by Microsoft Entity Framework.
The references above are not exhaustive. If you are working on a database platform that is specialized, then it might have a dialect of its own. Be sure to look at the documentation of the database you are using.
A note: SQL in the big data world
When the NoSQL revolution began, a lot of developers were beating the drums on
Death to SQL. But SQL came back with even more force in the big data world. Here are some examples of the SQL dialects that exist with many big data systems.
- Apache Hive takes in a specific SQL dialect and converts it to map-reduce.
- Spark SQL, for use in Spark Big Data processing.
- CQL, for working with Cassandra.
- KSQL, for SQL on stream processing.
And a lot more. I would highly encourage readers to go through this video on SQL in the big data world by Michael Stonebraker:
SQL and its dialects are not just for relational databases. Their use extends far beyond RDBMS.
SQL dialects are beasts of their own. Learning an entire dialect could take years to master. But what we should keep in mind is that syntax basics are all the same. On this front, we should re-think how we learn SQL, and how it is even taught:
Additionally, the Turing completeness of a language should not be taken at face value. Some dialects could even replace traditional programming languages. There are plenty of situations where you might want to write business logic in database systems simply because it is a lot easier to maintain and execute. There are slight performance advantages as well since the data never leaves the database cluster and is executed without any network overhead.
SQL is very powerful. Hopefully, this blog has made you aware of the most famous dialects that exist and why they do. As database systems evolve, so do the complexity of our queries and, in turn, we'll continue to push the boundaries of the SQL language and of syntax itself.