Have you ever found yourself pulling out your hair trying to debug a SQL query that “should be working”?
These errors can be difficult to debug because they often involve a specific nuance in how a SQL command works. In this article, I’ll cover 7 of the most common mistakes that I’ve experienced and seen when writing SQL queries, and how to fix them.
If you're looking for a modern SQL editor to run your SQL queries, try Arctype today for free.
1. Using NOT IN with NULL values
Let's create two example tables to track the color of different products:
How would you write a query to retrieve every color that is not used by a current product?
At first glance,
NOT IN seems to be a great solution, but let's see what happens:
SELECT col.color_value FROM Colors AS col WHERE col.color_id NOT IN (SELECT prod.color_id FROM Products AS prod);
The query ends up returning an empty set instead of
white like we would expect.
This happens because SQL uses three-value logic, which means logical expressions can be true, false, or unknown.
In the example above,
color_id NOT IN (c001, c003, c004, NULL) can be simplified to:
NOT (color_id = c001 OR color_id = c003 OR color_id = c004 OR color_id = NULL).
In order for this to be true, every comparison inside the parentheses would have to be false, but
color_id = NULL always returns
unknown. Therefore, this expression is never evaluated to
TRUE, and an empty set is returned.
We can fix this by using
EXISTS instead of
EXISTS uses two-valued logic so there is no unknown value:
SELECT col.colour_value FROM Colours AS col WHERE NOT EXISTS (SELECT prod.product_colour FROM Products AS prod WHERE col.color_id = prod.color_id);
2. Incorrect order when writing queries
This is one of the more popular errors I see, especially among SQL beginners.
SQL has a predefined keyword order to properly execute queries. Can you spot the error in this SQL query?
SELECT empName FROM employees WHERE empCategory='DevOps' ORDER BY empName GROUP BY branchCode HAVING count(*) = 1;
To fix the query, we just have to move the
order by command to the end:
SELECT empName FROM employees WHERE empCategory='DevOps' GROUP BY branchCode HAVING count(*) = 1 ORDER BY empName;
This is often one of the most common mistakes after missing quotes, brackets, commas, and semicolons, but it's also one of the easiest to catch and fix.
3. Using UNION when UNION ALL should have been used
UNION ALL is common enough that I've seen it come up in several SQL interviews.
UNION ALL and
UNION both concatenate results from multiple SQL queries, but
UNION eliminates duplicate results when concatenating results.
Let's say we had a table of all basketball players at a high school,
basketball_players, and also a separate table for students in band,
band_students. Which command would be use to find every student who played basketball OR was in band?
UNION would be the right choice here, because it would remove duplicates for students who played basketball and were in band.
4. Using BETWEEN for exclusive ranges
BETWEEN operator is helpful for selecting data within a specified range.
What dates would this return?
SELECT * FROM products WHERE manuf_date BETWEEN ‘2020-04-01’ AND ‘2020-05-01’
If you said every day in April + the first day of May, you would be correct!
To select data truly between two dates, the most clear method is explicitly declaring the range using equality comparisons:
SELECT * FROM products WHERE manuf_date => ‘2020-04-01’ AND manuf_date < ‘2020-05-01’
5. Implicit field definition type conversion at runtime
Do you see anything wrong with this query?
SELECT * FROM myAccount acc WHERE acc.pin = 123654789286 AND acc.isPending IS null;
It depends on the data type of
pin. If pin is of type
varchar, this comparison would force a conversion for every value in the pin column. This would not give a runtime error, but it could severely impact the query performance.
SQL does an implicit conversion when comparing two values of different data types. The conversion of these values adds additional overhead and forces a full table scan.
To remove the implicit conversion, we can change the comparison to have matching data types:
acc.pin = '123654789286'
6. Misusing loops in SQL / Cursors
This is another situation that won't throw an explicit error, but can have a negative impact on query performance.
Loops are one of the go to tools in a developer's tool belt, so it's fair to want to use this same technique when writing SQL queries.
Let’s see an example of using a cursor to update the prices of items put on sale:
DECLARE @prodID INT DECLARE @prodPrice INT DECLARE updPrices CURSOR LOCAL STATIC READ_ONLY FOR SELECT prodID, prodPrice FROM onSalePrices OPEN updPrices FETCH NEXT FROM updPrices INTO @prodID, @prodPrice WHILE @@FETCH_STATUS = 0 BEGIN UPDATE products SET price = @prodPrice WHERE id = @prodID FETCH NEXT FROM updPrices INTO @prodID, @prodPrice END CLOSE updPrices DEALLOCATE updPrices
This solution works, but it is not very efficient and should generally be avoided. The reason for that is SQL is designed as a ‘set-based’ architecture. By following that set-based approach instead of cursors, you could accomplish the same task with a simpler and more efficient query.
Making mistakes is the first step to learning something new. After 5 years of software development, I still find myself scratching my head on certain SQL errors.
Having a good grasp of common "gotchas" with SQL will make it easier to avoid such possible issues. If you want a SQL editor that makes working with databases even easier, try out Arctype today: