Description:
These two queries are logically equivalent because they implement the same filtering logic, albeit with different expressions.
In the original query, filtering is done using NOT IN along with a series of IS NOT NULL conditions. The query selects rows from the lineitem table where l_shipmode is not in the subquery results and ensures that both l_shipmode and l_commitdate are not NULL.
In the rewritten query, the same logic is achieved using the EXCEPT operator. The first part of the query retrieves rows where l_shipmode IS NOT NULL and l_commitdate IS NOT NULL, while the second part selects rows where l_shipmode is in the subquery results, also combined with IS NOT NULL conditions. The EXCEPT operator removes the rows in the second query from the first, producing the same result as the original query.
The two SQL queries are logically equivalent, but they return different results, indicating the presence of a bug.
How to repeat:
This link provides the exported sql file for the database:
https://github.com/LLuopeiqi/newtpcd/blob/main/mysql/tpcd.sql
--original
SELECT DISTINCT l_shipinstruct, l_extendedprice, l_comment
FROM lineitem
WHERE l_shipmode NOT IN (
SELECT l_commitdate
FROM lineitem
) and l_shipmode is not null
and l_commitdate is not null
return 0 row
--rewritten
SELECT DISTINCT l_shipinstruct, l_extendedprice, l_comment
FROM lineitem
WHERE l_shipmode IS NOT NULL
AND l_commitdate IS NOT NULL
EXCEPT
SELECT DISTINCT l_shipinstruct, l_extendedprice, l_comment
FROM lineitem
WHERE l_shipmode IN (
SELECT l_commitdate
FROM lineitem
)AND l_shipmode IS NOT NULL
AND l_commitdate IS NOT NULL;
return 5905 rows