Description:
When executing the following two logically equivalent queries in the same database environment, the results are inconsistent:
SQL-SOURCE uses COUNT(*) aggregation and returns 0.
SQL-TARGET uses SUM(CASE WHEN EXISTS(...) THEN 1 ELSE 0 END) and returns 5.
Both queries share the same main logic:
They involve users, orders, and posts tables with LEFT JOIN and INNER JOIN.
Both contain an EXISTS subquery checking conditions on orders and posts.
SQL-TARGET wraps the EXISTS subquery in a CASE WHEN statement, while SQL-SOURCE directly uses COUNT(*).
Observed Behavior:
SQL-SOURCE’s COUNT(*) does not count any matching rows, resulting in 0.
SQL-TARGET’s SUM(CASE WHEN EXISTS...) counts 5 rows where the EXISTS condition evaluates to true, resulting in 5.
Logically, both queries should return the same row count. The discrepancy suggests that the query optimizer may handle EXISTS or JOIN conditions differently, leading to inconsistent results.
How to repeat:
```SQL
-- SCHEMA
CREATE TABLE users (
id INT,
username VARCHAR(100),
email VARCHAR(255),
age INT,
status VARCHAR(20),
created_at TIMESTAMP NULL,
score DOUBLE
);
CREATE TABLE posts (
id INT,
user_id INT,
title VARCHAR(255),
content VARCHAR(1000),
views INT,
likes INT,
created_at TIMESTAMP NULL,
rating DOUBLE
);
CREATE TABLE orders (
id INT,
user_id INT,
amount DOUBLE,
status VARCHAR(20),
created_at TIMESTAMP NULL
);
INSERT INTO users VALUES
(1, 'alice', 'alice@test.com', 20, 'active', '2022-01-01 10:00:00', 88.5),
(2, 'bob', 'bob@test.com', 30, 'active', '2022-01-02 11:00:00', 92.3),
(3, 'carol', NULL, NULL, 'banned','2022-01-03 12:00:00', NULL),
(4, 'dave', 'dave@test.com', 45, 'active', '2022-01-04 13:00:00', 65.2),
(5, NULL, 'null@test.com', 18, 'inactive','2022-01-05 14:00:00', 70.0);
INSERT INTO posts VALUES
(1, 1, 'Hello World', 'First post', 100, 10, '2022-01-10 10:00:00', 4.5),
(2, 1, 'Another Post', NULL, 150, 20, '2022-01-11 11:00:00', 3.0),
(3, 2, 'Bob Post', 'Content', NULL, 5, '2022-01-12 12:00:00', NULL),
(4, 3, NULL, 'Empty', 50, 2, '2022-01-13 13:00:00', 5.0),
(5, 4, 'Last Post', 'Last', 300, 30,'2022-01-14 14:00:00', 4.9);
INSERT INTO orders VALUES
(1, 1, 100.00, 'paid', '2022-02-01 09:00:00'),
(2, 1, 200.50, 'shipped', '2022-02-02 10:00:00'),
(3, 2, NULL, 'failed', '2022-02-03 11:00:00'),
(4, 3, 50.00, 'paid', '2022-02-04 12:00:00'),
(5, 5, 999.99, 'paid', '2022-02-05 13:00:00');
-- TRIGGER SQL:
SELECT COUNT(*)
FROM (
SELECT
ref_0.email AS c0
FROM
users AS ref_0
WHERE
TRUE
) AS subq_0
LEFT JOIN (
SELECT
ref_1.score AS c0,
ref_1.age AS c1
FROM
users AS ref_1
WHERE
TRUE
) AS subq_1
INNER JOIN orders AS ref_2
ON EXISTS (
SELECT 1
FROM users AS ref_5
WHERE TRUE
)
ON subq_0.c0 = ref_2.status
WHERE EXISTS (
SELECT 1
FROM orders AS ref_6
RIGHT JOIN orders AS ref_7
LEFT JOIN posts AS ref_8
ON ('3oq' < '6668ey')
LEFT JOIN orders AS ref_9
ON (23.69 < (SELECT STDDEV_SAMP(id) FROM orders))
ON (ref_6.status IS NULL)
WHERE TRUE
);
-- result: {0}
SELECT SUM(
CASE
WHEN EXISTS (
SELECT 1
FROM orders AS ref_6
RIGHT JOIN orders AS ref_7
LEFT JOIN posts AS ref_8
ON ('3oq' < '6668ey')
LEFT JOIN orders AS ref_9
ON (23.69 < (SELECT STDDEV_SAMP(id) FROM orders))
ON (ref_6.status IS NULL)
WHERE TRUE
)
THEN 1
ELSE 0
END + 0
)
FROM (
SELECT ref_0.email AS c0
FROM users AS ref_0
WHERE TRUE
) AS subq_0
LEFT JOIN (
SELECT ref_1.score AS c0,
ref_1.age AS c1
FROM users AS ref_1
WHERE TRUE
) AS subq_1
INNER JOIN orders AS ref_2
ON EXISTS (
SELECT 1
FROM users AS ref_5
WHERE TRUE
)
ON subq_0.c0 = ref_2.status;
-- result: {5}
```
Description: When executing the following two logically equivalent queries in the same database environment, the results are inconsistent: SQL-SOURCE uses COUNT(*) aggregation and returns 0. SQL-TARGET uses SUM(CASE WHEN EXISTS(...) THEN 1 ELSE 0 END) and returns 5. Both queries share the same main logic: They involve users, orders, and posts tables with LEFT JOIN and INNER JOIN. Both contain an EXISTS subquery checking conditions on orders and posts. SQL-TARGET wraps the EXISTS subquery in a CASE WHEN statement, while SQL-SOURCE directly uses COUNT(*). Observed Behavior: SQL-SOURCE’s COUNT(*) does not count any matching rows, resulting in 0. SQL-TARGET’s SUM(CASE WHEN EXISTS...) counts 5 rows where the EXISTS condition evaluates to true, resulting in 5. Logically, both queries should return the same row count. The discrepancy suggests that the query optimizer may handle EXISTS or JOIN conditions differently, leading to inconsistent results. How to repeat: ```SQL -- SCHEMA CREATE TABLE users ( id INT, username VARCHAR(100), email VARCHAR(255), age INT, status VARCHAR(20), created_at TIMESTAMP NULL, score DOUBLE ); CREATE TABLE posts ( id INT, user_id INT, title VARCHAR(255), content VARCHAR(1000), views INT, likes INT, created_at TIMESTAMP NULL, rating DOUBLE ); CREATE TABLE orders ( id INT, user_id INT, amount DOUBLE, status VARCHAR(20), created_at TIMESTAMP NULL ); INSERT INTO users VALUES (1, 'alice', 'alice@test.com', 20, 'active', '2022-01-01 10:00:00', 88.5), (2, 'bob', 'bob@test.com', 30, 'active', '2022-01-02 11:00:00', 92.3), (3, 'carol', NULL, NULL, 'banned','2022-01-03 12:00:00', NULL), (4, 'dave', 'dave@test.com', 45, 'active', '2022-01-04 13:00:00', 65.2), (5, NULL, 'null@test.com', 18, 'inactive','2022-01-05 14:00:00', 70.0); INSERT INTO posts VALUES (1, 1, 'Hello World', 'First post', 100, 10, '2022-01-10 10:00:00', 4.5), (2, 1, 'Another Post', NULL, 150, 20, '2022-01-11 11:00:00', 3.0), (3, 2, 'Bob Post', 'Content', NULL, 5, '2022-01-12 12:00:00', NULL), (4, 3, NULL, 'Empty', 50, 2, '2022-01-13 13:00:00', 5.0), (5, 4, 'Last Post', 'Last', 300, 30,'2022-01-14 14:00:00', 4.9); INSERT INTO orders VALUES (1, 1, 100.00, 'paid', '2022-02-01 09:00:00'), (2, 1, 200.50, 'shipped', '2022-02-02 10:00:00'), (3, 2, NULL, 'failed', '2022-02-03 11:00:00'), (4, 3, 50.00, 'paid', '2022-02-04 12:00:00'), (5, 5, 999.99, 'paid', '2022-02-05 13:00:00'); -- TRIGGER SQL: SELECT COUNT(*) FROM ( SELECT ref_0.email AS c0 FROM users AS ref_0 WHERE TRUE ) AS subq_0 LEFT JOIN ( SELECT ref_1.score AS c0, ref_1.age AS c1 FROM users AS ref_1 WHERE TRUE ) AS subq_1 INNER JOIN orders AS ref_2 ON EXISTS ( SELECT 1 FROM users AS ref_5 WHERE TRUE ) ON subq_0.c0 = ref_2.status WHERE EXISTS ( SELECT 1 FROM orders AS ref_6 RIGHT JOIN orders AS ref_7 LEFT JOIN posts AS ref_8 ON ('3oq' < '6668ey') LEFT JOIN orders AS ref_9 ON (23.69 < (SELECT STDDEV_SAMP(id) FROM orders)) ON (ref_6.status IS NULL) WHERE TRUE ); -- result: {0} SELECT SUM( CASE WHEN EXISTS ( SELECT 1 FROM orders AS ref_6 RIGHT JOIN orders AS ref_7 LEFT JOIN posts AS ref_8 ON ('3oq' < '6668ey') LEFT JOIN orders AS ref_9 ON (23.69 < (SELECT STDDEV_SAMP(id) FROM orders)) ON (ref_6.status IS NULL) WHERE TRUE ) THEN 1 ELSE 0 END + 0 ) FROM ( SELECT ref_0.email AS c0 FROM users AS ref_0 WHERE TRUE ) AS subq_0 LEFT JOIN ( SELECT ref_1.score AS c0, ref_1.age AS c1 FROM users AS ref_1 WHERE TRUE ) AS subq_1 INNER JOIN orders AS ref_2 ON EXISTS ( SELECT 1 FROM users AS ref_5 WHERE TRUE ) ON subq_0.c0 = ref_2.status; -- result: {5} ```