Bug #104824 Non-deterministic left join is filtering records from the result set
Submitted: 3 Sep 2021 17:56 Modified: 7 Sep 2021 13:16
Reporter: Bob Dusek Email Updates:
Status: Verified Impact on me:
None 
Category:MySQL Server: DML Severity:S4 (Feature request)
Version:5.7.34 OS:Any
Assigned to: CPU Architecture:Any

[3 Sep 2021 17:56] Bob Dusek
Description:
We recently converted our 5.6 database to the latest 5.7 release available in AWS RDS (5.7.34). 

We had one of our queries start to return a random number of rows. 

It turns out that there is a left join in the query (actually multiple left joins) that has a subquery to satisfy part of the "ON" clause.  The subquery has an ORDER BY clause that does not guarantee a deterministic ordering of the records.  For example, the subquery matches 3 records, and two of the records have identical sort criteria. The subquery has a "LIMIT 1" clause so, due to the non-deterministic ORDER BY, the subquery potentially returns a different record each time it is run. 

The non-deterministic behavior of the subquery is our fault, and we are fixing the query to be deterministic. I will note that MySQL version 5.6.51 does not exhibit non-deterministic behavior for this sort of query, and so we did not see any issues with the query until we converted to MySQL 5.7.34.

The problem I'm reporting here is that the left join is removing records from our result set when there is non-deterministic behavior, even though nothing from ​the left join table makes its way into the WHERE clause of the query. 

From my perspective, a left join should never remove rows.  For example, let's look at a simple query joining two tables (see recreate section for definition):

SELECT tableA.id as tableAId, COALESCE(tableB.id, -1) as tableBId
FROM tableA LEFT JOIN tableB ON tableB.tableAId=tableA.id;

That query should always return EVERY record from tableA. Sometimes, the tableBId column will be -1, but you will always see all records from tableA.

Furthermore, here is another example:

SELECT tableA.id as tableAId, COALESCE(tableB.id, -1) as tableBId
FROM tableA LEFT JOIN tableB ON (tableB.tableAId=tableA.id AND 1=0);

That query should always return every record from tableA, and the tableBId column will be -1 for every record.

And, the final example:

SELECT tableA.id as tableAId, COALESCE(tableB.id, -1) as tableBId
FROM tableA LEFT JOIN tableB ON (tableB.tableAId=tableA.id AND 0.5 > rand());

That query should also always return every record from tableA, but randomly you would get -1 for tableBId, even when there is a record for which tableB.tableAId=tableA.id

However, this is where the bug seems to be. That query does not return every record from tableA in MySQL 5.7.34. It also does not return every record from tableA in MySQL 5.6.51.

Se recreate steps submitted below in the form.

How to repeat:
CREATE TABLE tableA (
  id INTEGER NOT NULL PRIMARY KEY AUTO_INCREMENT,
  name VARCHAR(10)
);

CREATE TABLE tableB (
  id INTEGER NOT NULL PRIMARY KEY AUTO_INCREMENT,
  tableAId INTEGER NOT NULL,
  name VARCHAR(10),
  CONSTRAINT tableBFKtableAId FOREIGN KEY (tableAId) REFERENCES tableA (id)
);

INSERT INTO tableA (name)
VALUES ('he'),
       ('she'),
       ('it'),
       ('they');

INSERT INTO tableB (tableAId, name)
VALUES (1, 'hat'),
       (2, 'shoes'),
       (4, 'roof');

# run the following query many times and you will see varying numbers of records
SELECT COALESCE(b.id, -1) AS tableBId,
       a.id               AS tableAId
FROM tableA a
         LEFT JOIN tableB b ON (b.tableAId = a.id AND 0.5 > RAND());
[6 Sep 2021 13:30] MySQL Verification Team
Hi Mr. Dusek,

Thank you for your bug report.

However, we are not able to repeat your behaviour. We ran your test case hundred of times. Filtering of the records depended very much of the random element that you have introduced. Simply if random number was smaller or equal to 0.5, it can not show up in the result set !!!!

However, when we inspected the output, all the rows were correct. Each pair of values has always showed up in the output. That is what is relevant in the stochastic query that you have devised.

Here is the exceprt from our output :

================================================

t2Id	t1Id
1	1
2	2
-1	3
t2Id	t1Id
2	2
-1	3
3	4
t2Id	t1Id
2	2
3	4
t2Id	t1Id
1	1
t2Id	t1Id
-1	3
t2Id	t1Id
1	1
2	2
-1	3
3	4
t2Id	t1Id
1	1
2	2
t2Id	t1Id
1	1
2	2
-1	3
t2Id	t1Id
1	1
2	2
-1	3
3	4
t2Id	t1Id
1	1
-1	3
3	4
t2Id	t1Id
1	1
t2Id	t1Id
2	2
-1	3
3	4
t2Id	t1Id
-1	3
t2Id	t1Id
1	1
2	2
-1	3
t2Id	t1Id
1	1
-1	3
t2Id	t1Id
2	2
3	4
t2Id	t1Id
1	1
2	2
3	4
t2Id	t1Id
-1	3
t2Id	t1Id
2	2
t2Id	t1Id
1	1
2	2
3	4
t2Id	t1Id
-1	3
t2Id	t1Id
1	1
2	2
3	4
t2Id	t1Id
1	1
3	4
t2Id	t1Id
2	2
t2Id	t1Id
3	4
t2Id	t1Id
1	1
2	2
-1	3
t2Id	t1Id
2	2
-1	3

================================================

We can not repeat anything wrong in the output.
[6 Sep 2021 17:14] Bob Dusek
The excerpt you posted from your output demonstrates the problem.

Why doesn't every result set have all four records from table 1? 

I you run this query 100 times: 

SELECT tableA.id as tableAId, COALESCE(tableB.id, -1) as tableBId
FROM tableA LEFT JOIN tableB ON (tableB.tableAId=tableA.id AND 1=0);

You will get this result set 100 times:

1,-1
2,-1
3,-1
4,-1

And, if you run this query 100 times: 

SELECT tableA.id as tableAId, COALESCE(tableB.id, -1) as tableBId
FROM tableA LEFT JOIN tableB ON (tableB.tableAId=tableA.id AND 0=0);

You will get this result 100 times:

1,1
2,2
3,-1
4,3

Why should this query 100 times:

SELECT tableA.id as tableAId, COALESCE(tableB.id, -1) as tableBId
FROM tableA LEFT JOIN tableB ON (tableB.tableAId=tableA.id AND 0.5 > RAND());

Why wouldn't id's 1, 2, 3 and 4 show up in all 100 results? 

If it doesn't then the concept of a left join is completely broken.
[7 Sep 2021 11:51] MySQL Verification Team
Hi Mr. Dusek,

The answer is in the basic definitions of the SQL language. For this particular case, it is described in the standard in the chapters 4 and 7.

ON clause can take only expressions that define full or semi-join. If there is an expression that does not qualify as join definition expression, then it is treated as a filtering condition. That is to say, it is the same as if you put it in the WHERE clause ........

Not a bug.
[7 Sep 2021 12:27] Bob Dusek
So, why does this query not return an empty set? 

SELECT tableA.id as tableAId, COALESCE(tableB.id, -1) as tableBId
FROM tableA LEFT JOIN tableB ON (tableB.tableAId=tableA.id AND 1=0);

That query returns this EVERY time...

1,-1
2,-1
3,-1
4,-1
[7 Sep 2021 12:42] MySQL Verification Team
Hi Mr. Dusek,

Regarding your last example, you are correct.

That is a bug, but a very small one, since you can easily correct it by setting 1=0 in the WHERE clause.

Verified as a low priority bug.
[7 Sep 2021 13:07] Bob Dusek
The bug seems to be that MySQL is arbitrarily determining how to evaluate join expressions, making it difficult to predict how they will behave in practice.  Is there documentation available regarding how MySQL determines whether part of a join expression is evaluated with the join or moved to the WHERE clause?
[7 Sep 2021 13:16] Bob Dusek
And, unpredictable determination of join logic is NOT a minor bug. It can impact lots and lots of queries for lots and lots of people. The only time such a condition seems to get moved to a WHERE clause (from my experimentation) is when there is non-determinism.  

We have a sub query that became non-deterministic when we converted to 5.7 even though it was stable in 5.6. This subquery started filtering our results from our result set and caused some pretty serious trouble. 

I have not been able to reproduce the issue on a new data set, but we have 5 different servers that exhibit the behavior with a large data set.

We have resolved it by stabilizing the subquery, but if you "fix" something and suddenly move all such join logic to the WHERE clause the implications could be significant for many people.