Description:
Adding a WHERE condition to a (complex) query containing LEFT & RIGHT
outer joins may return incorrect result sets.
The occurrence of the error partly depends on the storage engine being used, but
I assume this is more related to different execution plans being produced for
the different engines.
Repeating the same query in the attached Cluster database results in the expected results being returned for this particular query:
w/ WHERE table3 . `pk` IS NULL
+-----+------+------+------+------+------+------+------+------+
| pk1 | pk2 | pk3 | pk4 | pk5 | pk6 | pk7 | pk8 | pk9 |
+-----+------+------+------+------+------+------+------+------+
| 4 | NULL | NULL | NULL | NULL | NULL | NULL | NULL | NULL |
| 9 | NULL | NULL | NULL | NULL | NULL | NULL | NULL | NULL |
| 6 | NULL | NULL | NULL | NULL | NULL | NULL | NULL | NULL |
+-----+------+------+------+------+------+------+------+------+
3 rows in set (0.01 sec)
w/ WHERE table3 . `pk` IS NOT NULL
+-----+------+------+------+------+------+------+------+------+
| pk1 | pk2 | pk3 | pk4 | pk5 | pk6 | pk7 | pk8 | pk9 |
+-----+------+------+------+------+------+------+------+------+
| 7 | 7 | 2 | 7 | NULL | NULL | NULL | NULL | 16 |
| 3 | 3 | 13 | 3 | NULL | NULL | NULL | NULL | 15 |
| 10 | 10 | 10 | 10 | NULL | NULL | NULL | NULL | 18 |
| 5 | 5 | 4 | 5 | NULL | 10 | 10 | 10 | 10 |
| 11 | 11 | 6 | 11 | NULL | NULL | NULL | NULL | 20 |
| 1 | 1 | 11 | 1 | NULL | NULL | NULL | NULL | 14 |
| 2 | 2 | 12 | 2 | NULL | NULL | NULL | NULL | 11 |
| 8 | 8 | 7 | 8 | NULL | NULL | NULL | NULL | 19 |
+-----+------+------+------+------+------+------+------+------+
8 rows in set (0.01 sec)
However there are other queries running correctly on MyISAM and incorrectly on Cluster, see further below. The makes it likely to not be a SE related bug but rather incorrect execution plans generated by the optimizer.
==== Another example ===
SELECT table1.pk pk1, table1.int_key i, table2.pk pk2, table3.pk pk3,
table4.pk pk4, table5.pk pk5,
table6.pk pk6, table7.pk pk7, table8.pk pk8, table9.pk
pk9, table10.pk pk10, table11.pk pk11
FROM C AS table1 LEFT JOIN C AS table2 ON table1 . `int_key` = table2 . `int_key` LEFT JOIN C AS table3 ON table1 . `pk` = table3 . `pk` LEFT JOIN BB AS table4 LEFT JOIN CC AS table5 ON table4 . `int_key` = table5 . `pk` RIGHT JOIN BB AS table6 LEFT OUTER JOIN C AS table7 ON table6 . `pk` = table7 . `int_key` LEFT JOIN CC AS table8 LEFT JOIN CC AS table9 LEFT JOIN CC AS table10 ON table9 . `int_key` = table10 . `int_key` ON table8 . `int_key` = table10 . `int_key` LEFT JOIN A AS table11 ON table9 . `pk` = table11 . `pk` ON table6 . `int_key` = table10 . `pk` ON table4 . `int_key` = table10 . `pk` ON table3 . `pk` = table8 . `int_key`;
Then add the WHERE condition: 'WHERE table3 . `pk` > table1 . `int_key`'
Analyzing the results for this query is left as a user exercise.
How to repeat:
1) Load the attached mysqldumps myisam.sql and ndb.sql into
databases spj_myisam and spj_ndb;
2)
use spj_myisam;
SELECT
table1.pk pk1, table2.pk pk2, table3.pk pk3,
table4.pk pk4, table5.pk pk5, table6.pk pk6,
table7.pk pk7, table8.pk pk8, table9.pk pk9
FROM
C AS table1
LEFT JOIN
( C AS table2
LEFT JOIN
( D AS table3
RIGHT JOIN
( C AS table4
LEFT JOIN
A AS table5
ON table4 . `pk` = table5 . `int_key`
) ON table3 . `int_key` = table4 . `int_key`
) ON table2 . `int_key` = table3 . `int_key`
LEFT JOIN
( BB AS table6
LEFT JOIN
( CC AS table7
RIGHT JOIN
D AS table8
ON table7 . `pk` = table8 . `pk`
) ON table6 . `pk` = table7 . `pk`
) ON table3 . `int_key` = table6 . `int_key`
RIGHT JOIN
( CC AS table9
) ON table3 . `int_key` = table9 . `int_key`
) ON table1 . `int_key` = table3 . `int_key`;
Will return the resultset:
+-----+------+------+------+------+------+------+------+------+
| pk1 | pk2 | pk3 | pk4 | pk5 | pk6 | pk7 | pk8 | pk9 |
+-----+------+------+------+------+------+------+------+------+
| 1 | 1 | 11 | 1 | NULL | NULL | NULL | NULL | 14 |
| 2 | 2 | 12 | 2 | NULL | NULL | NULL | NULL | 11 |
| 3 | 3 | 13 | 3 | NULL | NULL | NULL | NULL | 15 |
| 4 | NULL | NULL | NULL | NULL | NULL | NULL | NULL | NULL |
| 5 | 5 | 4 | 5 | NULL | 10 | 10 | 10 | 10 |
| 6 | NULL | NULL | NULL | NULL | NULL | NULL | NULL | NULL |
| 7 | 7 | 2 | 7 | NULL | NULL | NULL | NULL | 16 |
| 8 | 8 | 7 | 8 | NULL | NULL | NULL | NULL | 19 |
| 9 | NULL | NULL | NULL | NULL | NULL | NULL | NULL | NULL |
| 10 | 10 | 10 | 10 | NULL | NULL | NULL | NULL | 18 |
| 11 | 11 | 6 | 11 | NULL | NULL | NULL | NULL | 20 |
+-----+------+------+------+------+------+------+------+------+
3)
Then add the WHERE-condition: (Complete query further below)
'WHERE table3.pk IS NULL'
to select only the 3 NULL rows from t1:
+-----+------+------+------+------+------+------+------+------+
| pk1 | pk2 | pk3 | pk4 | pk5 | pk6 | pk7 | pk8 | pk9 |
+-----+------+------+------+------+------+------+------+------+
| 1 | NULL | NULL | NULL | NULL | NULL | NULL | NULL | NULL |
| 2 | NULL | NULL | NULL | NULL | NULL | NULL | NULL | NULL |
| 3 | NULL | NULL | NULL | NULL | NULL | NULL | NULL | NULL |
| 4 | NULL | NULL | NULL | NULL | NULL | NULL | NULL | NULL |
| 6 | NULL | NULL | NULL | NULL | NULL | NULL | NULL | NULL |
| 7 | NULL | NULL | NULL | NULL | NULL | NULL | NULL | NULL |
| 8 | NULL | NULL | NULL | NULL | NULL | NULL | NULL | NULL |
| 9 | NULL | NULL | NULL | NULL | NULL | NULL | NULL | NULL |
| 10 | NULL | NULL | NULL | NULL | NULL | NULL | NULL | NULL |
| 11 | NULL | NULL | NULL | NULL | NULL | NULL | NULL | NULL |
+-----+------+------+------+------+------+------+------+------+
10 rows in set (0.00 sec)
Which certainly is not as expected, Actually the entire result set
has changed.
You may also try with 'IS NOT NULL':
+-----+------+------+------+------+------+------+------+------+
| pk1 | pk2 | pk3 | pk4 | pk5 | pk6 | pk7 | pk8 | pk9 |
+-----+------+------+------+------+------+------+------+------+
| 5 | 5 | 4 | 5 | NULL | 10 | 10 | 10 | 10 |
+-----+------+------+------+------+------+------+------+------+
Which only return as single row instead of the 8 rows expected:
============= Complete query ===========
SELECT
table1.pk pk1, table2.pk pk2, table3.pk pk3,
table4.pk pk4, table5.pk pk5, table6.pk pk6,
table7.pk pk7, table8.pk pk8, table9.pk pk9
FROM
C AS table1
LEFT JOIN
( C AS table2
LEFT JOIN
( D AS table3
RIGHT JOIN
( C AS table4
LEFT JOIN
A AS table5
ON table4 . `pk` = table5 . `int_key`
) ON table3 . `int_key` = table4 . `int_key`
) ON table2 . `int_key` = table3 . `int_key`
LEFT JOIN
( BB AS table6
LEFT JOIN
( CC AS table7
RIGHT JOIN
D AS table8
ON table7 . `pk` = table8 . `pk`
) ON table6 . `pk` = table7 . `pk`
) ON table3 . `int_key` = table6 . `int_key`
RIGHT JOIN
( CC AS table9
) ON table3 . `int_key` = table9 . `int_key`
) ON table1 . `int_key` = table3 . `int_key`
WHERE table3 . `pk` IS NULL;