Bug #116334 Incorrect result with LEFT JOIN and small join_buffer_size
Submitted: 10 Oct 16:10 Modified: 6 Nov 22:02
Reporter: Alok Pathak Email Updates:
Status: Closed Impact on me:
None 
Category:MySQL Server: Optimizer Severity:S2 (Serious)
Version:8.0.39 OS:Any
Assigned to: CPU Architecture:Any

[10 Oct 16:10] Alok Pathak
Description:
This query is returning empty result even if tbl table has two matching rows.

SELECT a.id, COUNT(*) AS n
FROM tbl AS a
LEFT JOIN fp_tbl AS f ON f.id = a.id
LEFT JOIN fp_pol AS p ON p.lead_id   = f.id
WHERE a.id IN (1,2) AND p.id IS NULL
GROUP BY a.id;
Empty set (0.39 sec)

mysql [localhost:8039] {msandbox} (test) > select * from tbl where id in ('1','2');
+----+------------+
| id | user_id    |
+----+------------+
|  1 |  583532949 |
|  2 | 1342458479 |
+----+------------+
2 rows in set (0.00 sec)

Expected results:
Query should return two rows.

How to repeat:
- Setup MySQL 8.0.39 version with default configuration.

mysql [localhost:8039] {msandbox} (test) > select @@version,@@version_comment;
+-----------+------------------------------+
| @@version | @@version_comment            |
+-----------+------------------------------+
| 8.0.39    | MySQL Community Server - GPL |
+-----------+------------------------------+
1 row in set (0.00 sec)

- Create three tables

use test;
CREATE TABLE `tbl` (
  `id` int unsigned NOT NULL AUTO_INCREMENT,
  `user_id` int DEFAULT NULL,
  PRIMARY KEY (`id`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_0900_ai_ci;

CREATE TABLE `fp_tbl` (
  `id` int unsigned NOT NULL AUTO_INCREMENT,
  `user_id` int DEFAULT NULL,
  PRIMARY KEY (`id`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_0900_ai_ci;

CREATE TABLE `fp_pol` (
  `id` int unsigned NOT NULL AUTO_INCREMENT,
  `lead_id` int unsigned DEFAULT NULL,
  PRIMARY KEY (`id`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_0900_ai_ci;

- Insert rows into all three tables, please make sure to insert large number of rows in fp_pol table.

./mysql_random_data_load --host=127.0.0.1 --port=8039 --user=msandbox --password=msandbox test tbl 10
./mysql_random_data_load --host=127.0.0.1 --port=8039 --user=msandbox --password=msandbox test fp_tbl 10
./mysql_random_data_load --host=127.0.0.1 --port=8039 --user=msandbox --password=msandbox test fp_pol 1000000

- Delete rows with id 1,2 from fp_tbl, fp_pol tables.
delete from fp_tbl WHERE id in ('1','2');
delete from fp_pol WHERE id in ('1','2');

- Verify that row id 1,2 are present in tbl only.
mysql [localhost:8039] {msandbox} (test) > select * from tbl where id in ('1','2');
+----+------------+
| id | user_id    |
+----+------------+
|  1 |  583532949 |
|  2 | 1342458479 |
+----+------------+
2 rows in set (0.00 sec)

mysql [localhost:8039] {msandbox} (test) > select * from fp_tbl where id in ('1','2');
Empty set (0.00 sec)

mysql [localhost:8039] {msandbox} (test) > select * from fp_pol where id in ('1','2');
Empty set (0.00 sec)

- Run the query and confirm it doesn't return any matching rows.
SELECT a.id, COUNT(*) AS n
FROM tbl AS a
LEFT JOIN fp_tbl AS f ON f.id = a.id
LEFT JOIN fp_pol AS p ON p.lead_id   = f.id
WHERE a.id IN (1,2) AND p.id IS NULL
GROUP BY a.id;
Empty set (0.39 sec)

- Run the explain and confirm that query uses join buffer for fp_pol table as there is no index on lead_id column.

mysql [localhost:8039] {msandbox} (test) > explain SELECT a.id, COUNT(*) AS n FROM tbl AS a LEFT JOIN fp_tbl AS f ON f.id = a.id LEFT JOIN fp_pol AS p ON p.lead_id   = f.id WHERE a.
id IN (1,2) AND p.id IS NULL GROUP BY a.id;
+----+-------------+-------+------------+--------+---------------+---------+---------+-----------+--------+----------+--------------------------------------------------------+
| id | select_type | table | partitions | type   | possible_keys | key     | key_len | ref       | rows   | filtered | Extra                                                  |
+----+-------------+-------+------------+--------+---------------+---------+---------+-----------+--------+----------+--------------------------------------------------------+
|  1 | SIMPLE      | a     | NULL       | range  | PRIMARY       | PRIMARY | 4       | NULL      |      2 |   100.00 | Using where; Using index; Using temporary              |
|  1 | SIMPLE      | f     | NULL       | eq_ref | PRIMARY       | PRIMARY | 4       | test.a.id |      1 |   100.00 | Using index                                            |
|  1 | SIMPLE      | p     | NULL       | ALL    | NULL          | NULL    | NULL    | NULL      | 998566 |    10.00 | Using where; Not exists; Using join buffer (hash join) |
+----+-------------+-------+------------+--------+---------------+---------+---------+-----------+--------+----------+--------------------------------------------------------+
3 rows in set, 1 warning (0.01 sec)

Suggested fix:
Increasing the join_buffer_size to 64KB helps query to return correct result.

set session join_buffer_size=64*1024*1024;

SELECT a.id, COUNT(*) AS n
FROM tbl AS a
LEFT JOIN fp_tbl AS f ON f.id = a.id
LEFT JOIN fp_pol AS p ON p.lead_id   = f.id
WHERE a.id IN (1,2) AND p.id IS NULL
GROUP BY a.id;
+----+---+
| id | n |
+----+---+
|  1 | 1 |
|  2 | 1 |
+----+---+
2 rows in set (0.34 sec)
[11 Oct 7:51] MySQL Verification Team
Hello Alok,

Thank you for the report and test case.
Verified as described.

regards,
Umesh
[11 Oct 7:51] MySQL Verification Team
8.0.39 test results

Attachment: 116334_8.0.39.results.txt (text/plain), 5.73 KiB.

[6 Nov 22:02] Jon Stephens
Documented fix as follows in the MySQL 9.2.0 changelog:

    Incorrect results were returned by some queries that used hash
    antijoins when the hash table did not fit in the join buffer and
    spilled to disk. (The query triggering the issue used LEFT JOIN,
    but this was transformed internally from a left outer join to an
    antijoin.)

    The problem was that some rows in the probe table were skipped
    when writing the probe rows to chunk files, the skipped rows
    being those that had NULL in part of the join key. Such rows can
    be skipped for inner joins and semijoins, as they are known to
    have no match in the build table, but for outer joins and
    antijoins, rows in the probe table which have no matching row in
    the build table should be part of the join result, so they must
    be included in the chunk files.

    There was already logic for preserving these rows in the chunk
    files for outer joins. This fix extends that logic so that it
    also applies to antijoins.

Closed.
[22 Nov 13:33] Jon Stephens
Also fixed in MySQL 8.0.41 and 8.4.4.

Changelog entry as in previous comment.

Closed.