Bug #93491 Optimizer does not correctly consider attached conditions in planning
Submitted: 5 Dec 2018 17:40 Modified: 6 Dec 2018 5:21
Reporter: Morgan Tocker Email Updates:
Status: Verified Impact on me:
None 
Category:MySQL Server: Optimizer Severity:S3 (Non-critical)
Version:8.0.13, 5.7.24, 5.6.42 OS:Any
Assigned to: CPU Architecture:Any

[5 Dec 2018 17:40] Morgan Tocker
Description:
Verified on 5.6.42, 5.7.24, 8.0.13

An inner join on a.b_id = b.id semantically means that neither side can be NULL.  The optimizer seems to understand this during execution:

EXPLAIN FORMAT=JSON shows "attached_condition": "(`test`.`a`.`b_id` is not null)")

But if I rewrite the query to include this condition I get a better execution plan (switching from an index scan to a range):

mysql [localhost:8013] {msandbox} (test) > EXPLAIN SELECT COUNT(*) FROM a INNER JOIN b ON a.b_id = b.id;
+----+-------------+-------+------------+--------+---------------+---------+---------+-------------+------+----------+--------------------------+
| id | select_type | table | partitions | type   | possible_keys | key     | key_len | ref         | rows | filtered | Extra                    |
+----+-------------+-------+------------+--------+---------------+---------+---------+-------------+------+----------+--------------------------+
|  1 | SIMPLE      | a     | NULL       | index  | b_id          | b_id    | 5       | NULL        |  258 |   100.00 | Using where; Using index |
|  1 | SIMPLE      | b     | NULL       | eq_ref | PRIMARY       | PRIMARY | 4       | test.a.b_id |    1 |   100.00 | Using index              |
+----+-------------+-------+------------+--------+---------------+---------+---------+-------------+------+----------+--------------------------+
2 rows in set, 1 warning (0.00 sec)

mysql [localhost:8013] {msandbox} (test) > EXPLAIN SELECT COUNT(*) FROM a INNER JOIN b ON a.b_id = b.id WHERE b_id IS NOT NULL;
+----+-------------+-------+------------+--------+---------------+---------+---------+-------------+------+----------+--------------------------+
| id | select_type | table | partitions | type   | possible_keys | key     | key_len | ref         | rows | filtered | Extra                    |
+----+-------------+-------+------------+--------+---------------+---------+---------+-------------+------+----------+--------------------------+
|  1 | SIMPLE      | a     | NULL       | range  | b_id          | b_id    | 5       | NULL        |    2 |   100.00 | Using where; Using index |
|  1 | SIMPLE      | b     | NULL       | eq_ref | PRIMARY       | PRIMARY | 4       | test.a.b_id |    1 |   100.00 | Using index              |
+----+-------------+-------+------------+--------+---------------+---------+---------+-------------+------+----------+--------------------------+
2 rows in set, 1 warning (0.00 sec)

How to repeat:
DROP TABLE IF EXISTS a,b;

CREATE TABLE a (
 id INT NOT NULL PRIMARY KEY auto_increment,
 cola BLOB,
 b_id INT,
 INDEX (b_id)
);

CREATE TABLE b (
 id INT NOT NULL PRIMARY KEY auto_increment,
 cola BLOB 
);

INSERT INTO a VALUES (NULL, RANDOM_BYTES(1024), NULL);
INSERT INTO a SELECT NULL, RANDOM_BYTES(1024), NULL FROM a;
INSERT INTO a SELECT NULL, RANDOM_BYTES(1024), NULL FROM a;
INSERT INTO a SELECT NULL, RANDOM_BYTES(1024), NULL FROM a;
INSERT INTO a SELECT NULL, RANDOM_BYTES(1024), NULL FROM a;
INSERT INTO a SELECT NULL, RANDOM_BYTES(1024), NULL FROM a;
INSERT INTO a SELECT NULL, RANDOM_BYTES(1024), NULL FROM a;
INSERT INTO a SELECT NULL, RANDOM_BYTES(1024), NULL FROM a;
INSERT INTO a SELECT NULL, RANDOM_BYTES(1024), NULL FROM a;

INSERT INTO b SELECT NULL, RANDOM_BYTES(1024) FROM a;

SELECT COUNT(*) FROM a;
SELECT COUNT(*) FROM b;

INSERT INTO a VALUES (NULL, RANDOM_BYTES(1024), 1),(NULL, RANDOM_BYTES(1024), 1);

ANALYZE TABLE a;
ANALYZE TABLE b;

EXPLAIN SELECT COUNT(*) FROM a INNER JOIN b ON a.b_id = b.id;
EXPLAIN SELECT COUNT(*) FROM a INNER JOIN b ON a.b_id = b.id WHERE b_id IS NOT NULL;

EXPLAIN FORMAT=JSON SELECT COUNT(*) FROM a INNER JOIN b ON a.b_id = b.id\G
EXPLAIN FORMAT=JSON SELECT COUNT(*) FROM a INNER JOIN b ON a.b_id = b.id WHERE b_id IS NOT NULL\G

Note: In the final EXPLAIN FORMAT=JSON you can see that the better plan is caused by a further transformation to propagate a.b_id is not null.

Suggested fix:
Is it possible to attach conditions and then plan?
[5 Dec 2018 18:06] Morgan Tocker
** Edit **

I should have said the last example is:
EXPLAIN SELECT COUNT(*) FROM a INNER JOIN b ON a.b_id = b.id WHERE b.id IS NOT NULL;

This correctly applies b_id IS NOT NULL also.
[6 Dec 2018 5:21] Umesh Shastry
Hello Morgan,

Thank you for the report and test case.

regards,
Umesh