Description:
Verified on 5.6.42, 5.7.24, 8.0.13
An inner join on a.b_id = b.id semantically means that neither side can be NULL. The optimizer seems to understand this during execution:
EXPLAIN FORMAT=JSON shows "attached_condition": "(`test`.`a`.`b_id` is not null)")
But if I rewrite the query to include this condition I get a better execution plan (switching from an index scan to a range):
mysql [localhost:8013] {msandbox} (test) > EXPLAIN SELECT COUNT(*) FROM a INNER JOIN b ON a.b_id = b.id;
+----+-------------+-------+------------+--------+---------------+---------+---------+-------------+------+----------+--------------------------+
| id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
+----+-------------+-------+------------+--------+---------------+---------+---------+-------------+------+----------+--------------------------+
| 1 | SIMPLE | a | NULL | index | b_id | b_id | 5 | NULL | 258 | 100.00 | Using where; Using index |
| 1 | SIMPLE | b | NULL | eq_ref | PRIMARY | PRIMARY | 4 | test.a.b_id | 1 | 100.00 | Using index |
+----+-------------+-------+------------+--------+---------------+---------+---------+-------------+------+----------+--------------------------+
2 rows in set, 1 warning (0.00 sec)
mysql [localhost:8013] {msandbox} (test) > EXPLAIN SELECT COUNT(*) FROM a INNER JOIN b ON a.b_id = b.id WHERE b_id IS NOT NULL;
+----+-------------+-------+------------+--------+---------------+---------+---------+-------------+------+----------+--------------------------+
| id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
+----+-------------+-------+------------+--------+---------------+---------+---------+-------------+------+----------+--------------------------+
| 1 | SIMPLE | a | NULL | range | b_id | b_id | 5 | NULL | 2 | 100.00 | Using where; Using index |
| 1 | SIMPLE | b | NULL | eq_ref | PRIMARY | PRIMARY | 4 | test.a.b_id | 1 | 100.00 | Using index |
+----+-------------+-------+------------+--------+---------------+---------+---------+-------------+------+----------+--------------------------+
2 rows in set, 1 warning (0.00 sec)
How to repeat:
DROP TABLE IF EXISTS a,b;
CREATE TABLE a (
id INT NOT NULL PRIMARY KEY auto_increment,
cola BLOB,
b_id INT,
INDEX (b_id)
);
CREATE TABLE b (
id INT NOT NULL PRIMARY KEY auto_increment,
cola BLOB
);
INSERT INTO a VALUES (NULL, RANDOM_BYTES(1024), NULL);
INSERT INTO a SELECT NULL, RANDOM_BYTES(1024), NULL FROM a;
INSERT INTO a SELECT NULL, RANDOM_BYTES(1024), NULL FROM a;
INSERT INTO a SELECT NULL, RANDOM_BYTES(1024), NULL FROM a;
INSERT INTO a SELECT NULL, RANDOM_BYTES(1024), NULL FROM a;
INSERT INTO a SELECT NULL, RANDOM_BYTES(1024), NULL FROM a;
INSERT INTO a SELECT NULL, RANDOM_BYTES(1024), NULL FROM a;
INSERT INTO a SELECT NULL, RANDOM_BYTES(1024), NULL FROM a;
INSERT INTO a SELECT NULL, RANDOM_BYTES(1024), NULL FROM a;
INSERT INTO b SELECT NULL, RANDOM_BYTES(1024) FROM a;
SELECT COUNT(*) FROM a;
SELECT COUNT(*) FROM b;
INSERT INTO a VALUES (NULL, RANDOM_BYTES(1024), 1),(NULL, RANDOM_BYTES(1024), 1);
ANALYZE TABLE a;
ANALYZE TABLE b;
EXPLAIN SELECT COUNT(*) FROM a INNER JOIN b ON a.b_id = b.id;
EXPLAIN SELECT COUNT(*) FROM a INNER JOIN b ON a.b_id = b.id WHERE b_id IS NOT NULL;
EXPLAIN FORMAT=JSON SELECT COUNT(*) FROM a INNER JOIN b ON a.b_id = b.id\G
EXPLAIN FORMAT=JSON SELECT COUNT(*) FROM a INNER JOIN b ON a.b_id = b.id WHERE b_id IS NOT NULL\G
Note: In the final EXPLAIN FORMAT=JSON you can see that the better plan is caused by a further transformation to propagate a.b_id is not null.
Suggested fix:
Is it possible to attach conditions and then plan?