Description:
table A has a unique key on nullable (b_id) - fk to table B's primary key. lots of rows are null (A has many more rows than B, and relation is 1-1).
select * from A join B on B.id=A.b_id; will optimize for FTS on large table A because when it calculates row estimates for the join, it uses nulls, and gets a large one. Rebuilding stats sometimes fixes this, but even in cases where it does, you have to run it very often, the optimizer really wants to use A first after only a few modifications. It's probably just a formula not accounting properly for nulls - it's not viewing the join on a unique key with a not-null condition as an eq_ref, which suggests something's wrong there
Basically, even with the not null codition, it does not consider that the key is unique, and that therefore the max rows for that table reference (per previous reference) is 1 and it could scan B much more efficiently.
see comments in example for more detail
How to repeat:
CREATE TABLE `a` (
`id` int(10) unsigned NOT NULL AUTO_INCREMENT,
`b_id` int(10) unsigned DEFAULT NULL,
`other_field` int(10) unsigned NOT NULL,
PRIMARY KEY (`id`),
UNIQUE KEY `a_id` (`b_id`)
) ENGINE=InnoDB;
CREATE TABLE `b` (
`id` int(10) unsigned NOT NULL AUTO_INCREMENT,
`other_field` int(10) unsigned NOT NULL,
PRIMARY KEY (`id`)
) ENGINE=InnoDB;
insert into a (id) values (null);
insert into a (id) select null from a; -- repeat lots of times, i made 8M rows
insert into b (id) values (null);
insert into b (id) select null from b; -- repeat a few times, i made 64k rows
update a set b_id=id where id<=(select max(id) from b); -- sloppy but you get the idea, leave most null
explain
select * from b join a on b.id=a.b_id;
-- 1 SIMPLE a ALL a_id NULL NULL NULL 8374392 Using where
-- 1 SIMPLE b eq_ref PRIMARY PRIMARY 4 ytest.a.b_id 1 NULL
select * from b join a on b.id=a.b_id;
-- 65536 rows in set (1.55 sec)
explain
select straight_join * from b join a on b.id=a.b_id;
-- 1 SIMPLE b ALL PRIMARY NULL NULL NULL 65895 NULL
-- 1 SIMPLE a ref a_id a_id 5 ytest.b.id 4097 NULL
-- see how it still considers it a 1-n join rather than 1-1, i'm guessing it's just a formula not accounting for this case, which would be the actual bug here
select straight_join * from b join a on b.id=a.b_id;
-- 65536 rows in set (0.16 sec)
go ahead and analyze table a; - in some examples this fixes the plan, in some it does not, i varied the row counts for each but couldn't find an exact threshold. either way, after that modify some records in a, not even a large % of them, but the stats will be off again very quickly.
Suggested fix:
make sure the optimizer knows how to handle unique keys with a not-null reference