Description:
When you have a situation where you're joining a small table (in the example here, it's just a single page dimension table, two records), and you need fields from it that aren't in the primary key definition (but are of course right there, as the primary key contains them), mysql will choose ALL/join buffer rather than the index. It does this because it thinks it will avoid reading the index record, and then going back to a second structure to get the field you need, even though when it's the primary key, it's all the same structure. This is not always a big hit to performance (though it's annoying) - however, when you have something like an indexed sort, it makes an enormous difference. A query that should run in a millisecond can take 10s, as with that join type, it will need to use a temp table/filesort instead (which is itself something that could be remediated but...not nearly as important).
How to repeat:
CREATE TABLE t1 (
id int(10) unsigned NOT NULL AUTO_INCREMENT,
sort int(10) unsigned NOT NULL,
t2_id tinyint(3) unsigned NOT NULL,
user_id tinyint(3) unsigned NOT NULL,
PRIMARY KEY (id),
KEY user_id (user_id,sort)
) ENGINE=InnoDB;
CREATE TABLE t2 (
id tinyint(3) unsigned NOT NULL,
str varchar(1000) NOT NULL,
PRIMARY KEY (id)
) ENGINE=InnoDB;
insert into t2 values (1,repeat('asdf',250)),(2,repeat('fsda',250));
insert into t1 set id=null;
insert into t1 (id) select null from t1; -- repeat until you have a bit of data, a million rows is fine
update t1 set sort=id,t2_id=if(mod(id,2)=1,1,2),user_id=ceil(rand()*100);
explain
select
t1.id,
t1.sort,
t1.t2_id,
t2.id,
t2.str
from t1
join t2 on t2.id=t1.t2_id
where t1.user_id=5
order by t1.sort
limit 10;
-- 1 SIMPLE t1 ref user_id user_id 1 const 10597 Using temporary; Using filesort
-- 1 SIMPLE t2 ALL PRIMARY NULL NULL NULL 2 Using where; Using join buffer (Block Nested Loop)
explain
select
t1.id,
t1.sort,
t1.t2_id,
t2.id
from t1
join t2 on t2.id=t1.t2_id
where t1.user_id=5
order by t1.sort
limit 10;
-- 1 SIMPLE t1 ref user_id user_id 1 const 10597 Using where
-- 1 SIMPLE t2 eq_ref PRIMARY PRIMARY 1 t1.t2_id 1 Using index
Suggested fix:
Let the optimizer be aware that any table reference using "PRIMARY" is also "using index" - realistically, it should also output that in "explain," but we only really care that it works internally. If it saw that as "using index" it would not attempt the alternate method, and the query would run fine.
Note that this is specific to IOT-type storage engines, primarily innodb. So, you might need a check on the engine. However, as the astounding majority of mysql installs in the world are using innodb, if you skipped that, the tiny loss of function for a very small number of people traded for a big win for every other customer is probably acceptable.