Bug #77464 optimizer unaware that PRIMARY implies "using index" (innodb, etc)
Submitted: 24 Jun 2015 0:41 Modified: 24 Jun 2015 14:53
Reporter: Trey Raymond Email Updates:
Status: Verified Impact on me:
None 
Category:MySQL Server: Optimizer Severity:S4 (Feature request)
Version:5.6.24 OS:Any
Assigned to: CPU Architecture:Any
Tags: innodb, Optimizer, sort

[24 Jun 2015 0:41] Trey Raymond
Description:
When you have a situation where you're joining a small table (in the example here, it's just a single page dimension table, two records), and you need fields from it that aren't in the primary key definition (but are of course right there, as the primary key contains them), mysql will choose ALL/join buffer rather than the index.  It does this because it thinks it will avoid reading the index record, and then going back to a second structure to get the field you need, even though when it's the primary key, it's all the same structure.  This is not always a big hit to performance (though it's annoying) - however, when you have something like an indexed sort, it makes an enormous difference.  A query that should run in a millisecond can take 10s, as with that join type, it will need to use a temp table/filesort instead (which is itself something that could be remediated but...not nearly as important).

How to repeat:
CREATE TABLE t1 (
  id int(10) unsigned NOT NULL AUTO_INCREMENT,
  sort int(10) unsigned NOT NULL,
  t2_id tinyint(3) unsigned NOT NULL,
  user_id tinyint(3) unsigned NOT NULL,
  PRIMARY KEY (id),
  KEY user_id (user_id,sort)
) ENGINE=InnoDB;

CREATE TABLE t2 (
  id tinyint(3) unsigned NOT NULL,
  str varchar(1000) NOT NULL,
  PRIMARY KEY (id)
) ENGINE=InnoDB;

insert into t2 values (1,repeat('asdf',250)),(2,repeat('fsda',250));
insert into t1 set id=null;
insert into t1 (id) select null from t1; -- repeat until you have a bit of data, a million rows is fine
update t1 set sort=id,t2_id=if(mod(id,2)=1,1,2),user_id=ceil(rand()*100);

explain
select
	t1.id,
	t1.sort,
	t1.t2_id,
	t2.id,
	t2.str
from t1
	join t2 on t2.id=t1.t2_id
where t1.user_id=5
order by t1.sort
limit 10;
-- 1	SIMPLE	t1	ref	user_id	user_id	1	const	10597	Using temporary; Using filesort
-- 1	SIMPLE	t2	ALL	PRIMARY	NULL	NULL	NULL	2	Using where; Using join buffer (Block Nested Loop)

explain
select
	t1.id,
	t1.sort,
	t1.t2_id,
	t2.id
from t1
	join t2 on t2.id=t1.t2_id
where t1.user_id=5
order by t1.sort
limit 10;
-- 1	SIMPLE	t1	ref	user_id	user_id	1	const	10597	Using where
-- 1	SIMPLE	t2	eq_ref	PRIMARY	PRIMARY	1	t1.t2_id	1	Using index

Suggested fix:
Let the optimizer be aware that any table reference using "PRIMARY" is also "using index" - realistically, it should also output that in "explain," but we only really care that it works internally.  If it saw that as "using index" it would not attempt the alternate method, and the query would run fine.

Note that this is specific to IOT-type storage engines, primarily innodb.  So, you might need a check on the engine.  However, as the astounding majority of mysql installs in the world are using innodb, if you skipped that, the tiny loss of function for a very small number of people traded for a big win for every other customer is probably acceptable.
[24 Jun 2015 0:45] Trey Raymond
Oh, and additionally, using "force index" on t2 will override this, but index hints aren't something people want to deploy to prod apps.  A second workaround, removing the join and using a dependent subquery instead, will give the same functionality without the problem - but, we all know that's a pretty bad solution.
[24 Jun 2015 14:53] MySQL Verification Team
This is one potentially important feature request for the Optimizer team.

Verified as a valid feature request.