Bug #56165 Bad Optimization
Submitted: 21 Aug 2010 8:12 Modified: 23 Aug 2010 18:22
Reporter: Serge Igitov Email Updates:
Status: Verified Impact on me:
None 
Category:MySQL Server: Optimizer Severity:S3 (Non-critical)
Version:5.0.not_sure, 5.0, 5.1, 5.6.99 bzr OS:Any
Assigned to: CPU Architecture:Any

[21 Aug 2010 8:12] Serge Igitov
Description:
Hello.

"articles" is a table with a lots of rows. "kindid" is an indexed column. "title" is not.

The following query:

SELECT kindid,title FROM articles GROUP BY kindid ORDER BY NULL

is used to pick up all different article kinds from the table, together with one (any) title for each kind. The order is not important, thus "ORDER BY NULL".

The query works as expected. Although, if I examine it with EXPLAIN:

EXPLAIN SELECT kindid,title FROM articles GROUP BY kindid ORDER BY NULL

it shows:

id	select_type	table		type	possible_keys	key	key_len	ref	rows	Extra	
1	SIMPLE		articles	ALL	NULL		NULL	NULL	NULL	827	Using temporary

Why "type"="ALL"?

Why MySQL has to look up through all thousands of rows in table, while it's enough to look into BTREE index, pick up all different values for kindid (kindid is an indexed column), then quickly pick up a title for each (any one will work, but let it be, say, title from the first row covered by index subgroup)?

There are only 10-12 different values of kindid in a table, so the query should work much faster than it actually does.

Hope this case can be useful.

How to repeat:
EXPLAIN SELECT kindid,title FROM articles GROUP BY kindid ORDER BY NULL
[21 Aug 2010 13:02] Sveta Smirnova
Thank you for the report.

Please indicate accurate version number of MySQL server you use and send output of SHOW CREATE TABLE articles
[22 Aug 2010 15:43] Serge Igitov
Sorry about the incomplete report. (btw, Svetlana, you're doing a tremendous job carefully checking all those reports, after searching in the bug database I started to think my issue is not that important. ;))

This is the reproduce script:

DROP TABLE IF EXISTS articles;
CREATE TABLE articles (
  id INT NOT NULL AUTO_INCREMENT,
  kindid INT NOT NULL,
  title TINYTEXT NOT NULL,
  PRIMARY KEY (id),
  KEY kindid (kindid));

INSERT INTO articles VALUES
  (),(),(),(),(),(),(),(),(),(),
  (),(),(),(),(),(),(),(),(),(),
  (),(),(),(),(),(),(),(),(),(),
  (),(),(),(),(),(),(),(),(),(),
  (),(),(),(),(),(),(),(),(),(),
  (),(),(),(),(),(),(),(),(),(),
  (),(),(),(),(),(),(),(),(),(),
  (),(),(),(),(),(),(),(),(),(),
  (),(),(),(),(),(),(),(),(),(),
  (),(),(),(),(),(),(),(),(),();

UPDATE articles SET kindid=FLOOR(RAND()*10)+1, title=CONCAT('Title #',kindid);
ANALYZE TABLE articles;

# Examining the query

EXPLAIN SELECT kindid,title FROM articles GROUP BY kindid ORDER BY NULL;

--- Expected Result ---

The query is optimized (type != ALL)

--- Actual Result ---

id	select_type	table		type	possible_keys	key	key_len	ref	rows	Extra
1	SIMPLE		articles	ALL	NULL		NULL	NULL	NULL	100	Using temporary

Sadly, MySQL runs through all rows in a table.

Regards,
Serge
[23 Aug 2010 11:51] Sveta Smirnova
Thank you for the feedback.

This can be not a bug, because even if call FORCE INDEX and index is used all rows will be scanned and temporary can be just quicker than use index:

EXPLAIN SELECT kindid,title FROM articles force index(kindid) GROUP BY kindid ORDER BY NULL;
id      select_type     table   type    possible_keys   key     key_len ref     rows    Extra
1       SIMPLE  articles        index   NULL    kindid  4       NULL    1600
EXPLAIN SELECT kindid,title FROM articles GROUP BY kindid ORDER BY NULL;
id      select_type     table   type    possible_keys   key     key_len ref     rows    Extra
1       SIMPLE  articles        ALL     NULL    NULL    NULL    NULL    1600    Using temporary

But still verifying this as a bug to let Optimizer team decide if "index for group by" can be used.
[23 Aug 2010 18:23] Serge Igitov
Well, the idea I was suggesting above was, how to explain it - to make MySQL run through only 10 rows, that's quite enough to satisfy the query (there are only 10 different values of "kindid"), instead of running through all 1600 rows in a table, with or without using the index.

Making a full table scan, but according to the indexed order, can be even slower than regular sequential table scan. On the other hand, to do 10 quick "lookups" into .MYD would be enough.

If that can be done, of course...