Bug #71191 Loose index scan is used even though full table scan/tight index scan is faster
Submitted: 20 Dec 2013 15:30 Modified: 20 Dec 2013 20:19
Reporter: Valeriy Kravchuk Email Updates:
Status: Verified Impact on me:
None 
Category:MySQL Server: Optimizer Severity:S3 (Non-critical)
Version:5.5.33 OS:Any
Assigned to: CPU Architecture:Any
Tags: loose index scan, Optimizer

[20 Dec 2013 15:30] Valeriy Kravchuk
Description:
It seems optimizer does NOT estimate cost of loose index scan properly sometimes.

Look:

mysql> explain select sql_no_cache id1, max(id2) from tgb ignore index(primary)
where id2<10000 group by id1 having id1=6783;
+----+-------------+-------+------+---------------+------+---------+------+-----
-+----------------------------------------------+
| id | select_type | table | type | possible_keys | key  | key_len | ref  | rows
 | Extra                                        |
+----+-------------+-------+------+---------------+------+---------+------+-----
-+----------------------------------------------+
|  1 | SIMPLE      | tgb   | ALL  | NULL          | NULL | NULL    | NULL | 7872
 | Using where; Using temporary; Using filesort |
+----+-------------+-------+------+---------------+------+---------+------+-----
-+----------------------------------------------+
1 row in set (0.04 sec)

mysql> flush status;
Query OK, 0 rows affected (0.00 sec)

mysql> select sql_no_cache id1, max(id2) from tgb ignore index(primary) where id
2<10000 group by id1 having id1=6783;
+------+----------+
| id1  | max(id2) |
+------+----------+
| 6783 |     9971 |
+------+----------+
1 row in set (0.08 sec)

mysql> show status like 'Handler_%';
+----------------------------+-------+
| Variable_name              | Value |
+----------------------------+-------+
| Handler_commit             | 1     |
| Handler_delete             | 0     |
| Handler_discover           | 0     |
| Handler_prepare            | 0     |
| Handler_read_first         | 1     |
| Handler_read_key           | 8192  |
| Handler_read_last          | 0     |
| Handler_read_next          | 0     |
| Handler_read_prev          | 0     |
| Handler_read_rnd           | 1     |
| Handler_read_rnd_next      | 13680 |
| Handler_rollback           | 0     |
| Handler_savepoint          | 0     |
| Handler_savepoint_rollback | 0     |
| Handler_update             | 2705  |
| Handler_write              | 5486  |
+----------------------------+-------+
16 rows in set (0.05 sec)

So, full table scan, then some small temporary table in memory is created and used, and it takes only 0.08 seconds to execute. Now, by default loose index scan is used for this query:

mysql> explain select sql_no_cache id1, max(id2) from tgb where id2<10000 group
by id1 having id1=6783;
+----+-------------+-------+-------+---------------+---------+---------+------+-
-----+---------------------------------------+
| id | select_type | table | type  | possible_keys | key     | key_len | ref  |
rows | Extra                                 |
+----+-------------+-------+-------+---------------+---------+---------+------+-
-----+---------------------------------------+
|  1 | SIMPLE      | tgb   | range | NULL          | PRIMARY | 8       | NULL |
7873 | Using where; Using index for group-by |
+----+-------------+-------+-------+---------------+---------+---------+------+-
-----+---------------------------------------+
1 row in set (0.00 sec)

mysql> flush status;
Query OK, 0 rows affected (0.00 sec)

mysql> select sql_no_cache id1, max(id2) from tgb where id2<10000 group by id1 h
aving id1=6783;
+------+----------+
| id1  | max(id2) |
+------+----------+
| 6783 |     9971 |
+------+----------+
1 row in set (0.24 sec)

mysql> show status like 'Handler_%';
+----------------------------+-------+
| Variable_name              | Value |
+----------------------------+-------+
| Handler_commit             | 1     |
| Handler_delete             | 0     |
| Handler_discover           | 0     |
| Handler_prepare            | 0     |
| Handler_read_first         | 1     |
| Handler_read_key           | 10976 |
| Handler_read_last          | 1     |
| Handler_read_next          | 0     |
| Handler_read_prev          | 0     |
| Handler_read_rnd           | 0     |
| Handler_read_rnd_next      | 0     |
| Handler_rollback           | 0     |
| Handler_savepoint          | 0     |
| Handler_savepoint_rollback | 0     |
| Handler_update             | 0     |
| Handler_write              | 0     |
+----------------------------+-------+
16 rows in set (0.05 sec)

and it takes longer to do 10976 (why? See more questions in bug #71189...) key lookups.

Updating statistics does NOT help:

mysql> analyze table tgb;
+----------+---------+----------+----------+
| Table    | Op      | Msg_type | Msg_text |
+----------+---------+----------+----------+
| test.tgb | analyze | status   | OK       |
+----------+---------+----------+----------+
1 row in set (0.01 sec)

mysql> explain select sql_no_cache id1, max(id2) from tgb where id2<10000 group
by id1 having id1=6783;
+----+-------------+-------+-------+---------------+---------+---------+------+-
-----+---------------------------------------+
| id | select_type | table | type  | possible_keys | key     | key_len | ref  |
rows | Extra                                 |
+----+-------------+-------+-------+---------------+---------+---------+------+-
-----+---------------------------------------+
|  1 | SIMPLE      | tgb   | range | NULL          | PRIMARY | 8       | NULL |
8111 | Using where; Using index for group-by |
+----+-------------+-------+-------+---------------+---------+---------+------+-
-----+---------------------------------------+
1 row in set (0.05 sec)

How to repeat:
Load data from the file attached:

C:\Program Files\MySQL\MySQL Server 5.6\bin>mysql -uroot -proot -P3312 test <p:\
percona\tgb.sql
Warning: Using a password on the command line interface can be insecure.

Then run the following and make your own conclusions:

explain select sql_no_cache id1, max(id2) from tgb ignore index(primary)
where id2<10000 group by id1 having id1=6783;
flush status;
select sql_no_cache id1, max(id2) from tgb ignore index(primary) where id2<10000 group by id1 having id1=6783;
show status like 'Handler_%';

explain select sql_no_cache id1, max(id2) from tgb where id2<10000 group
by id1 having id1=6783;
flush status;
select sql_no_cache id1, max(id2) from tgb where id2<10000 group by id1 having id1=6783;
show status like 'Handler_%';

analyze table tgb;
explain select sql_no_cache id1, max(id2) from tgb where id2<10000 group
by id1 having id1=6783;

Suggested fix:
It seems MySQL 5.6.x is smarter in this case and may use more efficient tight index scan for this query:

C:\Program Files\MySQL\MySQL Server 5.6\bin>mysql -uroot -proot -P3314 test <p:\
percona\tgb.sql
Warning: Using a password on the command line interface can be insecure.

C:\Program Files\MySQL\MySQL Server 5.6\bin>mysql -uroot -proot -P3314 test
Warning: Using a password on the command line interface can be insecure.
Welcome to the MySQL monitor.  Commands end with ; or \g.
Your MySQL connection id is 22
Server version: 5.6.14-log MySQL Community Server (GPL)

Copyright (c) 2000, 2013, Oracle and/or its affiliates. All rights reserved.

Oracle is a registered trademark of Oracle Corporation and/or its
affiliates. Other names may be trademarks of their respective
owners.

Type 'help;' or '\h' for help. Type '\c' to clear the current input statement.

mysql> explain select sql_no_cache id1, max(id2) from tgb where id2<10000 group
by id1 having id1=6783;
+----+-------------+-------+-------+---------------+---------+---------+------+-
-----+--------------------------+
| id | select_type | table | type  | possible_keys | key     | key_len | ref  |
rows | Extra                    |
+----+-------------+-------+-------+---------------+---------+---------+------+-
-----+--------------------------+
|  1 | SIMPLE      | tgb   | index | PRIMARY       | PRIMARY | 8       | NULL |
  22 | Using where; Using index |
+----+-------------+-------+-------+---------------+---------+---------+------+-
-----+--------------------------+
1 row in set (0.03 sec)

mysql> flush status;
Query OK, 0 rows affected (0.00 sec)

mysql> select sql_no_cache id1, max(id2) from tgb where id2<10000 group by id1 h
aving id1=6783;
+------+----------+
| id1  | max(id2) |
+------+----------+
| 6783 |     9971 |
+------+----------+
1 row in set (0.02 sec)

mysql> show status like 'Handler_%';
+----------------------------+-------+
| Variable_name              | Value |
+----------------------------+-------+
| Handler_commit             | 1     |
| Handler_delete             | 0     |
| Handler_discover           | 0     |
| Handler_external_lock      | 2     |
| Handler_mrr_init           | 0     |
| Handler_prepare            | 0     |
| Handler_read_first         | 1     |
| Handler_read_key           | 1     |
| Handler_read_last          | 0     |
| Handler_read_next          | 8192  |
| Handler_read_prev          | 0     |
| Handler_read_rnd           | 0     |
| Handler_read_rnd_next      | 0     |
| Handler_rollback           | 0     |
| Handler_savepoint          | 0     |
| Handler_savepoint_rollback | 0     |
| Handler_update             | 0     |
| Handler_write              | 0     |
+----------------------------+-------+
18 rows in set (0.00 sec)

So, fix 5.5.x in a similar way maybe?
[20 Dec 2013 15:31] Valeriy Kravchuk
tgb.sql dump to load

Attachment: tgb.sql (application/octet-stream, text), 112.05 KiB.

[20 Dec 2013 20:19] Sveta Smirnova
Thank you for the report.

Slowdown verified as described with version 5.5. With version 5.6 and up it is not repeatable as you said. I still set this bug to "Verified", because I can not find bug about same issue in 5.5.