Bug #71199 Optimizer's estimated number of rows is 2 times wrong for (loose?) index scans
Submitted: 21 Dec 2013 18:16 Modified: 23 Dec 2013 7:33
Reporter: Valeriy Kravchuk Email Updates:
Status: Verified Impact on me:
None 
Category:MySQL Server: Optimizer Severity:S3 (Non-critical)
Version:5.6.14, 5.6.15, 5.5.35 OS:Any
Assigned to: CPU Architecture:Any
Tags: loose index scan, Optimizer

[21 Dec 2013 18:16] Valeriy Kravchuk
Description:
It seems optimizer underestimates number of rows in the range in some cases, producing the output that is exactly 2 times smaller than the real one. I wonder if this also means that internal cost calculations are incorrect.

Look:

mysql> analyze table tgb;
+----------+---------+----------+----------+
| Table    | Op      | Msg_type | Msg_text |
+----------+---------+----------+----------+
| test.tgb | analyze | status   | OK       |
+----------+---------+----------+----------+
1 row in set (1.43 sec)

mysql> select count(*) from tgb where id1=6783 and id2<100;
+----------+
| count(*) |
+----------+
|        2 |
+----------+
1 row in set (0.00 sec)

mysql> explain select sql_no_cache id1, max(id2) from tgb where id1=6783 and id2
<100 group by id1;
+----+-------------+-------+-------+---------------+---------+---------+------+-
-----+--------------------------+
| id | select_type | table | type  | possible_keys | key     | key_len | ref  |
rows | Extra                    |
+----+-------------+-------+-------+---------------+---------+---------+------+-
-----+--------------------------+
|  1 | SIMPLE      | tgb   | range | PRIMARY       | PRIMARY | 8       | NULL |
   1 | Using where; Using index |
+----+-------------+-------+-------+---------------+---------+---------+------+-
-----+--------------------------+
1 row in set (0.00 sec)

mysql> select count(*) from tgb where id1=6783 and id2<100000;
+----------+
| count(*) |
+----------+
|      204 |
+----------+
1 row in set (0.00 sec)

mysql> explain select sql_no_cache id1, max(id2) from tgb where id1=6783 and id2
<100000 group by id1;
+----+-------------+-------+-------+---------------+---------+---------+------+-
-----+---------------------------------------+
| id | select_type | table | type  | possible_keys | key     | key_len | ref  |
rows | Extra                                 |
+----+-------------+-------+-------+---------------+---------+---------+------+-
-----+---------------------------------------+
|  1 | SIMPLE      | tgb   | range | PRIMARY       | PRIMARY | 8       | NULL |
 102 | Using where; Using index for group-by |
+----+-------------+-------+-------+---------------+---------+---------+------+-
-----+---------------------------------------+
1 row in set (0.00 sec)

mysql> analyze table tgb;
+----------+---------+----------+----------+
| Table    | Op      | Msg_type | Msg_text |
+----------+---------+----------+----------+
| test.tgb | analyze | status   | OK       |
+----------+---------+----------+----------+
1 row in set (0.39 sec)

mysql> explain select id1, max(id2) from tgb where id1=6783 and id2<10000 group
by id1;
+----+-------------+-------+-------+---------------+---------+---------+------+-
-----+---------------------------------------+
| id | select_type | table | type  | possible_keys | key     | key_len | ref  |
rows | Extra                                 |
+----+-------------+-------+-------+---------------+---------+---------+------+-
-----+---------------------------------------+
|  1 | SIMPLE      | tgb   | range | PRIMARY       | PRIMARY | 8       | NULL |
 102 | Using where; Using index for group-by |
+----+-------------+-------+-------+---------------+---------+---------+------+-
-----+---------------------------------------+
1 row in set (0.00 sec)

mysql> select count(*) from tgb where id1=6783 and id2<10000 group by id1;
+----------+
| count(*) |
+----------+
|      204 |
+----------+
1 row in set (0.00 sec)

mysql> explain select id1, max(id2) from tgb where id1=6783 and id2<5000 group b
y id1;
+----+-------------+-------+-------+---------------+---------+---------+------+-
-----+---------------------------------------+
| id | select_type | table | type  | possible_keys | key     | key_len | ref  |
rows | Extra                                 |
+----+-------------+-------+-------+---------------+---------+---------+------+-
-----+---------------------------------------+
|  1 | SIMPLE      | tgb   | range | PRIMARY       | PRIMARY | 8       | NULL |
  50 | Using where; Using index for group-by |
+----+-------------+-------+-------+---------------+---------+---------+------+-
-----+---------------------------------------+
1 row in set (0.00 sec)

mysql> select count(*) from tgb where id1=6783 and id2<5000 group by id1;
+----------+
| count(*) |
+----------+
|      100 |
+----------+
1 row in set (0.00 sec)

mysql> explain select id1, max(id2) from tgb where id1=6783 and id2<2000 group b
y id1;
+----+-------------+-------+-------+---------------+---------+---------+------+-
-----+---------------------------------------+
| id | select_type | table | type  | possible_keys | key     | key_len | ref  |
rows | Extra                                 |
+----+-------------+-------+-------+---------------+---------+---------+------+-
-----+---------------------------------------+
|  1 | SIMPLE      | tgb   | range | PRIMARY       | PRIMARY | 8       | NULL |
  18 | Using where; Using index for group-by |
+----+-------------+-------+-------+---------------+---------+---------+------+-
-----+---------------------------------------+
1 row in set (0.00 sec)

mysql> select count(*) from tgb where id1=6783 and id2<2000 group by id1;
+----------+
| count(*) |
+----------+
|       36 |
+----------+
1 row in set (0.00 sec)

mysql> select version();
+------------+
| version()  |
+------------+
| 5.6.14-log |
+------------+
1 row in set (0.04 sec)

All the correlations above do not look random. If this kind of a table access is just a part of the bigger query, wrong estimated number of rows may lead to wrong join order, for example.

How to repeat:
Load data from tgb2.sql attached and the run the statements like this:

analyze table tgb;
explain select id1, max(id2) from tgb where id1=6783 and id2<N group by id1;
select count(*) from tgb where id1=6783 and id2<N group by id1;

for N in 10...10000

Note that in all cases estimated number of rows is equal to real number of rows in the range divided by 2.

Surely, the real number of rows accessed in case of loose index scan access path has nothing in common with the estimated number:

mysql> analyze table tgb;
+----------+---------+----------+----------+
| Table    | Op      | Msg_type | Msg_text |
+----------+---------+----------+----------+
| test.tgb | analyze | status   | OK       |
+----------+---------+----------+----------+
1 row in set (0.46 sec)

mysql> explain select id1, max(id2) from tgb where id1=6783 and id2<5000 group b
y id1;
+----+-------------+-------+-------+---------------+---------+---------+------+-
-----+---------------------------------------+
| id | select_type | table | type  | possible_keys | key     | key_len | ref  |
rows | Extra                                 |
+----+-------------+-------+-------+---------------+---------+---------+------+-
-----+---------------------------------------+
|  1 | SIMPLE      | tgb   | range | PRIMARY       | PRIMARY | 8       | NULL |
  50 | Using where; Using index for group-by |
+----+-------------+-------+-------+---------------+---------+---------+------+-
-----+---------------------------------------+
1 row in set (0.00 sec)

mysql> select count(*) from tgb where id1=6783 and id2<5000 group by id1;
+----------+
| count(*) |
+----------+
|      100 |
+----------+
1 row in set (0.00 sec)

mysql> flush status;
Query OK, 0 rows affected (0.00 sec)

mysql> select id1, max(id2) from tgb where id1=6783 and id2<5000 group by id1;
+------+----------+
| id1  | max(id2) |
+------+----------+
| 6783 |     4993 |
+------+----------+
1 row in set (0.00 sec)

mysql> show status like 'Handler_read%';
+-----------------------+-------+
| Variable_name         | Value |
+-----------------------+-------+
| Handler_read_first    | 0     |
| Handler_read_key      | 4     |
| Handler_read_last     | 1     |
| Handler_read_next     | 0     |
| Handler_read_prev     | 0     |
| Handler_read_rnd      | 0     |
| Handler_read_rnd_next | 0     |
+-----------------------+-------+
7 rows in set (0.00 sec)

Suggested fix:
Check if there is some (wrong) division by 2 happens.

or, maybe, change calculations somehow to make estimated number of rows close to the real number of rows accessed?

Document this properly (see bug #71189 also).
[21 Dec 2013 18:17] Valeriy Kravchuk
tgb2.sql with table data to load

Attachment: tgb2.sql (application/octet-stream, text), 5.64 KiB.

[23 Dec 2013 7:33] MySQL Verification Team
Hello Valeriy,

Thank you for the bug report and test case.
Verified as described.

Thanks,
Umesh
[23 Dec 2013 7:35] MySQL Verification Team
// 5.6.15

# bin/mysql -uroot -p
Enter password:
Welcome to the MySQL monitor.  Commands end with ; or \g.
Your MySQL connection id is 1
Server version: 5.6.15-log MySQL Community Server (GPL)

mysql> use test
Database changed
mysql> source /tmp/tgb2.sql
Query OK, 0 rows affected (0.00 sec)

Query OK, 0 rows affected (0.00 sec)

Query OK, 0 rows affected (0.00 sec)

Query OK, 0 rows affected (0.00 sec)

Query OK, 0 rows affected (0.00 sec)

Query OK, 0 rows affected (0.00 sec)

Query OK, 0 rows affected (0.00 sec)

Query OK, 0 rows affected (0.00 sec)

Query OK, 0 rows affected (0.00 sec)

Query OK, 0 rows affected (0.00 sec)

Query OK, 0 rows affected (0.00 sec)

Query OK, 0 rows affected (0.00 sec)

Query OK, 0 rows affected (0.00 sec)

Query OK, 0 rows affected (0.04 sec)

Query OK, 0 rows affected (0.00 sec)

Query OK, 0 rows affected (0.00 sec)

Query OK, 0 rows affected (0.04 sec)

Query OK, 290 rows affected (0.01 sec)
Records: 290  Duplicates: 0  Warnings: 0

Query OK, 0 rows affected (0.00 sec)

Query OK, 0 rows affected (0.01 sec)

Query OK, 0 rows affected (0.00 sec)

Query OK, 0 rows affected (0.00 sec)

Query OK, 0 rows affected (0.00 sec)

Query OK, 0 rows affected (0.00 sec)

Query OK, 0 rows affected (0.00 sec)

Query OK, 0 rows affected (0.00 sec)

Query OK, 0 rows affected (0.00 sec)

Query OK, 0 rows affected (0.00 sec)

mysql>  analyze table tgb;
+----------+---------+----------+----------+
| Table    | Op      | Msg_type | Msg_text |
+----------+---------+----------+----------+
| test.tgb | analyze | status   | OK       |
+----------+---------+----------+----------+
1 row in set (0.01 sec)

mysql> explain select id1, max(id2) from tgb where id1=6783 and id2<5000 group by id1;
+----+-------------+-------+-------+---------------+---------+---------+------+------+---------------------------------------+
| id | select_type | table | type  | possible_keys | key     | key_len | ref  | rows | Extra                                 |
+----+-------------+-------+-------+---------------+---------+---------+------+------+---------------------------------------+
|  1 | SIMPLE      | tgb   | range | PRIMARY       | PRIMARY | 8       | NULL |   50 | Using where; Using index for group-by |
+----+-------------+-------+-------+---------------+---------+---------+------+------+---------------------------------------+
1 row in set (0.03 sec)

mysql> select count(*) from tgb where id1=6783 and id2<5000 group by id1;
+----------+
| count(*) |
+----------+
|      100 |
+----------+
1 row in set (0.00 sec)

mysql>  flush status;
Query OK, 0 rows affected (0.00 sec)

mysql> select id1, max(id2) from tgb where id1=6783 and id2<5000 group by id1;
+------+----------+
| id1  | max(id2) |
+------+----------+
| 6783 |     4993 |
+------+----------+
1 row in set (0.00 sec)

mysql> show status like 'Handler_read%';
+-----------------------+-------+
| Variable_name         | Value |
+-----------------------+-------+
| Handler_read_first    | 0     |
| Handler_read_key      | 4     |
| Handler_read_last     | 1     |
| Handler_read_next     | 0     |
| Handler_read_prev     | 0     |
| Handler_read_rnd      | 0     |
| Handler_read_rnd_next | 0     |
+-----------------------+-------+
7 rows in set (0.00 sec)
[28 Jan 2014 17:45] MySQL Verification Team
// 5.5.35

# bin/mysql -uroot -p
Enter password:
Welcome to the MySQL monitor.  Commands end with ; or \g.
Your MySQL connection id is 1
Server version: 5.5.35-log MySQL Community Server (GPL)
mysql> use test
Database changed
mysql> source /tmp/tgb2.sql
Query OK, 0 rows affected (0.00 sec)

Query OK, 0 rows affected (0.00 sec)

Query OK, 0 rows affected (0.00 sec)

Query OK, 0 rows affected (0.00 sec)

Query OK, 0 rows affected (0.00 sec)

Query OK, 0 rows affected (0.00 sec)

Query OK, 0 rows affected (0.00 sec)

Query OK, 0 rows affected (0.00 sec)

Query OK, 0 rows affected (0.00 sec)

Query OK, 0 rows affected (0.00 sec)

Query OK, 0 rows affected (0.00 sec)

Query OK, 0 rows affected (0.00 sec)

Query OK, 0 rows affected (0.00 sec)

Query OK, 0 rows affected (0.29 sec)

Query OK, 0 rows affected (0.00 sec)

Query OK, 0 rows affected (0.00 sec)

Query OK, 0 rows affected (0.00 sec)

Query OK, 290 rows affected (0.22 sec)
Records: 290  Duplicates: 0  Warnings: 0

Query OK, 0 rows affected (0.00 sec)

Query OK, 0 rows affected (0.00 sec)

Query OK, 0 rows affected (0.00 sec)

Query OK, 0 rows affected (0.00 sec)

Query OK, 0 rows affected (0.00 sec)

Query OK, 0 rows affected (0.00 sec)

Query OK, 0 rows affected (0.00 sec)

Query OK, 0 rows affected (0.00 sec)

Query OK, 0 rows affected (0.00 sec)

Query OK, 0 rows affected (0.00 sec)

mysql> analyze table tgb;
+----------+---------+----------+----------+
| Table    | Op      | Msg_type | Msg_text |
+----------+---------+----------+----------+
| test.tgb | analyze | status   | OK       |
+----------+---------+----------+----------+
1 row in set (0.00 sec)

mysql> explain select id1, max(id2) from tgb where id1=6783 and id2<5000 group by id1;
+----+-------------+-------+-------+---------------+---------+---------+------+------+---------------------------------------+
| id | select_type | table | type  | possible_keys | key     | key_len | ref  | rows | Extra                                 |
+----+-------------+-------+-------+---------------+---------+---------+------+------+---------------------------------------+
|  1 | SIMPLE      | tgb   | range | PRIMARY       | PRIMARY | 8       | NULL |   50 | Using where; Using index for group-by |
+----+-------------+-------+-------+---------------+---------+---------+------+------+---------------------------------------+
1 row in set (0.04 sec)

mysql> select count(*) from tgb where id1=6783 and id2<5000 group by id1;
+----------+
| count(*) |
+----------+
|      100 |
+----------+
1 row in set (0.00 sec)

mysql>  flush status;
Query OK, 0 rows affected (0.00 sec)

mysql> select id1, max(id2) from tgb where id1=6783 and id2<5000 group by id1;
+------+----------+
| id1  | max(id2) |
+------+----------+
| 6783 |     4993 |
+------+----------+
1 row in set (0.00 sec)

mysql> show status like 'Handler_read%';
+-----------------------+-------+
| Variable_name         | Value |
+-----------------------+-------+
| Handler_read_first    | 0     |
| Handler_read_key      | 4     |
| Handler_read_last     | 1     |
| Handler_read_next     | 0     |
| Handler_read_prev     | 0     |
| Handler_read_rnd      | 0     |
| Handler_read_rnd_next | 0     |
+-----------------------+-------+
7 rows in set (0.00 sec)

mysql>