MySQL Bugs: #70629: InnoDB updated rec_per_key[] statis not published to the optimizer enough often

Bug #70629	InnoDB updated rec_per_key[] statis not published to the optimizer enough often
Submitted:	16 Oct 2013 8:27	Modified:	29 Nov 2013 11:48
Reporter:	Valeriy Kravchuk	Email Updates:
Status:	Verified	Impact on me:	None
Category:	MySQL Server: InnoDB storage engine	Severity:	S2 (Serious)
Version:	5.6.14	OS:	Any
Assigned to:	Assigned Account	CPU Architecture:	Any

Description:
Based on http://dev.mysql.com/doc/refman/5.6/en/innodb-parameters.html#sysvar_innodb_stats_auto_rec... and other manual pages explaining how persistent statistics works one may assume that if the table is created with persistent statistics and innodb_stats_auto_recalc is ON, like it happens with default settings in MySQL 5.6:

mysql> show variables like 'innodb_stats%';
+--------------------------------------+-------------+
| Variable_name                        | Value       |
+--------------------------------------+-------------+
| innodb_stats_auto_recalc             | ON          |
| innodb_stats_method                  | nulls_equal |
| innodb_stats_on_metadata             | OFF         |
| innodb_stats_persistent              | ON          |
| innodb_stats_persistent_sample_pages | 20          |
| innodb_stats_sample_pages            | 8           |
| innodb_stats_transient_sample_pages  | 8           |
+--------------------------------------+-------------+
7 rows in set (0.00 sec)

there is no need to run ANALYZE TABLE after loading a lot of new data into the newly created table - if more than 10% of rows in the table are changed (insert is a change, right?) statistics will be updated properly. 

This is not always the case it seems (even though it's not obvious why) and plan based on persistent statistics that is updated may be wrong/non-optimal until explicit ANALYZE TABLE is executed after data load. See "How to repeat" section below.

How to repeat:
mysql> create table ti1(id int auto_increment primary key, c2 char(200)) engine=InnoDB;
Query OK, 0 rows affected (0.34 sec)

mysql> insert into ti1(c2) values ('a');
Query OK, 1 row affected (0.03 sec)

mysql> insert into ti1(c2) select repeat('b',200) from ti1;                     Query OK, 1 row affected (0.03 sec)
Records: 1  Duplicates: 0  Warnings: 0

...

mysql> insert into ti1(c2) select repeat('b',200) from ti1;
Query OK, 512 rows affected (0.72 sec)
Records: 512  Duplicates: 0  Warnings: 0

mysql> create table ti2(id int auto_increment primary key, c2 char(200)) engine=InnoDB;
Query OK, 0 rows affected (0.38 sec)

mysql> alter table ti2 add key(c2);
Query OK, 0 rows affected (0.44 sec)
Records: 0  Duplicates: 0  Warnings: 0

mysql> insert into ti2 select * from ti1;
Query OK, 1024 rows affected (1.35 sec)
Records: 1024  Duplicates: 0  Warnings: 0

mysql> select * from mysql.innodb_table_stats where table_name like 'ti%'\G
*************************** 1. row ***************************
           database_name: test
              table_name: ti1
             last_update: 2013-10-16 11:03:02
                  n_rows: 1024
    clustered_index_size: 17
sum_of_other_index_sizes: 0
*************************** 2. row ***************************
           database_name: test
              table_name: ti2
             last_update: 2013-10-16 11:06:35
                  n_rows: 1024
    clustered_index_size: 17
sum_of_other_index_sizes: 15
2 rows in set (0.00 sec)

So, we see in the related tables (I checked that index one also, had to remove the output to fit this stupid 8K limit) that persistent statistics for the ti2 table is there and it was updated after INSERT ... SELECT as we expected. Now, let's check the plan for one simple query:

mysql> explain select * from ti1 inner join ti2 on ti1.c2 = ti2.c2 where ti2.id < 10;
+----+-------------+-------+------+---------------+------+---------+-------------+------+--------------------------+
| id | select_type | table | type | possible_keys | key  | key_len | ref         | rows | Extra                    |
+----+-------------+-------+------+---------------+------+---------+-------------+------+--------------------------+
|  1 | SIMPLE      | ti1   | ALL  | NULL          | NULL | NULL    | NULL        | 1024 | Using where              |
|  1 | SIMPLE      | ti2   | ref  | PRIMARY,c2    | c2   | 201     | test.ti1.c2 |    1 | Using where; Using index |
+----+-------------+-------+------+---------------+------+---------+-------------+------+--------------------------+
2 rows in set (0.00 sec)

For some smart reason optimizer decides, based on existing persistent (I assume) statistics that it makes sense to scan the entire ti1 table and check corresponding rows in ti2 table using index on c2 column (even though join condition is true for for every row in ti1, it's ti2.id that is really a limit).

One can easily check what it means for the runtime and number of access operations:

mysql> flush status;
Query OK, 0 rows affected (0.00 sec)

mysql> select * from ti1 inner join ti2 on ti1.c2 = ti2.c2 where ti2.id < 10;   
...
7162 rows in set (0.80 sec)

mysql> show status like 'Handler_read%';
+-----------------------+---------+
| Variable_name         | Value   |
+-----------------------+---------+
| Handler_read_first    | 1       |
| Handler_read_key      | 1025    |
| Handler_read_last     | 0       |
| Handler_read_next     | 1046530 |
| Handler_read_prev     | 0       |
| Handler_read_rnd      | 0       |
| Handler_read_rnd_next | 1025    |
+-----------------------+---------+
7 rows in set (0.00 sec)

Awful, as expected. Now, what if we run ANALYZE on ti2?

mysql> select now();
+---------------------+
| now()               |
+---------------------+
| 2013-10-16 11:14:24 |
+---------------------+
1 row in set (0.00 sec)

mysql> analyze table ti2;
+----------+---------+----------+----------+
| Table    | Op      | Msg_type | Msg_text |
+----------+---------+----------+----------+
| test.ti2 | analyze | status   | OK       |
+----------+---------+----------+----------+
1 row in set (0.34 sec)

mysql> select * from mysql.innodb_table_stats where table_name like 'ti%'\G     *************************** 1. row ***************************
           database_name: test
              table_name: ti1
             last_update: 2013-10-16 11:03:02
                  n_rows: 1024
    clustered_index_size: 17
sum_of_other_index_sizes: 0
*************************** 2. row ***************************
           database_name: test
              table_name: ti2
             last_update: 2013-10-16 11:14:30
                  n_rows: 1024
    clustered_index_size: 17
sum_of_other_index_sizes: 15
2 rows in set (0.00 sec)

...

Statistics was updated (for me it looks the same though). Now let's check the plan for the same query:

mysql> explain select * from ti1 inner join ti2 on ti1.c2 = ti2.c2 where ti2.id < 10;
+----+-------------+-------+-------+---------------+---------+---------+------+------+----------------------------------------------------+
| id | select_type | table | type  | possible_keys | key     | key_len | ref  | rows | Extra                                              |
+----+-------------+-------+-------+---------------+---------+---------+------+------+----------------------------------------------------+
|  1 | SIMPLE      | ti2   | range | PRIMARY,c2    | PRIMARY | 4       | NULL |    8 | Using where                                        |
|  1 | SIMPLE      | ti1   | ALL   | NULL          | NULL    | NULL    | NULL | 1024 | Using where; Using join buffer (Block Nested Loop) |
+----+-------------+-------+-------+---------------+---------+---------+------+------+----------------------------------------------------+
2 rows in set (0.00 sec)

Looks notably more reasonable! Quick check:

mysql> flush status;
Query OK, 0 rows affected (0.00 sec)

mysql> select * from ti1 inner join ti2 on ti1.c2 = ti2.c2 where ti2.id < 10;

...

7162 rows in set (0.01 sec)

mysql> show status like 'Handler_read%';
+-----------------------+-------+
| Variable_name         | Value |
+-----------------------+-------+
| Handler_read_first    | 2     |
| Handler_read_key      | 2     |
| Handler_read_last     | 0     |
| Handler_read_next     | 8     |
| Handler_read_prev     | 0     |
| Handler_read_rnd      | 0     |
| Handler_read_rnd_next | 1025  |
+-----------------------+-------+
7 rows in set (0.00 sec)

That Handler_read% values above do show the difference.

Suggested fix:
Describe in the manual that one should still run ANALYZE TABLE after loading data, no mater what value of innodb_stats_auto_recalc he has.

Expose more details about persistent statistics to help troubleshooting reasons of bad query plans.

If there is actually an optimizer bug/known deficiency exposed by the above, report and fix it separately.

Thank you for the bug report.

The following documentation pages have been updated:

http://dev.mysql.com/doc/refman/5.6/en/innodb-performance.html#innodb-persistent-stats

http://dev.mysql.com/doc/refman/5.7/en/innodb-performance.html#innodb-persistent-stats

The changes will appear in the next published documentation, in the next 24 hours or so. 

Thank you for the bug report.

I closed this bug accidentally. Please disregard my comment above. It is for another bug (Bug #70741).

The problem here is that InnoDB's statistics rec_per_key[] (or cardinality or n_diff[]) are published to the optimizer when ha_innobase::info(HA_STATUS_CONST) is called and it is called only when a table object is created (usually only when the table is opened). So when InnoDB recalculates the statistics due to subsequent DML, the old stats are being used to calculate the execution plan.

We will do some experiments and will probably call ha_innobase::info(HA_STATUS_CONST) at the beginning of each query.

Bug # 71576 is a duplicate of this bug.