Bug #70630 Why one can access persistent statistics data while they are changing?
Submitted: 16 Oct 2013 8:46 Modified: 17 Oct 2013 23:02
Reporter: Valeriy Kravchuk Email Updates:
Status: Verified Impact on me:
None 
Category:MySQL Server: InnoDB storage engine Severity:S3 (Non-critical)
Version:5.6.14 OS:Any
Assigned to: CPU Architecture:Any

[16 Oct 2013 8:46] Valeriy Kravchuk
Description:
During my work on the test case for http://bugs.mysql.com/bug.php?id=70629 I've noted that one can actually get somewhat wrong/dynamically changing results for some time from persistent InnoDB statistics tables in mysql database:

mysql> select * from mysql.innodb_table_stats where table_name like 'ti%'\G
*************************** 1. row ***************************
           database_name: test
              table_name: ti1
             last_update: 2013-10-16 11:03:02
                  n_rows: 1024
    clustered_index_size: 17
sum_of_other_index_sizes: 0
*************************** 2. row ***************************
           database_name: test
              table_name: ti2
             last_update: 2013-10-16 11:34:39
                  n_rows: 1190
    clustered_index_size: 19
sum_of_other_index_sizes: 17
2 rows in set (0.00 sec)

mysql> select * from mysql.innodb_table_stats where table_name like 'ti%'\G
*************************** 1. row ***************************
           database_name: test
              table_name: ti1
             last_update: 2013-10-16 11:03:02
                  n_rows: 1024
    clustered_index_size: 17
sum_of_other_index_sizes: 0
*************************** 2. row ***************************
           database_name: test
              table_name: ti2
             last_update: 2013-10-16 11:34:49
                  n_rows: 2024
    clustered_index_size: 32
sum_of_other_index_sizes: 29
2 rows in set (0.00 sec)

Note that no explicit data change happened in between these runs of the same query.

Surely it may take time to re-estimate statistics when  innodb_stats_auto_recalc=ON, but maybe it makes sense to block access to the persistent statistics while it is in the process of re-estimation? 

Assuming that optimizer also sees not completely updated statistics, it may end up with wrong row estimations and wrong plan. I wonder if this is the reason for cases like http://bugs.mysql.com/bug.php?id=70617, when bad plan may be generated just because query is executed too fast after big change of data?

How to repeat:
Start with a test case (ti1 and ti2 tables) from http://bugs.mysql.com/bug.php?id=70629, then:

mysql> insert into ti2(c2) select c2 from ti1;
Query OK, 1024 rows affected (1.71 sec)
Records: 1024  Duplicates: 0  Warnings: 0

mysql> select * from mysql.innodb_table_stats where table_name like 'ti%'\G
*************************** 1. row ***************************
           database_name: test
              table_name: ti1
             last_update: 2013-10-16 11:03:02
                  n_rows: 1024
    clustered_index_size: 17
sum_of_other_index_sizes: 0
*************************** 2. row ***************************
           database_name: test
              table_name: ti2
             last_update: 2013-10-16 11:34:39
                  n_rows: 1190
    clustered_index_size: 19
sum_of_other_index_sizes: 17
2 rows in set (0.00 sec)

mysql> select * from mysql.innodb_table_stats where table_name like 'ti%'\G
*************************** 1. row ***************************
           database_name: test
              table_name: ti1
             last_update: 2013-10-16 11:03:02
                  n_rows: 1024
    clustered_index_size: 17
sum_of_other_index_sizes: 0
*************************** 2. row ***************************
           database_name: test
              table_name: ti2
             last_update: 2013-10-16 11:34:39
                  n_rows: 1190
    clustered_index_size: 19
sum_of_other_index_sizes: 17
2 rows in set (0.00 sec)

mysql> select * from mysql.innodb_table_stats where table_name like 'ti%'\G
*************************** 1. row ***************************
           database_name: test
              table_name: ti1
             last_update: 2013-10-16 11:03:02
                  n_rows: 1024
    clustered_index_size: 17
sum_of_other_index_sizes: 0
*************************** 2. row ***************************
           database_name: test
              table_name: ti2
             last_update: 2013-10-16 11:34:39
                  n_rows: 1190
    clustered_index_size: 19
sum_of_other_index_sizes: 17
2 rows in set (0.00 sec)

mysql> select * from mysql.innodb_table_stats where table_name like 'ti%'\G
*************************** 1. row ***************************
           database_name: test
              table_name: ti1
             last_update: 2013-10-16 11:03:02
                  n_rows: 1024
    clustered_index_size: 17
sum_of_other_index_sizes: 0
*************************** 2. row ***************************
           database_name: test
              table_name: ti2
             last_update: 2013-10-16 11:34:39
                  n_rows: 1190
    clustered_index_size: 19
sum_of_other_index_sizes: 17
2 rows in set (0.00 sec)

mysql> select * from mysql.innodb_table_stats where table_name like 'ti%'\G
*************************** 1. row ***************************
           database_name: test
              table_name: ti1
             last_update: 2013-10-16 11:03:02
                  n_rows: 1024
    clustered_index_size: 17
sum_of_other_index_sizes: 0
*************************** 2. row ***************************
           database_name: test
              table_name: ti2
             last_update: 2013-10-16 11:34:49
                  n_rows: 2024
    clustered_index_size: 32
sum_of_other_index_sizes: 29
2 rows in set (0.00 sec)

mysql> select * from mysql.innodb_table_stats where table_name like 'ti%'\G
*************************** 1. row ***************************
           database_name: test
              table_name: ti1
             last_update: 2013-10-16 11:03:02
                  n_rows: 1024
    clustered_index_size: 17
sum_of_other_index_sizes: 0
*************************** 2. row ***************************
           database_name: test
              table_name: ti2
             last_update: 2013-10-16 11:34:49
                  n_rows: 2024
    clustered_index_size: 32
sum_of_other_index_sizes: 29
2 rows in set (0.00 sec)

mysql> select * from mysql.innodb_table_stats where table_name like 'ti%'\G
*************************** 1. row ***************************
           database_name: test
              table_name: ti1
             last_update: 2013-10-16 11:03:02
                  n_rows: 1024
    clustered_index_size: 17
sum_of_other_index_sizes: 0
*************************** 2. row ***************************
           database_name: test
              table_name: ti2
             last_update: 2013-10-16 11:34:49
                  n_rows: 2024
    clustered_index_size: 32
sum_of_other_index_sizes: 29
2 rows in set (0.00 sec)

As you can see in the above, initial n_rows value is far from real one and it takes some time before it is updated to a more realistic one.

Suggested fix:
Explain this in the manual?

Block access to persistent statistics while it is still estimated in cases like the above (table is getting 2 times bigger because of some INSERT ... SELECT)?
[17 Oct 2013 23:02] MySQL Verification Team
Thank you for the bug report.