Bug #71576 InnoDB Table Statistics Is Not Updated on Bulk INSERT
Submitted: 4 Feb 2014 6:14 Modified: 4 Feb 2014 18:08
Reporter: Jervin R Email Updates:
Status: Duplicate Impact on me:
None 
Category:MySQL Server: Optimizer Severity:S3 (Non-critical)
Version:5.5.36 OS:Any
Assigned to: CPU Architecture:Any
Tags: innodb, statistics

[4 Feb 2014 6:14] Jervin R
Description:
When an InnoDB table is populated by INSERT INTO ... SELECT or CREATE TABLE ... SELECT (probably other conditions as well), the resulting table statistics is not reliable.

How to repeat:
mysql5536> SHOW CREATE TABLE t1 \G
*************************** 1. row ***************************
       Table: t1
Create Table: CREATE TABLE `t1` (
  `h_id` bigint(20) unsigned NOT NULL AUTO_INCREMENT,
  `u_id` int(10) unsigned NOT NULL,
  `cn_id` int(10) unsigned NOT NULL,
  PRIMARY KEY (`h_id`)
) ENGINE=InnoDB AUTO_INCREMENT=87409 DEFAULT CHARSET=latin1
1 row in set (0.00 sec)

mysql5536> SELECT COUNT(*) FROM t1;
+----------+
| COUNT(*) |
+----------+
|    50000 |
+----------+
1 row in set (0.01 sec)

mysql5536> SHOW INDEXES FROM t1;
+-------+------------+----------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
| Table | Non_unique | Key_name | Seq_in_index | Column_name | Collation | Cardinality | Sub_part | Packed | Null | Index_type | Comment | Index_comment |
+-------+------------+----------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
| t1    |          0 | PRIMARY  |            1 | h_id        | A         |       50276 |     NULL | NULL   |      | BTREE      |         |               |
+-------+------------+----------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
1 row in set (0.00 sec)

mysql5536> CREATE TABLE t2 (h_id BIGINT UNSIGNED NOT NULL, u_id INT UNSIGNED NOT NULL, cn_id INT UNSIGNED NOT NULL, KEY(u_id), KEY(cn_id)) ENGINE=InnoDB;
Query OK, 0 rows affected (0.00 sec)

mysql5536> INSERT INTO t2 SELECT * FROM t1;
Query OK, 50000 rows affected (0.34 sec)
Records: 50000  Duplicates: 0  Warnings: 0

mysql5536> SHOW INDEXES FROM t2;
+-------+------------+----------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
| Table | Non_unique | Key_name | Seq_in_index | Column_name | Collation | Cardinality | Sub_part | Packed | Null | Index_type | Comment | Index_comment |
+-------+------------+----------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
| t2    |          1 | u_id     |            1 | u_id        | A         |       50386 |     NULL | NULL   |      | BTREE      |         |               |
| t2    |          1 | cn_id    |            1 | cn_id       | A         |       50386 |     NULL | NULL   |      | BTREE      |         |               |
+-------+------------+----------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
2 rows in set (0.00 sec)

mysql5536> SELECT COUNT(*) FROM t2;
+----------+
| COUNT(*) |
+----------+
|    50000 |
+----------+
1 row in set (0.00 sec)

mysql5536> ANALYZE TABLE t2;
+---------+---------+----------+----------+
| Table   | Op      | Msg_type | Msg_text |
+---------+---------+----------+----------+
| test.t2 | analyze | status   | OK       |
+---------+---------+----------+----------+
1 row in set (0.00 sec)

mysql5536> SHOW INDEXES FROM t2;
+-------+------------+----------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
| Table | Non_unique | Key_name | Seq_in_index | Column_name | Collation | Cardinality | Sub_part | Packed | Null | Index_type | Comment | Index_comment |
+-------+------------+----------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
| t2    |          1 | u_id     |            1 | u_id        | A         |        1749 |     NULL | NULL   |      | BTREE      |         |               |
| t2    |          1 | cn_id    |            1 | cn_id       | A         |        1389 |     NULL | NULL   |      | BTREE      |         |               |
+-------+------------+----------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
2 rows in set (0.00 sec)

mysql5536> SHOW GLOBAL VARIABLES LIKE '%stats%';
+---------------------------+---------------+
| Variable_name             | Value         |
+---------------------------+---------------+
| innodb_stats_method       | nulls_equal   |
| innodb_stats_on_metadata  | ON            |
| innodb_stats_sample_pages | 8             |
| myisam_stats_method       | nulls_unequal |
+---------------------------+---------------+
4 rows in set (0.00 sec)

mysql5536> CREATE TABLE t3 (h_id BIGINT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY, u_id INT UNSIGNED NOT NULL, cn_id INT UNSIGNED NOT NULL, KEY(u_id), KEY(cn_id)) ENGINE=InnoDB;
Query OK, 0 rows affected (0.01 sec)

mysql5536> INSERT INTO t3 SELECT * FROM t1;
Query OK, 50000 rows affected (0.35 sec)
Records: 50000  Duplicates: 0  Warnings: 0

mysql5536> SHOW INDEXES FROM t3;
+-------+------------+----------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
| Table | Non_unique | Key_name | Seq_in_index | Column_name | Collation | Cardinality | Sub_part | Packed | Null | Index_type | Comment | Index_comment |
+-------+------------+----------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
| t3    |          0 | PRIMARY  |            1 | h_id        | A         |       50276 |     NULL | NULL   |      | BTREE      |         |               |
| t3    |          1 | u_id     |            1 | u_id        | A         |       50276 |     NULL | NULL   |      | BTREE      |         |               |
| t3    |          1 | cn_id    |            1 | cn_id       | A         |       50276 |     NULL | NULL   |      | BTREE      |         |               |
+-------+------------+----------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
3 rows in set (0.00 sec)

Suggested fix:
ANALYZE TABLE fixes the cardinality issue.
[4 Feb 2014 6:15] Jervin R
This looks related to http://bugs.mysql.com/bug.php?id=70629
[4 Feb 2014 6:35] Jervin R
I can repeat this at least up to 5.5.23.
[4 Feb 2014 11:54] MySQL Verification Team
Thank you for the bug report. This bug looks to me duplicate/related
to http://bugs.mysql.com/bug.php?id=70629 . Please see last comment
in that bug. Thanks.
[4 Feb 2014 13:51] Jervin R
Miguel,

Yes, I suspect that bug may be related but I cannot tell easily if its a duplicate as based on the description it may only involve persistent stats on 5.6. I will have to defer to the developer involved on that bug if its a duplicate or not.
[4 Feb 2014 17:11] Valeriy Kravchuk
Formally bug #70629 is reported (and verified) for 5.6 only and is related to the way persistent statistics is recalculated (not fast enough after such a change, even if it is set to be recalculated).

Here we have 5.5 and statistics is re-calculated based on different criteria. The root cause (as stated in the comment dated [29 Nov 2013 11:48] Vasil Dimov in my bug) may be the same though - outdated statistics is used. Workaround (explicit ANALYZE after bulk INSERT) is also the same. 

So, I'd say it's up to InnoDB developers to decide are these cases any different from the fix point of view.
[4 Feb 2014 18:08] MySQL Verification Team
This bug is indeed a duplicate of the bug # 70629. 

There were changes between 5.5 and 5.6 in this area, but most of them are related to persistent statistics and in the much larger configurability with --innodb-stats-* options.

Still, in 5.6, just as in 5.5, it is InnoDB Storage Engine which decides whether data has changed sufficiently in order to warrant new calculus of the persistent data. There will be a final solution to this problem when, one day, histograms are introduced for preserving index statistics.

The workaround remains the same, which is to run ANALYZE whenever you feel that statistics are significantly changed by any DML.