Bug #36513 | Wildly incorrect cardinality stats causing poor execution plans | |
---|---|---|---
Submitted: | 5 May 2008 16:45 | Modified: | 23 Sep 2009 0:36
Reporter: | Gordon Shannon | Email Updates: |
Status: | Closed | Impact on me: |
Category: | MySQL Server: InnoDB storage engine | Severity: | S2 (Serious)
Version: | 5.1 | OS: | Any
Assigned to: | | CPU Architecture: | Any
[5 May 2008 16:45]
Gordon Shannon
[5 May 2008 16:51]
Gordon Shannon
select count(distinct dw_topic_dim_id) from dw_hist_message_fact;
+---------------------------------+
| count(distinct dw_topic_dim_id) |
+---------------------------------+
|                           13684 |
+---------------------------------+
[5 May 2008 19:24]
Valeriy Kravchuk
Thank you for the problem report. Can you upload the smallest set of the table's data that demonstrates the behaviour described? Is it possible for you to alter the storage engine to MyISAM and check whether the cardinality estimation is more correct then? I suspect it is NOT an optimizer bug but a problem in InnoDB's implementation of the statistics estimation routines.
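(For reference, a minimal sketch of the check being asked for here, to be run on a copy of the data rather than in production; the table name is taken from the report above:)

ALTER TABLE dw_hist_message_fact ENGINE = MyISAM;  -- rebuilds the whole table; can take hours on large tables
ANALYZE TABLE dw_hist_message_fact;                -- refresh the statistics under the new engine
SHOW INDEX FROM dw_hist_message_fact;              -- compare Cardinality against the InnoDB values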
[5 May 2008 23:01]
Gordon Shannon
Regarding altering the table to MyISAM: that would probably take hours, and we require the table to be transactional. Are you suggesting I do that because it would fix my problem, or just because you want to see what it does?
[5 May 2008 23:17]
Gordon Shannon
It's going to be very tedious and time-consuming, if not impossible, for me to generate a data set that consistently reproduces this behavior. From what I have heard, and from my own observations, I believe this is most likely a result of the "8 random dives". Is it not true that the cardinality is the result of a calculation made from 8 random samples of the index? If so, it would not surprise me to see cardinality numbers that are far too high or far too low, and they would generally get worse the more rows there are in the table. My guess is that in this case the sampling happened to find the same topic id 8 times in a row and assumed the "lowest" cardinality, which I often see as the magic number 18.
[6 May 2008 3:31]
Valeriy Kravchuk
Yes, I wanted to check whether it is related to InnoDB only and its random-dives method. If the problem were repeatable with MyISAM, then it would be in the MySQL server's code, not in the storage engine's. And please do not check this in production, as MyISAM is not an option for you anyway.
[6 May 2008 16:20]
Gordon Shannon
I do not have a database to try this on at the moment, given that it would be unusable for 8 hours or so. I will try to get it done as soon as I can.
[6 May 2008 18:30]
Gordon Shannon
Here are some possibly useful stats on the selectivity of the column in question (dw_topic_dim_id):

- Rows in table: 59 million
- Number of distinct values: 13,684
- Average number of rows per value: 4,350
- Number of values with > 100,000 rows: 98
- Maximum number of rows with a single value: 6,070,442 !

Could the random dive be hitting this value all 8 times?
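(For reference, figures like these can be gathered with queries along the following lines; a sketch using the table and column names from the report. The GROUP BY over 59 million rows will itself be slow:)

SELECT COUNT(*) FROM dw_hist_message_fact;                         -- total rows
SELECT COUNT(DISTINCT dw_topic_dim_id) FROM dw_hist_message_fact;  -- distinct values
SELECT dw_topic_dim_id, COUNT(*) AS cnt                            -- most frequent single value
  FROM dw_hist_message_fact
 GROUP BY dw_topic_dim_id
 ORDER BY cnt DESC
 LIMIT 1;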
[6 May 2008 18:47]
Peter Laursen
A simple statistical calculation of the probability that this would happen: (6,070,442 / 59,000,000)^8 ≈ 0.0000013%. Even if all distinct values were equally probable (which they are not), the total chance that the same value would be hit all 8 times would be at most (6,070,442 / 59,000,000)^8 × 13,684 ≈ 0.017%. To put it another way: the dives would land on that one most-frequent value all 8 times only about once in every 80 million samplings. The probability that your neighboring nuclear power station will blow up tomorrow is much, much higher, I guess! Peter (not a MySQL person)
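(The arithmetic above can be checked directly in the mysql client; a one-off sketch using the figures from the previous comment:)

SELECT POW(6070442/59000000, 8)         AS p_same_value_8_times,   -- ~1.26e-8, i.e. ~0.0000013%
       POW(6070442/59000000, 8) * 13684 AS upper_bound_any_value;  -- ~1.7e-4,  i.e. ~0.017%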
[6 May 2008 19:16]
Gordon Shannon
I suppose it depends on how truly random the algorithm is.
[3 Jul 2008 22:43]
Maggie Pong
I am seeing the same problem on an InnoDB table with 598,982 rows. Analyzing the table multiple times gives vastly different cardinality values, ranging from 17 to ~1,500 to ~3,000. If your table has the same column in different indexes, you also get different cardinalities across the various indexes. Once I converted that table to MyISAM, the cardinality was very consistent and close to the actual distinct-value count. How can MySQL rely on these unreliable InnoDB stats to come up with a query plan? ANALYZE on InnoDB has got to do a better job of sampling. Maggie
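(A minimal way to observe this instability; a sketch where t stands for any large InnoDB table:)

ANALYZE TABLE t;
SELECT index_name, seq_in_index, column_name, cardinality
  FROM information_schema.statistics
 WHERE table_name = 't';
-- Repeating the two statements above on InnoDB can return very different
-- cardinality values from run to run, because each ANALYZE re-samples
-- a handful of random index pages.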
[8 Jul 2008 3:19]
Gordon Shannon
I loaded 84 million rows into an identical MyISAM table and viewed the statistics. The cardinality here seems to be much more accurate.

[root@db2] 16:57:15 6> show index from messages;
+----------+------------+--------------------------+--------------+-------------------------+-----------+-------------+----------+--------+------+------------+----------+
| Table    | Non_unique | Key_name                 | Seq_in_index | Column_name             | Collation | Cardinality | Sub_part | Packed | Null | Index_type | Comment  |
+----------+------------+--------------------------+--------------+-------------------------+-----------+-------------+----------+--------+------+------------+----------+
| messages |          0 | PRIMARY                  |            1 | dw_hist_message_fact_id | A         |    84806491 | NULL     | NULL   |      | BTREE      |          |
| messages |          0 | message_topic_idx        |            1 | message_lookup_id       | A         |    28268830 | NULL     | NULL   |      | BTREE      |          |
| messages |          0 | message_topic_idx        |            2 | dw_topic_dim_id         | A         |    84806491 | NULL     | NULL   | YES  | BTREE      |          |
| messages |          1 | dw_date_idx              |            1 | dw_date_dim_id          | A         |        1323 | NULL     | NULL   |      | BTREE      | disabled |
| messages |          1 | dw_date_idx              |            2 | dw_topic_dim_id         | A         |     1367846 | NULL     | NULL   | YES  | BTREE      | disabled |
| messages |          1 | board_topic_density_idx  |            1 | dw_source_dim_id        | A         |    14134415 | NULL     | NULL   |      | BTREE      | disabled |
| messages |          1 | board_topic_density_idx  |            2 | dw_date_dim_id          | A         |    14134415 | NULL     | NULL   |      | BTREE      | disabled |
| messages |          1 | board_topic_density_idx  |            3 | valid                   | A         |    14134415 | NULL     | NULL   |      | BTREE      | disabled |
| messages |          1 | board_topic_density_idx  |            4 | dw_topic_dim_id         | A         |    84806491 | NULL     | NULL   | YES  | BTREE      | disabled |
| messages |          1 | topic_created_valid_idx2 |            1 | dw_topic_dim_id         | A         |     1367846 | NULL     | NULL   | YES  | BTREE      | disabled |
| messages |          1 | topic_created_valid_idx2 |            2 | ci_time                 | A         |    84806491 | NULL     | NULL   | YES  | BTREE      | disabled |
| messages |          1 | topic_created_valid_idx2 |            3 | valid                   | A         |    84806491 | NULL     | NULL   |      | BTREE      | disabled |
| messages |          1 | topic_created_valid_idx2 |            4 | dw_source_category_id   | A         |    84806491 | NULL     | NULL   | YES  | BTREE      | disabled |
| messages |          1 | topic_created_valid_idx2 |            5 | alg_ci_rank_rollup      | A         |    84806491 | NULL     | NULL   |      | BTREE      | disabled |
| messages |          1 | source_topic_date_idx    |            1 | dw_source_dim_id        | A         |    14134415 | NULL     | NULL   |      | BTREE      | disabled |
| messages |          1 | source_topic_date_idx    |            2 | dw_topic_dim_id         | A         |    84806491 | NULL     | NULL   | YES  | BTREE      | disabled |
| messages |          1 | source_topic_date_idx    |            3 | dw_date_dim_id          | A         |    84806491 | NULL     | NULL   |      | BTREE      | disabled |
| messages |          1 | message_lookup_idx       |            1 | message_lookup_id       | A         |    42403245 | NULL     | NULL   |      | BTREE      | disabled |
| messages |          1 | created_source_idx       |            1 | created_time            | A         |     7709681 | NULL     | NULL   |      | BTREE      | disabled |
| messages |          1 | created_source_idx       |            2 | source_type             | A         |     7709681 | NULL     | NULL   | YES  | BTREE      | disabled |
| messages |          1 | source_time_idx          |            1 | dw_source_dim_id        | A         |    14134415 | NULL     | NULL   |      | BTREE      | disabled |
| messages |          1 | source_time_idx          |            2 | ci_time                 | A         |    28268830 | NULL     | NULL   | YES  | BTREE      | disabled |
| messages |          1 | fk_message_language      |            1 | language_code           | A         |         220 | NULL     | NULL   |      | BTREE      | disabled |
| messages |          1 | nk_parent                |            1 | parent_id               | A         |    84806491 | NULL     | NULL   | YES  | BTREE      | disabled |
| messages |          1 | nk_updated               |            1 | updated_on              | A         |    84806491 | NULL     | NULL   | YES  | BTREE      | disabled |
| messages |          1 | nk_updated               |            2 | source_type             | A         |    84806491 | NULL     | NULL   | YES  | BTREE      | disabled |
| messages |          1 | nk_updated               |            3 | valid                   | A         |    84806491 | NULL     | NULL   |      | BTREE      | disabled |
+----------+------------+--------------------------+--------------+-------------------------+-----------+-------------+----------+--------+------+------------+----------+
27 rows in set (0.07 sec)
[8 Jul 2008 8:17]
Susanne Ebrecht
ANALYZE recalculates the statistics. That is: when you have an index, statistics values are created for it. When you run lots of DML statements (UPDATE, INSERT, DELETE), these values can get out of date. Running ANALYZE rebuilds the statistics, so you should have proper statistics afterwards. At the moment I can neither reproduce your problem nor find anything abnormal in the source code, but I will have our InnoDB developers check this.
[8 Jul 2008 16:27]
Gordon Shannon
All the statistics I have reported to you were gathered immediately after running ANALYZE TABLE. That's part of the issue: even after analyzing the table, the cardinality is still way off on certain columns.
[30 Jul 2008 3:56]
Gordon Shannon
Just to let you know, this problem is not going away. Today the table in question is 120 million rows, double the size when I first reported this problem. Here's a query where InnoDB does a primary key scan and takes 20 minutes. If I force it to use the correct index, it takes 0.1 seconds.

[root@db3] 21:50:33 2> explain
    -> select distinct message_lookup_id from ci_sentiment_db.dw_hist_message_fact
    -> where dw_date_dim_id=20080423 order by dw_hist_message_fact_id limit 200;
+----+-------------+----------------------+-------+---------------+---------+---------+------+-----------+------------------------------+
| id | select_type | table                | type  | possible_keys | key     | key_len | ref  | rows      | Extra                        |
+----+-------------+----------------------+-------+---------------+---------+---------+------+-----------+------------------------------+
|  1 | SIMPLE      | dw_hist_message_fact | index | dw_date_idx   | PRIMARY | 8       | NULL | 133375624 | Using where; Using temporary |
+----+-------------+----------------------+-------+---------------+---------+---------+------+-----------+------------------------------+
1 row in set (0.06 sec)

[root@db3] 21:50:36 3> explain
    -> select distinct message_lookup_id from ci_sentiment_db.dw_hist_message_fact
    -> use index (dw_date_idx)
    -> where dw_date_dim_id=20080423 order by dw_hist_message_fact_id limit 200;
+----+-------------+----------------------+------+---------------+-------------+---------+-------+------+----------------------------------------------+
| id | select_type | table                | type | possible_keys | key         | key_len | ref   | rows | Extra                                        |
+----+-------------+----------------------+------+---------------+-------------+---------+-------+------+----------------------------------------------+
|  1 | SIMPLE      | dw_hist_message_fact | ref  | dw_date_idx   | dw_date_idx | 4       | const |  196 | Using where; Using temporary; Using filesort |
+----+-------------+----------------------+------+---------------+-------------+---------+-------+------+----------------------------------------------+
1 row in set (0.15 sec)

The index in question:
+----------------------+------------+-------------+--------------+-------------------------+-----------+-------------+----------+--------+------+------------+---------+
| Table                | Non_unique | Key_name    | Seq_in_index | Column_name             | Collation | Cardinality | Sub_part | Packed | Null | Index_type | Comment |
+----------------------+------------+-------------+--------------+-------------------------+-----------+-------------+----------+--------+------+------------+---------+
| dw_hist_message_fact |          0 | PRIMARY     |            1 | dw_hist_message_fact_id | A         |   130170129 | NULL     | NULL   |      | BTREE      |         |
| dw_hist_message_fact |          1 | dw_date_idx |            1 | dw_date_dim_id          | A         |          17 | NULL     | NULL   |      | BTREE      |         |
| dw_hist_message_fact |          1 | dw_date_idx |            2 | dw_topic_dim_id         | A         |      110127 | NULL     | NULL   | YES  | BTREE      |         |
+----------------------+------------+-------------+--------------+-------------------------+-----------+-------------+----------+--------+------+------------+---------+

Notice the cardinality is 17 on the date column, even though there are about 140 distinct dates. Even so, I don't understand why it would choose to do a full primary key scan.
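(As an interim workaround, the index choice can be pinned in the query itself; a sketch based on the query above. FORCE INDEX is stronger than USE INDEX in that the optimizer will not fall back to a table scan unless the named index cannot be used at all:)

SELECT DISTINCT message_lookup_id
  FROM ci_sentiment_db.dw_hist_message_fact FORCE INDEX (dw_date_idx)
 WHERE dw_date_dim_id = 20080423
 ORDER BY dw_hist_message_fact_id
 LIMIT 200;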
[16 Oct 2008 7:21]
Vasil Dimov
Gordon, this is a well-known problem, and a workaround will be available in the next InnoDB Plugin release (1.0.2), which will be released soon. Unfortunately, 5.1 is frozen for such changes.

The cardinality of every index (the number of distinct values) is calculated when the table is opened, at SHOW TABLE STATUS, at ANALYZE TABLE, and in other circumstances (such as when the table has changed too much). Note that if the mysql client is running with the auto-rehash setting turned on (the default), this causes all tables to be opened when it starts.

Previously InnoDB sampled 8 random pages from the index to get an estimate of the cardinality. Now the number of sampled pages can be changed via the global parameter innodb_stats_sample_pages, which can be tuned at runtime. The default value for this parameter is 8. If the value of this parameter is changed, there may be serious consequences:

- small values (say, 1) can cause large errors in the table stats;
- values much larger than 8 (say, 100 or 1000) can slow down table opening, SHOW TABLE STATUS, etc.;
- query plans may differ from the old ones.

In short, a large number of dives will lead to better stats but will cause some slowdown (which could well be insignificant), while a small number of dives will lead to worse stats but no slowdown. It is a matter of experimentation to find the best number of dives for a particular workload. One page is 16 KB, so sampling N random pages means the disk will do N seeks, reading 16 KB at each.
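(For example, a sketch of the tuning described above; it requires a server with the InnoDB Plugin's innodb_stats_sample_pages variable, and the value 64 is only an arbitrary illustration:)

SET GLOBAL innodb_stats_sample_pages = 64;  -- default is 8; each sampled page costs a ~16 KB random read
ANALYZE TABLE dw_hist_message_fact;         -- re-samples the indexes with the new page count
SHOW INDEX FROM dw_hist_message_fact;       -- check whether Cardinality is now closer to reality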
[24 Oct 2008 16:46]
Mark Callaghan
Vasil, that workaround will be extremely difficult to use, because stats collection for InnoDB has to be fast, as it is done each time a table is first opened. When can we have optional persistent stats for InnoDB tables? This is something that will eventually be in a Google patch. By optional, I mean that the mysql schema gains a stats table into which statistics can be inserted for some tables, and these override anything the storage engine would provide.
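(A sketch of the shape such an optional override table might take; this is entirely hypothetical, as no such table existed in the server at the time:)

CREATE TABLE mysql.index_stats_override (
  db_name     VARCHAR(64)     NOT NULL,
  table_name  VARCHAR(64)     NOT NULL,
  index_name  VARCHAR(64)     NOT NULL,
  cardinality BIGINT UNSIGNED NOT NULL,  -- value the optimizer would use instead of sampled stats
  PRIMARY KEY (db_name, table_name, index_name)
);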
[1 Apr 2009 19:15]
MySQL Verification Team
Could the root cause of this problem be related to the root cause of bug #43660 as well?
[16 Sep 2009 21:21]
Calvin Sun
Changing this to Documenting, as the implementation is already in the InnoDB Plugin (1.0.4 or earlier), and hence is fixed in 5.4.2.
[23 Sep 2009 0:36]
Paul DuBois
Noted in 5.4.2 changelog. For InnoDB tables, poor cardinality statistics could result in poor query execution plans.