Bug #34488 Cardinality stats wrong after first analyze against MyISAM table
Submitted: 12 Feb 2008 12:50 Modified: 31 Oct 2008 19:24
Reporter: Mark Leith Email Updates:
Status: Not a Bug Impact on me:
None 
Category:MySQL Server: MyISAM storage engine Severity:S3 (Non-critical)
Version:5.0.54 OS:Any
Assigned to: V Venkateswaran CPU Architecture:Any

[12 Feb 2008 12:50] Mark Leith
Description:
After the first ANALYZE TABLE command against a MyISAM table, the index cardinality stats for secondary indexes are updated incorrectly for DML against the table. 

An example:

mysql> analyze table t1;
+---------+---------+----------+----------+
| Table   | Op      | Msg_type | Msg_text |
+---------+---------+----------+----------+
| test.t1 | analyze | status   | OK       | 
+---------+---------+----------+----------+
1 row in set (0.00 sec)

mysql> show indexes from t1;
+-------+------------+----------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+
| Table | Non_unique | Key_name | Seq_in_index | Column_name | Collation | Cardinality | Sub_part | Packed | Null | Index_type | Comment |
+-------+------------+----------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+
| t1    |          0 | PRIMARY  |            1 | i           | A         |           5 |     NULL | NULL   |      | BTREE      |         | 
| t1    |          1 | j        |            1 | j           | A         |           1 |     NULL | NULL   | YES  | BTREE      |         | 
+-------+------------+----------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+
2 rows in set (0.00 sec)

mysql> insert into t1 (j) values (1), (1), (1), (1), (1);
Query OK, 5 rows affected (0.00 sec)
Records: 5  Duplicates: 0  Warnings: 0

mysql> insert into t1 (j) values (1), (1), (1), (1), (1);
Query OK, 5 rows affected (0.00 sec)
Records: 5  Duplicates: 0  Warnings: 0

mysql> insert into t1 (j) values (1), (1), (1), (1), (1);
Query OK, 5 rows affected (0.00 sec)
Records: 5  Duplicates: 0  Warnings: 0

mysql> insert into t1 (j) values (1), (1), (1), (1), (1);
Query OK, 5 rows affected (0.00 sec)
Records: 5  Duplicates: 0  Warnings: 0

mysql> show indexes from t1;
+-------+------------+----------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+
| Table | Non_unique | Key_name | Seq_in_index | Column_name | Collation | Cardinality | Sub_part | Packed | Null | Index_type | Comment |
+-------+------------+----------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+
| t1    |          0 | PRIMARY  |            1 | i           | A         |          25 |     NULL | NULL   |      | BTREE      |         | 
| t1    |          1 | j        |            1 | j           | A         |           5 |     NULL | NULL   | YES  | BTREE      |         | 
+-------+------------+----------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+
2 rows in set (0.00 sec)

The cardinality on the j index should remain as 1, as the only value that is inserted in to that column is 1. It seems that the cardinality is simply increased by 1 for each insert statement - even on multi-row inserts. 

After the second ANALYZE TABLE stats are kept up to date properly.

How to repeat:
drop table if exists t1;
create table t1 (i int auto_increment, j int, primary key (i), key (j)) engine = myisam;
insert into t1 (j) values (1), (1), (1), (1), (1);
show table status\G
show indexes from t1;
analyze table t1;
show indexes from t1;
insert into t1 (j) values (1), (1), (1), (1), (1);
insert into t1 (j) values (1), (1), (1), (1), (1);
insert into t1 (j) values (1), (1), (1), (1), (1);
insert into t1 (j) values (1), (1), (1), (1), (1);
show indexes from t1;
insert into t1 (j) values (1), (1), (1), (1), (1);
show indexes from t1;
analyze table t1;
show indexes from t1;
[31 Oct 2008 19:24] V Venkateswaran
Statistics is documented to be updated on ANALYZE. Between ANALYZE calls the engine doesn't have to keep statistics up to date. Not a bug.
[31 Oct 2008 19:46] Valeriy Kravchuk
"Between ANALYZE calls the engine doesn't have to keep statistics up to date."

I agree with the above. Statistics may (and should!) remain the same as it was immediately after previous ANALYZE. Why it changes for MyISAM table, becoming totally incorrect? It should just not change at all. So, I still think this is a bug.
[31 Oct 2008 19:50] Mark Callaghan
I think this is still a bug. If index stats are allowed to become bogus, then analyze needs to be run all of the time.

The alternative is to update the docs to describe the current behavior.
[31 Oct 2008 20:35] Sergei Golubchik
Valeriy:

There's some piece of statistics that MyISAM updates only on ANALYZE. And another one - number of records in the table - that it keeps always up-to-date.

Cardinality depends on both, that's why you see it changing.

Mark:

No. What MyISAM remembers value distribution, and it is far more stable than cardinality, if the table is reasonably big, one doesn't need to run ANALYZE often. Once a year is enough, perhaps.

Of course, if the table is small - 5 rows in the test case - and you add 4 times as much rows with completely different value distribution, you have to re-run ANALYZE.