Bug #109334 Incorrect index cardinality when innodb_stats_method=nulls_ignored
Submitted: 12 Dec 2022 0:33 Modified: 13 Dec 2022 18:54
Reporter: Manuel Ung Email Updates:
Status: Verified Impact on me:
None 
Category:MySQL Server: InnoDB storage engine Severity:S3 (Non-critical)
Version:8.0.31 OS:Any
Assigned to: CPU Architecture:Any

[12 Dec 2022 0:33] Manuel Ung
Description:
When innodb persistent stats are used, the calculated index cardinality stats are wrong when innodb_stats_method=nulls_ignored is used.

The problem is in innodb_rec_per_key where we try to calculate distinctness using index->stat_n_non_null_key_vals. In the persistent stats code path, stat_n_non_null_key_vals is never populated, meaning it is always 0. This means that any calculations will assume that there are 0 non-null key vals (ie. all keys are null). We even assume this for PRIMARY keys, where by definition, all keys must be null.

https://bugs.mysql.com/bug.php?id=73598 seems to be describing the same issue, though the reporter did not provide a repro/code analysis.

How to repeat:
create table t (i int, j int, k int, primary key(i, j),  key(k, j, i)) engine=innodb;
insert into t (WITH RECURSIVE a (i) AS ( SELECT 0  union all SELECT i+1  from a where i < 9  ), b (i) AS  (SELECT x.i + y.i * 10 + z.i * 100 + w.i * 1000 +  v.i * 10000 FROM a x, a y, a z, a w, a v) SELECT 1, b.i, null from b order by i);

insert into t (WITH RECURSIVE a (i) AS ( SELECT 0  union all SELECT i+1  from a where i < 9  ), b (i) AS  (SELECT x.i + y.i * 10 + z.i * 100 + w.i * 1000  FROM a x, a y, a z, a w) SELECT 2, b.i, b.i from b order by i);
set global innodb_stats_method='nulls_ignored';
analyze table t;
show index from t;
drop table t;

For me, this gives these results (the actual cardinality will vary due to differences in random sampling):
Table   Non_unique      Key_name        Seq_in_index    Column_name     Collation       Cardinality     Sub_part        Packed  Null    Index_type      Comment Index_comment   Visible Expression
t       0       PRIMARY 1       i       A       110635  NULL    NULL            BTREE                   YES     NULL
t       0       PRIMARY 2       j       A       110635  NULL    NULL            BTREE                   YES     NULL
t       1       k       1       k       A       110635  NULL    NULL    YES     BTREE                   YES     NULL
t       1       k       2       j       A       110635  NULL    NULL            BTREE                   YES     NULL
t       1       k       3       i       A       110635  NULL    NULL            BTREE                   YES     NULL

You can see that the PRIMARY key has max cardinality on the first column already, which is wrong for this table. There are only two distinct 'i' values (1 and 2).

Suggested fix:
Populate stat_n_non_null_key_vals in the persistent paths as well.
[12 Dec 2022 0:33] Manuel Ung
As a side note, https://bugs.mysql.com/bug.php?id=95507 describes a bug where innodb_stats_method=nulls_unequal doesn't work, because it behaves exactly the same as nulls_equal. This can be easily verified by reading the code.

This means that of all 3 settings (nulls_equal, nulls_unequal, nulls_ignored), there's only one setting (the default nulls_equal) that isn't broken for persistent stats.
[12 Dec 2022 8:35] MySQL Verification Team
Hello Manuel Ung,

Thank you for the report and feedback.

regards,
Umesh
[13 Dec 2022 9:02] huahua xu
Hi Manuel Ung:

For the table and index statistics collection in InnoDB, you need to known that the approximate number of non-null key values will not calculated, If persistent statistics is enabled.

1. set innodb_stats_persistent = on.

mysql> set global innodb_stats_persistent = on;
mysql>  analyze table t;
mysql>  show index from t;
+-------+------------+----------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+---------------+---------+------------+
| Table | Non_unique | Key_name | Seq_in_index | Column_name | Collation | Cardinality | Sub_part | Packed | Null | Index_type | Comment | Index_comment | Visible | Expression |
+-------+------------+----------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+---------------+---------+------------+
| t     |          0 | PRIMARY  |            1 | i           | A         |      110157 |     NULL |   NULL |      | BTREE      |         |               | YES     | NULL       |
| t     |          0 | PRIMARY  |            2 | j           | A         |      110157 |     NULL |   NULL |      | BTREE      |         |               | YES     | NULL       |
| t     |          1 | k        |            1 | k           | A         |      110157 |     NULL |   NULL | YES  | BTREE      |         |               | YES     | NULL       |
| t     |          1 | k        |            2 | j           | A         |      110157 |     NULL |   NULL |      | BTREE      |         |               | YES     | NULL       |
| t     |          1 | k        |            3 | i           | A         |      110157 |     NULL |   NULL |      | BTREE      |         |               | YES     | NULL       |
+-------+------------+----------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+---------------+---------+------------+

1. set innodb_stats_persistent = off.

mysql> set global innodb_stats_persistent = off;
mysql> analyze table t;
mysql> show index from t;
+-------+------------+----------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+---------------+---------+------------+
| Table | Non_unique | Key_name | Seq_in_index | Column_name | Collation | Cardinality | Sub_part | Packed | Null | Index_type | Comment | Index_comment | Visible | Expression |
+-------+------------+----------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+---------------+---------+------------+
| t     |          0 | PRIMARY  |            1 | i           | A         |           1 |     NULL |   NULL |      | BTREE      |         |               | YES     | NULL       |
| t     |          0 | PRIMARY  |            2 | j           | A         |      112262 |     NULL |   NULL |      | BTREE      |         |               | YES     | NULL       |
| t     |          1 | k        |            1 | k           | A         |      100148 |     NULL |   NULL | YES  | BTREE      |         |               | YES     | NULL       |
| t     |          1 | k        |            2 | j           | A         |      100148 |     NULL |   NULL |      | BTREE      |         |               | YES     | NULL       |
| t     |          1 | k        |            3 | i           | A         |      101171 |     NULL |   NULL |      | BTREE      |         |               | YES     | NULL       |
+-------+------------+----------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+---------------+---------+------------+
[13 Dec 2022 18:54] Manuel Ung
Yes, I understand that non-null key values are not calculated for persistent stats. That is the bug I am reporting.