Description:
The combination of innodb_stats_persistent=0 and innodb_stats_method='nulls_ignored' gets wrong cardinalities computed for the table (which appear to be computed as if they are all different, much like a PK would show).
As a consequence, the optimizer will in some cases choose sub-optimal plans for some queries.
Tested with the following version:
mysql [localhost:23900] {msandbox} ((none)) > SELECT @@version, @@version_comment;
+-----------+------------------------------+
| @@version | @@version_comment |
+-----------+------------------------------+
| 8.0.19 | MySQL Community Server - GPL |
+-----------+------------------------------+
1 row in set (0.00 sec)
This may be related to https://bugs.mysql.com/bug.php?id=95507 (although it's not entirely the same behavior described), and seems to be exactly like https://bugs.mysql.com/bug.php?id=73598. I'm choosing to open a new one to present a clear reproducible test case in this format, instead of writing it as comment. Feel free to close as duplicate and reopen the other, if you think it's a better fit.
Note that in the following test case there will be no rows with NULL on `g`. However, the behavior is the same when NULL rows are present (I just omitted the outputs for brevity, but can include them if needed).
Thanks in advance!
How to repeat:
-- Create and populate the table:
USE test;
CREATE TABLE `t1` (
`i` int(11) NOT NULL AUTO_INCREMENT,
`s` varchar(64) DEFAULT NULL,
`t` time NOT NULL,
`g` int(11) NOT NULL,
PRIMARY KEY (`i`),
KEY key_g (`g`)
) ENGINE=InnoDB;
INSERT INTO t1 VALUES (NULL, uuid(), time(now()), (FLOOR( 1 + RAND( ) *60 )));
INSERT INTO t1 SELECT NULL, uuid(), time(now()), (FLOOR( 1 + RAND( ) *60 )) FROM t1;
[... repeat the same query as necessary, knowing it will duplicate the amount of rows each time ...]
INSERT INTO t1 SELECT NULL, uuid(), time(now()), (FLOOR( 1 + RAND( ) *60 )) FROM t1;
-- Check for NULL values and cardinalities (for completeness):
mysql [localhost:23900] {msandbox} (test) > SELECT COUNT(*) FROM t1;
+----------+
| COUNT(*) |
+----------+
| 2097152 |
+----------+
1 row in set (0.09 sec)
mysql [localhost:23900] {msandbox} (test) > SELECT COUNT(*) FROM t1 WHERE ISNULL(i);
+----------+
| COUNT(*) |
+----------+
| 0 |
+----------+
1 row in set (0.00 sec)
mysql [localhost:23900] {msandbox} (test) > SELECT COUNT(*) FROM t1 WHERE ISNULL(g);
+----------+
| COUNT(*) |
+----------+
| 0 |
+----------+
1 row in set (0.00 sec)
mysql [localhost:23900] {msandbox} (test) > SELECT COUNT(DISTINCT g) FROM t1;
+-------------------+
| COUNT(DISTINCT g) |
+-------------------+
| 60 |
+-------------------+
1 row in set (0.33 sec)
mysql [localhost:23900] {msandbox} (test) > SELECT COUNT(DISTINCT i) FROM t1;
+-------------------+
| COUNT(DISTINCT i) |
+-------------------+
| 2097152 |
+-------------------+
1 row in set (1.35 sec)
-- Check initial stats (note that innodb_stats_method is still at its default value)
mysql [localhost:23899] {root} (test) > SELECT @@GLOBAL.innodb_stats_persistent, @@GLOBAL.innodb_stats_method;
+----------------------------------+------------------------------+
| @@GLOBAL.innodb_stats_persistent | @@GLOBAL.innodb_stats_method |
+----------------------------------+------------------------------+
| 1 | nulls_equal |
+----------------------------------+------------------------------+
1 row in set (0.00 sec)
mysql [localhost:23900] {msandbox} (test) > SELECT * FROM information_schema.statistics WHERE table_name='t1'\G
*************************** 1. row ***************************
TABLE_CATALOG: def
TABLE_SCHEMA: test
TABLE_NAME: t1
NON_UNIQUE: 1
INDEX_SCHEMA: test
INDEX_NAME: key_g
SEQ_IN_INDEX: 1
COLUMN_NAME: g
COLLATION: A
CARDINALITY: 57
SUB_PART: NULL
PACKED: NULL
NULLABLE:
INDEX_TYPE: BTREE
COMMENT:
INDEX_COMMENT:
IS_VISIBLE: YES
EXPRESSION: NULL
*************************** 2. row ***************************
TABLE_CATALOG: def
TABLE_SCHEMA: test
TABLE_NAME: t1
NON_UNIQUE: 0
INDEX_SCHEMA: test
INDEX_NAME: PRIMARY
SEQ_IN_INDEX: 1
COLUMN_NAME: i
COLLATION: A
CARDINALITY: 2088128
SUB_PART: NULL
PACKED: NULL
NULLABLE:
INDEX_TYPE: BTREE
COMMENT:
INDEX_COMMENT:
IS_VISIBLE: YES
EXPRESSION: NULL
2 rows in set (0.01 sec)
-- Analyze table and check stats again
mysql [localhost:23900] {msandbox} (test) > ANALYZE TABLE t1;
+---------+---------+----------+----------+
| Table | Op | Msg_type | Msg_text |
+---------+---------+----------+----------+
| test.t1 | analyze | status | OK |
+---------+---------+----------+----------+
1 row in set (0.02 sec)
mysql [localhost:23900] {msandbox} (test) > SELECT * FROM information_schema.statistics WHERE table_name='t1'\G
*************************** 1. row ***************************
TABLE_CATALOG: def
TABLE_SCHEMA: test
TABLE_NAME: t1
NON_UNIQUE: 1
INDEX_SCHEMA: test
INDEX_NAME: key_g
SEQ_IN_INDEX: 1
COLUMN_NAME: g
COLLATION: A
CARDINALITY: 57
SUB_PART: NULL
PACKED: NULL
NULLABLE:
INDEX_TYPE: BTREE
COMMENT:
INDEX_COMMENT:
IS_VISIBLE: YES
EXPRESSION: NULL
*************************** 2. row ***************************
TABLE_CATALOG: def
TABLE_SCHEMA: test
TABLE_NAME: t1
NON_UNIQUE: 0
INDEX_SCHEMA: test
INDEX_NAME: PRIMARY
SEQ_IN_INDEX: 1
COLUMN_NAME: i
COLLATION: A
CARDINALITY: 2088128
SUB_PART: NULL
PACKED: NULL
NULLABLE:
INDEX_TYPE: BTREE
COMMENT:
INDEX_COMMENT:
IS_VISIBLE: YES
EXPRESSION: NULL
2 rows in set (0.00 sec)
-- So far, everything is OK. Now we use nulls_ignored:
mysql [localhost:23900] {msandbox} (test) > SET GLOBAL innodb_stats_method = 'nulls_ignored';
Query OK, 0 rows affected (0.00 sec)
mysql [localhost:23900] {msandbox} (test) > SELECT @@GLOBAL.innodb_stats_persistent, @@GLOBAL.innodb_stats_method;
+----------------------------------+------------------------------+
| @@GLOBAL.innodb_stats_persistent | @@GLOBAL.innodb_stats_method |
+----------------------------------+------------------------------+
| 1 | nulls_ignored |
+----------------------------------+------------------------------+
1 row in set (0.00 sec)
-- And analyze and check stats again:
mysql [localhost:23900] {msandbox} (test) > ANALYZE TABLE t1;
+---------+---------+----------+----------+
| Table | Op | Msg_type | Msg_text |
+---------+---------+----------+----------+
| test.t1 | analyze | status | OK |
+---------+---------+----------+----------+
1 row in set (0.04 sec)
mysql [localhost:23900] {msandbox} (test) > SELECT * FROM information_schema.statistics WHERE table_name='t1'\G
*************************** 1. row ***************************
TABLE_CATALOG: def
TABLE_SCHEMA: test
TABLE_NAME: t1
NON_UNIQUE: 1
INDEX_SCHEMA: test
INDEX_NAME: key_g
SEQ_IN_INDEX: 1
COLUMN_NAME: g
COLLATION: A
CARDINALITY: 2088128
SUB_PART: NULL
PACKED: NULL
NULLABLE:
INDEX_TYPE: BTREE
COMMENT:
INDEX_COMMENT:
IS_VISIBLE: YES
EXPRESSION: NULL
*************************** 2. row ***************************
TABLE_CATALOG: def
TABLE_SCHEMA: test
TABLE_NAME: t1
NON_UNIQUE: 0
INDEX_SCHEMA: test
INDEX_NAME: PRIMARY
SEQ_IN_INDEX: 1
COLUMN_NAME: i
COLLATION: A
CARDINALITY: 2088128
SUB_PART: NULL
PACKED: NULL
NULLABLE:
INDEX_TYPE: BTREE
COMMENT:
INDEX_COMMENT:
IS_VISIBLE: YES
EXPRESSION: NULL
2 rows in set (0.00 sec)
-- See how CARDINALITY just went up on key_g index above (showing the same as PK)
-- Disable innodb_stats_persistent:
mysql [localhost:23900] {msandbox} (test) > SET GLOBAL innodb_stats_persistent = 0;
Query OK, 0 rows affected (0.00 sec)
mysql [localhost:23900] {msandbox} (test) > SELECT @@GLOBAL.innodb_stats_persistent, @@GLOBAL.innodb_stats_method;
+----------------------------------+------------------------------+
| @@GLOBAL.innodb_stats_persistent | @@GLOBAL.innodb_stats_method |
+----------------------------------+------------------------------+
| 0 | nulls_ignored |
+----------------------------------+------------------------------+
1 row in set (0.00 sec)
-- And analyze and check stats again:
mysql [localhost:23900] {msandbox} (test) > ANALYZE TABLE t1;
+---------+---------+----------+----------+
| Table | Op | Msg_type | Msg_text |
+---------+---------+----------+----------+
| test.t1 | analyze | status | OK |
+---------+---------+----------+----------+
1 row in set (0.01 sec)
mysql [localhost:23900] {msandbox} (test) > SELECT * FROM information_schema.statistics WHERE table_name='t1'\G
*************************** 1. row ***************************
TABLE_CATALOG: def
TABLE_SCHEMA: test
TABLE_NAME: t1
NON_UNIQUE: 1
INDEX_SCHEMA: test
INDEX_NAME: key_g
SEQ_IN_INDEX: 1
COLUMN_NAME: g
COLLATION: A
CARDINALITY: 236
SUB_PART: NULL
PACKED: NULL
NULLABLE:
INDEX_TYPE: BTREE
COMMENT:
INDEX_COMMENT:
IS_VISIBLE: YES
EXPRESSION: NULL
*************************** 2. row ***************************
TABLE_CATALOG: def
TABLE_SCHEMA: test
TABLE_NAME: t1
NON_UNIQUE: 0
INDEX_SCHEMA: test
INDEX_NAME: PRIMARY
SEQ_IN_INDEX: 1
COLUMN_NAME: i
COLLATION: A
CARDINALITY: 2097459
SUB_PART: NULL
PACKED: NULL
NULLABLE:
INDEX_TYPE: BTREE
COMMENT:
INDEX_COMMENT:
IS_VISIBLE: YES
EXPRESSION: NULL
2 rows in set (0.01 sec)
-- It's back to correct output for key_g index. It's not exact, but that is to be expected from the random dive, I guess (at least they are about the same order of magnitude, unlike previous outputs where it was 2M)
-- However, note that if we issue subsequent ANALYZE TABLE commands, the values may fluctuate, and may even appear as 2M again (in my tests it seemed to fluctuate between 9, 236 and 2088128 so I'm not sure it will avoid issues in 100% of the cases)
Suggested fix:
Make sure the cardinalities are computed the same as if innodb_stats_persistent were disabled.
I'm not exactly sure where issues may be, since it is also kind of reproducible with persistent stats disabled, as mentioned above (although it seems to have a more random behavior).
I searched the code, and could pin-point the following:
https://github.com/mysql/mysql-server/blob/8.0/storage/innobase/dict/dict0stats.cc#L2715
I executed GDB and tried to step into the code around these, but couldn't find much, unfortunately. Let me know if there is anything else I can do here.