Bug #103210 InnoDB index cardinality stats are racy
Submitted: 5 Apr 2021 21:07 Modified: 7 Apr 2021 5:24
Reporter: Manuel Ung Email Updates:
Status: Verified Impact on me:
None 
Category:MySQL Server: InnoDB storage engine Severity:S3 (Non-critical)
Version:8.0.23 OS:Any
Assigned to: CPU Architecture:Any

[5 Apr 2021 21:07] Manuel Ung
Description:
When updating statistics, InnoDB tends to call dict_stats_empty_index to zero out old stats, and then fills in new statistics, and this is done under an exclusive lock.

However, there are a few places where we read from stats without any locking, resulting bad statistics returned. This can cause spurious bad plans.

How to repeat:
This test case shows lack of lacking in the read path in `innobase_get_index_column_cardinality` (since I use "show indexes"), but via code inspection, you can also see that this is true in `ha_innobase::open` when we call info with HA_STATUS_NO_LOCK.

diff --git a/mysql-test/suite/innodb/t/innodb_stats_race.test b/mysql-test/suite/innodb/t/innodb_stats_race.test
new file mode 100644
index 00000000000..cf1d0ed9f57
--- /dev/null
+++ b/mysql-test/suite/innodb/t/innodb_stats_race.test
@@ -0,0 +1,52 @@
+--source include/have_debug_sync.inc
+--source include/count_sessions.inc
+
+CREATE TABLE t (a int,
+                b int,
+                c int,
+                d int,
+                e int,
+                PRIMARY KEY(a),
+                KEY abc (b, c, d)) ENGINE=InnoDB;
+
+insert into t values (1, 1, 1, 1, 1), (2, 2, 2, 2, 2), (3, 3, 3, 3, 3), (4, 4, 4, 4, 4);
+insert into t select a+4, b+4, c, d, e from t;
+insert into t select a+8, b+8, c, d, e from t;
+insert into t select a+16, b+16, c, d, e from t;
+insert into t select a+32, b+32, c, d, e from t;
+insert into t select a+64, b+64, c, d, e from t;
+insert into t select a+128, b+128, c, d, e from t;
+insert into t select a+256, b+256, c, d, e from t;
+insert into t select a+512, b+512, c, d, e from t;
+insert into t select a+1024, b+1024, c, d, e from t;
+
+analyze table t;
+flush tables;
+
+show indexes in t;
+
+set information_schema_stats_expiry = 0;
+
+set debug_sync = "after_empty_index SIGNAL parked WAIT_FOR go";
+send analyze table t;
+
+connect (con1,localhost,root,,);
+set information_schema_stats_expiry = 0;
+set debug_sync='now WAIT_FOR parked';
+show indexes in t;
+
+connect (con2,localhost,root,,);
+set debug_sync='now SIGNAL go';
+disconnect con2;
+
+disconnect con1;
+
+connection default;
+--disable_result_log
+reap;
+--enable_result_log
+
+drop table t;
+set debug_sync = 'RESET';
+
+--source include/wait_until_count_sessions.inc
diff --git a/storage/innobase/dict/dict0stats.cc b/storage/innobase/dict/dict0stats.cc
index 0b75d0c8a73..0944990f86e 100644
--- a/storage/innobase/dict/dict0stats.cc
+++ b/storage/innobase/dict/dict0stats.cc
@@ -25,6 +25,8 @@ this program; if not, write to the Free Software Foundation, Inc.,
 *****************************************************************************/

 #include "my_dbug.h"
+#include "current_thd.h"
+#include "debug_sync.h"

 /** @file dict/dict0stats.cc
  Code used for calculating and manipulating table statistics.
@@ -2051,6 +2053,11 @@ static dberr_t dict_stats_update_persistent(
     }

     dict_stats_empty_index(index);
+#ifndef DBUG_OFF
+    if (current_thd && strcmp(index->name, "abc") == 0) {
+      DEBUG_SYNC(current_thd, "after_empty_index");
+    }
+#endif

     if (dict_stats_should_ignore_index(index)) {
       continue;

Suggested fix:
Either add locking, or update stats without zero'ing array.
[5 Apr 2021 22:22] Manuel Ung
I just noticed that https://bugs.mysql.com/bug.php?id=82968 is pretty much the same issue, but it's closed?
[6 Apr 2021 6:24] Øystein Grøvlen
Is this the same as Bug#98132?
[6 Apr 2021 10:09] MySQL Verification Team
Thank you, Øystein.

Hello Manuel Ung,

Thank you for the report and feedback.
As Øystein pointed, this seems to be the same as reported in Bug #98132.
If you have no objections then I'll mark this as a duplicate of Bug #98132. Please confirm!

regards,
Umesh
[6 Apr 2021 19:45] Manuel Ung
I don't think the problem is exactly the same and I don't think the suggested fix there will fix this issue. In #98132, we seem to persisting the incorrect stats because we end up analyzing a table that is still being rebuilt.

In this case, we don't persist the incorrect stats and there's no DDL, and it's just a race condition where we read out bad stats temporarily while an analyze is happening, but the stats will be correct after the analyze finishes.
[6 Apr 2021 20:06] Øystein Grøvlen
I agree with Manuel that this is not quite the same problem as Bug#98132.
[6 Apr 2021 20:10] Øystein Grøvlen
A possible solution is to let dict_stats_analyze_index() use a private copy for the statistics gathering, and replace the old statistics when the the new statistics have been computed.
[7 Apr 2021 5:24] MySQL Verification Team
Thank you, Manual and Øystein.

regards,
Umesh
[11 May 2024 3:14] WANG GUANGYOU
hi Manuel,
   I add this in the test case you provide

set debug_sync = "after_empty_index SIGNAL parked WAIT_FOR go";
send analyze table t;

connect (con1,localhost,root,,);
set information_schema_stats_expiry = 0;
set debug_sync='now WAIT_FOR parked';
show indexes in t;
+set session eq_range_index_dive_limit=0;
+explain select b,c,d  from t where b = 1 and c=1 and d = 1 ; 

although ths index statistics looks not good. but the SQL choose the right plan. So the race condition really hurt performance?

set session eq_range_index_dive_limit=0;
explain select b,c,d  from t where b = 1 and c=1 and d = 1 ;
id	select_type	table	partitions	type	possible_keys	key	key_len	ref	rows	filtered	Extra
1	SIMPLE	t	NULL	ref	abc	abc	15	const,const,const	1	100.00	Using index
Warnings:
Note	1003	/* select#1 */ select `test`.`t`.`b` AS `b`,`test`.`t`.`c` AS `c`,`test`.`t`.`d` AS `d` from `test`.`t` where ((`test`.`t`.`d` = 1) and (`test`.`t`.`c` = 1) and (`test`.`t`.`b` = 1))
[11 May 2024 3:30] WANG GUANGYOU
btw, I test on 8.0.36 percona version