Bug #117652	Assertion failure: NDB_SHARE::~NDB_SHARE(): Assertion `index_stat_list == nullptr' failed
Submitted:	9 Mar 20:23	Modified:	12 Mar 3:28
Reporter:	Congyu Liu (OCA)	Email Updates:
Status:	Verified	Impact on me:	None
Category:	MySQL Cluster: Cluster (NDB) storage engine	Severity:	S2 (Serious)
Version:	8.4.2	OS:	Any
Assigned to:		CPU Architecture:	Any

Description:
Hi! This is a follow-up of my last bug report.

Here's the analysis. This bug is triggered when there are two API nodes, and requires an unfortunate interleaving. Let's say there is a table x.

1. API node 1 receives an SQL update command on table x (in the log below, a `SELECT ... FOR UPDATE`) and processes it.
2. API node 2 updates the schema on x.
3. While it is still processing the update command, API node 1 receives a schema event from the data node. After receiving the event, mark_share_dropped_and_release() is called to mark the ndb_share as NSS_DROPPED. However, the SQL processing thread still holds this share.
4. API node 1 adds a new index_stat to this dropped share's index_stat_list when `ndb_index_stat_query` is invoked, which makes it non-NULL.
5. API node 1 closes the table after completing the processing. `intern_close_table()` is called and eventually calls `real_free_share()` on the stale ndb_share. Because the index_stat_list is no longer NULL at this point, the assertion fails.
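
The interleaving above can be replayed on a single thread with a small model. This is only a sketch: `Share`, `IndexStat`, and every function in it are hypothetical stand-ins rather than the real NDB_SHARE code, and the starting reference count of 2 is taken from the `use_count: 2` shown in the log, without any claim about who holds each reference.

```cpp
#include <cassert>
#include <memory>
#include <utility>

namespace race_model {

// Hypothetical, simplified stand-ins; not the real NDB_SHARE definitions.
enum class ShareState { Initial, Dropped };

struct IndexStat {
  std::unique_ptr<IndexStat> next;
};

struct Share {
  ShareState state = ShareState::Initial;
  int use_count = 0;
  std::unique_ptr<IndexStat> index_stat_list;  // empty == nullptr
};

// Step 3: the schema event handler marks the share as dropped and
// releases its reference while the query thread still holds one.
void mark_dropped_and_release(Share &s) {
  s.state = ShareState::Dropped;
  --s.use_count;
}

// Step 4: the query thread links a new entry at the head of the list
// without looking at the share's state.
void attach_index_stat(Share &s) {
  auto entry = std::make_unique<IndexStat>();
  entry->next = std::move(s.index_stat_list);
  s.index_stat_list = std::move(entry);
}

// Step 5: the last reference goes away. Returns whether the invariant
// checked by the destructor (index_stat_list == nullptr) would hold.
bool release_last_reference(Share &s) {
  --s.use_count;
  assert(s.use_count == 0);
  return s.index_stat_list == nullptr;
}

// Replays steps 3 to 5. When stat_added_after_drop is false, step 4 is
// skipped, which models a run where the race does not occur.
bool replay(bool stat_added_after_drop) {
  Share s;
  s.use_count = 2;  // illustrative, mirrors use_count in the log
  mark_dropped_and_release(s);
  if (stat_added_after_drop) attach_index_stat(s);
  return release_last_reference(s);
}

}  // namespace race_model
```

In this model `race_model::replay(true)` returns false, which corresponds to the failed assertion, while `race_model::replay(false)` returns true.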

Log snippet from API node 1 (debugging enabled):

```
......

# Step 1: T12 receives an SQL update command on tnkugncy and starts processing
T@12: THD::decide_logging_format: info: query: SELECT tnkugncy.ccdqe,tnkugncy.kphwizb,tnkugncy.tekjbsu,tnkugncy.gyhxn,tnkugncy.iexga,tnkugncy.id,tnkugncy.pjcob,tnkugncy.uuid,tnkugncy.tloaa,tnkugncy.cvlegrbb,tnkugncy.fwmohx,tnkugncy.artxual FROM tnkugncy WHERE tnkugncy.gyhxn<_utf8mb4"UTJGPTLOAATUEDCVLEGRBBRVVNPJCOBCOR" ORDER BY uuid DESC,tloaa,cvlegrbb DESC,fwmohx,artxual,pjcob DESC,tekjbsu DESC,gyhxn,iexga,ccdqe DESC for update

......

# Step 3: T51 receives a schema event for tnkugncy from the data node and marks the NDB_SHARE as dropped
T@51: NdbEventBuffer::alloc_mem: info: ptr sz 5 + 5 + 0 0x80030006 schema change monitoring
dropped_share: NDB_SHARE {
  db: 'dstestdb',
  table_name: 'tnkugncy',
  key: './dstestdb/tnkugncy',
  use_count: 2,
  state: NSS_DROPPED,
  op: 0,
  handlers: 1 [ '0x7ffeb41ad8f0' ],
  strings: 1 [ 'offline_alter_table_commit' ],
}

......

# Step 4: T12 updates the index_stat_list of the NDB_SHARE, which now becomes non-NULL
T@12: ha_ndbcluster::ndb_index_stat_query: index_stat: index: 0 name: PRIMARY

......

# Step 5: T12 closes the NDB_SHARE using real_free_share()
T@12: closefrm: enter: table: 0x7ffeb41ac6e0
mysqld: /data/congyu/ds-fuzzer-targets/mysql-server/storage/ndb/plugin/ndb_share.cc:115: NDB_SHARE::~NDB_SHARE(): Assertion `index_stat_list == nullptr' failed.

......
```

How to repeat:
Here's my cluster setup: 1 management node, 2 (replica) data nodes and 2 API nodes.

The key to triggering this bug is to make the index_stat_list update happen after the NDB_SHARE is dropped. This can be achieved with the following patch, which injects a sleep:

```
diff --git a/storage/ndb/plugin/ha_ndb_index_stat.cc b/storage/ndb/plugin/ha_ndb_index_stat.cc
index ee2ed91368f..6fca38beb21 100644
--- a/storage/ndb/plugin/ha_ndb_index_stat.cc
+++ b/storage/ndb/plugin/ha_ndb_index_stat.cc
@@ -2421,6 +2421,13 @@ int ha_ndbcluster::ndb_index_stat_query(uint inx, const key_range *min_key,
   const KEY *key_info = table->key_info + inx;
   const NDB_INDEX_DATA &data = m_index[inx];
   const NDBINDEX *index = data.index;
+
+  DBUG_EXECUTE_IF("my_slow_down", {
+    DBUG_PRINT("info", ("Sleeping for 1s"));
+    my_sleep(time_t(1000000));  // sleep 1s
+    DBUG_SET("-d,my_slow_down");
+  });
+
   DBUG_PRINT("index_stat", ("index: %u name: %s", inx, index->getName()));
 
   int err = 0;
```

Here's the input (a shell script). Shorter SQL commands might also trigger this bug, though I haven't tried.

```
# To API node 1
mysql --socket="/tmp/mysql1.sock" -u "root" --password="" -e "CREATE DATABASE dstestdb; USE dstestdb; CREATE TABLE tnkugncy (kphwizb varchar(1349) NOT NULL,artxual int(19) NULL,iexga int(18) NOT NULL,id int(11) NOT NULL AUTO_INCREMENT,tekjbsu varchar(289) DEFAULT _utf8mb4\"ISSXDELEEPGTCIPJEARPRBUBQYCNY\",tloaa text NULL,cvlegrbb text,fwmohx int(18) NULL,gyhxn varchar(2001) NOT NULL DEFAULT _utf8mb4\"KWZZQRHBHVTFQXWFXEIWJKGVREQKBLVHNKJDGHEWYMPWEWJLTQCFVECHZQGTNGZPAFFXPSCHQOQEGKSWNMHWFMA\",uuid varchar(253),ccdqe text NOT NULL,pjcob datetime DEFAULT '2025-05-31 00:55:40',PRIMARY KEY(id)) ENGINE = NDB DEFAULT CHARACTER SET = utf8mb4 DEFAULT COLLATE = UTF8MB4_BIN;"
# To API node 1
mysql --socket="/tmp/mysql1.sock" -u "root" --password="" -e "SET GLOBAL debug = 'd,my_slow_down,index_stat,info,enter,exit,warning:i'; SELECT @@debug; USE dstestdb; SELECT tnkugncy.ccdqe,tnkugncy.kphwizb,tnkugncy.tekjbsu,tnkugncy.gyhxn,tnkugncy.iexga,tnkugncy.id,tnkugncy.pjcob,tnkugncy.uuid,tnkugncy.tloaa,tnkugncy.cvlegrbb,tnkugncy.fwmohx,tnkugncy.artxual FROM tnkugncy WHERE tnkugncy.gyhxn<_utf8mb4\"UTJGPTLOAATUEDCVLEGRBBRVVNPJCOBCOR\" ORDER BY uuid DESC,tloaa,cvlegrbb DESC,fwmohx,artxual,pjcob DESC,tekjbsu DESC,gyhxn,iexga,ccdqe DESC for update;" &
# To API node 2
mysql --socket="/tmp/mysql2.sock" -u "root" --password="" -e "USE dstestdb; ALTER TABLE tnkugncy ADD COLUMN (fqjstz varchar(771) NOT NULL DEFAULT _utf8mb4\"BUPCLGVBHKSSOLMVKPRAZHPXEOGAETTL\");" &
wait
```

Suggested fix:
A straightforward fix would be to call `ndb_index_stat_free()` in `real_free_share()` when index_stat_list is not NULL. However, I am not sure whether this would address other potential issues (if any) caused by this invariant violation.
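
The shape of that fix can be sketched as follows. This is not the actual plugin code: `Share`, `IndexStat`, `free_index_stats()`, `free_share()`, and `make_share_with_stats()` are hypothetical stand-ins, and the real `ndb_index_stat_free()` may do more bookkeeping than simply deleting list entries.

```cpp
#include <cassert>

namespace fix_sketch {

// Hypothetical, simplified stand-ins; not the real NDB types.
struct IndexStat {
  IndexStat *next = nullptr;
};

struct Share {
  IndexStat *index_stat_list = nullptr;
};

// Stand-in for ndb_index_stat_free(): unlinks and deletes every entry
// still attached to the share. Returns the number of entries freed.
int free_index_stats(Share &s) {
  int freed = 0;
  while (s.index_stat_list != nullptr) {
    IndexStat *head = s.index_stat_list;
    s.index_stat_list = head->next;
    delete head;
    ++freed;
  }
  return freed;
}

// Stand-in for real_free_share(): cleans up late entries first, so the
// invariant holds by the time the share is destroyed.
int free_share(Share *s) {
  const int freed = free_index_stats(*s);
  assert(s->index_stat_list == nullptr);
  delete s;
  return freed;
}

// Test helper: builds a share with n entries attached, as if they had
// been added after the share was marked as dropped.
Share *make_share_with_stats(int n) {
  Share *s = new Share;
  for (int i = 0; i < n; ++i) {
    IndexStat *entry = new IndexStat;
    entry->next = s->index_stat_list;
    s->index_stat_list = entry;
  }
  return s;
}

}  // namespace fix_sketch
```

Freeing at this point only restores the invariant at destruction time. It does not stop the query thread from attaching statistics to a share that is already marked NSS_DROPPED, which is the open question raised above.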

Hi,

What is the bug# you are referring to?

Have you tried with 8.4.4?

Hi,

> What is the bug# you are referring to?

My previous ticket 117621 is for the same bug.

> Have you tried with 8.4.4?

Just tried with 8.4.4 using the same patch. Still reproduced.