Description:
Hi! This is a follow-up to my last bug report.
Here's the analysis. This bug is triggered when there are two API nodes, and requires an unfortunate interleaving. Let's say there is a table x.
1. API node 1 receives an SQL command on table x (a `SELECT ... FOR UPDATE` in the log below) and starts processing it.
2. API node 2 updates the schema on x.
3. While it is still processing the command, API node 1 receives a schema event from the data node. On receiving the event, `mark_share_dropped_and_release()` is called, which marks the ndb_share as NSS_DROPPED. However, the SQL processing thread still holds a reference to this share.
4. When `ndb_index_stat_query` is invoked, API node 1 adds a new index_stat to the dropped share's index_stat_list, which makes the list non-NULL.
5. API node 1 closes the table after it finishes processing. `intern_close_table()` is called and eventually calls `real_free_share()` on the stale ndb_share. Because index_stat_list is no longer NULL, the assertion in the NDB_SHARE destructor fails.
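To make the interleaving concrete, here is a minimal, self-contained C++ model of steps 3 to 5. It is not the plugin code: `ShareSketch`, `StatSketch`, and every function below are hypothetical stand-ins I made up for illustration, and the model only tracks the `index_stat_list == nullptr` invariant that the destructor asserts on.

```cpp
#include <cassert>

// Hypothetical, simplified stand-ins for NDB_SHARE and its index stat entries.
enum class SketchState { Normal, Dropped };

struct StatSketch {
  StatSketch *next = nullptr;
};

struct ShareSketch {
  SketchState state = SketchState::Normal;
  int use_count = 0;
  StatSketch *index_stat_list = nullptr;
};

// Step 3: the schema event handler marks the share as dropped.
// The SQL thread's reference keeps the share object alive.
void mark_dropped(ShareSketch &share) { share.state = SketchState::Dropped; }

// Step 4: the query path attaches a stat entry without looking at the state.
void add_index_stat(ShareSketch &share) {
  StatSketch *entry = new StatSketch;
  entry->next = share.index_stat_list;
  share.index_stat_list = entry;
}

// Step 5: the condition the destructor asserts on when the share is freed.
bool invariant_holds_at_free(const ShareSketch &share) {
  return share.index_stat_list == nullptr;
}

// Releases the entries so that the model itself does not leak memory.
void release_stats(ShareSketch &share) {
  while (share.index_stat_list != nullptr) {
    StatSketch *entry = share.index_stat_list;
    share.index_stat_list = entry->next;
    delete entry;
  }
}

// Replays the order from the report: drop first, then add a stat, then free.
// Returns whether the invariant would hold at the point of freeing.
bool replay_interleaving() {
  ShareSketch share;
  share.use_count = 1;    // the SQL thread still holds the share
  mark_dropped(share);    // step 3
  assert(share.state == SketchState::Dropped);
  add_index_stat(share);  // step 4
  bool ok = invariant_holds_at_free(share);  // step 5
  release_stats(share);
  return ok;
}
```

In this model `replay_interleaving()` returns false, which corresponds to the assertion failure in the log. If `add_index_stat` ran before `mark_dropped`, whether the invariant holds would depend on whether the drop path releases the existing entries, which the model does not cover.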
Log snippet from API node 1 (debugging enabled):
```
......
# Step 1: T12 receives an SQL command (SELECT ... FOR UPDATE) on tnkugncy and starts processing
T@12: THD::decide_logging_format: info: query: SELECT tnkugncy.ccdqe,tnkugncy.kphwizb,tnkugncy.tekjbsu,tnkugncy.gyhxn,tnkugncy.iexga,tnkugncy.id,tnkugncy.pjcob,tnkugncy.uuid,tnkugncy.tloaa,tnkugncy.cvlegrbb,tnkugncy.fwmohx,tnkugncy.artxual FROM tnkugncy WHERE tnkugncy.gyhxn<_utf8mb4"UTJGPTLOAATUEDCVLEGRBBRVVNPJCOBCOR" ORDER BY uuid DESC,tloaa,cvlegrbb DESC,fwmohx,artxual,pjcob DESC,tekjbsu DESC,gyhxn,iexga,ccdqe DESC for update
......
# Step 3: T51 receives a schema event for tnkugncy from the data node and marks the NDB_SHARE as dropped
T@51: NdbEventBuffer::alloc_mem: info: ptr sz 5 + 5 + 0 0x80030006 schema change monitoring
dropped_share: NDB_SHARE {
db: 'dstestdb',
table_name: 'tnkugncy',
key: './dstestdb/tnkugncy',
use_count: 2,
state: NSS_DROPPED,
op: 0,
handlers: 1 [ '0x7ffeb41ad8f0' ],
strings: 1 [ 'offline_alter_table_commit' ],
}
......
# Step 4: T12 updates the index_stat_list of the NDB_SHARE, which now becomes non-NULL
T@12: ha_ndbcluster::ndb_index_stat_query: index_stat: index: 0 name: PRIMARY
......
# Step 5: T12 closes the NDB_SHARE using real_free_share()
T@12: closefrm: enter: table: 0x7ffeb41ac6e0
mysqld: /data/congyu/ds-fuzzer-targets/mysql-server/storage/ndb/plugin/ndb_share.cc:115: NDB_SHARE::~NDB_SHARE(): Assertion `index_stat_list == nullptr' failed.
......
```
How to repeat:
Here's my cluster setup: 1 management node, 2 (replica) data nodes and 2 API nodes.
The key to triggering this bug is to make the index_stat_list update happen after the NDB_SHARE has been dropped. This can be achieved with the following patch, which injects a sleep:
```
diff --git a/storage/ndb/plugin/ha_ndb_index_stat.cc b/storage/ndb/plugin/ha_ndb_index_stat.cc
index ee2ed91368f..6fca38beb21 100644
--- a/storage/ndb/plugin/ha_ndb_index_stat.cc
+++ b/storage/ndb/plugin/ha_ndb_index_stat.cc
@@ -2421,6 +2421,13 @@ int ha_ndbcluster::ndb_index_stat_query(uint inx, const key_range *min_key,
const KEY *key_info = table->key_info + inx;
const NDB_INDEX_DATA &data = m_index[inx];
const NDBINDEX *index = data.index;
+
+ DBUG_EXECUTE_IF("my_slow_down", {
+ DBUG_PRINT("info", ("Sleeping for 1s"));
+ my_sleep(time_t(1000000)); // sleep 1s
+ DBUG_SET("-d,my_slow_down");
+ });
+
DBUG_PRINT("index_stat", ("index: %u name: %s", inx, index->getName()));
int err = 0;
```
Here's the input (a shell script). Shorter SQL commands might also trigger this bug, but I haven't tried.
```
# To API node 1
mysql --socket="/tmp/mysql1.sock" -u "root" --password="" -e "CREATE DATABASE dstestdb; USE dstestdb; CREATE TABLE tnkugncy (kphwizb varchar(1349) NOT NULL,artxual int(19) NULL,iexga int(18) NOT NULL,id int(11) NOT NULL AUTO_INCREMENT,tekjbsu varchar(289) DEFAULT _utf8mb4\"ISSXDELEEPGTCIPJEARPRBUBQYCNY\",tloaa text NULL,cvlegrbb text,fwmohx int(18) NULL,gyhxn varchar(2001) NOT NULL DEFAULT _utf8mb4\"KWZZQRHBHVTFQXWFXEIWJKGVREQKBLVHNKJDGHEWYMPWEWJLTQCFVECHZQGTNGZPAFFXPSCHQOQEGKSWNMHWFMA\",uuid varchar(253),ccdqe text NOT NULL,pjcob datetime DEFAULT '2025-05-31 00:55:40',PRIMARY KEY(id)) ENGINE = NDB DEFAULT CHARACTER SET = utf8mb4 DEFAULT COLLATE = UTF8MB4_BIN;"
# To API node 1
mysql --socket="/tmp/mysql1.sock" -u "root" --password="" -e "SET GLOBAL debug = 'd,my_slow_down,index_stat,info,enter,exit,warning:i'; SELECT @@debug; USE dstestdb; SELECT tnkugncy.ccdqe,tnkugncy.kphwizb,tnkugncy.tekjbsu,tnkugncy.gyhxn,tnkugncy.iexga,tnkugncy.id,tnkugncy.pjcob,tnkugncy.uuid,tnkugncy.tloaa,tnkugncy.cvlegrbb,tnkugncy.fwmohx,tnkugncy.artxual FROM tnkugncy WHERE tnkugncy.gyhxn<_utf8mb4\"UTJGPTLOAATUEDCVLEGRBBRVVNPJCOBCOR\" ORDER BY uuid DESC,tloaa,cvlegrbb DESC,fwmohx,artxual,pjcob DESC,tekjbsu DESC,gyhxn,iexga,ccdqe DESC for update;" &
# To API node 2
mysql --socket="/tmp/mysql2.sock" -u "root" --password="" -e "USE dstestdb; ALTER TABLE tnkugncy ADD COLUMN (fqjstz varchar(771) NOT NULL DEFAULT _utf8mb4\"BUPCLGVBHKSSOLMVKPRAZHPXEOGAETTL\");" &
wait
```
Suggested fix:
A straightforward fix would be to call `ndb_index_stat_free()` in `real_free_share()` when index_stat_list is not NULL. But I am not sure whether this addresses other potential issues (if any) caused by this invariant violation.
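Here is a rough sketch of the shape of that fix, again as a self-contained C++ model rather than a patch against the plugin. `ShareModel`, `StatEntry`, `attach_stat`, `drain_index_stats`, and `free_share_with_cleanup` are hypothetical names; `drain_index_stats` only stands in for whatever `ndb_index_stat_free()` would do for the share, which I have not verified.

```cpp
#include <cassert>

// Hypothetical stand-ins, not the real plugin types.
struct StatEntry {
  StatEntry *next = nullptr;
};

struct ShareModel {
  StatEntry *index_stat_list = nullptr;
};

// Models the query path attaching a stat entry to the share.
void attach_stat(ShareModel &share) {
  StatEntry *entry = new StatEntry;
  entry->next = share.index_stat_list;
  share.index_stat_list = entry;
}

// Stand-in for the cleanup step: frees every entry and returns
// how many were released.
int drain_index_stats(ShareModel &share) {
  int freed = 0;
  while (share.index_stat_list != nullptr) {
    StatEntry *entry = share.index_stat_list;
    share.index_stat_list = entry->next;
    delete entry;
    ++freed;
  }
  return freed;
}

// Stand-in for real_free_share() with the proposed cleanup added:
// drain leftover entries first, so the invariant holds before destruction.
int free_share_with_cleanup(ShareModel *share) {
  int freed = 0;
  if (share->index_stat_list != nullptr) {
    freed = drain_index_stats(*share);
  }
  assert(share->index_stat_list == nullptr);
  delete share;
  return freed;
}
```

This only restores the invariant at free time. It does not stop the query path from attaching stats to a share that is already NSS_DROPPED, so a state check at that point may also be worth considering.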