Bug #116760 Page deadlock occured when searching BTREE after nullable cols dropped by INSTAN
Submitted: 23 Nov 2024 13:44 Modified: 25 Nov 2024 7:23
Reporter: Ruyi Zhang (OCA) Email Updates:
Status: Verified Impact on me:
None 
Category:MySQL Server: InnoDB storage engine Severity:S3 (Non-critical)
Version:8.0.40,8.4.3,9.1.0 OS:Any
Assigned to: CPU Architecture:Any
Tags: Contribution, instant ddl

[23 Nov 2024 13:44] Ruyi Zhang
Description:
Our team(Bytedance NDB) noticed that after many nullable columns are dropped on a large table through ALGORITHM=INSTANT, page deadlocks frequently occur when searching for BTREE in the table. The error is:

```
########################################
DEADLOCK of threads detected!
--Thread 140447652452096 has waited at row0sel.cc line 4900 for 3 seconds the semaphore:
S-lock on RW-latch at 0x7fbcb9858fa0 created in file buf0buf.cc line 795
a writer (thread id 140448251184896) has reserved it in mode exclusive
number of readers 0, waiters flag 1, lock_word: ffffffffdfffffff
Last time read locked in file row0row.cc line 846
Last time write locked in file storage/innobase/btr/btr0cur.cc line 1615
Locked: thread 140448251184896 file btr0cur.cc line 1615  X-LOCK
--Thread 140448251184896 has waited at btr0cur.cc line 1615 for 0 seconds the semaphore:
X-lock (wait_ex) on RW-latch at 0x7fbcb97e1358 created in file buf0buf.cc line 795
a writer (thread id 140448251184896) has reserved it in mode wait exclusive
number of readers 1 (thread id 140447652452096), waiters flag 1, lock_word: ffffffffefffffff
Last time read locked in file row0sel.cc line 4900
Locked: thread 140447652452096 file row0sel.cc line 4900  S-LOCK
2024-11-23T12:47:41.708502Z 9 [ERROR] [MY-012982] [InnoDB] [FATAL] ######################################## Deadlock Detected!
2024-11-23T12:47:41.708545Z 9 [ERROR] [MY-013183] [InnoDB] Assertion failure: sync0arr.cc:678:ib::fatal triggered thread 140448251184896
InnoDB: We intentionally generate a memory trap.
```

We wrote the following MTR to reproduce this issue.

How to repeat:
diff --git a/storage/innobase/btr/btr0cur.cc b/storage/innobase/btr/btr0cur.cc
index 66def59c08f..65fee7e9107 100644
--- a/storage/innobase/btr/btr0cur.cc
+++ b/storage/innobase/btr/btr0cur.cc
@@ -4650,6 +4650,8 @@ bool btr_cur_pessimistic_delete(dberr_t *err, bool has_reserved_extents,
   ut_a(!page_zip || page_zip_validate(page_zip, page, index));
 #endif /* UNIV_ZIP_DEBUG */

+  DEBUG_SYNC_C("btr_cur_before_pessimistic_delete");
+
   offsets = rec_get_offsets(rec, index, nullptr, ULINT_UNDEFINED,
                             UT_LOCATION_HERE, &heap);

```
--source include/have_debug.inc
--source include/have_debug_sync.inc

--connect(con1, localhost, root,,)
--connect(con2, localhost, root,,)
--connection default
create table t(id varbinary(2000) primary key, c1 varbinary(2000), c2 binary(1),c3 binary(1),c4 binary(1),c5 binary(1),c6 binary(1),c7 binary(1),c8 binary(1),c9 binary(1),c10 binary(1),c11 binary(1),c12 binary(1),c13 binary(1),c14 binary(1),c15 binary(1),c16 binary(1),c17 binary(1),c18 binary(1),c19 binary(1),c20 binary(1),c21 binary(1),c22 binary(1),c23 binary(1),c24 binary(1),c25 binary(1),c26 binary(1),c27 binary(1),c28 binary(1),c29 binary(1),c30 binary(1),c31 binary(1),c32 binary(1),c33 binary(1),c34 binary(1),c35 binary(1),c36 binary(1),c37 binary(1),c38 binary(1),c39 binary(1),c40 binary(1),c41 binary(1),c42 binary(1),c43 binary(1),c44 binary(1),c45 binary(1),c46 binary(1),c47 binary(1),c48 binary(1),c49 binary(1),c50 binary(1),c51 binary(1),c52 binary(1),c53 binary(1),c54 binary(1),c55 binary(1),c56 binary(1),c57 binary(1),c58 binary(1),c59 binary(1),c60 binary(1),c61 binary(1),c62 binary(1),c63 binary(1),c64 binary(1),c65 binary(1),c66 binary(1),c67 binary(1),c68 binary(1),c69 binary(1),c70 binary(1),c71 binary(1),c72 binary(1),c73 binary(1),c74 binary(1),c75 binary(1),c76 binary(1),c77 binary(1),c78 binary(1),c79 binary(1),c80 binary(1),c81 binary(1),c82 binary(1),c83 binary(1),c84 binary(1),c85 binary(1),c86 binary(1),c87 binary(1),c88 binary(1),c89 binary(1),c90 binary(1),c91 binary(1),c92 binary(1),c93 binary(1),c94 binary(1),c95 binary(1),c96 binary(1),c97 binary(1),c98 binary(1),c99 binary(1),c100 binary(1),c101 binary(1),c102 binary(1),c103 binary(1),c104 binary(1),c105 binary(1),c106 binary(1),c107 binary(1),c108 binary(1),c109 binary(1),c110 binary(1),c111 binary(1),c112 binary(1),c113 binary(1),c114 binary(1),c115 binary(1),c116 binary(1),c117 binary(1),c118 binary(1),c119 binary(1),c120 binary(1),c121 binary(1),c122 binary(1),c123 binary(1),c124 binary(1),c125 binary(1),c126 binary(1),c127 binary(1),c128 binary(1),c129 binary(1),c130 binary(1),c131 binary(1),c132 binary(1),c133 binary(1),c134 binary(1),c135 binary(1),c136 binary(1),c137 binary(1),c138 binary(1),c139 binary(1),c140 binary(1),c141 binary(1),c142 binary(1),c143 binary(1),c144 binary(1),c145 binary(1),c146 binary(1),c147 binary(1),c148 binary(1),c149 binary(1),c150 binary(1),c151 binary(1),c152 binary(1),c153 binary(1),c154 binary(1),c155 binary(1),c156 binary(1),c157 binary(1),c158 binary(1),c159 binary(1),c160 binary(1),c161 binary(1),c162 binary(1),c163 binary(1),c164 binary(1),c165 binary(1),c166 binary(1),c167 binary(1),c168 binary(1),c169 binary(1),c170 binary(1),c171 binary(1),c172 binary(1),c173 binary(1),c174 binary(1),c175 binary(1),c176 binary(1),c177 binary(1),c178 binary(1),c179 binary(1),c180 binary(1),c181 binary(1),c182 binary(1),c183 binary(1),c184 binary(1),c185 binary(1),c186 binary(1),c187 binary(1),c188 binary(1),c189 binary(1),c190 binary(1),c191 binary(1),c192 binary(1),c193 binary(1),c194 binary(1),c195 binary(1),c196 binary(1),c197 binary(1),c198 binary(1),c199 binary(1),c200 binary(1),c201 binary(1),c202 binary(1),c203 binary(1),c204 binary(1),c205 binary(1),c206 binary(1),c207 binary(1),c208 binary(1),c209 binary(1),c210 binary(1),c211 binary(1),c212 binary(1),c213 binary(1),c214 binary(1),c215 binary(1),c216 binary(1),c217 binary(1),c218 binary(1),c219 binary(1),c220 binary(1),c221 binary(1),c222 binary(1),c223 binary(1),c224 binary(1),c225 binary(1),c226 binary(1),c227 binary(1),c228 binary(1),c229 binary(1),c230 binary(1),c231 binary(1),c232 binary(1),c233 binary(1)) engine=innodb,row_format=dynamic;
insert into t (id, c1) values (repeat('A', 2000), repeat('A', 2000));
insert into t (id, c1) values (repeat('B', 2000), repeat('B', 2000));
insert into t (id, c1) values (repeat('C', 2000), repeat('C', 2000));
insert into t (id, c1) values (repeat('D', 2000), repeat('D', 2000));
insert into t (id, c1) values (repeat('E', 2000), repeat('E', 2000));
insert into t (id, c1) values (repeat('F', 2000), repeat('F', 2000));
insert into t (id, c1) values (repeat('G', 2000), repeat('G', 2000));
insert into t (id, c1) values (repeat('H', 2000), repeat('H', 2000));
insert into t (id, c1) values (repeat('I', 2000), repeat('I', 2000));
insert into t (id, c1) values (repeat('J', 2000), repeat('J', 2000));
insert into t (id, c1) values (repeat('K', 2000), repeat('K', 2000));
insert into t (id, c1) values (repeat('L', 2000), repeat('L', 2000));
insert into t (id, c1) values (repeat('M', 2000), repeat('M', 2000));
insert into t (id, c1) values (repeat('N', 2000), repeat('N', 2000));
insert into t (id, c1) values (repeat('O', 2000), repeat('O', 2000));
insert into t (id, c1) values (repeat('P', 2000), repeat('P', 2000));
insert into t (id, c1) values (repeat('Q', 2000), repeat('Q', 2000));
insert into t (id, c1) values (repeat('R', 2000), repeat('R', 2000));
insert into t (id, c1) values (repeat('S', 2000), repeat('S', 2000));
insert into t (id, c1) values (repeat('T', 2000), repeat('T', 2000));
alter table t drop column c1,drop column c3,drop column c4,drop column c5,drop column c6,drop column c7,drop column c8,drop column c9,drop column c10,drop column c11,drop column c12,drop column c13,drop column c14,drop column c15,drop column c16,drop column c17,drop column c18,drop column c19,drop column c20,drop column c21,drop column c22,drop column c23,drop column c24,drop column c25,drop column c26,drop column c27,drop column c28,drop column c29,drop column c30,drop column c31,drop column c32,drop column c33,drop column c34,drop column c35,drop column c36,drop column c37,drop column c38,drop column c39,drop column c40,drop column c41,drop column c42,drop column c43,drop column c44,drop column c45,drop column c46,drop column c47,drop column c48,drop column c49,drop column c50,drop column c51,drop column c52,drop column c53,drop column c54,drop column c55,drop column c56,drop column c57,drop column c58,drop column c59,drop column c60,drop column c61,drop column c62,drop column c63,drop column c64,drop column c65,drop column c66,drop column c67,drop column c68,drop column c69,drop column c70,drop column c71,drop column c72,drop column c73,drop column c74,drop column c75,drop column c76,drop column c77,drop column c78,drop column c79,drop column c80,drop column c81,drop column c82,drop column c83,drop column c84,drop column c85,drop column c86,drop column c87,drop column c88,drop column c89,drop column c90,drop column c91,drop column c92,drop column c93,drop column c94,drop column c95,drop column c96,drop column c97,drop column c98,drop column c99,drop column c100,drop column c101,drop column c102,drop column c103,drop column c104,drop column c105,drop column c106,drop column c107,drop column c108,drop column c109,drop column c110,drop column c111,drop column c112,drop column c113,drop column c114,drop column c115,drop column c116,drop column c117,drop column c118,drop column c119,drop column c120,drop column c121,drop column c122,drop column c123,drop column c124,drop column c125,drop column c126,drop column c127,drop column c128,drop column c129,drop column c130,drop column c131,drop column c132,drop column c133,drop column c134,drop column c135,drop column c136,drop column c137,drop column c138,drop column c139,drop column c140,drop column c141,drop column c142,drop column c143,drop column c144,drop column c145,drop column c146,drop column c147,drop column c148,drop column c149,drop column c150,drop column c151,drop column c152,drop column c153,drop column c154,drop column c155,drop column c156,drop column c157,drop column c158,drop column c159,drop column c160,drop column c161,drop column c162,drop column c163,drop column c164,drop column c165,drop column c166,drop column c167,drop column c168,drop column c169,drop column c170,drop column c171,drop column c172,drop column c173,drop column c174,drop column c175,drop column c176,drop column c177,drop column c178,drop column c179,drop column c180,drop column c181,drop column c182,drop column c183,drop column c184,drop column c185,drop column c186,drop column c187,drop column c188,drop column c189,drop column c190,drop column c191,drop column c192,drop column c193,drop column c194,drop column c195,drop column c196,drop column c197,drop column c198,drop column c199,drop column c200,drop column c201,drop column c202,drop column c203,drop column c204,drop column c205,drop column c206,drop column c207,drop column c208,drop column c209,drop column c210,drop column c211,drop column c212,drop column c213,drop column c214,drop column c215,drop column c216,drop column c217,drop column c218,drop column c219,drop column c220,drop column c221,drop column c222,drop column c223,drop column c224,drop column c225,drop column c226,drop column c227,drop column c228,drop column c229,drop column c230,drop column c231,drop column c232,drop column c233;

begin;
insert into t (id, c2) values (repeat('V', 2000), 'a');
SET DEBUG_SYNC= "btr_cur_before_pessimistic_delete SIGNAL before_delete WAIT_FOR continue";
send rollback;
--connection con1
SET DEBUG_SYNC= "now WAIT_FOR before_delete";
send select substring(id,1,1) from t where id = repeat('V', 2000);
--connection con2
sleep 3;
SET DEBUG_SYNC= "now SIGNAL continue";
--connection con1
reap;
--connection default
reap;
drop table t;
```

After the execution of this MTR, MySQL will go down due to a deadlock.

Suggested fix:
According to our analysis, we believe that the cause of this problem is that btr_cur_will_modify_tree does not correctly calculate whether the page will occur btr_compress after a rec deleted, causing the latch of the parent page to be released early. 

The dict_index_node_ptr_max_size(index) use the index->n_nullable instead of get_nullable_before_instant_add_drop to calculate the nulls-size in btr_cur_search_to_nth_level now. After dropping many nullable columns through ALGORITHM=INSTANT, the calculated result of dict_index_node_ptr_max_size will be too small to correctly hold the latches of some parent pages that may be involved in SMO, resulting in page deadlocks.
[25 Nov 2024 4:49] Ruyi Zhang
This patch using index->get_nullable_before_instant_add_drop to replace index->n_nullable for comp format

(*) I confirm the code being submitted is offered under the terms of the OCA, and that I am authorized to contribute it.

Contribution: instant_drop_ddl.patch (application/octet-stream, text), 19.85 KiB.

[25 Nov 2024 7:23] MySQL Verification Team
Hello Zhang Ruyi,

Thank you for the report and contribution.

regards,
Umesh