| Bug #114777 | lost connection in MySQL NDB cluster | ||
|---|---|---|---|
| Submitted: | 25 Apr 2024 7:09 | Modified: | 8 May 2024 6:26 |
| Reporter: | CunDi Fang | Email Updates: | |
| Status: | Verified | Impact on me: | |
| Category: | MySQL Cluster: Cluster (NDB) storage engine | Severity: | S2 (Serious) |
| Version: | 8.0.35-cluster MySQL Cluster Community S | OS: | Any (20.04) |
| Assigned to: | CPU Architecture: | Any | |
[25 Apr 2024 7:58]
CunDi Fang
Architecture Information: ''' [NDBD DEFAULT] NoOfReplicas =2 DataMemory = 512M IndexMemory = 64M [NDB_MGMD] NodeId=1 hostname =192.172.10.8 datadir =/var/lib/mysql-cluster [NDBD] NodeId =2 hostname =192.172.10.9 datadir =/usr/local/mysql-cluster/data NodeGroup=0 [NDBD] NodeId =3 hostname =192.172.10.10 datadir =/usr/local/mysql-cluster/data NodeGroup=1 [NDBD] NodeId =4 hostname =192.172.10.11 datadir =/usr/local/mysql-cluster/data NodeGroup=0 [NDBD] NodeId =5 hostname =192.172.10.12 datadir =/usr/local/mysql-cluster/data NodeGroup=1 [mysqld] NodeId =6 hostname =192.172.10.9 [mysqld] NodeId =7 hostname =192.172.10.10 [mysqld] NodeId =8 hostname =192.172.10.11 [mysqld] NodeId =9 hostname =192.172.10.12 ''' Attempted and successfully reproduced!
[29 Apr 2024 23:05]
MySQL Verification Team
Please upload ndb_error_report output
[30 Apr 2024 12:31]
CunDi Fang
Okay, so despite some effort, I did manage to surface the bug in my environment and packaged up the log file, which will be uploaded later. In the meantime, I have an updated understanding of the bug.
Instead of just 2 nodes, he needs the synchronized synergy of 4 nodes.
Poc in node1:
```
ALTER TABLE mytest90.test5 ALTER COLUMN column2 SET DEFAULT 479;
```
result:
```
SQL executed successfully.
start time: 1714479495.92 end time:1714479495.426890
```
Poc in node2:
```
delete from mytest90.test5
where
EXISTS (
select
subq_0.c0 as c0,
subq_0.c1 as c1,
subq_0.c1 as c2,
mytest90.test5.column1 as c3,
case when (ref_0.column6 is NULL)
and (EXISTS (
select
ref_2.column1 as c0,
mytest90.test5.column2 as c1,
ref_0.column2 as c2,
ref_2.column1 as c3,
ref_2.column2 as c4
from
mytest90.test1 as ref_2
where false
limit 37)) then mytest90.test5.column2 else mytest90.test5.column2 end
as c4
from
mytest90.test3 as ref_0
right join (select
ref_1.column4 as c0,
mytest90.test5.column2 as c1
from
mytest90.test4 as ref_1
where ref_1.column5 is not NULL
limit 54) as subq_0 on (false)
where 9 is not NULL
limit 60);
```
result:
```
MySQL query failed: Lost connection to MySQL server during query
start time: 1714479495.18168 end time:1714479495.194724
```
Poc in node3:
```
ALTER TABLE mytest90.test5 RENAME COLUMN column2 TO column64;
```
result:
```
SQL executed successfully.
start time: 1714479495.106 end time:1714479495.775025
```
Poc in node4:
```
insert into mytest90.test5 values (
cast(nullif(cast(coalesce(cast(null as float),
cast(null as float)) as float),
cast(null as float)) as float),
26);
```
result:
```
SQL executed successfully.
start time: 1714479495.11671 end time:1714479495.16158
```
The bug is indeed reproducible, but it is very demanding in terms of time points and sequences.
[30 Apr 2024 12:32]
CunDi Fang
time line of this bug
Attachment: Figure_114777.png (image/png, text), 31.15 KiB.
[30 Apr 2024 13:30]
MySQL Verification Team
There are no log files in this archive. Check if you have log files. If you delete log files while server is already running it will not work, you need to shutdown all nodes, delete logs and then start cluster.
[7 May 2024 8:33]
MySQL Verification Team
No, this is not a complete log with crash data.
[7 May 2024 11:47]
CunDi Fang
I'm pretty sure I can reproduce the bug consistently with my cluster configuration and databases, plus at certain points in time in a certain chronological order, and then this zip is the one that I get from the "db_error_reporter /var/lib/mysqcluster/config.ini root " command. Be sure to note that sql statements are executed on the corresponding nodes in a specific order and at a specific point in time.
[8 May 2024 5:17]
MySQL Verification Team
No crash in the logs that I can see so cluster did not crash, can you upload logs from the SQL node you were connected to. Thanks
[8 May 2024 5:43]
CunDi Fang
Did you mean this one? ``` 2024-05-08T05:42:49Z UTC - mysqld got signal 11 ; Most likely, you have hit a bug, but this error can also be caused by malfunctioning hardware. BuildID[sha1]=1135254ad6e65d8565c7a1bb98c0a03b430d0f46 Thread pointer: 0x7f1068000fd0 Attempting backtrace. You can use the following information to find out where mysqld died. If you see no messages after this, something went terribly wrong... stack_bottom = 7f110c108b70 thread_stack 0x100000 /usr/local/mysql-cluster/bin/mysqld(my_print_stacktrace(unsigned char const*, unsigned long)+0x41) [0x2134dc1] /usr/local/mysql-cluster/bin/mysqld(print_fatal_signal(int)+0x2a2) [0xff07a2] /usr/local/mysql-cluster/bin/mysqld(handle_fatal_signal+0xa5) [0xff0955] /lib/x86_64-linux-gnu/libc.so.6(+0x42520) [0x7f11377ee520] /usr/local/mysql-cluster/bin/mysqld(ndb_pushed_builder_ctx::make_pushed_join()+0xaa) [0x2640c0a] /usr/local/mysql-cluster/bin/mysqld(ndbcluster_push_to_engine(THD*, AccessPath*, JOIN*)+0x264) [0x2623fb4] /usr/local/mysql-cluster/bin/mysqld(JOIN::push_to_engines()+0x71) [0xe5ec01] /usr/local/mysql-cluster/bin/mysqld(JOIN::optimize(bool)+0x1cf8) [0xe78978] /usr/local/mysql-cluster/bin/mysqld(Query_block::optimize(THD*, bool)+0xc1) [0xedac91] /usr/local/mysql-cluster/bin/mysqld(Query_expression::optimize(THD*, TABLE*, bool, bool)+0xad) [0xf5990d] /usr/local/mysql-cluster/bin/mysqld(Item_subselect::exec(THD*)+0x285) [0x120de95] /usr/local/mysql-cluster/bin/mysqld(Item_exists_subselect::val_bool()+0x38) [0x12095d8] /usr/local/mysql-cluster/bin/mysqld(Item_exists_subselect::val_int()+0x11) [0x12052e1] /usr/local/mysql-cluster/bin/mysqld(Sql_cmd_delete::delete_from_single_table(THD*)+0xdf1) [0x136aa81] /usr/local/mysql-cluster/bin/mysqld(Sql_cmd_dml::execute(THD*)+0x1e4) [0xee4ef4] /usr/local/mysql-cluster/bin/mysqld(mysql_execute_command(THD*, bool)+0xae7) [0xe80bc7] /usr/local/mysql-cluster/bin/mysqld(dispatch_sql_command(THD*, Parser_state*)+0x51b) [0xe843bb] /usr/local/mysql-cluster/bin/mysqld(dispatch_command(THD*, COM_DATA const*, enum_server_command)+0x2351) [0xe86d01] /usr/local/mysql-cluster/bin/mysqld(do_command(THD*)+0x15b) [0xe8787b] /usr/local/mysql-cluster/bin/mysqld() [0xfe09b8] /usr/local/mysql-cluster/bin/mysqld() [0x2848944] /lib/x86_64-linux-gnu/libc.so.6(+0x94ac3) [0x7f1137840ac3] /lib/x86_64-linux-gnu/libc.so.6(clone+0x44) [0x7f11378d1bf4] Trying to get some variables. Some pointers may be invalid and cause the dump to abort. Query (7f106800abd0): is an invalid pointer Connection ID (thread ID): 19 Status: NOT_KILLED The manual page at http://dev.mysql.com/doc/mysql/en/crashing.html contains information that should help you find out what is causing the crash. 2024-05-08T05:42:49.491645Z 0 [Warning] [MY-011070] [Server] 'Disabling symbolic links using --skip-symbolic-links (or equivalent) is the default. Consider not using this option as it' is deprecated and will be removed in a future release. 2024-05-08T05:42:49.491796Z 0 [System] [MY-010116] [Server] /usr/local/mysql-cluster/bin/mysqld (mysqld 8.0.35-cluster) starting as process 11357 2024-05-08T05:42:49.500615Z 1 [System] [MY-013576] [InnoDB] InnoDB initialization has started. 2024-05-08T05:42:49.784563Z 1 [System] [MY-013577] [InnoDB] InnoDB initialization has ended. 2024-05-08T05:42:49.877916Z 0 [ERROR] [MY-010901] [Server] Can't open shared library '/usr/local/mysql-cluster/lib/plugin/' (errno: 22 /usr/local/mysql-cluster/lib/plugin/: cannot read file data: Is a directory). 2024-05-08T05:42:49.877950Z 0 [Warning] [MY-010736] [Server] Couldn't load plugin named '' with soname ''. 2024-05-08T05:42:49.879457Z 0 [System] [MY-010866] [NDB] Metadata: Initialization ```
[8 May 2024 6:26]
MySQL Verification Team
Bug #114850 is marked as duplicate of this one

Description: Hello, I found a bug in 8.0.35-cluster version of MYSQL cluster. It will cause the current mysql service to crash. The detail is as follow. OS version and name: Ubuntu 22.04.3 LTS (Jammy Jellyfish) Linux eb1f47b08982 6.5.11-8-pve #1 SMP PREEMPT_DYNAMIC PMX 6.5.11-8 (2024-01-30T12:27Z) x86_64 x86_64 x86_64 GNU/Linux I start by executing the alter statement on a node: Poc 1: ``` ALTER TABLE mytest90.test5 ALTER COLUMN column2 SET DEFAULT 479; ``` While this statement is in the process of executing, note that it is in the process of executing, not at the end or not yet started, the following delete statement is executed on another node. Poc 2: ``` delete from mytest90.test5 where EXISTS ( select subq_0.c0 as c0, subq_0.c1 as c1, subq_0.c1 as c2, mytest90.test5.column1 as c3, case when (ref_0.column6 is NULL) and (EXISTS ( select ref_2.column1 as c0, mytest90.test5.column2 as c1, ref_0.column2 as c2, ref_2.column1 as c3, ref_2.column2 as c4 from mytest90.test1 as ref_2 where false limit 37)) then mytest90.test5.column2 else mytest90.test5.column2 end as c4 from mytest90.test3 as ref_0 right join (select ref_1.column4 as c0, mytest90.test5.column2 as c1 from mytest90.test4 as ref_1 where ref_1.column5 is not NULL limit 54) as subq_0 on (false) where 9 is not NULL limit 60); ``` This time it was found that the mysql service of the node that tried to execute the delete statement crashed: failed:Lost connection to MySQL server during query How to repeat: Importing the database file I give later, and then executing Poc Suggested fix: It should be the process of the alter statement, a series of locking operations, resulting in the delete statement can not be executed properly, but why it will cause the mysql service crash, this I have no clue at the moment.