Bug #114777 | lost connection in MySQL NDB cluster | ||
---|---|---|---|
Submitted: | 25 Apr 7:09 | Modified: | 8 May 6:26 |
Reporter: | CunDi Fang | Email Updates: | |
Status: | Verified | Impact on me: | |
Category: | MySQL Cluster: Cluster (NDB) storage engine | Severity: | S2 (Serious) |
Version: | 8.0.35-cluster MySQL Cluster Community S | OS: | Any (20.04) |
Assigned to: | CPU Architecture: | Any |
[25 Apr 7:09]
CunDi Fang
[25 Apr 7:58]
CunDi Fang
Architecture Information: ''' [NDBD DEFAULT] NoOfReplicas =2 DataMemory = 512M IndexMemory = 64M [NDB_MGMD] NodeId=1 hostname =192.172.10.8 datadir =/var/lib/mysql-cluster [NDBD] NodeId =2 hostname =192.172.10.9 datadir =/usr/local/mysql-cluster/data NodeGroup=0 [NDBD] NodeId =3 hostname =192.172.10.10 datadir =/usr/local/mysql-cluster/data NodeGroup=1 [NDBD] NodeId =4 hostname =192.172.10.11 datadir =/usr/local/mysql-cluster/data NodeGroup=0 [NDBD] NodeId =5 hostname =192.172.10.12 datadir =/usr/local/mysql-cluster/data NodeGroup=1 [mysqld] NodeId =6 hostname =192.172.10.9 [mysqld] NodeId =7 hostname =192.172.10.10 [mysqld] NodeId =8 hostname =192.172.10.11 [mysqld] NodeId =9 hostname =192.172.10.12 ''' Attempted and successfully reproduced!
[29 Apr 23:05]
MySQL Verification Team
Please upload ndb_error_report output
[30 Apr 12:31]
CunDi Fang
Okay, so despite some effort, I did manage to surface the bug in my environment and packaged up the log file, which will be uploaded later. In the meantime, I have an updated understanding of the bug. Instead of just 2 nodes, he needs the synchronized synergy of 4 nodes. Poc in node1: ``` ALTER TABLE mytest90.test5 ALTER COLUMN column2 SET DEFAULT 479; ``` result: ``` SQL executed successfully. start time: 1714479495.92 end time:1714479495.426890 ``` Poc in node2: ``` delete from mytest90.test5 where EXISTS ( select subq_0.c0 as c0, subq_0.c1 as c1, subq_0.c1 as c2, mytest90.test5.column1 as c3, case when (ref_0.column6 is NULL) and (EXISTS ( select ref_2.column1 as c0, mytest90.test5.column2 as c1, ref_0.column2 as c2, ref_2.column1 as c3, ref_2.column2 as c4 from mytest90.test1 as ref_2 where false limit 37)) then mytest90.test5.column2 else mytest90.test5.column2 end as c4 from mytest90.test3 as ref_0 right join (select ref_1.column4 as c0, mytest90.test5.column2 as c1 from mytest90.test4 as ref_1 where ref_1.column5 is not NULL limit 54) as subq_0 on (false) where 9 is not NULL limit 60); ``` result: ``` MySQL query failed: Lost connection to MySQL server during query start time: 1714479495.18168 end time:1714479495.194724 ``` Poc in node3: ``` ALTER TABLE mytest90.test5 RENAME COLUMN column2 TO column64; ``` result: ``` SQL executed successfully. start time: 1714479495.106 end time:1714479495.775025 ``` Poc in node4: ``` insert into mytest90.test5 values ( cast(nullif(cast(coalesce(cast(null as float), cast(null as float)) as float), cast(null as float)) as float), 26); ``` result: ``` SQL executed successfully. start time: 1714479495.11671 end time:1714479495.16158 ``` The bug is indeed reproducible, but it is very demanding in terms of time points and sequences.
[30 Apr 12:32]
CunDi Fang
time line of this bug
Attachment: Figure_114777.png (image/png, text), 31.15 KiB.
[30 Apr 13:30]
MySQL Verification Team
There are no log files in this archive. Check if you have log files. If you delete log files while server is already running it will not work, you need to shutdown all nodes, delete logs and then start cluster.
[7 May 8:33]
MySQL Verification Team
No, this is not a complete log with crash data.
[7 May 11:47]
CunDi Fang
I'm pretty sure I can reproduce the bug consistently with my cluster configuration and databases, plus at certain points in time in a certain chronological order, and then this zip is the one that I get from the "db_error_reporter /var/lib/mysqcluster/config.ini root " command. Be sure to note that sql statements are executed on the corresponding nodes in a specific order and at a specific point in time.
[8 May 5:17]
MySQL Verification Team
No crash in the logs that I can see so cluster did not crash, can you upload logs from the SQL node you were connected to. Thanks
[8 May 5:43]
CunDi Fang
Did you mean this one? ``` 2024-05-08T05:42:49Z UTC - mysqld got signal 11 ; Most likely, you have hit a bug, but this error can also be caused by malfunctioning hardware. BuildID[sha1]=1135254ad6e65d8565c7a1bb98c0a03b430d0f46 Thread pointer: 0x7f1068000fd0 Attempting backtrace. You can use the following information to find out where mysqld died. If you see no messages after this, something went terribly wrong... stack_bottom = 7f110c108b70 thread_stack 0x100000 /usr/local/mysql-cluster/bin/mysqld(my_print_stacktrace(unsigned char const*, unsigned long)+0x41) [0x2134dc1] /usr/local/mysql-cluster/bin/mysqld(print_fatal_signal(int)+0x2a2) [0xff07a2] /usr/local/mysql-cluster/bin/mysqld(handle_fatal_signal+0xa5) [0xff0955] /lib/x86_64-linux-gnu/libc.so.6(+0x42520) [0x7f11377ee520] /usr/local/mysql-cluster/bin/mysqld(ndb_pushed_builder_ctx::make_pushed_join()+0xaa) [0x2640c0a] /usr/local/mysql-cluster/bin/mysqld(ndbcluster_push_to_engine(THD*, AccessPath*, JOIN*)+0x264) [0x2623fb4] /usr/local/mysql-cluster/bin/mysqld(JOIN::push_to_engines()+0x71) [0xe5ec01] /usr/local/mysql-cluster/bin/mysqld(JOIN::optimize(bool)+0x1cf8) [0xe78978] /usr/local/mysql-cluster/bin/mysqld(Query_block::optimize(THD*, bool)+0xc1) [0xedac91] /usr/local/mysql-cluster/bin/mysqld(Query_expression::optimize(THD*, TABLE*, bool, bool)+0xad) [0xf5990d] /usr/local/mysql-cluster/bin/mysqld(Item_subselect::exec(THD*)+0x285) [0x120de95] /usr/local/mysql-cluster/bin/mysqld(Item_exists_subselect::val_bool()+0x38) [0x12095d8] /usr/local/mysql-cluster/bin/mysqld(Item_exists_subselect::val_int()+0x11) [0x12052e1] /usr/local/mysql-cluster/bin/mysqld(Sql_cmd_delete::delete_from_single_table(THD*)+0xdf1) [0x136aa81] /usr/local/mysql-cluster/bin/mysqld(Sql_cmd_dml::execute(THD*)+0x1e4) [0xee4ef4] /usr/local/mysql-cluster/bin/mysqld(mysql_execute_command(THD*, bool)+0xae7) [0xe80bc7] /usr/local/mysql-cluster/bin/mysqld(dispatch_sql_command(THD*, Parser_state*)+0x51b) [0xe843bb] /usr/local/mysql-cluster/bin/mysqld(dispatch_command(THD*, COM_DATA const*, enum_server_command)+0x2351) [0xe86d01] /usr/local/mysql-cluster/bin/mysqld(do_command(THD*)+0x15b) [0xe8787b] /usr/local/mysql-cluster/bin/mysqld() [0xfe09b8] /usr/local/mysql-cluster/bin/mysqld() [0x2848944] /lib/x86_64-linux-gnu/libc.so.6(+0x94ac3) [0x7f1137840ac3] /lib/x86_64-linux-gnu/libc.so.6(clone+0x44) [0x7f11378d1bf4] Trying to get some variables. Some pointers may be invalid and cause the dump to abort. Query (7f106800abd0): is an invalid pointer Connection ID (thread ID): 19 Status: NOT_KILLED The manual page at http://dev.mysql.com/doc/mysql/en/crashing.html contains information that should help you find out what is causing the crash. 2024-05-08T05:42:49.491645Z 0 [Warning] [MY-011070] [Server] 'Disabling symbolic links using --skip-symbolic-links (or equivalent) is the default. Consider not using this option as it' is deprecated and will be removed in a future release. 2024-05-08T05:42:49.491796Z 0 [System] [MY-010116] [Server] /usr/local/mysql-cluster/bin/mysqld (mysqld 8.0.35-cluster) starting as process 11357 2024-05-08T05:42:49.500615Z 1 [System] [MY-013576] [InnoDB] InnoDB initialization has started. 2024-05-08T05:42:49.784563Z 1 [System] [MY-013577] [InnoDB] InnoDB initialization has ended. 2024-05-08T05:42:49.877916Z 0 [ERROR] [MY-010901] [Server] Can't open shared library '/usr/local/mysql-cluster/lib/plugin/' (errno: 22 /usr/local/mysql-cluster/lib/plugin/: cannot read file data: Is a directory). 2024-05-08T05:42:49.877950Z 0 [Warning] [MY-010736] [Server] Couldn't load plugin named '' with soname ''. 2024-05-08T05:42:49.879457Z 0 [System] [MY-010866] [NDB] Metadata: Initialization ```
[8 May 6:26]
MySQL Verification Team
Bug #114850 is marked as duplicate of this one