MySQL Bugs: #114391: Time out bug in mysql NDB cluster

Bug #114391	Time out bug in mysql NDB cluster
Submitted:	18 Mar 2024 13:24	Modified:	15 Apr 2024 12:52
Reporter:	CunDi Fang
Status:	Duplicate
Category:	MySQL Cluster: Cluster (NDB) storage engine	Severity:	S2 (Serious)
Version:	8.0.35-cluster MySQL Cluster Community S	OS:	Any (20.04)
Assigned to:		CPU Architecture:	Any
Tags:	lock, Stuck, timeout

Description:
Hello, I found a timeout bug in the 8.0.35-cluster version of MySQL Cluster. The details are as follows.

OS version and name:
Ubuntu 22.04.3 LTS (Jammy Jellyfish)
Linux eb1f47b08982 6.5.11-8-pve #1 SMP PREEMPT_DYNAMIC PMX 6.5.11-8 (2024-01-30T12:27Z) x86_64 x86_64 x86_64 GNU/Linux

PoC:
'''
I will attach additional files to the bug after it has been opened.
'''

GDB Trace:
'''
#0  Join_nest::get_inner_nest () at ../../../mysql-cluster-gpl-8.0.35/storage/ndb/plugin/ha_query_plan.cc:971
#1  0x0000000002670031 in pushed_table::get_full_inner_nest () at ../../../mysql-cluster-gpl-8.0.35/storage/ndb/plugin/ha_query_plan.cc:1221
#2  0x000000000263fad5 in ndb_pushed_builder_ctx::is_pushable_with_root ()
    at ../../../mysql-cluster-gpl-8.0.35/storage/ndb/plugin/ha_ndbcluster_push.cc:884
#3  ndb_pushed_builder_ctx::is_pushable_with_root () at ../../../mysql-cluster-gpl-8.0.35/storage/ndb/plugin/ha_ndbcluster_push.cc:790
#4  0x0000000002640ac1 in ndb_pushed_builder_ctx::make_pushed_join ()
    at ../../../mysql-cluster-gpl-8.0.35/storage/ndb/plugin/ha_ndbcluster_push.cc:622
#5  0x0000000002640bcb in ndb_pushed_builder_ctx::make_pushed_join ()
    at ../../../mysql-cluster-gpl-8.0.35/storage/ndb/plugin/ha_ndbcluster_push.cc:660
#6  0x0000000002623fb4 in ndbcluster_push_to_engine () at ../../../mysql-cluster-gpl-8.0.35/storage/ndb/plugin/ha_ndbcluster.cc:14361
#7  0x0000000000e5ec01 in JOIN::push_to_engines () at ../../mysql-cluster-gpl-8.0.35/sql/sql_optimizer.cc:1148
#8  0x0000000000e78978 in JOIN::optimize () at ../../mysql-cluster-gpl-8.0.35/sql/sql_optimizer.cc:1062
#9  0x0000000000edac91 in Query_block::optimize () at ../../mysql-cluster-gpl-8.0.35/sql/sql_select.cc:2013
#10 0x0000000000f5990d in Query_expression::optimize () at ../../mysql-cluster-gpl-8.0.35/sql/sql_union.cc:1006
#11 0x0000000000ed9bb4 in Sql_cmd_dml::execute_inner () at ../../mysql-cluster-gpl-8.0.35/sql/sql_select.cc:1007
#12 0x0000000000ee4ef4 in Sql_cmd_dml::execute () at ../../mysql-cluster-gpl-8.0.35/sql/sql_select.cc:793
#13 0x0000000000e80bc7 in mysql_execute_command () at ../../mysql-cluster-gpl-8.0.35/sql/sql_parse.cc:4719
#14 0x0000000000e843bb in dispatch_sql_command () at ../../mysql-cluster-gpl-8.0.35/sql/sql_parse.cc:5368
#15 0x0000000000e86d01 in dispatch_command () at ../../mysql-cluster-gpl-8.0.35/sql/sql_parse.cc:2054
#16 0x0000000000e8787b in do_command () at ../../mysql-cluster-gpl-8.0.35/sql/sql_parse.cc:1439
#17 0x0000000000fe09b8 in handle_connection () at ../../mysql-cluster-gpl-8.0.35/sql/conn_handler/connection_handler_per_thread.cc:302
#18 0x0000000002848944 in pfs_spawn_thread () at ../../../mysql-cluster-gpl-8.0.35/storage/perfschema/pfs.cc:3042
#19 0x00007f52a9c40ac3 in start_thread (arg=<optimized out>) at ./nptl/pthread_create.c:442
#20 0x00007f52a9cd1bf4 in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:100
'''
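
To make the trace above easier to read, here is a minimal sketch that extracts the frames and keeps only those under storage/ndb/plugin/. The helper names (parse_backtrace, frames_under) are hypothetical and not part of GDB or MySQL, and the sample is a shortened copy of the trace. It assumes simple function names without spaces, which holds for this trace but not for every C++ backtrace.

```python
import re

# Matches GDB frame lines such as:
#   #1  0x0000000002670031 in some::func () at path/to/file.cc:1221
# The address part is optional because frame #0 has none.
FRAME_RE = re.compile(
    r"^#(\d+)\s+(?:0x[0-9a-fA-F]+ in )?(\S+) \(.*?\)\s+at (\S+):(\d+)$"
)


def parse_backtrace(text):
    """Return a list of (frame_no, function, source_file, line) tuples."""
    joined = []
    for raw in text.splitlines():
        if raw.startswith("#"):
            joined.append(raw.strip())
        elif raw.strip() and joined:
            # GDB wraps long frames; glue the continuation onto the frame line.
            joined[-1] += " " + raw.strip()
    frames = []
    for line in joined:
        m = FRAME_RE.match(line)
        if m:
            frames.append((int(m.group(1)), m.group(2), m.group(3), int(m.group(4))))
    return frames


def frames_under(frames, path_fragment):
    """Keep only frames whose source path contains path_fragment."""
    return [f for f in frames if path_fragment in f[2]]


# Shortened copy of the trace from this report (frames #0-#2 and #7).
SAMPLE = """\
#0  Join_nest::get_inner_nest () at ../../../mysql-cluster-gpl-8.0.35/storage/ndb/plugin/ha_query_plan.cc:971
#1  0x0000000002670031 in pushed_table::get_full_inner_nest () at ../../../mysql-cluster-gpl-8.0.35/storage/ndb/plugin/ha_query_plan.cc:1221
#2  0x000000000263fad5 in ndb_pushed_builder_ctx::is_pushable_with_root ()
    at ../../../mysql-cluster-gpl-8.0.35/storage/ndb/plugin/ha_ndbcluster_push.cc:884
#7  0x0000000000e5ec01 in JOIN::push_to_engines () at ../../mysql-cluster-gpl-8.0.35/sql/sql_optimizer.cc:1148
"""

frames = parse_backtrace(SAMPLE)
ndb_frames = frames_under(frames, "storage/ndb/plugin/")
for no, func, src, line in ndb_frames:
    print(no, func, src.rsplit("/", 1)[-1], line)
```

In the full trace, frames #0 through #6 are in the NDB plugin's pushed-join code, entered from JOIN::push_to_engines (). Taking several backtraces a few seconds apart and comparing the innermost frames is one way to check whether the thread is spinning there rather than blocked.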

Architecture Information:
'''
[NDBD DEFAULT]
NoOfReplicas =2
DataMemory = 512M
IndexMemory = 64M

[NDB_MGMD]
NodeId=1
hostname =192.172.10.8
datadir =/var/lib/mysql-cluster

[NDBD]
NodeId =2
hostname =192.172.10.9
datadir =/usr/local/mysql-cluster/data
NodeGroup=0
[NDBD]
NodeId =3
hostname =192.172.10.10
datadir =/usr/local/mysql-cluster/data
NodeGroup=1
[NDBD]
NodeId =4
hostname =192.172.10.11
datadir =/usr/local/mysql-cluster/data
NodeGroup=0
[NDBD]
NodeId =5
hostname =192.172.10.12
datadir =/usr/local/mysql-cluster/data
NodeGroup=1

[mysqld]
NodeId =6
hostname =192.172.10.9
[mysqld]
NodeId =7
hostname =192.172.10.10
[mysqld]
NodeId =8
hostname =192.172.10.11
[mysqld]
NodeId =9
hostname =192.172.10.12
'''
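
As a quick sanity check of the topology above, here is a minimal sketch that groups the data nodes by NodeGroup and compares each group's size with NoOfReplicas. The function names (parse_cluster_config, node_groups) are hypothetical, and the sample text is a shortened copy of the configuration in this report. The parser is hand-written because the [NDBD] and [mysqld] section names repeat, which Python's configparser rejects by default.

```python
from collections import defaultdict


def parse_cluster_config(text):
    """Parse config.ini-style text into a list of (section, {key: value}) pairs.

    Section names are upper-cased and keys lower-cased so lookups are
    case-insensitive. Repeated sections are kept as separate entries.
    """
    sections = []
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        if line.startswith("[") and line.endswith("]"):
            sections.append((line[1:-1].strip().upper(), {}))
        elif "=" in line and sections:
            key, value = line.split("=", 1)
            sections[-1][1][key.strip().lower()] = value.strip()
    return sections


def node_groups(sections):
    """Map each NodeGroup number to the list of data node IDs assigned to it."""
    groups = defaultdict(list)
    for name, opts in sections:
        if name == "NDBD" and "nodegroup" in opts:
            groups[int(opts["nodegroup"])].append(int(opts["nodeid"]))
    return dict(groups)


# Shortened copy of the configuration from this report (data nodes only).
SAMPLE = """\
[NDBD DEFAULT]
NoOfReplicas =2

[NDBD]
NodeId =2
NodeGroup=0
[NDBD]
NodeId =3
NodeGroup=1
[NDBD]
NodeId =4
NodeGroup=0
[NDBD]
NodeId =5
NodeGroup=1
"""

sections = parse_cluster_config(SAMPLE)
replicas = int(dict(sections)["NDBD DEFAULT"]["noofreplicas"])
groups = node_groups(sections)
print(groups)  # {0: [2, 4], 1: [3, 5]}
print(all(len(ids) == replicas for ids in groups.values()))  # True
```

The four data nodes form two node groups of two nodes each, which matches NoOfReplicas = 2. Each data node host (192.172.10.9 to 192.172.10.12) also runs one SQL node.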

Attempted and successfully reproduced!

How to repeat:
Simply execute the PoC to trigger it.

The stuck query appears in the "show processlist" output as follows:
'''
MySQL root@(none):(none)> show processlist
+----+-----------------+-----------+--------+---------+-------+-----------------------------------+-----------------------------------------------------------------------------------------------------------+
| Id | User            | Host      | db     | Command | Time  | State                             | Info                                                                                                      |
+----+-----------------+-----------+--------+---------+-------+-----------------------------------+-----------------------------------------------------------------------------------------------------------+
| 2  | system user     |           |        | Daemon  | 0     | Waiting for event from ndbcluster | <null>                                                                                                    |
| 6  | event_scheduler | localhost | <null> | Daemon  | 19680 | Waiting on empty queue            | <null>                                                                                                    |
| 12 | root            | localhost | <null> | Killed  | 3907  | preparing                         | select\n  ref_2.column4 as c0,\n  ref_8.column3 as c1,\n  case when EXISTS (\n      select\n          sub |
| 17 | root            | localhost | <null> | Sleep   | 14    |                                   | <null>                                                                                                    |
| 18 | root            | localhost | <null> | Query   | 0     | init                              | show processlist                                                                                          |
+----+-----------------+-----------+--------+---------+-------+-----------------------------------+-----------------------------------------------------------------------------------------------------------+

5 rows in set
Time: 0.018s
'''
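
To spot sessions like Id 12 in the listing above, here is a minimal sketch that filters processlist rows by command and elapsed time. The function name stuck_sessions and the 600-second threshold are assumptions chosen for illustration. The rows are copied from the listing and reduced to the relevant columns; in practice they would come from a client library running "show processlist".

```python
def stuck_sessions(rows, min_seconds=600):
    """Return rows that are neither idle nor background threads and have
    been in their current state for at least min_seconds."""
    skip = {"Daemon", "Sleep"}
    return [
        r for r in rows
        if r["Command"] not in skip and r["Time"] >= min_seconds
    ]


# Rows copied from the processlist output in this report.
ROWS = [
    {"Id": 2, "Command": "Daemon", "Time": 0, "State": "Waiting for event from ndbcluster"},
    {"Id": 6, "Command": "Daemon", "Time": 19680, "State": "Waiting on empty queue"},
    {"Id": 12, "Command": "Killed", "Time": 3907, "State": "preparing"},
    {"Id": 17, "Command": "Sleep", "Time": 14, "State": ""},
    {"Id": 18, "Command": "Query", "Time": 0, "State": "init"},
]

for row in stuck_sessions(ROWS):
    print(row["Id"], row["Command"], row["State"], row["Time"])
# prints: 12 Killed preparing 3907
```

Session 12 is marked Killed but is still in the "preparing" state after 3907 seconds. That is consistent with a thread that never reaches a point where it checks its kill flag, although the processlist alone does not prove it.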

Suggested fix:
The problem appears to be in the optimizer. The GDB trace shows the thread inside the NDB pushed-join code called from JOIN::optimize (), and it looks like an infinite loop. I suggest checking that code path. I can provide the database as it was at the time if you need it.

Hi,

I cannot reproduce this with the available data. Can you give us a full test case with all the required create statements and data?

Thanks

I have uploaded the PoC, the create statements, and the data. Let me know if there is anything else you need.

Thanks for the data. I managed to reproduce the problem.

Duplicate of Bug#114464