Bug #81166 ndb_binlog_ddl_multi deadlock when shutting down mysqld
Submitted: 21 Apr 2016 5:55 Modified: 3 May 2016 11:00
Reporter: Magnus Blåudd Email Updates:
Status: Closed Impact on me:
None 
Category:Tests Severity:S7 (Test Cases)
Version:7.5.2 OS:Any
Assigned to: CPU Architecture:Any

[21 Apr 2016 5:55] Magnus Blåudd
Description:
The ndb_binlog_ddl_multi test fails somewhat randomly but still frequently with error:

"mysqltest: In included file ./include/restart_mysqld.inc at line 35: included from /export/home2/pb2/test/sb_2-18627569-1461150834.67/mysql-cluster-gpl-7.5.2-linux-i686/mysql-test/suite/ndb_binlog/t/ndb_binlog_ddl_multi.test at line 319: At line 35: command "shutdown_server" failed with error 2. my_errno=175"

This error occurs when the "shutdown_server" command in the test fails to shutdown the mysqld before starting it up again. When this error occurs it's deteced by mtr.pl who goes in and forcefully kills all process in order to get a backtrace of what they are doing(ie. why they don't react). In the resulting stacktrace which is printed to the log one can clearly see how to threads have deadlocked. One thread is in THD::set_open_tables() trying to lock a mutex while the signal handler trying to shutdown the mysqld is waiting for that mutex to be released(or at least to access the same mutex and the list it protects).

How to repeat:
Analyze logs from "mtr ndb_binlog_ddl_multi" and notice the to threads:

Thread 36 (Thread 0x8c497b70 (LWP 28944)):
#3  0x08aa43f1 in my_cond_wait (cond=0xc4b91a0, mp=0xc4b91d4) at mysql-cluster-gpl-7.5.2/include/thr_cond.h:191
#4  0x08aa468f in inline_mysql_cond_wait (that=0xc4b91a0, mutex=0xc4b91d4, src_file=0x9b55360 "mysql-cluster-gpl-7.5.2/sql/mysqld_thd_manager.cc", src_line=238) at mysql-cluster-gpl-7.5.2/include/mysql/psi/mysql_thread.h:1184
#5  0x08aa4fed in Global_THD_manager::wait_till_no_thd (this=0xc4b7a10) at mysql-cluster-gpl-7.5.2/sql/mysqld_thd_manager.cc:238
#6  0x08a925fd in close_connections () at mysql-cluster-gpl-7.5.2/sql/mysqld.cc:1058
#7  0x08a942e8 in signal_hand (arg=0x0) at mysql-cluster-gpl-7.5.2/sql/mysqld.cc:2135

Thread 20 (Thread 0x90237b70 (LWP 28924)):
#4  0x09062bc0 in native_mutex_lock (mutex=0x95708d88) at mysql-cluster-gpl-7.5.2/include/thr_mutex.h:84
#5  0x09062c37 in my_mutex_lock (mp=0x95708d88) at mysql-cluster-gpl-7.5.2/include/thr_mutex.h:182
#6  0x09062e21 in inline_mysql_mutex_lock (that=0x95708d88, src_file=0x9cbdbc4 "mysql-cluster-gpl-7.5.2/sql/sql_class.h", src_line=4179) at mysql-cluster-gpl-7.5.2/include/mysql/psi/mysql_thread.h:715
#7  0x090768df in THD::set_open_tables (this=0x957085e8, open_tables_arg=0x9574c770) at mysql-cluster-gpl-7.5.2/sql/sql_class.h:4179
#8  0x090694b9 in open_table (thd=0x957085e8, table_list=0x9023675c, ot_ctx=0x9023663c) at mysql-cluster-gpl-7.5.2/sql/sql_base.cc:3527
#9  0x0906ba73 in open_and_process_table (thd=0x957085e8, lex=0x95709b80, tables=0x9023675c, counter=0x902366d8, flags=2048, prelocking_strategy=0x9023670c, has_prelocking_list=false, ot_ctx=0x9023663c) at mysql-cluster-gpl-7.5.2/sql/sql_base.cc:5052
#10 0x0906c915 in open_tables (thd=0x957085e8, start=0x902366f4, counter=0x902366d8, flags=2048, prelocking_strategy=0x9023670c) at mysql-cluster-gpl-7.5.2/sql/sql_base.cc:5663
#11 0x0906d8eb in open_and_lock_tables (thd=0x957085e8, tables=0x9023675c, flags=2048, prelocking_strategy=0x9023670c) at mysql-cluster-gpl-7.5.2/sql/sql_base.cc:6367
#12 0x08ac3d59 in open_and_lock_tables (thd=0x957085e8, tables=0x9023675c, flags=2048) at mysql-cluster-gpl-7.5.2/sql/sql_base.h:477
#13 0x094dba96 in ndb_binlog_index_table__open (thd=0x957085e8, ndb_binlog_index=0x90236c2c) at mysql-cluster-gpl-7.5.2/sql/ha_ndbcluster_binlog.cc:4022
#14 0x094dbc9e in ndb_binlog_index_table__write_rows (thd=0x957085e8, row=0x90237018) at mysql-cluster-gpl-7.5.2/sql/ha_ndbcluster_binlog.cc:4063
#15 0x094e5013 in Ndb_binlog_thread::do_run (this=0xa7f84a0) at mysql-cluster-gpl-7.5.2/sql/ha_ndbcluster_binlog.cc:7258

Suggested fix:
The second thread is clearly waiting to write to the mysql.ndb_binlog_index table, trying to lock a mutex. Perhaps it need to check thd->killed or be registered in some list so that it's properly woken up?
[3 May 2016 11:00] Jon Stephens
Documented fix in the NDB 7.5.2 changelog, as follows:

    When a write to the ndb_binlog_index table failed during a MySQL
    Server shutdown, mysqld killed the ndb binary logging thread.

Closed.