Bug #56299 | mutex deadlock with COM_BINLOG_DUMP, binlog purge, and processlist/kill | ||
---|---|---|---|
Submitted: | 26 Aug 2010 16:16 | Modified: | 7 Nov 2011 17:24 |
Reporter: | Shane Bester (Platinum Quality Contributor) | Email Updates: | |
Status: | Closed | Impact on me: | |
Category: | MySQL Server: Locking | Severity: | S1 (Critical) |
Version: | 5.1.50,5.5.5 | OS: | Any |
Assigned to: | Sven Sandberg | CPU Architecture: | Any |
[26 Aug 2010 16:16]
Shane Bester
[27 Aug 2010 5:25]
MySQL Verification Team
testcase. run against a master with --max-binlog-size=4k. slave will reconnect a few times due to kill. hangup in a few seconds.
Attachment: bug56299.c (text/plain), 6.35 KiB.
[27 Aug 2010 5:31]
MySQL Verification Team
testcase will not work against 5.0 since it uses information_schema.processlist. master is running like this: mysqld --no-defaults --console --skip-gr --skip-na --log-bin --expire-logs-days=1 --server-id=1 --max-binlog-size=4096 slave is running like this: mysqld --no-defaults --server-id=2 --port=3307 --skip-gr --skip-na --console make sure the slave binlog dump thread is killed many times, and slave reconnects by itself.
[5 Nov 2010 17:00]
Sven Sandberg
The deadlock happens when the following four things happen at the same time: - An old dump thread is waiting for the binlog to grow. - The slave server that replicates from the old dump thread tries to reconnect. During reconnection, the new dump thread tries to kill the old dump thread. - A KILL statement tries to kill the old dump thread. - An INSERT statement causes a binlog rotation. The following lists the stacks for the four threads. Control flow go from top to bottom in the list, so the stack top is in the bottom. Lines that either do function calls or acquire a lock are listed. old dump thread: sql_parse:1440:dispatch_command sql_repl.cc:668:mysql_binlog_send log.cc:4879:MYSQL_BIN_LOG::wait_for_update This thread is waiting without holding any mutexes. It has just set THD::mysys_var::current_mutex = LOCK_log INSERT thread: sql_insert.cc:921:mysql_insert sql_class.cc:3913:THD::binlog_query log.cc:4263:MYSQL_BIN_LOG::write LOCK_log * log.cc:4411:MYSQL_BIN_LOG::write log.cc:4524:MYSQL_BIN_LOG::rotate_and_purge log.cc:3631:MYSQL_BIN_LOG::purge_logs_before_date LOCK_index log.cc:3639:MYSQL_BIN_LOG::purge_logs_before_date sql_repl.cc:215:log_in_use LOCK_thread_count * COM_BINLOG_DUMP thread: sql_parse.cc:1435:dispatch_command sql_repl.cc:1096:kill_zombie_dump_threads LOCK_thread_count * sql_repl.cc:1106:kill_zombie_dump_threads old_dump_thd.LOCK_thd_data * KILL thread: sql_parse.cc:4031:mysql_execute_command sql_parse.cc:7063:sql_kill sql_parse.cc:7014:kill_one_thread old_dump_thd.LOCK_thd_data * sql_parse.cc:7038:kill_one_thread sql_class.cc:1092:old_dump_thd.awake old_dump_thd.mysys_var.mutex sql_class.cc:1117:old_dump_thd.awake old_dump_thd.mysys_var.current_mutex = LOCK_log * Deadlock summary: - INSERT takes LOCK_log before LOCK_thread_count - COM_BINLOG_DUMP takes LOCK_thread_count before old_dump_thd.LOCK_thd_data - KILL takes old_dump_thd.LOCK_thd_data before LOCK_log Which thread has wrong lock order? - It's probably necessary that kill_zombie_dump_threads takes LOCK_thread_count before old_dump_thd.LOCK_thd_data, because that prevents old_dump_thd from being deleted under the feet of kill_zombie_dump_threads. - KILL is a generic statement that probably has a reason to take locks in this order. - So the lock order should be: LOCK_thread_count, THD::LOCK_thd_data, LOCK_log So it's the log rotation that violates lock order by taking LOCK_log before LOCK_thread count. And in fact, it seems unnecessary to block creating or destroying threads just because we rotate the log. We can fix the bug as follows: - Associate a counter with each binlog file. - Increase and decrease the counter as dump threads start or stop reading a log. - In log_in_use, just read the counter instead of iterating through all active threads in the system. - Protect the counters with LOCK_index. - Enforce this lock order: LOCK_thread_count, THD::LOCK_thd_data, LOCK_log, LOCK_index It seems that LOCK_index is always held anyways in all places when we need to access the counters, so we are only removing complexity, not adding complexity.
[7 Nov 2011 17:24]
Jon Stephens
Fixed in 5.6.4. Closed.
[7 Nov 2011 18:25]
Paul DuBois
Changelog entry: Deadlock could occur when these four things happened at the same time: 1) An old dump thread was waiting for the binary log to grow. 2) The slave server that replicates from the old dump thread tried to reconnect. During reconnection, the new dump thread tried to kill the old dump thread. 3) A KILL statement tried to kill the old dump thread. 4) An INSERT statement caused a binary log rotation.
[28 Mar 2012 11:08]
Simon Mudd
Even though 5.6 may be out "soon", 5.5 is GA. So how can you close this ticket when you fix the problem in the current "development version", without commenting on the problem that's reported in the GA MySQL 5.5 version? You even mark this as S1: Critical...
[28 Mar 2012 11:30]
MySQL Verification Team
I checked the code, and it really looks fixed in 5.5.19. Patch is visible here: http://lists.mysql.com/commits/141642 And listed here: http://dev.mysql.com/doc/refman/5.5/en/news-5-5-19.html
[29 Mar 2012 5:42]
Simon Mudd
Shane, thanks for checking and confirming that this is fixed in the current GA (5.5) version. That missing info is important and it is good it's now provided in this bug report.