Bug #68569 | deadlock with group commit | ||
---|---|---|---|
Submitted: | 5 Mar 2013 4:31 | Modified: | 28 May 2013 6:56 |
Reporter: | Mark Callaghan | Email Updates: | |
Status: | Closed | Impact on me: | |
Category: | MySQL Server: Replication | Severity: | S1 (Critical) |
Version: | 5.6.10 | OS: | Any |
Assigned to: | Bill Qu | CPU Architecture: | Any |
[5 Mar 2013 4:31]
Mark Callaghan
[5 Mar 2013 4:32]
Mark Callaghan
Show processlist: +------+------+----------------------------------+------+---------+-------+-------------+------------------------------------------------------------------------------------------+ | Id | User | Host | db | Command | Time | State | Info | +------+------+----------------------------------+------+---------+-------+-------------+------------------------------------------------------------------------------------------+ | 612 | root | udb10061.frc1.facebook.com:25896 | test | Query | 10121 | query end | UPDATE sbtest1 set c= (case when c > 1000000000 then 0 else c end) + 1 where id=2307474 | | 613 | root | udb10061.frc1.facebook.com:25897 | test | Query | 10121 | query end | UPDATE sbtest1 set c= (case when c > 1000000000 then 0 else c end) + 1 where id=8642682 | | 614 | root | udb10061.frc1.facebook.com:25898 | test | Query | 10121 | query end | UPDATE sbtest1 set c= (case when c > 1000000000 then 0 else c end) + 1 where id=11122844 | | 615 | root | udb10061.frc1.facebook.com:25899 | test | Query | 10121 | query end | UPDATE sbtest1 set c= (case when c > 1000000000 then 0 else c end) + 1 where id=8353335 | | 616 | root | udb10061.frc1.facebook.com:25900 | test | Query | 10121 | query end | UPDATE sbtest1 set c= (case when c > 1000000000 then 0 else c end) + 1 where id=3409834 | | 617 | root | udb10061.frc1.facebook.com:25901 | test | Query | 10121 | query end | UPDATE sbtest4 set c= (case when c > 1000000000 then 0 else c end) + 1 where id=7204922 | | 618 | root | udb10061.frc1.facebook.com:25902 | test | Query | 10121 | query end | UPDATE sbtest1 set c= (case when c > 1000000000 then 0 else c end) + 1 where id=6968611 | | 619 | root | udb10061.frc1.facebook.com:25903 | test | Query | 10121 | query end | UPDATE sbtest4 set c= (case when c > 1000000000 then 0 else c end) + 1 where id=556993 | | 620 | root | udb10061.frc1.facebook.com:25904 | test | Query | 10121 | query end | UPDATE sbtest1 set c= (case when c > 1000000000 then 0 else c end) + 1 where id=13036692 | ...
[5 Mar 2013 4:37]
Mark Callaghan
Full thread stacks ... 124 threads with this... Thread 130 (Thread 0x7fc2fa944700 (LWP 30427)): #0 pthread_cond_wait@@GLIBC_2.3.2 () at ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_wait.S:162 #1 0x000000000081a253 in inline_mysql_cond_wait (mutex=<optimized out>, that=<optimized out>) at /s/bld/orig5610/include/mysql/psi/mysql_thread.h:1162 #2 Stage_manager::enroll_for (this=0x12496c0, stage=<optimized out>, thd=0x7fb187fa6000, stage_mutex=<optimized out>) at /s/bld/orig5610/sql/binlog.cc:1359 #3 0x000000000081fcd0 in MYSQL_BIN_LOG::change_stage (this=<optimized out>, thd=<optimized out>, stage=<optimized out>, queue=<optimized out>, leave_mutex=<optimized out>, enter_mutex=0x12487c8) at /s/bld/orig5610/sql/binlog.cc:6156 #4 0x0000000000820c6c in MYSQL_BIN_LOG::ordered_commit (this=0x12487c0, thd=0x7fb187fa6000, all=<optimized out>, skip_commit=<optimized out>) at /s/bld/orig5610/sql/binlog.cc:6352 #5 0x0000000000821116 in MYSQL_BIN_LOG::commit (this=0x12487c0, thd=0x7fb187fa6000, all=false) at /s/bld/orig5610/sql/binlog.cc:5910 #6 0x000000000059591c in ha_commit_trans (thd=0x7fb187fa6000, all=<optimized out>) at /s/bld/orig5610/sql/handler.cc:1420 #7 0x0000000000722ce7 in trans_commit_stmt (thd=0x7fb187fa6000) at /s/bld/orig5610/sql/transaction.cc:373 #8 0x00000000006a81fc in mysql_execute_command (thd=0x7fb187fa6000) at /s/bld/orig5610/sql/sql_parse.cc:4932 #9 0x00000000006ae7c8 in mysql_parse (parser_state=<optimized out>, thd=0x7fb187fa6000, rawbuf=<optimized out>, length=<optimized out>) at /s/bld/orig5610/sql/sql_parse.cc:6149 #10 mysql_parse (thd=0x7fb187fa6000, rawbuf=<optimized out>, length=<optimized out>, parser_state=<optimized out>) at /s/bld/orig5610/sql/sql_parse.cc:6022 #11 0x00000000006affa8 in dispatch_command (command=COM_QUERY, thd=0x7fb187fa6000, packet=0x7fc2fa943350 "", packet_length=3317866512) at /s/bld/orig5610/sql/sql_parse.cc:1334 #12 0x000000000068273f in do_handle_one_connection (thd_arg=<optimized out>) at /s/bld/orig5610/sql/sql_connect.cc:969 #13 0x00000000006827a9 in handle_one_connection (arg=<optimized out>) at /s/bld/orig5610/sql/sql_connect.cc:885 #14 0x00007fc2fa59bf81 in start_thread (arg=0x7fc2fa944700) at pthread_create.c:301 #15 0x00007fc2f92db85d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:115 2 threads with this: Thread 75 (Thread 0x7fb1b93be700 (LWP 7869)): #0 __lll_lock_wait () at ../nptl/sysdeps/unix/sysv/linux/x86_64/lowlevellock.S:136 #1 0x00007fc2fa59e3d1 in _L_lock_885 () from /usr/local/fbcode/gcc-4.6.2-glibc-2.13/lib/libpthread.so.0 #2 0x00007fc2fa59e16c in __pthread_mutex_lock (mutex=0x12487c8) at pthread_mutex_lock.c:61 #3 0x0000000000820e8c in inline_mysql_mutex_lock (that=0x12487c8) at /s/bld/orig5610/include/mysql/psi/mysql_thread.h:688 #4 MYSQL_BIN_LOG::ordered_commit (this=0x12487c0, thd=0x7fb1bbd8b000, all=<optimized out>, skip_commit=<optimized out>) at /s/bld/orig5610/sql/binlog.cc:6458 #5 0x0000000000821116 in MYSQL_BIN_LOG::commit (this=0x12487c0, thd=0x7fb1bbd8b000, all=false) at /s/bld/orig5610/sql/binlog.cc:5910 #6 0x000000000059591c in ha_commit_trans (thd=0x7fb1bbd8b000, all=<optimized out>) at /s/bld/orig5610/sql/handler.cc:1420 #7 0x0000000000722ce7 in trans_commit_stmt (thd=0x7fb1bbd8b000) at /s/bld/orig5610/sql/transaction.cc:373 #8 0x00000000006a81fc in mysql_execute_command (thd=0x7fb1bbd8b000) at /s/bld/orig5610/sql/sql_parse.cc:4932 #9 0x00000000006ae7c8 in mysql_parse (parser_state=<optimized out>, thd=0x7fb1bbd8b000, rawbuf=<optimized out>, length=<optimized out>) at /s/bld/orig5610/sql/sql_parse.cc:6149 #10 mysql_parse (thd=0x7fb1bbd8b000, rawbuf=<optimized out>, length=<optimized out>, parser_state=<optimized out>) at /s/bld/orig5610/sql/sql_parse.cc:6022 #11 0x00000000006affa8 in dispatch_command (command=COM_QUERY, thd=0x7fb1bbd8b000, packet=0x7fb1b93bd350 "", packet_length=3263250448) at /s/bld/orig5610/sql/sql_parse.cc:1334 #12 0x000000000068273f in do_handle_one_connection (thd_arg=<optimized out>) at /s/bld/orig5610/sql/sql_connect.cc:969 #13 0x00000000006827a9 in handle_one_connection (arg=<optimized out>) at /s/bld/orig5610/sql/sql_connect.cc:885 #14 0x00007fc2fa59bf81 in start_thread (arg=0x7fb1b93be700) at pthread_create.c:301 #15 0x00007fc2f92db85d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:115
[5 Mar 2013 4:37]
Mark Callaghan
Full thread stacks ... 1 thread with this: Thread 76 (Thread 0x7fb1b87ff700 (LWP 7868)): #0 pthread_cond_wait@@GLIBC_2.3.2 () at ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_wait.S:162 #1 0x000000000081f65d in inline_mysql_cond_wait (mutex=0x1248c48, that=0x1249680) at /s/bld/orig5610/include/mysql/psi/mysql_thread.h:1162 #2 MYSQL_BIN_LOG::new_file_impl (this=0x12487c0, need_lock_log=false, extra_description_event=0x0) at /s/bld/orig5610/sql/binlog.cc:4353 #3 0x000000000081fff0 in MYSQL_BIN_LOG::rotate (this=0x12487c0, force_rotate=<optimized out>, check_purge=0x7fb1b87fbc6f) at /s/bld/orig5610/sql/binlog.cc:4878 #4 0x0000000000820e9d in MYSQL_BIN_LOG::ordered_commit (this=0x12487c0, thd=0x7fb1bbd82000, all=<optimized out>, skip_commit=<optimized out>) at /s/bld/orig5610/sql/binlog.cc:6459 #5 0x0000000000821116 in MYSQL_BIN_LOG::commit (this=0x12487c0, thd=0x7fb1bbd82000, all=false) at /s/bld/orig5610/sql/binlog.cc:5910 #6 0x000000000059591c in ha_commit_trans (thd=0x7fb1bbd82000, all=<optimized out>) at /s/bld/orig5610/sql/handler.cc:1420 #7 0x0000000000722ce7 in trans_commit_stmt (thd=0x7fb1bbd82000) at /s/bld/orig5610/sql/transaction.cc:373 #8 0x00000000006a81fc in mysql_execute_command (thd=0x7fb1bbd82000) at /s/bld/orig5610/sql/sql_parse.cc:4932 #9 0x00000000006ae7c8 in mysql_parse (parser_state=<optimized out>, thd=0x7fb1bbd82000, rawbuf=<optimized out>, length=<optimized out>) at /s/bld/orig5610/sql/sql_parse.cc:6149 #10 mysql_parse (thd=0x7fb1bbd82000, rawbuf=<optimized out>, length=<optimized out>, parser_state=<optimized out>) at /s/bld/orig5610/sql/sql_parse.cc:6022 #11 0x00000000006affa8 in dispatch_command (command=COM_QUERY, thd=0x7fb1bbd82000, packet=0x7fb1b87fe350 "", packet_length=3246800912) at /s/bld/orig5610/sql/sql_parse.cc:1334 #12 0x000000000068273f in do_handle_one_connection (thd_arg=<optimized out>) at /s/bld/orig5610/sql/sql_connect.cc:969 #13 0x00000000006827a9 in handle_one_connection (arg=<optimized out>) at /s/bld/orig5610/sql/sql_connect.cc:885 #14 0x00007fc2fa59bf81 in start_thread (arg=0x7fb1b87ff700) at pthread_create.c:301 #15 0x00007fc2f92db85d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:115 1 thread with this: Thread 102 (Thread 0x7fc2fa8c2700 (LWP 7842)): #0 __lll_lock_wait () at ../nptl/sysdeps/unix/sysv/linux/x86_64/lowlevellock.S:136 #1 0x00007fc2fa59e3d1 in _L_lock_885 () from /usr/local/fbcode/gcc-4.6.2-glibc-2.13/lib/libpthread.so.0 #2 0x00007fc2fa59e16c in __pthread_mutex_lock (mutex=0x12487c8) at pthread_mutex_lock.c:61 #3 0x000000000081fce1 in inline_mysql_mutex_lock (that=0x12487c8) at /s/bld/orig5610/include/mysql/psi/mysql_thread.h:688 #4 change_stage (enter_mutex=0x12487c8, this=<optimized out>, thd=<optimized out>, stage=<optimized out>, queue=<optimized out>, leave_mutex=<optimized out>) at /s/bld/orig5610/sql/binlog.cc:6161 #5 MYSQL_BIN_LOG::change_stage (this=<optimized out>, thd=<optimized out>, stage=<optimized out>, queue=<optimized out>, leave_mutex=<optimized out>, enter_mutex=0x12487c8) at /s/bld/orig5610/sql/binlog.cc:6141 #6 0x0000000000820c6c in MYSQL_BIN_LOG::ordered_commit (this=0x12487c0, thd=0x7fb1bbc78000, all=<optimized out>, skip_commit=<optimized out>) at /s/bld/orig5610/sql/binlog.cc:6352 #7 0x0000000000821116 in MYSQL_BIN_LOG::commit (this=0x12487c0, thd=0x7fb1bbc78000, all=false) at /s/bld/orig5610/sql/binlog.cc:5910 #8 0x000000000059591c in ha_commit_trans (thd=0x7fb1bbc78000, all=<optimized out>) at /s/bld/orig5610/sql/handler.cc:1420 #9 0x0000000000722ce7 in trans_commit_stmt (thd=0x7fb1bbc78000) at /s/bld/orig5610/sql/transaction.cc:373 #10 0x00000000006a81fc in mysql_execute_command (thd=0x7fb1bbc78000) at /s/bld/orig5610/sql/sql_parse.cc:4932 #11 0x00000000006ae7c8 in mysql_parse (parser_state=<optimized out>, thd=0x7fb1bbc78000, rawbuf=<optimized out>, length=<optimized out>) at /s/bld/orig5610/sql/sql_parse.cc:6149 #12 mysql_parse (thd=0x7fb1bbc78000, rawbuf=<optimized out>, length=<optimized out>, parser_state=<optimized out>) at /s/bld/orig5610/sql/sql_parse.cc:6022 #13 0x00000000006affa8 in dispatch_command (command=COM_QUERY, thd=0x7fb1bbc78000, packet=0x7fc2fa8c1350 "", packet_length=2923560976) at /s/bld/orig5610/sql/sql_parse.cc:1334 #14 0x000000000068273f in do_handle_one_connection (thd_arg=<optimized out>) at /s/bld/orig5610/sql/sql_connect.cc:969 #15 0x00000000006827a9 in handle_one_connection (arg=<optimized out>) at /s/bld/orig5610/sql/sql_connect.cc:885 #16 0x00007fc2fa59bf81 in start_thread (arg=0x7fc2fa8c2700) at pthread_create.c:301 #17 0x00007fc2f92db85d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:115
[5 Mar 2013 4:44]
Mark Callaghan
is this the same as http://bugs.mysql.com/bug.php?id=68251
[5 Mar 2013 4:48]
Mark Callaghan
my guess is that the thread in new_file_impl still has LOCK_log and that blocks the thread in change_state for which enter_mutex==LOCK_log. So I think this is the same as Yoshi's bug. This is pretty easy to reproduce. Can we get code coverage into the regression suite?
[5 Mar 2013 4:59]
Mark Callaghan
Not sure whether this is ironic, but group commit for the Facebook patch in MySQL 5.1 doesn't have this problem. A variable is set ("stop_new_xids") in ::new_file_impl and then LOCK_log is released. I added code to avoid it. My memory is vague, but I though Kristian solved this as well.
[5 Mar 2013 6:03]
Mark Callaghan
Looking at more code make me less sure this is a dup of Yoshi's bug. But that would mean there were 2 deadlock bugs in new replication code.
[11 Mar 2013 20:18]
MySQL Verification Team
having trouble repeating this (at least on windows). This happens after a while which probably indicates something bad. Version: '5.6.10-log' socket: '' port: 3307 MySQL Community Server (GPL) 2013-03-11 22:14:40 10600 [ERROR] MYSQL_BIN_LOG::open_purge_index_file failed to open register file. 2013-03-11 22:14:40 10600 [ERROR] MYSQL_BIN_LOG::open failed to sync the index file. 2013-03-11 22:14:40 10600 [ERROR] Could not open .\test-bin.000139 for logging (error 13). Turning logging off for the whole duration of the MySQL server process. To turn it on again: fix the cause, shutdown the MySQL server and restart it.
[12 Mar 2013 7:38]
MySQL Verification Team
Sorry, I had no luck repeating this yet!
[14 Mar 2013 12:55]
Bill Qu
The problem can be reproduced occasionally by a stress test case. The root cause is that some session were missed between stage 1 and stage 3 of the ordered commit.
[14 Mar 2013 12:58]
Bill Qu
I'm looking into the problem.
[26 Mar 2013 19:11]
NICOLAS KUCHARSKI
Dao / Shane, any updates on this?
[28 May 2013 6:56]
Jon Stephens
Thank you for your bug report. This issue has been committed to our source repository of that product and will be incorporated into the next release. If necessary, you can access the source repository and build the latest available version, including the bug fix. More information about accessing the source trees is available at http://dev.mysql.com/doc/en/installing-source.html
[28 May 2013 7:14]
Jon Stephens
Fixed in 5.6+, together with BUG#68521. Documented fixes in the 5.6.12 and 5.7.2 changelogs, as follows: Deadlocks could sometimes occur on group commits with a high number of concurrent updates, as well as when one client held a lock from a commit while another client imposed a lock while rotating the binary log. Closed.