Bug #68569 deadlock with group commit
Submitted: 5 Mar 2013 4:31 Modified: 28 May 2013 6:56
Reporter: Mark Callaghan Email Updates:
Status: Closed Impact on me:
None 
Category:MySQL Server: Replication Severity:S1 (Critical)
Version:5.6.10 OS:Any
Assigned to: Bill Qu CPU Architecture:Any

[5 Mar 2013 4:31] Mark Callaghan
Description:
I get a deadlock when running sysbench for an update-only workload with all data cached in innodb. it occurs with 128 concurrent clients. The workload is auto-commit update statements that each update 1 row. The database has 8 tables & 16M rows/table. Clients are uniformly distributed across tables.

The problem stack traces are:

124 threads with:
pthread_cond_wait@@GLIBC_2.3.2,inline_mysql_cond_wait,Stage_manager::enroll_for,MYSQL_BIN_LOG::change_stage,MYSQL_BIN_LOG::ordered_commit,MYSQL_BIN_LOG::commit,ha_commit_trans,trans_commit_stmt,mysql_execute_command,mysql_parse,mysql_parse,dispatch_command,do_handle_one_connection,handle_one_connection,start_thread,clone

2 threads with:
__lll_lock_wait,_L_lock_885,__pthread_mutex_lock,inline_mysql_mutex_lock,MYSQL_BIN_LOG::ordered_commit,MYSQL_BIN_LOG::commit,ha_commit_trans,trans_commit_stmt,mysql_execute_command,mysql_parse,mysql_parse,dispatch_command,do_handle_one_connection,handle_one_connection,start_thread,clone

1 thread with:
pthread_cond_wait@@GLIBC_2.3.2,inline_mysql_cond_wait,MYSQL_BIN_LOG::new_file_impl,MYSQL_BIN_LOG::rotate,MYSQL_BIN_LOG::ordered_commit,MYSQL_BIN_LOG::commit,ha_commit_trans,trans_commit_stmt,mysql_execute_command,mysql_parse,mysql_parse,dispatch_command,do_handle_one_connection,handle_one_connection,start_thread,clone

1 thread with:
__lll_lock_wait,_L_lock_885,__pthread_mutex_lock,inline_mysql_mutex_lock,change_stage,MYSQL_BIN_LOG::change_stage,MYSQL_BIN_LOG::ordered_commit,MYSQL_BIN_LOG::commit,ha_commit_trans,trans_commit_stmt,mysql_execute_command,mysql_parse,mysql_parse,dispatch_command,do_handle_one_connection,handle_one_connection,start_thread,clone

How to repeat:
my.cnf:

sql_mode=NO_ENGINE_SUBSTITUTION,STRICT_TRANS_TABLES 
innodb_log_file_size=1900M
innodb_flush_method=O_DIRECT
innodb_thread_concurrency=0
innodb_max_dirty_pages_pct=80
innodb_file_format=barracuda
innodb_file_per_table
table-definition-cache=1000
table-open-cache=2000
max_connections=2000
key_buffer_size=200M
innodb_io_capacity=1000
innodb_flush_log_at_trx_commit=2
innodb_doublewrite=0
query_cache_size=0
query_cache_type=0
innodb_thread_concurrency=0
metadata_locks_hash_instances=256
innodb_checksum_algorithm=CRC32
log_bin
sync_binlog=1

query_cache_size=0
query_cache_type=0
innodb_thread_concurrency=32
innodb_buffer_pool_size=64g
innodb_io_capacity=8192
loose-innodb_compression_level=1
loose-innodb_fast_free_list=0
loose-innodb_buffer_pool_instances=8
innodb_adaptive_hash_index=1
loose-table_open_cache_instances=1
loose-innodb_lru_scan_depth=8192
binlog_max_flush_queue_time=0

Suggested fix:
?
[5 Mar 2013 4:32] Mark Callaghan
Show processlist:

+------+------+----------------------------------+------+---------+-------+-------------+------------------------------------------------------------------------------------------+
| Id   | User | Host                             | db   | Command | Time  | State       | Info                                                                                     |
+------+------+----------------------------------+------+---------+-------+-------------+------------------------------------------------------------------------------------------+
|  612 | root | udb10061.frc1.facebook.com:25896 | test | Query   | 10121 | query end   | UPDATE sbtest1 set c= (case when c > 1000000000 then 0 else c end) + 1 where id=2307474  |
|  613 | root | udb10061.frc1.facebook.com:25897 | test | Query   | 10121 | query end   | UPDATE sbtest1 set c= (case when c > 1000000000 then 0 else c end) + 1 where id=8642682  |
|  614 | root | udb10061.frc1.facebook.com:25898 | test | Query   | 10121 | query end   | UPDATE sbtest1 set c= (case when c > 1000000000 then 0 else c end) + 1 where id=11122844 |
|  615 | root | udb10061.frc1.facebook.com:25899 | test | Query   | 10121 | query end   | UPDATE sbtest1 set c= (case when c > 1000000000 then 0 else c end) + 1 where id=8353335  |
|  616 | root | udb10061.frc1.facebook.com:25900 | test | Query   | 10121 | query end   | UPDATE sbtest1 set c= (case when c > 1000000000 then 0 else c end) + 1 where id=3409834  |
|  617 | root | udb10061.frc1.facebook.com:25901 | test | Query   | 10121 | query end   | UPDATE sbtest4 set c= (case when c > 1000000000 then 0 else c end) + 1 where id=7204922  |
|  618 | root | udb10061.frc1.facebook.com:25902 | test | Query   | 10121 | query end   | UPDATE sbtest1 set c= (case when c > 1000000000 then 0 else c end) + 1 where id=6968611  |
|  619 | root | udb10061.frc1.facebook.com:25903 | test | Query   | 10121 | query end   | UPDATE sbtest4 set c= (case when c > 1000000000 then 0 else c end) + 1 where id=556993   |
|  620 | root | udb10061.frc1.facebook.com:25904 | test | Query   | 10121 | query end   | UPDATE sbtest1 set c= (case when c > 1000000000 then 0 else c end) + 1 where id=13036692 |
...
[5 Mar 2013 4:37] Mark Callaghan
Full thread stacks ...

124 threads with this...

Thread 130 (Thread 0x7fc2fa944700 (LWP 30427)):
#0  pthread_cond_wait@@GLIBC_2.3.2 () at ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_wait.S:162
#1  0x000000000081a253 in inline_mysql_cond_wait (mutex=<optimized out>, that=<optimized out>) at /s/bld/orig5610/include/mysql/psi/mysql_thread.h:1162
#2  Stage_manager::enroll_for (this=0x12496c0, stage=<optimized out>, thd=0x7fb187fa6000, stage_mutex=<optimized out>) at /s/bld/orig5610/sql/binlog.cc:1359
#3  0x000000000081fcd0 in MYSQL_BIN_LOG::change_stage (this=<optimized out>, thd=<optimized out>, stage=<optimized out>, queue=<optimized out>, leave_mutex=<optimized out>, enter_mutex=0x12487c8) at /s/bld/orig5610/sql/binlog.cc:6156
#4  0x0000000000820c6c in MYSQL_BIN_LOG::ordered_commit (this=0x12487c0, thd=0x7fb187fa6000, all=<optimized out>, skip_commit=<optimized out>) at /s/bld/orig5610/sql/binlog.cc:6352
#5  0x0000000000821116 in MYSQL_BIN_LOG::commit (this=0x12487c0, thd=0x7fb187fa6000, all=false) at /s/bld/orig5610/sql/binlog.cc:5910
#6  0x000000000059591c in ha_commit_trans (thd=0x7fb187fa6000, all=<optimized out>) at /s/bld/orig5610/sql/handler.cc:1420
#7  0x0000000000722ce7 in trans_commit_stmt (thd=0x7fb187fa6000) at /s/bld/orig5610/sql/transaction.cc:373
#8  0x00000000006a81fc in mysql_execute_command (thd=0x7fb187fa6000) at /s/bld/orig5610/sql/sql_parse.cc:4932
#9  0x00000000006ae7c8 in mysql_parse (parser_state=<optimized out>, thd=0x7fb187fa6000, rawbuf=<optimized out>, length=<optimized out>) at /s/bld/orig5610/sql/sql_parse.cc:6149
#10 mysql_parse (thd=0x7fb187fa6000, rawbuf=<optimized out>, length=<optimized out>, parser_state=<optimized out>) at /s/bld/orig5610/sql/sql_parse.cc:6022
#11 0x00000000006affa8 in dispatch_command (command=COM_QUERY, thd=0x7fb187fa6000, packet=0x7fc2fa943350 "", packet_length=3317866512) at /s/bld/orig5610/sql/sql_parse.cc:1334
#12 0x000000000068273f in do_handle_one_connection (thd_arg=<optimized out>) at /s/bld/orig5610/sql/sql_connect.cc:969
#13 0x00000000006827a9 in handle_one_connection (arg=<optimized out>) at /s/bld/orig5610/sql/sql_connect.cc:885
#14 0x00007fc2fa59bf81 in start_thread (arg=0x7fc2fa944700) at pthread_create.c:301
#15 0x00007fc2f92db85d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:115

2 threads with this:

Thread 75 (Thread 0x7fb1b93be700 (LWP 7869)):
#0  __lll_lock_wait () at ../nptl/sysdeps/unix/sysv/linux/x86_64/lowlevellock.S:136
#1  0x00007fc2fa59e3d1 in _L_lock_885 () from /usr/local/fbcode/gcc-4.6.2-glibc-2.13/lib/libpthread.so.0
#2  0x00007fc2fa59e16c in __pthread_mutex_lock (mutex=0x12487c8) at pthread_mutex_lock.c:61
#3  0x0000000000820e8c in inline_mysql_mutex_lock (that=0x12487c8) at /s/bld/orig5610/include/mysql/psi/mysql_thread.h:688
#4  MYSQL_BIN_LOG::ordered_commit (this=0x12487c0, thd=0x7fb1bbd8b000, all=<optimized out>, skip_commit=<optimized out>) at /s/bld/orig5610/sql/binlog.cc:6458
#5  0x0000000000821116 in MYSQL_BIN_LOG::commit (this=0x12487c0, thd=0x7fb1bbd8b000, all=false) at /s/bld/orig5610/sql/binlog.cc:5910
#6  0x000000000059591c in ha_commit_trans (thd=0x7fb1bbd8b000, all=<optimized out>) at /s/bld/orig5610/sql/handler.cc:1420
#7  0x0000000000722ce7 in trans_commit_stmt (thd=0x7fb1bbd8b000) at /s/bld/orig5610/sql/transaction.cc:373
#8  0x00000000006a81fc in mysql_execute_command (thd=0x7fb1bbd8b000) at /s/bld/orig5610/sql/sql_parse.cc:4932
#9  0x00000000006ae7c8 in mysql_parse (parser_state=<optimized out>, thd=0x7fb1bbd8b000, rawbuf=<optimized out>, length=<optimized out>) at /s/bld/orig5610/sql/sql_parse.cc:6149
#10 mysql_parse (thd=0x7fb1bbd8b000, rawbuf=<optimized out>, length=<optimized out>, parser_state=<optimized out>) at /s/bld/orig5610/sql/sql_parse.cc:6022
#11 0x00000000006affa8 in dispatch_command (command=COM_QUERY, thd=0x7fb1bbd8b000, packet=0x7fb1b93bd350 "", packet_length=3263250448) at /s/bld/orig5610/sql/sql_parse.cc:1334
#12 0x000000000068273f in do_handle_one_connection (thd_arg=<optimized out>) at /s/bld/orig5610/sql/sql_connect.cc:969
#13 0x00000000006827a9 in handle_one_connection (arg=<optimized out>) at /s/bld/orig5610/sql/sql_connect.cc:885
#14 0x00007fc2fa59bf81 in start_thread (arg=0x7fb1b93be700) at pthread_create.c:301
#15 0x00007fc2f92db85d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:115
[5 Mar 2013 4:37] Mark Callaghan
Full thread stacks ...

1 thread with this:

Thread 76 (Thread 0x7fb1b87ff700 (LWP 7868)):
#0  pthread_cond_wait@@GLIBC_2.3.2 () at ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_wait.S:162
#1  0x000000000081f65d in inline_mysql_cond_wait (mutex=0x1248c48, that=0x1249680) at /s/bld/orig5610/include/mysql/psi/mysql_thread.h:1162
#2  MYSQL_BIN_LOG::new_file_impl (this=0x12487c0, need_lock_log=false, extra_description_event=0x0) at /s/bld/orig5610/sql/binlog.cc:4353
#3  0x000000000081fff0 in MYSQL_BIN_LOG::rotate (this=0x12487c0, force_rotate=<optimized out>, check_purge=0x7fb1b87fbc6f) at /s/bld/orig5610/sql/binlog.cc:4878
#4  0x0000000000820e9d in MYSQL_BIN_LOG::ordered_commit (this=0x12487c0, thd=0x7fb1bbd82000, all=<optimized out>, skip_commit=<optimized out>) at /s/bld/orig5610/sql/binlog.cc:6459
#5  0x0000000000821116 in MYSQL_BIN_LOG::commit (this=0x12487c0, thd=0x7fb1bbd82000, all=false) at /s/bld/orig5610/sql/binlog.cc:5910
#6  0x000000000059591c in ha_commit_trans (thd=0x7fb1bbd82000, all=<optimized out>) at /s/bld/orig5610/sql/handler.cc:1420
#7  0x0000000000722ce7 in trans_commit_stmt (thd=0x7fb1bbd82000) at /s/bld/orig5610/sql/transaction.cc:373
#8  0x00000000006a81fc in mysql_execute_command (thd=0x7fb1bbd82000) at /s/bld/orig5610/sql/sql_parse.cc:4932
#9  0x00000000006ae7c8 in mysql_parse (parser_state=<optimized out>, thd=0x7fb1bbd82000, rawbuf=<optimized out>, length=<optimized out>) at /s/bld/orig5610/sql/sql_parse.cc:6149
#10 mysql_parse (thd=0x7fb1bbd82000, rawbuf=<optimized out>, length=<optimized out>, parser_state=<optimized out>) at /s/bld/orig5610/sql/sql_parse.cc:6022
#11 0x00000000006affa8 in dispatch_command (command=COM_QUERY, thd=0x7fb1bbd82000, packet=0x7fb1b87fe350 "", packet_length=3246800912) at /s/bld/orig5610/sql/sql_parse.cc:1334
#12 0x000000000068273f in do_handle_one_connection (thd_arg=<optimized out>) at /s/bld/orig5610/sql/sql_connect.cc:969
#13 0x00000000006827a9 in handle_one_connection (arg=<optimized out>) at /s/bld/orig5610/sql/sql_connect.cc:885
#14 0x00007fc2fa59bf81 in start_thread (arg=0x7fb1b87ff700) at pthread_create.c:301
#15 0x00007fc2f92db85d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:115

1 thread with this:

Thread 102 (Thread 0x7fc2fa8c2700 (LWP 7842)):
#0  __lll_lock_wait () at ../nptl/sysdeps/unix/sysv/linux/x86_64/lowlevellock.S:136
#1  0x00007fc2fa59e3d1 in _L_lock_885 () from /usr/local/fbcode/gcc-4.6.2-glibc-2.13/lib/libpthread.so.0
#2  0x00007fc2fa59e16c in __pthread_mutex_lock (mutex=0x12487c8) at pthread_mutex_lock.c:61
#3  0x000000000081fce1 in inline_mysql_mutex_lock (that=0x12487c8) at /s/bld/orig5610/include/mysql/psi/mysql_thread.h:688
#4  change_stage (enter_mutex=0x12487c8, this=<optimized out>, thd=<optimized out>, stage=<optimized out>, queue=<optimized out>, leave_mutex=<optimized out>) at /s/bld/orig5610/sql/binlog.cc:6161
#5  MYSQL_BIN_LOG::change_stage (this=<optimized out>, thd=<optimized out>, stage=<optimized out>, queue=<optimized out>, leave_mutex=<optimized out>, enter_mutex=0x12487c8) at /s/bld/orig5610/sql/binlog.cc:6141
#6  0x0000000000820c6c in MYSQL_BIN_LOG::ordered_commit (this=0x12487c0, thd=0x7fb1bbc78000, all=<optimized out>, skip_commit=<optimized out>) at /s/bld/orig5610/sql/binlog.cc:6352
#7  0x0000000000821116 in MYSQL_BIN_LOG::commit (this=0x12487c0, thd=0x7fb1bbc78000, all=false) at /s/bld/orig5610/sql/binlog.cc:5910
#8  0x000000000059591c in ha_commit_trans (thd=0x7fb1bbc78000, all=<optimized out>) at /s/bld/orig5610/sql/handler.cc:1420
#9  0x0000000000722ce7 in trans_commit_stmt (thd=0x7fb1bbc78000) at /s/bld/orig5610/sql/transaction.cc:373
#10 0x00000000006a81fc in mysql_execute_command (thd=0x7fb1bbc78000) at /s/bld/orig5610/sql/sql_parse.cc:4932
#11 0x00000000006ae7c8 in mysql_parse (parser_state=<optimized out>, thd=0x7fb1bbc78000, rawbuf=<optimized out>, length=<optimized out>) at /s/bld/orig5610/sql/sql_parse.cc:6149
#12 mysql_parse (thd=0x7fb1bbc78000, rawbuf=<optimized out>, length=<optimized out>, parser_state=<optimized out>) at /s/bld/orig5610/sql/sql_parse.cc:6022
#13 0x00000000006affa8 in dispatch_command (command=COM_QUERY, thd=0x7fb1bbc78000, packet=0x7fc2fa8c1350 "", packet_length=2923560976) at /s/bld/orig5610/sql/sql_parse.cc:1334
#14 0x000000000068273f in do_handle_one_connection (thd_arg=<optimized out>) at /s/bld/orig5610/sql/sql_connect.cc:969
#15 0x00000000006827a9 in handle_one_connection (arg=<optimized out>) at /s/bld/orig5610/sql/sql_connect.cc:885
#16 0x00007fc2fa59bf81 in start_thread (arg=0x7fc2fa8c2700) at pthread_create.c:301
#17 0x00007fc2f92db85d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:115
[5 Mar 2013 4:44] Mark Callaghan
is this the same as http://bugs.mysql.com/bug.php?id=68251
[5 Mar 2013 4:48] Mark Callaghan
my guess is that the thread in new_file_impl still has LOCK_log and that blocks the thread in change_state for which enter_mutex==LOCK_log. So I think this is the same as Yoshi's bug.

This is pretty easy to reproduce. Can we get code coverage into the regression suite?
[5 Mar 2013 4:59] Mark Callaghan
Not sure whether this is ironic, but group commit for the Facebook patch in MySQL 5.1 doesn't have this problem. A variable is set ("stop_new_xids") in ::new_file_impl and then LOCK_log is released. I added code to avoid it. My memory is vague, but I though Kristian solved this as well.
[5 Mar 2013 6:03] Mark Callaghan
Looking at more code make me less sure this is a dup of Yoshi's bug. But that would mean there were 2 deadlock bugs in new replication code.
[11 Mar 2013 20:18] MySQL Verification Team
having trouble repeating this (at least on windows).  This happens after a while which probably indicates something bad.

Version: '5.6.10-log'  socket: ''  port: 3307  MySQL Community Server (GPL)
2013-03-11 22:14:40 10600 [ERROR] MYSQL_BIN_LOG::open_purge_index_file failed to open register  file.
2013-03-11 22:14:40 10600 [ERROR] MYSQL_BIN_LOG::open failed to sync the index file.
2013-03-11 22:14:40 10600 [ERROR] Could not open .\test-bin.000139 for logging (error 13). Turning logging off for the whole duration of the MySQL server process. To turn it on again: fix the cause, shutdown the MySQL server and restart it.
[12 Mar 2013 7:38] MySQL Verification Team
Sorry, I had no luck repeating this yet!
[14 Mar 2013 12:55] Bill Qu
The problem can be reproduced occasionally by a stress test case. The root cause is that some session were missed between stage 1 and stage 3 of the ordered commit.
[14 Mar 2013 12:58] Bill Qu
I'm looking into the problem.
[26 Mar 2013 19:11] NICOLAS KUCHARSKI
Dao / Shane, any updates on this?
[28 May 2013 6:56] Jon Stephens
Thank you for your bug report. This issue has been committed to our source repository of that product and will be incorporated into the next release.

If necessary, you can access the source repository and build the latest available version, including the bug fix. More information about accessing the source trees is available at

    http://dev.mysql.com/doc/en/installing-source.html
[28 May 2013 7:14] Jon Stephens
Fixed in 5.6+, together with BUG#68521. Documented fixes in the 5.6.12 and 5.7.2 changelogs, as follows:

      Deadlocks could sometimes occur on group commits with a high 
      number of concurrent updates, as well as when one client held a lock 
      from a commit while another client imposed a lock while rotating the 
      binary log.
    

Closed.