Bug #87326 deadlock in binary log.
Submitted: 5 Aug 2017 3:21 Modified: 23 Aug 2017 17:58
Reporter: Bernardo Perez Email Updates:
Status: Verified Impact on me:
None 
Category:MySQL Server: Replication Severity:S2 (Serious)
Version:5.6.27 - 5.6.35 OS:Any
Assigned to: CPU Architecture:Any

[5 Aug 2017 3:21] Bernardo Perez
Description:
I am able to reproduce this on 5.6.27 and 5.6.25 (although less often than 5.6.27) 

The scenario is as follows:

1 master with 4 slaves. Insert into 2 tables that have triggers to insert in another 2 tables (historical). 

sync-binlog = '2'

When we do the backup, with a script we change the parameter to sync-binlog = '1' and then run a "flush logs;" after that we take a backup on the storage layer. 

How to repeat:
Start a load on primary tables, these tables will replicate to the historical tables. I ran 20 sessions against each table, each one of the sessions pulls from a file that has about 10Million inserts however I never fully load tables, the intent it's to have consistent write load. 

USE bench;

drop table bench.table;

CREATE TABLE bench.table1 (
  id bigint(20) unsigned zerofill NOT NULL AUTO_INCREMENT,
  char1 varchar(64) NOT NULL DEFAULT '',
  char2 varchar(64) NOT NULL DEFAULT '',
  char3 varchar(64) DEFAULT NULL,
  date1 datetime DEFAULT NULL,
  date2 datetime NOT NULL,
  PRIMARY KEY (id),
  KEY char1_char2_INDEX (char1,char2)
) ENGINE=InnoDB DEFAULT CHARSET=utf8
;

drop table bench.table1_replica;

CREATE TABLE bench.table1_replica (
  id bigint(20) unsigned zerofill NOT NULL,
  char1 varchar(64) NOT NULL DEFAULT '',
  char2 varchar(64) NOT NULL DEFAULT '',
  char3 varchar(64) DEFAULT NULL,
  date1 datetime NOT NULL DEFAULT '2017-01-01 00:00:00',
  date2 datetime NOT NULL,
  PRIMARY KEY (id,created_at),
  KEY char1_char2_INDEX (char1,char2,date1),
  KEY char3 (char3)
) ENGINE=InnoDB DEFAULT CHARSET=utf8
;

drop table bench.table1_info;

CREATE TABLE bench.table1_info (
  id bigint(20) unsigned zerofill NOT NULL AUTO_INCREMENT,
  char1 varchar(255) NOT NULL,
  char2 varchar(64) DEFAULT NULL,
  char3 varchar(255) DEFAULT NULL,
  char4 varchar(45) DEFAULT NULL,
  data text COMMENT 'This hold the additional comments',
  bingint1 bigint(20) DEFAULT NULL,
  char5 varchar(10) DEFAULT NULL,
  char6 varchar(16) DEFAULT NULL,
  date1 datetime DEFAULT NULL,
  date2 datetime NOT NULL,
  PRIMARY KEY (session_id),
  KEY bingint1 (bingint1),
  KEY char2_date1 (char2,date1)
) ENGINE=InnoDB DEFAULT CHARSET=utf8
;

drop table bench.table1_info_replica;

CREATE TABLE bench.table1_info_replica (
  id bigint(20) unsigned zerofill NOT NULL,
  char1 varchar(255) NOT NULL,
  char2 varchar(16) DEFAULT NULL,
  char3 varchar(255) DEFAULT NULL,
  char4 varchar(45) DEFAULT NULL,
  data text COMMENT 'This hold the additional comments',
  bingint1 bigint(20) DEFAULT NULL,
  char5 varchar(10) DEFAULT NULL,
  char6 varchar(16) DEFAULT NULL,
  date1 datetime NOT NULL DEFAULT '2017-01-01 00:00:00',
  date2 datetime NOT NULL,
  PRIMARY KEY (id,date1),
  KEY permanent_user_id_INDEX (char1),
  KEY credentials_id_indx (bigint1)
) ENGINE=InnoDB DEFAULT CHARSET=utf8
;

USE bench;

#drop trigger bench.tmp_table1_ins_trig;

DELIMITER $$
CREATE TRIGGER bench.tmp_table1_ins_trig
    AFTER INSERT ON bench.table1
    FOR EACH ROW
BEGIN
    INSERT INTO bench.table1_replica
        (id,
        char1,
        char2,
        char3,
        date1,
        date2)
        VALUES
        (new.id,
        new.char1,
        new.char2,
        new.char3,
        new.date1,
        new.date2);
END$$
DELIMITER ;

drop trigger bench.table1_info_ins_trig;

DELIMITER $$
CREATE TRIGGER bench.table1_info_ins_trig
    AFTER INSERT ON bench.table1_info
    FOR EACH ROW
BEGIN
    INSERT IGNORE INTO bench.table1_info_replica
                (  id ,
                   char1 ,
                   char2 ,
                   char3 ,
                   char4 ,
                   data ,
                   bingint1 ,
                   char5 ,
                   char6 ,
                   date1,
                   date2 )
                VALUES
                (  new.id ,
                   new.char1 ,
                   new.char2 ,
                   new.char3 ,
                   new.char4 ,
                   new.data ,
                   new.bingint1 ,
                   new.char5 ,
                   new.char6 ,
                   new.date1,
                   new.date2);
END$$
DELIMITER ;

Master my.cnf

[mysqld]
event-scheduler = 'ON'
explicit_defaults_for_timestamp
gtid-mode = 'OFF'
innodb_log_buffer_size = '58720256'
innodb_log_file_size = '2147483648'
innodb_monitor_enable = 'ALL'
innodb_print_all_deadlocks = '1'
key_buffer_size = '16777216'
innodb-buffer-pool-size = '96545620992'
innodb-file-per-table = '1'
innodb-flush-log-at-trx-commit = '2'
innodb-flush-method = 'O_DIRECT'
binlog_cache_size = '32768'
binlog_format = 'MIXED'
default_storage_engine = 'InnoDB'
read_buffer_size = '262144'
read_only = '0'
read_rnd_buffer_size = '524288'
relay-log-info-repository = 'TABLE'
relay_log_recovery = '1'
secure-file-priv = '/tmp'
server-id = '358069451'
skip-slave-start = '1'
slow_launch_time = '3'
slow-query-log = '1'
socket = '/tmp/mysql.sock'
sync-binlog = '2'
table_open_cache_instances = '16'
thread_stack = '262144'
wait_timeout = '28800'
local-infile = '1'
log_bin_trust_function_creators = '1'
log-output = 'TABLE'
log-slave-updates = '1'
long_query_time = '3'
master-info-repository = 'TABLE'
max_allowed_packet = '1073741824'
max_binlog_size = '134217728'
max_connect_errors = '200'
max_connections = '12000'
performance_schema = '1'
[5 Aug 2017 3:22] Bernardo Perez
Looks related to:
https://bugs.mysql.com/bug.php?id=68569
https://bugs.mysql.com/bug.php?id=68251

This thread seems rotating the log : MYSQL_BIN_LOG::rotate_and_purge 

Thread 4 (Thread 0x2ba47d2f4700 (LWP 24464)):
#0 0x00002b8b604d5265 in __lll_lock_wait () from /lib64/libpthread.so.0
#1 0x00002b8b604d0dc1 in _L_lock_816 () from /lib64/libpthread.so.0
#2 0x00002b8b604d0cc7 in pthread_mutex_lock () from /lib64/libpthread.so.0
#3 0x00000000008844ac in inline_mysql_mutex_lock (src_file=0xc2f178 "/local/mysql-5.6.35.R1/sql/binlog.cc", src_line=5560, that=0x1352188 <mysql_bin_log+8>) at /local/mysql-5.6.35.R1/include/mysql/psi/mysql_thread.h:688
#4 MYSQL_BIN_LOG::rotate_and_purge (this=0x1352180 <mysql_bin_log>, thd=thd@entry=0x2ba4093a2000, force_rotate=force_rotate@entry=true) at /local/mysql-5.6.35.R1/sql/binlog.cc:5560
#5 0x0000000000702aff in reload_acl_and_cache (thd=thd@entry=0x2ba4093a2000, options=16130, tables=tables@entry=0x0, write_to_binlog=write_to_binlog@entry=0x2ba47d2f2fd0) at /local/mysql-5.6.35.R1/sql/sql_reload.cc:175
#6 0x00000000006e96cc in mysql_execute_command (thd=thd@entry=0x2ba4093a2000) at /local/mysql-5.6.35.R1/sql/sql_parse.cc:4322
#7 0x00000000006ee898 in mysql_parse (thd=thd@entry=0x2ba4093a2000, rawbuf=<optimized out>, length=<optimized out>, parser_state=parser_state@entry=0x2ba47d2f3730) at /local/mysql-5.6.35.R1/sql/sql_parse.cc:6592
#8 0x00000000006effd0 in dispatch_command (command=COM_QUERY, thd=0x2ba4093a2000, packet=<optimized out>, packet_length=<optimized out>) at /local/mysql-5.6.35.R1/sql/sql_parse.cc:1376
#9 0x00000000006f1e74 in do_command (thd=<optimized out>) at /local/mysql-5.6.35.R1/sql/sql_parse.cc:1043
#10 0x00000000006bd682 in do_handle_one_connection (thd_arg=thd_arg@entry=0x2ba40927e000) at /local/mysql-5.6.35.R1/sql/sql_connect.cc:1024
#11 0x00000000006bd730 in handle_one_connection (arg=arg@entry=0x2ba40927e000) at /local/mysql-5.6.35.R1/sql/sql_connect.cc:941
#12 0x0000000000b17156 in pfs_spawn_thread (arg=0x2ba397b13400) at /local/mysql-5.6.35.R1/storage/perfschema/pfs.cc:1860
#13 0x00002b8b604cef18 in start_thread () from /lib64/libpthread.so.0
#14 0x00002b8b61618b2d in clone () from /lib64/libc.so.6

.
.
.

This thread seems to be committing into the binary log in an ordered manner and in group commit after we issue the flush. 

Thread 32 (Thread 0x2b97bab61700 (LWP 29395)):
#0 0x00002b8b604d5265 in __lll_lock_wait () from /lib64/libpthread.so.0
#1 0x00002b8b604d0dc1 in _L_lock_816 () from /lib64/libpthread.so.0
#2 0x00002b8b604d0cc7 in pthread_mutex_lock () from /lib64/libpthread.so.0
#3 0x000000000087de08 in inline_mysql_mutex_lock (src_file=0xc2f178 "/local/mysql-5.6.35.R1/sql/binlog.cc", src_line=6835, that=0x1352188 <mysql_bin_log+8>) at /local/mysql-5.6.35.R1/include/mysql/psi/mysql_thread.h:688
#4 MYSQL_BIN_LOG::change_stage (this=<optimized out>, thd=<optimized out>, stage=<optimized out>, queue=<optimized out>, leave_mutex=<optimized out>, enter_mutex=0x1352188 <mysql_bin_log+8>) at /local/mysql-5.6.35.R1/sql/binlog.cc:6835
#5 0x0000000000884fd9 in MYSQL_BIN_LOG::ordered_commit (this=this@entry=0x1352180 <mysql_bin_log>, thd=thd@entry=0x2ba4091bc000, all=all@entry=false, skip_commit=skip_commit@entry=false) at /local/mysql-5.6.35.R1/sql/binlog.cc:7147
#6 0x0000000000885b6d in MYSQL_BIN_LOG::commit (this=0x1352180 <mysql_bin_log>, thd=0x2ba4091bc000, all=<optimized out>) at /local/mysql-5.6.35.R1/sql/binlog.cc:6547
#7 0x00000000005b55d4 in ha_commit_trans (thd=thd@entry=0x2ba4091bc000, all=all@entry=false, ignore_global_read_lock=ignore_global_read_lock@entry=false) at /local/mysql-5.6.35.R1/sql/handler.cc:1436
#8 0x000000000076d529 in trans_commit_stmt (thd=thd@entry=0x2ba4091bc000) at /local/mysql-5.6.35.R1/sql/transaction.cc:434
#9 0x00000000006e81c4 in mysql_execute_command (thd=thd@entry=0x2ba4091bc000) at /local/mysql-5.6.35.R1/sql/sql_parse.cc:5203
#10 0x00000000006ee898 in mysql_parse (thd=thd@entry=0x2ba4091bc000, rawbuf=<optimized out>, length=<optimized out>, parser_state=parser_state@entry=0x2b97bab60730) at /local/mysql-5.6.35.R1/sql/sql_parse.cc:6592
#11 0x00000000006effd0 in dispatch_command (command=COM_QUERY, thd=0x2ba4091bc000, packet=<optimized out>, packet_length=<optimized out>) at /local/mysql-5.6.35.R1/sql/sql_parse.cc:1376
#12 0x00000000006f1e74 in do_command (thd=<optimized out>) at /local/mysql-5.6.35.R1/sql/sql_parse.cc:1043
#13 0x00000000006bd682 in do_handle_one_connection (thd_arg=thd_arg@entry=0x2ba409280000) at /local/mysql-5.6.35.R1/sql/sql_connect.cc:1024
#14 0x00000000006bd730 in handle_one_connection (arg=arg@entry=0x2ba409280000) at /local/mysql-5.6.35.R1/sql/sql_connect.cc:941
#15 0x0000000000b17156 in pfs_spawn_thread (arg=0x2b8b62035480) at /local/mysql-5.6.35.R1/storage/perfschema/pfs.cc:1860
#16 0x00002b8b604cef18 in start_thread () from /lib64/libpthread.so.0
#17 0x00002b8b61618b2d in clone () from /lib64/libc.so.6
[5 Aug 2017 3:22] Bernardo Perez
Two sender threads.

Thread 47 (Thread 0x2b8b61b96700 (LWP 28318)):
#0 0x00002b8b604d5265 in __lll_lock_wait () from /lib64/libpthread.so.0
#1 0x00002b8b604d0dc1 in _L_lock_816 () from /lib64/libpthread.so.0
#2 0x00002b8b604d0cc7 in pthread_mutex_lock () from /lib64/libpthread.so.0
#3 0x00000000008631c8 in inline_mysql_mutex_lock (src_line=1203, src_file=0xc2d7d8 "/local/mysql-5.6.35.R1/sql/log_event.cc", that=0x1352188 <mysql_bin_log+8>) at /local/mysql-5.6.35.R1/include/mysql/psi/mysql_thread.h:688
#4 Log_event::read_log_event (file=file@entry=0x2b8b61b945a0, packet=packet@entry=0x2ba464bdf598, log_lock=log_lock@entry=0x1352188 <mysql_bin_log+8>, checksum_alg_arg=checksum_alg_arg@entry=1 '\001', log_file_name_arg=log_file_name_arg@entry=0x2b8b61b94c30 "/rdsdbdata/log/binlog/mysql-bin-changelog.001733", is_binlog_active=is_binlog_active@entry=0x2b8b61b94550) at /local/mysql-5.6.35.R1/sql/log_event.cc:1203
#5 0x000000000088d607 in mysql_binlog_send (thd=thd@entry=0x2ba464bdf000, log_ident=0x2ba40c421010 "mysql-bin-changelog.000928", pos=21625659, pos@entry=422, slave_gtid_executed=slave_gtid_executed@entry=0x0, flags=flags@entry=0) at /local/mysql-5.6.35.R1/sql/rpl_master.cc:1242
#6 0x000000000088f4d3 in com_binlog_dump (thd=thd@entry=0x2ba464bdf000, packet=packet@entry=0x2ba464be3001 "", packet_length=packet_length@entry=36) at /local/mysql-5.6.35.R1/sql/rpl_master.cc:765
#7 0x00000000006f0724 in dispatch_command (command=COM_BINLOG_DUMP, thd=0x2ba464bdf000, packet=0x2ba464be3001 "", packet_length=36) at /local/mysql-5.6.35.R1/sql/sql_parse.cc:1591
#8 0x00000000006f1e74 in do_command (thd=<optimized out>) at /local/mysql-5.6.35.R1/sql/sql_parse.cc:1043
#9 0x00000000006bd682 in do_handle_one_connection (thd_arg=thd_arg@entry=0x2ba464bdf000) at /local/mysql-5.6.35.R1/sql/sql_connect.cc:1024
#10 0x00000000006bd730 in handle_one_connection (arg=arg@entry=0x2ba464bdf000) at /local/mysql-5.6.35.R1/sql/sql_connect.cc:941
#11 0x0000000000b17156 in pfs_spawn_thread (arg=0x2b8b62035480) at /local/mysql-5.6.35.R1/storage/perfschema/pfs.cc:1860
#12 0x00002b8b604cef18 in start_thread () from /lib64/libpthread.so.0
#13 0x00002b8b61618b2d in clone () from /lib64/libc.so.6

Thread 48 (Thread 0x2b8b6024e700 (LWP 28317)):
#0  0x00002b8b604d5265 in __lll_lock_wait () from /lib64/libpthread.so.0
#1  0x00002b8b604d0dc1 in _L_lock_816 () from /lib64/libpthread.so.0
#2  0x00002b8b604d0cc7 in pthread_mutex_lock () from /lib64/libpthread.so.0
#3  0x0000000000580f74 in inline_mysql_mutex_lock (that=0x1352188 <mysql_bin_log+8>, src_file=<optimized out>, src_line=<optimized out>) at /local/mysql-5.6.35.R1/include/mysql/psi/mysql_thread.h:688
#4  0x000000000088d358 in mysql_binlog_send (thd=thd@entry=0x2ba464bd6000, log_ident=0x2ba40c021010 "mysql-bin-changelog.000928", pos=21626226, pos@entry=422, slave_gtid_executed=slave_gtid_executed@entry=0x0, flags=flags@entry=0) at /local/mysql-5.6.35.R1/sql/rpl_master.cc:1606
#5  0x000000000088f4d3 in com_binlog_dump (thd=thd@entry=0x2ba464bd6000, packet=packet@entry=0x2ba464bda001 "", packet_length=packet_length@entry=36) at /local/mysql-5.6.35.R1/sql/rpl_master.cc:765
#6  0x00000000006f0724 in dispatch_command (command=COM_BINLOG_DUMP, thd=0x2ba464bd6000, packet=0x2ba464bda001 "", packet_length=36) at /local/mysql-5.6.35.R1/sql/sql_parse.cc:1591
#7  0x00000000006f1e74 in do_command (thd=<optimized out>) at /local/mysql-5.6.35.R1/sql/sql_parse.cc:1043
#8  0x00000000006bd682 in do_handle_one_connection (thd_arg=thd_arg@entry=0x2ba464bd6000) at /local/mysql-5.6.35.R1/sql/sql_connect.cc:1024
#9  0x00000000006bd730 in handle_one_connection (arg=arg@entry=0x2ba464bd6000) at /local/mysql-5.6.35.R1/sql/sql_connect.cc:941
#10 0x0000000000b17156 in pfs_spawn_thread (arg=0x2ba397b12800) at /local/mysql-5.6.35.R1/storage/perfschema/pfs.cc:1860
#11 0x00002b8b604cef18 in start_thread () from /lib64/libpthread.so.0
#12 0x00002b8b61618b2d in clone () from /lib64/libc.so.6
[5 Aug 2017 3:35] Bernardo Perez
Pstack 5.6.27

Attachment: pstack5627.txt (text/plain), 153.68 KiB.

[5 Aug 2017 3:36] Bernardo Perez
pstack 5.6.35

Attachment: pstack5635.txt (text/plain), 144.46 KiB.

[5 Aug 2017 23:47] Bernardo Perez
I've run multiple tests against 5.6.35 and 5.6.27.

The ratio of this happening in 5.6.35 its around 20% 
The ratio of this happening in 5.6.27 its around 50%
[7 Aug 2017 8:27] Bernardo Perez
Adjusting to right version
[7 Aug 2017 17:24] MySQL Verification Team
Hi,

I tried first reproducing this on 5.7 without any success. I'll test 5.6 now, but before that I want to confirm the "deadlock" with you.

When you have this write stream on master and you experience this "deadlock" - what happens exactly? You see that *all* write's are waiting and any new write is not executing also and if you stop the write load and leave the master for a while it never aborts any of the queries and just stays in locked state? Or you see a different behavior, where you have transactions aborted after a while?

Thanks
Bogdan
[7 Aug 2017 18:41] MySQL Verification Team
Hi,

Few questions
 - sync-binlog = '1' change, you do this globally or session wise?
 - flush logs, you are doing this with a user that has permissions to do this?

I'm in contact with our dev team as discussing with colleagues this might be fixed in 5.7 but was "impossible" (too much changes for GA version) to be fixed in 5.6. Waiting for confirmation

all best
Bogdan
[8 Aug 2017 9:49] Bernardo Perez
Hello Bogdan,

First of all, thanks for taking a look into this.

I will provide detailed info to your questions later on. Gathering info.

I've run 50 tests, 20 of them on 5.7.17 30 of them on 5.6.27 and 5.6.35.

I was not able to repro this in 5.7.17

The ratio of this happening on the 5.6 versions I've mentioned it before.

The ratio of this happening in 5.6.35 its around 20% 
The ratio of this happening in 5.6.27 its around 50%

I will update you later.

Thanks for the help.
[8 Aug 2017 9:58] MySQL Verification Team
Hi,

> I've run 50 tests, 20 of them on 5.7.17 30 of them on 5.6.27 and 5.6.35.
> I was not able to repro this in 5.7.17

This is consistent to what I find. I'm doing the final checks but if I'm right this will not be fixed in 5.6, the only solution is upgrading to 5.7

all best
Bogdan
[8 Aug 2017 11:36] Bernardo Perez
Hello Bogdan,

Sorry for the time on reply but I wanted to be 100% sure I got things right.

> Few questions
> - sync-binlog = '1' change, you do this globally or session wise?
> - flush logs, you are doing this with a user that has permissions to do this?

We set sync-binlog globally as: set global sync_binlog =
We run the flush logs with a user with "ALL PRIVILEGES"

Trying to repro it again to confirm on the previous questions.

>You see that *all* write's are waiting and any new write is not executing also and if you stop the write load and >leave the master for a while it never aborts any of the queries and just stays in locked state? 
Our experience is that everything stalls and does not matter if we kill the sessions.

I am reproducing this again to double check.

Regards,
[8 Aug 2017 13:40] Bernardo Perez
Hello again,

so, double checked for you.

>You see that *all* write's are waiting and any new write is not executing also and if you stop the write load and >leave the master for a while it never aborts any of the queries and just stays in locked state? 

It happens, I let it run until inserts were reporting 1400 seconds in show processlist. 

I ran a kill on all the processes. 

They stayed another 800 seconds there in killed and not moving.

Right after a new write was again stuck. 

The only way out of this is killing mysqld and start it again. 

I think this covers all the questions. 

Eager to know your findings. 

Regards,
Bernardo P.
[9 Aug 2017 10:54] Bernardo Perez
Hello Bogdan,

I was wondering if could it be possible for you sharing any updates on the investigation. 

Thanks in advance.

Regards,
Bernardo P.
[10 Aug 2017 11:41] Bernardo Perez
Hello,

Would it be possible to get a confirmation on if this is a bug and/or it's not happening in 5.7? That would help us to take an operational decision.

Thanks in advance.

Regards,
Bernardo P.
[10 Aug 2017 12:19] MySQL Verification Team
Hi Bernardo,

So far I can't reproduce this on 5.7 and am still trying to figure out if this is the bug that could not be fixed on 5.6. That info I will have to come back with but as for the research so far
 - reproducible on 5.6
 - not reproducible (so far) on 5.7

further info I'll share as I get it
all best
Bogdan
[21 Aug 2017 18:40] MySQL Verification Team
Hi,
Nothing new, I am reproducing this on 5.6, this is fixed in 5.7, I don't see a separate bug report for the fix for 5.7 but the fix for this problem is "too big" to go into running GA version (5.6 in this case) so this won't be fixed in 5.6 and is already fix in 5.7+ so not much I can do with it any more.

I'm setting it to verified, as it is verified in 5.6 but I doubt the fix will happen so the only solution really is to upgrade

all best
Bogdan
[23 Aug 2017 17:58] Bernardo Perez
Hello Bogdan,

Thanks for the updates and the work you have done. 

This is confirming what I thought regarding of 5.7 not being impacted by the rework on the access of the reader to the binary log. 

I am following the report for future updates.

Regards,
Bernardo P.