| Bug #93089 | parallel workers+slave_preserve_commit_order+flushtables with read lock deadlock | ||
|---|---|---|---|
| Submitted: | 6 Nov 2018 3:37 | Modified: | 7 Nov 2018 7:45 |
| Reporter: | ashe sun (OCA) | Email Updates: | |
| Status: | Can't repeat | Impact on me: | |
| Category: | MySQL Server: Replication | Severity: | S2 (Serious) |
| Version: | 5.7 | OS: | Any |
| Assigned to: | MySQL Verification Team | CPU Architecture: | Any |
[6 Nov 2018 3:39]
ashe sun
2.1 Generate 2 transaction that can be apply on slave in parallel. (do insert on master.)
[6 Nov 2018 10:27]
MySQL Verification Team
Hi, Thanks for submission. Can you reproduce this issue without using GDB ? thanks bogdan
[6 Nov 2018 13:01]
MySQL Verification Team
reminds me of https://bugs.mysql.com/bug.php?id=89247
[7 Nov 2018 1:52]
gang chen
Hi Bogdan This bug can not repeat by debug_sync inner mysqld ? I am not sure about that ,but I will try. maybe you can run sysbench oltp test on master ,and send "flush tables with read lock" to slave again and again,good lock to you.
[7 Nov 2018 7:45]
ashe sun
Hi Bogdan
I have read all documents of Debug Sync Facility,and there is no way to control slave worker threads. But an easy way is here. (Tell me how to control internal threads using Debug Sync Facility if you got it)
This is my way to repeat.
Prepare your environment with master-slave replication and open parallel slave sql workers.
step 0
on master
create table ashe(id int not null auto_increment primary key,name varchar(10));
create table delay_table(id int);
step1
1.1 Generate 2 transaction that can be apply on slave in parallel.
session1 session2
set binlog_format=statement; set binlog_format=statement;
begin; begin;
insert into tt select sleep(10);
insert into ashe values(40,'aaa'); insert into ashe values(41,'aaa');
commit; commit
step 2
on slave
flush tables with read lock .
and then you will see that on slave instance.
mysql> show processlist;
+----+-------------+-----------+----------------+---------+------+--------------------------------------------------------+---------------------------------------+
| Id | User | Host | db | Command | Time | State | Info |
+----+-------------+-----------+----------------+---------+------+--------------------------------------------------------+---------------------------------------+
| 3 | root | localhost | NULL | Query | 13 | Waiting for commit lock | flush tables with read lock |
| 22 | root | localhost | NULL | Query | 0 | starting | show processlist |
| 41 | root | localhost | NULL | Sleep | 1493 | | NULL |
| 48 | system user | | NULL | Connect | 688 | Waiting for master to send event | NULL |
| 49 | system user | | NULL | Connect | 20 | Slave has read all relay log; waiting for more updates | NULL |
| 50 | system user | | test_dead_lock | Connect | 25 | Waiting for global read lock | insert into ashe(name) values('aaaa') |
| 51 | system user | | NULL | Connect | 33 | Waiting for preceding transaction to commit | NULL |
| 52 | system user | | NULL | Connect | 657 | Waiting for an event from Coordinator | NULL |

Description: slave config: mysql> show global variables like '%slave%' -> ; +-------------------------------------------+-----------------------+ | Variable_name | Value | +-------------------------------------------+-----------------------+ | init_slave | | | log_slave_updates | ON | | log_slow_slave_statements | ON | | rpl_semi_sync_master_wait_for_slave_count | 1 | | rpl_semi_sync_master_wait_no_slave | ON | | rpl_semi_sync_slave_enabled | OFF | | rpl_semi_sync_slave_trace_level | 32 | | rpl_stop_slave_timeout | 31536000 | | slave_allow_batching | OFF | | slave_checkpoint_group | 512 | | slave_checkpoint_period | 300 | | slave_compressed_protocol | OFF | | slave_exec_mode | STRICT | | slave_load_tmpdir | /data/mysql/mysql_tmp | | slave_max_allowed_packet | 1073741824 | | slave_net_timeout | 30 | | slave_parallel_type | LOGICAL_CLOCK | | slave_parallel_workers | 16 | | slave_pending_jobs_size_max | 16777216 | | slave_preserve_commit_order | ON | | slave_rows_search_algorithms | INDEX_SCAN,HASH_SCAN | | slave_skip_errors | OFF | | slave_sql_verify_checksum | ON | | slave_transaction_retries | 128 | | slave_type_conversions | | | sql_slave_skip_counter | 0 | +-------------------------------------------+-----------------------+ dead lock: mysql> show processlist;l +----+-------------+-----------+--------------------+---------+------+--------------------------------------------------------+--------------------------------------------+ | Id | User | Host | db | Command | Time | State | Info | +----+-------------+-----------+--------------------+---------+------+--------------------------------------------------------+--------------------------------------------+ | 3 | root | localhost | test_dead_lock | Query | 2264 | Waiting for commit lock | flush tables with read lock | | 4 | system user | | NULL | Connect | 2708 | Waiting for master to send event | NULL | | 5 | system user | | NULL | Connect | 2289 | Slave has read all relay log; waiting for more updates | NULL | | 6 | system user | | NULL | Connect | 2319 | Waiting for global read lock | insert into ashe(id,name) values(40,'aaa') | | 7 | system user | | NULL | Connect | 2308 | Waiting for preceding transaction to commit | NULL | | 8 | system user | | NULL | Connect | 2708 | Waiting for an event from Coordinator | NULL | | 9 | system user | | NULL | Connect | 2708 | Waiting for an event from Coordinator | NULL | | 10 | system user | | NULL | Connect | 2708 | Waiting for an event from Coordinator | NULL | | 11 | system user | | NULL | Connect | 2708 | Waiting for an event from Coordinator | NULL | | 12 | system user | | NULL | Connect | 2708 | Waiting for an event from Coordinator | NULL | | 13 | system user | | NULL | Connect | 2708 | Waiting for an event from Coordinator | NULL | | 14 | system user | | NULL | Connect | 2708 | Waiting for an event from Coordinator | NULL | | 15 | system user | | NULL | Connect | 2708 | Waiting for an event from Coordinator | NULL | | 16 | system user | | NULL | Connect | 2708 | Waiting for an event from Coordinator | NULL | deadlock: flush tables with read lock --> thread 7 --> thread 6 --> flush tables with read lock How to repeat: step 0 on master create table ashe(id int not null auto_increment primary key,name varchar(10)); step1 on slave gdb attach slave with none-stop mode and set breakpoint at function open_and_lock_tables step2 2.1 Generate 2 transaction that can be apply on slave in parallel. session1 session2 begin; begin; insert into ashe values(40,'aaa'); insert into ashe values(41,'aaa'); commit; commit step 3 in gdb 3.1 let first worker thread with transaction 1 stopped in open_and_lock_tables 3.2 let second worker thread with transaction 2 continue,and it will wait_for_its_turn(trx1) step 3 on slave 3.1 flush tables with read lock . step 4 in gdb let first worker go or exit gdb. That's all. thanks. Suggested fix: 1. Block coordinator befor flush tables with read lock. 2.Remind users stop slave sql_thread befor flush tables with read lock. 3.Detect dead lock conditions when flush tables with read lock and exit and Remind users err message.