Bug #73066 Replication stall with multi-threaded replication
Submitted: 20 Jun 2014 15:45 Modified: 9 Nov 5:59
Reporter: Ovais Tariq Email Updates:
Status: Closed Impact on me:
None 
Category:MySQL Server: Replication Severity:S2 (Serious)
Version:5.6.17 OS:Any
Assigned to: CPU Architecture:Any

[20 Jun 2014 15:45] Ovais Tariq
Description:
Periodically what happens every few days that the replication gets stalled with the coordinator thread in the state "Waiting for Slave Worker to release partition" and the worker threads in the state "Waiting for an event from Coordinator". At that point trying to stop the slave by executing "STOP SLAVE" also gets stalled, together with stalling all the further "SHOW SLAVE STAUS" calls.

Below is how typically the processlist looks in that particular case:
+------+-------------+---------------------+---------+---------+-------+-----------------------------------------------+-------------------+-----------+---------------+
| Id | User | Host | db | Command | Time | State | Info | Rows_sent | Rows_examined |
+------+-------------+---------------------+---------+---------+-------+-----------------------------------------------+-------------------+-----------+---------------+
| 2477 | root | localhost | NULL | Sleep | 33355 | | NULL | 1 | 46 |
| 2967 | system user | | NULL | Connect | 24248 | Waiting for master to send event | NULL | 0 | 0 |
| 2968 | system user | | NULL | Connect | 24085 | Waiting for Slave Worker to release partition | NULL | 0 | 0 |
| 2969 | system user | | NULL | Connect | 24248 | Waiting for an event from Coordinator | NULL | 0 | 0 |
| 2970 | system user | | NULL | Connect | 24248 | Waiting for an event from Coordinator | NULL | 0 | 0 |
| 2971 | system user | | NULL | Connect | 24248 | Waiting for an event from Coordinator | NULL | 0 | 0 |
| 2972 | system user | | NULL | Connect | 24248 | Waiting for an event from Coordinator | NULL | 0 | 0 |
| 2973 | system user | | NULL | Connect | 24248 | Waiting for an event from Coordinator | NULL | 0 | 0 |
| 2974 | system user | | NULL | Connect | 84603 | Waiting for an event from Coordinator | NULL | 0 | 0 |
| 2975 | system user | | NULL | Connect | 84603 | Waiting for an event from Coordinator | NULL | 0 | 0 |
| 2976 | system user | | NULL | Connect | 84603 | Waiting for an event from Coordinator | NULL | 0 | 0 |
| 2977 | system user | | NULL | Connect | 84603 | Waiting for an event from Coordinator | NULL | 0 | 1 |
| 2978 | system user | | NULL | Connect | 84603 | Waiting for an event from Coordinator | NULL | 0 | 0 |
| 4268 | root | localhost | some_db | Query | 871 | Killing slave | stop slave | 0 | 0 |
| 4370 | root | localhost | some_db | Sleep | 4 | | NULL | 0 | 0 |
| 4377 | monitoring | 192.168.1.222:11159 | NULL | Query | 464 | init | show slave status | 0 | 0 |
| 4419 | root | localhost | NULL | Query | 0 | init | show processlist | 0 | 0 |
+------+-------------+---------------------+---------+---------+-------+-----------------------------------------------+-------------------+-----------+---------------+
17 rows in set (0.00 sec)

Looking into this further, the InnoDB status shows us that one of the worker thread actually has an open transaction:
---TRANSACTION 53684133006, ACTIVE 24245 sec
2 lock struct(s), heap size 360, 1 row lock(s), undo log entries 1
MySQL thread id 2977, OS thread handle 0x36, query id 56740540 Waiting for an event from Coordinator
TABLE LOCK table `db2`.`tbl_name` /* Partition `p24` */ trx id 53684133006 lock mode IX
RECORD LOCKS space id 3504 page no 23250 n bits 440 index `PRIMARY` of table `db2`.`tbl_name` /* Partition `p24` */ trx id 53684133006 lock_mode X locks rec but not gap

First off all the transaction is a replicated transaction being executed by the worker thread and is still open for an unknown reason, while holding only a single row lock.

So the slave is stuck because one of the worker has a transaction open for 24245 seconds and the coordinator is waiting on the worker to finish.

I will be attaching the backtrace and the state of the coordinator in follow up comments.

So based on the backtrace this is what we have as the position up to which the workers have executed everything:
group_relay_log_name = "./relay-bin.000973", '\000' <repeats 482 times>,
group_relay_log_pos = 616886798,

Looking at the relay log following are the events in the relay log starting at position 616886798:
# at 616886798
#140618 19:31:55 server id 11 end_log_pos 616886824 CRC32 0xca064602 GTID [commit=yes]
SET @@SESSION.GTID_NEXT= 'da2d1223-8151-11e3-8e1d-997ca3babb7c:4367700088'/*!*/;
# at 616886846
#140618 19:31:55 server id 11 end_log_pos 616886911 CRC32 0x893a663b Query thread_id=708866417 exec_time=0 error_code=0
SET TIMESTAMP=1403119915/*!*/;
BEGIN
/*!*/;
# at 616886933
#140618 19:31:55 server id 11 end_log_pos 616887078 CRC32 0xb7f1ec9f Query thread_id=708866417 exec_time=0 error_code=0
SET TIMESTAMP=1403119915/*!*/;
UPDATE some_table ...
/*!*/;
# at 616887100
#140618 19:32:05 server id 6 end_log_pos 616887123 CRC32 0xfdd70dd7 Stop
DELIMITER ;
# End of log file

So after the UPDATE there is no COMMIT event, but a STOP event. A STOP event is written when mysqld is being shutdown. The above UPDATE is what the worker thread executed in the transaction that is still open.

So what happened was that while IO thread was writing an event group to the relay log, mysqld was shutdown and the event group was written partially.

When MySQL comes back up again, the log is rotated, and IO thread starts writing from the event group which was partially written. So the next log has the following event group:
#140618 19:31:55 server id 11 end_log_pos 616886824 CRC32 0xca064602 GTID [commit=yes]
SET @@SESSION.GTID_NEXT= 'da2d1223-8151-11e3-8e1d-997ca3babb7c:4367700088'/*!*/;
# at 478
#140618 19:31:55 server id 11 end_log_pos 616886911 CRC32 0x893a663b Query thread_id=708866417 exec_time=0 error_code=0
SET TIMESTAMP=1403119915/*!*/;
SET @@session.pseudo_thread_id=708866417/*!*/;
SET @@session.foreign_key_checks=1, @@session.sql_auto_is_null=0, @@session.unique_checks=1, @@session.autocommit=1/*!*/;
SET @@session.sql_mode=1073741824/*!*/;
SET @@session.auto_increment_increment=1, @@session.auto_increment_offset=1/*!*/;
/*!\C utf8mb4 *//*!*/;
SET @@session.character_set_client=45,@@session.collation_connection=45,@@session.collation_server=224/*!*/;
SET @@session.lc_time_names=0/*!*/;
SET @@session.collation_database=DEFAULT/*!*/;
BEGIN
/*!*/;
# at 565
#140618 19:31:55 server id 11 end_log_pos 616887078 CRC32 0xb7f1ec9f Query thread_id=708866417 exec_time=0 error_code=0
use `db2`/*!*/;
SET TIMESTAMP=1403119915/*!*/;
UPDATE some_tbl ....
/*!*/;
# at 732
#140618 19:31:55 server id 11 end_log_pos 616887109 CRC32 0x8f5b2f7c Xid = 30416488792
COMMIT/*!*/;

You can see that the "end_log_pos" values are the same for the partial event I showed earlier and this one. It is the same UPDATE transaction that was partially written.

Now comes the interesting part. When the coordinator reads the relay log, it sends the partial event to the worker, since the event is partial hence the worker never commits the transaction and the transaction is kept open. The coordinator reads the next event (which is the full version of the partial event) but the coordinator cannot assign the next event to another worker because of one of the workers having an open transaction.

And apparently, MTS waits for workers to commit transactions when it sees log rotated. 

How to repeat:
Stop mysqld process on slave when the relay log is in the middle of writing a binlog event group

Suggested fix:
Correct behavior would be for MTS to execute a rollback when it sees a STOP event, so that it does not get stuck with an open transaction.
[20 Jun 2014 15:45] Ovais Tariq
The MySQL configuration is as follows:
[mysqld]
core-file
user = mysql
port = 3306
datadir = /db
socket = /tmp/mysql.sock
default-storage-engine = innodb
skip-external-locking
log_warnings=1
skip_name_resolve
character-set-server = utf8mb4
collation-server = utf8mb4_unicode_ci
group_concat_max_len = 1000000

server-id = 666

# InnoDB settings
innodb_data_home_dir = /db
innodb_log_group_home_dir = /db
innodb_data_file_path = ibdata1:100M:autoextend
innodb_buffer_pool_size = 20G
innodb_buffer_pool_instances = 8
innodb_log_file_size = 10G
innodb_buffer_pool_load_at_startup=OFF
innodb_buffer_pool_dump_at_shutdown=OFF
# allows for row_format=compressed
innodb_file_format=Barracuda
innodb_log_buffer_size = 64M
innodb_flush_log_at_trx_commit = 0
innodb_lock_wait_timeout = 50
innodb_file_per_table
innodb_doublewrite = 0
innodb_io_capacity = 1200
innodb_read_io_threads = 6
innodb_write_io_threads = 6
innodb_stats_on_metadata = OFF

# Slow query log settings
# The default logs all full table scans,tmp tables,filesorts on disk queries
#use_global_long_query_time = 1
#long_query_time = 0.5
slow_query_log_file = slowquery.log
slow_query_log = 1
long_query_time = 3
log_slow_filter = "full_scan,tmp_table_on_disk,filesort_on_disk"
log_slow_verbosity = "full"

# Other general MySQL settings
sync_binlog = 0
query_cache_type = 0
query_cache_size = 0
max_connections = 3000
thread_cache_size = 1000
back_log = 1024
thread_concurrency = 32
innodb_thread_concurrency = 64
tmpdir = /var/tmp
max_allowed_packet = 24M
max_join_size = 4294967295
net_buffer_length = 2K
thread_stack = 512K
tmp_table_size = 64M
max_heap_table_size = 64M
table_open_cache = 2000

# Replication settings (master to slave)
binlog_cache_size = 2M
binlog_format=mixed
log-bin = bin
log-error = error.log
expire_logs_days = 5
slave-parallel-workers = 10
master-info-repository = "table"
relay-log-info-repository = "table"
gtid_mode = ON
enforce_gtid_consistency = true
log-slave-updates
replicate-ignore-table = mysql.user
replicate-ignore-table = mysql.db
replicate-ignore-table = mysql.tables_priv
replicate-ignore-table = mysql.proxies_priv

# Started tuning slave catchup speed, can use more research
slave-checkpoint-period = 1000
slave-checkpoint-group = 2048
[20 Jun 2014 15:50] Ovais Tariq
More details on the state of the coordinator thread

Attachment: state_of_coordinator_thread_from_gdb.txt (text/plain), 39.71 KiB.

[23 Jun 2014 10:24] Umesh Shastry
Thank you for the report.
I'm able to reproduce this issue with below steps:

# Setup replication using gtid(conf files attached)
# Stop replication(slave>stop slave;)
# Emulate load on master

# bin/mysqlslap --auto-generate-sql --auto-generate-sql-add-autoincrement --auto-generate-sql-execute-number=100000 --auto-generate-sql-load-type=mixed --auto-generate-sql-secondary-indexes=2 -c 10 --create-schema='replication' -T -e InnoDB -i 10 --number-char-cols=10 -S /tmp/mysql_master.sock

# Stop replication(slave>start slave;)
# Kill the slave kill -9 when one of repl thread is in "Waiting for master to send event" State

# Wait for sometime
# Start Slave server again(no skip slave start etc used)

Wait for sometime and then try to stop slave "stop slave" or "show slave status\G", everything seems to hang

mysql> stop slave;

^^ Freezed

mysql> show slave status\G

^^ Freezed

slave> show processlist;
+----+-------------+-----------+-------------+---------+------+-----------------------------------------------+-------------------+
| Id | User        | Host      | db          | Command | Time | State                                         | Info              |
+----+-------------+-----------+-------------+---------+------+-----------------------------------------------+-------------------+
|  1 | system user |           | NULL        | Connect | 6656 | Waiting for master to send event              | NULL              |
|  2 | system user |           | NULL        | Connect | 6567 | Waiting for Slave Worker to release partition | NULL              |
|  3 | system user |           | NULL        | Connect | 6656 | Waiting for an event from Coordinator         | NULL              |
|  4 | system user |           | NULL        | Connect | 6656 | Waiting for an event from Coordinator         | NULL              |
|  5 | system user |           | NULL        | Connect | 6656 | Waiting for an event from Coordinator         | NULL              |
|  6 | system user |           | NULL        | Connect | 6656 | Waiting for an event from Coordinator         | NULL              |
|  7 | system user |           | NULL        | Connect | 6656 | Waiting for an event from Coordinator         | NULL              |
|  8 | system user |           | NULL        | Connect | 6656 | Waiting for an event from Coordinator         | NULL              |
|  9 | system user |           | NULL        | Connect | 6656 | Waiting for an event from Coordinator         | NULL              |
| 10 | system user |           | NULL        | Connect | 6656 | Waiting for an event from Coordinator         | NULL              |
| 11 | system user |           | NULL        | Connect | 6656 | Waiting for an event from Coordinator         | NULL              |
| 12 | system user |           | NULL        | Connect | 7025 | Waiting for an event from Coordinator         | NULL              |
| 13 | root        | localhost | NULL        | Query   | 5449 | Killing slave                                 | stop slave        |
| 14 | root        | localhost | NULL        | Query   | 5344 | init                                          | show slave status |
| 15 | root        | localhost | NULL        | Query   | 4908 | init                                          | show slave status |
| 16 | root        | localhost | replication | Sleep   | 4278 |                                               | NULL              |
| 21 | root        | localhost | NULL        | Query   |    0 | init                                          | show processlist  |
+----+-------------+-----------+-------------+---------+------+-----------------------------------------------+-------------------+
17 rows in set (0.00 sec)
[23 Jun 2014 10:40] Umesh Shastry
Typo in earlier repro note:

# Stop replication(slave>start slave;)

Should be

# Start replication(slave>start slave;)
[23 Jun 2014 12:02] qinglin zhang
Same bug as http://bugs.mysql.com/bug.php?id=72794.
[23 Jun 2014 12:03] qinglin zhang
Same bug as http://bugs.mysql.com/bug.php?id=72794.
[26 Feb 2015 13:30] Laurynas Biveinis
Failed to reproduce with 5.7.5.
[22 Dec 2017 18:40] Rolf Martin-Hoster
5.7.18 output

Attachment: 5 (application/octet-stream, text), 17.76 KiB.

[22 Dec 2017 18:40] Rolf Martin-Hoster
This appears to still be happening in 5.7.18
[9 Nov 5:59] Umesh Shastry
Confirmed that issue is no longer reproducible on latest 5.6 build.
Internal discussion confirmed that bug seems to have been fixed by Joao in 5.6.21+ post Bug#72794.  Closing the issue for now and joining latest results shortly.

regards,
Umesh
[10 Nov 6:29] Umesh Shastry
I'm unable to reproduce this on 5.7.24 with exact steps(only difference being that in 2014, had a smaller test box and now with moderate one).
[10 Nov 6:30] Umesh Shastry
5.7.24 - Test results

Attachment: 73066_5.7.24.results (application/octet-stream, text), 18.03 KiB.

[19 Nov 4:34] Umesh Shastry
Hello Rolf Martin-Hoster,

Our internal discussions and re-verification efforts from my end and Development confirmed that original issue is no longer reproducible, most likely fixed after Bug #72794 fixed. Also,  one more difference was in the original report both stop slave and show slave status hanged but looking at your provided details, only found in former case. We concluded that the issue might have been caused due to some corner cases.  In order to proceed further, may I request you to provide these below details?  Quoting as is Dev's requested details:

-- Relay log contents when applier thread has errored out, this will help us to identify if there were any partial transactions in relay log.
-- Slave's error log file.
-- Does slave_parallel_type=DATABASE/LOGICAL_CLOCK was in use?
-- There seems to be an active transaction but it is hard to say why the transaction is in active state for such a long time.
    Can you please provide more details regarding what type of transactions were they doing?
-- Just before the MTS hang scenario was the IO thread or Slave server was restarted and START SLAVE was initiated?
-- Are you doing any cross version replication?

Also we don't fix bugs in old versions, don't backport bug fixes, so need to check with latest version anyway. So, please, upgrade and inform us if problem still exists in latest GA builds along with exact reproducible steps.

Thank you!

regards,
Umesh