Bug #81330 circular semisync replication fails on msr (and slow without msr)
Submitted: 5 May 2016 19:41 Modified: 14 Feb 2017 16:17
Reporter: Trey Raymond Email Updates:
Status: Open Impact on me:
None 
Category:MySQL Server: Replication Severity:S1 (Critical)
Version:5.6.29, 5.7.17 OS:Any
Assigned to: CPU Architecture:Any
Tags: semisync

[5 May 2016 19:41] Trey Raymond
Description:
Consider the case:
Two servers in a dual-master/single-writer circular replication setup
Semisynchronous replication enabled between the two masters
0 or any number of asynchronous slaves from each

This provides a level of durability and easy failover while [supposedly] maintaining good write performance on the active master.  Nontransactional read scaling is easy through normal async slaves.

The problem arises with the inactive master's slave executor thread[s].  With normal semisync logic and log_slave_updates on, one of the slave threads executes a transaction, and then waits for the active master to ack before committing.  This results in an enormous slowdown to the slave's execution (it will lag significantly behind the master, potentially indefinitely), as well as affecting performance upstream mainly because acks from the active master take longer to return.  In tests w/ 48 threads on master, 4 slave exec threads evenly sharing load, 25 rows per commit: 67K rows/sec without this issue (no slave sql lag), 64K with the issue (endlessly growing slave sql lag), 87K control without semisync.  That's a 5% drop in performance, but more importantly, total loss of usefulness of the secondary due to lag.  Any write switchover would need to wait for that thread to catch up, which could be days or weeks if it's been running for a bit.

As a workaround, you can disable log_slave_updates (and disable GTID if enabled), and use only the two servers.  This eliminates the ability to scale out reads via slaves (a major loss of functionality), and adds some complexity to the backup process.  It's a hacky fix that only works in very specific implementations.

Another workaround would be to globally enable semisync in the primary channel (from current active to current secondary) only, and switch back and forth as traffic is switched.  This adds a level of complexity and risk to the switchover process that isn't really acceptable in a production deploy.  It's something that should really be handled by mysql internally.

How to repeat:
configure two servers in circular replication with semisync enabled, run a lot of writes on one.

Suggested fix:
An easy fix would be one of the following:
- a global option to disable semisync acks for the slave threads, which could be set for configurations like this.
- a session option to disable semisync acks for one session, which could be used in init_slave.  this could have other uses outside of this bug, but could also incur risk.

A more complicated fix would be to improve the logic of the slave sql threads and have them skip calling semisync ack if the server_id associated with the transaction in the relay log is also connected as a slave.  This sounds better on paper, and more accurately solves the use case of semisync, but could lead to confusion among some users that have many semisync slaves.  It would need to be well documented.
[10 Feb 2017 21:51] Trey Raymond
More annoying with msr+semisync in 5.7, as more things are dependent on replication
[10 Feb 2017 23:56] Trey Raymond
raised severity as this setup now fails in 5.7.  logs like:

2017-02-10T23:38:58.858496Z 123 [Note] Start binlog_dump to master_thread_id(123) slave_server(1653271619), pos(, 4)
2017-02-10T23:38:59.872681Z 122 [Note] Aborted connection 122 to db: 'unconnected' user: 'replication' host: '98.138.236.67' (Failed on my_net_write())
2017-02-10T23:39:01.684790Z 123 [Note] Start semi-sync binlog_dump to slave (server_id: 1653271619), pos(, 4)
2017-02-10T23:39:01.948952Z 0 [ERROR] /home/y/libexec64/mysqld: Got an error reading communication packets
2017-02-10T23:39:01.949160Z 123 [Note] Stop semi-sync binlog_dump to slave (server_id: 1653271619)
2017-02-10T23:39:01.953178Z 126 [Note] While initializing dump thread for slave with UUID <5d10e410-efbd-11e6-94bd-782bcb180148>, found a zombie dump thread with the same UUID. Master is killing the
zombie dump thread(123).
2017-02-10T23:39:01.953399Z 126 [Note] Start binlog_dump to master_thread_id(126) slave_server(1653271619), pos(, 4)
2017-02-10T23:39:02.159962Z 126 [Note] Start semi-sync binlog_dump to slave (server_id: 1653271619), pos(, 4)
2017-02-10T23:39:02.159990Z 123 [Note] Aborted connection 123 to db: 'unconnected' user: 'replication' host: '98.138.236.67' (Failed on my_net_write())
2017-02-10T23:39:02.386143Z 0 [ERROR] /home/y/libexec64/mysqld: Got an error reading communication packets
2017-02-10T23:39:02.386541Z 126 [Note] Stop semi-sync binlog_dump to slave (server_id: 1653271619)
2017-02-10T23:39:02.389893Z 127 [Note] While initializing dump thread for slave with UUID <5d10e410-efbd-11e6-94bd-782bcb180148>, found a zombie dump thread with the same UUID. Master is killing the
zombie dump thread(126).
2017-02-10T23:39:02.390084Z 127 [Note] Start binlog_dump to master_thread_id(127) slave_server(1653271619), pos(, 4)
2017-02-10T23:39:02.535603Z 0 [ERROR] /home/y/libexec64/mysqld: Got an error reading communication packets
2017-02-10T23:39:02.535604Z 127 [Note] Start semi-sync binlog_dump to slave (server_id: 1653271619), pos(, 4)
2017-02-10T23:39:02.660628Z 126 [Note] Aborted connection 126 to db: 'unconnected' user: 'replication' host: '98.138.236.67' (Failed on my_net_write())
2017-02-10T23:39:02.798574Z 0 [ERROR] /home/y/libexec64/mysqld: Got an error reading communication packets
2017-02-10T23:39:02.798921Z 127 [Note] Stop semi-sync binlog_dump to slave (server_id: 1653271619)
2017-02-10T23:39:02.801802Z 128 [Note] While initializing dump thread for slave with UUID <5d10e410-efbd-11e6-94bd-782bcb180148>, found a zombie dump thread with the same UUID. Master is killing the
zombie dump thread(127).
2017-02-10T23:39:02.802036Z 128 [Note] Start binlog_dump to master_thread_id(128) slave_server(1653271619), pos(, 4)
2017-02-10T23:39:04.238689Z 127 [Note] Aborted connection 127 to db: 'unconnected' user: 'replication' host: '98.138.236.67' (Failed on my_net_write())
2017-02-10T23:39:04.396646Z 128 [Note] Start semi-sync binlog_dump to slave (server_id: 1653271619), pos(, 4)
2017-02-10T23:39:04.610367Z 43 [ERROR] Error reading packet from server for channel 'ne1-003': Lost connection to MySQL server during query (server_errno=2013)
2017-02-10T23:39:04.610400Z 43 [Note] Slave I/O thread: Failed reading log event, reconnecting to retry, log 'mysqld-bin.000001' at position 393867224 for channel 'ne1-003'
2017-02-10T23:39:04.610412Z 43 [Warning] Storing MySQL user name or password information in the master info repository is not secure and is therefore not recommended. Please consider using the USER a
nd PASSWORD connection options for START SLAVE; see the 'START SLAVE Syntax' in the MySQL Manual for more information.
2017-02-10T23:39:04.634018Z 0 [ERROR] /home/y/libexec64/mysqld: Got an error reading communication packets
2017-02-10T23:39:04.634524Z 128 [Note] Stop semi-sync binlog_dump to slave (server_id: 1653271619)
2017-02-10T23:39:04.637837Z 129 [Note] While initializing dump thread for slave with UUID <5d10e410-efbd-11e6-94bd-782bcb180148>, found a zombie dump thread with the same UUID. Master is killing the
zombie dump thread(128).
2017-02-10T23:39:04.638025Z 129 [Note] Start binlog_dump to master_thread_id(129) slave_server(1653271619), pos(, 4)
2017-02-10T23:39:05.222679Z 128 [Note] Aborted connection 128 to db: 'unconnected' user: 'replication' host: '98.138.236.67' (Failed on my_net_write())
2017-02-10T23:39:05.634652Z 129 [Note] Start semi-sync binlog_dump to slave (server_id: 1653271619), pos(, 4)
2017-02-10T23:39:05.905290Z 0 [ERROR] /home/y/libexec64/mysqld: Got an error reading communication packets
2017-02-10T23:39:05.905927Z 129 [Note] Stop semi-sync binlog_dump to slave (server_id: 1653271619)
[11 Feb 2017 20:41] Trey Raymond
tested a few cases

A is writer.  B and C are backup masters.

A->B semisync + B->A semisync
- works fine in any situation with only two masters.

A->B semisync + A->C semisync + B->C semisync + C->B semisync + B->A semisync + C->A semisync
- see above log paste!

A->B async + A->C async + B->C async + C->B async + B->A semisync + C->A semisync
- the channels sending data from B or C to A would fail repeatedly (A's slave threads) - it seemed to alternate between them, possibly whichever sent back the transaction first.

A->B semisync + A->C async + B->A async + C->A async
- works here, semisync is outbound

A->B semisync + A->C semisync + B->A semisync + C->A semisync
- failures on A's slave threads, alternating again

A's slave threads fail and reconnect:
2017-02-11T20:09:45.394721Z 14 [ERROR] Error reading packet from server for channel 'ne1-002': Lost connection to MySQL server during query (server_errno=2013)
2017-02-11T20:09:45.394748Z 14 [Note] Slave I/O thread: Failed reading log event, reconnecting to retry, log 'mysqld-bin.000002' at position 4 for channel 'ne1-002'
2017-02-11T20:09:45.394759Z 14 [Warning] Storing MySQL user name or password information in the master info repository is not secure and is therefore not recommended. Please consider using the USER a
nd PASSWORD connection options for START SLAVE; see the 'START SLAVE Syntax' in the MySQL Manual for more information.

B or C see this failure on their end:
2017-02-11T20:09:40.388838Z 148 [Note] Aborted connection 148 to db: 'unconnected' user: 'replication' host: '98.138.236.64' (Failed on my_net_write())
2017-02-11T20:09:45.394469Z 0 [ERROR] /home/y/libexec64/mysqld: Got an error reading communication packets
2017-02-11T20:09:45.394651Z 155 [Note] Stop semi-sync binlog_dump to slave (server_id: 1653271616)
2017-02-11T20:09:45.397984Z 156 [Note] While initializing dump thread for slave with UUID <e65451a8-efbc-11e6-aac9-782bcb1808c5>, found a zombie dump thread with the same UUID. Master is killing the
zombie dump thread(155).
2017-02-11T20:09:45.398129Z 156 [Note] Start binlog_dump to master_thread_id(156) slave_server(1653271616), pos(, 4)
2017-02-11T20:09:46.395644Z 156 [Note] Start semi-sync binlog_dump to slave (server_id: 1653271616), pos(, 4)
2017-02-11T20:09:46.395651Z 155 [Note] Aborted connection 155 to db: 'unconnected' user: 'replication' host: '98.138.236.64' (Failed on my_net_write())
[11 Feb 2017 20:46] Trey Raymond
Here's what I think is happening:

Transaction with gtid ServerA:1 from a normally connected client executes on A.
replication sends that transaction to servers B and C, the method here isn't really relevant.
B and C both execute transaction ServerA:1.  They then try to write that to their binary logs and get acks from their slaves, in this case, server A is the only slave of each.
Server A is trying to ack transaction ServerA:1 twice - as it comes from different sources.  This is totally safe (and works fine without semisync), but I believe something in the way semisync handles this is causing the threads to conflict and one dies.  The fact that which one dies alternates implies that it's picking on based on which channel was first to execute and request acknowledgement, which would vary randomly.

I'd love to be able to test the 4-channel scenario above, but with B->A semisync and C->A async.  blocked by https://bugs.mysql.com/bug.php?id=80122 though - can't test that easily.  I'll try building a version without the semisync plugins installed to get a case like that, but it's a pain.
[11 Feb 2017 20:51] Trey Raymond
So, this issue has diverged a bit and gotten more serious since the first report back on 5.6.  However, I believe it's the same solution at the heart:

"A more complicated fix would be to improve the logic of the slave sql threads and have them skip calling semisync ack if the server_id associated with the transaction in the relay log is also connected as a slave.  This sounds better on paper, and more accurately solves the use case of semisync, but could lead to confusion among some users that have many semisync slaves.  It would need to be well documented."

An update on that, server ID logic would come directly from the GTID on newer systems, but both would be considered as long as non-gtid replication is supported.

I also think it's worth fixing what's causing the slave threads to die when two receive the same transaction.  I feel that if we only fix one side of it, that bug will rear its head again in a potentially worse way down the road.
[11 Feb 2017 23:14] Trey Raymond
"A significant functionality is missing and there is no available workaround" - per docs this should be S1.
[13 Feb 2017 23:57] Trey Raymond
there is a workaround available if you only have a few servers, that changes functionality, but gets similar results.  it's not viable on larger setups, but it's a good test.
set the following:
log_slave_updates=off on all masters (assume it or log_bin is already disabled on slaves)
rpl_semi_sync_master_wait_for_slave_count=(masters+slaves-1)  -- basically one less than the total host count (so it doesn't count itself), or an arbitrarily large number assuming no wait_no_slave.
rpl_semi_sync_slave_enabled=on on slaves - makes all hosts forced semisync clients.

This will leave you with a functional system which has the safety of semisync, but only works if you have very few slaves - because the writing master needs to wait for acks from every host in the whole cluster, instead of just the backup masters.
[14 Feb 2017 1:10] Trey Raymond
update: that doesn't work in practice, but due to another bug with 5.7's gtid improvements
[14 Feb 2017 16:17] Trey Raymond
Test case: The 3-master setup with semisync on all channels does work *without* gtid or log_slave_updates.  It still gets more failures than I would like, but it's comparable to a 2-server system scaled up.  Not many overall - few enough to work with - not just failing in a loop like this issue.