Bug #80122 semi-sync is ON (or partially so) when slaves are ON but master is OFF
Submitted: 22 Jan 2016 22:37 Modified: 4 Feb 2016 8:52
Reporter: Ernie Souhrada
Status: Verified
Category: MySQL Server: Replication  Severity: S2 (Serious)
Version: 5.6.25, 5.6.28  OS: Ubuntu
Assigned to:  CPU Architecture: Any
Tags: replication, semi-sync, semi-synchronous

[22 Jan 2016 22:37] Ernie Souhrada
Description:
According to the documentation, semi-sync should only be active if it has been enabled on both the master and the slaves.  "The plugins must be enabled both on the master side and the slave side to enable semisynchronous replication. If only one side is enabled, replication will be asynchronous."

However, I have the following:

MASTER:
mysql> show global status like '%semi%status';
mysql> show global status like '%semi%';
+--------------------------------------------+-------+
| Variable_name                              | Value |
+--------------------------------------------+-------+
| Rpl_semi_sync_master_clients               | 2     |
| Rpl_semi_sync_master_net_avg_wait_time     | 0     |
| Rpl_semi_sync_master_net_wait_time         | 0     |
| Rpl_semi_sync_master_net_waits             | 0     |
| Rpl_semi_sync_master_no_times              | 0     |
| Rpl_semi_sync_master_no_tx                 | 0     |
| Rpl_semi_sync_master_status                | OFF   |
| Rpl_semi_sync_master_timefunc_failures     | 0     |
| Rpl_semi_sync_master_tx_avg_wait_time      | 0     |
| Rpl_semi_sync_master_tx_wait_time          | 0     |
| Rpl_semi_sync_master_tx_waits              | 0     |
| Rpl_semi_sync_master_wait_pos_backtraverse | 0     |
| Rpl_semi_sync_master_wait_sessions         | 0     |
| Rpl_semi_sync_master_yes_tx                | 0     |
| Rpl_semi_sync_slave_status                 | OFF   |
+--------------------------------------------+-------+

mysql> show global variables like '%semi%';
+------------------------------------+-------+
| Variable_name                      | Value |
+------------------------------------+-------+
| rpl_semi_sync_master_enabled       | OFF   |
| rpl_semi_sync_master_timeout       | 2500  |
| rpl_semi_sync_master_trace_level   | 32    |
| rpl_semi_sync_master_wait_no_slave | ON    |
| rpl_semi_sync_slave_enabled        | ON    |
| rpl_semi_sync_slave_trace_level    | 32    |
+------------------------------------+-------+

SLAVES (identical):
mysql> show global status like '%semi%status';
+-----------------------------+-------+
| Variable_name               | Value |
+-----------------------------+-------+
| Rpl_semi_sync_master_status | OFF   |
| Rpl_semi_sync_slave_status  | ON    |
+-----------------------------+-------+

mysql> show global variables like '%semi%';
+------------------------------------+-------+
| Variable_name                      | Value |
+------------------------------------+-------+
| rpl_semi_sync_master_enabled       | OFF   |
| rpl_semi_sync_master_timeout       | 2500  |
| rpl_semi_sync_master_trace_level   | 32    |
| rpl_semi_sync_master_wait_no_slave | ON    |
| rpl_semi_sync_slave_enabled        | ON    |
| rpl_semi_sync_slave_trace_level    | 32    |
+------------------------------------+-------+

And in the error log on the master, I see entries like this:
2016-01-12 00:10:02 110998 [Note] Stop semi-sync binlog_dump to slave (server_id: 167891604)
2016-01-12 00:10:04 110998 [Note] Start semi-sync binlog_dump to slave (server_id: 167891604), pos(nameofmyserver-6-9-bin.001824, 4474187)

This probably would not even have shown up on my radar except that we hit a period of heavy write activity and started getting alerts about IO thread lag: roughly 1GB of binlogs had not even been transferred from the master to either slave.  The network was not saturated, and as soon as I disabled semi-sync on one of the slaves and restarted it, the rate of binlog transfer went up immensely and the IO lag disappeared.
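
The mismatch visible in the master's status output can also be checked mechanically. The following is only a rough sketch in Python, not part of any MySQL tooling: the helper names parse_mysql_table and semi_sync_mismatch are made up for illustration, and the check assumes that a non-zero Rpl_semi_sync_master_clients together with Rpl_semi_sync_master_status = OFF is the condition worth flagging, as in the output pasted above.

```python
def parse_mysql_table(text):
    """Parse the ASCII table printed by the mysql client into a dict."""
    values = {}
    for line in text.splitlines():
        line = line.strip()
        # Data rows look like "| name | value |"; skip borders and prompts.
        if not line.startswith("|"):
            continue
        cells = [c.strip() for c in line.strip("|").split("|")]
        # Skip the header row and anything that is not a name/value pair.
        if len(cells) != 2 or cells[0] == "Variable_name":
            continue
        values[cells[0]] = cells[1]
    return values


def semi_sync_mismatch(master_status):
    """True if semi-sync clients are registered while master status is OFF."""
    clients = int(master_status.get("Rpl_semi_sync_master_clients", "0"))
    status = master_status.get("Rpl_semi_sync_master_status", "OFF")
    return status == "OFF" and clients > 0


# Abbreviated copy of the master's SHOW GLOBAL STATUS output from this report.
sample = """
+--------------------------------------------+-------+
| Variable_name                              | Value |
+--------------------------------------------+-------+
| Rpl_semi_sync_master_clients               | 2     |
| Rpl_semi_sync_master_status                | OFF   |
+--------------------------------------------+-------+
"""

if __name__ == "__main__":
    print(semi_sync_mismatch(parse_mysql_table(sample)))
```

Feeding it the full output of show global status like '%semi%' from the master should work the same way, since rows other than the two counters are simply ignored.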

How to repeat:
Configure a master and two slaves (it probably works with just one).

For the slave, set rpl_semi_sync_slave_enabled=ON.  It does not matter if this is also set on the master.  

On the master, set rpl_semi_sync_master_enabled=OFF.

Watch the error log on the master as you stop / start replication:
SLAVE> stop slave;
master log: 2016-01-22 22:30:02 110998 [Note] Stop semi-sync binlog_dump to slave (server_id: 167878471)

SLAVE> start slave;
master log: 2016-01-22 22:30:05 110998 [Note] Start semi-sync binlog_dump to slave (server_id: 167878471), pos(nameofmyserver-6-9-bin.002191, 11656050)
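
To watch for this without reading the log by eye, the semi-sync binlog_dump notes can be pulled out of the master's error log with a small script. This is a minimal sketch in Python under a few assumptions: the function name semi_sync_dump_events is hypothetical, and the regular expression is based only on the message format quoted in this report, which may differ in other server versions.

```python
import re

# Pattern assumed from the [Note] lines quoted in this report.
DUMP_RE = re.compile(
    r"\[Note\] (Start|Stop) semi-sync binlog_dump to slave \(server_id: (\d+)\)"
)


def semi_sync_dump_events(log_text):
    """Return (action, server_id) tuples for semi-sync dump notes in the log."""
    events = []
    for line in log_text.splitlines():
        match = DUMP_RE.search(line)
        if match:
            events.append((match.group(1), int(match.group(2))))
    return events


# The two master log lines from the steps above.
sample_log = """\
2016-01-22 22:30:02 110998 [Note] Stop semi-sync binlog_dump to slave (server_id: 167878471)
2016-01-22 22:30:05 110998 [Note] Start semi-sync binlog_dump to slave (server_id: 167878471), pos(nameofmyserver-6-9-bin.002191, 11656050)
"""

if __name__ == "__main__":
    for action, server_id in semi_sync_dump_events(sample_log):
        print(action, server_id)
```

Any "Start" event logged while rpl_semi_sync_master_enabled is OFF on the master corresponds to the behavior described in this report.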

Suggested fix:
Make semi-sync replication work the same way it's described in the manual.
[4 Feb 2016 8:52] Umesh Shastry
Hello Ernie Souhrada,

Thank you for the report.
Observed this with 5.6.28 build.

Thanks,
Umesh
[11 Feb 2017 19:57] Trey Raymond
confirming this is still a problem in 5.7.17
[11 Feb 2017 20:21] Trey Raymond
Note that with multi-source replication, any slave with semisync slave enabled will connect as semisync on *all* channels - you can't set one master to enabled and the other to disabled to control that.
[11 Feb 2017 23:10] Trey Raymond
showing odd behavior re msr

Attachment: slave_start.txt (text/plain), 43.12 KiB.

[11 Feb 2017 23:13] Trey Raymond
Attached a file with a log transcript and terminal input/output.
You can see that, using msr with one master having semisync and the other just async, mysql fails when starting both at once.  If you start each channel individually, it seems to work, but as soon as you go back to a global stop slave/start slave it fails again.
I'm assuming the same behavior (the slave enforcing semisync no matter what the master says) is what's causing this to fail when one master doesn't have semisync support.