Bug #85382 Getting semi-sync reply magic number errors when slave_compressed_protocol is 1
Submitted: 9 Mar 2017 6:22 Modified: 10 Mar 2017 6:42
Reporter: Jaime Sicam Email Updates:
Status: Verified Impact on me:
None 
Category:MySQL Server: Replication Severity:S3 (Non-critical)
Version:5.7, 5.7.17 OS:Any
Assigned to: CPU Architecture:Any

[9 Mar 2017 6:22] Jaime Sicam
Description:
It looks like semisync replication is not compatible with slave_compressed_protocol given these errors on master:
2017-03-09T06:03:05.736142Z 33 [Note] Start binlog_dump to master_thread_id(33) slave_server(101), pos(mysql-bin.000003, 55744206)
2017-03-09T06:03:05.736360Z 32 [Note] Stop semi-sync binlog_dump to slave (server_id: 101)
2017-03-09T06:03:06.736172Z 33 [Note] Start semi-sync binlog_dump to slave (server_id: 101), pos(mysql-bin.000003, 55744206)
2017-03-09T06:03:06.796895Z 0 [ERROR] Read semi-sync reply magic number error
2017-03-09T06:03:06.797982Z 0 [ERROR] /home/drake/binaries_stock/5.7.15/bin/mysqld: Got timeout reading communication packets
2017-03-09T06:03:06.836233Z 0 [ERROR] /home/drake/binaries_stock/5.7.15/bin/mysqld: Got an error reading communication packets
2017-03-09T06:03:06.837107Z 33 [Note] Stop semi-sync binlog_dump to slave (server_id: 101)
2017-03-09T06:03:06.837614Z 34 [Note] While initializing dump thread for slave with UUID <00019791-2222-2222-2222-222222222222>, found a zombie dump thread with the same UUID. Master is killing the zombie dump thread(33).
2017-03-09T06:03:06.837648Z 34 [Note] Start binlog_dump to master_thread_id(34) slave_server(101), pos(mysql-bin.000003, 57622446)
2017-03-09T06:03:07.837293Z 34 [Note] Start semi-sync binlog_dump to slave (server_id: 101), pos(mysql-bin.000003, 57622446)
2017-03-09T06:03:07.907752Z 0 [ERROR] Read semi-sync reply magic number error
2017-03-09T06:03:07.908827Z 0 [ERROR] /home/drake/binaries_stock/5.7.15/bin/mysqld: Got timeout reading communication packets
2017-03-09T06:03:07.929182Z 0 [ERROR] /home/drake/binaries_stock/5.7.15/bin/mysqld: Got an error reading communication packets
2017-03-09T06:03:07.930163Z 35 [Note] While initializing dump thread for slave with UUID <00019791-2222-2222-2222-222222222222>, found a zombie dump thread with the same UUID. Master is killing the zombie dump thread(34).

and slave:
2017-03-09T06:03:06.836081Z 1 [ERROR] Error reading packet from server for channel '': Lost connection to MySQL server during query (server_errno=2013)
2017-03-09T06:03:06.836101Z 1 [Note] Slave I/O thread: Failed reading log event, reconnecting to retry, log 'mysql-bin.000003' at position 57622446 for channel ''
2017-03-09T06:03:06.836110Z 1 [Warning] Storing MySQL user name or password information in the master info repository is not secure and is therefore not recommended. Please consider using the USER and PASSWORD connection options for START SLAVE; see the 'START SLAVE Syntax' in the MySQL Manual for more information.
2017-03-09T06:03:07.929185Z 1 [ERROR] Error reading packet from server for channel '': Lost connection to MySQL server during query (server_errno=2013)
2017-03-09T06:03:07.929204Z 1 [Note] Slave I/O thread: Failed reading log event, reconnecting to retry, log 'mysql-bin.000003' at position 59500731 for channel ''
2017-03-09T06:03:07.929213Z 1 [Warning] Storing MySQL user name or password information in the master info repository is not secure and is therefore not recommended. Please consider using the USER and PASSWORD connection options for START SLAVE; see the 'START SLAVE Syntax' in the MySQL Manual for more information.
2017-03-09T06:03:09.075862Z 1 [ERROR] Error reading packet from server for channel '': Lost connection to MySQL server during query (server_errno=2013)
2017-03-09T06:03:09.075879Z 1 [Note] Slave I/O thread: Failed reading log event, reconnecting to retry, log 'mysql-bin.000003' at position 61379031 for channel ''
2017-03-09T06:03:09.075887Z 1 [Warning] Storing MySQL user name or password information in the master info repository is not secure and is therefore not recommended. Please consider using the USER and PASSWORD connection options for START SLAVE; see the 'START SLAVE Syntax' in the MySQL Manual for more information.

How to repeat:
To reproduce via mysql sandbox:
1. Setup replication:
make_replication_sandbox 5.7.15

2. Enable semisync replication on master:
INSTALL PLUGIN rpl_semi_sync_master SONAME 'semisync_master.so';

3. Enable semisync replication on slave:
INSTALL PLUGIN rpl_semi_sync_slave SONAME 'semisync_slave.so';

4. Edit master my.cnf and add user [mysqld] section:
slave_compressed_protocol = 1
rpl_semi_sync_master_enabled=1
rpl_semi_sync_master_timeout=1000

5. Edit slave my.cnf and add under [mysqld] section:
slave_compressed_protocol=1
rpl_semi_sync_slave_enabled=1

6. Restart Master and Slave
./restart_all

7. Import data on Master:
mysql -u root < employees.sql

8. Review error logs on master and slave

Solution: Disable slave_compressed_protocol on slave:
STOP SLAVE;
SET GLOBAL slave_compressed_protocol=0;
START SLAVE;
Try importing data again
[10 Mar 2017 6:42] MySQL Verification Team
Hello Jaime Sicam,

Thank you for the report and test case.
Observed the reported issue with 5.7.17 build, I didn't notice any replication failure etc but observed that error log on master/slave flooded with the messages.

Thanks,
Umesh
[10 Mar 2017 6:43] MySQL Verification Team
test results

Attachment: 85382.5.7.17.results (application/octet-stream, text), 52.92 KiB.

[28 Mar 2017 11:26] Bryan Beaudreault
I'm seeing this as well. As you said there is no replication error, but it results in a large latency on all writes. The writes do not succeed until the error happens and the slave reconnects and dumps the binlog.
[20 Apr 2017 3:47] monty solomon
The QPS drops from around 300 QPS to around 30 QPS when slave_compressed_protocol is enabled running the following command on the master with two (2) slavea.

pt-heartbeat --database heartbeat --table heartbeat --update --create-table --daemonize --interval 0.01

The state on the master is "Waiting for semi-sync ACK from slave"

master my.cnf settings

plugin-load = rpl_semi_sync_master=semisync_master.so;rpl_semi_sync_slave=semisync_slave.so
rpl_semi_sync_master_timeout = 1000000000000000000
rpl_semi_sync_master_wait_no_slave = 1
rpl_semi_sync_master_enabled = 1
rpl_semi_sync_slave_enabled = 0

slave my.cnf settings

plugin-load = rpl_semi_sync_master=semisync_master.so;rpl_semi_sync_slave=semisync_slave.so
rpl_semi_sync_master_timeout = 1000000000000000000
rpl_semi_sync_master_wait_no_slave = 1
rpl_semi_sync_master_enabled = 0
rpl_semi_sync_slave_enabled = 1
[22 Nov 2017 7:07] ashe sun
5.7.20 Affected.
[18 Apr 2018 6:16] Xiaofeng Dong
It may be fixed in MySQL-5.7.21.

Replication: The receiver thread for semisynchronous replication was not able to receive acknowledgements from slaves that used compression of the master/slave protocol (slave_compressed_protocol=ON). The receiver thread now handles compressed acknowledgements correctly. (Bug #26027024, Bug #86230)

https://bugs.mysql.com/bug.php?id=86230
[19 Dec 2018 19:16] Fernando Camargos
> It may be fixed in MySQL-5.7.21.

It has not: I've just tested with 5.7.22 and I still see the same ERROR messages, though pt-table-checksum doesn't show any signs of data inconsistency.
[13 Jul 2023 3:08] Biao You
> It may be fixed in MySQL-5.7.21.
my db is 5.7.28 and still see the same ERROR messages.