Bug #100702 There are compatibility issues with compression during MySQL5.7 master-slave syn
Submitted: 1 Sep 2020 1:52 Modified: 16 Sep 2020 12:07
Reporter: Wang Wei Email Updates:
Status: Can't repeat Impact on me:
Category:MySQL Server Severity:S3 (Non-critical)
Version:5.7 OS:CentOS (7.2)
Assigned to: MySQL Verification Team CPU Architecture:x86

[1 Sep 2020 1:52] Wang Wei
Environmental description:
The mysql main library is version 5.6, there are four slave libraries, three are version 5.6 and one is version 5.7. The table structure of all the master and slaves is the same. There is a lot of delay in the 5.7 slave library, and 5.6 is no problem. The business is zabbix monitoring, basically all insert batch insert operations, and each insert SQL insert data is about 400-1000 rows.
Problem: MySQL 5.7 has a large delay from the database, the relaylog is placed normally, the application to the database is slow, IO and CPU are not pressured, there is no difference between the sync_binlog 20000 or 0, max_allowed_packet=128M, innodb_flush_log_at_trx_commit=0, bulk_insert_buffer_size = 128M, binlog_format =row, sync_relay_log=10000, parallel replication is not used, SSL is not turned on, GDID is not turned on, and semi-synchronization is not turned on。

How to repeat:
1: Check each parameter related to performance, and no abnormality is found.

2: Checking the network card, hard disk, replacing the server, and restarting the database server all has no effect. The 5.7 delay still exists, and the hardware problem is eliminated.

3: 5.7 Synchronizing the binlog of the main library 5.6 to the relaylog is fast and normal, but the playback efficiency of the relaylog in the 5.7 database is extremely low.

4: Compare the show engine innodb status results of 5.6 and 5.7 from the library:
Buffer pool size 655359
Buffer pool size, bytes 10737401856
Free buffers 1019
Database pages 649599
Old database pages 239773
Modified db pages 119309
Pending reads 0
Pending writes: LRU 0, flush list 0, single page 0
Pages made young 10777670, not young 181119246
13.90 youngs/s, 157.51 non-youngs/s
Pages read 8853516, created 135760152, written 784514803
20.96 reads/s, 58.17 creates/s, 507.02 writes/s
Buffer pool hit rate 1000 / 1000, young-making rate 0/1000 not 2/1000
Pages read ahead 0.00/s, evicted without access 0.00/s, Random read ahead 0.00/s
LRU len: 649599, unzip_LRU len: 0
I/O sum[209618]:cur[2], unzip sum[0]:cur[0]
Buffer pool size 819100
Buffer pool size, bytes 13420134400
Free buffers 1018
Database pages 722328
Old database pages 266620
Modified db pages 99073
Pending reads 0
Pending writes: LRU 0, flush list 0, single page 0
Pages made young 37153, not young 795
0.00 youngs/s, 0.00 non-youngs/s
Pages read 149632, created 572696, written 2706369
0.00 reads/s, 0.00 creates/s, 0.00 writes/s
Buffer pool hit rate 1000 / 1000, young-making rate 0/1000 not 0/1000
Pages read ahead 0.00/s, evicted without access 0.00/s, Random read ahead 0.00/s
LRU len: 722328, unzip_LRU len: 453903
I/O sum[98685]:cur[0], unzip sum[882]:cur[6]
The comparison found that there is a value for unzip in 5.7, and there is no value in 5.6. It is initially suspected that the cause of the delay is related to compression and decompression.

5: Use perf top -p `pidof mysqld` to view the 5.7 slave library
It is found that libz.so.1.2.7 [.] crc32 accounts for a higher proportion than mysqld, which is about 6%. This library is related to compression and decompression.

6: Modify the level of innodb_compression_level to 0 (that is, do not enable compression, the default is 6, the range is 0-9), the observation has no effect, and the delay still exists. It's just that the proportion of libz has gone down, but the proportion of libc-2.17.so has gone up, which is higher than that of mysqld, at about 9%.

7: Check the history tables of zabbix. In order to save disk space, these tables were compressed:
CREATE TABLE `trends` (
  `itemid` bigint(20) unsigned NOT NULL,
  `clock` int(11) NOT NULL DEFAULT '0',
  `num` int(11) NOT NULL DEFAULT '0',
  `value_min` double(16,4) NOT NULL DEFAULT '0.0000',
  `value_avg` double(16,4) NOT NULL DEFAULT '0.0000',
  `value_max` double(16,4) NOT NULL DEFAULT '0.0000',
  PRIMARY KEY (`itemid`,`clock`),
  KEY `clock` (`clock`)

It is suspected to be related to the compression parameter ”ROW_FORMAT=COMPRESSED KEY_BLOCK_SIZE=8”.
8: Rebuild all history tables, remove ROW_FORMAT=COMPRESSED KEY_BLOCK_SIZE=8, re-synchronize, gradually reduce the delay, and restore.
Question: Why does the same table structure cause master-slave delay in 5.7? It may be caused by the low downward compatibility of compression and decompression in MySQL 5.7.
[2 Sep 2020 17:50] MySQL Verification Team
Hi Wang,

I do not see what is reported as a bug here. You have shown us an analysis of your system, but what you IMHO need is support from our MySQL Support team to help you properly tune your system. For start, you cannot expect maximum optimization in replication between 5.6 and 5.7 as we are at 8.0 these days. The first thing I would suggest there is to migrate everything to 5.7 or 8.0, but I'd leave our support team to discuss that with you. Also, using a proper tool like MySQL Enterprise Monitor you can see where this delay comes from.

Simple replication setup 5.6 to 5.7 with table compression does not show any degradation in a test scenario so you really require system tuning unless you can create a repeatable test case showing we have a bug here.

kind regards
[3 Sep 2020 0:57] Wang Wei
I have tested and many times the 5.6 upgrade to version 5.7 this delay problem, and also tried multiple minor versions of MySQL 5.7. This problem has existed for a long time and the root cause of the delay has not been determined, but as long as the master and slave libraries of the two versions are consistent , And the parameter ROW_FORMAT=COMPRESSED KEY_BLOCK_SIZE=8 is enabled in the table, version 5.6 is the master library, 5.7 is the slave library, there will be a lot of delays, and the 5.6 slave library synchronization 5.6 master library will not have a delay, I think The 5.7 version has a compression compatibility problem, which can be reproduced in the case of a large number of batch write operations.
[3 Sep 2020 1:02] Wang Wei
This problem has appeared for a long time, and I have submitted this problem before (https://bugs.mysql.com/bug.php?id=96091
), but your reply is no problem, but my recent test found that this delay has a lot to do with the parameter ROW_FORMAT=COMPRESSED KEY_BLOCK_SIZE=8. If you don’t use this, the 5.6 main library uses this parameter, and the 5.7 slave This problem will not occur if the table structure of the library is synchronized without this parameter.
[4 Sep 2020 9:36] MySQL Verification Team

I'm not sure I follow.

Have you tried both master and slave to be the latest 5.7 with compression turned on? 

I do not see any performance issues with 5.7 slave.
[16 Sep 2020 2:33] Wang Wei
If the master and slave are both 5.7 are no problem, but the master is 5.6, this problem will occur when compression is enabled when the slave is 5.7.
[16 Sep 2020 12:07] MySQL Verification Team

Replicating from latest 5.6.49 to latest 5.7.31 I cannot reproduce any issues.

Anyhow, replicating betweeen 5.6 and 5.7 is not meant to be a permanent solution, you are supposed to use replication between major releases only as a temporary solution until you upgrade them both to same major release so a small performance tradeoff between two major releases is not something I'd consider a bug.