Bug #83000 Mysql 5.7.13 replication cannot keep up. 5.7.12 works fine.
Submitted: 14 Sep 2016 19:31 Modified: 21 Dec 2016 11:20
Reporter: Pete French Email Updates:
Status: Not a Bug Impact on me:
None 
Category:MySQL Server: Packaging Severity:S2 (Serious)
Version:5.7.13 (and 5.7.15) OS:FreeBSD
Assigned to: CPU Architecture:Any

[14 Sep 2016 19:31] Pete French
Description:
Upgrading our systems to 5.7.13 leads to a situation where slaves cannot keep up with replication even under moderate load. The seconds behind master keeps growing. It appears that it is thus the SQL thread which is causing the issue, not the Io thread. A 'show process list' doesn't how anything happening on the slave, even when it is several hours behind. We are replicating using GTIDs and the underlying filesystem is ZFS.

How to repeat:
Upgrade mastr/slave setup from 5.7.12 to 5.7.13

Suggested fix:
Rolling back to 5.7.12 fixes the issue, the lost hours are caught up in a matter of minutes.
[15 Sep 2016 22:08] MySQL Verification Team
Thank you for the bug report. Are you able to test the lastest 5.7.15 release presents the same issue than 5.7.13 on your environment?. Thanks.
[19 Sep 2016 16:22] Pete French
I just tries this with 5.7.15 and it lags just the same - currently at 13 minutes behind and gettling slower and slower. Have updated the version accordingly

I can provide a machine exhibiting the problem for you to login and example if required.
[19 Sep 2016 16:30] Pete French
I just rolled back to 5.7.12 - it went from being seventeen minutes behind to zero in about three to four seconds.
[4 Nov 2016 19:39] Pete French
Is there any progress on this ? I can still easily reproduce the issue, and the offer of setting up a system for you to login to, with the problem in lace, is still open if required. I was going to try a trace to see if I can see any difference in the system calls that the two versions are making, would that help >
[1 Dec 2016 13:51] Pete French
More tetsing on this - it is a FreeBSD specific problem. It occurs on both 10.3 and 11.0, and is independent of the underlying filesystem as I tried both ZFS and UFS.

I also tried Open Indiana (Solariis) with the same dataset and that works fine, hence I conclude its FreeBSD only.
[20 Dec 2016 12:31] Pete French
I managed to isolate this behaviour on FreeBSD to a change in the mode of the installed example my.cnf file from 0640 to 0644. This makes no sense to me (it is not even the file I am using) but it is completely reproducible. FreeBSD bug is here:

https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=214566
[20 Dec 2016 13:55] MySQL Verification Team
So can you show us a diff of the output of SHOW GLOBAL VARIABLES between
the two versions,  or just the contents of that my.cnf..

Then we should close this bug?
[21 Dec 2016 10:57] Pete French
Woah! Loads fo différences! But how ? Thats not my my,cnf files ? Is it reading that one and then applying my one over the top ? The differences are below for the record...

diff vars.fast vars.slow
11c11
< |binlog_cache_size|32768|
---
> |binlog_cache_size|16777216|
62c62
< |explicit_defaults_for_timestamp|OFF|
---
> |explicit_defaults_for_timestamp|ON|
76c76
< 449a194c-4623-11e3-8c37-001b783c1faf:1-25009041|
---
> 449a194c-4623-11e3-8c37-001b783c1faf:1-25009046|
80,81c80,82
< |gtid_purged|3c6b6d8b-7065-11e6-95fb-001d9263ab24:1-5,
< 449a194c-4623-11e3-8c37-001b783c1faf:1-23324369|
---
> |gtid_purged|2672cfab-b568-11e6-bf72-d0bf9c45a37c:1-5,
> 3c6b6d8b-7065-11e6-95fb-001d9263ab24:1-5,
> 449a194c-4623-11e3-8c37-001b783c1faf:1-24640111|
92c93
< |have_symlink|YES|
---
> |have_symlink|DISABLED|
111c112
< |innodb_autoinc_lock_mode|1|
---
> |innodb_autoinc_lock_mode|2|
132,133c133,134
< |innodb_data_file_path|ibdata1:12M:autoextend|
< |innodb_data_home_dir||
---
> |innodb_data_file_path|ibdata1:128M:autoextend|
> |innodb_data_home_dir|/var/db/mysql|
189c190
< |innodb_open_files|2000|
---
> |innodb_open_files|16384|
199c200
< |innodb_read_io_threads|4|
---
> |innodb_read_io_threads|8|
220c221
< |innodb_temp_data_file_path|ibtmp1:12M:autoextend|
---
> |innodb_temp_data_file_path|ibtmp1:128M:autoextend|
230c231
< |innodb_write_io_threads|4|
---
> |innodb_write_io_threads|8|
257c258
< |log_output|FILE|
---
> |log_output|TABLE|
270c271
< |long_query_time|10.000000|
---
> |long_query_time|0.500000|
306c307
< |myisam_recover_options|OFF|
---
> |myisam_recover_options|BACKUP,FORCE|
314c315
< |net_retry_count|10|
---
> |net_retry_count|16384|
433c434
< |slave_load_tmpdir|/usr/home/mysql/tmp|
---
> |slave_load_tmpdir|/var/db/mysql_tmpdir|
446c447
< |slow_query_log|OFF|
---
> |slow_query_log|ON|
473c474
< |sync_relay_log|10000|
---
> |sync_relay_log|1|
476,477c477,478
< |table_definition_cache|1400|
< |table_open_cache|2000|
---
> |table_definition_cache|8192|
> |table_open_cache|16384|
[21 Dec 2016 11:08] MySQL Verification Team
Different config will lead to different performance. hence, not a bug here.
Not sure what's to blame, but large binlog_cache_size isn't doing you any favors if most trx are small.

http://dev.mysql.com/doc/refman/5.7/en/option-files.html
[21 Dec 2016 11:20] Pete French
Indeed. I didn't realise it was reading multiple config files, that does explain everything. Time to overhaul where I keep files and look again at some of the options. Sorry for the noise.