Bug #80889 PURGE BINARY LOGS TO is reading the whole binlog file and causing MySql to Stall
Submitted: 29 Mar 2016 15:31 Modified: 5 Aug 2016 10:22
Reporter: Davy Campano Email Updates:
Status: Closed Impact on me:
Category:MySQL Server: Replication Severity:S5 (Performance)
Version:5.7.11 OS:Linux (4.1.17-22.30.amzn1.x86_64)
Assigned to: CPU Architecture:Any

[29 Mar 2016 15:31] Davy Campano
Calling "PURGE BINARY LOGS TO 'binarylog.00003'" causes the system to read the whole 'binarylog.00003' file.  

I recently upgraded from MySql 5.6.22 to MySql 5.7.10 and when the automatic purge for binary logs occurs (expire_logs_days), the system stalls for 8-12 seconds until it finishes.  I am using 1GB binary log files and I have been able to reproduce this on a fresh install.  I have tested this on the most recent version (5.7.11) and able to reproduce the issue there too.

How to repeat:
1.  Install MySql 5.7.11
2.  Run mysqlslap to generate a large binary log
     mysqlslap --host=localhost  --auto-generate-sql --verbose --concurrency=500 --iterations=100 --number-int-cols=5 --number-char-cols=20
3.  Attach strace to the mysqld process and run "PURGE BINARY LOGS TO"
4.  Observe that strace shows the whole file being read
[29 Mar 2016 15:34] Davy Campano
my.cnf with tested settings

Attachment: my.cnf (application/octet-stream, text), 948 bytes.

[29 Mar 2016 15:40] Davy Campano
strace log when purge is running

Attachment: purge_strace1.log (application/octet-stream, text), 3.33 KiB.

[29 Mar 2016 19:15] Davy Campano
After further testing this is only occurring when GTID_MODE=OFF.  If I enable GTIDs, then the whole file is not read.  This bug is occurring on a fresh install of Amazon Linux 2015.09.  I can test further distributions if needed, but it doesn't appear to be distro related.
[5 Apr 2016 11:38] Venkatesh Duggirala
Post by Developer
Hell Davy Campano, 

Thank you for the bug report. It is verified using your steps.

[5 Aug 2016 10:22] David Moss
Posted by developer:
Thank you for your feedback, this has been fixed in upcoming versions and the following was added to the 5.7.14 changelog:
When --gtid-mode=off, issuing a PURGE BINARY LOGS TO file statement causes the system to read the whole file. Depending on the size of the file this could cause the server to stall for a few seconds. This was due to the way binary log files were being searched for the Previous_gtids value. The fix adds an optimization to stop the reading of the binary log file if it is supposed only to read the lost GTIDs and a Previous_gtids event is found.