Bug #83226 BINLOG SENDER IS GENERATING UNWANTED FSEEKS ON BINARY LOG FILE
Submitted: 30 Sep 2016 12:32 Modified: 16 Dec 2016 14:17
Reporter: João Gramacho Email Updates:
Status: Closed Impact on me:
None 
Category:MySQL Server: Replication Severity:S3 (Non-critical)
Version:5.7 OS:Any
Assigned to: CPU Architecture:Any

[30 Sep 2016 12:32] João Gramacho
Description:
The Binlog_sender does the following "per event" to be sent to a slave:

1) Peek the event size;
2) Adjust the packet buffer to fit the whole event, based on its size;
3) Read the (full) event into the buffer;
4) Send the event to the slave.

Detailing (1) a little bit, it:
1.1) Saves current IO_CACHE position;
1.2) Read the event reader (currently 19 bytes) into a temporary buffer;
1.3) Get the event size from the temporary buffer;
1.4) Change back the IO_CACHE position to the beginning of the event;

Detailing (3):
3.1) Read the event reader (currently 19 bytes) from the IO_CACHE into a temporary buffer;
3.2) Get the event size from the temporary buffer;
3.3) Append the temporary buffer (with the event header) to the packet buffer;
3.4) Calculate the remaining event data to be read (event size - header size);
3.5) Append the remaining event data from the IO_CACHE directly to the packet buffer.

The binary log file IO_CACHE uses an internal 8K buffer for caching the real file access. It reads 8K blocks from the real file when the read request is less than 8K in size.

When the event header is fully stored within an 8K block of the file, the IO_CACHE guarantees that the read(of 19 bytes) and the fseek(to current position minus 19) will not require additional reads from the real file.

When the event header is crossing the boundary of an 8K block, the IO_CACHE copies the part that is in the current 8K block, reads the next 8K block from the file and copies the rest of the event header. The "problem" is that, in this case, the repositioning to the beginning of the event will require to read the previous 8K block again from the real file. This will lead to a hit at "COUNT_MISC" counter at "performance_schema.file_summary_by_event_name" for the event_name 'wait/io/file/sql/binlog'.

How to repeat:
1) Setup master and slave replication;
2) Stop the slave replication thread;
3) Generate a workload on the master with varying transaction sizes, but avoiding binary log rotation;
4) TRUNCATE performance_schema.file_summary_by_event_name;
5) Start the slave replication receiver (a.k.a. I/O) thread;
6) Wait for the receiver thread to sync with the master;
7) SELECT EVENT_NAME, COUNT_MISC, SUM_TIMER_MISC FROM performance_schema.file_summary_by_event_name WHERE EVENT_NAME LIKE '%binlog';

The COUNT_MISC should have a very small value (approx. 5), showing that no unwanted "MISC" operations happened while dumping the binary log file content to the slave.

Suggested fix:
As the code is reading the event header twice (once for peeking the event size, other to prepare the packet to be sent to the slave), I suggest to optimize this, reading the event reader only once for both operations, removing the need of the item (1.4) of mentioned on the bug description.
[16 Dec 2016 14:17] David Moss
Posted by developer:
 
Thank you for your feedback, this has been fixed in upcoming versions and the following was added to the 5.7.17 and 8.0.1 changelogs:
The number of generated unwanted fseeks into the binary log file being replicated to a slave has been reduced.