MySQL Bugs: #73202: write/sync redo log before flush thread cache to binlog

Bug #73202	write/sync redo log before flush thread cache to binlog
Submitted:	5 Jul 2014 0:31	Modified:	24 Nov 2014 2:01
Reporter:	zhai weixiang (OCA)	Email Updates:
Status:	Closed	Impact on me:	None
Category:	MySQL Server: InnoDB storage engine	Severity:	S3 (Non-critical)
Version:	5.6, 5.7.5	OS:	Any
Assigned to:		CPU Architecture:	Any
Tags:	Contribution

Description:
According to the crash recovery logic of mysql server, we only need to write/sync prepared transaction before writing to binlog. And currently transactions are not grouped during innodb prepare stage.  

I changed the related code described as bellow:

1. don’t sync redo log during innodb prepare stage if binlog enabled.

2. during flush stage of group commit (In function process_flush_stage_queue)
a) collect  group commit queue

b) call ha_flush_logs to sync redo log (make sure all redo log of prepared transaction in current 
queue were synced before writing to binlog)
--innobase_flush_logs have to be changed.

c) flush thread cache to binlog file for every thd in queue

a simple test:
sysbench , 20 tables with 100,000 records, update_index.lua

binlog_enhanced_flush : 0 means default logic, 1 means new logic described above (based on MySQL5.6.16)

threads binlog_enhanced_flush = 0 binlog_enhanced_flush = 1
8 2740 3000
16 4350 5400
32 7300 9800
64 11700 15800
128 18800 24000
256 24000 30000
512 30000 39000

How to repeat:
read the code 

Suggested fix:
I will attach a simple patch later to prove the performance improvement

patch based on 5.7.4, add new option binlog_enhanced_sync to control the new behavior

Attachment: bug73202.diff (application/octet-stream, text), 13.24 KiB.

Thank you for the report and contribution.

But of us to accept your contribution you need to sign OCA and contribute through "Contribution" tab. Please do it.

updated patch, code cleanup

Attachment: bug73202.diff (application/octet-stream, text), 12.96 KiB.

re-attach the patch (*) I confirm the code being submitted is offered under the terms of the OCA, and that I am authorized to contribute it.

Contribution: bug73202.diff (application/octet-stream, text), 10.94 KiB.

is there any comment ?

Thank you for the feedback and re-attaching the patch.

We still need to run the tests in our environment to confirm that the bug exists. Please be patient: it will take some time.

This is an interesting idea. Couple of comments about the patch:
* I am not clear why you need to make all the changes to MYSQL_BIN_LOG::process_flush_stage_queue(). I'd say something like following at the start of the function should do the trick:

if (enhanced_sync)
  ha_flush_logs(NULL, 1);

* There is a bit of a race where opt_binlog_enhanced_sync can change between the time it is read in innodb prepare code path and later in process_flush_stage_queue().

hi, Inaam

Thanks for your reply.

First:

The main logic of changes in function process_flush_stage_queue :
step 1: collect group commit queue
step 2: write/sync redo log
step 3: flush binlog cache.

Suppose we swap step 1 and step2:
step 1: write/sync redo log
step 2: collect group commit queue
step 3: flush binlog cache
a transaction may haven't prepared (so it's redo log is not copied to log buffer after step1) but quickly entered into ordered_commit and marked itself as a follower before step 2.

Second:
Yes, you're right. this option should remain unchanged between innodb prepare and binlog commit. But I don't know how to solve this problem gracefully :( . Since this option is rarely changed, I decide to tolerate this race currently. But of course, this should be fixed.

I don't fully understand this part of code but won't having LOCK_log prevent incoming requests to be queued in the flush queue?

Only the leader of group has to acquire LOCK_log. Check the code in function MYSQL_BIN_LOG::change_stage -> Stage_manager::enroll_for

Thank you for the report.

Verified as described.

My results on 8-cores machine:

	no patch	patch
4	1916.32		2330.66
8	2840.54		3200.32 
16	3506.79		4369.22
32	1994.24		2775.53

Weixiang,

Thanks for the pointer. I'll look into that code.
One suggestion is that though the current approach should work but I think we should leave the write during innodb prepare as is. That is, we only attempt to group the sync operation which is the real heavy duty call.

I agree with Inaam, we should split the fix into two and do the binlog changes first and then decide on the InnoDB changes.

Maybe we can add some option to let user choose: group redo sync or group redo write. :-) because we can also reduce log_sys mutex contention besides IO overhead if log_write_up_to is not called during innodb prepare.

Posted by developer:
 
Thanks for your feedback. The following was added to the 5.6.23 and 5.7.6 changelog with commit 4672:
When using SHOW SLAVE STATUS to monitor replication performance, Seconds_Behind_Master sometimes displayed unexpected lag behind the master. This was caused by Previous_gtids_log_events being written to the slave's relay log with a timestamp behind the master, and then being used to calculate the Seconds_Behind_Master. This fix ensures that events generated on the slave that are added to the relay log and not used when calculating Seconds_Behind_Master.

Wait..David, the changelog entry you posted seems not related to this bug report..

let me re-open this bug :)

Hello Zhai,
yes you are right, sorry that was my mistake. Let me double check the status of this and get back to you.

OK so I have spoken to the Replication team and they will follow up in regards to your contribution personally. I will close the bug again :)

Hi, David, So what is the state of this bug ? Is it fixed in the next release ?

Hello Zhai, this is the content that has been added to the 5.7.6 changelog:
When using InnoDB with binary logging enabled, concurrent transactions written in the InnoDB redo log are now grouped together before synchronizing to disk when innodb_flush_log_at_trx_commit is set to 1, which reduces the amount of synchronization operations. This can lead to improved performance.

The replication team will follow up to let you know how they worked with your contribution. Thanks!

Zhai,

Thank you.

Your contributed patch inspired us and served as a foundation for
writing a fix for the issue at hand.

Regards,
Luís