Bug #73202 | write/sync redo log before flush thread cache to binlog | ||
---|---|---|---|
Submitted: | 5 Jul 2014 0:31 | Modified: | 24 Nov 2014 2:01 |
Reporter: | zhai weixiang (OCA) | Email Updates: | |
Status: | Closed | Impact on me: | |
Category: | MySQL Server: InnoDB storage engine | Severity: | S3 (Non-critical) |
Version: | 5.6, 5.7.5 | OS: | Any |
Assigned to: | CPU Architecture: | Any | |
Tags: | Contribution |
[5 Jul 2014 0:31]
zhai weixiang
[7 Jul 2014 7:07]
zhai weixiang
patch based on 5.7.4, add new option binlog_enhanced_sync to control the new behavior
Attachment: bug73202.diff (application/octet-stream, text), 13.24 KiB.
[15 Jul 2014 14:29]
Sveta Smirnova
Thank you for the report and contribution. But of us to accept your contribution you need to sign OCA and contribute through "Contribution" tab. Please do it.
[16 Jul 2014 8:45]
zhai weixiang
updated patch, code cleanup
Attachment: bug73202.diff (application/octet-stream, text), 12.96 KiB.
[18 Jul 2014 14:41]
zhai weixiang
re-attach the patch (*) I confirm the code being submitted is offered under the terms of the OCA, and that I am authorized to contribute it.
Contribution: bug73202.diff (application/octet-stream, text), 10.94 KiB.
[4 Aug 2014 5:54]
zhai weixiang
is there any comment ?
[5 Aug 2014 12:15]
Sveta Smirnova
Thank you for the feedback and re-attaching the patch. We still need to run the tests in our environment to confirm that the bug exists. Please be patient: it will take some time.
[5 Aug 2014 18:52]
Inaam Rana
This is an interesting idea. Couple of comments about the patch: * I am not clear why you need to make all the changes to MYSQL_BIN_LOG::process_flush_stage_queue(). I'd say something like following at the start of the function should do the trick: if (enhanced_sync) ha_flush_logs(NULL, 1); * There is a bit of a race where opt_binlog_enhanced_sync can change between the time it is read in innodb prepare code path and later in process_flush_stage_queue().
[6 Aug 2014 0:20]
zhai weixiang
hi, Inaam Thanks for your reply. First: The main logic of changes in function process_flush_stage_queue : step 1: collect group commit queue step 2: write/sync redo log step 3: flush binlog cache. Suppose we swap step 1 and step2: step 1: write/sync redo log step 2: collect group commit queue step 3: flush binlog cache a transaction may haven't prepared (so it's redo log is not copied to log buffer after step1) but quickly entered into ordered_commit and marked itself as a follower before step 2. Second: Yes, you're right. this option should remain unchanged between innodb prepare and binlog commit. But I don't know how to solve this problem gracefully :( . Since this option is rarely changed, I decide to tolerate this race currently. But of course, this should be fixed.
[6 Aug 2014 1:59]
Inaam Rana
I don't fully understand this part of code but won't having LOCK_log prevent incoming requests to be queued in the flush queue?
[6 Aug 2014 2:10]
zhai weixiang
Only the leader of group has to acquire LOCK_log. Check the code in function MYSQL_BIN_LOG::change_stage -> Stage_manager::enroll_for
[11 Aug 2014 23:51]
Sveta Smirnova
Thank you for the report. Verified as described. My results on 8-cores machine: no patch patch 4 1916.32 2330.66 8 2840.54 3200.32 16 3506.79 4369.22 32 1994.24 2775.53
[13 Aug 2014 14:42]
Inaam Rana
Weixiang, Thanks for the pointer. I'll look into that code. One suggestion is that though the current approach should work but I think we should leave the write during innodb prepare as is. That is, we only attempt to group the sync operation which is the real heavy duty call.
[14 Aug 2014 0:09]
Sunny Bains
I agree with Inaam, we should split the fix into two and do the binlog changes first and then decide on the InnoDB changes.
[14 Aug 2014 2:59]
zhai weixiang
Maybe we can add some option to let user choose: group redo sync or group redo write. :-) because we can also reduce log_sys mutex contention besides IO overhead if log_write_up_to is not called during innodb prepare.
[19 Nov 2014 13:26]
David Moss
Posted by developer: Thanks for your feedback. The following was added to the 5.6.23 and 5.7.6 changelog with commit 4672: When using SHOW SLAVE STATUS to monitor replication performance, Seconds_Behind_Master sometimes displayed unexpected lag behind the master. This was caused by Previous_gtids_log_events being written to the slave's relay log with a timestamp behind the master, and then being used to calculate the Seconds_Behind_Master. This fix ensures that events generated on the slave that are added to the relay log and not used when calculating Seconds_Behind_Master.
[19 Nov 2014 14:12]
zhai weixiang
Wait..David, the changelog entry you posted seems not related to this bug report..
[20 Nov 2014 7:36]
zhai weixiang
let me re-open this bug :)
[20 Nov 2014 10:23]
David Moss
Hello Zhai, yes you are right, sorry that was my mistake. Let me double check the status of this and get back to you.
[20 Nov 2014 11:08]
David Moss
OK so I have spoken to the Replication team and they will follow up in regards to your contribution personally. I will close the bug again :)
[24 Nov 2014 2:01]
zhai weixiang
Hi, David, So what is the state of this bug ? Is it fixed in the next release ?
[4 Dec 2014 16:12]
David Moss
Hello Zhai, this is the content that has been added to the 5.7.6 changelog: When using InnoDB with binary logging enabled, concurrent transactions written in the InnoDB redo log are now grouped together before synchronizing to disk when innodb_flush_log_at_trx_commit is set to 1, which reduces the amount of synchronization operations. This can lead to improved performance. The replication team will follow up to let you know how they worked with your contribution. Thanks!
[4 Dec 2014 16:20]
Luis Soares
Zhai, Thank you. Your contributed patch inspired us and served as a foundation for writing a fix for the issue at hand. Regards, Luís