Bug #69341 | semi-sync replication is very slow with many clients | ||
---|---|---|---|
Submitted: | 29 May 2013 0:55 | Modified: | 5 Jul 2013 10:13 |
Reporter: | Yoshinori Matsunobu (OCA) | Email Updates: | |
Status: | Closed | Impact on me: | |
Category: | MySQL Server: Replication | Severity: | S5 (Performance) |
Version: | 5.6.11 | OS: | Any |
Assigned to: | CPU Architecture: | Any |
[29 May 2013 0:55]
Yoshinori Matsunobu
[29 May 2013 1:09]
Yoshinori Matsunobu
semisync mutex is LOCK_binlog_ defined in semisync_master.h, and here is an example stack trace where mysql_binlog_send() was blocked. #0 __lll_lock_wait () at ../nptl/sysdeps/unix/sysv/linux/x86_64/lowlevellock.S:136 #1 0x00007f5dd1aee407 in _L_lock_1143 () from /usr/local/fbcode/gcc-4.6.2-glibc-2.13/lib/libpthread.so.0 #2 0x00007f5dd1aee30a in __pthread_mutex_lock (mutex=0x7f5dd0744870) at pthread_mutex_lock.c:101 #3 0x00007f5dd0540ed9 in ReplSemiSyncMaster::updateSyncHeader (this=0x7f5dd0744820, packet=0x7f5b7b970000 "", log_file_name=0x7f5b7c3bbb89 "binary-5611.000003", log_file_pos=247858586, server_id=561111) at semisync_master.cc:912 #4 0x0000000000647183 in Binlog_transmit_delegate::before_send_event (this=0x128c840, thd=<optimized out>, flags=<optimized out>, packet=0x7f5b7c3bb150, log_file=0x7f5b7c3bbb80 "/binlogs/binary-5611.000003", log_pos=247858586) at sql/rpl_handler.cc:348 #5 0x00000000008406d3 in mysql_binlog_send (thd=0x7f5b72821000, log_ident=0x7f5b7c3bc6b0 "", pos=247858586, slave_gtid_executed=0x7f5b7c3bc8c0) at sql/rpl_master.cc:1434
[29 May 2013 11:58]
Luis Soares
Hi Yoshinori! Thanks for reporting this and for providing detailed feedback.
[2 Jun 2013 14:14]
Libing Song
Analysis ======== I tested mysql-5.6.10, mysql-5.6.11 and mysql-5.6.12(not GA yet). I found it is very slow on the debug version of mysql-5.6.10 and mysql-5.6.11. Not only semisync, but also async replication(even without binlog) is slow as well. I tested with 50000 inserts on 100 sessions. mysql-5.6.10 and 5.6.11 can handle 200 inserts/s. mysql-5.6.12 debug version has no performance problem. It can handle 2083 inserts/s. all release versions have no performance problem. They can handle almost 10000 inserts/s. Just like Yoshinori said, the binlog group flush queue is very small, on 5.6.10 and 5.6.11 debug versions. Rpl_semi_sync_master_net_waits are about 49000. But on 5.6.12 debug version, its value is about 12000. 5.6.11 release version has small binlog group flush queue too, Rpl_semi_sync_master_net_waits is about 44000. But on 5.6.12 release version it is about 5000. I finally found the patch for Bug#16720368 made the good performance on mysql-5.6.12 debug version. It removed some debug code of innodb through undefining UNIV_SYNC_DEBUG. But the patch doesn't effect binlog flush queue size. So I need some time to check which patch makes bigger binlog flush queue size. It seems binlog flash queue size doesn't has obvious effect on the performance. Because wait time is decided by replication time which is fixed.
[4 Jun 2013 4:21]
Libing Song
test performance with mysqlslap
Attachment: my_test.sh (application/x-shellscript, text), 5.40 KiB.
[4 Jun 2013 4:25]
Libing Song
Test Methods ============ I have a small script to test the performance. It setup servers through a test case. It is uploaded above. run the script in mysql-test directory. it has three options: simple, rpl, semisync. you can run it like: sh my_test.sh semisync You may encounter a problem that install_semisync.inc doesn't exist. It exists in mysql-5.6.12. I will upload it too. I get release build through "cmake . -DBUILD_CONFIG=mysql_release". I get debug build through "./BUILD/compile-pentium-debug-max-no-ndb
[4 Jun 2013 4:27]
Libing Song
copy it into mysql-test/include
Attachment: install_semisync.inc (application/octet-stream, text), 1.08 KiB.
[4 Jun 2013 4:30]
Libing Song
Hi Yoshinori, As I could not repeat the bug on release versions. Could you please try to reproduce the bug by using the script?
[4 Jun 2013 5:11]
Yoshinori Matsunobu
Libing, I could repeat the problem by increasing rpl_semi_sync_master_timeout. Default is 10000 (10 seconds). I increased this value since I didn't want to semisync to be disabled. I could get better throughput with default rpl_semi_sync_master_timeout, but at that time semisync was disabled. # diff -up include/install_semisync.inc.orig include/install_semisync.inc --- include/install_semisync.inc.orig +++ include/install_semisync.inc @@ -17,6 +17,7 @@ if ($value == No such row) SET sql_log_bin = 0; eval INSTALL PLUGIN rpl_semi_sync_master SONAME '$SEMISYNC_MASTER_PLUGIN'; SET GLOBAL rpl_semi_sync_master_enabled = 1; + SET GLOBAL rpl_semi_sync_master_timeout = 100000000; SET sql_log_bin = 1; } --enable_query_log
[4 Jun 2013 11:51]
Libing Song
Hi Yoshinori, Thanks for your valuable feedback. I verified the bug too.
[7 Jun 2013 0:57]
Libing Song
patch for the bug
Attachment: libing-bug69341.diff (text/x-patch), 1.71 KiB.
[7 Jun 2013 0:59]
Libing Song
Hi Yoshinori, All, there is a tentative patch uploaded to this bug report that fixes the reported issue. Feel free to test it. We welcome your feedback on the patch. Thanks!
[5 Jul 2013 10:13]
Jon Stephens
Thank you for your bug report. This issue has been committed to our source repository of that product and will be incorporated into the next release. Fixed in 5.6+. Documented in the 5.6.13 and 5.7.2 changelogs as follows: When rpl_semi_sync_master_timeout was set to an extremely large value, semi-synchronous replication became very slow, especially when many sessions were working in parallel. It was discovered that the code to calculate this timeout was inside the wait loop itself, with the result that an increase in the value of rpl_semi_sync_master_timeout caused repeated iterations. This fix improves the method used to calculate wakeup times, and moves it outside of the wait loop, so that it is executed one time only. Closed.
[7 Apr 2016 5:46]
MySQL Verification Team
Bug #80651 marked as duplicate of this