MySQL Bugs: #76686: reduce mutex contention of LOCK

Bug #76686	reduce mutex contention of LOCK_done during Group Commit
Submitted:	14 Apr 2015 9:14	Modified:	24 Nov 2015 13:49
Reporter:	zhai weixiang (OCA)	Email Updates:
Status:	Verified	Impact on me:	None
Category:	MySQL Server: Replication	Severity:	S3 (Non-critical)
Version:	5.7	OS:	Any
Assigned to:		CPU Architecture:	Any

Description:

While testing 5.7.7 under a high-concurrency UPDATE workload, LOCK_done became the hottest mutex: among the mutexes listed below it has the largest SUM_TIMER_WAIT and the second-largest AVG_TIMER_WAIT.

mysql> SELECT COUNT_STAR, SUM_TIMER_WAIT, AVG_TIMER_WAIT, EVENT_NAME FROM events_waits_summary_global_by_event_name where COUNT_STAR > 0 and EVENT_NAME like 'wait/synch/%' order by SUM_TIMER_WAIT desc limit 10;
+------------+-------------------+----------------+-------------------------------------------------+
| COUNT_STAR | SUM_TIMER_WAIT | AVG_TIMER_WAIT | EVENT_NAME |
+------------+-------------------+----------------+-------------------------------------------------+
| 6193764 | 10116959865868260 | 1633410210 | wait/synch/cond/sql/MYSQL_BIN_LOG::COND_done |
| 3668477 | 440341688031435 | 120033900 | wait/synch/mutex/sql/MYSQL_BIN_LOG::LOCK_done |
| 20475626 | 201610885084140 | 9846225 | wait/synch/mutex/innodb/log_sys_mutex |
| 86799 | 82459396897815 | 950003895 | wait/synch/mutex/sql/MYSQL_BIN_LOG::LOCK_commit |
| 26942676 | 60654154009740 | 2251125 | wait/synch/mutex/innodb/trx_sys_mutex |
| 816248 | 50329738918530 | 61659510 | wait/synch/mutex/sql/MYSQL_BIN_LOG::LOCK_log |
| 37264701 | 10345439478585 | 277530 | wait/synch/mutex/innodb/redo_rseg_mutex |
| 11024825 | 7652761523310 | 693825 | wait/synch/mutex/innodb/lock_mutex |
| 61828624 | 6717719588595 | 108315 | wait/synch/sxlock/innodb/hash_table_locks |
| 44911844 | 2832104469780 | 62640 | wait/synch/mutex/innodb/trx_mutex |
+------------+-------------------+----------------+-------------------------------------------------+
10 rows in set (0.03 sec)

LOCK_done protects the condition wait for all three stages. When signal_done is called, every waiting thread wakes up and checks its pending state, even when that is not necessary.

I slightly changed the code to partition COND_done and LOCK_done. I’ll attach a patch later.
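To illustrate the idea of partitioning, here is a minimal standalone sketch using the C++ standard library. It is not the code from the attached patch: the names `Waiter`, `PartitionedDone`, `kPartitions` and `run_demo` are hypothetical, and the partition count of 4 is an arbitrary assumption. Each partition has its own mutex and condition variable, so a signal wakes only the threads waiting on that partition.

```cpp
#include <array>
#include <condition_variable>
#include <cstddef>
#include <mutex>
#include <thread>
#include <vector>

// Hypothetical sketch; names and layout are NOT taken from the attached patch.
struct Waiter {
  std::size_t id = 0;   // used only to choose a partition
  bool pending = true;  // guarded by the mutex of the waiter's partition
};

class PartitionedDone {
 public:
  static constexpr std::size_t kPartitions = 4;  // assumed partition count

  // Follower side: sleep until this waiter's pending flag is cleared.
  void wait_done(Waiter &w) {
    Slot &slot = slots_[w.id % kPartitions];
    std::unique_lock<std::mutex> guard(slot.lock);
    slot.cond.wait(guard, [&w] { return !w.pending; });
  }

  // Leader side: clear the flag, then wake only the waiter's own partition.
  void signal_done(Waiter &w) {
    Slot &slot = slots_[w.id % kPartitions];
    {
      std::lock_guard<std::mutex> guard(slot.lock);
      w.pending = false;
    }
    slot.cond.notify_all();
  }

 private:
  struct Slot {
    std::mutex lock;
    std::condition_variable cond;
  };
  std::array<Slot, kPartitions> slots_;
};

// Starts n followers, lets a leader release each one, and returns how many
// followers ended up with their pending flag cleared.
std::size_t run_demo(std::size_t n) {
  PartitionedDone done;
  std::vector<Waiter> waiters(n);
  for (std::size_t i = 0; i < n; ++i) waiters[i].id = i;

  std::vector<std::thread> followers;
  followers.reserve(n);
  for (Waiter &w : waiters)
    followers.emplace_back([&done, &w] { done.wait_done(w); });

  for (Waiter &w : waiters) done.signal_done(w);
  for (std::thread &t : followers) t.join();

  std::size_t released = 0;
  for (const Waiter &w : waiters) released += w.pending ? 0 : 1;
  return released;
}
```

Calling `run_demo(16)` starts 16 followers spread over the 4 partitions. With a single shared mutex and condition variable, every signal would instead wake all waiters, and each would re-take the same lock only to find its flag still set.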

Below is a simple test.
Basic settings:
innodb_flush_log_at_trx_commit = 2
sync_binlog = 1000
GTID disabled.

All data fits in memory.

Using sysbench with update_non_index.lua, 100 tables with 100,000 rows each.

threads, original, after patching
16, 31300, 31400
32, 34900, 35300
64, 36500, 38500
128, 37700, 39500
256, 35600, 37200

How to repeat:
Test 5.7.7 under high concurrency update workload.

Suggested fix:
I will attach the patch later.

A simple patch to prove the performance improvement, not fully tested. (*) I confirm the code being submitted is offered under the terms of the OCA, and that I am authorized to contribute it.

Contribution: signal_partition_57.diff (application/octet-stream, text), 7.59 KiB.

Hi Zhai,

Thanks for the report and for the patch. I need to double check:

> threads, original, after patching
> 16, 31300, 31400
> 32, 34900, 35300
> 64, 36500, 38500
> 128, 37700, 39500
> 256, 35600, 37200

What do these numbers represent (original and after patch)?
rw/sec, number of locks, ...?

I did test your patch, and running the update_non_index.lua test I get 10-15% more requests per second and 40% shorter response time without your patch.

Can you confirm your numbers? What exactly are you counting, and did you compare response times, rw requests/sec and some other metric during the test with your patch?

all best
Bogdan Kecman

Hey, Bogdan.

These numbers represent updates/sec.

Can you give me some details about your testing server? This is really a simple patch and shouldn't introduce any regression!

I just re-tested the patch and can still see the improvement, though the numbers are a little different.

My sysbench command:

sysbench/sysbench --debug=off --test=sysbench/tests/db/update_non_index.lua --oltp-tables-count=10  --oltp-point-selects=0 --oltp-table-size=100000 --num-threads=$1 --max-requests=10000000000 --max-time=7200000 --oltp-auto-inc=off --mysql-engine-trx=yes --mysql-table-engine=innodb --oltp-test-mod=complex --mysql-db=$2  --mysql-host=$host --mysql-port=$port --mysql-user=$username --report-interval=5  --percentile=99 run

The server has 24 cpu cores and 196GB memory, all data can fit in memory.

I set sync_binlog = 1, innodb_flush_log_at_trx_commit = 1 (different from the previous test).

updates/sec :
threads, Original, after patching
16, 17300,  17400
32, 25700, 26100
64, 31700, 33500
128, 37000, 39100
256, 37600, 38800

I've checked the RT, and didn't find any regression.

Hi Zhai,

> These numbers represent updates/sec.

Then I definitely can't reproduce your results and I tried on 2 different boxes.

> Can you give me some details about your testing server?
> This is really a simple patch and shouldn't introduce any regression!

I don't see a reason for the regression either, but I do see it on both servers I tested on. The first server is a small testing box (E6850 @ 3GHz, 16G RAM, SATA spinning disks in RAID 1+0) and the second is a 12-core Xeon(R) CPU E5-1650 v2 @ 3.50GHz with 128G RAM and a single spinning disk (I did not test with SSDs).

> I just re-tested the patch and can still see
> the improvement, though the numbers are a little different.

I ran the test 6 times on both machines. Every time, the patched version had fewer updates/sec than the unpatched version, both compiled on the machine where the test was running.

> my sysbench command:
> 
> sysbench/sysbench --debug=off --test=sysbench/tests/db/update_non_index.lua --oltp-tables-count=10  --oltp-point-selects=0 --oltp-table-size=100000 --num-threads=$1 --max-requests=10000000000 --max-time=7200000 --oltp-auto-inc=off --mysql-engine-trx=yes --mysql-table-engine=innodb --oltp-test-mod=complex --mysql-db=$2  --mysql-host=$host --mysql-port=$port --mysql-user=$username --report-interval=5  --percentile=99 run

The test I ran is different (run with different num-threads values; everything else is the same each time):

sysbench --num-threads=64 --test=./sysbench/tests/db/update_non_index.lua --mysql-socket=/tmp/mysql.sock --mysql-user=sb --mysql-password=sysbench --mysql-db=sysbench   run

I'll re-run it with the additional sysbench parameters, but I doubt I'll see a difference.

> The server has 24 cpu cores and 196GB memory, all data can fit in memory.

I think all data fits into RAM on my 128G server, but I did not inspect the test so I can't be 100% sure; I will check. It did not fit in RAM on my first server (I intentionally used a smaller server). I saw more degradation on the first server (data not all in RAM) than on the bigger one, but in both cases I see fewer updates/sec with the patch.

> I set sync_binlog = 1, innodb_flush_log_at_trx_commit = 1 (different from the previous test).

I was running the test with the "default" config (only the InnoDB buffer pool is set to 80% of RAM; everything else is left unconfigured, so defaults are used), which seems appropriate for testing this type of patch.

> I've checked the RT, and didn't find any regression.

I'll re-run the test on a larger, 24-core machine to see if I can reproduce your results, but either way a regression on the smaller setup is worrying.

all best
Bogdan

Hi, Bogdan

With the default configuration, I guess the main bottleneck will not be group commit but dirty page preflush, because the default log file size is too small. :)

I suggest you configure the following options:

innodb_log_files_in_group=4
innodb_log_file_size=4G
innodb_log_buffer_size=100M
 
Besides, setting the test results aside (of course, the test is important!), what's your opinion of this patch? Should it improve performance in theory?

Hi,

I'll redo the test on a larger machine and with a different config, but even if this does change the result for the better, there's still the question of 10-15% lower performance on the smaller setup, which we can't ignore.

Why this patch introduces this degradation I can't say. From looking at the patch, I see it can improve performance in certain situations and should not affect performance in others; I don't see a scenario where it would degrade performance. But there's obviously something I'm not seeing, as the tests are pretty conclusive :(

Let me rerun the test; if I reproduce your results in any scenario, we can then submit this patch to a review committee for further analysis.

all best
Bogdan Kecman

Hi,

I did verify this. Now it's up to "higher instances" on how to proceed :)

thanks for your work
Bogdan Kecman

Just for the record, there is a bug in the patch I attached before: cond_index will overflow after running for a long time and turn negative. This situation should be taken into consideration.
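
One common way to avoid this class of bug is sketched below. It assumes, without having been checked against the attached diff, that cond_index is a signed counter that is incremented and reduced modulo the number of partitions. `PartitionPicker` and `kPartitions` are hypothetical names. An unsigned counter wraps to zero on overflow in a well-defined way, so the modulo result never goes negative.

```cpp
#include <atomic>
#include <cstddef>
#include <cstdint>

// Hypothetical partition count; a power of two keeps the sequence of
// indexes continuous across the wrap-around of the 32-bit counter.
constexpr std::size_t kPartitions = 8;

// Hands out partition indexes in round-robin order.
class PartitionPicker {
 public:
  explicit PartitionPicker(std::uint32_t start = 0) : next_(start) {}

  // Unsigned arithmetic wraps modulo 2^32, so the result always lies in
  // the range [0, kPartitions).
  std::size_t next_index() {
    std::uint32_t ticket = next_.fetch_add(1, std::memory_order_relaxed);
    return ticket % kPartitions;
  }

 private:
  std::atomic<std::uint32_t> next_;
};
```

By contrast, overflowing a signed int is undefined behaviour in C++, and on typical platforms the value wraps to a negative number, whose remainder is then negative as well and unusable as an array index.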

Rebased to 5.7.10 and fixed several bugs.

Attachment: group_commit_signal.diff (text/x-patch), 10.99 KiB.