Bug #72829 LOCK_grant is major contention point, leaves 30-40% idle CPU.
Submitted: 2 Jun 2014 2:02 Modified: 27 Mar 2015 14:52
Reporter: Stewart Smith
Status: Closed
Category: MySQL Server: Security: Privileges Severity: S5 (Performance)
Version: 5.7.4-m14 OS: Linux
Assigned to: Dmitry Lenev CPU Architecture: Any
Tags: contention, mutex, performance, PowerPC

[2 Jun 2014 2:02] Stewart Smith
Description:
While running key-lookup read-only benchmarks on a dual-socket POWER8, similar to Dimitri's setup that reached 500K QPS here: http://dimitrik.free.fr/blog/archives/2013/10/mysql-performance-the-road-to-500k-qps-with-... I encountered some contended mutexes when trying to use all of the CPU in the system.

With my patches I have posted in other bug reports, I could get good performance that many people would be extremely happy with, but there was still 30-40% idle CPU on the system as reported by top.

Analysis of this with perf identified the LOCK_grant rwlock as the major culprit, due to contention on acquiring a rdlock on LOCK_grant.

After commenting out the acquisition of the rdlock (safe enough for a benchmark; likely not for production, where GRANT statements are executed), I got a 50% increase in performance without even trying (and roughly 60% if I pushed the server).

i.e. a 50% increase in queries per second simply by alleviating the contention point of acquiring a single read lock to check table permissions.

(I'll extract the exact patch snippet a bit later; that machine is currently down.)

How to repeat:
Benchmark on a powerful system; I used a dual-socket POWER8 system. You may be more likely to see this issue on a system with several NUMA nodes, but I have not investigated this very far.

I am not sure how this contention would present itself on Intel systems, but on decently large Intel SMP systems you may be able to get a gain by doing the same (it depends on how visible the bouncing of the cacheline holding the rwlock between CPUs is).

Suggested fix:
There are several approaches that can be taken:
1) As this is a mostly read-only structure (unless you go and modify grants), you could split the lock into several, with a writer having to acquire all of them. This would likely solve the problem pretty well, although it looks a bit funny :)
2) RCU is made for solving this problem.
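The RCU-style approach can be approximated in standard C++ without a full RCU implementation by treating the grant data as an immutable snapshot behind an atomically swapped pointer. This is only an illustrative sketch, not MySQL's actual structures (the `GrantTable` type and its contents are hypothetical): readers take a reference-counted snapshot without any rwlock, and a rare writer (a GRANT/REVOKE) builds a complete copy and publishes it with one atomic store.

```cpp
#include <atomic>
#include <map>
#include <memory>
#include <string>

// Hypothetical stand-in for the privilege data guarded by LOCK_grant.
struct GrantTable {
    std::map<std::string, int> privs;  // user -> privilege bits
};

// Readers never take a lock: they atomically load a shared_ptr snapshot.
// Writers copy the whole table, modify the copy, and swap it in.
class GrantCache {
public:
    GrantCache() : table_(std::make_shared<GrantTable>()) {}

    // Read path: the snapshot stays valid for as long as the caller
    // holds the shared_ptr, even if a writer swaps in a new table.
    std::shared_ptr<const GrantTable> snapshot() const {
        return std::atomic_load(&table_);
    }

    // Copy-on-write update path; writers are expected to be rare.
    void update(const std::string& user, int bits) {
        auto next = std::make_shared<GrantTable>(*std::atomic_load(&table_));
        next->privs[user] = bits;
        std::atomic_store(&table_, std::shared_ptr<const GrantTable>(next));
    }

private:
    std::shared_ptr<const GrantTable> table_;
};

// Example lookup: 0 means "no privileges recorded" in this sketch.
int lookup(const GrantCache& cache, const std::string& user) {
    auto snap = cache.snapshot();
    auto it = snap->privs.find(user);
    return it == snap->privs.end() ? 0 : it->second;
}
```

One honest caveat: the `shared_ptr` refcount is itself a contended cacheline under this scheme (and C++20 deprecates the free `std::atomic_load`/`std::atomic_store` overloads in favour of `std::atomic<std::shared_ptr>`), so a real RCU, which avoids per-read refcounting entirely, would scale better still.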
[3 Jun 2014 5:45] zhai weixiang
I have observed a similar stall while testing MySQL 5.6.16 using sysbench (oltp_simple.lua, simple queries by primary key), 100 threads.

And below is the backtrace of the stalled threads:
     26 __lll_lock_wait(libpthread.so.0),pthread_rwlock_rdlock(libpthread.so.0),inline_mysql_rwlock_rdlock(mysql_thread.h:865),check_grant(mysql_thread.h:865),check_table_access(sql_parse.cc:6073),select_precheck(sql_parse.cc:7817),mysql_execute_command(sql_parse.cc:3025),mysql_parse(sql_parse.cc:6719),dispatch_command(sql_parse.cc:1629),do_handle_one_connection(sql_connect.cc:982),handle_one_connection(sql_connect.cc:898),pfs_spawn_thread(pfs.cc:1858),start_thread(libpthread.so.0),clone(libc.so.6)

mysql> SELECT COUNT_STAR, SUM_TIMER_WAIT, AVG_TIMER_WAIT, EVENT_NAME FROM events_waits_summary_global_by_event_name where COUNT_STAR > 0 and EVENT_NAME like 'wait/synch/%' order by SUM_TIMER_WAIT desc limit 10;
+------------+-----------------+----------------+-------------------------------------------+
| COUNT_STAR | SUM_TIMER_WAIT  | AVG_TIMER_WAIT | EVENT_NAME                                |
+------------+-----------------+----------------+-------------------------------------------+
|  253517384 | 693956170419380 |        2737208 | wait/synch/mutex/innodb/trx_sys_mutex     |
|   84200619 | 170156170825292 |        2020424 | wait/synch/rwlock/sql/LOCK_grant          |
|  169394249 | 162045674872324 |         956584 | wait/synch/mutex/sql/LOCK_table_cache     |
|  169266465 | 102097289542516 |         602988 | wait/synch/rwlock/sql/MDL_lock::rwlock    |
|  158900882 |  85391894604336 |         537152 | wait/synch/mutex/sql/MDL_map::mutex       |
|  422538610 |  47745352665600 |         112924 | wait/synch/mutex/sql/THD::LOCK_thd_data   |
|   86263377 |  40824446804108 |         473060 | wait/synch/rwlock/innodb/btr_search_latch |
|  168992808 |  35817481899112 |         211896 | wait/synch/mutex/mysys/THR_LOCK::mutex    |
|    1204410 |   2312177205408 |        1919708 | wait/synch/mutex/innodb/dict_sys_mutex    |
|     800543 |   2139313523784 |        2672244 | wait/synch/mutex/sql/LOCK_open            |
+------------+-----------------+----------------+-------------------------------------------+
10 rows in set (0.27 sec)
[3 Jun 2014 7:48] Stewart Smith
The difference for sysbench point selects is the difference between ~600K QPS and 1 million QPS. See https://www.flamingspork.com/blog/2014/06/03/1-million-sql-queries-per-second-mysql-5-7-on...

Basically, LOCK_grant is what stands in the way of 1 million queries per second with "unmodified" (apart from my "make it work on POWER" patches) MySQL 5.7.
[3 Jun 2014 19:49] Kristofer Pettersson
Is this the situation for a few-user system, or are there many users? I think there is a linear search hiding in there too, if I'm not mistaken.
[3 Jun 2014 23:03] Stewart Smith
In my benchmark there is only root, with no password and the default setup of "root can access everything".

The connections are merely sysbench.

The overhead is in having the cacheline with the rwlock bounce between CPUs.
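The cacheline-bouncing effect described here is easy to reproduce in miniature. The sketch below is a standalone illustration, not MySQL code: two threads each hammer their own atomic counter. In the `SharedLine` layout both counters fit in one 64-byte line, so every increment invalidates the line in the other core's cache; in `PaddedLines`, `alignas(64)` gives each counter its own line. The final counts are identical either way, only the wall-clock cost differs, which is exactly why this shows up in perf rather than in query results.

```cpp
#include <atomic>
#include <cstdint>
#include <thread>

constexpr int kIters = 1000000;

// Both counters packed into one 64-byte cache line: every increment
// by one thread invalidates the line in the other core's cache, so
// the line "bounces" between CPUs on each update (false sharing).
struct SharedLine {
    std::atomic<std::uint64_t> a{0};
    std::atomic<std::uint64_t> b{0};
};

// Each counter on its own cache line: no false sharing.
struct PaddedLines {
    alignas(64) std::atomic<std::uint64_t> a{0};
    alignas(64) std::atomic<std::uint64_t> b{0};
};

// Increment both counters from two threads; returns the combined
// total, which is the same for either layout.
template <typename Layout>
std::uint64_t hammer() {
    Layout c;
    std::thread t1([&] {
        for (int i = 0; i < kIters; ++i)
            c.a.fetch_add(1, std::memory_order_relaxed);
    });
    std::thread t2([&] {
        for (int i = 0; i < kIters; ++i)
            c.b.fetch_add(1, std::memory_order_relaxed);
    });
    t1.join();
    t2.join();
    return c.a.load() + c.b.load();
}
```

How large the timing gap is depends entirely on the hardware (core count, NUMA topology, coherence protocol); `perf stat` on the two variants is the simplest way to see it.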
[4 Jun 2014 0:46] Stewart Smith
On POWER, you can easily see this in perf by looking at how long the load instruction for the rwlock itself takes. I suspect there's a way to extract this data on Intel too.
[1 Jul 2014 3:20] Stewart Smith
Any chance of having this Verified? It seems that others can reproduce it with different workloads, and simply by code review you can see that the rwlock can easily be the bottleneck.
[22 Oct 2014 9:48] zhai weixiang
While testing 5.7.5, LOCK_grant became the major contention point during a PK-query test, after THR_LOCK was removed completely.

Using sysbench, querying by PK, 200 tables with 10 records each.

From performance schema:

root@performance_schema 05:37:18>SELECT COUNT_STAR, SUM_TIMER_WAIT, AVG_TIMER_WAIT, EVENT_NAME FROM events_waits_summary_global_by_event_name where COUNT_STAR > 0 and EVENT_NAME like 'wait/synch/%' order by SUM_TIMER_WAIT desc limit 20;
+------------+----------------+----------------+---------------------------------------------------+
| COUNT_STAR | SUM_TIMER_WAIT | AVG_TIMER_WAIT | EVENT_NAME                                        |
+------------+----------------+----------------+---------------------------------------------------+
|   26086077 | 17068952969200 |         654000 | wait/synch/rwlock/sql/LOCK_grant                  |
|   78512461 | 16715117127344 |         212768 | wait/synch/sxlock/innodb/hash_table_locks         |
|  130770402 | 15605301433540 |         119028 | wait/synch/mutex/sql/THD::LOCK_query_plan         |
|   52314260 | 12553580466128 |         239800 | wait/synch/mutex/sql/LOCK_table_cache             |
|   78446571 |  9825008327284 |         125132 | wait/synch/mutex/sql/THD::LOCK_thd_data           |
|   26175928 |  6455036525220 |         246340 | wait/synch/sxlock/innodb/index_tree_rw_lock       |
|   52299088 |  5974745787424 |         114232 | wait/synch/mutex/sql/THD::LOCK_thd_query          |
|       7568 |     1095036672 |         144316 | wait/synch/mutex/innodb/flush_list_mutex          |
|       7656 |      885055584 |         115540 | wait/synch/mutex/innodb/buf_pool_mutex            |
|       7603 |      638888240 |          83712 | wait/synch/mutex/sql/LOCK_global_system_variables |
|        242 |      354649376 |        1465396 | wait/synch/sxlock/innodb/dict_operation_lock      |
|       3445 |      351421232 |         101588 | wait/synch/mutex/innodb/dict_sys_mutex            |
|       1602 |      339106848 |         211460 | wait/synch/mutex/innodb/innobase_share_mutex      |
|       1602 |      321693008 |         200560 | wait/synch/mutex/sql/LOCK_open                    |
|       2672 |      315876768 |         118156 | wait/synch/mutex/sql/LOCK_plugin                  |
|       1050 |      308335712 |         293428 | wait/synch/mutex/sql/LOCK_connection_count        |
|       1602 |      235623120 |         146932 | wait/synch/mutex/innodb/file_format_max_mutex     |
|       1281 |      217084400 |         169168 | wait/synch/mutex/sql/LOCK_thd_list                |
|        983 |      209930512 |         213204 | wait/synch/mutex/sql/LOCK_status                  |
|       1380 |      199285136 |         144316 | wait/synch/mutex/sql/LOCK_user_conn               |
+------------+----------------+----------------+---------------------------------------------------+
20 rows in set (0.50 sec)
[12 Nov 2014 23:49] Matthew Lord
We've also seen this in our performance testing with MySQL 5.7. We want to do some refactoring around this mutex. 

Marking it as a verified and known issue.
[14 Nov 2014 4:32] Matthew Lord
Thanks, guys! :)
[9 Mar 2015 12:05] Ståle Deraas
Posted by developer:
 
This bug was fixed in WL#8355, and pushed to 5.7.
[10 Mar 2015 1:59] Stewart Smith
I look forward to testing the fix (I can't see anything more recent than 5.7.5 and the worklog is private, so I cannot provide feedback yet).
[27 Mar 2015 14:52] Paul DuBois
Noted in 5.7.7 changelog.

To address a scalability bottleneck for some workloads where
LOCK_grant is locked in read-mode, LOCK_grant locks are now
partitioned. Read lock requests on LOCK_grant now acquire one of
multiple LOCK_grant partitions. Write locks must acquire all
partitions.
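The partitioning scheme the changelog describes can be sketched roughly as follows. This is a simplified standalone illustration, not the actual 5.7.7 code; the partition count and the choice of hashing the thread id to pick a partition are assumptions. Readers spread their rdlocks across N separately cache-line-aligned rwlocks, while a writer acquires every partition in a fixed order.

```cpp
#include <array>
#include <cstddef>
#include <functional>
#include <shared_mutex>
#include <thread>

// A read-mostly lock split into partitions. A reader takes just one
// partition (chosen by thread id), so rdlock traffic is spread over
// kParts cache lines instead of hammering a single one. A writer
// acquires every partition, always in index order so that two
// concurrent writers cannot deadlock.
class PartitionedRWLock {
    static constexpr std::size_t kParts = 8;  // assumed; tunable
    struct alignas(64) Part {                 // one cache line per lock
        std::shared_mutex m;
    };
    std::array<Part, kParts> parts_;

    std::size_t my_part() const {
        return std::hash<std::thread::id>{}(std::this_thread::get_id()) % kParts;
    }

public:
    // Returns the partition index; the caller passes it back to rdunlock.
    std::size_t rdlock() {
        std::size_t p = my_part();
        parts_[p].m.lock_shared();
        return p;
    }
    void rdunlock(std::size_t p) { parts_[p].m.unlock_shared(); }

    void wrlock()   { for (auto& p : parts_) p.m.lock(); }
    void wrunlock() { for (auto& p : parts_) p.m.unlock(); }
};
```

The trade-off matches the workload: read locks (every permission check) touch one line, while write locks (GRANT/REVOKE, which are rare) pay N acquisitions.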