Bug #62018 innodb adaptive hash index mutex contention
Submitted: 28 Jul 2011 15:15 Modified: 16 Jan 2013 17:43
Reporter: Mark Callaghan Email Updates:
Status: Verified Impact on me:
None 
Category:MySQL Server: InnoDB storage engine Severity:S5 (Performance)
Version:5.6.3 OS:Any
Assigned to: Assigned Account CPU Architecture:Any
Tags: innodb, performance

[28 Jul 2011 15:15] Mark Callaghan
Description:
I am testing the 5.6.3 innodb features preview from labs.mysql.com to determine whether smaller page sizes improve the peak page read rate I can get via InnoDB. I have a flash card that can do ~35k reads/second at 16kb, ~50k reads/second at 8kb and ~75k reads/second at 4kb. InnoDB is unable to saturate this card (and benefit from smaller pages) because of mutex contention elsewhere. My workload was sysbench doing single row fetches by primary key with a 16GB buffer pool and ~32GB of data. This has significant buffer pool churn so the adaptive hash index is constantly updated.  At medium concurrency levels (32 to 64 clients) there is significant contention on btr_search_latch. I took several call stack samples with PMP to determine the problem:

In two of the cases for 32 concurrent clients, these were the threads holding btr_search_latch. In the second case ha_remove_all_nodes_to_page is doing heap managment while holding btr_search_latch (innodb_use_sys_malloc=ON). mem_heap_block_free is really hard to understand with so many ifdefs.

ha_remove_all_nodes_to_page,btr_search_drop_page_hash_index,buf_LRU_free_block,buf_LRU_search_and_free_block,buf_LRU_get_free_block,buf_page_init_for_read,buf_read_page_low,buf_read_page,buf_page_get_gen,btr_cur_search_to_nth_level,row_search_for_mysql,ha_innobase::index_read,handler::index_read_idx_map,join_read_const,join_read_const_table,make_join_statistics,JOIN::optimize,mysql_select,handle_select,execute_sqlcom_select,mysql_execute_command,mysql_parse,dispatch_command,do_command,do_handle_one_connection,handle_one_connection,start_thread,clone

 __lll_mutex_lock_wait,_L_mutex_lock_1133,pthread_mutex_lock,os_event_reset,sync_array_reserve_cell,mutex_spin_wait,mem_heap_block_free,ha_remove_all_nodes_to_page,btr_search_drop_page_hash_index,buf_LRU_free_block,buf_LRU_search_and_free_block,buf_LRU_get_free_block,buf_page_init_for_read,buf_read_page_low,buf_read_page,buf_page_get_gen,btr_cur_search_to_nth_level,row_search_for_mysql,ha_innobase::index_read,handler::index_read_idx_map,join_read_const,join_read_const_table,make_join_statistics,JOIN::optimize,mysql_select,handle_select,execute_sqlcom_select,mysql_execute_command,mysql_parse,dispatch_command,do_command,do_handle_one_connection,handle_one_connection,start_thread,clone

How to repeat:
run a workload using fast storage device and determine whether InnoDB can do more than ~30,000 QPS when each query is likely to do a page read. ramfs is sufficient. you don't need flash.

Suggested fix:
reduce contention on btr_search_latch
[28 Jul 2011 15:24] Mark Callaghan
At higher concurrency levels transaction start (open read view) and stop (even for read only) becomes more significant as a bottleneck. So I think the btr_search_latch problem should be fixed first.
[29 Jul 2011 19:16] Mark Callaghan
This is another example of the thread that holds btr_search_latch getting blocked on a mutex while free'ing memory. I will guess that it is mem_heap_block_free in this block calling buf_block_free and blocked on the the buffer_pool_mutex.

I suspect it isn't a good idea to have the adaptive hash memory get returned frequently to the buffer pool.

        if (type == MEM_HEAP_DYNAMIC || len < UNIV_PAGE_SIZE / 2) {

                ut_ad(!buf_block);
                mem_area_free(block, mem_comm_pool);
        } else {
                ut_ad(type & MEM_HEAP_BUFFER);

                buf_block_free(buf_block);
        }

Thread 19 (Thread 1267435840 (LWP 2946)):
#0  0x0000003cc74b8c17 in sched_yield () from /lib64/libc.so.6
#1  0x0000000000870eb9 in mutex_spin_wait (mutex=0xf5d1568, file_name=0xa3ff18 "/s/bld/563inno/storage/innobase/include/buf0buf.ic", line=825) at /s/bld/563inno/storage/innobase/sync/sync0sync.c:537
#2  0x0000000000822761 in mem_heap_block_free (heap=<value optimized out>, block=0x2aabe9058000) at /s/bld/563inno/storage/innobase/include/sync0sync.ic:222
#3  0x00000000008f5810 in ha_remove_all_nodes_to_page (table=0x32, fold=<value optimized out>, page=0x2aae72803000 "R4<AB>C<FF><FF><FF><FF>") at /s/bld/563inno/storage/innobase/ha/ha0ha.c:379
#4  0x00000000008a42da in btr_search_drop_page_hash_index (block=0x2aab053b3070) at /s/bld/563inno/storage/innobase/btr/btr0sea.c:1171
#5  0x00000000008bbcf6 in buf_LRU_free_block (bpage=0x2aab053b3070, zip=1) at /s/bld/563inno/storage/innobase/buf/buf0lru.c:1604
#6  0x00000000008bc7eb in buf_LRU_search_and_free_block (buf_pool=0xf5d1568, n_iterations=<value optimized out>) at /s/bld/563inno/storage/innobase/buf/buf0lru.c:633
#7  0x00000000008bcb54 in buf_LRU_get_free_block (buf_pool=0xf5d1568) at /s/bld/563inno/storage/innobase/buf/buf0lru.c:900
#8  0x00000000008ab16e in buf_block_alloc (buf_pool=0x32) at /s/bld/563inno/storage/innobase/buf/buf0buf.c:442
#9  0x00000000008a3207 in btr_search_check_free_space_in_heap () at /s/bld/563inno/storage/innobase/btr/btr0sea.c:154
#10 0x00000000008a8751 in btr_search_info_update_slow (info=0x2aaf7401ddf0, cursor=0x2aaf73022b88) at /s/bld/563inno/storage/innobase/btr/btr0sea.c:628
#11 0x000000000089b5b1 in btr_cur_search_to_nth_level (index=0x2aaf7401dae8, level=0, tuple=0x2aaf73027248, mode=2, latch_mode=<value optimized out>, cursor=0x2aaf73022b88, has_search_latch=0, file=0xa4c740 "/s/bld/563
inno/storage/innobase/row/row0sel.c", line=4023, mtr=0x4b8b6030) at /s/bld/563inno/storage/innobase/include/btr0sea.ic:83
#12 0x0000000000860178 in row_search_for_mysql (buf=0x2aaf73022730 "", mode=2, prebuilt=0x2aaf73026e58, match_mode=1, direction=0) at /s/bld/563inno/storage/innobase/include/btr0pcur.ic:516
#13 0x00000000007f2955 in ha_innobase::index_read (this=0x2aaf73022340, buf=0x2aaf73022730 "", key_ptr=<value optimized out>, key_len=<value optimized out>, find_flag=<value optimized out>) at /s/bld/563inno/storage/in
nobase/handler/ha_innodb.cc:6245
#14 0x000000000060f699 in Sql_cmd_handler_read::execute (this=0x2aaf72fe3a00, thd=0x2aaf900dc9c0) at /s/bld/563inno/sql/sql_handler.cc:762
#15 0x0000000000629a7e in mysql_execute_command (thd=0x2aaf900dc9c0) at /s/bld/563inno/sql/sql_parse.cc:4548
#16 0x000000000062debb in mysql_parse (thd=0x2aaf900dc9c0, rawbuf=<value optimized out>, length=<value optimized out>, parser_state=0x4b8b7e00) at /s/bld/563inno/sql/sql_parse.cc:5741
#17 0x000000000062f507 in dispatch_command (command=COM_QUERY, thd=0x2aaf900dc9c0, packet=<value optimized out>, packet_length=42) at /s/bld/563inno/sql/sql_parse.cc:1181
#18 0x000000000062f8b4 in do_command (thd=0x2aaf900dc9c0) at /s/bld/563inno/sql/sql_parse.cc:903
#19 0x00000000006077bf in do_handle_one_connection (thd_arg=0x2aaf900dc9c0) at /s/bld/563inno/sql/sql_connect.cc:787
#20 0x0000000000607869 in handle_one_connection (arg=0x32) at /s/bld/563inno/sql/sql_connect.cc:706
#21 0x0000003cc7c062f7 in start_thread () from /lib64/libpthread.so.0
#22 0x0000003cc74d1e3d in clone () from /lib64/libc.so.6
[24 Dec 2013 15:55] Sveta Smirnova
Bug #68961 was marked as duplicate of this one. Need to check Bug #68961 after this bug is fixed.
[15 May 2015 12:52] Laurynas Biveinis
Bug 62018 fix for 5.7.7

(*) I confirm the code being submitted is offered under the terms of the OCA, and that I am authorized to contribute it.

Contribution: bug62018.patch (application/octet-stream, text), 72.78 KiB.

[15 May 2015 12:53] Laurynas Biveinis
Patch author Alexey Kopytov
[16 May 2015 8:05] Laurynas Biveinis
Bug 62018 fix for 5.7.7, v2

(*) I confirm the code being submitted is offered under the terms of the OCA, and that I am authorized to contribute it.

Contribution: bug62018.patch (application/octet-stream, text), 72.96 KiB.

[16 May 2015 8:18] Laurynas Biveinis
The previous submission had a draft version of the patch uploaded by mistake, thus re-uploaded. The difference is a few obvious typos and warning fixes.
[13 Mar 2017 16:27] Dimitri KRAVTCHUK
A similar solution to proposed by Percona was implemented in MySQL 5.7 (AHI parts). 
however, both 5.7 and Percona solutions are only in a half-step from real fixing of this AHI problem..
a true fix is yet in pipe..

Rgds,
-Dimitri