Bug #100512 adaptive hash index(AHI) building causing contention on btr_search_latches
Submitted: 13 Aug 9:17 Modified: 25 Aug 6:38
Reporter: Zhou Xinjing from CDB Team of Tencent (OCA) Email Updates:
Status: Verified Impact on me:
None 
Category:MySQL Server: InnoDB storage engine Severity:S5 (Performance)
Version:5.7.31,8.0.21 OS:Linux (3.10.107)
Assigned to: CPU Architecture:x86 (96 cores)
Tags: performance

[13 Aug 9:17] Zhou Xinjing from CDB Team of Tencent
Description:
We were testing MySQL 5.7.31 with sysbench 1.0 oltp_read_only to find out how much the adaptive hash index is helping with performance. We found that when turning on the adaptive hash index and with high concurrency, there is big contention on the btr_search_latches for building hash index for btree pages. 

The database we used contains a single 500M-rows sysbench table(~100GB). The buffer pool is configured with 200GB. We first pre-warm the buffer pool by executing the sysbench 1.0 oltp_read_only workload with 256 threads against the table until it is fully cached. Then we turn on the innodb_adaptive_hash_index option and watch how the system perform. 
To our surprise, the QPS starts normal(similar to the case without AHI) and quickly drops to bottom(2M qps => 80K qps). And slowly the performance increases after 400s and reaches the original ~2M after 650s.

So we took a stacktrace of the system using pt-pmp. The hotspot stacktrace is as follows:

__lll_lock_wait (/lib64/libpthread.so.0), _L_lock_812 (/lib64/libpthread.so.0), pthread_mutex_lock (from /lib64/libpthread.so.0), enter (storage/innobase/include/sync0types.h:483), reset (storage/innobase/include/sync0types.h:483), os_event_reset (storage/innobase/os/os0event.cc:576), sync_array_reserve_cell (storage/innobase/sync/sync0arr.cc:397), sync_array_get_and_reserve_cell(storage/innobase/include/sync0arr.ic:84), rw_lock_x_lock_func(storage/innobase/sync/sync0rw.cc:787), pfs_rw_lock_x_lock_func (storage/innobase/include/sync0rw.ic:719), btr_search_x_lock (storage/innobase/include/btr0sea.ic:98), btr_search_build_page_hash_index (storage/innobase/btr/btr0sea.cc:1565), btr_search_info_update_slow (storage/innobase/btr/btr0sea.cc:775), btr_search_info_update(storage/innobase/include/btr0sea.ic:89), btr_cur_search_to_nth_level (storage/innobase/btr/btr0cur.cc:1947), btr_pcur_open_with_no_init_func (storage/innobase/include/btr0pcur.ic:530), row_search_mvcc (storage/innobase/row/row0sel.cc:5167), ha_innobase::index_read (storage/innobase/handler/ha_innodb.cc:8766) ...

So basically the threads on btree search code path are competing for on the btr_search_latches because they decide to build hash index for the page just searched. Since this is a exclusive latch, only one of the thread is allowed to be operating on the AHI hash table while the rest are just waiting and wasting CPU cycles. 

The performance trend is visualized in this picture: 
https://4mrnuw.sn.files.1drv.com/y4mEdb6OmvQnLmooeAYXXKOq2SfMlvOIevqwK8ZP8Ntg8t57BIHf71I2A...

How to repeat:
As above.

Suggested fix:
We observe that in btr_cur_search_to_nth_level, only when the btr_search_latches is not x-latched does the system consider performing an adaptive hash index search. (storage/innobase/btr/btr0cur.cc:939) Because the benefit of hash search might be offset by the time it takes to wait for the writer to release the latch. We might be better off using the btree search instead. 

Inspired by this, we can actually apply the same idea to the contention problem described above. In btr_search_info_update_slow, if the reader decides to build the hash index for the page, we postpone the building if the btr_seach_latches is x-latched by a writer, thereby avoiding the contention on this critical search path. This make senses as we will be better off falling back to the btree search instead of busy waiting for the x-latch to be released. And the hash index will eventually be built if the page is frequently accessed latter(which is worth the effort).
[13 Aug 9:19] Zhou Xinjing from CDB Team of Tencent
We implemented the suggested fix described above. 

Here is the visualization after the fix:

https://4crnuw.sn.files.1drv.com/y4meIDK_67rBRqTVk6mPEoFJwVEWgJgTCdxX2wKavPNbJCziNTao-J4aK...
[13 Aug 9:23] Zhou Xinjing from CDB Team of Tencent
This patch implements the idea described above

Attachment: ahi_build_contention_avoidance.patch (application/octet-stream, text), 870 bytes.

[13 Aug 12:40] MySQL Verification Team
Hi Mr. Zhou Xinjing from CDB Team of Tencent,

Thank you for your performance improvement request.

What you experience is very useful to us. We do need some further feedback from you. First of all, do you get the same result with the latest 8.0 ????

Next, we are very grateful for the patch, but we can't accept it until you sign our OCA agreement. Hence, please read carefully the next comment.
[13 Aug 12:40] MySQL Verification Team
Thank you very much for your patch contribution, we appreciate it!

In order for us to continue the process of reviewing your contribution to MySQL, please send us a signed copy of the Oracle Contributor Agreement (OCA) as outlined in http://www.oracle.com/technetwork/community/oca-486395.html

Signing an OCA needs to be done only once and it's valid for all other Oracle governed Open Source projects as well.

Getting a signed/approved OCA on file will help us facilitate your contribution - this one, and others in the future.  

Please let me know, if you have any questions.

Thank you for your interest in MySQL.
[13 Aug 14:20] Zhou Xinjing from CDB Team of Tencent
We will verify if it is affecting latest 8.0 branch as well shortly.
[13 Aug 14:38] MySQL Verification Team
Hi Mr. Zhou Xinjing from CDB Team of Tencent,

I am checking regarding your Team's OCA.

Thanks in advance regarding 8.0.
[14 Aug 8:40] Zhou Xinjing from CDB Team of Tencent
Update on repeating this problem on the latest 8.0.21 branch(from GitHub):

Our machine is running Linux kernel 3.10.107 which is a relatively old one. So we encountered the same scalability issue related to ppoll described here: https://bugs.mysql.com/bug.php?id=98284

In order to remove this bottleneck so that we can observe the AHI's behavior more clearly, we modified the source code to replace ppoll with poll. The system then works expectedly after the modification.

We then conducted the same experiment on the modified version. We observed similar contention on btr_search_latches. So we think the problem is affecting both 5.7 and 8.0.
We then applied the patch and it works as well.
[14 Aug 8:43] Zhou Xinjing from CDB Team of Tencent
Visualization of the experiment on 8.0.21

Attachment: 8.0.21, oltp_read_only, 200GB BP, 100GB table, 256 users.pdf (application/pdf, text), 62.49 KiB.

[14 Aug 8:45] Zhou Xinjing from CDB Team of Tencent
Visualization of the experiment on 8.0.21.

Attachment: vis.pdf (application/pdf, text), 62.49 KiB.

[14 Aug 13:16] MySQL Verification Team
Thank you Mr.  Zhou Xinjing from CDB Team of Tencent,

Now we know that 8.0 version is affected too.

We also know that your patch works for 8.0 as well.

We are very grateful.
[24 Aug 12:59] MySQL Verification Team
Hi Mr. Zhou Xinjing from CDB Team of Tencent,

We have just been informed that you have not been listed as contributing reporter in your company. Hence, this is a problem that is internal to your company. This is also a reason why our records do not show that you are an OCA contributor.

We are eagerly waiting that this problem is resolved on your side.

Thanks in advance.
[25 Aug 6:38] Zhou Xinjing from CDB Team of Tencent
Just got an update from my team that I am now covered by the Company OCA. I think the issue is resolved.
[25 Aug 12:13] MySQL Verification Team
Thank you.

That solves this issue.