MySQL Bugs: #109214: AHI cannot be created at the secondary index due to the optimizer estimations

Bug #109214	AHI cannot be created at the secondary index due to the optimizer estimations
Submitted:	27 Nov 2022 15:48	Modified:	29 Nov 2022 3:48
Reporter:	Zike Wang	Email Updates:
Status:	Not a Bug	Impact on me:	None
Category:	MySQL Server: InnoDB storage engine	Severity:	S5 (Performance)
Version:	5.7.40	OS:	CentOS (3.10.0)
Assigned to:		CPU Architecture:	x86 (32 cores)
Tags:	performance

Description:
When we tested the performance improvement of AHI using sysbench 1.1.0 select_random_points in MySQL 5.7.40, we found that hash indexes could not be created for non-unique secondary indexes, which prevented btr search from being accelerated with AHI.

We use a database containing a table of 1 million rows and test a simple query select c from sbtest1 where k in (...) with the buffer pool fully warmed up. The table contains a clustered index and a non-unique secondary index k_1 on column k.

We used 64 threads for performance testing, expecting AHI to be created on both secondary index k_1 and clustered index, but to our surprise, looking at the status of the hash tables with the show engine innodb status command, we found that only the clustered index had AHI created and used for queries.

So we carefully tracked the update process of btr search info on the secondary index k_1. We know that when we create a hash index for a certain query condition on the index, we need to satisfy a certain query mode <fields, bytes, left_side> that occurs consecutively for a certain number of times. InnoDB uses btr_cur_search_to_nth_level to locate leaf nodes in B+ Tree, and then updates btr search info, so the search mode in btr_cur_search_to_nth_level will affect AHI statistics.

For the preceding query statement, btr_cur_search_to_nth_level will be called four times, three of which occur on the secondary index k_1 and one on the clustered index. Let's analyze the first three processes. First, the optimizer estimates the row range, calls the InnoDB interface and uses PAGE_CUR_GE and PAGE_CUR_G search mode to locate, and then searches for records in InnoDB using PAGE_CUR_GE mode to locate in btr_cur_search_to_nth_level. From the perspective of AHI, left_side has changed three times. It is therefore not possible to reach the threshold for establishing an AHI.

The root cause of this problem is that the update of btr search info cannot identify whether the query condition comes from the semantics of the user's query statement or from other places. There are many callers of btr_cur_search_to_nth_level, such as in this example, from multi range analysis, and the corresponding function is btr_estimate_n_rows_in_range.

How to repeat:
As above.

Suggested fix:
We observed that there are two interactions with AHI in btr_cur_search_to_nth_level, one is query via btr_search_guess_on_hash, and the other is update via btr_search_info_update. When using AHI for query, strict conditions need to be met, one of which excludes calls from optimizer estimation.

Inspired by this condition, we can also add the same judgment to the update condition to filter non-user query condition semantics. This patch implements the idea described above.

diff --git a/storage/innobase/btr/btr0cur.cc b/storage/innobase/btr/btr0cur.cc
index da62273..bfabf5d 100644
--- a/storage/innobase/btr/btr0cur.cc
+++ b/storage/innobase/btr/btr0cur.cc
@@ -1914,7 +1914,7 @@ need_opposite_intention:
                will properly check btr_search_enabled again in
                btr_search_build_page_hash_index() before building a
                page hash index, while holding search latch. */
-               if (btr_search_enabled && !index->disable_ahi) {
+               if (btr_search_enabled && !index->disable_ahi && !estimate) {
                        btr_search_info_update(index, cursor);
                }
                ut_ad(cursor->up_match != ULINT_UNDEFINED

It should be noted that in order to make the AHI adaptive process more accurately match the SQL's real query semantics, more filter conditions should be added here. This patch only addresses the issues that arise in the preceding scenarios.

Hi Mr. Wang,

Thank you for your bug report.

However, AHI can be created only on the primary index. That is how this algorithm has been designed and how it is implemented in the InnoDB storage engine, since 2001. In short, that is how it will stay.

Clustered index is more then enough, since AHI is the best algorithm for query cacheing and not for speeding up the index searches.

Not a bug.

Thank you very much for the verification and reply.

The current AHI implementation, by default, is enabled for all indexes and is controlled by only one switch. Therefore, after AHI is enabled, btr search info will be maintained on each index, and btr search tables will also be maintained if the pattern is satisfied in the secondary index.

According to the reply, if we only consider the effect on the clustered index in the design goal, should we add an additional switch to only enable for clustered index to reduce the maintenance cost and memory overhead on the secondary indexes. We have done some verification and it is effective.