| Bug #115843 | The incorrect calculation of 'select_limit' led to an incorrect index selection | ||
|---|---|---|---|
| Submitted: | 15 Aug 2024 8:26 | Modified: | 21 Aug 2024 8:04 |
| Reporter: | ksql- team | Email Updates: | |
| Status: | Verified | Impact on me: | |
| Category: | MySQL Server: Optimizer | Severity: | S2 (Serious) |
| Version: | 8.0.32, 8.0.39 | OS: | Any |
| Assigned to: | CPU Architecture: | Any | |
| Tags: | Optimizer bug, order by | ||
[16 Aug 2024 7:56]
MySQL Verification Team
Hello! Thank you for the report and detailed steps to reproduce. Verified as described. Sincerely, Umesh
[16 Aug 2024 8:16]
MySQL Verification Team
8.4.2 seems to be Ok. bin/mysql -uroot -S/tmp/mysql.sock Welcome to the MySQL monitor. Commands end with ; or \g. Your MySQL connection id is 7 Server version: 8.4.2 MySQL Community Server - GPL Copyright (c) 2000, 2024, Oracle and/or its affiliates. Oracle is a registered trademark of Oracle Corporation and/or its affiliates. Other names may be trademarks of their respective owners. Type 'help;' or '\h' for help. Type '\c' to clear the current input statement. mysql> create database test; Query OK, 1 row affected (0.00 sec) mysql> use test Database changed mysql> explain SELECT * FROM test1 WHERE a = 1 ORDER BY c DESC LIMIT 1; +----+-------------+-------+------------+-------+-------------------+---------+---------+------+------+----------+--------------------------------------------+ | id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra | +----+-------------+-------+------------+-------+-------------------+---------+---------+------+------+----------+--------------------------------------------+ | 1 | SIMPLE | test1 | NULL | range | idx_a_b_c,idx_a_c | idx_a_c | 5 | NULL | 1000 | 100.00 | Using index condition; Backward index scan | +----+-------------+-------+------------+-------+-------------------+---------+---------+------+------+----------+--------------------------------------------+ 1 row in set, 1 warning (0.00 sec) mysql> SELECT * FROM test1 WHERE a = 1 ORDER BY c DESC LIMIT 1; +------+------+------+------+------+ | id | a | b | c | d | +------+------+------+------+------+ | 1000 | 1 | 1 | 1 | 1 | +------+------+------+------+------+ 1 row in set (0.00 sec) mysql> explain SELECT * FROM test2 WHERE a = 1 ORDER BY c DESC LIMIT 1; +----+-------------+-------+------------+-------+-------------------+---------+---------+------+------+----------+--------------------------------------------+ | id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra | +----+-------------+-------+------------+-------+-------------------+---------+---------+------+------+----------+--------------------------------------------+ | 1 | SIMPLE | test2 | NULL | range | idx_a_b_c,idx_a_c | idx_a_c | 5 | NULL | 1000 | 100.00 | Using index condition; Backward index scan | +----+-------------+-------+------------+-------+-------------------+---------+---------+------+------+----------+--------------------------------------------+ 1 row in set, 1 warning (0.01 sec) mysql> SELECT * FROM test2 WHERE a = 1 ORDER BY c DESC LIMIT 1; +------+------+------+------+------+ | id | a | b | c | d | +------+------+------+------+------+ | 1000 | 1 | 1 | 1 | 1 | +------+------+------+------+------+ 1 row in set (0.00 sec) mysql>
[21 Aug 2024 8:04]
ksql- team
Hello! Thank you for your prompt response and for testing the MySQL 8.4.2 version. However, I have encountered a discrepancy that I would like to bring to your attention. I downloaded the MySQL source code from the official GitHub repository and checked out the mysql-8.4.2 tag. After compiling the source code in my local environment on CentOS 7 with GCC version 11.2.1, I attempted to reproduce the steps and queries you provided in your response. Unfortunately, the results I obtained were not consistent with what you observed. Specifically, the issue I initially reported still appears to be present in 8.4.2. To better understand the situation, could you please provide details on the testing environment and the compilation options you used during your testing? This information would be greatly helpful for me to align my setup and further investigate the issue. Thank you again for your assistance, and I look forward to your guidance. Welcome to the MySQL monitor. Commands end with ; or \g. Your MySQL connection id is 8 Server version: 8.4.2-debug Source distribution Copyright (c) 2000, 2024, Oracle and/or its affiliates. Oracle is a registered trademark of Oracle Corporation and/or its affiliates. Other names may be trademarks of their respective owners. Type 'help;' or '\h' for help. Type '\c' to clear the current input statement. mysql> use test; Reading table information for completion of table and column names You can turn off this feature to get a quicker startup with -A Database changed mysql> explain SELECT * FROM test1 WHERE a = 1 ORDER BY c DESC LIMIT 1; +----+-------------+-------+------------+-------+-------------------+-------+---------+------+------+----------+----------------------------------+ | id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra | +----+-------------+-------+------------+-------+-------------------+-------+---------+------+------+----------+----------------------------------+ | 1 | SIMPLE | test1 | NULL | index | idx_a_b_c,idx_a_c | idx_c | 5 | NULL | 99 | 1.01 | Using where; Backward index scan | +----+-------------+-------+------------+-------+-------------------+-------+---------+------+------+----------+----------------------------------+ 1 row in set, 1 warning (0.00 sec) mysql> SELECT * FROM test1 WHERE a = 1 ORDER BY c DESC LIMIT 1; +------+------+------+------+------+ | id | a | b | c | d | +------+------+------+------+------+ | 1000 | 1 | 1 | 1 | 1 | +------+------+------+------+------+ 1 row in set (25.36 sec) mysql> explain SELECT * FROM test2 WHERE a = 1 ORDER BY c DESC LIMIT 1; +----+-------------+-------+------------+-------+-------------------+---------+---------+------+------+----------+----------------------------------+ | id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra | +----+-------------+-------+------------+-------+-------------------+---------+---------+------+------+----------+----------------------------------+ | 1 | SIMPLE | test2 | NULL | range | idx_a_b_c,idx_a_c | idx_a_c | 5 | NULL | 1000 | 100.00 | Using where; Backward index scan | +----+-------------+-------+------------+-------+-------------------+---------+---------+------+------+----------+----------------------------------+ 1 row in set, 1 warning (0.01 sec) mysql> SELECT * FROM test2 WHERE a = 1 ORDER BY c DESC LIMIT 1; +------+------+------+------+------+ | id | a | b | c | d | +------+------+------+------+------+ | 1000 | 1 | 1 | 1 | 1 | +------+------+------+------+------+ 1 row in set (0.02 sec)

Description: When considering the ORDER clause, the optimizer will call the function test_if_cheaper_ordering to loop through all indexes to find the one with the lowest cost. The key variable in cost calculation is "select_limit". However, since "select_limit" isn't reset with each loop iteration, its value gradually inflates. This inflation causes the "select_limit" for later indexes to be incorrectly amplified, leading to the selection of the wrong index. source code: bool test_if_cheaper_ordering(...) { ... for (nr = 0; nr < table->s->keys; nr++) { ... if (select_limit > refkey_rows_estimate) select_limit = table_records; else if (table_records >= refkey_rows_estimate) select_limit = (ha_rows)(select_limit * (double)table_records / refkey_rows_estimate); ... } A simple example: We create two tables, with the only difference being the order of the indexes. In table test1, the index idx_c is second, while in table test2, the index idx_c is last. We then insert the same data into both tables. CREATE TABLE `test1` ( `id` int NOT NULL AUTO_INCREMENT, `a` int DEFAULT NULL, `b` int DEFAULT NULL, `c` int DEFAULT NULL, `d` int DEFAULT NULL, PRIMARY KEY (`id`), KEY `idx_c` (`c`), KEY `idx_a_b_c` (`a`,`b`,`c`), KEY `idx_a_c` (`a`,`c`) ) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_unicode_ci CREATE TABLE `test2` ( `id` int NOT NULL AUTO_INCREMENT, `a` int DEFAULT NULL, `b` int DEFAULT NULL, `c` int DEFAULT NULL, `d` int DEFAULT NULL, PRIMARY KEY (`id`), KEY `idx_a_b_c` (`a`,`b`,`c`), KEY `idx_a_c` (`a`,`c`), KEY `idx_c` (`c`) ) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_unicode_ci Execute the same query on both tables. table test1: mysql> explain SELECT * FROM test1 WHERE a = 1 ORDER BY c DESC LIMIT 1; +----+-------------+-------+------------+-------+-------------------+-------+---------+------+------+----------+----------------------------------+ | id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra | +----+-------------+-------+------------+-------+-------------------+-------+---------+------+------+----------+----------------------------------+ | 1 | SIMPLE | test1 | NULL | index | idx_a_b_c,idx_a_c | idx_c | 5 | NULL | 100 | 1.00 | Using where; Backward index scan | +----+-------------+-------+------------+-------+-------------------+-------+---------+------+------+----------+----------------------------------+ 1 row in set, 1 warning (0.00 sec) table test2: mysql> SELECT * FROM test1 WHERE a = 1 ORDER BY c DESC LIMIT 1; +------+------+------+------+------+ | id | a | b | c | d | +------+------+------+------+------+ | 1000 | 1 | 1 | 1 | 1 | +------+------+------+------+------+ 1 row in set (28.46 sec) mysql> explain SELECT * FROM test2 WHERE a = 1 ORDER BY c DESC LIMIT 1; +----+-------------+-------+------------+-------+-------------------+---------+---------+------+------+----------+----------------------------------+ | id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra | +----+-------------+-------+------------+-------+-------------------+---------+---------+------+------+----------+----------------------------------+ | 1 | SIMPLE | test2 | NULL | range | idx_a_b_c,idx_a_c | idx_a_c | 5 | NULL | 1000 | 100.00 | Using where; Backward index scan | +----+-------------+-------+------------+-------+-------------------+---------+---------+------+------+----------+----------------------------------+ 1 row in set, 1 warning (0.01 sec) mysql> SELECT * FROM test2 WHERE a = 1 ORDER BY c DESC LIMIT 1; +------+------+------+------+------+ | id | a | b | c | d | +------+------+------+------+------+ | 1000 | 1 | 1 | 1 | 1 | +------+------+------+------+------+ 1 row in set (0.01 sec) In table test1, the optimizer incorrectly selected the index idx_c, causing the query time to increase by several thousand times! The issue arises because idx_c is ordered before idx_a_c in table test1, so the "select_limit" calculation for idx_c carries over into subsequent iterations. This incorrectly inflates the cost calculation for idx_a_c, leading to the wrong index selection. How to repeat: We provided a Python script to repeat the issue.You only need to modify the corresponding MySQL credentials. import mysql.connector from mysql.connector import Error def create_connection(host_name, user_name, user_password, db_name, port): connection = None try: connection = mysql.connector.connect( host=host_name, user=user_name, passwd=user_password, database=db_name, port=port ) print("MySQL Database connection successful") except Error as e: print(f"The error '{e}' occurred") return connection def execute_query(connection, query, description): cursor = connection.cursor() try: cursor.execute(query) connection.commit() print(f"{description} created successfully") except Error as e: print(f"The error '{e}' occurred") def execute_query_with_data(connection, query, data): cursor = connection.cursor() try: cursor.executemany(query, data) connection.commit() except Error as e: print(f"The error '{e}' occurred") # Replace these variables with your MySQL credentials host_name = "<your_host>" user_name = "<your_username>" user_password = "<your_password>" db_name = "<your_database>" port = <your_port> # Create a connection to the database connection = create_connection(host_name, user_name, user_password, db_name, port) # SQL query to create the first table test1 create_table_query_1 = """ CREATE TABLE `test1` ( `id` int NOT NULL AUTO_INCREMENT, `a` int DEFAULT NULL, `b` int DEFAULT NULL, `c` int DEFAULT NULL, `d` int DEFAULT NULL, PRIMARY KEY (`id`), KEY `idx_c` (`c`), KEY `idx_a_b_c` (`a`,`b`,`c`), KEY `idx_a_c` (`a`,`c`) ) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_unicode_ci """ # SQL query to create the second table test2 with reversed index order create_table_query_2 = """ CREATE TABLE `test2` ( `id` int NOT NULL AUTO_INCREMENT, `a` int DEFAULT NULL, `b` int DEFAULT NULL, `c` int DEFAULT NULL, `d` int DEFAULT NULL, PRIMARY KEY (`id`), KEY `idx_a_b_c` (`a`,`b`,`c`), KEY `idx_a_c` (`a`,`c`), KEY `idx_c` (`c`) ) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_unicode_ci """ # Execute the query to create the tables with custom descriptions execute_query(connection, create_table_query_1, "table test1") execute_query(connection, create_table_query_2, "table test2") # SQL query to insert data into the tables insert_data_query = """ INSERT INTO {} (a, b, c, d) VALUES (%s, %s, %s, %s) """ # Data generation logic user_data = [] print("Data insertion process started...") for i in range(1, 100001): value = (i - 1) // 1000 + 1 # The same value for a, b, c, d user_data.append((value, value, value, value)) # Insert in batches of 10,000 to avoid memory issues if i % 10000 == 0: execute_query_with_data(connection, insert_data_query.format('test1'), user_data) execute_query_with_data(connection, insert_data_query.format('test2'), user_data) user_data = [] # Insert any remaining data if user_data: execute_query_with_data(connection, insert_data_query.format('test1'), user_data) execute_query_with_data(connection, insert_data_query.format('test2'), user_data) print("Data inserted successfully") # Close the connection connection.close() Suggested fix: Store the original value, and then replace select_limit with this value at the start of each loop. eg. bool test_if_cheaper_ordering(...) { ... ha_rows origin_select_limit = select_limit; ... for (nr = 0; nr < table->s->keys; nr++) { select_limit = origin_select_limit; ... }