Bug #115843 The incorrect calculation of 'select_limit' led to an incorrect index selection
Submitted: 15 Aug 2024 8:26 Modified: 21 Aug 2024 8:04
Reporter: ksql- team Email Updates:
Status: Verified Impact on me:
None 
Category:MySQL Server: Optimizer Severity:S2 (Serious)
Version:8.0.32, 8.0.39 OS:Any
Assigned to: CPU Architecture:Any
Tags: Optimizer bug, order by

[15 Aug 2024 8:26] ksql- team
Description:
When considering the ORDER clause, the optimizer will call the function test_if_cheaper_ordering to loop through all indexes to find the one with the lowest cost.

The key variable in cost calculation is "select_limit". However, since "select_limit" isn't reset with each loop iteration, its value gradually inflates. This inflation causes the "select_limit" for later indexes to be incorrectly amplified, leading to the selection of the wrong index.

source code:

bool test_if_cheaper_ordering(...) {
  ...

  for (nr = 0; nr < table->s->keys; nr++) {
      ...

      if (select_limit > refkey_rows_estimate)
          select_limit = table_records;
      else if (table_records >= refkey_rows_estimate)
          select_limit = (ha_rows)(select_limit * (double)table_records /
                                   refkey_rows_estimate);
      ...
}

A simple example:

We create two tables, with the only difference being the order of the indexes. In table test1, the index idx_c is second, while in table test2, the index idx_c is last. We then insert the same data into both tables.

CREATE TABLE `test1` (
  `id` int NOT NULL AUTO_INCREMENT,
  `a` int DEFAULT NULL,
  `b` int DEFAULT NULL,
  `c` int DEFAULT NULL,
  `d` int DEFAULT NULL,
  PRIMARY KEY (`id`),
  KEY `idx_c` (`c`),
  KEY `idx_a_b_c` (`a`,`b`,`c`),
  KEY `idx_a_c` (`a`,`c`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_unicode_ci

CREATE TABLE `test2` (
  `id` int NOT NULL AUTO_INCREMENT,
  `a` int DEFAULT NULL,
  `b` int DEFAULT NULL,
  `c` int DEFAULT NULL,
  `d` int DEFAULT NULL,
  PRIMARY KEY (`id`),
  KEY `idx_a_b_c` (`a`,`b`,`c`),
  KEY `idx_a_c` (`a`,`c`),
  KEY `idx_c` (`c`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_unicode_ci

Execute the same query on both tables.

table test1:

mysql> explain SELECT * FROM test1 WHERE a = 1 ORDER BY c DESC LIMIT 1;
+----+-------------+-------+------------+-------+-------------------+-------+---------+------+------+----------+----------------------------------+
| id | select_type | table | partitions | type  | possible_keys     | key   | key_len | ref  | rows | filtered | Extra                            |
+----+-------------+-------+------------+-------+-------------------+-------+---------+------+------+----------+----------------------------------+
|  1 | SIMPLE      | test1 | NULL       | index | idx_a_b_c,idx_a_c | idx_c | 5       | NULL |  100 |     1.00 | Using where; Backward index scan |
+----+-------------+-------+------------+-------+-------------------+-------+---------+------+------+----------+----------------------------------+
1 row in set, 1 warning (0.00 sec)

table test2:

mysql> SELECT * FROM test1 WHERE a = 1 ORDER BY c DESC LIMIT 1;
+------+------+------+------+------+
| id   | a    | b    | c    | d    |
+------+------+------+------+------+
| 1000 |    1 |    1 |    1 |    1 |
+------+------+------+------+------+
1 row in set (28.46 sec)

mysql> explain SELECT * FROM test2 WHERE a = 1 ORDER BY c DESC LIMIT 1;
+----+-------------+-------+------------+-------+-------------------+---------+---------+------+------+----------+----------------------------------+
| id | select_type | table | partitions | type  | possible_keys     | key     | key_len | ref  | rows | filtered | Extra                            |
+----+-------------+-------+------------+-------+-------------------+---------+---------+------+------+----------+----------------------------------+
|  1 | SIMPLE      | test2 | NULL       | range | idx_a_b_c,idx_a_c | idx_a_c | 5       | NULL | 1000 |   100.00 | Using where; Backward index scan |
+----+-------------+-------+------------+-------+-------------------+---------+---------+------+------+----------+----------------------------------+
1 row in set, 1 warning (0.01 sec)

mysql> SELECT * FROM test2 WHERE a = 1 ORDER BY c DESC LIMIT 1;
+------+------+------+------+------+
| id   | a    | b    | c    | d    |
+------+------+------+------+------+
| 1000 |    1 |    1 |    1 |    1 |
+------+------+------+------+------+
1 row in set (0.01 sec)

In table test1, the optimizer incorrectly selected the index idx_c, causing the query time to increase by several thousand times! The issue arises because idx_c is ordered before idx_a_c in table test1, so the "select_limit" calculation for idx_c carries over into subsequent iterations. This incorrectly inflates the cost calculation for idx_a_c, leading to the wrong index selection.

How to repeat:
We provided a Python script to repeat the issue.You only need to modify the corresponding MySQL credentials.

import mysql.connector
from mysql.connector import Error

def create_connection(host_name, user_name, user_password, db_name, port):
    connection = None
    try:
        connection = mysql.connector.connect(
            host=host_name,
            user=user_name,
            passwd=user_password,
            database=db_name,
            port=port
        )
        print("MySQL Database connection successful")
    except Error as e:
        print(f"The error '{e}' occurred")
    
    return connection

def execute_query(connection, query, description):
    cursor = connection.cursor()
    try:
        cursor.execute(query)
        connection.commit()
        print(f"{description} created successfully")
    except Error as e:
        print(f"The error '{e}' occurred")

def execute_query_with_data(connection, query, data):
    cursor = connection.cursor()
    try:
        cursor.executemany(query, data)
        connection.commit()
    except Error as e:
        print(f"The error '{e}' occurred")

# Replace these variables with your MySQL credentials
host_name = "<your_host>"
user_name = "<your_username>"
user_password = "<your_password>"
db_name = "<your_database>"
port = <your_port> 

# Create a connection to the database
connection = create_connection(host_name, user_name, user_password, db_name, port)

# SQL query to create the first table test1
create_table_query_1 = """
CREATE TABLE `test1` (
  `id` int NOT NULL AUTO_INCREMENT,
  `a` int DEFAULT NULL,
  `b` int DEFAULT NULL,
  `c` int DEFAULT NULL,
  `d` int DEFAULT NULL,
  PRIMARY KEY (`id`),
  KEY `idx_c` (`c`),
  KEY `idx_a_b_c` (`a`,`b`,`c`),
  KEY `idx_a_c` (`a`,`c`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_unicode_ci
"""

# SQL query to create the second table test2 with reversed index order
create_table_query_2 = """
CREATE TABLE `test2` (
  `id` int NOT NULL AUTO_INCREMENT,
  `a` int DEFAULT NULL,
  `b` int DEFAULT NULL,
  `c` int DEFAULT NULL,
  `d` int DEFAULT NULL,
  PRIMARY KEY (`id`),
  KEY `idx_a_b_c` (`a`,`b`,`c`),
  KEY `idx_a_c` (`a`,`c`),
  KEY `idx_c` (`c`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_unicode_ci
"""

# Execute the query to create the tables with custom descriptions
execute_query(connection, create_table_query_1, "table test1")
execute_query(connection, create_table_query_2, "table test2")

# SQL query to insert data into the tables
insert_data_query = """
INSERT INTO {} (a, b, c, d) VALUES (%s, %s, %s, %s)
"""

# Data generation logic
user_data = []
print("Data insertion process started...")
for i in range(1, 100001):
    value = (i - 1) // 1000 + 1  # The same value for a, b, c, d
    user_data.append((value, value, value, value))
    
    # Insert in batches of 10,000 to avoid memory issues
    if i % 10000 == 0:
        execute_query_with_data(connection, insert_data_query.format('test1'), user_data)
        execute_query_with_data(connection, insert_data_query.format('test2'), user_data)
        user_data = []

# Insert any remaining data
if user_data:
    execute_query_with_data(connection, insert_data_query.format('test1'), user_data)
    execute_query_with_data(connection, insert_data_query.format('test2'), user_data)

print("Data inserted successfully")

# Close the connection
connection.close()

Suggested fix:
Store the original value, and then replace select_limit with this value at the start of each loop.

eg.

bool test_if_cheaper_ordering(...) {
  ...

  ha_rows origin_select_limit = select_limit;
  ...

  for (nr = 0; nr < table->s->keys; nr++) {
    select_limit = origin_select_limit;
  ...
}
[16 Aug 2024 7:56] MySQL Verification Team
Hello!

Thank you for the report and detailed steps to reproduce.
Verified as described.

Sincerely,
Umesh
[16 Aug 2024 8:16] MySQL Verification Team
8.4.2 seems to be Ok. 

bin/mysql -uroot -S/tmp/mysql.sock
Welcome to the MySQL monitor.  Commands end with ; or \g.
Your MySQL connection id is 7
Server version: 8.4.2 MySQL Community Server - GPL

Copyright (c) 2000, 2024, Oracle and/or its affiliates.

Oracle is a registered trademark of Oracle Corporation and/or its
affiliates. Other names may be trademarks of their respective
owners.

Type 'help;' or '\h' for help. Type '\c' to clear the current input statement.

mysql> create database test;
Query OK, 1 row affected (0.00 sec)

mysql> use test
Database changed

mysql> explain SELECT * FROM test1 WHERE a = 1 ORDER BY c DESC LIMIT 1;
+----+-------------+-------+------------+-------+-------------------+---------+---------+------+------+----------+--------------------------------------------+
| id | select_type | table | partitions | type  | possible_keys     | key     | key_len | ref  | rows | filtered | Extra                                      |
+----+-------------+-------+------------+-------+-------------------+---------+---------+------+------+----------+--------------------------------------------+
|  1 | SIMPLE      | test1 | NULL       | range | idx_a_b_c,idx_a_c | idx_a_c | 5       | NULL | 1000 |   100.00 | Using index condition; Backward index scan |
+----+-------------+-------+------------+-------+-------------------+---------+---------+------+------+----------+--------------------------------------------+
1 row in set, 1 warning (0.00 sec)

mysql> SELECT * FROM test1 WHERE a = 1 ORDER BY c DESC LIMIT 1;
+------+------+------+------+------+
| id   | a    | b    | c    | d    |
+------+------+------+------+------+
| 1000 |    1 |    1 |    1 |    1 |
+------+------+------+------+------+
1 row in set (0.00 sec)

mysql> explain SELECT * FROM test2 WHERE a = 1 ORDER BY c DESC LIMIT 1;
+----+-------------+-------+------------+-------+-------------------+---------+---------+------+------+----------+--------------------------------------------+
| id | select_type | table | partitions | type  | possible_keys     | key     | key_len | ref  | rows | filtered | Extra                                      |
+----+-------------+-------+------------+-------+-------------------+---------+---------+------+------+----------+--------------------------------------------+
|  1 | SIMPLE      | test2 | NULL       | range | idx_a_b_c,idx_a_c | idx_a_c | 5       | NULL | 1000 |   100.00 | Using index condition; Backward index scan |
+----+-------------+-------+------------+-------+-------------------+---------+---------+------+------+----------+--------------------------------------------+
1 row in set, 1 warning (0.01 sec)

mysql> SELECT * FROM test2 WHERE a = 1 ORDER BY c DESC LIMIT 1;
+------+------+------+------+------+
| id   | a    | b    | c    | d    |
+------+------+------+------+------+
| 1000 |    1 |    1 |    1 |    1 |
+------+------+------+------+------+
1 row in set (0.00 sec)

mysql>
[21 Aug 2024 8:04] ksql- team
Hello!

Thank you for your prompt response and for testing the MySQL 8.4.2 version. However, I have encountered a discrepancy that I would like to bring to your attention.
I downloaded the MySQL source code from the official GitHub repository and checked out the mysql-8.4.2 tag. After compiling the source code in my local environment on CentOS 7 with GCC version 11.2.1, I attempted to reproduce the steps and queries you provided in your response.
Unfortunately, the results I obtained were not consistent with what you observed. Specifically, the issue I initially reported still appears to be present in 8.4.2.
To better understand the situation, could you please provide details on the testing environment and the compilation options you used during your testing? This information would be greatly helpful for me to align my setup and further investigate the issue.
Thank you again for your assistance, and I look forward to your guidance.

Welcome to the MySQL monitor.  Commands end with ; or \g.
Your MySQL connection id is 8
Server version: 8.4.2-debug Source distribution

Copyright (c) 2000, 2024, Oracle and/or its affiliates.

Oracle is a registered trademark of Oracle Corporation and/or its
affiliates. Other names may be trademarks of their respective
owners.

Type 'help;' or '\h' for help. Type '\c' to clear the current input statement.

mysql> use test;
Reading table information for completion of table and column names
You can turn off this feature to get a quicker startup with -A

Database changed
mysql> explain SELECT * FROM test1 WHERE a = 1 ORDER BY c DESC LIMIT 1;
+----+-------------+-------+------------+-------+-------------------+-------+---------+------+------+----------+----------------------------------+
| id | select_type | table | partitions | type  | possible_keys     | key   | key_len | ref  | rows | filtered | Extra                            |
+----+-------------+-------+------------+-------+-------------------+-------+---------+------+------+----------+----------------------------------+
|  1 | SIMPLE      | test1 | NULL       | index | idx_a_b_c,idx_a_c | idx_c | 5       | NULL |   99 |     1.01 | Using where; Backward index scan |
+----+-------------+-------+------------+-------+-------------------+-------+---------+------+------+----------+----------------------------------+
1 row in set, 1 warning (0.00 sec)

mysql> SELECT * FROM test1 WHERE a = 1 ORDER BY c DESC LIMIT 1;
+------+------+------+------+------+
| id   | a    | b    | c    | d    |
+------+------+------+------+------+
| 1000 |    1 |    1 |    1 |    1 |
+------+------+------+------+------+
1 row in set (25.36 sec)

mysql> explain SELECT * FROM test2 WHERE a = 1 ORDER BY c DESC LIMIT 1;
+----+-------------+-------+------------+-------+-------------------+---------+---------+------+------+----------+----------------------------------+
| id | select_type | table | partitions | type  | possible_keys     | key     | key_len | ref  | rows | filtered | Extra                            |
+----+-------------+-------+------------+-------+-------------------+---------+---------+------+------+----------+----------------------------------+
|  1 | SIMPLE      | test2 | NULL       | range | idx_a_b_c,idx_a_c | idx_a_c | 5       | NULL | 1000 |   100.00 | Using where; Backward index scan |
+----+-------------+-------+------------+-------+-------------------+---------+---------+------+------+----------+----------------------------------+
1 row in set, 1 warning (0.01 sec)

mysql> SELECT * FROM test2 WHERE a = 1 ORDER BY c DESC LIMIT 1;
+------+------+------+------+------+
| id   | a    | b    | c    | d    |
+------+------+------+------+------+
| 1000 |    1 |    1 |    1 |    1 |
+------+------+------+------+------+
1 row in set (0.02 sec)