Description:
When a query contains an ORDER BY clause with a small LIMIT value, the MySQL optimizer triggers a "low_limit" recheck and reconsiders index selection. If the currently chosen index does not support the required sort order, or an index merge is in use, the optimizer switches to an index that matches the sorting keys. However, this process fails to adequately account for the filtering efficiency of the WHERE conditions. As a result, the index it ultimately selects, while satisfying the sort requirement, often forces a scan of a large number of unnecessary rows. This is clearly a defect in the optimizer that needs to be addressed.
Table design:
CREATE TABLE `t1` (
`id` bigint(20) NOT NULL AUTO_INCREMENT,
`a` tinyint(4) DEFAULT NULL,
`b` date DEFAULT NULL,
PRIMARY KEY (`id`),
KEY `idx_b` (`b`)
) ENGINE=InnoDB AUTO_INCREMENT=500001 DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_unicode_ci;
mysql> explain select * from `t1` where b = '2025-03-03' and id > 100 and a=1 order by id limit 1;
+----+-------------+-------+------------+-------+---------------+---------+---------+------+--------+----------+-------------+
| id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
+----+-------------+-------+------------+-------+---------------+---------+---------+------+--------+----------+-------------+
| 1 | SIMPLE | t1 | NULL | range | PRIMARY,idx_b | PRIMARY | 8 | NULL | 249789 | 1.00 | Using where |
+----+-------------+-------+------------+-------+---------------+---------+---------+------+--------+----------+-------------+
1 row in set, 1 warning (0.00 sec)
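For contrast, the recheck is only triggered when the LIMIT is small relative to the row estimate; we would expect the same EXPLAIN without the LIMIT clause to stay on idx_b:
mysql> explain select * from `t1` where b = '2025-03-03' and id > 100 and a=1 order by id;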
mysql> select * from `t1` where b = '2025-03-03' and id > 100 and a=1 order by id limit 1;
+--------+------+------------+
| id | a | b |
+--------+------+------------+
| 479751 | 1 | 2025-03-03 |
+--------+------+------------+
1 row in set (20.78 sec)
mysql> select * from `t1` force index(idx_b) where b = '2025-03-03' and id > 100 and a=1 order by id limit 1;
+--------+------+------------+
| id | a | b |
+--------+------+------------+
| 479751 | 1 | 2025-03-03 |
+--------+------+------------+
1 row in set (0.00 sec)
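For scale: the script below distributes the 500,000 rows evenly across the 198 dates between 2024-08-25 and 2025-03-10, so b = '2025-03-03' matches only about 2,500 rows, all reachable directly through idx_b. The PRIMARY range id > 100, by contrast, covers essentially the whole table, and because '2025-03-03' lies near the end of the insertion order, the ascending scan on id must pass roughly 479,000 rows before reaching the first match, which explains the 20.78 sec runtime.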
Obviously, the optimizer wrongly chose the primary key, resulting in a scan of a large number of unnecessary rows. Below is a partial summary of the optimizer trace. It shows that the low_limit recheck was triggered, after which the optimizer re-selected an index that supports the sort order; in this example, the only index that supports the sort is the primary key.
{
"attaching_conditions_to_tables": {
"original_condition": "((`t1`.`a` = 1) and (`t1`.`b` = '2025-03-03') and (`t1`.`id` > 100))",
"attached_conditions_computation": [
{
"table": "`t1`",
"rechecking_index_usage": {
"recheck_reason": "low_limit",
"limit": 1,
"row_estimate": 126.2,
"range_analysis": {
"table_scan": {
"rows": 499579,
"cost": 599497
},
"potential_range_indexes": [
{
"index": "PRIMARY",
"usable": true,
"key_parts": [
"id"
]
},
{
"index": "idx_b",
"usable": false,
"cause": "not_applicable"
}
],
"setup_range_conditions": [
],
"group_index_range": {
"chosen": false,
"cause": "not_group_by_or_distinct"
},
"analyzing_range_alternatives": {
"range_scan_alternatives": [
{
"index": "PRIMARY",
"ranges": [
"100 < id"
],
"index_dives_for_eq_ranges": true,
"rowid_ordered": true,
"using_mrr": false,
"index_only": false,
"rows": 249789,
"cost": 50203,
"chosen": true
}
],
"analyzing_roworder_intersect": {
"usable": false,
"cause": "too_few_roworder_scans"
}
},
"chosen_range_access_summary": {
"range_access_plan": {
"type": "range_scan",
"index": "PRIMARY",
"rows": 249789,
"ranges": [
"100 < id"
]
},
"rows_for_plan": 249789,
"cost_for_plan": 50203,
"chosen": true
}
}
}
}
],
"attached_conditions_summary": [
{
"table": "`t1`",
"attached": "((`t1`.`a` = 1) and (`t1`.`b` = '2025-03-03') and (`t1`.`id` > 100))"
}
]
}
},
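For reference, the trace above was captured with the standard optimizer-trace facility; a minimal session to reproduce it:
mysql> set optimizer_trace='enabled=on';
mysql> select * from `t1` where b = '2025-03-03' and id > 100 and a=1 order by id limit 1;
mysql> select trace from information_schema.OPTIMIZER_TRACE;
mysql> set optimizer_trace='enabled=off';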
How to repeat:
We provide a Python script that creates the table and imports the data; you only need to pass the connection parameters on the command line, for example:
python3 /path/to/bug_report_data.py --host 127.0.0.1 --port 3306 --user root --password 123456 --db test
import mysql.connector
from datetime import datetime, timedelta
import random
import argparse
def main():
# Parse command line arguments
parser = argparse.ArgumentParser(description='Generate and insert test data into MySQL database')
parser.add_argument('--port', type=int, required=True, help='MySQL port')
parser.add_argument('--host', type=str, required=True, help='MySQL host')
parser.add_argument('--db', type=str, required=True, help='MySQL database name')
parser.add_argument('--user', type=str, required=True, help='MySQL username')
parser.add_argument('--password', type=str, required=True, help='MySQL password')
parser.add_argument('--records', type=int, default=500000, help='Total records to generate (default: 500000)')
args = parser.parse_args()
# Database connection configuration
config = {
'user': args.user,
'password': args.password,
'port': args.port,
'host': args.host,
'database': args.db,
'raise_on_warnings': True
}
# Data generation configuration
total_records = args.records
batch_size = 1000
try:
# Connect to database
print(f"Connecting to MySQL database at {args.host}:{args.port}...")
conn = mysql.connector.connect(**config)
cursor = conn.cursor()
# Create table if not exists
print("Creating table if it doesn't exist...")
create_table_sql = """
CREATE TABLE IF NOT EXISTS t1 (
id BIGINT AUTO_INCREMENT PRIMARY KEY,
a TINYINT,
b DATE,
INDEX idx_b (b)
)
"""
cursor.execute(create_table_sql)
conn.commit()
# Generate date range
print("Generating dates...")
start_date = datetime(2024, 8, 25)
end_date = datetime(2025, 3, 10)
delta = end_date - start_date
all_dates = [start_date + timedelta(days=i) for i in range(delta.days + 1)]
date_strs = [d.strftime("%Y-%m-%d") for d in all_dates]
# Calculate records per day to distribute evenly
days_count = len(all_dates)
records_per_day = max(1, total_records // days_count)  # guard against --records smaller than the day count
# Prepare insert statement
insert_sql = """
INSERT INTO t1 (
a, b
) VALUES (%s, %s)
"""
# Generate and insert data in batches
print("Starting data insertion...")
for i in range(0, total_records, batch_size):
# Generate batch data with sequential dates
data = []
for j in range(min(batch_size, total_records - i)):
idx = i + j
# Calculate which day this record belongs to
day_idx = idx // records_per_day
if day_idx >= days_count: # Handle remaining records
day_idx = days_count - 1
# Date field (sequential distribution) - now named b
b = date_strs[day_idx]
# Valid field - now named a
a = random.choices([0, 1], weights=[0.1, 0.9])[0]
data.append((a, b))
# Insert batch
cursor.executemany(insert_sql, data)
conn.commit()
print(f"Inserted {i + len(data)}/{total_records} records ({(i + len(data))/total_records*100:.1f}%)...")
except mysql.connector.Error as err:
print("Error:", err)
finally:
if 'conn' in locals() and conn.is_connected():
cursor.close()
conn.close()
print("Database connection closed.")
print("Data insertion completed.")
if __name__ == "__main__":
main()
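Once the load finishes, the misselection can be reproduced with the statements from the description (running ANALYZE TABLE first is optional, but it makes the row estimates deterministic):
mysql> analyze table t1;
mysql> explain select * from `t1` where b = '2025-03-03' and id > 100 and a=1 order by id limit 1;
mysql> select * from `t1` where b = '2025-03-03' and id > 100 and a=1 order by id limit 1;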
Suggested fix:
After the "low_limit" is triggered, the optimizer reconsiders index selection by simply choosing the lowest-cost sorting index from those that support sorting, without considering filtering costs. This approach seems somewhat unreasonable. Although MySQL 5.7.33 and above provides the optimizer switch "prefer_ordering_index," and disabling it can solve the index selection issue in this case, doing so might cause incorrect index selections for other SQL queries. Therefore, we believe the best solution would be to consider both sorting capability and filtering costs when reconsideration is triggered by "low_limit."