Description:
# Description
The MySQL optimizer fails to treat a column as "constant" for ORDER BY simplification purposes when that column is filtered using `IS NULL` in the `WHERE` clause.
According to the MySQL Manual:
1. `NULL` values are not strictly comparable. (https://dev.mysql.com/doc/refman/8.4/en/working-with-null.html)
> “Two `NULL` values are regarded as equal in a `GROUP BY`.
When doing an `ORDER BY`, `NULL` values are presented first if you do `ORDER BY ... ASC` and last if you do `ORDER BY ... DESC`.”
2. https://dev.mysql.com/doc/refman/8.4/en/order-by-optimization.html:
> "all unused portions of the index and all extra ORDER BY columns are constants in the WHERE clause."
However, the optimizer currently only seems to recognize equality operators (e.g., `col = 5`) as constants, failing to recognize that `col IS NULL` also guarantees a single value (`NULL`) for sorting purposes.
# Behavior
When running a query with: `WHERE col_a = [CONST] AND col_b IS NULL ORDER BY col_a, col_b` MySQL resorts to a `filesort` even if a covering index `(col_a, col_b)` exists. If `col_b` is removed from the `ORDER BY`, the optimizer correctly uses the index (Range scan / Using index).
# Analysis
An optimization trace analysis provided the conclusive evidence. For the example below provided in the "How to repeat" section, if we collected and inspected the optimizer trace for the filesort query, we will see the following:
```json
"simplifying_order_by": {
"original_clause": "`issue_repro`.`category_id`,`issue_repro`.`nullable_date`,`issue_repro`.`item_name`",
"items": [
{
"item": "`issue_repro`.`category_id`",
"equals_constant_in_where": true
},
{
"item": "`issue_repro`.`nullable_date`"
},
{
"item": "`issue_repro`.`item_name`"
}
],
"resulting_clause_is_simple": true,
"resulting_clause": "`issue_repro`.`nullable_date`,`issue_repro`.`item_name`"
}
```
It should have marked `nullable_date` with `"equals_constant_in_where": true` too but it did not.
How to repeat:
-- 1. Create the database and use it (since you didn't have one selected)
CREATE DATABASE IF NOT EXISTS bug_test;
USE bug_test;
-- 2. Create a table with a nullable column
CREATE TABLE `issue_repro` (
`id` bigint NOT NULL AUTO_INCREMENT,
`category_id` bigint NOT NULL,
`nullable_date` datetime DEFAULT NULL,
`item_name` varchar(255) NOT NULL,
PRIMARY KEY (`id`),
KEY `idx_cover` (`category_id`,`nullable_date`,`item_name`)
) ENGINE=InnoDB;
-- 3. Insert generic dummy data
INSERT INTO `issue_repro` (`category_id`, `nullable_date`, `item_name`) VALUES
(100, NULL, 'Item A'),
(100, NULL, 'Item B'),
(100, '2024-01-01 00:00:00', 'Item C'),
(100, NULL, 'Item D'),
(200, NULL, 'Item E');
-- 4. Update statistics
ANALYZE TABLE `issue_repro`;
-- 5. REPRODUCTION QUERY
-- We filter by category_id (constant) and nullable_date (IS NULL).
-- We ORDER BY the exact index definition.
-- EXPECTED: Using index (Range scan)
-- ACTUAL: Using where; Using filesort
EXPLAIN SELECT id, item_name
FROM issue_repro
WHERE category_id = 100
AND nullable_date IS NULL
ORDER BY category_id, nullable_date, item_name;
-- 6. CONTROL TEST
-- Removing the IS NULL column from ORDER BY fixes the plan,
EXPLAIN SELECT id, item_name
FROM issue_repro
WHERE category_id = 100
AND nullable_date IS NULL
ORDER BY category_id, item_name;
Suggested fix:
Update the optimizer's `ORDER BY` simplification logic to recognize `col IS NULL` in the `WHERE` clause as a constant.