Bug #119442 Optimizer fails to simplify ORDER BY when column is filtered by IS NULL, causing filesort
Submitted: 21 Nov 20:13 Modified: 27 Nov 4:52
Reporter: Kishore Kumar (OCA) Email Updates:
Status: Verified Impact on me:
None 
Category:MySQL Server: Optimizer Severity:S5 (Performance)
Version:8.4.4 OS:Any
Assigned to: CPU Architecture:Any
Tags: filesort, INDEX, IS NULL, Optimizer, order by

[21 Nov 20:13] Kishore Kumar
Description:
# Description
The MySQL optimizer fails to treat a column as "constant" for ORDER BY simplification purposes when that column is filtered using `IS NULL` in the `WHERE` clause.

According to the MySQL Manual:

1. `NULL` values are not strictly comparable. (https://dev.mysql.com/doc/refman/8.4/en/working-with-null.html)

> “Two `NULL` values are regarded as equal in a `GROUP BY`.
When doing an `ORDER BY`, `NULL` values are presented first if you do `ORDER BY ... ASC` and last if you do `ORDER BY ... DESC`.”

2. https://dev.mysql.com/doc/refman/8.4/en/order-by-optimization.html:

> "all unused portions of the index and all extra ORDER BY columns are constants in the WHERE clause." 

However, the optimizer currently only seems to recognize equality operators (e.g., `col = 5`) as constants, failing to recognize that `col IS NULL` also guarantees a single value (`NULL`) for sorting purposes.

# Behavior
When running a query with: `WHERE col_a = [CONST] AND col_b IS NULL ORDER BY col_a, col_b` MySQL resorts to a `filesort` even if a covering index `(col_a, col_b)` exists. If `col_b` is removed from the `ORDER BY`, the optimizer correctly uses the index (Range scan / Using index).

# Analysis
An optimization trace analysis provided the conclusive evidence. For the example below provided in the "How to repeat" section, if we collected and inspected the optimizer trace for the filesort query, we will see the following:

```json
"simplifying_order_by": {
  "original_clause": "`issue_repro`.`category_id`,`issue_repro`.`nullable_date`,`issue_repro`.`item_name`",
  "items": [
    {
      "item": "`issue_repro`.`category_id`",
      "equals_constant_in_where": true
    },
    {
      "item": "`issue_repro`.`nullable_date`"
    },
    {
      "item": "`issue_repro`.`item_name`"
    }
  ],
  "resulting_clause_is_simple": true,
  "resulting_clause": "`issue_repro`.`nullable_date`,`issue_repro`.`item_name`"
}
```

It should have marked `nullable_date` with `"equals_constant_in_where": true` too but it did not. 

How to repeat:
-- 1. Create the database and use it (since you didn't have one selected)
CREATE DATABASE IF NOT EXISTS bug_test;
USE bug_test;

-- 2. Create a table with a nullable column
CREATE TABLE `issue_repro` (
  `id` bigint NOT NULL AUTO_INCREMENT,
  `category_id` bigint NOT NULL,
  `nullable_date` datetime DEFAULT NULL,
  `item_name` varchar(255) NOT NULL,  
  PRIMARY KEY (`id`),
  KEY `idx_cover` (`category_id`,`nullable_date`,`item_name`)
) ENGINE=InnoDB;

-- 3. Insert generic dummy data
INSERT INTO `issue_repro` (`category_id`, `nullable_date`, `item_name`) VALUES 
(100, NULL, 'Item A'),
(100, NULL, 'Item B'),
(100, '2024-01-01 00:00:00', 'Item C'),
(100, NULL, 'Item D'),
(200, NULL, 'Item E');

-- 4. Update statistics
ANALYZE TABLE `issue_repro`;

-- 5. REPRODUCTION QUERY
-- We filter by category_id (constant) and nullable_date (IS NULL).
-- We ORDER BY the exact index definition.
-- EXPECTED: Using index (Range scan)
-- ACTUAL: Using where; Using filesort
EXPLAIN SELECT id, item_name 
FROM issue_repro 
WHERE category_id = 100 
  AND nullable_date IS NULL 
ORDER BY category_id, nullable_date, item_name;

-- 6. CONTROL TEST
-- Removing the IS NULL column from ORDER BY fixes the plan,
EXPLAIN SELECT id, item_name 
FROM issue_repro 
WHERE category_id = 100 
  AND nullable_date IS NULL 
ORDER BY category_id, item_name;

Suggested fix:
Update the optimizer's `ORDER BY` simplification logic to recognize `col IS NULL` in the `WHERE` clause as a constant.
[23 Nov 2:07] Kishore Kumar
Marking OS & CPU arch to any.
[25 Nov 23:19] OCA Admin
Contribution submitted via Github - Bug #119442: Fix ORDER BY optimization doesn't recognize IS NULL as constant 
(*) Contribution by Kishore Kumar (Github akcube, mysql-server/pull/637#issuecomment-3577811434): I confirm the code being submitted is offered under the terms of the OCA, and that I am authorized to contribute it.

Contribution: git_patch_3036837769.txt (text/plain), 28.44 KiB.

[27 Nov 4:52] Chaithra Marsur Gopala Reddy
Hi Kishore Kumar,

Thank you for the test case. Verified as described. and thank you also for the contribution.