Bug #111764 Performance bug when ordering by null values
Submitted: 14 Jul 2023 22:33 Modified: 17 Jul 2023 12:55
Reporter: Mihnea Giurgea Email Updates:
Status: Unsupported Impact on me:
None 
Category:MySQL Server Severity:S5 (Performance)
Version:5.7 OS:Any
Assigned to: CPU Architecture:Any
Tags: null, order by, performance

[14 Jul 2023 22:33] Mihnea Giurgea
Description:
We are running into the following deterministic performance issue.

This query `WHERE host_id = 47 AND status = 1 AND start_date > "2020-01-15 ORDER BY host_id, status, start_date, id " LIMIT 3` is bounded in the sense that the work that MySQL does internally (as observed by Handler_read_next) is ~3.

However, changing the above query from `status = 1` to `status IS NULL` breaks the performance of this query, and it's no longer bounded.

How to repeat:
`example_reservation` table has a single composite index on (host_id, status, start_date)

Table        | example_reservation
Create Table | CREATE TABLE `example_reservation` (
  `id` bigint(20) unsigned NOT NULL AUTO_INCREMENT,
  `host_id` bigint(20) NOT NULL,
  `status` bigint(20) DEFAULT NULL,
  `start_date` datetime DEFAULT NULL,
  `nights` bigint(20) DEFAULT NULL,
  `guest_count` bigint(20) DEFAULT NULL,
  PRIMARY KEY (`id`),
  KEY `tri_index` (`host_id`,`status`,`start_date`)
) ENGINE=InnoDB AUTO_INCREMENT=1025 DEFAULT CHARSET=utf8

We have a bunch of entries with status IS NULL, and status = 1  

mysql root@127.0.0.1:uds_qsplit> 
SELECT count(1) FROM example_reservation
WHERE host_id = 47 AND status IS NULL AND start_date > "2020-11-15";
+----------+
| count(1) |
+----------+
| 24       |
+----------+
SELECT count(1) FROM example_reservation
WHERE host_id = 47 AND status = 1 AND start_date > "2020-11-15";
+----------+
| count(1) |
+----------+
| 269      |
+----------+

EXPECTED - the following query is efficient, i.e. MySQL issues ~3 Handler_read in order to return the LIMIT 3 rows

mysql root@127.0.0.1:uds_qsplit> FLUSH STATUS; 
SELECT id, host_id, status, start_date 
FROM example_reservation 
FORCE INDEX (tri_index) 
WHERE host_id = 47 AND status = 1 AND start_date > "2020-01-15" 
ORDER BY host_id, status, start_date, id 
LIMIT 3; 

SHOW SESSION STATUS LIKE 'Handler_read%';
Query OK, 0 rows affected
Time: 0.002s

+-----+---------+--------+---------------------+
| id  | host_id | status | start_date          |
+-----+---------+--------+---------------------+
| 167 | 47      | 1      | 2020-11-25 20:20:12 |
| 221 | 47      | 1      | 2020-11-25 20:20:12 |
| 341 | 47      | 1      | 2020-11-25 20:20:12 |
+-----+---------+--------+---------------------+
3 rows in set
Time: 0.083s

+-----------------------+-------+
| Variable_name         | Value |
+-----------------------+-------+
| Handler_read_first    | 0     |
| Handler_read_key      | 1     |
| Handler_read_last     | 0     |
| Handler_read_next     | 2     |
| Handler_read_prev     | 0     |
| Handler_read_rnd      | 0     |
| Handler_read_rnd_next | 0     |
+-----------------------+-------+
7 rows in set
Time: 0.071s

NOT EXPECTED - the same query is NOT EFFICIENT, if changing status = 1 to status IS NULL. Why is that?

mysql root@127.0.0.1:uds_qsplit> FLUSH STATUS; 
SELECT id, host_id, status, start_date 
FROM example_reservation FORCE INDEX (tri_index) 
WHERE host_id = 47 AND status IS NULL AND start_date > "2020-11-15" ORDER BY host_id, status, start_date, id 
LIMIT 3; 

SHOW SESSION STATUS LIKE 'Handler_read%';
Query OK, 0 rows affected
Time: 0.002s

+------+---------+--------+---------------------+
| id   | host_id | status | start_date          |
+------+---------+--------+---------------------+
| 1021 | 47      | <null> | 2020-11-18 00:00:00 |
| 1020 | 47      | <null> | 2020-11-19 00:00:00 |
| 1001 | 47      | <null> | 2020-11-20 00:00:00 |
+------+---------+--------+---------------------+
3 rows in set
Time: 0.093s

+-----------------------+-------+
| Variable_name         | Value |
+-----------------------+-------+
| Handler_read_first    | 0     |
| Handler_read_key      | 1     |
| Handler_read_last     | 0     |
| Handler_read_next     | 24    |
| Handler_read_prev     | 0     |
| Handler_read_rnd      | 0     |
| Handler_read_rnd_next | 0     |
+-----------------------+-------+
7 rows in set
Time: 0.067s

EXPECTED - removing status from the ORDER BY clause makes above query EFFICIENT. Why is that?

mysql root@127.0.0.1:uds_qsplit> FLUSH STATUS; 
SELECT id, host_id, status, start_date 
FROM example_reservation FORCE INDEX (tri_index) 
WHERE host_id = 47 AND status IS NULL AND start_date > “2020-11-15" 
ORDER BY host_id, start_date, id 
LIMIT 3; 

SHOW SESSION STATUS LIKE 'Handler_read%';
Query OK, 0 rows affected
Time: 0.002s

+------+---------+--------+---------------------+
| id   | host_id | status | start_date          |
+------+---------+--------+---------------------+
| 1021 | 47      | <null> | 2020-11-18 00:00:00 |
| 1020 | 47      | <null> | 2020-11-19 00:00:00 |
| 1001 | 47      | <null> | 2020-11-20 00:00:00 |
+------+---------+--------+---------------------+
3 rows in set
Time: 0.091s

+-----------------------+-------+
| Variable_name         | Value |
+-----------------------+-------+
| Handler_read_first    | 0     |
| Handler_read_key      | 1     |
| Handler_read_last     | 0     |
| Handler_read_next     | 2     |
| Handler_read_prev     | 0     |
| Handler_read_rnd      | 0     |
| Handler_read_rnd_next | 0     |
+-----------------------+-------+
7 rows in set
Time: 0.069s
[17 Jul 2023 12:55] MySQL Verification Team
Hi Mr. Giurgea,

Thank you for your bug report.

However, 5.7 is not maintained any more.

Hence, if you can repeat the behaviour with 8.0.33, please send us the entire and fully reproducible test case, with all necessary data, including the table rows.

Unsupported.