Description:
I have a complex index with functional parts, but when do queries using the functional part, the engine is not selecting the right index or not using it optimally (sort results unnecessarily), and thus performing very poorly in these cases.
This only happens when the table have plenty of columns and indexes. Below, I simplified the table to be reproducible, the actual table I'm working with is larger and has even more indices.
I also verified that if I change the `close_time` column to not null, and use it in the index instead of having a functional expression in the index, the engine works as expected: it selects the correct index and without the additional sort step.
How to repeat:
This is simplified version that reproduces the issue.
Table definition:
```
CREATE TABLE test (
namespace_id CHAR(64) NOT NULL,
run_id CHAR(64) NOT NULL,
start_time DATETIME(6) NOT NULL,
close_time DATETIME(6) NULL,
execution_time DATETIME(6) NOT NULL,
workflow_id VARCHAR(255) NOT NULL,
workflow_type_name VARCHAR(255) NOT NULL,
status INT NOT NULL,
data JSON NULL,
key1 VARCHAR(255) GENERATED ALWAYS AS (data->>"$.key1"),
key2 VARCHAR(255) GENERATED ALWAYS AS (data->>"$.key2"),
PRIMARY KEY (namespace_id, run_id)
);
CREATE INDEX default_idx ON test (
namespace_id,
(COALESCE(close_time, CAST('9999-12-31 23:59:59' AS DATETIME))) DESC,
start_time DESC,
run_id
);
CREATE INDEX by_execution_time ON test (
namespace_id,
execution_time,
(COALESCE(close_time, CAST('9999-12-31 23:59:59' AS DATETIME))) DESC,
start_time DESC,
run_id
);
CREATE INDEX by_workflow_id ON test (
namespace_id,
workflow_id,
(COALESCE(close_time, CAST('9999-12-31 23:59:59' AS DATETIME))) DESC,
start_time DESC,
run_id
);
CREATE INDEX by_workflow_type_name ON test (
namespace_id,
workflow_type_name,
(COALESCE(close_time, CAST('9999-12-31 23:59:59' AS DATETIME))) DESC,
start_time DESC,
run_id
);
CREATE INDEX by_status ON test (
namespace_id,
status,
(COALESCE(close_time, CAST('9999-12-31 23:59:59' AS DATETIME))) DESC,
start_time DESC,
run_id
);
CREATE INDEX by_key1 ON test (
namespace_id,
key1,
(COALESCE(close_time, CAST('9999-12-31 23:59:59' AS DATETIME))) DESC,
start_time DESC,
run_id
);
CREATE INDEX by_key2 ON test (
namespace_id,
key2,
(COALESCE(close_time, CAST('9999-12-31 23:59:59' AS DATETIME))) DESC,
start_time DESC,
run_id
);
```
Here's the procedure I wrote to populate the table with random data:
```
delimiter $$
CREATE PROCEDURE InsertRand(IN num_rows INT)
BEGIN
DECLARE now INT;
DECLARE st INT;
DECLARE ct INT;
DECLARE i INT;
SET now = UNIX_TIMESTAMP();
SET i = 1;
START TRANSACTION;
WHILE i <= num_rows do
SET st = now - FLOOR(RAND() * 60 * 60 * 24 * 30);
SET ct = st + FLOOR(RAND() * 60 * 60 * 24);
IF ct > now THEN
SET ct = NULL;
END IF;
INSERT INTO test (
namespace_id,
run_id,
start_time,
close_time,
execution_time,
workflow_id,
workflow_type_name,
status,
data
) VALUES (
CONCAT('ns-', FLOOR(RAND() * 10)),
UUID(),
FROM_UNIXTIME(st),
FROM_UNIXTIME(ct),
FROM_UNIXTIME(st),
UUID(),
CONCAT('wf-type-', FLOOR(RAND() * 1000)),
IF(ct IS NULL, 1, 2),
IF(RAND() < 0.8, CONCAT('{"key1": "kw-', FLOOR(RAND() * 1000), '"}'), NULL)
);
SET i = i + 1;
END WHILE;
COMMIT;
END$$
delimiter ;
CALL InsertRand(1000000);
```
Finally, if I run this query, you can see that although it's using the index `by_key1`, it's still sorting the results:
```
mysql> explain analyze select * from test where namespace_id='ns-1' and key1='kw-1' order by coalesce(close_time, cast('9999-12-31 23:59:59' as datetime)) desc, start_time desc, run_id;
+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| EXPLAIN |
+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| -> Sort: coalesce(test.close_time,cast('9999-12-31 23:59:59' as datetime)) DESC, test.start_time DESC, test.run_id (cost=98.74 rows=91) (actual time=11.040..11.066 rows=91 loops=1)
-> Filter: (test.namespace_id = 'ns-1') (cost=98.74 rows=91) (actual time=0.151..10.848 rows=91 loops=1)
-> Index lookup on test using by_key1 (namespace_id='ns-1', key1='kw-1') (cost=98.74 rows=91) (actual time=0.148..10.817 rows=91 loops=1)
|
+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
1 row in set (0.01 sec)
```
And if you add the `USE INDEX` hint, the engine is able to use it correctly:
```
mysql> explain analyze select * from test use index (by_key1) where namespace_id='ns-1' and key1='kw-1' order by coalesce(close_time, cast('9999-12-31 23:59:59' as datetime)) desc, start_time desc, run_id;
+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| EXPLAIN |
+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| -> Filter: (test.namespace_id = 'ns-1') (cost=98.74 rows=91) (actual time=0.228..4.086 rows=91 loops=1)
-> Index lookup on test using by_key1 (namespace_id='ns-1', key1='kw-1') (cost=98.74 rows=91) (actual time=0.224..4.050 rows=91 loops=1)
|
+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
1 row in set (0.00 sec)
```
This other query show that the engine doesn't even select the correct index:
```
mysql> explain analyze select * from test where namespace_id='ns-1' and key1 is null order by coalesce(close_time, cast('9999-12-31 23:59:59' as datetime)) desc, start_time desc, run_id;
+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| EXPLAIN |
+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| -> Sort: coalesce(test.close_time,cast('9999-12-31 23:59:59' as datetime)) DESC, test.start_time DESC, test.run_id (cost=1873.21 rows=159306) (actual time=157.548..159.641 rows=19888 loops=1)
-> Filter: ((test.namespace_id = 'ns-1') and (test.key1 is null)) (cost=1873.21 rows=159306) (actual time=0.255..139.941 rows=19888 loops=1)
-> Index lookup on test using PRIMARY (namespace_id='ns-1') (cost=1873.21 rows=159306) (actual time=0.204..127.426 rows=99717 loops=1)
|
+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
1 row in set (0.17 sec)
```