Bug #111037 MySQL not selecting or not using the index optimally for query
Submitted: 15 May 2023 21:34 Modified: 16 May 2023 12:43
Reporter: Rodrigo Zhou Email Updates:
Status: Verified Impact on me:
None 
Category:MySQL Server: Optimizer Severity:S5 (Performance)
Version:8.0 OS:Any
Assigned to: CPU Architecture:Any

[15 May 2023 21:34] Rodrigo Zhou
Description:
I have a complex index with functional parts, but when do queries using the functional part, the engine is not selecting the right index or not using it optimally (sort results unnecessarily), and thus performing very poorly in these cases.

This only happens when the table have plenty of columns and indexes. Below, I simplified the table to be reproducible, the actual table I'm working with is larger and has even more indices.

I also verified that if I change the `close_time` column to not null, and use it in the index instead of having a functional expression in the index, the engine works as expected: it selects the correct index and without the additional sort step.

How to repeat:
This is simplified version that reproduces the issue.

Table definition:
```
CREATE TABLE test (
  namespace_id        CHAR(64)      NOT NULL,
  run_id              CHAR(64)      NOT NULL,
  start_time          DATETIME(6)   NOT NULL,
  close_time          DATETIME(6)   NULL,
  execution_time      DATETIME(6)   NOT NULL,
  workflow_id         VARCHAR(255)  NOT NULL,
  workflow_type_name  VARCHAR(255)  NOT NULL,
  status              INT           NOT NULL,
  data                JSON          NULL,

  key1  VARCHAR(255)  GENERATED ALWAYS AS (data->>"$.key1"),
  key2  VARCHAR(255)  GENERATED ALWAYS AS (data->>"$.key2"),

  PRIMARY KEY (namespace_id, run_id)
);

CREATE INDEX default_idx ON test (
  namespace_id,
  (COALESCE(close_time, CAST('9999-12-31 23:59:59' AS DATETIME))) DESC,
  start_time DESC,
  run_id
);

CREATE INDEX by_execution_time ON test (
  namespace_id,
  execution_time,
  (COALESCE(close_time, CAST('9999-12-31 23:59:59' AS DATETIME))) DESC,
  start_time DESC,
  run_id
);

CREATE INDEX by_workflow_id ON test (
  namespace_id,
  workflow_id,
  (COALESCE(close_time, CAST('9999-12-31 23:59:59' AS DATETIME))) DESC,
  start_time DESC,
  run_id
);

CREATE INDEX by_workflow_type_name ON test (
  namespace_id,
  workflow_type_name,
  (COALESCE(close_time, CAST('9999-12-31 23:59:59' AS DATETIME))) DESC,
  start_time DESC,
  run_id
);

CREATE INDEX by_status ON test (
  namespace_id,
  status,
  (COALESCE(close_time, CAST('9999-12-31 23:59:59' AS DATETIME))) DESC,
  start_time DESC,
  run_id
);

CREATE INDEX by_key1 ON test (
  namespace_id,
  key1,
  (COALESCE(close_time, CAST('9999-12-31 23:59:59' AS DATETIME))) DESC,
  start_time DESC,
  run_id
);

CREATE INDEX by_key2 ON test (
  namespace_id,
  key2,
  (COALESCE(close_time, CAST('9999-12-31 23:59:59' AS DATETIME))) DESC,
  start_time DESC,
  run_id
);

```

Here's the procedure I wrote to populate the table with random data:
```
delimiter $$

CREATE PROCEDURE InsertRand(IN num_rows INT)
BEGIN
  DECLARE now INT;
  DECLARE st INT;
  DECLARE ct INT;
  DECLARE i INT;
  SET now = UNIX_TIMESTAMP();
  SET i = 1;
  START TRANSACTION;
  WHILE i <= num_rows do
    SET st = now - FLOOR(RAND() * 60 * 60 * 24 * 30);
    SET ct = st + FLOOR(RAND() * 60 * 60 * 24);
    IF ct > now THEN
      SET ct = NULL;
    END IF;
    INSERT INTO test (
      namespace_id,
      run_id,
      start_time,
      close_time,
      execution_time,
      workflow_id,
      workflow_type_name,
      status,
      data
    ) VALUES (
      CONCAT('ns-', FLOOR(RAND() * 10)),
      UUID(),
      FROM_UNIXTIME(st),
      FROM_UNIXTIME(ct),
      FROM_UNIXTIME(st),
      UUID(),
      CONCAT('wf-type-', FLOOR(RAND() * 1000)),
      IF(ct IS NULL, 1, 2),
      IF(RAND() < 0.8, CONCAT('{"key1": "kw-', FLOOR(RAND() * 1000), '"}'), NULL)
    );
    SET i = i + 1;
  END WHILE;
  COMMIT;
END$$

delimiter ;

CALL InsertRand(1000000);
```

Finally, if I run this query, you can see that although it's using the index `by_key1`, it's still sorting the results:
```
mysql> explain analyze select * from test where namespace_id='ns-1' and key1='kw-1' order by coalesce(close_time, cast('9999-12-31 23:59:59' as datetime)) desc, start_time desc, run_id;
+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| EXPLAIN                                                                                                                                                                                                                                                                                                                                                                                                                                                 |
+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| -> Sort: coalesce(test.close_time,cast('9999-12-31 23:59:59' as datetime)) DESC, test.start_time DESC, test.run_id  (cost=98.74 rows=91) (actual time=11.040..11.066 rows=91 loops=1)
    -> Filter: (test.namespace_id = 'ns-1')  (cost=98.74 rows=91) (actual time=0.151..10.848 rows=91 loops=1)
        -> Index lookup on test using by_key1 (namespace_id='ns-1', key1='kw-1')  (cost=98.74 rows=91) (actual time=0.148..10.817 rows=91 loops=1)
 |
+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
1 row in set (0.01 sec)
```

And if you add the `USE INDEX` hint, the engine is able to use it correctly:
```
mysql> explain analyze select * from test use index (by_key1) where namespace_id='ns-1' and key1='kw-1' order by coalesce(close_time, cast('9999-12-31 23:59:59' as datetime)) desc, start_time desc, run_id;
+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| EXPLAIN                                                                                                                                                                                                                                                 |
+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| -> Filter: (test.namespace_id = 'ns-1')  (cost=98.74 rows=91) (actual time=0.228..4.086 rows=91 loops=1)
    -> Index lookup on test using by_key1 (namespace_id='ns-1', key1='kw-1')  (cost=98.74 rows=91) (actual time=0.224..4.050 rows=91 loops=1)
 |
+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
1 row in set (0.00 sec)
```

This other query show that the engine doesn't even select the correct index:
```
mysql> explain analyze select * from test where namespace_id='ns-1' and key1 is null order by coalesce(close_time, cast('9999-12-31 23:59:59' as datetime)) desc, start_time desc, run_id;
+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| EXPLAIN                                                                                                                                                                                                                                                                                                                                                                                                                                                                                             |
+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| -> Sort: coalesce(test.close_time,cast('9999-12-31 23:59:59' as datetime)) DESC, test.start_time DESC, test.run_id  (cost=1873.21 rows=159306) (actual time=157.548..159.641 rows=19888 loops=1)
    -> Filter: ((test.namespace_id = 'ns-1') and (test.key1 is null))  (cost=1873.21 rows=159306) (actual time=0.255..139.941 rows=19888 loops=1)
        -> Index lookup on test using PRIMARY (namespace_id='ns-1')  (cost=1873.21 rows=159306) (actual time=0.204..127.426 rows=99717 loops=1)
 |
+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
1 row in set (0.17 sec)
```
[16 May 2023 12:43] MySQL Verification Team
Hi Mr. Zhou,

Thank you for your bug report.

We were able to reproduce the results that you get with 8.0.33.

Hence, this is now a verified bug report. 

Do note, however, that this is a marginal problem in the optimisation process, due to the usage of JSON, functional indices etc ..... In short, this is not a high severity bug.

Verified as reported.