Bug #110117 Lack of instrumentation for index condition impact on the query plan
Submitted: 17 Feb 2023 13:22 Modified: 17 Feb 2023 14:01
Reporter: Przemyslaw Malkowski Email Updates:
Status: Verified Impact on me:
None 
Category:MySQL Server: Documentation Severity:S3 (Non-critical)
Version:8.0 OS:Any
Assigned to: CPU Architecture:Any
Tags: explain, ICP, Optimizer

[17 Feb 2023 13:22] Przemyslaw Malkowski
Description:
Apparently contrary to the documentation:
https://dev.mysql.com/doc/refman/8.0/en/multiple-column-indexes.html
In some conditions, the ICP optimization is able to take advantage of compound index parts that are not the left-most prefix. Yet this advantage is not well reflected in the EXPLAIN estimates, nor the optimizer trace.

Also, ICP documentation doesn't seem to explain the ability to utilize such index part: 
https://dev.mysql.com/doc/refman/8.0/en/index-condition-pushdown-optimization.html

I cannot find any place to actually explain the huge difference in execution stats for the two index use cases. The only difference in the EXPLAIN plan is "Using where" in extra info the index without the second column, and "index_condition" info in JSON format.

How to repeat:
CREATE TABLE `v1` (
  `id` bigint NOT NULL,
  `id_col1` int NOT NULL,
  `col2` varchar(255) CHARACTER SET utf8mb3 COLLATE utf8mb3_bin NOT NULL,
  `col3` varchar(500) CHARACTER SET utf8mb3 COLLATE utf8mb3_bin DEFAULT NULL,
  `col4` datetime NOT NULL,
  `col5` tinyint(1) NOT NULL DEFAULT '0',
  `col6` tinyint(1) NOT NULL DEFAULT '1',
  `col7` varchar(20) CHARACTER SET utf8mb3 COLLATE utf8mb3_bin NOT NULL,
  `col8` varchar(20) CHARACTER SET utf8mb3 COLLATE utf8mb3_bin DEFAULT NULL,
  `col9` tinyint(1) NOT NULL DEFAULT '0',
  `col10` datetime DEFAULT NULL,
  `col11` datetime DEFAULT NULL,
  `col12` tinyint(1) NOT NULL DEFAULT '0',
  PRIMARY KEY (`id`,`id_col1`),
  UNIQUE KEY `uk` (`id_col1`,`col2`,`col7`) USING BTREE,
  KEY `id_col1` (`id_col1`),
  KEY `composite1` (`id_col1`,`col7`,`col5`,`col6`,`col9`,`col4`),
  KEY `col10` (`col10`),
  KEY `col11` (`col11`),
  KEY `composite2` (`id_col1`,`col12`),
  KEY `composite3` (`id_col1`,`col7`,`col8`,`col5`,`col9`,`col4`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb3 COLLATE=utf8mb3_bin;

-- Load data dump from #110111.

mysql > explain select id, col2, col4 from v1 FORCE KEY(composite1) where id_col1 in (1178451,1001264,1045769,580,1038583,196) and col4 > now() - interval 48 HOUR\G
*************************** 1. row ***************************
           id: 1
  select_type: SIMPLE
        table: v1
   partitions: NULL
         type: range
possible_keys: composite1
          key: composite1
      key_len: 4
          ref: NULL
         rows: 50720
     filtered: 33.33
        Extra: Using index condition
1 row in set, 1 warning (0.00 sec)

mysql > explain select id, col2, col4 from v1 FORCE KEY(composite2) where id_col1 in (1178451,1001264,1045769,580,1038583,196) and col4 > now() - interval 48 HOUR\G
*************************** 1. row ***************************
           id: 1
  select_type: SIMPLE
        table: v1
   partitions: NULL
         type: range
possible_keys: composite2
          key: composite2
      key_len: 4
          ref: NULL
         rows: 52952
     filtered: 33.33
        Extra: Using index condition; Using where
1 row in set, 1 warning (0.00 sec)

mysql > explain format=json select id, col2, col4 from v1 FORCE KEY(composite1) where id_col1 in (1178451,1001264,1045769,580,1038583,196) and col4 > now() - interval 48 HOUR\G
*************************** 1. row ***************************
EXPLAIN: {
  "query_block": {
    "select_id": 1,
    "cost_info": {
      "query_cost": "60870.01"
    },
    "table": {
      "table_name": "v1",
      "access_type": "range",
      "possible_keys": [
        "composite1"
      ],
      "key": "composite1",
      "used_key_parts": [
        "id_col1"
      ],
      "key_length": "4",
      "rows_examined_per_scan": 50720,
      "rows_produced_per_join": 16904,
      "filtered": "33.33",
      "index_condition": "((`db1`.`v1`.`id_col1` in (1178451,1001264,1045769,580,1038583,196)) and (`db1`.`v1`.`col4` > <cache>((now() - interval 48 hour))))",
      "cost_info": {
        "read_cost": "55798.01",
        "eval_cost": "1690.50",
        "prefix_cost": "60870.01",
        "data_read_per_join": "39M"
      },
      "used_columns": [
        "id",
        "id_col1",
        "col2",
        "col4"
      ]
    }
  }
}
1 row in set, 1 warning (0.00 sec)

mysql > explain format=json select id, col2, col4 from v1 FORCE KEY(composite2) where id_col1 in (1178451,1001264,1045769,580,1038583,196) and col4 > now() - interval 48 HOUR\G
*************************** 1. row ***************************
EXPLAIN: {
  "query_block": {
    "select_id": 1,
    "cost_info": {
      "query_cost": "63548.41"
    },
    "table": {
      "table_name": "v1",
      "access_type": "range",
      "possible_keys": [
        "composite2"
      ],
      "key": "composite2",
      "used_key_parts": [
        "id_col1"
      ],
      "key_length": "4",
      "rows_examined_per_scan": 52952,
      "rows_produced_per_join": 17648,
      "filtered": "33.33",
      "index_condition": "(`db1`.`v1`.`id_col1` in (1178451,1001264,1045769,580,1038583,196))",
      "cost_info": {
        "read_cost": "58253.21",
        "eval_cost": "1764.89",
        "prefix_cost": "63548.41",
        "data_read_per_join": "40M"
      },
      "used_columns": [
        "id",
        "id_col1",
        "col2",
        "col4"
      ],
      "attached_condition": "(`db1`.`v1`.`col4` > <cache>((now() - interval 48 hour)))"
    }
  }
}
1 row in set, 1 warning (0.00 sec)

As seen above, the estimated rows and costs are very similar, and used_key_parts show only the first column used, while clearly in the composite1 index case, col4 is used as well.

Despite the above, execution stats are very different:

composite1:
+----------------------------+-------+
| Variable_name              | Value |
+----------------------------+-------+
| Handler_commit             | 1     |
| Handler_delete             | 0     |
| Handler_discover           | 0     |
| Handler_external_lock      | 2     |
| Handler_mrr_init           | 0     |
| Handler_prepare            | 0     |
| Handler_read_first         | 0     |
| Handler_read_key           | 6     |
| Handler_read_last          | 0     |
| Handler_read_next          | 50    |
| Handler_read_prev          | 0     |
| Handler_read_rnd           | 0     |
| Handler_read_rnd_next      | 0     |
| Handler_rollback           | 0     |
| Handler_savepoint          | 0     |
| Handler_savepoint_rollback | 0     |
| Handler_update             | 0     |
| Handler_write              | 0     |
+----------------------------+-------+

composite2:
+----------------------------+-------+
| Variable_name              | Value |
+----------------------------+-------+
| Handler_commit             | 1     |
| Handler_delete             | 0     |
| Handler_discover           | 0     |
| Handler_external_lock      | 2     |
| Handler_mrr_init           | 0     |
| Handler_prepare            | 0     |
| Handler_read_first         | 0     |
| Handler_read_key           | 6     |
| Handler_read_last          | 0     |
| Handler_read_next          | 24937 |
| Handler_read_prev          | 0     |
| Handler_read_rnd           | 0     |
| Handler_read_rnd_next      | 0     |
| Handler_rollback           | 0     |
| Handler_savepoint          | 0     |
| Handler_savepoint_rollback | 0     |
| Handler_update             | 0     |
| Handler_write              | 0     |
+----------------------------+-------+

Suggested fix:
Fix the visibility in the EXPLAIN, and perhaps improve the documentation as well.
[17 Feb 2023 13:49] MySQL Verification Team
Hi Mr. Malkowski,

Thank you for your bug report.

However, your bug is not defined well. 

It is an easy conclusion that what you are reporting is a bug in our Optimiser. However, in your text you are asking for greater visibility and better documentation.

So, what kind of bug are you reporting ??? Feature request, Documentation bug or something third.

Do note that we maintain only Reference Manual and we do not have nor maintain Users Manual.
[17 Feb 2023 13:57] Przemyslaw Malkowski
Hi,
Good point.
Well, the main problem here is that the EXPLAIN plan is giving misleading and inaccurate information. It suggests almost identical costs for both queries, which in reality are way different.
My test example is simplified, but the same problem was seen in production, where the execution time diff was minutes vs seconds. Again, not reflected in the EXPLAIN plan at all.

Another issue is that the documentation seems to claim that only the leftmost parts of the index can be used in filtering.
[17 Feb 2023 14:01] MySQL Verification Team
Hi Mr. Malkowski,

We have verified your report as a documentation bug. However, in that process, other bugs might be filed, internally in our own process, for the Optimiser itself.

Verified.