Bug #106005 Used key parts reported in JSON optimizer plan is wrong
Submitted: 30 Dec 2021 1:49 Modified: 30 Dec 2021 18:53
Reporter: Ritwik Yadav Email Updates:
Status: Not a Bug Impact on me:
None 
Category:MySQL Server Severity:S3 (Non-critical)
Version: OS:CentOS
Assigned to: MySQL Verification Team CPU Architecture:x86

[30 Dec 2021 1:49] Ritwik Yadav
Description:
The set of used key parts listed in the JSON formatted EXPLAIN plan seems to be erroneous. Extra columns are listed that are not actually used to seek to construct prefixes to scan an index.

How to repeat:
1. Create a simple table as such:

CREATE TABLE `tbl1` (
  `col1` bigint NOT NULL AUTO_INCREMENT,
  `col2` varchar(512) DEFAULT NULL,
  `col3` varchar(512) DEFAULT NULL,
  `col4` double DEFAULT NULL,
  `col5` bigint DEFAULT NULL,
  `col6` bigint NOT NULL DEFAULT '1',
  PRIMARY KEY (`col1`),
  KEY `cd1` (`col3`,`col5`,`col4`),
  KEY `cd2` (`col2`,`col5`,`col3`,`col4`),
  KEY `cd3` (`col4`,`col3`,`col5`),
  KEY `self_extra` (`col5`,`col4`,`col6`)
);

2. Run explain plan with JSON format.

EXPLAIN format=json SELECT col5 FROM tbl1 WHERE col5 = col6 \G;
*************************** 1. row ***************************
EXPLAIN: {
  "query_block": {
    "select_id": 1,
    "cost_info": {
      "query_cost": "496287.50"
    },
    "table": {
      "table_name": "tbl1",
      "access_type": "index",
      "key": "self_extra",
      "used_key_parts": [
        "col5",
        "col4",
        "col6"
      ],
      "key_length": "26",
      "rows_examined_per_scan": 4915215,
      "rows_produced_per_join": 4915215,
      "filtered": "100.00",
      "using_index": true,
      "cost_info": {
        "read_cost": "4766.00",
        "eval_cost": "491521.50",
        "prefix_cost": "496287.50",
        "data_read_per_join": "4G"
      },
      "used_columns": [
        "col5",
        "col6"
      ],
      "attached_condition": "(`test`.`tbl1`.`col6` = `test`.`tbl1`.`col5`)"
    }
  }
}
[30 Dec 2021 12:13] MySQL Verification Team
Hi,

This is not a bug.

Your condition is:

col5 = col6 

and your keys are:

  PRIMARY KEY (`col1`),
  KEY `cd1` (`col3`,`col5`,`col4`),
  KEY `cd2` (`col2`,`col5`,`col3`,`col4`),
  KEY `cd3` (`col4`,`col3`,`col5`),
  KEY `self_extra` (`col5`,`col4`,`col6`)

* primary key does not containin col5 nor col6 so can't be used
* cd1, cd2 and cd3 start with col3, col2 and col4 so can't be used
So only self_extra key can possibly be used for col5=col6

Now the json shows:

      "used_key_parts": [
        "col5",
        "col4",
        "col6"
      ],

MySQL read key left to right so in order to use 5 and 6 it needs to read 4 too.

Used columns is represented properly too:

      "used_columns": [
        "col5",
        "col6"
      ],

What part of this are you considering a bug?

thanks
Bogdan
[30 Dec 2021 18:53] Ritwik Yadav
Hi Bogdan,

Thanks for taking a look. I think the bug should be reopened. 

I thought you might use that line of reasoning which leads to the definition of used_key_parts. As per the source code, it is:

> Number of key parts used to access the index

However, based on your comments, the suggested definition of used_key_parts seems to be the shortest prefix that needed to be read in order to execute the query. There are two problems as far as I can tell.

One, the documentation is not clear (might require another bug). Second, even if I go with the latter definition based on your comment, I have a counter-example to that as well.

Table Definition:

CREATE TABLE `tbl1` (
  `col1` bigint NOT NULL AUTO_INCREMENT,
  `col2` varchar(512) DEFAULT NULL,
  `col3` varchar(512) DEFAULT NULL,
  `col4` double DEFAULT NULL,
  `col5` bigint DEFAULT NULL,
  PRIMARY KEY (`col1`),
  KEY `cd1` (`col3`,`col5`,`col4`),
  KEY `cd2` (`col2`,`col5`,`col3`,`col4`)
)

Query: 

EXPLAIN format=json SELECT col3 from tbl1 WHERE col2 = 'HIU' AND col3 > 'FPG' AND col4 > 0.9 AND col5 >= 99874;

Result:

{
  "query_block": {
    "select_id": 1,
    "cost_info": {
      "query_cost": "397.70"
    },
    "table": {
      "table_name": "tbl1",
      "access_type": "range",
      "possible_keys": [
        "cd1",
        "cd2"
      ],
      "key": "cd2",
      "used_key_parts": [
        "col2",
        "col5"
      ],
      "key_length": "1039",
      "rows_examined_per_scan": 1221,
      "rows_produced_per_join": 915,
      "filtered": "75.00",
      "using_index": true,
      "cost_info": {
        "read_cost": "306.13",
        "eval_cost": "91.58",
        "prefix_cost": "397.70",
        "data_read_per_join": "944K"
      },
      "used_columns": [
        "col2",
        "col3",
        "col4",
        "col5"
      ],
      "attached_condition": "((`test`.`tbl1`.`col2` = 'HIU') and (`test`.`tbl1`.`col3` > 'FPG') and (`test`.`tbl1`.`col4` > 0.9) and (`test`.`tbl1`.`col5` >= 99874))"
    }
  }
}

The index used cd2 covers the query in question. The used_key_parts field contains only col2 and col5 which form the prefix used to "access" rows from the index. However, as per your revised definition, it should have included col3 and col4. Right?
[31 Dec 2021 4:56] MySQL Verification Team
Hi,

In case of "where c5 = c6" and key that is 5,4,6 IMHO using 5,4,6 is ok as there is only other way to use only 5 (what would be my guess) or 5,4,6. MySQL cannot use 5,6 as composite key is read left to right while eq and last one is range, after range you can't use it any more.

In second case you have "where c2 = CONSTANT AND c3 > RANGE AND c4 > RANGE AND c5 >= RANGE" with key 2,5,3,4 so what can we use c2 is constant so use, and then we have ranges so next column in key is 5 so we use 2,5 as after 5 we can't do anything as 5 was range

In both instances explain is showing what optimizer decided to do. Now why optimizer for e.g. in first instance decided to use whole key and not only the c5 I can't say, I assume cause it is faster to read c6 from index than from the table but it does not need to be that reason... Anyhow it is not a bug, optimizer did decide do execute it like that and explain shown it as is.

But e.g. if your key is in "proper order", as you can't read 5 and 6 if 4 is between, you have to read 4 too but if your key was 5,6 than you have this which is not expected and something I'll discuss with optimizer team after holidays as in this case there is no reason for rest of they key to be used:

mysql [localhost:8027] {msandbox} (b106005) > create table t2 (
    ->   `col1` bigint NOT NULL AUTO_INCREMENT,
    ->   `col2` varchar(512) DEFAULT NULL,
    ->   `col3` varchar(512) DEFAULT NULL,
    ->   `col4` double DEFAULT NULL,
    ->   `col5` bigint DEFAULT NULL,
    ->   `col6` bigint NOT NULL DEFAULT '1',
    ->   PRIMARY KEY (`col1`),
    ->  KEY `properlyorderedkey` (`col5`, `col6`, `col2`, `col3`, `col4`));
Query OK, 0 rows affected (0.05 sec)

mysql [localhost:8027] {msandbox} (b106005) > EXPLAIN format=json SELECT col5 FROM t2 WHERE col5 = col6 \G;
*************************** 1. row ***************************
EXPLAIN: {
  "query_block": {
    "select_id": 1,
    "cost_info": {
      "query_cost": "0.35"
    },
    "table": {
      "table_name": "t2",
      "access_type": "index",
      "key": "properlyorderedkey",
      "used_key_parts": [
        "col5",
        "col6",
        "col2",
        "col3",
        "col4"
      ],
      "key_length": "1056",
      "rows_examined_per_scan": 1,
      "rows_produced_per_join": 1,
      "filtered": "100.00",
      "using_index": true,
      "cost_info": {
        "read_cost": "0.25",
        "eval_cost": "0.10",
        "prefix_cost": "0.35",
        "data_read_per_join": "1K"
      },
      "used_columns": [
        "col5",
        "col6"
      ],
      "attached_condition": "(`b106005`.`t2`.`col6` = `b106005`.`t2`.`col5`)"
    }
  }
}
1 row in set, 1 warning (0.00 sec)
[31 Dec 2021 4:58] MySQL Verification Team
Probably has to with with how columnx=columny is handled as in case of consts:

mysql [localhost:8027] {msandbox} (b106005) > EXPLAIN format=json SELECT col5 FROM t2 WHERE col5 = 10 and col6=20 \G
*************************** 1. row ***************************
EXPLAIN: {
  "query_block": {
    "select_id": 1,
    "cost_info": {
      "query_cost": "0.35"
    },
    "table": {
      "table_name": "t2",
      "access_type": "ref",
      "possible_keys": [
        "properlyorderedkey"
      ],
      "key": "properlyorderedkey",
      "used_key_parts": [
        "col5",
        "col6"
      ],
      "key_length": "17",
      "ref": [
        "const",
        "const"
      ],
      "rows_examined_per_scan": 1,
      "rows_produced_per_join": 1,
      "filtered": "100.00",
      "using_index": true,
      "cost_info": {
        "read_cost": "0.25",
        "eval_cost": "0.10",
        "prefix_cost": "0.35",
        "data_read_per_join": "1K"
      },
      "used_columns": [
        "col5",
        "col6"
      ]
    }
  }
}
1 row in set, 1 warning (0.00 sec)

so might be a bug here, we'll see when the optimizer team is back

all best
Bogdan