Bug #89182 Optimizer unable to ignore index part entering less optimal query plan
Submitted: 11 Jan 9:51 Modified: 11 Jan 13:09
Reporter: Przemyslaw Malkowski Email Updates:
Status: Verified Impact on me:
None 
Category:MySQL Server: Optimizer Severity:S3 (Non-critical)
Version:5.6.38, 5.7.20, 8.0.3 OS:Any
Assigned to: CPU Architecture:Any

[11 Jan 9:51] Przemyslaw Malkowski
Description:
In simple two columns (with unique values) filtering, when available, MySQL chooses index on single column as it provides optimal plan. However, if only compound key is available, it seems to be unable to ignore the second index column when it is also mentioned in where clause.

And when using second index column is not possible (as not referenced in where), it uses the first column successfully and follows optimal, less costly plan.

How to repeat:
Below tested on 8.0.3, but same results are seen on 5.7.20 and 5.6.38.

mysql > show create table test5\G
*************************** 1. row ***************************
       Table: test5
Create Table: CREATE TABLE `test5` (
  `a` int(11) DEFAULT NULL,
  `b` int(11) DEFAULT NULL,
  `c` int(11) DEFAULT NULL,
  `d` int(11) DEFAULT NULL,
  KEY `a_d` (`a`,`d`),
  KEY `a_b` (`a`,`b`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4
1 row in set (0.11 sec)

$ for i in {1..1000};do mysql -S /tmp/mysql_sandbox12411.sock -umsandbox -pmsandbox test -e "insert into test5 values ($i,$i,$i,$i)"; done

mysql > EXPLAIN format=json select * FROM test5 WHERE a=1 AND b IN (1,2,3,4,5,6) \G
*************************** 1. row ***************************
EXPLAIN: {
  "query_block": {
    "select_id": 1,
    "cost_info": {
      "query_cost": "0.35"
    },
    "table": {
      "table_name": "test5",
      "access_type": "ref",
      "possible_keys": [
        "a_d",
        "a_b"
      ],
      "key": "a_d",
      "used_key_parts": [
        "a"
      ],
      "key_length": "5",
      "ref": [
        "const"
      ],
      "rows_examined_per_scan": 1,
      "rows_produced_per_join": 0,
      "filtered": "50.00",
      "cost_info": {
        "read_cost": "0.25",
        "eval_cost": "0.05",
        "prefix_cost": "0.35",
        "data_read_per_join": "12"
      },
      "used_columns": [
        "a",
        "b",
        "c",
        "d"
      ],
      "attached_condition": "(`test`.`test5`.`b` in (1,2,3,4,5,6))"
    }
  }
}
1 row in set, 1 warning (0.00 sec)

mysql > alter table test5 drop key a_d;
Query OK, 0 rows affected (0.01 sec)
Records: 0  Duplicates: 0  Warnings: 0

mysql > EXPLAIN format=json select * FROM test5 WHERE a=1 AND b IN (1,2,3,4,5,6) \G
*************************** 1. row ***************************
EXPLAIN: {
  "query_block": {
    "select_id": 1,
    "cost_info": {
      "query_cost": "4.21"
    },
    "table": {
      "table_name": "test5",
      "access_type": "range",
      "possible_keys": [
        "a_b"
      ],
      "key": "a_b",
      "used_key_parts": [
        "a",
        "b"
      ],
      "key_length": "10",
      "rows_examined_per_scan": 6,
      "rows_produced_per_join": 6,
      "filtered": "100.00",
      "index_condition": "((`test`.`test5`.`a` = 1) and (`test`.`test5`.`b` in (1,2,3,4,5,6)))",
      "cost_info": {
        "read_cost": "3.61",
        "eval_cost": "0.60",
        "prefix_cost": "4.21",
        "data_read_per_join": "144"
      },
      "used_columns": [
        "a",
        "b",
        "c",
        "d"
      ]
    }
  }
}
1 row in set, 1 warning (0.00 sec)

master [localhost] {msandbox} (test) > alter table test5 add key(a);
Query OK, 0 rows affected (0.03 sec)
Records: 0  Duplicates: 0  Warnings: 0

master [localhost] {msandbox} (test) > EXPLAIN format=json select * FROM test5 WHERE a=1 AND b IN (1,2,3,4,5,6) \G
*************************** 1. row ***************************
EXPLAIN: {
  "query_block": {
    "select_id": 1,
    "cost_info": {
      "query_cost": "0.35"
    },
    "table": {
      "table_name": "test5",
      "access_type": "ref",
      "possible_keys": [
        "a_b",
        "a"
      ],
      "key": "a",
      "used_key_parts": [
        "a"
      ],
      "key_length": "5",
      "ref": [
        "const"
      ],
      "rows_examined_per_scan": 1,
      "rows_produced_per_join": 0,
      "filtered": "50.00",
      "cost_info": {
        "read_cost": "0.25",
        "eval_cost": "0.05",
        "prefix_cost": "0.35",
        "data_read_per_join": "12"
      },
      "used_columns": [
        "a",
        "b",
        "c",
        "d"
      ],
      "attached_condition": "(`test`.`test5`.`b` in (1,2,3,4,5,6))"
    }
  }
}
1 row in set, 1 warning (0.00 sec)

When only (a,b) index is defined, query execution results in 'Handler_read_key' counter equal to number of IN values. And situation is the same even if there are 1K of IN values.

Suggested fix:
If in the above example, using key on 'a' alone provides better plan then when using 'a,b' pair - "access_type": "ref", "query_cost": "0.35" VS "access_type": "range", "query_cost": "4.21", then why optimizer cannot just ignore the 'b' key part from index (a,b) in this case?

When both (a) and (a,b) keys are defined, optimizer chooses (a) anyway.
[11 Jan 13:09] Umesh Shastry
Hello Przemyslaw,

Thank you for the report and test case.

Thanks,
Umesh