MySQL Bugs: #101411: Optimizer uses range predicate on single table query but not on a join

Bug #101411	Optimizer uses range predicate on single table query but not on a join
Submitted:	2 Nov 2020 3:07	Modified:	2 Nov 2020 6:12
Reporter:	Mark Callaghan	Email Updates:
Status:	Verified	Impact on me:	None
Category:	MySQL Server: Optimizer	Severity:	S4 (Feature request)
Version:	8.0.21	OS:	Any
Assigned to:		CPU Architecture:	Any

Description:
This is a simplified example of a query that TSBS, Time Series Benchmark Suite, runs. I have work in progress to get TSBS running for MySQL.

For the test case, there is a table "cpu" that has an index on (id, ts) and a range scan is done with predicates on id and ts for queries like:
select * from cpu where ts > ? and ts < ? and id in (1)
select * from cpu where ts > ? and ts < ? and id in (1, 3)

When the last part of the query is changed to a subquery like:
... and id in (select tags_id from tags where h = ?)

Then a range scan is still done on cpu, but the index predicate (via keylen in explain output) is limited to the first index column (id) unless I add a unique index on tags.h. I assume the benefit of that unique index is to allow the subquery to be executed during optimization.

For the real query I can't add a unique index on tags.h. Therefore, the query plans that I get ends up scanning too much from the cpu table.

How to repeat:
Part 1 -> this does a single table query on cpu to show that MySQL can use predicates on both index columns whether or not the in-list for tags_id is limited to one or more than one value...

drop table if exists tags;
drop table if exists cpu;

create table tags (hostname varchar(256), id int primary key);
create table cpu (tags_id int not null, ts int not null);

insert into tags(hostname, id) values ('h1', 1), ('h2', 2), ('h3', 3), ('h4', 4), ('h5', 5);
insert into cpu(tags_id, ts) values (1,1), (1,2), (1,3), (1,4), (1,5);
insert into cpu(tags_id, ts) values (2,1), (2,2), (2,3), (2,4), (2,5);
insert into cpu(tags_id, ts) values (3,1), (3,2), (3,3), (3,4), (3,5);
insert into cpu(tags_id, ts) values (4,1), (4,2), (4,3), (4,4), (4,5);

create index x on tags(hostname);
analyze table tags;

create index id_ts on cpu(tags_id, ts);
analyze table cpu;

explain select * from cpu where ts > 1 and ts < 4 and tags_id in (1)\G
explain select * from cpu where ts > 1 and ts < 4 and tags_id in (1,3)\G

alter table tags drop index x;
create unique index x on tags(hostname);

explain select * from cpu where ts > 1 and ts < 4 and tags_id in (1)\G
explain select * from cpu where ts > 1 and ts < 4 and tags_id in (1,3)\G

And an example plan is shows that keylen=8, so predicates are used for both indexed columns (equality for id, range for ts)

*************************** 1. row ***************************
           id: 1
  select_type: SIMPLE
        table: cpu
   partitions: NULL
         type: range
possible_keys: id_ts
          key: id_ts
      key_len: 8
          ref: NULL
         rows: 2
     filtered: 100.00
        Extra: Using where; Using index
*************************** 1. row ***************************
           id: 1
  select_type: SIMPLE
        table: cpu
   partitions: NULL
         type: range
possible_keys: id_ts
          key: id_ts
      key_len: 8
          ref: NULL
         rows: 4
     filtered: 100.00
        Extra: Using where; Using index

Part 2 -> this does the real query and the only way to get keylen=8 is to add a unique index on tags.h, but that won't work for the real application.

drop table if exists tags;
drop table if exists cpu;

create table tags (hostname varchar(256), id int primary key);
create table cpu (tags_id int not null, ts int not null);

insert into tags(hostname, id) values ('h1', 1), ('h2', 2), ('h3', 3), ('h4', 4), ('h5', 5);
insert into cpu(tags_id, ts) values (1,1), (1,2), (1,3), (1,4), (1,5);
insert into cpu(tags_id, ts) values (2,1), (2,2), (2,3), (2,4), (2,5);
insert into cpu(tags_id, ts) values (3,1), (3,2), (3,3), (3,4), (3,5);
insert into cpu(tags_id, ts) values (4,1), (4,2), (4,3), (4,4), (4,5);

create index x on tags(hostname);
analyze table tags;

create index id_ts on cpu(tags_id, ts);
analyze table cpu;

explain select * from cpu where ts > 1 and ts < 4 and tags_id in (select id from tags where hostname = 'h1')\G
set optimizer_trace="enabled=on";
select * from cpu where ts > 1 and ts < 4 and tags_id in (select id from tags where hostname = 'h1');
select * from information_schema.optimizer_trace\G
set optimizer_trace="enabled=off";

alter table tags drop index x;
create unique index x on tags(hostname);
analyze table tags;

explain select * from cpu where ts > 1 and ts < 4 and tags_id in (select id from tags where hostname = 'h1')\G
set optimizer_trace="enabled=on";
select * from cpu where ts > 1 and ts < 4 and tags_id in (select id from tags where hostname = 'h1');
select * from information_schema.optimizer_trace\G
set optimizer_trace="enabled=off";

And this is the plan when there isn't a unique index and keylen=4
*************************** 1. row ***************************
           id: 1
  select_type: SIMPLE
        table: tags
   partitions: NULL
         type: ref
possible_keys: PRIMARY,x
          key: x
      key_len: 259
          ref: const
         rows: 1
     filtered: 100.00
        Extra: Using index
*************************** 2. row ***************************
           id: 1
  select_type: SIMPLE
        table: cpu
   partitions: NULL
         type: ref
possible_keys: id_ts
          key: id_ts
      key_len: 4
          ref: t.tags.id
         rows: 5
     filtered: 11.11
        Extra: Using where; Using index

For the real application that I didn't share;

Schema, queries and plan
https://gist.github.com/mdcallag/99ba5feccb62002b2b63ee23bbcb55fa

Optimizer trace
https://gist.github.com/mdcallag/0909abdcc5d3f021a2576960584b384d

Optimizer trace has "not_applicable" for the index that I want 
https://gist.github.com/mdcallag/0909abdcc5d3f021a2576960584b384d#file-gistfile1-txt-L156

Hello Mark,

Thank you for the feedback and reasonable feature request!

Thanks,
Umesh