MySQL Bugs: #101391: Performance Schema storage engine doesn't use index optimizations on group by

Bug #101391	Performance Schema storage engine doesn't use index optimizations on group by
Submitted:	30 Oct 2020 3:04	Modified:	19 Nov 2020 16:09
Reporter:	Agustín G	Email Updates:
Status:	Verified	Impact on me:	None
Category:	MySQL Server: Performance Schema	Severity:	S5 (Performance)
Version:	8.0, 8.0.22	OS:	Any
Assigned to:	Marc ALFF	CPU Architecture:	Any

Description:
When querying a table using the performance schema storage engine, a simple query with group by will need to create a temporary table, even if it should be possible to not do so, by using an index.
I have checked the manual, but couldn't find anything on this so far, so I'm creating this new bug report.

InnoDB table explain plan and handler data:
mysql [localhost:24249] {msandbox} (test) > explain select mdl_blocker.owner_thread_id
    -> from test.metadata_locks mdl_blocker force index (OWNER_THREAD_ID)
    -> where mdl_blocker.object_type='TABLE'
    ->   and mdl_blocker.lock_type='SHARED_READ'
    ->   and mdl_blocker.lock_status='GRANTED'
    -> group by mdl_blocker.owner_thread_id;
+----+-------------+-------------+------------+-------+-----------------+-----------------+---------+------+------+----------+-------------+
| id | select_type | table       | partitions | type  | possible_keys   | key             | key_len | ref  | rows | filtered | Extra       |
+----+-------------+-------------+------------+-------+-----------------+-----------------+---------+------+------+----------+-------------+
|  1 | SIMPLE      | mdl_blocker | NULL       | index | OWNER_THREAD_ID | OWNER_THREAD_ID | 18      | NULL |    4 |    25.00 | Using where |
+----+-------------+-------------+------------+-------+-----------------+-----------------+---------+------+------+----------+-------------+
1 row in set, 1 warning (0.00 sec)

mysql [localhost:24249] {msandbox} (test) > show status like 'handler%';
+----------------------------+-------+
| Variable_name              | Value |
+----------------------------+-------+
| Handler_commit             | 1     |
| Handler_delete             | 0     |
| Handler_discover           | 0     |
| Handler_external_lock      | 2     |
| Handler_mrr_init           | 0     |
| Handler_prepare            | 0     |
| Handler_read_first         | 1     |
| Handler_read_key           | 1     |
| Handler_read_last          | 0     |
| Handler_read_next          | 4     |
| Handler_read_prev          | 0     |
| Handler_read_rnd           | 0     |
| Handler_read_rnd_next      | 0     |
| Handler_rollback           | 0     |
| Handler_savepoint          | 0     |
| Handler_savepoint_rollback | 0     |
| Handler_update             | 0     |
| Handler_write              | 0     |
+----------------------------+-------+

Performance Schema table explain plan and handler data:
mysql [localhost:24249] {msandbox} (test) > explain select mdl_blocker.owner_thread_id
    -> from performance_schema.metadata_locks mdl_blocker force index (OWNER_THREAD_ID)
    -> where mdl_blocker.object_type='TABLE'
    ->   and mdl_blocker.lock_type='SHARED_READ'
    ->   and mdl_blocker.lock_status='GRANTED'
    -> group by mdl_blocker.owner_thread_id;
+----+-------------+-------------+------------+------+-----------------+------+---------+------+------+----------+------------------------------+
| id | select_type | table       | partitions | type | possible_keys   | key  | key_len | ref  | rows | filtered | Extra                        |
+----+-------------+-------------+------------+------+-----------------+------+---------+------+------+----------+------------------------------+
|  1 | SIMPLE      | mdl_blocker | NULL       | ALL  | OWNER_THREAD_ID | NULL | NULL    | NULL | 1024 |     0.10 | Using where; Using temporary |
+----+-------------+-------------+------------+------+-----------------+------+---------+------+------+----------+------------------------------+
1 row in set, 1 warning (0.00 sec)

mysql [localhost:24249] {msandbox} (test) > show status like 'handler%';
+----------------------------+-------+
| Variable_name              | Value |
+----------------------------+-------+
| Handler_commit             | 0     |
| Handler_delete             | 0     |
| Handler_discover           | 0     |
| Handler_external_lock      | 2     |
| Handler_mrr_init           | 0     |
| Handler_prepare            | 0     |
| Handler_read_first         | 0     |
| Handler_read_key           | 0     |
| Handler_read_last          | 0     |
| Handler_read_next          | 0     |
| Handler_read_prev          | 0     |
| Handler_read_rnd           | 0     |
| Handler_read_rnd_next      | 8     |
| Handler_rollback           | 0     |
| Handler_savepoint          | 0     |
| Handler_savepoint_rollback | 0     |
| Handler_update             | 0     |
| Handler_write              | 4     |
+----------------------------+-------+

Note that without the group by, we would get duplicated rows:
mysql [localhost:24249] {msandbox} (test) > select mdl_blocker.owner_thread_id
    -> from performance_schema.metadata_locks mdl_blocker force index (OWNER_THREAD_ID)
    -> where mdl_blocker.object_type='TABLE'
    ->   and mdl_blocker.lock_type='SHARED_READ'
    ->   and mdl_blocker.lock_status='GRANTED';
+-----------------+
| owner_thread_id |
+-----------------+
|              49 |
|              49 |
|              49 |
|              48 |
+-----------------+
4 rows in set (0.00 sec)

versus:

mysql [localhost:24249] {msandbox} (test) > select mdl_blocker.owner_thread_id
    -> from performance_schema.metadata_locks mdl_blocker force index (OWNER_THREAD_ID)
    -> where mdl_blocker.object_type='TABLE'
    ->   and mdl_blocker.lock_type='SHARED_READ'
    ->   and mdl_blocker.lock_status='GRANTED'
    -> group by mdl_blocker.owner_thread_id;
+-----------------+
| owner_thread_id |
+-----------------+
|              49 |
|              48 |
+-----------------+
2 rows in set (0.01 sec)

How to repeat:
-- Create some tables so we can populate the performance_schema.metadata_locks table with rows, for doing some reproducible tests

use test;
create table t1 (id int primary key);
insert into t1 values (1), (2);
create table t2 (id int primary key);
insert into t2 values (1), (2);
create table t3 (id int primary key);
insert into t3 values (1), (2);

begin;
select * from t1;
select * from t2;
select * from t3;
-- leave this running in one session, and use another for next steps

-- Create copy of performance_schema.metadata_locks table and insert same rows as the one in performance_schema in it

drop table if exists test.metadata_locks;
create table test.metadata_locks(
  OBJECT_TYPE varchar(64) NOT NULL,
  OBJECT_SCHEMA varchar(64) DEFAULT NULL,
  OBJECT_NAME varchar(64) DEFAULT NULL,
  COLUMN_NAME varchar(64) DEFAULT NULL,
  OBJECT_INSTANCE_BEGIN bigint unsigned NOT NULL,
  LOCK_TYPE varchar(32) NOT NULL,
  LOCK_DURATION varchar(32) NOT NULL,
  LOCK_STATUS varchar(32) NOT NULL,
  SOURCE varchar(64) DEFAULT NULL,
  OWNER_THREAD_ID bigint unsigned DEFAULT NULL,
  OWNER_EVENT_ID bigint unsigned DEFAULT NULL,
  PRIMARY KEY (OBJECT_INSTANCE_BEGIN),
  KEY OBJECT_TYPE (OBJECT_TYPE,OBJECT_SCHEMA,OBJECT_NAME,COLUMN_NAME),
  KEY OWNER_THREAD_ID (OWNER_THREAD_ID,OWNER_EVENT_ID)
) ENGINE=InnoDB;

use test;
insert into metadata_locks select * from performance_schema.metadata_locks where OBJECT_SCHEMA <> 'test' or OBJECT_NAME in ('t1','t2','t3');
-- I'm filtering out some rows that are product of the `insert into ... select` query that we are running to populate the test table. This isn't needed, but it's done for clarity, as both tables will have the same rows

-- Get outputs from InnoDB table created as copy, to see how it doesn't need to use a temp table

select * from test.metadata_locks mdl_blocker;

explain select mdl_blocker.owner_thread_id
from test.metadata_locks mdl_blocker force index (OWNER_THREAD_ID)
where mdl_blocker.object_type='TABLE'
  and mdl_blocker.lock_type='SHARED_READ'
  and mdl_blocker.lock_status='GRANTED'
group by mdl_blocker.owner_thread_id;

flush status;
select mdl_blocker.owner_thread_id
from test.metadata_locks mdl_blocker force index (OWNER_THREAD_ID)
where mdl_blocker.object_type='TABLE'
  and mdl_blocker.lock_type='SHARED_READ'
  and mdl_blocker.lock_status='GRANTED'
group by mdl_blocker.owner_thread_id;

show status like 'handler%';

-- Get outputs from Performance Schema table, to see how it uses a temp table

select * from performance_schema.metadata_locks mdl_blocker;

explain select mdl_blocker.owner_thread_id
from performance_schema.metadata_locks mdl_blocker force index (OWNER_THREAD_ID)
where mdl_blocker.object_type='TABLE'
  and mdl_blocker.lock_type='SHARED_READ'
  and mdl_blocker.lock_status='GRANTED'
group by mdl_blocker.owner_thread_id;

flush status;
select mdl_blocker.owner_thread_id
from performance_schema.metadata_locks mdl_blocker force index (OWNER_THREAD_ID)
where mdl_blocker.object_type='TABLE'
  and mdl_blocker.lock_type='SHARED_READ'
  and mdl_blocker.lock_status='GRANTED'
group by mdl_blocker.owner_thread_id;

show status like 'handler%';

Suggested fix:
Implement index usage for group by in Performance Schema storage engine, or document why it's not possible to use it.

Hello Agustín,

Thank you for the report.

thanks,
Umesh

Thanks for the detailed report.

This is actually not a bug, explanations below.

---

When executing a query, the optimizer considers different execution plans, and based on costs estimates for each plan, picks the most efficient one.

Having an index allows the optimizer to consider it, but is by no mean a guarantee that the index WILL be used.

In particular, for a table with very few rows, the optimizer may still decide to perform a full table scan and apply a where clause, instead of using an index, even if a proper index is available.

The fact that the same query, even on the same data, can be executed with different plans using a performance schema table or an innodb table is due to the different costs estimates returned by each engines, which have different characteristics.

As long as the following conditions are true:
- the data returned is correct
- the execution time for the plan chosen by the optimizer is satisfactory
then there is no "bug" in the query execution.

Put another way, to have a confirmed bug:
- either the data returned is incorrect,
- or the execution time fails catastrophically, typically because a wrong execution plan was used, pointing to bad cost estimation.

This does not seem to be the case here.

ref
"When executing a query, the optimizer considers different execution plans, and based on costs estimates for each plan, picks the most efficient one."

I mean, yeah, most of the time it works alright, but it's no secret that it's an imperfect process with some guesswork - it doesn't always pick the most efficient one.  Else we wouldn't need fixes to the optimizer in every release, yeah?

So it sounds like the cost estimate figures for that engine are a bit off.  Which means this is still a bug, but the fix would be to review and tweak those, vs changing the optimizer code.  That sounds easier, the optimizer code is large and complex.

Don't see how "the solution is easier than assumed" makes this "not a bug" though.

Hi Marc,

I think we all very much welcomed indexes in P_S as a result of https://dev.mysql.com/worklog/task/?id=6616, in a hope that things will be now more efficient. Unfortunately, for some reason it appears to be not picking the indexes in all cases, as efficient as for InnoDB engine, which I guess we'd all benefit if could be improved.

Maybe the query provided in the test case does not meet this requirement:
"- or the execution time fails catastrophically, typically because a wrong execution plan was used, pointing to bad cost estimation."

It is true that for small number of rows the optimizer may choose table scan, but I can see the issue applies also to big number of rows as well.
The problem is that a similar P_S query may be fairly fast when there is no active locking situation in the instance. But for high concurrent environment with many tables, same query may become extremely slow.

Here is another query showing more significant speed difference between P_S and InnoDB when we have 10k (empty) tables opened in a few transactions. Interestingly, for this one EXPLAIN shows the index is used, but estimated rows is far different as well as the execution time is ~150x (!) faster on InnoDB. The only visible change in the plan is usage of "Using index condition" for InnoDB.

Test case:
$ for i in {1..10000}; do mysql test -e "create table if not exists t$i (id int primary key)"; done
$ echo "use test; begin;" > trx.sql
$ for i in {1..10000}; do echo "select * from t$i;" >> trx.sql; done 

-- session 1
$ mysql -A 
mysql > source trx.sql

-- session 2
$ mysql -A 
mysql > source trx.sql

-- session 3
$ mysql -A 
mysql > source trx.sql

-- session 4
$ mysql -A 
mysql > source trx.sql

-- Above sessions are kept open

-- session 5

mysql > select count(*) from performance_schema.metadata_locks\G
*************************** 1. row ***************************
count(*): 40809
1 row in set (0.03 sec)

mysql > select thd_blocker.processlist_id
    -> from performance_schema.metadata_locks mdl_blocker
    -> join performance_schema.metadata_locks mdl_exclusive
    -> on mdl_blocker.object_schema=mdl_exclusive.object_schema
    -> and mdl_blocker.object_name=mdl_exclusive.object_name
    -> and mdl_blocker.owner_thread_id!=mdl_exclusive.owner_thread_id
    -> join performance_schema.metadata_locks mdl_blocked
    -> on mdl_blocked.object_schema=mdl_exclusive.object_schema
    -> and mdl_blocked.object_name=mdl_exclusive.object_name
    -> and mdl_blocked.owner_thread_id!=mdl_exclusive.owner_thread_id
    -> join performance_schema.threads thd_blocker
    -> on thd_blocker.thread_id=mdl_blocker.owner_thread_id
    -> where mdl_exclusive.object_type='TABLE'
    -> and mdl_exclusive.lock_type='EXCLUSIVE'
    -> and mdl_exclusive.lock_status='PENDING'
    -> and mdl_blocker.object_type='TABLE'
    -> and mdl_blocker.lock_type='SHARED_READ'
    -> and mdl_blocker.lock_status='GRANTED'
    -> and mdl_blocked.object_type='TABLE'
    -> and mdl_blocked.lock_type='SHARED_READ'
    -> and mdl_blocked.lock_status='PENDING'
    -> group by mdl_blocker.owner_thread_id;
Empty set (1 min 47.90 sec)

Same query for InnoDB engine (while all the locking threads are still there):

mysql > truncate test.metadata_locks; insert into test.metadata_locks select * fro
m performance_schema.metadata_locks;
Query OK, 0 rows affected (0.05 sec)

Query OK, 40823 rows affected (1.87 sec)
Records: 40823  Duplicates: 0  Warnings: 0

mysql > select thd_blocker.processlist_id
    -> from test.metadata_locks mdl_blocker
    -> join test.metadata_locks mdl_exclusive
    -> on mdl_blocker.object_schema=mdl_exclusive.object_schema
    -> and mdl_blocker.object_name=mdl_exclusive.object_name
    -> and mdl_blocker.owner_thread_id!=mdl_exclusive.owner_thread_id
    -> join test.metadata_locks mdl_blocked
    -> on mdl_blocked.object_schema=mdl_exclusive.object_schema
    -> and mdl_blocked.object_name=mdl_exclusive.object_name
    -> and mdl_blocked.owner_thread_id!=mdl_exclusive.owner_thread_id
    -> join performance_schema.threads thd_blocker
    -> on thd_blocker.thread_id=mdl_blocker.owner_thread_id
    -> where mdl_exclusive.object_type='TABLE'
    -> and mdl_exclusive.lock_type='EXCLUSIVE'
    -> and mdl_exclusive.lock_status='PENDING'
    -> and mdl_blocker.object_type='TABLE'
    -> and mdl_blocker.lock_type='SHARED_READ'
    -> and mdl_blocker.lock_status='GRANTED'
    -> and mdl_blocked.object_type='TABLE'
    -> and mdl_blocked.lock_type='SHARED_READ'
    -> and mdl_blocked.lock_status='PENDING'
    -> group by mdl_blocker.owner_thread_id;
Empty set (0.72 sec)

EXPLAIN for P_S:

mysql > explain for connection 10\G
*************************** 1. row ***************************
           id: 1
  select_type: SIMPLE
        table: mdl_blocker
   partitions: NULL
         type: ref
possible_keys: OBJECT_TYPE,OWNER_THREAD_ID
          key: OBJECT_TYPE
      key_len: 258
          ref: const
         rows: 10
     filtered: 1.00
        Extra: Using where; Using temporary
*************************** 2. row ***************************
           id: 1
  select_type: SIMPLE
        table: thd_blocker
   partitions: NULL
         type: eq_ref
possible_keys: PRIMARY
          key: PRIMARY
      key_len: 8
          ref: performance_schema.mdl_blocker.OWNER_THREAD_ID
         rows: 1
     filtered: 100.00
        Extra: NULL
*************************** 3. row ***************************
           id: 1
  select_type: SIMPLE
        table: mdl_exclusive
   partitions: NULL
         type: ref
possible_keys: OBJECT_TYPE
          key: OBJECT_TYPE
      key_len: 776
          ref: const,performance_schema.mdl_blocker.OBJECT_SCHEMA,performance_schema.mdl_blocker.OBJECT_NAME
         rows: 147
     filtered: 0.90
        Extra: Using where
*************************** 4. row ***************************
           id: 1
  select_type: SIMPLE
        table: mdl_blocked
   partitions: NULL
         type: ref
possible_keys: OBJECT_TYPE
          key: OBJECT_TYPE
      key_len: 776
          ref: const,performance_schema.mdl_blocker.OBJECT_SCHEMA,performance_schema.mdl_blocker.OBJECT_NAME
         rows: 147
     filtered: 100.00
        Extra: Using where
4 rows in set (0.00 sec)

EXPLAIN for InnoDB:

mysql > explain for connection 10\G
*************************** 1. row ***************************
           id: 1
  select_type: SIMPLE
        table: mdl_blocker
   partitions: NULL
         type: ref
possible_keys: OBJECT_TYPE,OWNER_THREAD_ID
          key: OBJECT_TYPE
      key_len: 258
          ref: const
         rows: 18995
     filtered: 1.00
        Extra: Using index condition; Using where; Using temporary
*************************** 2. row ***************************
           id: 1
  select_type: SIMPLE
        table: mdl_exclusive
   partitions: NULL
         type: ref
possible_keys: OBJECT_TYPE
          key: OBJECT_TYPE
      key_len: 776
          ref: const,test.mdl_blocker.OBJECT_SCHEMA,test.mdl_blocker.OBJECT_NAME
         rows: 3
     filtered: 1.31
        Extra: Using where
*************************** 3. row ***************************
           id: 1
  select_type: SIMPLE
        table: mdl_blocked
   partitions: NULL
         type: ref
possible_keys: OBJECT_TYPE
          key: OBJECT_TYPE
      key_len: 776
          ref: const,test.mdl_blocker.OBJECT_SCHEMA,test.mdl_blocker.OBJECT_NAME
         rows: 3
     filtered: 1.31
        Extra: Using where
*************************** 4. row ***************************
           id: 1
  select_type: SIMPLE
        table: thd_blocker
   partitions: NULL
         type: eq_ref
possible_keys: PRIMARY
          key: PRIMARY
      key_len: 8
          ref: test.mdl_blocker.OWNER_THREAD_ID
         rows: 1
     filtered: 100.00
        Extra: NULL
4 rows in set (0.00 sec)

Tested on MySQL 8.0.22.

Hi Przemyslaw,

Thank you for the more detailed query and use case.

Based on this, the bug is now verified then.

A couple of remarks:

To analyze, but I think the issue is not really whether to force the optimizer to use an index or not, but rather the execution cost of an actual index fetch in the performance schema.

For a "real" hash index, one can expect a cost in O(1).
For a "real" tree index, one can expect a cost in O(LOG(N))
For a performance schema index (definitively fake then),
the code internally scans the whole table to filter rows.
The filter is simple, but the internal complexity is then O(N) for each fetch,
which makes a full index scan O(N^2) per table.

Performance schema indexes are good to support simple PK or FK lookups,
but fail here when used to implement a join.

With table test.metadata_locks joined 3 times, and N=40K, this is now a problem.

Regards,
-- Marc Alff