Bug #118181 Performance regression on query statements after commit 0fa789a
Submitted: 13 May 12:36 Modified: 1 Jul 14:06
Reporter: jinze si
Status: Verified
Category: MySQL Server: DML
Severity: S5 (Performance)
Version: 8.0.42
OS: Linux
Assigned to:
CPU Architecture: x86
Tags: performance regression

[13 May 12:36] jinze si
Description:
After commit 0fa789a, MySQL shows a performance regression when executing the following query:
```
SELECT DISTINCT u.username,
       COUNT(p.id) AS post_count,
       AVG(LENGTH(p.content)) AS avg_content_length
FROM users u
LEFT JOIN posts p ON u.id = p.user_id
GROUP BY u.username
HAVING COUNT(p.id) > 1
   AND AVG(LENGTH(p.content)) > 100;
```

Commit 0fa789a fixes incorrect behavior with EXPLAIN and subqueries, so we suspect that the fix introduces some additional overhead for this query.

We tested it with mysqlslap, using the following command:
```
./mysqlslap --concurrency=1 --iterations=1 --create-schema=test --query="$mysql_inst" -uroot -S $(pwd)/bin/mysql.sock --number-of-queries=100000
```
The results are as follows:

Before 0fa789a: 
        Average number of seconds to run all queries: 791.807 seconds
        Minimum number of seconds to run all queries: 791.807 seconds
        Maximum number of seconds to run all queries: 791.807 seconds
        Number of clients running queries: 1
        Average number of queries per client: 100000

After 0fa789a: 
        Average number of seconds to run all queries: 828.848 seconds
        Minimum number of seconds to run all queries: 828.848 seconds
        Maximum number of seconds to run all queries: 828.848 seconds
        Number of clients running queries: 1
        Average number of queries per client: 100000
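
To put the two totals in perspective, here is a small sketch that converts them into per-query latency and a relative slowdown. It only uses the numbers reported above; the helper names `per_query_ms` and `relative_slowdown` are made up for this illustration and are not part of any MySQL tooling.

```python
# Totals reported by mysqlslap above: seconds to run all queries with one client.
before_s = 791.807   # before commit 0fa789a
after_s = 828.848    # after commit 0fa789a
queries = 100_000    # --number-of-queries

def per_query_ms(total_s, n):
    """Average latency of a single query in milliseconds."""
    return total_s / n * 1000.0

def relative_slowdown(before, after):
    """Fractional increase in total run time (0.05 means 5% slower)."""
    return (after - before) / before

before_ms = per_query_ms(before_s, queries)
after_ms = per_query_ms(after_s, queries)
slowdown = relative_slowdown(before_s, after_s)

print(f"before: {before_ms:.3f} ms/query")
print(f"after:  {after_ms:.3f} ms/query")
print(f"slowdown: {slowdown:.2%}")
```

This works out to roughly 7.918 ms versus 8.288 ms per query, a slowdown of about 4.68%. Because the test used --iterations=1, the average, minimum, and maximum are the same single measurement, so these figures say nothing about run-to-run variance.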

How to repeat:
Load the pre-populated data we provide, then execute the query statement given above.