| Bug #116612 | Performance degradation of distinct operator from 5.7 to 8.0 | ||
|---|---|---|---|
| Submitted: | 11 Nov 2024 11:46 | Modified: | 11 Nov 2024 14:07 |
| Reporter: | Jingqi Tian (OCA) | Email Updates: | |
| Status: | Verified | Impact on me: | |
| Category: | MySQL Server: Optimizer | Severity: | S5 (Performance) |
| Version: | 8.0.40, 8.0.41, 8.4.4 | OS: | Any |
| Assigned to: | CPU Architecture: | Any | |
| Tags: | regression | ||
[11 Nov 2024 14:07]
MySQL Verification Team
Hello Jingqi Tian, Thank you for the report and test case. regards, Umesh
[7 May 7:17]
MySQL Verification Team
Hello Jingqi Tian, Internally development proposed a workaround, I'm quoting here for your reference. Posted By Developer ================== Given the query: select distinct t1.* from t1 left join t2 on t1.col1 = t2.col1 left join t3 on t1.col1 = t3.col1 left join t4 on t1.col1 = t4.col1; The only table referred to in the SELECT list and WHERE clause is t1. Since t1 is also the only table on the left side of a LEFT JOIN, we know the join will return all rows in t1, regardless of whether there are any matches in the other tables. Since DISTINCT is specified, we don't need to know how many rows match. So we can remove all the different tables and reduce the query to this: For example: select distinct t1.* from t1 left join t2 on t1.col1 = t2.col1 left join t3 on t1.col1 = t3.col1 left join t4 on t1.col1 = t4.col1; The result is the same as the result of select distinct t1.* from t1; Hence rest of the joins and table access are not needed in the given test case. regards, Umesh

Description: Performance degradation of distinct operator from 5.7 to 8.0. How to repeat: 1. Create table CREATE TABLE t1 ( id int PRIMARY KEY, col1 int ); CREATE TABLE t2 ( id int PRIMARY KEY, col1 int, col2 varchar(1000), col3 varchar(1000), col4 varchar(1000), KEY index_col1 (col1) ); CREATE TABLE t3 ( id int PRIMARY KEY, col1 int, col2 varchar(1000), col3 varchar(1000), col4 varchar(1000), KEY index_col1 (col1) ); CREATE TABLE t4 ( id int PRIMARY KEY, col1 int, col2 varchar(1000), col3 varchar(1000), col4 varchar(1000), KEY index_col1 (col1) ); 2. Create procedure CREATE PROCEDURE insert_data () BEGIN DECLARE num int; SET num = 1; WHILE num < 10000 DO INSERT INTO t1 VALUES (num, num / 100); IF num < 1000 THEN INSERT INTO t2 VALUES (num, num / 10, repeat('a', 1000), repeat('a', 1000), repeat('a', 1000)); INSERT INTO t3 VALUES (num, num / 10, repeat('a', 1000), repeat('a', 1000), repeat('a', 1000)); INSERT INTO t4 VALUES (num, num / 10, repeat('a', 1000), repeat('a', 1000), repeat('a', 1000)); END IF; SET num = num + 1; END WHILE; END; 3. Call procedure call insert_data(); 4. Execute the same sql on MySQL 5.7 and MySQL 8.0 select distinct t1.* from t1 left join t2 on t1.col1 = t2.col1 left join t3 on t1.col1 = t3.col1 left join t4 on t1.col1 = t4.col1; On MySQL 5.7, it takes 0.04s, while on MySQL 8.0, it takes 4.36s.