Description:
Query performance when using DISTINCT and LEFT JOIN on large tables has changed from 5.6 and 5.7 to 8.0.
It does not seem to be related with TempTable storage engine since when changing to MEMORY I've got same performance.
I'll upload the test case to reproduce this issue.
How to repeat:
Below you can see an example where the query drastically increases elapsed time on 8.0:
# 5.7
mysql [localhost:5727] {msandbox} (test) > source ./test_case.sql
...
mysql [localhost:5727] {msandbox} (test) > explain SELECT distinct t1.c1
-> FROM t1
-> LEFT JOIN t2
-> ON t1.c1 = t2.c1
-> LEFT JOIN t3
-> ON t1.c1 = t3.c1
-> WHERE t1.c2 = 15308 \g
+----+-------------+-------+------------+------+---------------+---------+---------+------------+--------+----------+------------------------------+
| id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
+----+-------------+-------+------------+------+---------------+---------+---------+------------+--------+----------+------------------------------+
| 1 | SIMPLE | t1 | NULL | ALL | PRIMARY | NULL | NULL | NULL | 194304 | 10.00 | Using where; Using temporary |
| 1 | SIMPLE | t2 | NULL | ref | PRIMARY | PRIMARY | 4 | test.t1.c1 | 14 | 100.00 | Using index; Distinct |
| 1 | SIMPLE | t3 | NULL | ref | PRIMARY | PRIMARY | 4 | test.t1.c1 | 65 | 100.00 | Using index; Distinct |
+----+-------------+-------+------------+------+---------------+---------+---------+------------+--------+----------+------------------------------+
3 rows in set, 1 warning (0.00 sec)
mysql [localhost:5727] {msandbox} (test) > pager md5sum ;
PAGER set to 'md5sum'
mysql [localhost:5727] {msandbox} (test) > SELECT distinct t1.c1
-> FROM t1
-> LEFT JOIN t2
-> ON t1.c1 = t2.c1
-> LEFT JOIN t3
-> ON t1.c1 = t3.c1
-> WHERE t1.c2 = 15308 \g
ee30c16e7b8af0c7e0403c19d86e477f -
1605 rows in set (0.05 sec)
# 8.0
mysql [localhost:4581] {msandbox} (test) > select @@version ;
+-----------+
| @@version |
+-----------+
| 8.0.21 |
+-----------+
1 row in set (0.00 sec)
mysql [localhost:4581] {msandbox} (test) > source ./test_case.sql
...
mysql [localhost:4581] {msandbox} (test) > explain SELECT distinct t1.c1
-> FROM t1
-> LEFT JOIN t2
-> ON t1.c1 = t2.c1
-> LEFT JOIN t3
-> ON t1.c1 = t3.c1
-> WHERE t1.c2 = 15308 \g
+----+-------------+-------+------------+------+---------------+---------+---------+------------+--------+----------+------------------------------+
| id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
+----+-------------+-------+------------+------+---------------+---------+---------+------------+--------+----------+------------------------------+
| 1 | SIMPLE | t1 | NULL | ALL | PRIMARY | NULL | NULL | NULL | 194304 | 10.00 | Using where; Using temporary |
| 1 | SIMPLE | t2 | NULL | ref | PRIMARY | PRIMARY | 4 | test.t1.c1 | 10 | 100.00 | Using index; Distinct |
| 1 | SIMPLE | t3 | NULL | ref | PRIMARY | PRIMARY | 4 | test.t1.c1 | 72 | 100.00 | Using index; Distinct |
+----+-------------+-------+------------+------+---------------+---------+---------+------------+--------+----------+------------------------------+
3 rows in set, 1 warning (0.00 sec)
mysql [localhost:4581] {msandbox} (test) > pager md5sum ;
PAGER set to 'md5sum'
mysql [localhost:4581] {msandbox} (test) > SELECT distinct t1.c1
-> FROM t1
-> LEFT JOIN t2
-> ON t1.c1 = t2.c1
-> LEFT JOIN t3
-> ON t1.c1 = t3.c1
-> WHERE t1.c2 = 15308 \g
ee30c16e7b8af0c7e0403c19d86e477f -
1605 rows in set (1 min 37.81 sec)