| Bug #99244 | bad performance in hash join because join with no join condition between tables | ||
|---|---|---|---|
| Submitted: | 14 Apr 2020 3:22 | Modified: | 21 Apr 2020 16:24 |
| Reporter: | chunyang xu | Email Updates: | |
| Status: | Verified | Impact on me: | |
| Category: | MySQL Server: Optimizer | Severity: | S1 (Critical) |
| Version: | 8.0.19 | OS: | Any |
| Assigned to: | CPU Architecture: | Any | |
| Tags: | bad performance, Cartesian Product temp result., hash join | ||
[14 Apr 2020 13:15]
MySQL Verification Team
Hi Mr. xu,
Thank you for your bug report.
I have run your test case, in several different EXPLAIN formats. It is clear to me that you are not actually correct.
In the first case, we have a linear relationship T1 -> T2 -> T3.
This relationship had to change, since you have added rows to T2, which has no indices. That has changed a relationship from a linear one to the hierarchical one. Hence, what you have now is T2 ->T3 and T2->T1, without any interaction between T1 and T3. Looks something like this:
T2
/ \
T3 T1
So, everything is fine and this is not a bug.
[17 Apr 2020 13:29]
chunyang xu
hell , if you will, you can do a lab like this .
insert into t1 values('a0','name0','addr0');
insert into t1 values('a1','name1','addr1');
insert into t1 values('a2','name2','addr2');
.........
insert into t1 values('a998','name998','addr998');
insert into t1 values('a999','name999','addr999');
T1 table has 1000 records.
then insert recort into t2, 50000 records.
insert into t2 values('a0','name0','addr0');
insert into t2 values('a1','name1','addr1');
insert into t2 values('a2','name2','addr2');
insert into t2 values('a3','name3','addr3');
insert into t2 values('a4','name4','addr4');
....................................
......................
insert into t2 values('a49997','name49997','addr49997');
insert into t2 values('a49998','name49998','addr49998');
insert into t2 values('a49999','name49999','addr49999');
and then for T3, total 4000 record.
insert into t3 values('a0','name0','addr0');
insert into t3 values('a1','name1','addr1');
insert into t3 values('a2','name2','addr2');
insert into t3 values('a3','name3','addr3');
insert into t3 values('a4','name4','addr4');
insert into t3 values('a5','name5','addr5');
..................................
............................
insert into t3 values('a3997','name3997','addr3997');
insert into t3 values('a3998','name3998','addr3998');
insert into t3 values('a3999','name3999','addr3999');
then test the following SQL
first SQL
select count(*) from t1 straight_join t2 straight_join t3 where t1_id=t2_id and t2_addr=t3_addr ;
mysql> select count(*) from t1 straight_join t2 straight_join t3 where t1_id=t2_id and t2_addr=t3_addr ;
+----------+
| count(*) |
+----------+
| 1021 |
+----------+
1 row in set (1.57 sec)
secord SQL
select count(*) from t1 , t2 ,t3 where t1_id=t2_id and t2_addr=t3_addr ;
must wait a long time before the result come out.
[21 Apr 2020 16:24]
MySQL Verification Team
Here is a testcase:
--
drop table if exists t1,t2,t3;
create table t1 ( t1_id varchar(20),t1_name varchar(20) ,t1_addr varchar(20));
create table t2 ( t2_id varchar(20),t2_name varchar(20) ,t2_addr varchar(20));
create table t3 ( t3_id varchar(20),t3_name varchar(20) ,t3_addr varchar(20));
insert into t1 values('a0','name0','addr0'),('a1','name1','addr1'),('a2','name2','addr2');
set @n:=3;
insert into t1 select concat('a',@n:=@n+1),concat('name',@n),concat('addr',@n) from t1 a,t1 b,t1 c,t1 d,t1 e;
insert into t1 select concat('a',@n:=@n+1),concat('name',@n),concat('addr',@n) from t1 a,t1 b,t1 c limit 9150;
select count(*) from t1;
insert into t2 values('a0','name0','addr0'),('a1','name1','addr1'),('a2','name2','addr2');
set @n:=3;
insert into t2 select concat('a',@n:=@n+1),concat('name',@n),concat('addr',@n) from t2 a,t2 b,t2 c,t2 d,t2 e;
insert into t2 select concat('a',@n:=@n+1),concat('name',@n),concat('addr',@n) from t2 a,t2 b,t2 c limit 497540;
select count(*) from t2;
insert into t3 values('a0','name0','addr0'),('a1','name1','addr1'),('a2','name2','addr2');
set @n:=3;
insert into t3 select concat('a',@n:=@n+1),concat('name',@n),concat('addr',@n) from t3 a,t3 b,t3 c,t3 d,t3 e;
insert into t3 select concat('a',@n:=@n+1),concat('name',@n),concat('addr',@n) from t3 a,t3 b,t3 c limit 37540;
select count(*) from t3;
analyze table t1,t2,t3;
explain select count(*) from t1 straight_join t2 straight_join t3 where t1_id=t2_id and t2_addr=t3_addr ;
explain analyze select count(*) from t1 straight_join t2 straight_join t3 where t1_id=t2_id and t2_addr=t3_addr ;
flush status;
select count(*) from t1 straight_join t2 straight_join t3 where t1_id=t2_id and t2_addr=t3_addr ;
show status like 'handler%';
explain select count(*) from t1 , t2 ,t3 where t1_id=t2_id and t2_addr=t3_addr ;
explain analyze select count(*) from t1 , t2 ,t3 where t1_id=t2_id and t2_addr=t3_addr ;
flush status;
select count(*) from t1 , t2 ,t3 where t1_id=t2_id and t2_addr=t3_addr ;
show status like 'handler%';
---
I will attach the outputs shortly.
[21 Apr 2020 16:46]
MySQL Verification Team
output of testcase on 8.0.19
Attachment: o.txt (text/plain), 8.61 KiB.
[21 Apr 2020 17:23]
MySQL Verification Team
what the "perf" looks like for the explain analyze and select queries
Attachment: perf_profile_8.0.19.txt (text/plain), 18.31 KiB.

Description: low performance in hash join because join with no join condition between tables, Temporary rows of Cartesian product quantity causes disk space to run out。 How to repeat: create table t1 ( t1_id varchar(20),t1_name varchar(20) ,t1_addr varchar(20)); create table t2 ( t2_id varchar(20),t2_name varchar(20) ,t2_addr varchar(20)); create table t3 ( t3_id varchar(20),t3_name varchar(20) ,t3_addr varchar(20)); insert into t1 values('a','a','a'); insert into t2 values('a','a','a'); insert into t3 values('a','a','a'); then execute the following SQL: select * from t1,t2,t3 where t1_id=t2_id and t2_addr=t3_addr ; the SQL plan as bollowed -> Inner hash join (t3.t3_addr = t2.t2_addr) (cost=1.05 rows=1) -> Table scan on t3 (cost=0.35 rows=1) -> Hash -> Inner hash join (t2.t2_id = t1.t1_id) (cost=0.70 rows=1) -> Table scan on t2 (cost=0.35 rows=1) -> Hash -> Table scan on t1 (cost=0.35 rows=1) the hash join order of SQL plan is "T1->T2->T3" ,it normal join order. and then , insert more rows into T2, and the SQL plan is changed. insert into t2 values('a','a','a'); insert into t2 values('a','a','a'); insert into t2 values('a','a','a'); insert into t2 values('a','a','a'); insert into t2 values('a','a','a'); insert into t2 values('a','a','a'); insert into t2 values('a','a','a'); insert into t2 values('a','a','a'); insert into t2 values('a','a','a'); insert into t2 values('a','a','a'); -> Inner hash join (t2.t2_addr = t3.t3_addr), (t2.t2_id = t1.t1_id) (cost=2.05 rows=1) -> Table scan on t2 (cost=0.35 rows=11) -> Hash -> Inner hash join (cost=0.70 rows=1) -> Table scan on t3 (cost=0.35 rows=1) -> Hash -> Table scan on t1 (cost=0.35 rows=1) the join order is "T1->T3->T2", but no join condition between T1 and T3, it will product lots of Temporary rows , if t1 and t3 both have much rows , it causes disk space to run out。 Suggested fix: Avoid joins without join conditions between more that two tables join