Bug #97302 Force Hash Joins even if index is available
Submitted: 20 Oct 2019 17:17 Modified: 21 Oct 2019 14:43
Reporter: Tibor Korocz Email Updates:
Status: Verified Impact on me:
None 
Category:MySQL Server: Optimizer Severity:S4 (Feature request)
Version:8.0 OS:Any
Assigned to: CPU Architecture:Any
Triage: Needs Triage: D5 (Feature request)

[20 Oct 2019 17:17] Tibor Korocz
Description:
Hi, 

Optimiser only uses Hash Joins if there is no available index, but based on some testing Hash Join can be actually faster the normal joins based on indexes. 

An optimizer hint to force to use Hash Joins even if there are indexes would be a nice feature. 

How to repeat:
CREATE TABLE `t1` (
  `id` int(11) NOT NULL AUTO_INCREMENT ,
  `c1` int(11) NOT NULL DEFAULT '0',
  `c2` int(11) NOT NULL DEFAULT '0',
  PRIMARY KEY (`id`),
  KEY `idx_c1` (`c1`)
) ENGINE=InnoDB;

CREATE TABLE `t2` (
  `id` int(11) NOT NULL AUTO_INCREMENT ,
  `c1` int(11) NOT NULL DEFAULT '0',
  `c2` int(11) NOT NULL DEFAULT '0',
  PRIMARY KEY (`id`),
  KEY `idx_c1` (`c1`)
) ENGINE=InnoDB;

insert into t1 values (1,1,1),(2,2,2),(3,3,3),(4,4,4);
insert into t2 values (1,1,1),(2,2,2),(3,3,3),(4,4,4);

Repeat the following queries until tables get big enough:
INSERT INTO t1 (c1,c2) SELECT FLOOR( 1 + RAND( ) * 1000 ),FLOOR( 1 + RAND( ) * 1000 ) FROM t1; 
INSERT INTO t2 (c1,c2) SELECT FLOOR( 1 + RAND( ) * 1000 ),FLOOR( 1 + RAND( ) * 1000 ) FROM t2; 

Query with Hash Join:

mysql> select count(*) from t1 join t2 on t1.c2 = t2.c2;
+----------+
| count(*) |
+----------+
| 17172231 |
+----------+
1 row in set (0.73 sec)

Query based on indexes:

create index idx_c2 on t1(c2);
create index idx_c2 on t2(c2); 
mysql> select count(*) from t1 join t2 on t1.c2 = t2.c2;
+----------+
| count(*) |
+----------+
| 17172231 |
+----------+
1 row in set (2.62 sec)

As we can see Hash Join takes 0.73s while normal Join takes 2.6s

Suggested fix:
Add an optimiser hint to force to use Hash Joins.
[21 Oct 2019 7:30] Umesh Shastry
Hello Tibor Korocz,

Thank you for the feature request!

regards,
Umesh
[21 Oct 2019 14:43] Tibor Korocz
Hi, 

Some extra informations, I was able to force the optimiser to use Hash Joins even if an index is available by using ignore index:

mysql> explain format=tree select count(*) from t1 ignore index (idx_c2) join t2 ignore index (idx_c2) on t1.c2 = t2.c2 where t1.c2=t2.c2\G
*************************** 1. row ***************************
EXPLAIN: -> Aggregate: count(0)
-> Inner hash join (t2.c2 = t1.c2) (cost=1728502115.04 rows=17336898)
-> Table scan on t2 (cost=0.00 rows=131472)
-> Hash
-> Table scan on t1 (cost=13219.45 rows=131472)
 
1 row in set (0.00 sec)
 
I still think would be nice if I can tell to the optimiser with a hint to use Hash Joins even if an index is available, so we do not have to ignore indexes on all the tables manually.