MySQL Bugs: #97302: Force Hash Joins even if index is available

Bug #97302	Force Hash Joins even if index is available
Submitted:	20 Oct 2019 17:17	Modified:	21 Oct 2019 14:43
Reporter:	Tibor Korocz	Email Updates:
Status:	Verified	Impact on me:	None
Category:	MySQL Server: Optimizer	Severity:	S4 (Feature request)
Version:	8.0	OS:	Any
Assigned to:		CPU Architecture:	Any

Description:
Hi, 

Optimiser only uses Hash Joins if there is no available index, but based on some testing Hash Join can be actually faster the normal joins based on indexes. 

An optimizer hint to force to use Hash Joins even if there are indexes would be a nice feature. 

How to repeat:
CREATE TABLE `t1` (
  `id` int(11) NOT NULL AUTO_INCREMENT ,
  `c1` int(11) NOT NULL DEFAULT '0',
  `c2` int(11) NOT NULL DEFAULT '0',
  PRIMARY KEY (`id`),
  KEY `idx_c1` (`c1`)
) ENGINE=InnoDB;

CREATE TABLE `t2` (
  `id` int(11) NOT NULL AUTO_INCREMENT ,
  `c1` int(11) NOT NULL DEFAULT '0',
  `c2` int(11) NOT NULL DEFAULT '0',
  PRIMARY KEY (`id`),
  KEY `idx_c1` (`c1`)
) ENGINE=InnoDB;

insert into t1 values (1,1,1),(2,2,2),(3,3,3),(4,4,4);
insert into t2 values (1,1,1),(2,2,2),(3,3,3),(4,4,4);

Repeat the following queries until tables get big enough:
INSERT INTO t1 (c1,c2) SELECT FLOOR( 1 + RAND( ) * 1000 ),FLOOR( 1 + RAND( ) * 1000 ) FROM t1; 
INSERT INTO t2 (c1,c2) SELECT FLOOR( 1 + RAND( ) * 1000 ),FLOOR( 1 + RAND( ) * 1000 ) FROM t2; 

Query with Hash Join:

mysql> select count(*) from t1 join t2 on t1.c2 = t2.c2;
+----------+
| count(*) |
+----------+
| 17172231 |
+----------+
1 row in set (0.73 sec)

Query based on indexes:

create index idx_c2 on t1(c2);
create index idx_c2 on t2(c2); 
mysql> select count(*) from t1 join t2 on t1.c2 = t2.c2;
+----------+
| count(*) |
+----------+
| 17172231 |
+----------+
1 row in set (2.62 sec)

As we can see Hash Join takes 0.73s while normal Join takes 2.6s

Suggested fix:
Add an optimiser hint to force to use Hash Joins.

Hello Tibor Korocz,

Thank you for the feature request!

regards,
Umesh

Hi, 

Some extra informations, I was able to force the optimiser to use Hash Joins even if an index is available by using ignore index:

mysql> explain format=tree select count(*) from t1 ignore index (idx_c2) join t2 ignore index (idx_c2) on t1.c2 = t2.c2 where t1.c2=t2.c2\G
*************************** 1. row ***************************
EXPLAIN: -> Aggregate: count(0)
-> Inner hash join (t2.c2 = t1.c2) (cost=1728502115.04 rows=17336898)
-> Table scan on t2 (cost=0.00 rows=131472)
-> Hash
-> Table scan on t1 (cost=13219.45 rows=131472)
 
1 row in set (0.00 sec)
 
I still think would be nice if I can tell to the optimiser with a hint to use Hash Joins even if an index is available, so we do not have to ignore indexes on all the tables manually.