Bug #97979 MySQL Feature Request: hash joins should be able to use an index
Submitted: 13 Dec 2019 5:15 Modified: 7 Jan 13:41
Reporter: 희윤 윤 Email Updates:
Status: Need Feedback Impact on me:
None 
Category:MySQL Server Severity:S3 (Non-critical)
Version:8.0.18 OS:Any
Assigned to: CPU Architecture:Any
Tags: hash, hash join, hashjoin, parallel

[13 Dec 2019 5:15] 희윤 윤
Description:
In version 8.0.18 we tested the function of hash join and parallels.

But hash join can only be done when full scan.

Nested loop joins with index were better than hash joins.

It is essential to modify the function so that you can use index in the hash join.

For parallel , use innodb_parallel_read_threads to set 8 and 1

For select count(*) from table (4.7 billion), there was no difference in speed, and 8 may be slower.

For table management of bulk data, parallels must be performed in various places, such as create index.

How to repeat:
develop these two features

Suggested fix:
develop these two features
[16 Dec 2019 11:52] Kenny Gryp
Please make a separate request for each request/bug you want to bring to our attention. I will change this issue into the 'Hash Join' request.

Please create a separate bug for the performance issue/difference you observed with innodb parallel read threads.
[16 Dec 2019 11:54] Kenny Gryp
Here's a new description of the issue, limiting to the hash join request

# Description:
In version 8.0.18 we tested the function of hash join.

But hash join can only be done when doing full table scan.
Nested loop joins with indexes were better than hash joins.

# How to repeat:

__missing__

# Suggested fix:

Add support to Hash Joins to support indexes
[16 Dec 2019 11:56] Kenny Gryp
Hello  희윤 윤: 

Can you give the example of your queries where nested loop was faster than hash joins?

Please provide all the steps (with it's output) you took to reproduce this behavior.
[18 Dec 2019 0:44] 희윤 윤
+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+

| EXPLAIN |

+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+

| -> Inner hash join (i.ITEM_ID = b.ITEM_ID) (cost=329069752927261.88 rows=79082838)

-> Table scan on I (cost=4.17 rows=42102214)

-> Hash

-> Index scan on B using DISP_CTG_ITEM_IX03 (cost=8301335.84 rows=78159647)

|

+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+

1 row in set (0.00 sec)

mysql> SELECT /*+ HASH_JOIN(I,B) */ COUNT(*) FROM ITEM I IGNORE INDEX(PRIMARY) JOIN DISP_CTG_ITEM B IGNORE INDEX( DISP_CTG_ITEM_IX04, DISP_CTG_ITEM_IX02) ON I.ITEM_ID = B.ITEM_ID;

+----------+

| COUNT(*) |

+----------+

| 71839152 |

+----------+

1 row in set (8 min 6.89 sec)

mysql> explain format=tree SELECT COUNT(*) FROM ITEM I JOIN DISP_CTG_ITEM B ON I.ITEM_ID = B.ITEM_ID;

+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+

| EXPLAIN |

+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+

| -> Aggregate: count(0)

-> Nested loop inner join (cost=47335893.88 rows=107230050)

-> Index scan on I using PRIMARY (cost=4742342.98 rows=42102214)

-> Index lookup on B using DISP_CTG_ITEM_IX04 (ITEM_ID=i.ITEM_ID) (cost=0.76 rows=3)

|

+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+

1 row in set (0.00 sec)

mysql> SELECT COUNT(*) FROM ITEM I JOIN DISP_CTG_ITEM B ON I.ITEM_ID = B.ITEM_ID;

+----------+

| COUNT(*) |

+----------+

| 71839152 |

+----------+

1 row in set (5 min 44.29 sec)
[18 Dec 2019 15:19] Sinisa Milivojevic
Hi,

Thank you for your bug report.

Can you prove your claim about the inefficiency of hash joins by some solid benchmarking ????
[23 Dec 2019 6:15] 희윤 윤
hi Sinisa Milivojevic

I left a SQL plan and execution result in the previous comment.
[7 Jan 13:41] Sinisa Milivojevic
Hello  희윤 윤: 

Can you give the example of your queries where nested loop was faster
than hash joins?

Please provide all the steps (with it's output) you took to reproduce
this behavior.

We have taken a look at what you have sent us and did not find a significant difference in the speed. Optimiser can not predict small differences in the execution speed, which is why we have optimiser switches, options and hints .....

So far, I do not see any bug here.

Please do pay attention to what our Kenny Grip has written in his comments.