Bug #97979 MySQL Feature Request: hash joins should be able to use an index
Submitted: 13 Dec 2019 5:15 Modified: 12 Feb 2020 0:50
Reporter: 희윤 윤 Email Updates:
Status: Not a Bug Impact on me:
None 
Category:MySQL Server Severity:S3 (Non-critical)
Version:8.0.18 OS:Any
Assigned to: CPU Architecture:Any
Tags: hash, hash join, hashjoin, parallel

[13 Dec 2019 5:15] 희윤 윤
Description:
In version 8.0.18 we tested the function of hash join and parallels.

But hash join can only be done when full scan.

Nested loop joins with index were better than hash joins.

It is essential to modify the function so that you can use index in the hash join.

For parallel , use innodb_parallel_read_threads to set 8 and 1

For select count(*) from table (4.7 billion), there was no difference in speed, and 8 may be slower.

For table management of bulk data, parallels must be performed in various places, such as create index.

How to repeat:
develop these two features

Suggested fix:
develop these two features
[16 Dec 2019 11:52] Kenny Gryp
Please make a separate request for each request/bug you want to bring to our attention. I will change this issue into the 'Hash Join' request.

Please create a separate bug for the performance issue/difference you observed with innodb parallel read threads.
[16 Dec 2019 11:54] Kenny Gryp
Here's a new description of the issue, limiting to the hash join request

# Description:
In version 8.0.18 we tested the function of hash join.

But hash join can only be done when doing full table scan.
Nested loop joins with indexes were better than hash joins.

# How to repeat:

__missing__

# Suggested fix:

Add support to Hash Joins to support indexes
[16 Dec 2019 11:56] Kenny Gryp
Hello  희윤 윤: 

Can you give the example of your queries where nested loop was faster than hash joins?

Please provide all the steps (with it's output) you took to reproduce this behavior.
[18 Dec 2019 0:44] 희윤 윤
+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+

| EXPLAIN |

+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+

| -> Inner hash join (i.ITEM_ID = b.ITEM_ID) (cost=329069752927261.88 rows=79082838)

-> Table scan on I (cost=4.17 rows=42102214)

-> Hash

-> Index scan on B using DISP_CTG_ITEM_IX03 (cost=8301335.84 rows=78159647)

|

+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+

1 row in set (0.00 sec)

mysql> SELECT /*+ HASH_JOIN(I,B) */ COUNT(*) FROM ITEM I IGNORE INDEX(PRIMARY) JOIN DISP_CTG_ITEM B IGNORE INDEX( DISP_CTG_ITEM_IX04, DISP_CTG_ITEM_IX02) ON I.ITEM_ID = B.ITEM_ID;

+----------+

| COUNT(*) |

+----------+

| 71839152 |

+----------+

1 row in set (8 min 6.89 sec)

mysql> explain format=tree SELECT COUNT(*) FROM ITEM I JOIN DISP_CTG_ITEM B ON I.ITEM_ID = B.ITEM_ID;

+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+

| EXPLAIN |

+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+

| -> Aggregate: count(0)

-> Nested loop inner join (cost=47335893.88 rows=107230050)

-> Index scan on I using PRIMARY (cost=4742342.98 rows=42102214)

-> Index lookup on B using DISP_CTG_ITEM_IX04 (ITEM_ID=i.ITEM_ID) (cost=0.76 rows=3)

|

+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+

1 row in set (0.00 sec)

mysql> SELECT COUNT(*) FROM ITEM I JOIN DISP_CTG_ITEM B ON I.ITEM_ID = B.ITEM_ID;

+----------+

| COUNT(*) |

+----------+

| 71839152 |

+----------+

1 row in set (5 min 44.29 sec)
[18 Dec 2019 15:19] MySQL Verification Team
Hi,

Thank you for your bug report.

Can you prove your claim about the inefficiency of hash joins by some solid benchmarking ????
[23 Dec 2019 6:15] 희윤 윤
hi Sinisa Milivojevic

I left a SQL plan and execution result in the previous comment.
[7 Jan 2020 13:41] MySQL Verification Team
Hello  희윤 윤: 

Can you give the example of your queries where nested loop was faster
than hash joins?

Please provide all the steps (with it's output) you took to reproduce
this behavior.

We have taken a look at what you have sent us and did not find a significant difference in the speed. Optimiser can not predict small differences in the execution speed, which is why we have optimiser switches, options and hints .....

So far, I do not see any bug here.

Please do pay attention to what our Kenny Grip has written in his comments.
[8 Feb 2020 1:00] Bugs System
No feedback was provided for this bug for over a month, so it is
being suspended automatically. If you are able to provide the
information that was originally requested, please do so and change
the status of the bug back to "Open".
[11 Feb 2020 0:16] 희윤 윤
+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+

| EXPLAIN |

+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+

| -> Inner hash join (i.ITEM_ID = b.ITEM_ID) (cost=329069752927261.88 rows=79082838)

-> Table scan on I (cost=4.17 rows=42102214)

-> Hash

-> Index scan on B using DISP_CTG_ITEM_IX03 (cost=8301335.84 rows=78159647)

|

+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+

1 row in set (0.00 sec)

mysql> SELECT /*+ HASH_JOIN(I,B) */ COUNT(*) FROM ITEM I IGNORE INDEX(PRIMARY) JOIN DISP_CTG_ITEM B IGNORE INDEX( DISP_CTG_ITEM_IX04, DISP_CTG_ITEM_IX02) ON I.ITEM_ID = B.ITEM_ID;

+----------+

| COUNT(*) |

+----------+

| 71839152 |

+----------+

1 row in set (8 min 6.89 sec)

mysql> explain format=tree SELECT COUNT(*) FROM ITEM I JOIN DISP_CTG_ITEM B ON I.ITEM_ID = B.ITEM_ID;

+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+

| EXPLAIN |

+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+

| -> Aggregate: count(0)

-> Nested loop inner join (cost=47335893.88 rows=107230050)

-> Index scan on I using PRIMARY (cost=4742342.98 rows=42102214)

-> Index lookup on B using DISP_CTG_ITEM_IX04 (ITEM_ID=i.ITEM_ID) (cost=0.76 rows=3)

|

+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+

1 row in set (0.00 sec)

mysql> SELECT COUNT(*) FROM ITEM I JOIN DISP_CTG_ITEM B ON I.ITEM_ID = B.ITEM_ID;

+----------+

| COUNT(*) |

+----------+

| 71839152 |

+----------+

1 row in set (5 min 44.29 sec)
[11 Feb 2020 13:24] MySQL Verification Team
Hi,

Due to a bug in 8.0.18, can you please use only table format in your explain output ???

Thank you very much in advance,
[12 Feb 2020 0:50] 희윤 윤
CREATE TABLE `item` (
  `ITEM_ID` varchar(13) NOT NULL ,
  `ITEM_REG_DIV_CD` varchar(2) DEFAULT NULL ,
  `SPL_VEN_ID` varchar(10) DEFAULT NULL ,
  `LRNK_SPL_VEN_ID` varchar(10) DEFAULT NULL ,
  `BRAND_ID` varchar(10) DEFAULT NULL ,
  `EXUS_ITEM_DIV_CD` varchar(2) DEFAULT NULL ,
  `EXUS_ITEM_DTL_CD` varchar(2) DEFAULT NULL ,
  `WHOUT_SHPPCST_ID` varchar(10) DEFAULT NULL ,
  `FRG_SHPP_PSBL_YN` varchar(1) DEFAULT NULL ,
  `FRG_SHPP_ITEMGR_WGT_CRITN_ID` varchar(10) DEFAULT NULL ,
  `ITEM_TOT_WGT` decimal(10,2) DEFAULT NULL ,
  `L_CAPA_YN` varchar(1) DEFAULT NULL ,
  `SHPP_MAIN_CD` varchar(6) DEFAULT NULL ,
  `SHPP_MTHD_CD` varchar(6) DEFAULT NULL ,
  `DELI_PSBL_YN` varchar(1) DEFAULT NULL ,
  `STD_CTG_ID` varchar(10) DEFAULT NULL ,
  `ITEM_SELL_TYPE_CD` varchar(2) DEFAULT NULL ,
  `ITEM_SELL_TYPE_DTL_CD` varchar(2) DEFAULT NULL ,
  `SPE_SALESTR_NO` varchar(4) DEFAULT NULL ,
  `CPN_APL_EXCL_YN` varchar(1) DEFAULT NULL ,
  `UNCPN_APL_EXCL_YN` varchar(1) DEFAULT NULL ,
  `B2E_PRC_MNG_MTHD_CD` varchar(2) DEFAULT NULL ,
  `ITEM_REG_DTL_CD` varchar(2) DEFAULT NULL ,
  `WHOUT_ADDR_ID` varchar(10) DEFAULT NULL ,
  `SNBK_ADDR_ID` varchar(10) DEFAULT NULL ,
  `RET_SHPPCST_ID` varchar(10) DEFAULT NULL ,
  `EXCH_SHPPCST_ID` varchar(10) DEFAULT NULL ,
  `ITEM_CHRCT_DIV_CD` varchar(2) DEFAULT NULL ,
  `ISMTAR_ADD_SHPPCST_ID` varchar(10) DEFAULT NULL ,
  `JEJU_ADD_SHPPCST_ID` varchar(10) DEFAULT NULL ,
  `TXN_DIV_CD` varchar(6) DEFAULT NULL ,
  `PERDC_SHPP_TGT_YN` varchar(1) DEFAULT NULL ,
  PRIMARY KEY (`ITEM_ID`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8 
;

CREATE TABLE `disp_ctg_item` (
  `DISP_CTG_ID` varchar(10) NOT NULL ,
  `ITEM_ID` varchar(13) NOT NULL ,
  `DISP_ORDR` decimal(8,0) DEFAULT NULL ,
  `REP_DISP_ORDR` decimal(8,0) DEFAULT NULL ,
  PRIMARY KEY (`DISP_CTG_ID`,`ITEM_ID`),
  KEY `DISP_CTG_ITEM_IX02` (`ITEM_ID`,`DISP_CTG_ID`),
  KEY `DISP_CTG_ITEM_IX03` (`DISP_CTG_ID`,`DISP_ORDR`,`ITEM_ID`),
  KEY `DISP_CTG_ITEM_IX04` (`ITEM_ID`,`REP_DISP_ORDR`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8 
;
[12 Feb 2020 13:06] MySQL Verification Team
Thank you.

However, what we asked for is to have EXPLAIN in the table format.

We did not ask for table definitions.

Thanks in advance.