MySQL Bugs: #97979: MySQL Feature Request: hash joins should be able to use an index

Bug #97979	MySQL Feature Request: hash joins should be able to use an index
Submitted:	13 Dec 2019 5:15	Modified:	12 Feb 2020 0:50
Reporter:	희윤 윤	Email Updates:
Status:	Not a Bug	Impact on me:	None
Category:	MySQL Server	Severity:	S3 (Non-critical)
Version:	8.0.18	OS:	Any
Assigned to:		CPU Architecture:	Any
Tags:	hash, hash join, hashjoin, parallel

Description:
In version 8.0.18 we tested the function of hash join and parallels.

But hash join can only be done when full scan.

Nested loop joins with index were better than hash joins.

It is essential to modify the function so that you can use index in the hash join.

For parallel , use innodb_parallel_read_threads to set 8 and 1

For select count(*) from table (4.7 billion), there was no difference in speed, and 8 may be slower.

For table management of bulk data, parallels must be performed in various places, such as create index.

How to repeat:
develop these two features

Suggested fix:
develop these two features

Please make a separate request for each request/bug you want to bring to our attention. I will change this issue into the 'Hash Join' request.

Please create a separate bug for the performance issue/difference you observed with innodb parallel read threads.

Here's a new description of the issue, limiting to the hash join request

# Description:
In version 8.0.18 we tested the function of hash join.

But hash join can only be done when doing full table scan.
Nested loop joins with indexes were better than hash joins.

# How to repeat:

__missing__

# Suggested fix:

Add support to Hash Joins to support indexes

Hello  희윤 윤: 

Can you give the example of your queries where nested loop was faster than hash joins?

Please provide all the steps (with it's output) you took to reproduce this behavior.

+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+

| EXPLAIN |

+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+

| -> Inner hash join (i.ITEM_ID = b.ITEM_ID) (cost=329069752927261.88 rows=79082838)

-> Table scan on I (cost=4.17 rows=42102214)

-> Hash

-> Index scan on B using DISP_CTG_ITEM_IX03 (cost=8301335.84 rows=78159647)

|

+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+

1 row in set (0.00 sec)

mysql> SELECT /*+ HASH_JOIN(I,B) */ COUNT(*) FROM ITEM I IGNORE INDEX(PRIMARY) JOIN DISP_CTG_ITEM B IGNORE INDEX( DISP_CTG_ITEM_IX04, DISP_CTG_ITEM_IX02) ON I.ITEM_ID = B.ITEM_ID;

+----------+

| COUNT(*) |

+----------+

| 71839152 |

+----------+

1 row in set (8 min 6.89 sec)

mysql> explain format=tree SELECT COUNT(*) FROM ITEM I JOIN DISP_CTG_ITEM B ON I.ITEM_ID = B.ITEM_ID;

+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+

| EXPLAIN |

+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+

| -> Aggregate: count(0)

-> Nested loop inner join (cost=47335893.88 rows=107230050)

-> Index scan on I using PRIMARY (cost=4742342.98 rows=42102214)

-> Index lookup on B using DISP_CTG_ITEM_IX04 (ITEM_ID=i.ITEM_ID) (cost=0.76 rows=3)

|

+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+

1 row in set (0.00 sec)

mysql> SELECT COUNT(*) FROM ITEM I JOIN DISP_CTG_ITEM B ON I.ITEM_ID = B.ITEM_ID;

+----------+

| COUNT(*) |

+----------+

| 71839152 |

+----------+

1 row in set (5 min 44.29 sec)

Hi,

Thank you for your bug report.

Can you prove your claim about the inefficiency of hash joins by some solid benchmarking ????

hi Sinisa Milivojevic

I left a SQL plan and execution result in the previous comment.

Hello  희윤 윤: 

Can you give the example of your queries where nested loop was faster
than hash joins?

Please provide all the steps (with it's output) you took to reproduce
this behavior.

We have taken a look at what you have sent us and did not find a significant difference in the speed. Optimiser can not predict small differences in the execution speed, which is why we have optimiser switches, options and hints .....

So far, I do not see any bug here.

Please do pay attention to what our Kenny Grip has written in his comments.

No feedback was provided for this bug for over a month, so it is
being suspended automatically. If you are able to provide the
information that was originally requested, please do so and change
the status of the bug back to "Open".

+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+

| EXPLAIN |

+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+

| -> Inner hash join (i.ITEM_ID = b.ITEM_ID) (cost=329069752927261.88 rows=79082838)

-> Table scan on I (cost=4.17 rows=42102214)

-> Hash

-> Index scan on B using DISP_CTG_ITEM_IX03 (cost=8301335.84 rows=78159647)

|

+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+

1 row in set (0.00 sec)

mysql> SELECT /*+ HASH_JOIN(I,B) */ COUNT(*) FROM ITEM I IGNORE INDEX(PRIMARY) JOIN DISP_CTG_ITEM B IGNORE INDEX( DISP_CTG_ITEM_IX04, DISP_CTG_ITEM_IX02) ON I.ITEM_ID = B.ITEM_ID;

+----------+

| COUNT(*) |

+----------+

| 71839152 |

+----------+

1 row in set (8 min 6.89 sec)

mysql> explain format=tree SELECT COUNT(*) FROM ITEM I JOIN DISP_CTG_ITEM B ON I.ITEM_ID = B.ITEM_ID;

+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+

| EXPLAIN |

+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+

| -> Aggregate: count(0)

-> Nested loop inner join (cost=47335893.88 rows=107230050)

-> Index scan on I using PRIMARY (cost=4742342.98 rows=42102214)

-> Index lookup on B using DISP_CTG_ITEM_IX04 (ITEM_ID=i.ITEM_ID) (cost=0.76 rows=3)

|

+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+

1 row in set (0.00 sec)

mysql> SELECT COUNT(*) FROM ITEM I JOIN DISP_CTG_ITEM B ON I.ITEM_ID = B.ITEM_ID;

+----------+

| COUNT(*) |

+----------+

| 71839152 |

+----------+

1 row in set (5 min 44.29 sec)

Hi,

Due to a bug in 8.0.18, can you please use only table format in your explain output ???

Thank you very much in advance,

CREATE TABLE `item` (
  `ITEM_ID` varchar(13) NOT NULL ,
  `ITEM_REG_DIV_CD` varchar(2) DEFAULT NULL ,
  `SPL_VEN_ID` varchar(10) DEFAULT NULL ,
  `LRNK_SPL_VEN_ID` varchar(10) DEFAULT NULL ,
  `BRAND_ID` varchar(10) DEFAULT NULL ,
  `EXUS_ITEM_DIV_CD` varchar(2) DEFAULT NULL ,
  `EXUS_ITEM_DTL_CD` varchar(2) DEFAULT NULL ,
  `WHOUT_SHPPCST_ID` varchar(10) DEFAULT NULL ,
  `FRG_SHPP_PSBL_YN` varchar(1) DEFAULT NULL ,
  `FRG_SHPP_ITEMGR_WGT_CRITN_ID` varchar(10) DEFAULT NULL ,
  `ITEM_TOT_WGT` decimal(10,2) DEFAULT NULL ,
  `L_CAPA_YN` varchar(1) DEFAULT NULL ,
  `SHPP_MAIN_CD` varchar(6) DEFAULT NULL ,
  `SHPP_MTHD_CD` varchar(6) DEFAULT NULL ,
  `DELI_PSBL_YN` varchar(1) DEFAULT NULL ,
  `STD_CTG_ID` varchar(10) DEFAULT NULL ,
  `ITEM_SELL_TYPE_CD` varchar(2) DEFAULT NULL ,
  `ITEM_SELL_TYPE_DTL_CD` varchar(2) DEFAULT NULL ,
  `SPE_SALESTR_NO` varchar(4) DEFAULT NULL ,
  `CPN_APL_EXCL_YN` varchar(1) DEFAULT NULL ,
  `UNCPN_APL_EXCL_YN` varchar(1) DEFAULT NULL ,
  `B2E_PRC_MNG_MTHD_CD` varchar(2) DEFAULT NULL ,
  `ITEM_REG_DTL_CD` varchar(2) DEFAULT NULL ,
  `WHOUT_ADDR_ID` varchar(10) DEFAULT NULL ,
  `SNBK_ADDR_ID` varchar(10) DEFAULT NULL ,
  `RET_SHPPCST_ID` varchar(10) DEFAULT NULL ,
  `EXCH_SHPPCST_ID` varchar(10) DEFAULT NULL ,
  `ITEM_CHRCT_DIV_CD` varchar(2) DEFAULT NULL ,
  `ISMTAR_ADD_SHPPCST_ID` varchar(10) DEFAULT NULL ,
  `JEJU_ADD_SHPPCST_ID` varchar(10) DEFAULT NULL ,
  `TXN_DIV_CD` varchar(6) DEFAULT NULL ,
  `PERDC_SHPP_TGT_YN` varchar(1) DEFAULT NULL ,
  PRIMARY KEY (`ITEM_ID`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8 
;

CREATE TABLE `disp_ctg_item` (
  `DISP_CTG_ID` varchar(10) NOT NULL ,
  `ITEM_ID` varchar(13) NOT NULL ,
  `DISP_ORDR` decimal(8,0) DEFAULT NULL ,
  `REP_DISP_ORDR` decimal(8,0) DEFAULT NULL ,
  PRIMARY KEY (`DISP_CTG_ID`,`ITEM_ID`),
  KEY `DISP_CTG_ITEM_IX02` (`ITEM_ID`,`DISP_CTG_ID`),
  KEY `DISP_CTG_ITEM_IX03` (`DISP_CTG_ID`,`DISP_ORDR`,`ITEM_ID`),
  KEY `DISP_CTG_ITEM_IX04` (`ITEM_ID`,`REP_DISP_ORDR`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8 
;

Thank you.

However, what we asked for is to have EXPLAIN in the table format.

We did not ask for table definitions.

Thanks in advance.