Bug #108621 mysql 8.0 Chapter (8.2.1.4 Hash Join Optimization) Document content error
Submitted: 27 Sep 2022 8:38 Modified: 6 Oct 2022 16:25
Reporter: fulang qin Email Updates:
Status: Closed Impact on me:
None 
Category:MySQL Server: Documentation Severity:S3 (Non-critical)
Version:current OS:Any
Assigned to: CPU Architecture:Any
Tags: 8.0, document, hash join

[27 Sep 2022 8:38] fulang qin
Description:
In the current MySQL8.0 documentation, section 8.2.1.4, the paragraph "In MySQL 8.0.20 and later, it is no longer necessary for the join to contain at least one equi-join condition in order for a hash join to be used. This means that the types of queries which can be optimized using hash joins include those in the following list (with examples):".Hash join is used in the following cases. The EXPLAIN in the example of half-join and anti-join does not seem to be correct.
The original:
Semijoin:
mysql> EXPLAIN FORMAT=TREE SELECT * FROM t1 
    ->     WHERE t1.c1 IN (SELECT t2.c2 FROM t2)\G
*************************** 1. row ***************************
EXPLAIN: -> Nested loop inner join
    -> Filter: (t1.c1 is not null)  (cost=0.85 rows=6)
        -> Table scan on t1  (cost=0.85 rows=6)
    -> Single-row index lookup on <subquery2> using <auto_distinct_key> (c2=t1.c1)
        -> Materialize with deduplication
            -> Filter: (t2.c2 is not null)  (cost=0.85 rows=6)
                -> Table scan on t2  (cost=0.85 rows=6)
Antijoin:
mysql> EXPLAIN FORMAT=TREE SELECT * FROM t2 
    ->     WHERE NOT EXISTS (SELECT * FROM t1 WHERE t1.col1 = t2.col1)\G
*************************** 1. row ***************************
EXPLAIN: -> Nested loop antijoin
    -> Table scan on t2  (cost=0.85 rows=6)
    -> Single-row index lookup on <subquery2> using <auto_distinct_key> (c1=t2.c1)
        -> Materialize with deduplication
            -> Filter: (t1.c1 is not null)  (cost=0.85 rows=6)
                -> Table scan on t1  (cost=0.85 rows=6)

The example given here does not seem to work with Hash Join
The actual local test is:
CREATE TABLE t1 (c1 INT, c2 INT);
CREATE TABLE t2 (c1 INT, c2 INT);
CREATE TABLE t3 (c1 INT, c2 INT);

Semijoin:
mysql> explain format=tree select * from t1 where t1.c1 in (select t2.c2 from t2)\G
*************************** 1. row ***************************
EXPLAIN: -> Hash semijoin (t2.c2 = t1.c1)  (cost=0.70 rows=1)
    -> Table scan on t1  (cost=0.35 rows=1)
    -> Hash
        -> Table scan on t2  (cost=0.35 rows=1)
Antijoin:
mysql> explain format=tree select * from t2 where not exists (select * from t1 where t1.c1 =t2.c1 )\G
*************************** 1. row ***************************
EXPLAIN: -> Hash antijoin (t1.c1 = t2.c1)  (cost=0.70 rows=1)
    -> Table scan on t2  (cost=0.35 rows=1)
    -> Hash
        -> Table scan on t1  (cost=0.35 rows=1)

 I wonder if this document test does not use a version later than 8.0.20, or is it my own fault?

 

 

How to repeat:
CREATE TABLE t1 (c1 INT, c2 INT);
CREATE TABLE t2 (c1 INT, c2 INT);
CREATE TABLE t3 (c1 INT, c2 INT);

Semijoin:
mysql> explain format=tree select * from t1 where t1.c1 in (select t2.c2 from t2)\G
*************************** 1. row ***************************
EXPLAIN: -> Hash semijoin (t2.c2 = t1.c1)  (cost=0.70 rows=1)
    -> Table scan on t1  (cost=0.35 rows=1)
    -> Hash
        -> Table scan on t2  (cost=0.35 rows=1)
Antijoin:
mysql> explain format=tree select * from t2 where not exists (select * from t1 where t1.c1 =t2.c1 )\G
*************************** 1. row ***************************
EXPLAIN: -> Hash antijoin (t1.c1 = t2.c1)  (cost=0.70 rows=1)
    -> Table scan on t2  (cost=0.35 rows=1)
    -> Hash
        -> Table scan on t1  (cost=0.35 rows=1)

Suggested fix:
Please test whether the problem is with the document
[27 Sep 2022 8:48] MySQL Verification Team
Hello fulang qin,

Thank you for the report and feedback.

regards,
Umesh
[6 Oct 2022 16:25] Jon Stephens
I have updated the example output using an 8.0.31 server and client.

Thanks!

Closed.