Bug #98287 Explanation of hash joins is inconsistent across EXPLAIN formats
Submitted: 19 Jan 2020 22:17 Modified: 20 Jan 2020 15:39
Reporter: Saverio Miroddi Email Updates:
Status: Duplicate Impact on me:
None 
Category:MySQL Server: Optimizer Severity:S3 (Non-critical)
Version:8.0.19 OS:Any
Assigned to: CPU Architecture:Any

[19 Jan 2020 22:17] Saverio Miroddi
Description:
When a hash join is performed, EXPLAIN shows inconsistent information about the joins strategy, across the different formats.

How to repeat:
```
CREATE TABLE hj1 (c1 INT, c2 INT);
CREATE TABLE hj2 (c1 INT, c2 INT);

INSERT INTO hj1 VALUES (1, 1), (2, 2), (3, 3), (4, 4), (5, 5);
INSERT INTO hj2 SELECT * FROM hj1;

# Correct

EXPLAIN FORMAT=TREE SELECT * FROM hj1 JOIN hj2 USING (c1)\G
*************************** 1. row ***************************
EXPLAIN: -> Inner hash join (hj2.c1 = hj1.c1)  (cost=3.50 rows=5)
    -> Table scan on hj2  (cost=0.07 rows=5)
    -> Hash
        -> Table scan on hj1  (cost=0.75 rows=5)

# Wrong

+----+-------------+-------+------------+------+---------------+------+---------+------+------+----------+----------------------------------------------------+
| id | select_type | table | partitions | type | possible_keys | key  | key_len | ref  | rows | filtered | Extra                                              |
+----+-------------+-------+------------+------+---------------+------+---------+------+------+----------+----------------------------------------------------+
|  1 | SIMPLE      | hj1   | NULL       | ALL  | NULL          | NULL | NULL    | NULL |    5 |   100.00 | NULL                                               |
|  1 | SIMPLE      | hj2   | NULL       | ALL  | NULL          | NULL | NULL    | NULL |    5 |    20.00 | Using where; Using join buffer (Block Nested Loop) |
+----+-------------+-------+------------+------+---------------+------+---------+------+------+----------+----------------------------------------------------+

# Wrong

EXPLAIN FORMAT=JSON SELECT * FROM hj1 JOIN hj2 USING (c1)\G
*************************** 1. row ***************************
EXPLAIN: {
  "query_block": {
    "select_id": 1,
    "cost_info": {
      "query_cost": "3.50"
    },
    "nested_loop": [
      {
        "table": {
          "table_name": "hj1",
          "access_type": "ALL",
          "rows_examined_per_scan": 5,
          "rows_produced_per_join": 5,
          "filtered": "100.00",
          "cost_info": {
            "read_cost": "0.25",
            "eval_cost": "0.50",
            "prefix_cost": "0.75",
            "data_read_per_join": "80"
          },
          "used_columns": [
            "c1",
            "c2"
          ]
        }
      },
      {
        "table": {
          "table_name": "hj2",
          "access_type": "ALL",
          "rows_examined_per_scan": 5,
          "rows_produced_per_join": 5,
          "filtered": "20.00",
          "using_join_buffer": "Block Nested Loop",
          "cost_info": {
            "read_cost": "0.25",
            "eval_cost": "0.50",
            "prefix_cost": "3.50",
            "data_read_per_join": "80"
          },
          "used_columns": [
            "c1",
            "c2"
          ],
          "attached_condition": "(`temp`.`hj2`.`c1` = `temp`.`hj1`.`c1`)"
        }
      }
    ]
  }
}

Suggested fix:
Display "Inner Hash Join" in all the EXPLAIN formats.
[20 Jan 2020 13:27] MySQL Verification Team
Hello Mr. Saverio,

Thank you for your bug report.

It turns out that you are right.  I have tried three different types of output and this is what I have got:

id	select_type	table	partitions	type	possible_keys	key	key_len	ref	rows	filtered	Extra
1	SIMPLE	t1	NULL	ALL	NULL	NULL	NULL	NULL	10	100.00	NULL
1	SIMPLE	t2	NULL	ALL	NULL	NULL	NULL	NULL	10	10.00	Using where; Using join buffer (Block Nested Loop)

EXPLAIN
-> Inner hash join (t2.c1 = t1.c1)  (cost=11.50 rows=10)\n    -> Table scan on t2  (cost=0.04 rows=10)\n    -> Hash\n        -> Table scan on t1  (cost=1.25 rows=10)\n

EXPLAIN
{\n  "query_block": {\n    "select_id": 1,\n    "cost_info": {\n      "query_cost": "11.50"\n    },\n    "nested_loop": [\n      {\n        "table": {\n          "table_name": "t1",\n          "access_type": "ALL",\n          "rows_examined_per_scan": 10,\n          "rows_produced_per_join": 10,\n          "filtered": "100.00",\n          "cost_info": {\n            "read_cost": "0.25",\n            "eval_cost": "1.00",\n            "prefix_cost": "1.25",\n            "data_read_per_join": "160"\n          },\n          "used_columns": [\n            "c1",\n            "c2"\n          ]\n        }\n      },\n      {\n        "table": {\n          "table_name": "t2",\n          "access_type": "ALL",\n          "rows_examined_per_scan": 10,\n          "rows_produced_per_join": 10,\n          "filtered": "10.00",\n          "using_join_buffer": "Block Nested Loop",\n          "cost_info": {\n            "read_cost": "0.25",\n            "eval_cost": "1.00",\n            "prefix_cost": "11.50",\n            "data_read_per_join": "160"\n          },\n          "used_columns": [\n            "c1",\n            "c2"\n          ],\n          "attached_condition": "(`test`.`t2`.`c1` = `test`.`t1`.`c1`)"\n        }\n      }\n    ]\n  }\n}

It turns out that TREE format is wrong as the tables are too small for hash join to be used. Instead, nested loop join is used, which is quite evident.

Version 5.7 is not affected for very good reason.

Verified as reported.
[20 Jan 2020 15:37] Steinar Gunderson
This bug is a duplicate of #30444550, which will be fixed in 8.0.20.

FORMAT=TREE is (always) correct. There is no such thing as "too small" for
hash join.
[20 Jan 2020 15:39] Steinar Gunderson
Dup of #97299.
[21 Jan 2020 12:54] MySQL Verification Team
Thank you, Steinar.