Bug #100783 wrong result with hash join
Submitted: 9 Sep 2020 12:29 Modified: 10 Sep 2020 4:50
Reporter: lou shuai (OCA) Email Updates:
Status: Can't repeat Impact on me:
None 
Category:MySQL Server: Optimizer Severity:S2 (Serious)
Version:8.0.19 OS:Any
Assigned to: CPU Architecture:Any
Tags: hash join

[9 Sep 2020 12:29] lou shuai
Description:
Hash Join iterator combine Weedout Iterator will generate wrong result.

How to repeat:
CREATE TABLE a (
  pk INTEGER ,
  col_varchar VARCHAR(1),
  col_varchar_key VARCHAR(1)
)
partition by hash(pk) partitions 11;
CREATE TABLE b (
  pk INTEGER NOT NULL AUTO_INCREMENT,
  col_varchar VARCHAR(1),
  col_varchar_key VARCHAR(1),
  PRIMARY KEY (pk),
  KEY varchar_key (col_varchar_key)
) 
partition by key() partitions 11;
INSERT INTO a VALUES (1, 'N', '0'), (1, 'N', '0');
INSERT INTO b VALUES (1, '8', 'r'), (2, 'v', 'C'), (3, 'b', 'p'), (4, '7', 'W');

SELECT 1 FROM (b AS table1
    INNER JOIN a  AS table2 ON table2.pk = table1.pk OR table1.col_varchar < 'D')
    WHERE (NOT EXISTS
      (SELECT 1 FROM (b AS alias3 STRAIGHT_JOIN a AS alias4
        ON alias4.col_varchar = alias3.col_varchar_key)
        WHERE alias3.pk >= table1.pk));

Suggested fix:
the qep_tab's rowid_status should set to NEED_TO_CALL_POSITION_FOR_ROWID before HashJoinIterator Constructor is called.
[9 Sep 2020 13:49] lou shuai
hash join bug test file

Attachment: hj_bug.test (application/octet-stream, text), 1.10 KiB.

[9 Sep 2020 14:11] lou shuai
plz ignore the hj_bug.test file. this patch file contains test file, result file. And i tried to fix it

Attachment: hash_join_weedout_patch.diff (application/octet-stream, text), 5.50 KiB.

[9 Sep 2020 15:43] Steinar Gunderson
Hi,

The test case in your .diff passes on 8.0.21 (as well as trunk), and it does not use weedout in either of the query plans. Neither does the one you initially posted.
[10 Sep 2020 3:36] lou shuai
hi,

You are right.
I find the 3e1f79bccfa7 and b64df9f33122 fixed this bug in mysql-8.0.20.

3e1f79bccfa7 fix the wrong rowid_status problem

```diff
@@ -1740,6 +1750,19 @@ static Substructure FindSubstructure(
     is_semijoin = false;
   }

+  // If we found any unhandled duplicates, mark in the QEP_TABs that a row ID is
+  // needed. This will notify iterators (e.g., HashJoinIterator) that they need
+  // to store and restore the row ID.
+  if (*unhandled_duplicates != 0) {
+    qep_tab_map table_range =
+        TablesBetween(first_idx, last_idx) & ~*unhandled_duplicates;
+    for (QEP_TAB *tab : TablesContainedIn(qep_tab->join(), table_range)) {
+      if (tab->rowid_status == NO_ROWID_NEEDED) {
+        tab->rowid_status = NEED_TO_CALL_POSITION_FOR_ROWID;
+      }
+    }
+  }
```

and b64df9f33122 modified the execute plan, use limit 1 iterator to remove the weedout iterator.
[10 Sep 2020 4:50] MySQL Verification Team
Thank you for the report.
If I understood Steinar correctly, this issue is already fixed and no longer seen in latest available GA builds.  Closing this as can't repeat for now. Thank you!

regards,
Umesh