Bug #78029 MISSING ROWS FROM RESULT OF SEMIJOIN WITH UTF8 VARCHAR AND STRAIGHT_JOIN
Submitted: 11 Aug 2015 19:30 Modified: 26 Aug 2015 13:20
Reporter: John Embretsen Email Updates:
Status: Closed Impact on me:
None 
Category:MySQL Server: Optimizer Severity:S3 (Non-critical)
Version:5.7.5 OS:Any
Assigned to: CPU Architecture:Any

[11 Aug 2015 19:30] John Embretsen
Description:
The following query returns wrong results with default optimizer settings in 5.7.5 and above:

SELECT *
FROM where_subselect_25321
WHERE (field1, col_varchar_255_utf8_key) IN (
  SELECT
    STRAIGHT_JOIN table2.col_varchar_255_utf8_key AS field1,
    table2.col_varchar_255_utf8_key
  FROM t1 AS table1
    LEFT JOIN t1 AS table2
    ON table1.col_varchar_255_latin1_key > table2.col_varchar_10_utf8_key
);

Results with 5.7.4 or earlier, and with 5.7.5 or later when STRAIGHT_JOIN is removed or optimizer_switch='materialization=off':

+--------+--------------------------+
| field1 | col_varchar_255_utf8_key |
+--------+--------------------------+
| so     | so                       |
| PKRWA  | PKRWA                    |
| a      | a                        |
+--------+--------------------------+
3 rows in set (0.00 sec)

Results with above query with 5.7.5 or later using default settings:

+--------+--------------------------+
| field1 | col_varchar_255_utf8_key |
+--------+--------------------------+
| a      | a                        |
+--------+--------------------------+
1 row in set (0.01 sec)

Two rows are missing.

Verified against
  mysql-5.7.5 DMR,
  mysql-5.7 commit 476a6d7 (Aug 07) and
  mysql-trunk commit c8c9abc (Aug 10)

Not reproducible against 5.7.4 DMR, 5.6 nor 5.5.

How to repeat:
CREATE TABLE t1 (
  col_varchar_255_latin1_key varchar(255) DEFAULT NULL,
  col_varchar_255_utf8_key varchar(255) CHARACTER SET utf8 DEFAULT NULL,
  col_varchar_10_utf8_key varchar(10) CHARACTER SET utf8 DEFAULT NULL,
  KEY col_varchar_255_latin1_key (col_varchar_255_latin1_key),
  KEY col_varchar_255_utf8_key (col_varchar_255_utf8_key),
  KEY col_varchar_10_utf8_key (col_varchar_10_utf8_key)
);

INSERT INTO t1 VALUES ('f','so','had');
INSERT INTO t1 VALUES ('back','PKRWA','sxa');
INSERT INTO t1 VALUES ('WQHYL','a','just');

SELECT
    STRAIGHT_JOIN table2.col_varchar_255_utf8_key AS field1,
    table2.col_varchar_255_utf8_key
FROM t1 AS table1
  LEFT JOIN t1 AS table2
  ON table1.col_varchar_255_latin1_key > table2.col_varchar_10_utf8_key
;

CREATE TABLE where_subselect_25321
  SELECT
    STRAIGHT_JOIN table2.col_varchar_255_utf8_key AS field1,
    table2.col_varchar_255_utf8_key
  FROM t1 AS table1
    LEFT JOIN t1 AS table2
    ON table1.col_varchar_255_latin1_key > table2.col_varchar_10_utf8_key
;

SELECT *
FROM where_subselect_25321
WHERE (field1, col_varchar_255_utf8_key) IN (
  SELECT
    STRAIGHT_JOIN table2.col_varchar_255_utf8_key AS field1,
    table2.col_varchar_255_utf8_key
  FROM t1 AS table1
    LEFT JOIN t1 AS table2
    ON table1.col_varchar_255_latin1_key > table2.col_varchar_10_utf8_key
);

-- Then repeat the same query without STRAIGHT_JOIN

-- Then repeat the same query after setting:
SET optimizer_switch='materialization=off';
[26 Aug 2015 13:20] Erlend Dahl
Fixed as

Bug#21055139 SUBQUERY HAVING COUNT WITH GROUP BY GIVES INCORRECT RESULT
[4 Feb 2016 19:48] Paul DuBois
Noted in 5.7.9, 5.8.0 changelogs.

Subqueries having COUNT() with GROUP BY could yield incorrect
results.