Bug #86610 Mysql 5.6 and 5.7.17 enterprise "semi join" results are different
Submitted: 7 Jun 2017 14:36 Modified: 14 Jun 2017 10:55
Reporter: Weihao Luo Email Updates:
Status: Verified Impact on me:
None 
Category:MySQL Server: Optimizer Severity:S3 (Non-critical)
Version:5.7 OS:Linux (centos7)
Assigned to: CPU Architecture:Any
Tags: semi join

[7 Jun 2017 14:36] Weihao Luo
Description:
The 5.7 version of the sub queries in the change to "semi join" after conversion results changed in version 5.6 , sub query in the change to "semi join" after conversion results have not changed
The 5.7 version of the modify Parameters "semijoin=off", the results of the query will return to normal.

How to repeat:
1. 5.7.17-enterprise Optimizer_switch Parameters
index_merge=on,index_merge_union=on,index_merge_sort_union=on,index_merge_intersection=on,engine_condition_pushdown=on,index_condition_pushdown=on,mrr=on,mrr_cost_based=on,block_nested_loop=on,batched_key_access=off,materialization=on,semijoin=on,loosescan=on,firstmatch=on,duplicateweedout=on,subquery_materialization_cost_based=on,use_index_extensions=on,condition_fanout_filter=on,derived_merge=on

2. Table Structure and Data Record
CREATE TABLE `items` (
  `id` int(11) NOT NULL AUTO_INCREMENT,
  `short_description` varchar(255) DEFAULT NULL,
  `description` text,
  `example` text,
  `explanation` text,
  `additional` text,
  PRIMARY KEY (`id`)
) ENGINE=InnoDB AUTO_INCREMENT=5 DEFAULT CHARSET=utf8;

CREATE TABLE `items_links` (
  `iid` int(11) DEFAULT NULL,
  `linkid` int(11) DEFAULT NULL
) ENGINE=InnoDB DEFAULT CHARSET=utf8 ;

INSERT INTO `test`.`items_links` (`iid`, `linkid`) VALUES ('1', '1');
INSERT INTO `test`.`items_links` (`iid`, `linkid`) VALUES ('2', '2');
INSERT INTO `test`.`items_links` (`iid`, `linkid`) VALUES ('3', '3');

INSERT INTO `test`.`items` (`id`, `short_description`, `description`, `example`, `explanation`, `additional`) VALUES ('1', 'aaaaa', NULL, NULL, NULL, NULL);
INSERT INTO `test`.`items` (`id`, `short_description`, `description`, `example`, `explanation`, `additional`) VALUES ('2', 'bbbbb', NULL, NULL, NULL, NULL);
INSERT INTO `test`.`items` (`id`, `short_description`, `description`, `example`, `explanation`, `additional`) VALUES ('3', 'ccccc', NULL, NULL, NULL, NULL);
INSERT INTO `test`.`items` (`id`, `short_description`, `description`, `example`, `explanation`, `additional`) VALUES ('4', 'ddddd', NULL, NULL, NULL, NULL);

3. Table items_links  has not id field name field name (only the iid field name), SQL syntax is error, and that the implementation of this return result is 12

EXPLAIN SELECT COUNT(*) FROM items WHERE id IN (SELECT id FROM items_links) ;
+----+-------------+-------+------------+------+---------------+------+---------+------+------+----------+------------------------------+
| id | select_type | table | partitions | type | possible_keys | key  | key_len | ref  | rows | filtered | Extra                        |
+----+-------------+-------+------------+------+---------------+------+---------+------+------+----------+------------------------------+
|  1 | SIMPLE      | NULL  | NULL       | NULL | NULL          | NULL | NULL    | NULL | NULL |     NULL | Select tables optimized away |
+----+-------------+-------+------------+------+---------------+------+---------+------+------+----------+------------------------------+

show warnings;
+-------+------+-----------------------------------------------------------------------------------------------------------+
| Level | Code | Message                                                                                                   |
+-------+------+-----------------------------------------------------------------------------------------------------------+
| Note  | 1276 | Field or reference 'test.items.id' of SELECT #2 was resolved in SELECT #1                                 |
| Note  | 1003 | /* select#1 */ select count(0) AS `COUNT(*)` from `test`.`items` semi join (`test`.`items_links`) where 1 |
+-------+------+-----------------------------------------------------------------------------------------------------------+

SELECT COUNT(*) FROM items WHERE id IN (SELECT id FROM items_links) ;
+----------+
| COUNT(*) |
+----------+
|       12 |
+----------+

4. Use sql hint rewrite the "semi join" off, the query results are 4
EXPLAIN SELECT /*+ NO_SEMIJOIN(@subq1) */ COUNT(*) FROM items WHERE id IN (SELECT /*+ QB_NAME(subq1) */  id FROM items_links) ;
+----+--------------------+-------------+------------+-------+---------------+---------+---------+------+------+----------+--------------------------+
| id | select_type        | table       | partitions | type  | possible_keys | key     | key_len | ref  | rows | filtered | Extra                    |
+----+--------------------+-------------+------------+-------+---------------+---------+---------+------+------+----------+--------------------------+
|  1 | PRIMARY            | items       | NULL       | index | NULL          | PRIMARY | 4       | NULL |    4 |   100.00 | Using where; Using index |
|  2 | DEPENDENT SUBQUERY | items_links | NULL       | ALL   | NULL          | NULL    | NULL    | NULL |    2 |   100.00 | Using where              |

show warnings;
+-------+------+----------------------------------------------------------------------------------------------------------------------------------------------------------
----------------------------------------------------------------------------------------------------------+
| Level | Code | Message
                                                                                                          |
+-------+------+----------------------------------------------------------------------------------------------------------------------------------------------------------
----------------------------------------------------------------------------------------------------------+
| Note  | 1276 | Field or reference 'test.items.id' of SELECT #2 was resolved in SELECT #1
                                                                                                          |
| Note  | 1003 | /* select#1 */ select /*+ NO_SEMIJOIN(@`subq1`) */ count(0) AS `COUNT(*)` from `test`.`items` where <in_optimizer>(`test`.`items`.`id`,<exists>(/* select
NAME(`subq1`) */ 1 from `test`.`items_links` where (<cache>(`test`.`items`.`id`) = `test`.`items`.`id`)))

SELECT /*+ NO_SEMIJOIN(@subq1) */ COUNT(*) FROM items WHERE id IN (SELECT /*+ QB_NAME(subq1) */  id FROM items_links) ;
+----------+
| COUNT(*) |
+----------+
|        4 |
+----------+

5. mysql 5.6.25 version of the implementation plan is modified to semi join, but the query results are 4
optimizer_switch=index_merge=on,index_merge_union=on,index_merge_sort_union=on,index_merge_intersection=on,engine_condition_pushdown=on,index_condition_pushdown=on,mrr=on,mrr_cost_based=on,block_nested_loop=on,batched_key_access=off,materialization=on,semijoin=on,loosescan=on,firstmatch=on,subquery_materialization_cost_based=on,use_index_extensions=on

explain EXTENDED  SELECT COUNT(*) FROM items WHERE id IN (SELECT id FROM items_links) ;
+----+-------------+-------------+-------+---------------+---------+---------+------+------+----------+-------------------------------------------------------------------+
| id | select_type | table       | type  | possible_keys | key     | key_len | ref  | rows | filtered | Extra                                                             |
+----+-------------+-------------+-------+---------------+---------+---------+------+------+----------+-------------------------------------------------------------------+
|  1 | SIMPLE      | items_links | ALL   | NULL          | NULL    | NULL    | NULL |    2 |   100.00 | Start temporary                                                   |
|  1 | SIMPLE      | items       | index | NULL          | PRIMARY | 4       | NULL |    4 |   100.00 | Using index; End temporary; Using join buffer (Block Nested Loop) |

show warnings;
+-------+------+-----------------------------------------------------------------------------------------------------------+
| Level | Code | Message                                                                                                   |
+-------+------+-----------------------------------------------------------------------------------------------------------+
| Note  | 1276 | Field or reference 'test.items.id' of SELECT #2 was resolved in SELECT #1                                 |
| Note  | 1003 | /* select#1 */ select count(0) AS `COUNT(*)` from `test`.`items` semi join (`test`.`items_links`) where 1

SELECT COUNT(*) FROM items WHERE id IN (SELECT id FROM items_links) ;
+----------+
| COUNT(*) |
+----------+
|        4 |
+----------+
[14 Jun 2017 10:55] Roy Lyseng
The query is probably mistaken: There is an outer reference in the column selected in the subquery.

The query should probably be:

  select count(*) from items where id in (select iid from items_links);

That said, the COUNT result is still wrong, so it seems that during the rewrite to semijoin, the query is actually rewritten into a regular join.