MySQL Bugs: #59259: Incorrect rows returned for a correlated subquery when ICP is on

Bug #59259	Incorrect rows returned for a correlated subquery when ICP is on
Submitted:	3 Jan 2011 12:03	Modified:	10 Feb 2011 23:43
Reporter:	SaiKumar V	Email Updates:
Status:	Closed	Impact on me:	None
Category:	MySQL Server: InnoDB storage engine	Severity:	S2 (Serious)
Version:	5.6.1	OS:	Linux (RedHat x86)
Assigned to:	Marko Mäkelä	CPU Architecture:	Any
Tags:	index_condition_pushdown, optimizer_switch

Description:
With ICP on(deafault) in 5.6.1 the below query returns more rows. 

mysql> SELECT col_int_key  
    -> FROM C OUTR  
    -> WHERE pk  IN (  
    -> SELECT INNR .pk  
    -> FROM CC  JOIN CC INNR  ON INNR .col_int_key=INNR .col_int_key 
    -> WHERE OUTR .col_int_key  )   ;
+-------------+
| col_int_key |
+-------------+
|           0 |
|           0 |
|           1 |
|           2 |
|           5 |
|           5 |
|           6 |
|           9 |
|          53 |
|         166 |
+-------------+
10 rows in set (0.00 sec)

EXPLAIN SELECT col_int_key   FROM C OUTR   WHERE pk  IN (   SELECT INNR .pk   FROM CC  JOIN CC INNR  ON INNR .col_int_key=INNR .col_int_key  WHERE OUTR .col_int_key  );
+----+--------------------+-------+--------+---------------+-------------+---------+------+------+------------------------------------+
| id | select_type        | table | type   | possible_keys | key         | key_len | ref  | rows | Extra                              |
+----+--------------------+-------+--------+---------------+-------------+---------+------+------+------------------------------------+
|  1 | PRIMARY            | OUTR  | index  | NULL          | col_int_key | 5       | NULL |   20 | Using where; Using index           |
|  2 | DEPENDENT SUBQUERY | CC    | index  | NULL          | col_int_key | 5       | NULL |   20 | Using index                        |
|  2 | DEPENDENT SUBQUERY | INNR  | eq_ref | PRIMARY       | PRIMARY     | 4       | func |    1 | Using index condition; Using where |
+----+--------------------+-------+--------+---------------+-------------+---------+------+------+------------------------------------+
3 rows in set (0.00 sec)

mysql> set optimizer_switch='index_condition_pushdown=off';
Query OK, 0 rows affected (0.00 sec)

mysql> SELECT col_int_key   FROM C OUTR   WHERE pk  IN (   SELECT INNR .pk   FROM CC  JOIN CC INNR  ON INNR .col_int_key=INNR .col_int_key  WHERE OUTR .col_int_key  );
+-------------+
| col_int_key |
+-------------+
|           1 |
|           2 |
|           5 |
|           5 |
|           6 |
|           9 |
|          53 |
|         166 |
+-------------+
8 rows in set (0.00 sec)

How to repeat:
Found on release binary,  mysqld Ver 14.14 Distrib 5.6.1-m5

The RQG grammar used was subquery_semijoin.yy , with seed=1293523528.

CREATE TABLE C (
  pk int(11) NOT NULL AUTO_INCREMENT,
  col_int_key int(11) DEFAULT NULL,
  col_datetime_key datetime DEFAULT NULL,
  col_varchar_key varchar(1) DEFAULT NULL,
  PRIMARY KEY (pk),
  KEY col_int_key (col_int_key),
  KEY col_datetime_key (col_datetime_key),
  KEY col_varchar_key (col_varchar_key,col_int_key)
) ENGINE=InnoDB AUTO_INCREMENT=21 DEFAULT CHARSET=latin1;
INSERT INTO C VALUES (1,2,'2004-10-11 18:13:16','w');
INSERT INTO C VALUES (2,9,NULL,'m');
INSERT INTO C VALUES (3,3,'1900-01-01 00:00:00','m');
INSERT INTO C VALUES (4,9,'2009-07-25 00:00:00','k');
INSERT INTO C VALUES (5,NULL,NULL,'r');
INSERT INTO C VALUES (6,9,'2008-07-27 00:00:00','t');
INSERT INTO C VALUES (7,3,'2002-11-13 16:37:31','j');
INSERT INTO C VALUES (8,8,'1900-01-01 00:00:00','u');
INSERT INTO C VALUES (9,8,'2003-12-10 00:00:00','h');
INSERT INTO C VALUES (10,53,'2001-12-21 22:38:22','o');
INSERT INTO C VALUES (11,0,'2008-12-13 23:16:44',NULL);
INSERT INTO C VALUES (12,5,'2005-08-15 12:39:41','k');
INSERT INTO C VALUES (13,166,NULL,'e');
INSERT INTO C VALUES (14,3,'2006-09-11 12:06:14','n');
INSERT INTO C VALUES (15,0,'2007-12-15 12:39:34','t');
INSERT INTO C VALUES (16,1,'2005-08-09 00:00:00','c');
INSERT INTO C VALUES (17,9,'2001-09-02 22:50:02','m');
INSERT INTO C VALUES (18,5,'2005-12-16 22:58:11','y');
INSERT INTO C VALUES (19,6,'2007-04-19 00:19:53','f');
INSERT INTO C VALUES (20,2,'1900-01-01 00:00:00','d');
CREATE TABLE CC (
  pk int(11) NOT NULL AUTO_INCREMENT,
  col_int_key int(11) DEFAULT NULL,
  col_datetime_key datetime DEFAULT NULL,
  col_varchar_key varchar(1) DEFAULT NULL,
  PRIMARY KEY (pk),
  KEY col_int_key (col_int_key),
  KEY col_datetime_key (col_datetime_key),
  KEY col_varchar_key (col_varchar_key,col_int_key)
) ENGINE=InnoDB AUTO_INCREMENT=30 DEFAULT CHARSET=latin1;
INSERT INTO CC VALUES (10,8,'2002-02-26 06:14:37','v');
INSERT INTO CC VALUES (11,9,'1900-01-01 00:00:00','r');
INSERT INTO CC VALUES (12,9,'2006-12-03 09:37:26','a');
INSERT INTO CC VALUES (13,186,'2008-05-26 12:27:10','m');
INSERT INTO CC VALUES (14,NULL,'2004-12-14 16:37:30','y');
INSERT INTO CC VALUES (15,2,'2003-02-11 21:19:41','j');
INSERT INTO CC VALUES (16,3,'2009-10-18 02:27:49','d');
INSERT INTO CC VALUES (17,0,'2000-09-26 07:45:57','z');
INSERT INTO CC VALUES (18,133,NULL,'e');
INSERT INTO CC VALUES (19,1,'2005-11-10 12:40:29','h');
INSERT INTO CC VALUES (20,8,'2009-04-25 00:00:00','b');
INSERT INTO CC VALUES (21,5,'2002-11-27 00:00:00','s');
INSERT INTO CC VALUES (22,5,'2004-01-26 20:32:32','e');
INSERT INTO CC VALUES (23,8,'2007-10-26 11:41:40','j');
INSERT INTO CC VALUES (24,6,'2005-10-07 00:00:00','e');
INSERT INTO CC VALUES (25,51,'2000-07-15 05:00:34','f');
INSERT INTO CC VALUES (26,4,'2000-04-03 16:33:32','v');
INSERT INTO CC VALUES (27,7,NULL,'x');
INSERT INTO CC VALUES (28,6,'2001-04-25 01:26:12','m');
INSERT INTO CC VALUES (29,4,'2000-12-27 00:00:00','c');

 
SELECT col_int_key  
FROM C OUTR  
WHERE pk  IN (  
SELECT INNR .pk  
FROM CC  JOIN CC INNR  ON INNR .col_int_key=INNR .col_int_key 
WHERE OUTR .col_int_key  )   ;

Reduced/simplified version of test case to reproduce the problem:
=================================================================

CREATE TABLE t1 (
  pk INTEGER NOT NULL,
  i1 INTEGER NOT NULL
) ENGINE=InnoDB;

INSERT INTO t1 VALUES (11,0);
INSERT INTO t1 VALUES (12,5);
INSERT INTO t1 VALUES (15,0);

CREATE TABLE t2 (
  pk INTEGER NOT NULL,
  i1 INTEGER NOT NULL,
  PRIMARY KEY (pk)
) ENGINE=InnoDB;

INSERT INTO t2 VALUES (11,1);
INSERT INTO t2 VALUES (12,2);
INSERT INTO t2 VALUES (15,4);

SELECT *
FROM t1  
WHERE pk IN (  
  SELECT it.pk  
  FROM t2 JOIN t2 AS it ON it.i1=it.i1
  WHERE t1.i1);

DROP TABLE t1, t2;

Explain output for the simplified test case:
============================================

id      select_type     table   type    possible_keys   key     key_len ref     rows    Extra
1       PRIMARY t1      ALL     NULL    NULL    NULL    NULL    3       Using where
2       DEPENDENT SUBQUERY      t2      index   NULL    PRIMARY 4       NULL    3       Using index
2       DEPENDENT SUBQUERY      it      eq_ref  PRIMARY PRIMARY 4       func    1       Using index condition

The index condition that gets pushed down to InnoDB when running the simplified test case is pushed down on the table "it" (alias for "t2") for the primary key (pk) index. It contains the following condition:

  `test`.`t1`.`i1`

The cause for the extra rows in the result set is that an index
condition is pushed down to InnoDB on the primary key but this is
never evaluated by InnoDB. So in this case the where condition "WHERE
t1.i1" is never evaluated neither by InnoDB nor the server and thus
the extra rows get included in the result set instead of being
filtered out.

The problem seems to occur in PHASE 2 of row_search_for_mysql() (in
row0sel.c). The retrieval of data from the "it" table (alias for "t2")
satisfy the critiera for using the "adaptive hash index search":

	/*-------------------------------------------------------------*/
	/* PHASE 2: Try fast adaptive hash index search if possible */

	/* Next test if this is the special case where we can use the fast
	adaptive hash index to try the search. Since we must release the
	search system latch when we retrieve an externally stored field, we
	cannot use the adaptive hash index in a search in the case the row
	may be long and there may be externally stored fields */

	if (UNIV_UNLIKELY(direction == 0)
	    && unique_search
	    && dict_index_is_clust(index)
	    && !prebuilt->templ_contains_blob
	    && !prebuilt->used_in_HANDLER
	    && (prebuilt->mysql_row_len < UNIV_PAGE_SIZE / 8)) {

		mode = PAGE_CUR_GE;

		if (trx->mysql_n_tables_locked == 0
		    && prebuilt->select_lock_type == LOCK_NONE
		    && trx->isolation_level > TRX_ISO_READ_UNCOMMITTED
		    && trx->read_view) {

Here InnoDB finds the correct row and will thus first got to the label
"release_search_latch_if_needed" and then to the label "func_exit". By
jumping directly to "func_exit" the code that checks for index
condtions is bypassed and thus the pushed index condition is not
evaluated.

Thank you, Olav! I should have thought about the adaptive hash index when reviewing the ICP code. The adaptive hash index lookup could still make sense with index condition pushdown.

The fix should be simple: when row_sel_try_search_shortcut_for_mysql() returns SEL_FOUND, evaluate prebuilt->idx_cond if it exists. This is an InnoDB bug.

Bug#59302 and Bug#59347 are likely duplicates of this bug.

A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/128515

A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/128516

Pushed into mysql-trunk 5.6.2 (revid:vasil.dimov@oracle.com-20110130165639-1pr3opz839b98q5j) (version source revid:vasil.dimov@oracle.com-20110130165522-m0o6al0pn5ig9kv3) (merge vers: 5.6.2) (pib:24)