Bug #81316 Optimizer broke for large in clauses
Submitted: 4 May 2016 20:39 Modified: 5 May 2016 7:24
Reporter: David Conger-Eldeen Email Updates:
Status: Closed Impact on me:
None 
Category:MySQL Server: Optimizer Severity:S1 (Critical)
Version:5.7.10,5.7.12 OS:CentOS (CentOS release 6.7 (Final))
Assigned to: CPU Architecture:Any
Tags: clause, DML, IN, Optimizer, statement

[4 May 2016 20:39] David Conger-Eldeen
Description:
Testing an upgrade from 5.6 to 5.7.10 we found an issue with the IN clause which only allowed only 2,098 values before the query plan broke and did a full scan of the PK taking 14 minutes instead of 20s (before it was cached). We upgraded to 5.7.12 and the number increased to 35,855 allowed in an IN clause before it would switch to a full scan of the PK. This one took 12 to 15 minutes instead of around 20s (before it was cached). The table has 916,534,679 rows in it. This is the case for both InnoDB and MyISAM.

The same query runs in around 1 second on 5.6 with the full IN clause of 88,192 values. The section "B.5.7 Known issues in MySQL" states "Subquery optimization for IN is not as effective as for =", but it shouldn't cause full scans of the index in the common part of the optimizer.

NOTES:

* The InnoDB tables were dumped and reloaded so the innodb_file_format = Barracuda and the ROW_FORMAT = Dynamic.
* The MyISAM tables were checked for upgrade and didn't require upgrading.

How to repeat:
For 5.7.12: Create this table with 1.2 million records in it. Note: This is with the InnoDB table, but is the same for MyISAM.

CREATE TABLE test_table (
  id1 bigint(20) NOT NULL,
  id2 char(1) NOT NULL DEFAULT '.',
  id3 char(1) NOT NULL DEFAULT '.',
  id4 char(1) NOT NULL DEFAULT '.',
  val1 varchar(40) DEFAULT NULL,
  val2 varchar(15) DEFAULT NULL,
  val3 varchar(40) DEFAULT NULL,
  ind_id tinyint(4) DEFAULT NULL,
  val4 varchar(5) DEFAULT NULL,
  val5 varchar(5) DEFAULT NULL,
  val6 varchar(5) DEFAULT NULL,
  val7 varchar(5) DEFAULT NULL,
  len1 tinyint(4) DEFAULT '0',
  val8 varchar(5) DEFAULT NULL,
  val9 varchar(5) DEFAULT NULL,
  val10 varchar(5) DEFAULT NULL,
  PRIMARY KEY (id1,id2,id3,id4) USING BTREE
) ENGINE=InnoDB DEFAULT CHARSET=latin1
;

Run this query with EXPLAIN EXTENDED.

explain extended
select SQL_NO_CACHE
    id1,
    id2,
    id3,
    id4,
    ifnull(val1, ''),
    ifnull(val2, ''),
    ifnull(val3, ''),
    ifnull(ind_id, ''),
    ifnull(val4, ''),
    ifnull(val5, ''),
    ifnull(val6, ''),
    ifnull(val7, ''),
    ifnull(len1, '0'),
    ifnull(val8, ''),
    ifnull(val9, ''),
    ifnull(val10, '')
from test_table
where id1 in (<a list of 35,855 ids>)
;

This will return the following:

+----+-------------+------------+------------+-------+---------------+---------+---------+------+-------+----------+-------------+
| id | select_type | table      | partitions | type  | possible_keys | key     | key_len | ref  | rows  | filtered | Extra       |
+----+-------------+------------+------------+-------+---------------+---------+---------+------+-------+----------+-------------+
|  1 | SIMPLE      | test_table | NULL       | range | PRIMARY       | PRIMARY | 8       | NULL | 35855 |   100.00 | Using where |
+----+-------------+------------+------------+-------+---------------+---------+---------+------+-------+----------+-------------+

Now add 1 more id to the list and rerun. The following is returned.

+----+-------------+------------+------------+------+---------------+------+---------+------+---------+----------+-------------+
| id | select_type | table      | partitions | type | possible_keys | key  | key_len | ref  | rows    | filtered | Extra       |
+----+-------------+------------+------------+------+---------------+------+---------+------+---------+----------+-------------+
|  1 | SIMPLE      | test_table | NULL       | ALL  | PRIMARY       | NULL | NULL    | NULL | 1195950 |    50.00 | Using where |
+----+-------------+------------+------------+------+---------------+------+---------+------+---------+----------+-------------+

It scans all records! This works in 5.6.
[4 May 2016 21:09] David Conger-Eldeen
Note: I stated this works in 5.6. To clarify, the optimizer doesn't break in 5.6. We can put as many IDs in the IN clause as max_allowed_packet will allow.
[4 May 2016 22:03] David Conger-Eldeen
Please close this ticket. The set global range_optimizer_max_mem_size=0; fixes the problem.
[4 May 2016 22:06] David Conger-Eldeen
This needs to be added to the general release notes for 5.7. It can affect auto query generator tools.
[5 May 2016 1:33] MySQL Verification Team
so, this is known: 
https://bugs.mysql.com/bug.php?id=78973
https://bugs.mysql.com/bug.php?id=78752
https://bugs.mysql.com/bug.php?id=78907
[5 May 2016 7:24] MySQL Verification Team
Thank you for the report.
Closing the report as requested in [4 May 22:03] David Conger-Eldeen

Imho documentation will be covered as part of related bugs.

https://bugs.mysql.com/bug.php?id=78973
https://bugs.mysql.com/bug.php?id=78752
https://bugs.mysql.com/bug.php?id=78907