Bug #9021 Query with subquery does not use index in outer query
Submitted: 7 Mar 2005 13:23 Modified: 26 Mar 2012 19:37
Reporter: Pete French Email Updates:
Status: Closed Impact on me:
None 
Category:MySQL Server: Optimizer Severity:S1 (Critical)
Version:4.1.7, 5.6.4 OS:FreeBSD (FreeBSD)
Assigned to: Sergey Petrunya
Triage: D3 (Medium) / R5 (Severe) / E5 (Major)

[7 Mar 2005 13:23] Pete French
Description:
I have two tables - one of languages, one of customers.
The languages table has, for this example, no indices,
the customer data tbale is indexed on the transaction ID
column 'trans_id'.

I am performing two queries to get all customers speaking irish.

1) select trans_id from trans_languages where generic_lang_code = 'ie';
2) select * from trans_cust_data where trans_id in ( 'XXXX', 'YYYY' );

XXXX and YYYY are the two transaction ID's retruned by query 1.

This is fast. I rewrote it as a sub-query thus:

 select * from trans_cust_data where trans_id in ( select trans_id from trans_languages where generic_lang_code = 'ie' );

This is very slow. The two separate queries complete in 0.04 and 0.07 respectively.
The combined query takes 15.05 seconds to do the same job.

Running 'explain' on the three queries gives these results:

mysql> explain select trans_id from trans_languages where generic_lang_code = 'ie';
+----+-------------+-----------------+------+---------------+------+---------+------+-------+-------------+
| id | select_type | table           | type | possible_keys | key  | key_len | ref  | rows  | Extra       |
+----+-------------+-----------------+------+---------------+------+---------+------+-------+-------------+
|  1 | SIMPLE      | trans_languages | ALL  | NULL          | NULL |    NULL | NULL | 22466 | Using where |
+----+-------------+-----------------+------+---------------+------+---------+------+-------+-------------+

mysql> explain select * from trans_cust_data where trans_id in ( 'XXXX', 'YYYY' );
+----+-------------+-----------------+-------+---------------+---------+---------+------+------+-------------+
| id | select_type | table           | type  | possible_keys | key     | key_len | ref  | rows | Extra       |

+----+-------------+-----------------+-------+---------------+---------+---------+------+------+-------------+
|  1 | SIMPLE      | trans_cust_data | range | PRIMARY       | PRIMARY |      48 | NULL |    2 | Using where |
+----+-------------+-----------------+-------+---------------+---------+---------+------+------+-------------+

mysql> explain select * from trans_cust_data where trans_id in ( select trans_id from trans_languages where generic_lang_code = 'ie' );
+----+--------------------+-----------------+-----------------+---------------+---------+---------+------+--------+--------------------------+
| id | select_type        | table           | type            | possible_keys | key     | key_len | ref  | rows   | Extra                    |
+----+--------------------+-----------------+-----------------+---------------+---------+---------+------+--------+--------------------------+
|  1 | PRIMARY            | trans_cust_data | ALL             | NULL          | NULL    |    NULL | NULL | 976855 | Using where              |
|  2 | DEPENDENT SUBQUERY | trans_languages | unique_subquery | PRIMARY       | PRIMARY |      48 | func |      1 | Using index; Using where |
+----+--------------------+-----------------+-----------------+---------------+---------+---------+------+--------+--------------------------+

So it seems that the subquery example is not using the primary key to pull
out the transaction ID, and it seems to have found an index to use in the
subquery, despite that table having no indices on any of the things being sought by
the 'where' clause.

How to repeat:
See descriptiuon above. This happens with any two tables
aas far as I can make out.

Suggested fix:
For now I am coding all such queries as two operations as
the time difference is substantial, a factor of 136 times faster!
[15 Mar 2005 21:27] Hooman Mozaffari
I had same problem too. 
Some other facts:

  1- It didn’t work even when I forced queries to use a specific index using USE INDEX option.

  2- When I change the query from:  
    SELECT * FROM table1 WHERE field IN (SELECT field FROM table2 WHERE ...) 
    to:
    SELECT * FROM table1 WHERE field = (SELECT field FROM table2 WHERE ...)
    It uses indexes.
[16 Mar 2005 20:00] Pete French
Sadly I cant re-write my query the same way as my subquery returns
multiple rows. Nice to see someone else hit the same issue though.
[18 Mar 2005 15:59] Jorge del Conde
The problem is in the way IN is optimized
[11 May 2005 12:49] Jon Bionda
I disagree with the Severity level of this bug.  Our J2EE app is successfully performance tested on other databases but fails miserably on MySQL because of this bug. How does someone get a bug's severity level increased?
[12 May 2005 15:35] Sergei Golubchik
I increased the severity level.
However, to make this and similar queries faster we need to implement a special optimization for IN subqueries. And it cannot be done before 5.1 (5.0-beta and 4.1-stable are feature-frozen)
[19 Sep 2005 10:31] Georg Richter
This behaviour is the main show stopper in SAP SSQJ Benchmarks: Several parts which use IN and subqueries fail.

Changed Severity to S1 (there is no workaround available, unless someone rewrites ABAP to use JOINS instead of subqueries).
[28 Sep 2005 13:16] Hooman Mozaffari
Just installed MySQL 5.0.13-rc-nt candidate version and still see the bug:

CREATE TABLE `table1` (
  `ID` int(11) NOT NULL,
  `NAME` varchar(10) default NULL,
  PRIMARY KEY  (`ID`)
) ENGINE=MyISAM DEFAULT CHARSET=latin1;

CREATE TABLE `table2` (
  `ID` int(11) NOT NULL,
  `NAME` varchar(10) default NULL,
  PRIMARY KEY  (`ID`)
) ENGINE=MyISAM DEFAULT CHARSET=latin1;

INSERT INTO table1 VALUES(1,'A');
INSERT INTO table1 VALUES(2,'B');
INSERT INTO table1 VALUES(3,'C');

INSERT INTO table2 VALUES(1,'A');
INSERT INTO table2 VALUES(2,'B');

EXPLAIN SELECT * FROM table1 WHERE ID IN (SELECT ID FROM table2 WHERE ID=2);
+----+--------------------+--------+-------+---------------+---------+---------+-------+------+-------------+
| id | select_type        | table  | type  | possible_keys | key     | key_len | ref   | rows | Extra       |
+----+--------------------+--------+-------+---------------+---------+---------+-------+------+-------------+
|  1 | PRIMARY            | table1 | ALL   | NULL          | NULL    | NULL    | NULL  |    3 | Using where |
|  2 | DEPENDENT SUBQUERY | table2 | const | PRIMARY       | PRIMARY | 4       | const |    1 | Using index |
+----+--------------------+--------+-------+---------------+---------+---------+-------+------+-------------+
2 rows in set (0.00 sec)

Optimizer doesn't pick the index in 'table1' (outer query)
[28 Sep 2005 13:36] Valerii Kravchuk
Queries like this one will be more efficient generally, not only in MySQL, if your rewrite them as join instead of (potentially correlated) subquery:

mysql> select version();
+----------------+
| version()      |
+----------------+
| 5.0.12-beta-nt |
+----------------+
1 row in set (0.00 sec)

mysql> EXPLAIN SELECT * FROM table1 WHERE ID IN (SELECT ID FROM table2 WHERE ID=2);
+----+--------------------+--------+-------+---------------+---------+---------+-------+------+-------------+
| id | select_type        | table  | type  | possible_keys | key     | key_len |
 ref   | rows | Extra       |
+----+--------------------+--------+-------+---------------+---------+---------+-------+------+-------------+
|  1 | PRIMARY            | table1 | ALL   | NULL          | NULL    | NULL    |
 NULL  |    3 | Using where |
|  2 | DEPENDENT SUBQUERY | table2 | const | PRIMARY       | PRIMARY | 4       |
 const |    1 | Using index |
+----+--------------------+--------+-------+---------------+---------+---------+-------+------+-------------+
2 rows in set (0.00 sec)

mysql> SELECT * FROM table1 WHERE ID IN (SELECT ID FROM table2 WHERE ID=2);
+----+------+
| ID | NAME |
+----+------+
|  2 | B    |
+----+------+
1 row in set (0.00 sec)

mysql> SELECT table1.* FROM table1, table2 WHERE table1.ID = table2.ID and table2.ID=2;
+----+------+
| ID | NAME |
+----+------+
|  2 | B    |
+----+------+
1 row in set (0.00 sec)

mysql> explain SELECT table1.* FROM table1, table2 WHERE table1.ID = table2.ID a
nd table2.ID=2;
+----+-------------+--------+-------+---------------+---------+---------+-------+------+-------------+
| id | select_type | table  | type  | possible_keys | key     | key_len | ref
| rows | Extra       |
+----+-------------+--------+-------+---------------+---------+---------+-------+------+-------------+
|  1 | SIMPLE      | table1 | const | PRIMARY       | PRIMARY | 4       | const
|    1 |             |
|  1 | SIMPLE      | table2 | const | PRIMARY       | PRIMARY | 4       | const
|    1 | Using index |
+----+-------------+--------+-------+---------------+---------+---------+-------+------+-------------+
2 rows in set (0.00 sec)

So, anybody who cares about performance, will rewrite such a query. Even some optimizers do... Let's wait for MySQL 5.1, as explained before.
[4 Oct 2006 16:34] John Swapceinski
BTW, here is an even earlier reporting of this bug:

http://bugs.mysql.com/bug.php?id=8139
[18 Aug 2007 5:25] Igor Babaev
Currently 5.2 supports:

1. Materialization of a subquery with an access to the materialized
table from outer tables.

2. Inside-out strategy without materialization.
 
The customer needs materialization with an access from the materialized
table.

Currently Sergey Petrunia is actively working on the design of the task that will include this optimization.
We plan it to be ready by the end of September.
[8 Oct 2007 12:18] Tobias Asplund
As per Igor's comments, this works in 5.2:

mysql> use test
Database changed
mysql> CREATE TABLE `table1` (
    ->   `ID` int(11) NOT NULL,
    ->   `NAME` varchar(10) default NULL,
    ->   PRIMARY KEY  (`ID`)
    -> ) ENGINE=MyISAM DEFAULT CHARSET=latin1;
Query OK, 0 rows affected (0.02 sec)

mysql> 
mysql> CREATE TABLE `table2` (
    ->   `ID` int(11) NOT NULL,
    ->   `NAME` varchar(10) default NULL,
    ->   PRIMARY KEY  (`ID`)
    -> ) ENGINE=MyISAM DEFAULT CHARSET=latin1;
Query OK, 0 rows affected (0.03 sec)

mysql> 
mysql> INSERT INTO table1 VALUES(1,'A');
Query OK, 1 row affected (0.02 sec)

mysql> INSERT INTO table1 VALUES(2,'B');
Query OK, 1 row affected (0.00 sec)

mysql> INSERT INTO table1 VALUES(3,'C');
Query OK, 1 row affected (0.00 sec)

mysql> 
mysql> INSERT INTO table2 VALUES(1,'A');
Query OK, 1 row affected (0.00 sec)

mysql> INSERT INTO table2 VALUES(2,'B');
Query OK, 1 row affected (0.00 sec)

mysql> 
mysql> EXPLAIN SELECT * FROM table1 WHERE ID IN (SELECT ID FROM table2 WHERE ID=2);
+----+-------------+--------+-------+---------------+---------+---------+-------+------+-------------+
| id | select_type | table  | type  | possible_keys | key     | key_len | ref   | rows | Extra       |
+----+-------------+--------+-------+---------------+---------+---------+-------+------+-------------+
|  1 | PRIMARY     | table1 | const | PRIMARY       | PRIMARY | 4       | const |    1 |             | 
|  1 | PRIMARY     | table2 | const | PRIMARY       | PRIMARY | 4       | const |    1 | Using index | 
+----+-------------+--------+-------+---------------+---------+---------+-------+------+-------------+
2 rows in set (0.03 sec)
[27 Feb 2008 23:52] Trudy Pelzer
Closed in MySQL 6.0
[2 Nov 2008 18:27] Valerii Kravchuk
Bug #8139 was marked as a duplicate of this one.
[13 Jun 2011 14:46] Shane Bester
I am lead to believe this is fixed in mysql-next-mr-opt-backpointing but not in mysql-5.6 ?
[23 Mar 2012 15:22] Meinolf Schulte-Döinghaus
Hi there,
in between I've tested all current versions in attention to this problem which are 5.0.67, 5.5.20, 5.6.4 M7 and 6.0.11 Alpha. The tests have been done on the InnoDB Engine (to support transactions)

The test I've done is the one described above by Tobias Asplund [8 Oct 2007 12:18] Just simply create the two tables filled with data and have a look what EXPLAIN says.

EXPLAIN SELECT * FROM table1 WHERE ID IN (SELECT ID FROM table2 WHERE ID=2);

Here is the result:
5.0.67 --> no index is used
5.5.20 --> no index is used
5.6.4 M7 --> no index is used
6.0.11 Alpha --> index is used!! but 6.0.11 Alpha development has been stopped!! You will be redirected to 5.6.4 M7 which does not use a index.

I read in a blog about some workaround when using the "exists" function but "exists" does not use any index either.

The only option you have using MySQL is to redefine your SQL statement using JOIN but this is not always possible or causes a huge modification effort.

Another option (and possibly the best ;-) is to change to a different database provider.

What I'm wondering is why there is noone able to provide a patch to this. Especially sind Object Relational Mapper like Hibernate in many cases using subselects by default. So everyone who uses Hibernate should possibly not use MySQL.

Cheers
[23 Mar 2012 15:23] Meinolf Schulte-Döinghaus
Hi there,
in between I've tested all current versions in attention to this problem which are 5.0.67, 5.5.20, 5.6.4 M7 and 6.0.11 Alpha. The tests have been done on the InnoDB Engine (to support transactions)

The test I've done is the one described above by Tobias Asplund [8 Oct 2007 12:18] Just simply create the two tables filled with data and have a look what EXPLAIN says.

EXPLAIN SELECT * FROM table1 WHERE ID IN (SELECT ID FROM table2 WHERE ID=2);

Here is the result:
5.0.67 --> no index is used
5.5.20 --> no index is used
5.6.4 M7 --> no index is used
6.0.11 Alpha --> index is used!! but 6.0.11 Alpha development has been stopped!! You will be redirected to 5.6.4 M7 which does not use a index.

I read in a blog about some workaround when using the "exists" function but "exists" does not use any index either.

The only option you have using MySQL is to redefine your SQL statement using JOIN but this is not always possible or causes a huge modification effort.

Another option (and possibly the best ;-) is to change to a different database provider.

What I'm wondering is why there is noone able to provide a patch to this. Especially sind Object Relational Mapper like Hibernate in many cases using subselects by default. So everyone who uses Hibernate should possibly not use MySQL.

Cheers
[26 Mar 2012 19:37] Paul Dubois
Noted in 5.6.5 changelog.

Several subquery performance issues were resolved through the
implementation of semi-join subquery optimization strategies.