Bug #9021 Query with subquery does not use index in outer query
Submitted: 7 Mar 2005 14:23 Modified: 28 Feb 2008 0:52
Reporter: Pete French
Status: Closed
Category:Server: Optimizer Severity:S1 (Critical)
Version:4.1.7 OS:FreeBSD (FreeBSD)
Assigned to: Bugs System Target Version:6.0
Triage: D3 (Medium) / R5 (Severe) / E5 (Major)

[7 Mar 2005 14:23] Pete French
Description:
I have two tables - one of languages, one of customers.
The languages table has, for this example, no indices,
the customer data tbale is indexed on the transaction ID
column 'trans_id'.

I am performing two queries to get all customers speaking irish.

1) select trans_id from trans_languages where generic_lang_code = 'ie';
2) select * from trans_cust_data where trans_id in ( 'XXXX', 'YYYY' );

XXXX and YYYY are the two transaction ID's retruned by query 1.

This is fast. I rewrote it as a sub-query thus:

 select * from trans_cust_data where trans_id in ( select trans_id from trans_languages
where generic_lang_code = 'ie' );

This is very slow. The two separate queries complete in 0.04 and 0.07 respectively.
The combined query takes 15.05 seconds to do the same job.

Running 'explain' on the three queries gives these results:

mysql> explain select trans_id from trans_languages where generic_lang_code = 'ie';
+----+-------------+-----------------+------+---------------+------+---------+------+-------+-------------+
| id | select_type | table           | type | possible_keys | key  | key_len | ref  |
rows  | Extra       |
+----+-------------+-----------------+------+---------------+------+---------+------+-------+-------------+
|  1 | SIMPLE      | trans_languages | ALL  | NULL          | NULL |    NULL | NULL |
22466 | Using where |
+----+-------------+-----------------+------+---------------+------+---------+------+-------+-------------+

mysql> explain select * from trans_cust_data where trans_id in ( 'XXXX', 'YYYY' );
+----+-------------+-----------------+-------+---------------+---------+---------+------+------+-------------+
| id | select_type | table           | type  | possible_keys | key     | key_len | ref  |
rows | Extra       |

+----+-------------+-----------------+-------+---------------+---------+---------+------+------+-------------+
|  1 | SIMPLE      | trans_cust_data | range | PRIMARY       | PRIMARY |      48 | NULL |
   2 | Using where |
+----+-------------+-----------------+-------+---------------+---------+---------+------+------+-------------+

mysql> explain select * from trans_cust_data where trans_id in ( select trans_id from
trans_languages where generic_lang_code = 'ie' );
+----+--------------------+-----------------+-----------------+---------------+---------+---------+------+--------+--------------------------+
| id | select_type        | table           | type            | possible_keys | key     |
key_len | ref  | rows   | Extra                    |
+----+--------------------+-----------------+-----------------+---------------+---------+---------+------+--------+--------------------------+
|  1 | PRIMARY            | trans_cust_data | ALL             | NULL          | NULL    |
   NULL | NULL | 976855 | Using where              |
|  2 | DEPENDENT SUBQUERY | trans_languages | unique_subquery | PRIMARY       | PRIMARY |
     48 | func |      1 | Using index; Using where |
+----+--------------------+-----------------+-----------------+---------------+---------+---------+------+--------+--------------------------+

So it seems that the subquery example is not using the primary key to pull
out the transaction ID, and it seems to have found an index to use in the
subquery, despite that table having no indices on any of the things being sought by
the 'where' clause.

How to repeat:
See descriptiuon above. This happens with any two tables
aas far as I can make out.

Suggested fix:
For now I am coding all such queries as two operations as
the time difference is substantial, a factor of 136 times faster!
[15 Mar 2005 22:27] Hooman Mozaffari
I had same problem too. 
Some other facts:

  1- It didn’t work even when I forced queries to use a specific index using USE INDEX
option.

  2- When I change the query from:  
    SELECT * FROM table1 WHERE field IN (SELECT field FROM table2 WHERE ...) 
    to:
    SELECT * FROM table1 WHERE field = (SELECT field FROM table2 WHERE ...)
    It uses indexes.
[16 Mar 2005 21:00] Pete French
Sadly I cant re-write my query the same way as my subquery returns
multiple rows. Nice to see someone else hit the same issue though.
[18 Mar 2005 16:59] Jorge del Conde
The problem is in the way IN is optimized
[11 May 2005 14:49] Jon Bionda
I disagree with the Severity level of this bug.  Our J2EE app is successfully performance
tested on other databases but fails miserably on MySQL because of this bug. How does
someone get a bug's severity level increased?
[12 May 2005 17:35] Sergei Golubchik
I increased the severity level.
However, to make this and similar queries faster we need to implement a special
optimization for IN subqueries. And it cannot be done before 5.1 (5.0-beta and 4.1-stable
are feature-frozen)
[19 Sep 2005 12:31] Georg Richter
This behaviour is the main show stopper in SAP SSQJ Benchmarks: Several parts which use IN
and subqueries fail.

Changed Severity to S1 (there is no workaround available, unless someone rewrites ABAP to
use JOINS instead of subqueries).
[28 Sep 2005 15:16] Hooman Mozaffari
Just installed MySQL 5.0.13-rc-nt candidate version and still see the bug:

CREATE TABLE `table1` (
  `ID` int(11) NOT NULL,
  `NAME` varchar(10) default NULL,
  PRIMARY KEY  (`ID`)
) ENGINE=MyISAM DEFAULT CHARSET=latin1;

CREATE TABLE `table2` (
  `ID` int(11) NOT NULL,
  `NAME` varchar(10) default NULL,
  PRIMARY KEY  (`ID`)
) ENGINE=MyISAM DEFAULT CHARSET=latin1;

INSERT INTO table1 VALUES(1,'A');
INSERT INTO table1 VALUES(2,'B');
INSERT INTO table1 VALUES(3,'C');

INSERT INTO table2 VALUES(1,'A');
INSERT INTO table2 VALUES(2,'B');

EXPLAIN SELECT * FROM table1 WHERE ID IN (SELECT ID FROM table2 WHERE ID=2);
+----+--------------------+--------+-------+---------------+---------+---------+-------+------+-------------+
| id | select_type        | table  | type  | possible_keys | key     | key_len | ref   |
rows | Extra       |
+----+--------------------+--------+-------+---------------+---------+---------+-------+------+-------------+
|  1 | PRIMARY            | table1 | ALL   | NULL          | NULL    | NULL    | NULL  | 
  3 | Using where |
|  2 | DEPENDENT SUBQUERY | table2 | const | PRIMARY       | PRIMARY | 4       | const | 
  1 | Using index |
+----+--------------------+--------+-------+---------------+---------+---------+-------+------+-------------+
2 rows in set (0.00 sec)

Optimizer doesn't pick the index in 'table1' (outer query)
[28 Sep 2005 15:36] Valeriy Kravchuk
Queries like this one will be more efficient generally, not only in MySQL, if your rewrite
them as join instead of (potentially correlated) subquery:

mysql> select version();
+----------------+
| version()      |
+----------------+
| 5.0.12-beta-nt |
+----------------+
1 row in set (0.00 sec)

mysql> EXPLAIN SELECT * FROM table1 WHERE ID IN (SELECT ID FROM table2 WHERE ID=2);
+----+--------------------+--------+-------+---------------+---------+---------+-------+------+-------------+
| id | select_type        | table  | type  | possible_keys | key     | key_len |
 ref   | rows | Extra       |
+----+--------------------+--------+-------+---------------+---------+---------+-------+------+-------------+
|  1 | PRIMARY            | table1 | ALL   | NULL          | NULL    | NULL    |
 NULL  |    3 | Using where |
|  2 | DEPENDENT SUBQUERY | table2 | const | PRIMARY       | PRIMARY | 4       |
 const |    1 | Using index |
+----+--------------------+--------+-------+---------------+---------+---------+-------+------+-------------+
2 rows in set (0.00 sec)

mysql> SELECT * FROM table1 WHERE ID IN (SELECT ID FROM table2 WHERE ID=2);
+----+------+
| ID | NAME |
+----+------+
|  2 | B    |
+----+------+
1 row in set (0.00 sec)

mysql> SELECT table1.* FROM table1, table2 WHERE table1.ID = table2.ID and table2.ID=2;
+----+------+
| ID | NAME |
+----+------+
|  2 | B    |
+----+------+
1 row in set (0.00 sec)

mysql> explain SELECT table1.* FROM table1, table2 WHERE table1.ID = table2.ID a
nd table2.ID=2;
+----+-------------+--------+-------+---------------+---------+---------+-------+------+-------------+
| id | select_type | table  | type  | possible_keys | key     | key_len | ref
| rows | Extra       |
+----+-------------+--------+-------+---------------+---------+---------+-------+------+-------------+
|  1 | SIMPLE      | table1 | const | PRIMARY       | PRIMARY | 4       | const
|    1 |             |
|  1 | SIMPLE      | table2 | const | PRIMARY       | PRIMARY | 4       | const
|    1 | Using index |
+----+-------------+--------+-------+---------------+---------+---------+-------+------+-------------+
2 rows in set (0.00 sec)

So, anybody who cares about performance, will rewrite such a query. Even some optimizers
do... Let's wait for MySQL 5.1, as explained before.
[4 Oct 2006 18:34] John Swapceinski
BTW, here is an even earlier reporting of this bug:

http://bugs.mysql.com/bug.php?id=8139
[18 Aug 2007 7:25] Igor Babaev
Currently 5.2 supports:

1. Materialization of a subquery with an access to the materialized
table from outer tables.

2. Inside-out strategy without materialization.
 
The customer needs materialization with an access from the materialized
table.

Currently Sergey Petrunia is actively working on the design of the task that will include
this optimization.
We plan it to be ready by the end of September.
[8 Oct 2007 14:18] Tobias Asplund
As per Igor's comments, this works in 5.2:

mysql> use test
Database changed
mysql> CREATE TABLE `table1` (
    ->   `ID` int(11) NOT NULL,
    ->   `NAME` varchar(10) default NULL,
    ->   PRIMARY KEY  (`ID`)
    -> ) ENGINE=MyISAM DEFAULT CHARSET=latin1;
Query OK, 0 rows affected (0.02 sec)

mysql> 
mysql> CREATE TABLE `table2` (
    ->   `ID` int(11) NOT NULL,
    ->   `NAME` varchar(10) default NULL,
    ->   PRIMARY KEY  (`ID`)
    -> ) ENGINE=MyISAM DEFAULT CHARSET=latin1;
Query OK, 0 rows affected (0.03 sec)

mysql> 
mysql> INSERT INTO table1 VALUES(1,'A');
Query OK, 1 row affected (0.02 sec)

mysql> INSERT INTO table1 VALUES(2,'B');
Query OK, 1 row affected (0.00 sec)

mysql> INSERT INTO table1 VALUES(3,'C');
Query OK, 1 row affected (0.00 sec)

mysql> 
mysql> INSERT INTO table2 VALUES(1,'A');
Query OK, 1 row affected (0.00 sec)

mysql> INSERT INTO table2 VALUES(2,'B');
Query OK, 1 row affected (0.00 sec)

mysql> 
mysql> EXPLAIN SELECT * FROM table1 WHERE ID IN (SELECT ID FROM table2 WHERE ID=2);
+----+-------------+--------+-------+---------------+---------+---------+-------+------+-------------+
| id | select_type | table  | type  | possible_keys | key     | key_len | ref   | rows |
Extra       |
+----+-------------+--------+-------+---------------+---------+---------+-------+------+-------------+
|  1 | PRIMARY     | table1 | const | PRIMARY       | PRIMARY | 4       | const |    1 | 
           | 
|  1 | PRIMARY     | table2 | const | PRIMARY       | PRIMARY | 4       | const |    1 |
Using index | 
+----+-------------+--------+-------+---------------+---------+---------+-------+------+-------------+
2 rows in set (0.03 sec)
[28 Feb 2008 0:52] Trudy Pelzer
Closed in MySQL 6.0
[2 Nov 2008 19:27] Valeriy Kravchuk
Bug #8139 was marked as a duplicate of this one.