Bug #25796 Query plan is not using expected index
Submitted: 23 Jan 2007 19:08 Modified: 23 Jan 2008 0:52
Reporter: Yogish Baliga Email Updates:
Status: Closed Impact on me:
None 
Category:MySQL Server: Optimizer Severity:S5 (Performance)
Version:4.1.20, 5.0.36-BK OS:Linux (Linux)
Assigned to: CPU Architecture:Any

[23 Jan 2007 19:08] Yogish Baliga
Description:
I am running a query against a database having ~12,000 rows. The following query takes long time to execute ( more than 3 seconds ).

select post.url_id from POST as post   where post.id IN ( SELECT p1.id from POST as p1 where p1.user_id = 'acd' );

The query plan shown is:

mysql> explain select post.url_id from POST as post   where post.id IN ( SELECT p1.id from POST as p1 where p1.user_id = 'acd' );

| id | select_type        | table | type            | possible_keys   | key     | key_len | ref  | rows   | Extra                    |

|  1 | PRIMARY            | post  | index           | NULL            | url     |      67 | NULL | 221425 | Using where; Using index |
|  2 | DEPENDENT SUBQUERY | p1    | unique_subquery | PRIMARY,user_id | PRIMARY |      66 | func |      1 | Using index; Using where |

2 rows in set (0.00 sec)

Here is the output of show create table POST;

| POST  | CREATE TABLE `POST` (
  `id` varchar(22) NOT NULL default '',
  `url_id` varchar(22) default NULL,
  `user_id` varchar(22) default NULL,
  `title` varchar(255) default NULL,
  `note` text,
  `private` int(1) NOT NULL default '0',
  `spam` int(1) NOT NULL default '0',
  `createdate` int(14) default NULL,
  `userdate` int(14) default NULL,
  `last_modified` int(14) default NULL,
  PRIMARY KEY  (`id`),
  KEY `url` (`url_id`),
  KEY `user_id` (`user_id`),
  KEY `spam` (`spam`),
  KEY `private` (`private`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8 |

My questions are:

1. Why the sub-query is shown as dependent sub-query when the sub-query is not dependent on the outer query?
2. Why the outer query is not using the primary index when the query is clearly on the "id" column which is a primary key?

How to repeat:
You need some dummy data to produce the result. You can get the query plan without any data too.
[24 Jan 2007 16:01] Heikki Tuuri
This is more probably an optimizer bug report.
[26 Jan 2007 16:08] Valeriy Kravchuk
Thank you for a problem report. What index you expected to be used? Can you try to force its usage and check if query will run faster?

I've got the following results on dummy data:

mysql> explain extended select post.url_id from POST as post   where post.id IN
 ( SELECT p1.id from POST as p1 where p1.user_id = 'acd' )\G
*************************** 1. row ***************************
           id: 1
  select_type: PRIMARY
        table: post
         type: index
possible_keys: NULL
          key: url
      key_len: 69
          ref: NULL
         rows: 10490
        Extra: Using where; Using index
*************************** 2. row ***************************
           id: 2
  select_type: DEPENDENT SUBQUERY
        table: p1
         type: unique_subquery
possible_keys: PRIMARY,user_id
          key: PRIMARY
      key_len: 68
          ref: func
         rows: 1
        Extra: Using index; Using where
2 rows in set, 1 warning (0.01 sec)

mysql> show warnings\G
*************************** 1. row ***************************
  Level: Note
   Code: 1003
Message: select `test`.`post`.`url_id` AS `url_id` from `test`.`POST` `post` whe
re <in_optimizer>(`test`.`post`.`id`,<exists>(<primary_index_lookup>(<cache>(`te
st`.`post`.`id`) in POST on PRIMARY where (`test`.`p1`.`user_id` = _utf8'acd')))
)
1 row in set (0.00 sec) 

Data were generated by the following script:

i=1;
while [ $i -le 12000 ]; 
do 
  bin/mysql -uroot test -e "insert into POST(id, url_id, user_id) values($i, $i, 'acd')"; 
  let i=i+1;
done

I do not see any problem with this plan (other than calling subquery DEPENDENT while it is not, really, but this is a know problem to be fixed later).

So, why do you think this is a bug? 

Query (different, but with the same plan) runs fast enough for me:

mysql> select count(post.url_id) from POST as post   where post.id IN ( SELECT
p1.id from POST as p1 where p1.user_id = 'acd' );
+--------------------+
| count(post.url_id) |
+--------------------+
|              12000 |
+--------------------+
1 row in set (0.06 sec)

mysql> explain select count(post.url_id) from POST as post   where post.id IN (
 SELECT p1.id from POST as p1 where p1.user_id = 'acd' )\G
*************************** 1. row ***************************
           id: 1
  select_type: PRIMARY
        table: post
         type: index
possible_keys: NULL
          key: url
      key_len: 69
          ref: NULL
         rows: 10490
        Extra: Using where; Using index
*************************** 2. row ***************************
           id: 2
  select_type: DEPENDENT SUBQUERY
        table: p1
         type: unique_subquery
possible_keys: PRIMARY,user_id
          key: PRIMARY
      key_len: 68
          ref: func
         rows: 1
        Extra: Using index; Using where
2 rows in set (0.00 sec)
[26 Jan 2007 20:32] Yogish Baliga
The query plan for dependent sub-query shows that it is using the PRIMARY KEY instead of user_id index (sub-query's WHERE condition is on user_id ). 

Also because the the sub-query is considered as dependent sub-query, I think the sub-query gets executed for every row of the result. I think so because the query takes more than 5 seconds to execute. If I run the sub-query independently I get the results within micro-seconds. and then if I cut & paste the ids I got from the sub-query into the original queries IN clause I get the result within micro-seconds too.
[28 Jan 2007 8:00] Valeriy Kravchuk
As I already noted, the problem with subqueries like this considered as dependent (while they are NOT) is well-known. It should be fixed in 5.1 GA. PRIMARY KEY usage instead of other index is a result of subquery treatment as dependent. 

So, this is a known problem to be fixed later.
[23 Jan 2008 0:52] Sergey Petrunya
This subquery should be handled by new subquery optimizations in MySQL 6.0.

Changing status to Closed. 

Yogish, do you still have the dataset? If yes, could you try with MySQL 6.0? Feel free to re-open the bug if it is still slow.