Bug #8086 Mysterious very poor performance w/ IN (select...) vs. =(correlated subquery)
Submitted: 22 Jan 2005 6:38 Modified: 26 Feb 2005 21:25
Reporter: Joseph Hall Email Updates:
Status: No Feedback Impact on me:
None 
Category:MySQL Server Severity:S2 (Serious)
Version:4.1.8 OS:Linux (Linux)
Assigned to: CPU Architecture:Any

[22 Jan 2005 6:38] Joseph Hall
Description:
Using InnoDB tables and all join/range columns are indexed.

I can't figure this one out.

  select max(available_seq)
  from x_available_icmp_ping
  group by host_id

returns 24 rows and executes in <.05 sec.

select host_name, host_ip, tstamp, average, ok
  from host h
  left join x_available_icmp_ping ip
  on h.host_id = ip.host_id
  and ip.available_seq = (select max(available_seq)
    from x_available_icmp_ping
    where host_id = h.host_id
   )

returns 24 rows and executes in <.05 sec. BUT:

select host_name, host_ip, tstamp, average, ok
  from host h
  left join x_available_icmp_ping ip
  on h.host_id = ip.host_id
  and ip.available_seq in (select max(available_seq)
    from x_available_icmp_ping
    group by host_id
  )

takes a very long time to execute - 20-30 sec.

I must be missing something, but the query planner should be able to figure out that the GROUP BY is going to return at most the number of rows given by the cardinality of the host_id index, which is 24, and treat those as if IN (some small number of constants).

For the correlated subquery:

+----+--------------------+-----------------------+--------+-----------------+---------+---------+-------------------+------+-------------+
| id | select_type        | table                 | type   | possible_keys   | key     | key_len | ref               | rows | Extra       |
+----+--------------------+-----------------------+--------+-----------------+---------+---------+-------------------+------+-------------+
|  1 | PRIMARY            | h                     | ALL    | NULL            | NULL    |    NULL | NULL              |   24 |             |
|  1 | PRIMARY            | ip                    | eq_ref | PRIMARY,host_id | PRIMARY |       8 | func              |    1 |             |
|  2 | DEPENDENT SUBQUERY | x_available_icmp_ping | ref    | host_id         | host_id |       4 | monitor.h.host_id |  116 | Using index |
+----+--------------------+-----------------------+--------+-----------------+---------+---------+-------------------+------+-------------+

For the GROUP BY subquery:

+----+--------------------+-----------------------+-------+---------------+---------+---------+-------------------+------+-------------+
| id | select_type        | table                 | type  | possible_keys | key     | key_len | ref               | rows | Extra       |
+----+--------------------+-----------------------+-------+---------------+---------+---------+-------------------+------+-------------+
|  1 | PRIMARY            | h                     | ALL   | NULL          | NULL    |    NULL | NULL              |   24 |             |
|  1 | PRIMARY            | ip                    | ref   | host_id       | host_id |       4 | monitor.h.host_id |  116 |             |
|  2 | DEPENDENT SUBQUERY | x_available_icmp_ping | index | NULL          | host_id |       4 | NULL              | 4673 | Using index |
+----+--------------------+-----------------------+-------+---------------+---------+---------+-------------------+------+-------------+

How to repeat:
If the problem isn't obvious, I'll attempt to whittle my schema and data down to re-post.
[26 Jan 2005 21:18] Jorge del Conde
Hi!

Can you please attatch a file with the tables used by your queries so that we can reproduce this problem ?

Thanks !
[27 Feb 2005 0:00] Bugs System
No feedback was provided for this bug for over a month, so it is
being suspended automatically. If you are able to provide the
information that was originally requested, please do so and change
the status of the bug back to "Open".