Bug #54518 Very bad performance with subselect
Submitted: 15 Jun 2010 14:25 Modified: 15 Oct 2012 13:31
Reporter: Christoph Moser Email Updates:
Status: Closed Impact on me:
None 
Category:MySQL Server Severity:S5 (Performance)
Version:5.0.22 OS:Windows (Vista)
Assigned to: Roy Lyseng CPU Architecture:Any
Tags: semijoin, subselect

[15 Jun 2010 14:25] Christoph Moser
Description:
Because I don't understand the categories I post it here.
I have installed
  1) mysql-connector-net-6.2.2
  3) mysql-connector-odbc-5.1.6-win32.msi
  2) mysql-workbench-oss-5.2.22-rc-win32.msi
the performance-issue occures in all of them.

The following select takes over an hour (no joke!) for a table with 5454 rows ... i started the query at 15:30 and interrupted it at 16:30:

SELECT max(fld_1) as SomeName 
FROM `db`.`tbl` 
WHERE fld_1 IN 
   (SELECT fld_1 
    FROM `db`.`tbl` 
    WHERE fld_2 = '1' 
    AND fld_3 = 'xy');

How to repeat:
Every time at my environment.
[15 Jun 2010 14:33] Valeriy Kravchuk
Please, send the results of:

explain SELECT max(fld_1) as SomeName 
FROM `db`.`tbl` 
WHERE fld_1 IN 
   (SELECT fld_1 
    FROM `db`.`tbl` 
    WHERE fld_2 = '1' 
    AND fld_3 = 'xy');

show create table `db`.`tbl`\G

select version();

from mysql command like client.
[16 Jun 2010 13:42] Christoph Moser
1) Result for the explain Select you asked:
1.1) Result after about 1,5 hours:
+-------+
| hansi |
+-------+
| 18072 |
+-------+
1.2) The command-line-client did not work with the same syntax as the workbench - I had do adapt the select to the following:

SELECT max(fld_1) as SomeName 
FROM tbl 
WHERE fld_1 IN 
   (SELECT fld_1 
    FROM tbl 
    WHERE fld_2 = '1' 
    AND fld_3 = 'xy');

_____________________________________________________________________________

2) Result of "show create table `db`.`tbl`\G":

C:\Program Files\MySQL\MySQL Workbench 5.2 OSS>mysql -h e3-server -u root -p -e "show create table `
e3Build`.`SvnRevisionInfo`\G" -D e3Build
Enter password: ********
*************************** 1. row ***************************
       Table: SvnRevisionInfo
Create Table: CREATE TABLE `SvnRevisionInfo` (
  `Id` bigint(20) NOT NULL auto_increment,
  `Rev` bigint(20) NOT NULL,
  `RepoUrl` varchar(2000) NOT NULL,
  `User` varchar(50) NOT NULL,
  `Commitmessage` varchar(30000) NOT NULL,
  `Changelog` varchar(2000) NOT NULL,
  `Commitzeit` datetime NOT NULL,
  `Solutionname` varchar(50) NOT NULL,
  `Fehler` longtext NOT NULL,
  `BuildConfiguration` varchar(45) NOT NULL,
  `isBuildable` tinyint(1) NOT NULL default '0',
  PRIMARY KEY  (`Id`)
) ENGINE=InnoDB DEFAULT CHARSET=latin1
_____________________________________________________________________________
3) Result of "select version()":
+----------------------------------+
| version()                        |
+----------------------------------+
| 5.0.22-Debian_0ubuntu6.06.11-log |
+----------------------------------+
[16 Jun 2010 15:37] Valeriy Kravchuk
Please, send the results of:

EXPLAIN SELECT max(fld_1) as SomeName 
FROM tbl
WHERE fld_1 IN 
   (SELECT fld_1 
    FROM tbl 
    WHERE fld_2 = '1' 
    AND fld_3 = 'xy');

from mysql command line client. They should have several columns, like this:

mysql> explain select * from mysql.user where user in (select user from mysql.user);
+----+--------------------+-------+-------+---------------+---------+---------+------+------+--------------------------+
| id | select_type        | table | type  | possible_keys | key     | key_len | ref  | rows | Extra                    |
+----+--------------------+-------+-------+---------------+---------+---------+------+------+--------------------------+
|  1 | PRIMARY            | user  | ALL   | NULL          | NULL    | NULL    | NULL |    7 | Using where              |
|  2 | DEPENDENT SUBQUERY | user  | index | NULL          | PRIMARY | 228     | NULL |    7 | Using where; Using index |
+----+--------------------+-------+-------+---------------+---------+---------+------+------+--------------------------+
2 rows in set (0.00 sec)

You should get the results of EXPLAIN for the query much faster...

Also, please, send the results of:

select count(*) from tbl;
[17 Jun 2010 9:42] Christoph Moser
1) explain select ....:
+----+--------------------+-----------------+------+---------------+------+---------+------+-------+-------------+
| id | select_type        | table           | type | possible_keys | key  | key_len | ref  | rows  | Extra       |
+----+--------------------+-----------------+------+---------------+------+---------+------+-------+-------------+
|  1 | PRIMARY            | SvnRevisionInfo | ALL  | NULL          | NULL | NULL    | NULL | 48611 | Using where |
|  2 | DEPENDENT SUBQUERY | SvnRevisionInfo | ALL  | NULL          | NULL | NULL    | NULL | 48611 | Using where |
+----+--------------------+-----------------+------+---------------+------+---------+------+-------+-------------+

2) select count(*) from tbl  (every 30 Minutes there will be added some rows...)

+----------+
| count(*) |
+----------+
|     5587 |
+----------+

###
Yesterday I tried a select with four subselects ... it took 2 hours and 10 minutes to get the result.
###
[17 Jun 2010 10:04] Valeriy Kravchuk
So, this is a know problem - subquery is considered dependent by MySQL 5.x.y, and thus (as you have no proper indexes defined) for every row from the table you have to scan the entire table. This gives more than 31 millions of rows to read and results in bad performance. Known problem. See http://forge.mysql.com/worklog/task.php?id=1110 and http://forge.mysql.com/worklog/task.php?id=2980. We had implemented proper solutions for MySQL 6.0 and they will eventually be included in the following 5.x.y versions.

In the meantime you can try to rewrite your query as:

SELECT max(tbl.fld_1) as SomeName 
FROM tbl, (SELECT fld_1 
    FROM tbl 
    WHERE fld_2 = '1' 
    AND fld_3 = 'xy') a
WHERE tbl.fld_1 = a.fld_1;

and/or add proper indexes on fld_3 and/or fld_3.
[24 Jan 2011 8:39] Manyi Lu
This bug should be fixed by WL#3985 WL#1110 WL#3751 WL#2980.
[15 Oct 2012 13:31] Erlend Dahl
Fixed in 5.6.5.