Bug #10815 Sub-queries failure when using GROUP BY
Submitted: 24 May 2005 7:36 Modified: 26 Mar 2012 19:34
Reporter: Mark Papadakis Email Updates:
Status: Closed Impact on me:
None 
Category:MySQL Server: Optimizer Severity:S2 (Serious)
Version:4.1.12 OS:Linux (Linux)
Assigned to: Roy Lyseng CPU Architecture:Any
Tags: semijoin

[24 May 2005 7:36] Mark Papadakis
Description:
SELECT SQL_CALC_FOUND_ROWS id, ref_id, status, title, UNIX_TIMESTAMP(updated), has_draft, locked_by, UNIX_TIMESTAMP(lock_time), UNIX_TIMESTAMP(created), author_id, updated_by, section_id, valid_from_time, valid_to_time, hash_key, UNIX_TIMESTAMP(published), published_by FROM phaistos_cman.items WHERE section_id IN (166) AND status IN (1,2)  AND id IN (SELECT item_id FROM phaistos_cman.item_data WHERE section_id IN (166) AND (  (field_id = 281 AND data REGEXP  '^a') )  GROUP BY item_id HAVING COUNT(*) = 1) ORDER BY id LIMIT 0, 10

The above query is using the IN operator in conjuction with a result-set returned by a subselect:

SELECT item_id FROM phaistos_cman.item_data WHERE section_id IN (166) AND (  (field_id = 281 AND data REGEXP  '^a') )  GROUP BY item_id HAVING COUNT(*) = 1

This query takes almost 40 seconds. If we remove the 'GROUP BY item_id HAVING COUNT(*) = 1' part from the subquery it takes less than a second.

Invoking:
SELECT item_id FROM phaistos_cman.item_data WHERE section_id IN (166) AND (  (field_id = 281 AND data REGEXP  '^a') )  GROUP BY item_id HAVING COUNT(*) = 1
on its own (i.e not as a sub-query) takes less than a second.

Invoking:
SELECT item_id FROM phaistos_cman.item_data WHERE section_id IN (166) AND (  (field_id = 281 AND data REGEXP  '^a') ) 
on its own, also takes less than a second.

It seems that when using GROUP BY in a sub-select, something is slowing down the execution too much (1 sec VS 40 seconds).

Upgrading from 4.1 to 4.12 did not fix the problem.

How to repeat:
Here is the definition of the two tables.
If you populate them with random data, you will be able to reproduce the problem.

CREATE TABLE `items` (
  `id` int(11) NOT NULL default '0',
  `ref_id` varchar(255) NOT NULL default '',
  `flags` int(10) unsigned NOT NULL default '0',
  `status` tinyint(1) NOT NULL default '0',
  `title` varchar(255) NOT NULL default '',
  `updated` datetime NOT NULL default '0000-00-00 00:00:00',
  `draft_title` varchar(255) NOT NULL default '',
  `draft_ref_id` varchar(255) NOT NULL default '',
  `updated_by` int(11) NOT NULL default '0',
  `workflow_pos` int(2) NOT NULL default '0',
  `has_draft` tinyint(1) NOT NULL default '0',
  `hash_key` varchar(127) NOT NULL default '',
  `disapprove_reason` text NOT NULL,
  `section_id` int(11) NOT NULL default '0',
  `locked_by` int(11) NOT NULL default '0',
  `lock_time` datetime NOT NULL default '0000-00-00 00:00:00',
  `draft_updated` datetime NOT NULL default '0000-00-00 00:00:00',
  `valid_from_time` int(11) NOT NULL default '0',
  `valid_to_time` int(11) NOT NULL default '0',
  `keywords` text NOT NULL,
  `descr` text NOT NULL,
  `created` datetime NOT NULL default '0000-00-00 00:00:00',
  `pos` int(11) NOT NULL default '0',
  `series_id` int(11) NOT NULL default '0',
  `allow_comments` int(11) NOT NULL default '0',
  `draft_author_id` int(11) NOT NULL default '0',
  `author_id` int(10) unsigned NOT NULL default '0',
  `published` datetime default '0000-00-00 00:00:00',
  `published_by` int(10) unsigned default '0',
  PRIMARY KEY  (`section_id`,`id`),
  KEY `ref_id` (`ref_id`),
  KEY `section_id` (`status`),
  KEY `status` (`status`,`id`),
  KEY `section_id_2` (`section_id`),
  KEY `section_id_4` (`section_id`,`status`,`valid_from_time`,`valid_to_time`),
  KEY `section_id_5` (`section_id`,`created`),
  KEY `series_id` (`series_id`,`pos`),
  KEY `series_id_2` (`series_id`,`status`),
  KEY `section_id_3` (`section_id`,`hash_key`),
  KEY `authorKey` (`author_id`,`status`,`section_id`)
) ENGINE=InnoDB DEFAULT CHARSET=greek

CREATE TABLE `item_data` (
  `item_id` int(11) NOT NULL default '0',
  `field_id` int(11) NOT NULL default '0',
  `group_field_id` int(11) NOT NULL default '0',
  `data` blob NOT NULL,
  `id` int(11) NOT NULL default '0',
  `section_id` int(11) NOT NULL default '0',
  `lang` tinyint(2) NOT NULL default '0',
  PRIMARY KEY  (`section_id`,`item_id`,`id`,`lang`),
  KEY `item_field` (`item_id`,`field_id`,`group_field_id`),
  KEY `item_id` (`item_id`,`id`),
  KEY `item_id_2` (`item_id`,`field_id`,`section_id`)
) ENGINE=InnoDB DEFAULT CHARSET=greek
[29 May 2005 0:54] Jorge del Conde
Thank you for your bug report.  I was able to reproduce this with 4.1.13 from bk:

mysql> explain SELECT item_id FROM test.item_data WHERE section_id IN (166) AND ( 
    -> (field_id = 281 AND data REGEXP  '^a') )  GROUP BY item_id HAVING COUNT(*) = 1
    -> ;
+----+-------------+-----------+------+---------------+---------+---------+-------+------+-------------+
| id | select_type | table     | type | possible_keys | key     | key_len | ref   | rows | Extra       |
+----+-------------+-----------+------+---------------+---------+---------+-------+------+-------------+
|  1 | SIMPLE      | item_data | ref  | PRIMARY       | PRIMARY |       4 | const |   12 | Using where |
+----+-------------+-----------+------+---------------+---------+---------+-------+------+-------------+
1 row in set (0.00 sec)
[29 May 2005 15:35] Jorge del Conde
Hi,

Currently, there is no optimization done for IN subqueries.  We are aware of this problem and are working hard to enable this optimization.

This problem will not be fixed in our 4.1 tree, but you can expect it to be part of our future 5.1 release.

Thanks!
[5 Oct 2005 12:51] Mark Papadakis
I was wondering whether any progress was made in fixing this bug. Is it currently fixed in any mySQL version, including beta/alpha releases?
[26 Mar 2012 19:34] Paul DuBois
Noted in 5.6.5 changelog.

Several subquery performance issues were resolved through the
implementation of semi-join subquery optimization strategies.