Bug #28257 Optimizer does not use a required multi-key index on subquery selects.
Submitted: 5 May 2007 5:59 Modified: 14 Jan 2013 17:08
Reporter: Aaron Gavic Email Updates:
Status: Closed Impact on me:
None 
Category:MySQL Server: Optimizer Severity:S2 (Serious)
Version:4.1.21-standard-log, 5.0.77 OS:Any
Assigned to: Matthew Lord CPU Architecture:Any
Tags: subquery benchmark

[5 May 2007 5:59] Aaron Gavic
Description:
I started looking into this issue with the very old <a href="http://bugs.mysql.com/bug.php?id=14070">Bug #14070</a>.  Thought it might be best to just create a new, clean bug.

<strong>The Bug</strong>
Optimizer does not use a required multi-key index on subquery selects.

<strong>State from SHOW PROCESSLIST when hanging.</strong>
"Copying to tmp table"
"Sending data"

<strong>Quick Fix</strong>
Kill the subquery process.

How to repeat:
I'm attaching a zip file with my example tables and data.  Maintable has 40,000 rows. Subtable has 5,000.  Return times will depend on your server's hardware.

<strong>Example Tables:</strong>

CREATE TABLE `maintable` (
  `table_id` int(11) NOT NULL auto_increment,
  `field1` int(11) NOT NULL default '0',
  PRIMARY KEY  (`table_id`)
) ENGINE=MyISAM  DEFAULT CHARSET=latin1 AUTO_INCREMENT=40001 ;

CREATE TABLE `subtable` (
  `table_id` int(11) NOT NULL auto_increment,
  `field1` int(11) NOT NULL default '0',
  `field2` int(11) NOT NULL default '0',
  PRIMARY KEY  (`table_id`),
  KEY `index_two_fields` (`field1`,`field2`)
) ENGINE=MyISAM  DEFAULT CHARSET=latin1 AUTO_INCREMENT=5001 ;

<strong>Select Condition #1:</strong>
This returned after 86 seconds on my box.

<strong>
SELECT field1
FROM maintable
WHERE field1
IN (
SELECT DISTINCT field2
FROM subtable
WHERE field1 >1
AND field2 >1
)

<strong>Select Condition #2:</strong>
This returned after 72 seconds on my box.

SELECT field1
FROM maintable
WHERE field1
IN (
SELECT max(field2)
FROM subtable
WHERE field1 >1
AND field2 >1
)

Suggested fix:
Fix the optimizer.  Or was it in 5.0?

<strong>Select Condition #1 - Workaround</strong>
Add a single key index on the distinct field.

KEY `index_field2` (`field2`)

<strong>Select Condition #2 - Workaround</strong>
None I know of.
[5 May 2007 6:02] Aaron Gavic
Test Tables and Data

Attachment: bug28257.zip (application/zip, text), 203.62 KiB.

[5 May 2007 6:09] Aaron Gavic
Sorry for the HTML code everywhere - I was hoping to would go through.
[15 Jun 2007 12:56] Valeriy Kravchuk
Thank you for a problem report. This is a known problem. Index is used, but subquery is considered DEPENDENT (wrongly). This will be fixed in 5.x.y eventually. Look:

mysql> explain SELECT field1 FROM maintable WHERE field1 IN ( SELECT DISTINCT f
ield2 FROM subtable WHERE field1 >1 AND field2 >1 );
+----+--------------------+-----------+-------+------------------+--------------
----+---------+------+-------+-------------------------------------------+
| id | select_type        | table     | type  | possible_keys    | key
    | key_len | ref  | rows  | Extra                                     |
+----+--------------------+-----------+-------+------------------+--------------
----+---------+------+-------+-------------------------------------------+
|  1 | PRIMARY            | maintable | ALL   | NULL             | NULL
    |    NULL | NULL | 40000 | Using where                               |
|  2 | DEPENDENT SUBQUERY | subtable  | range | index_two_fields | index_two_fie
lds |       4 | NULL |  4999 | Using where; Using index; Using temporary |
+----+--------------------+-----------+-------+------------------+--------------
----+---------+------+-------+-------------------------------------------+
2 rows in set (0.01 sec)

mysql> SELECT field1 FROM maintable WHERE field1 IN ( SELECT DISTINCT field2 FR
OM subtable WHERE field1 >1 AND field2 >1 );
+--------+
| field1 |
+--------+
|      2 |
|      3 |
|      4 |
|      5 |
+--------+
4 rows in set (2 min 19.61 sec)

mysql> explain SELECT field1
    -> FROM maintable
    -> WHERE field1
    -> IN (
    -> SELECT max(field2)
    -> FROM subtable
    -> WHERE field1 >1
    -> AND field2 >1
    -> );
+----+--------------------+-----------+-------+------------------+--------------
----+---------+------+-------+--------------------------+
| id | select_type        | table     | type  | possible_keys    | key
    | key_len | ref  | rows  | Extra                    |
+----+--------------------+-----------+-------+------------------+--------------
----+---------+------+-------+--------------------------+
|  1 | PRIMARY            | maintable | ALL   | NULL             | NULL
    |    NULL | NULL | 40000 | Using where              |
|  2 | DEPENDENT SUBQUERY | subtable  | range | index_two_fields | index_two_fie
lds |       4 | NULL |  4999 | Using where; Using index |
+----+--------------------+-----------+-------+------------------+--------------
----+---------+------+-------+--------------------------+
2 rows in set (0.00 sec)

And here is a well-known workaround on how to "materialize" subquery only once:

mysql> explain SELECT field1 FROM maintable WHERE field1 IN ( select * from (SE
LECT max(field2) FROM subtable WHERE field1 >1 AND field2 >1) t );
+----+--------------------+------------+--------+------------------+------------
------+---------+------+-------+--------------------------+
| id | select_type        | table      | type   | possible_keys    | key
      | key_len | ref  | rows  | Extra                    |
+----+--------------------+------------+--------+------------------+------------
------+---------+------+-------+--------------------------+
|  1 | PRIMARY            | maintable  | ALL    | NULL             | NULL
      |    NULL | NULL | 40000 | Using where              |
|  2 | DEPENDENT SUBQUERY | <derived3> | system | NULL             | NULL
      |    NULL | NULL |     1 |                          |
|  3 | DERIVED            | subtable   | range  | index_two_fields | index_two_f
ields |       4 | NULL |  4999 | Using where; Using index |
+----+--------------------+------------+--------+------------------+------------
------+---------+------+-------+--------------------------+
3 rows in set (0.00 sec)

mysql> SELECT field1 FROM maintable WHERE field1 IN ( select * from (SELECT max
(field2) FROM subtable WHERE field1 >1 AND field2 >1) t );
+--------+
| field1 |
+--------+
|      5 |
+--------+
1 row in set (0.03 sec)
[23 Jan 2008 16:37] Sergey Petrunya
Both of the provided queries are covered by new subquery optimizations in MySQL 6.0: 

Query 1:
SELECT field1 FROM maintable WHERE field1 IN ( SELECT DISTINCT field2 FROM subtable WHERE field1 >1 AND field2 >1 );

Execution times:
6.0, default choice: NL-semi-join duplicate elimination: 44 sec.
6.0, @@optimizer_switch='no_semijoin', uses materialization: 0.16 sec.
6.0, all new optimizations disabled (so it behaves like 5.x): 7 min 25 sec

So we have a 10x improvement with a potential to go to 2700x improvement. This potential should be realized when WL#3985 is implemented.

Query 2:
SELECT field1 FROM maintable WHERE field1 IN ( SELECT max(field2) FROM subtable WHERE field1 >1 AND field2 >1 );

6.0, default choice, materialization: 0.16 sec.
6.0, all new optimizations disabled (so it behaves like 5.x):8 min 22.57 sec

Here we have 3140x improvement.

Query EXPLAINs
--------------
MySQL 6.0.5, @@optimizer_switch='';
mysql> explain SELECT field1 FROM maintable WHERE field1 IN ( SELECT DISTINCT field2 FROM subtable WHERE field1 >1 AND field2 >1 )\G
*************************** 1. row ***************************
           id: 1
  select_type: PRIMARY
        table: subtable
         type: index
possible_keys: index_two_fields
          key: index_two_fields
      key_len: 8
          ref: NULL
         rows: 5000
        Extra: Using where; Using index; Start temporary
*************************** 2. row ***************************
           id: 1
  select_type: PRIMARY
        table: maintable
         type: ALL
possible_keys: NULL
          key: NULL
      key_len: NULL
          ref: NULL
         rows: 40000
        Extra: Using where; End temporary; Using join buffer
2 rows in set (0.00 sec)

MySQL 6.0.5, @@optimizer_switch='no_semijoin';
mysql> explain SELECT field1 FROM maintable WHERE field1 IN ( SELECT DISTINCT field2 FROM subtable WHERE field1 >1 AND field2 >1 )\G
*************************** 1. row ***************************
           id: 1
  select_type: PRIMARY
        table: maintable
         type: ALL
possible_keys: NULL
          key: NULL
      key_len: NULL
          ref: NULL
         rows: 40000
        Extra: Using where
*************************** 2. row ***************************
           id: 2
  select_type: SUBQUERY
        table: subtable
         type: index
possible_keys: index_two_fields
          key: index_two_fields
      key_len: 8
          ref: NULL
         rows: 5000
        Extra: Using where; Using index; Using temporary
2 rows in set (0.00 sec)

MySQL 6.0.5, @@optimizer_switch='';
mysql> explain SELECT field1 FROM maintable WHERE field1 IN ( SELECT max(field2) FROM subtable WHERE field1 >1 AND field2 >1 );
*************************** 1. row ***************************
           id: 1
  select_type: PRIMARY
        table: maintable
         type: ALL
possible_keys: NULL
          key: NULL
      key_len: NULL
          ref: NULL
         rows: 40000
        Extra: Using where
*************************** 2. row ***************************
           id: 2
  select_type: SUBQUERY
        table: subtable
         type: index
possible_keys: index_two_fields
          key: index_two_fields
      key_len: 8
          ref: NULL
         rows: 5000
        Extra: Using where; Using index
2 rows in set (0.00 sec)
[16 Oct 2008 15:01] Valeriy Kravchuk
Please, check if MySQL 6.0.6 solves this problem for you.
[17 Nov 2008 0:00] Bugs System
No feedback was provided for this bug for over a month, so it is
being suspended automatically. If you are able to provide the
information that was originally requested, please do so and change
the status of the bug back to "Open".
[13 Jul 2011 8:48] Shashidhar Y
Hi

I'm facing similar kind of issue what is reported here and at http://bugs.mysql.com/bug.php?id=14070.

I am using MySQL 5.0.77 on Linux box. I have sql query which was running just fine till last friday, but from this monday it started giving problem. I have a script which inserts some records this table, but this table neither responds nor exits. It simply hangs at State - Copying to tmp table.

I tried all the things, check table for errors, analyzed table, flushed it off and finally I went restarting "mysqld" service on the server. But I don't see any improvement, it simply hangs at "Copying to tmp table".

Kindly please note that, this sql was running fine till last Friday. It suddenly stopped and I'm able to do selects on this table. My query consists of joins on other table and that table too fine, no issues on that as well.

I'm asking here because this seems to be the problem reported here and in some of the bugs, it seems to be solved for few people (not sure). I don't know whether it is bug with MySQL, so kindly redirect me to appropriate place if it is bug (reported/solved) and if not accept my apologies and help me to know the workaround.
[28 Oct 2011 11:57] Valeriy Kravchuk
All reporters,

Dependent subquery problem is NOT solved in any current GA version of MySQL server. Fix is in 5.6.x (and discontinued 6.0.x) only.

So, please, send the exact query that hangs in "Copying to tmp table" status and EXPLAIN results for it (or for equivalent SELECT query with the same WHERE clause). Then we will be able to check if this is a duplicate and what can be done to workaround the problem.
[29 Nov 2011 7:00] Bugs System
No feedback was provided for this bug for over a month, so it is
being suspended automatically. If you are able to provide the
information that was originally requested, please do so and change
the status of the bug back to "Open".
[27 Mar 2012 14:03] Brian Dalby
Hi, 

I have a very similar bug. It hangs with 'Copying to tmp table', I have also seen it hanging with "sending data"... 
I would appreciate very much a solution to this problem.

select version(); output:
'5.1.41-3ubuntu12.10-log'

Select Query: 

SELECT SQL_CACHE
node.is_folder,
node.id,
node.name,
node.bug,
node.known_error,
(COUNT(parent.name)-1) AS level,
node.lft,
node.rgt,
node.status_id
FROM test_testcase AS node, test_testcase AS parent
WHERE node.lft BETWEEN parent.lft AND parent.rgt
GROUP BY node.lft
ORDER BY node.lft;

It gets 4783 results, and has been working for several years.

Explain string:
explain SELECT SQL_CACHE
node.is_folder,
node.id,
node.name,
node.bug,
node.known_error,
(COUNT(parent.name)-1) AS level,
node.lft,
node.rgt,
node.status_id
FROM test_testcase AS node, test_testcase AS parent
WHERE node.lft BETWEEN parent.lft AND parent.rgt 
GROUP BY node.lft
ORDER BY node.lft;

Result of Explain:
1, 'SIMPLE', 'node', 'ALL', 'indx_testcase_lft', '', '', '', 5607, 'Using temporary; Using filesort'
1, 'SIMPLE', 'parent', 'ALL', 'indx_testcase_rgt,indx_testcase_lft', '', '', '', 5607, 'Range checked for each record (index map: 0x6)'

Table create statement:
DROP TABLE IF EXISTS `beotest`.`test_testcase`;
CREATE TABLE  `beotest`.`test_testcase` (
  `id` int(10) unsigned NOT NULL AUTO_INCREMENT,
  `name` varchar(200) NOT NULL,
  `status_id` int(10) unsigned DEFAULT NULL,
  `steps` text,
  `description` text,
  `priority_id` int(10) unsigned DEFAULT NULL,
  `lft` int(10) unsigned NOT NULL,
  `rgt` int(10) unsigned NOT NULL,
  `known_error` varchar(100) DEFAULT NULL,
  `dependent_from_id` int(10) unsigned DEFAULT NULL,
  `is_folder` tinyint(4) NOT NULL DEFAULT '0',
  `latest_user_edit` varchar(45) NOT NULL,
  `bug` text,
  `creation_time` datetime NOT NULL DEFAULT '1000-01-01 00:00:00',
  PRIMARY KEY (`id`),
  KEY `indx_testcase_rgt` (`rgt`),
  KEY `indx_testcase_lft` (`lft`),
  KEY `tst_testcase_status` (`status_id`),
  KEY `tst_testcase_priority` (`priority_id`),
  KEY `tst_testcase_dependencies` (`dependent_from_id`),
) ENGINE=InnoDB AUTO_INCREMENT=4863 DEFAULT CHARSET=latin1;
[28 Mar 2012 12:02] Valeriy Kravchuk
Not sure how last comment is related to the original bug report (it was about subqueries). If you are not happy with the query plan and use InnoDB tables, try to run ANALYZE TABLE on them probably...
[28 Mar 2012 12:08] Brian Dalby
I know the original query had subqueries, but the description, heading and other details correspond to my problems.

But if my information cannot be used, then so be it.

Other than that, the bug seems to have dissapeared as suddently as it was created. I still fear it would come back, but I have not had the problem today.
[1 May 2012 1:00] Bugs System
No feedback was provided for this bug for over a month, so it is
being suspended automatically. If you are able to provide the
information that was originally requested, please do so and change
the status of the bug back to "Open".
[4 Oct 2012 21:57] Petr Medonos
Hi,
I just try 5.6.6 m9 and i think that this bug is still alive :).

query:

SELECT SQL_CALC_FOUND_ROWS DISTINCT item.id FROM item JOIN item_available ON item.id = item_available.item_id JOIN price ON price.item_id = item.id JOIN erp_object ON erp_object.id = item.id JOIN item_in_category iic ON iic.item_id = item.id JOIN attr_value av0 ON av0.item_id = item.id AND av0.attr_type_id = 321126 JOIN attr_value av1 ON av1.item_id = item.id AND av1.attr_type_id = 320074 JOIN attr_value av2 ON av2.item_id = item.id AND av2.attr_type_id = 320078 JOIN attr_value av3 ON av3.item_id = item.id AND av3.attr_type_id = 321125 JOIN attr_value av4 ON av4.item_id = item.id AND av4.attr_type_id = 320212 LEFT JOIN item_bind ON item.id = item_bind.item_id AND item_bind.bind_type = 'ACC_GIFT' WHERE  item_available.hidden = 0 AND item_available.shop = 'ESHOP' AND item_available.segment_id = 1 AND item.deleted = 0 AND price.valid_from <= now() AND price.valid_to > now() AND price.price > 0 AND price.price_type = 'ESHOP' AND price.segment_id = item_available.segment_id AND iic.category_id IN (319928,349521,319927) AND item.manufacturer_id = 319389 AND av0.val_txt_cz = 'Ano' AND av1.val_txt_cz = 'B' AND av2.val_txt_cz = 'A' AND av3.val_txt_cz = 'předem' AND (av4.val_float >= 6.0 AND av4.val_float < 7.0) ORDER BY  (item.id not IN (338027,299778,291181,312654,15043,299777,235725,239521,239518,335867,338792,145334,167932,239520,435382,428675,452489,436892,447584,428557,426148,451912,450062,436264,436887,436891,436889,437376,432623,436749,440102,434953,436929,425104,437302,428676,451913,451914,437205,436207,338275,434973,428674,452997,434952,451602,338835,341539,327618,390961,396327,312176,411699,450076,395120,418222,321223,355306,392891,392886,316864,138288,435414,434421,315924,425488,341511,339003,315970,299898,449919,314538,315971,338285,321222,9812,327355,253009,331045,284154,344704,405410,16977,316156,17782,436197,331769,138281,407995,236376,284155,284159,436446,218017,230084,344728,408433,284158,396337,327623,453036,435658,423147,341509,423144,352298,423146,434658,415856,451898,451897,451895,397755,405634,408674,408675,415858,426116,423627,425926,437385,450038,434655,453035,354285,440652,440732,440734,404638,318616,426121,452227,440735,451899,451896,415855,415857,442041,438151,452226,452228,402695,396551,423618,425791,426117,392584,413262,435424,423628,438284,384921,438286,438285,384890,384886,384904,384653,384905,384877,384898,390466,390471,251655)), item_available.available_id=6 DESC, item_available.available_id=14 DESC, item_available.available_id, item_bind.item_id is null, item.description_cz DESC, item.id LIMIT 0,10;

This query runs about 12 min. When i use straight join hint its just about 0.01s. Both explains are in attachment.

PMP shows most of the time this thread stack:

??,??,??,handler::ha_index_read_map(unsigned,??,sub_select(JOIN*,,JOIN_CACHE::generate_full_extensions(unsigned,JOIN_CACHE_BNL::join_matching_records(bool),JOIN_CACHE::join_records(bool),JOIN_CACHE::join_records(bool),sub_select_op(JOIN*,,JOIN::exec(),mysql_select(THD*,,handle_select(THD*,,??,mysql_execute_command(THD*),mysql_parse(THD*,,dispatch_command(enum_server_command,,do_handle_one_connection(THD*),handle_one_connection,pfs_spawn_thread,start_thread,clone

I can provide description of all used tables or full pmp log (one second sampling) if it helps.
[4 Oct 2012 21:57] Petr Medonos
explain

Attachment: explain_28257.txt (text/plain), 7.75 KiB.

[14 Jan 2013 17:08] Matthew Lord
Hi Peter,

I have corrected the synopsis to match what this bug report was actually about.

This bug report was not about the generic symptom of "copying to tmp table" or "sending data", which is the symptom for any slow query and is way too generic for a bug report. This particular bug report was about a specific cause of that symptom, that being the lack of indexes used for dependent subqueries.

This specific issue was fixed in MySQL 5.6, so I will mark this as closed and fixed in 5.6.

If you feel that you have another optimizer bug in 5.6, which it sounds like you likely do, can you please open another bug report and provide a repeatable test case there so that we can take a closer look?

Thanks!

Matt
[14 Jan 2013 17:13] Matthew Lord
This general issue was resolved in MySQL 5.6:
  https://dev.mysql.com/doc/refman/5.6/en/subquery-optimization.html

If you have a specific example where you can demonstrate that subqueries are still not being handled optimally in MySQL 5.6, then please open a new bug report with a working test case that demonstrates the specific case. Then we can examine the specific case and work towards some specific improvements.