Bug #36449 Significant slowdown for (ANY|SOME) subqueries processed with semijoin
Submitted: 1 May 2008 10:09 Modified: 29 Dec 2008 17:22
Reporter: Alexey Stroganov Email Updates:
Status: Closed Impact on me:
None 
Category:MySQL Server: Optimizer Severity:S2 (Serious)
Version:6.0.6-pre OS:Any
Assigned to: Sergey Petrunya CPU Architecture:Any
Tags: subquery benchmark

[1 May 2008 10:09] Alexey Stroganov
Description:
This issue was exposed by mysql-bench test. Below are two queries that perform
in more than 2000 times slower than in 6.0.2.

select * from bench5 where c = any (select c from bench6)
select * from bench5 where c = some (select c from bench6)

Tables structures:
CREATE TABLE `bench5` (
  `id` int(11) NOT NULL,
  `b` int(11) DEFAULT NULL,
  `c` char(3) DEFAULT NULL,
  PRIMARY KEY (`id`),
  KEY `bench6i1` (`b`)
) ENGINE=MyISAM;

CREATE TABLE `bench6` (
  `id` int(11) NOT NULL,
  `b` int(11) DEFAULT NULL,
  `c` char(3) DEFAULT NULL,
  PRIMARY KEY (`id`),
  KEY `bench6i` (`b`)
) ENGINE=MyISAM;

mysql> explain select * from bench5 where c = some (select c from bench6);
+----+-------------+--------+------+---------------+------+---------+------+-------+-----------------------------------------------+
| id | select_type | table  | type | possible_keys | key  | key_len | ref  | rows  | Extra                                         |
+----+-------------+--------+------+---------------+------+---------+------+-------+-----------------------------------------------+
|  1 | PRIMARY     | bench5 | ALL  | NULL          | NULL | NULL    | NULL | 10000 | Start temporary                               |
|  1 | PRIMARY     | bench6 | ALL  | NULL          | NULL | NULL    | NULL | 10000 | Using where; End temporary; Using join buffer |
+----+-------------+--------+------+---------------+------+---------+------+-------+-----------------------------------------------+
2 rows in set (0.00 sec)

mysql> explain select * from bench5 where c = any (select c from bench6);
+----+-------------+--------+------+---------------+------+---------+------+-------+-----------------------------------------------+
| id | select_type | table  | type | possible_keys | key  | key_len | ref  | rows  | Extra                                         |
+----+-------------+--------+------+---------------+------+---------+------+-------+-----------------------------------------------+
|  1 | PRIMARY     | bench5 | ALL  | NULL          | NULL | NULL    | NULL | 10000 | Start temporary                               |
|  1 | PRIMARY     | bench6 | ALL  | NULL          | NULL | NULL    | NULL | 10000 | Using where; End temporary; Using join buffer |
+----+-------------+--------+------+---------------+------+---------+------+-------+-----------------------------------------------+
2 rows in set (0.00 sec)

mysql> set @@optimizer_switch='no_semijoin';
Query OK, 0 rows affected (0.00 sec)

mysql> explain select * from bench5 where c = some (select c from bench6);
+----+-------------+--------+------+---------------+------+---------+------+-------+-------------+
| id | select_type | table  | type | possible_keys | key  | key_len | ref  | rows  | Extra       |
+----+-------------+--------+------+---------------+------+---------+------+-------+-------------+
|  1 | PRIMARY     | bench5 | ALL  | NULL          | NULL | NULL    | NULL | 10000 | Using where |
|  2 | SUBQUERY    | bench6 | ALL  | NULL          | NULL | NULL    | NULL | 10000 |             |
+----+-------------+--------+------+---------------+------+---------+------+-------+-------------+
2 rows in set (0.00 sec)

mysql> explain select * from bench5 where c = any (select c from bench6);
+----+-------------+--------+------+---------------+------+---------+------+-------+-------------+
| id | select_type | table  | type | possible_keys | key  | key_len | ref  | rows  | Extra       |
+----+-------------+--------+------+---------------+------+---------+------+-------+-------------+
|  1 | PRIMARY     | bench5 | ALL  | NULL          | NULL | NULL    | NULL | 10000 | Using where |
|  2 | SUBQUERY    | bench6 | ALL  | NULL          | NULL | NULL    | NULL | 10000 |             |
+----+-------------+--------+------+---------------+------+---------+------+-------+-------------+
2 rows in set (0.00 sec)

mysql> select * from bench5 where c = some (select c from bench6);
...
+------+------+------+
10000 rows in set (2 min 54.77 sec)

mysql> set @@optimizer_switch='no_semijoin';
Query OK, 0 rows affected (0.00 sec)

mysql> select * from bench5 where c = some (select c from bench6);
....
+------+------+------+
10000 rows in set (0.06 sec)

How to repeat:
- Load attached dump:
mysql -uroot test < bug.subq.any.some.dump

- Run following queries:
select * from bench5 where c = any (select c from bench6)
select * from bench5 where c = some (select c from bench6)
[1 May 2008 10:10] Alexey Stroganov
SQL file for the issue

Attachment: bug.subq.any.some.dump (application/octet-stream, text), 310.46 KiB.

[8 May 2008 5:11] Sergey Petrunya
More data gathering:

MySQL 6.0.2 didn't include any 6.0 subquery optimizations. EXPLAINs and query times:

mysql> explain select * from bench5 where c = any (select c from bench6)\G
*************************** 1. row ***************************
           id: 1
  select_type: PRIMARY
        table: bench5
         type: ALL
possible_keys: NULL
          key: NULL
      key_len: NULL
          ref: NULL
         rows: 10000
        Extra: Using where
*************************** 2. row ***************************
           id: 2
  select_type: DEPENDENT SUBQUERY
        table: bench6
         type: ALL
possible_keys: NULL
          key: NULL
      key_len: NULL
          ref: NULL
         rows: 10000
        Extra: Using where

Query time: 0.12 sec

mysql> explain select * from bench5 where c = some (select c from bench6)\G
*************************** 1. row ***************************
           id: 1
  select_type: PRIMARY
        table: bench5
         type: ALL
possible_keys: NULL
          key: NULL
      key_len: NULL
          ref: NULL
         rows: 10000
        Extra: Using where
*************************** 2. row ***************************
           id: 2
  select_type: DEPENDENT SUBQUERY
        table: bench6
         type: ALL
possible_keys: NULL
          key: NULL
      key_len: NULL
          ref: NULL
         rows: 10000
        Extra: Using where

Query time: 0.12 sec

If I replace "select *"  with "select sum(b)", query time is 0.08 sec for both
queries.
[8 May 2008 20:54] Sergey Petrunya
More investigations
===================

The difference between the two queries is purely syntactical. MySQL converts
both x=ANY and x=SOME to  x IN (...) when parsing. So we're actually looking
at only one query.

The outer expression has only one value: bench5.c='GPL' always.
The inner expression has two possible values:
mysql> select c, count(*) from bench6 group by c ;
+------+----------+
| c    | count(*) |
+------+----------+
| BSD  |     5000 | 
| GPL  |     5000 | 
+------+----------+

Handler counters analysis
-------------------------

@@optimizer_switch='no_semijoin,no_materialization': 
Query time 0.19 sec 
| Handler_read_rnd_next      | 20001 | 
IN->EXISTS scans each of the table once (can't get any faster)

@@optimizer_switch='no_semijoin';
Query time 0.10 sec 
| Handler_read_rnd_next      | 20002 | 
| Handler_write              | 10002 | 
Materialization scans each table once and also makes 10K writes into
temp table. The temporary table actually has only two elements.

 
@@optimizer_switch='';
Query time 2 min 54 sec
| Handler_read_rnd_next      | 30003    | 
| Handler_write              | 50000003 | 

One of the tables is read two times (perhaps join buffer is not big enough
to make just one sweep), and there are 50M duplicate elimination checks. It
seems that they are the problem.

Note that if we try the corresponding join it will also be very slow:

mysql>  select count(*) from bench5,bench6 where bench5.c = bench6.c;
+----------+
| count(*) |
+----------+
| 50000000 | 
+----------+
1 row in set (1 min 4.61 sec)
[8 May 2008 20:56] Sergey Petrunya
This bug should be addressed by WL#3985.
[25 Nov 2008 16:11] Sergey Petrunya
With WL#3985code, I get this:

mysql> set @@optimizer_switch='no_materialization,no_semijoin';
Query OK, 0 rows affected (0.00 sec)

mysql> explain select * from bench5 where c = any (select c from bench6)\G
*************************** 1. row ***************************
           id: 1
  select_type: PRIMARY
        table: bench5
         type: ALL
possible_keys: NULL
          key: NULL
      key_len: NULL
          ref: NULL
         rows: 10000
        Extra: Using where
*************************** 2. row ***************************
           id: 2
  select_type: DEPENDENT SUBQUERY
        table: bench6
         type: ALL
possible_keys: NULL
          key: NULL
      key_len: NULL
          ref: NULL
         rows: 10000
        Extra: Using where
2 rows in set (0.02 sec)

mysql> set @@optimizer_switch='';
Query OK, 0 rows affected (0.00 sec)

mysql> explain select sum(b) from bench5 where c = any (select c from bench6)\G
*************************** 1. row ***************************
           id: 1
  select_type: PRIMARY
        table: bench5
         type: ALL
possible_keys: NULL
          key: NULL
      key_len: NULL
          ref: NULL
         rows: 10000
        Extra: 
*************************** 2. row ***************************
           id: 1
  select_type: PRIMARY
        table: bench6
         type: ALL
possible_keys: NULL
          key: NULL
      key_len: NULL
          ref: NULL
         rows: 10000
        Extra: Using where; FirstMatch(bench5)
2 rows in set (0.13 sec)

Average query run times on my machine:

'no_materialization,no_semijoin':  0.11 sec 
(all optimizations on)          :  0.07 sec
[25 Nov 2008 16:15] Sergey Petrunya
That is, WL#3985 code makes a cost-based decision to use the FirstMatch strategy. FirstMatch execution is similar to the pre-6.0 execution strategy of this query. 

I'm not sure why FirstMatch it is faster - most likely running FirstMatch in a semi-join is faster than entering/leaving subquery re-evaluation code.
[25 Nov 2008 17:27] Sergey Petrunya
The patch is WL#3985 code
[29 Dec 2008 17:21] Sergey Petrunya
Changing status to closed as WL#3985 has been pushed into main tree.