Bug #35468 Slowdown and wrong result for uncorrelated subquery w/o where
Submitted: 20 Mar 2008 20:01 Modified: 22 Nov 2010 0:43
Reporter: Alexey Stroganov Email Updates:
Status: Closed Impact on me:
None 
Category:MySQL Server: Optimizer Severity:S3 (Non-critical)
Version:6.0.4 OS:Any
Assigned to: Sergey Petrunya CPU Architecture:Any
Tags: LooseScan, subqueries

[20 Mar 2008 20:01] Alexey Stroganov
Description:
There are couple issues with execution of following query:

select count(expr_key) from ot where expr_key in (select expr_key FROM it);

1. Execution of such query with new optimizations (semijoin or materialization)
is slower than without. For semijoin it likely caused by second issue (see below).

semijoin(loosescan)(wrong)=1.24sec
materialization=0.25sec
without optimizations=0.02sec

set @@optimizer_switch='';
Query OK, 0 rows affected (0.00 sec)

mysql> explain select count(expr_key) from ot where expr_key in ( SELECT  expr_key FROM it);
+----+-------------+-------+-------+---------------+----------+---------+--------------------+--------+-------------+
| id | select_type | table | type  | possible_keys | key      | key_len | ref                | rows   | Extra       |
+----+-------------+-------+-------+---------------+----------+---------+--------------------+--------+-------------+
|  1 | PRIMARY     | it    | index | expr_key      | PRIMARY  | 4       | NULL               | 320000 | LooseScan   |
|  1 | PRIMARY     | ot    | ref   | expr_key      | expr_key | 4       | test10.it.expr_key |      1 | Using index |
+----+-------------+-------+-------+---------------+----------+---------+--------------------+--------+-------------+
2 rows in set (0.00 sec)

mysql> select count(expr_key) from ot where expr_key in ( SELECT  expr_key FROM it);
+-----------------+
| count(expr_key) |
+-----------------+
|          320000 |
+-----------------+
1 row in set (1.24 sec)

mysql> set @@optimizer_switch='no_semijoin';
Query OK, 0 rows affected (0.00 sec)

mysql> explain select count(expr_key) from ot where expr_key in ( SELECT  expr_key FROM it);
+----+-------------+-------+-------+---------------+----------+---------+------+--------+--------------------------+
| id | select_type | table | type  | possible_keys | key      | key_len | ref  | rows   | Extra                    |
+----+-------------+-------+-------+---------------+----------+---------+------+--------+--------------------------+
|  1 | PRIMARY     | ot    | index | NULL          | expr_key | 4       | NULL |  10000 | Using where; Using index |
|  2 | SUBQUERY    | it    | index | NULL          | expr_key | 4       | NULL | 320000 | Using index              |
+----+-------------+-------+-------+---------------+----------+---------+------+--------+--------------------------+
2 rows in set (0.00 sec)

mysql> select count(expr_key) from ot where expr_key in ( SELECT  expr_key FROM it);
+-----------------+
| count(expr_key) |
+-----------------+
|           10000 |
+-----------------+
1 row in set (0.25 sec)

mysql> set @@optimizer_switch='no_materialization,no_semijoin';
Query OK, 0 rows affected (0.00 sec)

mysql> explain select count(expr_key) from ot where expr_key in ( SELECT  expr_key FROM it);
+----+--------------------+-------+----------------+---------------+----------+---------+------+-------+--------------------------+
| id | select_type        | table | type           | possible_keys | key      | key_len | ref  | rows  | Extra                    |
+----+--------------------+-------+----------------+---------------+----------+---------+------+-------+--------------------------+
|  1 | PRIMARY            | ot    | index          | NULL          | expr_key | 4       | NULL | 10000 | Using where; Using index |
|  2 | DEPENDENT SUBQUERY | it    | index_subquery | expr_key      | expr_key | 4       | func |    32 | Using index              |
+----+--------------------+-------+----------------+---------------+----------+---------+------+-------+--------------------------+
2 rows in set (0.00 sec)

mysql> select count(expr_key) from ot where expr_key in ( SELECT  expr_key FROM it);
+-----------------+
| count(expr_key) |
+-----------------+
|           10000 |
+-----------------+
1 row in set (0.02 sec)

2. Query returns wrong result when optimizer chooses SJ(Loosescan) strategy.

mysql> select count(expr_key) from ot where expr_key in ( SELECT  expr_key FROM it);
+-----------------+
| count(expr_key) |
+-----------------+
|          320000 |
+-----------------+

instead of 

mysql> set @@optimizer_switch='no_materialization,no_semijoin';
Query OK, 0 rows affected (0.00 sec)

mysql> select count(expr_key) from ot where expr_key in ( SELECT  expr_key FROM it);
+-----------------+
| count(expr_key) |
+-----------------+
|           10000 |
+-----------------+

How to repeat:
1. Run attached script to generate/load data with following command line:

perl subquery-datagen.pl --fanout=32 --ot-rows=10000 --generate --load-data --socket=/tmp/mysql.sock --user=root --database=test10

2. Repeat necessary steps from description.
[20 Mar 2008 20:03] Alexey Stroganov
Datagen/loader tool for test case

Attachment: subquery-datagen.pl (application/x-perl, text), 6.97 KiB.

[3 May 2008 5:37] Sergey Petrunya
With the latest mysql-6.0-opt, I get:

mysql> explain select count(expr_key) from ot where expr_key in ( SELECT  expr_key FROM it)\G
*************************** 1. row ***************************
           id: 1
  select_type: PRIMARY
        table: ot
         type: index
possible_keys: NULL
          key: expr_key
      key_len: 4
          ref: NULL
         rows: 10000
        Extra: Using where; Using index
*************************** 2. row ***************************
           id: 2
  select_type: DEPENDENT SUBQUERY
        table: it
         type: index_subquery
possible_keys: expr_key
          key: expr_key
      key_len: 4
          ref: func
         rows: 1
        Extra: Using index
2 rows in set (0.00 sec)

Considering the table it has 50K rows, and it.expr_key has 50K distinct values, this is the best join order and the best strategy to run this semi-join.

Query times obtained by re-running the query w/ different values of @@optimizer_switch:
semi-join:        0.17 sec
materialization:  0.29 sec
no optimizations: 0.18 sec
[3 May 2008 6:08] Sergey Petrunya
Oops. Please disregard the last comment. I was working on the wrong dataset. The attached subquery-datagen.pl actually doesn't understand --fanout=32 option and so has generated the data with --fanout=5.  Using the script attached to BUG#35032, I can repeat.
[3 May 2008 7:39] Sergey Petrunya
mysql> explain select count(expr_key) from ot where expr_key in ( SELECT  expr_key FROM it);
+----+-------------+-------+-------+---------------+----------+---------+-----------------
---+--------+-------------+
| id | select_type | table | type  | possible_keys | key      | key_len | ref             
  | rows   | Extra       |
+----+-------------+-------+-------+---------------+----------+---------+-----------------
---+--------+-------------+
|  1 | PRIMARY     | it    | index | expr_key      | PRIMARY  | 4       | NULL            
  | 320000 | LooseScan   |

The problem lies here. Loose index scan should be not done on index expr_key, not on PRIMARY index.
[28 May 2008 5:55] Bugs System
A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/47116

ChangeSet@1.2638, 2008-05-28 09:54:00+04:00, sergefp@mysql.com +9 -0
  BUG#35468: Slowdown and wrong result for uncorrelated subquery w/o where
  - Moved LooseScan functionality from READ_RECORD functions to join runtime.
    (This enables use of insideout together with range access)
  - Let best_access_path() save the number of the index to be used by the
    insideout scan and #keyparts (just keeping a flag wasn't enough for 
    full-index insideout scans).
  - Made setup_semijoin_dups_elimination() define and use an enum instead 
    of numeric constants.
[29 May 2008 3:09] Bugs System
A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/47180

ChangeSet@1.2638, 2008-05-29 07:07:29+04:00, sergefp@mysql.com +9 -0
  BUG#35468: Slowdown and wrong result for uncorrelated subquery w/o where
  - Moved LooseScan functionality from READ_RECORD functions to join runtime.
    (This enables use of insideout together with range access)
  - Let best_access_path() save the number of the index to be used by the
    insideout scan and #keyparts (just keeping a flag wasn't enough for 
    full-index insideout scans).
  - Made setup_semijoin_dups_elimination() define and use an enum instead 
    of numeric constants.
  - Addressed review feedback
[31 May 2008 7:33] Sergey Petrunya
Pushed into bzr_6.0-opt tree
[31 May 2008 17:17] Bugs System
A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/47280

2649 Sergey Petrunia	2008-05-31
      BUG#35468: Slowdown and wrong result for uncorrelated subquery w/o where
      - Moved LooseScan functionality from READ_RECORD functions to join runtime.
        (This enables use of insideout together with range access)
      - Let best_access_path() save the number of the index to be used by the
        insideout scan and #keyparts (just keeping a flag wasn't enough for 
        full-index insideout scans).
      - Made setup_semijoin_dups_elimination() define and use an enum instead 
        of numeric constants.
      - Addressed review feedback
[31 May 2008 17:29] Bugs System
A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/47283

2649 Sergey Petrunia	2008-05-31
      BUG#35468: Slowdown and wrong result for uncorrelated subquery w/o where
      - Moved LooseScan functionality from READ_RECORD functions to join runtime.
        (This enables use of insideout together with range access)
      - Let best_access_path() save the number of the index to be used by the
        insideout scan and #keyparts (just keeping a flag wasn't enough for 
        full-index insideout scans).
      - Made setup_semijoin_dups_elimination() define and use an enum instead 
        of numeric constants.
      - Addressed review feedback
[28 Aug 2008 20:15] Bugs System
Pushed into 6.0.7-alpha  (revid:cbell@mysql.com-20080822132131-uveo6wiuecy6m2b8) (version source revid:cbell@mysql.com-20080822132131-uveo6wiuecy6m2b8) (pib:3)
[2 Sep 2008 18:37] Paul DuBois
Noted in 6.0.7 changelog.

For uncorrelated subqueries without a WHERE clause, use of semi-join
or materialization options could result in slow performance, or use
of the LooseScan strategy could produce incorrect results.
[13 Sep 2008 22:22] Bugs System
Pushed into 6.0.6-alpha  (revid:sergefp@mysql.com-20080531071457-utx02hn5wr0d58sp) (version source revid:hakan@mysql.com-20080716105246-eg0utbybp122n2w9) (pib:3)
[16 Aug 2010 6:33] Bugs System
Pushed into mysql-next-mr (revid:alik@sun.com-20100816062819-bluwgdq8q4xysmlg) (version source revid:alik@sun.com-20100816062612-enatdwnv809iw3s9) (pib:20)
[13 Nov 2010 16:25] Bugs System
Pushed into mysql-trunk 5.6.99-m5 (revid:alexander.nozdrin@oracle.com-20101113155825-czmva9kg4n31anmu) (version source revid:vasil.dimov@oracle.com-20100629074804-359l9m9gniauxr94) (merge vers: 5.6.99-m4) (pib:21)
[22 Nov 2010 0:43] Paul DuBois
Noted in 5.6.1 changelog.
[23 Nov 2010 2:21] Paul DuBois
Correction: No 5.6.1 changelog entry. Bug does not appear in any released 5.6.x version.