Bug #27452 Correlated subquery produces inefficient optimizer plan for INFORMATION_SCHEMA
Submitted: 26 Mar 2007 17:35 Modified: 17 Nov 2008 19:38
Reporter: Jay Pipes Email Updates:
Status: Verified Impact on me:
None 
Category:MySQL Server: Optimizer Severity:S3 (Non-critical)
Version:5.0.40-BK, 5.0.24a-debian-log, 5.1.31-bzr, 6.0.7-alpha, 6.0.9-bzr OS:Linux
Assigned to: CPU Architecture:Any
Tags: I_S, performance, subqueries, subquery

[26 Mar 2007 17:35] Jay Pipes
Description:
Hi!  This is the second of two related bugs.  The first bug is Bug#27451 "INFORMATION_SCHEMA with EXPLAIN and derived table crashes server" which uses the same I_S queries seen below.

This bug shows a severe performance degradation of queries against the I_S virtual database when correlated subqueries are used in place of a derived table (subquery in the FROM clause).

Technically, the two statements here should produce the same results:

SELECT
  t.TABLE_SCHEMA
 , t.TABLE_NAME
 , s.INDEX_NAME
 , s.COLUMN_NAME
 , s.SEQ_IN_INDEX
 , s2.max_columns AS "COLS_IN_INDEX"
 , s.CARDINALITY AS "CARD"
 , t.TABLE_ROWS AS "ROWS"
 , ROUND(((s.CARDINALITY / IFNULL(t.TABLE_ROWS, 0.01)) * 100), 2) AS "SEL %"
FROM INFORMATION_SCHEMA.STATISTICS s
 INNER JOIN INFORMATION_SCHEMA.TABLES t
  ON s.TABLE_SCHEMA = t.TABLE_SCHEMA
  AND s.TABLE_NAME = t.TABLE_NAME
 INNER JOIN (
  SELECT 
     TABLE_SCHEMA
   , TABLE_NAME
   , INDEX_NAME
   , MAX(SEQ_IN_INDEX) AS max_columns
  FROM INFORMATION_SCHEMA.STATISTICS
  WHERE TABLE_SCHEMA != 'mysql'
  GROUP BY TABLE_SCHEMA, TABLE_NAME, INDEX_NAME
 ) AS s2
 ON s.TABLE_SCHEMA = s2.TABLE_SCHEMA
 AND s.TABLE_NAME = s2.TABLE_NAME
 AND s.INDEX_NAME = s2.INDEX_NAME
WHERE t.TABLE_SCHEMA != 'mysql'
AND t.TABLE_ROWS > 10
AND s.CARDINALITY IS NOT NULL
AND (s.CARDINALITY / IFNULL(t.TABLE_ROWS, 0.01)) < 1.00
ORDER BY "SEL %", TABLE_SCHEMA, TABLE_NAME 
LIMIT 10;

SELECT
  t.TABLE_SCHEMA
 , t.TABLE_NAME
 , s.INDEX_NAME
 , s.COLUMN_NAME
 , s.SEQ_IN_INDEX
 , (
   SELECT MAX(SEQ_IN_INDEX)
   FROM INFORMATION_SCHEMA.STATISTICS s2
   WHERE s.TABLE_SCHEMA = s2.TABLE_SCHEMA
   AND s.TABLE_NAME = s2.TABLE_NAME
   AND s.INDEX_NAME = s2.INDEX_NAME
  ) AS "COLS_IN_INDEX"
 , s.CARDINALITY AS "CARD"
 , t.TABLE_ROWS AS "ROWS"
 , ROUND(((s.CARDINALITY / IFNULL(t.TABLE_ROWS, 0.01)) * 100), 2) AS "SEL %"
FROM INFORMATION_SCHEMA.STATISTICS s
 INNER JOIN INFORMATION_SCHEMA.TABLES t
  ON s.TABLE_SCHEMA = t.TABLE_SCHEMA
  AND s.TABLE_NAME = t.TABLE_NAME
WHERE t.TABLE_SCHEMA != 'mysql'
AND t.TABLE_ROWS > 10
AND s.CARDINALITY IS NOT NULL
AND (s.CARDINALITY / IFNULL(t.TABLE_ROWS, 0.01)) < 1.00
ORDER BY "SEL %", TABLE_SCHEMA, TABLE_NAME
LIMIT 10;

The top query uses a derived table and the bottom uses a correlated subquery in the SELECT clause.  You can see from the results below the *massive* performance difference favoring the derived table query.  Clearly, something is amiss with the subquery optimization here...  The results below are easily reproduceable across many requests, varying only by a few milliseconds.

How to repeat:
jpipes@shakedown:~$ mysql --user=root
Welcome to the MySQL monitor.  Commands end with ; or \g.
Your MySQL connection id is 2 to server version: 5.0.24a-Debian_9ubuntu0.1-log

Type 'help;' or '\h' for help. Type '\c' to clear the buffer.

mysql> SELECT
    ->   t.TABLE_SCHEMA
    ->  , t.TABLE_NAME
    ->  , s.INDEX_NAME
    ->  , s.COLUMN_NAME
    ->  , s.SEQ_IN_INDEX
    ->  , (
    ->    SELECT MAX(SEQ_IN_INDEX)
    ->    FROM INFORMATION_SCHEMA.STATISTICS s2
    ->    WHERE s.TABLE_SCHEMA = s2.TABLE_SCHEMA
    ->    AND s.TABLE_NAME = s2.TABLE_NAME
    ->    AND s.INDEX_NAME = s2.INDEX_NAME
    ->   ) AS "COLS_IN_INDEX"
    ->  , s.CARDINALITY AS "CARD"
    ->  , t.TABLE_ROWS AS "ROWS"
    ->  , ROUND(((s.CARDINALITY / IFNULL(t.TABLE_ROWS, 0.01)) * 100), 2) AS "SEL %"
    -> FROM INFORMATION_SCHEMA.STATISTICS s
    ->  INNER JOIN INFORMATION_SCHEMA.TABLES t
    ->   ON s.TABLE_SCHEMA = t.TABLE_SCHEMA
    ->   AND s.TABLE_NAME = t.TABLE_NAME
    -> WHERE t.TABLE_SCHEMA != 'mysql'
    -> AND t.TABLE_ROWS > 10
    -> AND s.CARDINALITY IS NOT NULL
    -> AND (s.CARDINALITY / IFNULL(t.TABLE_ROWS, 0.01)) < 1.00
    -> ORDER BY "SEL %", TABLE_SCHEMA, TABLE_NAME
    -> LIMIT 10;
+--------------+------------+-----------------------------+----------------------+--------------+---------------+------+-------+-------+
| TABLE_SCHEMA | TABLE_NAME | INDEX_NAME                  | COLUMN_NAME          | SEQ_IN_INDEX | COLS_IN_INDEX | CARD | ROWS  | SEL % |
+--------------+------------+-----------------------------+----------------------+--------------+---------------+------+-------+-------+
| worklog      | amendments | text                        | text                 |            1 |             1 |    1 | 33794 |  0.00 | 
| planetmysql  | entries    | categories                  | categories           |            1 |             3 |    1 |  4171 |  0.02 | 
| planetmysql  | entries    | categories                  | title                |            2 |             3 |    1 |  4171 |  0.02 | 
| planetmysql  | entries    | categories                  | content              |            3 |             3 |    1 |  4171 |  0.02 | 
| sakila       | payment    | idx_fk_staff_id             | staff_id             |            1 |             1 |    3 | 15422 |  0.02 | 
| sakila       | rental     | idx_fk_staff_id             | staff_id             |            1 |             1 |    4 | 16325 |  0.02 | 
| worklog      | tasks      | title                       | description          |            2 |             2 |    1 |  3567 |  0.03 | 
| worklog      | tasks      | title                       | title                |            1 |             2 |    1 |  3567 |  0.03 | 
| sakila       | inventory  | idx_store_id_film_id        | store_id             |            1 |             2 |    2 |  4673 |  0.04 | 
| sakila       | film       | idx_fk_original_language_id | original_language_id |            1 |             1 |    1 |   953 |  0.10 | 
+--------------+------------+-----------------------------+----------------------+--------------+---------------+------+-------+-------+
10 rows in set (13.21 sec)

mysql> SELECT
    ->   t.TABLE_SCHEMA
    ->  , t.TABLE_NAME
    ->  , s.INDEX_NAME
    ->  , s.COLUMN_NAME
    ->  , s.SEQ_IN_INDEX
    ->  , s2.max_columns AS "COLS_IN_INDEX"
    ->  , s.CARDINALITY AS "CARD"
    ->  , t.TABLE_ROWS AS "ROWS"
    ->  , ROUND(((s.CARDINALITY / IFNULL(t.TABLE_ROWS, 0.01)) * 100), 2) AS "SEL %"
    -> FROM INFORMATION_SCHEMA.STATISTICS s
    ->  INNER JOIN INFORMATION_SCHEMA.TABLES t
    ->   ON s.TABLE_SCHEMA = t.TABLE_SCHEMA
    ->   AND s.TABLE_NAME = t.TABLE_NAME
    ->  INNER JOIN (
    ->   SELECT 
    ->      TABLE_SCHEMA
    ->    , TABLE_NAME
    ->    , INDEX_NAME
    ->    , MAX(SEQ_IN_INDEX) AS max_columns
    ->   FROM INFORMATION_SCHEMA.STATISTICS
    ->   WHERE TABLE_SCHEMA != 'mysql'
    ->   GROUP BY TABLE_SCHEMA, TABLE_NAME, INDEX_NAME
    ->  ) AS s2
    ->  ON s.TABLE_SCHEMA = s2.TABLE_SCHEMA
    ->  AND s.TABLE_NAME = s2.TABLE_NAME
    ->  AND s.INDEX_NAME = s2.INDEX_NAME
    -> WHERE t.TABLE_SCHEMA != 'mysql'
    -> AND t.TABLE_ROWS > 10
    -> AND s.CARDINALITY IS NOT NULL
    -> AND (s.CARDINALITY / IFNULL(t.TABLE_ROWS, 0.01)) < 1.00
    -> ORDER BY "SEL %", TABLE_SCHEMA, TABLE_NAME 
    -> LIMIT 10;
+--------------+------------+----------------------+-------------+--------------+---------------+------+-------+-------+
| TABLE_SCHEMA | TABLE_NAME | INDEX_NAME           | COLUMN_NAME | SEQ_IN_INDEX | COLS_IN_INDEX | CARD | ROWS  | SEL % |
+--------------+------------+----------------------+-------------+--------------+---------------+------+-------+-------+
| worklog      | amendments | text                 | text        |            1 |             1 |    1 | 33794 |  0.00 | 
| planetmysql  | entries    | categories           | categories  |            1 |             3 |    1 |  4171 |  0.02 | 
| planetmysql  | entries    | categories           | title       |            2 |             3 |    1 |  4171 |  0.02 | 
| planetmysql  | entries    | categories           | content     |            3 |             3 |    1 |  4171 |  0.02 | 
| sakila       | rental     | idx_fk_staff_id      | staff_id    |            1 |             1 |    3 | 16291 |  0.02 | 
| worklog      | tasks      | title                | description |            2 |             2 |    1 |  3567 |  0.03 | 
| worklog      | tasks      | title                | title       |            1 |             2 |    1 |  3567 |  0.03 | 
| sakila       | inventory  | idx_store_id_film_id | store_id    |            1 |             2 |    2 |  4673 |  0.04 | 
| sakila       | payment    | idx_fk_staff_id      | staff_id    |            1 |             1 |    6 | 15123 |  0.04 | 
| sakila       | film       | idx_fk_language_id   | language_id |            1 |             1 |    1 |  1058 |  0.09 | 
+--------------+------------+----------------------+-------------+--------------+---------------+------+-------+-------+
10 rows in set (0.60 sec)

Suggested fix:
None.
[26 Mar 2007 19:12] Valeriy Kravchuk
Thank you for a problem report. Yes, I verified that, even on latest 5.0.40-BK, query with correlated subquery runs substantially slower. But this is expected - correlated subquery is executed many times, while derived table is built only once. Until optimizer will automagically rewrite query with dependent subquery using derived table, nothing can be done. This will be eventually done, but not any time soon.
[16 Oct 2008 15:05] Valeriy Kravchuk
Please, check if MySQL 6.0.6 solves this problem for you.
[17 Nov 2008 0:00] Bugs System
No feedback was provided for this bug for over a month, so it is
being suspended automatically. If you are able to provide the
information that was originally requested, please do so and change
the status of the bug back to "Open".
[17 Nov 2008 15:22] Jay Pipes
Problem not only still exists; it is a LOT worse.  The execution time for the non-correlated subquery is ~1.5 seconds.  The time for the correlated subquery is ~1 min. 22 seconds.  This is repeatable.

Below, the output of the EXPLAIN and the SELECTs themselves.

You may notice that the output of the SELECT is actually different between the two results.  This is not because the SELECTs produce different result sets, but because InnoDB provides different index statistics in each run, and therefore the selectivity % numbers vary slightly.

Also note that there is an erroneous result in the Extra column of the first EXPLAIN output which says Impossible WHERE noticed, but there is still a result...

Built new 6.0.7-alpha with compile-pentium-debug-max-no-ndb.

Started db with:

$> cd mysql-test
$> ./mtr --start-and-exit

Loaded databases from forge and planetmysql and a couple other things for testing and started client with:

$> cd ../client
$> ./mysql --user=root --port=9306 --socket=../mysql-test/var/tmp/master.sock < ~/all.db
$> ./mysql --user=root --port=9306 --socket=../mysql-test/var/tmp/master.sock

Output of the SQL commands and EXPLAINs is attached.
[17 Nov 2008 15:23] Jay Pipes
Commands to reproduce and output showing problem

Attachment: bug27452.txt (text/plain), 13.45 KiB.

[17 Nov 2008 16:19] Sergey Petrunya
Neither of the mentioned subqueries are handled by 6.0's subquery optimization. 

The FROM subquery should receive better handling after WL#3485 is done (this is being worked on by Evgen Potemkin, current target is 6.x)

The subquery in select list should be executed better when WL#1117 is done (which we intend to do but noone is working on this ATM).
[17 Nov 2008 16:27] Sergey Petrunya
Jay Pipes wrote:
> Also note that there is an erroneous result in the Extra column of the first EXPLAIN
> output which says Impossible WHERE noticed, but there is still a result...

Confirm, the first EXPLAIN doesn't seem to be correct, we should analyze and fix this
[17 Nov 2008 16:31] Sergey Petrunya
mysql> explain ...
*************************** 1. row ***************************
           id: 1
  select_type: PRIMARY
        table: NULL
         type: NULL
possible_keys: NULL
          key: NULL
      key_len: NULL
          ref: NULL
         rows: NULL
        Extra: Impossible WHERE noticed after reading const tables

This seems to be repeatable on any database, including the default db created by mysql-test-run.
[17 Nov 2008 16:31] Sergey Petrunya
I was using 6.0-bk.  Valeriy, could you please check if this problem shows up on 5.x also?
[17 Nov 2008 16:35] Sergey Petrunya
Regarding I_S, I don't see any apparent I_S processing problems.  We should check if I_S tables are filled only one time when they are accessed from within a subquery.
[17 Nov 2008 19:38] Valeriy Kravchuk
Verified with recent 6.0.9 from bzr. 5.1.31 from bzr is also affected:

Welcome to the MySQL monitor.  Commands end with ; or \g.
Your MySQL connection id is 1
Server version: 5.1.31-debug Source distribution

Type 'help;' or '\h' for help. Type '\c' to clear the buffer.

mysql> show tables;
Empty set (0.00 sec)

mysql> create table t1 (c1 int auto_increment primary key, c2 int, key(c2));
Query OK, 0 rows affected (0.02 sec)

mysql> insert into t1 values(1,1);
Query OK, 1 row affected (0.00 sec)

mysql> insert into t1 (c2) select rand() * 1000 from t1;
Query OK, 1 row affected (0.01 sec)
Records: 1  Duplicates: 0  Warnings: 0

...

mysql> insert into t1 (c2) select rand() * 1000 from t1;
Query OK, 128 rows affected (0.01 sec)
Records: 128  Duplicates: 0  Warnings: 0

mysql> alter table t1 engine=InnoDB;
Query OK, 256 rows affected (0.06 sec)
Records: 256  Duplicates: 0  Warnings: 0

mysql> update t1 set c2=10;
Query OK, 256 rows affected (0.07 sec)
Rows matched: 256  Changed: 256  Warnings: 0

mysql> analyze table t1;
+---------+---------+----------+----------+
| Table   | Op      | Msg_type | Msg_text |
+---------+---------+----------+----------+
| test.t1 | analyze | status   | OK       |
+---------+---------+----------+----------+
1 row in set (0.00 sec)

mysql> SELECT   t.TABLE_SCHEMA  , t.TABLE_NAME  , s.INDEX_NAME  , s.COLUMN_NAME  , s.SEQ_IN_INDEX  , s2.max_columns AS `COLS_IN_INDEX`  , s.CARDINALITY AS `CARD`  , t.TABLE_ROWS AS `ROWS`  , ROUND(((s.CARDINALITY / IFNULL(t.TABLE_ROWS, 0.01)) * 100), 2) AS `SEL %` FROM INFORMATION_SCHEMA.STATISTICS s  INNER JOIN INFORMATION_SCHEMA.TABLES t   ON s.TABLE_SCHEMA = t.TABLE_SCHEMA   AND s.TABLE_NAME = t.TABLE_NAME  INNER JOIN (   SELECT       TABLE_SCHEMA    , TABLE_NAME    , INDEX_NAME    , MAX(SEQ_IN_INDEX) AS max_columns   FROM INFORMATION_SCHEMA.STATISTICS   WHERE TABLE_SCHEMA != 'mysql'   GROUP BY TABLE_SCHEMA, TABLE_NAME, INDEX_NAME  ) AS s2  ON s.TABLE_SCHEMA = s2.TABLE_SCHEMA  AND s.TABLE_NAME = s2.TABLE_NAME  AND s.INDEX_NAME = s2.INDEX_NAME WHERE t.TABLE_SCHEMA != 'mysql' AND t.TABLE_ROWS > 10 AND s.CARDINALITY IS NOT NULL AND (s.CARDINALITY / IFNULL(t.TABLE_ROWS, 0.01)) < 1.00 ORDER BY `SEL %`, TABLE_SCHEMA, TABLE_NAME  LIMIT 10;
+--------------+------------+------------+-------------+--------------+---------------+------+------+-------+
| TABLE_SCHEMA | TABLE_NAME | INDEX_NAME | COLUMN_NAME | SEQ_IN_INDEX | COLS_IN_INDEX | CARD | ROWS | SEL % |
+--------------+------------+------------+-------------+--------------+---------------+------+------+-------+
| test         | t1         | c2         | c2          |            1 |             1 |    2 |  256 |  0.78 |
+--------------+------------+------------+-------------+--------------+---------------+------+------+-------+
1 row in set (0.18 sec)

mysql> EXPLAIN SELECT   t.TABLE_SCHEMA  , t.TABLE_NAME  , s.INDEX_NAME  , s.COLUMN_NAME  , s.SEQ_IN_INDEX  , s2.max_columns AS `COLS_IN_INDEX`  , s.CARDINALITY AS `CARD`  , t.TABLE_ROWS AS `ROWS`  , ROUND(((s.CARDINALITY / IFNULL(t.TABLE_ROWS, 0.01)) * 100), 2) AS `SEL %` FROM INFORMATION_SCHEMA.STATISTICS s  INNER JOIN INFORMATION_SCHEMA.TABLES t   ON s.TABLE_SCHEMA = t.TABLE_SCHEMA   AND s.TABLE_NAME = t.TABLE_NAME  INNER JOIN (   SELECT       TABLE_SCHEMA    , TABLE_NAME    , INDEX_NAME    , MAX(SEQ_IN_INDEX) AS max_columns   FROM INFORMATION_SCHEMA.STATISTICS   WHERE TABLE_SCHEMA != 'mysql'   GROUP BY TABLE_SCHEMA, TABLE_NAME, INDEX_NAME  ) AS s2  ON s.TABLE_SCHEMA = s2.TABLE_SCHEMA  AND s.TABLE_NAME = s2.TABLE_NAME  AND s.INDEX_NAME = s2.INDEX_NAME WHERE t.TABLE_SCHEMA != 'mysql' AND t.TABLE_ROWS > 10 AND s.CARDINALITY IS NOT NULL AND (s.CARDINALITY / IFNULL(t.TABLE_ROWS, 0.01)) < 1.00 ORDER BY `SEL %`, TABLE_SCHEMA, TABLE_NAME  LIMIT 10\G
*************************** 1. row ***************************
           id: 1
  select_type: PRIMARY
        table: NULL
         type: NULL
possible_keys: NULL
          key: NULL
      key_len: NULL
          ref: NULL
         rows: NULL
        Extra: Impossible WHERE noticed after reading const tables
*************************** 2. row ***************************
           id: 2
  select_type: DERIVED
        table: STATISTICS
         type: ALL
possible_keys: NULL
          key: NULL
      key_len: NULL
          ref: NULL
         rows: NULL
        Extra: Using where; Open_frm_only; Scanned all databases; Using temporary; Using filesort
2 rows in set (0.01 sec)