Bug #30659 300% Performance Regression 5.0.45 -> 5.2.6 on Correlated Subquery
Submitted: 28 Aug 2007 1:03 Modified: 31 Aug 2007 9:59
Reporter: Jay Pipes Email Updates:
Status: Can't repeat Impact on me:
None 
Category:MySQL Server: Optimizer Severity:S3 (Non-critical)
Version:5.2.6 OS:Linux (2.6.20)
Assigned to: Gleb Shchepa CPU Architecture:Any
Tags: Optimizer, performance, subquery

[28 Aug 2007 1:03] Jay Pipes
Description:
5.2 is supposed to have subquery optimizations, but my initial tests show a dramatic performance degradation for (at least) correlated subqueries.

How to repeat:
Loaded up clean 5.0.45 and 5.2.6 MySQL servers with the Sakila Sample database.  Issued identical queries with EXPLAIN, and then without and compare results.

SQL query:

SELECT 
  p.payment_id, p.amount, p.payment_date, c.first_name, c.last_name
FROM payment p
INNER JOIN customer c
ON p.customer_id = c.customer_id
WHERE p.payment_date = (
 SELECT MAX(payment_date)
 FROM payment
 WHERE payment.customer_id = p.customer_id
)\G

The EXPLAIN output should be the same:

mysql> EXPLAIN SELECT    p.payment_id, p.amount, p.payment_date, c.first_name, c.last_name FROM payment p INNER JOIN customer c ON p.customer_id = c.customer_id WHERE p.payment_date = (  SELECT MAX(payment_date)  FROM payment  WHERE payment.customer_id = p.customer_id )\G
*************************** 1. row ***************************
           id: 1
  select_type: PRIMARY
        table: c
         type: ALL
possible_keys: PRIMARY
          key: NULL
      key_len: NULL
          ref: NULL
         rows: 590
        Extra: 
*************************** 2. row ***************************
           id: 1
  select_type: PRIMARY
        table: p
         type: ref
possible_keys: idx_fk_customer_id
          key: idx_fk_customer_id
      key_len: 2
          ref: sakila.c.customer_id
         rows: 14
        Extra: Using where
*************************** 3. row ***************************
           id: 2
  select_type: DEPENDENT SUBQUERY
        table: payment
         type: ref
possible_keys: idx_fk_customer_id
          key: idx_fk_customer_id
      key_len: 2
          ref: sakila.p.customer_id
         rows: 14
        Extra: 
3 rows in set (0.01 sec)

When executing the query on my machine, with identical configs for both instances, performance of the above query averaged 3.6 seconds, with variance of ~.4 seconds.  5.0.45 showed performance of 1.3 seconds, with a variance of ~.2 seconds.

Additionally, I tested the following SQL query:

SELECT 
  p.payment_id, p.amount, p.payment_date, c.first_name, c.last_name
FROM payment p
INNER JOIN (
  SELECT customer_id, MAX(payment_date) AS payment_date
  FROM payment
  GROUP BY customer_id
) AS last_orders
ON p.customer_id = last_orders.customer_id
AND p.payment_date = last_orders.payment_date
INNER JOIN customer c
ON p.customer_id = c.customer_id\G

The EXPLAIN output were different:

5.0.45:
-------
mysql> EXPLAIN SELECT 
    ->   p.payment_id, p.amount, p.payment_date, c.first_name, c.last_name
    -> FROM payment p
    -> INNER JOIN (
    ->   SELECT customer_id, MAX(payment_date) AS payment_date
    ->   FROM payment
    ->   GROUP BY customer_id
    -> ) AS last_orders
    -> ON p.customer_id = last_orders.customer_id
    -> AND p.payment_date = last_orders.payment_date
    -> INNER JOIN customer c
    -> ON p.customer_id = c.customer_id\G
*************************** 1. row ***************************
           id: 1
  select_type: PRIMARY
        table: <derived2>
         type: ALL
possible_keys: NULL
          key: NULL
      key_len: NULL
          ref: NULL
         rows: 599
        Extra: 
*************************** 2. row ***************************
           id: 1
  select_type: PRIMARY
        table: c
         type: eq_ref
possible_keys: PRIMARY
          key: PRIMARY
      key_len: 2
          ref: last_orders.customer_id
         rows: 1
        Extra: 
*************************** 3. row ***************************
           id: 1
  select_type: PRIMARY
        table: p
         type: ref
possible_keys: idx_fk_customer_id
          key: idx_fk_customer_id
      key_len: 2
          ref: sakila.c.customer_id
         rows: 13
        Extra: Using where
*************************** 4. row ***************************
           id: 2
  select_type: DERIVED
        table: payment
         type: index
possible_keys: NULL
          key: idx_fk_customer_id
      key_len: 2
          ref: NULL
         rows: 15123
        Extra: 
4 rows in set (0.10 sec)

5.2.6
-----
mysql> EXPLAIN SELECT 
    ->   p.payment_id, p.amount, p.payment_date, c.first_name, c.last_name
    -> FROM payment p
    -> INNER JOIN (
    ->   SELECT customer_id, MAX(payment_date) AS payment_date
    ->   FROM payment
    ->   GROUP BY customer_id
    -> ) AS last_orders
    -> ON p.customer_id = last_orders.customer_id
    -> AND p.payment_date = last_orders.payment_date
    -> INNER JOIN customer c
    -> ON p.customer_id = c.customer_id\G
*************************** 1. row ***************************
           id: 1
  select_type: PRIMARY
        table: <derived2>
         type: ALL
possible_keys: NULL
          key: NULL
      key_len: NULL
          ref: NULL
         rows: 599
        Extra: 
*************************** 2. row ***************************
           id: 1
  select_type: PRIMARY
        table: c
         type: eq_ref
possible_keys: PRIMARY
          key: PRIMARY
      key_len: 2
          ref: last_orders.customer_id
         rows: 1
        Extra: 
*************************** 3. row ***************************
           id: 1
  select_type: PRIMARY
        table: p
         type: ref
possible_keys: idx_fk_customer_id
          key: idx_fk_customer_id
      key_len: 2
          ref: sakila.c.customer_id
         rows: 14
        Extra: Using index condition; Using where
*************************** 4. row ***************************
           id: 2
  select_type: DERIVED
        table: payment
         type: index
possible_keys: NULL
          key: idx_fk_customer_id
      key_len: 2
          ref: NULL
         rows: 16323
        Extra: 
4 rows in set (0.17 sec)

Do note the long EXPLAIN times (repeatable on both instances)

Note the 5.2.6 EXPLAIN plan shows the new "Using index condition" "optimization" in 5.2.6 optimizer.

The performance of the 5.2.6 instance with the above query is 300% SLOWER than the 5.0.45 one.  The 5.2.6 instance executed, on average, in .35 seconds, while the 5.0.45 instance executed the query, on average, in .12 seconds.

Suggested fix:
If 5.2.6 is supposed to contain subquery performance improvements, I don't see any; in fact, we see a regression.
[28 Aug 2007 19:04] Sveta Smirnova
Thank you for the report.

Verified as described on Linux using last BK sources. Mac version is not affected (both 5.0 and 5.2 shows same results like 5.2 on Linux)
[30 Aug 2007 17:30] Gleb Shchepa
Tested on 32bit Linux with BUILD/compile-pentium-max (no debugging code).
Results are same on 5.0 and 5.2:

---------------------------------------------------------------------------

mysql> select version();
+-----------+
| version() |
+-----------+
| 5.0.50    |
+-----------+
1 row in set (0.00 sec)

mysql> SELECT p.payment_id, p.amount, p.payment_date, c.first_name, c.last_name FROM payment p INNER JOIN customer c ON p.customer_id = c.customer_id WHERE p.payment_date = (SELECT MAX(payment_date) FROM payment WHERE payment.customer_id = p.customer_id )\G

...

623 rows in set (0.88 sec)

---------------------------------------------------------------------------

mysql> select version();
+-------------+
| version()   |
+-------------+
| 5.2.6-alpha |
+-------------+
1 row in set (0.00 sec)

mysql> SELECT p.payment_id, p.amount, p.payment_date, c.first_name, c.last_name FROM payment p INNER JOIN customer c ON p.customer_id = c.customer_id WHERE p.payment_date = (SELECT MAX(payment_date) FROM payment WHERE payment.customer_id = p.customer_id )\G

...
623 rows in set (0.90 sec)

===========================================================================

mysql> select version();
+-----------+
| version() |
+-----------+
| 5.0.50    |
+-----------+
1 row in set (0.00 sec)

mysql> SELECT    p.payment_id, p.amount, p.payment_date, c.first_name, c.last_name FROM payment p INNER JOIN (   SELECT customer_id, MAX(payment_date) AS payment_date   FROM payment   GROUP BY customer_id ) AS last_orders ON p.customer_id = last_orders.customer_id AND p.payment_date = last_orders.payment_date INNER JOIN customer c ON p.customer_id = c.customer_id\G

...
623 rows in set (0.06 sec)

---------------------------------------------------------------------------

mysql> select version();
+-------------+
| version()   |
+-------------+
| 5.2.6-alpha |
+-------------+
1 row in set (0.00 sec)

mysql> SELECT    p.payment_id, p.amount, p.payment_date, c.first_name, c.last_name FROM payment p INNER JOIN (   SELECT customer_id, MAX(payment_date) AS payment_date   FROM payment   GROUP BY customer_id ) AS last_orders ON p.customer_id = last_orders.customer_id AND p.payment_date = last_orders.payment_date INNER JOIN customer c ON p.customer_id = c.customer_id\G

...
623 rows in set (0.06 sec)