| Bug #30659 | 300% Performance Regression 5.0.45 -> 5.2.6 on Correlated Subquery | ||
|---|---|---|---|
| Submitted: | 28 Aug 2007 1:03 | Modified: | 31 Aug 2007 9:59 |
| Reporter: | Jay Pipes | Email Updates: | |
| Status: | Can't repeat | Impact on me: | |
| Category: | MySQL Server: Optimizer | Severity: | S3 (Non-critical) |
| Version: | 5.2.6 | OS: | Linux (2.6.20) |
| Assigned to: | Gleb Shchepa | CPU Architecture: | Any |
| Tags: | Optimizer, performance, subquery | ||
[28 Aug 2007 19:04]
Sveta Smirnova
Thank you for the report. Verified as described on Linux using last BK sources. Mac version is not affected (both 5.0 and 5.2 shows same results like 5.2 on Linux)
[30 Aug 2007 17:30]
Gleb Shchepa
Tested on 32bit Linux with BUILD/compile-pentium-max (no debugging code). Results are same on 5.0 and 5.2: --------------------------------------------------------------------------- mysql> select version(); +-----------+ | version() | +-----------+ | 5.0.50 | +-----------+ 1 row in set (0.00 sec) mysql> SELECT p.payment_id, p.amount, p.payment_date, c.first_name, c.last_name FROM payment p INNER JOIN customer c ON p.customer_id = c.customer_id WHERE p.payment_date = (SELECT MAX(payment_date) FROM payment WHERE payment.customer_id = p.customer_id )\G ... 623 rows in set (0.88 sec) --------------------------------------------------------------------------- mysql> select version(); +-------------+ | version() | +-------------+ | 5.2.6-alpha | +-------------+ 1 row in set (0.00 sec) mysql> SELECT p.payment_id, p.amount, p.payment_date, c.first_name, c.last_name FROM payment p INNER JOIN customer c ON p.customer_id = c.customer_id WHERE p.payment_date = (SELECT MAX(payment_date) FROM payment WHERE payment.customer_id = p.customer_id )\G ... 623 rows in set (0.90 sec) =========================================================================== mysql> select version(); +-----------+ | version() | +-----------+ | 5.0.50 | +-----------+ 1 row in set (0.00 sec) mysql> SELECT p.payment_id, p.amount, p.payment_date, c.first_name, c.last_name FROM payment p INNER JOIN ( SELECT customer_id, MAX(payment_date) AS payment_date FROM payment GROUP BY customer_id ) AS last_orders ON p.customer_id = last_orders.customer_id AND p.payment_date = last_orders.payment_date INNER JOIN customer c ON p.customer_id = c.customer_id\G ... 623 rows in set (0.06 sec) --------------------------------------------------------------------------- mysql> select version(); +-------------+ | version() | +-------------+ | 5.2.6-alpha | +-------------+ 1 row in set (0.00 sec) mysql> SELECT p.payment_id, p.amount, p.payment_date, c.first_name, c.last_name FROM payment p INNER JOIN ( SELECT customer_id, MAX(payment_date) AS payment_date FROM payment GROUP BY customer_id ) AS last_orders ON p.customer_id = last_orders.customer_id AND p.payment_date = last_orders.payment_date INNER JOIN customer c ON p.customer_id = c.customer_id\G ... 623 rows in set (0.06 sec)

Description: 5.2 is supposed to have subquery optimizations, but my initial tests show a dramatic performance degradation for (at least) correlated subqueries. How to repeat: Loaded up clean 5.0.45 and 5.2.6 MySQL servers with the Sakila Sample database. Issued identical queries with EXPLAIN, and then without and compare results. SQL query: SELECT p.payment_id, p.amount, p.payment_date, c.first_name, c.last_name FROM payment p INNER JOIN customer c ON p.customer_id = c.customer_id WHERE p.payment_date = ( SELECT MAX(payment_date) FROM payment WHERE payment.customer_id = p.customer_id )\G The EXPLAIN output should be the same: mysql> EXPLAIN SELECT p.payment_id, p.amount, p.payment_date, c.first_name, c.last_name FROM payment p INNER JOIN customer c ON p.customer_id = c.customer_id WHERE p.payment_date = ( SELECT MAX(payment_date) FROM payment WHERE payment.customer_id = p.customer_id )\G *************************** 1. row *************************** id: 1 select_type: PRIMARY table: c type: ALL possible_keys: PRIMARY key: NULL key_len: NULL ref: NULL rows: 590 Extra: *************************** 2. row *************************** id: 1 select_type: PRIMARY table: p type: ref possible_keys: idx_fk_customer_id key: idx_fk_customer_id key_len: 2 ref: sakila.c.customer_id rows: 14 Extra: Using where *************************** 3. row *************************** id: 2 select_type: DEPENDENT SUBQUERY table: payment type: ref possible_keys: idx_fk_customer_id key: idx_fk_customer_id key_len: 2 ref: sakila.p.customer_id rows: 14 Extra: 3 rows in set (0.01 sec) When executing the query on my machine, with identical configs for both instances, performance of the above query averaged 3.6 seconds, with variance of ~.4 seconds. 5.0.45 showed performance of 1.3 seconds, with a variance of ~.2 seconds. Additionally, I tested the following SQL query: SELECT p.payment_id, p.amount, p.payment_date, c.first_name, c.last_name FROM payment p INNER JOIN ( SELECT customer_id, MAX(payment_date) AS payment_date FROM payment GROUP BY customer_id ) AS last_orders ON p.customer_id = last_orders.customer_id AND p.payment_date = last_orders.payment_date INNER JOIN customer c ON p.customer_id = c.customer_id\G The EXPLAIN output were different: 5.0.45: ------- mysql> EXPLAIN SELECT -> p.payment_id, p.amount, p.payment_date, c.first_name, c.last_name -> FROM payment p -> INNER JOIN ( -> SELECT customer_id, MAX(payment_date) AS payment_date -> FROM payment -> GROUP BY customer_id -> ) AS last_orders -> ON p.customer_id = last_orders.customer_id -> AND p.payment_date = last_orders.payment_date -> INNER JOIN customer c -> ON p.customer_id = c.customer_id\G *************************** 1. row *************************** id: 1 select_type: PRIMARY table: <derived2> type: ALL possible_keys: NULL key: NULL key_len: NULL ref: NULL rows: 599 Extra: *************************** 2. row *************************** id: 1 select_type: PRIMARY table: c type: eq_ref possible_keys: PRIMARY key: PRIMARY key_len: 2 ref: last_orders.customer_id rows: 1 Extra: *************************** 3. row *************************** id: 1 select_type: PRIMARY table: p type: ref possible_keys: idx_fk_customer_id key: idx_fk_customer_id key_len: 2 ref: sakila.c.customer_id rows: 13 Extra: Using where *************************** 4. row *************************** id: 2 select_type: DERIVED table: payment type: index possible_keys: NULL key: idx_fk_customer_id key_len: 2 ref: NULL rows: 15123 Extra: 4 rows in set (0.10 sec) 5.2.6 ----- mysql> EXPLAIN SELECT -> p.payment_id, p.amount, p.payment_date, c.first_name, c.last_name -> FROM payment p -> INNER JOIN ( -> SELECT customer_id, MAX(payment_date) AS payment_date -> FROM payment -> GROUP BY customer_id -> ) AS last_orders -> ON p.customer_id = last_orders.customer_id -> AND p.payment_date = last_orders.payment_date -> INNER JOIN customer c -> ON p.customer_id = c.customer_id\G *************************** 1. row *************************** id: 1 select_type: PRIMARY table: <derived2> type: ALL possible_keys: NULL key: NULL key_len: NULL ref: NULL rows: 599 Extra: *************************** 2. row *************************** id: 1 select_type: PRIMARY table: c type: eq_ref possible_keys: PRIMARY key: PRIMARY key_len: 2 ref: last_orders.customer_id rows: 1 Extra: *************************** 3. row *************************** id: 1 select_type: PRIMARY table: p type: ref possible_keys: idx_fk_customer_id key: idx_fk_customer_id key_len: 2 ref: sakila.c.customer_id rows: 14 Extra: Using index condition; Using where *************************** 4. row *************************** id: 2 select_type: DERIVED table: payment type: index possible_keys: NULL key: idx_fk_customer_id key_len: 2 ref: NULL rows: 16323 Extra: 4 rows in set (0.17 sec) Do note the long EXPLAIN times (repeatable on both instances) Note the 5.2.6 EXPLAIN plan shows the new "Using index condition" "optimization" in 5.2.6 optimizer. The performance of the 5.2.6 instance with the above query is 300% SLOWER than the 5.0.45 one. The 5.2.6 instance executed, on average, in .35 seconds, while the 5.0.45 instance executed the query, on average, in .12 seconds. Suggested fix: If 5.2.6 is supposed to contain subquery performance improvements, I don't see any; in fact, we see a regression.