Bug #14292 some query takes 10 times more time in 5.0 than in 4.1
Submitted: 25 Oct 2005 13:29 Modified: 9 May 2006 18:54
Reporter: Aleksey Kishkin Email Updates:
Status: Closed Impact on me:
None 
Category:MySQL Server: Optimizer Severity:S3 (Non-critical)
Version:5.0 OS:Linux (Linux)
Assigned to: Igor Babaev CPU Architecture:Any

[25 Oct 2005 13:29] Aleksey Kishkin
Description:
I have the query:

select
        nation,
        o_year,
        sum(amount) as sum_profit
from
        (
                select
                        n_name as nation,
                        year(o_orderdate) as o_year,
                        l_extendedprice * (1 - l_discount) - ps_supplycost * l_quantity as amount
                from
                        part,
                        supplier,
                        lineitem,
                        partsupp,
                        orders,
                        nation
                where
                        s_suppkey = l_suppkey
                        and ps_suppkey = l_suppkey
                        and ps_partkey = l_partkey
                        and p_partkey = l_partkey
                        and o_orderkey = l_orderkey
                        and s_nationkey = n_nationkey
                        and p_name like '%orchid%'
        ) as profit
group by
        nation,
        o_year
order by
        nation,
        o_year desc;

I tested the query against  5.0.15 and 4.1.15 and on my computer it took 2.30 seconds on 4.1.15 and 27.77 seconds on 5.0.15

How to repeat:
I attached data and shell script for creating tables and filling them (fill.sh). So, you need to
1) ungzip and untar archive into some directory, 
2) start mysql (say, 4.1)
3) run ./fill.sh (in order to create and fill tables)
4) start mysql console client
5) perform 
  source 9.sql

and check how much time takes the execution of 9.sql

repeat 2..5 for mysql 5.0 and compare execution time for 4.1 and 5.0
[30 Apr 2006 5:23] Igor Babaev
The difference in performance for version 4.1 and 5.0 for the reported query is explained by the fact version 5.0 chooses much worse execution plan than version 4.1 does for the subquery used in this query.

Version 5.0 chooses the following plan:

mysql> explain select n_name as nation,year(o_orderdate) as o_year,l_extendedprice * (1 - l_discount) - ps_supplycost * l_quantity as amount from part,supplier,lineitempartsupp,orders,nation where s_suppkey = l_suppkey and ps_suppkey = l_suppkey and ps_partkey = l_partkey and p_partkey = l_partkey and o_orderkey = l_orderkey and s_nationkey = n_nationkey and p_name like '%orchid%';
+----+-------------+----------+--------+----------------------------------------------------------------------------------------+-----------------------+---------+---------------------------------------------+--------+-------------+
| id | select_type | table    | type   | possible_keys                                                                          | key                   | key_len | ref                                         | rows   | Extra       |
+----+-------------+----------+--------+----------------------------------------------------------------------------------------+-----------------------+---------+---------------------------------------------+--------+-------------+
|  1 | SIMPLE      | orders   | ALL    | PRIMARY                                                                                | NULL                  | NULL    | NULL                                        | 150000 |             |
|  1 | SIMPLE      | lineitem | ref    | PRIMARY,i_l_suppkey_partkey,i_l_partkey,i_l_suppkey,i_l_orderkey,i_l_orderkey_quantity | i_l_orderkey_quantity | 4       | test.orders.o_orderkey                      |      4 |             |
|  1 | SIMPLE      | supplier | eq_ref | PRIMARY,i_s_nationkey                                                                  | PRIMARY               | 4       | test.lineitem.l_suppkey                     |      1 |             |
|  1 | SIMPLE      | part     | eq_ref | PRIMARY                                                                                | PRIMARY               | 4       | test.lineitem.l_partkey                     |      1 | Using where |
|  1 | SIMPLE      | partsupp | eq_ref | PRIMARY,i_ps_partkey,i_ps_suppkey                                                      | PRIMARY               | 8       | test.part.p_partkey,test.lineitem.l_suppkey |      1 | Using where |
|  1 | SIMPLE      | nation   | eq_ref | PRIMARY                                                                                | PRIMARY               | 4       | test.supplier.s_nationkey                   |      1 |             |
+----+-------------+----------+--------+----------------------------------------------------------------------------------------+-----------------------+---------+---------------------------------------------+--------+-------------+

The following plan chosen by version 4.1 would be significantly faster:

mysql> explain select n_name as nation,year(o_orderdate) as o_year,l_extendedprice * (1 - l_discount) - ps_supplycost * l_quantity as amount from part,supplier,lineitempartsupp,orders,nation where s_suppkey = l_suppkey and ps_suppkey = l_suppkey and ps_partkey = l_partkey and p_partkey = l_partkey and o_orderkey = l_orderkey and s_nationkey = n_nationkey and p_name like '%orchid%';

+----+-------------+----------+--------+----------------------------------------------------------------------------------------+---------------------+---------+---------------------------------------------+-------+-------------+
| id | select_type | table    | type   | possible_keys                                                                          | key                 | key_len | ref                                         | rows  | Extra       |
+----+-------------+----------+--------+----------------------------------------------------------------------------------------+---------------------+---------+---------------------------------------------+-------+-------------+
|  1 | SIMPLE      | part     | ALL    | PRIMARY                                                                                | NULL                | NULL    | NULL                                        | 20000 | Using where |
|  1 | SIMPLE      | lineitem | ref    | PRIMARY,i_l_suppkey_partkey,i_l_partkey,i_l_suppkey,i_l_orderkey,i_l_orderkey_quantity | i_l_suppkey_partkey | 5       | test.part.p_partkey                         |    30 | Using where |
|  1 | SIMPLE      | supplier | eq_ref | PRIMARY,i_s_nationkey                                                                  | PRIMARY             | 4       | test.lineitem.l_suppkey                     |     1 |             |
|  1 | SIMPLE      | partsupp | eq_ref | PRIMARY,i_ps_partkey,i_ps_suppkey                                                      | PRIMARY             | 8       | test.part.p_partkey,test.lineitem.l_suppkey |     1 |             |
|  1 | SIMPLE      | orders   | eq_ref | PRIMARY                                                                                | PRIMARY             | 4       | test.lineitem.l_orderkey                    |     1 |             |
|  1 | SIMPLE      | nation   | eq_ref | PRIMARY                                                                                | PRIMARY             | 4       | test.supplier.s_nationkey                   |     1 |             |
+----+-------------+----------+--------+----------------------------------------------------------------------------------------+---------------------+---------+---------------------------------------------+-------+-------------+

It happened because some changes applied to the function find_best in 4.1 and merged into 5.0 
in January 2004 were not reflected in the code of new greedy optimizer code merged into 5.0 in May 2005.
[30 Apr 2006 22:17] Igor Babaev
Correction for the previous comment: the greedy optimizer code appeared in the main 5.0 tree in May 2004.
[3 May 2006 1:31] Bugs System
A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/5838
[7 May 2006 22:46] Igor Babaev
ChangeSet
  1.2098 06/05/02 18:31:20 igor@rurik.mysql.com +5 -0
  Fixed bug #14292: performance degradation for a benchmark query.
  This performance degradation was due to the fact that some
  cost evaluation code added into 4.1 in the function find_best was
  not merged into the code of the function best_access_path added
  together with other code for greedy optimizer.

The fix will appear in 5.0.22 and 5.1.10.
[9 May 2006 18:54] Paul DuBois
Noted in 5.0.22, 5.1.10 changelogs.