MySQL Bugs: #26528: Optimizer produces inefficient execution plan on 5.0.x compared to 4.1.21

Bug #26528	Optimizer produces inefficient execution plan on 5.0.x compared to 4.1.21
Submitted:	21 Feb 2007 15:02	Modified:	22 Feb 2007 16:21
Reporter:	Valeriy Kravchuk	Email Updates:
Status:	Not a Bug	Impact on me:	None
Category:	MySQL Server: Optimizer	Severity:	S3 (Non-critical)
Version:	5.0.34, 5.0.27	OS:	Windows (Windows)
Assigned to:	Igor Babaev	CPU Architecture:	Any
Tags:	regression

Description:
Some complex queries with derived tables works 3+ times slower on 5.0.27 compared to 4.1.21 (with the same default my.ini settings, same hardware and on same data). For example, this one:

select count(*) as n,im,rt,tc 
from 
  (select R.type_con as rt ,SE.imex as im, H.Lib_mvt, 
          R.type_con,R.contrat as tc, 
          TY.fr, H.Type_mvt, H.TypeP, H.TypeD, H.Pv,
          H.etat, H.date_mvt, H.Id_sspP,
          CONCAT(H.Type_mvt, H.TypeP, H.TypeD,H.Pv), 
          SE.imex , H.Id_conteneur, H.Id_sspD 
     from mft_historique_gilles as H 
          INNER JOIN 
          (mft_ref as R, mft_passeport as P, mft_sequence SE, mft_typemvt as TY)
          ON ((P.Mft_refP=R.mft_ref) 
              AND (H.Id_conteneur = P.Id_conteneur ) 
              AND Ty.type=SE.imex 
              AND CONCAT(H.Type_mvt, H.TypeP, H.TypeD, H.Pv) = SE.sequencec
             )
          left outer join
          (mft_refhisto as RE) 
          ON ((Re.id_mouvement=H.Id_mouvement)) 
     where (((H.numeroD="4499" ) OR ( Re.portD="FRLEH" )) AND SE.imex !="")
     group by H.id_conteneur) as bob 
group by im, rt 
order by rt,tc

runs 13+ seconds on 4.1.21 and 43+ seconds (with 100% CPU load) on 5.0.27 on the same data.

ANALYZE TABLE for all the tables involved does not change query plan.

How to repeat:
C:\Program Files\MySQL\MySQL Server 5.0\bin>mysql -uroot -proot test -P 3307
Welcome to the MySQL monitor.  Commands end with ; or \g.
Your MySQL connection id is 11
Server version: 4.1.21-community-nt

Type 'help;' or '\h' for help. Type '\c' to clear the buffer.

mysql> use marfret
Database changed
mysql> select count(*) as n,im,rt,tc from (select R.type_con as rt ,SE.imex as im
...
    -> group by H.id_conteneur) as bob group by im,rt order by rt,tc
    -> ;
...
33 rows in set (13.72 sec)

mysql> explain select count(*) as n,im,rt,tc from (select R.type_con as rt, SE.imex as im
...
    -> group by H.id_conteneur) as bob group by im,rt order by rt,tc\G
*************************** 1. row ***************************
           id: 1
  select_type: PRIMARY
        table: <derived2>
         type: ALL
possible_keys: NULL
          key: NULL
      key_len: NULL
          ref: NULL
         rows: 464
        Extra: Using temporary; Using filesort
*************************** 2. row ***************************
           id: 2
  select_type: DERIVED
        table: H
         type: ALL
possible_keys: idcont,numeroD,index1,index2
          key: NULL
      key_len: NULL
          ref: NULL
         rows: 35482
        Extra: Using temporary; Using filesort
*************************** 3. row ***************************
           id: 2
  select_type: DERIVED
        table: P
         type: ref
possible_keys: idetat,mft_refP,ls_conteneur,Id,index7
          key: idetat
      key_len: 11
          ref: marfret.H.Id_conteneur
         rows: 1
        Extra:
*************************** 4. row ***************************
           id: 2
  select_type: DERIVED
        table: R
         type: eq_ref
possible_keys: mft_ref,ref_contrat,index10
          key: mft_ref
      key_len: 50
          ref: marfret.P.Mft_refP
         rows: 1
        Extra:
*************************** 5. row ***************************
           id: 2
  select_type: DERIVED
        table: TY
         type: ALL
possible_keys: NULL
          key: NULL
      key_len: NULL
          ref: NULL
         rows: 37
        Extra:
*************************** 6. row ***************************
           id: 2
  select_type: DERIVED
        table: RE
         type: ref
possible_keys: index1,idmvtportdtyperef,idmvt,stock
          key: index1
      key_len: 4
          ref: marfret.H.Id_mouvement
         rows: 1
        Extra: Using where; Using index
*************************** 7. row ***************************
           id: 2
  select_type: DERIVED
        table: SE
         type: ref
possible_keys: seqc,seqcp,imex,seqc_imex
          key: seqc_imex
      key_len: 266
          ref: func,marfret.TY.type
         rows: 2
        Extra: Using where; Using index
7 rows in set (8.13 sec)

Note the order of table access! Now, with the same data:

C:\Program Files\MySQL\MySQL Server 5.0\bin>mysql -uroot -P 3308
Welcome to the MySQL monitor.  Commands end with ; or \g.
Your MySQL connection id is 15
Server version: 5.0.27-community-nt MySQL Community Edition (GPL)

Type 'help;' or '\h' for help. Type '\c' to clear the buffer.

mysql> use marfret;
Database changed
mysql> explain select count(*) as n,im,rt,tc from (select R.type_con as rt, SE.imex as im
...
    -> group by H.id_conteneur) as bob group by im,rt order by rt,tc\G
*************************** 1. row ***************************
           id: 1
  select_type: PRIMARY
        table: <derived2>
         type: ALL
possible_keys: NULL
          key: NULL
      key_len: NULL
          ref: NULL
         rows: 464
        Extra: Using temporary; Using filesort
*************************** 2. row ***************************
           id: 2
  select_type: DERIVED
        table: TY
         type: ALL
possible_keys: NULL
          key: NULL
      key_len: NULL
          ref: NULL
         rows: 37
        Extra: Using temporary; Using filesort
*************************** 3. row ***************************
           id: 2
  select_type: DERIVED
        table: H
         type: ALL
possible_keys: idcont,numeroD,index1,index2
          key: NULL
      key_len: NULL
          ref: NULL
         rows: 35199
        Extra:
*************************** 4. row ***************************
           id: 2
  select_type: DERIVED
        table: P
         type: ref
possible_keys: idetat,mft_refP,ls_conteneur,Id,index7
          key: idetat
      key_len: 13
          ref: marfret.H.Id_conteneur
         rows: 1
        Extra:
*************************** 5. row ***************************
           id: 2
  select_type: DERIVED
        table: R
         type: ref
possible_keys: mft_ref,ref_contrat,index10
          key: mft_ref
      key_len: 52
          ref: marfret.P.Mft_refP
         rows: 1
        Extra:
*************************** 6. row ***************************
           id: 2
  select_type: DERIVED
        table: RE
         type: ref
possible_keys: index1,idmvtportdtyperef,idmvt,stock
          key: index1
      key_len: 4
          ref: marfret.H.Id_mouvement
         rows: 1
        Extra: Using where; Using index
*************************** 7. row ***************************
           id: 2
  select_type: DERIVED
        table: SE
         type: ref
possible_keys: seqc,seqcp,imex,seqc_imex
          key: seqc_imex
      key_len: 270
          ref: func,marfret.TY.type
         rows: 2
        Extra: Using where; Using index
7 rows in set (45.91 sec)

Note different order of tables! And:

mysql> select count(*) as n,im,rt,tc from (select R.type_con as rt ,SE.imex as im
...
    -> group by H.id_conteneur) as bob group by im,rt order by rt,tc;
+-----+----+------+------+
...
+-----+----+------+------+
33 rows in set (43.63 sec)

Suggested fix:
Restore previous behaviour?

For the query in the subselect (derived table) 4.1 returns the following execution plan:
+----+-------------+-------+--------+----------------------------------------+-----------+---------+------------------------+-------+---------------------------------+
| id | select_type | table | type   | possible_keys                          | key       | key_len | ref                    | rows  | Extra                           |
+----+-------------+-------+--------+----------------------------------------+-----------+---------+------------------------+-------+---------------------------------+
|  1 | SIMPLE      | H     | ALL    | idcont,numeroD,index1,index2           | NULL      |    NULL | NULL                   | 35111 | Using temporary; Using filesort |
|  1 | SIMPLE      | P     | ref    | idetat,mft_refP,ls_conteneur,Id,index7 | idetat    |      11 | marfret.H.Id_conteneur |     1 |                                 |
|  1 | SIMPLE      | R     | eq_ref | mft_ref,ref_contrat,index10            | mft_ref   |      50 | marfret.P.Mft_refP     |     1 |                                 |
|  1 | SIMPLE      | TY    | ALL    | NULL                                   | NULL      |    NULL | NULL                   |    37 |                                 |
|  1 | SIMPLE      | RE    | ref    | index1,idmvtportdtyperef,idmvt,stock   | index1    |       4 | marfret.H.Id_mouvement |     1 | Using where; Using index        |
|  1 | SIMPLE      | SE    | ref    | seqc,seqcp,imex,seqc_imex              | seqc_imex |     266 | func,marfret.TY.type   |     2 | Using where; Using index        |
+----+-------------+-------+--------+----------------------------------------+-----------+---------+------------------------+-------+---------------------------------+

The execution by this plan on my machine takes:
 (18.90 sec)

If we change the order of tables in the original query we get a different execution plan:

mysql> explain
    -> select R.type_con as rt ,SE.imex as im, H.Lib_mvt,
    ->           R.type_con,R.contrat as tc,
    ->           TY.fr, H.Type_mvt, H.TypeP, H.TypeD, H.Pv,
    ->           H.etat, H.date_mvt, H.Id_sspP,
    ->           CONCAT(H.Type_mvt, H.TypeP, H.TypeD,H.Pv),
    ->           SE.imex , H.Id_conteneur, H.Id_sspD
    ->      from
    ->           (mft_typemvt as TY, mft_ref as R, mft_passeport as P, mft_sequence SE)
    ->           INNER JOIN
    ->           mft_historique_gilles as H
    ->           ON ((P.Mft_refP=R.mft_ref)
    ->               AND (H.Id_conteneur = P.Id_conteneur )
    ->               AND TY.type=SE.imex
    ->               AND CONCAT(H.Type_mvt, H.TypeP, H.TypeD, H.Pv) = SE.sequencec
    ->              )
    ->           left outer join
    ->           (mft_refhisto as RE)
    ->           ON ((RE.id_mouvement=H.Id_mouvement))
    ->      where (((H.numeroD="4499" ) OR ( RE.portD="FRLEH" )) AND SE.imex !="")
    ->      group by H.id_conteneur;
+----+-------------+-------+--------+----------------------------------------+-----------+---------+------------------------+-------+---------------------------------+
| id | select_type | table | type   | possible_keys                          | key       | key_len | ref                    | rows  | Extra                           |
+----+-------------+-------+--------+----------------------------------------+-----------+---------+------------------------+-------+---------------------------------+
|  1 | SIMPLE      | TY    | ALL    | NULL                                   | NULL      |    NULL | NULL                   |    37 | Using temporary; Using filesort |
|  1 | SIMPLE      | H     | ALL    | idcont,numeroD,index1,index2           | NULL      |    NULL | NULL                   | 35111 |                                 |
|  1 | SIMPLE      | P     | ref    | idetat,mft_refP,ls_conteneur,Id,index7 | idetat    |      11 | marfret.H.Id_conteneur |     1 |                                 |
|  1 | SIMPLE      | R     | eq_ref | mft_ref,ref_contrat,index10            | mft_ref   |      50 | marfret.P.Mft_refP     |     1 |                                 |
|  1 | SIMPLE      | RE    | ref    | index1,idmvtportdtyperef,idmvt,stock   | index1    |       4 | marfret.H.Id_mouvement |     1 | Using where; Using index        |
|  1 | SIMPLE      | SE    | ref    | seqc,seqcp,imex,seqc_imex              | seqc_imex |     266 | func,marfret.TY.type   |     2 | Using where; Using index        |
+----+-------------+-------+--------+----------------------------------------+-----------+---------+------------------------+-------+---------------------------------+

This plan is executed for: 
(1 min 32.90 sec)

This happens because the optimizer does not evaluate all possible execution plans and as a result sometimes it misses the fastest ones.

See continuation in the next comment.

5.1 is smarter in this respect. We can force the optimizer to evaluate all
possible plans by setting the option optimizer_prune_level to 0. By default
it's value is set to 1 and with it 5.1 returns the following execution plan for
our original query:

+----+-------------+-------+--------+----------------------------------------+-----------+---------+------------------------+-------+---------------------------------+
| id | select_type | table | type   | possible_keys                          | key       | key_len | ref                    | rows  | Extra                           |
+----+-------------+-------+--------+----------------------------------------+-----------+---------+------------------------+-------+---------------------------------+
|  1 | SIMPLE      | TY    | ALL    | NULL                                   | NULL      | NULL    | NULL                   |    37 | Using temporary; Using filesort |
|  1 | SIMPLE      | H     | ALL    | idcont,numeroD,index1,index2           | NULL      | NULL    | NULL                   | 35213 |                                 |
|  1 | SIMPLE      | P     | ref    | idetat,mft_refP,ls_conteneur,Id,index7 | idetat    | 13      | marfret.H.Id_conteneur |     1 |                                 |
|  1 | SIMPLE      | R     | eq_ref | mft_ref,ref_contrat,index10            | mft_ref   | 52      | marfret.P.Mft_refP     |     1 |                                 |
|  1 | SIMPLE      | RE    | ref    | index1,idmvtportdtyperef,idmvt,stock   | index1    | 4       | marfret.H.Id_mouvement |     1 | Using where; Using index        |
|  1 | SIMPLE      | SE    | ref    | seqc,seqcp,imex,seqc_imex              | seqc_imex | 270     | func,marfret.TY.type   |     2 | Using where; Using index        |
+----+-------------+-------+--------+----------------------------------------+-----------+---------+------------------------+-------+---------------------------------+

The plan is executed for:
(2 min 3.17 sec)

If I change the value of optimizer_prune_level:

mysql> select @@optimizer_prune_level;
+-------------------------+
| @@optimizer_prune_level |
+-------------------------+
|                       1 |
+-------------------------+
1 row in set (0.00 sec)

mysql> set @@optimizer_prune_level=0;
Query OK, 0 rows affected (0.00 sec)

the optimizer returns a much better plan:

+----+-------------+-------+--------+----------------------------------------+-----------+---------+------------------------+-------+---------------------------------+
| id | select_type | table | type   | possible_keys                          | key       | key_len | ref                    | rows  | Extra                           |
+----+-------------+-------+--------+----------------------------------------+-----------+---------+------------------------+-------+---------------------------------+
|  1 | SIMPLE      | H     | ALL    | idcont,numeroD,index1,index2           | NULL      | NULL    | NULL                   | 35213 | Using temporary; Using filesort |
|  1 | SIMPLE      | P     | ref    | idetat,mft_refP,ls_conteneur,Id,index7 | idetat    | 13      | marfret.H.Id_conteneur |     1 |                                 |
|  1 | SIMPLE      | RE    | ref    | index1,idmvtportdtyperef,idmvt,stock   | index1    | 4       | marfret.H.Id_mouvement |     1 | Using where; Using index        |
|  1 | SIMPLE      | TY    | ALL    | NULL                                   | NULL      | NULL    | NULL                   |    37 |                                 |
|  1 | SIMPLE      | SE    | ref    | seqc,seqcp,imex,seqc_imex              | seqc_imex | 270     | func,marfret.TY.type   |     2 | Using where; Using index        |
|  1 | SIMPLE      | R     | eq_ref | mft_ref,ref_contrat,index10            | mft_ref   | 52      | marfret.P.Mft_refP     |     1 |                                 |
+----+-------------+-------+--------+----------------------------------------+-----------+---------+------------------------+-------+---------------------------------+

whose execution time is:
 (5.58 sec)

And the choice of this plan does not depend on the order of tables in the query:

mysql> explain
    -> select R.type_con as rt ,SE.imex as im, H.Lib_mvt,
    ->           R.type_con,R.contrat as tc,
    ->           TY.fr, H.Type_mvt, H.TypeP, H.TypeD, H.Pv,
    ->           H.etat, H.date_mvt, H.Id_sspP,
    ->           CONCAT(H.Type_mvt, H.TypeP, H.TypeD,H.Pv),
    ->           SE.imex , H.Id_conteneur, H.Id_sspD
    ->      from
    ->           (mft_ref as R, mft_passeport as P, mft_sequence SE, mft_typemvt as TY)
    ->           INNER JOIN
    ->           mft_historique_gilles as H
    ->           ON ((P.Mft_refP=R.mft_ref)
    ->               AND (H.Id_conteneur = P.Id_conteneur )
    ->               AND TY.type=SE.imex
    ->               AND CONCAT(H.Type_mvt, H.TypeP, H.TypeD, H.Pv) = SE.sequencec
    ->              )
    ->           left outer join
    ->           (mft_refhisto as RE)
    ->           ON ((RE.id_mouvement=H.Id_mouvement))
    ->      where (((H.numeroD="4499" ) OR ( RE.portD="FRLEH" )) AND SE.imex !="")
    ->      group by H.id_conteneur;
+----+-------------+-------+--------+----------------------------------------+-----------+---------+------------------------+-------+---------------------------------+
| id | select_type | table | type   | possible_keys                          | key       | key_len | ref                    | rows  | Extra                           |
+----+-------------+-------+--------+----------------------------------------+-----------+---------+------------------------+-------+---------------------------------+
|  1 | SIMPLE      | H     | ALL    | idcont,numeroD,index1,index2           | NULL      | NULL    | NULL                   | 35213 | Using temporary; Using filesort |
|  1 | SIMPLE      | P     | ref    | idetat,mft_refP,ls_conteneur,Id,index7 | idetat    | 13      | marfret.H.Id_conteneur |     1 |                                 |
|  1 | SIMPLE      | RE    | ref    | index1,idmvtportdtyperef,idmvt,stock   | index1    | 4       | marfret.H.Id_mouvement |     1 | Using where; Using index        |
|  1 | SIMPLE      | TY    | ALL    | NULL                                   | NULL      | NULL    | NULL                   |    37 |                                 |
|  1 | SIMPLE      | SE    | ref    | seqc,seqcp,imex,seqc_imex              | seqc_imex | 270     | func,marfret.TY.type   |     2 | Using where; Using index        |
|  1 | SIMPLE      | R     | eq_ref | mft_ref,ref_contrat,index10            | mft_ref   | 52      | marfret.P.Mft_refP     |     1 |                                 |
+----+-------------+-------+--------+----------------------------------------+-----------+---------+------------------------+-------+---------------------------------+

Conclusion: there is no performance degradation for the customer's query.
This is not a bug.

Correction for my previous [open] comment:
the option optimizer_prune_level is valid for 5.0+ and
I checked the queries I referred to with 5.0 (never checked them with 5.1, but nothing changed in this functionality).