Bug #72580 Optimize choice unoptimal
Submitted: 8 May 2014 15:04 Modified: 11 Jun 2014 9:49
Reporter: Olivier DASINI Email Updates:
Status: Not a Bug Impact on me:
None 
Category:MySQL Server: Optimizer Severity:S5 (Performance)
Version:5.6.16-enterprise-commercial OS:Linux (centOS)
Assigned to: CPU Architecture:Any
Tags: 5.6.16, Optimizer

[8 May 2014 15:04] Olivier DASINI
Description:
A 14 tables join query who takes 4 seconds with MySQL 5.1.57 MySQL Enterprise Server takes 2 minutes with 5.6.16-enterprise-commercial.

5.6.16 choose an unoptimal query plan. 5.1.57 choose a much better one.

The optimal query plan could be find with a hint (USE INDEX).

SELECT ...
-- 5.1.57 
111148 rows in set (4.77 sec)

-- 5.6.16
111148 rows in set (2 min 14.80 sec)

-- 5.6.16 with the hint
=>111146 rows in set (6.07 sec)

 show variables like 'optimizer%'\G
*************************** 1. row ***************************
Variable_name: optimizer_prune_level
        Value: 1
*************************** 2. row ***************************
Variable_name: optimizer_search_depth
        Value: 62
*************************** 3. row ***************************
Variable_name: optimizer_switch
        Value: index_merge=on,index_merge_union=on,index_merge_sort_union=on,index_merge_intersection=on,engine_condition_pushdown=on,index_condition_pushdown=on,mrr=on,mrr_cost_based=on,block_nested_loop=on,batched_key_access=off,materialization=on,semijoin=on,loosescan=on,firstmatch=on,subquery_materialization_cost_based=on,use_index_extensions=on
*************************** 4. row ***************************
Variable_name: optimizer_trace
        Value: enabled=on,one_line=on
*************************** 5. row ***************************
Variable_name: optimizer_trace_features
        Value: greedy_search=on,range_optimizer=on,dynamic_range=on,repeated_subselect=on
*************************** 6. row ***************************
Variable_name: optimizer_trace_limit
        Value: 1
*************************** 7. row ***************************
Variable_name: optimizer_trace_max_mem_size
        Value: 1000000
*************************** 8. row ***************************
Variable_name: optimizer_trace_offset
        Value: -1

show variables like 'eq_range_index_dive_limit';
+---------------------------+-------+
| Variable_name             | Value |
+---------------------------+-------+
| eq_range_index_dive_limit | 10    |
+---------------------------+-------+

How to repeat:
Run the query in 5.6.16
[21 May 2014 10:46] MySQL Verification Team
Hello Olivier,

Thank you for the bug report.
Please could you provide the complete test case(table structure(DDL),subset of data) to reproduce this problem at our end? Please make it as private once you upload/post here.

Thanks,
Umesh
[21 May 2014 15:18] Olivier DASINI
Sure.
I'll provide you that next week.
I have to anonymized some data.

Thanks
[22 May 2014 4:54] MySQL Verification Team
Thanks, shall wait for test case.

Thanks,
Umesh
[23 May 2014 7:36] Øystein Grøvlen
Hi,

AFAICT from the attached optimizer trace, the MySQL 5.6 optimizer consider both the "good" and the "bad" plan, and consider the "bad" plan to be cheaper.  This is not surprising since according to the statistics presented by EXPLAIN, starting with tables sg, bt will access 9*2575 rows which is far less than the number of rows to be read from the bookings table (215200 rows).  However, what the MySQL 5.6 optimizer does not take into account is that there are predicates on non-indexed columns of the bookings table that reduces the number of rows that needs to be joined with the subsequent tables.  This will be changed in MySQL 5.7, see http://mysqlserverteam.com/a-new-dimension-to-mysql-query-optimizations-part-1/

I am not sure why MySQL 5.1 comes up with a different plan, but I suspect that your only option for MySQL 5.6 is to use the optimizer hint to force the better plan.

It is great that you can provide us with a database so we can investigate this further and also check whether MySQL 5.7 really fixes your issue.  You are also welcome to download the MySQL 5.7 April labs release and test this yourself.
[27 May 2014 20:51] Olivier DASINI
Complete test case provide.
Structure here
Data by sftp
[2 Jun 2014 7:33] MySQL Verification Team
Thank you for providing the requested data.
Indeed, it takes a lot time in 5.6/5.7 versions(checked against 5.6.17 and 5.6.19).

Thanks,
Umesh
[4 Jun 2014 11:57] Øystein Grøvlen
I am not able to reproduce this on 5.6.18.  I get a plan that looks pretty good (see below), and the query runs in 3.5 secs.

mysql> explain select b.booking_id, date(b.created) booking_date, bookingchannelname, (((deposit-credits_deposit)*rate)-IFNULL(TotalSPDValue,0)) Deposit, IFNULL(TotalSPDValue,0) TotalSPDValue, (bookingfee-credits_bookingfee)*rate Booking_Fee, bednights, sitegroupname, sitetypename, sitename, bbc.foreign_id affiliate_id, userid, ((grosstotal*rate)-IFNULL(TotalSPDValue,0)) GBR, week                                                                                                                        ng_id = b.booking_id join bookingdetails bd on bd.booking_id = b.booking_id join bookingpricesrates bpr on bpr.booking_id = bp.book                                                                                                                         = bbc                                                                                                                         egroup_id join sitetypes st on st.sitetype_id = bt.sitetype_id join sitenames sn on sn.sitename_id = bt.sitename_id join dateslook                                                                                                                          ffi                                                                                                                            (b.created between '2014-04-27 00:00:00' and '2014-05-03 23:59:59') and bpr.currency_id = 35 and bookingstatus_id in (1,2) and book
+----+-------------+-------+--------+------------------------------------------------------------------------------+----------------------------------+---------+--------------------------------+--------+------------------------------------+
| id | select_type | table | type   | possible_keys                                                                | key                              | key_len | ref                            | rows   | Extra                              |
+----+-------------+-------+--------+------------------------------------------------------------------------------+----------------------------------+---------+--------------------------------+--------+------------------------------------+
|  1 | SIMPLE      | b     | range  | PRIMARY,bookings_bookingstatus_id,bookings_bookingtype_id,created            | created                          | 5       | NULL                           | 235224 | Using index condition; Using where |
|  1 | SIMPLE      | d     | eq_ref | PRIMARY                                                                      | PRIMARY                          | 3       | func                           |      1 | Using where                        |
|  1 | SIMPLE      | bp    | eq_ref | PRIMARY                                                                      | PRIMARY                          | 4       | bookings.b.booking_id          |      1 | NULL                               |
|  1 | SIMPLE      | bd    | eq_ref | PRIMARY                                                                      | PRIMARY                          | 4       | bookings.b.booking_id          |      1 | NULL                               |
|  1 | SIMPLE      | bpr   | ref    | bookingpricesrates_booking_id,bookingpricesrates_currency_id                 | bookingpricesrates_booking_id    | 4       | bookings.b.booking_id          |      1 | Using where                        |
|  1 | SIMPLE      | bt    | eq_ref | PRIMARY,bookingtracking_sitetype_id,bookingtracking_sitename_id,sitegroup_id | PRIMARY                          | 4       | bookings.b.booking_id          |      1 | Using where                        |
|  1 | SIMPLE      | sg    | eq_ref | PRIMARY                                                                      | PRIMARY                          | 1       | bookings.bt.sitegroup_id       |      1 | NULL                               |
|  1 | SIMPLE      | st    | eq_ref | PRIMARY                                                                      | PRIMARY                          | 2       | bookings.bt.sitetype_id        |      1 | NULL                               |
|  1 | SIMPLE      | sn    | eq_ref | PRIMARY                                                                      | PRIMARY                          | 8       | bookings.bt.sitename_id        |      1 | NULL                               |
|  1 | SIMPLE      | bbc   | ref    | bookingchannelssingle_booking_id,bookingchannelssingle_bookingchannel_id     | bookingchannelssingle_booking_id | 4       | bookings.b.booking_id          |      1 | NULL                               |
|  1 | SIMPLE      | bc    | eq_ref | PRIMARY                                                                      | PRIMARY                          | 4       | bookings.bbc.bookingchannel_id |      1 | NULL                               |
|  1 | SIMPLE      | af    | eq_ref | PRIMARY                                                                      | PRIMARY                          | 4       | bookings.bbc.foreign_id        |      1 | NULL                               |
|  1 | SIMPLE      | bcid  | eq_ref | PRIMARY                                                                      | PRIMARY                          | 4       | bookings.b.booking_id          |      1 | NULL                               |
|  1 | SIMPLE      | spd   | ref    | PRIMARY,CalculatedCurrencyCode                                               | PRIMARY                          | 7       | bookings.bcid.custid,const     |      1 | Using where                        |
+----+-------------+-------+--------+------------------------------------------------------------------------------+----------------------------------+---------+--------------------------------+--------+------------------------------------+
[4 Jun 2014 12:24] Øystein Grøvlen
Looking at the bad plan: The statistics for number of disctinct values in the bookingtracking.sitegroup_id seems to be different from my installation.

What does "SHOW INDEX FROM bookingtracking" show?

I have a cardinality of 14 for the sitegroup_id index and 9528791 for PRIMARY.  That should give 680627 rows per index lookup.  The bad plan says 2575. That is probably why optimizer thinks this plan is cheaper.  Maybe you should recompute statistics for this table (ANALYZE TABLE).

Also with MySQL 5.6 you should make sure to use InnoDB persitent statistics.  Not sure that happens automatically when upgrading from earlier versions.
[4 Jun 2014 13:42] Olivier DASINI
Hi Øystein,
thank you for your feedback.

- ANALYZE TABLE was done. It was my first action. I even rebuilt them (OPTIMIZE TABLE)

-  InnoDB persistent statistics is enable
show global variables like '%persis%';
+--------------------------------------+-------+
| Variable_name                        | Value |
+--------------------------------------+-------+
| innodb_stats_persistent              | ON    |
| innodb_stats_persistent_sample_pages | 12    |
+--------------------------------------+-------+

- In fact, the values of the bad plan came from the full data set.
SHOW INDEX FROM bookingtracking;
+-----------------+------------+-----------------------------+--------------+--------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
| Table           | Non_unique | Key_name                    | Seq_in_index | Column_name  | Collation | Cardinality | Sub_part | Packed | Null | Index_type | Comment | Index_comment |
+-----------------+------------+-----------------------------+--------------+--------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
| bookingtracking |          0 | PRIMARY                     |            1 | booking_id   | A         |    30031799 |     NULL | NULL   |      | BTREE      |         |               |
| bookingtracking |          1 | bookingtracking_sitetype_id |            1 | sitetype_id  | A         |       10242 |     NULL | NULL   |      | BTREE      |         |               |
| bookingtracking |          1 | bookingtracking_sitename_id |            1 | sitename_id  | A         |       34920 |     NULL | NULL   |      | BTREE      |         |               |
| bookingtracking |          1 | bookingtracking_server_id   |            1 | server_id    | A         |       38850 |     NULL | NULL   |      | BTREE      |         |               |
| bookingtracking |          1 | bookingtracking_country_id  |            1 | country_id   | A         |       28520 |     NULL | NULL   |      | BTREE      |         |               |
| bookingtracking |          1 | sitegroup_id                |            1 | sitegroup_id | A         |       10166 |     NULL | NULL   |      | BTREE      |         |               |
+-----------------+------------+-----------------------------+--------------+--------------+-----------+-------------+----------+--------+------+------------+---------+---------------+

- With the test data set I have the same result like you:
 SHOW INDEX FROM bookingtracking;
+-----------------+------------+-----------------------------+--------------+--------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
| Table           | Non_unique | Key_name                    | Seq_in_index | Column_name  | Collation | Cardinality | Sub_part | Packed | Null | Index_type | Comment | Index_comment |
+-----------------+------------+-----------------------------+--------------+--------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
| bookingtracking |          0 | PRIMARY                     |            1 | booking_id   | A         |    30031799 |     NULL | NULL   |      | BTREE      |         |               |
| bookingtracking |          1 | bookingtracking_sitetype_id |            1 | sitetype_id  | A         |       10242 |     NULL | NULL   |      | BTREE      |         |               |
| bookingtracking |          1 | bookingtracking_sitename_id |            1 | sitename_id  | A         |       34920 |     NULL | NULL   |      | BTREE      |         |               |
| bookingtracking |          1 | bookingtracking_server_id   |            1 | server_id    | A         |       38850 |     NULL | NULL   |      | BTREE      |         |               |
| bookingtracking |          1 | bookingtracking_country_id  |            1 | country_id   | A         |       28520 |     NULL | NULL   |      | BTREE      |         |               |
| bookingtracking |          1 | sitegroup_id                |            1 | sitegroup_id | A         |       10166 |     NULL | NULL   |      | BTREE      |         |               |
+-----------------+------------+-----------------------------+--------------+--------------+-----------+-------------+----------+--------+------+------------+---------+---------------+

Anyway, I can give you json explain and/or optimizer trace from test data set if necessary.

Olivier
[5 Jun 2014 8:49] Øystein Grøvlen
Hi Olivier,

Are you saying that you have many more sitegroups in the full data set than in the test data?  If so, I do not think there is much use for me to investigate this using the test data.  Given the statistics you present from full data set, I am not surprised that MySQL 5.6 selects this plan.  If the statistics is precise, we can only hope that this is fixed in MySQL 5.7.  You can download the latest 5.7 labs release to check this out.  (Note that the labs release is different from the official 5.7.4 developer milestone release)

On the other hand, if the statistics is way off, and you really do not have that many different sitegroups, there are things that could be done to improve 
the precision:

1. Verify that there is actually persistent statistics for this table by running
   "SELECT * FROM mysql.innodb_index_stats WHERE table_name = 'bookingtracking'"

2. Why do you have a lower than default setting for 
   innodb_stats_persistent_sample_pages?  Try increasing it to 100 or 1000 to
   get more precise statistics.

3. After running ANALYZE TABLE, do a FLUSH TABLE to make sure that no 
   connections reuse old cached statistics.  (Table objects are cached and
   reused by new connections, so without FLUSH TABLE new connections may
[5 Jun 2014 8:54] Øystein Grøvlen
Sorry, submitted incomplete comment by accident.  Continuing:

New connections may see old statistics due to caching of table objects.  FLUSH TABLES will prevent that.

Did you paste the wrong output for SHOW INDEX for test data?  It seems to be identical to output for full database.
[5 Jun 2014 17:59] Olivier DASINI
Hi Øystein,

In fact, sitegroups are quite similar in full and in test data set.
count(distinct sitegroup_id) give 9 for both.

You're right I've made a typo for SHOW INDEX. Here the good numbers:
Full data set:
+-----------------+------------+-----------------------------+--------------+--------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
| Table           | Non_unique | Key_name                    | Seq_in_index | Column_name  | Collation | Cardinality | Sub_part | Packed | Null | Index_type | Comment | Index_comment |
+-----------------+------------+-----------------------------+--------------+--------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
| bookingtracking |          0 | PRIMARY                     |            1 | booking_id   | A         |    30051692 |     NULL | NULL   |      | BTREE      |         |               |
| bookingtracking |          1 | bookingtracking_sitetype_id |            1 | sitetype_id  | A         |          66 |     NULL | NULL   |      | BTREE      |         |               |
| bookingtracking |          1 | bookingtracking_sitename_id |            1 | sitename_id  | A         |        4656 |     NULL | NULL   |      | BTREE      |         |               |
| bookingtracking |          1 | bookingtracking_server_id   |            1 | server_id    | A         |         136 |     NULL | NULL   |      | BTREE      |         |               |
| bookingtracking |          1 | bookingtracking_country_id  |            1 | country_id   | A         |         450 |     NULL | NULL   |      | BTREE      |         |               |
| bookingtracking |          1 | sitegroup_id                |            1 | sitegroup_id | A         |          16 |     NULL | NULL   |      | BTREE      |         |               |
+-----------------+------------+-----------------------------+--------------+--------------+-----------+-------------+----------+--------+------+------------+---------+---------------+

test data set:

+-----------------+------------+-----------------------------+--------------+--------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
| Table           | Non_unique | Key_name                    | Seq_in_index | Column_name  | Collation | Cardinality | Sub_part | Packed | Null | Index_type | Comment | Index_comment |
+-----------------+------------+-----------------------------+--------------+--------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
| bookingtracking |          0 | PRIMARY                     |            1 | booking_id   | A         |     9971628 |     NULL | NULL   |      | BTREE      |         |               |
| bookingtracking |          1 | bookingtracking_sitetype_id |            1 | sitetype_id  | A         |          50 |     NULL | NULL   |      | BTREE      |         |               |
| bookingtracking |          1 | bookingtracking_sitename_id |            1 | sitename_id  | A         |        2494 |     NULL | NULL   |      | BTREE      |         |               |
| bookingtracking |          1 | bookingtracking_server_id   |            1 | server_id    | A         |          68 |     NULL | NULL   |      | BTREE      |         |               |
| bookingtracking |          1 | bookingtracking_country_id  |            1 | country_id   | A         |         368 |     NULL | NULL   |      | BTREE      |         |               |
| bookingtracking |          1 | sitegroup_id                |            1 | sitegroup_id | A         |          14 |     NULL | NULL   |      | BTREE      |         |               |
+-----------------+------------+-----------------------------+--------------+--------------+-----------+-------------+----------+--------+------+------------+---------+---------------+

With values of innodb_stats_persistent_sample_pages higher than default values (tried 100, 500, 1000), the optimizer could find a better execution plan.
After ANALYZE TABLE and FLUSH TABLE as you recommended of course.

I wondering if 20 by default for innodb_stats_persistent_sample_pages is not too small ?

I don't know if we can compare but in 5.1, innodb_stats_sample_pages = 8 and it can find the better execution plan.

Thank you very much for your works guys.
[11 Jun 2014 9:42] Øystein Grøvlen
Hi Olivier,

I think the default value of innodb_stats_persistent_sample_pages is set so low in order avoid that too much resources are used to collect statistics (statistics are automatically recalculated after certain number of changes to the table).  However, this is not as critical in 5.6 as in 5.1 since recalculation of statistics now happen in a background thread instead of in a user thread.  Hence, setting a high value should have less impact.  Alternative, you can turn off automatic recalculation and schedule regular runs of ANALYZE TABLE yourself.

The sampling algorithm in 5.1 is different from in 5.6.  On average, the new algorithm should give more precise statistics, but there are probably cases where you can get lucky with the old algorithm.
[11 Jun 2014 9:49] Øystein Grøvlen
Problem solved by increasing innodb_stats_persistent_sample_pages to get more precise statistics.