Bug #72580 | Optimize choice unoptimal | ||
---|---|---|---|
Submitted: | 8 May 2014 15:04 | Modified: | 11 Jun 2014 9:49 |
Reporter: | Olivier DASINI | Email Updates: | |
Status: | Not a Bug | Impact on me: | |
Category: | MySQL Server: Optimizer | Severity: | S5 (Performance) |
Version: | 5.6.16-enterprise-commercial | OS: | Linux (centOS) |
Assigned to: | CPU Architecture: | Any | |
Tags: | 5.6.16, Optimizer |
[8 May 2014 15:04]
Olivier DASINI
[21 May 2014 10:46]
MySQL Verification Team
Hello Olivier, Thank you for the bug report. Please could you provide the complete test case(table structure(DDL),subset of data) to reproduce this problem at our end? Please make it as private once you upload/post here. Thanks, Umesh
[21 May 2014 15:18]
Olivier DASINI
Sure. I'll provide you that next week. I have to anonymized some data. Thanks
[22 May 2014 4:54]
MySQL Verification Team
Thanks, shall wait for test case. Thanks, Umesh
[23 May 2014 7:36]
Øystein Grøvlen
Hi, AFAICT from the attached optimizer trace, the MySQL 5.6 optimizer consider both the "good" and the "bad" plan, and consider the "bad" plan to be cheaper. This is not surprising since according to the statistics presented by EXPLAIN, starting with tables sg, bt will access 9*2575 rows which is far less than the number of rows to be read from the bookings table (215200 rows). However, what the MySQL 5.6 optimizer does not take into account is that there are predicates on non-indexed columns of the bookings table that reduces the number of rows that needs to be joined with the subsequent tables. This will be changed in MySQL 5.7, see http://mysqlserverteam.com/a-new-dimension-to-mysql-query-optimizations-part-1/ I am not sure why MySQL 5.1 comes up with a different plan, but I suspect that your only option for MySQL 5.6 is to use the optimizer hint to force the better plan. It is great that you can provide us with a database so we can investigate this further and also check whether MySQL 5.7 really fixes your issue. You are also welcome to download the MySQL 5.7 April labs release and test this yourself.
[27 May 2014 20:51]
Olivier DASINI
Complete test case provide. Structure here Data by sftp
[2 Jun 2014 7:33]
MySQL Verification Team
Thank you for providing the requested data. Indeed, it takes a lot time in 5.6/5.7 versions(checked against 5.6.17 and 5.6.19). Thanks, Umesh
[4 Jun 2014 11:57]
Øystein Grøvlen
I am not able to reproduce this on 5.6.18. I get a plan that looks pretty good (see below), and the query runs in 3.5 secs. mysql> explain select b.booking_id, date(b.created) booking_date, bookingchannelname, (((deposit-credits_deposit)*rate)-IFNULL(TotalSPDValue,0)) Deposit, IFNULL(TotalSPDValue,0) TotalSPDValue, (bookingfee-credits_bookingfee)*rate Booking_Fee, bednights, sitegroupname, sitetypename, sitename, bbc.foreign_id affiliate_id, userid, ((grosstotal*rate)-IFNULL(TotalSPDValue,0)) GBR, week ng_id = b.booking_id join bookingdetails bd on bd.booking_id = b.booking_id join bookingpricesrates bpr on bpr.booking_id = bp.book = bbc egroup_id join sitetypes st on st.sitetype_id = bt.sitetype_id join sitenames sn on sn.sitename_id = bt.sitename_id join dateslook ffi (b.created between '2014-04-27 00:00:00' and '2014-05-03 23:59:59') and bpr.currency_id = 35 and bookingstatus_id in (1,2) and book +----+-------------+-------+--------+------------------------------------------------------------------------------+----------------------------------+---------+--------------------------------+--------+------------------------------------+ | id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra | +----+-------------+-------+--------+------------------------------------------------------------------------------+----------------------------------+---------+--------------------------------+--------+------------------------------------+ | 1 | SIMPLE | b | range | PRIMARY,bookings_bookingstatus_id,bookings_bookingtype_id,created | created | 5 | NULL | 235224 | Using index condition; Using where | | 1 | SIMPLE | d | eq_ref | PRIMARY | PRIMARY | 3 | func | 1 | Using where | | 1 | SIMPLE | bp | eq_ref | PRIMARY | PRIMARY | 4 | bookings.b.booking_id | 1 | NULL | | 1 | SIMPLE | bd | eq_ref | PRIMARY | PRIMARY | 4 | bookings.b.booking_id | 1 | NULL | | 1 | SIMPLE | bpr | ref | bookingpricesrates_booking_id,bookingpricesrates_currency_id | bookingpricesrates_booking_id | 4 | bookings.b.booking_id | 1 | Using where | | 1 | SIMPLE | bt | eq_ref | PRIMARY,bookingtracking_sitetype_id,bookingtracking_sitename_id,sitegroup_id | PRIMARY | 4 | bookings.b.booking_id | 1 | Using where | | 1 | SIMPLE | sg | eq_ref | PRIMARY | PRIMARY | 1 | bookings.bt.sitegroup_id | 1 | NULL | | 1 | SIMPLE | st | eq_ref | PRIMARY | PRIMARY | 2 | bookings.bt.sitetype_id | 1 | NULL | | 1 | SIMPLE | sn | eq_ref | PRIMARY | PRIMARY | 8 | bookings.bt.sitename_id | 1 | NULL | | 1 | SIMPLE | bbc | ref | bookingchannelssingle_booking_id,bookingchannelssingle_bookingchannel_id | bookingchannelssingle_booking_id | 4 | bookings.b.booking_id | 1 | NULL | | 1 | SIMPLE | bc | eq_ref | PRIMARY | PRIMARY | 4 | bookings.bbc.bookingchannel_id | 1 | NULL | | 1 | SIMPLE | af | eq_ref | PRIMARY | PRIMARY | 4 | bookings.bbc.foreign_id | 1 | NULL | | 1 | SIMPLE | bcid | eq_ref | PRIMARY | PRIMARY | 4 | bookings.b.booking_id | 1 | NULL | | 1 | SIMPLE | spd | ref | PRIMARY,CalculatedCurrencyCode | PRIMARY | 7 | bookings.bcid.custid,const | 1 | Using where | +----+-------------+-------+--------+------------------------------------------------------------------------------+----------------------------------+---------+--------------------------------+--------+------------------------------------+
[4 Jun 2014 12:24]
Øystein Grøvlen
Looking at the bad plan: The statistics for number of disctinct values in the bookingtracking.sitegroup_id seems to be different from my installation. What does "SHOW INDEX FROM bookingtracking" show? I have a cardinality of 14 for the sitegroup_id index and 9528791 for PRIMARY. That should give 680627 rows per index lookup. The bad plan says 2575. That is probably why optimizer thinks this plan is cheaper. Maybe you should recompute statistics for this table (ANALYZE TABLE). Also with MySQL 5.6 you should make sure to use InnoDB persitent statistics. Not sure that happens automatically when upgrading from earlier versions.
[4 Jun 2014 13:42]
Olivier DASINI
Hi Øystein, thank you for your feedback. - ANALYZE TABLE was done. It was my first action. I even rebuilt them (OPTIMIZE TABLE) - InnoDB persistent statistics is enable show global variables like '%persis%'; +--------------------------------------+-------+ | Variable_name | Value | +--------------------------------------+-------+ | innodb_stats_persistent | ON | | innodb_stats_persistent_sample_pages | 12 | +--------------------------------------+-------+ - In fact, the values of the bad plan came from the full data set. SHOW INDEX FROM bookingtracking; +-----------------+------------+-----------------------------+--------------+--------------+-----------+-------------+----------+--------+------+------------+---------+---------------+ | Table | Non_unique | Key_name | Seq_in_index | Column_name | Collation | Cardinality | Sub_part | Packed | Null | Index_type | Comment | Index_comment | +-----------------+------------+-----------------------------+--------------+--------------+-----------+-------------+----------+--------+------+------------+---------+---------------+ | bookingtracking | 0 | PRIMARY | 1 | booking_id | A | 30031799 | NULL | NULL | | BTREE | | | | bookingtracking | 1 | bookingtracking_sitetype_id | 1 | sitetype_id | A | 10242 | NULL | NULL | | BTREE | | | | bookingtracking | 1 | bookingtracking_sitename_id | 1 | sitename_id | A | 34920 | NULL | NULL | | BTREE | | | | bookingtracking | 1 | bookingtracking_server_id | 1 | server_id | A | 38850 | NULL | NULL | | BTREE | | | | bookingtracking | 1 | bookingtracking_country_id | 1 | country_id | A | 28520 | NULL | NULL | | BTREE | | | | bookingtracking | 1 | sitegroup_id | 1 | sitegroup_id | A | 10166 | NULL | NULL | | BTREE | | | +-----------------+------------+-----------------------------+--------------+--------------+-----------+-------------+----------+--------+------+------------+---------+---------------+ - With the test data set I have the same result like you: SHOW INDEX FROM bookingtracking; +-----------------+------------+-----------------------------+--------------+--------------+-----------+-------------+----------+--------+------+------------+---------+---------------+ | Table | Non_unique | Key_name | Seq_in_index | Column_name | Collation | Cardinality | Sub_part | Packed | Null | Index_type | Comment | Index_comment | +-----------------+------------+-----------------------------+--------------+--------------+-----------+-------------+----------+--------+------+------------+---------+---------------+ | bookingtracking | 0 | PRIMARY | 1 | booking_id | A | 30031799 | NULL | NULL | | BTREE | | | | bookingtracking | 1 | bookingtracking_sitetype_id | 1 | sitetype_id | A | 10242 | NULL | NULL | | BTREE | | | | bookingtracking | 1 | bookingtracking_sitename_id | 1 | sitename_id | A | 34920 | NULL | NULL | | BTREE | | | | bookingtracking | 1 | bookingtracking_server_id | 1 | server_id | A | 38850 | NULL | NULL | | BTREE | | | | bookingtracking | 1 | bookingtracking_country_id | 1 | country_id | A | 28520 | NULL | NULL | | BTREE | | | | bookingtracking | 1 | sitegroup_id | 1 | sitegroup_id | A | 10166 | NULL | NULL | | BTREE | | | +-----------------+------------+-----------------------------+--------------+--------------+-----------+-------------+----------+--------+------+------------+---------+---------------+ Anyway, I can give you json explain and/or optimizer trace from test data set if necessary. Olivier
[5 Jun 2014 8:49]
Øystein Grøvlen
Hi Olivier, Are you saying that you have many more sitegroups in the full data set than in the test data? If so, I do not think there is much use for me to investigate this using the test data. Given the statistics you present from full data set, I am not surprised that MySQL 5.6 selects this plan. If the statistics is precise, we can only hope that this is fixed in MySQL 5.7. You can download the latest 5.7 labs release to check this out. (Note that the labs release is different from the official 5.7.4 developer milestone release) On the other hand, if the statistics is way off, and you really do not have that many different sitegroups, there are things that could be done to improve the precision: 1. Verify that there is actually persistent statistics for this table by running "SELECT * FROM mysql.innodb_index_stats WHERE table_name = 'bookingtracking'" 2. Why do you have a lower than default setting for innodb_stats_persistent_sample_pages? Try increasing it to 100 or 1000 to get more precise statistics. 3. After running ANALYZE TABLE, do a FLUSH TABLE to make sure that no connections reuse old cached statistics. (Table objects are cached and reused by new connections, so without FLUSH TABLE new connections may
[5 Jun 2014 8:54]
Øystein Grøvlen
Sorry, submitted incomplete comment by accident. Continuing: New connections may see old statistics due to caching of table objects. FLUSH TABLES will prevent that. Did you paste the wrong output for SHOW INDEX for test data? It seems to be identical to output for full database.
[5 Jun 2014 17:59]
Olivier DASINI
Hi Øystein, In fact, sitegroups are quite similar in full and in test data set. count(distinct sitegroup_id) give 9 for both. You're right I've made a typo for SHOW INDEX. Here the good numbers: Full data set: +-----------------+------------+-----------------------------+--------------+--------------+-----------+-------------+----------+--------+------+------------+---------+---------------+ | Table | Non_unique | Key_name | Seq_in_index | Column_name | Collation | Cardinality | Sub_part | Packed | Null | Index_type | Comment | Index_comment | +-----------------+------------+-----------------------------+--------------+--------------+-----------+-------------+----------+--------+------+------------+---------+---------------+ | bookingtracking | 0 | PRIMARY | 1 | booking_id | A | 30051692 | NULL | NULL | | BTREE | | | | bookingtracking | 1 | bookingtracking_sitetype_id | 1 | sitetype_id | A | 66 | NULL | NULL | | BTREE | | | | bookingtracking | 1 | bookingtracking_sitename_id | 1 | sitename_id | A | 4656 | NULL | NULL | | BTREE | | | | bookingtracking | 1 | bookingtracking_server_id | 1 | server_id | A | 136 | NULL | NULL | | BTREE | | | | bookingtracking | 1 | bookingtracking_country_id | 1 | country_id | A | 450 | NULL | NULL | | BTREE | | | | bookingtracking | 1 | sitegroup_id | 1 | sitegroup_id | A | 16 | NULL | NULL | | BTREE | | | +-----------------+------------+-----------------------------+--------------+--------------+-----------+-------------+----------+--------+------+------------+---------+---------------+ test data set: +-----------------+------------+-----------------------------+--------------+--------------+-----------+-------------+----------+--------+------+------------+---------+---------------+ | Table | Non_unique | Key_name | Seq_in_index | Column_name | Collation | Cardinality | Sub_part | Packed | Null | Index_type | Comment | Index_comment | +-----------------+------------+-----------------------------+--------------+--------------+-----------+-------------+----------+--------+------+------------+---------+---------------+ | bookingtracking | 0 | PRIMARY | 1 | booking_id | A | 9971628 | NULL | NULL | | BTREE | | | | bookingtracking | 1 | bookingtracking_sitetype_id | 1 | sitetype_id | A | 50 | NULL | NULL | | BTREE | | | | bookingtracking | 1 | bookingtracking_sitename_id | 1 | sitename_id | A | 2494 | NULL | NULL | | BTREE | | | | bookingtracking | 1 | bookingtracking_server_id | 1 | server_id | A | 68 | NULL | NULL | | BTREE | | | | bookingtracking | 1 | bookingtracking_country_id | 1 | country_id | A | 368 | NULL | NULL | | BTREE | | | | bookingtracking | 1 | sitegroup_id | 1 | sitegroup_id | A | 14 | NULL | NULL | | BTREE | | | +-----------------+------------+-----------------------------+--------------+--------------+-----------+-------------+----------+--------+------+------------+---------+---------------+ With values of innodb_stats_persistent_sample_pages higher than default values (tried 100, 500, 1000), the optimizer could find a better execution plan. After ANALYZE TABLE and FLUSH TABLE as you recommended of course. I wondering if 20 by default for innodb_stats_persistent_sample_pages is not too small ? I don't know if we can compare but in 5.1, innodb_stats_sample_pages = 8 and it can find the better execution plan. Thank you very much for your works guys.
[11 Jun 2014 9:42]
Øystein Grøvlen
Hi Olivier, I think the default value of innodb_stats_persistent_sample_pages is set so low in order avoid that too much resources are used to collect statistics (statistics are automatically recalculated after certain number of changes to the table). However, this is not as critical in 5.6 as in 5.1 since recalculation of statistics now happen in a background thread instead of in a user thread. Hence, setting a high value should have less impact. Alternative, you can turn off automatic recalculation and schedule regular runs of ANALYZE TABLE yourself. The sampling algorithm in 5.1 is different from in 5.6. On average, the new algorithm should give more precise statistics, but there are probably cases where you can get lucky with the old algorithm.
[11 Jun 2014 9:49]
Øystein Grøvlen
Problem solved by increasing innodb_stats_persistent_sample_pages to get more precise statistics.