Bug #19330 | Optimizer chooses wrong table order then query does not come back | ||
---|---|---|---|
Submitted: | 25 Apr 2006 9:56 | Modified: | 6 Feb 2014 13:58 |
Reporter: | Oli Sennhauser | Email Updates: | |
Status: | Can't repeat | Impact on me: | |
Category: | MySQL Server: Optimizer | Severity: | S5 (Performance) |
Version: | 5.0.20 | OS: | Linux (linux) |
Assigned to: | Assigned Account | CPU Architecture: | Any |
[25 Apr 2006 9:56]
Oli Sennhauser
[25 Apr 2006 10:01]
Oli Sennhauser
query and execution plan
Attachment: rechnungslauf.sql (text/x-sql), 29.77 KiB.
[25 Apr 2006 11:59]
Valeriy Kravchuk
Thank you for a problem report. Please, send the my.cnf file content. To me the first plan, selected by optimizer, looks better than second one, by the way. At least, if take into account number of rows to lookup.
[25 Apr 2006 13:04]
Oli Sennhauser
Hi Valeriy I do fully agree with you again. The execution plan looks better. But with STRAIGTH_JOIN it runs 4 sec. on my machine and without it is not back after ... min. my.cnf [mysqld] port = 3310 socket = /u00/app/mysql/tmp/mysql-5.0.20.sock log_slow_queries = slow.log long_query_time = 1 log_long_format lower_case_table_names = 1 log_bin = bin_log innodb_status_file = innodb_status.log myisam_sort_buffer_size = 32M default.key_buffer_size = 32M hot.key_buffer_size = 32M cold.key_buffer_size = 32M
[3 May 2006 6:01]
Jorge del Conde
I was able to repeat this using the attached file with 5.0 from bk
[31 Jul 2006 1:37]
Igor Babaev
This problem can be demonstrated on the following two simple queries that use the same database state as the reported queries. The first query is executed very quickly: mysql> SELECT -> STRAIGHT_JOIN -> COUNT(*) -> FROM drdat d -> LEFT JOIN drdat dstop ON d.firma_id=dstop.firma_id AND d.stadt_id=dstop.stadt_id -> AND d.druckdatum<dstop.druckdatum -> AND dstop.abrechnungstyp='MASSEN' -> INNER JOIN fsdaten fst USE INDEX (firma_id) ON fst.firma_id=d.firma_id AND fst.abr_stadt_id=d.stadt_id -> INNER JOIN fahrt ft ON ft.firma_id=d.firma_id AND ft.firma_id=fst.firma_id -> AND ft.stadt_id=fst.stadt_id -> AND ft.druckdatum='0000-00-00' -> AND ft.drdat_id IS NULL -> WHERE -> d.druckdatum!='1910-10-10' -> AND d.druckdatum!='1911-11-11' -> AND d.druckdatum!='1912-12-12' -> AND d.abrechnungstyp='MASSEN' -> AND d.firma_id IN (1,2,3,4,5,9,10,11,12,99,100,102,103,104,111, -> 200,201,202,203,204,205,206,209, -> 300,302,328,329,330,331,332,335,336,337,338,339, -> 340,341,342,343,344,345,346,347,348,349, -> 500,501,1000,2000,3000,4000, -> 5000,5100,5200,5300,5500,6000,7000,8000) -> AND dstop.id IS NULL; +----------+ | COUNT(*) | +----------+ | 66070 | +----------+ 1 row in set (0.92 sec) This is the execution plan by which the first query is executed: +----+-------------+-------+------+---------------------------------------------+------------+---------+-------------------------------------------+------+-------------------------+ | id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra | +----+-------------+-------+------+---------------------------------------------+------------+---------+-------------------------------------------+------+-------------------------+ | 1 | SIMPLE | d | ALL | uniqueindex,firma_id | NULL | NULL | NULL | 8616 | Using where | | 1 | SIMPLE | dstop | ref | uniqueindex,firma_id | firma_id | 8 | test.d.firma_id,test.d.stadt_id | 115 | Using where; Not exists | | 1 | SIMPLE | fst | ref | firma_id | firma_id | 8 | test.d.firma_id,test.d.stadt_id | 5 | | | 1 | SIMPLE | ft | ref | druckdatum,drdat_id_rechnung_neu,neu_relauf | neu_relauf | 13 | test.fst.firma_id,const,test.fst.stadt_id | 8 | Using where | +----+-------------+-------+------+---------------------------------------------+------------+---------+-------------------------------------------+------+-------------------------+ 4 rows in set (0.02 sec) The second query is equivalent to the first one differing from the latter only by omission of the STRAIGHT_JOIN option under SELECT. Yet an execution of the second query takes much more time: mysql> SELECT -> COUNT(*) -> FROM drdat d -> LEFT JOIN drdat dstop ON d.firma_id=dstop.firma_id AND d.stadt_id=dstop.stadt_id -> AND d.druckdatum<dstop.druckdatum -> AND dstop.abrechnungstyp='MASSEN' -> INNER JOIN fsdaten fst USE INDEX (firma_id) ON fst.firma_id=d.firma_id AND fst.abr_stadt_id=d.stadt_id -> INNER JOIN fahrt ft ON ft.firma_id=d.firma_id AND ft.firma_id=fst.firma_id -> AND ft.stadt_id=fst.stadt_id -> AND ft.druckdatum='0000-00-00' -> AND ft.drdat_id IS NULL -> WHERE -> d.druckdatum!='1910-10-10' -> AND d.druckdatum!='1911-11-11' -> AND d.druckdatum!='1912-12-12' -> AND d.abrechnungstyp='MASSEN' -> AND d.firma_id IN (1,2,3,4,5,9,10,11,12,99,100,102,103,104,111, -> 200,201,202,203,204,205,206,209, -> 300,302,328,329,330,331,332,335,336,337,338,339, -> 340,341,342,343,344,345,346,347,348,349, -> 500,501,1000,2000,3000,4000, -> 5000,5100,5200,5300,5500,6000,7000,8000) -> AND dstop.id IS NULL; +----------+ | COUNT(*) | +----------+ | 66070 | +----------+ 1 row in set (8 min 27.59 sec) The second query is executed by the following plan: +----+-------------+-------+------+---------------------------------------------+------------+---------+-------------------------------------------+------+-------------------------+ | id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra | +----+-------------+-------+------+---------------------------------------------+------------+---------+-------------------------------------------+------+-------------------------+ | 1 | SIMPLE | fst | ALL | firma_id | NULL | NULL | NULL | 325 | Using where | | 1 | SIMPLE | ft | ref | druckdatum,drdat_id_rechnung_neu,neu_relauf | neu_relauf | 13 | test.fst.firma_id,const,test.fst.stadt_id | 8 | Using where | | 1 | SIMPLE | d | ref | uniqueindex,firma_id | firma_id | 8 | test.ft.firma_id,test.fst.abr_stadt_id | 115 | Using where | | 1 | SIMPLE | dstop | ref | uniqueindex,firma_id | firma_id | 8 | test.d.firma_id,test.fst.abr_stadt_id | 115 | Using where; Not exists | +----+-------------+-------+------+---------------------------------------------+------------+---------+-------------------------------------------+------+-------------------------+ -- the comment to be continued
[31 Jul 2006 1:39]
Igor Babaev
-- continuation of the previous comment The estimated cost of the second plan is really less than the estimated cost of the first plan. That's why the optimizer selects the second plan when we don't use the STRAIGHT_JOIN option. The estimated number of rows for the first plan is equal to 8616*115*5*8=39642800, while for the second plan the estimated number of rows is equal to 325*8*115*115=34385000. Note with 'Not exists' optimization the fanout for the left join operation must be 1 rather then 115. This is abug and it will bw fixed later. After this fix the expected number of rows will be for the first query: 8616*1*5*8=344640 for the second query: 325*8*115*1=299000. In reality we have the following cardinalities for partial joins. first, quick query: we have: table to join cardinality of partial join real fanout d 1856 0.2154 dstop 57 0.0307 fst 292 5.12 ft 66070 226 instead of predicted: table to join cardinality of partial join real fanout d 8616 1 dstop 8616 1 fst 43080 5 ft 344640 8 second, slow query: we have: table to join cardinality of partial join real fanout fst 325 1 ft 66101 203 d 3065772 46 dstop 66070 0.02 instead of predicted: table to join cardinality of partial join real fanout fst 325 1 ft 2600 8 d 229000 115 dstop 229000 1 We make a huge miss with our estimates by two reasons: 1. MySQL use a uniform simplistic model for distribution of values which does not work for the tables in the example (there the values are skewed). 2. We do not make a good estimate for selectivity of the condition in the antijoin: dstop ON d.firma_id=dstop.firma_id AND d.stadt_id=dstop.stadt_id AND d.druckdatum<dstop.druckdatum AND dstop.abrechnungstyp='MASSEN'. The high selectivity of this condition makes the execution by the first plan dramatically better than by second one. Yet we hardly can calculate this selectivity. The above analysis allows me to state that we won't be able to fix this problem until we introduce much more sophisticated database statistics ito MySQL.