Bug #1274 | optimizer bug - chooses inappriate index/filesort instead doing table scan. | ||
---|---|---|---|
Submitted: | 13 Sep 2003 14:39 | Modified: | 17 Oct 2003 7:52 |
Reporter: | Gunnar von Boehn | Email Updates: | |
Status: | Closed | Impact on me: | |
Category: | MySQL Server | Severity: | S3 (Non-critical) |
Version: | 4.0.14 | OS: | Linux (Linux (GNU/Debian)) |
Assigned to: | Konstantin Osipov | CPU Architecture: | Any |
[13 Sep 2003 14:39]
Gunnar von Boehn
[14 Sep 2003 2:38]
Gunnar von Boehn
Maybe this info may help to explain the optimizers choice. Keyname Type Cardinality PRIMARY PRIMARY 505821 ind_race_stadiumid INDEX 332 ind_race_date INDEX 4014 ind_race_wintime INDEX 4250 ind_race_class INDEX 9 race_timestamp INDEX 168607
[15 Sep 2003 10:57]
Peter Zaitsev
Thank you for a lot of information. This is not a bug. You're using SQL_CALC_FOUND_ROWS in your case, this means MySQL will need to build whole result set anyway, even if it does not send it to the client. In such case the alternative is full table scan in "race_date" index order, which would result in a lot of random IO or do full table scan and filesort. The last one is usually faster as the IO is not random. If you will drop SQL_CALC_FOUND_ROWS MySQL shall probably use index instead (as it will need to scan only few several rows to get first number of records) If you think index scan is still better for you, please use FORCE INDEX(ind_race_date) Good luck.
[16 Sep 2003 1:39]
Gunnar von Boehn
Hi Peter, thanks for your answer. I still think that this is a bug! The optimizer needs 35 second longer to sort an empty result set. >If you will drop SQL_CALC_FOUND_ROWS MySQL shall probably use index >instead (as it will need to scan only few several rows to get first >number of records) No, SQL_CALC_FOUND_ROWS has no influence on the query time. mysql> SELECT SQL_CALC_FOUND_ROWS race_name FROM races WHERE race_name LIKE '%Madison%' ORDER BY race_date DESC LIMIT 0,100; Empty set (37.21 sec) mysql> SELECT race_name FROM races WHERE race_name LIKE '%Madison%' ORDER BY race_date DESC LIMIT 0,100; Empty set (37.28 sec) mysql> SELECT SQL_CALC_FOUND_ROWS race_name FROM races WHERE race_name LIKE '%Madison%' LIMIT 0,100; Empty set (1.21 sec) mysql> SELECT race_name FROM races WHERE race_name LIKE '%Madison%' LIMIT 0,100; Empty set (1.23 sec) If I force to use the key is needs exactly the same time as the table scan should need. Could it be that the optimizer is accidently using the key all the time instead of doing a table scann ? mysql> SELECT race_name FROM races FORCE INDEX(ind_race_date) WHERE race_name LIKE '%Madison%' ORDER BY race_date DESC LIMIT 0,100 ; Empty set (37.36 sec) mysql> EXPLAIN SELECT race_name FROM races IGNORE INDEX(ind_race_date) WHERE race_name LIKE '%Madison%' ORDER BY race_date DESC LIMIT 0,100 ; +-------+-------+---------------+---------------+---------+------+--------+-------------+ | table | type | possible_keys | key | key_len | ref | rows | Extra | +-------+-------+---------------+---------------+---------+------+--------+-------------+ | races | index | NULL | ind_race_date | 3 | NULL | 505821 | Using where | +-------+-------+---------------+---------------+---------+------+--------+-------------+ 1 row in set (0.00 sec) +-------+-------+---------------+---------------+---------+------+--------+-------------+ | table | type | possible_keys | key | key_len | ref | rows | Extra | +-------+-------+---------------+---------------+---------+------+--------+-------------+ | races | index | NULL | ind_race_date | 3 | NULL | 505821 | Using where | +-------+-------+---------------+---------------+---------+------+--------+-------------+ Kind regards Gunnar
[16 Sep 2003 6:45]
Peter Zaitsev
Dear Gunnar, Yes it realy does not sounds right. However note - you get slow speed even in case index is used for the scan, so there shall be something different. I'm also surprised you get index scan, while IGNORE INDEX is specified. In any case that would be great if you could upload the data you used to repeat the problem to ftp://supprt.mysql.com/pub/mysql/secret Just make sure you compress table before uploading.
[16 Sep 2003 7:17]
Gunnar von Boehn
Dear Peter, I've uploaded a tgz containing the table and index file. The file is on your ftp server and its called "bug1274_testcase.tgz" I hope that you can help me with a fix or workaround as the current 35 second delay basicly kills our webserver. Kind regards Gunnar
[16 Sep 2003 8:10]
Peter Zaitsev
Thank you for the good bug report. Even before getting your files I identified issue with IGNORE INDEX does not really ignore index, and added it as bug #1296 with simple test case. Please follow it for that problem solution. After that bug is closed you shall be able to use use IGNORE INDEX(ind_race_date) to fix the problem. The problem was in index usage, not filesort as I initially thought. In case full table scan is used with sort of empty set the speed is good: mysql> SELECT race_name FROM races WHERE race_name LIKE '%Madison%' ORDER BY race_date DESC LIMIT 0,100; Empty set (1.49 sec) mysql> explain SELECT race_name FROM races WHERE race_name LIKE '%Madison%' ORDER BY race_date DESC LIMIT 0,100; +-------+------+---------------+------+---------+------+--------+-----------------------------+ | table | type | possible_keys | key | key_len | ref | rows | Extra | +-------+------+---------------+------+---------+------+--------+-----------------------------+ | races | ALL | NULL | NULL | NULL | NULL | 505821 | Using where; Using filesort | +-------+------+---------------+------+---------+------+--------+-----------------------------+ 1 row in set (0.00 sec) To avoid index usage I just dropped it before. I would still hold this bug open due to the following behavior: mysql> SELECT SQL_CALC_FOUND_ROWS race_name FROM races WHERE race_name LIKE '%Madison%' ORDER BY race_date DESC LIMIT 0,100; Empty set (38.59 sec) mysql> explain SELECT SQL_CALC_FOUND_ROWS race_name FROM races WHERE race_name LIKE '%Madison%' ORDER BY race_date DESC LIMIT 0,100; +-------+------+---------------+------+---------+------+--------+-----------------------------+ | table | type | possible_keys | key | key_len | ref | rows | Extra | +-------+------+---------------+------+---------+------+--------+-----------------------------+ | races | ALL | NULL | NULL | NULL | NULL | 505821 | Using where; Using filesort | +-------+------+---------------+------+---------+------+--------+-----------------------------+ 1 row in set (0.00 sec) As you might see the explain is "good" in this case while speed is the same as in case of using index access mode.
[17 Sep 2003 7:56]
Gunnar von Boehn
Dear Peter, Thanks for identifying the bug. As you suggested I droped the index as a quick fix to speed up the server. I'm looking forward to see a smarter optimiser in the next release. :-) Kind regards Gunnar
[19 Sep 2003 7:08]
Konstantin Osipov
This is not a bug in the Optimiser: the execution plan for your query is correct in both cases. The only difference between two plans is that for the first one MySQL scans index on race_date in forward order, and for the last - in backward order (we currently don't output this difference in EXPLAIN) In general, using INDEX SCAN for both queries is more efficient than TABLE SCAN. The optimiser can't determine how many rows satisfy WHERE clause, but likely that a lot, because LIMIT 0, 100 is present. So using index is considered better than reading the whole table. The slowdown of the last query is caused by our key compression algorithm: currently uncompression takes more time if keys are read in reverse order. We have a task in our Work-Log to fix it. Suggested workaround to speedup your query is ALTER TABLE races PACK_KEYS=0; I'm closing it as 'Not a Bug'. Feel free to reopen it if you still consider information given above as a bug. Thank you for your feedback.
[1 Oct 2003 8:03]
Gunnar von Boehn
Hi Konstantin, Could you please explain why you closed this bug ? Could it be that you didn't read the whole bug report properly ? The problem is that adding an index to the table slows the queries down by the factor of 30 ! I'm quiet sure that the execution plan the optimizer chooses is not OK. With index on "race_date" mysql> SELECT SQL_CALC_FOUND_ROWS race_name FROM races WHERE race_name LIKE '%Madison%' ORDER BY race_date DESC LIMIT 0,100; Empty set (36.46 sec) After dropping the index on "race_date" mysql> SELECT SQL_CALC_FOUND_ROWS race_name FROM races WHERE race_name LIKE '%Madison%' ORDER BY race_date DESC LIMIT 0,100; Empty set (1.26 sec)
[1 Oct 2003 10:02]
Gunnar von Boehn
Hello, I have the feeling that we face more than one problem here. This might be the reason that resolving this bugreport is a little bit more confusing then it should be. Found problems: a) optimizer chooses index scan instead using a table scan. b) reverse index scan on compressed index is very slow. c) optimizer ignores the "IGNORE INDEX" command To a) As the query uses "SQL_CALC_FOUND_ROWS" and "LIKE '%text%'" its clear that all records need to be accessed. As far as I understood the MySQL documentation, it is faster to do a table scan than to do an index scan if you need to access many records. (or all records like in our case) quote from mysql documentation: "If the use of the index would require MySQL to access more than 30% of the rows in the table. (In this case a table scan is probably much faster, as this will require us to do much fewer seeks.) Note that if such a query uses LIMIT to only retrieve part of the rows, MySQL will use an index anyway, as it can much more quickly find the few rows to return in the result." - index scan is faster if not many rows are accessed. - if more then 30% of the rows need to be accesssed the optimzer uses a full table scan as this is propably faster. - if LIMIT is used the optimiser uses the index scan as it expects to that the query will not access over 30% of the rows. + I think that both "SQL_CALC_FOUND_ROWS" and "LIKE '%somestring%'" should tell the optimiser that he will end up accesing more then 30% of the rows and that the table scan will be much faster. To b) Its nice that you plan to improve the speed of the compressed indexes in the future. To c) Its nice that Peter makes sure that the "IGNORE INDEX" command will be fixed. I think that the optimisers current decision rules are to simple. Assuming that using the index is the best choice for every query which has a "LIMIT" clause doesn't work. I think that this simple approach is not adequate anymore at least since you introduced the "SQL_CALC_FOUND_ROWS" feature. The current behavior will result in situations where mys
[1 Oct 2003 10:16]
Gunnar von Boehn
[sorry - accidently hit submit button] The current behavior will result in situations where mysql is far to slow. I'm sorry, if I lost patience here, but reporting this bug/unoptimal behavior ended to be much more time consuming for me then expected. On the other hand it could be that I'm totally wrong here and that the current optimiser behavior is still optimal - Please have a look and check if the way the optimiser works is optimal or not. I would appreciate if you would explain it then to me. Thanks it advance Gunnar
[8 Oct 2003 8:37]
Gunnar von Boehn
Hello ? Could someone please answer and tell me whether the behavior of the optimiser is normal and desired or if this a deficit or bug. Do I understand the situation correctly that the optimiser uses the index to access the rows even if he needs to access all the rows ? Could you please double check this and explain to me the reason ? Thanks in advance Kind regards Gunnar
[17 Oct 2003 7:52]
Konstantin Osipov
SQL_FOUND_ROWS is now taken into account by optimiser (bk commit - 4.0 tree (1.1577)) Wrong output of explain is moved to separate bug #1560