MySQL Bugs: #1274: optimizer bug - chooses inappriate index/filesort instead doing table scan.

Bug #1274	optimizer bug - chooses inappriate index/filesort instead doing table scan.
Submitted:	13 Sep 2003 14:39	Modified:	17 Oct 2003 7:52
Reporter:	Gunnar von Boehn	Email Updates:
Status:	Closed	Impact on me:	None
Category:	MySQL Server	Severity:	S3 (Non-critical)
Version:	4.0.14	OS:	Linux (Linux (GNU/Debian))
Assigned to:	Konstantin Osipov	CPU Architecture:	Any

Description:
Hello,

I think I have found an optimizer bug.

I would expect that the lower query runs a full table scan which would take about one sec, instead doing this the optimizer chooses to access every row
complicated through an index/filesort - this takes over 30 secs.

It looks like the optimizer gets confused that
the coloumn which is used for ordering has an index.

mysql> analyze table races;
| greyhounddata.races | analyze | status   | Table is already up to date |

The problematic query:
mysql> SELECT SQL_CALC_FOUND_ROWS race_name FROM races WHERE race_name LIKE '%Madison%' ORDER BY race_date DESC LIMIT 0,100;
Empty set (36.46 sec)

Without the order its very fast:
mysql> SELECT SQL_CALC_FOUND_ROWS race_name FROM races WHERE race_name LIKE '%Madison%' LIMIT 0,100;
Empty set (1.26 sec)

Without DESC the order is faster but still much slower than no order.
mysql> SELECT SQL_CALC_FOUND_ROWS race_name FROM races WHERE race_name LIKE '%Madison%' ORDER BY race_date LIMIT 0,100;
Empty set (3.13 sec)

mysql> explain SELECT SQL_CALC_FOUND_ROWS race_name FROM races WHERE race_name LIKE '%Madison%' ORDER BY race_date DESC LIMIT 0,100;
+-------+------+---------------+------+---------+------+--------+-----------------------------+
| table | type | possible_keys | key  | key_len | ref  | rows   | Extra                       |
+-------+------+---------------+------+---------+------+--------+-----------------------------+
| races | ALL  | NULL          | NULL |    NULL | NULL | 505821 | Using where; Using filesort |
+-------+------+---------------+------+---------+------+--------+-----------------------------+

mysql> explain SELECT SQL_CALC_FOUND_ROWS race_name FROM races WHERE race_name LIKE '%Madison%' ORDER BY race_date  LIMIT 0,100;
+-------+------+---------------+------+---------+------+--------+-----------------------------+
| table | type | possible_keys | key  | key_len | ref  | rows   | Extra                       |
+-------+------+---------------+------+---------+------+--------+-----------------------------+
| races | ALL  | NULL          | NULL |    NULL | NULL | 505821 | Using where; Using filesort |
+-------+------+---------------+------+---------+------+--------+-----------------------------+

mysql> explain SELECT SQL_CALC_FOUND_ROWS race_name FROM races WHERE race_name   LIKE '%Madison%'   LIMIT 0,100;
+-------+------+---------------+------+---------+------+--------+-------------+
| table | type | possible_keys | key  | key_len | ref  | rows   | Extra       |
+-------+------+---------------+------+---------+------+--------+-------------+
| races | ALL  | NULL          | NULL |    NULL | NULL | 505821 | Using where |
+-------+------+---------------+------+---------+------+--------+-------------+

mysql> explain races;
+-------------------+-----------------------+------+-----+---------+----------------+
| Field             | Type                  | Null | Key | Default | Extra          |
+-------------------+-----------------------+------+-----+---------+----------------+
| race_id           | mediumint(8) unsigned |      | PRI | NULL    | auto_increment |
| race_round        | tinyint(3) unsigned   |      |     | 0       |                |
| race_grade        | tinyint(3) unsigned   |      |     | 0       |                |
| race_gradeo       | char(3)               |      |     |         |                |
| race_class        | tinyint(3) unsigned   |      | MUL | 0       |                |
| race_heat         | tinyint(3) unsigned   |      |     | 0       |                |
| race_date         | mediumint(8) unsigned |      | MUL | 0       |                |
| race_dist         | smallint(5) unsigned  |      |     | 0       |                |
| race_prize        | mediumint(8) unsigned |      |     | 0       |                |
| race_stadiumid    | smallint(5) unsigned  |      | MUL | 0       |                |
| race_name         | varchar(50)           |      |     |         |                |
| race_comment      | text                  |      |     |         |                |
| race_wintime      | smallint(5) unsigned  |      | MUL | 0       |                |
| race_windogid     | mediumint(8) unsigned |      |     | 0       |                |
| race_editorid     | smallint(6)           |      |     | 0       |                |
| race_timestamp    | timestamp(14)         | YES  | MUL | NULL    |                |
| race_timestamp_cr | timestamp(14)         | YES  |     | NULL    |                |
+-------------------+-----------------------+------+-----+---------+----------------+

How to repeat:
CREATE TABLE `races` (
  `race_id` mediumint(8) unsigned NOT NULL auto_increment,
  `race_round` tinyint(3) unsigned NOT NULL default '0',
  `race_grade` tinyint(3) unsigned NOT NULL default '0',
  `race_gradeo` char(3) NOT NULL default '',
  `race_class` tinyint(3) unsigned NOT NULL default '0',
  `race_heat` tinyint(3) unsigned NOT NULL default '0',
  `race_date` mediumint(8) unsigned NOT NULL default '0',
  `race_dist` smallint(5) unsigned NOT NULL default '0',
  `race_prize` mediumint(8) unsigned NOT NULL default '0',
  `race_stadiumid` smallint(5) unsigned NOT NULL default '0',
  `race_name` varchar(50) NOT NULL default '',
  `race_comment` text NOT NULL,
  `race_wintime` smallint(5) unsigned NOT NULL default '0',
  `race_windogid` mediumint(8) unsigned NOT NULL default '0',
  `race_editorid` smallint(6) NOT NULL default '0',
  `race_timestamp` timestamp(14) NOT NULL,
  `race_timestamp_cr` timestamp(14) NOT NULL,
  PRIMARY KEY  (`race_id`),
  KEY `ind_race_stadiumid` (`race_stadiumid`),
  KEY `ind_race_date` (`race_date`),
  KEY `ind_race_wintime` (`race_wintime`),
  KEY `ind_race_class` (`race_class`),
  KEY `race_timestamp` (`race_timestamp`)
) TYPE=MyISAM PACK_KEYS=0 AUTO_INCREMENT=513960 ;

I assume the optimzer problem is of general nature.
But if you need a dump of my data to verufy the bug
please get back to me.

Suggested fix:
Please fix the optimizer or give me some hints
howto work around this bug.

Keep up the good work

Maybe this info may help to explain the optimizers choice.

Keyname  	   Type  	Cardinality  	
PRIMARY 	   PRIMARY 	505821  	
ind_race_stadiumid INDEX 	332  	        
ind_race_date 	   INDEX 	4014  	        
ind_race_wintime   INDEX 	4250  	        
ind_race_class 	   INDEX 	9  	        
race_timestamp 	   INDEX 	168607

Thank you for a lot of information.

This is not a bug. 

You're using SQL_CALC_FOUND_ROWS in your case, this means MySQL will need to build  whole result set anyway, even if it does not send it to the client.

In such case the alternative is full table scan in "race_date" index order, which would result in a lot of random IO or do full table scan and filesort.

The last one is usually faster as the IO is not random. 

If you will drop SQL_CALC_FOUND_ROWS MySQL shall probably use index instead (as it will need to scan only few several rows to get first number of records)

If you think index scan is still better for you, please use FORCE INDEX(ind_race_date)

Good luck.

Hi Peter,

thanks for your answer.
I still think that this is a bug!
The optimizer needs 35 second longer to sort an empty result set.

>If you will drop SQL_CALC_FOUND_ROWS MySQL shall probably use index 
>instead (as it will need to scan only few several rows to get first 
>number of records) 

No, SQL_CALC_FOUND_ROWS has no influence on the query time.

mysql> SELECT SQL_CALC_FOUND_ROWS race_name  FROM races WHERE  race_name LIKE '%Madison%' ORDER BY race_date DESC  LIMIT 0,100;
Empty set (37.21 sec)

mysql> SELECT race_name   FROM races WHERE  race_name LIKE '%Madison%' ORDER BY race_date DESC  LIMIT 0,100;
Empty set (37.28 sec)

mysql> SELECT SQL_CALC_FOUND_ROWS race_name FROM races WHERE  race_name LIKE '%Madison%' LIMIT 0,100;
Empty set (1.21 sec)

mysql> SELECT race_name FROM races WHERE  race_name LIKE '%Madison%' LIMIT  0,100;
Empty set (1.23 sec)

If I force to use the key is needs exactly the same time as the table scan should need. Could it be that the optimizer is accidently using the key all
the time instead of doing a table scann ?

mysql> SELECT race_name FROM races FORCE INDEX(ind_race_date) WHERE  race_name  LIKE '%Madison%' ORDER BY race_date DESC  LIMIT 0,100 ;
Empty set (37.36 sec)

mysql> EXPLAIN SELECT race_name   FROM races IGNORE INDEX(ind_race_date) WHERE  race_name  LIKE '%Madison%' ORDER BY race_date DESC  LIMIT 0,100 ;
+-------+-------+---------------+---------------+---------+------+--------+-------------+
| table | type  | possible_keys | key           | key_len | ref  | rows   | Extra       |
+-------+-------+---------------+---------------+---------+------+--------+-------------+
| races | index | NULL          | ind_race_date |       3 | NULL | 505821 | Using where |
+-------+-------+---------------+---------------+---------+------+--------+-------------+
1 row in set (0.00 sec)

+-------+-------+---------------+---------------+---------+------+--------+-------------+
| table | type  | possible_keys | key           | key_len | ref  | rows   | Extra       |
+-------+-------+---------------+---------------+---------+------+--------+-------------+
| races | index | NULL          | ind_race_date |       3 | NULL | 505821 | Using where |
+-------+-------+---------------+---------------+---------+------+--------+-------------+

Kind regards
Gunnar

Dear Gunnar,

Yes it realy does not sounds right. However note - you get slow speed even in case index is used for the scan, so there shall be something different.

I'm also surprised you get index scan, while IGNORE INDEX is specified.

In any case that would be great if you could upload the data you used to repeat the problem to ftp://supprt.mysql.com/pub/mysql/secret

Just make sure you compress table before uploading.

Dear Peter,

I've uploaded a tgz containing the table and index file.
The file is on your ftp server and its called "bug1274_testcase.tgz"

I hope that you can help me with a fix or workaround
as the current 35 second delay basicly kills our webserver.

Kind regards
Gunnar

Thank you for the good bug report.

Even before getting your files I identified issue with IGNORE INDEX does not really ignore index, and added it as bug #1296 with simple test case. Please follow it for that problem solution.

After that bug is closed you shall be able to use use IGNORE INDEX(ind_race_date) to fix the problem.

The problem was in index usage, not filesort as I initially thought. In case full table scan is used with sort of empty set the speed is good:

mysql> SELECT race_name FROM races WHERE race_name LIKE '%Madison%' ORDER BY race_date DESC LIMIT 0,100;
Empty set (1.49 sec)

mysql> explain SELECT race_name FROM races WHERE race_name LIKE '%Madison%' ORDER BY race_date DESC LIMIT 0,100;
+-------+------+---------------+------+---------+------+--------+-----------------------------+
| table | type | possible_keys | key  | key_len | ref  | rows   | Extra                       |
+-------+------+---------------+------+---------+------+--------+-----------------------------+
| races | ALL  | NULL          | NULL |    NULL | NULL | 505821 | Using where; Using filesort |
+-------+------+---------------+------+---------+------+--------+-----------------------------+
1 row in set (0.00 sec)

To avoid index usage I just dropped it before. 

I would still hold this bug open due to the following behavior:

mysql> SELECT SQL_CALC_FOUND_ROWS race_name FROM races WHERE race_name LIKE '%Madison%' ORDER BY race_date DESC LIMIT 0,100;
Empty set (38.59 sec)

mysql> explain SELECT SQL_CALC_FOUND_ROWS race_name FROM races WHERE race_name LIKE '%Madison%' ORDER BY race_date DESC LIMIT 0,100;
+-------+------+---------------+------+---------+------+--------+-----------------------------+
| table | type | possible_keys | key  | key_len | ref  | rows   | Extra                       |
+-------+------+---------------+------+---------+------+--------+-----------------------------+
| races | ALL  | NULL          | NULL |    NULL | NULL | 505821 | Using where; Using filesort |
+-------+------+---------------+------+---------+------+--------+-----------------------------+
1 row in set (0.00 sec)

As you might see the explain is "good" in this case while speed is the same as in case of using index access mode.

Dear Peter,

Thanks for identifying the bug.
As you suggested I droped the index as a quick fix to speed up the server.

I'm looking forward to see a smarter optimiser
in the next release. :-)

Kind regards
Gunnar

This is not a bug in the Optimiser: the execution plan for your query is
correct in both cases. The only difference between two plans is that for the
first one MySQL scans index on race_date in forward order, and for the last
- in backward order (we currently don't output this difference in EXPLAIN)

In general, using INDEX SCAN for both queries is more efficient than TABLE
SCAN.
The optimiser can't determine how many rows satisfy WHERE clause, but likely
that a lot, because LIMIT 0, 100 is present. So using index is considered
better than reading the whole table.

The slowdown of the last query is caused by our key compression algorithm:
currently uncompression takes more time if keys are read in reverse order.
We have a task in our Work-Log to fix it.
Suggested workaround to speedup your query is

ALTER TABLE races PACK_KEYS=0;

I'm closing it as 'Not a Bug'. Feel free to reopen it if you still
consider information given above as a bug.
Thank you for your feedback.

Hi Konstantin,

Could you please explain why you closed this bug ?
Could it be that you didn't read the whole bug report properly ?

The problem is that adding an index to the table
slows the queries down by the factor of 30 !
I'm quiet sure that the execution plan the optimizer chooses is not OK.

With index on "race_date"

mysql> SELECT SQL_CALC_FOUND_ROWS race_name FROM races WHERE race_name LIKE
'%Madison%' ORDER BY race_date DESC LIMIT 0,100;
Empty set (36.46 sec)

After dropping the index on "race_date"

mysql> SELECT SQL_CALC_FOUND_ROWS race_name FROM races WHERE race_name LIKE
'%Madison%' ORDER BY race_date DESC LIMIT 0,100;
Empty set (1.26 sec)

Hello,

I have the feeling that we face more than one problem here.
This might be the reason that resolving this bugreport
is a little bit more confusing then it should be.

Found problems:

a) optimizer chooses index scan instead using a table scan.
b) reverse index scan on compressed index is very slow.
c) optimizer ignores the "IGNORE INDEX" command

To a)
As the query uses "SQL_CALC_FOUND_ROWS" and "LIKE '%text%'"
its clear that all records need to be accessed.

As far as I understood the MySQL documentation, it is faster
to do a table scan than to do an index scan if you need to
access many records. (or all records like in our case)

quote from mysql documentation:
"If the use of the index would require MySQL to access more than 30% of the rows in the table. (In this case a table scan is probably much faster, as this will require us to do much fewer seeks.) Note that if such a query uses LIMIT to only retrieve part of the rows, MySQL will use an index anyway, as it can much more quickly find the few rows to return in the result."

- index scan is faster if not many rows are accessed.
- if more then 30% of the rows need to be accesssed the optimzer uses
a full table scan as this is propably faster.
- if LIMIT is used the optimiser uses the index scan as it expects to
that the query will not access over 30% of the rows.

+ I think that both "SQL_CALC_FOUND_ROWS" and "LIKE '%somestring%'"
should tell the optimiser that he will end up accesing more then
30% of the rows and that the table scan will be much faster.

To b)
Its nice that you plan to improve the speed
of the compressed indexes in the future.

To c)
Its nice that Peter makes sure that the "IGNORE INDEX" command will be fixed.

I think that the optimisers current decision rules are to simple.
Assuming that using the index is the best choice for
every query which has a "LIMIT" clause doesn't work.
I think that this simple approach is not adequate anymore
at least since you introduced the "SQL_CALC_FOUND_ROWS" feature.

The current behavior will result in situations where mys

[sorry - accidently hit submit button]

The current behavior will result in situations where mysql is far to slow.

I'm sorry, if I lost patience here, but reporting this bug/unoptimal behavior
ended to be much more time consuming for me then expected.

On the other hand it could be that I'm totally wrong here and that the current optimiser behavior is still optimal - Please have a look and check
if the way the optimiser works is optimal or not.
I would appreciate if you would explain it then to me.

Thanks it advance

Gunnar

Hello ?

Could someone please answer and tell me
whether the behavior of the optimiser is normal
and desired or if this a deficit or bug.

Do I understand the situation correctly that the optimiser
uses the index to access the rows even if he needs to access all the rows ?
Could you please double check this and explain to me the reason ?

Thanks in advance
Kind regards

Gunnar

SQL_FOUND_ROWS is now taken into account by optimiser (bk commit - 4.0 tree (1.1577))
Wrong output of explain is moved to separate bug #1560