MySQL Bugs: #8662: optimizer, query takes 3 to 10 times as long when using index

Bug #8662	optimizer, query takes 3 to 10 times as long when using index
Submitted:	21 Feb 2005 21:31	Modified:	30 Nov 2009 17:25
Reporter:	Martin Friebe (Gold Quality Contributor) (OCA)	Email Updates:
Status:	Verified	Impact on me:	None
Category:	MySQL Server: Optimizer	Severity:	S5 (Performance)
Version:	6.0.14, 5.1.12-BK, 4.1.10	OS:	Any
Assigned to:		CPU Architecture:	Any
Tags:	qc

Description:
Well, I kow ultimatley, it is my problem to design my tables.

However I believe the example below can be generalized, and the optimizer can detect cases with have similiar conditions, so a table scan would be prefered against a index

the how to repeat contains two tables. 
In real life the first table contains more rows, but a where on an index selects about 200 rows

the 2nd table contains the values to be looked up

executing each of the example queries tkaes between 3 and 4 times as long, if j2 has an index on column b.

this is even more drastic, if the condition for b<19 or length(c)<101 is lowered down. In which case the query without index benefits, while the indexed query still takes longer (more than 10 times as long as without index).

the condition for a is a dummy and probably not needed, as I said before, in the original table, the columsn in j1 are selcted by an index, on a column not used in the join.

I should also add that I have ordered both table by the join column (alter table order by), without noticable difference)

I dont know how mysql executes and retrieves the data, depending on having an index or not, the time to read the index alone can not explain the time difference (IMHO)

from the explain, there would be 8 million rows to be examined, without index, and only 400000 with index. However mysql must apply some kind of optimization yo the query without index.
This optimization does not apply on the index query. It seems the optimizer does not take this into consideration, and therefore choses a plan, that is much slower (up to 10 times)

My guess is that without index, the 2nd table is only scanned once (40000 rows is 1/10 of 400000), as soon as the index gets applied this is not happening anymore.

if that is the case, the optimizer should compare 40thsd against 400thsd, instead of 8million against 400 thsd

How to repeat:
create table j1 (a int); 
create table j2 (b int, c char(1000));

# fill the first table, with 200 entries: values from 1 to 20 each 10 times
perl -wle 'print " insert into j1 values ".join(", ", map( {"($_)"} (1..20) )).";" for (1..10); '
 | mysql test

# 2nd table 40000 rows: values 1 to 20; each 2000 times
# additional text colum, to fill diskspace and hold a search criteria (length of text)
# the length of text is 20 times the value of a for each row
perl -wle 'print " insert into j2 values ".join(", ", map( {"($_, (\"".("a" x (20*$_))."\"))"} (1..20) )).";" for (1..2000); '
 | mysql test

# example 1 (with counts)
alter table j2 drop index b;  reset query cache;
select count(c) from j1 left join j2 on a=b where a > 2 and b < 19 and length(c) <101 limit 2; select found_rows();
alter table j2 add index (b); reset query cache;
select count(c) from j1 left join j2 on a=b where a > 2 and b < 19 and length(c) <101 limit 2; select found_rows();

#example 2 (using SQL_CALC_FOUND_ROWS to execute till the last line)
alter table j2 drop index b;  reset query cache;
select SQL_CALC_FOUND_ROWS c from j1 left join j2 on a=b where a > 2 and b < 19 and length(c) <101 limit 2; select found_rows();
alter table j2 add index (b); reset query cache;
select SQL_CALC_FOUND_ROWS c from j1 left join j2 on a=b where a > 2 and b < 19 and length(c) <101 limit 2; select found_rows();

# EXPLAINS
#with index
 explain select count(c) from j1 left join j2 on a=b where a > 2 and b < 19 and length(c) <101 limit 2;
+----+-------------+-------+------+---------------+------+---------+----------+------+-------------+
| id | select_type | table | type | possible_keys | key  | key_len | ref      | rows | Extra       |
+----+-------------+-------+------+---------------+------+---------+----------+------+-------------+
|  1 | SIMPLE      | j1    | ALL  | NULL          | NULL |    NULL | NULL     |  200 | Using where |
|  1 | SIMPLE      | j2    | ref  | b             | b    |       5 | xxx.j1.a | 2000 | Using where |
+----+-------------+-------+------+---------------+------+---------+----------+------+-------------+

#without index
 explain select count(c) from j1 left join j2 on a=b where a > 2 and b < 19 and length(c) <101 limit 2;
+----+-------------+-------+------+---------------+------+---------+------+-------+-------------+
| id | select_type | table | type | possible_keys | key  | key_len | ref  | rows  | Extra       |
+----+-------------+-------+------+---------------+------+---------+------+-------+-------------+
|  1 | SIMPLE      | j1    | ALL  | NULL          | NULL |    NULL | NULL |   200 | Using where |
|  1 | SIMPLE      | j2    | ALL  | NULL          | NULL |    NULL | NULL | 40000 | Using where |
+----+-------------+-------+------+---------------+------+---------+------+-------+-------------+

Suggested fix:
find why the query runs faster without index, get the optimizer to take this into calculation

Thank you for a problem report. I think, it is not a feature request. If, without index used, query runs much faster (8+ times faster in my case), and this can be proved also with:

mysql> reset query cache;
Query OK, 0 rows affected (0.00 sec)

mysql> select count(c) from j1 left join j2 ignore index (b) on a=b where a > 2
 and b < 19 and length(c) <101 limit 2;
+----------+
| count(c) |
+----------+
|    60000 |
+----------+
1 row in set (0.65 sec)

mysql> reset query cache;
Query OK, 0 rows affected (0.00 sec)

mysql> select count(c) from j1 left join j2 on a=b where a > 2 and b < 19 and l
ength(c) <101 limit 2;
+----------+
| count(c) |
+----------+
|    60000 |
+----------+
1 row in set (5.40 sec)

then it is serious bug in optimizer, affecting performance. Verified with latest 5.1.12-BK!

I had the following results:
mysql> select version();
+--------------+
| version()    |
+--------------+
| 5.0.26-debug |
+--------------+
1 row in set (0.00 sec)
mysql> reset query cache;
Query OK, 0 rows affected (0.00 sec)

mysql> select count(c) from j1 left join j2 ignore index (b) on a=b where a > 2  and b < 19 and length(c) <101;
+----------+
| count(c) |
+----------+
|    60000 |
+----------+
1 row in set (1.05 sec)

mysql> show status like "Last%";
+-----------------+----------------+
| Variable_name   | Value          |
+-----------------+----------------+
| Last_query_cost | 1602172.309547 |
+-----------------+----------------+
1 row in set (0.01 sec)

mysql> reset query cache;
Query OK, 0 rows affected (0.00 sec)

mysql> select count(c) from j1 left join j2 on a=b where a > 2 and b < 19 and length(c) <101;
+----------+
| count(c) |
+----------+
|    60000 |
+----------+
1 row in set (3.05 sec)

mysql> show status like "Last%";
+-----------------+---------------+
| Variable_name   | Value         |
+-----------------+---------------+
| Last_query_cost | 480002.340797 |
+-----------------+---------------+
1 row in set (0.00 sec)

So really a faster execution plan is estimated as more expensive.
This is another evidence that the current cost model is far from
being perfect.

Yet we can't change the cost model in the current versions. We'll try to fix the 
problem in 5.2.

Re-verified with recent mysql-6.0-codebase:

77-52-7-73:6.0-codebase openxs$ bin/mysql -uroot test
Reading table information for completion of table and column namesYou can turn off this feature to get a quicker startup with -A

Welcome to the MySQL monitor.  Commands end with ; or \g.
Your MySQL connection id is 6
Server version: 6.0.14-alpha-debug Source distribution

Type 'help;' or '\h' for help. Type '\c' to clear the current input statement.

mysql> reset query cache;
Query OK, 0 rows affected (0.00 sec)

mysql> alter table j2 add index (b);
Query OK, 40000 rows affected (0.98 sec)
Records: 40000  Duplicates: 0  Warnings: 0

mysql> select count(c) from j1 left join j2 ignore index (b) on a=b where a > 2  and b < 19 and length(c) <101;
+----------+
| count(c) |
+----------+
|    60000 |
+----------+
1 row in set (0.63 sec)

mysql> show status like "Last%";
+-----------------+----------------+
| Variable_name   | Value          |
+-----------------+----------------+
| Last_query_cost | 1602543.403297 |
+-----------------+----------------+
1 row in set (0.00 sec)

mysql> reset query cache;
Query OK, 0 rows affected (0.00 sec)

mysql>  select count(c) from j1 left join j2 on a=b where a > 2 and b < 19 and length(c)
    -> <101;
+----------+
| count(c) |
+----------+
|    60000 |
+----------+
1 row in set (2.61 sec)

mysql> show status like "Last%";
+-----------------+---------------+
| Variable_name   | Value         |
+-----------------+---------------+
| Last_query_cost | 480002.340797 |
+-----------------+---------------+
1 row in set (0.00 sec)

We still have this bug even in the latest optimizer (as cost model had not really changed notably).