Bug #25459 InnoDB prefers table scan over using primary key
Submitted: 8 Jan 2007 2:39 Modified: 28 Feb 2007 13:34
Reporter: Arjen Lentz Email Updates:
Status: No Feedback Impact on me:
None 
Category:MySQL Server: Optimizer Severity:S3 (Non-critical)
Version:5.0 OS:Any (any)
Assigned to: CPU Architecture:Any
Tags: innodb, Optimizer, primary key, table scan

[8 Jan 2007 2:39] Arjen Lentz
Description:
It appears that InnoDB prefers a full table scan over using the primary key.
This shows up in cases where the optimizer reckons a fair # of rows will need to be accessed.
But since InnoDB's primary key is clustered, using the primary key should *always* be faster than a full table scan regardless of the # of rows.

The negative effect of a full table scan is of course more visible with
 a) large tables (regardless of how many rows are needed from this table)
 b) joins. Doing unnecessary table scans is very bad in this scenario,
    depending on how the order in which the tables are read. And in fact,
    the "do a table scan" decision may change the join order!

How to repeat:
As above.

Suggested fix:
Optimize this situation. Not sure if it's something inside InnoDB or will need to be covered in the main optimizer.
[8 Jan 2007 13:19] Heikki Tuuri
Arjen,

I think Monty fixed the MySQL optimizer about 5 years ago, so that it knew that an index scan through the PRIMARY KEY is always faster than a table scan for clustered indexes (like in InnoDB and BDB). I wonder if some past version ofr MySQL optimized thsi correctly.

Is the table very small?

Regards,

Heikki
[9 Jan 2007 2:16] Arjen Lentz
Nop. That is, 10.000+ at least.
[28 Jan 2007 13:34] Valeriy Kravchuk
Sorry, but I can not repeat the behaviour described with 5.0.36-BK on Linux:

openxs@suse:~/dbs/5.0> bin/mysql -uroot test
Reading table information for completion of table and column names
You can turn off this feature to get a quicker startup with -A

Welcome to the MySQL monitor.  Commands end with ; or \g.
Your MySQL connection id is 1
Server version: 5.0.36 Source distribution

Type 'help;' or '\h' for help. Type '\c' to clear the buffer.

mysql> create table t25459 (c1 int auto_increment primary key, c2 char(100)) en
gine=InnoDB;
Query OK, 0 rows affected (0.02 sec)

mysql> insert into t25459(c2) values('abc');
Query OK, 1 row affected (0.01 sec)

mysql> insert into t25459(c2) select c2 from t25459;
Query OK, 1 row affected (0.01 sec)
Records: 1  Duplicates: 0  Warnings: 0

mysql> insert into t25459(c2) select c2 from t25459;
Query OK, 2 rows affected (0.00 sec)
Records: 2  Duplicates: 0  Warnings: 0

...

mysql> insert into t25459(c2) select c2 from t25459;
Query OK, 8192 rows affected (0.17 sec)
Records: 8192  Duplicates: 0  Warnings: 0

mysql> select count(*) from t25459;
+----------+
| count(*) |
+----------+
|    16384 |
+----------+
1 row in set (0.01 sec)

mysql> explain select * from t25459 where c1 between 2 and 1000;
+----+-------------+--------+-------+---------------+---------+---------+------+
------+-------------+
| id | select_type | table  | type  | possible_keys | key     | key_len | ref  |
 rows | Extra       |
+----+-------------+--------+-------+---------------+---------+---------+------+
------+-------------+
|  1 | SIMPLE      | t25459 | range | PRIMARY       | PRIMARY | 4       | NULL |
 1472 | Using where |
+----+-------------+--------+-------+---------------+---------+---------+------+
------+-------------+
1 row in set (0.01 sec)

mysql> explain select * from t25459 where c1 between 2 and 10000;
+----+-------------+--------+-------+---------------+---------+---------+------+
------+-------------+
| id | select_type | table  | type  | possible_keys | key     | key_len | ref  |
 rows | Extra       |
+----+-------------+--------+-------+---------------+---------+---------+------+
------+-------------+
|  1 | SIMPLE      | t25459 | range | PRIMARY       | PRIMARY | 4       | NULL |
 8249 | Using where |
+----+-------------+--------+-------+---------------+---------+---------+------+
------+-------------+
1 row in set (0.00 sec)

mysql> explain select * from t25459 where c1 between 2 and 15000;
+----+-------------+--------+-------+---------------+---------+---------+------+
------+-------------+
| id | select_type | table  | type  | possible_keys | key     | key_len | ref  |
 rows | Extra       |
+----+-------------+--------+-------+---------------+---------+---------+------+
------+-------------+
|  1 | SIMPLE      | t25459 | range | PRIMARY       | PRIMARY | 4       | NULL |
 8249 | Using where |
+----+-------------+--------+-------+---------------+---------+---------+------+
------+-------------+
1 row in set (0.00 sec)

mysql> analyze table t25459;
+-------------+---------+----------+----------+
| Table       | Op      | Msg_type | Msg_text |
+-------------+---------+----------+----------+
| test.t25459 | analyze | status   | OK       |
+-------------+---------+----------+----------+
1 row in set (0.00 sec)

mysql> explain select * from t25459 where c1 between 2 and 15000;
+----+-------------+--------+-------+---------------+---------+---------+------+
------+-------------+
| id | select_type | table  | type  | possible_keys | key     | key_len | ref  |
 rows | Extra       |
+----+-------------+--------+-------+---------------+---------+---------+------+
------+-------------+
|  1 | SIMPLE      | t25459 | range | PRIMARY       | PRIMARY | 4       | NULL |
 8242 | Using where |
+----+-------------+--------+-------+---------------+---------+---------+------+
------+-------------+
1 row in set (0.00 sec)

mysql> explain select * from t25459;
+----+-------------+--------+------+---------------+------+---------+------+----
---+-------+
| id | select_type | table  | type | possible_keys | key  | key_len | ref  | row
s  | Extra |
+----+-------------+--------+------+---------------+------+---------+------+----
---+-------+
|  1 | SIMPLE      | t25459 | ALL  | NULL          | NULL | NULL    | NULL | 164
84 |       |
+----+-------------+--------+------+---------------+------+---------+------+----
---+-------+
1 row in set (0.00 sec)

mysql> explain select * from t25459 where c1>2;
+----+-------------+--------+-------+---------------+---------+---------+------+
------+-------------+
| id | select_type | table  | type  | possible_keys | key     | key_len | ref  |
 rows | Extra       |
+----+-------------+--------+-------+---------------+---------+---------+------+
------+-------------+
|  1 | SIMPLE      | t25459 | range | PRIMARY       | PRIMARY | 4       | NULL |
 8242 | Using where |
+----+-------------+--------+-------+---------------+---------+---------+------+
------+-------------+
1 row in set (0.00 sec)

mysql> explain select * from t25459 where c1<15000;
+----+-------------+--------+-------+---------------+---------+---------+------+
------+-------------+
| id | select_type | table  | type  | possible_keys | key     | key_len | ref  |
 rows | Extra       |
+----+-------------+--------+-------+---------------+---------+---------+------+
------+-------------+
|  1 | SIMPLE      | t25459 | range | PRIMARY       | PRIMARY | 4       | NULL |
 8242 | Using where |
+----+-------------+--------+-------+---------------+---------+---------+------+
------+-------------+
1 row in set (0.00 sec)

So, even if I select almost all rows, PRIMARY key is used. Any ideas on how to repeat the behaviour described are welcomed.
[1 Mar 2007 0:00] Bugs System
No feedback was provided for this bug for over a month, so it is
being suspended automatically. If you are able to provide the
information that was originally requested, please do so and change
the status of the bug back to "Open".