| Bug #36562 | Falcon reports unrealistic row estimate with IN() | ||
|---|---|---|---|
| Submitted: | 7 May 2008 11:55 | Modified: | 26 May 2010 17:50 |
| Reporter: | Philip Stoev | Email Updates: | |
| Status: | Unsupported | Impact on me: | |
| Category: | MySQL Server: Falcon storage engine | Severity: | S2 (Serious) |
| Version: | 6.0.5 | OS: | Any |
| Assigned to: | Ann Harrison | CPU Architecture: | Any |
| Tags: | F_HANDLER, Optimizer, performance | ||
[7 May 2008 12:27]
Philip Stoev
Here is an example involving AND and <> operators:
CREATE TABLE t1 ( id int(11) NOT NULL auto_increment, status varchar(20), PRIMARY KEY (id), KEY (status) ) engine=falcon;
INSERT INTO t1 VALUES (1,'B'), (2,'B'), (3,'B'), (4,'B'), (5,'B'), (6,'B'), (7,'B'), (8,'B'), (9,'B'), (10,'B'), (11,'B'), (12,'B'), (13,'B'), (14,'B'), (15,'B'), (16,'B'), (17,'B'), (18,'B'), (19,'B'), (20,'B'), (21,'B'), (22,'B'), (23,'B'), (24,'B'), (25,'A'), (26,'A'), (27,'A'), (28,'A'), (29,'A'), (30,'A'), (31,'A'), (32,'A'), (33,'A'), (34,'A'), (35,'A'), (36,'A'), (37,'A'), (38,'A'), (39,'A'), (40,'A'), (41,'A'), (42,'A'), (43,'A'), (44,'A'), (45,'A'), (46,'A'), (47,'A'), (48,'A'), (49,'A'), (50,'A'), (51,'A'), (52,'A'), (53,'C'), (54,'C'), (55,'C'), (56,'C'), (57,'C'), (58,'C'), (59,'C'), (60,'C');
select count(*) from t1;
mysql> EXPLAIN SELECT * FROM t1 WHERE status <> 'A' and status <> 'b' and status <> 'c' and status <> 'd' and status <> 'e' and status <> 'f'\G
*************************** 1. row ***************************
id: 1
select_type: SIMPLE
table: t1
type: range
possible_keys: status
key: status
key_len: 23
ref: NULL
rows: 140
Extra: Using where
1 row in set (0.00 sec)
This is totally going to distort optimizer estimates for automatically generated queries, e.g. queries where some tool constructed a WHERE preducate containing a bunch of otherwise unnecessary expressions.
[15 Dec 2008 23:45]
Kevin Lewis
Triage team, Please re-evaluate this beta tag. This is a performance issue.
[18 Dec 2008 22:42]
Omer Barnir
triage: changing tag from SR60BETA to SR60RC as the fixes rely on performance tuning that will be done during beta

Description: When the IN() predicate is used in a query, Falcon reports a row estimate that may be substantially higher than the actual number of rows in the table. How to repeat: create table t1 (a int) engine=falcon; insert into t1 values (0),(1),(2),(3),(4),(5),(6),(7),(8),(9); create table t2 (a int, filler char(200), key(a)) engine=falcon; insert into t2 select C.a*2, 'no' from t1 A, t1 B, t1 C; insert into t2 select C.a*2+1, 'yes' from t1 C; mysql> select count(*) from t2\G *************************** 1. row *************************** count(*): 1010 1 row in set (0.00 sec) mysql> explain select * from t2 where a IN (0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25)\G *************************** 1. row *************************** id: 1 select_type: SIMPLE table: t2 type: range possible_keys: a key: a key_len: 5 ref: NULL rows: 2626 Extra: Using where 1 row in set (0.00 sec) The table has way less than 2626 rows. mysql> optimize table t2; +---------+----------+----------+----------+ | Table | Op | Msg_type | Msg_text | +---------+----------+----------+----------+ | test.t2 | optimize | status | OK | +---------+----------+----------+----------+ 1 row in set (0.03 sec) mysql> explain select * from t2 where a IN (0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25)\G *************************** 1. row *************************** id: 1 select_type: SIMPLE table: t2 type: range possible_keys: a key: a key_len: 5 ref: NULL rows: 1300 Extra: Using where 1 row in set (0.00 sec) This is still not quite correct, but more acceptable. mysql> alter table t2 engine = innodb; Query OK, 1010 rows affected (0.48 sec) Records: 1010 Duplicates: 0 Warnings: 0 mysql> explain select * from t2 where a IN (0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25)\G *************************** 1. row *************************** id: 1 select_type: SIMPLE table: t2 type: ALL possible_keys: a key: NULL key_len: NULL ref: NULL rows: 1069 Extra: Using where 1 row in set (0.00 sec) This is almost correct.