Bug #36562 Falcon reports unrealistic row estimate with IN()
Submitted: 7 May 2008 11:55 Modified: 26 May 2010 17:50
Reporter: Philip Stoev Email Updates:
Status: Unsupported Impact on me:
None 
Category:MySQL Server: Falcon storage engine Severity:S2 (Serious)
Version:6.0.5 OS:Any
Assigned to: Ann Harrison CPU Architecture:Any
Tags: F_HANDLER, Optimizer, performance
Triage: Triaged: D3 (Medium)

[7 May 2008 11:55] Philip Stoev
Description:
When the IN() predicate is used in a query, Falcon reports a row estimate that may be substantially higher than the actual number of rows in the table.

How to repeat:
create table t1 (a int) engine=falcon;
insert into t1 values (0),(1),(2),(3),(4),(5),(6),(7),(8),(9);
create table t2 (a int, filler char(200), key(a)) engine=falcon;

insert into t2 select C.a*2,   'no'  from t1 A, t1 B, t1 C;
insert into t2 select C.a*2+1, 'yes' from t1 C;

mysql> select count(*) from t2\G
*************************** 1. row ***************************
count(*): 1010
1 row in set (0.00 sec)

mysql> explain  select * from t2 where a IN (0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25)\G
*************************** 1. row ***************************
           id: 1
  select_type: SIMPLE
        table: t2
         type: range
possible_keys: a
          key: a
      key_len: 5
          ref: NULL
         rows: 2626
        Extra: Using where
1 row in set (0.00 sec)

The table has way less than 2626 rows. 

mysql> optimize table t2;
+---------+----------+----------+----------+
| Table   | Op       | Msg_type | Msg_text |
+---------+----------+----------+----------+
| test.t2 | optimize | status   | OK       |
+---------+----------+----------+----------+
1 row in set (0.03 sec)

mysql> explain  select * from t2 where a IN (0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25)\G
*************************** 1. row ***************************
           id: 1
  select_type: SIMPLE
        table: t2
         type: range
possible_keys: a
          key: a
      key_len: 5
          ref: NULL
         rows: 1300
        Extra: Using where
1 row in set (0.00 sec)

This is still not quite correct, but more acceptable.

mysql> alter table t2 engine = innodb;
Query OK, 1010 rows affected (0.48 sec)
Records: 1010  Duplicates: 0  Warnings: 0

mysql> explain  select * from t2 where a IN (0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25)\G
*************************** 1. row ***************************
           id: 1
  select_type: SIMPLE
        table: t2
         type: ALL
possible_keys: a
          key: NULL
      key_len: NULL
          ref: NULL
         rows: 1069
        Extra: Using where
1 row in set (0.00 sec)

This is almost correct.
[7 May 2008 12:27] Philip Stoev
Here is an example involving AND and <> operators:

CREATE TABLE t1 (   id int(11) NOT NULL auto_increment,   status varchar(20),   PRIMARY KEY  (id),   KEY (status) ) engine=falcon;

INSERT INTO t1 VALUES (1,'B'), (2,'B'), (3,'B'), (4,'B'), (5,'B'), (6,'B'), (7,'B'), (8,'B'), (9,'B'), (10,'B'), (11,'B'), (12,'B'), (13,'B'), (14,'B'), (15,'B'), (16,'B'), (17,'B'), (18,'B'), (19,'B'), (20,'B'), (21,'B'), (22,'B'), (23,'B'), (24,'B'), (25,'A'), (26,'A'), (27,'A'), (28,'A'), (29,'A'), (30,'A'), (31,'A'), (32,'A'), (33,'A'), (34,'A'), (35,'A'), (36,'A'), (37,'A'), (38,'A'), (39,'A'), (40,'A'), (41,'A'), (42,'A'), (43,'A'), (44,'A'), (45,'A'), (46,'A'), (47,'A'), (48,'A'), (49,'A'), (50,'A'), (51,'A'), (52,'A'), (53,'C'), (54,'C'), (55,'C'), (56,'C'), (57,'C'), (58,'C'), (59,'C'), (60,'C');

select count(*) from t1;

mysql> EXPLAIN SELECT * FROM t1 WHERE status <> 'A' and status <> 'b' and status <> 'c' and status <> 'd' and status <> 'e' and status <> 'f'\G
*************************** 1. row ***************************
           id: 1
  select_type: SIMPLE
        table: t1
         type: range
possible_keys: status
          key: status
      key_len: 23
          ref: NULL
         rows: 140
        Extra: Using where
1 row in set (0.00 sec)

This is totally going to distort optimizer estimates for automatically generated queries, e.g. queries where some tool constructed a WHERE preducate containing a bunch of otherwise unnecessary expressions.
[15 Dec 2008 23:45] Kevin Lewis
Triage team, Please re-evaluate this beta tag.  This is a performance issue.
[18 Dec 2008 22:42] Omer Barnir
triage: changing tag from SR60BETA to SR60RC as the fixes rely on performance tuning that will be done during beta