Bug #20944 optimizer estimates more rows if condition gets narrowed / combined key / range
Submitted: 10 Jul 2006 15:25 Modified: 19 Oct 2006 2:13
Reporter: Martin Friebe (Gold Quality Contributor) (OCA) Email Updates:
Status: Not a Bug Impact on me:
None 
Category:MySQL Server: Optimizer Severity:S2 (Serious)
Version:4.1.22-BK, 4.1.20, 5.0.22, 5.0.25-BK OS:Linux (Linux, freebsd)
Assigned to: Igor Babaev CPU Architecture:Any

[10 Jul 2006 15:25] Martin Friebe
Description:
using range access on 2 parts of a multicolumn index, mysql expects more rows, if the condition is narrowed down.

The example below demonstrates the problem only in mysql 4.1.
(note in addition to returning more rows with the stricter where, mysql does expect more rows than the table contains)

In Mysql 5.0, the example below will return 1024 for both queries, which is the total of rows in the table. (in this mysql 5.0, has decrease in the optimizer, as it expects to many rows, for the select wich limits only the 1st part of the index).
The problem of increasing expected rows while adding an AND condition, can be reproduced for mysql 5.0 by using the script and sql from bug #20932

sql for mysql 4.1

create table i1 (a int not null, b int not null, index (a,b));
insert into i1 values (1,2), (1,3), (2,3), (2,4);
insert into i1 select a+5, b*2 from i1;
insert into i1 select a+5, b*2 from i1;
insert into i1 select a+5, b*2 from i1;
insert into i1 select a+5, b*2 from i1;
insert into i1 select a+5, b*2 from i1;
insert into i1 select a+5, b*2 from i1;
insert into i1 select a+5, b*2 from i1;
insert into i1 select a+5, b*2 from i1;
analyze table i1;
select count(*) from i1;
show index from i1;
explain select count(*) from i1 where a in (11,12,16,17,21,22,26,27,31,32,36,37);
explain select count(*) from i1 where a in (11,12,16,17,21,22,26,27,31,32,36,37) and b in (8,12,16,24,32,48,64,96,128,192,256);
drop table i1;

for mysql 5.0 use the data generated by the script in bug #20932 , abd run the query printed by the script, with the full condition vs only the condition on the the coumn "n"
(you might want to drop the index "t (t,n)", but it does not make a difference

How to repeat:
output from mysql 4.1

select count(*) from i1;
+----------+
| count(*) |
+----------+
|     1024 |
+----------+
mysql> show index from i1;
+-------+------------+----------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+
| Table | Non_unique | Key_name | Seq_in_index | Column_name | Collation | Cardinality | Sub_part | Packed | Null | Index_type | Comment |
+-------+------------+----------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+
| i1    |          1 | a        |            1 | a           | A         |          17 |     NULL | NULL   |      | BTREE      |         |
| i1    |          1 | a        |            2 | b           | A         |          36 |     NULL | NULL   |      | BTREE      |         |
+-------+------------+----------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+
2 rows in set (0.00 sec)

mysql> explain select count(*) from i1 where a in (11,12,16,17,21,22,26,27,31,32,36,37);
+----+-------------+-------+-------+---------------+------+---------+------+------+--------------------------+
| id | select_type | table | type  | possible_keys | key  | key_len | ref  | rows | Extra                    |
+----+-------------+-------+-------+---------------+------+---------+------+------+--------------------------+
|  1 | SIMPLE      | i1    | range | a             | a    |       4 | NULL |  988 | Using where; Using index |
+----+-------------+-------+-------+---------------+------+---------+------+------+--------------------------+
1 row in set (0.00 sec)
# condition only on the first column, the expectation is reasonable, and accurate
# note that mysql 5.0 has decreased optimization as it expects 1024 rows for this one.

mysql> explain select count(*) from i1 where a in (11,12,16,17,21,22,26,27,31,32,36,37) and b in (8,12,16,24,32,48,64,96,128,192,256);
+----+-------------+-------+-------+---------------+------+---------+------+------+--------------------------+
| id | select_type | table | type  | possible_keys | key  | key_len | ref  | rows | Extra                    |
+----+-------------+-------+-------+---------------+------+---------+------+------+--------------------------+
|  1 | SIMPLE      | i1    | range | a             | a    |       8 | NULL | 1046 | Using where; Using index |
+----+-------------+-------+-------+---------------+------+---------+------+------+--------------------------+
1 row in set (0.00 sec)
# condition on first row is still the same, condition on 2nd row added.
# expected rows is bigger than on previous query, this must be wrong.
# also expected rows is biger than total count of rows in table
# mysql 5.0 does again cut of at 1024, it seems mysql5.0 realizes that there cannot be more rows expected than contained in the table

--------
output (snippets) from mysql 5.0
after running the perl script
index "t (t,n) has been dropped, the remining index covers (n,t)

explain select count(*) from i1 where n in (...)
+----+-------------+-------+-------+---------------+------+---------+------+------+--------------------------+ | id | select_type | table | type  | possible_keys | key  | key_len | ref  | rows | Extra                    | +----+-------------+-------+-------+---------------+------+---------+------+------+--------------------------+ |  1 | SIMPLE      | i1    | range | n             | n    | 23      | NULL | 1081 | Using where; Using index | +----+-------------+-------+-------+---------------+------+---------+------+------+--------------------------+ 1 row in set (0.15 sec) 

explain select count(*) from i1 where n in (...) and t in (....)

+----+-------------+-------+-------+---------------+------+---------+------+-------+--------------------------+
| id | select_type | table | type  | possible_keys | key  | key_len | ref  | rows  | Extra                    |
+----+-------------+-------+-------+---------------+------+---------+------+-------+--------------------------+
|  1 | SIMPLE      | i1    | range | n             | n    | 28      | NULL | 70577 | Using where; Using index |
+----+-------------+-------+-------+---------------+------+---------+------+-------+--------------------------+

#that is a lot more rows

Suggested fix:
might relate to the strange timings on similiar queries reported in #20932

also note that mysql 5.0 does estimate the first select less accurate than mysql 4.1 does.
[24 Jul 2006 14:03] Valeriy Kravchuk
Verified just as described with 5.0.25-BK on Linux (with data from bug #20932 and index `t` dropped):

mysql> explain select count(*) from i1 where n in ("364900570446472",...)\G
*************************** 1. row ***************************
           id: 1
  select_type: SIMPLE
        table: i1
         type: range
possible_keys: n
          key: n
      key_len: 23
          ref: NULL
         rows: 1081
        Extra: Using where; Using index
1 row in set (0.05 sec)

mysql> explain select count(*) from i1 where n in ("364900570446472",...) and t in (559,...)\G
*************************** 1. row ***************************
           id: 1
  select_type: SIMPLE
        table: i1
         type: range
possible_keys: n
          key: n
      key_len: 28
          ref: NULL
         rows: 70577
        Extra: Using where; Using index
1 row in set (11.95 sec)

And 4.1.22-BK on Linux, with test case from this bug report:

openxs@suse:~/dbs/4.1> bin/mysql -uroot test
Reading table information for completion of table and column names
You can turn off this feature to get a quicker startup with -A

Welcome to the MySQL monitor.  Commands end with ; or \g.
Your MySQL connection id is 1 to server version: 4.1.22

Type 'help;' or '\h' for help. Type '\c' to clear the buffer.

mysql> create table i1 (a int not null, b int not null, index (a,b));
Query OK, 0 rows affected (0.06 sec)

emysql> insert into i1 values (1,2), (1,3), (2,3), (2,4);
Query OK, 4 rows affected (0.00 sec)
Records: 4  Duplicates: 0  Warnings: 0

mysql> insert into i1 select a+5, b*2 from i1;
Query OK, 4 rows affected (0.00 sec)
Records: 4  Duplicates: 0  Warnings: 0

mysql> insert into i1 select a+5, b*2 from i1;
Query OK, 8 rows affected (0.00 sec)
Records: 8  Duplicates: 0  Warnings: 0

imysql> insert into i1 select a+5, b*2 from i1;
Query OK, 16 rows affected (0.00 sec)
Records: 16  Duplicates: 0  Warnings: 0

mysql> insert into i1 select a+5, b*2 from i1;
Query OK, 32 rows affected (0.00 sec)
iRecords: 32  Duplicates: 0  Warnings: 0

mysql> insert into i1 select a+5, b*2 from i1;
Query OK, 64 rows affected (0.01 sec)
Records: 64  Duplicates: 0  Warnings: 0

mysql> insert into i1 select a+5, b*2 from i1;
Query OK, 128 rows affected (0.00 sec)
Records: 128  Duplicates: 0  Warnings: 0

mysql> insert into i1 select a+5, b*2 from i1;
iQuery OK, 256 rows affected (0.01 sec)
Records: 256  Duplicates: 0  Warnings: 0

mysql> insert into i1 select a+5, b*2 from i1;
1Query OK, 512 rows affected (0.01 sec)
Records: 512  Duplicates: 0  Warnings: 0

mysql> analyze table i1;
+---------+---------+----------+----------+
| Table   | Op      | Msg_type | Msg_text |
+---------+---------+----------+----------+
| test.i1 | analyze | status   | OK       |
+---------+---------+----------+----------+
1 row in set (0.00 sec)

mysql> select count(*) from i1;
+----------+
| count(*) |
+----------+
|     1024 |
+----------+
1 row in set (0.00 sec)

mysql> show index from i1;
+-------+------------+----------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+
| Table | Non_unique | Key_name | Seq_in_index | Column_name | Collation | Cardinality | Sub_part | Packed | Null | Index_type | Comment |
+-------+------------+----------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+
| i1    |          1 | a        |            1 | a           | A         |    17 |     NULL | NULL   |      | BTREE      |         |
| i1    |          1 | a        |            2 | b           | A         |    36 |     NULL | NULL   |      | BTREE      |         |
+-------+------------+----------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+
2 rows in set (0.00 sec)

mysql> explain select count(*) from i1 where a in
    -> (11,12,16,17,21,22,26,27,31,32,36,37);
+----+-------------+-------+-------+---------------+------+---------+------+------+--------------------------+
| id | select_type | table | type  | possible_keys | key  | key_len | ref  | rows | Extra                    |
+----+-------------+-------+-------+---------------+------+---------+------+------+--------------------------+
|  1 | SIMPLE      | i1    | range | a             | a    |       4 | NULL |  988 | Using where; Using index |
+----+-------------+-------+-------+---------------+------+---------+------+------+--------------------------+
1 row in set (0.00 sec)

emysql> explain select count(*) from i1 where a in (11,12,16,17,21,22,26,27,31,32,36,37)
    -> and b in (8,12,16,24,32,48,64,96,128,192,256);
+----+-------------+-------+-------+---------------+------+---------+------+------+--------------------------+
| id | select_type | table | type  | possible_keys | key  | key_len | ref  | rows | Extra                    |
+----+-------------+-------+-------+---------------+------+---------+------+------+--------------------------+
|  1 | SIMPLE      | i1    | range | a             | a    |       8 | NULL | 1046 | Using where; Using index |
+----+-------------+-------+-------+---------------+------+---------+------+------+--------------------------+
1 row in set (0.00 sec)
[19 Oct 2006 2:13] Igor Babaev
This is not a bug as the optimizer can make only inxact estimate on
the number of expected rows. It makes this estimate using all
range conditions in the WHERE clause.