Bug #580 Join is not optimized well if IN (<VALUES>) is used
Submitted: 4 Jun 2003 2:50 Modified: 26 Jun 2003 9:54
Reporter: Peter Zaitsev (Basic Quality Contributor) Email Updates:
Status: Not a Bug Impact on me:
None 
Category:MySQL Server Severity:S2 (Serious)
Version:4.0 OS:
Assigned to: Igor Babaev CPU Architecture:Any

[4 Jun 2003 2:50] Peter Zaitsev
Description:
This problem seems to exist in MySQL 3.23 as well, but it does not strikes that
hard as it seems to have lower value to prefer index scan against full table scan.

mysql> explain select * from t1,t2 where i=1 and t=i;
+-------+------+---------------+------+---------+-------+------+--------------------------+
| table | type | possible_keys | key  | key_len | ref   | rows | Extra                    |
+-------+------+---------------+------+---------+-------+------+--------------------------+
| t1    | ref  | i             | i    |       5 | const |    1 | Using where; Using index |
| t2    | ref  | t             | t    |       5 | const |    1 | Using where              |
+-------+------+---------------+------+---------+-------+------+--------------------------+
2 rows in set (0.00 sec)

mysql> explain select * from t1,t2 where i=2 and t=i;
+-------+------+---------------+------+---------+-------+------+--------------------------+
| table | type | possible_keys | key  | key_len | ref   | rows | Extra                    |
+-------+------+---------------+------+---------+-------+------+--------------------------+
| t1    | ref  | i             | i    |       5 | const |    1 | Using where; Using index |
| t2    | ref  | t             | t    |       5 | const |    1 | Using where              |
+-------+------+---------------+------+---------+-------+------+--------------------------+
2 rows in set (0.00 sec)

On MySQL 3.23.56:

mysql> explain select * from t1,t2 where i in (1,2) and t=i;
+-------+-------+---------------+------+---------+------+------+-------------------------+
| table | type  | possible_keys | key  | key_len | ref  | rows | Extra                   |
+-------+-------+---------------+------+---------+------+------+-------------------------+
| t1    | range | i             | i    |       5 | NULL |    2 | where used; Using index |
| t2    | ref   | t             | t    |       5 | t1.i |   10 | where used              |
+-------+-------+---------------+------+---------+------+------+-------------------------+
2 rows in set (0.00 sec)

As you see the real problem is range estimation based on constants is not done 
in case IN is used.  Statistics about cardinality is used instead.

This leads to the following behavior in MySQL 4.0:

+-------+-------+---------------+------+---------+------+------+--------------------------+
| table | type  | possible_keys | key  | key_len | ref  | rows | Extra                    |
+-------+-------+---------------+------+---------+------+------+--------------------------+
| t1    | index | i             | i    |       5 | NULL |    3 | Using where; Using index |
| t2    | ALL   | t             | NULL |    NULL | NULL |   29 | Using where              |
+-------+-------+---------------+------+---------+------+------+--------------------------+
2 rows in set (0.48 sec)

So MySQL 3.23 also does not find proper esimation for the range but it still selects index in this case.

How to repeat:
CREATE TABLE t1 (
  i int(11) default NULL,
  KEY i (i)
) TYPE=MyISAM;

--
-- Dumping data for table 't1'
--

INSERT INTO t1 VALUES (1),(2),(3);

--
-- Table structure for table 't2'

CREATE TABLE t2 (
  t int(11) default NULL,
  c char(1) default NULL,
  KEY t (t)
) TYPE=MyISAM;

--
-- Dumping data for table 't2'
--

INSERT INTO t2 VALUES (1,'a'),(2,'a'),(3,'a'),(3,'a'),(3,'a'),(3,'a'),(3,'a'),(3,'a'),(3,'a'),(3,'a'),(3,'a'),(3,'a'),(3,'a'),(3,'a'),(3,'a'),(3,'a'),(3,'a'),(3,'a'),(3,'a'),(3,'a'),(3,'a'),(3,'a'),(3,'a'),(3,'a'),(3,'a'),(3,'a'),(3,'a'),(3,'a'),(3,'a');
[26 Jun 2003 9:54] Igor Babaev
It's not a bug. The optimizer just does not propagate conjunctive equality. predicates. As a result it gets bad estimates in some queries with equi-joins.
I opened task #1000 to fix this inefficiency.