MySQL Bugs: #4019: 4.0.20 Optimiser makes a wrong decision

Bug #4019	4.0.20 Optimiser makes a wrong decision
Submitted:	5 Jun 2004 13:44	Modified:	20 Aug 2004 3:34
Reporter:	SINISA MILIVOJEVIC	Email Updates:
Status:	Can't repeat	Impact on me:	None
Category:	MySQL Server	Severity:	S2 (Serious)
Version:	4.0.20	OS:	Any (any)
Assigned to:	Igor Babaev	CPU Architecture:	Any

Description:
The optimiser makes wrong decision if IN is used instead of =.

This applies to IN (single_value) cases only, of course.

Tables in question are available from me.

How to repeat:

mysql> desc SELECT count(*) FROM t1 JOIN t2 ON
t1.c_id=t2.c_id WHERE t2.status_id IN  (27);
+-----------+-------+------------------------------------+--------------
--+---------+---------------+--------+-------------+
| table     | type  | possible_keys                      | key
| key_len | ref           | rows   | Extra       |
+-----------+-------+------------------------------------+--------------
--+---------+---------------+--------+-------------+
| t1  | index | t1$c_id                      | t1$c_id
|       4 | NULL          | 108167 | Using index |
| t2 | ref   | t2$status_id,t2$c_id |
t2$c_id |       5 | t1.c_id |      1 | Using where |
+-----------+-------+------------------------------------+--------------
--+---------+---------------+--------+-------------+
2 rows in set (0.00 sec)

mysql> SELECT count(*) FROM t1 JOIN t2 ON
t1.c_id=t2.c_id WHERE t2.status_id IN  (27);
+----------+
| count(*) |
+----------+
|    40056 |
+----------+
1 row in set (5.80 sec)

It takes about 6 seconds to run. Now I'm going to change the in (27) to
= 27.

mysql> desc SELECT count(*) FROM t1 JOIN t2 ON
t1.c_id=t2.c_id WHERE t2.status_id =27;
+-----------+------+------------------------------------+---------------
------+---------+----------------+-------+-------------+
| table     | type | possible_keys                      | key
| key_len | ref            | rows  | Extra       |
+-----------+------+------------------------------------+---------------
------+---------+----------------+-------+-------------+
| t2 | ref  | t2$status_id,t2$c_id |
t2$status_id |       1 | const          | 35413 |             |
| t1  | ref  | t1$c_id                      | t1$c_id
|       4 | t2.c_id |     1 | Using index |
+-----------+------+------------------------------------+---------------
------+---------+----------------+-------+-------------+
2 rows in set (0.00 sec)

mysql> SELECT count(*) FROM t1 JOIN t2 ON
t1.c_id=t2.c_id WHERE t2.status_id =27;
+----------+
| count(*) |
+----------+
|    40056 |
+----------+
1 row in set (0.34 sec)

This is Regression bug. As tested by Matt Lord this worked well in MySQL 4.0.18

Igor, I can provide you with the dumpfile used to produce the tables which you see in the bug report.

It is on the support ftp site, the file is called rightnow_in_problem.dump.gz.

Igor,

Here is the original information passed on to me.  I apologize for not
having the dumpfile attached but it was beyone the 200K size limitation
imposed by bugs.

I have this strange optimizer issue that is similar to one I thought was
fixed earlier. In any case here is the issue. I simplified the query to
make an easier test case.

Notice the below query doesn't use the incidents$status_id index even
though status_id the only real limiter

mysql> desc SELECT count(*) FROM contacts JOIN incidents ON
contacts.c_id=incidents.c_id WHERE incidents.status_id IN  (27);
+-----------+-------+------------------------------------+--------------
--+---------+---------------+--------+-------------+
| table     | type  | possible_keys                      | key
| key_len | ref           | rows   | Extra       |
+-----------+-------+------------------------------------+--------------
--+---------+---------------+--------+-------------+
| contacts  | index | contacts$c_id                      | contacts$c_id
|       4 | NULL          | 108167 | Using index |
| incidents | ref   | incidents$status_id,incidents$c_id |
incidents$c_id |       5 | contacts.c_id |      1 | Using where |
+-----------+-------+------------------------------------+--------------
--+---------+---------------+--------+-------------+
2 rows in set (0.00 sec)

mysql> SELECT count(*) FROM contacts JOIN incidents ON
contacts.c_id=incidents.c_id WHERE incidents.status_id IN  (27);
+----------+
| count(*) |
+----------+
|    40056 |
+----------+
1 row in set (5.80 sec)

It takes about 6 seconds to run. Now I'm going to change the in (27) to
= 27.

mysql> desc SELECT count(*) FROM contacts JOIN incidents ON
contacts.c_id=incidents.c_id WHERE incidents.status_id =27;
+-----------+------+------------------------------------+---------------
------+---------+----------------+-------+-------------+
| table     | type | possible_keys                      | key
| key_len | ref            | rows  | Extra       |
+-----------+------+------------------------------------+---------------
------+---------+----------------+-------+-------------+
| incidents | ref  | incidents$status_id,incidents$c_id |
incidents$status_id |       1 | const          | 35413 |             |
| contacts  | ref  | contacts$c_id                      | contacts$c_id
|       4 | incidents.c_id |     1 | Using index |
+-----------+------+------------------------------------+---------------
------+---------+----------------+-------+-------------+
2 rows in set (0.00 sec)

mysql> SELECT count(*) FROM contacts JOIN incidents ON
contacts.c_id=incidents.c_id WHERE incidents.status_id =27;
+----------+
| count(*) |
+----------+
|    40056 |
+----------+
1 row in set (0.34 sec)

Data to allow for the reproduction of the problem.

Attachment: rightnow_in_problem.dump.gz (application/x-gzip, text), 567 bytes.

Igor,

I just posted a dump file which has a very small dataset that still allows you to reproduce the problem.

Best Regards,

-matt-

Sorry, Matt, but I failed reproduce this problem. I had absolutely the same behavour for 4.0.18 abn 4.0.20. So did not observe any regression.

I also had two different plans for the queries above. This is because 4.0 does not replace 'a in (c)' by 'a=c'