Bug #4019 4.0.20 Optimiser makes a wrong decision
Submitted: 5 Jun 2004 13:44 Modified: 20 Aug 2004 3:34
Reporter: SINISA MILIVOJEVIC Email Updates:
Status: Can't repeat Impact on me:
None 
Category:MySQL Server Severity:S2 (Serious)
Version:4.0.20 OS:Any (any)
Assigned to: Igor Babaev CPU Architecture:Any

[5 Jun 2004 13:44] SINISA MILIVOJEVIC
Description:
The optimiser makes wrong decision if IN is used instead of =.

This applies to IN (single_value) cases only, of course.

Tables in question are available from me.

How to repeat:

mysql> desc SELECT count(*) FROM t1 JOIN t2 ON
t1.c_id=t2.c_id WHERE t2.status_id IN  (27);
+-----------+-------+------------------------------------+--------------
--+---------+---------------+--------+-------------+
| table     | type  | possible_keys                      | key
| key_len | ref           | rows   | Extra       |
+-----------+-------+------------------------------------+--------------
--+---------+---------------+--------+-------------+
| t1  | index | t1$c_id                      | t1$c_id
|       4 | NULL          | 108167 | Using index |
| t2 | ref   | t2$status_id,t2$c_id |
t2$c_id |       5 | t1.c_id |      1 | Using where |
+-----------+-------+------------------------------------+--------------
--+---------+---------------+--------+-------------+
2 rows in set (0.00 sec)

mysql> SELECT count(*) FROM t1 JOIN t2 ON
t1.c_id=t2.c_id WHERE t2.status_id IN  (27);
+----------+
| count(*) |
+----------+
|    40056 |
+----------+
1 row in set (5.80 sec)

It takes about 6 seconds to run. Now I'm going to change the in (27) to
= 27.

mysql> desc SELECT count(*) FROM t1 JOIN t2 ON
t1.c_id=t2.c_id WHERE t2.status_id =27;
+-----------+------+------------------------------------+---------------
------+---------+----------------+-------+-------------+
| table     | type | possible_keys                      | key
| key_len | ref            | rows  | Extra       |
+-----------+------+------------------------------------+---------------
------+---------+----------------+-------+-------------+
| t2 | ref  | t2$status_id,t2$c_id |
t2$status_id |       1 | const          | 35413 |             |
| t1  | ref  | t1$c_id                      | t1$c_id
|       4 | t2.c_id |     1 | Using index |
+-----------+------+------------------------------------+---------------
------+---------+----------------+-------+-------------+
2 rows in set (0.00 sec)

mysql> SELECT count(*) FROM t1 JOIN t2 ON
t1.c_id=t2.c_id WHERE t2.status_id =27;
+----------+
| count(*) |
+----------+
|    40056 |
+----------+
1 row in set (0.34 sec)
[10 Jun 2004 8:01] Peter Zaitsev
This is Regression bug. As tested by Matt Lord this worked well in MySQL 4.0.18
[10 Jun 2004 16:36] Matthew Lord
Igor, I can provide you with the dumpfile used to produce the tables which you see in the bug report.

It is on the support ftp site, the file is called rightnow_in_problem.dump.gz.
[10 Jun 2004 20:04] Matthew Lord
Igor,

Here is the original information passed on to me.  I apologize for not
having the dumpfile attached but it was beyone the 200K size limitation
imposed by bugs.

I have this strange optimizer issue that is similar to one I thought was
fixed earlier. In any case here is the issue. I simplified the query to
make an easier test case.

Notice the below query doesn't use the incidents$status_id index even
though status_id the only real limiter

mysql> desc SELECT count(*) FROM contacts JOIN incidents ON
contacts.c_id=incidents.c_id WHERE incidents.status_id IN  (27);
+-----------+-------+------------------------------------+--------------
--+---------+---------------+--------+-------------+
| table     | type  | possible_keys                      | key
| key_len | ref           | rows   | Extra       |
+-----------+-------+------------------------------------+--------------
--+---------+---------------+--------+-------------+
| contacts  | index | contacts$c_id                      | contacts$c_id
|       4 | NULL          | 108167 | Using index |
| incidents | ref   | incidents$status_id,incidents$c_id |
incidents$c_id |       5 | contacts.c_id |      1 | Using where |
+-----------+-------+------------------------------------+--------------
--+---------+---------------+--------+-------------+
2 rows in set (0.00 sec)

mysql> SELECT count(*) FROM contacts JOIN incidents ON
contacts.c_id=incidents.c_id WHERE incidents.status_id IN  (27);
+----------+
| count(*) |
+----------+
|    40056 |
+----------+
1 row in set (5.80 sec)

It takes about 6 seconds to run. Now I'm going to change the in (27) to
= 27.

mysql> desc SELECT count(*) FROM contacts JOIN incidents ON
contacts.c_id=incidents.c_id WHERE incidents.status_id =27;
+-----------+------+------------------------------------+---------------
------+---------+----------------+-------+-------------+
| table     | type | possible_keys                      | key
| key_len | ref            | rows  | Extra       |
+-----------+------+------------------------------------+---------------
------+---------+----------------+-------+-------------+
| incidents | ref  | incidents$status_id,incidents$c_id |
incidents$status_id |       1 | const          | 35413 |             |
| contacts  | ref  | contacts$c_id                      | contacts$c_id
|       4 | incidents.c_id |     1 | Using index |
+-----------+------+------------------------------------+---------------
------+---------+----------------+-------+-------------+
2 rows in set (0.00 sec)

mysql> SELECT count(*) FROM contacts JOIN incidents ON
contacts.c_id=incidents.c_id WHERE incidents.status_id =27;
+----------+
| count(*) |
+----------+
|    40056 |
+----------+
1 row in set (0.34 sec)
[18 Jun 2004 22:31] Matthew Lord
Data to allow for the reproduction of the problem.

Attachment: rightnow_in_problem.dump.gz (application/x-gzip, text), 567 bytes.

[18 Jun 2004 22:32] Matthew Lord
Igor,

I just posted a dump file which has a very small dataset that still allows you to reproduce the problem.

Best Regards,

-matt-
[20 Aug 2004 3:33] Igor Babaev
Sorry, Matt, but I failed reproduce this problem. I had absolutely the same behavour for 4.0.18 abn 4.0.20. So did not observe any regression.

I also had two different plans for the queries above. This is because 4.0 does not replace 'a in (c)' by 'a=c'