MySQL Bugs: #16700: optimizer marks subquery DEPENDENT though it's not

Bug #16700	optimizer marks subquery DEPENDENT though it's not
Submitted:	21 Jan 2006 13:13	Modified:	21 Jan 2006 13:51
Reporter:	Kiryll Mirnenko	Email Updates:
Status:	Duplicate	Impact on me:	None
Category:	MySQL Server	Severity:	S3 (Non-critical)
Version:	5.0.18	OS:	FreeBSD (FreeBSD 6.0-RELEASE-p3)
Assigned to:		CPU Architecture:	Any

Description:
Consider having the table:

CREATE TABLE `prices` ( 
`id_product` int(11) NOT NULL default '0', 
`id_shop` mediumint(9) NOT NULL default '0', 
`seton` date NOT NULL default '0000-00-00', 
`price` decimal(7,2) NOT NULL default '0.00', 
PRIMARY KEY (`id_product`,`id_shop`,`seton`), 
KEY `id_product` (`id_product`), 
KEY `id_shop` (`id_shop`), 
KEY `id_shop_2` (`id_shop`, `seton`), 
KEY `id_product_2` (`id_product`,`price`), 
KEY `seton` (`seton`), 
KEY `price` (`price`) 
) ENGINE=InnoDB 

(let it be a price list for a set of shops offering (optionally) intersecting set of products; every price change for every product is stored, zero price means a shop stops selling a product. I need to query a minimal price for a specific product (120) for a date month ago).

The query is: 

SELECT MIN(p.price) FROM prices p WHERE (p.id_shop, p.seton) IN (SELECT pg.id_shop, MAX(pg.seton) FROM prices pg WHERE pg.id_product = 120 AND pg.seton <= DATE_SUB(CURDATE(), INTERVAL 30 DAY) AND pg.price <> 0 GROUP BY pg.id_shop); 

This query seems to run too long though both parts are fast enough.

if you run EXPLAIN 

+----+--------------------+-------+-------+---------------------------------------------+---------+---------+-------+-------+--------------------------+
| id | select_type        | table | type  | possible_keys                               | key     | key_len | ref   | rows  | Extra                    |
+----+--------------------+-------+-------+---------------------------------------------+---------+---------+-------+-------+--------------------------+
|  1 | PRIMARY            | p     | index | NULL                                        | price   | 4       | NULL  | 56162 | Using where; Using index |
|  2 | DEPENDENT SUBQUERY | pg    | ref   | PRIMARY,id_product,id_product_2,seton,price | PRIMARY | 4       | const |     1 | Using where              |
+----+--------------------+-------+-------+---------------------------------------------+---------+---------+-------+-------+--------------------------+

you'll see 3 strange things: 
1) inner query is marked as "DEPENDENT SUBQUERY" though it's not 
2) no key is used to optimize WHERE in outer query at all 
3) inner query key choise seems to be always PRIMARY regardless of number of price records for each product (when adding price and seton keys is wise) 

The first optimizer assumtion increases execution times dramatically as inner query is ran for every row of the table (i have ~60,000) despite the fact inner query resultset is of 10 rows and 
the outer query can simply match the rows via aprop indexes

Trying FORCE INDEX and rewriting it using JOINs gives no performance boost (EXPLAIN shows no difference in choises)

How to repeat:
CREATE TABLE `prices` ( 
`id_product` int(11) NOT NULL default '0', 
`id_shop` mediumint(9) NOT NULL default '0', 
`seton` date NOT NULL default '0000-00-00', 
`price` decimal(7,2) NOT NULL default '0.00', 
PRIMARY KEY (`id_product`,`id_shop`,`seton`), 
KEY `id_product` (`id_product`), 
KEY `id_shop` (`id_shop`), 
KEY `id_shop_2` (`id_shop`, `seton`), 
KEY `id_product_2` (`id_product`,`price`), 
KEY `seton` (`seton`), 
KEY `price` (`price`) 
) ENGINE=InnoDB;

EXPLAIN SELECT MIN(p.price) FROM prices p WHERE (p.id_shop, p.seton) IN (SELECT pg.id_shop, MAX(pg.seton) FROM prices pg WHERE pg.id_product = 120 AND pg.seton <= DATE_SUB(CURDATE(), INTERVAL 30 DAY) AND pg.price <> 0 GROUP BY pg.id_shop);

SELECT MIN(p.price) FROM prices p WHERE (p.id_shop, p.seton) IN (SELECT pg.id_shop, MAX(pg.seton) FROM prices pg WHERE pg.id_product = 120 AND pg.seton <= DATE_SUB(CURDATE(), INTERVAL 30 DAY) AND pg.price <> 0 GROUP BY pg.id_shop);

^C

Suggested fix:
none so far

Thank you for a problem report. It is a well-know problem (improper optimization of subqueries) that will be fixed eventually. Please, look at bug #12106, bug #4460 and many other similar bug reports.