Description:
This is our query to find "ACTIVE" countries. This query with subselect and distinct takes very long time to process.
select * from country where
fips_code in (
select distinct(fips_primary_country_code) from place
);
74 rows in set (2 min 11.61 sec)
------------------------------------
However if you break apart the two queries then it processes fine.
mysql> select distinct(fips_primary_country_code) from place;
74 rows in set (0.41 sec)
select * from country where fips_code in ('DA', 'GM', ... 'AS')
74 rows in set (0.00 sec)
------------------------------------
Here is the explain plan of the original query
mysql> explain select * from country where
-> fips_code in (
-> select distinct(fips_primary_country_code) from place
-> );
+----+--------------------+---------+------+---------------+------+---------+------+-------+------------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+--------------------+---------+------+---------------+------+---------+------+-------+------------------------------+
| 1 | PRIMARY | country | ALL | NULL | NULL | NULL | NULL | 261 | Using where |
| 2 | DEPENDENT SUBQUERY | place | ALL | NULL | NULL | NULL | NULL | 96718 | Using where; Using temporary |
+----+--------------------+---------+------+---------------+------+---------+------+-------+------------------------------+
2 rows in set (0.00 sec)
How to repeat:
See above
Suggested fix:
Is there some way to hint that we are returning a very small subset of original table so that we generate a temporary table for the top level select?