Description:
We have found it to be necessary to provide hints to the query optimizer in order to reduce the execution times of joins that involve merge tables. The difference in execution times is by orders of magnitude, typically from 10 minutes down to 1 or 2 seconds. Our speculation is that the fact that the join in question involves merge tables may be an issue for the optimizer.
Specifics:
Table definitions:
CREATE TABLE `call` (
`call_key` bigint(20) unsigned NOT NULL default '0',
`alternate_db_key` int(10) unsigned NOT NULL default '0',
`start_time` int(11) NOT NULL default '-1',
`start_time_ext` int(11) NOT NULL default '-1',
`time_key` int(11) NOT NULL default '-1',
`end_time` int(11) NOT NULL default '-1',
`end_time_ext` int(11) NOT NULL default '-1',
.
.
.
PRIMARY KEY (`call_key`),
KEY `ak1` (`alternate_db_key`),
KEY `ie1` (`start_time`,`start_time_ext`),
KEY `ie2` (`end_time`,`end_time_ext`)
) TYPE=MyISAM
CREATE TABLE `cdr_media` (
`call_key` bigint(20) unsigned NOT NULL default '0',
`seq` smallint(5) unsigned NOT NULL default '0',
`probe_id` int(10) unsigned NOT NULL default '0',
`start_time` int(11) NOT NULL default '-1',
`start_time_ext` int(11) NOT NULL default '-1',
`time_key` int(11) NOT NULL default '-1',
`end_time` int(11) NOT NULL default '-1',
`end_time_ext` int(11) NOT NULL default '-1',
.
.
.
`packets_received` int(10) unsigned NOT NULL default '0',
.
.
.
PRIMARY KEY (`call_key`,`seq`),
KEY `ie2` (`probe_id`),
KEY `ie3` (`start_time`,`start_time_ext`)
) TYPE=MRG_MyISAM INSERT_METHOD=LAST UNION=(cdr_media_p9,cdr_media_p10,cdr_media_p11,cdr_media_p12,cdr_media_p13,cdr_media_p14,
cdr_media_p15,cdr_media_p16,cdr_media_p17,cdr_media_p18,cdr_media_p19,cdr_media_p20,cdr_media_p21,
cdr_media_p22,cdr_media_p23,cdr_media_p24,cdr_media_p25,cdr_media_p26,cdr_media_p27,cdr_media_p28)
Table sizes:
mysql> SELECT COUNT(*) FROM call;
+----------+
| COUNT(*) |
+----------+
| 6478115 |
+----------+
mysql> SELECT COUNT(*) FROM cdr_media;
+----------+
| COUNT(*) |
+----------+
| 7849908 |
+----------+
Query:
SELECT MIN(cdr.packets_received), MAX(cdr_packets_received)
FROM call c, cdr_media cdr
WHERE c.end_time BETWEEN 1108573200 AND 1108576799
AND cdr.call_key = c.call_key;
The ANALYSIS procedure had been run on all tables involved in the join (i.e on 'call', 'cdr_media_p9', ... 'cdr_media_p28').
Query plan produced by optimizer:
+-------+--------+---------------+---------+---------+--------------+---------+-------------+
| table | type | possible_keys | key | key_len | ref | rows | Extra |
+-------+--------+---------------+---------+---------+--------------+---------+-------------+
| cdr | ALL | PRIMARY | NULL | NULL | NULL | 7848994 | |
| c | eq_ref | PRIMARY,ie2 | PRIMARY | 8 | cdr.call_key | 1 | Using where |
+-------+--------+---------------+---------+---------+--------------+---------+-------------+
Modified query:
SELECT MIN(cdr.packets_received), MAX(cdr_packets_received)
FROM call c USE INDEX(ie2), cdr_media cdr
WHERE c.end_time BETWEEN 1108573200 AND 1108576799
AND cdr.call_key = c.call_key;
Modified plan:
+-------+-------+---------------+---------+---------+------------+-------+-------------+
| table | type | possible_keys | key | key_len | ref | rows | Extra |
+-------+-------+---------------+---------+---------+------------+-------+-------------+
| c | range | ie2 | ie2 | 4 | NULL | 10050 | Using where |
| cdr | ref | PRIMARY | PRIMARY | 8 | c.call_key | 78492 | |
+-------+-------+---------------+---------+---------+------------+-------+-------------+
So, the question is: why did the optimizer elect to do a complete table scan of table cdr (all 7.8 million rows), combined with a random access of the call table (c) for each of those rows? The hint to use the index call.ie2 reversed the order of table access, with 10 thousand c table rows driving random accesses into the cdr table.
Could it be that the optimizer judged that because the cdr table contained 20 merged tables in its union, each random access into it (as in the modified plan) would require searching 20 tables, and would thus be prohibitive? If so, it was evidently spectacularly wrong in its judgement!
How to repeat:
Running the query with the large data sets we have gives the reported results consistently. If needed, test data can be provided.
Suggested fix:
None.