Bug #68640 | Optimizer does not handle group by properly in insert...select | ||
---|---|---|---|
Submitted: | 11 Mar 2013 20:01 | Modified: | 14 Mar 2013 13:44 |
Reporter: | Trey Raymond | Email Updates: | |
Status: | Not a Bug | Impact on me: | |
Category: | MySQL Server: Optimizer | Severity: | S3 (Non-critical) |
Version: | 5.5.29 | OS: | Linux (RHEL 6.3) |
Assigned to: | CPU Architecture: | Any | |
Tags: | GROUP BY, insert select |
[11 Mar 2013 20:01]
Trey Raymond
[12 Mar 2013 20:11]
Sveta Smirnova
Thank you for the report. SELECT really does not use temporary tables (at least for dummy data I used). You can run simple test: FLUSH STATUS; SELECT ... show status like '%tmp%'; But INSERT uses. I think it is necessary, because you insert into same table from which you select. If I use another table (create table audience_aggregate_hourly_age_gdr_city2 like audience_aggregate_hourly_age_gdr_city;) - no temporary table created. But EXPLAIN in version 5.6 shows temporary table is used for INSERT: explain insert ignore into audience_aggregate_hourly_age_gdr_city (hourID,campaignMap_campaignID,creativeMap_creativeID,age,gender,geoMap_WOE_ID,impressions,clicks,interactions,fallback_imps,fallback_clicks,richmedia_imps,richmedia_clicks,post_clicks_conv,post_imps_conv) select t1.hourID,t1.campaignMap_campaignID,t1.creativeMap_creativeID,t1.age,t1.gender, 0 as geoMap_WOE_ID, t1.impressions - COALESCE(sum(t2.impressions),0) as impressions, t1.clicks - COALESCE(sum(t2.clicks),0) as clicks, t1.interactions - COALESCE(sum(t2.interactions),0) as interactions, t1.fallback_imps - COALESCE(sum(t2.fallback_imps),0) as fallback_imps, t1.fallback_clicks - COALESCE(sum(t2.fallback_clicks),0) as fallback_clicks, t1.richmedia_imps - COALESCE(sum(t2.richmedia_imps),0) as richmedia_imps, t1.richmedia_clicks - COALESCE(sum(t2.richmedia_clicks),0) as richmedia_clicks, t1.post_clicks_conv - COALESCE(sum(t2.post_clicks_conv),0) as post_clicks_conv, t1.post_imps_conv - COALESCE(sum(t2.post_imps_conv),0) as post_imps_conv from audience_aggregate_hourly_age_gdr_city t1 left join audience_aggregate_hourly_age_gdr_city t2 force index (primary) on t1.campaignMap_campaignID=t2.campaignMap_campaignID and t1.age=t2.age and t1.gender=t2.gender and t1.hourID=t2.hourID and t1.creativeMap_creativeID=t2.creativeMap_creativeID and t2.geoMap_WOE_ID>1 and t2.hourID between 375288 and 375311 where t1.hourID between 375288 and 375311 and t1.geoMap_WOE_ID=1 group by t1.campaignMap_campaignID,t1.geoMap_WOE_ID,t1.age,t1.gender,t1.creativeMap_creativeID,t1.hourID having impressions > 0 or clicks > 0 or post_clicks_conv > 0 or post_imps_conv > 0 or interactions > 0 or fallback_imps > 0 or fallback_clicks > 0 or richmedia_imps > 0 or richmedia_clicks > 0 ; id select_type table type possible_keys key key_len ref rows Extra 1 SIMPLE t1 index PRIMARY,camp_crtv PRIMARY 15 NULL 2 Using where; Using temporary 1 SIMPLE t2 ref PRIMARY PRIMARY 3 test.t1.campaignMap_campaignID 1 Using where So this looks like not a bug for me.
[12 Mar 2013 20:49]
Trey Raymond
I could imagine a query needing one if we didn't have technologies like tx isolation/mvcc, but in this case I can't see how one is necessary, same table or not. Can you perhaps explain how writing the rows progressively rather than in a temp table would be beneficial here? Otherwise, looks like a bug.
[13 Mar 2013 18:35]
Sveta Smirnova
easier test case for MTR
Attachment: bug68640.test (application/octet-stream, text), 691 bytes.
[13 Mar 2013 18:55]
Trey Raymond
Yes, that's much simpler. However, in this case as well, that temp table is not necessary, but is created anyway, so it still looks like a bug. Any use case of a concurrent transaction doing something, at least any that I can think of, would be handled gracefully, temp table or not. Posting output here for people's reference: mysql> flush status; mysql> show status like 'Created_tmp_tables'; +-------------------------+-------+ | Variable_name | Value | +-------------------------+-------+ | Created_tmp_tables | 0 | +-------------------------+-------+ mysql> select t11.f2, t11.f3, t12.f1 from t1 as t11 join t1 as t12; ... 25 rows in set (0.00 sec) mysql> show status like 'Created_tmp_tables'; +-------------------------+-------+ | Variable_name | Value | +-------------------------+-------+ | Created_tmp_tables | 0 | +-------------------------+-------+ mysql> insert ignore into t1 select t11.f2, t11.f3, t12.f1 from t1 as t11 join t1 as t12; Query OK, 11 rows affected (0.00 sec) Records: 25 Duplicates: 14 Warnings: 0 mysql> show status like 'Created_tmp_tables'; +-------------------------+-------+ | Variable_name | Value | +-------------------------+-------+ | Created_tmp_tables | 1 | +-------------------------+-------+
[14 Mar 2013 12:46]
Sveta Smirnova
Thank you for the feedback. Indeed, shorter test displays exactly same issue. We discussed this report internally with InnoDB and Optimizer teams and agreed this is not a bug. This is part of the discussion: ----<q>---- We do indeed have tx isoloation and mvcc, but in this case the whole query takes place in one transaction, meaning that the read cursor sees insertions immediately. It is indeed possible to have static cursors within a transaction (sort of having a sub-transaction that does not see the actions of the rest of the transaction). This could be implemented by copying to a temporary table (so no gain), or with some mvcc sub-transaction. ----</q>----- So it is not reasonable to implement this in different way.
[14 Mar 2013 13:44]
Trey Raymond
Okay, understood.