Bug #91418 derived_merge causing inccorect results with distinct subquery and uuid()
Submitted: 26 Jun 2018 7:08 Modified: 26 Jun 2018 7:33
Reporter: Chris Tyler Email Updates:
Status: Verified Impact on me:
None 
Category:MySQL Server: Optimizer Severity:S3 (Non-critical)
Version:5.7,8.0.11, 5.7.22 OS:Any
Assigned to: CPU Architecture:Any
Tags: derived_merge, distinct, subquery, UUID()

[26 Jun 2018 7:08] Chris Tyler
Description:
The optimizer setting 'derived_merge=on' causes an inconsistency in the way that a uuid() guid is assigned to rows within a distinct sub-selected query.

This issue was the root cause of an error produced when procedures that run fine on MySQL 5.6 were migrated to 5.7 and 8.0 databases.

We are populating a large denormalized staging table with records destined for multiple parent - child levels of target tables.  A primary key that is unique to the distinct set of columns used in a parent table is created in the select statement that populates the query.

This guid primary key value is generated through use of a uuid() function applied on a derived (sub-selected) table, which is itself generated as a distinct sub-select of business key values of the source data.  
When working properly (as in version 5.6 and below), this method generates a unique guid value for each unique business key.  However, version 5.7 / 8.0 derived_merge seems to break the isolation between the distinct sub-select and the outer query, causing a unique guid to be created for each row of the source query rather than for each distinct set of business values. 

How to repeat:
The steps below output 4 distinct IDs on 5.6 or with derived_merge=off, but will produce 6 rows on 5.7  and 8.0 with derived_merge=on.

drop table if exists TEST_SUB_PROBLEM;
create table TEST_SUB_PROBLEM
(
UID integer PRIMARY KEY AUTO_INCREMENT
,BID VARCHAR(10)
,THING_NAME VARCHAR(100)
,OTHER_IFO VARCHAR(100)
);

insert into TEST_SUB_PROBLEM(BID,THING_NAME,OTHER_IFO)
values ('thing1','name1','look a chicken')
,( 'thing1','name1','look an airplane')
,('thing2','name2','look a mouse')
,('thing3','name3','look a taperecorder')
,('thing3','name3','look an explosion')
,('thing4','name4','look at the stars')
;

drop table if exists TEST_SUB_PROBLEM_2;
create table TEST_SUB_PROBLEM_2
as
select 
TST.UID
,TST.BID
,TST.THING_NAME
,TST.OTHER_IFO
,vw2.DIST_UID
from TEST_SUB_PROBLEM TST
join (
    select uuid() as DIST_UID, vw.*
    from (
    select DISTINCT BID, THING_NAME from TEST_SUB_PROBLEM
    ) vw
    ) vw2 on vw2.BID = TST.BID
    ;
    
-- This query should have 4 unique values, but has 6 with defined_merge=on
select distinct DIST_UID from TEST_SUB_PROBLEM_2;
[26 Jun 2018 7:33] Umesh Shastry
Hello Chris Tyler,

Thank you for the report and test case.

Thanks,
Umesh
[7 Sep 2018 13:32] Sveta Smirnova
This report is most likely duplicate of bug #85117: error happens because code which generates a value in the subquery moved to the outer query.