Description:
The optimizer setting 'derived_merge=on' causes an inconsistency in the way that a uuid() guid is assigned to rows within a distinct sub-selected query.
This issue was the root cause of an error produced when procedures that run fine on MySQL 5.6 were migrated to 5.7 and 8.0 databases.
We are populating a large denormalized staging table with records destined for multiple parent - child levels of target tables. A primary key that is unique to the distinct set of columns used in a parent table is created in the select statement that populates the query.
This guid primary key value is generated through use of a uuid() function applied on a derived (sub-selected) table, which is itself generated as a distinct sub-select of business key values of the source data.
When working properly (as in version 5.6 and below), this method generates a unique guid value for each unique business key. However, version 5.7 / 8.0 derived_merge seems to break the isolation between the distinct sub-select and the outer query, causing a unique guid to be created for each row of the source query rather than for each distinct set of business values.
How to repeat:
The steps below output 4 distinct IDs on 5.6 or with derived_merge=off, but will produce 6 rows on 5.7 and 8.0 with derived_merge=on.
drop table if exists TEST_SUB_PROBLEM;
create table TEST_SUB_PROBLEM
(
UID integer PRIMARY KEY AUTO_INCREMENT
,BID VARCHAR(10)
,THING_NAME VARCHAR(100)
,OTHER_IFO VARCHAR(100)
);
insert into TEST_SUB_PROBLEM(BID,THING_NAME,OTHER_IFO)
values ('thing1','name1','look a chicken')
,( 'thing1','name1','look an airplane')
,('thing2','name2','look a mouse')
,('thing3','name3','look a taperecorder')
,('thing3','name3','look an explosion')
,('thing4','name4','look at the stars')
;
drop table if exists TEST_SUB_PROBLEM_2;
create table TEST_SUB_PROBLEM_2
as
select
TST.UID
,TST.BID
,TST.THING_NAME
,TST.OTHER_IFO
,vw2.DIST_UID
from TEST_SUB_PROBLEM TST
join (
select uuid() as DIST_UID, vw.*
from (
select DISTINCT BID, THING_NAME from TEST_SUB_PROBLEM
) vw
) vw2 on vw2.BID = TST.BID
;
-- This query should have 4 unique values, but has 6 with defined_merge=on
select distinct DIST_UID from TEST_SUB_PROBLEM_2;