Bug #85518 Distinct operations on temp tables allocate too little memory for sort keys
Submitted: 17 Mar 2017 14:11 Modified: 20 Mar 2017 16:35
Reporter: Steinar Gunderson Email Updates:
Status: Closed Impact on me:
None 
Category:MySQL Server: Optimizer Severity:S3 (Non-critical)
Version: OS:Any
Assigned to: CPU Architecture:Any

[17 Mar 2017 14:11] Steinar Gunderson
Description:
remove_dup_with_hash_index(), which is used for DISTINCT of temp tables in certain cases, assumes that the sort key of all fields is smaller than or equal to the record's length in memory. (It also forgets to send sort_length() through strnxfrmlen.) This is obviously not correct for UCA collations.

How to repeat:
SET sql_mode='';
CREATE TABLE t1(c1 int, c2 VARCHAR(1) COLLATE utf8mb4_0900_as_cs);
INSERT INTO t1 VALUES (1, 'a');
INSERT INTO t1 VALUES (2, 'A');
SELECT DISTINCT c2 FROM t1 GROUP BY c1;
DROP TABLE t1;

In debug mode, this will cause an assertion failure. In release mode, it will silently think that a == A, because the field allocated for c2's sort key is too small to hold all of it (missing strnxfrmlen call).

Suggested fix:
Calculate the field width correctly. This needs to be done before calling the function, in order to get the comparison against the sort buffer size right.
[20 Mar 2017 16:35] Paul DuBois
Posted by developer:
 
Noted in 8.0.2 changelog.

DISTINCT operations on temporary tables could produce incorrect
results due to allocation of too-small comparison keys.