Bug #82132 STRCMP not consistent with ORDER BY for utf8mb4_unicode_ci
Submitted: 6 Jul 2016 21:09 Modified: 2 Sep 2016 2:05
Reporter: Mark Derthick Email Updates:
Status: Closed Impact on me:
None 
Category:MySQL Server: Charsets Severity:S1 (Critical)
Version:5.7.13, 5.6.31 OS:Windows (Using MySQL Workbench version 6.3)
Assigned to: CPU Architecture:Any
Tags: collate, order by, STRCMP, utf8mb4_unicode_ci

[6 Jul 2016 21:09] Mark Derthick
Description:
ORDER BY says @A comes first, but STRCMP says @B comes first.

How to repeat:
SET @A = CONVERT(CHAR(X'33fc' USING utf32) USING utf8mb4);
SET @B = CONVERT(CHAR(X'32' USING utf32) USING utf8mb4);

CREATE TABLE test (chr CHAR(1)) CHARSET = utf8mb4 COLLATE = utf8mb4_unicode_ci;
INSERT INTO test VALUES(@A),(@B);

SELECT chr, HEX(chr) FROM test ORDER BY chr;
>> 
chr	HEX(chr)
㏼	E38FBC
2	32

SELECT STRCMP(@B, @A COLLATE utf8mb4_unicode_ci);
>> -1
[7 Jul 2016 7:28] MySQL Verification Team
Hello Mark Derthick,

Thank you for the report and test case.

Thanks,
Umesh
[8 Jul 2016 17:31] Mark Derthick
Additional information: If you insert all characters from X'0000' to X'FFFF' into test and order by chr, there are 470 adjacent chr pairs where STRCMP returns -1.  (X'33fc', X'0032') is just one example of the 470.
[2 Sep 2016 2:05] Paul DuBois
Posted by developer:
 
Noted in 8.0.1 changelogs.

STRCMP() for arguments with the utf8mb4_unicode_ci collation could
return results different from ORDER BY. STRCMP() has been corrected.