Description:
Performing a SELECT ... GROUP BY query with an aggregate function on a large table whose collation is `utf8mb4_unicode_520_ci` fails with a "duplicate entry" error. A duplicate-entry error should never occur on a SELECT query.
How to repeat:
A query that causes this error is:
SELECT COUNT(*) FROM reports GROUP BY `query`
An example of a specific error is:
Error Code: 1062. Duplicate entry 'mens gloves' for key 'group_key'
This error occurs even when 'mens gloves' appears in only a single row of the `query` column, so there is no duplicate data in the source table.
The query succeeds if the collation is explicitly set to a different collation:
SELECT SUM(impressions) FROM reports GROUP BY `query` COLLATE utf8mb4_general_ci
It also succeeds if a WHERE clause is added to reduce the result size (note the WHERE clause must precede GROUP BY):
SELECT COUNT(*) FROM reports WHERE `query` > 'o' GROUP BY `query`
Depending on the size of the table, the query can also succeed if the in-memory temp table size is increased:
SET SESSION max_heap_table_size=536870912;
SET SESSION tmp_table_size=536870912;
Thus the issue appears specific to the `utf8mb4_unicode_520_ci` collation on a table with a large number of rows (the error starts appearing at around 70-80K rows in our live data).
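A minimal repro sketch follows. The table and column names are taken from the queries above; the column types, sizes, and row count are assumptions on my part, since the real schema is not shown:

```sql
-- Hypothetical schema matching the failing queries (types/sizes assumed).
CREATE TABLE reports (
  `query` VARCHAR(255),
  impressions INT
) ENGINE=InnoDB
  DEFAULT CHARSET=utf8mb4
  COLLATE=utf8mb4_unicode_520_ci;

-- Populate with roughly 70-80K distinct `query` values, then run:
SELECT COUNT(*) FROM reports GROUP BY `query`;
-- Expected (buggy) result: Error 1062, Duplicate entry '...' for key 'group_key'
```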
This bug appears related to bug #62755 and bug #58081, but is distinct from them because here the error is triggered by the collation set on the table and/or the query.
Suggested fix:
A workaround is to set the collation of the table or column to `utf8mb4_general_ci` or to explicitly specify the collation in the query.
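For example, the collation workaround could be applied like this (column type and size are assumed, since the real schema is not shown in this report):

```sql
-- Change only the affected column (type/size assumed):
ALTER TABLE reports
  MODIFY `query` VARCHAR(255)
  CHARACTER SET utf8mb4 COLLATE utf8mb4_general_ci;

-- Or convert the whole table's default and existing columns:
ALTER TABLE reports
  CONVERT TO CHARACTER SET utf8mb4 COLLATE utf8mb4_general_ci;
```

Note that changing the collation can alter comparison and sort semantics for existing queries, so this is a workaround rather than a fix.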
A less reliable workaround is to raise the temp table size limits (max_heap_table_size and tmp_table_size), but I suspect this only raises the row-count threshold at which the error occurs rather than eliminating it.