Bug #89149 SELECT DISTINCT on multiple TEXT columns is slow
Submitted: 8 Jan 2018 23:41 Modified: 16 Jan 2018 17:07
Reporter: Leif Neve Email Updates:
Status: Verified Impact on me:
None 
Category:MySQL Server: Optimizer Severity:S5 (Performance)
Version:5.7.20 OS:Red Hat (7.4)
Assigned to: CPU Architecture:Any
Tags: 5.5, 5.7, regression

[8 Jan 2018 23:41] Leif Neve
Description:
I have a table with two TEXT columns with around 65K rows. The following SELECT statement takes < 1 second in MySQL 5.5.58 but takes ~40 minutes in MySQL 5.7.20:

SELECT DISTINCT author,sort_author from itemsbyauthor;

How to repeat:
1) Create a database on a MySQL 5.5.58 server: 

mysql> create database sper

2) Source the attached file sper.sql to load the table itemsbyauthor: 

mysql> source sper.sql

3) Then run the command: 

mysql> SELECT DISTINCT author,sort_author from itemsbyauthor; 

Note that it takes < 1 second to run the command.

4) Repeat the above steps for a MySQL 5.7.20 server.

Note that it now takes ~40 minutes to run the command.

Suggested fix:
I have no suggestions for a fix.
[8 Jan 2018 23:49] Leif Neve
mysqldump output to use to recreate the itemsbyauthor table

Attachment: sper.sql.gz (application/x-gzip, text), 1.73 MiB.

[9 Jan 2018 7:53] MySQL Verification Team
Hello Sam Neve,

Thank you for the report and test case.

Thanks,
Umesh
[16 Jan 2018 17:07] Leif Neve
Further testing reveals that this is not a problem in MySQL 5.6.39.