Bug #80047 Data rows lost after adding DISTINCT with substr() in select list
Submitted: 19 Jan 2016 9:59 Modified: 26 Jan 2016 18:12
Reporter: Su Dylan Email Updates:
Status: Closed Impact on me:
None 
Category:MySQL Server: DML Severity:S2 (Serious)
Version:5.7.8, 5.5.48, 5.6.28, 5.7.10 OS:Any
Assigned to: CPU Architecture:Any

[19 Jan 2016 9:59] Su Dylan
Description:
Output:
=====
mysql> create table t1(c1 char(30));
Query OK, 0 rows affected (0.00 sec)

mysql> insert into t1 values('aaa'),('bbb'),('ccc');
ct substQuery OK, 3 rows affected (0.00 sec)
Records: 3  Duplicates: 0  Warnings: 0

mysql> select substr(c1, 1, 2147483647 + 1) from t1;
+-------------------------------+
| substr(c1, 1, 2147483647 + 1) |
+-------------------------------+
| aaa                           |
| bbb                           |
| ccc                           |
+-------------------------------+
3 rows in set (0.00 sec)

mysql> select distinct substr(c1, 1, 2147483647 + 1) from t1;
+-------------------------------+
| substr(c1, 1, 2147483647 + 1) |
+-------------------------------+
|                               |
+-------------------------------+
1 row in set (0.00 sec)

mysql>
mysql> select version();
+-----------+
| version() |
+-----------+
| 5.7.8-rc  |
+-----------+
1 row in set (0.00 sec)

Problem:
=====
3 rows are expected to return, which are not ''.

How to repeat:

drop table if exists t1;
create table t1(c1 char(30));
insert into t1 values('aaa'),('bbb'),('ccc');
select substr(c1, 1, 2147483647 + 1) from t1;
select distinct substr(c1, 1, 2147483647 + 1) from t1;

Suggested fix:
3 rows are returned, not ''.
[19 Jan 2016 10:23] MySQL Verification Team
Hello Su Dylan,

Thank you for the report and test case.
Observed that 5.5.48, 5.6.28 and 5.7.10 are affected.

Thanks,
Umesh
[26 Jan 2016 18:12] Paul DuBois
Noted in 5.8.0 changelog.

SELECT DISTINCT SUBSTR() could incorrectly discard values as
duplicates for large position or length arguments.
[27 Jan 2016 9:59] Tor Didriksen
Code inspection showed that left() and right() had the same problem.
Also fixed.
[27 Jan 2016 15:13] Paul DuBois
Revised changelog entry:

SELECT DISTINCT SUBSTR() could incorrectly discard values as
duplicates for large position or length arguments. The same issue
also affected LEFT() and RIGHT().