Bug #112293 Wrong result if non-breaking space is contained in varchar
Submitted: 7 Sep 2023 14:04 Modified: 8 Sep 2023 6:51
Reporter: quan chen (OCA) Email Updates:
Status: Verified Impact on me:
None 
Category:MySQL Server: DML Severity:S3 (Non-critical)
Version:8.0, 8.0.34, 5.7.43 OS:Any
Assigned to: CPU Architecture:Any
Tags: character_set, collation

[7 Sep 2023 14:04] quan chen
Description:
CREATE TABLE `t` (
  `a` int(11) DEFAULT NULL,
  `b` varchar(255) CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci DEFAULT NULL
) ENGINE=InnoDB DEFAULT CHARSET=utf8;

insert into t values(1,'my_test '),(2,'my_test');
Note that the space in the first row is non-breaking space.

When execute the query:
select b from t where b='my_test';

We got result:
```
mysql> select b from t where b='my_test';
+------------+
| b          |
+------------+
| my_test   |
| my_test    |
+------------+
2 rows in set (0.00 sec)
```
the space is ignored, and 'my_test 'is considered as same as 'my_test';

But if execute the query with aggr:
select b,count(*) from t group by b;
We got result:
```
mysql> select b,count(*) from t group by b;
+------------+----------+
| b          | count(*) |
+------------+----------+
| my_test   |        1 |
| my_test    |        1 |
+------------+----------+
2 rows in set (0.00 sec)
```

the space is not be ignored, and 'my_test 'is considered different with 'my_test', so there are two rows in result.
Normally, the behavior in groupby and compare should be the same, in the other words, we should got only one row in the following query.

The reason of this problem is that the non-breaking space is not ignored when calc hash value for temptable, but is ignored when compare with 'my_test' in where condition.

How to repeat:
CREATE TABLE `t` (
  `a` int(11) DEFAULT NULL,
  `b` varchar(255) CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci DEFAULT NULL
) ENGINE=InnoDB DEFAULT CHARSET=utf8;

insert into t values(1,'my_test '),(2,'my_test');
set internal_tmp_mem_storage_engine = TempTable;
select b,count(*) from t group by b;
[8 Sep 2023 2:40] quan chen
We also tested other type of whitespace-characters like IDEOGRAPHIC SPACE, EM SPACE, and had the same problem too. From the code perspective(skip_trailing_space() in m_string.h), except for ascii spaces, all other spaces would have issues.
[8 Sep 2023 6:51] MySQL Verification Team
Hello quan chen,

Thank you for the report and test case.
Verified as described.

regards,
Umesh