Bug #107157 Contribution by Tencent: Collation compare handling is confused in Innobase
Submitted: 29 Apr 2022 3:49 Modified: 29 Apr 2022 4:58
Reporter: George qiao Email Updates:
Status: Verified Impact on me:
None 
Category:MySQL Server: InnoDB storage engine Severity:S3 (Non-critical)
Version:8.0.22, 8.0.29 OS:Any
Assigned to: CPU Architecture:Any
Tags: collation innodb, server

[29 Apr 2022 3:49] George qiao
Description:
The following case gives a confuse result:

```
mysql> create table t1 (a char(10), primary key(a)) charset=utf8mb4 collate=utf8mb4_0900_ai_ci;
Query OK, 0 rows affected (0.03 sec)

mysql> show create table t1\G
*************************** 1. row ***************************
       Table: t1
Create Table: CREATE TABLE `t1` (
  `a` char(10) NOT NULL,
  PRIMARY KEY (`a`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_0900_ai_ci
1 row in set (0.00 sec)

mysql> insert into t1 values ('2');
Query OK, 1 row affected (0.01 sec)

mysql> insert into t1 values ('2 ');
ERROR 1062 (23000): Duplicate entry '2' for key 't1.PRIMARY'

mysql> select * from t1 where a = '2 ';
Empty set (0.00 sec)
```

The collation of the table is utf8mb4_0900_ai_ci, which is a NO_PAD collation.

The problem is, in InnoDB, the result of the comparison between '2' and '2 ' (a space followed the character 2) is equal (refer to the function innobase_mysql_cmp). Thus, the second insert statement reported "Duplicate entry".

However, the select statement returned an empty set, which means, in MySQL server, the result of the comparison between '2' and '2 ' (a space followed the character 2) is not equal (refer to the function Arg_comparator::compare_string).

How to repeat:
create table t1 (a char(10), primary key(a)) charset=utf8mb4 collate=utf8mb4_0900_ai_ci;
insert into t1 values ('2');
insert into t1 values ('2 ');
select * from t1 where a = '2 ';

Suggested fix:
From the user's perspective, '2' and '2 ' are different in SELECT but not in INSERT. 
The function `innobase_mysql_cmp()` works as the NO_PAD collation described in official document(Trailing Space Handling in Comparisons) are recommended.
https://dev.mysql.com/doc/refman/8.0/en/charset-binary-collations.html?
[29 Apr 2022 4:58] MySQL Verification Team
Hello George qiao,

Thank you for the report and test case.

regards,
Umesh