Bug #113034 Incorrect rows removal by DISTINCT with binary charset
Submitted: 9 Nov 2023 20:23 Modified: 10 Nov 2023 12:57
Reporter: Nathee Jaywaree Email Updates:
Status: Not a Bug Impact on me:
None 
Category:MySQL Server: Charsets Severity:S3 (Non-critical)
Version:8.1.0 OS:Any
Assigned to: CPU Architecture:Any
Tags: charset issue, distinct

[9 Nov 2023 20:23] Nathee Jaywaree
Description:
Same alphabets with different cases are being removed incorrectly by SELECT DISTINCT with binary charset.
It is expected that, with the binary charset, alphabets are compared case-sensitively.

How to repeat:
SET NAMES 'binary';
CREATE TABLE t0 (c0 TEXT);
INSERT INTO t0 VALUES('a');
INSERT INTO t0 VALUES('A');
SELECT * FROM t0; -- returns 0x61, 0x41
SELECT DISTINCT * FROM t0; -- returns 0x61;

Comparing directly is correct:
SELECT 'a'; -- returns 0x61
SELECT 'A'; -- returns 0x41
SELECT 'a' = 'A'; -- returns 0
[10 Nov 2023 12:57] MySQL Verification Team
Hi Mr. Jaywaree,

Thank you for your bug report.

However, this is not a bug.

You have simply made an error in defining the collation for your table t0.

We have defined it properly and this is what we got:

select * from t0;
+------+
| c0   |
+------+
| a    |
| A    |
+------+
2 rows in set (0.00 sec)

SELECT DISTINCT * FROM t0;
+------+
| c0   |
+------+
| a    |
| A    |
+------+
2 rows in set (0.07 sec)

You can learn how to set character sets and collations properly in MySQL in this chapter:

https://dev.mysql.com/doc/refman/8.2/en/charset.html

Not a bug.