Bug #115855 Unexpected comparison result for utf8mb4_0900_ai_ci collation
Submitted: 18 Aug 2024 5:11 Modified: 19 Aug 2024 18:03
Reporter: Long Gu
Status: Not a Bug
Category:MySQL Server: Charsets Severity:S3 (Non-critical)
Version:8.4.2, 9.0.1, 8.0.39 OS:Any
Assigned to: CPU Architecture:Any

[18 Aug 2024 5:11] Long Gu
Description:
The collation `utf8mb4_0900_ai_ci` is based on the UCA 9.0.0 weight keys (http://www.unicode.org/Public/UCA/9.0.0/allkeys.txt), but the following queries return unexpected results:

```
SELECT _utf8mb4'-' COLLATE utf8mb4_0900_ai_ci < _utf8mb4')' COLLATE utf8mb4_0900_ai_ci; -- returns 1
SELECT _utf8mb4'-' COLLATE utf8mb4_0900_ai_ci > _utf8mb4')' COLLATE utf8mb4_0900_ai_ci; -- returns 0
```

According to the UCA 9.0.0 weight keys:

E0029 ; [.0000.0000.0000] # TAG RIGHT PARENTHESIS
E002D ; [.0000.0000.0000] # TAG HYPHEN-MINUS

Since both '-' (HYPHEN-MINUS, U+002D) and ')' (RIGHT PARENTHESIS, U+0029) have the same weight according to UCA 9.0.0, the comparison of these two characters is expected to fall back to comparing the code points 002D ('-') and 0029 (')'). The expected results for the queries are therefore:

```
SELECT _utf8mb4'-' COLLATE utf8mb4_0900_ai_ci < _utf8mb4')' COLLATE utf8mb4_0900_ai_ci; -- expect to return 0
SELECT _utf8mb4'-' COLLATE utf8mb4_0900_ai_ci > _utf8mb4')' COLLATE utf8mb4_0900_ai_ci; -- expect to return 1
```

How to repeat:
```
SELECT _utf8mb4'-' COLLATE utf8mb4_0900_ai_ci < _utf8mb4')' COLLATE utf8mb4_0900_ai_ci; -- returns 1
SELECT _utf8mb4'-' COLLATE utf8mb4_0900_ai_ci > _utf8mb4')' COLLATE utf8mb4_0900_ai_ci; -- returns 0
```
[19 Aug 2024 6:41] MySQL Verification Team
Hello Long Gu,

Thank you for the report and feedback.

regards,
Umesh
[19 Aug 2024 18:03] Bernt Marius Johnsen
The code points referred to, U+E0029 and U+E002D, are Tag characters. See e.g. https://en.wikipedia.org/wiki/Tags_(Unicode_block)

The hyphen and the right parenthesis used in the queries above are U+002D and U+0029, respectively, and have the weights:

002D  ; [*020D.0020.0002] # HYPHEN-MINUS
0029  ; [*0326.0020.0002] # RIGHT PARENTHESIS

Since the primary weight 020D sorts before 0326, '-' compares less than ')'. MySQL behaves correctly with respect to these characters.
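
One way to check this directly on a server is to inspect the weight strings with `WEIGHT_STRING()`. The queries below are a sketch only; the column aliases are illustrative, and the tag characters are written as UTF-8 byte literals on the assumption that U+E002D encodes as F3 A0 80 AD and U+E0029 as F3 A0 80 A9. For an accent- and case-insensitive collation the first query would be expected to reflect the primary weights quoted above (020D and 0326), while the second shows whatever weights the server assigns to the tag characters.

```sql
-- Weights of the characters actually used in the reported queries
-- (U+002D HYPHEN-MINUS and U+0029 RIGHT PARENTHESIS).
SELECT HEX(WEIGHT_STRING(_utf8mb4'-' COLLATE utf8mb4_0900_ai_ci)) AS hyphen_minus,
       HEX(WEIGHT_STRING(_utf8mb4')' COLLATE utf8mb4_0900_ai_ci)) AS right_parenthesis;

-- Weights of the tag characters cited in the report, given as UTF-8 bytes
-- (assumed encodings: U+E002D = F3 A0 80 AD, U+E0029 = F3 A0 80 A9).
SELECT HEX(WEIGHT_STRING(_utf8mb4 X'F3A080AD' COLLATE utf8mb4_0900_ai_ci)) AS tag_hyphen_minus,
       HEX(WEIGHT_STRING(_utf8mb4 X'F3A080A9' COLLATE utf8mb4_0900_ai_ci)) AS tag_right_parenthesis;
```

Comparing the two result sets makes it easy to see which allkeys.txt entries apply to a given literal.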