Bug #105904 CHAR_LENGTH treats unicode variation selector wrongly
Submitted: 15 Dec 2021 9:33
Modified: 22 Mar 2022 15:28
Reporter: Tsubasa Tanaka (OCA)
Status: Closed
Category: MySQL Server: Charsets
Severity: S3 (Non-critical)
Version: ALL
OS: CentOS (7.9)
Assigned to:
CPU Architecture: x86

[15 Dec 2021 9:33] Tsubasa Tanaka
Description:
CHAR_LENGTH counts a Unicode variation selector as a separate character.

For example, x'E8919BF3A08481' ("葛󠄁") is displayed as a single character, but CHAR_LENGTH returns 2.

https://dev.mysql.com/doc/refman/8.0/en/string-functions.html#function_char-length

How to repeat:
mysql80 8> CREATE TABLE t1 (val VARCHAR(32) CHARSET utf8mb4 COLLATE utf8mb4_0900_ai_ci);
Query OK, 0 rows affected (0.02 sec)

mysql80 8> INSERT INTO t1 VALUES (x'E8919BF3A08481');
Query OK, 1 row affected (0.02 sec)

mysql80 8> SELECT val, LENGTH(val), CHAR_LENGTH(val), HEX(SUBSTR(val, 1, 1)) AS left_hex, HEX(SUBSTR(val, 2, 1)) AS right_hex FROM t1;
+---------+-------------+------------------+----------+-----------+
| val     | LENGTH(val) | CHAR_LENGTH(val) | left_hex | right_hex |
+---------+-------------+------------------+----------+-----------+
| 葛󠄁      |           7 |                2 | E8919B   | F3A08481  |
+---------+-------------+------------------+----------+-----------+
1 row in set (0.00 sec)

----

x'F3A08481' is the UTF-8 encoding of U+E0101 (VARIATION SELECTOR-18).

- https://unicode-table.com/en/E0101/
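
The counts above can be reproduced outside MySQL. The following is a minimal Python sketch, not MySQL's implementation: the helper names are hypothetical, and the variation selector ranges (U+FE00..U+FE0F and U+E0100..U+E01EF) are taken from the Unicode block definitions rather than from this report. It shows that the 7 bytes decode to two code points, and that skipping variation selectors gives the count of 1 that the report expects.

```python
# Illustrative only: hypothetical helpers, not how MySQL computes CHAR_LENGTH.

def is_variation_selector(ch: str) -> bool:
    """True if ch is in one of the two Unicode variation selector blocks."""
    cp = ord(ch)
    return 0xFE00 <= cp <= 0xFE0F or 0xE0100 <= cp <= 0xE01EF


def count_code_points(data: bytes) -> int:
    """Number of Unicode code points in UTF-8 encoded data."""
    return len(data.decode("utf-8"))


def count_without_selectors(data: bytes) -> int:
    """Number of code points, ignoring variation selectors."""
    return sum(1 for ch in data.decode("utf-8") if not is_variation_selector(ch))


raw = bytes.fromhex("E8919BF3A08481")  # same bytes as the INSERT above

print(len(raw))                                             # 7
print([f"U+{ord(ch):04X}" for ch in raw.decode("utf-8")])   # ['U+845B', 'U+E0101']
print(count_code_points(raw))                               # 2
print(count_without_selectors(raw))                         # 1
```

Skipping variation selectors is only one possible notion of "character"; it does not handle other combining sequences.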

Suggested fix:
CHAR_LENGTH should return 1.
[15 Dec 2021 9:48] MySQL Verification Team
Hello Tanaka-San,

Thank you for the report and feedback!

regards,
Umesh
[22 Mar 2022 15:28] Jon Stephens
Fixed in the 5.6, 5.7, and 8.0 versions of the Manual, in mysqldoc rev 72392; closed.

The changes should be online in a day or so.

Thanks!