Bug #116667 | New connection default collation_connection is inconsistent | ||
---|---|---|---|
Submitted: | 14 Nov 2024 14:28 | Modified: | 14 Nov 2024 20:15 |
Reporter: | xilin Chen | Email Updates: | |
Status: | Not a Bug | Impact on me: | |
Category: | MySQL Server | Severity: | S3 (Non-critical) |
Version: | 8.0.39 | OS: | Any |
Assigned to: | CPU Architecture: | Any |
[14 Nov 2024 14:28]
xilin Chen
[14 Nov 2024 14:53]
MySQL Verification Team
HI Mr. Chen, Thank you for your bug report. However, this is not a bug. There are a great number of collations within UTFmb4 character set. Hence, that is why you should be explicit in your commands and things will work as designed: mysql> set global collation_connection= utf8mb4_0900_ai_ci; Query OK, 0 rows affected (0.01 sec) mysql> quit reconnect: mysql> show variables like "collation_connection"; +----------------------+--------------------+ | Variable_name | Value | +----------------------+--------------------+ | collation_connection | utf8mb4_0900_ai_ci | +----------------------+--------------------+ 1 row in set (0.11 sec) Not a bug.
[14 Nov 2024 15:36]
xilin Chen
restart mysqld and reconnect, my result is: mysql> show global variables like 'collation_connection'; +----------------------+--------------------+ | Variable_name | Value | +----------------------+--------------------+ | collation_connection | utf8mb4_general_ci | +----------------------+--------------------+ mysql> show variables like 'collation_connection'; +----------------------+--------------------+ | Variable_name | Value | +----------------------+--------------------+ | collation_connection | utf8mb4_0900_ai_ci | +----------------------+--------------------+ even if I actively set global collation_connection = 'utf8mb4_general_ci', the result is the same after reconnecting. Because I found that global_system_variables.character_set_client will be set to my_charset_utf8mb4_general_ci after MySQL is started. (gdb) p global_system_variables.character_set_client $1 = (const CHARSET_INFO *) 0x4ce1200 <my_charset_utf8mb4_general_ci> If I actively change character_set_client to utf8mb4, set global character_set_client=utf8mb4, then the value of the system variable becomes my_charset_utf8mb4_0900_ai_ci (gdb) p global_system_variables.character_set_client $1 = (const CHARSET_INFO *) 0x4b8f2e0 <my_charset_utf8mb4_0900_ai_ci> This causes the collation_connection of the new connection to be inconsistent
[14 Nov 2024 15:46]
MySQL Verification Team
Hi Mr. Chen, As we said, you have to be explicit in your server configuration. There is no need to set anything manually, when you can use defaults file to set it any way you like it ...... Not a bug.
[14 Nov 2024 15:49]
MySQL Verification Team
There are so many ways on how to do it: https://dev.mysql.com/doc/refman/9.1/en/charset-connection.html
[14 Nov 2024 15:54]
xilin Chen
Hi collation_connection does not support direct settings in the configuration file, only collation_server can be set. My demonstration has collation_server=utf8mb4_general_ci and mysql will use this variable to set collation_connection when it starts, but if global character_set_client=utf8mb4 is set at runtime, global_system_variables.character_set_client will become my_charset_utf8mb4_0900_ai_ci
[14 Nov 2024 15:58]
MySQL Verification Team
Hi, You are correct, but that is a default behaviour and forced behaviour. Simply, all other collations within utfmb4 have many bugs. utf8mb4_0900_ai_ci is the one without bugs. Hence, this is intended behaviour. Not a bug.
[14 Nov 2024 20:15]
xilin Chen
Hi 8.0.39 sql/sql_connect.cc:400 bool thd_init_client_charset(THD *thd, uint cs_number) { ... if (!opt_character_set_client_handshake || !(cs = get_charset(cs_number, MYF(0))) || !my_strcasecmp(&my_charset_latin1, global_system_variables.character_set_client->m_coll_name, cs->m_coll_name)) { ... I think there may be one more condition: global_system_variables.collation_connection->m_coll_name is the same as global_system_variables.character_set_client->m_coll_name
[15 Nov 2024 10:48]
MySQL Verification Team
Hi, It is for the same reason ..... Simply, all other collations within utfmb4 have many bugs. utf8mb4_0900_ai_ci is the one without bugs. Since Unicode-related code comes with OS library, we can not fix that.