Bug #116667 New connection default collation_connection is inconsistent
Submitted: 14 Nov 2024 14:28 Modified: 14 Nov 2024 20:15
Reporter: xilin Chen Email Updates:
Status: Not a Bug Impact on me:
None 
Category:MySQL Server Severity:S3 (Non-critical)
Version:8.0.39 OS:Any
Assigned to: CPU Architecture:Any

[14 Nov 2024 14:28] xilin Chen
Description:
The server's collation_connection is utf8mb4_general_ci. When character_set_client is changed to utf8mb4, the collation of global_system_variables.character_set_client is utf8mb4_0900_ai_ci. At this time, the MySQL client will specify collation as 255 when establishing a new connection. It is found to be consistent with global_system_variables.character_set_client. Therefore, the default collation_connection for new connections will directly use the server's global_system_variables.collation_connection, and this value is utf8mb4_general_ci, which is inconsistent with the 255 specified by the client. It will return to normal after restarting mysqld

How to repeat:
1. Configuration file:
character_set_server=utf8mb4
collation_server=utf8mb4_general_ci

2. Start mysql and login

3. view variables
mysql> show global variables like 'character_set_client';
+----------------------+---------+
| Variable_name        | Value   |
+----------------------+---------+
| character_set_client | utf8mb4 |
+----------------------+---------+

mysql> show variables like 'character_set_client';
+----------------------+---------+
| Variable_name        | Value   |
+----------------------+---------+
| character_set_client | utf8mb4 |
+----------------------+---------+

mysql> show global variables like 'collation_connection';
+----------------------+--------------------+
| Variable_name        | Value              |
+----------------------+--------------------+
| collation_connection | utf8mb4_general_ci |
+----------------------+--------------------+

mysql> show variables like 'collation_connection';
+----------------------+--------------------+
| Variable_name        | Value              |
+----------------------+--------------------+
| collation_connection | utf8mb4_0900_ai_ci |
+----------------------+--------------------+

4. Change character_set_client to utf8 first, then to utf8mb4
set global character_set_client = utf8;
set global character_set_client = utf8mb4;
show global variables like 'character_set_client';

5. Exit mysql and login again to view the collation_connection of the current connection

6. At this time, it is found that the collation_connection is inconsistent with the first login
mysql> show variables like 'collation_connection';
+----------------------+--------------------+
| Variable_name        | Value              |
+----------------------+--------------------+
| collation_connection | utf8mb4_general_ci |
+----------------------+--------------------+
[14 Nov 2024 14:53] MySQL Verification Team
HI Mr. Chen,

Thank you for your bug report.

However, this is not a bug.

There  are a great number of collations within UTFmb4 character set.

Hence, that is why you should be explicit in your commands and things will work as designed:

mysql> set global collation_connection=  utf8mb4_0900_ai_ci;
Query OK, 0 rows affected (0.01 sec)

mysql> quit

reconnect:

mysql> show variables like "collation_connection";
+----------------------+--------------------+
| Variable_name        | Value              |
+----------------------+--------------------+
| collation_connection | utf8mb4_0900_ai_ci |
+----------------------+--------------------+
1 row in set (0.11 sec)

Not a bug.
[14 Nov 2024 15:36] xilin Chen
restart mysqld and reconnect, my result is:

mysql> show global variables like 'collation_connection';
+----------------------+--------------------+
| Variable_name        | Value              |
+----------------------+--------------------+
| collation_connection | utf8mb4_general_ci |
+----------------------+--------------------+

mysql> show variables like 'collation_connection';
+----------------------+--------------------+
| Variable_name        | Value              |
+----------------------+--------------------+
| collation_connection | utf8mb4_0900_ai_ci |
+----------------------+--------------------+

even if I actively set global collation_connection = 'utf8mb4_general_ci', the result is the same after reconnecting.

Because I found that global_system_variables.character_set_client will be set to my_charset_utf8mb4_general_ci after MySQL is started.

(gdb) p global_system_variables.character_set_client
$1 = (const CHARSET_INFO *) 0x4ce1200 <my_charset_utf8mb4_general_ci>

If I actively change character_set_client to utf8mb4, set global character_set_client=utf8mb4, then the value of the system variable becomes my_charset_utf8mb4_0900_ai_ci

(gdb) p global_system_variables.character_set_client
$1 = (const CHARSET_INFO *) 0x4b8f2e0 <my_charset_utf8mb4_0900_ai_ci>

This causes the collation_connection of the new connection to be inconsistent
[14 Nov 2024 15:46] MySQL Verification Team
Hi Mr. Chen,

As we said, you have to be explicit in your server configuration.

There is no need to set anything manually, when you can use defaults file to set it any way you like it ......

Not a bug.
[14 Nov 2024 15:49] MySQL Verification Team
There are so many ways on how to do it:

https://dev.mysql.com/doc/refman/9.1/en/charset-connection.html
[14 Nov 2024 15:54] xilin Chen
Hi

collation_connection does not support direct settings in the configuration file, only collation_server can be set. My demonstration has collation_server=utf8mb4_general_ci

and mysql will use this variable to set collation_connection when it starts, but if global character_set_client=utf8mb4 is set at runtime, global_system_variables.character_set_client will become my_charset_utf8mb4_0900_ai_ci
[14 Nov 2024 15:58] MySQL Verification Team
Hi,

You are correct, but that is a default behaviour and forced behaviour.

Simply, all other collations within utfmb4 have many bugs. utf8mb4_0900_ai_ci is the one without bugs.

Hence, this is intended behaviour.

Not a bug.
[14 Nov 2024 20:15] xilin Chen
Hi

8.0.39 sql/sql_connect.cc:400
bool thd_init_client_charset(THD *thd, uint cs_number) {
...
  if (!opt_character_set_client_handshake ||
      !(cs = get_charset(cs_number, MYF(0))) ||
      !my_strcasecmp(&my_charset_latin1,
                     global_system_variables.character_set_client->m_coll_name,
                     cs->m_coll_name)) {
...

I think there may be one more condition: global_system_variables.collation_connection->m_coll_name is the same as global_system_variables.character_set_client->m_coll_name
[15 Nov 2024 10:48] MySQL Verification Team
Hi,

It is for the same reason .....

Simply, all other collations within utfmb4 have many bugs. utf8mb4_0900_ai_ci is the one without bugs.

Since Unicode-related code comes with OS library, we can not fix that.