Bug #111426 First connected to server v8, then any v5 connections fail with utf8mb4 charset
Submitted: 15 Jun 2023 2:14
Modified: 6 Sep 2023 21:23
Reporter: David L
Status: Closed
Category: Connector / Python
Severity: S3 (Non-critical)
Version: v8.0.32, 8.0.33
OS: CentOS
Assigned to:
CPU Architecture: Any

[15 Jun 2023 2:14] David L
Description:
The issue occurs when connecting to two MySQL servers in series: first a v8.0 server ("server_v8"), then a v5.5 server ("server_v5").

"server_v8" reports as "Server version: 8.0.19 MySQL Community Server - GPL" 

"server_v5" reports as "Server version: 5.5.51-log MySQL Community Server (GPL)" 

I am using MySQL Connector/Python v8.0.32 and connecting with charset=utf8mb4

I first connect successfully to "server_v8", then attempt to connect to "server_v5", but the second connection raises this error:

```
>>> remote_conn2 = mysql.connector.connect(**config)
Traceback (most recent call last):
  File "/data/pyenv/versions/3.10.4/lib/python3.10/site-packages/mysql/connector/connection_cext.py", line 608, in cmd_query
    self._cmysql.query(
_mysql_connector.MySQLInterfaceError: Unknown collation: 'utf8mb4_0900_ai_ci'

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/data/pyenv/versions/3.10.4/lib/python3.10/site-packages/mysql/connector/pooling.py", line 293, in connect
    return CMySQLConnection(*args, **kwargs)
  File "/data/pyenv/versions/3.10.4/lib/python3.10/site-packages/mysql/connector/connection_cext.py", line 118, in __init__
    self.connect(**kwargs)
  File "/data/pyenv/versions/3.10.4/lib/python3.10/site-packages/mysql/connector/abstracts.py", line 1182, in connect
    self._post_connection()
  File "/data/pyenv/versions/3.10.4/lib/python3.10/site-packages/mysql/connector/abstracts.py", line 1152, in _post_connection
    self.set_charset_collation(self._charset_id)
  File "/data/pyenv/versions/3.10.4/lib/python3.10/site-packages/mysql/connector/abstracts.py", line 1113, in set_charset_collation
    self._execute_query(f"SET NAMES '{charset_name}' COLLATE '{collation_name}'")
  File "/data/pyenv/versions/3.10.4/lib/python3.10/site-packages/mysql/connector/connection_cext.py", line 616, in cmd_query
    raise get_mysql_exception(
mysql.connector.errors.DatabaseError: 1273 (HY000): Unknown collation: 'utf8mb4_0900_ai_ci'
```

--------------------------------------------------

Some other scenarios where no error occurs:

If I connect first to "server_v5" instead, connection is successful, and a subsequent connection to "server_v8" is also successful.

If I connect to ONLY ONE server, either "server_v8" or "server_v5", I get no errors and the connection succeeds.

--------------------------------------------------

Misc troubleshooting and debugging:

I tried to debug as much as I could myself with `pdb`. The source code listed below contains logic that changes class-level variables when it detects MySQL Server 8.0.

Stepping through the first connection, to "server_v8", I see it set `cls.desc = MYSQL_CHARACTER_SETS`. The subsequent connection to "server_v5" then still has `cls.desc == MYSQL_CHARACTER_SETS` instead of using `MYSQL_CHARACTER_SETS_57`.

#/pyenv/versions/3.10.4/lib/python3.10/site-packages/mysql/connector/constants.py
    # Use LTS character set as default
    desc: List[
        Optional[Tuple[str, str, bool]]
    ] = MYSQL_CHARACTER_SETS_57  # type: ignore[assignment]
    mysql_version: Tuple[int, ...] = (5, 7)

    # ... (lines omitted in this report)

    @classmethod
    def set_mysql_version(cls, version: Tuple[int, ...]) -> None:
        """Set the MySQL major version and change the charset mapping if is 5.7.

        Args:
            version (tuple): MySQL version tuple.
        """
        cls.mysql_version = version[:2]
        if cls.mysql_version == (8, 0):
            cls.desc = MYSQL_CHARACTER_SETS
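
To illustrate the suspected mechanism, here is a minimal, self-contained sketch. It is not the connector's code: `ToyCharacterSet`, `TOY_CHARSETS_57`, `TOY_CHARSETS_80`, and `collations_seen` are hypothetical names, and the toy tables contain only the entries observed in the `pdb` sessions in this report. It assumes the default collation is looked up in a class-level table that is switched to the 8.0 mapping and never switched back.

```python
from typing import List, Optional, Tuple

# (charset name, collation name, is default collation for the charset)
CharsetEntry = Optional[Tuple[str, str, bool]]

# Toy stand-ins for the connector's tables; only the entries seen in the
# pdb sessions are filled in, every other slot is a placeholder.
TOY_CHARSETS_57: List[CharsetEntry] = [None] * 46
TOY_CHARSETS_57[45] = ("utf8mb4", "utf8mb4_general_ci", True)

TOY_CHARSETS_80: List[CharsetEntry] = [None] * 256
TOY_CHARSETS_80[45] = ("utf8mb4", "utf8mb4_general_ci", False)
TOY_CHARSETS_80[255] = ("utf8mb4", "utf8mb4_0900_ai_ci", True)


class ToyCharacterSet:
    """Hypothetical, simplified model of the shared class-level state."""

    desc: List[CharsetEntry] = TOY_CHARSETS_57
    mysql_version: Tuple[int, ...] = (5, 7)

    @classmethod
    def set_mysql_version(cls, version: Tuple[int, ...]) -> None:
        cls.mysql_version = version[:2]
        if cls.mysql_version == (8, 0):
            cls.desc = TOY_CHARSETS_80  # no branch ever switches this back

    @classmethod
    def default_collation(cls, charset: str) -> str:
        for entry in cls.desc:
            if entry is not None and entry[0] == charset and entry[2]:
                return entry[1]
        raise ValueError(f"unknown charset: {charset}")


def collations_seen(versions: List[Tuple[int, ...]]) -> List[str]:
    """Simulate connecting to servers in order, starting from fresh state."""

    class Fresh(ToyCharacterSet):  # subclass so each simulation starts clean
        pass

    seen = []
    for version in versions:
        Fresh.set_mysql_version(version)
        seen.append(Fresh.default_collation("utf8mb4"))
    return seen


print(collations_seen([(5, 5, 51), (8, 0, 19)]))
print(collations_seen([(8, 0, 19), (5, 5, 51)]))
```

In this toy model, connecting to v5 first gives each server a collation it knows, while connecting to v8 first leaves the v5 connection with `utf8mb4_0900_ai_ci`, which matches the order dependence described above.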

I also debugged and inspected the charset integer and collation used when making the two connections in series.
The first connection, to "server_v8", uses charset integer 45 in the lookup below, and the collation is utf8mb4_general_ci:
#/pyenv/versions/3.10.4/lib/python3.10/site-packages/mysql/connector/constants.py
                info = cls.desc[charset]

#/pyenv/versions/3.10.4/lib/python3.10/site-packages/mysql/connector/abstracts.py
        self._execute_query(f"SET NAMES '{charset_name}' COLLATE '{collation_name}'")
(Pdb) cls.desc[45]
('utf8mb4', 'utf8mb4_general_ci', False)

The second connection, to "server_v5", uses charset integer 255, and the collation is utf8mb4_0900_ai_ci:
(Pdb) cls.desc[255]
('utf8mb4', 'utf8mb4_0900_ai_ci', True)

I debugged again as above, but with only one connection, to "server_v5". Charset integer 45 is used, with collation utf8mb4_general_ci.
Index 255 is out of range in this debug session:

(Pdb) cls.desc[45]
('utf8mb4', 'utf8mb4_general_ci', True)
(Pdb) cls.desc[255]
*** IndexError: list index out of range

--------------------------------------------------

I am new to bug reports, so please let me know if you have any questions.

How to repeat:
import mysql.connector
config = {
  'user': 'user1',
  'password': 'pass1',
  'host': 'server_v8',
  'database': 'foobar',
  'charset': 'utf8mb4',
}
remote_conn1 = mysql.connector.connect(**config)
remote_conn1.close()

config = {
  'user': 'user2',
  'password': 'pass2',
  'host': 'server_v5',
  'database': 'foobar',
  'charset': 'utf8mb4',
}
remote_conn2 = mysql.connector.connect(**config)

Suggested fix:
No concrete fix to suggest, but I suspect that some variables changed by connecting to a v8 server break subsequent connections to any v5 server.
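
One possible direction, sketched only as an illustration: keep the charset table per connection and choose it from that connection's server version, so nothing is shared between connections. This is not the connector's code or its actual fix. `ToyConnection`, `charset_table_for`, and the toy tables are hypothetical names, and the tables hold only the entries seen in the `pdb` sessions above.

```python
from typing import List, Optional, Tuple

# (charset name, collation name, is default collation for the charset)
CharsetEntry = Optional[Tuple[str, str, bool]]

# Toy tables with only the entries observed while debugging.
TOY_CHARSETS_57: List[CharsetEntry] = [None] * 46
TOY_CHARSETS_57[45] = ("utf8mb4", "utf8mb4_general_ci", True)

TOY_CHARSETS_80: List[CharsetEntry] = [None] * 256
TOY_CHARSETS_80[45] = ("utf8mb4", "utf8mb4_general_ci", False)
TOY_CHARSETS_80[255] = ("utf8mb4", "utf8mb4_0900_ai_ci", True)


def charset_table_for(version: Tuple[int, ...]) -> List[CharsetEntry]:
    """Pick a table from the server version without touching shared state."""
    return TOY_CHARSETS_80 if tuple(version[:2]) >= (8, 0) else TOY_CHARSETS_57


class ToyConnection:
    """Hypothetical connection that owns its charset table."""

    def __init__(self, server_version: Tuple[int, ...]) -> None:
        self.server_version = server_version
        # Instance attribute: other connections are unaffected.
        self._charsets = charset_table_for(server_version)

    def default_collation(self, charset: str) -> str:
        for entry in self._charsets:
            if entry is not None and entry[0] == charset and entry[2]:
                return entry[1]
        raise ValueError(f"unknown charset: {charset}")


conn_v8 = ToyConnection((8, 0, 19))
conn_v5 = ToyConnection((5, 5, 51))
print(conn_v8.default_collation("utf8mb4"))
print(conn_v5.default_collation("utf8mb4"))
```

With the table held on the instance, the order in which the two servers are contacted no longer matters in this sketch.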
[15 Jun 2023 2:47] David L
I forgot to add that the OS is CentOS 7.

Python 3.10.4 (main, May 19 2022, 15:06:50) [GCC 4.8.5 20150623 (Red Hat 4.8.5-44)] on linux
[15 Jun 2023 7:46] MySQL Verification Team
Hello David L,

Thank you for the report and feedback.
Verified as described.

regards,
Umesh
[6 Sep 2023 21:23] Philip Olson
Posted by developer:
 
Fixed as of the upcoming MySQL Connector/Python 8.2.0 release, and here's the proposed changelog entry from the documentation team:

With multiple simultaneous connections, the character set information is
shared between connections which could be problematic if two connections
were to different major MySQL server versions, such as MySQL 5.x and MySQL
8.x.

Thank you for the bug report.