Bug #114832 Collation set on connect is ignored
Submitted: 1 May 7:21
Modified: 24 Jun 22:34
Reporter: Daniël van Eeden (OCA)
Status: Closed
Category: Connector / Python
Severity: S1 (Critical)
Version: 8.4.0
OS: Any
Assigned to:
CPU Architecture: Any
Tags: cext, collation, Collations

[1 May 7:21] Daniël van Eeden
Description:
The collation on `mysql.connector.connect(collation=..., ...)` isn't sent to the server correctly.

Both hosts run Fedora 40. HostA uses the MySQL YUM repo with 8.4.0; HostB uses the packages that ship with the OS (no YUM repo).

### HostA
$ /tmp/t.py
utf8mb4_ja_0900_as_cs : Connection established, collation_connection is utf8mb4_0900_ai_ci
utf8mb4_bin           : Connection established, collation_connection is utf8mb4_0900_ai_ci
utf8mb4_general_ci    : Connection established, collation_connection is utf8mb4_0900_ai_ci
binary                : Connection established, collation_connection is b'binary'
foobar                : Connection failed: Collation 'foobar' unknown
$ rpm -q mysql-connector-python3 mysql-community-libs
mysql-connector-python3-8.4.0-1.fc40.x86_64
mysql-community-libs-8.4.0-10.fc40.x86_64

### HostB
# ./t.py 
utf8mb4_ja_0900_as_cs : Connection established, collation_connection is utf8mb4_ja_0900_as_cs
utf8mb4_bin           : Connection established, collation_connection is utf8mb4_bin
utf8mb4_general_ci    : Connection established, collation_connection is utf8mb4_general_ci
binary                : Connection established, collation_connection is bytearray(b'binary')
foobar                : Connection failed: Collation 'foobar' unknown.
# rpm -q mysql-connector-python3 mysql-common
mysql-connector-python3-8.0.21-12.fc40.noarch
mysql-common-8.0.36-3.fc40.noarch

See also:
- https://bugs.mysql.com/bug.php?id=114818
- https://github.com/mysql/mysql-connector-python

How to repeat:
#!/bin/python3 -t
import mysql.connector

collations = [
    "utf8mb4_ja_0900_as_cs",
    "utf8mb4_bin",
    "utf8mb4_general_ci",
    "binary",
    "foobar",
]

for collation in collations:
    try:
        c = mysql.connector.connect(
            host="127.0.0.1",
            port=3306,
            user="test",
            password="test",
            collation=collation,
            ssl_disabled=True,
        )
        cur = c.cursor()
        cur.execute("SHOW SESSION VARIABLES LIKE 'collation_connection'")
        for col in cur:
            print(
                f"{collation:22s}: Connection established, collation_connection is {col[1]}"
            )
        cur.close()
        c.close()
    except mysql.connector.errors.ProgrammingError as e:
        print(f"{collation:22s}: Connection failed: {e}")
[1 May 7:58] Daniël van Eeden
Note that `cmd_change_user()` shows the same issue: it sends the right collation with 8.0 but not with 8.4.

------------------------------------------------
#!/bin/python3 -t
import mysql.connector

c = mysql.connector.connect(
    host="127.0.0.1",
    port=3306,
    user="test",
    password="test",
    collation="utf8mb4_ja_0900_as_cs",
    ssl_disabled=True,
)
cur = c.cursor()
cur.execute("SHOW SESSION VARIABLES LIKE 'collation_connection'")
for col in cur:
    print(
        f"Connection established, collation_connection is {col[1]}"
    )
cur.close()
# 303 is the collation ID of utf8mb4_ja_0900_as_cs
c.cmd_change_user(username="test", password="test", charset=303)
c.close()
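
The script above changes the user but does not print the collation afterwards. A minimal sketch of that extra check follows; `collation_after_change_user` is a hypothetical helper name, and it assumes `conn` is an already open `mysql.connector` connection to the same test server.

```python
def collation_after_change_user(conn, charset_id, username="test", password="test"):
    """Re-authenticate on `conn` and return the collation the server reports.

    Hypothetical helper for illustration. `conn` is expected to behave like
    an open mysql.connector connection (cmd_change_user() and cursor()).
    """
    conn.cmd_change_user(username=username, password=password, charset=charset_id)
    cur = conn.cursor()
    try:
        cur.execute("SHOW SESSION VARIABLES LIKE 'collation_connection'")
        rows = cur.fetchall()
    finally:
        cur.close()
    value = rows[0][1]
    # Some setups return bytes or bytearray (see the `binary` rows above).
    if isinstance(value, (bytes, bytearray)):
        value = value.decode()
    return value


# Usage against a real server (not executed here):
#   print(collation_after_change_user(c, 303))
```

If the charset ID is honoured, the value printed for 303 should match the collation requested at connect time.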
[1 May 9:10] MySQL Verification Team
Hello Daniël,

Thank you for the report and feedback.

regards,
Umesh
[2 May 15:28] Daniël van Eeden
This bug happens when running with MySQL Connector/Python 8.4.0 installed via the RPM. This doesn't happen if I install from the git repo with `python3 setup.py install --user`.

When using `use_pure=True` this also doesn't happen. So it looks like this issue is in the C Extension.
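
For anyone who wants to compare the two implementations side by side, here is a rough sketch. It assumes the same test server and credentials as the script in the description. `collation_for` and its `connect` parameter are names made up for this sketch; passing the connect function in lets the helper be tried against a stub as well as a real server.

```python
def collation_for(connect, collation, use_pure):
    """Connect with the requested collation and return what the server reports.

    `connect` is any callable that accepts the keyword arguments of
    mysql.connector.connect(); injecting it is a choice made for this sketch.
    """
    conn = connect(
        host="127.0.0.1",
        port=3306,
        user="test",
        password="test",
        collation=collation,
        use_pure=use_pure,
        ssl_disabled=True,
    )
    try:
        cur = conn.cursor()
        cur.execute("SHOW SESSION VARIABLES LIKE 'collation_connection'")
        rows = cur.fetchall()
        cur.close()
    finally:
        conn.close()
    value = rows[0][1]
    # Some setups return bytes or bytearray (see the `binary` rows above).
    if isinstance(value, (bytes, bytearray)):
        value = value.decode()
    return value


# Usage with Connector/Python installed (not executed here):
#   import mysql.connector
#   for use_pure in (True, False):
#       got = collation_for(mysql.connector.connect, "utf8mb4_ja_0900_as_cs", use_pure)
#       print(f"use_pure={use_pure}: {got}")
```

A difference between the two printed values on the same host points at the C extension and not at the server.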
[6 May 9:21] Daniël van Eeden
Looks like this sends 
SET NAMES 'utf8mb4' COLLATE 'utf8mb4_ja_0900_as_cs'

https://github.com/mysql/mysql-connector-python/blob/dc71cebe53615110ff00dbb8b629f5457ece1...

And later on this sends
SET NAMES utf8mb4

https://github.com/mysql/mysql-connector-python/blob/dc71cebe53615110ff00dbb8b629f5457ece1...

- Sending a `SET NAMES` with only a charset and no collation would set the collation back to the default for that charset.
- Sending a `SET NAMES` twice adds to connection setup time.
- Sending a `SET NAMES` should not be needed at all if the collation is <=255 (fits in 1 byte) and set during the handshake. Whether this should work for >255 is unclear.
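
To make the last point concrete: the character set field in the handshake response is a single byte, so only collation IDs up to 255 can be carried there. The sketch below shows one possible policy for deciding when a follow-up statement is needed. `plan_collation_setup` and the `fallback_id` parameter are hypothetical and are not Connector/Python's actual logic; the fallback of 255 (utf8mb4_0900_ai_ci) is simply the value seen in the C extension handshake in this report.

```python
def plan_collation_setup(collation_id, charset, collation, fallback_id=255):
    """Return (handshake_byte, follow_up_sql) for a requested collation.

    Hypothetical helper: one possible policy, not the connector's real code.
    follow_up_sql is None when the handshake alone is enough.
    """
    if collation_id < 1:
        raise ValueError("collation IDs are positive integers")
    if collation_id <= 0xFF:
        # Fits in the one-byte handshake field: no extra round trip needed.
        return collation_id, None
    # Does not fit: handshake with a fallback, then set charset and collation
    # together so the collation is not reset to the charset's default.
    return fallback_id, f"SET NAMES '{charset}' COLLATE '{collation}'"


print(plan_collation_setup(45, "utf8mb4", "utf8mb4_general_ci"))
print(plan_collation_setup(303, "utf8mb4", "utf8mb4_ja_0900_as_cs"))
```

Under this policy at most one `SET NAMES` is sent, and it always carries the `COLLATE` clause.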

When using `mysql.connector.connect(collation="gb18030_unicode_520_ci", use_pure=True, ...)`:
- Handshake with collation 45 (utf8mb4_general_ci)
- SET NAMES 'gb18030' COLLATE 'gb18030_unicode_520_ci'
- collation_connection is gb18030_unicode_520_ci (Correct)

When using `mysql.connector.connect(collation="gb18030_unicode_520_ci", use_pure=False, ...)`:
- Handshake with collation 255 (utf8mb4_0900_ai_ci)
- SET NAMES 'gb18030' COLLATE 'gb18030_unicode_520_ci'
- SET NAMES gb18030
- collation_connection is gb18030_chinese_ci (Incorrect, default for gb18030)
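
The two traces above differ only in the trailing `SET NAMES gb18030`. A toy model of the session state shows why that last statement decides the outcome. This is an illustration only: `DEFAULT_COLLATION` holds just the two charset defaults that appear in this report, and `apply_set_names` is a simplified parser written for this sketch, not server code.

```python
import re

# Charset defaults taken from the outputs in this report only.
DEFAULT_COLLATION = {
    "gb18030": "gb18030_chinese_ci",
    "utf8mb4": "utf8mb4_0900_ai_ci",
}

_SET_NAMES = re.compile(
    r"SET NAMES '?(?P<charset>\w+)'?(?: COLLATE '?(?P<collation>\w+)'?)?$",
    re.IGNORECASE,
)


def apply_set_names(statement):
    """Return the collation_connection that results from one SET NAMES."""
    match = _SET_NAMES.match(statement.strip())
    if match is None:
        raise ValueError(f"not a SET NAMES statement: {statement!r}")
    # Without a COLLATE clause the charset's default collation is used.
    return match["collation"] or DEFAULT_COLLATION[match["charset"]]


def final_collation(statements, initial):
    """Replay statements in order; the last SET NAMES wins."""
    current = initial
    for statement in statements:
        current = apply_set_names(statement)
    return current


pure = ["SET NAMES 'gb18030' COLLATE 'gb18030_unicode_520_ci'"]
cext = pure + ["SET NAMES gb18030"]
print(final_collation(pure, "utf8mb4_general_ci"))
print(final_collation(cext, "utf8mb4_0900_ai_ci"))
```

The first call models the `use_pure=True` trace and the second the `use_pure=False` trace; the handshake collation passed as `initial` has no effect once any `SET NAMES` has run.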
[23 May 13:57] Souma Kanti Ghosh
Posted by developer:
 
Hello Daniël,

Thank you for the contribution.

Regards,
Souma Kanti Ghosh
[24 Jun 22:34] Philip Olson
Posted by developer:
 
Fixed as of the upcoming MySQL Connector/Python 9.0.0 release, and here's the proposed changelog entry from the documentation team:

With the C extension, the collation connection option was ignored.

Thank you for the bug report.