Description:
Query on information_schema.columns is returning byte string for two columns, but this is unexpected.
I am using MySQL Connector/Python v8.0.32 and connecting with charset=utf8mb4
OS is Centos 7
Python 3.10.4 (main, May 19 2022, 15:06:50) [GCC 4.8.5 20150623 (Red Hat 4.8.5-44)] on linux
MySql server reports as "Server version: 8.0.19 MySQL Community Server - GPL"
From this snippet of code, the `DATA_TYPE` and `COLUMN_TYPE` returning byte string b'int'
Note names have been sanitized.
import mysql.connector
config = {
'user': 'user1',
'password': 'pass1',
'host': 'mysqlserver',
'database': 'foobar',
'charset': 'utf8mb4',
}
remote_conn1 = mysql.connector.connect(**config)
query = """SELECT COLUMN_NAME, DATA_TYPE, CHARACTER_MAXIMUM_LENGTH, COLUMN_TYPE,
NUMERIC_PRECISION, NUMERIC_SCALE
FROM information_schema.columns
WHERE TABLE_NAME = 'bar' AND TABLE_SCHEMA = 'foo'
AND COLUMN_NAME = 'id'
ORDER BY ORDINAL_POSITION
"""
remote_cur.execute(query)
rows = remote_cur.fetchall()
rows
remote_conn1.close()
>>> rows
[('id', b'int', None, b'int', 10, 0)]
-------------------------------
Some misc troubleshooting, mysql CLI, etc.
MySQL [(none)]> SHOW SESSION VARIABLES LIKE 'character\_set\_%';
+--------------------------+---------+
| Variable_name | Value |
+--------------------------+---------+
| character_set_client | utf8 |
| character_set_connection | utf8 |
| character_set_database | utf8mb4 |
| character_set_filesystem | binary |
| character_set_results | utf8 |
| character_set_server | utf8mb4 |
| character_set_system | utf8 |
+--------------------------+---------+
7 rows in set (0.03 sec)
MySQL [(none)]> SHOW SESSION VARIABLES LIKE 'collation\_%';
+----------------------+--------------------+
| Variable_name | Value |
+----------------------+--------------------+
| collation_connection | utf8_general_ci |
| collation_database | utf8mb4_0900_ai_ci |
| collation_server | utf8mb4_0900_ai_ci |
+----------------------+--------------------+
3 rows in set (0.04 sec)
MySQL [(none)]> SHOW CREATE TABLE foo.bar;
+---------------------------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| Table | Create Table |
+---------------------------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| bar | CREATE TABLE `bar` (
`id` int NOT NULL AUTO_INCREMENT,
`device_id` int DEFAULT NULL,
`s_device_id` varchar(255) COLLATE utf8_unicode_ci DEFAULT NULL,
`created_at` datetime DEFAULT NULL,
`updated_at` datetime DEFAULT NULL,
PRIMARY KEY (`id`),
) ENGINE=InnoDB AUTO_INCREMENT=720477 DEFAULT CHARSET=utf8 COLLATE=utf8_unicode_ci |
SELECT DATA_TYPE
FROM information_schema.columns
WHERE TABLE_NAME = 'bar' AND TABLE_SCHEMA = 'foo'
AND COLUMN_NAME = 'id'
ORDER BY ORDINAL_POSITION;
+-----------+
| DATA_TYPE |
+-----------+
| int |
+-----------+
Note, if i re-ran python snippet but specified `use_pure=True` when connecting, will NOT see byte strings anymore.
Suspecting issue is isolated to the C implementation.
Not seeing any charset or collation that suggests binary or byte strings should be returned, so only expecting unicode strings.
How to repeat:
Create MySql table:
CREATE TABLE `foo.bar` (
`id` int NOT NULL AUTO_INCREMENT,
`device_id` int DEFAULT NULL,
`s_device_id` varchar(255) COLLATE utf8_unicode_ci DEFAULT NULL,
`created_at` datetime DEFAULT NULL,
`updated_at` datetime DEFAULT NULL,
PRIMARY KEY (`id`),
) ENGINE=InnoDB AUTO_INCREMENT=720477 DEFAULT CHARSET=utf8 COLLATE=utf8_unicode_ci |
Run python snippet and see byte string b'int' returned:
import mysql.connector
config = {
'user': 'user1',
'password': 'pass1',
'host': 'mysqlserver',
'database': 'foobar',
'charset': 'utf8mb4',
}
remote_conn1 = mysql.connector.connect(**config)
query = """SELECT COLUMN_NAME, DATA_TYPE, CHARACTER_MAXIMUM_LENGTH, COLUMN_TYPE,
NUMERIC_PRECISION, NUMERIC_SCALE
FROM information_schema.columns
WHERE TABLE_NAME = 'bar' AND TABLE_SCHEMA = 'foo'
AND COLUMN_NAME = 'id'
ORDER BY ORDINAL_POSITION
"""
remote_cur.execute(query)
rows = remote_cur.fetchall()
rows
remote_conn1.close()
>>> rows
[('id', b'int', None, b'int', 10, 0)]
Suggested fix:
Query should return unicode strings instead of byte strings.