Bug #111427 Query on information_schema.columns is returning byte string instead of unicode
Submitted: 15 Jun 2023 2:56 Modified: 30 Aug 2023 22:40
Reporter: David L Email Updates:
Status: Closed Impact on me:
None 
Category:Connector / Python Severity:S3 (Non-critical)
Version:v8.0.32, 8.0.33 OS:CentOS
Assigned to: CPU Architecture:Any

[15 Jun 2023 2:56] David L
Description:

Query on information_schema.columns is returning byte string for two columns, but this is unexpected.

I am using MySQL Connector/Python v8.0.32 and connecting with charset=utf8mb4

OS is Centos 7
Python 3.10.4 (main, May 19 2022, 15:06:50) [GCC 4.8.5 20150623 (Red Hat 4.8.5-44)] on linux

MySql server reports as "Server version: 8.0.19 MySQL Community Server - GPL"

From this snippet of code, the `DATA_TYPE` and `COLUMN_TYPE` returning byte string b'int'

Note names have been sanitized.

import mysql.connector
config = {
  'user': 'user1',
  'password': 'pass1',
  'host': 'mysqlserver',
  'database': 'foobar',
  'charset': 'utf8mb4',
}
remote_conn1 = mysql.connector.connect(**config)

query = """SELECT COLUMN_NAME, DATA_TYPE, CHARACTER_MAXIMUM_LENGTH, COLUMN_TYPE, 
                NUMERIC_PRECISION, NUMERIC_SCALE
           FROM information_schema.columns
           WHERE TABLE_NAME = 'bar' AND TABLE_SCHEMA = 'foo'
           AND COLUMN_NAME = 'id'
           ORDER BY ORDINAL_POSITION
        """ 
remote_cur.execute(query)
rows = remote_cur.fetchall()
rows
remote_conn1.close()

>>> rows
[('id', b'int', None, b'int', 10, 0)]

-------------------------------

Some misc troubleshooting,  mysql CLI, etc.

MySQL [(none)]> SHOW SESSION VARIABLES LIKE 'character\_set\_%';
+--------------------------+---------+
| Variable_name            | Value   |
+--------------------------+---------+
| character_set_client     | utf8    |
| character_set_connection | utf8    |
| character_set_database   | utf8mb4 |
| character_set_filesystem | binary  |
| character_set_results    | utf8    |
| character_set_server     | utf8mb4 |
| character_set_system     | utf8    |
+--------------------------+---------+
7 rows in set (0.03 sec)

MySQL [(none)]> SHOW SESSION VARIABLES LIKE 'collation\_%';
+----------------------+--------------------+
| Variable_name        | Value              |
+----------------------+--------------------+
| collation_connection | utf8_general_ci    |
| collation_database   | utf8mb4_0900_ai_ci |
| collation_server     | utf8mb4_0900_ai_ci |
+----------------------+--------------------+
3 rows in set (0.04 sec)

MySQL [(none)]> SHOW CREATE TABLE foo.bar;
+---------------------------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| Table                     | Create Table                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                  |
+---------------------------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| bar                       | CREATE TABLE `bar` (
  `id` int NOT NULL AUTO_INCREMENT,
  `device_id` int DEFAULT NULL,
  `s_device_id` varchar(255) COLLATE utf8_unicode_ci DEFAULT NULL,
  `created_at` datetime DEFAULT NULL,
  `updated_at` datetime DEFAULT NULL,
  PRIMARY KEY (`id`),
) ENGINE=InnoDB AUTO_INCREMENT=720477 DEFAULT CHARSET=utf8 COLLATE=utf8_unicode_ci |

SELECT DATA_TYPE
           FROM information_schema.columns
           WHERE TABLE_NAME = 'bar' AND TABLE_SCHEMA = 'foo'
           AND COLUMN_NAME = 'id'
           ORDER BY ORDINAL_POSITION;
+-----------+
| DATA_TYPE |
+-----------+
| int       |
+-----------+

Note, if i re-ran python snippet but specified `use_pure=True` when connecting, will NOT see byte strings anymore.
Suspecting issue is isolated to the C implementation.

Not seeing any charset or collation that suggests binary or byte strings should be returned, so only expecting unicode strings.

How to repeat:

Create MySql table:

CREATE TABLE `foo.bar` (
  `id` int NOT NULL AUTO_INCREMENT,
  `device_id` int DEFAULT NULL,
  `s_device_id` varchar(255) COLLATE utf8_unicode_ci DEFAULT NULL,
  `created_at` datetime DEFAULT NULL,
  `updated_at` datetime DEFAULT NULL,
  PRIMARY KEY (`id`),
) ENGINE=InnoDB AUTO_INCREMENT=720477 DEFAULT CHARSET=utf8 COLLATE=utf8_unicode_ci |

Run python snippet and see byte string b'int' returned:

import mysql.connector
config = {
  'user': 'user1',
  'password': 'pass1',
  'host': 'mysqlserver',
  'database': 'foobar',
  'charset': 'utf8mb4',
}
remote_conn1 = mysql.connector.connect(**config)

query = """SELECT COLUMN_NAME, DATA_TYPE, CHARACTER_MAXIMUM_LENGTH, COLUMN_TYPE, 
                NUMERIC_PRECISION, NUMERIC_SCALE
           FROM information_schema.columns
           WHERE TABLE_NAME = 'bar' AND TABLE_SCHEMA = 'foo'
           AND COLUMN_NAME = 'id'
           ORDER BY ORDINAL_POSITION
        """ 
remote_cur.execute(query)
rows = remote_cur.fetchall()
rows
remote_conn1.close()

>>> rows
[('id', b'int', None, b'int', 10, 0)]

Suggested fix:
Query should return unicode strings instead of byte strings.
[15 Jun 2023 8:21] MySQL Verification Team
Hello David L,

Thank you for the report and feedback.
Verified as described.

regards,
Umesh
[15 Jun 2023 11:43] Nuno Mariz
Posted by developer:
 
Bug verified when using the C extension (default).
Here's the suggested fix:

diff --git a/src/mysql_capi.c b/src/mysql_capi.c
index 0f50f85e..5dad3aab 100644
--- a/src/mysql_capi.c
+++ b/src/mysql_capi.c
@@ -2762,7 +2762,8 @@ MySQL_fetch_row(MySQL *self)
             PyTuple_SET_ITEM(result_row, i, mytopy_bit(row[i], field_lengths[i]));
         }
         else if (field_type == MYSQL_TYPE_BLOB) {
-            if ((field_flags & BLOB_FLAG) && (field_flags & BINARY_FLAG)) {
+            if ((field_flags & BLOB_FLAG) &&
+                (field_flags & BINARY_FLAG) && field_charsetnr == 63) {
                 value = PyBytes_FromStringAndSize(row[i], field_lengths[i]);
             }
             else {
[30 Aug 2023 22:40] Philip Olson
Posted by developer:
 
Fixed as of the upcoming MySQL Connector/Python 8.2.0 release, and here's the proposed changelog entry from the documentation team:

With the C extension, querying information_schema.columns returned byte
strings instead of unicode strings in the result values.

Thank you for the bug report.