Bug #110422 Connector/Python throws UnicodeDecodeError on cursor.execute()
Submitted: 18 Mar 2023 12:05 Modified: 23 Jun 2023 20:13
Reporter: Friedrich Volkmann Email Updates:
Status: Closed Impact on me:
None 
Category:Connector / Python Severity:S2 (Serious)
Version:8.0.32 OS:Debian
Assigned to: CPU Architecture:Any

[18 Mar 2023 12:05] Friedrich Volkmann
Description:
Python version: 3.9

To narrow down the problem, I've simplified a complex query to:

sqlstr = "select 1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,(select group_concat(distinct concat_ws(' ',taxon,ifnull(b.epitheton, 'sp.')) order by b.taxon, isnull(b.epitheton), b.epitheton separator ', ') from bestimmung b where b.besuch_id=besuch.id and b.rang in (1,2) and not exists(select 0 from bestimmung b2)),1,1 from besuch, ort where ort.id=besuch.ort_id or 1=1 order by besuch.id"
cursor.execute(sqlstr, bindparams)

This leads to:
  File "[some filepath].py", line 732, in <module>
    cursor.execute(sqlstr, bindparams)
  File "/home/.local/lib/python3.9/site-packages/mysql/connector/cursor_cext.py", line 330, in execute
    result = self._cnx.cmd_query(
  File "/home/.local/lib/python3.9/site-packages/mysql/connector/connection_cext.py", line 632, in cmd_query
    return self.fetch_eof_columns()
  File "/home/.local/lib/python3.9/site-packages/mysql/connector/connection_cext.py", line 514, in fetch_eof_columns
    fields = prep_stmt.fetch_fields() if prep_stmt else self._cmysql.fetch_fields()
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xff in position 256: invalid start byte

Same for this:
sqlstr = "select 1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,(select group_concat(distinct concat_ws(' ',taxon,ifnull(b.epitheton, 'sp.')) order by b.taxon, isnull(b.epitheton), b.epitheton separator ', ') from bestimmung b where b.besuch_id=besuch.id and b.rang in (1,2) and not exists(select 0 from bestimmung b2)),1,1 from besuch, ort where ort.id in (besuch.ort_id, 123456789) order by besuch.id"

But all of these work:
sqlstr = "select 1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,(select group_concat(distinct concat_ws(' ',taxon,ifnull(b.epitheton, 'sp.')) order by b.taxon, isnull(b.epitheton), b.epitheton separator ', ') from bestimmung b where b.besuch_id=besuch.id and b.rang in (1,2) and not exists(select 0 from bestimmung b2)),1,1 from besuch, ort where ort.id=besuch.ort_id and 1=1 order by besuch.id"
sqlstr = "select 1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,(select group_concat(distinct concat_ws(' ',taxon,ifnull(b.epitheton, 'sp.')) order by b.taxon, isnull(b.epitheton), b.epitheton separator ', ') from bestimmung b where b.besuch_id=besuch.id and b.rang in (1,2) and not exists(select 0 from bestimmung b2)),1,1 from besuch, ort where ort.id=besuch.ort_id order by besuch.id"
sqlstr = "select 1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,(select group_concat(distinct concat_ws(' ',taxon,ifnull(b.epitheton, 'sp.')) order by b.taxon, isnull(b.epitheton), b.epitheton separator ', ') from bestimmung b where b.besuch_id=besuch.id and b.rang in (1,2)),1,1 from besuch, ort where ort.id=besuch.ort_id or 1=1 order by besuch.id"
sqlstr = "select 1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,(select group_concat(distinct concat_ws(' ',taxon,ifnull(b.epitheton, 'sp.')) order by b.taxon, isnull(b.epitheton), b.epitheton separator ', ') from bestimmung b where b.besuch_id=besuch.id and not exists(select 0 from bestimmung b2)),1,1 from besuch, ort where ort.id=besuch.ort_id or 1=1 order by besuch.id"
sqlstr = "select 1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,(select group_concat(distinct concat_ws(' ',taxon,ifnull(b.epitheton, 'sp.')) order by b.taxon, isnull(b.epitheton) separator ', ') from bestimmung b where b.besuch_id=besuch.id and b.rang in (1,2) and not exists(select 0 from bestimmung b2)),1,1 from besuch, ort where ort.id=besuch.ort_id or 1=1 order by besuch.id"
sqlstr = "select 1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,(select group_concat(distinct concat_ws(' ',taxon,ifnull(b.epitheton, 'sp.')) order by b.taxon, b.epitheton separator ', ') from bestimmung b where b.besuch_id=besuch.id and b.rang in (1,2) and not exists(select 0 from bestimmung b2)),1,1 from besuch, ort where ort.id=besuch.ort_id or 1=1 order by besuch.id"

connection_cext.py line 514 is:
fields = prep_stmt.fetch_fields() if prep_stmt else self._cmysql.fetch_fields()

prep_stmt is None, but further debugging is beyond my abilities.

Involved tables, as exported by phpMyAdmin:
CREATE TABLE besuch (
  id int(11) NOT NULL,
  user int(11) NOT NULL,
  angelegt_d datetime NOT NULL DEFAULT current_timestamp(),
  geaendert_d datetime NOT NULL DEFAULT current_timestamp() ON UPDATE current_timestamp(),
  hoehle int(11) DEFAULT NULL,
  besuch_d date NOT NULL,
  kommentar varchar(10000) DEFAULT NULL,
  besuch_t time DEFAULT NULL,
  ort_id int(11) DEFAULT NULL
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_general_ci;
CREATE TABLE ort (
  id int(11) NOT NULL,
  koosystem varchar(10) NOT NULL,
  rw double NOT NULL,
  hw double NOT NULL,
  kooabw int(6) DEFAULT NULL,
  sh int(4) DEFAULT NULL,
  land char(2) DEFAULT NULL,
  bl varchar(20) DEFAULT NULL,
  bezirk_id int(11) DEFAULT NULL,
  gemeinde_id int(11) DEFAULT NULL,
  beschreibung varchar(10000) CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci DEFAULT NULL,
  user int(11) NOT NULL
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_general_ci;
CREATE TABLE bestimmung (
  besuch_id int(11) NOT NULL,
  fund_i int(11) NOT NULL,
  i int(11) NOT NULL,
  user int(11) NOT NULL,
  angelegt_d datetime NOT NULL DEFAULT current_timestamp(),
  geaendert_d datetime NOT NULL DEFAULT current_timestamp() ON UPDATE current_timestamp(),
  taxon varchar(30) NOT NULL,
  kommentar varchar(1000) DEFAULT NULL,
  epitheton varchar(30) DEFAULT NULL,
  ssp varchar(20) DEFAULT NULL,
  methode char(1) DEFAULT NULL,
  detail char(1) DEFAULT NULL,
  bestimmer varchar(30) DEFAULT NULL,
  rang int(1) NOT NULL,
  geschlecht char(1) DEFAULT NULL
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_general_ci;

I haven't set the severety to the highest level as the bug seems to occur in rare cases only, but for me personally it conforms to the definition of an S1 bug because I haven't found a workaround that doesn't change the result set.

How to repeat:
I assume that creating the tables and a python script with those select statements will repeat the outcomes.
[24 Mar 2023 12:29] MySQL Verification Team
Hello Friedrich Volkmann,

Thank you for the bug report.
Could you please provide repeatable test case (please make it as private if you prefer) to confirm this issue at our end?

Regards,
Ashwini Patil
[7 Apr 2023 12:37] Friedrich Volkmann
sql script for table creation:

CREATE TABLE bugtest_table1 (
  id int(11) NOT NULL,
  ort_id int(11) DEFAULT NULL
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_general_ci;
CREATE TABLE bugtest_table2 (
  id int(11) NOT NULL
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_general_ci;
CREATE TABLE bugtest_table3 (
  besuch_id int(11) NOT NULL,
  taxon varchar(30) NOT NULL,
  epitheton varchar(30) DEFAULT NULL,
  rang int(1) NOT NULL
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_general_ci;

python script (with login data obfuscated):

#!/usr/bin/python3
import mysql.connector
from mysql.connector import errorcode
CONF_DB_NAME='...'
CONF_DB_USER='...'
CONF_DB_PASS='...'
biodb = mysql.connector.connect(database=CONF_DB_NAME, user=CONF_DB_USER, password=CONF_DB_PASS)
cursor = biodb.cursor()
sqlstr = "select (select group_concat(distinct concat_ws(' ',taxon,ifnull(t3.epitheton, 'sp.')) order by t3.taxon, isnull(t3.epitheton), t3.epitheton separator ', ') from bugtest_table3 t3 where t3.besuch_id=bugtest_table1.id and t3.rang in (1,2) and not exists(select 0 from bugtest_table3 b3)),1,1 from bugtest_table1, bugtest_table2 where bugtest_table2.id=bugtest_table1.ort_id order by bugtest_table1.id"
cursor.execute(sqlstr)
cursor.close()
biodb.close()

output:

Traceback (most recent call last):
  File "/home/www/volki.at/fund/mysql-bug/./bugtest.py", line 10, in <module>
    cursor.execute(sqlstr)
  File "/home/.local/lib/python3.9/site-packages/mysql/connector/cursor_cext.py", line 330, in execute
    result = self._cnx.cmd_query(
  File "/home/.local/lib/python3.9/site-packages/mysql/connector/connection_cext.py", line 632, in cmd_query
    return self.fetch_eof_columns()
  File "/home/.local/lib/python3.9/site-packages/mysql/connector/connection_cext.py", line 514, in fetch_eof_columns
    fields = prep_stmt.fetch_fields() if prep_stmt else self._cmysql.fetch_fields()
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xff in position 256: invalid start byte

I hope that's repeatable enough.

As a sidenote, if I use shorter table names t1,t2,t3 instead of bugtest_table1,bugtest_table2,bugtest_table3, then the error does not occur. That makes the problem seem like an internal string size issue.
[11 Apr 2023 11:43] MySQL Verification Team
Hello Friedrich Volkmann,

Thank you for the bug report.
Verified as described.

Regards,
Ashwini Patil
[23 Jun 2023 20:13] Philip Olson
Posted by developer:
 
Fixed as of the upcoming MySQL Connector/Python 8.1.0 release, and here's the proposed changelog entry from the documentation team:

A UnicodeDecodeError error was raised when using a complex query that
produced a long field name alias.

Thank you for the bug report.