Bug #94944 Optimistic utf8 decoding erroneously decodes binary data.
Submitted: 8 Apr 2019 19:07 Modified: 9 Apr 2019 11:57
Reporter: Marius _
Status: Verified
Category:Connector / Python Severity:S2 (Serious)
Version:8.0.15, 8.0.11 OS:Any
Assigned to: CPU Architecture:Any

[8 Apr 2019 19:07] Marius _
Description:
Hi guys,

This bug is effectively described in this GitHub comment: https://github.com/mysql/mysql-connector-python/commit/413817a9025af99b453a44ebc864e80895c...

Starting with Connector/Python 8.0.11, all `BINARY` data is optimistically decoded as UTF-8 by the `_STRING_to_python` function. This leads to bizarre behavior: you'd expect a `bytearray` back from a `BINARY` column, but the value comes out as `unicode`. This happens in the edge case where a binary value accidentally decodes successfully as UTF-8.
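To make the mechanism concrete, here is a minimal sketch of "try UTF-8 first, fall back to bytes" decoding. The function name `optimistic_decode` is hypothetical and only approximates the behavior described above; it is not the actual `_STRING_to_python` source. It shows that the UUID from the repro happens to be valid UTF-8, while a similar 16-byte value is not, so the same column can yield two different Python types.

```python
import uuid


def optimistic_decode(value: bytes):
    """Hypothetical approximation of optimistic decoding (not the connector's code).

    Try UTF-8 first; fall back to a bytearray only if decoding fails.
    """
    try:
        return value.decode("utf-8")
    except UnicodeDecodeError:
        return bytearray(value)


# The UUID from the report: its 16 raw bytes happen to form valid UTF-8.
lucky = uuid.UUID("56283a26-2d44-11e9-bb8f-06773227025c").bytes
# 0xff can never appear in UTF-8, so this value falls back to bytes.
unlucky = uuid.UUID("ffffffff-2d44-11e9-bb8f-06773227025c").bytes

print(type(optimistic_decode(lucky)).__name__)    # str
print(type(optimistic_decode(unlucky)).__name__)  # bytearray
```

Whether a given value comes back as text or bytes depends only on its content, which is why the failure looks random in practice.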

How to repeat:
Consider this ID:

my_id = uuid.UUID('56283a26-2d44-11e9-bb8f-06773227025c')

Insert it into a `BINARY(16)` column:

cursor.execute('insert into `Table` (my_id) values (%s)', (my_id.bytes,))

Then select it out (with a dictionary cursor, e.g. cnx.cursor(dictionary=True)):

cursor.execute('select my_id from `Table`')
my_id = cursor.fetchall()[0]['my_id']
print(type(my_id))  # --> `unicode` (`str` on Python 3) instead of `bytearray`
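Until the connector behavior changes, application code can defensively normalize whatever comes back. This is a minimal sketch assuming a `str` result was produced by UTF-8 decoding of the raw column bytes, as in the report; `to_uuid` is a hypothetical helper, not part of Connector/Python. Because Python's UTF-8 codec is strict, re-encoding such a string restores the original 16 bytes exactly.

```python
import uuid
from typing import Union


def to_uuid(value: Union[bytes, bytearray, str]) -> uuid.UUID:
    """Hypothetical helper: turn a BINARY(16) column value back into a UUID.

    Assumes a str value is the UTF-8 decoding of the raw column bytes.
    """
    if isinstance(value, str):
        # Strict UTF-8 round-trips exactly, so this recovers the original bytes.
        value = value.encode("utf-8")
    return uuid.UUID(bytes=bytes(value))


original = uuid.UUID("56283a26-2d44-11e9-bb8f-06773227025c")
as_str = original.bytes.decode("utf-8")  # what the connector may return
as_bytes = bytearray(original.bytes)     # what it returns otherwise

print(to_uuid(as_str) == original, to_uuid(as_bytes) == original)  # True True
```

This only papers over the symptom; if the connection character set is not UTF-8, the assumption above does not hold.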

Suggested fix:
The commit that introduced this behavior was presumably meant to fix this bug: https://bugs.mysql.com/bug.php?id=83516

While it fixes that bug, it introduces this application-breaking behavior. Previously, `BINARY(16)` data always came back as `bytearray`; now it is a mix of `bytearray` and `unicode`, depending on whether the optimistic conversion happens to succeed.

The logic should be fixed so that binary data is not optimistically decoded to strings. I'm not deeply familiar with the code, but perhaps this decoding should only happen for JSON and not for any other binary data.
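One way to express this suggestion is to choose the result type from the column metadata instead of from whether decoding happens to succeed. The sketch below is hypothetical and is not the connector's API: `convert_string_column` is an illustrative name, it uses the MySQL protocol values `MYSQL_TYPE_JSON` (245), `MYSQL_TYPE_STRING` (254) and the `binary` character set id (63), and it simplifies non-binary text to UTF-8 where a real driver would use the connection's character set.

```python
MYSQL_TYPE_JSON = 245    # protocol column type for JSON
MYSQL_TYPE_STRING = 254  # protocol column type used for CHAR/BINARY
BINARY_CHARSET_ID = 63   # the 'binary' character set/collation id


def convert_string_column(value: bytes, field_type: int, charset_id: int):
    """Hypothetical converter: the result type depends only on column metadata."""
    if field_type == MYSQL_TYPE_JSON:
        # Checked first: JSON may be reported with the binary charset,
        # but MySQL stores JSON text as utf8mb4.
        return value.decode("utf-8")
    if charset_id == BINARY_CHARSET_ID:
        # BINARY/VARBINARY/BLOB: always bytes, even if the content looks like text.
        return bytearray(value)
    # Simplification: assume a UTF-8 connection character set for text columns.
    return value.decode("utf-8")


uid = bytes.fromhex("56283a262d4411e9bb8f06773227025c")
print(type(convert_string_column(uid, MYSQL_TYPE_STRING, BINARY_CHARSET_ID)).__name__)  # bytearray
```

The design choice is that two rows of the same column can never come back with different types.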
[9 Apr 2019 11:57] MySQL Verification Team
Hello Marius,

Thank you for the report.

regards,
Umesh
[1 Apr 19:00] William OLLIVIER
The issue is definitely present in the latest version of the Python connector (8.0.22 at the time of writing), and it is randomly affecting our production systems.

Note that this does not happen for every UUID (see the OP's example).

This is clearly a regression, as it worked fine with version 8.0.6. Unfortunately, we can no longer use 8.0.6 because it is not compatible with newer versions of SSL.

This should really be looked into: this is the official MySQL connector, the bug was reported a year ago, and it still isn't fixed.

I'll attach a more complete example that reproduces the issue, including the SQLAlchemy errors that follow from it.
[1 Apr 19:02] William OLLIVIER
Minimal example with SQLAlchemy that shows the problem precisely

Attachment: mcre.py (text/x-python), 1.52 KiB.