Bug #75077 UnicodeDecodeError for binary data in SHOW ENGINE INNODB STATUS
Submitted: 2 Dec 2014 19:27 Modified: 9 Apr 2015 14:00
Reporter: Ed Dawley
Status: Verified
Category: Connector / Python Severity: S3 (Non-critical)
Version: 2.0.2 OS: Any
Assigned to: CPU Architecture: Any

[2 Dec 2014 19:27] Ed Dawley
Description:
Tested under Python 2.6. Should affect 2.7 as well.

When querying for InnoDB engine status, a UnicodeDecodeError is raised if the output contains invalid UTF-8 data. This appears to happen because the result of such a statement has no table/column charset defined.

The current workaround is to set use_unicode=False and manually call .decode("utf-8", "replace") on each text column of a result set. This is inefficient, since MySQL Connector/Python is already transforming each column.
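For illustration, the manual decoding step of this workaround might look like the sketch below. The helper name decode_row is hypothetical and not part of the connector; it assumes the connection was opened with use_unicode=False, so text columns arrive as bytes-like values.

```python
def decode_row(row, charset="utf-8", errors="replace"):
    """Decode every bytes-like column of one result row.

    Hypothetical helper, not part of mysql.connector. Invalid byte
    sequences are replaced with U+FFFD instead of raising
    UnicodeDecodeError. Non-bytes columns (ints, None, ...) pass through.
    """
    return tuple(
        bytes(col).decode(charset, errors)
        if isinstance(col, (bytes, bytearray))
        else col
        for col in row
    )


if __name__ == "__main__":
    # Stand-in for a row fetched with use_unicode=False; 0xc3 followed by
    # "(" is not a valid UTF-8 sequence.
    print(decode_row((b"InnoDB", b"", b"bad \xc3( data")))
```

With a real cursor, this would be applied as [decode_row(r) for r in cursor.fetchall()].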

ed@dawley ~$ python utf8.py
Traceback (most recent call last):
  File "utf8.py", line 42, in <module>
    print cursor.fetchall()
  File "/usr/local/ed/packrat/lib/python2.6/site-packages/mysql/connector/cursor.py", line 827, in fetchall
    for row in rows]
  File "/usr/local/ed/packrat/lib/python2.6/site-packages/mysql/connector/conversion.py", line 390, in row_to_python
    result[i] = self._cache_field_types[field_type](row[i], field)
  File "/usr/local/ed/packrat/lib/python2.6/site-packages/mysql/connector/conversion.py", line 548, in _STRING_to_python
    return value.decode(self.charset)
  File "/usr/local/ed/packrat/lib64/python2.6/encodings/utf_8.py", line 16, in decode
    return codecs.utf_8_decode(input, errors, True)
UnicodeDecodeError: 'utf8' codec can't decode byte 0xc3 in position 1087: invalid continuation byte

How to repeat:
See attached test file

Suggested fix:
Have the connector either ignore or replace invalid UTF-8 byte sequences.
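As a rough sketch of what the two options would mean, the snippet below decodes a made-up byte string with Python's standard error handlers. The sample bytes are an assumption chosen to trigger the same "invalid continuation byte" condition as in the traceback; this is not the connector's code.

```python
# Hypothetical sample: 0xc3 starts a two-byte UTF-8 sequence, but the next
# byte (a space) is not a valid continuation byte.
sample = b"key \xc3 value"

# Default (strict) decoding raises, as in the traceback above.
try:
    sample.decode("utf-8")
    strict_reason = None
except UnicodeDecodeError as exc:
    strict_reason = exc.reason

# "replace" substitutes U+FFFD for the bad byte; "ignore" drops it silently.
replaced = sample.decode("utf-8", "replace")
ignored = sample.decode("utf-8", "ignore")

print(strict_reason)
print(repr(replaced))
print(repr(ignored))
```

"replace" keeps a visible marker where data was lost, which is usually easier to debug than "ignore".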
[2 Dec 2014 19:28] Ed Dawley
Creates a foreign key error with invalid utf8

Attachment: utf8.py (text/x-python-script), 1.24 KiB.

[2 Dec 2014 19:31] Ed Dawley
Entered incorrect version.
[3 Dec 2014 6:11] Umesh Shastry
Hello Ed D,

Thank you for the report and test case.

Thanks,
Umesh
[3 Dec 2014 6:11] Umesh Shastry
// On local test box
// Env details

[root@cluster-repo mysql-advanced-5.6.23]# rpm -qa|grep mysql-connector
mysql-connector-python-commercial-2.0.2-1.el6.noarch
[root@cluster-repo mysql-advanced-5.6.23]#
[root@cluster-repo mysql-advanced-5.6.23]# python --version
Python 2.6.6

// Used provided test case
[root@cluster-repo mysql-advanced-5.6.23]# python test.py
Traceback (most recent call last):
  File "test.py", line 42, in <module>
    print cursor.fetchall()
  File "/mysql/connector/cursor.py", line 827, in fetchall
  File "/mysql/connector/conversion.py", line 390, in row_to_python
  File "/mysql/connector/conversion.py", line 548, in _STRING_to_python
  File "/usr/lib64/python2.6/encodings/utf_8.py", line 16, in decode
    return codecs.utf_8_decode(input, errors, True)
UnicodeDecodeError: 'utf8' codec can't decode byte 0xc3 in position 1051: invalid continuation byte
[root@cluster-repo mysql-advanced-5.6.23]#
[20 Feb 2015 8:08] Peeyush Gupta
The problem seems to be with the server: when the output of the show status command contains characters that cannot be decoded, the server should send some information about the kind of data it is sending. Checking the data for characters of this kind is probably not a good idea.

Another workaround is to use a raw cursor for queries of this kind.
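A minimal sketch of the raw cursor approach, assuming cnx is an already open mysql.connector connection. The function name fetch_innodb_status is hypothetical. With raw=True the connector skips its own conversion, so the caller decodes the values and can pick the error handler.

```python
def fetch_innodb_status(cnx, charset="utf-8"):
    """Run SHOW ENGINE INNODB STATUS through a raw cursor.

    Hypothetical helper; `cnx` is assumed to be an open mysql.connector
    connection. Raw cursors return unconverted values, so the decoding
    happens here with errors="replace" instead of the strict default.
    """
    cursor = cnx.cursor(raw=True)
    try:
        cursor.execute("SHOW ENGINE INNODB STATUS")
        rows = cursor.fetchall()
    finally:
        cursor.close()
    return [
        tuple(
            bytes(col).decode(charset, "replace") if col is not None else None
            for col in row
        )
        for row in rows
    ]
```

The statement returns three columns (Type, Name, Status), so the status text is the third item of the first row.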
[21 Mar 2015 1:00] Bugs System
No feedback was provided for this bug for over a month, so it is
being suspended automatically. If you are able to provide the
information that was originally requested, please do so and change
the status of the bug back to "Open".
[23 Mar 2015 15:35] Ed Dawley
The mysql client is able to handle this with a local character set of utf8. The Python bindings should as well. Using raw mode is a poor workaround.