Bug #71909 Better support for raw/binary data
Submitted: 3 Mar 2014 11:28 Modified: 3 Nov 2014 15:14
Reporter: Daniël van Eeden (OCA) Email Updates:
Status: Closed Impact on me:
None 
Category:Connector / Python Severity:S4 (Feature request)
Version:1.0.10, 2.0.0-1 OS:Any
Assigned to: CPU Architecture:Any

[3 Mar 2014 11:28] Daniël van Eeden
Description:
I have a table create with:
CREATE TABLE t1 (value VARCHAR(100)) DEFAULT CHARSET latin1;
INSERT INTO t1 VALUES('Daniël'); -- Will be encoded as latin1/iso-8859-1
SET NAMES BINARY;
INSERT INTO t1 VALUES('Daniël'); -- Will be encoded as utf8

Then I try to create a Python script which detects wrongly encoded data, for that to work I need to fetch the data from the columns without any charset conversions happening.

First I tried:
conn.set_collation_charset('binary')
But doesn't work at all.

Then I tried:
conn.execute('SET NAMES BINARY')
That somewhat works.

In the end seems that setting raw=True on the connection does the trick.

How to repeat:
Try to fetch raw data from MySQL with Connector/Python

Suggested fix:
1. Make sure this is documented well.
http://dev.mysql.com/doc/connector-python/en/connector-python-connectargs.html
I would expect this to be mentioned in "Character Encoding"

https://dev.mysql.com/doc/connector-python/en/connector-python-api-mysqlconnection-set-cha...
I would expect a note about raw/binary here.

2. Enhance the error messages for conn.set_collation_charset('binary')
Or make sure to not do any conversion when the charset is binary.
[25 Aug 2014 12:11] MySQL Verification Team
Hello Daniël,

Thank you for the feature request!
Verified as described.

Thanks,
Umesh
[25 Aug 2014 12:14] MySQL Verification Team
//

// tried below steps

import mysql.connector

cnx = mysql.connector.connect(user='root', database='test')
cursor = cnx.cursor()
query = ("SELECT value from t1")      

cursor.execute(query)
cursor.execute('SET NAMES BINARY')

for (value) in cursor:
  print("{0}".format(value))

cursor.close()
cnx.close()

>>> import datetime
>>> import mysql.connector
>>>
>>> cnx = mysql.connector.connect(user='root', database='test')
>>> cursor = cnx.cursor()
>>>
>>> query = ("SELECT value from t1")
>>>
...
>>> cursor.execute(query)
>>>
>>> for (value) in cursor:
...   print("{0}".format(value))
...
(u'Dani\xebl',)
(u'Dani\xc3\xabl',)
>>> cursor.close()
True
>>> cnx.close()
>>>
>>> import datetime
>>> import mysql.connector
>>>
>>> cnx = mysql.connector.connect(user='root', database='test')
>>> cursor = cnx.cursor()
>>> cnx.set_collation_charset('binary')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
AttributeError: 'MySQLConnection' object has no attribute 'set_collation_charset'
>>> query = ("SELECT value from t1")
>>>
...
>>> cursor.execute(query)
>>>
>>> for (value) in cursor:
...   print("{0}".format(value))
...
(u'Dani\xebl',)
(u'Dani\xc3\xabl',)
>>> cursor.close()
True
>>> cnx.close()
>>>
[5 Sep 2014 9:07] Geert Vanderkelen
Posted by developer:
 
Workaround is to use raw=True, but we need to better handle the 'binary' character set.
[3 Nov 2014 15:14] Paul DuBois
Noted in 2.0.2, 2.1.1 changelogs.

When the character set was binary, character set conversion could
occur. Conversion is no longer done and binary data is returned as
is.