Bug #69710 error with GBK string 赵孟頫
Submitted: 10 Jul 2013 11:37 Modified: 7 Nov 2013 15:09
Reporter: jim green green
Status: Closed
Category: Connector / Python Severity: S3 (Non-critical)
Version: 1.0.10 OS: Mac OS X
Assigned to: Geert Vanderkelen CPU Architecture: Any

[10 Jul 2013 11:37] jim green green
Description:
Here is the Python code:
>>> s = u'赵孟頫'.encode('gbk')
>>> s
'\xd5\xd4\xc3\xcf\xee\\'

The last byte of the GBK string 赵孟頫 is \x5c, the same byte value as a backslash. It causes an SQL syntax error:

mysql.connector.errors.ProgrammingError: 1064 (42000): You have an error in your SQL syntax; check the manual that corresponds to your MySQL server version for the right syntax to use near ''?????\\')' at line 4
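A minimal Python 3 sketch of why a trailing 0x5c byte can break the statement, assuming the value is escaped byte by byte while the server parses the literal as GBK. The helpers naive_escape and find_closing_quote are hypothetical illustrations, not Connector/Python code. In GBK, 0xEE 0x5C is one character (頫), so when the 0x5c is doubled, a GBK-aware parser pairs the first backslash with 0xEE. The second backslash then escapes the closing quote, and the string literal never ends.

```python
# Python 3 sketch; the report itself used Python 2 byte strings.
raw = "赵孟頫".encode("gbk")
print(raw)                     # b'\xd5\xd4\xc3\xcf\xee\\'
print(raw[-2:].decode("gbk"))  # 頫: 0xEE 0x5C is a single GBK character


def naive_escape(data: bytes) -> bytes:
    """Hypothetical byte-wise escaper: doubles every 0x5c byte and escapes
    0x27, without knowing that 0x5c can be the second byte of a GBK char."""
    return data.replace(b"\\", b"\\\\").replace(b"'", b"\\'")


def find_closing_quote(literal: bytes, gbk_aware: bool) -> int:
    """Return the index of the quote that closes a '...' literal, or -1
    if the literal never terminates. With gbk_aware=True, lead bytes
    0x81-0xFE consume the following byte as part of the same character."""
    i = 1  # skip the opening quote
    while i < len(literal):
        b = literal[i]
        if gbk_aware and 0x81 <= b <= 0xFE:
            i += 2  # two-byte GBK character, whatever its second byte is
        elif b == 0x5C:
            i += 2  # backslash escapes the next byte
        elif b == 0x27:
            return i  # unescaped quote ends the literal
        else:
            i += 1
    return -1


literal = b"'" + naive_escape(raw) + b"'"
print(find_closing_quote(literal, gbk_aware=False))  # 8: byte-wise view sees a closed string
print(find_closing_quote(literal, gbk_aware=True))   # -1: the closing quote is escaped
```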

My Python code is:
# self.db is a mysql.connector connection object
sql = '''
    INSERT INTO scraped_products(
        site_prd_id,site_id,brand)
    VALUES(
        %(site_prd_id)s,%(site_id)s,%(brand)s)
    '''
dat = {
    'site_prd_id' : 'test',
    'site_id' : 1,

    'brand' : u'赵孟頫'.encode('gbk'),
}
self.db.ping(True, 3, 1)
self.db.cursor().execute(sql, dat)

How to repeat:
I don't know...
[10 Jul 2013 11:52] Geert Vanderkelen
Hi Jim,

Thanks for reporting this bug. Good thing I'm monitoring Stack Overflow. :)

Below is a smaller test case showing how to repeat it.

-Geert

# -*- coding: utf-8 -*-

import mysql.connector

cnx = mysql.connector.connect(database='test', user='root',
                              charset='gbk', use_unicode=False)
cur = cnx.cursor()

cur.execute("DROP TABLE IF EXISTS gbktest")
table = (
    "CREATE TABLE gbktest ("
    "id INT AUTO_INCREMENT KEY, "
    "c1 VARCHAR(40)"
    ") CHARACTER SET 'gbk'"
)
cur.execute(table)

data = {
    'c1' : u'赵孟頫'.encode('gbk'),
}
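# On Connector/Python 1.0.10 this INSERT raises ProgrammingError 1064:
# the final GBK byte of u'赵孟頫' is 0x5c, the same as a backslash
cur.execute("INSERT INTO gbktest (c1) VALUES (%(c1)s)", data)

cur.execute("DROP TABLE IF EXISTS gbktest")
cnx.close()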
[7 Nov 2013 15:09] Paul Dubois
Noted in the 1.1.3 changelog.

There was a problem saving data containing the backslash character (byte
0x5c) when using multi-byte character sets such as sjis, big5, or gbk. To
handle this, there is a new HexLiteral type. When a backslash is found in
sjis, big5, or gbk data, the string is sent to MySQL as a hexadecimal
literal.
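The changelog describes the fix only at a high level. Below is a minimal Python 3 sketch of the idea. It is not the actual Connector/Python HexLiteral implementation: the helper to_sql_literal and the MULTIBYTE_CHARSETS set are hypothetical. When a value in one of the named charsets contains a 0x5c byte, it is rendered as a MySQL hexadecimal literal (X'...'), which contains no escape sequences for the server to misparse. Other values get ordinary backslash escaping.

```python
# Hypothetical: the charsets named in the changelog entry.
MULTIBYTE_CHARSETS = {"sjis", "big5", "gbk"}


def to_sql_literal(data: bytes, charset: str) -> str:
    """Hypothetical helper: render encoded bytes as a SQL string literal.

    Values in a listed multi-byte charset that contain a 0x5c byte are
    sent as X'..' hex literals; everything else is backslash-escaped.
    This is a minimal sketch, not a complete escaper (NUL, newlines,
    etc. are not handled)."""
    if charset.lower() in MULTIBYTE_CHARSETS and b"\\" in data:
        # Hex digits only: nothing the server could read as an escape.
        return "X'" + data.hex().upper() + "'"
    escaped = data.replace(b"\\", b"\\\\").replace(b"'", b"\\'")
    return "'" + escaped.decode(charset) + "'"


print(to_sql_literal("赵孟頫".encode("gbk"), "gbk"))  # X'D5D4C3CFEE5C'
print(to_sql_literal("赵孟".encode("gbk"), "gbk"))    # '赵孟' (no 0x5c byte)
```

With this approach the value from Geert's test case would be sent as X'D5D4C3CFEE5C'. MySQL treats a hexadecimal literal in string context as a binary string, so the bytes should reach the gbk column unchanged.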