Bug #67306 Connector/Python: String concatenation in the library causes UnicodeDecodeError
Submitted: 20 Oct 2012 10:58 Modified: 1 Dec 2012 0:35
Reporter: Prasanna Santhanam
Status: Closed
Category: Connector / Python Severity: S2 (Serious)
Version: 1.0.7 OS: Any
Assigned to: Geert Vanderkelen CPU Architecture: Any
Tags: connector, MySQL, python

[20 Oct 2012 10:58] Prasanna Santhanam
When the username is passed to the connector as a unicode object, connection attempts fail because concatenating unicode and byte-string (str) values raises UnicodeDecodeError:

    def _prepare_auth(self, usr, pwd, db, flags, seed):
        """Prepare elements of the authentication packet"""
        if usr is not None and len(usr) > 0:
            # Python 2: a unicode usr makes _username unicode as well
            _username = usr + '\x00'
        else:
            _username = '\x00'

        if pwd is not None and len(pwd) > 0:
            # 20 = length of the scrambled password that follows (raw bytes)
            _password = utils.int1store(20) +\
                self._scramble_password(pwd, seed)
        else:
            _password = '\x00'

        if db is not None and len(db):
            _database = db + '\x00'
        else:
            _database = '\x00'

        return (_username, _password, _database)

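In _prepare_auth above, if the username is a unicode object rather than a plain str, _username is unicode as well. The concatenation that builds the authentication packet in make_auth (below) then fails, because the other operands (the packed integers, the padding, the scrambled password and the database name) are of type str, and Python 2 implicitly decodes str operands as ASCII when mixing them with unicode. Any byte above 0x7F, which the scrambled password routinely contains, raises the error.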
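The scrambled text is a 20-byte binary token, which is why _prepare_auth writes utils.int1store(20) in front of it. As a minimal sketch, assuming _scramble_password implements the standard mysql_native_password scheme, the token can be computed as below. scramble_password and the random seed are illustrative stand-ins, not the library's code, and the sketch targets Python 3. Because the result is raw SHA1 output, it almost always contains bytes above 0x7F.

```python
import hashlib
import os


def scramble_password(password, seed):
    """Assumed mysql_native_password token: SHA1(pwd) XOR SHA1(seed + SHA1(SHA1(pwd)))."""
    stage1 = hashlib.sha1(password).digest()
    stage2 = hashlib.sha1(stage1).digest()
    mixed = hashlib.sha1(seed + stage2).digest()
    # Iterating over bytes yields ints in Python 3, so XOR them pairwise.
    return bytes(a ^ b for a, b in zip(stage1, mixed))


seed = os.urandom(20)  # stand-in for the server's 20-byte challenge
token = scramble_password(b"secret", seed)
print(len(token))  # 20, the value written by utils.int1store(20)
```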

    def make_auth(self, seed, username=None, password=None, database=None,
        # ... (remaining parameters and setup code omitted in this excerpt)
        auth = self._prepare_auth(username, password, database,
                                  client_flags, seed)
        # Python 2: if auth[0] is unicode, the str operands are ASCII-decoded here
        return utils.int4store(client_flags) +\
               utils.int4store(max_allowed_packet) +\
               utils.int1store(charset) +\
               '\x00' * 23 + auth[0] + auth[1] + auth[2]
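make_auth simply concatenates fixed-width integers, 23 bytes of padding and the three fields from _prepare_auth, so every operand must be of the same type. A minimal Python 3 sketch of that layout, assuming utils.int4store and utils.int1store are little-endian packers equivalent to struct.pack("<I") and struct.pack("<B"). make_auth_packet is a hypothetical name, and its type check makes the bytes-only contract explicit instead of relying on implicit coercion.

```python
import struct


def make_auth_packet(client_flags, max_allowed_packet, charset,
                     username, password_token, database):
    """Hypothetical bytes-only version of the layout built by make_auth."""
    for part in (username, password_token, database):
        if not isinstance(part, bytes):
            raise TypeError("expected bytes, got %s" % type(part).__name__)
    user_field = username + b"\x00"  # NUL-terminated, b"\x00" if empty
    # Length-prefixed token; an empty token yields b"\x00", as in _prepare_auth.
    pwd_field = struct.pack("<B", len(password_token)) + password_token
    db_field = database + b"\x00"
    return (struct.pack("<I", client_flags)          # int4store(client_flags)
            + struct.pack("<I", max_allowed_packet)  # int4store(max_allowed_packet)
            + struct.pack("<B", charset)             # int1store(charset)
            + b"\x00" * 23                           # reserved padding
            + user_field + pwd_field + db_field)


packet = make_auth_packet(0x0003A685, 16 * 1024 * 1024, 33,
                          b"scott", bytes(20), b"test")
print(len(packet))  # 64: 32 fixed bytes + 6 + 21 + 5
```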

One encounters something similar to:
UnicodeDecodeError: 'ascii' codec can't decode byte 0xa2 in position 1: ordinal not in range(128)
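Python 2 concatenates a unicode object with a str by first decoding the str with the default codec, ASCII, so whether a connection succeeds depends on whether the packed integers and the scrambled password happen to contain a byte above 0x7F. The sketch below emulates that coercion explicitly so it also runs under Python 3; py2_concat is a hypothetical helper, and the byte 0xa2 at index 1 is chosen only to mirror the reported message.

```python
def py2_concat(text, raw):
    """Emulate Python 2's unicode + str: the str operand is ASCII-decoded first."""
    return text + raw.decode("ascii")


# ASCII-only bytes coerce without complaint...
ok = py2_concat(u"scott\x00", b"\x14")

# ...but a byte above 0x7F, as found in the scrambled password, does not.
message = None
try:
    py2_concat(u"scott\x00", b"\x14\xa2")
except UnicodeDecodeError as exc:
    message = str(exc)
print(message)
# 'ascii' codec can't decode byte 0xa2 in position 1: ordinal not in range(128)
```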

How to repeat:
Supply a unicode object as the username when opening a connection under Python 2.
[1 Nov 2012 11:50] Geert Vanderkelen
Thanks for reporting this problem. We do need to encode the username and password when they are unicode objects.
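One way to act on this, sketched under the assumption that UTF-8 matches the connection's character set, is to encode the username, password and database to bytes before any concatenation. to_bytes and prepare_auth_fields are hypothetical names and not the change that shipped in 1.0.8; the encoded password would then be handed to the scramble. The code runs unchanged under Python 2 (where bytes is an alias for str) and Python 3.

```python
def to_bytes(value, encoding="utf-8"):
    """Return value as bytes, encoding text; None and bytes pass through."""
    if value is None or isinstance(value, bytes):
        return value
    return value.encode(encoding)


def prepare_auth_fields(user, password, database, encoding="utf-8"):
    """Encode every text argument before any concatenation happens."""
    user_b = to_bytes(user, encoding) or b""
    pwd_b = to_bytes(password, encoding) or b""
    db_b = to_bytes(database, encoding) or b""
    # Only bytes are concatenated from here on, so no implicit decoding occurs.
    return user_b + b"\x00", pwd_b, db_b + b"\x00"


fields = prepare_auth_fields(u"jos\u00e9", u"secret", u"test")
```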

There is no problem in Python 3, since strings are Unicode anyway (hurray for Python 3!)
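Under Python 3 the implicit ASCII coercion is gone: str holds text, bytes holds binary data, and mixing them raises TypeError regardless of the byte values. A mismatch therefore fails immediately and consistently instead of depending on a stray byte above 0x7F, and text must be encoded explicitly. A short illustration with made-up values:

```python
username = u"jos\u00e9"  # str holds Unicode text in Python 3
padding = b"\x00" * 23   # protocol data is bytes

raised = False
try:
    username + padding  # no implicit coercion between str and bytes
except TypeError as exc:
    raised = True
    print(exc)  # exact wording varies between Python 3 versions

# Encoding explicitly is the only way to combine them.
packet_part = username.encode("utf-8") + padding
```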
[1 Dec 2012 0:35] John Russell
Added to changelog for 1.0.8: 

When a username or password was passed in as Unicode to
Connector/Python, connection attempts failed with UnicodeDecodeError
exceptions due to string concatenation of mixed-charset types. This
issue affected programs running under Python 2, and did not affect
Python 3.