Bug #29853 Cannot use character encoding with connection parameters
Submitted: 17 Jul 2007 18:56 Modified: 31 Aug 2007 14:19
Reporter: Nathan Sharp
Status: Closed
Category: Connector/J    Severity: S2 (Serious)
Version: 5.0.6    OS: Any (tested on Ubuntu Feisty and Windows XP)
Assigned to:    CPU Architecture: Any
Tags: charset encoding unicode username user

[17 Jul 2007 18:56] Nathan Sharp
Description:
If your username, password, or database name contains any non-ASCII characters, such as Japanese characters, you cannot connect using the Java-based connector (Connector/J). I have reproduced this with:
Ubuntu Feisty and Windows XP
MySQL Connector/J 5.0.6 and 3.1.12
MySQL Server v5.0.41 and v5.0.19

How to repeat:
Create a database and add a user with Japanese characters in the username (e.g. GRANT ALL ON mydb.* TO 'ユーザ名'@'localhost' IDENTIFIED BY 'bogus';). You can use this account correctly from the command-line client or from MySQL Query Browser. From a Java program, run the following code:

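String user = "ユーザ名";
// Instantiating the driver loads its class, whose static initializer registers it with DriverManager.
java.sql.Driver driver = new com.mysql.jdbc.Driver();
java.sql.Connection conn = java.sql.DriverManager.getConnection("jdbc:mysql://localhost/mydb", user, "bogus");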

You will receive:

java.sql.SQLException: Access denied for user 'ユã'@'localhost' (using password: YES)
        at com.mysql.jdbc.SQLError.createSQLException(SQLError.java:946)
        at com.mysql.jdbc.MysqlIO.checkErrorPacket(MysqlIO.java:2934)
        at com.mysql.jdbc.MysqlIO.checkErrorPacket(MysqlIO.java:870)
        at com.mysql.jdbc.MysqlIO.secureAuth411(MysqlIO.java:3333)
        at com.mysql.jdbc.MysqlIO.doHandshake(MysqlIO.java:1232)
        at com.mysql.jdbc.Connection.createNewIO(Connection.java:2749)
        at com.mysql.jdbc.Connection.<init>(Connection.java:1553)
        at com.mysql.jdbc.NonRegisteringDriver.connect(NonRegisteringDriver.java:285)
        at java.sql.DriverManager.getConnection(DriverManager.java:525)
        at java.sql.DriverManager.getConnection(DriverManager.java:171)
...

Suggested fix:
The problem is essentially the same as bug #18086, although I doubt the fix for that issue will work here. The code in 5.0.6 simply falls back on the Java platform's default encoding to send the username, password, and database name to the server, regardless of how the server is configured. The fix for #18086 hard-codes Cp1252, which won't help with Japanese characters. The driver needs to query the server for the proper encoding, as it already does once the connection has been established.
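The information loss described above can be reproduced without a server. This is a minimal sketch, not Connector/J code: it assumes ISO-8859-1 as a stand-in for a single-byte platform default such as Cp1252 (neither can represent katakana or kanji), and the class `AuthEncodingDemo` is hypothetical.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical demo: what happens to a Japanese username when it is encoded
// with a single-byte charset instead of UTF-8 before being sent to the server.
class AuthEncodingDemo {

    // Encode with the given charset and decode again with the same charset.
    // String.getBytes(Charset) replaces unmappable characters with the
    // charset's replacement bytes ('?' for ISO-8859-1), so information is lost.
    static String roundTrip(String s, Charset cs) {
        return new String(s.getBytes(cs), cs);
    }

    public static void main(String[] args) {
        String user = "\u30E6\u30FC\u30B6\u540D"; // "ユーザ名", escaped to avoid source-encoding issues

        // Assumption: ISO-8859-1 stands in for a single-byte default such as Cp1252.
        byte[] singleByte = user.getBytes(StandardCharsets.ISO_8859_1);
        byte[] utf8 = user.getBytes(StandardCharsets.UTF_8);

        System.out.println(Arrays.toString(singleByte)); // [63, 63, 63, 63]
        System.out.println(utf8.length);                 // 12
        System.out.println(roundTrip(user, StandardCharsets.ISO_8859_1)); // ????
        System.out.println(roundTrip(user, StandardCharsets.UTF_8).equals(user)); // true
    }
}
```

Once the name has been reduced to '?' characters on the client, no server-side setting can recover it, which is why the encoding has to be settled before authentication rather than after.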
[17 Jul 2007 20:33] Mark Matthews
The question is what the correct behavior should be. At the point where we set the character set for authentication, we don't yet know what the server is using. We could either use the "characterEncoding" parameter passed in the JDBC URL (if there is one), or we could use UTF-8, except that MySQL's implementation of UTF-8 doesn't handle some characters outside the BMP.

I don't have a preference for either one, but UTF-8 seems more seamless, except for the corner cases where characters end up as 4-byte sequences.
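To make the 4-byte corner case concrete: characters outside the BMP (code points above U+FFFF) take 4 bytes in UTF-8, while MySQL's utf8 character set stores at most 3 bytes per character. The sketch below shows one way a client could detect such names before trying to authenticate; `BmpCheck` and `fitsInThreeByteUtf8` are hypothetical names, not part of Connector/J.

```java
// Hypothetical helper: does every character of a name fit in 3 UTF-8 bytes?
class BmpCheck {

    // True if every code point is in the BMP (U+0000..U+FFFF), i.e. encodes
    // to at most 3 bytes in UTF-8. Supplementary characters arrive in a Java
    // String as surrogate pairs; codePoints() combines them into one value.
    static boolean fitsInThreeByteUtf8(String s) {
        return s.codePoints().allMatch(cp -> cp <= 0xFFFF);
    }

    public static void main(String[] args) {
        String katakana = "\u30E6\u30FC\u30B6\u540D"; // "ユーザ名", all in the BMP
        String supplementary = "\uD842\uDFB7";         // U+20BB7, outside the BMP

        System.out.println(fitsInThreeByteUtf8(katakana));      // true
        System.out.println(fitsInThreeByteUtf8(supplementary)); // false
        System.out.println(supplementary.getBytes(java.nio.charset.StandardCharsets.UTF_8).length); // 4
    }
}
```

Japanese names made of ordinary kana and common kanji, like the one in this report, stay within the BMP, so the 4-byte limitation only bites for rarer characters.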
[17 Jul 2007 21:18] Nathan Sharp
Ideally, I'd like it to do whatever the command-line client and MySQL Query Browser do. Using UTF-8 across the board will likely solve it, though, and is certainly easier :-)

I'm not familiar enough with UTF-8 and the languages we are using to know whether your concern is a problem for me.
[17 Jul 2007 23:12] MySQL Verification Team
See bug #29576 (http://bugs.mysql.com/bug.php?id=29576) regarding the same issue with the C API.
[31 Jul 2007 5:36] Tonci Grgin
Nathan, Mark, I am setting this report to "Verified": we are aware of this problem, and it can be partially fixed in c/J. The problem remains for characters outside the BMP and for 7-bit encodings. I think this should also be documented properly.
[31 Jul 2007 12:17] Nathan Sharp
Thank you Tonci!
[29 Aug 2007 19:20] Mark Matthews
This is fixed in the 5.1 source repository and will be part of 5.1.3.
[31 Aug 2007 14:19] MC Brown
A note has been added to the 5.1.3 changelog: 

Connector/J now connects using an initial character set of UTF-8, solely for the purpose of authentication, to allow user names or database names in any character set to be used in the JDBC connection URL.
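A rough sketch of the idea behind this change, assuming the handshake response carries the user name as a NUL-terminated string (as in the MySQL 4.1+ client/server protocol): encode it explicitly as UTF-8 instead of relying on the JVM's default charset. `HandshakeFields` and `nulTerminatedUtf8` are hypothetical names; this is not Connector/J's actual implementation.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical sketch: build the user-name field of a handshake response
// with a fixed charset (UTF-8) rather than the platform default.
class HandshakeFields {

    // UTF-8 bytes of s followed by a single 0x00 terminator.
    static byte[] nulTerminatedUtf8(String s) {
        byte[] body = s.getBytes(StandardCharsets.UTF_8);
        // copyOf pads the extra trailing slot with 0, which is the terminator.
        return Arrays.copyOf(body, body.length + 1);
    }

    public static void main(String[] args) {
        byte[] field = nulTerminatedUtf8("\u30E6\u30FC\u30B6\u540D"); // "ユーザ名"

        StringBuilder hex = new StringBuilder();
        for (byte b : field) {
            hex.append(String.format("%02X ", b));
        }
        System.out.println(hex.toString().trim()); // E3 83 A6 E3 83 BC E3 82 B6 E5 90 8D 00
    }
}
```

Because the bytes no longer depend on the JVM's default charset, the same program behaves identically on Windows XP and Ubuntu; names with characters outside the BMP still hit the 3-byte utf8 limitation discussed earlier in the thread.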