Bug #9206 "characterEncoding=UTF-8" broken ("UTF8" works though)
Submitted: 15 Mar 2005 22:00 Modified: 16 Mar 2005 21:06
Reporter: Jon Andersen
Status: Closed
Category:Connector / J Severity:S3 (Non-critical)
Version:3.1.7 OS:Microsoft Windows (Windows)
Assigned to: Mark Matthews CPU Architecture:Any

[15 Mar 2005 22:00] Jon Andersen
The driver does not properly accept "characterEncoding=UTF-8&characterSetResults=UTF-8" in JDBC URLs.  It may be that the driver's internal JDBC URL parsing is broken by the dash "-" character.

The following exception occurs when _any_ query is executed on the connection:
java.sql.SQLException: unable to borrow: java.sql.SQLException: Unknown column 'UTF' in 'field list'

Notice that the driver is looking for column 'UTF', which isn't what it was told to do!  It was told 'UTF-8' was the character set to use.

Note that the Connector/J documentation states that "UTF-8" is a valid character set.  This is shown at the bottom of the page:

We tested and observed this bug on both Windows XP and Mac OS X.  Older drivers such as mysql-connector-java-3.0.16-ga-bin.jar are fine.  Newer drivers such as mysql-connector-java-3.1.7-bin.jar and mysql-connector-java-3.2.0-alpha-bin.jar are affected by this bug on both OSes.  We tested against MySQL-4.1.5-gamma as well as other MySQL server versions.  We are using the driver under Tomcat 5, with the JDBC URL being retrieved from a properties file (and it is being retrieved OK).


How to repeat:
1. Use a JDBC URL that specifies "UTF-8".  For example:

2. Execute any query on the connection.

3. You will get the exception:
java.sql.SQLException: unable to borrow: java.sql.SQLException: Unknown column 'UTF' in 'field list'
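
For illustration only, here is a minimal sketch of how a URL carrying the two parameters named in this report could be assembled. The host, port, database name, and the helper `buildUrl` are hypothetical placeholders, not the URL from the original report; only the parameter names and the value "UTF-8" come from the description above.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper for illustration; not part of Connector/J.
class JdbcUrlSketch {
    // Joins connection properties onto a jdbc:mysql:// URL as ?k=v&k=v.
    static String buildUrl(String hostPort, String database, Map<String, String> props) {
        StringBuilder url = new StringBuilder("jdbc:mysql://")
                .append(hostPort).append('/').append(database);
        char separator = '?';
        for (Map.Entry<String, String> entry : props.entrySet()) {
            url.append(separator).append(entry.getKey()).append('=').append(entry.getValue());
            separator = '&';
        }
        return url.toString();
    }

    public static void main(String[] args) {
        // LinkedHashMap keeps the parameters in insertion order.
        Map<String, String> props = new LinkedHashMap<>();
        props.put("characterEncoding", "UTF-8");
        props.put("characterSetResults", "UTF-8");
        // "localhost:3306" and "test" are placeholder values.
        System.out.println(buildUrl("localhost:3306", "test", props));
    }
}
```

Passing such a URL to `DriverManager.getConnection` with an affected driver and then executing a query is the scenario the steps above describe.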

Suggested fix:

Workaround: specify "UTF8" in the JDBC URL _instead_ of "UTF-8".  This goes against the Connector/J documentation, but it does work on all current versions of the driver (3.0.16, 3.1.7, and 3.2.0-alpha).

Proper fix: figure out why the driver is getting "UTF" for the character set instead of "UTF-8" (which is specified in the URL), and fix it.  This is probably occurring in the URL parsing.  Also fix the documentation.
[15 Mar 2005 23:06] Mark Matthews

This is a bug in that characterSetResults does not allow UTF-8, because its value is passed directly to the server. characterEncoding does allow it, because there is quite a bit of surrounding code that converts canonical forms to names that the MySQL server understands.

However, there really is no reason to force characterSetResults unless you're using a character encoding that's not known by the JDBC driver. Since UTF-8 is known by JDBC, the driver will use the character sets that the server tells it to via the field-level metadata for a result set.
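
A rough sketch of the difference described above, under stated assumptions: the lookup table, the class `CharsetNameSketch`, and the `SET character_set_results` statement text are illustrative guesses, not the driver's actual code. If the raw value is interpolated into a statement unmapped, the server would plausibly read `UTF-8` as the expression `UTF` minus `8`, which would be consistent with the "Unknown column 'UTF'" error in the report.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

// Illustrative only; the real driver's mapping table and statements may differ.
class CharsetNameSketch {
    // Small assumed subset of Java-to-MySQL character set names.
    private static final Map<String, String> JAVA_TO_MYSQL = new HashMap<>();
    static {
        JAVA_TO_MYSQL.put("UTF-8", "utf8");
        JAVA_TO_MYSQL.put("UTF8", "utf8");
        JAVA_TO_MYSQL.put("US-ASCII", "ascii");
    }

    // Returns the MySQL name if known, otherwise the input unchanged.
    static String toMysqlName(String javaName) {
        String mapped = JAVA_TO_MYSQL.get(javaName.toUpperCase(Locale.ROOT));
        return mapped != null ? mapped : javaName;
    }

    // Value passed straight through, as described for characterSetResults.
    static String unmappedStatement(String value) {
        return "SET character_set_results = " + value;
    }

    // Value converted first, as described for characterEncoding.
    static String mappedStatement(String value) {
        return "SET character_set_results = " + toMysqlName(value);
    }

    public static void main(String[] args) {
        System.out.println(unmappedStatement("UTF-8")); // contains a bare dash
        System.out.println(mappedStatement("UTF-8"));   // uses the server-side name
    }
}
```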
[15 Mar 2005 23:08] Mark Matthews
P.S. Please post _full_ stack traces with your bug reports, unless they contain sensitive information. If they are sensitive, you can attach them as a file which only MySQL developers can see.

Full stack traces help us see problems much more clearly and thus resolve bugs more quickly.
[15 Mar 2005 23:23] Jon Andersen

Thanks so much for your prompt response to this bug report!  We will take your word for it, that removing "characterSetResults=UTF-8" will have no ill effects.  Our database is set to UTF-8 so if it works like you said it does, there will be no problems.  I did some preliminary testing and it appears that your workaround is fine.

P.S. Note that this is a new bug; it didn't affect Connector/J 3.0.16.
[16 Mar 2005 16:59] Anthony Whyte
One clarification regarding the suggested fix: you need not specify "UTF8" in place of "UTF-8" as the value of the "characterEncoding" parameter, so long as you drop the "characterSetResults" parameter from your connection string, as in

[16 Mar 2005 21:06] Mark Matthews
This is fixed for version 3.1.8.