Bug #4010 StringUtils.escapeEasternUnicodeByteStream is still broken for GBK
Submitted: 4 Jun 2004 16:15 Modified: 15 Aug 2004 22:42
Reporter: Jim Rao Email Updates:
Status: Closed Impact on me:
None 
Category:Connector / J Severity:S2 (Serious)
Version:3.0.14 OS:Windows (Windows 2000/XP)
Assigned to: Mark Matthews CPU Architecture:Any

[4 Jun 2004 16:15] Jim Rao
Description:
Ok, this method has caused several problems before and is fixed again and again. But I think it is still broken, at least for GBK charset. I have no idea why it is necessary to escape multibyte character bytes in strings, the manual says that escape should consider charset, and points to the function mysql_real_escape_string(). After browsing through the source code, it seems that the special treatement for multibyte character is to skip them instead of escape them, so the very idea of escapeEasternUnicodeByteStream confuses me.

The only explanation I can think of is to somehow work around a bug which makes it impossible to insert blobs into GBK database (This bug seems to be reported multiple times, but ignored), although I don't understand how escaping strings (instead of blobs) could solve it.

How to repeat:
Test case is modeled after Connector/J's unit test, the test is run against a windows installation of mysql 4.0.20a, with default-character-set set to gbk. Result is:
java.sql.SQLException: Syntax error or access violation,  message from server: "You have an error in your SQL syntax.  Check the manual that corresponds to your MySQL server version for the right syntax to use near ''?\')' at line 1"
	at com.mysql.jdbc.MysqlIO.checkErrorPacket(MysqlIO.java:1977)
	at com.mysql.jdbc.MysqlIO.sendCommand(MysqlIO.java:1163)
	at com.mysql.jdbc.MysqlIO.sqlQueryDirect(MysqlIO.java:1272)
	at com.mysql.jdbc.Connection.execSQL(Connection.java:2236)
	at com.mysql.jdbc.PreparedStatement.executeUpdate(PreparedStatement.java:1741)
	at com.mysql.jdbc.PreparedStatement.executeUpdate(PreparedStatement.java:1588)
	at test.MySQLConnectorJavaTest.doGBKEscapeBugTest(MySQLConnectorJavaTest.java:58)
	at test.MySQLConnectorJavaTest.testGBKEscapeBug(MySQLConnectorJavaTest.java:26)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
	at java.lang.reflect.Method.invoke(Method.java:324)
	at junit.framework.TestCase.runTest(TestCase.java:154)
	at junit.framework.TestCase.runBare(TestCase.java:127)
	at junit.framework.TestResult$1.protect(TestResult.java:106)
	at junit.framework.TestResult.runProtected(TestResult.java:124)
	at junit.framework.TestResult.run(TestResult.java:109)
	at junit.framework.TestCase.run(TestCase.java:118)
	at junit.framework.TestSuite.runTest(TestSuite.java:208)
	at junit.framework.TestSuite.run(TestSuite.java:203)
	at org.eclipse.jdt.internal.junit.runner.RemoteTestRunner.runTests(RemoteTestRunner.java:392)
	at org.eclipse.jdt.internal.junit.runner.RemoteTestRunner.run(RemoteTestRunner.java:276)
	at org.eclipse.jdt.internal.junit.runner.RemoteTestRunner.main(RemoteTestRunner.java:167)

******************************************************************************
/*
 * Created on 2004-6-4
 */
package test;

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.Statement;
import java.util.Properties;

import junit.framework.TestCase;

/**
 * @author RJX
 */
public class MySQLConnectorJavaTest
extends TestCase
{	

	public void testGBKEscapeBug()
	throws Exception 
	{
		//doGBKEscapeBugTest("\u4e2d\u6587\u5904\u7406"); // sanity check
		doGBKEscapeBugTest("\u603d"); // this is a valid Chinese char
		//doGBKEscapeBugTest("?"); // it looks like this if you can read Chinese
	}
	
	public void doGBKEscapeBugTest(String testString) 
	throws Exception 
	{		
		String dbUrlBase = "jdbc:mysql:///";
		String dbName = "gbktest";
		String tableName = "gbkEscapeTest";
		Properties connProps = new Properties();
		connProps.put("useUnicode", "true");
		connProps.put("characterEncoding", "GBK");
		connProps.put("user", "root");

		Connection gbkConn = DriverManager.getConnection(dbUrlBase + "test", connProps);
		Statement gbkStmt = gbkConn.createStatement();
		ResultSet rs = null;
		try 
		{
			gbkStmt.executeUpdate("DROP DATABASE IF EXISTS " + dbName);
			gbkStmt.executeUpdate("CREATE DATABASE " + dbName);
			gbkStmt.close();
			gbkConn.close();
			
			gbkConn = DriverManager.getConnection(dbUrlBase + dbName, connProps);
			gbkStmt = gbkConn.createStatement();
			gbkStmt.executeUpdate("DROP TABLE IF EXISTS " + tableName);
			gbkStmt.executeUpdate("CREATE TABLE " + tableName + " (field1 varchar(10))");

			PreparedStatement sjisPStmt = gbkConn.prepareStatement("INSERT INTO " + tableName + " VALUES (?)");
			sjisPStmt.setString(1, testString);
			sjisPStmt.executeUpdate();

			rs = gbkStmt.executeQuery("SELECT * FROM " + tableName);
			rs.next();
			String retrString = rs.getString(1);
			assertEquals(testString, testString, retrString);
		} 
		finally 
		{
			gbkStmt.executeUpdate("DROP TABLE IF EXISTS " + tableName);
			if(rs != null) rs.close();
			gbkStmt.close();
			gbkConn.close();
		}
	}
	
	/* (non-Javadoc)
	 * @see junit.framework.TestCase#setUp()
	 */
	protected void setUp() throws Exception
	{
		Class.forName("com.mysql.jdbc.Driver").newInstance();
	}

}

******************************************************************************

Some other food for thought:
1. Set default-character-set to latin1, while keeping the useUnicode/characterEncoding setting will make the test case pass.
2. Set default-character-set to latin1, and remove the useUnicode/characterEncoding setting will make the test case fail due to assertion.

Suggested fix:
Depends on what exactly this method is supposed to do. What happened is that when calling setString on Chinese char \u603d, which is 0x905C under GBK encoding, the 0x5C part got escaped into 0x5C5C. After the escaped result 0x905C5C is shipped to the server, the server's ctype-gbk recognizes that 0x905C is a GBK char and skips it, leaves the last 0x5C which pairs up with the ending single quote and messed up the sql. So just as stated in the begining, the cause of the problem seems to be that this method is against mysql server's rule on handling multibyte char escapes.
[4 Jun 2004 16:27] Mark Matthews
The only bug is that I can see is that the driver is 'unaware' that you're using GBK for the server, so it is double escaping. Previous versions of the server didn't handle 'emebedded' 0x5c characters that are present in the GBK character set.

Thanks to your testcase, this will now be fixed.
[5 Jun 2004 3:15] Jim Rao
Hi,

Thank you for the quick response. I wonder if you could explain a bit more on the purpose of this method, or point me to some docs/source/mail that do? As far as I can see, no server ever considered the special case of 0x5C as part of a GBK multibyte char, it is skipped like the rest of the GBK bytes. Michael Widenius' comment in bug 369 indicates that \ and ' that are part of multi-byte char should not be escaped, is this still true?

Thanks
[5 Jun 2004 3:30] Mark Matthews
This issue was originally started with Connector/J when users were using it _without_ configuring their server to GBK. So ignore this case.

You are correct that this does not need to be done when the server is configured to use GBK. If you force the driver to use GBK, and your server is not configured to accept GBK, then the patch I propose should still work.

You can test our fix to this on Monday, when that snapshot build goes out.
[5 Jun 2004 17:40] Jim Rao
Ok, I see, thank you for the explanation. I'll try the fixed version once it is available.
[14 Jun 2004 17:03] Jim Rao
I just tried the 20040614 nightly build of v3.0, it seems that the problem still exists. The good news is that the driver have no problem inserting other characters of GBK except those ends with 0x5C (and two others which could not be encoded by J2SDK).
[14 Jun 2004 17:09] Mark Matthews
That is odd, given that your testcase passes...Do you have different data you are inserting now?
[15 Jun 2004 4:06] Jim Rao
Yes, it's strange, I tried the old data and the newer one I built last week. Maybe I got the wrong version? Is there a way to determine if the source code and/or binary I got is correctly patched?
[12 Aug 2004 17:50] Mark Matthews
What are the dates on the .jar file (or the files inside)?
[14 Aug 2004 2:39] Jim Rao
The dates are 2004/6/14, but that doesn't matter now. I have just downloaded the 20040813 nightly build, and it passed the old and new unit test, so this bug is finally fixed, thank you.