Description:
escapeSJISByteStream() is used to escape a 0x5c byte that appears as the second byte of a double-byte character in encodings such as GBK, BIG5, and SJIS.
I think the author may not be familiar with GBK and BIG5.
GBK character set:
GBK/2: B0A1-F7FE CJK UNIFIED IDEOGRAPH
GBK/3: 8140-A0FE CJK UNIFIED IDEOGRAPH
GBK/4: AA40-FEA0 CJK UNIFIED IDEOGRAPH
GBK/1: A1A1-A9FE symbol
GBK/5: A840-A9A0 symbol
GB2312 is a subset of GBK: GB2312 = GBK/1 + GBK/2
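The zone list above can be turned into a small lookup. This is only an illustrative sketch: the class GbkZones and its methods are hypothetical names, not part of Connector/J. It assumes each range is read as a rectangle (first-byte range times second-byte range) and that 0x7F is never used as a second byte.

```java
// Hypothetical helper, not part of Connector/J: names the GBK zone of a two-byte code.
final class GbkZones {
    // Assumption: each listed range is a rectangle of first-byte range x second-byte range.
    static String zoneOf(int first, int second) {
        if (second == 0x7F) {
            return "none"; // 0x7F is not used as a second byte in GBK
        }
        boolean upperTrail = second >= 0xA1 && second <= 0xFE;
        boolean lowerTrail = second >= 0x40 && second <= 0xA0;
        if (first >= 0xA1 && first <= 0xA9 && upperTrail) return "GBK/1";
        if (first >= 0xB0 && first <= 0xF7 && upperTrail) return "GBK/2";
        if (first >= 0x81 && first <= 0xA0 && (lowerTrail || upperTrail)) return "GBK/3";
        if (first >= 0xAA && first <= 0xFE && lowerTrail) return "GBK/4";
        if (first >= 0xA8 && first <= 0xA9 && lowerTrail) return "GBK/5";
        return "none";
    }

    // Zone-level approximation of "GB2312 = GBK/1 + GBK/2" as stated in the report.
    static boolean inGb2312Zones(int first, int second) {
        String zone = zoneOf(first, second);
        return zone.equals("GBK/1") || zone.equals("GBK/2");
    }

    public static void main(String[] args) {
        // D6 D0 and CE C4 are the GBK bytes of the two characters "\u4e2d\u6587".
        System.out.println(zoneOf(0xD6, 0xD0));
        System.out.println(zoneOf(0xCE, 0xC4));
        // A GBK/3 code whose second byte is 0x5c.
        System.out.println(zoneOf(0x81, 0x5C));
    }
}
```

Both test characters fall in GBK/2, so they are also valid GB2312, while codes such as 0x815C exist only in GBK.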
At com.mysql.jdbc.StringUtils.java, lines 311-312:
if (((loByte >= 0x81) && (loByte <= 0x9F))
|| ((loByte >= 0xE0) && (loByte <= 0xFC))) {
This check does not cover the whole GBK character set: for example, first bytes from 0xA0 to 0xDF match neither range.
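To make the gap concrete, here is a minimal sketch that applies the quoted range test to the first bytes of the two test characters used in this report. LeadByteCheck and its method names are hypothetical, and the GBK bytes (D6 D0 CE C4 for "\u4e2d\u6587") are hard-coded so that no charset lookup is needed.

```java
// Hypothetical demo class, not part of Connector/J.
final class LeadByteCheck {
    // The range test quoted from StringUtils.java lines 311-312.
    static boolean matchesQuotedCheck(int b) {
        return ((b >= 0x81) && (b <= 0x9F)) || ((b >= 0xE0) && (b <= 0xFC));
    }

    // The simpler test proposed in this report.
    static boolean hasHighBit(int b) {
        return b >= 0x80;
    }

    public static void main(String[] args) {
        // GBK bytes of "\u4e2d\u6587", hard-coded: D6 D0 and CE C4.
        int[] gbk = {0xD6, 0xD0, 0xCE, 0xC4};
        for (int i = 0; i < gbk.length; i += 2) {
            int first = gbk[i];
            System.out.printf("first byte 0x%02X: quoted check=%b, high bit=%b%n",
                    first, matchesQuotedCheck(first), hasHighBit(first));
        }
    }
}
```

Neither 0xD6 nor 0xCE passes the quoted check, so the existing code does not treat these bytes as the start of a double-byte character.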
If we use the connection URL
"jdbc:mysql://localhost/test?useUnicode=true&characterEncoding=GBK"
and insert a string field that contains Chinese characters, a StringIndexOutOfBoundsException is thrown:
Exception in thread "main" java.lang.StringIndexOutOfBoundsException: String index out of range: 4
at java.lang.String.charAt(String.java:460)
at com.mysql.jdbc.StringUtils.escapeSJISByteStream(StringUtils.java:280)
at com.mysql.jdbc.StringUtils.getBytes(StringUtils.java:105)
at com.mysql.jdbc.PreparedStatement.setString(PreparedStatement.java:1068)
at TestSql.main(TestSql.java:19)
How to repeat:
CREATE TABLE mytable (name VARCHAR(20));
import java.sql.*;
public class TestSql {
    public static String dbDriver = "com.mysql.jdbc.Driver";
    // With gb2312 instead of GBK there is no problem, because escapeSJISByteStream() is not called.
    public static String dbURL = "jdbc:mysql://localhost/test?useUnicode=true&characterEncoding=GBK";
    public static String user = "root";
    public static String password = "root";
    public static void main(String[] args)
            throws ClassNotFoundException, SQLException {
        Class.forName(dbDriver);
        Connection conn = DriverManager.getConnection(dbURL, user, password);
        PreparedStatement stmt = conn.prepareStatement(
                "insert into mytable (name) values ( ? )");
        stmt.setString(1, "\u4e2d\u6587"); // two Chinese characters
        // stmt.setString(1, "abcd"); // inserting "abcd" causes no problem
        stmt.execute();
        stmt.close();
        conn.close();
    }
}
Exception in thread "main" java.lang.StringIndexOutOfBoundsException: String index out of range: 4
at java.lang.String.charAt(String.java:460)
at com.mysql.jdbc.StringUtils.escapeSJISByteStream(StringUtils.java:280)
at com.mysql.jdbc.StringUtils.getBytes(StringUtils.java:105)
at com.mysql.jdbc.PreparedStatement.setString(PreparedStatement.java:1068)
at TestSql.main(TestSql.java:19)
Suggested fix:
The fix is very easy. In com.mysql.jdbc.StringUtils.java, lines 311-312, replace
if (((loByte >= 0x81) && (loByte <= 0x9F))
|| ((loByte >= 0xE0) && (loByte <= 0xFC))) {
with
if (loByte >= 0x80) {....
and everything works.
For double-byte character sets such as GBK and BIG5, the high bit of loByte is always 1, which separates double-byte characters from
standard ASCII.
In addition, escaping 0x5c does not need the original String. The rule above is enough.
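A minimal sketch of that rule, working on the encoded bytes only. TrailByteEscaper and escapeTrailBackslash are hypothetical names, and this is not the Connector/J implementation. It assumes every byte >= 0x80 begins a two-byte character, which fits GBK and BIG5 input; an encoding with single-byte values above 0x7F would need a different test. A 0x5c that stands alone as an ASCII backslash is left untouched here.

```java
import java.io.ByteArrayOutputStream;

// Hypothetical sketch, not the Connector/J implementation.
final class TrailByteEscaper {
    // Doubles a 0x5c that appears as the second byte of a double-byte character.
    // Assumption: every byte >= 0x80 starts a two-byte character.
    static byte[] escapeTrailBackslash(byte[] in) {
        ByteArrayOutputStream out = new ByteArrayOutputStream(in.length + 8);
        for (int i = 0; i < in.length; i++) {
            int first = in[i] & 0xFF; // unsigned value of the current byte
            out.write(first);
            // The bounds check keeps a dangling first byte at the end from overrunning the array.
            if (first >= 0x80 && i + 1 < in.length) {
                int second = in[++i] & 0xFF; // consume the second byte of the pair
                out.write(second);
                if (second == 0x5C) {
                    out.write(0x5C); // escape by doubling
                }
            }
        }
        return out.toByteArray();
    }

    public static void main(String[] args) {
        // A5 5C is a double-byte code ending in 0x5c, followed by ASCII 'a' and an ASCII backslash.
        byte[] input = {(byte) 0xA5, 0x5C, 'a', 0x5C};
        StringBuilder hex = new StringBuilder();
        for (byte b : escapeTrailBackslash(input)) {
            hex.append(String.format("%02X ", b & 0xFF));
        }
        System.out.println(hex.toString().trim());
    }
}
```

Because the loop only looks at byte values and checks the array bounds before reading the second byte, it needs no reference to the original String and cannot index past the end of the input.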
Because of this bug, many Chinese users still use mm.mysql.
mm.mysql has its own problem in dealing with 0x5c:
the second byte of some characters is 0x5c, but these characters are seldom used.
Thank you on behalf of Chinese users.