Bug #1051 SJIS encoding error - reproducible
Submitted: 15 Aug 2003 1:10 Modified: 28 Mar 2014 11:48
Reporter: Rick Knowles Email Updates:
Status: Not a Bug Impact on me:
None 
Category:Connector / J Severity:S1 (Critical)
Version:3.1 and 3.0 OS:Any (java compliant)
Assigned to: Alexander Soklakov CPU Architecture:Any

[15 Aug 2003 1:10] Rick Knowles
Description:
I've found a reproducible bug when inserting SJIS encoded text into mysql 4.1. When inserting ?²“¡?@Œ\ (which is the windows 1252 version of 4 SJIS kanji characters representing the name of a japanese person) via jdbc I get a sql syntax error (unclosed character literal).

The reason for the syntax error (from reviewing the sql syntax error in the debug log) appears to be because of the final \ char in the single-byte equivalent of this string. It's causing the jdbc driver to add a second slash (trying to escape the \ unnecessarily) which causes the server to think that the quote closing the string is being escaped. This then leaves an unclosed string and the statement gets rejected for invalid syntax.

Also *PLEASE* add the project root / name of the Connector/J source in BitKeeper to the Connector/J FAQ. I would love to try out the most recent source, but I can't find anywhere any references to what name to use at when checking the mysql BitKeeper web interface.

Thank you

How to repeat:
Try inserting any SJIS character that contains a \ as the second byte of the final kanji (using a preparedStatement). This bug does not occur with EUC encoding.

Suggested fix:
I suspect the answer lies in only escaping the double-byte version of the kanji (not the single-byte version), but I'm new to mysql, so I'll leave it with you. Suffice to say that the trailing slash scenario occurs often enough that in 20+ of the 100 tables in our application there is at least 1 field that contains one of these problems.
[15 Aug 2003 6:34] Mark Matthews
This is a duplicate bug which has already been fixed, but not yet released because Connector/J releases are waiting for MySQL-4.1.1 and the protocol changes it will have.

Connector/J hasn't been migrated to BitKeeper yet due to some as yet unsolved compatibility problems with the development tools used to maintain Connector/J. 

However, for the past year or so nightly snapshots have been made available. You can get a nightly snapshot build from http://mmmysql.sourceforge.net/snapshots/

The URL for nightly snapshots is temporary until our web team finishes the snapshots download page for _all_ MySQL products, which should be in the next few weeks. Watch the 'announce@lists.mysql.com' mailing list for an announcement when it is ready to use.
[15 Aug 2003 11:51] Rick Knowles
Thank you - I'll try one of the snapshots. Wouldn't this make a good faq entry ? (about the snapshots) 

Apologies about the duplicate - I did search for "sjis" in the bug tracker, but it seems I must have only searched open bugs. Sorry and thanks again.

Rick
[15 Aug 2003 12:11] Rick Knowles
Not quite fixed. The latest stable driver (2003/8/15) returns the same error, while the latest dev driver (same) inserts a mix of numbers and text into the field in question in some cases, and in other cases it just fails like the below.

Not sure if the character encoding will survive but here's a sample of the error anyway.

Hope this helps - thanks.

Rick 

---------------------------------

[root@weblin jdbcTransfer]# java -cp commons-collections.jar:commons-dbcp.jar:commons-pool.jar:classes12.jar:mysql-3.1-new.jar:. -Dfile.encoding=Shift_JIS TransferTableData -todriver com.mysql.jdbc.Driver -tourl jdbc:mysql://weblin/mrk?characterEncoding=sjis\&useUnicode=true -fromurl jdbc:oracle:oci8:@mrkun2  -validating -fixslashes -includes mr
FROM: suppressed/****@jdbc:oracle:oci8:@mrkun2 (oracle.jdbc.driver.OracleDriver)
TO:   suppressed/****@jdbc:mysql://weblin/mrk?characterEncoding=sjis&useUnicode=true (com.mysql.jdbc.Driver)
Starting at Sat Aug 16 04:06:46 JST 2003
Found 1 tables to transfer ... commencing ...
STARTED: Thread MR started at: Sat Aug 16 04:06:47 JST 2003
Error field 0 = JNJMC16629 (10)
Error field 1 = 0001 (4)
Error field 2 = 0000000009 (10)
Error field 3 = 0000001041 (10)
Error field 4 = null
Error field 5 = 1056 (4)
Error field 6 = 1054 (4)
Error field 7 = 0001330 (7)
Error field 8 = J&J エチコン 千島 明人 (14)
Error field 9 = 千島 明人 (5)
Error field 10 = チシマ アキヒト (8)
Error field 11 = 0 (1)
Error field 12 = 16629 (5)
Error field 13 = null
Error field 14 = null
Error field 15 = null
Error field 16 = null
Error field 17 = null
Error field 18 = null
Error field 19 = debug@so-netm3.co.jp (20)
Error field 20 = 135-0016 (8)
Error field 21 = 東京都江東区東陽6-3-2 イースト21タワー (23)
Error field 22 = debug@so-netm3.co.jp (20)
Error field 23 = debug@so-netm3.co.jp (20)
Error field 24 = debug@so-netm3.co.jp (20)
Error field 25 = debug@so-netm3.co.jp (20)
Error field 26 = null
Error field 27 = 0 (1)
Error field 28 = 0 (1)
Error field 29 = 2001-06-01 09:00:00.0 (21)
Error field 30 = 2001-06-01 18:00:00.0 (21)
Error field 31 = 2002-01-15 10:52:51.0 (21)
Error field 32 = 2001-09-11 18:12:14.0 (21)
Error field 33 = JNJ00001A1 (10)
Error field 34 = 2002-08-01 08:14:45.0 (21)
Error field 35 = 2002-08-01 08:14:45.0 (21)
Error field 36 = null
Error field 37 = null
Error field 38 = null
Error field 39 = null
Error field 40 = null
Error field 41 = null
Error field 42 = null
Error field 43 = null
Error field 44 = null
Error field 45 = null
Error field 46 = null
Error field 47 = null
Error field 48 = null
Error field 49 = null
Error field 50 = null
Error field 51 = null
Error field 52 = null
Error field 53 = null
Error field 54 = null
Error field 55 = null
Error field 56 = null
Error field 57 = null
Error field 58 = null
Error field 59 = null
Error field 60 = null
java.sql.SQLException: Communication link failure: java.io.IOException, underlying cause: Unexpected end of input stream

** BEGIN NESTED EXCEPTION ** 

java.io.IOException
MESSAGE: Unexpected end of input stream

STACKTRACE:

java.io.IOException: Unexpected end of input stream
        at com.mysql.jdbc.MysqlIO.reuseAndReadPacket(MysqlIO.java:1708)
        at com.mysql.jdbc.MysqlIO.checkErrorPacket(MysqlIO.java:2031)
        at com.mysql.jdbc.MysqlIO.sendCommand(MysqlIO.java:1305)
        at com.mysql.jdbc.ServerPreparedStatement.serverExecute(ServerPreparedStatement.java:1012)
        at com.mysql.jdbc.ServerPreparedStatement.executeInternal(ServerPreparedStatement.java:727)
        at com.mysql.jdbc.PreparedStatement.executeUpdate(PreparedStatement.java:1745)
        at com.mysql.jdbc.PreparedStatement.executeUpdate(PreparedStatement.java:1556)
        at TransferTableData.insertRow(TransferTableData.java:79)
        at TransferTableData.run(TransferTableData.java:164)
        at java.lang.Thread.run(Thread.java:534)

** END NESTED EXCEPTION **

        at com.mysql.jdbc.MysqlIO.reuseAndReadPacket(MysqlIO.java:1838)
        at com.mysql.jdbc.MysqlIO.checkErrorPacket(MysqlIO.java:2031)
        at com.mysql.jdbc.MysqlIO.sendCommand(MysqlIO.java:1305)
        at com.mysql.jdbc.ServerPreparedStatement.serverExecute(ServerPreparedStatement.java:1012)
        at com.mysql.jdbc.ServerPreparedStatement.executeInternal(ServerPreparedStatement.java:727)
        at com.mysql.jdbc.PreparedStatement.executeUpdate(PreparedStatement.java:1745)
        at com.mysql.jdbc.PreparedStatement.executeUpdate(PreparedStatement.java:1556)
        at TransferTableData.insertRow(TransferTableData.java:79)
        at TransferTableData.run(TransferTableData.java:164)
        at java.lang.Thread.run(Thread.java:534)
ERROR: Thread MR failed after 242 lines in 942ms.
java.sql.SQLException: Communication link failure: java.sql.SQLException
        at com.mysql.jdbc.MysqlIO.sendCommand(MysqlIO.java:1300)
        at com.mysql.jdbc.MysqlIO.sqlQueryDirect(MysqlIO.java:1387)
        at com.mysql.jdbc.Connection.execSQL(Connection.java:1734)
        at com.mysql.jdbc.Connection.commit(Connection.java:843)
        at TransferTableData.run(TransferTableData.java:208)
        at java.lang.Thread.run(Thread.java:534)
Finished spawning transfer threads
[15 Aug 2003 15:23] Mark Matthews
Can you give an example of a String that is failing (in Unicode escape \unnnn format, please, so it doesn't get munged by the bug system).

Also, have you set the database or table or column in question to use the SJIS character set? (From your stacktrace I see you're using MySQL-4.1, which does different things with strings than 4.0).
[15 Aug 2003 15:25] Mark Matthews
The docs will change once we get an official snapshot download site up. I'm curious as to what FAQ you're referring to as there's only the documentation that comes with the driver, and on the website, but no 'official' FAQ (other than a 'troubleshooting' section in the docs).
[15 Aug 2003 20:09] Rick Knowles
Looks like FAQ might have been the wrong word to use. I was actually meaning in that main documentation ( both the one at http://www.mysql.com/documentation/connector-j/index.html and included with the download archive). I said FAQ because I found a FAQ entry similar to what I was looking for in the mysql ODBC driver FAQ. 

Regarding the dodgy string .... if you set the page encoding in the browser when viewing the bug report to SJIS, the dodgy string will appear in correct kanji. As far as a unicode version hmmm ... not sure on the process of doing that. I'll try to find a way, and a send it back after this. If you have an easy way of doing it ... (I've tried writing a java program and using iconv, but they both give back garbage).

Thanks again,

Rick
[16 Aug 2003 0:22] Rick Knowles
That last comment was meant to read "if you have an easy way of doing it, please let me know and I'll do it", but that's what I get for replying at 4am.

Also yes I am using mysql 4.1.0-alpha (according to the download archive name), and I configured as follows:

CFLAGS="-O3 -march=athlon" CXX=gcc CXXFLAGS="-O3 -march=athlon \
-felide-constructors -fno-exceptions -fno-rtti" ./configure \
--prefix=/usr/local/mysql --enable-assembler \
--with-mysqld-ldflags=-all-static --enable-thread-safe-client --with-debug --with-charset=sjis --with-extra-charsets=complex

Using GCC 3.2.3 on Glibc 2.3.2 (basically a stock redhat 9 system) - hope this helps.

Rick
[16 Aug 2003 22:42] Rick Knowles
OK I got it ... an example of the offending string in xml'd utf16 form is:

佐藤 圭

Rick
[18 Aug 2003 8:11] Mark Matthews
Unfortunately, your data doesn't help me. I need the string in it's 'SJIS' encoding, not UTF-16.

You can write a simple Java program that just dumps the values of the string as ints by casting each char in the string to an int. That would be enough for me to work with. Your debugger might even be able to do it, since the string is short. I use Eclipse, and can tell it to list the values of primitives in decimal, hex, or both in the debugger.
[18 Aug 2003 8:56] Rick Knowles
OK ... obviously communication error there ...

char 0 = 20304
char 1 = 34276
char 2 = 12288
char 3 = 22317

Rick
[18 Aug 2003 8:57] Rick Knowles
I did what you said:

public class testjs
{
  public static void main(String argv[])
  {
    String test = "<8d>&#65410;&#65394;<93>&#65410;&#65377;<81>@<8c>\";
    for (int n = 0; n < test.length(); n++)
      System.out.println("char " + n + " = " + (int) test.charAt(n));
  }
}

[rickk@gw rickk]$ javac -encoding SJIS testjs.java 
[rickk@gw rickk]$ java testjs

the above submission was what it gave back - I didn't just copy it.
[19 Aug 2003 9:00] Mark Matthews
Are you setting this string via PreparedStatement.setString(), or is it a literal part of your query? I'm trying to mock up a testcase that actually fails and am getting nowhere fast at the moment (everything is working).
[19 Aug 2003 17:13] Rick Knowles
Yes - it's via prepared statement. The dev driver seems to work on that string but fail on others. The stable nightly snapshot of 8/17 was still giving that error.

Possibly JVM/environment related ? I was using Sun 1.4.2 for linux. I had LC_ALL and LANG set to ja_JP.SJIS .... 

Or maybe it's something in the way I compiled the server ?

If there's any tests you want me to run in the problem environment (I have 2 or 3 of them) just send them. If we get really stuck I can give you an account and let you log in on the actual box. Now might be a good time to take this "off list" if you like - mail me direct if you like.

Rick
[19 Aug 2003 17:13] Rick Knowles
Sorry - my direct email is r-knowles@so-netm3.co.jp
[10 Sep 2003 11:03] Mark Matthews
Have you tried a recent nightly version of 3.0 or 3.1? Your testcase works on my end with the latest 3.0.
[12 Dec 2003 8:40] [ name withheld ]
I experienced exactly the same error except with the database startup option –-default-character-set=big5. Some big5 characters, which the last byte containing the character of '\' and will cause this error. This error only happens when these buggy big5 characters are the last character of a string. 

Also, this error only happens in a 'PreparedStatement', the normal 'statement' (statement.executeUpdate('sql')) works fine.

I have created a sample program and source files at:
http://218.254.204.201/examples/jsp/CAL/page1.html

Platform:
Windows 2000 Professional
Java SDK 1.4
Mysql 4.0.16
Connector/J 
  version 3.0.9,
  version 3.1.0, 
  nightly version of mysql-connector-java-3.1-nightly-20031212-bin.jar

Thanks.
Kevin.
[12 Dec 2003 9:05] Mark Matthews
Kevin, from your source files, it looks like you're forcing the driver to use latin1 as the encoding:

useUnicode=true&characterEncoding=latin1

The escaping code won't work correctly if you've told the driver that the character encoding it should use is 'latin1'. You should either just let the driver figure it out from the server (i.e. remove 'useUnicode=true&characterEncoding=latin1' from your URL), or change the character encoding to the correct one.
[12 Dec 2003 19:32] [ name withheld ]
Thanks, Mark, that really helps. The reason I set the characterEncoding=latin1 because the big5 characters were not saved correctly with removing 'useUnicode=true&characterEncoding=latin1'. Finally, I figured out I had to convert the charset from "iso-8859-1" to "big5" in the JSP before passing the values as the statement parameters:

stmt.setString(1, new String(request.getParameter("text1").getBytes("iso-8859-1"), "big5"));

Thanks for the help.
[28 Mar 2014 11:48] Alexander Soklakov
I close it as Not a Bug.