Bug #11614 | Clob sent as UTF-8 gets truncated if the UTF-8 encoding is longer than the string in chars | ||
---|---|---|---|
Submitted: | 28 Jun 2005 14:16 | Modified: | 17 Oct 2005 19:42 |
Reporter: | Roland Littwin | Email Updates: | |
Status: | Closed | Impact on me: | |
Category: | Connector / J | Severity: | S2 (Serious) |
Version: | 3.1.10 | OS: | Any (All) |
Assigned to: | Mark Matthews | CPU Architecture: | Any |
[28 Jun 2005 14:16]
Roland Littwin
[30 Jun 2005 9:29]
Vasily Kishkin
Could you please post a test case and the table definition here?
[30 Jun 2005 13:19]
Roland Littwin
Here is a possible test case.

1. Create the table and content:

```sql
CREATE TABLE `kmng`.`error` (
  `id` INTEGER UNSIGNED NOT NULL AUTO_INCREMENT,
  `text` TEXT NOT NULL,
  PRIMARY KEY(`id`)
) TYPE = MYISAM CHARACTER SET utf8 COLLATE utf8_general_ci;

insert into error (id,text) values (1,'');
```

2. Call the following method and check what it stores. With MySQL 4.1.12 the stored value is ööö instead of öööööö as expected.

```java
public void writeLine(Connection con) throws SQLException {
    Statement st = con.createStatement();
    ResultSet res = st.executeQuery("SELECT text FROM error WHERE id=1");

    if (res.next()) {
        Clob c = res.getClob(1);
        c.truncate(0);
        c.setString(1, "öööööö");

        PreparedStatement stmt = con.prepareStatement(
                "UPDATE error SET text = ? WHERE id=1");
        stmt.setClob(1, c);
        stmt.executeUpdate();
        stmt.close();
    }

    res.close();
    st.close();
}
```
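The symptom above follows from a char-count/byte-count mismatch: "ö" is one Java char but two bytes in UTF-8, so a buffer sized by character count keeps only half of the encoded value. A minimal standalone sketch (illustration only, not driver code; the class name is made up) reproduces the halving:

```java
import java.nio.charset.StandardCharsets;

// Why the Clob value is cut in half: "ö" occupies one Java char but
// two bytes in UTF-8, so sizing a byte buffer by character count
// keeps only the first half of the encoded string.
public class ClobTruncationDemo {
    public static void main(String[] args) {
        String value = "öööööö";
        byte[] utf8 = value.getBytes(StandardCharsets.UTF_8);

        System.out.println("chars: " + value.length()); // 6
        System.out.println("bytes: " + utf8.length);    // 12

        // Copy only value.length() bytes, as the buggy code path does:
        byte[] truncated = new byte[value.length()];
        System.arraycopy(utf8, 0, truncated, 0, truncated.length);

        String result = new String(truncated, StandardCharsets.UTF_8);
        System.out.println(result); // ööö
    }
}
```

The first 6 of the 12 encoded bytes decode back to exactly three "ö" characters, matching the reported result.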
[4 Jul 2005 15:49]
Vasily Kishkin
Tested on Win 2000 SP4, JDK 1.5, Connector/J 3.1.8. Test case is attached.
[4 Jul 2005 15:49]
Vasily Kishkin
Test case in UTF-8
Attachment: test.java (text/java), 940 bytes.
[5 Jul 2005 15:06]
Roland Littwin
The first fix suggestion does not work correctly for Clobs with more than 8192 bytes and non-ASCII characters. This is because getBytes() is called in a loop over 8192-character chunks, so in the last chunk, which is shorter than 8192 characters, the unused characters left over from the previous iteration could also inflate the byte array created by the encoding. The new fix converts only the first `length` characters, so the problem does not occur.

Fix suggestion that also works for Clobs > 8192 bytes and is also faster :-)

```java
public static final byte[] getBytes(char[] c,
        SingleByteCharsetConverter converter, String encoding,
        String serverEncoding, int offset, int length,
        boolean parserKnowsUnicode) throws SQLException {
    try {
        byte[] b = null;

        if (converter != null) {
            b = converter.toBytes(c, offset, length);
        } else if (encoding == null) {
            byte[] temp = new String(c).getBytes();
            b = new byte[length];
            System.arraycopy(temp, offset, b, 0, length);
        } else {
            // Copy only the used part, so stale chars from the
            // previous iteration don't disturb the result
            String s = new String(c, 0, length);
            byte[] temp = s.getBytes(encoding);

            // the encoded form may be longer than the char count
            // for non-ASCII characters
            length = temp.length; // new length!
            b = new byte[length];
            System.arraycopy(temp, offset, b, 0, length);

            if (!parserKnowsUnicode
                    && (encoding.equalsIgnoreCase("SJIS") //$NON-NLS-1$
                        || encoding.equalsIgnoreCase("BIG5") //$NON-NLS-1$
                        || encoding.equalsIgnoreCase("GBK"))) { //$NON-NLS-1$
                if (!encoding.equalsIgnoreCase(serverEncoding)) {
                    b = escapeEasternUnicodeByteStream(b, s, offset, length);
                }
            }
        }

        return b;
    } catch (UnsupportedEncodingException uee) {
        throw new SQLException(Messages.getString("StringUtils.10") //$NON-NLS-1$
                + encoding + Messages.getString("StringUtils.11"), //$NON-NLS-1$
                SQLError.SQL_STATE_ILLEGAL_ARGUMENT);
    }
}
```
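The stale-chunk effect described above can be shown in isolation. In this minimal sketch (a standalone illustration with a hypothetical 8-char buffer standing in for the driver's 8192-char buffer), encoding the whole reused buffer picks up leftover characters from the previous pass, while encoding only the first `length` characters, as in the patch, does not:

```java
import java.nio.charset.StandardCharsets;

// Sketch of the chunking bug: a reused char buffer's tail still holds
// characters from the previous read, so encoding the whole buffer
// produces too many bytes for the final, shorter chunk.
public class StaleChunkDemo {
    public static void main(String[] args) {
        char[] buffer = new char[8];             // stands in for the 8192-char buffer
        "öööööööö".getChars(0, 8, buffer, 0);    // previous iteration filled it
        "ab".getChars(0, 2, buffer, 0);          // last chunk: only 2 chars are valid
        int length = 2;

        // Buggy: encodes all 8 chars, including 6 stale "ö"s (2 bytes each in UTF-8)
        byte[] buggy = new String(buffer).getBytes(StandardCharsets.UTF_8);
        // Patched: encodes only the first `length` chars
        byte[] fixed = new String(buffer, 0, length).getBytes(StandardCharsets.UTF_8);

        System.out.println("buggy bytes: " + buggy.length); // 14
        System.out.println("fixed bytes: " + fixed.length); // 2
    }
}
```

The 14-byte result (2 valid bytes plus 12 stale ones) is exactly the kind of inflated array the comment reports for the last chunk of a long UTF-8 Clob.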
[20 Jul 2005 15:02]
Arnaud de Bossoreille
I encountered the same problem: a text field gets corrupted because it contains UTF-8 characters and is longer than 8192 characters. The problem occurs with at least the 3.1.8 and 3.1.10 versions, but not with 3.2.0-alpha. The last fix works fine, but I think patching the other cases of the surrounding condition would be better (I put a patch here: http://esca.idm.fr/~bozo/mysql/mysql-clob-utf8.patch). It would be nice to have a stable release with this correction. Regards, Arnaud.