Bug #2358 java app replaces unicode characters by ?
Submitted: 12 Jan 2004 9:32 Modified: 12 Jan 2004 9:40
Reporter: [ name withheld ]
Status: Can't repeat
Category:Connector / J Severity:S3 (Non-critical)
Version:3.0.9 OS:Microsoft Windows (ms win xp)
Assigned to: CPU Architecture:Any

[12 Jan 2004 9:32] [ name withheld ]
Description:
||Shriharih||

Windows XP, using MySQL 4.1 alpha and connecting via Connector/J 3.0.9.
The database and table are set to character set utf8, and I also append character set utf8 to the JDBC database URL (it says ucs2 is not supported).
Inserts and retrievals work fine, but when I send a Unicode string with Indic characters, the table stores only ?'s, which show up in the next select * and also in Control Center.

How to repeat:
||Shriharih||

Set up a database and table with utf8 encoding and connect using character set utf8 via Connector/J. Send a Unicode string with Indic (or, more generally, non-English) characters. They will be replaced by ?'s.
[12 Jan 2004 9:40] Mark Matthews
Not enough information was provided for us to be able
to handle this bug. Please re-read the instructions at
http://bugs.mysql.com/how-to-report.php

If you can provide more information, feel free to add it
to this bug and change the status back to 'Open'.

Thank you for your interest in MySQL.
[12 Jan 2004 9:42] Mark Matthews
Please tell us _what_ characters you are using, and show _exactly_ what JDBC connection URL you are using (preferably, all in a standalone testcase, using unicode escape sequence '\uNNNN' so that web browsers, etc, don't munge the characters in question).

'Non-english' characters is not specific enough. The JDBC driver unit tests cover this case (UTF-8 character encodings), so we need to know which _specific_ characters are not working for you.
[13 Jan 2004 3:10] [ name withheld ]
Thanks for the fast response.
I was using Forte for Java, but here's a standalone sample I wrote that gives me problems:

conn = DriverManager.getConnection("jdbc:mysql://localhost:3306/binding?useUnicode=true&characterEncoding=utf8&user=ram&password=shyam"); 
stmt = conn.createStatement(); 
stmt.execute("INSERT into binding.test1(name) values(\"test2 \u0918\u0919\u091c\u092d \");"); 

The statement was inserted without error, but when I browsed the data in Control Center I saw that the following had been inserted:
"test2 ���भ"
Note that the stored utf8 encodings of the first three Unicode Hindi characters are identical and wrong; the fourth Unicode Hindi character is correct.
In fact, if you retrieve the string and display it in a Java text field, it shows a box and a question mark for each of the first three characters; only the fourth character shows correctly.

It's the same problem with other Indic characters in the \u09xx range. Some of them store and retrieve correctly while some are obviously wrong and identically stored.
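
For reference, here is a minimal sketch that prints the UTF-8 bytes a correct encoder should produce for the four characters in the INSERT above, so they can be compared against what the table actually holds. It uses only the standard Java library and does not touch the database or Connector/J. The class name Utf8Check and the helper utf8Hex are made up for this illustration.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical helper class for illustration; not part of Connector/J.
public class Utf8Check {

    // Returns the UTF-8 bytes of s as space-separated uppercase hex pairs.
    static String utf8Hex(String s) {
        StringBuilder sb = new StringBuilder();
        for (byte b : s.getBytes(StandardCharsets.UTF_8)) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(String.format("%02X", b & 0xFF));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // The same four characters used in the INSERT statement.
        String sample = "\u0918\u0919\u091c\u092d";
        for (int i = 0; i < sample.length(); i++) {
            String ch = sample.substring(i, i + 1);
            System.out.printf("U+%04X -> %s%n", (int) sample.charAt(i), utf8Hex(ch));
        }
    }
}
```

Each character should encode to a different three-byte sequence (E0 A4 98, E0 A4 99, E0 A4 9C, E0 A4 AD), so identical stored bytes for the first three would mean the data was altered somewhere between the Java string and the table. This sketch does not show where that happens.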

[Control Center has a problem accepting input and displaying these characters, while Java at least does not have the display problem. I've filed that as a separate bug.]

- Aditya Gilra (bug-filer)
[14 Jan 2004 7:39] [ name withheld ]
||Shriharih||

No follow-up to the comment I wrote?
I think I've specified the Unicode characters giving problems clearly enough, with sample code. But the status still remains 'Can't repeat'.

Hope you'll reconsider since this bug is important to me.