Bug #42494 | latin1 over UTF-8 encoding fail with mysql-connector-java-5.1.7 | ||
---|---|---|---|
Submitted: | 30 Jan 2009 22:16 | Modified: | 11 Feb 2009 13:46 |
Reporter: | Wilkin Cheung | Email Updates: | |
Status: | Not a Bug | Impact on me: | |
Category: | Connector / J | Severity: | S2 (Serious) |
Version: | 5.0.19 | OS: | Linux (RedHat 4 Enterprise) |
Assigned to: | CPU Architecture: | Any | |
Tags: | character encoding, driver, jdbc, Latin1, UTF-8 |
[30 Jan 2009 22:16]
Wilkin Cheung
[30 Jan 2009 23:14]
Wilkin Cheung
The Mind is just a java servlet that runs on Tomcat
[3 Feb 2009 13:19]
Susanne Ebrecht
Hello Wilkin, I fear that this is not a bug. It is just a problem with understanding encoding handling. First of all you should know: JDBC always is using UTF8. You can't change this. Second: We have had a bug in older MySQL server 5.0 and 5.1 version according to converting utf8 data. So if you use an older MySQL version please update to actual server version (5.0.75 or 5.1.30). Now let me try to explain you encoding handling: Usually we are talking about three different kinds of encoding: Environment encoding: that is the encoding your environment is using. Means your browser, or your shell, or your terminal or whatever. For using JDBC this should be utf8. Means your browser should be set to utf8. Otherwise you will get gibberish because your browser wasn't able to interpret your keyboard right. Client encoding: this means in MySQL the variables character_set_client, character_set_result, character_set_connection. The client encoding should match to your environment encoding. So if your environment encoding is utf8 these values should be utf8 too. When you are using JDBC then the JDBC driver will handle this for you. You should not change this variables in your Java code by your own. When you will do it, you will get gibberish. Database encoding: here is meant the encoding of your database/table/column. In MySQL every single column is able to have a different encoding. It doesn't matter which encoding the column has as far as you never try to insert/select values that aren't possible in the character set. I mean here for example your column is latin1 and you try to get in/out hebrew signs. This won't work because latin1 don't know about hebrew. So when you column is latin1 and your environment/client is utf8 and you don't use non-latin1 signs then you just can type what you want and the system will fully automatic and transparent convert your input from utf8 into latin1. Just and example by using the letter ä: Environment=utf8 client encoding is set to utf8 Column is latin1 You make: insert into t values('ä'); Now you can test with select length() or hex() if the ä is stored right. Your column is latin1 so length result should be 1. Is the result 2 you have gibberish and either environment or client encoding is set wrong. For utf8 columns an ä should have length 2. Also here if length is 1 it is gibberish. The advantage from JDBC is that you just have to make sure that your environment (browser) is using utf8. JDBC will full automatic and transparent negotiate the client encoding with your mysql system. What happened at your szenario my guess is: your browser wasn't set to utf8 or Tomcat/apache wasn't set to utf8.
[6 Feb 2009 17:41]
Wilkin Cheung
Thanks for the comments - turns out the JDBC connection string characterEncoding should be set to "ISO8859_1". Even though "ISO8859_1" is equivalent to "latin1". Please close
[11 Feb 2009 13:46]
Tonci Grgin
Wilkin, I'm glad problem is solved.