Bug #4512 | Wrong Character Set handling | | |
---|---|---|---|
Submitted: | 12 Jul 2004 12:30 | Modified: | 28 Mar 2014 14:13 |
Reporter: | Heinz Doerr | | |
Status: | Can't repeat | | |
Category: | Connector / J | Severity: | S2 (Serious) |
Version: | 3.0.14 | OS: | Any (UCS or UTF enabled environment) |
Assigned to: | Alexander Soklakov | CPU Architecture: | Any |
[12 Jul 2004 12:30]
Heinz Doerr
[12 Jul 2004 16:29]
Mark Matthews
[snip]

> Filenames and Strings passed into the Connector/J API SHOULD HAVE NOTHING to do
> with the local settings. Strings and char[] in Java are UCS-2 and totally independent of
> any local settings (like the default locale, file.encoding, ...). Therefore the real translation,
> without taking care of broken and obsolete misuse of chars, should be always [snip]

Unfortunately, the string passed in LOAD DATA LOCAL INFILE _does_ have something to do with the local encoding when the MySQL client and the server do not have 'matching' character sets. In that case, when the driver transforms the string to bytes, the string sent to the server is corrupted, so the server returns the filename to load to the client in a corrupted state.

The reason the 'default' JVM character set is used here is that the LOAD DATA LOCAL INFILE statement itself doesn't respect character sets, so we can't use an encoding like UCS-2 and send the Java string 'opaquely'. The 'default' is a best guess and works in most situations: the JVM's default character set almost always allows the filename to be parsed correctly.

If you are going to 'mix' character sets (i.e. the JVM's character set differs from the MySQL server's and/or from the characters you place in your 'LOAD DATA LOCAL' statement), then we will have to expand the 'bugfix' to let you specify, as a connection property, the character set used to send LOAD DATA LOCAL queries to the server.

[snip]

> The bigger problem is that the JDBC Connector handles characters the way Sun
> introduced char back in the '90s with Java 1.1. Starting with 1.2 (I guess) these
> getBytes() and new String(bytes) methods got deprecated, but unfortunately Sun never
> removed them from the API. Now we have software out there which handles
> chars as unsigned bytes, which they are NOT. This has not been Java compliant since
> 1.2!!!
> The Connector/J has done a lot to patch this situation, which works sometimes,
> but basically generates quite some trouble if you use char[] and Strings as intended
> by the API from Sun version >= 1.2.

Neither String.getBytes() nor new String(byte[]) is deprecated, at least not in any documentation from Sun that I have access to. Could you please clarify your statement "The Connector/J has done a lot to patch this situation, which works sometimes, but basically generates quite some trouble if you use char[] and Strings as intended by the API from Sun version >= 1.2"? I'm not sure whether it is a comment or actually part of the bug report.
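To make the failure mode concrete, here is a minimal Java sketch (illustrative only, not Connector/J code) that encodes a filename with one character set and decodes it with another, standing in for a client and server whose character sets don't match. The class name, the `roundTrip` helper and the sample filename are hypothetical, and the choice of UTF-8 on the client side and ISO-8859-1 on the server side is an assumption picked only to show the corruption. Note that the no-argument `String.getBytes()` discussed in this thread encodes with the platform's default charset, which is why its result can differ between JVMs.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class FilenameCharsetDemo {
    // Hypothetical filename containing non-ASCII characters (a-umlaut, o-umlaut, u-umlaut).
    static final String FILENAME = "/tmp/daten_\u00e4\u00f6\u00fc.txt";

    // Encode with the "client" charset, then decode the same bytes with the
    // "server" charset, modelling a mismatched client/server pair (assumption).
    static String roundTrip(String s, Charset clientCharset, Charset serverCharset) {
        byte[] wire = s.getBytes(clientCharset);   // bytes the client would send
        return new String(wire, serverCharset);    // how the other side reads them
    }

    public static void main(String[] args) {
        // The no-argument getBytes() would use this platform-dependent default.
        System.out.println("JVM default charset: " + Charset.defaultCharset());

        String same = roundTrip(FILENAME, StandardCharsets.UTF_8, StandardCharsets.UTF_8);
        String mixed = roundTrip(FILENAME, StandardCharsets.UTF_8, StandardCharsets.ISO_8859_1);

        System.out.println("matching charsets:   " + same.equals(FILENAME));  // true
        System.out.println("mismatched charsets: " + mixed.equals(FILENAME)); // false
        System.out.println("corrupted name:      " + mixed);
    }
}
```

With matching character sets the name survives the round trip; with the mismatch each non-ASCII character turns into two Latin-1 characters (e.g. `ä` becomes `Ã¤`), which is the kind of corrupted filename a server would hand back to the client.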
[28 Mar 2014 14:13]
Alexander Soklakov
I am closing this report as "Can't repeat" because there has been no feedback for a long time and the codebase is too old. Please feel free to reopen it if the problem still exists in the current driver.