Bug #33967 | Character encoding corruption in server or MyODBC, on newer Linux OSs | |
---|---|---|---
Submitted: | 21 Jan 2008 22:37 | Modified: | 5 Feb 2008 12:31
Reporter: | Ken Johanson | Email Updates: |
Status: | Closed | Impact on me: |
Category: | Connector / ODBC | Severity: | S2 (Serious)
Version: | < 5.1.22-rc-log | OS: | Linux (FC5 vs FC8)
Assigned to: | | CPU Architecture: | Any
Tags: | charsets, ODBC, unixODBC | |
[21 Jan 2008 22:37]
Ken Johanson
[22 Jan 2008 4:18]
Valeriy Kravchuk
What exact version of MyODBC do you use?
[22 Jan 2008 5:33]
Ken Johanson
The unixODBC version is 2.2.11-6.2.1; the MyODBC version is 3.51.23-0. I also had this problem with the last FC5-native version (something substantially older).
[25 Jan 2008 16:47]
Tonci Grgin
Hi Ken and thanks for your report. What happens if you use 3.51.17 (for example)?
[25 Jan 2008 17:04]
Ken Johanson
Tonci, I hope you don't mind if I speculate that 3.51.17 will have the same problem, because a newer version (3.51.23-0) shows the identical behavior, and the old Fedora Core 5 version (3.51.12-1.2.1) also has it; 3.51.17 is in between. Unfortunately I have limited access to the servers, so I cannot easily change or test versions. However, I suspect it is partly unixODBC- and/or glibc-related, because the same MyODBC version does not have this problem on Fedora 8. It does seem to indicate that MyODBC relies on having a matching glibc API (stream/char en/decoding?) on both the server and client, and if they don't match, corruption can occur. If it were possible for MyODBC to filter the stream before it is sent to the native decoder, that could prevent the corruption.
[25 Jan 2008 17:35]
Jess Balint
Can you check the value of $LANG on both hosts? (under the environment that will run the ODBC applications)
[25 Jan 2008 17:42]
Ken Johanson
$LANG is "en_US.UTF-8" on both the FC5 and FC8 clients.
[25 Jan 2008 17:55]
Jess Balint
Can you try connecting with the command line client and running "Set names utf8" and then "select val,hex(val) from table where ...." for one of the rows that is returning incorrect data? Also, please mention the expected value.
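(For illustration, a minimal sketch of this check, here run through a JDBC connection rather than the mysql command-line client; the connection URL, credentials, table, and column names are placeholders, not values from this report.)

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;

public class HexCheck {
    public static void main(String[] args) throws Exception {
        // Placeholder URL and credentials -- adjust for the actual server.
        Connection con = DriverManager.getConnection(
                "jdbc:mysql://localhost/test", "user", "password");
        Statement st = con.createStatement();
        st.execute("SET NAMES utf8");
        // Compare the decoded string against the raw bytes the server holds.
        ResultSet rs = st.executeQuery(
                "SELECT val, HEX(val) FROM some_table WHERE id = 1");
        while (rs.next()) {
            System.out.println(rs.getString(1) + " -> 0x" + rs.getString(2));
        }
        rs.close();
        st.close();
        con.close();
    }
}
```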
[25 Jan 2008 18:10]
Ken Johanson
I am only able (so far) to reproduce this problem using a "check table" command. I have not had the time to find a regular table that will reproduce it; is there a way I can get the hex value for the fields returned by "check table"? I can say that the mysql client itself (being non-ODBC) shows normal characters; there is no corruption. This is also true of JDBC drivers. Not sure if this helps...
[1 Feb 2008 10:53]
Tonci Grgin
Ken, there is a reason I asked for 3.51.16 to test with... You will probably be able to repeat the problem with *any ad-hoc* query and *any connector*, and Bug#10491 is the reason why: "This problem represents a grave issue for connectors. They can't work around it, at least not reliably. Ad hoc user queries like this do not allow the connector to distinguish between "SHOW CREATE TABLE", where it should treat BINARY as UTF8, and "SELECT varbinary_col FROM some_table", where it really should be binary..." Now, your data is returned to the ODBC client in the form 0x"your actual string", so you may try to parse the leading 0x out where appropriate. Any other user-issued query should be safe from this problem provided you use the CAST(... AS ...) construct. I don't think this is a bug, nor that it has anything to do with the Linux OS... What is your opinion?
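(A small illustration of the CAST(... AS ...) workaround described above; the table and column names are placeholders, and this assumes the application controls the query text.)

```java
public class CastWorkaround {
    // Ad-hoc form: the result metadata may report BINARY, so the connector
    // hands back raw bytes (the 0x... form mentioned above).
    static final String AD_HOC = "SELECT varbinary_col FROM some_table";

    // With an explicit CAST the server labels the column as character data,
    // so the connector can decode it using the connection character set.
    static final String WITH_CAST =
            "SELECT CAST(varbinary_col AS CHAR) FROM some_table";
}
```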
[1 Feb 2008 17:13]
Ken Johanson
Interesting. I can't say that it is the same bug, though; here's why:

1) I also am/have been using the MySQL-AB JDBC driver, mysql-connector-java-3.1.12 ($Date: 2005-11-17 15:53:48 +0100), and the same SHOW commands have not returned malformed values for any server version I've tried.

2) The data mangling is not caused by the server OR the ODBC connector directly (IMO), because, as I described earlier, ONLY the glibc and unixODBC versions being different between the client/server cause this -- even with *identical* server and MyODBC versions installed.

3) I did discover a new behavior difference in char en/decoding on FC8. The *exact* same Sun JVM 1.5.11 binaries, installed on FC8, now throw a character encoding exception if a byte stream containing 0xA9 (©) is given to java.io.FileReader (and $LANG=en_US.UTF-8 on both FC5 and FC8). Presumably the Sun FileReader uses native libs for character coding. InputStreamReader(is, "UTF-8") does NOT have this problem.

So this may not be a bug in the ODBC driver per se, but a change in glibc. Still, I think that if data can be corrupted the ODBC driver should fail fast (like the JVM is doing), or be patched to use a failsafe stream coder. Or it may simply point to a bug or change in glibc that should be documented.
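(A minimal sketch of the difference described in point 3: FileReader always decodes with the platform default charset, which the JVM derives from the native locale, while InputStreamReader can be pinned to UTF-8 explicitly. The file name is a placeholder.)

```java
import java.io.FileInputStream;
import java.io.FileReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.Reader;

public class ReaderCharsets {
    public static void main(String[] args) throws IOException {
        // Decodes using the JVM's default charset, which comes from the OS
        // locale ($LANG) -- so behavior can vary with the glibc/locale setup.
        Reader localeDependent = new FileReader("data.txt");
        localeDependent.close();

        // Decodes as UTF-8 regardless of what the host locale says.
        Reader explicitUtf8 = new InputStreamReader(
                new FileInputStream("data.txt"), "UTF-8");
        explicitUtf8.close();
    }
}
```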
[5 Feb 2008 12:31]
Tonci Grgin
Ken, every connector team is independent in their decisions, as the specs are too different, so the fact that something works, or works differently, in one connector does not mean that behavior can be transferred to another connector. I don't see a bug here except for the server one I mentioned before, and it is my opinion that that is the root cause of your problems. So "SET NAMES ..." or any other trick will not help. You can try upgrading the MySQL server and/or downgrading MyODBC, that's all. CAST() might have helped, but you can't use it with ad-hoc queries.