Bug #33967 Character encoding corruption in server or MyODBC, on newer Linux OSs
Submitted: 21 Jan 2008 22:37 Modified: 5 Feb 2008 12:31
Reporter: Ken Johanson Email Updates:
Status: Closed Impact on me:
None 
Category:Connector / ODBC Severity:S2 (Serious)
Version:< 5.1.22-rc-log OS:Linux (FC5 vs FC8)
Assigned to: CPU Architecture:Any
Tags: charsets, ODBC, unixODBC

[21 Jan 2008 22:37] Ken Johanson
Description:
The exact same mysql server and myodbc binaries, exhibit different behavior when tested on a server running on Fedora core 6 vs running on FC8 (presumably for a dependent lib change) -- the server OS version (libs) is the ONLY difference...

Served FC6, all is fine. Served on FC8 the following example command executed through MyODBC (viewed on any ODBC client) (tested PHP 5 and another odbc client), shows corrupt characters:

> check table test;

Specifically, the value cell for 'Op' and 'Msg_type' is corrupt:
mysql.test | che�� | sta��� | OK

I have thus far not examined the packet stream to identify what bytes really are transferred (I am not a char encoding guru).

Unixodbc's isql client is slightly different: the values are truncated and the column delimiting formatting is dramatically different between FC6 and FC8.

Marking serious as it may cause unpredictable and non-obvious character data corruption. I have not been able to reproduce outside of a "check table" test so far (for lack of testing/time), however it likely indicates a much bigger problem.

How to repeat:
Here is the server's config (same on both servers):

[mysqld]
set-variable=collation_server=utf8_unicode_ci
set-variable=character_set_server=utf8
set-variable=max_allowed_packet=16M
set-variable=max_connections=1000
set-variable=sql-mode=PIPES_AS_CONCAT,ANSI_QUOTES,NO_BACKSLASH_ESCAPES
basedir=/usr/local/mysql/
lower_case_table_names=1
datadir=/usr/local/mysql/data

Disabling the utf8 params did not change the symptom.

The odbc.ini DSN contains only standard database, host, user, password entries.
[22 Jan 2008 4:18] Valeriy Kravchuk
What exact version of MyODBC do you use?
[22 Jan 2008 5:33] Ken Johanson
The unixODBC version is 2.2.11-6.2.1; the MyODBC version is 3.51.23-0, I also had this problem with the last FC5-native version (something substantially older).
[25 Jan 2008 16:47] Tonci Grgin
Hi Ken and thanks for your report. What happens if you use 3.51.17 (for example)?
[25 Jan 2008 17:04] Ken Johanson
Tonci, I hope you don't mind if I speculate, that 3.51.17 will have the same problem, because a newer version (3.51.23-0) has the identical behavior, and the old Fedora Core 5 version (3.51.12-1.2.1) also has it. 3.51.17 is in-between. 

Unfort. I have limited access to the servers so cannot easily change/test versions. However I suspect it is partly unixODBC and/or glibc related, because the same MyODBC version does not have this problem on Fedora 8. However it does seem to indicate that MyODBC is relying on having a matching glib API (stream/char en/decoding?) on both the server and client. And if they don't match corruption could occur. If it were possible for MyODBC to filter the stream before it is sent to the native decoder, that could save corruption.
[25 Jan 2008 17:35] Jess Balint
Can you check the value of $LANG on both hosts? (under the environment that will run the ODBC applications)
[25 Jan 2008 17:42] Ken Johanson
$LANG is "en_US.UTF-8" on both the FC5 and FC8 clients.
[25 Jan 2008 17:55] Jess Balint
Can you try connecting with the command line client and running "Set names utf8" and then "select val,hex(val) from table where ...." for one of the rows that is returning incorrect data? Also, please mention the expected value.
[25 Jan 2008 18:10] Ken Johanson
I am only able (so far) to reproduce this problem using a "check table" command. I have not (had the time to) find a regular table that will reproduce; is there a way I can get the hex value for the fields of check table?

I can say, that the mysql client itself (being non-odbc) show normal characters; there is no corruption. This is also true of JDBC drivers. Not sure if this helps..
[1 Feb 2008 10:53] Tonci Grgin
Ken, there is a reason I asked for 3.51.16 to test with... You will probably be able to repeat the problem with *any ad-hoc* query and *any connector* and Bug#10491 is reason why:
" This problem represents grave issue for connectors. They can't work around it, at least not reliably. Ad hoc user queries like this do not allow connector to distinguish between "SHOW CREATE TABLE", where it should treat BIANRY as UTF8, and "SELECT varbinary_col FROM some_table", where it really should be binary..."

Now, your data is returned to ODBC client in form of 0x"your actual string" so you may try to parse leading 0x out where appropriate. Any other user-issued query should be safe from this problem providing you use CAST( ... AS ... construct.

I don't think this is a bug nor that it has anything to do with Linux OS... What is your opinion?
[1 Feb 2008 17:13] Ken Johanson
Interesting. I cant say that it is the same bug, though, here's why:

1) I also am/have-been using the MySQL-AB JDBC Driver mysql-connector-java-3.1.12 ( $Date: 2005-11-17 15:53:48 +0100) driver, and the same SHOW commands have not return malformed values for any server version I've tried.

2) The data mangling is not a caused by the server OR ODBC connecteor directly (IMO): becasue as I described ealrier ONLY the glib and unixODBC versions being different between the client/server cause this -- even when *identicval* server and my-odbc version installed.

3) I did discover a new behavior difference in char en/decoding on FC8. The *extact* same Sun JVM 1.5.11 binarys, installed on FC8, now throw a character encoding exception if given a bytestream with a 0xA9 (©) to java.io.FileReader (and $LANG=en_US.UTF-8 is on both FC5 and FC8). Presumaby the Sun FileReader uses native libs for char coding. InputStreamReader(is, "UTF-8") does NOT have this problem.

So this may not be a bug with the odbc driver per se, but a change in glibc. Still I think if data could be corrupted the odbc driver should fail fast (like the jvm is doing), or be patched to use a failsafe stream coder.

Or it may simply point to a bug or change in glibc that should be documented.
[5 Feb 2008 12:31] Tonci Grgin
Ken, every connector team is independent in their decisions, as specs are too different, thus the fact that something works or works differently in one connector does not mean that behavior can be transferred to other connector.

I don't see the bug here except for the server one I mentioned before and it is my opinion that's the root cause of your problems. So "SET NAMES ..." or any other trick will not help. You can try upgrading MySQL server and/or downgrading MyODBC, that's all. CAST() might have helped but you can't use it with ad-hoc queries.