Bug #69837 UTF8 charset is unusable
Submitted: 25 Jul 2013 9:37 Modified: 23 Jan 2014 14:10
Reporter: Alexander Yanushko Email Updates:
Status: Not a Bug Impact on me:
None 
Category:Connector / ODBC Severity:S2 (Serious)
Version:5.2.5 OS:Linux (Fedora 19)
Assigned to: Bogdan Degtyariov CPU Architecture:Any

[25 Jul 2013 9:37] Alexander Yanushko
Description:
It's impossible to insert/update UTF8 strings into a table when they are inline SQL statement. If strings are bound to placeholders there is no problem.
In version 5.2.1 there is no problem.

Here is a fragment of strace log from isql utility (unixODBC) when I tried to execute the following statement:
insert into AIDoc values(7, 'Петя', 0);

================================================================
socket(PF_NETLINK, SOCK_RAW, 9)         = 4
sendmsg(4, {msg_name(12)={sa_family=AF_NETLINK, pid=0, groups=00000000}, msg_iov(2)=[{"<\0\0\0d\4\1\0\0\0\0\0\0\0\0\0", 16}, {"insert into AIDoc values(7, '\320\237\320\265\321\202\321\217', 0);\0", 44}], msg_controllen=0, msg_flags=0}, 0) = 60
close(4)                                = 0
time(NULL)                              = 1374659942
time(NULL)                              = 1374659942
poll([{fd=3, events=POLLIN|POLLPRI}], 1, 0) = 0 (Timeout)
write(3, "\37\0\0\0\3set @@sql_select_limit=DEFAULT", 35) = 35
read(3, "\7\0\0\1\0\0\0\2\0\0\0", 16384) = 11
time(NULL)                              = 1374659942
poll([{fd=3, events=POLLIN|POLLPRI}], 1, 0) = 0 (Timeout)
write(3, "4\0\0\0\3insert into AIDoc values(7, '\303\220\302\237\303\220\302\265\303\221\302\202\303\221\302\217', 0);", 56) = 56
read(3, "\7\0\0\1\0\1\7\2\0\0\0", 16384) = 11
================================================================

Correct UTF8 bytes are written into log ('Петя'='\320\237\320\265\321\202\321\217'), but wrong are passed to MySQL server ('Петя'!='\303\220\302\237\303\220\302\265\303\221\302\202\303\221\302\217')

As a result wrong data are written into the table.

How to repeat:
The simplest way is to run isql utility and issue the following statement:
SELECT 'Петя';

SQL> SELECT 'Петя';
+---------+
| Петя|
+---------+
| �е|
+---------+
SQLRowCount returns 1
1 rows fetched
SQL>

Here is my odbc.ini file:
[build@effi-build-fc19 ~]$ cat /etc/odbc.ini
[MySQL]
Driver =  /usr/lib/libmyodbc5.so
Port = 3306
Server = localhost
Charset = UTF8

[mysql_alib]
Driver = MySQL
Database = alib
Server = localhost
User = root
Charset = utf8

And here is my odbcinst.ini file:
[build@effi-build-fc19 ~]$ cat /etc/odbcinst.ini
# Driver from the mysql-connector-odbc package
# Setup from the unixODBC package
[MySQL]
Description     = ODBC for MySQL
Driver          = /usr/lib/libmyodbc5.so
#Driver          = /usr/lib/libiodbc.so
#Setup          = /usr/lib/libiodbcinst.so
FileUsage       = 1

Suggested fix:
Just downgrade to mysql-connector-odbc-5.1.11-1.fc18.i686.rpm
[25 Jul 2013 9:46] Alexander Yanushko
Sorry, I didn't try version 5.2.1. In fact I downgraded mysql-connector-odbc to version 5.1.11 and it works fine.
[26 Jul 2013 9:33] Bogdan Degtyariov
Hi Alexander,

thank you for you interest in MySQL software and for reporting the problem in MySQL Connector/ODBC.

Please note that S1 is not the right severity for such issues. S1 corresponds to a complete loss of service, crashes and infinite hangs with no workarounds.
In your case the workaround is to either downgrade the driver or use some other charset such as CP1251 for Cyrillic.

I have two questions about the problem:

 1. What is the exact version of UnixODBC you used?

 2. The odbcinst.ini file shows the driver library name as libmyodbc5.so, which
    is not right for Connector/ODBC 5.2.5. It has to be either libmyodbc5w.so 
    (Unicode version) or libmyodbc5a.so (ANSI version). So, which version of
    Connector/ODBC 5.2.5 you used? Was it Unicode (w) or ANSI (a)?

Thanks.
[26 Jul 2013 10:54] Alexander Yanushko
Bogdan, добрый день ;-)

Regarding severity I have no option to swith to another charset because
1) DB server is running on another machine and DB has a lot of UTF8 data
2) Application server we use works with UTF8 and cannot be easily rewritten to support other charsets
The only option for me is driver downgrade, and I thought it is the reason to mark the bug as critical in _current_ version. Sorry, I don't quite familiar with your Severity policy.

Regarding your questions
1. I tried several ones without any differencies. Now I use default unixODBC version for Fedora 19:
[root@effi-build-fc19 ~]# rpm -qa | grep ODBC
unixODBC-devel-2.3.1-4.fc19.i686
unixODBC-2.3.1-4.fc19.i686

2. Yes, now I use downgraded version of Connector/ODBC:
[root@effi-build-fc19 ~]# rpm -qa | grep connector
mysql-connector-odbc-5.1.11-1.fc18.i686
[root@effi-build-fc19 ~]# rpm -qf /usr/lib/libmyodbc5.so
mysql-connector-odbc-5.1.11-1.fc18.i686
And with this version I have no problem with UTF8

With version 5.2.5 I have the following libraries:
[root@effi-build-fc19 ~]# rpm -qa | grep connector
mysql-connector-odbc-5.2.5-2.fc19.i686
[root@effi-build-fc19 ~]# ll /usr/lib/libmyodbc*
lrwxrwxrwx. 1 root root     14 Jul 26 14:43 /usr/lib/libmyodbc5.so -> libmyodbc5w.so
-rwxr-xr-x. 1 root root 328064 Jun 17 10:59 /usr/lib/libmyodbc5w.so
[root@effi-build-fc19 ~]# rpm -qf /usr/lib/libmyodbc5w.so
mysql-connector-odbc-5.2.5-2.fc19.i686

As you can see, now I have Unicode version (libmyodbc5.so is just a symlink to libmyodbc5w.so), and still the following result:
[root@effi-build-fc19 ~]# isql mysql_alib
+---------------------------------------+
| Connected!                            |
|                                       |
| sql-statement                         |
| help [tablename]                      |
| quit                                  |
|                                       |
+---------------------------------------+
SQL> select 'Вася';
+---------+
| Вася|
+---------+
| �а|
+---------+
SQLRowCount returns 1
1 rows fetched
SQL>

But during investigation of the problem I tried different configurations and packages, including libmyodbc5a.so. As well as iODBC driver manager instead of unixODBC - with no success.

Thank you, 
Alexander.
[30 Jul 2013 9:58] Bogdan Degtyariov
Hi Alexander,

thank you for your reply. I was able to repeat the bug.
Setting the status "Verified".
[1 Aug 2013 13:01] Bogdan Degtyariov
test case

Attachment: bug69837.c (text/plain), 3.31 KiB.

[1 Aug 2013 13:05] Bogdan Degtyariov
Hi Alexander,

Can you please try the ANSI version of Connector/ODBC 5.2.5 (libmyodbc5a.so)?
It worked well for me. I uploaded the C test case (bug69837.c).
Here is the output I had:

[dbs@dbs-pc-lnx6 app]$ ./bug69837
Connected!
Result: [Петя UTF8]
Done!

isql worked too:

[dbs@dbs-pc-lnx6 app]$ isql -v test525a
+---------------------------------------+
| Connected!                            |
|                                       |
| sql-statement                         |
| help [tablename]                      |
| quit                                  |
|                                       |
+---------------------------------------+
SQL> select 'Петя' as c1;
+---------+
| c1      |
+---------+
| Петя|
+---------+

The ANSI version of the driver does not do any conversion and sends/receives the data to/from the server as is.
Please let me know the results.
[1 Aug 2013 14:38] Alexander Yanushko
Hello Bogdan,

Unfortunately I'm unable to test ANSI version because of lack of RPM for FC19. It was my first idea to try ANSI version, but I didn't manage to find proper RPM. All available ANSI packages require libodbc.so.1, while unixODBC available for FC19 provide libodbc.so.2. 

[build@effi-build-fc19 ~]$ sudo rpm -Uhv mysql-connector-odbc-5.2.5-1.ansi.i386.rpm
error: Failed dependencies:
        libodbc.so.1 is needed by mysql-connector-odbc-5.2.5-1.ansi.i386
        libodbcinst.so.1 is needed by mysql-connector-odbc-5.2.5-1.ansi.i386

I would gladly test ANSI version if you helped me to get it. 

Best regards, yan@
[2 Aug 2013 7:20] Bogdan Degtyariov
Hi Alexander,

I am surprised to see that UNICODE and ANSI version in FC distribution are linked against different versions of UnixODBC (libodbc.so.1/2).

libodbc.so.2 is backward compatible with libodbc.so.1. Therefore, you can just create a symlink libodbc.so.2 -> libodbc.so.1 and then try installing the ANSI RPM package again.
Please let me know how it goes.
Thanks.
[2 Aug 2013 8:22] Alexander Yanushko
Hi Bogdan,

It works with ANSI vesion. But I had to make a couple of hacks.

1) I got mysql-connector-odbc-5.2.5-1.ansi.i386.rpm package and install id with --nodeps option
# rpm -Uhv mysql-connector-odbc-5.2.5-1.ansi.i386.rpm --nodeps

2) I made two symlinks for old versions of libodbc.so and libodbcinst.so
# ln -s /usr/lib/libodbc.so.2 /usr/lib/libodbc.so.1
# ln -s /usr/lib/libodbcinst.so.2 /usr/lib/libodbcinst.so.1

3) I made a symlink to libmyodbc5a.so (alternatively it was possible to modify odbcinst.ini)
# ln -s  /usr/lib/libmyodbc5a.so /usr/lib/libmyodbc5.so

4) And finally, I had to make a symlink for mysql.sock (I don't know why ANSI driver tries to open a different socket)
# ln -s /var/lib/mysql/mysql.sock /tmp/mysql.sock

After all these manipulations driver works fine:
[build@effi-build-fc19 lib]$ isql mysql_alib
+---------------------------------------+
| Connected!                            |
|                                       |
| sql-statement                         |
| help [tablename]                      |
| quit                                  |
|                                       |
+---------------------------------------+
SQL> select 'Петя';
+---------+
| Петя|
+---------+
| Петя|
+---------+
SQLRowCount returns 1
1 rows fetched
SQL>

Thank you for advises. Hope you finally will fix the problem with UNICODE version :-)

Good luck, yan@
[8 Aug 2013 8:02] Bogdan Degtyariov
Alexander,

The problem with the character conversion is a very complex one. I will try to explain why it is so complex and why it does not totally depend on the ODBC driver:

The ANSI version of Connector/ODBC is specially created for such occasions as you reported: you want to send the "raw" UTF8 strings to the server and get the same "raw" UTF8 result, which can be represented as SQLCHAR* type mapped to char* in C/C++.

UTF8 is a very special character set with variable-long characters that take 1 to 3 bytes. However, unlike to other multibyte character sets (such as UTF16, UTF32, UCS2 and so on) none of UTF8 characters contain \0 byte. This fact makes UTF8 compatible with programs and APIs that are using single-byte character sets and are treating \0 byte as a string termination character.

The UNICODE version of Connector/ODBC works with the Wide (W) character strings SQLWCHAR* defined as wchar_t* in Windows and unsigned short* in Linux/Unix (we assume the type definitiona provided by UnixODBC driver manager). These wide character strings can be treated as UTF16 strings that consist of fixed-length two-byte characters. One of two bytes of such characters can be \0 and it will not be treated as a string termination, only if both bytes are \0\0 they give the wide zero character \0000 terminating the wide string.

Many ODBC API functions that work with the string data have the ANSI and UNICODE (also known as Wide) versions such as:

ANSI:      SQLExecDirect (SQLHSTMT hstmt, SQLCHAR  *str, SQLINTEGER str_len)
UNICODE:   SQLExecDirectW(SQLHSTMT hstmt, SQLWCHAR *str, SQLINTEGER str_len)

Let us see what happens when your application is using the ANSI version of SQLExecDirect() function, but at the same time is loading the UNICODE ODBC driver (libmyodbc5w.so):

 1. The ANSI function takes the ANSI single-byte character string. As I explained above it can also be a UTF8 string.
 
 2. The driver manager knows that the driver is UNICODE and therefore the data must be converted from ANSI to UNICODE. It knows nothing about UTF8 data and therefore each byte of UTF8 non-latin charactes is treated and converted separately instead of a single UTF8 character. The result of this conversion is the data corruption and the UNICODE function SQLExecDirectW receives the incorrectly converted string.
 
 3. The corrupted input cannot produce the correct output, so the result is a complete rubbish.
 
 The real problem here is that with UnixODBC 2.3.1 you cannot connect to any other hosts but localhost and any other socket files but the hardcoded /var/lib/mysql/mysql.sock
[23 Jan 2014 14:10] Bogdan Degtyariov
This bug will be closed because the problem should be resolved by using the ANSI version of ODBC driver. We keep the ANSI version specially for such occasions.