Bug #32627 double recoding when inserting utf8 data in utf8 database
Submitted: 22 Nov 2007 15:51 Modified: 29 Feb 2008 1:19
Reporter: Susanne Ebrecht
Status: Not a Bug
Category: Connector / ODBC Severity: S3 (Non-critical)
Version: 5.1 OS: Any
Assigned to: Jim Winstead CPU Architecture:Any

[22 Nov 2007 15:51] Susanne Ebrecht
Description:
Hi,

Using server: 5.1.22
Using: libiodbc

I wanted to store utf8 data in a utf8 database.
When I look into the database, unfortunately the data looks as if it has been encoded twice.

I made tests with C-API and MyODBC. It works well with C-API but not with MyODBC.

I'll attach the C files; both are utf8 encoded.

For testing, I stored the German umlauts: 'äöüß' and in a second row: 'ÄÖÜSZ'. Of course the 'SZ' is always displayed ok.

I ran select length(column) on the columns. It doesn't matter whether the column is text, varchar or char. Using MyODBC the length is 16 for 'äöüß' and 14 for 'ÄÖÜSZ'.
Using the C-API the length has the correct value of 8 in both rows.

Looking into the trace file, the umlauts are correct there. I will add the trace file too.

Looking into the server log:

C-API:

071122 16:27:13    1 Connect    miracee@localhost on test
                    1 Query     set names utf8
                    1 Query     create database if not exists utf8capitest character set utf8 collate utf8_unicode_ci
                    1 Query     create table if not exists utf8capitest.utf8(id serial, t text, v varchar(100), c char(100), primary key(id))
                    1 Query     insert into utf8capitest.utf8(t, v, c) values ('äöüß','äöüß','äöüß')
                    1 Query     insert into utf8capitest.utf8(t, v, c) values ('ÄÖÜSZ','ÄÖÜSZ','ÄÖÜSZ')
                    1 Quit

MyODBC:

071122 16:27:20    2 Connect    miracee@localhost on test
                    2 Query     SET NAMES utf8
                    2 Query     SET character_set_results = NULL
                    2 Query     SET SQL_AUTO_IS_NULL = 0
                    2 Query     create database if not exists utf8test character set utf8 collate utf8_unicode_ci
                    2 Query     create table if not exists utf8test.utf8(id serial, t text, v varchar(100), c char(100), primary key(id))
                    2 Query     insert into utf8test.utf8(t, v, c) values('äöüÃ<U+009F>','äöüÃ<U+009F>','äöüÃ<U+009F>')
                    2 Query     insert into utf8test.utf8(t, v, c) values('Ã<U+0084>Ã<U+0096>Ã<U+009C>SZ','Ã<U+0084>Ã<U+0096>Ã<U+009C>SZ','Ã<U+0084>Ã<U+0096>Ã<U+009C>SZ')
                    2 Quit

I also tried adding 'Charset = utf8' to the DSN entry in odbc.ini, but that makes no difference. The result is the same with and without it.

How to repeat:
Test files attached.
Don't forget to change the DSN name in the ODBC test file.
I compiled the test files with:

ODBC testfile:

gcc -g -Wall  -I /PATH/include/ -o OUTPUTNAME utf8tests.c -L /PATH/lib -liodbc

C-API testfile:

gcc -g -Wall -I/PATH/include/ -o OUTPUTNAME utf8capitest.c -L /PATH/lib -lmysqlclient

Suggested fix:
It seems that MyODBC encodes the data twice here.
[22 Nov 2007 15:52] Susanne Ebrecht
C-API test

Attachment: utf8capitest.c (text/x-csrc), 1.84 KiB.

[22 Nov 2007 15:54] Susanne Ebrecht
trace file

Attachment: bugs.odbc5.mysql.20071122-162720.trace (application/octet-stream, text), 6.53 KiB.

[3 Dec 2007 18:08] Jess Balint
Issuing "set names utf8" is incorrect and future versions of the driver will prevent it from being executed.
[4 Dec 2007 8:54] Susanne Ebrecht
Jess,

I didn't do a "set names". The "set names utf8" that you see in the logs comes from the driver, not from me. The driver issues this "set names utf8" automatically.
[29 Feb 2008 1:19] Jim Winstead
This is due to iODBC. When used with a Unicode driver (like C/ODBC 5.1), it takes all arguments to non-W methods and does an ANSI-to-Unicode conversion, and then calls the W methods of the driver. If you want to use Unicode data with ODBC, the only relatively portable way to do it is by using SQLWCHAR.