MySQL Bugs: #32627: double recoding when inserting utf8 data in utf8 database

Bug #32627	double recoding when inserting utf8 data in utf8 database
Submitted:	22 Nov 2007 15:51	Modified:	29 Feb 2008 1:19
Reporter:	Susanne Ebrecht
Status:	Not a Bug
Category:	Connector / ODBC	Severity:	S3 (Non-critical)
Version:	5.1	OS:	Any
Assigned to:	Jim Winstead	CPU Architecture:	Any

Description:
Hi,

Using server: 5.1.22
Using: libiodbc

I wanted to store utf8 data in a utf8 database.
When I look into the database, the data unfortunately looks as if it has been encoded twice.

I made tests with C-API and MyODBC. It works well with C-API but not with MyODBC.

I'll attach the C files; both are utf8 encoded.

For testing, I stored the German umlauts 'äöüß' and, in a second row, 'ÄÖÜSZ'. Of course the 'SZ' is always displayed correctly.

I ran select length(column) on the columns. It doesn't matter whether the column is text, varchar or char. Using MyODBC the length is 16 for 'äöüß' and 14 for 'ÄÖÜSZ'.
Using the C-API the length has the correct value of 8 in both rows.
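The 16 and 14 byte lengths are what one would expect if each byte of the utf8 input were treated as a single-byte (Latin-1) character and then encoded to utf8 again. The following is a minimal sketch of that arithmetic, not code from the attached test files; the function name latin1_bytes_to_utf8 is hypothetical, and the strings are written as byte escapes so the result does not depend on the source file encoding.

```c
#include <stddef.h>

/*
 * Re-encode a NUL-terminated byte string as UTF-8, treating every input
 * byte as one Latin-1 character. Feeding it text that is already UTF-8
 * reproduces the "double encoding" effect.
 * Returns the number of bytes written (excluding the NUL), or (size_t)-1
 * if the output buffer is too small.
 */
size_t latin1_bytes_to_utf8(const char *in, char *out, size_t out_size)
{
    size_t n = 0;

    if (out_size == 0)
        return (size_t)-1;

    for (const unsigned char *p = (const unsigned char *)in; *p; ++p) {
        if (*p < 0x80) {
            /* ASCII stays a single byte. */
            if (n + 1 >= out_size)
                return (size_t)-1;
            out[n++] = (char)*p;
        } else {
            /* 0x80..0xFF becomes a two-byte UTF-8 sequence. */
            if (n + 2 >= out_size)
                return (size_t)-1;
            out[n++] = (char)(0xC0 | (*p >> 6));
            out[n++] = (char)(0x80 | (*p & 0x3F));
        }
    }
    out[n] = '\0';
    return n;
}
```

Called with the utf8 bytes of 'äöüß' ("\xC3\xA4\xC3\xB6\xC3\xBC\xC3\x9F", 8 bytes), it writes 16 bytes; called with the utf8 bytes of 'ÄÖÜSZ' ("\xC3\x84\xC3\x96\xC3\x9CSZ", also 8 bytes), it writes 14, because only the six non-ASCII bytes are doubled.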

Looking into the trace file, the umlauts are correct there. I will add the trace file too.

Looking into the server log:

C-API:

071122 16:27:13    1 Connect    miracee@localhost on test
                    1 Query     set names utf8
                    1 Query     create database if not exists utf8capitest character set utf8 collate utf8_unicode_ci
                    1 Query     create table if not exists utf8capitest.utf8(id serial, t text, v varchar(100), c char(100), primary key(id))
                    1 Query     insert into utf8capitest.utf8(t, v, c) values ('äöüß','äöüß','äöüß')
                    1 Query     insert into utf8capitest.utf8(t, v, c) values ('ÄÖÜSZ','ÄÖÜSZ','ÄÖÜSZ')
                    1 Quit

MyODBC:

071122 16:27:20    2 Connect    miracee@localhost on test
                    2 Query     SET NAMES utf8
                    2 Query     SET character_set_results = NULL
                    2 Query     SET SQL_AUTO_IS_NULL = 0
                    2 Query     create database if not exists utf8test character set utf8 collate utf8_unicode_ci
                    2 Query     create table if not exists utf8test.utf8(id serial, t text, v varchar(100), c char(100), primary key(id))
                    2 Query     insert into utf8test.utf8(t, v, c) values('Ã¤Ã¶Ã¼Ã<U+009F>','Ã¤Ã¶Ã¼Ã<U+009F>','Ã¤Ã¶Ã¼Ã<U+009F>')
                    2 Query     insert into utf8test.utf8(t, v, c) values('Ã<U+0084>Ã<U+0096>Ã<U+009C>SZ','Ã<U+0084>Ã<U+0096>Ã<U+009C>SZ','Ã<U+0084>Ã<U+0096>Ã<U+009C>SZ')
                    2 Quit

I also tried adding 'Charset = utf8' to the DSN entry in odbc.ini, but that makes no difference. The result is the same with and without it.

How to repeat:
Test files attached.
Don't forget to change the DSN name in the ODBC test file.
I compiled the test files with:

ODBC testfile:

gcc -g -Wall  -I /PATH/include/ -o OUTPUTNAME utf8tests.c -L /PATH/lib -liodbc

C-API testfile:

gcc -g -Wall -I/PATH/include/ -o OUTPUTNAME utf8capitest.c -L /PATH/lib -lmysqlclient

Suggested fix:
It seems that MyODBC encodes the data twice here.

C-API test

Attachment: utf8capitest.c (text/x-csrc), 1.84 KiB.

trace file

Attachment: bugs.odbc5.mysql.20071122-162720.trace (application/octet-stream, text), 6.53 KiB.

Issuing "set names utf8" is incorrect and future versions of the driver will prevent it from being executed.

Jess,

I didn't issue a "set names". The "set names utf8" that you see in the logs comes from the driver, not from me. The driver issues this "set names utf8" automatically.

This is due to iODBC. When used with a Unicode driver (like C/ODBC 5.1), it takes all arguments to non-W methods and does an ANSI-to-Unicode conversion, and then calls the W methods of the driver. If you want to use Unicode data with ODBC, the only relatively portable way to do it is by using SQLWCHAR.
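To pass Unicode data through the W entry points, the application first has to turn its utf8 bytes into wide characters itself. The following is only a sketch of that decoding step, not the driver's or iODBC's own conversion; the function name utf8_to_codepoints is hypothetical. It produces 32-bit code points; whether those can be copied one-to-one into an SQLWCHAR buffer depends on how wide SQLWCHAR is with your driver manager and platform, so check your headers before relying on it. For brevity the sketch does not reject overlong encodings or surrogate code points.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Decode a NUL-terminated UTF-8 string into Unicode code points.
 * Returns the number of code points written, or (size_t)-1 if the input
 * is malformed or the output array (cap elements) is too small.
 */
size_t utf8_to_codepoints(const char *in, uint32_t *out, size_t cap)
{
    const unsigned char *p = (const unsigned char *)in;
    size_t n = 0;

    while (*p) {
        uint32_t cp;
        int extra; /* number of continuation bytes still expected */

        if (*p < 0x80)                { cp = *p;        extra = 0; }
        else if ((*p & 0xE0) == 0xC0) { cp = *p & 0x1F; extra = 1; }
        else if ((*p & 0xF0) == 0xE0) { cp = *p & 0x0F; extra = 2; }
        else if ((*p & 0xF8) == 0xF0) { cp = *p & 0x07; extra = 3; }
        else return (size_t)-1;       /* stray continuation or invalid lead byte */
        ++p;

        while (extra-- > 0) {
            /* Every continuation byte must look like 10xxxxxx. */
            if ((*p & 0xC0) != 0x80)
                return (size_t)-1;
            cp = (cp << 6) | (*p & 0x3F);
            ++p;
        }

        if (n >= cap)
            return (size_t)-1;
        out[n++] = cp;
    }
    return n;
}
```

For the utf8 bytes of 'äöüß' this yields the four code points U+00E4, U+00F6, U+00FC and U+00DF, which is the form a W call expects, rather than eight separate single-byte characters.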