Bug #24323 UTF8 table data converted as latin1 data when sent to client
Submitted: 14 Nov 2006 23:10 Modified: 3 Jan 2007 13:35
Reporter: Jørgen Thomsen Email Updates:
Status: Can't repeat Impact on me:
None 
Category:MySQL Server Severity:S2 (Serious)
Version:5.0.26 OS:Linux (Linux )
Assigned to: CPU Architecture:Any
Tags: conversion, result, utf8

[14 Nov 2006 23:10] Jørgen Thomsen
Description:
Using utf8 between server and client causes the conversion of utf8 data found in the database to be converted to utf8 as if it was latin1. There should be no conversion. First detected by displaying a row in phpMyAdmin. Subsequent debugging including hexdumps of received data verified the problem.

How to repeat:
mysql> SET NAMES 'utf8';
Query OK, 0 rows affected (0.00 sec)

mysql> SHOW SESSION VARIABLES LIKE 'character_set_client';
+----------------------+-------+
| Variable_name        | Value |
+----------------------+-------+
| character_set_client | utf8  |
+----------------------+-------+
1 row in set (0.00 sec)

mysql> SHOW SESSION VARIABLES LIKE 'character_set_connection';
+--------------------------+-------+
| Variable_name            | Value |
+--------------------------+-------+
| character_set_connection | utf8  |
+--------------------------+-------+
1 row in set (0.00 sec)

mysql> SHOW SESSION VARIABLES LIKE 'character_set_results';
+-----------------------+-------+
| Variable_name         | Value |
+-----------------------+-------+
| character_set_results | utf8  |
+-----------------------+-------+
1 row in set (0.00 sec)

mysql> SHOW FULL FIELDS FROM `kundeprodukter`;
 produkt      | varchar(80)  | utf8_bin   | NO   | PRI | NULL

mysql> SELECT produkt  FROM `kundeprodukter`  WHERE `nr` =197;
+--------------+
| produkt      |
+--------------+
| kælder.dk |
+--------------+
Real value should be kælder.dk

'latin1' data is displayed correctly i.e. converted to utf8 just as the à and | chars above which are the two chars of the utf8 value of the char 'æ'.

If the SET NAMES is omitted and the values left at 'latin1', no conversion is taking place and utf8 data is displayed correctly and latin1 data displayed incorrectly in an utf8 terminal.
[21 Nov 2006 16:08] Valeriy Kravchuk
Thank you for a problem report. Please, send the results of

echo $LANG

from the same window where you run mysql command line client. Send the results of

mysql --version

also.
[21 Nov 2006 17:53] Jørgen Thomsen
mysql  Ver 14.12 Distrib 5.0.26, for pc-linux-gnu (i686) using readline 5.0

LANG=en_US
[22 Nov 2006 8:07] Valeriy Kravchuk
In my case it is:

openxs@suse:~/work/mysql-4.0> echo $LANG
en_US.UTF-8

So, maybe, this is the problem? Does your command line window support utf8 character set at all? Please, check.
[22 Nov 2006 17:49] Jørgen Thomsen
If it only were so simple. 
The server is making a wrong conversion and sending the wrong data to the client. Please, focus on that instead.
The result data "6b c383 c2a6 6c64 6572 2e64 6b" is not the proper utf8 representation of "kælder.dk" but "kælder.dk". The server is converting utf8 source data into utf8 result data as if the source data is latin1. 

18:39:17.335727 IP jth.net.59415 > 192.168.1.20.mysql: P 39:98(59) ack 53 win 757 <nop,nop,timestamp 316790930 220131869>
        0x0000:  4508 006f 90a4 4000 4006 2676 c0a8 0102  E..o..@.@.&v....
        0x0010:  c0a8 0114 e817 0cea d2d7 9030 c959 9dc8  ...........0.Y..
        0x0020:  8018 02f5 83c8 0000 0101 080a 12e1 d892  ................
        0x0030:  0d1e f21d 3700 0000 0353 454c 4543 5420  ....7....SELECT.
        0x0040:  7072 6f64 756b 7420 2046 524f 4d20 606b  produkt..FROM.`k
        0x0050:  756e 6465 7072 6f64 756b 7465 7260 2020  undeprodukter`..
        0x0060:  5748 4552 4520 606e 7260 203d 3139 37    WHERE.`nr`.=197
18:39:17.336504 IP 192.168.1.20.mysql > jth.net.59415: P 53:167(114) ack 98 win 46 <nop,nop,timestamp 220135772 316790930>
        0x0000:  4508 00a6 3267 4000 4006 847c c0a8 0114  E...2g@.@..|....
        0x0010:  c0a8 0102 0cea e817 c959 9dc8 d2d7 906b  .........Y.....k
        0x0020:  8018 002e e173 0000 0101 080a 0d1f 015c  .....s.........\
        0x0030:  12e1 d892 0100 0001 0146 0000 0203 6465  .........F....de
        0x0040:  6606 6a74 686e 6574 0e6b 756e 6465 7072  f.jthnet.kundepr
        0x0050:  6f64 756b 7465 720e 6b75 6e64 6570 726f  odukter.kundepro
        0x0060:  6475 6b74 6572 0770 726f 6475 6b74 0770  dukter.produkt.p
        0x0070:  726f 6475 6b74 0c21 00f0 0000 00fd 8b50  rodukt.!.......P
        0x0080:  0000 0005 0000 03fe 0000 0200 0d00 0004  ................
        0x0090:  0c6b c383 c2a6 6c64 6572 2e64 6b05 0000  .k....lder.dk...
        0x00a0:  05fe 0000 0200                           ......
[3 Jan 2007 13:35] Valeriy Kravchuk
Sorry for a delay with this bug report. I was not able to repeat the behaviour described with latest 5.0.34-BK on Linux:

openxs@suse:~/dbs/5.0> bin/mysql -uroot test
Reading table information for completion of table and column names
You can turn off this feature to get a quicker startup with -A

Welcome to the MySQL monitor.  Commands end with ; or \g.
Your MySQL connection id is 4
Server version: 5.0.34-debug Source distribution

Type 'help;' or '\h' for help. Type '\c' to clear the buffer.

mysql> set names utf8;
Query OK, 0 rows affected (0.02 sec)

mysql> show variables like 'char%';
+--------------------------+--------------------------------------------+
| Variable_name            | Value                                      |
+--------------------------+--------------------------------------------+
| character_set_client     | utf8                                       |
| character_set_connection | utf8                                       |
| character_set_database   | latin1                                     |
| character_set_filesystem | binary                                     |
| character_set_results    | utf8                                       |
| character_set_server     | latin1                                     |
| character_set_system     | utf8                                       |
| character_sets_dir       | /home/openxs/dbs/5.0/share/mysql/charsets/ |
+--------------------------+--------------------------------------------+
8 rows in set (0.00 sec)

mysql> create table t24323u (produkt varchar(80) collate utf8_bin primary key);
Query OK, 0 rows affected (0.04 sec)

mysql> show full fields from t24323u;
+---------+-------------+-----------+------+-----+---------+-------+---------------------------------+---------+
| Field   | Type        | Collation | Null | Key | Default | Extra | Privileges                      | Comment |
+---------+-------------+-----------+------+-----+---------+-------+---------------------------------+---------+
| produkt | varchar(80) | utf8_bin  | NO   | PRI |         |       | select,insert,update,references |         |
+---------+-------------+-----------+------+-----+---------+-------+---------------------------------+---------+
1 row in set (0.01 sec)

mysql> insert into t24323u values ('kælder.dk');
Query OK, 1 row affected (0.05 sec)

mysql> select * from t24323u;
+------------+
| produkt    |
+------------+
| kælder.dk |
+------------+
1 row in set (0.00 sec)

Please, check if my test was correct. Please, inform about any difference with your exact test case.