Bug #47192 Wrong table format when using UTF8 strings
Submitted: 8 Sep 2009 16:13 Modified: 24 Oct 2009 13:03
Reporter: Jean-Denis Muys Email Updates:
Status: Closed Impact on me:
None 
Category:MySQL Server: Documentation Severity:S3 (Non-critical)
Version:ALL OS:Any (Mac OS X 10.6)
Assigned to: Jon Stephens CPU Architecture:Any
Tags: utf8

[8 Sep 2009 16:13] Jean-Denis Muys
Description:
The mySQL client command doesn't handle formatting correctly when in presence of multi-byte characters in an UTF8 string.

This bug seems similar to, if not exactly the same as, bug report #17939 (http://bugs.mysql.com/bug.php?id=17939) which was closed in 2006. Since that bug is closed, I chose to open a new one. I apologize if this is not the correct procedure.

Here is a sample session that demonstrates the problem under Mac OS X 10.6:

JDMBook:~ jdmuys$ mysql -u root -p
Enter password: 
Welcome to the MySQL monitor.  Commands end with ; or \g.
Your MySQL connection id is 45
Server version: 5.1.37 MySQL Community Server (GPL)

Type 'help;' or '\h' for help. Type '\c' to clear the current input statement.

mysql> create database testUtf8 default character set utf8;
Query OK, 1 row affected (0.00 sec)

mysql> show databases;
+--------------------+
| Database           |
+--------------------+
| information_schema | 
| mysql              | 
| test               | 
| testUtf8           | 
| unibase16          | 
+--------------------+
5 rows in set (0.00 sec)

mysql> use testUtf8;
Database changed
mysql> set names utf8;
Query OK, 0 rows affected (0.00 sec)

mysql> show variables like 'character_set%';
+--------------------------+-----------------------------------------------------+
| Variable_name            | Value                                               |
+--------------------------+-----------------------------------------------------+
| character_set_client     | utf8                                                | 
| character_set_connection | utf8                                                | 
| character_set_database   | utf8                                                | 
| character_set_filesystem | binary                                              | 
| character_set_results    | utf8                                                | 
| character_set_server     | latin1                                              | 
| character_set_system     | utf8                                                | 
| character_sets_dir       | /usr/local/mysql-5.1.37-osx10.5-x86/share/charsets/ | 
+--------------------------+-----------------------------------------------------+
8 rows in set (0.00 sec)

mysql> create table test (nom varchar(256) primary key, `prénom` varchar(256));
Query OK, 0 rows affected (0.40 sec)

mysql> describe test;
+---------+--------------+------+-----+---------+-------+
| Field   | Type         | Null | Key | Default | Extra |
+---------+--------------+------+-----+---------+-------+
| nom     | varchar(256) | NO   | PRI | NULL    |       | 
| prénom | varchar(256) | YES  |     | NULL    |       | 
+---------+--------------+------+-----+---------+-------+
2 rows in set (0.00 sec)

mysql> insert into test values ('Hélène', 'Davôs');
Query OK, 1 row affected, 1 warning (0.00 sec)

mysql> select * from test;
+-----+---------+
| nom | prénom |
+-----+---------+
| H   | Davôs  | 
+-----+---------+
1 row in set (0.00 sec)

mysq                                               ;t;
mysql> 

How to repeat:
create database testUtf8 default character set utf8;
use testUtf8;
create table test (nom varchar(256) primary key, `prénom` varchar(256));
describe test;
insert into test values ('Hélène', 'Davôs');
select * from test;

Suggested fix:
A possible cause of the bug is the use of strlen() on a utf8 string or some such. This doesn't work as some characters in utf8 encoding use more than one byte.

If that is actually the cause of the bug, then a fix would be to use a utf8-aware string length function instead.
[8 Sep 2009 17:46] Valeriy Kravchuk
I can't repeat neither that older bug not this one with recent 5.1.40 from bzr. Look:

valeriy-kravchuks-macbook-pro:5.1 openxs$ bin/mysql -uroot test
Reading table information for completion of table and column names
You can turn off this feature to get a quicker startup with -A

Welcome to the MySQL monitor.  Commands end with ; or \g.
Your MySQL connection id is 1
Server version: 5.1.40-debug Source distribution

Type 'help;' or '\h' for help. Type '\c' to clear the current input statement.

mysql> SHOW VARIABLES like 'character_set_%';
+--------------------------+---------------------------------------------+
| Variable_name            | Value                                       |
+--------------------------+---------------------------------------------+
| character_set_client     | latin1                                      |
| character_set_connection | latin1                                      |
| character_set_database   | latin1                                      |
| character_set_filesystem | binary                                      |
| character_set_results    | latin1                                      |
| character_set_server     | latin1                                      |
| character_set_system     | utf8                                        |
| character_sets_dir       | /Users/openxs/dbs/5.1/share/mysql/charsets/ |
+--------------------------+---------------------------------------------+
8 rows in set (0.00 sec)

mysql> set names utf8;
Query OK, 0 rows affected (0.00 sec)

mysql> CREATE TABLE foo (`foobär` int);
Query OK, 0 rows affected (0.05 sec)

mysql> INSERT INTO foo VALUES(1);
Query OK, 1 row affected (0.00 sec)

mysql> SELECT * FROM foo;
+---------+
| foobär |
+---------+
|       1 |
+---------+
1 row in set (0.00 sec)

mysql> create database testutf8 charset=utf8;
Query OK, 1 row affected (0.00 sec)

mysql> use testutf8;
Database changed
mysql> SHOW VARIABLES like 'character_set_%';
+--------------------------+---------------------------------------------+
| Variable_name            | Value                                       |
+--------------------------+---------------------------------------------+
| character_set_client     | utf8                                        |
| character_set_connection | utf8                                        |
| character_set_database   | utf8                                        |
| character_set_filesystem | binary                                      |
| character_set_results    | utf8                                        |
| character_set_server     | latin1                                      |
| character_set_system     | utf8                                        |
| character_sets_dir       | /Users/openxs/dbs/5.1/share/mysql/charsets/ |
+--------------------------+---------------------------------------------+
8 rows in set (0.00 sec)

mysql>  create table test (nom varchar(256) primary key, `prénom` varchar(256));Query OK, 0 rows affected (0.05 sec)

mysql> describe test;
+---------+--------------+------+-----+---------+-------+
| Field   | Type         | Null | Key | Default | Extra |
+---------+--------------+------+-----+---------+-------+
| nom     | varchar(256) | NO   | PRI | NULL    |       |
| prénom | varchar(256) | YES  |     | NULL    |       |
+---------+--------------+------+-----+---------+-------+
2 rows in set (0.00 sec)

mysql>  insert into test values ('Hélène', 'Davôs');
Query OK, 1 row affected (0.00 sec)

mysql> select * from test;
+----------+---------+
| nom      | prénom |
+----------+---------+
| Hélène | Davôs  |
+----------+---------+
1 row in set (0.00 sec)

What am I doing wrong? Maybe you did something wrong? Please, check.
[8 Sep 2009 22:46] Jean-Denis Muys
Well, your output actually does show the bug for example you say you get:

mysql> SELECT * FROM foo;
+---------+
| foobär |
+---------+
|       1 |
+---------+

While the expected output should be

mysql> SELECT * FROM foo;
+---------+
| foobär  |
+---------+
|       1 |
+---------+

Or for my case, you get:

mysql> select * from test;
+----------+---------+
| nom      | prénom |
+----------+---------+
| Hélène | Davôs  |
+----------+---------+
1 row in set (0.00 sec)

While the expected output should be

mysql> select * from test;
+----------+---------+
| nom      | prénom  |
+----------+---------+
| Hélène   | Davôs   |
+----------+---------+
1 row in set (0.00 sec)
[9 Sep 2009 5:22] Sveta Smirnova
Thank you for the feedback.

Have you started client with --default-character-set=utf8 as was suggested in comment "[19 Jul 2006 18:26] Timothy Smith" in bug #17939?
[9 Sep 2009 6:21] Jean-Denis Muys
I hadn't noticed that suggestion. It works. Could mySQL do better without that option? I don't know.

Maybe this should be added somewhere in the documentation, like in the installation note for MacOS X (if that issue only arises with MacOS X).

Here is a test session with the option set:

JDMBook:~ jdmuys$ mysql --default-character-set=utf8 -u root -p
Enter password: 
Welcome to the MySQL monitor.  Commands end with ; or \g.
Your MySQL connection id is 48
Server version: 5.1.37 MySQL Community Server (GPL)

Type 'help;' or '\h' for help. Type '\c' to clear the current input statement.

mysql> use test;
Reading table information for completion of table and column names
You can turn off this feature to get a quicker startup with -A

Database changed
mysql> create table testutf8 (`prénom` varchar(256));
Query OK, 0 rows affected (0.08 sec)

mysql> describe testutf8;
+---------+--------------+------+-----+---------+-------+
| Field   | Type         | Null | Key | Default | Extra |
+---------+--------------+------+-----+---------+-------+
| prénom  | varchar(256) | YES  |     | NULL    |       | 
+---------+--------------+------+-----+---------+-------+
1 row in set (0.02 sec)

mysql> insert into testutf8 values ('Hélène');
Query OK, 1 row affected (0.00 sec)

mysql> select * from testutf8;
+----------+
| prénom   |
+----------+
| Hélène   | 
+----------+
1 row in set (0.00 sec)
[9 Sep 2009 7:34] Sveta Smirnova
Thank you for the feedback.

I agree this could be added somewhere in the documentation, most likely at http://dev.mysql.com/doc/refman/5.1/en/mysql-command-options.html#option_mysql_default-cha...
[16 Sep 2009 9:18] Jon Stephens
I'll take this one.
[24 Oct 2009 11:26] Jon Stephens
I thought this would be simple, but I'm just not getting it.

Can someone please suggest more exactly what it is that I'm supposed to add or change in the documentation. Preferably by describing the issue and the resolution in 1-2 sentences?

Thanks.
[24 Oct 2009 11:46] Jon Stephens
Disregard the previous comment -- I was just being stupid.
[24 Oct 2009 13:03] Jon Stephens
Thank you for your bug report. This issue has been addressed in the documentation. The updated documentation will appear on our website shortly, and will be included in the next release of the relevant products.