Bug #17939 Wrong table format when using UTF8 strings
Submitted: 5 Mar 2006 15:57 Modified: 14 Sep 2006 5:48
Reporter: Christian Hammers (Silver Quality Contributor)
Status: Closed
Category: Server: Charsets Severity: S3 (Non-critical)
Version: 5.0 OS: Linux (Debian GNU/Linux Sid)
Assigned to: Alexander Barkov Target Version:

[5 Mar 2006 15:57] Christian Hammers
Description:
As reported in http://bugs.debian.org/355302 by Daniel van Eeden <daniel_e@dds.nl>:
------------------------

> > mysql> SELECT 'John Doe' as '__tañgè Ñãmé';
> > +-------------------+
> > | __tañgè Ñãmé |
> > +-------------------+
> > | John Doe          |
> > +-------------------+
> > 1 row in set (0.00 sec)

The column name is wrongly formatted, most probably because in UTF-8 the displayed
size of the string is not equal to the number of bytes in the string.

Further info from Daniel:

# grep character-set-server /etc/mysql/my.cnf
character-set-server    = utf8
# logout
$ grep default-character-set .my.cnf
default-character-set=utf8
$ echo ${LC_ALL}
en_US.utf8
$ locale -a | grep "en_US.utf8"
en_US.utf8
$ mysql
Welcome to the MySQL monitor.  Commands end with ; or \g.
Your MySQL connection id is 9 to server version: 5.0.18-Debian_8-log

Type 'help;' or '\h' for help. Type '\c' to clear the buffer.

mysql> SHOW VARIABLES like 'character_set_%';
+--------------------------+----------------------------+
| Variable_name            | Value                      |
+--------------------------+----------------------------+
| character_set_client     | utf8                       |
| character_set_connection | utf8                       |
| character_set_database   | utf8                       |
| character_set_results    | utf8                       |
| character_set_server     | utf8                       |
| character_set_system     | utf8                       |
| character_sets_dir       | /usr/share/mysql/charsets/ |
+--------------------------+----------------------------+
7 rows in set (0.01 sec)

How to repeat:
see above

Suggested fix:
In case we did not overlook any config option and this is really a bug, maybe use
wcwidth() etc. instead of strlen().
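
To make the suggestion concrete, here is a minimal sketch in Python rather than in the client's C code. It contrasts a byte count (what strlen() returns for a UTF-8 string) with an approximate count of terminal cells, the quantity wcwidth() reports per character. The helper name display_width is hypothetical, and its width rules (combining marks take zero cells, East Asian wide and fullwidth characters take two) are a simplification of what a real wcwidth() does.

```python
import unicodedata


def display_width(text: str) -> int:
    """Approximate the number of terminal cells needed to print text."""
    width = 0
    for ch in text:
        if unicodedata.combining(ch):
            continue  # combining marks are drawn on the previous cell
        # East Asian Wide / Fullwidth characters occupy two cells
        width += 2 if unicodedata.east_asian_width(ch) in ("W", "F") else 1
    return width


# Column alias from the report (__tañgè Ñãmé), written with escapes so the
# precomposed form of each accented letter is unambiguous.
name = "__ta\u00f1g\u00e8 \u00d1\u00e3m\u00e9"

byte_length = len(name.encode("utf-8"))  # analogous to strlen() on UTF-8 bytes
cell_width = display_width(name)

print(byte_length, cell_width)  # 17 12
```

The five accented letters each take two bytes in UTF-8 but only one cell, so a width derived from the byte count is five cells too wide for this alias.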
[6 Mar 2006 12:46] Hartmut Holzgraefe
Verified on 5.0 and 5.1 (I think it is a duplicate, but I can't find the original report).

SET NAMES utf8;
CREATE TABLE foo (`foobär` int);
INSERT INTO foo VALUES(1);
SELECT * FROM foo;

+---------+
| foobär |
+---------+
|       1 |
+---------+
1 row in set (0.01 sec)
[17 Apr 2006 9:40] Bugs System
A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/5004
[5 Jul 2006 23:45] Tim Smith
Hmm, I looked at this more, and I think the patch is incomplete.  It does fix the problem
with the column name, but the analogous problem exists with the column values as well.
Use a test case like:

drop table t1;
set names utf8;
create table t1 (fööbâr varchar(32));
insert into t1 values ('áéíóú');
select * from t1;

I get these results:

mysql> select * from t1;
+------------+
| fööbâr  |
+------------+
| áéíóú |
+------------+
1 row in set (0.00 sec)

I.e., field->maxlength is not reliable, and needs a ->numcells call to generate correct
output.

Timothy
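
The difference between a byte-based length and a cell count for the test case above can be sketched as follows. This is Python, not the client's source. cell_count is a stand-in for the ->numcells call mentioned above and simply counts code points, an assumption that holds for the precomposed Latin letters in this test case but not for wide or combining characters. render_table, byte_count and cell_count are hypothetical names.

```python
from typing import Callable, Sequence


def byte_count(text: str) -> int:
    """Length in UTF-8 bytes; over-counts accented letters."""
    return len(text.encode("utf-8"))


def cell_count(text: str) -> int:
    """Stand-in for a numcells-style call: one cell per code point (assumption)."""
    return len(text)


def render_table(header: str, values: Sequence[str],
                 measure: Callable[[str], int]) -> str:
    """Draw a one-column table whose width and padding come from measure()."""
    width = max(measure(s) for s in [header, *values])
    border = "+" + "-" * (width + 2) + "+"

    def row(s: str) -> str:
        return "| " + s + " " * (width - measure(s)) + " |"

    lines = [border, row(header), border]
    lines.extend(row(v) for v in values)
    lines.append(border)
    return "\n".join(lines)


header = "f\u00f6\u00f6b\u00e2r"             # fööbâr
values = ["\u00e1\u00e9\u00ed\u00f3\u00fa"]  # áéíóú

print(render_table(header, values, byte_count))  # misaligned, as in the report
print(render_table(header, values, cell_count))  # borders and rows line up
```

With byte_count the border is sized for 10 bytes while the rows occupy fewer cells, which reproduces the ragged right edge shown above. With cell_count every line has the same printed width.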
[19 Jul 2006 18:26] Tim Smith
Alexander,

Thanks for clarifying that the client should be started with --default-character-set=utf8;
with that, the patch does work as described.

Regards,

Timothy
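
One plausible way to see why the client's character set matters (an illustration, not taken from the patch): the same bytes yield different character counts depending on the character set used to interpret them. The sketch below uses latin-1 as the mismatched character set purely as an assumption; the thread does not say which default the client was using.

```python
value = "\u00e1\u00e9\u00ed\u00f3\u00fa"  # áéíóú, as inserted in the test case

raw = value.encode("utf-8")        # bytes as sent over a utf8 connection
as_utf8 = raw.decode("utf-8")      # client told the data is UTF-8
as_latin1 = raw.decode("latin-1")  # client assuming a single-byte charset

# Bytes on the wire, characters seen as UTF-8, characters seen as latin-1
print(len(raw), len(as_utf8), len(as_latin1))  # 10 5 10
```

A client that treats each byte as one character has no way to arrive at the five-cell width, regardless of how the width calculation itself is written.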
[2 Sep 2006 11:28] Tim Smith
Merged to 5.0 (will be in 5.0.25)

TODO: add test case, and merge to 5.1
[12 Sep 2006 3:34] Paul DuBois
Noted in 4.1.12, 5.0.25 changelogs.

For table-format output, mysql did not always calculate column
widths correctly for columns containing multi-byte characters in the
column name or contents.
[14 Sep 2006 5:48] Paul DuBois
Noted in 5.1.12 changelog.