Bug #30250 Too wide fields when using UTF8 strings
Submitted: 6 Aug 2007 9:07 Modified: 14 Nov 2007 10:07
Reporter: Dmitry Samborskiy Email Updates:
Status: Verified Impact on me:
None 
Category:MySQL Server: Command-line Clients Severity:S4 (Feature request)
Version:5.0.45 OS:Linux
Assigned to: Assigned Account CPU Architecture:Any

[6 Aug 2007 9:07] Dmitry Samborskiy
Description:
Though one UTF-8 bug is resolved:

Bug #17939  	Wrong table format when using UTF8 strings

I still observe something strange with UTF-8 output on the mysql console.
Apparently, fields length is calculated in a wrong way:

mysql> SELECT 'Hello World!';
+--------------+
| Hello World! |
+--------------+
| Hello World! | 
+--------------+
1 row in set (0.00 sec)

# That's fine. But ('Привет!' == 'Hi!' in Russian):

mysql> SELECT 'Привет!';
+---------------+
| Привет!       |
+---------------+
| Привет!       | 
+---------------+
1 row in set (0.00 sec)

# Extra space may be annoying when fields are long:

mysql> SELECT 'Привет! Привет! Привет!';
+-------------------------------------------+
| Привет! Привет! Привет!                   |
+-------------------------------------------+
| Привет! Привет! Привет!                   | 
+-------------------------------------------+
1 row in set (0.00 sec)

Some info:

# rpm -q mysql mysql-server
mysql-5.0.27-1.fc6
mysql-server-5.0.27-1.fc6

# grep character-set-server /etc/my.cnf
character-set-server = utf8

mysql> SHOW VARIABLES like 'character_set_%';
+--------------------------+----------------------------+
| Variable_name            | Value                      |
+--------------------------+----------------------------+
| character_set_client     | utf8                       | 
| character_set_connection | utf8                       | 
| character_set_database   | utf8                       | 
| character_set_filesystem | binary                     | 
| character_set_results    | utf8                       | 
| character_set_server     | utf8                       | 
| character_set_system     | utf8                       | 
| character_sets_dir       | /usr/share/mysql/charsets/ | 
+--------------------------+----------------------------+
8 rows in set (0.00 sec)

How to repeat:
See above.

Suggested fix:
Don't know.
[6 Aug 2007 9:11] Sveta Smirnova
Thank you for the report.

But version 5.0.27 is quite old. Additionally I can not repeat described behaviour in my environment with current sources:

mysql>  SELECT 'Привет!';
+---------------+
| Привет! |
+---------------+
| Привет! | 
+---------------+
1 row in set (0.00 sec)

Please upgrade to current version 5.0.45, try with it and say us result.
[6 Sep 2007 23:00] Bugs System
No feedback was provided for this bug for over a month, so it is
being suspended automatically. If you are able to provide the
information that was originally requested, please do so and change
the status of the bug back to "Open".
[7 Sep 2007 8:21] Dmitry Samborskiy
With v.5.0.45 it's the same:

mysql> SELECT 'Привет!';
+---------------+
| Привет!       |
+---------------+
| Привет!       | 
+---------------+
1 row in set (0.00 sec)

mysql> SELECT 'Привет! Привет! Привет!';
+-------------------------------------------+
| Привет! Привет! Привет!                   |
+-------------------------------------------+
| Привет! Привет! Привет!                   | 
+-------------------------------------------+
1 row in set (0.00 sec)

Some info:

# echo $LANG
ru_RU.UTF-8

# rpm -q mysql mysql-server
mysql-5.0.45-1.fc6.remi
mysql-server-5.0.45-1.fc6.remi

# grep character-set-server /etc/my.cnf
character-set-server = utf8

mysql> SHOW VARIABLES like 'character_set_%';
+--------------------------+----------------------------+
| Variable_name            | Value                      |
+--------------------------+----------------------------+
| character_set_client     | utf8                       | 
| character_set_connection | utf8                       | 
| character_set_database   | utf8                       | 
| character_set_filesystem | binary                     | 
| character_set_results    | utf8                       | 
| character_set_server     | utf8                       | 
| character_set_system     | utf8                       | 
| character_sets_dir       | /usr/share/mysql/charsets/ | 
+--------------------------+----------------------------+
8 rows in set (0.00 sec)
[7 Sep 2007 8:28] Dmitry Samborskiy
P.S.

> But version 5.0.27 is quite old. Additionally I can not repeat 
> described behaviour in my
> environment with current sources:
>
> mysql>  SELECT 'Привет!';
> +---------------+
> | Привет! |
> +---------------+
> | Привет! | 
> +---------------+
> 1 row in set (0.00 sec)

In your case formatting is also broken. I think I also saw it until I set
character code settings properly. Check your settings:

# grep character-set-server /etc/my.cnf
character-set-server = utf8

mysql> SHOW VARIABLES like 'character_set_%';
...
[7 Sep 2007 8:42] Sveta Smirnova
screentshot

Attachment: bug30250.jpg (image/jpeg, text), 61.76 KiB.

[7 Sep 2007 8:43] Sveta Smirnova
Thank you for the feedback.

In my case HTML formatting corrupted mysql output. See attached screenshot.

Please indicate accurate version of MySQL package you use (file name).
[7 Sep 2007 8:47] Sveta Smirnova
If I understood correctly from your last comment you don't use our binaries. If so please try ours to find if problem is on our or on provider of mysql-5.0.45-1.fc6.remi binaries side.
[12 Sep 2007 6:25] Dmitry Samborskiy
> If I understood correctly from your last comment you don't use our binaries. 
> If so please try ours to find if problem is on our or on provider 
> of mysql-5.0.45-1.fc6.remi binaries side.

OK. On my other site (Fedora Core 4) I had to install 
RPMS built by MySQL AB.

But I see there exactly the same behavior (see the screenshot that I'll
attach.

The packages are:

# rpm -qa | grep MySQL | xargs rpm -qi

Name        : MySQL-server                 Relocations: (not relocatable)
Version     : 5.0.45                            Vendor: MySQL AB
Release     : 0.glibc23                     Build Date: Thu 05 Jul 2007 02:09:15 AM MSD
Install Date: Sat 08 Sep 2007 03:08:16 AM MSD      Build Host: rhas3-x86.mysql.com
Group       : Applications/Databases        Source RPM: MySQL-5.0.45-0.glibc23.src.rpm
Size        : 32779770                         License: GPL
Signature   : DSA/SHA1, Tue 10 Jul 2007 08:03:21 PM MSD, Key ID 8c718d3b5072e1f5
Packager    : MySQL Production Engineering Team <build@mysql.com>
URL         : http://www.mysql.com/
Summary     : MySQL: a very fast and reliable SQL database server
Description :
The MySQL(TM) software delivers a very fast, multi-threaded, multi-user,
and robust SQL (Structured Query Language) database server. MySQL Server
is intended for mission-critical, heavy-load production systems as well
as for embedding into mass-deployed software. MySQL is a trademark of
MySQL AB.

Copyright (C) 2000-2007 MySQL AB
This software comes with ABSOLUTELY NO WARRANTY. This is free software,
and you are welcome to modify and redistribute it under the GPL license.

The MySQL web site (http://www.mysql.com/) provides the latest
news and information about the MySQL software. Also please see the
documentation and the manual for more information.

This package includes the MySQL server binary (incl. InnoDB) as well
as related utilities to run and administrate a MySQL server.

If you want to access and work with the database, you have to install
the package "MySQL-client" as well!
Name        : MySQL-client                 Relocations: (not relocatable)
Version     : 5.0.45                            Vendor: MySQL AB
Release     : 0.glibc23                     Build Date: Thu 05 Jul 2007 02:09:15 AM MSD
Install Date: Sat 08 Sep 2007 03:08:10 AM MSD      Build Host: rhas3-x86.mysql.com
Group       : Applications/Databases        Source RPM: MySQL-5.0.45-0.glibc23.src.rpm
Size        : 712414                           License: GPL
Signature   : DSA/SHA1, Tue 10 Jul 2007 07:57:24 PM MSD, Key ID 8c718d3b5072e1f5
Packager    : MySQL Production Engineering Team <build@mysql.com>
URL         : http://www.mysql.com/
Summary     : MySQL - Client
Description :
This package contains the standard MySQL clients and administration tools.

For a description of MySQL see the base MySQL RPM or http://www.mysql.com
Name        : MySQL-shared                 Relocations: (not relocatable)
Version     : 5.0.45                            Vendor: MySQL AB
Release     : 0.glibc23                     Build Date: Thu 05 Jul 2007 02:09:15 AM MSD
Install Date: Sat 08 Sep 2007 03:08:09 AM MSD      Build Host: rhas3-x86.mysql.com
Group       : Applications/Databases        Source RPM: MySQL-5.0.45-0.glibc23.src.rpm
Size        : 3548552                          License: GPL
Signature   : DSA/SHA1, Tue 10 Jul 2007 08:05:05 PM MSD, Key ID 8c718d3b5072e1f5
Packager    : MySQL Production Engineering Team <build@mysql.com>
URL         : http://www.mysql.com/
Summary     : MySQL - Shared libraries
Description :
This package contains the shared libraries (*.so*) which certain
languages and applications need to dynamically load and use MySQL.
Name        : MySQL-devel                  Relocations: (not relocatable)
Version     : 5.0.45                            Vendor: MySQL AB
Release     : 0.glibc23                     Build Date: Thu 05 Jul 2007 02:09:15 AM MSD
Install Date: Sat 08 Sep 2007 03:08:12 AM MSD      Build Host: rhas3-x86.mysql.com
Group       : Applications/Databases        Source RPM: MySQL-5.0.45-0.glibc23.src.rpm
Size        : 17231601                         License: GPL
Signature   : DSA/SHA1, Tue 10 Jul 2007 08:02:21 PM MSD, Key ID 8c718d3b5072e1f5
Packager    : MySQL Production Engineering Team <build@mysql.com>
URL         : http://www.mysql.com/
Summary     : MySQL - Development header files and libraries
Description :
This package contains the development header files and libraries
necessary to develop MySQL client applications.

For a description of MySQL see the base MySQL RPM or http://www.mysql.com
[12 Sep 2007 6:27] Dmitry Samborskiy
Screenshot of gterm / FC4 / mysql RPMS from MySQL AB

Attachment: mysql-utf8-fmt.png (image/png, text), 36.94 KiB.

[25 Sep 2007 11:49] MySQL Verification Team
Thank you for the bug report. Verified on FC 6.0
[12 Oct 2007 13:55] Konstantin Osipov
Bar, it's very likely this is already fixed in .50. Could you please re-verify?
Thanks a lot.
[14 Nov 2007 7:37] Sveta Smirnova
There is similar bug #32271 repeatable with 5.0.50
[14 Nov 2007 10:07] MySQL Verification Team
Still happens on FC 6.0:

[miguel@skybr 5.0]$ bin/mysql -uroot --default-character-set=utf8
Welcome to the MySQL monitor.  Commands end with ; or \g.
Your MySQL connection id is 1
Server version: 5.0.52-debug Source distribution

Type 'help;' or '\h' for help. Type '\c' to clear the buffer.

mysql> SHOW VARIABLES like 'character_set_%';
+--------------------------+--------------------------------------------+
| Variable_name            | Value                                      |
+--------------------------+--------------------------------------------+
| character_set_client     | utf8                                       | 
| character_set_connection | utf8                                       | 
| character_set_database   | utf8                                       | 
| character_set_filesystem | binary                                     | 
| character_set_results    | utf8                                       | 
| character_set_server     | utf8                                       | 
| character_set_system     | utf8                                       | 
| character_sets_dir       | /home/miguel/dbs/5.0/share/mysql/charsets/ | 
+--------------------------+--------------------------------------------+
8 rows in set (0.02 sec)

mysql> SELECT 'Hello World!';
+--------------+
| Hello World! |
+--------------+
| Hello World! | 
+--------------+
1 row in set (0.00 sec)

mysql> SELECT 'Привет!';
+---------------+
| Привет!       |
+---------------+
| Привет!       | 
+---------------+
1 row in set (0.00 sec)

mysql> SELECT 'Привет! Привет! Привет!';
+-------------------------------------------+
| Привет! Привет! Привет!                   |
+-------------------------------------------+
| Привет! Привет! Привет!                   | 
+-------------------------------------------+
1 row in set (0.00 sec)

mysql>
[11 Jan 2008 13:46] Alexander Barkov
Full-width characters example

Attachment: fullwidth.jpg (image/jpeg, text), 40.59 KiB.

[11 Jan 2008 14:12] Alexander Barkov
"Privet" with --column-type-info

Attachment: fullwidth2.jpg (image/jpeg, text), 55.54 KiB.

[11 Jan 2008 14:24] Alexander Barkov
There is a problem which prevents to fully fix this bug.

Some characters can occupy two screen cells.
They are so called "full width forms". For example:
U+FF30 "FULLWIDTH LATIN CAPITAL LETTER P" uses two cells.

Please take a look into the screen-shot I attached.

There's no a way to quickly convert number-of-bytes (Max_length)
to number-of-screen-cells.

In utf8, these combinations are possible:

One-byte character can occupy one screen cell.
Two-byte character can occupy one screen cell.
Three-byte character can occupy one or two screen cells.

The only possible way to calculate exact table width for a result -
is to loop through all of the rows returned in MYSQL_RESULT and
calculate number of cells for each individual row, and remember
the maximum value.

Then we should return back to the very first row and run the second
loop, to do actual displaying of the results, padding short strings
with spaces, up to the maximum width found on the first loop.

Unfortunately, this double loop will slow down "mysql".

Possibly it should be an optional new feature,
with is disable by default, something like:

"mysql --nice-tables"
[11 Jan 2008 14:31] Alexander Barkov
Note, some other databases do not optimize trailing spaces as well.
For example, Oracle's "sqlplus" always allocates 18 screen cells
for a string consisting of 6 characters. 

[bar@bar:~]$NLS_LANG=American_America.UTF8 sqlplus

SQL*Plus: Release 10.2.0.1.0 - Production on Fri Jan 11 18:22:58 2008

Copyright (c) 1982, 2005, Oracle.  All rights reserved.

Enter user-name: system
Enter password:

Connected to:
Oracle Database 10g Express Edition Release 10.2.0.1.0 - Production

SQL> select unistr('\0410\0410\0410\0410\0410\0410') from dual;

UNISTR('\0410\0410
------------------
АААААА

SQL> select unistr('AAAAAA') from dual;

UNISTR('AAAAAA')
------------------
AAAAAA