Bug #4988 Japanese text gets truncated when inserted into utf8 column
Submitted: 11 Aug 2004 10:03 Modified: 25 Aug 2004 6:14
Reporter: Joseph Waller Email Updates:
Status: Not a Bug Impact on me:
None 
Category:MySQL Server Severity:S2 (Serious)
Version:4.1.3b-beta OS:Windows (Windows XP SP1)
Assigned to: Alexander Barkov CPU Architecture:Any

[11 Aug 2004 10:03] Joseph Waller
Description:
Japanese texts are trucated and removed from character strings being inserted into a UTF8 column. This did not occur in previous developement with version 4.1.1a-alpha.

How to repeat:
Here is copy of the text from the client window i used to reproduce this error.

mysql> create table w
    -> (
    ->   a char(50),
    ->   b char(50)
    -> );
Query OK, 0 rows affected (0.11 sec)

mysql> status;
--------------
mysql  Ver 14.5 Distrib 4.1.3b-beta, for Win95/Win98 (i32)

Connection id:          2
Current database:       jhotdata
Current user:           root@localhost
SSL:                    Not in use
Using delimiter:        ;
Server version:         4.1.3b-beta
Protocol version:       10
Connection:             localhost via TCP/IP
Client characterset:    utf8_general_ci
Server characterset:    utf8_general_ci
TCP port:               3306
Uptime:                 10 min 17 sec

Threads: 2  Questions: 77  Slow queries: 0  Opens: 9  Flush tables: 1  Open tables: 2  Queries per second avg: 0.125
--------------

mysql> show variables like 'character%';
+--------------------------+--------------------------+
| Variable_name            | Value                    |
+--------------------------+--------------------------+
| character_set_client     | utf8                     |
| character_set_connection | utf8                     |
| character_set_database   | utf8                     |
| character_set_results    | utf8                     |
| character_set_server     | utf8                     |
| character_set_system     | utf8                     |
| character_sets_dir       | C:\mysql\share\charsets/ |
+--------------------------+--------------------------+
7 rows in set (0.00 sec)

mysql> insert into w (a,b) values ('English', '日本語');
Query OK, 1 row affected, 1 warning (0.00 sec)

mysql> show warnings;
+---------+------+----------------------------------------+
| Level   | Code | Message                                |
+---------+------+----------------------------------------+
| Warning | 1265 | Data truncated for column 'b' at row 1 |
+---------+------+----------------------------------------+
1 row in set (0.00 sec)

mysql> select * from w;
+---------+------+
| a       | b    |
+---------+------+
| English |      |
+---------+------+
1 row in set (0.00 sec)

mysql> show full columns from w;
+-------+----------+-----------------+------+-----+---------+-------+---------------------------------+---------+
| Field | Type     | Collation       | Null | Key | Default | Extra | Privileges                      | Comment |
+-------+----------+-----------------+------+-----+---------+-------+---------------------------------+---------+
| a     | char(50) | utf8_general_ci | YES  |     | NULL    |       | select,insert,update,references |         |
| b     | char(50) | utf8_general_ci | YES  |     | NULL    |       | select,insert,update,references |         |
+-------+----------+-----------------+------+-----+---------+-------+---------------------------------+---------+
2 rows in set (0.00 sec)
[11 Aug 2004 21:00] Shuichi Tamagawa
I have the same problem on both Win XP SP1 and SuSE Linux 9.0.
[23 Aug 2004 2:50] Joseph Waller
Any luck finding a work-around?
[24 Aug 2004 11:04] Alexander Barkov
What client program do you use?

Can you please run this query:

SELECT HEX('the same string with Japanese characters you're trying to insert');

Thank you!
[25 Aug 2004 3:03] Joseph Waller
The test case I provided was done using the mysql.exe binary that was packaged with the 4.1.3b-beta server distribution (Windows). I did, however, get the exact same results when using Connector/ODBC (3.51.07). Here are the results from the queries you requested:

mysql> SELECT HEX('日本語');
+---------------+
| HEX('日本語') |
+---------------+
| 93FA967B8CEA  |
+---------------+
1 row in set (0.00 sec)

mysql> insert into w (a,b) values ('English', '日本語');
Query OK, 1 row affected, 1 warning (0.00 sec)

mysql> show warnings;
+---------+------+----------------------------------------+
| Level   | Code | Message                                |
+---------+------+----------------------------------------+
| Warning | 1265 | Data truncated for column 'b' at row 1 |
+---------+------+----------------------------------------+
1 row in set (0.00 sec)

mysql> select * from w;
+---------+------+
| a       | b    |
+---------+------+
| English |      |
+---------+------+
1 row in set (0.00 sec)

It seems to behave this way for all Japanese inserts, not just the string I attempted. Here is another example.

mysql> SELECT HEX('あいうえお');
+----------------------+
| HEX('あいうえお')    |
+----------------------+
| 82A082A282A482A682A8 |
+----------------------+
1 row in set (0.00 sec)

mysql> insert into w (a,b) values ('abcde', 'aiueoあいうえお');
Query OK, 1 row affected, 1 warning (0.00 sec)

mysql> show warnings;
+---------+------+----------------------------------------+
| Level   | Code | Message                                |
+---------+------+----------------------------------------+
| Warning | 1265 | Data truncated for column 'b' at row 1 |
+---------+------+----------------------------------------+
1 row in set (0.00 sec)

mysql> select * from w;
+--------------------+-------+
| a                  | b     |
+--------------------+-------+
| English            |       |
| abcde              | aiueo |
+--------------------+-------+
3 rows in set (0.00 sec)
[25 Aug 2004 6:14] Alexander Barkov
Sorry, but the bug system is not the appropriate forum for asking
support questions. Your problem is not the result of a bug.
For a list of more appropriate places to ask for help using MySQL
products, please visit http://www.mysql.com/support/

Thank you for your interest in MySQL.

Additional info:

This is not a bug. You use SJIS character set in the client programs, not UTF8.
But the server doesn't know about it. To make this work you should run
either "SET CHARACTER SET sjis;"  or "SET NAMES sjis;" in the beginning
of the sessions.

They are a bit different, please refer to our manual to choose the best for you.
But I think the first one should be better in your case.
[25 Aug 2004 12:59] Joseph Waller
My apologies.