Bug #6381 Doesn't support Chinese
Submitted: 2 Nov 2004 8:59
Modified: 3 Dec 2004 12:48
Reporter: Juliau Gong
Status: Closed
Category: MySQL Query Browser
Severity: S2 (Serious)
Version: 1.1.1 gamma
OS: Windows (WinXP Pro SP2)
Assigned to: Mike Lischke
CPU Architecture: Any

[2 Nov 2004 8:59] Juliau Gong
Description:
the result can't display Chinese.

How to repeat:
When the result contains Chinese, it can't be displayed.
[3 Nov 2004 14:09] Michael G. Zinner
Which version of the MySQL server are you using? Have you chosen fonts that can display Chinese characters, like MS Arial Unicode?
[4 Nov 2004 9:10] Juliau Gong
I use MySQL Server 4.1.7, and I am sure that I used the correct font, and the MySQL Control Center can display Chinese correctly.

By the way, MySQL Connector/Net doesn't support Chinese either.
[5 Nov 2004 14:09] Mike Lischke
Can you enter Chinese characters into the script editor in Query Browser? You might have to download the latest version (1.1.0) to get Chinese input enabled.
[15 Nov 2004 10:10] Mike Lischke
Not enough information was provided for us to be able
to handle this bug. Please re-read the instructions at
http://bugs.mysql.com/how-to-report.php

If you can provide more information, feel free to add it
to this bug and change the status back to 'Open'.

Thank you for your interest in MySQL.
[26 Nov 2004 6:15] Juliau Gong
Dear Michael G. Zinner,

I have downloaded the latest version, 1.1.1, but I still can't input Chinese, even though I have set the correct font.

When I input Chinese in the script editor, only half of each Chinese character is displayed.
[26 Nov 2004 7:49] Mike Lischke
Dear Juliau,

Would it be possible for you to attach a screenshot to this bug so I can see how it looks? Additionally, could you please provide a step-by-step description of how to reproduce the problem? That should include what you do, what exactly you expect, and what exactly happens instead.

Thank you,

Mike
[26 Nov 2004 8:40] Juliau Gong
can't display Chinese

Attachment: mysql.jpg (image/pjpeg, text), 133.48 KiB.

[26 Nov 2004 8:44] Juliau Gong
Dear Mike Lischke,

I have uploaded the screenshot. I found that I can enter Chinese into the script editor, but when the result includes Chinese, it can't be displayed.
[29 Nov 2004 10:56] Mike Lischke
From the screenshot I gather that the text is already in the database, so I wonder how you got it in there. Have you upgraded the database? From what I can see, it looks very much like the encoding of the text is messed up.

Could you please dump this particular table (which is seen in the screenshot) and attach the dump to this bug entry too?

Mike
[29 Nov 2004 13:20] Juliau Gong
the dump script for the table

Attachment: user.sql (text/plain), 5.25 KiB.

[29 Nov 2004 13:24] Juliau Gong
Dear Mike Lischke,

I have uploaded the dump file for that table. The database has not been upgraded; it is a new database under MySQL Server 4.1.7. I can show and edit it correctly in MyCC, but MySQL Query Browser can't display it correctly.

Thanks.

Juliau
[29 Nov 2004 23:02] Arik Kfir
Hello everyone,

I've encountered the same problem, only with Hebrew. 

Server software:
Red hat 8 linux, with hebrew codepage support (standard)
MySQL 4.1.7

Database configuration:
InnoDB storage with default encoding set to utf8

Client software:
Windows XP SP2
Region & locale set to Hebrew

Test case 1 - JDBC (written using IntelliJ - just in case you want to REALLY reproduce..):
1. Small Java program which inserts a record with unicode hebrew characters into the db
2. The program then retrieves that same record and displays it on screen
3. Program also displays literal hebrew characters
Result:
literal characters displayed correctly
retrieved data displays "?" instead of hebrew characters.

Test case 2 - using Query Browser 1.1.1:
1. Configure query browser to use hebrew fonts (tried David, Arial and Tahoma with script set to hebrew)
2. Insert record into table with hebrew characters - characters displayed CORRECTLY on screen (font supports hebrew)
3. Update to database
4. Re-fetch record
result: query browser displays "?" instead of hebrew data

Here is an output of the SHOW CREATE DATABASE:
mysql> show create database homedvlp;
+----------+-------------------------------------------------------------------+
| Database | Create Database                                                   |
+----------+-------------------------------------------------------------------+
| homedvlp | CREATE DATABASE `homedvlp` /*!40100 DEFAULT CHARACTER SET utf8 */ |
+----------+-------------------------------------------------------------------+
1 row in set (0.00 sec)

Here is an output of the SHOW CREATE TABLE (table users is my table, not the one from mysql db):
mysql> show create table users;
| users | CREATE TABLE `users` (
  `id` int(11) NOT NULL auto_increment,
  `username` varchar(45) NOT NULL default '',
  `password` varchar(45) NOT NULL default '',
  `display_name` varchar(45) NOT NULL default '',
  `version` int(11) NOT NULL default '0',
  PRIMARY KEY  (`id`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8 

I removed the "----" and such to save space (they're long..)

If there's more information you need - please let me know...

10x!
[29 Nov 2004 23:22] Arik Kfir
I also tried MyCC (for testing purposes) and like Juliau says, MyCC does work, but I suspect it does it in a strange manner. When I insert a record in MyCC with Hebrew data, and then refetch that record, it works. But, only in MyCC. If I query the same record (inserted with MyCC) in either JDBC or Query Browser, I don't see question marks ("?") anymore, but some garbled data.

I wonder - is there some configuration parameter one should set when connecting to a MySQL db using JDBC? I suspect MyCC does so - but not the right parameter. I think MyCC uses the locale of the system (windows-1255 for hebrew in my case) instead of Unicode. It probably reads it using that same locale so it works for MyCC but not for unicode clients such as the JDBC driver. Does query-browser do anything like that?
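
A minimal sketch of the mismatch suspected above, written in Python rather than in any of the tools discussed. It assumes the locale-based client writes windows-1255 (cp1255) bytes and that another reader interprets those same bytes as latin1; the sample word is hypothetical.

```python
# Illustration only: the same bytes read with two different encodings.
text = "\u05e9\u05dc\u05d5\u05dd"  # hypothetical Hebrew sample value

# A locale-based client encodes with the system code page (cp1255 here).
raw = text.encode("cp1255")

# Reading the bytes back with the same code page round-trips cleanly.
same_locale = raw.decode("cp1255")

# A reader that assumes latin1 gets unrelated accented letters instead.
as_latin1 = raw.decode("latin-1")

print(same_locale == text)  # True
print(as_latin1 == text)    # False
```

This would explain why data written and read by the same locale-based client looks fine, while a Unicode client sees garbled text.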
[29 Nov 2004 23:32] Arik Kfir
I've found a workaround for using JDBC (or perhaps I should say the solution) - Juliau - see if you can reproduce this:

When connecting using JDBC, I added the properties 'useUnicode' and 'characterEncoding' to the JDBC URL like this: "&useUnicode=true&characterEncoding=UTF-8" to the end of the JDBC URL and it solved the problem when using JDBC.
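
The URL change described above can be sketched as a small string helper. This is Python used purely for illustration, not a JDBC program; the helper name, host, port and user are hypothetical, and only the two property names come from this thread.

```python
def with_unicode_params(jdbc_url: str, encoding: str = "UTF-8") -> str:
    """Append useUnicode/characterEncoding to a JDBC URL string."""
    # Use '&' if the URL already has parameters, otherwise start them with '?'.
    separator = "&" if "?" in jdbc_url else "?"
    return f"{jdbc_url}{separator}useUnicode=true&characterEncoding={encoding}"


# Hypothetical connection details, shown only to make the result concrete.
base = "jdbc:mysql://localhost:3306/homedvlp?user=example"
url = with_unicode_params(base)
print(url)
```

The resulting string is what would be passed to the driver as the connection URL.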

My guess is that Query Browser does NOT do this, and hence, the question marks in Query Browser. Could someone from querybrowser dev team confirm this?
[30 Nov 2004 5:47] Juliau Gong
Dear Arik,

I think so, but I don't know how to use JDBC to connect to MySQL. Can you give me a small sample program to test with?

thanks.
[30 Nov 2004 5:49] Juliau Gong
tested with the latest version
[30 Nov 2004 10:05] Mike Lischke
Juliau,

Your case seems to be caused by the fact that you are using latin1 as the charset for your table. Actually the content seems to be an ANSI encoding (e.g. Big 5, GB...). In the best case this can be read only on a system supporting the exact same encoding that was used to store the text. Can you recreate the table with utf8 and insert your data again? It should work then.

Arik,

In your case I have no idea what is wrong. I can successfully create the table as you have given it and inserted hebrew text (using the built-in IME of XP). The text was stored and retrieved correctly again as far as I can see. 

For MyCC: it definitely uses the current system locale to store text data as ANSI and reads it back with the same encoding.

Could you please dump a table too, so I can see if the data is correct? Make sure you export it in utf8 encoding.

Mike
[1 Dec 2004 1:42] Juliau Gong
Mike,

I have recreated a table with utf8, and the bug is still here, I have uploaded the dump file of the database.

thanks.
[1 Dec 2004 1:43] Juliau Gong
dump file with Chinese

Attachment: mysql.sql (text/plain), 1.08 KiB.

[1 Dec 2004 2:18] Juliau Gong
Mike,

I can display some Chinese that was inserted into the table with MySQL Connector/Net as UTF8, but when I insert Chinese with the SQL editor, it still can't be displayed.

Can the Query Browser do as the MyCC does? It will be more convenient.
[1 Dec 2004 3:54] Juliau Gong
MySQL Query Browser can only display the Chinese with utf8 encoding, can't display other, and can't edit, insert the Chinese.
[1 Dec 2004 13:36] Michael G. Zinner
Arik Kfir,

are you specifying your `homedvlp` database in the connection dialog of QB? There is a known issue in MySQL 4.1.7 that all characters inserted by any client have to be in the collation of the "default database" if the client uses SET CHARACTER SET xxx;

That means, if you have specified no database, or e.g. the `test` database, as Schema in the connection dialog (below the Port), you will only be able to insert latin1 characters, since the `test` database uses latin1.

All characters that are not in the collation of the "default database" will be converted to ?.
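
The replacement with "?" can be imitated in a few lines of Python. This is only an analogy for the conversion described above, not the server's code; the helper name and the sample word are hypothetical.

```python
def to_charset(text: str, charset: str) -> str:
    """Convert text to a target charset, replacing unmappable characters with '?'."""
    return text.encode(charset, errors="replace").decode(charset)


hebrew = "\u05e9\u05dc\u05d5\u05dd"  # hypothetical Hebrew sample value

# latin1 has no Hebrew letters, so every character is replaced.
print(to_charset(hebrew, "latin-1"))

# utf8 can represent the text, so it survives unchanged.
print(to_charset(hebrew, "utf-8") == hebrew)
```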

Could you try this: 
* Specify your homedvlp database in the connection dialog. 
* Then write an update statement with hebrew characters in the query area and press execute.
* Select the data back. You should be able to see the hebrew characters.
* Press the edit button and change a value in the result set grid. Press apply changes. Refresh the query. You should be able to see the hebrew characters.

Please report if this fixes the problem for you.

There will be a fix for this problem in MySQL 4.1.8 and 5.0.2, but we should try to get it work on 4.1.7 and below.

Thanks a lot,
Mike Zinner
[1 Dec 2004 15:23] Juliau Gong
Mike,

When I set the server's default character set to utf8, the problem is fixed, but then I can't see the content from other programs that don't run as utf8, such as mysql.exe, because they run with the Windows locale encoding.

I think this is a bug.

Juliau
[3 Dec 2004 11:22] Arik Kfir
Hi again,

As it turns out, the problem is apparently somewhat my fault - indeed the table uses utf8, but the "character_set_server" variable in the server was set to "latin1" - fixing that solved my problem. I'd recommend adding a FAQ entry about this in the MySQL documentation - I'm sure many others will puzzle over this.

10x for all your help!

(P.S, sorry for late response - I was away for a few days)
[3 Dec 2004 11:25] Arik Kfir
Whoops - I'm not sure which one I changed...LOL - it was either "character_set_server" or "character_set_system". Anyway - it did the trick...

cheers!
[3 Dec 2004 12:48] Michael G. Zinner
Juliau,

you are correct. This *is* still an issue. But it is actually an issue for every tool that uses UTF8 internally, so I guess it is rather a server issue.

Therefore we discussed this with the server team and got a solution from them. In 4.1.8 and 5.0.2 the server automatically converts utf8 strings to the needed character set/collation if possible. So we can switch to SET NAMES utf8; instead of SET CHARACTER SET utf8; in the GUI tools.

SET NAMES utf8; doesn't consider the server's default setting anymore and therefore this solution will work in all cases.
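
The difference between the two statements can be sketched as a toy model of the session variables. This is Python for illustration only, assuming the behaviour documented in the MySQL manual (SET CHARACTER SET ties the connection charset to the default database, SET NAMES does not); the function names and dictionaries are hypothetical and are not part of any MySQL API.

```python
def set_character_set(session: dict, charset: str) -> None:
    """Model of SET CHARACTER SET x: the connection follows the default database."""
    session["character_set_client"] = charset
    session["character_set_results"] = charset
    session["character_set_connection"] = session["character_set_database"]


def set_names(session: dict, charset: str) -> None:
    """Model of SET NAMES x: client, connection and results all use x."""
    session["character_set_client"] = charset
    session["character_set_results"] = charset
    session["character_set_connection"] = charset


# Two sessions whose default database uses latin1 (assumed for the example).
old_style = {"character_set_database": "latin1"}
new_style = dict(old_style)

set_character_set(old_style, "utf8")
set_names(new_style, "utf8")

print(old_style["character_set_connection"])  # latin1
print(new_style["character_set_connection"])  # utf8
```

In this model, only the first session still routes text through latin1, which is where characters outside latin1 would be lost.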

We will include this fix as soon as the 4.1.8 server is out and if 5.0.2 is fixed. I will close this bug for now.

Thanks for the detailed bug reports.