Bug #10377 MySQL 4.1.11 handling of Korean characters in UTF-8 encoding
Submitted: 5 May 2005 8:46 Modified: 6 May 2005 6:40
Reporter: William Chung
Status: Closed
Category: MySQL Server
Severity: S3 (Non-critical)
Version: 4.1.11
OS: Windows (Windows XP SP2)
CPU Architecture: Any

[5 May 2005 8:46] William Chung
Description:
My test environment runs on a Pentium 4 with Windows XP SP2, IIS 6.0, PHP 5.0.3 and PostNuke.

1) I was looking at upgrading MySQL from 4.0.20 to a newer version.

2) To test the upgrade, I backed up my PostNuke portal data using MySQL Administrator 1.0.19. The PostNuke site uses UTF-8 as its character set, and under 4.0.20 it had no problems receiving or displaying Korean characters.

3) Then I installed 4.1.11, choosing utf8 as the default character set. For some reason, the Korean text did not restore properly from the backup .sql file and showed up as question marks in the browser. I have backed up and restored .sql files under 4.0.20 with no problems, but I could not restore the Korean text into 4.1.11. Interestingly, the restored data displays correctly in MySQL Query Browser (the Korean text was 아이구). However, when it is read from the database with PHP 5.0.3 and sent to the browser, it shows up as ???.
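A minimal Python sketch of one explanation for the ??? symptom. It assumes, as the SET NAMES fix later in this report suggests, that PHP's connection still used latin1 (which MySQL bases on the Windows cp1252 code page) while the column itself was utf8. The server then converts each stored value to the connection's character set before sending it, and characters with no latin1 equivalent, such as every Hangul syllable, are replaced with '?'. The function `send_to_client` is hypothetical and only models that conversion with Python's codecs; it is not MySQL code.

```python
def send_to_client(stored_text: str, results_charset: str) -> bytes:
    """Hypothetical model of the server converting a utf8 column value into
    the connection's result character set before sending it to the client.
    Characters the target charset cannot represent become '?'."""
    return stored_text.encode(results_charset, errors="replace")


korean = "아이구"

# Assumed default connection charset: latin1, modelled here as cp1252.
print(send_to_client(korean, "cp1252"))  # b'???'

# With a utf8 connection (what SET NAMES 'utf8' selects), the bytes the page
# receives decode back to the original text.
print(send_to_client(korean, "utf-8").decode("utf-8") == korean)  # True
```

Under this assumption the table data is undamaged, which matches the correct display in the GUI tool; only the copy sent to PHP is lossy.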

4) Besides the failed restore of UTF-8 Korean text from MySQL 4.0, I tried inserting new Korean text into the database through the PostNuke PHP interface. This time I had only partial success: some characters did not map correctly.

아이구.  잘됄건지모라갯다.
became
아쿴구. 잘뿄건지모뿼갯다.

The character set mapping appears to be wrong for only some characters: 이, 됄 and 라 in the sample above were changed, while the others came through intact.
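The partial corruption can be checked the same way. Assuming the connection character set was latin1 (cp1252) here too, the UTF-8 bytes PHP sent would be read as cp1252 characters on the way in and converted back on the way out. cp1252 leaves five byte values undefined (0x81, 0x8D, 0x8F, 0x90 and 0x9D), and in the sample only 이 (EC 9D B4), 됄 (EB 90 84) and 라 (EB 9D BC) contain one of them. The sketch below flags those syllables and shows that overwriting each undefined byte with 0xBF reproduces the garbled line exactly. That 0xBF substitution is inferred from this one sample, not taken from MySQL documentation, and the helper names are hypothetical.

```python
from __future__ import annotations


def undefined_cp1252_bytes() -> set[int]:
    """Byte values that Python's cp1252 codec refuses to decode."""
    undefined = set()
    for b in range(256):
        try:
            bytes([b]).decode("cp1252")
        except UnicodeDecodeError:
            undefined.add(b)
    return undefined


UNDEFINED = undefined_cp1252_bytes()


def damaged_syllables(text: str) -> list[str]:
    """Characters whose UTF-8 encoding contains a byte undefined in cp1252."""
    return [ch for ch in text if any(b in UNDEFINED for b in ch.encode("utf-8"))]


def simulate_round_trip(text: str, replacement: int = 0xBF) -> str:
    """Hypothetical model of the lossy round trip: every undefined byte in the
    UTF-8 encoding is overwritten with `replacement` (0xBF is inferred from
    the reported output). The undefined bytes are always continuation bytes,
    and so is 0xBF, so the result is still valid UTF-8."""
    raw = bytes(replacement if b in UNDEFINED else b for b in text.encode("utf-8"))
    return raw.decode("utf-8")


sample = "아이구. 잘됄건지모라갯다."
print(sorted(hex(b) for b in UNDEFINED))  # ['0x81', '0x8d', '0x8f', '0x90', '0x9d']
print(damaged_syllables(sample) == ["이", "됄", "라"])  # True
print(simulate_round_trip(sample) == "아쿴구. 잘뿄건지모뿼갯다.")  # True
```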

How to repeat:
1) To test the restore, install MySQL 4.0, create a table and put some Korean text into it.
2) Back it up with MySQL Administrator.
3) Delete the schema and restore it as a test.
4) Note that this works in 4.0.
5) Uninstall 4.0 and install 4.1, or move the backup file to a computer with 4.1 installed.
6) Restore the database.
7) View the Korean text in IE 6.0 through a web server and script.
8) Note that the Korean text appears as ????

TEST FOR KOREAN TEXT ENTERED INTO DB
To demonstrate that UTF-8 encoded Korean text does not map correctly in MySQL 4.1, I've included a PHP script:

<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN">
<html>
<head>
<title>Test mysql's ability to handle Korean text using UTF 8</title>
<meta http-equiv="Content-Type" content="text/html; charset=UTF-8">
</head>

<body>
<?php
// If you don't have a test database, run the following SQL statements first:
// CREATE DATABASE test;
// USE test;
// CREATE TABLE tbl_test (strText VARCHAR(255) CHARACTER SET utf8 COLLATE utf8_bin);

// Connect and select the database
$link = mysql_connect('localhost', 'root', 'password')
    or die('Could not connect: ' . mysql_error());
echo 'Connected successfully';
mysql_select_db('test', $link) or die('Could not select database');

// Insert UTF-8 Korean text (this file must itself be saved as UTF-8)
$query = "INSERT INTO tbl_test (strText) VALUES ('This is a test of UTF 8.  아이구.  잘됄건지모라갯다.')";
$result = mysql_query($query, $link) or die('Query failed: ' . mysql_error());

// Read every row back
$query = 'SELECT * FROM tbl_test';
$result = mysql_query($query, $link) or die('Query failed: ' . mysql_error());

// Print the results as an HTML table, escaping HTML special characters
echo "<table>\n";
while ($line = mysql_fetch_array($result, MYSQL_ASSOC)) {
    echo "\t<tr>\n";
    foreach ($line as $col_value) {
        echo "\t\t<td>" . htmlspecialchars($col_value) . "</td>\n";
    }
    echo "\t</tr>\n";
}
echo "</table>\n";

// Free the result set and close the connection
mysql_free_result($result);
mysql_close($link);
?>  

</body>
</html>
[5 May 2005 21:55] Geert Vanderkelen
Hi William,

Please look at this link. 
http://dev.mysql.com/doc/mysql/en/charset-connection.html

I'm not smelling a bug here, but you could prove me wrong, of course :)
 
Thanks,

Geert
[6 May 2005 2:29] William Chung
Running the command:

SET NAMES 'utf8'

made things work again!

Is there a way to make the database server use utf8 for client connections by default, so I don't have to modify the PostNuke pndbinit function to execute SET NAMES 'utf8'?
[6 May 2005 6:40] Geert Vanderkelen
We're sorry, but the bug system is not the appropriate forum for 
asking for help with using MySQL products. Your problem is not the result
of a bug.

Support on using our products is available both free in our forums
at http://forums.mysql.com and for a reasonable fee direct from our
skilled support engineers at http://www.mysql.com/support/

Thank you for your interest in MySQL.
[9 May 2005 0:18] William Chung
Thanks anyway. I suppose I'll post to the forums, then.