Bug #34051 I have some problems using character sets.
Submitted: 25 Jan 2008 5:11 Modified: 30 Apr 2008 10:37
Reporter: Michael Jordan
Status: Can't repeat
Category:MySQL Server: Compiling Severity:S2 (Serious)
Version:5.0.45 OS:Linux
Assigned to: CPU Architecture:Any
Tags: character set, euckr

[25 Jan 2008 5:11] Michael Jordan
Description:
Hello,
I sometimes run into problems with euckr, a character set that can represent Korean characters.

Question 1
Two cases show the difference between version 3.X and version 5.X.

In version 5.X, some characters cannot be inserted correctly: characters that cannot be represented in euckr are removed. - case 1
In version 3.X, by contrast, the same characters from case 1 are converted to Unicode and inserted without loss of data. - case 2

The problem characters are ones like '뷁' and '솁'. I hope you can see these characters on the web.
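
To see which syllables fall outside the base repertoire, a small Python sketch can help. It assumes Python's standard `euc_kr` and `cp949` codecs as stand-ins for MySQL's euckr and the Windows extension of it, which may not match MySQL's tables exactly. `has_two_byte_code` is a hypothetical helper name. It treats anything that does not encode to exactly two bytes as outside the set, because a codec may either reject such a syllable or emit a longer make-up sequence for it.

```python
def has_two_byte_code(ch: str, codec: str) -> bool:
    """Return True if the single character `ch` encodes to exactly two bytes."""
    try:
        return len(ch.encode(codec)) == 2
    except UnicodeEncodeError:
        # The codec has no code at all for this character.
        return False


# Print, for each syllable, whether it has a plain two-byte code
# in EUC-KR and in CP949.
for ch in ["가", "뷁", "솁"]:
    print(ch, has_two_byte_code(ch, "euc_kr"), has_two_byte_code(ch, "cp949"))
```

A common syllable such as '가' has a two-byte code in both codecs, while '뷁' has one only in the CP949 extension.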

Question 2
In version 5.X, character sets can be defined separately for the server, the client, each table, and so on.
If I define different character sets for the respective parts, how does MySQL handle this internally?

Example)
I define the character sets as below

[client] default-character-set=euckr
[mysqld] default-character-set=euckr
[mysql] default-character-set=euckr

and create a table named 'test' whose character set is utf8.

If I insert euckr data into table 'test', is the data converted to utf8 before it is saved?
And when I select the data, does the MySQL server convert it from utf8 back to euckr?
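
The conversion being asked about can be pictured with a short Python sketch. It is only a model under stated assumptions, not MySQL's internal code: the client sends euckr bytes, the server converts them to the column's utf8 for storage, and converts them back to euckr for the result set. The names `client_bytes`, `stored_bytes`, and `result_bytes` are illustrative.

```python
# Text typed in an EUC-KR environment reaches the server as EUC-KR bytes.
client_bytes = "가나다".encode("euc_kr")

# On INSERT: interpret the bytes as euckr, then convert to the column's utf8.
stored_bytes = client_bytes.decode("euc_kr").encode("utf-8")

# On SELECT: convert the stored utf8 back to euckr for the client.
result_bytes = stored_bytes.decode("utf-8").encode("euc_kr")

print(len(client_bytes), len(stored_bytes))  # byte lengths differ per encoding
print(result_bytes == client_bytes)          # the round trip is lossless here
```

The round trip is lossless only for characters that exist in both character sets.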

I appreciate your help.

How to repeat:
It always reproduces with characters such as '솁' and '뷁'.
[25 Jan 2008 5:46] Michael Jordan
ddddd
[25 Jan 2008 8:51] Sveta Smirnova
We're sorry, but the bug system is not the appropriate forum for asking help on using MySQL products. Your problem is not the result of a bug.

Support on using our products is available both free in our forums at http://forums.mysql.com/ and for a reasonable fee direct from our skilled support engineers at http://www.mysql.com/support/

Thank you for your interest in MySQL.

Case 1 may be a bug, but we need a repeatable test case to check it. If you can provide an example of SQL code that shows the problem and that we can paste into the mysql command-line client, feel free to reopen the report.
[25 Jan 2008 9:30] Michael Jordan
http://img.empas.com/sample1.jpg

Thanks for your quick reply.
The image above should give you a rough idea of the situation.
[25 Jan 2008 10:00] Sveta Smirnova
Thank you for the feedback.

But version 5.0.37 is quite old, and at least one character set bug affecting Korean has been fixed since. Please upgrade to the current version, 5.0.45, and try again. If the problem still exists, provide the output of SHOW VARIABLES LIKE '%char%'; and SHOW VARIABLES LIKE '%coll%';
[28 Jan 2008 1:53] Michael Jordan
I'm sorry, but I have the same problem with 5.0.45.

mysql> select version();
+----------------------+
| version()            |
+----------------------+
| 5.0.45-community-log |
+----------------------+
1 row in set (0.00 sec)
[28 Jan 2008 8:50] Sveta Smirnova
Please provide the output of SHOW VARIABLES LIKE '%char%'; and SHOW VARIABLES LIKE '%coll%';
[28 Jan 2008 11:14] Michael Jordan
Sorry, I omitted them.

mysql> SHOW VARIABLES LIKE '%char%';
+--------------------------+----------------------------+
| Variable_name            | Value                      |
+--------------------------+----------------------------+
| character_set_client     | euckr                      |
| character_set_connection | euckr                      |
| character_set_database   | euckr                      |
| character_set_filesystem | binary                     |
| character_set_results    | euckr                      |
| character_set_server     | euckr                      |
| character_set_system     | utf8                       |
| character_sets_dir       | /usr/share/mysql/charsets/ |
+--------------------------+----------------------------+
8 rows in set (0.00 sec)

mysql> SHOW
    -> VARIABLES LIKE '%coll%';
+----------------------+-----------------+
| Variable_name        | Value           |
+----------------------+-----------------+
| collation_connection | euckr_korean_ci |
| collation_database   | euckr_korean_ci |
| collation_server     | euckr_korean_ci |
+----------------------+-----------------+
3 rows in set (0.00 sec)
[30 Jan 2008 10:48] Susanne Ebrecht
Sorry, but I have trouble understanding your problem.

To handle character sets correctly, you have to make sure of the following:

1) Your column should use the right character set, for example euckr.
You can check it with:
mysql> show create table YOUR_TABLE_NAME;

2) When inserting data, your character_set_client variable and your input environment must use the same encoding.

For example:
If character_set_client=euckr, make sure that your terminal is set to EUC-KR.
Now you can insert the data.

For example, if your terminal uses utf8 and your column uses euckr, first run:
mysql> set names utf8;
This sets character_set_client and some other variables to utf8.
Now you can insert the data as utf8, and the server will convert it from utf8 to euckr. In other words, it is stored in the table as euckr.

3) Selecting data works the same way as inserting data.
Check which encoding your output environment needs. Either set the environment to euckr or use:
set names CHARACTER_SET_OF_YOUR_OUTPUT_ENVIRONMENT

The difference between input and output is that they rely on different variables.
For example, character_set_results is needed for output but not for input.
"Set names" sets all the necessary variables to the right values.

If you follow these rules, you can be sure that your data is stored correctly in the database and you won't run into problems.

If you get weird output while following these rules, then your data was stored incorrectly. To repair this, you have to dump the database, correct the wrong data manually in the dump, and import the dump again.
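
As a rough illustration of how data ends up stored incorrectly, the following Python sketch models a mismatch between the input environment and character_set_client. It assumes a UTF-8 terminal whose bytes are read as euckr. `errors="replace"` is used only so that the sketch never raises, and the variable names are illustrative.

```python
text = "가나다"

# The terminal really sends UTF-8 bytes ...
sent = text.encode("utf-8")

# ... but the bytes are interpreted as EUC-KR, as if character_set_client=euckr.
misread = sent.decode("euc_kr", errors="replace")
print(misread == text)   # False: different characters would be stored

# With the declared encoding matching the terminal (the effect of SET NAMES utf8),
# the same bytes are read back as the original text.
correct = sent.decode("utf-8")
print(correct == text)   # True
```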

Please let us know if you still have problems when following the rules I gave you.
[31 Jan 2008 8:24] Sveta Smirnova
Please also type the problem query into the bug report so that we can simply copy and paste the problem characters.
[31 Jan 2008 8:32] Michael Jordan
All character sets involved in this problem are euckr.

show create table test;
+-------+------------------------------------------------------------------------------------------------------+
| Table | Create Table                                                                                         |
+-------+------------------------------------------------------------------------------------------------------+
| test  | CREATE TABLE `test` (
  `char_field` varchar(200) default NULL
) ENGINE=MyISAM DEFAULT CHARSET=euckr |

1 row in set (0.00 sec)

| character_set_client            | euckr                               |
| character_set_connection        | euckr                               |
| character_set_database          | euckr                               |
| character_set_filesystem        | binary                              |
| character_set_results           | euckr                               |
| character_set_server            | euckr                               |
| character_set_system            | utf8                                |
| collation_connection            | euckr_korean_ci                     |
| collation_database              | euckr_korean_ci                     |
| collation_server                | euckr_korean_ci                     |

 

In the general case, I can insert data without problems.
The problem occurs when I insert characters such as '솁' and '뷁'.

mysql> insert into test values ('가나다');
Query OK, 1 row affected (0.00 sec)

mysql> insert into test values ('안드리 솁첸코');
Query OK, 1 row affected, 1 warning (0.00 sec)

mysql> insert into test values ('뷁');
Query OK, 1 row affected, 1 warning (0.00 sec)

mysql> select * from test;
+------------+
| char_field |
+------------+
| 가나다     |
| 안드리     |
|            |
+------------+
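
The result above looks as if each value was cut at the first syllable that has no euckr code. The following Python sketch models that reading of the output. It is not MySQL's actual conversion code: `fits_euckr` and `truncate_at_unencodable` are hypothetical helpers, and Python's `euc_kr` codec stands in for the server's euckr (ASCII takes one byte, KS X 1001 syllables take two).

```python
def fits_euckr(ch: str) -> bool:
    """True if `ch` has a one- or two-byte code in Python's euc_kr codec."""
    try:
        return len(ch.encode("euc_kr")) <= 2
    except UnicodeEncodeError:
        return False


def truncate_at_unencodable(value: str) -> str:
    """Keep the longest prefix of `value` whose characters all fit euckr."""
    kept = []
    for ch in value:
        if not fits_euckr(ch):
            break  # everything from the first problem character on is dropped
        kept.append(ch)
    return "".join(kept)


for value in ["가나다", "안드리 솁첸코", "뷁"]:
    print(repr(truncate_at_unencodable(value)))
```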
[31 Jan 2008 8:35] Michael Jordan
The important point is that I have no problem with version 3.XX or 4.XX in the same environment.
The problem occurs only with version 5.XX.
[31 Jan 2008 10:39] Susanne Ebrecht
Unfortunately, I can't reproduce your problem with:

mysql> select version()\G
*************************** 1. row ***************************
version(): 5.0.51a-debug

Look here:

mysql> show create table t\G
*************************** 1. row ***************************
       Table: t
Create Table: CREATE TABLE `t` (
  `t` text
) ENGINE=MyISAM DEFAULT CHARSET=euckr

mysql> insert into t values('솁'),('뷁'),('가나다'),('뷁');

mysql> select * from t;
+-----------+
| t         |
+-----------+
| 솁       | 
| 뷁       | 
| 가나다 | 
| 뷁       | 
+-----------+

This looks correct to me.

Please make sure that your environment encoding is set correctly. Also make sure that your data is not stored incorrectly in the database.

Also, try MySQL 5.0.51a. We fixed some character set bugs that affected older versions of MySQL.
[31 Jan 2008 11:14] Susanne Ebrecht
This also looks correct:

mysql> insert into t values('솁 뷁 가나다 뷁');

mysql> select * from t where t like '솁 뷁%'\G
*************************** 1. row ***************************
t: 솁 뷁 가나다 뷁
[1 Mar 2008 0:01] Bugs System
No feedback was provided for this bug for over a month, so it is
being suspended automatically. If you are able to provide the
information that was originally requested, please do so and change
the status of the bug back to "Open".
[17 Mar 2008 9:37] Susanne Ebrecht
Michael,

we still need to know whether you have problems with these characters when using a newer version and following the encoding rules I gave you above.
[17 Mar 2008 9:39] Susanne Ebrecht
We still need a mysqldump from you to analyse this problem.
[17 Mar 2008 9:41] Susanne Ebrecht
Sorry, please disregard my last comment. I mixed up bug numbers: we need a dump for another bug, not for this one.
[17 Apr 2008 23:00] Bugs System
No feedback was provided for this bug for over a month, so it is
being suspended automatically. If you are able to provide the
information that was originally requested, please do so and change
the status of the bug back to "Open".
[30 Apr 2008 10:37] Susanne Ebrecht
Michael,

I will set this to "can't repeat" because I am pretty sure that your problem is fixed in newer versions.

Please feel free to reopen it if you have problems with newer versions as well.