Bug #20564 invalid character translation
Submitted: 20 Jun 2006 9:39 Modified: 20 Jun 2006 11:47
Reporter: [ name withheld ] Email Updates:
Status: Can't repeat Impact on me:
None 
Category:MySQL Server Severity:S2 (Serious)
Version:5.0.22 OS:Windows (Windows 98)
Assigned to: CPU Architecture:Any

[20 Jun 2006 9:39] [ name withheld ]
Description:
when I use SQL statement like this :

select * from <table> where convert(<field1> using utf8)="ď" incorect character mapping uccured.

I think, that there is bug in character mapping table for character set utf8.

Character "ď" = "d with hiccup" translates to "i" which is bad (correct is "d")

How to repeat:
CREATE TABLE t (n1 integer, c1 VARCHAR(20) character set latin1 collate latin1_swedish_ci, c2 VARCHAR(20) CHARACTER SET utf8);
insert into t values(1,'a','a');
insert into t values(2,'á','á');
insert into t values(3,'ä','ä');
insert into t values(4,'d','d');
insert into t values(5,'ď','ď');
insert into t values(6,'i','i');
insert into t values(7,'í','í');

select * from t where c1='i';
/* returns 5,6,7 , which is inncorect; correct is 6,7 ?*/
select * from t where c2='i';
/* returns 5,6,7 , which is inncorect; correct is 6,7 ?*/
select * from t where c2='d';
/* returns 4 , which is inncorect; correct is 4,5 ? */
[20 Jun 2006 10:37] Valeriy Kravchuk
Thank you for a problem report. Please, send the results of the

SHOW VARIABLES like 'char%';

from the session were you created the table. I've got different results with your test:

mysql> CREATE TABLE t (n1 integer, c1 VARCHAR(20) character set latin1 collate
    -> latin1_swedish_ci, c2 VARCHAR(20) CHARACTER SET utf8);
Query OK, 0 rows affected (0.09 sec)

mysql> insert into t values(1,'a','a');
Query OK, 1 row affected (0.03 sec)

mysql> insert into t values(2,'á','á');
ERROR 1406 (22001): Data too long for column 'c2' at row 1
mysql> insert into t values(3,'ä','ä');
ERROR 1406 (22001): Data too long for column 'c2' at row 1
mysql> insert into t values(4,'d','d');
Query OK, 1 row affected (0.03 sec)

mysql> insert into t values(5,'ď','ď');
Query OK, 1 row affected (0.03 sec)

mysql> insert into t values(6,'i','i');
Query OK, 1 row affected (0.03 sec)

mysql> insert into t values(7,'í','í');
ERROR 1406 (22001): Data too long for column 'c2' at row 1
mysql> show variables like 'char%';
+--------------------------+---------------------------------------------------------+
| Variable_name            | Value     |
+--------------------------+---------------------------------------------------------+
| character_set_client     | utf8     |
| character_set_connection | utf8     |
| character_set_database   | utf8     |
| character_set_filesystem | binary     |
| character_set_results    | utf8     |
| character_set_server     | utf8     |
| character_set_system     | utf8     |
| character_sets_dir       | C:\Program Files\MySQL\MySQL Server 5.0\share\charsets\ |
+--------------------------+---------------------------------------------------------+
8 rows in set (0.00 sec)

mysql> select * from t where c1='i';
+------+------+------+
| n1   | c1   | c2   |
+------+------+------+
|    6 | i    | i    |
+------+------+------+
1 row in set (0.00 sec)

mysql> select * from t where c2='i';
+------+------+------+
| n1   | c1   | c2   |
+------+------+------+
|    6 | i    | i    |
+------+------+------+
1 row in set (0.00 sec)

mysql> select * from t where c2='d';
+------+------+------+
| n1   | c1   | c2   |
+------+------+------+
|    4 | d    | d    |
|    5 | d    | d    |
+------+------+------+
2 rows in set (0.01 sec)

Hence the question above...
[20 Jun 2006 11:06] [ name withheld ]
+--------------------------+-----------------------------------------------------+
| Variable_name            | Value                                               |
+--------------------------+-----------------------------------------------------+
| character_set_client     | latin1                                              |
| character_set_connection | latin1                                              |
| character_set_database   | latin1                                              |
| character_set_filesystem | binary                                              |
| character_set_results    | latin1                                              |
| character_set_server     | latin1                                              |
| character_set_system     | utf8                                                |
| character_sets_dir       | C:\Program Files\Apache Group\MySQL\share\charsets\ |
+--------------------------+-----------------------------------------------------+
it is default instalation.

Why do you get error in this line ?:
 insert into t values(7,'í','í');
ERROR 1406 (22001): Data too long for column 'c2' at row 1
[20 Jun 2006 11:23] Valeriy Kravchuk
Sorry, but I just can not get that "d with hiccup" in c2 column. Look:

mysql> show variables like 'char%';
+--------------------------+---------------------------------------------------------+
| Variable_name            | Value     |
+--------------------------+---------------------------------------------------------+
| character_set_client     | latin1     |
| character_set_connection | latin1     |
| character_set_database   | latin1     |
| character_set_filesystem | binary     |
| character_set_results    | latin1     |
| character_set_server     | latin1     |
| character_set_system     | utf8     |
| character_sets_dir       | C:\Program Files\MySQL\MySQL Server 5.0\share\charsets\ |
+--------------------------+---------------------------------------------------------+
8 rows in set (0.00 sec)

mysql> CREATE TABLE t (n1 integer, c1 VARCHAR(20) character set latin1 collate
    -> latin1_swedish_ci, c2 VARCHAR(20) CHARACTER SET utf8);
Query OK, 0 rows affected (0.08 sec)

mysql> insert into t values(1,'a','a');
Query OK, 1 row affected (0.03 sec)

mysql> insert into t values(2,'á','á');
Query OK, 1 row affected (0.05 sec)

mysql> insert into t values(3,'ä','ä');
Query OK, 1 row affected (0.03 sec)

mysql> insert into t values(4,'d','d');
Query OK, 1 row affected (0.03 sec)

mysql> insert into t values(5,'ď','ď');
Query OK, 1 row affected (0.03 sec)

mysql> insert into t values(6,'i','i');
Query OK, 1 row affected (0.03 sec)

mysql> insert into t values(7,'í','í');
Query OK, 1 row affected (0.02 sec)

mysql> select * from t where c1='i';
+------+------+------+
| n1   | c1   | c2   |
+------+------+------+
|    6 | i    | i    |
|    7 | í     | í     |
+------+------+------+
2 rows in set (0.00 sec)

mysql> select * from t where c2='i';
+------+------+------+
| n1   | c1   | c2   |
+------+------+------+
|    6 | i    | i    |
|    7 | í     | í     |
+------+------+------+
2 rows in set (0.00 sec)

mysql> select * from t where c2='d';
+------+------+------+
| n1   | c1   | c2   |
+------+------+------+
|    4 | d    | d    |
|    5 | d    | d    |
+------+------+------+
2 rows in set (0.00 sec)

mysql> select * from t;
+------+------+------+
| n1   | c1   | c2   |
+------+------+------+
|    1 | a    | a    |
|    2 | á     | á     |
|    3 | ä     | ä     |
|    4 | d    | d    |
|    5 | d    | d    |
|    6 | i    | i    |
|    7 | í     | í     |
+------+------+------+
7 rows in set (0.00 sec)

mysql> select version();
+---------------------+
| version()           |
+---------------------+
| 5.0.22-community-nt |
+---------------------+
1 row in set (0.00 sec)

So, for me it is converted to d upon insertion...
[20 Jun 2006 11:45] [ name withheld ]
when I type :
select *, ascii(c1) from t

I get :
+--------+--------+--------+-----------+
| n1     | c1     | c2     | ascii(c1) |
+--------+--------+--------+-----------+
|      1 | a      | a      |        97 |
|      2 |
[20 Jun 2006 11:47] [ name withheld ]
when I type :
select *, ascii(c1) from t

I get :
+--------+--------+--------+-----------+
| n1     | c1     | c2     | ascii(c1) |
+--------+--------+--------+-----------+
|      1 | a      | a      |        97 |
|      2 |      |      |       225 |
|      3 |     |     |       228 |
|      4 | d      | d      |       100 |
|      5 |     |     |       239 |
|      6 | i      | i      |       105 |
|      7 |     |     |       237 |
|      8 |     |     |       158 |
|      9 | z      | z      |       122 |
+--------+--------+--------+-----------+

So "d caron" in Windows1250 (my windows codepage) is ascii code is 239 (row 5) - my insert statement was written under this codepage.
But in latin1 ascii code 239 is "Small i, dieresis or umlaut mark", so mysql interprets is as "i" not as "d" ...
So it seems like my mystake :-)
I must alter code page for table and for connection ...