Bug #30315 Character sets: insertion of euckr code value 0xa141 fails
Submitted: 8 Aug 2007 16:36 Modified: 5 Dec 2007 19:05
Reporter: Peter Gulutzan
Status: Closed
Category: MySQL Server: Charsets Severity: S3 (Non-critical)
Version: 5.0.*, 5.1.21-beta-debug OS: Linux (SUSE 10 64-bit)
Assigned to: Alexander Barkov CPU Architecture: Any

[8 Aug 2007 16:36] Peter Gulutzan
Description:
I create a table with a column with euckr (Korean) character set.
I attempt to insert 0xa141 in this column. I get a warning message
and the value does not go in.

The euckr/unicode mapping page
ftp://ftp.unicode.org/Public/MAPPINGS/OBSOLETE/EASTASIA/KSC/KSC5601.TXT
has this line:
0xA141        0xC8A5  # HANGUL SYLLABLE CIEUC-WA-THIEUTH
so I know the character is valid. Please check nearby characters too.
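
For readers who want to check the mapping line by hand, here is a minimal Python sketch. It assumes the line layout quoted above (two hex fields followed by a `#` comment); `parse_mapping_line` is a hypothetical helper name, not part of any MySQL or Unicode tool.

```python
# Hypothetical helper: parse one "0xEUCKR  0xUCS2  # NAME" mapping-table line.
def parse_mapping_line(line):
    data = line.split("#", 1)[0]          # drop the trailing comment
    euckr_hex, ucs2_hex = data.split()    # two whitespace-separated hex fields
    return int(euckr_hex, 16), int(ucs2_hex, 16)

line = "0xA141        0xC8A5  # HANGUL SYLLABLE CIEUC-WA-THIEUTH"
euckr_code, ucs2_code = parse_mapping_line(line)

print(hex(euckr_code))                   # 0xa141
print(euckr_code.to_bytes(2, "big"))     # b'\xa1A': lead byte 0xA1, second byte 0x41 ('A')
print(chr(ucs2_code))                    # the Unicode character at U+C8A5
```

The second byte, 0x41, is the ASCII code for 'A', which is why the two-byte value prints as `\xa1A`.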

I have a workaround using a different code; I can say:
    INSERT ... (_ucs2 0xc8a5)
I have a workaround using the character itself; I can say:
    INSERT ... ('좥')

How to repeat:
mysql> set names utf8;
Query OK, 0 rows affected (0.03 sec)

mysql> create table tk (s1 varchar(5) character set euckr);
Query OK, 0 rows affected (0.20 sec)

mysql> insert into tk values (0xa141);
Query OK, 1 row affected, 1 warning (0.08 sec)

mysql> show warnings;
+---------+------+----------------------------------------------------------+
| Level   | Code | Message                                                  |
+---------+------+----------------------------------------------------------+
| Warning | 1366 | Incorrect string value: '\xA1A' for column 's1' at row 1 |
+---------+------+----------------------------------------------------------+
1 row in set (0.00 sec)

mysql> insert into tk values (_ucs2 0xc8a5);
Query OK, 1 row affected (0.35 sec)

mysql> insert into tk values ('좥');
Query OK, 1 row affected (0.01 sec)

mysql> select s1,hex(s1) from tk;
+------+---------+
| s1   | hex(s1) |
+------+---------+
|      |         |
| 좥  | A141    |
| 좥  | A141    |
+------+---------+
3 rows in set (0.00 sec)
[1 Oct 2007 10:39] Bugs System
A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/34704

ChangeSet@1.2526, 2007-10-01 15:35:42+05:00, bar@mysql.com +3 -0
  Bug#30315 Character sets: insertion of euckr code value 0xa141 fails
  
  Problem: some valid euc-kr characters were rejected because the
  condition that checks the multi-byte tail did not allow
  multi-byte characters whose second byte is in the ranges
  [0x41..0x5A] and [0x61..0x7A].
  
  Fix: allow these byte ranges for multi-byte tails.
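
The effect of the change described above can be sketched in Python. This is an illustration only, not the server's C code: the function names are hypothetical, and the pre-fix tail range 0xA1..0xFE is an assumption based on the classic EUC-KR trail-byte range. The two added ranges are the ones named in the commit message.

```python
# Illustrative sketch; names are hypothetical and the real check is in the server's C source.
OLD_TAIL_RANGES = [(0xA1, 0xFE)]                  # assumption: classic EUC-KR trail bytes
ADDED_TAIL_RANGES = [(0x41, 0x5A), (0x61, 0x7A)]  # ranges named in the commit message

def in_ranges(byte, ranges):
    """Return True if byte falls inside any inclusive (low, high) range."""
    return any(low <= byte <= high for low, high in ranges)

def is_tail_before_fix(byte):
    return in_ranges(byte, OLD_TAIL_RANGES)

def is_tail_after_fix(byte):
    return in_ranges(byte, OLD_TAIL_RANGES + ADDED_TAIL_RANGES)

code = bytes.fromhex("a141")         # the value from the bug report
tail = code[1]                       # second byte, 0x41

print(is_tail_before_fix(tail))      # False: value rejected, warning 1366
print(is_tail_after_fix(tail))       # True: second byte now accepted as a tail
```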
[2 Oct 2007 9:25] Sergei Glukhov
ok to push
[3 Oct 2007 7:26] Alexander Barkov
Pushed into 5.0.50-rpl and 5.1.23-rpl
[27 Nov 2007 10:49] Bugs System
Pushed into 5.0.54
[27 Nov 2007 10:51] Bugs System
Pushed into 5.1.23-rc
[27 Nov 2007 10:54] Bugs System
Pushed into 6.0.4-alpha
[5 Dec 2007 19:05] Paul DuBois
Noted in 5.0.54, 5.1.23, 6.0.4 changelogs.

Some valid euc-kr characters having the second byte in the ranges 
[0x41..0x5A] and [0x61..0x7A] were rejected.