Bug #32393 Character sets: illegal characters in utf16 columns
Submitted: 14 Nov 2007 19:50    Modified: 6 Feb 2008 17:45
Reporter: Peter Gulutzan
Status: Closed
Category: MySQL Server: Charsets    Severity: S3 (Non-critical)
Version: 6.0.5-alpha-debug    OS: Linux (SUSE 10 64-bit)
Assigned to: Alexander Barkov    CPU Architecture: Any

[14 Nov 2007 19:50] Peter Gulutzan
Description:
I'm using the mysql-5.2-rpl team tree.

The specification was (see WL#1213):
"Validity checks are: in utf16 there must not be a top
surrogate without a bottom surrogate (or a bottom
surrogate without a top surrogate) ..."
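
The pairing rule quoted above can be sketched as a standalone check. This is only an illustration in Python, not the server's code: the function name find_unpaired_surrogate is hypothetical, and it assumes the column bytes are big-endian UTF-16 code units, so that the hex literal 0xdf84 used in this report is the two bytes DF 84.

```python
from typing import Optional

# Surrogate ranges defined by Unicode: "top" (high) and "bottom" (low).
HIGH_FIRST, HIGH_LAST = 0xD800, 0xDBFF
LOW_FIRST, LOW_LAST = 0xDC00, 0xDFFF


def find_unpaired_surrogate(data: bytes) -> Optional[int]:
    """Return the byte offset of the first unpaired surrogate, or None.

    Assumption: `data` holds big-endian UTF-16 code units with no BOM.
    """
    if len(data) % 2:
        raise ValueError("UTF-16 data must have an even number of bytes")

    # Split the byte string into 16-bit code units.
    units = [int.from_bytes(data[i:i + 2], "big") for i in range(0, len(data), 2)]

    i = 0
    while i < len(units):
        unit = units[i]
        if HIGH_FIRST <= unit <= HIGH_LAST:
            # A top surrogate is valid only if a bottom surrogate follows it.
            if i + 1 < len(units) and LOW_FIRST <= units[i + 1] <= LOW_LAST:
                i += 2
                continue
            return i * 2
        if LOW_FIRST <= unit <= LOW_LAST:
            # A bottom surrogate reached here has no top surrogate before it.
            return i * 2
        i += 1
    return None
```

Applied to the value from this report, find_unpaired_surrogate(bytes.fromhex("df84")) reports offset 0, because DF84 is a bottom surrogate with nothing before it.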

But I can get a lone surrogate (0xDF84, a bottom surrogate with
no top surrogate before it) into a utf16 column
(not "on input" but via alter or update).

How to repeat:
/* via alter */
create table ucs2 (s1 varchar(50) character set ucs2);
insert into ucs2 values (0xdf84);
alter table ucs2 modify column s1 varchar(50) character set utf16;
/* via update */
create table xk (s1 varchar(5) character set ucs2, s2 varchar(5) character set utf16);
insert into xk (s1) values (0xdf84);
update xk set s2 = s1;

Example:

mysql> /* via alter */
mysql> create table ucs2 (s1 varchar(50) character set ucs2);
Query OK, 0 rows affected (0.02 sec)

mysql> insert into ucs2 values (0xdf84);
Query OK, 1 row affected (0.00 sec)

mysql> alter table ucs2 modify column s1 varchar(50) character set utf16;
Query OK, 1 row affected (0.03 sec)
Records: 1  Duplicates: 0  Warnings: 0

mysql> /* via update */
mysql> create table xk (s1 varchar(5) character set ucs2, s2 varchar(5) character set utf16);
Query OK, 0 rows affected (0.01 sec)

mysql> insert into xk (s1) values (0xdf84);
Query OK, 1 row affected (0.01 sec)

mysql> update xk set s2 = s1;
Query OK, 1 row affected (0.00 sec)
Rows matched: 1  Changed: 1  Warnings: 0
[15 Nov 2007 2:10] MySQL Verification Team
Thank you for the bug report. Verified as described.
[6 Dec 2007 8:54] Bugs System
A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/39380

ChangeSet@1.2700, 2007-12-06 12:52:29+04:00, bar@mysql.com +3 -0
  Bug#32393 Character sets: illegal characters in utf16 columns
  Problem: a utf16 column accepted illegal Unicode characters
  when converting from another Unicode character set.
  Fix: Disallow illegal Unicode characters in conversion.
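
  The idea of the fix can be sketched as follows. This is a minimal
  illustration in Python and not the committed patch: the function name
  ucs2_to_utf16 is hypothetical, big-endian code units are assumed, and
  raising an error is only one possible reaction. This report does not say
  whether the server rejects the statement, warns, or substitutes another
  character.

```python
def ucs2_to_utf16(data: bytes) -> bytes:
    """Copy UCS-2 code units to UTF-16, refusing surrogate code units.

    Assumption: both sides are big-endian with no BOM, so every legal
    UCS-2 code unit is already a valid UTF-16 code unit and can be
    copied unchanged.
    """
    if len(data) % 2:
        raise ValueError("UCS-2 data must have an even number of bytes")

    out = bytearray()
    for offset in range(0, len(data), 2):
        unit = int.from_bytes(data[offset:offset + 2], "big")
        # UCS-2 has no surrogate pairs, so any value in D800..DFFF would
        # become an unpaired surrogate in the utf16 column.
        if 0xD800 <= unit <= 0xDFFF:
            raise ValueError(
                f"illegal code unit 0x{unit:04X} at byte offset {offset}"
            )
        out += data[offset:offset + 2]
    return bytes(out)
```

  With this check in place, the value 0xdf84 from the test case is refused
  during conversion instead of being stored in the utf16 column.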
[6 Dec 2007 12:09] Alexander Barkov
Pushed into 6.0.4-rpl
[5 Feb 2008 13:07] Bugs System
Pushed into 6.0.5-alpha
[6 Feb 2008 17:45] Paul DuBois
Noted in 6.0.5 changelog.

utf16 columns allowed incorrect Unicode characters to be inserted
through conversion from another Unicode character set.