Bug #2390 UCS2 columns have filler bytes after hex-value input
Submitted: 14 Jan 2004 10:18 Modified: 21 Jan 2004 3:38
Reporter: Peter Gulutzan Email Updates:
Status: Closed Impact on me:
None 
Category:MySQL Server Severity:S3 (Non-critical)
Version:4.1.2-alpha-debug OS:Linux (SuSE 8.2)
Assigned to: Alexander Barkov CPU Architecture:Any

[14 Jan 2004 10:18] Peter Gulutzan
Description:
MySQL 4.1.2 supports UCS-2 characters, and supports hexadecimal 
string literals with X'....', but there's an apparent problem mixing them 
when one uses one-byte hexadecimal literals.  
 
mysql> create table t4 (s1 char(1) character set ucs2); 
Query OK, 0 rows affected (0.28 sec) 
 
mysql> insert into t4 values (x'6F'); 
Query OK, 1 row affected (0.01 sec) 
 
mysql> select hex(s1) from t4; 
+---------+ 
| hex(s1) | 
+---------+ 
| 6F20    | 
+---------+ 
1 row in set (0.00 sec) 
 
All right, x'6F' is ambiguous, it's reasonable to 
decide that it's only one byte and not x'006F' 
(even though the MySQL Reference Manual says 
"each PAIR of hex digits is converted to a character"). 
 
But why add x'20'? It's no good replying that x'20' 
is a space, it  is not a space if it's the 
low part of a 16-bit character. 
 
Also, if I add an introducer, I get a different filler: 
 
mysql> delete from t4; 
Query OK, 3 rows affected (0.00 sec) 
 
mysql> insert into t4 values (_ucs2 x'6f'); 
Query OK, 1 row affected (0.00 sec) 
 
mysql> select hex(s1) from t4; 
+---------+ 
| hex(s1) | 
+---------+ 
| 6F6F    | 
+---------+ 
1 row in set (0.00 sec) 
 
See also the MySQL Reference Manual, "Hexadecimal Characters": 
http://www.mysql.com/doc/en/Hexadecimal_values.html 
 

How to repeat:
create table t4 (s1 char(1) character set ucs2); 
insert into t4 values (x'6F'); 
select hex(s1) from t4; 
 
delete from t4; 
insert into t4 values (_ucs2 x'6f'); 
select hex(s1) from t4; 
 

Suggested fix:
Fill incomplete words with x'00' instead of x'20' or x'6f'. 
In strict mode, incomplete words should cause an error.
[15 Jan 2004 0:19] Alexander Barkov
Probably the best choice is to prepend it with a necessary number
of 0 digits.
[20 Jan 2004 5:41] Alexander Barkov
These two are now fixed:
  SELECT _ucs2 0xAA
  SELECT _ucs2 X'AA'

And automatically extend the argument to 0x00AA
[20 Jan 2004 5:43] Alexander Barkov
Test with all character types should be done after fixing this:
  CHAR, VARCHAR and TEXT

These two should prepend leding zeros too:
  CAST(bin_expr AS char character set ucs2)
  CONVERT(bin_expr using ucs2)
[21 Jan 2004 3:38] Alexander Barkov
Thank you for your bug report. This issue has been committed to our
source repository of that product and will be incorporated into the
next release.

If necessary, you can access the source repository and build the latest
available version, including the bugfix, yourself. More information 
about accessing the source trees is available at
    http://www.mysql.com/doc/en/Installing_source_tree.html

Additional info:

Tests were also added.