Bug #10344 Some string functions fail for UCS2
Submitted: 3 May 2005 16:57 Modified: 24 May 2005 19:54
Reporter: Marko Mäkelä Email Updates:
Status: Closed Impact on me:
Category:MySQL Server Severity:S2 (Serious)
Version:5.0-bk OS:Any (all)
Assigned to: Alexander Barkov CPU Architecture:Any

[3 May 2005 16:57] Marko Mäkelä
At least the functions SUBSTR and CONCAT seem to function improperly on UCS2 input. Somehow, HEX(expr) often shows more than expr by itself.

How to repeat:
select hex(substr(_latin1 0x00e400e40068, 2));
-- E400E40068
select hex(substr(_utf8 0x00e400e40068, 2));
-- E400E40068 (why not truncate to empty string (invalid UTF-8 char E4)?)
select hex(substr(_ucs2 0x00e400e40068, 2));
-- empty (expected 00E40068)
select hex(concat(_ucs2 0x00e4, 1));
-- 00E431 (expected 00E40031)
select concat(_ucs2 0x0041, 1);
-- empty (expected 'a1')

create table f(a char(5))engine=myisam charset=ucs2;
insert into f values('abc');
select concat(substr(a,2),substr(a,1,1)) from f;
-- expected 'bca', got 'a'

Suggested fix:
Make sure you're not assuming anything special about the NUL byte.
[10 May 2005 11:28] Alexander Barkov
1.  substr problem was fixed in 4.1.13.
2. Character-To-Number conversion like:
   CONCAT(ucs2_value, 1)
is going to be fixed in 5.1.
For earlier versions, Charac it is considered to be expected behaviour.
[24 May 2005 19:54] Paul Dubois
SUBSTR() fix noted in 4.1.13 changelog.