Bug #32394 Character sets: crash if comparison with 0xfffd
Submitted: 14 Nov 2007 19:51
Modified: 18 Jul 2008 16:13
Reporter: Peter Gulutzan
Status: Closed
Category: MySQL Server: Charsets
Severity: S3 (Non-critical)
Version: 6.0.5-alpha-debug
OS: Linux (SUSE 10 64-bit)
Assigned to: Alexander Barkov
CPU Architecture: Any
Triage: D2 (Serious)

[14 Nov 2007 19:51] Peter Gulutzan
Description:
I'm using the mysql-5.2-rpl team tree.
When I compare a utf32 value with 0xfffd, the server crashes.

How to repeat:
drop table if exists utf32;
create table utf32 (s1 varchar(5) character set utf32);
insert into utf32 values (0xfffd);
select case when s1 = 0xfffd then 1 else 0 end from utf32;

or

drop table if exists utf32;
create table utf32 (s1 varchar(5) character set utf32);
insert into utf32 values (0xfffd);
select * from utf32 where s1 = 0xfffd;

Example:

mysql> drop table if exists utf32;
Query OK, 0 rows affected (0.00 sec)

mysql> create table utf32 (s1 varchar(5) character set utf32);
Query OK, 0 rows affected (0.01 sec)

mysql> insert into utf32 values (0xfffd);
Query OK, 1 row affected (0.01 sec)

mysql> select case when s1 = 0xfffd then 1 else 0 end from utf32;
ERROR 2013 (HY000): Lost connection to MySQL server during query
[15 Nov 2007 2:07] Miguel Solorzano
Thank you for the bug report. Verified as described:

Welcome to the MySQL monitor.  Commands end with ; or \g.
Your MySQL connection id is 1
Server version: 5.2.6-alpha-debug Source distribution

Type 'help;' or '\h' for help. Type '\c' to clear the buffer.

mysql> drop table if exists utf32;
Query OK, 0 rows affected, 1 warning (0.00 sec)

mysql> create table utf32 (s1 varchar(5) character set utf32);
Query OK, 0 rows affected (0.02 sec)

mysql> insert into utf32 values (0xfffd);
Query OK, 1 row affected (0.00 sec)

mysql> select case when s1 = 0xfffd then 1 else 0 end from utf32;
ERROR 2013 (HY000): Lost connection to MySQL server during query
mysql>
[7 Dec 2007 18:45] Peter Gulutzan
Confirmed with 6.0.5-alpha-debug.
[12 Dec 2007 11:04] Alexander Barkov
An easier test demonstrating the same problem:

mysql> select _utf32'a' collate utf32_general_ci = 0xfffd;
ERROR 2013 (HY000): Lost connection to MySQL server during query
mysql>
[12 Dec 2007 11:12] Alexander Barkov
Related problem:

mysql> select _ucs2 0x0061 collate ucs2_general_ci = 0x61;
+---------------------------------------------+
| _ucs2 0x0061 collate ucs2_general_ci = 0x61 |
+---------------------------------------------+
|                                           0 |
+---------------------------------------------+
1 row in set (1.31 sec)

The above query should return 1.
[12 Dec 2007 13:26] Bugs System
A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/39776

ChangeSet@1.2700, 2007-12-12 17:23:22+04:00, bar@mysql.com +8 -0
  Bug#32394 Character sets: crash if comparison with 0xfffd
  Problem: strnncoll() was called with non-aligned arguments in some cases.
  E.g. UCS2 and UTF16 expect length to be divisible by 2,
  and UTF32 expects length to be divisible by 4.
  This was not true in the case of mixing character strings
  with binary constants, like 0xAA or X'AA'. A binary constant of
  this kind was passed directly to strnncoll() without first being
  extended to 0x00AA (for UCS2/UTF16) or 0x000000AA (for UTF32).
  Fix: force binary constant alignment for UCS2/UTF16/UTF32.
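
The alignment rule described in this commit message can be illustrated with a minimal Python sketch. This is not the server code: the helper name pad_binary_constant and the CHAR_WIDTH table are hypothetical, and the sketch only assumes what the message states, namely that UCS2/UTF16 data must have a length divisible by 2 and UTF32 data a length divisible by 4.

```python
# Hypothetical illustration of the alignment rule; not a MySQL function.
# Required length multiple per character set, as stated in the commit message.
CHAR_WIDTH = {"ucs2": 2, "utf16": 2, "utf32": 4}


def pad_binary_constant(value: bytes, charset: str) -> bytes:
    """Left-pad a binary constant with zero bytes so that its length
    is divisible by the width required by the character set."""
    width = CHAR_WIDTH[charset]
    remainder = len(value) % width
    if remainder == 0:
        return value  # already aligned, nothing to do
    return b"\x00" * (width - remainder) + value


print(pad_binary_constant(bytes.fromhex("AA"), "ucs2").hex())     # 00aa
print(pad_binary_constant(bytes.fromhex("AA"), "utf32").hex())    # 000000aa
print(pad_binary_constant(bytes.fromhex("fffd"), "utf32").hex())  # 0000fffd
```

The last call corresponds to the constant from the original report: 0xfffd is two bytes long, so a routine that reads four bytes per utf32 character would otherwise run past the end of the value.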
[14 Dec 2007 9:20] Alexander Barkov
The problem is more complex than I originally thought.
Another example:

mysql> select hex(@x:=concat(_ucs2 0x0410 collate ucs2_general_ci, 0x61));
+-------------------------------------------------------------+
| hex(@x:=concat(_ucs2 0x0410 collate ucs2_general_ci, 0x61)) |
+-------------------------------------------------------------+
| 041061                                                      |
+-------------------------------------------------------------+
1 row in set (0.00 sec)

mysql> select charset(@x);
+-------------+
| charset(@x) |
+-------------+
| ucs2        |
+-------------+
1 row in set (0.00 sec)

The result of the concatenation is wrong.
The expected result is 04100061.

I.e. it should work as follows:

1. Collation aggregation code detects that the result collation is ucs2_general_ci
2. Concat() casts the right argument from binary to ucs2, so the constant
is left-padded with zero: 0x0061
3. Concat() performs the actual concatenation

The same problem applies to all string functions.
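
The three steps listed above can be sketched in Python to show the difference between the observed and the expected result. The function names cast_binary_to_charset and concat_in_charset are hypothetical and only model the behaviour described in this comment; they are not server functions. The sketch assumes ucs2 uses 2 bytes per character in big-endian order.

```python
# Hypothetical model of the expected Concat() behaviour; not MySQL code.

def cast_binary_to_charset(value: bytes, char_width: int) -> bytes:
    """Step 2: cast a binary constant to the result character set by
    left-padding it with zero bytes to a multiple of char_width."""
    padding = (char_width - len(value) % char_width) % char_width
    return b"\x00" * padding + value


def concat_in_charset(left: bytes, right_binary: bytes, char_width: int) -> bytes:
    """Step 3: concatenate after the binary argument has been cast."""
    return left + cast_binary_to_charset(right_binary, char_width)


left = bytes.fromhex("0410")   # _ucs2 0x0410
right = bytes.fromhex("61")    # binary constant 0x61

# Observed behaviour: the raw bytes are appended, giving an odd-length ucs2 string.
naive = left + right
print(naive.hex())   # 041061

# Expected behaviour: the constant is padded to 0x0061 before concatenation.
fixed = concat_in_charset(left, right, 2)
print(fixed.hex())   # 04100061
```

The padded result is three bytes shorter of being ambiguous: 04100061 splits cleanly into two 2-byte characters, while 041061 leaves a dangling byte.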
[2 May 2008 15:54] Bugs System
A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/46299

ChangeSet@1.2622, 2008-05-02 20:53:40+05:00, bar@mysql.com +3 -0
  Bug#32394 Character sets: crash if comparison with 0xfffd
  Problem: when converting from "binary" to "real multi-byte" 
  character sets, strings were not left-padded to correct length.
  Fix: force installing Item_func_conv_charset() when conversion
  from "binary" to "real multi-byte" happens. Example:
  In case of utf32: 0x61 is now padded to 0x00000061,
  previously it was not padded in some cases.
[5 May 2008 10:57] Bugs System
A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/46346

ChangeSet@1.2641, 2008-05-05 15:53:05+05:00, bar@mysql.com +3 -0
  Bug#32394 Character sets: crash if comparison with 0xfffd
  Problem: when converting from "binary" to "real multi-byte"
  character sets, strings were not left-padded to correct length.
  Fix: force installing Item_func_conv_charset() when conversion
  from "binary" to "real multi-byte" happens. Example:
  In case of utf32: 0x61 is now padded to 0x00000061,
  previously it was not padded in some cases.
[5 May 2008 10:58] Alexander Barkov
Pushed into 6.0.6-rpl
[18 Jul 2008 8:33] Alexander Barkov
Merged into bzr mysql-6.0.7
[18 Jul 2008 16:13] Paul Dubois
Noted in 6.0.6 changelog.

Conversion of binary values to multi-byte character sets could fail
to left-pad values to the correct length. This could result in a
server crash.
[14 Sep 2008 5:03] Bugs System
Pushed into 6.0.7-alpha  (revid:sp1r-bar@mysql.com/bar.myoffice.izhnet.ru-20080505105305-51277) (version source revid:john.embretsen@sun.com-20080724122511-9c0oudz1xrdrs6y6) (pib:3)