Bug #36418 Character sets: crash if char(256 using utf32)
Submitted: 30 Apr 2008 0:26 Modified: 28 Jul 2008 20:50
Reporter: Peter Gulutzan
Status: Closed
Category: MySQL Server: Charsets Severity: S3 (Non-critical)
Version: 6.0.6-alpha-debug OS: Linux (SUSE 10 | 32-bit)
Assigned to: Alexander Barkov CPU Architecture: Any
Tags: regression

[30 Apr 2008 0:26] Peter Gulutzan
Description:
I create a table.
I try to insert using "char(256 using utf32)".
Crash.

How to repeat:
create table t (s1 varchar(1) character set utf32, s2 text character set utf32) engine=falcon;
create index i on t (s1);
insert into t values (char(256 using utf32), char(256 using utf32)    );
[30 Apr 2008 6:18] Valeriy Kravchuk
This is a regression bug. There is no crash with 6.0.4:

C:\Program Files\MySQL\MySQL Server 5.0\bin>mysql -uroot -proot test -P3311
Welcome to the MySQL monitor.  Commands end with ; or \g.
Your MySQL connection id is 2
Server version: 6.0.4-alpha-community MySQL Community Server (GPL)

Type 'help;' or '\h' for help. Type '\c' to clear the buffer.

mysql> create table t (s1 varchar(1) character set utf32, s2 text character set utf32)
    -> engine=falcon;
Query OK, 0 rows affected (0.59 sec)

mysql> create index i on t (s1);
Query OK, 0 rows affected (0.11 sec)
Records: 0  Duplicates: 0  Warnings: 0

mysql> insert into t values (char(256 using utf32), char(256 using utf32)    );
ERROR 1300 (HY000): Invalid utf32 character string: '010000'
[30 Apr 2008 11:17] MySQL Verification Team
Thank you for the bug report.

Server version: 6.0.6-alpha-debug Source distribution

Type 'help;' or '\h' for help. Type '\c' to clear the buffer.

mysql> create table t (s1 varchar(1) character set utf32, s2 text character set utf32)
    -> engine=falcon;
Query OK, 0 rows affected (0.12 sec)

mysql> create index i on t (s1);
Query OK, 0 rows affected (0.20 sec)
Records: 0  Duplicates: 0  Warnings: 0

mysql> insert into t values (char(256 using utf32), char(256 using utf32)    );
ERROR 2013 (HY000): Lost connection to MySQL server during query
mysql>
[19 May 2008 11:00] Alexander Barkov
Peter, what is the expected result for CHAR(256)?

http://dev.mysql.com/doc/refman/6.0/en/string-functions.html#function_char
says that:

 CHAR(256) is equivalent to CHAR(1,0)

which is 0x0100, and that is too short to be a valid UTF-32 sequence.

Should a warning / error be generated?

Or should it try to zero-pad the numbers automatically,
if they use less than "mbminlen" bytes for the output character set?

I'd prefer leading zero padding. I.e. (assuming utf32)

- Every number is auto-extended to 4 bytes.
  CHAR(0x010203 using utf32) -> 0x00010203
  CHAR(0x0203 using utf32)   -> 0x00000203
  CHAR(0x03 using utf32)     -> 0x00000003

- Every integer argument is padded separately; different
arguments do not interfere with each other as far as padding is concerned:

  CHAR(0x01, 0x02 using utf32) -> 0x0000000100000002
                              not 0x00000102

  i.e. "pad all numbers then concat"  VS "concat all numbers then pad".
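
The two strategies can be modelled in a few lines of Python. This is only an illustrative sketch, not the server code: `int_to_bytes`, `zero_pad`, `pad_then_concat` and `concat_then_pad` are hypothetical names, `mbminlen` is assumed to be 4 for utf32, and padding to the next multiple of `mbminlen` is an assumption that matches the examples above.

```python
def int_to_bytes(n: int) -> bytes:
    """Return n as big-endian bytes without leading zero bytes (0 -> one zero byte)."""
    length = max(1, (n.bit_length() + 7) // 8)
    return n.to_bytes(length, "big")


def zero_pad(raw: bytes, mbminlen: int) -> bytes:
    """Left-pad raw with zero bytes up to the next multiple of mbminlen."""
    remainder = len(raw) % mbminlen
    if remainder == 0:
        return raw
    return b"\x00" * (mbminlen - remainder) + raw


def pad_then_concat(args, mbminlen=4):
    """Proposed behaviour: pad every argument separately, then concatenate."""
    return b"".join(zero_pad(int_to_bytes(n), mbminlen) for n in args)


def concat_then_pad(args, mbminlen=4):
    """Rejected alternative: concatenate the raw bytes first, then pad once."""
    return zero_pad(b"".join(int_to_bytes(n) for n in args), mbminlen)


print(pad_then_concat([0x010203]).hex())    # 00010203
print(pad_then_concat([0x01, 0x02]).hex())  # 0000000100000002
print(concat_then_pad([0x01, 0x02]).hex())  # 00000102
print(pad_then_concat([256]).hex())         # 00000100
```

With `mbminlen=2` the same model gives `00010002` for the arguments (0x01, 0x02).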
[22 May 2008 22:44] Peter Gulutzan
Bar asked:

> Peter, what is the expected result for CHAR(256)?

For "CHAR(256 USING UTF32)" I expect rules like those for "CHAR(256 USING UCS2)".

I think that means you are right.

I believe that the rules you propose for UTF32 are like the rules
that we have now for UCS2. That is:
* leading zero padding
* every number is auto-extended to 2 bytes
* select hex( CHAR(0x01, 0x02 using ucs2)) yields 00010002,
  so apparently integer arguments are padded separately

It's true, the manual says that CHAR(1,0) is the same as CHAR(256).
But I don't care, the manual doesn't say that CHAR(1,0 USING UTF32)
is the same as CHAR(256 USING UTF32). This doesn't, in my opinion,
change documented behaviour. And it's already the case that
select hex( CHAR(1, 0 using ucs2)) yields 00010000
while
select hex( CHAR(256 using ucs2))  yields 0100
[26 May 2008 13:04] Bugs System
A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/47055

ChangeSet@1.2643, 2008-05-26 17:59:08+05:00, bar@mysql.com +6 -0
  Bug#36418 Character sets: crash if char(256 using utf32)
  Problem: CHAR(256 USING utf32) could generate a result
  with incorrect length, which resulted in a server crash.
  Fix: CHAR() now generates results with correct lengths,
  taking into account "mbminlen" of the character set.
[3 Jul 2008 9:23] Bugs System
A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/48948

2672 Alexander Barkov	2008-07-03
      Bug#36418 Character sets: crash if char(256 using utf32)
      Problem: CHAR(256 USING utf32) could generate a result
      with incorrect length, which resulted in a server crash.
      Fix: CHAR() now generates results with correct lengths,
      taking into account "mbminlen" of the character set.
      
      mysql-test/r/ctype_ucs.result:
      mysql-test/r/ctype_utf32.result:
      mysql-test/t/ctype_ucs.test:
      mysql-test/t/ctype_utf32.test:
        Adding tests
      
      sql/item_strfunc.cc
        Fixing to append all multi-byte characters as a single buffer,
        instead of appending one-by-one. This is important for "real"
        multi-byte character sets like UCS2 and UTF32.
      
      sql/sql_string.cc
         Handling correctly a case when a UCS2 or UTF32
         string is appended with a binary string:
         zero-pad the binary argument before concatenation,
         to make it have correct length
         (e.g. 0x01 -> 0x00000001 in case of UTF32).
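
Why it matters that the bytes are appended as a single buffer can be illustrated with a small Python model. This is a sketch under stated assumptions, not the actual sql/sql_string.cc code: `append_with_pad` is a hypothetical helper that zero-pads whatever it receives to a multiple of `mbminlen` (assumed to be 4 for UTF32) before appending, and "one-by-one" is assumed to mean byte-by-byte.

```python
def append_with_pad(target: bytearray, chunk: bytes, mbminlen: int = 4) -> None:
    """Append chunk to target, left-padding it with zeros to a multiple of mbminlen."""
    remainder = len(chunk) % mbminlen
    if remainder:
        chunk = b"\x00" * (mbminlen - remainder) + chunk
    target.extend(chunk)


raw = bytes([0x01, 0x00])  # the two bytes produced by the integer 256

# One-by-one: every byte is padded on its own, which yields two characters.
one_by_one = bytearray()
for byte in raw:
    append_with_pad(one_by_one, bytes([byte]))

# Single buffer: the bytes of one argument are padded together, yielding one character.
single_buffer = bytearray()
append_with_pad(single_buffer, raw)

print(one_by_one.hex())     # 0000000100000000
print(single_buffer.hex())  # 00000100
```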
[16 Jul 2008 10:45] Bugs System
A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/49816

2719 Alexander Barkov	2008-07-16
            Bug#36418 Character sets: crash if char(256 using utf32)
            Problem: CHAR(256 USING utf32) could generate a result
            with incorrect length, which resulted in a server crash.
            Fix: CHAR() now generates results with correct lengths,
            taking into account "mbminlen" of the character set.
            
            mysql-test/r/ctype_ucs.result:
            mysql-test/r/ctype_utf32.result:
            mysql-test/t/ctype_ucs.test:
            mysql-test/t/ctype_utf32.test:
              Adding tests
            
            sql/item_strfunc.cc
              Fixing to append all multi-byte characters as a single buffer,
              instead of appending one-by-one. This is important for "real"
              multi-byte character sets like UCS2 and UTF32.
            
            sql/sql_string.cc
               Handling correctly a case when a UCS2 or UTF32
               string is appended with a binary string:
               zero-pad the binary argument before concatenation,
               to make it have correct length
               (e.g. 0x01 -> 0x00000001 in case of UTF32).
[17 Jul 2008 7:19] Alexander Barkov
Pushed into mysql-6.0.6-bugteam.
[18 Jul 2008 9:37] Bugs System
A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/50013

2725 Georgi Kodinov	2008-07-18
      Bug#36418 addendum: fixed a C++ specific construct in a C file.
[18 Jul 2008 9:38] Bugs System
A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/50014

2725 Georgi Kodinov	2008-07-18
      Bug#36418 addendum: fixed a C++ specific construct in a C file.
[18 Jul 2008 11:40] Bugs System
A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/50026

2726 Sven Sandberg	2008-07-18 [merge]
      automerge
[28 Jul 2008 14:45] Bugs System
Pushed into 6.0.7-alpha  (revid:alik@mysql.com-20080725172155-fnc73o50e4tgl23k) (version source revid:alik@mysql.com-20080725172155-fnc73o50e4tgl23k) (pib:3)
[28 Jul 2008 20:50] Paul DuBois
Noted in 6.0.7 changelog.

CHAR(256 USING utf32) could generate a result with an incorrect
length and result in a server crash.
[13 Sep 2008 23:38] Bugs System
Pushed into 6.0.7-alpha  (revid:kgeorge@mysql.com-20080718093719-5927sojsbr8s73nw) (version source revid:john.embretsen@sun.com-20080808091208-ht48kyzsk7rim74g) (pib:3)