Bug #30982 CHAR(..USING..) can return a not-well-formed string
Submitted: 12 Sep 2007 15:48
Modified: 30 Oct 2007 0:47
Reporter: Alexander Barkov
Status: Closed
Category: MySQL Server: Charsets
Severity: S3 (Non-critical)
Version: 5.0/5.1
OS: Any
Assigned to: Sergei Glukhov
CPU Architecture: Any

[12 Sep 2007 15:48] Alexander Barkov
Description:
CHAR(..USING..) does not check that its argument is valid in the
requested character set, and can return a string that is not well-formed.

How to repeat:
mysql> select hex(char(0xFF using utf8));
+----------------------------+
| hex(char(0xFF using utf8)) |
+----------------------------+
| FF                         |
+----------------------------+
1 row in set, 1 warning (0.00 sec)
mysql> show warnings;
+---------+------+-------------------------------------+
| Level   | Code | Message                             |
+---------+------+-------------------------------------+
| Warning | 1300 | Invalid utf8 character string: 'FF' |
+---------+------+-------------------------------------+
1 row in set (0.00 sec)

Expected result: the function should not return ill-formed strings.

Suggested fix:
It should be fixed to return an error rather than a warning.
[12 Sep 2007 15:58] MySQL Verification Team
Thank you for the bug report.
[4 Oct 2007 10:37] Alexander Barkov
The same problem shows up with CONVERT(..USING..):

mysql> select hex(convert(0xFF using utf8));
+-------------------------------+
| hex(convert(0xFF using utf8)) |
+-------------------------------+
| FF                            |
+-------------------------------+
1 row in set, 1 warning (0.00 sec)
[8 Oct 2007 12:24] Bugs System
A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/35096

ChangeSet@1.2534, 2007-10-08 17:19:10+05:00, gluh@mysql.com +4 -0
  Bug#30982 CHAR(..USING..) can return a not-well-formed string
  Bug#30986 Character set introducer followed by a HEX string can return bad result
  Item Item_func_hex: added a check for well-formed strings.
  If the result string contains illegal symbols, it is cut off
  after the last legal symbol.
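
The truncation rule described in the changeset comment can be illustrated with a minimal sketch. This is not the server's code: the helper names below are hypothetical, and the check follows standard UTF-8 rules restricted to sequences of one to three bytes, which is the range the utf8 character set of MySQL 5.0/5.1 covers. The server's own validation may differ in details such as the handling of surrogates.

```cpp
#include <cstddef>
#include <string>

// True if b is a UTF-8 continuation byte (bit pattern 10xxxxxx).
static bool is_continuation(unsigned char b)
{
  return (b & 0xC0) == 0x80;
}

// Hypothetical helper: length in bytes of the longest well-formed
// UTF-8 prefix of s, accepting only 1- to 3-byte sequences.
std::size_t well_formed_prefix_length(const std::string &s)
{
  const std::size_t n = s.size();
  std::size_t i = 0;
  while (i < n)
  {
    const unsigned char b0 = static_cast<unsigned char>(s[i]);
    std::size_t len;
    if (b0 < 0x80)
      len = 1;                          // ASCII
    else if (b0 >= 0xC2 && b0 <= 0xDF)
      len = 2;                          // two-byte sequence
    else if (b0 >= 0xE0 && b0 <= 0xEF)
      len = 3;                          // three-byte sequence
    else
      break;                            // stray continuation, C0/C1, or F0..FF

    if (n - i < len)
      break;                            // sequence truncated at end of input

    bool ok = true;
    for (std::size_t k = 1; k < len; ++k)
      if (!is_continuation(static_cast<unsigned char>(s[i + k])))
        ok = false;
    if (!ok)
      break;                            // missing continuation byte

    if (len == 3)
    {
      const unsigned char b1 = static_cast<unsigned char>(s[i + 1]);
      if (b0 == 0xE0 && b1 < 0xA0)
        break;                          // overlong three-byte encoding
      if (b0 == 0xED && b1 >= 0xA0)
        break;                          // UTF-16 surrogate range
    }
    i += len;
  }
  return i;
}

// Hypothetical helper: s cut off after its last legal character.
std::string truncate_to_well_formed(const std::string &s)
{
  return s.substr(0, well_formed_prefix_length(s));
}
```

With this sketch, the bytes 61 62 63 FF from the examples in this report are cut down to "abc", and the single byte FF becomes an empty string.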
[10 Oct 2007 14:59] Sergei Glukhov
A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/35292
[10 Oct 2007 16:27] Alexander Barkov
The patch http://lists.mysql.com/commits/35292 is almost fine.

I have two suggestions:

1. This won't work on a big-endian machine:

-      str->append((char) num);
+      str->append((char*) &num, 1);

You need to do it this way:

char chr= (char) num;
str->append(&chr, 1);

2. Please change this code:

+            $$= new Item_string(str ? str->ptr() : "",
+                                str ? str->length() : 0,
+                                Lex->underscore_charset);

To this:

+            $$= new Item_string(NULL /* name will be set in "select_item" */,
+                                str ? str->ptr() : "",
+                                str ? str->length() : 0,
+                                Lex->underscore_charset);

in both "UNDERSCORE_CHARSET BIN_NUM" and "UNDERSCORE_CHARSET HEX_NUM" rules.

This does exactly the same thing, except that it sets the Item name to NULL in the constructor. The Item name is then set later in "select_item", here:

select_item: 
   remember_name select_item2 remember_end select_alias
   ...
   else if (!$2->name)
   {
     $2->set_name($1, (uint) ($3 - $1), thd->charset());
   }

This change will produce nicer-looking Item names, like this:

mysql> select _utf8 X'616263FF';
+-------------------+
| _utf8 X'616263FF' |
+-------------------+
| abc               |
+-------------------+

instead of the current bad behavior, where bad bytes are removed from the value
but not from the name:

mysql> select _utf8 X'616263FF';
+------+
| abc? | <- wrong byte is still at the end
+------+
| abc  | <- wrong byte was removed with help of check_well_formed_result()
+------+
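
The big-endian concern in point 1 of this review can be reproduced outside the server with a small sketch. It assumes that num is a 32-bit integer (the thread does not show its declaration), and the function names are hypothetical. Casting the value to char always yields the low-order byte, while reading one byte through a char pointer yields whichever byte is stored first in memory, and that is the low-order byte only on a little-endian host.

```cpp
#include <cstdint>
#include <cstring>
#include <string>

// Portable: convert the value first, then take the address of the char.
std::string low_byte_by_value(std::int32_t num)
{
  const char chr = static_cast<char>(num);
  return std::string(&chr, 1);
}

// Byte-order dependent: takes the first byte of num as laid out in memory.
std::string first_byte_in_memory(std::int32_t num)
{
  return std::string(reinterpret_cast<const char *>(&num), 1);
}

// Reports whether the least significant byte is stored first on this host.
bool host_is_little_endian()
{
  const std::uint32_t probe = 1;
  unsigned char first = 0;
  std::memcpy(&first, &probe, 1);
  return first == 1;
}
```

For num = 0x41, low_byte_by_value() returns "A" on any host, whereas first_byte_in_memory() returns "A" only where host_is_little_endian() is true. On a big-endian host it returns a zero byte instead.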
[11 Oct 2007 8:52] Sergei Glukhov
A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/35327
[11 Oct 2007 10:23] Alexander Barkov
Ok to push
[29 Oct 2007 8:43] Bugs System
Pushed into 5.0.52
[29 Oct 2007 8:46] Bugs System
Pushed into 5.1.23-beta
[29 Oct 2007 8:50] Bugs System
Pushed into 6.0.4-alpha
[30 Oct 2007 0:47] Paul DuBois
Noted in 5.0.52, 5.1.23, 6.0.4 changelogs.

CHAR(str USING charset) did not check its argument and could return
an ill-formed result for invalid input.