Bug #32388 Character sets: failure with a non-BMP character in a prepared statement
Submitted: 14 Nov 2007 19:43 Modified: 3 Jan 2008 18:57
Reporter: Peter Gulutzan Email Updates:
Status: Not a Bug Impact on me:
None 
Category:MySQL Server: Charsets Severity:S3 (Non-critical)
Version:6.0.5-alpha-debug OS:Linux (SUSE 10 64-bit)
Assigned to: Alexander Barkov CPU Architecture:Any

[14 Nov 2007 19:43] Peter Gulutzan
Description:
I'm using the mysql-5.2-rpl team tree.

A search fails with non-BMP character
0x00010384 UGARITIC LETTER DELTA,
in strings that I use with PREPARE.

How to repeat:
set names utf8;
drop table if exists utf;
create table utf (utf32 varchar(5) character set utf32,
                  utf16 varchar(5) character set utf16,
                  utf8 varchar(5) character set utf8,
                  utf8mb3 varchar(5) character set utf8mb3);
insert into utf values (0x0000ff9d,0xff9d,0xefbe9d,0xefbe9d);
insert into utf values (0x00010384,0xd800df84,0xf0908e84,0xf0908e84);
select utf32,utf16,utf8,utf8mb3,hex(utf32),hex(utf16),hex(utf8),hex(utf8mb3) from utf where utf16 > 0xf0908e84;
set @x = concat('select utf32,utf16,utf8,utf8mb3,hex(utf32),hex(utf16),hex(utf8),hex(utf8mb3) from utf where utf16 > ''',0xf0908e84,'''');
prepare stmt1 from @x;
execute stmt1;
The SELECT finds 1 row. The EXECUTE finds 0 rows.

Example:

mysql> set names utf8;
Query OK, 0 rows affected (0.01 sec)

mysql> drop table if exists utf;
Query OK, 0 rows affected (0.01 sec)

mysql> create table utf (utf32 varchar(5) character set utf32,
    ->                   utf16 varchar(5) character set utf16,
    ->                   utf8 varchar(5) character set utf8,
    ->                   utf8mb3 varchar(5) character set utf8mb3);
Query OK, 0 rows affected (0.01 sec)

mysql> insert into utf values (0x0000ff9d,0xff9d,0xefbe9d,0xefbe9d);
Query OK, 1 row affected (0.00 sec)

mysql> insert into utf values (0x00010384,0xd800df84,0xf0908e84,0xf0908e84);
Query OK, 1 row affected, 1 warning (0.00 sec)

mysql> select utf32,utf16,utf8,utf8mb3,hex(utf32),hex(utf16),hex(utf8),hex(utf8mb3) from utf where utf16 > 0xf0908e84;
+-------+-------+------+---------+------------+------------+-----------+--------------+
| utf32 | utf16 | utf8 | utf8mb3 | hex(utf32) | hex(utf16) | hex(utf8) | hex(utf8mb3) |
+-------+-------+------+---------+------------+------------+-----------+--------------+
| ン   | ン   | ン  | ン     | 0000FF9D   | FF9D       | EFBE9D    | EFBE9D       |
|
[15 Nov 2007 2:19] MySQL Verification Team
Thank you for the bug report. Verified as described.
[30 Nov 2007 1:59] Rob Lanphier
This looks like a duplicate of http://bugs.mysql.com/bug.php?id=14052 , or at least related.
[30 Nov 2007 15:09] Peter Gulutzan
It's not a duplicate of bug#14052 but yes it's 'related".

The very latest source-code-downloadable version of
MysQL supports utf16 and utf32.
[6 Dec 2007 9:20] Alexander Barkov
Petter, I'm sorry - I don't understand the problem.

The SQL query in direct execution example and the SQL
query in prepared execution example are not equal.

The first query searches for strings which are greater
than "U+F090 followed by U+8E84". Both strings are
greater, which seems to be correct:

mysql> select hex(utf16), utf16>0xf0908e84 from utf;
+------------+------------------+
| hex(utf16) | utf16>0xf0908e84 |
+------------+------------------+
| FF9D       |                1 |
| D800DF84   |                1 | <- this one is compared as U+FFFD REPLACEMENT CHARACTER
+------------+------------------+
2 rows in set (0.00 sec)

The query in prepared execution example
searches for strings greater than U+010384,
and is equal to either of these three queries:

mysql> select utf32,utf16,utf8,utf8mb3,hex(utf32),hex(utf16),hex(utf8),hex(utf8mb3) from utf where utf16 > _utf8 0xf0908e84;
Empty set (0.00 sec)

mysql> select utf32,utf16,utf8,utf8mb3,hex(utf32),hex(utf16),hex(utf8),hex(utf8mb3) from utf where utf16 > _utf32 0x00010384;
Empty set (0.00 sec)

mysql> select utf32,utf16,utf8,utf8mb3,hex(utf32),hex(utf16),hex(utf8),hex(utf8mb3) from utf where utf16 > _utf16 0xD800DF84;
Empty set (0.00 sec)

Non of these queries return any results, which seems to be correct too.

Please clarify...
[8 Dec 2007 14:11] Alexander Barkov
Peter, please read my previous message again.

Using a BMP character as an example, you do:

SELECT ... WHERE ucs2_column='ÿ';    <- in prepared statement
SELECT ... WHERE ucs2_column=0xC3BF; <- in direct execution

The first query compares ucs2 column to an utf8 string literal.
It converts the string from ucs2 to utf8. Comparison is
done against U+00FF.

The second query compares ucs2 column to a binary literal.
No conversion happens. Comparison is done against U+CFBF.
[3 Jan 2008 18:57] Peter Gulutzan
I now understand the comments of Alexander Barkov.
This is not a bug.