Bug #37244 Character sets: short utf8_bin weight_string value
Submitted: 6 Jun 2008 13:14 Modified: 1 Sep 2010 7:08
Reporter: Peter Gulutzan Email Updates:
Status: Duplicate Impact on me:
None 
Category:MySQL Server: Charsets Severity:S3 (Non-critical)
Version:5.5, 5.6, 6.0.6-alpha-debug OS:Linux (SUSE 10 | 32-bit)
Assigned to: Assigned Account CPU Architecture:Any
Triage: Triaged: D3 (Medium)

[6 Jun 2008 13:14] Peter Gulutzan
Description:
I create a table with three columns:
utf32 collate utf32_bin,
utf16 collate utf16_bin,
utf8 collate utf8_bin.
I insert a supplementary character:
10384 UGARITIC LETTER DELTA.
I select hex(weight_string() for each column.
The utf8 and utf32 values are both '0384'.

How to repeat:
create table t
     (utf32 char(1) collate utf32_bin,
      utf16 char(1) collate utf16_bin,
      utf8  char(1) collate utf8_bin);
insert into t values (0x0000ff9d,0xff9d,0xefbe9d);
insert into t values (0x00010384,0xd800df84,0xf0908e84);
select hex(utf16),hex(weight_string(utf16)) from t;
select hex(utf32),hex(weight_string(utf32)) from t;
select hex(utf8), hex(weight_string(utf8))  from t;

Example run:

mysql> create table t
    ->      (utf32 char(1) collate utf32_bin,
    ->       utf16 char(1) collate utf16_bin,
    ->       utf8  char(1) collate utf8_bin);
Query OK, 0 rows affected (0.09 sec)

mysql> insert into t values (0x0000ff9d,0xff9d,0xefbe9d);
Query OK, 1 row affected (0.00 sec)

mysql> insert into t values (0x00010384,0xd800df84,0xf0908e84);
Query OK, 1 row affected (0.00 sec)

mysql> select hex(utf16),hex(weight_string(utf16)) from t;
+------------+---------------------------+
| hex(utf16) | hex(weight_string(utf16)) |
+------------+---------------------------+
| FF9D       | FF9D                      |
| D800DF84   | D800DF84                  |
+------------+---------------------------+
2 rows in set (0.00 sec)

mysql> select hex(utf32),hex(weight_string(utf32)) from t;
+------------+---------------------------+
| hex(utf32) | hex(weight_string(utf32)) |
+------------+---------------------------+
| 0000FF9D   | FF9D                      |
| 00010384   | 0384                      |
+------------+---------------------------+
2 rows in set (0.00 sec)

mysql> select hex(utf8), hex(weight_string(utf8))  from t;
+-----------+--------------------------+
| hex(utf8) | hex(weight_string(utf8)) |
+-----------+--------------------------+
| EFBE9D    | FF9D                     |
| F0908E84  | 0384                     |
+-----------+--------------------------+
2 rows in set (0.00 sec)
[6 Jun 2008 16:03] Sveta Smirnova
Thank you for the report.

Verified as described.
[30 Aug 2010 12:48] Alexander Barkov
The bug is reproducible with 5.6:

--------------
DROP TABLE IF EXISTS t1
--------------

Query OK, 0 rows affected (0.01 sec)

--------------
CREATE TABLE  t1
     (utf32 CHAR(1) CHARACTER SET utf32 COLLATE utf32_bin,
      utf16 CHAR(1) CHARACTER SET utf16 COLLATE utf16_bin,
      utf8mb4 CHAR(1) COLLATE utf8mb4_bin)
--------------

Query OK, 0 rows affected (0.03 sec)

--------------
INSERT INTO t1 VALUES (0x0000ff9d,0xff9d,0xefbe9d)
--------------

Query OK, 1 row affected (0.00 sec)

--------------
INSERT INTO t1 VALUES (0x00010384,0xd800df84,0xf0908e84)
--------------

Query OK, 1 row affected (0.02 sec)

--------------
INSERT INTO t1 VALUES (_utf32 0x385, _utf32 0x385, _utf32 0x385)
--------------

Query OK, 1 row affected (0.00 sec)

--------------
INSERT INTO t1 VALUES (_utf32 0x100000, _utf32 0x100000, _utf32 0x100000)
--------------

Query OK, 1 row affected (0.00 sec)

--------------
SELECT hex(utf16),hex(weight_string(utf16)) FROM t1 ORDER BY utf16
--------------

+------------+---------------------------+
| hex(utf16) | hex(weight_string(utf16)) |
+------------+---------------------------+
| DBC0DC00   | 0000                      |
| D800DF84   | 0384                      |
| 0385       | 0385                      |
| FF9D       | FF9D                      |
+------------+---------------------------+
4 rows in set (0.00 sec)

--------------
SELECT hex(utf32),hex(weight_string(utf32)) FROM t1 ORDER BY utf32
--------------

+------------+---------------------------+
| hex(utf32) | hex(weight_string(utf32)) |
+------------+---------------------------+
| 00100000   | 0000                      |
| 00010384   | 0384                      |
| 00000385   | 0385                      |
| 0000FF9D   | FF9D                      |
+------------+---------------------------+
4 rows in set (0.00 sec)

--------------
SELECT hex(utf8mb4), hex(weight_string(utf8mb4)) FROM t1 ORDER BY utf8mb4
--------------

+--------------+-----------------------------+
| hex(utf8mb4) | hex(weight_string(utf8mb4)) |
+--------------+-----------------------------+
| F4808080     | 0000                        |
| F0908E84     | 0384                        |
| CE85         | 0385                        |
| EFBE9D       | FF9D                        |
+--------------+-----------------------------+
4 rows in set (0.00 sec)
[30 Aug 2010 12:51] Alexander Barkov
The bug is reproducible with 5.5: 
Note: although the weight_string() function does not exist 5.5 yet,
this script demonstrates that ORDER BY returns results in a wrong order:

--------------
DROP TABLE IF EXISTS t1
--------------

Query OK, 0 rows affected (0.01 sec)

--------------
CREATE TABLE  t1
     (utf32 CHAR(1) CHARACTER SET utf32 COLLATE utf32_bin,
      utf16 CHAR(1) CHARACTER SET utf16 COLLATE utf16_bin,
      utf8mb4 CHAR(1) COLLATE utf8mb4_bin)
--------------

Query OK, 0 rows affected (0.01 sec)

--------------
INSERT INTO t1 VALUES (0x0000ff9d,0xff9d,0xefbe9d)
--------------

Query OK, 1 row affected (0.01 sec)

--------------
INSERT INTO t1 VALUES (0x00010384,0xd800df84,0xf0908e84)
--------------

Query OK, 1 row affected (0.00 sec)

--------------
INSERT INTO t1 VALUES (_utf32 0x385, _utf32 0x385, _utf32 0x385)
--------------

Query OK, 1 row affected (0.00 sec)

--------------
INSERT INTO t1 VALUES (_utf32 0x100000, _utf32 0x100000, _utf32 0x100000)
--------------

Query OK, 1 row affected (0.00 sec)

--------------
SELECT hex(utf16) FROM t1 ORDER BY utf16
--------------

+------------+
| hex(utf16) |
+------------+
| DBC0DC00   |
| D800DF84   |
| 0385       |
| FF9D       |
+------------+
4 rows in set (0.01 sec)

--------------
SELECT hex(utf32) FROM t1 ORDER BY utf32
--------------

+------------+
| hex(utf32) |
+------------+
| 00100000   |
| 00010384   |
| 00000385   |
| 0000FF9D   |
+------------+
4 rows in set (0.00 sec)

--------------
SELECT hex(utf8mb4) FROM t1 ORDER BY utf8mb4
--------------

+--------------+
| hex(utf8mb4) |
+--------------+
| F4808080     |
| F0908E84     |
| CE85         |
| EFBE9D       |
+--------------+
4 rows in set (0.00 sec)
[1 Sep 2010 7:07] Bugs System
A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/117292

3251 Alexander Barkov	2010-09-01
      This changeset is only adding tests to cover:
        Bug#37244 Character sets: short utf8_bin weight_string value
      
      The bug was fixed earlier, by the patch for 
      Bug#55980 Character sets: supplementary character _bin ordering is wrong
[1 Sep 2010 7:08] Alexander Barkov
Duplicate for:

Bug#55980 Character sets: supplementary character _bin ordering is wrong
[2 Oct 2010 18:13] Bugs System
Pushed into mysql-trunk 5.6.1-m4 (revid:alexander.nozdrin@oracle.com-20101002180948-852x1cuv7c6i85ea) (version source revid:alexander.nozdrin@oracle.com-20101002180857-an32jpuwzemsp4f2) (merge vers: 5.6.1-m4) (pib:21)
[2 Oct 2010 18:15] Bugs System
Pushed into mysql-next-mr (revid:alexander.nozdrin@oracle.com-20101002181053-6iotvl26uurcoryp) (version source revid:alexander.nozdrin@oracle.com-20101002180917-h0n62akupm3z20nt) (pib:21)