Bug #36961 Concat() between a UTF32 string and a binary string yields wrong result
Submitted: 26 May 2008 12:42 Modified: 25 Jul 2008 15:08
Reporter: Alexander Barkov Email Updates:
Status: Not a Bug Impact on me:
None 
Category:MySQL Server: Charsets Severity:S3 (Non-critical)
Version:6.0 OS:Any
Assigned to: Alexander Barkov CPU Architecture:Any

[26 May 2008 12:42] Alexander Barkov
Description:
Concat() between  a UTF32 string and a binary string yields wrong result,
the binary part doesn't gets aligned to 4 byte.

How to repeat:
mysql> select hex(concat(_utf32 0x61, 0x61));
+--------------------------------+
| hex(concat(_utf32 0x61, 0x61)) |
+--------------------------------+
| 0000006161                     |
+--------------------------------+
1 row in set (0.00 sec)

The above result is 5 byte long, which is a wrong utf32 string.

Suggested fix:
The binary argument must be extended from 1 byte to 4 bytes
before concatenation, to be a good utf32 string.

The expected result is: 0000006100000061
[26 May 2008 12:58] MySQL Verification Team
Thank you for the bug report.
[25 Jul 2008 15:08] Alexander Barkov
Sorry, this is not a bug. When mixing an UTF32 string with a binary
string, like in the example, the binary string wins. So the result
string is binary, and no extending should happen - the strings should
be concatenated byte-wise.

Autoextending should happen only if UTF32 string has higher precedence.
For example:
- utf32 string with COLLATE clause vs binary constant
- utf32 field vs binary constant

And in these cases it works as expected:

Example of correct behavior:

mysql> select hex(concat(_utf32 0x61 collate utf32_general_ci , 0x61));
+----------------------------------------------------------+
| hex(concat(_utf32 0x61 collate utf32_general_ci , 0x61)) |
+----------------------------------------------------------+
| 0000006100000061                                         |
+----------------------------------------------------------+
1 row in set (0.04 sec)

Another example of correct behavior:

mysql> create table t1 (a varchar(10) character set utf32);
Query OK, 0 rows affected (0.02 sec)

mysql> insert into t1 values (0x61);
Query OK, 1 row affected (0.00 sec)

mysql> select hex(concat(a, 0x61)) from t1;
+----------------------+
| hex(concat(a, 0x61)) |
+----------------------+
| 0000006100000061     |
+----------------------+
1 row in set (0.00 sec)

No bugs.

I'm sorry for confusion.