Bug #75001 CONCAT() of columns with same encoding but different collations behaves wrongly
Submitted: 25 Nov 2014 23:55 Modified: 9 Dec 2014 20:56
Reporter: Mark Amery Email Updates:
Status: Verified Impact on me:
None 
Category:MySQL Server: Charsets Severity:S3 (Non-critical)
Version:5.5.42, 5.6.23, 5.7.6 OS:Any
Assigned to: CPU Architecture:Any

[25 Nov 2014 23:55] Mark Amery
Description:
Per the documentation at http://dev.mysql.com/doc/refman/5.7/en/charset-collation-expressions.html, attempting to concatenate values from two columns which are in the same character set, but have different non-binary collations, should fail.

The docs state that,

    If both sides have the same coercibility, then:

        If both sides are Unicode, or both sides are not Unicode, it is an error.

Confusingly, there is another point listed LATER that is clearly supposed to overrule this, but it only applies in the case where one of the collations is a binary collation:

    For an operation with operands from the same character set but that mix a _bin collation and a _ci or _cs collation, the _bin collation is used. This is similar to how operations that mix nonbinary and binary strings evaluate the operands as binary strings, except that it is for collations rather than data types.

However, the actual observed behaviour is that attempting to CONCAT() two columns that have the same CHARSET, but different nonbinary collations, succeeds and returns a string in the binary collation for their charset.

See the 'How to repeat' section for some example SQL that reproduces the issue.

Expected output: every single SELECT statement in the snippet below should be an error.

Observed output:

mysql> CREATE DATABASE bugdemo;
Query OK, 1 row affected (0.00 sec)

mysql> use bugdemo;
Database changed
mysql> 
mysql> CREATE TABLE Foo(
    ->     A varchar(10) CHARACTER SET utf8 COLLATE utf8_general_ci,
    ->     B varchar(10) CHARACTER SET utf8 COLLATE utf8_unicode_ci,
    ->     C varchar(10) CHARACTER SET utf8 COLLATE utf8_german2_ci,
    ->     D varchar(10) CHARACTER SET latin1 COLLATE latin1_danish_ci,
    ->     E varchar(10) CHARACTER SET latin1 COLLATE latin1_german1_ci
    -> );
Query OK, 0 rows affected (0.24 sec)

mysql> 
mysql> INSERT INTO Foo(A, B, C, D, E) VALUES ('a', 'b', 'c', 'd', 'e');
Query OK, 1 row affected (0.04 sec)

mysql> 
mysql> -- Should error
mysql> SELECT CONCAT(A, B) FROM Foo;
+--------------+
| CONCAT(A, B) |
+--------------+
| ab           |
+--------------+
1 row in set (0.00 sec)

mysql> 
mysql> -- Should error
mysql> SELECT CONCAT(A, C) FROM Foo;
+--------------+
| CONCAT(A, C) |
+--------------+
| ac           |
+--------------+
1 row in set (0.00 sec)

mysql> 
mysql> -- Should error
mysql> SELECT CONCAT(B, C) FROM Foo;
+--------------+
| CONCAT(B, C) |
+--------------+
| bc           |
+--------------+
1 row in set (0.00 sec)

mysql> 
mysql> -- Should error
mysql> SELECT CONCAT(D, E) FROM Foo;
+--------------+
| CONCAT(D, E) |
+--------------+
| de           |
+--------------+
1 row in set (0.00 sec)

mysql> 
mysql> SELECT COLLATION(CONCAT(A, B)) FROM Foo;
+-------------------------+
| COLLATION(CONCAT(A, B)) |
+-------------------------+
| utf8_bin                |
+-------------------------+
1 row in set (0.00 sec)

mysql> SELECT COLLATION(CONCAT(A, C)) FROM Foo;
+-------------------------+
| COLLATION(CONCAT(A, C)) |
+-------------------------+
| utf8_bin                |
+-------------------------+
1 row in set (0.00 sec)

mysql> SELECT COLLATION(CONCAT(B, C)) FROM Foo;
+-------------------------+
| COLLATION(CONCAT(B, C)) |
+-------------------------+
| utf8_bin                |
+-------------------------+
1 row in set (0.00 sec)

mysql> SELECT COLLATION(CONCAT(D, E)) FROM Foo;
+-------------------------+
| COLLATION(CONCAT(D, E)) |
+-------------------------+
| latin1_bin              |
+-------------------------+
1 row in set (0.00 sec)

How to repeat:
Paste the following into a MySQL shell:

CREATE DATABASE bugdemo;
use bugdemo;

CREATE TABLE Foo(
    A varchar(10) CHARACTER SET utf8 COLLATE utf8_general_ci,
    B varchar(10) CHARACTER SET utf8 COLLATE utf8_unicode_ci,
    C varchar(10) CHARACTER SET utf8 COLLATE utf8_german2_ci,
    D varchar(10) CHARACTER SET latin1 COLLATE latin1_danish_ci,
    E varchar(10) CHARACTER SET latin1 COLLATE latin1_german1_ci
);

INSERT INTO Foo(A, B, C, D, E) VALUES ('a', 'b', 'c', 'd', 'e');

-- Should error
SELECT CONCAT(A, B) FROM Foo;

-- Should error
SELECT CONCAT(A, C) FROM Foo;

-- Should error
SELECT CONCAT(B, C) FROM Foo;

-- Should error
SELECT CONCAT(D, E) FROM Foo;

SELECT COLLATION(CONCAT(A, B)) FROM Foo;
SELECT COLLATION(CONCAT(A, C)) FROM Foo;
SELECT COLLATION(CONCAT(B, C)) FROM Foo;
SELECT COLLATION(CONCAT(D, E)) FROM Foo;

Suggested fix:
Change the behaviour to match that described in the documentation.
[9 Dec 2014 20:56] Sveta Smirnova
Thank you for the report.

Verified as described.