Bug #76215 Implicit casts to string by COLLATE don't use character_set_connection
Submitted: 9 Mar 2015 1:57 Modified: 11 Mar 2015 12:42
Reporter: Alan Egerton Email Updates:
Status: Verified Impact on me:
None 
Category:MySQL Server: Charsets Severity:S3 (Non-critical)
Version:5.5.44, 5.6.25 OS:Any
Assigned to: CPU Architecture:Any

[9 Mar 2015 1:57] Alan Egerton
Description:
I can't think of why anyone would do this, but it arose in http://stackoverflow.com/q/28677662:

To force an expression to use a particular collation, one can explicitly specify that one of its operands has an explicit, incoercible collation (via a COLLATE clause).  Whilst it should perhaps be erroneous if that operand is not a string type, MySQL nevertheless proceeds by casting the operand to a string; one should imagine that such casts would follow the documented behaviour under "Type Conversion in Expression Evaluation"[1]:

> In MySQL 5.6, implicit conversion of a numeric or temporal value to string produces a
> value that has a character set and collation determined by the character_set_connection
> and collation_connection system variables.

However, this is not the case—the operand is cast to a string using the latin1 character set irrespective of the server's settings.

[1]: http://dev.mysql.com/doc/en/type-conversion.html

How to repeat:
mysql> SHOW VARIABLES LIKE '%_connection';
+--------------------------+-----------------+
| Variable_name            | Value           |
+--------------------------+-----------------+
| character_set_connection | utf8            |
| collation_connection     | utf8_general_ci |
+--------------------------+-----------------+
2 rows in set (0.00 sec)

mysql> SELECT 123 COLLATE utf8_general_ci;
ERROR 1253 (42000): COLLATION 'utf8_general_ci' is not valid for CHARACTER SET 'latin1'

Suggested fix:
Values that are implicitly converted to strings, even if only by a COLLATE clause, should use the connection character set.
[9 Mar 2015 2:08] Alan Egerton
This looks as though it may be the same issue as http://bugs.mysql.com/bug.php?id=73858
[9 Mar 2015 14:42] Peter Laursen
I don't remember now why I posted the other bug report.  But it was a practical matter for me/us at the time (all my reports here are). But the test case I posted is so much reduced/generalized/abstracted that I dont remember how I came across this.
 
But what if the SELECT is wrapped in an "INSERT INTO some_table.<string-column> SELECT <expression like in this report> CONCAT <any other expression resolving to a string>.." or similar? In such case the CONCAT'ed strings need to use same CHARSET and COlLATION as the colmnn inserted to. Some requirement for a query like this could have been the issue/reason for my report.
[9 Mar 2015 14:54] Peter Laursen
I think I remember now what our issue was.

The error occured with a query calculating an MD5-checksum on a number of table columns (of any type) 'serialized into a string' using CONCAT_WS(). This checksum was used to identify if rows were to be considered identical or not *as regards important columns* by an application. 

We found a workaround but I don't rememeber what that was now either.
[11 Mar 2015 12:42] MySQL Verification Team
Thank you for the report.
Observed similar behavior on 5.5.44, 5.6.25.

Thanks,
Umesh