Bug #73858 Can't use COLLATE
Submitted: 9 Sep 2014 13:00 Modified: 6 Oct 2015 17:48
Reporter: Peter Laursen (Basic Quality Contributor) Email Updates:
Status: Closed Impact on me:
None 
Category:MySQL Server: Charsets Severity:S3 (Non-critical)
Version:5.6.12, 5.5.42, 5.6.23, 5.7.6 OS:Any
Assigned to: CPU Architecture:Any

[9 Sep 2014 13:00] Peter Laursen
Description:
Using a SELECT statement querying a single column of a single table does not allow using COLLATE specifying a collation that is vaid for data stored in the column and queried (and the defalt charset of the table as well). 

This rudimentary documentaton does not explain: http://dev.mysql.com/doc/refman/5.6/en/charset-collate.html

(and besides there is also an oddity with the  parser and the DESC keyword in such queries)

How to repeat:
SELECT VERSION(); -- 5.6.12-log (on Amazom hosted CentOS server)

USE `portal`;

SHOW  VARIABLES LIKE '%char%';
/*Variable_name             Value                                      
------------------------  -------------------------------------------
character_set_client      utf8                                       
character_set_connection  utf8                                       
character_set_database    latin1                                     
character_set_filesystem  binary                                     
character_set_results     utf8                                       
character_set_server      latin1                                     
character_set_system      utf8                                       
character_sets_dir        /rdsdbbin/mysql-5.6.12.R1/share/charsets/  
*/

SHOW CREATE TABLE `order_details`;
/* CREATE TABLE `order_details` (
  `order_id` int(10) unsigned NOT NULL AUTO_INCREMENT COMMENT 'PK',
  `txn_id` varchar(50) NOT NULL COMMENT 'Transaction to which this order belongs',
.. #cut here
) ENGINE=InnoDB AUTO_INCREMENT=43943 DEFAULT CHARSET=utf8 ROW_FORMAT=DYNAMIC COMMENT='This table stores the details of each order.'
*/

SELECT txn_id FROM `order_details` ORDER BY `order_id` LIMIT 100;  -- success

SELECT txn_id FROM `order_details` ORDER BY `order_id` COLLATE utf8_bin LIMIT 100;
-- Error Code: 1253
-- COLLATION 'utf8_bin' is not valid for CHARACTER SET 'latin1'

SELECT txn_id FROM `order_details` ORDER BY `order_id` COLLATE utf8_unicode_ci LIMIT 100;
-- Error Code: 1253
-- COLLATION 'utf8_unicode_ci' is not valid for CHARACTER SET 'latin1'

-- and here is an *oddity* with the DESC keyword when using COLLATE . This query should return the same as the 2 above IMO.
SELECT txn_id FROM `order_details` ORDER BY `order_id` DESC COLLATE utf8_bin LIMIT 100;
-- Error Code: 1064
-- You have an error in your SQL syntax; check the manual that corresponds to your MySQL server version for the right syntax to use near 'COLLATE utf8_bin LIMIT 100' at line 4

Suggested fix:
Probably it happens because of character_set_database being "latin1"? But this makes no sense to me.  Data are utf8.
[15 Sep 2014 16:23] Hartmut Holzgraefe
Syntax is 

  ORDER BY ... COLLATE ... DESC

not 

  ORDER BY ... DESC COLLATE ...

That explains the different error on the last statement, not the latin1/utf8 issue though ...

SELECT txn_id FROM `order_details` ORDER BY `order_id` COLLATE utf8_bin DESC LIMIT 100;
ERROR 1253 (42000): COLLATION 'utf8_bin' is not valid for CHARACTER SET 'latin1'
[15 Sep 2014 17:18] Peter Laursen
Thanks to Hartmut.

With hindsight it is logical, actually, that COLLATE must happen before SORT/ORDER BY.
[27 Jan 2015 21:11] Sveta Smirnova
Thank you for the report.

Verified as described. We should either allow to convert integer values to any character sets/table character sets or, at least, document these conversion rules for integers, converted to strings.
[6 Oct 2015 17:48] Paul DuBois
Noted in 5.7.10, 5.8.0 changelogs.

For constructs such as ORDER BY numeric_expr COLLATE collation_name,
the character set of the expression was treated as latin1, which
resulted in an error if the collation specified after COLLATE is
incompatible with latin1. Now when a numeric expression is implicitly
cast to a character expression in the presence of COLLATE, the
character set used is the one associated with the named collation.