Bug #57884 SELECT .. LIKE .. COLLATE etc.
Submitted: 1 Nov 2010 9:00 Modified: 9 Nov 2010 21:26
Reporter: Peter Laursen (Basic Quality Contributor) Email Updates:
Status: Not a Bug Impact on me:
None 
Category:MySQL Server: Charsets Severity:S3 (Non-critical)
Version:4.1 + OS:Any
Assigned to: CPU Architecture:Any
Tags: qc

[1 Nov 2010 9:00] Peter Laursen
Description:
According to http://dev.mysql.com/doc/refman/5.0/en/charset-collate.html the LIKE operator does no handle COLLATE (it is not listed as supported). Same applies to other simple string operators ('=', '<>') etc.  

But you may include a COLLATE clause without getting en error or even a warning,

How to repeat:
SET NAMES UTF8;
SELECT 'a' LIKE 'å' COLLATE utf8_danish_ci; -- '0'
SELECT 'a' = 'å' COLLATE utf8_danish_ci; -- '0'
SHOW WARNINGS; -- empty set

Suggested fix:
In preferred order: 

1) Support it. Handle COLLATE with string comparisons
2) Return an error
3) Return a warning
4) Document in above mentioned docs page
[1 Nov 2010 9:10] Peter Laursen
same with the variants

SELECT ('a' COLLATE utf8_danish_ci) = ('å' COLLATE utf8_danish_ci); -- '0'
SELECT (SELECT 'a' COLLATE utf8_danish_ci) = (SELECT 'å' COLLATE utf8_danish_ci); -- 0

-- however

SELECT STRCMP((SELECT 'a' COLLATE utf8_danish_ci),(SELECT 'å' COLLATE utf8_danish_ci)); -- (-1)
SHOW WARNINGS; -- empty set
[1 Nov 2010 9:11] Peter Laursen
sorry my last post should have read like

same with the variants

SELECT ('a' COLLATE utf8_danish_ci) = ('å' COLLATE utf8_danish_ci); -- '0'
SELECT (SELECT 'a' COLLATE utf8_danish_ci) = (SELECT 'å' COLLATE utf8_danish_ci); -- '0'
SHOW WARNINGS; -- empty set

-- however

SELECT STRCMP((SELECT 'a' COLLATE utf8_danish_ci),(SELECT 'å' COLLATE utf8_danish_ci)); -- (-1)
[1 Nov 2010 10:04] Peter Laursen
I will summarize:

SET NAMES utf8; 

SELECT 'a' = 'å' COLLATE utf8_danish_ci; 
-- returns 0.  I would expect 'illegal mix of collations'

SELECT 'a' COLLATE utf8_danish_ci = 'å';
-- returns '0'.  I would expect 'illegal mix of collations'

SELECT ('a' COLLATE utf8_danish_ci) = ('å' COLLATE utf8_danish_ci); -- '0'
SELECT (SELECT 'a' COLLATE utf8_danish_ci) = (SELECT 'å' COLLATE utf8_danish_ci); -- 0
-- here is no 'illegal mix of collation - everything is utf8_danish_ci
-- here I would expect !0 for both

.. but looks a COLLATE clause here is simply ignored (with no warning/error).
[3 Nov 2010 17:23] Peter Gulutzan
The page
http://dev.mysql.com/doc/refman/5.0/en/charset-collate.html
does not say that LIKE and COLLATE are illegal,
in fact it gives an example with LIKE and COLLATE.

As for comparing 'a' with 'å', it's okay, they're comparable.
[3 Nov 2010 17:29] Peter Laursen
Why does it not report an error or a warning then?
[3 Nov 2010 17:32] Peter Laursen
The example 

With WHERE: 
     SELECT *
     FROM t1
     WHERE _latin1 'Müller' COLLATE latin1_german2_ci = k;
     SELECT *
     FROM t1
     WHERE k LIKE _latin1 'Müller' COLLATE latin1_german2_ci;

'mismatches' WHERE and LIKE into one.

It is not clear what COLLATE .. LIKE .. COLLATE does.  It looks like it does nothing.
[3 Nov 2010 17:48] Peter Laursen
This is accepted (my mistake)!

SELECT ('aa' COLLATE utf8_danish_ci) = ('å' COLLATE utf8_danish_ci); -- '1'

this is too:

SELECT ('aa' COLLATE utf8_unicode_ci) = ('å' COLLATE utf8_danish_ci); -- -- Illegal mix of collations

but what happens here: 

SELECT ('a') = ('å' COLLATE utf8_danish_ci); -- '0'
SELECT ('aa') = ('å' COLLATE utf8_danish_ci); -- '1'

'a' and 'aa' are utf8_general_ci.  Collation coercibility of 'å' is highest? That would be OK - but not fully explained.  And I would expect a warning here anyway. 

PG: one advice: please try to be a little more explanatory and polite when communicating with people.  I do not write here in order to bother you personally, but in order to improve the server and the documentation (sorry but I found no option to make this private).
[3 Nov 2010 20:38] Sveta Smirnova
Thank you for the feedback.

But LIKE is fully supported and I see no reason for warning here. Page http://dev.mysql.com/doc/refman/5.0/en/charset-collate.html is about part of queries, but not about comparision operators. So closing as not a bug.
[9 Nov 2010 20:58] Alexander Barkov
> but what happens here: 
> 
> SELECT ('a') = ('å' COLLATE utf8_danish_ci); -- '0'
> SELECT ('aa') = ('å' COLLATE utf8_danish_ci); -- '1'

It compares according to utf8_danish_ci.
When aggregating operands with different collations,
MySQL uses the SQL Standard approach of so called "collation coercibility"
(or "collation derivation" in newer SQL Standard versions).

This article briefly explains how it works:
http://mysql.mirror.tw/doc/refman/5.5/en/charset-collate-tricky.html

What happens here is:
The right side with the explicit COLLATE clause is stronger
than a string literal on the left side, so the collation
of the right side is used.

> 
> 'a' and 'aa' are utf8_general_ci.  Collation coercibility of 'å' is highest? That would
> be OK - but not fully explained.  And I would expect a warning here anyway. 

This is a standard-comliant behaviour. This is what the COLLATE clause is
for here: to change collation used for the '=' operation.
No warning is needed.
[9 Nov 2010 21:26] Peter Laursen
Yes .. I have understood this 'collation coersibility' now.

Thanks anyway!