| Bug #57884 | SELECT .. LIKE .. COLLATE etc. | ||
|---|---|---|---|
| Submitted: | 1 Nov 2010 9:00 | Modified: | 9 Nov 2010 21:26 | 
| Reporter: | Peter Laursen (Basic Quality Contributor) | Email Updates: | |
| Status: | Not a Bug | Impact on me: | |
| Category: | MySQL Server: Charsets | Severity: | S3 (Non-critical) | 
| Version: | 4.1 + | OS: | Any | 
| Assigned to: | CPU Architecture: | Any | |
| Tags: | qc | ||
   [1 Nov 2010 9:00]
   Peter Laursen        
  
 
   [1 Nov 2010 9:10]
   Peter Laursen        
  same with the variants
SELECT ('a' COLLATE utf8_danish_ci) = ('å' COLLATE utf8_danish_ci); -- '0'
SELECT (SELECT 'a' COLLATE utf8_danish_ci) = (SELECT 'å' COLLATE utf8_danish_ci); -- 0
-- however
SELECT STRCMP((SELECT 'a' COLLATE utf8_danish_ci),(SELECT 'å' COLLATE utf8_danish_ci)); -- (-1)
SHOW WARNINGS; -- empty set
 
   [1 Nov 2010 9:11]
   Peter Laursen        
  sorry my last post should have read like
same with the variants
SELECT ('a' COLLATE utf8_danish_ci) = ('å' COLLATE utf8_danish_ci); -- '0'
SELECT (SELECT 'a' COLLATE utf8_danish_ci) = (SELECT 'å' COLLATE utf8_danish_ci); -- '0'
SHOW WARNINGS; -- empty set
-- however
SELECT STRCMP((SELECT 'a' COLLATE utf8_danish_ci),(SELECT 'å' COLLATE utf8_danish_ci)); -- (-1)
 
   [1 Nov 2010 10:04]
   Peter Laursen        
  I will summarize:
SET NAMES utf8; 
SELECT 'a' = 'å' COLLATE utf8_danish_ci; 
-- returns 0.  I would expect 'illegal mix of collations'
SELECT 'a' COLLATE utf8_danish_ci = 'å';
-- returns '0'.  I would expect 'illegal mix of collations'
SELECT ('a' COLLATE utf8_danish_ci) = ('å' COLLATE utf8_danish_ci); -- '0'
SELECT (SELECT 'a' COLLATE utf8_danish_ci) = (SELECT 'å' COLLATE utf8_danish_ci); -- 0
-- here is no 'illegal mix of collation - everything is utf8_danish_ci
-- here I would expect !0 for both
.. but looks a COLLATE clause here is simply ignored (with no warning/error).
 
   [3 Nov 2010 17:23]
   Peter Gulutzan        
  The page http://dev.mysql.com/doc/refman/5.0/en/charset-collate.html does not say that LIKE and COLLATE are illegal, in fact it gives an example with LIKE and COLLATE. As for comparing 'a' with 'å', it's okay, they're comparable.
   [3 Nov 2010 17:29]
   Peter Laursen        
  Why does it not report an error or a warning then?
   [3 Nov 2010 17:32]
   Peter Laursen        
  The example 
With WHERE: 
     SELECT *
     FROM t1
     WHERE _latin1 'Müller' COLLATE latin1_german2_ci = k;
     SELECT *
     FROM t1
     WHERE k LIKE _latin1 'Müller' COLLATE latin1_german2_ci;
'mismatches' WHERE and LIKE into one.
It is not clear what COLLATE .. LIKE .. COLLATE does.  It looks like it does nothing.
 
   [3 Nov 2010 17:48]
   Peter Laursen        
  This is accepted (my mistake)!
SELECT ('aa' COLLATE utf8_danish_ci) = ('å' COLLATE utf8_danish_ci); -- '1'
this is too:
SELECT ('aa' COLLATE utf8_unicode_ci) = ('å' COLLATE utf8_danish_ci); -- -- Illegal mix of collations
but what happens here: 
SELECT ('a') = ('å' COLLATE utf8_danish_ci); -- '0'
SELECT ('aa') = ('å' COLLATE utf8_danish_ci); -- '1'
'a' and 'aa' are utf8_general_ci.  Collation coercibility of 'å' is highest? That would be OK - but not fully explained.  And I would expect a warning here anyway. 
PG: one advice: please try to be a little more explanatory and polite when communicating with people.  I do not write here in order to bother you personally, but in order to improve the server and the documentation (sorry but I found no option to make this private).
 
   [3 Nov 2010 20:38]
   Sveta Smirnova        
  Thank you for the feedback. But LIKE is fully supported and I see no reason for warning here. Page http://dev.mysql.com/doc/refman/5.0/en/charset-collate.html is about part of queries, but not about comparision operators. So closing as not a bug.
   [9 Nov 2010 20:58]
   Alexander Barkov        
  > but what happens here: 
> 
> SELECT ('a') = ('å' COLLATE utf8_danish_ci); -- '0'
> SELECT ('aa') = ('å' COLLATE utf8_danish_ci); -- '1'
It compares according to utf8_danish_ci.
When aggregating operands with different collations,
MySQL uses the SQL Standard approach of so called "collation coercibility"
(or "collation derivation" in newer SQL Standard versions).
This article briefly explains how it works:
http://mysql.mirror.tw/doc/refman/5.5/en/charset-collate-tricky.html
What happens here is:
The right side with the explicit COLLATE clause is stronger
than a string literal on the left side, so the collation
of the right side is used.
> 
> 'a' and 'aa' are utf8_general_ci.  Collation coercibility of 'å' is highest? That would
> be OK - but not fully explained.  And I would expect a warning here anyway. 
This is a standard-comliant behaviour. This is what the COLLATE clause is
for here: to change collation used for the '=' operation.
No warning is needed.
 
   [9 Nov 2010 21:26]
   Peter Laursen        
  Yes .. I have understood this 'collation coersibility' now. Thanks anyway!
