Bug #57884 | SELECT .. LIKE .. COLLATE etc. | ||
---|---|---|---|
Submitted: | 1 Nov 2010 9:00 | Modified: | 9 Nov 2010 21:26 |
Reporter: | Peter Laursen (Basic Quality Contributor) | Email Updates: | |
Status: | Not a Bug | Impact on me: | |
Category: | MySQL Server: Charsets | Severity: | S3 (Non-critical) |
Version: | 4.1 + | OS: | Any |
Assigned to: | CPU Architecture: | Any | |
Tags: | qc |
[1 Nov 2010 9:00]
Peter Laursen
[1 Nov 2010 9:10]
Peter Laursen
same with the variants SELECT ('a' COLLATE utf8_danish_ci) = ('å' COLLATE utf8_danish_ci); -- '0' SELECT (SELECT 'a' COLLATE utf8_danish_ci) = (SELECT 'å' COLLATE utf8_danish_ci); -- 0 -- however SELECT STRCMP((SELECT 'a' COLLATE utf8_danish_ci),(SELECT 'å' COLLATE utf8_danish_ci)); -- (-1) SHOW WARNINGS; -- empty set
[1 Nov 2010 9:11]
Peter Laursen
sorry my last post should have read like same with the variants SELECT ('a' COLLATE utf8_danish_ci) = ('å' COLLATE utf8_danish_ci); -- '0' SELECT (SELECT 'a' COLLATE utf8_danish_ci) = (SELECT 'å' COLLATE utf8_danish_ci); -- '0' SHOW WARNINGS; -- empty set -- however SELECT STRCMP((SELECT 'a' COLLATE utf8_danish_ci),(SELECT 'å' COLLATE utf8_danish_ci)); -- (-1)
[1 Nov 2010 10:04]
Peter Laursen
I will summarize: SET NAMES utf8; SELECT 'a' = 'å' COLLATE utf8_danish_ci; -- returns 0. I would expect 'illegal mix of collations' SELECT 'a' COLLATE utf8_danish_ci = 'å'; -- returns '0'. I would expect 'illegal mix of collations' SELECT ('a' COLLATE utf8_danish_ci) = ('å' COLLATE utf8_danish_ci); -- '0' SELECT (SELECT 'a' COLLATE utf8_danish_ci) = (SELECT 'å' COLLATE utf8_danish_ci); -- 0 -- here is no 'illegal mix of collation - everything is utf8_danish_ci -- here I would expect !0 for both .. but looks a COLLATE clause here is simply ignored (with no warning/error).
[3 Nov 2010 17:23]
Peter Gulutzan
The page http://dev.mysql.com/doc/refman/5.0/en/charset-collate.html does not say that LIKE and COLLATE are illegal, in fact it gives an example with LIKE and COLLATE. As for comparing 'a' with 'å', it's okay, they're comparable.
[3 Nov 2010 17:29]
Peter Laursen
Why does it not report an error or a warning then?
[3 Nov 2010 17:32]
Peter Laursen
The example With WHERE: SELECT * FROM t1 WHERE _latin1 'Müller' COLLATE latin1_german2_ci = k; SELECT * FROM t1 WHERE k LIKE _latin1 'Müller' COLLATE latin1_german2_ci; 'mismatches' WHERE and LIKE into one. It is not clear what COLLATE .. LIKE .. COLLATE does. It looks like it does nothing.
[3 Nov 2010 17:48]
Peter Laursen
This is accepted (my mistake)! SELECT ('aa' COLLATE utf8_danish_ci) = ('å' COLLATE utf8_danish_ci); -- '1' this is too: SELECT ('aa' COLLATE utf8_unicode_ci) = ('å' COLLATE utf8_danish_ci); -- -- Illegal mix of collations but what happens here: SELECT ('a') = ('å' COLLATE utf8_danish_ci); -- '0' SELECT ('aa') = ('å' COLLATE utf8_danish_ci); -- '1' 'a' and 'aa' are utf8_general_ci. Collation coercibility of 'å' is highest? That would be OK - but not fully explained. And I would expect a warning here anyway. PG: one advice: please try to be a little more explanatory and polite when communicating with people. I do not write here in order to bother you personally, but in order to improve the server and the documentation (sorry but I found no option to make this private).
[3 Nov 2010 20:38]
Sveta Smirnova
Thank you for the feedback. But LIKE is fully supported and I see no reason for warning here. Page http://dev.mysql.com/doc/refman/5.0/en/charset-collate.html is about part of queries, but not about comparision operators. So closing as not a bug.
[9 Nov 2010 20:58]
Alexander Barkov
> but what happens here: > > SELECT ('a') = ('å' COLLATE utf8_danish_ci); -- '0' > SELECT ('aa') = ('å' COLLATE utf8_danish_ci); -- '1' It compares according to utf8_danish_ci. When aggregating operands with different collations, MySQL uses the SQL Standard approach of so called "collation coercibility" (or "collation derivation" in newer SQL Standard versions). This article briefly explains how it works: http://mysql.mirror.tw/doc/refman/5.5/en/charset-collate-tricky.html What happens here is: The right side with the explicit COLLATE clause is stronger than a string literal on the left side, so the collation of the right side is used. > > 'a' and 'aa' are utf8_general_ci. Collation coercibility of 'å' is highest? That would > be OK - but not fully explained. And I would expect a warning here anyway. This is a standard-comliant behaviour. This is what the COLLATE clause is for here: to change collation used for the '=' operation. No warning is needed.
[9 Nov 2010 21:26]
Peter Laursen
Yes .. I have understood this 'collation coersibility' now. Thanks anyway!