Bug #14716 'oe' does not match œ in utf8_unicode_ci
Submitted: 7 Nov 2005 19:11 Modified: 6 Dec 2005 21:08
Reporter: M K Email Updates:
Status: Not a Bug Impact on me:
None 
Category:MySQL Server Severity:S3 (Non-critical)
Version:4.1.15 OS:Windows (Windows XP)
Assigned to: Vasily Kishkin CPU Architecture:Any

[7 Nov 2005 19:11] M K
Description:
According to the documentation:
The most significant feature in utf8_unicode_ci is that it supports expansions, that is, when one character compares as equal to combinations of other characters. For example, in German and some other languages ‘ß’ is equal to ‘ss’. 

I believe that this search:
     SELECT * from anytable1 where word = 'oeuf';

should match a row with an entry of word = 'œuf'.
Same thing for other ligatures like 'æ' and the uppercase versions of both.

Thanks.

How to repeat:
Create a row (collation uft8_unicode_ci) in a table with the value 'œuf' in it.  (That means 'egg' in French.)

Run a query similar to this one:
     SELECT * from anytable1 where word = 'oeuf';

and you will not get a match.

Suggested fix:
Update the uft8_unicode_ci collation to make this match.
[9 Nov 2005 11:43] Vasily Kishkin
Could you please write here table definition for example?
[9 Nov 2005 15:47] M K
#
# Warning!  Do not use MySQL Command Line Client (at least in Windows) for this.  Use MySQL Query Browser to correctly INSERT  the œ ligature.
#

CREATE TABLE egg (
  `col1` VARCHAR(255) CHARACTER SET utf8 COLLATE utf8_unicode_ci DEFAULT ''
)
ENGINE = MYISAM;

INSERT egg SET col1='oeuf';
INSERT egg SET col1='œuf';

# In short, these two queries should return the same results:
SELECT * FROM egg WHERE col1='oeuf';
SELECT * FROM egg WHERE col1 LIKE 'oeuf';

# More relevant queries:
# Equals (=) works fine (return both 'oeuf' and ''œuf''):
SELECT * FROM egg WHERE col1='oeuf';
SELECT * FROM egg WHERE col1='œuf';

# But LIKE and RLIKE only return one of the two:
SELECT * FROM egg WHERE col1 LIKE 'œuf';
SELECT * FROM egg WHERE col1 LIKE 'oeuf';
SELECT * FROM egg WHERE col1 RLIKE 'œuf';
SELECT * FROM egg WHERE col1 RLIKE 'oeuf';
SELECT * FROM egg WHERE col1 LIKE '%œ%';
SELECT * FROM egg WHERE col1 LIKE '%oe%';

# Each of the above only return one of the two lines.  They should all return two lines.  
# LIKE 'oeuf' should return the same as = 'oeuf'

# There might be a second bug -  the inability of INSERT egg SET col1='œuf'; to correctly insert binary C593 
# for œ MySQL Command Line Client in Windows.  This maybe a Windows System bug though.
[17 Nov 2005 8:08] Vasily Kishkin
Thanks for bug reprot. I was able to reproduce the bug on Windows. "Like" returns only one row.

SELECT * FROM egg WHERE col1='oeuf';

'oeuf'
'œuf'

SELECT * FROM egg WHERE col1 LIKE 'oeuf';

'oeuf'

It was tested in MySQL Query Browser.
[21 Nov 2005 19:32] M K
Strangely, this bug seems similar to a (fixed) bug that I had reported previously:
http://bugs.mysql.com/bug.php?id=7878

Maybe a test could be created to always test if '=' and LIKE produce the same results when accented or other special charaters are involved.

Thanks.
[24 Nov 2005 10:07] Alexander Barkov
Dear Michael,
This is expected behaviour, conformant to the SQL standard,
which requires one-to-one character comparison in LIKE operator.
Expansions is the case when LIKE works differently comparing to equals (=) operator:
- During equals, "OE ligature" is compared to "O+E",
- while during LIKE, "OE ligature" is compared only to "O",
  and then the next character after "OE ligature" is compared to "E".

Bug#7878 was really a bug. Accented letters are not expansions,
they are single letters. Thus, they are processed the same way LIKE and =.
[6 Dec 2005 21:08] M K
Thank you, but after thinking about it for a while, I still have a lot of trouble with your answer.

You are basically saying that it is OK for '=' and LIKE to have different set of matching rules for certain characters.  I really think that people are going to be very confused with functionality like this.

You are the experts, but I also feel that this should be put in the category of "won't fix" instead of "not a bug".
[6 Dec 2005 21:31] Paul DuBois
This behavior is according to standard SQL, and it is
documented here:

http://dev.mysql.com/doc/refman/5.0/en/string-comparison-functions.html