Bug #34996 confusing collation issue for German rules by using LIKE/'='
Submitted: 3 Mar 2008 12:30 Modified: 7 Mar 2008 13:58
Reporter: Susanne Ebrecht Email Updates:
Status: Won't fix Impact on me:
None 
Category:MySQL Server: Charsets Severity:S4 (Feature request)
Version:5.0 OS:Any
Assigned to: CPU Architecture:Any

[3 Mar 2008 12:30] Susanne Ebrecht
Description:
Using latin1_german2_ci occurs different results for 'ü' and 'ue' by using '=' or using LIKE.

mysql> select text from t where text like 'mueller'\G
*************************** 1. row ***************************
text: mueller

mysql> select text from t where text='mueller'\G
*************************** 1. row ***************************
text: mueller
*************************** 2. row ***************************
text: müller

Vice versa:

mysql> select * from t where text like 'müller'\G
*************************** 1. row ***************************
  id: 2
text: müller

mysql> select * from t where text='müller'\G
*************************** 1. row ***************************
  id: 1
text: mueller
*************************** 2. row ***************************
  id: 2
text: müller

How to repeat:
Use Iso-8859-15 as environment encoding and Latin1 as client encoding:

mysql> create table t(text text character set latin1 collate latin1_german2_ci);

mysql>insert into t values('mueller'),('müller');

mysql>select length(text) from t;

Result should be 7 and 6.

Then do the selects from the description section here.
[4 Mar 2008 9:48] Susanne Ebrecht
That LIKE doesn't behave intuitive for German searching is documented here:

http://dev.mysql.com/doc/refman/5.0/en/string-comparison-functions.html#operator_like
[7 Mar 2008 13:58] Susanne Ebrecht
Ok, this won't fix because it's given at the Standard SQL 2003:

part 2 foundations, section 8.5 <like predicate>

1) Let MCV be the value of MC and let PCV be the value of PC. If EC is specified, then let ECV be its value.
2) Let MBV be the value of MB and let PBV be the value of PB. If EB is specified, then let EBV be its value.
3) If <character like predicate> is specified, then:
...
c) Case:
i) If MCV and PCV are character strings whose lengths are variable and if the lengths of both MCV and PCV are 0 (zero), then
MC LIKE PC
is True.
ii) The <predicate>
MC LIKE PC
is True if there exists a partitioning of MCV into substrings such that:
1) A substring of MCV is a sequence of 0 (zero) or more contiguous characters of MCV and
each character of MCV is part of exactly one substring.
2) If the i-th substring of PCV is an arbitrary character specifier, then the i-th substring of MCV is any single character.
ISO/IEC 9075-2:2003 (E)
8.5 <like predicate>
©ISO/IEC 2003 – All rights reserved Predicates 387
3) If the i-th substring of PCV is an arbitrary string specifier, then the i-th substring of MCV is any sequence of 0 (zero) or more characters.
4) If the i-th substring of PCV is a single character specifier, then the i-th substring of MCV
contains exactly 1 (one) character that is equal to the character represented by the single
character specifier according to the collation of the <like predicate>.
5) The number of substrings of MCV is equal to the number of substring specifiers of PCV.

According to note 4 this means: when you use LIKE only one single character will taken for comparing.