Bug #48585 Character sets: latin1_german2_ci search fails
Submitted: 5 Nov 2009 22:13 Modified: 5 Nov 2009 22:55
Reporter: Peter Gulutzan Email Updates:
Status: Verified Impact on me:
None 
Category:MySQL Server: Charsets Severity:S3 (Non-critical)
Version:5.1.41 OS:Linux (SUSE 11.1 64-bit)
Assigned to: Assigned Account CPU Architecture:Any

[5 Nov 2009 22:13] Peter Gulutzan
Description:
I create a table with an indexed latin1_german2_ci varchar(2) column.
I search for a value 'aeae' which should match two a-diaereses.
No result.

If I remove the index: all is well.

If I change the column length to varchar(4): all is well.

How to repeat:
drop table if exists t;
create table t (s1 varchar(2) character set latin1 collate latin1_german2_ci,
s2 varchar(2) character set latin1 collate latin1_german2_ci);
create index i on t (s1,s2);
insert into t values ('ää','ää');
select * from t where s1 = 'aeae';
[5 Nov 2009 22:55] MySQL Verification Team
Thank you for the bug report. Verified as described.
[5 Aug 2010 9:42] Alexander Barkov
The bug is reproducible with:

INNODB + latin1_german2_ci
INNODB + utf8_german2_ci

Everything works as expected for:
MYISAM + latin1_german2_ci
MYISAM + utf8_german2_ci
[5 Aug 2010 9:51] Alexander Barkov
The problem is reproducible with MYISAM if I insert two records:

drop table if exists t;
create table t (s1 varchar(2) character set latin1 collate latin1_german2_ci,
s2 varchar(2) character set latin1 collate latin1_german2_ci);
create index i on t (s1,s2);
insert into t values ('ää','ää');
insert into t values ('ää','ää');
select * from t where s1 = 'aeae';
[5 Aug 2010 10:51] Alexander Barkov
The same problem is repeatable with a single column:

set names utf8;
drop table if exists t;
create table t (s1 varchar(2) character set utf8 collate utf8_german2_ci) engine=myisam;
create index i on t (s1);
insert into t values ('ää'),('ää');
select * from t where s1 = 'aeae';

-> Empty set
[5 Aug 2010 11:20] Alexander Barkov
The problem happens on optimizer level.
The condition is in fact interpreted as:

WHERE s1='ae';

  instead of

WHERE s1='aeae';

I.e. the constant is cut to the column length, which is 2 characters.