MySQL Bugs: #10253: compound index length and utf8 char set produces invalid query results

Bug #10253	compound index length and utf8 char set produces invalid query results
Submitted:	29 Apr 2005 1:23	Modified:	23 Jun 2005 10:02
Reporter:	Clint Byrum	Email Updates:
Status:	Closed	Impact on me:	None
Category:	MySQL Server	Severity:	S2 (Serious)
Version:	4.1.11	OS:	Linux (Linux)
Assigned to:	Alexander Barkov	CPU Architecture:	Any

Description:
I have a reproducable bug in MySQL 4.1.11 and 5.0.2 where a strange combination of data and index length is producing erroneous results. This only happens with a very specific set of circumstances, where there is a compound index in which the field you're querying on is limited to a number of characters. This only seems to happen after data is moved using INSERT ... SELECT;

I have placed a data file containing the data that causes this bug at a URL which I will give to the developers privately. It is copyrighted data, and so cannot be made publicly available. The script to reproduce, using that file, is below. I am confident that this will happen any time one imports data into a utf-8 table and then selects out of it.

Here are the results from running the code below against the MySQL Max 4.1.11 binaries from mysql.com on RedHat Linux 8.0. They are identical to tests performed Debian Sarge using MySQL 5.0.2.

CitiesUsingEquals
0
id      city    country_abbreviation    latitude        longitude
28      Durban  IR      27.05000000     62.21666670
33      Durban  SF      -29.85000000    31.01666670
38      Durban  FR      43.53333330     0.60000000
43      Durban  FR      43.01666670     1.35000000
48      Durban  FR      43.00000000     2.81666670
CitiesUsingLike
5
id      city    country_abbreviation    latitude        longitude
28      Durban  IR      27.05000000     62.21666670
33      Durban  SF      -29.85000000    31.01666670
38      Durban  FR      43.53333330     0.60000000
43      Durban  FR      43.01666670     1.35000000
48      Durban  FR      43.00000000     2.81666670
CitiesUsingEqualsNoId
5
id      city    country_abbreviation    latitude        longitude
28      Durban  IR      27.05000000     62.21666670
33      Durban  SF      -29.85000000    31.01666670
38      Durban  FR      43.53333330     0.60000000
43      Durban  FR      43.01666670     1.35000000
48      Durban  FR      43.00000000     2.81666670
CitiesUsingLikeNoId
5

How to repeat:
drop table if exists FeatureBrief;
create table FeatureBrief (
 Feature_int_ID int,
 FeatureType_str_code varchar(5),
 Admin1_str_code        varchar(5),
 Country_str_code varchar(2),
 Feature_int_font_code int,
 Feature_str_name varchar(120),
 Feature_str_local_name varchar(120),
 Feature_dec_lat        numeric(12,7),
 Feature_dec_lon        numeric(12,7),
 Feature_str_lat        varchar(12),
 Feature_str_lon        varchar(12),
 Feature_dec_source_lat numeric(12,7),
 Feature_dec_source_lon numeric(12,7),
 Feature_str_source_lat varchar(12),
 Feature_str_source_lon varchar(12)
) type=myisam character set utf8;

load data local infile 'Feature.brief.txt' into table FeatureBrief fields terminated by '\t' lines terminated by '\r\n';

create index FeatureBrief_Feature_str_name_index on FeatureBrief(Feature_str_name(7));

drop table if exists citiesBrief;
create table citiesBrief (
 id int unsigned auto_increment primary key,
 city varchar(100),
 country_abbreviation char(3),
 latitude decimal(16,8),
 longitude decimal(16,8)
) type=myisam character set=utf8;

-- Note that there is no extended character information being selected
-- in this insert.
insert into citiesBrief
 select
 null,
 F.Feature_str_name,
 F.Country_str_code,
 F.Feature_dec_lat,
 F.Feature_dec_lon
 from FeatureBrief F;

-- This index produces the problem. Notice that 0 rows are selected on the first query using =
create index citiesBrief_city_id_index on citiesBrief(city(7),id);

select * from citiesBrief where city = 'Durban';
select FOUND_ROWS() as CitiesUsingEquals;
-- These show that the problem does not manifest on the source table
--select * from FeatureBrief where Feature_str_name = 'Durban';
--select FOUND_ROWS() as FeatureUsingEquals;
select * from citiesBrief where city like 'Durban';
select FOUND_ROWS() as CitiesUsingLike;
--select * from FeatureBrief where Feature_str_name like 'Durban';
--select FOUND_ROWS() as FeatureUsingLike;

-- recreating the index with just the city field, the problem goes away
alter table citiesBrief drop index citiesBrief_city_id_index;
create index citiesBrief_city_id_index on citiesBrief(city(7));

select * from citiesBrief where city = 'Durban';
select FOUND_ROWS() as CitiesUsingEqualsNoId;
select * from citiesBrief where city like 'Durban';
select FOUND_ROWS() as CitiesUsingLikeNoId;

Suggested fix:
Whatever is going wrong here, the equals query should not fail where the like query does not! The data is _identical_ in all 5 resulting columns, so this clearly looks like a problem with the indexing code.

More info, this does not occurr on MySQL 4.0.25 in Debian sarge.

Forgot to mention: verified with BK source server on Slackware 10.0

A simplified sql script reproducing this problem:

drop table if exists t1;
create table t1
(
 id int not null,
 city varchar(20) not null,
 key (city(7),id)
) type=myisam character set=utf8;
insert into t1 values (1,'Durban North');
insert into t1 values (2,'Durban');
select * from t1 where city = 'Durban';

A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/internals/25678

Fixed in 4.1.13.

Thank you for your bug report. This issue has been committed to our
source repository of that product and will be incorporated into the
next release.

If necessary, you can access the source repository and build the latest
available version, including the bugfix, yourself. More information 
about accessing the source trees is available at
    http://www.mysql.com/doc/en/Installing_source_tree.html

Additional info:

Documented in 4.1.13 changelog; closed.