Bug #6019 SELECT tries to use too short prefix index on utf8 data
Submitted: 11 Oct 2004 12:03 Modified: 29 Oct 2004 8:00
Reporter: Marko Mäkelä Email Updates:
Status: Closed Impact on me:
Category:MySQL Server Severity:S1 (Critical)
Version:4.1.6-bk OS:Linux (GNU/Linux)
Assigned to: Sergei Golubchik CPU Architecture:Any

[11 Oct 2004 12:03] Marko Mäkelä
MySQL is using a 3-character prefix index in a prefix match (LIKE 'foob%'), although the length of the search string exceeds the length of the prefix index. As a result, it will not find any records.

How to repeat:
create table foo(a int primary key,b varchar(6),index b3(b(3))) engine=innodb default character set=utf8;
insert into foo values(1,'foo'),(2,'foobar');
select * from foo where b like 'foob%';

MySQL will report "Empty set", although it should find the record (2,'foobar').

Suggested fix:
Blank the excessive characters in the LIKE prefix with NUL characters and scan the resultset against the original LIKE prefix query.
[11 Oct 2004 12:16] Marko Mäkelä
Whoever fixes this bug should also consider adding an extra 0xff byte to the upper limit of the search area. Otherwise, if there are 3-byte UTF8 characters that are equivalent with 1-byte or 2-byte UTF8 characters, the search results may be inconsistent.

See the function my_strnncoll_utf8(). There, the end of string is treated as the 'infimum' character and 0xff (an illegal UTF8 character) is treated as the 'supremum'. Thus, "foo" < "foobar" but "foo\xff" > "foobar". Now consider the case where "foo" consists of 3-byte UTF8 characters. It is best to pad the upper limit of the LIKE prefix search with an extra 0xff byte, i.e., "foo\xff".
[11 Oct 2004 19:03] Victoria Reznichenko
Verified with 4.1.6-gamma-debug-log
[15 Oct 2004 10:05] Alexander Barkov
I fixed the bug in MySQL code, so InnoDB now works fine.

But there is still a problem with BDB, I'm continueing working on it.
[29 Oct 2004 8:00] Sergei Golubchik
Thank you for your bug report. This issue has been committed to our
source repository of that product and will be incorporated into the
next release.

If necessary, you can access the source repository and build the latest
available version, including the bugfix, yourself. More information 
about accessing the source trees is available at

Additional info:

fixed in 4.1.7