Bug #24921 LIKE searches fail if index on utf8 column
Submitted: 8 Dec 2006 17:30 Modified: 20 May 2007 5:37
Reporter: Peter Gulutzan Email Updates:
Status: Closed Impact on me:
None 
Category:MySQL Server: Falcon storage engine Severity:S2 (Serious)
Version:5.1.13-falcon-alpha-debug OS:Linux (SUSE 10.0 / 64-bit)
Assigned to: Kevin Lewis CPU Architecture:Any

[8 Dec 2006 17:30] Peter Gulutzan
Description:
I create a table with an indexed UTF8 column.
The collation is not utf8_general_ci (with utf8_general_ci there's no problem).
I select with LIKE.
The returned rows don't include rows where I expected '%' to match ' '.

In a discussion on dev-jstar thread
"RE: FW: #24858 [Com,Doc->NDI]: Falcon: searches fail if partialindex
on utf8 column"
Kevin Lewis suggested:
"I think this is associated with number 1 above [i.e. a test case for
bug#20447 Problem with prefix keys with contractions and expansions],
where the code does not count up the actual number of bytes used in a
variable multibyte string.  It has an algorithm to get the compare
length that is independent of what is in the string."

Nevertheless I report it as if it's a Falcon bug.

How to repeat:
mysql> create table t4 (s1 char(4) character set utf8 collate
    -> utf8_turkish_ci, key(s1)) engine=falcon;
Query OK, 0 rows affected (0.01 sec)

mysql> insert into t4 values ('abc'),('abc '),('abcd');
Query OK, 3 rows affected (0.00 sec)
Records: 3  Duplicates: 0  Warnings: 0

mysql> select count(*) from t4 where s1 like 'abc%';
+----------+
| count(*) |
+----------+
|        1 |
+----------+
1 row in set (0.00 sec)

mysql> alter table t4 engine=myisam;
Query OK, 3 rows affected (0.07 sec)
Records: 3  Duplicates: 0  Warnings: 0

mysql> select count(*) from t4 where s1 like 'abc%';
+----------+
| count(*) |
+----------+
|        3 |
+----------+
1 row in set (0.00 sec)
[9 Dec 2006 1:12] MySQL Verification Team
Thank you for the bug report. Verified as described:

Welcome to the MySQL monitor.  Commands end with ; or \g.
Your MySQL connection id is 1
Server version: 5.1.13-falcon-alpha-debug Source distribution

Type 'help;' or '\h' for help. Type '\c' to clear the buffer.

mysql> use test;
Database changed
mysql> create table t4 (s1 char(4) character set utf8 collate
    -> utf8_turkish_ci, key(s1)) engine=falcon;
Query OK, 0 rows affected (0.49 sec)

mysql> insert into t4 values ('abc'),('abc '),('abcd');
Query OK, 3 rows affected (0.05 sec)
Records: 3  Duplicates: 0  Warnings: 0

mysql> select count(*) from t4 where s1 like 'abc%';
+----------+
| count(*) |
+----------+
|        1 | 
+----------+
1 row in set (0.01 sec)

mysql> alter table t4 engine=myisam;
Query OK, 3 rows affected (0.28 sec)
Records: 3  Duplicates: 0  Warnings: 0

mysql> select count(*) from t4 where s1 like 'abc%';
+----------+
| count(*) |
+----------+
|        3 | 
+----------+
1 row in set (0.00 sec)
[10 Dec 2006 19:30] Hakan Küçükyılmaz
Added test case falcon_bug_24921.test to 5.1-falcon tree.

Regards, Hakan
[9 May 2007 19:05] Kevin Lewis
Fixed with recent code changes for handling MySQL character sets.
The problem was that some character sets pad the minimum search string on a 'like' search with something other than zero.  So falcon now makes the minimum search character so that it compares like nulls.
[14 May 2007 17:03] Hakan Küçükyılmaz
test case falcon_bug_24921 passes now:

TEST                           RESULT         TIME (ms)
-------------------------------------------------------

falcon_bug_24921               [ pass ]             86
-------------------------------------------------------
Stopping All Servers
All 1 tests were successful.
The servers were restarted 1 times
Spent 0.086 seconds actually executing testcases
[20 May 2007 5:37] MC Brown
A note has been added to the 6.0.1 changelog.