Bug #26057 Falcon: UTF8 searches for accented character fail if index exists
Submitted: 4 Feb 2007 0:20
Modified: 20 May 2007 5:37
Reporter: Peter Gulutzan
Status: Closed
Category: MySQL Server: Falcon storage engine
Severity: S3 (Non-critical)
Version:
OS: Linux (SUSE 10.0 / 64-bit)
Assigned to: Kevin Lewis
CPU Architecture: Any

[4 Feb 2007 0:20] Peter Gulutzan
Description:
I create an indexed table with one UTF8 column.
I insert 'â' (a with circumflex).
I try to search, using "= 'a'", "> 'a'", or "< 'a'".
No results.

There is a slight similarity to my earlier bug reports,
Bug#22179 UCS2 searches fail if index exists
Bug#22180 Case-insensitive searches fail if index exists
which are now closed.
But the collation is not exotic, so there is no similarity to
Bug#23689 Falcon: searches fail if exotic collation and index exists

ChangeSet@1.2417.1.1, 2007-02-01 18:19:26-05:00

How to repeat:
mysql> set names utf8;
Query OK, 0 rows affected (0.00 sec)

mysql> create table tuf8 (s1 varchar(5) character set utf8) engine=falcon;
Query OK, 0 rows affected (0.00 sec)

mysql> insert into tuf8 values ('â');
Query OK, 1 row affected (0.00 sec)

mysql> select * from tuf8 where s1 = 'a';
+------+
| s1   |
+------+
| â    |
+------+
1 row in set (0.00 sec)

mysql> create index ituf8 on tuf8 (s1);
Query OK, 1 row affected (0.03 sec)
Records: 1  Duplicates: 0  Warnings: 0

mysql> select * from tuf8 where s1 = 'a';
Empty set (0.00 sec)

mysql> select * from tuf8 where s1 < 'a';
Empty set (0.00 sec)

mysql> select * from tuf8 where s1 > 'a';
Empty set (0.00 sec)

mysql> select s1, hex(s1) from tuf8;
+------+---------+
| s1   | hex(s1) |
+------+---------+
| â    | C3A2    |
+------+---------+
1 row in set (0.00 sec)
[5 Feb 2007 14:13] MySQL Verification Team
Thank you for the bug report. Verified as described.
[13 Feb 2007 11:53] Hakan Küçükyılmaz
Added test case falcon_bug_26057.test to mysql-5.1-falcon tree.

Regards, Hakan
[9 May 2007 19:07] Kevin Lewis
Fixed with recent code changes for handling MySQL character sets.
This test shows the problems that arise when a multibyte character has the same weight as a single-byte character.
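
For illustration only, here is a minimal Python sketch of how this kind of mismatch can arise. It is not Falcon's code, and the thread does not describe the actual fix. The helper `approx_weight` is hypothetical: it only approximates an accent- and case-insensitive collation by stripping combining marks and folding case. The assumption is that a key built from raw bytes cannot match 'a' against 'â' (C3A2), while a key built from collation weights can.

```python
import unicodedata


def approx_weight(s: str) -> str:
    """Hypothetical stand-in for a collation sort key: strip accents, fold case."""
    decomposed = unicodedata.normalize("NFD", s)
    stripped = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return stripped.upper()


stored = "\u00e2"  # 'â', encoded in UTF-8 as the two bytes C3 A2
probe = "a"        # encoded in UTF-8 as the single byte 61

# Index keyed on raw bytes: the two encodings differ, so the probe misses.
byte_index = {stored.encode("utf-8"): stored}
byte_hit = byte_index.get(probe.encode("utf-8"))

# Index keyed on weights: both values map to the same key, so the probe hits.
weight_index = {approx_weight(stored): stored}
weight_hit = weight_index.get(approx_weight(probe))

print("bytes of stored value:", stored.encode("utf-8").hex().upper())
print("byte-keyed lookup found a row:", byte_hit is not None)
print("weight-keyed lookup found a row:", weight_hit is not None)
```

In this sketch the unindexed comparison corresponds to comparing weights directly, which is why the first SELECT in the report finds the row while a lookup through a byte-keyed structure would not.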
[14 May 2007 17:19] Hakan Küçükyılmaz
Test case falcon_bug_26057 passes now:

TEST                           RESULT         TIME (ms)
-------------------------------------------------------

falcon_bug_26057               [ pass ]             92
-------------------------------------------------------
Stopping All Servers
All 1 tests were successful.
The servers were restarted 1 times
Spent 0.092 seconds actually executing testcases
[20 May 2007 5:37] MC Brown
A note has been added to the 6.0.1 changelog.