Bug #6043 erratic searching for diacriticals in indexed MyISAM UTF-8 table
Submitted: 12 Oct 2004 13:23 Modified: 14 Oct 2004 9:18
Reporter: Name Withheld Email Updates:
Status: Closed Impact on me:
None 
Category:MySQL Server Severity:S2 (Serious)
Version:4.1.5 OS:MacOS (MacOS 10.2, Windows XP)
Assigned to: Alexander Barkov CPU Architecture:Any

[12 Oct 2004 13:23] Name Withheld
Description:
When using SELECT to search in indexed MyISAM UTF-8 tables, records containing diacriticals aren't always found.

This only occurs when such records are added after similar records which contain no diacriticals.  For example:  a record like "café" will be found correctly so long as it is added to the table before any record like "cafe".

This appears to be a UTF-8 index problem.  There's no problem for UCS-2 tables, and no problem when the table is unindexed.

Note:  this isn't the same issue as bug #5832.  We've reproduced it in a custom build of MySQL 4.1.5 (MacOS 10.2) which contains the patch to fix bug #5832.  (We've also reproduced it in the official build of MySQL Classic 4.1.5 for both MacOS 10.2 and Windows XP.)

How to repeat:
In the following test script, the third SELECT will return the wrong record:

--------------

set session character_set_client     = utf8;
set session character_set_connection = utf8;
set session character_set_results    = utf8;
set session character_set_server     = utf8;

DROP DATABASE IF EXISTS select_test;
CREATE DATABASE select_test DEFAULT CHARACTER SET utf8;
USE select_test;

CREATE TABLE terms (
    id int unsigned NOT NULL auto_increment,
    list_id smallint unsigned NOT NULL,
    term TEXT NOT NULL,
    PRIMARY KEY(id),
    INDEX(list_id, term(19))
) TYPE=MyISAM CHARSET=utf8;

INSERT INTO terms set list_id = 1, term = "testétest";
INSERT INTO terms set list_id = 1, term = "testetest";
INSERT INTO terms set list_id = 1, term = "testètest";

SELECT id, term FROM terms where (list_id = 1) AND (term = "testétest");
SELECT id, term FROM terms where (list_id = 1) AND (term = "testetest");
SELECT id, term FROM terms where (list_id = 1) AND (term = "testètest");

--------------

Here's the output from the three SELECT statements.  The third SELECT is clearly returning the wrong row.

(To see different kinds of problems, alter the script to change the order in which the three records are inserted.)

+----+------------+
| id | term       |
+----+------------+
|  1 | testétest |
+----+------------+
1 row in set (0.01 sec)

+----+------------+
| id | term       |
+----+------------+
|  1 | testétest |
|  2 | testetest  |
|  3 | testètest |
+----+------------+
3 rows in set (0.00 sec)

+----+------------+
| id | term       |
+----+------------+
|  1 | testétest |
+----+------------+
1 row in set (0.00 sec)
[12 Oct 2004 14:49] MySQL Verification Team
Tested also on Linux Slackware:

miguel@hegel:~/dbs/4.1$ bin/mysql -uroot test
Welcome to the MySQL monitor.  Commands end with ; or \g.
Your MySQL connection id is 1 to server version: 4.1.6-gamma-debug-log

Type 'help;' or '\h' for help. Type '\c' to clear the buffer.

mysql> DROP DATABASE IF EXISTS select_test;
Query OK, 0 rows affected, 1 warning (0.00 sec)

mysql> CREATE DATABASE select_test DEFAULT CHARACTER SET utf8;
Query OK, 1 row affected (0.00 sec)

mysql> USE select_test;
Database changed
mysql> 
mysql> CREATE TABLE terms (
    ->     id int unsigned NOT NULL auto_increment,
    ->     list_id smallint unsigned NOT NULL,
    ->     term TEXT NOT NULL,
    ->     PRIMARY KEY(id),
    ->     INDEX(list_id, term(19))
    -> ) TYPE=MyISAM CHARSET=utf8;
Query OK, 0 rows affected, 1 warning (0.00 sec)

mysql> 
mysql> INSERT INTO terms set list_id = 1, term = "testétest";
Query OK, 1 row affected (0.01 sec)

mysql> INSERT INTO terms set list_id = 1, term = "testetest";
Query OK, 1 row affected (0.00 sec)

mysql> INSERT INTO terms set list_id = 1, term = "testètest";
Query OK, 1 row affected (0.00 sec)

mysql> 
mysql> SELECT id, term FROM terms where (list_id = 1) AND (term = "testétest");
+----+-----------+
| id | term      |
+----+-----------+
|  1 | testétest |
+----+-----------+
1 row in set (0.00 sec)

mysql> SELECT id, term FROM terms where (list_id = 1) AND (term = "testetest");
+----+-----------+
| id | term      |
+----+-----------+
|  1 | testétest |
|  2 | testetest |
|  3 | testètest |
+----+-----------+
3 rows in set (0.00 sec)

mysql> SELECT id, term FROM terms where (list_id = 1) AND (term = "testètest");
+----+-----------+
| id | term      |
+----+-----------+
|  1 | testétest |
+----+-----------+
1 row in set (0.00 sec)

mysql>
[14 Oct 2004 9:18] Alexander Barkov
Thank you for your bug report. This issue has been committed to our
source repository of that product and will be incorporated into the
next release.

If necessary, you can access the source repository and build the latest
available version, including the bugfix, yourself. More information 
about accessing the source trees is available at
    http://www.mysql.com/doc/en/Installing_source_tree.html