Bug #20471 LIKE search fails with indexed utf8 char column
Submitted: 14 Jun 2006 21:08 Modified: 25 Oct 2006 19:14
Reporter: Peter Gulutzan Email Updates:
Status: Closed Impact on me:
None 
Category:MySQL Server: Charsets Severity:S3 (Non-critical)
Version:5.1.12-beta-debug/5.0BK/4.1BK OS:Linux (SUSE 10.0 / 64-bit)
Assigned to: Alexander Barkov CPU Architecture:Any

[14 Jun 2006 21:08] Peter Gulutzan
Description:
I create a table with one ut8 column and I insert 6 rows.
I create an index (if there's no index, there's no problem).
I do a LIKE search for some Japanese characters.
The search fails, but the row is there.
If I then delete a different row, the search succeeds.

How to repeat:
mysql> set names utf8;
Query OK, 0 rows affected (0.00 sec)

mysql> create table tu3 (s1 char(5) character set utf8);
Query OK, 0 rows affected (0.01 sec)

mysql> insert into tu3 values ('a'),('b'),(null),('ペテルグル'),('ü'),('Y');
Query OK, 6 rows affected (0.02 sec)
Records: 6  Duplicates: 0  Warnings: 0

mysql> create index itu3 on tu3 (s1);
Query OK, 6 rows affected (0.04 sec)
Records: 6  Duplicates: 0  Warnings: 0

mysql> select * from tu3 where s1 like 'ペテ%';
Empty set (0.01 sec)

mysql> delete from tu3 where s1 = 'Y';
Query OK, 1 row affected (0.01 sec)

mysql> select * from tu3 where s1 like 'ペテ%';
+-----------------+
| s1              |
+-----------------+
| ペテルグル |
+-----------------+
1 row in set (0.00 sec)
[14 Jun 2006 23:00] MySQL Verification Team
Thank you for the bug report.

mysql> create table tu3 (s1 char(5) character set utf8);
Query OK, 0 rows affected (0.01 sec)

mysql> insert into tu3 values
    -> ('a'),('b'),(null),('ペテルグル'),('ü'),('Y');
Query OK, 6 rows affected (0.00 sec)
Records: 6  Duplicates: 0  Warnings: 0

mysql> create index itu3 on tu3 (s1);
Query OK, 6 rows affected (0.05 sec)
Records: 6  Duplicates: 0  Warnings: 0

mysql> select * from tu3 where s1 like 'ペテ%';
Empty set (0.00 sec)

mysql> delete from tu3 where s1 = 'Y';
Query OK, 1 row affected (0.01 sec)

mysql> select * from tu3 where s1 like 'ペテ%';
+-----------------+
| s1              |
+-----------------+
| ペテルグル |
+-----------------+
1 row in set (0.00 sec)

Version 4.1BK isn't affected:

mysql> insert into tu3 values ('a'),('b'),(null),('ペテルグル'),('ü'),('Y'); (s1);
Query OK, 6 rows affected (0.02 sec)
Records: 6  Duplicates: 0  Warnings: 0

mysql> create index itu3 on tu3 (s1);
Query OK, 6 rows affected (0.05 sec)
Records: 6  Duplicates: 0  Warnings: 0

mysql> select * from tu3 where s1 like 'ペテ%';';
+-----------------+
| s1              |
+-----------------+
| ペテルグル |
+-----------------+
1 row in set (0.02 sec)

mysql> select version();
+--------------+
| version()    |
+--------------+
| 4.1.21-debug |
+--------------+
1 row in set (0.00 sec)

mysql>
[28 Jun 2006 18:58] Sveta Smirnova
We have duplicate bug #20744
[29 Jun 2006 11:59] Alexander Barkov
The same problem exists in 4.1 with utf8_general_ci

ysql> select version();
+------------------+
| version()        |
+------------------+
| 4.1.19-debug-log |
+------------------+
1 row in set (0.02 sec)

mysql> drop table if exists t1;
Query OK, 0 rows affected (0.01 sec)

mysql> set names utf8;
Query OK, 0 rows affected (0.00 sec)

mysql> create table t1 (s1 char(5) character set utf8 collate utf8_unicode_ci);
Query OK, 0 rows affected (0.01 sec)

mysql> insert into t1 values
    -> ('a'),('b'),(null),('ペテルグル'),('ü'),('Y');
Query OK, 6 rows affected (0.00 sec)
Records: 6  Duplicates: 0  Warnings: 0

mysql> create index it1 on t1 (s1);
Query OK, 6 rows affected (0.01 sec)
Records: 6  Duplicates: 0  Warnings: 0

mysql> select s1 as before_delete_unicode_ci from t1 where s1 like 'ペテ%';
Empty set (0.00 sec)

mysql> delete from t1 where s1 = 'Y';
Query OK, 1 row affected (0.00 sec)

mysql> select s1 as after_delete_unicode_ci from t1 where s1 like 'ペテ%';
Empty set (0.00 sec)
[29 Jun 2006 12:00] Alexander Barkov
The same problem exists in 4.1 with utf8_bin:

mysql> select version();
+------------------+
| version()        |
+------------------+
| 4.1.19-debug-log |
+------------------+
1 row in set (0.00 sec)

mysql> drop table if exists t1;
Query OK, 0 rows affected (0.00 sec)

mysql> set names utf8;
Query OK, 0 rows affected (0.00 sec)

mysql> create table t1 (s1 char(5) character set utf8 collate utf8_bin);
Query OK, 0 rows affected (0.01 sec)

mysql> insert into t1 values
    -> ('a'),('b'),(null),('ペテルグル'),('ü'),('Y');
Query OK, 6 rows affected (0.00 sec)
Records: 6  Duplicates: 0  Warnings: 0

mysql> create index it1 on t1 (s1);
Query OK, 6 rows affected (0.00 sec)
Records: 6  Duplicates: 0  Warnings: 0

mysql> select s1 as before_delete_bin from t1 where s1 like 'ペテ%';
Empty set (0.00 sec)

mysql> delete from t1 where s1 = 'Y';
Query OK, 1 row affected (0.00 sec)

mysql> select s1 as after_delete_bin from t1 where s1 like 'ペテ%';
Empty set (0.00 sec)
[30 Jun 2006 10:19] Bugs System
A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/8547
[6 Jul 2006 8:45] Sergei Glukhov
ok to push
[18 Jul 2006 5:44] Alexander Barkov
Suggestions made by Sergey Petrunia after reviewing the patch:

* Please document the fact that max_sort_char may have different meanings
* When merging, please try to use my_like_range_mb as it was after fix for BUG#16674.

Setting back to "In progress".
[20 Jul 2006 10:53] Bugs System
A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/9374
[14 Sep 2006 11:55] Alexander Barkov
Pushed into 4.1-kt
[17 Oct 2006 2:24] mantani shigeki
We Japanese concern about this problem.
When I faced this problem, I changed the utf8 column to binary.

When will these problem be fixed in version 5.0.x?
[17 Oct 2006 4:20] Alexander Barkov
Dear Shigeki, this fix will most likely appear in the next 5.0 release
(5.0.27)
[19 Oct 2006 2:22] mantani shigeki
Dear Alexander Barkov,

We hope a fast release.

Thank you!
[19 Oct 2006 5:17] Alexander Barkov
Appeared in 5.0 main, marked as 5.0.27
Appeared in 5.1 main, marked as 5.1.13-beta
[25 Oct 2006 19:14] Paul DuBois
Noted in 4.1.22, 5.0.30, 5.1.13 changelogs.