Bug #57737 Character sets: search fails with like, contraction, index
Submitted: 26 Oct 2010 13:03 Modified: 11 Jan 2011 16:42
Reporter: Peter Gulutzan Email Updates:
Status: Closed Impact on me:
None 
Category:MySQL Server: Charsets Severity:S3 (Non-critical)
Version:5.6.1-m4, 5.6.99-m5 OS:Linux (SUSE 11 64-bit)
Assigned to: Alexander Barkov CPU Architecture:Any

[26 Oct 2010 13:03] Peter Gulutzan
Description:
I create a table with a utf16 column.
I insert a few rows, including one
which will be affected by a contraction
rule ('two letters = one unit') in the
collation I choose.
I do a LIKE search; okay.
I create an index.
I do the same LIKE search; not okay.

This happens only with utf16 and utf32.
This happens only with collations which
allow contractions, such as Danish 'aa',
Czech 'ch', and traditional Spanish 'ch'.

How to repeat:
set names utf8;
drop table if exists t1;
create table t1 (s1 varchar(10) character set utf16 collate utf16_danish_ci);
insert into t1 values ('a'),('ae'),('aé'),('aa');
select * from t1 where s1 like 'a%';
create index i on t1 (s1);
select * from t1 where s1 like 'a%'; 

set names utf8;
drop table if exists t1;
create table t1 (s1 varchar(10) character set utf16 collate utf16_czech_ci);
insert into t1 values ('c'),('ce'),('cé'),('ch');
select * from t1 where s1 like 'c%';
create index i on t1 (s1);
select * from t1 where s1 like 'c%';
[26 Oct 2010 13:13] Valeriy Kravchuk
Verified with current mysql-next-mr on Ubuntu 10.04 also.
[10 Nov 2010 16:18] Alexander Barkov
There is a similar problem with prefix keys:

mysql> set names utf8;
Query OK, 0 rows affected (0.00 sec)

mysql> drop table if exists t1;
Query OK, 0 rows affected (0.00 sec)

mysql> create table t1 (s1 varchar(10) character set utf8 collate utf8_czech_ci);
Query OK, 0 rows affected (0.01 sec)

mysql> insert into t1 values ('c'),('ce'),('cé'),('ch');
Query OK, 4 rows affected (0.01 sec)

mysql> select * from t1 where s1 like 'ch%';
+------+
| s1   |
+------+
| ch   |
+------+
1 row in set (0.00 sec)

mysql> alter table t1 add index (s1(1));
Records: 4  Duplicates: 0  Warnings: 0

mysql> select * from t1 where s1 like 'ch%';
Empty set (0.01 sec)
[10 Nov 2010 17:46] Bugs System
A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/123473
[11 Nov 2010 9:00] Bugs System
A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/123539
[23 Nov 2010 7:06] Alexander Barkov
Similar problem happens for UCS2 for strings with ignorable character and TAB in the end.
Other Unicode character sets (UTF8, UTF16, UTF32) are not affected.

mysql> DROP TABLE IF EXISTS t1;
Query OK, 0 rows affected (0.00 sec)

mysql> CREATE TABLE t1 (a VARCHAR(10) COLLATE ucs2_unicode_ci) ENGINE=MYISAM;
Query OK, 0 rows affected (0.00 sec)

mysql> INSERT INTO t1 VALUES ('a\0\0\0\0\0\t'),('a'),('b'),('c'),('d'),('e');
Query OK, 6 rows affected (0.00 sec)
Records: 6  Duplicates: 0  Warnings: 0

mysql> SELECT HEX(a) FROM t1 WHERE a LIKE 'a%';
+------------------------------+
| HEX(a)                       |
+------------------------------+
| 0061000000000000000000000009 |
| 0061                         |
+------------------------------+
2 rows in set (0.00 sec)

mysql> ALTER TABLE t1 ADD KEY(a);
Query OK, 6 rows affected (0.00 sec)
Records: 6  Duplicates: 0  Warnings: 0

mysql> SELECT HEX(a) FROM t1 WHERE a LIKE 'a%';
+--------+
| HEX(a) |
+--------+
| 0061   |
+--------+
1 row in set (0.00 sec)
[23 Nov 2010 12:52] Bugs System
A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/124735
[26 Nov 2010 10:50] Bugs System
A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/125117
[26 Nov 2010 13:36] Bugs System
A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/125139

3398 Alexander Barkov	2010-11-26
      Bug#57737 Character sets: search fails with like, contraction, index
      Additional 5.6 specific patch.
      - fixing problem with ctype_utf16_uca.test:
        splitting functions my_fill_mb2 into two separate
        functions, for ucs2 and for utf16.
      - adding tests for collations added in 5.6: xxx_unicode_520_ci
[26 Nov 2010 13:39] Alexander Barkov
Pushed into mysql-5.5-bugteam [5.5.8]
Pushed into mysql-trunk-bugfixing [5.6.1-m5]
[26 Nov 2010 13:58] Bugs System
A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/125149
[5 Dec 2010 12:37] Bugs System
Pushed into mysql-trunk 5.6.1 (revid:alexander.nozdrin@oracle.com-20101205122447-6x94l4fmslpbttxj) (version source revid:alexander.nozdrin@oracle.com-20101205122447-6x94l4fmslpbttxj) (merge vers: 5.6.1) (pib:23)
[11 Dec 2010 17:25] Paul DuBois
Bug does not appear in any released 5.6.x version.

Setting report to Need Merge pending push to 5.5.x.
[17 Dec 2010 12:51] Bugs System
Pushed into mysql-5.5 5.5.9 (revid:georgi.kodinov@oracle.com-20101217124733-p1ivu6higouawv8l) (version source revid:georgi.kodinov@oracle.com-20101126153433-4dbn9nhn2fzehejo) (merge vers: 5.5.8) (pib:24)
[11 Jan 2011 16:42] Paul DuBois
Noted in 5.5.9 changelog.

In rare cases, LIKE expressions failed for an indexed column that
used a collation containing contractions.