| Bug #20447 | Problem with prefix keys with contractions and expansions | ||
|---|---|---|---|
| Submitted: | 14 Jun 2006 5:06 | Modified: | 24 Nov 2008 15:31 |
| Reporter: | Alexander Barkov | Email Updates: | |
| Status: | Verified | Impact on me: | |
| Category: | MySQL Server: Charsets | Severity: | S2 (Serious) |
| Version: | 4.0, 4.1, 5.0, 5.1 | OS: | Any (any) |
| Assigned to: | Assigned Account | CPU Architecture: | Any |
| Tags: | disabled | ||
[14 Jun 2006 5:08]
Alexander Barkov
Sorry, there was a typo in the previous post. It should be: when 'c' is not followed by 'h'.
[15 Jun 2006 6:41]
MySQL Verification Team
Thank you for the bug report. Verified as described just changed CHARACTER SET latin2 COLLATE latin2_czech_ci to CHARACTER SET latin2 COLLATE latin2_czech_cs
[8 Dec 2006 5:36]
Alexander Barkov
The same problem happens with expansions:
mysql> create table t7 (s1 char(5) character set latin1 collate
latin1_german2_ci) engine=myisam;
Query OK, 0 rows affected (0.01 sec)
mysql> insert into t7 values ('UE'), ('Ü');
Query OK, 2 rows affected (0.00 sec)
Records: 2 Duplicates: 0 Warnings: 0
mysql> select * from t7 where s1 = 'UE';
+------+
| s1 |
+------+
| UE |
| Ü |
+------+
2 rows in set (0.00 sec)
mysql> create index it7 on t7 (s1);
Query OK, 2 rows affected (0.01 sec)
Records: 2 Duplicates: 0 Warnings: 0
mysql> select * from t7 where s1 = 'UE';
+------+
| s1 |
+------+
| UE |
| Ü |
+------+
2 rows in set (0.00 sec)
mysql> drop index it7 on t7;
Query OK, 2 rows affected (0.01 sec)
Records: 2 Duplicates: 0 Warnings: 0
mysql> create index it7 on t7 (s1(1));
Query OK, 2 rows affected (0.00 sec)
Records: 2 Duplicates: 0 Warnings: 0
mysql> select * from t7 where s1 = 'UE';
+------+
| s1 |
+------+
| UE |
+------+
1 row in set (0.01 sec)
[16 Oct 2007 22:07]
Lars Thalmann
See also BUG#20353.
[6 Dec 2007 15:09]
Lars Thalmann
Mikael thinks that the only solution to this bug is to disable prefix indexes on multi-byte charsets. He says that anyone with a different opinion is welcome to come up with a solution. :)
[26 Jan 2008 6:48]
Alexander Barkov
We have discussed this problem with Peter and Serg. The easiest solution to fix this problem for Unicode character sets is to use so called "Private Use Area" Unicode code points to enumerate contractions. Contractions will be represented as a single code in indexes. This approach helps to avoid changing of prefix key lengths, and should be easy to implement.
[2 Jul 2008 20:59]
Sveta Smirnova
There is similar bug #37820 occured only with InnoDB storage engine.

Description: When using a collation having contractions, prefix keys may not work correctly. For example, in Czech 'ch' is sorted between 'h' and 'i'. All UCA based collations with contractions, as well as cp1251_czech_ci and latin2_czech_ci are affected. How to repeat: in 4.0 start server with --default-character-set=czech in 4.1 use CHARACTER SET latin2 COLLATE latin2_czech_ci in the table definition: DROP TABLE IF EXISTS t1; CREATE TABLE t1 ( a char(128) NOT NULL default '', KEY a (a(1)) ) TYPE=MYISAM; SHOW CREATE TABLE t1; INSERT INTO t1 VALUES ('h'); INSERT INTO t1 VALUES ('ch'); INSERT INTO t1 VALUES ('i'); explain select a as prefix_key from t1 where a between 'h' and 'i'; select a as prefix_key from t1 where a between 'h' and 'i'; alter table t1 drop key a; select a as no_key from t1 where a between 'h' and 'i'; alter table t1 add key (a); select a as full_key from t1 where a between 'h' and 'i'; The results are: +-------+-------+---------------+------+---------+------+------+-------------+ | table | type | possible_keys | key | key_len | ref | rows | Extra | +-------+-------+---------------+------+---------+------+------+-------------+ | t1 | range | a | a | 1 | NULL | 1 | Using where | +-------+-------+---------------+------+---------+------+------+-------------+ +------------+ | prefix_key | +------------+ | h | | i | +------------+ +--------+ | no_key | +--------+ | h | | ch | | i | +--------+ +----------+ | full_key | +----------+ | h | | ch | | i | +----------+ I.e. the first select with prefix key does not return 'ch', which is wrong. The second and the third selects correctly return all three records. Suggested fix: Some modifications are required in prefix key handling. Probably we should only put both characters 'c' and 'h' when storing 'ch' in a prefix index, and put 'c' alone only when it is not followed by 'c'. That means the number in parentheses in a prefix key length may change its meaning from "N characters" to "N collation elements".