Bug #29246 Falcon: searches fail if prefix index on latin1_german2_ci column
Submitted: 20 Jun 2007 17:06 Modified: 13 Dec 2008 9:59
Reporter: Peter Gulutzan Email Updates:
Status: Closed Impact on me:
None 
Category:MySQL Server: Falcon storage engine Severity:S2 (Serious)
Version:4.1, 5.0, 5.1, 6.0 bzr,6.0.1-alpha-debug OS:Linux (SUSE 10 64-bit)
Assigned to: Sergey Vojtovich CPU Architecture:Any
Tags: F_INDEX
Triage: Triaged: D2 (Serious) / R3 (Medium) / E3 (Medium)

[20 Jun 2007 17:06] Peter Gulutzan
Description:
I create a Falcon table with a varchar column, collation latin1_german2_ci.
I create an index on the column.
It's a prefix index (formerly called a 'partial' index).
I insert 'u with diaeresis', which should equal 'ue' with this collation.
I search for 'ue'.
I find nothing.
With engine=myisam, I'd find something.

How to repeat:
mysql> create table t1 (s1 varchar(2) character set latin1 collate latin1_german2_ci) engine=falcon;
Query OK, 0 rows affected (0.01 sec)

mysql> insert into t1 values (0xdc); /* i.e. 'Ü' */
Query OK, 1 row affected (0.00 sec)

mysql> create index i1 on t1 (s1(1));
Query OK, 1 row affected (0.03 sec)
Records: 1  Duplicates: 0  Warnings: 0

mysql> select count(*) from t1 where s1 = 'ue';
+----------+
| count(*) |
+----------+
|        0 |
+----------+
1 row in set (0.03 sec)

mysql> alter table t1 engine=myisam;
Query OK, 1 row affected (0.03 sec)
Records: 1  Duplicates: 0  Warnings: 0

mysql> select count(*) from t1 where s1 = 'ue';
+----------+
| count(*) |
+----------+
|        1 |
+----------+
1 row in set (0.01 sec)
[20 Jun 2007 22:57] Hakan Küçükyılmaz
Verified as described. Added test case falcon_bug_29246.test.
[24 Mar 2008 13:27] Peter Gulutzan
See also
Bug#20447 Problem with prefix keys with contractions and expansions.
[26 Jun 2008 21:54] Kevin Lewis
Another character-set related bug.
[2 Jul 2008 20:58] Sveta Smirnova
Same problem with InnoDB: bug #37820
[2 Jul 2008 21:10] Kevin Lewis
Since this character set problem occurs with both Falcon and InnoDB, I am assigning this bug to Category; Server-Character Sets.
[3 Jul 2008 9:18] Sveta Smirnova
Bug #37820 was marked as duplicate of this one.
[4 Jul 2008 14:00] Sergei Golubchik
InnoDB in 5.1 is also affected
[1 Dec 2008 16:22] Bugs System
A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/60299

2922 Sergey Vojtovich	2008-12-01
      BUG#29246 - Falcon: searches fail if prefix index on
                  latin1_german2_ci column
      
      Searching falcon table with prefix index on collations that
      have characters with multiple weights may return no result
      when multi-weight characters/sequences are truncated by
      prefix bounds.
      
      E.g. searching for german 'ue' in an index that has entry for
      u-umlaut (or vice versa) will return no result.
      
      Fixed that falcon creates incorrect keys and uses wrong function
      for prefix search.
[4 Dec 2008 11:24] Alexander Barkov
Hi Sergey,

The patch at http://lists.mysql.com/commits/60299 looks fine.

One suggestion, for this piece of code:

-							fldLen = falcon_strntrunc(cs, partialLength, fldString, fldLen);
-							keyLen = falcon_strntrunc(cs, partialLength, keyString, keyLen);
+							// We must use falcon_strnncoll() for partial key comparision,
+							// as falcon_strnncollsp() cannot handle expansion/contraction.
+							char padChar = falcon_get_pad_char(cs);
+							char minSortChar = falcon_get_min_sort_char(cs);
+							fldLen = MySQLCollation::computeKeyLength(fldLen, fldString, padChar,
minSortChar);
+							keyLen = MySQLCollation::computeKeyLength(keyLen, keyString, padChar,
minSortChar);
+							cmp = falcon_strnncoll(cs, fldString, fldLen, keyString, keyLen, true);
 							}
+						else
+							cmp = falcon_strnncollsp(cs, fldString, fldLen, 
+							                       keyString, keyLen, true);

As discussed, the first computeKeyLength() seems to be unnecessary.
It should be fine to pass whole field value.
Please check if it's true.

Thanks!
[4 Dec 2008 12:02] Bugs System
A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/60591

2922 Sergey Vojtovich	2008-12-04
      BUG#29246 - Falcon: searches fail if prefix index on
                  latin1_german2_ci column
      
      Searching falcon table with prefix index on collations that
      have characters with multiple weights may return no result
      when multi-weight characters/sequences are truncated by
      prefix bounds.
      
      E.g. searching for german 'ue' in an index that has entry for
      u-umlaut (or vice versa) will return no result.
      
      Fixed that falcon creates incorrect keys and uses wrong function
      for prefix search.
[4 Dec 2008 12:21] Alexander Barkov
http://lists.mysql.com/commits/60591 looks fine for me.
[9 Dec 2008 9:15] Lars-Erik Bjørk
To the extent of my knowledge, http://lists.mysql.com/commits/60591 looks fine to me too.

Personally I am no big fan of nested a ? b : c; constructs except for printfs, but I'll let it pass :)

Ok to push from me.
[11 Dec 2008 14:19] Bugs System
Pushed into 6.0.9-alpha  (revid:svoj@mysql.com-20081204110012-4a8qep7e0u8rjndg) (version source revid:hky@sun.com-20081210172504-jca50gguoy4fm3xb) (pib:5)
[13 Dec 2008 9:59] MC Brown
A note has been added to the 6.0.9 changelog: 

Searching for text values on a column using a character set that provides multi-weight characters and sequences on an INNODB or FALCON table with an index would fail to find the expanded value.