Bug #34479 | Falcon: search failure with indexed ucs2 varchar | ||
---|---|---|---|
Submitted: | 12 Feb 2008 2:39 | Modified: | 26 May 2010 17:49 |
Reporter: | Peter Gulutzan | Email Updates: | |
Status: | Unsupported | Impact on me: | |
Category: | MySQL Server: Falcon storage engine | Severity: | S3 (Non-critical) |
Version: | 6.0.5-alpha-debug | OS: | Linux (SUSE 10 | 32-bit) |
Assigned to: | Lars-Erik Bjørk | CPU Architecture: | Any |
Tags: | F_ENCODING |
[12 Feb 2008 2:39]
Peter Gulutzan
[12 Feb 2008 9:49]
MySQL Verification Team
Thank you for the bug report.
[26 Jun 2008 16:26]
Peter Gulutzan
As suspected, the bug appears with other characters whose final byte is 0x20. There are many such characters. One of them is common: 0420;CYRILLIC CAPITAL LETTER ER
[26 Jun 2008 21:50]
Kevin Lewis
Lars-Erik, I had been meaning to assign this bug to you along with a few other character-set related bugs because I think it is an isolated, yet important part of the Falcon design. I used to focus a lot of time on these, but never 'finished' them all. I can spend time with you on this area of code as you debug it.
[22 Oct 2008 10:29]
Lars-Erik Bjørk
I can also reproduce this scenario without creating the index, but simply by creating the table, inserting the values and doing the select.
[2 Dec 2008 13:11]
Bugs System
A patch for this bug has been committed. After review, it may be pushed to the relevant source trees for release in the next version. You can access the patch from: http://lists.mysql.com/commits/60379 2922 lars-erik.bjork@sun.com 2008-12-02 This is a patch for bug#34479 Falcon: search failure with indexed ucs2 varchar When computing the key length, trailing spaces are removed. This was done by looking at a single byte at a time. When using f. ex the ucs2 character set (where every character is represented using two bytes), this would result in a character ending in 0x20, (f .ex Ä (0x0120) ) having its final byte 'trimmed'. I have now implemented a charset-wise version, taking into account the varying lengths of multi-byte sequences of different character sets. added file 'mysql-test/suite/falcon/t/falcon_bug_34479.test' ------------------------------------------------------------ This is a test file testing the patch. It is based on the bug report added file 'mysql-test/suite/falcon/r/falcon_bug_34479.result' -------------------------------------------------------------- This is the result file for the test. It states the expected output modified file 'storage/falcon/MySQLCollation.cpp' ------------------------------------------------- Modified the function computeKeyLength( ... ) Earlier, when removing trailing pad characters (' '), the function only looked at a single byte at a time. I have implemented a charset-wise implementation that translates the pad character into the relevant character set, and that compares relative to the (possible) multi-byte sequences of the different character sets. modified file 'storage/falcon/MySQLCollation.h' ----------------------------------------------- Changed the function computeKeyLength( ...) not to be inline, because the implementation grew big enough to clutter the header file. modified file 'storage/falcon/ha_falcon.cpp' -------------------------------------------- Added some functions giving access to the character set functions * int falcon_conv_uni_cs ( ... ) - Converts a character to the given character set * unsigned int falcon_get_mbminlen ( ... ) - Returns the minimum multi-byte sequence for the given charset * uint falcon_get_mbcharlen( ... ) - Returns the length of the current multi-byte sequence if the pointer given points to a valid header, 0 otherwise
[2 Dec 2008 13:15]
Bugs System
A patch for this bug has been committed. After review, it may be pushed to the relevant source trees for release in the next version. You can access the patch from: http://lists.mysql.com/commits/60380 2922 lars-erik.bjork@sun.com 2008-12-02 This is a patch for bug#34479 Falcon: search failure with indexed ucs2 varchar When computing the key length, trailing spaces are removed. This was done by looking at a single byte at a time. When using f. ex the ucs2 character set (where every character is represented using two bytes), this would result in a character ending in 0x20, (f .ex Ä (0x0120) ) having its final byte 'trimmed'. I have now implemented a charset-wise version, taking into account the varying lengths of multi-byte sequences of different character sets. added file 'mysql-test/suite/falcon/t/falcon_bug_34479.test' ------------------------------------------------------------ This is a test file testing the patch. It is based on the bug report added file 'mysql-test/suite/falcon/r/falcon_bug_34479.result' -------------------------------------------------------------- This is the result file for the test. It states the expected output modified file 'storage/falcon/MySQLCollation.cpp' ------------------------------------------------- Modified the function computeKeyLength( ... ) Earlier, when removing trailing pad characters (' '), the function only looked at a single byte at a time. I have implemented a charset-wise implementation that translates the pad character into the relevant character set, and that compares relative to the (possible) multi-byte sequences of the different character sets. modified file 'storage/falcon/MySQLCollation.h' ----------------------------------------------- Changed the function computeKeyLength( ...) not to be inline, because the implementation grew big enough to clutter the header file. modified file 'storage/falcon/ha_falcon.cpp' -------------------------------------------- Added some functions giving access to the character set functions * int falcon_conv_uni_cs ( ... ) - Converts a character to the given character set * unsigned int falcon_get_mbminlen ( ... ) - Returns the minimum multi-byte sequence for the given charset * uint falcon_get_mbcharlen( ... ) - Returns the length of the current multi-byte sequence if the pointer given points to a valid header, 0 otherwise
[3 Dec 2008 19:16]
Alexander Barkov
Setting back to "In progress" as it's not correct to remove trailing spaces and minSortChar before strnncoll call. More comments sent by email.
[19 Dec 2008 11:34]
Lars-Erik Bjørk
I closed bug#30282 to be a duplicate of this one, since this is the bug with the highest priority an a beta tag. Bug#30282 could be reproduced in the following ways, which should be added to the test for this bug as well. Alternative 1: ----------- mysql> create table t6 (s1 char(1)) engine=falcon; Query OK, 0 rows affected (0.02 sec) mysql> insert into t6 values (0x20),(0x00); Query OK, 2 rows affected (0.00 sec) Records: 2 Duplicates: 0 Warnings: 0 mysql> select * from t6 where s1 = 0x20; +------+ | s1 | +------+ | | +------+ 1 row in set (0.00 sec) mysql> select * from t6 where s1 = 0x00; +------+ | s1 | +------+ | | +------+ 1 row in set (0.00 sec) mysql> create unique index i6 on t6 (s1); ERROR 1582 (23000): Duplicate entry '' for key 'i6' Alternative 2: ----------- root@test>CREATE TABLE t2 (a varchar(1) CHARACTER SET UTF8 COLLATE UTF8_BIN PRIMARY KEY) Engine Falcon; Query OK, 0 rows affected (0.01 sec) root@test>INSERT INTO t2 (a) VALUES (UNHEX('00')); Query OK, 1 row affected (0.00 sec) root@test>INSERT INTO t2 (a) VALUES (UNHEX('20')); ERROR 1062 (23000): Duplicate entry ' ' for key 'PRIMARY' 6.0.9-alpha