Bug #22052 | Trailing spaces are not removed from UNICODE fields in an index | ||
---|---|---|---|
Submitted: | 6 Sep 2006 13:04 | Modified: | 14 Dec 2006 3:23 |
Reporter: | Paul McCullagh (Basic Quality Contributor) (OCA) | Email Updates: | |
Status: | Closed | Impact on me: | |
Category: | MySQL Server | Severity: | S2 (Serious) |
Version: | 4.1, 5.0 | OS: | |
Assigned to: | Alexander Barkov | CPU Architecture: | Any |
Tags: | compression, INDEX, spaces, Unicode |
[6 Sep 2006 13:04]
Paul McCullagh
[8 Sep 2006 9:19]
Sveta Smirnova
Thank you for the report. Verified as described on Linux using last BK sources.
[14 Sep 2006 14:45]
Trudy Pelzer
Thank you for taking the time to write to us, but this is not a bug. Please double-check the documentation available at http://dev.mysql.com/doc/ and the instructions on how to report a bug at http://bugs.mysql.com/how-to-report.php CHAR values are fixed-size. The MySQL Reference Manual says: "The length of a CHAR column is fixed to the length that you declare when you create the table. The length can be any value from 0 to 255. When CHAR values are stored, they are right-padded with spaces to the specified length." For more detail, please see: http://dev.mysql.com/doc/refman/5.1/en/char.html
[14 Sep 2006 19:02]
Trudy Pelzer
My apologies; the CHAR behaviour I described above is for data storage and not index storage.
[18 Sep 2006 6:51]
Alexander Barkov
This bug is the reason of wrong ORDER BY for an indexed column: drop table if exists t1; create table t1 ( a char(10) unicode not null, index a (a) ); insert into t1 values (repeat(0x201f, 10)); insert into t1 values (repeat(0x2020, 10)); insert into t1 values (repeat(0x2021, 10)); select hex(a) from t1 order by a; -- wrong result: +------------------------------------------+ | hex(a) | +------------------------------------------+ | 2020202020202020202020202020202020202020 | | 201F201F201F201F201F201F201F201F201F201F | | 2021202120212021202120212021202120212021 | +------------------------------------------+ 3 rows in set (0.01 sec) alter table t1 drop index a; select hex(a) from t1 order by a; -- correct result: +------------------------------------------+ | hex(a) | +------------------------------------------+ | 201F201F201F201F201F201F201F201F201F201F | | 2020202020202020202020202020202020202020 | | 2021202120212021202120212021202120212021 | +------------------------------------------+ 3 rows in set (0.00 sec) The reason of wrong order of 0x2020 ("DAGGER") is in the mentioned code - all trailing 0x20 bytes are incorrectly stripped. Suggested fix: strip only 0x0020 ("SPACE") sequences.
[18 Sep 2006 10:31]
Bugs System
A patch for this bug has been committed. After review, it may be pushed to the relevant source trees for release in the next version. You can access the patch from: http://lists.mysql.com/commits/12118 ChangeSet@1.2528, 2006-09-18 15:29:29+05:00, bar@mysql.com +4 -0 Bug#22052 Trailing spaces are not removed from UNICODE fields in an index Fix: using charset-aware functions cs->cset->lengthsp() and cs->cset->fill() instead of single byte code which is not UCS2 compatible.
[26 Sep 2006 10:30]
Sergei Glukhov
ok to push
[3 Oct 2006 9:12]
Bugs System
A patch for this bug has been committed. After review, it may be pushed to the relevant source trees for release in the next version. You can access the patch from: http://lists.mysql.com/commits/12987 ChangeSet@1.2554, 2006-10-03 14:11:57+05:00, bar@mysql.com +4 -0 Bug#22052 Trailing spaces are not removed from UNICODE fields in an index Fix: using charset-aware functions cs->cset->lengthsp() and cs->cset->fill() instead of single byte code which is not UCS2 compatible.
[11 Oct 2006 10:52]
Alexander Barkov
Pushed into 5.0-rpl, marked as 5.0.26
[12 Oct 2006 9:16]
Alexander Barkov
Pushed into 5.1-rpl, marked as 5.1.12
[16 Nov 2006 16:45]
Alexander Barkov
Appeared in 4.1.23 common Appeared in 5.0.32 common Appeared in 5.1.14 common Note for the doc team: Please mention in ChangeLog this information: Those using "CHAR(N) CHARACTER SET ucs2" columns must recreate indexes after upgrade.
[14 Dec 2006 3:23]
Jon Stephens
Thank you for your bug report. This issue has been committed to our source repository of that product and will be incorporated into the next release. If necessary, you can access the source repository and build the latest available version, including the bug fix. More information about accessing the source trees is available at http://dev.mysql.com/doc/en/installing-source.html Documented bugfix for 4.1.23, 5.0.32, and 5.1.14.
[9 Jan 2007 19:48]
Christian Hammers
Reading [16 Nov 2006 17:45] Alexander Barkov "Those using "CHAR(N) CHARACTER SET ucs2" columns must recreate indexes after upgrade." I wonder how that can be ensured on semi automated upgrades like distributions providing mysql packages have to face it. Does mysql_upgrade_shell fixes this problem or at least mentions it? What impact does not recreating have, will the bug just reappear or will something crash? :) bye, -christian-
[15 Jan 2007 14:35]
Alexander Barkov
Christian, >Does mysql_upgrade_shell fixes this problem or at least mentions it? Sorry, no. You need to run "ALTER TABLE t1 ENGINE=XXX" for all affected tables to rebuild indexes. >What impact does not recreating have, will the bug just reappear or will >something crash? :) It will most likely report "table t1 is corrupted, try to repair it".
[1 Feb 2007 2:46]
James Day
Alexander, please confirm that only MyISAM tables are affected by this bug. Jon, once confirmed, please do the usual modification to the changelog to add MyISAM to the description so people know which engines are affected.
[1 Feb 2007 4:54]
Alexander Barkov
James, right, only MyISAM tables are affected.