Bug #32510 | LIKE search fails with indexed 'eucjpms' and 'ujis' char column | ||
---|---|---|---|
Submitted: | 20 Nov 2007 7:55 | Modified: | 31 Mar 2008 16:46 |
Reporter: | Mikiya Okuno | Email Updates: | |
Status: | Closed | Impact on me: | |
Category: | MySQL Server: Charsets | Severity: | S2 (Serious) |
Version: | 5.0.50 | OS: | Any |
Assigned to: | Alexander Barkov | CPU Architecture: | Any |
Tags: | bfsm_2007_12_06, charset eucjpms like |
[20 Nov 2007 7:55]
Mikiya Okuno
[20 Nov 2007 7:56]
MySQL Verification Team
Statements to create the table and populate the data into it.
Attachment: DDL.sql (, text), 618 bytes.
[20 Nov 2007 7:58]
MySQL Verification Team
The SELECT statement with a problematic pattern.
Attachment: select.sql (, text), 58 bytes.
[20 Nov 2007 7:59]
MySQL Verification Team
Example result.
Attachment: query_and_result.txt (text/plain), 1.89 KiB.
[27 Nov 2007 5:52]
Yasufumi Kinoshita
I think this is "critical" bug! This kind of problem occurs when we use the following collations. ujis_japanese_ci euckr_korean_ci gb2312_chinese_ci ucs2_bin (Anyway, it doesn't work even if we fix as the follows... another BUG?) eucjpms_japanese_ci These collations use my_like_range_simple() method. This my_like_range_simple() seems to intend to count characters, but counts bytes... So conditions of LIKE clause are mis-interpreted by unintended termination. I think thet we should fix as the either of following. (a) makes my_like_range_simple() intend to count bytes. --(ex.)---------------------------------------------------- strings/ctype-simple.c @@ -1043,7 +1043,7 @@ const char *end= ptr + ptr_length; char *min_org=min_str; char *min_end=min_str+res_length; - uint charlen= res_length / cs->mbmaxlen; + uint charlen= res_length; for (; ptr != end && min_str != min_end && charlen > 0 ; ptr++, charlen--) { ----------------------------------------------------------- (b) use my_like_range_mb() in the such collations. Regards,
[31 Jan 2008 11:53]
Alexander Barkov
A simplified test demonstrating the same problem: DROP TABLE IF EXISTS t1; CREATE TABLE t1 ( a int(6) PRIMARY KEY, b varchar(20), c varchar(8), d varchar(8), KEY (b) ) ENGINE = innodb COLLATE ujis_bin; INSERT INTO t1 (a, b) VALUES (1, repeat(0xA4A2,5)); INSERT INTO t1 (a, b) VALUES (2, repeat(0xA4A2,10)); INSERT INTO t1 (a, b) VALUES (3, repeat(0xA4A2,11)); INSERT INTO t1 (a, b) VALUES (4, repeat(0xA4A2,12)); SELECT a, hex(b),c FROM t1 WHERE b LIKE concat(repeat(0xA4A2,10), '%'); Verified with collations: big5_bin eucjpms_bin euckr_bin gb2312_bin gbk_bin ujis_bin eucjpms_japanese_ci euckr_korean_ci gb2312_chinese_ci ujis_japanese_ci TODO: check with ucs2
[31 Jan 2008 11:54]
Alexander Barkov
The same problem repeats with ENGINE=myisam.
[31 Jan 2008 12:02]
Alexander Barkov
The same problem happens with cp932_bin and sjis_bin, with character 0x8482. Note, the character 0xA4A2 used in the previous post is not a multi-byte character in sjis/cp932, so in order to repeat the same problem with sjis/cp932 one can use the same test, however with different two-byte sequence representing a single sjis/cp932 character. For example, 0x8482. DROP TABLE IF EXISTS t1; CREATE TABLE t1 ( a int(6) PRIMARY KEY, b varchar(20), c varchar(8), d varchar(8), KEY (b) ) COLLATE cp932_bin; INSERT INTO t1 (a, b) VALUES (1, repeat(0x8482,5)); INSERT INTO t1 (a, b) VALUES (2, repeat(0x8482,10)); INSERT INTO t1 (a, b) VALUES (3, repeat(0x8482,11)); INSERT INTO t1 (a, b) VALUES (4, repeat(0x8482,12)); SELECT a, hex(b),c FROM t1 WHERE b LIKE concat(repeat(0x8482,10), '%');
[31 Jan 2008 12:33]
Alexander Barkov
Character 0xF1F2 is a two-byte character in all these character sets: big5, eucjpms, euckr, gb2312, gbk, ujis, cp932, sjis DROP TABLE IF EXISTS t1; CREATE TABLE t1 ( a int(6) PRIMARY KEY, b varchar(20), c varchar(8), d varchar(8), KEY (b) ) COLLATE ujis_japanese_ci; INSERT INTO t1 (a, b) VALUES (1, repeat(0xF1F2,5)); INSERT INTO t1 (a, b) VALUES (2, repeat(0xF1F2,10)); INSERT INTO t1 (a, b) VALUES (3, repeat(0xF1F2,11)); INSERT INTO t1 (a, b) VALUES (4, repeat(0xF1F2,12)); SELECT a, hex(b),c FROM t1 WHERE b LIKE concat(repeat(0xF1F2,10), '%'); With 0xF1F2, the same problem happens with: - All _bin collations: Collation IsOk RangeHandlerFuncName big5_bin No my_like_range_mb cp932_bin No my_like_range_mb eucjpms_bin No my_like_range_mb euckr_bin No my_like_range_mb gb2312_bin No my_like_range_mb gbk_bin No my_like_range_mb sjis_bin No my_like_range_sjis ujis_bin No my_like_range_mb - Almost all case insensitive collations, as in this table: Collation IsOk RangeHandlerFuncName big5_chinese_ci Yes my_like_range_big5 cp932_japanese_ci Yes my_like_range_mb eucjpms_japanese_ci No my_like_range_simple euckr_korean_ci No my_like_range_simple gb2312_chinese_ci No my_like_range_simple gbk_chinese_ci Yes my_like_range_gbk sjis_japanese_ci Yes my_like_range_mb ujis_japanese_ci No my_like_range_simple * "IsOk == Yes" means that SELECT works fine * "IsOk == No" means incorrect behavior (bug)
[31 Jan 2008 13:06]
Alexander Barkov
More information for _bin collations: This bug can be repeated with mysql-5.0.22 with these collations: big5_bin cp932_bin eucjpms_bin euckr_bin gb2312_bin gbk_bin sjis_bin ujis_bin With mysql-5.0.52 all the above _bin collations work fine!
[31 Jan 2008 13:09]
Alexander Barkov
For _ci collations, mysql-5.0.22 and mysql-5.0.52 work similar: Collation IsOk RangeHandlerFuncName big5_chinese_ci Yes my_like_range_big5 cp932_japanese_ci Yes my_like_range_mb eucjpms_japanese_ci No my_like_range_simple euckr_korean_ci No my_like_range_simple gb2312_chinese_ci No my_like_range_simple gbk_chinese_ci Yes my_like_range_gbk sjis_japanese_ci Yes my_like_range_mb ujis_japanese_ci No my_like_range_simple
[4 Feb 2008 7:12]
Bugs System
A patch for this bug has been committed. After review, it may be pushed to the relevant source trees for release in the next version. You can access the patch from: http://lists.mysql.com/commits/41621 ChangeSet@1.2568, 2008-02-04 11:10:40+04:00, bar@mysql.com +26 -0 Bug#32510 LIKE search fails with indexed 'eucjpms' and 'ujis' char column Problem: some collation handlers called incorrect version of my_like_range_xxx(), which led to wrong min_str and max_str, so like range optimizer threw away good records. Fix: changing the wrong handlers to call proper version of my_like_range_xxx().
[20 Feb 2008 12:10]
Alexander Barkov
Pushed into 5.0.58-rpl
[25 Mar 2008 11:22]
Bugs System
Pushed into 5.0.60
[25 Mar 2008 11:22]
Bugs System
Pushed into 5.1.24-rc
[26 Mar 2008 19:00]
Bugs System
Pushed into 6.0.5-alpha
[31 Mar 2008 16:46]
Jon Stephens
Documented bugfix in the 5.0.60, 5.1.23-ndb-6.3.11, 5.1.24, and 6.0.5 changelogs as follows: Queries using LIKE on tables having indexed CHAR columns using either of the eucjpms or ujis character sets did not return correct results.