Bug #47641 LIKE doesn't match when cp932_bin/sjis_bin collations are used
Submitted: 25 Sep 2009 6:31 Modified: 27 Aug 2014 23:02
Reporter: Mikiya Okuno Email Updates:
Status: Closed Impact on me:
None 
Category:MySQL Server: Charsets Severity:S3 (Non-critical)
Version: OS:Any
Assigned to: CPU Architecture:Any

[25 Sep 2009 6:31] Mikiya Okuno
Description:
When using cp932_bin or sjis_bin collations, only prefix search can be matched for specific characters called "hankaku-kana" with LIKE clause.

How to repeat:
1. create a table

CREATE TABLE t1 (a VARCHAR(100) COLLATE cp932_bin NOT NULL PRIMARY KEY);

2. insert a value

INSERT INTO t1 VALUES('ハンカクカナテスト');

3. select using LIKE clause

SELECT * FROM t1 WHERE a LIKE '%ンカク%';

This doesn't return any rows. Prefix search like following returns a row.

SELECT * FROM t1 WHERE a LIKE 'ハンカク%';
+-----------------------------+
| a                           |
+-----------------------------+
| ハンカクカナテスト                   |
+-----------------------------+
1 row in set (0.01 sec)

4. Modify collation or character set could fix the problem.

ALTER TABLE t1 MODIFY a VARCHAR(100) COLLATE cp932_japanese_ci;
SELECT * FROM t1 WHERE a LIKE '%テスト';
+-----------------------------+
| a                           |
+-----------------------------+
| ハンカクカナテスト                   |
+-----------------------------+
1 row in set (0.00 sec)

ALTER TABLE t1 MODIFY a VARCHAR(100) CHARACTER SET utf8 COLLATE utf8_bin;
SELECT * FROM t1 WHERE a LIKE '%テスト';
+-----------------------------+
| a                           |
+-----------------------------+
| ハンカクカナテスト                   |
+-----------------------------+
1 row in set (0.00 sec)

Suggested fix:
Fix the bug on collation cp932_bin and sjis_bin.
[27 Aug 2014 23:02] Paul DuBois
Noted in 5.5.40, 5.6.21, 5.7.5 changelogs.

LIKE matches failed for code points of HALF WIDTH KATAKANA in the
sjis and cp932 character sets.
[24 Sep 2014 7:58] Laurynas Biveinis
revno: 4697
committer: mithun <mithun.c.y@oracle.com>
branch nick: mysql-5.5
timestamp: Tue 2014-08-12 17:16:51 +0530
message:
  Bug #11755818 : LIKE DOESN'T MATCH WHEN CP932_BIN/SJIS_BIN
                  COLLATIONS ARE USED.
  
  ISSUE :
  -------
  Code points of HALF WIDTH KATAKANA in SJIS/CP932 range from
  A1 to DF. In function my_wildcmp_mb_bin_impl while comparing
  such single byte code points, there is a code which compares
  signed character with unsigned character. Because of this,
  comparisons of two same code points representing a HALF
  WIDTH KATAKANA character always fails.
  
  Solution:
  ---------
  A code point of HALF WIDTH KATAKANA at-least need 8 bits.
  Promoting the variable from uchar to int will fix the issue.