Bug #76218 LIKE and % during diacritic insensitive search (Arabic)
Submitted: 9 Mar 2015 7:39 Modified: 5 Feb 2018 21:33
Reporter: Ehsan One Email Updates:
Status: No Feedback Impact on me:
None 
Category:MySQL Server: InnoDB storage engine Severity:S2 (Serious)
Version:MySQL 5.5.38 OS:MacOS
Assigned to: CPU Architecture:Any

[9 Mar 2015 7:39] Ehsan One
Description:
I have trouble making a diacritic insensitive search with arabic text.

I have tested multiple setups for the table in question: encodings in utf8 and utf16 as well as collations in utf8_general_ci, utf16_general_ci and utf16_unicode_ci.

The search works for åä special characters. I.e:

> SELECT * FROM test WHERE text LIKE '%a%'

Would return columns where text is a, å or ä. But it won't work with the Arabic diacritics. I.e if the text is بِسْمِ and I search for بسم, I don't get any hits.

I asked the question on StackOverflow (http://stackoverflow.com/a/28891336/2964341) and it seems that LIKE or % doesn't work.

If you search using = it will work:

> SELECT * from test  WHERE text = 'بسم';

It will find both بِسْمِ and بسم however since the column is actually filled with entire sentences and not just words, for example بِسْمِ الله  won't get any hits, and since LIKE or % doesn't work (i.e I can't search with text = '%بسم'%') I won't get any hits.

How to repeat:
CREATE  TABLE so28863402 (
    id int(11) unsigned NOT NULL AUTO_INCREMENT,
    txt text COLLATE utf8_unicode_ci,   -- deliberate choice of COLLATION
    PRIMARY KEY (id)
) ENGINE=InnoDB
        DEFAULT CHARSET=utf8;
INSERT INTO so28863402 (txt) VALUES
    (UNHEX('D8A8D990D8B3D992D985D990')),  -- Using hex to avoid any copy/paste issues
    (UNHEX('D8A8D8B3D985'));  -- The values should compare equal
SELECT id, txt, HEX(txt) FROM so28863402;
SELECT txt, COUNT(*) FROM so28863402 GROUP BY txt; -- GROUP BY finds them equal.
SELECT * from so28863402
    WHERE txt = 'بسم';   -- Finds both rows (correct)
SELECT * from so28863402
    WHERE txt LIKE '%بسم%';  -- Finds one row (incorrect)
-- Further checks:
SELECT * FROM so28863402 WHERE txt  =   UNHEX(  'D8A8D8B3D985'  );
SELECT * FROM so28863402 WHERE txt LIKE UNHEX(  'D8A8D8B3D985'  );
SELECT * FROM so28863402 WHERE txt LIKE UNHEX('25D8A8D8B3D98525'); -- x25 is '%'
[28 Dec 2016 19:06] Shahid Anwar
Is this problem resolved now? My client is facing similar situation in his website search.
[5 Jan 2018 21:33] MySQL Verification Team
Please check with latest release e comment the results. Thanks.
[6 Feb 2018 1:00] Bugs System
No feedback was provided for this bug for over a month, so it is
being suspended automatically. If you are able to provide the
information that was originally requested, please do so and change
the status of the bug back to "Open".
[10 Jun 2018 7:14] Hendra Hendra
I hope this bug can be resolved soon.