Bug #90229 Bug #78048 has not been fixed properly for ngram indexes
Submitted: 27 Mar 2018 14:15
Modified: 27 Mar 2018 14:55
Reporter: Sveta Smirnova (OCA)
Status: Verified
Category: MySQL Server: InnoDB storage engine
Severity: S3 (Non-critical)
Version: 5.7.21
OS: Any
Assigned to:
CPU Architecture: Any
Tags: Contribution

[27 Mar 2018 14:15] Sveta Smirnova
Description:
Bug #78048 has not been fixed properly for ngram indexes.

It cannot find certain strings that contain uppercase letters.

How to repeat:
CREATE TABLE `ngram_simple` ( 
`i` int(11) NOT NULL AUTO_INCREMENT, 
`txt` text COLLATE utf8mb4_bin NOT NULL, 
PRIMARY KEY (`i`), 
FULLTEXT KEY `fx_txts` (`txt`) ) ENGINE=InnoDB AUTO_INCREMENT=10 DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_bin;
insert into ngram_simple (txt) values ('CompP&C01');
insert into ngram_simple (txt) values ('CompP&C02');
insert into ngram_simple (txt) values ('CompP&C03');
insert into ngram_simple (txt) values ('CompP&C04');
insert into ngram_simple (txt) values ('CompP&C05');
insert into ngram_simple (txt) values ('CompP&C06');
insert into ngram_simple (txt) values ('CompP&c04');
insert into ngram_simple (txt) values ('abc*efg');
insert into ngram_simple (txt) values ('abc&efg');
insert into ngram_simple (txt) values ('abC&Efg');
select * from ngram_simple where match(txt) against ('abc' in boolean mode);
i	txt
17	abc*efg
18	abc&efg
select * from ngram_simple where match(txt) against ('abC' in boolean mode);
i	txt
19	abC&Efg
select * from ngram_simple where match(txt) against ('C04' in boolean mode);
i	txt
13	CompP&C04
select * from ngram_simple where match(txt) against ('c04' in boolean mode);
i	txt
16	CompP&c04
alter table ngram_simple drop key fx_txts;
alter table ngram_simple add FULLTEXT KEY `fx_txts` (`txt`) with parser ngram;
optimize table ngram_simple;
Table	Op	Msg_type	Msg_text
test.ngram_simple	optimize	note	Table does not support optimize, doing recreate + analyze instead
test.ngram_simple	optimize	status	OK
select * from ngram_simple where match(txt) against ('abc' in boolean mode);
i	txt
17	abc*efg
18	abc&efg
select * from ngram_simple where match(txt) against ('abC' in boolean mode);
i	txt
19	abC&Efg
select * from ngram_simple where match(txt) against ('C04' in boolean mode);
i	txt
select * from ngram_simple where match(txt) against ('c04' in boolean mode);
i	txt
16	CompP&c04

Note the different results when searching for C04: with the default full-text parser the query returns row 13 (CompP&C04), but with the ngram full-text parser it returns nothing.
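
To narrow down where the match is lost, it may help to look at the tokens InnoDB actually stored for the ngram index. The statements below are a minimal diagnostic sketch, not part of the original test case. They assume the table lives in the `test` schema (as the OPTIMIZE output suggests) and that the session has the privileges needed to set a global variable and read these INFORMATION_SCHEMA tables.

```sql
-- Point the InnoDB full-text INFORMATION_SCHEMA tables at the table under test.
-- Assumption: schema name is `test`; the value format is 'db_name/table_name'.
SET GLOBAL innodb_ft_aux_table = 'test/ngram_simple';

-- Token size used by the ngram parser (read-only at runtime).
SHOW VARIABLES LIKE 'ngram_token_size';

-- Tokens already written to the on-disk auxiliary index tables.
SELECT WORD, DOC_ID, POSITION
  FROM INFORMATION_SCHEMA.INNODB_FT_INDEX_TABLE
 ORDER BY WORD, DOC_ID, POSITION;

-- Tokens for recently inserted rows that are still in the in-memory cache.
SELECT WORD, DOC_ID, POSITION
  FROM INFORMATION_SCHEMA.INNODB_FT_INDEX_CACHE
 ORDER BY WORD, DOC_ID, POSITION;
```

Comparing the stored WORD values with the letter case used in the search term ('C04' versus 'c04') should show whether the difference arises when the index is built or when the query is processed.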
[27 Mar 2018 14:15] Sveta Smirnova
test case for MTR

Attachment: PS-3928.test (application/octet-stream, text), 2.02 KiB.

[27 Mar 2018 14:55] Umesh Shastry
Hello Sveta,

Thank you for the report and test case.

Thanks,
Umesh
[30 Mar 2018 7:58] Zsolt Parragi
Fix and testcase for 5.7

(*) I confirm the code being submitted is offered under the terms of the OCA, and that I am authorized to contribute it.

Contribution: bug78048-5.7.patch (text/x-patch), 5.67 KiB.

[13 Jun 2018 12:54] Laurynas Biveinis
Bug 90229 fix for 8.0.11

(*) I confirm the code being submitted is offered under the terms of the OCA, and that I am authorized to contribute it.

Contribution: bug90229-8.0.11.patch (application/octet-stream, text), 5.39 KiB.

[14 Jun 2018 4:59] Umesh Shastry
Thank you for the contributions!

Regards,
Umesh