Bug #88395 FULLTEXT boolean mode with the MeCab parser doesn't work as the manual describes (unlike the built-in parser)
Submitted: 8 Nov 2017 3:41
Modified: 8 Nov 2017 6:23
Reporter: Meiji Kimura
Status: Verified
Category: MySQL Server: FULLTEXT search
Severity: S2 (Serious)
Version: 5.7, 5.7.20
OS: Any
CPU Architecture: Any
Assigned to:
Tags: MeCab

[8 Nov 2017 3:41] Meiji Kimura
Description:
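MySQL 5.7 introduced the MeCab and ngram full-text parsers.

However, a boolean-mode phrase search with the MeCab parser does not behave as the manual describes (that is, as the built-in parser does).
The manual says the following, yet with MeCab the phrase "some words" also matches "some noise words".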

https://dev.mysql.com/doc/refman/5.7/en/fulltext-boolean.html

 '"some words"'

Find rows that contain the exact phrase “some words” (for example, rows that contain “some words of wisdom” but not “some noise words”). Note that the " characters that enclose the phrase are operator characters that delimit the phrase. They are not the quotation marks that enclose the search string itself. 

How to repeat:
(1) The built-in parser works as the manual describes.

drop table if exists `searches`;
CREATE TABLE `searches` (
 `id` bigint(20) unsigned NOT NULL AUTO_INCREMENT,
 `search_description` mediumtext NOT NULL,
 PRIMARY KEY (`id`),
 FULLTEXT KEY `idx_desc` (`search_description`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4;

insert into searches values(1 ,'some words of wisdom'),(2, 'some noise words'),(3, 'some, words of wisdom'),(4, 'words, some people'),(5, 'words, erros from some people');

select id,search_description from searches where match (search_description) against ('"some words"' in boolean mode); 

+----+-----------------------+
| id | search_description    |
+----+-----------------------+
|  1 | some words of wisdom  |
|  3 | some, words of wisdom |
+----+-----------------------+
2 rows in set (0.01 sec)

(2) The MeCab parser does not work as the manual describes.
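
Step (2) assumes the MeCab parser plugin is already loaded. A minimal sketch of how that could be checked first. The library file name below is an assumption (it is the name used on Linux builds and may differ on other platforms), and the plugin also expects mecab_rc_file to be configured at server startup.

```sql
-- Check whether the mecab full-text parser plugin is loaded.
SELECT PLUGIN_NAME, PLUGIN_STATUS
  FROM INFORMATION_SCHEMA.PLUGINS
 WHERE PLUGIN_NAME = 'mecab';

-- If no row is returned, load the plugin.
-- Assumption: Linux build where the plugin file is named libpluginmecab.so.
INSTALL PLUGIN mecab SONAME 'libpluginmecab.so';

-- Confirm that the MeCab configuration file is set (read-only at runtime).
SHOW GLOBAL VARIABLES LIKE 'mecab_rc_file';
```
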
drop table if exists `searches2`;
CREATE TABLE `searches2` (
 `id` bigint(20) unsigned NOT NULL AUTO_INCREMENT,
 `search_description` mediumtext NOT NULL,
 PRIMARY KEY (`id`),
 FULLTEXT KEY `idx_desc` (`search_description`) /*!50100 WITH PARSER `mecab` */
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4;

insert into searches2 values(1 ,'some words of wisdom'),(2, 'some noise words'),(3, 'some, words of wisdom'),(4, 'words, some people'),(5, 'words, erros from some people');

select id,search_description from searches2 where match (search_description) against ('"some words"' in boolean mode); 
+----+-----------------------+
| id | search_description    |
+----+-----------------------+
|  1 | some words of wisdom  |
|  2 | some noise words      |
|  3 | some, words of wisdom |
+----+-----------------------+
3 rows in set (0.02 sec)
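
To narrow the difference down, the two result sets can be compared directly, and the tokens stored for the MeCab index can be listed. This is only a sketch: the schema name `test` is an assumption and should be replaced with the schema that holds the tables, setting the global variable requires SUPER, and reading the INFORMATION_SCHEMA table requires PROCESS.

```sql
-- Rows matched through the mecab index but not through the built-in index.
-- Given the outputs reported above, this should leave only row 2.
SELECT id, search_description
  FROM searches2
 WHERE MATCH (search_description) AGAINST ('"some words"' IN BOOLEAN MODE)
   AND id NOT IN (
         SELECT id
           FROM searches
          WHERE MATCH (search_description) AGAINST ('"some words"' IN BOOLEAN MODE)
       );

-- List the tokens and positions recorded for the mecab-indexed table.
-- Assumption: the tables live in a schema named `test`.
SET GLOBAL innodb_ft_aux_table = 'test/searches2';

-- Recently inserted rows are held in the index cache until they are flushed.
-- DOC_ID is InnoDB's internal full-text document ID, not necessarily `id`.
SELECT WORD, DOC_ID, POSITION
  FROM INFORMATION_SCHEMA.INNODB_FT_INDEX_CACHE
 ORDER BY DOC_ID, POSITION;
```

Running the same token listing with innodb_ft_aux_table pointed at `searches` would show whether the two parsers tokenize the sample rows differently, or whether the difference lies in how the phrase is evaluated at query time.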

Suggested fix:
Either change full-text search with the MeCab parser so that it behaves compatibly with the built-in parser, or describe the difference between the built-in and MeCab parsers in the manual.

[8 Nov 2017 6:23] Umesh Shastry
Hello Meiji-San,

Thank you for the report.
Verified as described with 5.7.20.

Thanks,
Umesh