Bug #76405 FTS with MeCab or ngram parser can be set for MyISAM table
Submitted: 20 Mar 2015 8:17 Modified: 15 Apr 2015 14:22
Reporter: tsubasa tanaka (OCA)
Status: Closed
Category: MySQL Server: Documentation Severity: S3 (Non-critical)
Version: 5.7.6 OS: Any
Assigned to: Dachao Gao CPU Architecture: Any
Tags: doc, fts, fulltext, MeCab

[20 Mar 2015 8:17] tsubasa tanaka
Description:
The ngram and MeCab full-text parsers, which were introduced in 5.7.6-m16, are described as "InnoDB FullText Parser" in the documentation and release notes.

MySQL :: MySQL 5.7 Reference Manual :: 12.9.9 InnoDB MeCab Full-Text Parser Plugin https://dev.mysql.com/doc/refman/5.7/en/fulltext-search-mecab.html

MySQL :: MySQL 5.7 Reference Manual :: 12.9.8 InnoDB ngram Full-Text Parser https://dev.mysql.com/doc/refman/5.7/en/fulltext-search-ngram.html

MySQL :: MySQL 5.7 Release Notes :: Changes in MySQL 5.7.6 (2015-03-09, Milestone 16) http://dev.mysql.com/doc/relnotes/mysql/5.7/en/news-5-7-6.html

In practice, however, both parsers can also be set on a MyISAM table.
Is this a bug, or is the documentation incomplete?

How to repeat:
* ngram

mysql57> create table t1 (num int, val varchar(32)) Engine = MyISAM;
Query OK, 0 rows affected (0.02 sec)

mysql57> INSERT INTO t1 VALUES (1, 'one two three, いちにいさん');
Query OK, 1 row affected (0.01 sec)

mysql57> SELECT @@ngram_token_size;
+--------------------+
| @@ngram_token_size |
+--------------------+
|                  2 |
+--------------------+
1 row in set (0.00 sec)

mysql57> ALTER TABLE t1 ADD FULLTEXT KEY (val) WITH PARSER ngram;
Query OK, 1 row affected (0.02 sec)
Records: 1  Duplicates: 0  Warnings: 0

$ /usr/mysql/5.7.6/bin/myisam_ftdump -d t1 0
        0            0.8613265 e,
        0            0.8613265 ee
        0            0.8613265 hr
        0            0.8613265 ne
        0            0.8613265 on
        0            0.8613265 re
        0            0.8613265 th
        0            0.8613265 tw
        0            0.8613265 wo
        0            0.8613265 いさ
        0            0.8613265 いち
        0            0.8613265 さん
        0            0.8613265 ちに
        0            0.8613265 にい

The text is correctly tokenized by the 2-gram parser.
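
To make the myisam_ftdump listing above easier to check by hand, here is a minimal Python sketch of 2-gram tokenization. It is not MySQL's implementation: the function name ngram_tokens is hypothetical, and the sketch assumes that whitespace acts as a delimiter (so no token contains a space) and that duplicate tokens are collapsed. Sorting is by Unicode code point, which happens to match the order of the dump for this input; the real order depends on the index collation.

```python
def ngram_tokens(text, n=2):
    """Return the sorted, de-duplicated n-grams of each whitespace-separated chunk.

    Illustrative only: assumes whitespace is a hard delimiter, as the
    dump above suggests (no token spans a space).
    """
    tokens = set()
    for chunk in text.split():
        # Slide a window of n characters over the chunk.
        for i in range(len(chunk) - n + 1):
            tokens.add(chunk[i:i + n])
    # Code-point order; the server would order by the index collation instead.
    return sorted(tokens)


if __name__ == "__main__":
    # Same value that was inserted into t1 in the steps above.
    for token in ngram_tokens("one two three, いちにいさん"):
        print(token)
```

For the inserted row, this produces the same 14 tokens that myisam_ftdump reports, including "e," (punctuation is kept) and the five hiragana bigrams.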

* MeCab

mysql57> create table t1 (num int, val varchar(32)) Engine = MyISAM;
Query OK, 0 rows affected (0.02 sec)

mysql57> INSERT INTO t1 VALUES (1, 'one two three, いちにいさん');
Query OK, 1 row affected (0.01 sec)

mysql57> ALTER TABLE t1 ADD FULLTEXT KEY(val) WITH PARSER mecab;
Query OK, 1 row affected (0.16 sec)
Records: 1  Duplicates: 0  Warnings: 0

$ /usr/mysql/5.7.6/bin/myisam_ftdump -d t1 0
        0            1.4258175
        0            0.8421108 ,
        0            0.8421108 one
        0            0.8421108 three
        0            0.8421108 two
        0            0.8421108 いち
        0            0.8421108 にいさん

$ echo 'one two three, いちにいさん' | mecab
one     名詞,固有名詞,組織,*,*,*,*
two     名詞,一般,*,*,*,*,*
three   名詞,一般,*,*,*,*,*
,       名詞,サ変接続,*,*,*,*,*
いち    名詞,一般,*,*,*,*,いち,イチ,イチ
にいさん        名詞,一般,*,*,*,*,にいさん,ニイサン,ニーサン
EOS

The text is correctly parsed by MeCab: the myisam_ftdump result matches the output of the mecab command.

Suggested fix:
Is this a bug, or is the documentation incomplete?

[8 Apr 2015 11:22] Erlend Dahl
The FTS parser can be used by MyISAM. So this is a documentation issue.

[15 Apr 2015 14:22] Daniel Price
Posted by developer:
 
ngram and MeCab content has been revised to remove information suggesting that the ngram and MeCab full-text parser plugins are InnoDB-only features. The parsers are also supported with MyISAM tables.

Thank you for the bug report.