| Bug #91437 | FULLTEXT index with ngram parser returns unexpected reults | ||
|---|---|---|---|
| Submitted: | 27 Jun 2018 12:04 | Modified: | 28 Jun 2018 9:12 |
| Reporter: | Sylvain Estevez | Email Updates: | |
| Status: | Duplicate | Impact on me: | |
| Category: | MySQL Server: FULLTEXT search | Severity: | S2 (Serious) |
| Version: | 5.7.19, 5.7.22 | OS: | MacOS |
| Assigned to: | CPU Architecture: | Any | |
| Tags: | fulltext, INDEX, NGRAM, search | ||
[27 Jun 2018 13:34]
Sylvain Estevez
Given the table in the reproduction steps, and adding a row with `david` as email value: searching for `david` does not return any results.
[28 Jun 2018 1:24]
Tsubasa Tanaka
I think this relates Bug#82330. Workaround is setting empty stopword table. mysql57> CREATE TABLE stopwords (value varchar(255) NOT NULL PRIMARY KEY); Query OK, 0 rows affected (0.16 sec) mysql57> SET GLOBAL innodb_ft_server_stopword_table = 'd1/stopwords'; Query OK, 0 rows affected (0.02 sec) https://bugs.mysql.com/bug.php?id=82330 (From Japan MySQL User Group)
[28 Jun 2018 9:12]
MySQL Verification Team
Hello Sylvain, Thank you for the report and test case. This is most likely duplicate of Bug #82330, please see Bug #82330 As Tanaka-San suggested, work around is ##### mysql> CREATE TABLE stopwords (value varchar(255) NOT NULL PRIMARY KEY); Query OK, 0 rows affected (0.01 sec) mysql> SET GLOBAL innodb_ft_server_stopword_table = 'd1/stopwords'; Query OK, 0 rows affected (0.00 sec) mysql> CREATE TABLE `ngram_test` ( -> email VARCHAR(150) NOT NULL, -> PRIMARY KEY (`email`), -> FULLTEXT KEY `email_ngram` (`email`) /*!50100 WITH PARSER `ngram` */ -> ); Query OK, 0 rows affected (0.04 sec) mysql> INSERT INTO `ngram_test` (`email`) VALUES ('nicolas.lich@toto.com'), ('Katlynn.Connelly@gmail.com'), ('nicolasch@foo.com'), ('alexbraico@me.com'), ('sylvain+old@flux.com'), ('6@flux.com'); Query OK, 6 rows affected (0.01 sec) Records: 6 Duplicates: 0 Warnings: 0 mysql> SELECT `email` FROM `ngram_test` WHERE ((MATCH(`email`) AGAINST('+nico' IN BOOLEAN MODE))); +-----------------------+ | email | +-----------------------+ | nicolas.lich@toto.com | | nicolasch@foo.com | +-----------------------+ 2 rows in set (0.00 sec) mysql> SELECT `email` FROM `ngram_test` WHERE ((MATCH(`email`) AGAINST('+sylv' IN BOOLEAN MODE))); +----------------------+ | email | +----------------------+ | sylvain+old@flux.com | +----------------------+ 1 row in set (0.00 sec) Thanks, Umesh

Description: We use a fulltext index with ngram parser to serach for partial matches on user emails. While the featire works very well, we recently ran into a case where the search returns unexpected results. With the following emails in the DB: - 'nicolas.lich@toto.com' - 'Katlynn.Connelly@gmail.com' - 'nicolasch@foo.com' - 'alexbraico@me.com' - 'sylvain+old@flux.com' - '6@flux.com' A query like `SELECT `email` FROM `ngram_test` WHERE ((MATCH(`email`) AGAINST('+nico' IN BOOLEAN MODE)));` returns all the users, which is incorrect. A query like `SELECT `email` FROM `ngram_test` WHERE ((MATCH(`email`) AGAINST('+sylv' IN BOOLEAN MODE)));` returns only one user, which is correct. See below for more details on reproducing. How to repeat: ``` CREATE TABLE `ngram_test` ( email VARCHAR(150) NOT NULL, PRIMARY KEY (`email`), FULLTEXT KEY `email_ngram` (`email`) /*!50100 WITH PARSER `ngram` */ ); INSERT INTO `ngram_test` (`email`) VALUES ('nicolas.lich@toto.com'), ('Katlynn.Connelly@gmail.com'), ('nicolasch@foo.com'), ('alexbraico@me.com'), ('sylvain+old@flux.com'), ('6@flux.com'); /* This should match values that contain 'ni' AND 'ic' AND 'co' */ SELECT `email`, (MATCH(`email`) AGAINST('+nico' IN BOOLEAN MODE)) as `score` FROM `ngram_test` WHERE ((MATCH(`email`) AGAINST('+nico' IN BOOLEAN MODE))); ``` Additionally: > SHOW VARIABLES WHERE Variable_name LIKE '%ngram%'; { "data": [ { "Variable_name": "ngram_token_size", "Value": "2" } ] } > SHOW VARIABLES WHERE Variable_name LIKE '%ft%'; { "data": [ { "Variable_name": "ft_boolean_syntax", "Value": "+ -><()~*:\"\"&|" }, { "Variable_name": "ft_max_word_len", "Value": "84" }, { "Variable_name": "ft_min_word_len", "Value": "4" }, { "Variable_name": "ft_query_expansion_limit", "Value": "20" }, { "Variable_name": "ft_stopword_file", "Value": "(built-in)" }, { "Variable_name": "innodb_ft_aux_table", "Value": "auth_service/user" }, { "Variable_name": "innodb_ft_cache_size", "Value": "8000000" }, { "Variable_name": "innodb_ft_enable_diag_print", "Value": "OFF" }, { "Variable_name": "innodb_ft_enable_stopword", "Value": "ON" }, { "Variable_name": "innodb_ft_max_token_size", "Value": "84" }, { "Variable_name": "innodb_ft_min_token_size", "Value": "3" }, { "Variable_name": "innodb_ft_num_word_optimize", "Value": "2000" }, { "Variable_name": "innodb_ft_result_cache_limit", "Value": "2000000000" }, { "Variable_name": "innodb_ft_server_stopword_table", "Value": "" }, { "Variable_name": "innodb_ft_sort_pll_degree", "Value": "2" }, { "Variable_name": "innodb_ft_total_cache_size", "Value": "640000000" }, { "Variable_name": "innodb_ft_user_stopword_table", "Value": "" } ] }