Bug #82330 | Don't recursively-evaluate stopword after tokenize | |
---|---|---|---
Submitted: | 25 Jul 2016 5:55 | Modified: | 6 Apr 9:15 |
Reporter: | Tsubasa Tanaka (OCA) | Email Updates: | |
Status: | Verified | Impact on me: | |
Category: | MySQL Server: FULLTEXT search | Severity: | S1 (Critical) |
Version: | 5.7.13, 5.7.22, 8.0.16, 8.3.0 | OS: | CentOS (6.6) |
Assigned to: | | CPU Architecture: | Any
Tags: | fulltext, NGRAM | |
[25 Jul 2016 5:55]
Tsubasa Tanaka
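A minimal reproduction sketch of the reported problem, assuming the same setup shown in the later comments on this report (a schema named d1, table t1 with an ngram FULLTEXT index, and the default ngram_token_size=2): the boolean-mode search for 'baby' returns no rows even though the indexed string contains it.

-- Repro sketch; assumes ngram_token_size=2 (the server default) and a schema named d1.
CREATE DATABASE IF NOT EXISTS d1;
USE d1;
CREATE TABLE t1 (
    num serial,
    val varchar(32),
    FULLTEXT KEY fts_with_ngram (val) WITH PARSER ngram
);
INSERT INTO t1 VALUES (1, '泣かないでbaby');

-- Expected: the row is returned. Actual (per the later comments): empty set.
SELECT * FROM t1 WHERE MATCH(val) AGAINST('baby' IN BOOLEAN MODE);

-- Inspect which tokens were actually stored in the FULLTEXT index cache.
SET GLOBAL innodb_ft_aux_table = 'd1/t1';
SELECT * FROM information_schema.INNODB_FT_INDEX_CACHE ORDER BY position;

In the index cache dumps shown later, only the CJK bigrams and the mixed token 'でb' appear; the purely Latin bigrams from 'baby' ('ba', 'ab', 'by') are never indexed, which lines up with the empty search result.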
[25 Jul 2016 6:09]
MySQL Verification Team
Hello Tanaka-San,

Thank you for the report and test case.
Verified as described with 5.7.13 build.

Thanks,
Umesh
[28 Jun 2018 9:12]
MySQL Verification Team
Bug #91437 marked as duplicate of this one
[20 May 2019 3:46]
Tsubasa Tanaka
8.0.16 is affected too.
[6 Apr 7:06]
Tsubasa Tanaka
It seems MySQL 8.3.0 is not affected by this issue.

mysql83 10> INSERT INTO t1 VALUES (1, '泣かないでbaby');
Query OK, 1 row affected (0.01 sec)

mysql83 10> SELECT * FROM t1 WHERE MATCH(val) AGAINST('baby' IN BOOLEAN MODE);
+-----+---------------------+
| num | val |
+-----+---------------------+
| 1 | 泣かないでbaby |
+-----+---------------------+
1 row in set (0.02 sec)

mysql83 10> SET GLOBAL innodb_ft_aux_table = 'd1/t1';
Query OK, 0 rows affected (0.00 sec)

mysql83 10> SELECT * FROM information_schema.INNODB_FT_INDEX_CACHE ORDER BY position;
+------+--------------+-------------+-----------+--------+----------+
| WORD | FIRST_DOC_ID | LAST_DOC_ID | DOC_COUNT | DOC_ID | POSITION |
+------+--------------+-------------+-----------+--------+----------+
| 泣 | 2 | 2 | 1 | 2 | 0 |
| b | 2 | 2 | 1 | 2 | 2 |
| か | 2 | 2 | 1 | 2 | 3 |
| な | 2 | 2 | 1 | 2 | 6 |
| い | 2 | 2 | 1 | 2 | 9 |
| で | 2 | 2 | 1 | 2 | 12 |
| b | 2 | 2 | 1 | 2 | 15 |
| y | 2 | 2 | 1 | 2 | 18 |
+------+--------------+-------------+-----------+--------+----------+
8 rows in set (0.01 sec)
[6 Apr 9:15]
Tsubasa Tanaka
Sorry, the above comment is wrong (my environment had ngram_token_size=1 set).
When ngram_token_size=2, this issue can still be reproduced as of 8.3.0.

mysql83 8> SELECT @@ngram_token_size;
+--------------------+
| @@ngram_token_size |
+--------------------+
| 2 |
+--------------------+
1 row in set (0.00 sec)

mysql83 8> CREATE TABLE t1 (num serial, val varchar(32), FULLTEXT KEY fts_with_ngram (val) WITH PARSER ngram);
Query OK, 0 rows affected (0.12 sec)

mysql83 8> INSERT INTO t1 VALUES (1, '泣かないでbaby');
Query OK, 1 row affected (0.01 sec)

mysql83 8> SELECT * FROM t1 WHERE MATCH(val) AGAINST('baby' IN BOOLEAN MODE);
Empty set (0.00 sec)

mysql83 8> SET GLOBAL innodb_ft_aux_table = 'd1/t1';
Query OK, 0 rows affected (0.00 sec)

mysql83 8> SELECT * FROM information_schema.INNODB_FT_INDEX_CACHE ORDER BY position;
+--------+--------------+-------------+-----------+--------+----------+
| WORD | FIRST_DOC_ID | LAST_DOC_ID | DOC_COUNT | DOC_ID | POSITION |
+--------+--------------+-------------+-----------+--------+----------+
| 泣か | 2 | 2 | 1 | 2 | 0 |
| かな | 2 | 2 | 1 | 2 | 3 |
| ない | 2 | 2 | 1 | 2 | 6 |
| いで | 2 | 2 | 1 | 2 | 9 |
| でb | 2 | 2 | 1 | 2 | 12 |
+--------+--------------+-------------+-----------+--------+----------+
5 rows in set (0.00 sec)
[25 Oct 21:15]
Yoshiaki Yamasaki
This bug is also reproducible in 8.4.3.

mysql> select version();
+-----------+
| version() |
+-----------+
| 8.4.3 |
+-----------+
1 row in set (0.00 sec)

mysql> SELECT @@ngram_token_size;
+--------------------+
| @@ngram_token_size |
+--------------------+
| 2 |
+--------------------+
1 row in set (0.00 sec)

mysql> CREATE TABLE t1 (num serial, val varchar(32), FULLTEXT KEY fts_with_ngram (val) WITH PARSER ngram);
Query OK, 0 rows affected (0.11 sec)

mysql> INSERT INTO t1 VALUES (1, '泣かないでbaby');
Query OK, 1 row affected (0.00 sec)

mysql> SELECT * FROM t1 WHERE MATCH(val) AGAINST('baby' IN BOOLEAN MODE);
Empty set (0.00 sec)

mysql> SET GLOBAL innodb_ft_aux_table = 'd1/t1';
Query OK, 0 rows affected (0.00 sec)

mysql> SELECT * FROM information_schema.INNODB_FT_INDEX_CACHE ORDER BY position;
+--------+--------------+-------------+-----------+--------+----------+
| WORD | FIRST_DOC_ID | LAST_DOC_ID | DOC_COUNT | DOC_ID | POSITION |
+--------+--------------+-------------+-----------+--------+----------+
| 泣か | 2 | 2 | 1 | 2 | 0 |
| かな | 2 | 2 | 1 | 2 | 3 |
| ない | 2 | 2 | 1 | 2 | 6 |
| いで | 2 | 2 | 1 | 2 | 9 |
| でb | 2 | 2 | 1 | 2 | 12 |
+--------+--------------+-------------+-----------+--------+----------+
5 rows in set (0.00 sec)