| Bug #82330 | Don't recursively-evaluate stopword after tokenize | ||
|---|---|---|---|
| Submitted: | 25 Jul 2016 5:55 | Modified: | 6 Apr 2024 9:15 |
| Reporter: | Tsubasa Tanaka (OCA) | Email Updates: | |
| Status: | Verified | Impact on me: | |
| Category: | MySQL Server: FULLTEXT search | Severity: | S1 (Critical) |
| Version: | 5.7.13, 5.7.22, 8.0.16, 8.3.0 | OS: | CentOS (6.6) |
| Assigned to: | CPU Architecture: | Any | |
| Tags: | fulltext, NGRAM | ||
[25 Jul 2016 5:55]
Tsubasa Tanaka
[25 Jul 2016 6:09]
MySQL Verification Team
Hello Tanaka-San, Thank you for the report and test case. Verified as described with 5.7.13 build. Thanks, Umesh
[28 Jun 2018 9:12]
MySQL Verification Team
Bug #91437 marked as duplicate of this one
[20 May 2019 3:46]
Tsubasa Tanaka
8.0.16 is affected too.
[6 Apr 2024 7:06]
Tsubasa Tanaka
I seem MySQL 8.3.0 has not been affected this issue.
mysql83 10> INSERT INTO t1 VALUES (1, '泣かないでbaby');
Query OK, 1 row affected (0.01 sec)
mysql83 10> SELECT * FROM t1 WHERE MATCH(val) AGAINST('baby' IN BOOLEAN MODE);
+-----+---------------------+
| num | val |
+-----+---------------------+
| 1 | 泣かないでbaby |
+-----+---------------------+
1 row in set (0.02 sec)
mysql83 10> SET GLOBAL innodb_ft_aux_table = 'd1/t1';
Query OK, 0 rows affected (0.00 sec)
mysql83 10> SELECT * FROM information_schema.INNODB_FT_INDEX_CACHE ORDER BY position;
+------+--------------+-------------+-----------+--------+----------+
| WORD | FIRST_DOC_ID | LAST_DOC_ID | DOC_COUNT | DOC_ID | POSITION |
+------+--------------+-------------+-----------+--------+----------+
| 泣 | 2 | 2 | 1 | 2 | 0 |
| b | 2 | 2 | 1 | 2 | 2 |
| か | 2 | 2 | 1 | 2 | 3 |
| な | 2 | 2 | 1 | 2 | 6 |
| い | 2 | 2 | 1 | 2 | 9 |
| で | 2 | 2 | 1 | 2 | 12 |
| b | 2 | 2 | 1 | 2 | 15 |
| y | 2 | 2 | 1 | 2 | 18 |
+------+--------------+-------------+-----------+--------+----------+
8 rows in set (0.01 sec)
[6 Apr 2024 9:15]
Tsubasa Tanaka
Sorry, above comment is wrong (My environment is set ngram_token_size=1
When ngram_token_size=2, this issue can be reproduced as of 8.3.0
mysql83 8> SELECT @@ngram_token_size;
+--------------------+
| @@ngram_token_size |
+--------------------+
| 2 |
+--------------------+
1 row in set (0.00 sec)
mysql83 8> CREATE TABLE t1 (num serial, val varchar(32), FULLTEXT KEY fts_with_ngram (val) WITH PARSER ngram);
Query OK, 0 rows affected (0.12 sec)
mysql83 8> INSERT INTO t1 VALUES (1, '泣かないでbaby');
Query OK, 1 row affected (0.01 sec)
mysql83 8> SELECT * FROM t1 WHERE MATCH(val) AGAINST('baby' IN BOOLEAN MODE);
Empty set (0.00 sec)
mysql83 8> SET GLOBAL innodb_ft_aux_table = 'd1/t1';
Query OK, 0 rows affected (0.00 sec)
mysql83 8> SELECT * FROM information_schema.INNODB_FT_INDEX_CACHE ORDER BY position;
+--------+--------------+-------------+-----------+--------+----------+
| WORD | FIRST_DOC_ID | LAST_DOC_ID | DOC_COUNT | DOC_ID | POSITION |
+--------+--------------+-------------+-----------+--------+----------+
| 泣か | 2 | 2 | 1 | 2 | 0 |
| かな | 2 | 2 | 1 | 2 | 3 |
| ない | 2 | 2 | 1 | 2 | 6 |
| いで | 2 | 2 | 1 | 2 | 9 |
| でb | 2 | 2 | 1 | 2 | 12 |
+--------+--------------+-------------+-----------+--------+----------+
5 rows in set (0.00 sec)
[25 Oct 2024 21:15]
Yoshiaki Yamasaki
This bug also reproduced in 8.4.3.
mysql> select version();
+-----------+
| version() |
+-----------+
| 8.4.3 |
+-----------+
1 row in set (0.00 sec)
mysql> SELECT @@ngram_token_size;
+--------------------+
| @@ngram_token_size |
+--------------------+
| 2 |
+--------------------+
1 row in set (0.00 sec)
mysql> CREATE TABLE t1 (num serial, val varchar(32), FULLTEXT KEY fts_with_ngram (val) WITH PARSER ngram);
Query OK, 0 rows affected (0.11 sec)
mysql> INSERT INTO t1 VALUES (1, '泣かないでbaby');
Query OK, 1 row affected (0.00 sec)
mysql> SELECT * FROM t1 WHERE MATCH(val) AGAINST('baby' IN BOOLEAN MODE);
Empty set (0.00 sec)
mysql> SET GLOBAL innodb_ft_aux_table = 'd1/t1';
Query OK, 0 rows affected (0.00 sec)
mysql> SELECT * FROM information_schema.INNODB_FT_INDEX_CACHE ORDER BY position;
+--------+--------------+-------------+-----------+--------+----------+
| WORD | FIRST_DOC_ID | LAST_DOC_ID | DOC_COUNT | DOC_ID | POSITION |
+--------+--------------+-------------+-----------+--------+----------+
| 泣か | 2 | 2 | 1 | 2 | 0 |
| かな | 2 | 2 | 1 | 2 | 3 |
| ない | 2 | 2 | 1 | 2 | 6 |
| いで | 2 | 2 | 1 | 2 | 9 |
| でb | 2 | 2 | 1 | 2 | 12 |
+--------+--------------+-------------+-----------+--------+----------+
5 rows in set (0.00 sec)
