Bug #117035 fulltext count(*) where match against('delete') found rows that not exist
Submitted: 23 Dec 2024 6:53 Modified: 7 Jan 12:42
Reporter: Zhenye Lin Email Updates:
Status: Not a Bug Impact on me:
None 
Category:MySQL Server: FULLTEXT search Severity:S3 (Non-critical)
Version:8.0.40 OS:Any
Assigned to: CPU Architecture:Any

[23 Dec 2024 6:53] Zhenye Lin
Description:
After inserting some data into a table with a fulltext index, when querying with the keyword 'delete' through the fulltext index, data was returned, but in fact, there is no content with 'delete' in the table.

mysql> select count(*) from test.t;
+----------+
| count(*) |
+----------+
|    10024 |
+----------+
1 row in set (0.03 sec)

mysql> select count(*) from test.t where match(content) against ('delete');
+----------+
| count(*) |
+----------+
|     7176 |
+----------+
1 row in set (0.02 sec)

mysql> select * from test.t where match(content) against ('delete');
+-------+-----------------------+
| id    | content               |
+-------+-----------------------+
|  1436 | 3112336275 elderberry |
|  1452 | 0150978814 elderberry |
|  1492 | 7073588631 elderberry |
|  1499 | 8019551115 elderberry |
|  1518 | 8172327062 elderberry |
|  1521 | 2594195326 elderberry |
|  1524 | 5947236864 elderberry |
|  1525 | 5457509870 elderberry |
|  1526 | 9677026647 elderberry |
|  1527 | 1644922500 elderberry |
|  1528 | 7379734130 elderberry |
|  1529 | 1700071738 elderberry |
...

How to repeat:
1. create test table
CREATE TABLE `t` (   `id` int NOT NULL AUTO_INCREMENT,   `content` text NOT NULL,   PRIMARY KEY (`id`),   FULLTEXT KEY `content_fulltext` (`content`) /*!50100 WITH PARSER `ngram` */ ) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_0900_ai_ci;

2. insert some data that `content` without 'delete'

3. query with fulltext index
select count(*) from test.t where match(content) against ('delete');
[7 Jan 12:42] MySQL Verification Team
Hello Zhenye Lin,

Thank you for the report and feedback.
IMHO this is not a bug but expected behaviour. 
Please see https://dev.mysql.com/doc/refman/8.0/en/fulltext-search-ngram.html for more details on this.

--
bin/mysql -uroot -S/tmp/mysql.sock
Welcome to the MySQL monitor.  Commands end with ; or \g.
Your MySQL connection id is 7
Server version: 8.0.40 MySQL Community Server - GPL

Copyright (c) 2000, 2024, Oracle and/or its affiliates.

Oracle is a registered trademark of Oracle Corporation and/or its
affiliates. Other names may be trademarks of their respective
owners.

Type 'help;' or '\h' for help. Type '\c' to clear the current input statement.

mysql> show global variables like 'ngram_%';
+------------------+-------+
| Variable_name    | Value |
+------------------+-------+
| ngram_token_size | 2     |
+------------------+-------+
1 row in set (0.00 sec)

mysql> create database test;
Query OK, 1 row affected (0.00 sec)

mysql> use test
Database changed
mysql> CREATE TABLE `t` (   `id` int NOT NULL AUTO_INCREMENT,   `content` text NOT NULL,   PRIMARY KEY (`id`),   FULLTEXT KEY `content_fulltext` (`content`) /*!50100 WITH PARSER `ngram` */ ) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_0900_ai_ci;
Query OK, 0 rows affected (0.15 sec)

mysql> SET GLOBAL innodb_ft_aux_table="test/t";
Query OK, 0 rows affected (0.00 sec)

mysql> INSERT INTO t (content) VALUES("apple"), ("banana"), ("cherry"), ("date"), ("elderberry");
Query OK, 5 rows affected (0.00 sec)
Records: 5  Duplicates: 0  Warnings: 0

mysql> select * from t;
+----+------------+
| id | content    |
+----+------------+
|  1 | apple      |
|  2 | banana     |
|  3 | cherry     |
|  4 | date       |
|  5 | elderberry |
+----+------------+
5 rows in set (0.00 sec)

mysql> select count(*) from test.t where match(content) against ('delete');
+----------+
| count(*) |
+----------+
|        3 |
+----------+
1 row in set (0.00 sec)

IMHO, phrase searches are converted to ngram phrase searches. For example, in the above case - The search phrase “delete” is converted to “de el le et te”, which returns documents containing “apple”, "date" and “elderberry”.

mysql> select * from test.t where match(content) against ('delete');
+----+------------+
| id | content    |
+----+------------+
|  1 | apple      |
|  4 | date       |
|  5 | elderberry |
+----+------------+
3 rows in set (0.00 sec)

mysql> SELECT * FROM INFORMATION_SCHEMA.INNODB_FT_INDEX_CACHE ORDER BY doc_id, position;
+------+--------------+-------------+-----------+--------+----------+
| WORD | FIRST_DOC_ID | LAST_DOC_ID | DOC_COUNT | DOC_ID | POSITION |
+------+--------------+-------------+-----------+--------+----------+
| pp   |            2 |           2 |         1 |      2 |        1 |
| pl   |            2 |           2 |         1 |      2 |        2 |
| le   |            2 |           2 |         1 |      2 |        3 |
| ch   |            4 |           4 |         1 |      4 |        0 |
| he   |            4 |           4 |         1 |      4 |        1 |
| er   |            4 |           6 |         2 |      4 |        2 |
| rr   |            4 |           6 |         2 |      4 |        3 |
| ry   |            4 |           6 |         2 |      4 |        4 |
| te   |            5 |           5 |         1 |      5 |        2 |
| el   |            6 |           6 |         1 |      6 |        0 |
| ld   |            6 |           6 |         1 |      6 |        1 |
| er   |            4 |           6 |         2 |      6 |        3 |
| er   |            4 |           6 |         2 |      6 |        3 |
| rb   |            6 |           6 |         1 |      6 |        4 |
| rr   |            4 |           6 |         2 |      6 |        7 |
| ry   |            4 |           6 |         2 |      6 |        8 |
+------+--------------+-------------+-----------+--------+----------+
16 rows in set (0.00 sec)

For ngram_token_size=3, 

mysql> SET GLOBAL innodb_ft_aux_table="test/t";
Query OK, 0 rows affected (0.00 sec)

mysql> INSERT INTO t (content) VALUES("apple"), ("banana"), ("cherry"), ("date"), ("elderberry");
Query OK, 5 rows affected (0.00 sec)
Records: 5  Duplicates: 0  Warnings: 0

mysql> SELECT * FROM INFORMATION_SCHEMA.INNODB_FT_INDEX_CACHE ORDER BY doc_id, position;
+------+--------------+-------------+-----------+--------+----------+
| WORD | FIRST_DOC_ID | LAST_DOC_ID | DOC_COUNT | DOC_ID | POSITION |
+------+--------------+-------------+-----------+--------+----------+
| ppl  |            2 |           2 |         1 |      2 |        1 |
| ple  |            2 |           2 |         1 |      2 |        2 |
| che  |            4 |           4 |         1 |      4 |        0 |
| her  |            4 |           4 |         1 |      4 |        1 |
| err  |            4 |           6 |         2 |      4 |        2 |
| rry  |            4 |           6 |         2 |      4 |        3 |
| eld  |            6 |           6 |         1 |      6 |        0 |
| erb  |            6 |           6 |         1 |      6 |        3 |
| err  |            4 |           6 |         2 |      6 |        6 |
| rry  |            4 |           6 |         2 |      6 |        7 |
+------+--------------+-------------+-----------+--------+----------+
10 rows in set (0.00 sec)

mysql> select * from test.t where match(content) against ('delete');
Empty set (0.01 sec)

mysql> show global variables like 'ngram_%';
+------------------+-------+
| Variable_name    | Value |
+------------------+-------+
| ngram_token_size | 3     |
+------------------+-------+
1 row in set (0.00 sec)

regards,
Umesh