| Bug #117035 | fulltext count(*) where match against('delete') found rows that not exist | ||
|---|---|---|---|
| Submitted: | 23 Dec 2024 6:53 | Modified: | 7 Jan 12:42 |
| Reporter: | Zhenye Lin | Email Updates: | |
| Status: | Not a Bug | Impact on me: | |
| Category: | MySQL Server: FULLTEXT search | Severity: | S3 (Non-critical) |
| Version: | 8.0.40 | OS: | Any |
| Assigned to: | CPU Architecture: | Any | |
[7 Jan 12:42]
MySQL Verification Team
Hello Zhenye Lin, Thank you for the report and feedback. IMHO this is not a bug but expected behaviour. Please see https://dev.mysql.com/doc/refman/8.0/en/fulltext-search-ngram.html for more details on this. -- bin/mysql -uroot -S/tmp/mysql.sock Welcome to the MySQL monitor. Commands end with ; or \g. Your MySQL connection id is 7 Server version: 8.0.40 MySQL Community Server - GPL Copyright (c) 2000, 2024, Oracle and/or its affiliates. Oracle is a registered trademark of Oracle Corporation and/or its affiliates. Other names may be trademarks of their respective owners. Type 'help;' or '\h' for help. Type '\c' to clear the current input statement. mysql> show global variables like 'ngram_%'; +------------------+-------+ | Variable_name | Value | +------------------+-------+ | ngram_token_size | 2 | +------------------+-------+ 1 row in set (0.00 sec) mysql> create database test; Query OK, 1 row affected (0.00 sec) mysql> use test Database changed mysql> CREATE TABLE `t` ( `id` int NOT NULL AUTO_INCREMENT, `content` text NOT NULL, PRIMARY KEY (`id`), FULLTEXT KEY `content_fulltext` (`content`) /*!50100 WITH PARSER `ngram` */ ) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_0900_ai_ci; Query OK, 0 rows affected (0.15 sec) mysql> SET GLOBAL innodb_ft_aux_table="test/t"; Query OK, 0 rows affected (0.00 sec) mysql> INSERT INTO t (content) VALUES("apple"), ("banana"), ("cherry"), ("date"), ("elderberry"); Query OK, 5 rows affected (0.00 sec) Records: 5 Duplicates: 0 Warnings: 0 mysql> select * from t; +----+------------+ | id | content | +----+------------+ | 1 | apple | | 2 | banana | | 3 | cherry | | 4 | date | | 5 | elderberry | +----+------------+ 5 rows in set (0.00 sec) mysql> select count(*) from test.t where match(content) against ('delete'); +----------+ | count(*) | +----------+ | 3 | +----------+ 1 row in set (0.00 sec) IMHO, phrase searches are converted to ngram phrase searches. For example, in the above case - The search phrase “delete” is converted to “de el le et te”, which returns documents containing “apple”, "date" and “elderberry”. mysql> select * from test.t where match(content) against ('delete'); +----+------------+ | id | content | +----+------------+ | 1 | apple | | 4 | date | | 5 | elderberry | +----+------------+ 3 rows in set (0.00 sec) mysql> SELECT * FROM INFORMATION_SCHEMA.INNODB_FT_INDEX_CACHE ORDER BY doc_id, position; +------+--------------+-------------+-----------+--------+----------+ | WORD | FIRST_DOC_ID | LAST_DOC_ID | DOC_COUNT | DOC_ID | POSITION | +------+--------------+-------------+-----------+--------+----------+ | pp | 2 | 2 | 1 | 2 | 1 | | pl | 2 | 2 | 1 | 2 | 2 | | le | 2 | 2 | 1 | 2 | 3 | | ch | 4 | 4 | 1 | 4 | 0 | | he | 4 | 4 | 1 | 4 | 1 | | er | 4 | 6 | 2 | 4 | 2 | | rr | 4 | 6 | 2 | 4 | 3 | | ry | 4 | 6 | 2 | 4 | 4 | | te | 5 | 5 | 1 | 5 | 2 | | el | 6 | 6 | 1 | 6 | 0 | | ld | 6 | 6 | 1 | 6 | 1 | | er | 4 | 6 | 2 | 6 | 3 | | er | 4 | 6 | 2 | 6 | 3 | | rb | 6 | 6 | 1 | 6 | 4 | | rr | 4 | 6 | 2 | 6 | 7 | | ry | 4 | 6 | 2 | 6 | 8 | +------+--------------+-------------+-----------+--------+----------+ 16 rows in set (0.00 sec) For ngram_token_size=3, mysql> SET GLOBAL innodb_ft_aux_table="test/t"; Query OK, 0 rows affected (0.00 sec) mysql> INSERT INTO t (content) VALUES("apple"), ("banana"), ("cherry"), ("date"), ("elderberry"); Query OK, 5 rows affected (0.00 sec) Records: 5 Duplicates: 0 Warnings: 0 mysql> SELECT * FROM INFORMATION_SCHEMA.INNODB_FT_INDEX_CACHE ORDER BY doc_id, position; +------+--------------+-------------+-----------+--------+----------+ | WORD | FIRST_DOC_ID | LAST_DOC_ID | DOC_COUNT | DOC_ID | POSITION | +------+--------------+-------------+-----------+--------+----------+ | ppl | 2 | 2 | 1 | 2 | 1 | | ple | 2 | 2 | 1 | 2 | 2 | | che | 4 | 4 | 1 | 4 | 0 | | her | 4 | 4 | 1 | 4 | 1 | | err | 4 | 6 | 2 | 4 | 2 | | rry | 4 | 6 | 2 | 4 | 3 | | eld | 6 | 6 | 1 | 6 | 0 | | erb | 6 | 6 | 1 | 6 | 3 | | err | 4 | 6 | 2 | 6 | 6 | | rry | 4 | 6 | 2 | 6 | 7 | +------+--------------+-------------+-----------+--------+----------+ 10 rows in set (0.00 sec) mysql> select * from test.t where match(content) against ('delete'); Empty set (0.01 sec) mysql> show global variables like 'ngram_%'; +------------------+-------+ | Variable_name | Value | +------------------+-------+ | ngram_token_size | 3 | +------------------+-------+ 1 row in set (0.00 sec) regards, Umesh

Description: After inserting some data into a table with a fulltext index, when querying with the keyword 'delete' through the fulltext index, data was returned, but in fact, there is no content with 'delete' in the table. mysql> select count(*) from test.t; +----------+ | count(*) | +----------+ | 10024 | +----------+ 1 row in set (0.03 sec) mysql> select count(*) from test.t where match(content) against ('delete'); +----------+ | count(*) | +----------+ | 7176 | +----------+ 1 row in set (0.02 sec) mysql> select * from test.t where match(content) against ('delete'); +-------+-----------------------+ | id | content | +-------+-----------------------+ | 1436 | 3112336275 elderberry | | 1452 | 0150978814 elderberry | | 1492 | 7073588631 elderberry | | 1499 | 8019551115 elderberry | | 1518 | 8172327062 elderberry | | 1521 | 2594195326 elderberry | | 1524 | 5947236864 elderberry | | 1525 | 5457509870 elderberry | | 1526 | 9677026647 elderberry | | 1527 | 1644922500 elderberry | | 1528 | 7379734130 elderberry | | 1529 | 1700071738 elderberry | ... How to repeat: 1. create test table CREATE TABLE `t` ( `id` int NOT NULL AUTO_INCREMENT, `content` text NOT NULL, PRIMARY KEY (`id`), FULLTEXT KEY `content_fulltext` (`content`) /*!50100 WITH PARSER `ngram` */ ) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_0900_ai_ci; 2. insert some data that `content` without 'delete' 3. query with fulltext index select count(*) from test.t where match(content) against ('delete');