Bug #117035 | fulltext count(*) where match against('delete') found rows that not exist | ||
---|---|---|---|
Submitted: | 23 Dec 2024 6:53 | Modified: | 7 Jan 12:42 |
Reporter: | Zhenye Lin | Email Updates: | |
Status: | Not a Bug | Impact on me: | |
Category: | MySQL Server: FULLTEXT search | Severity: | S3 (Non-critical) |
Version: | 8.0.40 | OS: | Any |
Assigned to: | CPU Architecture: | Any |
[23 Dec 2024 6:53]
Zhenye Lin
[7 Jan 12:42]
MySQL Verification Team
Hello Zhenye Lin, Thank you for the report and feedback. IMHO this is not a bug but expected behaviour. Please see https://dev.mysql.com/doc/refman/8.0/en/fulltext-search-ngram.html for more details on this. -- bin/mysql -uroot -S/tmp/mysql.sock Welcome to the MySQL monitor. Commands end with ; or \g. Your MySQL connection id is 7 Server version: 8.0.40 MySQL Community Server - GPL Copyright (c) 2000, 2024, Oracle and/or its affiliates. Oracle is a registered trademark of Oracle Corporation and/or its affiliates. Other names may be trademarks of their respective owners. Type 'help;' or '\h' for help. Type '\c' to clear the current input statement. mysql> show global variables like 'ngram_%'; +------------------+-------+ | Variable_name | Value | +------------------+-------+ | ngram_token_size | 2 | +------------------+-------+ 1 row in set (0.00 sec) mysql> create database test; Query OK, 1 row affected (0.00 sec) mysql> use test Database changed mysql> CREATE TABLE `t` ( `id` int NOT NULL AUTO_INCREMENT, `content` text NOT NULL, PRIMARY KEY (`id`), FULLTEXT KEY `content_fulltext` (`content`) /*!50100 WITH PARSER `ngram` */ ) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_0900_ai_ci; Query OK, 0 rows affected (0.15 sec) mysql> SET GLOBAL innodb_ft_aux_table="test/t"; Query OK, 0 rows affected (0.00 sec) mysql> INSERT INTO t (content) VALUES("apple"), ("banana"), ("cherry"), ("date"), ("elderberry"); Query OK, 5 rows affected (0.00 sec) Records: 5 Duplicates: 0 Warnings: 0 mysql> select * from t; +----+------------+ | id | content | +----+------------+ | 1 | apple | | 2 | banana | | 3 | cherry | | 4 | date | | 5 | elderberry | +----+------------+ 5 rows in set (0.00 sec) mysql> select count(*) from test.t where match(content) against ('delete'); +----------+ | count(*) | +----------+ | 3 | +----------+ 1 row in set (0.00 sec) IMHO, phrase searches are converted to ngram phrase searches. For example, in the above case - The search phrase “delete” is converted to “de el le et te”, which returns documents containing “apple”, "date" and “elderberry”. mysql> select * from test.t where match(content) against ('delete'); +----+------------+ | id | content | +----+------------+ | 1 | apple | | 4 | date | | 5 | elderberry | +----+------------+ 3 rows in set (0.00 sec) mysql> SELECT * FROM INFORMATION_SCHEMA.INNODB_FT_INDEX_CACHE ORDER BY doc_id, position; +------+--------------+-------------+-----------+--------+----------+ | WORD | FIRST_DOC_ID | LAST_DOC_ID | DOC_COUNT | DOC_ID | POSITION | +------+--------------+-------------+-----------+--------+----------+ | pp | 2 | 2 | 1 | 2 | 1 | | pl | 2 | 2 | 1 | 2 | 2 | | le | 2 | 2 | 1 | 2 | 3 | | ch | 4 | 4 | 1 | 4 | 0 | | he | 4 | 4 | 1 | 4 | 1 | | er | 4 | 6 | 2 | 4 | 2 | | rr | 4 | 6 | 2 | 4 | 3 | | ry | 4 | 6 | 2 | 4 | 4 | | te | 5 | 5 | 1 | 5 | 2 | | el | 6 | 6 | 1 | 6 | 0 | | ld | 6 | 6 | 1 | 6 | 1 | | er | 4 | 6 | 2 | 6 | 3 | | er | 4 | 6 | 2 | 6 | 3 | | rb | 6 | 6 | 1 | 6 | 4 | | rr | 4 | 6 | 2 | 6 | 7 | | ry | 4 | 6 | 2 | 6 | 8 | +------+--------------+-------------+-----------+--------+----------+ 16 rows in set (0.00 sec) For ngram_token_size=3, mysql> SET GLOBAL innodb_ft_aux_table="test/t"; Query OK, 0 rows affected (0.00 sec) mysql> INSERT INTO t (content) VALUES("apple"), ("banana"), ("cherry"), ("date"), ("elderberry"); Query OK, 5 rows affected (0.00 sec) Records: 5 Duplicates: 0 Warnings: 0 mysql> SELECT * FROM INFORMATION_SCHEMA.INNODB_FT_INDEX_CACHE ORDER BY doc_id, position; +------+--------------+-------------+-----------+--------+----------+ | WORD | FIRST_DOC_ID | LAST_DOC_ID | DOC_COUNT | DOC_ID | POSITION | +------+--------------+-------------+-----------+--------+----------+ | ppl | 2 | 2 | 1 | 2 | 1 | | ple | 2 | 2 | 1 | 2 | 2 | | che | 4 | 4 | 1 | 4 | 0 | | her | 4 | 4 | 1 | 4 | 1 | | err | 4 | 6 | 2 | 4 | 2 | | rry | 4 | 6 | 2 | 4 | 3 | | eld | 6 | 6 | 1 | 6 | 0 | | erb | 6 | 6 | 1 | 6 | 3 | | err | 4 | 6 | 2 | 6 | 6 | | rry | 4 | 6 | 2 | 6 | 7 | +------+--------------+-------------+-----------+--------+----------+ 10 rows in set (0.00 sec) mysql> select * from test.t where match(content) against ('delete'); Empty set (0.01 sec) mysql> show global variables like 'ngram_%'; +------------------+-------+ | Variable_name | Value | +------------------+-------+ | ngram_token_size | 3 | +------------------+-------+ 1 row in set (0.00 sec) regards, Umesh