Bug #115375 Full Text index does not work as expected with Boolean mode
Submitted: 19 Jun 2024 4:40 Modified: 19 Jun 2024 7:52
Reporter: Yunus Shaikh Email Updates:
Status: Verified Impact on me:
None 
Category:MySQL Server: FULLTEXT search Severity:S3 (Non-critical)
Version:8.0.37 OS:Any
Assigned to: CPU Architecture:Any
Tags: full text search

[19 Jun 2024 4:40] Yunus Shaikh
Description:
When we try to lookup full text index using boolean mode with ngram parser, it does not generate enough tokens to lookup.

Here is step to reproduce,
1) Install MySQL 8.0.37 community edition.
2) Create table with fulltext index with ngram parser 

1) Create a test table

CREATE TABLE testing ( id INT UNSIGNED NOT NULL, name varchar(300) BINARY NOT NULL, PRIMARY KEY (id, name), FULLTEXT KEY (name) WITH PARSER ngram );

2) Insert a row in the table

insert into testing (id,name) values (1 ,'Movie9/PlayMovie/12345');

3) Set global parameters for full text index,

set global innodb_ft_aux_table = 'test/testing';

set global innodb_ft_enable_stopword=OFF;

4) Run the select statement

mysql> select count(*) from testing where MATCH(name) AGAINST('Mov' IN BOOLEAN MODE);
+----------+
| count(*) |
+----------+
|        0 |
+----------+

It should fetch the row.

mysql> SELECT * FROM INFORMATION_SCHEMA.INNODB_FT_INDEX_CACHE;
+------+--------------+-------------+-----------+--------+----------+
| WORD | FIRST_DOC_ID | LAST_DOC_ID | DOC_COUNT | DOC_ID | POSITION |
+------+--------------+-------------+-----------+--------+----------+
| 12   |            2 |           2 |         1 |      2 |       17 |
| 23   |            2 |           2 |         1 |      2 |       18 |
| 34   |            2 |           2 |         1 |      2 |       19 |
| 45   |            2 |           2 |         1 |      2 |       20 |
| Mo   |            2 |           2 |         1 |      2 |        0 |
| Mo   |            2 |           2 |         1 |      2 |       11 |
| Pl   |            2 |           2 |         1 |      2 |        7 |
| e9   |            2 |           2 |         1 |      2 |        4 |
| ov   |            2 |           2 |         1 |      2 |        1 |
| ov   |            2 |           2 |         1 |      2 |       11 |
| yM   |            2 |           2 |         1 |      2 |       10 |
+------+--------------+-------------+-----------+--------+----------+

It should load better index cache statistics to pull the record for the select query.

How to repeat:

When we try to lookup full text index using boolean mode with ngram parser, it does not generate enough tokens to lookup.

Here is step to reproduce,
1) Install MySQL 8.0.37 community edition.
2) Create table with fulltext index with ngram parser 

1) Create a test table

CREATE TABLE testing ( id INT UNSIGNED NOT NULL, name varchar(300) BINARY NOT NULL, PRIMARY KEY (id, name), FULLTEXT KEY (name) WITH PARSER ngram );

2) Insert a row in the table

insert into testing (id,name) values (1 ,'Movie9/PlayMovie/12345');

3) Set global parameters for full text index,

set global innodb_ft_aux_table = 'test/testing';

set global innodb_ft_enable_stopword=OFF;

4) Run the select statement

mysql> select count(*) from testing where MATCH(name) AGAINST('Mov' IN BOOLEAN MODE);
+----------+
| count(*) |
+----------+
|        0 |
+----------+

It should fetch the row.

mysql> SELECT * FROM INFORMATION_SCHEMA.INNODB_FT_INDEX_CACHE;
+------+--------------+-------------+-----------+--------+----------+
| WORD | FIRST_DOC_ID | LAST_DOC_ID | DOC_COUNT | DOC_ID | POSITION |
+------+--------------+-------------+-----------+--------+----------+
| 12   |            2 |           2 |         1 |      2 |       17 |
| 23   |            2 |           2 |         1 |      2 |       18 |
| 34   |            2 |           2 |         1 |      2 |       19 |
| 45   |            2 |           2 |         1 |      2 |       20 |
| Mo   |            2 |           2 |         1 |      2 |        0 |
| Mo   |            2 |           2 |         1 |      2 |       11 |
| Pl   |            2 |           2 |         1 |      2 |        7 |
| e9   |            2 |           2 |         1 |      2 |        4 |
| ov   |            2 |           2 |         1 |      2 |        1 |
| ov   |            2 |           2 |         1 |      2 |       11 |
| yM   |            2 |           2 |         1 |      2 |       10 |
+------+--------------+-------------+-----------+--------+----------+

It should load better index cache statistics to pull the record for the select query.
[19 Jun 2024 7:52] MySQL Verification Team
Hello Yunus Shaikh,

Thank you for the report and feedback.
Verified as described.

regards,
Umesh