Bug #92230 Fulltext search returns unexpected results
Submitted: 29 Aug 2018 14:42 Modified: 9 Oct 2018 14:11
Reporter: Sylvain Estevez Email Updates:
Status: Verified Impact on me:
None 
Category:MySQL Server: FULLTEXT search Severity:S3 (Non-critical)
Version:5.7/8.0 OS:Any (RDS Aurora)
Assigned to: CPU Architecture:Any
Tags: against, fulltext, match, NGRAM

[29 Aug 2018 14:42] Sylvain Estevez
Description:
Hi,
We have a couple fulltext indexes that use to search users by email/name.

We've had good results so far but we recently came across a special case for which the results are unexpected.

In particular, a user with last name "kapasi" is not returned despite a perfect match query.

Note1: We have stopwords disabled
Note1: If it can help at all, I've done several tests using various values and only values containing "pa" are not returned (all tests were perfect match queries)

How to repeat:
1. Create the table
```
CREATE TABLE `ngram_test` (
  `first_name` varchar(50) COLLATE utf8mb4_unicode_ci NOT NULL,
  `last_name` varchar(50) COLLATE utf8mb4_unicode_ci NOT NULL,
  FULLTEXT KEY `foobar_full_name_ngram` (`first_name`,`last_name`) /*!50100 WITH PARSER `ngram` */ 
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_unicode_ci;
```

2. Insert a row
```
INSERT INTO `ngram_test` (`first_name`, `last_name`) VALUES ('Foo', 'Kapasi');
```

3. Query it
```
SELECT * FROM `ngram_test` WHERE MATCH(first_name, last_name) AGAINST('kapasi' IN BOOLEAN MODE);
```
[29 Aug 2018 14:52] Sylvain Estevez
```
SHOW VARIABLES LIKE '%ngram%';

ngram_token_size	2
```

Regarding the note on stopwords, they're not disabled globally but were disabled for the session before creating the index.
[29 Aug 2018 15:14] Sylvain Estevez
Additional information: I have tried to drop the index and recreate it after setting `SET `innodb_ft_enable_stopword` = OFF;` but the issue persists.
[9 Oct 2018 14:07] MySQL Verification Team
C:\DBS>c:\dbs\5.7\bin\mysql -uroot --port=3570 -p  --prompt="mysql 5.7 > "
Enter password:
Welcome to the MySQL monitor.  Commands end with ; or \g.
Your MySQL connection id is 2
Server version: 5.7.25-log Source distribution BUILD: 2018-OCT-04

Copyright (c) 2000, 2018, Oracle and/or its affiliates. All rights reserved.

Oracle is a registered trademark of Oracle Corporation and/or its
affiliates. Other names may be trademarks of their respective
owners.

Type 'help;' or '\h' for help. Type '\c' to clear the current input statement.

mysql 5.7 > CREATE DATABASE h;
Query OK, 1 row affected (0.01 sec)

mysql 5.7 > USE h
Database changed
mysql 5.7 > CREATE TABLE `ngram_test` (
    ->   `first_name` varchar(50) COLLATE utf8mb4_unicode_ci NOT NULL,
    ->   `last_name` varchar(50) COLLATE utf8mb4_unicode_ci NOT NULL,
    ->   FULLTEXT KEY `foobar_full_name_ngram` (`first_name`,`last_name`) /*!50100 WITH PARSER `ngram` */
    -> ) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_unicode_ci;
Query OK, 0 rows affected (0.15 sec)

mysql 5.7 > INSERT INTO `ngram_test` (`first_name`, `last_name`) VALUES ('Foo', 'Kapasi');
Query OK, 1 row affected (0.01 sec)

mysql 5.7 >
mysql 5.7 > SELECT * FROM `ngram_test` WHERE MATCH(first_name, last_name) AGAINST('kapasi' IN BOOLEAN MODE);
Empty set (0.00 sec)

mysql 5.7 > ALTER TABLE ngram_test ENGINE MyISAM;
Query OK, 1 row affected (0.06 sec)
Records: 1  Duplicates: 0  Warnings: 0

mysql 5.7 > SELECT * FROM `ngram_test` WHERE MATCH(first_name, last_name) AGAINST('kapasi' IN BOOLEAN MODE);
+------------+-----------+
| first_name | last_name |
+------------+-----------+
| Foo        | Kapasi    |
+------------+-----------+
1 row in set (0.00 sec)

mysql 5.7 > SHOW CREATE TABLE ngram_test\G
*************************** 1. row ***************************
       Table: ngram_test
Create Table: CREATE TABLE `ngram_test` (
  `first_name` varchar(50) COLLATE utf8mb4_unicode_ci NOT NULL,
  `last_name` varchar(50) COLLATE utf8mb4_unicode_ci NOT NULL,
  FULLTEXT KEY `foobar_full_name_ngram` (`first_name`,`last_name`) /*!50100 WITH PARSER `ngram` */
) ENGINE=MyISAM DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_unicode_ci
1 row in set (0.00 sec)

mysql 5.7 > SHOW VARIABLES LIKE '%ngram%';
+------------------+-------+
| Variable_name    | Value |
+------------------+-------+
| ngram_token_size | 2     |
+------------------+-------+
1 row in set (0.01 sec)

mysql 5.7 >
[9 Oct 2018 14:11] MySQL Verification Team
Thank you for the bug report.

C:\DBS>c:\dbs\8.0\bin\mysql -uroot --port=3580 -p --prompt="mysql 8.0 > " --default-character-set=utf8mb4
Enter password:
Welcome to the MySQL monitor.  Commands end with ; or \g.
Your MySQL connection id is 8
Server version: 8.0.14 Source distribution BUILD: 2018-OCT-04

Copyright (c) 2000, 2018, Oracle and/or its affiliates. All rights reserved.

Oracle is a registered trademark of Oracle Corporation and/or its
affiliates. Other names may be trademarks of their respective
owners.

Type 'help;' or '\h' for help. Type '\c' to clear the current input statement.

mysql 8.0 > CREATE DATABASE h;
Query OK, 1 row affected (0.02 sec)

mysql 8.0 > USE h
Database changed
mysql 8.0 > CREATE TABLE `ngram_test` (
    ->   `first_name` varchar(50) COLLATE utf8mb4_unicode_ci NOT NULL,
    ->   `last_name` varchar(50) COLLATE utf8mb4_unicode_ci NOT NULL,
    ->   FULLTEXT KEY `foobar_full_name_ngram` (`first_name`,`last_name`) /*!50100 WITH PARSER `ngram` */
    -> ) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_unicode_ci;
Query OK, 0 rows affected (0.22 sec)

mysql 8.0 > INSERT INTO `ngram_test` (`first_name`, `last_name`) VALUES ('Foo', 'Kapasi');
Query OK, 1 row affected (0.01 sec)

mysql 8.0 > SELECT * FROM `ngram_test` WHERE MATCH(first_name, last_name) AGAINST('kapasi' IN BOOLEAN MODE);
Empty set (0.00 sec)

mysql 8.0 > ALTER TABLE ngram_test ENGINE MyISAM;
Query OK, 1 row affected (0.10 sec)
Records: 1  Duplicates: 0  Warnings: 0

mysql 8.0 > SELECT * FROM `ngram_test` WHERE MATCH(first_name, last_name) AGAINST('kapasi' IN BOOLEAN MODE);
+------------+-----------+
| first_name | last_name |
+------------+-----------+
| Foo        | Kapasi    |
+------------+-----------+
1 row in set (0.00 sec)

mysql 8.0 > SHOW CREATE TABLE ngram_test\G
*************************** 1. row ***************************
       Table: ngram_test
Create Table: CREATE TABLE `ngram_test` (
  `first_name` varchar(50) CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci NOT NULL,
  `last_name` varchar(50) CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci NOT NULL,
  FULLTEXT KEY `foobar_full_name_ngram` (`first_name`,`last_name`) /*!50100 WITH PARSER `ngram` */
) ENGINE=MyISAM DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_unicode_ci
1 row in set (0.00 sec)

mysql 8.0 > SHOW VARIABLES LIKE '%ngram%';
+------------------+-------+
| Variable_name    | Value |
+------------------+-------+
| ngram_token_size | 2     |
+------------------+-------+
1 row in set (0.00 sec)

mysql 8.0 >