| Bug #92230 | Fulltext search returns unexpected results | ||
|---|---|---|---|
| Submitted: | 29 Aug 2018 14:42 | Modified: | 9 Oct 2018 14:11 |
| Reporter: | Sylvain Estevez | Email Updates: | |
| Status: | Verified | Impact on me: | |
| Category: | MySQL Server: FULLTEXT search | Severity: | S3 (Non-critical) |
| Version: | 5.7/8.0 | OS: | Any (RDS Aurora) |
| Assigned to: | CPU Architecture: | Any | |
| Tags: | against, fulltext, match, NGRAM | ||
[29 Aug 2018 14:52]
Sylvain Estevez
``` SHOW VARIABLES LIKE '%ngram%'; ngram_token_size 2 ``` Regarding the note on stopwords, they're not disabled globally but were disabled for the session before creating the index.
[29 Aug 2018 15:14]
Sylvain Estevez
Additional information: I have tried to drop the index and recreate it after setting `SET `innodb_ft_enable_stopword` = OFF;` but the issue persists.
[9 Oct 2018 14:07]
MySQL Verification Team
C:\DBS>c:\dbs\5.7\bin\mysql -uroot --port=3570 -p --prompt="mysql 5.7 > "
Enter password:
Welcome to the MySQL monitor. Commands end with ; or \g.
Your MySQL connection id is 2
Server version: 5.7.25-log Source distribution BUILD: 2018-OCT-04
Copyright (c) 2000, 2018, Oracle and/or its affiliates. All rights reserved.
Oracle is a registered trademark of Oracle Corporation and/or its
affiliates. Other names may be trademarks of their respective
owners.
Type 'help;' or '\h' for help. Type '\c' to clear the current input statement.
mysql 5.7 > CREATE DATABASE h;
Query OK, 1 row affected (0.01 sec)
mysql 5.7 > USE h
Database changed
mysql 5.7 > CREATE TABLE `ngram_test` (
-> `first_name` varchar(50) COLLATE utf8mb4_unicode_ci NOT NULL,
-> `last_name` varchar(50) COLLATE utf8mb4_unicode_ci NOT NULL,
-> FULLTEXT KEY `foobar_full_name_ngram` (`first_name`,`last_name`) /*!50100 WITH PARSER `ngram` */
-> ) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_unicode_ci;
Query OK, 0 rows affected (0.15 sec)
mysql 5.7 > INSERT INTO `ngram_test` (`first_name`, `last_name`) VALUES ('Foo', 'Kapasi');
Query OK, 1 row affected (0.01 sec)
mysql 5.7 >
mysql 5.7 > SELECT * FROM `ngram_test` WHERE MATCH(first_name, last_name) AGAINST('kapasi' IN BOOLEAN MODE);
Empty set (0.00 sec)
mysql 5.7 > ALTER TABLE ngram_test ENGINE MyISAM;
Query OK, 1 row affected (0.06 sec)
Records: 1 Duplicates: 0 Warnings: 0
mysql 5.7 > SELECT * FROM `ngram_test` WHERE MATCH(first_name, last_name) AGAINST('kapasi' IN BOOLEAN MODE);
+------------+-----------+
| first_name | last_name |
+------------+-----------+
| Foo | Kapasi |
+------------+-----------+
1 row in set (0.00 sec)
mysql 5.7 > SHOW CREATE TABLE ngram_test\G
*************************** 1. row ***************************
Table: ngram_test
Create Table: CREATE TABLE `ngram_test` (
`first_name` varchar(50) COLLATE utf8mb4_unicode_ci NOT NULL,
`last_name` varchar(50) COLLATE utf8mb4_unicode_ci NOT NULL,
FULLTEXT KEY `foobar_full_name_ngram` (`first_name`,`last_name`) /*!50100 WITH PARSER `ngram` */
) ENGINE=MyISAM DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_unicode_ci
1 row in set (0.00 sec)
mysql 5.7 > SHOW VARIABLES LIKE '%ngram%';
+------------------+-------+
| Variable_name | Value |
+------------------+-------+
| ngram_token_size | 2 |
+------------------+-------+
1 row in set (0.01 sec)
mysql 5.7 >
[9 Oct 2018 14:11]
MySQL Verification Team
Thank you for the bug report.
C:\DBS>c:\dbs\8.0\bin\mysql -uroot --port=3580 -p --prompt="mysql 8.0 > " --default-character-set=utf8mb4
Enter password:
Welcome to the MySQL monitor. Commands end with ; or \g.
Your MySQL connection id is 8
Server version: 8.0.14 Source distribution BUILD: 2018-OCT-04
Copyright (c) 2000, 2018, Oracle and/or its affiliates. All rights reserved.
Oracle is a registered trademark of Oracle Corporation and/or its
affiliates. Other names may be trademarks of their respective
owners.
Type 'help;' or '\h' for help. Type '\c' to clear the current input statement.
mysql 8.0 > CREATE DATABASE h;
Query OK, 1 row affected (0.02 sec)
mysql 8.0 > USE h
Database changed
mysql 8.0 > CREATE TABLE `ngram_test` (
-> `first_name` varchar(50) COLLATE utf8mb4_unicode_ci NOT NULL,
-> `last_name` varchar(50) COLLATE utf8mb4_unicode_ci NOT NULL,
-> FULLTEXT KEY `foobar_full_name_ngram` (`first_name`,`last_name`) /*!50100 WITH PARSER `ngram` */
-> ) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_unicode_ci;
Query OK, 0 rows affected (0.22 sec)
mysql 8.0 > INSERT INTO `ngram_test` (`first_name`, `last_name`) VALUES ('Foo', 'Kapasi');
Query OK, 1 row affected (0.01 sec)
mysql 8.0 > SELECT * FROM `ngram_test` WHERE MATCH(first_name, last_name) AGAINST('kapasi' IN BOOLEAN MODE);
Empty set (0.00 sec)
mysql 8.0 > ALTER TABLE ngram_test ENGINE MyISAM;
Query OK, 1 row affected (0.10 sec)
Records: 1 Duplicates: 0 Warnings: 0
mysql 8.0 > SELECT * FROM `ngram_test` WHERE MATCH(first_name, last_name) AGAINST('kapasi' IN BOOLEAN MODE);
+------------+-----------+
| first_name | last_name |
+------------+-----------+
| Foo | Kapasi |
+------------+-----------+
1 row in set (0.00 sec)
mysql 8.0 > SHOW CREATE TABLE ngram_test\G
*************************** 1. row ***************************
Table: ngram_test
Create Table: CREATE TABLE `ngram_test` (
`first_name` varchar(50) CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci NOT NULL,
`last_name` varchar(50) CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci NOT NULL,
FULLTEXT KEY `foobar_full_name_ngram` (`first_name`,`last_name`) /*!50100 WITH PARSER `ngram` */
) ENGINE=MyISAM DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_unicode_ci
1 row in set (0.00 sec)
mysql 8.0 > SHOW VARIABLES LIKE '%ngram%';
+------------------+-------+
| Variable_name | Value |
+------------------+-------+
| ngram_token_size | 2 |
+------------------+-------+
1 row in set (0.00 sec)
mysql 8.0 >

Description: Hi, We have a couple fulltext indexes that use to search users by email/name. We've had good results so far but we recently came across a special case for which the results are unexpected. In particular, a user with last name "kapasi" is not returned despite a perfect match query. Note1: We have stopwords disabled Note1: If it can help at all, I've done several tests using various values and only values containing "pa" are not returned (all tests were perfect match queries) How to repeat: 1. Create the table ``` CREATE TABLE `ngram_test` ( `first_name` varchar(50) COLLATE utf8mb4_unicode_ci NOT NULL, `last_name` varchar(50) COLLATE utf8mb4_unicode_ci NOT NULL, FULLTEXT KEY `foobar_full_name_ngram` (`first_name`,`last_name`) /*!50100 WITH PARSER `ngram` */ ) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_unicode_ci; ``` 2. Insert a row ``` INSERT INTO `ngram_test` (`first_name`, `last_name`) VALUES ('Foo', 'Kapasi'); ``` 3. Query it ``` SELECT * FROM `ngram_test` WHERE MATCH(first_name, last_name) AGAINST('kapasi' IN BOOLEAN MODE); ```