Bug #92705 | ngram full text search - can't find words smaller than ngram_token_size | ||
---|---|---|---|
Submitted: | 8 Oct 2018 10:49 | Modified: | 8 Oct 2018 17:17 |
Reporter: | Tomek Marecki | Email Updates: | |
Status: | Verified | Impact on me: | |
Category: | MySQL Server: FULLTEXT search | Severity: | S2 (Serious) |
Version: | 5.7.22/8.0 | OS: | Ubuntu |
Assigned to: | CPU Architecture: | x86 | |
Tags: | boolean, full text search, NGRAM |
[8 Oct 2018 10:49]
Tomek Marecki
[8 Oct 2018 10:59]
MySQL Verification Team
Thank you for the bug report. I couldn't repeat with current source server, so assuming it was fixed some what: Welcome to the MySQL monitor. Commands end with ; or \g. Your MySQL connection id is 2 Server version: 5.7.25 Source distribution BUILD: 2018-OCT-04 Copyright (c) 2000, 2018, Oracle and/or its affiliates. All rights reserved. Oracle is a registered trademark of Oracle Corporation and/or its affiliates. Other names may be trademarks of their respective owners. Type 'help;' or '\h' for help. Type '\c' to clear the current input statement. mysql 5.7 > CREATE DATABASE dft; Query OK, 1 row affected (0,02 sec) mysql 5.7 > USE dft Database changed mysql 5.7 > CREATE TABLE `fts` ( -> `name` varchar(255) NOT NULL, -> FULLTEXT KEY `fullTextName` (`name`) WITH PARSER `ngram` -> ) ENGINE=MyISAM DEFAULT CHARSET=utf8; INSERT INTO `fts` (name) VALUES ('LONG'); INSERT INTO `fts` (name) VALUES ('LONGA'); INSERT INTO `fts` (name) VALUES ('LONGAB'); INSERT INTO `fts` (name) VALUES ('LONGABC'); INSERT INTO `fts` (name) VALUES ('LONGABCD'); INSERT INTO `fts` (name) VALUES ('LONGABCDE'); INSERT INTO `fts` (name) VALUES ('LONGABCDEF'); Query OK, 0 rows affected (0,13 sec) mysql 5.7 > mysql 5.7 > INSERT INTO `fts` (name) VALUES ('LONG'); Query OK, 1 row affected (0,01 sec) mysql 5.7 > INSERT INTO `fts` (name) VALUES ('LONGA'); Query OK, 1 row affected (0,00 sec) mysql 5.7 > INSERT INTO `fts` (name) VALUES ('LONGAB'); Query OK, 1 row affected (0,00 sec) mysql 5.7 > INSERT INTO `fts` (name) VALUES ('LONGABC'); Query OK, 1 row affected (0,00 sec) mysql 5.7 > INSERT INTO `fts` (name) VALUES ('LONGABCD'); Query OK, 1 row affected (0,00 sec) mysql 5.7 > INSERT INTO `fts` (name) VALUES ('LONGABCDE'); Query OK, 1 row affected (0,00 sec) mysql 5.7 > INSERT INTO `fts` (name) VALUES ('LONGABCDEF'); Query OK, 1 row affected (0,00 sec) mysql 5.7 > INSERT INTO `fts` (name) VALUES ('LONGABCDEFG'); Query OK, 1 row affected (0,00 sec) mysql 5.7 > mysql 5.7 > SELECT name FROM fts WHERE MATCH (name) AGAINST ('+long*' IN boolean mode); +-------------+ | name | +-------------+ | LONG | | LONGA | | LONGAB | | LONGABC | | LONGABCD | | LONGABCDE | | LONGABCDEF | | LONGABCDEFG | +-------------+ 8 rows in set (0,00 sec) mysql 5.7 >
[8 Oct 2018 12:03]
Tomek Marecki
I can't see easy way to check it on 5.7.25 version that you're running. I haven't added (in the: `How to repeat:` section) that you also need to set the: `ngram_token_size` to 10 in the MySQL .cnf configuration. Can you please try that?
[8 Oct 2018 13:01]
MySQL Verification Team
Thank you for the feedback. miguel@luz:~/dbs $ ./57c Enter password: Welcome to the MySQL monitor. Commands end with ; or \g. Your MySQL connection id is 3 Server version: 5.7.25 Source distribution BUILD: 2018-OCT-04 Copyright (c) 2000, 2018, Oracle and/or its affiliates. All rights reserved. Oracle is a registered trademark of Oracle Corporation and/or its affiliates. Other names may be trademarks of their respective owners. Type 'help;' or '\h' for help. Type '\c' to clear the current input statement. mysql 5.7 > USE dft Database changed mysql 5.7 > SHOW VARIABLES LIKE "%ngr%"; +------------------+-------+ | Variable_name | Value | +------------------+-------+ | ngram_token_size | 10 | +------------------+-------+ 1 row in set (0,00 sec) mysql 5.7 > CREATE TABLE `fts` ( -> `name` varchar(255) NOT NULL, -> FULLTEXT KEY `fullTextName` (`name`) WITH PARSER `ngram` -> ) ENGINE=MyISAM DEFAULT CHARSET=utf8; INSERT INTO `fts` (name) VALUES ('LONG'); INSERT INTO `fts` (name) VALUES ('LONGA'); INSERT INTO `fts` (name) VALUES ('LONGAB'); INSERT INTO `fts` (name) VALUES ('LONGABC'); INSERT INTO `fts` (name) VALUES ('LONGABCD'); INSERT INTO `fts` (name) VALUES ('LONGABCDE'); INSERT INTO `fts` (name) VALUES ('LONGABCDEF'); INSERT INTO `fts` (name) VALUES ('LONGABCDEFG'); Query OK, 0 rows affected (0,12 sec) mysql 5.7 > mysql 5.7 > INSERT INTO `fts` (name) VALUES ('LONG'); Query OK, 1 row affected (0,01 sec) mysql 5.7 > INSERT INTO `fts` (name) VALUES ('LONGA'); Query OK, 1 row affected (0,00 sec) mysql 5.7 > INSERT INTO `fts` (name) VALUES ('LONGAB'); Query OK, 1 row affected (0,00 sec) mysql 5.7 > INSERT INTO `fts` (name) VALUES ('LONGABC'); Query OK, 1 row affected (0,00 sec) mysql 5.7 > INSERT INTO `fts` (name) VALUES ('LONGABCD'); Query OK, 1 row affected (0,00 sec) mysql 5.7 > INSERT INTO `fts` (name) VALUES ('LONGABCDE'); Query OK, 1 row affected (0,00 sec) mysql 5.7 > INSERT INTO `fts` (name) VALUES ('LONGABCDEF'); Query OK, 1 row affected (0,00 sec) mysql 5.7 > INSERT INTO `fts` (name) VALUES ('LONGABCDEFG'); Query OK, 1 row affected (0,00 sec) mysql 5.7 > mysql 5.7 > SELECT name FROM fts WHERE MATCH (name) AGAINST ('+long*' IN boolean mode); +-------------+ | name | +-------------+ | LONGABCDEF | | LONGABCDEFG | +-------------+ 2 rows in set (0,00 sec) mysql 5.7 >
[8 Oct 2018 17:17]
Tomek Marecki
I've also tested this on MySQL 8.0.12, this bug is reproducible.