Bug #85876 | Fulltext search can not find word which contains "," or "." | ||
---|---|---|---|
Submitted: | 10 Apr 2017 8:46 | Modified: | 24 May 2018 10:31 |
Reporter: | Seunguck Lee | Email Updates: | |
Status: | Closed | Impact on me: | |
Category: | MySQL Server: FULLTEXT search | Severity: | S3 (Non-critical) |
Version: | 5.7.17, 5.7.18 | OS: | Any |
Assigned to: | CPU Architecture: | Any | |
Tags: | boolean, fulltext, NGRAM |
[10 Apr 2017 8:46]
Seunguck Lee
[11 Apr 2017 6:04]
MySQL Verification Team
Hello Seunguck Lee, Thank you for the report and test case. Thanks, Umesh
[11 Apr 2017 6:05]
MySQL Verification Team
## Master>show variables like 'ngram_token_size'; +------------------+-------+ | Variable_name | Value | +------------------+-------+ | ngram_token_size | 2 | +------------------+-------+ 1 row in set (0.01 sec) Master>set global innodb_ft_enable_stopword=off; Query OK, 0 rows affected (0.00 sec) Master>set innodb_ft_enable_stopword=off; Query OK, 0 rows affected (0.00 sec) Master>CREATE TABLE ft_test3( -> id int not null, -> contents text, -> primary key(id),fulltext index fx_contents(contents) with parser ngram -> ) engine=innodb; Query OK, 0 rows affected (0.01 sec) Master>set global innodb_ft_aux_table='test/ft_test3'; Query OK, 0 rows affected (0.00 sec) Master> insert into ft_test3 values (1, 'xy,yz'); Query OK, 1 row affected (0.00 sec) Master>SELECT * FROM INFORMATION_SCHEMA.INNODB_FT_INDEX_CACHE order by position; +------+--------------+-------------+-----------+--------+----------+ | WORD | FIRST_DOC_ID | LAST_DOC_ID | DOC_COUNT | DOC_ID | POSITION | +------+--------------+-------------+-----------+--------+----------+ | xy | 2 | 2 | 1 | 2 | 0 | | y, | 2 | 2 | 1 | 2 | 1 | | ,y | 2 | 2 | 1 | 2 | 2 | | yz | 2 | 2 | 1 | 2 | 3 | +------+--------------+-------------+-----------+--------+----------+ 4 rows in set (0.00 sec) Master>select * from ft_test3 where match(contents) against('y,' in boolean mode); Empty set (0.00 sec) Master>select * from ft_test3 where match(contents) against(',y' in boolean mode); Empty set (0.00 sec) Master> select * from ft_test3 where match(contents) against('y,' in natural language mode); +----+----------+ | id | contents | +----+----------+ | 1 | xy,yz | +----+----------+ 1 row in set (0.00 sec) Master>select * from ft_test3 where match(contents) against(',y' in natural language mode); +----+----------+ | id | contents | +----+----------+ | 1 | xy,yz | +----+----------+ 1 row in set (0.00 sec) Master>show variables like '%version%'; +-------------------------+------------------------------+ | Variable_name | Value | +-------------------------+------------------------------+ | innodb_version | 5.7.18 | | protocol_version | 10 | | slave_type_conversions | | | tls_version | TLSv1,TLSv1.1 | | version | 5.7.18-log | | version_comment | MySQL Community Server (GPL) | | version_compile_machine | x86_64 | | version_compile_os | linux-glibc2.5 | +-------------------------+------------------------------+ 8 rows in set (0.00 sec) Master>
[24 May 2018 10:31]
Daniel Price
Posted by developer: Fixed as of the upcoming 5.7.23, 8.0.12 release, and here's the changelog entry: The ngram full-text search parser permitted comma and period characters to be tokenized as words, which caused an inconsistency between boolean and natural language mode search results. Comma and period characters are no longer tokenized.