Bug #84420 | stopwords and ngram indexes | ||
---|---|---|---|
Submitted: | 5 Jan 2017 11:19 | Modified: | 5 Jan 2017 12:16 |
Reporter: | Miguel Angel Nieto | Email Updates: | |
Status: | Verified | Impact on me: | |
Category: | MySQL Server: FULLTEXT search | Severity: | S3 (Non-critical) |
Version: | 5.7.17 | OS: | Any |
Assigned to: | CPU Architecture: | Any |
[5 Jan 2017 11:19]
Miguel Angel Nieto
[5 Jan 2017 12:16]
MySQL Verification Team
Hello Miguel, Thank you for the report and feedback! Imho by default, the ngram parser uses the default stopword list. Thanks, Umesh
[5 Jan 2017 12:19]
MySQL Verification Team
Currently, to define your own stopword list for all InnoDB tables, define a table with the same structure as the INNODB_FT_DEFAULT_STOPWORD table, populate it with stopwords, and set the value of the innodb_ft_server_stopword_table option to a value in the form db_name/table_name before creating the full-text index. The stopword table must have a single VARCHAR column named value. The following example demonstrates creating and configuring a new global stopword table for InnoDB. https://dev.mysql.com/doc/refman/5.7/en/fulltext-stopwords.html
[5 Jan 2017 12:20]
MySQL Verification Team
-- with ngram_token_size=1 [umshastr@hod03]/export/umesh/server/binaries/GABuilds/mysql-5.7.17: bin/mysqld --no-defaults --basedir=/export/umesh/server/binaries/GABuilds/mysql-5.7.17 --datadir=/export/umesh/server/binaries/GABuilds/mysql-5.7.17/84108 --core-file --socket=/tmp/mysql_ushastry.sock --port=3306 --log-error=/export/umesh/server/binaries/GABuilds/mysql-5.7.17/84108/log.err --ngram_token_size=1 2>&1 & [1] 20217 [umshastr@hod03]/export/umesh/server/binaries/GABuilds/mysql-5.7.17: bin/mysql -uroot -S /tmp/mysql_ushastry.sock Welcome to the MySQL monitor. Commands end with ; or \g. Your MySQL connection id is 3 Server version: 5.7.17 MySQL Community Server (GPL) Copyright (c) 2000, 2016, Oracle and/or its affiliates. All rights reserved. Oracle is a registered trademark of Oracle Corporation and/or its affiliates. Other names may be trademarks of their respective owners. Type 'help;' or '\h' for help. Type '\c' to clear the current input statement. root@localhost [(none)]> use test Database changed root@localhost [test]> SELECT * FROM articles WHERE MATCH(body) AGAINST('ea' IN BOOLEAN MODE); +----+------------------+ | id | body | +----+------------------+ | 2 | east area | | 1 | east | | 3 | east job | | 4 | eastnation | | 5 | eastway, try try | +----+------------------+ 5 rows in set (0.01 sec) root@localhost [test]> show variables like 'ngram_token_size'; +------------------+-------+ | Variable_name | Value | +------------------+-------+ | ngram_token_size | 1 | +------------------+-------+ 1 row in set (0.00 sec)
[3 Apr 2017 11:53]
Sveta Smirnova
Issue with changing INNODB_FT_DEFAULT_STOPWORD table is that other methods of full-text search are affected as well. For example, in this case is not possible to use mecab in one index and ngram in another. Fix should either exclude INNODB_FT_DEFAULT_STOPWORD table for ngram search or create separate table for ngram.
[15 Mar 2019 7:02]
MySQL Verification Team
Bug #94640 marked as duplicate of this one