Bug #84400 | FullText can not found some word including ', for example: "Xi'an" and "Yan'an" | ||
---|---|---|---|
Submitted: | 4 Jan 2017 6:26 | Modified: | 4 Jan 2017 13:10 |
Reporter: | Daniel Kou | Email Updates: | |
Status: | Verified | Impact on me: | |
Category: | MySQL Server: FULLTEXT search | Severity: | S3 (Non-critical) |
Version: | 5.7 | OS: | Any |
Assigned to: | CPU Architecture: | Any |
[4 Jan 2017 6:26]
Daniel Kou
[4 Jan 2017 13:10]
MySQL Verification Team
Hello Daniel Kou, Thank you for the report and feedback. Verifying this as FR to support single quote('). Thanks, Umesh
[4 Jan 2017 13:11]
MySQL Verification Team
-- 5.7.17 ## "Xi'an" would tokenize to 2 character "Xi" and "an" -- "an" is stop word root@localhost [test]> CREATE TABLE `tbl_district_en` ( -> `name` varchar(255) CHARACTER SET utf8mb4 NOT NULL DEFAULT '', -> `name_EN` varchar(128) CHARACTER SET utf8mb4 DEFAULT NULL, -> FULLTEXT KEY `full_name` (`name`,`name_EN`) -> ) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_unicode_ci; Query OK, 0 rows affected (0.02 sec) root@localhost [test]> select @@innodb_ft_min_token_size; +----------------------------+ | @@innodb_ft_min_token_size | +----------------------------+ | 3 | +----------------------------+ 1 row in set (0.00 sec) root@localhost [test]> set global innodb_ft_aux_table="test/tbl_district_en"; Query OK, 0 rows affected (0.00 sec) root@localhost [test]> insert into tbl_district_en values('', 'Xi\'an'),('', 'Yan\'an'),('', 'Xi\'ao'); Query OK, 3 rows affected (0.00 sec) Records: 3 Duplicates: 0 Warnings: 0 root@localhost [test]> select * from information_schema.innodb_ft_index_cache; +------+--------------+-------------+-----------+--------+----------+ | WORD | FIRST_DOC_ID | LAST_DOC_ID | DOC_COUNT | DOC_ID | POSITION | +------+--------------+-------------+-----------+--------+----------+ | yan | 3 | 3 | 1 | 3 | 0 | +------+--------------+-------------+-----------+--------+----------+ 1 row in set (0.00 sec) root@localhost [test]> select * from tbl_district_en where match(name,name_en) against('+Xi\'an*' in boolean mode); Empty set (0.01 sec) root@localhost [test]> select * from tbl_district_en where match(name,name_en) against('+Xi\'ao*' in boolean mode); Empty set (0.00 sec) root@localhost [test]> select * from tbl_district_en; +------+---------+ | name | name_EN | +------+---------+ | | Xi'an | | | Yan'an | | | Xi'ao | +------+---------+ 3 rows in set (0.00 sec) -- restart with --innodb_ft_min_token_size=1 root@localhost [test]> select @@innodb_ft_min_token_size; +----------------------------+ | @@innodb_ft_min_token_size | +----------------------------+ | 1 | +----------------------------+ 1 row in set (0.00 sec) root@localhost [test]> set global innodb_ft_aux_table="test/tbl_district_en"; Query OK, 0 rows affected (0.00 sec) root@localhost [test]> select * from information_schema.innodb_ft_index_cache; Empty set (0.00 sec) root@localhost [test]> optimize table tbl_district_en; +----------------------+----------+----------+-------------------------------------------------------------------+ | Table | Op | Msg_type | Msg_text | +----------------------+----------+----------+-------------------------------------------------------------------+ | test.tbl_district_en | optimize | note | Table does not support optimize, doing recreate + analyze instead | | test.tbl_district_en | optimize | status | OK | +----------------------+----------+----------+-------------------------------------------------------------------+ 2 rows in set (0.03 sec) root@localhost [test]> select * from information_schema.innodb_ft_index_cache; +------+--------------+-------------+-----------+--------+----------+ | WORD | FIRST_DOC_ID | LAST_DOC_ID | DOC_COUNT | DOC_ID | POSITION | +------+--------------+-------------+-----------+--------+----------+ | ao | 4 | 4 | 1 | 4 | 3 | | xi | 2 | 4 | 2 | 2 | 0 | | xi | 2 | 4 | 2 | 4 | 0 | | yan | 3 | 3 | 1 | 3 | 0 | +------+--------------+-------------+-----------+--------+----------+ 4 rows in set (0.00 sec) root@localhost [test]> select * from tbl_district_en where match(name,name_en) against('+Xi\'an*' in boolean mode); Empty set (0.00 sec) root@localhost [test]> select * from tbl_district_en where match(name,name_en) against('+Xi\'ao*' in boolean mode); +------+---------+ | name | name_EN | +------+---------+ | | Xi'ao | +------+---------+ 1 row in set (0.00 sec) root@localhost [test]> select * from tbl_district_en where match(name,name_en) against('+Xi*' in boolean mode); +------+---------+ | name | name_EN | +------+---------+ | | Xi'an | | | Xi'ao | +------+---------+ 2 rows in set (0.00 sec)