| Bug #69931 | Full-text search on InnoDB failed on searches for words containing hyphens | ||
|---|---|---|---|
| Submitted: | 6 Aug 2013 3:25 | Modified: | 21 Aug 2013 16:23 |
| Reporter: | Chito Angeles | Email Updates: | |
| Status: | Not a Bug | Impact on me: | |
| Category: | MySQL Server: FULLTEXT search | Severity: | S2 (Serious) |
| Version: | 5.6.13 | OS: | Any |
| Assigned to: | CPU Architecture: | Any | |
| Tags: | fulltext search boolean innodb hyphen | ||
[6 Aug 2013 16:05]
MySQL Verification Team
Thank you for the bug report.
C:\dbs>c:\dbs\5.6\bin\mysql -uroot --port=3560 --prompt="mysql 5.6 > "
Welcome to the MySQL monitor. Commands end with ; or \g.
Your MySQL connection id is 3
Server version: 5.6.14 Source distribution
Copyright (c) 2000, 2013, Oracle and/or its affiliates. All rights reserved.
Oracle is a registered trademark of Oracle Corporation and/or its
affiliates. Other names may be trademarks of their respective
owners.
Type 'help;' or '\h' for help. Type '\c' to clear the current input statement.
mysql 5.6 > create database xx;
Query OK, 1 row affected (0.01 sec)
mysql 5.6 > use xx
Database changed
mysql 5.6 > CREATE TABLE `catalog` (
-> `id` int(10) unsigned NOT NULL AUTO_INCREMENT,
-> `title` text NOT NULL,
-> PRIMARY KEY (`id`),
-> FULLTEXT KEY `title_ftx` (`title`)
-> ) ENGINE=InnoDB DEFAULT CHARSET=utf8;
Query OK, 0 rows affected (1.59 sec)
mysql 5.6 > INSERT INTO catalog(`title`) VALUES
-> ('A teacher\'s pre-school Japan'),
-> ('A teacher\'s handbook in health and science for grade III');
Query OK, 2 rows affected (0.06 sec)
Records: 2 Duplicates: 0 Warnings: 0
mysql 5.6 > SELECT * FROM catalog where match(title) against ('+pre-school' IN BOOLEAN MODE);
Empty set (0.01 sec)
mysql 5.6 > SELECT * FROM catalog where match(title) against ('pre-school' IN BOOLEAN MODE);
Empty set (0.00 sec)
mysql 5.6 > alter table catalog engine MyISAM;
Query OK, 2 rows affected (0.56 sec)
Records: 2 Duplicates: 0 Warnings: 0
mysql 5.6 > SELECT * FROM catalog where match(title) against ('pre-school' IN BOOLEAN MODE);
+----+------------------------------+
| id | title |
+----+------------------------------+
| 1 | A teacher's pre-school Japan |
+----+------------------------------+
1 row in set (0.00 sec)
[16 Aug 2013 3:04]
Jimmy Yang
Ok, this is really not a bug. MyISAM never indexes "-", it only either treat it as negate operator or ignores it.
Also, please avoid "pre" in the search, since MyISAM do not index < 4 character words by default.
Let's insert a couple of more rows to show how MyISAM deals with the "-":
mysql> CREATE TABLE `catalog` (
-> `id` int(10) unsigned NOT NULL AUTO_INCREMENT,
-> `title` text NOT NULL,
-> PRIMARY KEY (`id`),
-> FULLTEXT KEY `title_ftx` (`title`)
-> ) ENGINE=myisam DEFAULT CHARSET=utf8;
Query OK, 0 rows affected (0.10 sec)
mysql> INSERT INTO catalog(`title`) VALUES
-> ('A teacher\'s pre-school Japan'),
-> ('A teacher\'s handbook in health and science for grade III');
Query OK, 2 rows affected (0.04 sec)
Records: 2 Duplicates: 0 Warnings: 0
Add 2 more rows:
mysql> INSERT INTO catalog(`title`) VALUES
-> ('a teacher-school');
Query OK, 1 row affected (0.00 sec)
mysql> INSERT INTO catalog(`title`) VALUES ('a teacher mmmn school');
Query OK, 1 row affected (0.00 sec)
SELECT * FROM catalog where match(title) against ('teacher-school' IN
BOOLEAN MODE);
+----+----------------------------------------------------------+
| id | title |
+----+----------------------------------------------------------+
| 1 | A teacher's pre-school Japan |
| 2 | A teacher's handbook in health and science for grade III |
| 3 | a teacher-school |
| 4 | a teacher mmmn school |
+----+----------------------------------------------------------+
You can see MyISAM really not include hyphen in the search either. It is look
for "teacher" and "school", not "teacher-school".
The only trick for MyISAM is that if the "-" is not immediately preceeds to
the word, then it ignores it :
mysql> SELECT * FROM catalog where match(title) against ('teacher - school'
IN BOOLEAN MODE);
+----+----------------------------------------------------------+
| id | title |
+----+----------------------------------------------------------+
| 1 | A teacher's pre-school Japan |
| 2 | A teacher's handbook in health and science for grade III |
| 3 | a teacher-school |
| 4 | a teacher mmmn school |
+----+----------------------------------------------------------+
4 rows in set (0.00 sec)
So in this sense, InnoDB is better than MyISAM. And result in your case is
expected, since we catch the "-" no matter how you place it. (Please note
MyISAM either ignores it, or treat it as negate,it never indexes it or look
for it)
[16 Aug 2013 7:07]
Jimmy Yang
But we will deal with hyphens that has no space before/after it, and treat them as ignored.

Description: Search for words with hyphens(-) in an INNODB Table with full-text index fails How to repeat: SELECT * FROM catalog where match(title) against ('+pre-school' IN BOOLEAN MODE); Suggested fix: The Hyphen (also used as Boolean Operator NOT) should be ignored or escaped if it is part of a word, meaning no space before the hyphen(-). This may also be true and should be considered for words with the Plus(+) sign, e.g., K+12 or C++.