Bug #69931 Full-text search on InnoDB failed on searches for words containing hyphens
Submitted: 6 Aug 2013 3:25 Modified: 21 Aug 2013 16:23
Reporter: Chito Angeles Email Updates:
Status: Not a Bug Impact on me:
None 
Category:MySQL Server: FULLTEXT search Severity:S2 (Serious)
Version:5.6.13 OS:Any
Assigned to: CPU Architecture:Any
Tags: fulltext search boolean innodb hyphen

[6 Aug 2013 3:25] Chito Angeles
Description:
Search for words with hyphens(-) in an INNODB Table with full-text index fails

How to repeat:
SELECT * FROM catalog where match(title) against ('+pre-school' IN BOOLEAN MODE);

Suggested fix:
The Hyphen (also used as Boolean Operator NOT) should be ignored or escaped if it is part of a word, meaning no space before the hyphen(-). This may also be true and should be considered for words with the Plus(+) sign, e.g., K+12 or C++.
[6 Aug 2013 16:05] MySQL Verification Team
Thank you for the bug report.

C:\dbs>c:\dbs\5.6\bin\mysql -uroot --port=3560 --prompt="mysql 5.6 > "
Welcome to the MySQL monitor.  Commands end with ; or \g.
Your MySQL connection id is 3
Server version: 5.6.14 Source distribution

Copyright (c) 2000, 2013, Oracle and/or its affiliates. All rights reserved.

Oracle is a registered trademark of Oracle Corporation and/or its
affiliates. Other names may be trademarks of their respective
owners.

Type 'help;' or '\h' for help. Type '\c' to clear the current input statement.

mysql 5.6 > create database xx;
Query OK, 1 row affected (0.01 sec)

mysql 5.6 > use xx
Database changed
mysql 5.6 > CREATE TABLE `catalog` (
    ->   `id` int(10) unsigned NOT NULL AUTO_INCREMENT,
    ->   `title` text NOT NULL,
    ->   PRIMARY KEY (`id`),
    ->   FULLTEXT KEY `title_ftx` (`title`)
    -> ) ENGINE=InnoDB DEFAULT CHARSET=utf8;
Query OK, 0 rows affected (1.59 sec)

mysql 5.6 > INSERT INTO catalog(`title`) VALUES
    ->   ('A teacher\'s pre-school Japan'),
    ->   ('A teacher\'s handbook in health and science for grade III');
Query OK, 2 rows affected (0.06 sec)
Records: 2  Duplicates: 0  Warnings: 0

mysql 5.6 > SELECT * FROM catalog where match(title) against ('+pre-school' IN BOOLEAN MODE);
Empty set (0.01 sec)

mysql 5.6 > SELECT * FROM catalog where match(title) against ('pre-school' IN BOOLEAN MODE);
Empty set (0.00 sec)

mysql 5.6 > alter table catalog engine MyISAM;
Query OK, 2 rows affected (0.56 sec)
Records: 2  Duplicates: 0  Warnings: 0

mysql 5.6 > SELECT * FROM catalog where match(title) against ('pre-school' IN BOOLEAN MODE);
+----+------------------------------+
| id | title                        |
+----+------------------------------+
|  1 | A teacher's pre-school Japan |
+----+------------------------------+
1 row in set (0.00 sec)
[16 Aug 2013 3:04] Jimmy Yang
Ok, this is really not a bug. MyISAM never indexes "-", it only either treat it as negate operator or ignores it.

Also, please avoid "pre" in the search, since MyISAM do not index < 4 character words by default.

Let's insert a couple of more rows to show how MyISAM deals with the "-":

mysql> CREATE TABLE `catalog` (
    ->   `id` int(10) unsigned NOT NULL AUTO_INCREMENT,
    ->    `title` text NOT NULL,
    ->    PRIMARY KEY (`id`),
    ->    FULLTEXT KEY `title_ftx` (`title`)
    ->  ) ENGINE=myisam DEFAULT CHARSET=utf8;
Query OK, 0 rows affected (0.10 sec)

mysql> INSERT INTO catalog(`title`) VALUES
    ->   ('A teacher\'s pre-school Japan'),
    ->   ('A teacher\'s handbook in health and science for grade III');
Query OK, 2 rows affected (0.04 sec)
Records: 2  Duplicates: 0  Warnings: 0

Add 2 more rows:

mysql> INSERT INTO catalog(`title`) VALUES
    -> ('a teacher-school');
Query OK, 1 row affected (0.00 sec)

mysql> INSERT INTO catalog(`title`) VALUES ('a teacher mmmn school');
Query OK, 1 row affected (0.00 sec)

 SELECT * FROM catalog where match(title) against ('teacher-school' IN
BOOLEAN MODE);
+----+----------------------------------------------------------+
| id | title                                                    |
+----+----------------------------------------------------------+
|  1 | A teacher's pre-school Japan                             |
|  2 | A teacher's handbook in health and science for grade III |
|  3 | a teacher-school                                         |
|  4 | a teacher mmmn school                                    |
+----+----------------------------------------------------------+

You can see MyISAM really not include hyphen in the search either. It is look
for "teacher" and "school", not "teacher-school".

The only trick for MyISAM is that if the "-" is not immediately preceeds to
the word, then it ignores it :

mysql> SELECT * FROM catalog where match(title) against ('teacher - school'
IN BOOLEAN MODE);
+----+----------------------------------------------------------+
| id | title                                                    |
+----+----------------------------------------------------------+
|  1 | A teacher's pre-school Japan                             |
|  2 | A teacher's handbook in health and science for grade III |
|  3 | a teacher-school                                         |
|  4 | a teacher mmmn school                                    |
+----+----------------------------------------------------------+
4 rows in set (0.00 sec)

So in this sense, InnoDB is better than MyISAM. And result in your case is
expected, since we catch the "-" no matter how you place it. (Please note
MyISAM either ignores it, or treat it as negate,it never indexes it or look
for it)
[16 Aug 2013 7:07] Jimmy Yang
But we will deal with hyphens that has no space before/after it, and treat them as ignored.