Bug #21909 Need workaround for word parsing behavior in fulltext searches (esp for hyphens)
Submitted: 29 Aug 2006 21:27 Modified: 30 Aug 2006 14:31
Reporter: Ryan Kaldari
Status: Not a Bug
Category: MySQL Server Severity: S4 (Feature request)
Version: 3.23+ OS: Windows (Win2K, Linux)
Assigned to: CPU Architecture: Any

[29 Aug 2006 21:27] Ryan Kaldari
Description:
This problem was previously reported as Bug #2095. The reply to that bug was "it's a feature, not a bug" and the bug was closed. Whether you officially want to call it a bug or not is irrelevant. It is, however, certainly a significant problem that needs some type of solution (besides hacking MySQL and recompiling, which was the suggested workaround). Right now fulltext searching is useless unless you know that your data isn't going to contain any hyphens whatsoever. This is because MySQL "uses a very simple parser to split text into words. A ``word'' is any sequence of characters consisting of letters, digits, `'', and `_'." Why hyphens aren't considered in the same way as underscores is a mystery to me. Regardless, there needs to be a way to override this behavior (or else the default parsing behavior needs to be changed). Otherwise fulltext searching is pretty much useless. You can't use it to search a table of names (since many last names are hyphenated); you can't use it to search a table of movies (X-Men); you can't use it to search a table of newspapers (Clarion-Ledger). Do I really need to go on?

How to repeat:
mysql> CREATE TABLE articles (
    ->   id INT UNSIGNED AUTO_INCREMENT NOT NULL PRIMARY KEY,
    ->   title VARCHAR(200),
    ->   body TEXT,
    ->   FULLTEXT (title,body)
    -> );

mysql> INSERT INTO articles VALUES
    -> (NULL,'MySQL Tutorial', 'DBMS stands for DataBase ...'),
    -> (NULL,'MySQL vs. YourSQL', 'In the following database comparison ...'),
    -> (NULL,'Searching for Hyphenized Words', 'POL-BN: test entry');

mysql> SELECT * FROM articles
    ->          WHERE MATCH (title,body) AGAINST ('POL-BN');
Empty set (0.00 sec)
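
The empty result is consistent with the parser rule quoted in the description. The following is only a rough Python sketch of that rule, not MySQL's actual parser. It assumes a word is a run of ASCII letters, digits, apostrophes, and underscores, and it assumes the default minimum indexed word length of 4 (the ft_min_word_len setting in later versions). The names tokenize and indexable_words are hypothetical, and stopwords and character sets are ignored.

```python
import re

# Assumption: a "word" is a run of letters, digits, apostrophes, and
# underscores, as in the documentation quote. Everything else, including
# the hyphen, acts as a separator. ASCII only, for simplicity.
WORD_RE = re.compile(r"[A-Za-z0-9_']+")

# Assumption: words shorter than this are not indexed (default of 4).
MIN_WORD_LEN = 4


def tokenize(text):
    """Split text into words using the simple rule above."""
    return WORD_RE.findall(text)


def indexable_words(text, min_len=MIN_WORD_LEN):
    """Return the lowercased words long enough to be indexed."""
    return [w.lower() for w in tokenize(text) if len(w) >= min_len]


if __name__ == "__main__":
    # The hyphen splits the term into two short words.
    print(tokenize("POL-BN: test entry"))
    # Neither half reaches the minimum length, so nothing is searchable.
    print(indexable_words("POL-BN"))
    # Both halves are long enough, so each is indexed as a separate word.
    print(indexable_words("Clarion-Ledger"))
```

Under these assumptions, 'POL-BN' and 'X-Men' leave no indexable words at all, while 'Clarion-Ledger' is indexed as two separate words rather than as one hyphenated term.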
[30 Aug 2006 0:18] MySQL Verification Team
Thank you for the feature request.
[30 Aug 2006 9:27] Sergey Vojtovich
Thank you for taking the time to write to us, but this is not a bug. Please double-check the documentation available at http://dev.mysql.com/doc/ and the instructions on how to report a bug at http://bugs.mysql.com/how-to-report.php

If you want to change the parser's behavior, please use the full-text parser plugin feature in MySQL 5.1. More information is available at: http://dev.mysql.com/doc/refman/5.1/en/plugin-full-text-plugins.html
[30 Aug 2006 14:29] Ryan Kaldari
Why is there no mention of that in the fulltext documentation??
[30 Aug 2006 14:31] Ryan Kaldari
Never mind, I didn't look in the Version 5 documentation :P