Bug #2095 | full-text search for words containing hyphens won't work | ||
---|---|---|---|
Submitted: | 11 Dec 2003 5:15 | Modified: | 11 Dec 2003 9:43 |
Reporter: | [ name withheld ] | Email Updates: | |
Status: | Not a Bug | Impact on me: | |
Category: | MySQL Server | Severity: | S3 (Non-critical) |
Version: | 4.0.16 | OS: | Windows (Win2K, Linux) |
Assigned to: | Sergei Golubchik | CPU Architecture: | Any |
[11 Dec 2003 5:15]
[ name withheld ]
[11 Dec 2003 7:11]
Sergei Golubchik
Thank you for taking the time to write to us, but this is not a bug. Please double-check the documentation available at http://www.mysql.com/documentation/ and the instructions on how to report a bug at http://bugs.mysql.com/how-to-report.php Additional info: According to the manual: http://www.mysql.com/doc/en/Fulltext_Search.html "MySQL uses a very simple parser to split text into words. A ``word'' is any sequence of characters consisting of letters, digits, `'', and `_'." That is this behavious is not a bug.
[11 Dec 2003 8:44]
[ name withheld ]
Ok, you're right. Do you know a workaround for a full-text search for terms like "POL-BN" ?
[11 Dec 2003 9:43]
Sergei Golubchik
Yes. If you can rebuild mysqld, you can the desired behaviour fairy easily. In the myisam/ftdefs.h file there is #define HYPHEN_IS_DELIM you need to comment it out.
[2 Aug 2006 14:10]
Rob Bach
Rebuilding for me is not a suitable workaround for this problem. Is there any other workaround? Searching for model numbers such as (ms-100) with mysql fulltext search produces undesired results. Is there a simplier solution such as editing a config file or something? Fulltext search is very powerful and easy to implement but this hyphen problem limits it's use for me. See it in action at: http://www.marblefountains.com , check the model number issue and see if it resolved.
[29 Aug 2006 21:08]
Ryan Kaldari
How can this not be a bug? It makes fulltext searching useless. Very few sets of data are guarenteed to not have any words with hyphens in them. Let's see, you can't use it to search a table of names (since many last names are hyphenated); you can't use it seach a table of Movies (X-Men); you can't use it search a table of Newspapers (Clarion-Ledger); Do I really need to go on? Obviously there needs to be an easy workaround to override the default word parsing behavior here.
[15 Sep 2006 19:17]
Sergei Golubchik
There is another workaround. It was recently added to the manual: " Modify a character set file: This requires no recompilation. The true_word_char() macro uses a “character type” table to distinguish letters and numbers from other characters. . You can edit the <ctype><map> contents in one of the character set XML files to specify that '-' is a “letter.” Then use the given character set for your FULLTEXT indexes. "
[6 Dec 2006 15:54]
Dale Greer
Please explain this fix. The <ctype><map> of which you speak is very cryptic. Lots of people are talking about this on the web, and I've not seen anyone who knows how to make the edit you're suggesting. Someone else said you shouldn't do that at all, but should make your own parser plugin, as if just anyone would know how to do that.
[10 Aug 2008 9:58]
hans paijmans
full text search with hyphens definitly is broken. I changed the charactermap as avised, to allow for hyphens, and then in *mostly* works. BUT... in a database with dutch city names, in which all three examples exist: handler lijst read asciiname=("etten-leur"); // works handler lijst read asciiname=("'s gravenmoer"); // works handler lijst read asciiname=("'s-hertogenbosch"); // does *not* work. handler lijst read asciiname=("'s-gravenhage"); // does *not* work. but again: handler lijst read asciiname=("station s'hertogenbosch"); // works !!! I am at my wits end (which, admittedly, is not very far) Paai
[4 Nov 2010 14:52]
Sheeri Cabral
I think this is worth revisiting, and I put my vote in for an easy way to change whether or not a hyphen is a delimiter/word boundary or a word character.
[4 Nov 2010 16:36]
Julien Lamarche
My employer could also use a change in this behavior (or an easier way to change this behavior). We have one customer upset because they are searching terms with hypens in them ("F-35") and getting too many results back.