Bug #19583 Fulltext boolean mode search stopword issue
Submitted: 6 May 2006 16:05
Modified: 4 Aug 2006 9:32
Reporter: Jonathan Fiene
Status: Won't fix
Category: MySQL Server
Severity: S3 (Non-critical)
Version: 4.1.14
OS: Linux
Assigned to: Sergey Vojtovich
CPU Architecture: Any

[6 May 2006 16:05] Jonathan Fiene
Description:
I am trying to do a fulltext search for a set of words.  If none of the words are in the stopword list, this works wonderfully.  However, stopwords seem to be causing it to fail.  I'm fairly certain that it has to do with placing the wildcard (*) after the stopword.  If I run:

SELECT id FROM catalog WHERE MATCH (category) AGAINST ('+dainty* +one*' IN BOOLEAN MODE);

it returns 0 rows, but I know there are 50 rows that contain both 'dainty' and 'one'.  If I remove the wildcard after 'one':

SELECT id FROM catalog WHERE MATCH (category, subcat) AGAINST ('+dainty* +one' IN BOOLEAN MODE);

the query returns 50 rows, matching only the word 'dainty' because 'one' is a stopword.  I need to be able to mark each word as required (+) and apply the wildcard (*) to it, but this does not seem to be handled correctly when any of the words is in the stopword list.

I've tried this in version 5 as well, and got the same results.

How to repeat:
Run a fulltext search in boolean mode using both '+' before and '*' after each word in the search string.  Then try including a stopword as one of the words and watch the results disappear...

Suggested fix:
The behavior of the stopword should not be affected by the presence of the wildcard.  I would expect it to ignore all instances of the stopword, but the stopword plus anything else should be included in the results.  

Is it possible to disable stopwords on a table-by-table basis?  Otherwise, is it possible to retrieve a list of the stopwords into PHP from MySQL so that I can pre-condition the search string?
[11 May 2006 17:11] Valeriy Kravchuk
Thank you for the problem report. Please send the exact CREATE TABLE statement and some test data that demonstrate the bug. Please also check with a newer version, 4.1.19.
[11 Jun 2006 23:00] Bugs System
No feedback was provided for this bug for over a month, so it is
being suspended automatically. If you are able to provide the
information that was originally requested, please do so and change
the status of the bug back to "Open".
[4 Aug 2006 9:32] Sergey Vojtovich
For now, '+one*' means that at least one non-stopword starting with 'one' must be present in the document. As a workaround, you can either remove this word from the stopword list or switch the stopword feature off. Unfortunately, there is no way to do this on a table-by-table basis.
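
A minimal sketch of the "switch stopwords off" workaround, assuming the MyISAM table catalog and the columns category and subcat from the report. The option-file lines are an assumption about the local setup and are shown as comments; ft_stopword_file is read at server startup, so the server has to be restarted after changing it.

```sql
-- Assumed option-file fragment (restart the server afterwards):
--   [mysqld]
--   ft_stopword_file = ''
--
-- Existing FULLTEXT indexes were built with the old stopword list,
-- so they need to be rebuilt once the server is back up:
REPAIR TABLE catalog QUICK;

-- Re-run the query from the report. Note that short words such as
-- 'one' are also subject to ft_min_word_len (default 4), so that
-- setting may need to be lowered as well for 'one' to be indexed.
SELECT id
FROM catalog
WHERE MATCH (category, subcat)
      AGAINST ('+dainty* +one*' IN BOOLEAN MODE);
```

Because the setting is server-wide, this changes the behavior of every FULLTEXT index on the server, not only the one on catalog.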

There is a possible way to retrieve the stopwords. MySQL has the ft_stopword_file server variable. If its value is built-in, you can find the array of stopwords in myisam/ft_static.c; otherwise, you can read the stopword file.
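
As a small sketch, the stopword source in use can be checked from any client connection (including PHP) with a standard SHOW VARIABLES statement. How to read the result is taken from the comment above rather than verified against a particular server version.

```sql
-- Show where the server takes its stopword list from.
SHOW VARIABLES LIKE 'ft_stopword_file';

-- If the Value column indicates the built-in list, the words are the ones
-- compiled in from myisam/ft_static.c; otherwise the value is the
-- path of a stopword file that can be read directly.
```

With the list in hand, the application can drop stopwords from the search string before adding '+' and '*' to the remaining words.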

This problem is to be fixed in the future and will be tracked by WL#2573.