Bug #1589 Possible Bug working with MySQL's Full-text search.
Submitted: 17 Oct 2003 10:20
Modified: 17 Oct 2003 14:54
Reporter: Aaron Schmidt
Status: Won't fix
Category: MySQL Server
Severity: S3 (Non-critical)
Version: 4.0.13
OS: Windows (Windows 2000 SP3)
Assigned to: Sergei Golubchik
CPU Architecture: Any

[17 Oct 2003 10:20] Aaron Schmidt
Description:
Possible Bug working with MySQL's Full-text search.

[mysqld]
basedir=C:/MySQL
ft_min_word_len=3

=== WHAT WORKS ===
SELECT * FROM articles
WHERE MATCH (keywords, problem_title, problem_desc, solution_title, solution_desc)
AGAINST ('"E.INI"' IN BOOLEAN MODE)

> This query correctly retrieves 17 rows (out of 1682) that contain the string "E.INI".

=== WHAT DOESN'T WORK ===
SELECT * FROM articles
WHERE MATCH (keywords, problem_title, problem_desc, solution_title, solution_desc)
AGAINST ('"E.IN"' IN BOOLEAN MODE)

> Does not work: 0 rows retrieved.

Testing with a LIKE statement:
SELECT * FROM articles
WHERE keywords LIKE '%E.IN%' OR problem_title LIKE '%E.IN%' OR problem_desc LIKE '%E.IN%'
   OR solution_title LIKE '%E.IN%' OR solution_desc LIKE '%E.IN%';

> Retrieves 23 rows (out of 1682, about 1.4%). That is far below half the table, so the 50% rule is not what is hiding these rows.

=== REASON ? ===
I believe the problem is that the search splits the text "E.IN" at word boundaries, even though it is in quotes. The resulting words are 'E' (length 1) and 'IN' (length 2), and both are shorter than the minimum word length, which is currently set to 3 (see above).
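The splitting described above can be sketched in Python. This is a simplified model under stated assumptions, not MySQL's actual parser: it treats a "word" as a run of letters, digits, or underscores, lowercases the text, and drops words shorter than `ft_min_word_len`. The function name `index_words` is hypothetical.

```python
import re

FT_MIN_WORD_LEN = 3  # same value as the [mysqld] setting in this report


def index_words(text: str, min_len: int = FT_MIN_WORD_LEN) -> list[str]:
    """Split text at non-word characters and keep only words long enough to index.

    Simplified model (assumption): a word is a run of \\w characters.
    MySQL's real word rules differ in details.
    """
    words = re.findall(r"\w+", text.lower())
    return [w for w in words if len(w) >= min_len]


for phrase in ("E.INI", "E.IN"):
    print(phrase, "->", index_words(phrase))
# E.INI -> ['ini']
# E.IN -> []
```

In this model, "E.INI" still leaves the word "ini" to look up, while "E.IN" leaves nothing at all, which matches the 17 rows versus 0 rows seen above.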

=== PREFERRED FUNCTIONALITY ===
Within a Boolean search, anything in quotes should be treated as a single "word" and not be broken apart, with each piece checked separately against the minimum word length.

Any ideas or suggestions are welcome...

Thanks,

aSa

How to repeat:
See Description for examples.

In a nutshell: run a boolean search on a quoted string that is longer than the minimum word length but is split by non-word characters into pieces that are each shorter than the minimum.

Example:
AGAINST ('"22:45:78"' IN BOOLEAN MODE)
AGAINST ('"v5.3"' IN BOOLEAN MODE)

Suggested fix:
Treat any string in quotes as an entire word and check that length against the minimum word length.

AGAINST ('"22:45:78"' IN BOOLEAN MODE)
> length=8 (not 2:2:2)
AGAINST ('"v5.3"' IN BOOLEAN MODE)
> length=4 (not 2.1)
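The proposed rule can be contrasted with the current behaviour in a short Python sketch. It assumes the same simplified word definition as before (runs of letters, digits, or underscores); `piece_lengths` and `whole_length` are hypothetical helpers for illustration only.

```python
import re

FT_MIN_WORD_LEN = 3  # value from the [mysqld] section of this report


def piece_lengths(phrase: str) -> list[int]:
    """Lengths of the words the phrase is split into (simplified: runs of word characters)."""
    return [len(w) for w in re.findall(r"\w+", phrase)]


def whole_length(phrase: str) -> int:
    """Length of the quoted string taken as one unit, as the suggested fix proposes."""
    return len(phrase)


for phrase in ("22:45:78", "v5.3", "E.IN"):
    pieces = piece_lengths(phrase)
    indexable = any(n >= FT_MIN_WORD_LEN for n in pieces)
    print(f"{phrase!r}: whole={whole_length(phrase)}, pieces={pieces}, any piece indexable={indexable}")
```

Every whole string passes the length check of 3, but no individual piece does.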
[17 Oct 2003 14:54] Sergei Golubchik
Yes, you're right in guessing the reason.

Unfortunately, it cannot be fixed by "treating anything in quotes as a single word", because such a "word" will never be found in the index: the index contains only valid words.
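A toy inverted index shows why a quoted string cannot simply be looked up as one unit. This is a minimal sketch under assumptions: the rows are hypothetical, stopwords are ignored, and words are runs of word characters. The index only ever stores the individual words that survive the length filter, so a key like "e.in" never exists.

```python
import re
from collections import defaultdict


def build_index(rows: dict[int, str], min_len: int) -> dict[str, set[int]]:
    """Map each indexable word to the ids of the rows containing it (toy inverted index)."""
    index: dict[str, set[int]] = defaultdict(set)
    for row_id, text in rows.items():
        for word in re.findall(r"\w+", text.lower()):
            if len(word) >= min_len:
                index[word].add(row_id)
    return dict(index)


# Hypothetical rows, loosely modelled on the report.
rows = {1: "Edit the E.INI file", 2: "Open E.IN before saving", 3: "Unrelated text"}
index = build_index(rows, min_len=3)

print("e.in" in index)   # False: the quoted string is never stored as a single key
print(index.get("ini"))  # {1}
print(index.get("in"))   # None: too short to be indexed
```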

All I can suggest here is to set ft_min_word_len to 1 and use an empty stopword list (as both "e" and "in" are stopwords by default).
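The effect of this workaround can be sketched with the same simplified word model. The stopword set below contains only the two words named in the reply; the real default list is much longer. The helper name `indexable_words` is hypothetical. On the server side, the MySQL manual describes this as setting `ft_min_word_len=1`, setting `ft_stopword_file` to an empty string to disable stopword filtering, restarting, and then rebuilding the FULLTEXT indexes (for example with `REPAIR TABLE articles QUICK`).

```python
import re

# Only the two stopwords mentioned in the reply (assumption: a small subset of the default list).
DEFAULT_STOPWORDS_SUBSET = {"e", "in"}


def indexable_words(text: str, min_len: int, stopwords: set[str]) -> list[str]:
    """Keep words that are long enough and not stopwords (simplified model)."""
    words = re.findall(r"\w+", text.lower())
    return [w for w in words if len(w) >= min_len and w not in stopwords]


print(indexable_words("E.IN", 1, DEFAULT_STOPWORDS_SUBSET))  # []
print(indexable_words("E.IN", 1, set()))                     # ['e', 'in']
```

Lowering the minimum length alone is not enough here; both words only become indexable once the stopword list is also empty.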