Bug #46325 | Fulltext search engine does not find matches with a small number of rows. | ||
---|---|---|---|
Submitted: | 21 Jul 2009 17:32 | Modified: | 21 Jul 2009 18:44 |
Reporter: | Tom Kleinpeter | Email Updates: | |
Status: | Not a Bug | Impact on me: | |
Category: | MySQL Server: FULLTEXT search | Severity: | S2 (Serious) |
Version: | 5.1.34 | OS: | Any |
Assigned to: | CPU Architecture: | Any | |
Tags: | full-text |
[21 Jul 2009 17:32]
Tom Kleinpeter
[21 Jul 2009 18:10]
Valeriy Kravchuk
Please, check the manual, http://dev.mysql.com/doc/refman/5.1/en/fulltext-search.html: "A natural language search interprets the search string as a phrase in natural human language (a phrase in free text). There are no special operators. The stopword list applies. In addition, words that are present in 50% or more of the rows are considered common and do not match. Full-text searches are natural language searches if the IN NATURAL LANGUAGE MODE modifier is given or if no modifier is given." With only 2 rows in table whatever word you search for is either not present in any row or is present in 50% or more of rows, hence the result. I'd say this is not a bug. Just use at least 3 rows in your unit tests.
[21 Jul 2009 18:17]
Tom Kleinpeter
I agree that this is conforming to the letter of the law, but does it make sense? The 50% limitation is to restrict words that are present in "too many" rows. The idea that word is present in only a single row, but is being filtered because it is present in "too many" rows doesn't seem right. Obviously, I will have to fix my unit tests, because it would take months to get a fix for this. But for this number of rows, this is a bad rule. It makes it hard for developers. It makes unit tests more complicated than they should be. It will waste developer's time and make them think your product is flaky, which is not the perception you want.
[21 Jul 2009 18:36]
Valeriy Kravchuk
Maybe you should just use IN BOOLEAN MODE (or just LIKE?) for cases like this: mysql> select * from temp where match (str) against ('+aaaa+bbbb+cccc' in boolean mode); +----------------+ | str | +----------------+ | aaaa bbbb cccc | +----------------+ 1 row in set (0.00 sec) Check http://dev.mysql.com/doc/refman/5.1/en/fulltext-boolean.html, please. And surely, even if we will prove and agree that current behavior of natural language search in MySQL is NOT something that is beneficial for the majority of users, request to change it will not be satisfied really soon, taking clear documentation and possible workarounds into account.
[21 Jul 2009 18:44]
Tom Kleinpeter
Unfortunately, I can't use boolean mode -- I need to find rows that might not contain every word I'm searching for. I've worked around this in my unit tests, but if you want make MySQL a more developer-friendly piece of software, you'll fix problems like this.