Bug #46325 Fulltext search engine does not find matches with a small number of rows.
Submitted: 21 Jul 2009 17:32 Modified: 21 Jul 2009 18:44
Reporter: Tom Kleinpeter Email Updates:
Status: Not a Bug Impact on me:
None 
Category:MySQL Server: FULLTEXT search Severity:S2 (Serious)
Version:5.1.34 OS:Any
Assigned to: CPU Architecture:Any
Tags: full-text

[21 Jul 2009 17:32] Tom Kleinpeter
Description:
I was trying to build a set of unit tests for my system.  The first step was to delete all the rows from my test table, insert a single row, and make sure that the fulltext search engine finds it.  This does not work.  

How to repeat:
Set up the table:

create table temp (str text, fulltext str (str));
insert temp (str) values ('aaaa bbbb cccc'), ('bbbb cccc dddd' );

This should work, but it finds 0 rows:
mysql> select * from temp where match (str) against ('aaaa bbbb cccc');
Empty set (0.00 sec)

Now add one more row:
insert temp (str) values ('cccc dddd eeee');

And now the select works as expected:
mysql> select * from temp where match (str) against ('aaaa bbbb cccc');
+----------------+
| str            |
+----------------+
| aaaa bbbb cccc | 
+----------------+
1 row in set (0.00 sec)

Suggested fix:
Make the search work, no matter how many rows are in the table.  This is crazy that my unit tests will need to fill the table in with a bunch of random values before the index works as expected.
[21 Jul 2009 18:10] Valeriy Kravchuk
Please, check the manual, http://dev.mysql.com/doc/refman/5.1/en/fulltext-search.html:

"A natural language search interprets the search string as a phrase in natural human language (a phrase in free text). There are no special operators. The stopword list applies. In addition, words that are present in 50% or more of the rows are considered common and do not match. Full-text searches are natural language searches if the IN NATURAL LANGUAGE MODE modifier is given or if no modifier is given."

With only 2 rows in table whatever word you search for is either not present in any row or is present in 50% or more of rows, hence the result. I'd say this is not a bug. Just use at least 3 rows in your unit tests.
[21 Jul 2009 18:17] Tom Kleinpeter
I agree that this is conforming to the letter of the law, but does it make sense?  The 50% limitation is to restrict words that are present in "too many" rows.  The idea that word is present in only a single row, but is being filtered because it is present in "too many" rows doesn't seem right.  

Obviously, I will have to fix my unit tests, because it would take months to get a fix for this.  But for this number of rows, this is a bad rule.  It makes it hard for developers.  It makes unit tests more complicated than they should be.  It will waste developer's time and make them think your product is flaky, which is not the perception you want.
[21 Jul 2009 18:36] Valeriy Kravchuk
Maybe you should just use IN BOOLEAN MODE (or just LIKE?) for cases like this:

mysql>  select * from temp where match (str) against ('+aaaa+bbbb+cccc' in boolean mode);
+----------------+
| str            |
+----------------+
| aaaa bbbb cccc | 
+----------------+
1 row in set (0.00 sec)

Check http://dev.mysql.com/doc/refman/5.1/en/fulltext-boolean.html, please.

And surely, even if we will prove and agree that current behavior of natural language search in MySQL is NOT something that is beneficial for the majority of users, request to change it will not be satisfied really soon, taking clear documentation and possible workarounds into account.
[21 Jul 2009 18:44] Tom Kleinpeter
Unfortunately, I can't use boolean mode -- I need to find rows that might not contain every word I'm searching for.  

I've worked around this in my unit tests, but if you want make MySQL a more developer-friendly piece of software, you'll fix problems like this.