Bug #4152 Fulltext 50% rule
Submitted: 15 Jun 2004 20:37
Modified: 28 Aug 2009 21:11
Reporter: Timothy Crider
Status: Verified
Category: MySQL Server: MyISAM storage engine
Severity: S4 (Feature request)
Version: 5.0.0
OS: Any (Any)
Assigned to:
CPU Architecture: Any

[15 Jun 2004 20:37] Timothy Crider
Description:
I was wondering how difficult it would be to turn two parts of the fulltext 50% rule into variables that could be set. The two variables I would like to see are:

FS_MIN_ROWS

and

FS_LIMIT_PERCENT

 or something of that nature. The first variable (FS_MIN_ROWS) would disable the 50% rule if the total number of rows in the table was below it.

For example, if FS_MIN_ROWS were set to 100 and my table had 50 rows, the 50% rule would not affect the result and cause the search to return 0 rows.

The second variable would allow a user to define a percent other than 50% for the fulltext ceiling.
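
As a minimal sketch of the requested behavior (Python, purely illustrative): the function name `word_is_ignored` and the parameters `fs_min_rows` and `fs_limit_percent` are hypothetical and only mirror the two proposed variables. Nothing here reflects how the server implements the rule internally.

```python
def word_is_ignored(matching_rows, total_rows, fs_min_rows=0, fs_limit_percent=50.0):
    """Return True if a word would be treated like a stopword.

    Hypothetical model of the proposal:
      fs_min_rows      - tables with fewer rows than this skip the rule entirely
      fs_limit_percent - configurable ceiling replacing the fixed 50%
    """
    if total_rows <= 0:
        return False  # empty table: nothing to ignore
    # Proposed FS_MIN_ROWS: disable the rule for small tables.
    if total_rows < fs_min_rows:
        return False
    # Proposed FS_LIMIT_PERCENT: ignore the word once it appears in at
    # least this percentage of rows (the defaults model the current 50% rule).
    return matching_rows * 100 >= fs_limit_percent * total_rows


# Defaults: a word in 25 of 50 rows hits the 50% ceiling and is ignored.
print(word_is_ignored(25, 50))
# With fs_min_rows=100, the same 50-row table is exempt from the rule.
print(word_is_ignored(25, 50, fs_min_rows=100))
# With a 75% ceiling, a word in 60 of 100 rows is still searchable.
print(word_is_ignored(60, 100, fs_limit_percent=75))
```

With the defaults (`fs_min_rows=0`, `fs_limit_percent=50.0`) the sketch behaves like the rule described in the manual, so both settings would be opt-in.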

 If you have any questions please let me know.

How to repeat:
This is taken from the manual: http://dev.mysql.com/doc/mysql/en/Fulltext_Search.html

mysql> SELECT * FROM articles
    -> WHERE MATCH (title,body) AGAINST ('MySQL');
Empty set (0.00 sec)

The search result is empty because the word "MySQL" is present in at least 50% of the rows. As such, it is effectively treated as a stopword. For large datasets, this is the most desirable behavior: a natural language query should not return every second row from a 1GB table. For small datasets, it may be less desirable.

A word that matches half of the rows in a table is less likely to locate relevant documents. In fact, it will most likely find plenty of irrelevant documents. We all know this happens far too often when we are trying to find something on the Internet with a search engine. It is with this reasoning that rows containing the word are assigned a low semantic value for the particular dataset in which they occur. A given word may exceed the 50% threshold in one dataset but not another.

The 50% threshold has a significant implication when you first try full-text searching to see how it works: If you create a table and insert only one or two rows of text into it, every word in the text occurs in at least 50% of the rows. As a result, no search returns any results. Be sure to insert at least three rows, and preferably many more.
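
The small-table effect can be checked with a rough simulation. This is a Python sketch under simplifying assumptions: the helper `searchable_words` is hypothetical, the tokenizer is a plain regular expression, and the server's minimum word length and built-in stopword list are ignored. It only counts, per word, the share of rows that contain it.

```python
import re


def searchable_words(rows, limit_percent=50.0):
    """Return the words that stay below the row-percentage ceiling."""
    total = len(rows)
    counts = {}
    for text in rows:
        # Count each word once per row, case-insensitively.
        for word in set(re.findall(r"[a-z0-9_]+", text.lower())):
            counts[word] = counts.get(word, 0) + 1
    # A word present in at least limit_percent of the rows is dropped.
    return sorted(w for w, n in counts.items() if n * 100 < limit_percent * total)


two_rows = ["MySQL tutorial", "Optimizing queries"]
three_rows = two_rows + ["MySQL security"]

# Every word occurs in 1 of 2 rows (50%), so nothing is searchable.
print(searchable_words(two_rows))
# With 3 rows, only "mysql" (2 of 3 rows) is dropped.
print(searchable_words(three_rows))
```

With two rows the list is empty, which matches the manual's warning. With a third row, every word except "mysql" becomes searchable.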

[16 Jun 2004 15:53] Timothy Crider
Changing MySQL version.

[28 Aug 2009 21:11] Sveta Smirnova
Thank you for the reasonable feature request.