Bug #18695 Fine grain full text min_word_len specification
Submitted: 31 Mar 2006 17:03 Modified: 21 Jul 2006 14:27
Reporter: Freek Dijkstra
Status: Duplicate
Category:MySQL Server Severity:S4 (Feature request)
Version:5.1 OS:Linux (Linux)
Assigned to: CPU Architecture:Any

[31 Mar 2006 17:03] Freek Dijkstra
Description:
Currently, full-text searches use indexes that ignore words shorter than 4 characters. This minimum length can be tuned with the ft_min_word_len setting in an option file.

Searches in many databases with technical documentation rely on shorter words (e.g. abbreviations of 3 or even 2 characters). However, ft_min_word_len can only be changed on a per-server basis, which is often undesirable on production servers.

How to repeat:
This is documented behaviour; see http://dev.mysql.com/doc/refman/5.1/en/fulltext-fine-tuning.html

Suggested fix:
Make control of the full-text min_word_len finer grained, for example on a per-database, per-table, per-column or per-index basis. The most logical choice would be to allow it to be specified when defining the index (e.g. in the CREATE INDEX specification, if possible).

Alternatively, a more radical solution is to remove arbitrary and language-specific details like min_word_len and the ignore list altogether. They do have a valid purpose, namely improving performance by limiting the size of the index file, but the fact that these short or frequently used words pollute the search results should not be a concern of the MySQL backend. That is an application-specific problem (since fine-tuning search is indeed very application-specific!) and should thus be handled by, for example, code that filters the search parameters given by the user.
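
To illustrate the kind of application-side filtering meant here, a minimal sketch in Python follows. The names filter_search_terms, MIN_WORD_LEN and STOPWORDS are hypothetical and chosen only for this example; they are not MySQL options, and the stopword list is a made-up sample rather than the server's built-in ignore list.

```python
import re

# Hypothetical application-level settings (assumptions for this sketch,
# not MySQL server variables).
MIN_WORD_LEN = 2
STOPWORDS = frozenset({"the", "and", "of", "to", "in"})


def filter_search_terms(query, min_len=MIN_WORD_LEN, stopwords=STOPWORDS):
    """Return the words of `query` the application wants to search for.

    Words shorter than `min_len` and words in `stopwords` are dropped
    before the remaining terms are handed to the full-text query.
    """
    words = re.findall(r"\w+", query.lower())
    return [w for w in words if len(w) >= min_len and w not in stopwords]


if __name__ == "__main__":
    print(filter_search_terms("The IP of a VM in the DMZ"))
```

With a filter like this, each application could pick its own minimum length and ignore list, and the backend would only need to index every word.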
[21 Jul 2006 11:44] Freek Dijkstra
This is a duplicate of #12657. My apologies for not detecting that earlier.
[21 Jul 2006 14:27] MySQL Verification Team
Thank you for the feedback.