MySQL Bugs: #18695: Fine grain full text min_word

Bug #18695	Fine grain full text min_word_len specification
Submitted:	31 Mar 2006 17:03	Modified:	21 Jul 2006 14:27
Reporter:	Freek Dijkstra
Status:	Duplicate
Category:	MySQL Server	Severity:	S4 (Feature request)
Version:	5.1	OS:	Linux (Linux)
Assigned to:		CPU Architecture:	Any

Description:
Currently, full-text searches use indexes that ignore words shorter than 4 characters. This length can be tuned by setting ft_min_word_len in an option file.

Searches in many databases with technical documentation rely on shorter words (e.g. 3 or even 2 character abbreviations). However, this change can only be made on a per-server basis, which is often undesirable on production servers.
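
To illustrate the effect described above, here is a minimal Python sketch of a length-filtered word list. It is a simplified model only, not MySQL's actual parser: the function name indexable_words, the regular expression used to split words, and the sample sentence are all assumptions made for this example.

```python
import re

def indexable_words(text, min_word_len=4):
    """Return the words that survive a minimum-length filter.

    Simplified model: words are runs of letters, digits and underscores,
    and anything shorter than min_word_len is dropped.
    """
    words = re.findall(r"[A-Za-z0-9_]+", text)
    return [w for w in words if len(w) >= min_word_len]

doc = "Configure the TCP and IP stack before enabling DNS"

# With a threshold of 4, every abbreviation in the sentence is lost.
default_words = indexable_words(doc)

# With a threshold of 2, TCP, IP and DNS are kept (as are "the" and "and").
short_words = indexable_words(doc, min_word_len=2)
```

In this model a search for "TCP" or "DNS" can never match under the default threshold, which is the problem for technical documentation.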

How to repeat:
This is documented behaviour; see http://dev.mysql.com/doc/refman/5.1/en/fulltext-fine-tuning.html

Suggested fix:
Make control of the full-text min_word_len finer grained, for example on a per-database, per-table, per-column or per-index basis. The most logical choice would be to allow it to be specified when the index is defined (e.g. in the CREATE INDEX specification, if possible).

Alternatively, a more radical solution is to remove arbitrary and language-specific details like min_word_len and the ignore list. They do serve a valid purpose: they improve performance by limiting the size of the index file. But the fact that these short or frequently used words pollute the search results should not be the concern of the MySQL backend; that is an application-specific problem (since fine-tuning search is indeed very application-specific!), and should thus be handled by, for example, code that filters the search parameters given by the user.
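
As a rough sketch of what such application-side filtering could look like, the Python below drops unwanted terms from a user's query before it is sent to the database. The function name filter_search_terms, the ignore list contents and the default minimum length are hypothetical choices for this example; each application would pick its own.

```python
# Hypothetical, application-chosen ignore list (not MySQL's built-in list).
IGNORE_WORDS = {"the", "and", "of", "for", "with"}

def filter_search_terms(query, min_len=2, ignore_words=IGNORE_WORDS):
    """Split a user query on whitespace and drop terms the application
    considers noise: terms shorter than min_len or on the ignore list."""
    terms = query.split()
    return [
        t for t in terms
        if len(t) >= min_len and t.lower() not in ignore_words
    ]

# Short abbreviations survive; filler words and one-letter terms do not.
terms = filter_search_terms("the IP of a DNS server")
```

With this approach the index can hold every word, and each application decides for itself which search terms are worth passing on.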

This is a duplicate of #12657. My apologies for not detecting that earlier.

Thank you for the feedback.