Bug #14194 Problem with fulltext boolean search and apostrophe
Submitted: 20 Oct 2005 23:09 Modified: 18 Jan 2006 0:49
Reporter: jj a
Status: Closed
Category:Server Severity:S3 (Non-critical)
Version:4.1.11 OS:Microsoft Windows (Windows)
Assigned to: Sergey Vojtovich Target Version:

[20 Oct 2005 23:09] jj a
Description:
SELECT titre,texte FROM editorial WHERE MATCH (titre,texte) AGAINST ('étude' IN BOOLEAN
MODE)

returns no match whereas 

SELECT titre,texte FROM editorial WHERE MATCH (titre,texte) AGAINST ('l\'étude' IN
BOOLEAN MODE)

returns a match. It is a big problem in french, for instance,  where l' m' n' s' are
common prefixes to a lot of words like in the following sentence:

" l'ouragan m'asperge d'embruns de l'océan "

You cannot find any of these words in fulltext search unless you look for `l'ouragan` or
`d'embruns`... You do not want do that. Really annoying...

How to repeat:
CREATE TABLE `editorial` (
  `title` varchar(255) default NULL,
  `searchtext` text NOT NULL,
  FULLTEXT KEY `titre` (`titre`,`searchtext`)
) TYPE=MyISAM;

-- 
-- Contenu de la table `editorial`
-- 
INSERT INTO `editorial` 
VALUES 
('Des effets de la poésie sur le génie des langues', 
'l''ouragan m''asperge d''embruns de l''océan');

Perform the following query:
SELECT titre FROM editorial WHERE MATCH (titre,searchtext) AGAINST ('ouragan' IN BOOLEAN
MODE);

No match.........ggggggrrrr................

Suggested fix:
None
[20 Oct 2005 23:50] Miguel Solorzano
Thank you for taking the time to write to us, but this is not
a bug. Please double-check the documentation available at
http://www.mysql.com/documentation/ and the instructions on
how to report a bug at http://bugs.mysql.com/how-to-report.php

Additional info:

http://dev.mysql.com/doc/refman/5.0/en/fulltext-search.html

The MySQL FULLTEXT implementation regards any sequence of true word characters (letters,
digits, and underscores) as a word. That sequence may also contain apostrophes ('), but
not more than one in a row. This means that aaa'bbb is regarded as one word, but aaa''bbb
is regarded as two words. Apostrophes at the beginning or the end of a word are stripped
by the FULLTEXT parser; 'aaa'bbb'  would be parsed as aaa'bbb.

So l'ouragan is regarded as one word and not just ouragan.
[21 Oct 2005 8:43] jj a
I definitely think that it is a BUG from a linguistic standpoint.

Take the sentence : "I smashed Jerry's car" , you will not be able to find "Jerry" in the
text unless you KNOW that it followed by the apostrophe. I do not see the point of looking
at that as a feature; fulltext search becomes almost useless in languages making heavy use
of the apostrophe (Hey I am french!). Every french word that starts with a vowel is a
candidate for not being searchable. I found tons of forum messages on the web of puzzled
italians, french,etc... Even in english, it is a problem, come on! 

Solution: If people want to search -Jerry's- in the text, they should enclose it in
double-quotes as stated in the doc -> "Jerry's". Doesn't that seem logical?
[21 Oct 2005 13:55] Sergei Golubchik
This change cannot be done in 4.1, but we'll consider doing it in 5.0
[22 Oct 2005 12:11] jj a
Fixing that in 5.0 would be great and a good reason to upgrade to that version for a
number of latin languages developers
[13 Jan 2006 18:40] Sergey Vojtovich
This bug cannot be fixed in 5.0, as 5.0 is now stable.
Will be fixed in 5.1.
[13 Jan 2006 20:34] Bugs System
A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/1056
[16 Jan 2006 20:37] Sergey Vojtovich
Fixed in 5.1.6.
[16 Jan 2006 23:17] Mike Hillyer
Added to 5.1.6 changelog:

<listitem>
        <para>
          Words with apostrophes are now matched in a FULLTEXT search
          against  non-apostrophe words (i.e. a search for
          <literal>Jerry</literal> will match against the term
          <literal>Jerry's</literal>). (Bug #14194)
        </para>
      </listitem>
[17 Jan 2006 11:51] jj a
Thanks guys for the 5.1.6 fix! One more reason to upgrade.
I think not only me will appreciate that.
JJ
[17 Jan 2006 11:57] Sergey Vojtovich
Mike, I'm sorry, but I forgot one thing that deserves to be mentioned in changelog.
Upgrade to version, that include this changeset, will most likely result in broken
fulltext index, since parsing rules has been changed. REPAIR TABLE statement should
be issued for tables having fulltext indexes.

jj a: you're always welcome! :)
[18 Jan 2006 0:49] Mike Hillyer
REPAIR TABLE warning added to changelog entry.