Bug #69043 InnoDB FULLTEXT Phrase Logic Still Incorrect
Submitted: 23 Apr 2013 19:40 Modified: 12 Jun 2013 16:20
Reporter: Greg Dietz Email Updates:
Status: Duplicate Impact on me:
None 
Category:MySQL Server: FULLTEXT search Severity:S3 (Non-critical)
Version:5.6.11 OS:Windows
Assigned to: Assigned Account CPU Architecture:Any
Tags: fulltext, innodb

[23 Apr 2013 19:40] Greg Dietz
Description:
Despite my previous Bug #68150 "InnoDB FULLTEXT Phrase Logic Incorrect" being closed with the 5.6.11 release - apparently the person assigned this bug only tested the literal cases in that issue and didn't really test the capabilities fully.

In the previous bug I reported that 2 issues - the + sign in front of a phrase was mishandled and a phrase containing stop words would return 0 rows. Technically these two issues do appear to be partially fixed in 5.6.11 but this functionality is still very broken. 

I installed the 5.6.11 and rebuilt the tables so the fulltext index was rebuilt and ran a series of simple boolean mode queries.

1) Simple Required Phrase
select count(*) from article_sun where match(headline, byline, content)
against ('+"Immigration and Customs Enforcement"' in boolean mode);
--- Returns 8 records

2) Compound Required Term and Required Phrase
select count(*) from article_sun where match(headline, byline, content)
against ('+secure +"Immigration and Customs Enforcement"' in boolean mode);
--- Returns 3 records

3) Compound Required Phrase and Required Term
select count(*) from article_sun where match(headline, byline, content)
against ('+"Immigration and Customs Enforcement" +secure' in boolean mode);
--- Returns 4000 records

Queries 2 and 3 are logically identical in both the phrase and the term are required (+) but in query 3 there are 4000 records located even though only 8 in the entire database have that phrase.

4) Simple Phrase with a short non-indexed word
select count(*) from article_sun where match(headline, byline, content)
against ('"Department of Homeland Security"' in boolean mode);
--- Returns 72 records

5) Simple Partial phrase backed up by a LIKE clause
select count(*) from article_sun where match(headline, byline, content)
against ('+"Homeland Security"' in boolean mode)
and concat(headline, ' ', ifnull(byline, ''), ' ', content) like '% Department of Homeland Security%';
--- Returns 77 records

I could understand if query 5 returned more rows if there were cases where the word Security was not a full word in the content. But I evaluated all 5 records and they should have all shown up with query 4.

How to repeat:
This is base functionality - I shouldn't have to create a special script to test what should be in the basic unit tests for this feature.
[12 Jun 2013 16:20] Erlend Dahl
Fixed in 5.6.13, 5.7.2 as a duplicate of bug#68720