Bug #68150 InnoDB FULLTEXT Phrase Logic Incorrect
Submitted: 22 Jan 2013 21:39 Modified: 12 Feb 2013 18:32
Reporter: Greg Dietz Email Updates:
Status: Closed Impact on me:
None 
Category:MySQL Server: InnoDB storage engine Severity:S3 (Non-critical)
Version:5.6.7 OS:Windows
Assigned to: CPU Architecture:Any
Tags: fulltext, innodb

[22 Jan 2013 21:39] Greg Dietz
Description:
I have run into a series of issues with the handling of phrases in the InnoDB FULLTEXT searches. I haven't seen any release notes indicating these have been addressed in 5.6.8 or later, if I missed them please advise.

See the How to Repeat below for details but at a high level:

IN BOOOLEAN MODE
* Cannot put a + (AND) sign in front of a phrase - completely confuses the parser and results in zero rows
* If the phrase contains a stop word or short word - will return 0 rows

Phrase matches typically involve finding the articles that match the individual key words - then cross referencing them together and verifying the words appear in a specific order. So in the first case it finds all articles that match "this" AND "phrase" AND "here" and then takes that set and filters it to exact matches.

My guess is in the second phrase it is doing "This" AND "is" AND "my" AND "phrase". Because "is" and "my" are not indexed it is returning 0 records. When it should be logically looking for "This" AND "phrase" and then taking those results and looking for the entire "This is my phrase" section. If my guess is right then hopefully this is an easy fix to the logic to drop the stop words/short words in pass one of the search.

VS MYISAM FULLTEXT

I have tested the same queries against a MyISAM FULLTEXT index and get the expected behavior so this is some basic bugs in the processing of InnoDB queries.

How to repeat:
-- Create a simple test table
CREATE TABLE `testinnodbfull` (
  `id` INTEGER UNSIGNED NOT NULL AUTO_INCREMENT,
  `content` TEXT NOT NULL,
  PRIMARY KEY (`id`),
  FULLTEXT INDEX `IDX_CONTEXT_FULLTEXT`(`content`)
)
ENGINE = InnoDB;

-- Insert 3 short rows
insert into testinnodbfull (content)
values
('This is a story which has has a complicated phrase structure here in the middle'),
('This is a story which doesn''t have that text'),
('This is a story that has complicated the phrase structure');

-- Test phrases

-- CORRECT: Returns 1 row (Boolean Mode - no stop words or short words)
select *
from testinnodbfull
where match(content) against('"complicated phrase structure"' in boolean mode);

-- BROKEN: Returns 0 row +Phrase syntax broken - should return 1 row
select *
from testinnodbfull
where match(content) against('+"complicated phrase structure"' in boolean mode);

-- BROKEN: Returns 0 row ("the" in the middle is breaking the entire match) - should return 1 row
select *
from testinnodbfull
where match(content) against('"complicated the phrase structure"' in boolean mode);
[23 Jan 2013 1:12] Jesper wisborg Krogh
Verified

mysql> SHOW CREATE TABLE testinnodbfull\G
*************************** 1. row ***************************
       Table: testinnodbfull
Create Table: CREATE TABLE `testinnodbfull` (
  `id` int(10) unsigned NOT NULL AUTO_INCREMENT,
  `content` text NOT NULL,
  PRIMARY KEY (`id`),
  FULLTEXT KEY `IDX_CONTEXT_FULLTEXT` (`content`)
) ENGINE=InnoDB AUTO_INCREMENT=4 DEFAULT CHARSET=latin1
1 row in set (0.01 sec)

Same tests works in MyISAM:

mysql> CREATE TABLE testmyisamfull LIKE testinnodbfull;
Query OK, 0 rows affected (0.24 sec)

mysql> ALTER TABLE testmyisamfull ENGINE=MyISAM;
Query OK, 0 rows affected (0.15 sec)
Records: 0  Duplicates: 0  Warnings: 0

mysql> INSERT INTO testmyisamfull SELECT * FROM testinnodbfull;
Query OK, 3 rows affected (0.00 sec)
Records: 3  Duplicates: 0  Warnings: 0

mysql> select *
    -> from testmyisamfull
    -> where match(content) against('"complicated phrase structure"' in boolean mode);
+----+---------------------------------------------------------------------------------+
| id | content                                                                         |
+----+---------------------------------------------------------------------------------+
|  1 | This is a story which has has a complicated phrase structure here in the middle |
+----+---------------------------------------------------------------------------------+
1 row in set (0.01 sec)

mysql> select *
    -> from testmyisamfull
    -> where match(content) against('+"complicated phrase structure"' in boolean mode);
+----+---------------------------------------------------------------------------------+
| id | content                                                                         |
+----+---------------------------------------------------------------------------------+
|  1 | This is a story which has has a complicated phrase structure here in the middle |
+----+---------------------------------------------------------------------------------+
1 row in set (0.00 sec)

mysql> select *
    -> from testmyisamfull
    -> where match(content) against('"complicated the phrase structure"' in boolean mode);
+----+-----------------------------------------------------------+
| id | content                                                   |
+----+-----------------------------------------------------------+
|  3 | This is a story that has complicated the phrase structure |
+----+-----------------------------------------------------------+
1 row in set (0.01 sec)
[12 Feb 2013 18:32] John Russell
Added to changelog for 5.6.11: 

A full-text query using Boolean mode could return zero results in
some cases where the search term was a quoted phrase: 

* If the quoted
phrase was preceded by a + sign. For example, this combination of a
Boolean + operator and a phrase would return zero results: 

where match(content) against(+"required term due to plus sign" in boolean
mode)

* If the quoted phrase contained any stopwords. For example, the
stopword "the" inside the phrase caused the query to return zero
results: 

where match(content) against("stopword inside the phrase" in boolean mode)