| Bug #68150 | InnoDB FULLTEXT Phrase Logic Incorrect | ||
|---|---|---|---|
| Submitted: | 22 Jan 2013 21:39 | Modified: | 12 Feb 2013 18:32 |
| Reporter: | Greg Dietz | Email Updates: | |
| Status: | Closed | Impact on me: | |
| Category: | MySQL Server: InnoDB storage engine | Severity: | S3 (Non-critical) |
| Version: | 5.6.7 | OS: | Windows |
| Assigned to: | CPU Architecture: | Any | |
| Tags: | fulltext, innodb | ||
[23 Jan 2013 1:12]
Jesper wisborg Krogh
Verified
mysql> SHOW CREATE TABLE testinnodbfull\G
*************************** 1. row ***************************
Table: testinnodbfull
Create Table: CREATE TABLE `testinnodbfull` (
`id` int(10) unsigned NOT NULL AUTO_INCREMENT,
`content` text NOT NULL,
PRIMARY KEY (`id`),
FULLTEXT KEY `IDX_CONTEXT_FULLTEXT` (`content`)
) ENGINE=InnoDB AUTO_INCREMENT=4 DEFAULT CHARSET=latin1
1 row in set (0.01 sec)
Same tests works in MyISAM:
mysql> CREATE TABLE testmyisamfull LIKE testinnodbfull;
Query OK, 0 rows affected (0.24 sec)
mysql> ALTER TABLE testmyisamfull ENGINE=MyISAM;
Query OK, 0 rows affected (0.15 sec)
Records: 0 Duplicates: 0 Warnings: 0
mysql> INSERT INTO testmyisamfull SELECT * FROM testinnodbfull;
Query OK, 3 rows affected (0.00 sec)
Records: 3 Duplicates: 0 Warnings: 0
mysql> select *
-> from testmyisamfull
-> where match(content) against('"complicated phrase structure"' in boolean mode);
+----+---------------------------------------------------------------------------------+
| id | content |
+----+---------------------------------------------------------------------------------+
| 1 | This is a story which has has a complicated phrase structure here in the middle |
+----+---------------------------------------------------------------------------------+
1 row in set (0.01 sec)
mysql> select *
-> from testmyisamfull
-> where match(content) against('+"complicated phrase structure"' in boolean mode);
+----+---------------------------------------------------------------------------------+
| id | content |
+----+---------------------------------------------------------------------------------+
| 1 | This is a story which has has a complicated phrase structure here in the middle |
+----+---------------------------------------------------------------------------------+
1 row in set (0.00 sec)
mysql> select *
-> from testmyisamfull
-> where match(content) against('"complicated the phrase structure"' in boolean mode);
+----+-----------------------------------------------------------+
| id | content |
+----+-----------------------------------------------------------+
| 3 | This is a story that has complicated the phrase structure |
+----+-----------------------------------------------------------+
1 row in set (0.01 sec)
[12 Feb 2013 18:32]
John Russell
Added to changelog for 5.6.11:
A full-text query using Boolean mode could return zero results in
some cases where the search term was a quoted phrase:
* If the quoted
phrase was preceded by a + sign. For example, this combination of a
Boolean + operator and a phrase would return zero results:
where match(content) against(+"required term due to plus sign" in boolean
mode)
* If the quoted phrase contained any stopwords. For example, the
stopword "the" inside the phrase caused the query to return zero
results:
where match(content) against("stopword inside the phrase" in boolean mode)

Description: I have run into a series of issues with the handling of phrases in the InnoDB FULLTEXT searches. I haven't seen any release notes indicating these have been addressed in 5.6.8 or later, if I missed them please advise. See the How to Repeat below for details but at a high level: IN BOOOLEAN MODE * Cannot put a + (AND) sign in front of a phrase - completely confuses the parser and results in zero rows * If the phrase contains a stop word or short word - will return 0 rows Phrase matches typically involve finding the articles that match the individual key words - then cross referencing them together and verifying the words appear in a specific order. So in the first case it finds all articles that match "this" AND "phrase" AND "here" and then takes that set and filters it to exact matches. My guess is in the second phrase it is doing "This" AND "is" AND "my" AND "phrase". Because "is" and "my" are not indexed it is returning 0 records. When it should be logically looking for "This" AND "phrase" and then taking those results and looking for the entire "This is my phrase" section. If my guess is right then hopefully this is an easy fix to the logic to drop the stop words/short words in pass one of the search. VS MYISAM FULLTEXT I have tested the same queries against a MyISAM FULLTEXT index and get the expected behavior so this is some basic bugs in the processing of InnoDB queries. How to repeat: -- Create a simple test table CREATE TABLE `testinnodbfull` ( `id` INTEGER UNSIGNED NOT NULL AUTO_INCREMENT, `content` TEXT NOT NULL, PRIMARY KEY (`id`), FULLTEXT INDEX `IDX_CONTEXT_FULLTEXT`(`content`) ) ENGINE = InnoDB; -- Insert 3 short rows insert into testinnodbfull (content) values ('This is a story which has has a complicated phrase structure here in the middle'), ('This is a story which doesn''t have that text'), ('This is a story that has complicated the phrase structure'); -- Test phrases -- CORRECT: Returns 1 row (Boolean Mode - no stop words or short words) select * from testinnodbfull where match(content) against('"complicated phrase structure"' in boolean mode); -- BROKEN: Returns 0 row +Phrase syntax broken - should return 1 row select * from testinnodbfull where match(content) against('+"complicated phrase structure"' in boolean mode); -- BROKEN: Returns 0 row ("the" in the middle is breaking the entire match) - should return 1 row select * from testinnodbfull where match(content) against('"complicated the phrase structure"' in boolean mode);