Bug #74845 InnoDB fulltext boolean search incorrectly handles parentheses
Submitted: 13 Nov 2014 23:18 Modified: 4 Dec 2014 15:10
Reporter: Matt Swanson Email Updates:
Status: Closed Impact on me:
None 
Category:MySQL Server: FULLTEXT search Severity:S2 (Serious)
Version:5.6.19 OS:Linux
Assigned to: CPU Architecture:Any
Tags: boolean, fulltext, innodb

[13 Nov 2014 23:18] Matt Swanson
Description:
InnoDB fulltext boolean queries incorrectly handle plus combined with parentheses. Eg

+word1 +(>word2 <word3)

The query is matching any rows containing "word1" and ignoring the values in parentheses. This is contrary to the documentation and result with MyISAM. The expected result from http://dev.mysql.com/doc/refman/5.6/en/fulltext-boolean.html is: 

'+apple +(>turnover <strudel)'

Find rows that contain the words “apple” and “turnover”, or “apple” and “strudel” (in any order), but rank “apple turnover” higher than “apple strudel”. 

How to repeat:
-- Create a test table
CREATE TABLE `bool_test` (
    `id` INT NOT NULL AUTO_INCREMENT,
    `content` TEXT NOT NULL,
    PRIMARY KEY (`id`),
    FULLTEXT (`content`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8;

-- Insert data
INSERT INTO `bool_test` (`content`) VALUES
('Pumpkin soup with cheese bread'),
('Yellow chicken curry'),
('Fresh green vegetables with garlic');

-- These queries behave as expected and correctly match the first row
SELECT * FROM `bool_test` WHERE MATCH(`content`) AGAINST('+pumpkin' IN BOOLEAN MODE);
SELECT * FROM `bool_test` WHERE MATCH(`content`) AGAINST('+cheese' IN BOOLEAN MODE);
SELECT * FROM `bool_test` WHERE MATCH(`content`) AGAINST('+(pumpkin cheese)' IN BOOLEAN MODE);

-- These queries incorrect match the first row
SELECT * FROM `bool_test` WHERE MATCH(`content`) AGAINST('+pumpkin +(souffle)' IN BOOLEAN MODE);
SELECT * FROM `bool_test` WHERE MATCH(`content`) AGAINST('+pumpkin +(souffle tart)' IN BOOLEAN MODE);
SELECT * FROM `bool_test` WHERE MATCH(`content`) AGAINST('+pumpkin +(>souffle <tart)' IN BOOLEAN MODE);

Suggested fix:
Ideally the InnoDB fulltext search would be changed to mirror the MyISAM results in this instance. If that's not possible perhaps the documentation could be updated to explain the difference.
[13 Nov 2014 23:30] MySQL Verification Team
Thank you for the bug report.

C:\dbs>c:\dbs\5.6\bin\mysql -uroot --port=3560 --debug-info --prompt="mysql 5.6 > "
Welcome to the MySQL monitor.  Commands end with ; or \g.
Your MySQL connection id is 2
Server version: 5.6.22 Source distribution

Copyright (c) 2000, 2014, Oracle and/or its affiliates. All rights reserved.

Oracle is a registered trademark of Oracle Corporation and/or its
affiliates. Other names may be trademarks of their respective
owners.

Type 'help;' or '\h' for help. Type '\c' to clear the current input statement.

mysql 5.6 > USE test
Database changed
mysql 5.6 > -- These queries behave as expected and correctly match the first row
mysql 5.6 > SELECT * FROM `bool_test` WHERE MATCH(`content`) AGAINST('+pumpkin' IN BOOLEAN MODE);
+----+--------------------------------+
| id | content                        |
+----+--------------------------------+
|  1 | Pumpkin soup with cheese bread |
+----+--------------------------------+
1 row in set (0.00 sec)

mysql 5.6 > -- These queries incorrect match the first row
mysql 5.6 > SELECT * FROM `bool_test` WHERE MATCH(`content`) AGAINST('+pumpkin +(souffle)' IN BOOLEAN MODE);
+----+--------------------------------+
| id | content                        |
+----+--------------------------------+
|  1 | Pumpkin soup with cheese bread |
+----+--------------------------------+
1 row in set (0.00 sec)

mysql 5.6 > ALTER TABLE bool_test ENGINE = MyISAM;
Query OK, 3 rows affected (0.30 sec)
Records: 3  Duplicates: 0  Warnings: 0

mysql 5.6 > -- These queries incorrect match the first row
mysql 5.6 > SELECT * FROM `bool_test` WHERE MATCH(`content`) AGAINST('+pumpkin +(souffle)' IN BOOLEAN MODE);
Empty set (0.00 sec)

mysql 5.6 >
[4 Dec 2014 15:10] Paul DuBois
Noted in 5.6.23, 5.7.6 changelogs.

InnoDB boolean full-text searches incorrectly handled + combined with
parentheses; for example, +word1 +(>word2 <word3).
[12 Feb 2015 13:23] Laurynas Biveinis
$ git show -s 867a3c2b
commit 867a3c2b2f20884702c78432f42013a4717d028b
Author: Annamalai Gurusami <annamalai.gurusami@oracle.com>
Date:   Thu Dec 4 14:23:52 2014 +0530

    Bug #20028323 INNODB FULLTEXT BOOLEAN SEARCH INCORRECTLY HANDLES PARENTHESES
    
    Problem:
    
    InnoDB fulltext boolean queries incorrectly handle plus combined with
    parentheses. For example,
    
    +word1 +(>word2 <word3)
    
    The query is matching any rows containing "word1" and ignoring the values in
    parentheses.
    
    The documentation mentions about the following behaviour:
    
     '+apple +(>turnover <strudel)'
    
    Find rows that contain the words “apple” and “turnover”, or “apple” and
    “strudel” (in any order), but rank “apple turnover” higher than “apple
    strudel”.
    
    Solution:
    
    Analysis showed that the result of the subexpression was being ignored if
    it was empty.  This behaviour is incorrect.  Allow the subexpression to be
    empty, which is necessary to handle the above case.
    
    rb#7435 approved by Jimmy.