Bug #75755 Fulltext search behaviour with MyISAM vs. InnoDB (wrong result with InnoDB)
Submitted: 3 Feb 2015 21:32 Modified: 19 Feb 2015 13:07
Reporter: Elena Stepanova Email Updates:
Status: Closed Impact on me:
None 
Category:MySQL Server: FULLTEXT search Severity:S3 (Non-critical)
Version:5.6, 5.7 OS:Any
Assigned to: CPU Architecture:Any

[3 Feb 2015 21:32] Elena Stepanova
Description:
Filing on behalf of our user, initially reported as https://mariadb.atlassian.net/browse/MDEV-7523 . I'm sorry if it's a duplicate of an existing public bug, I couldn't find any.

InnoDB has troubles with empty string ('') in fulltext search. 

Below are results of the test case provided in 'How to repeat' section.

MySQL [test]> select * from content where MATCH (`title`,`story`) AGAINST ('("VQLTITThe VQLTITgood VQLTITwife")' IN BOOLEAN MODE);
Empty set (0.34 sec)

MySQL [test]> 
MySQL [test]> alter table content engine=MyISAM;
Query OK, 2 rows affected (1.74 sec)
Records: 2  Duplicates: 0  Warnings: 0

MySQL [test]> select * from content where MATCH (`title`,`story`) AGAINST ('("VQLTITThe VQLTITgood VQLTITwife")' IN BOOLEAN MODE);
+------------+-------+---------------------------------+
| content_id | title | story                           |
+------------+-------+---------------------------------+
|          1 |       | VQLTITThe VQLTITgood VQLTITwife |
|          2 |       | VQLTITThe VQLTITgood VQLTITwife |
+------------+-------+---------------------------------+
2 rows in set (0.00 sec)

MySQL [test]> select @@version;
+-----------------+
| @@version       |
+-----------------+
| 5.7.5-m15-debug |
+-----------------+
1 row in set (0.00 sec)

How to repeat:
CREATE TABLE `content` (
  `content_id` int(11) NOT NULL,
  `title` tinytext,
  `story` text,
  PRIMARY KEY (`content_id`),
  FULLTEXT KEY `key_all` (`title`,`story`)
) ENGINE=InnoDB DEFAULT CHARSET=latin1
;

insert into content values 
(1,'','VQLTITThe VQLTITgood VQLTITwife'),
(2,'','VQLTITThe VQLTITgood VQLTITwife');

select * from content where MATCH (`title`,`story`) AGAINST ('("VQLTITThe VQLTITgood VQLTITwife")' IN BOOLEAN MODE);

alter table content engine=MyISAM;
select * from content where MATCH (`title`,`story`) AGAINST ('("VQLTITThe VQLTITgood VQLTITwife")' IN BOOLEAN MODE);
[3 Feb 2015 21:56] Miguel Solorzano
Thank you for the bug report.

C:\dbs>net start mysqld56
The MySQLD56 service is starting.
The MySQLD56 service was started successfully.

C:\dbs>56

C:\dbs>c:\dbs\5.6\bin\mysql -uroot --port=3560 --debug-info --prompt="mysql 5.6 > "
Welcome to the MySQL monitor.  Commands end with ; or \g.
Your MySQL connection id is 1
Server version: 5.6.24 Source distribution

Copyright (c) 2000, 2015, Oracle and/or its affiliates. All rights reserved.

Oracle is a registered trademark of Oracle Corporation and/or its
affiliates. Other names may be trademarks of their respective
owners.

Type 'help;' or '\h' for help. Type '\c' to clear the current input statement.

mysql 5.6 > USE test
Database changed
mysql 5.6 > CREATE TABLE `content` (
    ->   `content_id` int(11) NOT NULL,
    ->   `title` tinytext,
    ->   `story` text,
    ->   PRIMARY KEY (`content_id`),
    ->   FULLTEXT KEY `key_all` (`title`,`story`)
    -> ) ENGINE=InnoDB DEFAULT CHARSET=latin1
    -> ;
Query OK, 0 rows affected (1.25 sec)

mysql 5.6 >
mysql 5.6 > insert into content values
    -> (1,'','VQLTITThe VQLTITgood VQLTITwife'),
    -> (2,'','VQLTITThe VQLTITgood VQLTITwife');
Query OK, 2 rows affected (0.02 sec)
Records: 2  Duplicates: 0  Warnings: 0

mysql 5.6 >
mysql 5.6 >
mysql 5.6 > select * from content where MATCH (`title`,`story`) AGAINST ('("VQLTITThe VQLTITgood VQLTITwife")' IN BOOLEAN MODE);
Empty set (0.00 sec)

mysql 5.6 > alter table content engine=MyISAM;
Query OK, 2 rows affected (0.22 sec)
Records: 2  Duplicates: 0  Warnings: 0

mysql 5.6 > select * from content where MATCH (`title`,`story`) AGAINST ('("VQLTITThe VQLTITgood VQLTITwife")' IN BOOLEAN MODE);
+------------+-------+---------------------------------+
| content_id | title | story                           |
+------------+-------+---------------------------------+
|          1 |       | VQLTITThe VQLTITgood VQLTITwife |
|          2 |       | VQLTITThe VQLTITgood VQLTITwife |
+------------+-------+---------------------------------+
2 rows in set (0.00 sec)

mysql 5.6 > exit
Bye

C:\dbs>net start mysqld57
The MySQLD57 service is starting...
The MySQLD57 service was started successfully.

C:\dbs>57

C:\dbs>c:\dbs\5.7\bin\mysql -uroot --port=3570 --prompt="mysql 5.7 > "
Welcome to the MySQL monitor.  Commands end with ; or \g.
Your MySQL connection id is 1
Server version: 5.7.6-m16-debug Source distribution

Copyright (c) 2000, 2015, Oracle and/or its affiliates. All rights reserved.

Oracle is a registered trademark of Oracle Corporation and/or its
affiliates. Other names may be trademarks of their respective
owners.

Type 'help;' or '\h' for help. Type '\c' to clear the current input statement.

mysql 5.7 > USE test
Database changed
mysql 5.7 > CREATE TABLE `content` (
    ->   `content_id` int(11) NOT NULL,
    ->   `title` tinytext,
    ->   `story` text,
    ->   PRIMARY KEY (`content_id`),
    ->   FULLTEXT KEY `key_all` (`title`,`story`)
    -> ) ENGINE=InnoDB DEFAULT CHARSET=latin1
    -> ;
Query OK, 0 rows affected (2.00 sec)

mysql 5.7 >
mysql 5.7 > insert into content values
    -> (1,'','VQLTITThe VQLTITgood VQLTITwife'),
    -> (2,'','VQLTITThe VQLTITgood VQLTITwife');
Query OK, 2 rows affected (0.05 sec)
Records: 2  Duplicates: 0  Warnings: 0

mysql 5.7 >
mysql 5.7 > select * from content where MATCH (`title`,`story`) AGAINST ('("VQLTITThe VQLTITgood VQLTITwife")' IN BOOLEAN MODE);
Empty set (0.03 sec)

mysql 5.7 > alter table content engine=MyISAM;
Query OK, 2 rows affected (0.86 sec)
Records: 2  Duplicates: 0  Warnings: 0

mysql 5.7 > select * from content where MATCH (`title`,`story`) AGAINST ('("VQLTITThe VQLTITgood VQLTITwife")' IN BOOLEAN MODE);
+------------+-------+---------------------------------+
| content_id | title | story                           |
+------------+-------+---------------------------------+
|          1 |       | VQLTITThe VQLTITgood VQLTITwife |
|          2 |       | VQLTITThe VQLTITgood VQLTITwife |
+------------+-------+---------------------------------+
2 rows in set (0.00 sec)

mysql 5.7 >
[5 Feb 2015 8:23] Shaohua Wang
Posted by developer:
 
the root cause is that we don't handle empty string properly when tokenizing document in phrase match.
[6 Feb 2015 3:31] Shaohua Wang
Posted by developer:
 
The root cause is that we don't handle empty string correctly when tokenizing a newly inserted row. The positions of a token
are wrong, so it causes no matching in phrase search.
[19 Feb 2015 13:07] Daniel Price
Posted by developer:
 
Fixed as of the upcoming 5.6.24, 5.7.6, 5.8.0 release, and here's the changelog entry:

A full-text phrase search returned an incorrect result. An empty string
was handled incorrectly when tokenizing a newly inserted row.

Thank you for the bug report.
[27 Apr 2015 11:21] Laurynas Biveinis
commit 037505703337930bedd97f6e5e36530b32c3fcd2
Author: Shaohua Wang <shaohua.wang@oracle.com>
Date:   Fri Feb 6 10:48:12 2015 +0800

    BUG#20465273 - INNODB FTS: WRONG RESULT WITH PHRASE SEARCH
    
    The root cause is that we don't handle empty string correctly
    when tokenizing a newly inserted row. The positions of tokens
    are wrong, so it causes no matching in phrase search.
    
    Reviewed-by: Jimmy Yang <jimmy.yang@oracle.com>
    RB: 7933