Bug #69762 InnoDB fulltext match against in boolean mode misses results on join
Submitted: 17 Jul 2013 0:36 Modified: 18 Sep 2013 8:57
Reporter: Marc Peterson Email Updates:
Status: Won't fix Impact on me:
None 
Category:MySQL Server: FULLTEXT search Severity:S3 (Non-critical)
Version:5.6.9/5.6.12 OS:Any
Assigned to: CPU Architecture:Any

[17 Jul 2013 0:36] Marc Peterson
Description:
An InnoDB fulltext search in boolean mode that uses indices on two tables will miss results, while MyISAM finds them.

Summary:
SELECT a.col_a, b.col_b FROM a
LEFT JOIN b ON a.id = b.id
WHERE MATCH(a.col_a, b.col_b) AGAINST('+keyword' IN BOOLEAN MODE);

InnoDB will only show results if the keyword is in col_a and col_b.  MyISAM will show results if keyword is in either.

How to repeat:
# Create InnoDB tables:
CREATE TABLE `fruit` (
  `fruit_id` int(10) unsigned NOT NULL AUTO_INCREMENT,
  `name` varchar(45) CHARACTER SET latin1 NOT NULL DEFAULT '',
  PRIMARY KEY (`fruit_id`),
  FULLTEXT KEY `idx_name` (`name`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8;

CREATE TABLE `review` (
  `review_id` int(10) unsigned NOT NULL AUTO_INCREMENT,
  `fruit_id` int(10) unsigned NOT NULL DEFAULT '0',
  `content` varchar(255) CHARACTER SET latin1 NOT NULL DEFAULT '',
  PRIMARY KEY (`review_id`),
  FULLTEXT KEY `idx_content` (`content`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8;

# Create matching MyISAM tables:
CREATE TABLE `fruit_m` (
  `fruit_id` int(10) unsigned NOT NULL AUTO_INCREMENT,
  `name` varchar(45) CHARACTER SET latin1 NOT NULL DEFAULT '',
  PRIMARY KEY (`fruit_id`),
  FULLTEXT KEY `idx_name` (`name`)
) ENGINE=MyISAM DEFAULT CHARSET=utf8;

CREATE TABLE `review_m` (
  `review_id` int(10) unsigned NOT NULL AUTO_INCREMENT,
  `fruit_id` int(10) unsigned NOT NULL DEFAULT '0',
  `content` varchar(255) CHARACTER SET latin1 NOT NULL DEFAULT '',
  PRIMARY KEY (`review_id`),
  FULLTEXT KEY `idx_content` (`content`)
) ENGINE=MyISAM DEFAULT CHARSET=utf8;

# Insert identical rows in each table:
INSERT INTO `fruit` (`name`) VALUES ('apple');
INSERT INTO `fruit` (`name`) VALUES ('banana');
INSERT INTO `review` (`fruit_id`, `content`) VALUES (1, 'Round and juicy.');
INSERT INTO `review` (`fruit_id`, `content`) VALUES (2, 'Far better than apple.');
INSERT INTO `review` (`fruit_id`, `content`) VALUES (1, 'I like a good piece of apple.');
INSERT INTO `review` (`fruit_id`, `content`) VALUES (2, 'I always choose banana over apple.');
INSERT INTO `review` (`fruit_id`, `content`) VALUES (1, 'Not my favorite.');
INSERT INTO `review` (`fruit_id`, `content`) VALUES (2, 'I prefer something else.');

INSERT INTO `fruit_m` (`name`) VALUES ('apple');
INSERT INTO `fruit_m` (`name`) VALUES ('banana');
INSERT INTO `review_m` (`fruit_id`, `content`) VALUES (1, 'Round and juicy.');
INSERT INTO `review_m` (`fruit_id`, `content`) VALUES (2, 'Far better than apple.');
INSERT INTO `review_m` (`fruit_id`, `content`) VALUES (1, 'I like a good piece of apple.');
INSERT INTO `review_m` (`fruit_id`, `content`) VALUES (2, 'I always choose banana over apple.');
INSERT INTO `review_m` (`fruit_id`, `content`) VALUES (1, 'Not my favorite.');
INSERT INTO `review_m` (`fruit_id`, `content`) VALUES (2, 'I prefer something else.');

# Query InnoDB
SELECT f.fruit_id, f.name, r.content
FROM fruit f
LEFT JOIN review r ON f.fruit_id = r.fruit_id
WHERE MATCH(f.name, r.content) AGAINST('+apple' IN BOOLEAN MODE);
+----------+--------+------------------------------------+
| fruit_id | name   | content                            |
+----------+--------+------------------------------------+
|        1 | apple  | I like a good piece of apple.      |
|        2 | banana | Far better than apple.             |
|        2 | banana | I always choose banana over apple. |
+----------+--------+------------------------------------+
3 rows in set (0.01 sec)

# Query MyISAM
SELECT f.fruit_id, f.name, r.content
FROM fruit_m f
LEFT JOIN review_m r ON f.fruit_id = r.fruit_id
WHERE MATCH(f.name, r.content) AGAINST('+apple' IN BOOLEAN MODE);
+----------+--------+------------------------------------+
| fruit_id | name   | content                            |
+----------+--------+------------------------------------+
|        1 | apple  | Round and juicy.                   |
|        1 | apple  | I like a good piece of apple.      |
|        1 | apple  | Not my favorite.                   |
|        2 | banana | Far better than apple.             |
|        2 | banana | I always choose banana over apple. |
+----------+--------+------------------------------------+
5 rows in set (0.00 sec)

As you can see, InnoDB only shows results if "apple" is in the column in both tables.  In MyISAM results are returned if "apple" is found in either column.  This makes a straight migration difficult.

I did find a workaround.  Use ORs in the InnoDB match against:

SELECT f.fruit_id, f.name, r.content
FROM fruit f
LEFT JOIN review r ON f.fruit_id = r.fruit_id
WHERE MATCH(f.name) AGAINST('+apple' IN BOOLEAN MODE)
  OR MATCH(r.content) AGAINST('+apple' IN BOOLEAN MODE);
+----------+--------+------------------------------------+
| fruit_id | name   | content                            |
+----------+--------+------------------------------------+
|        1 | apple  | Round and juicy.                   |
|        1 | apple  | I like a good piece of apple.      |
|        1 | apple  | Not my favorite.                   |
|        2 | banana | Far better than apple.             |
|        2 | banana | I always choose banana over apple. |
+----------+--------+------------------------------------+
5 rows in set (0.00 sec)

Suggested fix:
InnoDB should treat MATCH AGAINST IN BOOLEAN MODE the same way MyISAM does if indices in two tables are in the same statement, as it is more accurate.
[17 Jul 2013 1:04] MySQL Verification Team
Thank you for the bug report.

mysql> # Query InnoDB
mysql> SELECT f.fruit_id, f.name, r.content
    -> FROM fruit f
    -> LEFT JOIN review r ON f.fruit_id = r.fruit_id
    -> WHERE MATCH(f.name, r.content) AGAINST('+apple' IN BOOLEAN MODE);
+----------+--------+------------------------------------+
| fruit_id | name   | content                            |
+----------+--------+------------------------------------+
|        1 | apple  | I like a good piece of apple.      |
|        2 | banana | Far better than apple.             |
|        2 | banana | I always choose banana over apple. |
+----------+--------+------------------------------------+
3 rows in set (0.08 sec)

mysql> SELECT f.fruit_id, f.name, r.content
    -> FROM fruit_m f
    -> LEFT JOIN review_m r ON f.fruit_id = r.fruit_id
    -> WHERE MATCH(f.name, r.content) AGAINST('+apple' IN BOOLEAN MODE);
+----------+--------+------------------------------------+
| fruit_id | name   | content                            |
+----------+--------+------------------------------------+
|        1 | apple  | Round and juicy.                   |
|        1 | apple  | I like a good piece of apple.      |
|        1 | apple  | Not my favorite.                   |
|        2 | banana | Far better than apple.             |
|        2 | banana | I always choose banana over apple. |
+----------+--------+------------------------------------+
5 rows in set (0.00 sec)

mysql> SHOW VARIABLES LIKE "%version%";
+-------------------------+------------------------------+
| Variable_name           | Value                        |
+-------------------------+------------------------------+
| innodb_version          | 5.6.12                       |
| protocol_version        | 10                           |
| slave_type_conversions  |                              |
| version                 | 5.6.12                       |
| version_comment         | MySQL Community Server (GPL) |
| version_compile_machine | x86_64                       |
| version_compile_os      | Win64                        |
+-------------------------+------------------------------+
7 rows in set (0.03 sec)
[24 Jul 2013 12:14] Erlend Dahl
[19 Jun 2013 11:30] Paul Dubois

Noted in 5.6.13, 5.7.2 changelogs.

Unlike MyISAM, InnoDB does not support boolean full-text searches on
nonindexed columns, but this restriction was not enforced, resulting
in queries that returned incorrect results.
[24 Jul 2013 16:22] Marc Peterson
There is a fulltext index on all selected columns.
[25 Jul 2013 1:38] Shaohua Wang
Yes, but we don't support multiple fulltext indexes from one table or more 
tables in optimizer yet.

please see http://dev.mysql.com/doc/refman/5.6/en/fulltext-restrictions.html. 

" The MATCH() column list must match exactly the column list in some FULLTEXT 
index definition for the table, unless this MATCH() is IN BOOLEAN MODE on a 
MyISAM table. For MyISAM tables, boolean-mode searches can be done on 
nonindexed columns, although they are likely to be slow. " 

so we strongly recommend users use the workaround.

BTW: you will get error as below:
1210: Incorrect arguments to MATCH

SELECT f.fruit_id, f.name, r.content 
FROM fruit f 
LEFT JOIN review r ON f.fruit_id = r.fruit_id 
WHERE MATCH(f.name, r.content) AGAINST('+apple' IN BOOLEAN MODE)'
[25 Jul 2013 1:39] Shaohua Wang
In 5.6.13 or later.
[25 Jul 2013 3:47] Marc Peterson
I would argue the citation mentioned doesn't describe multiple tables, and certainly doesn't mention a workaround.  Maybe I'm reading it wrong.  In any case, I'm just reporting a way in which InnoDB fulltext searches report different results from MyISAM searches.  If you all say it's doing what it's supposed to be doing then that's ok by me.
[25 Jul 2013 6:26] Shaohua Wang
for innodb table, we don't support multiple tables on one match, like:
SELECT f.fruit_id, f.name, r.content 
FROM fruit f 
LEFT JOIN review r ON f.fruit_id = r.fruit_id 
WHERE MATCH(f.name, r.content) AGAINST('+apple' IN BOOLEAN MODE);

Even there is a separate index on each column, we can use sql below instead.
SELECT f.fruit_id, f.name, r.content 
FROM fruit f 
LEFT JOIN review r ON f.fruit_id = r.fruit_id 
WHERE MATCH(f.name) AGAINST('+apple' IN BOOLEAN MODE)
or MATCH(r.content) AGAINST('+apple' IN BOOLEAN MODE);

I'd say it's a workaround, and we will discuss with related people to find out a way to solve it.

Actually, "WHERE MATCH(f.name, r.content) AGAINST('+apple' IN BOOLEAN MODE)" is equal to "WHERE MATCH(r.content) AGAINST('+apple' IN BOOLEAN MODE)". Full text search on 'f.name' is ignored.
[14 Aug 2013 5:05] Shaohua Wang
From optimizer's opint of view:
Fixing this in the optimizer by rewriting the query will be very complicated.
For example, given the following query: 

SELECT f.fruit_id, f.name, r.content 
FROM fruit f JOIN review r ON f.fruit_id = r.fruit_id 
WHERE MATCH(f.name, r.content) AGAINST('+apple +banana' IN BOOLEAN MODE); 

MyISAM FTS will return rows where e.g., f.name contains 'apple' and r.content 
contains 'banana'(MyISAM can do fulltext search even there is no fulltext index).
In other words, we will have to process the string 
argument in order to find out how to rewrite this query. This sounds like a 
big job.  If I am not mistaken, the above query needs to be rewritten into 
something like: 

SELECT f.fruit_id, f.name, r.content 
FROM fruit f JOIN review r ON f.fruit_id = r.fruit_id 
WHERE MATCH(f.name) AGAINST('+apple +banana' IN BOOLEAN MODE) OR 
MATCH(r.content) AGAINST('+apple +banana' IN BOOLEAN MODE) OR ( MATCH(f.name) 
AGAINST('apple' IN BOOLEAN MODE) AND MATCH(r.content) AGAINST('banana' IN 
BOOLEAN MODE)) OR (MATCH(f.name) AGAINST('banana' IN BOOLEAN MODE) AND 
MATCH(r.content) AGAINST('apple' IN BOOLEAN MODE));
[18 Sep 2013 8:57] Erlend Dahl
[18 Sep 2013 1:43] Shaohua Wang

Since innodb doesn't support fulltext search on columns without fulltext
index, and it is very complicated to support search on columns in multiple fulltext indexes in optimizer, it won't be fixed.

We admit it's a point innodb fulltext is not compatible with myisam.