Bug #72122 InnoDB Fulltext Index behaves differently than MyISAM
Submitted: 25 Mar 2014 9:12 Modified: 25 Mar 2014 10:36
Reporter: Oli Sennhauser Email Updates:
Status: Duplicate Impact on me:
None 
Category:MySQL Server: FULLTEXT search Severity:S3 (Non-critical)
Version:5.6.12 OS:Linux (5.6.12-log openSUSE packag)
Assigned to: CPU Architecture:Any
Tags: innodb fulltext index boolean mode

[25 Mar 2014 9:12] Oli Sennhauser
Description:
InnoDB Fulltext Index behaves differently than MyISAM in BOOLEAN MODE.

How to repeat:
select ta.interpret,ta.titel,ta.label,ta.labelbez
from tArtikel ta
where MATCH (ta.interpret,ta.titel,ta.label,ta.labelbez) AGAINST ('*me*' in boolean mode)
;

On MyISAM:
5013 rows in set (0.38 sec)

On InnoDB:
Empty set (0.00 sec)

A '*' wild-card in the beginning of the search phrase seems not to work. Positive example: AGAINST ('me*' in boolean mode)

On MyISAM:
5013 rows in set (0.35 sec)

On InnoDB:
Empty set (0.00 sec)
5013 rows in set (0.15 sec)

Suggested fix:
Make InnoDB FT search being the same tolerant as MyISAM FT search.
[25 Mar 2014 9:16] Oli Sennhauser
Tables have been loaded AFTER changing ft_min_word_len / innodb_ft_min_token_size

+--------------------------+----------------+
| Variable_name            | Value          |
+--------------------------+----------------+
| ft_boolean_syntax        | + -><()~*:""&| |
| ft_max_word_len          | 84             |
| ft_min_word_len          | 1              |
| ft_query_expansion_limit | 20             |
| ft_stopword_file         |                |
+--------------------------+----------------+

+---------------------------------+---------+
| Variable_name                   | Value   |
+---------------------------------+---------+
| innodb_ft_aux_table             |         |
| innodb_ft_cache_size            | 8000000 |
| innodb_ft_enable_diag_print     | OFF     |
| innodb_ft_enable_stopword       | ON      |
| innodb_ft_max_token_size        | 84      |
| innodb_ft_min_token_size        | 1       |
| innodb_ft_num_word_optimize     | 2000    |
| innodb_ft_server_stopword_table |         |
| innodb_ft_sort_pll_degree       | 2       |
| innodb_ft_user_stopword_table   |         |
+---------------------------------+---------+
[25 Mar 2014 10:03] Oli Sennhauser
We found possibly the solution: On the MyISAM system the stopwords were disabled on the InnoDB system NOT. An 'me' is possibly a stopword.
[25 Mar 2014 10:07] Oli Sennhauser
It seems like stopword table did not help. It must be combination of '*+' which behaves differently
[25 Mar 2014 10:24] Oli Sennhauser
CREATE TABLE `test` (
  `id` int(10) unsigned NOT NULL AUTO_INCREMENT,
  `data` varchar(64) DEFAULT NULL,
  `ts` datetime DEFAULT NULL,
  PRIMARY KEY (`id`),
  FULLTEXT KEY `data` (`data`)
)

mysql> alter table test engine = myisam;
Query OK, 8 rows affected (0.17 sec)
Records: 8  Duplicates: 0  Warnings: 0

mysql> select * from test where MATCH (data) AGAINST ('*+me*' in boolean mode);
+----+------------+------+
| id | data       | ts   |
+----+------------+------+
|  4 | me is good | NULL |
|  8 | brabbel me | NULL |
|  5 | mene muuh  | NULL |
+----+------------+------+
3 rows in set (0.00 sec)

mysql> select * from test where MATCH (data) AGAINST ('* +me*' in boolean mode);
+----+------------+------+
| id | data       | ts   |
+----+------------+------+
|  4 | me is good | NULL |
|  8 | brabbel me | NULL |
|  5 | mene muuh  | NULL |
+----+------------+------+
3 rows in set (0.00 sec)

mysql> alter table test engine innodb;
Query OK, 8 rows affected (1.32 sec)
Records: 8  Duplicates: 0  Warnings: 0

mysql> select * from test where MATCH (data) AGAINST ('*+me*' in boolean mode);
Empty set (0.00 sec)

mysql> select * from test where MATCH (data) AGAINST ('* +me*' in boolean mode);
Empty set (0.00 sec)
[25 Mar 2014 10:26] Oli Sennhauser
This query works but needs rewriting of application code. So innodb ft search is not 100% compatible with myisam ft search:

mysql> select * from test where MATCH (data) AGAINST ('+me*' in boolean mode);
+----+------------+------+
| id | data       | ts   |
+----+------------+------+
|  4 | me is good | NULL |
|  5 | mene muuh  | NULL |
|  8 | brabbel me | NULL |
+----+------------+------+
3 rows in set (0.00 sec)
[25 Mar 2014 10:36] Oli Sennhauser
Duplicate of bug: http://bugs.mysql.com/bug.php?id=68948