Bug #68948 InnoDB table fulltext index in boolean mode ignores leading * (MySQL 5.6.10)
Submitted: 13 Apr 2013 14:30 Modified: 2 Jul 2013 15:49
Reporter: Zoltan Fedor Email Updates:
Status: Closed Impact on me:
None 
Category:MySQL Server: InnoDB storage engine Severity:S3 (Non-critical)
Version:5.6.10 OS:Linux (CentOS 6.4)
Assigned to:
Tags: boolean mode, fulltext, innodb
Triage: Needs Triage: D3 (Medium)

[13 Apr 2013 14:30] Zoltan Fedor
Description:
I was migrating from MySQL 5.1 to 5.6.10 and turning MyISAM tables to InnoDB now that InnoDB has fulltext search. Unfortunately I found this bug which now forces me to go back to MyISAM.
When running fulltext search in boolean mode then the leading * is ignored:

this works:
match(content) against('complicate*' in boolean mode);

this doesn't:
match(content) against('*omplicate*' in boolean mode);

In MyISAM both works.

Luckily this is only a syntax bug - a difference between MyISAM and InnoDB FT search how the leading * syntax is interpreted. The leading * results the same data in MyISAM and no leading * in InnoDB.

How to repeat:

1. Create table:
CREATE TABLE `testinnodbfull` (
  `id` INTEGER UNSIGNED NOT NULL AUTO_INCREMENT,
  `content` TEXT NOT NULL,
  PRIMARY KEY (`id`),
  FULLTEXT INDEX `IDX_CONTEXT_FULLTEXT`(`content`)
)
ENGINE = InnoDB;

2. Insert data:
insert into testinnodbfull (content)
values
('This is a story which has has a complicated phrase structure here in the middle'),
('This is a story which doesn''t have that text'),
('This is a story that has complicated the phrase structure');

3. Run search:
a. this works:
select *
from testinnodbfull
where match(content) against('complicate*' in boolean mode);

b. this doesn't:
select *
from testinnodbfull
where match(content) against('*complicate*' in boolean mode);

If you repeat the above test case with MyISAM table then you will see that 3.b will return results for MyISAM but not for InnoDB.
[13 Apr 2013 14:32] Zoltan Fedor
Sorry, at line 8 I made a mistake, the line should have been
match(content) against('*omplicate*' in boolean mode);
instead of
match(content) against('*complicate*' in boolean mode);
[13 Apr 2013 16:45] Godofredo Miguel Solorzano
Thank you for the bug report. Verified as described.

miguel@tikal:~/dbs/5.6/data> cd..
miguel@tikal:~/dbs/5.6> bin/mysql -uroot -p test
Enter password: 
Welcome to the MySQL monitor.  Commands end with ; or \g.
Your MySQL connection id is 1
Server version: 5.6.12-debug Source distribution

Copyright (c) 2000, 2013, Oracle and/or its affiliates. All rights reserved.

Oracle is a registered trademark of Oracle Corporation and/or its
affiliates. Other names may be trademarks of their respective
owners.

Type 'help;' or '\h'miguel@tikal:~/dbs/5.6/data> cd..
miguel@tikal:~/dbs/5.6> bin/mysql -uroot -p test
Enter password: 
Welcome to the MySQL monitor.  Commands end with ; or \g.
Your MySQL connection id is 1
Server version: 5.6.12-debug Source distribution

Copyright (c) 2000, 2013, Oracle and/or its affiliates. All rights reserved.

Oracle is a registered trademark of Oracle Corporation and/or its
affiliates. Other names may be trademarks of their respective
owners.

Type 'help;' or '\h' for help. Type '\c' to clear the current input statement.

mysql> CREATE TABLE `testinnodbfull` (
    ->   `id` INTEGER UNSIGNED NOT NULL AUTO_INCREMENT,
    ->   `content` TEXT NOT NULL,
    ->   PRIMARY KEY (`id`),
    ->   FULLTEXT INDEX `IDX_CONTEXT_FULLTEXT`(`content`)
    -> )
    -> ENGINE = InnoDB;
Query OK, 0 rows affected (1.40 sec)

mysql> 
mysql> insert into testinnodbfull (content)
    -> values
    -> ('This is a story which has has a complicated phrase structure here in the middle'),
    -> ('This is a story which doesn''t have that text'),
    -> ('This is a story that has complicated the phrase structure');
Query OK, 3 rows affected (0.04 sec)
Records: 3  Duplicates: 0  Warnings: 0

mysql> 
mysql> select *
    -> from testinnodbfull
    -> where match(content) against('complicate*' in boolean mode);
+----+---------------------------------------------------------------------------------+
| id | content                                                                         |
+----+---------------------------------------------------------------------------------+
|  1 | This is a story which has has a complicated phrase structure here in the middle |
|  3 | This is a story that has complicated the phrase structure                       |
+----+---------------------------------------------------------------------------------+
2 rows in set (0.00 sec)

mysql> 
mysql> select *
    -> from testinnodbfull
    -> where match(content) against('*complicate*' in boolean mode)
    -> ;
Empty set (0.00 sec)

mysql> ALTER TABLE testinnodbfull ENGINE MyISAM;
Query OK, 3 rows affected (0.32 sec)
Records: 3  Duplicates: 0  Warnings: 0

mysql> select * from testinnodbfull where match(content) against('*complicate*' in boolean mode);
+----+---------------------------------------------------------------------------------+
| id | content                                                                         |
+----+---------------------------------------------------------------------------------+
|  1 | This is a story which has has a complicated phrase structure here in the middle |
|  3 | This is a story that has complicated the phrase structure                       |
+----+---------------------------------------------------------------------------------+
2 rows in set (0.00 sec)

mysql> ALTER TABLE testinnodbfull ENGINE InnoDB;
Query OK, 3 rows affected (1.81 sec)
Records: 3  Duplicates: 0  Warnings: 0

mysql> select * from testinnodbfull where match(content) against('*complicate*' in boolean mode);
Empty set (0.00 sec)
 for help. Type '\c' to clear the current input statement.

mysql> CREATE TABLE `testinnodbfull` (
    ->   `id` INTEGER UNSIGNED NOT NULL AUTO_INCREMENT,
    ->   `content` TEXT NOT NULL,
    ->   PRIMARY KEY (`id`),
    ->   FULLTEXT INDEX `IDX_CONTEXT_FULLTEXT`(`content`)
    -> )
    -> ENGINE = InnoDB;
Query OK, 0 rows affected (1.40 sec)

mysql> 
mysql> insert into testinnodbfull (content)
    -> values
    -> ('This is a story which has has a complicated phrase structure here in the middle'),
    -> ('This is a story which doesn''t have that text'),
    -> ('This is a story that has complicated the phrase structure');
Query OK, 3 rows affected (0.04 sec)
Records: 3  Duplicates: 0  Warnings: 0

mysql> 
mysql> select *
    -> from testinnodbfull
    -> where match(content) against('complicate*' in boolean mode);
+----+---------------------------------------------------------------------------------+
| id | content                                                                         |
+----+---------------------------------------------------------------------------------+
|  1 | This is a story which has has a complicated phrase structure here in the middle |
|  3 | This is a story that has complicated the phrase structure                       |
+----+---------------------------------------------------------------------------------+
2 rows in set (0.00 sec)

mysql> 
mysql> select *
    -> from testinnodbfull
    -> where match(content) against('*complicate*' in boolean mode)
    -> ;
Empty set (0.00 sec)

mysql> ALTER TABLE testinnodbfull ENGINE MyISAM;
Query OK, 3 rows affected (0.32 sec)
Records: 3  Duplicates: 0  Warnings: 0

mysql> select * from testinnodbfull where match(content) against('*complicate*' in boolean mode);
+----+---------------------------------------------------------------------------------+
| id | content                                                                         |
+----+---------------------------------------------------------------------------------+
|  1 | This is a story which has has a complicated phrase structure here in the middle |
|  3 | This is a story that has complicated the phrase structure                       |
+----+---------------------------------------------------------------------------------+
2 rows in set (0.00 sec)

mysql> ALTER TABLE testinnodbfull ENGINE InnoDB;
Query OK, 3 rows affected (1.81 sec)
Records: 3  Duplicates: 0  Warnings: 0

mysql> select * from testinnodbfull where match(content) against('*complicate*' in boolean mode);
Empty set (0.00 sec)
[18 Jun 2013 8:48] Shaohua Wang
Zoltan, what do you mean by this bug? 

1. you can get result by ('*omplicate*' in boolean mode)? I've tested in myisam, but got no result.
2. you can get result by ('*complicate*' in boolean mode)? yes, In innodb, we return syntax error, other than ignoring the leading '*' like myisam. I don't think it's the right behavior.
[18 Jun 2013 8:49] Shaohua Wang
Zoltan, what do you mean by this bug? 

1. you can get result by ('*omplicate*' in boolean mode)? I've tested in myisam, but got no result.
2. you can get result by ('*complicate*' in boolean mode)? yes, In innodb, we return syntax error, other than ignoring the leading '*' like myisam. I think it's the right behavior.

--modified.
[18 Jun 2013 15:14] Zoltan Fedor
Thanks Shaohua!

I meant #2, namely that

select *
from testinnodbfull
where match(content) against('*complicate*' in boolean mode);

returns two rows in MyISAM
does not return any value in InnoDB

Sorry for the confusion I caused!
[19 Jun 2013 1:14] Shaohua Wang
Zoltan,

Thanks for you quick reply!

I am still confused that why you put a leading '*' here? since it has no meaning.
[19 Jun 2013 2:24] Shaohua Wang
Zoltan,

our syntax check will regard this as a syntax error. And it serves no purpose, unless you have something in mind.
[20 Jun 2013 15:20] Zoltan Fedor
Hi Shaohua,
I didn't put the leading '*' in there, it is there for historical reasons.

Basically this is a legacy system created many years ago, first to search via " LIKE '%string%' ". Then as the dataset was growing the performance of the LIKE search became worse and worse, so this was changed to a 
match(content) against('*string*' in boolean mode)
where the * just replaced the earlier %.

Yes, the leading * might not made any difference in the result of the search, but MyISAM Fulltext engine allows it.

Then as the InnoDB engine became better and better we wanted to upgrade the whole system to use InnoDB tables, but couldn't due to the fulltext search not being available. Now InnoDB got fulltext search with version 5.6 and we thought that InnoDB's fulltext search is a drop-in replacement for MyISAM fulltext search.
Unfortunately due to this leading '*' InnoDB fulltext search is NOT a drop-in replacement, but requires change to the code.
Obviously we can make a change to the code (remove leading '*'), but also raised this bug pointing out that InnoDB fulltext engine handles leading '*' differently than MyIsam fulltext engine does. MyIsam ignores it, while InnoDB throws the error.
I believe InnoDB should handle it the same way as MyISAM does, so InnoDB fulltext search engine can really become a drop-in replacement for people migrating from MyISAM.
[24 Jun 2013 9:06] Shaohua Wang
Zoltan,

Understand, so do you want to ignore leading * in this case '*complicate' as well? I mean in your application, you only have cases like '*complicate*'.
[24 Jun 2013 13:55] Zoltan Fedor
Shaohua,

I would like InnoDB's fulltext search act exactly like MyISAM's fulltext search, so the two can be used as a drop-in replacement of each other without a code change required on application level.
If you search with '*complicated' for MyISAM with the above example then it brings back 2 results, while doing the same with InnoDB brings back nothing, so there too, there is some difference in how the leading '*' is being handled, which difference should be eliminated - the two should work the same way. As MyISAM's fulltext search is with us for long, InnoDB's fulltext search is the one which should be changed so it matches the syntax of MyISAM's.
[2 Jul 2013 15:49] Bugs System
Added a changelog entry for 5.6.13, 5.7.2:

"When running an "InnoDB" full-text search in boolean mode, prefixing an
asterisk ("*") to a search string ('*string') would result in an error
whereas for "MyISAM", a prefixed asterisk would be ignored. To ensure
compatibility between "InnoDB" and "MyISAM", "InnoDB" now handles a
prefixed asterisk in the same way as "MyISAM"."

Thank you for the bug report.
[25 Mar 2014 11:49] Oli Sennhauser
According to this bug report's last comment InnoDB FT search should behave the same since 5.6.13 like MyISAM FT search.

After we upgraded from 5.6.12 to 5.6.16 we got the following result:

select * from tArtikel ta where MATCH (ta.interpret,ta.titel,ta.label,ta.labelbez) AGAINST ('*+me*' in boolean mode);
ERROR 1064 (42000): syntax error, unexpected '+', expecting FTS_TERM or FTS_NUMB or '*'

This is NOT the expected drop in replacement for InnoDB FT search.

See also our test case in bug #72122