Bug #74042 Fulltext query including @ symbol fails in boolean mode on innodb
Submitted: 23 Sep 2014 20:55 Modified: 22 Jan 2015 15:37
Reporter: Ike Walker Email Updates:
Status: Closed Impact on me:
None 
Category:MySQL Server: InnoDB storage engine Severity:S3 (Non-critical)
Version:5.6./5.7 OS:Any
Assigned to: CPU Architecture:Any

[23 Sep 2014 20:55] Ike Walker
Description:
Some fulltext queries that work on a MyISAM table fail on an InnoDB table. The specific case I have seen is that searching for a string containing the @ symbol fails on an InnoDB table with a fulltext index if the query is executed in boolean mode.

For example:

mysql> select * from at_symbol_test where match(str) against ('ggg@hhh' IN BOOLEAN MODE);
ERROR 1064 (42000): syntax error, unexpected '@', expecting $end

mysql> select * from at_symbol_test where match(str) against ('+twitter +@m*' IN BOOLEAN MODE);
ERROR 1064 (42000): syntax error, unexpected '@'

How to repeat:
-- create test table
drop table if exists at_symbol_test;
create table if not exists at_symbol_test (
  str varchar(255),
  fulltext index fti_index (str)
) engine = innodb;

-- insert some test data
insert into at_symbol_test values ('aaa bbb ccc');
insert into at_symbol_test values ('ccc ddd eee');
insert into at_symbol_test values ('fff ggg hhh');
insert into at_symbol_test values ('fff ggg@hhh iii');
insert into at_symbol_test values ('hello C@S world');
insert into at_symbol_test values ('twitter @mysql');
insert into at_symbol_test values ('twitter @oracle');

-- these queries succeed
select * from at_symbol_test where match(str) against ('ggg' IN BOOLEAN MODE);
select * from at_symbol_test where match(str) against ('ggg@hhh');

-- these queries fail
select * from at_symbol_test where match(str) against ('ggg@hhh' IN BOOLEAN MODE);
select * from at_symbol_test where match(str) against ('+twitter +@m*' IN BOOLEAN MODE);

-- now switch the table to MyISAM
alter table at_symbol_test engine = myisam;

-- In MYISAM all 4 queries succeed
select * from at_symbol_test where match(str) against ('ggg' IN BOOLEAN MODE);
select * from at_symbol_test where match(str) against ('ggg@hhh');
select * from at_symbol_test where match(str) against ('ggg@hhh' IN BOOLEAN MODE);
select * from at_symbol_test where match(str) against ('+twitter +@m*' IN BOOLEAN MODE);
[23 Sep 2014 22:34] MySQL Verification Team
C:\dbs>net start mysqld56
The MySQLD56 service is starting..
The MySQLD56 service was started successfully.

C:\dbs>56

C:\dbs>c:\dbs\5.6\bin\mysql -uroot --port=3560 --debug-info --prompt="mysql 5.6 > "
Welcome to the MySQL monitor.  Commands end with ; or \g.
Your MySQL connection id is 1
Server version: 5.6.22 Source distribution

Copyright (c) 2000, 2014, Oracle and/or its affiliates. All rights reserved.

Oracle is a registered trademark of Oracle Corporation and/or its
affiliates. Other names may be trademarks of their respective
owners.

Type 'help;' or '\h' for help. Type '\c' to clear the current input statement.

mysql 5.6 > USE test
Database changed
mysql 5.6 > drop table if exists at_symbol_test;
Query OK, 0 rows affected, 1 warning (0.00 sec)

mysql 5.6 > create table if not exists at_symbol_test (
    ->   str varchar(255),
    ->   fulltext index fti_index (str)
    -> ) engine = innodb;
Query OK, 0 rows affected (3.49 sec)

mysql 5.6 > insert into at_symbol_test values ('aaa bbb ccc');
Query OK, 1 row affected (0.06 sec)

mysql 5.6 > insert into at_symbol_test values ('ccc ddd eee');
Query OK, 1 row affected (0.01 sec)

mysql 5.6 > insert into at_symbol_test values ('fff ggg hhh');
Query OK, 1 row affected (0.06 sec)

mysql 5.6 > insert into at_symbol_test values ('fff ggg@hhh iii');
Query OK, 1 row affected (0.03 sec)

mysql 5.6 > insert into at_symbol_test values ('hello C@S world');
Query OK, 1 row affected (0.02 sec)

mysql 5.6 > insert into at_symbol_test values ('twitter @mysql');
Query OK, 1 row affected (0.03 sec)

mysql 5.6 > insert into at_symbol_test values ('twitter @oracle');
Query OK, 1 row affected (0.03 sec)

mysql 5.6 > select * from at_symbol_test where match(str) against ('ggg@hhh' IN BOOLEAN MODE);
ERROR 1064 (42000): syntax error, unexpected '@', expecting $end
mysql 5.6 > select * from at_symbol_test where match(str) against ('+twitter +@m*' IN BOOLEAN MODE);
ERROR 1064 (42000): syntax error, unexpected '@'
mysql 5.6 > alter table at_symbol_test engine = myisam;
Query OK, 7 rows affected (0.44 sec)
Records: 7  Duplicates: 0  Warnings: 0

mysql 5.6 > SHOW CREATE TABLE at_symbol_test\G
*************************** 1. row ***************************
       Table: at_symbol_test
Create Table: CREATE TABLE `at_symbol_test` (
  `str` varchar(255) DEFAULT NULL,
  FULLTEXT KEY `fti_index` (`str`)
) ENGINE=MyISAM DEFAULT CHARSET=latin1
1 row in set (0.02 sec)

mysql 5.6 > select * from at_symbol_test where match(str) against ('ggg@hhh' IN BOOLEAN MODE);
Empty set (0.00 sec)

mysql 5.6 > select * from at_symbol_test where match(str) against ('+twitter +@m*' IN BOOLEAN MODE);
+-----------------+
| str             |
+-----------------+
| twitter @mysql  |
| twitter @oracle |
+-----------------+
2 rows in set (0.01 sec)

mysql 5.6 >
[23 Sep 2014 22:37] MySQL Verification Team
mysql 5.7 > -- these queries fail
mysql 5.7 > select * from at_symbol_test where match(str) against ('ggg@hhh' IN BOOLEAN MODE);
ERROR 1064 (42000): syntax error, unexpected '@', expecting $end
mysql 5.7 > select * from at_symbol_test where match(str) against ('+twitter +@m*' IN BOOLEAN MODE);
ERROR 1064 (42000): syntax error, unexpected '@'
mysql 5.7 > SHOW VARIABLES LIKE "%VERSION%";
+-------------------------+---------------------+
| Variable_name           | Value               |
+-------------------------+---------------------+
| innodb_version          | 5.7.6               |
| protocol_version        | 10                  |
| slave_type_conversions  |                     |
| version                 | 5.7.6-m16           |
| version_comment         | Source distribution |
| version_compile_machine | x86_64              |
| version_compile_os      | Win64               |
+-------------------------+---------------------+
7 rows in set (0.00 sec)

mysql 5.7 >
[23 Sep 2014 22:53] MySQL Verification Team
Thank you for the bug report.
[23 Sep 2014 22:58] MySQL Verification Team
Please see http://bugs.mysql.com/bug.php?id=72605.
[23 Sep 2014 23:07] Ike Walker
Thanks for the link. I see that @ is used to search for words a certain distance apart in InnoDB:

"@distance

This operator works on InnoDB tables only. It tests whether two or more words all start within a specified distance from each other, measured in words. Specify the search words within a double-quoted string immediately before the @distance operator, for example, MATCH(col1) AGAINST('"word1 word2 word3" @8' IN BOOLEAN MODE)"
[23 Sep 2014 23:08] Ike Walker
Is it possible to search for a literal "@" character using FULLTEXT search in InnoDB IN BOOLEAN MODE?
[23 Dec 2014 20:48] Sveta Smirnova
Actually it is not reliable to search for "@" symbol with MyISAM too.

mysql> drop table if exists at_symbol_test;
Query OK, 0 rows affected, 1 warning (0.00 sec)

mysql> create table if not exists at_symbol_test (
    ->   str varchar(255),
    ->   fulltext index fti_index (str)
    -> ) engine = myisam;
Query OK, 0 rows affected (0.01 sec)

mysql> insert into at_symbol_test values ('twitter @mysql');
Query OK, 1 row affected (0.00 sec)

mysql> insert into at_symbol_test values ('twitter @oracle');
Query OK, 1 row affected (0.00 sec)

mysql> insert into at_symbol_test values ('twitter mysql');
Query OK, 1 row affected (0.00 sec)

mysql> insert into at_symbol_test values ('twitter oracle');
Query OK, 1 row affected (0.00 sec)

mysql> select * from at_symbol_test where match(str) against ('+twitter +@*' IN BOOLEAN MODE);
+-----------------+
| str             |
+-----------------+
| twitter @mysql  |
| twitter @oracle |
| twitter mysql   |
| twitter oracle  |
+-----------------+
4 rows in set (0.00 sec)

mysql> select * from at_symbol_test where match(str) against ('+twitter +@m*' IN BOOLEAN MODE);
+-----------------+
| str             |
+-----------------+
| twitter @mysql  |
| twitter @oracle |
| twitter mysql   |
| twitter oracle  |
+-----------------+
4 rows in set (0.00 sec)

You see: all rows returned, independently if they have "@" sign or not.
[22 Jan 2015 15:37] Daniel Price
Posted by developer:
 
Added the following statement to:

http://dev.mysql.com/doc/refman/5.6/en/fulltext-boolean.html
http://dev.mysql.com/doc/refman/5.7/en/fulltext-boolean.html

"InnoDB full-text search does not support the use of the "@" symbol in
boolean full-text searches. The "@" symbol is reserved for use by the
"@distance" proximity search operator."
[3 Jan 2018 16:57] teo teo
This is ridiculous!

If some character has a special meaning, then a way MUST be provided to escape it. That is an obvious basic requirement in ANY language.
And there's no way to escape a "@" in a match..against boolean query, or if there is, it's not documented. That IS a bug.

This bug should be reopened.
[28 Nov 2018 14:00] Ole Martin Handeland
I originally agreed completely with teo teo above, but after some further research I see that escaping the @ character does not make sense. The fulltext index operates on words, and (I'm guessing) will not index the @ character as it is not considered a word character. This is the default setting at least [1], and if the @ character isn't ever in the index it doesn't make sense to do a literal search for it either. I've replaced the @ character (and all other non-word characters) in our search input with spaces now, as there's no point searching for something that's not in the index. This will of course look strange when searching for email@example.org and getting results for something that contained those words, but not in that order (as the search string will be converted to "email example org"), but quoting the search string will produce results that look more like what the user is looking for.
[28 Nov 2018 23:37] teo teo
> I see that escaping the @ character does not make sense. 

That's not true. It may not yield results, but that doesn't mean it doesn't make sense.

> The fulltext index operates on words, and (I'm guessing) will not index the @ character as it is not considered a word character. This is the default setting at least [1], 

That default can be changed. https://dev.mysql.com/doc/refman/8.0/en/fulltext-fine-tuning.html#fulltext-modify-characte...
The set of characters that are considered word characters can be changed. And if "@" can't be added, then that's another bug (or wrong design decision).

You should be allowed to have "@" be considered a word character, and hence to index words containing it, and hence to search for words containing it, by escaping it.

> and if the @ character isn't ever in the index it doesn't make sense to do a literal search for it either

And even in that case, you should be allowed to escape it and get a valid query that doesn't get any result but doesn't produce a syntax error either.