Bug #80432 No results in fulltext search for top level domain in domain part of email
Submitted: 18 Feb 2016 15:58 Modified: 16 Mar 2016 6:15
Reporter: Mev-Rael ‪ Email Updates:
Status: Won't fix Impact on me:
None 
Category:MySQL Server: FULLTEXT search Severity:S1 (Critical)
Version:5.7.10 OS:Microsoft Windows (InnoDB, utf8mb4_unicode_ci)
Assigned to: CPU Architecture:Any
Tags: domain, email, escape, fulltext, statement

[18 Feb 2016 15:58] Mev-Rael ‪
Description:
Windows, MySQl 5.7.10, InnoDB full-text search, utf8mb4_unicode_ci

Users have email - foo.bar@domain.com

When perfoming boolean mode fulltext match against email
1) domain part is treated as a single word ("domain.com" but not two words "domain" and "com")
2) and when specifing full domain name no results shown.

P.S. There is also error when passing @ or other special characters to fulltext search string. Prepared statements and \ don't help. Because of that I am formatting string on backend with application code, replacing all non-alpha charachters to spaces and that why I have found '+foo* +bar* +domain* +com*' return no results, but '+foo* +bar* +domain*' does.

Syntax error or access violation: 1064 syntax error, unexpected '@', expecting $end

How to repeat:
CREATE TABLE `users` (
  `id` int(10) unsigned NOT NULL AUTO_INCREMENT,
  `first_name` varchar(32) COLLATE utf8mb4_unicode_ci NOT NULL,
  `last_name` varchar(32) COLLATE utf8mb4_unicode_ci NOT NULL,
  `email` varchar(64) COLLATE utf8mb4_unicode_ci NOT NULL,
  PRIMARY KEY (`id`),
  UNIQUE KEY `users_email_unique` (`email`),
  FULLTEXT KEY `first_last_name_email_fulltext` (`email`,`last_name`,`first_name`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_unicode_ci;

INSERT INTO `users` (`first_name`, `last_name`, `email`) VALUES ('John', 'Smith', 'foo.bar@domain.com');

1) select * from `users` where MATCH (email, last_name, first_name) AGAINST ('+com*' IN BOOLEAN MODE);
// no results because com is part of domain.com and there are no words starting from com.

select * from `users` where MATCH (email, last_name, first_name) AGAINST ('+bar*' IN BOOLEAN MODE);
// however email part before @ works ok. foo is one word and bar is another. This one query where I am searching by second word in foo.bar@domain.com works ok.

2)
select * from `users` where MATCH (email, last_name, first_name) AGAINST ('+domain*' IN BOOLEAN MODE);
// this works ok

select * from `users` where MATCH (email, last_name, first_name) AGAINST ('+domain.co*' IN BOOLEAN MODE);
// and this works ok

select * from `users` where MATCH (email, last_name, first_name) AGAINST ('+domain.com*' IN BOOLEAN MODE);
// added last character to domain name and this doesn't work, 0 rows returned.

P.S. later I've changed .com to some cusotm domain like .tld and now it works as expected, I could find all rows by 'tld*'. Looks like MySQL is doing something with official top level domain names like .com

Suggested fix:
I think treating domain.com as two words (domain and com) like foo.bar would be easest approach. Is there any meaning for "." in fulltext search?

I also would like to be able passing @ and other special chars to fulltext search string.
[18 Feb 2016 19:04] Miguel Solorzano
Thank you for the bug report.

C:\dbs>c:\dbs\5.7\bin\mysql -uroot -p --port=3570 --prompt="mysql 5.7 > "
Enter password: ******
Welcome to the MySQL monitor.  Commands end with ; or \g.
Your MySQL connection id is 4
Server version: 5.7.12 Source distribution PULL: 2016-FEB-06

Copyright (c) 2000, 2016, Oracle and/or its affiliates. All rights reserved.

Oracle is a registered trademark of Oracle Corporation and/or its
affiliates. Other names may be trademarks of their respective
owners.

Type 'help;' or '\h' for help. Type '\c' to clear the current input statement.

mysql 5.7 > use test
Database changed
mysql 5.7 > CREATE TABLE `users` (
    ->   `id` int(10) unsigned NOT NULL AUTO_INCREMENT,
    ->   `first_name` varchar(32) COLLATE utf8mb4_unicode_ci NOT NULL,
    ->   `last_name` varchar(32) COLLATE utf8mb4_unicode_ci NOT NULL,
    ->   `email` varchar(64) COLLATE utf8mb4_unicode_ci NOT NULL,
    ->   PRIMARY KEY (`id`),
    ->   UNIQUE KEY `users_email_unique` (`email`),
    ->   FULLTEXT KEY `first_last_name_email_fulltext` (`email`,`last_name`,`first_name`)
    -> ) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_unicode_ci;
Query OK, 0 rows affected (2.30 sec)

mysql 5.7 >
mysql 5.7 > INSERT INTO `users` (`first_name`, `last_name`, `email`) VALUES ('John', 'Smith', 'foo.bar@domain.com');
Query OK, 1 row affected (0.03 sec)

mysql 5.7 > select * from `users` where MATCH (email, last_name, first_name) AGAINST ('+com*' IN BOOLEAN MODE);
Empty set (0.09 sec)

mysql 5.7 > select * from `users` where MATCH (email, last_name, first_name) AGAINST ('+bar*' IN BOOLEAN MODE);
+----+------------+-----------+--------------------+
| id | first_name | last_name | email              |
+----+------------+-----------+--------------------+
|  1 | John       | Smith     | foo.bar@domain.com |
+----+------------+-----------+--------------------+
1 row in set (0.00 sec)

mysql 5.7 > select * from `users` where MATCH (email, last_name, first_name) AGAINST ('+domain*' IN BOOLEAN MODE);
+----+------------+-----------+--------------------+
| id | first_name | last_name | email              |
+----+------------+-----------+--------------------+
|  1 | John       | Smith     | foo.bar@domain.com |
+----+------------+-----------+--------------------+
1 row in set (0.00 sec)

mysql 5.7 > select * from `users` where MATCH (email, last_name, first_name) AGAINST ('+domain.co*' IN BOOLEAN MODE);
+----+------------+-----------+--------------------+
| id | first_name | last_name | email              |
+----+------------+-----------+--------------------+
|  1 | John       | Smith     | foo.bar@domain.com |
+----+------------+-----------+--------------------+
1 row in set (0.00 sec)

mysql 5.7 > select * from `users` where MATCH (email, last_name, first_name) AGAINST ('+domain.com*' IN BOOLEAN MODE);
Empty set (0.00 sec)

mysql 5.7 > alter table users engine MyISAM;
Query OK, 1 row affected (0.25 sec)
Records: 1  Duplicates: 0  Warnings: 0

mysql 5.7 > select * from `users` where MATCH (email, last_name, first_name) AGAINST ('+domain.com*' IN BOOLEAN MODE);
+----+------------+-----------+--------------------+
| id | first_name | last_name | email              |
+----+------------+-----------+--------------------+
|  1 | John       | Smith     | foo.bar@domain.com |
+----+------------+-----------+--------------------+
1 row in set (0.02 sec)

mysql 5.7 > show create table users;
+-------+-------------------------------------------------------------------------------------------------------------------------------------------------
----------------------------------------------------------------------------------------------------------------------------------------------------------
------------------------------------------------------------------------------------------------------------------------------------------------+
| Table | Create Table

                                                                                                                                                |
+-------+-------------------------------------------------------------------------------------------------------------------------------------------------
----------------------------------------------------------------------------------------------------------------------------------------------------------
------------------------------------------------------------------------------------------------------------------------------------------------+
| users | CREATE TABLE `users` (
  `id` int(10) unsigned NOT NULL AUTO_INCREMENT,
  `first_name` varchar(32) COLLATE utf8mb4_unicode_ci NOT NULL,
  `last_name` varchar(32) COLLATE utf8mb4_unicode_ci NOT NULL,
  `email` varchar(64) COLLATE utf8mb4_unicode_ci NOT NULL,
  PRIMARY KEY (`id`),
  UNIQUE KEY `users_email_unique` (`email`),
  FULLTEXT KEY `first_last_name_email_fulltext` (`email`,`last_name`,`first_name`)
) ENGINE=MyISAM AUTO_INCREMENT=2 DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_unicode_ci |
+-------+-------------------------------------------------------------------------------------------------------------------------------------------------
----------------------------------------------------------------------------------------------------------------------------------------------------------
------------------------------------------------------------------------------------------------------------------------------------------------+
1 row in set (0.03 sec)

mysql 5.7 >
[4 Mar 2016 11:35] Aditya A
Posted by developer:
 
Actually this is not a bug because "com" word is present in the stop word list .Please refer to 

http://dev.mysql.com/doc/refman/5.7/en/fulltext-stopwords.html

I ran the test with removing "com" as one of the stop words.

mysql>  select * from `users` where MATCH (email, last_name, first_name) AGAINST ('+com*' IN BOOLEAN MODE);
+----+------------+-----------+--------------------+
| id | first_name | last_name | email              |
+----+------------+-----------+--------------------+
|  1 | John       | Smith     | foo.bar@domain.com |
+----+------------+-----------+--------------------+
1 row in set (4.76 sec)

mysql> select * from `users` where MATCH (email, last_name, first_name) AGAINST
    -> ('+bar*' IN BOOLEAN MODE);
+----+------------+-----------+--------------------+
| id | first_name | last_name | email              |
+----+------------+-----------+--------------------+
|  1 | John       | Smith     | foo.bar@domain.com |
+----+------------+-----------+--------------------+
1 row in set (3.15 sec)

mysql> select * from `users` where MATCH (email, last_name, first_name) AGAINST
    -> ('+domain*' IN BOOLEAN MODE);
+----+------------+-----------+--------------------+
| id | first_name | last_name | email              |
+----+------------+-----------+--------------------+
|  1 | John       | Smith     | foo.bar@domain.com |
+----+------------+-----------+--------------------+

mysql> select * from `users` where MATCH (email, last_name, first_name) AGAINST
    -> ('+domain.co*' IN BOOLEAN MODE);
+----+------------+-----------+--------------------+
| id | first_name | last_name | email              |
+----+------------+-----------+--------------------+
|  1 | John       | Smith     | foo.bar@domain.com |
+----+------------+-----------+--------------------+

mysql> select * from `users` where MATCH (email, last_name, first_name) AGAINST
    -> ('+domain.com*' IN BOOLEAN MODE);
+----+------------+-----------+--------------------+
| id | first_name | last_name | email              |
+----+------------+-----------+--------------------+
|  1 | John       | Smith     | foo.bar@domain.com |
+----+------------+-----------+--------------------+
1 row in set (5.17 sec)

After removing from the stop word list "domain" and "com" are treated like different words.

Referring to your other question on w hy we cannot use "@" in searching pattern please refer

http://clustra.no.oracle.com/orabugs/bug.php?id=19673945

In light of these i am closing this as "not a bug".
[8 Mar 2016 11:37] Aditya A
Posted by developer:
 
the external bug number for above posted internal bug id is 
http://bugs.mysql.com/bug.php?id=74042
[9 Mar 2016 20:36] Mev-Rael ‪
I see, it's realy stopwords, however, for small data like users table I would like to remove stopwords and keep for text containing tables.

Is it possible to set different stopwords for different tables or any sql query flag/option to ignore stop words only in specifig queries? Something like SELECT SQL_CACHE id, name FROM customer?
[16 Mar 2016 6:15] Erlend Dahl
Posted by developer:

[15 Mar 2016 22:27] Aditya A

Though you cannot specify in the query to ignore stopwords ,
there is a easy workaround if you want different stop words
for different tables. The stop word table is taken into
consideration while creating the fts index,so in case of your
example set innodb_ft_server_stopword_table =
'test/stop_word_table_for_users'
and then create user table and then set
 innodb_ft_server_stopword_table = 'test/stop_word_table_for_text_tables'
and then create text tables. once created they no longer depend upon where
innodb_ft_server_stopword_table is
pointing to ,they always follow the stop word table they where created with
This will survive a restart too