Bug #95434 Order of terms in boolean mode fulltext search affects the result set
Submitted: 20 May 2019 21:09 Modified: 21 May 2019 6:00
Reporter: Denis Fatkhudinov Email Updates:
Status: Verified Impact on me:
None 
Category:MySQL Server: FULLTEXT search Severity:S3 (Non-critical)
Version:Aurora MySQL 5.7.12, 5.6.44, 5.7.26 OS:Any
Assigned to: CPU Architecture:Any
Tags: fulltext search index

[20 May 2019 21:09] Denis Fatkhudinov
Description:
Depending on the sequence of terms in the boolean mode different results are returned. Wildcard and double quotes are used in the query.

How to repeat:
SQL to create a table:

create table df_test
(
  t1 text,
  t2 text,
  FULLTEXT df_test_idx (t1 ASC, t2 ASC)
);

Insert test data statement:

INSERT INTO df_test
VALUES ('1word2 1sec71 5word5', 'word2 1sec8 6word6'),
       ('word1 sec8 5word5', 'word2 sec7 word6'),
       ('word1 sec8 5word5', 'word2 1sec17 word6'),
       ('word1 sec7 5word5', 'word2 1sec8 6word6'),
       ('word1 sec7 5word5', 'word2 sec8 word6'),
       ('word1 sec7 5word5', 'word2 sec7 word6'),
       ('sometrash9', 'sometrash10'),
       ('sometrash7', 'sometrash8'),
       ('sometrash5', 'sometrash6'),
       ('sometrash3', 'sometrash4'),
       ('sometrash13', 'sometrash14'),
       ('sometrash11', 'sometrash12'),
       ('sometrash1', 'sometrash2')

The query select * from df_test where match(t1, t2) against('6word*' IN BOOLEAN MODE) returns 2 rows, which is expected
t1                    | t2
-------------------------------------------
1word2 1sec71 5word5  | word2 1sec8 6word6
word1 sec7 5word5     | word2 1sec8 6word6

The query select * from df_test where match(t1, t2) against('"word1 sec8"' IN BOOLEAN MODE) also returns 2 rows, which is also expected
t1                    | t2
-------------------------------------------
word1 sec8 5word5     | word2 sec7 word6
word1 sec8 5word5     | word2 1sec17 word6

The query select * from df_test where match(t1, t2) against('"word1 sec8" 6word*' IN BOOLEAN MODE) is expected to merge the results but it returns the same 2 rows as above

t1                    | t2
-------------------------------------------
word1 sec8 5word5     | word2 sec7 word6
word1 sec8 5word5     | word2 1sec17 word6

Nevertheless, if I change the sequence of patterns, it returns as expected. The query select * from df_test where match(t1, t2) against('6word*"word1 sec8"' IN BOOLEAN MODE). The result:

t1                    | t2
-------------------------------------------
1word2 1sec71 5word5  | word2 1sec8 6word6
word1 sec7 5word5     | word2 1sec8 6word6
word1 sec8 5word5     | word2 sec7 word6
word1 sec8 5word5     | word2 1sec17 word6
[20 May 2019 21:10] Denis Fatkhudinov
Update: INNODB is used
[21 May 2019 6:00] Umesh Shastry
Hello Denis,

Thank you for the report and test case.

regards,
Umesh