Bug #94709 Regression behavior for full text index
Submitted: 19 Mar 2019 19:41 Modified: 20 Mar 2019 7:57
Reporter: Carlos Tutte Email Updates:
Status: Verified Impact on me:
None 
Category:MySQL Server: FULLTEXT search Severity:S3 (Non-critical)
Version:5.7.21,5.7.25, 8.0.15, 5.6.43 OS:Any
Assigned to: CPU Architecture:Any
Tags: regression

[19 Mar 2019 19:41] Carlos Tutte
Description:
Creating a table with full text index gives different query result than adding fulltext index after table creation when using that index.

The behavior changed from 5.7.20 to 5.7.21

How to repeat:
The following 2 test scenarios give 1 result each for MySQL versions up to 5.7.20, and only 1st test case returns a result after 5.7.21

---------------------------------------------------------- 

use test;

----------------------------------------------------------
------------------------ 1 result ------------------------

DROP TABLE IF EXISTS `posts_translations_copy2`;

CREATE TABLE `posts_translations_copy2` ( 
`id` int(11) unsigned NOT NULL, 
`content` mediumtext CHARACTER SET utf8mb4 COLLATE utf8mb4_bin, 
PRIMARY KEY (`id`), 
FULLTEXT KEY `searchablepc` (`content`) 
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4;

INSERT INTO `posts_translations_copy2` (`id`, `content`) 
VALUES 
(1,'content Spring fling');

SELECT count(*) 
FROM `test`.`posts_translations_copy2` `PostsTranslations` 
USE INDEX (searchablepc) 
WHERE ((MATCH(PostsTranslations.content) AGAINST('"Spring fling"' in boolean mode))) 
ORDER BY `PostsTranslations`.`id` DESC LIMIT 20 OFFSET 0;
 
----------------------------------------------------------
------------------------ 0 result ------------------------

DROP TABLE IF EXISTS `posts_translations_copy2`;

CREATE TABLE `posts_translations_copy2` ( 
`id` int(11) unsigned NOT NULL, 
`content` mediumtext CHARACTER SET utf8mb4 COLLATE utf8mb4_bin, 
PRIMARY KEY (`id`) 
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4;

INSERT INTO `posts_translations_copy2` (`id`, `content`) 
VALUES 
(1,'content Spring fling');

CREATE FULLTEXT INDEX searchablepc ON posts_translations_copy2(content);

SELECT count(*) 
FROM `test`.`posts_translations_copy2` `PostsTranslations` 
USE INDEX (searchablepc) 
WHERE ((MATCH(PostsTranslations.content) AGAINST('"Spring fling"' in boolean mode))) 
ORDER BY `PostsTranslations`.`id` DESC LIMIT 20 OFFSET 0;
------------------------------------------------------------

Suggested fix:
This is a regression as it changes how query behaves. I suggest returning same result as prior versions for version >= MySQL 5.7.21 , or add a documentation bug showing this incompatible change.
[20 Mar 2019 7:57] MySQL Verification Team
Hello Carlos Tutte,

Thank you for the report and test case.

Thanks,
Umesh
[20 Mar 2019 8:00] Nikolai Ikhalainen
Hi,

5.7.21+ contains the fix:
InnoDB: Full-text search on indexed columns that use a binary collation did not return case-sensitive matches. (Bug #21625016, Bug #78048)

SET GLOBAL innodb_ft_aux_table='test/posts_translations_copy2';
mysql> select * from  INFORMATION_SCHEMA.INNODB_FT_INDEX_CACHE where word like '%ring';
| WORD   | FIRST_DOC_ID | LAST_DOC_ID | DOC_COUNT | DOC_ID | POSITION |
+--------+--------------+-------------+-----------+--------+----------+
| Spring |            2 |           2 |         1 |      2 |        8 |

In 5.7.20:
| WORD   | FIRST_DOC_ID | LAST_DOC_ID | DOC_COUNT | DOC_ID | POSITION |
+--------+--------------+-------------+-----------+--------+----------+
| spring |            2 |           2 |         1 |      2 |        8 |

After
alter table posts_translations_copy2 modify column content mediumtext CHARACTER SET utf8mb4 COLLATE utf8mb4_general_ci;
| WORD   | FIRST_DOC_ID | LAST_DOC_ID | DOC_COUNT | DOC_ID | POSITION |
+--------+--------------+-------------+-----------+--------+----------+
| spring |            2 |           2 |         1 |      2 |        8 |

Even after set names utf8mb4 COLLATE utf8mb4_bin;

It looks like a full text search in boolean mode for double quotes is trying to find lowercase phrase even if we are providing a correct case in AGAINST.