Bug #106569 Full-text search get stuck at "FULLTEXT INITIALIZATION" stage for exact phrases
Submitted: 25 Feb 2022 7:35 Modified: 25 Feb 2022 10:25
Reporter: Pratyush Mittal Email Updates:
Status: Verified Impact on me:
None 
Category:MySQL Server: FULLTEXT search Severity:S2 (Serious)
Version:8.0.28, 5.7.37 OS:Any
Assigned to: CPU Architecture:Any
Tags: Boolean Full-Text Search, full text

[25 Feb 2022 7:35] Pratyush Mittal
Description:
The phrase (double quote) searches in full-text search are pretty prone to errors.
They take a long time and are often stuck in the "FULLTEXT INITIALIZATION" stage.

**Setting max_execution_time has no effect in such cases.**

The problem with phrase searches arises when:
1. The data size is huge (100,000+)
2. and the searched phrase is present in most of the rows

How to repeat:
-- create a dummy table
CREATE TABLE `posts` (
	`id` INT NOT NULL AUTO_INCREMENT,
	`opening_line` TEXT CHARACTER SET 'utf8mb4',
	PRIMARY KEY (`id`),
	FULLTEXT idx (opening_line)
) ENGINE = InnoDB;

-- insert initial 5 row
INSERT INTO posts (opening_line)
VALUES ('Please help foo'), ('Please help bar'), ('Please help baz'), ('Please help foobar'), ('Please help foobaz');

-- repeat insertion 16 times to create 100,000 rows
INSERT INTO posts (opening_line) SELECT posts.opening_line FROM posts;

-- try full text searches
SELECT *
from posts
where MATCH (opening_line) AGAINST ('+help +bar' IN BOOLEAN MODE)
LIMIT 25;
-- returns in 0.278s

SELECT *
from posts
where MATCH (opening_line) AGAINST ('"help bar"' IN BOOLEAN MODE)
LIMIT 25;
-- takes over 5 seconds
-- gets stuck at FULLTEXT initialization stage for larger data

SHOW PROFILE;
-- 'starting','0.000111'
-- Executing hook on transaction ','0.000006'
-- 'starting','0.000012'
-- 'checking permissions','0.000012'
-- 'Opening tables','0.000059'
-- 'init','0.000008'
-- 'System lock','0.000012'
-- 'optimizing','0.000015'
-- 'statistics','0.000038'
-- 'preparing','0.000009'
-- 'FULLTEXT initialization','5.292861'
-- 'executing','0.011673'
-- 'end','0.000005'
-- 'query end','0.000002'
-- 'waiting for handler commit','0.000006'
-- 'closing tables','0.000011'
-- 'freeing items','0.003400'
-- 'cleaning up','0.000017'

Suggested fix:
The problem happens probably because the system first tries to filter documents relevant for phrase search.
It scans through documents for words as well as their positions.
This is why the query gets stuck in the "FULLTEXT INITIALIZATION" stage.

A better solution would be to first use the index to search the rows with matching words. Irrespective of their positions. And then try to filter rows based on proximity.

This suggestion is based on the observation that LIKE phrase searches are much faster after filtering initial results:

-- filtering for phrases using LIKE works much better after we have initial results
SELECT *
from posts
where MATCH (opening_line) AGAINST ('+help +bar"' IN BOOLEAN MODE)
	AND opening_line LIKE '%help bar%'
LIMIT 25;
-- takes 0.278s
[25 Feb 2022 7:40] Pratyush Mittal
Updated the title
[25 Feb 2022 9:40] MySQL Verification Team
Hello Pratyush Mittal,

Thank you for the report and test case.

regards,
Umesh
[25 Feb 2022 9:41] MySQL Verification Team
- 8.0.28

mysql> SELECT EVENT_ID, TRUNCATE(TIMER_WAIT/1000000000000,6) as Duration, SQL_TEXT
    ->        FROM performance_schema.events_statements_history_long WHERE SQL_TEXT like '%limit 25%'
    -> ;
+----------+----------+------------------------------------------------------------------------------------------------+
| EVENT_ID | Duration | SQL_TEXT                                                                                       |
+----------+----------+------------------------------------------------------------------------------------------------+
|      458 |   0.5537 | SELECT *
from posts
where MATCH (opening_line) AGAINST ('+help +bar' IN BOOLEAN MODE)
LIMIT 25 |
|      478 |   7.7637 | SELECT *
from posts
where MATCH (opening_line) AGAINST ('"help bar"' IN BOOLEAN MODE)
LIMIT 25 |
+----------+----------+------------------------------------------------------------------------------------------------+
2 rows in set (0.01 sec)

mysql> SELECT event_name AS Stage, TRUNCATE(TIMER_WAIT/1000000000000,6) AS Duration
    ->        FROM performance_schema.events_stages_history_long WHERE NESTING_EVENT_ID=458;
+------------------------------------------------+----------+
| Stage                                          | Duration |
+------------------------------------------------+----------+
| stage/sql/starting                             |   0.0001 |
| stage/sql/Executing hook on transaction begin. |   0.0000 |
| stage/sql/starting                             |   0.0000 |
| stage/sql/checking permissions                 |   0.0000 |
| stage/sql/Opening tables                       |   0.0000 |
| stage/sql/init                                 |   0.0000 |
| stage/sql/System lock                          |   0.0000 |
| stage/sql/optimizing                           |   0.0000 |
| stage/sql/statistics                           |   0.0000 |
| stage/sql/preparing                            |   0.0000 |
| stage/sql/FULLTEXT initialization              |   0.5351 |
| stage/sql/executing                            |   0.0130 |
| stage/sql/end                                  |   0.0000 |
| stage/sql/query end                            |   0.0000 |
| stage/sql/waiting for handler commit           |   0.0000 |
| stage/sql/closing tables                       |   0.0000 |
| stage/sql/freeing items                        |   0.0053 |
| stage/sql/cleaning up                          |   0.0000 |
+------------------------------------------------+----------+
18 rows in set (0.00 sec)

mysql> SELECT event_name AS Stage, TRUNCATE(TIMER_WAIT/1000000000000,6) AS Duration        FROM performance_schema.events_stages_history_long WHERE NESTING_EVENT_ID=478;
+------------------------------------------------+----------+
| Stage                                          | Duration |
+------------------------------------------------+----------+
| stage/sql/starting                             |   0.0054 |
| stage/sql/Executing hook on transaction begin. |   0.0000 |
| stage/sql/starting                             |   0.0000 |
| stage/sql/checking permissions                 |   0.0000 |
| stage/sql/Opening tables                       |   0.0000 |
| stage/sql/init                                 |   0.0000 |
| stage/sql/System lock                          |   0.0000 |
| stage/sql/optimizing                           |   0.0000 |
| stage/sql/statistics                           |   0.0000 |
| stage/sql/preparing                            |   0.0000 |
| stage/sql/FULLTEXT initialization              |   7.7249 |
| stage/sql/executing                            |   0.0269 |
| stage/sql/end                                  |   0.0000 |
| stage/sql/query end                            |   0.0000 |
| stage/sql/waiting for handler commit           |   0.0000 |
| stage/sql/closing tables                       |   0.0000 |
| stage/sql/freeing items                        |   0.0062 |
| stage/sql/cleaning up                          |   0.0000 |
+------------------------------------------------+----------+
18 rows in set (0.00 sec)
[25 Feb 2022 10:25] Pratyush Mittal
Thanks Umesh for verifying it.

The problem with phrase search is pretty widespread. There are lots of questions around it on StackOverflow. Sharing some of them for reference:
- https://dba.stackexchange.com/q/87089/36350
- https://dba.stackexchange.com/q/72188/36350
- https://stackoverflow.com/q/24939527/232649