| Bug #106569 | Full-text search get stuck at "FULLTEXT INITIALIZATION" stage for exact phrases | ||
|---|---|---|---|
| Submitted: | 25 Feb 2022 7:35 | Modified: | 25 Feb 2022 10:25 |
| Reporter: | Pratyush Mittal | Email Updates: | |
| Status: | Verified | Impact on me: | |
| Category: | MySQL Server: FULLTEXT search | Severity: | S2 (Serious) |
| Version: | 8.0.28, 5.7.37 | OS: | Any |
| Assigned to: | CPU Architecture: | Any | |
| Tags: | Boolean Full-Text Search, full text | ||
[25 Feb 2022 7:40]
Pratyush Mittal
Updated the title
[25 Feb 2022 9:40]
MySQL Verification Team
Hello Pratyush Mittal, Thank you for the report and test case. regards, Umesh
[25 Feb 2022 9:41]
MySQL Verification Team
- 8.0.28
mysql> SELECT EVENT_ID, TRUNCATE(TIMER_WAIT/1000000000000,6) as Duration, SQL_TEXT
-> FROM performance_schema.events_statements_history_long WHERE SQL_TEXT like '%limit 25%'
-> ;
+----------+----------+------------------------------------------------------------------------------------------------+
| EVENT_ID | Duration | SQL_TEXT |
+----------+----------+------------------------------------------------------------------------------------------------+
| 458 | 0.5537 | SELECT *
from posts
where MATCH (opening_line) AGAINST ('+help +bar' IN BOOLEAN MODE)
LIMIT 25 |
| 478 | 7.7637 | SELECT *
from posts
where MATCH (opening_line) AGAINST ('"help bar"' IN BOOLEAN MODE)
LIMIT 25 |
+----------+----------+------------------------------------------------------------------------------------------------+
2 rows in set (0.01 sec)
mysql> SELECT event_name AS Stage, TRUNCATE(TIMER_WAIT/1000000000000,6) AS Duration
-> FROM performance_schema.events_stages_history_long WHERE NESTING_EVENT_ID=458;
+------------------------------------------------+----------+
| Stage | Duration |
+------------------------------------------------+----------+
| stage/sql/starting | 0.0001 |
| stage/sql/Executing hook on transaction begin. | 0.0000 |
| stage/sql/starting | 0.0000 |
| stage/sql/checking permissions | 0.0000 |
| stage/sql/Opening tables | 0.0000 |
| stage/sql/init | 0.0000 |
| stage/sql/System lock | 0.0000 |
| stage/sql/optimizing | 0.0000 |
| stage/sql/statistics | 0.0000 |
| stage/sql/preparing | 0.0000 |
| stage/sql/FULLTEXT initialization | 0.5351 |
| stage/sql/executing | 0.0130 |
| stage/sql/end | 0.0000 |
| stage/sql/query end | 0.0000 |
| stage/sql/waiting for handler commit | 0.0000 |
| stage/sql/closing tables | 0.0000 |
| stage/sql/freeing items | 0.0053 |
| stage/sql/cleaning up | 0.0000 |
+------------------------------------------------+----------+
18 rows in set (0.00 sec)
mysql> SELECT event_name AS Stage, TRUNCATE(TIMER_WAIT/1000000000000,6) AS Duration FROM performance_schema.events_stages_history_long WHERE NESTING_EVENT_ID=478;
+------------------------------------------------+----------+
| Stage | Duration |
+------------------------------------------------+----------+
| stage/sql/starting | 0.0054 |
| stage/sql/Executing hook on transaction begin. | 0.0000 |
| stage/sql/starting | 0.0000 |
| stage/sql/checking permissions | 0.0000 |
| stage/sql/Opening tables | 0.0000 |
| stage/sql/init | 0.0000 |
| stage/sql/System lock | 0.0000 |
| stage/sql/optimizing | 0.0000 |
| stage/sql/statistics | 0.0000 |
| stage/sql/preparing | 0.0000 |
| stage/sql/FULLTEXT initialization | 7.7249 |
| stage/sql/executing | 0.0269 |
| stage/sql/end | 0.0000 |
| stage/sql/query end | 0.0000 |
| stage/sql/waiting for handler commit | 0.0000 |
| stage/sql/closing tables | 0.0000 |
| stage/sql/freeing items | 0.0062 |
| stage/sql/cleaning up | 0.0000 |
+------------------------------------------------+----------+
18 rows in set (0.00 sec)
[25 Feb 2022 10:25]
Pratyush Mittal
Thanks Umesh for verifying it. The problem with phrase search is pretty widespread. There are lots of questions around it on StackOverflow. Sharing some of them for reference: - https://dba.stackexchange.com/q/87089/36350 - https://dba.stackexchange.com/q/72188/36350 - https://stackoverflow.com/q/24939527/232649

Description: The phrase (double quote) searches in full-text search are pretty prone to errors. They take a long time and are often stuck in the "FULLTEXT INITIALIZATION" stage. **Setting max_execution_time has no effect in such cases.** The problem with phrase searches arises when: 1. The data size is huge (100,000+) 2. and the searched phrase is present in most of the rows How to repeat: -- create a dummy table CREATE TABLE `posts` ( `id` INT NOT NULL AUTO_INCREMENT, `opening_line` TEXT CHARACTER SET 'utf8mb4', PRIMARY KEY (`id`), FULLTEXT idx (opening_line) ) ENGINE = InnoDB; -- insert initial 5 row INSERT INTO posts (opening_line) VALUES ('Please help foo'), ('Please help bar'), ('Please help baz'), ('Please help foobar'), ('Please help foobaz'); -- repeat insertion 16 times to create 100,000 rows INSERT INTO posts (opening_line) SELECT posts.opening_line FROM posts; -- try full text searches SELECT * from posts where MATCH (opening_line) AGAINST ('+help +bar' IN BOOLEAN MODE) LIMIT 25; -- returns in 0.278s SELECT * from posts where MATCH (opening_line) AGAINST ('"help bar"' IN BOOLEAN MODE) LIMIT 25; -- takes over 5 seconds -- gets stuck at FULLTEXT initialization stage for larger data SHOW PROFILE; -- 'starting','0.000111' -- Executing hook on transaction ','0.000006' -- 'starting','0.000012' -- 'checking permissions','0.000012' -- 'Opening tables','0.000059' -- 'init','0.000008' -- 'System lock','0.000012' -- 'optimizing','0.000015' -- 'statistics','0.000038' -- 'preparing','0.000009' -- 'FULLTEXT initialization','5.292861' -- 'executing','0.011673' -- 'end','0.000005' -- 'query end','0.000002' -- 'waiting for handler commit','0.000006' -- 'closing tables','0.000011' -- 'freeing items','0.003400' -- 'cleaning up','0.000017' Suggested fix: The problem happens probably because the system first tries to filter documents relevant for phrase search. It scans through documents for words as well as their positions. This is why the query gets stuck in the "FULLTEXT INITIALIZATION" stage. A better solution would be to first use the index to search the rows with matching words. Irrespective of their positions. And then try to filter rows based on proximity. This suggestion is based on the observation that LIKE phrase searches are much faster after filtering initial results: -- filtering for phrases using LIKE works much better after we have initial results SELECT * from posts where MATCH (opening_line) AGAINST ('+help +bar"' IN BOOLEAN MODE) AND opening_line LIKE '%help bar%' LIMIT 25; -- takes 0.278s