Bug #113846 Fulltext Index has Missing Rows If They are from Concurrent Inserts
Submitted: 1 Feb 2024 8:11
Modified: 2 Feb 2024 11:54
Reporter: Sunny Chung
Status: Verified
Impact on me: None
Category: MySQL Server: FULLTEXT search
Severity: S3 (Non-critical)
Version: 8.0
OS: Any
Assigned to:
CPU Architecture: Any

[1 Feb 2024 8:11] Sunny Chung
Description:
When rows are inserted concurrently, the fulltext index does not index all of them, and the index does not repair itself over time. The ngram parser was used.

The issue does not happen if rows are inserted sequentially over a single connection.

In real use cases, data usually arrives concurrently. Losing this many rows means the feature cannot be relied on at all.

How to repeat:
Please run this reproducer: https://github.com/sunny-chung/mysql-fulltext-index-incomplete-on-concurrent-inserts

The steps are documented in README.md.

This is not a SQL-only reproducer, because I could not simulate concurrent writes with SQL statements alone. However, the reproducer prints every SQL statement it executes to the console.

In the reproducer, 50 records are inserted concurrently in each round, and a fulltext query is then issued to retrieve all of them. All 50 records are expected back, but the test sometimes returns 33 records, sometimes 44, and never all of them.
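For reference, the kind of schema and query the reproducer exercises might look like the following. The table and column names here are my assumptions for illustration; the actual schema is in the linked repository.

```sql
-- Hypothetical table with an ngram-parsed fulltext index.
CREATE TABLE documents (
  id INT AUTO_INCREMENT PRIMARY KEY,
  content TEXT,
  FULLTEXT INDEX ft_content (content) WITH PARSER ngram
) ENGINE = InnoDB;

-- 50 inserts of this shape are issued concurrently from separate connections.
INSERT INTO documents (content) VALUES ('sample searchable text');

-- The fulltext query that should match all 50 rows.
SELECT COUNT(*) FROM documents
WHERE MATCH (content) AGAINST ('sample' IN BOOLEAN MODE);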

Please let me know if anything extra is needed.

Suggested fix:
Perhaps a lock could be enforced while the fulltext index is being updated?
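Until this is addressed server-side, one crude client-side workaround consistent with the observation above (sequential inserts are indexed correctly) is to serialize the inserts behind a lock. A minimal sketch of the pattern, where `insert_row` is a stand-in for the real database call:

```python
import threading
from concurrent.futures import ThreadPoolExecutor

# Stand-in for the real table; in the reproducer this would be an SQL INSERT.
stored_rows = []

# Serializing inserts behind one lock avoids concurrent index updates
# entirely, at the cost of insert throughput.
insert_lock = threading.Lock()

def insert_row(row):
    with insert_lock:
        stored_rows.append(row)

with ThreadPoolExecutor(max_workers=50) as pool:
    for i in range(50):
        pool.submit(insert_row, f"record-{i}")

print(len(stored_rows))  # all 50 rows are present
```

This trades away the very concurrency the bug is about, so it is only a stopgap for applications that cannot tolerate missing rows.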
[1 Feb 2024 11:54] MySQL Verification Team
Hi Mr. Chung,

Thank you for your bug report.

However, we do need a test case in SQL.

We can also accept a test case as a shell script, including one that uses multi-threaded execution.

We have several tools that we can use for testing.

Hence, please take a look at our mysqlslap client program, or even sysbench, and see if you can come up with a repeatable test case.
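For illustration only, a mysqlslap run driving concurrent inserts might look like the sketch below. The schema name, query, and flag values are placeholders I am assuming, not a verified reproduction, and it requires a running server:

```shell
# Hypothetical invocation: 50 concurrent clients issuing the inserts.
mysqlslap \
  --user=root --password \
  --create-schema=ft_test \
  --concurrency=50 \
  --number-of-queries=50 \
  --query="INSERT INTO documents (content) VALUES ('sample searchable text');" \
  --no-drop
```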

Without such a test case, we cannot tell whether this is a bug in Connector/J, the Docker Operator, or the server itself.

Can't repeat.
[2 Feb 2024 8:33] Sunny Chung
Thanks for letting me know about the tools. I have translated the reproducer into a shell script.

In the shell version the issue is much less severe, but it still exists: 5000 relevant rows were inserted, yet only 4994 can be queried back via the fulltext index.

The same MySQL server settings as in the Docker Compose file were used, i.e. the command-line options --ft_stopword_file="" --ngram_token_size=1.
[2 Feb 2024 8:33] Sunny Chung
Reproducer

Attachment: test.sh (application/x-sh, text), 23.45 KiB.

[2 Feb 2024 11:54] MySQL Verification Team
Hi Mr. Chung,

Thank you for your new test case.

We managed to repeat the behaviour that you reported:

Preparing
Start

Benchmark
	Average number of seconds to run all queries: 0.001 seconds
	Minimum number of seconds to run all queries: 0.000 seconds
	Maximum number of seconds to run all queries: 0.020 seconds
	Number of clients running queries: 50
	Average number of queries per client: 1

Finish insert.

count
0

We even added a sleep of 4 seconds before the SELECT, but still got the same results.

This is now a verified bug report.

Please note that this may turn out to be a documentation bug rather than a code bug, but that is for others to decide.

Thank you for your effort.
[1 Apr 2024 1:50] Chahatpreet Grewal
Search results improved when we increased the resources available for building the auxiliary index tables of the fulltext index, using the following parameters:

innodb_ft_total_cache_size = 1600000000
innodb_ft_sort_pll_degree = 12

This suggests that the issue might lie in building the auxiliary index tables themselves.
[2 Apr 2024 10:16] MySQL Verification Team
Thank you, Mr. Grewal.

Your comment will be copied to our internal bug database.