Description:
Hi,
as explained in my blog post [1], Duplicate Check is a multithreaded process that uses at most 2 threads per core, and at most 16 threads overall. See function fil_get_scan_threads for details (call: [2]; definition: [3]; THREADS_PER_CORE limit: [4]; THREADS_PER_CORE definition: [5]; MAX_THREADS limit: [6]; MAX_THREADS definition: [7]).
[1]: https://jfg-mysql.blogspot.com/2024/11/understanding-innodb-tablespace-duplicate-check.htm...
[2]: https://github.com/mysql/mysql-server/blob/mysql-9.1.0/storage/innobase/fil/fil0fil.cc#L11...
[3]: https://github.com/mysql/mysql-server/blob/mysql-9.1.0/storage/innobase/fil/fil0fil.cc#L13...
[4]: https://github.com/jfg956/mysql-server/blob/mysql-9.1.0/storage/innobase/fil/fil0fil.cc#L1...
[5]: https://github.com/jfg956/mysql-server/blob/mysql-9.1.0/storage/innobase/include/fil0fil.h...
[6]: https://github.com/mysql/mysql-server/blob/mysql-9.1.0/storage/innobase/fil/fil0fil.cc#L15...
[7]: https://github.com/mysql/mysql-server/blob/mysql-9.1.0/storage/innobase/include/fil0fil.h#...
I claim that 2 threads per core and 16 threads overall is far from optimal. Because computing an optimal value is complex, these limits should be configurable. Some justification for this claim is given in How to repeat. I will soon provide a patch and show test results.
Many thanks for looking into this (and if this is not "verifiable" in its current state, for keeping it open in "needs feedback" while I complete the patch),
Jean-François Gagné
How to repeat:
Below is a quote from the System Analysis for Duplicate Check presented in [1.1].
[1.1]: https://jfg-mysql.blogspot.com/2024/11/understanding-innodb-tablespace-duplicate-check.htm...
> for a disk capacity of c (example 64k iops) and an IO latency of l (example 0.5 ms), c*l threads are needed (32 for our example) to saturate IO capacity
The above already gives a fairly realistic example where more than 16 threads would lead to a quicker startup.
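To make the arithmetic in the quote above concrete, here is a minimal sketch of the c*l computation. The function name is hypothetical and is not from the MySQL source; it only restates the formula from the quote.

```python
import math

def threads_to_saturate(capacity_iops, latency_ms):
    """Threads needed to saturate a disk: c*l from the quote above.

    Hypothetical helper for illustration only, not MySQL code.
    """
    # latency_ms / 1000 converts the latency to seconds, so that
    # iops * seconds gives the number of IOs that must be in flight.
    return math.ceil(capacity_iops * latency_ms / 1000)

# Example from the quote: 64k iops at 0.5 ms latency.
print(threads_to_saturate(64000, 0.5))
```

With the numbers from the quote, this gives 32 threads, which is twice the 16 threads limit.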
Another example is below (quote from [1]), where more than 2 threads per core could give a quicker startup (few cores and lots of iops).
> t threads (example 4) for an IO latency of l (example 0.5 ms) are able to use t/l iops (8000 for our example)
>
> with a latency of 0.5 ms and 2 cores (so 4 threads), we would only be able to use 8000 iops, while AWS gp3 EBS volumes can scale up to 16k iops
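The two limits and the t/l formula quoted above can be combined in a small sketch. This is a simplified model under the assumptions of this report (2 threads per core, 16 threads overall), not a copy of fil_get_scan_threads: it ignores anything else the real function may take into account, and the function names are hypothetical.

```python
# Limits as described in this report (values assumed from the text above).
THREADS_PER_CORE = 2
MAX_THREADS = 16

def scan_threads(cores):
    """Simplified model of the thread count: both limits applied."""
    return min(cores * THREADS_PER_CORE, MAX_THREADS)

def usable_iops(threads, latency_ms):
    """iops that `threads` synchronous readers can drive: t/l from the quote."""
    return threads * 1000 / latency_ms

# Hypothetical core counts; latency of 0.5 ms as in the quotes.
for cores in (2, 8, 64):
    threads = scan_threads(cores)
    print(cores, "cores:", threads, "threads,", usable_iops(threads, 0.5), "iops")
```

In this model, 2 cores give 4 threads and 8000 iops, half of what a 16k iops volume can deliver, and even with 64 cores the 16 threads limit caps usage at 32000 iops at this latency.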