MySQL Bugs: #113482: innodb_parallel_read_threads>1 makes simple select count more expensive

Bug #113482	innodb_parallel_read_threads>1 makes simple select count more expensive
Submitted:	21 Dec 2023 9:23	Modified:	24 Jun 2024 20:52
Reporter:	Przemyslaw Malkowski	Email Updates:
Status:	Closed	Impact on me:	None
Category:	MySQL Server: InnoDB storage engine	Severity:	S5 (Performance)
Version:	8.0.35, 8.2.0	OS:	Any
Assigned to:		CPU Architecture:	Any
Tags:	innodb_parallel_read_threads, select count

Description:
While the parallel reads feature was supposed to speed up simple count selects (WL#11720), in fact, it does the opposite in some very simple use cases. 
With a generic sysbench table of 10M rows, the select count(*) query is way more expensive with more than one innodb_parallel_read_threads. 
For example, despite the fact that 4 CPU cores are used at 100% during the execution with the default innodb_parallel_read_threads=4, the query takes over two times longer than with a setting of "1" where just a single core is utilized! With two threads, it's just terribly slow, and it's even slower with eight threads, while the CPU has 16 CPU cores:

mysql > set innodb_parallel_read_threads=1;
Query OK, 0 rows affected (0.00 sec)

mysql > select count(*) from sbtest1;
+----------+
| count(*) |
+----------+
| 10000000 |
+----------+
1 row in set (3.07 sec)

mysql > set innodb_parallel_read_threads=8;
Query OK, 0 rows affected (0.00 sec)

mysql > select count(*) from sbtest1;
+----------+
| count(*) |
+----------+
| 10000000 |
+----------+
1 row in set (4.03 sec)

How to repeat:
Setup test MySQL instance with all default settings, and populate one table with rows:
$ sysbench /usr/share/sysbench/oltp_common.lua --mysql_storage_engine=innodb --tables=1 --table-size=10000000 --mysql-db=db1 --mysql-user=msandbox --mysql-password=msandbox --db-driver=mysql --mysql-socket=/tmp/mysql_sandbox8200.sock --mysql-host=localhost prepare
sysbench 1.0.20 (using bundled LuaJIT 2.1.0-beta2)

Initializing worker threads...

Creating table 'sbtest1'...
Inserting 10000000 records into 'sbtest1'
Creating a secondary index on 'sbtest1'...

Tested with:

mysql > select @@version,@@innodb_read_io_threads,@@innodb_use_native_aio,@@innodb_parallel_read_threads;
+-----------+--------------------------+-------------------------+--------------------------------+
| @@version | @@innodb_read_io_threads | @@innodb_use_native_aio | @@innodb_parallel_read_threads |
+-----------+--------------------------+-------------------------+--------------------------------+
| 8.2.0     |                        4 |                       1 |                              4 |
+-----------+--------------------------+-------------------------+--------------------------------+
1 row in set (0.00 sec)

Test run:

$ for i in {1..8}; do msb_8_2_0/use -vvv db1 -e "SET innodb_parallel_read_threads=$i; select count(*) from sbtest1;"; done
--------------
SET innodb_parallel_read_threads=1
--------------

Query OK, 0 rows affected (0.00 sec)

--------------
select count(*) from sbtest1
--------------

+----------+
| count(*) |
+----------+
| 10000000 |
+----------+
1 row in set (3.25 sec)

Bye
--------------
SET innodb_parallel_read_threads=2
--------------

Query OK, 0 rows affected (0.00 sec)

--------------
select count(*) from sbtest1
--------------

+----------+
| count(*) |
+----------+
| 10000000 |
+----------+
1 row in set (14.14 sec)

Bye
--------------
SET innodb_parallel_read_threads=3
--------------

Query OK, 0 rows affected (0.00 sec)

--------------
select count(*) from sbtest1
--------------

+----------+
| count(*) |
+----------+
| 10000000 |
+----------+
1 row in set (9.38 sec)

Bye
--------------
SET innodb_parallel_read_threads=4
--------------

Query OK, 0 rows affected (0.00 sec)

--------------
select count(*) from sbtest1
--------------

+----------+
| count(*) |
+----------+
| 10000000 |
+----------+
1 row in set (7.08 sec)

Bye
--------------
SET innodb_parallel_read_threads=5
--------------

Query OK, 0 rows affected (0.00 sec)

--------------
select count(*) from sbtest1
--------------

+----------+
| count(*) |
+----------+
| 10000000 |
+----------+
1 row in set (5.69 sec)

Bye
--------------
SET innodb_parallel_read_threads=6
--------------

Query OK, 0 rows affected (0.00 sec)

--------------
select count(*) from sbtest1
--------------

+----------+
| count(*) |
+----------+
| 10000000 |
+----------+
1 row in set (4.98 sec)

Bye
--------------
SET innodb_parallel_read_threads=7
--------------

Query OK, 0 rows affected (0.00 sec)

--------------
select count(*) from sbtest1
--------------

+----------+
| count(*) |
+----------+
| 10000000 |
+----------+
1 row in set (4.39 sec)

Bye
--------------
SET innodb_parallel_read_threads=8
--------------

Query OK, 0 rows affected (0.00 sec)

--------------
select count(*) from sbtest1
--------------

+----------+
| count(*) |
+----------+
| 10000000 |
+----------+
1 row in set (3.82 sec)

Bye

Suggested fix:
Fix the implementation to match the initial premise.

Hi,

Thank you for the report.

Posted by developer:
 
Thank you Przemek for the report.
Indeed there is a problem and it was not found during development, as the testing was using data cached in the Buffer Pool, leading to no IO read requests. The most of the work was with processing data so it was nicely parallelized and gave good results.
However, once the data is not in the buffer pool, it gets read in. And in comparison to a single threaded scan, the linear read ahead was not used, leading to much worse performance. There is a lot more waiting for data to be read in as these next 64 pages of linear read ahead are not being read in background.
This is especilly visible on data stored on an old HDD, where threads=1 on your data takes 13.5s, while for threads=2 it takes 62s.
If the code is modified to use linear readahead for parallel scan, the performance improves a lot. However, while one thread with read ahead is enough to saturate a single HDD with ~125MB/s, it only gets worse if multiple threads are used, exactly two times slower in my case.
The fix will be evaluated more.
Could you confirm that you have a small BufferPool (smaller than the 2GB that the input data file has), or start with empty BP before each test?
What type of drive do you use for tests?

Hi Marcin,

> Could you confirm that you have a small BufferPool (smaller than the 2GB that the input data file has), or start with empty BP before each test?
Yes, I used default BP, so 128MB.

> What type of drive do you use for tests?
It's NVMe drive, it does ~8.8k read IOPS during the select count.

Posted by developer:
 
Fixed as of the upcoming MySQL Server 8.0.37 and 8.4.0 releases, and here's the proposed changelog entry from the documentation team:

When using innodb_parallel_read_threads values greater than 1, InnoDB
would unnecessarily disable read-ahead heuristics which resulted in stalls
when pages were not already in the buffer pool.

Thank you for the detailed bug report.

Excellent, great job! Now 8.0.37 is much faster in my quick test in both single-thread and multi-threaded mode!

mysql > select @@version,@@innodb_buffer_pool_size,@@innodb_parallel_read_threads;
+-----------+---------------------------+--------------------------------+
| @@version | @@innodb_buffer_pool_size | @@innodb_parallel_read_threads |
+-----------+---------------------------+--------------------------------+
| 8.0.37    |                 134217728 |                              2 |
+-----------+---------------------------+--------------------------------+
1 row in set (0.00 sec)

mysql > select count(*) from sbtest1;
+----------+
| count(*) |
+----------+
| 10000000 |
+----------+
1 row in set (1.64 sec)

Thanks for fixing this one quickly!