Bug #116005 | With NUMA and Read Committed, performance worsens under high concurrency. | |
---|---|---|---|
Submitted: | 4 Sep 2024 23:43 | Modified: | 9 Feb 14:10 |
Reporter: | Bin Wang (OCA) | Email Updates: | |
Status: | Verified | Impact on me: | |
Category: | MySQL Server: InnoDB storage engine | Severity: | S5 (Performance) |
Version: | all versions | OS: | Any |
Assigned to: | | CPU Architecture: | Any
[4 Sep 2024 23:43]
Bin Wang
[6 Sep 2024 7:34]
Jakub Lopuszanski
Hello Bin Wang! Great insights, in particular:

* that spending too long in the critical section risks the holder's time slice running out, after which every thread waiting for the latch has to hope the scheduler guesses that the only way to unblock them is to schedule the thread holding the latch again, which can be very hard to do by chance when there are thousands of spinning threads;
* that the chance of running out of time is greater if you have to access many memory locations on a distant NUMA node.

It so happens that I am working on the same topic/area at the moment and have a similar, yet different (in theory: faster) solution. Both approaches seem to take inspiration from Paweł Olchawa's idea of separating "sparse" and "dense" regions of active_trx_ids.

One problem I am facing with my patch (which takes these ideas further) is that on REPEATABLE READ (which does not create/copy read views as often as READ COMMITTED, so it does not benefit much from improvements here), on the UPDATE KEY and UPDATE NO KEY workloads (which, unlike OLTP RW, perform no SELECTs, so they need read views rarely or not at all, yet commit and therefore update the set of active trx ids often), on a machine with 2 sockets and many CPUs (which pays a higher price for any form of communication, be it writes, cache misses, or atomic operations), in a scenario with many clients... I see a slowdown. I am still investigating the exact culprit (it looks like the bottleneck shifts to some other place which handles congestion even worse?).

I see you've mostly tested on TPC-C. Have you tried BMK/sb_exec/sb11-OLTP_RW_10M_8tab-uniform-upd_idx1-notrx.sh 1024? In particular on a config like:

--user=root --log_error_verbosity=3 --back-log=0 --core-file --disable-log-bin --innodb-adaptive-hash-index=OFF --innodb-buffer-pool-instances=8 --innodb-flush-method=O_DIRECT --innodb-io-capacity=10000 --innodb-io-capacity-max=12000 --innodb-page-cleaners=8 --innodb-purge-threads=4 --innodb-read-io-threads=4 --innodb-change-buffering=none --innodb-numa-interleave=ON --innodb-undo-log-truncate=OFF --performance-schema=ON --max_connections=2000 --max_prepared_stmt_count=50000 --datadir=/nvm/jlopusza/data --innodb-redo-log-capacity=90G --innodb-write-io-threads=4 --innodb-log-group-home-dir=/ssd/jlopusza --innodb-undo-directory=/ssd/jlopusza --innodb-buffer-pool-size=128G --thread_cache_size=1200 --performance_schema=ON --innodb_monitor_enable=% --range_alloc_block_size=16384 --loose_temptable_use_mmap=OFF --loose_temptable_max_ram=4294967296 --tls-version= --require_secure_transport=OFF --tmpdir=/tmp/
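The latch-hold-time argument above can be made concrete with a minimal sketch. This is not InnoDB's actual code; all type and member names are invented for illustration. Building a read view means copying the set of active transaction IDs while holding the trx_sys latch, so the critical section grows with concurrency, whereas a "dense range plus sparse exceptions" layout in the spirit of the idea mentioned above shrinks the memory touched under the latch:

```cpp
// Minimal sketch only (assumed names, not InnoDB's real structures).
#include <cstdint>
#include <mutex>
#include <vector>

using trx_id_t = std::uint64_t;

// Naive snapshot: copy the whole list of active ids while holding the latch.
// With thousands of clients the copy is long; if the holder's time slice runs
// out (more likely when the ids live on a distant NUMA node), every spinning
// waiter stalls until the scheduler happens to run the holder again.
struct TrxSysNaive {
  std::mutex latch;
  std::vector<trx_id_t> active_ids;  // one entry per open transaction
  trx_id_t next_id = 1;              // one past the newest assigned id

  std::vector<trx_id_t> snapshot_active_ids() {
    std::lock_guard<std::mutex> g(latch);  // -- critical section --
    return active_ids;                     // O(#active) copy under the latch
  }
};

// "Dense + sparse" flavour: ids in [dense_min, next_id) are treated as active
// unless listed as an exception, so a snapshot copies only two bounds plus the
// (hopefully short) exception list.
struct TrxSysDense {
  std::mutex latch;
  trx_id_t dense_min = 1;                   // oldest id that may still be active
  trx_id_t next_id = 1;                     // one past the newest assigned id
  std::vector<trx_id_t> finished_in_range;  // sparse exceptions inside the range

  struct Snapshot {
    trx_id_t dense_min;
    trx_id_t next_id;
    std::vector<trx_id_t> finished_in_range;
    // id is invisible to this snapshot iff id >= next_id, or
    // (id >= dense_min and id is not in finished_in_range).
  };

  Snapshot take_snapshot() {
    std::lock_guard<std::mutex> g(latch);  // -- much shorter critical section --
    return {dense_min, next_id, finished_in_range};
  }
};
```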
[6 Sep 2024 8:16]
Bin Wang
I'll test it out when I get the chance. Unusual issues are valuable to us because they are interesting. Regarding the transaction system, our strategy is to limit the number of threads interacting with it, which is why we avoided more complex solutions. Our current fix is only about 200 lines of code, is easy to validate, and works well together with transaction throttling mechanisms. The approach was inspired by related research papers. For proof-of-concept purposes we prefer BenchmarkSQL TPC-C tests.
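As a rough illustration of the "limit the number of threads interacting with the transaction system" strategy described above: this is only a generic sketch, not the actual ~200-line patch, and the class name, the limit of 8, and the use of a C++20 counting semaphore are assumptions. An admission gate caps how many threads contend on the latch at once, so the holder is less likely to be preempted while a large crowd spins:

```cpp
// Generic throttling sketch (requires C++20 for <semaphore>); assumed names.
#include <cstdint>
#include <mutex>
#include <semaphore>
#include <vector>

using trx_id_t = std::uint64_t;

class ThrottledTrxSys {
 public:
  std::vector<trx_id_t> snapshot_active_ids() {
    gate_.acquire();  // sleep here instead of spinning on the latch
    std::vector<trx_id_t> copy;
    {
      std::lock_guard<std::mutex> g(latch_);
      copy = active_ids_;  // short, lightly contended critical section
    }
    gate_.release();
    return copy;
  }

 private:
  static constexpr int kMaxConcurrent = 8;  // assumed tuning knob
  std::counting_semaphore<kMaxConcurrent> gate_{kMaxConcurrent};
  std::mutex latch_;
  std::vector<trx_id_t> active_ids_;
};
```

Threads blocked on the semaphore sleep rather than spin, which leaves CPU time for the latch holder to finish its critical section before its time slice runs out.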
[7 Feb 7:36]
Rahul Sisondia
The link shared in the bug report is broken. Could you please share an updated link? Just curious.
[9 Feb 14:10]
Bin Wang
https://enhancedformysql.github.io/blogs/innodb_storage.html