| Bug #99412 | Threads_running becomes scalability bottleneck on multi-node NUMA topologies | | |
|---|---|---|---|
| Submitted: | 30 Apr 2020 13:24 | Modified: | 6 May 2020 6:29 |
| Reporter: | Sergey Glushchenko | | |
| Status: | Verified | | |
| Category: | MySQL Server: Compiling | Severity: | S5 (Performance) |
| Version: | 8.0.19 | OS: | Any |
| Assigned to: | | CPU Architecture: | Any |
[30 Apr 2020 13:32]
Sergey Glushchenko
The attached patch removes the global atomic_num_thread_running variable. Instead, the number of running threads is counted when performance_schema.global_status is populated. This changes behavior: the session-level value of Threads_running is now always 1.
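The "count on read" approach described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the patch itself. `SessionState`, `ThreadRegistry`, `count_running()` and `dispatch_command_sketch()` are hypothetical names, not MySQL's `THD` or `Global_THD_manager` API. The sketch assumes each session carries its own "running" flag and that a registry walks all sessions only when the global status is read. The hot path then writes only to per-session memory, and the rare status read pays the cost of iterating over sessions.

```cpp
// Hypothetical sketch of counting running threads on read instead of
// maintaining a global counter. Not MySQL's actual implementation.
#include <atomic>
#include <cassert>
#include <mutex>
#include <vector>

struct SessionState {
  // Written only by the owning thread. Aligned to its own cache line so the
  // hot path never writes to a line shared with other sessions.
  alignas(64) std::atomic<bool> running{false};
};

class ThreadRegistry {
 public:
  void add(SessionState *s) {
    std::lock_guard<std::mutex> lock(mutex_);
    sessions_.push_back(s);
  }

  // Called rarely, e.g. when global status is populated. The result is a
  // point-in-time approximation, which is all a status counter needs.
  int count_running() const {
    std::lock_guard<std::mutex> lock(mutex_);
    int n = 0;
    for (const SessionState *s : sessions_)
      if (s->running.load(std::memory_order_relaxed)) ++n;
    return n;
  }

 private:
  mutable std::mutex mutex_;
  std::vector<SessionState *> sessions_;
};

// Hot path: mark the session busy around command execution.
// No shared counter is modified here.
template <typename F>
void dispatch_command_sketch(SessionState &s, F &&execute) {
  assert(!s.running.load(std::memory_order_relaxed));  // one command at a time
  s.running.store(true, std::memory_order_relaxed);
  execute();
  s.running.store(false, std::memory_order_relaxed);
}
```

The trade-off is that reading the status becomes O(number of sessions) instead of O(1). Threads_running is read far less often than commands are dispatched, so this is usually a good exchange.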
[4 May 2020 12:50]
MySQL Verification Team
Hello Mr. Glushchenko, thank you for your performance improvement report. I have analysed your patch and, in my opinion, it makes a lot of sense. Verified as reported. Thank you so much for your contribution.
[6 May 2020 6:29]
Sergey Glushchenko
Cleaner version of the patch
Attachment: bug99412.patch (application/octet-stream, text), 8.10 KiB.
[6 May 2020 6:29]
Sergey Glushchenko
Thank you very much, Sinisa! I've attached a cleaner version of the patch against MySQL 8.0.20.
[6 May 2020 12:44]
MySQL Verification Team
Thank you, Mr. Glushchenko!

Description:
The Threads_running counter is a hotspot in sysbench tests at high concurrency. The counter is modified twice for every SQL command in dispatch_command(): once before command execution begins and once after it finishes. Modifying a single global variable at that rate can easily become a bottleneck on machines with many cores and complex NUMA topologies. This is especially true for short queries such as those in sysbench Point Select.

The problem shows up as dispatch_command() ranking high in perf reports, for example:

```
8.28%  mysqld    [kernel.kallsyms]  [k] __wake_up_common_lock
4.89%  sysbench  [kernel.kallsyms]  [k] finish_task_switch
4.75%  mysqld    [kernel.kallsyms]  [k] finish_task_switch
3.09%  mysqld    mysqld             [.] dispatch_command
1.93%  sysbench  [kernel.kallsyms]  [k] prepare_to_wait
1.85%  mysqld    [kernel.kallsyms]  [k] __sys_recvfrom
```

perf annotate shows the increments and decrements as the bottleneck:

```
       :      /**
       :        Increments thread running statistic variable.
       :      */
       :      void inc_thread_running()
       :      {
       :        my_atomic_add32(&num_thread_running, 1);
  0.00 :   c37b24:       mov     x10, #0x2060                    // #8288
  0.00 :   c37b28:       add     x24, x21, x10
       :      my_atomic_add32():
       :        return __atomic_fetch_add(a, v, __ATOMIC_SEQ_CST);
 10.53 :   c37b2c:       ldaxr   w0, [x24]
  0.00 :   c37b30:       add     w0, w0, #0x1
 22.99 :   c37b34:       stlxr   w1, w0, [x24]
  0.59 :   c37b38:       cbnz    w1, c37b2c <dispatch_command(THD*, COM_DATA const*, enum_server_command)+0x20c>
       :      _Z16dispatch_commandP3THDPK8COM_DATA19enum_server_command():
```

How to repeat:
Run in-memory sysbench oltp_ps and use perf to find bottlenecks.
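The access pattern the description identifies as the hotspot can be reduced to the sketch below. It is a simplified illustration, assuming a single process-wide `std::atomic` counter. `num_thread_running` mirrors the name in the annotate output, but `dispatch_command_sketch()` and `run_workload()` are hypothetical and not MySQL code. Every command performs two sequentially consistent read-modify-write operations on the same word. As a result, every core running a command must take exclusive ownership of the same cache line twice. On AArch64 without LSE atomics, compilers typically emit an `ldaxr`/`stlxr` retry loop for this operation, like the one in the perf annotate output.

```cpp
// Hypothetical reduction of the contended pattern: one global counter
// incremented before and decremented after every command.
#include <atomic>
#include <cstdint>
#include <thread>
#include <vector>

std::atomic<std::int32_t> num_thread_running{0};

void inc_thread_running() {
  // Sequentially consistent RMW on a single shared word. Under contention,
  // the owning cache line bounces between cores (and NUMA nodes).
  num_thread_running.fetch_add(1, std::memory_order_seq_cst);
}

void dec_thread_running() {
  num_thread_running.fetch_sub(1, std::memory_order_seq_cst);
}

void dispatch_command_sketch() {
  inc_thread_running();
  // ... execute a short query, e.g. a point select ...
  dec_thread_running();
}

// Runs `threads` workers, each dispatching `commands` commands:
// 2 * threads * commands contended RMW operations on one cache line.
void run_workload(int threads, int commands) {
  std::vector<std::thread> workers;
  for (int t = 0; t < threads; ++t)
    workers.emplace_back([commands] {
      for (int i = 0; i < commands; ++i) dispatch_command_sketch();
    });
  for (auto &w : workers) w.join();
}
```

The shorter the query, the larger the share of each command spent on these two operations. This explains why Point Select at high concurrency exposes the problem most clearly.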