Bug #79455 | Restore get_sched_indexer_t in 5.7 | ||
---|---|---|---|
Submitted: | 30 Nov 2015 11:35 | Modified: | 20 Nov 2016 8:25 |
Reporter: | Alexey Kopytov | Email Updates: | |
Status: | Verified | Impact on me: | |
Category: | MySQL Server: InnoDB storage engine | Severity: | S3 (Non-critical) |
Version: | 5.7 | OS: | Any |
Assigned to: | CPU Architecture: | ARM | |
Tags: | Contribution |
[30 Nov 2015 11:35]
Alexey Kopytov
[8 Feb 2016 17:10]
OCA Admin
Contribution submitted via Github - Bug #79455: Restore get_sched_indexer_t in 5.7 (*) Contribution by Alexey Kopytov (Github akopytov, mysql-server/pull/52#issuecomment-181393955): I confirm the code being submitted is offered under the terms of the OCA, and that I am authorized to contribute it.
Contribution: git_patch_58601311.txt (text/plain), 5.17 KiB.
[10 Nov 2016 5:59]
MySQL Verification Team
Hello Alexey, Thank you for the report and contribution. Thanks, Umesh
[13 Nov 2016 18:00]
Mark Callaghan
We borrowed the 5.6 version of this code for row counters in MyRocks. In 5.6 it was using the address of the value returned by pthread_self() mod 64 which always equals 0. I doubled QPS for sysbench read-only with --oltp-range-size=10000 after fixing that code to use an array per CPU. I agree with Alexey that for row counters we want an array entry per CPU, not random CPUs. I hope this code settles down and I wonder how many sysbench results for InnoDB are hurt by it today. http://smalldatum.blogspot.com/2016/10/make-myrocks-2x-less-slow.html
[13 Nov 2016 18:08]
Mark Callaghan
array entry per CPU, not array per CPU see https://github.com/facebook/mysql-5.6/commit/d83fe82c8f42023cda6a97021361d90bae39805c
[14 Nov 2016 7:37]
Vasil Dimov
Alexey, All this makes sense (restore the per-core counter where appropriate). I just wonder: In the commit message in the provided patch (http://bugs.mysql.com/file.php?id=23673&bug_id=79455) you mention: "This in itself does not introduce any measurable performance improvements, but is rather a groundwork for further changes." but in Bug#79454 you say: "I got a ~40% speedup in those benchmarks by changing the slot choosing policy to get_sched_indexer_t and modifying callers to use the default policy rather than override it with trx id." What do I miss?
[20 Nov 2016 8:25]
Alexey Kopytov
Vasil, Yes, on my hardware (a double-socket ARM server with 96 cores) simply moving to per-core counters was only a part of the solution. I also had to increase the number of slots from 64 to 96. However, in further experiments I discovered that sched_getcpu() is expensive on ARM, unlike x86, where it is implemented via VDSO. So even though the per-core indexer improved scalability in certain workloads, it also decreased single-threaded performance for us. We are currently experimenting with other approaches to implement scalable "update frequently, read infrequently" counters. Using hardware-specific implementations (like Large System Extension instructions on ARM) looks promising. Anyway, I still think restoring get_sched_indexer_t is a good idea for x86 which is also proved by Mark's experiments. The indexer framework just need to be generalized to allow hardware-specific indexers. Having the number of slots calculated automatically on startup based on the number of available CPUs, rather than being hard-coded to 64 would also be nice.
[29 May 2018 16:07]
OCA Admin
Contribution submitted via Github - Bug #79455: Restore get_sched_indexer_t (*) Contribution by Alexey Kopytov (Github akopytov, mysql-server/pull/209#issuecomment-392573345): I confirm the code being submitted is offered under the terms of the OCA, and that I am authorized to contribute it.
Contribution: git_patch_190787766.txt (text/plain), 5.00 KiB.