Bug #101018 High contention on last_value in my_timer_microseconds
Submitted: 1 Oct 2020 9:28 Modified: 9 Oct 2020 15:09
Reporter: Georgy Kirichenko Email Updates:
Status: Closed Impact on me:
None 
Category:MySQL Server: Performance Schema Severity:S5 (Performance)
Version:8.0 OS:Any
Assigned to: Marc ALFF CPU Architecture:Any
Tags: Contribution, scalability

[1 Oct 2020 9:28] Georgy Kirichenko
Description:
There are two functions: my_timer_microseconds and my_timer_milliceconds in mysys/my_rdtsc.cc. Both functions save last seen value in a last_value static variable which is shared across all running threads. This trick was implemented because of ancient bug report about gettimeofday intermittent errors (https://bugs.mysql.com/bug.php?id=36819).

After Linux kernel source investigation I found that the only way in which gettimeofday can fail with tz=NULL passed is to use an invalid pointer as the first argument but this is definitely not our case.

Even if we assume that intermittent errors of gettimeofday are possible then this failures are handled quite inconsistently throughout the MySQL source. In some cases we run an endless loop until it succeeds, in some cases we just ignore failures, and in some cases we return "last seen" value. The latter is the most expensive method, because it involves updating a global variable.

The price to pay is visible on modern Linux where gettimeofday with tz=NULL may never fail, below are sysbench results on X86 and AARCH64 for vanilla and patched version of my_timer_microseconds where assumed that gettimeofday is reliable:

AARCH64:
|   test  | threads | vanilla     | patched    |  diff |
|:-------:|--------:|------------:|-----------:|------:|
| OLTP_PS |      64 |   791578.16 |  807313.49 | 1.99% |
| OLTP_PS |     128 |   986087.11 | 1007570.61 | 2.18% |
| OLTP_PS |     256 |   986725.61 | 1009114.64 | 2.27% |
| OLTP_RO |      64 |    30258.31 |   30336.54 | 0.26% |
| OLTP_RO |     128 |    34946.59 |   35087.35 | 0.40% |
| OLTP_RO |     256 |    34994.82 |   35150.73 | 0.45% |
| OLTP_RW |      64 |    20159.63 |   20201.69 | 0.21% |
| OLTP_RW |     128 |    18579.80 |   18681.46 | 0.55% |
| OLTP_RW |     256 |    19140.54 |   19244.04 | 0.54% |

X86:
|   test  | threads | vanilla     | patched    |  diff |
|:-------:|--------:|------------:|-----------:|------:|
| OLTP_PS |      64 |   551636.06 |  553277.90 | 0.30% |
| OLTP_PS |     128 |   574358.40 |  579725.62 | 0.93% |
| OLTP_PS |     256 |   559891.53 |  565747.08 | 1.05% |
| OLTP_RO |      64 |    18662.21 |   18783.30 | 0.65% |
| OLTP_RO |     128 |    18827.19 |   18948.55 | 0.64% |
| OLTP_RO |     256 |    18602.29 |   18741.15 | 0.75% |
| OLTP_RW |      64 |     9343.21 |    9380.53 | 0.40% |
| OLTP_RW |     128 |     9880.84 |    9927.12 | 0.47% |
| OLTP_RW |     256 |     9634.62 |    9666.60 | 0.33% |

The code in question is used only in Performance Schema. Even if gettimeofday() may fail, using the expensive "last seen value" method is not any different from any other ways to handle errors: it will only result in skewed statistics, which is not that important, given how unlikely this event is.

How to repeat:
Sysbench with PFS enabled
[1 Oct 2020 12:29] MySQL Verification Team
Hello Georgy Kirichenko,

Thank you for the report and contribution.

regards,
Umesh
[9 Oct 2020 15:09] Paul DuBois
Posted by developer:
 
Fixed in 8.0.23.

Performance overhead of timer code was reduced. This should be of most
benefit to workloads with high concurrency using the Performance Schema.
Thanks to Georgy Kirichenko for the contribution.