Bug #116531: Performance regressions from the diff that adds instant add and drop column
Submitted: 1 Nov 2024 17:02    Modified: 25 May 14:39
Reporter: Mark Callaghan
Status: No Feedback    Impact on me: None
Category: MySQL Server: InnoDB storage engine    Severity: S5 (Performance)
Version: 8.0    OS: Linux
Assigned to: MySQL Verification Team    CPU Architecture: x86

[1 Nov 2024 17:02] Mark Callaghan
Description:
The diff that adds instant add & drop column support in MySQL 8.0.29 also makes the update-index sysbench microbenchmark a lot slower: throughput drops by almost half in the worst case.

This is explained here:
https://smalldatum.blogspot.com/2024/11/too-many-performance-regressions-for.html

The diff is here:
https://github.com/mysql/mysql-server/commit/e8f422de

How to repeat:
I tested this on several servers. On two small PCs (AMD laptop-class CPU, 1 thread) there isn't a large regression. On two of my large servers (2-socket Intel, 48-core AMD EPYC) there is a small regression. On one of my large servers (32-core AMD EPYC) there is a huge regression. I have yet to figure out why this isn't a big problem on all of my large servers.

This occurs when I run sysbench; how I run sysbench is explained here:
https://smalldatum.blogspot.com/2023/03/how-i-run-sysbench.html

There are ~43 (or ~30) microbenchmarks when I run sysbench. The regression for this bug occurs in the update-index microbenchmark, but I did not try to reproduce it by only doing the load step followed by update-index. So I am not sure whether it will repeat in that case; hopefully it does, as that would save time.
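A minimal repro sketch along those lines, assuming the oltp_update_index.lua workload bundled with sysbench 1.x; the host, credentials, table count, table size, thread count, and run time below are placeholders rather than the exact values from the linked post:

    # Load the test tables once, then run only the update-index microbenchmark.
    sysbench oltp_update_index \
        --mysql-host=127.0.0.1 --mysql-user=sbtest --mysql-password=PASSWORD \
        --mysql-db=sbtest --tables=8 --table-size=10000000 \
        --threads=24 prepare

    sysbench oltp_update_index \
        --mysql-host=127.0.0.1 --mysql-user=sbtest --mysql-password=PASSWORD \
        --mysql-db=sbtest --tables=8 --table-size=10000000 \
        --threads=24 --time=600 --report-interval=10 run

If the regression repeats with this shortened sequence, it should be much cheaper to investigate than the full ~43-microbenchmark run.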

Suggested fix:
I don't have a suggestion yet. But from vmstat I see both more CPU overhead and more context switches, so I assume the problem is either mutex contention or memory-system contention.
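One way to narrow that down would be to compare InnoDB mutex/rwlock wait totals between 8.0.28 and 8.0.40 while update-index is running. A sketch, assuming performance_schema is available and the wait/synch instruments are enabled (they are not all on by default):

    # Top InnoDB synchronization waits accumulated since startup (or since the
    # summary table was last truncated).
    mysql -e "
      SELECT EVENT_NAME, COUNT_STAR, SUM_TIMER_WAIT
      FROM performance_schema.events_waits_summary_global_by_event_name
      WHERE EVENT_NAME LIKE 'wait/synch/%innodb%'
      ORDER BY SUM_TIMER_WAIT DESC
      LIMIT 10;"

    # CPU and context-switch rates over time, matching the vmstat observation above.
    vmstat 10

A build whose top waits shift toward a particular mutex after the instant add/drop column change would point at lock contention rather than a pure memory-system effect.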
[4 Nov 2024 10:11] MySQL Verification Team
Hi Mr. Callaghan,

Thank you for your bug report.

We have managed to repeat your performance regression.

Verified as reported.
[25 Apr 14:39] MySQL Verification Team
Hi Mark,

We've conducted extensive testing on our side and observed only minor regressions between MySQL 8.0.28 and 8.0.40.

Our tests were run on Intel systems with 48 cores (Hyper-Threading enabled) and 192 GB of RAM, as well as systems with 44 cores and 256 GB of RAM, including VMs. In all these environments, we did not encounter performance degradations comparable to what you reported on the "Ryzen Threadripper PRO 5975WX with 32-Cores".

Initially, we expedited the verification process after noticing some regressions on VMs. However, upon closer inspection, the numbers varied significantly from your report.

Here are some test results from the VM where we observed the most notable regression:

Environment:
MySQL-Oracle-Linux-8-x86_64-2025-02-06
VM.Standard.AMD.Generic – OCPUs: 64, Memory: 128 GB
Filesystem: ext4

8.0.28 Performance:
    24 threads: 105318.57, 100918.31, 103095.57, 106619.9
    40 threads: 101954.7, 101294.19, 112195.95, 113935.08

8.0.40 Performance:
    24 threads: 128229.52, 115669.21, 107692.39, 103426.61
    40 threads: 102865.89, 100332.93, 99822.98, 98677.37

Summary:
    8.0.40 outperforms 8.0.28 by 8.44% at 24 threads
    8.0.28 outperforms 8.0.40 by 6.44% at 40 threads

These results are significantly different from the ~50% regression you observed on your dell32 system. While we do see some performance degradation on VMs under high thread counts, we observed no such issues on bare-metal systems.

Regarding your test configuration, we noticed that you're not using a "production-grade" setup. Specifically, relaxed REDO fsync and binary logging settings can lead to data loss.
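For reference, a sketch of the durability settings we have in mind; the values below are the usual fully durable choices, not a claim about your exact configuration:

    # Fully durable redo and binlog settings (my.cnf equivalents:
    # innodb_flush_log_at_trx_commit=1 and sync_binlog=1).
    mysql -e "SET GLOBAL innodb_flush_log_at_trx_commit = 1;
              SET GLOBAL sync_binlog = 1;"

These settings add fsync overhead, so absolute throughput is expected to drop, but they make the comparison closer to a production deployment.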

It would be extremely helpful if you could retest on your dell32 system using a production configuration. Even better would be testing with the XFS filesystem and MySQL 8.0.42. EXT4 has shown regression issues in the past, as detailed here:
http://dimitrik.free.fr/blog/posts/mysql-80-perf-xfs-vs-ext4.html

Thank you for your continued work and patience. Looking forward to your findings.
[26 May 1:00] Bugs System
No feedback was provided for this bug for over a month, so it is
being suspended automatically. If you are able to provide the
information that was originally requested, please do so and change
the status of the bug back to "Open".