Bug #116203 Enhance MySQL 8.0 performance under low concurrency to outperform version 5.7.
Submitted: 23 Sep 2024 14:22
Modified: 25 Sep 2024 12:02
Reporter: Bin Wang (OCA)
Status: Verified
Category: MySQL Server: InnoDB storage engine
Severity: S4 (Feature request)
Version: 8.0
OS: Any
Assigned to:
CPU Architecture: Any
Tags: REDO Log optimization

[23 Sep 2024 14:22] Bin Wang
Description:
Through careful performance analysis using testing tools, we identified the root cause of MySQL 8.0's lower performance compared to MySQL 5.7. The issue lies in the redo log handling, where an inefficient wait process was used for low concurrency instead of a more efficient busy-wait process.
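
To make the distinction between a blocking wait and a busy-wait concrete, here is a minimal C++ sketch of a spin-then-block wait. It is not the InnoDB code and not the submitted patch; `FlushState`, `wait_for_flush`, `advance_flushed`, and the `max_spins` parameter are hypothetical names introduced only for illustration.

```cpp
#include <atomic>
#include <cassert>
#include <condition_variable>
#include <cstdint>
#include <mutex>
#include <thread>

// Hypothetical stand-in for "the redo log has been flushed up to this LSN".
struct FlushState {
  std::atomic<std::uint64_t> flushed_lsn{0};
  std::mutex mtx;
  std::condition_variable cv;
};

// Busy-wait for at most max_spins iterations, then fall back to a blocking
// wait on the condition variable. Returns true if the target was reached
// while spinning, false if the blocking path was taken.
bool wait_for_flush(FlushState &s, std::uint64_t target, unsigned max_spins) {
  for (unsigned i = 0; i < max_spins; ++i) {
    if (s.flushed_lsn.load(std::memory_order_acquire) >= target) {
      return true;  // cheap path: no sleep, no wake-up latency
    }
    std::this_thread::yield();
  }
  std::unique_lock<std::mutex> lock(s.mtx);
  s.cv.wait(lock, [&] {
    return s.flushed_lsn.load(std::memory_order_acquire) >= target;
  });
  return false;
}

// Flusher side: publish the new LSN under the mutex, then wake all waiters.
void advance_flushed(FlushState &s, std::uint64_t lsn) {
  {
    std::lock_guard<std::mutex> lock(s.mtx);
    s.flushed_lsn.store(lsn, std::memory_order_release);
  }
  s.cv.notify_all();
}
```

With few client threads, a waiter that spins briefly can often observe the new LSN without paying for a sleep and a wake-up, which is the latency this report is concerned with. Whether that holds on a given machine depends on the storage device and CPU load.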

Here is the performance comparison test result:
Concurrency  MySQL 5.7  MySQL 8.0.39  Optimized MySQL 8.0.39
          1       1407          1187                    2450
          2       2699          2155                    4009
          4       4857          3758                    7082
          8      10433          7320                   12819
         16      17121         12070                   25392
         32      34965         23335                   42809
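
For readers who want the relative differences rather than raw numbers, the following C++ sketch computes each 8.0.39 column as a ratio of the 5.7 column. The figures are copied from the table above; the report does not state their unit, so the code treats them as unitless throughput values. `Row`, `kResults`, `ratio`, and `print_ratios` are hypothetical names.

```cpp
#include <array>
#include <cassert>
#include <cstdio>

// One row of the comparison table; the unit is not stated in the report.
struct Row {
  int concurrency;
  double v57;      // MySQL 5.7
  double v80;      // MySQL 8.0.39
  double v80_opt;  // Optimized MySQL 8.0.39
};

constexpr std::array<Row, 6> kResults{{
    {1, 1407, 1187, 2450},
    {2, 2699, 2155, 4009},
    {4, 4857, 3758, 7082},
    {8, 10433, 7320, 12819},
    {16, 17121, 12070, 25392},
    {32, 34965, 23335, 42809},
}};

// Throughput of `value` relative to `baseline` (1.0 means equal).
constexpr double ratio(double value, double baseline) {
  return value / baseline;
}

// Print both 8.0.39 variants relative to 5.7 for every concurrency level.
void print_ratios() {
  for (const Row &r : kResults) {
    std::printf("%2d  8.0.39/5.7 = %.2f  optimized/5.7 = %.2f\n",
                r.concurrency, ratio(r.v80, r.v57), ratio(r.v80_opt, r.v57));
  }
}
```

At every concurrency level in the table, stock 8.0.39 is below 5.7 and the optimized build is above it.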

How to repeat:
In a low I/O environment using sysbench read-write tests, the performance difference is clearly visible.

Suggested fix:
Optimize redo log performance under low concurrency, as low concurrency is the norm and reducing latency is crucial. Having performance lower than 5.7 is unwise.
[23 Sep 2024 14:26] Bin Wang
If anyone is interested in using my program for comparison, you can visit the following link: https://github.com/advancedmysql/mysql-8.0.39.
[23 Sep 2024 14:58] MySQL Verification Team
Hi Mr. Wang,

Thank you so much for your feature request.

Since you are an OCA-registered user, we will gladly review your patch for optimised performance under low concurrency in the InnoDB storage engine.

However, we need a full patch from you, which would also include the conditions under which your algorithm would be applied. Namely, "low concurrency" is a very, very imprecise definition.

We are eagerly waiting for your patch that will improve our InnoDB performance.

Many thanks in advance ........
[23 Sep 2024 15:07] Bin Wang
I have added the patch, which is visible to developers. This performance issue is not unreproducible; it's a problem widely recognized in the MySQL community, where MySQL 8.0 is seen as having lower performance.
[23 Sep 2024 15:10] Bin Wang
MySQL should listen to its users. This issue has been reported by many users, persisting for five years, and the root cause was only recently discovered. Don't dismiss user-reported problems lightly.
[23 Sep 2024 15:19] MySQL Verification Team
Hi Mr. Wang,

We are listening carefully to our users.

However, as you are putting it yourself, this is a patch for reference only.

We will happily accept a patch that works under all low-concurrency conditions.

We also need a patch that would pass all of our very strict internal tests.

We would like to see from you a final patch and not only the one which is there for reference only.

Can you provide us with such a patch?
[23 Sep 2024 15:21] MySQL Verification Team
Dear Mr. Wang,

We do agree that your patch can be used for reference only ........

We do need a fully tested patch that would improve our performance under low concurrency on all types of storage devices.
[23 Sep 2024 15:22] MySQL Verification Team
Also, why address only the redo log and not include the undo logs as well?
[23 Sep 2024 15:25] Bin Wang
I initially wasn't going to provide the patch, as it's still not fully mature and has only been tested on my personal machine. However, the final optimization mechanism is the same: focusing on low concurrency rather than following an inefficient process. You claimed 'can't repeat' without the patch, so I provided it. Review the code and tests, and have Paweł Olchawa, the author, go through it—he knows best what needs to be changed.
[23 Sep 2024 15:30] Bin Wang
I'm not very familiar with redo logs either. My optimizations follow a logical approach, making minimal code changes that do not affect logical correctness, and then observing whether the results meet expectations. A comprehensive overhaul is not my optimization method; just like with group replication, optimizing through logic leads to fewer issues.
[23 Sep 2024 15:47] Bin Wang
If a perfect patch is needed, it may take a few months. The improved MySQL will be open-sourced for users to test, at which point there will be ample test results available. My personal resources are limited.
[24 Sep 2024 9:26] MySQL Verification Team
Hi Mr. Wang,

There is a big problem with provisional patches.

Applying them usually leads to dozens of regression bugs.

We care about our product and we do not want to spoil it by treating millions of our users to completely new bugs, all of which would stem from the provisional patch.

That is why we accept only proper patches.

We are sure that you are capable of providing one. We shall be very grateful in that case and your name would be mentioned in the Changelog.

Thank you in advance.
[24 Sep 2024 13:09] Bin Wang
What issues have arisen during regression testing? I only used InnoDB's regression test cases, and the regression test errors were similar before and after applying the patch.
[24 Sep 2024 13:11] Bin Wang
According to our testing of InnoDB's test cases, many of the test cases themselves have issues, regardless of whether the patch is applied.
[24 Sep 2024 14:17] MySQL Verification Team
Hi Mr. Wang,

Usually, regression bugs that pop up after patches that are not thoroughly designed are totally new bugs: bugs that did not exist until the patch was applied.
[24 Sep 2024 14:57] Bin Wang
From the original code comments and logic, there shouldn't be any issues unless the original program itself has problems or the regression test cases are not rigorous (as seen with numerous non-rigorous test cases in group replication). We need to know what regression testing issues were identified in the official tests.
[24 Sep 2024 15:13] Bin Wang
We analyze the first scenario:
"Don't spin because either cpu usage is too high or it's almost idle so no reason to bother."

Spinning when the CPU is idle should not matter; logically, it should be correct whether or not you choose to spin.

As for the second fix, the comments also indicate that correctness does not depend on this. 
"We might read older value, it just decides on spinning.  Correctness does not depend on this. Only local performance might depend on this but it's anyway heuristic and depends on average which by definition has lag. No reason to make extra barriers here."

Given these two areas where the program spins under low concurrency, I really can't think of what logical problems could arise.
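
The two quoted comments describe a decision about whether to spin, not about what result to return. A minimal C++ sketch of such a decision is shown below. It is an illustration only: `CpuSample`, `SpinPolicy`, `should_spin`, the field meanings, and all threshold values are assumptions, and the `spin_when_idle` flag is one possible reading of the proposal, not the content of the submitted patch.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>

// Hypothetical snapshot of CPU usage; the field meanings are assumptions.
struct CpuSample {
  double utime_abs;  // absolute user CPU consumption
  double utime_pct;  // user CPU consumption as a percentage of capacity
};

// Hypothetical thresholds; the values used by callers are illustrative.
struct SpinPolicy {
  double cpu_abs_lwm;              // below this the CPU counts as almost idle
  double cpu_pct_hwm;              // at or above this the CPU counts as too busy
  std::uint64_t avg_flush_hwm_us;  // spin only if flushes are this fast on average
  bool spin_when_idle;             // assumed change: allow spinning on an idle CPU
};

// Decide whether a waiter should busy-wait. The answer affects latency and
// CPU use only; the waiter reaches the same state either way.
bool should_spin(const CpuSample &cpu,
                 const std::atomic<std::uint64_t> &avg_flush_us,
                 const SpinPolicy &p) {
  if (cpu.utime_pct >= p.cpu_pct_hwm) {
    return false;  // CPU usage is too high
  }
  if (cpu.utime_abs < p.cpu_abs_lwm && !p.spin_when_idle) {
    return false;  // almost idle, and the policy says not to bother
  }
  // A relaxed load may observe a slightly stale average. That is acceptable
  // because the value only feeds a heuristic.
  return avg_flush_us.load(std::memory_order_relaxed) <= p.avg_flush_hwm_us;
}
```

Under these assumptions, flipping `spin_when_idle` changes only which waiting strategy is chosen on an idle machine, which is the point argued above.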
[24 Sep 2024 15:16] Bin Wang
Whether or not to spin should only be related to performance in the design; there shouldn't be any logical issues. Otherwise, there is a problem with the program itself.
[24 Sep 2024 15:20] MySQL Verification Team
Hi Mr. Wang,

Regression bugs are those that never appeared before, but do appear after some improvement (or bug fix) is made in the code. Code improvements produce far more regression bugs than bug fixes.

Hence, there is no such thing as testing for regression bugs, because we cannot test for bugs that will appear only in production.

We cannot test for the totally new bugs.

Such tests do not exist.

It is, however, our experience that only long, laborious design and coding leads to the fewest regression bugs.

Every week we get one or more bug reports that occurred due to some code change. In 80% of cases, it is due to some improvement, while the rest are because of old bug fixes.

I hope that we were clear this time.
[25 Sep 2024 12:01] MySQL Verification Team
Hi Mr. Wang,

We have concluded that this is a valid feature request.

Verified as a feature request.