| Bug #116738 | The DDL performance of 8.0.40 has not been restored to 8.0.26 | ||
|---|---|---|---|
| Submitted: | 21 Nov 2024 4:09 | Modified: | 8 May 10:23 | 
| Reporter: | Xizhe Zhang (OCA) | Email Updates: | |
| Status: | Verified | Impact on me: | |
| Category: | MySQL Server: DDL | Severity: | S5 (Performance) | 
| Version: | 8.0.40 | OS: | Any | 
| Assigned to: | CPU Architecture: | Any | |
   [21 Nov 2024 4:09]
   Xizhe Zhang        
  
 
   [29 Nov 2024 10:07]
   MySQL Verification Team        
  Hello Xizhe Zhang, Thank you for the report and feedback. regards, Umesh
   [7 May 7:38]
   MySQL Verification Team        
  Hello Xizhe Zhang, During our internal verification, we observed discrepancy in our test results when compared with that of provided in the report i.e. From your report: Time taken in 8.0.26: 74sec Time taken in 8.0.27: 116sec (56% loss compared to 8.0.26) Time taken in 8.0.40: 130sec (75% loss compared to 8.0.26) What we are seeing internally: Time taken in 8.0.26: 121.49sec Time taken in 8.0.27: 162.91sec (34% loss compared to 8.0.26) Time taken in 8.0.40: 151.74sec (25% loss compared to 8.0.26) We would like to request you to share below details from your verified environment to investigate further: - If is using custom builds such as source build then need exact cmake options used for the build - OS details including Memory, CPU information and file system details on which MySQL Server files residing. - MySQL Server configuration file in use( my.cnf or my.ini or <custom>.cnf whichever is in use) - Any custom InnoDB settings, especially buffer pool size, parallel read threads, and DDL buffer size Thank you. regards, Umesh
   [8 May 6:37]
   Xizhe Zhang        
  Hello, Verification Team!
The performance degradation ratio may be related to the quality of the CPU. I will get different ratios when running the test on a better machine too. I conducted a detailed analysis of this issue some time ago. The degradation ratio is not the key to this issue. The key is why there is such a significant degradation.
By comparing the performance of 8.0.26 and 8.0.27, I found multiple problems that caused performance degradation. In order to avoid making this bug report too lengthy, I will make a summary here and provide analysis and fix patch for each independent problem.
Let me first list a few key options:
For 8.0.27:
innodb_buffer_pool_size=64G # Enough to hold the entire table
innodb_parallel_read_threads=1 # Single-threaded scan clustered index
innodb_ddl_threads=1 # Single-threaded Sort and Build Indexes
innodb_ddl_buffer_size=1M # Default Value
For 8.0.26:
innodb_buffer_pool_size=64G # Enough to hold the entire table
innodb_sort_buffer_size=1M # Default Value
I wrote a small tool based on Perf Probe to count the time and number of times a function is executed. The following is the statistical result of a DDL execution process:
# 8.0.26:
                                             mean          std                min                max    count                sum
mysql_alter_table              77966049794.000000          nan 77966049794.000000 77966049794.000000 1.000000 77966049794.000000
row_merge_build_indexes        77944214368.000000          nan 77944214368.000000 77944214368.000000 1.000000 77944214368.000000
row_merge_read_clustered_index 42416340096.000000          nan 42416340096.000000 42416340096.000000 1.000000 42416340096.000000
row_merge_sort                 24878481077.000000          nan 24878481077.000000 24878481077.000000 1.000000 24878481077.000000
row_merge_insert_index_tuples   7489639265.000000          nan  7489639265.000000  7489639265.000000 1.000000  7489639265.000000
row_merge_file_destroy               51454.250000 96429.121519        2851.000000      196097.000000 4.000000      205817.000000
row_merge_write_redo                  5659.000000  3203.193719        3394.000000        7924.000000 2.000000       11318.000000
flush_observer                  3158910412.000000          nan  3158910412.000000  3158910412.000000 1.000000  3158910412.000000
# 8.0.27:
                                      mean  std                 min                 max    count                 sum
mysql_alter_table      121428803147.000000  nan 121428803147.000000 121428803147.000000 1.000000 121428803147.000000
scan_and_build_indexes 121404012955.000000  nan 121404012955.000000 121404012955.000000 1.000000 121404012955.000000
ddl_scan                51680853370.000000  nan  51680853370.000000  51680853370.000000 1.000000  51680853370.000000
ddl_load                69723129061.000000  nan  69723129061.000000  69723129061.000000 1.000000  69723129061.000000
create_merge_sort_tasks        9888.000000             nan        9888.000000        9888.000000 1.000000        9888.000000
merge_sort              51910233630.000000             nan 51910233630.000000 51910233630.000000 1.000000 51910233630.000000
btree_build             11547696552.000000             nan 11547696552.000000 11547696552.000000 1.000000 11547696552.000000
build_finish             3237135506.500000 29179669.957877  3216502364.000000  3257768649.000000 2.000000  6474271013.000000
The above results show that performance degradation occurred in multiple stages of the DDL execution process. I will provide analysis results in different bug reports and link to this bug report.
+----------------+--------------------------------+----------------------------+
|                |             8.0.26             |           8.0.27           |
+----------------+--------------------------------+----------------------------+
| Scan PK        | row_merge_read_clustered_index | ddl::Parallel_cursor::scan |
|                |              42.4s             |            51.7s           |
+----------------+--------------------------------+----------------------------+
| Merge Sort SK  |         row_merge_sort         |  ddl::Builder::merge_sort  |
|                |              24.9s             |             52s            |
+----------------+--------------------------------+----------------------------+
| Build SK       |  row_merge_insert_index_tuples |  ddl::Builder::btree_build |
|                |              7.5s              |            11.5s           |
+----------------+--------------------------------+----------------------------+
| Flush Observer |      FlushObserver::flush      |    ddl::Builder::finish    |
|                |              3.16s             |            6.4s            |
+----------------+--------------------------------+----------------------------+
| Overall        |               78s              |           121.4s           |
+----------------+--------------------------------+----------------------------+
 
   [8 May 9:48]
   MySQL Verification Team        
  Hello Xizhe Zhang, Thank you for the details, will share this with development and get back to you if anything further needed. regards, Umesh
   [8 May 10:01]
   MySQL Verification Team        
  Related - Bug #118130, Bug #118133
   [8 May 10:23]
   Xizhe Zhang        
  Hello, Verification Team, Thanks for your reply! I have created 3 new Bug Reports, which introduce my analysis process. I will briefly explain it here: -> Analysis for 'Merge Sort' (Bug #118130) -> Analysis for 'Scan PK' (Bug #118133) -> 'Build SK' will benefit from the solution of Problem 2 in Bug #118133. The remaining gap I think comes from the use of 'std::priority_queue' (mentioned in Bug #118130). Since the performance degradation caused by this part is relatively small (less than 1s), I did not submit another bug report. -> Analysis for 'Flush Observer' (Bug #118136)

