Bug #118133 | 8.0.27 DDL performance degradation - Scan PK | ||
---|---|---|---|
Submitted: | 8 May 8:40 | Modified: | 8 May 9:59 |
Reporter: | Xizhe Zhang (OCA) | Email Updates: | |
Status: | Verified | Impact on me: | |
Category: | MySQL Server: DDL | Severity: | S5 (Performance) |
Version: | 8.0 | OS: | Any |
Assigned to: | CPU Architecture: | Any | |
Tags: | Contribution |
[8 May 8:40]
Xizhe Zhang
[8 May 8:53]
Xizhe Zhang
Problem 1 After fixing problems introduced in Bug #118130, The execution time of Merge Sort in 8.0.27 and 8.0.26 is unified, but there is still a significant difference in the execution time of Scan PK. After another statistical analysis, it was found that the 'row_merge_buf_sort' function in 8.0.26 and the 'Key_sort_buffer::serialize' function in 8.0.27 were both called 258 times, indicating that 258 secondary index blocks were serialized. However, the 'Key_sort_buffer::sort' function in 8.0.27 was called 1575 times, far exceeding the number of 'Key_sort_buffer::serialize'; in contrast, the 'row_merge_buf_sort' function in 8.0.26 was only called 258 times: 8.0.26: mean std min max count sum row_merge_read_clustered_index 41382720376.000000 nan 41382720376.000000 41382720376.000000 1.000000 41382720376.000000 row_merge_buf_sort 49462751.410853 3693709.059359 21740823.000000 80207136.000000 258.000000 12761389864.000000 row_merge_buf_write 7441377.236434 3627104.992159 3059449.000000 25288639.000000 258.000000 1919875327.000000 8.0.27: mean std min max count sum ddl_scan 55336084466.000000 nan 55336084466.000000 55336084466.000000 1.000000 55336084466.000000 traverse 621370612.955056 54452729.661022 248581529.000000 752919344.000000 89.000000 55301984553.000000 traverse_recs 621359160.280899 54452767.352574 248569598.000000 752907768.000000 89.000000 55300965265.000000 key_buffer_sort 10760531.013968 16842892.023687 192286.000000 73626663.000000 1575.000000 16947836347.000000 key_buffer_serialize 10777398.511628 2450400.547623 4572400.000000 21308115.000000 258.000000 2780568816.000000 Why does 8.0.27 have so many sorts? After reading the code, I found that there is only one entry for the 'Key_sort_buffer::sort' function. The high number of calls is because 8.0.27 also sorts the PK that does not need to be sorted (also mentioned in Bug# 115499). After skipping 'key_buffer_sort' for the PK that does not need to be sorted, the total scan time is reduced to 50s, and the time (12.5s) and number (258) of sorting are aligned with 8.0.26: mean std min max count sum traverse_recs 562168393.044944 50876427.781493 225171406.000000 730896629.000000 89.000000 50032986981.000000 key_buffer_sort 48573790.631783 3800736.935213 20883825.000000 92327768.000000 258.000000 12532037983.000000 key_buffer_serialize 10410354.674419 2072421.397148 4344254.000000 23451838.000000 258.000000 2685871506.000000
[8 May 9:15]
Xizhe Zhang
Hello, Verification Team, this is my patch based on 8.0.42 for Problem 1. (*) I confirm the code being submitted is offered under the terms of the OCA, and that I am authorized to contribute it.
Contribution: ddl_scan_pk_problem_1.diff (application/octet-stream, text), 1.50 KiB.
[8 May 9:29]
Xizhe Zhang
Problem 2 Perf result
Attachment: 8.0.26.png (image/png, text), 226.63 KiB.
[8 May 9:30]
Xizhe Zhang
Problem 2 Perf result
Attachment: 8.0.27.png (image/png, text), 319.95 KiB.
[8 May 9:31]
Xizhe Zhang
Problem 2 After solving Problem 1, I conducted several statistical analyses and found that there was a significant difference in the time it took to build a B+ tree for PK (row_merge_insert_index_tuples vs insert_direct): 8.0.26: mean std min max count sum row_merge_read_clustered_index 41437766901.000000 nan 41437766901.000000 41437766901.000000 1.000000 41437766901.000000 row_merge_insert_index_tuples 10919227.215642 6169444.760858 598532.000000 116644576.000000 1317.000000 14380622243.000000 row_merge_buf_sort 50043518.081395 4001944.237946 21731955.000000 80215808.000000 258.000000 12911227665.000000 row_merge_buf_write 7018112.724806 1714566.844070 3112412.000000 28037227.000000 258.000000 1810673083.000000 row_merge_write 2952340.263566 542853.877538 2327920.000000 6181485.000000 258.000000 761703788.000000 8.0.27: mean std min max count sum traverse_recs 49859621441.000000 nan 49859621441.000000 49859621441.000000 1.000000 49859621441.000000 insert_direct 14322413.707669 5810964.895471 1000409.000000 119392136.000000 1317.000000 18862618853.000000 key_buffer_sort 48072937.794574 2435081.179260 20931494.000000 63897465.000000 258.000000 12402817951.000000 key_buffer_serialize 10241637.790698 1945422.596186 4510307.000000 21974752.000000 258.000000 2642342550.000000 Using perf to show the CPU hotspot distribution of the two versions (see the two pictures uploaded earlier), I found that the 'insert_direct' function is hotter in 8.0.27. After checking the source code, I found that the original 'ut_ad(page_validate(m_page, m_index))' in 8.0.26 has become 'ut_a(page_validate(m_page, m_index))' in 8.0.27, which makes each page is validated in Release mode.
[8 May 9:34]
Xizhe Zhang
Hello, Verification Team, this is my patch based on 8.0.42 for Problem 2. (*) I confirm the code being submitted is offered under the terms of the OCA, and that I am authorized to contribute it.
Contribution: ddl_scan_pk_problem_2.diff (application/octet-stream, text), 546 bytes.
[8 May 9:38]
Xizhe Zhang
Problem 3 Perf result
Attachment: 8.0.26 problem 3.png (image/png, text), 145.83 KiB.
[8 May 9:38]
Xizhe Zhang
Problem 3 Perf result
Attachment: 8.0.27 problem 3.png (image/png, text), 192.56 KiB.
[8 May 9:44]
Xizhe Zhang
Problem 3 After fixing Problem 2, the time consumption of 'insert_direct' was reduced to 14.6s, which is consistent with 8.0.26, and the scanning time consumption was reduced to 45.34s. There is still a small gap between 8.0.27 and 8.0.26. By checking the Perf sampling results (see the two pictures uploaded earlier), it is found that 8.0.27 has two calls to 'rec_get_offsets', while 8.0.26 has only one. After analyzing the code, it is found that the extra 'rec_get_offsets' call in 8.0.27 is redundant. The previous results can be used to delete the redundant 'rec_get_offsets' call, and the scanning time consumption is reduced to 44.1s. I think the remaining gap is caused by the inability to inline many functions, and the impact is relatively small.
[8 May 9:51]
Xizhe Zhang
Hello, Verification Team, this is my patch based on 8.0.42 for Problem 3. (*) I confirm the code being submitted is offered under the terms of the OCA, and that I am authorized to contribute it.
Contribution: ddl_scan_pk_problem_3.diff (application/octet-stream, text), 643 bytes.
[8 May 9:59]
MySQL Verification Team
Hello Xizhe Zhang , Thank you for the report and contribution. regards, Umesh