| Bug #114468 | Inconsistent table performance: fresh data import vs after optimization | | |
|---|---|---|---|
| Submitted: | 24 Mar 2024 21:51 | Modified: | 25 Mar 2024 16:44 |
| Reporter: | Przemyslaw Malkowski | Email Updates: | |
| Status: | Verified | Impact on me: | |
| Category: | MySQL Server: InnoDB storage engine | Severity: | S5 (Performance) |
| Version: | 8.3.0, 8.0.36 | OS: | Any |
| Assigned to: | | CPU Architecture: | Any |
[24 Mar 2024 21:51]
Przemyslaw Malkowski
[25 Mar 2024 12:32]
MySQL Verification Team
Hello Przemyslaw, Thank you for the report and feedback. I'm currently analyzing the issue; it would be really helpful if you could please share the innodb_buffer_pool_size setting, if it is not the default. Thank you. Sincerely, Umesh
[25 Mar 2024 13:49]
Przemyslaw Malkowski
Hello Umesh, the buffer pool was at the default 128M, so much smaller than the table size. The issue is more visible when disk reads are involved :)
[25 Mar 2024 14:10]
MySQL Verification Team
Hello Przemyslaw, Thank you for the details. I'm able to reproduce the issue with the provided steps. Sincerely, Umesh
[25 Mar 2024 15:38]
MySQL Verification Team
Test results - 8.0.36
Attachment: 114468_8.0.36.results (application/octet-stream, text), 23.94 KiB.
[25 Mar 2024 15:45]
MySQL Verification Team
Hello Przemyslaw, One more question please: are you using an SSD or HDD? Thank you. Sorry for asking so much :) Sincerely, Umesh
[25 Mar 2024 16:44]
Przemyslaw Malkowski
No worries :) I tested on both an NVMe flash disk and a RAID array made of HDDs (spinning drives). In both cases, the impact was similar.
[9 Apr 2024 4:22]
huahua xu
Hi all: By analyzing the test result data, I think the performance difference mainly comes from linear read-ahead. In addition, InnoDB does not specify the expected page number for a newly allocated page when rebuilding the B-tree index for bulk load, which causes an interesting phenomenon. Add special debugging code:

```
 dberr_t Page_load::init() noexcept {
   page_t *new_page;
   page_no_t new_page_no;
   buf_block_t *new_block;
@@ -393,6 +393,8 @@ dberr_t Page_load::init() noexcept {
   m_cur_rec = page_get_infimum_rec(new_page);
   ut_ad(m_is_comp == page_is_comp(new_page));
   m_free_space = page_get_free_space_of_empty(m_is_comp);
+
+  printf("level: %ld, page_no: %u \n", m_level, m_page_no);
```

output:

```
...
level: 0, page_no: 116
level: 0, page_no: 128
...
level: 0, page_no: 172
level: 0, page_no: 192
...
level: 0, page_no: 64440
level: 0, page_no: 73792
...
level: 0, page_no: 160316
level: 0, page_no: 160832
...
```

This is very bad: the leaf pages end up being accessed out of order, which makes linear read-ahead ineffective. The code below causes the discontinuity in the page numbers, and is also the reason for the increase in the physical size of the tablespace (~12.5%).

```
page_no_t fseg_alloc_page_no(fil_space_t *space, const page_size_t &page_size,
                             fseg_inode_t *seg_inode, page_no_t hint,
                             byte direction,
                             mtr_t *mtr IF_DEBUG(, bool has_done_reservation)) {
  ...
  if (direction != FSP_NO_DIR &&
      reserved - used < reserved * (fseg_reserve_pct / 100) &&
      used >= FSEG_FRAG_LIMIT) {
    ret_descr = fseg_alloc_free_extent(seg_inode, space_id, page_size, mtr);

    if (ret_descr) {
      /* 3. We take any free extent (which was already assigned above
      ===============================================================
      in the if-condition to ret_descr) and take the lowest or
      ========================================================
      highest page in it, depending on the direction
      ==============================================*/
      ret_page = xdes_get_offset(ret_descr);

      if (direction == FSP_DOWN) {
        ret_page += FSP_EXTENT_SIZE - 1;
      } else if (xdes_get_state(ret_descr, mtr) == XDES_FSEG_FRAG) {
        ret_page += xdes_find_bit(ret_descr, XDES_FREE_BIT, true, 0, mtr);
      }
      ut_ad(!has_done_reservation || ret_page != FIL_NULL);
      /*-----------------------------------------------------------*/
    }
  }
  ...
}
```
[9 Apr 2024 6:25]
huahua xu
The patch specifies the expected page number for a newly allocated page when rebuilding the B-tree index for bulk load.
Attachment: hint_page_no_for_page_load.patch (application/octet-stream, text), 2.83 KiB.
[10 Apr 2024 8:34]
MySQL Verification Team
Hello huahua xu, Thank you for your contribution. Please make sure to re-send the patch via the "contribution" tab; otherwise we will not be able to accept it. Thank you! Sincerely, Umesh
[19 May 12:00]
MySQL Verification Team
Hello huahua xu, This is just a follow-up message in continuation of my earlier message. Thank you very much for your patch contribution, we appreciate it! Please ensure to re-send the patch via "contribution" tab. Thank you! Sincerely, Umesh
[27 May 12:59]
MySQL Verification Team
Hello huahua xu, Thank you very much for your patch contribution, we appreciate it! In order for us to continue the process of reviewing your contribution to MySQL, please send us a signed copy of the Oracle Contributor Agreement (OCA) as outlined in https://oca.opensource.oracle.com Signing an OCA needs to be done only once and it's valid for all other Oracle governed Open Source projects as well. Getting a signed/approved OCA on file will help us facilitate your contribution - this one, and others in the future. Please let me know, if you have any questions. Sincerely, Umesh