Bug #70420 | are these uses of BTR_MODIFY_TREE needed? | ||
---|---|---|---|
Submitted: | 25 Sep 2013 16:15 | Modified: | 8 Jan 2014 7:00 |
Reporter: | Mark Callaghan | Email Updates: | |
Status: | Closed | Impact on me: | |
Category: | MySQL Server: InnoDB storage engine | Severity: | S5 (Performance) |
Version: | 5.7.2 | OS: | Any |
Assigned to: | CPU Architecture: | Any |
[25 Sep 2013 16:15]
Mark Callaghan
[25 Sep 2013 16:27]
Mark Callaghan
Other uses: #3 - row_ins_index_entry_big_rec_func (insert off page column) btr_cur_search_to_nth_level(index, 0, entry, PAGE_CUR_LE, BTR_MODIFY_TREE, &cursor, 0, file, line, &mtr); #4 - ibuf_insert err = ibuf_insert_low(BTR_MODIFY_PREV, op, no_counter, entry, entry_size, index, space, zip_size, page_no, thr); if (err == DB_FAIL) { err = ibuf_insert_low(BTR_MODIFY_TREE | BTR_LATCH_FOR_INSERT, op, no_counter, entry, entry_size, index, space, zip_size, page_no, thr); #5 - ibuf_delete_rec if (!ibuf_restore_pos(space, page_no, search_tuple, BTR_MODIFY_TREE | BTR_LATCH_FOR_DELETE, pcur, mtr)) { #6 - row_ins_clust_index_entry DBUG_RETURN(row_ins_clust_index_entry_low( 0, BTR_MODIFY_TREE, index, n_uniq, entry, n_ext, thr)); #7 - row_ins_sec_index_entry err = row_ins_sec_index_entry_low( 0, BTR_MODIFY_TREE, index, offsets_heap, heap, entry, 0, thr); #8 - row_purge_remove_clust_if_poss if (row_purge_remove_clust_if_poss_low( node, BTR_MODIFY_TREE | BTR_LATCH_FOR_DELETE)) { #9 - row_undo_ins_remove_clust_rec success = btr_pcur_restore_position( BTR_MODIFY_TREE | BTR_LATCH_FOR_DELETE, &node->pcur, &mtr); #10 - row_undo_ins_remove_sec /* Try then pessimistic descent to the B-tree */ retry: err = row_undo_ins_remove_sec_low( BTR_MODIFY_TREE | BTR_LATCH_FOR_DELETE, index, entry); #11 - row_undo_mod_clust err = row_undo_mod_clust_low( node, &offsets, &offsets_heap, heap, &rebuilt_old_pk, thr, &mtr, BTR_MODIFY_TREE); ... err = row_undo_mod_remove_clust_low( node, thr, &mtr, BTR_MODIFY_TREE | BTR_LATCH_FOR_DELETE); #12 - row_undo_mod_del_mark_or_remove_sec err = row_undo_mod_del_mark_or_remove_sec_low(node, thr, index, entry, BTR_MODIFY_TREE | BTR_LATCH_FOR_DELETE); #13 - row_undo_mod_del_mark_sec err = row_undo_mod_del_unmark_sec_and_undo_update( BTR_MODIFY_TREE, thr, index, entry); #14 - row_undo_mod_upd_exist_sec err = row_undo_mod_del_unmark_sec_and_undo_update( BTR_MODIFY_TREE, thr, index, entry); And a few more...
[27 Sep 2013 5:58]
Yasufumi Kinoshita
All is needed. Basically, "page allocate/free is needed" == BTR_MODIFY_TREE. If BTR_MODIFY_LEAF is failed, BTR_MODIFY_TREE is needed. If needs to write BLOB, BTR_MODIFY_TREE is needed. If intended to block the other BTR_MODIFY_TREE, BTR_MODIFY_TREE is needed. From 5.7.2, BTR_MODIFY_TREE uses new type of rw_lock 'SX' state for index->lock. | S|SX| X --+--+--+-- S| o| o| x --+--+--+-- SX| o| x| x --+--+--+-- X| x| x| x So, the index->lock fix can be described as "BTR_MODIFY_TREE became not to block other operations for index if for not related pages. But each BTR_MODIFY_TREE conflict each other still." If we need more concurrency (relax index->lock more or shorter holding index->lock), we need to change data file format about segment management for index at first. Even if you simply eliminate index->lock SX, you will meet deadlock or same level conflict at root page block.
[27 Sep 2013 12:50]
MySQL Verification Team
My opinion was that BTR_MODIFY_TREE was required in all listed cases. I needed a confirmation and I am thanking Yasufumi for providing it directly into the bug. Based on some other discussions, SX lock can still get some smaller changes, not yet fully defined. I am not setting this bug to "Not a bug", because you could find some other non-listed examples and there could be some other (smaller) changes to the code.
[2 Oct 2013 17:31]
Mark Callaghan
Yasufumi - thanks for the prompt and useful response
[3 Oct 2013 6:47]
Marko Mäkelä
Mark, First of all, as Yasufumi noted, BLOB allocation needs BTR_MODIFY_TREE. I think that it is a bad design decision to use the two "file segments" in the B-tree root page to keep track of BLOB allocations. If we moved it elsewhere, we should be able relax the latching constraints somewhat. But, this is not something we can do any time soon, because it would require a new table format. Second, while reviewing Yasufumi's work, I was having some trouble understanding the way how he split the index->lock. So, I asked Yasufumi to draw some diagrams and write down examples. Based on them, I formulated the following explanation, which I hope will help you understand this deeper. [begin quote from WL#6326] The acquisition of node pointer page latches is protected by index->lock latch. Before WL#6326, index->lock protected all node pointer pages either in S or X mode, and no individual block->lock were acquired on node pointer pages. After WL#6326, node pointer pages are protected by individual block->lock S-latch or X-latch. The acquisition of node pointer page latches is covered by index->lock, for preventing deadlocks. (0) Definition: B-tree level. (0.1) The leaf pages of the B-tree are at level 0. (0.2) The parent of a page at level L has level L+1. (The level of the root page is equal to the tree height.) (0.3) The B-tree lock (index->lock) has level=HEIGHT+1, i.e., it is the parent of the root page. There are 3 locking modes of index->lock: (1) When holding index->lock S-latch: (1.1) Page latches must be acquired in descending order of tree level. (1.2) When acquiring the first node pointer page latch at a given B-tree level, we must hold the parent latch (at level+1). (1.3) If there is a node pointer page already latched on the same level, we can only acquire its right sibling page latch. (1.4) Node pointer page latches must be released in child-to-parent order. (This intends to prevent deadlocks with the index->lock SX-lock case.) (1.4.1) A node pointer page latch at level L can only be released when not holding any latches at level<L (children), or (1.4.2) All node pointer page latches can be released in one go, without acquiring any latches in between. (1.5) [derived from (1.1), (1.2)] The first node pointer page latch acquired must be the root page latch. (2) When holding index->lock SX-latch: Rules (1.2) and (1.3) are relaxed and merged into (2.2), and rule (1.4) is removed. This means that we can skip page latch acquisition at some tree levels, and can acquire latches in a less restricted order. (2.1) [same as (1.1)]: Page latches must be acquired in descending order of tree level. (2.2) When acquiring a node pointer page latch at level L, we must hold the left sibling page latch (at level L) or some ancestor latch (at level>L). (2.3) [derived from (2.1), (2.2)] The first node pointer page latch acquired can be any node pointer page. (3) When holding index->lock X-latch: We can acquire node pointer page latches in any order. NOTE: WL#6326 does not affect the latching rules of leaf pages: Reads need index->lock S-latch for the node pointer traversal. Once the leaf level is reached, we can release index->lock (and with the WL#6326 changes, all node pointer latches). Once we are on a leaf page, we can safely acquire the right sibling leaf page latch, release the old page latch, and do a left-to-right index scan. Modifications on a single leaf page (BTR_MODIFY_LEAF) are covered by index->lock S-latch throughout the operation. Operations involving page splits or merges (BTR_MODIFY_TREE) and page allocations are covered by index->lock X-latch.
[9 Oct 2013 12:43]
MySQL Verification Team
I am setting a status based on the discussions so far. I will set it to "Verified" as soon as InnoDB team comes with it's own, internal plans for the improvements in the area.
[8 Jan 2014 7:00]
Sunny Bains
I think Yasufumi has answered the question and therefore I would like to close this bug. Improving concurrent access to the indexes is an ongoing task and I would like that to be tracked separately.