Bug #58212 Possible deadlock in change buffer in debug builds
Submitted: 15 Nov 2010 21:20 Modified: 26 Jan 2011 7:12
Reporter: Marko Mäkelä Email Updates:
Status: Closed Impact on me:
None 
Category:MySQL Server: InnoDB storage engine Severity:S3 (Non-critical)
Version:5.5+ OS:Any
Assigned to: Marko Mäkelä CPU Architecture:Any
Tags: delete buffering, innodb_change_buffering

[15 Nov 2010 21:20] Marko Mäkelä
Description:
While testing something else, I hit this assertion in UNIV_SYNC_DEBUG code:

InnoDB: sync levels should be > 350 but a level is 350
-------------
RW-LATCH INFO
RW-LATCH: 0xb67b8f74 
Locked: thread 2984745840 file /home/marko/innobase/dev/mysql-5.5/storage/innobase/ibuf/ibuf0ibuf.c line 3458  X-LOCK

This is triggered in the following call stack:

#0  ut_dbg_assertion_failed (expr=0x85be098 "srv_is_being_started || sync_thread_levels_g(array, SYNC_IBUF_BITMAP, TRUE)", file=0x85bde08 "/home/marko/innobase/dev/mysql-5.5/storage/innobase/sync/sync0sync.c", line=1239) at /home/marko/innobase/dev/mysql-5.5/storage/innobase/ut/ut0dbg.c:55
#1  0x083b724d in sync_thread_add_level (latch=0xb67d019c, level=350) at /home/marko/innobase/dev/mysql-5.5/storage/innobase/sync/sync0sync.c:1237
#2  0x084211f3 in ibuf_bitmap_get_map_page_func (space=<value optimized out>, page_no=<value optimized out>, zip_size=<value optimized out>, line=1143, mtr=0xb1e75384, file=<value optimized out>) at /home/marko/innobase/dev/mysql-5.5/storage/innobase/ibuf/ibuf0ibuf.c:819
#3  0x084219d0 in ibuf_page (space=0, zip_size=0, page_no=485, mtr=0x0) at /home/marko/innobase/dev/mysql-5.5/storage/innobase/ibuf/ibuf0ibuf.c:1143
#4  0x083e9b44 in buf_page_get_gen (space=0, zip_size=0, offset=485, rw_latch=2, guess=0x0, mode=10, file=0x85d47a0 "/home/marko/innobase/dev/mysql-5.5/storage/innobase/ibuf/ibuf0ibuf.c", line=3213, mtr=0xb1e75d58) at /home/marko/innobase/dev/mysql-5.5/storage/innobase/buf/buf0buf.c:2753
#5  0x08424071 in ibuf_set_entry_counter (mode=<value optimized out>, op=<value optimized out>, no_counter=<value optimized out>, entry=0x89c7250, entry_size=539, index=0x89cbc10, space=1, zip_size=0, page_no=2436, thr=0x89b1c38) at /home/marko/innobase/dev/mysql-5.5/storage/innobase/ibuf/ibuf0ibuf.c:3211
#6  ibuf_insert_low (mode=<value optimized out>, op=<value optimized out>, no_counter=<value optimized out>, entry=0x89c7250, entry_size=539, index=0x89cbc10, space=1, zip_size=0, page_no=2436, thr=0x89b1c38) at /home/marko/innobase/dev/mysql-5.5/storage/innobase/ibuf/ibuf0ibuf.c:3494
#7  0x0842476f in ibuf_insert (op=IBUF_OP_INSERT, entry=0x89c7250, index=0x89cbc10, space=1, zip_size=0, page_no=2436, thr=0x89b1c38) at /home/marko/innobase/dev/mysql-5.5/storage/innobase/ibuf/ibuf0ibuf.c:3728
#8  0x083d8abe in btr_cur_search_to_nth_level (index=0x89cbc10, level=0, tuple=0x89c7250, mode=4, latch_mode=<value optimized out>, cursor=0xb1e770d8, has_search_latch=0, file=0x85e31d8 "/home/marko/innobase/dev/mysql-5.5/storage/innobase/row/row0ins.c", line=2002, mtr=0xb1e76a60) at /home/marko/innobase/dev/mysql-5.5/storage/innobase/btr/btr0cur.c:590
#9  0x0845d74c in row_ins_index_entry_low (mode=<value optimized out>, index=0x89cbc10, entry=0x89c7250, n_ext=0, thr=0x89b1c38) at /home/marko/innobase/dev/mysql-5.5/storage/innobase/row/row0ins.c:2000

In frame #6, the bitmap_mtr in ibuf_insert_low(BTR_MODIFY_PREV) holds the page latch on the insert buffer bitmap page that covers the user index page (space=1,page_no=2436). In frame #5, (space=0,prev_page_no=485) is the insert buffer tree page that may contain previously buffered changes for the user page (1,2436). While attempting to fetch that page, buf_page_get_gen() would assert that ibuf_page() holds. Unfortunately, ibuf_page() would attempt to buffer-fix a further insert buffer bitmap page, that one related to the insert buffer tree page, and latching multiple bitmap pages at a time is only allowed while holding the ibuf_bitmap_mutex.

How to repeat:
Run ibtest3, ibtest3a, ibtest3b on a server compiled with UNIV_DEBUG and UNIV_SYNC_DEBUG for a few minutes.

Suggested fix:
Acquire the ibuf_mutex in this code path in debug builds, or adjust the check in some way. The ibuf_page() check could be made redundant by moving the insert buffer to its own tablespace, not part of the system tablespace.
[16 Nov 2010 7:47] Marko Mäkelä
Here is a deadlock scenario:

Connection 1 attempts to buffer an insert for (space=0,page=1234). It latches the insert buffer bitmap page for that user page, which would be at (0,1).

Connection 2 attempts to buffer an insert for (space=0,page=17000). It latches the bitmap page (0,16385) and checks for previously buffered entries in the insert buffer. While doing that, it loads the insert buffer page (space=0,page=123) and the debug assertion ibuf_page() will check that (0,123) really belongs to the insert buffer by visiting the bitmap page (0,1).

Then, Connection 1 checks for previously buffered entries for (0,1234) in the insert buffer. The buffered changes happen to be at (0,18000) in the insert buffer. So, the ut_ad(ibuf_page()) in buf_page_get_gen() will attempt to acquire a page latch on the insert buffer bitmap page that corresponds to the insert buffer page (0,18000), which would be at (0,16385).

Now, Connection 1 is holding a page latch on (0,1) and Connection 2 is holding a page latch on (0,16385), and both are waiting on each other's latches.

The question is how to fix this. We could acquire the ibuf_bitmap_mutex in debug builds, so that the debug check would be deadlock-free, but that could hide a class of bugs that are present in non-debug builds. We could remove the debug check of ibuf_page() altogether, or we could make a 'best effort', returning an OK status when we are unable to latch the page.
[16 Nov 2010 8:22] Marko Mäkelä
Workaround: use innodb_file_per_table=1 or innodb_change_buffering=none when testing the debug server in MySQL 5.5 or later.

It is possible that this bug exists in all versions of MySQL/InnoDB, after all. It is not necessarily related to the delete buffering that was introduced in MySQL 5.5.
[16 Nov 2010 12:22] Bugs System
A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/124024
[16 Nov 2010 12:22] Bugs System
A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/124025
[30 Nov 2010 14:09] Bugs System
A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/125506
[30 Nov 2010 14:09] Bugs System
A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/125507
[1 Dec 2010 13:10] Bugs System
A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/125653
[1 Dec 2010 13:21] Marko Mäkelä
Pushed to mysql-5.5-innodb and mysql-trunk-innodb.
[1 Dec 2010 13:41] Bugs System
A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/125662
[22 Dec 2010 21:31] Bugs System
Pushed into mysql-trunk 5.6.1 (revid:alexander.nozdrin@oracle.com-20101222212842-y0t3ibtd32wd9qaw) (version source revid:alexander.nozdrin@oracle.com-20101222212842-y0t3ibtd32wd9qaw) (merge vers: 5.6.1) (pib:24)
[8 Jan 2011 15:09] Bugs System
Pushed into mysql-5.5 5.5.9 (revid:vasil.dimov@oracle.com-20110108150508-gpanhz48z8069qot) (version source revid:vasil.dimov@oracle.com-20110108150048-b1y9m8xe72hay0ch) (merge vers: 5.5.9) (pib:24)
[26 Jan 2011 7:35] Marko Mäkelä
Bug #59733 was filed for similar bugs in non-debug builds.