Bug #63270 Dirty read during crash recovery
Submitted: 15 Nov 2011 20:39 Modified: 31 Jan 2012 21:27
Reporter: Nizameddin Ordulu Email Updates:
Status: Closed Impact on me:
None 
Category:MySQL Server: InnoDB Plugin storage engine Severity:S6 (Debug Builds)
Version: OS:Any
Assigned to: CPU Architecture:Any
Tags: crash-recovery, debug, innodb

[15 Nov 2011 20:39] Nizameddin Ordulu
Description:
While UNIV_DEBUG is enabled, I got the following assertion failure inside mtr_commit():

185  /* This is a dirty read, for debugging. */
186  ut_ad(!recv_no_log_write); <-- FAILING ASSERTION
187  write_log = mtr->modifications && mtr->n_log_recs;
188
189  if (write_log) {
190    mtr_log_reserve_and_write(mtr);
191  }

The stack trace for this thread looks like this:

(gdb) bt
#0  0x0000003355030155 in raise () from /lib64/libc.so.6
#1  0x0000003355031bf0 in abort () from /lib64/libc.so.6
#2  0x000000000049e8c1 in mtr_commit (mtr=0x42a71b10) at mtr/mtr0mtr.c:186
#3  0x000000000049383c in recv_recover_page_func (just_read_in=1, block=0x2aaaaadcb928) at log/log0recv.c:1860
#4  0x0000000000435a03 in buf_page_io_complete (bpage=0x2aaaaadcb928) at buf/buf0buf.c:3371
#5  0x000000000046a9fc in fil_aio_wait (segment=2) at fil/fil0fil.c:5781
#6  0x00000000004fb6a7 in io_handler_thread (arg=0x8d6190) at srv/srv0start.c:481
#7  0x0000003355c062f7 in start_thread () from /lib64/libpthread.so.0
#8  0x00000033550d1e3d in clone () from /lib64/libc.so.6

recv_no_log_write is set TRUE inside recv_apply_hashed_log_recs(). The code around that sets recv_no_log_write to TRUE:

2028  if (!allow_ibuf) {
2029    /* Flush all the file pages to disk and invalidate them in
2030    the buffer pool */
2031
2032    ut_d(recv_no_log_write = TRUE);
2033    mutex_exit(&(recv_sys->mutex));
2034    mutex_exit(&(log_sys->mutex));
2035
2036    n_pages = buf_flush_batch(BUF_FLUSH_LIST, ULINT_MAX,
2037            IB_ULONGLONG_MAX);
2038    ut_a(n_pages != ULINT_UNDEFINED);
2039
2040    buf_flush_wait_batch_end(BUF_FLUSH_LIST);
2041
2042    buf_pool_invalidate();
2043
2044    mutex_enter(&(log_sys->mutex));
2045    mutex_enter(&(recv_sys->mutex));
2046    ut_d(recv_no_log_write = FALSE);
2047
2048    recv_no_ibuf_operations = FALSE;
2049  }

The stack trace for the thread that set recv_no_log_write to TRUE looks like this:

(gdb) bt
#0  0x000000000051a09e in ut_fold_binary (str=0x2aaab0b00f1e "", len=16338) at /data/users/nizamordulu/mysql/xtra-build/storage/innodb_plugin/include/ut0rnd.ic:221
#1  0x000000000042ea1e in buf_calc_page_new_checksum (page=0x2aaab0b00000 "") at buf/buf0buf.c:392
#2  0x000000000043a8ee in buf_flush_init_for_writing (page=0x2aaab0b00000 "", page_zip_=0x0, newest_lsn=13252547113609) at buf/buf0flu.c:944
#3  0x000000000043ac10 in buf_flush_write_block_low (bpage=0x2aaaaadb7220) at buf/buf0flu.c:1032
#4  0x000000000043b06f in buf_flush_page (bpage=0x2aaaaadb7220, flush_type=BUF_FLUSH_LIST) at buf/buf0flu.c:1163
#5  0x000000000043b2aa in buf_flush_try_neighbors (space=0, offset=120929, flush_type=BUF_FLUSH_LIST, flush_neighbors=1) at buf/buf0flu.c:1246
#6  0x000000000043b6df in buf_flush_batch (flush_type=BUF_FLUSH_LIST, min_n=18446744073709551614, lsn_limit=18446744073709551615) at buf/buf0flu.c:1392
#7  0x0000000000493dda in recv_apply_hashed_log_recs (allow_ibuf=0) at log/log0recv.c:2036
#8  0x00000000004952c3 in recv_scan_log_recs (available_memory=96468992, store_to_hash=1, buf=0x2aaab11f3200 "\006̯,\002", len=65536, start_lsn=13252547204608, contiguous_lsn=0x7fffa25d2540, group_scanned_lsn=0x7fffa25d2548,
    callback=0) at log/log0recv.c:2977
#9  0x00000000004953b4 in recv_group_scan_log_recs (group=0x1c441fb8, contiguous_lsn=0x7fffa25d2540, group_scanned_lsn=0x7fffa25d2548) at log/log0recv.c:3020
#10 0x00000000004958a8 in recv_recovery_from_checkpoint_start_func (min_flushed_lsn=13190913257809, max_flushed_lsn=13190913257809) at log/log0recv.c:3293
#11 0x00000000004fcefa in innobase_start_or_create_for_mysql () at srv/srv0start.c:1566
#12 0x0000000000407cf9 in innodb_init () at xtrabackup.c:2245
#13 0x000000000040d05d in xtrabackup_prepare_func () at xtrabackup.c:6046
#14 0x000000000040f9b6 in main (argc=0, argv=0x1b670270) at xtrabackup.c:6975

Is this a race condition? The mutexes recv_sys->mutex and log_sys->mutex are unlocked inside recv_apply_hashed_log_recs() after setting recv_no_log_write to TRUE. Does this mean that another thread may read pages that this thread is about to flush? It seems to be the case here.

How to repeat:
read the description.

Suggested fix:
no suggested fix.
[31 Jan 2012 21:27] Nizameddin Ordulu
I think this problem was caused by another bug, which I fixed later.