MySQL Bugs: #42648: Maria hang in read

Bug #42648	Maria hang in read_block() on recovery
Submitted:	6 Feb 2009 14:09	Modified:	3 Mar 2009 8:35
Reporter:	Philip Stoev	Email Updates:
Status:	Can't repeat	Impact on me:	None
Category:	MySQL Server: Maria storage engine	Severity:	S1 (Critical)
Version:	6.0	OS:	Any
Assigned to:	Guilhem Bichot	CPU Architecture:	Any

Description:
When recovering after a crash on a table containing indexes on many long fields, maria hangs with the recovery thread in this position:

#0  0x000000315b00b309 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
#1  0x0000000000c4d981 in safe_cond_wait (cond=0x22475e8, mp=0x1d82600, file=0xe7b361 "ma_pagecache.c", line=2669) at thr_mutex.c:424
#2  0x0000000000b7ac2f in read_block (pagecache=0x1d82540, block=0x2da19f8, primary=0 '\0') at ma_pagecache.c:2668
#3  0x0000000000b7c229 in pagecache_read (pagecache=0x1d82540, file=0x33cb958, pageno=51, level=0, buff=0x0, type=PAGECACHE_PLAIN_PAGE,
    lock=PAGECACHE_LOCK_WRITE, page_link=0x7fff47e33f80) at ma_pagecache.c:3312
#4  0x0000000000bbcc71 in _ma_apply_redo_insert_row_head_or_tail (info=0x33cb568, lsn=4299035750, page_type=2, new_page=0 '\0', header=0x340efda "3",
    data=0x340efe0 "", data_length=1023) at ma_blockrec.c:6054
#5  0x0000000000c06c59 in exec_REDO_LOGREC_REDO_INSERT_ROW_TAIL (rec=0x7fff47e360f0) at ma_recovery.c:1390
#6  0x0000000000c04f80 in display_and_apply_record (log_desc=0x1d83010, rec=0x7fff47e360f0) at ma_recovery.c:504
#7  0x0000000000c093bb in run_redo_phase (lsn=4294975495, apply=MARIA_LOG_APPLY) at ma_recovery.c:2449
#8  0x0000000000c048f9 in maria_apply_log (from_lsn=4294975495, apply=MARIA_LOG_APPLY, trace_file=0x30d8dd0, should_run_undo_phase=1 '\001',
    skip_DDLs_arg=1 '\001', take_checkpoints=1 '\001', warnings_count=0x7fff47e38ac0) at ma_recovery.c:310
#9  0x0000000000c04698 in maria_recovery_from_log () at ma_recovery.c:213
#10 0x0000000000b814eb in ha_maria_init (p=0x2d85f08) at ha_maria.cc:3052
#11 0x0000000000818e8a in ha_initialize_handlerton (plugin=0x2435760) at handler.cc:450
#12 0x00000000008cf4b4 in plugin_initialize (plugin=0x2435760) at sql_plugin.cc:1008
#13 0x00000000008d2ba6 in plugin_init (argc=0x1318760, argv=0x23a6f70, flags=0) at sql_plugin.cc:1217
#14 0x00000000006c4c52 in init_server_components () at mysqld.cc:4132
#15 0x00000000006c5742 in main (argc=11, argv=0x7fff47e39278) at mysqld.cc:4643

How to repeat:
A test case will be provided shortly.

YY file

Attachment: bug42648.yy (application/octet-stream, text), 884 bytes.

ZZ file for this bug 42648

Attachment: bug42648.zz (text/plain), 382 bytes.

To reproduce with the RQG.

$ perl runall.pl \
  --basedir=/build/bzr/6.0-maria/ \
  --engine=Maria \
  --grammar=conf/bug42648.yy \
  --gendata=conf/bug42648.zz \
  --reporter=Recovery \
  --mysqld=--skip-falcon

This will quickly cause a crash due to an UTF32  bug (filed separately). Then, recovery will hang forever.

This hang afflicts a wide range of workloads, including situations where Maria tables are not used at all.

tested on Linux 32 bit, latest mysql-test-extra-6.0, latest 6.0-maria, debug build, no hang (tried several times, with and without --mem). But we fixed nothing related to such hang I believe.
So, I'm puzzled. Is there any machine where the problem happens, that I can log into?

I am afraid I have not seen it in a while. It was 100% reproducible at the time the bug was filed.

If you are still interested in debugging it, please use an older Maria tree. If not, I will re-open the bug if I see this again.

I was able to repeat the hang by using the old 6.0-maria of:
revision-id:guilhem@mysql.com-20090129200110-yxugmqmjqcwdiey3
This bug was fixed by the change to pagecache_unlock_by_link() made in revision:
monty@mysql.com-20081227020516-bmta8shmtz0hqfhc
and which reached 6.0-maria on 2009-02-13.
This is sure, because when I take the datadir at the time of crash, and let the old 6.0-maria do the recovery, it always hangs in read_block() but if I apply only the change made by the above revision it succeeds.
The per-file description for the change, in the revision, is "Mark page as read when we do a write of a full page. This fixes a bug when we got an error during read and then used direct write to page to update it".
I assume the mentioned "error during read" could happen if Recovery finds in the log:
- first a record which says to create page 2
- then a record which says to create page 1
Applying the first record will create page 2 in memory (in page cache, not on disk), and increase share->state.state.data_file_length.
Thus applying the second record (_ma_apply_redo_insert_row_head_or_tail()) will try to read the page from the page cache (because data_file_length is bigger than page 1, it is possible that page already exists); it does not find it, neither in cache nor on disk (file is still empty on disk) and so page cache says HA_ERR_FILE_TOO_SHORT and automatically creates the page in memory (that is, provides caller with a buffer to fill); then recovery fills this page, then unlocks it (pagecache_unlock_by_link()); at that moment it needs to say that the page is read (PCBLOCK_READ) otherwise a future reader (find_block()) will consider the page is PAGE_TO_BE_READ and thus read_block() will wait forever.
In other words: when page cache automatically creates a missing page, it must mark it as read (because, page is ready in the block's buffer now, does not need to be read from the disk).