MySQL Bugs: #39446: Maria deadlock in pagecache

Bug #39446	Maria deadlock in pagecache_read
Submitted:	15 Sep 2008 8:49	Modified:	28 Oct 2008 12:07
Reporter:	Philip Stoev	Email Updates:
Status:	Duplicate	Impact on me:	None
Category:	MySQL Server: Maria storage engine	Severity:	S1 (Critical)
Version:	6.0-bzr	OS:	Any
Assigned to:		CPU Architecture:	Any

Description:
When executing an select/insert/update/delete scenario, maria deadlocked with all threads waiting on this thread:

#0  0x0000003ba880a4a6 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
#1  0x0000000000a0af9b in pagecache_read (pagecache=0x19d02c0, file=0x7762310, pageno=0, level=0, buff=0x2aaac42820d0 "П\2179д╙*",
    type=PAGECACHE_PLAIN_PAGE, lock=PAGECACHE_LOCK_LEFT_UNLOCKED, page_link=0x4c6f1838) at ma_pagecache.c:2653
#2  0x0000000000a356fa in _ma_scan_block_record (info=0x2aaac43ab730, record=0x2aaac439d208 "Ъ", record_pos=0, skip_deleted=<value optimized out>)
    at ma_blockrec.c:5239
#3  0x0000000000a0f4cd in ha_maria::rnd_next (this=0x2aaac439d018, buf=0x2aaac439d208 "Ъ") at ha_maria.cc:1990
#4  0x000000000072e6d1 in rr_sequential (info=0x2aaac404fbf8) at records.cc:385
#5  0x00000000006adcf9 in sub_select (join=0x7860f50, join_tab=0x2aaac404fb70, end_of_records=5) at sql_select.cc:13706
#6  0x00000000006ae22d in do_select (join=0x7860f50, fields=0x7871158, table=0x0, procedure=0x0) at sql_select.cc:13453
#7  0x00000000006c0e7d in JOIN::exec (this=0x7860f50) at sql_select.cc:2827
#8  0x00000000006bccfe in mysql_select (thd=0x2aaab81e4cb0, rref_pointer_array=0x7871238, tables=<value optimized out>, wild_num=<value optimized out>,
    fields=<value optimized out>, conds=0x79f6a70, og_num=0, order=0x0, group=0x0, having=0x0, proc_param=0x0, select_options=0, result=0x7860c28,
    unit=0x7870bf0, select_lex=0x7871050) at sql_select.cc:3017
#9  0x00000000006c276c in handle_select (thd=0x2aaab81e4cb0, lex=0x7870b50, result=0x7860c28, setup_tables_done_option=0) at sql_select.cc:300
#10 0x0000000000651267 in execute_sqlcom_select (thd=0x2aaab81e4cb0, all_tables=0x79f6368) at sql_parse.cc:4962
#11 0x0000000000657c09 in mysql_execute_command (thd=0x2aaab81e4cb0) at sql_parse.cc:2167
#12 0x000000000079974f in mysql_open_cursor (thd=0x2aaab81e4cb0, flags=<value optimized out>, result=<value optimized out>, pcursor=0x7baa508)
    at sql_cursor.cc:171
#13 0x00000000007a65dd in sp_cursor::open (this=<value optimized out>, thd=0xfffffffffffffffc) at sp_rcontext.cc:478
#14 0x000000000079d90b in sp_instr_copen::exec_core (this=0x2aaac4528320, thd=0x0, nextp=0x4c6f3328) at sp_head.cc:3486
#15 0x00000000007a2257 in sp_lex_keeper::reset_lex_and_exec_core (this=0x79f6c98, thd=0x2aaab81e4cb0, nextp=0x4c6f3328, open_tables=false,
    instr=0x2aaac4528320) at sp_head.cc:2721
#16 0x00000000007a243e in sp_instr_copen::execute (this=<value optimized out>, thd=0x2aaab81e4cb0, nextp=<value optimized out>) at sp_head.cc:3459
#17 0x00000000007a05b2 in sp_head::execute (this=0x7731838, thd=0x2aaab81e4cb0) at sp_head.cc:1238
#18 0x00000000007a0c8b in sp_head::execute_procedure (this=0x7731838, thd=0x2aaab81e4cb0, args=0x2aaab81e70b8) at sp_head.cc:1969
#19 0x00000000006571a4 in mysql_execute_command (thd=0x2aaab81e4cb0) at sql_parse.cc:4303
#20 0x0000000000659e5f in mysql_parse (thd=0x2aaab81e4cb0, inBuf=0x7b00b28 "call slct_tb0_logs()", length=20, found_semicolon=0x4c6f50a0)
    at sql_parse.cc:5932
#21 0x000000000065a9de in dispatch_command (command=COM_QUERY, thd=0x2aaab81e4cb0, packet=0x2aaab8220521 "call slct_tb0_logs()",
    packet_length=<value optimized out>) at sql_parse.cc:1134
#22 0x000000000064dee2 in handle_one_connection (arg=<value optimized out>) at sql_connect.cc:1153
#23 0x0000003ba88062f7 in start_thread () from /lib64/libpthread.so.0
#24 0x0000003ba80ce85d in clone () from /lib64/libc.so.6

How to repeat:
If this is repeatable, a test case will be provided.

Suggested fix:
It may be the right time to implement either a run-time deadlock detector which prints stalled threads, or a tool that processes lock/unlock trace information from a test run and produces a list of loops in the lock graph. Falcon and Innodb have such functionality. If Maria also has something similar, please advise on how to enable it.

Hi Philip. Regarding the deadlock detector, Monty has this on his table. But it can be something else than a deadlock. For example, BUG#39210 is not about a deadlock but a thread which never woken up from a wait, though it should. Not all infinite waits are deadlocks. By the way, thanks for the daily bug reports!

To fix hangs we need a repeatable test case;  A stack trace will very seldom be enough to find and fix the problem.  Stack traces for all threads may be of some help but we would greatly prefer to be able to run something that will crash eventually.

Regarding the deadlock handler;  I plan to shortly finish a detector of wrong mutex usage (using mutex in different order) but this is not likely to help fixing this problem.

This bug is now repeatable on pushbuild2 with the test maria_bulk_insert:

see 

http://clustra.norway.sun.com/~bteam/pb2/web.py?action=archive_download&archive_id=103915&...

Please let me know if a better test case is needed and I will try to provide.

It is the same bug with bug#40308. As far as fic commit marked with bug number 40308 I put this one as duplicate and will move statuses to that bug.