Bug #37823 Innodb hang at row/row0vers.c line 443
Submitted: 2 Jul 2008 20:10 Modified: 5 Jun 2009 18:15
Reporter: Philip Stoev Email Updates:
Status: Can't repeat Impact on me:
None 
Category:MySQL Server: InnoDB storage engine Severity:S1 (Critical)
Version:6.0 OS:Any
Assigned to: Philip Stoev CPU Architecture:Any

[2 Jul 2008 20:10] Philip Stoev
Description:
When executing a relatively simple mix of DML queries, the server hanged and Innodb reported as follows:

InnoDB: Warning: a long semaphore wait:
--Thread 2825309072 has waited at row/row0vers.c line 443 for 241.00 seconds the semaphore:
S-lock on RW-latch at 0xadb83878 created in file trx/trx0purge.c line 212
number of readers 4, waiters flag 0
Last time read locked in file row/row0vers.c line 443
Last time write locked in file trx/trx0purge.c line 1043
InnoDB: ###### Starts InnoDB Monitor for 30 secs to print diagnostic info:
InnoDB: Pending preads 0, pwrites 0
InnoDB: ###### Diagnostic info printed to the standard error stream

and so forth until:

InnoDB: Error: semaphore wait has lasted > 600 seconds
InnoDB: We intentionally crash the server, because it appears to be hung.
080702 18:47:57  InnoDB: Assertion failure in thread 2851875728 in file srv/srv0srv.c line 2131
InnoDB: We intentionally generate a memory trap.

How to repeat:
A core file is available, thread stack dump will be attached shortly.
[2 Jul 2008 20:14] Philip Stoev
Thread stacks for bug 7823

Attachment: bug37823.threads (text/plain), 52.30 KiB.

[2 Jul 2008 21:54] Ken Jacobs
Do you have the test case scripts, Phillip?
[3 Jul 2008 16:19] Philip Stoev
Ken Jacobs:

I am afraid I tried reproducing this situation however was unable to. The test framework that arrived to this deadlock is unfortunately not 100% deterministic.

Can this bug be fixed based on core file and thread backtraces alone?
[7 Jul 2008 16:31] Heikki Tuuri
Philip,

please attach the entire .err log.

There are several threads waiting for memory allocation mutex(es). I wonder if they, too, are hung.

This thread might be in an infinite loop:

Thread 13 (process 9970):
#0  0x08564762 in trx_undo_rec_get_pars (undo_rec=0xa8718608 "\025s\034\036\017", type=0xa8926770, cmpl_info=0xa8926744, updated_extern=0xa8926740, 
    undo_no=0xa8926768, table_id=0xa8926760) at trx/trx0rec.c:277
#1  0x08564c75 in trx_undo_prev_version_build (index_rec=0xad605a56 "\200", index_mtr=0xa8926abc, rec=0xa870b2fe "\200", index=0xadb8e268, 
    offsets=0xa8926928, heap=0xa87185c8, old_vers=0xa89267e4) at trx/trx0rec.c:1332
#2  0x085563d2 in row_vers_build_for_consistent_read (rec=0xad605a56 "\200", mtr=0xa8926abc, index=0xadb8e268, offsets=0xa8926924, view=0xadb9b268, 
    offset_heap=0xa8926ab8, in_heap=0xa186da0, old_vers=0xa8926920) at row/row0vers.c:482
#3  0x0854ce9d in row_sel_build_prev_vers_for_mysql (read_view=0xadb9b268, clust_index=0xadb8e268, prebuilt=0xadbb1c68, rec=0xad605a56 "\200", 
    offsets=0xa8926924, offset_heap=0xa8926ab8, old_vers=0xa8926920, mtr=0xa8926abc) at row/row0sel.c:2774
#4  0x0854e978 in row_search_for_mysql (buf=0xa019cf0 "\020\024", mode=1, prebuilt=0xadbb1c68, match_mode=0, direction=1) at row/row0sel.c:4111

The .err log will tell what has happened.

Regards,

Heikki
[7 Jul 2008 16:41] Philip Stoev
master.err for bug 37823

Attachment: bug37823.err (application/octet-stream, text), 15.50 KiB.

[14 Aug 2008 13:49] Heikki Tuuri
Unfortunately, also the diagnostic thread was hung, and the .err log does not contain any interesting data :(.

We need more .err logs.
[16 Dec 2008 14:14] Axel Schwenke
Heikki,

I just reported bug #41506 which looks similar (other version but probably the same latch). In bug รค41506 we have a full error log.
[5 Jun 2009 18:15] Philip Stoev
Unfortunately this situation happened only once and no further debug information is available. The existing information was not sufficient to fix the issue.