Bug #46975 Thaw crashes with error SerialLog::findWindowGivenOffset -- can't find window
Submitted: 28 Aug 2009 13:36 Modified: 26 May 2010 17:46
Reporter: Olav Sandstå Email Updates:
Status: Unsupported Impact on me:
None 
Category:MySQL Server: Falcon storage engine Severity:S2 (Serious)
Version:6.0.12-alpha OS:Any
Assigned to: Kevin Lewis CPU Architecture:Any

[28 Aug 2009 13:36] Olav Sandstå
Description:
When running the Random Query Generator test falcon_backlog is sometimes fails with the following failure:

SerialLog::findWindowGivenOffset -- can't find window
Serial Log Windows:
  Window #465- blocks 872:904 len 1047816, in-use 0, useCount 1, buffer (nil), v
irtoff 487587840
  Window #466- blocks 905:917 len 1047126, in-use 0, useCount 0, buffer (nil), v
irtoff 488636416
  Window #467- blocks 918:918 len 1047791, in-use 0, useCount 0, buffer (nil), v
irtoff 489684992

<--- cut --->

5200, virtoff 598736896
  Window #572- blocks 1098:1110 len 1048158, in-use 0, useCount 2, buffer 0xaa86
5200, virtoff 599785472
  Window #573- blocks 1111:1111 len 8362, in-use 1, useCount 0, buffer 0xa976520
0, virtoff 600834048
[Falcon] Error: assertion (false) failed at line 947 in file SerialLog.cpp

The call stack from the crashing thread looks like:

#7  0x0856a5b0 in Error::debugBreak () at Error.cpp:94
#8  0x0856a62c in Error::error (
    string=0x89f30d0 "assertion (%s) failed at line %d in file %s\n")
    at Error.cpp:71
#9  0x0856a6d1 in Error::assertionFailed (text=0x8a08759 "false", 
    fileName=0x8a085b2 "SerialLog.cpp", line=947) at Error.cpp:78
#10 0x0862abfd in SerialLog::findWindowGivenOffset (this=0xb7314160, 
    offset=483187822) at SerialLog.cpp:947
#11 0x0861e5a0 in SRLUpdateRecords::thaw (this=0x9ec9c3d8, record=0x9b174ae8, 
    bytesReallocated=0x9ec9ca54) at SRLUpdateRecords.cpp:75
#12 0x0859b464 in Transaction::thaw (this=0xb70bf438, record=0x9b174ae8, 
    bytesReallocated=0x9ec9ca54) at Transaction.cpp:575
#13 0x086006e9 in RecordVersion::thaw (this=0x9b174ae8)
    at RecordVersion.cpp:429
#14 0x08600406 in RecordVersion::hasRecord (this=0x9b174ae8)
    at RecordVersion.cpp:562
#15 0x0858d57f in Table::fetchForUpdate (this=0xb73352c8, 
    transaction=0xb713e7a8, source=0x9b174ae8, usingIndex=true)
    at Table.cpp:3595
#16 0x0857787a in StorageDatabase::nextIndexed (this=0xb6ff0158, 
    storageTable=0x9bd57a00, recordBitmap=0xb70025b8, recordNumber=54170, 
    lockForUpdate=true) at StorageDatabase.cpp:449
#17 0x0857e06d in StorageTable::nextIndexed (this=0x9bd57a00, recordNumber=0, 
    lockForUpdate=true) at StorageTable.cpp:190
#18 0x0855dbe4 in StorageInterface::index_next (this=0xa7c2e98, 
    buf=0xa88ccd8 "ÿÿ") at ha_falcon.cpp:1871
#19 0x0855ca60 in StorageInterface::read_range_first (this=0xa7c2e98, 
    start_key=0xa7c2f38, end_key=0xa7c2f48, eq_range_arg=true, sorted=true)
    at ha_falcon.cpp:1772
#20 0x084067ed in handler::multi_range_read_next (this=0xa7c2e98, 
    range_info=0x9ec9cd00) at handler.cc:4351
#21 0x0855d8c0 in StorageInterface::multi_range_read_next (this=0xa7c2e98, 
    rangeInfo=0x9ec9cd00) at ha_falcon.cpp:2000
#22 0x083e82cf in QUICK_RANGE_SELECT::get_next (this=0xa872c40)
    at opt_range.cc:8579
#23 0x083fffb5 in rr_quick (info=0x9ec9ced8) at records.cc:322
#24 0x08390701 in mysql_update (thd=0xa823478, table_list=0xa835020, 
    fields=@0xa82485c, values=@0xa824a90, conds=0xa8357d0, order_num=0, 
    order=0x0, limit=18446744073709551615, handle_duplicates=DUP_ERROR, 
    ignore=false) at sql_update.cc:584
#25 0x082d652d in mysql_execute_command (thd=0xa823478) at sql_parse.cc:3142
#26 0x082dc74f in mysql_parse (thd=0xa823478, 
    inBuf=0xa834df8 "UPDATE `table65536_falcon_int_autoinc` AS X SET `char_255_latin1_not_null` = 'xxdkiugrzoisrxgzccjuhsoqljhntazrccvfqaxbvx' WHERE `pk` = 54170", length=140, found_semicolon=0x9ec9de80) at sql_parse.cc:5979
#27 0x082dd8d1 in dispatch_command (command=COM_QUERY, thd=0xa823478, 
    packet=0xa828f39 "UPDATE `table65536_falcon_int_autoinc` AS X SET `char_255_latin1_not_null` = 'xxdkiugrzoisrxgzccjuhsoqljhntazrccvfqaxbvx' WHERE `pk` = 54170", packet_length=140) at sql_parse.cc:1064
#28 0x082deb62 in do_command (thd=0xa823478) at sql_parse.cc:746
#29 0x082cb507 in handle_one_connection (arg=0xa823478) at sql_connect.cc:1146
#30 0x0089a45b in start_thread () from /lib/libpthread.so.0
#31 0x007f1c4e in clone () from /lib/libc.so.6

How to repeat:
Seen a few times when running the RQG test falcon_backlog on the latest source version from mysql-6.0-falcon-team tree.
[28 Aug 2009 16:25] Kevin Lewis
This is a new code path that showed an existing inconsistency.  SerialLog::findWindowGivenOffset() was written to assert if the given virtualOffset was not found.  But it was called assuming that it could fail.  The reason that it never failed before was that RecordVersion::thaw only tried to thaw from the serial log if Transaction::writePending was false.  

But the following recent change by me for Bug#46377 changed the priority of where we try to thaw by eliminating the writePending check.  The idea is that if there is a virtualOffset in the record, always try to thaw from the serial log.  This is the right approach since what is in the page is only one version of the record, but all versions are in the serial log.

So the correct fix is to delete the assert and allow SerialLog::findWindowGivenOffset() to just return NULL if the window is no longer found.  And all callers should handle that NULL, as SRLUpdateRecords::thaw() would have done if the assert was not there.

This is the change that exposed the bug;

=== modified file 'storage/falcon/RecordVersion.cpp'
--- a/storage/falcon/RecordVersion.cpp	2009-07-14 14:09:58 +0000
+++ b/storage/falcon/RecordVersion.cpp	2009-08-20 14:59:55 +0000
@@ -419,17 +419,17 @@ char *RecordVersion::thaw()
 	if (recordData != recordIsChilled)
 		return recordData;
 
-	// First, try to thaw from the serial log. If transaction->writePending is 
-	// true, then the record data can be restored from the serial log. If writePending
-	// is false, then the record data has been written to the data pages.
-	
-	Transaction *trans = findTransaction();
-	if (trans)
+	// First, try to thaw from the serial log if there is a virtual offset.
+
+	if (virtualOffset)
 		{
-		if (trans->writePending && virtualOffset)
+		Transaction *trans = findTransaction();
+		if (trans)
+			{
 			recordData = trans->thaw(this, &bytesReallocated);
 
-		trans->release();
+			trans->release();
+			}
 		}
[28 Aug 2009 17:10] Bugs System
A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/81902

2775 Kevin Lewis	2009-08-28
      Bug#46975 - SerialLog::findWindowGivenOffset() is expected by its callers to return NULL if the virtualOffset is not within the current portion of the serial log.  But instead, it asserts that the woindow could not be read.  This assert needs to be deleted.
      
      In addition, if this happens for a deferredIndex, SRLUpdateIndex::thaw() should issue a FATAL message instead of continuing with an unthawed index.