Bug #42131 falcon_backlog test crashes in Record::getEncodedValue
Submitted: 15 Jan 2009 13:08 Modified: 26 May 2010 17:46
Reporter: Olav Sandstå Email Updates:
Status: Unsupported Impact on me:
None 
Category:MySQL Server: Falcon storage engine Severity:S2 (Serious)
Version:6.0.10-alpha OS:Linux
Assigned to: Christopher Powers CPU Architecture:Any
Tags: F_CHILL THAW, pb2
Triage: Triaged: D1 (Critical)

[15 Jan 2009 13:08] Olav Sandstå
Description:
Running the randon-query-generator test callesfalcon_backlog sometimes leads to Falcon crashing with the following error written in the log:

[Falcon] Error: assertion (vector[index] < size) failed at line 668 in file Record.cpp

The call stack looks like this:

Program terminated with signal 6, Aborted.
#0  0x004c7402 in __kernel_vsyscall ()
#0  0x004c7402 in __kernel_vsyscall ()
#1  0x0089f067 in pthread_kill () from /lib/libpthread.so.0
#2  0x08813092 in my_write_core (sig=6) at stacktrace.c:309
#3  0x082b9e4c in handle_segfault (sig=6) at mysqld.cc:2672
#4  <signal handler called>
#5  0x004c7402 in __kernel_vsyscall ()
#6  0x008a2181 in raise () from /lib/libpthread.so.0
#7  0x0854db98 in Error::debugBreak () at Error.cpp:94
#8  0x0854dc14 in Error::error (
    string=0x89c4fd8 "assertion (%s) failed at line %d in file %s\n")
    at Error.cpp:71
#9  0x0854dcb9 in Error::assertionFailed (
    text=0x89d5670 "vector[index] < size", fileName=0x89d55f1 "Record.cpp", 
    line=668) at Error.cpp:78
#10 0x085e3d25 in Record::getEncodedValue (this=0xa63dc980, fieldId=5, 
    value=0xa878bbb8) at Record.cpp:668
#11 0x085e4bd9 in Record::getRawValue (this=0xa63dc980, fieldId=5, 
    value=0xa878bbb8) at Record.cpp:370
#12 0x085e4ff3 in Record::getValue (this=0xa63dc980, fieldId=5, 
    value=0xa878bbb8) at Record.cpp:322
#13 0x0855c619 in StorageTable::compareKey (this=0xa5676a00, 
    key=0xbb8ad58 "n¶", keyLength=4) at StorageTable.cpp:442
#14 0x08547a6c in StorageInterface::index_next (this=0xbba1c10, 
    buf=0xbba1ff8 "ÿÿ") at ha_falcon.cpp:1731
#15 0x08546b36 in StorageInterface::read_range_first (this=0xbba1c10, 
    start_key=0xbba1cac, end_key=0xbba1cbc, eq_range_arg=true, sorted=false)
    at ha_falcon.cpp:1620
#16 0x083f8441 in handler::multi_range_read_next (this=0xbba1c10, 
    range_info=0xa878bd70) at handler.cc:4314
#17 0x085476ae in StorageInterface::multi_range_read_next (this=0xbba1c10, 
    rangeInfo=0xa878bd70) at ha_falcon.cpp:1842
#18 0x083da061 in QUICK_RANGE_SELECT::get_next (this=0xbb38410)
    at opt_range.cc:8542
#19 0x083f1c09 in rr_quick (info=0xa878bf48) at records.cc:322
#20 0x08383c68 in mysql_update (thd=0xbb73210, table_list=0xbb84d50, 
    fields=@0xbb74844, values=@0xbb74a74, conds=0xbb854c0, order_num=0, 
    order=0x0, limit=18446744073709551615, handle_duplicates=DUP_ERROR, 
    ignore=false) at sql_update.cc:582
#21 0x082cd6a5 in mysql_execute_command (thd=0xbb73210) at sql_parse.cc:3005
#22 0x082d3505 in mysql_parse (thd=0xbb73210, 
    inBuf=0xbb84b48 "UPDATE `table65536_falcon_int_autoinc` AS X SET `int_unsigned` = 'fmburjskoectlfrwkzbgqdogrjskssbawbjoujay' WHERE `pk` = 46702", 
    length=126, found_semicolon=0xa878ce80) at sql_parse.cc:5735
#23 0x082d44a1 in dispatch_command (command=COM_QUERY, thd=0xbb73210, 
    packet=0xbb78c89 "UPDATE `table65536_falcon_int_autoinc` AS X SET `int_unsigned` = 'fmburjskoectlfrwkzbgqdogrjskssbawbjoujay' WHERE `pk` = 46702", 
    packet_length=126) at sql_parse.cc:1007
#24 0x082d576c in do_command (thd=0xbb73210) at sql_parse.cc:690
#25 0x082c2a63 in handle_one_connection (arg=0xbb73210) at sql_connect.cc:1145
#26 0x0089a45b in start_thread () from /lib/libpthread.so.0
#27 0x007f1c4e in clone () from /lib/libc.so.6

This crash is in the following code in Record::getEncodedValue(), line 668 in Record.cpp:

			ASSERT(vector[index] < size);

NOTE: A crash in the same assert was reported in Bug#36734 but with a different call stack and produced with a different test load. This bug was closed as not-reproducable with the given test load. 

How to repeat:
It should be reproducable by running the falcon_backlog test several times.

Suggested fix:
Falcon should not assert.
[16 Jan 2009 13:38] Olav Sandstå
This assert is also hit frequently when running falcon_chill_thaw test but with a different call stack:

#9  0x0854dda5 in Error::assertionFailed (
    text=0x89d5c70 "vector[index] < size", fileName=0x89d5bf1 "Record.cpp", 
    line=668) at Error.cpp:78
#10 0x085e4229 in Record::getEncodedValue (this=0xa6202c78, fieldId=9, 
    value=0xb4be03dc) at Record.cpp:668
#11 0x085e50dd in Record::getRawValue (this=0xa6202c78, fieldId=9, 
    value=0xb4be03dc) at Record.cpp:370
#12 0x085e54f7 in Record::getValue (this=0xa6202c78, fieldId=9, 
    value=0xb4be03dc) at Record.cpp:322
#13 0x085bd712 in Index::makeKey (this=0xb701d990, record=0xa6202c78, 
    key=0xb4be0794) at Index.cpp:607
#14 0x085bd88a in Index::duplicateKey (this=0xb701d990, key=0xb4be3228, 
    record=0xa6202c78) at Index.cpp:669
#15 0x085bda2a in Index::garbageCollect (this=0xb701d990, leaving=0xa6f660a0, 
    staying=0xa6202c78, transaction=0x0, quiet=false) at Index.cpp:627
#16 0x0856f0f1 in Table::garbageCollect (this=0xb7362740, leaving=0xa6f660a0, 
    staying=0xa6202c78, transaction=0x0, quiet=false) at Table.cpp:2134
#17 0x085e6342 in RecordLeaf::pruneRecords (this=0xb706dba0, table=0xb7362740, 
    base=0, recordScavenge=0xb4be5ea4) at RecordLeaf.cpp:159
#18 0x085667e7 in Table::pruneRecords (this=0xb7362740, 
    recordScavenge=0xb4be5ea4) at Table.cpp:1898
#19 0x0859a2b2 in Database::pruneRecords (this=0xb71ef640, 
    recordScavenge=0xb4be5ea4) at Database.cpp:1848
#20 0x0859a3cc in Database::scavengeRecords (this=0xb71ef640)
    at Database.cpp:1798
#21 0x0859a6e5 in Database::scavenge (this=0xb71ef640) at Database.cpp:1759
#22 0x0859a7df in Database::scavengerThreadMain (this=0xb71ef640)
    at Database.cpp:1923
#23 0x0859a88b in Database::scavengerThreadMain (database=0xb71ef640)
    at Database.cpp:1912
#24 0x08576459 in Thread::thread (this=0xb6ff65b0) at Thread.cpp:167
#25 0x08576669 in Thread::thread (parameter=0xb6ff65b0) at Thread.cpp:146
#26 0x0089a45b in start_thread () from /lib/libpthread.so.0
#27 0x007f1c4e in clone () from /lib/libc.so.6
Using host libthread_db library "/lib/libthread_db.so.1".
[17 Jan 2009 8:17] Christopher Powers
The problem occurs during record scavenging, and is caused by an active record being chilled while its prior record chain is being pruned.

Index::garbageCollect() accepts two record chains: 'leaving' and 'staying'. Record versions in the 'leaving' chain have no transactions associated with them, and are not candidates for chilling. Record versions in the 'staying' chain are associated with transactions, and could be chilled during garbage collection.

Table::garbageCollect() uses syncPrior to protect the record chain being pruned, but the chill process does not.

The solution is for the chill process to get an exclusive syncPrior lock before deleting the record data of a record being chilled.
[17 Jan 2009 8:25] Bugs System
A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/63498

2964 Christopher Powers	2009-01-17
      Bug #42131 "falcon_backlog test crashes in Record::getEncodedValue"
      
      Transaction:chill() must get an exclusive syncPrior lock before deleting record data.
[17 Jan 2009 8:54] Bugs System
A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/63500

2965 Christopher Powers	2009-01-17
      Bug #42131  	falcon_backlog test crashes in Record::getEncodedValue
      
      Handle recRollback in Table::checkUniqueRecordVersion()
[21 Jan 2009 0:38] Bugs System
A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/63646

2968 Christopher Powers	2009-01-20
      Bug #42131 falcon_backlog test crashes in Record::getEncodedValue
      
      Record chill now exclusively locks the record version chain before deleting
      the record memory.
[13 Feb 2009 7:25] Bugs System
Pushed into 6.0.10-alpha (revid:alik@sun.com-20090211182317-uagkyj01fk30p1f8) (version source revid:lars-erik.bjork@sun.com-20090121081224-6vr43golf3j01o9m) (merge vers: 6.0.10-alpha) (pib:6)
[13 Feb 2009 9:32] Philip Stoev
This assertion continues to happen in 6.0-falcon-team tree, test falcon_backlog:

http://clustra.norway.sun.com/~bteam/pb2/web.py?action=archive_download&archive_id=322511&...
[14 Mar 2009 11:57] Kevin Lewis
This looks like the same problem as Bug#38375.  The call to StorageTable::compareKey is done without a call to Record::hasRecord() in order to assure that a recently chilled record is thawed first.
[20 Mar 2009 16:16] Bugs System
A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/69934

3069 Christopher Powers	2009-03-20
      Bug #42131, "falcon_backlog test crashes in Record::getEncodedValue"
      Bug #38375, "Falcon crash in Record::isNull"
      
      For sequential operations such as index and table scans, StorageTable
      caches the current record in StorageTable::record. In rare cases, it is
      possible that StorageTable::record may have been chilled. This is a
      problem if an attempt is made to examine the contents of the record.
      
      Calls from the server directly into the storage interface, such as 
      StorageInterface::rnd_next, bypass the internal checks in Falcon that
      ensure a record is thawed before examining it.To avoid crashes in this
      circumstance, StorageTable::compareKey() now thaws StorageTable::record
      if necessary before examining it.
      
      Note that it is not necessary to always thaw StorageTable::record, e.g. in
      StorageTable::setRecord(), because the contents of the record is not always
      examined as it is in compareKey().
      modified:
        storage/falcon/StorageTable.cpp
[28 Mar 2009 18:09] Bugs System
A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/70771

3085 Christopher Powers	2009-03-28
      Bug #42131, "falcon_backlog test crashes in Record::getEncodedValue"
      Bug #38375, "Falcon crash in Record::isNull"
      
      For sequential operations such as index and table scans, StorageTable
      caches the current record in StorageTable::record. In rare cases, it is
      possible that StorageTable::record may have been chilled. This is a
      problem if an attempt is made to examine the contents of the record.
      
      Calls from the server directly into the storage interface, such as
      StorageInterface::rnd_next, bypass the internal checks in Falcon that
      ensure a record is thawed before examining it.To avoid crashes in this
      circumstance, StorageTable::compareKey() now thaws StorageTable::record
      if necessary before examining it.
      
      Note that it is not necessary to always thaw StorageTable::record, for
      example in StorageTable::setRecord(), because the contents of the
      record is not always examined as it is in compareKey().
      
      If a thaw is necessary, and if the thaw fails, then compareKey() throws
      an INTERNAL_ERROR exception.
      modified:
        storage/falcon/StorageTable.cpp
        storage/falcon/StorageTableShare.h
[30 Mar 2009 11:55] Philip Stoev
Unfortunately the vector assertion happened on the falcon_chill_thaw_compare test after Chris' fix was pushed.
[30 Mar 2009 13:05] Kevin Lewis
Philip, let's open a new bug for this assert in the chill-thaw test with a new call stack.  Backlogging is no longer activated.  Let the short desc reflect the specific assert that is failing.
[2 Apr 2009 17:39] Bugs System
Pushed into 6.0.11-alpha (revid:hky@sun.com-20090402144811-yc5kp8g0rjnhz7vy) (version source revid:christopher.powers@sun.com-20090328181017-hadhr0dt1a389301) (merge vers: 6.0.11-alpha) (pib:6)
[19 May 2009 6:41] Olav Sandstå
A similar crash with the same call stack have started to appear. A new bug has been opened for handling this, see Bug#44946.