Bug #39431 Falcon assertion (bitNumber >= 0) failed in Bitmap::setSafe
Submitted: 13 Sep 2008 8:40 Modified: 7 May 2009 14:59
Reporter: Philip Stoev
Status: Closed
Category:MySQL Server: Falcon storage engine Severity:S1 (Critical)
Version:6.0.7 OS:Any
Assigned to: Kevin Lewis CPU Architecture:Any
Tags: F_RECORD TREE
Triage: Triaged: D1 (Critical)

[13 Sep 2008 8:40] Philip Stoev
Description:
When executing the iuds2 SystemQA test, Falcon asserted as follows:

#4  0x000000000087ef2f in Error::error (string=<value optimized out>) at Error.cpp:94
#5  0x000000000085c1b2 in Bitmap::setSafe (this=0x2aaaaadee080, bitNumber=<value optimized out>) at Bitmap.cpp:635
#6  0x000000000083f578 in Table::insert (this=0x2aaaaad7cbb8, record=0x2aaab20c3160, prior=0x2aaab20c2fe0, recordNumber=-1) at Table.cpp:1939
#7  0x0000000000845a19 in Table::fetchForUpdate (this=0x2aaaaad7cbb8, transaction=0x2aaab05fa888, source=<value optimized out>,
    usingIndex=<value optimized out>) at Table.cpp:3550
#8  0x0000000000831f00 in StorageDatabase::nextRow (this=<value optimized out>, storageTable=0x2aaab8c68028, recordNumber=23864, lockForUpdate=true)
    at StorageDatabase.cpp:288
#9  0x0000000000826cc9 in StorageInterface::rnd_next (this=0x2aaab935f388, buf=0x2aaab935f640 "ЯАж") at ha_falcon.cpp:593
#10 0x000000000072e6d1 in rr_sequential (info=0x4c3a6280) at records.cc:385
#11 0x00000000006d5728 in mysql_update (thd=0x2aaac43c36f0, table_list=0x11fddab8, fields=@0x2aaac43c56f8, values=@0x2aaac43c5af8, conds=0x2aaab8044660,
    order_num=<value optimized out>, order=0x0, limit=18446744073709551615, handle_duplicates=DUP_ERROR, ignore=false) at sql_update.cc:573
#12 0x0000000000657d85 in mysql_execute_command (thd=0x2aaac43c36f0) at sql_parse.cc:2998
#13 0x0000000000659e5f in mysql_parse (thd=0x2aaac43c36f0,
    inBuf=0x11fdd628 "UPDATE systest1.tb1_eng1 target\nSET f1 = @connection_id,\nf2 = @operation,\nf3 = ROUND(i1/@max_val,3),\nf4 = @my_now\nWHERE i1 IN (SELECT i1 FROM systest1.t1_tmp source WHERE pk = 4)", length=178, found_semicolon=0x4c3a80a0) at sql_parse.cc:5932
#14 0x000000000065a9de in dispatch_command (command=COM_QUERY, thd=0x2aaac43c36f0,
    packet=0x2aaac44d3b01 "UPDATE systest1.tb1_eng1 target\nSET f1 = @connection_id,\nf2 = @operation,\nf3 = ROUND(i1/@max_val,3),\nf4 = @my_now\nWHERE i1 IN (SELECT i1 FROM systest1.t1_tmp source WHERE pk = 4)", packet_length=<value optimized out>) at sql_parse.cc:1134
#15 0x000000000064dee2 in handle_one_connection (arg=<value optimized out>) at sql_connect.cc:1153
#16 0x0000003ba88062f7 in start_thread () from /lib64/libpthread.so.0
#17 0x0000003ba80ce85d in clone () from /lib64/libc.so.6

[Falcon] Error: assertion (bitNumber >= 0) failed at line 635 in file Bitmap.cpp

The query was:

 UPDATE systest1.tb1_eng1 target
SET f1 = @connection_id,
f2 = @operation,
f3 = ROUND(i1/@max_val,3),
f4 = @my_now
WHERE i1 IN (SELECT i1 FROM systest1.t1_tmp source WHERE pk = 4)

How to repeat:
If this is repeatable, a test case will be provided. In the meantime, the core file and the binary are available for examination.
[7 Jan 2009 18:48] Kevin Lewis
I can reproduce this regularly with RQG running combinations.yy.  fetchForUpdate() is passed a record whose recordNumber has been set to -1, which eventually triggers the assert in Bitmap::setSafe().  The recordNumber is only set to -1 in one situation: a new record is allocated and placed on the recordLeaf by Table::insert(), but the insert then fails, probably in insertIndexes().  The record is immediately taken off, garbage collected, and its recordNumber is set to -1.  But while it is still there, another thread reads it as a candidate and sends it into fetchForUpdate().
[8 Jan 2009 15:42] Kevin Lewis
What happens is that either

   void Table::insert(Transaction *transaction, int count, Field **fieldVector, Value **values)

or

   uint Table::insert(Transaction *transaction, Stream *stream)

tries to insert a record and calls

   bool Table::insert(Record * record, Record *prior, int recordNumber)

to put this new record into the record tree.  This inner Table::insert() uses Table::syncObject with both shared and exclusive locks so as not to lock the table too long.  Once the record is visible in the record tree, another thread calls Table::fetchNext(), which finds this record.  Then the first thread hits a problem and throws an exception, which is caught in one of the two outer Table::insert() functions, and that function cleans up the record.  It puts NULL back into the record tree, calls garbageCollect, expungeRecord, etc., and also sets the recordNumber to -1.  In the meantime, StorageDatabase::nextRow() is sending this bad record to fetchForUpdate(), which tries to put a lock record for recordNumber=-1 back into the record tree.  The -1 is caught in a call to Bitmap::setSafe().
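
To make the interleaving easier to follow, here is a minimal sketch that replays the steps above in order on a single thread, so the outcome is deterministic.  It is not the Falcon code: ToyRecord, ToyRecordTree, isValidBitNumber() and replayFailedInsertRace() are hypothetical names invented for illustration, and the only things taken from the description are the order of events and the recordNumber = -1 marker.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical stand-ins for a record and the record tree; not the real Falcon classes.
struct ToyRecord {
    int32_t recordNumber;
};

struct ToyRecordTree {
    std::vector<ToyRecord*> slots;

    explicit ToyRecordTree(std::size_t n) : slots(n, nullptr) {}

    void store(int32_t n, ToyRecord* r) {
        assert(n >= 0 && static_cast<std::size_t>(n) < slots.size());
        slots[n] = r;
    }

    ToyRecord* fetch(int32_t n) const { return slots[n]; }
};

// Mirrors the precondition that fired: a bit number must be non-negative.
inline bool isValidBitNumber(int32_t bitNumber) { return bitNumber >= 0; }

// Replays the interleaving step by step and returns the record number
// that the scanning thread would hand on to fetchForUpdate().
inline int32_t replayFailedInsertRace() {
    ToyRecordTree tree(8);
    ToyRecord rec{3};

    // Thread A: the inner insert publishes the record in the tree.
    tree.store(3, &rec);

    // Thread B: the scan picks up the published record as a candidate.
    ToyRecord* candidate = tree.fetch(3);

    // Thread A: a later step fails; cleanup puts NULL back and invalidates the number.
    tree.store(3, nullptr);
    rec.recordNumber = -1;

    // Thread B: still holds the stale pointer and reads its record number.
    return candidate->recordNumber;
}
```

In this sketch replayFailedInsertRace() returns -1, which is exactly the value that fails the bitNumber >= 0 check.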

We cannot send this record into fetchForUpdate(). To prevent that, there are three possible fixes:

1)  Expand the lock on Table::syncObject.  This is not desirable because it would serialize access to the table for longer.  There is a reason the inner Table::insert() uses both shared and exclusive locks: concurrency.

2)  Loop in Table::fetchNext() until a record whose recordNumber is not -1 is found.  This will not always work because the recordNumber could be set to -1 after the loop but before the call to fetchForUpdate().

3)  Make Table::fetchNext() skip records that are not fully 'there', i.e. not fully inserted.  A new Record::state value called recInserting can be added, similar to recDeleting.  If Table::fetchNext() finds a record with this state, it can just keep looking, because that record would not be visible anyway since it is not committed.
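
A rough sketch of what option 3 amounts to, assuming a simple array of record slots in place of the real record tree.  Only the state names recDeleting and recInserting come from this discussion; ToyState, ToyRecord, recData and fetchNextVisible() are hypothetical and are not the actual Falcon declarations.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical state values for illustration only.
enum ToyState { recData, recDeleting, recInserting };

struct ToyRecord {
    int recordNumber;
    ToyState state;
};

// Returns the next usable record at or after 'start', or nullptr when the scan is done.
// Empty slots and records whose insert has not finished are skipped.
inline const ToyRecord* fetchNextVisible(const std::vector<const ToyRecord*>& slots,
                                         std::size_t start) {
    assert(start <= slots.size());
    for (std::size_t n = start; n < slots.size(); ++n) {
        const ToyRecord* record = slots[n];
        if (record == nullptr)
            continue;  // nothing stored at this record number
        if (record->state == recInserting)
            continue;  // insert still in progress; uncommitted, so not visible anyway
        return record;
    }
    return nullptr;
}
```

With slots holding a recInserting record followed by an ordinary one, the scan returns the second record, so a record that might still be taken off again never reaches the caller.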
[8 Jan 2009 15:49] Bugs System
A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/62711

2962 Kevin Lewis	2009-01-08
      Bug#39431 - Adding new records to the records tree needs 
      to be done with a minimum of locking.  But an insert 
      can fail after this and the record will need to be taken off.  
      If the record is read in between by fetchNext(), it would be 
      used inappropriately.  Specifically, the recordNumber would 
      be set to -1 which causes an assert in Bitmap::setSafe.
      Use a new Record::state called recInserting to avoid these 
      records.
[9 Jan 2009 22:01] Bugs System
A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/62904

2965 Kevin Lewis	2009-01-09
      Bug#39431 - Rearranged the code so that the check 
      for Record::state == recInserting happens only if 
      Table::fetch() returned a non-null record.  
      Previously, it would crash if the record was NULL.
[10 Jan 2009 16:00] Bugs System
A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/62927

2968 Kevin Lewis	2009-01-10
      Bug#39431 - If the record number is not incremented before
      the continue, then a hang can occur because the 
      Table::syncObject is not given up and the in-process insert
      cannot continue.
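
The loop shape behind the last two commits can be sketched as follows.  This is an illustration under stated assumptions, not the committed patch: scanFrom(), ToyRecord, ToyState, kStuck, kNone and the maxSteps bound are all hypothetical, and the sketch does not model Table::syncObject at all.  The step bound stands in for the hang, since a loop that never advances would otherwise spin on the same record while the in-process insert waits.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical names for illustration only.
enum ToyState { recData, recInserting };

struct ToyRecord {
    ToyState state;
};

const long kNone = -1;   // scan reached the end without a usable record
const long kStuck = -2;  // scan exceeded maxSteps, i.e. it would never finish

// Scans from 'start' and returns the index of the first usable record.
// advanceBeforeContinue = false reproduces the loop shape that can hang.
inline long scanFrom(const std::vector<const ToyRecord*>& slots, std::size_t start,
                     bool advanceBeforeContinue, int maxSteps) {
    std::size_t n = start;
    int steps = 0;
    while (n < slots.size()) {
        if (++steps > maxSteps)
            return kStuck;
        const ToyRecord* record = slots[n];
        if (record == nullptr) {  // check for NULL before looking at the state
            ++n;
            continue;
        }
        if (record->state == recInserting) {
            if (advanceBeforeContinue)
                ++n;  // move on to the next record number before retrying
            continue;
        }
        return static_cast<long>(n);
    }
    return kNone;
}
```

With a recInserting record in slot 0, the advancing variant moves past it, while the non-advancing variant keeps re-reading slot 0 until the step bound is hit.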
[13 Feb 2009 7:25] Bugs System
Pushed into 6.0.10-alpha (revid:alik@sun.com-20090211182317-uagkyj01fk30p1f8) (version source revid:olav@sun.com-20090113103017-41jbad7qlvlwpwxh) (merge vers: 6.0.10-alpha) (pib:6)
[7 May 2009 14:59] MC Brown
Internal fix. No changelog entry required.
[7 May 2009 15:02] MC Brown
Internal fix. No changelog entry required.