Bug #39431 | Falcon assertion (bitNumber >= 0) failed in Bitmap::setSafe | ||
---|---|---|---|
Submitted: | 13 Sep 2008 8:40 | Modified: | 7 May 2009 14:59 |
Reporter: | Philip Stoev | Email Updates: | |
Status: | Closed | Impact on me: | |
Category: | MySQL Server: Falcon storage engine | Severity: | S1 (Critical) |
Version: | 6.0.7 | OS: | Any |
Assigned to: | Kevin Lewis | CPU Architecture: | Any |
Tags: | F_RECORD TREE |
[13 Sep 2008 8:40]
Philip Stoev
[7 Jan 2009 18:48]
Kevin Lewis
I can reproduce this regularly with RQG running combinations.yy. fetchForUpdate() is passed a record whose recordNumber has been set to -1, which eventually triggers the assert in Bitmap::setSafe(). The recordNumber is only set to -1 in one situation: a new record is allocated and placed on the recordLeaf by Table::insert(), and the insert then fails, probably in insertIndexes(). The record is immediately taken off, garbage collected, and its recordNumber is set to -1. But while it is still on the recordLeaf, another thread reads it as a candidate and sends it into fetchForUpdate().
[8 Jan 2009 15:42]
Kevin Lewis
What happens is that either void Table::insert(Transaction *transaction, int count, Field **fieldVector, Value **values) or uint Table::insert(Transaction *transaction, Stream *stream) tries to insert a record and calls bool Table::insert(Record * record, Record *prior, int recordNumber) to put this new record into the record tree. This inner Table::insert() uses Table::syncObject with both shared and exclusive locks so as not to lock the table too long.

Once the record is visible in the record tree, another thread calls Table::fetchNext(), which finds this record. Then the first thread hits a problem and throws an exception, which is caught in one of the two outer Table::insert() functions, and that function cleans up the record. It puts NULL back into the record tree, calls garbageCollect, expungeRecord, etc., and also sets the recordNumber to -1. In the meantime, StorageDatabase::nextRow() is sending this bad record to fetchForUpdate(), which tries to put a lock record for recordNumber = -1 back into the record tree. The -1 is caught in a call to Bitmap::setSafe().

We cannot send this record into fetchForUpdate(). To prevent that, there are 3 possible fixes:

1) Expand the lock on Table::syncObject. This is not desirable because it will serialize access to the table for longer. There is a reason that the inner Table::insert() uses both shared and exclusive locks: concurrency.

2) Loop in Table::fetchNext() until a record without recordNumber = -1 is found. This will not always work, because the -1 could be set after the loop but before the call to fetchForUpdate().

3) Make Table::fetchNext() skip records that are not fully 'there', that is, not fully inserted. A new state value called recInserting, similar to recDeleting, can be added for Record::state. If Table::fetchNext() finds a record with this state, it can just keep looking, because that record would not be visible anyway since it is not committed.
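Option 3 can be illustrated with a minimal sketch. This is not the Falcon source: the Record struct, the RecordState enum values other than recInserting and recDeleting, and the vector standing in for the record tree are all simplified assumptions, and locking is left out entirely. It only shows the idea of a scan that keeps looking when it meets a record still marked as being inserted.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical, simplified stand-ins for Falcon's Record and record tree.
enum RecordState { recData, recDeleting, recInserting };

struct Record {
    int recordNumber;
    RecordState state;
};

// Return the first usable record at or after `start`, or nullptr if none.
// `tree` is an assumed flat stand-in for the record tree; a null slot is empty.
const Record* fetchNext(const std::vector<const Record*>& tree, std::size_t start) {
    for (std::size_t n = start; n < tree.size(); ++n) {
        const Record* record = tree[n];
        if (record == nullptr)
            continue;  // empty slot, nothing to examine
        if (record->state == recInserting)
            continue;  // insert still in progress: uncommitted, so not visible anyway
        return record;
    }
    return nullptr;
}
```

With a tree holding an in-progress record in slot 0, an empty slot 1, and an ordinary record in slot 2, a scan from slot 0 returns the record in slot 2. The in-progress record never reaches the caller, so it cannot be passed on to something like fetchForUpdate() while the inserting thread may still back it out.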
[8 Jan 2009 15:49]
Bugs System
A patch for this bug has been committed. After review, it may be pushed to the relevant source trees for release in the next version. You can access the patch from: http://lists.mysql.com/commits/62711 2962 Kevin Lewis 2009-01-08 Bug#39431 - Adding new records to the records tree needs to be done with a minimum of locking. But an insert can fail after this and the record will need to be taken off. If the record is read in between by fetchNext(), it would be used inappropriately. Specifically, the recordNumber would be set to -1 which causes an assert in Bitmap::setSafe. Use a new Record::state called recInserting to avoid these records.
[9 Jan 2009 22:01]
Bugs System
A patch for this bug has been committed. After review, it may be pushed to the relevant source trees for release in the next version. You can access the patch from: http://lists.mysql.com/commits/62904 2965 Kevin Lewis 2009-01-09 Bug#39431 - Rearranged the code so that the check for Record::state == recInserting happens only if Table::fetch() returned a non-null record. Previously, it would crash if the record was NULL.
[10 Jan 2009 16:00]
Bugs System
A patch for this bug has been committed. After review, it may be pushed to the relevant source trees for release in the next version. You can access the patch from: http://lists.mysql.com/commits/62927 2968 Kevin Lewis 2009-01-10 Bug#39431 - If the record number is not incremented before the continue, then a hang can occur because the Table::syncObject is not given up and the in-process insert cannot continue.
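The loop-control issue described in this commit can be sketched as follows. This is an illustration under assumptions, not the patched Falcon code: the types and the nextUsable() function are hypothetical, and the real loop also holds Table::syncObject, which this sketch omits. The point is only that the record number must advance before the continue; otherwise the scan re-examines the same in-progress record forever, and in the real code it does so while holding the lock the inserting thread needs.

```cpp
#include <cassert>
#include <vector>

// Hypothetical, simplified stand-ins; not Falcon's actual types.
enum RecordState { recData, recInserting };

struct Record {
    int recordNumber;
    RecordState state;
};

// Return the number of the next usable record at or after `start`, or -1.
int nextUsable(const std::vector<Record>& slots, int start) {
    int recordNumber = start;
    while (recordNumber < static_cast<int>(slots.size())) {
        const Record& record = slots[recordNumber];
        if (record.state == recInserting) {
            ++recordNumber;  // advance first; a bare `continue` would retry this slot forever
            continue;
        }
        return recordNumber;
    }
    return -1;
}
```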
[13 Feb 2009 7:25]
Bugs System
Pushed into 6.0.10-alpha (revid:alik@sun.com-20090211182317-uagkyj01fk30p1f8) (version source revid:olav@sun.com-20090113103017-41jbad7qlvlwpwxh) (merge vers: 6.0.10-alpha) (pib:6)
[7 May 2009 14:59]
MC Brown
Internal fix. No changelog entry required.
[7 May 2009 15:02]
MC Brown
Internal fix. No changelog entry required.