Bug #28519 | falcon crash with signal 4 | ||
---|---|---|---|
Submitted: | 18 May 2007 14:42 | Modified: | 3 Dec 2007 14:21 |
Reporter: | Shane Bester (Platinum Quality Contributor) | Email Updates: | |
Status: | Closed | Impact on me: | |
Category: | MySQL Server: Falcon storage engine | Severity: | S1 (Critical) |
Version: | 6.0.1BK | OS: | Linux (suse 9.3 x86) |
Assigned to: | Christopher Powers | CPU Architecture: | Any |
[18 May 2007 14:42]
Shane Bester
[18 May 2007 16:18]
Kevin Lewis
I am assuming that this assert failed: ASSERT(record->state == recChilled); since it is the only assert in SRLUpdateRecords::thaw(RecordVersion *record). The problem is that two functions up the call stack, in RecordVersion::thaw(), we find: ASSERT(state == recChilled); So one moment the record was chilled, and the next moment it wasn't. We need some coordination between threads here.
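The failure described above is a check-then-act race: an outer frame observes that the record is chilled, and an inner frame later asserts that this is still true. Below is a minimal sketch of that pattern, assuming a hypothetical, single-threaded model. ModelRecord, RecState and checkThenAssume are illustrative names, not Falcon code. A callback stands in for whatever another thread does between the check and the assertion.

```cpp
#include <atomic>
#include <cassert>
#include <functional>

// Hypothetical model of a record's chill state; not Falcon's RecordVersion.
enum class RecState { Chilled, Thawed };

struct ModelRecord {
    std::atomic<RecState> state{RecState::Chilled};
};

// Models the pre-fix path: an outer frame checks that the record is chilled
// (like fetchVersion()), and an inner frame later assumes that is still true
// (like the ASSERT in SRLUpdateRecords::thaw()). `interleave` stands in for
// work done by another thread between the two points.
// Returns whether the inner frame's asserted condition would still hold.
bool checkThenAssume(ModelRecord& rec, const std::function<void()>& interleave) {
    assert(interleave && "pass a no-op to model the single-threaded case");
    if (rec.state.load() != RecState::Chilled)
        return true;  // not chilled: no thaw, so no assumption is made
    interleave();     // another thread may run here
    return rec.state.load() == RecState::Chilled;
}
```

A no-op callback models the single-threaded case, where the condition still holds. A callback that sets the state to RecState::Thawed reproduces the interleaving that tripped the assertion.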
[19 May 2007 9:42]
MySQL Verification Team
Testcase. Sorry for the random nature of it, but it seems you already have an idea of how to fix the bug?
Attachment: bug28519.c (text/plain), 7.40 KiB.
[25 Jun 2007 20:40]
Bugs System
A patch for this bug has been committed. After review, it may be pushed to the relevant source trees for release in the next version. You can access the patch from: http://lists.mysql.com/commits/29561
ChangeSet@1.2560, 2007-06-25 15:38:16-05:00, chris@terrazzo.site +12 -0
1. Bug#28519 "Falcon crash with signal 4" - Replaced state == chilled assertions in record thaw methods with a benign function return.
2. Reformatted chill/thaw debug output.
3. Check deferredIndex->index == null in Transaction::dropTable().
4. Added RecordVersion::getRecordData() to encapsulate data.record. If necessary, getRecordData() performs a record thaw before returning a pointer to the record data.
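Items 1 and 4 of the changeset can be sketched as a hypothetical, single-threaded C++ model. This is not the committed code. ModelRecordVersion, serialLogCopy and thawCount are illustrative names, and the serial log is reduced to a saved string. The accessor also returns a reference rather than a raw pointer. What the sketch shows is the shape of the change: every reader goes through getRecordData(), which thaws on demand, and a redundant thaw() returns false instead of asserting.

```cpp
#include <cassert>
#include <string>
#include <utility>

// Hypothetical, single-threaded model of patch items 1 and 4; the class and
// member names are illustrative, not Falcon's.
class ModelRecordVersion {
public:
    explicit ModelRecordVersion(std::string logged)
        : serialLogCopy(std::move(logged)) {}

    // Item 1: thawing a record that is already thawed is a benign no-op that
    // returns false, instead of failing an assertion.
    bool thaw() {
        if (!chilled)
            return false;
        data = serialLogCopy;  // restore from the modelled serial log
        chilled = false;
        ++thawCount;
        return true;
    }

    // Item 4: callers read record data only through this accessor, which
    // thaws the record on demand before handing out a reference.
    const std::string& getRecordData() {
        if (chilled)
            thaw();
        assert(!chilled);
        return data;
    }

    int thaws() const { return thawCount; }

private:
    std::string serialLogCopy;  // stands in for the image in the serial log
    std::string data;           // in-memory record data, valid when !chilled
    bool chilled = true;        // the model starts out chilled
    int thawCount = 0;
};
```

Repeated calls to getRecordData() thaw the record once. A later explicit thaw() simply returns false.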
[25 Jun 2007 20:59]
Christopher Powers
The testcase exposes a race condition in record thawing such that one thread thaws a record out from underneath another thread performing a thaw:
StorageInterface::rnd_next()
StorageTable::next()
StorageDatabase::nextRow()
RecordVersion::fetchVersion() (recChilled == TRUE here)
RecordVersion::thaw()
Transaction::thaw()
SRLUpdateRecords::thaw() (ASSERT(state == recChilled) FAILS)
In this case, the record being thawed was thawed by another thread sometime between fetchVersion() and SRLUpdateRecords::thaw(). This isn't necessarily a bad thing, because the only operations during a thaw that must be serialized are:
1) activating the serial log window from which the record data will be restored, and
2) setting the data.record pointer after the data has been restored.
Serial log window access is serialized by an exclusive lock, and changes to record.data are performed atomically via compare and exchange. So, if during a thaw the record is thawed on another thread, then the current thaw operation should simply return without error.
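The two serialized steps described above can be modelled in a few lines of C++, assuming a simplified layout. ModelSerialLog, ModelRecord, restore(), publish() and thaw() are hypothetical names. The exclusive lock is a std::mutex, and the record-data pointer is installed with std::atomic::compare_exchange_strong. Each thread restores into a private copy, so two threads can both do that work. Only the compare-and-exchange decides which copy is installed, and the loser discards its copy and returns false rather than asserting.

```cpp
#include <atomic>
#include <cassert>
#include <mutex>
#include <string>

// Hypothetical model of the thaw protocol; names and layout are illustrative,
// not Falcon's. Only two steps are serialized:
//   1) activating the serial log window (an exclusive lock), and
//   2) publishing the restored data (compare-and-exchange).
struct ModelSerialLog {
    std::mutex windowLock;

    // Step 1: only one thread at a time may use the log window.
    std::string readWindow(const std::string& savedImage) {
        std::lock_guard<std::mutex> guard(windowLock);
        return savedImage;  // stands in for reading the record image
    }
};

struct ModelRecord {
    std::string savedImage;                   // image kept in the serial log
    std::atomic<std::string*> data{nullptr};  // nullptr means chilled
    ~ModelRecord() { delete data.load(); }
};

// Restore a private copy of the record data; no shared record state changes.
std::string* restore(ModelSerialLog& serialLog, const ModelRecord& rec) {
    return new std::string(serialLog.readWindow(rec.savedImage));
}

// Step 2: install the copy atomically. If another thread already thawed the
// record, discard this copy and return false instead of asserting.
bool publish(ModelRecord& rec, std::string* restored) {
    assert(restored != nullptr);
    std::string* expected = nullptr;
    if (rec.data.compare_exchange_strong(expected, restored))
        return true;
    delete restored;
    return false;
}

// Full thaw for one caller: a benign no-op if the record is already thawed.
bool thaw(ModelSerialLog& serialLog, ModelRecord& rec) {
    if (rec.data.load() != nullptr)
        return false;
    return publish(rec, restore(serialLog, rec));
}
```

Splitting the thaw into restore() and publish() lets the interleaving from the stack trace be replayed deterministically. Thread A restores, thread B restores and publishes first, and A's publish() then fails harmlessly. In this model, the price is occasionally duplicated restore work, in exchange for not holding a lock across the whole thaw.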
[26 Jun 2007 6:24]
Hakan Küçükyılmaz
I successfully ran a modified version of the testcase with 51 threads for 1800 seconds. I also ran the original testcase with 25 threads for 5600 seconds. The test no longer crashes Falcon, but it sometimes hangs.
[27 Nov 2007 8:42]
Hakan Küçükyılmaz
I successfully ran the test case for 5600 seconds.
[3 Dec 2007 14:21]
MC Brown
A note has been added to the 6.0.4 changelog: Under heavy load when updating Falcon tables, a race condition could occur that would ultimately result in a crash.