| Bug #42080 | falcon_record_cache_memory_leak2 crashes in RecordVersion::scavengeSavepoint | ||
|---|---|---|---|
| Submitted: | 13 Jan 2009 12:49 | Modified: | 7 May 2009 16:57 |
| Reporter: | Olav Sandstå | Email Updates: | |
| Status: | Closed | Impact on me: | |
| Category: | MySQL Server: Falcon storage engine | Severity: | S3 (Non-critical) |
| Version: | 6.0.10-alpha | OS: | Any |
| Assigned to: | Kevin Lewis | CPU Architecture: | Any |
| Tags: | F_SCAVENGER | ||
[13 Jan 2009 12:49]
Olav Sandstå
[13 Jan 2009 13:03]
Vladislav Vaintroub
SIGBUS suggests alignment issues, i.e this should be specific to alignment-sensitive architectures only (or what falcon suppports, probably SPARC only)
[13 Jan 2009 13:10]
Olav Sandstå
Alignment issues was also my initial thought when I saw the SIGBUS error and the core file and call stack I used when filing this report was from running on a 64 bit SPARC server. But I have seen the same test fail also on Windows running on x86 but I do not have any call stack from that crash so it might be a different issue.
[13 Jan 2009 13:22]
Olav Sandstå
According to the core file the crash occurs in RecordVersion.cpp on line 235:
// Remove prior record versions assigned to the savepoint being released
for (; rec && rec->getTransactionId() == targetTransactionId && rec->getSavePointId() >= oldestActiveSavePointId;
rec = rec->getPriorVersion())
The debugger claims that the rec pointer contains:
(dbx) print -r *rec
*rec = {
Record::data = {
Record::record = 0x103324210 ""
}
Record::useCount = 1
Record::format = 0x1033bb290
Record::recordNumber = 1
Record::size = 54243984
Record::generation = 4008636142ULL
Record::highWater = -4370
Record::encoding = 'î'
Record::state = 'î'
Record::active = 'î'
}
Both size, generation, and highWater seems to have "strange" values?
I do not think this is a "record version" object but if I "force" this pointer to be a RecordVersion instead of a Record it contains:
(dbx) print -r *((RecordVersion*)rec)
*(struct RecordVersion *) rec = {
RecordVersion::Record::data = {
RecordVersion::Record::record = 0x103324210 ""
}
RecordVersion::Record::useCount = 1
RecordVersion::Record::format = 0x1033bb290
RecordVersion::Record::recordNumber = 1
RecordVersion::Record::size = 54243984
RecordVersion::Record::generation = 4008636142ULL
RecordVersion::Record::highWater = -4370
RecordVersion::Record::encoding = 'î'
RecordVersion::Record::state = 'î'
RecordVersion::Record::active = 'î'
RecordVersion::priorVersion = 0xeeeeeeeeeeeeeeee
RecordVersion::virtualOffset = 17216961135462248174ULL
RecordVersion::transaction = 0xeeeeeeeeeeeeeeee
RecordVersion::nextInTrans = 0xeeeeeeeeeeeeeeee
RecordVersion::prevInTrans = 0xeeeeeeeeeeeeeeee
RecordVersion::transactionId = 4008636142U
RecordVersion::savePointId = -286331154
RecordVersion::superceded = true
}
[13 Jan 2009 13:24]
Vladislav Vaintroub
Right. The crash location on Windows seems to be different. The callstack extracted from the mysqld.err is 007CA490 mysqld.exe!RecordLeaf::pruneRecords()[recordleaf.cpp:149] FFFFFFFF ntdll.dll!LZDone() 0079B744 mysqld.exe!Database::pruneRecords()[database.cpp:1848] While callstack is not perfect(s. second frame), possibly due to outdated OS/dbghelp library on the machine, the top frame usually can be trusted. So I think this crashes due to invalid prune pointer in recordleaf.cpp:149 ASSERT(prune->useCount == 1);
[13 Jan 2009 17:53]
MySQL Verification Team
I ran several times on Ubuntu 64-bit and I couldn't repeat the crash: Removing Stale Files Creating Directories Installing Master Database ======================================================= TEST RESULT TIME (ms) ------------------------------------------------------- falcon.falcon_record_cache_memory_leak2 [ pass ] 83108 -------------------------------------------------------
[13 Jan 2009 18:54]
Kevin Lewis
I have repeated it twice now. The crash I am seeing is in RecordLeaf::pruneRecords() on this line;
ASSERT(prune->useCount == 1);
The variable prune == 0xeeeeeeee which means it is a pointer to invalid memory. Above this, the pointer oldestVisible looks like the Record Pointer that Olav described;
(RecordVersion *)oldestVisible 0x066f7548
Record
data {record=0x067083f8 }
useCount 0x066f0048
format 0x066f0048
{version=0x00000000 maxId=0x00000000 length=0x00000070 ...}
recordNumber 0x00000000 int
size 0xeeeeeeee int
generation 0xeeeeeeeeeeeeeeee unsigned __int64
highWater 0xeeee short
encoding 0xee 'î' unsigned char
state 0xee 'î' unsigned char
active 0xee 'î' unsigned char
priorVersion 0x00000000
virtualOffset 0xeeeeeeeeeeeeeeee unsigned __int64
transaction 0xeeeeeeee
nextInTrans 0xeeeeeeee
prevInTrans 0xeeeeeeee
transactionId 0xeeeeeeee
savePointId 0xeeeeeeee
superceded true
So even though the call stack is different, a record version is being released while a pointer still exists to it.
[13 Jan 2009 20:43]
Olav Sandstå
This test now failed a second time on the same machine as my first crash. This time the crash is similar to what Kevin has seen. This crash has the following call stack: [5] RecordLeaf::pruneRecords(this = 0x1014129c0, table = 0x1013532b0, base = 0, recordScavenge = 0xffffffff7d3fb6c8), line 149 in "RecordLeaf.cpp" [6] Table::pruneRecords(this = 0x1013532b0, recordScavenge = 0xffffffff7d3fb6c8), line 1896 in "Table.cpp" [7] Database::pruneRecords(this = 0x1011a21b8, recordScavenge = 0xffffffff7d3fb6c8), line 1848 in "Database.cpp" [8] Database::scavengeRecords(this = 0x1011a21b8), line 1798 in "Database.cpp" [9] Database::scavenge(this = 0x1011a21b8), line 1753 in "Database.cpp" [10] Database::scavengerThreadMain(this = 0x1011a21b8), line 1924 in "Database.cpp" [11] Database::scavengerThreadMain(database = 0x1011a21b8), line 1913 in "Database.cpp" [12] Thread::thread(this = 0x1013e1b90), line 167 in "Thread.cpp" [13] Thread::thread(parameter = 0x1013e1b90), line 146 in "Thread.cpp" and the value of the prune pointer is: (dbx) print prune prune = 0xeeeeeeeeeeeeeeee
[14 Jan 2009 14:43]
Olav Sandstå
Just for having it recorded in the bug report:
The following crash occurred twice when running the falcon_bug_34351_C test. The call stack is a bit different but it seems to be related to the same issue as originally reported in this bug report:
---- called from signal handler with signal 11 (SIGSEGV) ------
[8] Transaction::committedBefore(this = (nil), transactionId = 73526U), line 1535 in "Transaction.cpp"
[9] RecordScavenge::inventoryRecord(this = 0xfffffd7ffe781a10, record = 0x38e9350), line 132 in "RecordScavenge.cpp"
[10] RecordLeaf::pruneRecords(this = 0x4796168, table = 0x176f2b8, base = 195, recordScavenge = 0xfffffd7ffe781a10), line 137 in "RecordLeaf.cpp"
[11] RecordGroup::pruneRecords(this = 0x42034a8, table = 0x176f2b8, base = 1, recordScavenge = 0xfffffd7ffe781a10), line 122 in "RecordGroup.cpp"
[12] RecordGroup::pruneRecords(this = 0x4203158, table = 0x176f2b8, base = 0, recordScavenge = 0xfffffd7ffe781a10), line 122 in "RecordGroup.cpp"
[13] Table::pruneRecords(this = 0x176f2b8, recordScavenge = 0xfffffd7ffe781a10), line 1896 in "Table.cpp"
[14] Database::pruneRecords(this = 0x15be1b8, recordScavenge = 0xfffffd7ffe781a10), line 1848 in "Database.cpp"
[15] Database::scavengeRecords(this = 0x15be1b8), line 1798 in "Database.cpp"
[16] Database::scavenge(this = 0x15be1b8), line 1753 in "Database.cpp"
[17] Database::scavengerThreadMain(this = 0x15be1b8), line 1924 in "Database.cpp"
[18] Database::scavengerThreadMain(database = 0x15be1b8), line 1913 in "Database.cpp"
[19] Thread::thread(this = 0x17fdcb8), line 167 in "Thread.cpp"
[20] Thread::thread(parameter = 0x17fdcb8), line 146 in "Thread.cpp"
[21] _thr_setup(0x0, 0x0, 0x0, 0x0, 0x0, 0x0), at 0xfffffd7fff11fd7b
[22] _lwp_start(0x0, 0x0, 0x0, 0x0, 0x0, 0x0), at 0xfffffd7fff11ffb0
The crash occurs in the call to Transaction::committedBefore() which seems to be called with a null pointer as transaction object. This call is done from the following code in RecordScavenge.cpp line 132:
if ( !recVer->transaction
|| recVer->transaction->committedBefore(oldestActiveTransaction)
The debugger says:
(dbx) print recVer->transaction
recVer->transaction = (nil)
This indicates that some other thread is working in parallel on this record version?
The complete record version object:
(dbx) print -r *recVer
*recVer = {
RecordVersion::Record::data = {
RecordVersion::Record::record = (nil)
}
RecordVersion::Record::useCount = 1
RecordVersion::Record::format = 0x182ee88
RecordVersion::Record::recordNumber = 19551
RecordVersion::Record::size = 112
RecordVersion::Record::generation = 2ULL
RecordVersion::Record::highWater = -13108
RecordVersion::Record::encoding = '\0'
RecordVersion::Record::state = '\001'
RecordVersion::Record::active = '\001'
RecordVersion::priorVersion = 0x38f8350
RecordVersion::virtualOffset = 21668164ULL
RecordVersion::transaction = (nil)
RecordVersion::nextInTrans = (nil)
RecordVersion::prevInTrans = (nil)
RecordVersion::transactionId = 78340U
RecordVersion::savePointId = 0
RecordVersion::superceded = false
}
[15 Jan 2009 0:23]
Bugs System
A patch for this bug has been committed. After review, it may be pushed to the relevant source trees for release in the next version. You can access the patch from: http://lists.mysql.com/commits/63294 2959 Kevin Lewis 2009-01-14 Bug#42080, Bug#41831 In both of these bugs, the scavenger was pruning records that it should not have. The record version chosen to start pruning is returned from RecordScavenge::inventoryRecord(). This function is improved so that only the oldest visible record is returned. In addition, Recordversion::committedbefore() is added to simplify the code and read RecordVersion::transaction only once since it can be set to null at any time.
[13 Feb 2009 7:25]
Bugs System
Pushed into 6.0.10-alpha (revid:alik@sun.com-20090211182317-uagkyj01fk30p1f8) (version source revid:klewis@mysql.com-20090114223344-og8t9pw03ckyexdw) (merge vers: 6.0.10-alpha) (pib:6)
[7 May 2009 16:57]
MC Brown
Internal/test fix. No changelog entry required.
