Bug #42080 | falcon_record_cache_memory_leak2 crashes in RecordVersion::scavengeSavepoint | ||
---|---|---|---|
Submitted: | 13 Jan 2009 12:49 | Modified: | 7 May 2009 16:57 |
Reporter: | Olav Sandstå | Email Updates: | |
Status: | Closed | Impact on me: | |
Category: | MySQL Server: Falcon storage engine | Severity: | S3 (Non-critical) |
Version: | 6.0.10-alpha | OS: | Any |
Assigned to: | Kevin Lewis | CPU Architecture: | Any |
Tags: | F_SCAVENGER |
[13 Jan 2009 12:49]
Olav Sandstå
[13 Jan 2009 13:03]
Vladislav Vaintroub
SIGBUS suggests alignment issues, i.e this should be specific to alignment-sensitive architectures only (or what falcon suppports, probably SPARC only)
[13 Jan 2009 13:10]
Olav Sandstå
Alignment issues was also my initial thought when I saw the SIGBUS error and the core file and call stack I used when filing this report was from running on a 64 bit SPARC server. But I have seen the same test fail also on Windows running on x86 but I do not have any call stack from that crash so it might be a different issue.
[13 Jan 2009 13:22]
Olav Sandstå
According to the core file the crash occurs in RecordVersion.cpp on line 235: // Remove prior record versions assigned to the savepoint being released for (; rec && rec->getTransactionId() == targetTransactionId && rec->getSavePointId() >= oldestActiveSavePointId; rec = rec->getPriorVersion()) The debugger claims that the rec pointer contains: (dbx) print -r *rec *rec = { Record::data = { Record::record = 0x103324210 "" } Record::useCount = 1 Record::format = 0x1033bb290 Record::recordNumber = 1 Record::size = 54243984 Record::generation = 4008636142ULL Record::highWater = -4370 Record::encoding = 'î' Record::state = 'î' Record::active = 'î' } Both size, generation, and highWater seems to have "strange" values? I do not think this is a "record version" object but if I "force" this pointer to be a RecordVersion instead of a Record it contains: (dbx) print -r *((RecordVersion*)rec) *(struct RecordVersion *) rec = { RecordVersion::Record::data = { RecordVersion::Record::record = 0x103324210 "" } RecordVersion::Record::useCount = 1 RecordVersion::Record::format = 0x1033bb290 RecordVersion::Record::recordNumber = 1 RecordVersion::Record::size = 54243984 RecordVersion::Record::generation = 4008636142ULL RecordVersion::Record::highWater = -4370 RecordVersion::Record::encoding = 'î' RecordVersion::Record::state = 'î' RecordVersion::Record::active = 'î' RecordVersion::priorVersion = 0xeeeeeeeeeeeeeeee RecordVersion::virtualOffset = 17216961135462248174ULL RecordVersion::transaction = 0xeeeeeeeeeeeeeeee RecordVersion::nextInTrans = 0xeeeeeeeeeeeeeeee RecordVersion::prevInTrans = 0xeeeeeeeeeeeeeeee RecordVersion::transactionId = 4008636142U RecordVersion::savePointId = -286331154 RecordVersion::superceded = true }
[13 Jan 2009 13:24]
Vladislav Vaintroub
Right. The crash location on Windows seems to be different. The callstack extracted from the mysqld.err is 007CA490 mysqld.exe!RecordLeaf::pruneRecords()[recordleaf.cpp:149] FFFFFFFF ntdll.dll!LZDone() 0079B744 mysqld.exe!Database::pruneRecords()[database.cpp:1848] While callstack is not perfect(s. second frame), possibly due to outdated OS/dbghelp library on the machine, the top frame usually can be trusted. So I think this crashes due to invalid prune pointer in recordleaf.cpp:149 ASSERT(prune->useCount == 1);
[13 Jan 2009 17:53]
MySQL Verification Team
I ran several times on Ubuntu 64-bit and I couldn't repeat the crash: Removing Stale Files Creating Directories Installing Master Database ======================================================= TEST RESULT TIME (ms) ------------------------------------------------------- falcon.falcon_record_cache_memory_leak2 [ pass ] 83108 -------------------------------------------------------
[13 Jan 2009 18:54]
Kevin Lewis
I have repeated it twice now. The crash I am seeing is in RecordLeaf::pruneRecords() on this line; ASSERT(prune->useCount == 1); The variable prune == 0xeeeeeeee which means it is a pointer to invalid memory. Above this, the pointer oldestVisible looks like the Record Pointer that Olav described; (RecordVersion *)oldestVisible 0x066f7548 Record data {record=0x067083f8 } useCount 0x066f0048 format 0x066f0048 {version=0x00000000 maxId=0x00000000 length=0x00000070 ...} recordNumber 0x00000000 int size 0xeeeeeeee int generation 0xeeeeeeeeeeeeeeee unsigned __int64 highWater 0xeeee short encoding 0xee 'î' unsigned char state 0xee 'î' unsigned char active 0xee 'î' unsigned char priorVersion 0x00000000 virtualOffset 0xeeeeeeeeeeeeeeee unsigned __int64 transaction 0xeeeeeeee nextInTrans 0xeeeeeeee prevInTrans 0xeeeeeeee transactionId 0xeeeeeeee savePointId 0xeeeeeeee superceded true So even though the call stack is different, a record version is being released while a pointer still exists to it.
[13 Jan 2009 20:43]
Olav Sandstå
This test now failed a second time on the same machine as my first crash. This time the crash is similar to what Kevin has seen. This crash has the following call stack: [5] RecordLeaf::pruneRecords(this = 0x1014129c0, table = 0x1013532b0, base = 0, recordScavenge = 0xffffffff7d3fb6c8), line 149 in "RecordLeaf.cpp" [6] Table::pruneRecords(this = 0x1013532b0, recordScavenge = 0xffffffff7d3fb6c8), line 1896 in "Table.cpp" [7] Database::pruneRecords(this = 0x1011a21b8, recordScavenge = 0xffffffff7d3fb6c8), line 1848 in "Database.cpp" [8] Database::scavengeRecords(this = 0x1011a21b8), line 1798 in "Database.cpp" [9] Database::scavenge(this = 0x1011a21b8), line 1753 in "Database.cpp" [10] Database::scavengerThreadMain(this = 0x1011a21b8), line 1924 in "Database.cpp" [11] Database::scavengerThreadMain(database = 0x1011a21b8), line 1913 in "Database.cpp" [12] Thread::thread(this = 0x1013e1b90), line 167 in "Thread.cpp" [13] Thread::thread(parameter = 0x1013e1b90), line 146 in "Thread.cpp" and the value of the prune pointer is: (dbx) print prune prune = 0xeeeeeeeeeeeeeeee
[14 Jan 2009 14:43]
Olav Sandstå
Just for having it recorded in the bug report: The following crash occurred twice when running the falcon_bug_34351_C test. The call stack is a bit different but it seems to be related to the same issue as originally reported in this bug report: ---- called from signal handler with signal 11 (SIGSEGV) ------ [8] Transaction::committedBefore(this = (nil), transactionId = 73526U), line 1535 in "Transaction.cpp" [9] RecordScavenge::inventoryRecord(this = 0xfffffd7ffe781a10, record = 0x38e9350), line 132 in "RecordScavenge.cpp" [10] RecordLeaf::pruneRecords(this = 0x4796168, table = 0x176f2b8, base = 195, recordScavenge = 0xfffffd7ffe781a10), line 137 in "RecordLeaf.cpp" [11] RecordGroup::pruneRecords(this = 0x42034a8, table = 0x176f2b8, base = 1, recordScavenge = 0xfffffd7ffe781a10), line 122 in "RecordGroup.cpp" [12] RecordGroup::pruneRecords(this = 0x4203158, table = 0x176f2b8, base = 0, recordScavenge = 0xfffffd7ffe781a10), line 122 in "RecordGroup.cpp" [13] Table::pruneRecords(this = 0x176f2b8, recordScavenge = 0xfffffd7ffe781a10), line 1896 in "Table.cpp" [14] Database::pruneRecords(this = 0x15be1b8, recordScavenge = 0xfffffd7ffe781a10), line 1848 in "Database.cpp" [15] Database::scavengeRecords(this = 0x15be1b8), line 1798 in "Database.cpp" [16] Database::scavenge(this = 0x15be1b8), line 1753 in "Database.cpp" [17] Database::scavengerThreadMain(this = 0x15be1b8), line 1924 in "Database.cpp" [18] Database::scavengerThreadMain(database = 0x15be1b8), line 1913 in "Database.cpp" [19] Thread::thread(this = 0x17fdcb8), line 167 in "Thread.cpp" [20] Thread::thread(parameter = 0x17fdcb8), line 146 in "Thread.cpp" [21] _thr_setup(0x0, 0x0, 0x0, 0x0, 0x0, 0x0), at 0xfffffd7fff11fd7b [22] _lwp_start(0x0, 0x0, 0x0, 0x0, 0x0, 0x0), at 0xfffffd7fff11ffb0 The crash occurs in the call to Transaction::committedBefore() which seems to be called with a null pointer as transaction object. This call is done from the following code in RecordScavenge.cpp line 132: if ( !recVer->transaction || recVer->transaction->committedBefore(oldestActiveTransaction) The debugger says: (dbx) print recVer->transaction recVer->transaction = (nil) This indicates that some other thread is working in parallel on this record version? The complete record version object: (dbx) print -r *recVer *recVer = { RecordVersion::Record::data = { RecordVersion::Record::record = (nil) } RecordVersion::Record::useCount = 1 RecordVersion::Record::format = 0x182ee88 RecordVersion::Record::recordNumber = 19551 RecordVersion::Record::size = 112 RecordVersion::Record::generation = 2ULL RecordVersion::Record::highWater = -13108 RecordVersion::Record::encoding = '\0' RecordVersion::Record::state = '\001' RecordVersion::Record::active = '\001' RecordVersion::priorVersion = 0x38f8350 RecordVersion::virtualOffset = 21668164ULL RecordVersion::transaction = (nil) RecordVersion::nextInTrans = (nil) RecordVersion::prevInTrans = (nil) RecordVersion::transactionId = 78340U RecordVersion::savePointId = 0 RecordVersion::superceded = false }
[15 Jan 2009 0:23]
Bugs System
A patch for this bug has been committed. After review, it may be pushed to the relevant source trees for release in the next version. You can access the patch from: http://lists.mysql.com/commits/63294 2959 Kevin Lewis 2009-01-14 Bug#42080, Bug#41831 In both of these bugs, the scavenger was pruning records that it should not have. The record version chosen to start pruning is returned from RecordScavenge::inventoryRecord(). This function is improved so that only the oldest visible record is returned. In addition, Recordversion::committedbefore() is added to simplify the code and read RecordVersion::transaction only once since it can be set to null at any time.
[13 Feb 2009 7:25]
Bugs System
Pushed into 6.0.10-alpha (revid:alik@sun.com-20090211182317-uagkyj01fk30p1f8) (version source revid:klewis@mysql.com-20090114223344-og8t9pw03ckyexdw) (merge vers: 6.0.10-alpha) (pib:6)
[7 May 2009 16:57]
MC Brown
Internal/test fix. No changelog entry required.