Bug #45746 | falcon_backlog test fails due to Gopher hitting "record memory is exhausted" | ||
---|---|---|---|
Submitted: | 25 Jun 2009 11:25 | Modified: | 26 May 2010 17:48 |
Reporter: | Olav Sandstå | Email Updates: | |
Status: | Unsupported | Impact on me: | |
Category: | MySQL Server: Falcon storage engine | Severity: | S2 (Serious) |
Version: | 6.0.12-alpha | OS: | Any |
Assigned to: | Kevin Lewis | CPU Architecture: | Any |
Tags: | F_MEMORY |
[25 Jun 2009 11:25]
Olav Sandstå
[25 Jun 2009 11:26]
Olav Sandstå
Repeatable as described by running falcon_backlog.
[1 Jul 2009 3:39]
Kevin Lewis
This bug where the gopher thread hits "record memory is exhausted" can also happen when running falcon_bug_36294-big test as shown in pushbuild. See bug#45845 for other problems with that test case.
[8 Jul 2009 19:25]
Kevin Lewis
How can a gopher thread get status 305? Gopher threads are never expected to run out of record memory. By design, they build pages from the contents of the serial log and do not muck with the record cache at all.

Back on 5/19/2009, a change committed for Bug#44946 added a little function called Transaction::thawAll(). This is mucking with the record cache! Here was the reason explained in the patch: "In order to prevent 'unthawable' records, make sure all records attached to a transaction are thawed by the gopher thread when the transaction is fully complete. Do this by adding a new function; Transaction::thawAll()"

This nasty little function is called by the gopher thread: SerialLogTransaction::commit() -> Transaction::fullyCommitted() -> Transaction::thawAll(). Because it thaws every chilled record in the transaction after the transaction completes, it ruins the effect of chilling in the first place! Whoops! Huge insert transactions that are larger than the record cache can get committed, but then the gopher thread crashes with error 305 "record memory is exhausted" trying to thaw all those records.

This function is still needed, but it does not need to thaw ALL records in the transaction. Once a record has been put into the page it cannot be thawed from the serial log. And if it has already been superceded, there will soon be no way to thaw the record once that superceding record version gets completed into the data page. So only records that get superceded between commit time and complete time really need to be thawed from the serial log before their transaction is completed.

The record memory requirements of Transaction::thawAll() can be greatly reduced by checking whether the completed record has been superceded before thawing it. With this modification, the huge insert transactions in falcon_backlog and falcon_bug_36294-big.test may not actually have to thaw anything at complete time. But there is a complication to implementing this.
RecordVersion::superceded is not very reliable. Most records on a long chain of priorVersions will still have superceded=false. See the comment in Bug#44946 on [20 May 6:23] for an example. An analysis of where isSuperceded() is used reveals that what we really want to know is whether the record is the current base record, that is, the record at the top of the stack, the one currently attached to the record tree.

In order to make this information reliable, it needs to be set and unset by RecordLeaf::store() on the records that are being attached to or detached from that position. So the following patch converts RecordVersion::superceded to RecordVersion::base, along with their associated uses, and then adds if (record->isBase()) to Transaction::thawAll(). If a record is not the base record at complete time, it needs to be thawed while it can still be thawed from the serial log.

Falcon uses CAS to make that switch, and setting one or two booleans after that is inherently non-atomic. So how can this boolean be considered reliable? Because the CAS is done within a transaction. Only one thread works on behalf of a transaction at a time, and once a transaction adds a base record to the top of a record chain, no other transaction can add another. So you cannot get two base records chained up concurrently by overlapping code paths. If the CAS succeeds, the thread can set RecordVersion::base=true immediately afterward and trust that it keeps that value: the thread for that transaction is the only user of that boolean.
[8 Jul 2009 20:06]
Bugs System
A patch for this bug has been committed. After review, it may be pushed to the relevant source trees for release in the next version. You can access the patch from: http://lists.mysql.com/commits/78242

2753 Kevin Lewis 2009-07-08
Bug#45746 - Gopher thread gets error 305. Transaction::thawAll was added for Bug#44946, but it was too intrusive: it required a full thaw of all records in the transaction. This patch limits thawAll to only those records that have already been superceded by a newer change at 'complete time' when thawAll is called. Also, RecordVersion::superceded is converted to RecordVersion::base and made more reliable by updating it in RecordLeaf::store.

@ storage/falcon/Record.cpp
  Bug#45746 - Convert 'superceded' to 'base'. The value of 'base' does not matter for a Record object since it is not associated with any transaction.
@ storage/falcon/Record.h
  Bug#45746 - Convert 'superceded' to 'base'.
@ storage/falcon/RecordLeaf.cpp
  Bug#45746 - Set RecordVersion::base in RecordLeaf::store() so that it reliably reflects whether a record is currently the base record.
@ storage/falcon/RecordVersion.cpp
  Bug#45746 - Convert 'superceded' to 'base'. Delete old places where superceded was set, since 'base' is set in RecordLeaf::store().
@ storage/falcon/RecordVersion.h
  Bug#45746 - Convert 'superceded' to 'base'.
@ storage/falcon/Table.cpp
  Bug#45746 - Convert 'superceded' to 'base'. Delete old places where superceded was set, since 'base' is set in RecordLeaf::store().
@ storage/falcon/Transaction.cpp
  Bug#45746 - Convert 'superceded' to 'base'. Use RecordVersion::isBase() to limit the number of records thawed at complete time.
[14 Jul 2009 15:08]
Bugs System
A patch for this bug has been committed. After review, it may be pushed to the relevant source trees for release in the next version. You can access the patch from: http://lists.mysql.com/commits/78652

2755 Kevin Lewis 2009-07-14
Bug#45746 - Gopher thread gets error 305. The previous patch for this bug made it highly unlikely for the gopher thread to get error 305-out-of-record-cache, but it is still possible since thawAll() may still need to muck with the record cache. So this patch makes the gopher thread retry the attempt to allocate a record up to 10 more times. Client threads that also get this error will not retry, so they will probably roll back and free some memory. In thawAll(), the gopher will also give up if the transaction gets too old while this is happening. Then after 10 tries, which should take more than 22.5 seconds, the gopher will just give up trying to thaw that record, which may never be needed anyway.

If the record is needed, the client thread must be able to recognize an unthawable record. If thawAll() does not thaw a record, it remains thawable from the serial log until the serial log toggles and the virtual offset is no longer around. So this patch uses two new indicators to identify an unthawable record. The first is a new state called recBuried, which indicates that the record version has been replaced in the page. This is now set in thawAll(), during transaction completion. Then later, if a 'buried' record cannot be thawed from the serial log, RecordVersion::thaw() does not attempt to thaw it from the page; it just marks it as unthawable by setting Record::data.record = recordDataIsFreed. The client thread in fetchVersion() must then throw an error indicating that a visible record cannot be thawed.
@ storage/falcon/Record.cpp
  Bug#45746 - Account for the new state recBuried.
@ storage/falcon/Record.h
  Bug#45746 - New state recBuried.
@ storage/falcon/RecordVersion.cpp
  Bug#45746 - fetchVersion() should recognize an unthawable record; it throws an internal error with an appropriate message. RecordVersion::thaw() now avoids thawing a record from the data page that has been superceded.
@ storage/falcon/Table.cpp
  Code cleanup.
@ storage/falcon/Transaction.cpp
  Bug#45746 - thawAll() is enhanced to attempt to thaw a record up to 10 times if the error is 305-out-of-record-memory. It will also quit early if the transaction becomes too old. If all 10 attempts fail, it will just skip that record, hoping that it is not actually needed before the serial log toggles and it becomes unthawable. thawAll() also now marks prior records that have been superceded by these completed records, i.e. replaced in the data page, as recBuried.
[14 Jul 2009 20:11]
Kevin Lewis
Code Review in email and on the phone
[16 Jul 2009 12:39]
John Embretsen
Looking at current (since 2009-07-13) PB2 results for the falcon_backlog test: the test reports success, but in the log we find lots of "record memory is exhausted" error messages, like this one: # 02:24:26 Query: UPDATE `table65536_falcon_int_autoinc` AS X SET `pk` = 'hobinrwybizyppgwvpmwaodbmgjvnlcoclwwbxpggiorbbukjnjryagyjchtudhaifiukvdibhtpffdzxkzflnfwzlhvvyomkpizeglchdonjcxqgevbzwwrcgdafrbuqrdcncucquecwtmoxozkbk' WHERE `pk` = 14694 failed: 1296 Got error 305 'record memory is exhausted' from Falcon Is this as expected and acceptable? (I may have missed something in the discussions relating to this bug...) Although certain statements fail due to lack of memory, it seems that the client is able to continue until the end of the test, and the forced recovery succeeds.
[16 Jul 2009 13:24]
Kevin Lewis
This bug and its patch were about keeping the gopher thread from suffering 305-out-of-record-cache-memory errors. It seems this is working: the 305 errors are now being reported by the client threads. A future patch which I am working on will re-enable backlogging, which should also allow this test to complete without 305 errors. See Bug#45845.