Bug #44161 Falcon crashes in MemMgr::blockSize(data.record)
Submitted: 8 Apr 2009 17:01 Modified: 15 May 2009 14:47
Reporter: Kevin Lewis Email Updates:
Status: Closed
Category:Server: Falcon Severity:S3 (Non-critical)
Version:6.0.10-team OS:Any
Assigned to: Kevin Lewis Target Version:6.0.11
Tags: F_CHILL THAW
Triage: Triaged: D1 (Critical)

[8 Apr 2009 17:01] Kevin Lewis
Description:
Random Query Generator Crashed here;

int Record::getDataMemUsage(void)
{
	return (data.record == NULL ? 0 : MemMgr::blockSize(data.record));
}

with this call stack;

MemMgr::blockSize (object=0xeeeeeeee) at MemMgr.cpp:1238
Record::getDataMemUsage (this=0xa5d9b308) at Record.cpp:1006
Record::getMemUsage (this=0xa5d9b308) at Record.cpp:1012
RecordLeaf::retireRecords () at RecordLeaf.cpp:208
RecordGroup::retireRecords () at RecordGroup.cpp:135
RecordGroup::retireRecords () at RecordGroup.cpp:135
Table::retireRecords () at Table.cpp:1981
Database::retireRecords () at Database.cpp:1967
Database::scavengeRecords (, forced=true)at Database.cpp:1881
Database::scavenge (, forced=true) at Database.cpp:1811

http://loki41.norway.sun.com/~bteam/pb2/web_loki41.py?action=archive_download&archive_id=4...

This means that a record is being chilled at the exact instant it is being retired.  This
is a hard to reproduce timing event.  But it shows that there is not enough protection
during chilling.

How to repeat:
runall.pl \ 
--basedir=/use/your/own/basedir \ 
--mysqld=--loose-innodb-lock-wait-timeout=5 \ 
--mysqld=--table-lock-wait-timeout=5 \ 
--mysqld=--loose-falcon-lock-wait-timeout=5 \ 
--mysqld=--loose-falcon-debug-mask=2 \ 
--mysqld=--skip-safemalloc \ 
--grammar=conf/falcon_backlog.yy \ 
--gendata=conf/falcon_backlog.zz \ 
--mysqld=--transaction-isolation=REPEATABLE-READ \ 
--mysqld=--falcon-record-memory-max=10M \ 
--mysqld=--falcon-record-chill-threshold=1K \ 
--mysqld=--falcon-page-cache-size=128M \ 
--reporters=Deadlock,ErrorLog,Backtrace,Recovery,WinPackage,Shutdown \ 
--duration=1200 \ 
--queries=100000 \ 
--vardir=/use/your/own/vardirs \ 
--mysqld=--log-output=file \ 
--engine=falcon

Suggested fix:
Not known yet.
[9 Apr 2009 15:41] Bugs System
A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/71801

3115 Kevin Lewis	2009-04-09
      Bug#44161 & Bug#43145 - These two bugs are two different things that can happen
when a data buffer suddenly disapears after being chilled to the serial log while another
thread is following that pointer.  To prevent this, data.record pointers within Record
objects are now sent to purgatory for the CycleManager to release instead of being freed
immediately.  
      
      Two new purgatory lists are added to the CycleManager since the data buffer can
come in two forms; an array of Value objects and a char string.  Each object in purgatory
is now called a zombie.  Record::deleteData is expanded to delete immediately when called
from the Record destructor, or queueForDelete when called from anywhere else, such as
SRLUpdateRecords::chill or a failed Table::update.
[14 Apr 2009 21:03] Bugs System
A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/72067

3123 Kevin Lewis	2009-04-14
      Bug#44161 - Made a mistake in the previous patch for Record::deleteData by putting
data.record = NULL before the calls to DELETE_RECORD and queueForDelete.
[15 Apr 2009 16:59] Bugs System
Pushed into 6.0.11-alpha (revid:hky@sun.com-20090415164923-9arx29ye5pzxd4zf) (version
source revid:hky@sun.com-20090414213212-ibcjs12ic1v2h0e5) (merge vers: 6.0.11-alpha)
(pib:6)
[15 May 2009 14:47] MC Brown
Internal/test fix. No changelog entry required.