MySQL Bugs: #44161: Falcon crashes in MemMgr::blockSize(data.record)

Bug #44161	Falcon crashes in MemMgr::blockSize(data.record)
Submitted:	8 Apr 2009 17:01	Modified:	15 May 2009 14:47
Reporter:	Kevin Lewis	Email Updates:
Status:	Closed	Impact on me:	None
Category:	MySQL Server: Falcon storage engine	Severity:	S3 (Non-critical)
Version:	6.0.10-team	OS:	Any
Assigned to:	Kevin Lewis	CPU Architecture:	Any
Tags:	F_CHILL THAW

Description:
Random Query Generator Crashed here;

int Record::getDataMemUsage(void)
{
	return (data.record == NULL ? 0 : MemMgr::blockSize(data.record));
}

with this call stack;

MemMgr::blockSize (object=0xeeeeeeee) at MemMgr.cpp:1238
Record::getDataMemUsage (this=0xa5d9b308) at Record.cpp:1006
Record::getMemUsage (this=0xa5d9b308) at Record.cpp:1012
RecordLeaf::retireRecords () at RecordLeaf.cpp:208
RecordGroup::retireRecords () at RecordGroup.cpp:135
RecordGroup::retireRecords () at RecordGroup.cpp:135
Table::retireRecords () at Table.cpp:1981
Database::retireRecords () at Database.cpp:1967
Database::scavengeRecords (, forced=true)at Database.cpp:1881
Database::scavenge (, forced=true) at Database.cpp:1811

http://loki41.norway.sun.com/~bteam/pb2/web_loki41.py?action=archive_download&archive_id=4...

This means that a record is being chilled at the exact instant it is being retired.  This is a hard to reproduce timing event.  But it shows that there is not enough protection during chilling.

How to repeat:
runall.pl \ 
--basedir=/use/your/own/basedir \ 
--mysqld=--loose-innodb-lock-wait-timeout=5 \ 
--mysqld=--table-lock-wait-timeout=5 \ 
--mysqld=--loose-falcon-lock-wait-timeout=5 \ 
--mysqld=--loose-falcon-debug-mask=2 \ 
--mysqld=--skip-safemalloc \ 
--grammar=conf/falcon_backlog.yy \ 
--gendata=conf/falcon_backlog.zz \ 
--mysqld=--transaction-isolation=REPEATABLE-READ \ 
--mysqld=--falcon-record-memory-max=10M \ 
--mysqld=--falcon-record-chill-threshold=1K \ 
--mysqld=--falcon-page-cache-size=128M \ 
--reporters=Deadlock,ErrorLog,Backtrace,Recovery,WinPackage,Shutdown \ 
--duration=1200 \ 
--queries=100000 \ 
--vardir=/use/your/own/vardirs \ 
--mysqld=--log-output=file \ 
--engine=falcon

Suggested fix:
Not known yet.

A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/71801

3115 Kevin Lewis	2009-04-09
      Bug#44161 & Bug#43145 - These two bugs are two different things that can happen when a data buffer suddenly disapears after being chilled to the serial log while another thread is following that pointer.  To prevent this, data.record pointers within Record objects are now sent to purgatory for the CycleManager to release instead of being freed immediately.  
      
      Two new purgatory lists are added to the CycleManager since the data buffer can come in two forms; an array of Value objects and a char string.  Each object in purgatory is now called a zombie.  Record::deleteData is expanded to delete immediately when called from the Record destructor, or queueForDelete when called from anywhere else, such as SRLUpdateRecords::chill or a failed Table::update.

A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/72067

3123 Kevin Lewis	2009-04-14
      Bug#44161 - Made a mistake in the previous patch for Record::deleteData by putting data.record = NULL before the calls to DELETE_RECORD and queueForDelete.

Pushed into 6.0.11-alpha (revid:hky@sun.com-20090415164923-9arx29ye5pzxd4zf) (version source revid:hky@sun.com-20090414213212-ibcjs12ic1v2h0e5) (merge vers: 6.0.11-alpha) (pib:6)

Internal/test fix. No changelog entry required.