Bug #44161 Falcon crashes in MemMgr::blockSize(data.record)
Submitted: 8 Apr 2009 17:01 Modified: 15 May 2009 14:47
Reporter: Kevin Lewis Email Updates:
Status: Closed Impact on me:
None 
Category:MySQL Server: Falcon storage engine Severity:S3 (Non-critical)
Version:6.0.10-team OS:Any
Assigned to: Kevin Lewis CPU Architecture:Any
Tags: F_CHILL THAW
Triage: Triaged: D1 (Critical)

[8 Apr 2009 17:01] Kevin Lewis
Description:
Random Query Generator Crashed here;

int Record::getDataMemUsage(void)
{
	return (data.record == NULL ? 0 : MemMgr::blockSize(data.record));
}

with this call stack;

MemMgr::blockSize (object=0xeeeeeeee) at MemMgr.cpp:1238
Record::getDataMemUsage (this=0xa5d9b308) at Record.cpp:1006
Record::getMemUsage (this=0xa5d9b308) at Record.cpp:1012
RecordLeaf::retireRecords () at RecordLeaf.cpp:208
RecordGroup::retireRecords () at RecordGroup.cpp:135
RecordGroup::retireRecords () at RecordGroup.cpp:135
Table::retireRecords () at Table.cpp:1981
Database::retireRecords () at Database.cpp:1967
Database::scavengeRecords (, forced=true)at Database.cpp:1881
Database::scavenge (, forced=true) at Database.cpp:1811

http://loki41.norway.sun.com/~bteam/pb2/web_loki41.py?action=archive_download&archive_id=4...

This means that a record is being chilled at the exact instant it is being retired.  This is a hard to reproduce timing event.  But it shows that there is not enough protection during chilling.

How to repeat:
runall.pl \ 
--basedir=/use/your/own/basedir \ 
--mysqld=--loose-innodb-lock-wait-timeout=5 \ 
--mysqld=--table-lock-wait-timeout=5 \ 
--mysqld=--loose-falcon-lock-wait-timeout=5 \ 
--mysqld=--loose-falcon-debug-mask=2 \ 
--mysqld=--skip-safemalloc \ 
--grammar=conf/falcon_backlog.yy \ 
--gendata=conf/falcon_backlog.zz \ 
--mysqld=--transaction-isolation=REPEATABLE-READ \ 
--mysqld=--falcon-record-memory-max=10M \ 
--mysqld=--falcon-record-chill-threshold=1K \ 
--mysqld=--falcon-page-cache-size=128M \ 
--reporters=Deadlock,ErrorLog,Backtrace,Recovery,WinPackage,Shutdown \ 
--duration=1200 \ 
--queries=100000 \ 
--vardir=/use/your/own/vardirs \ 
--mysqld=--log-output=file \ 
--engine=falcon

Suggested fix:
Not known yet.
[9 Apr 2009 15:41] Bugs System
A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/71801

3115 Kevin Lewis	2009-04-09
      Bug#44161 & Bug#43145 - These two bugs are two different things that can happen when a data buffer suddenly disapears after being chilled to the serial log while another thread is following that pointer.  To prevent this, data.record pointers within Record objects are now sent to purgatory for the CycleManager to release instead of being freed immediately.  
      
      Two new purgatory lists are added to the CycleManager since the data buffer can come in two forms; an array of Value objects and a char string.  Each object in purgatory is now called a zombie.  Record::deleteData is expanded to delete immediately when called from the Record destructor, or queueForDelete when called from anywhere else, such as SRLUpdateRecords::chill or a failed Table::update.
[14 Apr 2009 21:03] Bugs System
A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/72067

3123 Kevin Lewis	2009-04-14
      Bug#44161 - Made a mistake in the previous patch for Record::deleteData by putting data.record = NULL before the calls to DELETE_RECORD and queueForDelete.
[15 Apr 2009 16:59] Bugs System
Pushed into 6.0.11-alpha (revid:hky@sun.com-20090415164923-9arx29ye5pzxd4zf) (version source revid:hky@sun.com-20090414213212-ibcjs12ic1v2h0e5) (merge vers: 6.0.11-alpha) (pib:6)
[15 May 2009 14:47] MC Brown
Internal/test fix. No changelog entry required.