Bug #42185 Falcon crashes during large transaction, assert(source->useCount >= 2)
Submitted: 18 Jan 2009 6:49 Modified: 15 May 2009 13:40
Reporter: Mark Callaghan Email Updates:
Status: Closed Impact on me: None
Category:MySQL Server: Falcon storage engine Severity:S3 (Non-critical)
Version:6.0.8 OS:Any
Assigned to: Kevin Lewis
Tags: crash, delete, F_RECORD TREE
Triage: Triaged: D1 (Critical)

[18 Jan 2009 6:49] Mark Callaghan
Description:
I run the server with default values for falcon except:
falcon_record_memory_max=500M
falcon_page_cache_size=500M

Put 2M rows into a table that looks like this:
| T1    | CREATE TABLE `T1` (
  `i` int(11) NOT NULL,
  `j` int(11) DEFAULT NULL,
  PRIMARY KEY (`i`)
) ENGINE=Falcon DEFAULT CHARSET=latin1 |

Then do 'delete from T1' and get this crash:

[Falcon] Error: assertion (record->useCount >= 2) failed at line 3573 in file Table.cpp

/usr/sbin/mysqld(my_print_stacktrace+0x24) [0x902dd4]
/usr/sbin/mysqld(handle_segfault+0x340) [0x4abe50]
/usr/grte/v1/lib64/libpthread.so.0 [0x2b876204f550]
/usr/grte/v1/lib64/libpthread.so.0(raise+0x2b) [0x2b876204f43b]
/usr/sbin/mysqld(Error::error(char const*, ...)+0xea) [0x68485a]
/usr/sbin/mysqld(Table::fetchForUpdate(Transaction*, Record*, bool)+0x309) [0x69d369]
/usr/sbin/mysqld(StorageDatabase::nextRow(StorageTable*, int, bool)+0x51) [0x68a6b1]
/usr/sbin/mysqld(StorageInterface::rnd_next(unsigned char*)+0x45) [0x6811c5]
/usr/sbin/mysqld(rr_sequential(READ_RECORD*)+0x15) [0x57fa55]
/usr/sbin/mysqld(mysql_delete(THD*, TABLE_LIST*, Item*, st_sql_list*, unsigned long long, unsigned long long, bool)+0x73f) [0x539b6f]
/usr/sbin/mysqld(mysql_execute_command(THD*)+0x355c) [0x4bc2bc]
/usr/sbin/mysqld(mysql_parse(THD*, char const*, unsigned int, char const**)+0x1f4) [0x4bdce4]
/usr/sbin/mysqld(dispatch_command(enum_server_command, THD*, char*, unsigned int)+0x8d2) [0x4be5d2]
/usr/sbin/mysqld(do_command(THD*)+0xca) [0x4bf11a]
/usr/sbin/mysqld(handle_one_connection+0xc7) [0x4b3247]
/usr/grte/v1/lib64/libpthread.so.0 [0x2b876204909a]
/usr/grte/v1/lib64/libc.so.6(clone+0x72) [0x2b8762eca3d2]
Trying to get some variables.
Some pointers may be invalid and cause the dump to abort...
thd->query at 0x1987db8 = delete from T1
thd->thread_id=1
thd->killed=NOT_KILLED

How to repeat:
see above
[18 Jan 2009 20:37] Shane Bester
I could not repeat the crash with this testcase.

Attachment: bug42185.sql (text/x-sql), 609 bytes.

[19 Jan 2009 14:19] Miguel Solorzano
I couldn't repeat the crash; instead I got the error below:

mysql> delete from T1;
ERROR 1296 (HY000): Got error 305 'record memory is exhausted' from Falcon
mysql> 

Note that I used a fresh install. In the past I have seen Falcon crashes when reusing an installation that had already crashed during other tests.
[19 Jan 2009 17:13] Mark Callaghan
The database files are new with 6.0.8. I have not been able to reproduce this. There was a different crash prior to this -- http://bugs.mysql.com/?id=42184
[3 Feb 2009 12:17] Miguel Solorzano
Thank you for the feedback. So with a non-crashed database you weren't able to repeat? Thanks in advance.
[6 Feb 2009 23:21] Mark Callaghan
This error occurred once. I have not tried to reproduce it.
[26 Feb 2009 7:12] Kevin Lewis
I can reproduce this crash by running RQG with this script;

perl runall.pl 
--grammar=conf/repeatable_read.yy 
--gendata=conf/transactions.zz 
--threads=4 
--queries=500000 
--duration=2400 
--validator=RepeatableRead 
--basedir=C:\Work\bzr\Chg-10\mysql-6.0-falcon-team 
--engine=Falcon

at the same time that I run this from a mysql client;

create table doubleme (a char(100)) engine = falcon;
insert into doubleme values(repeat('a',95));
insert into doubleme values(repeat('a',95));
insert into doubleme values(repeat('a',95));
insert into doubleme (select * from doubleme);
insert into doubleme (select * from doubleme);
insert into doubleme (select * from doubleme);
insert into doubleme (select * from doubleme);
insert into doubleme (select * from doubleme);
insert into doubleme (select * from doubleme);
insert into doubleme (select * from doubleme);
insert into doubleme (select * from doubleme);
insert into doubleme (select * from doubleme);
insert into doubleme (select * from doubleme);
insert into doubleme (select * from doubleme);
insert into doubleme (select * from doubleme);
insert into doubleme (select * from doubleme);
insert into doubleme (select * from doubleme);
insert into doubleme (select * from doubleme);
insert into doubleme (select * from doubleme);
insert into doubleme (select * from doubleme);
insert into doubleme (select * from doubleme);
create table copyme as select * from doubleme;

The final CREATE of copyme sends the scavenger and/or backlogger into overtime, and a lockRecord is deleted immediately after a successful call to Table::insertIntoTree at this code:

if (insertIntoTree(lockRecord, source, recordNumber))
	{
	transaction->addRecord(lockRecord);
	lockRecord->release(REC_HISTORY);

	ASSERT(source->useCount >= 2);

	return source;
	}
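Each `insert into doubleme (select * from doubleme)` in the script above doubles the row count, so the 19 self-inserts turn 3 seed rows into 3·2^19 = 1,572,864 rows of ~95 bytes each, roughly 150 MB of record data — enough to keep the scavenger busy. A minimal sketch of that arithmetic (the helper name is illustrative, not part of the repro):

```cpp
#include <cstdint>

// Rows in `doubleme` after `seed_rows` single-row inserts followed by
// `doublings` self-inserts, each of which doubles the table.
std::uint64_t doubleme_rows(std::uint64_t seed_rows, unsigned doublings) {
    std::uint64_t rows = seed_rows;
    for (unsigned i = 0; i < doublings; ++i)
        rows *= 2;  // INSERT INTO doubleme (SELECT * FROM doubleme)
    return rows;
}
```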
[26 Feb 2009 7:24] Kevin Lewis
A background process removed the lock record from the record tree and somehow cleared it just after it was successfully inserted.  Is this a backlog or scavenge issue?  Looking for the 'other guy'.
[24 Mar 2009 0:37] Hakan Küçükyılmaz
I could reproduce the assertion with DBT2 on walldorf three times:
[Falcon] Error: assertion (source->useCount >= 2) failed at line 3558 in file Table.cpp

090322  6:18:26 - mysqld got signal 6 ;
This could be because you hit a bug. It is also possible that this binary
or one of the libraries it was linked against is corrupt, improperly built,
or misconfigured. This error can also be caused by malfunctioning hardware.
We will try our best to scrape up some info that will hopefully help diagnose
the problem, but since we have already crashed, something is definitely wrong
and this may fail.

key_buffer_size=8388608
read_buffer_size=1048576
max_used_connections=64
max_threads=1601
thread_count=64
connection_count=64
It is possible that mysqld could use up to 
key_buffer_size + (read_buffer_size + sort_buffer_size)*max_threads = 4943687 K
bytes of memory
Hope that's ok; if not, decrease some variables in the equation.

thd: 0x1fde33f0
Attempting backtrace. You can use the following information to find out
where mysqld died. If you see no messages after this, something went
terribly wrong...
stack_bottom = 0x4e0660e8 thread_stack 0x40000
/data0/work/mysql/mysql-6.0-falcon/sql/mysqld(my_print_stacktrace+0x2e) [0xabfe3e]
/data0/work/mysql/mysql-6.0-falcon/sql/mysqld(handle_segfault+0x336) [0x65cc36]
/lib64/libpthread.so.0 [0x372ae0de60]
/lib64/libpthread.so.0(raise+0x2d) [0x372ae0dd3d]
/data0/work/mysql/mysql-6.0-falcon/sql/mysqld(Error::error(char const*, ...)+0xf9) [0x8684d9]
/data0/work/mysql/mysql-6.0-falcon/sql/mysqld(Table::fetchForUpdate(Transaction*, Record*, bool)+0x31d) [0x88115d]
/data0/work/mysql/mysql-6.0-falcon/sql/mysqld(StorageDatabase::nextIndexed(StorageTable*, void*, int, bool)+0x64) [0x86f654]
/data0/work/mysql/mysql-6.0-falcon/sql/mysqld(StorageInterface::index_next(unsigned char*)+0xb4) [0x863d34]
/data0/work/mysql/mysql-6.0-falcon/sql/mysqld(StorageInterface::read_range_first(st_key_range const*, st_key_range const*, bool, bool)+0x74) [0x861c64]
/data0/work/mysql/mysql-6.0-falcon/sql/mysqld(handler::multi_range_read_next(char**)+0x92) [0x768112]
/data0/work/mysql/mysql-6.0-falcon/sql/mysqld(QUICK_RANGE_SELECT::get_next()+0x36) [0x74fe76]
/data0/work/mysql/mysql-6.0-falcon/sql/mysqld [0x764a0d]
/data0/work/mysql/mysql-6.0-falcon/sql/mysqld(mysql_update(THD*, TABLE_LIST*, List<Item>&, List<Item>&, Item*, unsigned int, st_order*, unsigned long long, enum_duplicates, bool)+0x7d0) [0x70c660]
/data0/work/mysql/mysql-6.0-falcon/sql/mysqld(mysql_execute_command(THD*)+0x3fdc) [0x670bac]
/data0/work/mysql/mysql-6.0-falcon/sql/mysqld(mysql_parse(THD*, char const*, unsigned int, char const**)+0x23c) [0x672d4c]
/data0/work/mysql/mysql-6.0-falcon/sql/mysqld(dispatch_command(enum_server_command, THD*, char*, unsigned int)+0x98d) [0x673afd]
/data0/work/mysql/mysql-6.0-falcon/sql/mysqld(do_command(THD*)+0xc4) [0x674684]
/data0/work/mysql/mysql-6.0-falcon/sql/mysqld(handle_one_connection+0xf4) [0x666174]
/lib64/libpthread.so.0 [0x372ae062e7]
[30 Mar 2009 21:59] Kevin Lewis
Philip reported that this problem cannot be repeated after the changes to create a
TransactionState object and the addition of the CycleManager to protect doomed records until after all temporary stack pointers to them have gone away.
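The CycleManager change follows a deferred-reclamation pattern: a doomed record is not freed at retire time but queued until the current cycle ends, by which point no thread can still hold a temporary stack pointer obtained during that cycle. A much-simplified sketch of the idea (names and structure are illustrative, not Falcon's actual CycleManager):

```cpp
#include <cstddef>
#include <vector>

struct Record { int data; };

// Simplified cycle manager: records doomed during a cycle are queued and
// only reclaimed when the cycle ends, i.e. once every reader thread has
// passed a quiescent point and can no longer hold a raw pointer to them.
class CycleManager {
    std::vector<Record*> doomed_;
public:
    void queueForDelete(Record* rec) { doomed_.push_back(rec); }

    // Called after all threads have passed a quiescent point;
    // returns how many doomed records were reclaimed.
    std::size_t endCycle() {
        std::size_t reclaimed = doomed_.size();
        for (Record* r : doomed_)
            delete r;
        doomed_.clear();
        return reclaimed;
    }
};
```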
[2 Apr 2009 17:49] Kevin Lewis
Hakan has reproduced this failure in DBT2;

[Falcon] Error: assertion (source->useCount >= 2) failed at line 3580 in
file Table.cpp

Error::error(char const*, ...)
Table::fetchForUpdate(Transaction*, Record*, bool)
StorageDatabase::nextIndexed(StorageTable*, void*, int, bool)
StorageInterface::index_next(unsigned char*)
StorageInterface::read_range_first(st_key_range const*, st_key_range const*, bool, bool)
handler::multi_range_read_next(char**)
QUICK_RANGE_SELECT::get_next()
mysql_update(THD*, TABLE_LIST*, List<Item>&, List<Item>&, Item*,
unsigned int, st_order*, unsigned long long, enum_duplicates,
bool)
mysql_execute_command(THD*)

The error that occurs is here;

Record* Table::fetchForUpdate(Transaction* transaction, Record* source, bool usingIndex)
{
. . .
	case CommittedVisible:
		{
. . .
		if (insertIntoTree(lockRecord, source, recordNumber))
			{
			transaction->addRecord(lockRecord);
			lockRecord->release(REC_HISTORY);
			ASSERT(source->useCount >= 2);
. . .

The call to insertIntoTree has just succeeded in adding the lockRecord to the record tree slot.  So it has a priorRecord pointer to the previous base record, which is committed and visible.  The assert assumes that the useCount should be at least 2.  One count is held by lockRecord->priorRecord and another should be held by the calling function, which found the record with Table::fetch().  So why is this useCount < 2?
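The accounting the assert relies on can be modeled with a minimal use-count sketch (illustrative types, not Falcon's Record class): the caller holds one count from Table::fetch(), and linking the lockRecord's priorRecord adds a second. Any concurrent release during that window drops the count below the asserted bound.

```cpp
#include <atomic>

// Minimal model of the use-count accounting described above.
// After a successful insertIntoTree, the base record should be pinned
// at least twice: once by lockRecord->priorRecord and once by the
// caller that found it via Table::fetch().
struct Record {
    std::atomic<int> useCount{0};
    void addRef()  { useCount.fetch_add(1); }
    void release() { useCount.fetch_sub(1); }
};

struct LockRecord {
    Record* priorRecord = nullptr;
    void setPrior(Record* source) {
        source->addRef();        // priorRecord pins the base record
        priorRecord = source;
    }
};
```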
[7 Apr 2009 16:51] Kevin Lewis
A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/71557

3105 Kevin Lewis	2009-04-07

  Bug#41661 - Table::insertIntoTree could fail during a Table::insert.  This should not happen since Dbb::insertStub should have returned a record number that was free.  Nobody else should have been messing with that slot in the RecordLeaf.
      
  The problem was in RecordLeaf::retireRecords.  This function called RecordVersion::retire, which then called expungeRecord, making the record number free for any other thread to reuse.  It did this with an exclusive lock on the RecordLeaf, but Table::insertIntoTree often does not take any lock on the RecordLeaf when it calls RecordLeaf::store.  So a free record number could be reused in Table::insert before the slot had been set to NULL by RecordLeaf::retireRecords.  That is why Table::insertIntoTree fails: it expects the slot to be NULL.
      
  Another problem fixed is that the first of the two Table::insert functions did not check the return value of insertIntoTree at all.  This would have propagated the error further since there would be nothing in the RecordTree slot after the insert.
      
  Another problem fixed in RecordLeaf::retireRecords is with this statement;
    if (record && recordScavenge->canBeRetired(record))
  The problem is with incrementing the 'count' integer.  If it is zero after stepping through the RecordLeaf, then it is assumed that the RecordLeaf is empty and can be retired.  But this 'if' statement gets the wrong result.  If the RecordLeaf is full of newer records that cannot yet be retired, then this count will remain zero.  The emptySections bitmap will be set for this RecordLeaf. And later, in Table::retireRecords, it will attempt to
retire these sections.  But finding records in them, it will just waste a lot of time while blocking all table access.  
      
  I changed this code to be accurate in determining whether the RecordLeaf is indeed empty before setting the bit in Table::emptySections.
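The emptiness test can be sketched as follows (illustrative types): counting only the records retired during the pass is wrong, because a leaf full of records that cannot yet be retired also yields a count of zero and would be marked empty. The accurate test is whether any slot is still occupied after the retire pass.

```cpp
#include <vector>

struct Record { bool retirable; };

// Accurate emptiness check for a record leaf: a leaf is empty only if
// no slot is still occupied after the retire pass.  A counter of
// retired records cannot distinguish "nothing left" from "nothing
// could be retired".
bool leafIsEmpty(const std::vector<Record*>& slots) {
    for (const Record* r : slots)
        if (r != nullptr)
            return false;   // something still lives in this leaf
    return true;
}
```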
      
  Record::retire and RecordVersion::retire now only retire records.  The more thorough tests for whether a record can be retired are now collected into RecordScavenge::canBeRetired.
      
  Also, instead of just setting the slot in RecordLeaf to NULL directly, I now use a COMPARE_EXCHANGE_POINTER() just like every other place where this location is changed.  Then if it fails for any reason, we will not retire that record.  One legitimate reason that this could fail is when this record is determined that it canBeRetired, but immediately after that, a client thread does a fetchForUpdate, adding a lockRecord.
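The compare-exchange retire described above can be sketched with std::atomic standing in for Falcon's COMPARE_EXCHANGE_POINTER() macro (the function name is illustrative): the scavenger clears the slot only if it still holds the record it decided to retire; if a client thread has swapped in a lockRecord in the meantime, the exchange fails and the record is left alone.

```cpp
#include <atomic>

struct Record { int id; };

// Scavenger-side retire: atomically clear the slot only if it still
// points at the record we decided to retire.  If a client thread has
// replaced it (e.g. with a lockRecord from fetchForUpdate), the
// exchange fails and we do not retire the record.
bool tryRetire(std::atomic<Record*>& slot, Record* expected) {
    Record* want = expected;
    return slot.compare_exchange_strong(want, nullptr);
}
```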
[7 Apr 2009 17:27] Bugs System
A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/71567

3106 Kevin Lewis	2009-04-07
      Bug#42185 - After further consideration, I believe this is an invalid assert.  If two or more client threads read the same base record at once and one of them is able to lock it first, the others will still have useCounts on that source record.
[15 Apr 2009 16:59] Bugs System
Pushed into 6.0.11-alpha (revid:hky@sun.com-20090415164923-9arx29ye5pzxd4zf) (version source revid:vvaintroub@mysql.com-20090407182504-bf7g93lm5tdsmnyh) (merge vers: 6.0.11-alpha) (pib:6)
[15 May 2009 13:40] MC Brown
A note has been added to the 6.0.11 changelog: 

The Falcon CycleManager has been updated, which addresses a number of issues when examining records in various transaction states and their visibility/isolation in relation to other threads.