MySQL Bugs: #41661: Falcon assertion (ret) failed at line 3052 in file Table.cpp

Bug #41661	Falcon assertion (ret) failed at line 3052 in file Table.cpp
Submitted:	20 Dec 2008 22:04	Modified:	15 May 2009 13:40
Reporter:	Philip Stoev	Email Updates:
Status:	Closed	Impact on me:	None
Category:	MySQL Server: Falcon storage engine	Severity:	S1 (Critical)
Version:	6.0-falcon-team	OS:	Any
Assigned to:	Kevin Lewis	CPU Architecture:	Any
Tags:	F_RECORD TREE

Description:
When executing a transactional workload with the Random Query Generator, Falcon asserted as follows:

#7  0x00000000009ccc47 in Error::assertionFailed (text=0xe04267 "ret", fileName=0xe03145 "Table.cpp", line=3052) at Error.cpp:78
#8  0x000000000097d561 in Table::insert (this=0x7fbcfd3775f8, transaction=0x7fbcfd420bd8, stream=0x7fbcfd3aaa68) at Table.cpp:3052
#9  0x0000000000961419 in StorageDatabase::insert (this=0x7fbcfd028210, connection=0x7fbcfd083de0, table=0x7fbcfd3775f8, stream=0x7fbcfd3aaa68)
    at StorageDatabase.cpp:266
#10 0x00000000009685c3 in StorageTable::insert (this=0x7fbcfd3a54c8) at StorageTable.cpp:109
#11 0x000000000095a999 in StorageInterface::write_row (this=0x258dcf0, buff=0x258dfb0 "Ыэ") at ha_falcon.cpp:1132
#12 0x00000000008005e7 in handler::ha_write_row (this=0x258dcf0, buf=0x258dfb0 "Ыэ") at handler.cc:5369
#13 0x0000000000768f46 in write_record (thd=0x2546898, table=0x258d408, info=0x26032d0) at sql_insert.cc:1609
#14 0x0000000000769309 in select_insert::send_data (this=0x2603298, values=@0x7fbcf89ce358) at sql_insert.cc:3165
#15 0x0000000000732826 in end_send (join=0x7fbcf89cc940, join_tab=0x7fbcf89d1060, end_of_records=false) at sql_select.cc:14754
#16 0x000000000073e1dc in evaluate_join_record (join=0x7fbcf89cc940, join_tab=0x7fbcf89d0dc0, error=0) at sql_select.cc:13981
#17 0x000000000073e46e in sub_select (join=0x7fbcf89cc940, join_tab=0x7fbcf89d0dc0, end_of_records=false) at sql_select.cc:13733
#18 0x000000000074d27f in do_select (join=0x7fbcf89cc940, fields=0x7fbcf89ce358, table=0x0, procedure=0x0) at sql_select.cc:13479
#19 0x0000000000762341 in JOIN::exec (this=0x7fbcf89cc940) at sql_select.cc:2845
#20 0x000000000075d08c in mysql_select (thd=0x2546898, rref_pointer_array=0x2548a68, tables=0x2602950, wild_num=0, fields=@0x2548988, conds=0x2603098,
    og_num=0, order=0x0, group=0x0, having=0x0, proc_param=0x0, select_options=3491514880, result=0x2603298, unit=0x2548428, select_lex=0x2548880)
    at sql_select.cc:3035
#21 0x0000000000762660 in handle_select (thd=0x2546898, lex=0x2548388, result=0x2603298, setup_tables_done_option=1073741824) at sql_select.cc:301
#22 0x00000000006ce457 in mysql_execute_command (thd=0x2546898) at sql_parse.cc:3167
#23 0x00000000006d37d1 in mysql_parse (thd=0x2546898,
    inBuf=0x2601bf0 "INSERT INTO `table10_falcon_int_autoinc` ( `int_key` , `int` ) SELECT `int` , `int_key` FROM `table10_falcon_int_autoinc` WHERE `pk` > 10 LIMIT 1", length=145, found_semicolon=0x7fbca44f3f00) at sql_parse.cc:5732
#24 0x00000000006d43bc in dispatch_command (command=COM_QUERY, thd=0x2546898,
    packet=0x25fcb49 " INSERT INTO `table10_falcon_int_autoinc` ( `int_key` , `int` ) SELECT `int` , `int_key` FROM `table10_falcon_int_autoinc` WHERE `pk` > 10 LIMIT 1 ", packet_length=147) at sql_parse.cc:1007
#25 0x00000000006d58e5 in do_command (thd=0x2546898) at sql_parse.cc:690
#26 0x00000000006c3a69 in handle_one_connection (arg=0x2546898) at sql_connect.cc:1154
#27 0x000000315b0073da in start_thread () from /lib64/libpthread.so.0
#28 0x000000315a4e627d in clone () from /lib64/libc.so.6

How to repeat:
If this is repeatable, a test case will be provided.

Rows: 10, max field size: 256; threads 8; queries: update, insert, delete

To reproduce within 20 seconds on 20% of the test runs:

$ perl runall.pl \
  --engine=Falcon \
  --mysqld=--loose-falcon-lock-wait-timeout=1 \
  --mysqld=--loose-innodb-lock-wait-timeout=1 \
  --mysqld=--log-output=none \
  --mysqld=--skip-safemalloc \
  --mem \
  --rows=10 \
  --threads=8 \
  --mysqld=--falcon-consistent=read=1 \
  --varchar-length=256 \
  --basedir=/build/bzr/6.0-falcon-team \
  --mask=2094 \
  --queries=100000000 \
  --duration=900 \
  --gendata=conf/varchar.zz \
  --grammar=conf/varchar.yy

Other crashes are possible with this test, however with sufficient patience the assertion this bug is about will be hit.

This is a case where Table::insertIntoTree() failed inadvertently.  We are seeing this alot.  In this case, it is in Table::insert(Transaction *transaction, Stream *stream)  in the following code.

		// Do the actual insert

		transaction->addRecord(record);
		bool ret = insertIntoTree(record, NULL, recordNumber);
		inserted = true;
		insertIndexes(transaction, record);
		ASSERT(ret);
		record->state = recData;
		record->release(REC_HISTORY);

See also; Bug#42185 & Bug#43298

I could reproduce the assertion with DBT2 on walldorf:

[Falcon] Error: assertion (ret) failed at line 3078 in file Table.cpp

090320 12:22:08 - mysqld got signal 6 ;
This could be because you hit a bug. It is also possible that this binary
or one of the libraries it was linked against is corrupt, improperly built,
or misconfigured. This error can also be caused by malfunctioning hardware.
We will try our best to scrape up some info that will hopefully help diagnose
the problem, but since we have already crashed, something is definitely wrong
and this may fail.

key_buffer_size=8388608
read_buffer_size=1048576
max_used_connections=16
max_threads=1601
thread_count=16
connection_count=16
It is possible that mysqld could use up to 
key_buffer_size + (read_buffer_size + sort_buffer_size)*max_threads = 4943687 K
bytes of memory
Hope that's ok; if not, decrease some variables in the equation.

thd: 0x2aab3c390bd0
Attempting backtrace. You can use the following information to find out
where mysqld died. If you see no messages after this, something went
terribly wrong...
stack_bottom = 0x4dee00e8 thread_stack 0x40000
/data0/work/mysql/mysql-6.0-falcon/sql/mysqld(my_print_stacktrace+0x2e) [0xabfe3e]
/data0/work/mysql/mysql-6.0-falcon/sql/mysqld(handle_segfault+0x336) [0x65cc36]
/lib64/libpthread.so.0 [0x372ae0de60]
/lib64/libpthread.so.0(raise+0x2d) [0x372ae0dd3d]
/data0/work/mysql/mysql-6.0-falcon/sql/mysqld(Error::error(char const*, ...)+0xf9) [0x8684d9]
/data0/work/mysql/mysql-6.0-falcon/sql/mysqld(Table::insert(Transaction*, Stream*)+0xd8) [0x881c68]
/data0/work/mysql/mysql-6.0-falcon/sql/mysqld(StorageTable::insert()+0x3d) [0x87479d]
/data0/work/mysql/mysql-6.0-falcon/sql/mysqld(StorageInterface::write_row(unsigned char*)+0x6c) [0x8637bc]
/data0/work/mysql/mysql-6.0-falcon/sql/mysqld(handler::ha_write_row(unsigned char*)+0x6f) [0x76e08f]
/data0/work/mysql/mysql-6.0-falcon/sql/mysqld(write_record(THD*, TABLE*, st_copy_info*)+0x5d) [0x6f5cbd]
/data0/work/mysql/mysql-6.0-falcon/sql/mysqld(mysql_insert(THD*, TABLE_LIST*, List<Item>&, List<List<Item> >&, List<Item>&, List<Item>&, enum_duplicates, bool)+0xa64) [0x6fade4]
/data0/work/mysql/mysql-6.0-falcon/sql/mysqld(mysql_execute_command(THD*)+0x2d13) [0x66f8e3]
/data0/work/mysql/mysql-6.0-falcon/sql/mysqld(mysql_parse(THD*, char const*, unsigned int, char const**)+0x23c) [0x672d4c]
/data0/work/mysql/mysql-6.0-falcon/sql/mysqld(dispatch_command(enum_server_command, THD*, char*, unsigned int)+0x98d) [0x673afd]
/data0/work/mysql/mysql-6.0-falcon/sql/mysqld(do_command(THD*)+0xc4) [0x674684]
/data0/work/mysql/mysql-6.0-falcon/sql/mysqld(handle_one_connection+0xf4) [0x666174]
/lib64/libpthread.so.0 [0x372ae062e7]
/lib64/libc.so.6(clone+0x6d) [0x372a2ce3bd]

Philip reported that this problem cannot be repeated after the changes to create a
TransactionState object and the addition of the CycleManager to protect doomed records until after all temporary stack pointers to them have gone away.

Hakan has reproduced this bug again running DBT2.

[Falcon] Error: assertion (ret) failed at line 3098 in file Table.cpp

Error::error(char const*, ...)
Table::insert(Transaction*, Stream*)
StorageDatabase::insert(Connection*, Table*, Stream*)
StorageTable::insert()
StorageInterface::write_row(unsigned char*)
handler::ha_write_row(unsigned char*)
write_record(THD*, TABLE*,st_copy_info*)
mysql_insert(THD*,TABLE_LIST*, List<Item>&, List<List<Item> >&,
List<Item>&, List<Item>&, enum_duplicates, bool)+0x994)
mysql_execute_command(THD*)+0x135d)

This is really a failure of the code to handle an unsuccessful call to insertIntoTree().

uint Table::insert(Transaction *transaction, Stream *stream)
{
. . .
	bool ret = insertIntoTree(record, NULL, recordNumber);
	inserted = true;
	insertIndexes(transaction, record);
	ASSERT(ret);
. . .

In this bug, a new record is inserted into a table. The slot in the Recordleaf that is used is controled by the record number chosen. This is assigned by Dbb::insertStub() which uses an exclusive lock on the Section::syncInsert. This syncInsert lock serializes all free record numbers added by Section::expungeRecord() and used again by Section::insertStub(), or reserved by Section::reserveRecordNumber().

But what is a record number that is still in use is returned from Section::insertStub()? Section::expungeRecord() provides these free record numbers, and for Record Locator Page sections, this is always called after the slot in the RecordTree has been set to NULL.

Always except... in RecordLeaf::retireRecords. This function calls Record->retire first, and then sets the RecordLeaf::records[slot] to NULL. It was assumed that an exclusive lock on RecordLeaf::syncObject would make it OK to provide the record number to the Section (RecordLocatorPage) first and then clear the RecordLeaf. But most calls to Table::insertIntotree() do not actually get any lock on RecordLeaf::syncObject.

This means that when the scavenger is retiring records, it often releases the record number to be reused and then sets the slot in the recordLeaf to NULL. In the mean time, any amount of stuff could happen.

For this bug, insertStub returns a record number, the insert calls insertIntoTree, which fails because the that slot in the RecordLeaf is not yet NULL.

I need to rearrange RecordLeaf::retireRecords so that it sets the slot to NULL first, and then releases the record number to be reused.

A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

http://lists.mysql.com/commits/71557

3105 Kevin Lewis 2009-04-07
Bug#41661 - Table::insertIntoTree could fail during a Table::insert. This should not happen since
Dbb::insertStub should have returned a record number that was free. Nobody else should have
been messing with that slot in the RecordLeaf.

The problem was in RecordLeaf::retireRecords. This function called RecordVersion::retire which
then called expungeRecord. This makes the record number free to use again by any other thread.
It did this with an exclusive lock on the RecordLeaf, but Table::insertIntoTree often does not get
any lock on RecordLeaf when it calls RecordLeaf::store. So a free record number is reused in
Table::insert before the slot has been set to NULL by RecordLeaf::retireRecords. That is why
Table::insertIntoTree fails. It expects the slot to be NULL.

Another problem fixed is that the first of the two Table::insert functions did not check the
return value of insertIntoTree at all. This would have propogated the error further since
there would be nothing in the RecordTree slot after the insert.

Another problem fixed in RecordLeaf::retireRecords is with this statement;
if (record && recordScavenge->canBeRetired(record))
The problem is with incrementing the 'count' integer. If it is zero after stepping through the
RecordLeaf, then it is assumed that the RecordLeaf is empty and can be retired.
But this 'if' statement gets the wrong result. If the RecordLeaf is full of newer records that
cannot yet be retired, then this count will remain zero. The emptySections bitmap will be
set for this RecordLeaf. And later, in Table::retireRecords, it will attempt to retire these
sections. But finding records in them, it will just waste a lot of time.

I changed this code to be accurate in determining if the RecordLeaf is indead empty
before setting the bit in Table::emptySections.

Record::retire and RecordVersion::retire now only retire records. The more thorough
tests for whether a record can be retired is collected into RecordScavenge::canBeRetired.

Also, instead of just setting the slot in RecordLeaf to NULL directly, I now use a
COMPARE_EXCHANGE_POINTER() just like every other place where this location is changed.
Then if it fails for any reason, we will not retire that record. One legitimate reason that this
could fail is when this record is determined that it canBeRetired, but immediately after that,
a client thread does a fetchForUpdate, adding a lockRecord.

Since the 'count' integer is incremented only within it, a RecordLeaf that does not have any

A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/71790

3114 Kevin Lewis	2009-04-09
      Bug#41661 - A mistake was faound in RecordLeaf::retireRecords.  It makes a call to Record::retire() first and then calls record->getMemUsage().  It should do it the other way around.
       
      In addition, I found that the scavenger thread should also have a CycleLock for each table tht it prunes or retires because these activities can look at priorRecords periodically.

Pushed into 6.0.11-alpha (revid:hky@sun.com-20090415164923-9arx29ye5pzxd4zf) (version source revid:kevin.lewis@sun.com-20090409144517-vor0khtrwnlqltb2) (merge vers: 6.0.11-alpha) (pib:6)

A note have been added to the 6.0.11 changelog: 

The Falcon CycleManager has been updated, which addresses a number of issues when examining records in various transaction states and their visisbility/isolation in relation to other threads.