MySQL Bugs: #44223: Falcon assertion state != recQueuedForDelete in Record::queueForDelete

Bug #44223	Falcon assertion state != recQueuedForDelete in Record::queueForDelete
Submitted:	12 Apr 2009 15:26	Modified:	26 May 2010 17:47
Reporter:	Philip Stoev	Email Updates:
Status:	Unsupported	Impact on me:	None
Category:	MySQL Server: Falcon storage engine	Severity:	S1 (Critical)
Version:	6.0-falcon-team	OS:	Any
Assigned to:	Kevin Lewis	CPU Architecture:	Any
Tags:	F_RECORD TREE

Description:
When executing a RQG workload, the following assertion happened in 3 out of 60 test runs:

[Falcon] Error: assertion (state != recQueuedForDelete) failed at line 1101 in file Record.cpp

# 12:07:40 #7  0x00000000009831bf in Error::assertionFailed (
# 12:07:40     text=0xe7cfc2 "state != recQueuedForDelete",
# 12:07:40     fileName=0xe7cf80 "Record.cpp", line=1101) at Error.cpp:78
# 12:07:40 #8  0x0000000000a24a44 in Record::queueForDelete (this=0x2aaab5369648)
# 12:07:40     at Record.cpp:1101
# 12:07:40 #9  0x00000000009b605e in Transaction::rollback (this=0xe0d62a8)
# 12:07:40     at Transaction.cpp:415
# 12:07:40 #10 0x00000000009d0b8c in Connection::rollback (this=0x2aaaaae66698)
# 12:07:40     at Connection.cpp:287
# 12:07:40 #11 0x000000000098ae37 in StorageConnection::rollback (this=0x2aaaaae665e0)
# 12:07:40     at StorageConnection.cpp:191
# 12:07:40 #12 0x000000000097ac49 in StorageInterface::rollback (hton=0xcbf5e40,
# 12:07:40     thd=0x2aaab40d48f0, all=false) at ha_falcon.cpp:1326
# 12:07:40 #13 0x0000000000821bd2 in ha_rollback_trans (thd=0x2aaab40d48f0, all=false)
# 12:07:40     at handler.cc:1298
# 12:07:40 #14 0x00000000008f5334 in trans_rollback_stmt (thd=0x2aaab40d48f0)
# 12:07:40     at transaction.cc:267
# 12:07:40 #15 0x00000000006de7a5 in dispatch_command (command=COM_QUERY,
# 12:07:40     thd=0x2aaab40d48f0,
# 12:07:40     packet=0x2aaab42e7f11 "UPDATE `table100_falcon_varchar_1024_not_null` SET `varchar_1024_utf8` = 8 WHERE `varchar_1024_utf8_not_null` <> 'vyvym$
# 12:07:40     packet_length=229) at sql_parse.cc:1429
# 12:07:40 #16 0x00000000006decfa in do_command (thd=0x2aaab40d48f0) at sql_parse.cc:731
# 12:07:40 #17 0x00000000006cbea6 in handle_one_connection (arg=0x2aaab40d48f0)
# 12:07:40     at sql_connect.cc:1146
# 12:07:40 #18 0x0000003587a062f7 in start_thread () from /lib64/libpthread.so.0
# 12:07:40 #19 0x0000003586ed1b6d in clone () from /lib64/libc.so.6

How to repeat:
Happened within 3 min with the following RQG command:

perl runall.pl    --engine=Falcon   --reporters=Deadlock,ErrorLog,Backtrace,Recovery,Shutdown   --mysqld=--loose-falcon-lock-wait-timeout=1   --mysqld=--loose-innodb-lock-wait-timeout=1   --mysqld=--log-output=none   --mysqld=--skip-safemalloc   --mysqld=--falcon-page-size=4K --rows=100 --threads=16 --mysqld=--falcon-record-chill-threshold=1 --varchar-length=0   --mask=43811   --queries=100000000   --duration=900   --basedir=/export/home/pb2/test/sb_1-462247-1239511953.98/mysql-6.0.11-alpha-linux-x86_64-test  --gendata=conf/varchar.zz  --grammar=conf/varchar.yy 

and has also happened with various settings, including the default settings for chill/thaw .

The initial report says this bug happened "3 out of 60 test runs". In my testing on the 6.0-falcon-kevin tree, it only happened once out of 150 test runs. Therefore unscientifically, this bug may now be less frequent.

When I tested DBT2 with 100 WH on falcon-tree (rev2701, 2009-5-7), this assertion occured very time.

A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

http://lists.mysql.com/commits/75562

2717 Kevin Lewis 2009-06-03
Bug#43961 & Bug#44223 Both of these bugs are caused by a record that gets queued for delete while still chained up to the transaction. Then during Transaction::rolback (44223) a record to rollback is queued for delete a second time.
Or during Transaction::rollbackSavepoint (43961), the code assumes that this record on the transaction and savepoint is still the base record since it has Record::superceded=false, and tries to call insertIntoTree to rerplace it with its prior. But this record is not the base record either.

This can happen in Table::update(Transaction * transaction, Record *orgRecord, Stream *stream);
If this update replaces a lock record with an updated record and an exception occurred, it may or may not take the record off the transaction and the tree, but it would always queue the lock record for delete. This code has gone through some iterations lately. But in the most recent code, I was able to reporduce 44223.

This patch cleans it up even more by distinguishing whether a record is actually inserted into the tree and added to the transaction. It undoes those actions only if they were done. Then if the update started with a lock record, that recordversion will remain on the tree with superceded=false and it will not be queued for delete. It correctly uses the 'wasLock' variable instead of the 'updated' or 'wasUpdated' flag.