Bug #43299 Falcon crash in Record::addRef() - this=0xcccccccc00000000
Submitted: 2 Mar 2009 8:38 Modified: 15 May 2009 13:26
Reporter: Philip Stoev
Status: Closed
Category: MySQL Server: Falcon storage engine Severity: S1 (Critical)
Version: 6.0-falcon-team OS: Any
Assigned to: Kevin Lewis
Triage: Triaged: D1 (Critical)

[2 Mar 2009 8:38] Philip Stoev
When executing an RQG transactions.yy workload, Falcon crashed as follows:

#4  0x000000000098b466 in interlockedIncrement (ptr=0xcccccccc00000010) at Interlock.h:287
#5  0x0000000000a23b17 in Record::addRef (this=0xcccccccc00000000) at Record.cpp:546
#6  0x0000000000a234f9 in RecordLeaf::fetch (this=0x7f9a26599468, id=-1) at RecordLeaf.cpp:73
#7  0x00000000009932a3 in Table::fetch (this=0x7f9a26905028, recordNumber=-1) at Table.cpp:941
#8  0x0000000000997a0c in Table::update (this=0x7f9a26905028, transaction=0x34e13c0, orgRecord=0x7f9a0d5e3bb0, stream=0x7f9a147162e0) at Table.cpp:3099
#9  0x000000000097a32d in StorageDatabase::updateRow (this=0x7f9a26556208, storageConnection=0x7f9a26593438, table=0x7f9a26905028, oldRecord=0x7f9a0d5e3bb0,
    stream=0x7f9a147162e0) at StorageDatabase.cpp:676
#10 0x000000000098238e in StorageTable::updateRow (this=0x7f9a14710d40, recordNumber=377) at StorageTable.cpp:132
#11 0x0000000000974284 in StorageInterface::update_row (this=0x34806a0, oldData=0x3435140 "ЪЪЪ", newData=0x3434bd0 "ЪЩЪ") at ha_falcon.cpp:1231
#12 0x000000000081671e in handler::ha_update_row (this=0x34806a0, old_data=0x3435140 "ЪЪЪ", new_data=0x3434bd0 "ЪЩЪ") at handler.cc:5524
#13 0x00000000007973a5 in mysql_update (thd=0x32a1678, table_list=0x32b3848, fields=@0x32a3520, values=@0x32a3928, conds=0x0, order_num=1, order=0x32b4150,
    limit=18446744073709551614, handle_duplicates=DUP_ERROR, ignore=false) at sql_update.cc:651
#14 0x00000000006d7340 in mysql_execute_command (thd=0x32a1678) at sql_parse.cc:3014
#15 0x00000000006dcedd in mysql_parse (thd=0x32a1678,
    inBuf=0x32b3640 "UPDATE `table100_falcon_int_autoinc` SET `enum_utf8` = 3  ORDER BY `enum_latin1_not_null_key`", length=93,
    found_semicolon=0x7f9a05860f00) at sql_parse.cc:5752
#16 0x00000000006ddac8 in dispatch_command (command=COM_QUERY, thd=0x32a1678,
    packet=0x330e969 "UPDATE `table100_falcon_int_autoinc` SET `enum_utf8` = 3  ORDER BY `enum_latin1_not_null_key` ", packet_length=94) at sql_parse.cc:1009
#17 0x00000000006deff1 in do_command (thd=0x32a1678) at sql_parse.cc:691
#18 0x00000000006ccf71 in handle_one_connection (arg=0x32a1678) at sql_connect.cc:1146
#19 0x000000315b0073da in start_thread () from /lib64/libpthread.so.0
#20 0x000000315a4e627d in clone () from /lib64/libc.so.6

Note: ptr=0xcccccccc00000010, this=0xcccccccc00000000, id=-1, recordNumber=-1

How to repeat:
perl runall.pl    --engine=Falcon   --reporters=Deadlock,ErrorLog,Backtrace,Recovery   --mysqld=--loose-falcon-lock-wait-timeout=1   --mysqld=--loose-innodb-lock-wait-timeout=1   --mysqld=--log-output=none   --mysqld=--skip-safemalloc   --mysqld=--falcon-page-size=4K  --rows=100 --threads=16    --basedir=/build/bzr/6.0-falcon-team   --mask=57595   --queries=100000000   --duration=900   --gendata=conf/combinations.zz  --grammar=conf/combinations.yy

Suggested fix:
Note that this call stack started going bad four levels before the actual crash. Extra assertions would have helped catch the problem earlier.
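
As an illustration of the kind of early check meant here, a minimal sketch follows. It is not Falcon code: SketchRecord and checkedFetch() are hypothetical names, and the flat slot array is an assumed stand-in for the real record tree. The only detail taken from the backtrace is that a record number of -1 reached the fetch path. The sketch throws so the behaviour is easy to observe; in the engine a debug assertion would play the same role.

```cpp
#include <cassert>
#include <cstdint>
#include <stdexcept>

// Hypothetical stand-in for a reference-counted record (not Falcon's Record).
struct SketchRecord {
    int useCount = 0;

    void addRef() {
        assert(useCount >= 0);  // a corrupted count would be caught here in debug builds
        ++useCount;
    }
};

// Hypothetical fetch wrapper: validate the record number before it is used
// to index anything, instead of letting a bad value travel further down.
SketchRecord* checkedFetch(SketchRecord* const* slots,
                           std::int32_t slotCount,
                           std::int32_t recordNumber)
{
    if (recordNumber < 0 || recordNumber >= slotCount)
        throw std::out_of_range("record number out of range");

    SketchRecord* record = slots[recordNumber];
    if (record != nullptr)
        record->addRef();  // only touch the record once the slot is known to be valid
    return record;
}
```

With a check like this at the top of the fetch path, the recordNumber=-1 seen in frames #6 and #7 would fail at the point of the call rather than inside interlockedIncrement().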
[2 Mar 2009 8:40] Philip Stoev
To repeat within 10 seconds of test runtime:

$ perl runall.pl \
  --engine=Falcon \
  --mysqld=--loose-falcon-lock-wait-timeout=1 \
  --mysqld=--loose-innodb-lock-wait-timeout=1 \
  --mysqld=--log-output=none \
  --mysqld=--skip-safemalloc \
  --mysqld=--falcon-page-size=4K \
  --rows=100 \
  --threads=16 \
  --basedir=/build/bzr/6.0-falcon-team \
  --mask=57595 \
  --queries=100000000 \
  --duration=900 \
  --gendata=conf/combinations.zz \
[30 Mar 2009 22:12] Kevin Lewis
Philip reported that this problem cannot be repeated after
the following changes were added:
1) Olav created a TransactionState object that outlives the 
2) A CycleManager was added to protect doomed records until 
   after all temporary stack pointers to them have gone away.
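
The idea in item 2 can be sketched as deferred reclamation. The code below is only a simplified illustration under stated assumptions, not the actual Falcon CycleManager: DoomedRecord, DeferredReclaimer, enter(), leave() and doom() are hypothetical names, and the sketch frees doomed records only when no thread is inside a protected region, which is cruder than a cycle-based scheme.

```cpp
#include <cassert>
#include <cstddef>
#include <mutex>
#include <vector>

// Hypothetical record type; bumps a caller-supplied counter when destroyed
// so that the moment of destruction can be observed.
struct DoomedRecord {
    int* destroyedCounter;
    explicit DoomedRecord(int* counter) : destroyedCounter(counter) {}
    ~DoomedRecord() { ++*destroyedCounter; }
};

// Hypothetical deferred reclaimer: records handed to doom() are kept alive
// while any thread may still hold a temporary stack pointer to them.
class DeferredReclaimer {
public:
    ~DeferredReclaimer() {
        for (DoomedRecord* record : doomed_)
            delete record;
    }

    // Call before copying record pointers onto the stack.
    void enter() {
        std::lock_guard<std::mutex> guard(mutex_);
        ++activeReaders_;
    }

    // Call once all temporary pointers have been dropped.
    void leave() {
        std::vector<DoomedRecord*> toFree;
        {
            std::lock_guard<std::mutex> guard(mutex_);
            assert(activeReaders_ > 0);
            if (--activeReaders_ == 0)
                toFree.swap(doomed_);  // last reader out: the backlog is now safe to free
        }
        for (DoomedRecord* record : toFree)
            delete record;  // free outside the lock
    }

    // The caller must already have unlinked the record from shared structures,
    // so no new reader can find it.
    void doom(DoomedRecord* record) {
        {
            std::lock_guard<std::mutex> guard(mutex_);
            if (activeReaders_ > 0) {
                doomed_.push_back(record);  // someone may still point at it
                return;
            }
        }
        delete record;  // nobody is inside a protected region
    }

    std::size_t pending() {
        std::lock_guard<std::mutex> guard(mutex_);
        return doomed_.size();
    }

private:
    std::mutex mutex_;
    int activeReaders_ = 0;
    std::vector<DoomedRecord*> doomed_;
};
```

In this sketch a thread would call enter() before fetching a record, use the pointer, and call leave() afterwards. A record doomed in between stays valid until the last such thread has left. Under constant load the reader count may never reach zero, which is one reason a real implementation would rotate cycles instead.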
[15 May 2009 13:26] MC Brown
Internal/test fix only. No changelog entry required.
[15 May 2009 13:40] MC Brown
A note has been added to the 6.0.11 changelog:

The Falcon CycleManager has been updated, which addresses a number of issues when examining records in various transaction states and their visibility/isolation in relation to other threads.