Bug #34602 Falcon assertion in Transaction::commitNoUpdates line 364
Submitted: 15 Feb 2008 16:44 Modified: 8 Jan 2009 10:27
Reporter: Philip Stoev Email Updates:
Status: Closed Impact on me:
None 
Category:MySQL Server: Falcon storage engine Severity:S3 (Non-critical)
Version:6.0.4 OS:Any
Assigned to:
Tags: simpler_testcase_needed
Triage: D1 (Critical) / R3 (Medium) / E2 (Low)

[15 Feb 2008 16:44] Philip Stoev
Description:
Falcon crashed when running iuds6 with message in error log:

Bugcheck: assertion failed at line 364 in file Transaction.cpp

line before that is "Exception: lock timed out after 1000 milliseconds", nothing else except "Commit transaction" messages.

backtrace is:

#0  0x0000003ba880b132 in pthread_kill () from /lib64/libpthread.so.0
(gdb)  bt
#0  0x0000003ba880b132 in pthread_kill () from /lib64/libpthread.so.0
#1  0x0000000000650e12 in handle_segfault (sig=4) at mysqld.cc:2313
#2  <signal handler called>
#3  0x0000003ba880dd4d in raise () from /lib64/libpthread.so.0
#4  0x0000000000854f1c in Error::debugBreak () at Error.cpp:92
#5  0x0000000000854e30 in Error::error (string=0x0) at Error.cpp:69
#6  0x0000000000854f06 in Error::assertionFailed (fileName=0x31e9 <Address 0x31e9 out of bounds>, line=18625) at Error.cpp:76
#7  0x000000000082fd11 in Transaction::commitNoUpdates (this=0x2aaaae69afe8) at Transaction.cpp:364
#8  0x000000000082f7f2 in Transaction::commit (this=0x2aaaae69afe8) at Transaction.cpp:238
#9  0x0000000000841261 in Connection::commit (this=0x2aaadd03e848) at Connection.cpp:269
#10 0x000000000081928a in StorageConnection::commit (this=0x2aaadd0815d0) at StorageConnection.cpp:146
#11 0x0000000000810208 in StorageInterface::commit (hton=0x31e9, thd=0x2aaabc17ae00, all=true) at ha_falcon.cpp:1132
#12 0x000000000073489c in ha_commit_one_phase (thd=0x2aaabc17ae00, all=true) at handler.cc:786
#13 0x000000000073476f in ha_commit_trans (thd=0x2aaabc17ae00, all=true) at handler.cc:756
#14 0x000000000065ad32 in end_trans (thd=0x2aaabc17ae00, completion=COMMIT) at sql_parse.cc:574
#15 0x000000000066000a in mysql_execute_command (thd=0x2aaabc17ae00) at sql_parse.cc:3574
#16 0x000000000066317e in mysql_parse (thd=0x2aaabc17ae00, inBuf=0xa82ce40 "COMMIT", length=6, found_semicolon=0x47093590) at sql_parse.cc:5410
#17 0x000000000065b895 in dispatch_command (command=COM_QUERY, thd=0x2aaabc17ae00, packet=0xa82ce46 "", packet_length=6) at sql_parse.cc:948
#18 0x000000000065b092 in do_command (thd=0x2aaabc17ae00) at sql_parse.cc:697
#19 0x0000000000659e4f in handle_one_connection (arg=0x31e9) at sql_connect.cc:1146
#20 0x0000003ba88062f7 in start_thread () from /lib64/libpthread.so.0
#21 0x0000003ba80ce85d in clone () from /lib64/libc.so.6

How to repeat:
Run iuds6.tst . I will try to provide a simpler test case.

Core location and recoverability information will follow shortly.
[15 Feb 2008 17:01] Philip Stoev
Recovery appears successfull.
[16 Feb 2008 11:51] Sveta Smirnova
Thank you for the report.

Could you please provide iuds6? I suppose this is a test?
[16 Feb 2008 12:58] Philip Stoev
Setting to verified so that it pops up in Falcon bug reports. This assertion has so far been observed more than once.
[20 Feb 2008 19:00] Hakan Küçükyılmaz
Can't repeat, I have run iuds6.tst with debug and non-debug build for full 24 hours on lu0009.mysql.com.
[21 Feb 2008 11:13] Hakan Küçükyılmaz
Please ignore my latest comment I had an error in my system-qa test suite setup. I will restart iuds6.tst on lu0009.
[21 Feb 2008 13:28] Philip Stoev
This assertion also happens with the latest 6.0.4 build with 4.1.0 (SUSE Linux)
[21 Feb 2008 16:03] Hakan Küçükyılmaz
I got a core file running debug build of mysqld release build on lu0009.mysql.com. We are hitting an assertion

Thread status:

- 64 threads where in pthread_cond_wait@@GLIBC_2.3.2 ()
-  9 threads where in __lll_mutex_lock_wait () from /lib64/libpthread.so.0
-  5 threads where in __gxx_personality_v0 () from /lib64/libc.so.6
-  3 threads where in pthread_cond_timedwait@@GLIBC_2.3.2 ()
-  1 thread was in do_sigwait () from /lib64/libpthread.so.0
-  1 thread was in fdatasync () from /lib64/libc.so.6
-  1 thread was in pthread_kill () from /lib64/libpthread.so.0

Backtrace of thread in pthread_kill()
(gdb) bt
#0  0x00002b81e64daea3 in pthread_kill () from /lib64/libpthread.so.0
#1  0x000000000065e64a in handle_segfault (sig=4) at mysqld.cc:2313
#2  <signal handler called>
#3  0x00002b81e64ddabd in raise () from /lib64/libpthread.so.0
#4  0x00000000008887ac in Error::debugBreak () at Error.cpp:92
#5  0x00000000008888a2 in Error::error (string=<value optimized out>) at Error.cpp:69
#6  0x0000000000888902 in Error::assertionFailed (fileName=0x4 <Address 0x4 out of bounds>,
    line=8163) at Error.cpp:76
#7  0x0000000000855b4c in Table::insert (this=0x2aaaaad61b58, transaction=0x2aaaaef74da0,
    stream=0x0) at Table.cpp:2927
#8  0x00000000008486ce in StorageDatabase::insert (this=<value optimized out>,
    connection=<value optimized out>, table=0x2aaaaad61b58, stream=0x2aaaaec46198)
    at StorageDatabase.cpp:226
#9  0x000000000084c8b3 in StorageTable::insert (this=0x2aaaaec40c10) at StorageTable.cpp:122
#10 0x00000000008458c4 in StorageInterface::write_row (this=0x1d8dcc0, buff=0x26e2a28 "��\023")
    at ha_falcon.cpp:1036
#11 0x000000000075b35b in ha_partition::write_row (this=0x26e2700, buf=<value optimized out>)
    at ha_partition.cc:2763
#12 0x000000000075104b in handler::ha_write_row (this=0x26e2700, buf=0x26e2a28 "��\023")
    at handler.cc:4602
#13 0x00000000006e4efe in write_record (thd=0x2aaab03f2e90, table=0x27d1870, info=0x49933ae0)
    at sql_insert.cc:1548
#14 0x00000000006ea7aa in mysql_insert (thd=0x2aaab03f2e90, table_list=<value optimized out>,
    fields=@0x2055298, values_list=@0x20552e0, update_fields=@0x20552c8, update_values=@0x20552b0,
    duplic=DUP_ERROR, ignore=false) at sql_insert.cc:803
#15 0x00000000006722c9 in mysql_execute_command (thd=0x2aaab03f2e90) at sql_parse.cc:2658
#16 0x00000000007ca7ce in sp_instr_stmt::exec_core (this=0x20563d0, thd=0x1fe3, nextp=0x4)
    at sp_head.cc:2807
#17 0x00000000007ca9fa in sp_lex_keeper::reset_lex_and_exec_core (this=0x2056410,
    thd=0x2aaab03f2e90, nextp=0x49934d64, open_tables=false, instr=0x20563d0) at sp_head.cc:2649
#18 0x00000000007d10cd in sp_instr_stmt::execute (this=0x20563d0, thd=0x2aaab03f2e90,
    nextp=0x49934d64) at sp_head.cc:2758
#19 0x00000000007cf13d in sp_head::execute (this=0x21d3bc0, thd=0x2aaab03f2e90) at sp_head.cc:1195
#20 0x00000000007cfac2 in sp_head::execute_procedure (this=0x21d3bc0, thd=0x2aaab03f2e90,
    args=0x49934fc0) at sp_head.cc:1904
#21 0x00000000007e41cb in Event_job_data::execute (this=0x49935070, thd=0x2aaab03f2e90, drop=false)
    at event_data_objects.cc:1987
#22 0x00000000007df6e6 in Event_worker_thread::run (this=0x499351b7, thd=0x2aaab03f2e90,
    event=0x2aaab2b46020) at event_scheduler.cc:312
#23 0x00000000007df802 in event_worker_thread (arg=<value optimized out>) at event_scheduler.cc:263
#24 0x00002b81e64d6143 in start_thread () from /lib64/libpthread.so.0
#25 0x00002b81e6b0774d in clone () from /lib64/libc.so.6
#26 0x0000000000000000 in ?? ()

The assert in frame #7
#7  0x0000000000855b4c in Table::insert (this=0x2aaaaad61b58, transaction=0x2aaaaef74da0,
    stream=0x0) at Table.cpp:2927
2927    Table.cpp: No such file or directory.
        in Table.cpp

Table.cpp:2927 reads:

		transaction->addRecord(record);
		bool ret = insert(record, NULL, recordNumber);
		ASSERT(ret);
[21 Feb 2008 16:26] Kevin Lewis
Triage as D1/W1/I4.   Jim Starkey writes;

I'd say it has a probability so low as to be negligible for beta.

Transaction::commitNoUpdate has to remove references to the transaction 
being committed from other transaction objects.  The problem is that 
this is a non-lock environment and another thread might be runnable but 
stalled creating a new dependency.  The release dependency codes looks 
for established dependencies and releases them, but it has no way to 
find partially created dependencies.  If, at the end of an attempt to 
release dependencies there is a remaining dependency, it performs an 
exponential backoff starting a millisecond and tries again.  When the 
backoff reaches about a second, it throws the assertion.

One of two things has happened.  Once is that the system has a load were 
a runnable thread has been stalled for over 2 seconds.  The other is a 
systematic bug in dependency tracking.

A systematic bug would show up more frequently than this.  It is 
possible, but unlikely.

I think we should raise the max wait time to maybe 16 seconds, and take 
another look for a deterministic strategy that doesn't serialize 
transaction start.

I think the probability of this being a beta issue is very, very low.
[23 Apr 2008 17:08] Kevin Lewis
The assertion here is ASSERT(dependencies == 0);
It occurs right after a loop of 10 tries to release these dependencies.  
The solution to this may be the same solution to Bug#36296 in which the sleep finctionis reported to be called too often, increasing Falcon's idle time.
[25 Apr 2008 14:31] Ann Harrison
The Transaction object's commitNoUpdates method was created to clean up
transactions that did not make changes to the database so they can be
reused rather than being removed from the transaction list and released.
This optimization makes a big difference when running autocommit tests
without updates at machine speed against cached data.  The cost of creating
and releasing transactions had been a significant part of the total test
time.

However, even a read-only transaction can have other transactions that
depend on it. Once a transaction has ended no new dependencies are 
created on it, but the existing transactions with dependencies need to 
be notified that the read-only transaction has gone away before that
transaction can be reused (nothing worse that finding you have a 
dependency on something you've never heard of).

commitNoUpdates invokes the TransactionManager expungeTransaction method
to loop through all active transactions, telling them to release their 
dependency on the read-only transaction that is ending.  The release is 
done by the Transaction method also called expungeTransaction.  That
method uses an interlocked instruction to keep two threads from releasing 
dependencies simultaneously.

expungeTransaction causes the TransactionManaager 
  
However, in the case where a thread is blocked by another, the blocked thread 
leaves the expungeTransaction method without releasing its dependency.  
Previously, the commitNoUpdate method tried the expunge several times
trying to get rid of all dependencies, and asserted if there were still
dependencies.  In a highly parallel environment, this assertion was 
hit from time to time.  The short sleep before the retry also reduced
CPU utilization.

Now commitNoUpdate tries only once to release dependencies, and if that
fails, it changes the state of the transaction to Available, leaving 
dependencies in place.  That transaction cannot be reused until the
dependencies are released, which they well be, eventually, as concurrent
transactions end. 

So, this change will slightly reduce the reuse of transaction objects
from read-only transactions, but should eliminate a potential assert 
and sleep.
[3 May 2008 4:55] Philip Stoev
I hit this assertion when running sysbench with 2000 users. This is the first time this assertion is hit using sysbench, previously an iuds test was required. Backtrace is:

#0  0x0000003ba880b132 in pthread_kill () from /lib64/libpthread.so.0
#1  0x00000000007f49fc in write_core (sig=6) at stacktrace.c:305
#2  0x000000000067a803 in handle_segfault (sig=6) at mysqld.cc:2624
#3  <signal handler called>
#4  0x0000003ba880dd4d in raise () from /lib64/libpthread.so.0
#5  0x00000000008ba68c in Error::debugBreak () at Error.cpp:92
#6  0x00000000008ba7a0 in Error::error (string=0xc4e608 "assertion failed at line %d in file %s\n") at Error.cpp:69
#7  0x00000000008ba837 in Error::assertionFailed (fileName=0xc45990 "Transaction.cpp", line=369) at Error.cpp:76
#8  0x000000000087d6d2 in Transaction::commitNoUpdates (this=0x2aaac82de728) at Transaction.cpp:369
#9  0x000000000087d946 in Transaction::commit (this=0x2aaac82de728) at Transaction.cpp:243
#10 0x000000000089c94b in Connection::commit (this=0x1de14848) at Connection.cpp:269
#11 0x0000000000854161 in StorageConnection::commit (this=0x1de147b8) at StorageConnection.cpp:146
#12 0x000000000084eb86 in StorageInterface::commit (hton=0x39cdff0, thd=0xa1d9910, all=true) at ha_falcon.cpp:1154
#13 0x0000000000794153 in ha_commit_one_phase (thd=0xa1d9910, all=true) at handler.cc:1168
#14 0x0000000000794548 in ha_commit_trans (thd=0xa1d9910, all=true) at handler.cc:1137
#15 0x0000000000686c30 in end_active_trans (thd=0xa1d9910) at sql_parse.cc:133
#16 0x0000000000687b4f in begin_trans (thd=0xa1d9910) at sql_parse.cc:156
#17 0x000000000068d28a in mysql_execute_command (thd=0xa1d9910) at sql_parse.cc:3793
#18 0x000000000068fd34 in mysql_parse (thd=0xa1d9910, inBuf=0xa1e4e90 "BEGIN", length=5, found_semicolon=0x5efabf50) at sql_parse.cc:5623
#19 0x00000000006907a8 in dispatch_command (command=COM_QUERY, thd=0xa1d9910, packet=0xa1dcba1 "BEGIN", packet_length=5) at sql_parse.cc:1031
#20 0x00000000006919cd in do_command (thd=0xa1d9910) at sql_parse.cc:723
#21 0x00000000006818b6 in handle_one_connection (arg=0xa1d9910) at sql_connect.cc:1134
#22 0x0000003ba88062f7 in start_thread () from /lib64/libpthread.so.0
#23 0x0000003ba80ce85d in clone () from /lib64/libc.so.6
[4 May 2008 10:59] Philip Stoev
The previous backtrace is from testing the 6.0.5 release. Kevin indicates that the fix has been pushed in mysql-6.0-falcon-team, however the 6.0.5 release does not have it.
[29 Aug 2008 2:02] Kevin Lewis
This fix is in version 6.0.6  The for loop in Transaction commitnoUpdates was eliminated
[8 Jan 2009 10:27] MC Brown
A note has been added to the 6.0.6 changelog: 

An assertion could be raised when the dependencies on a transaction could not be released after a specified time when using FALCON tables.