Bug #34602 | Falcon assertion in Transaction::commitNoUpdates line 364 | ||
---|---|---|---|
Submitted: | 15 Feb 2008 16:44 | Modified: | 8 Jan 2009 10:27 |
Reporter: | Philip Stoev | Email Updates: | |
Status: | Closed | Impact on me: | |
Category: | MySQL Server: Falcon storage engine | Severity: | S3 (Non-critical) |
Version: | 6.0.4 | OS: | Any |
Assigned to: | CPU Architecture: | Any | |
Tags: | simpler_testcase_needed |
[15 Feb 2008 16:44]
Philip Stoev
[15 Feb 2008 17:01]
Philip Stoev
Recovery appears successfull.
[16 Feb 2008 11:51]
Sveta Smirnova
Thank you for the report. Could you please provide iuds6? I suppose this is a test?
[16 Feb 2008 12:58]
Philip Stoev
Setting to verified so that it pops up in Falcon bug reports. This assertion has so far been observed more than once.
[20 Feb 2008 19:00]
Hakan Küçükyılmaz
Can't repeat, I have run iuds6.tst with debug and non-debug build for full 24 hours on lu0009.mysql.com.
[21 Feb 2008 11:13]
Hakan Küçükyılmaz
Please ignore my latest comment I had an error in my system-qa test suite setup. I will restart iuds6.tst on lu0009.
[21 Feb 2008 13:28]
Philip Stoev
This assertion also happens with the latest 6.0.4 build with 4.1.0 (SUSE Linux)
[21 Feb 2008 16:03]
Hakan Küçükyılmaz
I got a core file running debug build of mysqld release build on lu0009.mysql.com. We are hitting an assertion Thread status: - 64 threads where in pthread_cond_wait@@GLIBC_2.3.2 () - 9 threads where in __lll_mutex_lock_wait () from /lib64/libpthread.so.0 - 5 threads where in __gxx_personality_v0 () from /lib64/libc.so.6 - 3 threads where in pthread_cond_timedwait@@GLIBC_2.3.2 () - 1 thread was in do_sigwait () from /lib64/libpthread.so.0 - 1 thread was in fdatasync () from /lib64/libc.so.6 - 1 thread was in pthread_kill () from /lib64/libpthread.so.0 Backtrace of thread in pthread_kill() (gdb) bt #0 0x00002b81e64daea3 in pthread_kill () from /lib64/libpthread.so.0 #1 0x000000000065e64a in handle_segfault (sig=4) at mysqld.cc:2313 #2 <signal handler called> #3 0x00002b81e64ddabd in raise () from /lib64/libpthread.so.0 #4 0x00000000008887ac in Error::debugBreak () at Error.cpp:92 #5 0x00000000008888a2 in Error::error (string=<value optimized out>) at Error.cpp:69 #6 0x0000000000888902 in Error::assertionFailed (fileName=0x4 <Address 0x4 out of bounds>, line=8163) at Error.cpp:76 #7 0x0000000000855b4c in Table::insert (this=0x2aaaaad61b58, transaction=0x2aaaaef74da0, stream=0x0) at Table.cpp:2927 #8 0x00000000008486ce in StorageDatabase::insert (this=<value optimized out>, connection=<value optimized out>, table=0x2aaaaad61b58, stream=0x2aaaaec46198) at StorageDatabase.cpp:226 #9 0x000000000084c8b3 in StorageTable::insert (this=0x2aaaaec40c10) at StorageTable.cpp:122 #10 0x00000000008458c4 in StorageInterface::write_row (this=0x1d8dcc0, buff=0x26e2a28 "��\023") at ha_falcon.cpp:1036 #11 0x000000000075b35b in ha_partition::write_row (this=0x26e2700, buf=<value optimized out>) at ha_partition.cc:2763 #12 0x000000000075104b in handler::ha_write_row (this=0x26e2700, buf=0x26e2a28 "��\023") at handler.cc:4602 #13 0x00000000006e4efe in write_record (thd=0x2aaab03f2e90, table=0x27d1870, info=0x49933ae0) at sql_insert.cc:1548 #14 0x00000000006ea7aa in mysql_insert (thd=0x2aaab03f2e90, table_list=<value optimized out>, fields=@0x2055298, values_list=@0x20552e0, update_fields=@0x20552c8, update_values=@0x20552b0, duplic=DUP_ERROR, ignore=false) at sql_insert.cc:803 #15 0x00000000006722c9 in mysql_execute_command (thd=0x2aaab03f2e90) at sql_parse.cc:2658 #16 0x00000000007ca7ce in sp_instr_stmt::exec_core (this=0x20563d0, thd=0x1fe3, nextp=0x4) at sp_head.cc:2807 #17 0x00000000007ca9fa in sp_lex_keeper::reset_lex_and_exec_core (this=0x2056410, thd=0x2aaab03f2e90, nextp=0x49934d64, open_tables=false, instr=0x20563d0) at sp_head.cc:2649 #18 0x00000000007d10cd in sp_instr_stmt::execute (this=0x20563d0, thd=0x2aaab03f2e90, nextp=0x49934d64) at sp_head.cc:2758 #19 0x00000000007cf13d in sp_head::execute (this=0x21d3bc0, thd=0x2aaab03f2e90) at sp_head.cc:1195 #20 0x00000000007cfac2 in sp_head::execute_procedure (this=0x21d3bc0, thd=0x2aaab03f2e90, args=0x49934fc0) at sp_head.cc:1904 #21 0x00000000007e41cb in Event_job_data::execute (this=0x49935070, thd=0x2aaab03f2e90, drop=false) at event_data_objects.cc:1987 #22 0x00000000007df6e6 in Event_worker_thread::run (this=0x499351b7, thd=0x2aaab03f2e90, event=0x2aaab2b46020) at event_scheduler.cc:312 #23 0x00000000007df802 in event_worker_thread (arg=<value optimized out>) at event_scheduler.cc:263 #24 0x00002b81e64d6143 in start_thread () from /lib64/libpthread.so.0 #25 0x00002b81e6b0774d in clone () from /lib64/libc.so.6 #26 0x0000000000000000 in ?? () The assert in frame #7 #7 0x0000000000855b4c in Table::insert (this=0x2aaaaad61b58, transaction=0x2aaaaef74da0, stream=0x0) at Table.cpp:2927 2927 Table.cpp: No such file or directory. in Table.cpp Table.cpp:2927 reads: transaction->addRecord(record); bool ret = insert(record, NULL, recordNumber); ASSERT(ret);
[21 Feb 2008 16:26]
Kevin Lewis
Triage as D1/W1/I4. Jim Starkey writes; I'd say it has a probability so low as to be negligible for beta. Transaction::commitNoUpdate has to remove references to the transaction being committed from other transaction objects. The problem is that this is a non-lock environment and another thread might be runnable but stalled creating a new dependency. The release dependency codes looks for established dependencies and releases them, but it has no way to find partially created dependencies. If, at the end of an attempt to release dependencies there is a remaining dependency, it performs an exponential backoff starting a millisecond and tries again. When the backoff reaches about a second, it throws the assertion. One of two things has happened. Once is that the system has a load were a runnable thread has been stalled for over 2 seconds. The other is a systematic bug in dependency tracking. A systematic bug would show up more frequently than this. It is possible, but unlikely. I think we should raise the max wait time to maybe 16 seconds, and take another look for a deterministic strategy that doesn't serialize transaction start. I think the probability of this being a beta issue is very, very low.
[23 Apr 2008 17:08]
Kevin Lewis
The assertion here is ASSERT(dependencies == 0); It occurs right after a loop of 10 tries to release these dependencies. The solution to this may be the same solution to Bug#36296 in which the sleep finctionis reported to be called too often, increasing Falcon's idle time.
[25 Apr 2008 14:31]
Ann Harrison
The Transaction object's commitNoUpdates method was created to clean up transactions that did not make changes to the database so they can be reused rather than being removed from the transaction list and released. This optimization makes a big difference when running autocommit tests without updates at machine speed against cached data. The cost of creating and releasing transactions had been a significant part of the total test time. However, even a read-only transaction can have other transactions that depend on it. Once a transaction has ended no new dependencies are created on it, but the existing transactions with dependencies need to be notified that the read-only transaction has gone away before that transaction can be reused (nothing worse that finding you have a dependency on something you've never heard of). commitNoUpdates invokes the TransactionManager expungeTransaction method to loop through all active transactions, telling them to release their dependency on the read-only transaction that is ending. The release is done by the Transaction method also called expungeTransaction. That method uses an interlocked instruction to keep two threads from releasing dependencies simultaneously. expungeTransaction causes the TransactionManaager However, in the case where a thread is blocked by another, the blocked thread leaves the expungeTransaction method without releasing its dependency. Previously, the commitNoUpdate method tried the expunge several times trying to get rid of all dependencies, and asserted if there were still dependencies. In a highly parallel environment, this assertion was hit from time to time. The short sleep before the retry also reduced CPU utilization. Now commitNoUpdate tries only once to release dependencies, and if that fails, it changes the state of the transaction to Available, leaving dependencies in place. That transaction cannot be reused until the dependencies are released, which they well be, eventually, as concurrent transactions end. So, this change will slightly reduce the reuse of transaction objects from read-only transactions, but should eliminate a potential assert and sleep.
[3 May 2008 4:55]
Philip Stoev
I hit this assertion when running sysbench with 2000 users. This is the first time this assertion is hit using sysbench, previously an iuds test was required. Backtrace is: #0 0x0000003ba880b132 in pthread_kill () from /lib64/libpthread.so.0 #1 0x00000000007f49fc in write_core (sig=6) at stacktrace.c:305 #2 0x000000000067a803 in handle_segfault (sig=6) at mysqld.cc:2624 #3 <signal handler called> #4 0x0000003ba880dd4d in raise () from /lib64/libpthread.so.0 #5 0x00000000008ba68c in Error::debugBreak () at Error.cpp:92 #6 0x00000000008ba7a0 in Error::error (string=0xc4e608 "assertion failed at line %d in file %s\n") at Error.cpp:69 #7 0x00000000008ba837 in Error::assertionFailed (fileName=0xc45990 "Transaction.cpp", line=369) at Error.cpp:76 #8 0x000000000087d6d2 in Transaction::commitNoUpdates (this=0x2aaac82de728) at Transaction.cpp:369 #9 0x000000000087d946 in Transaction::commit (this=0x2aaac82de728) at Transaction.cpp:243 #10 0x000000000089c94b in Connection::commit (this=0x1de14848) at Connection.cpp:269 #11 0x0000000000854161 in StorageConnection::commit (this=0x1de147b8) at StorageConnection.cpp:146 #12 0x000000000084eb86 in StorageInterface::commit (hton=0x39cdff0, thd=0xa1d9910, all=true) at ha_falcon.cpp:1154 #13 0x0000000000794153 in ha_commit_one_phase (thd=0xa1d9910, all=true) at handler.cc:1168 #14 0x0000000000794548 in ha_commit_trans (thd=0xa1d9910, all=true) at handler.cc:1137 #15 0x0000000000686c30 in end_active_trans (thd=0xa1d9910) at sql_parse.cc:133 #16 0x0000000000687b4f in begin_trans (thd=0xa1d9910) at sql_parse.cc:156 #17 0x000000000068d28a in mysql_execute_command (thd=0xa1d9910) at sql_parse.cc:3793 #18 0x000000000068fd34 in mysql_parse (thd=0xa1d9910, inBuf=0xa1e4e90 "BEGIN", length=5, found_semicolon=0x5efabf50) at sql_parse.cc:5623 #19 0x00000000006907a8 in dispatch_command (command=COM_QUERY, thd=0xa1d9910, packet=0xa1dcba1 "BEGIN", packet_length=5) at sql_parse.cc:1031 #20 0x00000000006919cd in do_command (thd=0xa1d9910) at sql_parse.cc:723 #21 0x00000000006818b6 in handle_one_connection (arg=0xa1d9910) at sql_connect.cc:1134 #22 0x0000003ba88062f7 in start_thread () from /lib64/libpthread.so.0 #23 0x0000003ba80ce85d in clone () from /lib64/libc.so.6
[4 May 2008 10:59]
Philip Stoev
The previous backtrace is from testing the 6.0.5 release. Kevin indicates that the fix has been pushed in mysql-6.0-falcon-team, however the 6.0.5 release does not have it.
[29 Aug 2008 2:02]
Kevin Lewis
This fix is in version 6.0.6 The for loop in Transaction commitnoUpdates was eliminated
[8 Jan 2009 10:27]
MC Brown
A note has been added to the 6.0.6 changelog: An assertion could be raised when the dependencies on a transaction could not be released after a specified time when using FALCON tables.