Bug #35932 | Falcon crashes in Transaction::commitNoUpdates for simple SELECT queries | ||
---|---|---|---|
Submitted: | 9 Apr 2008 11:02 | Modified: | 4 Oct 2008 15:36 |
Reporter: | Olav Sandstaa | Email Updates: | |
Status: | Closed | Impact on me: | |
Category: | MySQL Server: Falcon storage engine | Severity: | S2 (Serious) |
Version: | 6.0.4 | OS: | Linux (Redhat) |
Assigned to: | Olav Sandstå | CPU Architecture: | Any |
[9 Apr 2008 11:02]
Olav Sandstaa
[28 Apr 2008 19:36]
Kevin Lewis
This may be the same as Bug#34602
[12 May 2008 19:36]
Kevin Lewis
Olav, This is probably fixed with the change to Bug#34602. Please confirm that this no longer asserts for you and close the bug.
[22 May 2008 10:04]
Olav Sandstå
I still see this crash and call stack when running many concurrent select queries. Unless I have done something wrong when pulling from mysql-6.0-falcon-team tree in bitkeeper the version I tested with was pulled from bitkeeper at about 8pm GMT on May 21st. The error written to the error file is: Attempting backtrace. You can use the following information to find out where mysqld died. If you see no messages after this, something went terribly wrong... /home/os136802/install/mysql6/bin/mysqld(print_stacktrace+0x20) [0x734be0] /home/os136802/install/mysql6/bin/mysqld(handle_segfault+0x33d) [0x5fb04d] /lib64/tls/libpthread.so.0 [0x360be0c430] /lib64/tls/libpthread.so.0(raise+0x2e) [0x360be0c2ce] /home/os136802/install/mysql6/bin/mysqld(Error::error(char const*, ...)+0xdd) [0 x7d4bed] /home/os136802/install/mysql6/bin/mysqld(Transaction::commitNoUpdates()+0x30e) [ 0x7a242e] /home/os136802/install/mysql6/bin/mysqld(Transaction::commit()+0xd2) [0x7a2572] /home/os136802/install/mysql6/bin/mysqld(Connection::commit()+0x52) [0x7bb672] /home/os136802/install/mysql6/bin/mysqld(StorageConnection::commit()+0x31) [0x78 2221] /home/os136802/install/mysql6/bin/mysqld(StorageInterface::commit(handlerton*, T HD*, bool)+0x3d) [0x77bf4d] /home/os136802/install/mysql6/bin/mysqld(ha_commit_one_phase(THD*, bool)+0x8a) [ 0x6e671a] /home/os136802/install/mysql6/bin/mysqld(ha_commit_trans(THD*, bool)+0x1f0) [0x6 e6ba0] /home/os136802/install/mysql6/bin/mysqld(end_trans(THD*, enum_mysql_completionty pe)+0x281) [0x604c91] /home/os136802/install/mysql6/bin/mysqld(mysql_execute_command(THD*)+0x33a2) [0x 60d042] /home/os136802/install/mysql6/bin/mysqld(mysql_parse(THD*, char const*, unsigned int, char const**)+0x143) [0x6116d3] /home/os136802/install/mysql6/bin/mysqld(dispatch_command(enum_server_command, T HD*, char*, unsigned int)+0x5db) [0x611d7b] /home/os136802/install/mysql6/bin/mysqld(do_command(THD*)+0xde) [0x612d3e] /home/os136802/install/mysql6/bin/mysqld(handle_one_connection+0x11f) [0x60402f] /lib64/tls/libpthread.so.0 [0x360be0610a] /lib64/tls/libc.so.6(__clone+0x73) [0x360b5c68c3]
[22 May 2008 11:59]
Hakan Küçükyılmaz
Olav, can you provide us with a reproducable test script or C/C++/Java program? It would be of great help. Best regards, Hakan
[22 May 2008 20:37]
Kevin Lewis
Olav, The only assert left in the current Transaction::commitNoUpdates is ASSERT(!deferredIndexes); What does your code look like? Which assert is it hitting?
[22 May 2008 23:00]
Olav Sandstå
Hakan, Right now the test program I run is part of Java DB's performance test. I will try to reproduce this bug with a smaller test program and then send that. Olav
[27 May 2008 13:43]
Olav Sandstå
Kevin, To answer your question: I too have only one ASSERT in the commitNoUpdate() method (but it is not this that is triggered). I added some extra code to the Error::error() method that handles the assert to give out information about where the ASSERT occurred. Based on this it seems like the ASSERT actually is in Transaction::releaseDependencies(): if (COMPARE_EXCHANGE_POINTER(&state->transaction, transaction, NULL)) { ASSERT(transaction->transactionId == state->transactionId || transaction->state == Available); transaction->releaseDependency(); I have no idea why this method is not included in the stack trace that is printed to the mysql error file.
[27 May 2008 14:30]
Vladislav Vaintroub
Just a quick comment about method not included into stacktrace - most probably this is inlining by compiler. releaseDependencies() is quite short and is a good candidate for inlining. In such cases it is quite handy to get callstacks with file/line information. But I have no idea if pstack on Solaris does this and if yes how to force it to output this info.
[27 May 2008 19:45]
Ann Harrison
If you're seeing that assert, then you probably don't have (or Jim didn't push) the change that eliminated another transaction state CommitingReadOnly which was introduced for debugging another problem and is a legal state in this situation. Rather than handling it, we agreed to removed the debugging code. Cheers, Ann
[27 May 2008 21:53]
Olav Sandstå
Ann, thanks for clarifying that the assert I see should be or will soon be removed. Unless I have done something wrong or misunderstood how bitkeeper is working I believe that I have the most recent version of the mysql-6.0-falcon-team tree. I did a bk pull last week and did a new bk pull this morning (Norwegian time). I have not quite yet found out how to determine into which tree(s) a commit actually is in.
[28 May 2008 11:05]
Olav Sandstå
I have now verified that I have the change set where the CommitReadOnly state variable was removed (ChangeSet@1.2670.3.3) in the version I run this test with. So it seems like we still have a raise condition in this area.
[28 May 2008 11:07]
Olav Sandstå
With Wlad's excellent extention to the ASSERT test I now get the following written to the MySQL error file when this crash occurs: [Falcon] Error: assertion (transaction->transactionId == state->transactionId || transaction->state == Available) failed at line 776 in file Transaction.cpp
[6 Jun 2008 10:18]
Olav Sandstå
I have reproduced the problem and got a core file from the assert. I am not sure how much we can trust what is in a core file given this is possibly a cache coherence problem by here is the data I get from the core file: (gdb) print transaction->transactionId $5 = 0 (gdb) print state->transactionId $6 = 5369787 (gdb) print transaction->state $7 = 9 If these numbers can be trusted it is the "transaction->transactionId == state->transactionId" part of the assert that fails. The transaction state is 9 (Available). A dump of the entire transaction object looks like: (gdb) print *transaction $8 = {_vptr.Transaction = 0xac7e10, connection = 0x0, database = 0x2a95ed5bc8, transactionId = 0, oldestActive = 5369782, blockedBy = 0, curSavePointId = 1, next = 0x2a960a7d30, prior = 0x2a960b99c8, waitingFor = 0x0, savePoints = 0x0, freeSavePoints = 0x0, localSavePoints = {{next = 0x0, savepoints = 0xcccccccccccccccc, backloggedRecords = 0xcccccccccccccccc, records = 0xcccccccccccccccc, id = -858993460}, {next = 0x2a960a7940, savepoints = 0xcccccccccccccccc, backloggedRecords = 0xcccccccccccccccc, records = 0xcccccccccccccccc, id = -858993460}, {next = 0x2a960a7968, savepoints = 0xcccccccccccccccc, backloggedRecords = 0xcccccccccccccccc, records = 0xcccccccccccccccc, id = -858993460}, {next = 0x2a960a7990, savepoints = 0xcccccccccccccccc, backloggedRecords = 0xcccccccccccccccc, records = 0xcccccccccccccccc, id = -858993460}, { next = 0x2a960a79b8, savepoints = 0x0, backloggedRecords = 0x0, records = 0x2a960a7cf0, id = 1}}, deferredIndexes = 0x0, thread = 0x2a962322b8, blockingRecord = 0x0, backloggedRecords = 0x0, startTime = 523, deferredIndexCount = 0, statesAllocated = 33, isolationLevel = 8, xidLength = 0, mySqlThreadId = 22, xid = 0x0, states = 0x2a962375d8, commitTriggers = false, systemTransaction = false, hasUpdates = false, writePending = false, pendingPageWrites = false, hasLocks = false, syncActive = {<SynchronizationObject> = {_vptr.SynchronizationObject = 0xac5e10}, monitorCount = 0, mutex = {<SynchronizationObject> = {_vptr.SynchronizationObject = 0xad3a50}, holder = 0x0, mutex = {__m_reserved = 0, __m_count = 0, __m_owner = 0x0, __m_kind = 0, __m_lock = {__status = 0, __spinlock = 0}}}, que = 0x0, exclusiveThread = 0x0, waiters = 0, lockState = 0, stalls = 0, objectId = 260080, sharedCount = 0, collisionCount = 0, exclusiveCount = 223646, waitCount = 0, queueLength = 0, where = 0x0, name = 0x2a96120ea0 "Transaction::syncActive"}, syncIndexes = {<SynchronizationObject> = { _vptr.SynchronizationObject = 0xac5e10}, monitorCount = 0, mutex = {<SynchronizationObject> = { _vptr.SynchronizationObject = 0xad3a50}, holder = 0x0, mutex = {__m_reserved = 0, __m_count = 0, __m_owner = 0x0, __m_kind = 0, __m_lock = {__status = 0, __spinlock = 0}}}, que = 0x0, exclusiveThread = 0x0, waiters = 0, lockState = 0, stalls = 0, objectId = 260081, sharedCount = 0, collisionCount = 0, exclusiveCount = 0, waitCount = 0, queueLength = 0, where = 0x0, name = 0x2a9622e998 "Transaction::syncIndexes"}, syncObject = {<SynchronizationObject> = {_vptr.SynchronizationObject = 0xac5e10}, monitorCount = 0, mutex = {<SynchronizationObject> = {_vptr.SynchronizationObject = 0xad3a50}, holder = 0x0, mutex = { __m_reserved = 0, __m_count = 0, __m_owner = 0x0, __m_kind = 0, __m_lock = {__status = 0, __spinlock = 0}}}, que = 0x0, exclusiveThread = 0x0, waiters = 0, lockState = 0, stalls = 4995, objectId = 260082, sharedCount = 1781161, collisionCount = 13, exclusiveCount = 447293, waitCount = 4995, queueLength = 619, where = 0xac7a23 "Transaction::commitNoUpdates", name = 0x2a96120a88 "Transaction::syncObject"}, syncSavepoints = {<SynchronizationObject> = { _vptr.SynchronizationObject = 0xac5e10}, monitorCount = 0, mutex = {<SynchronizationObject> = { _vptr.SynchronizationObject = 0xad3a50}, holder = 0x0, mutex = {__m_reserved = 0, __m_count = 0, __m_owner = 0x0, __m_kind = 0, __m_lock = {__status = 0, __spinlock = 0}}}, que = 0x0, exclusiveThread = 0x0, waiters = 0, lockState = 0, stalls = 0, objectId = 260083, sharedCount = 0, collisionCount = 0, exclusiveCount = 0, waitCount = 0, queueLength = 0, where = 0x0, name = 0x2a9622e8d8 "Transaction::syncSavepoints"}, totalRecordData = 0, totalRecords = 0, chilledRecords = 0, chilledBytes = 0, thawedRecords = 0, thawedBytes = 0, debugThawedRecords = 0, debugThawedBytes = 0, committedRecords = 0, deletedRecords = 0, chillPoint = 0x2a960a7cf0, scanIndexCount = 1, numberStates = 4, state = 9, dependencies = 1, useCount = 2, inList = 1, firstRecord = 0x0, lastRecord = 0x0}
[23 Jun 2008 23:00]
Bugs System
No feedback was provided for this bug for over a month, so it is being suspended automatically. If you are able to provide the information that was originally requested, please do so and change the status of the bug back to "Open".
[24 Jun 2008 11:12]
Philip Stoev
Setting to verified. Bugs left in state "open" expire after 30 days.
[24 Jun 2008 17:36]
Kevin Lewis
Olav, Can you try to repeat this again with the current code? The change I made for Bug#36438 may have affected this because it closed a gap in which a transaction cannot be found while it is being transferred from the active list to the committed list.
[25 Jun 2008 11:18]
Olav Sandstå
Kevin, I have repeated this test with the lastest version from the mysql-6.0-falcon tree and I still see this crash. The call stack I get is: #0 0x000000360be09737 in pthread_kill () from /lib64/tls/libpthread.so.0 (gdb) where #0 0x000000360be09737 in pthread_kill () from /lib64/tls/libpthread.so.0 #1 0x000000000073928a in write_core (sig=6) at stacktrace.c:302 #2 0x00000000005eae62 in handle_segfault (sig=6) at mysqld.cc:2635 #3 <signal handler called> #4 0x000000360be0c2ce in raise () from /lib64/tls/libpthread.so.0 #5 0x00000000007cf612 in Error::debugBreak () at Error.cpp:94 #6 0x00000000007cf59a in Error::error (string=Variable "string" is not available. ) at Error.cpp:71 #7 0x00000000007cf5fd in Error::assertionFailed (text=Variable "text" is not available. ) at Error.cpp:78 #8 0x00000000007a6a99 in Transaction::releaseDependencies (this=0x2a960cc238) at Transaction.cpp:779 #9 0x00000000007a5c50 in Transaction::commitNoUpdates (this=0x2a960cc238) at Transaction.cpp:346 #10 0x00000000007a5814 in Transaction::commit (this=0x2a960cc238) at Transaction.cpp:246 #11 0x00000000007bb7d9 in Connection::commit (this=0x2a9625e930) at Connection.cpp:269 #12 0x000000000078bc32 in StorageConnection::commit (this=0x2a9625e898) at StorageConnection.cpp:146 #13 0x0000000000785ffd in StorageInterface::commit (hton=Variable "hton" is not available. ) at ha_falcon.cpp:1152 #14 0x00000000006df8a3 in ha_commit_one_phase (thd=0x2abeb400e0, all=true) at handler.cc:1177 #15 0x00000000006df73a in ha_commit_trans (thd=0x2abeb400e0, all=true) at handler.cc:1146 #16 0x00000000005f4ca3 in end_trans (thd=0x2abeb400e0, completion=COMMIT) at sql_parse.cc:592 #17 0x00000000005fb2bb in mysql_execute_command (thd=0x2abeb400e0) at sql_parse.cc:3887 #18 0x00000000005fee7e in mysql_parse (thd=0x2abeb400e0, inBuf=0x2abfa60ba0 "commit", length=6, found_semicolon=0x4c1e05f0) at sql_parse.cc:5799 #19 0x00000000005f5a00 in dispatch_command (command=COM_QUERY, thd=0x2abeb400e0, packet=0x2abeb43281 "commit", packet_length=6) at sql_parse.cc:1058 #20 0x00000000005f4ffb in do_command (thd=0x2abeb400e0) at sql_parse.cc:731 #21 0x00000000005f3d3d in handle_one_connection (arg=Variable "arg" is not available. ) at sql_connect.cc:1134 #22 0x000000360be0610a in start_thread () from /lib64/tls/libpthread.so.0 #23 0x000000360b5c68c3 in clone () from /lib64/tls/libc.so.6 #24 0x0000000000000000 in ?? () (gdb)
[10 Jul 2008 15:22]
Kevin Lewis
Duplicate Bug#38004 was opened with this same assert.
[12 Aug 2008 13:43]
Bugs System
A patch for this bug has been committed. After review, it may be pushed to the relevant source trees for release in the next version. You can access the patch from: http://lists.mysql.com/commits/51417 2775 Olav Sandstaa 2008-08-12 Initial fix to Bug#35932 Falcon crashes in Transaction::commitNoUpdates for simple SELECT queries Changed the failing assert so that it handles that the transition of the Transaction object from Active to Available is not "atomic" when seen from a thread not having locked the transaction object.
[14 Aug 2008 11:19]
Bugs System
A patch for this bug has been committed. After review, it may be pushed to the relevant source trees for release in the next version. You can access the patch from: http://lists.mysql.com/commits/51620 2779 Olav Sandstaa 2008-08-14 Follow-up fix to Bug#35932 Falcon crashes in Transaction::commitNoUpdates for simple SELECT queries The initial assert added in commit http://lists.mysql.com/commits/51491 was too strict. Change this to only assert if the state of the transaction the dependency is pointing to is Initializing.
[14 Aug 2008 11:26]
Bugs System
A patch for this bug has been committed. After review, it may be pushed to the relevant source trees for release in the next version. You can access the patch from: http://lists.mysql.com/commits/51622 2779 Olav Sandstaa 2008-08-14 Follow-up fix to Bug#35932 Falcon crashes in Transaction::commitNoUpdates for simple SELECT queries The initial assert added in commit http://lists.mysql.com/commits/51491 was too strict. Change this to only assert if the state of the transaction the dependency is pointing to is Initializing.
[27 Aug 2008 7:33]
Xuekun Hu
I replied at bug#36410 also. I met the same call stack with sysbench oltp simple test with older mysql 6.0 version. Now I tested again at mysql-6.0-faclon (revno 2779) version on three different machines (all are quad-core 2-socket system). The crash seems like to be disappeared, but a new scene is happened. On one machine, sysbench oltp simple test works fine, whatever with low or high concurrent threads runnings. However on other two machines, with higher concurrent threads, mysql will restart automaticly, without any error messages, no call stack. You can only see the below message on the screen. "./bin/mysqld_safe: line 133: 6866 Aborted nohup /home/mysql/mysql/libexec/mysqld --basedir=/home/mysql/mysql/ --datadir=/home/mysql/database/var --user=mysql --skip-grant-tables --max_connections=3000 --table_cache=10240 --log-error=/home/mysql/database/var/perf-alcolu.err --pid-file=/home/mysql/database/var/perf-alcolu.pid --socket=/tmp/mysql.sock --port=3306 </dev/null >>/home/mysql/database/var/perf-alcolu.err 2>&1 >>/home/mysql/database/var/perf-alcolu.err 2>&1"
[28 Aug 2008 20:15]
Bugs System
Pushed into 6.0.7-alpha (revid:cbell@mysql.com-20080822132131-uveo6wiuecy6m2b8) (version source revid:cbell@mysql.com-20080822132131-uveo6wiuecy6m2b8) (pib:3)
[28 Aug 2008 22:45]
Kevin Lewis
This bug is 'fixed' by changing the assert so that it accounts for known concurrency issues in commitNoUpdates. There is a follow-on assert tracked by Bug#38739 that can still happen. The concurrency problems with releasing dependencies will be fixed by a re-implementation of transaction dependencies, which will be done under Bug#38739.
[13 Sep 2008 19:58]
Bugs System
Pushed into 6.0.6-alpha (revid:olav@sun.com-20080814111842-7229caxvwxzrgn8t) (version source revid:sergefp@mysql.com-20080611231653-nmuqmw6dedjra79i) (pib:3)
[4 Oct 2008 15:36]
Jon Stephens
Documented bugfix in the 6.0.6 changelog as follows: Multiple threads executing repeated queries on the same Falcon table led eventually to a crash of the server.