Bug #35932 Falcon crashes in Transaction::commitNoUpdates for simple SELECT queries
Submitted: 9 Apr 2008 11:02 Modified: 4 Oct 2008 15:36
Reporter: Olav Sandstaa Email Updates:
Status: Closed Impact on me:
None 
Category:MySQL Server: Falcon storage engine Severity:S2 (Serious)
Version:6.0.4 OS:Linux (Redhat)
Assigned to: Olav Sandstå CPU Architecture:Any
Triage: D1 (Critical)

[9 Apr 2008 11:02] Olav Sandstaa
Description:
With multiple client connections/threads (more than 10) where each thread is running a simple query where it reads a random record from a table Falcon eventually crashes due to "mysqld got signal 4" (Illegal instruction). The following is written to the error file:

===========================================================================

080404 12:53:25 - mysqld got signal 4 ;
This could be because you hit a bug. It is also possible that this binary
or one of the libraries it was linked against is corrupt, improperly built,
or misconfigured. This error can also be caused by malfunctioning hardware.
We will try our best to scrape up some info that will hopefully help diagnose
the problem, but since we have already crashed, something is definitely wrong
and this may fail.

key_buffer_size=8388600
read_buffer_size=131072
max_used_connections=18
max_threads=151
threads_connected=18
It is possible that mysqld could use up to 
key_buffer_size + (read_buffer_size + sort_buffer_size)*max_threads = 338303 K
bytes of memory
Hope that's ok; if not, decrease some variables in the equation.

thd: 0x174b610
Attempting backtrace. You can use the following information to find out
where mysqld died. If you see no messages after this, something went
terribly wrong...
/home/os136802/install/mysql6/bin/mysqld(print_stacktrace+0x20) [0x72f480]
/home/os136802/install/mysql6/bin/mysqld(handle_segfault+0x324) [0x5f8804]
/lib64/tls/libpthread.so.0 [0x360be0c430]
/lib64/tls/libpthread.so.0(raise+0x2e) [0x360be0c2ce]
/home/os136802/install/mysql6/bin/mysqld(Error::error(char const*, ...)+0xdd) [0
x7ce91d]
/home/os136802/install/mysql6/bin/mysqld(Transaction::commitNoUpdates()+0x28d) [
0x79c18d]
/home/os136802/install/mysql6/bin/mysqld(Transaction::commit()+0xd2) [0x79c3d2]
/home/os136802/install/mysql6/bin/mysqld(Connection::commit()+0x52) [0x7b5542]
/home/os136802/install/mysql6/bin/mysqld(StorageConnection::commit()+0x31) [0x77
cfb1]
/home/os136802/install/mysql6/bin/mysqld(StorageHandler::commit(THD*)+0x99) [0x7
80b99]
/home/os136802/install/mysql6/bin/mysqld(StorageInterface::commit(handlerton*, T
HD*, bool)+0x98) [0x7764e8]
/home/os136802/install/mysql6/bin/mysqld(ha_commit_one_phase(THD*, bool)+0x8a) [
0x6e129a]
/home/os136802/install/mysql6/bin/mysqld(ha_commit_trans(THD*, bool)+0x205) [0x6
e1735]
/home/os136802/install/mysql6/bin/mysqld(end_trans(THD*, enum_mysql_completionty
pe)+0x281) [0x6024a1]
/home/os136802/install/mysql6/bin/mysqld(mysql_execute_command(THD*)+0x3419) [0x
60a9a9]
/home/os136802/install/mysql6/bin/mysqld(mysql_parse(THD*, char const*, unsigned
 int, char const**)+0x143) [0x60efd3]
/home/os136802/install/mysql6/bin/mysqld(dispatch_command(enum_server_command, T
HD*, char*, unsigned int)+0xe73) [0x60ff13]
/home/os136802/install/mysql6/bin/mysqld(do_command(THD*)+0xc8) [0x6105c8]
/home/os136802/install/mysql6/bin/mysqld(handle_one_connection+0x11f) [0x60183f]
/lib64/tls/libpthread.so.0 [0x360be0610a]
/lib64/tls/libc.so.6(__clone+0x73) [0x360b5c68c3]
Trying to get some variables.
Some pointers may be invalid and cause the dump to abort...
thd->query at 0x1879560 = commit
thd->thread_id=16
thd->killed=NOT_KILLED
The manual page at http://dev.mysql.com/doc/mysql/en/crashing.html contains
information that should help you find out what is causing the crash.
pure virtual method called
pure virtual method called
pure virtual method called
pure virtual method called
pure virtual method called
pure virtual method called
pure virtual method called
pure virtual method called
pure virtual method called
pure virtual method called
terminate called without an active exception
terminate called without an active exception
pure virtual method called
terminate called recursively
pure virtual method called
pure virtual method called
terminate called recursively
Fatal signal pure virtual method called
terminate called recursively
pure virtual method called
terminate called recursively
080404 12:53:25 mysqld_safe Number of processes running now: 0
080404 12:53:25 mysqld_safe mysqld restarted

======================================================================

This is repeatable with MySQL 6.0.4 and with the source code check out from both the mysql-6.0 and mysql-6.0-falcon trees (check-out on April 4th 2008). The above error log was produced using the source from mysql-6.0-falcon.

How to repeat:
An overview of what the program causing the problem does is (cut and paste from Java program):

1. Create a single table:

   CREATE TABLE simple id INTEGER, sec_id INTEGER, data_fld CHARACTER(100),PRIMARY KEY(id))

2. Insert 100.000 records into this table

3. Have N threads (where N > 15?) where each thread do a simple select of one of the records in the table:

    SELECT data_fld FROM " + TABLE_NAME + " WHERE id = ? 
    COMMIT;

After some minutes (normally less than five) the crash occurs
[28 Apr 2008 19:36] Kevin Lewis
This may be the same as Bug#34602
[12 May 2008 19:36] Kevin Lewis
Olav, This is probably fixed with the change to Bug#34602.  Please confirm that this no longer asserts for you and close the bug.
[22 May 2008 10:04] Olav Sandstå
I still see this crash and call stack when running many concurrent select queries. Unless I have done something wrong when pulling from mysql-6.0-falcon-team tree in bitkeeper the version I tested with was pulled from bitkeeper at about 8pm GMT on May 21st.

The error written to the error file is:

Attempting backtrace. You can use the following information to find out
where mysqld died. If you see no messages after this, something went
terribly wrong...
/home/os136802/install/mysql6/bin/mysqld(print_stacktrace+0x20) [0x734be0]
/home/os136802/install/mysql6/bin/mysqld(handle_segfault+0x33d) [0x5fb04d]
/lib64/tls/libpthread.so.0 [0x360be0c430]
/lib64/tls/libpthread.so.0(raise+0x2e) [0x360be0c2ce]
/home/os136802/install/mysql6/bin/mysqld(Error::error(char const*, ...)+0xdd) [0
x7d4bed]
/home/os136802/install/mysql6/bin/mysqld(Transaction::commitNoUpdates()+0x30e) [
0x7a242e]
/home/os136802/install/mysql6/bin/mysqld(Transaction::commit()+0xd2) [0x7a2572]
/home/os136802/install/mysql6/bin/mysqld(Connection::commit()+0x52) [0x7bb672]
/home/os136802/install/mysql6/bin/mysqld(StorageConnection::commit()+0x31) [0x78
2221]
/home/os136802/install/mysql6/bin/mysqld(StorageInterface::commit(handlerton*, T
HD*, bool)+0x3d) [0x77bf4d]
/home/os136802/install/mysql6/bin/mysqld(ha_commit_one_phase(THD*, bool)+0x8a) [
0x6e671a]
/home/os136802/install/mysql6/bin/mysqld(ha_commit_trans(THD*, bool)+0x1f0) [0x6
e6ba0]
/home/os136802/install/mysql6/bin/mysqld(end_trans(THD*, enum_mysql_completionty
pe)+0x281) [0x604c91]
/home/os136802/install/mysql6/bin/mysqld(mysql_execute_command(THD*)+0x33a2) [0x
60d042]
/home/os136802/install/mysql6/bin/mysqld(mysql_parse(THD*, char const*, unsigned
 int, char const**)+0x143) [0x6116d3]
/home/os136802/install/mysql6/bin/mysqld(dispatch_command(enum_server_command, T
HD*, char*, unsigned int)+0x5db) [0x611d7b]
/home/os136802/install/mysql6/bin/mysqld(do_command(THD*)+0xde) [0x612d3e]
/home/os136802/install/mysql6/bin/mysqld(handle_one_connection+0x11f) [0x60402f]
/lib64/tls/libpthread.so.0 [0x360be0610a]
/lib64/tls/libc.so.6(__clone+0x73) [0x360b5c68c3]
[22 May 2008 11:59] Hakan Küçükyılmaz
Olav,

can you provide us with a reproducable test script or C/C++/Java program? It would be of great help.

Best regards,

Hakan
[22 May 2008 20:37] Kevin Lewis
Olav, The only assert left in the current Transaction::commitNoUpdates is
ASSERT(!deferredIndexes);
What does your code look like?  Which assert is it hitting?
[22 May 2008 23:00] Olav Sandstå
Hakan,

Right now the test program I run is part of Java DB's performance test. I will try to reproduce this bug with a smaller test program and then send that. 

Olav
[27 May 2008 13:43] Olav Sandstå
Kevin,

To answer your question: I too have only one ASSERT in the commitNoUpdate() method (but it is not this that is triggered).

I added some extra code to the Error::error() method that handles the assert to give out information about where the ASSERT occurred. Based on this it seems like the ASSERT actually is in Transaction::releaseDependencies():

			if (COMPARE_EXCHANGE_POINTER(&state->transaction, transaction, NULL))
				{
				ASSERT(transaction->transactionId == state->transactionId || transaction->state == Available);
				transaction->releaseDependency();

I have no idea why this method is not included in the stack trace that is printed to the mysql error file.
[27 May 2008 14:30] Vladislav Vaintroub
Just a quick comment about method not included into stacktrace - most probably
this is inlining by compiler. releaseDependencies() is quite short and is a good candidate for inlining. In such cases it is quite handy to get callstacks with file/line information. But I have no idea if pstack on Solaris  does this and if yes how to force it to output this info.
[27 May 2008 19:45] Ann Harrison
If you're seeing that assert, then you probably don't have (or Jim didn't
push) the change that eliminated another transaction state CommitingReadOnly
which was introduced for debugging another problem and is a legal state in
this situation.  Rather than handling it, we agreed to removed the debugging
code.

Cheers,

Ann
[27 May 2008 21:53] Olav Sandstå
Ann, thanks for clarifying that the assert I see should be or will soon be removed.

Unless I have done something wrong or misunderstood how bitkeeper is working I believe that I have the most recent version of the mysql-6.0-falcon-team tree. I did a bk pull last week and did a new bk pull this morning (Norwegian time). I have not quite yet found out how to determine into which tree(s) a commit actually is in.
[28 May 2008 11:05] Olav Sandstå
I have now verified that I have the change set where the CommitReadOnly state variable was removed (ChangeSet@1.2670.3.3) in the version I run this test with. So it seems like we still have a raise condition in this area.
[28 May 2008 11:07] Olav Sandstå
With Wlad's excellent extention to the ASSERT test I now get the following written to the MySQL error file when this crash occurs:

[Falcon] Error: assertion (transaction->transactionId == state->transactionId || transaction->state == Available) failed at line 776 in file Transaction.cpp
[6 Jun 2008 10:18] Olav Sandstå
I have reproduced the problem and got a core file from the assert. I am not sure how much we can trust what is in a core file given this is possibly a cache coherence problem by here is the data I get from the core file:

(gdb) print transaction->transactionId
$5 = 0
(gdb) print state->transactionId
$6 = 5369787
(gdb) print transaction->state
$7 = 9

If these numbers can be trusted it is the "transaction->transactionId == state->transactionId" part of the assert that fails. The transaction state is 9 (Available).

A dump of the entire transaction object looks like:

(gdb) print *transaction
$8 = {_vptr.Transaction = 0xac7e10, connection = 0x0, database = 0x2a95ed5bc8, transactionId = 0, oldestActive = 5369782, blockedBy = 0,
 curSavePointId = 1, next = 0x2a960a7d30, prior = 0x2a960b99c8, waitingFor = 0x0, savePoints = 0x0, freeSavePoints = 0x0,
 localSavePoints = {{next = 0x0, savepoints = 0xcccccccccccccccc, backloggedRecords = 0xcccccccccccccccc, records = 0xcccccccccccccccc,
     id = -858993460}, {next = 0x2a960a7940, savepoints = 0xcccccccccccccccc, backloggedRecords = 0xcccccccccccccccc,
     records = 0xcccccccccccccccc, id = -858993460}, {next = 0x2a960a7968, savepoints = 0xcccccccccccccccc,
     backloggedRecords = 0xcccccccccccccccc, records = 0xcccccccccccccccc, id = -858993460}, {next = 0x2a960a7990,
     savepoints = 0xcccccccccccccccc, backloggedRecords = 0xcccccccccccccccc, records = 0xcccccccccccccccc, id = -858993460}, {
     next = 0x2a960a79b8, savepoints = 0x0, backloggedRecords = 0x0, records = 0x2a960a7cf0, id = 1}}, deferredIndexes = 0x0,
 thread = 0x2a962322b8, blockingRecord = 0x0, backloggedRecords = 0x0, startTime = 523, deferredIndexCount = 0, statesAllocated = 33,
 isolationLevel = 8, xidLength = 0, mySqlThreadId = 22, xid = 0x0, states = 0x2a962375d8, commitTriggers = false,
 systemTransaction = false, hasUpdates = false, writePending = false, pendingPageWrites = false, hasLocks = false,
 syncActive = {<SynchronizationObject> = {_vptr.SynchronizationObject = 0xac5e10}, monitorCount = 0,
   mutex = {<SynchronizationObject> = {_vptr.SynchronizationObject = 0xad3a50}, holder = 0x0, mutex = {__m_reserved = 0, __m_count = 0,
       __m_owner = 0x0, __m_kind = 0, __m_lock = {__status = 0, __spinlock = 0}}}, que = 0x0, exclusiveThread = 0x0, waiters = 0,
   lockState = 0, stalls = 0, objectId = 260080, sharedCount = 0, collisionCount = 0, exclusiveCount = 223646, waitCount = 0,
   queueLength = 0, where = 0x0, name = 0x2a96120ea0 "Transaction::syncActive"}, syncIndexes = {<SynchronizationObject> = {
     _vptr.SynchronizationObject = 0xac5e10}, monitorCount = 0, mutex = {<SynchronizationObject> = {
       _vptr.SynchronizationObject = 0xad3a50}, holder = 0x0, mutex = {__m_reserved = 0, __m_count = 0, __m_owner = 0x0, __m_kind = 0,
       __m_lock = {__status = 0, __spinlock = 0}}}, que = 0x0, exclusiveThread = 0x0, waiters = 0, lockState = 0, stalls = 0,
   objectId = 260081, sharedCount = 0, collisionCount = 0, exclusiveCount = 0, waitCount = 0, queueLength = 0, where = 0x0,
   name = 0x2a9622e998 "Transaction::syncIndexes"}, syncObject = {<SynchronizationObject> = {_vptr.SynchronizationObject = 0xac5e10},
   monitorCount = 0, mutex = {<SynchronizationObject> = {_vptr.SynchronizationObject = 0xad3a50}, holder = 0x0, mutex = {
       __m_reserved = 0, __m_count = 0, __m_owner = 0x0, __m_kind = 0, __m_lock = {__status = 0, __spinlock = 0}}}, que = 0x0,
   exclusiveThread = 0x0, waiters = 0, lockState = 0, stalls = 4995, objectId = 260082, sharedCount = 1781161, collisionCount = 13,
   exclusiveCount = 447293, waitCount = 4995, queueLength = 619, where = 0xac7a23 "Transaction::commitNoUpdates",
   name = 0x2a96120a88 "Transaction::syncObject"}, syncSavepoints = {<SynchronizationObject> = {
     _vptr.SynchronizationObject = 0xac5e10}, monitorCount = 0, mutex = {<SynchronizationObject> = {
       _vptr.SynchronizationObject = 0xad3a50}, holder = 0x0, mutex = {__m_reserved = 0, __m_count = 0, __m_owner = 0x0, __m_kind = 0,
       __m_lock = {__status = 0, __spinlock = 0}}}, que = 0x0, exclusiveThread = 0x0, waiters = 0, lockState = 0, stalls = 0,
   objectId = 260083, sharedCount = 0, collisionCount = 0, exclusiveCount = 0, waitCount = 0, queueLength = 0, where = 0x0,
   name = 0x2a9622e8d8 "Transaction::syncSavepoints"}, totalRecordData = 0, totalRecords = 0, chilledRecords = 0, chilledBytes = 0,
 thawedRecords = 0, thawedBytes = 0, debugThawedRecords = 0, debugThawedBytes = 0, committedRecords = 0, deletedRecords = 0,
 chillPoint = 0x2a960a7cf0, scanIndexCount = 1, numberStates = 4, state = 9, dependencies = 1, useCount = 2, inList = 1,
 firstRecord = 0x0, lastRecord = 0x0}
[23 Jun 2008 23:00] Bugs System
No feedback was provided for this bug for over a month, so it is
being suspended automatically. If you are able to provide the
information that was originally requested, please do so and change
the status of the bug back to "Open".
[24 Jun 2008 11:12] Philip Stoev
Setting to verified. Bugs left in state "open" expire after 30 days.
[24 Jun 2008 17:36] Kevin Lewis
Olav,  Can you try to repeat this again with the current code?  The change I made for Bug#36438 may have affected this because it closed a gap in which a transaction cannot be found while it is being transferred from the active list to the committed list.
[25 Jun 2008 11:18] Olav Sandstå
Kevin, I have repeated this test with the lastest version from the mysql-6.0-falcon tree and I still see this crash.

The call stack I get is:

#0  0x000000360be09737 in pthread_kill () from /lib64/tls/libpthread.so.0
(gdb) where
#0  0x000000360be09737 in pthread_kill () from /lib64/tls/libpthread.so.0
#1  0x000000000073928a in write_core (sig=6) at stacktrace.c:302
#2  0x00000000005eae62 in handle_segfault (sig=6) at mysqld.cc:2635
#3  <signal handler called>
#4  0x000000360be0c2ce in raise () from /lib64/tls/libpthread.so.0
#5  0x00000000007cf612 in Error::debugBreak () at Error.cpp:94
#6  0x00000000007cf59a in Error::error (string=Variable "string" is not available.
) at Error.cpp:71
#7  0x00000000007cf5fd in Error::assertionFailed (text=Variable "text" is not available.
) at Error.cpp:78
#8  0x00000000007a6a99 in Transaction::releaseDependencies (this=0x2a960cc238)
    at Transaction.cpp:779
#9  0x00000000007a5c50 in Transaction::commitNoUpdates (this=0x2a960cc238)
    at Transaction.cpp:346
#10 0x00000000007a5814 in Transaction::commit (this=0x2a960cc238)
    at Transaction.cpp:246
#11 0x00000000007bb7d9 in Connection::commit (this=0x2a9625e930)
    at Connection.cpp:269
#12 0x000000000078bc32 in StorageConnection::commit (this=0x2a9625e898)
    at StorageConnection.cpp:146
#13 0x0000000000785ffd in StorageInterface::commit (hton=Variable "hton" is not available.
) at ha_falcon.cpp:1152
#14 0x00000000006df8a3 in ha_commit_one_phase (thd=0x2abeb400e0, all=true)
    at handler.cc:1177
#15 0x00000000006df73a in ha_commit_trans (thd=0x2abeb400e0, all=true)
    at handler.cc:1146
#16 0x00000000005f4ca3 in end_trans (thd=0x2abeb400e0, completion=COMMIT)
    at sql_parse.cc:592
#17 0x00000000005fb2bb in mysql_execute_command (thd=0x2abeb400e0)
    at sql_parse.cc:3887
#18 0x00000000005fee7e in mysql_parse (thd=0x2abeb400e0,
    inBuf=0x2abfa60ba0 "commit", length=6, found_semicolon=0x4c1e05f0)
    at sql_parse.cc:5799
#19 0x00000000005f5a00 in dispatch_command (command=COM_QUERY, thd=0x2abeb400e0,
    packet=0x2abeb43281 "commit", packet_length=6) at sql_parse.cc:1058
#20 0x00000000005f4ffb in do_command (thd=0x2abeb400e0) at sql_parse.cc:731
#21 0x00000000005f3d3d in handle_one_connection (arg=Variable "arg" is not available.
) at sql_connect.cc:1134
#22 0x000000360be0610a in start_thread () from /lib64/tls/libpthread.so.0
#23 0x000000360b5c68c3 in clone () from /lib64/tls/libc.so.6
#24 0x0000000000000000 in ?? ()
(gdb)
[10 Jul 2008 15:22] Kevin Lewis
Duplicate Bug#38004 was opened with this same assert.
[12 Aug 2008 13:43] Bugs System
A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/51417

2775 Olav Sandstaa	2008-08-12
      Initial fix to Bug#35932 Falcon crashes in Transaction::commitNoUpdates for simple SELECT queries
      
      Changed the failing assert so that it handles that the transition of the Transaction object from Active
      to Available is not "atomic" when seen from a thread not having locked the transaction object.
[14 Aug 2008 11:19] Bugs System
A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/51620

2779 Olav Sandstaa	2008-08-14
      Follow-up fix to Bug#35932 Falcon crashes in Transaction::commitNoUpdates for simple SELECT queries
            
      The initial assert added in commit http://lists.mysql.com/commits/51491 was too strict. Change this to only
      assert if the state of the transaction the dependency is pointing to is Initializing.
[14 Aug 2008 11:26] Bugs System
A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/51622

2779 Olav Sandstaa	2008-08-14
      Follow-up fix to Bug#35932 Falcon crashes in Transaction::commitNoUpdates for simple SELECT queries
            
      The initial assert added in commit http://lists.mysql.com/commits/51491 was too strict. Change this to only
      assert if the state of the transaction the dependency is pointing to is Initializing.
[27 Aug 2008 7:33] Xuekun Hu
I replied at bug#36410 also.

I met the same call stack with sysbench oltp simple test with older mysql 6.0 version. Now
I tested again at mysql-6.0-faclon (revno 2779) version on three different machines (all
are quad-core 2-socket system).  The crash seems like to be disappeared, but a new scene
is happened. On one machine, sysbench oltp simple test works fine, whatever with low or
high concurrent threads runnings. However on other two machines, with higher concurrent
threads, mysql will restart automaticly, without any error messages, no call stack. You
can only see the below message on the screen. 

"./bin/mysqld_safe: line 133:  6866 Aborted                 nohup
/home/mysql/mysql/libexec/mysqld --basedir=/home/mysql/mysql/
--datadir=/home/mysql/database/var --user=mysql --skip-grant-tables --max_connections=3000
--table_cache=10240 --log-error=/home/mysql/database/var/perf-alcolu.err
--pid-file=/home/mysql/database/var/perf-alcolu.pid --socket=/tmp/mysql.sock --port=3306
</dev/null >>/home/mysql/database/var/perf-alcolu.err 2>&1
>>/home/mysql/database/var/perf-alcolu.err 2>&1"
[28 Aug 2008 20:15] Bugs System
Pushed into 6.0.7-alpha  (revid:cbell@mysql.com-20080822132131-uveo6wiuecy6m2b8) (version source revid:cbell@mysql.com-20080822132131-uveo6wiuecy6m2b8) (pib:3)
[28 Aug 2008 22:45] Kevin Lewis
This bug is 'fixed' by changing the assert so that it accounts for known concurrency issues in commitNoUpdates.  There is a follow-on assert tracked by Bug#38739 that can still happen.  The concurrency problems with releasing dependencies will be fixed by a re-implementation of transaction dependencies, which will be done under Bug#38739.
[13 Sep 2008 19:58] Bugs System
Pushed into 6.0.6-alpha  (revid:olav@sun.com-20080814111842-7229caxvwxzrgn8t) (version source revid:sergefp@mysql.com-20080611231653-nmuqmw6dedjra79i) (pib:3)
[4 Oct 2008 15:36] Jon Stephens
Documented bugfix in the 6.0.6 changelog as follows:

        Multiple threads executing repeated queries on the same Falcon table led
        eventually to a crash of the server.