Bug #40155 [Falcon] Error: assertion (ret == 0) failed at line 80 in file Mutex.cpp
Submitted: 19 Oct 2008 19:28 Modified: 15 May 2009 17:04
Reporter: Sven Sandberg Email Updates:
Status: Closed Impact on me:
None 
Category:MySQL Server: Falcon storage engine Severity:S2 (Serious)
Version:6.0-rpl OS:Any
Assigned to: Olav Sandstå CPU Architecture:Any
Tags: assertion, disk full, F_ERROR HANDLING, Memory, pushbuild, sporadic, test failure, widespread
Triage: Triaged: D1 (Critical)

[19 Oct 2008 19:28] Sven Sandberg
Description:
Sporadic pushbuild failure happened in various tests in 6.0-rpl on sol10-amd64-a:

rpl.rpl_truncate_3innodb                 [ fail ]

CURRENT_TEST: rpl.rpl_truncate_3innodb
[Falcon] Error: assertion (ret == 0) failed at line 80 in file Mutex.cpp

081004  3:48:04 - mysqld got signal 6 ;
This could be because you hit a bug. It is also possible that this binary
or one of the libraries it was linked against is corrupt, improperly built,
or misconfigured. This error can also be caused by malfunctioning hardware.
We will try our best to scrape up some info that will hopefully help diagnose
the problem, but since we have already crashed, something is definitely wrong
and this may fail.

key_buffer_size=1048576
read_buffer_size=131072
max_used_connections=0
max_threads=20
thread_count=0
connection_count=0
It is possible that mysqld could use up to 
key_buffer_size + (read_buffer_size + sort_buffer_size)*max_threads = 10324 K
bytes of memory
Hope that's ok; if not, decrease some variables in the equation.

Writing a core file

Failed to start mysqld.1

 - saving '/tmp/var-ps_stm_threadpool-6/2/log/rpl.rpl_truncate_3innodb/' to '/tmp/var-ps_stm_threadpool-6/log/rpl.rpl_truncate_3innodb/'
 - found 'core' (1/5)

Retrying test, attempt(2/3)...

fork failed sleep 1 second and redo: Not enough space at lib/My/SafeProcess/Base.pm line 76.
Can't call method "value" on unblessed reference at ./mysql-test-run.pl line 4159.
rpl.rpl_spec_variables                   [ skip ]. Test requires: 'true'

Retrying test, attempt(3/3)...

How to repeat:
E.g., https://intranet.mysql.com/secure/pushbuild/showpush.pl?dir=bzr_mysql-6.0-rpl&order=80 sol10-amd64-a/ps_stm_threadpool
xref: http://tinyurl.com/5836eh
[19 Oct 2008 19:28] Sven Sandberg
See also BUG#40133, BUG#40114, BUG#40109, BUG#38970, and BUG#37779.
[19 Oct 2008 19:37] Sven Sandberg
The crashes happened in only two different pushes. Both those pushes it seems that the host's disk was nearly full. So it's probably related to that.
[20 Oct 2008 15:26] Kevin Lewis
Olav, Assigning this to you for invetigation since you are currently working on what happens when a disk full error occurs.
[27 Oct 2008 20:51] Olav Sandstå
It also seems that the server running this test was "out of memory" at the time this failure happened (taken from /var/adm/messages on sol10-amd64-a):

Oct  4 02:47:31 sol10-amd64-a tmpfs: [ID 518458 kern.warning] WARNING: /tmp: Fil
e system full, swap space limit exceeded
Oct  4 02:52:49 sol10-amd64-a last message repeated 5556 times

(if I have managed to correlate time stamps correctly).

The assert that is hit is in Mutex::lock():

	int ret = pthread_mutex_lock (&mutex);
	ASSERT(ret == 0);

Given that we have no core file and no call stack I expect it may be hard to reproduce this exact crash.

Possible things that can be done:

1. Instrument the code so that it writes something to stderr when this happens again (but no guarantee that this will ever be written if the system is out of memory)

2. Start testing Falcon on a system where we "simulates" out-of-memory by filling up the swap space - and see what happens.
[12 Feb 2009 12:30] Bugs System
A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/66031

3014 Olav Sandstaa	2009-02-12
      Bug #40155 [Falcon] Error: assertion (ret == 0) failed at line 80 in file Mutex.cpp
      
      Adding code for writing the error status from the call to pthread_mutex_lock to
      the mysql log file when this error occurs. This does not fix the problem but
      will help understanding why this assert happens.
[12 Feb 2009 16:51] Kevin Lewis
We cannot do anything about machines with not enough memory.  I think we should fix this bug with this patch.
[16 Feb 2009 9:50] Olav Sandstå
With the extra information provided by the latest patch the output when this crash occurs is:

[Falcon] Error: Mutex::lock: pthread_mutex_lock returned errno 45
[Falcon] Error: assertion (ret == 0) failed at line 92 in file Mutex.cpp

On Solaris error number 45 is:

#define EDEADLK 45      /* Deadlock condition.                  */

and the man page for pthread_mutex_lock says:

"The  pthread_mutex_lock() function may fail if:

     EDEADLK         The current thread already owns the mutex."

The question then is: Does Falcon actually try to lock the mutex twice or is this a consequence of an "out-of-error situation"?
[23 Feb 2009 22:43] Bugs System
A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/67267

3033 Olav Sandstaa	2009-02-23
      Bug #40155 Error: assertion (ret == 0) failed at line 80 in file Mutex.cpp
      
      Add a location name to every Mutex object. The purpose is to be able to
      know which mutex an error occurs on.
[24 Feb 2009 14:04] Philip Stoev
Here is a backtrace for this assertion obtained with the RQG.

#7  0x00000000009774c7 in Error::assertionFailed (text=0xe4ec52 "ret == 0",
    fileName=0xe4ec48 "Mutex.cpp", line=92) at Error.cpp:78
#8  0x000000000097bb8f in Mutex::lock (this=0x2aaabc454d70) at Mutex.cpp:92
#9  0x000000000099102a in SyncObject::grantLocks (this=0x2aaabc454d60)
    at SyncObject.cpp:839
#10 0x0000000000991499 in SyncObject::unlock (this=0x2aaabc454d60,
    sync=0x41a6b2f0, type=Exclusive) at SyncObject.cpp:536
#11 0x00000000009907c5 in ~Sync (this=0x41a6b2f0) at Sync.cpp:51
#12 0x00000000009aaaf2 in Transaction::commitNoUpdates (this=0x2aaabc454bf8)
    at Transaction.cpp:353
#13 0x00000000009aac5f in Transaction::commit (this=0x2aaabc454bf8)
    at Transaction.cpp:212
#14 0x00000000009c4446 in Connection::commit (this=0x2aaaaae37378)
    at Connection.cpp:272
#15 0x000000000097ec85 in StorageConnection::commit (this=0x2aaaaae372e0)
    at StorageConnection.cpp:150
#16 0x000000000096f191 in StorageInterface::commit (hton=0x15d5c620,
    thd=0x16aac9b0, all=true) at ha_falcon.cpp:1258
#17 0x000000000081a59b in ha_commit_one_phase (thd=0x16aac9b0, all=true)
    at handler.cc:1223
#18 0x000000000081aaf2 in ha_commit_trans (thd=0x16aac9b0, all=true)
    at handler.cc:1191
#19 0x00000000008ebe34 in trans_commit (thd=0x16aac9b0) at transaction.cc:132
#20 0x00000000006d48a8 in mysql_execute_command (thd=0x16aac9b0)
    at sql_parse.cc:4035
#21 0x00000000006d6af9 in mysql_parse (thd=0x16aac9b0,
    inBuf=0x16ac7e38 "COMMIT", length=6, found_semicolon=0x41a6cf20)
    at sql_parse.cc:5752
#22 0x00000000006d7c8b in dispatch_command (command=COM_QUERY, thd=0x16aac9b0,
    packet=0x16ab7cb1 " COMMIT", packet_length=7) at sql_parse.cc:1009
#23 0x00000000006d911e in do_command (thd=0x16aac9b0) at sql_parse.cc:691
#24 0x00000000006c65da in handle_one_connection (arg=0x16aac9b0)
[24 Feb 2009 14:17] Philip Stoev
The previous callstack was accompanied by the following message in the error log:

[Falcon] Error: Mutex::lock: pthread_mutex_lock returned errno 22
[2 Mar 2009 14:12] Bugs System
Pushed into 6.0.11-alpha (revid:alik@sun.com-20090302140208-lfdejjbcyezlhhjt) (version source revid:olav@sun.com-20090223224236-skt6frq2ad6ub61g) (merge vers: 6.0.11-alpha) (pib:6)
[2 Mar 2009 14:53] Bugs System
A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/68000

3043 Olav Sandstaa	2009-03-02
      Fix to Bug #40155 Error: assertion (ret == 0) failed at line 80 in file Mutex.cpp
      
      This crash was due to the Memory manager trying to lock the same mutex twice. It happened
      when the server run out of memory. The memory manager handles this by throwing an exception.
      Unfortunatly, when creating this exception there there was call both the Log and the
      JString class that did allocate memory leading to the second call to the memory manager's
      mutex.
      
      This patch fixes this by:
      
       -SQLError exception do not longer call the Log if the error code is out-of-memory
       -SQLError use an new method on JString that does not allocate memory for the text
      but just stores the pointer to the existing string
      
      In addition the memory manager now writes a message directly to stderr when it runs out
      of memory.
[2 Mar 2009 17:44] Kevin Lewis
Patch looks OK.  But I agree with Vlad that the bool staticString should be isStaticString or isStatic or isReadonly or isConst.
[5 Mar 2009 13:11] Bugs System
A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/68376

3051 Olav Sandstaa	2009-03-05
      Fix to Bug #40155 Error: assertion (ret == 0) failed at line 80 in file Mutex.cpp
            
      This crash was due to the Memory manager trying to lock the same mutex twice. It happened
      when the server run out of memory. The memory manager handles this by throwing an exception.
      Unfortunatly, when creating this exception there there was call both the JString and the
      Log class that did allocate memory leading to the second call to the memory manager's
      mutex.
            
      This patch fixes this by:
            
      -SQLError use an new method on JString that does not allocate memory for the text
       but just stores the pointer to the existing string
      -Simplified SQLError's use of the Log if the error is out-of-memory
     @ storage/falcon/JString.cpp
        Support for using an external string instead of copying the string to an internal buffer. 
        This is done to avoid having to allocate space in the JString object in situations where
        Falcon have run out of memory.
     @ storage/falcon/JString.h
        Add a method for setting the JSTring's string to use an externally allocated string
     @ storage/falcon/SQLError.cpp
        Special handle the OUT_OF_MEMORY_ERROR code so that this is handled without allocating memory.
     @ storage/falcon/SQLError.h
        Add the sql error code as a paramter to the error method.
[2 Apr 2009 17:38] Bugs System
Pushed into 6.0.11-alpha (revid:hky@sun.com-20090402144811-yc5kp8g0rjnhz7vy) (version source revid:hky@sun.com-20090305143825-td8ysuv8g78v06qs) (merge vers: 6.0.11-alpha) (pib:6)
[15 May 2009 17:04] MC Brown
A note has been added to the 6.0.11 changelog: 

Falcon could cause an assertion when the system has run out of memory and tries to report the memory allocation failure.