Bug #39552 Falcon hangs when system memory is exhausted
Submitted: 20 Sep 2008 5:10 Modified: 15 May 2009 17:03
Reporter: Allan Packer Email Updates:
Status: Closed Impact on me:
None 
Category:MySQL Server: Falcon storage engine Severity:S2 (Serious)
Version:6.0.7 OS:Any
Assigned to: Olav Sandstå CPU Architecture:Any
Tags: F_ERROR HANDLING, falcon, hang

[20 Sep 2008 5:10] Allan Packer
Description:
When system memory is exhausted (i.e. the system runs out of memory and swap space), Falcon attempts to log an error message, and in the process tries to allocate more memory.   The result is that mysqld hangs.

See the stack trace for thread 41 - after the memoryIsExhausted (MemMgr.cpp), you'll notice another MemMgrAllocate.

-----------------  lwp# 41 / thread# 41  --------------------
fffffd7ffef42ff7 lwp_park (0, 0, 0)
fffffd7ffef3bcc6 mutex_lock_impl () + 116
fffffd7ffef3bd5c mutex_lock () + c
0000000000961646 _ZN5Mutex4lockEP4Sync8LockTypei () + 16
0000000000908815 _ZN4Sync4lockE8LockType () + 45
000000000095ffbf _ZN6MemMgr5allocEi () + 4f
000000000096094e _Z14MemMgrAllocatej () + 5e
000000000095e27d _ZN7JString9setStringEPKc () + 5d
000000000095e331 _ZN7JStringaSEPKc () + 11
0000000000999ffd _ZN8SQLErrorC1E7SqlCodePKcz () + 12d
0000000000960802 _ZN6MemMgr17memoryIsExhaustedEv () + 42
000000000095ff34 _ZN6MemMgr8allocRawEi () + 94
000000000096018a _ZN6MemMgr5allocEi () + 21a
000000000096094e _Z14MemMgrAllocatej () + 5e
0000000000944f0e _ZN13DeferredIndex7addNodeEP8IndexKeyi () + 32e
00000000009521c2 _ZN5Index6insertEP6RecordP11Transaction () + 62
000000000091026e _ZN5Table13insertIndexesEP11TransactionP13RecordVersion () + 3e
00000000009103db _ZN5Table6insertEP11TransactionP6Stream () + bb
000000000090361a _ZN12StorageTable6insertEv () + 3a
00000000008f856c _ZN16StorageInterface9write_rowEPh () + 6c
0000000000828c86 _ZN7handler12ha_write_rowEPh () + 86
00000000007c010b _Z12write_recordP3THDP8st_tableP12st_copy_info () + 5b
000000000083ecbd _Z10mysql_loadP3THDP12sql_exchangeP10TABLE_LISTR4ListI4ItemES8_S8_15enum_duplicatesbb () + 184d
000000000074dff1 _Z21mysql_execute_commandP3THD () + 34b1
0000000000750ead _Z11mysql_parseP3THDPKcjPS2_ () + 20d
0000000000751b5c _Z16dispatch_command19enum_server_commandP3THDPcj () + 89c
00000000007526ee _Z10do_commandP3THD () + ce
00000000007448dc handle_one_connection () + ec
fffffd7ffef42d09 _thr_setup () + 89
fffffd7ffef42fb0 _lwp_start ()

How to repeat:
Run yourself out of swap space while running Falcon.
[9 Oct 2008 17:01] MySQL Verification Team
Thank you for the bug report. Could you please try with latest source?. I can't
repeat instead I got the below error message:

mysql 6.0 > insert into tu (col1) select col1 from tu;
ERROR 1296 (HY000): Got error 305 'record memory is exhausted' from Falcon
mysql 6.0 >
[10 Nov 2008 0:00] Bugs System
No feedback was provided for this bug for over a month, so it is
being suspended automatically. If you are able to provide the
information that was originally requested, please do so and change
the status of the bug back to "Open".
[10 Nov 2008 5:50] Kevin Lewis
If the system is fully out of memory, MemMgr::memoryIsExhausted() uses a SQLError object to display this out-of-memory message.  There are 4 JString fields as members of the SQLError class.  The SQLError object must allocate more memory for these JString objects.  

However, in this bug report, it is highly unlikely that this second alloc would have failed.  The first allocation was a larger piece of memory for the DeferredIndex object.  The second was much smaller for these JStrings.   In fact, this call stack shows that the second call to MemMgr::alloc did not even try to allocate any memory.  

MemMgr::alloc uses a Mutex in the Sync() constructor instead of a SyncObject because Mutexes are faster than SyncObjects and this does not need shared locks.  This call stack waits indefinitely trying to get an exclusive lock on this Mutex for the second time by the same thread.

We had this problem once before on Linux.  That was for a changeset I made for  Bug#34890 (falcon_bug_22150.test fails on Pushbuild).  After the main changeset was pushed where SyncObject::mutex was held longer to avoid a missed signal, we found a stall on Linux only because the mutex.lock() in SyncObject::stalled() was now a recursive lock by the same thread.  The comment for removing that extra mutex.lock() was; "Do not allow recursive locking of mutexes since it is not supported by pthreads."  

Actually it is supported by pthreads if the mutex type is PTHREAD_MUTEX_RECURSIVE, but Falcon does not use that.  Unlike Linux pthreads, Windows allows recursive locks for CRITICAL_SECTIONs which is what is used for a Mutex.  In addition, the more complex SyncObject also allows recursion.

The following lines do not allow multiple locks by the same thread under #ifdef _PTHREADS or #ifdef SOLARIS_MT

storage\falcon\Mutex.cpp(79):	int ret = pthread_mutex_lock (&mutex);
storage\falcon\Mutex.cpp(84):	int ret = mutex_lock (&mutex);

So there are two ways to fix this problem.
1) Allow recursive locks for the Falcon Mutex class.
2) Do not allow MemMgr::memoryIsExhausted() to use a SQLError class which contains JStrings, or anything else that may need to allocate memory.  

I am sssigning this to Olav since if the solution should be number 1, we will need to make mutex_lock() recursive on Solaris.
[3 Apr 2009 10:19] Olav Sandstå
This is caused by the same issue as in bug#40155. The main difference is that in this case the mysqld process is hanging while in Bug#40155 it ended in a crash situation.

A patch for Bug#40155 has been made and committed that should also fix this issue. Setting the status for the bug to patch queued.
[3 Apr 2009 10:22] Olav Sandstå
Patch is merged into mysql-6.0.
[15 May 2009 17:03] MC Brown
A note has been added to the 6.0.11 changelog: 

When using Falcon and the system runs out of all memory and swap space, mysqld could hang while attempting to write an error message.