Bug #40633 pushbuild failure: Falcon assertion "lockState == 0" failed in SyncObject.cpp
Submitted: 11 Nov 2008 10:30 Modified: 15 May 2009 16:17
Reporter: Sven Sandberg Email Updates:
Status: Closed Impact on me:
None 
Category:MySQL Server: Falcon storage engine Severity:S1 (Critical)
Version:6.0-rpl OS:Any
Assigned to: Olav Sandstå CPU Architecture:Any
Tags: 6.0-rpl-green, assertion, F_SHUTDOWN, falcon, pushbuild, sporadic, syncobject, test failure
Triage: Triaged: D3 (Medium)

[11 Nov 2008 10:30] Sven Sandberg
Description:
Sporadic pushbuild failure:

main.init_file                           [ fail ]

CURRENT_TEST: main.init_file
[Falcon] Error: assertion (lockState == 0) failed at line 132 in file SyncObject.cpp

mysqltest got signal 6
read_command_buf at 0x1315840 = 
conn->name at 0x1e7d8b8 is an invalid pointer
conn->cur_query at (nil) is an invalid pointer
Attempting backtrace...
stack_bottom = (nil) thread_stack 0x40000
/data0/pushbuild/pb/bzr_mysql-6.0-rpl/97/bzr_mysql-6.0-rpl-exp/libmysqld/examples/mysqltest_embedded(my_print_stacktrace+0x32)[0x65843c]
/data0/pushbuild/pb/bzr_mysql-6.0-rpl/97/bzr_mysql-6.0-rpl-exp/libmysqld/examples/mysqltest_embedded[0x5f8507]
/data0/pushbuild/pb/bzr_mysql-6.0-rpl/97/bzr_mysql-6.0-rpl-exp/libmysqld/examples/mysqltest_embedded[0x5f8533]
/lib64/libc.so.6[0x2ac3230c3b20]
/lib64/libc.so.6(gsignal+0x35)[0x2ac3230c3aa5]
/data0/pushbuild/pb/bzr_mysql-6.0-rpl/97/bzr_mysql-6.0-rpl-exp/libmysqld/examples/mysqltest_embedded(_ZN5Error10debugBreakEv+0xe)[0xa99414]
/data0/pushbuild/pb/bzr_mysql-6.0-rpl/97/bzr_mysql-6.0-rpl-exp/libmysqld/examples/mysqltest_embedded(_ZN5Error5errorEPKcz+0x12f)[0xa99545]
/data0/pushbuild/pb/bzr_mysql-6.0-rpl/97/bzr_mysql-6.0-rpl-exp/libmysqld/examples/mysqltest_embedded(_ZN5Error15assertionFailedEPKcS1_i+0x2d)[0xa995e3]
/data0/pushbuild/pb/bzr_mysql-6.0-rpl/97/bzr_mysql-6.0-rpl-exp/libmysqld/examples/mysqltest_embedded(_ZN10SyncObjectD1Ev+0x2f)[0xab7afd]
/data0/pushbuild/pb/bzr_mysql-6.0-rpl/97/bzr_mysql-6.0-rpl-exp/libmysqld/examples/mysqltest_embedded(_ZN5CacheD0Ev+0x362)[0xb8a462]
/data0/pushbuild/pb/bzr_mysql-6.0-rpl/97/bzr_mysql-6.0-rpl-exp/libmysqld/examples/mysqltest_embedded(_ZN8DatabaseD0Ev+0x43c)[0xba605c]
/data0/pushbuild/pb/bzr_mysql-6.0-rpl/97/bzr_mysql-6.0-rpl-exp/libmysqld/examples/mysqltest_embedded(_ZN10Connection16shutdownDatabaseEv+0x72)[0xb94394]
/data0/pushbuild/pb/bzr_mysql-6.0-rpl/97/bzr_mysql-6.0-rpl-exp/libmysqld/examples/mysqltest_embedded(_ZN15StorageDatabase5closeEv+0x56)[0xaa85f4]
/data0/pushbuild/pb/bzr_mysql-6.0-rpl/97/bzr_mysql-6.0-rpl-exp/libmysqld/examples/mysqltest_embedded(_ZN14StorageHandler15shutdownHandlerEv+0x98)[0xaacca6]
/data0/pushbuild/pb/bzr_mysql-6.0-rpl/97/bzr_mysql-6.0-rpl-exp/libmysqld/examples/mysqltest_embedded(_ZN16StorageInterface13falcon_deinitEPv+0x32)[0xaa2a18]
/data0/pushbuild/pb/bzr_mysql-6.0-rpl/97/bzr_mysql-6.0-rpl-exp/libmysqld/examples/mysqltest_embedded(_ZN16StorageInterface5panicEP10handlerton17ha_panic_function+0x19)[0xaa2a3d]
/data0/pushbuild/pb/bzr_mysql-6.0-rpl/97/bzr_mysql-6.0-rpl-exp/libmysqld/examples/mysqltest_embedded(_Z22ha_finalize_handlertonP13st_plugin_int+0xa4)[0x971726]
/data0/pushbuild/pb/bzr_mysql-6.0-rpl/97/bzr_mysql-6.0-rpl-exp/libmysqld/examples/mysqltest_embedded[0x6f2ab4]
/data0/pushbuild/pb/bzr_mysql-6.0-rpl/97/bzr_mysql-6.0-rpl-exp/libmysqld/examples/mysqltest_embedded[0x6f2fef]
/data0/pushbuild/pb/bzr_mysql-6.0-rpl/97/bzr_mysql-6.0-rpl-exp/libmysqld/examples/mysqltest_embedded(_Z15plugin_shutdownv+0x69)[0x6f5a49]
/data0/pushbuild/pb/bzr_mysql-6.0-rpl/97/bzr_mysql-6.0-rpl-exp/libmysqld/examples/mysqltest_embedded[0x61d410]
/data0/pushbuild/pb/bzr_mysql-6.0-rpl/97/bzr_mysql-6.0-rpl-exp/libmysqld/examples/mysqltest_embedded(end_embedded_server+0x37)[0x61d7bf]
/data0/pushbuild/pb/bzr_mysql-6.0-rpl/97/bzr_mysql-6.0-rpl-exp/libmysqld/examples/mysqltest_embedded(mysql_server_end+0x19)[0x61140b]
/data0/pushbuild/pb/bzr_mysql-6.0-rpl/97/bzr_mysql-6.0-rpl-exp/libmysqld/examples/mysqltest_embedded(_Z16free_used_memoryv+0x19c)[0x5f96ce]
/data0/pushbuild/pb/bzr_mysql-6.0-rpl/97/bzr_mysql-6.0-rpl-exp/libmysqld/examples/mysqltest_embedded[0x5f96e0]
/data0/pushbuild/pb/bzr_mysql-6.0-rpl/97/bzr_mysql-6.0-rpl-exp/libmysqld/examples/mysqltest_embedded(main+0xe90)[0x6066ce]
/lib64/libc.so.6(__libc_start_main+0xf4)[0x2ac3230b1154]
/data0/pushbuild/pb/bzr_mysql-6.0-rpl/97/bzr_mysql-6.0-rpl-exp/libmysqld/examples/mysqltest_embedded(__gxx_personality_v0+0xb9)[0x5f5939]
terminate called after throwing an instance of 'SQLError'

 - saving '/dev/shm/var-embedded-111/1/log/main.init_file/' to '/dev/shm/var-embedded-111/log/main.init_file/'

Retrying test, attempt(2/3)...

How to repeat:
E.g., in main.init_file on https://intranet.mysql.com/secure/pushbuild/showpush.pl?dir=bzr_mysql-6.0-rpl&order=97 sapsrv1/embedded
xref shows over 2000 failures: http://tinyurl.com/5q7ops

This happens only in 6.0-rpl, from push 31. However, push 31 and 30 only fixed build problems, so the problems was really introduced by push 29:
https://intranet.mysql.com/secure/pushbuild/showpush.pl?dir=bzr_mysql-6.0-rpl&order=29
Warning: this push contains a very big changeset. If it causes Firefox to hang, press 'escape' quickly to stop loading the page.
[11 Nov 2008 15:18] Sven Sandberg
Please note that the xref takes very long to load (several minutes). Have patience :-)
[1 Dec 2008 15:22] Kevin Lewis
John,  It seems strange that this assert would occur using the new MTR but not the old one.  

storage\falcon\SyncObject.cpp(132):	ASSERT(lockState == 0);

This is a syncObject that is released while still active.  We have not seen this for a long time, not that it cannot occur.  Can you try to reproduce this and add a more thorough description of how it can be reproduced?  Does it really require the newer MTR?
[16 Dec 2008 10:27] Alfranio Correia
This failure happens in several tests:

main.max_length_for_sort_data_basic
main.mix2_myisam_ucs2
main.ndb_log_update_as_write_basic
main.slow_launch_time_basic
main.table_open_cache_basic
main.collation_database_func

mysqltest got signal 6
read_command_buf at 0x1061e60 = 
conn->name at 0x1bcca70 is an invalid pointer
conn->cur_query at (nil) is an invalid pointer
Attempting backtrace...
stack_bottom = (nil) thread_stack 0x40000
/data0/pushbuild/pb/bzr_mysql-6.0-rpl/125/mysql-6.0.9-alpha-pb125/libmysqld/examples/mysqltest_embedded(my_print_stacktrace+0x2e)[0x5eab6e]
/lib64/libc.so.6[0x2b07f399bb20]
/lib64/libc.so.6(gsignal+0x35)[0x2b07f399baa5]
/data0/pushbuild/pb/bzr_mysql-6.0-rpl/125/mysql-6.0.9-alpha-pb125/libmysqld/examples/mysqltest_embedded(_ZN5Error5errorEPKcz+0x102)[0xa0bca2]
/data0/pushbuild/pb/bzr_mysql-6.0-rpl/125/mysql-6.0.9-alpha-pb125/libmysqld/examples/mysqltest_embedded(_ZN10SyncObjectD1Ev+0x26)[0x92ba56]
/data0/pushbuild/pb/bzr_mysql-6.0-rpl/125/mysql-6.0.9-alpha-pb125/libmysqld/examples/mysqltest_embedded(_ZN5CacheD0Ev+0x131)[0x9ec421]
/data0/pushbuild/pb/bzr_mysql-6.0-rpl/125/mysql-6.0.9-alpha-pb125/libmysqld/examples/mysqltest_embedded(_ZN8DatabaseD0Ev+0x21b)[0x9fc55b]
/data0/pushbuild/pb/bzr_mysql-6.0-rpl/125/mysql-6.0.9-alpha-pb125/libmysqld/examples/mysqltest_embedded(_ZN15StorageDatabase5closeEv+0x2e)[0x91fafe]
/data0/pushbuild/pb/bzr_mysql-6.0-rpl/125/mysql-6.0.9-alpha-pb125/libmysqld/examples/mysqltest_embedded(_ZN14StorageHandler15shutdownHandlerEv+0x4a)[0x922a6a]
/data0/pushbuild/pb/bzr_mysql-6.0-rpl/125/mysql-6.0.9-alpha-pb125/libmysqld/examples/mysqltest_embedded(_ZN16StorageInterface5panicEP10handlerton17ha_panic_function+0x16)[0x91b936]
/data0/pushbuild/pb/bzr_mysql-6.0-rpl/125/mysql-6.0.9-alpha-pb125/libmysqld/examples/mysqltest_embedded(_Z22ha_finalize_handlertonP13st_plugin_int+0x2b)[0x821a5b]
/data0/pushbuild/pb/bzr_mysql-6.0-rpl/125/mysql-6.0.9-alpha-pb125/libmysqld/examples/mysqltest_embedded[0x661576]
/data0/pushbuild/pb/bzr_mysql-6.0-rpl/125/mysql-6.0.9-alpha-pb125/libmysqld/examples/mysqltest_embedded[0x66463a]
/data0/pushbuild/pb/bzr_mysql-6.0-rpl/125/mysql-6.0.9-alpha-pb125/libmysqld/examples/mysqltest_embedded(_Z15plugin_shutdownv+0x15a)[0x664e9a]
/data0/pushbuild/pb/bzr_mysql-6.0-rpl/125/mysql-6.0.9-alpha-pb125/libmysqld/examples/mysqltest_embedded(end_embedded_server+0xa7)[0x5c2727]
/data0/pushbuild/pb/bzr_mysql-6.0-rpl/125/mysql-6.0.9-alpha-pb125/libmysqld/examples/mysqltest_embedded(mysql_server_end+0x19)[0x5be949]
/data0/pushbuild/pb/bzr_mysql-6.0-rpl/125/mysql-6.0.9-alpha-pb125/libmysqld/examples/mysqltest_embedded[0x5abcd8]
/data0/pushbuild/pb/bzr_mysql-6.0-rpl/125/mysql-6.0.9-alpha-pb125/libmysqld/examples/mysqltest_embedded(main+0xba3)[0x5b6533]
/lib64/libc.so.6(__libc_start_main+0xf4)[0x2b07f3989154]
/data0/pushbuild/pb/bzr_mysql-6.0-rpl/125/mysql-6.0.9-alpha-pb125/libmysqld/examples/mysqltest_embedded(__gxx_personality_v0+0xc1)[0x5a9109]
terminate called after throwing an instance of 'SQLError'

See https://intranet.mysql.com/secure/pushbuild/getlog.pl?dir=bzr_mysql-6.0-rpl&entry=Alfranio...
[19 Dec 2008 15:47] John Embretsen
Unable to reproduce in local lab. Will try to test manually on pushbuild hosts after vacation (i.e. January 2009), but if anyone wants to look at this in the mean time, go ahead...
[12 Jan 2009 15:00] Kevin Lewis
Svoj,  John has been trying to reproduce this with no luck.  Asigning to you to try to reproduce.  This may give you a chance to study the SyncObject a little.  I'm not sure what we can do about this if we cannot reproduce it.
[16 Jan 2009 15:32] John Embretsen
I'm not able to reproduce manually on pushbuild servers either (sapsrv1, sapsrv2), at least not on mounted disks (I did not have access to /data0). Saw other sporadic issues, but those seemed unrelated and probably related to full disks. 

This issue still occurs in Pushbuild, though, so I hope svoj or others will be able to shed some light on this.
[19 Jan 2009 14:01] John Embretsen
Bug#42184 (Crash in Falcon on shutdown) describes steps that would lead to the same assertion failing.
[21 Jan 2009 10:39] John Embretsen
Arbitrary test to be used for reproducing bug 40633. Run multiple times with --mem and --embedded.

Attachment: repro40633.test (application/octet-stream, text), 267 bytes.

[21 Jan 2009 10:45] John Embretsen
Using the attached test 'repro40633' I was actually able to reproduce this issue exactly once on sapsrv1. Command line was:

./mtr --mem --embedded --repeat=1000 repro40633 | tee repro40633.out

The test failed with the assertion after about 630 runs.
I have repeated this procedure a few times more (after adding the option --mysqld=--core-file), however the test has not failed again. I will continue to run this test for a while to see if it happens again.
[26 Jan 2009 10:25] John Embretsen
GDB output of "backtrace full" using core file generated upon crash of repro40633 test after 895 runs on saprv1

Attachment: 40633-backtrace-full.txt (text/plain), 9.63 KiB.

[26 Jan 2009 10:43] John Embretsen
While trying to reproduce this on sapsrv2 (pushbuild host, Linux), I saw a different assertion being hit, after having run the same test 5584 times in a single run of mysql-test-run.pl:

main.repro40633                          [ pass ]      4
(...)
main.repro40633                          [ pass ]      4
main.repro40633                          [ fail ]
        Test ended at 2009-01-23 16:36:07

CURRENT_TEST: main.repro40633
mysqltest_embedded: pthread_mutex_lock.c:80: __pthread_mutex_lock: Assertion `mutex->__data.__owner == 0' failed.
mysqltest got signal 6

GDB stack traces will be attached shortly.
[26 Jan 2009 10:45] John Embretsen
GDB stack traces from "new" assertion described in previous comment (`mutex->__data.__owner == 0').

Attachment: 40633-newassert-backtrace.txt (text/plain), 15.08 KiB.

[26 Jan 2009 19:02] Vladislav Vaintroub
I have reproduced the crash on my machine.
The syncObject in question is Cache::syncWait. It is not a usual synchronization object (there is no Sync guarding it that will automatically unlocked when a function calling it returns). It is rather used as an event, I believe to prevent multiple flushes running in parallel.

My guess is that Cache::flush locks it at a time, where io threads cannot unlock it anymore, because as they become inactive or shut down.

A quick fix would be to set syncWait.lockState to None prior to return from 
~Cache::Cache(). I dunno what would be a good fix though
[26 Jan 2009 20:01] Olav Sandstå
Also in the core reproduced by John earlier today it is the Cache::syncWait
object that is unlocked and lead to the assert:

(gdb) print syncWait
$1 = {<SynchronizationObject> = {_vptr.SynchronizationObject = 0xf18f70}, 
  monitorCount = 0, mutex = {<SynchronizationObject> = {
      _vptr.SynchronizationObject = 0xe85410}, holder = 0x0, mutex = {
      __data = {__lock = 0, __count = 0, __owner = 0, __nusers = 0, 
        __kind = 0, __spins = 0, __list = {__prev = 0x0, __next = 0x0}}, 
      __size = '\0' <repeats 39 times>, __align = 0}}, queue = 0x0, 
  exclusiveThread = 0x2aaaaaed90c8, readers = 0, waiters = 0, lockState = -1, 
  stalls = 0, objectId = 45, sharedCount = 0, collisionCount = 0, 
  exclusiveCount = 1, waitCount = 0, queueLength = 0, location = 0x0, 
  name = 0xf362e7 "Cache::syncWait"}

I agree with Wlad that we can fix this by just clearing the lock in the Cache destructor to avoid that we hit this assert. This should be safe since the lock anyway is getting destructed. Still, I think it will be good to try to see if it is a better solution to make the IO thread clear the lock when it is stopped.
[26 Jan 2009 20:02] Kevin Lewis
Cache::~Cache() could use syncWait->isLocked() to determine if it needs to be unlocked, now that the cache is going away.
[26 Jan 2009 20:16] Vladislav Vaintroub
That syncWait is locked might indicate that checkpoint was not complete, maybe because io threads were shutting down?
[27 Jan 2009 10:54] Bugs System
A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/64111

2795 Olav Sandstaa	2009-01-27
      Fix for Bug #40633 Falcon assertion "lockState == 0" failed in SyncObject.cpp
      
      During shutdown it can happen that the IO threads terminate while the Cache.syncWait lock is still
      held. To avoid that this leads to an assert in the SyncObject desctrutor we release this lock
      in the Cache destructor.
[27 Jan 2009 19:01] Kevin Lewis
Olav,  Actually, I don't think this solution is quite good enough.

There is a possible race condition after the current patch; All ioThreads are barely finished when one more flush comes in. The syncWait is unlocked by all ioTheads which shutdown, but it is locked just after that by a flush.  I think that Cache needs to have a boolean controlled by Cache::flushLock which tells if there are any active ioThreads to signal.  Cache::flush() can check it before locking syncWait.  And the ioThread can set it when leaving.
[29 Jan 2009 11:42] Olav Sandstå
The initial fix/patch for this problem has been pushed into the mysql-6.0-rpl branch.

A follow-up patch that contains a more complete fix that includes avoiding that the IO threads exits while having the syncWait lock is planned for inclusion in the falcon tree.
[30 Jan 2009 13:28] Bugs System
Pushed into 6.0.10-alpha (revid:luis.soares@sun.com-20090129165607-wiskabxm948yx463) (version source revid:luis.soares@sun.com-20090129163120-e2ntks4wgpqde6zt) (merge vers: 6.0.10-alpha) (pib:6)
[9 Feb 2009 14:38] Olav Sandstå
This crash occurs because Falcon seemingly "forget" to unlock the
syncWait lock - or in another way: It looks the syncWait lock once
more than it unlocks it.

By adding tracing to the code I have been able to identify two cases
that can lead to this error (these are also the same as suggested by
Kevin) (and eliminate several other scenarios):

1. IO threads do not see a flush request:

   If two threads are doing issuing a flush request almost
   simultanously (for instance when starting Falcon and then just
   shutting it down) and the IO threads are just finishing a flush,
   then the following might happen:

   1. Both threads doing the flush request will succeed in calling
   syncWait.lock(NULL, Exclusive). 

   2. The IO threads will see this as a single flush reqest and thus
   only call syncWait.unlock() one.

   When Falcon shutdown is done the assert happens.

   The cause for this problem is that we in Cache::ioThread() when
   finishing a flush releases the flushLock before unlocking the
   syncWait lock.

   Solution: By reversing the unlocking of flushLock and syncWait in
   Cache::ioThread() the flushLock will prevent that this can happen.

2. IO threads shutdown with active flush request

   If just as the IO threads are completing a flush request and then
   another thread is issuing a flush reqest followed by a shutdown
   request, the IO threads might only see the shutdown requests - and
   shutdown without executing the flush request:

   1. The IO threads are just finished with the flush operation and
   sets theflushing flag to false and then unlocks the syncWait and
   flushLock

   2. A thread comes in an do a flush request that is successful

   3. A thread (most likely the same as in 2) request shutdown of the cache

   4. The IO threads continue: checks if there is a shutdown request
   and if there is, they shut down without noticing that there was a
   flush request (and without releasing the syncWait lock

   Solution: The IO thread when checking for shutdown it should also
   check for a pending flush request at the same time (and either
   execute this before it shutdowns or at least release the syncWait
   lock).
[9 Feb 2009 14:49] Bugs System
A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/65622

3010 Olav Sandstaa	2009-02-09
      Fix for Bug #40633 Falcon assertion "lockState == 0" failed in SyncObject.cpp
      
      This fix contains the following changes to avoid that we can shutdown and
      delete the cache with the waitSync lock still locked:
      
      1. Reverse the order for how IO threads unlocks syncWait and
         flushLock to avoid that more than one thread is able to succeed with
         a flush request (ie. to avoid that the syncWait lock is successfully
         locked more than once).
      
      2. Make the IO threads check for any pending flush when they are told to
         shutdown to avoid that they shutdown without doing the pending flush.
      
      3. Remove the temporary fix that checked the status of the syncWait lock in the 
         Cache's destructor and unlocked it if it was locked.
[9 Feb 2009 15:25] Kevin Lewis
I think there is still a race condition here.  As soon as the ioThread unlocks flushLock one last time, the thread calling Cache::flush() may be waiting to get it.  Cache::flush() does not check if threads are in shutdown before setting syncWait and trying to wake them.  So I think the Cache object can still get to its destruction with syncWait locked.

There is a check for {if (flushing) return;}  in Cache::flush() that is protected by flushLock.  Why not use that boolean, or better yet, a new one called Cache::shuttingDown that is also protected by flushLock?
[9 Feb 2009 21:29] Olav Sandstå
Kevin, thanks for looking at the patch. This was one of the cases that I considered might happen and cause this problem. The reason for not including it was that I was unable to see how it could happen. The call to Cache::shutdown() is done very late in the Database::shutdown(). At this point the internal scheduler and the serial log is both successfully shutdown. I had problems to find a scenario where "something" did a call to Database::flush() and Cache::flush() after both the internal scheduler and the serial log was shut down.

Still, I think you suggestion about adding this extra check is a good idea. It will make the Cache code more robust and the code will be able to handle it gracefully if there are changes on how and when we are flushing the cache (but ideally, calling flush() on a object where we have already called shutdown() should lead to an error instead of gracefully being ignored?).

I will make an updated version of the patch that adds a check for the cache being already shutdown when a flush call is done.
[10 Feb 2009 10:33] Bugs System
A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/65717

3010 Olav Sandstaa	2009-02-10
      Fix for Bug #40633 Falcon assertion "lockState == 0" failed in SyncObject.cpp
            
      This fix contains the following changes to avoid that we can shutdown
      and delete the cache with the waitSync lock still locked:
            
      1. Reverse the order for how IO threads unlocks syncWait and flushLock
         to avoid that more than one thread is able to succeed with a flush
         request (ie. to avoid that the syncWait lock is successfully locked
         more than once).
            
      2. Make the IO threads check for any pending flush when they are told
         to shutdown to avoid that they shutdown without doing the pending
         flush.
      
      3. Introduce a new state variable that is set when Cache::shutdown()
         is called. After this is set new Cache::flush() requests will be
         ignored.
            
      4. Remove the temporary fix that checked the status of the syncWait
         lock in the Cache's destructor and unlocked it if it was locked.
[10 Feb 2009 15:08] Kevin Lewis
OK to push
[25 Feb 2009 22:11] Olav Sandstå
After having been able to reproduce this with core files several times it seems like the scenario that triggers this bug is due to very short embedded tests where the following happens.

1. The thread starting Falcon is creating the page cache (Cache) and creating the two IO threads.

2. A requests for a flush is done. This sets an exclusive lock on the syncWait SyncObject.

3. Falcon is shutdown immediately. The page cache and its two IO threads are shut down. This sets the IO thread's shutdownInProgress flag on each of the IO threads.

4. The IO thread's OS threads are scheduled for running for the first time. They see the shutdownInProgress flag being set. So they shutdown without having even called the Cache::ioThreads() functionality.

This make the IO threads exit without unlocking the syncWait syncObject - and the assert hits when the Cache's destructor is run.

Proposed solution: In step 1 above: wait until the IO threads have started to run before continuing (alternatively do this waiting in step 3: check that the IO threads have started to run before shutting them down).
[26 Feb 2009 14:59] Bugs System
A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/67692

3039 Olav Sandstaa	2009-02-26
      Fix for BUG#40633 Falcon assertion "lockState == 0" failed in SyncObject.cpp
      
      The cause for this crash was that when running the embedded test
      suite, Falcon is just started and then for many test stopped almost
      immediately. This could result in that the IO threads never started to
      run. As a result the Cache could be shutdown with a pending flush
      request and the corresponding syncWait lock locked.
      
      This fix solves this problem by waiting for the IO threads to start
      before returning from the initialization of the Page Cache.
[26 Feb 2009 16:29] Kevin Lewis
Latest patch reviewed. OK to push
[2 Mar 2009 14:12] Bugs System
Pushed into 6.0.11-alpha (revid:alik@sun.com-20090302140208-lfdejjbcyezlhhjt) (version source revid:vvaintroub@mysql.com-20090211225555-gcuo9fv97xlhydd8) (merge vers: 6.0.10-alpha) (pib:6)
[3 Mar 2009 22:03] Olav Sandstå
Setting the state back to "patch queued" as it is only the first two out of three patches that have been merged into mysql-6.0.
[2 Apr 2009 17:38] Bugs System
Pushed into 6.0.11-alpha (revid:hky@sun.com-20090402144811-yc5kp8g0rjnhz7vy) (version source revid:olav@sun.com-20090226145903-b1pv91djifpq1yrd) (merge vers: 6.0.11-alpha) (pib:6)
[15 May 2009 16:17] MC Brown
Internal/test fix. No changelog entry required.