Bug #58653 Sporadic crash due to assertion failure 0 == space->n_pending_flushes
Submitted: 2 Dec 2010 11:02 Modified: 12 Jan 2011 22:46
Reporter: Alexander Nozdrin Email Updates:
Status: Closed Impact on me:
None 
Category:MySQL Server: InnoDB storage engine Severity:S1 (Critical)
Version:5.1-plugin, 5.5.8 OS:Any
Assigned to: Sunny Bains CPU Architecture:Any
Tags: pb2, sporadic, test failure

[2 Dec 2010 11:02] Alexander Nozdrin
Description:
Tree: http://pb2.norway.sun.com/?template=show_pushes&branch=mysql-5.5.8-release

Test case: sys_vars.innodb_adaptive_flushing_basic

Platform: solaris10-x86-32bit community tar-gz

Log: http://pb2.norway.sun.com/?action=archive_download&archive_id=2615001&pretty=please

That's a spurious failure: the test passed on the 2nd and 3rd rounds.

Symptoms:
101201  2:17:36  InnoDB: Assertion failure in thread 1 in file /export/home2/pb2/build/sb_0-2612415-1291142674.25/mysql-5.5.8/storage/innobase/fil/fil0fil.c line 1377
InnoDB: Failing assertion: 0 == space->n_pending_flushes

Stack trace:
conn->cur_query at 956a568 = SELECT @@global.innodb_adaptive_flushing
Attempting backtrace...
/export/home/pb2/test/sb_2-2613309-1291152569.05/mysql-5.5.8-solaris10-i386/bin/mysqltest_embedded:my_print_stacktrace+0x2c
/export/home/pb2/test/sb_2-2613309-1291152569.05/mysql-5.5.8-solaris10-i386/bin/mysqltest_embedded:0x14b006
/export/home/pb2/test/sb_2-2613309-1291152569.05/mysql-5.5.8-solaris10-i386/bin/mysqltest_embedded:0x14b04a
/lib/libc.so.1:0xa4a4f
/lib/libc.so.1:0x9ae72
/export/home/pb2/test/sb_2-2613309-1291152569.05/mysql-5.5.8-solaris10-i386/bin/mysqltest_embedded:0x77aceb [ Signal 11 (SEGV)]
/export/home/pb2/test/sb_2-2613309-1291152569.05/mysql-5.5.8-solaris10-i386/bin/mysqltest_embedded:0x77b4c9
/export/home/pb2/test/sb_2-2613309-1291152569.05/mysql-5.5.8-solaris10-i386/bin/mysqltest_embedded:0x8100a0
/export/home/pb2/test/sb_2-2613309-1291152569.05/mysql-5.5.8-solaris10-i386/bin/mysqltest_embedded:0x4b0b11
/export/home/pb2/test/sb_2-2613309-1291152569.05/mysql-5.5.8-solaris10-i386/bin/mysqltest_embedded:0x424bf3
/export/home/pb2/test/sb_2-2613309-1291152569.05/mysql-5.5.8-solaris10-i386/bin/mysqltest_embedded:__1cWha_finalize_handlerton6FpnNst_plugin_int__i_+0x4f
/export/home/pb2/test/sb_2-2613309-1291152569.05/mysql-5.5.8-solaris10-i386/bin/mysqltest_embedded:0x2cd62f
/export/home/pb2/test/sb_2-2613309-1291152569.05/mysql-5.5.8-solaris10-i386/bin/mysqltest_embedded:0x2cd970
/export/home/pb2/test/sb_2-2613309-1291152569.05/mysql-5.5.8-solaris10-i386/bin/mysqltest_embedded:__1cPplugin_shutdown6F_v_+0x107
/export/home/pb2/test/sb_2-2613309-1291152569.05/mysql-5.5.8-solaris10-i386/bin/mysqltest_embedded:0x1c35c2
/export/home/pb2/test/sb_2-2613309-1291152569.05/mysql-5.5.8-solaris10-i386/bin/mysqltest_embedded:end_embedded_server+0x32
/export/home/pb2/test/sb_2-2613309-1291152569.05/mysql-5.5.8-solaris10-i386/bin/mysqltest_embedded:mysql_server_end+0x2a
/export/home/pb2/test/sb_2-2613309-1291152569.05/mysql-5.5.8-solaris10-i386/bin/mysqltest_embedded:__1cQfree_used_memory6F_v_+0x208
/export/home/pb2/test/sb_2-2613309-1291152569.05/mysql-5.5.8-solaris10-i386/bin/mysqltest_embedded:0x141174
/export/home/pb2/test/sb_2-2613309-1291152569.05/mysql-5.5.8-solaris10-i386/bin/mysqltest_embedded:main+0x173b
/export/home/pb2/test/sb_2-2613309-1291152569.05/mysql-5.5.8-solaris10-i386/bin/mysqltest_embedded:_start+0x7d

How to repeat:
Check out PB.
[2 Dec 2010 11:07] Alexander Nozdrin
It's not clear if this is a regression or not: the crash
hasn't been seen before, but it's sporadic and happened
only once on one particular platform.
[2 Dec 2010 11:30] Vasil Dimov
backtrace of all threads

Attachment: bt.txt (text/plain), 12.90 KiB.

[7 Dec 2010 3:19] Sunny Bains
One thing I can confirm is that it was not introduced by the fix for bug# 54538.
This looks like a bug that has always existed, at least as far back as 5.0 and
probably all versions prior to that. InnoDB just didn't free the space memory
in fil_close_all_files() in versions prior to the plugin. In the plugin we introduced
code that frees all the memory during shutdown and there is an assertion in that
code that has highlighted  the race condition. It is surprising that we
haven't hit this before.

We release the fil_system->mutex in fil_flush() and then call os_file_flush(). It
is possible for another thread to call fil_close_all_files() during that flush.
fil_close_all_files() will then free all the spaces. This particular path is only
possible during the shutdown phase.
[7 Dec 2010 9:04] Sunny Bains
Scratch the previous analysis. This is related to bug# 54617. While trying
to reproduce this bug I was able to reproduce bug# 54617. The underlying
issue is the same. I stalled the master thread by calling os_thread_wait() in
GDB just before it reserved the mutex to increment srv_n_threads_active[].
If we stall the master thread before it reserves a slot and if in the meantime
a shutdown is triggered then the shutdown code can't detect that the master
thread is active because srv_n_threads_active has not been incremented at that
stage by the master thread.

It shouldn't really affect production systems because there we don't have
such quick start/shutdown scenario.
[8 Jan 2011 15:08] Bugs System
Pushed into mysql-trunk 5.6.2 (revid:vasil.dimov@oracle.com-20110108150732-8qygun7nuaqf9d3u) (version source revid:vasil.dimov@oracle.com-20110108150646-44c9j7ck64ocey31) (merge vers: 5.6.2) (pib:24)
[8 Jan 2011 15:09] Bugs System
Pushed into mysql-5.5 5.5.9 (revid:vasil.dimov@oracle.com-20110108150508-gpanhz48z8069qot) (version source revid:vasil.dimov@oracle.com-20110108150048-b1y9m8xe72hay0ch) (merge vers: 5.5.9) (pib:24)