Bug #56521 Assertion failed: (m_state == 2), function allocated_to_free, pfs_lock.h (138)
Submitted: 2 Sep 2010 20:51 Modified: 3 Oct 2010 1:35
Reporter: Joerg Bruehe Email Updates:
Status: Closed Impact on me:
None 
Category:MySQL Server: Performance Schema Severity:S2 (Serious)
Version:5.5.6-m3 OS:MacOS (10.5, x86_64)
Assigned to: Marc ALFF CPU Architecture:Any
Tags: assertion, crash

[2 Sep 2010 20:51] Joerg Bruehe
Description:
Found in the release build of 5.5.6-m3:

=====
perfschema.start_server_no_cond_class    [ retry-fail ]
        Test ended at YYYY-MM-DD HH:MM:SS

CURRENT_TEST: perfschema.start_server_no_cond_class
CURRENT_TEST: perfschema.start_server_no_cond_class

The server [mysqld.1 - pid: PID, winpid: PID, exit: 256] crashed while running 'check warnings'
Server log from this test:
YYMMDD HH:MM:SS [Warning] Setting lower_case_table_names=2 because file system for /Volumes/hd3/pb2/test/sb_2-2225848-1283375676.91/mysql-5.5.6-m3-osx10.5-x86_64/mysql-test/var/m
ysqld.1/data/ is case insensitive
YYMMDD HH:MM:SS [Note] Plugin 'FEDERATED' is disabled.
InnoDB: The InnoDB memory heap is disabled
InnoDB: Mutexes and rw_locks use InnoDB's own implementation
InnoDB: Compressed tables use zlib 1.2.3
YYMMDD HH:MM:SS  InnoDB: highest supported file format is Barracuda.
YYMMDD HH:MM:SS InnoDB 1.1.2 started; log sequence number LSN
YYMMDD HH:MM:SS [Note] Event Scheduler: Loaded 0 events
YYMMDD HH:MM:SS [Note] /PATH/mysqld: ready for connections.
Version: 'VERS' socket: '/PATH/SOCK' port: PORT MySQL COMMENT
Assertion failed: (m_state == 2), function allocated_to_free, file /Volumes/hd2/pb2/build/sb_0-2225408-1283270786.88/mysql-5.5.6-m3/storage/perfschema/pfs_lock.h, line 138.
YYMMDD HH:MM:SS - mysqld got signal 6 ;
This could be because you hit a bug. It is also possible that this binary
or one of the libraries it was linked against is corrupt, improperly built,
or misconfigured. This error can also be caused by malfunctioning hardware.
We will try our best to scrape up some info that will hopefully help diagnose
the problem, but since we have already crashed, something is definitely wrong
and this may fail.

key_buffer_size=1048576
read_buffer_size=131072
max_used_connections=1
max_threads=151
thread_count=0
connection_count=0
It is possible that mysqld could use up to
key_buffer_size + (read_buffer_size + sort_buffer_size)*max_threads = NUMBER K
bytes of memory
Hope that's ok; if not, decrease some variables in the equation.

thd: 0x0
Attempting backtrace. You can use the following information to find out
where mysqld died. If you see no messages after this, something went
terribly wrong...
stack_bottom = 0x0 thread_stack 0x40000
0   mysqld                              0x00000000003f99b5 my_print_stacktrace + 57^@
1   mysqld                              0x00000000000ee14e handle_segfault + 590^@
2   libSystem.B.dylib                   0x00000000819a33fa _sigtramp + 26^@
3   ???                                 0x000000000519a170 0x0 + 85565808^@
4   mysqld                              0x00000000000f6a0f _Z11mysqld_mainiPPc + 3577^@
5   mysqld                              0x00000000000016af main + 9^@
6   mysqld                              0x0000000000001684 start + 52^@
The manual page at http://dev.mysql.com/doc/mysql/en/crashing.html contains
information that should help you find out what is causing the crash.
Writing a core file
=====

The same "Assertion failed" (with the crash) happens in these tests:
perfschema.aggregate
perfschema.dml_events_waits_history
perfschema.start_server_no_cond_class   (log above)
perfschema.start_server_no_cond_inst
perfschema.start_server_no_file_class
perfschema.start_server_no_file_inst
perfschema.start_server_no_mutex_class
perfschema.start_server_no_mutex_inst
perfschema.start_server_on

How to repeat:
Run the full test suite.
[2 Sep 2010 23:39] Marc ALFF
Requesting the following feedback:

What exactly are the configure / compiling options used ?

I am asking because I am surprised to see a DBUG_ASSERT in a "release" build.

Also, I have been running the performance schema test suite several times a day
in development in debug (with DBUG_ASSERT), for several months, on a Mac,
and I never had such failures (and the related code has been there for ages).

One point to clarify is which implementation of "my_atomic" was used
in the build, as there are several choices.

Thanks,
-- Marc
[3 Sep 2010 12:01] Joerg Bruehe
Requested info:

> Requesting the following feedback:

> What exactly are the configure / compiling options used ?

5.5.6 release build via PB2 using cmake, here is the line:

+ cmake /Volumes/hd2/pb2/build/sb_0-2225408-1283270786.88/mysql-5.5.6-m3 -DBUILD_CONFIG=mysql_release -DINSTALL_LAYOUT=STANDALONE -DCMAKE_OSX_ARCHITECTURES=x86_64 -DCMAKE_BUILD_TYPE=Debug '-DCOMPILATION_COMMENT=MySQL Community Server - Debug (GPL)' -DFEATURE_SET=community

> I am asking because I am surprised to see a DBUG_ASSERT
> in a "release" build.

Our binary packages also include a debug server, and the assertion is thrown when that server runs the test suite.

> Also, I have been running the performance schema test suite
> several times a day in development in debug (with DBUG_ASSERT),
> for several months, on a Mac,
> and I never had such failures
> (and the related code has been there for ages).

Now that more test logs are in, I can tell that this problem only happens on OS X 10.5, x86_64. I have updated the "OS" field.
With 32 bit, this problem doesn't show, and also on OS X 10.6.
(However, another assertion is triggered in "lowercase_table2", but that test is labeled "exp-fail".)

> One point to clarify is which implementation of "my_atomic"
> was used in the build, as there are several choices.

The cmake log isn't very informative here, for the module it just shows this:

[ 23%] Building C object mysys/CMakeFiles/mysys.dir/my_atomic.c.o
cd /Volumes/hd2/pb2/build/sb_0-2225408-1283270786.88/debug/mysys && /usr/bin/gcc  -DHAVE_CONFIG_H -DUSE_MYSYS_NEW -arch x86_64 -isysroot /Developer/SDKs/MacOSX10.5.sdk  -Wall -O -g -fno-common -DENABLED_DEBUG_SYNC -I/Volumes/hd2/pb2/build/sb_0-2225408-1283270786.88/debug/include -I/Volumes/hd2/pb2/build/sb_0-2225408-1283270786.88/mysql-5.5.6-m3/zlib -I/Volumes/hd2/pb2/build/sb_0-2225408-1283270786.88/mysql-5.5.6-m3/include -I/Volumes/hd2/pb2/build/sb_0-2225408-1283270786.88/mysql-5.5.6-m3/mysys   -o CMakeFiles/mysys.dir/my_atomic.c.o   -c /Volumes/hd2/pb2/build/sb_0-2225408-1283270786.88/mysql-5.5.6-m3/mysys/my_atomic.c

Maybe this helps:

-- Performing Test HAVE_GCC_ATOMIC_BUILTINS
-- Performing Test HAVE_GCC_ATOMIC_BUILTINS - Failed
.....
-- Performing Test HAVE_IB_GCC_ATOMIC_BUILTINS
-- Performing Test HAVE_IB_GCC_ATOMIC_BUILTINS - Failed
-- Performing Test HAVE_IB_ATOMIC_PTHREAD_T_GCC
-- Performing Test HAVE_IB_ATOMIC_PTHREAD_T_GCC - Failed

Sorry, "ato" (ignoring case) doesn't have any other hits.

Looking into the code, you might want to know about "MY_ATOMIC_MODE_DUMMY" and "SAFE_MUTEX", but neither of these are logged in any way.
[9 Sep 2010 19:56] Marc ALFF
Analysis

Possible root cause of this crash found so far are:

1)

Somehow, the internal memory in the performance schema is wiped while threads are still running and making performance schema calls.
This is known to be the case during shutdown, see bug#56324 and bug#56666.

However, these race conditions are very rare, and this possible root cause does not explain much more frequent test failures where several tests do fail in a row, as reported.

So I think 1) is not the root cause here

2)

The performance schema implementation is lock free, in the sense that it does not use any mutex / rwlock / etc internally.
This is achieved by using atomic operations implemented by MY_ATOMIC.

This assumes that these atomic operations are actually functional.
The failed assertion that got triggered is precisely about the internal integrity of the internal atomic lock state used in pfs_lock.h, implemented with MY_ATOMIC.

Given the build logs:

-- Performing Test HAVE_GCC_ATOMIC_BUILTINS
-- Performing Test HAVE_GCC_ATOMIC_BUILTINS - Failed
.....
-- Performing Test HAVE_IB_GCC_ATOMIC_BUILTINS
-- Performing Test HAVE_IB_GCC_ATOMIC_BUILTINS - Failed
-- Performing Test HAVE_IB_ATOMIC_PTHREAD_T_GCC
-- Performing Test HAVE_IB_ATOMIC_PTHREAD_T_GCC - Failed

there is a good possibility that the MY_ATOMIC implementation picked 
was MY_ATOMIC_MODE_DUMMY. This would also explain why this sudden crashes only did occur on a specific platform/system headers/compiler configuration.

The performance schema code should not be built under this configuration, but this goes undetected.

This by itself is a verified bug that needs to be resolved.
[9 Sep 2010 20:30] Bugs System
A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/117934

3205 Marc Alff	2010-09-09
      Bug#56521 Assertion failed: (m_state == 2), function allocated_to_free, pfs_lock.h (138)
      
      Before this fix, it was possible to build the server:
      - with the performance schema
      - with a dummy implementation of my_atomic (MY_ATOMIC_MODE_DUMMY).
      
      In this case, the resulting binary will just crash,
      as this configuration is not supported.
      
      This fix enforces that the build will fail with a compilation error in this
      configuration, instead of resulting in a broken binary.
[9 Sep 2010 20:36] Christopher Powers
Patch approved.
[9 Sep 2010 21:20] Marc ALFF
Pushed into:
- mysql-5.5-bugfixing
- mysql-trunk-bugfixing
- mysql-next-mr-bugfixing
[13 Sep 2010 13:17] Joerg Bruehe
For the record:

1) When this patch was applied to 5.5-bugfixing, it introduced compile errors on Windows. They were fixed by Tor who changed the include order:

=== modified file 'storage/perfschema/ha_perfschema.cc'
--- storage/perfschema/ha_perfschema.cc 2010-09-09 20:28:47 +0000
+++ storage/perfschema/ha_perfschema.cc 2010-09-10 09:10:38 +0000
@@ -19,8 +19,8 @@
 */ #include "my_global.h"
-#include "my_atomic.h"
 #include "my_pthread.h"
+#include "my_atomic.h"
 #include "sql_plugin.h" #include "mysql/plugin.h" #include "ha_perfschema.h"

2) I applied both patches to the 5.5.6-rc build tree,
   and compiles on OS X still succeed.
So while this patch may be good in its own right, by preventing a broken combination, it does not seem to address the root cause of the problem.
[13 Sep 2010 21:59] Marc ALFF
Created bug#56760 to resolve the remaining failures.
[19 Sep 2010 20:04] Bugs System
Pushed into mysql-5.5 5.5.7-rc (revid:joerg@mysql.com-20100919200112-sandxxbu83pzc73w) (version source revid:joerg@mysql.com-20100919200112-sandxxbu83pzc73w) (merge vers: 5.5.7-rc) (pib:21)
[24 Sep 2010 13:25] Bugs System
Pushed into mysql-trunk 5.6.1-m4 (revid:kostja@sun.com-20100924132225-4q1jcqf55ktbsduz) (version source revid:kostja@sun.com-20100924132225-4q1jcqf55ktbsduz) (merge vers: 5.6.1-m4) (pib:21)
[24 Sep 2010 13:35] Bugs System
Pushed into mysql-next-mr (revid:kostja@sun.com-20100924133252-dayzhtbdbye7rwbp) (version source revid:kostja@sun.com-20100924133252-dayzhtbdbye7rwbp) (pib:21)
[24 Sep 2010 23:05] Paul DuBois
Noted in 5.5.7, 5.6.1 changelogs.

It was possible to compile mysqld with Performance Schema support but
with a dummy atomic-operations implementation, which caused a server
crash.
[28 Sep 2010 15:23] Paul DuBois
Revised changelog entry:

It was possible to compile mysqld with Performance Schema support but
with a dummy atomic-operations implementation, which caused a server
crash. This problem does not affect binary distributions. It is
helpful as a safety measure for users who build MySQL from source.
[2 Oct 2010 18:13] Bugs System
Pushed into mysql-trunk 5.6.1-m4 (revid:alexander.nozdrin@oracle.com-20101002180948-852x1cuv7c6i85ea) (version source revid:alexander.nozdrin@oracle.com-20101002180857-an32jpuwzemsp4f2) (merge vers: 5.6.1-m4) (pib:21)
[2 Oct 2010 18:14] Bugs System
Pushed into mysql-next-mr (revid:alexander.nozdrin@oracle.com-20101002181053-6iotvl26uurcoryp) (version source revid:alexander.nozdrin@oracle.com-20101002180917-h0n62akupm3z20nt) (pib:21)
[2 Oct 2010 18:16] Bugs System
Pushed into mysql-5.5 5.5.7-rc (revid:alexander.nozdrin@oracle.com-20101002180831-590ka2tuit9qoxbb) (version source revid:alexander.nozdrin@oracle.com-20101002180831-590ka2tuit9qoxbb) (merge vers: 5.5.7-rc) (pib:21)