Bug #39440 Maria crash in _ma_remove_not_visible_states_with_lock()
Submitted: 14 Sep 2008 12:25
Modified: 25 Dec 2008 19:19
Reporter: Philip Stoev
Status: Can't repeat
Category: MySQL Server: Maria storage engine
Severity: S1 (Critical)
Version: 6.0.7, 6.0.9
OS: Any
Assigned to:
CPU Architecture: Any

[14 Sep 2008 12:25] Philip Stoev
Description:
When executing a tpcb-like scenario (rpl_sys SystemQA test), Maria crashed as follows:

libc.so.1`_lwp_kill+8(b, 5e7000, b, ff11c35c, ff172e50, ff2010)
handle_segfault+0x20c(b, 0, fe8fba68, ff174784, 0, 0)
libc.so.1`__sighndlr+0xc(b, 0, fe8fba68, 1b22b0, 0, 0)
libc.so.1`call_user_handler+0x3b8(b, 0, 12, 0, ff013a00, fe8fba68)
_ma_remove_not_visible_states_with_lock+0x74(50a18a8, 1000, ff173700, ff013a00, 50a1d00, 0)
really_execute_checkpoint+0x5e4(15, fe8fbee0, 1, 40d0ca4, 23, 437e280)
ma_checkpoint_background+0x348(1e, 2, ff1400, ff1400, 906, 6dd800)
libc.so.1`_lwp_start(0, 0, 0, 0, 0, 0)

How to repeat:
If this happens again, a test case will be provided. In the meantime, a Solaris core + binary from SunStudio are available for debugging.
[14 Sep 2008 20:21] Philip Stoev
Here is a better stack trace from Linux:

#0  0x0000003ba880b132 in pthread_kill () from /lib64/libpthread.so.0
#1  0x0000000000644dbe in handle_segfault (sig=6) at mysqld.cc:2660
#2  <signal handler called>
#3  0x0000003ba8030055 in raise () from /lib64/libc.so.6
#4  0x0000003ba8031af0 in abort () from /lib64/libc.so.6
#5  0x0000003ba806824b in __libc_message () from /lib64/libc.so.6
#6  0x0000003ba806f4f4 in _int_free () from /lib64/libc.so.6
#7  0x0000003ba8072b1c in free () from /lib64/libc.so.6
#8  0x0000000000a21bc3 in _ma_remove_not_visible_states_with_lock (share=0x2aaabc052b20) at ma_state.c:156
#9  0x0000000000a61acf in really_execute_checkpoint () at ma_checkpoint.c:1062
#10 0x0000000000a626d4 in ma_checkpoint_background (arg=<value optimized out>) at ma_checkpoint.c:132
#11 0x0000003ba88062f7 in start_thread () from /lib64/libpthread.so.0
#12 0x0000003ba80ce85d in clone () from /lib64/libc.so.6

151       {
152         next= history->next;
153         if (!trnman_exists_active_transactions(history->trid, last_trid,
154                                                trnman_is_locked))
155         {
156           my_free(history, MYF(0)); <<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<
157           continue;
158         }
159         *parent= history;
160         parent= &history->next;
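
For readers unfamiliar with the pattern in the excerpt above, here is a minimal, self-contained sketch of the same kind of loop: walk a singly linked list once, free the entries that are no longer needed, and relink the survivors through a `parent` pointer. The names `history_node`, `still_needed`, `push_history` and `prune_history` are hypothetical and are not taken from ma_state.c; the visibility test is a stand-in for trnman_exists_active_transactions(). The sketch does not reproduce the crash. It only illustrates the invariant such a loop depends on: every node reaching free() must be a pointer obtained from the allocator and not yet freed, which is what the glibc "invalid pointer" abort indicates was violated.

```c
#include <stdlib.h>

/* Hypothetical node type; not the real Maria state-history layout. */
typedef struct history_node {
  struct history_node *next;
  unsigned long trid;
} history_node;

/* Hypothetical stand-in for the visibility check: an entry is still
   needed only if its trid is at least min_active_trid. */
static int still_needed(const history_node *node, unsigned long min_active_trid)
{
  return node->trid >= min_active_trid;
}

/* Pushes a new entry at the head of the list.
   Returns 0 on success, -1 if allocation fails. */
static int push_history(history_node **head, unsigned long trid)
{
  history_node *node = malloc(sizeof(*node));
  if (!node)
    return -1;
  node->trid = trid;
  node->next = *head;
  *head = node;
  return 0;
}

/* Walks the list once, frees entries that are no longer needed and
   relinks the remaining ones. Returns the number of freed entries. */
static size_t prune_history(history_node **head, unsigned long min_active_trid)
{
  history_node **parent = head;
  history_node *node, *next;
  size_t freed = 0;

  for (node = *head; node; node = next)
  {
    next = node->next;            /* read before node may be freed */
    if (!still_needed(node, min_active_trid))
    {
      free(node);                 /* node must come from malloc() */
      freed++;
      continue;
    }
    *parent = node;               /* keep this entry in the list */
    parent = &node->next;
  }
  *parent = NULL;                 /* terminate the surviving list */
  return freed;
}
```

If another thread could modify the list, or an entry were embedded in a larger allocation rather than allocated on its own, the free() call in such a loop would receive a pointer the allocator does not recognise.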
[14 Sep 2008 20:22] Philip Stoev
The error printed to STDERR was:

*** glibc detected *** /data1/6.0.7/6.0.7_x64/bin/mysqld: free(): invalid pointer: 0x00002aaab42f7b38 ***
[26 Sep 2008 10:56] Michael Widenius
Please recompile with safe_malloc (configure option --debug=full) and retry.

safe_malloc will give us more information about what could have gone wrong.
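
To illustrate why a debug allocator helps with this class of error, here is a minimal sketch of the general idea. It is not MySQL's safe_malloc implementation: `guard_header`, `GUARD_MAGIC`, `guarded_malloc` and `guarded_free` are hypothetical names. Each block is prefixed with a header carrying a magic value, and the free routine refuses pointers whose header does not carry it, so a bad pointer is reported by the wrapper instead of aborting inside libc.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

#define GUARD_MAGIC 0xC0FFEE42u   /* arbitrary marker value (assumption) */

/* Header placed in front of every block; the union keeps the payload
   suitably aligned for any object type. */
typedef union guard_header {
  struct {
    uint32_t magic;
    size_t size;
  } info;
  max_align_t align;
} guard_header;

/* Allocates size bytes of zero-filled payload behind a guard header.
   Returns NULL on overflow or allocation failure. */
static void *guarded_malloc(size_t size)
{
  guard_header *hdr;

  if (size > SIZE_MAX - sizeof(*hdr))
    return NULL;
  hdr = malloc(sizeof(*hdr) + size);
  if (!hdr)
    return NULL;
  hdr->info.magic = GUARD_MAGIC;
  hdr->info.size = size;
  memset(hdr + 1, 0, size);
  return hdr + 1;
}

/* Frees a block obtained from guarded_malloc().
   Returns 0 on success and -1 if the header in front of ptr does not
   carry the magic value. Limitation of this sketch: the bytes in front
   of ptr must be readable and suitably aligned, so it only catches bad
   pointers that land inside valid memory; a real debug allocator
   tracks its blocks explicitly. */
static int guarded_free(void *ptr)
{
  guard_header *hdr;

  if (!ptr)
    return 0;
  hdr = (guard_header *)ptr - 1;
  if (hdr->info.magic != GUARD_MAGIC)
    return -1;                    /* not the start of a guarded block */
  hdr->info.magic = 0;            /* clear the marker before release */
  free(hdr);
  return 0;
}
```

Passing a pointer into the middle of a block, for example, makes `guarded_free()` return -1 rather than handing the pointer to free().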
[20 Oct 2008 14:01] Guilhem Bichot
Saw that once when running maria_bulk_insert.yy
[21 Nov 2008 10:00] Guilhem Bichot
Here's how I run the .yy script:

./runall.pl --basedir=/m/bzrrepos/mysql-maria --engine=Maria --grammar=conf/maria_bulk_insert.yy --queries=100000 --reporters=Deadlock

Note that I got it only once. Other times I get other problems (like BUG#40579).
[10 Dec 2008 17:30] Michael Widenius
I ran this test three times on my 64-bit Linux system against MySQL-Maria and it passed every time.

Since then there have been three relevant fixes: one critical fix in the history_state area, which is where this test case crashed; one critical fix in the transaction manager for an access to freed memory; and a fix in the page handler for the bug that Guilhem referred to. I think it's very likely that this bug is fixed.

If this bug happens again after the next MySQL-Maria -> MySQL-6.0-maria merge, please reopen the bug.
[23 Dec 2008 11:21] Philip Stoev
This bug is present in both the debug and non-debug 6.0.9 binaries, with the crash happening immediately after takeoff on the iuds6 test.

Running the debug binary unfortunately does not provide any safemalloc output.

The current 6.0-maria tree is not affected, so hopefully future releases will not be affected either.
[25 Dec 2008 19:19] Philip Stoev
Actually, the 6.0.9 crash is in _ma_remove_not_visible_states(), which is Bug #41395.