Bug #38474 Maria crash in _ma_setup_live_state (pushbuild2)
Submitted: 30 Jul 2008 21:09 Modified: 10 Mar 2009 14:33
Reporter: Philip Stoev Email Updates:
Status: Closed Impact on me:
None 
Category:MySQL Server: Maria storage engine Severity:S1 (Critical)
Version:6.0-bzr OS:Any
Assigned to: Michael Widenius CPU Architecture:Any

[30 Jul 2008 21:09] Philip Stoev
Description:
When executing a concurrent workload, maria crashed as follows:

#0  0x00110416 in __kernel_vsyscall ()
#1  0x00581c78 in pthread_kill () from /lib/libpthread.so.0
#2  0x08900683 in my_write_core (sig=11) at stacktrace.c:307
#3  0x082f38b9 in handle_segfault (sig=11) at mysqld.cc:2654
#4  <signal handler called>
#5  0x087a76c2 in _ma_setup_live_state (info=0x99257d0) at ma_state.c:91
#6  0x0878a2c2 in ha_maria::external_lock (this=0x985b0c8, thd=0x973c208, lock_type=1) at ha_maria.cc:2163
#7  0x084163e6 in handler::ha_external_lock (this=0x985b0c8, thd=0x973c208, lock_type=1) at handler.cc:5273
#8  0x084263ee in ha_partition::external_lock (this=0x9856348, thd=0x973c208, lock_type=1) at ha_partition.cc:2555
#9  0x084163e6 in handler::ha_external_lock (this=0x9856348, thd=0x973c208, lock_type=1) at handler.cc:5273
#10 0x082e9ba7 in lock_external (thd=0x973c208, tables=0x99354e0, count=1) at lock.cc:389
#11 0x082eaeb6 in mysql_lock_tables (thd=0x973c208, tables=0x984c440, count=1, flags=0, need_reopen=0xad32f3ff) at lock.cc:290
#12 0x083484f1 in lock_tables (thd=0x973c208, tables=0x97c8e20, count=1, flags=0, need_reopen=0xad32f3ff) at sql_base.cc:4407
#13 0x0834ecd5 in open_and_lock_tables_derived (thd=0x973c208, tables=0x97c8e20, derived=true, flags=0) at sql_base.cc:4112
#14 0x08310ca5 in open_and_lock_tables (thd=0x973c208, tables=0x97c8e20) at ../../sql/mysql_priv.h:1613
#15 0x083aa01d in mysql_delete (thd=0x973c208, table_list=0x97c8e20, conds=0x97c93b8, order=0x97c8858, limit=18446744073709551615, options=0,
    reset_auto_increment=false) at sql_delete.cc:54
#16 0x083081ca in mysql_execute_command (thd=0x973c208) at sql_parse.cc:3076
#17 0x084a35ab in sp_instr_stmt::exec_core (this=0x97c94a8, thd=0x973c208, nextp=0xad33031c) at sp_head.cc:2885
#18 0x084a37f3 in sp_lex_keeper::reset_lex_and_exec_core (this=0x97c94d0, thd=0x973c208, nextp=0xad33031c, open_tables=false, instr=0x97c94a8)
    at sp_head.cc:2715
#19 0x084a9027 in sp_instr_stmt::execute (this=0x97c94a8, thd=0x973c208, nextp=0xad33031c) at sp_head.cc:2828
#20 0x084a569d in sp_head::execute (this=0x97d0eb8, thd=0x973c208) at sp_head.cc:1238
#21 0x084a62d5 in sp_head::execute_procedure (this=0x97d0eb8, thd=0x973c208, args=0x973d9e8) at sp_head.cc:1963
#22 0x0830beed in mysql_execute_command (thd=0x973c208) at sql_parse.cc:4175
#23 0x0830e1e1 in mysql_parse (thd=0x973c208, inBuf=0x97a4568 "CALL test.pinsdel2()", length=20, found_semicolon=0xad331270) at sql_parse.cc:5800
#24 0x0830ec55 in dispatch_command (command=COM_QUERY, thd=0x973c208, packet=0x9788f91 "CALL test.pinsdel2()", packet_length=20) at sql_parse.cc:1050
#25 0x0830ff1f in do_command (thd=0x973c208) at sql_parse.cc:723
#26 0x082fc99b in handle_one_connection (arg=0x973c208) at sql_connect.cc:1153
#27 0x0057d32f in start_thread () from /lib/libpthread.so.0
#28 0x0049a27e in clone () from /lib/libc.so.6

(gdb) list
86          link that is deleted by concurrently running checkpoint.
87
88          It's enough to compare trids here (instead of calling
89          tranman_can_read_from) as history->trid is a commit_trid
90        */
91        while (trn->trid < history->trid) <<<<<<<<<<<<<< HERE
92          history= history->next;
93        pthread_mutex_unlock(&share->intern_lock);
94        /* The current item can't be deleted as it's the first one visible for us */
95        tables->state_start=  tables->state_current= history->state;

(gdb) print history
$1 = (MARIA_STATE_HISTORY *) 0x0

How to repeat:
If this is repeatable, a test case will be provided.
[30 Jul 2008 22:04] Philip Stoev
Test case for  bug 38368

Attachment: bug38368.zip (application/x-zip-compressed, text), 4.21 KiB.

[30 Jul 2008 22:06] Philip Stoev
Please find attached a test case for this bug. Please place the .txt files in mysql-test and the .test files in mysql-test/t and then run:

$ perl ./mysql-test-run.pl \
  --stress \
  --stress-init-file=bug38368_init.txt \
  --stress-test-file=bug38368_run.txt \
  --stress-threads=10 \
  --skip-ndb \
  --mysqld=--log-output=file \
  --stress-test-duration=60000

A crash will happen shortly after takeoff. The _init file may include tables and views that are not used in the actual test. Please let me know if an even more simplifed test case is required.
[31 Jul 2008 21:21] Philip Stoev
Note: partitions are used in the test case.
[24 Aug 2008 13:22] Michael Widenius
I tried the test on 2 different 64 bit linux systems, with and without debug, but couldn't repeat the problem withing 30 minutes/2 hours on either system.

It's possible that my earlier fix for Bug#30816 fixed this case too.
[15 Sep 2008 9:26] Philip Stoev
This bug is still present in 6.0.7
[28 Sep 2008 14:49] Philip Stoev
Not present in 6.0-maria
[28 Sep 2008 16:22] Philip Stoev
This just happened again on 6.0-maria with a different test. I will try to provide a simple test case.
[24 Nov 2008 19:58] Guilhem Bichot
More precise post.

It recently happened in 5.1-maria:

guilhem@mysql.co...
2008-11-20 15:52:38
machine sol10 sparc32 sysqa
test maria_stress:
http://clustra.norway.sun.com/~bteam/pb2/web.py?action=archive_download&archive_id=138119&...;
pretty=please

serg@mysql.com-2...
2008-11-10 20:18:15
machine sol10 sparc32 sysqa
test maria_stress:
http://clustra.norway.sun.com/~bteam/pb2/web.py?action=archive_download&archive_id=122892&...;
pretty=please

The links above contain the exact log of what was done and stack traces.
[25 Nov 2008 10:08] Guilhem Bichot
Also seen on Nov 22 when running tests on Linux 64 bit with default engine = Maria:
main.events_restart_phase1     [ pass ]   
main.events_restart_phase2     [ pass ]   
main.events_restart_phase3     [ fail ]
mysqltest: In included file "./include/wait_condition.inc": At line 53: Error running query 'select count(distinct name)=3 from execution_log': 2013 Lost connection to MySQL server during query
(same stack trace).
[2 Dec 2008 15:23] Sergei Golubchik
another test case:

./mtr --mysqld=-#f,new_trid,trnman_new_trn,trnman_end_trn,trnman_collect_transactions:t:d:O,dbug.log --mysqld=--default-storage-engine=maria mysqlslap

this crashes almost always in_ma_setup_live_state():

while (trn->trid < history->trid && history->trid != ~(TrID)0)
   history= history->next;

when the end of a list (history==0) is reached
[2 Dec 2008 15:31] Guilhem Bichot
I can indeed get this crash in 5.1-maria:
Program received signal SIGSEGV, Segmentation fault.
0x08610ce1 in _ma_setup_live_state (info=0xb696b4e8) at ma_state.c:94
94        while (trn->trid < history->trid && history->trid != ~(TrID)0)
Current language:  auto; currently c
(gdb) p history
$2 = (MARIA_STATE_HISTORY *) 0x0

by using Serg's exact mtr command-line in his last post, to which I add --mem.
[10 Dec 2008 16:49] Michael Widenius
Fixed a bug that caused a crash at exactly this line in the MySQL-Maria tree.
Will be merged to MySQL-6.0 shortly