Bug #27871 MySQL server crashed with damaged stack trace
Submitted: 17 Apr 2007 5:57 Modified: 10 May 2007 10:55
Reporter: Yuan WANG Email Updates:
Status: Duplicate Impact on me:
None 
Category:MySQL Server Severity:S2 (Serious)
Version:5.0.32 OS:Linux (2.6.16-2-em64t-p4-smp)
Assigned to: CPU Architecture:Any

[17 Apr 2007 5:57] Yuan WANG
Description:
We are a blog service provoder. We use multiple MySQL servers in our application. These days, some of our MySQL servers crashed several times, all with damaged stack trace.  A typical error log is given in the attachment.

How to repeat:
We can not found out how to repeat the problem. I can just try my best to describe the  symptom. The hardware setting of our server is:
  Box: Dell 2950
  CPU: Intel(R) Xeon(R) CPU            5130  @ 2.00GHz
  MEM: 4G
  Disk: Ultra SCISI 320 (Raid 0+1)
  OS: 2.6.16-2-em64t-p4-smp x86_64 GNU/Linux
  MySQL version: 5.0.30-Debian_1-log Debian etch distribution

A typical server has about 20G data and servers 500 questions per second.  We use XA transactions heavily, with about 70 XA transactions per second. In each error log of the crash, "thd->query" must be "XA START" or "XA COMMIT" statement and some prepared transactions will be found. So the problem seems to be related to XA transaction?

And it seems that heavy system load will leads to more crash.
[17 Apr 2007 5:58] Yuan WANG
A typical error log

Attachment: crash-db14-070414-1 (application/octet-stream, text), 10.29 KiB.

[17 Apr 2007 7:54] Valeriy Kravchuk
Thank you for a problem report. Please, try to use newer versions 5.0.37/5.0.38, and check if this crash is repeatable with them.
[17 Apr 2007 10:05] Sergei Golubchik
Unfortunately, "problem seems to be related to XA transaction" is a rather weak hint, it doesn't help much.

Other things you can try to do to gather more information:
1. enable core dumps (there's mysqld option, and set ulimit properly)
2. use debug version of the server
3. enable general log
[17 Apr 2007 10:14] MySQL Verification Team
I believe what Mr Stone refers to is that our 64-bit binaries cannot make proper stack traces in general.  It would appear that bug #21250 might need re-opening.
[18 Apr 2007 2:34] Yuan WANG
We have made a debug version of MySQL and run it with core dump enabled. If it crashed again, I will report more information.
[18 Apr 2007 8:16] Valeriy Kravchuk
I am waiting for your further feedback.
[30 Apr 2007 11:57] Yuan WANG
We got the coredump, and the stack trace is listed as follows:
  #0  0x00002aefab1b207b in raise () from /lib/libc.so.6
  #1  0x00002aefab1b384e in abort () from /lib/libc.so.6
  #2  0x00002aefaadb13ff in __gnu_cxx::__verbose_terminate_handler () from /usr/lib/libstdc++.so.6
  #3  0x00002aefaadaf5a6 in std::set_unexpected () from /usr/lib/libstdc++.so.6
  #4  0x00002aefaadaf5d3 in std::terminate () from /usr/lib/libstdc++.so.6
  #5  0x00002aefaadafa8f in __cxa_pure_virtual () from /usr/lib/libstdc++.so.6
  #6  0x00000000005e3c75 in MYSQLparse (yythd=0x2aaac8902020) at sql_yacc.yy:4210
  #7  0x00000000005cb0dd in mysql_parse (thd=0x2aaac8902020, 
    inBuf=0xa238f10 "SELECT ID FROM CircleBlog WHERE (((((CircleID = 21361) AND (Valid < 12)) AND (AllowView = -100)) AND (IsPublished = 1)) AND (ApplyDate >= 1176973212117)) AND (ApplyDate < 1177578012117)", length=<value optimized out>) at sql_parse.cc:5799
  #8  0x00000000005cb73b in dispatch_command (command=<value optimized out>, thd=0x2aaac8902020, packet=<value optimized out>, 
    packet_length=<value optimized out>) at sql_parse.cc:1775
  #9  0x00000000005cca82 in do_command (thd=0x2aaac8902020) at sql_parse.cc:1557
  #10 0x00000000005cd7ad in handle_one_connection (arg=<value optimized out>) at sql_parse.cc:1188
  #11 0x00002aefaa999f1a in start_thread () from /lib/libpthread.so.0
  #12 0x00002aefab24c602 in clone () from /lib/libc.so.6
  #13 0x0000000000000000 in ?? ()

However, no problem occurs if I execute the SQL above.
[30 Apr 2007 11:59] Yuan WANG
We got the coredump, and the stack trace is listed as follows:
  #0  0x00002aefab1b207b in raise () from /lib/libc.so.6
  #1  0x00002aefab1b384e in abort () from /lib/libc.so.6
  #2  0x00002aefaadb13ff in __gnu_cxx::__verbose_terminate_handler () from /usr/lib/libstdc++.so.6
  #3  0x00002aefaadaf5a6 in std::set_unexpected () from /usr/lib/libstdc++.so.6
  #4  0x00002aefaadaf5d3 in std::terminate () from /usr/lib/libstdc++.so.6
  #5  0x00002aefaadafa8f in __cxa_pure_virtual () from /usr/lib/libstdc++.so.6
  #6  0x00000000005e3c75 in MYSQLparse (yythd=0x2aaac8902020) at sql_yacc.yy:4210
  #7  0x00000000005cb0dd in mysql_parse (thd=0x2aaac8902020, 
    inBuf=0xa238f10 "SELECT ID FROM CircleBlog WHERE (((((CircleID = 21361) AND (Valid < 12)) AND (AllowView = -100)) AND (IsPublished = 1)) AND (ApplyDate >= 1176973212117)) AND (ApplyDate < 1177578012117)", length=<value optimized out>) at sql_parse.cc:5799
  #8  0x00000000005cb73b in dispatch_command (command=<value optimized out>, thd=0x2aaac8902020, packet=<value optimized out>, 
    packet_length=<value optimized out>) at sql_parse.cc:1775
  #9  0x00000000005cca82 in do_command (thd=0x2aaac8902020) at sql_parse.cc:1557
  #10 0x00000000005cd7ad in handle_one_connection (arg=<value optimized out>) at sql_parse.cc:1188
  #11 0x00002aefaa999f1a in start_thread () from /lib/libpthread.so.0
  #12 0x00002aefab24c602 in clone () from /lib/libc.so.6
  #13 0x0000000000000000 in ?? ()

However, no problem occurs if I execute the SQL above.
[6 May 2007 2:28] Yuan WANG
Another stacktrace:

#0  0x00002b1f4901607b in raise () from /lib/libc.so.6
(gdb) where
#0  0x00002b1f4901607b in raise () from /lib/libc.so.6
#1  0x00002b1f4901784e in abort () from /lib/libc.so.6
#2  0x00002b1f48c15424 in __gnu_cxx::__verbose_terminate_handler ()
   from /usr/lib/libstdc++.so.6
#3  0x00002b1f48c135a6 in std::set_unexpected () from /usr/lib/libstdc++.so.6
#4  0x00002b1f48c135d3 in std::terminate () from /usr/lib/libstdc++.so.6
#5  0x00002b1f48c13a8f in __cxa_pure_virtual () from /usr/lib/libstdc++.so.6
#6  0x00000000005e3c75 in MYSQLparse (yythd=0x2aab09fcb760) at sql_yacc.yy:4210
#7  0x000000000062749d in Prepared_statement::prepare (this=0x2aab024ae3f0, 
    packet=0x2aab09fcd8d1 "SELECT ID FROM UserRelation WHERE ((ParentID = ?) AND (GroupID = ?)) AND (Affirm = 1)", packet_len=86) at sql_prepare.cc:2797
#8  0x0000000000628588 in mysql_stmt_prepare (thd=0x2aab09fcb760, 
    packet=0x2aab09fcd8d1 "SELECT ID FROM UserRelation WHERE ((ParentID = ?) AND (GroupID = ?)) AND (Affirm = 1)", packet_length=86) at sql_prepare.cc:1870
#9  0x00000000005cc309 in dispatch_command (command=COM_SLEEP, 
    thd=0x2aab09fcb760, packet=0x2cbd <Address 0x2cbd out of bounds>, 
    packet_length=86) at sql_parse.cc:1749
#10 0x00000000005cca82 in do_command (thd=0x2aab09fcb760) at sql_parse.cc:1557
#11 0x00000000005cd7ad in handle_one_connection (arg=<value optimized out>)
    at sql_parse.cc:1188
#12 0x00002b1f487fdf1a in start_thread () from /lib/libpthread.so.0
#13 0x00002b1f490b0602 in clone () from /lib/libc.so.6
#14 0x0000000000000000 in ?? ()
[7 May 2007 13:33] Yuan WANG
The previous coredumps are made with passing --core-file to mysqld, and I think there are missleading. After passing --core-file, I get totally different coredumps.  The following are some stacktraces. It seems that the problem is related to xid cache?

Stacetrace 1:
#0  0x00002b83d78d9797 in pthread_kill () from /lib/libpthread.so.0
(gdb) WHERE
#0  0x00002b83d78d9797 in pthread_kill () from /lib/libpthread.so.0
#1  0x00000000005aefc5 in handle_segfault (sig=11) at mysqld.cc:2132
#2  <signal handler called>
#3  movelink (array=0x146a0de0, find=259, next_link=4294967295, newlink=412) at hash.c:272
#4  0x00000000008c782a in my_hash_insert (info=0xd25920, record=0x2aaaabbb7e18 "\023\231") at hash.c:427
#5  0x000000000059de01 in xid_cache_insert (xid_state=0x2aaaabbb7e18) at sql_class.cc:2186
#6  0x00000000005ca316 in mysql_execute_command (thd=0x2aaaabbb6060) at sql_parse.cc:4838
#7  0x00000000005cb24c in mysql_parse (thd=0x2aaaabbb6060, inBuf=0x111c92b0 "XA START 0x31353831323036383237323633343537303738,0x33,0x9913", 
    length=<value optimized out>) at sql_parse.cc:5834
#8  0x00000000005cb73b in dispatch_command (command=<value optimized out>, thd=0x2aaaabbb6060, packet=<value optimized out>, 
    packet_length=<value optimized out>) at sql_parse.cc:1775
#9  0x00000000005cca82 in do_command (thd=0x2aaaabbb6060) at sql_parse.cc:1557
#10 0x00000000005cd7ad in handle_one_connection (arg=<value optimized out>) at sql_parse.cc:1188
#11 0x00002b83d78d5f1a in start_thread () from /lib/libpthread.so.0
#12 0x00002b83d8188602 in clone () from /lib/libc.so.6
#13 0x0000000000000000 in ?? ()

Stacetrace 2:
#0  0x00002abe4b515797 in pthread_kill () from /lib/libpthread.so.0
#1  0x00000000005aefc5 in handle_segfault (sig=11) at mysqld.cc:2132
#2  <signal handler called>
#3  0x00000000008ec4d9 in my_hash_sort_bin (cs=<value optimized out>, key=0xd0e422a1d9b79400 <Address 0xd0e422a1d9b79400 out of bounds>, 
    len=3652686848, nr1=0x5085a6f0, nr2=0x5085a6e8) at ctype-bin.c:283
#4  0x00000000008c65fe in calc_hash (hash=<value optimized out>, key=0x2aaaaba385e0 "", length=3652686848) at hash.c:45
#5  0x00000000008c6940 in hash_rec_mask (hash=0xd25920, pos=<value optimized out>, buffmax=512, maxlength=290) at hash.c:171
#6  0x00000000008c6e27 in hash_first (hash=0xd25920, key=0x15c7bbc8 "\023", length=<value optimized out>, current_record=0x5085a7b4)
    at hash.c:226
#7  0x00000000008c6eee in hash_search (hash=0x1bfa20, key=0x2aaaaba385e0 "", length=3652686848) at hash.c:192
#8  0x000000000059de6f in xid_cache_search (xid=0x15c7bbc0) at sql_class.cc:2154
#9  0x00000000005ca0c0 in mysql_execute_command (thd=0x2aaaac8a1d50) at sql_parse.cc:4830
#10 0x00000000005cb24c in mysql_parse (thd=0x2aaaac8a1d50, inBuf=0x15c7b9e0 "XA START 0x31363231323036383236383937313938393937,0x32,0x9913", 
    length=<value optimized out>) at sql_parse.cc:5834
#11 0x00000000005cb73b in dispatch_command (command=<value optimized out>, thd=0x2aaaac8a1d50, packet=<value optimized out>, 
    packet_length=<value optimized out>) at sql_parse.cc:1775
#12 0x00000000005cca82 in do_command (thd=0x2aaaac8a1d50) at sql_parse.cc:1557
#13 0x00000000005cd7ad in handle_one_connection (arg=<value optimized out>) at sql_parse.cc:1188
#14 0x00002abe4b511f1a in start_thread () from /lib/libpthread.so.0
#15 0x00002abe4bdc4602 in clone () from /lib/libc.so.6
#16 0x0000000000000000 in ?? ()

Stacktrace 3:
#0  0x00002adfeced3797 in pthread_kill () from /lib/libpthread.so.0
#1  0x00000000005aefc5 in handle_segfault (sig=11) at mysqld.cc:2132
#2  <signal handler called>
#3  movelink (array=0x138aab50, find=0, next_link=4294967295, newlink=1) at hash.c:272
#4  0x00000000008c746b in hash_delete (hash=0xd25920, record=0x2aaab7740388 "��������\023") at hash.c:497
#5  0x000000000059b775 in xid_cache_delete (xid_state=0x2aaab7740388) at sql_class.cc:2195
#6  0x00000000005c9c89 in mysql_execute_command (thd=0x2aaab773e5d0) at sql_parse.cc:4935
#7  0x00000000005cb24c in mysql_parse (thd=0x2aaab773e5d0, inBuf=0x13452400 "XA COMMIT 0x31363331323036383236353139323435373333,0x31,0x9913", 
    length=<value optimized out>) at sql_parse.cc:5834
#8  0x00000000005cb73b in dispatch_command (command=<value optimized out>, thd=0x2aaab773e5d0, packet=<value optimized out>, 
    packet_length=<value optimized out>) at sql_parse.cc:1775
#9  0x00000000005cca82 in do_command (thd=0x2aaab773e5d0) at sql_parse.cc:1557
#10 0x00000000005cd7ad in handle_one_connection (arg=<value optimized out>) at sql_parse.cc:1188
#11 0x00002adfececff1a in start_thread () from /lib/libpthread.so.0
#12 0x00002adfed782602 in clone () from /lib/libc.so.6
#13 0x0000000000000000 in ?? ()
[7 May 2007 14:34] Yuan WANG
Sorry for my typo. The first sentence of my previous comment should be "The previous coredumps are made without passing --core-file to mysqld"
[10 May 2007 10:55] Valeriy Kravchuk
Looks like we have to consider this as a duplciate (less clear version of) bug #28323 now.