Bug #41733 | Start of replication after start of MySql (InnoDB) often results in crash. | ||
---|---|---|---|
Submitted: | 24 Dec 2008 13:03 | Modified: | 23 Mar 2009 15:19 |
Reporter: | Ben Clewett | Email Updates: | |
Status: | Closed | Impact on me: | |
Category: | MySQL Server: Replication | Severity: | S2 (Serious) |
Version: | 5.0.67 | OS: | Linux (SUSE 11.0 + Linux 2.6.25.18) |
Assigned to: | | CPU Architecture: | Any |
Tags: | crash, innodb, replication |
[24 Dec 2008 13:03]
Ben Clewett
[24 Dec 2008 20:09]
MySQL Verification Team
Hi Ben!

o) did you run mysql_upgrade after doing the upgrade ?
o) do you replicate many temporary tables ?
o) please try to get a core file so we can see a stack trace
[29 Dec 2008 11:16]
Ben Clewett
> Hi Ben!
>
> 1) did you run mysql_upgrade after doing the upgrade ?
> 2) do you replicate many temporary tables ?
> 3) please try get a core file so we can see a stack trace

1. The server was created from an SQL dump file. Therefore it was never at any other version. Other servers which have been upgraded from 5.0.n show the same problem.

2. You mean ENGINE=TEMPORARY tables? By default these are replicated, where about 10,000 are created/destroyed a day. I do not know of a method of accurately recording this without programming. If there is a simple option to stop replicating these tables, I would like to know.

3. I would expect the file 'core' in the process CWD? I cannot find either a 'core' file or anything unusual in the CWD, sorry!

Regards,
Ben.
[12 Jan 2009 10:47]
MySQL Verification Team
Hi Ben! To get a core file, edit my.cnf and restart the mysqld process:

[mysqld]
core-file

[mysqld_safe]
core-file-size=unlimited

Perhaps some OS-specific configuration needs to be done also. The core file will usually be in the datadir of mysql. When the core file is created, can you get a stack trace:

gdb /path/to/mysqld --core /path/to/core
bt

thanks!
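For reference, the gdb step above can also be run non-interactively. The following is a minimal sketch, assuming gdb is installed and on the PATH; the function names are hypothetical, and the paths are the same placeholders used in the comment above. It relies on gdb's `--batch` and `-ex` options and passes the core file as the second positional argument.

```python
import subprocess


def gdb_backtrace_command(mysqld_path, core_path):
    """Build a non-interactive gdb command line that prints a backtrace."""
    # --batch exits after running the commands; -ex bt runs the "bt" command.
    return ["gdb", "--batch", "-ex", "bt", mysqld_path, core_path]


def backtrace(mysqld_path, core_path):
    """Run gdb and return its standard output as text (gdb must be installed)."""
    result = subprocess.run(
        gdb_backtrace_command(mysqld_path, core_path),
        capture_output=True,
        text=True,
        check=False,
    )
    return result.stdout
```

Calling `backtrace("/path/to/mysqld", "/path/to/core")` with real paths should give text that can be pasted into a bug report.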
[12 Jan 2009 17:26]
Ben Clewett
I am very sorry but I still cannot get you a core file. The directives you requested were honoured by MySql, although I cannot see them in 'SHOW GLOBAL VARIABLES'. MySql still crashes after replication start, but no core is found. (find / -name "core*" -ls)

I also modified my start script to use: "ulimit -c unlimited"

I have tested core file generation from other broken applications, which works. I can even see at the bottom of the error.log:

> Trying to get some variables.
> Some pointers may be invalid and cause the dump to abort...
> thd->query at 0x7f3e2000200f is invalid pointer
> thd->thread_id=2
> The manual page at http://dev.mysql.com/doc/mysql/en/crashing.html contains
> information that should help you find out what is causing the crash.
> Writing a core file

However I can't find a core file. Please let me know if there is anything else I can try...
[12 Jan 2009 18:04]
MySQL Verification Team
Hi Ben! Since you use SUSE, can you check this page about creating core files: http://www.novell.com/coolsolutions/feature/16257.html Appears that mysqld is configured properly already.
[13 Jan 2009 8:46]
Ben Clewett
Thanks for the SUSE URL. I worked through this URL, including the test they give to make sure the core file is generated from another application, which works. Still no luck with MySql. I found a bug in earlier versions of MySql, where it cannot generate a core because of the 'setuid' on startup, but this seems fixed in my version. I'll keep trying...

Ben
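The usual operating-system reasons for a missing core file on Linux can be checked in one pass. What follows is a rough sketch, not taken from the MySQL or SUSE documentation; the function names `read_proc` and `diagnose` are hypothetical. It assumes a Linux system that exposes `/proc/sys/kernel/core_pattern` and `/proc/sys/fs/suid_dumpable`, and it reports the core size limit of the process running the script, which is not necessarily the limit mysqld itself runs with.

```python
import resource
from pathlib import Path


def read_proc(path):
    """Return the stripped contents of a /proc file, or None if unreadable."""
    try:
        return Path(path).read_text().strip()
    except OSError:
        return None


def diagnose(core_soft_limit, core_pattern, suid_dumpable):
    """Return a list of reasons why a core file might not appear."""
    problems = []
    if core_soft_limit == 0:
        problems.append("core size limit is 0 (try 'ulimit -c unlimited')")
    if core_pattern and core_pattern.startswith("|"):
        problems.append("core_pattern pipes cores to a helper program")
    elif core_pattern and "/" in core_pattern:
        problems.append("core_pattern writes cores to a fixed path: " + core_pattern)
    if suid_dumpable == "0":
        # With 0, a process that has changed its user id does not dump core.
        problems.append("fs.suid_dumpable is 0")
    return problems


if __name__ == "__main__":
    soft, _hard = resource.getrlimit(resource.RLIMIT_CORE)
    for line in diagnose(
        soft,
        read_proc("/proc/sys/kernel/core_pattern"),
        read_proc("/proc/sys/fs/suid_dumpable"),
    ):
        print(line)
```

An empty result only means these three settings look permissive from the calling shell; it does not prove that mysqld, started through its own start script and then switching user, sees the same values.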
[26 Jan 2009 14:40]
Susanne Ebrecht
Ben, without a core file we are not really able to help you. Because you said that you will keep trying, I will set this back to "need feedback". Please take your time. There is an automatic process: if nothing happens for one month, this will be set to "no feedback". Please then feel free to re-open the bug report.
[26 Jan 2009 15:09]
Ben Clewett
Susanne,

I believe 5.0.67 may be covered by a bug where I must start MySql as user 'mysql' in order to generate core files. However I am now completely unable to make MySql crash. Whatever pattern of load/data in my replication tree caused it must have changed. Which is surprising, as nothing I am aware of has changed. I will keep monitoring and trying to get this core file.

Regards,
Ben.
[27 Jan 2009 10:15]
Susanne Ebrecht
Ben, many thanks for your feedback. I will check your hint that a core file is only generated when the process is started as user mysql.
[28 Jan 2009 8:17]
Ben Clewett
Finally I have got the stack trace:

#0 0x00007fac906f5ce6 in pthread_kill () from /lib64/libpthread.so.0
#1 0x00000000005a6729 in handle_segfault (sig=11) at mysqld.cc:2375
#2 <signal handler called>
#3 0x00000000005e375a in lock_tables (thd=0x369c5f0, tables=0x412cc4a0, count=1, need_reopen=0x36a3950) at table.h:731
#4 0x00000000005e31d7 in simple_open_n_lock_tables (thd=0x369c5f0, tables=0x412cc4a0) at sql_base.cc:3022
#5 0x00000000006c228a in my_tz_find_with_opening_tz_tables (thd=0x369c5f0, name=0x412ccfd0) at tztime.cc:2367
#6 0x000000000062df76 in Query_log_event::exec_event (this=0x36ad190, rli=0x368a300, query_arg=0x36951af "BEGIN", q_len_arg=57263600) at log_event.cc:2003
#7 0x00000000006a9b6c in exec_relay_log_event (thd=0x369c5f0, rli=0x368a300) at slave.cc:3420
#8 0x00000000006a7994 in handle_slave_sql (arg=0x368af10) at slave.cc:4030
#9 0x00007fac906f1040 in start_thread () from /lib64/libpthread.so.0
#10 0x00007fac8fdb90cd in clone () from /lib64/libc.so.6

Any more I get I'll pass your way.

With great hope this is useful to you,
Ben
[28 Jan 2009 8:33]
MySQL Verification Team
looks like bug #21651 which was closed at "can't repeat"
[29 Jan 2009 8:08]
Ben Clewett
I have a second core file today, which is much the same as the first server:

#0 0x00007fe1b63c1ce6 in pthread_kill () from /lib64/libpthread.so.0
(gdb) bt
#0 0x00007fe1b63c1ce6 in pthread_kill () from /lib64/libpthread.so.0
#1 0x00000000005a6729 in handle_segfault (sig=11) at mysqld.cc:2375
#2 <signal handler called>
#3 0x00000000005e375a in lock_tables (thd=0x369c600, tables=0x408d94a0, count=1, need_reopen=0x36a3978) at table.h:731
#4 0x00000000005e31d7 in simple_open_n_lock_tables (thd=0x369c600, tables=0x408d94a0) at sql_base.cc:3022
#5 0x00000000006c228a in my_tz_find_with_opening_tz_tables (thd=0x369c600, name=0x408d9fd0) at tztime.cc:2367
#6 0x000000000062df76 in Query_log_event::exec_event (this=0x36b2a10, rli=0x368a310, query_arg=0x36959bf "BEGIN", q_len_arg=57263616) at log_event.cc:2003
#7 0x00000000006a9b6c in exec_relay_log_event (thd=0x369c600, rli=0x368a310) at slave.cc:3420
#8 0x00000000006a7994 in handle_slave_sql (arg=0x368af20) at slave.cc:4030
#9 0x00007fe1b63bd040 in start_thread () from /lib64/libpthread.so.0
#10 0x00007fe1b5a850cd in clone () from /lib64/libc.so.6

Regards,
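To check that two gdb backtraces follow the same call path even though their addresses and pointer values differ, one can compare only the function names of each frame. The following is an illustrative sketch, not part of the original report: the regular expression and the name `frame_names` are assumptions, and the sample data is frames #0 to #4 of the two traces posted in this thread.

```python
import re

# Matches "#N", an optional "0x... in " prefix, then either "<...>" or a
# (possibly namespaced) function name.
FRAME = re.compile(r"#\d+\s+(?:0x[0-9a-fA-F]+ in )?(<[^>]+>|[\w:]+)")


def frame_names(backtrace):
    """Return the function name of every frame, ignoring addresses and arguments."""
    return [match.group(1) for match in FRAME.finditer(backtrace)]


TRACE_A = """
#0 0x00007fac906f5ce6 in pthread_kill () from /lib64/libpthread.so.0
#1 0x00000000005a6729 in handle_segfault (sig=11) at mysqld.cc:2375
#2 <signal handler called>
#3 0x00000000005e375a in lock_tables (thd=0x369c5f0, tables=0x412cc4a0, count=1, need_reopen=0x36a3950) at table.h:731
#4 0x00000000005e31d7 in simple_open_n_lock_tables (thd=0x369c5f0, tables=0x412cc4a0) at sql_base.cc:3022
"""

TRACE_B = """
#0 0x00007fe1b63c1ce6 in pthread_kill () from /lib64/libpthread.so.0
#1 0x00000000005a6729 in handle_segfault (sig=11) at mysqld.cc:2375
#2 <signal handler called>
#3 0x00000000005e375a in lock_tables (thd=0x369c600, tables=0x408d94a0, count=1, need_reopen=0x36a3978) at table.h:731
#4 0x00000000005e31d7 in simple_open_n_lock_tables (thd=0x369c600, tables=0x408d94a0) at sql_base.cc:3022
"""

if __name__ == "__main__":
    print(frame_names(TRACE_A) == frame_names(TRACE_B))
```

Comparing names rather than whole lines is a deliberate simplification: it treats two crashes as the same when they pass through the same functions, which is the sense in which the second trace matches the first.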
[29 Jan 2009 12:26]
Susanne Ebrecht
Ben, sorry that I have to ask you for additional tests. We have had some threading problems with 5.0 that got fixed in 5.1. Would you please try MySQL 5.1 here? The current version is 5.1.30.
[29 Jan 2009 13:22]
Ben Clewett
Not easy: my replication tree contains 22 MySql processes, all on 5.0.67, and an upgrade is not planned. Much beta testing of our code is needed before any rollout. But I can replicate from 5.0.67 to 5.1.30 and see whether the error is still present. Best I can do. Will take a day or two...
[30 Jan 2009 11:56]
Susanne Ebrecht
Ben, the slave is crashing, so it would be enough if the slave is 5.1. Many thanks in advance.
[26 Feb 2009 12:47]
Ben Clewett
As per request: The bug appears when we replicate from 5.0.67 to 5.0.67. However when we replicate from 5.0.67 to 5.1.30, this bug does not appear. The replication appears completely stable. Therefore we will upgrade. Regards, Ben.
[23 Mar 2009 15:19]
Susanne Ebrecht
Ben, then I am sure that your problem is fixed in 5.1 by the fixes for some bugs related to threading. I will close this bug because it is fixed in version 5.1. If you get problems again when using 5.1, feel free to re-open this bug report.
[12 Oct 2010 8:52]
MySQL Verification Team
may have been fixed along with bug #9953