MySQL Bugs: #47920: Crash during recovery after force kill

Bug #47920	Crash during recovery after force kill
Submitted:	8 Oct 2009 16:39	Modified:	28 Feb 2010 9:44
Reporter:	Matthew Boehm	Email Updates:
Status:	No Feedback	Impact on me:	None
Category:	MySQL Server: General	Severity:	S2 (Serious)
Version:	5.1.35-ndb-7.0.7	OS:	Linux (RHEL5.4)
Assigned to:		CPU Architecture:	Any
Tags:	binlog, crash, recovery

Description:
MySQL was "stuck" trying to drop a completely empty database. Attempted to kill process via mysql cli. No luck. Forced to kill -9 entire mysqld. Attempted to restart mysql with /etc/init.d/mysql start. No success. This was in mysql error log.

091008 11:28:35 mysqld_safe Starting mysqld daemon with databases from /var/lib/mysql/data
InnoDB: The log sequence number in ibdata files does not match
InnoDB: the log sequence number in the ib_logfiles!
091008 11:28:36  InnoDB: Database was not shut down normally!
InnoDB: Starting crash recovery.
InnoDB: Reading tablespace information from the .ibd files...
InnoDB: Restoring possible half-written data pages from the doublewrite
InnoDB: buffer...
InnoDB: Last MySQL binlog file position 0 240172, file name /var/lib/mysql/binlog/binlog.000012
091008 11:28:36  InnoDB: Started; log sequence number 0 507044
091008 11:28:36 [Note] Recovering after a crash using /var/lib/mysql/binlog/binlog
091008 11:28:36 [Note] Starting crash recovery...
091008 11:28:36 [Note] Crash recovery finished.
/usr/sbin/mysqld(print_stacktrace+0x1e)[0x72556e]
/usr/sbin/mysqld(handle_segfault+0x326)[0x60acd6]
/lib64/libpthread.so.0[0x3ebfc0e4c0]
/usr/sbin/mysqld(_Z12push_warningP3THDN11MYSQL_ERROR18enum_warning_levelEjPKc+0x40)[0x688720]
/usr/sbin/mysqld(_Z19push_warning_printfP3THDN11MYSQL_ERROR18enum_warning_levelEjPKcz+0xd9)[0x6889d9]
/usr/sbin/mysqld(_ZN13MYSQL_BIN_LOG22purge_logs_before_dateEl+0x22a)[0x69777a]
/usr/sbin/mysqld[0x609702]
/usr/sbin/mysqld(main+0x6f9)[0x60c279]
/lib64/libc.so.6(__libc_start_main+0xf4)[0x3ebf01d974]
/usr/sbin/mysqld[0x559c19]
091008 11:28:36 - mysqld got signal 11 ;
This could be because you hit a bug. It is also possible that this binary
or one of the libraries it was linked against is corrupt, improperly built,
or misconfigured. This error can also be caused by malfunctioning hardware.
We will try our best to scrape up some info that will hopefully help diagnose
the problem, but since we have already crashed, something is definitely wrong
and this may fail.

key_buffer_size=536870912
read_buffer_size=1048576
max_used_connections=0
max_threads=300
threads_connected=0
It is possible that mysqld could use up to
key_buffer_size + (read_buffer_size + sort_buffer_size)*max_threads = 1448925 K
bytes of memory
Hope that's ok; if not, decrease some variables in the equation.

thd: 0x0
Attempting backtrace. You can use the following information to find out
where mysqld died. If you see no messages after this, something went
terribly wrong...
The manual page at http://dev.mysql.com/doc/mysql/en/crashing.html contains
information that should help you find out what is causing the crash.
091008 11:28:36 mysqld_safe mysqld from pid file /var/lib/mysql/data/15000dmys02.pid ended

Ran in GDB. Unfortunately, I use the RPM's provided by MySQL so no debugging symbols are available.

(gdb) run
Starting program: /usr/sbin/mysqld
(no debugging symbols found)
(no debugging symbols found)
[Thread debugging using libthread_db enabled]
(no debugging symbols found)
(no debugging symbols found)
(no debugging symbols found)
(no debugging symbols found)
(no debugging symbols found)
(no debugging symbols found)
[New Thread 0x2b500afb55f0 (LWP 13051)]
[New Thread 0x41d1f940 (LWP 13054)]
(no debugging symbols found)
[Thread 0x41d1f940 (LWP 13054) exited]
(no debugging symbols found)
[New Thread 0x41d1f940 (LWP 13055)]
[New Thread 0x42720940 (LWP 13056)]
[New Thread 0x43121940 (LWP 13057)]
[New Thread 0x43b22940 (LWP 13058)]
[New Thread 0x44523940 (LWP 13059)]
[New Thread 0x44f24940 (LWP 13060)]
[Thread 0x44523940 (LWP 13059) exited]
[New Thread 0x45925940 (LWP 13061)]
[New Thread 0x46326940 (LWP 13062)]

Program received signal SIGSEGV, Segmentation fault.
0x0000000000688720 in push_warning ()
(gdb) bt
#0  0x0000000000688720 in push_warning ()
#1  0x00000000006889d9 in push_warning_printf ()
#2  0x000000000069777a in MYSQL_BIN_LOG::purge_logs_before_date ()
#3  0x0000000000609702 in ?? ()
#4  0x000000000060c279 in main ()

How to repeat:
Have binary logging enabled. Force kill (-9) mysqld. Attempt to restart it.

Suggested fix:
The workaround at this time is to simply delete all the binary logs manually. Server restarts just fine. Have not been able to verify if data is missing.

this reminds me of and old bug #37027

Please, send your my.cnf file content.

Upgraded to latest Cluster RPMs. 5.1.35-ndb-7.0.7. Issue remains.

I read up on that other closed bug that was similar to this one. Yes, I have expire-log-days = 14 in my.cnf and I am able to reproduce the error by following bug #37027.

What I notice, is that if mysqld doesn't have to actually purge anything from a missing binlog file, then you just get a "091009 15:16:53 [Note] Failed to execute my_stat on file '/var/lib/mysql/binlog/binlog.000001'" and the server starts fine.

But if mysqld determines that it needs to purge based on your expire-logs-days setting, and the binlog doesn't exist, it will crash:

091009 15:18:02 mysqld_safe Starting mysqld daemon with databases from /var/lib/mysql/data
091009 15:18:03  InnoDB: Started; log sequence number 0 507064
091009 15:18:03 [Note] NDB: NodeID is 4, management server '15000PUTL01:1186'
091009 15:18:03 [Note] NDB[0]: NodeID: 4, all storage nodes connected
091009 15:18:03 [Note] Starting Cluster Binlog Thread
091009 15:18:03 [Note] Failed to execute my_stat on file '/var/lib/mysql/binlog/binlog.000001'
091009 15:18:03 - mysqld got signal 11 ;
This could be because you hit a bug. It is also possible that this binary
or one of the libraries it was linked against is corrupt, improperly built,
or misconfigured. This error can also be caused by malfunctioning hardware.
We will try our best to scrape up some info that will hopefully help diagnose
the problem, but since we have already crashed, something is definitely wrong
and this may fail.

key_buffer_size=536870912
read_buffer_size=1048576
max_used_connections=0
max_threads=300
threads_connected=0
It is possible that mysqld could use up to
key_buffer_size + (read_buffer_size + sort_buffer_size)*max_threads = 1448953 K
bytes of memory
Hope that's ok; if not, decrease some variables in the equation.

thd: 0x0
Attempting backtrace. You can use the following information to find out
where mysqld died. If you see no messages after this, something went
terribly wrong...
stack_bottom = (nil) thread_stack 0x40000
/usr/sbin/mysqld(my_print_stacktrace+0x33)[0x909be3]
/usr/sbin/mysqld(handle_segfault+0x322)[0x616932]
/lib64/libpthread.so.0[0x3ebfc0e4c0]
/usr/sbin/mysqld[0x79aea9]
/usr/sbin/mysqld[0x6eff4e]
/usr/sbin/mysqld(_Z26ha_binlog_index_purge_fileP3THDPKc+0x1c)[0x6f1d2c]
/usr/sbin/mysqld(_ZN13MYSQL_BIN_LOG10purge_logsEPKcbbbPy+0x3cc)[0x6ae20c]
/usr/sbin/mysqld(_ZN13MYSQL_BIN_LOG22purge_logs_before_dateEl+0x18b)[0x6ae7ab]
/usr/sbin/mysqld[0x618cbd]
/usr/sbin/mysqld(main+0x6ea)[0x61b12a]
/lib64/libc.so.6(__libc_start_main+0xf4)[0x3ebf01d974]
/usr/sbin/mysqld[0x565c19]
The manual page at http://dev.mysql.com/doc/mysql/en/crashing.html contains
information that should help you find out what is causing the crash.
091009 15:18:03 mysqld_safe mysqld from pid file /var/lib/mysql/data/15000dmys02.pid ended

The original hang is caused by bug #49536.  This will be fixed in 7.0.10.

Matthew,

could you please try version 7.0.10 to see if it solved the problem.

No feedback was provided for this bug for over a month, so it is
being suspended automatically. If you are able to provide the
information that was originally requested, please do so and change
the status of the bug back to "Open".