MySQL Bugs: #41751: expire_logs_days causes signal 11 when an SQL node starts

Bug #41751	expire_logs_days causes signal 11 when an SQL node starts
Submitted:	26 Dec 2008 3:38	Modified:	18 Aug 2010 17:32
Reporter:	Mikiya Okuno
Status:	Closed
Category:	MySQL Cluster: Replication	Severity:	S1 (Critical)
Version:	mysql-5.1-telco-6.3	OS:	Linux
Assigned to:	Jonas Oreland	CPU Architecture:	Any
Tags:	mysql-5.1-telco-6.3.20

Description:
A MySQL Server configured as an SQL node of a MySQL Cluster system (in other words, a server started with the --ndbcluster option) crashes with signal 11 during startup if expire_logs_days is specified and there are old binary logs to delete. When this happens, the oldest binary log is deleted. The following snippet from the error log contains the stack trace.

081226 12:14:55 mysqld_safe Starting mysqld daemon with databases from /var/lib/telco-6.3/sql1
081226 12:14:55 [Warning] The syntax '--log_slow_queries' is deprecated and will be removed in MySQL 7.0. Please use '--slow_query_log'/'--slow_query_log_file' instead.
081226 12:14:55 [Note] NDB: NodeID is 15, management server '127.0.0.1:1186'
081226 12:14:55 [Note] NDB[0]: NodeID: 15, all storage nodes connected
081226 12:14:55 [Note] Starting Cluster Binlog Thread
081226 12:14:55 [Note] Failed to execute my_stat on file './mysql-clust-bin.000003'
081226 12:14:55 - mysqld got signal 11 ;
This could be because you hit a bug. It is also possible that this binary
or one of the libraries it was linked against is corrupt, improperly built,
or misconfigured. This error can also be caused by malfunctioning hardware.
We will try our best to scrape up some info that will hopefully help diagnose
the problem, but since we have already crashed, something is definitely wrong
and this may fail.

key_buffer_size=8388608
read_buffer_size=131072
max_used_connections=0
max_threads=300
threads_connected=0
It is possible that mysqld could use up to
key_buffer_size + (read_buffer_size + sort_buffer_size)*max_threads = 664055 K
bytes of memory
Hope that's ok; if not, decrease some variables in the equation.

thd: 0x0
Attempting backtrace. You can use the following information to find out
where mysqld died. If you see no messages after this, something went
terribly wrong...
stack_bottom = (nil) thread_stack 0x40000
/usr/local/telco-6.3/bin/mysqld(my_print_stacktrace+0x29) [0x882a09]
/usr/local/telco-6.3/bin/mysqld(handle_segfault+0x322) [0x611cf2]
/lib/libpthread.so.0 [0x7f7a54b2e0f0]
/usr/local/telco-6.3/bin/mysqld [0x7acc20]
/usr/local/telco-6.3/bin/mysqld [0x7ae6d8]
/usr/local/telco-6.3/bin/mysqld [0x6fbcdf]
/usr/local/telco-6.3/bin/mysqld(ha_binlog_index_purge_file(THD*, char const*)+0x24) [0x6fbde4]
/usr/local/telco-6.3/bin/mysqld(MYSQL_BIN_LOG::purge_logs_before_date(long)+0x1ca) [0x6ab65a]
/usr/local/telco-6.3/bin/mysqld [0x61292d]
/usr/local/telco-6.3/bin/mysqld(main+0x1d1) [0x6166d1]
/lib/libc.so.6(__libc_start_main+0xe6) [0x7f7a539db466]
/usr/local/telco-6.3/bin/mysqld(fmod+0x6a) [0x5489ca]
The manual page at http://dev.mysql.com/doc/mysql/en/crashing.html contains
information that should help you find out what is causing the crash.
081226 12:14:55 mysqld_safe mysqld from pid file /var/lib/telco-6.3/sql1/mysql.pid ended

How to repeat:
0. Configure MySQL Cluster with binary logging enabled.
1. Ensure that one of the SQL nodes has old binary logs (e.g. 10 days old).
2. Shut down the SQL node.
3. Edit my.cnf and specify, e.g., 'expire_logs_days=3'.
4. Start the server.

The server then crashes.

Suggested fix:
N/A

Reminds me of bug #37027!

Bug #44693 and bug #20408 were marked as duplicates of this one.

Still affects mysql-cluster-gpl-7.0.6-linux-i686-glibc23.

Still affects mysql-cluster-gpl-7.0.9-linux-i686-glibc23.tar.gz as well.
It crashes in ha_binlog_index_purge_file() at line 4007 because thd is NULL:

4006      binlog_func_foreach(thd, &bfn);
4007      if (thd->main_da.is_error())
4008        return 1;

Proposed patch on top of 6.3-bzr 2010-03-23

Attachment: bug41751.patch (application/octet-stream, text), 2.04 KiB.

Hi guys

Here is a rather ugly attempt to fix the problem.
Making it not crash is "easy";
however, if it also needs to work,
this trickery is needed.

Can someone other than me please test whether this seems to work?

/Jonas

Attempt 2, 6.3 2010-03-24

Attachment: bug41751.patch.v2 (application/octet-stream, text), 3.80 KiB.

How to get old binary logs:

Rotate the binary log a couple of times:
  mysql> FLUSH LOGS;

Then, from the OS shell, give the rotated log files a modification time in the past:
  shell> touch -d "2010-03-01" <binary log files>

Tried patch.v2 with the latest 7.1.4b: --expire-logs-days now works, i.e. mysqld does not crash and the binary logs are removed.

version 3

Attachment: bug41751.patch.v3 (application/octet-stream, text), 4.17 KiB.

A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/116064

3255 Jonas Oreland	2010-08-18
      ndb - bug#41751 - handle binlog purge during startup by temporarily created THD object (yuck)

Pushed into mysql-5.1-telco-6.3 5.1.47-ndb-6.3.37 (revid:jonas@mysql.com-20100818095910-9oq1gvlzgf5eie0l) (version source revid:jonas@mysql.com-20100818095910-9oq1gvlzgf5eie0l) (merge vers: 5.1.47-ndb-6.3.37) (pib:20)

Pushed into mysql-5.1-telco-7.0 5.1.47-ndb-7.0.18 (revid:jonas@mysql.com-20100818102121-bgdwgdwo9pdfltvh) (version source revid:jonas@mysql.com-20100818102121-bgdwgdwo9pdfltvh) (merge vers: 5.1.47-ndb-7.0.18) (pib:20)

Pushed to 6.3.37, 7.0.18, and 7.1.7.

Documented as follows in the NDB-6.3.37, 7.0.18, and 7.1.7 changelogs:

      Specifying the --expire_logs_days option when there were old 
      binary logs to delete caused SQL nodes to crash on startup.

Closed.

Workaround:

Remove the expire_logs_days variable from the configuration file and set it at runtime with SET instead:

SET GLOBAL expire_logs_days=N;

Even though the symptoms are completely different, http://bugs.mysql.com/bug.php?id=20408 is marked as a duplicate of this one, so I thought I'd add a note that this still seems to happen to me in 5.1.56-ndb-7.1.18-cluster-gpl-log.

I posted details in http://bugs.mysql.com/bug.php?id=20408