Bug #41751 expire_logs_days causes signal 11 when an SQL node starts
Submitted: 26 Dec 2008 3:38 Modified: 18 Aug 2010 17:32
Reporter: Mikiya Okuno
Status: Closed
Category: MySQL Cluster: Replication
Severity: S1 (Critical)
Version: mysql-5.1-telco-6.3
OS: Linux
Assigned to: Jonas Oreland
CPU Architecture: Any
Tags: mysql-5.1-telco-6.3.20

[26 Dec 2008 3:38] Mikiya Okuno
Description:
A MySQL Server configured as an SQL node in a MySQL Cluster (in other words, a server started with the --ndbcluster option) crashes with signal 11 during startup if expire_logs_days is set and there are old binary logs to delete. By the time it crashes, the oldest binary log has already been deleted. The following snippet from the error log contains the stack trace.

081226 12:14:55 mysqld_safe Starting mysqld daemon with databases from /var/lib/telco-6.3/sql1
081226 12:14:55 [Warning] The syntax '--log_slow_queries' is deprecated and will be removed in MySQL 7.0. Please use '--slow_query_log'/'--slow_query_log_file' instead.                                                                                                                            
081226 12:14:55 [Note] NDB: NodeID is 15, management server '127.0.0.1:1186'                                                                      
081226 12:14:55 [Note] NDB[0]: NodeID: 15, all storage nodes connected                                                                            
081226 12:14:55 [Note] Starting Cluster Binlog Thread                                                                                             
081226 12:14:55 [Note] Failed to execute my_stat on file './mysql-clust-bin.000003'                                                               
081226 12:14:55 - mysqld got signal 11 ;                                                                                                          
This could be because you hit a bug. It is also possible that this binary                                                                         
or one of the libraries it was linked against is corrupt, improperly built,                                                                       
or misconfigured. This error can also be caused by malfunctioning hardware.                                                                       
We will try our best to scrape up some info that will hopefully help diagnose                                                                     
the problem, but since we have already crashed, something is definitely wrong                                                                     
and this may fail.                                                                                                                                

key_buffer_size=8388608
read_buffer_size=131072
max_used_connections=0 
max_threads=300        
threads_connected=0    
It is possible that mysqld could use up to 
key_buffer_size + (read_buffer_size + sort_buffer_size)*max_threads = 664055 K
bytes of memory                                                               
Hope that's ok; if not, decrease some variables in the equation.              

thd: 0x0
Attempting backtrace. You can use the following information to find out
where mysqld died. If you see no messages after this, something went   
terribly wrong...                                                      
stack_bottom = (nil) thread_stack 0x40000                              
/usr/local/telco-6.3/bin/mysqld(my_print_stacktrace+0x29) [0x882a09]   
/usr/local/telco-6.3/bin/mysqld(handle_segfault+0x322) [0x611cf2]      
/lib/libpthread.so.0 [0x7f7a54b2e0f0]                                  
/usr/local/telco-6.3/bin/mysqld [0x7acc20]                             
/usr/local/telco-6.3/bin/mysqld [0x7ae6d8]                             
/usr/local/telco-6.3/bin/mysqld [0x6fbcdf]
/usr/local/telco-6.3/bin/mysqld(ha_binlog_index_purge_file(THD*, char const*)+0x24) [0x6fbde4]
/usr/local/telco-6.3/bin/mysqld(MYSQL_BIN_LOG::purge_logs_before_date(long)+0x1ca) [0x6ab65a]
/usr/local/telco-6.3/bin/mysqld [0x61292d]
/usr/local/telco-6.3/bin/mysqld(main+0x1d1) [0x6166d1]
/lib/libc.so.6(__libc_start_main+0xe6) [0x7f7a539db466]
/usr/local/telco-6.3/bin/mysqld(fmod+0x6a) [0x5489ca]
The manual page at http://dev.mysql.com/doc/mysql/en/crashing.html contains
information that should help you find out what is causing the crash.
081226 12:14:55 mysqld_safe mysqld from pid file /var/lib/telco-6.3/sql1/mysql.pid ended

How to repeat:
0. Configure MySQL Cluster with binary logging enabled.
1. Ensure that one of the SQL nodes has old binary logs (e.g. 10 days old).
2. Shut down that SQL node.
3. Edit my.cnf and set, e.g., 'expire_logs_days=3'.
4. Start the server.

Then mysqld crashes with signal 11!!

Suggested fix:
N/A
[23 Apr 2009 14:15] MySQL Verification Team
Reminds me of bug #37027!
[16 Dec 2009 10:00] Sveta Smirnova
Bug #44693 and bug #20408 were marked as duplicates of this one.
[23 Dec 2009 9:55] MySQL Verification Team
still affects mysql-cluster-gpl-7.0.6-linux-i686-glibc23
[23 Dec 2009 10:24] MySQL Verification Team
Also still affects mysql-cluster-gpl-7.0.9-linux-i686-glibc23.tar.gz.
It crashes in function ha_binlog_index_purge_file() on line 4007 because thd is NULL:

4006      binlog_func_foreach(thd, &bfn);
4007      if (thd->main_da.is_error())
4008        return 1;
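To make the failure mode concrete, here is a minimal C++ sketch of the pattern: code that reads `thd->main_da` without checking whether a THD exists, plus the "easy" fix of returning early when `thd` is NULL. `DiagnosticsArea`, the mock `THD`, `binlog_func_foreach_stub()` and `purge_file_guarded()` are hypothetical stand-ins, not the server's real definitions, and this is not any of the patches attached to this bug.

```cpp
#include <cassert>
#include <cstdio>

// Hypothetical stand-ins for the server's THD and its diagnostics area.
// These are simplified mocks, NOT the real MySQL class definitions.
struct DiagnosticsArea {
  bool error = false;
  bool is_error() const { return error; }
};

struct THD {
  DiagnosticsArea main_da;
};

// Hypothetical stand-in for binlog_func_foreach(). In the server it forwards
// the purge request to each storage engine's binlog hook; here it only
// counts how often it ran.
static int hooks_called = 0;
void binlog_func_foreach_stub(THD* thd) {
  assert(thd != nullptr);  // engine hooks expect a session object as well
  ++hooks_called;
}

// Simplified shape of ha_binlog_index_purge_file() with the "easy" guard.
// At startup thd is NULL, so an unguarded `thd->main_da.is_error()` read
// dereferences a null pointer (signal 11). Returning early avoids the crash,
// but the engine hooks are then never told that the file was purged.
int purge_file_guarded(THD* thd, const char* file) {
  if (thd == nullptr) {
    std::fprintf(stderr, "no THD: skipping engine purge hook for %s\n", file);
    return 0;
  }
  binlog_func_foreach_stub(thd);
  if (thd->main_da.is_error())
    return 1;
  return 0;
}
```

Calling `purge_file_guarded(nullptr, ...)`, which is effectively what happens during startup, no longer dereferences a null pointer, but it also silently skips the engine hooks, so the engine is never notified of the purge.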
[23 Mar 2010 13:53] Jonas Oreland
Proposed patch on top of 6.3-bzr 2010-03-23

Attachment: bug41751.patch (application/octet-stream, text), 2.04 KiB.

[23 Mar 2010 13:56] Jonas Oreland
Hi guys

Here is a rather ugly attempt to fix the problem.
Making it not crash is "easy";
however, if the purge also needs to actually work,
this trickery is needed.

Can someone other than me please test whether this works?

/Jonas
[24 Mar 2010 12:18] Jonas Oreland
Attempt 2, 6.3 2010-03-24

Attachment: bug41751.patch.v2 (application/octet-stream, text), 3.80 KiB.

[24 Mar 2010 12:39] Geert Vanderkelen
How to get old binary logs:

Run a couple of flushes:
  mysql> FLUSH LOGS;

Then, from the OS shell, touch the binary log files with a date in the past:
  shell> touch -d "2010-03-01" <binary log files>
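The trick works because expiry is driven by file modification times: the server stats each old log (hence the `my_stat` message in the error log above) and compares its mtime against a cutoff of `expire_logs_days` days before now. The C++17 sketch below models that with `std::filesystem`, both for backdating files the way `touch -d` does and for picking out the expired ones. `touch_days_ago()` and `expired_logs()` are hypothetical helpers, not server code; the server's own loop differs (in the 5.1 code it walks the binary log index file in order and never purges the log currently in use).

```cpp
#include <cassert>
#include <chrono>
#include <filesystem>
#include <fstream>
#include <string>
#include <vector>

namespace fs = std::filesystem;

// Hypothetical helper: create `file` if it is missing, then set its
// modification time to `days_ago` days in the past; roughly what
// `touch -d "<past date>" <file>` does from the shell.
void touch_days_ago(const fs::path& file, int days_ago) {
  if (!fs::exists(file)) {
    std::ofstream create(file);  // creates an empty file
  }
  fs::last_write_time(file, fs::file_time_type::clock::now() -
                                std::chrono::hours(24 * days_ago));
}

// Hypothetical helper: list files in `dir` whose names start with `prefix`
// and whose modification time is more than `expire_days` days old.
// A simplified model of an mtime-based expiry check, not MySQL's code.
std::vector<fs::path> expired_logs(const fs::path& dir,
                                   const std::string& prefix,
                                   int expire_days) {
  assert(expire_days > 0);
  const auto cutoff = fs::file_time_type::clock::now() -
                      std::chrono::hours(24 * expire_days);
  std::vector<fs::path> result;
  for (const auto& entry : fs::directory_iterator(dir)) {
    const std::string name = entry.path().filename().string();
    if (entry.is_regular_file() && name.rfind(prefix, 0) == 0 &&
        entry.last_write_time() < cutoff) {
      result.push_back(entry.path());
    }
  }
  return result;
}
```

For example, with three logs backdated by 10, 5 and 0 days, `expired_logs(dir, "mysql-bin.", 3)` returns the first two, which matches the "old binary logs to delete" condition from the repro steps.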
[15 Jun 2010 17:34] Geert Vanderkelen
Tried patch.v2 with latest 7.1.4b and --expire-logs-days works fine now, i.e. doesn't crash mysqld and removes the binary logs.
[18 Aug 2010 8:51] Jonas Oreland
version 3

Attachment: bug41751.patch.v3 (application/octet-stream, text), 4.17 KiB.

[18 Aug 2010 10:00] Bugs System
A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/116064

3255 Jonas Oreland	2010-08-18
      ndb - bug#41751 - handle binlog purge during startup by temporarily created THD object (yuck)
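The commit message describes the approach: when the purge runs during startup and no THD exists, create one temporarily. Below is a minimal C++ sketch of that idea, using `std::unique_ptr` so the temporary object is released on every return path. It is not the committed patch; the mock `THD`, `engine_purge_hook()` and `purge_file_with_temp_thd()` are hypothetical, and a real THD in 5.1 typically also needs thread-local setup (for example setting `thread_stack` and calling `store_globals()`), which this sketch leaves out.

```cpp
#include <cassert>
#include <memory>

// Hypothetical, heavily simplified stand-ins; NOT MySQL's real classes.
struct DiagnosticsArea {
  bool error = false;
  bool is_error() const { return error; }
};

struct THD {
  DiagnosticsArea main_da;
};

// Hypothetical engine hook that, like a real binlog purge hook, needs a
// valid session object; here it only counts how often it was called.
static int hook_calls = 0;
void engine_purge_hook(THD& thd, const char* file) {
  (void)thd;
  (void)file;
  ++hook_calls;
}

// Sketch of "handle binlog purge during startup by temporarily created THD
// object": if the caller has no THD, as during startup, create one for the
// duration of this call only; unique_ptr destroys it on every return path.
int purge_file_with_temp_thd(THD* thd, const char* file) {
  std::unique_ptr<THD> temp_thd;
  if (thd == nullptr) {
    temp_thd = std::make_unique<THD>();
    thd = temp_thd.get();
  }
  assert(thd != nullptr);
  engine_purge_hook(*thd, file);
  return thd->main_da.is_error() ? 1 : 0;
}
```

Compared with simply skipping the hook when `thd` is NULL (the "easy" fix mentioned in the 23 Mar 2010 comment), this keeps the engine hook running at startup, which is what lets the purge actually complete rather than merely avoid the crash.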
[18 Aug 2010 10:27] Bugs System
Pushed into mysql-5.1-telco-6.3 5.1.47-ndb-6.3.37 (revid:jonas@mysql.com-20100818095910-9oq1gvlzgf5eie0l) (version source revid:jonas@mysql.com-20100818095910-9oq1gvlzgf5eie0l) (merge vers: 5.1.47-ndb-6.3.37) (pib:20)
[18 Aug 2010 10:27] Bugs System
Pushed into mysql-5.1-telco-7.0 5.1.47-ndb-7.0.18 (revid:jonas@mysql.com-20100818102121-bgdwgdwo9pdfltvh) (version source revid:jonas@mysql.com-20100818102121-bgdwgdwo9pdfltvh) (merge vers: 5.1.47-ndb-7.0.18) (pib:20)
[18 Aug 2010 10:44] Jonas Oreland
pushed to 6.3.37, 7.0.18 and 7.1.7
[18 Aug 2010 17:32] Jon Stephens
Documented as follows in the NDB-6.3.37, 7.0.18, and 7.1.7 changelogs:

      Specifying the --expire_logs_days option when there were old 
      binary logs to delete caused SQL nodes to crash on startup.

Closed.
[27 Sep 2011 14:29] Santo Leto
Workaround: 

Remove the expire_logs_days setting from the configuration file and set it at runtime with SET instead:

SET GLOBAL expire_logs_days=N;
[23 Mar 2012 17:41] Micah Stevens
Even though the symptoms are completely different, http://bugs.mysql.com/bug.php?id=20408 is marked as a duplicate of this one, so I'm adding a note that this still seems to be happening to me in 5.1.56-ndb-7.1.18-cluster-gpl-log.

I posted details in http://bugs.mysql.com/bug.php?id=20408