Bug #43075 rpl.rpl_sync fails sporadically on pushbuild
Submitted: 21 Feb 2009 11:38 Modified: 12 Nov 2009 12:08
Reporter: Alfranio Junior
Status: Closed
Category:Tests: Replication Severity:S3 (Non-critical)
Version:6.0-rpl OS:Any
Assigned to: Alfranio Junior CPU Architecture:Any
Tags: crash, pushbuild, sporadically

[21 Feb 2009 11:38] Alfranio Junior
Description:
rpl.rpl_sync fails sporadically on pushbuild:

090221  6:47:07 [Note] Slave I/O thread killed while reading event
090221  6:47:07 [Note] Slave I/O thread exiting, read up to log 'master-bin.000001', position 1332
090221  6:47:07 [Note] Error reading relay log event: slave SQL thread was killed
090221  6:47:07 - mysqld got signal 11 ;
This could be because you hit a bug. It is also possible that this binary
or one of the libraries it was linked against is corrupt, improperly built,
or misconfigured. This error can also be caused by malfunctioning hardware.
We will try our best to scrape up some info that will hopefully help diagnose
the problem, but since we have already crashed, something is definitely wrong
and this may fail.

key_buffer_size=1048576
read_buffer_size=131072
max_used_connections=1
max_threads=151
thread_count=1
connection_count=1
It is possible that mysqld could use up to 
key_buffer_size + (read_buffer_size + sort_buffer_size)*max_threads = 60685 K
bytes of memory
Hope that's ok; if not, decrease some variables in the equation.

thd: 0x0
Attempting backtrace. You can use the following information to find out
where mysqld died. If you see no messages after this, something went
terribly wrong...

How to repeat:
https://intranet.mysql.com/secure/pushbuild/showpush.pl?dir=bzr_mysql-6.0-rpl&order=178 [sapsrv1, n_mix]
[5 Mar 2009 9:22] Bugs System
A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/68324

2820 Alfranio Correia	2009-03-05
      BUG#43075 rpl.rpl_sync fails sporadically on pushbuild
      
      The slave was crashing when the init_slave() function failed.
      The issue stems from two different causes:
      
      1 - A failure before allocating the master info structure,
      which caused a segfault when the NULL structure was accessed.
      
      2 - A failure after allocating the master info structure,
      which caused a segfault due to a non-initialized relay log file.
      
      The patch tests if the master info structure and relay log file are allocated
      before accessing them.
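
The two guards described above can be sketched roughly as follows. This is a minimal illustration, not the committed patch: MasterInfo, RelayLogInfo, relay_log_open and cleanup_after_failed_init are hypothetical stand-ins invented for this example, and the real server structures are considerably larger.

```cpp
#include <cassert>

// Hypothetical stand-ins for the server's structures (assumption, not the
// real definitions from the MySQL source tree).
struct RelayLogInfo {
  bool relay_log_open = false;  // true once the relay log file is usable
};

struct MasterInfo {
  RelayLogInfo rli;
};

// Hypothetical cleanup helper. Returns the number of resources released.
// It tolerates both failure modes listed in the commit message:
//   1. mi == nullptr: the failure happened before the structure existed.
//   2. mi exists but its relay log was never initialized.
int cleanup_after_failed_init(MasterInfo *mi) {
  if (mi == nullptr)
    return 0;  // nothing was allocated, so nothing may be dereferenced

  int released = 0;
  if (mi->rli.relay_log_open) {  // only close a relay log that was opened
    mi->rli.relay_log_open = false;
    ++released;
  }
  delete mi;
  ++released;
  return released;
}
```

With these definitions, a call such as assert(cleanup_after_failed_init(nullptr) == 0); holds, whereas an unguarded version would dereference the NULL pointer and crash in the same way as the reported signal 11.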
[11 Mar 2009 22:35] Bugs System
A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/68970

2820 Alfranio Correia	2009-03-11
      BUG#43075 rpl.rpl_sync fails sporadically on pushbuild
      
      The slave was crashing while failing to execute the init_slave() function.
      The issue stems from two different reasons:
      
      1 - A failure while allocating the master info structure generated a
      segfault due to a NULL pointer.
      
      2 - A failure while recovering generated a segfault due to a non-initialized
      relay log file. In other words, the mi->init and rli->init were both set to true
      before executing the recovery process thus creating an inconsistent state as the
      relay log file was not initialized.
      
      To circumvent such problems, we now verify that active_mi is not NULL before
      accessing it. We also refactored the recovery process, which is now executed while
      initializing the relay log. Any error is propagated, so mi->init and rli->init
      are not set to true when the relay log is not initialized.
      The changes related to the refactoring are described below:
      
      1 - Removed call to init_recovery from init_slave.
      
      2 - Changed the signature of the function init_recovery.
      
      3 - Removed flushes. They are called while initializing the relay log and master info.
[12 Mar 2009 11:18] Bugs System
A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/69027

2821 Alfranio Correia	2009-03-12
      BUG#43075 rpl.rpl_sync fails sporadically on pushbuild
      
      The slave was crashing while failing to execute the init_slave() function.
      
      The issue stems from two different reasons:
      
      1 - A failure while allocating the master info structure generated a
          segfault due to a NULL pointer.
      
      2 - A failure while recovering generated a segfault due to a non-initialized
          relay log file. In other words, the mi->init and rli->init were both set to true
          before executing the recovery process thus creating an inconsistent state as the
          relay log file was not initialized.
      
      To circumvent such problems, we refactored the recovery process, which is now executed while
      initializing the relay log. The master info structure is guaranteed to be created
      before it is accessed, and any error is propagated, so mi->init and rli->init are
      not set to true when, for instance, the relay log is not initialized or the relay info
      is not flushed.
      
      The changes related to the refactoring are described below:
      
      1 - Removed call to init_recovery from init_slave.
      
      2 - Changed the signature of the function init_recovery.
      
      3 - Removed flushes. They are called while initializing the relay log and master info.
      
      4 - Made sure that if the relay info is not flushed, mi->init and rli->init are not
      set to true.
      
      In this patch, we also replaced the exit(1) in the fault injection by DBUG_ABORT() to
      make it compliant with the code guidelines.
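
The error propagation described in this commit message can be sketched as follows. This is an illustration under stated assumptions rather than the actual patch: the struct layouts, the Faults switches, and the functions init_relay_log_sketch and init_master_info_sketch are all hypothetical names made up for this example. Only the field names init and the ordering (open the relay log, run recovery, flush, then mark as initialized) come from the text above.

```cpp
#include <cassert>

// Hypothetical stand-ins; the flag name "init" follows the commit message.
struct RelayLogInfo {
  bool init = false;
  bool relay_log_open = false;
};

struct MasterInfo {
  bool init = false;
  RelayLogInfo rli;
};

// Hypothetical switches that simulate a failure at each step.
struct Faults {
  bool open_relay_log_fails = false;
  bool recovery_fails = false;
  bool flush_fails = false;
};

// Each step returns 0 on success and nonzero on error.
static int open_relay_log(RelayLogInfo &rli, const Faults &f) {
  if (f.open_relay_log_fails) return 1;
  rli.relay_log_open = true;
  return 0;
}

static int run_recovery(RelayLogInfo &rli, const Faults &f) {
  assert(rli.relay_log_open);  // recovery only runs on an opened relay log
  return f.recovery_fails ? 1 : 0;
}

static int flush_relay_info(RelayLogInfo &, const Faults &f) {
  return f.flush_fails ? 1 : 0;
}

// Recovery runs as part of relay log initialization. The init flag is set
// only after every step has succeeded.
int init_relay_log_sketch(RelayLogInfo &rli, const Faults &f) {
  if (open_relay_log(rli, f) || run_recovery(rli, f) ||
      flush_relay_info(rli, f)) {
    rli.relay_log_open = false;  // leave no half-initialized state behind
    return 1;                    // propagate the error to the caller
  }
  rli.init = true;
  return 0;
}

// The master info flag is set only if the relay log side succeeded.
int init_master_info_sketch(MasterInfo &mi, const Faults &f) {
  if (init_relay_log_sketch(mi.rli, f)) return 1;
  mi.init = true;
  return 0;
}
```

In this sketch a failure in any step leaves both init flags false, so later code that checks the flags never touches a relay log that was not initialized.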
[24 Mar 2009 17:20] Bugs System
Pushed into 6.0.11-alpha (revid:alik@sun.com-20090324171507-s5aac9guj21l0jz6) (version source revid:alfranio.correia@sun.com-20090316115905-t2hij3yy7j0w05iz) (merge vers: 6.0.11-alpha) (pib:6)
[26 Mar 2009 16:59] Jon Stephens
Documented bugfix in the 6.0.11 changelog as follows:

        This fix handles 2 issues encountered on replication slaves during
        startup:

            1. A failure while allocating the master info structure caused 
            the slave to crash.

            2. A failure during recovery caused the relay log file not to 
            be properly initialized which led to a crash on the slave.

Closed.
[30 Sep 2009 21:42] Bugs System
A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/85276

3124 Alfranio Correia	2009-09-30
      BUG#43075 rpl.rpl_sync fails sporadically on pushbuild
      
      NOTE: Backporting the patch to next-mr.
            
      The slave was crashing while failing to execute the init_slave() function.
            
      The issue stems from two different reasons:
            
      1 - A failure while allocating the master info structure generated a
          segfault due to a NULL pointer.
            
      2 - A failure while recovering generated a segfault due to a non-initialized
          relay log file. In other words, the mi->init and rli->init were both set to true
          before executing the recovery process thus creating an inconsistent state as the
          relay log file was not initialized.
            
      To circumvent such problems, we refactored the recovery process, which is now executed
      while initializing the relay log. The master info structure is guaranteed to be
      created before it is accessed, and any error is propagated, so mi->init and
      rli->init are not set to true when, for instance, the relay log is not initialized
      or the relay info is not flushed.
            
      The changes related to the refactoring are described below:
            
      1 - Removed call to init_recovery from init_slave.
            
      2 - Changed the signature of the function init_recovery.
            
      3 - Removed flushes. They are called while initializing the relay log and master
          info.
            
      4 - Made sure that if the relay info is not flushed, mi->init and rli->init are not
          set to true.
            
      In this patch, we also replaced the exit(1) in the fault injection by DBUG_ABORT()
      to make it compliant with the code guidelines.
[27 Oct 2009 9:48] Bugs System
Pushed into 6.0.14-alpha (revid:alik@sun.com-20091027094604-9p7kplu1vd2cvcju) (version source revid:zhenxing.he@sun.com-20091026140226-uhnqejkyqx1aeilc) (merge vers: 6.0.14-alpha) (pib:13)
[27 Oct 2009 17:58] Jon Stephens
Already documented in 6.0.11 changelog; closed w/o further action.
[12 Nov 2009 8:16] Bugs System
Pushed into 5.5.0-beta (revid:alik@sun.com-20091110093229-0bh5hix780cyeicl) (version source revid:alik@sun.com-20091027095744-rf45u3x3q5d1f5y0) (merge vers: 5.5.0-beta) (pib:13)
[12 Nov 2009 12:08] Jon Stephens
Also added changelog entry in 5.5.0 changelog; closed.