Bug #39323 slave sql thread crashes when reading relay log
Submitted: 8 Sep 2008 17:25 Modified: 11 Sep 2008 16:44
Reporter: Jochen Wersdörfer Email Updates:
Status: Not a Bug Impact on me:
None 
Category:MySQL Server: Replication Severity:S3 (Non-critical)
Version:5.0.51a-12 OS:Linux (Debian GNU/Linux lenny/sid)
Assigned to: CPU Architecture:Any

[8 Sep 2008 17:25] Jochen Wersdörfer
Description:
Master and slave are both 64bit opterons running the latest debian mysql-version (5.0.51a-12). The slave-thread stopped sometimes in the past with a syntax error at the end of a relay-binlog file, but this time it crashed with a segfault - so, maybe someone should take a deeper look at this.

Replication worked fine for me on other databases - the only difference i'm aware of is that some tables on the crashed system contain pickled python datastructures in blob-attributes.

Here are some lines from the syslog:

Sep  8 17:06:09 nathapael mysqld[17298]: 080908 17:06:09 [Note] Slave SQL thread initialized, starting replication in log 'mysqld-bin.000020' at position 12360857, relay log './mysqld-relay-bin.000096' position: 112362
Sep  8 17:06:09 nathapael mysqld[17298]: 080908 17:06:09 - mysqld got signal 11;
Sep  8 17:06:09 nathapael mysqld[17298]: This could be because you hit a bug. It is also possible that this binary
Sep  8 17:06:09 nathapael mysqld[17298]: or one of the libraries it was linked against is corrupt, improperly built,
Sep  8 17:06:09 nathapael mysqld[17298]: or misconfigured. This error can also be caused by malfunctioning hardware.
Sep  8 17:06:09 nathapael mysqld[17298]: 080908 17:06:09 [Note] Slave I/O thread: connected to master 'repl@172.16.100.16:3306',  replication started in log 'mysqld-bin.000072' at position 79823801
Sep  8 17:06:09 nathapael mysqld[17298]: We will try our best to scrape up some info that will hopefully help diagnose
Sep  8 17:06:09 nathapael mysqld[17298]: the problem, but since we have already crashed, something is definitely wrong
Sep  8 17:06:09 nathapael mysqld[17298]: and this may fail.
Sep  8 17:06:09 nathapael mysqld[17298]: 
Sep  8 17:06:09 nathapael mysqld[17298]: key_buffer_size=209715200
Sep  8 17:06:09 nathapael mysqld[17298]: read_buffer_size=2093056
Sep  8 17:06:09 nathapael mysqld[17298]: max_used_connections=1
Sep  8 17:06:09 nathapael mysqld[17298]: max_connections=500
Sep  8 17:06:09 nathapael mysqld[17298]: threads_connected=1
Sep  8 17:06:09 nathapael mysqld[17298]: It is possible that mysqld could use up to 
Sep  8 17:06:09 nathapael mysqld[17298]: key_buffer_size + (read_buffer_size + sort_buffer_size)*max_connections = 2250796 K
Sep  8 17:06:09 nathapael mysqld[17298]: bytes of memory
Sep  8 17:06:09 nathapael mysqld[17298]: Hope that's ok; if not, decrease some variables in the equation.
Sep  8 17:06:09 nathapael mysqld[17298]: 
Sep  8 17:06:09 nathapael mysqld[17298]: thd=0x29114ad0
Sep  8 17:06:09 nathapael mysqld[17298]: Attempting backtrace. You can use the following information to find out
Sep  8 17:06:09 nathapael mysqld[17298]: where mysqld died. If you see no messages after this, something went
Sep  8 17:06:09 nathapael mysqld[17298]: terribly wrong...
Sep  8 17:06:09 nathapael mysqld[17298]: Cannot determine thread, fp=0x29114ad0, backtrace may not be correct.
Sep  8 17:06:09 nathapael mysqld[17298]: Bogus stack limit or frame pointer, fp=0x29114ad0, stack_bottom=0x40340000, thread_stack=262144, aborting backtrace.
Sep  8 17:06:09 nathapael mysqld[17298]: Trying to get some variables.
Sep  8 17:06:09 nathapael mysqld[17298]: Some pointers may be invalid and cause the dump to abort...
Sep  8 17:06:09 nathapael mysqld[17298]: thd->query at (nil)  is invalid pointer
Sep  8 17:06:09 nathapael mysqld[17298]: thd->thread_id=9
Sep  8 17:06:09 nathapael mysqld[17298]: The manual page at http://www.mysql.com/doc/en/Crashing.html contains
Sep  8 17:06:09 nathapael mysqld[17298]: information that should help you find out what is causing the crash.
Sep  8 17:06:09 nathapael mysqld_safe[17392]: Number of processes running now: 0
Sep  8 17:06:09 nathapael mysqld_safe[17394]: restarted
...

Mysqlbinlog dies too on the binlog, if it's not compiled with --with-debug:

nathapael:/home/jw# /usr/bin/mysqlbinlog --start-position 112335 mysqld-relay-bin.000096 
/*!40019 SET @@session.max_insert_delayed_threads=0*/;
/*!50003 SET @OLD_COMPLETION_TYPE=@@COMPLETION_TYPE,COMPLETION_TYPE=0*/;
DELIMITER /*!*/;
# at 112335
#080820 11:48:04 server id 12345  end_log_pos 12360857  Xid = 48005493
COMMIT/*!*/;
Speicherzugriffsfehler

When compiled with --with-debug, there's an error message:

nathapael:/home/jw# /usr/local/bin/mysqlbinlog --start-position 112335 mysqld-relay-bin.000096 
/*!40019 SET @@session.max_insert_delayed_threads=0*/;
/*!50003 SET @OLD_COMPLETION_TYPE=@@COMPLETION_TYPE,COMPLETION_TYPE=0*/;
DELIMITER /*!*/;
# at 112335
#080820 11:48:04 server id 12345  end_log_pos 12360857  Xid = 48005493
COMMIT/*!*/;
ERROR: Error in Log_event::read_log_event(): 'Found invalid event in binary log', data_len: 72, event_type: 2
Could not read entry at offset 112362:Error in log format or read error
DELIMITER ;
# End of log file
ROLLBACK /* added by mysqlbinlog */;
/*!50003 SET COMPLETION_TYPE=@OLD_COMPLETION_TYPE*/;

Maybe this is a duplicate of #38718, but i tend to doubt it, because the --with-debug compiled version segfaults too on the attached binlog of #38718.

How to repeat:
Since i cannot disclose the complete binlog, here are the bytes after offset 112362 (hoping this is enough):

>>> y[112362:]
'T\xe8\xabH\x0290\x00\x00H\x00\x00\x00\xe1\x9c\xbc\x00\x00\x00\x85N\x0e\x00\x00\x00\x00\x00\x08\x00\x00\x1a0 CI Canon IR 6800 C Canon IR 6800 CN Canh"\xacH\x04\x9d\x86\x01\x002\x00\x00\x00d\xb7\x01\x00\x00\x00\x04\x00\x00\x00\x00\x00\x00\x00mysqld-relay-bin.000097'
[9 Sep 2008 11:46] Susanne Ebrecht
Many thanks for writing a bug report.

Unfortunately there are some replication bugs at the packages made by Debian. So we have to figure out first, if this bug is related to us or to Debian. Therefor we need to know how you install MySQL. Did you download the source from our webpage and then compile it or did you use the Debian package and install by using apt-get or aptitude?
[9 Sep 2008 17:38] Jochen Wersdörfer
I used the precompiled package from Debian. Just a minute ago i tried with a self-compiled version of 5.0.67 and now there's only a parse error instead of a crash:

mysql> show slave status\G
*************************** 1. row ***************************
             Slave_IO_State: Queueing master event to the relay log
                Master_Host: 172.16.100.16
                Master_User: repl
                Master_Port: 3306
              Connect_Retry: 60
            Master_Log_File: mysqld-bin.000072
        Read_Master_Log_Pos: 129561789
             Relay_Log_File: mysqld-relay-bin.000096
              Relay_Log_Pos: 112362
      Relay_Master_Log_File: mysqld-bin.000020
           Slave_IO_Running: Yes
          Slave_SQL_Running: No
            Replicate_Do_DB: foobar
        Replicate_Ignore_DB: 
         Replicate_Do_Table: 
     Replicate_Ignore_Table: 
    Replicate_Wild_Do_Table: 
Replicate_Wild_Ignore_Table: 
                 Last_Errno: 0
                 Last_Error: Could not parse relay log event entry. The possible reasons are: the master's binary log is corrupted (you can check this by running 'mysqlbinlog' on the binary log), the slave's relay log is corrupted (you can check this by running 'mysqlbinlog' on the relay log), a network problem, or a bug in the master's or slave's MySQL code. If you want to check the master's binary log or slave's relay log, you will be able to know their names by issuing 'SHOW SLAVE STATUS' on this slave.
               Skip_Counter: 0
        Exec_Master_Log_Pos: 12360857
            Relay_Log_Space: 60868833367
            Until_Condition: None
             Until_Log_File: 
              Until_Log_Pos: 0
         Master_SSL_Allowed: No
         Master_SSL_CA_File: 
         Master_SSL_CA_Path: 
            Master_SSL_Cert: 
          Master_SSL_Cipher: 
             Master_SSL_Key: 
      Seconds_Behind_Master: NULL

So, it seems to be a debian-issue, tomorrow i'll try self-compiled mysql-versions on master and slaves, maybe this is enough to get rid of this replication-problems - thanks a lot.
[11 Sep 2008 16:44] Jochen Wersdörfer
Ok, now i tested with self-compiled 5.0.67 on master and slave and now replication works without problems.
[1 Oct 2008 20:29] MySQL Verification Team
i suspect this could have been due to bug #26489 caused by master sending wrong packets to the slave in the event of network failures..