Bug #38718 slave sql thread crashes when reading relay log
Submitted: 11 Aug 2008 13:19 Modified: 15 Nov 2010 11:55
Reporter: Shane Bester (Platinum Quality Contributor) Email Updates:
Status: Closed Impact on me:
None 
Category:MySQL Server: Replication Severity:S1 (Critical)
Version:5.0.66a, 5.1.28,5.1.46 OS:Any
Assigned to: Luis Soares CPU Architecture:Any

[11 Aug 2008 13:19] Shane Bester
Description:
during replication between 5.0.66a->5.0.66a, 
a slave got a crash when starting with the attached relay log

mysqld-nt.exe!Log_event::Log_event              Line 377
mysqld-nt.exe!Slave_log_event::Slave_log_event  Line 4311
mysqld-nt.exe!Log_event::read_log_event         Line 889
mysqld-nt.exe!Log_event::read_log_event         Line 828
mysqld-nt.exe!next_event                        Line 4989
mysqld-nt.exe!exec_relay_log_event              Line 3260
mysqld-nt.exe!handle_slave_sql                  Line 4030
mysqld-nt.exe!pthread_start

In Log_event::Log_event() we have:

when = uint4korr(buf);
  server_id = uint4korr(buf + SERVER_ID_OFFSET);
  data_written= uint4korr(buf + EVENT_LEN_OFFSET);
  if (description_event->binlog_version==1)   <-------crash
  {
    log_pos= 0;
    flags= 0;
    return;
  }

mysqlbinlog also crashes.

How to repeat:
I attach the relay log.  How the slave ended up with such a log is probably the topic of another bug report.  This bug concerns the gigo.
[11 Aug 2008 13:20] MySQL Verification Team
relay log to cause a crash

Attachment: xp64-relay-bin.000002 (application/octet-stream, text), 256.23 KiB.

[7 Dec 2009 9:20] MySQL Verification Team
this relay log still kills 5.1.41.  I just used mysqlbinlog on it:

mysqlbinlog.exe!Log_event::Log_event Line 707
mysqlbinlog.exe!Slave_log_event::Slave_log_event  Line 5836
mysqlbinlog.exe!Log_event::read_log_event  Line 1228 
mysqlbinlog.exe!Log_event::read_log_event  Line 1124 
mysqlbinlog.exe!dump_local_log_entries  Line 1990 
mysqlbinlog.exe!dump_log_entries(const char * logname=0x00000000)  Line 1436
mysqlbinlog.exe!main(int argc=0, char * * argv=0x00172d68)  Line 2076
mysqlbinlog.exe!__tmainCRTStartup()  Line 327 

when = uint4korr(buf);
  server_id = uint4korr(buf + SERVER_ID_OFFSET);
  data_written= uint4korr(buf + EVENT_LEN_OFFSET);
  if (description_event->binlog_version==1)  <---------- 0x0 description_event
  {
    log_pos= 0;
    flags= 0;
    return;
  }
[17 Sep 2010 9:47] Luis Soares
Some analysis...

Using mysql-next-mr, the output of:

   $ mysqlbinlog -v 38718/xp64-relay-bin.000002

ends up in Segmentation Fault. Apparently, this seems to be a
corrupted relay log. How it got corrupted in the first place, I
cannot tell (might even have been for any reason outside MySQL
server domain).

Analyzing the stack, I see that it crashes mysqlbinlog exactly
where shane pinpointed:

#0  0x0000000000414b5d in Log_event (this=0x9eda70, buf=0x7ffff7f9a010 "", description_event=0x0)
    at /home/lsoares/Workspace/bzr/work/bugfixing/55460/mysql-next-mr-bugfixing/sql/log_event.cc:753
#1  0x0000000000420f0b in Slave_log_event (this=0x9eda70, buf=0x7ffff7f9a010 "", event_len=262144)
    at /home/lsoares/Workspace/bzr/work/bugfixing/55460/mysql-next-mr-bugfixing/sql/log_event.cc:6155
#2  0x0000000000415650 in Log_event::read_log_event (buf=0x7ffff7f9a010 "", event_len=262144, error=0x7fffffffc6f0, 
    description_event=0xa231d0) at /home/lsoares/Workspace/bzr/work/bugfixing/55460/mysql-next-mr-bugfixing/sql/log_event.cc:1262
#3  0x00000000004152ae in Log_event::read_log_event (file=0x7fffffffd7b0, description_event=0xa231d0)
    at /home/lsoares/Workspace/bzr/work/bugfixing/55460/mysql-next-mr-bugfixing/sql/log_event.cc:1162
(...)

And the reason is because server/mysqlbinlog is loading a
Slave_log_event from the relay log which should never happen. So,
the server/mysqlbinlog goes on to forcing a NULL
Format_descriptor_event in Slave_log_event constructor. Some
snippets of the source code:

    case SLAVE_EVENT: /* can never happen (unused event) */
      ev = new Slave_log_event(buf, event_len);
      break;

(...)

/** This code is not used, so has not been updated to be format-tolerant. */
Slave_log_event::Slave_log_event(const char* buf, uint event_len)
  :Log_event(buf,0) /*unused event*/ ,mem_pool(0),master_host(0)
{

Anyway, eventually, everything blows up at the Log_event base
class constructor, when it tries to use the description event to
setup the event contents.

So it does not matter which version of the server one uses to
check the corrupted relay log. It will always seg fault. We
can/should however, make the server/mysqlbinlog fail gracefully
when such cases happen... And... Maybe this is the fix to this
bug...
[20 Sep 2010 20:09] Mats Kindahl
This problem will be resolved by WL#2540.
[20 Sep 2010 21:36] Bugs System
A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/118645

3295 Luis Soares	2010-09-20
      BUG#38718: slave sql thread crashes when reading relay log
      
      Suprisingly, a Slave_log_event would show up in the binary
      log. This event is never used and should not appear in the
      logs. As such, when the slave (or the mysqlbinlog tool) reads the
      event, it will hit an invalid pointer (reference to the
      descriptor event when deserializing the Slave_log_event was
      purposodely set to NULL).
      
      The presence of the Slave_log_event denotes a corrupted log, but
      we cannot tell how the log got corrupted in the first
      place. However, we can make the server cope with such events when
      it reads them - in case of log corruption - and fail gracefully.
      
      This patch makes the server/mysqlbinlog to report that it has
      found an invalid log event when Slave_log_event is read.
[21 Sep 2010 12:11] Mats Kindahl
The immediate problem of reading a Slave_log_event is fixed by the attached patch, but the general problem of detecting corrupt binary logs require checksums to be implemented (WL#2540).
[6 Oct 2010 11:25] Bugs System
A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/120098

3523 Luis Soares	2010-10-06
      BUG#38718: slave sql thread crashes when reading relay log
            
      Suprisingly, a Slave_log_event would show up in the binary
      log. This event is never used and should not appear in the
      logs. As such, when the slave (or the mysqlbinlog tool) reads the
      event, it will hit an invalid pointer (reference to the
      descriptor event when deserializing the Slave_log_event was
      purposodely set to NULL).
            
      The presence of the Slave_log_event denotes a corrupted log, but
      we cannot tell how the log got corrupted in the first
      place. However, we can make the server cope with such events when
      it reads them - in case of log corruption - and fail gracefully.
           
      This patch makes the server/mysqlbinlog to report that it has
      found an invalid log event when Slave_log_event is read.
[13 Oct 2010 10:34] Luis Soares
Queued in mysql-5.1-bugteam, mysql-5.5-bugteam and mysql-trunk-merge:
- http://pb2.norway.sun.com/web.py?template=push_details&push=1647867
- http://pb2.norway.sun.com/web.py?template=push_details&push=1647840
- http://pb2.norway.sun.com/web.py?template=push_details&push=1647834
[29 Oct 2010 14:14] Jon Stephens
Documented in the 5.1.53 and 5.5.8 changelogs as follows:

        Trying to read from a binary log containing a log event of an
        invalid type caused the slave to crash.

Set Need Merge, waiting for push to trunk.
[4 Nov 2010 17:26] Jon Stephens
Still waiting for merge to trunk.
[13 Nov 2010 16:25] Bugs System
Pushed into mysql-trunk 5.6.99-m5 (revid:alexander.nozdrin@oracle.com-20101113155825-czmva9kg4n31anmu) (version source revid:alexander.nozdrin@oracle.com-20101113155825-czmva9kg4n31anmu) (merge vers: 5.6.99-m5) (pib:21)
[13 Nov 2010 16:36] Bugs System
Pushed into mysql-next-mr (revid:alexander.nozdrin@oracle.com-20101113160336-atmtmfb3mzm4pz4i) (version source revid:alexander.nozdrin@oracle.com-20101113152540-gxro4g0v29l27f5x) (pib:21)
[15 Nov 2010 11:55] Jon Stephens
Bug does not appear in a 5.6 release, no additional changelog entries required.

Closed.
[18 Nov 2010 15:54] Bugs System
Pushed into mysql-5.1 5.1.54 (revid:build@mysql.com-20101118153531-693taxtxyxpt037i) (version source revid:build@mysql.com-20101118153531-693taxtxyxpt037i) (merge vers: 5.1.54) (pib:21)
[16 Dec 2010 22:30] Bugs System
Pushed into mysql-5.5 5.5.9 (revid:jonathan.perkin@oracle.com-20101216101358-fyzr1epq95a3yett) (version source revid:jonathan.perkin@oracle.com-20101216101358-fyzr1epq95a3yett) (merge vers: 5.5.9) (pib:24)