| Bug #38718 | slave sql thread crashes when reading relay log | ||
|---|---|---|---|
| Submitted: | 11 Aug 2008 13:19 | Modified: | 15 Nov 2010 11:55 |
| Reporter: | Shane Bester (Platinum Quality Contributor) | Email Updates: | |
| Status: | Closed | Impact on me: | |
| Category: | MySQL Server: Replication | Severity: | S1 (Critical) |
| Version: | 5.0.66a, 5.1.28,5.1.46 | OS: | Any |
| Assigned to: | Luis Soares | CPU Architecture: | Any |
[11 Aug 2008 13:20]
MySQL Verification Team
relay log to cause a crash
Attachment: xp64-relay-bin.000002 (application/octet-stream, text), 256.23 KiB.
[7 Dec 2009 9:20]
MySQL Verification Team
this relay log still kills 5.1.41. I just used mysqlbinlog on it:
mysqlbinlog.exe!Log_event::Log_event Line 707
mysqlbinlog.exe!Slave_log_event::Slave_log_event Line 5836
mysqlbinlog.exe!Log_event::read_log_event Line 1228
mysqlbinlog.exe!Log_event::read_log_event Line 1124
mysqlbinlog.exe!dump_local_log_entries Line 1990
mysqlbinlog.exe!dump_log_entries(const char * logname=0x00000000) Line 1436
mysqlbinlog.exe!main(int argc=0, char * * argv=0x00172d68) Line 2076
mysqlbinlog.exe!__tmainCRTStartup() Line 327
when = uint4korr(buf);
server_id = uint4korr(buf + SERVER_ID_OFFSET);
data_written= uint4korr(buf + EVENT_LEN_OFFSET);
if (description_event->binlog_version==1) <---------- 0x0 description_event
{
log_pos= 0;
flags= 0;
return;
}
[17 Sep 2010 9:47]
Luis Soares
Some analysis...
Using mysql-next-mr, the output of:
$ mysqlbinlog -v 38718/xp64-relay-bin.000002
ends up in Segmentation Fault. Apparently, this seems to be a
corrupted relay log. How it got corrupted in the first place, I
cannot tell (might even have been for any reason outside MySQL
server domain).
Analyzing the stack, I see that it crashes mysqlbinlog exactly
where shane pinpointed:
#0 0x0000000000414b5d in Log_event (this=0x9eda70, buf=0x7ffff7f9a010 "", description_event=0x0)
at /home/lsoares/Workspace/bzr/work/bugfixing/55460/mysql-next-mr-bugfixing/sql/log_event.cc:753
#1 0x0000000000420f0b in Slave_log_event (this=0x9eda70, buf=0x7ffff7f9a010 "", event_len=262144)
at /home/lsoares/Workspace/bzr/work/bugfixing/55460/mysql-next-mr-bugfixing/sql/log_event.cc:6155
#2 0x0000000000415650 in Log_event::read_log_event (buf=0x7ffff7f9a010 "", event_len=262144, error=0x7fffffffc6f0,
description_event=0xa231d0) at /home/lsoares/Workspace/bzr/work/bugfixing/55460/mysql-next-mr-bugfixing/sql/log_event.cc:1262
#3 0x00000000004152ae in Log_event::read_log_event (file=0x7fffffffd7b0, description_event=0xa231d0)
at /home/lsoares/Workspace/bzr/work/bugfixing/55460/mysql-next-mr-bugfixing/sql/log_event.cc:1162
(...)
And the reason is because server/mysqlbinlog is loading a
Slave_log_event from the relay log which should never happen. So,
the server/mysqlbinlog goes on to forcing a NULL
Format_descriptor_event in Slave_log_event constructor. Some
snippets of the source code:
case SLAVE_EVENT: /* can never happen (unused event) */
ev = new Slave_log_event(buf, event_len);
break;
(...)
/** This code is not used, so has not been updated to be format-tolerant. */
Slave_log_event::Slave_log_event(const char* buf, uint event_len)
:Log_event(buf,0) /*unused event*/ ,mem_pool(0),master_host(0)
{
Anyway, eventually, everything blows up at the Log_event base
class constructor, when it tries to use the description event to
setup the event contents.
So it does not matter which version of the server one uses to
check the corrupted relay log. It will always seg fault. We
can/should however, make the server/mysqlbinlog fail gracefully
when such cases happen... And... Maybe this is the fix to this
bug...
[20 Sep 2010 20:09]
Mats Kindahl
This problem will be resolved by WL#2540.
[20 Sep 2010 21:36]
Bugs System
A patch for this bug has been committed. After review, it may be pushed to the relevant source trees for release in the next version. You can access the patch from: http://lists.mysql.com/commits/118645 3295 Luis Soares 2010-09-20 BUG#38718: slave sql thread crashes when reading relay log Suprisingly, a Slave_log_event would show up in the binary log. This event is never used and should not appear in the logs. As such, when the slave (or the mysqlbinlog tool) reads the event, it will hit an invalid pointer (reference to the descriptor event when deserializing the Slave_log_event was purposodely set to NULL). The presence of the Slave_log_event denotes a corrupted log, but we cannot tell how the log got corrupted in the first place. However, we can make the server cope with such events when it reads them - in case of log corruption - and fail gracefully. This patch makes the server/mysqlbinlog to report that it has found an invalid log event when Slave_log_event is read.
[21 Sep 2010 12:11]
Mats Kindahl
The immediate problem of reading a Slave_log_event is fixed by the attached patch, but the general problem of detecting corrupt binary logs require checksums to be implemented (WL#2540).
[6 Oct 2010 11:25]
Bugs System
A patch for this bug has been committed. After review, it may be pushed to the relevant source trees for release in the next version. You can access the patch from: http://lists.mysql.com/commits/120098 3523 Luis Soares 2010-10-06 BUG#38718: slave sql thread crashes when reading relay log Suprisingly, a Slave_log_event would show up in the binary log. This event is never used and should not appear in the logs. As such, when the slave (or the mysqlbinlog tool) reads the event, it will hit an invalid pointer (reference to the descriptor event when deserializing the Slave_log_event was purposodely set to NULL). The presence of the Slave_log_event denotes a corrupted log, but we cannot tell how the log got corrupted in the first place. However, we can make the server cope with such events when it reads them - in case of log corruption - and fail gracefully. This patch makes the server/mysqlbinlog to report that it has found an invalid log event when Slave_log_event is read.
[13 Oct 2010 10:34]
Luis Soares
Queued in mysql-5.1-bugteam, mysql-5.5-bugteam and mysql-trunk-merge: - http://pb2.norway.sun.com/web.py?template=push_details&push=1647867 - http://pb2.norway.sun.com/web.py?template=push_details&push=1647840 - http://pb2.norway.sun.com/web.py?template=push_details&push=1647834
[29 Oct 2010 14:14]
Jon Stephens
Documented in the 5.1.53 and 5.5.8 changelogs as follows:
Trying to read from a binary log containing a log event of an
invalid type caused the slave to crash.
Set Need Merge, waiting for push to trunk.
[4 Nov 2010 17:26]
Jon Stephens
Still waiting for merge to trunk.
[13 Nov 2010 16:25]
Bugs System
Pushed into mysql-trunk 5.6.99-m5 (revid:alexander.nozdrin@oracle.com-20101113155825-czmva9kg4n31anmu) (version source revid:alexander.nozdrin@oracle.com-20101113155825-czmva9kg4n31anmu) (merge vers: 5.6.99-m5) (pib:21)
[13 Nov 2010 16:36]
Bugs System
Pushed into mysql-next-mr (revid:alexander.nozdrin@oracle.com-20101113160336-atmtmfb3mzm4pz4i) (version source revid:alexander.nozdrin@oracle.com-20101113152540-gxro4g0v29l27f5x) (pib:21)
[15 Nov 2010 11:55]
Jon Stephens
Bug does not appear in a 5.6 release, no additional changelog entries required. Closed.
[18 Nov 2010 15:54]
Bugs System
Pushed into mysql-5.1 5.1.54 (revid:build@mysql.com-20101118153531-693taxtxyxpt037i) (version source revid:build@mysql.com-20101118153531-693taxtxyxpt037i) (merge vers: 5.1.54) (pib:21)
[16 Dec 2010 22:30]
Bugs System
Pushed into mysql-5.5 5.5.9 (revid:jonathan.perkin@oracle.com-20101216101358-fyzr1epq95a3yett) (version source revid:jonathan.perkin@oracle.com-20101216101358-fyzr1epq95a3yett) (merge vers: 5.5.9) (pib:24)

Description: during replication between 5.0.66a->5.0.66a, a slave got a crash when starting with the attached relay log mysqld-nt.exe!Log_event::Log_event Line 377 mysqld-nt.exe!Slave_log_event::Slave_log_event Line 4311 mysqld-nt.exe!Log_event::read_log_event Line 889 mysqld-nt.exe!Log_event::read_log_event Line 828 mysqld-nt.exe!next_event Line 4989 mysqld-nt.exe!exec_relay_log_event Line 3260 mysqld-nt.exe!handle_slave_sql Line 4030 mysqld-nt.exe!pthread_start In Log_event::Log_event() we have: when = uint4korr(buf); server_id = uint4korr(buf + SERVER_ID_OFFSET); data_written= uint4korr(buf + EVENT_LEN_OFFSET); if (description_event->binlog_version==1) <-------crash { log_pos= 0; flags= 0; return; } mysqlbinlog also crashes. How to repeat: I attach the relay log. How the slave ended up with such a log is probably the topic of another bug report. This bug concerns the gigo.