Bug #38718 | slave sql thread crashes when reading relay log | ||
---|---|---|---|
Submitted: | 11 Aug 2008 13:19 | Modified: | 15 Nov 2010 11:55 |
Reporter: | Shane Bester (Platinum Quality Contributor) | Email Updates: | |
Status: | Closed | Impact on me: | |
Category: | MySQL Server: Replication | Severity: | S1 (Critical) |
Version: | 5.0.66a, 5.1.28,5.1.46 | OS: | Any |
Assigned to: | Luis Soares | CPU Architecture: | Any |
[11 Aug 2008 13:19]
Shane Bester
[11 Aug 2008 13:20]
MySQL Verification Team
relay log to cause a crash
Attachment: xp64-relay-bin.000002 (application/octet-stream, text), 256.23 KiB.
[7 Dec 2009 9:20]
MySQL Verification Team
this relay log still kills 5.1.41. I just used mysqlbinlog on it: mysqlbinlog.exe!Log_event::Log_event Line 707 mysqlbinlog.exe!Slave_log_event::Slave_log_event Line 5836 mysqlbinlog.exe!Log_event::read_log_event Line 1228 mysqlbinlog.exe!Log_event::read_log_event Line 1124 mysqlbinlog.exe!dump_local_log_entries Line 1990 mysqlbinlog.exe!dump_log_entries(const char * logname=0x00000000) Line 1436 mysqlbinlog.exe!main(int argc=0, char * * argv=0x00172d68) Line 2076 mysqlbinlog.exe!__tmainCRTStartup() Line 327 when = uint4korr(buf); server_id = uint4korr(buf + SERVER_ID_OFFSET); data_written= uint4korr(buf + EVENT_LEN_OFFSET); if (description_event->binlog_version==1) <---------- 0x0 description_event { log_pos= 0; flags= 0; return; }
[17 Sep 2010 9:47]
Luis Soares
Some analysis... Using mysql-next-mr, the output of: $ mysqlbinlog -v 38718/xp64-relay-bin.000002 ends up in Segmentation Fault. Apparently, this seems to be a corrupted relay log. How it got corrupted in the first place, I cannot tell (might even have been for any reason outside MySQL server domain). Analyzing the stack, I see that it crashes mysqlbinlog exactly where shane pinpointed: #0 0x0000000000414b5d in Log_event (this=0x9eda70, buf=0x7ffff7f9a010 "", description_event=0x0) at /home/lsoares/Workspace/bzr/work/bugfixing/55460/mysql-next-mr-bugfixing/sql/log_event.cc:753 #1 0x0000000000420f0b in Slave_log_event (this=0x9eda70, buf=0x7ffff7f9a010 "", event_len=262144) at /home/lsoares/Workspace/bzr/work/bugfixing/55460/mysql-next-mr-bugfixing/sql/log_event.cc:6155 #2 0x0000000000415650 in Log_event::read_log_event (buf=0x7ffff7f9a010 "", event_len=262144, error=0x7fffffffc6f0, description_event=0xa231d0) at /home/lsoares/Workspace/bzr/work/bugfixing/55460/mysql-next-mr-bugfixing/sql/log_event.cc:1262 #3 0x00000000004152ae in Log_event::read_log_event (file=0x7fffffffd7b0, description_event=0xa231d0) at /home/lsoares/Workspace/bzr/work/bugfixing/55460/mysql-next-mr-bugfixing/sql/log_event.cc:1162 (...) And the reason is because server/mysqlbinlog is loading a Slave_log_event from the relay log which should never happen. So, the server/mysqlbinlog goes on to forcing a NULL Format_descriptor_event in Slave_log_event constructor. Some snippets of the source code: case SLAVE_EVENT: /* can never happen (unused event) */ ev = new Slave_log_event(buf, event_len); break; (...) /** This code is not used, so has not been updated to be format-tolerant. */ Slave_log_event::Slave_log_event(const char* buf, uint event_len) :Log_event(buf,0) /*unused event*/ ,mem_pool(0),master_host(0) { Anyway, eventually, everything blows up at the Log_event base class constructor, when it tries to use the description event to setup the event contents. So it does not matter which version of the server one uses to check the corrupted relay log. It will always seg fault. We can/should however, make the server/mysqlbinlog fail gracefully when such cases happen... And... Maybe this is the fix to this bug...
[20 Sep 2010 20:09]
Mats Kindahl
This problem will be resolved by WL#2540.
[20 Sep 2010 21:36]
Bugs System
A patch for this bug has been committed. After review, it may be pushed to the relevant source trees for release in the next version. You can access the patch from: http://lists.mysql.com/commits/118645 3295 Luis Soares 2010-09-20 BUG#38718: slave sql thread crashes when reading relay log Suprisingly, a Slave_log_event would show up in the binary log. This event is never used and should not appear in the logs. As such, when the slave (or the mysqlbinlog tool) reads the event, it will hit an invalid pointer (reference to the descriptor event when deserializing the Slave_log_event was purposodely set to NULL). The presence of the Slave_log_event denotes a corrupted log, but we cannot tell how the log got corrupted in the first place. However, we can make the server cope with such events when it reads them - in case of log corruption - and fail gracefully. This patch makes the server/mysqlbinlog to report that it has found an invalid log event when Slave_log_event is read.
[21 Sep 2010 12:11]
Mats Kindahl
The immediate problem of reading a Slave_log_event is fixed by the attached patch, but the general problem of detecting corrupt binary logs require checksums to be implemented (WL#2540).
[6 Oct 2010 11:25]
Bugs System
A patch for this bug has been committed. After review, it may be pushed to the relevant source trees for release in the next version. You can access the patch from: http://lists.mysql.com/commits/120098 3523 Luis Soares 2010-10-06 BUG#38718: slave sql thread crashes when reading relay log Suprisingly, a Slave_log_event would show up in the binary log. This event is never used and should not appear in the logs. As such, when the slave (or the mysqlbinlog tool) reads the event, it will hit an invalid pointer (reference to the descriptor event when deserializing the Slave_log_event was purposodely set to NULL). The presence of the Slave_log_event denotes a corrupted log, but we cannot tell how the log got corrupted in the first place. However, we can make the server cope with such events when it reads them - in case of log corruption - and fail gracefully. This patch makes the server/mysqlbinlog to report that it has found an invalid log event when Slave_log_event is read.
[13 Oct 2010 10:34]
Luis Soares
Queued in mysql-5.1-bugteam, mysql-5.5-bugteam and mysql-trunk-merge: - http://pb2.norway.sun.com/web.py?template=push_details&push=1647867 - http://pb2.norway.sun.com/web.py?template=push_details&push=1647840 - http://pb2.norway.sun.com/web.py?template=push_details&push=1647834
[29 Oct 2010 14:14]
Jon Stephens
Documented in the 5.1.53 and 5.5.8 changelogs as follows: Trying to read from a binary log containing a log event of an invalid type caused the slave to crash. Set Need Merge, waiting for push to trunk.
[4 Nov 2010 17:26]
Jon Stephens
Still waiting for merge to trunk.
[13 Nov 2010 16:25]
Bugs System
Pushed into mysql-trunk 5.6.99-m5 (revid:alexander.nozdrin@oracle.com-20101113155825-czmva9kg4n31anmu) (version source revid:alexander.nozdrin@oracle.com-20101113155825-czmva9kg4n31anmu) (merge vers: 5.6.99-m5) (pib:21)
[13 Nov 2010 16:36]
Bugs System
Pushed into mysql-next-mr (revid:alexander.nozdrin@oracle.com-20101113160336-atmtmfb3mzm4pz4i) (version source revid:alexander.nozdrin@oracle.com-20101113152540-gxro4g0v29l27f5x) (pib:21)
[15 Nov 2010 11:55]
Jon Stephens
Bug does not appear in a 5.6 release, no additional changelog entries required. Closed.
[18 Nov 2010 15:54]
Bugs System
Pushed into mysql-5.1 5.1.54 (revid:build@mysql.com-20101118153531-693taxtxyxpt037i) (version source revid:build@mysql.com-20101118153531-693taxtxyxpt037i) (merge vers: 5.1.54) (pib:21)
[16 Dec 2010 22:30]
Bugs System
Pushed into mysql-5.5 5.5.9 (revid:jonathan.perkin@oracle.com-20101216101358-fyzr1epq95a3yett) (version source revid:jonathan.perkin@oracle.com-20101216101358-fyzr1epq95a3yett) (merge vers: 5.5.9) (pib:24)