Bug #55263 assert in check_binlog_magic
Submitted: 14 Jul 2010 14:54 Modified: 15 Nov 2010 13:19
Reporter: Matthias Leich Email Updates:
Status: Closed Impact on me:
None 
Category:MySQL Server: Replication Severity:S3 (Non-critical)
Version:5.1.49-debug, 5.1.50-bzr OS:Any
Assigned to: Luis Manuel Oliveira Soares CPU Architecture:Any
Tags: assert

[14 Jul 2010 14:54] Matthias Leich
Description:
The assert happens in log.cc:1744 :
int check_binlog_magic(IO_CACHE* log, const char** errmsg)
{
  char magic[4];
  DBUG_ASSERT(my_b_tell(log) == 0); <-------------

  if (my_b_read(log, (uchar*) magic, sizeof(magic)))
...

./mysql-test-run.pl  --mem --mysqld=--binlog-format=statement --repeat=2 rpl_cross_version --skip-ndb --testcase-timeout=20 --suite=rpl rpl_cross_version

Result on 5.1.49-debug
mysql-5.1 revno: 3411 2010-07-08 compiled with debug
----------------------------------------------------
Thread 1 (process 14291):
#0  0x00007f234cdeace6 in pthread_kill () from /lib64/libpthread.so.0
#1  0x0000000000b21a1f in my_write_core (sig=6) at stacktrace.c:329
#2  0x00000000006bd0ed in handle_segfault (sig=6) at mysqld.cc:2571
#3  <signal handler called>
#4  0x00007f234bff35c5 in raise () from /lib64/libc.so.6
#5  0x00007f234bff4bb3 in abort () from /lib64/libc.so.6
#6  0x00007f234bfec1e9 in __assert_fail () from /lib64/libc.so.6
#7  0x0000000000791f15 in check_binlog_magic (log=0x1212e50, errmsg=0x400f1a68) at log.cc:1744
#8  0x00000000008721fe in next_event (rli=0x1212670) at slave.cc:4361
#9  0x0000000000875be1 in exec_relay_log_event (thd=0x123dd78, rli=0x1212670) at slave.cc:2260
#10 0x0000000000876c48 in handle_slave_sql (arg=0x1211280) at slave.cc:3080
#11 0x00007f234cde6040 in start_thread () from /lib64/libpthread.so.0
#12 0x00007f234c09408d in clone () from /lib64/libc.so.6
#13 0x0000000000000000 in ?? ()

How to repeat:
See above
[14 Jul 2010 15:11] Valeriy Kravchuk
Verified on Mac OS X:

valeriy-kravchuks-macbook-pro:mysql-test openxs$ ./mysql-test-run.pl  --mem --mysqld=--binlog-format=statement --repeat=2 rpl_cross_version --skip-ndb --testcase-timeout=20 --suite=rpl rpl_cross_version
Logging: ./mysql-test-run.pl  --mem --mysqld=--binlog-format=statement --repeat=2 rpl_cross_version --skip-ndb --testcase-timeout=20 --suite=rpl rpl_cross_version

...

Installing system database...
Using server port 51143

==============================================================================

TEST                                      RESULT   TIME (ms)
------------------------------------------------------------

worker[1] Using MTR_BUILD_THREAD 300, with reserved ports 13000..13009
rpl.rpl_cross_version                    [ pass ]    892
rpl.rpl_cross_version                    [ fail ]
        Test ended at 2010-07-14 18:09:06

CURRENT_TEST: rpl.rpl_cross_version
mysqltest: In included file "./include/wait_for_slave_param.inc": At line 68: Error running query 'SHOW SLAVE STATUS': 2013 Lost connection to MySQL server during query

The result from queries just before the failure was:
==== Initialize ====
Setting up fake replication from MYSQL_TEST_DIR/suite/binlog/std_data/binlog_old_version_4_1.000001
==== Test ====
start slave sql_thread;

Server [mysqld.1 - pid: 74238, winpid: 74238, exit: 256] failed during test run
Server log from this test:
100714 18:08:57 [Note] 'CHANGE MASTER TO executed'. Previous state master_host='dummy.localdomain', master_port='3306', master_log_file='', master_log_pos='4'. New state master_host='dummy.localdomain', master_port='3306', master_log_file='', master_log_pos='4'.
100714 18:08:57 [Note] Slave SQL thread initialized, starting replication in log 'FIRST' at position 0, relay log 'slave-relay-bin-fake.000001' position: 4
100714 18:08:57 [Warning] Statement may not be safe to log in statement format. Statement: INSERT INTO t1 VALUES (RAND() * 1000000, 'random')
100714 18:08:57 [Note] next log './slave-relay-bin.000001' is currently active
Assertion failed: (((log)->pos_in_file + (size_t) (*(log)->current_pos - (log)->request_pos)) == 0), function unknown function, file log.cc, line 1744.
100714 18:08:57 - mysqld got signal 6 ;
...
[22 Jul 2010 10:50] Luis Manuel Oliveira Soares
The following test case seems to trigger the same assertion:

-- source include/master-slave.inc
-- source include/have_binlog_format_statement.inc

-- connection slave
-- source include/stop_slave.inc
FLUSH LOGS;
FLUSH LOGS;
FLUSH LOGS;

SHOW MASTER LOGS;
--query_vertical SHOW SLAVE STATUS

--eval change master to relay_log_file='./slave-relay-bin.000006', relay_log_pos=0; 
show variables like 'relay%';
START SLAVE SQL_THREAD;

## Ooopsss DBA was wrong, he meant slave-relay-bin.000004 instead of
## slave-relay-bin.000006
## So he reverts changes introduced by slave-relay-bin.000006 and 
## changes back to slave-relay-bin.000004

## Some DELETEs and DROPs are manually done by the DBA

STOP SLAVE SQL_THREAD;
--eval change master to relay_log_file='./slave-relay-bin.000004', relay_log_pos=0; 
show variables like 'relay%';
START SLAVE SQL_THREAD;

-- exit
[23 Jul 2010 14:38] Valeriy Kravchuk
Verified that new test case from Luis fails in 5.1.50 from bzr:

macbook-pro:mysql-test openxs$ ./mtr bug55263
Logging: ./mtr  bug55263
100723 17:35:39 [Warning] Setting lower_case_table_names=2 because file system for /var/folders/dX/dXCzvuSlHX4Op1g-o1jIWk+++TI/-Tmp-/wpzEbOYj9Y/ is case insensitive
100723 17:35:39 [Note] Plugin 'FEDERATED' is disabled.
100723 17:35:39 [Note] Plugin 'ndbcluster' is disabled.
MySQL Version 5.1.50
Checking supported features...
 - using ndbcluster when necessary, mysqld supports it
 - SSL connections supported
 - binaries are debug compiled
Collecting tests...
vardir: /Users/openxs/dbs/5.1/mysql-test/var
Checking leftover processes...
Removing old var directory...
 - WARNING: Using the 'mysql-test/var' symlink
Creating var directory '/Users/openxs/dbs/5.1/mysql-test/var'...
Installing system database...
Using server port 59732

==============================================================================

TEST                                      RESULT   TIME (ms)
------------------------------------------------------------

worker[1] Using MTR_BUILD_THREAD 300, with reserved ports 13000..13009
main.bug55263                            [ fail ]
        Test ended at 2010-07-23 17:35:55

CURRENT_TEST: main.bug55263
--- /Users/openxs/dbs/5.1/mysql-test/r/bug55263.result	2010-07-23 17:35:32.000000000 +0300
+++ /Users/openxs/dbs/5.1/mysql-test/r/bug55263.reject	2010-07-23 17:35:50.000000000 +0300
@@ -0,0 +1,74 @@
+stop slave;
+drop table if exists t1,t2,t3,t4,t5,t6,t7,t8,t9;
+reset master;
+reset slave;
+drop table if exists t1,t2,t3,t4,t5,t6,t7,t8,t9;
+start slave;
+include/stop_slave.inc
+FLUSH LOGS;
+FLUSH LOGS;
+FLUSH LOGS;
+SHOW MASTER LOGS;
+Log_name	File_size
+slave-bin.000001	149
+slave-bin.000002	149
+slave-bin.000003	149
+slave-bin.000004	106
+SHOW SLAVE STATUS;
+Slave_IO_State	
+Master_Host	127.0.0.1
+Master_User	root
+Master_Port	13000
+Connect_Retry	1
+Master_Log_File	master-bin.000001
+Read_Master_Log_Pos	106
+Relay_Log_File	slave-relay-bin.000003
+Relay_Log_Pos	252
+Relay_Master_Log_File	master-bin.000001
+Slave_IO_Running	No
+Slave_SQL_Running	No
+Replicate_Do_DB	
+Replicate_Ignore_DB	
+Replicate_Do_Table	
+Replicate_Ignore_Table	
+Replicate_Wild_Do_Table	
+Replicate_Wild_Ignore_Table	
+Last_Errno	0
+Last_Error	
+Skip_Counter	0
+Exec_Master_Log_Pos	106
+Relay_Log_Space	872
+Until_Condition	None
+Until_Log_File	
+Until_Log_Pos	0
+Master_SSL_Allowed	No
+Master_SSL_CA_File	
+Master_SSL_CA_Path	
+Master_SSL_Cert	
+Master_SSL_Cipher	
+Master_SSL_Key	
+Seconds_Behind_Master	NULL
+Master_SSL_Verify_Server_Cert	No
+Last_IO_Errno	0
+Last_IO_Error	
+Last_SQL_Errno	0
+Last_SQL_Error	
+change master to relay_log_file='./slave-relay-bin.000006', relay_log_pos=0; ;
+show variables like 'relay%';
+Variable_name	Value
+relay_log	slave-relay-bin
+relay_log_index	
+relay_log_info_file	relay-log.info
+relay_log_purge	OFF
+relay_log_space_limit	0
+START SLAVE SQL_THREAD;
+STOP SLAVE SQL_THREAD;
+change master to relay_log_file='./slave-relay-bin.000004', relay_log_pos=0; ;
+show variables like 'relay%';
+Variable_name	Value
+relay_log	slave-relay-bin
+relay_log_index	
+relay_log_info_file	relay-log.info
+relay_log_purge	OFF
+relay_log_space_limit	0
+START SLAVE SQL_THREAD;

mysqltest: Result length mismatch

Server [mysqld.2 - pid: 93824, winpid: 93824, exit: 256] failed during test run
Server log from this test:
100723 17:35:43 [Warning] '--default-character-set' is deprecated and will be removed in a future release. Please use '--character-set-server' instead.
100723 17:35:43 [Warning] 'for replication startup options' is deprecated and will be removed in a future release. Please use ''CHANGE MASTER'' instead.
100723 17:35:43 [Warning] Setting lower_case_table_names=2 because file system for /Users/openxs/dbs/5.1/mysql-test/var/mysqld.2/data/ is case insensitive
100723 17:35:43 [Note] Plugin 'FEDERATED' is disabled.
100723 17:35:43 [Note] Plugin 'InnoDB' is disabled.
100723 17:35:43 [Note] Plugin 'ndbcluster' is disabled.
100723 17:35:44 [Note] Event Scheduler: Loaded 0 events
100723 17:35:44 [Note] /Users/openxs/dbs/5.1/libexec/mysqld: ready for connections.
Version: '5.1.50-debug-log'  socket: '/Users/openxs/dbs/5.1/mysql-test/var/tmp/mysqld.2.sock'  port: 13001  Source distribution
100723 17:35:46 [Note] Slave SQL thread initialized, starting replication in log 'FIRST' at position 0, relay log './slave-relay-bin.000001' position: 4
100723 17:35:46 [Note] Slave I/O thread: connected to master 'root@127.0.0.1:13000',replication started in log 'FIRST' at position 4
100723 17:35:46 [Note] next log './slave-relay-bin.000002' is currently active
100723 17:35:47 [Note] next log './slave-relay-bin.000003' is currently active
100723 17:35:47 [Note] Slave I/O thread killed while reading event
100723 17:35:47 [Note] Slave I/O thread exiting, read up to log 'master-bin.000001', position 106
100723 17:35:47 [Note] Error reading relay log event: slave SQL thread was killed
100723 17:35:50 [Note] 'CHANGE MASTER TO executed'. Previous state master_host='127.0.0.1', master_port='13000', master_log_file='master-bin.000001', master_log_pos='106'. New state master_host='127.0.0.1', master_port='13000', master_log_file='master-bin.000001', master_log_pos='106'.
100723 17:35:50 [Note] Slave SQL thread initialized, starting replication in log 'master-bin.000001' at position 106, relay log './slave-relay-bin.000006' position: 4
100723 17:35:50 [Note] Slave SQL thread exiting, replication stopped in log 'master-bin.000001' at position 106
100723 17:35:50 [Note] 'CHANGE MASTER TO executed'. Previous state master_host='127.0.0.1', master_port='13000', master_log_file='master-bin.000001', master_log_pos='106'. New state master_host='127.0.0.1', master_port='13000', master_log_file='master-bin.000001', master_log_pos='106'.
100723 17:35:50 [Note] Slave SQL thread initialized, starting replication in log 'master-bin.000001' at position 106, relay log './slave-relay-bin.000004' position: 4
100723 17:35:50 [Note] next log './slave-relay-bin.000005' is not active
100723 17:35:50 [Note] next log './slave-relay-bin.000006' is currently active
Assertion failed: (((log)->pos_in_file + (size_t) (*(log)->current_pos - (log)->request_pos)) == 0), function unknown function, file log.cc, line 1739.
100723 17:35:50 - mysqld got signal 6 ;
...
[23 Jul 2010 16:33] Bugs System
A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/114246

3489 Luis Soares	2010-07-23
      BUG#55263: assert in check_binlog_magic
      
      The procedure for setting up a fake binary log, by changing the
      relay log files manually, is run twice when we issue mtr with
      --repeat=2. However, when setting it up for the second time,
      neither the sql thread is reset nor the server is restarted. This
      means that previous stale relay log IO_CACHE - from first run -
      is left around. As a consequence, when the test is run for the
      second time, the IO_CACHE for the relay log has position in file
      different than zero, triggering the assertion.
            
      We fix this by changing the call to my_b_seek at the very
      beginning of init_master_info. Instead of doing a seek on the
      cur_log we now do a seek on the currently active relay log. This
      makes sure that even if the SQL thread had read the log while it
      was hot in the past, the position will be set to 0 if it
      continues hot when the SQL thread is restarted and switches back
      to it. If, on the other hand, the log became cold while the SQL
      was stopped and restarted, then when the thread switches back to it
      the log will have to be reopened, thus the position will start
      from the 0, meaning that there is no need to do a seek.
[24 Sep 2010 9:46] Bugs System
A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/119015

3515 Luis Soares	2010-09-24
      BUG#55263: assert in check_binlog_magic
      
      The procedure for setting up a fake binary log, by changing the
      relay log files manually, is run twice when we issue mtr with
      --repeat=2. However, when setting it up for the second time,
      neither the sql thread is reset nor the server is restarted. This
      means that previous stale relay log IO_CACHE metadata - from 
      first run - is left around. As a consequence, when the test is 
      run for the second time, the IO_CACHE for the relay log has 
      position in file different than zero, triggering the assertion.
                  
      We fix this by deploying a call to my_b_seek before calling
      check_binlog_magic in next_event. This prevents the server 
      from asserting, in the cases that the SQL thread was reads
      from a hot log (relay.NNNNN), then is stopped, then is restarted 
      from a previous cold log (relay.MMMMM), and then it reaches 
      the hot log relay.NNNNN again, in which case, the read 
      coordinates are not set to zero, but to the old values.
      
      Additionally, some comments to the source code were added.
[13 Oct 2010 10:34] Luis Manuel Oliveira Soares
Queued in mysql-5.1-bugteam, mysql-5.5-bugteam and mysql-trunk-merge:
- http://pb2.norway.sun.com/web.py?template=push_details&push=1647867
- http://pb2.norway.sun.com/web.py?template=push_details&push=1647840
- http://pb2.norway.sun.com/web.py?template=push_details&push=1647834
[29 Oct 2010 15:05] Jon Stephens
Documented in the 5.1.53 and 5.5.8 changelogs as follows:

        When making changes to relay log settings using CHANGE MASTER
        TO, the IO cache was not cleared. This could result in
        replication failure when the slave attempted to read stale data
        from the cache and stopped with an assertion.
      
Set Need Merge, waiting for push to trunk.
[4 Nov 2010 17:28] Jon Stephens
Already documented in 5.5; waiting for merge to trunk.
[13 Nov 2010 16:24] Bugs System
Pushed into mysql-trunk 5.6.99-m5 (revid:alexander.nozdrin@oracle.com-20101113155825-czmva9kg4n31anmu) (version source revid:alexander.nozdrin@oracle.com-20101113152450-2zzcm50e7i4j35v7) (merge vers: 5.6.1-m4) (pib:21)
[13 Nov 2010 16:35] Bugs System
Pushed into mysql-next-mr (revid:alexander.nozdrin@oracle.com-20101113160336-atmtmfb3mzm4pz4i) (version source revid:vasil.dimov@oracle.com-20100629074804-359l9m9gniauxr94) (pib:21)
[15 Nov 2010 13:19] Jon Stephens
Does not appear in a 5.6 release; no additional changelog entries required.

Closed.
[18 Nov 2010 15:53] Bugs System
Pushed into mysql-5.1 5.1.54 (revid:build@mysql.com-20101118153531-693taxtxyxpt037i) (version source revid:build@mysql.com-20101118153531-693taxtxyxpt037i) (merge vers: 5.1.54) (pib:21)
[16 Dec 2010 22:29] Bugs System
Pushed into mysql-5.5 5.5.9 (revid:jonathan.perkin@oracle.com-20101216101358-fyzr1epq95a3yett) (version source revid:jonathan.perkin@oracle.com-20101216101358-fyzr1epq95a3yett) (merge vers: 5.5.9) (pib:24)