Bug #38170 rpl_variables failed on pushbuild: could not sync with master
Submitted: 16 Jul 2008 13:34 Modified: 1 Oct 2008 14:23
Reporter: Sven Sandberg Email Updates:
Status: Closed Impact on me:
None 
Category:Tests: Replication Severity:S7 (Test Cases)
Version:5.1-rpl OS:Any
Assigned to: Sven Sandberg CPU Architecture:Any
Tags: could not sync slave with master, pushbuild, rpl_variables, test failure

[16 Jul 2008 13:34] Sven Sandberg
Description:
rpl_variables failed with the following message:

rpl.rpl_variables                        [ fail ]

CURRENT_TEST: rpl.rpl_variables
=== SHOW MASTER STATUS ===
---- 1. ----
File	slave-bin.000001
Position	106
Binlog_Do_DB	
Binlog_Ignore_DB	
==========================

=== SHOW SLAVE STATUS ===
---- 1. ----
Slave_IO_State	Waiting for master to send event
Master_Host	127.0.0.1
Master_User	root
Master_Port	12040
Connect_Retry	1
Master_Log_File	master-bin.000001
Read_Master_Log_Pos	34525
Relay_Log_File	slave-relay-bin.000001
Relay_Log_Pos	4
Relay_Master_Log_File	
Slave_IO_Running	Yes
Slave_SQL_Running	No
Replicate_Do_DB	
Replicate_Ignore_DB	
Replicate_Do_Table	
Replicate_Ignore_Table	
Replicate_Wild_Do_Table	
Replicate_Wild_Ignore_Table	
Last_Errno	0
Last_Error	
Skip_Counter	0
Exec_Master_Log_Pos	0
Relay_Log_Space	34951
Until_Condition	None
Until_Log_File	
Until_Log_Pos	0
Master_SSL_Allowed	No
Master_SSL_CA_File	
Master_SSL_CA_Path	
Master_SSL_Cert	
Master_SSL_Cipher	
Master_SSL_Key	
Seconds_Behind_Master	
Master_SSL_Verify_Server_Cert	No
Last_IO_Errno	0
Last_IO_Error	
Last_SQL_Errno	0
Last_SQL_Error	
=========================

mysqltest: At line 696: could not sync with master ('select master_pos_wait('master-bin.000001', 34525)' returned NULL)

The result from queries just before the failure was:
< snip >
17	NULL	3	NULL
18	NULL	4	NULL
19	NULL	5	NULL
20	NULL	6	NULL
21	NULL	NULL	sv_SE
22	NULL	NULL	sv_FI
23	NULL	NULL	ar_TN
24	NULL	NULL	ar_IQ
25	NULL	NULL	
26	NULL	NULL	IGNORE_SPACE,NO_AUTO_CREATE_USER
27	NULL	NULL	NO_DIR_IN_CREATE,NO_AUTO_VALUE_ON_ZERO,NO_BACKSLASH_ESCAPES,NO_ENGINE_SUBSTITUTION
28	NULL	NULL	NO_KEY_OPTIONS,NO_TABLE_OPTIONS,NO_FIELD_OPTIONS
29	NULL	20	NULL
30	NULL	30	NULL
31	NULL	NULL	Bergsbrunna
32	NULL	NULL	Centrum
Comparing tables master:test.tstmt and master:test.tproc
Comparing tables master:test.tstmt and master:test.tfunc
Comparing tables master:test.tstmt and master:test.ttrig
Comparing tables master:test.tstmt and master:test.tprep

More results from queries before failure can be found in /dev/shm/var-ps_row-121/6/log/rpl_variables.log

Warnings from just before the error:
Note 1051 Unknown table 't1' 
Note 1051 Unknown table 't2' 
Note 1051 Unknown table 't3' 
Note 1051 Unknown table 't4' 
Note 1051 Unknown table 't5' 
Note 1051 Unknown table 't6' 
Note 1051 Unknown table 't7' 
Note 1051 Unknown table 't8' 

 - saving '/dev/shm/var-ps_row-121/6/log/rpl.rpl_variables/' to '/dev/shm/var-ps_row-121/log/rpl.rpl_variables/'

Retrying test, attempt(2/3)...

How to repeat:
https://intranet.mysql.com/secure/pushbuild/showpush.pl?dir=bzr_mysql-5.1-rpl&order=21 , 'sles10-ia64-a' vanilla/ps_row and 'sles10-ia64-a' no innodb/n_mix

Funny, it has not failed in any other push, see xref here:
http://tinyurl.com/6dmlos
However, the source diff for that push does not look like it could affect this test case.

Note that the slave IO thread is running but slave SQL is not.
[17 Jul 2008 14:34] Sven Sandberg
The reason for the failure is that the test sets @@global.init_slave to some garbage value. The test executes START SLAVE before it modifies @@global.init_slave. This causes a race between the client thread that modifies @@global.init_slave and the slave SQL thread. With high probability, the old value of @@global.init_slave was used, which is ok. But the slave stops if the new (garbage) value of @@global.init_slave is used.

In the test, the actual value of @@global.init_slave is insignificant, the only thing that matters is that it is different from the value on master. Therefore, we can fix the test case by setting @@global.init_slave to valid SQL.
[17 Jul 2008 16:22] Bugs System
A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/49950

2703 Sven Sandberg	2008-07-17
      BUG#38170: rpl_variables failed on pushbuild: could not sync with master
      Problem: the test set @@global.init_slave to garbage at a time
      which was not guaranteed to be after the time when the slave's
      SQL thread used it. That would cause the slave's SQL thread to
      stop in rare cases.
      Fix: The test does not care about the value of
      @@global.init_slave, except that it should be different on
      master and slave. Hence, we set @@global.init_slave to
      something that is valid SQL.
[18 Jul 2008 11:31] Bugs System
A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/50023

2704 Sven Sandberg	2008-07-18
      BUG#38178: rpl_loaddata_map fails sporadically in pushbuild
      Problem: master binlog has 'create table t1'. Master binlog
      was removed before slave could replicate it. In test's cleanup
      code, master did 'drop table t1', which caused slave sql
      thread to stop with an error since slave sql thread did not
      know about t1.
      Fix: t1 is just an auxiliary construction, only needed on
      master. Hence, we turn off binlogging before t1 is created,
      drop t1 as soon as we don't need it anymore, and then turn
      on binlogging again.
[18 Jul 2008 11:31] Bugs System
A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/50024

2704 Sven Sandberg	2008-07-18
      BUG#38178: rpl_loaddata_map fails sporadically in pushbuild
      Problem: master binlog has 'create table t1'. Master binlog
      was removed before slave could replicate it. In test's cleanup
      code, master did 'drop table t1', which caused slave sql
      thread to stop with an error since slave sql thread did not
      know about t1.
      Fix: t1 is just an auxiliary construction, only needed on
      master. Hence, we turn off binlogging before t1 is created,
      drop t1 as soon as we don't need it anymore, and then turn
      on binlogging again.
[14 Aug 2008 7:17] Bugs System
A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/51600

2677 He Zhenxing	2008-08-14 [merge]
      Merge with mysql-5.1-rpl-testfixes
[15 Aug 2008 2:02] Bugs System
A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/51688

2769 He Zhenxing	2008-08-15 [merge]
      Merge 6.0-rpl-testfixes -> 6.0
[19 Aug 2008 19:56] Bugs System
Pushed into 6.0.7-alpha  (revid:sven@mysql.com-20080717162659-b7p6joq1zu1255ov) (version source revid:sven@mysql.com-20080818175803-c1nutd5773r6b4gd) (pib:3)
[19 Aug 2008 20:30] Paul DuBois
Test case changes. No changelog entry needed.

Setting report to Patch Queued pending push of fix into 5.1.x.
[25 Aug 2008 21:05] Chuck Bell
Released in 6.0.7
[27 Aug 2008 1:17] Paul DuBois
Setting report to NDI pending push into 5.1.x.
[28 Aug 2008 20:17] Bugs System
Pushed into 6.0.7-alpha  (revid:cbell@mysql.com-20080822132131-uveo6wiuecy6m2b8) (version source revid:cbell@mysql.com-20080822132131-uveo6wiuecy6m2b8) (pib:3)
[29 Aug 2008 0:43] Paul DuBois
Setting report to NDI pending push into 5.1.x.
[14 Sep 2008 1:24] Bugs System
Pushed into 6.0.7-alpha  (revid:sven@mysql.com-20080717162659-b7p6joq1zu1255ov) (version source revid:vvaintroub@mysql.com-20080804094710-jb2qpqxpf2ir2gf3) (pib:3)
[16 Sep 2008 4:25] Paul DuBois
Setting report to NDI pending push into 5.1.x.