Bug #34427 rpl_variables allows slave to miss rendezvous
Submitted: 8 Feb 2008 17:31 Modified: 26 Mar 2008 16:30
Reporter: Andrei Elkin Email Updates:
Status: Closed Impact on me:
None 
Category:MySQL Server: Tests Severity:S3 (Non-critical)
Version:5.0+ OS:Any
Assigned to: Andrei Elkin CPU Architecture:Any

[8 Feb 2008 17:31] Andrei Elkin
Description:
There is no instruction in the test that enforces the slave successfully connect
to the master.
The way the test is currently written allows the slave to be late for rendezvous 
so that about-connecting time queries to the master fail and are error-logged

to be seen in Warnings of pb like

 slave.err: rpl.rpl_variables: 080207 16:36:45 [ERROR] The slave I/O thread stops because querying master with 'SET @master_heartbeat_period= 60000002048' failed; error: 2013  'Lost connection to MySQL server at 'reading init

and in the slave's errorlog like

CURRENT_TEST: rpl.rpl_variables
080207 16:36:44 [Note] Plugin 'InnoDB' disabled by command line option
080207 16:36:44 [Note] Event Scheduler: Loaded 0 events
080207 16:36:44 [Note] /data0/pushbuild/pb/mysql-6.0-rpl/18/mysql-6.0-rpl-exp/sql/mysqld: ready for connections.
Version: '6.0.5-alpha-pb18-debug-log'  socket: '/dev/shm/pbtmp-n_mix-100/slave.sock'  port: 11002  Source distribution
080207 16:36:45 [Note] Slave SQL thread initialized, starting replication in log 'FIRST' at position 0, relay log '/dev/shm/var-n_mix-100/log/slave-relay-bin.000001' position: 4
080207 16:36:45 [Note] Slave I/O thread: connected to master 'root@127.0.0.1:11000',replication started in log 'FIRST' at position 4
080207 16:36:45 [Warning] "SELECT UNIX_TIMESTAMP()" failed on master, do not trust column Seconds_Behind_Master of SHOW SLAVE STATUS. Error: Lost connection to MySQL server at 'reading initial communication packet', system error: 111 (2013)
080207 16:36:45 [ERROR] The slave I/O thread stops because querying master with 'SET @master_heartbeat_period= 60000002048' failed; error: 2013  'Lost connection to MySQL server at 'reading initial communication packet', system error: 111'

How to repeat:
Make slave server behave slow e.g use sleep at get_master_clock_and_version(),
or watch pb results for rpl_variables.test.

https://intranet.mysql.com/secure/pushbuild/getlog.pl?dir=mysql-6.0-rpl&entry=mkindahl@dl1...
https://intranet.mysql.com/secure/pushbuild/getlog.pl?dir=mysql-6.0-rpl&entry=mkindahl@dl1...

Suggested fix:
Augment the test with adding a synchronization point which will guarantee the slave has connected to master, and therefore the error and warnings out from 
get_master_version_and_clock() can not be.
[8 Feb 2008 19:03] Bugs System
A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/41962

ChangeSet@1.2578, 2008-02-08 21:03:10+02:00, aelkin@mysql1000.dsl.inet.fi +1 -0
  bug#34427 slave misses rendezvous in rpl_variables
  
  There was no instruction in the test that enforces the slave successfully connect
  to the master.
  The way the test was been written allowed the slave to had been late for rendezvous 
  so that about-connecting time queries to the master failed and are error-logged
  to had been seen in Warnings of pb.
  
  Fixed with adding a sychronization primitive to the test.
  No test case is possible, observe error logs on pb.
  
  Todo: revise need of rpl_report.pl's rules due to failing execution of
  queries from get_master_verion_and_clock().
  Any test should try to use a synchornization primitive like the current fix
  makes and do not let the slave to miss successful connecting.
[12 Feb 2008 10:48] Andrei Elkin
Pushed to 5.0-rpl.
[25 Mar 2008 11:22] Bugs System
Pushed into 5.0.60
[25 Mar 2008 11:23] Bugs System
Pushed into 5.1.24-rc
[26 Mar 2008 16:30] Paul DuBois
Test case change. No changelog entry needed.
[26 Mar 2008 19:01] Bugs System
Pushed into 6.0.5-alpha