MySQL Bugs: #54820: rpl.rpl_heartbeat_basic fails sporadically again

Bug #54820	rpl.rpl_heartbeat_basic fails sporadically again
Submitted:	26 Jun 2010 7:28	Modified:	28 Jan 2011 12:21
Reporter:	Alexander Nozdrin	Email Updates:
Status:	Closed	Impact on me:	None
Category:	Tests: Replication	Severity:	S3 (Non-critical)
Version:	M3 (Celosia), M4 (Dahlia)	OS:	Any
Assigned to:	Serge Kozlov	CPU Architecture:	Any
Tags:	experimental, pb2, sporadic, test failure

Description:
rpl.rpl_heartbeat_basic started to fail sporadically again.

Symptoms:
---------------------------------------------------------------------
mysqltest: In included file "./include/wait_for_slave_param.inc": At line 83: 

The result from queries just before the failure was:
< snip >
SELECT NOW();
NOW()
2010-06-24 21:05:41
**** SHOW MASTER STATUS on master ****
SHOW MASTER STATUS;
File	master-bin.000002
Position	107
Binlog_Do_DB	
Binlog_Ignore_DB	

**** SHOW PROCESSLIST on master ****
SHOW PROCESSLIST;
Id	User	Host	db	Command	Time	State	Info
1	root	localhost:37729	test	Query	0	NULL	SHOW PROCESSLIST

**** SHOW BINLOG EVENTS on master ****
SHOW BINLOG EVENTS IN 'master-bin.000001';
Log_name	Pos	Event_type	Server_id	End_log_pos	Info
master-bin.000001	4	Format_desc	1	107	Server ver: 5.6.99-m4-debug-log, Binlog ver: 4
master-bin.000001	107	Stop	1	126	
---------------------------------------------------------------------

The test case has been marked experimental due to this bug.

How to repeat:
Xref: http://tinyurl.com/2w7qdhb

Thank you for the report.

Verfied as described using command `./mtr rpl.rpl_heartbeat_basic  --repeat=100` It was 80s run which failed

Almost all cases appeared under Valgrind environment when server works slowly. By default slave does 10 attempts to connect to master every second. It means that only 10 second slave tries to establish connection to master. Seems for Valgind it is not enough . The patch should increase timeout between attempts to connect to master and especially for Valgrind environment.

A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/113941

3128 Serge Kozlov	2010-07-20
      Bug#54820. Set separate values of MASTER_CONNECT_RETRY for usual execution and for Valgrind environment

Hi, Alfranio. The patch just removes some lacks of test code. Your sample of failure is an error in server(replication) code.

In addition, I also see (checked in recent valgrind runs):

101123 22:51:52 [Note] Error reading relay log event: slave SQL thread was killed
101123 22:51:53 [Note] Slave I/O thread: Start semi-sync replication to master 'root@127.0.0.1:13000' in log 'master-bin.000001' at position 1527
101123 22:51:53 [Note] Slave SQL thread initialized, starting replication in log 'master-bin.000001' at position 1527, relay log './slave-relay-bin.000003' position: 1674
101123 22:51:53 [Note] Slave I/O thread: connected to master 'root@127.0.0.1:13000',replication started in log 'master-bin.000001' at position 1527
==2361== Warning: invalid file descriptor -1 in syscall read()
101123 22:51:57 [ERROR] Error reading packet from server: Lost connection to MySQL server during query ( server_errno=2013)

However, this seems to have the exact same explanation (transient network
problems) that was given for BUG#50414. As such, I believe that currently
this warning is actually suppressed.

Judging from Serge's comment, it seems that this might be just a 
problem with the test case.

There are also another kind of failure exhibited by this test, but
it's not specific to this test only. See: BUG#58525 .

I was asked to have a look at this. While investigating this
issue, I have found three different problems:

  P1. Warning: invalid file descriptor -1 in syscall read()

  P2. Failure B:
      mysqltest: In included file "./include/wait_for_status_var.inc": At line 68: 
  
  P3. Failure A:
      mysqltest: In included file "./include/wait_for_slave_param.inc": At line 83:

P1, seems to be the same issue reported in BUG#50414, in
particular the case about transient network errors. According to
the explanation for BUG#50414 fix, this is invalid file
descriptor (-1) symptom is expected on master/slave stop/restart,
wich seems to be the same case for this test. So, apparently,
this must have been fixed by BUG#50414.

P2, I have found this in PB2 failure logs. This seems to have
been caused by a bad implementation of
include/wait_for_status_var.inc . The problem was that the
instructions in this include file, were actually comparing
numbers as if they were strings, thence, in some cases, making
the test case to fail. This is BUG#58525.

P3, this seems to be the actual bug reported. Apparently, running
this test in slow hosts requires more than 10 sec reconnection
period for it not to timeout. If the host is slow enough, the
test case fails (which happens sporadically on PB2). So, just to
conclude, this must be BUG#54820 (this one).

A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/126268

3210 Serge Kozlov	2010-12-08
      Bug#54820: 
      master_connect_retty increased for Valgrind environment
      Bug#58525:
      separate string and numbers for comparing

A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/127690

3449 Serge Kozlov	2010-12-29
      Bug#54820: master_connect_retry increased for Valgrind environment support
      Bug#58525: separate string and numbers for comparing

A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/128691

3241 Serge Kozlov	2011-01-14
      Bug#54820: master_connect_retry increased for Valgrind environment support
      Bug#58525: separate string and numbers for comparing

Pushed into mysql-trunk 5.6.2 (revid:serge.kozlov@oracle.com-20110113213621-4qccnwyy72pxtnhh) (version source revid:serge.kozlov@oracle.com-20110113213621-4qccnwyy72pxtnhh) (merge vers: 5.6.2) (pib:24)

Pushed into mysql-5.5 5.5.9 (revid:serge.kozlov@oracle.com-20110113211817-xzrirqyu3adanfvv) (version source revid:serge.kozlov@oracle.com-20110113211817-xzrirqyu3adanfvv) (merge vers: 5.5.9) (pib:24)

Test changes only. Closed.

Setting back to Closed.