Bug #54820 rpl.rpl_heartbeat_basic fails sporadically again
Submitted: 26 Jun 2010 7:28 Modified: 28 Jan 2011 12:21
Reporter: Alexander Nozdrin Email Updates:
Status: Closed Impact on me:
None 
Category:Tests: Replication Severity:S3 (Non-critical)
Version:M3 (Celosia), M4 (Dahlia) OS:Any
Assigned to: Serge Kozlov CPU Architecture:Any
Tags: experimental, pb2, sporadic, test failure

[26 Jun 2010 7:28] Alexander Nozdrin
Description:
rpl.rpl_heartbeat_basic started to fail sporadically again.

Symptoms:
---------------------------------------------------------------------
mysqltest: In included file "./include/wait_for_slave_param.inc": At line 83: 

The result from queries just before the failure was:
< snip >
SELECT NOW();
NOW()
2010-06-24 21:05:41
**** SHOW MASTER STATUS on master ****
SHOW MASTER STATUS;
File	master-bin.000002
Position	107
Binlog_Do_DB	
Binlog_Ignore_DB	

**** SHOW PROCESSLIST on master ****
SHOW PROCESSLIST;
Id	User	Host	db	Command	Time	State	Info
1	root	localhost:37729	test	Query	0	NULL	SHOW PROCESSLIST

**** SHOW BINLOG EVENTS on master ****
SHOW BINLOG EVENTS IN 'master-bin.000001';
Log_name	Pos	Event_type	Server_id	End_log_pos	Info
master-bin.000001	4	Format_desc	1	107	Server ver: 5.6.99-m4-debug-log, Binlog ver: 4
master-bin.000001	107	Stop	1	126	
---------------------------------------------------------------------

The test case has been marked experimental due to this bug.

How to repeat:
Xref: http://tinyurl.com/2w7qdhb
[26 Jun 2010 10:53] Sveta Smirnova
Thank you for the report.

Verfied as described using command `./mtr rpl.rpl_heartbeat_basic  --repeat=100` It was 80s run which failed
[20 Jul 2010 10:55] Serge Kozlov
Almost all cases appeared under Valgrind environment when server works slowly. By default slave does 10 attempts to connect to master every second. It means that only 10 second slave tries to establish connection to master. Seems for Valgind it is not enough . The patch should increase timeout between attempts to connect to master and especially for Valgrind environment.
[20 Jul 2010 11:00] Bugs System
A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/113941

3128 Serge Kozlov	2010-07-20
      Bug#54820. Set separate values of MASTER_CONNECT_RETRY for usual execution and for Valgrind environment
[20 Jul 2010 15:28] Serge Kozlov
Hi, Alfranio. The patch just removes some lacks of test code. Your sample of failure is an error in server(replication) code.
[26 Nov 2010 14:29] Luis Soares
In addition, I also see (checked in recent valgrind runs):

101123 22:51:52 [Note] Error reading relay log event: slave SQL thread was killed
101123 22:51:53 [Note] Slave I/O thread: Start semi-sync replication to master 'root@127.0.0.1:13000' in log 'master-bin.000001' at position 1527
101123 22:51:53 [Note] Slave SQL thread initialized, starting replication in log 'master-bin.000001' at position 1527, relay log './slave-relay-bin.000003' position: 1674
101123 22:51:53 [Note] Slave I/O thread: connected to master 'root@127.0.0.1:13000',replication started in log 'master-bin.000001' at position 1527
==2361== Warning: invalid file descriptor -1 in syscall read()
101123 22:51:57 [ERROR] Error reading packet from server: Lost connection to MySQL server during query ( server_errno=2013)

However, this seems to have the exact same explanation (transient network
problems) that was given for BUG#50414. As such, I believe that currently
this warning is actually suppressed.

Judging from Serge's comment, it seems that this might be just a 
problem with the test case.
[26 Nov 2010 18:02] Luis Soares
There are also another kind of failure exhibited by this test, but
it's not specific to this test only. See: BUG#58525 .
[29 Nov 2010 10:55] Luis Soares
I was asked to have a look at this. While investigating this
issue, I have found three different problems:

  P1. Warning: invalid file descriptor -1 in syscall read()

  P2. Failure B:
      mysqltest: In included file "./include/wait_for_status_var.inc": At line 68: 
  
  P3. Failure A:
      mysqltest: In included file "./include/wait_for_slave_param.inc": At line 83:

P1, seems to be the same issue reported in BUG#50414, in
particular the case about transient network errors. According to
the explanation for BUG#50414 fix, this is invalid file
descriptor (-1) symptom is expected on master/slave stop/restart,
wich seems to be the same case for this test. So, apparently,
this must have been fixed by BUG#50414.

P2, I have found this in PB2 failure logs. This seems to have
been caused by a bad implementation of
include/wait_for_status_var.inc . The problem was that the
instructions in this include file, were actually comparing
numbers as if they were strings, thence, in some cases, making
the test case to fail. This is BUG#58525.

P3, this seems to be the actual bug reported. Apparently, running
this test in slow hosts requires more than 10 sec reconnection
period for it not to timeout. If the host is slow enough, the
test case fails (which happens sporadically on PB2). So, just to
conclude, this must be BUG#54820 (this one).
[7 Dec 2010 21:12] Bugs System
A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/126268

3210 Serge Kozlov	2010-12-08
      Bug#54820: 
      master_connect_retty increased for Valgrind environment
      Bug#58525:
      separate string and numbers for comparing
[29 Dec 2010 20:00] Bugs System
A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/127690

3449 Serge Kozlov	2010-12-29
      Bug#54820: master_connect_retry increased for Valgrind environment support
      Bug#58525: separate string and numbers for comparing
[13 Jan 2011 21:49] Bugs System
A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/128691

3241 Serge Kozlov	2011-01-14
      Bug#54820: master_connect_retry increased for Valgrind environment support
      Bug#58525: separate string and numbers for comparing
[13 Jan 2011 21:52] Bugs System
Pushed into mysql-trunk 5.6.2 (revid:serge.kozlov@oracle.com-20110113213621-4qccnwyy72pxtnhh) (version source revid:serge.kozlov@oracle.com-20110113213621-4qccnwyy72pxtnhh) (merge vers: 5.6.2) (pib:24)
[13 Jan 2011 21:54] Bugs System
Pushed into mysql-5.5 5.5.9 (revid:serge.kozlov@oracle.com-20110113211817-xzrirqyu3adanfvv) (version source revid:serge.kozlov@oracle.com-20110113211817-xzrirqyu3adanfvv) (merge vers: 5.5.9) (pib:24)
[24 Jan 2011 15:54] Jon Stephens
Test changes only. Closed.
[28 Jan 2011 12:21] Jon Stephens
Setting back to Closed.