Bug #54820 | rpl.rpl_heartbeat_basic fails sporadically again | ||
---|---|---|---|
Submitted: | 26 Jun 2010 7:28 | Modified: | 28 Jan 2011 12:21 |
Reporter: | Alexander Nozdrin | Email Updates: | |
Status: | Closed | Impact on me: | |
Category: | Tests: Replication | Severity: | S3 (Non-critical) |
Version: | M3 (Celosia), M4 (Dahlia) | OS: | Any |
Assigned to: | Serge Kozlov | CPU Architecture: | Any |
Tags: | experimental, pb2, sporadic, test failure |
[26 Jun 2010 7:28]
Alexander Nozdrin
[26 Jun 2010 10:53]
Sveta Smirnova
Thank you for the report. Verfied as described using command `./mtr rpl.rpl_heartbeat_basic --repeat=100` It was 80s run which failed
[20 Jul 2010 10:55]
Serge Kozlov
Almost all cases appeared under Valgrind environment when server works slowly. By default slave does 10 attempts to connect to master every second. It means that only 10 second slave tries to establish connection to master. Seems for Valgind it is not enough . The patch should increase timeout between attempts to connect to master and especially for Valgrind environment.
[20 Jul 2010 11:00]
Bugs System
A patch for this bug has been committed. After review, it may be pushed to the relevant source trees for release in the next version. You can access the patch from: http://lists.mysql.com/commits/113941 3128 Serge Kozlov 2010-07-20 Bug#54820. Set separate values of MASTER_CONNECT_RETRY for usual execution and for Valgrind environment
[20 Jul 2010 15:28]
Serge Kozlov
Hi, Alfranio. The patch just removes some lacks of test code. Your sample of failure is an error in server(replication) code.
[26 Nov 2010 14:29]
Luis Soares
In addition, I also see (checked in recent valgrind runs): 101123 22:51:52 [Note] Error reading relay log event: slave SQL thread was killed 101123 22:51:53 [Note] Slave I/O thread: Start semi-sync replication to master 'root@127.0.0.1:13000' in log 'master-bin.000001' at position 1527 101123 22:51:53 [Note] Slave SQL thread initialized, starting replication in log 'master-bin.000001' at position 1527, relay log './slave-relay-bin.000003' position: 1674 101123 22:51:53 [Note] Slave I/O thread: connected to master 'root@127.0.0.1:13000',replication started in log 'master-bin.000001' at position 1527 ==2361== Warning: invalid file descriptor -1 in syscall read() 101123 22:51:57 [ERROR] Error reading packet from server: Lost connection to MySQL server during query ( server_errno=2013) However, this seems to have the exact same explanation (transient network problems) that was given for BUG#50414. As such, I believe that currently this warning is actually suppressed. Judging from Serge's comment, it seems that this might be just a problem with the test case.
[26 Nov 2010 18:02]
Luis Soares
There are also another kind of failure exhibited by this test, but it's not specific to this test only. See: BUG#58525 .
[29 Nov 2010 10:55]
Luis Soares
I was asked to have a look at this. While investigating this issue, I have found three different problems: P1. Warning: invalid file descriptor -1 in syscall read() P2. Failure B: mysqltest: In included file "./include/wait_for_status_var.inc": At line 68: P3. Failure A: mysqltest: In included file "./include/wait_for_slave_param.inc": At line 83: P1, seems to be the same issue reported in BUG#50414, in particular the case about transient network errors. According to the explanation for BUG#50414 fix, this is invalid file descriptor (-1) symptom is expected on master/slave stop/restart, wich seems to be the same case for this test. So, apparently, this must have been fixed by BUG#50414. P2, I have found this in PB2 failure logs. This seems to have been caused by a bad implementation of include/wait_for_status_var.inc . The problem was that the instructions in this include file, were actually comparing numbers as if they were strings, thence, in some cases, making the test case to fail. This is BUG#58525. P3, this seems to be the actual bug reported. Apparently, running this test in slow hosts requires more than 10 sec reconnection period for it not to timeout. If the host is slow enough, the test case fails (which happens sporadically on PB2). So, just to conclude, this must be BUG#54820 (this one).
[7 Dec 2010 21:12]
Bugs System
A patch for this bug has been committed. After review, it may be pushed to the relevant source trees for release in the next version. You can access the patch from: http://lists.mysql.com/commits/126268 3210 Serge Kozlov 2010-12-08 Bug#54820: master_connect_retty increased for Valgrind environment Bug#58525: separate string and numbers for comparing
[29 Dec 2010 20:00]
Bugs System
A patch for this bug has been committed. After review, it may be pushed to the relevant source trees for release in the next version. You can access the patch from: http://lists.mysql.com/commits/127690 3449 Serge Kozlov 2010-12-29 Bug#54820: master_connect_retry increased for Valgrind environment support Bug#58525: separate string and numbers for comparing
[13 Jan 2011 21:49]
Bugs System
A patch for this bug has been committed. After review, it may be pushed to the relevant source trees for release in the next version. You can access the patch from: http://lists.mysql.com/commits/128691 3241 Serge Kozlov 2011-01-14 Bug#54820: master_connect_retry increased for Valgrind environment support Bug#58525: separate string and numbers for comparing
[13 Jan 2011 21:52]
Bugs System
Pushed into mysql-trunk 5.6.2 (revid:serge.kozlov@oracle.com-20110113213621-4qccnwyy72pxtnhh) (version source revid:serge.kozlov@oracle.com-20110113213621-4qccnwyy72pxtnhh) (merge vers: 5.6.2) (pib:24)
[13 Jan 2011 21:54]
Bugs System
Pushed into mysql-5.5 5.5.9 (revid:serge.kozlov@oracle.com-20110113211817-xzrirqyu3adanfvv) (version source revid:serge.kozlov@oracle.com-20110113211817-xzrirqyu3adanfvv) (merge vers: 5.5.9) (pib:24)
[24 Jan 2011 15:54]
Jon Stephens
Test changes only. Closed.
[28 Jan 2011 12:21]
Jon Stephens
Setting back to Closed.