Bug #58546 | test rpl_packet timeout failure sporadically on PB | ||
---|---|---|---|
Submitted: | 29 Nov 2010 4:36 | Modified: | 28 Jan 2011 12:21 |
Reporter: | Zhenxing He | Email Updates: | |
Status: | Closed | Impact on me: | |
Category: | MySQL Server: Replication | Severity: | S3 (Non-critical) |
Version: | 5.1 | OS: | Any |
Assigned to: | Libing Song | CPU Architecture: | Any |
Tags: | sporadic, timeout |
[29 Nov 2010 4:36]
Zhenxing He
[1 Dec 2010 3:11]
Zhenxing He
Probably related to BUG#43460
[17 Dec 2010 10:28]
Bugs System
A patch for this bug has been committed. After review, it may be pushed to the relevant source trees for release in the next version. You can access the patch from: http://lists.mysql.com/commits/127165 3525 Li-Bing.Song@sun.com 2010-12-17 Bug#58546 test rpl_packet timeout failure sporadically on PB rpl_packet got a timeout failure sporadically on PB when stopping slave. The real reason of this bug is that STOP SLAVE stopped IO thread first and then stopped SQL thread. It was possible that IO thread stopped after replicating part of a transaction which SQL thread was executing. SQL thread would be hung if the transaction could not be rolled back safely. After this patch, SQL thread is always stopped first. It guarantees IO thread stops just after the transaction which cannot be rolled back safely and is being replied by SQL thread has been replicated completely.
[22 Dec 2010 15:55]
Bugs System
A patch for this bug has been committed. After review, it may be pushed to the relevant source trees for release in the next version. You can access the patch from: http://lists.mysql.com/commits/127515 3534 Li-Bing.Song@sun.com 2010-12-22 Bug#58546 test rpl_packet timeout failure sporadically on PB rpl_packet got a timeout failure sporadically on PB when stopping slave. The real reason of this bug is that STOP SLAVE stopped IO thread first and then stopped SQL thread. It was possible that IO thread stopped after replicating part of a transaction which SQL thread was executing. SQL thread would be hung if the transaction could not be rolled back safely. After this patch, SQL thread is always stopped first. It guarantees IO thread stops just after the transaction which cannot be rolled back safely and is being replied by SQL thread has been replicated completely. and added below auxiliary files to make the test code neater. restart_slave_sql.inc rpl_connection_master.inc rpl_connection_slave.inc rpl_connection_slave1.inc
[24 Dec 2010 3:41]
Bugs System
A patch for this bug has been committed. After review, it may be pushed to the relevant source trees for release in the next version. You can access the patch from: http://lists.mysql.com/commits/127574 3534 Li-Bing.Song@sun.com 2010-12-24 Bug#58546 test rpl_packet timeout failure sporadically on PB rpl_packet got a timeout failure sporadically on PB when stopping slave. The real reason of this bug is that STOP SLAVE stopped IO thread first and then stopped SQL thread. It was possible that IO thread stopped after replicating part of a transaction which SQL thread was executing. SQL thread would be hung if the transaction could not be rolled back safely. After this patch, SQL thread is always stopped first. It guarantees IO thread stops just after the transaction which cannot be rolled back safely and is being replied by SQL thread has been replicated completely. and added below auxiliary files to make the test code neater. restart_slave_sql.inc rpl_connection_master.inc rpl_connection_slave.inc rpl_connection_slave1.inc
[24 Dec 2010 4:26]
Bugs System
A patch for this bug has been committed. After review, it may be pushed to the relevant source trees for release in the next version. You can access the patch from: http://lists.mysql.com/commits/127577 3534 Li-Bing.Song@sun.com 2010-12-24 Bug#58546 test rpl_packet timeout failure sporadically on PB rpl_packet got a timeout failure sporadically on PB when stopping slave. The real reason of this bug is that STOP SLAVE stopped IO thread first and then stopped SQL thread. It was possible that IO thread stopped after replicating part of a transaction which SQL thread was executing. SQL thread would be hung if the transaction could not be rolled back safely. After this patch, STOP SLAVE will stop SQL thread first and then stop IO thread, which guarantees that IO thread will fetch the reset of the events of the transaction that SQL thread is executing, so that SQL thread can finish the transaction if it cannot be rolled back safely. Added below auxiliary files to make the test code neater. restart_slave_sql.inc rpl_connection_master.inc rpl_connection_slave.inc rpl_connection_slave1.inc
[17 Jan 2011 7:46]
Bugs System
A patch for this bug has been committed. After review, it may be pushed to the relevant source trees for release in the next version. You can access the patch from: http://lists.mysql.com/commits/128890 3555 anders.song@greatopensource.com 2011-01-17 Bug#58546 test rpl_packet timeout failure sporadically on PB rpl_packet got a timeout failure sporadically on PB when stopping slave. The real reason of this bug is that STOP SLAVE stopped IO thread first and then stopped SQL thread. It was possible that IO thread stopped after replicating part of a transaction which SQL thread was executing. SQL thread would be hung if the transaction could not be rolled back safely. After this patch, STOP SLAVE will stop SQL thread first and then stop IO thread, which guarantees that IO thread will fetch the reset of the events of the transaction that SQL thread is executing, so that SQL thread can finish the transaction if it cannot be rolled back safely. Added below auxiliary files to make the test code neater. restart_slave_sql.inc rpl_connection_master.inc rpl_connection_slave.inc rpl_connection_slave1.inc
[18 Jan 2011 12:34]
Bugs System
A patch for this bug has been committed. After review, it may be pushed to the relevant source trees for release in the next version. You can access the patch from: http://lists.mysql.com/commits/129080 3559 Li-Bing.Song@sun.com 2011-01-18 Bug#58546 test rpl_packet timeout failure sporadically on PB rpl_packet got a timeout failure sporadically on PB when stopping slave. The real reason of this bug is that STOP SLAVE stopped IO thread first and then stopped SQL thread. It was possible that IO thread stopped after replicating part of a transaction which SQL thread was executing. SQL thread would be hung if the transaction could not be rolled back safely. After this patch, STOP SLAVE will stop SQL thread first and then stop IO thread, which guarantees that IO thread will fetch the reset of the events of the transaction that SQL thread is executing, so that SQL thread can finish the transaction if it cannot be rolled back safely. Added below auxiliary files to make the test code neater. restart_slave_sql.inc rpl_connection_master.inc rpl_connection_slave.inc rpl_connection_slave1.inc
[18 Jan 2011 12:37]
Bugs System
A patch for this bug has been committed. After review, it may be pushed to the relevant source trees for release in the next version. You can access the patch from: http://lists.mysql.com/commits/129084 3560 anders.song@greatopensource.com 2011-01-18 Bug#58546 test rpl_packet timeout failure sporadically on PB rpl_packet got a timeout failure sporadically on PB when stopping slave. The real reason of this bug is that STOP SLAVE stopped IO thread first and then stopped SQL thread. It was possible that IO thread stopped after replicating part of a transaction which SQL thread was executing. SQL thread would be hung if the transaction could not be rolled back safely. After this patch, STOP SLAVE will stop SQL thread first and then stop IO thread, which guarantees that IO thread will fetch the reset of the events of the transaction that SQL thread is executing, so that SQL thread can finish the transaction if it cannot be rolled back safely. Added below auxiliary files to make the test code neater. restart_slave_sql.inc rpl_connection_master.inc rpl_connection_slave.inc rpl_connection_slave1.inc
[18 Jan 2011 12:44]
Bugs System
A patch for this bug has been committed. After review, it may be pushed to the relevant source trees for release in the next version. You can access the patch from: http://lists.mysql.com/commits/129085 3559 Anders.Song@greatopensource.com 2011-01-18 Bug#58546 test rpl_packet timeout failure sporadically on PB rpl_packet got a timeout failure sporadically on PB when stopping slave. The real reason of this bug is that STOP SLAVE stopped IO thread first and then stopped SQL thread. It was possible that IO thread stopped after replicating part of a transaction which SQL thread was executing. SQL thread would be hung if the transaction could not be rolled back safely. After this patch, STOP SLAVE will stop SQL thread first and then stop IO thread, which guarantees that IO thread will fetch the reset of the events of the transaction that SQL thread is executing, so that SQL thread can finish the transaction if it cannot be rolled back safely. Added below auxiliary files to make the test code neater. restart_slave_sql.inc rpl_connection_master.inc rpl_connection_slave.inc rpl_connection_slave1.inc
[18 Jan 2011 12:46]
Bugs System
A patch for this bug has been committed. After review, it may be pushed to the relevant source trees for release in the next version. You can access the patch from: http://lists.mysql.com/commits/129086 3560 anders.song@greatopensource.com 2011-01-18 Bug#58546 test rpl_packet timeout failure sporadically on PB rpl_packet got a timeout failure sporadically on PB when stopping slave. The real reason of this bug is that STOP SLAVE stopped IO thread first and then stopped SQL thread. It was possible that IO thread stopped after replicating part of a transaction which SQL thread was executing. SQL thread would be hung if the transaction could not be rolled back safely. After this patch, STOP SLAVE will stop SQL thread first and then stop IO thread, which guarantees that IO thread will fetch the reset of the events of the transaction that SQL thread is executing, so that SQL thread can finish the transaction if it cannot be rolled back safely. Added below auxiliary files to make the test code neater. restart_slave_sql.inc rpl_connection_master.inc rpl_connection_slave.inc rpl_connection_slave1.inc
[18 Jan 2011 12:50]
Bugs System
A patch for this bug has been committed. After review, it may be pushed to the relevant source trees for release in the next version. You can access the patch from: http://lists.mysql.com/commits/129089 3560 anders.song@greatopensource.com 2011-01-18 Bug#58546 test rpl_packet timeout failure sporadically on PB rpl_packet got a timeout failure sporadically on PB when stopping slave. The real reason of this bug is that STOP SLAVE stopped IO thread first and then stopped SQL thread. It was possible that IO thread stopped after replicating part of a transaction which SQL thread was executing. SQL thread would be hung if the transaction could not be rolled back safely. After this patch, STOP SLAVE will stop SQL thread first and then stop IO thread, which guarantees that IO thread will fetch the reset of the events of the transaction that SQL thread is executing, so that SQL thread can finish the transaction if it cannot be rolled back safely. Added below auxiliary files to make the test code neater. restart_slave_sql.inc rpl_connection_master.inc rpl_connection_slave.inc rpl_connection_slave1.inc
[19 Jan 2011 1:23]
Bugs System
A patch for this bug has been committed. After review, it may be pushed to the relevant source trees for release in the next version. You can access the patch from: http://lists.mysql.com/commits/129116 3560 Libing Song 2011-01-19 Bug#58546 test rpl_packet timeout failure sporadically on PB rpl_packet got a timeout failure sporadically on PB when stopping slave. The real reason of this bug is that STOP SLAVE stopped IO thread first and then stopped SQL thread. It was possible that IO thread stopped after replicating part of a transaction which SQL thread was executing. SQL thread would be hung if the transaction could not be rolled back safely. After this patch, STOP SLAVE will stop SQL thread first and then stop IO thread, which guarantees that IO thread will fetch the reset of the events of the transaction that SQL thread is executing, so that SQL thread can finish the transaction if it cannot be rolled back safely. Added below auxiliary files to make the test code neater. restart_slave_sql.inc rpl_connection_master.inc rpl_connection_slave.inc rpl_connection_slave1.inc
[19 Jan 2011 1:54]
Bugs System
Pushed into mysql-trunk 5.6.2 (revid:anders.song@greatopensource.com-20110118175419-0b0m3j5iv5gsfqpn) (version source revid:anders.song@greatopensource.com-20110118175419-0b0m3j5iv5gsfqpn) (merge vers: 5.6.2) (pib:24)
[19 Jan 2011 2:00]
Bugs System
Pushed into mysql-5.1 5.1.56 (revid:anders.song@greatopensource.com-20110118172349-vlqfdghj773e6m0d) (version source revid:anders.song@greatopensource.com-20110118172349-vlqfdghj773e6m0d) (merge vers: 5.1.56) (pib:24)
[19 Jan 2011 2:00]
Bugs System
Pushed into mysql-5.5 5.5.10 (revid:anders.song@greatopensource.com-20110118175758-unu85pyyjklxl6uu) (version source revid:anders.song@greatopensource.com-20110118173810-v1omp2rf8ryjq1ku) (merge vers: 5.5.10) (pib:24)
[24 Jan 2011 3:49]
Bugs System
A patch for this bug has been committed. After review, it may be pushed to the relevant source trees for release in the next version. You can access the patch from: http://lists.mysql.com/commits/129385 3564 anders.song@greatopensource.com 2011-01-24 Postfix BUG#58546 Updated Copyright.
[24 Jan 2011 3:59]
Bugs System
Pushed into mysql-trunk 5.6.2 (revid:anders.song@greatopensource.com-20110124035822-to3cj6hrdb5sa9zm) (version source revid:anders.song@greatopensource.com-20110124035822-to3cj6hrdb5sa9zm) (merge vers: 5.6.2) (pib:24)
[24 Jan 2011 4:00]
Bugs System
Pushed into mysql-5.1 5.1.56 (revid:anders.song@greatopensource.com-20110124034854-2i1k6vlf6gs0qdpy) (version source revid:anders.song@greatopensource.com-20110124034854-2i1k6vlf6gs0qdpy) (merge vers: 5.1.56) (pib:24)
[24 Jan 2011 4:00]
Bugs System
Pushed into mysql-5.5 5.5.10 (revid:anders.song@greatopensource.com-20110124035229-y4og7g6v9dtout9x) (version source revid:anders.song@greatopensource.com-20110124035229-y4og7g6v9dtout9x) (merge vers: 5.5.10) (pib:24)
[27 Jan 2011 20:31]
Jon Stephens
Documented as follows in the 5.1.56, 5.5.10, and 5.6.2 changelogs: Formerly STOP SLAVE stopped the slave I/O thread first and then the slave SQL thread; thus, it was possible for the I/O thread stop after replicating only part of a transaction which SQL thread was executing, in wich case--if the transaction could not be rolled back safely--the SQL thread could hang. Now STOP SLAVE stops the slave SQL thread first and then the I/O thread; this guarantees that the I/O thread can fetch any remaining events in the transaction that the SQL thread is executing, so that the SQL thread can finish the transaction if it cannot be rolled back safely. Closed.
[28 Jan 2011 12:21]
Jon Stephens
Already coumented as noted above; returning to Closed state.
[13 Mar 2012 22:21]
Justin Tolmer
The patch submitted does not actually fix the bug. And the documentation update is incorrect. Specifically, this guarantees that the I/O thread can fetch any remaining events in the transaction that the SQL thread is executing is not accurate. Stopping the I/O thread second makes it more likely that the entire transaction will be fetched but it does not guarantee it. handle_slave_io makes no attempt to ensure that it fetches complete transactions. It will gladly stop at an event in the middle of a transaction if it is killed. Thus, there is still a race and the I/O thread may terminate before fetching the entire transaction, and it sometimes does. That is why the change to rpl.rpl_stop_slave which was done in the patch for this bug is causing the test to intermittently fail, as mentioned in bug 61091.