| Bug #58546 | test rpl_packet timeout failure sporadically on PB | ||
|---|---|---|---|
| Submitted: | 29 Nov 2010 4:36 | Modified: | 28 Jan 2011 12:21 |
| Reporter: | Zhenxing He | Email Updates: | |
| Status: | Closed | Impact on me: | |
| Category: | MySQL Server: Replication | Severity: | S3 (Non-critical) |
| Version: | 5.1 | OS: | Any |
| Assigned to: | Libing Song | CPU Architecture: | Any |
| Tags: | sporadic, timeout | ||
[1 Dec 2010 3:11]
Zhenxing He
Probably related to BUG#43460
[17 Dec 2010 10:28]
Bugs System
A patch for this bug has been committed. After review, it may be pushed to the relevant source trees for release in the next version. You can access the patch from: http://lists.mysql.com/commits/127165 3525 Li-Bing.Song@sun.com 2010-12-17 Bug#58546 test rpl_packet timeout failure sporadically on PB rpl_packet got a timeout failure sporadically on PB when stopping slave. The real reason of this bug is that STOP SLAVE stopped IO thread first and then stopped SQL thread. It was possible that IO thread stopped after replicating part of a transaction which SQL thread was executing. SQL thread would be hung if the transaction could not be rolled back safely. After this patch, SQL thread is always stopped first. It guarantees IO thread stops just after the transaction which cannot be rolled back safely and is being replied by SQL thread has been replicated completely.
[22 Dec 2010 15:55]
Bugs System
A patch for this bug has been committed. After review, it may be pushed to the relevant source trees for release in the next version. You can access the patch from: http://lists.mysql.com/commits/127515 3534 Li-Bing.Song@sun.com 2010-12-22 Bug#58546 test rpl_packet timeout failure sporadically on PB rpl_packet got a timeout failure sporadically on PB when stopping slave. The real reason of this bug is that STOP SLAVE stopped IO thread first and then stopped SQL thread. It was possible that IO thread stopped after replicating part of a transaction which SQL thread was executing. SQL thread would be hung if the transaction could not be rolled back safely. After this patch, SQL thread is always stopped first. It guarantees IO thread stops just after the transaction which cannot be rolled back safely and is being replied by SQL thread has been replicated completely. and added below auxiliary files to make the test code neater. restart_slave_sql.inc rpl_connection_master.inc rpl_connection_slave.inc rpl_connection_slave1.inc
[24 Dec 2010 3:41]
Bugs System
A patch for this bug has been committed. After review, it may be pushed to the relevant source trees for release in the next version. You can access the patch from: http://lists.mysql.com/commits/127574 3534 Li-Bing.Song@sun.com 2010-12-24 Bug#58546 test rpl_packet timeout failure sporadically on PB rpl_packet got a timeout failure sporadically on PB when stopping slave. The real reason of this bug is that STOP SLAVE stopped IO thread first and then stopped SQL thread. It was possible that IO thread stopped after replicating part of a transaction which SQL thread was executing. SQL thread would be hung if the transaction could not be rolled back safely. After this patch, SQL thread is always stopped first. It guarantees IO thread stops just after the transaction which cannot be rolled back safely and is being replied by SQL thread has been replicated completely. and added below auxiliary files to make the test code neater. restart_slave_sql.inc rpl_connection_master.inc rpl_connection_slave.inc rpl_connection_slave1.inc
[24 Dec 2010 4:26]
Bugs System
A patch for this bug has been committed. After review, it may be pushed to the relevant source trees for release in the next version. You can access the patch from: http://lists.mysql.com/commits/127577 3534 Li-Bing.Song@sun.com 2010-12-24 Bug#58546 test rpl_packet timeout failure sporadically on PB rpl_packet got a timeout failure sporadically on PB when stopping slave. The real reason of this bug is that STOP SLAVE stopped IO thread first and then stopped SQL thread. It was possible that IO thread stopped after replicating part of a transaction which SQL thread was executing. SQL thread would be hung if the transaction could not be rolled back safely. After this patch, STOP SLAVE will stop SQL thread first and then stop IO thread, which guarantees that IO thread will fetch the reset of the events of the transaction that SQL thread is executing, so that SQL thread can finish the transaction if it cannot be rolled back safely. Added below auxiliary files to make the test code neater. restart_slave_sql.inc rpl_connection_master.inc rpl_connection_slave.inc rpl_connection_slave1.inc
[17 Jan 2011 7:46]
Bugs System
A patch for this bug has been committed. After review, it may be pushed to the relevant source trees for release in the next version. You can access the patch from: http://lists.mysql.com/commits/128890 3555 anders.song@greatopensource.com 2011-01-17 Bug#58546 test rpl_packet timeout failure sporadically on PB rpl_packet got a timeout failure sporadically on PB when stopping slave. The real reason of this bug is that STOP SLAVE stopped IO thread first and then stopped SQL thread. It was possible that IO thread stopped after replicating part of a transaction which SQL thread was executing. SQL thread would be hung if the transaction could not be rolled back safely. After this patch, STOP SLAVE will stop SQL thread first and then stop IO thread, which guarantees that IO thread will fetch the reset of the events of the transaction that SQL thread is executing, so that SQL thread can finish the transaction if it cannot be rolled back safely. Added below auxiliary files to make the test code neater. restart_slave_sql.inc rpl_connection_master.inc rpl_connection_slave.inc rpl_connection_slave1.inc
[18 Jan 2011 12:34]
Bugs System
A patch for this bug has been committed. After review, it may be pushed to the relevant source trees for release in the next version. You can access the patch from: http://lists.mysql.com/commits/129080 3559 Li-Bing.Song@sun.com 2011-01-18 Bug#58546 test rpl_packet timeout failure sporadically on PB rpl_packet got a timeout failure sporadically on PB when stopping slave. The real reason of this bug is that STOP SLAVE stopped IO thread first and then stopped SQL thread. It was possible that IO thread stopped after replicating part of a transaction which SQL thread was executing. SQL thread would be hung if the transaction could not be rolled back safely. After this patch, STOP SLAVE will stop SQL thread first and then stop IO thread, which guarantees that IO thread will fetch the reset of the events of the transaction that SQL thread is executing, so that SQL thread can finish the transaction if it cannot be rolled back safely. Added below auxiliary files to make the test code neater. restart_slave_sql.inc rpl_connection_master.inc rpl_connection_slave.inc rpl_connection_slave1.inc
[18 Jan 2011 12:37]
Bugs System
A patch for this bug has been committed. After review, it may be pushed to the relevant source trees for release in the next version. You can access the patch from: http://lists.mysql.com/commits/129084 3560 anders.song@greatopensource.com 2011-01-18 Bug#58546 test rpl_packet timeout failure sporadically on PB rpl_packet got a timeout failure sporadically on PB when stopping slave. The real reason of this bug is that STOP SLAVE stopped IO thread first and then stopped SQL thread. It was possible that IO thread stopped after replicating part of a transaction which SQL thread was executing. SQL thread would be hung if the transaction could not be rolled back safely. After this patch, STOP SLAVE will stop SQL thread first and then stop IO thread, which guarantees that IO thread will fetch the reset of the events of the transaction that SQL thread is executing, so that SQL thread can finish the transaction if it cannot be rolled back safely. Added below auxiliary files to make the test code neater. restart_slave_sql.inc rpl_connection_master.inc rpl_connection_slave.inc rpl_connection_slave1.inc
[18 Jan 2011 12:44]
Bugs System
A patch for this bug has been committed. After review, it may be pushed to the relevant source trees for release in the next version. You can access the patch from: http://lists.mysql.com/commits/129085 3559 Anders.Song@greatopensource.com 2011-01-18 Bug#58546 test rpl_packet timeout failure sporadically on PB rpl_packet got a timeout failure sporadically on PB when stopping slave. The real reason of this bug is that STOP SLAVE stopped IO thread first and then stopped SQL thread. It was possible that IO thread stopped after replicating part of a transaction which SQL thread was executing. SQL thread would be hung if the transaction could not be rolled back safely. After this patch, STOP SLAVE will stop SQL thread first and then stop IO thread, which guarantees that IO thread will fetch the reset of the events of the transaction that SQL thread is executing, so that SQL thread can finish the transaction if it cannot be rolled back safely. Added below auxiliary files to make the test code neater. restart_slave_sql.inc rpl_connection_master.inc rpl_connection_slave.inc rpl_connection_slave1.inc
[18 Jan 2011 12:46]
Bugs System
A patch for this bug has been committed. After review, it may be pushed to the relevant source trees for release in the next version. You can access the patch from: http://lists.mysql.com/commits/129086 3560 anders.song@greatopensource.com 2011-01-18 Bug#58546 test rpl_packet timeout failure sporadically on PB rpl_packet got a timeout failure sporadically on PB when stopping slave. The real reason of this bug is that STOP SLAVE stopped IO thread first and then stopped SQL thread. It was possible that IO thread stopped after replicating part of a transaction which SQL thread was executing. SQL thread would be hung if the transaction could not be rolled back safely. After this patch, STOP SLAVE will stop SQL thread first and then stop IO thread, which guarantees that IO thread will fetch the reset of the events of the transaction that SQL thread is executing, so that SQL thread can finish the transaction if it cannot be rolled back safely. Added below auxiliary files to make the test code neater. restart_slave_sql.inc rpl_connection_master.inc rpl_connection_slave.inc rpl_connection_slave1.inc
[18 Jan 2011 12:50]
Bugs System
A patch for this bug has been committed. After review, it may be pushed to the relevant source trees for release in the next version. You can access the patch from: http://lists.mysql.com/commits/129089 3560 anders.song@greatopensource.com 2011-01-18 Bug#58546 test rpl_packet timeout failure sporadically on PB rpl_packet got a timeout failure sporadically on PB when stopping slave. The real reason of this bug is that STOP SLAVE stopped IO thread first and then stopped SQL thread. It was possible that IO thread stopped after replicating part of a transaction which SQL thread was executing. SQL thread would be hung if the transaction could not be rolled back safely. After this patch, STOP SLAVE will stop SQL thread first and then stop IO thread, which guarantees that IO thread will fetch the reset of the events of the transaction that SQL thread is executing, so that SQL thread can finish the transaction if it cannot be rolled back safely. Added below auxiliary files to make the test code neater. restart_slave_sql.inc rpl_connection_master.inc rpl_connection_slave.inc rpl_connection_slave1.inc
[19 Jan 2011 1:23]
Bugs System
A patch for this bug has been committed. After review, it may be pushed to the relevant source trees for release in the next version. You can access the patch from: http://lists.mysql.com/commits/129116 3560 Libing Song 2011-01-19 Bug#58546 test rpl_packet timeout failure sporadically on PB rpl_packet got a timeout failure sporadically on PB when stopping slave. The real reason of this bug is that STOP SLAVE stopped IO thread first and then stopped SQL thread. It was possible that IO thread stopped after replicating part of a transaction which SQL thread was executing. SQL thread would be hung if the transaction could not be rolled back safely. After this patch, STOP SLAVE will stop SQL thread first and then stop IO thread, which guarantees that IO thread will fetch the reset of the events of the transaction that SQL thread is executing, so that SQL thread can finish the transaction if it cannot be rolled back safely. Added below auxiliary files to make the test code neater. restart_slave_sql.inc rpl_connection_master.inc rpl_connection_slave.inc rpl_connection_slave1.inc
[19 Jan 2011 1:54]
Bugs System
Pushed into mysql-trunk 5.6.2 (revid:anders.song@greatopensource.com-20110118175419-0b0m3j5iv5gsfqpn) (version source revid:anders.song@greatopensource.com-20110118175419-0b0m3j5iv5gsfqpn) (merge vers: 5.6.2) (pib:24)
[19 Jan 2011 2:00]
Bugs System
Pushed into mysql-5.1 5.1.56 (revid:anders.song@greatopensource.com-20110118172349-vlqfdghj773e6m0d) (version source revid:anders.song@greatopensource.com-20110118172349-vlqfdghj773e6m0d) (merge vers: 5.1.56) (pib:24)
[19 Jan 2011 2:00]
Bugs System
Pushed into mysql-5.5 5.5.10 (revid:anders.song@greatopensource.com-20110118175758-unu85pyyjklxl6uu) (version source revid:anders.song@greatopensource.com-20110118173810-v1omp2rf8ryjq1ku) (merge vers: 5.5.10) (pib:24)
[24 Jan 2011 3:49]
Bugs System
A patch for this bug has been committed. After review, it may be pushed to the relevant source trees for release in the next version. You can access the patch from: http://lists.mysql.com/commits/129385 3564 anders.song@greatopensource.com 2011-01-24 Postfix BUG#58546 Updated Copyright.
[24 Jan 2011 3:59]
Bugs System
Pushed into mysql-trunk 5.6.2 (revid:anders.song@greatopensource.com-20110124035822-to3cj6hrdb5sa9zm) (version source revid:anders.song@greatopensource.com-20110124035822-to3cj6hrdb5sa9zm) (merge vers: 5.6.2) (pib:24)
[24 Jan 2011 4:00]
Bugs System
Pushed into mysql-5.1 5.1.56 (revid:anders.song@greatopensource.com-20110124034854-2i1k6vlf6gs0qdpy) (version source revid:anders.song@greatopensource.com-20110124034854-2i1k6vlf6gs0qdpy) (merge vers: 5.1.56) (pib:24)
[24 Jan 2011 4:00]
Bugs System
Pushed into mysql-5.5 5.5.10 (revid:anders.song@greatopensource.com-20110124035229-y4og7g6v9dtout9x) (version source revid:anders.song@greatopensource.com-20110124035229-y4og7g6v9dtout9x) (merge vers: 5.5.10) (pib:24)
[27 Jan 2011 20:31]
Jon Stephens
Documented as follows in the 5.1.56, 5.5.10, and 5.6.2 changelogs:
Formerly STOP SLAVE stopped the slave I/O thread first and then
the slave SQL thread; thus, it was possible for the I/O thread
stop after replicating only part of a transaction which SQL
thread was executing, in wich case--if the transaction
could not be rolled back safely--the SQL thread could hang.
Now STOP SLAVE stops the slave SQL thread first and then the
I/O thread; this guarantees that the I/O thread can fetch any
remaining events in the transaction that the SQL thread is
executing, so that the SQL thread can finish the transaction if
it cannot be rolled back safely.
Closed.
[28 Jan 2011 12:21]
Jon Stephens
Already coumented as noted above; returning to Closed state.
[13 Mar 2012 22:21]
Justin Tolmer
The patch submitted does not actually fix the bug. And the documentation update is incorrect. Specifically,
this guarantees that the I/O thread can fetch any
remaining events in the transaction that the SQL thread is
executing
is not accurate. Stopping the I/O thread second makes it more likely that the entire transaction will be fetched but it does not guarantee it. handle_slave_io makes no attempt to ensure that it fetches complete transactions. It will gladly stop at an event in the middle of a transaction if it is killed. Thus, there is still a race and the I/O thread may terminate before fetching the entire transaction, and it sometimes does. That is why the change to rpl.rpl_stop_slave which was done in the patch for this bug is causing the test to intermittently fail, as mentioned in bug 61091.

Description: rpl_packet.test fails sporadically due to timeout on PB with the following backtrace: Program terminated with signal 6, Aborted. [New process 19147] [New process 19168] [New process 19165] [New process 19150] #0 0x00007fd221664c36 in pthread_kill () from /lib/libpthread.so.0 #0 0x00007fd221664c36 in pthread_kill () from /lib/libpthread.so.0 #1 0x00000000006300d7 in handle_segfault (sig=6) at mysqld.cc:2607 #2 <signal handler called> #3 0x00007fd220af7ce2 in select () from /lib/libc.so.6 #4 0x000000000062dfcd in handle_connections_sockets ( arg=<value optimized out>) at mysqld.cc:5098 #5 0x00000000006329dc in main (argc=<value optimized out>, argv=<value optimized out>) at mysqld.cc:4582 Thread 4 (process 19150): #0 0x00007fd221667797 in do_sigwait () from /lib/libpthread.so.0 #1 0x00007fd22166783d in sigwait () from /lib/libpthread.so.0 #2 0x000000000062fb63 in signal_hand (arg=<value optimized out>) at mysqld.cc:2801 #3 0x00007fd22165ffc7 in start_thread () from /lib/libpthread.so.0 #4 0x00007fd220afe5ad in clone () from /lib/libc.so.6 #5 0x0000000000000000 in ?? () Thread 3 (process 19165): #0 0x00007fd221663fad in pthread_cond_timedwait@@GLIBC_2.3.2 () from /lib/libpthread.so.0 #1 0x00000000007490a4 in terminate_slave_thread (thd=0x7fd21c000970, term_lock=0x1d45bb8, term_cond=0x1d45c10, slave_running=0x1d46344, skip_lock=146) at slave.cc:535 #2 0x00000000007491ac in terminate_slave_threads (mi=0x1d434a0, thread_mask=2, skip_lock=true) at slave.cc:426 #3 0x000000000075000b in stop_slave (thd=0x1d4eaa0, mi=0x1d434a0, net_report=true) at sql_repl.cc:968 #4 0x000000000063d318 in mysql_execute_command (thd=0x1d4eaa0) at sql_parse.cc:2888 #5 0x000000000063fe6b in mysql_parse (thd=0x1d4eaa0, rawbuf=0x1d57d80 "STOP SLAVE", length=10, found_semicolon=0x41d6a070) at sql_parse.cc:6068 #6 0x0000000000640f92 in dispatch_command (command=COM_QUERY, thd=0x1d4eaa0, packet=0x1d24581 "STOP SLAVE", packet_length=<value optimized out>) at sql_parse.cc:1261 #7 0x0000000000642176 in do_command (thd=0x1d4eaa0) at sql_parse.cc:889 #8 0x0000000000635f26 in handle_one_connection (arg=<value optimized out>) at sql_connect.cc:1149 #9 0x00007fd22165ffc7 in start_thread () from /lib/libpthread.so.0 #10 0x00007fd220afe5ad in clone () from /lib/libc.so.6 #11 0x0000000000000000 in ?? () Thread 2 (process 19168): #0 0x00007fd221663d29 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib/libpthread.so.0 #1 0x00000000006bbfe7 in MYSQL_BIN_LOG::wait_for_update ( this=<value optimized out>, thd=0x7fd21c000970, is_slave=143) at log.cc:4916 #2 0x000000000074b482 in handle_slave_sql (arg=<value optimized out>) at slave.cc:4279 #3 0x00007fd22165ffc7 in start_thread () from /lib/libpthread.so.0 #4 0x00007fd220afe5ad in clone () from /lib/libc.so.6 #5 0x0000000000000000 in ?? () Thread 1 (process 19147): #0 0x00007fd221664c36 in pthread_kill () from /lib/libpthread.so.0 #1 0x00000000006300d7 in handle_segfault (sig=6) at mysqld.cc:2607 #2 <signal handler called> #3 0x00007fd220af7ce2 in select () from /lib/libc.so.6 #4 0x000000000062dfcd in handle_connections_sockets ( arg=<value optimized out>) at mysqld.cc:5098 #5 0x00000000006329dc in main (argc=<value optimized out>, argv=<value optimized out>) at mysqld.cc:4582 How to repeat: Check: http://pb2.norway.sun.com/web.py?template=mysql_show_test_failure&search=yes&push_id=17805... And also: http://pb2.norway.sun.com/web.py?template=mysql_show_test_failure&test_output_id=daeea558b...