Bug #22725 | Replication outages from ER_SERVER_SHUTDOWN (1053) set in replication events | ||
---|---|---|---|
Submitted: | 27 Sep 2006 6:57 | Modified: | 6 Jun 2007 15:11 |
Reporter: | James Day | Email Updates: | |
Status: | Closed | Impact on me: | |
Category: | MySQL Server: Replication | Severity: | S1 (Critical) |
Version: | 4.0,5.0 | OS: | Linux (linux) |
Assigned to: | Andrei Elkin | CPU Architecture: | Any |
Tags: | bfsm_2006_12_21 |
[27 Sep 2006 6:57]
James Day
[6 Dec 2006 10:27]
Oli Sennhauser
We had similar problems on 5.0.26. Can anybody let me know whether this bug is still in the 5.0 tree? Error 1053 was propagated into the binary log, and the binary log seems somehow corrupted... Then the slave stopped. But unfortunately we had the crash 1 h AFTER.
[14 Dec 2006 16:29]
Jonathan Miller
I will talk to Omer about adding a system level test for this bug. /jeb
[31 Jan 2007 15:12]
Guilhem Bichot
suggested fix:

    ===== sql/log_event.cc 1.262 vs edited =====
    *** /tmp/bk_log_event.cc-1.262_ODJ3N1 2007-01-31 16:02:11 +01:00
    --- edited/sql/log_event.cc 2007-01-31 16:02:00 +01:00
    ***************
    *** 1456,1463 ****
      data_buf(0), query(query_arg), catalog(thd_arg->catalog),
      db(thd_arg->db), q_len((uint32) query_length),
      error_code((thd_arg->killed != THD::NOT_KILLED) ?
    ! ((thd_arg->system_thread & SYSTEM_THREAD_DELAYED_INSERT) ?
    ! 0 : thd->killed_errno()) : thd_arg->net.last_errno),
      thread_id(thd_arg->thread_id),
      /* save the original thread id; we already know the server id */
      slave_proxy_id(thd_arg->variables.pseudo_thread_id),
    --- 1456,1464 ----
      data_buf(0), query(query_arg), catalog(thd_arg->catalog),
      db(thd_arg->db), q_len((uint32) query_length),
      error_code((thd_arg->killed != THD::NOT_KILLED) ?
    ! (((thd_arg->system_thread & SYSTEM_THREAD_DELAYED_INSERT) ||
    ! using_trans) ? 0 : thd->killed_errno()) :
    ! thd_arg->net.last_errno),
      thread_id(thd_arg->thread_id),
      /* save the original thread id; we already know the server id */
      slave_proxy_id(thd_arg->variables.pseudo_thread_id),

I.e. if using_trans is true (using_trans tells if updating a transactional table), we don't record the "killed" information inside the event (because, if we come to building the event, the statement will not interrupt itself, it will reach the handler's commit).

But note, it's a known BUG#23333 that callers pass an argument "using_trans" of TRUE if the main table is transactional, but not taking into account the tables updated via side-effects (triggers, stored functions called by the statement).
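To make the effect of this first suggestion easier to see, here is a minimal standalone sketch of the `error_code` expression before and after the change. It is not code from the patch: `ThdState`, `error_code_before` and `error_code_after` are hypothetical names that only model the fields the constructor reads, and the only real constant is the error number 1053 from the bug title.

```cpp
// Illustrative model only; ThdState and both functions are hypothetical
// stand-ins, not MySQL server code.
struct ThdState {
    bool killed;          // models thd_arg->killed != THD::NOT_KILLED
    bool delayed_insert;  // models system_thread & SYSTEM_THREAD_DELAYED_INSERT
    int  killed_errno;    // models the value returned by thd->killed_errno()
    int  last_errno;      // models thd_arg->net.last_errno
};

constexpr int ER_SERVER_SHUTDOWN = 1053;  // error number from the bug title

// The expression before the suggested change.
constexpr int error_code_before(const ThdState& t) {
    return t.killed ? (t.delayed_insert ? 0 : t.killed_errno) : t.last_errno;
}

// The expression after the suggested change: using_trans also masks the kill.
constexpr int error_code_after(const ThdState& t, bool using_trans) {
    return t.killed
        ? ((t.delayed_insert || using_trans) ? 0 : t.killed_errno)
        : t.last_errno;
}

// A statement whose thread was killed by a server shutdown.
constexpr ThdState shutdown_kill{true, false, ER_SERVER_SHUTDOWN, 0};

static_assert(error_code_before(shutdown_kill) == ER_SERVER_SHUTDOWN,
              "old expression records 1053 in the event");
static_assert(error_code_after(shutdown_kill, true) == 0,
              "transactional update: the kill is not recorded");
static_assert(error_code_after(shutdown_kill, false) == ER_SERVER_SHUTDOWN,
              "non-transactional update: behaviour is unchanged");
```

The `static_assert` lines double as usage examples: they are checked at compile time, so the snippet needs no `main` to be verified. Note that in this first variant a non-kill error in `last_errno` is still recorded even when `using_trans` is true.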
[31 Jan 2007 15:20]
Guilhem Bichot
Also, please consider extending this bug to other errors than shutdown. A network error could also happen, storing ER_NET_READ_ERROR (and others, see check_expected_error() in slave.cc) in the event, causing the same problem. Then the patch has to be slightly changed to become:

    ===== sql/log_event.cc 1.262 vs edited =====
    *** /tmp/bk_log_event.cc-1.262_6BFkbt 2007-01-31 16:18:01 +01:00
    --- edited/sql/log_event.cc 2007-01-31 16:17:58 +01:00
    ***************
    *** 1455,1463 ****
      using_trans),
      data_buf(0), query(query_arg), catalog(thd_arg->catalog),
      db(thd_arg->db), q_len((uint32) query_length),
    ! error_code((thd_arg->killed != THD::NOT_KILLED) ?
    ! ((thd_arg->system_thread & SYSTEM_THREAD_DELAYED_INSERT) ?
    ! 0 : thd->killed_errno()) : thd_arg->net.last_errno),
      thread_id(thd_arg->thread_id),
      /* save the original thread id; we already know the server id */
      slave_proxy_id(thd_arg->variables.pseudo_thread_id),
    --- 1455,1464 ----
      using_trans),
      data_buf(0), query(query_arg), catalog(thd_arg->catalog),
      db(thd_arg->db), q_len((uint32) query_length),
    ! error_code(using_trans ? 0 : ((thd_arg->killed != THD::NOT_KILLED) ?
    ! ((thd_arg->system_thread & SYSTEM_THREAD_DELAYED_INSERT)
    ! ? 0 : thd->killed_errno()) :
    ! thd_arg->net.last_errno)),
      thread_id(thd_arg->thread_id),
      /* save the original thread id; we already know the server id */
      slave_proxy_id(thd_arg->variables.pseudo_thread_id),

Alternatives to testing "using_trans" are testing thd->no_trans_update and OPTION_STATUS_NO_TRANS_UPDATE (find out what is best, I don't know).
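The generalized variant can be modelled the same way. The sketch below is only an illustration under assumptions: `ThdState` and `error_code_generalized` are hypothetical names, and the value 1158 for ER_NET_READ_ERROR is my recollection of the server error list rather than something stated in this report.

```cpp
// Illustrative model only; the names below are hypothetical, not server code.
struct ThdState {
    bool killed;          // models thd_arg->killed != THD::NOT_KILLED
    bool delayed_insert;  // models system_thread & SYSTEM_THREAD_DELAYED_INSERT
    int  killed_errno;    // models the value returned by thd->killed_errno()
    int  last_errno;      // models thd_arg->net.last_errno
};

constexpr int ER_SERVER_SHUTDOWN = 1053;  // from the bug title
constexpr int ER_NET_READ_ERROR  = 1158;  // assumed value, see lead-in

// using_trans is tested first, so it masks kill errors and net errors alike.
constexpr int error_code_generalized(const ThdState& t, bool using_trans) {
    return using_trans
        ? 0
        : (t.killed ? (t.delayed_insert ? 0 : t.killed_errno) : t.last_errno);
}

// A statement that was not killed but hit a network read error.
constexpr ThdState net_error{false, false, 0, ER_NET_READ_ERROR};
// A statement whose thread was killed by a server shutdown.
constexpr ThdState shutdown_kill{true, false, ER_SERVER_SHUTDOWN, 0};

static_assert(error_code_generalized(net_error, true) == 0,
              "transactional update: the network error is not recorded");
static_assert(error_code_generalized(net_error, false) == ER_NET_READ_ERROR,
              "non-transactional update: the network error is still recorded");
static_assert(error_code_generalized(shutdown_kill, true) == 0,
              "transactional update: the kill is not recorded either");
```

Compared with the first suggestion, the only difference is where `using_trans` sits in the conditional: moved to the outermost position, it also covers the `last_errno` branch.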
[21 Feb 2007 14:33]
Bugs System
A patch for this bug has been committed. After review, it may be pushed to the relevant source trees for release in the next version. You can access the patch from: http://lists.mysql.com/commits/20266
[23 Mar 2007 8:58]
MySQL Verification Team
Will this be fixed in 5.0?
[24 Mar 2007 9:19]
Bugs System
A patch for this bug has been committed. After review, it may be pushed to the relevant source trees for release in the next version. You can access the patch from: http://lists.mysql.com/commits/22862

ChangeSet@1.2490, 2007-03-24 11:17:11+02:00, aelkin@andrepl.(none) +3 -0

Bug #22725 Replication outages from ER_SERVER_SHUTDOWN (1053) set in replication events

The reason for the bug was that a query could not be replayed on the slave because its event had been recorded with the killed error. The solution treats queries on transactional (ta) and non-transactional (non-ta) tables differently. For a ta table, the query rolls back if it was killed, because there is no guarantee that a function the query might invoke completed its work; otherwise a partial result could not be repeated on the slave. A non-ta-table query is binlogged, although this is a rather optimistic decision. A problem remains: partially completed results of stored routines on the master cannot be reproduced on the slave.
[29 Mar 2007 19:23]
Bugs System
A patch for this bug has been committed. After review, it may be pushed to the relevant source trees for release in the next version. You can access the patch from: http://lists.mysql.com/commits/23363

ChangeSet@1.2490, 2007-03-29 22:22:20+03:00, aelkin@dsl-hkibras1-ff1dc300-249.dhcp.inet.fi +3 -0

Bug #22725 Replication outages from ER_SERVER_SHUTDOWN (1053) set in replication events

The reason for the bug was that a query could not be replayed on the slave because its event had been recorded with the killed error. The solution treats queries on transactional (ta) and non-transactional (non-ta) tables differently. For a ta table, the query rolls back if it was killed, because there is no guarantee that a function the query might invoke completed its work; otherwise a partial result could not be repeated on the slave. A non-ta-table query is binlogged without the KILLED error if the kill happened but the INSERT did not execute any stored routine, i.e. it practically completed its work. For an INSERT that called a stored routine, binlogging is pessimistic, since the query might have been performed only partially and it is safer to stop on the slave.
[3 Apr 2007 7:12]
Bugs System
A patch for this bug has been committed. After review, it may be pushed to the relevant source trees for release in the next version. You can access the patch from: http://lists.mysql.com/commits/23623

ChangeSet@1.2490, 2007-04-03 10:11:38+03:00, aelkin@dsl-hkibras1-ff1dc300-249.dhcp.inet.fi +3 -0

Bug#22725 Replication outages from ER_SERVER_SHUTDOWN (1053) set in replication events

The reason for the bug was that a query could not be replayed on the slave because its event had been recorded with the killed error. Because of the way INSERT is handled (its per-row while loop cannot be interrupted by a kill), a query on a ta table should not have appeared in the binlog unless it called a stored routine that was interrupted by the kill.

The offered solution introduces the following rules for binlogging INSERT that account for these specifics. For a ta table, the query rolls back if it was killed and `error' was set to non-zero. A raised killed flag without the error set is harmless, even if the insert invoked a stored routine. For a non-ta table, the combination forces the query to be binlogged with the KILLED error, to indicate that execution on the master was potentially partial and consistency is in question.

The fix relies on the specified behaviour of stored routines, which must propagate the error to the top-level query handling if the thd->killed flag was caught raised during routine execution.
[4 Apr 2007 11:41]
Bugs System
A patch for this bug has been committed. After review, it may be pushed to the relevant source trees for release in the next version. You can access the patch from: http://lists.mysql.com/commits/23775

ChangeSet@1.2490, 2007-04-04 14:41:22+03:00, aelkin@dsl-hkibras1-ff1dc300-249.dhcp.inet.fi +6 -0

Bug#22725 Replication outages from ER_SERVER_SHUTDOWN (1053) set in replication events

The reason for the bug was that a query could not be replayed on the slave because its event had been recorded with the killed error. Because of the way INSERT is handled (its per-row while loop cannot be interrupted by a kill), a query on a ta table should not have appeared in the binlog unless it called a stored routine that was interrupted by the kill.

The offered solution introduces the following rules for binlogging INSERT that account for these specifics. For a ta table, the query rolls back if it was killed and `error' was set to non-zero. A raised killed flag without the error set is harmless, even if the insert invoked a stored routine. For a non-ta table, the combination forces the query to be binlogged with the KILLED error, to indicate that execution on the master was potentially partial and consistency is in question.

The fix relies on the specified behaviour of stored routines, which must propagate the error to the top-level query handling if the thd->killed flag was caught raised during routine execution. The patch adds an argument to Query_log_event::Query_log_event whose default value denotes that error_code is unset.
[6 Apr 2007 18:22]
Bugs System
A patch for this bug has been committed. After review, it may be pushed to the relevant source trees for release in the next version. You can access the patch from: http://lists.mysql.com/commits/24002

ChangeSet@1.2490, 2007-04-06 21:20:29+03:00, aelkin@dsl-hkibras1-ff1dc300-249.dhcp.inet.fi +7 -0

Bug#22725 Replication outages from ER_SERVER_SHUTDOWN (1053) set in replication events

The reason for the bug was that a query could not be replayed on the slave because its event had been recorded with the killed error. Because of the way INSERT is handled (its per-row while loop cannot be interrupted by a kill), a query on a ta table should not have appeared in the binlog unless it called a stored routine that was interrupted by the kill.

The offered solution introduces the following rules for binlogging INSERT that account for these specifics. For a ta table, the query rolls back if it was killed and `error' was set to non-zero. A raised killed flag without the error set is harmless, even if the insert invoked a stored routine. For a non-ta table, the combination forces the query to be binlogged with the KILLED error, to indicate that execution on the master was potentially partial and consistency is in question.

The fix relies on the specified behaviour of stored routines, which must propagate the error to the top-level query handling if the thd->killed flag was caught raised during routine execution. The patch adds an argument with the default killed-status-unset value to Query_log_event::Query_log_event.
[22 Apr 2007 14:52]
Bugs System
A patch for this bug has been committed. After review, it may be pushed to the relevant source trees for release in the next version. You can access the patch from: http://lists.mysql.com/commits/25073

ChangeSet@1.2490, 2007-04-22 17:51:55+03:00, aelkin@dsl-hkibras1-ff1dc300-249.dhcp.inet.fi +7 -0

Bug#22725 Replication outages from ER_SERVER_SHUTDOWN (1053) set in replication events

The reason for the bug was that a query could not be replayed on the slave because its event had been recorded with the killed error. Because of the way INSERT is handled (its per-row while loop cannot be interrupted by a kill), a query on a ta table should not have appeared in the binlog unless it called a stored routine that was interrupted by the kill.

The offered solution introduces the following rules for binlogging INSERT that account for these specifics. For a ta table, the query rolls back only if the error was set to non-zero, regardless of the value of the killed flag. A raised killed flag without the error set is harmless, even if the insert invoked a stored routine. For a non-ta table, the combination forces the query to be binlogged with the KILLED error, to indicate that execution on the master was potentially partial and consistency is in question.

The fix relies on the specified behaviour of stored routines, which must propagate the error to the top-level query handling if the thd->killed flag was caught raised during routine execution. The patch adds an argument with the default killed-status-unset value to Query_log_event::Query_log_event.
[7 May 2007 19:52]
Bugs System
A patch for this bug has been committed. After review, it may be pushed to the relevant source trees for release in the next version. You can access the patch from: http://lists.mysql.com/commits/26244

ChangeSet@1.2490, 2007-05-07 21:26:30+03:00, aelkin@dsl-hkibras-fe31f900-164.dhcp.inet.fi +7 -0

Bug#22725 Replication outages from ER_SERVER_SHUTDOWN (1053) set in replication events

The reason for the bug was that a query could not be replayed on the slave because its event had been recorded with the killed error. Because of the way INSERT is handled (its per-row while loop cannot be interrupted by a kill), a query on a ta table should not have appeared in the binlog unless it called a stored routine that was interrupted by the kill.

The offered solution introduces the following rules for binlogging INSERT that account for these specifics. For a ta table, the query rolls back only if the error was set to non-zero, regardless of the value of the killed flag. A raised killed flag without the error set is harmless, even if the insert invoked a stored routine. For a non-ta table, the combination forces the query to be binlogged with the KILLED error, to indicate that execution on the master was potentially partial and consistency is in question.

The fix relies on the specified behaviour of stored routines, which must propagate the error to the top-level query handling if the thd->killed flag was raised during routine execution. The patch adds an argument with the default killed-status-unset value to Query_log_event::Query_log_event.
[10 May 2007 19:50]
Bugs System
A patch for this bug has been committed. After review, it may be pushed to the relevant source trees for release in the next version. You can access the patch from: http://lists.mysql.com/commits/26468

ChangeSet@1.2490, 2007-05-10 22:49:52+03:00, aelkin@dsl-hkibras-fe31f900-164.dhcp.inet.fi +7 -0

Bug#22725 Replication outages from ER_SERVER_SHUTDOWN (1053) set in replication events

The reason for the bug was that a query could not be replayed on the slave because its event had been recorded with the killed error. Because of the way INSERT is handled (its per-row while loop cannot be interrupted by a kill), a query on a transactional table should not have appeared in the binlog unless it called a stored routine that was interrupted by the kill.

The offered solution adds the following rule for binlogging INSERT that accounts for the above specifics: for an INSERT on a transactional table, if the error was not set, a raised killed flag alone is harmless and is ignored by masking it out when the binlog event is created. For both table types, the combination of a raised error and the KILLED flag indicates that execution on the master was potentially partial and consistency is in question. In that case the code continues to binlog an event with an appropriate killed error.

The fix relies on the specified behaviour of stored routines, which must propagate the error to the top-level query handling if the thd->killed flag was raised during routine execution. The patch adds an argument with the default killed-status-unset value to Query_log_event::Query_log_event.
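The masking rule described in this commit message can be sketched as a small standalone model. This is an illustration under assumptions, not the committed code: the enum `Killed` and the functions `status_for_event`, `killed_errno` and `event_error_code` are hypothetical names, and the handling of a non-transactional table with a raised flag but no error (the thread's killed state is passed through unchanged) is my reading, since the message does not spell that case out.

```cpp
// Illustrative model only; all names here are hypothetical.
enum class Killed { NotKilled, Shutdown };

constexpr int ER_SERVER_SHUTDOWN = 1053;  // error number from the bug title

// Killed status handed to the event at creation time. On a transactional
// table a raised flag with no error is masked out; otherwise it is kept.
constexpr Killed status_for_event(bool transactional, int error,
                                  Killed thread_killed) {
    return (transactional && error == 0) ? Killed::NotKilled : thread_killed;
}

// Maps a killed status to the error number stored in the event.
constexpr int killed_errno(Killed k) {
    return k == Killed::Shutdown ? ER_SERVER_SHUTDOWN : 0;
}

// Error code the event would carry, given the (possibly masked) status.
constexpr int event_error_code(Killed status, int last_errno) {
    return status != Killed::NotKilled ? killed_errno(status) : last_errno;
}

// Transactional INSERT, flag raised, no error: the event is logged clean.
static_assert(event_error_code(
                  status_for_event(true, 0, Killed::Shutdown), 0) == 0,
              "flag alone is harmless and masked out");

// Flag raised and an error set: the killed error stays in the event.
static_assert(event_error_code(
                  status_for_event(true, 1, Killed::Shutdown), 1)
                  == ER_SERVER_SHUTDOWN,
              "error plus flag means possibly partial execution");
```

The point of passing the status as a separate argument, as the last sentence of the message describes, is that the caller can decide on masking while the event constructor keeps a default meaning "no status supplied".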
[17 May 2007 16:56]
Bugs System
A patch for this bug has been committed. After review, it may be pushed to the relevant source trees for release in the next version. You can access the patch from: http://lists.mysql.com/commits/26917

ChangeSet@1.2490, 2007-05-17 19:56:37+03:00, aelkin@dsl-hkibras-fe31f900-164.dhcp.inet.fi +7 -0

Bug#22725 Replication outages from ER_SERVER_SHUTDOWN (1053) set in replication events

The reason for the bug was that a query could not be replayed on the slave because its event had been recorded with the killed error. Because of the way INSERT is handled (its per-row while loop cannot be interrupted by a kill), a query on a transactional table should not have appeared in the binlog unless it called a stored routine that was interrupted by the kill (and then an error must be returned out of the loop).

The offered solution adds the following rule for binlogging INSERT that accounts for the above specifics: for an INSERT on a transactional table, if the error was not set, a raised killed flag alone is harmless and is ignored by masking it out when the binlog event is created. For both table types, the combination of a raised error and the KILLED flag indicates that execution on the master was potentially partial and consistency is in question. In that case the code continues to binlog an event with an appropriate killed error.

The fix relies on the specified behaviour of stored routines, which must propagate the error to the top-level query handling if the thd->killed flag was raised during routine execution. The patch adds an argument with the default killed-status-unset value to Query_log_event::Query_log_event.
[23 May 2007 10:10]
Bugs System
A patch for this bug has been committed. After review, it may be pushed to the relevant source trees for release in the next version. You can access the patch from: http://lists.mysql.com/commits/27195

ChangeSet@1.2490, 2007-05-23 13:10:26+03:00, aelkin@dsl-hkibras1-ff5dc300-70.dhcp.inet.fi +7 -0

Bug#22725 Replication outages from ER_SERVER_SHUTDOWN (1053) set in replication events

The reason for the bug was that a query could not be replayed on the slave because its event had been recorded with the killed error. Because of the way INSERT is handled (its per-row while loop cannot be interrupted by a kill), a query on a transactional table should not have appeared in the binlog unless it called a stored routine that was interrupted by the kill (and then an error must be returned out of the loop).

The offered solution adds the following rule for binlogging INSERT that accounts for the above specifics: for an INSERT on a transactional table, if the error was not set, a raised killed flag alone is harmless and is ignored by masking it out when the binlog event is created. For both table types, the combination of a raised error and the KILLED flag indicates that execution on the master was potentially partial and consistency is in question. In that case the code continues to binlog an event with an appropriate killed error.

The fix relies on the specified behaviour of stored routines, which must propagate the error to the top-level query handling if the thd->killed flag was raised during routine execution. The patch adds an argument with the default killed-status-unset value to Query_log_event::Query_log_event.
[25 May 2007 13:32]
Guilhem Bichot
Approved, provided that a minor change, requested by email, is made.
[28 May 2007 11:47]
Bugs System
A patch for this bug has been committed. After review, it may be pushed to the relevant source trees for release in the next version. You can access the patch from: http://lists.mysql.com/commits/27454

ChangeSet@1.2490, 2007-05-28 14:47:12+03:00, aelkin@dsl-hkibras1-ff5dc300-70.dhcp.inet.fi +7 -0

Bug#22725 Replication outages from ER_SERVER_SHUTDOWN (1053) set in replication events

The reason for the bug was that a query could not be replayed on the slave because its event had been recorded with the killed error. Because of the way INSERT is handled (its per-row while loop cannot be interrupted by a kill), a query on a transactional table should not have appeared in the binlog unless it called a stored routine that was interrupted by the kill (and then an error must be returned out of the loop).

The offered solution adds the following rule for binlogging INSERT that accounts for the above specifics: for an INSERT on a transactional table, if the error was not set, a raised killed flag alone is harmless and is ignored by masking it out when the binlog event is created. For both table types, the combination of a raised error and the KILLED flag indicates that execution on the master was potentially partial and consistency is in question. In that case the code continues to binlog an event with an appropriate killed error.

The fix relies on the specified behaviour of stored routines, which must propagate the error to the top-level query handling if the thd->killed flag was raised during routine execution. The patch adds an argument with the default killed-status-unset value to Query_log_event::Query_log_event.
[28 May 2007 19:20]
Bugs System
A patch for this bug has been committed. After review, it may be pushed to the relevant source trees for release in the next version. You can access the patch from: http://lists.mysql.com/commits/27507

ChangeSet@1.2503, 2007-05-28 22:20:22+03:00, aelkin@dsl-hkibras1-ff5dc300-70.dhcp.inet.fi +7 -0

Bug#22725 Replication outages from ER_SERVER_SHUTDOWN (1053) set in replication events

The reason for the bug was that a query could not be replayed on the slave because its event had been recorded with the killed error. Because of the way INSERT is handled (its per-row while loop cannot be interrupted by a kill), a query on a transactional table should not have appeared in the binlog unless it called a stored routine that was interrupted by the kill (and then an error must be returned out of the loop).

The offered solution adds the following rule for binlogging INSERT that accounts for the above specifics: for an INSERT on a transactional table, if the error was not set, a raised killed flag alone is harmless and is ignored by masking it out when the binlog event is created. For both table types, the combination of a raised error and the KILLED flag indicates that execution on the master was potentially partial and consistency is in question. In that case the code continues to binlog an event with an appropriate killed error.

The fix relies on the specified behaviour of stored routines, which must propagate the error to the top-level query handling if the thd->killed flag was raised during routine execution. The patch adds an argument with the default killed-status-unset value to Query_log_event::Query_log_event.
[29 May 2007 10:13]
Bugs System
A patch for this bug has been committed. After review, it may be pushed to the relevant source trees for release in the next version. You can access the patch from: http://lists.mysql.com/commits/27549

ChangeSet@1.2504, 2007-05-29 13:12:04+03:00, aelkin@dsl-hkibras1-ff5dc300-70.dhcp.inet.fi +2 -0

Bug#22725 Replication outages from ER_SERVER_SHUTDOWN (1053) set in replication events

Refining the tests, since pb revealed the older version's fragility: the error from SF() due to a kill may differ between environments.
[29 May 2007 13:29]
Bugs System
A patch for this bug has been committed. After review, it may be pushed to the relevant source trees for release in the next version. You can access the patch from: http://lists.mysql.com/commits/27571

ChangeSet@1.2504, 2007-05-29 16:27:55+03:00, aelkin@dsl-hkibras1-ff5dc300-70.dhcp.inet.fi +3 -0

Bug#22725 Replication outages from ER_SERVER_SHUTDOWN (1053) set in replication events

Refining the tests, since pb revealed the older version's fragility: the error from SF() due to a kill may differ between environments. DBUG_ASSERT instead of assert.
[30 May 2007 7:56]
Bugs System
A patch for this bug has been committed. After review, it may be pushed to the relevant source trees for release in the next version. You can access the patch from: http://lists.mysql.com/commits/27645 ChangeSet@1.2509, 2007-05-30 10:56:18+03:00, aelkin@dsl-hkibras1-ff5dc300-70.dhcp.inet.fi +2 -0 bug#22725 test comments correction
[30 May 2007 8:19]
Bugs System
A patch for this bug has been committed. After review, it may be pushed to the relevant source trees for release in the next version. You can access the patch from: http://lists.mysql.com/commits/27648 ChangeSet@1.2521, 2007-05-30 11:18:55+03:00, aelkin@dsl-hkibras1-ff5dc300-70.dhcp.inet.fi +1 -0 bug#22725 merge 5.0 with 5.1
[30 May 2007 13:15]
Bugs System
A patch for this bug has been committed. After review, it may be pushed to the relevant source trees for release in the next version. You can access the patch from: http://lists.mysql.com/commits/27681

ChangeSet@1.2527, 2007-05-30 16:14:55+03:00, aelkin@dsl-hkibras1-ff5dc300-70.dhcp.inet.fi +1 -0

bug#22725 The test is not meant for row format; the include guard is set.
[30 May 2007 19:29]
Bugs System
A patch for this bug has been committed. After review, it may be pushed to the relevant source trees for release in the next version. You can access the patch from: http://lists.mysql.com/commits/27724 ChangeSet@1.2510, 2007-05-30 22:29:15+03:00, aelkin@dsl-hkibras1-ff5dc300-70.dhcp.inet.fi +1 -0 bug#22725 refining the test because of Bug #28786 'reset master' does not reset binlogging on embeded server
[1 Jun 2007 19:21]
Bugs System
Pushed into 5.0.44
[1 Jun 2007 19:25]
Bugs System
Pushed into 5.1.20-beta
[6 Jun 2007 15:11]
MC Brown
A note has been added to the 5.0.44 and 5.1.20 changelog.