| Bug #45852 | Semisynch: Last_IO_Error: Fatal error: Failed to run 'after_queue_event' hook | ||
|---|---|---|---|
| Submitted: | 30 Jun 2009 11:48 | Modified: | 12 Nov 2009 14:27 |
| Reporter: | Philip Stoev | Email Updates: | |
| Status: | Closed | Impact on me: | |
| Category: | MySQL Server: Replication | Severity: | S3 (Non-critical) |
| Version: | 5.4 | OS: | Any |
| Assigned to: | Zhenxing He | CPU Architecture: | Any |
| Tags: | experimental, pb2, test failure | ||
[30 Jun 2009 13:30]
Zhenxing He
The error is on slave, not on master, maybe the error message 'Set 'rpl_semi_sync_master_reply_log_file_pos' on master failed' is misleading, better say 'Send semi-sync reply to master failed'. This error is mostly caused by connection problem. Fatal here means that the slave IO thread has to stop because of the error, but it's possible to restart slave IO thread.
[30 Jun 2009 13:51]
Philip Stoev
The test uses only STOP SLAVE IO_THREAD and it does not attempt to disrupt the connection between the master and the slave in any other way. Can we have extra debugging information in case this error happens so that this can be debugged further?
[5 Sep 2009 8:23]
Bugs System
A patch for this bug has been committed. After review, it may be pushed to the relevant source trees for release in the next version. You can access the patch from: http://lists.mysql.com/commits/82521 2802 He Zhenxing 2009-09-05 Bug#45852 Semisynch: Last_IO_Error: Fatal error: Failed to run 'after_queue_event' hook Errors when send reply to master should never cause the IO thread to stop, because master can fall back to async replication if it does not get reply from slave. The problem is fixed by deliberately ignore the return value of slaveReply. @ plugin/semisync/semisync_slave_plugin.cc Deliberately ignore the return value of slaveReply so that errors while sending slave reply will not cause the IO thread to stop.
[8 Sep 2009 4:41]
Bugs System
A patch for this bug has been committed. After review, it may be pushed to the relevant source trees for release in the next version. You can access the patch from: http://lists.mysql.com/commits/82622 2802 He Zhenxing 2009-09-08 Bug#45852 Semisynch: Last_IO_Error: Fatal error: Failed to run 'after_queue_event' hook Errors when send reply to master should never cause the IO thread to stop, because master can fall back to async replication if it does not get reply from slave. The problem is fixed by deliberately ignore the return value of slaveReply. @ plugin/semisync/semisync_slave.cc Change the error message when send reply failed @ plugin/semisync/semisync_slave_plugin.cc Deliberately ignore the return value of slaveReply so that errors while sending slave reply will not cause the IO thread to stop.
[8 Sep 2009 6:15]
Bugs System
A patch for this bug has been committed. After review, it may be pushed to the relevant source trees for release in the next version. You can access the patch from: http://lists.mysql.com/commits/82625 2802 He Zhenxing 2009-09-08 Bug#45852 Semisynch: Last_IO_Error: Fatal error: Failed to run 'after_queue_event' hook Errors when send reply to master should never cause the IO thread to stop, because master can fall back to async replication if it does not get reply from slave. The problem is fixed by deliberately ignoring the return value of slaveReply. @ plugin/semisync/semisync_slave.cc Change the error message when send reply failed @ plugin/semisync/semisync_slave_plugin.cc Deliberately ignore the return value of slaveReply so that errors while sending slave reply will not cause the IO thread to stop.
[26 Sep 2009 10:32]
Bugs System
A patch for this bug has been committed. After review, it may be pushed to the relevant source trees for release in the next version. You can access the patch from: http://lists.mysql.com/commits/84715 2837 He Zhenxing 2009-09-26 [merge] Merge BUG#45852 to 6.0-codebase-bugfixing
[30 Sep 2009 3:58]
Bugs System
A patch for this bug has been committed. After review, it may be pushed to the relevant source trees for release in the next version. You can access the patch from: http://lists.mysql.com/commits/85120 3627 He Zhenxing 2009-09-30 Bug#45852 Semisynch: Last_IO_Error: Fatal error: Failed to run 'after_queue_event' hook Errors when send reply to master should never cause the IO thread to stop, because master can fall back to async replication if it does not get reply from slave. The problem is fixed by deliberately ignoring the return value of slaveReply. @ plugin/semisync/semisync_slave_plugin.cc Deliberately ignore the return value of slaveReply so that errors while sending slave reply will not cause the IO thread to stop.
[9 Oct 2009 8:47]
Bugs System
Pushed into 6.0.14-alpha (revid:alik@ibmvm-20091009083208-0o0f0i9w1sq3c1kn) (version source revid:zhenxing.he@sun.com-20090930035829-z5k0sbc0s6pjx30e) (merge vers: 6.0.14-alpha) (pib:12)
[9 Oct 2009 10:43]
Jon Stephens
Documented bugfix in the 6.0.14 changelog as follows:
When using semi-synchronous replication, an error raised on the
slave while sending a response to the master caused the slave IO
thread to stop.
Set status to NDI, waiting for push to 5.4.
[12 Oct 2009 12:57]
Bugs System
A patch for this bug has been committed. After review, it may be pushed to the relevant source trees for release in the next version. You can access the patch from: http://lists.mysql.com/commits/86574 3115 He Zhenxing 2009-10-12 Backport Bug#45852 Semisynch: Last_IO_Error: Fatal error: Failed to run 'after_queue_event' hook Errors when send reply to master should never cause the IO thread to stop, because master can fall back to async replication if it does not get reply from slave. The problem is fixed by deliberately ignoring the return value of slaveReply.
[27 Oct 2009 9:49]
Bugs System
Pushed into 6.0.14-alpha (revid:alik@sun.com-20091027094604-9p7kplu1vd2cvcju) (version source revid:zhenxing.he@sun.com-20091026140226-uhnqejkyqx1aeilc) (merge vers: 6.0.14-alpha) (pib:13)
[28 Oct 2009 6:37]
Jon Stephens
Already documented for 6.0.14; re-closing.
[12 Nov 2009 8:18]
Bugs System
Pushed into 5.5.0-beta (revid:alik@sun.com-20091110093229-0bh5hix780cyeicl) (version source revid:alik@sun.com-20091027095744-rf45u3x3q5d1f5y0) (merge vers: 5.5.0-beta) (pib:13)
[12 Nov 2009 14:27]
Jon Stephens
Also documented in the 5.5.0 changelog; closed.
[18 Dec 2009 15:41]
Paul DuBois
Removed 5.5.0 changelog entry. In 5.5, semisync replication first appears in 5.5.0, so this bug affects no 5.5.x releases.

Description: When executing a replication workload with semisynch replication turned off and periodic STOP SLAVE IO_THREAD thread, START SLAVE IO_THREAD (in order to simulate network failures), the slave stopped with the following information in SHOW SLAVE STATUS: mysql> show slave status\G *************************** 1. row *************************** Slave_IO_State: Master_Host: 127.0.0.1 Master_User: root Master_Port: 19306 Connect_Retry: 1 Master_Log_File: master-bin.000001 Read_Master_Log_Pos: 1900343 Relay_Log_File: slave-relay-bin.000003 Relay_Log_Pos: 1283333 Relay_Master_Log_File: master-bin.000001 Slave_IO_Running: No Slave_SQL_Running: Yes Replicate_Do_DB: Replicate_Ignore_DB: Replicate_Do_Table: Replicate_Ignore_Table: Replicate_Wild_Do_Table: Replicate_Wild_Ignore_Table: Last_Errno: 0 Last_Error: Skip_Counter: 0 Exec_Master_Log_Pos: 1900343 Relay_Log_Space: 1900793 Until_Condition: None Until_Log_File: Until_Log_Pos: 0 Master_SSL_Allowed: No Master_SSL_CA_File: Master_SSL_CA_Path: Master_SSL_Cert: Master_SSL_Cipher: Master_SSL_Key: Seconds_Behind_Master: NULL Master_SSL_Verify_Server_Cert: No Last_IO_Errno: 1593 Last_IO_Error: Fatal error: Failed to run 'after_queue_event' hook Last_SQL_Errno: 0 Last_SQL_Error: Replicate_Ignore_Server_Ids: Master_Server_Id: 1 1 row in set (0.00 sec) The slave error log contains: 090630 13:58:12 [ERROR] Set 'rpl_semi_sync_master_reply_log_file_pos' on master failed 090630 13:58:12 [ERROR] Slave I/O: Fatal error: Failed to run 'after_queue_event' hook, Error_code: 1593 090630 13:58:12 [Note] Slave I/O thread exiting, read up to log 'master-bin.000001', position 1900343 which implies that the master is at fault, however the master error log does not contain any information about what happened. Also, the error is reported as "Fatal", but START SLAVE IO_THREAD is able to reestablish replication and Rpl_semi_sync_master_status converts back to ON, so how "Fatal" the error was remains an open question. How to repeat: With the Random Query Generator from the mysql-test-extra-6.0 tree, mysql-test/gentest, run: $ perl runall.pl \ --basedir=/build/bzr/azalea-bugfixing \ --grammar=conf/replication_simple.yy \ --rpl_mode=statement \ --mem \ --mysqld=--plugin-dir=/build/bzr/azalea-bugfixing/plugin/semisync/.libs \ --mysqld=--plugin-load=rpl_semi_sync_master=libsemisync_master.so:rpl_semi_sync_slave=libsemisync_slave.so \ --mysqld=--rpl_semi_sync_master_enabled=1 \ --mysqld=--rpl_semi_sync_slave_enabled=1 \ --duration=1800 \ --queries=100000000 \ --threads=10 \ --validator=ReplicationSlaveStatus \ --reporter=ReplicationSemiSync The ReplicationSemiSync reporter causes the slave to stop for random periods of time and then attempt to reestablish semisynchronous replication. The ReplicationSlaveStatus monitors SHOW SLAVE STATUS and will terminate the test as soon as the error occurs, usually within a minute (a few START|STOP SLAVE IO THREAD cycles). Suggested fix: Semisynchronous replication should survive IO thread issues without throwing error messages. The actual cause of each failure must be present if not in SHOW SLAVE STATUS, then at least in the error log. If the slave error log talks about a failure on the master, the error log on the master needs to contain an entry and debug information for that failure.