Bug #6522 Replication fails due to a rolled back transaction in the binlog
Submitted: 9 Nov 2004 14:00 Modified: 10 Nov 2004 14:15
Reporter: Tony Little Email Updates:
Status: Closed Impact on me:
Category:MySQL Server: Replication Severity:S2 (Serious)
Version:4.0.18 OS:Linux (linux)
Assigned to: Guilhem Bichot CPU Architecture:Any

[9 Nov 2004 14:00] Tony Little
On the master the BEGIN statement fails and is written to the binlog with an error.  All subsequent queries that are a part of the transaction are then written to the binlog with no errors and the ROLLBACK is written to the binlog with an error.  Both the BEGIN and ROLLBACK statements are written with an error but all in between are written to the binlog with no error.  The slaves then all fail with an error message like:

query partially completed on the master (error on master: 1158) and was aborted. There is a chance that your master is inconsistent at this point. If you are sure that your master is ok, run this query manually on the slave and then restart the slave with SET GLOBAL SQL_SLAVE_SKIP_COUNTER=1; START SLAVE; . Query: 'BEGIN'

The problem is if a transaction fails shouldn't the entire seqence be left out of the binlog.  This has happened twice to our server and it causes all replication servers to fail as a result.  The only way to recover is to start the slaves back up after changing their master log position using CHANGE MASTER TO to the query following the ROLLBACK that failed.

None if the queries between the bad BEGIN and ROLLBACK are added to the master database and it continues to function as normal.

How to repeat:
I am not sure of the exact sequence that causes this to happen.  Obviously the BEGIN is failing on the master.  I don't know the reason.

Suggested fix:
Don't write any part of a failed transaction to the binlog.
[9 Nov 2004 15:12] Guilhem Bichot
If BEGIN/ROLLBACK got into the binlog, it's likely that one table updated by the transaction was _not_ a transactional table. You can check this by typing SHOW TABLE STATUS LIKE 'each_table_of_the_transaction'; what do you get?
Error 1158 means that there was interruption of the client/server communication.
So, what probably happened is that the connection started a transaction, updated some non-transactional table (like MyISAM, HEAP, MERGE...), then communication got interrupted so the MySQL database server had to rollback. But as a non-transactional table was updated, not writing anything to the binlog would not reflect those updates; those are better reflected by logging BEGIN; transaction; ROLLBACK;
Now you are right, there is no reason 1158 found its way into binlog; as server got the error after executing the updates, logged error should be 0 (no error). I'm fixing this now.
[9 Nov 2004 15:38] Tony Little
I double checked the entire transaction and all tables used in the queries are INNODB.  No matter what, setting the error to 0 for the BEGIN and ROLLBACK in the binlog would take care of the issue.
[10 Nov 2004 14:15] Guilhem Bichot
Ah, tables are InnoDB... I'd be interested in seeing the output of mysqlbinlog (one BEGIN/ROLLBACK block), and in knowing if any involved table was created with the TEMPORARY word (CREATE TEMPORARY TABLE).
I just changed things so that an error code of 0 is logged for BEGIN/ROLLBACK/COMMIT, in the next 4.0.
Thank you for your bug report.
ChangeSet@1.2070.1.1, 2004-11-10 15:07:55+01:00, guilhem@mysql.com