Bug #13090 rbr slave crashes at end of rpl_insert_id, rpl_insert_ignore
Submitted: 9 Sep 2005 14:06 Modified: 13 Sep 2005 19:29
Reporter: Guilhem Bichot
Status: Closed
Category: MySQL Server: Replication Severity: S3 (Non-critical)
Version: 5.0-wl1012 OS: Linux (linux)
Assigned to: Guilhem Bichot CPU Architecture:Any

[9 Sep 2005 14:06] Guilhem Bichot
./mysql-test-run --mysqld=--binlog-format=row rpl_insert_id
The test passes, but the slave crashes when its SQL slave thread stops (at shutdown), with an
assertion failure at log.cc line 82 (binlog_close_connection: trans_log is not empty).
Doing a STOP SLAVE after the first INSERT into InnoDB makes the failure more obvious in the test suite.

How to repeat:
see above
[11 Sep 2005 20:31] Guilhem Bichot
Simpler test case:
-- source include/have_innodb.inc
-- source include/master-slave.inc

create table t1(a int auto_increment, key(a)) engine=innodb;
insert into t1 values (10);
stop slave;
Assigning it to me as I've done some debugging already.
[11 Sep 2005 21:58] Lars Thalmann
Also applies to rpl_row_basic_3innodb:

#0  0x0000003fea7096a7 in pthread_kill () from /lib64/libpthread.so.0
#1  0x00000000007507fc in write_core (sig=6) at stacktrace.c:220
#2  0x00000000005cbc7e in handle_segfault (sig=6) at mysqld.cc:2063
#3  <signal handler called>
#4  0x0000003fe9e2f3b0 in raise () from /lib64/libc.so.6
#5  0x0000003fe9e30860 in abort () from /lib64/libc.so.6
#6  0x0000003fe9e283eb in __assert_fail () from /lib64/libc.so.6
#7  0x000000000066c580 in binlog_close_connection (thd=0x17dcd58) at log.cc:82
#8  0x00000000006c1f28 in ha_close_connection (thd=0x17dcd58) at handler.cc:510
#9  0x00000000005b5a9c in ~THD (this=0x17dcd58) at sql_class.cc:432
#10 0x0000000000741407 in handle_slave_sql (arg=0x179c9a0) at slave.cc:3566
#11 0x0000003fea70697c in start_thread () from /lib64/libpthread.so.0
#12 0x0000003fe9ec9c2e in clone () from /lib64/libc.so.6
#13 0x0000000000000000 in ?? ()
[11 Sep 2005 22:00] Guilhem Bichot
In Rows_log_event::exec_event(), when TRANS_END_F is set, ha_commit() is called. It calls ha_commit_trans() with all==true, but thd->transaction.all is empty (it has nht==0), so ha_commit_trans() does nothing.
Passing all=0 instead fixes the problem, because thd->transaction.stmt is then used; it has InnoDB and binlog as registered handlertons, so nht==2 and the binlog gets flushed to disk.
After ha_commit() has done nothing, close_thread_tables() is called. It unlocks the tables, so thd->lock goes to 0, the "if (thd->lock)" branch is not entered, and ha_autocommit_or_rollback() never runs.
Reasoning more abstractly: in the test, Write_rows_log_event::exec_event() inserts rows in autocommit mode (it never starts a transaction). So ha_commit() is not the proper function to call; ha_autocommit_or_rollback() or ha_commit_stmt() is.
If the master has wrapped its INSERT in a BEGIN/COMMIT, then BEGIN goes into the binlog, so the slave starts a transaction and Write_rows_log_event::exec_event() runs inside that transaction;
ha_commit() then does the job (and the test does not crash the slave). But ha_commit() would have been called anyway by Xid_log_event::exec_event().
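To make the all/stmt distinction above concrete, here is a toy model of it. This is only a sketch, not the server code: ToyTrans, ToyThd, toy_commit_trans() and the other toy_ names are hypothetical stand-ins, and only the nht count and the two scopes are taken from the analysis above.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Toy model only. ToyTrans, ToyThd and toy_commit_trans() are hypothetical
// stand-ins, not the server's THD or ha_commit_trans().
struct ToyTrans {
    std::vector<std::string> handlertons;  // engines registered in this scope
    int nht() const { return static_cast<int>(handlertons.size()); }
};

struct ToyThd {
    ToyTrans all;   // models thd->transaction.all (BEGIN ... COMMIT scope)
    ToyTrans stmt;  // models thd->transaction.stmt (single-statement scope)
    bool binlog_flushed = false;
};

// Commits the chosen scope and returns how many handlertons were committed.
int toy_commit_trans(ToyThd &thd, bool all) {
    ToyTrans &trans = all ? thd.all : thd.stmt;
    const int committed = trans.nht();
    for (const std::string &name : trans.handlertons) {
        if (name == "binlog") thd.binlog_flushed = true;
    }
    trans.handlertons.clear();
    return committed;  // 0 means the call was a no-op
}

// State after an autocommit row insert: only the stmt scope has handlertons.
ToyThd toy_thd_after_autocommit_insert() {
    ToyThd thd;
    thd.stmt.handlertons.push_back("innodb");
    thd.stmt.handlertons.push_back("binlog");
    return thd;
}

// Exercises both calls discussed above.
void toy_commit_demo() {
    ToyThd a = toy_thd_after_autocommit_insert();
    assert(toy_commit_trans(a, true) == 0);   // "all" scope is empty: no-op
    assert(!a.binlog_flushed);                // binlog data is still pending

    ToyThd b = toy_thd_after_autocommit_insert();
    assert(toy_commit_trans(b, false) == 2);  // stmt scope: innodb + binlog
    assert(b.binlog_flushed);
}
```

Calling toy_commit_demo() from a main() runs both cases. In the first one the binlog data is still pending when the thread would be destroyed, which is the situation the assertion in binlog_close_connection catches.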
Proposed fix for Write_rows_log_event::exec_event():
- don't call ha_commit() (because if it's a transaction, Xid_log_event::exec_event() will)
- if TRANS_END_F is set, call ha_autocommit_or_rollback() (if it's not a transaction, this will flush the binlog and commit the statement; if it's a transaction, it will do nothing).
This is pretty much what mysql_insert() does.
Will work on that.
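A rough sketch of the control flow the proposed fix aims for, again as a toy model and not the actual patch. ToySlaveThd, TOY_TRANS_END_F and the toy_ functions are hypothetical names; the behaviour of the autocommit helper follows the description in the list above, and the error/rollback path is left out.

```cpp
#include <cassert>

// Hypothetical stand-ins for illustration; not the server's types or flag value.
const unsigned TOY_TRANS_END_F = 1u;

struct ToySlaveThd {
    bool in_transaction = false;  // true once a replicated BEGIN was applied
    int pending_rows = 0;         // rows written but not yet committed
    int committed_rows = 0;
};

// Models ha_autocommit_or_rollback() as described above: commit the statement
// in autocommit mode, do nothing inside an explicit transaction.
void toy_autocommit_or_rollback(ToySlaveThd &thd) {
    if (thd.in_transaction) return;
    thd.committed_rows += thd.pending_rows;
    thd.pending_rows = 0;
}

// Models Write_rows_log_event::exec_event() with the proposed fix applied:
// no unconditional commit, only the autocommit helper when the flag is set.
void toy_write_rows_exec_event(ToySlaveThd &thd, int rows, unsigned flags) {
    thd.pending_rows += rows;
    if (flags & TOY_TRANS_END_F) toy_autocommit_or_rollback(thd);
}

// Models Xid_log_event::exec_event(): commits the whole transaction.
void toy_xid_exec_event(ToySlaveThd &thd) {
    thd.committed_rows += thd.pending_rows;
    thd.pending_rows = 0;
    thd.in_transaction = false;
}

void toy_fix_demo() {
    ToySlaveThd autocommit;
    toy_write_rows_exec_event(autocommit, 1, TOY_TRANS_END_F);
    assert(autocommit.pending_rows == 0);  // nothing left over at STOP SLAVE

    ToySlaveThd txn;
    txn.in_transaction = true;
    toy_write_rows_exec_event(txn, 1, TOY_TRANS_END_F);
    assert(txn.pending_rows == 1);         // left for the Xid event
    toy_xid_exec_event(txn);
    assert(txn.committed_rows == 1);
}
```

The point of the split is that the rows event never decides on its own to commit a transaction: in autocommit mode the statement is committed right away, and inside BEGIN/COMMIT the commit is left to the Xid event.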
[13 Sep 2005 17:20] Bugs System
A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

[13 Sep 2005 19:29] Guilhem Bichot
Fixed. Nothing to document (private tree).