Bug #59622 | Data loss after crash at point crash_commit_after_prepare for binlog crash-safe | |
---|---|---|---
Submitted: | 19 Jan 2011 22:29 | Modified: | 10 Feb 2011 8:03
Reporter: | Serge Kozlov | Email Updates: |
Status: | Duplicate | Impact on me: |
Category: | MySQL Server: InnoDB storage engine | Severity: | S2 (Serious)
Version: | 5.5, trunk | OS: | Any
Assigned to: | Assigned Account | CPU Architecture: | Any
Tags: | crash, crash_commit_after_prepare, innodb, nuts, replication | |
[19 Jan 2011 22:29]
Serge Kozlov
[21 Jan 2011 4:56]
Daogang Qu
Test case 1 in rpl_crash_safe_master_innodb.test addresses this problem. It runs fine, and the data is consistent on master and slave after the master crashes at the crash_commit_after_prepare point. Please don't forget to suppress the following warnings and errors:

call mtr.add_suppression("Attempting backtrace");
call mtr.add_suppression("allocated tablespace *., old maximum was 0");
call mtr.add_suppression("Error in Log_event::read_log_event()");
call mtr.add_suppression("Buffered warning: Performance schema disabled");

CREATE TABLE t1(a LONGBLOB) ENGINE=INNODB;

-- echo # Test case1: Set DEBUG POINT before binlog to make
-- echo # the master crash for transaction
BEGIN;
let $rows= 3;
WHILE($rows)
{
  INSERT INTO t1 (a) VALUES (REPEAT('a',2));
  dec $rows;
}

# Write file to make mysql-test-run.pl expect crash and restart
-- exec echo "wait" > $MYSQLTEST_VARDIR/tmp/mysqld.1.expect

SET SESSION debug="d,crash_commit_after_prepare";

# Run the crashing query
-- error 2013
COMMIT;
-- source include/wait_until_disconnected.inc
-- enable_reconnect
-- echo # Restart the master server
-- exec echo "restart" > $MYSQLTEST_VARDIR/tmp/mysqld.1.expect
-- source include/wait_until_connected_again.inc
-- disable_reconnect

-- echo # Test the transaction statements will not be binlogged
-- source include/show_binlog_events.inc

-- echo # On master, test the data will be rolled back after restart.
SELECT COUNT(*) FROM t1;

sync_slave_with_master;
-- echo # On slave, test replication will work fine, and the data
-- echo # is not replicated
SELECT COUNT(*) FROM t1;

Result:
......
# Restart the master server
# Test the transaction statements will not be binlogged
show binlog events in 'master-bin.000001' from <binlog_start>;
Log_name  Pos  Event_type  Server_id  End_log_pos  Info
master-bin.000001  #  Query  #  #  use `test`; CREATE TABLE t1(a LONGBLOB) ENGINE=INNODB
# On master, test the data will be rolled back after restart.
SELECT COUNT(*) FROM t1;
COUNT(*)
0
# On slave, test replication will work fine, and the data
# is not replicated
SELECT COUNT(*) FROM t1;
COUNT(*)
0
DROP TABLE t1;

What is the difference from yours? I will create a test case based on the difference to double-check the problem.
[21 Jan 2011 10:02]
Serge Kozlov
You use a simple table structure for the MTR test case. The Nuts test generates random statements inside the transaction. That could be the reason why the results differ between MTR and Nuts.
[24 Jan 2011 7:48]
Daogang Qu
I reproduced the problem based on the newest version of nuts, as follows:

ok - synchronize slave 2 with master 1
# Dump file 1: /home/daogang/bzrwork/pushwl5493/nuts/var/tmp/rep_func::crash_safe_master-row-crashpoint1/1_test.t1.sql
# Dump file 2: /home/daogang/bzrwork/pushwl5493/nuts/var/tmp/rep_func::crash_safe_master-row-crashpoint1/2_test.t1.sql
not ok - Query 'SELECT COUNT(*) FROM t1' returns different number of rows: 3, 4

#   Failed test 'Query 'SELECT COUNT(*) FROM t1' returns different number of rows: 3, 4'
#   at /home/daogang/bzrwork/pushwl5493/nuts/lib/My/Nuts/Library/Kernel/ServerResult.pm line 156.
# Query 'SELECT COUNT(*) FROM t1' returns different number of rows: 3, 4
Failed 1/14 subtests

Please check the log:
var/log/rep_func\:\:crash_safe_master-row-crashpoint1/rep_func\:\:crash_safe_master-row-crashpoint1-report.log

I found the following executed queries:

QUERY 1 "INSERT INTO t1 VALUES (0, 0, 0, 16849, 5025236370650103808, 0.22E-22, '6695-06-17 14:46:03', 0)"
GET_CONNECTION 1 root
QUERY 1 "DELETE FROM t1 WHERE c_05 = 5025236370650103808"
GET_CONNECTION 1 root

The row is deleted right after it is inserted. That is why 'SELECT COUNT(*) FROM t1' returns a different number of rows (3 vs. 4). So please update the test to remove the statement QUERY 1 "DELETE FROM t1 WHERE c_05 = 5025236370650103808"; then it works fine. So it is a test issue, not a crash-safe issue.
[24 Jan 2011 8:15]
Daogang Qu
Please make sure the transaction plugin is loaded. It's really strange, though. I'm generating an MTR test case for it.
[24 Jan 2011 9:51]
Daogang Qu
CREATE TABLE t1 (c_01 TINYINT SIGNED, c_02 DOUBLE SIGNED, c_03 MEDIUMINT SIGNED,
                 c_04 SMALLINT UNSIGNED NULL, c_05 BIGINT UNSIGNED NOT NULL,
                 c_06 DOUBLE UNSIGNED, c_07 DATETIME NULL, c_08 TINYINT SIGNED,
                 PRIMARY KEY (c_05));

What's the default engine for t1 in nuts?
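For reference, one way to check which engine the table actually ended up with (just a sketch; it assumes the server from the failing run is still available and that t1 lives in the test schema):

-- Show the full table definition, including the ENGINE clause.
SHOW CREATE TABLE test.t1;
-- Or query the data dictionary directly.
SELECT ENGINE FROM INFORMATION_SCHEMA.TABLES
 WHERE TABLE_SCHEMA = 'test' AND TABLE_NAME = 't1';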
[24 Jan 2011 11:16]
Daogang Qu
The problem still can't be reproduced even when using a complex data structure for the table, as follows:

-- source include/not_valgrind.inc
-- source include/not_embedded.inc
-- source include/master-slave.inc
-- source include/have_debug.inc
-- source include/have_binlog_format_row.inc

# Reset master
connection slave;
--source include/stop_slave.inc
connection master;
RESET MASTER;
connection slave;
--source include/start_slave.inc
connection master;

call mtr.add_suppression("Attempting backtrace");
call mtr.add_suppression("allocated tablespace *., old maximum was 0");
call mtr.add_suppression("Error in Log_event::read_log_event()");
call mtr.add_suppression("Buffered warning: Performance schema disabled");

-- let $binlog_start= query_get_value(SHOW MASTER STATUS, Position, 1)
-- let $binlog_file= query_get_value(SHOW MASTER STATUS, File, 1)

#CREATE TABLE t1(a LONGBLOB) ENGINE=INNODB;
CREATE TABLE t1 (c_01 TINYINT SIGNED, c_02 DOUBLE SIGNED, c_03 MEDIUMINT SIGNED,
                 c_04 SMALLINT UNSIGNED NULL, c_05 BIGINT UNSIGNED NOT NULL,
                 c_06 DOUBLE UNSIGNED, c_07 DATETIME NULL, c_08 TINYINT SIGNED,
                 PRIMARY KEY (c_05)) ENGINE=INNODB;
CREATE TABLE t2(a LONGBLOB) ENGINE=INNODB;

-- echo # Test case1: Set DEBUG POINT before binlog to make
-- echo # the master crash for transaction
DELETE FROM t1 WHERE c_05 = 9785199741266690048;
UPDATE t1 SET c_05 = 9785199741266690048, c_04= 14772, c_02= 0 WHERE c_05 = 9785199741266690048;
BEGIN;
INSERT INTO t1 VALUES (0, 0, 0, 16849, 5025236370650103808, 0.22E-22, '6695-06-17 14:46:03', 0);
UPDATE t1 SET c_05 = 13382162220930826240, c_01= 0, c_03= -2556622, c_02= 0, c_08= -28 WHERE c_05 = 13382162220930826240;
COMMIT;
BEGIN;
REPLACE INTO t1 VALUES (0, 0, -2780698, 36412, 5169992599251648512, 0, '9991-09-27 13:33:12', 0);
UPDATE t1 SET c_05 = 5169992599251648512, c_04= 16331, c_07= '3827-03-28 01:24:56', c_03= -3763596, c_06= 0, c_08= 0 WHERE c_05 = 5169992599251648512;
REPLACE INTO t1 VALUES (0, 0.43E-113, -3896527, 4379, 5555829872288268288, 0.80E-284, '6241-04-04 07:10:44', 0);
BEGIN;
COMMIT;
INSERT INTO t1 VALUES (-32, 0, -2042839, 51290, 8041760290208677888, 0.59E-27, '9391-09-17 15:07:00', 0);
DELETE FROM t1 WHERE c_05 = 18047999901170139136;
UPDATE t1 SET c_05 = 5555829872288268288, c_07= '9238-02-24 20:18:45' WHERE c_05 = 5555829872288268288;
BEGIN;
ALTER TABLE t1 DROP c_08;
REPLACE INTO t1 VALUES (46, 0, 0, 8894, 10602834005396160512, 0, '9623-12-02 13:34:23');
BEGIN;
COMMIT;
BEGIN;
INSERT INTO t2 (a) VALUES (REPEAT('a',4));
INSERT INTO t2 (a) VALUES (REPEAT('a',4));
DELETE FROM t1 WHERE c_05 = 5025236370650103808;
INSERT INTO t2 (a) VALUES (REPEAT('a',4));

# Write file to make mysql-test-run.pl expect crash and restart
-- exec echo "wait" > $MYSQLTEST_VARDIR/tmp/mysqld.1.expect

SET SESSION debug="d,crash_commit_after_prepare";

# Run the crashing query
-- error 2013
COMMIT;
-- source include/wait_until_disconnected.inc
-- enable_reconnect
-- echo # Restart the master server
-- exec echo "restart" > $MYSQLTEST_VARDIR/tmp/mysqld.1.expect
-- source include/wait_until_connected_again.inc
-- disable_reconnect

-- echo # Test the transaction statements will not be binlogged
-- source include/show_binlog_events.inc

-- echo # On master, test the data will be rolled back after restart.
SELECT COUNT(*) FROM t1;

sync_slave_with_master;
-- echo # On slave, test replication will work fine, and the data
-- echo # is not replicated
SELECT COUNT(*) FROM t1;

connection master;
DROP TABLE t1, t2;
sync_slave_with_master;
-- source include/stop_slave.inc

So the above test guarantees that the transaction is crash-safe when the crash_commit_after_prepare point is set. Some other problem must be causing the inconsistency, so the problem is outside the scope of WL#5493.
[25 Jan 2011 8:34]
Daogang Qu
All the tests pass if we set both crash_commit_before and crash_commit_after_prepare as crash points, as follows:

my @crash_points = (
    "crash_commit_before",
    "crash_commit_after_prepare"
    #"crash_commit_after_log",
    #"crash_commit_after",
    #"crash_before_writing_xid",
    #"half_binlogged_transaction"
);

daogang@PC:~/bzrwork/pushwl5493/nuts$ ./bin/nuts.pl --build=../../mysql-trunk-wl5493 --suite=rep_func --test=crash_safe_master --verbose --combination=row-crashpoint1
[16:19:16] rep_func::crash_safe_master-row-crashpoint1 ..
1..14
ok - executing command SET GLOBAL BINLOG_FORMAT='ROW'
ok - executing command SET SESSION BINLOG_FORMAT='ROW'
ok - executing command SET GLOBAL BINLOG_FORMAT='ROW'
ok - attach slave 2 to master 1
ok - synchronize slave 2 with master 1
ok - executing command CREATE DATABASE IF NOT EXISTS test
ok - executing command USE test
# BEGIN
......
# SET SESSION DEBUG='d,crash_commit_after_prepare'
# COMMIT
ok - executing command SET SESSION DEBUG='d,crash_commit_after_prepare'
ok - executing command SET GLOBAL BINLOG_FORMAT='ROW'
ok - executing command SET SESSION BINLOG_FORMAT='ROW'
ok - start slave 2
ok - wait that slave 2 starts.
ok - synchronize slave 2 with master 1
ok - Databases have no difference
ok 152863 ms
[16:21:50]
All tests successful.
Files=1, Tests=14, 154 wallclock secs ( 0.54 usr 0.14 sys + 8.05 cusr 0.53 csys = 9.26 CPU)
Result: PASS
daogang@PC:~/bzrwork/pushwl5493/nuts$
[25 Jan 2011 8:46]
Daogang Qu
No test is executed if we set only crash_commit_after_prepare as the crash point, as follows:

my @crash_points = (
    "crash_commit_after_prepare"
);

daogang@PC:~/bzrwork/pushwl5493/nuts$ ./bin/nuts.pl --build=../../mysql-trunk-wl5493 --suite=rep_func --test=crash_safe_master --verbose --combination=row-crashpoint1
[16:38:06]
Files=0, Tests=0, 0 wallclock secs ( 0.02 usr + 0.00 sys = 0.02 CPU)
Result: NOTESTS

So I don't understand how data could be lost after a crash at the crash_commit_after_prepare point for the binlog crash-safe work when we have no tests at all on that crash point. It's a test issue, not a crash-safe issue.
[25 Jan 2011 10:18]
Serge Kozlov
This Nuts test uses "combinations": the same test is run several times with different options. The original test is rep_func::crash_safe_master. The combinations are a list (array) of binlog formats and a list of crash points. Running a test with a combination has the following format: <test_name>-<combination1>-<combination2>-<combinationN>. The Nuts option '--combination' filters the combinations. So the command line

<NUTS_HOME>./bin/nuts.pl --build=<PATH_TO_TREE> --suite=rep_func --test=crash_safe_master --verbose --combination=row-crashpoint1

means "run test crash_safe_master from suite rep_func only for combination row-crashpoint1". "crashpoint1" is $crash_points[1], in other words the second value of that array. It is a mistake to ask for the second combination when the array has only one element.
[25 Jan 2011 19:16]
Serge Kozlov
Test case for MTR
Attachment: bug59622.tgz (application/x-gzip, text), 148.62 KiB.
[25 Jan 2011 19:18]
Serge Kozlov
The test case added above is based on the NUTS dump of statements for rep_func::crash_safe_master and on the MTR test (test1) rpl_crash_safe_master.test.
[26 Jan 2011 20:31]
Serge Kozlov
The test is updated. It now supports engines, and the correct command line is the following:

<NUTS_HOME>./bin/nuts.pl --build=<PATH_TO_TREE> --suite=rep_func --test=crash_safe_master --verbose --combination=row-innodb-crashpoint1
[27 Jan 2011 11:05]
Daogang Qu
We can reproduce the problem based on the above huge test. The problem is that the crashed transaction is not rolled back successfully by the InnoDB engine (and maybe other engines) during server recovery. The strange thing is that hton->rollback_by_xid(...) is called and executes successfully. Please take a look:

static my_bool xarecover_handlerton(THD *unused, plugin_ref plugin, void *arg)
{
  .....
  hton->rollback_by_xid(hton, info->list+i);
  .....
}

innobase_rollback_by_xid(
/*=====================*/
        handlerton*     hton,   /*!< in: InnoDB handlerton */
        XID*            xid)    /*!< in: X/Open XA transaction identification */
{
        trx_t*  trx;

        DBUG_ASSERT(hton == innodb_hton_ptr);

        trx = trx_get_trx_by_xid(xid);

        if (trx) {
                return(innobase_rollback_trx(trx));
        } else {
                return(XAER_NOTA);
        }
}

static int innobase_rollback_trx(
/*==================*/
        trx_t*  trx)    /*!< in: transaction */
{
        int     error = 0;

        DBUG_ENTER("innobase_rollback_trx");
        DBUG_PRINT("trans", ("aborting transaction"));

        /* Release a possible FIFO ticket and search latch. Since we will
        reserve the trx_sys_t::mutex, we have to release the search system
        latch first to obey the latching order. */

        innobase_release_stat_resources(trx);

        /* If we had reserved the auto-inc lock for some table (if
        we come here to roll back the latest SQL statement) we
        release it now before a possibly lengthy rollback */

        lock_unlock_table_autoinc(trx);

        error = trx_rollback_for_mysql(trx);

        DBUG_RETURN(convert_error_code_to_mysql(error, 0, NULL));
}

The above call chain works fine. So why can't InnoDB roll back the transaction successfully during server recovery?
[27 Jan 2011 12:22]
Marko Mäkelä
I have trouble understanding the note [27 Jan 12:05] by Daogang Qu. Crash recovery is supposed to work like this in InnoDB:

(1) apply the redo log (ib_logfile*) to reconstruct the pages that were dirty in the buffer pool during the crash,
(2) if a DDL transaction was running during the crash, roll it back,
(3) initiate a background thread to roll back all transactions that were active (not rolled back, committed, or XA PREPAREd) during the crash.

Client requests can be served during step (3). Clients can do XA RECOVER to list the prepared transactions inside InnoDB, and then do XA ROLLBACK or XA COMMIT. Bug #59440 fixes a race condition there.

I do not know the crash recovery outside InnoDB. Apparently, the server must store some XA information in the binlog, and it may automatically issue XA ROLLBACK (or possibly XA COMMIT) under certain conditions.

Daogang, can you please restate your question? Are you suggesting that XA ROLLBACK returns OK, even though InnoDB is not rolling back the transaction? If so, how did you determine that the transaction still exists? SELECT * FROM INFORMATION_SCHEMA.INNODB_TRX?
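For illustration, the manual path described above might look roughly like this (a sketch only; 'trx1' stands in for the xid of an externally managed XA transaction, not the server's internal binlog xid, which is resolved automatically during recovery):

-- List transactions still in PREPARED state inside InnoDB after recovery.
XA RECOVER;
-- Inspect active InnoDB transactions.
SELECT trx_id, trx_state, trx_started FROM INFORMATION_SCHEMA.INNODB_TRX;
-- Resolve a prepared, externally managed XA transaction by its xid:
XA ROLLBACK 'trx1';   -- or: XA COMMIT 'trx1'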
[28 Jan 2011 9:50]
Daogang Qu
Hi Marko, the patch for Bug #59440 is not based on the mysql-trunk tree, so it is not compatible with the mysql-trunk source code. The compiler complained that it can't find the 'conc_state' member of trx_t.
[28 Jan 2011 9:51]
Daogang Qu
Hi Marko, the scenario is as follows:

1. Set a DEBUG POINT at the 'crash_commit_after_prepare' crash point to make a transaction crash there.

2. Then restart the server after the crash. The prepared transaction should be rolled back during server recovery; normally the INSERT statement in the transaction will not actually insert data into the InnoDB tables, because the transaction is rolled back by invoking innobase_rollback_by_xid(...) (please correct me if I'm wrong). But in the crashed transaction of the above huge test, the data is inserted into the InnoDB table. That is the problem.

3. The strange thing is that the xid saved in the prepare phase of the transaction is retrieved correctly, and the transaction is looked up correctly by that xid when innobase_rollback_by_xid(...) is invoked. It looks as if the crashed data is rolled back, so why does the data end up inserted into the InnoDB table by that transaction?

4. To check step 3, I added some trace output in the innobase_rollback_by_xid(...) and ha_commit_trans(...) functions before the 'crash_commit_after_prepare' crash point, as follows:

innobase_rollback_by_xid(
/*=====================*/
        handlerton*     hton,   /*!< in: InnoDB handlerton */
        XID*            xid)    /*!< in: X/Open XA transaction identification */
{
        trx_t*  trx;

        DBUG_ASSERT(hton == innodb_hton_ptr);

        fprintf(stderr, "rollback input formatid== %ld, gtrid_length== %ld, bqual_length== %ld, data== %s. \n",
                xid->formatID, xid->gtrid_length, xid->bqual_length, xid->data);

        trx = trx_get_trx_by_xid(xid);

        fprintf(stderr, "rollback get trx formatid== %ld, gtrid_length== %ld, bqual_length== %ld, data== %s. \n",
                trx->xid.formatID, trx->xid.gtrid_length, trx->xid.bqual_length, trx->xid.data);

        if (trx) {
                return(innobase_rollback_trx(trx));
        } else {
                return(XAER_NOTA);
        }
}

int ha_commit_trans(THD *thd, bool all)
{
  ......
      if (! ha_info->is_trx_read_write())
        continue;
      /*
        Sic: we know that prepare() is not NULL since otherwise
        trans->no_2pc would have been set.
      */
      if ((err= ht->prepare(ht, thd, all)))
      {
        my_error(ER_ERROR_DURING_COMMIT, MYF(0), err);
        error= 1;
      }
      status_var_increment(thd->status_var.ha_prepare_count);
    }
    if (is)
      sql_print_information("prepare input formatid== %ld, gtrid_length== %ld, bqual_length== %ld, data== %s. \n",
        thd->transaction.xid_state.xid.formatID, thd->transaction.xid_state.xid.gtrid_length,
        thd->transaction.xid_state.xid.bqual_length, thd->transaction.xid_state.xid.data);
    DBUG_EXECUTE_IF("crash_commit_after_prepare", DBUG_SUICIDE(););
  ......
}

5. Run the test that reproduces the problem; the log then shows the following:

110128 12:17:32 [Note] prepare input formatid== 1, gtrid_length== 20, bqual_length== 0, data== MySQLXid^A.
......
110128 12:17:37 [Note] Starting crash recovery...
110128 12:17:37  InnoDB: Starting recovery for XA transactions...
110128 12:17:37  InnoDB: Transaction 6E7 in prepared state after recovery
110128 12:17:37  InnoDB: Transaction contains changes to 1 rows
110128 12:17:37  InnoDB: 1 transactions in prepared state after recovery
110128 12:17:37 [Note] Found 1 prepared transaction(s) in InnoDB
110128 12:17:37 [Note] rollback xid 'MySQLXid\1\0\0\0x\3\0\0\0\0\0\0'
rollback input formatid== 1, gtrid_length== 20, bqual_length== 0, data== MySQLXid^A.
rollback get trx formatid== 1, gtrid_length== 20, bqual_length== 0, data== MySQLXid^A.
110128 12:17:37 [Note] Crash recovery finished.
110128 12:17:37 [Note] Event Scheduler: Loaded 0 events

Based on the above analysis we can infer that this is an InnoDB problem. Marko, please take over the bug if the inference is reasonable. Thanks!
[28 Jan 2011 9:54]
Daogang Qu
The problem can be reproduced the attached test file.
Attachment: binlog_bug59622.test (application/octet-stream, text), 151.51 KiB.
[28 Jan 2011 9:55]
Daogang Qu
s/ reproduced / reproduced by
[3 Feb 2011 11:05]
Marko Mäkelä
With some help from Luis, I tweaked the binlog_bug59622.test so that I can reproduce the bug. I am now reducing the test case. The difference is on table t3. Curiously, if I replace

CREATE TABLE t0 SELECT * FROM t10;
RENAME TABLE t0 TO t3;

with

CREATE TABLE t3 SELECT * FROM t10;

then everything seems to be fine.
[3 Feb 2011 12:10]
Marko Mäkelä
Reduced test case (fails some of the time)
Attachment: innodb_bug59622.test (application/octet-stream, text), 3.01 KiB.
[9 Feb 2011 9:25]
Marko Mäkelä
During the binlog XA recovery, there is a delete-marked duplicate entry for SYS_TABLES.ID=0x11 that points to test/t0. Its successor entry is not delete-marked and points to test/t3. We could work around this inconsistency by advancing to the next matching record in dict_load_table_on_id() as long as the first matching records are delete-marked. Because the clustered index on SYS_TABLES.NAME is definitely unique, there is no need for a similar workaround in dict_load_table(). I will have to find out why the duplicate entry is possible and if it will eventually be purged. The index on SYS_TABLES.ID (DICT_TABLE_IDS_ID) is supposed to be unique.
[9 Feb 2011 12:43]
Marko Mäkelä
The immediate bug is that dict_load_table_on_id() fails to skip matching but delete-marked records. If the function is invoked before the purge gets a chance to run, it may fail to load the table. While testing this, I noticed a lot of garbage (delete-marked records) in the index ID_IND on SYS_TABLES.ID that was never purged. This appears to be caused by row_ins_must_modify(), which returns 0 when it should return ROW_INS_NEXT. The only caller of row_ins_must_modify() is prepared for the ROW_INS_NEXT return value, performing an insert as update-in-place when cursor->up_match >= enough_match. Looking at the source code, both bugs have been there ever since MySQL 3.23.53. Next, I will try to repeat this on MySQL 5.5 and 5.1.
[9 Feb 2011 13:47]
Marko Mäkelä
The dict_load_table_on_id() bug was fixed in Bug #53756 in MySQL 5.1 (built-in and InnoDB plugin), but for some reason it was apparently not merged to 5.5 or trunk. This explains why the test case is not repeatable in 5.1 but is in 5.5. The delete-marked garbage is there in the 5.1 built-in and the 5.1 plugin and probably in all versions of InnoDB. I have filed Bug #60049 for that.
[9 Feb 2011 13:59]
Marko Mäkelä
Jimmy, please find out why the merge of Bug #53756 was lost in 5.5 and trunk, and merge it again.
[10 Feb 2011 8:03]
Marko Mäkelä
This is a duplicate of Bug #53756. Let us track the lost merge there.