Bug #75697 Add a consistency check against DB_TRX_ID being in the future
Submitted: 30 Jan 2015 12:28 Modified: 11 May 2015 13:25
Reporter: Marko Mäkelä Email Updates:
Status: Closed Impact on me:
None 
Category:MySQL Server: InnoDB storage engine Severity:S3 (Non-critical)
Version:3.23.53 OS:Any
Assigned to: CPU Architecture:Any

[30 Jan 2015 12:28] Marko Mäkelä
Description:
This was originally reported in a forum thread:
http://forums.mysql.com/read.php?22,604726,604726

InnoDB can seemingly be rolled back to the past, not seeing any records that were committed days or weeks ago.

I finally repeated this by shutting down the server, resetting TRX_SYS_TRX_ID_STORE to a smaller value, and starting the server.
All records would be visible at the READ UNCOMMITTED isolation level or when using --innodb-read-only, but not by default (REPEATABLE READ).

Note: This is not a way to properly roll back to the past. We do not have the history information for records that were updated or deleted. I only repeated this by using the INSERT statement where MVCC is based on a simple trx_id comparison (it does not attempt to access any undo log records).

When investigating this, I noticed that we do not have any debug assertions failing. It is a serious error if an index page contains transaction IDs that are in the future.

How to repeat:
Start up an empty database with --innodb-checksum-algorithm=none

Create a table and insert some records.
Shut down and restart the server.
Insert some more records.
Shut down the server.

In ibdata1, in the TRX_SYS page, reset TRX_SYS_TRX_ID_STORE to 0x500.
This will cause trx_sys->max_trx_id to be initialized to 0x700 at the next startup.
This 8-byte field is on page 5 at byte offset 38, that is, 5*16384+38 = 81958 bytes from the start of the file.

Start up the server.
SELECT from your table, and see that not all records are present.
SET TRANSACTION ISOLATION LEVEL READ UNCOMMITTED;
Repeat the SELECT, and see that all records are present.

Note that no debug assertion failed, even though our transaction system is logically corrupted.

Suggested fix:
In ReadView::changes_visible(trx_id_t id), add
ut_ad(id <= trx_sys->max_trx_id);
[23 Feb 2015 12:27] Marko Mäkelä
Posted by developer:
 
I reproduced this in MySQL 5.5.43. I am convinced that every InnoDB version is affected.

I created the following test as innodb.foo:

CREATE TABLE foo(a INT PRIMARY KEY)engine=innoDB;
INSERT INTO foo VALUES(1),(2),(3),(4),(5);

let MYSQLD_DATADIR=`select @@datadir`;
-- exec echo "wait" > $MYSQLTEST_VARDIR/tmp/mysqld.1.expect
-- shutdown_server
-- source include/wait_until_disconnected.inc
perl;
my $file = "$ENV{MYSQLD_DATADIR}/ibdata1";
open(FILE, "+<", $file) || die "Unable to open $file";
seek(FILE, 5*16384+38+4, 0) || die "Unable to seek $file";
die unless read(FILE, $_, 4) == 4;
print "trx_id=", unpack("H*", $_), "\n";
seek(FILE, 5*16384+38, 0) || die "Unable to seek $file";
syswrite(FILE, pack("N*", 0, 0)) || die "Unable to write $file";
close(FILE)
EOF

-- exec echo "restart" > $MYSQLTEST_VARDIR/tmp/mysqld.1.expect
-- enable_reconnect
-- source include/wait_until_connected_again.inc
-- disable_reconnect

SELECT * FROM foo;
DROP TABLE foo;

Then, run it as
./mtr --mysqld=--skip-innodb-checksums innodb.foo

The DROP TABLE will crash, because the transaction system is corrupted:

Version: '5.5.43-debug-log'  socket: '/dev/shm/u/mysql-test/var/tmp/mysqld.1.sock'  port: 13000  Source distribution
150223 15:21:10  InnoDB: Error: transaction id associated with record
PHYSICAL RECORD: n_fields 5; 1-byte offsets; info bits 32
 0: len 8; hex 000000000000000f; asc         ;;
 1: len 4; hex 00000000; asc     ;;
 2: len 6; hex 000000000500; asc       ;;
 3: len 7; hex 810000012d014e; asc     - N;;
 4: len 1; hex 61; asc a;;
InnoDB: in index "CLUST_IND" of table "SYS_FIELDS"
InnoDB: is 500 which is higher than the global trx id counter 202!
InnoDB: The table is corrupt. You have to do dump + drop + reimport.
150223 15:21:10  InnoDB: Assertion failure in thread 140031783208704 in file row0upd.ic line 176
InnoDB: Failing assertion: lock_check_trx_id_sanity(trx_read_trx_id(rec + offset), rec, index, offsets, (0))

But, the SELECT returned wrong (empty) result without flagging any problems to the error log or the client.
The result to the mtr client was as follows:

CREATE TABLE foo(a INT PRIMARY KEY)engine=innoDB;
INSERT INTO foo VALUES(1),(2),(3),(4),(5);
trx_id=00000500
SELECT * FROM foo;
a
innodb.foo                               [ fail ]

Note that this test is only a proof-of-concept. For a proper mtr test, some special measures would be needed in order to bump the trx_id back to the correct value and to continue running other tests. Also, without disabling checksums, the server restart would fail:

CREATE TABLE foo(a INT PRIMARY KEY)engine=innoDB;
INSERT INTO foo VALUES(1),(2),(3),(4),(5);
trx_id=00000500
innodb.foo                               [ fail ]
        Test ended at 2015-02-23 14:25:22

CURRENT_TEST: innodb.foo

Server [mysqld.1 - pid: 11503, winpid: 11503, exit: 256] failed during test run
Server log from this test:
----------SERVER LOG START-----------
[snip]
InnoDB: Database page corruption on disk or a failed
InnoDB: file read of page 5.
InnoDB: You may have to recover from a backup.
[11 May 2015 13:25] Daniel Price
Posted by developer:
 
Fixed as of the upcoming 5.7.8, 5.8.0 releases, and here's the changelog entry:

A warning message is now printed if "DB_TRX_ID" stored in a record is
found to be greater than "max_trx_id". In debug builds, an assertion is
raised.