MySQL Bugs: #34211: Innodb data file corruption

Bug #34211	Innodb data file corruption
Submitted:	1 Feb 2008 2:17	Modified:	30 Dec 2012 10:20
Reporter:	Mark Callaghan
Status:	Can't repeat
Category:	MySQL Server: InnoDB storage engine	Severity:	S3 (Non-critical)
Version:	5.0.37	OS:	Any
Assigned to:	Heikki Tuuri	CPU Architecture:	Any
Tags:	corruption, dump, innodb, page

Description:
I want to track this as a possible problem that needs more data. I am gathering more data and trying to get more machines for reproduction tests. Maybe someone else has the same problem.

After switching from 4.0.26 to 5.0.37, we began to use innodb_file_per_table and we noticed more errors like this:
InnoDB: Database page corruption on disk or a failed
InnoDB: file read of page 84447.
InnoDB: You may have to recover from a backup.
080120 8:53:52 InnoDB: Page dump in ascii and hex (16384 bytes):

To be honest, we only think we have more errors, because we began to measure them after switching. To rule out innodb_file_per_table, I have begun to run IO stress tests on two sets of 10 servers that are all running replication and query workloads. One set uses innodb_file_per_table, the other uses SW RAID 0. The test has run for 560 machine days (56 days * 10 machines) per set. The set with innodb_file_per_table has 2 failures. The set with SW RAID 0 has 0 failures. This is interesting, but not significant yet. From other measurements, I think these servers have an MTBF of 600 days, so there is a 39% chance of 0 failures in 560 machine days: (599/600) ^ 560. I need 0 failures in a few thousand machine days before I will be certain that innodb_file_per_table is a problem.
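The 39% figure can be checked with a few lines of Python. This is a minimal sketch, assuming each machine day fails independently with probability 1/MTBF; the function name `prob_zero_failures` is made up for illustration.

```python
def prob_zero_failures(mtbf_days: float, machine_days: int) -> float:
    """Chance of no failures, assuming each machine day fails
    independently with probability 1 / mtbf_days."""
    return (1.0 - 1.0 / mtbf_days) ** machine_days


# The case from the description: MTBF of 600 days, 560 machine days.
p_560 = prob_zero_failures(600, 560)
print(f"560 machine days: {p_560:.3f}")

# How the chance of a clean run shrinks as the test runs longer.
for days in (1000, 2000, 3000):
    print(f"{days} machine days: {prob_zero_failures(600, days):.4f}")
```

Under the same assumption, a clean run over a few thousand machine days would be very unlikely to happen by chance, which is why a test of that length is more convincing than 560 machine days.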

These servers have also been changed to compute the checksum on buffer cache pages after they have been written to confirm that corrupt data has not been written by MySQL. All of those checks have passed.

In most or all of these cases, the new style checksum does not match:
080120 8:53:52 InnoDB: Page checksum 532784171, prior-to-4.0.14-form checksum 475158686
InnoDB: stored checksum 2491121728, prior-to-4.0.14-form stored checksum 475158686
InnoDB: Page lsn 22 3889944710, low 4 bytes of lsn at page end 3889944710
InnoDB: Page number (if stored to page already) 84447,
InnoDB: space id (if created with >= MySQL-4.1.1 and stored already) 16
InnoDB: Page may be an index page where index id is 0 64
InnoDB: (index PRIMARY of table foo/bar)
InnoDB: Database page corruption on disk or a failed
InnoDB: file read of page 84447.

How to repeat:
Read above. Use many machines for many days with 10+ concurrent query sessions and a replication workload.

Suggested fix:
NA

Mark,

please show the .err log. Most of these reports are OS/driver/hardware problems. But some are due to InnoDB bugs.

Regards,

Heikki

No feedback was provided for this bug for over a month, so it is
being suspended automatically. If you are able to provide the
information that was originally requested, please do so and change
the status of the bug back to "Open".

Mark,

Heikki pasted his question internally by accident. Here is the question:

Mark,

is the .err log available somewhere?

--Heikki

As we are currently also facing one of these page checksum mismatch errors:
Has anybody noted that in about half of the reported cases of checksum mismatches, the prior-to-4.0.14-form DOES match the stored value?
Is that checksum just computed via a different algorithm or over a different part of the data set?
If it's just a different algorithm: How likely is it that a hardware failure (these checksum mismatches seem to be routinely attributed to hardware issues) corrupts one checksum, but leaves the other intact?

The server we are having the issue on has ECC RAM and data is on a software RAID.

The old checksum is computed over a few (10?, 20?) bytes at the head and tail of the page. The new checksum is computed over all of the page.

Edgar, it's very likely if you have RAID. RAID will place a page so that part of the page is on one disk drive and part on another. If one drive loses a change (power loss or some other problem) then you can get a checksum mismatch. If the change is only to the data part then only the newer checksum value will detect the difference. It's much less likely that a single drive system will show one passing and one failing.
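A toy model of that scenario, to show how one checksum can pass while the other fails. This is only a sketch: it uses `zlib.crc32` in place of InnoDB's real checksum functions, and `EDGE`, `old_style`, `new_style` and `torn_page` are hypothetical names and values chosen for illustration.

```python
import zlib

PAGE_SIZE = 16384  # page size reported in the log above
EDGE = 20          # hypothetical: bytes covered at each end by the toy "old" checksum


def old_style(page: bytes) -> int:
    """Toy stand-in for the old checksum: head and tail of the page only."""
    return zlib.crc32(page[:EDGE] + page[-EDGE:])


def new_style(page: bytes) -> int:
    """Toy stand-in for the new checksum: the whole page."""
    return zlib.crc32(page)


def torn_page(old: bytes, new: bytes, lo: int, hi: int) -> bytes:
    """Page as read back when bytes [lo, hi) kept their old contents."""
    return new[:lo] + old[lo:hi] + new[hi:]


old = bytes(PAGE_SIZE)
updated = bytearray(old)
updated[:8] = b"LSN-0002"       # header changes with the write
updated[-8:] = b"LSN-0002"      # so does the trailer
updated[9000:9004] = b"DATA"    # the actual row change, in the data part
new = bytes(updated)

# Checksums that would be stored with the new version of the page.
stored_old, stored_new = old_style(new), new_style(new)

# One drive lost its part of the write: the data part is stale.
on_disk = torn_page(old, new, 8192, 16000)

old_matches = old_style(on_disk) == stored_old
new_matches = new_style(on_disk) == stored_new
print("old-style matches:", old_matches)
print("new-style matches:", new_matches)
```

Because the stale range lies entirely inside the data part, the head and tail bytes read back are the new ones, so only the whole-page checksum notices the lost write.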

How often a page will be split depends on the RAID stripe size. A 16k stripe size will split every page unless the stripe start position is aligned extremely carefully, which is usually not done. With a 256k stripe size, one page in 16 crosses a stripe boundary (each stripe holds parts of 2 split pages, one at each end). This also means that smaller stripe sizes can be very inefficient for reads and writes, because the split cases require twice as many drive accesses as the unsplit ones. The best stripe size depends on the workload and system, but it is more likely to be 256k or larger than smaller.
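The effect of stripe size can be counted directly. This is a rough sketch, assuming 16k pages laid out back to back and a data file that starts `offset` bytes into a stripe; `split_fraction` and the 512-byte misalignment are illustrative choices, not measurements from any real system.

```python
PAGE = 16 * 1024  # 16k page size


def split_fraction(stripe: int, offset: int, pages: int = 4096) -> float:
    """Fraction of pages that cross a stripe boundary when the first
    page starts `offset` bytes into a stripe."""
    split = 0
    for i in range(pages):
        first_byte = offset + i * PAGE
        last_byte = first_byte + PAGE - 1
        # A page is split when its first and last byte fall in different stripes.
        if first_byte // stripe != last_byte // stripe:
            split += 1
    return split / pages


for stripe_kb in (16, 64, 256, 1024):
    misaligned = split_fraction(stripe_kb * 1024, offset=512)
    aligned = split_fraction(stripe_kb * 1024, offset=0)
    print(f"{stripe_kb}k stripe: misaligned {misaligned:.4f}, aligned {aligned:.4f}")
```

Counted per page in this sketch, a misaligned 16k stripe splits every page and a misaligned 256k stripe splits one page in 16; each 256k stripe then holds the pieces of two such pages, one at each end. With perfect alignment nothing is split at either size.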

Mark, I am setting this to "Can't repeat" since the versions (4.0, 5.0) are far outdated by now.