Bug #31038 if the server locks up, mysql appears to write gibberish into the data file
Submitted: 15 Sep 2007 5:29 Modified: 24 Sep 2007 17:44
Reporter: Maurice Volaski
Status: Closed
Category: MySQL Server: InnoDB storage engine Severity: S2 (Serious)
Version: 5.0.48 OS: Linux (Gentoo)
Assigned to: Heikki Tuuri CPU Architecture:Any
Tags: database page corruption locking up corrupt doublewrite

[15 Sep 2007 5:29] Maurice Volaski
Description:
The OS locked up independently of MySQL. On reboot, the database was hosed. I don't know if this is the same bug as http://bugs.mysql.com/31008, but I don't think so, because that bug seems wholly intrinsic to mysql: just run mysqldump and mysql corrupts the data files. Here the underlying disk became unresponsive, and that somehow caused gibberish to be written into the data files. Or it could be that the data files are intact and the recovery mechanism is broken. Either way, this looks like a separate bug. And I've never seen it in 4.0.x. That version didn't mind server crashes.

How to repeat:
You could try yanking the power cord out of a server and see if the database gets trashed. And we're talking about a database that probably was not being accessed, let alone written to, at the time.

I can privately provide the data set on which this and the other crash occur, in case the problem is related to the data set.
[15 Sep 2007 5:30] Maurice Volaski
Here are the details of the crash

070915  1:14:01  InnoDB: Database was not shut down normally!
InnoDB: Starting crash recovery.
InnoDB: Reading tablespace information from the .ibd files...
InnoDB: Restoring possible half-written data pages from the doublewrite
InnoDB: buffer...
InnoDB: Warning: database page corruption or a failed
InnoDB: file read of page 206.
InnoDB: Trying to recover it from the doublewrite buffer.
InnoDB: Dump of the page:

070915  1:14:02  InnoDB: Page checksum 2267073817, prior-to-4.0.14-form checksum 3028325383
InnoDB: stored checksum 10432729, prior-to-4.0.14-form stored checksum 3028325383
InnoDB: Page lsn 0 514076, low 4 bytes of lsn at page end 514076
InnoDB: Page number (if stored to page already) 206,
InnoDB: space id (if created with >= MySQL-4.1.1 and stored already) 0
InnoDB: Page may be an index page where index id is 0 24
InnoDB: Also the page in the doublewrite buffer is corrupt.
InnoDB: Cannot continue operation.
InnoDB: You can try to recover the database with the my.cnf
InnoDB: option:
InnoDB: set-variable=innodb_force_recovery=6
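
To read the page dump above more easily, the checksum and LSN lines can be compared mechanically. The following is only a sketch: the helper name `summarize_page_dump` is hypothetical, and the regular expressions assume the exact wording of the log lines quoted in this report. It reports which pairs of values agree and makes no claim about why they differ.

```python
import re

# Patterns follow the wording of the InnoDB log lines quoted in this report.
CALC_RE = re.compile(
    r"Page checksum (\d+), prior-to-4\.0\.14-form checksum (\d+)")
STORED_RE = re.compile(
    r"stored checksum (\d+), prior-to-4\.0\.14-form stored checksum (\d+)")
LSN_RE = re.compile(
    r"Page lsn (\d+) (\d+), low 4 bytes of lsn at page end (\d+)")


def summarize_page_dump(log_text):
    """Compare calculated vs. stored values printed in a page dump.

    Hypothetical helper, not part of MySQL or InnoDB.
    """
    calc = CALC_RE.search(log_text)
    stored = STORED_RE.search(log_text)
    lsn = LSN_RE.search(log_text)
    if not (calc and stored and lsn):
        raise ValueError("log text does not contain a complete page dump")
    return {
        # Calculated checksum vs. the one stored in the page.
        "new_checksum_ok": calc.group(1) == stored.group(1),
        # Same comparison for the prior-to-4.0.14-form checksum.
        "old_checksum_ok": calc.group(2) == stored.group(2),
        # Low word of the page LSN vs. the copy at the end of the page.
        "lsn_ok": int(lsn.group(2)) == int(lsn.group(3)),
    }


SAMPLE = """\
070915  1:14:02  InnoDB: Page checksum 2267073817, prior-to-4.0.14-form checksum 3028325383
InnoDB: stored checksum 10432729, prior-to-4.0.14-form stored checksum 3028325383
InnoDB: Page lsn 0 514076, low 4 bytes of lsn at page end 514076
"""

if __name__ == "__main__":
    print(summarize_page_dump(SAMPLE))
```

For the dump in this report, only the first comparison fails: the calculated page checksum (2267073817) differs from the stored one (10432729), while the old-form checksums and the two LSN values match.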
[17 Sep 2007 11:27] Heikki Tuuri
Maurice,

Which Linux version are you using, and on which hardware?

Do you use NFS or some other exotic file system?

Page checksum errors on disk are probably caused by bad hardware or OS bugs. InnoDB is a transactional database. It should survive an OS crash or a power outage.

Can you attach the entire .err log, gzipped? The first error printed is often the most interesting one.

Regards,

Heikki
[17 Sep 2007 12:23] Heikki Tuuri
Here is a large-scale test from CERN about Linux file corruption:

http://fuji.web.cern.ch/fuji/talk/2007/kelemen-2007-C5-Silent_Corruptions.pdf
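
One simple way to look for this kind of silent corruption on your own hardware is a write-then-verify probe: write a file with a known pattern, record its digest, and re-read it later. The sketch below is not the CERN tool and makes no claim about how that study was run; the function names `write_pattern_file` and `verify_file` and the 4096-byte chunk size are assumptions chosen for illustration.

```python
import hashlib
import os
import tempfile


def write_pattern_file(path, size_bytes, chunk=4096):
    """Write a predictable pattern and return its SHA-256 hex digest."""
    digest = hashlib.sha256()
    with open(path, "wb") as f:
        written = 0
        counter = 0
        while written < size_bytes:
            # Each chunk repeats an 8-byte counter, trimmed at the file end.
            block = (counter.to_bytes(8, "big") * (chunk // 8))
            block = block[: size_bytes - written]
            f.write(block)
            digest.update(block)
            written += len(block)
            counter += 1
        f.flush()
        os.fsync(f.fileno())  # ask the OS to push the data to the device
    return digest.hexdigest()


def verify_file(path, expected_digest, chunk=4096):
    """Re-read the file and compare its digest with the recorded one."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for block in iter(lambda: f.read(chunk), b""):
            digest.update(block)
    return digest.hexdigest() == expected_digest


if __name__ == "__main__":
    probe_dir = tempfile.mkdtemp()
    probe_path = os.path.join(probe_dir, "probe.bin")
    recorded = write_pattern_file(probe_path, 1024 * 1024)
    print("file intact:", verify_file(probe_path, recorded))
```

Note that an immediate re-read is usually served from the OS page cache, so a meaningful check would re-read the file much later, or after a reboot, so that the data actually comes back from the storage path under suspicion.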
[17 Sep 2007 12:42] MySQL Verification Team
Please answer Heikki's question. Thanks in advance.
[17 Sep 2007 16:12] Maurice Volaski
It is 64-bit Gentoo with kernel 2.6.22-r3 and then, before the last crash, r6.

The filesystem is ext3 and it is running on top of drbd, which is a network RAID-1 kernel module. That was version 8.0.5 during the crashes.

I am sending the error log, which goes back a few weeks, so you will see several crashes in there. Many of them happened after a dump was taken (3:10 AM timestamps), and perhaps a few at other times. The last one was after the server locked up.
[17 Sep 2007 16:14] Maurice Volaski
mysql error log

Attachment: mysqld.err.gz (application/x-gzip, text), 67.30 KiB.

[17 Sep 2007 16:49] Heikki Tuuri
Maurice, my first guess is to suspect the RAID-1 driver.
[24 Sep 2007 17:41] Maurice Volaski
This bug can be closed. The general consensus on the mailing lists is that this was due to faulty hardware and I indeed confirmed it was a bad PCI riser card.
[24 Sep 2007 17:44] MySQL Verification Team
Thank you for the feedback.