MySQL Bugs: #31641: Recurring InnoDB Crashes

Bug #31641	Recurring InnoDB Crashes
Submitted:	16 Oct 2007 15:16	Modified:	17 Oct 2007 22:33
Reporter:	Daniel Khan	Email Updates:
Status:	Closed	Impact on me:	None
Category:	MySQL Server: InnoDB storage engine	Severity:	S1 (Critical)
Version:	5.0.45	OS:	Linux (2.6.18-4-amd64 (debian etch backport))
Assigned to:	Heikki Tuuri	CPU Architecture:	Any

Description:
Hello,

I have database with large InnoDB Tables (> 1.500.000 rows).
Those tables sometimes went corrupt when I restart the server. It is reproducable if I try to do an optimize table on one of the large ones.
Then the error log shows:

071016 16:29:33  InnoDB: Started; log sequence number 3 196810722
071016 16:29:34 [Note] /usr/sbin/mysqld: ready for connections.
Version: '5.0.45-Debian_1~bpo.1-log'  socket: '/var/run/mysqld/mysqld.sock'  port: 3306  Debian etch distribution
InnoDB: Error: page n:o stored in the page read in is 1847621221, should be 11843!
InnoDB: Database page corruption on disk or a failed
InnoDB: file read of page 11843.
InnoDB: You may have to recover from a backup.
071016 16:29:37  InnoDB: Page dump in ascii and hex (16384 bytes):
 len 16384; hex 3b7373656e207665727a6175626572742077656c74776569742042616e6765722c20
[..]

ch >>Kunst & Kultur>>ArchitekturF   bac46fb72ab64252

                                                                                              p+I%  U     c'  -    ;InnoDB: End of page dump
071016 16:29:37  InnoDB: Page checksum 1775654167, prior-to-4.0.14-form checksum 215858221
InnoDB: stored checksum 997421925, prior-to-4.0.14-form stored checksum 666180653
InnoDB: Page lsn 544695660 1953981801, low 4 bytes of lsn at page end 2885491102
InnoDB: Page number (if stored to page already) 1847621221,
InnoDB: space id (if created with >= MySQL-4.1.1 and stored already) 1684628768
InnoDB: Database page corruption on disk or a failed
InnoDB: file read of page 11843.
InnoDB: You may have to recover from a backup.
InnoDB: It is also possible that your operating
InnoDB: system has corrupted its own file cache
InnoDB: and rebooting your computer removes the
InnoDB: error.
InnoDB: If the corrupt page is an index page
InnoDB: you can also try to fix the corruption
InnoDB: by dumping, dropping, and reimporting
InnoDB: the corrupt table. You can use CHECK
InnoDB: TABLE to scan your table for corruption.
InnoDB: See also InnoDB: http://dev.mysql.com/doc/refman/5.0/en/forcing-recovery.html
InnoDB: about forcing recovery.
InnoDB: Ending processing because of a corrupt database page.
InnoDB: Warning: we did not need to do crash recovery, but log scan
InnoDB: progressed past the checkpoint lsn 3 196810722 up to lsn 3 196810732
071016 16:29:39  InnoDB: Started; log sequence number 3 196810722
071016 16:29:39 [Note] Recovering after a crash using mysql-bin
071016 16:29:39 [Note] Starting crash recovery...
071016 16:29:39 [Note] Crash recovery finished.
071016 16:29:39 [Note] /usr/sbin/mysqld: ready for connections.
Version: '5.0.45-Debian_1~bpo.1-log'  socket: '/var/run/mysqld/mysqld.sock'  port: 3306  Debian etch distribution
InnoDB: Error: page n:o stored in the page read in is 1847621221, should be 11843!
InnoDB: Database page corruption on disk or a failed
InnoDB: file read of page 11843.
InnoDB: You may have to recover from a backup.
071016 16:29:41  InnoDB: Page dump in ascii and hex (16384 bytes):
 len 16384; hex 3b7373656e207665727a6175626572742077
[..]

Afterwards I get something like this:

InnoDB: the corrupt table. You can use CHECK
InnoDB: TABLE to scan your table for corruption.
InnoDB: See also InnoDB: http://dev.mysql.com/doc/refman/5.0/en/forcing-recovery.html
InnoDB: about forcing recovery.
InnoDB: Ending processing because of a corrupt database page.
071016 16:56:00  InnoDB: Database was not shut down normally!
InnoDB: Starting crash recovery.
InnoDB: Reading tablespace information from the .ibd files...
InnoDB: Restoring possible half-written data pages from the doublewrite
InnoDB: buffer...
071016 16:56:00  InnoDB: Starting log scan based on checkpoint at
InnoDB: log sequence number 3 2313616646.
InnoDB: Doing recovery: scanned up to log sequence number 3 2318859264
InnoDB: Doing recovery: scanned up to log sequence number 3 2324102144
InnoDB: Doing recovery: scanned up to log sequence number 3 2329345024
InnoDB: Doing recovery: scanned up to log sequence number 3 2334587904
InnoDB: Doing recovery: scanned up to log sequence number 3 2339830784
InnoDB: Doing recovery: scanned up to log sequence number 3 2345073664
InnoDB: Doing recovery: scanned up to log sequence number 3 2350316544
InnoDB: Doing recovery: scanned up to log sequence number 3 2355559424
InnoDB: Doing recovery: scanned up to log sequence number 3 2360802304
InnoDB: Doing recovery: scanned up to log sequence number 3 2366045184
InnoDB: Doing recovery: scanned up to log sequence number 3 2371288064
InnoDB: Doing recovery: scanned up to log sequence number 3 2376530944
InnoDB: Doing recovery: scanned up to log sequence number 3 2381773824
InnoDB: Doing recovery: scanned up to log sequence number 3 2387016704
InnoDB: Doing recovery: scanned up to log sequence number 3 2392259584
InnoDB: Doing recovery: scanned up to log sequence number 3 2397502464
InnoDB: Doing recovery: scanned up to log sequence number 3 2402745344
InnoDB: Doing recovery: scanned up to log sequence number 3 2407988224
InnoDB: Doing recovery: scanned up to log sequence number 3 2413231104
InnoDB: Doing recovery: scanned up to log sequence number 3 2418473984
InnoDB: Doing recovery: scanned up to log sequence number 3 2423716864

I upgraded mysql from 5.0.32 with no result, I exchanged disks, raid controller and ram. Nothing helped.
I am lost.

Thanks in advance

-- 
Daniel Khan

How to repeat:
optimize table foo

Thank you for the bug report. Could you please provide the show create
table table_name of one of those which doing optimize presents that
behavior?. Thanks in advance.

By the way I also saw very high cpu loads (100%-120%).
I now powered down the server and started with no luck.
The error log reports corruptions all the time but it somehow recovers and restarts afterwards.

Here is the requested information + I'll also attach my my.cnf

CREATE TABLE `ShpItem` (
  `itemID` int(10) unsigned NOT NULL auto_increment,
  `shopID` int(11) unsigned NOT NULL default '0',
  `itemNr` varchar(64) collate latin1_german2_ci NOT NULL,
  `itemNrMD5` varchar(32) collate latin1_german2_ci NOT NULL default '',
  `matchcode` varchar(32) collate latin1_german2_ci NOT NULL default '',
  `aktiv` enum('-1','1') collate latin1_german2_ci NOT NULL default '-1',
  `ean_isbn` varchar(32) collate latin1_german2_ci NOT NULL default '',
  `vendor` varchar(256) collate latin1_german2_ci NOT NULL,
  `topangebot` enum('-1','1') collate latin1_german2_ci NOT NULL default '-1',
  `topangebot_shop` enum('-1','1') collate latin1_german2_ci NOT NULL default '-1',
  `gueltig_von` int(10) unsigned NOT NULL default '0',
  `gueltig_bis` int(10) unsigned NOT NULL default '0',
  `bezeichnung` varchar(128) collate latin1_german2_ci NOT NULL default '',
  `bezeichnung2` varchar(128) collate latin1_german2_ci NOT NULL default '',
  `kbeschreibung` mediumtext collate latin1_german2_ci NOT NULL,
  `lbeschreibung` mediumtext collate latin1_german2_ci NOT NULL,
  `preis` decimal(6,2) NOT NULL default '0.00',
  `preis_statt` decimal(6,2) NOT NULL default '0.00',
  `ab` enum('-1','1') collate latin1_german2_ci NOT NULL default '-1',
  `aufanfrage` enum('-1','1') collate latin1_german2_ci NOT NULL default '-1',
  `lieferinfo` text collate latin1_german2_ci NOT NULL,
  `link` varchar(255) collate latin1_german2_ci NOT NULL default '',
  `userCat1` varchar(255) collate latin1_german2_ci NOT NULL default '',
  `userCat2` varchar(255) collate latin1_german2_ci NOT NULL default '',
  `userCat3` varchar(255) collate latin1_german2_ci NOT NULL default '',
  `userCat4` varchar(255) collate latin1_german2_ci NOT NULL default '',
  `userCat5` varchar(255) collate latin1_german2_ci NOT NULL default '',
  `userImageUrl1` varchar(255) collate latin1_german2_ci NOT NULL default '',
  `userImageUrl2` varchar(255) collate latin1_german2_ci NOT NULL default '',
  `userImageUrl3` varchar(255) collate latin1_german2_ci NOT NULL default '',
  `userImageUrl4` varchar(255) collate latin1_german2_ci NOT NULL default '',
  `userImageUrl5` varchar(255) collate latin1_german2_ci NOT NULL default '',
  `attributes` text collate latin1_german2_ci NOT NULL,
  `datesortkey` int(10) unsigned NOT NULL default '0',
  `keywords` mediumtext collate latin1_german2_ci NOT NULL,
  `bulkActionID` varchar(32) collate latin1_german2_ci NOT NULL default '',
  `rowStatus` enum('N','A','O','D') collate latin1_german2_ci NOT NULL default 'A',
  `indexStatus` enum('P','I') collate latin1_german2_ci NOT NULL default 'P',
  PRIMARY KEY  (`itemID`),
  KEY `shopID` (`shopID`),
  KEY `bulkActionID` (`bulkActionID`),
  KEY `userCat1` (`userCat1`),
  KEY `userCat2` (`userCat2`),
  KEY `userCat3` (`userCat3`),
  KEY `userCat4` (`userCat4`),
  KEY `userCat5` (`userCat5`),
  KEY `ean_isbn` (`ean_isbn`)
) ENGINE=InnoDB  DEFAULT CHARSET=latin1 COLLATE=latin1_german2_ci COMMENT='Artikeltabelle' AUTO_INCREMENT=13346375 ;

my.cnf

Attachment: my.cnf.txt (text/plain), 5.14 KiB.

Here's another error that occurs if everything goes wrong:

 Error: index `katID` of table `shopping-live/ShpItem2Cat` contains 1762016 entries, should be 18446744073709551615
Error: index `itemID` of table `shopping-live/ShpItem2Cat` contains 1762016 entries, should be 18446744073709551615
Error: index `shopID` of table `shopping-live/ShpItem2Cat` contains 1762016 entries, should be 18446744073709551615

This is probably file corruption.

071016 16:29:37  InnoDB: Page checksum 1775654167, prior-to-4.0.14-form checksum
215858221
InnoDB: stored checksum 997421925, prior-to-4.0.14-form stored checksum 666180653
InnoDB: Page lsn 544695660 1953981801, low 4 bytes of lsn at page end 2885491102
InnoDB: Page number (if stored to page already) 1847621221,
InnoDB: space id (if created with >= MySQL-4.1.1 and stored already) 1684628768

InnoDB stamps the checksums just prior writing the page to the file. It is unlikely that InnoDB could corrupt them.

Looks like the page number and the space id in the page header are garbage.

Please check your disk subsystem.

Thank's for the reply.

Meanwhile I migrated the data to another host and it runs painfree now.
So although disks, ram and raid controller were changed the hardware was obviously still faulty.

I'll close this report.

Greetings

-- 
Daniel Khan