Bug #5910 InnoDB: Next record offset nonsensical
Submitted: 5 Oct 2004 22:20 Modified: 8 Oct 2004 4:46
Reporter: Eric Korpela Email Updates:
Status: Can't repeat Impact on me:
None 
Category:MySQL Server: InnoDB storage engine Severity:S1 (Critical)
Version:4.0.21-linux-x86_64 (Official binary) OS:Linux (Linux 2.4.21-20.ELsmp)
Assigned to: Heikki Tuuri CPU Architecture:Any

[5 Oct 2004 22:20] Eric Korpela
Description:
While reloading a ~7 GB dump of a database generated by mysqldump, mysqld crashes with a message...

InnoDB: Next record offset nonsensical 60952 for rec 12575
041005 11:14:33  InnoDB: Page dump in ascii and hex (16384 bytes):
 len 16384; hex a88fb3f2000a65ee000a65ed000a65ef00000002f97940f945bf00000000
0000000000000000007a3ee101e6008600210000000201e201e3000000000000227c00000000
0000000003500000000000000000000000000000000000000000080100000300a7696e66696d
756d000904000803000073757072656d756d0018100800001007000000000000000000000000
000000000000800000054c78ba441810080000180700c8000000000000000000000000000000
#### PARTS OFDUMP DELETED FOR BREVITY.  IF YOU WANT TO SEE IT ALL, PLEASE ASK.  STUFF LIKE THIS CONTINUES UNTIL... ####
08000f2807007400000000000000000000000000000000800000000191b751f52ba52b552b05
2ab52a652a1529c52975292528d52885283527e52795274526f526a500743e663de23d5e3cda
3c563bd23b4e3aca3a4639c2393e38ba383637b2372e36aa362635a2351e349a34163392330e
328a3206318230fe307a2ff62f722eee2e6a2de62d622cde2c5a2bd62b522ace2a4a29c62942
28be283a27b6273226ae262a25a62522249e241a23962312228e220a21862102207e1ffa1f76
1ef21e6e1dea1d661ce21c5e1bda1b561ad21a4e19ca194618c2183e17ba173616b2162e15aa
152614a2141e139a13161292120e118a110610820ffe0f7a0ef60e720dee0d6a0ce60c620bde
0b5a0ad60a5209ce094a08c6084207be073a06b6063205ae052a04a60422039e031a02960212
018e010a0065a9af93a4f97940f9; asc ### ALSO DELETED###

How to repeat:
Any large reload this, or another similar sized database seems to cause the same problem.  Smaller reloads seem to work, but multiple small reloads also crash after a while.  I don't know if it's related to the schema or to the data in these tables.  I am willing to make the schema and the data files available to anyone who wants to attempt to recreate the problem.

Suggested fix:
No idea.  I haven't yet found a work around.
[5 Oct 2004 22:25] Eric Korpela
Oops, I truncated part of the log.  Following the page dump is...
041005 11:14:33  InnoDB: Page checksum 2827990002, prior-to-4.0.14-form chec
ksum 2846856100
InnoDB: stored checksum 2827990002, prior-to-4.0.14-form stored checksum 284
6856100
InnoDB: Page lsn 2 4185473273, low 4 bytes of lsn at page end 4185473273
InnoDB: Page may be an index page where index id is 0 848
InnoDB: (index `ra` of table `new_spear_fims/photon`)
041005 11:14:33  InnoDB: Apparent corruption of an index page n:o 681454 in 
space 0
InnoDB: to be written to data file. We intentionally crash server
InnoDB: to prevent corrupt data from ending up in data
InnoDB: files.
041005 11:14:33InnoDB: Assertion failure in thread 1147140464 in file buf0fl
u.c line 268
InnoDB: We intentionally generate a memory trap.
InnoDB: Submit a detailed bug report to http://bugs.mysql.com.
InnoDB: If you get repeated assertion failures or crashes, even
InnoDB: immediately after the mysqld startup, there may be
InnoDB: corruption in the InnoDB tablespace. See section 6.1 of
InnoDB: http://www.innodb.com/ibman.php about forcing recovery.
mysqld got signal 11;
This could be because you hit a bug. It is also possible that this binary
or one of the libraries it was linked against is corrupt, improperly built,
or misconfigured. This error can also be caused by malfunctioning hardware.
We will try our best to scrape up some info that will hopefully help diagnos
e
the problem, but since we have already crashed, something is definitely wron
g
and this may fail.

key_buffer_size=268435456
read_buffer_size=8384512
max_used_connections=2
max_connections=100
threads_connected=2
It is possible that mysqld could use up to 
key_buffer_size + (read_buffer_size + sort_buffer_size)*max_connections = 19
00143 K
bytes of memory
Hope that's ok; if not, decrease some variables in the equation.

Number of processes running now: 0
041005 11:14:37  mysqld restarted

It appears to be random whether the restart succeeds or crashes.
[5 Oct 2004 22:31] Eric Korpela
mysqld log

Attachment: zork.err (application/octet-stream, text), 74.33 KiB.

[5 Oct 2004 22:32] Eric Korpela
mysql configuration file

Attachment: my.cnf (application/octet-stream, text), 5.01 KiB.

[6 Oct 2004 8:36] Heikki Tuuri
Eric,

please send the entire .err log to heikki.tuuri@innodb.com

Do not cut anything off.

Also print what

SHOW CREATE TABLE ...

says about the suspect table(s) new_spear_fims.photon and others that crash.

If this is a repeatable way to create corruption in InnoDB, this is the first report in a long time.

"Next record offset nonsensical 60952 for rec 12575": this means that the index record at offset 12575 bytes has the next record pointer pointing to offset 60952 bytes. This is impossible, because the InnoDB page size is 16 kB.

Regards,

Heikki
[6 Oct 2004 8:41] Heikki Tuuri
Eric,

oops I noticed that you already uploaded the .err file for this crash. Please send also the .err files and SHOW CREATE TABLE about the other crashes that you have seen.

Regards,

Heikki
[6 Oct 2004 8:43] Heikki Tuuri
Eric,

I notice that you are using a 64-bit Opteron or Nocona. Please describe your hardware.

Regards,

Heikki
[6 Oct 2004 9:20] Heikki Tuuri
Hi!

This is interesting. Looks like 64 bytes have been copied from offset 0x1100 on the page to offset 0x3100!

110600000000000000000000000000000000800000000191b5ed18100804041007
112700000000000000000000000000000000800000000191b5ee18100800041807
114800000000000000000000000000000000800000000191b5ef18100800042007
116900000000000000000000000000000000800000000191b5f018100800042807
118a00000000000000000000000000000000800000000191b5f118100804043007

...

30bc00000000000000000000000000000000800000000191b6e3181008000bc007
30dd00000000000000000000000000000000800000000191b6e4181008000bc807
30fe00000000000000000000000000000000800000000191b6e5181008040bd007
311f000004041007112700000000000000000000000000000000800000000191b5
ee18100800041807114800000000000000000000000000000000800000000191b5
ef1800000000000000000000000000000000800000000191b6e8181008000be807
318200000000000000000000000000000000800000000191b6e9181008040bf007
31a300000000000000000000000000000000800000000191b6ea181008000bf807
31c400000000000000000000000000000000800000000191b6eb181008000c0007

The copied block:

04041007112700000000
00000000000000000000
0000800000000191b5ee
18100800041807114800
00000000000000000000
00000000008000000001
91b5ef18

This might be bad RAM. But let us first look at the other crashes that you have seen.

Regards,

Heikki
[6 Oct 2004 18:20] Eric Korpela
Thanks for the response!

The hardware is a dual Operton 246 with 6GB EEC RAM.  EEC is on as is EEC scrub and EEC correction writeback.  4 GB RAM is used for innodb buffers.   Innodb logs are on an internal 250 GB SATA drive.  The myisam and innodb data files are on a 360 GB SCSI RAID-5 array.

We tried loading another large database last night, which crashed again this morning.  I'll upload the error log and the SHOW CREATE TABLE output once the rollbacks are complete.
[6 Oct 2004 23:21] Eric Korpela
output of show create table new_spear_fims.photon

Attachment: photon (application/octet-stream, text), 982 bytes.

[6 Oct 2004 23:37] Eric Korpela
Error log for crash of 2004/10/06

Attachment: latest_crash.err (application/octet-stream, text), 197.96 KiB.

[6 Oct 2004 23:41] Eric Korpela
I've uploaded the "show create table" output and the error log from today's crash.

I can't be sure that today's crash isn't due to corruption from yesterdays, so I will rebuild the innodb files and logs in order to generate a newer crash dump.

I will also use the opportunity to run more detailed memory/burn-in tests than I had run earlier to see if there might be an intermittent hardware problem.
[7 Oct 2004 7:08] Heikki Tuuri
Eric,

in the new printout, looks like something has zeroed bytes at offsets

0x20fc to 0x20ff (inclusive)

or

0x20fc to 0x20fe (inclusive)

on the page.

The previous corruption occurred at 0x3100. Some similarity in offsets.

Again, this looks like a fault in hardware, like in RAM, processor page tables, processor caches, memory controller. Or a bug in the OS memory management.

Experience from at least two other users is that there are unexplained crashes in Opteron computers. I think we have to have this bug repeated in some other 64-bit (best to try with little-endian processors) computer to show that this is an InnoDB bug. In principle, this might be a bug in some InnoDB code that handles with addresses divisible by 256. For example, the additional mem pool. Bad pointers there might cause symptoms like this.

Regards,

Heikki

20dd800001118000000002f7e0700c04000d0005
20f1800001118000000002f7e0710c04000d0805
2105800001118000000002f7e0000000000d1005
2119800001118000000002f7e0730c04000d1805
212d800001118000000002f7e0740c04000d2005
2141800001118000000002f7e0750c04000d2805
2155800001118000000002f7e0760c04040d3005
[7 Oct 2004 15:57] Eric Korpela
The memory test has been running all night (memtest86) with no errors reported.  memtest86 is a pretty stringent test suite, so I think hardware memory issues are probably out.

Once we've rebooted, I'll run a disk test suite to ensure we don't have a subtle disk problem.  Beyond that we have either innodb or the Linux x86_64 kernel.   I suppose it could also be a glibc bug?
[7 Oct 2004 16:00] Eric Korpela
Anyone have an IA64 machine with a couple hundred gigs of disk space I could try to recreate this bug on?
[7 Oct 2004 23:05] Eric Korpela
It turns out that innodb is off the hook.  I wrote a simple program to memory map a 20GB file as an array of off_t.  Then I wrote into each element of the array its own offset.  During readback, some items were misplaced.

Error at offset 641248800 != 641248288
Error at offset 641248801 != 641248289
Error at offset 641248802 != 641248290
Error at offset 641248803 != 641248291
Error at offset 641248804 != 641248292
Error at offset 641248805 != 641248293
Error at offset 641248806 != 641248294
Error at offset 641248807 != 641248295
Error at offset 718280736 != 722050080
Error at offset 718280737 != 722050081
Error at offset 718280738 != 722050082
Error at offset 718280739 != 722050083
Error at offset 718280740 != 722050084
Error at offset 718280741 != 722050085
Error at offset 718280742 != 722050086
Error at offset 718280743 != 722050087

Look familiar?

I'm now doing some tests to see whether this is a problem with the disk hardware, the linux kernel, or the driver.  Sorry to trouble you with this.
[8 Oct 2004 4:40] Heikki Tuuri
Eric,

it would be great if you can send your test program to the vendor of your computer, to AMD, and the producer of your Linux kernel and drivers. Problems in Opterons with Linux seem to be so common that there probably is some bug or hardware design flaw that needs to be corrected. In Itanium computers there also were problems, but they seem to work well now.

The standard advice in cases like this is to try upgrading to the latest Linux kernel and drivers.

By the way, your family name Korpela is a Finnish name.

Thank you,

Heikki
[8 Oct 2004 4:46] Heikki Tuuri
I am changing the status of the bug to 'Can't repeat', because bug is not in MySQL. But please report to the bugs database if and how you were able to resolve the problem.

Regards,

Heikki
[19 Oct 2004 18:32] Eric Korpela
A little additional info in case someone else stumbles into this.

The bug is apparently related in some way to the IOMMU mechanism in the Linux kernel.  One workaround is to specify "mem=4032M" (4GB-64MB) as a kernel parameter.  Of course that throws away any RAM above 4G, so it's an unsatisfactory solution.  

I've submitted a bug report to RedHat, since it's a RedHat distribution I'm using.  The bugzilla URL for this item is https://bugzilla.redhat.com/bugzilla/show_bug.cgi?id=135017

Eric