MySQL Bugs: #18886: InnoDB: Database page corruption on disk or a failed file read of page

Bug #18886	InnoDB: Database page corruption on disk or a failed file read of page
Submitted:	7 Apr 2006 14:30	Modified:	5 Jul 2006 14:44
Reporter:	Philip Molter	Email Updates:
Status:	Not a Bug	Impact on me:	None
Category:	MySQL Server: InnoDB storage engine	Severity:	S1 (Critical)
Version:	4.1.16	OS:	Linux (Linux (Fedora Core 4, x86_64))
Assigned to:	Marko Mäkelä	CPU Architecture:	Any
Tags:	3ware

Description:
I have a 3.23.58 system replicating to two 4.1.16 systems (this 4.1.16 system contains one patch, the one for the slow update performance bug fixed in 4.1.18).  Both of the slave systems received an update command last night which triggered both of them to crash with a page corruption error in InnoDB.  I will attach logs for each system.  You'll see that the corruption for both databases is identical.  That master system also slaves to another 3.23.58 system.  The older slave did not have any errors.  Dumping the data with select * into outfile recreating the table, and reloading the data allowed the slaving to continue without issue.  The data for both of these systems was created by doing a mysqldump of the master, then importing that dump via the source command (this allowed us to go from a shared-tablespace setup on the 3.23 master to a per-file tablespace setup on the two new slaves).

Both affected slave systems are dual Opteron 265 machines with 16GB of RAM running Fedora Core 4, kernel 2.6.16-1.2069_FC4 (latest release) built for x86_64.  They have 3ware 9500S 12-drive controllers using Linux software RAID (raid0 of raid1s for the database filesystem).  We have plenty of experience with this hardware setup and have never had hardware-related corruption with it.  The operating system is pretty verbose about reporting possible issues, both with memory errors and disk issues.  Given that it happened identically to two machines, I do not believe the issue is related to hardware.

Here is the relevant information from their MySQL configs:

[mysqld]
set-variable = key_buffer=128M
set-variable = table_cache=800
set-variable = sort_buffer=64M
set-variable = myisam_sort_buffer_size=128M
set-variable = long_query_time=3
set-variable = max_binlog_size=128M
set-variable = max_connections=500
set-variable = thread_cache_size=8
set-variable = thread_concurrency=8

delay-key-write-for-all-tables

default-table-type=InnoDB
innodb_data_file_path = ibdata1:2048M;ibdata2:2048M
innodb_file_per_table

set-variable = innodb_mirrored_log_groups=1
set-variable = innodb_log_files_in_group=4
set-variable = innodb_log_file_size=256M
set-variable = innodb_log_buffer_size=32M

innodb_flush_log_at_trx_commit=0
innodb_log_archive=0

set-variable = innodb_buffer_pool_size=1G
set-variable = innodb_additional_mem_pool_size=128M
set-variable = innodb_file_io_threads=4
set-variable = innodb_lock_wait_timeout=60

How to repeat:
Cannot reproduce, but did happen on two machines.

Suggested fix:
A corrupt page certainly shouldn't bring down the entire database process, especially since in a replicated environment, that just leads to repeated crashes as the system comes back up, tries to run the command that caused the problem, then immediately crashes again.

Error log, server #1

Attachment: mysqld.gn.log.1 (application/octet-stream, text), 49.39 KiB.

Error log, server #2

Attachment: mysqld.gn.log.2 (application/octet-stream, text), 49.39 KiB.

Philip,

please post the COMPLETE .err log from both slaves.

This bug report is remarkable, since this shows InnoDB probably corrupts a page just before writing it to the data file, or InnoDB corrupts it just after reading the page in. Or there is an extraordinarily deterministic OS/driver bug that causes corruption.

The lsn that is stamped to the start of the page is slightly higher than the lsn at the end.

InnoDB: Page lsn 9 2872311775, low 4 bytes of lsn at page end 2872310629

InnoDB checks that the lsn at the start and the end match just before writing to the file.

Regards,

Heikki

os0file.c:
        /* Do the i/o with ordinary, synchronous i/o functions: */
        if (slot->type == OS_FILE_WRITE) {
                if (array == os_aio_write_array) {
                        if ((total_len % UNIV_PAGE_SIZE != 0)
                            || (slot->offset % UNIV_PAGE_SIZE != 0)) {
                                fprintf(stderr,
"InnoDB: Error: trying a displaced write to %s %lu %lu, len %lu\n",
                                        slot->name, (ulong) slot->offset_high,
                                        (ulong) slot->offset,
                                        (ulong) total_len);
                                ut_error;
                        }

                        /* Do a 'last millisecond' check that the page end
                        is sensible; reported page checksum errors from
                        Linux seem to wipe over the page end */

                        for (len2 = 0; len2 + UNIV_PAGE_SIZE <= total_len;
                                                len2 += UNIV_PAGE_SIZE) {
                                if (mach_read_from_4(combined_buf + len2
                                                + FIL_PAGE_LSN + 4)
                                    != mach_read_from_4(combined_buf + len2
                                                + UNIV_PAGE_SIZE
                                        - FIL_PAGE_END_LSN_OLD_CHKSUM + 4)) {
                                        ut_print_timestamp(stderr);
                                        fprintf(stderr,
"  InnoDB: ERROR: The page to be written seems corrupt!\n");
                                        fprintf(stderr,
"InnoDB: Writing a block of %lu bytes, currently writing at offset %lu\n",
                                        (ulong)total_len, (ulong)len2);
                                        buf_page_print(combined_buf + len2);
                                        fprintf(stderr,
"InnoDB: ERROR: The page to be written seems corrupt!\n");
                                }
                        }
                }

                ret = os_file_write(slot->name, slot->file, combined_buf,
                                slot->offset, slot->offset_high, total_len);

I can post more (I have 100MB of log file), but it's all just the same stuff I've posted as the server would get restarted by mysqld_safe, start up replication, read in the replicated command, and then crash with the same error.

Is there something specific in the log you're looking for?  I chopped out anything before the error (startup stuff from the last server startup) and anything after the error (Number of processes running now: 0, mysqld restarted, etc.).  I did verify that the error messages that were repeating were the same as that which I already posted.

I do want to correct something.  I said the code that caused the error was an UPDATE command.  That's false.  The error was a DELETE command:

060406 19:21:22 [ERROR] Slave: Error 'Got error -1 from storage engine' on query
. Default database: 'gn'. Query: 'DELETE FROM service_attr WHERE sid='146764' AN
D name='hash_password'', Error_code: 1030.

That line was pulled from a later log that was generated when the server was in recovery mode, hence the '-1 from storage engine' error (this was during recovery).

Philip,

ok, then  post all the .err log before the first error. I am interested in seeing how the InnoDB tablespace was created etc.

Regards,

Heikki

Ahh, okay.  I have that information for only one of the servers.  The other server has since rotated it out.  I'll attach the log from the initial setup through the crash past the restart and up right until the point of the second crash.

Error log, server #1 (full)

Attachment: mysqld.log (application/octet-stream, text), 63.22 KiB.

Philip,

the .err log does not show anything out of the ordinary.

This bug is really strange:

1) it is highly unlikely that any InnoDB memory corruption bug could wipe over the lsn at the start and the end of the page just before it gets written to the data file; there have never been any failure reports of the lsn check just before the write;

2) since the page read repeatedly gave the same error message, it is also unlikely that any InnoDB bug would so deterministically corrupt a page that is read in from a data file;

3) an OS bug / driver bug is unlikely to be so deterministic that it hits two slaves in the exact same way!

The execution in your slaves has been rather deterministic, as the page lsn is the same in both slaves.

Hmm... I could add to 5.0 an lsn check also AFTER a buffer has been written to the data file. If that check passes ok, then it is almost certain that this is an OS / driver bug.

Regards,

Heikki

Marko,

please add the 'last millisecond check' also AFTER a file write. This should be done in 4.1, 5.0, and 5.1. Simply put this code in os0file.c:

                if (array == os_aio_write_array) {
                        /* Do a 'last millisecond' check that the page end
                        is sensible; reported page checksum errors from
                        Linux seem to wipe over the page end */

                        for (len2 = 0; len2 + UNIV_PAGE_SIZE <= total_len;
                                                len2 += UNIV_PAGE_SIZE) {
                                if (mach_read_from_4(combined_buf + len2
                                                + FIL_PAGE_LSN + 4)
                                    != mach_read_from_4(combined_buf + len2
                                                + UNIV_PAGE_SIZE
                                        - FIL_PAGE_END_LSN_OLD_CHKSUM + 4)) {
                                        ut_print_timestamp(stderr);
                                        fprintf(stderr,
"  InnoDB: ERROR: The page to be written seems corrupt!\n");
                                        fprintf(stderr,
"InnoDB: Writing a block of %lu bytes, currently writing at offset %lu\n",
                                        (ulong)total_len, (ulong)len2);
                                        buf_page_print(combined_buf + len2);
                                        fprintf(stderr,
"InnoDB: ERROR: The page to be written seems corrupt!\n");
                                }
                        }
                }

also after the call of:

                ret = os_file_write(slot->name, slot->file, combined_buf,
                                slot->offset, slot->offset_high, total_len);
Regards,

Heikki

Of course, the patch does not fix any bug. But it can prove that the problem really is in the OS or hardware.

Yesterday night 4 of my mysql servers crashed with the same error. Even I use mysql replication, and all my slaves crashed. But interestingly, the first slave (a single server) crashed before I pushed any changes to the master. 

Once I pushed the changes to master, all 4 slaves crashed(It was an UPDATE query). So I dont yet beleive that it was due to a master slave config. I am running in this config since more that 4 months, but this kind of crash has happened for the first time!

I use mysql 4.1.10a
My INNODB config:
 innodb_additional_mem_pool_size : 20971520
 innodb_autoextend_increment : 8
 innodb_buffer_pool_awe_mem_mb : 0
 innodb_buffer_pool_size : 1073741824
 innodb_data_file_path : ibdata1:10M:autoextend
 innodb_data_home_dir :
 innodb_fast_shutdown : ON
 innodb_file_io_threads : 4
 innodb_file_per_table : ON
 innodb_flush_log_at_trx_commit : 0
 innodb_flush_method :
 innodb_force_recovery : 0
 innodb_lock_wait_timeout : 50
 innodb_locks_unsafe_for_binlog : OFF
 innodb_log_arch_dir :
 innodb_log_archive : OFF
 innodb_log_buffer_size : 8388608
 innodb_log_file_size : 5242880
 innodb_log_files_in_group : 2
 innodb_log_group_home_dir : ./
 innodb_max_dirty_pages_pct : 90
 innodb_max_purge_lag : 0
 innodb_mirrored_log_groups : 1
 innodb_open_files : 300
 innodb_table_locks : ON
 innodb_thread_concurrency : 20

Any clues to this problem?
Will upgrading to 4.1.19 solve my problem?

Interestingly again, I am running on AMD Opetron, with 8GB RAM on each.
(Dual Proc AMD Opteron(tm) Processor 250)
All posts I have seen about susch a crash, involve AMD Opetron! Does this suggest something?

Thanks,
Jeetu

Forgot to mention:
I am running RHEL3 : 2.4.21-31.ELsmp
Will attach logs from one server.

Not allowed to attach files :(
If required, give me an email address, i'll send you gzipped log.
Jeetu
(jeetum at  gmail dot com)

Jeetendra,

I assume that you did not get corruption in the slaves and it was just an ordinary crash? Then it is probably some deterministic bug in MySQL. An upgrade to 4.1.xx may well solve the problem.

Regards,

Heikki

Yes, it was on a slave. But later since then, I have had crashes on individual boxes (all slaves), in different databases, even the ones that are not replicated. 
I am planning to upgrade to 4.1.20 soon (I am currently using 4.1.10a) . Will keep you all posted if I see the same page corruption problem after upgrading.

Jeetu

Heikki,

Thank you very much for taking the time to look at this.  After some very thorough debugging on our end, it appears as though the corruption was caused by a bug in the firmware on our 3ware card.  I was able to get the corruption to be fairly easily reproduced on our systems, and once I upgraded to a newer firmware release, 2.06.00.009, the problem hasn't shown up again (3 days and counting, despite my best efforts to cause it).

I think the consistency from the initial bug report was a coincidence.  Subsequent tests showed a couple more consistent failures, but also many inconsistent failures, and doing tests with other, non MySQL software and the older firmware release, we were also able to produce similar forms of corruption.

Can this safely be closed as 'Not a bug'?

From my standpoint, the bug is fixed.  I'm not sure about the others that have attached messages to this bug.

I should say, 'Not a bug', not 'fixed'.

My problem got solved by upgrading the kernel on my RHEL 3 system.

Thanks for the confirmation.

I have documented my side of the problem here:
http://www.jeetu.isa-geek.com/rhel3mysqlbug

Strange that a combination of some hardware and some software version puts you in a really bad situation.

Thanks everyone who contributed to this bug (or whatever it was :-) )

Pushed into 5.1.47 (revid:joro@sun.com-20100505145753-ivlt4hclbrjy8eye) (version source revid:vasil.dimov@oracle.com-20100331130613-8ja7n0vh36a80457) (merge vers: 5.1.46) (pib:16)

Pushed into mysql-next-mr (revid:alik@sun.com-20100524190136-egaq7e8zgkwb9aqi) (version source revid:vasil.dimov@oracle.com-20100331130613-8ja7n0vh36a80457) (pib:16)

Pushed into 6.0.14-alpha (revid:alik@sun.com-20100524190941-nuudpx60if25wsvx) (version source revid:vasil.dimov@oracle.com-20100331130613-8ja7n0vh36a80457) (merge vers: 5.1.46) (pib:16)

Pushed into 5.5.5-m3 (revid:alik@sun.com-20100524185725-c8k5q7v60i5nix3t) (version source revid:vasil.dimov@oracle.com-20100331130613-8ja7n0vh36a80457) (merge vers: 5.1.46) (pib:16)

Pushed into 5.1.47-ndb-7.0.16 (revid:martin.skold@mysql.com-20100617114014-bva0dy24yyd67697) (version source revid:vasil.dimov@oracle.com-20100331130613-8ja7n0vh36a80457) (merge vers: 5.1.46) (pib:16)

Pushed into 5.1.47-ndb-6.2.19 (revid:martin.skold@mysql.com-20100617115448-idrbic6gbki37h1c) (version source revid:vasil.dimov@oracle.com-20100331130613-8ja7n0vh36a80457) (merge vers: 5.1.46) (pib:16)

Pushed into 5.1.47-ndb-6.3.35 (revid:martin.skold@mysql.com-20100617114611-61aqbb52j752y116) (version source revid:vasil.dimov@oracle.com-20100331130613-8ja7n0vh36a80457) (merge vers: 5.1.46) (pib:16)