MySQL Bugs: #16515: MySQL crashes saying innodb page corruption

Bug #16515	MySQL crashes saying innodb page corruption
Submitted:	15 Jan 2006 17:05	Modified:	3 Jul 2013 5:45
Reporter:	ravi prasad	Email Updates:
Status:	Can't repeat	Impact on me:	None
Category:	MySQL Server: InnoDB storage engine	Severity:	S3 (Non-critical)
Version:	4.0.26, 5.0	OS:	FreeBSD (FreeBSD-4.10)
Assigned to:	Marko Mäkelä	CPU Architecture:	Any

Description:
The mysql server(slave) crashes with the following messages in its error
log, and gets restarted, after which rollbacks and starts replicating from
 master
Page directory corruption: supremum not pointed to
051226 10:02:58  InnoDB: Page dump in ascii and hex (16384 bytes):
 len 16384; hex 
 0000000000000000000000000000000000000000000000000000.....(many
 0's)
  ;InnoDB: End of page dump
 051226 10:02:58  InnoDB: Page checksum 1575996416, prior-to-4.0.14-form
 checksum 1371122432
 InnoDB: stored checksum 0, prior-to-4.0.14-form stored checksum 0
 InnoDB: Page lsn 0 0, low 4 bytes of lsn at page end 0
 InnoDB: Error: trying to access a stray pointer 0xc9c1bff8
 InnoDB: buf pool start is at 0x52a4000, number of pages 70400
 051226 10:02:58InnoDB: Assertion failure in thread 20491 in
 file ../../innobase/page/../include/buf0buf.ic line 286
 InnoDB: We intentionally generate a memory trap
 InnoDB: Submit a detailed bug report to http://bugs.mysql.com.
 InnoDB: If you get repeated assertion failures or crashes, even
 InnoDB: immediately after the mysqld startup, there may be
 InnoDB: corruption in the InnoDB tablespace. Please refer to
 InnoDB: http://dev.mysql.com/doc/mysql/en/Forcing_recovery.html
 InnoDB: about forcing recovery.
 mysqld got signal 11;

it is a slave server replicating from a master. There were no reads from the server
 while it crashed. This has happened couple of times. we tried restoring
 mysql with a  backup of master. But it goes through fine for few days(2-3
 days) , after which the same error reiterates.

How to repeat:
No idea about how to repeat the error

mysqld error log file

Attachment: mysqld.err (text/plain), 201.14 KiB.

Thank you for a problem report. Please, send your my.cnf file content and describe your hardware.

Ravi,

do you have the .err log from the original crash? It would be nice to know what was the first error that crashed mysqld.

The .err log shows that some 16 kB data page is completely zeroed. It may be an index page. InnoDB does not complain when it reads in the page since InnoDB does accept a completely zeroed page as a legal format for an ibdata page on disk. But when InnoDB tries to use the page, it notices the problem.

Your log sequence number is about 9 GB. You have got corruption rather quickly in the database.

My first guess is that this is an OS/driver bug or bad hardware. You could run disk tests and memtests on the computer.

This may also be an InnoDB bug. To diagnose it, the original .err log would be useful.

I am assigning this bug report to Marko.

Regards,

Heikki

Hardware info:
-bash-2.05b$ /etc/bin/smbiosinfo
0000 BIOS: HP P51 12/02/2004
0100 System: HP ProLiant DL380 G4
0300 Chassis: HP Unknown GB85095Y0V       72U
0400 Processor: CPU Intel Xeon 3400MHz ZIF socket Populated, Enabled
0406 Processor: CPU Intel Xeon 0MHz ZIF socket Unpopulated, status Unknown
0710 Cache: Processor 1 Internal L1 Cache 384KB
0716 Cache: Processor 2 Internal L1 Cache 768KB
0720 Cache: Processor 1 Internal L2 Cache 1024KB
0726 Cache: Processor 2 Internal L2 Cache 769KB
0730 Cache: Processor 1 Internal L3 Cache 386KB
0736 Cache: Processor 2 Internal L3 Cache 770KB
1000 Memory Array: System board System memory
1100 Memory Device: 0x1000 1024MB DIMM DDR
1101 Memory Device: 0x1000 1024MB DIMM DDR

my.cnf:
[mysqld]
port            = 3306
socket          = /tmp/mysql.sock
user            = mysql
skip-locking
set-variable    = key_buffer=32M
set-variable    = max_allowed_packet=8M
set-variable    = table_cache=256
set-variable    = sort_buffer=512K
set-variable    = net_buffer_length=8K
set-variable    = myisam_sort_buffer_size=1M
set-variable    = max_connect_errors=999999999
basedir         = /home/y
datadir         = /home/y/var/mysql/data
log-error       = /home/y/logs/mysql/mysqld.err
tmpdir          = /tmp/
pid-file        = /home/y/var/mysql/mysqld.pid
# Added by philip 2005-02-19
set-variable    = max_connections=500
set-variable    = wait_timeout=30
set-variable    = thread_cache_size=30

## avoid FreeBSD DNS problems
skip-name-resolve

## avoid "niced" "threads" on FreeBSD
skip-thread-priority

# Do not automaticall start slave
#skip-slave-start

## Uncomment these if this server will be a master.

log-bin
log-slave-updates
server-id       = 4     

## Slow Query Log
set-variable = long_query_time=1
log-slow-queries=slow.log

## InnoDB settings

innodb_data_home_dir = /home/y/var/mysql/ibdata
innodb_data_file_path = ibdata1:4792M;ibdata2:4096M;ibdata3:256M:autoextend
innodb_log_group_home_dir = /home/y/var/mysql/iblog
innodb_log_arch_dir = /home/y/var/mysql/iblog
set-variable = innodb_buffer_pool_size=1100M
set-variable = innodb_additional_mem_pool_size=50M
set-variable = innodb_log_file_size=800M
set-variable = innodb_log_buffer_size=8M
innodb_flush_log_at_trx_commit=1
set-variable = innodb_lock_wait_timeout=50

Ravi,
do you have the complete error log from before 060113 04:27:32? It could give us some hints what might have gone wrong in the first place.

Since the complete error log is around 800 kb, but the interface for attching file has a limit of 200 kb.
If i can send it over mail, to which  mail id , i have to send?

Compressed complete err log file

Attachment: mysqld.err.gz (application/x-gzip, text), 21.29 KiB.

I cannot see any helpful hints in the complete error log.

When InnoDB creates the data files, it writes them full of zeroes. When a page is taken into use, it is initialized. Somehow, a B-tree node points to a page full of zeroes.

Usually, corruption is caused by faulty hardware or operating system drivers. If this is a bug in InnoDB (which I think is very unlikely for MySQL/InnoDB 4.0), it would be interesting to replay the replication logs from the scratch and see if the problem repeats.

Can you run memtest86 or a similar tool on the system?

Hi!

The .err is still not complete since it does not show the ibdata files being created etc.

The corruption appeared on Jan 11, two hours after a clean startup.

Marko is right, there are no helpful hints in the .err log. Is there some pattern that the corruption would appear at certain day time or under some specific workload? Does the corruption appear if you switch to a different computer?

Regards,

Heikki

"
060110 02:16:13  mysqld started
060110  2:16:15  InnoDB: Started
060110  2:16:15 Slave SQL thread initialized, starting replication in log 'db1-bin.020' at position 240884875, relay log './db2-relay-bin.006' position: 152095941
/home/y/libexec/mysqld: ready for connections.
Version: '4.0.26-Yahoo-SMP-log'  socket: '/tmp/mysql.sock'  port: 3306  Yahoo SMP
060110  2:16:15 Slave I/O thread: connected to master 'repl@db1.rep.ird.yahoo.com:3306',  replication started in log 'db1-bin.020' at position 240884875
060111  2:00:02 /home/y/libexec/mysqld: Normal shutdown

060111  2:00:02 Slave I/O thread killed while reading event
060111  2:00:02 Slave I/O thread exiting, read up to log 'db1-bin.020', position 261318178
060111  2:00:02 Error reading relay log event: slave SQL thread was killed
060111  2:00:02  InnoDB: Starting shutdown...
060111  2:00:06  InnoDB: Shutdown completed
060111  2:00:06 /home/y/libexec/mysqld: Shutdown Complete

060111 02:00:06  mysqld ended

060111 02:16:07  mysqld started
060111  2:16:08  InnoDB: Started
060111  2:16:08 Slave SQL thread initialized, starting replication in log 'db1-bin.020' at position 261318178, relay log './db2-relay-bin.007' position: 20433345
/home/y/libexec/mysqld: ready for connections.
Version: '4.0.26-Yahoo-SMP-log'  socket: '/tmp/mysql.sock'  port: 3306  Yahoo SMP
060111  2:16:08 Slave I/O thread: connected to master 'repl@db1.rep.ird.yahoo.com:3306',  replication started in log 'db1-bin.020' at position 261318178
InnoDB: Page directory corruption: supremum not pointed to
060111  4:27:24  InnoDB: Page dump in ascii and hex (16384 bytes):
"

There is n't any workload on the machine.It only replicates from  master. Today  server crashed repeatedly 
even after restarts saying assertion-failure, dumping out zero filled page. I had to kill the mysqld_safe script to stop it from running.   
060119  6:57:15InnoDB: Assertion failure in thread 20491 in file ../../innobase/page/../include/buf0buf.ic line 286
InnoDB: We intentionally generate a memory trap.

I then tried starting the server with slave-skip-start . Then it came up fine. when i started the slave, it again started crashing repeatedly. I tried running the query at which it is crashing manually(getting it from master using mysqlbinlog). It crashes and restarts , dumping out zero filled page.Running a check table on the table, insert into which is causing crash,  also crashes the server. It is the same table which is always getting corrupted. check table on all other tables returns status ok.
 
Running memtest , showed no errors.

Hi Heikki,
                  We take tar backup of the whole mysql data directory, and use this to restore the DB. While restoring, mysql directory is deleted and replaced with the untarred mysql directory. That is why, 
there is no ibdata files created .

Ravi,
I assume that mysqld is shut down while the backup is being created. Before starting the replication, could you run CHECK TABLE on all tables of the backed up tablespace? This would ensure that the tablespace is not corrupted in the first place. (A file backup taken while mysqld is running is likely to result in corrupted data files. If you prefer not to shut down mysqld while backing up, you should use InnoDB Hot Backup or mysqldump.)

Hi Marko,
                 yes ofcourse, the mysql server is shutdown before the data directory is tar zipped. Also , it is ensured that no mysql thread is still running , by doing a ps -auxww|grep mysqld , before starting backup. So I guess that the chances of backup data itself being corrupt is less. Even then, let me try running check table after restore , before starting replication.

Ravi,

please run CHECK TABLE on your tables, and if it reports corruption, please send the complete unedited .err log to us.

If the database is not huge, you could also test rebuilding it from scratch. Then we would see if the corruption appears again.

Regards,

Heikki

No feedback was provided for this bug for over a month, so it is
being suspended automatically. If you are able to provide the
information that was originally requested, please do so and change
the status of the bug back to "Open".