MySQL Bugs: #418: InnoDB database of 45GB crashed and not recovered

Bug #418	InnoDB database of 45GB crashed and not recovered
Submitted:	12 May 2003 5:00	Modified:	16 Jun 2003 2:23
Reporter:	Vlad S	Email Updates:
Status:	Can't repeat	Impact on me:	None
Category:	MySQL Server: InnoDB storage engine	Severity:	S1 (Critical)
Version:	3.23.56	OS:	FreeBSD (FreeBSD 4.7)
Assigned to:	Heikki Tuuri	CPU Architecture:	Any

Description:
We have a large working database (innoDB,45GB). One day mysqld daemon  suddenly  crashed and didn't want to restart. Force recovering of each level (till 6)  doesn't help. Here is my log:

030511 23:11:11  mysqld restarted
030511 23:11:13  InnoDB: Database was not shut down normally.
InnoDB: Starting recovery from log files...
InnoDB: Starting log scan based on checkpoint at
InnoDB: log sequence number 6 1624370294
InnoDB: Doing recovery: scanned up to log sequence number 6 1625122879
InnoDB: Error: trying to access page number 4287956991 in space 0
InnoDB: which is outside the tablespace bounds.
InnoDB: Byte offset 0, len 16384, i/o type 10
030511 23:11:13  InnoDB: Assertion failure in thread 137920512 in file fil0fil.c line 1
171
InnoDB: We intentionally generate a memory trap.
InnoDB: Send a detailed bug report to mysql@lists.mysql.com
mysqld got signal 11;

How to repeat:

Suggested fix:
May be there is something in kernel settings.

some addition info about mysqld config:

key_buffer_size=50327552
record_buffer=2093056
sort_buffer=2097144
max_used_connections=0
max_connections=100
threads_connected=0
key_buffer_size + (record_buffer + sort_buffer)*max_connections = 458347 K
bytes of memory

What does it print if you put

set-variable=innodb_force_recovery=6

to the [mysqld] section of my.cnf?

In the printout below InnoDB does a log scan, and that should not happen if you have used the above my.cnf option at level 6.

The original problem probably is memory or file corruption caused by FreeBSD, or its drivers, or hardware.

It seems to me you are right about hardware or OS problem. I looked through previous log messages and saw the following:

InnoDB: Fatal error: cannot allocate 49152 bytes of
InnoDB: memory with malloc! Total allocated memory
InnoDB: by InnoDB 437553013 bytes. Operating system errno: 12
InnoDB: Cannot continue operation!
InnoDB: Check if you should increase the swap file or
InnoDB: ulimits of your operating system.
InnoDB: On FreeBSD check you have compiled the OS with
InnoDB: a big enough maximum process size.
InnoDB: We now intentionally generate a seg fault so that
InnoDB: on Linux we get a stack trace.
mysqld got signal 11;
This could be because you hit a bug. It is also possible that this binary
or one of the libraries it was linked against is corrupt, improperly built,
or misconfigured. This error can also be caused by malfunctioning hardware.
We will try our best to scrape up some info that will hopefully help
diagnose
the problem, but since we have already crashed, something is definitely
wrong
and this may fail

And after that mysqld can't start. If I put set-variable=innodb_force_recovery=6 to my.cnf file, i'v got the following:

030511 23:49:25  mysqld started
InnoDB: The user has set SRV_FORCE_NO_LOG_REDO on
InnoDB: Skipping log redo
InnoDB: Error: trying to access page number 4287956991 in space 0
InnoDB: which is outside the tablespace bounds.
InnoDB: Byte offset 0, len 16384, i/o type 10
030511 23:49:26  InnoDB: Assertion failure in thread 137920512 in file
fil0fil.c line 1171
InnoDB: We intentionally generate a memory trap.
InnoDB: Send a detailed bug report to mysql@lists.mysql.com
mysqld got signal 11;
This could be because you hit a bug. It is also possible that this binary
or one of the libraries it was linked against is corrupt, improperly built,
or misconfigured. This error can also be caused by malfunctioning hardware.
We will try our best to scrape up some info that will hopefully help
diagnose
the problem, but since we have already crashed, something is definitely
wrong
and this may fail

Here is the content of my.cnf:
[mysqld]
port            = 3306
socket          = /tmp/mysql.sock
datadir=/data/db
default-character-set=cp1251
myisam-recover=BACKUP,FORCE
##skip-locking
set-variable    = max_connection=100
set-variable    = key_buffer=48M
set-variable    = max_allowed_packet=10M
set-variable    = tmp_table_size=16M
set-variable    = table_cache=999
set-variable    = sort_buffer=2M
set-variable    = record_buffer=2M
set-variable    = join_buffer_size=1M
set-variable    = thread_cache=16
# Try number of CPU's*2 for thread_concurrency
set-variable    = thread_concurrency=2
set-variable    = myisam_sort_buffer_size=64M
##log-bin
server-id       = 1

# Point the following paths to different dedicated disks
#tmpdir         = /tmp/
#log-update     = /path-to-dedicated-directory/hostname

# Uncomment the following if you are using BDB tables
skip-bdb
#set-variable   = bdb_cache_size=384M
#set-variable   = bdb_max_lock=100000

# Uncomment the following if you are using InnoDB tables
innodb_data_home_dir = /data/innodb/db/
innodb_data_file_path = ibdata1:2000M;ibdata2:2000M;ibdata3:2000M;ibdata4:2000M;ibdata5:2000M;ibdata6:2000M;ibdata7:2000M;ibdata8:2000M;ibdata9:2000M;ibdata10:2000M;ibdata11:2000M;ibdata12:2000M
#;ibdata13:2000M;ibdata14:2000M;ibdata15:2000M;ibdata16:2000M;ibdata17:2000M;ibdata18:2000M;ibdata19:2000M;ibdata20:2000M;ibdata21:2000M;ibdata22:2000M
innodb_log_group_home_dir = /data/innodb/log/
innodb_log_arch_dir = /data/innodb/log/
innodb_log_archive=0
# You can set .._buffer_pool_size up to 50 - 80 %
# of RAM but beware of setting memory usage too high
set-variable = innodb_buffer_pool_size=350M
#set-variable = innodb_buffer_pool_size=300M
set-variable = innodb_additional_mem_pool_size=20M
# Set .._log_file_size to 25 % of buffer pool size
#set-variable = innodb_log_file_size=128M
set-variable = innodb_log_file_size=75M
set-variable = innodb_log_buffer_size=16M
set-variable = innodb_log_files_in_group=3
innodb_flush_log_at_trx_commit=1
#innodb_flush_log_at_trx_commit=0
set-variable = innodb_lock_wait_timeout=50
set-variable = innodb_file_io_threads=4
#for recovery
#set-variable = innodb_force_recovery=7

[mysqldump]
quick
set-variable    = max_allowed_packet=16M
[mysql]
user=mysql
basedir=/usr/local/mysql
#no-auto-rehash
# Remove the next comment character if you are not familiar with SQL
#safe-updates

[isamchk]
set-variable    = key_buffer=256M
set-variable    = sort_buffer=256M
set-variable    = read_buffer=2M
set-variable    = write_buffer=2M

[myisamchk]
set-variable    = key_buffer=256M
set-variable    = sort_buffer=256M
set-variable    = read_buffer=2M
set-variable    = write_buffer=2M

[mysqlhotcopy]
interactive-timeout

So, we come to the following:
1.There was something about "maximum process size" in kernel settings. I looked through LINT kerenel config file and noticed MAXDSIZ,MAXSSIZ,DFLDSIZ and some settings about shared memory. Will the tuning of this kernel settings allow us to avoid such errors in the future?
2.Is there a possibility to recover our data after such corruption?

We are really appreciate any helpful information. We have to dump all our databases now because of risk to lost all data. This is very inconvient for us because of large amount of data.

Vlad,

making system memory limits bigger will let you avoid this type of assertion.

But the database should not become corrupt in a crash like this. As I said, I suspect a bug in FreeBSD, drivers, or hardware. An upgrade to the latest FreeBSD-4.7 version is recommended.

Does the operating system error log (I do not mean the MySQL .err log but the general OS log) say anything about disk driver errors etc.?

In this type of corruption the only way to recover is to obey the instructions in http://www.innodb.com/ibman.html#Forcing_recovery and hope for the best. Since InnoDB seems to crash in the log roll-forward, this is serious corruption, and a full recreate of the tablespace is recommended.

Regards,

Heikki

Hi!

I am closing this bug report, or more accurately, putting it to the 'Can't repeat' state.

Setting memory limits bigger in FreeBSD will help to avoid the crash.

The corruption was probably caused a bug in FreeBSD file i/o, or some unknown bug in InnoDB. MySQL-4.0.14 contains a fixed data page checksum calculation which might prove that this bug is in FreeBSD.

Regards,

Heikki