MySQL Bugs: #15717: Crash on doing update

Bug #15717	Crash on doing update - 4.1.12
Submitted:	13 Dec 2005 19:05	Modified:	17 Dec 2005 2:50
Reporter:	Renato Weiner	Email Updates:
Status:	Can't repeat	Impact on me:	None
Category:	MySQL Server: InnoDB storage engine	Severity:	S2 (Serious)
Version:	4.1.12-3.RHEL4.1	OS:	Linux (Linux Redhat AS4.0)
Assigned to:	Heikki Tuuri	CPU Architecture:	Any

Description:
I have a table with 700.000 lines on InnoDB. The system is very loaded (400 queries/s). I did an update on a varchar(25) field on the master and the update went to the 4.1 slave and it crashed (InnoDB was then unrecoverable). The update was also sent to a 4.0 slave, and there it worked. 

Something like:

update table set field="" where field="0"

051213 11:02:04  mysqld started
051213 11:02:04  InnoDB: Database was not shut down normally!
InnoDB: Starting crash recovery.
InnoDB: Reading tablespace information from the .ibd files...
InnoDB: Restoring possible half-written data pages from the doublewrite
InnoDB: buffer...
051213 11:02:04  InnoDB: Starting log scan based on checkpoint at
InnoDB: log sequence number 4 1704773408.
InnoDB: Doing recovery: scanned up to log sequence number 4 1704773120
051213 11:02:04  InnoDB: ERROR: We were only able to scan the log up to
InnoDB: 4 1704773120, but a checkpoint was at 4 1704773408.
InnoDB: It is possible that the database is now corrupt!
051213 11:02:04  InnoDB: Error: page 2 log sequence number 4 1716198907
InnoDB: is in the future! Current system log sequence number 4 1704773408.
InnoDB: Your database may be corrupt.
051213 11:02:04  InnoDB: Error: page 5517 log sequence number 4 1716198907
InnoDB: is in the future! Current system log sequence number 4 1704773408.
InnoDB: Your database may be corrupt.
InnoDB: 1 transaction(s) which must be rolled back or cleaned up
InnoDB: in total 50 row operations to undo
InnoDB: Trx id counter is 1 342382336
051213 11:02:04  InnoDB: Error: page 0 log sequence number 4 1716198907
InnoDB: is in the future! Current system log sequence number 4 1704773408.
InnoDB: Your database may be corrupt.
InnoDB: Starting rollback of uncommitted transactions
InnoDB: Rolling back trx with id 1 342380545, 50 rows to undo051213 11:02:04InnoDB: Assertion failure in thread 16384 in file ../include/buf0buf.ic line 191
InnoDB: Failing assertion: block->state == BUF_BLOCK_FILE_PAGE
InnoDB: We intentionally generate a memory trap.
InnoDB: Submit a detailed bug report to http://bugs.mysql.com.
InnoDB: If you get repeated assertion failures or crashes, even
InnoDB: immediately after the mysqld startup, there may be
InnoDB: corruption in the InnoDB tablespace. Please refer to
InnoDB: http://dev.mysql.com/doc/mysql/en/Forcing_recovery.html
InnoDB: about forcing recovery.
mysqld got signal 11;
This could be because you hit a bug. It is also possible that this binary
or one of the libraries it was linked against is corrupt, improperly built,
or misconfigured. This error can also be caused by malfunctioning hardware.
We will try our best to scrape up some info that will hopefully help diagnose
the problem, but since we have already crashed, something is definitely wrong
and this may fail.

key_buffer_size=0
read_buffer_size=8384512
max_used_connections=0
max_connections=500
threads_connected=0
It is possible that mysqld could use up to
key_buffer_size + (read_buffer_size + sort_buffer_size)*max_connections = 3700780 K
bytes of memory
Hope that's ok; if not, decrease some variables in the equation.

thd=(nil)
Attempting backtrace. You can use the following information to find out
where mysqld died. If you see no messages after this, something went
terribly wrong...
Cannot determine thread, fp=0xbfffe2a8, backtrace may not be correct.
Stack range sanity check OK, backtrace follows:
0x80ab010 handle_segfault + 416
0x82e4aac __pthread_sighandler + 116
0x81ff1fe buf_block_get_page_no + 270
0x81fbf0f trx_undo_truncate_end + 943
0x81effb8 trx_roll_try_truncate + 104
0x81f0b87 trx_roll_pop_top_rec_of_trx + 1767
0x81b44b0 row_undo + 800
0x81b45fd row_undo_step + 45
0x818f401 que_thr_step + 577
0x818ef2b que_run_threads + 43
0x81ef655 trx_rollback_or_clean_all_without_sess + 677
0x821d7f6 recv_recovery_from_checkpoint_finish + 198
0x81760c5 innobase_start_or_create_for_mysql + 2757
0x812ee0f _Z13innobase_initv + 623
0x8126615 _Z7ha_initv + 37
0x80ac39e _Z22init_server_componentsv + 398
0x80ac939 main + 377
0x82eb190 __libc_start_main + 492
0x8048121 _start + 33
New value of fp=(nil) failed sanity check, terminating stack trace!
Please read http://dev.mysql.com/doc/mysql/en/Using_stack_trace.html and follow instructions on how to resolve the stack trace. Resolved
stack trace is much more helpful in diagnosing the problem, so please do
resolve it 
The manual page at http://www.mysql.com/doc/en/Crashing.html contains
information that should help you find out what is causing the crash.
051213 11:02:04  mysqld ended

How to repeat:
After the database got unrecoverable, I had to redo the replication. This query worked on a MySQL 4.0

Renato,

please post the complete, unedited .err log from the crashed slave.

Please also post the my.cnf of the crashed slave.

The printout suggest that this may also be a hardware problem. The InnoDB log files were not consistent.

Regards,

Heikki

Please, provide the information requested by Heikki.

This is the latest crash before I setup the replication back:

051213 11:30:54  mysqld started
051213 11:30:54  InnoDB: Database was not shut down normally!
InnoDB: Starting crash recovery.
InnoDB: Reading tablespace information from the .ibd files...
InnoDB: Restoring possible half-written data pages from the doublewrite
InnoDB: buffer... 
051213 11:30:54  InnoDB: Starting log scan based on checkpoint at
InnoDB: log sequence number 4 1704773408.
InnoDB: Doing recovery: scanned up to log sequence number 4 1704773120
051213 11:30:54  InnoDB: ERROR: We were only able to scan the log up to
InnoDB: 4 1704773120, but a checkpoint was at 4 1704773408.
InnoDB: It is possible that the database is now corrupt!
051213 11:30:54  InnoDB: Error: page 2 log sequence number 4 1716198907
InnoDB: is in the future! Current system log sequence number 4 1704773408.
InnoDB: Your database may be corrupt or you may have copied the InnoDB
InnoDB: tablespace but not the InnoDB log files. See 
http://dev.mysql.com/doc/mysql/en/backing-up.html for more information.
051213 11:30:54  InnoDB: Error: page 5517 log sequence number 4 1716198907
InnoDB: is in the future! Current system log sequence number 4 1704773408.
InnoDB: Your database may be corrupt or you may have copied the InnoDB
InnoDB: tablespace but not the InnoDB log files. See
http://dev.mysql.com/doc/mysql/en/backing-up.html for more information.
InnoDB: 1 transaction(s) which must be rolled back or cleaned up
InnoDB: in total 50 row operations to undo
InnoDB: Trx id counter is 1 342382336
051213 11:30:54  InnoDB: Error: page 0 log sequence number 4 1716198907
InnoDB: is in the future! Current system log sequence number 4 1704773408.
InnoDB: Your database may be corrupt or you may have copied the InnoDB
InnoDB: tablespace but not the InnoDB log files. See
http://dev.mysql.com/doc/mysql/en/backing-up.html for more information.
InnoDB: In a MySQL replication slave the last master binlog file
InnoDB: position 0 1374910, file name ib_logbin1.1676
InnoDB: Starting in background the rollback of uncommitted transactions
051213 11:30:54  InnoDB: Rolling back trx with id 1 342380545, 50 rows to undo
051213 11:30:54InnoDB: Assertion failure in thread 81926 in file ../include/buf0buf.ic line 188
InnoDB: Failing assertion: block->state == BUF_BLOCK_FILE_PAGE
InnoDB: We intentionally generate a memory trap.
InnoDB: Submit a detailed bug report to http://bugs.mysql.com.
InnoDB: If you get repeated assertion failures or crashes, even
InnoDB: immediately after the mysqld startup, there may be
InnoDB: corruption in the InnoDB tablespace. Please refer to
InnoDB: http://dev.mysql.com/doc/mysql/en/Forcing_recovery.html
InnoDB: about forcing recovery.
mysqld got signal 11;
This could be because you hit a bug. It is also possible that this binary
or one of the libraries it was linked against is corrupt, improperly built,
or misconfigured. This error can also be caused by malfunctioning hardware.
We will try our best to scrape up some info that will hopefully help diagnose
the problem, but since we have already crashed, something is definitely wrong
and this may fail.

key_buffer_size=0
read_buffer_size=8384512
max_used_connections=0
max_connections=500
threads_connected=0
It is possible that mysqld could use up to
key_buffer_size + (read_buffer_size + sort_buffer_size)*max_connections = 3700780 K
bytes of memory
Hope that's ok; if not, decrease some variables in the equation.

thd=(nil)
Attempting backtrace. You can use the following information to find out
where mysqld died. If you see no messages after this, something went
terribly wrong...
Cannot determine thread, fp=0xbeffec58, backtrace may not be correct.
Stack range sanity check OK, backtrace follows:
0x80c1600
0x832142c
0x824758f
0x824489a
0x823a2c8
0x823ae8d
0x8204c80
0x8204dcd
0x81e2fb1
0x81e2d3b
0x8239932
0x831cd11
0x834a38a
New value of fp=(nil) failed sanity check, terminating stack trace!
Please read http://dev.mysql.com/doc/mysql/en/Using_stack_trace.html and follow instructions on how to resolve the stack trace. Resolved
stack trace is much more helpful in diagnosing the problem, so please do 
resolve it
The manual page at http://www.mysql.com/doc/en/Crashing.html contains
information that should help you find out what is causing the crash.
InnoDB: Thread 16384 stopped in file os0sync.c line 501
051213 11:30:54  mysqld ended

my.cnf

[mysqld]
datadir=/var/lib/mysql
socket=/var/lib/mysql/mysql.sock
server-id=9
old_passwords=1
set-variable    = back_log=200
set-variable    = wait_timeout=15
set-variable    = max_connections=500
set-variable    = table_cache=5000
set-variable    = thread_cache_size=128
#set-variable    = innodb_force_recovery=6
#set-variable    = delayed_insert_limit=30
#set-variable    = delayed_insert_timeout=30
#set-variable    = delayed_queue_size=30
set-variable    = max_heap_table_size=2097152
set-variable    = key_buffer=256M
set-variable    = sort_buffer=32M
set-variable    = record_buffer=8M
set-variable    = max_connect_errors=10000
set-variable    = long_query_time=10
#set-variable    = query_cache_size=64M
#set-variable    = query_cache_type=2
set-variable    = max_allowed_packet=32M

set-variable = max_relay_log_size=19M

innodb_data_home_dir=/var/lib/mysql
innodb_data_file_path=ibdata1:200M:autoextend:max:2000M

set-variable = innodb_buffer_pool_size=50M
set-variable = innodb_additional_mem_pool_size=10M

[mysql.server]
user=mysql
#basedir=/var/lib

[safe_mysqld]
err-log=/var/log/mysqld.log
pid-file=/var/run/mysqld/mysqld.pid

Renato,

do you have the .err log from the time the slave crashed?

The log you have posted just shows a failed crash recovery.

Regards,

Heikki

These are the first lines. I´m not sure if at the moment of the crash it wrote something on the log.

051213 11:21:04 mysqld started
^G/usr/sbin/mysqld: File '/var/log/mysql/perf_mysql.log' not found (Errcode: 13)
051213 11:21:04 [ERROR] Could not use /var/log/mysql/perf_mysql.log for logging (error 13). Turning logging off for the whole duration of the MySQL server process. To turn it on again: fix the cause, shutdown the MySQL server and restart it.
051213 11:21:04 InnoDB: Database was not shut down normally!
InnoDB: Starting crash recovery.
InnoDB: Reading tablespace information from the .ibd files...
InnoDB: Restoring possible half-written data pages from the doublewrite
InnoDB: buffer...
051213 11:21:04 InnoDB: Starting log scan based on checkpoint at
InnoDB: log sequence number 4 1704773408.
InnoDB: Doing recovery: scanned up to log sequence number 4 1704773120
051213 11:21:04 InnoDB: ERROR: We were only able to scan the log up to
InnoDB: 4 1704773120, but a checkpoint was at 4 1704773408.
InnoDB: It is possible that the database is now corrupt!
051213 11:21:04 InnoDB: Error: page 2 log sequence number 4 1716198907
InnoDB: is in the future! Current system log sequence number 4 1704773408.
InnoDB: Your database may be corrupt or you may have copied the InnoDB
InnoDB: tablespace but not the InnoDB log files. See
http://dev.mysql.com/doc/mysql/en/backing-up.html for more information.
051213 11:21:04 InnoDB: Error: page 5517 log sequence number 4 1716198907
InnoDB: is in the future! Current system log sequence number 4 1704773408.
InnoDB: Your database may be corrupt or you may have copied the InnoDB
InnoDB: tablespace but not the InnoDB log files. See
http://dev.mysql.com/doc/mysql/en/backing-up.html for more information.
InnoDB: 1 transaction(s) which must be rolled back or cleaned up
InnoDB: in total 50 row operations to undo
InnoDB: Trx id counter is 1 342382336
051213 11:21:04 InnoDB: Error: page 0 log sequence number 4 1716198907
InnoDB: is in the future! Current system log sequence number 4 1704773408.
InnoDB: Your database may be corrupt or you may have copied the InnoDB
InnoDB: tablespace but not the InnoDB log files. See
http://dev.mysql.com/doc/mysql/en/backing-up.html for more information.
InnoDB: In a MySQL replication slave the last master binlog file
InnoDB: position 0 1374910, file name ib_logbin1.1676
InnoDB: Starting in background the rollback of uncommitted transactions
051213 11:21:04 InnoDB: Rolling back trx with id 1 342380545, 50 rows to undo
051213 11:21:04InnoDB: Assertion failure in thread 81926 in file ../include/buf0buf.ic line 188
InnoDB: Failing assertion: block->state == BUF_BLOCK_FILE_PAGE
InnoDB: We intentionally generate a memory trap.
InnoDB: Submit a detailed bug report to http://bugs.mysql.com.
InnoDB: If you get repeated assertion failures or crashes, even
InnoDB: immediately after the mysqld startup, there may be
InnoDB: corruption in the InnoDB tablespace. Please refer to
InnoDB: http://dev.mysql.com/doc/mysql/en/Forcing_recovery.html
InnoDB: about forcing recovery.
mysqld got signal 11;
This could be because you hit a bug. It is also possible that this binary
or one of the libraries it was linked against is corrupt, improperly built,
or misconfigured. This error can also be caused by malfunctioning hardware.
We will try our best to scrape up some info that will hopefully help diagnose
the problem, but since we have already crashed, something is definitely wrong
and this may fail.

key_buffer_size=0
read_buffer_size=8384512
max_used_connections=0
max_connections=500
threads_connected=0
It is possible that mysqld could use up to
key_buffer_size + (read_buffer_size + sort_buffer_size)*max_connections = 3700780 K
bytes of memory
Hope that's ok; if not, decrease some variables in the equation.

thd=(nil)
Attempting backtrace. You can use the following information to find out
where mysqld died. If you see no messages after this, something went
terribly wrong...
Cannot determine thread, fp=0xbeffec58, backtrace may not be correct.
Stack range sanity check OK, backtrace follows:
0x80c1600
0x832142c
0x824758f
0x824489a
0x823a2c8
0x823ae8d
0x8204c80
0x8204dcd
0x81e2fb1
0x81e2d3b
0x8239932
0x831cd11
0x834a38a
New value of fp=(nil) failed sanity check, terminating stack trace!
Please read http://dev.mysql.com/doc/mysql/en/Using_stack_trace.html and follow instructions on how to resolve the stack trace. Resolved
stack trace is much more helpful in diagnosing the problem, so please do
resolve it
The manual page at http://www.mysql.com/doc/en/Crashing.html contains
information that should help you find out what is causing the crash.
InnoDB: Thread 16384 stopped in file os0sync.c line 501
051213 11:21:04 mysqld ended

And then only crash recovery data...

Please, try to resolve that stack trace and send the results.

Here it goes.

0x80c1600 _Z21mysql_execute_commandP3THD + 13312
0x832142c _IO_vfscanf + 3168
0x824758f lock_rec_queue_validate + 2671
0x824489a lock_table_dequeue + 202
0x823a2c8 fil_load_single_table_tablespace + 840
0x823ae8d fil_space_for_table_exists_in_mem + 989
0x8204c80 buf_validate + 1504
0x8204dcd buf_validate + 1837
0x81e2fb1 btr_search_drop_page_hash_index + 689
0x81e2d3b btr_search_drop_page_hash_index + 59
0x8239932 fil_reset_too_high_lsns + 434
0x831cd11 vfwprintf + 313
0x834a38a field_separator + 34290

Hi!

The resolved stack trace is not sensible. But it would not help anything, because the real bug is that the checkpoint is at least 12 MB behind what was the log sequence number when InnoDB crashed.

You had the default two ib_lofiles, 5 MB each. It is no wonder how it was not able to scan from the checkpoint!

This failure to recover may be caused by:

1) a hardware fault or OS bug that caused ib_logfile to lag behind what it should have contained,

or

2) a MySQL/InnoDB bug that caused MySQL to crash, and another InnoDB bug that caused the checkpoint to lag behind.

I think 1) is much more probable than 2).

I am putting this bug report to the status 'Can't repeat'. You should run a memtest as well as disk tests on your computer.

Regards,

Heikki