MySQL Bugs: #18410: InnoDB: Assertion failure in thread 1517779888 in file btr0cur.c line 3568

Bug #18410	InnoDB: Assertion failure in thread 1517779888 in file btr0cur.c line 3568
Submitted:	22 Mar 2006 2:39	Modified:	29 Apr 2006 11:43
Reporter:	K Hopkins	Email Updates:
Status:	No Feedback	Impact on me:	None
Category:	MySQL Server: InnoDB storage engine	Severity:	S1 (Critical)
Version:	4.1.18	OS:	Linux (Linux)
Assigned to:	Assigned Account	CPU Architecture:	Any

Description:
060322 11:34:54  mysqld started
060322 11:34:55  InnoDB: Started; log sequence number 6 874818451
/usr/sbin/mysqld-max: ready for connections.
Version: '4.1.18-Max-log'  socket: '/var/lib/mysql/mysql.sock'  port: 3306  MySQL Community Edi         tion - Max (GPL)
InnoDB: Error: page n:o stored in the page read in is 497458, should be 497455!
InnoDB: Error: page n:o stored in the page read in is 497459, should be 497456!
InnoDB: Error: page n:o stored in the page read in is 497460, should be 497457!
InnoDB: Error: page n:o stored in the page read in is 497461, should be 497458!
InnoDB: Error: page n:o stored in the page read in is 497462, should be 497459!
InnoDB: Error: page n:o stored in the page read in is 497463, should be 497460!
InnoDB: Error: page n:o stored in the page read in is 497464, should be 497461!
InnoDB: Error: page n:o stored in the page read in is 497465, should be 497462!
InnoDB: Error: page n:o stored in the page read in is 497466, should be 497463!
InnoDB: Error: page n:o stored in the page read in is 497467, should be 497464!
060322 11:57:07InnoDB: Assertion failure in thread 1517779888 in file btr0cur.c line 3568
InnoDB: Failing assertion: copied_len == local_len + extern_len
InnoDB: We intentionally generate a memory trap.
InnoDB: Submit a detailed bug report to http://bugs.mysql.com.
InnoDB: If you get repeated assertion failures or crashes, even
InnoDB: immediately after the mysqld startup, there may be
InnoDB: corruption in the InnoDB tablespace. Please refer to
InnoDB: http://dev.mysql.com/doc/mysql/en/Forcing_recovery.html
InnoDB: about forcing recovery.
InnoDB: Thread 1422703536 stopped in file srv0srv.c line 576
mysqld got signal 11;
This could be because you hit a bug. It is also possible that this binary
or one of the libraries it was linked against is corrupt, improperly built,
or misconfigured. This error can also be caused by malfunctioning hardware.
We will try our best to scrape up some info that will hopefully help diagnose
the problem, but since we have already crashed, something is definitely wrong
and this may fail.

key_buffer_size=67108864
read_buffer_size=1044480
max_used_connections=2
max_connections=100
threads_connected=2
It is possible that mysqld could use up to
key_buffer_size + (read_buffer_size + sort_buffer_size)*max_connections = 577135 K
bytes of memory
Hope that's ok; if not, decrease some variables in the equation.

thd=0x8aeaef0
Attempting backtrace. You can use the following information to find out
where mysqld died. If you see no messages after this, something went
terribly wrong...
Cannot determine thread, fp=0x5a775bac, backtrace may not be correct.
Stack range sanity check OK, backtrace follows:
0x815b8f0
0xffffe420
0x431a9028
0x82f463d
0x82f70a2
0x81e8f01
0x81e91a3
0x81d526f
0x81af808
0x816feaa
0x817476d
0x816dcb5
0x816d926
0x816d0fe
0x4005eaa7
0x4018fc2e
New value of fp=(nil) failed sanity check, terminating stack trace!
Please read http://dev.mysql.com/doc/mysql/en/Using_stack_trace.html and follow instructions on how to resolve the stack trace. Resolved
stack trace is much more helpful in diagnosing the problem, so please do
resolve it
Trying to get some variables.
Some pointers may be invalid and cause the dump to abort...
thd->query at 0x8b15fd8 = UPDATE dbmail_messageblks SET is_header = '0'
thd->thread_id=2
The manual page at http://www.mysql.com/doc/en/Crashing.html contains
information that should help you find out what is causing the crash.

Number of processes running now: 0
060322 11:57:07  mysqld restarted
060322 11:57:07  InnoDB: Database was not shut down normally!
InnoDB: Starting crash recovery.
InnoDB: Reading tablespace information from the .ibd files...
InnoDB: Restoring possible half-written data pages from the doublewrite
InnoDB: buffer...
060322 11:57:08  InnoDB: Starting log scan based on checkpoint at
InnoDB: log sequence number 7 188038387.
InnoDB: Doing recovery: scanned up to log sequence number 7 188038387
InnoDB: Last MySQL binlog file position 0 5788, file name ./aptemip1-bin.000004
060322 11:57:09  InnoDB: Flushing modified pages from the buffer pool...
060322 11:57:09  InnoDB: Started; log sequence number 7 188038387
/usr/sbin/mysqld-max: ready for connections.
Version: '4.1.18-Max-log'  socket: '/var/lib/mysql/mysql.sock'  port: 3306  MySQL Community Edition - Max (GPL)

# resolve_stack_dump -s /usr/lib/mysql/mysqld.sym -n backtrace.txt
0x815b8f0 _Z9frm_erroriP8st_tablePKcii + 608
0xffffe420 _end + -139391344
0x431a9028 _end + 986430104
0x82f463d mi_base_info_write + 173
0x82f70a2 _mi_seq_search + 322
0x81e8f01 _ZNK8Geometry5get_xEPd + 1
0x81e91a3 _Z11init_fieldsP3THDP13st_table_listP13st_find_fieldj + 83
0x81d526f _Z16init_master_infoP14st_master_infoPKcS2_bi + 255
0x81af808 _ZN11ha_innobase6createEPKcP8st_tableP24st_ha_create_information + 104
0x816feaa _Z16change_group_refP3THDP9Item_funcP8st_orderPb + 266
0x817476d _Z12write_recordP8st_tableP12st_copy_info + 221
0x816dcb5 _Z15join_init_cacheP3THDP13st_join_tablej + 629
0x816d926 _Z26remove_dup_with_hash_indexP3THDP8st_tablejPP5FieldmP4Item + 774
0x816d0fe _Z17create_sort_indexP3THDP4JOINP8st_ordermm + 622
0x4005eaa7 _end + 934745367
0x4018fc2e _end + 935995038

How to repeat:
stop MySQL
Recover database from backup.
start MySQL
verify data OK. (mysqlcheck -e --auto-repair dbmail -p, returns all OK)
re-run dbmail migration script (SQL)
**Crash occurs 
reverify data:
# mysqlcheck -e --auto-repair dbmail -p
Enter password:
dbmail.dbmail_acl                                  OK
dbmail.dbmail_aliases                              OK
dbmail.dbmail_mailboxes                            OK
dbmail.dbmail_messageblks                          OK
dbmail.dbmail_messages                             OK
dbmail.dbmail_physmessage                          OK
dbmail.dbmail_subscription                         OK
dbmail.dbmail_users                                OK
dbmail.tmpmessage                                  OK
dbmail.tmpmessageblk                               OK

---Last two commands were:
ALTER TABLE dbmail_messageblks
        DROP INDEX messageblk_idnr,
        DROP INDEX messageblk_idnr_2,
        DROP INDEX msg_index,
        CHANGE message_idnr physmessage_id bigint(21) NOT NULL DEFAULT '0',
        ADD COLUMN is_header tinyint(1) DEFAULT '0' NOT NULL,
        ADD INDEX physmessage_id_index (physmessage_id),
        ADD INDEX physmessage_id_is_header_index (physmessage_id, is_header),
        ADD FOREIGN KEY physmessage_id_fk (physmessage_id)
                REFERENCES dbmail_physmessage (id)
                ON DELETE CASCADE;
UPDATE dbmail_messageblks SET is_header = '0';

---(after crash)
mysql> describe dbmail_messageblks;  
+-----------------+------------+------+-----+---------+----------------+
| Field           | Type       | Null | Key | Default | Extra          |
+-----------------+------------+------+-----+---------+----------------+
| messageblk_idnr | bigint(21) |      | PRI | NULL    | auto_increment |
| physmessage_id  | bigint(21) |      | MUL | 0       |                |
| messageblk      | longtext   |      |     |         |                |
| blocksize       | bigint(21) |      |     | 0       |                |
| is_header       | tinyint(1) |      |     | 0       |                |
+-----------------+------------+------+-----+---------+----------------+
5 rows in set (0.03 sec)

---/etc/my.cnf
[client]
port            = 3306
socket          = /var/lib/mysql/mysql.sock
[mysqld]
port            = 3306
socket          = /var/lib/mysql/mysql.sock
skip-locking
key_buffer = 256M
max_allowed_packet = 1M
table_cache = 256
sort_buffer_size = 1M
read_buffer_size = 1M
read_rnd_buffer_size = 4M
myisam_sort_buffer_size = 64M
thread_cache_size = 8
query_cache_size= 16M
thread_concurrency = 8
log-bin
server-id       = 1
innodb_data_home_dir = /var/lib/mysql/InnoDB
innodb_data_file_path = ibdata1:10M:autoextend
innodb_log_group_home_dir = /var/lib/mysql/InnoDB/logs/
innodb_log_arch_dir = /var/lib/mysql/InnoDB/logs/archive/
innodb_buffer_pool_size = 256M
innodb_additional_mem_pool_size = 20M
innodb_log_file_size = 20M
innodb_log_buffer_size = 8M
innodb_flush_log_at_trx_commit = 1
innodb_lock_wait_timeout = 50
[mysqldump]
quick
max_allowed_packet = 16M
[mysql]
no-auto-rehash
[isamchk]
key_buffer = 128M
sort_buffer_size = 128M
read_buffer = 2M
write_buffer = 2M
[myisamchk]
key_buffer = 128M
sort_buffer_size = 128M
read_buffer = 2M
write_buffer = 2M
[mysqlhotcopy]
interactive-timeout

--
I attempted the migration with several times, with different revisions of the database, each created during a cold backup (stop MySQL, tar whole mysql dir), all with the similar results.  The master copy of the db is still in production on another machine, and functions fine.

Hi!

What migration did you attempt? How did you copy the files to another computer?

What is your Linux distro?

Looks like a portion of the data file has been written to a file address 48 kB too small.

My first guess is an OS bug or hardware failure.

Regards,

Heikki

Hi Heikki!

  Thanks for the feedback.  

> What migration did you attempt? How did you copy the files to another computer?
  
  I'm migrating a dbmail 1.x database to 2.0 format, both on InnoDB.  It's done with a SQL script, which alters tables and initializes new fields.  The source table, messageblks has a length of 4169940992, and is by far the largest table in the database.
  To copy the files, I shut down MySQL, tar -cz the entire mysql directory, scp it over to a HP workstation (which is bigger/faster & has more free disk than the laptop that the db normally runs on), where I untar it into an empty /var/lib/mysql directory.  The tarred file is about 3.5GB.  /etc/my.cnf has been updated on the HP workstation for the new database.  I then start MySQL.  v4.1.18 is installed from the same rpms on both hosts.  Only one instance of MySQL is running on either host.

>What is your Linux distro?

  SuSE Pro Linux 9.3, up-to-date patches.

>Looks like a portion of the data file has been written to a file address 48 kB
too small.

  Does that mean the database is corrupt before I start?  Would mysqlcheck catch this?  What can?
  
> My first guess is an OS bug or hardware failure.

  I can run memtest over the weekend (as it does double duty as a production server too), but the HP workstation is working perfectly otherwise.  
  Is there any benefit to run the migration several times to see if it aborts in the same place each time?  I'd expect hardware failures to occur in random parts of the code.
  It is sitting on a reiserfs over lvm over md/raid1 over PATA disks.  I've often used reiserfs/lvm/md/SuSE9.3 elsewhere without problems (but only with lightweight MySQL databases).

  Are any alarm bells ringing from what I've described?

Regards,
  Keith

Keith,

the interesting question is whether the file corruption happens in the file copying phase, or whether it is the original ibdata files that are corrupt.

InnoDB's ibdata files form one logical tablespace. You seem to have corruption in a 16 kB page starting at offset 8,150,302,720 bytes from the start of the tablespace, and in a few subsequent pages.

Please use dd or some similar tool to copy a few hundred bytes from that offset on in the original database, so that I can check if the page number is right in the original database. You can email the bytes to heikki.tuuri@oracle.com

No bug has ever been found in InnoDB that could explain the file corruption cases that have been reported from Linux in the past 5 years. I recall in one case, a page had been shifted 4 kB or 8 kB from the right position.

What are your LVM configuration parameters? Can a 48 kB shift somehow be connected to the parameter values?

You should test the disk subsystem of your computer.

Regards,

Heikki

/* The byte offsets on a file page for various variables */
#define FIL_PAGE_SPACE_OR_CHKSUM 0      /* in < MySQL-4.0.14 space id the
                                        page belongs to (== 0) but in later
                                        versions the 'new' checksum of the
                                        page */
#define FIL_PAGE_OFFSET         4       /* page offset inside space */
#define FIL_PAGE_PREV           8       /* if there is a 'natural' predecessor
                                        of the page, its offset */
#define FIL_PAGE_NEXT           12      /* if there is a 'natural' successor
                                        of the page, its offset */
#define FIL_PAGE_LSN            16      /* lsn of the end of the newest
                                        modification log record to the page */
#define FIL_PAGE_TYPE           24      /* file page type: FIL_PAGE_INDEX,...,
                                        2 bytes */
#define FIL_PAGE_FILE_FLUSH_LSN 26      /* this is only defined for the
                                        first page in a data file: the file
                                        has been flushed to disk at least up
                                        to this lsn */
#define FIL_PAGE_ARCH_LOG_NO_OR_SPACE_ID  34 /* starting from 4.1.x this
                                        contains the space id of the page */
#define FIL_PAGE_DATA           38      /* start of the data on the page */

No feedback was provided for this bug for over a month, so it is
being suspended automatically. If you are able to provide the
information that was originally requested, please do so and change
the status of the bug back to "Open".