MySQL Bugs: #30117: mysqld segfaults when loading from a corrupt mysqldump

Bug #30117	mysqld segfaults when loading from a corrupt mysqldump
Submitted:	28 Jul 2007 20:58	Modified:	29 Jul 2007 21:02
Reporter:	Joseph Fisk	Email Updates:
Status:	Not a Bug	Impact on me:	None
Category:	MySQL Server: Backup	Severity:	S2 (Serious)
Version:	5.0.22	OS:	Linux (Ubuntu 6.06 LTS AMD64)
Assigned to:		CPU Architecture:	Any
Tags:	mysqldump, segfault

Description:
When loading a mysqldump generated SQL file, mysqld segfaults with the following error:

mysql[6490]: segfault at 00002aababd02000 rip 00002aaaab92b966 rsp 00007fffffd62e08 error 6

Examination of the input file shows some corruption at the location where mysqld crashed.

How to repeat:
Use mysqldump Ver 10.9 Distrib 4.1.12, for pc-linux-gnu (i486), with the following commandline options:

mysqldump -c -Q --max_allowed_packet=1M --insert-ignore --extended-insert=false

The database contains about 140 million rows, resulting in a dump file of about 22GB.

Load it in the normal way:  mysql -u user -p dbname < dumpfile.sql

Since corruption is present in more than one place, I see more than one type of error.  In some cases, mysql correctly indicates that there is an SQL syntax error.  In other instances, mysql segfaults.

Suggested fix:
mysql should report a syntax error instead of segfaulting.

I tried snipping just the corrupted area from the file, but the segfault no longer happens.  I can provide the entire dump on request.

In the interest of providing a more complete picture:

I used mysqldump (Ver 10.9 Distrib 4.1.12, for pc-linux-gnu (i486)) to dump a database from mysql 4.1.12 (4.1.12-Debian_1ubuntu3.1-log) with default options.  This generated a dump of about 5GB.

When I tried to load that dump into mysql 5.0.22 (5.0.22-Debian_0ubuntu6.06.3-log), I received the following error:

If I then tried to load the dump again, without first dropping and recreating the database, the machine would lock hard (cease responding to ping, and I would have to use the reset button).

It's unclear whether the hard lockup is a result of a bug in mysql leading to a kernel bug, or just a standalone kernel bug.  The kernel version is 2.6.15-28-amd64-server.

I can reproduce the same error and hard lockup on two different machines, each running the same mysql and kernel versions.

I can successfully load the dump on a different machine running mysql 4.1.12-Debian_1ubuntu3.1-log.  I am not aware of any corruption in the database.  The dump generated by mysqldump after loading into 4.1.12 appears to be identical to the dump that was loaded, and either dump will cause the crash on the 5.0.22 systems.

I forgot to include the error message.  Here it is:

ERROR 1153 (08S01) at line 553: Got a packet bigger than 'max_allowed_packet' bytes

I increased max_allowed_packet in steps, and the same error recurred until max_allowed_packet reached 1GB.  Then the error changed to:

ERROR 1064 (42000) at line 553: You have an error in your SQL syntax; check the manual that corresponds to your MySQL server version for the right syntax to use near '083,0.01,0.081,0.001,0.018,2,2,'Default','NoChange',0.92,0.052,300.349,1,NULL,NU' at line 2

This is a reproducible behavior.

Next, in an effort to stop mysqldump from producing corrupt output, I ran it with options (shown in "how to reproduce" above) to generate one INSERT statement per table row.  However, the corruption is still present, and worse, it leads to the segfault that led to this bug report.

I believe this to be a genuine, legitimate bug report, but the real issue for me is that I have no apparent way to load my database into mysql 5!

A potential workaround to the apparent bug in mysqldump (admittedly, an old version) would be to use SELECT .. INTO OUTFILE.

So I did just that, and then used LOAD DATA INFILE ...

The table appeared to load successfully, with no errors or warnings.  However, when trying to mysqldump what appeared to be a successful import of the whole database, mysql crashes.  Here is the crash and restart from syslog, with the hex/ascii page dump omitted:

==========
Jul 28 20:18:43 murray mysqld[7891]: InnoDB: Database page corruption on disk or a failed
Jul 28 20:18:43 murray mysqld[7891]: InnoDB: file read of page 156932.
Jul 28 20:18:43 murray mysqld[7891]: InnoDB: You may have to recover from a backup.
Jul 28 20:18:43 murray mysqld[7891]: 070728 20:18:43  InnoDB: Page dump in ascii and hex (16384 bytes):
[dump omitted]
Jul 28 20:18:43 murray mysqld[7891]: 070728 20:18:43  InnoDB: Page checksum 174021137, prior-to-4.0.14-form checksum 373593286
Jul 28 20:18:43 murray mysqld[7891]: InnoDB: stored checksum 3294372564, prior-to-4.0.14-form stored checksum 373593286
Jul 28 20:18:43 murray mysqld[7891]: InnoDB: Page lsn 19 2492227009, low 4 bytes of lsn at page end 2492227009
Jul 28 20:18:43 murray mysqld[7891]: InnoDB: Page number (if stored to page already) 156932,
Jul 28 20:18:43 murray mysqld[7891]: InnoDB: space id (if created with >= MySQL-4.1.1 and stored already) 0
Jul 28 20:18:43 murray mysqld[7891]: InnoDB: Page may be an index page where index id is 0 825
Jul 28 20:18:43 murray mysqld[7891]: InnoDB: (index PRIMARY of table CGHDevel/CloneData)
Jul 28 20:18:43 murray mysqld[7891]: InnoDB: Database page corruption on disk or a failed
Jul 28 20:18:43 murray mysqld[7891]: InnoDB: file read of page 156932.
[help text omitted]
Jul 28 20:18:43 murray mysqld[7891]: InnoDB: Ending processing because of a corrupt database page.
Jul 28 20:18:43 murray mysqld_safe[7932]: Number of processes running now: 0
Jul 28 20:18:43 murray mysqld_safe[7934]: restarted
Jul 28 20:18:45 murray mysqld[7937]: 070728 20:18:45  InnoDB: Database was not shut down normally!
Jul 28 20:18:45 murray mysqld[7937]: InnoDB: Starting crash recovery.
Jul 28 20:18:45 murray mysqld[7937]: InnoDB: Reading tablespace information from the .ibd files...
Jul 28 20:18:45 murray mysqld[7937]: InnoDB: Restoring possible half-written data pages from the doublewrite
Jul 28 20:18:45 murray mysqld[7937]: InnoDB: buffer...
Jul 28 20:18:45 murray mysqld[7937]: 070728 20:18:45  InnoDB: Starting log scan based on checkpoint at
Jul 28 20:18:45 murray mysqld[7937]: InnoDB: log sequence number 22 2821793972.
Jul 28 20:18:45 murray mysqld[7937]: InnoDB: Doing recovery: scanned up to log sequence number 22 2821793972
Jul 28 20:18:46 murray mysqld[7937]: InnoDB: Last MySQL binlog file position 0 180836678, file name /var/log/mysql/mysql-bin.000521
Jul 28 20:18:46 murray mysqld[7937]: 070728 20:18:46  InnoDB: Started; log sequence number 22 2821793972
Jul 28 20:18:46 murray mysqld[7937]: 070728 20:18:46 [Note] Recovering after a crash using /var/log/mysql/mysql-bin
Jul 28 20:18:46 murray mysqld[7937]: 070728 20:18:46 [Note] Starting crash recovery...
Jul 28 20:18:46 murray mysqld[7937]: 070728 20:18:46 [Note] Crash recovery finished.
Jul 28 20:18:46 murray mysqld[7937]: 070728 20:18:46 [Note] /usr/sbin/mysqld: ready for connections.
Jul 28 20:18:46 murray mysqld[7937]: Version: '5.0.22-Debian_0ubuntu6.06.3-log'  socket: '/var/run/mysqld/mysqld.sock'  port: 3306  Debian Etch distribution
==========

Here is what mysqldump tells me:

joe@murray:~$ mysqldump -c -Q --max_allowed_packet=1M --insert-ignore --extended-insert=false -u root CGHDevel CloneData > clonedata.sql
mysqldump: Error 2013: Lost connection to MySQL server during query when dumping table `CloneData` at row: 10442812

It turns out that accessing any of an entire range of rows will cause a server crash (notice value of CloneID):

mysql> select CaseID from CloneData where CaseID=12762 and CloneID=1762;
+--------+
| CaseID |
+--------+
|  12762 | 
+--------+
1 row in set (0.00 sec)

mysql> select CaseID from CloneData where CaseID=12762 and CloneID=1763;
ERROR 2013 (HY000): Lost connection to MySQL server during query

mysql> select CaseID from CloneData where CaseID=12762 and CloneID=1928;
ERROR 2013 (HY000): Lost connection to MySQL server during query

mysql> select CaseID from CloneData where CaseID=12762 and CloneID=1929;
+--------+
| CaseID |
+--------+
|  12762 | 
+--------+
1 row in set (0.12 sec)

Running CHECK TABLE .. EXTENDED will also crash the server.  However, CHECK TABLE .. EXTENDED on 4.1.12 says the table is OK.  Likewise, the above SELECT statements execute without error, and nothing appears amiss with the data returned, nor with the data in the TSV file.  Should this be filed as a different bug?

More and more this appears to be either a kernel bug or some sort of hardware issue.