MySQL Bugs: #41571: MySQL segfaults after innodb recovery

Bug #41571	MySQL segfaults after innodb recovery
Submitted:	18 Dec 2008 0:15	Modified:	18 Jun 2010 22:57
Reporter:	Thomas Keller	Email Updates:
Status:	Closed	Impact on me:	None
Category:	MySQL Server: InnoDB storage engine	Severity:	S1 (Critical)
Version:	5.0.32-7etch8, r5.0.67, 5.0.74	OS:	Any (Debian 4.0)
Assigned to:		CPU Architecture:	Any

Description:
(Same bug also exists in 5.0.67 from 5.0.67-0.dotdeb.1)

Mysql crashed for us tonight and a couple of data needed to be replayed (innodb). Mysql constantly segfaulted during this process. I've ran it in gdb to get a basic backtrace. The interesting piece is that it just tells me that the recovery has been finished just to segfault finally:

[New Thread 1082132832 (LWP 19012)]
[New Thread 1090525536 (LWP 19013)]
[New Thread 1098918240 (LWP 19014)]
[New Thread 1107310944 (LWP 19015)]
081217 21:44:03  InnoDB: Database was not shut down normally!
InnoDB: Starting crash recovery.
InnoDB: Reading tablespace information from the .ibd files...
InnoDB: Restoring possible half-written data pages from the doublewrite
InnoDB: buffer...
081217 21:44:03  InnoDB: Starting log scan based on checkpoint at
InnoDB: log sequence number 3 1888274402.
InnoDB: Doing recovery: scanned up to log sequence number 3 1888277687
InnoDB: 1 transaction(s) which must be rolled back or cleaned up
InnoDB: in total 6 row operations to undo
InnoDB: Trx id counter is 0 196608
081217 21:44:03  InnoDB: Starting an apply batch of log records to the database...
InnoDB: Progress in percents: 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 
InnoDB: Apply batch completed
InnoDB: Last MySQL binlog file position 0 262363, file name /var/log/mysql/mysql-bin.000005
[New Thread 1115703648 (LWP 19016)]
InnoDB: Starting in background the rollback of uncommitted transactions
081217 21:44:04  InnoDB: Rolling back trx with id 0 196173, 6 rows to undo
[New Thread 1124096352 (LWP 19017)]
[New Thread 1132489056 (LWP 19018)]
[New Thread 1140881760 (LWP 19019)]
081217 21:44:04  InnoDB: Started; log sequence number 3 1888277687
081217 21:44:04 [Note] Recovering after a crash using /var/log/mysql/mysql-bin
081217 21:44:04 [Note] Starting crash recovery...
081217 21:44:04 [Note] Crash recovery finished.

Program received signal SIGSEGV, Segmentation fault.
[Switching to Thread 1115703648 (LWP 19016)]
0x000000000079a3a4 in rec_set_nth_field_sql_null ()
(gdb) bt
#0  0x000000000079a3a4 in rec_set_nth_field_sql_null ()
#1  0x00000000006ff726 in row_upd_rec_in_place ()
#2  0x000000000071b7f0 in btr_cur_update_in_place ()
#3  0x00000000007233b3 in btr_cur_optimistic_update ()
#4  0x0000000000705197 in row_undo_ins ()
#5  0x0000000000705f75 in row_undo_mod ()
#6  0x00000000006feb40 in row_undo_step ()
#7  0x00000000006dbcbd in que_run_threads ()
#8  0x0000000000739cb1 in trx_rollback_or_clean_all_without_sess ()
#9  0x00002ab3a11d4f1a in start_thread () from /lib/libpthread.so.0
#10 0x00002ab3a1a875d2 in clone () from /lib/libc.so.6
#11 0x0000000000000000 in ?? ()

The original crash occured with 5.0.67 with the kind-of unsupported mysql builds for debian-based machines from http://dotdeb.org. We initially thought that these binaries there were the cause for the misbehaviour and rolled back to the Debian 4.0 default binaries (5.0.32), which showed the same behaviour.

Since MySQL could not be started at all and being not a MySQL expert per se when it comes to crash recovery, I looked for a way to skip the recovery / replay of these missing data and only found the "innodb_force_recovery" variable which I set to 4. With this value the recovery apparently ran through and MySQL started as it should, however, none of the tables in the database were usable (mysql hinted me to repair them, just to tell me afterwards that REPAIR TABLE is not supported for innodb).

After a lot of struggeling we've physically removed the ibdata* and ib_logfile* files and table definitions and imported an old dump. The original (corrupted) files are still available.

How to repeat:
Crash on each start of /usr/sbin/mysqld

Thank you for the bug report. Are you tried our latest tarball package release?. Thanks in advance.

No, not yet. We've encountered this problem tonight on a production system, we could however copy the data over to a test system, try to reproduce the problem there and optionally compile a tarball release there.

Can you point me to the tarball release I should try out? The 5.0.67 one or the new GA 5.1.x? Are there any special build options I should apply (maybe also something which deactivates unused things to speed up the build process)?

Thanks in advance,
Thomas.

Thank you for the feedback.

Is better if you use 5.0.67 tarball to avoid upgrading issues. No special options so far.

1. can we get complete mysqld error log and maybe even the coredump and/or stack backtrace
from the original crash:

>Mysql crashed for us tonight

2. before that original crash, had you upgrade MySQL by any chance?

3. any information about your storage configuration you could share
as far as data and log file location is concerned.

it the original crash analyzing which is both interesting and useful.
The subsequent events are a bit more clear: crash recovery completes,
then InnoDB starts rolling back unfinished txns which is done in the background.
The database is already corrupted beyond repair, apparently, so the rollback 
fails consistently. Of course an assert would be better than SEGV, however
looking at all the rollback failures would be only of marginal help.

So the status is 'Need Feedback' but I'm still assigning it to Marko 
so he could try to make sense out of the coredump.

It looks like InnoDB successfully applied the redo log, that is, restored the unwritten changes to all data files. After that, the database is ready for connections, but InnoDB will still have to roll back incomplete transactions (which were active at the time of the previous crash). The crash occurs during the rollback. It should be repeatable on those data files.

Please back up these data files until this bug has been closed, so that we can debug this problem. We need a repeatable test case, and now the test case is your data files.

Unfortunately, the stack trace does not help us much, because it does not contain function parameters or local variables. Could you please compile MySQL from the source, adding -g to the CFLAGS, and possibly removing -O3? Then start mysqld under gdb and post the stack trace of the crashing thread (bt full). While you are at it, you could also define UNIV_DEBUG in storage/innobase/univ.i and see if an ut_ad() assertion will fail somewhere earlier.

In rec_set_nth_field_sql_null(), I would like to see the contents of rec, and I would like to see the dict_index_t* of the record, which could be visible in some upper stack frame. The function should only be invoked on ROW_FORMAT=REDUNDANT tables. If you never passed this attribute to CREATE TABLE or ALTER TABLE, such tables would include old tables created before MySQL 5.0.3 and the internal InnoDB data dictionary tables.

In gdb, please go to the stack frame of rec_set_nth_field_sql_null() and execute these commands:

p/x rec[-6-2*n]@106+2*n
up 2
p *index

The command "up 2" is for selecting the stack frame of btr_cur_optimistic_update(), where the local variable "index" is assigned from "cursor->index". If the parameter "cursor" is no longer available, the local variable might be.

some more debug info from 5.0.67

Attachment: bug41571_gdb_5.0.67_info.txt (text/plain), 7.22 KiB.

Thanks, Shane.  Unfortunately, I couldn't find any valid-looking dict_index_t pointer in the output. I'll try to interpret the record.

If possible, please try to run the debug binaries on the faulty data files.

Do you have the complete .err log? It is crucial to know why InnoDB crashed in the first place. Is the hardware ok?

@Heikki: Sorry, no, I've forgot to backup mysql.err that day in the rush. All I have left is the complete /var/lib/mysql directory with ibdata1, ib_logfile0, ib_logfile1 and the *.frm files of the two deployed InnoDB databases.

@Michael:

> 1. can we get complete mysqld error log and maybe even the coredump and/or 
> stack backtrace from the original crash:

Sorry, I don't have this available. Problem is that it was not me who experienced the crash, but somebody else and he was not aware of the procedures.

> 2. before that original crash, had you upgrade MySQL by any chance?

We eventually upgraded from 5.0.32 to 5.0.67 a few weeks before, but the databases were not in place during the upgrade, i.e. have been created afterwards.

> 3. any information about your storage configuration you could share
> as far as data and log file location is concerned.

Simple SCSI storage, ext3 filesystem. No special, fancy things here.

MySQL recovery crashlog

Attachment: mysql.err (application/octet-stream, text), 3.49 KiB.

Sorry that it took rather long to get you a reasonable crash log; the above one comes from a stock 5.0.67 installation, compiled without -O3 and with -g. Furthermore, as wanted, a couple of debug defines where commented in. I hope this is of help.

And last, but not least, the resolved stack trace:

0x81aaa08 handle_segfault + 808
0x8318cc0 ibuf_count_get + 151
0x837af5a buf_page_io_complete + 775
0x8383f66 buf_read_page_low + 614
0x8384329 buf_read_page + 99
0x8378bb0 buf_page_get_gen + 791
0x8319efc ibuf_bitmap_get_map_page + 81
0x831eeac ibuf_merge_or_delete_for_page + 237
0x837af07 buf_page_io_complete + 692
0x8383f66 buf_read_page_low + 614
0x8384329 buf_read_page + 99
0x8378bb0 buf_page_get_gen + 791
0x8350b0d btr_cur_search_to_nth_level + 1960
0x83596d9 btr_pcur_open + 111
0x835ae43 btr_pcur_open_on_user_rec + 52
0x8312338 dict_load_indexes + 605
0x83137bc dict_load_sys_table + 174
0x8301089 dict_boot + 3802
0x82feff2 innobase_start_or_create_for_mysql + 2851
0x828875c _Z13innobase_initv + 892
0x827d4a2 _Z7ha_initv + 466
0x81a9e4b _Z22init_server_componentsv + 427
0x81ae7ea main + 1194
0xb7c88450 _end + -1351541932
0x8107bb1 _start + 33

Thomas, can you please run the mysqld recovery inside gdb, or could you get a core dump that you could analyze with gdb? I would very much like to have a stack trace that includes the values of the function parameters.

gdb bt full output of the recovery crash - here we go

Attachment: recovery_crash-bt_full.txt (text/plain), 12.80 KiB.

Thomas, I'm sorry to bother you again, but the function ibuf_count_get() should not normally be included in the compilation, not even when UNIV_DEBUG is defined. I should have noticed this even in your stack trace of [15 Jan 14:32], before asking for the "bt full" from gdb.

The comment next to UNIV_IBUF_DEBUG in storage/innobase/univ.i tries to say that this option should only be activated after a slow shutdown (full insert buffer merge) or when the database is being created from the scratch. The UNIV_IBUF_DEBUG option is intended to be used for debugging any changes made to the InnoDB insert buffer.

Can you please compile mysqld again, defining just UNIV_DEBUG during the compilation, and post the "bt full" output from gdb of the crash? All other debug defines in univ.i than UNIV_IBUF_DEBUG (or UNIV_IBUF_COUNT_DEBUG in the InnoDB Plugin) are safe to use (they just make InnoDB slower and may enable some debug output to the error log).

Best regards and sorry for the nuisance,

Marko

Output of gdb bt full; mysql compiled only with UNIV_DEBUG

Attachment: bt-full.txt (text/plain), 8.56 KiB.

Thank you, Thomas! Now we are in business.

Somehow, btr_cur_optimistic_update() has decided that it can update the record in place, although the update would change field 6 to NULL. In ROW_FORMAT=COMPACT (the InnoDB default since MySQL 5.0.3), a NULL column takes no space. ROW_FORMAT=REDUNDANT would always reserve the same space for a fixed-length column, NULL or not.

Here is the relevant part of the stack trace:

(gdb) bt full
#0  0x083b8d2e in rec_set_nth_field (rec=0xb5aa372f "\200", offsets=0xb3c55b10, n=6, data=0xb62b2685, len=4294967295) at ./../include/rem0rec.ic:1231
        data2 = (unsigned char *) 0x0
        len2 = 0
#1  0x0833553b in row_upd_rec_in_place (rec=0xb5aa372f "\200", offsets=0xb3c55b10, update=0xb62b26c8) at row0upd.c:457
        upd_field = (upd_field_t *) 0xb62b26d8
        new_val = (dfield_t *) 0xb62b26e0
        n_fields = 3
        i = 0
#2  0x0834ef2b in btr_cur_update_in_place (flags=7, cursor=0xb62a5dc8, update=0xb62b26c8, cmpl_info=1, thr=0xb62a5cf8, mtr=0xb3c55dd8)
    at btr0cur.c:1497

Can you please issue the following commands:

frame 2
print *cursor->index

and show the SHOW CREATE TABLE for the indicated table (cursor->index->table_name)?  I primarily want to see index->name and index->table_name. (If you are not analyzing a core dump in gdb, but you are re-running mysqld under gdb, the addresses in the stack trace should be different. But I think that this bug should be fully repeatable with your data files.)

Can you please also compile mysqld without UNIV_DEBUG and post the stack trace of the crash? It should crash somewhat later, since the ut_ad() assertions (such as the failing one) are only enabled by UNIV_DEBUG.

The requested index is the PRIMARY index of our customer table, here is the schema:

CREATE TABLE `customer` (
  `id` int(11) NOT NULL auto_increment,
  `live_version_id` int(11) default NULL,
  `status` tinyint(4) default '0',
  `row_deleted` tinyint(4) default '0',
  `foreign_rc_owner_id` int(11) default NULL,
  `foreign_rc_address_id` int(11) default NULL,
  `number` varchar(20) NOT NULL,
  `firstname` varchar(128) default NULL,
  `lastname` varchar(128) default NULL,
  `email1` varchar(128) default NULL,
  `company_name` varchar(128) NOT NULL,
  `street1` varchar(128) default NULL,
  `street2` varchar(128) default NULL,
  `postcode` varchar(16) NOT NULL,
  `city1` varchar(64) NOT NULL,
  `city2` varchar(64) default NULL,
  `province_code` varchar(10) default 'XX-XX',
  `country_code` varchar(2) NOT NULL,
  `phone1` varchar(64) NOT NULL,
  `phone2` varchar(64) default NULL,
  `fax` varchar(64) default NULL,
  `mobile` varchar(64) default NULL,
  `customer_type` tinyint(4) default '0',
  `advised_by` tinyint(4) default '0',
  `vat_id` varchar(64) default NULL,
  `vat_uid` varchar(64) default NULL,
  `bank_account_country_code` varchar(2) default NULL,
  `bank_account_holder` varchar(128) default NULL,
  `bank_account_name` varchar(128) default NULL,
  `bank_account_code` varchar(64) default NULL,
  `bank_account_number` varchar(64) default NULL,
  `bank_account_street` varchar(128) default NULL,
  `bank_account_postcode` varchar(16) default NULL,
  `bank_account_city` varchar(64) default NULL,
  `bank_account_iban` varchar(64) default NULL,
  `bank_account_bic` varchar(64) default NULL,
  `tax_authority_name` varchar(128) default NULL,
  `created_at` datetime default NULL,
  `is_admin` tinyint(1) NOT NULL default '0',
  PRIMARY KEY  (`id`),
  KEY `customer_FI_1` (`live_version_id`),
  KEY `customer_FI_2` (`province_code`),
  CONSTRAINT `customer_FK_1` FOREIGN KEY (`live_version_id`) REFERENCES `customer` (`id`),
  CONSTRAINT `customer_FK_2` FOREIGN KEY (`province_code`) REFERENCES `province` (`iso_code`)
) ENGINE=InnoDB AUTO_INCREMENT=24 DEFAULT CHARSET=utf8 COMMENT='Customer'

Output of gdb bt full; all debug options disabled

Attachment: bt-full_w_o_debug.txt (text/plain), 8.11 KiB.

Thomas,

I investigated offsets_[] of your "bt full" from the UNIV_DEBUG build, and it looks plausible. The length of all NULL fields is 0 (the difference to the preceding entry in the array is always 0 or 1<<31).

I think that this is a bug in row_upd_changes_field_size_or_external(). I would like to see the update vector, which was not shown in your stack traces. (They only show a pointer to the update vector.)

In the stack frame of btr_cur_update_in_place() (frame 2 of your UNIV_DEBUG build), please execute the following in gdb:

p *update
p *update->fields@update->n_fields
p/x *(char*)update->fields[0].new_val.data@update->fields[0].new_val.len
p/x *(char*)update->fields[1].new_val.data@update->fields[1].new_val.len

and so on until update->n_fields. Some of these may complain if len = (ulint)-1 [the field is SQL NULL].

It would also be nice to see the record being updated:

p/x rec[-0x1f]@0x1f
p/x *rec@127

If you wish to debug this yourself, you could do this in gdb:

break ut_dbg_assertion_failed
break row_upd_changes_field_size_or_external
ignore 2 10000
run ...
(it should hit breakpoint 1, ut_dbg_assertion_failed, in the UNIV_DEBUG build)
info breakpoints
(the last line should be "breakpoint already hit N times")
ignore 2 N-1 (set it to one smaller than the above number)
run ...

Now you should hit the faulty instance of row_upd_changes_field_size_or_external(). I don't think that single-stepping will help much here: seeing the update vector and the offsets[] is easiest (for me, at least).

There was a bug that was fixed in MySQL 4.1 (innobase/row/row0upd.c) that may have been related to this bug. Unfortunately, I was unable to find anything in the MySQL bug database about it. I merely found this commit note: http://bazaar.launchpad.net/%7Emysql/mysql-server/mysql-4.1/revision/2055.272.1

Thomas sent me the update vector privately.  It contains three fields: assigning field 6 NULL (foreign_rc_owner_id=NULL) and updating the InnoDB system columns DB_TRX_ID and DB_ROLL_PTR.  I was unable to reproduce this in MySQL/InnoDB 5.1.  I will try to repeat under 5.0 tomorrow.

This bug seems to be affect the following case:

* A table is in ROW_FORMAT=COMPACT (the default since MySQL/InnoDB 5.0.3).
* A column that is NULL is being updated to NULL. If InnoDB gets an update
vector for this no-op, it will misinterpret the record as if it were in ROW_FORMAT=REDUNDANT (the original InnoDB format).

It is possible that the database got corrupted because of this bug. Depending on the contents of the containing B-tree page of the affected records, InnoDB could be write to a random location (possibly on another page), or this attempted write could cause a segmentation fault.

The fix could be this simple:

--- innobase/include/rem0rec.ic	2007-03-22 20:40:52 +0000
+++ innobase/include/rem0rec.ic	2009-01-19 20:33:54 +0000
@@ -1227,9 +1227,11 @@
 	ut_ad(rec);
 	ut_ad(rec_offs_validate(rec, NULL, offsets));
 
-	if (len == UNIV_SQL_NULL) {
-		ut_ad(!rec_offs_comp(offsets));
-		rec_set_nth_field_sql_null(rec, n);
+	if (UNIV_UNLIKELY(len == UNIV_SQL_NULL)) {
+		if (!rec_offs_nth_sql_null(offsets, n)) {
+			ut_a(!rec_offs_comp(offsets));
+			rec_set_nth_field_sql_null(rec, n);
+		}
 
 		return;
 	}

Thomas, can you please try this patch out? Could you also run CHECK TABLE to see if this bug caused any corruption? CHECK TABLE won't detect all cases of corruption, though.

As far as I understand, this bug was triggered when a transaction that updated a column from NULL to NULL was being rolled back.

I was unable to repeat the bug in MySQL 5.0.77 (bzr) either. Presumably MySQL is usually clever enough not to call ha_innobase::update_row in most cases when an update does not actually change the record.

I've applied your fix (had to relocate the patch a few lines for 5.0.67) and the recovery now runs through without a crash - very cool, thanks!

CHECK TABLE on the customer table also does not bring up anything:

mysql> CHECK TABLE customer;
+--------------+-------+----------+----------+
| Table        | Op    | Msg_type | Msg_text |
+--------------+-------+----------+----------+
| od2.customer | check | status   | OK       |
+--------------+-------+----------+----------+
1 row in set (0.13 sec)

Thomas, I'm glad that my fix worked. If it is not too much work, could you  post the SQL statement whose rollback likely triggered the bug? It should be something that sets foreign_rc_owner_id=NULL without changing any other columns.

Sorry, but I don't have this SQL handy ATM. I could enable the query log, but I don't really like to do that on a production machine.

We're also using Propel on top of PDO::MySQL, so most of the time we have no real knowledge what exact SQL string is fired towards the database.

In general setting this particular field to NULL during the rollback sounds like some backend action was completed on our side which fetched and the field, but again, I'm unable to give you an exact SQL string for this.

Docs team, everyone:  I apologize, this bug was fixed in an earlier release, but the bug report was not updated properly.

The bug is fixed in:  5.0.79, 5.1.33, 6.0.11 (not yet released).

Timothy

Noted in 5.0.79, 5.1.33, 6.0.11 changelog.

In InnoDB recovery after a server crash, rollback of a transaction
that updated a column from NULL to NULL could cause another crash.

Pushed into 5.1.47 (revid:joro@sun.com-20100505145753-ivlt4hclbrjy8eye) (version source revid:vasil.dimov@oracle.com-20100331130613-8ja7n0vh36a80457) (merge vers: 5.1.46) (pib:16)

Push resulted from incorporation of InnoDB tree. No changes pertinent to this bug.
Re-closing.

Pushed into mysql-next-mr (revid:alik@sun.com-20100524190136-egaq7e8zgkwb9aqi) (version source revid:vasil.dimov@oracle.com-20100331130613-8ja7n0vh36a80457) (pib:16)

Pushed into 6.0.14-alpha (revid:alik@sun.com-20100524190941-nuudpx60if25wsvx) (version source revid:vasil.dimov@oracle.com-20100331130613-8ja7n0vh36a80457) (merge vers: 5.1.46) (pib:16)

Pushed into 5.5.5-m3 (revid:alik@sun.com-20100524185725-c8k5q7v60i5nix3t) (version source revid:vasil.dimov@oracle.com-20100331130613-8ja7n0vh36a80457) (merge vers: 5.1.46) (pib:16)

Push resulted from incorporation of InnoDB tree. No changes pertinent to this bug.
Re-closing.

Pushed into 5.5.5-m3 (revid:alik@sun.com-20100615080459-smuswd9ooeywcxuc) (version source revid:mmakela@bk-internal.mysql.com-20100415070122-1nxji8ym4mao13ao) (merge vers: 5.1.47) (pib:16)

Pushed into mysql-next-mr (revid:alik@sun.com-20100615080558-cw01bzdqr1bdmmec) (version source revid:mmakela@bk-internal.mysql.com-20100415070122-1nxji8ym4mao13ao) (pib:16)

Pushed into 5.1.47-ndb-7.0.16 (revid:martin.skold@mysql.com-20100617114014-bva0dy24yyd67697) (version source revid:vasil.dimov@oracle.com-20100331130613-8ja7n0vh36a80457) (merge vers: 5.1.46) (pib:16)

Pushed into 5.1.47-ndb-6.2.19 (revid:martin.skold@mysql.com-20100617115448-idrbic6gbki37h1c) (version source revid:vasil.dimov@oracle.com-20100331130613-8ja7n0vh36a80457) (merge vers: 5.1.46) (pib:16)

Pushed into 5.1.47-ndb-6.3.35 (revid:martin.skold@mysql.com-20100617114611-61aqbb52j752y116) (version source revid:vasil.dimov@oracle.com-20100331130613-8ja7n0vh36a80457) (merge vers: 5.1.46) (pib:16)