Bug #53812 assert row/row0umod.c line 660 in txn rollback after crash recovery
Submitted: 19 May 2010 15:41 Modified: 15 Dec 2010 0:45
Reporter: Mikhail Izioumtchenko Email Updates:
Status: Closed Impact on me:
None 
Category:MySQL Server: InnoDB Plugin storage engine Severity:S2 (Serious)
Version:1.0.9 OS:Any
Assigned to: Marko Mäkelä CPU Architecture:Any

[19 May 2010 15:41] Mikhail Izioumtchenko
Description:
soon after crash recovery:

100518 18:42:23  InnoDB: Assertion failure in thread 1196226880 in file row/row0umod.c line 660
InnoDB: Failing assertion: entry

 #5  0x0000003c1a231d10 in abort () from /lib64/libc.so.6
 #6  0x00002aaaabca11ba in row_undo_mod_upd_exist_sec (node=0x99039c8,
     thr=0x9350178) at row/row0umod.c:680
 #7  0x00002aaaabca1674 in row_undo_mod (node=0x99039c8, thr=0x9350178)
     at row/row0umod.c:821
 #8  0x00002aaaabca1d2c in row_undo (node=0x99039c8, thr=0x9350178)
     at row/row0undo.c:321
 #9  0x00002aaaabca1e31 in row_undo_step (thr=0x9350178) at row/row0undo.c:359
 #10 0x00002aaaabc7a709 in que_thr_step (thr=0x9350178) at que/que0que.c:1248
 #11 0x00002aaaabc7a920 in que_run_threads_low (thr=0x9350178)
     at que/que0que.c:1314
 #12 0x00002aaaabc7aa37 in que_run_threads (thr=0x9350fb8) at que/que0que.c:1352
 #13 0x00002aaaabcbd080 in trx_rollback_active (trx=0x9338e58)
     at trx/trx0roll.c:479
 #14 0x00002aaaabcbd450 in trx_rollback_or_clean_recovered (all=1)

the dateset can't ever be recovered, once this assert appears it will 
appear in subsequent recovery attempts.

How to repeat:
BLOB-heavy workload, crash server, the reproducibility is not on every crash,
but it does happen.

Suggested fix:
not sure if the assert should be there. row_build_index_entry() returns NULL here:

                        entry = row_build_index_entry(node->row, node->ext,
                                                      index, heap);
                        ut_a(entry);

and the only way for it to return NULL seems legitimate, 

externally stored columns in the clustered index record are
unavailable and ext != NULL.

The same may apply to another assert like this in the same function.

the patch below replaces the asserts with successful return, not sure about its 
validity but it allows to recover the dataset:

=== modified file 'storage/innodb_plugin/row/row0umod.c'
--- storage/innodb_plugin/row/row0umod.c        2010-02-20 16:45:41 +0000
+++ storage/innodb_plugin/row/row0umod.c        2010-05-19 15:07:07 +0000
@@ -657,7 +657,12 @@
                        /* Build the newest version of the index entry */
                        entry = row_build_index_entry(node->row, node->ext,
                                                      index, heap);
-                       ut_a(entry);
+                       if (! entry) {
+                               mem_heap_free(heap);
+
+                               return(DB_SUCCESS);
+                       }
+
                        /* NOTE that if we updated the fields of a
                        delete-marked secondary index record so that
                        alphabetically they stayed the same, e.g.,
@@ -687,7 +692,11 @@
                        entry = row_build_index_entry(node->undo_row,
                                                      node->undo_ext,
                                                      index, heap);
-                       ut_a(entry);
+                       if (! entry) {
+                               mem_heap_free(heap);
+
+                               return(DB_SUCCESS);
+                       }

                        err = row_undo_mod_del_unmark_sec_and_undo_update(
                                BTR_MODIFY_LEAF, thr, index, entry);
[1 Jun 2010 7:52] Marko Mäkelä
This condition can occur when updating a PRIMARY KEY column of a ROW_FORMAT=DYNAMIC or ROW_FORMAT=COMPRESSED table when there is a secondary index defined on a column prefix of an externally stored column.

In row_upd_clust_rec_by_insert(), it would be as follows:

The first mini-transaction delete-marks the old record and writes the undo log entry.
Subsequent mini-transaction(s) insert the new record, but leave the pointers to externally stored columns zero.
Each page of externally stored column will be written by a separate mini-transaction.

If the server crashes before all externally stored columns are written, the rollback of incomplete transactions will fail to fetch the externally stored column prefix. We must ignore this failure during the rollback of incomplete transactions, but not during a normal transaction rollback.
[1 Jun 2010 10:38] Bugs System
A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/109692
[1 Jun 2010 10:38] Bugs System
A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/109693
[1 Jun 2010 10:42] Marko Mäkelä
Workaround: if you must define column prefix indexes, use a different InnoDB ROW_FORMAT format than DYNAMIC or COMPRESSED.
[1 Jun 2010 11:11] Jimmy Yang
Fix looks fine for the scenario described. Ok to Push.
[1 Jun 2010 11:17] Marko Mäkelä
For what it is worth, I tried to reproduce this by crashing the server with an assertion:

=== modified file 'storage/innodb_plugin/row/row0upd.c'
--- storage/innodb_plugin/row/row0upd.c	revid:marko.makela@oracle.com-20100601103738-upm8awahesmeh9dr
+++ storage/innodb_plugin/row/row0upd.c	2010-06-01 10:59:28 +0000
@@ -1650,6 +1650,8 @@ row_upd_clust_rec_by_insert(
 		btr_cur_mark_dtuple_inherited_extern(entry, node->update);
 	}
 
+	ut_a(ut_rnd_gen_ulint() & 7);
+
 	err = row_ins_index_entry(index, entry,
 				  node->upd_ext ? node->upd_ext->n_ext : 0,
 				  TRUE, thr);

but my test failed to trigger that statement. I guess my test scripts did not update the primary key frequently enough.
[1 Jun 2010 12:06] Bugs System
A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/109718
[1 Jun 2010 12:06] Bugs System
A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/109719
[19 Jul 2010 14:35] Bugs System
Pushed into 5.1.49 (revid:build@mysql.com-20100719143034-omcma40sblwmay3x) (version source revid:vasil.dimov@oracle.com-20100704071244-3lo4okzels3kvy1p) (merge vers: 5.1.49) (pib:16)
[20 Jul 2010 19:02] John Russell
Since this was an internally-generated bug during testing, it won't go in the change log.
[23 Jul 2010 12:22] Bugs System
Pushed into mysql-trunk 5.5.6-m3 (revid:alik@sun.com-20100723121820-jryu2fuw3pc53q9w) (version source revid:alik@sun.com-20100723121820-jryu2fuw3pc53q9w) (merge vers: 5.5.6-m3) (pib:18)
[23 Jul 2010 12:29] Bugs System
Pushed into mysql-next-mr (revid:alik@sun.com-20100723121929-90e9zemk3jkr2ocy) (version source revid:alik@sun.com-20100723121827-3bsh51m5sj6g4oma) (pib:18)
[14 Oct 2010 8:27] Bugs System
Pushed into mysql-5.1-telco-7.0 5.1.51-ndb-7.0.20 (revid:martin.skold@mysql.com-20101014082627-jrmy9xbfbtrebw3c) (version source revid:martin.skold@mysql.com-20101014082627-jrmy9xbfbtrebw3c) (merge vers: 5.1.51-ndb-7.0.20) (pib:21)
[14 Oct 2010 8:42] Bugs System
Pushed into mysql-5.1-telco-6.3 5.1.51-ndb-6.3.39 (revid:martin.skold@mysql.com-20101014083757-5qo48b86d69zjvzj) (version source revid:martin.skold@mysql.com-20101014083757-5qo48b86d69zjvzj) (merge vers: 5.1.51-ndb-6.3.39) (pib:21)
[14 Oct 2010 8:56] Bugs System
Pushed into mysql-5.1-telco-6.2 5.1.51-ndb-6.2.19 (revid:martin.skold@mysql.com-20101014084420-y54ecj85j5we27oa) (version source revid:martin.skold@mysql.com-20101014084420-y54ecj85j5we27oa) (merge vers: 5.1.51-ndb-6.2.19) (pib:21)