MySQL Bugs: #55284: Double free of off-page columns due to lock wait while updating PRIMARY KEY

Bug #55284	Double free of off-page columns due to lock wait while updating PRIMARY KEY
Submitted:	15 Jul 2010 13:49	Modified:	10 Feb 2011 23:36
Reporter:	Mikhail Izioumtchenko	Email Updates:
Status:	Closed	Impact on me:	None
Category:	MySQL Server: InnoDB storage engine	Severity:	S3 (Non-critical)
Version:	3.23.53,4.0,4.1,5.0,5.1.49,5.5	OS:	Any
Assigned to:	Marko Mäkelä	CPU Architecture:	Any

Description:
this is a UNIV_DEBUG_FILE_ACCESSES assert.
In txn rollback after crash recovery.

100714 11:46:29  InnoDB: Assertion failure in thread 1206450496 in file buf/buf0buf.c line 1758
InnoDB: Failing assertion: !bpage->file_page_was_freed

 #5  0x0000003acbc31d10 in abort () from /lib64/libc.so.6
 #6  0x00002aaaabbdb3b9 in buf_page_get_zip (space=7, zip_size=4096, offset=203)
     at buf/buf0buf.c:1758
 #7  0x00002aaaabbcba7b in btr_copy_zblob_prefix (d_stream=0x47e8e3e0,
     zip_size=4096, space_id=7, page_no=203, offset=12) at btr/btr0cur.c:4554
 #8  0x00002aaaabbcbde5 in btr_copy_externally_stored_field_prefix_low (
     buf=0x2aaab80d0468 "\252\252\252\252\252\252\252\252\252\252\252\252\252\252\252\252\252\252\252\252\252\252\252\252\252\252\252\252\252\252\252\252\252\252\252\252\252\252\252\252\252\252\252\252\252\252\252\252\252\252\252\252\252\252\252\252\252\252\252\252\252\252\252\252\252\252\252\252\252\252\252\252\252\252\252\252\252\252\252\252\252\252\252\252\252\252\252\252\252\252\252\252\252\252\252\252\252\252\252\252\252\252\252\252\252\252\252\252\252\252\252\252\252\252\252\252\252\252\252\252\252\252\252\252\252\252\252\252\252\252\252\252\252\252\252\252\252\252\252\252\252\252\252\252\252\252\252\252\252\252\252\252\252\252\252\252\252\252\252\252\252\252\252\252\252\252\252\252\252\252\252\252\252\252\252\252\252\252\252\252\252\252\252\252\252\252\252\252\252\252\252\252\252\252\252\252\252\252\252\252"..., len=768, zip_size=4096, space_id=7, page_no=203,
     offset=12) at btr/btr0cur.c:4692
 #9  0x00002aaaabbcbfd3 in btr_copy_externally_stored_field_prefix (
     buf=0x2aaab80d0468 "\252\252\252\252\252\252\252\252\252\252\252\252\252\252\252\252\252\252\252\252\252\252\252\252\252\252\252\252\252\252\252\252\252\252\252\252\252\252\252\252\252\252\252\252\252\252\252\252\252\252\252\252\252\252\252\252\252\252\252\252\252\252\252\252\252\252\252\252\252\252\252\252\252\252\252\252\252\252\252\252\252\252\252\252\252\252\252\252\252\252\252\252\252\252\252\252\252\252\252\252\252\252\252\252\252\252\252\252\252\252\252\252\252\252\252\252\252\252\252\252\252\252\252\252\252\252\252\252\252\252\252\252\252\252\252\252\252\252\252\252\252\252\252\252\252\252\252\252\252\252\252\252\252\252\252\252\252\252\252\252\252\252\252\252\252\252\252\252\252\252\252\252\252\252\252\252\252\252\252\252\252\252\252\252\252\252\252\252\252\252\252\252\252\252\252\252\252\252\252\252"..., len=768, zip_size=4096, data=0x2aaab80bfa7e "",
     local_len=0) at btr/btr0cur.c:4755
 #10 0x00002aaaabc846df in row_ext_cache_fill (ext=0x2aaab80d0128, i=1,
     zip_size=4096, dfield=0x2aaab80cffd8) at row/row0ext.c:65
 #11 0x00002aaaabc84837 in row_ext_create (n_ext=2, ext=0x2aaab80d00d8,
     tuple=0x2aaab80cff58, zip_size=4096, heap=0x2aaab80bf9b0)
     at row/row0ext.c:111
 #12 0x00002aaaabca4b46 in row_upd_replace (row=0x2aaab80cff58,
     ext=0x2aaab80cf108, index=0x2aaab809a378, update=0x2aaab80cf5b0,
     heap=0x2aaab80bf9b0) at row/row0upd.c:1176
 #13 0x00002aaaabca282c in row_undo_search_clust_to_pcur (node=0x2aaab80cef58)
     at row/row0undo.c:222
 #14 0x00002aaaabca23c0 in row_undo_mod (node=0x2aaab80cef58,
     thr=0x2aaab802ae48) at row/row0umod.c:833
 #15 0x00002aaaabca2b2b in row_undo (node=0x2aaab80cef58, thr=0x2aaab802ae48)
     at row/row0undo.c:326
 #16 0x00002aaaabca2c30 in row_undo_step (thr=0x2aaab802ae48)
     at row/row0undo.c:368
 #17 0x00002aaaabc7aed5 in que_thr_step (thr=0x2aaab802ae48)
     at que/que0que.c:1246
 #18 0x00002aaaabc7b0ec in que_run_threads_low (thr=0x2aaab802ae48)
     at que/que0que.c:1311
 #19 0x00002aaaabc7b203 in que_run_threads (thr=0x2aaab804d798)
     at que/que0que.c:1348
 #20 0x00002aaaabcbdf10 in trx_rollback_active (trx=0x15d44988)
     at trx/trx0roll.c:473
 #21 0x00002aaaabcbe2e0 in trx_rollback_or_clean_recovered (all=1)
     at trx/trx0roll.c:586
 #22 0x00002aaaabcbe35c in trx_rollback_or_clean_all_recovered (
     arg=0x7fff3810b93c) at trx/trx0roll.c:618

How to repeat:
stress/recovery test srcover.sh kbs 4
happened once with 5.1,
a bit more often in 5.5.
I haven't seen it until recently. This could be result of either
- recent improvements in BLOB coverage in the test workload
- a regression caused by a recent BLOB bug fix

As far as I can tell, file_page_was_freed is only assigned TRUE by buf_page_set_file_page_was_freed(). The variable is initialized to FALSE by default. The function buf_page_reset_file_page_was_freed() can also set it to FALSE. That function is called in ibuf0ibuf.c and paired with a call to buf_page_set_file_page_was_freed() within the same mtr, probably to bypass a debug assertion.

The relevant calls to buf_page_set_file_page_was_freed() are in fseg_free_page() and fseg_free_extent().

If this is a bogus assertion failure, then the reason could be that the code for handling off-page columns (BLOBs) in compressed tables somehow forgets to reset the flag.

Does this crash always occur with BLOBs and compressed tables?

Today, we hit this crash on 5.1.50 + innodb plugin:

InnoDB: unable to decompress space 11829 page 37443
101101  9:16:11  InnoDB: Assertion failure in thread 1092319552 in file buf/buf0buf.c line 2254
InnoDB: Failing assertion: success
InnoDB: We intentionally generate a memory trap.
InnoDB: Submit a detailed bug report to http://bugs.mysql.com.
InnoDB: If you get repeated assertion failures or crashes, even
InnoDB: immediately after the mysqld startup, there may be
InnoDB: corruption in the InnoDB tablespace. Please refer to
InnoDB: http://dev.mysql.com/doc/refman/5.1/en/forcing-recovery.html
InnoDB: about forcing recovery.
101101  9:16:11 - mysqld got signal 6 ;

I don't know, if this crash is similar to the bug report, but we do use some compressed tables.

This does not look like a bogus assertion.

My current hypothesis is that the block has been freed by btr_page_free_low() or btr_page_free(). It could be that btr_blob_free() is making the bug more visible for compressed tables.

Interestingly, the crash in keep.incident_c55.1.r3134.freed occurs in thread 1 (executing trx_rollback_or_clean_recovered, rolling back transactions that were active when the server crashed) while thread 2 is shutting the server down.

Error log with the backtrace

Attachment: error.log (application/octet-stream, text), 2.78 KiB.

Vojtech, this bug is about off-page columns, while your crash is a failure to decompress a B-tree page, that is, buf_zip_decompress() is returning success=FALSE to buf_page_get_gen(). Please file another bug with all the details. You omitted the interesting part: what was logged between the server was started and the "InnoDB: unable to decompress".

I have spotted two bugs in row_upd_clust_rec_by_insert(), which is invoked during an UPDATE of a PRIMARY KEY. Both bugs involve records that contain externally stored columns (off-page columns, BLOBs).

First (not mentioned in this report so far), if the UPDATE would violate a foreign key constraint, row_upd_clust_rec_by_insert() would ‘disown’ the columns from the record, even though the UPDATE was denied.

Second, between the mtr_commit() and the completion of row_ins_index_entry(), the purge thread can kick in and delete the BLOB. We should position the cursor in BTR_MODIFY_TREE mode, perform the insert and only then mark the columns disowned by the old record, all within the same mini-transaction.

I am working on a fix.

The incomplete transactions that were recovered after an InnoDB crash are a special case in that the in-memory table of explicit record locks is not recovered. Thus, purge can free the BLOBs as soon as row_upd_clust_rec_by_insert() has committed the mini-transaction.

I do not think that it would be acceptable to work around this bug by blocking the purge until after all recovered transactions have been rolled back. The rollback could take too much time.

We may have to spend effort on recovering the lock table. Implicit record locks (fresh inserts in clustered indexes) are recovered, but explicit ones are not. Can this break the recovery (COMMIT or ROLLBACK) of PREPARED transactions?

This bug should exist in all InnoDB versions.

My initial patch was wrong.
The problem is that an off-page column can (BLOB) be orphaned.

We can counter that by first inserting the new record in the update, marking the BLOBs inherited. After a successful insert, we disown the BLOBs from the old record.

Question 1: What if the insert half-succeeds, returning DB_OUT_OF_FILE_SPACE for failure to store an updated off-page column? Will the new record always be rolled back?
Answer: I believe that the transaction will be rolled back, even after crash recovery.

Question 2: What if the server crashes right before the mtr_commit that disowns the BLOBs from the old record?
Answer: The transaction will be rolled back after crash recovery if it has not been committed. The transaction commit would occur after the mtr_commit.

Question 3: Will the purge of the pre-update record remove the ‘inherited’ marks from the after-update record? Or can there be a BLOB leak after the old record has been purged and the new record has been deleted and purged?
Answer: To be determined.

I filed Bug #57980 for Question 3.

What if the transaction is rolled back after row_upd_clust_rec_by_insert() has completed? Will the BLOB ownership be restored to the old record again? If this were the case, this bug should not exist. We have to test this case as well.

It looks like the transaction rollback forgets to reclaim the ownership of the BLOBs. The rollback has to be fixed as well. Also, the purpose and the life cycle of the BTR_EXTERN_INHERITED_FLAG has to be evaluated carefully. Could it have been introduced as a workaround for this bug?

The root cause is that the removal of the off-page column (a.k.a. externally stored field a.k.a. BLOB) ownership from the original record in btr_cur_mark_extern_inherited_fields() is not written to the undo log. Thus, the BLOBs will be orphaned when the transaction is rolled back.

A straightforward fix would require a change to the undo log format, which is something that we cannot do in a generally available release. Fortunately, this is a somewhat rare use case. Updating one of the PRIMARY KEY columns is something to be avoided, because it would require a delete of the old record and an insert of the updated record in every index of the table.

If we were allowed to change the undo log format, in the undo log entry written by the btr_cur_del_mark_set_clust_rec() call in row_upd_clust_rec_by_insert() I would allocate space for representing the updates of the BLOB ownership bits. Then, in the same mini-transaction that does btr_cur_mark_extern_inherited_fields() I would update the undo log entry to indicate the BLOB ownership changes. That should be a crash-safe approach.

I have a patch that mitigates this issue by deferring the removal of the BLOB ownership from the original record until the insert of the updated record has succeeded. The fix would only help the case when the transaction has not successfully transferred any BLOB ownership in row_upd_clust_rec_by_insert() before, and row_upd_clust_rec_by_insert() fails or crashes.

The rollback does seem to restore the BLOB ownership, after all, in the following stack trace:

btr_cur_unmark_extern_fields()
btr_cur_pessimistic_update()
row_undo_mod_clust_low(BTR_MODIFY_TREE)
row_undo_mod_clust()
row_undo_mod()
row_undo()

Note that row_undo_mod_clust() attempts row_undo_mod_clust_low(BTR_MODIFY_LEAF) first, but that will always fail because of the rec_offs_any_extern() condition in btr_cur_optimistic_update().

There would be two changes to undo in a rollback of a PRIMARY KEY update of a record that contains externally stored columns:

* the insertion of the updated record
* the delete-marking of the original record

The question is if either of these can conflict with a purge.

Rolling back the insertion is straightforward, almost the same as rolling back a normal insert. The only difference is that in this case, some BLOBs in the inserted record have BTR_EXTERN_INHERITED_FLAG set, and these BLOBs will not be freed during the rollback. Because insertions are in the insert_undo log and purge operates on the update_undo log, this phase cannot possibly conflict with the purge.

The interesting part is undoing the delete-marking of the original record. This would be the above stack trace, with row_undo_mod_clust_low(BTR_MODIFY_TREE). Can purge be attempting to remove the same BLOB? It should not be, because purge should be operating on the update_undo logs of old committed transactions, not on the undo logs of active transactions.

I am running out of ideas on this one. I think that we need more crash dumps. This could be related to Bug #55283 (assert trx0roll.c undo_no + 1 == trx->undo_no in trx_rollback_or_clean_all_recovered).

A small correction: row_undo_mod_clust_low(BTR_MODIFY_LEAF) can perform the undo by invoking btr_cur_update_in_place(), which does reclaim the BLOB ownership when the delete-mark is cleared, by invoking btr_cur_unmark_extern_fields(). Thus, my suspicion on row_upd_clust_rec_by_insert() being faulty seems to be unfounded.

I removed 'recovered transactions' from the bug title, because in keep.incident_c56.1.b55284.nfree this bug occurs after the rollback of recovered transactions has completed.

With my clean-up patch applied, the crashes occur during normal rollback, even on a freshly created database, with no shutdown, restart or crash recovery step. In keep.incident_c56.6.b55284.three the crash occurs while rolling back a transaction in the SERIALIZABLE level, which performs locking reads instead of consistent reads. I had a suspicion that someone might be updating an old version of the record, but this isolation level seems to suggest that it cannot be the ca(u)se.

Note that my clean-up patch may either be buggy, or it may be correct and is just improving the reproducibility.

I have been trying to track this down, with no breakthrough so far. I am slowly getting there, adding debug printout here and there, and reading the debug log once an assertion fails.

When I change my test case to use ROW_FORMAT=DYNAMIC, the bug goes hiding. With the default ROW_FORMAT=COMPACT I can reproduce it in 5-10 minutes, or one of the clients will exit in about 20 minutes without any failure.

This bug can manifest itself in many ways. One of them is the double free. Another case is an attempt to access a freed BLOB for reading.

In one occasion, the freed BLOB page was allocated to a B-tree page that was later freed. Only after that, the double BLOB free would occur. We do have a page type check in btr_check_blob_fil_page_type() that would have caught the attempted access of a B-tree page as a BLOB, but it is intentionally disabled for the InnoDB Antelope row formats (REDUNDANT, COMPACT), because old versions of InnoDB failed to initialize the page type field. :-(

To improve reproducibility, I enabled the above mentioned check, and in btr_page_free_low() I added code to reset the file page type.

I noticed a bug in row_upd_clust_rec_by_insert() that may have been present in my cleanup patch only (I did not try to revert the clean-up). Sometimes, UPDATE blobt3 SET B=... WHERE A=... would not properly transfer the ownership of column C. After fixing that by simplifying the code further, the bug persists, and none of the assertions that I added to catch this bug are being hit.

My next plan is to implement a debug data structure that would contain (space,page_no,referring_page_no) triplets, ordered by (space,page_no) of the first BLOB page. This would allow us to assert that when freeing a BLOB chain starting at (space,page_no), there are no referring_page_no that would still want to see the BLOB.

I tried a test where at each step mysqld is killed with signal 4, not 9,
thus saving a coredump. Unfortunately in this mode no failures were observed.

I ruled out one suspect: this bug does occur even when semi-consistent reads are disabled.

The main suspect is the interplay between row_mysql_handle_errors() and row_update_for_mysql().

1. A lock wait occurs during the update of the primary key of a record that contains off-page columns is made.
2. While the lock wait is handled, the transaction is chosen as a deadlock victim.
3. row_update_for_mysql() ignores this and retries the update.
4. ???
5. Corruption

We have had a few bugs where MySQL has attempted to carry on executing after InnoDB has reported an error and rolled back the transaction. This seems to be a case where InnoDB does the same.

No caller of row_mysql_handle_errors() seems to check the error status when the function returns TRUE. In some cases, this might be acceptable, but every case has to be reviewed carefully.

When row_update_for_mysql() gets a TRUE return from row_mysql_handle_errors(), either err==DB_SUCCESS or err==DB_LOCK_WAIT. It is not OK to retry the update in the latter case.

filed Bug #58571 for the assert in my note of [26 Nov 13:48].

If I disable the retry-after-lock-wait in row_update_for_mysql(), the bug goes away. The test case succeeded in 3 out of 3 runs.

Alas, I jumped into conclusions a little too fast. I reran with more complete instrumentation that displayed the err value after row_mysql_handle_errors() returned TRUE to row_update_for_mysql(). It is always err=DB_LOCK_WAIT.

It seems that the problems occur when row_upd_clust_rec_by_insert() is retried in the case when change_ownership==TRUE on the first try. Because we do not write the ownership change to the undo log, it may be prohibitively hard to fix this without changing the file format.

We could implement the following work-around: If a lock wait occurs during row_upd_clust_rec_by_insert() on a record that contains off-page columns that were not updated (change_ownership==TRUE), roll back the current statement and return an error to the user.

I will continue investigations to see if there is a better fix in sight.

The problem appears to be a lock wait in the row_ins_index_entry() call in row_upd_clust_rec_by_insert(). I am testing a patch that attempts to restore the situation after operation is resumed and row_upd_clust_rec_by_insert() is subsequently invoked. My patch marks the externally stored fields in the to-be-inserted entry as inherited, and disowns the columns in the old record only after the row_ins_index_entry() succeeds. It looks promising. The first test run succeeded, and this situation was hit twice.

Is this related to http://bugs.mysql.com/bug.php?id=55981
Does Shane need to spend more time on Innodb + large blobs + rollback?

Mark, Bug #55981 is a duplicate of Bug #55543, which in turn is a child of this bug. The Bug #55543 fix is incomplete. Bug #55284 requires a more complete fix.

I updated the bug title. I am not sure if rollback is needed for reproducing this bug. I have seen it occur both during rollback and purge. What is definitely needed is a lock wait during the update of a primary key of a record that refers to off-page columns.

My current fix seems to be working, but there is another bug that comes up in the test case: Purge attempts to free a BLOB that is being pointed to by a committed record that has not been marked for deletion. The BLOB pointer in that record has both BTR_EXTERN_OWNER_FLAG and BTR_EXTERN_INHERITED_FLAG set. The DB_TRX_ID of that record matches the DB_TRX_ID in the node->update vector in the purge (which updates two BLOB columns and the DB_TRX_ID and DB_ROLL_PTR).

I finally created a repeatable test case for the test failure. The failure is caused by my patch.

Correction: the failure was unmasked by my patch. The function row_upd_changes_ord_field_binary() returns TRUE when the update vector affects a column prefix index, even when the indexed prefix does not change.

I filed Bug #58912 (InnoDB unnecessarily avoids update-in-place on column prefix indexes) for the further bug that was uncovered by my fix of this bug. Both bug fixes need to be pushed at the same time.

I do not think that we can fix this in the built-in InnoDB of MySQL 5.1, because fixing Bug #58912 there would require merging a large amount of code (the introduction of row_ext_t) from the InnoDB Plugin. Coincidentally, Bug #55543 (a partial fix of this bug) was only fixed in InnoDB Plugin and later.

Finally, I was able to explain a failure that came up with my diagnostics patch. It was a bug in the diagnostics. The diagnostics checks for orphan BLOB references when a BLOB is freed. Orphan references to the freed BLOB are only tolerated if the reference does not 'own' the BLOB (BTR_EXTERN_OWNER_FLAG is set) and the record has been marked for deletion.

I had made a mistake in the diagnostics that would set btr_blob_dbg_t::del=0 even for delete-marked records when a page is split. This showed up with small compressed pages, because page splits are more common when using a too small compressed page size. It was somewhat hard to trigger this failure, because it required the purge to process slowly enough. If the purge is too fast, there would not be any delete-marked records on the split page.

Now that this failure has been explained and fixed, the bug fix can finally be committed.

A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/127391

A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/127392

Pushed into mysql-trunk 5.6.2 (revid:vasil.dimov@oracle.com-20110108150732-8qygun7nuaqf9d3u) (version source revid:vasil.dimov@oracle.com-20110108150646-44c9j7ck64ocey31) (merge vers: 5.6.2) (pib:24)

Pushed into mysql-5.5 5.5.9 (revid:vasil.dimov@oracle.com-20110108150508-gpanhz48z8069qot) (version source revid:vasil.dimov@oracle.com-20110108150048-b1y9m8xe72hay0ch) (merge vers: 5.5.9) (pib:24)

Pushed into mysql-5.1 5.1.55 (revid:vasil.dimov@oracle.com-20110108145923-0tjqdpa5w3d6tuwn) (version source revid:vasil.dimov@oracle.com-20110108145923-0tjqdpa5w3d6tuwn) (merge vers: 5.1.55) (pib:24)

The fix introduced a bogus debug assertion, as noted in Bug #59230.

A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/130209

A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/130210

A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/130211

A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/130212

A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/130213

A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/130214

A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/130224

A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/130225

Bug #30405 could be a duplicate of this.

A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/130696

A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/130697

Pushed into mysql-5.1 5.1.56 (revid:vasil.dimov@oracle.com-20110208172642-ekjin6otzuy47vn9) (version source revid:vasil.dimov@oracle.com-20110208172642-ekjin6otzuy47vn9) (merge vers: 5.1.56) (pib:24)

Pushed into mysql-trunk 5.6.2 (revid:vasil.dimov@oracle.com-20110208173442-ocy58fdcuew3xvex) (version source revid:vasil.dimov@oracle.com-20110208173331-fu0j2s14jbg915zu) (merge vers: 5.6.2) (pib:24)

Pushed into mysql-5.5 5.5.10 (revid:vasil.dimov@oracle.com-20110208173046-qsmzbrw1gppahx5o) (version source revid:vasil.dimov@oracle.com-20110208172800-tls70r2ot1i0dub7) (merge vers: 5.5.10) (pib:24)

Unfortunately, I hit the same issue twice this week with MySQL-5.1.61, and MySQL failed to recovery:

InnoDB: Starting in background the rollback of uncommitted transactions
130415 12:47:29  InnoDB: Rolling back trx with id 96BAE, 1 rows to undo
130415 12:47:29 InnoDB Plugin 1.0.17 started; log sequence number 889426263
130415 12:47:29 [Note] Recovering after a crash using /home/mysql/data3046/mysql/mysql-bin
InnoDB: Dump of the tablespace extent descriptor:  len 40; hex 000000000000000200000000020600000000022e00000004aabbafbaaaefabbeebbfbabaaeabbbbf; asc                    .                    ;
InnoDB: Serious error! InnoDB is trying to free page 742
InnoDB: though it is already marked as free in the tablespace!
InnoDB: The tablespace free space info is corrupt.
InnoDB: You may need to dump your InnoDB tables and recreate the whole
InnoDB: database!

I don't know the reproduce steps, and the workaround is comment out the ut_error in fseg_free_page_low when XDES_FREE_BIT is detected set due to BTR_EXTERN_INHERITED_FLAG not handled in such case. It seems work for me.

However, I don't know the risk for such workaround. Any suggestion? Thanks ahead.

MySQL 5.1.61 was released in January 2012.
I think that we did fix some BLOB bugs in 2012 as well.

Furthermore, if your tablespace has been created and modified with an old (buggy) version of InnoDB, a BLOB page could be freed and reused for something else. InnoDB would read and deliver the overwritten contents of the BLOB page to queries. The problem would not be detected until you actually delete the row and both the original BLOB page and the reused BLOB page have been freed.

Because you are hitting the assertion on rollback, you could theoretically be hitting a bug in the 'out-of-order rollback optimization' that was removed in September 2012:

Bug#13249921 ASSERT !BPAGE->FILE_PAGE_WAS_FREED, USUALLY IN TRANSACTION ROLLBACK
This is only a debug assertion failing, catching read access to a freed BLOB page. I am not sure if this code path could actually cause a double BLOB free.

You could also have been affected by
Bug#13721257 RACE CONDITION IN UPDATES OR INSERTS OF WIDE RECORDS
which can cause the loss of BLOB data on a server crash. But I do not think that it would cause a double free of BLOB pages.

Both these fixes should be included in MySQL 5.1.67 and later versions.

If you want to play it safe, you should dump all your data with mysqldump, check the long VARCHAR, TEXT, BLOB columns for consistency, and restore the backup on a newer version of MySQL. Otherwise you cannot be sure.

Starting with MySQL 5.6, CHECK TABLE would notice if your B-tree pages are marked as free, but I do not think that we check that BLOB pages are marked as allocated.

I think this type of problem might also need Bug#14676084 and Bug#14676249 to be fixed. They were fixed in 5.1.68.

Thanks for the detail comments from Marko and Yasufumi, that really help me a lot.

I back-ported the bugfix for Bug#14676084 and Bug#14676249 based on MySQL-5.1.61,and let's try to watch a period for the problematic boxes.

Noted in 5.7.3 changelog:

"InnoDB would attempt to free BLOB pages already marked as free."

Thank you for the bug report.