MySQL Bugs: #66819: InnoDB change buffering is not entirely crash-safe

Bug #66819	InnoDB change buffering is not entirely crash-safe
Submitted:	14 Sep 2012 13:37	Modified:	16 Oct 2012 23:33
Reporter:	Alexey Kopytov	Email Updates:
Status:	Closed	Impact on me:	None
Category:	MySQL Server: InnoDB storage engine	Severity:	S3 (Non-critical)
Version:	5.1, 5.5, 5.6	OS:	Any
Assigned to:		CPU Architecture:	Any

Description:
InnoDB change buffer modifications are not entirely crash-safe. If a
crash happens at a specific point when a change buffer entry is being
merged into the page, the transaction log and the change buffer are
left in an inconsistent state, so InnoDB will crash after a restart
after reading the corresponding secondary index page.

Consider the following scenario:

1. An index page is read into the buffer pool and a change buffer
   merge is performed by first inserting/delete-marking/purging the
   previously buffered changes to it in
   ibuf_merge_or_delete_for_page().
2. The index page update is mtr-logged.
3. ibuf_merge_or_delete_for_page() then tries to delete the change
   buffer entry by calling ibuf_delete_rec().
4. If ibuf_delete_rec() has to resort to a pessimistic delete from
   the tree, it commits mtr where the index page update was logged
   previously by calling ibuf_btr_pcur_commit_specify_mtr().
5. The redo log buffer is flushed to disk by another thread, thus
   redo-logging the index page update.
6. A crash happens before the next log buffer flush (it doesn't
   matter if the change buffer deletion has been completed by that
   time or not).

As a result, the index page update will be replayed on recovery after
restart. But the corresponding change buffer update(s) will be lost,
so it will still contain the operation that has already been
merged. When the index page is accessed, an assertion failure will
happen when attempting to merge an already merged operation.

This is most likely the underlying cause for bug #58571, bug #61104
(which was fixed by avoiding the assertion rather than preventing
the corruption) and bug #65443.

We are able to reproduce this with both 5.5 and 5.6. We have not
tested other versions, but from code analysis it looks like the
problem has been there since day one.

How to repeat:
Crash InnoDB in the right moment during a pessimistic delete from the change buffer.

In fact for an IBUF_OP_DELETE operation ibuf_merge_or_delete_for_page() commits mtr even before calling ibuf_delete_rec():

				/* Because ibuf_delete() will latch an
				insert buffer bitmap page, commit mtr
				before latching any further pages.
				Store and restore the cursor position. */
				ut_ad(rec == btr_pcur_get_rec(&pcur));
				ut_ad(page_rec_is_user_rec(rec));
				ut_ad(ibuf_rec_get_page_no(&mtr, rec)
				      == page_no);
				ut_ad(ibuf_rec_get_space(&mtr, rec) == space);

				btr_pcur_store_position(&pcur, &mtr);
				ibuf_btr_pcur_commit_specify_mtr(&pcur, &mtr);

So in this case ibuf_delete_rec() does not even have to resort to pessimistic removal to trigger the problem.

Thank you for the observation!

It seems to me that MySQL versions before 5.5 should be safe, because they only buffer inserts. Buffered inserts can be applied in an arbitrary order for a given page, and the merge operation is idempotent, because we do not check for uniqueness violations. If the record is already in the index page, we simply discard the buffered record.

Delete-buffering, which was introduced in 5.5, changed this. We must apply the operations in the original order, and we must ensure that the page does not become empty as a result of purging records from the page. This emptiness check can obviously be broken if the server is killed after applying some buffered operations.

There could be an easy fix that involves flagging the processed change buffer records in the same mini-transaction that changed the secondary index.

Marko,

>It seems to me that MySQL versions before 5.5 should be safe, because they only buffer inserts. Buffered inserts can be applied in an arbitrary order for a given page, and the merge operation is idempotent, because we do not check for uniqueness violations. If the record is already in the index page, we simply discard the buffered record.

I'm not sure, there's the same debug assertion in ibuf_insert_to_index_page() in 5.0 and 5.1:

		ut_ad(rec_get_deleted_flag(rec, page_is_comp(page)));

I.e. if the record to be merged is already on the page, we assert it must have been deleted previously.

And it's the same assertion which fails in debug 5.5 and 5.6 builds in our tests.

Anyway, even if versions before 5.5 discard duplicate buffered inserts in release builds, the main problem here is corruption rather than a failing assertion triggered by that corruption. And it looks like versions before 5.5 are vulnerable to this corruption as well. Most of the similar bug reports mention a recent upgrade from 5.1 to 5.5. Indeed, the corruption might be hidden in 5.1 due to duplicate buffered records being discarded, and revealed after an upgrade to 5.5.

I closed Bug#65443 as a duplicate of this one. This bug should also cover most (if not all) cases of Bug#58751. This may be the root cause of Bug#61104.

Thanks Alexey, I will try to see if I can repeat the failure without delete-buffering. If I can, that should explain the 5.1 failure that was reported in Bug#58751.

Alexey, you are right. When fixing Bug#56680, I added the delete-mark debug assertion for the insert merge. I was able to trigger that assertion failure with a modified test, on an instrumented server.

Sorry, I meant that Bug#58571 is a duplicate of this one, not Bug#58751.

Added to changelog for 5.1.67, 5.5.29, 5.6.8, 5.7.0: 

If the server crashed at the moment specific point when a change
buffer entry was being merged into a buffer pool page, the
transaction log and the change buffer were left in an inconsistent
state. After a restart, MySQL could crash after reading the
corresponding secondary index page. The problem was more likely to
occur in MySQL 5.5 or later, where the original insert buffering
mechanism was generalized to cover other operations.

Based on code review I wonder how this fix takes into account the problem explained in the following comment:

"[15 Sep 2012 10:59] Alexey Kopytov"

From what I see, the code is not changed in this part.

Valeriy, you are right. Sorry, my bad.

When we are merging a buffered purge operation (IBUF_OP_DELETE), we must commit the mini-transaction before we can actually delete the just-applied IBUF_OP_DELETE record.

Before committing the mini-transaction, we should flag the IBUF_OP_DELETE record processed. This remaining bug affects purge buffering only. The original fix fixed problems in insert buffering and delete-mark buffering.

5.5$ bzr log -r 4177
------------------------------------------------------------
revno: 4177
committer: Satya Bodapati <satya.bodapati@oracle.com>
branch nick: mysql-5.5
timestamp: Mon 2013-01-28 20:13:44 +0530
message:
  Bug#16183892 - INNODB PURGE BUFFERING IS NOT CRASH-SAFE
  
  With innodb_change_buffering enabled, Innodb buffers
  all modifications to secondary index leaf pages when
  the leaf pages are not in buffer pool.
  
  Crash InnoDB while an IBUF_OP_DELETE is being applied.
  Restart and note that the same record can be applied
  again which may lead to crash.
  
  Mark the change buffer record processed, so that it will
  not be merged again in case the server crashes between
  the following mtr_commit() and the subsequent mtr_commit()
  of deleting the change buffer record.
  
  Testcase: No testcase because it is difficult to get the
  timing right with the two asyncronous task purge and change
  buffering
  
  Approved by Marko. rb#1893

Our DBs crash very often recently
Error like 

The error log shows similar error:
130415 13:05:30  InnoDB: Assertion failure in thread 139661618816768 in file ibuf0ibuf.c line 4124
InnoDB: Failing assertion: page_get_n_recs(page) > 1
InnoDB: We intentionally generate a memory trap.
InnoDB: Submit a detailed bug report to http://bugs.mysql.com.
InnoDB: If you get repeated assertion failures or crashes, even
InnoDB: immediately after the mysqld startup, there may be
InnoDB: corruption in the InnoDB tablespace. Please refer to
InnoDB: http://dev.mysql.com/doc/refman/5.5/en/forcing-innodb-recovery.html
InnoDB: about forcing recovery.

20:05:30 UTC - mysqld got signal 6 ;

We are using 5.5.20, change_buffering = all. Is there any patch we can use to prevent the crash?

Thanks!

Frank Zhou, Bug#61104 was fixed in 5.5.21. It just ignores the error, which might have in the first place been caused by the root cause of Bug#69979.