MySQL Bugs: #72851: Fix for bug 16345265 in 5.6.11 breaks backward compatibility for InnoDB recovery

Bug #72851	Fix for bug 16345265 in 5.6.11 breaks backward compatibility for InnoDB recovery
Submitted:	3 Jun 2014 14:01	Modified:	4 Jun 2014 13:54
Reporter:	Alexey Kopytov
Status:	Verified
Category:	MySQL Server: InnoDB storage engine	Severity:	S3 (Non-critical)
Version:	5.6	OS:	Any
Assigned to:		CPU Architecture:	Any

Description:
The following fix in 5.6.11 / 5.7.1 introduced a change that can make
InnoDB crash on recovery when replaying redo logs created on earlier
versions:

"
  Bug#16345265 INFINITE PAGE SPLIT FOR A COMPRESSED PAGE, ASSERT LEVEL > 50
  
  This is regression from the code cleanup in
  rb#1761 Bug#14606334 INNODB UNABLE TO MERGE IBUF INTO PAGE
  
  page_cur_insert_rec_zip(): Check if the uncompressed page needs to be
  reorganized in order for the insert to succeed. Reorganize the page
  upfront if needed.
"

The fix changed page_cur_insert_rec_zip() to force page reorganization
even if enough space is available in the modification log on a
compressed page when an insert to the corresponding uncompressed page
would not succeed.

What the fix did not take into account is compatibility with earlier
server versions, which do not perform such an operation (i.e. 5.1, 5.5
and pre-5.6.11).

When replaying such an insert (i.e. one for which page_zip_available()
returns ‘true’ but reorg_before_insert is also ‘true’), no page
reorganization is performed, on the assumption that it has already been
covered by previous redo log entries (which is the case when replaying a
redo log created with 5.6.11+):

		if (recv_recovery_is_on()) {
			/* Insert into the uncompressed page only.
			The page reorganization or creation that we
			would attempt outside crash recovery would
			have been covered by a previous redo log record. */
		}

However, there are no corresponding records in redo logs created by
5.1/5.5/5.6.10 and earlier. This leads to inconsistencies between the
compressed and uncompressed page images, and to release/debug assertion
failures on subsequent page operations.

This is a likely reason for bug #71515.
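
The replay behaviour described above can be sketched as a toy model in C. This is not InnoDB code: the enum, the function name and the boolean parameters are hypothetical stand-ins for page_zip_available(), reorg_before_insert and recv_recovery_is_on(), and the model only covers the case where the modification log has enough space.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical outcome labels for this sketch; not InnoDB identifiers. */
enum model_insert_action {
	MODEL_INSERT_ONLY,		/* insert into the uncompressed page only */
	MODEL_REORGANIZE_THEN_INSERT	/* reorganize the page, then insert */
};

/* Toy model of the 5.6.11+ decision described in this report.
zip_available models the result of page_zip_available(),
reorg_before_insert models the flag of the same name,
in_recovery models the result of recv_recovery_is_on(). */
static enum model_insert_action
model_insert_action(bool zip_available, bool reorg_before_insert,
		    bool in_recovery)
{
	/* The sketch is limited to the case discussed here: the
	modification log has room for the record. */
	assert(zip_available);

	if (reorg_before_insert && !in_recovery) {
		/* At runtime, 5.6.11+ reorganizes the page upfront. */
		return MODEL_REORGANIZE_THEN_INSERT;
	}

	/* During recovery the reorganization is assumed to have been
	replayed from an earlier redo log record. Logs written before
	5.6.11 contain no such record. */
	return MODEL_INSERT_ONLY;
}
```

With zip_available and reorg_before_insert both true, the model returns MODEL_REORGANIZE_THEN_INSERT at runtime and MODEL_INSERT_ONLY during recovery. A 5.6.11+ log carries the reorganization as an earlier record; a log written by an older server does not, which is where the two page images can diverge.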

How to repeat:
I only have an XtraBackup test case, but the comments to bug #71515
suggest there is a repeatable test case against MySQL Enterprise
Backup.

Hello Alexey,

Thank you for the bug report.

Thanks,
Umesh

Alexey,
your observation looks correct to me.

Unfortunately, I do not think that we can fix this bug.
The redo logging for compressed tables was broken before this fix.
Because of the breakage, we got a hard-to-reproduce corruption in
redo log apply when we were testing the option innodb_log_compressed_pages=OFF.

We removed this option from the MySQL 5.6.10 GA release because of the problem.
The problem was fixed in MySQL 5.6.11 by changing the way certain compressed page operations are written to the redo log and how the redo log gets applied. I did not realize that this would affect redo log compatibility when using innodb_log_compressed_pages=ON. So, unfortunately, the fix broke our policy that there should not be incompatible file format changes after the GA release.

Side note: While working on the fix, I noticed that there is another problem with the logging of the MLOG_ZIP_PAGE_REORGANIZE record. Essentially, this record assumes that the compression algorithm and parameters remain the same across recovery; it does not include the zlib compression level. This was fixed in MySQL 5.6.12 by Oracle Bug#16267120 without introducing a redo log incompatibility.

One more thing:

Upgrading from an earlier server version should only be done after performing a slow shutdown on the old server:

SET GLOBAL innodb_fast_shutdown=0;

and then initiating a shutdown. Or invoke the old server with --innodb-fast-shutdown=0.

After a shutdown, the redo log would be empty and the redo log apply would be skipped. After a slow shutdown, the system is even cleaner: there must be no incomplete transactions, and no pending change buffer merge.

You cannot really expect a 5.x server to start up after a crash from a 5.(x-1) server.

InnoDB could make the life of backup tools easier by adding some format version identifier to the InnoDB redo log header, to prevent problems like this in the future. It would also prevent the problem of getting a nasty error message when an unknown redo log record type is encountered.
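
A minimal sketch in C of what such a check might look like, assuming a 4-byte format version stored at the very start of the log header. The field position, the constant SKETCH_LOG_FORMAT_CURRENT and both function names are hypothetical and only illustrate the idea; the big-endian byte order matches how InnoDB stores integers on disk.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical format version understood by this reader. */
#define SKETCH_LOG_FORMAT_CURRENT 2u

/* Read a 4-byte big-endian unsigned integer. */
static uint32_t
sketch_read_be32(const unsigned char *p)
{
	return ((uint32_t) p[0] << 24)
		| ((uint32_t) p[1] << 16)
		| ((uint32_t) p[2] << 8)
		| (uint32_t) p[3];
}

/* Return true if the log header (assumed to begin with the format
version field) was written in the format this reader can replay. */
static bool
sketch_log_format_is_supported(const unsigned char *header)
{
	return sketch_read_be32(header) == SKETCH_LOG_FORMAT_CURRENT;
}
```

A server or backup tool could run a check like this before applying any redo log records and refuse to start with a clear message, instead of failing later on an unknown record type or a page inconsistency.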

Marko,

Thanks for clarifications.

I’m wondering if it is possible to fix the bug by simply ignoring
reorg_before_insert on recovery?

For 5.6.11+, if the server performs a reorg-before-insert operation
during runtime, the insert seems to be only redo-logged if the page is
successfully reorganized to accommodate the record. That is,
reorg_before_insert is always ‘false’ on recovery for 5.6.11+ data,
based on my (limited) code analysis and tests.

For earlier server versions, ignoring reorg_before_insert on recovery
could at least maintain the status quo, i.e. keep redo-logging as broken
as it is currently in those versions without introducing new issues. I’m
also not sure if the problem addressed by reorg_before_insert is
specific to innodb_log_compressed_pages=OFF. If so, it doesn’t apply to
5.1/5.5?

I have implemented this idea in XtraBackup, and it passes the XtraBackup
suite for 5.1, 5.5 and 5.6 based servers, which of course does not
guarantee correctness.
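
Roughly, the idea looks like the following sketch in C. This is not the XtraBackup patch itself: the function names are hypothetical, and the branch condition (the special path is taken when the modification log is full or reorg_before_insert is set) is my reading of page_cur_insert_rec_zip(), so treat it as an assumption.

```c
#include <stdbool.h>

/* Hypothetical helper: the value of reorg_before_insert as the insert
path would see it if recovery ignored the flag. */
static bool
sketch_effective_reorg(bool reorg_before_insert, bool in_recovery)
{
	return reorg_before_insert && !in_recovery;
}

/* Assumed branch condition: true if the insert takes the special
"reorganize or recompress" path, false if it takes the ordinary path.
zip_available models the result of page_zip_available(). */
static bool
sketch_takes_special_path(bool zip_available, bool reorg_before_insert,
			  bool in_recovery)
{
	return !zip_available
		|| sketch_effective_reorg(reorg_before_insert, in_recovery);
}
```

With this change, replaying an insert for which page_zip_available() is ‘true’ and reorg_before_insert is ‘true’ goes down the ordinary path, as it would on a pre-5.6.11 server, while runtime behaviour stays the same.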

I also understand it is an unlikely and probably undocumented (I didn’t
check) scenario for the server when a 5.x
recovery is performed on redo logs created with 5.(x-1). But apparently
people expect it to work, especially when dealing with filesystem
snapshots, see https://bugs.launchpad.net/percona-server/+bug/1295672
for example.

But for backup tools, it is obviously a much bigger problem, and I agree
having some version identifier in the InnoDB redo log header would be a
big help.

I filed Bug#78275 Implement an InnoDB redo log format version identifier
that could allow us to avoid this kind of issue in the future.

Of course, also in the future, InnoDB developers would have to remember to adjust the format version identifier whenever the redo log format is changed in some way.