Bug #73689 Zero can be a valid InnoDB checksum, but validation will fail later
Submitted: 22 Aug 2014 15:31 Modified: 8 Oct 2014 18:10
Reporter: Jeremy Cole (Basic Quality Contributor) (OCA)
Status: Closed
Category:MySQL Server: InnoDB storage engine Severity:S2 (Serious)
Version:5.6 OS:Any
Assigned to: CPU Architecture:Any

[22 Aug 2014 15:31] Jeremy Cole
Description:
This is a regression introduced by the fix for Bug #70087 (#17335427).

Zero checksum on a non-empty page doesn't always mean corruption.

InnoDB's checksum algorithm can produce zero (0) for a valid page. However, page_zip_verify_checksum was recently modified to reject pages that have a zero checksum and any non-zero bytes in the page. This logic is incorrect: an all-zero page (with a zero checksum) is considered non-corrupted, BUT a zero checksum must also be allowed on any other page if that is the legitimate output of the checksum function.

How to repeat:
Have enough data in a production system that some page body naturally produces a checksum of 0. Observe the failure.

Suggested fix:
Correct the subtle logic of this special case: in the code path for a zero stored checksum, if a non-zero byte is found in the page, continue with the normal checksum verification instead of declaring the page corrupted.
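A minimal Ruby sketch of the two behaviours, for illustration only. The method names `rejecting_verify` and `suggested_verify` are hypothetical. A stub lambda stands in for InnoDB's real checksum function and returns 0, as the real function did for the reported page. The first method models the post-#70087 rule described above; the second models the suggested fix.

```ruby
# Hypothetical models of the zero-checksum special case; not InnoDB's code.
# `stored` is the checksum read from the page, `checksum` computes one from it.

# Rule after the fix for Bug #70087, as described in this report:
# a zero stored checksum is accepted only if every byte of the page is zero.
def rejecting_verify(page, stored, checksum)
  return page.each_byte.all?(&:zero?) if stored.zero?
  checksum.call(page) == stored
end

# Suggested fix: a non-zero byte is not proof of corruption, so fall through
# to the normal comparison, which accepts the page if its real checksum is 0.
def suggested_verify(page, stored, checksum)
  # all? stops at the first non-zero byte, so populated pages exit early.
  return true if stored.zero? && page.each_byte.all?(&:zero?)
  checksum.call(page) == stored
end

zero_checksum = ->(_page) { 0 }  # stub: pretend this page's real checksum is 0

# 8192-byte page with a non-zero page number (886543, from the log above) at offset 4.
page = "\x00".b * 4 + [886_543].pack("N") + "\x00".b * 8184

puts rejecting_verify(page, 0, zero_checksum)  # => false (wrongly flagged as corrupt)
puts suggested_verify(page, 0, zero_checksum)  # => true
```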
[22 Aug 2014 15:38] Jeremy Cole
Proposed patch

Attachment: bug_73689.diff (application/octet-stream, text), 2.24 KiB.

[22 Aug 2014 17:40] MySQL Verification Team
Jeremy,

I have gone through all the crc32 and checksum functions, and I must admit that it is pretty hard for me to determine whether 0 (zero) could be a legitimate result of a calculated page checksum. Also, having all zeros in the entire page is impossible, AFAIK, as each page has its header.

Have you ever encountered a valid page whose checksum was 0? Could you forward it to us in hex, or in whatever format you can?
[23 Aug 2014 4:07] Jeremy Cole
Sinisa,

First of all, my apologies for not providing more detail. This is a compressed 8 KiB page, using "innodb" (adler32 seeded with a 0 starting value) checksums.

Pages can actually be completely zeroed; InnoDB does not write headers (or anything at all) to pages when they are first allocated. They will contain all zeroes until the first actual use of the page. So this is normal and expected, and it needs special handling, because not all checksum functions are guaranteed to return 0 when given an input of all zero bytes. (For example, "crc32" produces 3639908756 given an 8192-byte buffer of zeroes as input, whereas the "innodb" algorithm produces 0. But I digress.)
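The "innodb" half of this claim can be checked with Ruby's zlib binding. Adler-32 seeded with 0, as the compressed-page checksum is described in this thread, returns 0 for any run of zero bytes. Both running sums start at 0, and adding a zero byte leaves them at 0. Zlib's default seed of 1 does not have this property.

This sketch does not try to reproduce the crc32 figure quoted above. InnoDB's crc32 variant is not assumed to match zlib's CRC-32.

```ruby
require "zlib"

zero_page = "\x00".b * 8192

# Seed 0: a = 0 and b = 0 initially; each zero byte adds 0 to a, then a (still 0) to b.
seeded_zero = Zlib.adler32(zero_page, 0)

# Default seed 1: a stays 1 and b grows by 1 per byte, ending at b = 8192.
seeded_one = Zlib.adler32(zero_page)

puts seeded_zero  # => 0
puts seeded_one   # => 536870913, i.e. (8192 << 16) | 1
```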

Unfortunately I can't provide the page dump output, as it contains sensitive data. However, I can say that the log prints the following messages:

2014-08-20 11:46:11 7f175f160700 InnoDB: Page dump in ascii and hex (8192 bytes):
 len 8192; hex 00000000...; asc ...
InnoDB: End of page dump
2014-08-20 11:46:11 7f175f160700 InnoDB: Compressed page type (17855); stored checksum in field1 0; calculated checksums for field1: crc32 266539467, innodb 0, none 3735928559; page LSN 6060655243799; page number (if stored to page already) 886543; space id (if stored to page already) 7741
InnoDB: Page may be an index page where index id is 9898

In order to make sure I am not completely crazy, I copied InnoDB's checksum algorithm for compressed pages from page/page0zip.cc, which amounts to the following code in Ruby:

require "zlib"

# Decode the hex page dump from the error log (two hex digits per byte).
hex_page = File.read("bad_page.hex", 16384)
page = [hex_page].pack("H*")

puts "page size = #{page.size}"

# FIL header: Offset, Previous, Next
adler = Zlib.adler32(page[4..15], 0)
puts "adler @ 1 = #{adler}"
# FIL header: Page Type
adler = Zlib.adler32(page[24..25], adler)
puts "adler @ 2 = #{adler}"
# FIL header: Space ID; remainder of page
adler = Zlib.adler32(page[34..8191], adler)
puts "adler @ 3 = #{adler}"

I saved the hex page dump printed by InnoDB in the error log to bad_page.hex and ran the program, which prints the following output:

$ ruby bad_page.rb 
page size = 8192
adler @ 1 = 236651251
adler @ 2 = 357172215
adler @ 3 = 0

So this confirms that adler32 itself, independent of InnoDB, can legitimately return 0 for the sequence of bytes present in this page.
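A synthetic counterpart can be built without real data. With a seed of 0, the two Adler-32 sums after L bytes are a = Σ x_i and b = Σ (L - i + 1)·x_i, both taken mod 65521.

Place 128 bytes of 0xFF on each side of one 0xF1 byte:
- The byte sum is a = 241 + 256·255 = 65521.
- Because the pattern is symmetric, b = 65521·(distance of the centre byte from the end).

So both sums are 0 mod 65521. The page below is hypothetical and not a structurally valid InnoDB page. It only shows that a page containing non-zero bytes can get a zero checksum under the three-range adler32 scheme from the Ruby program above.

```ruby
require "zlib"

# Same three ranges as the compressed-page "innodb" checksum program above.
def innodb_zip_checksum(page)
  adler = Zlib.adler32(page[4..15], 0)       # offset, previous, next
  adler = Zlib.adler32(page[24..25], adler)  # page type
  Zlib.adler32(page[34..-1], adler)          # space id and the rest of the page
end

# Symmetric pattern: byte sum is exactly 65521 and the position-weighted sum is
# a multiple of 65521, so adler32 seeded with 0 ends with a = 0 and b = 0.
pattern = ("\xFF".b * 128) + "\xF1".b + ("\xFF".b * 128)

page = "\x00".b * 8192
page[1000, pattern.bytesize] = pattern  # hypothetical placement inside the page body

puts pattern.bytes.sum            # => 65521
puts Zlib.adler32(pattern, 0)     # => 0
puts innodb_zip_checksum(page)    # => 0, although the page is not all zeros
```

Zero bytes before the pattern leave both sums at 0. Zero bytes after it only add a (which is 0) to b. That is why the pattern can sit anywhere in the checksummed range.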

I will nonetheless try to reproduce this with synthetic data (we need something for a test case anyway).

In the meantime, can someone from InnoDB team look at this bug and patch suggestion?
[23 Aug 2014 4:52] Davi Arnaut
> I will nonetheless try to reproduce this with synthetic data (we need something for a test case anyway).

This might help: http://nayuki.eigenstate.org/page/forcing-a-files-crc-to-any-value
[23 Aug 2014 5:16] Davi Arnaut
Expanding a little bit: the page checksum is CRC32(page header) XOR CRC32(page data). The CRC32 of the page header cannot be easily manipulated, but it is reproducible. If the CRC32 of the page data, which can be manipulated, equals the CRC32 of the header, the checksum will be 0. It could be even simpler in a unit test; the point is that the resulting CRC32 can be manipulated.
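A small Ruby illustration of the XOR argument. It uses zlib's CRC-32 as a stand-in for InnoDB's crc32 function; that is an assumption, and the exact polynomial and byte ranges InnoDB uses are not modelled here. The hypothetical `xor_checksum` combines a header CRC and a data CRC as described above. Whenever the two CRCs are equal, the result is 0, whatever the content.

```ruby
require "zlib"

# Hypothetical two-part checksum: CRC32(header) XOR CRC32(data).
def xor_checksum(header, data)
  Zlib.crc32(header) ^ Zlib.crc32(data)
end

header = [3, 0xFFFFFFFF, 0xFFFFFFFF].pack("N3")  # illustrative header bytes
data   = header.dup                               # simplest way to get equal CRCs

puts xor_checksum(header, data)          # => 0
puts xor_checksum(header, "other data")  # non-zero unless the two CRCs happen to collide
```

A real reproduction forces the data CRC to match instead of copying bytes, as in the technique linked above. The XOR step behaves the same either way.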
[23 Aug 2014 8:46] Jeremy Cole
Davi has a good point that with the crc32 checksum algorithm (as opposed to "innodb") it can be even easier to reproduce, due to the properties of XOR. The case I described was hit with the "innodb" algorithm, but that does not really matter. Either case is bad, and I believe my patch is a correct solution (just continue verification when the stored checksum is zero and the page is not empty). For a regular page, the byte-wise zero check should fail after comparing at most 4 bytes, since the "offset" field is populated with a non-zero offset (the page number), which makes the check jump to the "continue_checksum" label. So this solution is also quite efficient.
[24 Aug 2014 0:35] Davi Arnaut
How to reproduce:

$ cat t/crashme.test 
SET GLOBAL innodb_checksum_algorithm = 'CRC32';
SET GLOBAL innodb_file_per_table = ON;
CREATE TABLE t1 (a INT PRIMARY KEY, b VARBINARY(512)) ENGINE=InnoDB;
# INSERT INTO t1 VALUES (1, 'force crc32 xxxx');
# HEX('force crc32 xxxx') = '666F7263652063726333322078787878'
INSERT INTO t1 VALUES (1, X'666F72636520637263333220a0be0639');
--source include/restart_mysqld.inc
SELECT * FROM t1;

$ ./mtr crashme

Requires a debug build. Tested on 5.6.19-debug-log.

Crash log:

Version: '5.6.19-debug-log'  socket: '/home/darnaut/mysql-server/mysql-test/var/tmp/mysqld.1.sock'  port: 13000  Source distribution
InnoDB: Database page corruption on disk or a failed
InnoDB: file read of page 3.
InnoDB: You may have to recover from a backup.
InnoDB: uncompressed page, stored checksum in field1 0, calculated checksums for field1: crc32 0, innodb 2285123291, none 3735928559, stored checksum in field2 0, calculated checksums for field2: crc32 0, innodb 3141041047, none 3735928559, page LSN 0 1631382, low 4 bytes of LSN at page end 1631382, page number (if stored to page already) 3, space id (if created with >= MySQL-4.1.1 and stored already) 6
InnoDB: Page may be an update undo log page
InnoDB: Page may be an index page where index id is 22
InnoDB: (index "PRIMARY" of table "test"."t1")
[25 Aug 2014 12:37] MySQL Verification Team
Fully verified, based on Davi's and Jeremy's outputs, and also on the fact that a newly allocated InnoDB page does indeed contain only zeroes, without header data.

This is a regression bug, and will be treated as such.
[25 Aug 2014 22:45] Davi Arnaut
BTW, it might be simpler to do empty-page detection based on the page LSN field instead of the checksum: if the LSN is 0, check whether the whole page is empty; otherwise, perform the checksum verification as usual.
[8 Oct 2014 18:09] Daniel Price
revno: 6192
committer: Aditya A <aditya.a@oracle.com>
branch nick: mysql-5.6
timestamp: Wed 2014-10-08 16:43:32 +0530
message:
  Bug #19500258 ZERO CAN BE A VALID INNODB CHECKSUM, 
  	      BUT VALIDATION WILL FAIL LATER 
  
  PROBLEM
  -------
  
  Checksum of valid pages can be zero. Presently
  we treat pages with checksum value zero as 
  empty pages which is wrong, because valid 
  pages can have zero checksums.
  
  FIX
  ---
  Consider the page empty if the checksum 
  and lsn fields of the page are zero.
[8 Oct 2014 18:10] Daniel Price
Fixed as of the upcoming 5.6.22, 5.7.6 releases, and here's the changelog entry:

Pages with a checksum value of zero were incorrectly treated as empty
pages. A page should only be considered empty if its checksum value and
LSN field values are zero. 

Thank you for the bug report.
[10 Dec 2014 14:08] Laurynas Biveinis
$ bzr log -r 6199
------------------------------------------------------------
revno: 6199
committer: Aditya A <aditya.a@oracle.com>
branch nick: mysql-5.6
timestamp: Mon 2014-10-13 16:10:40 +0530
message:
  Bug #19500258 ZERO CAN BE A VALID INNODB CHECKSUM, 
  	      BUT VALIDATION WILL FAIL LATER 
  
  Post push fix and renamed the test file .
  
  [Approved by Marko #rb6837 ]
[21 Apr 2015 21:34] Justin Tolmer
There is still a problem handling checksums which are zero in 5.6.24.

Having page_zip_verify_checksum check whether the LSN is zero, as a way of knowing that the page is empty, relies on an invalid assumption. When compressed pages are flushed to disk by the page cleaner thread:

buf_flush_write_block_low
buf_flush_page
buf_flush_try_neighbors
buf_do_flush_list_batch
buf_flush_list
buf_flush_page_cleaner_thread

page_zip_verify_checksum is called prior to the LSN of the page being set:

https://github.com/mysql/mysql-server/blob/mysql-5.6.24/storage/innobase/buf/buf0flu.cc#L9...
[7 May 2015 11:31] Vasil Dimov
Hello,

page_zip_verify_checksum() contains the following logic:

if stored checksum == 0 && lsn on page == 0
  if all the bytes on the page are 0
    return page is valid
  else
    return page is corrupted
else
  verify the checksum normally by calculating a checksum over the data and comparing it with the stored one
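The same logic as a Ruby sketch. The method `verify_5622` and its arguments are hypothetical stand-ins for the page fields and functions InnoDB actually reads, and a stub lambda plays the role of a checksum that legitimately computes to 0. It shows the scenario Justin reported: a populated page whose LSN has not been written yet and whose real checksum is 0 is rejected. The same page is accepted once a non-zero LSN is present.

```ruby
# Hypothetical model of the check summarised in the pseudocode above.
def verify_5622(page, stored_checksum, lsn, checksum)
  if stored_checksum.zero? && lsn.zero?
    # On this path only an all-zero page is accepted.
    page.each_byte.all?(&:zero?)
  else
    checksum.call(page) == stored_checksum
  end
end

populated = "\x00".b * 8191 + "\x01".b   # page with some non-zero content
real_checksum_is_zero = ->(_page) { 0 }  # stub for the 1-in-2^32 case

# Flush path before the LSN is written: LSN still 0, checksum legitimately 0.
puts verify_5622(populated, 0, 0, real_checksum_is_zero)          # => false
# Same page with a non-zero LSN: falls through to the normal comparison.
puts verify_5622(populated, 0, 1_631_382, real_checksum_is_zero)  # => true
```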

Yes, in the code you mentioned, page_zip_verify_checksum() is called before the LSN is written to the page. Assuming the LSN field contains some bogus value at this point, which of the following two cases are you experiencing?

1. It is an empty page, but the LSN field (FIL_PAGE_LSN) has some bogus contents which are != 0. Thus the above condition "&& lsn on page == 0" is false, and subsequently the normal checksum verification fails. If this is the case, why does an empty page have non-zero bytes at FIL_PAGE_LSN?

or

2. It is not an empty page, but the LSN is still 0 and the stored checksum is still 0, so the above condition "stored checksum == 0 && lsn on page == 0" is true, and because not all bytes of the page are 0, the page is declared corrupted. If this is the case, why does the page have a stored checksum of 0? Is this the 1 in 2^32 chance that the checksum over some real data actually computes to 0, or is the checksum simply not yet written to the page?
[7 May 2015 11:49] Marko Mäkelä
Vasil, I don't think that it is valid to write to an all-zero page without first initializing the FIL_PAGE_TYPE to something nonzero. I assume that by "empty page" you mean all-zero page (ignoring the checksum and LSN fields).

So, case 1 should be a bug on its own, if it can occur. I do not think it should be possible.

Case 2 sounds plausible to me. What if we are creating a new page (initially all bytes are zero, including the checksum and LSN fields), and coincidentally, the new checksum for the populated page happens to be 0? Could there be a bug in our logic in this case?
[11 May 2015 21:46] Justin Tolmer
Vasil, I'm talking about your case 2. The pages are not empty, the lsn is still 0 because it has not been set yet, and the computed checksum over the data is legitimately 0. Thus, the page is incorrectly declared corrupted, when it is actually completely correct, and the server aborts with a signal 6.
[15 May 2015 17:44] Vasil Dimov
Justin, thanks for your explanation. Then this is a serious bug that needs to be fixed ASAP. There are two possible solutions:

1. Write the LSN to the page first, before checking whether the page is corrupted.

2. When checking whether the page is corrupted, if the checksum and LSN are both 0: if all bytes are 0, declare the page valid; but if some bytes on the page are != 0, continue to the normal checksum validation, which computes the checksum and, if it comes out as 0 and thus matches the stored checksum, declares the page valid.

The two solutions are not mutually exclusive.
[15 May 2015 18:45] Justin Tolmer
I went with the first of your suggested fixes. All indications so far as we continue to deploy the fix to our environment is that it is a stable fix.

diff --git a/storage/innobase/buf/buf0flu.cc b/storage/innobase/buf/buf0flu.cc
index 1fee088..900d5a4 100644
--- a/storage/innobase/buf/buf0flu.cc
+++ b/storage/innobase/buf/buf0flu.cc
@@ -911,11 +911,11 @@ buf_flush_write_block_low(
        case BUF_BLOCK_ZIP_DIRTY:
                frame = bpage->zip.data;
 
-               ut_a(page_zip_verify_checksum(frame, zip_size));
-
                mach_write_to_8(frame + FIL_PAGE_LSN,
                                bpage->newest_modification);
                memset(frame + FIL_PAGE_FILE_FLUSH_LSN, 0, 8);
+
+               ut_a(page_zip_verify_checksum(frame, zip_size));
                break;
        case BUF_BLOCK_FILE_PAGE:
                frame = bpage->zip.data;