Bug #73689 | Zero can be a valid InnoDB checksum, but validation will fail later | ||
---|---|---|---|
Submitted: | 22 Aug 2014 15:31 | Modified: | 8 Oct 2014 18:10 |
Reporter: | Jeremy Cole (Basic Quality Contributor) (OCA) | Email Updates: | |
Status: | Closed | Impact on me: | |
Category: | MySQL Server: InnoDB storage engine | Severity: | S2 (Serious) |
Version: | 5.6 | OS: | Any |
Assigned to: | | CPU Architecture: | Any |
[22 Aug 2014 15:31]
Jeremy Cole
[22 Aug 2014 15:38]
Jeremy Cole
Proposed patch
Attachment: bug_73689.diff (application/octet-stream, text), 2.24 KiB.
[22 Aug 2014 17:40]
MySQL Verification Team
Jeremy, I have gone through all the crc32 and checksum functions and I must admit that it is pretty hard for me to determine whether 0 (zero) could be a legitimate result of a calculated page checksum. Also, having all zeros in the entire page is impossible, AFAIK, as each page has its header. Have you ever encountered a valid page whose checksum was 0? Could you forward it to us in hex, or in whatever format you can?
[23 Aug 2014 4:07]
Jeremy Cole
Sinisa,

First of all, my apologies for not providing more detail. This is a compressed 8 KiB page, using "innodb" (adler32 seeded with a 0 starting value) checksums.

Pages can actually be completely zeroed; InnoDB does not write headers (or anything at all) to pages when they are first allocated. They will contain all zeroes until the first actual use of the page. So this is normal and expected, and needs to be specially handled, as not all checksum functions are guaranteed to return 0 when given an input of all zero bytes. (For example, "crc32" produces 3639908756 given an 8192-byte buffer of zeroes as input. However, the "innodb" algorithm produces 0. But I digress.)

Unfortunately I can't provide the page dump output, as it contains sensitive data. However, I can say that the log prints the following messages:

    2014-08-20 11:46:11 7f175f160700 InnoDB: Page dump in ascii and hex (8192 bytes):
     len 8192; hex 00000000...; asc ...
    InnoDB: End of page dump
    2014-08-20 11:46:11 7f175f160700 InnoDB: Compressed page type (17855); stored checksum in field1 0; calculated checksums for field1: crc32 266539467, innodb 0, none 3735928559; page LSN 6060655243799; page number (if stored to page already) 886543; space id (if stored to page already) 7741
    InnoDB: Page may be an index page where index id is 9898

In order to make sure I am not completely crazy, I copied InnoDB's checksum algorithm for compressed pages from page/page0zip.cc, which amounts to the following code in Ruby:

    require "zlib"

    hex_page = File.open("bad_page.hex").read(16384)
    page = hex_page.each_char.each_slice(2).map { |s| s.join.to_i(16).chr }.join

    puts "page size = #{page.size}"

    # FIL header: Offset, Previous, Next
    adler = Zlib::adler32(page[4..15], 0)
    puts "adler @ 1 = #{adler}"
    # FIL header: Page Type
    adler = Zlib::adler32(page[24..25], adler)
    puts "adler @ 2 = #{adler}"
    # FIL header: Space ID; remainder of page
    adler = Zlib::adler32(page[34..8191], adler)
    puts "adler @ 3 = #{adler}"

I saved the output (in hex) printed by InnoDB in the error log to bad_page.hex, and ran the program, and it prints the following output:

    $ ruby bad_page.rb
    page size = 8192
    adler @ 1 = 236651251
    adler @ 2 = 357172215
    adler @ 3 = 0

So this confirms that adler32 (regardless of situation) can return 0, legitimately, for the sequence of bytes present in this page.

I will nonetheless try to reproduce this with synthetic data (we need something for a test case anyway). In the meantime, can someone from the InnoDB team look at this bug and patch suggestion?
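For readers who want to check the all-zero case without a page dump, here is a minimal sketch that reuses the byte ranges from the script above. The helper name `innodb_zip_checksum` is made up for illustration; it is not an InnoDB function, and the sketch only models the "innodb" (adler32 seeded with 0) variant.

```ruby
require "zlib"

# Hypothetical helper: same three adler32 passes as the script above,
# seeded with 0 instead of the usual adler32 starting value of 1.
def innodb_zip_checksum(page)
  adler = Zlib.adler32(page[4..15], 0)      # FIL header: Offset, Previous, Next
  adler = Zlib.adler32(page[24..25], adler) # FIL header: Page Type
  Zlib.adler32(page[34..-1], adler)         # FIL header: Space ID; remainder of page
end

# A freshly allocated page as described above: 8192 zero bytes.
zero_page = "\0".b * 8192

puts "innodb = #{innodb_zip_checksum(zero_page)}"
puts "crc32  = #{Zlib.crc32(zero_page)}"
```

With a 0 seed both adler32 running sums stay at 0 for zero input, so the "innodb" line prints 0. That is why a stored checksum of 0 alone cannot distinguish an unused page from a used one.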
[23 Aug 2014 4:52]
Davi Arnaut
> I will nonetheless try to reproduce this with synthetic data (we need something for a test case anyway). This might help: http://nayuki.eigenstate.org/page/forcing-a-files-crc-to-any-value
[23 Aug 2014 5:16]
Davi Arnaut
Expanding a little bit, the page checksum is CRC32(page header) XOR CRC32(page data). The CRC32 of the page header cannot be easily manipulated, but it's reproducible. If the CRC32 of the page data, which can be manipulated, is the same as the CRC32 of the header, checksum will be 0. Could be even simpler if unit testing, the point is that it's possible to manipulate the resulting CRC32.
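A minimal sketch of that XOR property, assuming the simplified two-part model described above. The helper `xor_checksum` and the sample byte strings are hypothetical; the real InnoDB code operates on fixed regions of the page.

```ruby
require "zlib"

# Hypothetical model of the description above:
# checksum = CRC32(header) XOR CRC32(data).
def xor_checksum(header, data)
  Zlib.crc32(header) ^ Zlib.crc32(data)
end

header = "hypothetical header bytes"

# Unrelated data: the two CRCs differ, so the XOR is some non-trivial value.
puts xor_checksum(header, "some row data")

# Data whose CRC32 equals the header's CRC32 (here simply the same bytes):
# the two values cancel and the checksum is 0.
puts xor_checksum(header, header)
```

In a real page the data cannot equal the header, but forcing the data's CRC32 to a chosen value (as in the link above) has the same effect.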
[23 Aug 2014 8:46]
Jeremy Cole
Davi has a good point that with the crc32 algorithm (as opposed to "innodb") it can be even easier to reproduce, due to the properties of XOR. The case I described is hit with the "innodb" algorithm, but it ends up not really mattering. Either case is bad, and I believe my patch is a correct solution: just continue verification in the face of a zero checksum and a non-empty page. For a regular page, the byte-wise zero check should fail after comparing at most 4 bytes, since the "offset" field will be populated with a non-zero offset (page number), causing the check to jump to the "continue_checksum" label. So this solution is also quite efficient.
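To illustrate the efficiency argument, here is a rough sketch of an early-exit zero scan. It assumes the scan starts at the 4-byte "offset" field (byte 4 of the FIL header, matching `page[4..15]` in the Ruby script earlier in this thread) and that the page number is stored big-endian; the helper name is hypothetical and this is not the code from the patch.

```ruby
# Hypothetical helper: count how many bytes are compared before the
# first non-zero byte is found, starting at the "offset" field.
# Returns nil when every byte from `start` onward is zero.
def bytes_compared_until_nonzero(page, start = 4)
  (start...page.bytesize).each do |i|
    return i - start + 1 if page.getbyte(i) != 0
  end
  nil
end

page = "\0".b * 8192
page[4, 4] = [886_543].pack("N") # page number from the log above, big-endian

puts bytes_compared_until_nonzero(page)               # 2
puts bytes_compared_until_nonzero("\0".b * 8192).inspect # nil
```

Only a genuinely empty page pays for the full scan; any page with a non-zero page number exits within the first four bytes.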
[24 Aug 2014 0:35]
Davi Arnaut
How to reproduce:

    $ cat t/crashme.test
    SET GLOBAL innodb_checksum_algorithm = 'CRC32';
    SET GLOBAL innodb_file_per_table = ON;
    CREATE TABLE t1 (a INT PRIMARY KEY, b VARBINARY(512)) ENGINE=InnoDB;
    # INSERT INTO t1 VALUES (1, 'force crc32 xxxx');
    # HEX('force crc32 xxxx') = '666F7263652063726333322078787878'
    INSERT INTO t1 VALUES (1, X'666F72636520637263333220a0be0639');
    --source include/restart_mysqld.inc
    SELECT * FROM t1;

    $ ./mtr crashme

Requires a debug build. Tested on 5.6.19-debug-log.

Crash log:

    Version: '5.6.19-debug-log' socket: '/home/darnaut/mysql-server/mysql-test/var/tmp/mysqld.1.sock' port: 13000 Source distribution
    InnoDB: Database page corruption on disk or a failed
    InnoDB: file read of page 3.
    InnoDB: You may have to recover from a backup.
    InnoDB: uncompressed page, stored checksum in field1 0, calculated checksums for field1: crc32 0, innodb 2285123291, none 3735928559, stored checksum in field2 0, calculated checksums for field2: crc32 0, innodb 3141041047, none 3735928559, page LSN 0 1631382, low 4 bytes of LSN at page end 1631382, page number (if stored to page already) 3, space id (if created with >= MySQL-4.1.1 and stored already) 6
    InnoDB: Page may be an update undo log page
    InnoDB: Page may be an index page where index id is 22
    InnoDB: (index "PRIMARY" of table "test"."t1")
[25 Aug 2014 12:37]
MySQL Verification Team
Fully verified, based on Davi's and Jeremy's outputs, and also on the fact that a newly allocated InnoDB page does indeed contain only zeroes, without header data. This is a regression bug, and will be treated as such.
[25 Aug 2014 22:45]
Davi Arnaut
BTW, it might be simpler to do empty page detection based on the page LSN field instead of checksums. If the LSN is 0, then check whether the whole page is empty. Otherwise, perform the checksum as usual.
[8 Oct 2014 18:09]
Daniel Price
    revno: 6192
    committer: Aditya A <aditya.a@oracle.com>
    branch nick: mysql-5.6
    timestamp: Wed 2014-10-08 16:43:32 +0530
    message:
      Bug #19500258 ZERO CAN BE A VALID INNODB CHECKSUM, BUT VALIDATION WILL FAIL LATER

      PROBLEM
      -------
      Checksum of valid pages can be zero. Presently we treat pages with checksum value zero as empty pages which is wrong, because valid pages can have zero check sums.

      FIX
      ---
      Consider the page empty if the checksum and lsn fields of the page is zero.
[8 Oct 2014 18:10]
Daniel Price
Fixed as of the upcoming 5.6.22, 5.7.6 releases, and here's the changelog entry: Pages with a checksum value of zero were incorrectly treated as empty pages. A page should only be considered empty if its checksum value and LSN field values are zero. Thank you for the bug report.
[10 Dec 2014 14:08]
Laurynas Biveinis
    $ bzr log -r 6199
    ------------------------------------------------------------
    revno: 6199
    committer: Aditya A <aditya.a@oracle.com>
    branch nick: mysql-5.6
    timestamp: Mon 2014-10-13 16:10:40 +0530
    message:
      Bug #19500258 ZERO CAN BE A VALID INNODB CHECKSUM, BUT VALIDATION WILL FAIL LATER

      Post push fix and renamed the test file .

      [Approved by Marko #rb6837 ]
[21 Apr 2015 21:34]
Justin Tolmer
There is still a problem handling checksums which are zero in 5.6.24. Having page_zip_verify_checksum check if the LSN is zero as a method of knowing that the page is empty is not a valid assumption. When flushing compressed pages to disk on the page cleaner thread:

    buf_flush_write_block_low
    buf_flush_page
    buf_flush_try_neighbors
    buf_do_flush_list_batch
    buf_flush_list
    buf_flush_page_cleaner_thread

page_zip_verify_checksum is called prior to the LSN of the page being set: https://github.com/mysql/mysql-server/blob/mysql-5.6.24/storage/innobase/buf/buf0flu.cc#L9...
[7 May 2015 11:31]
Vasil Dimov
Hello,

page_zip_verify_checksum() contains the following logic:

    if stored checksum == 0 && lsn on page == 0
        if all the bytes on the page are 0
            return page is valid
        else
            return page is corrupted
    else
        verify the checksum normally by calculating a checksum over the data and comparing it with the stored one

Yes, in the code you mentioned page_zip_verify_checksum() is called before writing the LSN to the page. Assuming it contains some bogus value at this point, which one of the two are you experiencing:

1. It is an empty page, but the LSN value (FIL_PAGE_LSN) has some bogus contents which is != 0. Thus the above condition "&& lsn on page == 0" is false and subsequently the normal checksum verification fails. If this is the case, then why does an empty page have some bytes at FIL_PAGE_LSN != 0?

or

2. It is not an empty page, but the LSN is still 0 and the stored checksum is still 0, thus the above "stored checksum == 0 && lsn on page == 0" is true, and because not all bytes of the page are 0 the page is declared corrupted. If this is the case, then why does the page have a stored checksum of 0? Is this the 1 / 2^32 chance that the checksum over some real data actually computes to a value of 0, or is it that the checksum is still not written to the page?
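The logic described above can be modelled in a few lines of Ruby to make case 2 concrete. This is only a sketch: `verify_as_described` is a hypothetical name, the checksum function is passed in as a lambda, and the `zero_calc` stub simply stands in for "real data whose checksum legitimately computes to 0". The LSN offset of 16 is an assumption consistent with the ranges skipped by the Ruby script earlier in this thread.

```ruby
# Hypothetical model of the described logic. `calc` is a caller-supplied
# checksum function; the stored checksum sits at bytes 0..3 and the LSN
# is assumed to sit at bytes 16..23, both big-endian.
def verify_as_described(page, calc)
  stored = page[0, 4].unpack1("N")
  lsn    = page[16, 8].unpack1("Q>")
  if stored == 0 && lsn == 0
    # Empty-page shortcut: all zero bytes means valid, anything else corrupted.
    return page.each_byte.all?(&:zero?)
  end
  calc.call(page) == stored
end

# Stub standing in for a checksum that legitimately computes to 0.
zero_calc = ->(_page) { 0 }

# A non-empty page whose checksum and LSN fields are both still 0.
page = "\0".b * 8192
page[24, 2] = [17855].pack("n") # page type taken from the log earlier in the thread

puts verify_as_described(page, zero_calc)            # false: declared corrupted
puts verify_as_described("\0".b * 8192, zero_calc)   # true: empty page
```

The first call never reaches the checksum comparison, even though the stored and computed checksums would have matched.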
[7 May 2015 11:49]
Marko Mäkelä
Vasil, I don't think that it is valid to write to an all-zero page without first initializing the FIL_PAGE_TYPE to something nonzero. I assume that by "empty page" you mean all-zero page (ignoring the checksum and LSN fields). So, case 1 should be a bug on its own, if it can occur. I do not think it should be possible. Case 2 sounds plausible to me. What if we are creating a new page (initially all bytes are zero, including the checksum and LSN fields), and coincidentally, the new checksum for the populated page happens to be 0? Could there be a bug in our logic in this case?
[11 May 2015 21:46]
Justin Tolmer
Vasil, I'm talking about your case 2. The pages are not empty, the lsn is still 0 because it has not been set yet, and the computed checksum over the data is legitimately 0. Thus, the page is incorrectly declared corrupted, when it is actually completely correct, and the server aborts with a signal 6.
[15 May 2015 17:44]
Vasil Dimov
Justin, thanks for your explanation. Then this is a serious bug that needs to be fixed ASAP. There are two possible solutions:

1. Write the LSN to the page first, before checking whether it is corrupted.

2. When checking whether the page is corrupted, if the checksum and LSN are 0 and all bytes are 0, declare the page valid. If some bytes on the page are != 0, still continue to the normal checksum validation mechanism, which will compute the checksum and, if it ends up as 0 and thus matches the stored checksum, declare the page valid.

The two solutions are not mutually exclusive.
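A minimal sketch of the second solution, under the same assumptions as before: the function name is hypothetical, the checksum function is injected as a lambda, and the LSN is assumed to live at bytes 16..23 of the page. It is an illustration of the idea, not the InnoDB implementation.

```ruby
# Hypothetical model of solution 2: the empty-page shortcut may only
# declare a page valid; a non-empty page always falls through to the
# normal checksum comparison.
def verify_with_fallthrough(page, calc)
  stored = page[0, 4].unpack1("N")
  lsn    = page[16, 8].unpack1("Q>")
  return true if stored == 0 && lsn == 0 && page.each_byte.all?(&:zero?)
  calc.call(page) == stored
end

# Non-empty page with checksum and LSN fields both 0.
page = "\0".b * 8192
page[24, 2] = [17855].pack("n")

zero_calc    = ->(_page) { 0 }  # stub: data that legitimately checksums to 0
nonzero_calc = ->(_page) { 5 }  # stub: data whose checksum does not match

puts verify_with_fallthrough(page, zero_calc)     # true: checksums match
puts verify_with_fallthrough(page, nonzero_calc)  # false: real mismatch
```

The cost over the earlier logic is one extra checksum computation, and only for non-empty pages whose checksum and LSN fields are both 0.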
[15 May 2015 18:45]
Justin Tolmer
I went with the first of your suggested fixes. All indications so far, as we continue to deploy the fix to our environment, are that it is a stable fix.

    diff --git a/storage/innobase/buf/buf0flu.cc b/storage/innobase/buf/buf0flu.cc
    index 1fee088..900d5a4 100644
    --- a/storage/innobase/buf/buf0flu.cc
    +++ b/storage/innobase/buf/buf0flu.cc
    @@ -911,11 +911,11 @@ buf_flush_write_block_low(
     	case BUF_BLOCK_ZIP_DIRTY:
     		frame = bpage->zip.data;
    -		ut_a(page_zip_verify_checksum(frame, zip_size));
    -
     		mach_write_to_8(frame + FIL_PAGE_LSN,
     				bpage->newest_modification);
     		memset(frame + FIL_PAGE_FILE_FLUSH_LSN, 0, 8);
    +
    +		ut_a(page_zip_verify_checksum(frame, zip_size));
     		break;
     	case BUF_BLOCK_FILE_PAGE:
     		frame = bpage->zip.data;