Bug #70087 InnoDB can not use the doublewrite buffer properly
Submitted: 20 Aug 2013 2:11 Modified: 19 Dec 2013 17:50
Reporter: Nizameddin Ordulu
Status: Closed
Category:MySQL Server: InnoDB storage engine Severity:S2 (Serious)
Version:5.6 OS:Any
Assigned to: CPU Architecture:Any
Tags: doublewrite, recovery

[20 Aug 2013 2:11] Nizameddin Ordulu
Description:
During recovery, InnoDB opens the .ibd files in fil_load_single_table_tablespaces() before it scans the doublewrite buffer for torn pages. fil_load_single_table_tablespace() calls fil_validate_single_table_tablespace(), which checks whether the first page of the .ibd file is corrupt. If it is, recovery does not continue.

How to repeat:
Perform a fast shutdown of mysqld while pages are being written to the doublewrite buffer. Make sure that, for at least one table, its first page is in the doublewrite buffer. Then corrupt that first page on disk. InnoDB will now be unable to recover, even though a valid copy of the page is in the doublewrite buffer.

Suggested fix:
Skip the check of the first page when a table is opened during crash recovery.
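
To make the ordering problem concrete, here is a toy model, not InnoDB code. All names (RecoveryError, open_tablespaces, apply_doublewrite, recover) are hypothetical, and a page is treated as valid unless it is the string "TORN". It only illustrates that validating page 0 before the doublewrite buffer is consulted aborts a recovery that would otherwise succeed.

```python
# Toy model only: every name here is hypothetical and does not
# mirror InnoDB's real functions or data structures.

class RecoveryError(Exception):
    pass

def page_is_valid(page):
    # Stand-in for a checksum test: a page is valid unless marked torn.
    return page != "TORN"

def open_tablespaces(data_files, check_first_page):
    # Models opening every .ibd file, optionally validating page 0.
    for name, pages in data_files.items():
        if check_first_page and not page_is_valid(pages[0]):
            raise RecoveryError(f"{name}: page 0 is corrupt")

def apply_doublewrite(data_files, doublewrite):
    # Restore each torn page for which the buffer holds a clean copy.
    for (name, page_no), clean_copy in doublewrite.items():
        if not page_is_valid(data_files[name][page_no]):
            data_files[name][page_no] = clean_copy

def recover(data_files, doublewrite, check_first_page):
    # The files are opened first, the doublewrite buffer is used second.
    open_tablespaces(data_files, check_first_page)
    apply_doublewrite(data_files, doublewrite)
    return data_files
```

With files = {"t.ibd": ["TORN", "p1"]} and doublewrite = {("t.ibd", 0): "p0"}, calling recover(files, doublewrite, True) raises RecoveryError, while recover(files, doublewrite, False) restores page 0 from the buffer.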
[20 Aug 2013 17:58] Sinisa Milivojevic
Careful analysis of the code clearly shows that, if the conditions are met for fil_validate_single_table_tablespace() to be called during the execution of fil_load_single_table_tablespace(), the contents of the doublewrite buffer will not be used to recover the table.

This would have been a feature request, were it not for its effect on the recovery of data after a crash. As durability is a very important attribute of the InnoDB storage engine, this is therefore a bug.

I also think that the title of the bug should end with " for the recovery".
[20 Aug 2013 19:23] Marko Mäkelä
This sounds plausible.

I think that this should be repeatable with a workload like this:

CREATE TABLE t(b BLOB)ENGINE=InnoDB;
BEGIN; INSERT INTO t VALUES(REPEAT('blob ',12345)); ROLLBACK;
BEGIN; INSERT INTO t VALUES(REPEAT('blob ',12345)); ROLLBACK;
...

and killing the server during the workload in such a way that a torn write to page 0 of the *.ibd file happens. (Alternatively, you can artificially corrupt the first page and see if it can be recovered from the doublewrite buffer.)
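
A torn write can be imitated in memory, without touching a real *.ibd file. The sketch below assumes a 16 KiB page and relies on my understanding of the page layout: an 8-byte LSN at offset 16 in the header, with its low 32 bits repeated in the last 4 bytes of the page. The helper names (make_page, lsn_fields_match, torn_write) are made up for illustration, and the LSN comparison is only one of the consistency tests a real corruption check performs.

```python
import struct

PAGE_SIZE = 16384   # assumed page size
FIL_PAGE_LSN = 16   # assumed offset of the 8-byte LSN in the page header

def make_page(lsn, fill):
    """Build a toy page whose header and trailer agree on the LSN."""
    page = bytearray([fill]) * PAGE_SIZE
    struct.pack_into(">Q", page, FIL_PAGE_LSN, lsn)
    struct.pack_into(">I", page, PAGE_SIZE - 4, lsn & 0xFFFFFFFF)
    return bytes(page)

def lsn_fields_match(page):
    """Compare the low 32 bits of the header LSN with the trailer copy."""
    (header_low,) = struct.unpack_from(">I", page, FIL_PAGE_LSN + 4)
    (trailer_low,) = struct.unpack_from(">I", page, len(page) - 4)
    return header_low == trailer_low

def torn_write(old_page, new_page, bytes_written):
    """Model a crash after only bytes_written bytes of new_page hit the disk."""
    return new_page[:bytes_written] + old_page[bytes_written:]
```

For example, torn_write(make_page(100, 0xAA), make_page(200, 0xBB), 512) yields a page whose header carries the new LSN and whose trailer carries the old one, so lsn_fields_match() returns False for it.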

The first page of each InnoDB tablespace holds the allocation bitmap for pages 0 through page_size-1. The bitmap would be updated for allocating and freeing the BLOB pages in the above SQL.
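
As a worked example of that range: the sketch below assumes 16 KiB pages and, as an assumption not stated above, that a page carrying such a bitmap recurs every page_size pages. The function names are hypothetical.

```python
def descriptor_page(page_no, page_size):
    """Page number of the page whose bitmap covers page_no (assumed layout)."""
    return page_no - page_no % page_size

def covered_range(page_size):
    """Pages described by page 0: 0 through page_size - 1."""
    return range(0, page_size)

PAGE_SIZE = 16384  # assumed page size in bytes

# Bytes of the data file whose allocation state lives on page 0.
covered_bytes = len(covered_range(PAGE_SIZE)) * PAGE_SIZE
```

Under these assumptions covered_bytes is 16384 * 16384 bytes, that is 256 MiB, so for a small table every BLOB page allocation or release in the workload above touches page 0.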

Earlier this year, I noticed that some access to the first page of the *.ibd file was ignoring the page checksum. On checksum mismatch, we should really always try to recover the page from the doublewrite buffer if it is available.
[21 Aug 2013 12:08] Sinisa Milivojevic
Marko,

Thank you for confirming my verification and for the additional comments, which will help considerably in improving the recovery procedures.
[26 Aug 2013 7:13] Guangpu Feng
Is this a duplicate of Bug#69623?
[26 Aug 2013 7:23] Guangpu Feng
I want to know in which version this bug was introduced. I have checked the code of 5.5.18, which is the version we use in production, and found no function named *fil_validate_single_table_tablespace*. Does that mean that 5.5.18 is safe from this bug?
[26 Aug 2013 13:27] Sinisa Milivojevic
Hi,

No, this is not a duplicate of Bug #69623.

Also, 5.5 has its own recovery process, which also did not consult the doublewrite buffer. The code is organized differently, but it likewise failed to check what is available.
[27 Aug 2013 9:25] Guangpu Feng
Marko

I tried your method against Percona Server 5.5.18, but could not repeat the bug. The script follows (the sleep time is shorter than the execution time of the *for* loop, to make sure the kill is performed during the queries):

---------------------------------------------------------------

$ vim bug70087.sh

#!/bin/bash

mysql="mysql -uroot -S /tmp/mysql.sock"

# Recreate the test table.
$mysql -e 'DROP TABLE IF EXISTS test.t; CREATE TABLE test.t(b BLOB)ENGINE=INNODB;'

# Kill the server from a background subshell while the loop below is running.
(
sleep 5
kill -9 $(pidof mysqld)
) &

# Repeatedly allocate and free BLOB pages.
for i in {1..2000}
do
  $mysql -e "BEGIN; INSERT INTO test.t VALUES(REPEAT('blob ',12345)); ROLLBACK;"
done

---------------------------------------------------------------

Can anybody provide a test case that reliably repeats this bug?
[19 Dec 2013 17:50] Daniel Price
Fixed as of 5.6.16, 5.7.4:

"If the first page (page 0) of file-per-table tablespace data
file was corrupt, recovery would be halted even though the
doublewrite buffer contained a clean copy of the page."

Thank you for the bug report.
[3 Feb 2014 11:50] Laurynas Biveinis
5.6$ bzr log -r 5703
------------------------------------------------------------
revno: 5703
committer: Annamalai Gurusami <annamalai.gurusami@oracle.com>
branch nick: mysql-5.6
timestamp: Thu 2013-12-19 13:20:50 +0530
message:
  Bug #17335427 INNODB CAN NOT USE THE DOUBLEWRITE BUFFER PROPERLY
  
  Problem:
  
  If the first page (page 0) of the single table tablespace is corrupted in the
  data file then our recovery doesn't progress even if there is a clean copy of
  the same available in the double write buffer.  
  
  Analysis:
  
  During recovery, our first step is to process the double write buffer.  We look
  at the pages in the double write buffer and determine their (space_id, page_no)
  details.  Each page in the double write buffer corresponds to a page in
  the .ibd data file.  Using the space_id information we need to map the page in
  the double write buffer to the corresponding ibd file.  This is done by reading
  the space_id information from the first page of the single table tablespace.
  If the first page of the single table tablespace is corrupted, then we are
  unable to determine the data file to which a particular page in the double
  write buffer belongs.  So we need to explore and see if we can determine the
  space_id by other means.
  
  Solution:
  
  Assume a particular page size.  Read N number of pages from the ibd file.
  Ignore the corrupted pages and determine the (space_id, page_size and zip_size)
  information.  Repeat this for all supported page sizes.  Using this approach
  determine the correct (space_id, page_size and zip_size) of the ibd file.
  
  rb#4025 approved by Yasufumi.
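
The probing idea in the commit message can be sketched roughly as follows. This is not the committed code. It assumes the page header layout as I understand it (page number at offset 4, LSN at offset 16, space_id at offset 34), it uses a simplified validity test instead of a real checksum, it ignores compressed pages (zip_size), and the list of candidate page sizes, the majority vote, and the name guess_space_id are all assumptions made for illustration.

```python
import struct
from collections import Counter

FIL_PAGE_OFFSET = 4      # assumed offset of the page number
FIL_PAGE_LSN = 16        # assumed offset of the 8-byte LSN
FIL_PAGE_SPACE_ID = 34   # assumed offset of the space_id
CANDIDATE_PAGE_SIZES = (4096, 8192, 16384)  # assumed candidates

def looks_valid(page, page_no):
    """Simplified stand-in for a checksum test (not the real check)."""
    (stored_no,) = struct.unpack_from(">I", page, FIL_PAGE_OFFSET)
    (header_low,) = struct.unpack_from(">I", page, FIL_PAGE_LSN + 4)
    (trailer_low,) = struct.unpack_from(">I", page, len(page) - 4)
    return stored_no == page_no and header_low == trailer_low

def guess_space_id(data, max_pages=8):
    """Return (space_id, page_size) backed by the most valid pages, or None."""
    best = None
    for size in CANDIDATE_PAGE_SIZES:
        votes = Counter()
        for page_no in range(min(max_pages, len(data) // size)):
            page = data[page_no * size:(page_no + 1) * size]
            if looks_valid(page, page_no):  # corrupted pages are ignored
                (space_id,) = struct.unpack_from(">I", page, FIL_PAGE_SPACE_ID)
                votes[space_id] += 1
        if votes:
            space_id, count = votes.most_common(1)[0]
            if best is None or count > best[0]:
                best = (count, space_id, size)
    return None if best is None else (best[1], best[2])
```

Given the raw bytes of a data file whose page 0 is damaged but whose following pages are intact, guess_space_id() still finds the space_id from those pages, which is what allows a doublewrite copy of page 0 to be matched to the file.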
[3 Feb 2014 11:53] Laurynas Biveinis
5.6$ bzr log -r 5707
------------------------------------------------------------
revno: 5707
committer: Annamalai Gurusami <annamalai.gurusami@oracle.com>
branch nick: mysql-5.6
timestamp: Fri 2013-12-20 12:05:46 +0530
message:
  BUG 17335427 - INNODB CAN NOT USE THE DOUBLEWRITE BUFFER PROPERLY 
  
  Problem:
  
  Fixing a memory issue in my original fix.  This was identified from PB2
  failures.  If the page is uncompressed, then its size must be equal to
  UNIV_PAGE_SIZE.  The buf_page_is_corrupted() assumes the size of the
  uncompressed pages as equal to UNIV_PAGE_SIZE. 
  
  Solution:
  
  Call buf_page_is_corrupted() for uncompressed pages only if page size is
  equal to UNIV_PAGE_SIZE.
  
  approved by Yasufumi over IM.
[28 Mar 2014 19:23] Laurynas Biveinis
5.6$ bzr log -r 5776 -n0
------------------------------------------------------------
revno: 5776
committer: Marko Mäkelä <marko.makela@oracle.com>
branch nick: mysql-5.6
timestamp: Tue 2014-01-28 12:02:37 +0200
message:
  Bug#17335427 INNODB CAN NOT USE THE DOUBLEWRITE BUFFER PROPERLY
  
  Clean up the test a little.
[22 Aug 2014 15:40] Jeremy Cole
This fix appears to have introduced a regression for legitimate zero-checksum pages which are now seen as corrupt. I filed a bug for the regression here:

http://bugs.mysql.com/bug.php?id=73689