Bug #69477 InnoDB: Use of large externally-stored fields makes crash recovery lose data
Submitted: 14 Jun 2013 19:06 Modified: 26 May 2014 14:52
Reporter: Jeremy Cole (Basic Quality Contributor) Email Updates:
Status: Closed Impact on me:
None 
Category:MySQL Server: InnoDB storage engine Severity:S2 (Serious)
Version:All OS:Any
Assigned to: CPU Architecture:Any
Tags: BLOB, crash, external, innodb, redo
Triage: Needs Triage: D2 (Serious)

[14 Jun 2013 19:06] Jeremy Cole
Description:
When too-large blob fields are used, this is noted to the administrator in a rather innocuous looking message:

InnoDB: ERROR: the age of the last checkpoint is XXX,
InnoDB: which exceeds the log group capacity YYY.
InnoDB: If you are using big BLOB or TEXT rows, you must set the
InnoDB: combined size of log files at least 10 times bigger than the
InnoDB: largest such row.

I would have expected that this means that InnoDB is stalling in order to make more space in its redo logs. However, what it actually means is that InnoDB has overwritten its most recent checkpoint in its redo logs. This compromises crash recovery, potentially causing data loss (or even metadata loss, such as writes to data dictionary tables or system tablespace data). This is easily reproducible using the attached test case.

This appears to happen because externally-stored fields are always written in a single batch to the redo logs, all while holding the log mutex, thus making it impossible to checkpoint during that write. 

This issue exists in all versions of MySQL and MariaDB.

How to repeat:
See attached test case.

Suggested fix:
There are several possible solutions to this:

1. Allow flushing to "catch up" and checkpoint during large external field writes. This will involve releasing the log mutex during the write, which is likely complex.

2. Disallow (at least optionally) such large writes. Disallowing external field writes which sum to more than 10% of the total redo log space will in theory prevent this problem, because log_free_check() is called before the write of the external field, and (although it has some races) it should ensure that 10% of the log space is available before starting the write.
[14 Jun 2013 19:06] Jeremy Cole
Test case to demonstrate problem

Attachment: innodb_blob_unrecoverable_crash.test (application/octet-stream, text), 1.54 KiB.

[14 Jun 2013 19:56] Miguel Solorzano
Thank you for the bug report.
[26 May 2014 14:52] Daniel Price
Fixed as of 5.6.20, 5.7.5, and here's the changelog entry:

Redo log writes for large, externally stored "BLOB" fields could
overwrite the most recent checkpoint. The 5.6.20 patch limits the size of
redo log "BLOB" writes to 10% of the redo log file size. The 5.7.5 patch
addresses the bug without imposing a limitation. For MySQL 5.5, the bug
remains a known limitation. 

The limitation has also been documented in the reference manual. The changes should appear soon, with the next published documentation build.

http://dev.mysql.com/doc/refman/5.1/en/innodb-restrictions.html
http://dev.mysql.com/doc/refman/5.5/en/innodb-restrictions.html

Thank you for the bug report.
[15 Jul 2014 15:08] Daniel Price
The changelog entry for this bug has been revised:

Fixed as of the upcoming 5.6.20, 5.7.5 release, and here's the changelog entry:

Redo log writes for large, externally stored "BLOB" fields could
overwrite the most recent checkpoint. The 5.6.20 patch limits the size of
redo log "BLOB" writes to 10% of the redo log file size. The 5.7.5 patch
addresses the bug without imposing a limitation. For MySQL 5.5, the bug
remains a known limitation. 

As a result of the redo log "BLOB" write limit introduced for 
MySQL 5.6, "innodb_log_file_size" should be set to a value 
greater than 10 times the largest "BLOB" data size found
in the rows of your tables. Failing to do so could result in "Row size too
large" errors. No action is required if your "innodb_log_file_size"
setting is already 10 times the largest "BLOB" data size or your tables
contain no "BLOB" data. 

Thank you for the bug report.
[7 Aug 2014 5:10] Laurynas Biveinis
$ bzr log -n0 -r 5958
------------------------------------------------------------
revno: 5958
committer: Annamalai Gurusami <annamalai.gurusami@oracle.com>
branch nick: mysql-5.6
timestamp: Thu 2014-05-22 21:23:03 +0530
message:
  Bug #16963396 INNODB: USE OF LARGE EXTERNALLY-STORED FIELDS MAKES CRASH
  RECOVERY LOSE DATA
  
  Problem:
  
  When too-large blob fields are used, InnoDB overwrites its most recent
  checkpoint in its redo logs.
  
  Solution:
  
  Ensure that the total blob length does not exceed 10% of the redo log file
  size.
  
  rb#5399 approved by Marko, Nuno, Manish. 
  Venkat also contributed to patch (in replication related test case).
[19 Sep 2014 17:06] Daniel Price
Posted by developer:
 
Please Note:

In MySQL 5.6.22, the redo log BLOB write limit is relaxed. The changelog
entry for this bug has been revised to state the following:

Redo log writes for large, externally stored "BLOB" fields could
overwrite the most recent checkpoint. The 5.6.20 patch limits the size of
redo log "BLOB" writes to 10% of the redo log file size. The 5.7.5 patch
addresses the bug without imposing a limitation. For MySQL 5.5, the bug
remains a known limitation.

 As a result of the redo log
"BLOB" write limit introduced for MySQL 5.6, the "innodb_log_file_size"
setting should be 10 times larger than the largest "BLOB" data size found
in the rows of your tables plus the length of other variable length fields
("VARCHAR", "VARBINARY", and "TEXT" type fields). No action is required if
your "innodb_log_file_size" setting is already sufficiently large or your
tables contain no "BLOB" data.

In MySQL 5.6.22, the redo log "BLOB" write limit is relaxed to 10% of
the "total redo log size" ("innodb_log_file_size" *
"innodb_log_files_in_group").