Bug #61500 Rare bug in redo invalidation can lead to "Error while reading the REDO log"
Submitted: 13 Jun 2011 11:27 Modified: 14 Jun 2011 14:16
Reporter: Jonas Oreland
Status: Closed
Category: MySQL Cluster: Cluster (NDB) storage engine Severity: S3 (Non-critical)
Version: 6.3.0 OS: Any
Assigned to: Jonas Oreland CPU Architecture: Any

[13 Jun 2011 11:27] Jonas Oreland
Description:
===========
A    B    C
GCI

Suppose a live data node writes redo log from A to C.
It will fsync the redo log
- at each GCI (A)
- at the end of each file
- at the start of each megabyte
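
The three fsync triggers listed above can be sketched as a small predicate. This is only an illustration of the rule as described in this report, not NDB code: the function name, its parameters, and the use of byte offsets are all assumptions made for the example.

```python
MEGABYTE = 1024 * 1024

def needs_fsync(offset_before: int, offset_after: int,
                file_size: int, completes_gci: bool) -> bool:
    """Hypothetical helper: decide whether a write that advanced the log
    position from offset_before to offset_after should be followed by fsync."""
    if completes_gci:
        # Trigger 1: the write completes a GCI (point A in the diagram).
        return True
    if offset_after >= file_size:
        # Trigger 2: the write reached the end of the current log file.
        return True
    # Trigger 3: the write moved the position into a new megabyte.
    return offset_before // MEGABYTE != offset_after // MEGABYTE
```

Under this rule, a run of writes that follows a GCI and stays within one megabyte of one file is not fsynced until the next trigger, which is the window the scenario in this report depends on.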

Suppose that there is no end-of-file or megabyte border between
A and C.

Then (very) rarely, the OS may make C durable on disk
while B never gets written.

This scenario could lead a starting data node to assume that the end of the redo log is
somewhere between A and B. If the data node then starts, and gets stopped again
before having overwritten C, it can encounter an "Error while reading the REDO log" at the next restart.
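
The sequence of events can be replayed in a toy model. This is a sketch under stated assumptions, not how NDB stores or validates its redo log: the log is modelled as a list of slots, an empty slot marks the end of the log, and the consistency rule (GCIs must never decrease along the log) is invented for the example, as are the GCI numbers and the names `scan` and `RedoLogError`.

```python
class RedoLogError(Exception):
    """Raised when the scan meets a record that cannot belong to the log."""

def scan(disk):
    """Return the index of the first empty slot, i.e. the apparent end of log.

    Toy rule (assumption): GCIs must never decrease along the log.
    """
    last_gci = 0
    for pos, rec in enumerate(disk):
        if rec is None:
            return pos  # unwritten slot: treated as end of log
        if rec["gci"] < last_gci:
            raise RedoLogError(f"Error while reading the REDO log at slot {pos}")
        last_gci = rec["gci"]
    return len(disk)

# First stop: A (fsynced at the GCI) and C reached disk, B did not.
disk = [{"name": "A", "gci": 10}, None, {"name": "C", "gci": 11}, None]

# First restart: the apparent end of log lies between A and B.
end = scan(disk)

# The node resumes logging at the apparent end (assumed here) ...
disk[end] = {"name": "B'", "gci": 12}
# ... and is stopped again before the stale C in slot 2 is overwritten.

# Second restart: the scan runs past B' into the stale C.
try:
    scan(disk)
    outcome = "ok"
except RedoLogError as exc:
    outcome = str(exc)
```

In this model the first scan stops at the hole left by B, and the second scan fails because the leftover C no longer fits behind the newly written record.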

---

This is quite similar to http://bugs.mysql.com/bug.php?id=56961

How to repeat:
Reproduced (every now and then) by autotest on Solaris.

Suggested fix:
.
[13 Jun 2011 12:32] Jonas Oreland
Pushed to 6.3.45, 7.0.26, and 7.1.15.
[14 Jun 2011 14:16] Jon Stephens
Documented bugfix in the NDB 6.3.45, 7.0.26, and 7.1.15 changelogs, as follows:

        When global checkpoint indexes were written with no intervening
        end-of-file or megabyte border markers, this could sometimes
        lead to a situation in which the end of the redo log was
        mistakenly regarded as being between these GCIs, so that if the
        restart of a data node took place before the start of the next
        redo log was overwritten, the node encountered "Error while
        reading the REDO log".

Closed.