Bug #54986 Incorrect handling of LSN for empty pages
Submitted: 4 Jul 2010 15:56 Modified: 12 Jul 2010 11:16
Reporter: Jonas Oreland
Status: Closed
Category:MySQL Cluster: Disk Data Severity:S3 (Non-critical)
Version:mysql-5.1-telco-6.2 OS:Any
Assigned to: Jonas Oreland CPU Architecture:Any

[4 Jul 2010 15:56] Jonas Oreland
Description:
As an optimization when inserting a row into an empty
page, the page is not read but simply initialized.

This was always performed, but it should only be performed
the first time the page is used by a table/fragment.
If a page has been in use and all records have then been
released from it, we must still read it to find its LSN.

This only caused a problem if the page was flushed with an incorrect LSN
and the data node crashed before any local checkpoint was completed.
(A completed LCP would have removed the need to apply undo, so the
 incorrect LSN would have been ignored.)

The result of the incorrect LSN is that the data node could crash during
restart, and possibly also end up with incorrect data (although I haven't
managed to produce such a scenario).

How to repeat:
1) create table t1
2) insert some values into t1
3) force & wait for LCP
4) delete all rows
5) make sure pages are flushed (e.g. by reading from a different DD table)
6) do uncommitted insert
7) force an LCP that dies before it completes, but after all pages have been flushed again

After this, the data node would crash on restart.

Suggested fix:
Only reinitialize the LSN on an empty page once after the extent
has been assigned to the fragment.
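
For illustration only, the idea behind the suggested fix can be sketched as a toy model in Python. This is not the NDB code: `Extent`, `used_pages`, `get_empty_page`, and the `disk` dictionary are all hypothetical names invented for this sketch, and the real bookkeeping in the data node may look quite different. The sketch only shows the decision described above: skip the read the first time a page in an extent is used, but read the page (and so keep its LSN) if it was used before and later emptied.

```python
from dataclasses import dataclass, field


@dataclass
class Extent:
    # Hypothetical bookkeeping: page numbers handed out at least once
    # since this extent was assigned to the fragment.
    used_pages: set = field(default_factory=set)


def get_empty_page(extent, page_no, disk):
    """Return (lsn, was_read) for a page that currently holds no rows."""
    if page_no not in extent.used_pages:
        # First use in this extent: skip the read and initialize the page.
        extent.used_pages.add(page_no)
        return 0, False
    # The page was used earlier and then emptied: read it to keep its LSN.
    return disk[page_no], True


disk = {7: 0}          # toy "disk": page number -> LSN stored on the page
extent = Extent()

first = get_empty_page(extent, 7, disk)
disk[7] = 42           # rows inserted and deleted; page flushed with LSN 42
second = get_empty_page(extent, 7, disk)

print(first)   # (0, False)
print(second)  # (42, True)
```

In this model the behaviour described in the report corresponds to always taking the first branch, which would hand back LSN 0 for page 7 on the second call even though the page on disk carries LSN 42.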
[4 Jul 2010 16:13] Bugs System
A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/112835

3113 Jonas Oreland	2010-07-04
      ndb - bug#54986 - only set EMPTY_PAGE first time page is accessed in extent (to not reset LSN)
[4 Jul 2010 16:38] Jonas Oreland
Pushed to 6.2.19, 6.3.36, 7.0.17, and 7.1.6.
[12 Jul 2010 11:16] Jon Stephens
Documented bugfix in the NDB-6.2.19, 6.3.36, 7.0.17, and 7.1.6 changelogs, as follows:

        As an optimization when inserting a row into an empty page, the
        page is not read, but rather simply initialized. However, this
        optimization was performed in all cases when a row was inserted
        into an empty page, even though it should have been done only if
        it was the first time that the page had been used by a table or
        fragment. This is because, if the page had been in use, and then
        all records had been released from it, the page still needed to
        be read in order to learn its log sequence number (LSN).

        This caused problems only if the page had been flushed using an
        incorrect LSN and the data node failed before any local
        checkpoint was completed. A completed local checkpoint would
        have removed any need to apply the undo log, in which case the
        incorrect LSN would have been ignored.

        The user-visible result of the incorrect LSN was that it could
        cause the data node to fail during a restart. It was perhaps also
        possible (although not conclusively proven) that this issue
        could lead to incorrect data.

Closed.