Bug #29354 Incorrect handling of replica REDO during SR (NR in 5.1)
Submitted: 26 Jun 2007 8:42
Modified: 4 Jul 2007 9:30
Reporter: Jonas Oreland
Status: Closed
Category: MySQL Cluster: Cluster (NDB) storage engine
Severity: S3 (Non-critical)
Version: 4.1, 5.0, 5.1
OS: Any
Assigned to: Jonas Oreland
CPU Architecture: Any

[26 Jun 2007 8:42] Jonas Oreland
Description:
DIH does not keep/update/save information on when a replica has a consistent
redo log in case of node failure during node restart (NR) before the first LCP.

As a result, on a subsequent system restart (SR) it instructs LQH to run a
local REDO log which is incomplete.
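
The failure mode described above can be illustrated with a toy model. This is only a sketch under stated assumptions, not NDB code: `Replica`, `redo_intervals`, and `can_restore_locally` are hypothetical names, the GCI numbers in the usage note are made up, and the rule "one redo interval must cover every GCI after the last LCP" is a simplification of what DIH tracks.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Replica:
    """Hypothetical, simplified bookkeeping for one fragment replica."""
    lcp_gci: int                           # GCI captured by the last completed LCP
    redo_intervals: List[Tuple[int, int]]  # (first_gci, last_gci) ranges with complete redo


def can_restore_locally(replica: Replica, restart_gci: int) -> bool:
    """Return True if the LCP plus local redo can rebuild data up to restart_gci."""
    need_from = replica.lcp_gci + 1
    if need_from > restart_gci:
        # The LCP alone already covers the restart point.
        return True
    # A single unbroken redo interval must span need_from..restart_gci.
    return any(first <= need_from and last >= restart_gci
               for first, last in replica.redo_intervals)
```

With a gap recorded, for example `Replica(100, [(90, 120), (150, 160)])`, a restart to GCI 160 is refused locally. If the bookkeeping loses the gap and records `[(90, 160)]` instead, the same check passes and an incomplete redo log would be run, which is the situation this report describes.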

How to repeat:
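#!/bin/sh

# Recreate table T1 and start loading it in the background.
ndb_drop_table T1 > /dev/null 2>&1
create_tab T1
hugoLoad -r 100000 -l 1000 T1 &
pid=$!
sleep 10

# Restart node 3 (abort, no-start) and wait until it is not-started.
ndb_mgm -e "3 restart -a -n"
ndb_waiter --node=3 --not-started
sleep 3

# Set error inserts and a dump code, then start node 3 and wait for it
# to fall back to not-started (node failure during node restart).
ndb_mgm -e "2 error 7184"
ndb_mgm -e "3 error 7008"
ndb_mgm -e "3 dump 2602 1"
ndb_mgm -e "3 start"
sleep 1
ndb_waiter --node=3 --not-started

# Stop the load, restart the whole cluster, then update every row.
kill $pid
ndb_mgm -e "all restart -a"
ndb_waiter
ndb_mgm -e "all dump 2407"
hugoScanUpdate T1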

Suggested fix:
A simple solution is to send GCP_SAVE_REF until the first LCP during NR is complete.

Hmm... what happens then if it crashes before fragment info has been written to disk but after a new GCP has been run?
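
The suggested fix can be sketched as a small state machine. This is a hedged illustration, not the committed patch: `RestartingNode`, `on_lcp_complete`, and `on_gcp_save_request` are hypothetical names, and only the signal names GCP_SAVE_CONF and GCP_SAVE_REF come from this report.

```python
from enum import Enum


class GcpSaveReply(Enum):
    CONF = "GCP_SAVE_CONF"
    REF = "GCP_SAVE_REF"


class RestartingNode:
    """Hypothetical model of a node that is going through node restart (NR)."""

    def __init__(self) -> None:
        self.in_node_restart = True
        self.completed_lcps = 0

    def on_lcp_complete(self) -> None:
        # Count completed LCPs; more than one LCP during a single NR is allowed.
        self.completed_lcps += 1

    def on_gcp_save_request(self) -> GcpSaveReply:
        # Until the first LCP of this NR is complete, refuse to confirm the
        # GCP save so the redo log is not treated as usable for this replica.
        if self.in_node_restart and self.completed_lcps == 0:
            return GcpSaveReply.REF
        return GcpSaveReply.CONF
```

In this model a freshly restarting node answers REF, and switches to CONF after the first call to `on_lcp_complete()`. The sketch does not address the open question of a crash between a new GCP and the fragment info reaching disk.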
[3 Jul 2007 6:35] Bugs System
A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/30143

ChangeSet@1.2158, 2007-07-03 08:34:35+02:00, jonas@perch.ndb.mysql.com +1 -0
  ndb - bug#29354 - Incorrect handling of replica REDO during SR (wl2325-5.0)
    Not very clever fix for DIH incorrect REDO handling
    - Dont report GCP_SAVE_CONF until first LCP has been complete during NR
[3 Jul 2007 6:40] Bugs System
A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/30145

ChangeSet@1.2313, 2007-07-03 08:39:42+02:00, jonas@perch.ndb.mysql.com +1 -0
  ndb - bug#29354 - Incorrect handling of replica REDO during SR (5.0)
    Not very clever fix for DIH incorrect REDO handling
    - Dont report GCP_SAVE_CONF until first LCP has been complete during NR
[3 Jul 2007 17:23] Jon Stephens
Documented for telco-6.1.7 release. Left in PQ status.
[3 Jul 2007 18:57] Bugs System
Pushed into 5.1.21-beta
[4 Jul 2007 9:30] Jon Stephens
Thank you for your bug report. This issue has been committed to our source repository of that product and will be incorporated into the next release.

If necessary, you can access the source repository and build the latest available version, including the bug fix. More information about accessing the source trees is available at

    http://dev.mysql.com/doc/en/installing-source.html

Documented bugfix in 5.1.21 changelog.
[4 Jul 2007 10:07] Jon Stephens
Documented bugfix for telco-6.2.4 release.
[4 Jul 2007 20:37] Bugs System
A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/30331

ChangeSet@1.2159, 2007-07-04 22:37:23+02:00, jonas@perch.ndb.mysql.com +1 -0
  ndb - bug#29354 - fix bug in bug fix,
    dont assert if 2 LCP's are being run during a node recovery
[4 Jul 2007 20:40] Bugs System
A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/30333

ChangeSet@1.2314, 2007-07-04 22:39:55+02:00, jonas@perch.ndb.mysql.com +1 -0
  ndb - bug#29354 - fix bug in bug fix,
    dont assert if 2 LCP's are being run during a node recovery
[10 Jul 2007 13:27] Bugs System
Pushed into 5.1.21-beta
[10 Jul 2007 13:28] Bugs System
Pushed into 5.0.46
[6 Sep 2007 9:39] Jon Stephens
Thank you for your bug report. This issue has already been fixed in the latest released version of that product, which you can download at

  http://www.mysql.com/downloads/

Documented corrected fix in 5.0.46/5.1.21/5.1.15-ndb-6.1.18 changelogs.