MySQL Bugs: #57650: LCP can crash data-node if getting transient errors

Bug #57650: LCP can crash data-node if getting transient errors
Submitted: 22 Oct 2010 5:38
Modified: 4 Nov 2010 14:25
Reporter: Jonathon Coombes
Status: Closed
Category: MySQL Cluster: Cluster (NDB) storage engine
Severity: S2 (Serious)
Version: mysql-5.1-telco-7.0
OS: Linux
Assigned to: Jonas Oreland
CPU Architecture: Any
Tags: 7.0.13, cluster, ndbfs

Description:
2010-10-21 15:44:22 [ndbd] INFO     -- Unable to store fragment during LCP. NDBFS Error: 1217
2010-10-21 15:44:22 [ndbd] INFO     -- DBLQH (Line: 13001) 0x0000000a
2010-10-21 15:44:22 [ndbd] INFO     -- Error handler shutting down system
2010-10-21 15:44:22 [ndbd] INFO     -- Error handler shutdown completed - exiting
2010-10-21 15:44:27 [ndbd] ALERT    -- Node 14: Forced node shutdown completed. Caused by error 1217: 'No message slogan found (please report a bug if you get this error code)(Unknown). Unknown'.

How to repeat:
Not enough disk space?

Suggested fix:
Supply an appropriate message

A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/122552

3916 Jonas Oreland	2010-11-02
      ndb - bug#57650 - add retries on transient errors of backup/lcp

Pushed into mysql-5.1-telco-7.0 5.1.51-ndb-7.0.20 (revid:jonas@mysql.com-20101102145326-mqsgv1srv7ns52db) (version source revid:jonas@mysql.com-20101102145326-mqsgv1srv7ns52db) (merge vers: 5.1.51-ndb-7.0.20) (pib:21)

Pushed to 7.0.20 and 7.1.9.

DOCS: If an LCP hit a transient error (in this case 1217), it would crash
  the data node. This patch solves this by retrying the operation up to
  10 times with a 100 ms delay.
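
For illustration only, the retry behaviour described above can be sketched roughly as follows. This is a Python sketch, not the actual patch (the NDB kernel is C++): the names TransientError and retry_transient are hypothetical, and the assumption that the 10 retries come in addition to the first attempt is a guess, since the comment above does not say how the attempts are counted.

```python
import time


class TransientError(Exception):
    """Hypothetical stand-in for a transient file-system error (e.g. NDBFS 1217)."""


def retry_transient(operation, max_retries=10, delay_s=0.1, sleep=time.sleep):
    """Run operation(); on TransientError, retry up to max_retries times.

    Assumption: the first attempt is not counted as a retry, so operation()
    is called at most max_retries + 1 times. After the last retry fails,
    the error is re-raised to the caller instead of being swallowed.
    """
    retries_done = 0
    while True:
        try:
            return operation()
        except TransientError:
            if retries_done >= max_retries:
                raise  # retries exhausted: let the caller handle the failure
            retries_done += 1
            sleep(delay_s)  # fixed 100 ms pause before the next attempt


if __name__ == "__main__":
    attempts = []

    def store_fragment():
        # Hypothetical operation: fails twice, then succeeds.
        attempts.append(1)
        if len(attempts) < 3:
            raise TransientError("NDBFS error 1217")
        return "stored"

    print(retry_transient(store_fragment), "after", len(attempts), "attempts")
```

A fixed delay keeps the sketch close to the "10 times with 100 ms" description; a real implementation might instead back off or distinguish which error codes count as transient.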

Documented bugfix in the NDB-7.0.20 and 7.1.9 changelogs, as follows:

        Transient errors during a local checkpoint were not retried,
        leading to a crash of the data node. Now when such errors occur,
        they are retried up to 10 times if necessary.

Closed.

Note: a follow-up fix was made for this bug, in 7.0.30 and 7.1.19.

/Jonas