MySQL Bugs: #51512: Endless 1220 leading to GCP STOP

Bug #51512	Endless 1220 leading to GCP STOP
Submitted:	25 Feb 2010 16:12	Modified:	26 Feb 2010 16:22
Reporter:	Jonas Oreland
Status:	Closed
Category:	MySQL Cluster: Cluster (NDB) storage engine	Severity:	S3 (Non-critical)
Version:	*	OS:	Any
Assigned to:	Jonas Oreland	CPU Architecture:	Any

Description:
Cluster returns error 1220 for DML when opening/closing files in the REDO log
gets too slow.

If this happens while a GCI marker is being written to the redo log, and the
open/close state involves file 0 in the log part, then the 1220 can sometimes
become "permanent" until a GCP stop is encountered.

Basically, the problem was that the fix for Bug#20904 was not applied to all code paths.

How to repeat:
Create a cluster with little redo space and small redo files,
e.g. NoOfFragmentLogFiles=6 FragmentLogFileSize=6M,
and run a very high update load.

The problem will occur within 1h.
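
For reference, the two settings above could be placed in the cluster's config.ini roughly as follows. Putting them under [ndbd default] is an assumption (the report does not say which section was used), and every other parameter a working cluster needs is omitted.

```ini
# Sketch only: the section choice is an assumption, not taken from the report.
[ndbd default]
# Deliberately small redo log, to provoke frequent file open/close
NoOfFragmentLogFiles=6
FragmentLogFileSize=6M
```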

Suggested fix:
Make sure that when file 0 is "not" opened (because it is already open),
the file-change-problem is reset (if it has occurred) and marking
a GCP as complete is considered.

Do this by aligning the code paths so that the same code is also used for file 0.
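
The idea of aligning the code paths can be illustrated with a small, simplified model. This is only a sketch: `LogPart`, `onNextFileReady` and `switchToFile` are hypothetical names invented for illustration and do not come from the NDB source or the actual patch. It shows the file-0 case skipping the open but still running the same completion code as every other file.

```cpp
// Hypothetical, simplified model of one redo log part.
// None of these names are taken from the NDB source.
struct LogPart {
    bool fileChangeProblem = false; // set while file open/close lags; DML gets 1220
    bool gcpMarkerPending = false;  // a GCI marker is waiting to be recorded
    int gcpCompleted = 0;           // number of GCPs marked complete
};

// Shared tail of the file-switch path: reset the problem flag and
// consider marking a GCP as complete.
void onNextFileReady(LogPart& part) {
    part.fileChangeProblem = false;
    if (part.gcpMarkerPending) {
        part.gcpMarkerPending = false;
        ++part.gcpCompleted;
    }
}

// Switch to file `next`. File 0 is already open, so no open is issued,
// but the same completion code runs for it as for any other file.
void switchToFile(LogPart& part, int next) {
    if (next != 0) {
        // A real implementation would issue an asynchronous open here
        // and call onNextFileReady() from its completion handler.
    }
    onNextFileReady(part);
}
```

In this model, a bug of the kind described would correspond to returning early for `next == 0` without calling `onNextFileReady`, which would leave `fileChangeProblem` set indefinitely.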

A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/101489

3104 Jonas Oreland	2010-02-25
      ndb - bug#51512 - fix rare GCP stop due to endless 1220

A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/101490

3104 Jonas Oreland	2010-02-25
      ndb - bug#51512 - fix rare GCP stop due to endless 1220

Pushed to 6.3.32, 7.0.13 and 7.1.2.

Documented bugfix in the NDB-6.3.32, 7.0.13, and 7.1.2 changelogs, as follows:

        DML operations can fail with NDB error 1220 (REDO log files
        overloaded...) if the opening and closing of REDO log files
        takes too much time. If this occurred as a GCI marker was being
        written in the REDO log while REDO log file 0 was being opened
        or closed, the error could persist until a GCP stop was
        encountered. This issue could be triggered when there was
        insufficient REDO log space (for example, with configuration
        parameter settings NoOfFragmentLogFiles=6 and
        FragmentLogFileSize=6M) with a load including a high number of
        updates.

Closed.