Bug #51512 Endless 1220 leading to GCP STOP
Submitted: 25 Feb 2010 16:12
Modified: 26 Feb 2010 16:22
Reporter: Jonas Oreland
Status: Closed
Category: MySQL Cluster: Cluster (NDB) storage engine
Severity: S3 (Non-critical)
Version: *
OS: Any
Assigned to: Jonas Oreland
CPU Architecture: Any

[25 Feb 2010 16:12] Jonas Oreland
Description:
Cluster returns error 1220 for DML when opening/closing of files in the REDO log
gets too slow.

If this state is reached while a GCI marker is being written to the redo log,
and the open/close involves file 0 in the log part, then the 1220 can sometimes
become "permanent" until a GCP stop is encountered.

Basically, the problem was that the fix for Bug#20904 was not applied to all code paths.

How to repeat:
Create a cluster with little redo space and small redo files,
e.g. NoOfFragmentLogFiles=6 FragmentLogFileSize=6M,
and run a very high update load.

The problem will occur within 1h.
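
To give a feel for how small the redo log is with these settings, here is a rough calculation. It assumes the usual NDB layout in which NoOfFragmentLogFiles counts sets of 4 files, each of FragmentLogFileSize; the function name is made up for illustration.

```python
# Rough redo log sizing per data node.
# Assumption: NoOfFragmentLogFiles counts sets of 4 files,
# each file being FragmentLogFileSize in size.
FILES_PER_SET = 4


def redo_log_mb(no_of_fragment_log_files: int, fragment_log_file_size_mb: int) -> int:
    """Return total redo log space in MB (hypothetical helper, not an NDB API)."""
    return no_of_fragment_log_files * FILES_PER_SET * fragment_log_file_size_mb


repro_mb = redo_log_mb(6, 6)       # settings from this report
default_mb = redo_log_mb(16, 16)   # default values of both parameters

print(f"repro config:   {repro_mb} MB")
print(f"default config: {default_mb} MB")
```

So the repro configuration has 144 MB of redo log against 1024 MB with the defaults, which means file switches happen far more often under the same update load.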

Suggested fix:
When "not" opening file 0 (as it's already open), make sure to reset
file-change-problem (if occurring) and consider marking a GCP as complete.

Do this by aligning the code paths so that the same code is also used for file 0.
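
A toy model of the idea, not the actual NDB code: every name below (LogPart, finish_file_switch, switch_to_file, the `aligned` flag) is hypothetical and only mirrors the wording above. It shows how skipping the shared completion step for file 0 can leave the problem flag set, and how routing file 0 through the same code clears it.

```python
from dataclasses import dataclass


@dataclass
class LogPart:
    """Toy stand-in for one redo log part; all fields are hypothetical."""
    file_change_problem: bool = False  # while set, DML is refused (think: error 1220)
    gcp_pending: bool = False          # a GCI marker is waiting on the file switch
    gcps_completed: int = 0


def finish_file_switch(part: LogPart) -> None:
    """Shared tail of a file switch: clear the problem, maybe complete a GCP."""
    part.file_change_problem = False
    if part.gcp_pending:
        part.gcp_pending = False
        part.gcps_completed += 1


def switch_to_file(part: LogPart, file_no: int, aligned: bool = True) -> None:
    """Move on to file_no. With aligned=False, file 0 skips the shared tail."""
    if file_no == 0 and not aligned:
        # File 0 is already open, so nothing is opened and, in this
        # unaligned variant, nothing is reset either.
        return
    finish_file_switch(part)


def accepts_dml(part: LogPart) -> bool:
    return not part.file_change_problem


# Unaligned path: the flag survives the switch to file 0.
stuck = LogPart(file_change_problem=True, gcp_pending=True)
switch_to_file(stuck, 0, aligned=False)
print(accepts_dml(stuck), stuck.gcps_completed)

# Aligned path: same code for file 0 as for every other file.
fixed = LogPart(file_change_problem=True, gcp_pending=True)
switch_to_file(fixed, 0, aligned=True)
print(accepts_dml(fixed), fixed.gcps_completed)
```

In the model, the unaligned case keeps refusing DML and never completes the pending GCP, which is the "permanent" 1220 followed by GCP stop described above.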
[25 Feb 2010 16:24] Bugs System
A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/101489

3104 Jonas Oreland	2010-02-25
      ndb - bug#51512 - fix rare GCP stop due to endless 1220
[25 Feb 2010 16:24] Bugs System
A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/101490

3104 Jonas Oreland	2010-02-25
      ndb - bug#51512 - fix rare GCP stop due to endless 1220
[25 Feb 2010 16:33] Jonas Oreland
pushed to 6.3.32, 7.0.13 and 7.1.2
[26 Feb 2010 16:22] Jon Stephens
Documented bugfix in the NDB-6.3.32, 7.0.13, and 7.1.2 changelogs, as follows:

        DML operations can fail with NDB error 1220 (REDO log files
        overloaded...) if the opening and closing of REDO log files
        takes too much time. If this occurred as a GCI marker was being
        written in the REDO log while REDO log file 0 was being opened
        or closed, the error could persist until a GCP stop was
        encountered. This issue could be triggered when there was
        insufficient REDO log space (for example, with configuration
        parameter settings NoOfFragmentLogFiles=6 and
        FragmentLogFileSize=6M) with a load including a high number of
        updates.

Closed.