MySQL Bugs: #49589: Job Buffer Full aborts in mt.cpp

Bug #49589	Job Buffer Full aborts in mt.cpp
Submitted: 10 Dec 2009 15:35
Modified: 28 Dec 2009 18:26
Reporter: Andrew Hutchings
Status: Closed
Category: MySQL Cluster: Cluster (NDB) storage engine
Severity: S2 (Serious)
Version: mysql-5.1-telco-7.0
OS: Any
Assigned to: Jonas Oreland
CPU Architecture: Any

Description:
When a job-buffer-full condition occurs, the node fails with abort(). This gives us little diagnostic information beyond the fact that "job buffer full" caused the failure.

How to repeat:
No test case yet; see the following code in kernel/vm/mt.cpp:

static
void
job_buffer_full()
{
  ndbout_c("job buffer full");
  abort();
}

static
thr_job_buffer*
seize_buffer(struct thr_repository* rep, int thr_no, bool prioa)
{
...
        if (unlikely(cnt == 0))
        {
          job_buffer_full();
        }

Suggested fix:
1. Ideally the node would not fail at all, but failing may be a necessary step.
2. If possible, turn this into a proper node failure that writes more diagnostic information to the node log file.

Fix

Attachment: bug49589.tgz (application/x-compressed-tar, text), 2.65 KiB.

A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/95251

3301 Jonas Oreland	2009-12-21
      ndb - bug#49589 - change Dbtc not to send too many commit/complete which can lead to job-buffer-full specially for ndbmtd with 1-datanode

Pushed to 7.0.11.

Pushed into 5.1.41-ndb-7.1.0 (revid:jonas@mysql.com-20091221143437-5yoz2xr89h6diz7b) (version source revid:jonas@mysql.com-20091221143437-5yoz2xr89h6diz7b) (merge vers: 5.1.41-ndb-7.1.0) (pib:15)

Documented bugfix in the NDB-7.0.11 changelog as follows:

        Under some circumstances, the DBTC kernel block could send an
        excessive number of commit and complete messages, which could
        lead to the job buffer filling up and to node failure. This was
        especially likely to occur when using ndbmtd with a single data
        node.

Closed.