| Bug #49589 | Job Buffer Full aborts in mt.cpp | | |
|---|---|---|---|
| Submitted: | 10 Dec 2009 15:35 | Modified: | 28 Dec 2009 18:26 |
| Reporter: | Andrew Hutchings | Email Updates: | |
| Status: | Closed | Impact on me: | |
| Category: | MySQL Cluster: Cluster (NDB) storage engine | Severity: | S2 (Serious) |
| Version: | mysql-5.1-telco-7.0 | OS: | Any |
| Assigned to: | Jonas Oreland | CPU Architecture: | Any |
[21 Dec 2009 11:55]
Jonas Oreland
Fix
Attachment: bug49589.tgz (application/x-compressed-tar, text), 2.65 KiB.
[21 Dec 2009 14:30]
Bugs System
A patch for this bug has been committed. After review, it may be pushed to the relevant source trees for release in the next version. You can access the patch from:
http://lists.mysql.com/commits/95251
3301 Jonas Oreland 2009-12-21
ndb - bug#49589 - change Dbtc not to send too many commit/complete which can lead to job-buffer-full specially for ndbmtd with 1-datanode
[21 Dec 2009 14:37]
Jonas Oreland
pushed to 7.0.11
[21 Dec 2009 14:37]
Bugs System
Pushed into 5.1.41-ndb-7.1.0 (revid:jonas@mysql.com-20091221143437-5yoz2xr89h6diz7b) (version source revid:jonas@mysql.com-20091221143437-5yoz2xr89h6diz7b) (merge vers: 5.1.41-ndb-7.1.0) (pib:15)
[28 Dec 2009 18:26]
Jon Stephens
Documented bugfix in the NDB-7.0.11 changelog as follows:
Under some circumstances, the DBTC kernel block could send an
excessive number of commit and complete messages, which could
lead to the job buffer filling up and node failure. This was
especially likely to occur when using ndbmtd with a single data
node.
Closed.

Description:
When a job buffer full condition occurs the node fails with abort(). This does not give us a lot of diagnostic information to work with beyond the fact that "job buffer full" caused the failure.

How to repeat:
No test case yet, look at kernel/vm/mt.cpp:

    static void job_buffer_full()
    {
      ndbout_c("job buffer full");
      abort();
    }

    static thr_job_buffer*
    seize_buffer(struct thr_repository* rep, int thr_no, bool prioa)
    {
      ...
      if (unlikely(cnt == 0))
      {
        job_buffer_full();
      }

Suggested fix:
1. Ideally the node would not fail at all, but this may be a needed step.
2. Change to a proper node failure with more diagnostic information in the node log file if possible,
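
To illustrate the second suggested fix, here is a minimal standalone sketch of recording some context before the node goes down, instead of printing only "job buffer full". It is not the committed patch and does not use the real NDB kernel types: `JobBufferDiag`, `format_job_buffer_full` and `job_buffer_full_with_diag` are hypothetical names, and the fields chosen (thread number, the `prioa` flag, the free buffer count) are assumptions based on the arguments visible in `seize_buffer()` above.

```cpp
#include <cstdio>
#include <cstdlib>
#include <string>

// Hypothetical diagnostic snapshot. The fields are assumptions for
// illustration and do not mirror the real structures in mt.cpp.
struct JobBufferDiag {
  int thr_no;         // thread that failed to seize a buffer
  bool prioa;         // value of the prioa argument at the failing call
  unsigned free_cnt;  // free buffers observed when the failure was detected
};

// Build a one-line report suitable for the node log.
std::string format_job_buffer_full(const JobBufferDiag& d) {
  char buf[128];
  std::snprintf(buf, sizeof(buf),
                "job buffer full: thr_no=%d prioa=%d free=%u",
                d.thr_no, d.prioa ? 1 : 0, d.free_cnt);
  return std::string(buf);
}

// Stand-in for job_buffer_full(): write the context to stderr, then
// abort exactly as the original code does.
[[noreturn]] void job_buffer_full_with_diag(const JobBufferDiag& d) {
  std::fprintf(stderr, "%s\n", format_job_buffer_full(d).c_str());
  std::abort();
}
```

In this sketch the caller would fill in a `JobBufferDiag` at the point where `cnt == 0` is detected and pass it to `job_buffer_full_with_diag()`. Keeping the formatting in a separate function lets the message be checked without triggering the abort.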