Bug #42052 | ndbd - Received signal 6. Running error handler | ||
---|---|---|---|
Submitted: | 12 Jan 2009 14:33 | Modified: | 12 Oct 2009 9:47 |
Reporter: | Gerhard Fürnkranz | Status: | Closed |
Category: | MySQL Cluster: Cluster (NDB) storage engine | Severity: | S1 (Critical) |
Version: | mysql-5.1-telco-6.4 | OS: | Solaris (Solaris 10 / Sparc) |
Assigned to: | Jonas Oreland | CPU Architecture: | Any |
Tags: | 6.4 |
[12 Jan 2009 14:33]
Gerhard Fürnkranz
[12 Jan 2009 14:37]
Gerhard Fürnkranz
Log files
Attachment: ndb_server_crash.tar.gz (application/x-gzip, text), 174.25 KiB.
[12 Jan 2009 15:41]
Tomas Ulin
Gerhard, we have a guess as to what this problem is, but to verify it we would like to see a backtrace from the core file that you get. Please let us know if you need help on how to get the backtrace. BR, Tomas
[12 Jan 2009 16:02]
Gerhard Fürnkranz
Sorry, I did not find any core. On the machine all core files are directed to the /TspCore directory, but unfortunately I did not find any core from ndbd there.
# coreadm
    global core file pattern:
    global core file content: default
    init core file pattern: /TspCore/core.%f.%p.%t
    init core file content: default
    global core dumps: disabled
    per-process core dumps: enabled
    global setid core dumps: disabled
    per-process setid core dumps: enabled
    global core dump logging: disabled
[13 Jan 2009 10:13]
Tomas Ulin
Gerhard, is it possible for you to configure your system so that you get core dumps from ndbd? BR, Tomas
[13 Jan 2009 11:03]
Gerhard Fürnkranz
On the system we certainly do get core dumps from other processes, but we don't get a core dump from ndbd - so far I was not able to figure out why. We'll try to run ndbd under control of dbx in order to get a stack backtrace from the crash. -Gerhard
[13 Jan 2009 11:15]
Gerhard Fürnkranz
Stack backtrace
Attachment: stack_backtrace1.txt (text/plain), 20.53 KiB.
[13 Jan 2009 12:40]
Tomas Ulin
Gerhard, Thank you very much, it verifies that it is the error we were thinking about. We know what it is and how to fix it. It is triggered by large transactions such as yours. Hopefully we will be able to get it into the next release of 6.4. BR, Tomas
[14 Jan 2009 19:44]
Tomas Ulin
To clarify, this is a problem with all large transactions in 6.4, whether it be updates, deletes, or inserts. All will be addressed with the same bug fix. BR, Tomas
[21 Jan 2009 9:55]
Bugs System
A patch for this bug has been committed. After review, it may be pushed to the relevant source trees for release in the next version. You can access the patch from: http://lists.mysql.com/commits/63670
3223 Jonas Oreland 2009-01-21
ndbmtd -
1) OJ optimizations developed for cmt/bw
2) pessimistic scheduling (update_sched_config), bug#42052: only execute signals if space exists to send to other threads in system
3) new NdbCondition_ComputeAbsTime/NdbCondition_WaitTimeoutAbs
[21 Jan 2009 9:55]
Bugs System
Pushed into 5.1.31-ndb-6.4.1 (revid:jonas@mysql.com-20090121095501-mxb7w5hi56lzp1jr) (version source revid:jonas@mysql.com-20090121095501-mxb7w5hi56lzp1jr) (merge vers: 5.1.31-ndb-6.4.1) (pib:6)
[21 Jan 2009 10:03]
Jonas Oreland
Description: In ndbmtd, one thread could flood another thread, which would cause the system to stop with job-buffer-full (currently implemented as an abort). This is now prevented: before starting to execute signals, a thread computes how many signals the other threads in the system can accept, and executes only if space is found. The flood could be provoked by committing/aborting a large transaction (>50k rows) on a *single datanode* ndbmtd.
[21 Jan 2009 13:09]
Jon Stephens
Bugfix documented in the NDB-6.4.1 changelog as follows: When using ndbmtd, one thread could flood another thread, which would cause the system to stop with a job buffer full condition (currently implemented as an abort). This could be caused by committing or aborting a large transaction (50000 rows or more) on a single data node running ndbmtd. To prevent this from happening, the number of signals that can be accepted by the system threads is calculated before executing them, and they are executed only if sufficient space is found.
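For readers unfamiliar with this kind of back-pressure, the idea in the changelog entry can be sketched roughly as follows. This is only an illustrative toy model, not the actual ndbmtd code: the `Worker` class, `max_signals_to_execute`, `run_once`, and the `fanout` parameter are all hypothetical names, and the real scheduler (update_sched_config) may compute its budget quite differently.

```python
from collections import deque


class Worker:
    """Hypothetical thread with a bounded inbound job buffer."""

    def __init__(self, name, capacity):
        self.name = name
        self.capacity = capacity
        self.inbox = deque()

    def free_slots(self):
        # Number of signals this worker can still accept.
        return self.capacity - len(self.inbox)


def max_signals_to_execute(receivers, fanout=1):
    """Pessimistic budget (assumption): every executed signal may send
    `fanout` signals to every receiver, so the fullest receiver decides
    how many signals the sender may execute in this round."""
    # With no receivers we conservatively execute nothing.
    return min((w.free_slots() for w in receivers), default=0) // fanout


def run_once(sender, receivers, fanout=1):
    """Execute queued signals on `sender` only while receivers have room."""
    budget = max_signals_to_execute(receivers, fanout)
    executed = 0
    while executed < budget and sender.inbox:
        signal = sender.inbox.popleft()
        for w in receivers:
            for _ in range(fanout):
                w.inbox.append(signal)  # cannot overflow: budget was checked
        executed += 1
    return executed


# Usage: a sender with 10 queued signals and one nearly full receiver.
sender = Worker("sender", capacity=100)
sender.inbox.extend(range(10))
receiver = Worker("receiver", capacity=4)
receiver.inbox.append("already queued")

first_round = run_once(sender, [receiver])   # only 3 free slots
second_round = run_once(sender, [receiver])  # receiver is full now
```

In this toy model the sender simply stops and keeps its remaining signals queued when the receiver has no room, instead of overflowing the receiver's buffer, which is the condition that led to the abort described above.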