Bug #49696 | Error 6000 with multithreaded ndbmtd | ||
---|---|---|---|
Submitted: | 14 Dec 2009 21:14 | Modified: | 8 Jan 2010 21:07 |
Reporter: | Troy A | Email Updates: | |
Status: | Closed | Impact on me: | |
Category: | MySQL Cluster: Cluster (NDB) storage engine | Severity: | S2 (Serious) |
Version: | 7.0.9 | OS: | Linux |
Assigned to: | Jonas Oreland | CPU Architecture: | Any |
Tags: | 6000, Abort, ndbmtd, SIBABRT, threaded |
[14 Dec 2009 21:14]
Troy A
[14 Dec 2009 21:23]
Troy A
relevant mysql traces from all four data nodes
Attachment: mysql-traces.tar.lzma (application/x-lzma, text), 341.77 KiB.
[14 Dec 2009 22:08]
Jonas Oreland
Hi Troy, I couldnt find any cluster log, and error logs, nor any ndb_X_out.log files Can you attach them too ?
[14 Dec 2009 22:25]
Troy A
Relevant error log from the four nodes Time: Friday 11 December 2009 - 17:14:39 Status: Temporary error, restart node Message: Error OS signal received (Internal error, programming error or missing error message, please report a bug) Error: 6000 Error data: Signal 6 received; Aborted Error object: ndbd.cpp Program: /usr/libexec/ndbmtd Pid: 31689 thr: 0 Version: mysql-5.1.39 ndb-7.0.9b Trace: /var/lib/mysql-cluster/ndb_1_trace.log.17 /var/lib/mysql-cluster/ndb_1_trace.log.17_t1 /var/lib/mysql-cluster/ndb_1_trace.log.17_t2 /var/lib/m Time: Friday 11 December 2009 - 17:14:40 Status: Temporary error, restart node Message: Internal program error (failed ndbrequire) (Internal error, programming error or missing error message, please report a bug) Error: 2341 Error data: SimulatedBlock.cpp Error object: CMVMI (Line: 512) 0x0000000e Program: /usr/libexec/ndbmtd Pid: 4837 thr: 3 Version: mysql-5.1.39 ndb-7.0.9b Trace: /var/lib/mysql-cluster/ndb_2_trace.log.15 /var/lib/mysql-cluster/ndb_2_trace.log.15_t1 /var/lib/mysql-cluster/ndb Time: Friday 11 December 2009 - 17:14:44 Status: Temporary error, restart node Message: Node lost connection to other nodes and can not form a unpartitioned cl uster, please investigate if there are error(s) on other node(s) (Arbitration error) Error: 2305 Error data: Arbitrator decided to shutdown this node Error object: QMGR (Line: 5532) 0x0000000e Program: /usr/libexec/ndbmtd Pid: 20143 thr: 0 Version: mysql-5.1.39 ndb-7.0.9b Trace: /var/lib/mysql-cluster/ndb_3_trace.log.18 /var/lib/mysql-cl Time: Friday 11 December 2009 - 17:14:42 Status: Temporary error, restart node Message: Node lost connection to other nodes and can not form a unpartitioned cluster, please investigate if there are error(s) on other node(s) (Arbitration error) Error: 2305 Error data: Arbitrator decided to shutdown this node Error object: QMGR (Line: 5532) 0x0000000e Program: /usr/libexec/ndbmtd Pid: 9792 thr: 0 Version: mysql-5.1.39 ndb-7.0.9b Trace: /var/lib/mysql-cluster/ndb_4_trace.log.20 /var/lib/mysql-clu
[14 Dec 2009 22:29]
Troy A
out.log from each data node
Attachment: out.log.tar.gz (application/x-gzip, text), 220.36 KiB.
[14 Dec 2009 22:30]
Troy A
Apologies in advance if I still missed any files that are required. Thanks for taking a look.
[14 Dec 2009 22:36]
Jonas Oreland
Hi, It looks like a duplicate of http://bugs.mysql.com/bug.php?id=49589 and http://bugs.mysql.com/bug.php?id=48852 But...more files are need to be sure ndb_X_error.log, ndb_Y_cluster.log* config.ini /Jonas
[14 Dec 2009 22:49]
Troy A
Remainder of logs
Attachment: error_cluster_logs.tar.gz (application/x-gzip, text), 478.84 KiB.
[14 Dec 2009 22:51]
Troy A
config.ini
Attachment: config.ini (application/octet-stream, text), 1.78 KiB.
[14 Dec 2009 22:57]
Troy A
Well, it seems http://bugs.mysql.com/bug.php?id=48852 is just related to how it reports the error, so it wouldn't be causal. I believe earlier Error 6000s on one node even caused different messages in our error.logs on the corresponding node, such as: Error: 2303 Error data: Node 2 killed this node because GCP stop was detected The other one, http://bugs.mysql.com/bug.php?id=49589 seems plausible to me. Thanks again!
[14 Dec 2009 23:02]
Jonas Oreland
Hi again, 1) yes, 48852 is only how the error is reported...but if it was reported correctly you probably would have increased the size of your send buffers. 2) the out files are filled with "0 - sleeploop 10!!" which is very very bad...So what kind of load are you running ?? would it be possible to somehow try to mimic it? 3) you still havent uploaded the config.ini, right ? /Jonas
[14 Dec 2009 23:14]
Troy A
1) Makes sense. We did see that bug before and increased sendbuffers to see if it fixed the problem. We didn't see any change so we went back to defaults. 2) We have mogilefs running against the cluster, as well as our own app. It's not in full production quite yet so we're at liberty to switch back to mt to test. The usage from our app shouldn't be anything special. I was only using two connections (to two databases) from the app, and looping through the statements given in the original bug description seemed to be the only trigger for the crash, as mogilefs should have been relatively idle. I did have some large transactions that failed previously, but I've rewritten the app to use smaller batches when possible. 3) config.ini was uploaded separately above. Thanks.
[21 Dec 2009 12:25]
Jonas Oreland
This could also maybe be http://bugs.mysql.com/bug.php?id=49589, which now has a patch attached... Can you test that (together with increased buffers) and see it it helps your case /Jonas
[8 Jan 2010 21:07]
Troy A
Sorry for the delay. I wanted to report that with the patch applied, we have not been able to reproduce the crashes. Thanks for your time, Jonas. -Troy