Bug #49664 | MySQL Cluster data node failure | ||
---|---|---|---|
Submitted: | 14 Dec 2009 10:32 | Modified: | 21 Dec 2009 13:55 |
Reporter: | MySQL DBA | Email Updates: | |
Status: | Duplicate | Impact on me: | |
Category: | MySQL Cluster: Cluster (NDB) storage engine | Severity: | S2 (Serious) |
Version: | mysql-5.1-telco-7.0 | OS: | Linux (Fedora) |
Assigned to: | Assigned Account | CPU Architecture: | Any |
Tags: | 5.1.35-ndb-7.0.7-cluster-gpl, data, Failure, MySQLCluster, node |
[14 Dec 2009 10:32]
MySQL DBA
[14 Dec 2009 10:34]
MySQL DBA
This is configuration file.
Attachment: Config.ini (, text), 2.43 KiB.
[14 Dec 2009 10:35]
MySQL DBA
This is the trace log related to this error
Attachment: Trace_log.txt (text/plain), 28.89 KiB.
[14 Dec 2009 10:35]
MySQL DBA
This is the second trace log
Attachment: ndb_4_trace.log.14_t1.txt (text/plain), 28.89 KiB.
[14 Dec 2009 10:38]
MySQL DBA
These are the logs from management nodes. 2009-12-13 00:52:23 [MgmtSrvr] INFO -- Node 3: Local checkpoint 409 started. Keep GCI = 702757 oldest restorable GCI = 664173 2009-12-13 01:50:19 [MgmtSrvr] INFO -- Node 3: Local checkpoint 410 started. Keep GCI = 704461 oldest restorable GCI = 664173 2009-12-13 02:48:15 [MgmtSrvr] INFO -- Node 3: Local checkpoint 411 started. Keep GCI = 706165 oldest restorable GCI = 664173 2009-12-13 03:46:11 [MgmtSrvr] INFO -- Node 3: Local checkpoint 412 started. Keep GCI = 707869 oldest restorable GCI = 664173 2009-12-13 04:44:10 [MgmtSrvr] INFO -- Node 3: Local checkpoint 413 started. Keep GCI = 709574 oldest restorable GCI = 664173 2009-12-13 05:42:07 [MgmtSrvr] INFO -- Node 3: Local checkpoint 414 started. Keep GCI = 711280 oldest restorable GCI = 664173 2009-12-13 06:40:06 [MgmtSrvr] INFO -- Node 3: Local checkpoint 415 started. Keep GCI = 712986 oldest restorable GCI = 664173 2009-12-13 07:38:07 [MgmtSrvr] INFO -- Node 3: Local checkpoint 416 started. Keep GCI = 714691 oldest restorable GCI = 664173 2009-12-13 08:36:05 [MgmtSrvr] INFO -- Node 3: Local checkpoint 417 started. Keep GCI = 716399 oldest restorable GCI = 664173 2009-12-13 09:34:06 [MgmtSrvr] INFO -- Node 3: Local checkpoint 418 started. Keep GCI = 718106 oldest restorable GCI = 664173 2009-12-13 10:32:03 [MgmtSrvr] INFO -- Node 3: Local checkpoint 419 started. Keep GCI = 719815 oldest restorable GCI = 664173 2009-12-13 11:29:56 [MgmtSrvr] INFO -- Node 3: Local checkpoint 420 started. Keep GCI = 721519 oldest restorable GCI = 664173 2009-12-13 12:27:55 [MgmtSrvr] INFO -- Node 3: Local checkpoint 421 started. Keep GCI = 723220 oldest restorable GCI = 664173 2009-12-13 13:25:53 [MgmtSrvr] INFO -- Node 3: Local checkpoint 422 started. Keep GCI = 724926 oldest restorable GCI = 664173 2009-12-13 14:23:48 [MgmtSrvr] INFO -- Node 3: Local checkpoint 423 started. Keep GCI = 726631 oldest restorable GCI = 664173 2009-12-13 15:21:47 [MgmtSrvr] INFO -- Node 3: Local checkpoint 424 started. Keep GCI = 728333 oldest restorable GCI = 664173 2009-12-13 16:19:45 [MgmtSrvr] INFO -- Node 3: Local checkpoint 425 started. Keep GCI = 730039 oldest restorable GCI = 664173 2009-12-13 17:17:42 [MgmtSrvr] INFO -- Node 3: Local checkpoint 426 started. Keep GCI = 731745 oldest restorable GCI = 664173 2009-12-13 18:15:43 [MgmtSrvr] INFO -- Node 3: Local checkpoint 427 started. Keep GCI = 733450 oldest restorable GCI = 664173 2009-12-13 19:13:40 [MgmtSrvr] INFO -- Node 3: Local checkpoint 428 started. Keep GCI = 735160 oldest restorable GCI = 664173 2009-12-13 20:11:38 [MgmtSrvr] INFO -- Node 3: Local checkpoint 429 started. Keep GCI = 736865 oldest restorable GCI = 664173 2009-12-13 21:09:38 [MgmtSrvr] INFO -- Node 3: Local checkpoint 430 started. Keep GCI = 738570 oldest restorable GCI = 664173 2009-12-13 22:07:35 [MgmtSrvr] INFO -- Node 3: Local checkpoint 431 started. Keep GCI = 740277 oldest restorable GCI = 664173 2009-12-13 23:05:31 [MgmtSrvr] INFO -- Node 3: Local checkpoint 432 started. Keep GCI = 741982 oldest restorable GCI = 664173 2009-12-13 23:31:15 [MgmtSrvr] ALERT -- Node 4: Forced node shutdown completed. Caused by error 2341: 'Internal program error (failed ndbrequire)(Internal error, programming error or missing error message, please report a bug). Temporary error, restart node'. 2009-12-13 23:31:15 [MgmtSrvr] ALERT -- Node 3: Node 4 Disconnected 2009-12-13 23:31:15 [MgmtSrvr] ALERT -- Node 3: Network partitioning - arbitration required 2009-12-13 23:31:15 [MgmtSrvr] INFO -- Node 3: President restarts arbitration thread [state=7] 2009-12-13 23:31:15 [MgmtSrvr] INFO -- Node 3: Communication to Node 4 closed 2009-12-13 23:31:15 [MgmtSrvr] ALERT -- Node 1: Node 4 Disconnected 2009-12-13 23:31:15 [MgmtSrvr] ALERT -- Node 3: Arbitration won - positive reply from node 1 2009-12-13 23:31:15 [MgmtSrvr] ALERT -- Node 1: Node 4 Disconnected 2009-12-13 23:31:15 [MgmtSrvr] INFO -- Node 3: Started arbitrator node 1 [ticket=4bc00003c1427405] 2009-12-13 23:31:18 [MgmtSrvr] INFO -- Node 3: Communication to Node 4 opened 2009-12-14 00:03:20 [MgmtSrvr] INFO -- Node 3: Local checkpoint 433 started. Keep GCI = 743685 oldest restorable GCI = 664173 2009-12-14 01:01:11 [MgmtSrvr] INFO -- Node 3: Local checkpoint 434 started. Keep GCI = 745386 oldest restorable GCI = 744481 2009-12-14 01:59:07 [MgmtSrvr] INFO -- Node 3: Local checkpoint 435 started. Keep GCI = 747088 oldest restorable GCI = 747072 2009-12-14 02:52:02 [MgmtSrvr] INFO -- Node 3: Local checkpoint 436 started. Keep GCI = 748793 oldest restorable GCI = 748538 2009-12-14 02:52:34 [MgmtSrvr] INFO -- Node 3: Local checkpoint 437 started. Keep GCI = 750348 oldest restorable GCI = 750360 2009-12-14 03:50:18 [MgmtSrvr] INFO -- Node 3: Local checkpoint 438 started. Keep GCI = 750365 oldest restorable GCI = 750360 2009-12-14 04:48:09 [MgmtSrvr] INFO -- Node 3: Local checkpoint 439 started. Keep GCI = 752059 oldest restorable GCI = 751726
[14 Dec 2009 10:45]
Jonas Oreland
Hi, The problem is with send-buffer (and has been addressed in http://bugs.mysql.com/bug.php?id=48852) Remove these lines from your config.ini TotalSendBufferMemory=4M SendBufferMemory=2M And instead add: [TCP DEFAULT] SendBufferMemory=2M ReceiveBufferMemory=2M
[14 Dec 2009 10:53]
MySQL DBA
Thanks Jonas for a quick response. I'll implement it and get back to you. Regards, Manasi Save.
[14 Dec 2009 11:34]
MySQL DBA
Hi Jonas, I changed the config.ini TCP DEFAULT section. But immediately after connecting one node got disconnected with following :- error.Time: Monday 14 December 2009 - 06:26:11 Status: Temporary error, restart node Message: Internal program error (failed ndbrequire) (Internal error, programming error or missing error message, please report a bug) Error: 2341 Error data: SimulatedBlock.cpp Error object: DBLQH (Line: 512) 0x0000000a Program: ndbmtd Pid: 20136 thr: 2 Version: mysql-5.1.39 ndb-7.0.9b Trace: /usr/local/mysql/data/ndb_3_trace.log.13 /usr/local/mysql/data/ndb_3_trace.log.13_t1 /usr/local/mysql/data/ndb_3_trace.log.13 How should I do now?
[14 Dec 2009 11:37]
Jonas Oreland
Hi again, 1) check that you removed it from ndbd default section 2) check that your config changes took effect (you need to restart ndb_mgmd using the "--reload" switch) 3) Increase the value of SendBufferMemory to say 16M
[14 Dec 2009 11:45]
MySQL DBA
Hi Jonas, Yes, I have removed it completely from ndbd section. I have started ndb_mgmd with --reload. and now I'll try to increase the SendBufferMemory. Also, These are few more warnigns I got after connecting to all nodes: -2009-10-06 02:45:53 [MgmSrvr] WARNING -- Failed to convert connection from '10.1.138.3:37239' to transporter 2009-10-06 02:45:53 [MgmSrvr] WARNING -- Failed to convert connection from '10.1.138.3:37240' to transporter 2009-10-06 02:45:53 [MgmSrvr] WARNING -- Failed to convert connection from '10.1.138.3:37241' to transporter 2009-10-06 02:45:53 [MgmSrvr] WARNING -- Failed to convert connection from '10.1.138.3:37242' to transporter 2009-10-06 02:45:53 [MgmSrvr] WARNING -- Failed to convert connection from '10.1.138.3:37244' to transporter 2009-10-06 02:45:53 [MgmSrvr] WARNING -- Failed to convert connection from '10.1.138.3:37245' to transporter
[21 Dec 2009 12:26]
Jonas Oreland
How is testing of new config doing ?
[21 Dec 2009 12:41]
MySQL DBA
Hi Jonas, Sorry for not giving status. The new config is working fine. so, far there is no problem. Thanks, Regards, Manasi Save MySQL DBA
[21 Dec 2009 12:55]
Jonas Oreland
super! so i'll close this a duplicate of the one with the error message (for now)
[21 Dec 2009 13:55]
MySQL DBA
Ok. Thanks for your active support. Regards, Manasi Save MySQL DBA