Bug #85584 | MYSQL cluster restart failing after initialization | ||
---|---|---|---|
Submitted: | 22 Mar 2017 15:49 | Modified: | 25 Apr 2017 13:23 |
Reporter: | Zeljko Zuvic | | |
Status: | Not a Bug | | |
Category: | MySQL Cluster: Cluster (NDB) storage engine | Severity: | S2 (Serious) |
Version: | ndb-7.4.7 | OS: | CentOS (6.6) |
Assigned to: | MySQL Verification Team | CPU Architecture: | Any |
Tags: | No message slogan found |
[22 Mar 2017 15:49]
Zeljko Zuvic
[30 Mar 2017 16:41]
MySQL Verification Team
Hi,

I understand you can't fetch the logs with ndb_error_reporter, but you can compress and upload all logs from the crashing node manually, along with the logs from your management nodes.

If you start the node that crashed with

" Node 2: Forced node shutdown completed. Occured during startphase 4. Caused by error 32782: 'No message slogan found (please report a bug if you get this error code)(Unknown). Unknown'."

again, will it start, or will it stop at start phase 4 again?

To get your cluster up, start the crashing node with --initial so it can re-fetch the data from the surviving node.

All best,
Bogdan

P.S. I tried reproducing this with 7.4.7, without any luck.
[31 Mar 2017 11:48]
Zeljko Zuvic
Hi Bogdane,

I have just uploaded the requested logs from the data nodes and the management node, and I hope they are enough for troubleshooting.

I also tried to start the node again after it crashed, but it failed with the same error at start phase 4. At the same time, another node is failing with the error:

"-- Node 2: Forced node shutdown completed. Occured during startphase 4. Caused by error 2308: 'Another node failed during system restart, please investigate error(s) on other node(s)(Restart error). Temporary error, restart node'. "

And vice versa: sometimes during a restart the 1st node fails with the error "No message slogan found ...." and the 2nd then stops with "Another node failed during system restart ....".

So it looks hopeless so far.

Many thanks for your support!
Zeljko
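When comparing which node fails with which error across restarts, the shutdown messages quoted in this thread can be picked apart mechanically. The following is only a minimal sketch: the names `ShutdownReport` and `parse_forced_shutdown` are hypothetical, and the pattern is modelled solely on the two messages quoted here (including the log's own spelling "Occured"), so other message variants may not match.

```python
import re
from typing import NamedTuple, Optional


class ShutdownReport(NamedTuple):
    node_id: int
    start_phase: int
    error_code: int
    message: str


# Pattern based only on the messages quoted in this thread (assumption).
_SHUTDOWN_RE = re.compile(
    r"Node (?P<node>\d+): Forced node shutdown completed\. "
    r"Occured during startphase (?P<phase>\d+)\. "
    r"Caused by error (?P<code>\d+): '(?P<msg>.*)'"
)


def parse_forced_shutdown(line: str) -> Optional[ShutdownReport]:
    """Return the fields of a forced-shutdown log line, or None if it does not match."""
    match = _SHUTDOWN_RE.search(line)
    if match is None:
        return None
    return ShutdownReport(
        node_id=int(match["node"]),
        start_phase=int(match["phase"]),
        error_code=int(match["code"]),
        message=match["msg"],
    )
```

Running this over the cluster log from each restart attempt would show at a glance which node reported 32782 and which reported 2308.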
[4 Apr 2017 22:58]
MySQL Verification Team
Hi Zeljko,

The errors you are getting are:

File system open failed. OS errno: 4294967295

so there is a problem with your file system. Possible reasons:

- wrong permissions on the files
- wrong ownership of the files
- filesystem corruption
- hardware error

But I doubt it's related to MySQL Cluster itself. Can you check the cluster data directory for permissions/ownership settings, and can you please check the whole filesystem too? Do you have an antivirus installed?

All best,
Bogdan
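The permission and ownership check on the data directory can be scripted. Below is a rough sketch in Python, not an official tool: `audit_datadir` is a hypothetical helper, and it assumes a POSIX system where the data node process runs as a single known user whose numeric uid you pass in. It only inspects owner and owner mode bits; it does not detect filesystem corruption or hardware errors.

```python
import os
import stat
from typing import Iterator, Tuple


def _check(path: str, expected_uid: int) -> Iterator[Tuple[str, str]]:
    """Yield (path, problem) pairs for a single filesystem entry."""
    try:
        st = os.lstat(path)
    except OSError as exc:
        yield path, f"cannot stat: {exc.strerror}"
        return
    if st.st_uid != expected_uid:
        yield path, f"owned by uid {st.st_uid}, expected {expected_uid}"
    # The owner needs read+write on files, plus execute (search) on directories.
    needed = stat.S_IRUSR | stat.S_IWUSR
    if stat.S_ISDIR(st.st_mode):
        needed |= stat.S_IXUSR
    if st.st_mode & needed != needed:
        yield path, f"owner permissions too restrictive: {stat.filemode(st.st_mode)}"


def audit_datadir(datadir: str, expected_uid: int) -> Iterator[Tuple[str, str]]:
    """Walk datadir (including the top directory) and report suspicious entries."""
    yield from _check(datadir, expected_uid)
    for root, dirs, files in os.walk(datadir):
        for name in dirs + files:
            yield from _check(os.path.join(root, name), expected_uid)
```

An empty result only means ownership and owner mode bits look consistent; a full filesystem check is still needed to rule out the other causes listed above.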
[6 Apr 2017 1:15]
MySQL Verification Team
Hi Zeljko,

> Maybe is worth to mention that we have encrypted partition ..
> Regarding antivirus, yes we have some version ..
> Do you have some recommendation what should be our next step to do?

Well, since you are not able to reproduce this (with exactly the same setup), I can only guess, but I doubt it's related to MySQL :(

The steps now:

- check all your system/kernel logs for any issues with that encrypted partition
- check the log from your antivirus
- disable the antivirus for the cluster datadir

I think it is the antivirus that's corrupting the cluster datadir. MySQL Cluster does not play well with antivirus software (nor does regular MySQL Server), and any time they fight over a file, MySQL Cluster will shut itself down.

All best,
Bogdan
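For the log checks in the steps above, a simple keyword filter is often enough for a first pass. This is a minimal sketch with a hypothetical helper name, `find_suspect_lines`; the keywords to search for (for example the datadir path or an I/O error string) are up to you and depend on what your kernel and antivirus actually log.

```python
from typing import Iterable, Iterator, Tuple


def find_suspect_lines(
    lines: Iterable[str], keywords: Iterable[str]
) -> Iterator[Tuple[int, str]]:
    """Yield (line_number, line) for lines containing any keyword, case-insensitively."""
    needles = [keyword.lower() for keyword in keywords]
    for number, line in enumerate(lines, start=1):
        lowered = line.lower()
        if any(needle in lowered for needle in needles):
            yield number, line.rstrip("\n")
```

Since an open file object iterates line by line, it can be passed directly as `lines`.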
[25 Apr 2017 13:23]
Zeljko Zuvic
Hi Bogdane,

You were absolutely right: the AV caused the issue we had. After we excluded the MySQL files from AV scanning, everything started to work again.

Many thanks for the support!
Zeljko
[25 Apr 2017 18:08]
MySQL Verification Team
Thanks for the update.

Cheers,
Bogdan