Bug #91865 | mysqld crashes with signal 6 | ||
---|---|---|---|
Submitted: | 2 Aug 2018 8:17 | Modified: | 10 Aug 2018 13:43 |
Reporter: | Hendrik Woltersdorf | Email Updates: | |
Status: | Not a Bug | Impact on me: | |
Category: | MySQL Cluster: Cluster (NDB) storage engine | Severity: | S3 (Non-critical) |
Version: | 5.6.41 ndb-7.4.21 | OS: | CentOS (6.3) |
Assigned to: | MySQL Verification Team | CPU Architecture: | x86 |
[2 Aug 2018 8:17]
Hendrik Woltersdorf
[2 Aug 2018 8:18]
Hendrik Woltersdorf
error as seen from the client
Attachment: error.txt (text/plain), 8.07 KiB.
[2 Aug 2018 8:23]
Hendrik Woltersdorf
I can't upload the ndb_error_reporter file (11 MB) via SFTP because of network security restrictions.
[2 Aug 2018 8:26]
Hendrik Woltersdorf
reduced ndb_error_reporter files
Attachment: mysql-bug-data-91865_v2.tar.bz2 (application/octet-stream, text), 2.26 MiB.
[2 Aug 2018 8:26]
Hendrik Woltersdorf
the stored procedure mentioned
Attachment: sp_ueberwachung.sql (application/octet-stream, text), 20.39 KiB.
[6 Aug 2018 20:00]
MySQL Verification Team
Hi, in order to see what's going on we need all log files (the easiest way to collect them is the ndb_error_reporter tool). Why are you "cloning" the SQL node? Best regards, Bogdan
[7 Aug 2018 5:04]
Hendrik Woltersdorf
I collected the files using ndb_error_reporter; I only deleted some large old log files. We lost one machine and had to set up a new one. Our system administrators suggested cloning the surviving SQL node at the OS level into a new virtual machine, and that's what we did. Is anything wrong with that?
[7 Aug 2018 9:12]
MySQL Verification Team
Hi, I see the logs. Apologies, I was thinking about one thing and writing another: ndb_error_reporter does not collect the mysqld log files, so I meant to ask for the full SQL log file of the failing node. It is questionable whether I'll see anything there that is not already in error.txt, but we might find something useful, so please upload it if you can. Now, I can't reproduce this, and no, simply cloning the node is not a good way to go about it, because both the ndb filesystem and the mysqld datadir are "wrong" on the clone. If you don't want to install the SQL node and ndbmtd from scratch on a new machine, you can clone the existing one, but you have to remove the ndb filesystem (start the node with --initial or manually delete the filesystem before starting the node). mysqld should work OK with a cloned filesystem, but I personally like to clear its datadir too before connecting to the cluster. If I understand you correctly, the original SQL node never crashes, only the new cloned one? If that's correct, I'd readily assume there's a filesystem issue on the cloned node. Kind regards, Bogdan
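For illustration only, the first start of a cloned data node with --initial could be assembled as in the sketch below. The helper name `data_node_first_start` and the connect string `mgmhost:1186` are hypothetical and not taken from this report. The sketch only builds and prints the command; it does not start anything.

```python
import shlex


def data_node_first_start(connectstring=None, binary="ndbmtd"):
    """Build the command line for the first start of a cloned data node.

    --initial requests an initial start, so the node does not reuse the
    ndb filesystem that was copied along with the clone.
    """
    cmd = [binary, "--initial"]
    if connectstring:
        # Assumption: the management server address is passed explicitly;
        # it could equally come from my.cnf.
        cmd.append("--ndb-connectstring=" + connectstring)
    return cmd


if __name__ == "__main__":
    # "mgmhost:1186" is a placeholder for the real management node address.
    print(shlex.join(data_node_first_start("mgmhost:1186")))
```

The --initial flag is meant for this first start only; later restarts of the same node would normally be done without it.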
[7 Aug 2018 10:48]
Hendrik Woltersdorf
mysqld log of the cloned node
Attachment: etq-dusv-dbcl2.zip (application/x-zip-compressed, text), 27.42 KiB.
[7 Aug 2018 10:48]
Hendrik Woltersdorf
mysqld log of the original node
Attachment: etq-wil-dbcl2.zip (application/x-zip-compressed, text), 23.34 KiB.
[7 Aug 2018 10:50]
Hendrik Woltersdorf
I added the log files of the two sql nodes. The original node crashed too, but less often.
[7 Aug 2018 11:00]
MySQL Verification Team
Hi, Thanks for the logs and clarification. Let me analyze this and I'll get back to you. all best Bogdan
[10 Aug 2018 5:37]
Hendrik Woltersdorf
Yesterday I recreated the MySQL Cluster SQL node on the cloned machine. That means:
- stop mysqld
- delete everything in 'datadir'
- start from scratch with mysql_install_db
Since then (for one day so far) there have been no crashes and no more messages like "Incorrect information in file: './hacom/SYSTEM_CACHE.frm'", which used to appear on one SQL node whenever a "truncate table SYSTEM_CACHE" was issued on the other one.
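The three steps above could be scripted roughly as follows. This is a minimal sketch, not the procedure actually used on the affected machine. The helper names, the `mysqladmin shutdown` stop command, and the `--user=mysql` option are assumptions; how mysqld is stopped depends on the installation. The `run` parameter is a callable that executes a command, which allows a dry run before anything is deleted.

```python
import shutil
from pathlib import Path


def clear_datadir(datadir):
    """Delete everything inside datadir but keep the directory itself.

    Returns the names of the removed entries.
    """
    root = Path(datadir)
    if not root.is_dir():
        raise NotADirectoryError(str(root))
    removed = []
    for entry in sorted(root.iterdir()):
        if entry.is_dir() and not entry.is_symlink():
            shutil.rmtree(entry)   # real subdirectory, e.g. a schema directory
        else:
            entry.unlink()         # regular file or symlink
        removed.append(entry.name)
    return removed


def recreate_sql_node(datadir, run, user="mysql"):
    """Stop mysqld, empty its datadir, then re-initialise it.

    `run` is called with each external command as a list of strings.
    """
    # Assumption: mysqld can be stopped this way; credentials are omitted.
    run(["mysqladmin", "shutdown"])
    removed = clear_datadir(datadir)
    run(["mysql_install_db", "--datadir=" + str(datadir), "--user=" + user])
    return removed
```

To execute the commands for real, pass something like `lambda cmd: subprocess.run(cmd, check=True)` as `run`. Passing `print` instead shows the commands without running them, although `clear_datadir` still deletes files, so try it on a scratch directory first.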
[10 Aug 2018 13:43]
MySQL Verification Team
Hi, it looks like cloning was the problem. You can't clone the data filesystem (neither mysqld's nor ndbmtd's) for new nodes; those need to be clean. Hopefully this solves the problem. Let me know if your experience shows otherwise, but for now I'm setting this to "Not a Bug". All the best, Bogdan