Bug #91865 mysqld crashes with signal 6
Submitted: 2 Aug 2018 8:17
Modified: 10 Aug 2018 13:43
Reporter: Hendrik Woltersdorf
Status: Not a Bug
Category: MySQL Cluster: Cluster (NDB) storage engine
Severity: S3 (Non-critical)
Version: 5.6.41 ndb-7.4.21
OS: CentOS (6.3)
Assigned to: MySQL Verification Team
CPU Architecture: x86

[2 Aug 2018 8:17] Hendrik Woltersdorf
Description:
A test system of 4 machines, with 2 nodes each of type management, data and SQL.
After one machine, hosting a SQL node, died, we made a copy of the surviving SQL node at the operating system level. This copy lives inside a virtual machine.
With this cloned SQL node we often see crashes of the type:
glibc detected *** /opt/mysql/bin/mysqld: munmap_chunk(): invalid pointer: 0x00007fcd80f39fb0 ***
...

How to repeat:
The crash happens often, but not always, when I call a stored procedure (call SP_UEBERWACHUNG('');)
[2 Aug 2018 8:18] Hendrik Woltersdorf
error as seen from the client

Attachment: error.txt (text/plain), 8.07 KiB.

[2 Aug 2018 8:23] Hendrik Woltersdorf
I can't upload the file from ndb_error_reporter (11 MB) via sftp because of network security limitations.
[2 Aug 2018 8:26] Hendrik Woltersdorf
reduced ndb_error_reporter files

Attachment: mysql-bug-data-91865_v2.tar.bz2 (application/octet-stream, text), 2.26 MiB.

[2 Aug 2018 8:26] Hendrik Woltersdorf
the stored procedure mentioned

Attachment: sp_ueberwachung.sql (application/octet-stream, text), 20.39 KiB.

[6 Aug 2018 20:00] MySQL Verification Team
Hi,

In order to see what's going on we need all log files (the easiest way to collect them is using the ndb_error_reporter tool).

Why are you "cloning" the SQL node?

best regards
Bogdan
[7 Aug 2018 5:04] Hendrik Woltersdorf
I collected the files using ndb_error_reporter. I just deleted some large old log files.
We lost one machine and had to set up a new one. Our system administrators suggested cloning the surviving SQL node at the OS level into a new virtual machine. And that's what we did. Anything wrong with that?
[7 Aug 2018 9:12] MySQL Verification Team
Hi,

I see the logs, apologies. I was thinking about one thing and writing another: ndb_error_reporter does not collect the mysqld log files, so I wanted to ask for the full log file of the failing SQL node. It is questionable whether I'll see anything there that's not already in error.txt, but we might find something useful, so please upload it if you can.

Now, I can't reproduce this. And no, just cloning the node is not a good way to go about it, as both the ndb filesystem and the mysql datadir are "wrong". If you don't want to install sql and ndbmtd on a new node you can clone the existing one, but you have to remove the ndb filesystem (start the node with --initial or manually delete the filesystem before starting the node). The mysqld should work OK with a cloned filesystem, but I personally like to clear its datadir too before connecting to the cluster.
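
As a minimal sketch of the --initial variant for a data node on a cloned machine, the start command could be assembled like this. The management host "mgmhost:1186" is a hypothetical placeholder, and the helper name is made up for illustration. Note that --initial wipes the data node's local NDB filesystem, so the node has to recover its data from the rest of the cluster.

```python
import shlex


def data_node_start_command(connectstring, initial=True):
    """Build the argv for starting ndbmtd, optionally with --initial.

    connectstring is the management server address (hypothetical here).
    """
    argv = ["ndbmtd", f"--ndb-connectstring={connectstring}"]
    if initial:
        # Discard the (cloned) NDB filesystem before the node joins.
        argv.append("--initial")
    return argv


# Print the command for review instead of executing it.
print(" ".join(shlex.quote(arg) for arg in data_node_start_command("mgmhost:1186")))
```

The sketch only prints the command; running it is left to the operator, since --initial is destructive on the node it is run on.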

If I understand you correctly, the original SQL node never crashes, only the new cloned one? If that's correct, I'd assume there's a filesystem issue on this cloned node.

kind regards
Bogdan
[7 Aug 2018 10:48] Hendrik Woltersdorf
mysqld log of the cloned node

Attachment: etq-dusv-dbcl2.zip (application/x-zip-compressed, text), 27.42 KiB.

[7 Aug 2018 10:48] Hendrik Woltersdorf
mysqld log of the original node

Attachment: etq-wil-dbcl2.zip (application/x-zip-compressed, text), 23.34 KiB.

[7 Aug 2018 10:50] Hendrik Woltersdorf
I added the log files of the two sql nodes.
The original node crashed too, but less often.
[7 Aug 2018 11:00] MySQL Verification Team
Hi,
Thanks for the logs and clarification. Let me analyze this and I'll get back to you.

all best
Bogdan
[10 Aug 2018 5:37] Hendrik Woltersdorf
Yesterday I recreated the MySQL Cluster SQL node on the cloned machine.
That means:
- stop mysqld
- delete everything in 'datadir'
- start from scratch with mysql_install_db
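
The three steps above can be sketched as a dry-run plan. This is only an illustration under assumptions: the basedir /opt/mysql is inferred from the mysqld path in the crash message, the datadir /opt/mysql/data and the location scripts/mysql_install_db are guesses for a tarball install, and the function name is hypothetical. The function only returns the commands, it does not run them.

```python
import shlex


def rebuild_sql_node_plan(basedir, datadir, user="mysql"):
    """Return the shell commands for recreating a SQL node's datadir, in order.

    Nothing is executed; review the plan and run it by hand.
    """
    if datadir.rstrip("/") in ("", "."):
        raise ValueError("refusing to plan deletion of an unsafe datadir")
    q = shlex.quote
    return [
        # 1. Stop mysqld (assumes mysqladmin can log in as root).
        f"{q(basedir + '/bin/mysqladmin')} -u root -p shutdown",
        # 2. Delete everything inside datadir, keeping the directory itself.
        f"find {q(datadir)} -mindepth 1 -delete",
        # 3. Recreate the system tables from scratch.
        f"{q(basedir + '/scripts/mysql_install_db')} --user={q(user)} "
        f"--basedir={q(basedir)} --datadir={q(datadir)}",
    ]


for command in rebuild_sql_node_plan("/opt/mysql", "/opt/mysql/data"):
    print(command)
```
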

Since then (for one day so far) there have been no crashes and no more messages like:
"Incorrect information in file: './hacom/SYSTEM_CACHE.frm'" on one SQL node whenever a "truncate table SYSTEM_CACHE" was issued on the other one.
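
To check over a longer period whether either symptom comes back, a log can be scanned for the two messages quoted in this report. A rough sketch, assuming both messages end up in a text log that can be read line by line; the pattern names and the function are hypothetical.

```python
import re
from collections import Counter

# Patterns taken from the two messages quoted in this report.
PATTERNS = {
    "glibc_invalid_pointer": re.compile(r"munmap_chunk\(\): invalid pointer"),
    "incorrect_frm": re.compile(r"Incorrect information in file: '[^']+\.frm'"),
}


def count_symptoms(lines):
    """Count how many lines match each known symptom pattern."""
    counts = Counter()
    for line in lines:
        for name, pattern in PATTERNS.items():
            if pattern.search(line):
                counts[name] += 1
    return counts


sample = [
    "glibc detected *** /opt/mysql/bin/mysqld: munmap_chunk(): invalid pointer: 0x00007fcd80f39fb0 ***",
    "Incorrect information in file: './hacom/SYSTEM_CACHE.frm'",
    "some unrelated log line",
]
print(count_symptoms(sample))
```

For a real log, pass an open file object instead of the sample list, for example `count_symptoms(open(path, errors="replace"))` with `path` pointing at the log of the node in question.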
[10 Aug 2018 13:43] MySQL Verification Team
Hi,
Looks like cloning was the problem. You can't clone the data filesystem (neither mysqld's nor ndbmtd's) for new nodes; those need to be clean. Hopefully this solves the problem. Let me know if your experience shows otherwise, but for now I'm setting this to "not a bug".

all best
Bogdan