Bug #86360 | Error 2306 'Pointer too large' on ndbmtd start after a full cluster crash | ||
---|---|---|---|
Submitted: | 17 May 2017 15:34 | Modified: | 22 May 2017 21:01 |
Reporter: | Andrew Blackmore | Email Updates: | |
Status: | Can't repeat | Impact on me: | |
Category: | MySQL Cluster: Cluster (NDB) storage engine | Severity: | S1 (Critical) |
Version: | 7.4.15 | OS: | Ubuntu (16.04) |
Assigned to: | MySQL Verification Team | CPU Architecture: | Any |
[17 May 2017 15:34]
Andrew Blackmore
[18 May 2017 4:16]
MySQL Verification Team
Hi Andrew, I cannot reproduce this issue, and the error log you provided does not contain enough data. The error you are getting (pointer too large in dbdih) is not helpful on its own, because many different issues that are not bugs can lead to it. Kind regards, Bogdan
[18 May 2017 15:30]
Andrew Blackmore
I believe that the initial error that actually triggered the chain of events is in that same error log, just a little further up:
Time: Wednesday 17 May 2017 - 02:11:03
Status: Temporary error, restart node
Message: Send signal error (Internal error, programming error or missing error message, please report a bug)
Error: 2339
Error data: Unhandled sections in sendSignal for GSN 33 (KEYINFO20).
Error object:
Program: ndbmtd
Pid: 2884 thr: 1
Version: mysql-5.6.36 ndb-7.4.15
Trace: /usr/local/mysql/data/ndb_2_trace.log.9 [t1..t11]
***EOM***
This happened at the time the entire cluster crashed, and I believe it caused the rest of the issues.
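For anyone triaging a similar log, the quoted entry can be split back into its fields before comparing it with other entries. The following is only a rough sketch: the field labels are taken from the entry quoted in this report, while the function name `parse_error_entry` and the assumption that every entry uses exactly these labels are illustrative, not part of any NDB tooling.

```python
import re

# Field labels exactly as they appear in the entry quoted in this report.
# Assumption: other entries in the same error log use the same labels.
FIELD_LABELS = [
    "Time", "Status", "Message", "Error", "Error data", "Error object",
    "Program", "Pid", "Version", "Trace",
]

# The entry from this report, flattened onto one line.
SAMPLE_ENTRY = (
    "Time: Wednesday 17 May 2017 - 02:11:03 "
    "Status: Temporary error, restart node "
    "Message: Send signal error (Internal error, programming error or "
    "missing error message, please report a bug) "
    "Error: 2339 "
    "Error data: Unhandled sections in sendSignal for GSN 33 (KEYINFO20). "
    "Error object: "
    "Program: ndbmtd "
    "Pid: 2884 thr: 1 "
    "Version: mysql-5.6.36 ndb-7.4.15 "
    "Trace: /usr/local/mysql/data/ndb_2_trace.log.9 [t1..t11] "
    "***EOM***"
)


def parse_error_entry(text):
    """Split one error log entry into a dict keyed by field label.

    Hypothetical helper for illustration only.
    """
    # Try longer labels first so "Error data" is preferred over "Error".
    labels = sorted(FIELD_LABELS, key=len, reverse=True)
    pattern = re.compile(r"\b(" + "|".join(re.escape(l) for l in labels) + r"):")

    # Everything after the end-of-message marker belongs to the next entry.
    body = text.split("***EOM***")[0]

    matches = list(pattern.finditer(body))
    entry = {}
    for i, match in enumerate(matches):
        # A field's value runs until the next label (or the end of the entry).
        end = matches[i + 1].start() if i + 1 < len(matches) else len(body)
        entry[match.group(1)] = body[match.end():end].strip()
    return entry


if __name__ == "__main__":
    for label, value in parse_error_entry(SAMPLE_ENTRY).items():
        print(f"{label}: {value}")
```

The matching is case-sensitive on purpose, so the lowercase "error" inside the Status and Message text is not mistaken for the `Error:` field.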
[18 May 2017 19:33]
MySQL Verification Team
Hi, It is possible, but a send signal error is usually an effect and not the cause of a crash. What really caused the crash I can't say from the info I have, and I doubt it's a bug.
Normally a partial cluster start and initial start of remaining nodes, followed by initial start of the other nodes, clears everything up, but you were unlucky enough to have 2 nodes from the same group fail. In that case restoring from backup is the fastest/safest (and often only) possibility.
This is now entering the domain of support, so I suggest you contact the Oracle Support team. They can:
 - figure out why this happened to you and how to prevent it from happening again
 - get the system up and running with the least amount of downtime
Without ndb_2_trace.log.9 [t1..t11] we can't see how the crash happened, but often even with those logs we might be in the same situation.
Best regards, Bogdan
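The point about two nodes from the same group failing can be illustrated with a small check: a node group that has no surviving data node leaves part of the data with no live copy, which is why a partial start is not enough in that case. This is a minimal sketch only; the node IDs and group layout below are hypothetical (the reporter's actual configuration is not shown in this report), and `groups_without_survivors` is an illustrative name, not an NDB API.

```python
def groups_without_survivors(node_groups, failed_nodes):
    """Return the IDs of node groups in which every data node has failed.

    node_groups: dict mapping node group ID -> list of data node IDs.
    failed_nodes: iterable of data node IDs that are down.
    Hypothetical helper for illustration only.
    """
    failed = set(failed_nodes)
    return [
        group_id
        for group_id, nodes in sorted(node_groups.items())
        # A group is lost only when it has nodes and all of them failed.
        if nodes and set(nodes) <= failed
    ]


if __name__ == "__main__":
    # Hypothetical layout: two node groups with two replicas each.
    layout = {0: [2, 3], 1: [4, 5]}

    # One node lost from each group: every group still has a survivor.
    print(groups_without_survivors(layout, [2, 4]))

    # Both nodes of group 0 lost: that group has no survivor left.
    print(groups_without_survivors(layout, [2, 3]))
```

In the first case a partial start can still proceed from the surviving nodes; in the second, the data held by the lost group has to come from a backup.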
[18 May 2017 19:41]
Andrew Blackmore
I have added the log files you mentioned so that you can look at them. I have already moved on to a less volatile solution going forward.
[18 May 2017 19:52]
MySQL Verification Team
Thanks for uploading trace.log.9. We'll see if there's anything there that shows how to reproduce the problem. All the best, Bogdan
[19 May 2017 8:00]
Mauritz Sundell
Regarding "Unhandled sections in sendSignal for GSN 33 (KEYINFO20)": do you have tables with large primary or unique keys? Typically these would be char or varchar keys.
[22 May 2017 15:56]
Andrew Blackmore
Yes, there is a table with a unique key that is VARCHAR(16).
[22 May 2017 20:50]
MySQL Verification Team
Hi Andrew, any chance you can give us the table structure? You can mangle the column names; just leave the types. And can you tell us the count(*) for that table? Thanks, Bogdan