Bug #92434 NDB node keep crashing due to Backup.cpp bug
Submitted: 15 Sep 2018 9:48 Modified: 10 Oct 2018 0:23
Reporter: Alex West
Status: Duplicate
Category: MySQL Cluster: Cluster (NDB) storage engine
Severity: S2 (Serious)
Version: 7.6.6
OS: CentOS (7)
Assigned to: MySQL Verification Team
CPU Architecture: Any
Tags: ndb

[15 Sep 2018 9:48] Alex West
Description:
Hello,

Since I upgraded my cluster to the latest available version at the end of July, my data nodes have been crashing randomly.
When I tried to report the bug, the reporting tool told me that some configuration variables were deprecated, so I removed them and performed a rolling restart.
This has not solved the problem.

I know 7.6.7 is out, but I found nothing in it about a Backup.cpp bug.

                                                         
Time: Friday 14 September 2018 - 22:33:24
Status: Temporary error, restart node
Message: Internal program error (failed ndbrequire) (Internal error, programming error or missing error message, please report a bug)
Error: 2341
Error data: Backup.cpp
Error object: BACKUP (Line: 6487) 0x00000000 Check false failed
Program: ndbd
Pid: 48286
Version: mysql-5.7.22 ndb-7.6.6
Trace file name: ndb_11_trace.log.13
Trace file path: /home/cluster//ndb_11_trace.log.13 [t1..t1]
***EOM***
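To check whether repeated crashes hit the same failed `ndbrequire`, entries like the one above can be compared mechanically. The following is a minimal sketch, assuming each entry uses the `Key: Value` layout shown here and ends with `***EOM***`; the helpers `parse_ndb_error_log` and `failure_point` are hypothetical and not part of any MySQL Cluster tool. The sample text is an abbreviated copy of two of the entries reported in this bug.

```python
import re
from collections import Counter

# Matches the "(Line: NNNN)" part of the "Error object" field.
LINE_RE = re.compile(r"\(Line: (\d+)\)")


def parse_ndb_error_log(text):
    """Hypothetical helper: split text into entries ending with ***EOM***
    and turn each "Key: Value" line into a dict item."""
    entries = []
    current = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line == "***EOM***":
            if current:
                entries.append(current)
            current = {}
            continue
        # Split on the first ": " only, so values like
        # "BACKUP (Line: 6487) ..." keep their inner colon.
        key, sep, value = line.partition(": ")
        if sep:
            current[key] = value
    return entries


def failure_point(entry):
    """Return (source file, line) of the failed check, or None."""
    match = LINE_RE.search(entry.get("Error object", ""))
    if match is None:
        return None
    return entry.get("Error data"), int(match.group(1))


SAMPLE = """
Time: Friday 14 September 2018 - 22:33:24
Status: Temporary error, restart node
Error: 2341
Error data: Backup.cpp
Error object: BACKUP (Line: 6487) 0x00000000 Check false failed
Version: mysql-5.7.22 ndb-7.6.6
***EOM***
Time: Wednesday 19 September 2018 - 08:48:24
Status: Temporary error, restart node
Error: 2341
Error data: Backup.cpp
Error object: BACKUP (Line: 6487) 0x00000004 Check false failed
Version: mysql-5.7.22 ndb-7.6.6
***EOM***
"""

entries = parse_ndb_error_log(SAMPLE)
# Count how many crashes share each failure point.
print(Counter(failure_point(e) for e in entries))
# prints Counter({('Backup.cpp', 6487): 2})
```

Grouping by file and line rather than by timestamp makes it clear that the crashes, although they occur at unrelated times, all stop at the same check.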
            

How to repeat:
Upgrade to 7.6.6, then wait anywhere from a few hours to a few days.
[24 Sep 2018 0:18] Alex West
Hello,

Have you been able to reproduce the bug? Since it appears randomly, it may be difficult. Sometimes it happens twice a day; sometimes it takes several days.

The problem occurred again:

Time: Wednesday 19 September 2018 - 08:48:24
Status: Temporary error, restart node
Message: Internal program error (failed ndbrequire) (Internal error, programming error or missing error message, please report a bug)
Error: 2341
Error data: Backup.cpp
Error object: BACKUP (Line: 6487) 0x00000004 Check false failed
Program: ndbd
Pid: 2897
Version: mysql-5.7.22 ndb-7.6.6
Trace file name: ndb_12_trace.log.7
Trace file path: /home/cluster//ndb_12_trace.log.7 [t1..t1]
***EOM***
          

I hope we can get a fix ;)

Thanks
[24 Sep 2018 16:55] MySQL Verification Team
Hi,

This is a rather simple system with two data nodes. I doubt you are hitting a general bug here (we would be seeing it a lot), so it must be something specific to your setup.

- are you running a backup when this crash happens?
- are you running any special cron job when this happens?
- are you sure your hardware is 100% healthy? (I see some filesystem errors in the logs; these could be caused by the crash, but they appear rather too often.)

It's odd that both nodes crash. Can you share the syslog too?

thanks
Bogdan
[24 Sep 2018 23:55] Alex West
Hi Bogdan,

The crashes happen at different times of day, and we don't issue backup requests through cron. In fact, we only take a backup monthly.

We got one more crash yesterday:
                                                           
Time: Monday 24 September 2018 - 10:33:04
Status: Temporary error, restart node
Message: Internal program error (failed ndbrequire) (Internal error, programming error or missing error message, please report a bug)
Error: 2341
Error data: Backup.cpp
Error object: BACKUP (Line: 6487) 0x00000000 Check false failed
Program: ndbd
Pid: 49517
Version: mysql-5.7.22 ndb-7.6.6
Trace file name: ndb_11_trace.log.14
Trace file path: /home/cluster//ndb_11_trace.log.14 [t1..t1]
***EOM***

As for the hardware, I ran a full HP iLO test early this month and it found nothing.
These strange crashes only started after I upgraded from an old version to 7.6.6.

I will attach the syslog.

Thanks,
Alex
[9 Oct 2018 20:38] Mikael Ronström
It looks like you have hit a very real bug, and I will work on it a bit more tomorrow.
I think I have identified the cause of the crash and how to fix it; I will verify this
tomorrow.
[9 Oct 2018 21:21] Mikael Ronström
I found that this is a duplicate of BUG#91764. That bug has been fixed, and the fix will be included in the
7.6.8 release.