Bug #100639 | Forced node shutdown Caused by error 2334: 'Job buffer congestion' | | |
---|---|---|---|
Submitted: | 26 Aug 2020 5:11 | Modified: | 15 Sep 2020 5:46 |
Reporter: | John Wu | Email Updates: | |
Status: | Can't repeat | Impact on me: | |
Category: | MySQL Cluster: Cluster (NDB) storage engine | Severity: | S2 (Serious) |
Version: | 8.0.20 | OS: | Red Hat |
Assigned to: | MySQL Verification Team | CPU Architecture: | Any |
[26 Aug 2020 5:11]
John Wu
[1 Sep 2020 12:17]
MySQL Verification Team
Hi, To start, we really need the FULL set of logs (ndb_error_reporter will create that for you). Compress it and use the "Files" tab to upload it to us. Make sure you specify the exact filename in your reply here (I cannot download the file without the exact name). Now, with regard to "at midnight or early morning when non peak usage": in 99.9% of cases this means that you have some cron job or similar, set to run at "non-peak hours", that is killing your cluster. Please investigate what you are running at midnight or in the early morning: cron.daily jobs, cron.weekly, cron.monthly ... usually some run at midnight, 2am, 3am, 4am, and something from there is the culprit. The cluster, of course, should not crash, but finding what is causing the crash will help us solve the problem. Kind regards, Bogdan
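One way to go through the cron schedule as suggested above is to scan the output of `crontab -l` for entries that can fire in the early-morning window. The following is only a minimal sketch: the function names, the default window of 00:00 to 05:59, and the sample paths are hypothetical, it assumes the five-field user crontab format (not `/etc/crontab`, which has an extra user column), and it handles only simple hour fields.

```python
def hour_field_hits(hour_field, start=0, end=5):
    """Return True if a crontab hour field can fire at an hour in [start, end]."""
    hours = set()
    for part in hour_field.split(","):
        step = 1
        if "/" in part:
            part, step_text = part.split("/", 1)
            step = int(step_text)
        if part == "*":
            low, high = 0, 23
        elif "-" in part:
            low_text, high_text = part.split("-", 1)
            low, high = int(low_text), int(high_text)
        else:
            low = high = int(part)
        hours.update(range(low, high + 1, step))
    return any(start <= h <= end for h in hours)


def off_peak_jobs(crontab_text, start=0, end=5):
    """List the commands of crontab entries that can run in the given hour window."""
    flagged = []
    for line in crontab_text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # blank line or comment
        if line.startswith("@"):
            parts = line.split(None, 1)
            if len(parts) < 2:
                continue
            name, command = parts
            # @hourly fires every hour; @daily, @weekly, etc. fire at 00:00.
            if name == "@hourly" or (name != "@reboot" and start <= 0 <= end):
                flagged.append(command)
            continue
        fields = line.split(None, 5)
        if len(fields) < 6:
            continue  # environment assignment (e.g. MAILTO=root) or malformed line
        try:
            if hour_field_hits(fields[1], start, end):
                flagged.append(fields[5])
        except ValueError:
            continue  # hour field this sketch does not understand
    return flagged


# Hypothetical crontab contents, for illustration only.
sample = """
# nightly backup
MAILTO=root
0 4 * * * /usr/local/bin/mysql_backup.sh
30 14 * * 1-5 /usr/bin/report
*/15 * * * * /usr/bin/poll
@daily /usr/sbin/logrotate-wrapper
@reboot /usr/bin/startup
"""
jobs = off_peak_jobs(sample)
```

Running the same check against each data node, SQL node, and management host, and also looking at `/etc/cron.d` and the `cron.daily`/`cron.weekly`/`cron.monthly` directories by hand, would cover the places mentioned above.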
[4 Sep 2020 2:50]
John Wu
Hi, I have uploaded the file. The file name is mysql-bug-data-100639.zip. I hope you can help work out what the issue is in my case. Thanks very much!
[4 Sep 2020 13:30]
MySQL Verification Team
Hi, Thanks for the logs. This looks like your server was overloaded. Have you checked your cron schedule? Do you have server monitoring? Can you check CPU usage during the period when the crash happened? Thanks
[4 Sep 2020 16:00]
John Wu
Hi, thanks for your prompt reply. I have a 32-core server with 252 GB of RAM, which I thought was already more than enough. But now I see my ndbd process is running at over 90 percent, when practically no one is accessing the system. Can you advise what I can do about this? Thanks. I have no crontab -e jobs except for a MySQL backup job running at 4am every morning, and only on one of the servers.
[14 Sep 2020 15:04]
MySQL Verification Team
Hi, I cannot reproduce the problem, but looking at the logs it appears that your system is overloaded when the crash happens. What is loading the system I cannot tell from the logs. You might want to set up some monitoring tools like MySQL Enterprise Monitor, or at least Zabbix or Nagios, or if nothing else sysstat, so you can look at sar reports. But with the existing data there's nothing I can do to determine why your crash is happening. All the best, Bogdan
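If none of the monitoring tools above is installed yet, a rough stand-in is to sample the aggregate `cpu` line of `/proc/stat` on the Linux host and log the busy percentage between samples. The sketch below is not a replacement for sar; the function names and the 5-second interval are hypothetical, and it counts idle plus iowait time as "not busy", which is an assumption of this example.

```python
import time


def parse_cpu_line(stat_text):
    """Return the counters from the aggregate 'cpu' line of /proc/stat contents."""
    for line in stat_text.splitlines():
        if line.startswith("cpu "):
            return [int(value) for value in line.split()[1:]]
    raise ValueError("no aggregate 'cpu' line found")


def busy_percent(before, after):
    """Percentage of CPU time spent busy between two counter snapshots."""
    # Use only user, nice, system, idle, iowait, irq, softirq, steal;
    # guest time is already included in user/nice and would be double-counted.
    deltas = [b - a for a, b in zip(before[:8], after[:8])]
    total = sum(deltas)
    if total <= 0:
        return 0.0
    idle = deltas[3] + (deltas[4] if len(deltas) > 4 else 0)  # idle + iowait
    return 100.0 * (total - idle) / total


def sample_busy_percent(interval=5.0, path="/proc/stat"):
    """Take two snapshots `interval` seconds apart and return the busy percentage."""
    with open(path) as f:
        before = parse_cpu_line(f.read())
    time.sleep(interval)
    with open(path) as f:
        after = parse_cpu_line(f.read())
    return busy_percent(before, after)


# Worked example on two made-up snapshots (values are illustrative only).
snap_a = parse_cpu_line("cpu  100 0 50 800 50 0 0 0 0 0\ncpu0 100 0 50 800 50 0 0 0 0 0")
snap_b = parse_cpu_line("cpu  200 0 100 850 50 0 0 0 0 0\ncpu0 200 0 100 850 50 0 0 0 0 0")
example = busy_percent(snap_a, snap_b)
```

Calling `sample_busy_percent()` in a loop and appending the result with a timestamp to a file would give a record to compare against the time of the next node shutdown.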
[15 Sep 2020 5:46]
John Wu
Thank you very much for your attention to this matter. Best regards.