Bug #82627 Out of / Reenable event buffer messages flooding the cluster log too fast
Submitted: 18 Aug 2016 9:15 Modified: 22 Aug 2016 15:03
Reporter: Hartmut Holzgraefe
Status: Verified
Category: MySQL Cluster: Cluster (NDB) storage engine
Severity: S3 (Non-critical)
Version: mysql-cluster-7.4.11
OS: Linux
Assigned to:
CPU Architecture: Any

[18 Aug 2016 9:15] Hartmut Holzgraefe
Description:
When running into Bug #82394, the "Out of event buffer" and "Reenable event buffer" messages are printed about ten times per second per data node.

On a 2-node cluster this produces log messages so fast that the log files not yet purged by rotation cover only the last 15-20 minutes.

IMHO reenabling the event buffer should only happen if the buffer usage has fallen below a certain low water mark, and not simply on every new epoch.

How to repeat:
Not sure how to trigger this yet; Bug #82394 has more information on the incident.

The cluster logs attached to that bug clearly show the log flood though.

The reenable logic right now is just

  if(m_out_of_buffer_gci && gci > m_out_of_buffer_gci)
  {
    jam();
    infoEvent("Reenable event buffer");
    m_out_of_buffer_gci = 0;
    m_missing_data = false;
  }
(src/kernel/blocks/suma/Suma.cpp)

so if the event buffer is still full at this point, the next attempt to buffer data triggers a new "Out of event buffer" message almost immediately.

Suggested fix:
Do not reenable the event buffer until enough free space is available in it again.
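
A minimal sketch of what such a low water mark check could look like. This is not the actual Suma code: the struct, the out_of_buffer() and try_reenable() functions, the used_pct parameter, the reenable_count counter (a stand-in for the infoEvent() call) and the 50% threshold are all assumptions made for illustration. Only m_out_of_buffer_gci and m_missing_data are names taken from the snippet above.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for the relevant Suma state.
struct EventBufferGate {
  std::uint64_t m_out_of_buffer_gci = 0;  // 0 means "not out of buffer"
  bool m_missing_data = false;
  int reenable_count = 0;  // stands in for infoEvent("Reenable event buffer")

  // Assumed threshold: reenable only once usage is at or below 50%.
  static constexpr unsigned kLowWaterMarkPct = 50;

  // Record that the buffer overflowed in the given epoch.
  void out_of_buffer(std::uint64_t gci) {
    if (m_out_of_buffer_gci == 0) {
      m_out_of_buffer_gci = gci;
      m_missing_data = true;
    }
  }

  // Called once per new epoch; used_pct is the current buffer usage in percent.
  // Returns true if the event buffer was reenabled.
  bool try_reenable(std::uint64_t gci, unsigned used_pct) {
    assert(used_pct <= 100);
    // Same condition as the current code: a newer epoch has started.
    if (m_out_of_buffer_gci == 0 || gci <= m_out_of_buffer_gci)
      return false;
    // Additional condition: stay disabled while usage is above the mark.
    if (used_pct > kLowWaterMarkPct)
      return false;
    ++reenable_count;
    m_out_of_buffer_gci = 0;
    m_missing_data = false;
    return true;
  }
};
```

With a check like this, a buffer that is still 90% full on the next epoch stays disabled and logs nothing, and only one "Reenable" message is produced once usage has actually dropped.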
[18 Aug 2016 9:20] Hartmut Holzgraefe
Added "too fast" to the synopsis
[22 Aug 2016 15:03] MySQL Verification Team
Hi Hartmut,

Yes, I agree with you, filling the log like this helps no one :(

Thanks for the report
Bogdan