Bug #39235 NDB : No dropped signal handling for fragmented signals
Submitted: 4 Sep 2008 9:59 Modified: 12 Dec 2008 13:42
Reporter: Frazer Clement
Status: Closed
Category: MySQL Cluster: Cluster (NDB) storage engine Severity: S3 (Non-critical)
Version: 5.1 OS: Any
Assigned to: Frazer Clement CPU Architecture: Any

[4 Sep 2008 9:59] Frazer Clement
Description:
When long signal buffer exhaustion in NDBD results in a signal being dropped, the usual handling mechanism does not take fragmented signals into account.
This could result in a crash as the fragmented signal handling would not know how to deal with the missing fragments.

How to repeat:
Send a long, fragmented signal that causes long signal buffer exhaustion.
Cases:
  First fragment dropped, other fragments dropped: No problem
  First fragment dropped, other fragments arrive: Fragment assembly crash
  First fragment arrives, middle fragment(s) dropped, last fragment arrives: Fragment assembly succeeds with missing data
  All fragments arrive except the last: Fragment assembly record and long signal buffer leak
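The four cases above can be modelled with a small, hypothetical C++ function (not NDB kernel code). Here `arrived[i]` is false when fragment `i` was dropped, and the assembler ignores drops entirely, which is the unpatched behaviour the report describes. The `Outcome` names are illustrative labels for the listed cases, and `AssemblyCrash` only marks the point where the real kernel would fail.

```cpp
#include <cstddef>
#include <vector>

// Illustrative labels for the outcomes listed in the report.
enum class Outcome {
  Complete,                  // every fragment arrived and was assembled
  NothingHappened,           // every fragment dropped: no record ever created
  AssemblyCrash,             // a later fragment arrived with no assembly record
  CompletedWithMissingData,  // last fragment completed a record with holes
  RecordLeak                 // record (and its buffers) is never freed
};

// Hypothetical model of a drop-unaware assembler: dropped fragments are
// skipped silently, exactly as if they had never been sent.
Outcome naiveAssemble(const std::vector<bool>& arrived) {
  const std::size_t n = arrived.size();
  bool haveRecord = false;
  std::size_t assembled = 0;
  for (std::size_t i = 0; i < n; ++i) {
    if (!arrived[i]) continue;          // dropped: nobody is told
    if (i == 0) {
      haveRecord = true;                // first fragment allocates the record
    } else if (!haveRecord) {
      return Outcome::AssemblyCrash;    // real code has no record to append to
    }
    ++assembled;
    if (i == n - 1) {                   // last fragment completes the record
      haveRecord = false;
      return assembled == n ? Outcome::Complete
                            : Outcome::CompletedWithMissingData;
    }
  }
  return haveRecord ? Outcome::RecordLeak : Outcome::NothingHappened;
}
```

For example, `naiveAssemble({true, false, true})` models a dropped middle fragment and yields `CompletedWithMissingData`.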

Suggested fix:
Add fragmented signal handling to the relevant SIGNAL_DROPPED_REP handling code.
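One way the suggested fix could look, as a C++ sketch under stated assumptions: the receiver learns from a SIGNAL_DROPPED_REP-style report which fragmented signal a dropped fragment belonged to and whether it was the last fragment. `FragmentAssembler`, `onFragment`, `onDropped` and the `Result` values are hypothetical names; this is not the committed patch. A drop marks the assembly record as broken and frees its buffered data, later fragments are discarded, and the record is released when the last fragment arrives or is itself reported dropped.

```cpp
#include <cstddef>
#include <cstdint>
#include <map>
#include <utility>
#include <vector>

// Hypothetical drop-aware reassembly; not the NDB kernel API.
class FragmentAssembler {
 public:
  enum class Result { Pending, Complete, Failed, Discarded };

  std::vector<int> lastAssembled;  // payload of the most recent complete signal

  // A fragment of signal `id` arrived intact.
  Result onFragment(std::uint32_t id, bool first, bool last, int payload) {
    if (first) records_[id] = Record{};  // first fragment allocates the record
    auto it = records_.find(id);
    if (it == records_.end()) return Result::Discarded;  // defensive only
    Record& rec = it->second;
    if (rec.broken) {                    // an earlier fragment was dropped
      if (!last) return Result::Discarded;
      records_.erase(it);                // last fragment releases the record
      return Result::Failed;
    }
    rec.data.push_back(payload);
    if (!last) return Result::Pending;
    lastAssembled = std::move(rec.data);
    records_.erase(it);
    return Result::Complete;
  }

  // A dropped-signal report says one fragment of `id` was lost.
  Result onDropped(std::uint32_t id, bool last) {
    if (last) {
      records_.erase(id);  // nothing will arrive to close it: free it now
      return Result::Failed;
    }
    Record& rec = records_[id];  // creates a broken record if the first was lost
    rec.broken = true;
    rec.data.clear();            // release buffered fragments early
    rec.data.shrink_to_fit();
    return Result::Discarded;
  }

  std::size_t openRecords() const { return records_.size(); }

 private:
  struct Record {
    std::vector<int> data;
    bool broken = false;
  };
  std::map<std::uint32_t, Record> records_;
};
```

The key design choice is creating a broken record when the first fragment is dropped: later fragments then find a record instead of crashing, and the last fragment (or its drop report) is the single point where the record is released, so none of the four cases leaks or completes with missing data.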
[8 Dec 2008 22:37] Frazer Clement
Improved and tested patch with autotest testcase

Attachment: bug39235-with-test.patch (text/x-patch), 19.04 KiB.

[9 Dec 2008 9:19] Jonas Oreland
I would probably wrap Dbtc::testFragmentDrop in #ifdef ERROR_INSERT,
maybe not so much for "optimization" but more as documentation of the code.

Also: I used error code 8074 (I think), so you need to shift the numbers...

A comment on this: I sometimes actually reserve numbers in ERROR_CODES.txt
  (commit + push) *before* pushing the actual patch, to avoid such clashes.

Summary: OK to push; it would be nice with the #ifdef, but it is not necessary.
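A sketch of the guard Jonas suggests, using a hypothetical block class: both the test hook and its only call site are wrapped in `#ifdef ERROR_INSERT`, so a build without that macro contains neither. `Block`, `receiveFragment`, `errorInsertValue` and `kDropFragmentError` are illustrative stand-ins rather than Dbtc members, and the error-code value is a placeholder, not the number used by the patch.

```cpp
#include <cstdint>

// Placeholder error-insert code; not the value used by the real patch.
constexpr std::uint32_t kDropFragmentError = 99999;

// Hypothetical block; names are illustrative stand-ins for Dbtc code.
class Block {
 public:
  std::uint32_t errorInsertValue = 0;  // set by a test harness
  int delivered = 0;
  int dropped = 0;

  void receiveFragment() {
#ifdef ERROR_INSERT
    // The call site is guarded too, so builds without ERROR_INSERT
    // never reference the test hook.
    if (testFragmentDrop()) {
      ++dropped;  // simulate the fragment being dropped
      return;
    }
#endif
    ++delivered;
  }

 private:
#ifdef ERROR_INSERT
  // Test-only hook: pretend the long signal buffer is exhausted.
  bool testFragmentDrop() const {
    return errorInsertValue == kDropFragmentError;
  }
#endif
};
```

Guarding the definition as well as the call means a stray unguarded call fails to compile in non-error-insert builds, which is the "documentation" value the comment refers to.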
[9 Dec 2008 11:46] Frazer Clement
8074 appears to be unused so far.

I've added the #ifdefs around the testFragmentDrop() definition and usage.

Will push to 6.4.
[11 Dec 2008 15:18] Frazer Clement
Pushed to 6.4.0

http://lists.mysql.com/commits/61110
[12 Dec 2008 13:42] Jon Stephens
Documented bugfix in the ndb-6.4.0 changelog as follows:

        When long signal buffer exhaustion in the ndbd process resulted
        in a signal being dropped, the usual handling mechanism did not
        take fragmented signals into account. This could result in a
        crash of the data node because the fragmented signal handling
        mechanism was not able to work with the missing fragments.