MySQL Bugs: #77225: Signals delivered to incorrect or garbled 'trp

Bug #77225	Signals delivered to incorrect or garbled 'trp_client' -> Client crash or hangs
Submitted:	2 Jun 2015 13:08	Modified:	8 Jun 2015 16:33
Reporter:	Ole John Aske
Status:	Closed
Category:	MySQL Cluster: NDB API	Severity:	S2 (Serious)
Version:	7.3.9	OS:	Any
Assigned to:		CPU Architecture:	Any

Description:
API Signals are delivered to the correct client by TransporterFacade::deliver_signal().
Based on the 'BlockNr' of the destination API client, the correct 'trp_client'
is looked up from the TransporterFacade::m_thread structure, which has a Vector of
trp_client* as one of its members.
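As a rough illustration of this lookup, here is a minimal sketch. All names
(DemoClient, DemoThreadData, kDemoMinApiBlockNo) are hypothetical, and the
base value 0x8000 for API block numbers is an assumption made for the example;
this is not the actual TransporterFacade code.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Assumed base of the API block number range; illustrative value only.
const unsigned kDemoMinApiBlockNo = 0x8000;

// Hypothetical stand-in for trp_client.
struct DemoClient {
  unsigned block_no;
};

// Hypothetical stand-in for the structure holding the trp_client* Vector.
struct DemoThreadData {
  std::vector<DemoClient*> clients;

  // Register a client in the next free slot.
  void open(DemoClient* client) {
    assert(client != nullptr);
    clients.push_back(client);
  }

  // Map a block number to a slot index and return the client stored there,
  // or nullptr when the block number is outside the known range.
  DemoClient* get(unsigned block_no) const {
    if (block_no < kDemoMinApiBlockNo) return nullptr;
    std::size_t index = block_no - kDemoMinApiBlockNo;
    return index < clients.size() ? clients[index] : nullptr;
  }
};
```

Note that get() here reads the array without any locking, which is the
situation the rest of this report is about.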

It turns out that there is no mutex protection when 'get'ing a trp_client*
from this Vector. This causes problems when other clients connect to
the TransporterFacade, which may cause the trp_client* Vector to
be ::expand'ed. During the expand operation the array of items is reallocated,
and the old item array is deleted *before* the reallocated array is assigned.
Thus the deleted array can be grabbed by another malloc and garbage
written into it, which the unprotected getter then reads as a trp_client*.
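The ordering issue inside expand can be sketched as follows. SketchVector is
a hypothetical stand-in and not the NDB Vector class; it only shows an expand
step that assigns the new array before freeing the old one. This ordering
alone would not make an unlocked getter safe, since a concurrent reader could
still have loaded the old array pointer just before it is freed.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical growable array; not the NDB Vector implementation.
template <typename T>
class SketchVector {
public:
  SketchVector() : m_items(nullptr), m_size(0), m_capacity(0) {}
  ~SketchVector() { delete[] m_items; }
  SketchVector(const SketchVector&) = delete;
  SketchVector& operator=(const SketchVector&) = delete;

  void push_back(const T& value) {
    if (m_size == m_capacity)
      expand(m_capacity == 0 ? 2 : m_capacity * 2);
    m_items[m_size++] = value;
  }

  const T& get(std::size_t i) const {
    assert(i < m_size);
    return m_items[i];
  }

  std::size_t size() const { return m_size; }

private:
  void expand(std::size_t new_capacity) {
    T* new_items = new T[new_capacity];
    for (std::size_t i = 0; i < m_size; i++)
      new_items[i] = m_items[i];
    T* old_items = m_items;
    m_items = new_items;        // assign the new array first
    m_capacity = new_capacity;
    delete[] old_items;         // free the old array only afterwards
    // The report describes the reverse order: the old array was deleted
    // while m_items still pointed at it.
  }

  T* m_items;
  std::size_t m_size;
  std::size_t m_capacity;
};
```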

We see crashes related to this in the AutoTest: testScan -n ScanRead488T
and there are also strange timeouts and crashes in other testScan tests, which
we believe may be related to this.

How to repeat:
testScan -n ScanRead488T.

Force Vector::expand() to do a task switch by inserting
a microsleep just after the old item array has been deleted
(and before the new one is assigned).

Documented fix in the NDB 7.3.10 and 7.4.7 changelogs, as follows:

    Client lookup for delivery of API signals to the correct client
    by the internal TransporterFacade::deliver_signal() function had
    no mutex protection, which could cause issues such as timeouts
    encountered during testing, when other clients connected to the
    same TransporterFacade.
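
One way to close this kind of race is to take the same mutex for both
registration and lookup. The sketch below assumes that approach; the names
LockedClient and LockedClientRegistry are hypothetical, and this is not the
patch that was applied to TransporterFacade.

```cpp
#include <cassert>
#include <cstddef>
#include <mutex>
#include <vector>

// Hypothetical stand-in for trp_client.
struct LockedClient {
  int id;
};

// Hypothetical registry guarding its client array with one mutex.
class LockedClientRegistry {
public:
  // Register a client and return the slot it was stored in.
  std::size_t open(LockedClient* client) {
    assert(client != nullptr);
    std::lock_guard<std::mutex> guard(m_mutex);
    m_clients.push_back(client);  // may reallocate the underlying array
    return m_clients.size() - 1;
  }

  // Look up a client under the same mutex, so a concurrent open() cannot
  // free the array while it is being read. Returns nullptr if out of range.
  LockedClient* get(std::size_t slot) const {
    std::lock_guard<std::mutex> guard(m_mutex);
    return slot < m_clients.size() ? m_clients[slot] : nullptr;
  }

private:
  mutable std::mutex m_mutex;
  std::vector<LockedClient*> m_clients;
};
```

The cost is one lock acquisition per delivered signal lookup; whether that is
acceptable depends on how hot the delivery path is.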
      
Closed.