Bug #105125 Inconsistency in checks of long signal to query thread
Submitted: 4 Oct 2021 16:43
Modified: 7 Oct 2021 13:37
Reporter: Mikael Ronström
Status: Verified
Category: MySQL Cluster: Cluster (NDB) storage engine
Severity: S3 (Non-critical)
Version: 8.0.26
OS: Any
Assigned to:
CPU Architecture: Any

[4 Oct 2021 16:43] Mikael Ronström
Description:
Given that signals are scheduled individually by the receive
thread we cannot send fragmented signals to query threads in
other nodes.

In DBTC we sum the signal length and the section lengths. If
this sum is larger than 7400, we will not use
the query thread in the other node.

However, in sendFirstFragment we instead check the size of the
sections summed with messageSize, where messageSize is a constant
equal to 240 * 8 = 1920. This means that a signal can still be
fragmented even though it is not supposed to be.

How to repeat:
testNdbApi -n MaxGetValues T1 T6 T13 can crash because a fragment of a fragmented signal, other than the first one, is sent to the query thread.

Suggested fix:
Replace messageSize with length in sendFirstFragment (2 places).
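
The mismatch between the two checks can be illustrated with a minimal sketch. This is not the NDB source: every function name below is hypothetical, and the sketch assumes, purely for illustration, that sendFirstFragment compares against the same 7400 limit that DBTC uses. The numbers 7400 and 240 * 8 = 1920 come from the report; the signal length of 25 and the section length of 6000 are made-up example values.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical names and a hypothetical shared limit; illustration only.
constexpr std::uint32_t kLimit = 7400;           // threshold quoted in the report
constexpr std::uint32_t kMessageSize = 240 * 8;  // 1920, constant quoted in the report

// DBTC-style check: signal length plus section lengths must fit the limit
// for the query thread in the other node to be used.
constexpr bool dbtc_allows_query_thread(std::uint32_t signal_len,
                                        std::uint32_t sections_len) {
  return signal_len + sections_len <= kLimit;
}

// Send-side check as described in the report: section lengths plus the
// constant messageSize (assumed here to be compared against kLimit).
constexpr bool send_fragments_current(std::uint32_t sections_len) {
  return sections_len + kMessageSize > kLimit;
}

// Send-side check after the suggested change: use the actual signal length.
constexpr bool send_fragments_fixed(std::uint32_t signal_len,
                                    std::uint32_t sections_len) {
  return sections_len + signal_len > kLimit;
}

// Example values (assumed): a short signal with large sections.
inline void run_example() {
  const std::uint32_t signal_len = 25;
  const std::uint32_t sections_len = 6000;
  // DBTC decides the signal is small enough for a query thread ...
  assert(dbtc_allows_query_thread(signal_len, sections_len));
  // ... but the send side still decides to fragment it.
  assert(send_fragments_current(sections_len));
  // With the signal length in place of the constant, the two checks agree.
  assert(!send_fragments_fixed(signal_len, sections_len));
}
```

Under these assumptions, any signal whose sections fall between 7400 - 1920 = 5480 and roughly 7400 passes the DBTC check yet is still fragmented by the send-side check, which is the window the report describes.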
[4 Oct 2021 19:15] MySQL Verification Team
Hi Mikael,

Thanks for the report. Can you help me reproduce this? I have not been able to reproduce it.

Thanks
Bogdan
[4 Oct 2021 21:21] Mikael Ronström
It is very hard to reproduce. I got it when running the test case
mentioned in the bug report under autotest with 3 replicas.
It could be that the test case uses locked reads or simple reads, in
which case it won't attempt to use query threads. If that is the case,
you also need to change the test case to use Committed reads.

Even with that, the probability of hitting the bug is low: one needs to
schedule the first part of the signal to an LDM thread and the second to
a query thread.

So it might require running the test case many times. The machine where it
happened has many cores and was using a fair number of threads.

But the bug itself, on the other hand, is very easy to discover and, so to
say, reproduce by reading the code :)