MySQL Bugs: #113068: TransporterRegistry::m_status_overloaded may be set incorrectly

Bug #113068	TransporterRegistry::m_status_overloaded may be set incorrectly
Submitted:	13 Nov 2023 17:03	Modified:	13 Nov 2023 22:00
Reporter:	ZHAO SONG	Email Updates:
Status:	Verified	Impact on me:	None
Category:	MySQL Cluster: Cluster (NDB) storage engine	Severity:	S4 (Feature request)
Version:	8.0.35	OS:	Any
Assigned to:		CPU Architecture:	Any

Description:
The function Transporter::update_status_overloaded() is called in two places:

1. In TransporterRegistry::updateWritePtr(), it occurs after appending signals into thr_data::m_send_buffers[trp_id], and it checks the length of signals in that local buffer.
2. In Transporter::iovec_data_sent(), it occurs after sending signals, and it checks the remaining length of signals in g_thr_repository->m_send_buffers[trp_id]->m_sending_buffer, the global sending buffer.

I have two questions:

1. These two places check different buffer lengths—one is the local buffer, and the other is the global sending buffer. Both lengths are used to decide whether the transporter should be set to an overloaded status. Is it expected that both these buffers can determine the transporter's overloaded status? Given their distinct usages and lengths, using them together to decide the status could potentially influence each other.

2. If yes, then the first buffer appears to use the wrong length for checking. It utilizes only the used length of the last page of the local buffer, which is significantly smaller than the overloaded limitation. As a result, it can never reach the limitation.

Uint32
mt_send_handle::updateWritePtr(TrpId trp_id,
                               Uint32 lenBytes,
                               Uint32 prio)
{
  struct thr_send_buffer * b = m_selfptr->m_send_buffers+trp_id;
  thr_send_page * p = b->m_last_page;
  p->m_bytes += lenBytes;
  return p->m_bytes; //<-------------- return the used bytes of last page
}

void
TransporterRegistry::updateWritePtr(TransporterSendBufferHandle *handle,
                                    Transporter* t,
                                    TrpId trp_id,
                                    Uint32 lenBytes,
                                    Uint32 prio)
{
  Uint32 used = handle->updateWritePtr(trp_id, lenBytes, prio);
  t->update_status_overloaded(used); // <------------use it to decide the overloaded status

  if (t->send_limit_reached(used))
  {
    //-------------------------------------------------
    // Buffer is full and we are ready to send. We will
    // not wait since the signal is already in the buffer.
    // Force flag set has the same indication that we
    // should always send. If it is not possible to send
    // we will not worry since we will soon be back for
    // a renewed trial.
    //-------------------------------------------------
    if (t->send_is_possible(0))
    {
      //-------------------------------------------------
      // Send was possible, attempt at a send.
      //-------------------------------------------------
      handle->forceSend(trp_id);
    }//if
  }
}

How to repeat:
reviewing code at src/common/transporter/TransporterRegistry.cpp:L3895

Suggested fix:
Consider using a uniform checking place, such as the global send and sending buffer, instead of checking two different buffers. 
This approach can provide consistency in evaluating the overloaded status without relying on different buffer lengths and usages.

Hi,

Thank you for your review, we will pass this on to ndb dev team as feature request.

As for the questions, bugs system is not a place for such questions, you can use forums.mysql.com to communicate with developers directly.