Bug #77406 MTS must be able to handle large packets with small slave_pending_jobs_size_max
Submitted: 18 Jun 2015 15:03          Modified: 21 Jun 2017 10:15
Reporter: Andrii Nikitin              Status: Closed
Category: MySQL Server: Replication   Severity: S3 (Non-critical)
Version: 5.6.25                       OS: Any
Assigned to:                          CPU Architecture: Any

[18 Jun 2015 15:03] Andrii Nikitin
Description:
Currently slave_pending_jobs_size_max serves two purposes:
1. Limit the queue of workers' pending events
2. Limit the maximum size of an event which workers may process

This behavior is not consistent, because systems with large packets are not allowed to configure a small queue (which, among other things, influences the execution time of "STOP SLAVE" in some cases).

How to repeat:
Configure MTS with a small slave_pending_jobs_size_max and send a huge replication event; the following error is reported:

 "Cannot schedule event Query"

Suggested fix:
MTS should honor the limit specified in slave_max_allowed_packet, and workers should be able to process huge events even with a small queue configured in slave_pending_jobs_size_max.

Since memory allocation may become an issue (slave_parallel_workers * slave_max_allowed_packet), this could be implemented by, for example, introducing an additional (shared) buffer where the scheduler puts huge events that do not fit into the workers' queues; the scheduler would not be able to proceed further if that buffer is full (a minimal sketch of this idea follows below).
Alternatively, the scheduler could pause on huge events that do not fit into the workers' queues; true parallelism would then be achieved only if the queue is configured to be large enough. (A small queue may still be acceptable for systems where large packets arrive only occasionally.)
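
To make the shared-buffer variant concrete, below is a minimal C++ sketch (illustration only, not MySQL source; the Coordinator class, overflow_capacity, and all member names are invented for this example). It shows the coordinator blocking when either the normal per-queue budget or the shared overflow budget is exhausted:

#include <condition_variable>
#include <cstddef>
#include <mutex>

// Hypothetical coordinator-side accounting: small events are charged against
// the normal queue budget, oversized events against a shared overflow budget.
class Coordinator {
 public:
  Coordinator(std::size_t pending_jobs_size_max, std::size_t overflow_capacity)
      : queue_limit_(pending_jobs_size_max),
        overflow_capacity_(overflow_capacity) {}

  // Called before handing an event to a worker; blocks until the event fits.
  void admit_event(std::size_t event_size) {
    std::unique_lock<std::mutex> lock(mu_);
    if (event_size <= queue_limit_) {
      cv_.wait(lock, [&] { return queued_bytes_ + event_size <= queue_limit_; });
      queued_bytes_ += event_size;
    } else {
      // Oversized event: wait for room in the shared overflow buffer.
      cv_.wait(lock,
               [&] { return overflow_bytes_ + event_size <= overflow_capacity_; });
      overflow_bytes_ += event_size;
    }
  }

  // Called by a worker after the event has been applied.
  void release_event(std::size_t event_size) {
    std::lock_guard<std::mutex> lock(mu_);
    if (event_size <= queue_limit_)
      queued_bytes_ -= event_size;
    else
      overflow_bytes_ -= event_size;
    cv_.notify_all();
  }

 private:
  std::mutex mu_;
  std::condition_variable cv_;
  std::size_t queue_limit_;
  std::size_t overflow_capacity_;
  std::size_t queued_bytes_ = 0;
  std::size_t overflow_bytes_ = 0;
};

An event larger than the overflow budget would still have to be rejected, which is the role slave_max_allowed_packet keeps in the fix documented later in this report.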
[24 Nov 2016 10:08] Sven Sandberg
Posted by developer:
 
Possible solutions:

 1. Document this and don't fix it.
    + Current solution, no extra work from our side.
    - Does not help users.

 2. Allow an event to break the limit in case the queue is empty. Then
    no event can block the progress of the applier.
    + Big events cannot cause the slave to stop.
    - May use up to slave_parallel_workers * master.max_allowed_packet
      bytes of memory.

 3. Allow an event to break the limit in case the queue is empty
    and at most N other events in this channel are currently breaking
    the limit (e.g. N=1 or 2).
    + Big events cannot cause the slave to stop.
    + Memory is capped to roughly
        slave_parallel_workers * slave_pending_jobs_size_max +
        master.max_allowed_packet
    - In case a transaction contains multiple big events, this
      may stall the scheduler even if other queues are empty and we
      are below the limit
        slave_parallel_workers * slave_pending_jobs_size_max +
        master.max_allowed_packet
    - Requires coordination between workers to determine when the event
      can be queued.

 4. Allow an event to break the limit in case the total size of
    all workers' queues for this channel is at most M bytes (e.g.
    M = slave_pending_jobs_size_max * slave_parallel_workers).
    + Big events cannot cause the slave to stop.
    + Memory is still capped to
        slave_parallel_workers * slave_pending_jobs_size_max +
        master.max_allowed_packet
    + One (or even a few) queues may 'borrow' memory from other
      queues that don't need it.
    - Requires coordination between workers to determine when the event
      can be queued.

(I don't think we need a special buffer for big events as in the
'suggested fix'; we can achieve similar results by just summing existing
queue sizes, as in the sketch below.)
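
For illustration, the three admission checks above (options 2-4) could be compared roughly as follows. This is a sketch only; ChannelState and its fields are hypothetical stand-ins for the coordinator's bookkeeping, not actual server code:

#include <cstddef>
#include <vector>

struct ChannelState {
  std::vector<std::size_t> worker_queue_bytes;  // current bytes in each worker queue
  std::size_t oversized_in_flight = 0;          // other events currently breaking the limit
};

// Option 2: a big event may be queued whenever the target worker's queue is empty.
bool admit_option2(const ChannelState &s, std::size_t worker) {
  return s.worker_queue_bytes[worker] == 0;
}

// Option 3: as option 2, but only while at most N other oversized events
// in this channel are currently breaking the limit.
bool admit_option3(const ChannelState &s, std::size_t worker, std::size_t N) {
  return s.worker_queue_bytes[worker] == 0 && s.oversized_in_flight <= N;
}

// Option 4: a big event may be queued while the combined size of all worker
// queues in this channel is at most M bytes
// (e.g. M = slave_pending_jobs_size_max * slave_parallel_workers).
bool admit_option4(const ChannelState &s, std::size_t M) {
  std::size_t total = 0;
  for (std::size_t bytes : s.worker_queue_bytes) total += bytes;
  return total <= M;
}

Option 4 is the variant that lets one busy queue 'borrow' capacity from idle ones, since only the channel-wide total is compared against M.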
[21 Jun 2017 10:15] Margaret Fisher
Posted by developer:
 
The documentation changes have been made. The following text will appear in the change logs for the appropriate releases:

Replication: Multi-threaded slaves could not be configured with small queue sizes using slave_pending_jobs_size_max if they ever needed to process transactions larger than that size. Any packet larger than slave_pending_jobs_size_max was rejected with the error ER_MTS_EVENT_BIGGER_PENDING_JOBS_SIZE_MAX, even if the packet was smaller than the limit set by slave_max_allowed_packet.

With this fix, slave_pending_jobs_size_max becomes a soft limit rather than a hard limit. If the size of a packet exceeds slave_pending_jobs_size_max but is less than slave_max_allowed_packet, the transaction is held until all the slave workers have empty queues, and then processed. All subsequent transactions are held until the large transaction has been completed. The queue size for slave workers can therefore be limited while still allowing occasional larger transactions.
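
As a rough illustration of the documented behavior (a sketch with invented names, not the actual server implementation), the scheduling decision can be summarized as:

#include <cstddef>

enum class Decision { kQueueNormally, kWaitForEmptyQueues, kReject };

// slave_pending_jobs_size_max acts as a soft limit; slave_max_allowed_packet
// remains the hard limit.
Decision schedule_event(std::size_t event_size,
                        std::size_t pending_jobs_size_max,
                        std::size_t max_allowed_packet) {
  if (event_size > max_allowed_packet)
    return Decision::kReject;             // still an error, as before the fix
  if (event_size > pending_jobs_size_max)
    return Decision::kWaitForEmptyQueues; // hold until every worker queue is
                                          // empty, then apply; subsequent
                                          // transactions wait until it completes
  return Decision::kQueueNormally;        // normal per-queue size accounting
}

slave_max_allowed_packet therefore remains the hard limit, while slave_pending_jobs_size_max only determines whether a transaction is queued normally or serialized behind empty worker queues.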