Bug #77406 | MTS must be able to handle large packets with small slave_pending_jobs_size_max | ||
---|---|---|---|
Submitted: | 18 Jun 2015 15:03 | Modified: | 21 Jun 2017 10:15 |
Reporter: | Andrii Nikitin | Email Updates: | |
Status: | Closed | Impact on me: | |
Category: | MySQL Server: Replication | Severity: | S3 (Non-critical) |
Version: | 5.6.25 | OS: | Any |
Assigned to: | | CPU Architecture: | Any
[18 Jun 2015 15:03]
Andrii Nikitin
[24 Nov 2016 10:08]
Sven Sandberg
Posted by developer:

Possible solutions:

1. Document this and don't fix it.
   + Current solution; no extra work from our side.
   - Does not help users.
2. Allow an event to break the limit when the queue is empty. Then no event can block the progress of the applier.
   + Big events cannot cause the slave to stop.
   - May use up to slave_parallel_workers * master.max_allowed_packet bytes of memory.
3. Allow an event to break the limit when the queue is empty and at most N other events in this channel are currently breaking the limit (e.g. N=1 or 2).
   + Big events cannot cause the slave to stop.
   + Memory is capped to roughly slave_parallel_workers * slave_pending_jobs_max_size + master.max_allowed_packet.
   - If a transaction contains multiple big events, this may stall the scheduler even when other queues are empty, so that we stay below the limit slave_parallel_workers * slave_pending_jobs_max_size + master.max_allowed_packet.
   - Requires coordination between workers to determine when the event can be queued.
4. Allow an event to break the limit when the total size of all worker queues for this channel is at most M bytes (e.g. M = slave_pending_jobs_max_size * slave_parallel_workers).
   + Big events cannot cause the slave to stop.
   + Memory is still capped to slave_parallel_workers * slave_pending_jobs_max_size + master.max_allowed_packet.
   + One (or even a few) queues may 'borrow' memory from other queues that don't need it.
   - Requires coordination between workers to determine when the event can be queued.

(I don't think we need a special buffer for big events as in the 'suggested fix'; we can achieve similar results by just summing the existing queue sizes.)
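The memory bound claimed for option 4 can be checked with a small model. This is an illustrative sketch only, not server code; the function names and the admission rule are assumptions drawn from the option's description, using the variables slave_pending_jobs_size_max, slave_parallel_workers, and master.max_allowed_packet.

```python
def worst_case_memory(pending_max, workers, max_packet):
    # Option 4 admits one more event (of up to max_packet bytes) only
    # while the combined queue size is at most M = pending_max * workers,
    # so total memory is capped at M + max_packet.
    return pending_max * workers + max_packet

def can_queue_big_event(total_queued_bytes, pending_max, workers):
    # Option 4's admission check: an oversized event may be queued only
    # while the combined size of all worker queues for this channel is
    # at most M = pending_max * workers.
    return total_queued_bytes <= pending_max * workers

if __name__ == "__main__":
    # Example: 4 workers, 16 MiB per-queue budget, 64 MiB max packet
    print(worst_case_memory(16 << 20, 4, 64 << 20))  # 134217728 (128 MiB)
```

Note how the cap is independent of how many individual queues the big events land in, which is why summing the existing queue sizes suffices and no separate big-event buffer is needed.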
[21 Jun 2017 10:15]
Margaret Fisher
Posted by developer:

The documentation changes have been made. The following text will appear in the change logs for the appropriate releases:

Replication: Multi-threaded slaves could not be configured with small queue sizes using slave_pending_jobs_size_max if they ever needed to process transactions larger than that size. Any packet larger than slave_pending_jobs_size_max was rejected with the error ER_MTS_EVENT_BIGGER_PENDING_JOBS_SIZE_MAX, even if the packet was smaller than the limit set by slave_max_allowed_packet. With this fix, slave_pending_jobs_size_max becomes a soft limit rather than a hard limit. If the size of a packet exceeds slave_pending_jobs_size_max but is less than slave_max_allowed_packet, the transaction is held until all the slave workers have empty queues, and then processed. All subsequent transactions are held until the large transaction has been completed. The queue size for slave workers can therefore be limited while still allowing occasional larger transactions.
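The changelog's soft-limit behavior can be summarized as a three-way decision. The sketch below is a hedged model of that description, not the server implementation; the `Action` enum and `schedule` function are illustrative names, while the two parameters mirror the real options slave_pending_jobs_size_max and slave_max_allowed_packet.

```python
from enum import Enum

class Action(Enum):
    QUEUE = "queue immediately"
    WAIT_FOR_EMPTY = "hold until all worker queues are empty, then queue"
    REJECT = "reject: exceeds slave_max_allowed_packet"

def schedule(packet_size, pending_jobs_size_max, max_allowed_packet):
    # Model of the fixed behavior described in the changelog:
    # slave_max_allowed_packet stays a hard limit, while
    # slave_pending_jobs_size_max becomes a soft limit.
    if packet_size > max_allowed_packet:
        return Action.REJECT
    if packet_size > pending_jobs_size_max:
        # Soft limit: the oversized transaction waits for empty queues,
        # and later transactions wait until it completes.
        return Action.WAIT_FOR_EMPTY
    return Action.QUEUE
```

For example, with a 16 MiB slave_pending_jobs_size_max and a 64 MiB slave_max_allowed_packet, a 32 MiB packet is no longer rejected with ER_MTS_EVENT_BIGGER_PENDING_JOBS_SIZE_MAX; it is held and then applied exclusively.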