Bug #81386 MTS hangs on file related *_EVENT forced checkpoint
Submitted: 12 May 2016 5:41 Modified: 8 Nov 13:15
Reporter: Trey Raymond
Status: Verified
Category:MySQL Server: Replication Severity:S4 (Feature request)
Version:5.6.29 OS:Any
Assigned to: CPU Architecture:Any

[12 May 2016 5:41] Trey Raymond
MTS forces a checkpoint at binlog file rollover because of the file-related *_EVENT binlog events.

This can lead to unexpected slave hangs where all worker threads end up waiting on a single one. Something big near the start of a file might behave just fine, but the same statement towards the end of a file causes severe lag.

How to repeat:
- set up a master/slave pair with multiple schemas that have write traffic going to them, and MTS enabled with a few worker threads
- create a table and populate it with quite a few GB of data.  the format doesn't matter, just the size
- truncate the table on the master (keep the data on the slave)
- run SHOW MASTER STATUS until the position is near the end of a binlog, based on max_binlog_size
- alter table test_table engine=innodb; (with no data on the master this gets into the replication stream immediately)
- observe mysql.slave_worker_info on the slave and correlate it with processlist/p_s threads; you'll see one worker executing the big alter, and one or more executing transactions on the other dbs
- wait for the threads' log file position to hit the end of the binlog; they will stall waiting for a checkpoint, which cannot complete until the thread running the alter has finished. thus, it's back to a single thread blocking everything, defeating the purpose of MTS
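
The stall in the last step can be illustrated with a toy timing model. This is only a sketch under loud assumptions, not MySQL's scheduler: dispatch is instantaneous, each schema is pinned to one worker, a rotate event makes the coordinator wait until every worker has drained, and all names (build_workload, worst_start_delay) and all numbers are made up for illustration.

```python
# Toy model (assumption: NOT the server's real algorithm).
# Each event is (arrival_time, kind, worker_index, cost).

def build_workload(alter_at, rotate_at=120, horizon=240):
    """Steady small transactions on worker 1, one 100-unit ALTER on
    worker 0, and a single binlog rotation at rotate_at."""
    events = [(float(t), "txn", 1, 0.5) for t in range(horizon)]
    events.append((float(alter_at), "alter", 0, 100.0))
    events.append((float(rotate_at), "rotate", None, 0.0))
    # Stable sort on arrival time only, so ties keep insertion order.
    events.sort(key=lambda e: e[0])
    return events


def worst_start_delay(events, n_workers):
    """Return the largest gap between an event arriving in the log and
    a worker starting to execute it."""
    free_at = [0.0] * n_workers  # time at which each worker goes idle
    barrier = 0.0                # coordinator may not dispatch before this
    worst = 0.0
    for arrival, kind, worker, cost in events:
        if kind == "rotate":
            # Forced checkpoint: wait for every worker to finish.
            barrier = max([barrier, arrival] + free_at)
            continue
        start = max(arrival, barrier, free_at[worker])
        free_at[worker] = start + cost
        worst = max(worst, start - arrival)
    return worst


if __name__ == "__main__":
    print("ALTER early in the file:", worst_start_delay(build_workload(0), 2))
    print("ALTER late in the file: ", worst_start_delay(build_workload(110), 2))
```

In this toy run the ALTER issued at the start of the file finishes before the rotation and adds no delay, while the same ALTER issued 10 time units before the rotation holds the first transaction of the next file back by 89 time units, even though that transaction belongs to a different worker.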

Suggested fix:
you can reduce the chance of this happening by increasing max_binlog_size, but that's not infinitely sustainable, and since it comes down to whether a long-running statement happens to be executing when a file rolls over, it can still cause major issues even with huge files.

fix would be to let the worker threads gracefully handle 'binlog management' events as specified in https://dev.mysql.com/doc/internals/en/binlog-event.html - this may be difficult to implement...maybe:

- detect events related to end of binlog/rotate to next binlog/start
- select the next available worker thread for this batch of events, the same way workers are selected for batches of events on an actual database
- have that worker process the events gracefully, only the coordinator would wait on it
- coordinator can continue processing the next log once that batch is done by the worker
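
To make the difference between the two behaviours concrete, here is a toy comparison of the forced checkpoint against the idea in the list above. It is a sketch under stated assumptions only: the function names, the policy labels, and the timings are hypothetical, dispatch is treated as instantaneous, and it ignores everything that makes this hard in the real server (such as workers needing to interpret these events correctly).

```python
# Toy comparison (assumption: illustrative only, not MySQL code).
# Each event is (arrival_time, kind, worker_index, cost).

def late_alter_workload(alter_at=110, rotate_at=120, horizon=240):
    """Small transactions on worker 1, a 100-unit ALTER on worker 0
    shortly before the binlog rotates."""
    events = [(float(t), "txn", 1, 0.5) for t in range(horizon)]
    events.append((float(alter_at), "alter", 0, 100.0))
    events.append((float(rotate_at), "rotate", None, 0.0))
    events.sort(key=lambda e: e[0])  # stable: ties keep insertion order
    return events


def replay(events, n_workers, policy):
    """Return the worst delay between arrival and start of execution.

    policy == "checkpoint": rotate waits for ALL workers (current behaviour
                            as described in this report).
    policy == "worker":     rotate batch goes to the next available worker
                            and only the coordinator waits on that worker.
    """
    free_at = [0.0] * n_workers
    barrier = 0.0
    worst = 0.0
    for arrival, kind, worker, cost in events:
        if kind == "rotate":
            if policy == "checkpoint":
                barrier = max([barrier, arrival] + free_at)
            else:
                # Pick the worker that becomes idle first.
                w = min(range(n_workers), key=lambda i: free_at[i])
                free_at[w] = max(barrier, arrival, free_at[w]) + cost
                barrier = free_at[w]
            continue
        start = max(arrival, barrier, free_at[worker])
        free_at[worker] = start + cost
        worst = max(worst, start - arrival)
    return worst


if __name__ == "__main__":
    events = late_alter_workload()
    for policy in ("checkpoint", "worker"):
        print(policy, replay(events, 2, policy))
```

With these made-up numbers the "checkpoint" policy delays the next file's first transaction by 89 time units, while the "worker" policy lets the idle worker take the rotate batch and adds no delay; the ALTER keeps running on its own worker in both cases.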

that's off the top of my head, it will be more complex in practice, but there's definitely a better way to handle this.
[12 May 2016 5:50] Trey Raymond
peeking into 5.7 code, looks like a dev noted this issue as well and had some comments (but no change in the code):
[7 Nov 2:41] Trey Raymond
the earlier link pointed at master, so the comments got lost.  pasted here out of the code:
      Slave workers are unable to handle Format_description_log_event,
      Rotate_log_event and Previous_gtids_log_event correctly.
      However, when a transaction spans multiple relay logs, these
      events occur in the middle of a transaction. The way we handle
      this is by marking the events as 'ASYNC', meaning that the
      coordinator thread will handle the events without stopping the
      worker threads.
      @todo Refactor this: make Log_event::get_slave_worker handle
      transaction boundaries in a more robust way, so that it is able
      to process Format_description_log_event, Rotate_log_event, and
      Previous_gtids_log_event.  Then, when these events occur in the
      middle of a transaction, make them part of the transaction so
      that the worker that handles the transaction handles these
      events too. /Sven
[8 Nov 13:15] Sinisa Milivojevic

I have analysed the code and I came to the conclusion that this is not yet fixed, not even in 5.7 or 8.0.

However, this is not a bug, but a new feature.

Verified as a feature request.

Thank you for your report.