MySQL Bugs: #93500: Question about Mysql8.0 Innodb's log

Bug #93500	Question about Mysql8.0 Innodb's log_writer
Submitted:	6 Dec 2018 7:48	Modified:	7 Dec 2018 13:34
Reporter:	Zongzhi Chen (OCA)	Email Updates:
Status:	Closed	Impact on me:	None
Category:	MySQL Server: InnoDB storage engine	Severity:	S5 (Performance)
Version:	8.0.13	OS:	Linux
Assigned to:		CPU Architecture:	Any
Tags:	log_writer

Description:
Hello guys

After reading the source code of InnoDB, I have a question about the log_writer thread.

we know that the log_writer thread will wait on the log.writer_event

    const auto wait_stats = os_event_wait_for(
        log.writer_event, max_spins, srv_log_writer_timeout, stop_condition);

However, in the normal configuration, we set srv_flush_log_at_trx_commit = 1. So no one will wake_up this thread. So it will wait until the srv_log_writer_timeous = 10 microsecond timeout. 

It is the log_write_up_to will write redolog buffer to redolog file, the in log_write_up_to function, it will wake up log_flusher thread, however, it won't wake up log_writer thread. That is really confused me..

  if (flush_to_disk) {
    if (log.flushed_to_disk_lsn.load() >= end_lsn) {
      return (Wait_stats{0});
    }

    Wait_stats wait_stats{0};

    /* The flusher might be sleeping for some time when flush is not
    forced, if we want to wait for flush in such cases (e.g. for DDL
    operations) we need to wake flusher up. */
    if (srv_flush_log_at_trx_commit != 1) {
      /* First we need to assure the data is written and ready
      for flusher to continue with. */
      wait_stats += log_wait_for_write(log, end_lsn);
      os_event_set(log.flusher_event);
    }

    /* Wait until log gets flushed up to end_lsn. */
    return (wait_stats + log_wait_for_flush(log, end_lsn));

  } else {
    if (log.write_lsn.load() >= end_lsn) {
      return (Wait_stats{0});
    }

    /* Wait until log gets written up to end_lsn. */
    return (log_wait_for_write(log, end_lsn));
  }

I don't know why we don't wake up this thread after we write the mtr's buffer to redolog's log buffer. Since the log_flusher has been wake up before. 

So please correct me if I make some mistakes, or some guys can tell me why InnoDB doing this way?
Thank you

How to repeat:
I have do the sys_bench, I just confuse the code..

Suggested fix:
If we want to wake up the log_flusher, I suggest wake up the log_writer together. 
Or we won't wake up both of these two thread. Since the wake up operation will call pthread_cond_boradcast function, and It will cause lots of context switch.
And the log_writer and log_flusher will up every 10 microseconds, I think It's  too frequently.. How about we adjust the wake_up time to a larger value, and we can wake up the thread after we writing data.

Hi,

Thank you for your bug report.

I have studied deeply your code analysis and I have discovered that you are right. Hence, this is a fully qualified performance bug.

Thank you for your contribution.

This has been already fixed (and done other way around) in 8.0.14 :-)

We:

1. Notify log_writer / log_flusher in log_wait_for_write / log_wait_for_flush at the beginning.

2. Then we are spinning (busy waiting, not to give away context).

3. Then we keep notifying log_writer / log_flusher every wait on event if spinning was not enough (including just before the first wait).

There is hardly any reason (and we most likely can't afford that) to notify every mtr_commit (every write to log buffer).

We notify only when we need redo written / flushed.

Regarding 10us:

1. First log thread tries busy waiting for some next work to be done.

2. Afterwards it falls back to waiting on event, and it starts with 10us, but it also multiplies that by 2 every 4 waits (up to maximum 100ms).

Note: in 8.0.14 we notify log_writer as soon as we start to wait on redo written (current write_lsn was not enough).

Sorry for the confusion.

It is very nice to see you interested in the code, and thank you for all the comments!

We are working and trying to improve things around :-)

Hi Pawel,

Thank you for informing us that this problem is resolved in 8.0.14. I have not yet had time to download and inspect the code.

Since the problem is fixed, this bug is therefore closed.

Fixed in 8.0.14 under the heading of

BUG#28616442 FIX TO BUG#28062382 DRAMATICALLY REDUCES THROUGHPUT WHEN THE BINLOG IS ACTIVE