Bug #94777 Question about the redo log write_ahead_buffer
Submitted: 26 Mar 2019 8:14 Modified: 5 Apr 2019 17:45
Reporter: Zongzhi Chen (OCA) Email Updates:
Status: Not a Bug Impact on me:
None 
Category:MySQL Server: InnoDB storage engine Severity:S5 (Performance)
Version:8.0.14 OS:Any
Assigned to: CPU Architecture:Any

[26 Mar 2019 8:14] Zongzhi Chen
Description:

I used to thought the redo log would write the whole write_ahead_buffer every time. Such as in the default setting, the write_ahead_buffer_size = 8k, we will doing the 8k write operation every time.  However, it seem that the write_ahead_buffer only used to avoid the un-complete block write.  it only use to store the data when some operation write smaller than 512. And it also won't used for avoid "read-on-write". 
every time when InnoDB call log_files_write_buffer(), it will aligned the buf_begin address. It is this operation that InnoDB avoid the "read-on-write" problem.

And I can print the write_size before we call  write_blocks
 fprintf(stderr, "baotiao: write_size %lu \n", write_size);
 write_blocks(log, write_buf, write_size, real_offset);

And when we compare with mysql 5.7.  We use the same test case to insert lots of data. It show us that, In mysql8.0 we always have small IO, and in mysql5.7 we alway have larger io

This is the debug message from 8.0:

baotiao: write_size 512
baotiao: write_size 512
baotiao: write_size 512
baotiao: write_size 512
baotiao: write_size 512
baotiao: write_size 512
baotiao: write_size 512
baotiao: write_size 512
baotiao: write_size 512
baotiao: write_size 512
baotiao: write_size 512
baotiao: write_size 512
baotiao: write_size 512
baotiao: write_size 512
baotiao: write_size 512
baotiao: write_size 512
baotiao: write_size 4096
baotiao: write_size 512
baotiao: write_size 512

This is the debug message from 5.7:
baotiao: write_size 147456
baotiao: write_size 1024
baotiao: write_size 458240
baotiao: write_size 460288
baotiao: write_size 396800
baotiao: write_size 512
baotiao: write_size 1536
baotiao: write_size 9216
baotiao: write_size 512
baotiao: write_size 512

so my question is that:
is that your intent to use the redo log's write_ahead_buffer?  Since in my intuition, I thought every time the IO operation should be the size of write_ahead_buffer_size. However, it isn't. 

I thought if the write_size is 512, we still cause the "read-on-write" problem, and the problem even worse in mysql 8.0. Since it have much more small IO than mysql 5.7

How to repeat:
Add information before write_blocks
[29 Mar 2019 19:41] Zongzhi Chen
I wrote a program simulation the write_ahead_buffer pattern in InnoDB.
Every time I will write 512 bytes data to the file, And when I write the header of the 4k address, I will fill the block with 0 and aligned to 4k.

The upper one is the default way, which write 512 bytes sequentially and won't do any optimize, the under one is the optimized way as InnoDB 

```

  lseek(fd, 0, SEEK_SET);
  st = NowMicros();
  int num;
  off_t off = 0;
  for (uint64_t i = 0; i < file_size / dsize; i++) {
    num = pwrite(fd, aligned_buf, dsize, off);
    off += 512;
    // fsync(fd);
    if (num != dsize) {
      printf("write error num %d\n", num);
      return -1;
    }
  }
  ed = NowMicros();
  printf("write time microsecond(us) %lld\n", ed - st);
```

```
  lseek(fd, 0, SEEK_SET);
  st = NowMicros();
  int num;
  off_t off = 0;
  for (uint64_t i = 0; i < file_size / dsize; i++) {
    if (i % 8 == 0) {
      memset(aligned_buf + 512, 0, 4096);
      num = pwrite(fd, aligned_buf, 4096, off);
    } else {
      num = pwrite(fd, aligned_buf, dsize, off);
    }
    off += 512;
    // fsync(fd);
    if (num != dsize && num != 4096) {
      printf("write error num %d\n", num);
      return -1;
    }
  }
  ed = NowMicros();
  printf("write time microsecond(us) %lld\n", ed - st);
```

so I get a result that the upper one
:!./a.out
write time microsecond(us) 52342962

the downer one
:!./a.out
write time microsecond(us) 24818946

It seem that the way used in InnoDB get a better performance. about  2 times than the rough one. InnoDB really deeply optimize the IO path..
[29 Mar 2019 19:42] Zongzhi Chen
upper one

Attachment: t.cc (application/octet-stream, text), 1.03 KiB.

[29 Mar 2019 19:43] Zongzhi Chen
the under one.

Attachment: t2.cc (application/octet-stream, text), 1.21 KiB.

[2 Apr 2019 13:01] MySQL Verification Team
Hi,

If I understand you correctly, you concluded that current InnoDB code is much faster then the one your proposed.

Hence, does it mean that this performance bug report should be closed ?????
[4 Apr 2019 18:22] Zongzhi Chen
Yes, you are right, InnoDB really doing well work in the IO path.
As from my result, if we don't use the "write_ahead" algorithm,  writing 1GB file with fsync() will need  52s. use the "write_ahead" algorithm, we only need 25s. That is really awesome result.

We can closed this bug report..
[5 Apr 2019 12:43] MySQL Verification Team
Hi,

Thank you very much for your feedback.
[5 Apr 2019 15:59] Inaam Rana
I've a related question about write_ahead.  We determine whether or not we want to do a write_ahead based on log.write_ahead_end_offset. It is reset to zero when we start working on a new file (start_next_file()). But during start up in log_start() it is set to current file's end offset:

log.write_ahead_end_offset =
      ut_uint64_align_up(log.current_file_end_offset, srv_log_write_ahead_size);

Is this by design? If so, what is the reason for this. Because of above, at server startup, for the duration of first log file write_ahead stayed effectively disabled.
[5 Apr 2019 17:45] Zongzhi Chen
Hi Inaam. I thought this is a bug..

I thought this line should be
log.write_ahead_end_offset =
      ut_uint64_align_up(log.current_file_real_offset, srv_log_write_ahead_size);

or we can set log.write_ahead_end_offset = 0, then log.write_ahead_end_offset will be update by the update_current_write_ahead() function when we doing the first write..
[8 Apr 2019 12:44] MySQL Verification Team
Hi Inaam,

First of all, let me know how are you ...... 

Second, are you sure that this is the only access for you available for getting the answers to the questions like this one ???

Have a nice day ...
[8 Apr 2019 14:01] Inaam Rana
Hi Sinisa,

I am doing great. Thanks!

As for asking questions here, yeah I love to involve you in the discussion :)
[8 Apr 2019 14:03] MySQL Verification Team
Inaam,

Good grief !!!
[23 Apr 2019 13:12] MySQL Verification Team
For your information, this bug is fixed in 8.0.15.