Bug #69002 | InnoDB reads transaction logs on writes | ||
---|---|---|---|
Submitted: | 18 Apr 2013 21:03 | Modified: | 29 Oct 2018 18:06 |
Reporter: | Domas Mituzas | Email Updates: | |
Status: | Closed | Impact on me: | |
Category: | MySQL Server: InnoDB storage engine | Severity: | S3 (Non-critical) |
Version: | 5.6.12 | OS: | Linux |
Assigned to: | CPU Architecture: | Any |
[18 Apr 2013 21:03]
Domas Mituzas
[19 Apr 2013 16:39]
MySQL Verification Team
Domas, First of all, various OS's can have various page sizes, right or wrong ??? Hence, as this size is rarely configurable, you would vote for the configurable log entry sizes. Right or wrong ??? Next, with a specialized buffer for those logs, InnoDB would require less reads and would not require OS cache for it. Next, well set controller and / or hard disk caches would also lower a number of required reads. Regarding your proposed solutions, your favorite option, option 3, would require more disk reads, unless that page would be in memory until it's 4K are filled up. Tell me which of my presumptions are right and which are wrong here ....
[19 Apr 2013 17:47]
Domas Mituzas
Sinisa, 1. my knowledge is limited to very few systems, but on all of them there is a way to read OS page size setting from the OS. 2. InnoDB uses specialized buffer with innodb_flush_log_at_trx_commit=0, once you use 1 and 2 that buffer is not used, and cannot be used to offset this problem. The problem is about page-ins on partial page writes. 3. You are waving your hands. Well set controller usually has read-ahead disabled (why would you want read-ahead on write-only workload, in case of transaction log?), well set disks also have read-ahead disabled. The penalty of read-ahead is high, and we have measured it in various cases. Do note, that read-ahead properties deteriorate under heavy parallel workload, the amount of work sent by random access will make read-ahead opportunities impossible. Again, do tell me which raid/disk configuration would execute smart read-ahead in this pattern, I'm eager to hear. 4. I don't see how option 3 results in more disk reads, it results in less, it can be verified empirically. You may be correct only in case where we're talking about memory-mapped interfaces, where each write is kind of partial. InnoDB does not use mmap() for transaction logs, InnoDB uses pwrite(), and if you do page-sized pwrite to a page that is not in memory, there is no need for page-in. That can be seen, if you trace how I/O subsystems execute buffered InnoDB log I/O (flush=0). So, I have no idea, how you came up with your conclusion, but you did not involve any systems knowledge or experience in shaping it. So, I think that your presumptions 1, 2, 3 and 4 are wrong. Have a good day!
[28 Jun 2013 21:42]
James Day
Domas, nothing to surprise you in this reply, just letting you know that we know it's an issue and have during the 5.6 timeline tried various things that might or might not work better, but didn't find something that worked better without also working worse in other cases. It's something that's still on the agenda for 5.7. I don't know if the pad with zeros part your suggestion 3 was tried. I think that 1 and 2 were, along with at least one variation on 3 that did seem to be causing read-modify-write activity in tests. For others, Domas is right about the current effects: when setting innodb_log_file_size at least on linux you are also telling the OS to try to allocate that much RAM times the number of log files and have to account for that in your RAM allocation budget. Or less desirably, you might end up with physical page reads instead, if the log files are big enough for them to have been flushed from the OS buffer.
[6 Oct 2015 23:18]
Domas Mituzas
this was addressed in 5.7.4 (with no reference to this bug): "InnoDB: A new system variable, innodb_log_write_ahead_size, allows you to configure the write-ahead block size for redo logs to a value that matches the operating system or file system cache block size in order to avoid “read-on-write” for redo log writes. Read-on-write occurs when redo log blocks are not entirely cached to the operating system or file system due to a mismatch between write-ahead block size for redo logs and operating system or file system cache block size. Avoiding read-on-write improves throughput stability for redo log writes. (Bug #17571371)"
[29 Oct 2018 18:06]
Domas Mituzas
seems to be addressed