Bug #69842 doublewrite buffer is triplewriting
Submitted: 26 Jul 2013 0:30 Modified: 30 Jul 2013 4:25
Reporter: Domas Mituzas Email Updates:
Status: Verified Impact on me:
None 
Category:MySQL Server: InnoDB storage engine Severity:S4 (Feature request)
Version:5.1+ OS:Any
Assigned to: Assigned Account CPU Architecture:Any

[26 Jul 2013 0:30] Domas Mituzas
Description:
For each page write happening to data files InnoDB will write out 16k page via doublewrite buffer.
For each 8k page write that means 16k of "doublewriting", so it is already triple-write buffer.
For each 4k page write that means quintuple-writing. 
My english language is limited to point out appropriate naming for 2k and 1k cases.

How to repeat:
Change a page in InnoDB, observe doublewriting.

Suggested fix:
Use either separate doublewrite buffers for different page sizes or actually support variable page sizes within doublewrite buffer.
[29 Jul 2013 12:20] Sinisa Milivojevic
We will fix this one later in the future. Right now, we can't fix it unless we allow different pages sizes in the double write buffer as noted by bug reporter.

However, problem is  misstated to make it sound worse than it is. It is true that when we write out a say a 4K page (only possible with compressed tables) we use one slot of 16K from from the double write buffer, where the first 4K is used and the remaining 12K is junk.

One future solution would be to have a double write buffer for every page size that we support. This would be easiest and cleanest solution.
[29 Jul 2013 18:16] Domas Mituzas
"to make it sound worse than it is"

No, I stated it exactly how it is - there is unnecessary write amplification when 16k slots are used for lower sized pages. 

And indeed, multiple double write buffers of different sizes was something that sounded like a decent idea last time we checked here, especially if they can be done with sufficient write parallelism.
[30 Jul 2013 0:30] Domas Mituzas
Also, can we please not close bugs that are open issues for MySQL users? It does not show up in search, etc, when you do 'deferred'.
[30 Jul 2013 4:25] James Day
I've changed this to a verified feature request and set the initial owner appropriately.

Is write amplification of a sort for spinning disks and some SSDs (if doublewrite is placed on SSD) and not others that do on-drive automatic compression to reduce physical write volume.

James Day, MySQL Senior Principal Support Engineer, Oracle
[30 Jul 2013 11:27] Mark Callaghan
2k-pages --> octuple-writing
1k-pages -->hexadecatuple writing
[31 Jul 2013 11:03] James Day
If you are interested in this there is a good chance that you will also be interested in these things that are delivered already in 5.6:

1. Compressed tables and the new innodb_log_compressed_pages option not to write the content of the page before compression to the InnoDB redo log. That reduces the amount of redo writing. There are also other improvements to compression summarised at https://blogs.oracle.com/mysqlinnodb/entry/innodb_compression_improvements_in_mysql . Many of those originated with excellent suggestions from Facebook and you might also find this presentation by their Nizam Ordulu of interest: http://www.slideshare.net/nizameddin/getting-innodb-compressionreadyforfacebookscale . The option is very minimally described at http://dev.mysql.com/doc/refman/5.6/en/innodb-parameters.html#sysvar_innodb_log_compressed... but there's a whole manual section on compression that is worth reading: http://dev.mysql.com/doc/innodb/1.1/en/innodb-compression.html .

2. The new binlog_row_image options not to write the whole of a before row image to the binary log (minimal) or to skip just for unneeded text and blob columns (noblob). This also means they don't get sent over the wire to slaves, reducing network traffic and slave relay log space use. The image is used to look up rows on a replication slave when there is no primary key. You don't need the image if you have a primary key. Luis Soares wrote more about these images: http://d2-systems.blogspot.co.uk/2011/04/mysql-562-dm-optimized-row-based.html . The manual page is https://dev.mysql.com/doc/refman/5.6/en/replication-options-binary-log.html#sysvar_binlog_... .

If you care about disk I/O I recommend that you give serious consideration to using those options. The main reason that they are not enabled by default is backwards compatibility and/or risk that comes in part from us not knowing how every system out there is configured. We considered enabled by default but that's why we rejected the idea.

James Day, MySQL Senior Principal Support Engineer, Oracle