Bug #71696 | Provide an option to compress binlog events | |
---|---|---|---
Submitted: | 13 Feb 2014 7:11 | Modified: | 4 Jan 2015 1:55
Reporter: | Simon Mudd (OCA) | Email Updates: |
Status: | Verified | Impact on me: |
Category: | MySQL Utilities: Binlog Events | Severity: | S4 (Feature request)
Version: | 5.7.3 | OS: | Any
Assigned to: | | CPU Architecture: | Any
Tags: | binlogs, events, save space, windmill | |
[13 Feb 2014 7:11]
Simon Mudd
[13 Feb 2014 8:35]
MySQL Verification Team
Hello Simon,

Thank you for the feature request!

Thanks, Umesh
[30 Dec 2014 12:43]
Daniël van Eeden
This could be a duplicate of Bug #46435
[2 Jan 2015 7:59]
Simon Mudd
I'm not sure that compressing whole binlog files would be that helpful: to _read_ from them you need to know the offset into the binlog file to get at the data, which makes compressed files complicated to read without extra overhead. Compressing individual events, on the other hand, means the existing MySQL behaviour for locating a binlog event, including when a slave connects to a master, stays unchanged. Only the content of the event would change: stored compressed if that uses less space, and uncompressed otherwise. So the only change needed is to try compressing each event prior to writing it, save the compressed form if it is smaller, and on read uncompress the event and process the uncompressed event as normal afterwards.

Most non-RBR events are just plain text, so large SQL statements (I use a lot of large ... IN ( ... ) lists and large IODKU statements) should compress pretty well. I've not checked how well binlog events actually compress, but I would expect a saving, especially when not using minimal RBR.

I've discussed this with various people in the past, but I guess you need a heavy enough write load (or large enough binlog files) for this to matter, and perhaps for many people it is not really a problem. On some of my systems binlog space is an issue, so reducing the size would be a useful win.
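For illustration, a minimal sketch of the write-side "save compressed only if smaller" logic described above, assuming zlib as the compressor; the types and function names are invented for the sketch and are not MySQL internals:

```cpp
// Hypothetical sketch of the "compress only if smaller" idea using zlib.
// EventPayload and prepare_payload() are illustrative names, not MySQL's API.
#include <zlib.h>
#include <cstdint>
#include <vector>

struct EventPayload {
    bool compressed;             // flag that would live in the (new) event header
    std::vector<uint8_t> bytes;  // payload actually written to the binlog
};

EventPayload prepare_payload(const std::vector<uint8_t> &raw) {
    std::vector<uint8_t> out(compressBound(raw.size()));
    uLongf out_len = out.size();
    int rc = compress2(out.data(), &out_len,
                       raw.data(), raw.size(), Z_DEFAULT_COMPRESSION);
    // Keep the compressed form only when compression succeeded AND the
    // result is smaller; otherwise fall back to the original bytes.
    if (rc == Z_OK && out_len < raw.size()) {
        out.resize(out_len);
        return {true, std::move(out)};
    }
    return {false, raw};
}
```

Keeping the original bytes whenever compression does not win means an event can never grow, which matches the "no overhead otherwise" behaviour described above.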
[2 Jan 2015 16:25]
Daniël van Eeden
Compression per event, as opposed to compressing the whole file, would indeed solve the issue with offsets. But with SBR this might not save a lot, as the statements can be small. Some algorithms like Snappy might be okay with this and won't compress an event if it's not worth it.

Compression in larger blocks might also work: compress blocks of 32M, then you can calculate which blocks you need and decompress only those.

I encountered the need for compression in a small setup which used both SBR and RBR. We had limited space, but wanted to retain the binlogs for some time. What we did was copy and compress the binlogs and then purge them from the server. When we needed them we had to uncompress them and use them with mysqlbinlog, or re-add them to the server to make them available for slaves. So compression of inactive binlogs is useful too.

What might be an issue for compressing blocks of 32M is that the binlog must be written at (almost) every commit. If the last block needs to be rewritten on every commit this won't work from a performance standpoint, so an algorithm should be chosen which allows parts of a block to be written.

The compression should:
- Reduce storage size
- Improve performance (less I/O needed per binlog event)
- Reduce replication traffic
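To make the 32M-block idea concrete, here is the offset arithmetic a reader could use to decide which blocks to decompress, assuming fixed-size uncompressed blocks plus a small index mapping block number to compressed file offset; all names are illustrative:

```cpp
// Offset arithmetic for the fixed-size-block scheme sketched above.
// Assumes length > 0; the block index itself is omitted for brevity.
#include <cstdint>
#include <utility>

constexpr uint64_t kBlockSize = 32ULL * 1024 * 1024;  // 32M of uncompressed data

// Map a logical byte range [offset, offset + length) of the uncompressed
// binlog to the first and last block index that must be decompressed.
std::pair<uint64_t, uint64_t> blocks_for_range(uint64_t offset, uint64_t length) {
    const uint64_t first = offset / kBlockSize;
    const uint64_t last = (offset + length - 1) / kBlockSize;
    return {first, last};
}
```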
[4 Jan 2015 1:55]
Simon Mudd
I see "compressed events" as just another event type. Within that event once it's uncompressed whatever is stored there would be treated as normal. Indeed, compression should be tried, but if it's not more compact then the uncompressed event should always be saved to the binlog, so when it doesn't work there's no overhead as the original event will be sent downstream "unchanged". On most of the systems I use many cron type tasks do large inserts up to max_allowed_packet, often using IODKU and it's on this type of event I see most gain being had. I'd expect plain RBR to be compressable as the before and after image is sent for each row change, so unless the whole row changes on updates you're likely to get a lot of duplication. Similarly minimal RBR is less likely to generate a gain given it sends the PK+ changed columns so "a minimal image". Many people also use LOAD DATA INFILE which if sent via replication often provides a good compressable stream, so this is another place where size may be important. As you say if the way to compress the event is configurable then indeed there may be people who look for a small saving to avoid extra delay or cpu load on the master compressing or the slave uncompressing the event, but there may be people in your case where you want the maximum compression possible as disk space is the main concern. So there are plenty of possible options should this ever be taken up and as indicated several different use cases. This is also where I'd love to see it easier to build extra modules/plugins into MySQL and then just provide the hooks for others to build the appropriate functionality. This particular feature would probably work quite well that way if such infrastructure were in place. Not for 5.6 or likely for 5.7 but maybe something to consider for 5.8?
[4 Jan 2015 10:32]
Daniël van Eeden
For easier installation of plugins: Bug #73802. For easier building: a build service similar to the OpenSUSE Build Service?

For binlogs, a new type of plugin should be created: something like a binlog_write_plugin, which is called with the original event and then returns a compressed (or encrypted, etc.) event. On the slave, the custom event type should then call a handler for that event type to decrypt/decompress the event after it's read. I could imagine this also being interesting for enrichment of the binlog (similar to query log events for RBR).
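A rough sketch of what such an interface might look like; no binlog_write_plugin API exists in MySQL, so everything below is hypothetical:

```cpp
// Hypothetical plugin interface along the lines described above: one hook
// transforms events on the write path, the other restores them on read.
#include <cstdint>
#include <vector>

class BinlogWritePlugin {
 public:
    virtual ~BinlogWritePlugin() = default;

    // Called on the master just before an event is written; may return a
    // transformed (compressed, encrypted, enriched, ...) event.
    virtual std::vector<uint8_t>
    on_write(const std::vector<uint8_t> &event) = 0;

    // Called on the slave (or by mysqlbinlog) when a wrapped event of this
    // plugin's type is read; must restore the original event.
    virtual std::vector<uint8_t>
    on_read(const std::vector<uint8_t> &event) = 0;
};
```

A compression plugin would implement on_write() as the "compress if smaller" logic sketched earlier and on_read() as the corresponding unwrap, while an encryption or enrichment plugin would slot into the same two hooks.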