Bug #78827 Speedup replication of compressed tables
Submitted: 14 Oct 2015 6:32
Modified: 14 Oct 2015 6:43
Reporter: Daniël van Eeden (OCA)
Status: Open
Category: MySQL Server: InnoDB storage engine
Severity: S4 (Feature request)
Version: 5.7
OS: Any
Assigned to:
CPU Architecture: Any
Tags: compression, innodb, replication

[14 Oct 2015 6:32] Daniël van Eeden
Description:
Replication and InnoDB compressed tables do not work together efficiently.

Related: Bug #46435 (compress binary files)

How to repeat:
The current situation:
1. Create a table on the master with ROW_FORMAT=COMPRESSED KEY_BLOCK_SIZE=8 ENGINE=InnoDB
2. Insert into this table (INSERT, LOAD DATA INFILE)

The master now compresses the data as it inserts it into the table.
It then replicates the uncompressed row images (RBR) or the uncompressed statements to the slave.
The SQL thread (or a specific worker) on the slave then has to compress the data again before writing the row to the table.
Without parallel replication this results in slave lag. With parallel replication the lag is less severe, but it can still occur.

Suggested fix:
1. Create a separate compression thread on the slave.
2. Try to do compression with multiple threads.
3. Compress data for the next event ahead of time (predictive compression; probably hard or not possible)
4. Replicate the already compressed data from the master to the slave. (e.g. a new compressed RBR format)
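
A minimal sketch of suggestion 2, in Python rather than server code. It assumes zlib as the compressor and compresses whole row images, whereas InnoDB compresses pages. The names compress_serial, compress_parallel and row_images are made up for illustration and are not MySQL internals.

```python
import zlib
from concurrent.futures import ThreadPoolExecutor


def compress_serial(row_images):
    """Model of a single SQL thread compressing every row image itself."""
    return [zlib.compress(img) for img in row_images]


def compress_parallel(row_images, workers=4):
    """Model of suggestion 2: hand the compression to a pool of threads.

    map() yields results in input order, so the apply order is preserved.
    """
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(zlib.compress, row_images))


# Hypothetical uncompressed row images, as they would arrive on the slave.
row_images = [f"row-{i}-".encode() * 2000 for i in range(16)]

serial = compress_serial(row_images)
parallel = compress_parallel(row_images)
```

Both paths produce the same compressed bytes in the same order, so the thread doing the actual write would not need to care which one was used. Whether this helps in practice depends on how much of the apply time is really spent in compression.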
[14 Oct 2015 6:43] Daniël van Eeden
Another idea: Work with the connectors to make it possible to do the compression/decompression there (optionally!).

Currently we have the compressed client-server protocol and compressed InnoDB tables, which together involve a lot of compression/decompression (especially with replication).

It would be nice if it could work like this:
1. The connector compresses the data and sends it to the server.
2. The server writes the compressed data to the table (iblogs, etc.) and to the binlogs.
3. The slave also writes this compressed data to the table.
4. A client requests data from the slave and gets the compressed data.
5. The connector decompresses the data.
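
The five steps above can be sketched as a toy model. It assumes zlib as the compressor. Connector, Server and replicate are hypothetical names for illustration only and do not correspond to any existing connector or server API.

```python
import zlib


class Connector:
    """Hypothetical connector that does the (de)compression itself."""

    @staticmethod
    def encode(value: bytes) -> bytes:
        return zlib.compress(value)

    @staticmethod
    def decode(stored: bytes) -> bytes:
        return zlib.decompress(stored)


class Server:
    """Hypothetical server that stores and forwards bytes as received."""

    def __init__(self):
        self.table = {}
        self.binlog = []

    def write(self, key, payload):
        # No compression here: the payload is written as-is.
        self.table[key] = payload
        self.binlog.append((key, payload))

    def read(self, key):
        return self.table[key]


def replicate(master, slave):
    """Apply the master's binlog on the slave without recompressing."""
    for key, payload in master.binlog:
        slave.write(key, payload)


master, slave = Server(), Server()
original = b"some highly repetitive value " * 100

master.write(1, Connector.encode(original))  # steps 1 and 2
replicate(master, slave)                     # step 3
fetched = slave.read(1)                      # step 4
result = Connector.decode(fetched)           # step 5
```

In this model the data is compressed once and decompressed once, and neither server touches the compressor.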

This is already possible with a BLOB field, but that's not transparent to the client.
Compressing in the connector makes validation of the data on the server hard, but if the compression is done per field for BLOB fields this should work. For TEXT fields it will be hard to filter out invalid utf8 sequences or bytes > 127 for ascii.
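
A small sketch of the validation problem for TEXT fields, again assuming zlib as the compressor. The helpers is_valid_utf8 and is_valid_ascii are illustrative and not part of any MySQL API.

```python
import zlib


def is_valid_utf8(data: bytes) -> bool:
    """Return True if the bytes form a valid utf8 sequence."""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


def is_valid_ascii(data: bytes) -> bool:
    """Return True if no byte is > 127."""
    return all(b <= 127 for b in data)


text = "Dani\u00ebl".encode("utf-8")  # valid utf8 as sent by a client
compressed = zlib.compress(text)     # what the server would receive instead

checks = {
    "text_utf8": is_valid_utf8(text),
    "compressed_utf8": is_valid_utf8(compressed),
    "compressed_ascii": is_valid_ascii(compressed),
    "roundtrip_utf8": is_valid_utf8(zlib.decompress(compressed)),
}
```

The compressed bytes pass neither check, so the server cannot validate the character set without decompressing the value first, which gives up part of the benefit.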