Bug #53776 Feature Request: Binary log space reduction/compression/hashing
Submitted: 19 May 2010 5:22 Modified: 19 May 2010 6:04
Reporter: Roel Van de Paar Email Updates:
Status: Verified Impact on me:
None 
Category:MySQL Server: General Severity:S4 (Feature request)
Version:5.5+ OS:Any
Assigned to: CPU Architecture:Any

[19 May 2010 5:22] Roel Van de Paar
Description:
C:\mysql5146\data>dir roelt\a.*
19/05/2010  03:17 PM             8,580 a.frm
19/05/2010  03:18 PM        24,999,975 a.MYD
19/05/2010  03:18 PM        10,263,552 a.MYI
               3 File(s)     35,272,107 bytes

C:\mysql5146\data>dir log-bin.000042
19/05/2010  03:18 PM       133,889,681 log-bin.000042

Even worse on InnoDB (use set @@global.innodb_flush_log_at_trx_commit=0; to make it quicker):

C:\mysql5146\data>dir log-bin.000044
19/05/2010  03:21 PM       229,889,585 log-bin.000044

How to repeat:
FLUSH TABLES;
FLUSH LOGS;
DELIMITER //
DROP PROCEDURE IF EXISTS setuplarge//
CREATE PROCEDURE setuplarge(IN numrows INT)
BEGIN
 DECLARE a INT;
 SET a = 1;
 DROP TABLE IF EXISTS a;
 CREATE TABLE `a` (`id` int,`p` char(20),PRIMARY KEY(`id`)) ENGINE=MyISAM;
 WHILE (a < numrows) DO
  INSERT INTO a VALUES (a,"       empty         ");
  SET a=a+1; 
 END WHILE;
END;
//
DELIMITER ;
CALL setuplarge(1000000); /* Takes about 40 seconds */
FLUSH TABLES;
FLUSH LOGS;
SHOW BINARY LOGS;
[19 May 2010 5:50] Roel Van de Paar
This is using statement based logging.
[19 May 2010 5:57] MySQL Verification Team
each row in the binlog is 129 bytes in length, as apposed to 25 bytes in the MYD tables. 

# at 2583
#100519  7:39:16 server id 1  end_log_pos 2712 
# Position  Timestamp   Type   Master ID        Size      Master Pos    Flags 
#      a17 84 79 f3 4b   02   01 00 00 00   81 00 00 00   98 0a 00 00   10 00
#      a2a 01 00 00 00 00 00 00 00  04 00 00 1a 00 00 00 40 |................|
#      a3a 00 00 01 00 00 00 00 00  00 00 00 06 03 73 74 64 |.............std|
#      a4a 04 08 00 08 00 08 00 74  65 73 74 00 49 4e 53 45 |.......test.INSE|
#      a5a 52 54 20 49 4e 54 4f 20  61 20 56 41 4c 55 45 53 |RT.INTO.a.VALUES|
#      a6a 20 28 20 4e 41 4d 45 5f  43 4f 4e 53 54 28 27 61 |...NAME.CONST..a|
#      a7a 27 2c 31 35 29 2c 22 20  20 20 20 20 20 20 65 6d |..15..........em|
#      a8a 70 74 79 20 20 20 20 20  20 20 20 20 22 29  |pty...........|
# 	Query	thread_id=1	exec_time=0	error_code=0
SET TIMESTAMP=1274247556/*!*/;
INSERT INTO a VALUES ( NAME_CONST('a',15),"       empty         ")
/*!*/;
[19 May 2010 5:59] Roel Van de Paar
It's all metadata. Basically this repeated over and over:

MyISAM:
==========
# at 13926
#100519 15:17:59 server id 1  end_log_pos 14057         Query   thread_id=1     exec_time=0     error_code=0
SET TIMESTAMP=1274246279/*!*/;
INSERT INTO a VALUES ( NAME_CONST('a',102),"       empty         ")
==========

InnoDB:
==========
# at 84231
#100519 15:20:33 server id 1  end_log_pos 84300         Query   thread_id=1     exec_time=0     error_code=0
SET TIMESTAMP=1274246433/*!*/;
BEGIN
/*!*/;
# at 84300
#100519 15:20:33 server id 1  end_log_pos 84431         Query   thread_id=1     exec_time=0     error_code=0
SET TIMESTAMP=1274246433/*!*/;
INSERT INTO a VALUES ( NAME_CONST('a',369),"       empty         ")
/*!*/;
# at 84431
#100519 15:20:33 server id 1  end_log_pos 84458         Xid = 3001127
COMMIT/*!*/;
==========

Can this be optimized? Maybe some sort of on-the-fly compression for the binary logs, or at least similar items could be dropped or hashed?
[19 May 2010 6:04] Valeriy Kravchuk
I agree that more space-efficient logging in needed in this case.
[19 May 2010 6:22] Roel Van de Paar
RBL: MyISAM: 93Mb InnoDB: 133Mb
MIXED: MyISAM: 133Mb InnoDB: 229Mb