Bug #31538 Backup file is too large.
Submitted: 11 Oct 2007 15:36 Modified: 26 Feb 2008 0:41
Reporter: Chuck Bell
Status: Closed
Category: MySQL Server: Backup Severity: S2 (Serious)
Version: 5.2 OS: Any
Assigned to: Chuck Bell CPU Architecture: Any

[11 Oct 2007 15:36] Chuck Bell
Description:
The output file (archive) of online backup is too large to be manageable. For some storage engines, the output of the backup can be up to 20 times larger than the raw data files.

The output file (archive) should be approximately the same size as the raw data files. It may be somewhat larger because it also stores metadata, but it should not be significantly larger than the raw data files.

How to repeat:
Create a table of the following structure (or similar):

CREATE TABLE `client_transaction_arc` (
  `client_transaction_id` int(11) NOT NULL DEFAULT '0',
  `client_id` int(11) NOT NULL DEFAULT '0',
  `investment_id` int(11) NOT NULL DEFAULT '0',
  `action` varchar(10) NOT NULL,
  `price` decimal(12,2) NOT NULL DEFAULT '0.00',
  `number_of_units` int(11) NOT NULL DEFAULT '0',
  `transaction_status` varchar(10) NOT NULL,
  `transaction_sub_timestamp` datetime NOT NULL DEFAULT '0000-00-00 00:00:00',
  `transaction_comp_timestamp` datetime NOT NULL DEFAULT '0000-00-00 00:00:00',
  `description` varchar(200) DEFAULT NULL,
  `broker_id` bigint(10) DEFAULT NULL,
  `broker_commission` decimal(10,2) DEFAULT NULL
) ENGINE=ARCHIVE DEFAULT CHARSET=latin1;

Insert 100,000 rows. 

Run the backup and observe the size of the archive and compare that to the raw data files.
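
One possible way to produce the 100,000 rows is to generate multi-row INSERT statements with a small script. This is only a sketch: the function names, the value ranges, and the sample strings are made up for illustration and are not part of the original test case.

```python
import random
from datetime import datetime, timedelta

COLUMNS = (
    "client_transaction_id, client_id, investment_id, action, price, "
    "number_of_units, transaction_status, transaction_sub_timestamp, "
    "transaction_comp_timestamp, description, broker_id, broker_commission"
)


def make_rows(count, seed=0):
    """Yield `count` tuples in the column order of client_transaction_arc.

    All values are arbitrary test data (an assumption, not real data).
    """
    rng = random.Random(seed)
    base = datetime(2007, 1, 1)
    for i in range(1, count + 1):
        submitted = base + timedelta(seconds=rng.randrange(86400 * 270))
        completed = submitted + timedelta(seconds=rng.randrange(3600))
        yield (
            i,                                            # client_transaction_id
            rng.randrange(1, 5000),                       # client_id
            rng.randrange(1, 500),                        # investment_id
            rng.choice(["BUY", "SELL"]),                  # action
            f"{rng.randrange(100, 1000000) / 100:.2f}",   # price
            rng.randrange(1, 1000),                       # number_of_units
            rng.choice(["PENDING", "COMPLETE"]),          # transaction_status
            submitted.strftime("%Y-%m-%d %H:%M:%S"),      # transaction_sub_timestamp
            completed.strftime("%Y-%m-%d %H:%M:%S"),      # transaction_comp_timestamp
            f"test transaction {i}",                      # description
            rng.randrange(1, 200),                        # broker_id
            f"{rng.randrange(0, 10000) / 100:.2f}",       # broker_commission
        )


def sql_literal(value):
    """Render an int as-is and anything else as a quoted SQL string."""
    if isinstance(value, int):
        return str(value)
    return "'" + str(value).replace("'", "''") + "'"


def insert_statements(count, batch=1000, seed=0):
    """Yield multi-row INSERT statements, `batch` rows per statement."""
    rows = list(make_rows(count, seed))
    for start in range(0, len(rows), batch):
        chunk = rows[start:start + batch]
        values = ",\n".join(
            "(" + ", ".join(sql_literal(v) for v in row) + ")" for row in chunk
        )
        yield f"INSERT INTO client_transaction_arc ({COLUMNS}) VALUES\n{values};"
```

Printing every statement from `insert_statements(100000)` to a file and feeding that file to the mysql client should populate the table. The batch size of 1000 rows is an arbitrary choice.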

Suggested fix:
Change the default driver (and the consistent snapshot driver) to use the binary log format for packed rows. This should significantly reduce the size for any table that contains large fixed-size fields.
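
To illustrate why packed rows help, the sketch below compares a fixed-width encoding of the VARCHAR columns from the table above with a packed one. It is a simplified model that assumes a length prefix followed by either the full declared width or only the bytes actually used. It is not the server's row packing code, and all function names are hypothetical.

```python
# Declared widths of the VARCHAR columns in client_transaction_arc
# (latin1, so one byte per character).
VARCHAR_COLUMNS = {"action": 10, "transaction_status": 10, "description": 200}


def length_prefix_bytes(max_len):
    """One length byte for columns of up to 255 bytes, otherwise two."""
    return 1 if max_len <= 255 else 2


def pack_varchar(value, max_len):
    """Packed form: length prefix followed by only the bytes in use."""
    if len(value) > max_len:
        raise ValueError("value longer than declared column width")
    return len(value).to_bytes(length_prefix_bytes(max_len), "little") + value


def fixed_varchar(value, max_len):
    """Fixed form: same prefix, but padded to the full declared width."""
    width = length_prefix_bytes(max_len) + max_len
    return pack_varchar(value, max_len).ljust(width, b"\x00")


def row_sizes(row):
    """Return (fixed_bytes, packed_bytes) for the VARCHAR columns of one row."""
    fixed = sum(len(fixed_varchar(row[name], width))
                for name, width in VARCHAR_COLUMNS.items())
    packed = sum(len(pack_varchar(row[name], width))
                 for name, width in VARCHAR_COLUMNS.items())
    return fixed, packed
```

For a row with action `BUY`, status `COMPLETE`, and description `test transaction 1`, `row_sizes` returns 223 bytes fixed against 32 bytes packed under this model.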

In order to achieve the goal of an archive of comparable size to the raw data files, the capability to pipe the stream (backup archive) to a compression application must be enabled. Once that is available, the archive file should be approximately the same size as the raw data files.
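
As a rough sketch of what piping the stream through a compressor could look like, the function below copies any readable byte stream into a gzip-compressed destination in chunks. The function name and chunk size are assumptions for illustration. The actual backup stream interface is not shown in this report, so a generic file-like object stands in for it.

```python
import gzip
import io
import shutil


def compress_stream(source, destination, chunk_size=64 * 1024):
    """Copy `source` to `destination` through gzip, one chunk at a time.

    Both arguments are binary file-like objects. The destination is left
    open so the caller decides when to close it.
    """
    with gzip.GzipFile(fileobj=destination, mode="wb") as gz:
        shutil.copyfileobj(source, gz, chunk_size)


def compressed_size(raw):
    """Return the gzip-compressed size in bytes of an in-memory payload."""
    out = io.BytesIO()
    compress_stream(io.BytesIO(raw), out)
    return len(out.getvalue())
```

Calling `compressed_size` on a sample of the uncompressed archive gives a quick estimate of how much a compression stage would save.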
[11 Oct 2007 15:39] Chuck Bell
Solution found. Minor mods to rpl_record::unpack needed.
[11 Oct 2007 21:42] Chuck Bell
Patch ready for review.

http://lists.mysql.com/commits/35421
[25 Oct 2007 13:49] Chuck Bell
Submitted a second patch for review. See http://lists.mysql.com/commits/36305
[29 Oct 2007 8:45] Rafal Somla
Good to push.
[1 Nov 2007 19:54] Chuck Bell
Patch relies on another patch (WL#3324) but will be pushed on 6 November.
[25 Feb 2008 20:19] Bugs System
Pushed into 6.0.5-alpha
[26 Feb 2008 0:41] Paul DuBois
Noted in 6.0.5 changelog.

Output for Online Backup was too large, in some cases up to 20 times
larger than the raw data files for some storage engines.
[14 Mar 2008 1:30] Paul DuBois
Correction: No changelog entry needed; this bug did not appear in any released version.