Bug #31538 Backup file is too large.
Submitted: 11 Oct 2007 17:36 Modified: 26 Feb 2008 1:41
Reporter: Chuck Bell
Status: Closed
Category: Server: Backup Severity: S2 (Serious)
Version: 5.2 OS: Any
Assigned to: Chuck Bell Target Version:
Triage: D3 (Medium)

[11 Oct 2007 17:36] Chuck Bell
Description:
The output file (archive) of online backup is too large and is unmanageable. For some
storage engines, the output of the backup can be up to 20 times larger than the raw data
files.

The output file (archive) should be approximately the same size as the raw data files.
It may be somewhat larger because it also stores metadata, but it should not be
significantly larger than the raw data files.

How to repeat:
Create a table of the following structure (or similar):

CREATE TABLE `client_transaction_arc` (
  `client_transaction_id` int(11) NOT NULL DEFAULT '0',
  `client_id` int(11) NOT NULL DEFAULT '0',
  `investment_id` int(11) NOT NULL DEFAULT '0',
  `action` varchar(10) NOT NULL,
  `price` decimal(12,2) NOT NULL DEFAULT '0.00',
  `number_of_units` int(11) NOT NULL DEFAULT '0',
  `transaction_status` varchar(10) NOT NULL,
  `transaction_sub_timestamp` datetime NOT NULL DEFAULT '0000-00-00 00:00:00',
  `transaction_comp_timestamp` datetime NOT NULL DEFAULT '0000-00-00 00:00:00',
  `description` varchar(200) DEFAULT NULL,
  `broker_id` bigint(10) DEFAULT NULL,
  `broker_commission` decimal(10,2) DEFAULT NULL
) ENGINE=ARCHIVE DEFAULT CHARSET=latin1;

Insert 100,000 rows. 
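
One possible way to produce the rows is a small generator script. The sketch below is
only an illustration: the column values, the batch size, and the helper names
(make_row, insert_statements) are assumptions and are not part of the original report.
It emits multi-row INSERT statements for the table above.

```python
import random
from datetime import datetime, timedelta

TABLE = "client_transaction_arc"
COLUMNS = (
    "client_transaction_id, client_id, investment_id, action, price, "
    "number_of_units, transaction_status, transaction_sub_timestamp, "
    "transaction_comp_timestamp, description, broker_id, broker_commission"
)
TS_FORMAT = "%Y-%m-%d %H:%M:%S"


def make_row(i, rng):
    """Return one VALUES tuple with made-up data that fits the column types."""
    submitted = datetime(2007, 1, 1) + timedelta(seconds=i)
    completed = submitted + timedelta(minutes=5)
    return "({}, {}, {}, '{}', {:.2f}, {}, '{}', '{}', '{}', '{}', {}, {:.2f})".format(
        i,                                   # client_transaction_id
        rng.randint(1, 5000),                # client_id
        rng.randint(1, 500),                 # investment_id
        rng.choice(["BUY", "SELL"]),         # action, varchar(10)
        rng.uniform(1, 1000),                # price, decimal(12,2)
        rng.randint(1, 1000),                # number_of_units
        rng.choice(["PENDING", "COMPLETE"]), # transaction_status, varchar(10)
        submitted.strftime(TS_FORMAT),
        completed.strftime(TS_FORMAT),
        "test row {}".format(i),             # description, varchar(200)
        rng.randint(1, 100),                 # broker_id
        rng.uniform(0, 50),                  # broker_commission, decimal(10,2)
    )


def insert_statements(total=100000, batch=1000, seed=0):
    """Yield multi-row INSERT statements covering `total` rows."""
    rng = random.Random(seed)
    for start in range(1, total + 1, batch):
        end = min(start + batch, total + 1)
        values = ",\n".join(make_row(i, rng) for i in range(start, end))
        yield "INSERT INTO `{}` ({}) VALUES\n{};".format(TABLE, COLUMNS, values)
```

Printing each statement from insert_statements() to a file and feeding that file to
the mysql client should populate the table.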

Run the backup, then compare the size of the archive with the combined size of the raw
data files.
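
The comparison can be reduced to a single ratio. The following is a minimal sketch; the
function names and any paths passed to them are hypothetical and depend on where the
server keeps its data files and where the backup archive was written.

```python
import os


def total_size(paths):
    """Sum the on-disk size, in bytes, of the given files."""
    return sum(os.path.getsize(p) for p in paths)


def size_ratio(archive_path, raw_paths):
    """Return archive size divided by the combined size of the raw data files."""
    raw = total_size(raw_paths)
    if raw == 0:
        raise ValueError("raw data files are empty")
    return os.path.getsize(archive_path) / raw
```

A ratio near 1.0 would match the expectation stated in the description, while a ratio
near 20 would reproduce the problem.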

Suggested fix:
Change the default driver (and consistent snapshot) to use the binary log format for
packed rows. This should achieve a significant reduction in size for any tables
containing large fixed-size fields.
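
To see why packing helps, consider a varchar(200) column that usually holds a short
string. The sketch below is a simplified illustration and is not the server's actual
row or binary log format: it assumes a one-byte length prefix and compares a field that
always occupies its declared width with one that stores only the bytes in use. All
function names are hypothetical.

```python
import struct


def encode_fixed(value: bytes, max_len: int) -> bytes:
    """Length byte followed by the value padded to the declared width."""
    if len(value) > max_len or max_len > 255:
        raise ValueError("value does not fit a one-byte length prefix")
    return struct.pack("B", len(value)) + value.ljust(max_len, b"\x00")


def encode_packed(value: bytes, max_len: int) -> bytes:
    """Length byte followed by only the bytes actually used."""
    if len(value) > max_len or max_len > 255:
        raise ValueError("value does not fit a one-byte length prefix")
    return struct.pack("B", len(value)) + value


def decode_packed(buf: bytes) -> bytes:
    """Recover the original value from its packed form."""
    (length,) = struct.unpack_from("B", buf, 0)
    return buf[1:1 + length]


description = b"test row 42"
fixed = encode_fixed(description, 200)    # always 201 bytes
packed = encode_packed(description, 200)  # 12 bytes for this value
```

Under these assumptions, an 11-byte description costs 201 bytes per row in the
fixed-width form and 12 bytes when packed, which is the kind of saving the suggested
change is after.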

In order to achieve the goal of an archive of comparable size to the raw data files, the
capability to pipe the stream (backup archive) to a compression application must be
enabled. Once that is available, the archive file should be approximately the same size
as the raw data files.
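
As a rough illustration of the compression step, the sketch below pushes a byte stream
through zlib in chunks, the way a backup stream might be handed to a compressor. It is
not how the online backup code does it; the function name, the chunk size, and the
choice of zlib are assumptions.

```python
import io
import zlib


def compress_stream(src, dst, chunk_size=64 * 1024):
    """Read `src` in chunks, write zlib-compressed data to `dst`, return bytes written."""
    compressor = zlib.compressobj()
    written = 0
    while True:
        chunk = src.read(chunk_size)
        if not chunk:
            break
        data = compressor.compress(chunk)
        dst.write(data)
        written += len(data)
    tail = compressor.flush()  # emit whatever the compressor still buffers
    dst.write(tail)
    written += len(tail)
    return written
```

How much is saved depends on the data; repetitive or padded rows compress far better
than rows that are already compact.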
[11 Oct 2007 17:39] Chuck Bell
Solution found. Minor mods to rpl_record::unpack needed.
[11 Oct 2007 23:42] Chuck Bell
Patch ready for review.

http://lists.mysql.com/commits/35421
[25 Oct 2007 15:49] Chuck Bell
Submitted a second patch for review. See http://lists.mysql.com/commits/36305
[29 Oct 2007 9:45] Rafal Somla
Good to push.
[1 Nov 2007 20:54] Chuck Bell
Patch relies on another patch (WL#3324) but will be pushed on 6 November.
[25 Feb 2008 21:19] Bugs System
Pushed into 6.0.5-alpha
[26 Feb 2008 1:41] Paul DuBois
Noted in 6.0.5 changelog.

Output for Online Backup was too large, in some cases up to 20 times
larger than the raw data files for some storage engines.
[14 Mar 2008 2:30] Paul DuBois
Correction: No changelog entry needed; this bug did not appear in any released version.