MySQL Bugs: #37980: Incorrect backup file size in online backup table.

Bug #37980	Incorrect backup file size in online backup table.
Submitted:	9 Jul 2008 3:52	Modified:	13 Aug 2009 14:53
Reporter:	Hema Sridharan	Email Updates:
Status:	Duplicate	Impact on me:	None
Category:	MySQL Server: Backup	Severity:	S3 (Non-critical)
Version:	mysql-6.0-backup	OS:	Linux
Assigned to:	Rafal Somla	CPU Architecture:	Any

Description:
1) Create database and tables.
2) Perform backup and check the size of backup file.
This size varies from the size that is shown in online backup table.
 

How to repeat:
 The following table shows the actual backup file size against the one that is reported in the online backup_table. I have performed some iterations with different backup sizes with compression as well as without compression.
 
    Before compression  After Compression       Online backup_table
 
1)     846MB                 54MB                730478924bytes ~ 696MB
2)     158MB                 10MB                136023748bytes ~ 130MB 
3)     1.1GB                 69MB                932102360bytes ~ 888MB  

The online backup table shows the same size for both compressed data and  uncompressed data.   

mysql>select * from mysql.online_backup\G
*************************** 3. row ***************************
          backup_id: 3
         process_id: 0
         binlog_pos: 0
        binlog_file: NULL
       backup_state: complete
          operation: backup
          error_num: 0
        num_objects: 1
        total_bytes: 730478924
validity_point_time: 2008-07-08 15:21:13
         start_time: 2008-07-08 15:21:12
          stop_time: 2008-07-08 15:24:49
host_or_server_name: localhost
           username: root
        backup_file: /export/home/tmp/backup_dmp/dbb1.bak.gz
       user_comment:
            command: backup database dbb to '/export/home/tmp/backup_dmp/dbb1.bak.gz' 

with compression
            engines: Snapshot
*************************** 4. row ***************************
          backup_id: 4
         process_id: 0
         binlog_pos: 0
        binlog_file: NULL
       backup_state: complete
          operation: backup
          error_num: 0
        num_objects: 1
        total_bytes: 730478924
validity_point_time: 2008-07-08 17:27:49
         start_time: 2008-07-08 17:27:49
          stop_time: 2008-07-08 17:31:05
host_or_server_name: localhost
           username: root
        backup_file: /export/home/tmp/backup_dmp/dbb1_wocompr
       user_comment:
            command: backup database dbb to '/export/home/tmp/backup_dmp/dbb1_wocompr'
            engines: Snapshot

Many thanks for reporting a bug but this is not a bug:

create database bug37980 character set utf8;
use bug37980;
create table t(id serial, t text character set utf8);
delimiter #
create procedure fill_t() 
begin 
declare i integer default 0; 
while i<100 do 
insert into t(t)
select(replace(make_set(1024*1024*1024*4*rand(),'4','b','c','a','s','f','g','h','i','j','p
','l','1','6','o','k','q','r','e','t','u','v','8','y','z','0','m','2','3','d','5','n','7',
'x','9','.',',','>','<'),',','')) as word;
set i = i + 1; 
end while; 
end#
delimiter ;
call fill_t;

backup database bug37980 to 't1';

backup database bug37980 to 't2' with compression compression_algorithm=gzip;

$ file t1
t1: data

$ file t2
t2: gzip compressed data, from Unix

$ ls -lh
-rw-rw---- 1 myuser mygroup 8,2K 2008-07-11 15:20 t1
-rw-rw---- 1 myuser mygroup 2,8K 2008-07-11 15:21 t2

as you can see the file is compressed.

total_bytes does not mean the size of the file it means how many bytes will be backuped.

Means: your database has x bytes and these x bytes will be backuped, doesn't matter how huge the file size is.

Look here:

bug37980$ ls -lh
-rw-rw---- 1 myuser mygroup 3328 2008-07-11 15:20 t.MYD
-rw-rw---- 1 myuser mygroup 4096 2008-07-11 15:20 t.MYI

The size of .MYI + .MYD = 7426

mysql> select * from mysql.online_backup\G
...
total_bytes: 7426
...

As you can see, it is the size of the database not of the backup files. The backup files have less bytes.

The above explanation can hold true only with myisam storage engine (they store data in .MYI and .MYD format). But I can also take backup from Innodb, Falcon and other storage engines. 

Innodb and other engines manages table contents differently than the myisam engine (no MYI/MYD here). Myisam storage engine should not become a benchmark for all other storage engines and eventually there shouldn't be any discrepancy between the backed up file vs online backup file size.  Please see the example below,

CREATE DATABASE y;
USE y;
CREATE TABLE t1(id int)ENGINE=INNODB;
INSERT INTO t1 VALUES(1),(2),(3),(4),(5),(6),(7),(8),(9),(0);
INSERT INTO t1 SELECT * FROM t1;
INSERT INTO t1 SELECT * FROM t1;
INSERT INTO t1 SELECT * FROM t1;
INSERT INTO t1 SELECT * FROM t1;

INSERT INTO t1 SELECT * FROM t1;
INSERT INTO t1 SELECT * FROM t1;
INSERT INTO t1 SELECT * FROM t1;

BACKUP DATABASE y to 'y.bak';
BACKUP DATABASE y to 'y.bak.gz' WITH COMPRESSION;

-rw-rw----  1 ndbdev ndbdev  16K Jul 15 03:13 y.bak
-rw-rw----  1 ndbdev ndbdev  331 Jul 15 03:13 y.bak.gz

(From data directory)
y> ls -lrt
total 16
-rw-rw----  1 ndbdev ndbdev 8556 Jul 15 03:12 t1.frm
-rw-rw----  1 ndbdev ndbdev   65 Jul 15 03:12 db.opt

from online backup table;
***************************5. row ***************************
        total_bytes: 7680
        ......
        backup_file: /data1/backup_dmp/y.bak
       user_comment:
            command: backup database y to '/data1/backup_dmp/y.bak'
            engines: Snapshot
*************************** 6. row ***************************
          
        total_bytes: 7680
        ......
        backup_file: /data1/backup_dmp/y.bak.gz
       user_comment:
            command: backup database y to '/data1/backup_dmp/y.bak.gz' with compression
            engines: Snapshot

From the above 
backup file size: 16K  (uncompressed) 
backup file size: 331bytes (compressed)
from online backup table: 7680 bytes ~ 7.5K (for both compressed and uncompressed)

I think that total_bytes (in backup table)should show the size of the resulting backup file but it is not showing it. Example are mentioned above.

I'm not sure if it is realistic to expect that total_bytes reports the size of the resulting backup file. For one thing, there are scenarios where backup is done to e.g., a named pipe and then we have no file whose size could be masured. Also, a file is an operating system concept and I don't think a database server should try to access or refer to it directly.

My suggestion would be that total_bytes shows the number of bytes processed by a BACKUP/RESTORE command, which can be different from the size of the resulting backup file.

I think we need more discussion about that and make a clear decision about what should be reported in total_bytes. Only then we can fully establish whether we have a bug or not and fix it if needed. Hence, setting the status to "To be fixed later".

I think there are three alternatives for the meaning of the size:

1. Processed data inside the backup kernel
2. Backup Image file size *before* compression/encryption
3. Backup Image file size *after* compression/encryption

I think 3 is most useful and user friendly.

The original intent of this bug is a duplicate of a later bug BUG#41898. By solving BUG#41898, the solution supercedes the problems and theories/conclusions stated here.