Bug #76703 mysqldiskusage incorrect approach to calculate disk usage
Submitted: 15 Apr 2015 13:31 Modified: 16 Mar 2016 23:42
Reporter: Shahriyar Rzayev Email Updates:
Status: Closed Impact on me:
None 
Category:MySQL Utilities Severity:S1 (Critical)
Version:1.5.4 OS:Linux (CentOS 7)
Assigned to: CPU Architecture:Any

[15 Apr 2015 13:31] Shahriyar Rzayev
Description:
Dear experts, i have recently revealed MySQL disk usage topic.
Decide to test mysqldiskusage as well.
First of all, little theory:

"InnoDB's data storage model uses 'spaces' often called 'tablespaces' in the context of MySQL. Each space in InnoDB is divided into pages, normally 16 Kib or 16384 bytes in size."

Let's create empty table:

mysql> create table t1(id int not null);
Query OK, 0 rows affected (0.08 sec)

[root@centos7 test_innodb]# ls -lt
total 208
-rw-rw----. 1 mysql mysql 98304 Apr 15 17:42 t1.ibd
-rw-rw----. 1 mysql mysql  8556 Apr 15 17:42 t1.frm
-rw-rw----. 1 mysql mysql    65 Apr  9 11:37 db.opt

The "Data size" of table (.ibd) file is 98304 bytes.
98304/16384 = 6 -> So in fact InnoDB reserves 6 pages.

If we query this information from information_schema.tables:

mysql> select data_length from information_schema.tables where table_schema='test_innodb' and table_name='t1';
+-------------+
| data_length |
+-------------+
|       16384 |
+-------------+
1 row in set (0.00 sec)

So now we now that in fact our table uses first 16 Kib page and another 5 is reserved. In fact it is allocated by OS.
For this reason information_schema should not be used for actual calculating disk usage.

Keeping this in mind while testing mysqldiskusage:

[root@centos7 ~]# mysqldiskusage --server=root:12345@localhost:3306
WARNING: Using a password on the command line interface can be insecure.
# Source on localhost: ... connected.
# Database totals:
+---------------------+--------------+
| db_name             |       total  |
+---------------------+--------------+
| data_test           | 123,354      |
| employees           | 440,114,089  |
| mysql               | 1,750,169    |
| performance_schema  | 489,543      |
| storage_test        | 156,150      |
| test_innodb         | 123,354      |
+---------------------+--------------+

Total database disk usage = 442,756,659 bytes or 422.25 MB

#...done.

In fact employee database size is:

[root@centos7 ~]# du -sb /var/lib/mysql/employees/
234134441	/var/lib/mysql/employees/

I have figure out that 440114089 bytes is sum of data_size + misc_files:

[root@centos7 ~]# mysqldiskusage --server=root:12345@localhost:3306 -v
WARNING: Using a password on the command line interface can be insecure.
# Source on localhost: ... connected.
# Database totals:
+---------------------+--------------+--------------+--------------+
| db_name             |   data_size  |  misc_files  |       total  |
+---------------------+--------------+--------------+--------------+
| data_test           | 16,384       | 106,970      | 123,354      |
| employees           | 205,979,648  | 234,134,441  | 440,114,089  |
| mysql               | 932,604      | 817,565      | 1,750,169    |
| performance_schema  | 0            | 489,543      | 489,543      |
| storage_test        | 32,768       | 123,382      | 156,150      |
| test_innodb         | 16,384       | 106,970      | 123,354      |
+---------------------+--------------+--------------+--------------+

Total database disk usage = 442,756,659 bytes or 422.25 MB

#...done.

Why? :)

In source code mysql-utilities-1.5.4/mysql/utilities/command/diskusage.py there is a code portion on line 400:
db_total = int(row[1]) + misc_files/ should be replaced by db_total = misc_files

And the Total size of all databases on line 403:
total += data_size + misc_files/ should be replaced by total += dbdir_size

Patch like diff:

diskusage.py
400c400
<             db_total = dbdir_size
---
>             db_total = int(row[1]) + misc_files
403c403
<         total += dbdir_size
---
>         total += data_size + misc_files

After changing source code it will give a correct result:

[root@centos7 ~]# /home/sh/test_utils/bin/mysqldiskusage --server=root:12345@localhost:3306
WARNING: Using a password on the command line interface can be insecure.
# Source on localhost: ... connected.
# Database totals:
+---------------------+--------------+
| db_name             |       total  |
+---------------------+--------------+
| data_test           | 106,970      |
| employees           | 234,134,441  |
| mysql               | 1,604,191    |
| performance_schema  | 489,543      |
| storage_test        | 123,382      |
| test_innodb         | 106,970      |
+---------------------+--------------+

Total database disk usage = 236,565,497 bytes or 225.61 MB

#...done.

Another thing i want to share that there 2 functions which names confuse me a little: (line 389,390)

dbdir_size = _get_folder_size(os.path.join(datadir, db_dir))
misc_files = _get_db_dir_size(os.path.join(datadir, db_dir))

_get_folder_size = Get size of folder (directory) and all its contents.
_get_db_dir_size = Calculate total disk space used for a given directory.
    This method will sum all files in the directory except for the
    MyISAM files (.myd, .myi)

_get_db_dir_size -> will calculate .ibd files as well. but why? it should calculate only .frm , .opt or if exists some other files. If it is true, then add:

diskusage.py
140c140
<         if ext.upper() not in (".MYD", ".MYI",".IBD") and \
---
>         if ext.upper() not in (".MYD", ".MYI") and \

And now it returns truly needed results:

[root@centos7 ~]# /home/sh/test_utils/bin/mysqldiskusage --server=root:12345@localhost:3306 -v
WARNING: Using a password on the command line interface can be insecure.
# Source on localhost: ... connected.
# Database totals:
+---------------------+--------------+-------------+--------------+
| db_name             |   data_size  | misc_files  |       total  |
+---------------------+--------------+-------------+--------------+
| data_test           | 16,384       | 8,666       | 106,970      |
| employees           | 205,979,648  | 56,233      | 234,134,441  |
| mysql               | 932,604      | 244,125     | 1,604,191    |
| performance_schema  | 0            | 489,543     | 489,543      |
| storage_test        | 32,768       | 8,694       | 123,382      |
| test_innodb         | 16,384       | 8,666       | 106,970      |
+---------------------+--------------+-------------+--------------+

Total database disk usage = 236,565,497 bytes or 225.61 MB

#...done.

How to repeat:
Install MySQL Utilities 1.5.4 from YUM repo and repeat steps from description.

Suggested fix:
Patch like code changes are provided.
[15 Apr 2015 13:36] Shahriyar Rzayev
Changed Affected version number.
[15 Apr 2015 15:51] Shahriyar Rzayev
Fixed typo
[18 May 2015 6:17] Shahriyar Rzayev
Patch for diskusage.py file

Attachment: mysql_bug_76703.patch (text/x-patch), 1022 bytes.

[16 Feb 2016 23:25] Philip Olson
I'm not sure exactly what changed here, or if the patch was applied without other changes, so changing status to "Need Doc Info".
[18 Feb 2016 6:20] Shahriyar Rzayev
I have seen the change of Status as -> "Patch approved" -> "Documenting"
[16 Mar 2016 23:54] Philip Olson
Fixed as of the upcoming MySQL Utilities 1.6.3, and here is the changelog entry:

The mysqldiskusage utility incorrectly calculated disk
usage, leading to an inaccurate result.

Thank you for the bug report.