Description:
Dear experts, i have recently revealed MySQL disk usage topic.
Decide to test mysqldiskusage as well.
First of all, little theory:
"InnoDB's data storage model uses 'spaces' often called 'tablespaces' in the context of MySQL. Each space in InnoDB is divided into pages, normally 16 Kib or 16384 bytes in size."
Let's create empty table:
mysql> create table t1(id int not null);
Query OK, 0 rows affected (0.08 sec)
[root@centos7 test_innodb]# ls -lt
total 208
-rw-rw----. 1 mysql mysql 98304 Apr 15 17:42 t1.ibd
-rw-rw----. 1 mysql mysql 8556 Apr 15 17:42 t1.frm
-rw-rw----. 1 mysql mysql 65 Apr 9 11:37 db.opt
The "Data size" of table (.ibd) file is 98304 bytes.
98304/16384 = 6 -> So in fact InnoDB reserves 6 pages.
If we query this information from information_schema.tables:
mysql> select data_length from information_schema.tables where table_schema='test_innodb' and table_name='t1';
+-------------+
| data_length |
+-------------+
| 16384 |
+-------------+
1 row in set (0.00 sec)
So now we now that in fact our table uses first 16 Kib page and another 5 is reserved. In fact it is allocated by OS.
For this reason information_schema should not be used for actual calculating disk usage.
Keeping this in mind while testing mysqldiskusage:
[root@centos7 ~]# mysqldiskusage --server=root:12345@localhost:3306
WARNING: Using a password on the command line interface can be insecure.
# Source on localhost: ... connected.
# Database totals:
+---------------------+--------------+
| db_name | total |
+---------------------+--------------+
| data_test | 123,354 |
| employees | 440,114,089 |
| mysql | 1,750,169 |
| performance_schema | 489,543 |
| storage_test | 156,150 |
| test_innodb | 123,354 |
+---------------------+--------------+
Total database disk usage = 442,756,659 bytes or 422.25 MB
#...done.
In fact employee database size is:
[root@centos7 ~]# du -sb /var/lib/mysql/employees/
234134441 /var/lib/mysql/employees/
I have figure out that 440114089 bytes is sum of data_size + misc_files:
[root@centos7 ~]# mysqldiskusage --server=root:12345@localhost:3306 -v
WARNING: Using a password on the command line interface can be insecure.
# Source on localhost: ... connected.
# Database totals:
+---------------------+--------------+--------------+--------------+
| db_name | data_size | misc_files | total |
+---------------------+--------------+--------------+--------------+
| data_test | 16,384 | 106,970 | 123,354 |
| employees | 205,979,648 | 234,134,441 | 440,114,089 |
| mysql | 932,604 | 817,565 | 1,750,169 |
| performance_schema | 0 | 489,543 | 489,543 |
| storage_test | 32,768 | 123,382 | 156,150 |
| test_innodb | 16,384 | 106,970 | 123,354 |
+---------------------+--------------+--------------+--------------+
Total database disk usage = 442,756,659 bytes or 422.25 MB
#...done.
Why? :)
In source code mysql-utilities-1.5.4/mysql/utilities/command/diskusage.py there is a code portion on line 400:
db_total = int(row[1]) + misc_files/ should be replaced by db_total = misc_files
And the Total size of all databases on line 403:
total += data_size + misc_files/ should be replaced by total += dbdir_size
Patch like diff:
diskusage.py
400c400
< db_total = dbdir_size
---
> db_total = int(row[1]) + misc_files
403c403
< total += dbdir_size
---
> total += data_size + misc_files
After changing source code it will give a correct result:
[root@centos7 ~]# /home/sh/test_utils/bin/mysqldiskusage --server=root:12345@localhost:3306
WARNING: Using a password on the command line interface can be insecure.
# Source on localhost: ... connected.
# Database totals:
+---------------------+--------------+
| db_name | total |
+---------------------+--------------+
| data_test | 106,970 |
| employees | 234,134,441 |
| mysql | 1,604,191 |
| performance_schema | 489,543 |
| storage_test | 123,382 |
| test_innodb | 106,970 |
+---------------------+--------------+
Total database disk usage = 236,565,497 bytes or 225.61 MB
#...done.
Another thing i want to share that there 2 functions which names confuse me a little: (line 389,390)
dbdir_size = _get_folder_size(os.path.join(datadir, db_dir))
misc_files = _get_db_dir_size(os.path.join(datadir, db_dir))
_get_folder_size = Get size of folder (directory) and all its contents.
_get_db_dir_size = Calculate total disk space used for a given directory.
This method will sum all files in the directory except for the
MyISAM files (.myd, .myi)
_get_db_dir_size -> will calculate .ibd files as well. but why? it should calculate only .frm , .opt or if exists some other files. If it is true, then add:
diskusage.py
140c140
< if ext.upper() not in (".MYD", ".MYI",".IBD") and \
---
> if ext.upper() not in (".MYD", ".MYI") and \
And now it returns truly needed results:
[root@centos7 ~]# /home/sh/test_utils/bin/mysqldiskusage --server=root:12345@localhost:3306 -v
WARNING: Using a password on the command line interface can be insecure.
# Source on localhost: ... connected.
# Database totals:
+---------------------+--------------+-------------+--------------+
| db_name | data_size | misc_files | total |
+---------------------+--------------+-------------+--------------+
| data_test | 16,384 | 8,666 | 106,970 |
| employees | 205,979,648 | 56,233 | 234,134,441 |
| mysql | 932,604 | 244,125 | 1,604,191 |
| performance_schema | 0 | 489,543 | 489,543 |
| storage_test | 32,768 | 8,694 | 123,382 |
| test_innodb | 16,384 | 8,666 | 106,970 |
+---------------------+--------------+-------------+--------------+
Total database disk usage = 236,565,497 bytes or 225.61 MB
#...done.
How to repeat:
Install MySQL Utilities 1.5.4 from YUM repo and repeat steps from description.
Suggested fix:
Patch like code changes are provided.