Bug #100149 Request for method to get disk space per InnoDB table
Submitted: 8 Jul 2020 3:12 Modified: 13 Jul 2020 15:29
Reporter: Mark Callaghan Email Updates:
Status: Verified Impact on me:
None 
Category:MySQL Server: InnoDB storage engine Severity:S4 (Feature request)
Version:8.0.20 OS:Any
Assigned to: CPU Architecture:Any

[8 Jul 2020 3:12] Mark Callaghan
Description:
This is a request for a method to get the amount of disk space per InnoDB table, including indexes. In addition I want this method to be simple, reliable and efficient. The existing methods are simple but neither reliable nor efficient.

They aren't efficient because they require an entry in the table cache so if there are many tables there can be much churn in the table cache and more CPU/IO overhead in creating those entries.

They aren't reliable because:
1) Data is only provided for entries in the table cache. See "InnoDB tables are represented in this view if they have been opened since the last server restart and have not aged out of the table cache." as explained in: 
https://dev.mysql.com/doc/refman/8.0/en/information-schema-innodb-tablestats-table.html

The workaround for this is to run "select * from $foo limit 1" for a table before trying to get the size, but that is a lousy workaround.

2) Even when an entry is in the cache the values for innodb_tablestats clust_index_size and other_index size are zero after ANALYZE TABLE has been run per "ANALYZE TABLE clears table statistics from the INFORMATION_SCHEMA.INNODB_TABLESTATS table and sets the STATS_INITIALIZED column to Uninitialized. Statistics are collected again the next time the table is accessed." as explained in:
https://dev.mysql.com/doc/refman/8.0/en/analyze-table.html

I am not sure this is documented, but in some of my workflows the values are also zero immediately after creating a secondary index.

Maybe it is about to be removed, but index_length in SHOW TABLE STATUS is frequently zero for tables that have secondary indexes.

How to repeat:
AFAIK code works as documented so there isn't a bug. I just doubt the feature is usable today.

But a simple repro is create a table, insert some data, run ANALYZE and then check the values in innodb_tablestats
[8 Jul 2020 3:12] Mark Callaghan
This is my experience report after a few hours of struggling with this feature today...

tl;dr - if you want to monitor the size of InnoDB tables stay far away from IS.innodb_table_stats and SHOW TABLE STATUS. Just run "ls" on the *.ibd files and avoid the fun below.

I am confused while writing scripts that get the size of an InnoDB table.

Lets start with IS.innodb_tablestats. That works for 8.0, it was named IS.innodb_sys_tablestats in 5.7. Have fun adding if/then/else to scripts that support multiple versions.

https://dev.mysql.com/…/information-schema-innodb-tablestat…

OK, so I should run ANALYZE to populate the stats for tables I want to monitor?
"This value might be null if no statistics are collected yet for the table."

https://dev.mysql.com/…/information-schema-innodb-tablestat…

Watch out, ANALYZE TABLE $foo clears values in IS.innodb_tablestats for $foo and to get the values back you need to access $foo (select, etc). Now my script gets more ugly. Why is this exposed to the user?

https://dev.mysql.com/doc/refman/8.0/en/analyze-table.html

But 'select * from $foo limit 1' doesn't always work. In one case my workload is: load table, create secondary indexes, load more, read+write. Immediately after create index the other_index_size column is 0, run the "select * from $foo" query and it is still zero. Only if many queries + inserts are done will the column become non-zero for the entry in IS.innodb_tablestats.

Wait, maybe I can go back to using SHOW TABLE STATUS? Nope, Index_length is 0 there for my test tables that have much data and 3 secondary indexes.

It would be nice if clust_index_size and other_index_size were always truthy. These aren't statistics that require sampling from the table to compute.
[8 Jul 2020 5:45] MySQL Verification Team
Hello Mark,

Thank you for reasonable feature request!

Thanks,
Umesh
[10 Jul 2020 8:20] Frederic Descamps
Hi Mark, 

If you set information_schema_stats_expiry to 0:

set global information_schema_stats_expiry=0;

then you can use the following query:

SELECT NAME, TABLE_ROWS, format_bytes(data_length) DATA_SIZE,        
      format_bytes(index_length) INDEX_SIZE,        
      format_bytes(data_length+index_length) TOTAL_SIZE,
      format_bytes(data_free) DATA_FREE,  FILE_SIZE,
      format_bytes(FILE_SIZE) 'FILE SIZE',
      format_bytes((FILE_SIZE/10 - (data_length/10 + index_length/10))*10) WASTED_SIZE   
FROM information_schema.TABLES as t 
JOIN information_schema.INNODB_TABLESPACES as it
  ON it.name = concat(table_schema,"/",table_name)  
ORDER BY (file_size) desc limit 5;
+-----------------+------------+-------------+------------+------------+------------+------------+------------+-------------+
| NAME            | TABLE_ROWS | DATA_SIZE   | INDEX_SIZE | TOTAL_SIZE | DATA_FREE  | FILE_SIZE  | FILE SIZE  | WASTED_SIZE |
+-----------------+------------+-------------+------------+------------+------------+------------+------------+-------------+
| imdb/cast_info  |   19974383 | 942.00 MiB  |    0 bytes | 942.00 MiB |    0 bytes | 2394947584 | 2.23 GiB   | 1.31 GiB    |
| imdb/movie_info |    8611299 | 1006.98 MiB | 188.73 MiB | 1.17 GiB   |    0 bytes | 1426063360 | 1.33 GiB   | 164.28 MiB  |
| big/testing     |   18086645 | 1.05 GiB    |    0 bytes | 1.05 GiB   |    0 bytes | 1145044992 | 1.07 GiB   | 20.02 MiB   |
| big/testing2    |    8658846 | 492.00 MiB  |    0 bytes | 492.00 MiB |    0 bytes |  541065216 | 516.00 MiB | 24.00 MiB   |
| big/pktest      |    1617798 | 225.00 MiB  | 135.00 MiB | 360.00 MiB |    0 bytes |  390070272 | 372.00 MiB | 12.00 MiB   |
+-----------------+------------+-------------+------------+------------+------------+------------+------------+-------------+
5 rows in set (0.10 sec)

[root@imac ~]# ls -l /var/lib/mysql/imdb/movie_info.ibd 
-rw-r----- 1 mysql mysql 1426063360 May 20 22:59 /var/lib/mysql/imdb/movie_info.ibd
[root@imac ~]# ls -l /var/lib/mysql/imdb/cast_info.ibd 
-rw-r----- 1 mysql mysql 2394947584 May 21 21:30 /var/lib/mysql/imdb/cast_info.ibd

Cheers,
[13 Jul 2020 15:29] Mark Callaghan
Thank you for the advice