Bug #74338 data_length (in SELECT FROM I_S.TABLEs and SHOW TABLE STATUS
Submitted: 12 Oct 2014 13:12 Modified: 17 Feb 2016 19:02
Reporter: Peter Laursen (Basic Quality Contributor) Email Updates:
Status: Closed Impact on me:
None 
Category:MySQL Server: Documentation Severity:S3 (Non-critical)
Version:any OS:Any
Assigned to: Daniel Price CPU Architecture:Any

[12 Oct 2014 13:12] Peter Laursen
Description:
http://bugs.mysql.com/report.php says simply: 
"Data_length: The length of the data file."

Since the term *file* is used it must be related to *storage requirements* not *memory requirement*. But I don't find that the numbers returned makes any sense at all.

How to repeat:
CREATE TABLE `t1` (
  `id` int(11) NOT NULL AUTO_INCREMENT,
  `txt` varchar(255) DEFAULT NULL,
  PRIMARY KEY (`id`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8

CREATE TABLE `t2` (
  `id` int(11) NOT NULL AUTO_INCREMENT,
  `txt` char(255) DEFAULT NULL,
  PRIMARY KEY (`id`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8

-- insert 50 rows to each with just an 'a' in the `txt` column and next 

SELECT `TABLE_NAME`, `DATA_LENGTH` FROM  `information_schema`.`TABLES` WHERE `TABLE_SCHEMA` = 'db';

/*returns

TABLE_NAME  DATA_LENGTH  
----------  -------------
t1                  16384
t2                  16384
*/

I expected it to return a lower value with the VARCHAR table than the CHAR table as each CHAR value will require much more storage than each VARCHAR.

The numbers here - not surprisingly - also don't match at all what I find the file system.

As a finalizing joke the value "62080" is returend for a similar MEMORY table.  And there is no data on disk for that table (I did not try BLACKHOLE).

Suggested fix:
Does the metrics here have any reasonable meaning with non-MyISAM tables? I doubt. But if it does it should be elaborated in docs how it is calculated. I think it assumes MyISAM?

(inspired by http://www.percona.com/blog/2014/10/10/mysql-compression-compressed-and-uncompressed-data-...)
[13 Oct 2014 14:08] Hartmut Holzgraefe
The meaning of Data_length (and Index_length and Data_free) is storage engine specific. Unfortunately the I_S.TABLES manual page

  http://dev.mysql.com/doc/refman/5.6/en/show-table-status.html

only has minimal information about these fields and the storage
engine specific meanings.

The SHOW TABLE STATUS page has more information. It first 
just says "The length of the data file." but has more detailed 
information near the end of the page, e.g.:

  "For MEMORY tables, the Data_length, Max_data_length, and 
   Index_length values approximate the actual amount of allocated 
   memory. The allocation algorithm reserves memory in large amounts 
   to reduce the number of allocation operations."

The main entry on Data_length should probably be changed
to "usually the length of the table data file, but see
notes further down for storage engine specific differences
in meaning" or so.

What I'm also missing is that with InnoDB (file_per_table
doesn't matter in this case) the amount shown is actually
the size of pages allocated to this table so that it is
always a multiple of the page size.

In your example the 50 rows of "a" fit into the InnoDB 
VARCHAR table just fine. 

With CHAR(255) they shouldn't fit, but AFAIR with InnoDB
this internally is also stored with variable / real and
not fixed length and is converted from/to fixed/variable
format on the fly on the SQL <-> engine boundary ...

MEMORY on the other hand always stores with fixed length,
even for VARCHAR, so 50 times "a" as VARCHAR(255)-utf8
becomes 50x255x3 already ... that would be > 30KB alone.

62KB sounds a bit high, but may be due to the MEMORY allocation
strategy as quoted from the SHOW TABLE STATUS page above:

"The allocation algorithm reserves memory in large amounts 
   to reduce the number of allocation operations"

So this is documentation issue for sure, but I don't see 
any jokes in it ...

--
Hartmut "not Oracle staff anymore" Holzgraefe <hartmut@mariadb.com>
[14 Oct 2014 11:19] Peter Laursen
In addition to docs improvement I think it would be nice if the statements returned some metrics that could be used for calculation of the *degree of the utilization of tablespaces* or the *amount of free space inside tablespaces (what is basically the same, I think). As the server is able to read from and write to the part of the fle system where tablespaces reside, I think it could also get relevant information from the file system without any need for more privileges.

(and besides I still find the way the statements are implemented for MEMORY funny, no matter what documentation says. Except for .frm there is nothing on disk)
[14 Oct 2014 15:14] MySQL Verification Team
This bug points to true lacks in the documentation, so it is fully verified, primarily as a documentation issue. There is also a feature request to be considered, which is complex due to the diversity of storage engines available.
[17 Feb 2016 19:02] Daniel Price
Posted by developer:
 
SHOW TABLE STATUS documentation was updated to reflect storage engine differences for Data_length and Index_length fields. A reference to storage engine notes at the bottom of the section was also added to field descriptions, where applicable. 

Changes should appear online within 24 hours.

Thank you for the bug report.