Bug #36850 data type storage requirements
Submitted: 21 May 2008 11:59 Modified: 15 Oct 2009 10:41
Reporter: Roland Bouman Email Updates:
Status: Closed Impact on me:
Category:MySQL Server: Documentation Severity:S3 (Non-critical)
Version: OS:Any
Assigned to: MC Brown CPU Architecture:Any

[21 May 2008 11:59] Roland Bouman
The page http://dev.mysql.com/doc/refman/5.1/en/storage-requirements.html is becoming outdated. It is written from the perspective that data types somehow correspond to a particular amount of storage. 

This is not the case. Storage is in the domain of the storage engine. The storage engine chooses whatever way and whatever resources it needs to store data. This means that it is simply not possible to generically list the storage requirements per data type (even though there may be a certain common figure among storage engines)

There is a robust common factor though: the internal *runtime* representation of data. The amount of memory used to represent say, an INT is 4 bytes. 

Other parts in the doc mix up storage an internal representation too. For example: 

"The maximum size of a row in a MyISAM table is 65,535 bytes."

Even if this is the case, it is not relevant for this doc topic. What is higly relevent is that the maximum length of the internal buffer used to store row data is 64k. This is also why no table of whatever storage eninge can define rows that, as counted by summing the sizes of the runtime representations of the different data types, exceed 64 Kb row length.

How to repeat:

Suggested fix:
[21 May 2008 12:36] Paul DuBois
Actually, there is a row size limit of 65,535 bytes:

[21 May 2008 12:45] Roland Bouman
Paul, the point is that the current doc mentions MyISAM, which is irrelevant here. 

My initial report does mention the internal row size limit too, and that is indeed something that one needs to realize when discussing the internal size occupied by values of the different data types.

So, yes, there is an internal row size limit of 64k, but it has got nothing to do with "storage", and also not with a particular storage engine like MyISAM.
[21 May 2008 13:06] MySQL Verification Team
Thank you for the bug report.
[21 May 2008 19:43] Peter Laursen
It should also be clarified how that 'internal row size limit of 64k' applies to BLOB/TEXT types. For (disk) *storage* it does not apply (BLOB/TEXT type data are not stored within the table) but how for *memory*

If there is an 'internal row size limit of 64k' how then use a LONGBLOB (that to my best knowledge can be up to ~4 GB).

So I must assume that the 'internal row size limit of 64k' does not apply to BLOBs/TEXTs too?

Btw: also here http://bugs.mysql.com/bug.php?id=34689 i pointed to another paragraph that is also written with MyISAM in mind only.  That was *not a bug* however ... ;-)
[21 May 2008 19:47] Peter Laursen
Please ignore the btw: link was wrong and I am not able to find the right one!
[21 May 2008 19:52] Roland Bouman

good point. It would be nice to have the impact of blob storage noted here too. TO the best of my knowledge, BLOB data is stored "somewhere" in memory at runtime. The thing that is considered the "row" is limited to 64k. BLOB and TEXT fields write a pointer into that row buffer, and the pointer is the location where the data is at. 

Another way of putting it is to say that BLOBs and TEXT are stored "out of line" and other data is stored "inline".

I am not entirely sure if it works exactly like this, but this design allows you to have many rows in memory, without actually loading all the BLOBs in memory too. Instead, the BLOB data can be loaded only when the pertinent row is processed, which can then materialize the BLOB data.
[15 Oct 2009 10:41] MC Brown
I've reworded and expanded the content in this section to try and make it clearer what is going on and what is being described here, in addition to making some notes about the effects (And vagaries) of other storage engines and their impact on storage. 

Since it impossible to determine the storage requirements of engines that use inline compression (Archive, Falcon, Maria), or engines that use variable storage lengths according to the value being stored (most notable Falcon), adding more detail here is not practical. 

I have also added a separate note to the todo list so that we can at least attempt to describe these limits and storage requirements on an engine-by-engine basis in the future.