MySQL Bugs: #11644: large table with 4 byte pointers

Bug #11644	large table with 4 byte pointers
Submitted:	29 Jun 2005 17:10	Modified:	16 Jul 2005 20:38
Reporter:	Martin Friebe (Gold Quality Contributor) (OCA)	Email Updates:
Status:	Verified	Impact on me:	None
Category:	MySQL Server: MyISAM storage engine	Severity:	S4 (Feature request)
Version:	*	OS:	Any (*)
Assigned to:		CPU Architecture:	Any

Description:
This is more an idea, for a future release, than a feature request. (If someone@mysql, finds the idea interesting. or maybe it is anyway allready planned). And sorry, but I couldnt find a better place than the bug system.

currently you can choose for tables with variable row length to be less than 4G in size, or to have 8 Byte pointers in indexes. It is possible to increase the table size, with 4 Byte Pointers. The penality is a slight increase of the datasize, which could in some cases even be avoided.

take the following example:
id integer not null primary key,
birthdate integer not null,
lastname varchar(99),
firstname varchar(99),

the row length will be a minimum of 4 + 4 + 1 (len of varchar) +1 + 1 (for the null indicators) =11 bytes long, possible longer.

if the minrowsize = 11 was stored on the table, all index pointers could be multiplied by 11. The table could now have 44 GB of data size.

Of course a previous record could end anywhere before an 11th byte, and we dont want to fill up to 10 bytes with nothing:

Assume the 1st record ends on byte 26, and the 2nd record start at byte 27. the 2nd record with 11 bytes min length is warranteed to include possition 33. so the index pointer would be 3 (3*11 = 33). at byte 33 there would be a byte inserted into the record indicating, how many bytes earlier the record really startes (this would be 6)

so the raw table looks like
position data
26 end of record
27 record2
27-31 id
32,34-37 birthdate
38 lastname
....

if the record is read with the pointer 3 (pointing to 33). The record can start anywhere from position 23 (33-10) onwards, so reading starts from there. Evaluting the offset at pos 33, the real start can be found. while reading the record get rid of the byte at pos 33. This leaves the record as usual.

while reading records sequentialy without index, it would still be known that the "offset byte" is at the next pos, that can be divided by 11.

On a table like the above the datasize would increase roughly 9%, but indexes could be kept small.

In order to minimize growing datasize, it would be possible to use the remining bits of the "offset byte" for existing data (min-row-len 11, need 4 bit, the other 4 bit could be used for null indicators. In this case for the example above, the size of the datafile would not grow

min-row-size could also be configurable to higher values, if the person setting it knows, he will not have rows that are shorter, or accepts that shorter rows might get fill bytes.

How to repeat:
-

Suggested fix:
-

This is an interesting idea.
But it changes MyISAM file format in an incompatible way, so we need to add versioning information to MYI files first (which would be a nice feature, it is in discussion now - it would simplify lots of things)