Bug #36783 save 56% disc space for int. and a little for txt cols by new column definitions
Submitted: 18 May 2008 12:40 Modified: 13 Jan 2009 16:48
Reporter: Vorname Nachname Email Updates:
Status: Verified Impact on me:
None 
Category:MySQL Server: Data Types Severity:S4 (Feature request)
Version: OS:Any
Assigned to: CPU Architecture:Any
Tags: column definitions, disc space, INTEGER, text

[18 May 2008 12:40] Vorname Nachname
Description:
By not using the old column definitions like tinyint, smallint, bigint and so on but new ones you can save about 56% disc space which make your database queries faster.

Suggestion for new definitions:

------------------------------------------
old: tinyint
new: int_1

old signed values: -128 - 127
new signed values:    0 -   1

old unsigned values: 0 - 255
new unsigned values: 0 -   1

old disc/memory space requirement: 1 byte
new disc/memory space requirement: 1 bit

you save: 87.5%
------------------------------------------
old: tinyint
new: int_2

old signed values: -128 - 127
new signed values:   -2 -   1

old unsigned values: 0 - 255
new unsigned values: 0 -   3

old disc/memory space requirement: 1 byte
new disc/memory space requirement: 2 bit

you save: 75%
------------------------------------------
old: tinyint
new: int_3

old signed values: -128 - 127
new signed values:   -4 -   3

old unsigned values: 0 - 255
new unsigned values: 0 -   7

old disc/memory space requirement: 1 byte
new disc/memory space requirement: 3 bit

you save: 62.5%
------------------------------------------
old: tinyint
new: int_4

old signed values: -128 - 127
new signed values:   -8 -   7

old unsigned values: 0 - 255
new unsigned values: 0 -  15

old disc/memory space requirement: 1 byte
new disc/memory space requirement: 4 bit

you save: 50%
------------------------------------------
old: tinyint
new: int_5

old signed values: -128 - 127
new signed values:  -16 -  15

old unsigned values: 0 - 255
new unsigned values: 0 -  31

old disc/memory space requirement: 1 byte
new disc/memory space requirement: 5 bit

you save: 37.5%
------------------------------------------
old: tinyint
new: int_6

old signed values: -128 - 127
new signed values:  -32 -  31

old unsigned values: 0 - 255
new unsigned values: 0 -  63

old disc/memory space requirement: 1 byte
new disc/memory space requirement: 6 bit

you save: 25%
------------------------------------------
old: tinyint
new: int_7

old signed values: -128 - 127
new signed values:  -64 -  63

old unsigned values: 0 - 255
new unsigned values: 0 - 127

old disc/memory space requirement: 1 byte
new disc/memory space requirement: 7 bit

you save: 12.5%
------------------------------------------
old: tinyint
new: int_8

old signed values: -128 - 127
new signed values: -128 - 127

old unsigned values: 0 - 255
new unsigned values: 0 - 255

old disc/memory space requirement: 1 byte
new disc/memory space requirement: 1 byte

you save: 0%

It's exactly the same as before. We can keep the old definitions and just handle "tinyint" as "int_8" to be downwards compatible.
------------------------------------------

------------------------------------------
old: smallint
new: int_9

old signed values: -32768 - 32767
new signed values:   -256 -   255

old unsigned values: 0 - 65535
new unsigned values: 0 -   511

old disc/memory space requirement: 2 bytes
new disc/memory space requirement: 9 bit

you save: 43.75%
------------------------------------------
old: smallint
new: int_10

old signed values: -32768 - 32767
new signed values:   -512 -   511

old unsigned values: 0 - 65535
new unsigned values: 0 -  1023

old disc/memory space requirement:  2 bytes
new disc/memory space requirement: 10 bit

you save: 37.5%
------------------------------------------
old: smallint
new: int_11

old signed values: -32768 - 32767
new signed values:  -1024 -  1023

old unsigned values: 0 - 65535
new unsigned values: 0 -  2047

old disc/memory space requirement:  2 bytes
new disc/memory space requirement: 11 bit

you save: 31.25%
------------------------------------------
old: smallint
new: int_12

old signed values: -32768 - 32767
new signed values:  -2048 -  2047

old unsigned values: 0 - 65535
new unsigned values: 0 -  4095

old disc/memory space requirement:  2 bytes
new disc/memory space requirement: 12 bit

you save: 25%
------------------------------------------
old: smallint
new: int_13

old signed values: -32768 - 32767
new signed values:  -4096 -  4095

old unsigned values: 0 - 65535
new unsigned values: 0 -  8191

old disc/memory space requirement:  2 bytes
new disc/memory space requirement: 13 bit

you save: 18.75%
------------------------------------------
old: smallint
new: int_14

old signed values: -32768 - 32767
new signed values:  -8192 -  8191

old unsigned values: 0 - 65535
new unsigned values: 0 - 16383

old disc/memory space requirement:  2 bytes
new disc/memory space requirement: 14 bit

you save: 12.5%
------------------------------------------
old: smallint
new: int_15

old signed values: -32768 - 32767
new signed values: -16384 - 16383

old unsigned values: 0 - 65535
new unsigned values: 0 - 32767

old disc/memory space requirement:  2 bytes
new disc/memory space requirement: 15 bit

you save: 6.25%
------------------------------------------
old: smallint
new: int_16

old signed values: -32768 - 32767
new signed values: -32768 - 32767

old unsigned values: 0 - 65535
new unsigned values: 0 - 65535

old disc/memory space requirement: 2 bytes
new disc/memory space requirement: 2 bytes

you save: 0%

It's exactly the same as before. We can keep the old definitions and just handle "smallint" as "int_16" to be downwards compatible.
------------------------------------------

And so on. If we allow any number behind the "int_" this change would make the integer columns compatible to any integer no matter how big the numbers in future will be. Think of "int_120" or "int_200" and so on.

The same could be done for text columns. Instead of forcing every developer to learn how long a "tinytext", a "text", a "mediumtext" and a "longtext" may be we could just write "text_17" instead. This would make column definitioning easier. Although by using "text_5" instead of "tinytext" you would save much less than in the integer examples.

But in no case you would need more disc/memory space than with the old definitions. In worst case you need exactly the same (see "tinyint"/"int_8" example).

How to repeat:
use old definitions

Suggested fix:
use new definitions
[18 May 2008 12:47] Vorname Nachname
oh sorry, new signed values of int_1 should be -1 and 0 instead of 0 and 1
[13 Jan 2009 16:48] Susanne Ebrecht
Many thanks for writing a feature request. We will discuss this.