Bug #28792 Compressing duplicate column values
Submitted: 30 May 2007 23:08 Modified: 31 May 2007 0:25
Reporter: Jason Shuler Email Updates:
Status: Verified Impact on me:
None 
Category:MySQL Server: DDL Severity:S4 (Feature request)
Version:5.x OS:Any
Assigned to: CPU Architecture:Any
Tags: compress, duplicate values, flexible enum

[30 May 2007 23:08] Jason Shuler
Description:
There are some cases where a column will contain a higher percentage of a given set of values - such that an ENUM data type would be appropriate, except that the column must also be allowed to contain arbitrary values.

For example, say we have a table that stores time-statistics that are collected every 10 seconds. Several of these columns must be full-sized integers because there are times that the numbers become quite large.
However, many of these columns are usually 0. That is, over 50% of the rows contain a value of 0.

What would be ideal in this case is to use an ENUM type that also permits arbitrary values.

In Teradata, this feature is called compression, and can be set up for any data type when creating a table:

create table ex_1 (
  col_1 INT COMPRESS(0,1),
  col_2 VARCHAR(10) COMPRESS('','A','B','C')
);

In this case, when the value of col_1 is 0 or 1, rather than storing a complete integer, a reference to the compressed value list is stored.

In a large table with many columns and rows, the space savings would be immense, and I would imagine the overhead to be low. Much of the functionality already exists with the ENUM column type.

How to repeat:
n/a

Suggested fix:
One possibility would be to create a modifier for the ENUM type that allows arbitrary values.

Another is to implement a COMPRESS() column property.
[31 May 2007 0:25] MySQL Verification Team
Thank you for the bug feature request.