Description:
Hi all!
I would like to expose two possible improves.
Mostly hard job for any database, engine, is to retrieve data from disk. Common databases uses different data types organized in tables. Mostly sqls only retrieves few fields from different tables. Datawarehouse spend a lot of time in aggregate functions and the complexity is in the different functions for each field. I’m newbie in Mysql and I missed this capability. Why organize data joined together if we use separately?
When databases only have few data types (strings, numbers, lobs, objects (geospatial, java-code, etc), (any more?)??) we only will have 4-6 table types.
Most efficient and used data are commonly numbers.
With a miscellaneous metadata indicating PK & rowid we can manage same structure as if we have multiple fields table.
Metadata
PK Table rowid
1 1 1
1 2 1
….
2 1 2
2 2 2
…
This internal table will describe multifields tables, standard point of view.
But monofield table only will have 2 fields rowid & data
Table1 Table2
Rowid field1 Rowid field1
1 222 1 245
2 112 2 13
Under this point of view, we can manage better, for example, privileges on data.
We can parallelize summary functions.
Specially in DWHs data will be continuos and will be managed better.
Consider that you partition data by weeks. One file for week and field. No matter if we have thousands of monofield tables, managing same data types by SO blocks will be faster.
Second idea, is to develop as soon as possible drivers for NVIDIA CUBA parallelism. Geospatial operations, aggregate operations, etc will take adventage of it, and Oracle Imperium will be submited!
thanks
Jorge
(Excuse my poor english)
How to repeat:
Hi all!
I would like to expose two possible improves.
Mostly hard job for any database, engine, is to retrieve data from disk. Common databases uses different data types organized in tables. Mostly sqls only retrieves few fields from different tables. Datawarehouse spend a lot of time in aggregate functions and the complexity is in the different functions for each field. I’m newbie in Mysql and I missed this capability. Why organize data joined together if we use separately?
When databases only have few data types (strings, numbers, lobs, objects (geospatial, java-code, etc), (any more?)??) we only will have 4-6 table types.
Most efficient and used data are commonly numbers.
With a miscellaneous metadata indicating PK & rowid we can manage same structure as if we have multiple fields table.
Metadata
PK Table rowid
1 1 1
1 2 1
….
2 1 2
2 2 2
…
This internal table will describe multifields tables, standard point of view.
But monofield table only will have 2 fields rowid & data
Table1 Table2
Rowid field1 Rowid field1
1 222 1 245
2 112 2 13
Under this point of view, we can manage better, for example, privileges on data.
We can parallelize summary functions.
Specially in DWHs data will be continuos and will be managed better.
Consider that you partition data by weeks. One file for week and field. No matter if we have thousands of monofield tables, managing same data types by SO blocks will be faster.
Second idea, is to develop as soon as possible drivers for NVIDIA CUBA parallelism. Geospatial operations, aggregate operations, etc will take adventage of it, and Oracle Imperium will be submited!
thanks
Jorge