HEAVEN
A Hierarchical Storage and Archive Environment for Multidimensional Array-DBMS
Bernd Reiner
© 2004 FORWISS / [email protected]
EDBT, March 2004, Slide 2
Array DBMS
Multidimensional object (MDD): set of multidimensional tiles (tile = subarray)
Access to subsets of MDDs
Multidimensional query language RasQL
Tiles stored in relational DBMS BLOBs
Multidimensional index (R+ tree)
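Because an MDD is stored as tiles, a subset query only has to read the tiles that overlap the requested region. A minimal sketch, assuming a regular, aligned tiling that starts at coordinate 0; RasDaMan also supports other tiling schemes and locates tiles through its R+ tree index, which this simple arithmetic replaces. The helper name `tiles_for_query` is hypothetical.

```python
from itertools import product


def tiles_for_query(lo, hi, tile_shape):
    """Return grid indices of all tiles that intersect the query box.

    lo, hi: inclusive lower/upper cell coordinates of the query, one per dimension.
    tile_shape: extent of each tile per dimension.
    Hypothetical helper: assumes a regular tiling aligned at coordinate 0.
    """
    if not (len(lo) == len(hi) == len(tile_shape)):
        raise ValueError("dimension mismatch")
    # Per dimension: index of the first and last tile touched by the query interval.
    ranges = [range(l // t, h // t + 1) for l, h, t in zip(lo, hi, tile_shape)]
    # The Cartesian product of these index ranges is the set of tiles to read.
    return list(product(*ranges))


# 2-D example: 100 x 100 tiles, query covering cells [150..320] x [0..99]
print(tiles_for_query((150, 0), (320, 99), (100, 100)))
```

Only three tiles are read for this query, regardless of how large the whole MDD is.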
Slide 3
Motivation
Increasing amounts of data (up to petabytes)
Hard disks are too small/expensive to hold hundreds of terabytes
Typically, data is stored as files on Hierarchical Storage Management systems (HSM systems, e.g. tape libraries)
DBMS used only for metadata
With the multidimensional array DBMS RasDaMan, only subsets need to be transferred instead of whole MDDs
Include archived data in DBMS data access
Slide 4
System Architecture
[Figure: system architecture. Components: Client; RasDaMan Server (multidimensional array DBMS) with Tertiary Storage Manager and File Storage Manager; relational DBMS (Oracle / DB2) on HDD; Hierarchical Storage Management System (HSM) with cache; offline storage. Labels: RasQL, SQL, import/export, migrate/stage. Storage levels: online, nearline, offline.]
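The HSM in this architecture stages data from tape into a disk cache, and caching is one of the optimizations on the next slide. A minimal sketch of such a staging cache, assuming LRU replacement and a hypothetical `read_from_tape` callback that stands in for a staging operation; neither is taken from HEAVEN itself.

```python
from collections import OrderedDict
from typing import Callable


class StagingCache:
    """LRU cache of staged objects in front of a slow tertiary-storage read.

    Hypothetical sketch: `read_from_tape` stands in for staging a file from
    the HSM; LRU replacement is an assumption, not necessarily HEAVEN's policy.
    """

    def __init__(self, capacity: int, read_from_tape: Callable[[str], bytes]):
        self.capacity = capacity            # maximum number of cached objects
        self.read_from_tape = read_from_tape
        self.entries = OrderedDict()        # key -> data, least recently used first
        self.tape_reads = 0                 # counts expensive tertiary accesses

    def get(self, key: str) -> bytes:
        if key in self.entries:
            self.entries.move_to_end(key)   # hit: mark as most recently used
            return self.entries[key]
        data = self.read_from_tape(key)     # miss: stage the object from tape
        self.tape_reads += 1
        self.entries[key] = data
        if len(self.entries) > self.capacity:
            self.entries.popitem(last=False)  # evict the least recently used entry
        return data


cache = StagingCache(2, lambda k: k.encode())
for key in ["ST-1", "ST-2", "ST-1", "ST-3", "ST-2"]:
    cache.get(key)
print(cache.tape_reads)
```

Five requests cause four tape reads here; the repeated access to ST-1 is served from the cache.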
Slide 5
Optimization
Minimization of tape access operations: tiling, object framing, caching
Minimization of media exchange operations: clustering, ordered query queue, “lazy eject”
Minimization of positioning time: clustering, ordered query queue
Parallelization: inter- and intra-object parallelization
Publications: VLDB 2002, DEXA 2002, DEXA 2003
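The ordered query queue and “lazy eject” both aim at fewer media exchanges and shorter positioning. A minimal sketch of the idea, assuming pending requests are hypothetical `(tape id, position)` pairs and that reading forward along a tape is cheapest; it is a simplification, not the published HEAVEN scheduler. Requests are grouped per tape and sorted by position, and the currently mounted tape is served first instead of being ejected.

```python
from collections import defaultdict
from typing import List, Optional, Tuple

Request = Tuple[str, int]  # (tape id, position on tape); hypothetical format


def order_queue(requests: List[Request], mounted: Optional[str] = None) -> List[Request]:
    """Group pending requests by tape and sort each group by tape position.

    The currently mounted tape is served first ("lazy eject": it stays mounted
    while requests for it are pending); other tapes follow in id order.
    """
    by_tape = defaultdict(list)
    for tape, pos in requests:
        by_tape[tape].append(pos)
    tapes = sorted(by_tape, key=lambda t: (t != mounted, t))
    return [(t, p) for t in tapes for p in sorted(by_tape[t])]


def media_exchanges(schedule: List[Request], mounted: Optional[str] = None) -> int:
    """Count how often a different tape has to be loaded."""
    exchanges = 0
    current = mounted
    for tape, _ in schedule:
        if tape != current:
            exchanges += 1
            current = tape
    return exchanges


pending = [("B", 30), ("A", 50), ("B", 10), ("A", 20), ("C", 5)]
plan = order_queue(pending, mounted="B")
print(plan)
print(media_exchanges(pending, "B"), media_exchanges(plan, "B"))
```

With tape B mounted, serving the requests in arrival order needs four media exchanges; the ordered queue needs two.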
Slide 6
Export to Tertiary Media
[Figure: Super-Tile algorithm. Tiles (Tile 1 to Tile 4) and Super-Tiles (ST-1 to ST-4) are exported to magnetic tape; legend: one tile.]
Preserves multidimensional clustering on tape
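Grouping neighbouring tiles into one Super-Tile keeps tiles that are close in space close on tape. A simplified sketch, assuming a regular 2-D tile grid and fixed blocks of `bx × by` tiles per Super-Tile; this is an illustration of the clustering idea, not the published Super-Tile algorithm, and both function names are hypothetical.

```python
from typing import Dict, List, Tuple

Tile = Tuple[int, int]  # (row, column) index in a regular tile grid


def group_super_tiles(rows: int, cols: int, bx: int, by: int) -> List[List[Tile]]:
    """Partition a rows x cols tile grid into blocks of at most bx x by tiles.

    Each block becomes one Super-Tile, i.e. one contiguous unit on tape
    (hypothetical simplification). Super-Tiles are emitted row-major by block.
    """
    super_tiles = []
    for r0 in range(0, rows, bx):
        for c0 in range(0, cols, by):
            block = [(r, c)
                     for r in range(r0, min(r0 + bx, rows))
                     for c in range(c0, min(c0 + by, cols))]
            super_tiles.append(block)
    return super_tiles


def tape_layout(super_tiles: List[List[Tile]]) -> Dict[Tile, int]:
    """Map each tile to the number of the Super-Tile (ST-1, ST-2, ...) holding it."""
    return {tile: n for n, st in enumerate(super_tiles, start=1) for tile in st}


sts = group_super_tiles(4, 4, 2, 2)
print(len(sts), sts[0])
```

A query on the upper-left quadrant of this 4 × 4 grid touches only ST-1, so a single contiguous tape region is read.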
Slide 7
Import from Tertiary Media
[Figure: compute the required Super-Tiles, import the Super-Tiles, display the result in the RasDaMan viewer.]
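The first import step computes which Super-Tiles a query needs. A minimal sketch, assuming each Super-Tile is described by a hypothetical inclusive bounding box and that the input list is in the order the Super-Tiles were written to tape; only overlapping Super-Tiles are imported.

```python
from typing import List, Tuple

# (inclusive low corner, inclusive high corner); hypothetical representation
Box = Tuple[Tuple[int, ...], Tuple[int, ...]]


def intersects(a: Box, b: Box) -> bool:
    """True if two axis-parallel boxes overlap in every dimension."""
    (alo, ahi), (blo, bhi) = a, b
    return all(l1 <= h2 and l2 <= h1 for l1, h1, l2, h2 in zip(alo, ahi, blo, bhi))


def super_tiles_to_import(query: Box, super_tiles: List[Tuple[str, Box]]) -> List[str]:
    """Names of the Super-Tiles whose bounding box overlaps the query.

    `super_tiles` is assumed to be listed in tape order, so the result
    is already in the order in which the tape would be read.
    """
    return [name for name, box in super_tiles if intersects(query, box)]


# Four Super-Tiles covering a 400 x 400 cell object
catalog = [
    ("ST-1", ((0, 0), (199, 199))),
    ("ST-2", ((0, 200), (199, 399))),
    ("ST-3", ((200, 0), (399, 199))),
    ("ST-4", ((200, 200), (399, 399))),
]
print(super_tiles_to_import(((150, 150), (250, 180)), catalog))
```

The query region crosses the border between ST-1 and ST-3, so only these two Super-Tiles are staged.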
Slide 9
Data Retrieval from HSM
Partitioning of data: random
[Figure: data import TCT, object mpim4d (1.35 GByte). Time (sec.) per Super-Tile No. (48 MByte each, Super-Tiles 1 to 28), y-axis 0 to 140 s, split into positioning and reading the Super-Tiles from DLT magnetic tape. DLT4000 average access time: 68 s.]
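A back-of-envelope check under a simplifying assumption: if, with random partitioning, every one of the 28 Super-Tiles costs one repositioning at the DLT4000 average access time of 68 s, positioning alone amounts to about half an hour. The same numbers give 28 × 48 MByte = 1344 MByte, consistent with the stated object size of roughly 1.35 GByte. The one-seek-per-Super-Tile model is an assumption, not a measured value.

```python
SUPER_TILES = 28      # Super-Tiles of object mpim4d (from the chart)
SUPER_TILE_MB = 48    # size of one Super-Tile in MByte
AVG_ACCESS_S = 68     # DLT4000 average access time in seconds

object_mb = SUPER_TILES * SUPER_TILE_MB        # total data to read
positioning_s = SUPER_TILES * AVG_ACCESS_S     # assumes one average seek per Super-Tile
print(object_mb, positioning_s, round(positioning_s / 60, 1))
```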
Slide 10
Data Retrieval from HSM
Partitioning of data: Super-Tile clustering
[Figure: data import TCT, object mpim4d (1.35 GByte). Time (sec.) per Super-Tile No. (48 MByte each, Super-Tiles 1 to 28), y-axis 0 to 90 s, split into positioning and reading the Super-Tiles from DLT magnetic tape. DLT4000 average access time: 68 s.]
Slide 11
Clustering vs. Random order
[Figure: total data import time for object mpim4d (1.35 GByte), in minutes.]
Super-Tile clustering: 19.66 min
Random order: 43.78 min
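The two totals on this slide (19.66 and 43.78 minutes) translate into a speedup and a relative saving; the arithmetic below uses only those two values.

```python
random_min = 43.78     # total import time, random order (minutes)
clustered_min = 19.66  # total import time, Super-Tile clustering (minutes)

speedup = random_min / clustered_min        # how many times faster clustering is
saved_min = random_min - clustered_min      # absolute time saved
reduction = 1 - clustered_min / random_min  # relative reduction of import time
print(f"{speedup:.2f}x faster, {saved_min:.2f} min saved, {reduction:.0%} less time")
```

Super-Tile clustering is about 2.2 times faster and cuts the import time of mpim4d by roughly 55 %.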