Toward Replication in Grids for Digital Libraries with Freshness and Correctness Guarantees*
Fuat Akal, Heiko Schuldt and Hans-Jörg Schek
<fuat.akal¦heiko.schuldt>@unibas.ch, [email protected]
University of Basel, Computer Science Department, Bernoullistr. 16, CH-4056 Basel, Switzerland
3rd VLDB Workshop on Data Management in Grids, Wien, Austria, 23 September 2007
* The work has been partly supported by the EU in the 6th framework programme within the project DILIGENT (DIgital Library Infrastructure on Grid ENabled Technology, contract No. IST-2003-004260).
Example Scenario
• Satellite pictures of the Mediterranean Sea are continuously taken and stored as complex documents in a Digital Library (DL).
• A typical activity is to generate periodic reports.
[Figure: Earth Observation documents, each shown with Image, Features, Storage Properties, and Metadata as XML; query types: Simple Boolean Queries and Image Similarity Queries.]
Metadata as XML:
<DIMAP_DOCUMENT>
  <DATASET>MER_RR__2P</DATASET>
  <INSTRUMENT>MER</INSTRUMENT>
  …
  <LONMIN_INT>17000</LONMIN_INT>
  <LATMIN_INT>12000</LATMIN_INT>
  <LONMAX_INT>22000</LONMAX_INT>
  <LATMAX_INT>13500</LATMAX_INT>
  <COVER_REGIONS>World</COVER_REGIONS>
  <OVERLAP_REGIONS>
    World Europe Bigger_Europe Smaller_Europe
    Mediterranean Iberia North_Atlantic Africa North_Africa Middle_East Portugal
  </OVERLAP_REGIONS>
  ...
</DIMAP_DOCUMENT>
Watching the Environment Closely
• Monitoring of the Mediterranean Sea
• There are some busy oil terminals in the region
– Oil tankers keep floating in the sea
– Potential oil spill into the sea
[Figure: a Data Grid holding Earth Observation data (satellite images, metadata, image features...), accessed by two scientists. Both are extremely concerned about the environment!]
Scientist 1 in Athens, Greece: „I am interested in Greek coasts as of last week“
Scientist 2 in Antalya, Turkey: „Fresh Turkish water please“
Desired Replica Management in the Grid
[Figure: a Data Grid holding satellite images, metadata, image features... on storage node 0, sn1, sn2 and sn3. Data sets shown: Entire Mediterranean, Turkish Coasts, and Greek Coasts (present twice, with a "Create Replica" action). Users: Scientist 1 in Athens, Greece; Scientist 2 in Antalya, Turkey; Scientist 3 in Thessaloniki, Greece.]
• Assumption: all data is collected at a single node, e.g. ESA in Italy
• Automatic selection of the best replica from the user's location
• Replication at a higher level, e.g. collections, subcollections
• Dynamic decision on when/where to create replicas, e.g. when sn1 becomes a hot spot
• Freshness and correctness guarantees on accessed data are ensured, e.g. „I want up-to-date data“
• Scientists may also 1) write back their reports and/or 2) create versions of documents or annotate them
• A sophisticated replication mechanism is required!
Outline
• Digital Library built atop a grid middleware
– Rich variety, structure, volume of data, e.g. traditional documents, complex multimedia objects
  • Simple Boolean queries as well as sophisticated multi-feature similarity queries
– Consistent access to up-to-date data may be essential
• Rest of the talk is...
– Replication in a DB Cluster
– Transition from a DB cluster to the Grid
– DILIGENT Replication Architecture
– Conclusions and Outlook
Replication in a DB Cluster (PDBREP)
• Available replication solutions for grid environments do not meet all of the desired properties just mentioned, e.g. freshness and correctness.
• In our previous work [VLDB2005], we devised a replication protocol for database clusters named PDBREP.
– It already provides some properties of what we call desired replica management in the Grid, e.g. freshness, higher replication granularity.
• Our approach in this work is to start with this protocol and adapt it to the grid.
PDBREP stands for PowerDB Replication, which was a project conducted at ETH Zurich, partially supported by Microsoft.
Replication in a DB Cluster (PDBREP)
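[Figure: PDBREP architecture. Clients send updates (U: update(a)) and queries (Q: query(a, b, fr)) to the Coordination Middleware. Updates are executed on the Update Node(s), which hold a,b,c,d, as w(a) and are recorded in the Global Log. A Continuous Update Broadcast delivers changes to the Local Update Queue of each of the Read-only Nodes, which hold subsets of the data (a,c; d; b,d; b,c). Queued changes are applied by Continuous Update Propagation Transactions (only when the node is idle) and by Refresh Transactions (on-demand). Queries use distributed query execution, e.g. r(a) and r(b) on different nodes.]
fr : freshness requirement, e.g. „I am fine with 2 minutes old data“, „I want fresh data“ etc.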
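The interplay between a freshness requirement and an on-demand refresh transaction can be sketched as follows. This is a minimal illustration under stated assumptions, not PDBREP's actual protocol: freshness is expressed as a maximum age in seconds, the broadcast is assumed to have already delivered every committed update to the local queue, and the names Update, ReadOnlyNode, propagate_one, refresh and max_age are hypothetical.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class Update:
    commit_time: float  # commit timestamp on the update node, in seconds
    key: str
    value: str


@dataclass
class ReadOnlyNode:
    data: dict = field(default_factory=dict)
    queue: deque = field(default_factory=deque)  # local update queue, in commit order
    applied_until: float = 0.0  # commit time of the last applied update

    def enqueue(self, update):
        """Receive a broadcast update (assumed to arrive in commit order)."""
        self.queue.append(update)

    def propagate_one(self):
        """Continuous propagation step: apply one queued update while idle."""
        if self.queue:
            u = self.queue.popleft()
            self.data[u.key] = u.value
            self.applied_until = u.commit_time

    def refresh(self, required_time):
        """On-demand refresh: apply every update committed up to required_time."""
        while self.queue and self.queue[0].commit_time <= required_time:
            self.propagate_one()

    def query(self, keys, now, max_age):
        """Read keys, tolerating data that is at most max_age seconds old."""
        self.refresh(now - max_age)
        return {k: self.data.get(k) for k in keys}


node = ReadOnlyNode()
node.enqueue(Update(commit_time=10.0, key="a", value="v1"))
node.enqueue(Update(commit_time=100.0, key="a", value="v2"))

relaxed = node.query(["a"], now=130.0, max_age=120.0)  # "2 minutes old is fine"
fresh = node.query(["a"], now=130.0, max_age=0.0)      # "I want fresh data"
```

In this sketch the relaxed query only forces the update committed at time 10 to be applied, while the query demanding fresh data drains the rest of the queue before reading.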
Transition to the Grid
[Figure: updates and queries pass through the Coordination Middleware to the Update Node(s), which maintain the Global Log, and to the Read-only Nodes.]
• We still distinguish update and read-only nodes
• Potentially several update nodes
– We still assume that all updates are serialized into a global log
• Broadcast of updates not feasible, replicas subscribe for changes instead
• Service Oriented Architecture
• More nodes which are heterogeneous
• Failures are more likely to happen
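Replacing broadcast with subscription can be sketched as below. This is a minimal in-process illustration, not the DILIGENT implementation: the class GlobalLog, its methods, and the callback-based delivery are assumptions made for the example, and a real grid deployment would deliver entries through services rather than function calls.

```python
from collections import defaultdict


class GlobalLog:
    """Serializes all updates and forwards each one only to subscribed replicas."""

    def __init__(self):
        self.entries = []                     # all updates, in serialization order
        self.subscribers = defaultdict(list)  # data set name -> delivery callbacks

    def subscribe(self, dataset, callback):
        """Register a replica's callback for changes to one data set."""
        self.subscribers[dataset].append(callback)

    def append(self, dataset, write):
        """Assign the next sequence number, log the update, notify subscribers."""
        seq = len(self.entries) + 1
        entry = (seq, dataset, write)
        self.entries.append(entry)
        for deliver in self.subscribers.get(dataset, []):
            deliver(entry)
        return seq


log = GlobalLog()
sn2_queue, sn4_queue = [], []              # hypothetical replica update queues
log.subscribe("DS2", sn2_queue.append)     # sn2 holds a replica of DS2
log.subscribe("DS4", sn4_queue.append)     # sn4 holds a replica of DS4

log.append("DS2", "w(x)")
log.append("DS4", "w(y)")
log.append("DS1", "w(z)")                  # nobody subscribed: logged, not delivered
```

Each replica receives only the changes for the data sets it holds, while the global log still records every update in a single serial order.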
Replication Granularity
• The unit of replication is called a DataSet (DS)
– A DataSet can be a collection of documents, a subcollection or as small as a single document.
– Rule based definition: information on a specific region, documents not older than 30 days, created between date1 and date2, etc...
[Figure: a collection of satellite images and its metadata, divided into Subcollection 1 and Subcollection 2, with DataSet1 and DS2 defined over it; example data sets: Entire Mediterranean, Turkish Coasts, Greek Coasts.]
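A rule-based DataSet definition can be pictured as a predicate over document metadata. The sketch below is only an illustration: the Document fields and the helper names in_region, not_older_than and all_of are hypothetical and do not come from DILIGENT.

```python
from dataclasses import dataclass
from datetime import date, timedelta
from typing import Callable


@dataclass(frozen=True)
class Document:
    doc_id: str
    regions: frozenset  # e.g. values taken from OVERLAP_REGIONS metadata
    created: date


@dataclass(frozen=True)
class DataSet:
    name: str
    rule: Callable[[Document], bool]  # membership rule

    def members(self, docs):
        """Return the documents that currently satisfy the rule."""
        return [d for d in docs if self.rule(d)]


def in_region(region):
    return lambda d: region in d.regions


def not_older_than(days, today):
    return lambda d: (today - d.created) <= timedelta(days=days)


def all_of(*rules):
    return lambda d: all(rule(d) for rule in rules)


today = date(2007, 9, 23)
docs = [
    Document("d1", frozenset({"Mediterranean", "Europe"}), date(2007, 9, 10)),
    Document("d2", frozenset({"Mediterranean"}), date(2007, 7, 1)),
    Document("d3", frozenset({"North_Atlantic"}), date(2007, 9, 20)),
]

recent_med = DataSet(
    "RecentMediterranean",
    all_of(in_region("Mediterranean"), not_older_than(30, today)),
)
selected = [d.doc_id for d in recent_med.members(docs)]
```

Because membership is computed from a rule rather than a fixed list, a DataSet defined this way can range from a whole collection down to a single document.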
DILIGENT Grid Replication Architecture
[Figure: a client, the components RMS, RSS and FTS, and five storage nodes (sn 1 to sn 5) holding replicas of the data sets DS1 to DS4. Updates reach the replicas by subscription and continuous propagation through an Update Queue whose entries have the form TSx, Wx, DSy.]
Request flow shown in the figure:
(1) Read(DS2(x), DS4(y), 0.6)
(2.1) Locate best replicas
(2.2), (2.3) [unlabeled in the figure]
(3) Read Data
(4) Log (to the Access History)
Replica Catalog:
DS1 : 1
DS2 : 2,3
DS3 : 5
DS4 : 4
Freshness Repository:
DS1 : <1, 0.7>
DS2 : <2, 0.6>, <3, 0.7>
DS3 : <5, 0.6>
DS4 : <4, 0.6>
Load Repository:
SN1 : 50%
SN2 : 25%
SN3 : 60%
SN4 : 30%
SN5 : 50%
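Step (2.1), locating the best replicas, might combine the three repositories as sketched below. The values are those shown on the slide; the selection policy (choose the least loaded replica among those that meet the requested freshness) is an assumption made for illustration, since the slide does not define "best", and the function name locate_best_replicas is hypothetical.

```python
# Repository contents as shown on the slide.
REPLICA_CATALOG = {"DS1": [1], "DS2": [2, 3], "DS3": [5], "DS4": [4]}
FRESHNESS = {
    ("DS1", 1): 0.7,
    ("DS2", 2): 0.6,
    ("DS2", 3): 0.7,
    ("DS3", 5): 0.6,
    ("DS4", 4): 0.6,
}
LOAD = {1: 0.50, 2: 0.25, 3: 0.60, 4: 0.30, 5: 0.50}


def locate_best_replicas(datasets, min_freshness):
    """Map each requested data set to one storage node.

    Assumed policy: keep replicas whose freshness is at least min_freshness,
    then pick the one on the least loaded node.
    """
    plan = {}
    for ds in datasets:
        candidates = [
            node
            for node in REPLICA_CATALOG.get(ds, [])
            if FRESHNESS[(ds, node)] >= min_freshness
        ]
        if not candidates:
            # In a full system this is where a refresh could be triggered.
            raise LookupError(f"no replica of {ds} is fresh enough")
        plan[ds] = min(candidates, key=lambda node: LOAD[node])
    return plan


# Step (1) of the figure: Read(DS2(x), DS4(y), 0.6)
plan = locate_best_replicas(["DS2", "DS4"], 0.6)
```

Under this assumed policy the request from step (1) is served from storage node 2 for DS2, because both replicas are fresh enough and node 2 carries the lower load, and from storage node 4 for DS4.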
Conclusions & Outlook
• We presented the first steps of our ongoing work, whose ultimate goal is a fully integrated and self-managing replication subsystem for the Grid
• We want to adapt an existing database replication mechanism, PDBREP, from database clusters to data grids
• This looks feasible:
– Infrastructure-related assumptions, such as broadcasting changes to replicas, can easily be replaced by a subscription mechanism
– The additional components in the envisioned architecture that facilitate query scheduling can be included in PDBREP without major changes
• Implementation of the DILIGENT replication on top of gLite is still ongoing
Thank you! Questions?
References
1. DILIGENT: A DIgital Library Infrastructure on Grid ENabled Technology. http://www.diligentproject.org/. IST-2003-004260
2. F. Akal, C. Türker, H.-J. Schek, Y. Breitbart, T. Grabs, and L. Veen. Fine-Grained Replication and Scheduling with Freshness and Correctness Guarantees. In VLDB, pages 565–576, 2005.