Managing Dynamic Metadata and Context
Mehmet S. Aktas
Computer Science, Informatics, Pervasive Technology Laboratories
Indiana University Bloomington IN 47401
Outline
• Motivation
• Research Issues
• Proposed Approach
• Evaluation
• Conclusions
• Future Work
Context as Service Metadata in a Gaggle of Services
Context is metadata associated with both services and their activities:
• interaction-independent: slowly varying, quasi-static context. Ex: the type or endpoint of a service, which is unlikely to change
• interaction-dependent: generated as a result of interactions between services; dynamic, frequently updated context information associated with a single service, a session (service activity), or both. Ex: session-id, URI of the coordinator of a workflow session
Gaggle of Services
• a set of actively collaborating managed services, dynamically assembled for specific tasks
• generates events as a result of interactions
• a very small part of the whole Grid
Collaboration Grids: Multimedia Collaboration domain
• collaborative A/V sessions with varying types of dynamic metadata, such as metadata describing the group of participants and real-time metadata describing audio/video streams
• Collaboration Grids also have static metadata: information about services, available sessions, and media servers
• need a distributed, real-time session metadata management system
Characteristics of the domain
• widely distributed services
• metadata of events (archival data): mostly read-only and persistent, but its lifetime is bounded by the lifetime of the events
• QoS metadata associated with A/V services, media servers, etc.
GIS/Sensor Grids: Workflow-style applications in Geographic Information System and Sensor Grids
• sensor grid data services generate events when an event of a certain magnitude occurs, firing off various codes: filtering and analyzing raw data, generating images and maps
• need a distributed workflow session metadata management system to correlate workflow activities
Characteristics of the domain
• any number of widely distributed services can be involved
• conversation metadata: transient, multiple writers
• rarely changing descriptive and prescriptive service metadata
Problem Space and Requirements
Practical problem: we need to manage all information associated with services in a Gaggle of Services in order to:
• correlate activities of widely distributed services (1, 2)
• enable uniform query capabilities over both dialog and monolog context information (3, 4), e.g. "Give me the list of services satisfying C:{a,b,c..} QoS requirements and participating in S:{x,y,z..} sessions"
• manage events, especially in multimedia collaboration, providing information to enable (5) real-time replay/playback and session failure recovery capabilities
Requirements
1) dynamism
2) performance
3) uniformity
4) interoperability
5) persistence
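The quoted hybrid query can be made concrete with a minimal in-memory sketch. It assumes that each service's QoS properties (interaction-independent metadata) and each session's participant list (session metadata) are available as plain sets. The class `HybridQuerySketch`, its methods, and the matching rule (a service must offer every property in C and take part in at least one session in S) are hypothetical, since the slide does not say whether "participating in S sessions" means any or all of them.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Hypothetical in-memory view over both metadata spaces, for illustration only. */
class HybridQuerySketch {
    // Interaction-independent metadata: service URI -> QoS properties it offers.
    private final Map<String, Set<String>> qosByService = new HashMap<>();
    // Session metadata: session id -> URIs of participating services.
    private final Map<String, Set<String>> servicesBySession = new HashMap<>();

    void registerService(String service, Set<String> qos) {
        qosByService.put(service, new HashSet<>(qos));
    }

    void joinSession(String session, String service) {
        servicesBySession.computeIfAbsent(session, k -> new HashSet<>()).add(service);
    }

    /** Services that offer every QoS property in c and participate in at least one session in s. */
    List<String> find(Set<String> c, Set<String> s) {
        Set<String> participants = new HashSet<>();
        for (String session : s) {
            participants.addAll(servicesBySession.getOrDefault(session, Collections.emptySet()));
        }
        List<String> result = new ArrayList<>();
        for (String service : participants) {
            Set<String> offered = qosByService.getOrDefault(service, Collections.emptySet());
            if (offered.containsAll(c)) {
                result.add(service);
            }
        }
        Collections.sort(result); // deterministic order for callers
        return result;
    }
}
```

Reading "participating" as "in all of the sessions" would only change the membership step from a union of participant sets to an intersection.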
Different Metadata Systems - I
There are different standards defining interaction-independent metadata, such as UDDI and its extensions, and many different implementations, from (extended) UDDI through MCAT of the Storage Resource Broker.
And, of course, there are representations including RDF and OWL. Further, there is system metadata (such as UDDI for core services) and there are metadata catalogs for each application domain, such as WRS (Web Registry Service) for GIS.
They have different scopes and different QoS trade-offs, e.g.:
• Distributed Hash Tables (Chord) to achieve scalability in large-scale networks
• UDDI extensions
Different Metadata Systems - II
There are various technologies addressing interaction-dependent metadata.
Point-to-Point
• WS-Metadata Exchange
• WS-Resource Framework
Point-to-Point methodologies
• are limited to exchanging metadata between only the two communicating services
• do not scale in managing activities of widely distributed services in workflow-style Grid applications
Centralized
• WS-Context
WS-Context is promising, but it has limitations:
• limited query capability
• lack of support for interaction-independent metadata
• centralized: single point of failure, performance bottleneck
Managing Context: UDDI & Its Extensions vs. WS-Context
Purpose
• UDDI & its extensions: a standard way of publishing and discovering generic Web Service information
• WS-Context: a standard way of maintaining distributed session state information
Metadata characteristics
• UDDI & its extensions: interaction-independent, rarely changing, small size
• WS-Context: interaction-dependent, highly dynamic, small size
Types of typical queries
• UDDI & its extensions: a high degree of complexity in inquiry arguments, to improve selectivity and increase the precision of search results
• WS-Context: simple inquiry arguments; mostly key-based retrieval queries, whose selectivity is one
Scalability
• UDDI & its extensions: the whole Grid; UDDI is a domain-independent service for generic service metadata
• WS-Context: sub-Grids; a modest number of interacting Web Services participating in an activity
Most desired features
• UDDI & its extensions: greater expressive power (e.g., RDF-enabled UDDI registries), up-to-date service entries, metadata-oriented discovery capabilities, domain-specific capabilities (e.g., geospatial query capabilities), etc.
• WS-Context: high performance, lightweight storage, up-to-date entries, notification (members of an activity should be notified of the distributed state information), synchronous callback (support for loose coupling of services), etc.
Motivations
• Lack of support for providing a uniform programming interface (with advanced query capabilities) to both large-scale, relatively static metadata, as in a searchable repository of all the world's services, and session-related dynamic metadata
• Lack of support for managing small-scale, highly dynamic metadata, as in dynamic workflows for sensor integration and collaboration, which require:
• fault tolerance and the ability to support dynamic changes with a delay of a few milliseconds
• but only a modest number of involved services (up to thousands in a session)
• the ability to adapt to instantaneous changes in client demands
• Session, NOT Service/Resource, metadata
Research Issues
• How can we achieve a standard way of publishing and inquiring both interaction-independent and conversation-based service metadata through a uniform programming interface?
• What is a novel architecture for a decentralized Information Service managing the dynamic, session-related metadata of widely distributed services?
• For building a decentralized metadata system, we investigate research issues related to:
• performance
• scalability
• fault tolerance
• consistency enforcement
Our approach: Hybrid WS-Context XML Metadata Service
We designed and built a WS-Context compliant XML Metadata Service supporting distributed or centralized paradigms. This service is a Fault Tolerant and High Performance Information Service (FTHPIS).
It supports the extensive metadata requirements of rich interacting systems, such as:
• correlating activities of widely distributed services, EX: workflow-style GIS Service Oriented Architectures
• optimizing Grid/Web Service messaging performance, EX: mobile computing environments
• managing dynamic events, especially in multimedia collaboration, EX: collaboration Grid/Web Service applications
• providing information to enable session failure recovery capabilities
Hybrid XML Metadata Service: WS-Context + UDDI
We combine the extended functionalities of two services, WS-Context AND UDDI, in one hybrid service to manage Context (service metadata):
• extended WS-Context controlling a workflow
• extended UDDI providing a searchable repository for services
This approach meets the interoperability and uniformity requirements of the problem.
Our approach enables advanced query capabilities on service metadata:
• hybrid functions operating on both metadata spaces
• extended WS-Context functions operating on session metadata (parent-child relationships are implemented)
• extended UDDI functions operating on interaction-independent metadata
• information security functions providing a simple authentication and authorization mechanism for the shared data
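The parent-child relationships mentioned above can be sketched as a small tree of contexts, similar to the `parent-context` links in the GIS workflow examples in the appendix. The class `ContextTreeSketch` and its methods are hypothetical; the sketch only shows how a child context (an activity or a user profile) can be published under a session context and traced back to it.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical session-metadata store in which a context may point to a parent context. */
class ContextTreeSketch {
    static final class Context {
        final String id;
        final String parentId; // null for a top-level (session) context
        final String content;

        Context(String id, String parentId, String content) {
            this.id = id;
            this.parentId = parentId;
            this.content = content;
        }
    }

    private final Map<String, Context> contexts = new HashMap<>();
    private final Map<String, List<String>> childrenByParent = new HashMap<>();

    /** Publishes a new context; the parent, if any, must already exist, so no cycles can form. */
    void publish(String id, String parentId, String content) {
        if (contexts.containsKey(id)) {
            throw new IllegalArgumentException("context already published: " + id);
        }
        if (parentId != null && !contexts.containsKey(parentId)) {
            throw new IllegalArgumentException("unknown parent context: " + parentId);
        }
        contexts.put(id, new Context(id, parentId, content));
        if (parentId != null) {
            childrenByParent.computeIfAbsent(parentId, k -> new ArrayList<>()).add(id);
        }
    }

    /** Ids of the direct children, in publication order. */
    List<String> children(String id) {
        return new ArrayList<>(childrenByParent.getOrDefault(id, Collections.emptyList()));
    }

    /** Ids from the given context up to its root, e.g. an activity followed by its session. */
    List<String> lineage(String id) {
        List<String> path = new ArrayList<>();
        Context current = contexts.get(id);
        while (current != null) {
            path.add(current.id);
            current = current.parentId == null ? null : contexts.get(current.parentId);
        }
        return path;
    }

    String content(String id) {
        Context c = contexts.get(id);
        return c == null ? null : c.content;
    }
}
```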
Figure: Hybrid WS-Context XML Metadata Service (components: FTHPIS clients; Extended UDDI Service with WSDL interface uddi_extended.wsdl; Hybrid WSContext Service with WSDL interface uddi_wscontext.wsdl, combining the Extended UDDI and WS-Context WSDL descriptions; HTTP(S) transport; databases accessed via JDBC)
Extended UDDI XML Metadata Services
We also designed and implemented an extended UDDI XML Metadata Service (an alternative to OGC Web Registry Services). This service supports:
• GIS Metadata Catalog (functional metadata)
• user-defined metadata ((name, value) pairs)
• up-to-date service information (leasing)
• dynamic aggregation of geospatial services
Our approach enables advanced query capabilities:
• geospatial and temporal queries
• metadata-oriented queries
• domain-independent queries, such as XPath queries on the metadata catalog
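Leasing keeps service entries up to date by letting an entry expire unless its provider renews it. A minimal sketch, assuming a fixed lease length and a caller-supplied clock (`nowMillis`) so that the behaviour is deterministic; `LeasedRegistrySketch` and its methods are hypothetical and do not reflect the extended UDDI service's actual interface.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical lease-based registry: an entry stays visible only while its lease is renewed. */
class LeasedRegistrySketch {
    private final Map<String, Long> expiryByService = new HashMap<>();
    private final long leaseMillis;

    LeasedRegistrySketch(long leaseMillis) {
        this.leaseMillis = leaseMillis;
    }

    /** Registers a service, or renews its lease, at time nowMillis. */
    void renew(String service, long nowMillis) {
        expiryByService.put(service, nowMillis + leaseMillis);
    }

    /** Drops entries whose lease has expired, then lists the remaining services. */
    List<String> liveServices(long nowMillis) {
        expiryByService.values().removeIf(expiry -> expiry <= nowMillis);
        List<String> live = new ArrayList<>(expiryByService.keySet());
        Collections.sort(live);
        return live;
    }
}
```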
Key Design Features
• Message dissemination: the communication method among the nodes of the network
• Caching: use of in-memory storage running on each node to minimize latency and meet the performance requirement
• Access: the methodology for redirecting a client request to an appropriate replica server to meet the dynamism and performance requirements
• Storage: the methodology for replicating data to meet the fault-tolerance and performance requirements
• Consistency enforcement: the methodology for ensuring that all replicas of a context are the same
Message Dissemination
Publish-subscribe is exploited to support replicated storage, e.g.:
• initial storage of context
• dissemination of context access requests
• dissemination of updates to make copies consistent
We used the open-source NaradaBrokering software to provide a multi-publisher multicast communication mechanism. NaradaBrokering:
• is a topic-based publish/subscribe messaging system
• runs on a network of cooperating broker nodes
• provides support for a variety of QoS features, such as low latency, reliable message delivery, multiple transfer protocols, security, and so forth
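To show what "topic based" means for the dissemination roles listed above, here is a toy single-process broker in which replica servers subscribe to topics and every publication on a topic reaches all of that topic's subscribers. This is not the NaradaBrokering API, which runs on a network of brokers with the QoS features listed; `TopicBrokerSketch` and the topic names in the usage example are hypothetical.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Consumer;

/** Toy, synchronous, single-threaded stand-in for a topic-based broker (not NaradaBrokering). */
class TopicBrokerSketch {
    private final Map<String, List<Consumer<String>>> subscribers = new HashMap<>();

    void subscribe(String topic, Consumer<String> handler) {
        subscribers.computeIfAbsent(topic, k -> new ArrayList<>()).add(handler);
    }

    /** Delivers the message to every subscriber of the topic; returns how many received it. */
    int publish(String topic, String message) {
        List<Consumer<String>> handlers = subscribers.getOrDefault(topic, Collections.emptyList());
        for (Consumer<String> handler : handlers) {
            handler.accept(message);
        }
        return handlers.size();
    }
}
```

For instance, every replica server could subscribe to a hypothetical `context/updates` topic, so that one publication reaches all replicas without the publisher knowing where they are.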
Figure: Distributed Hybrid WS-Context XML Metadata Services (Replica Server-1 to Replica Server-N, each running a Hybrid-WSContext Service with its own database accessed via JDBC, acting as publishers and subscribers on a topic-based publish-subscribe messaging system; clients connect over HTTP(S) through WSDL interfaces; an Extended UDDI Service with its own database)
Caching Strategy
The TupleSpaces paradigm is exploited to support caching:
• asynchronous communication
• pioneered by David Gelernter
• communication units are tuples: data structures consisting of one or more typed fields
The Hybrid WS-Context Service employs and extends TupleSpaces:
• uses a lightweight implementation of JavaSpaces
• all accesses are memory accesses; the overhead is negligible (less than 1 msec for inquiries)
• data sharing: mutually exclusive access to tuples
• associative lookup: content-based search, appropriate for key-based caching
• temporal and spatial uncoupling of communicating parties
• e.g. the tuple ("context_id", Context) has two fields: a) a string, "context_id", and b) a Java object, "Context"
• backups at frequent time intervals for fault tolerance
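The tuple-space behaviour listed above (associative lookup, exclusive take, periodic backup) can be illustrated with a small in-memory class. This is a hypothetical stand-in written for illustration, not the JavaSpaces API and not the authors' lightweight implementation: `TupleSpaceSketch`, its methods, and the null-as-wildcard template rule (modelled on how JavaSpaces templates treat null fields) are assumptions. Unlike JavaSpaces `read`/`take`, which can wait for a match, these calls return null immediately, closer to `readIfExists`/`takeIfExists`.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Objects;

/** Toy in-memory tuple space for two-field tuples (key, value); not the JavaSpaces API. */
class TupleSpaceSketch {
    static final class Tuple {
        final String key;
        final Object value;

        Tuple(String key, Object value) {
            this.key = key;
            this.value = value;
        }
    }

    private final List<Tuple> tuples = new ArrayList<>();

    synchronized void write(String key, Object value) {
        tuples.add(new Tuple(key, value));
    }

    /** Associative lookup: a null template field acts as a wildcard. */
    private Tuple match(String key, Object value) {
        for (Tuple t : tuples) {
            boolean keyOk = key == null || key.equals(t.key);
            boolean valueOk = value == null || Objects.equals(value, t.value);
            if (keyOk && valueOk) {
                return t;
            }
        }
        return null;
    }

    /** Returns a matching tuple's value without removing it, or null if none matches. */
    synchronized Object read(String key, Object value) {
        Tuple t = match(key, value);
        return t == null ? null : t.value;
    }

    /** Removes and returns a matching value; synchronized methods give mutually exclusive access. */
    synchronized Object take(String key, Object value) {
        Tuple t = match(key, value);
        if (t == null) {
            return null;
        }
        tuples.remove(t);
        return t.value;
    }

    /** Copy of the current contents, e.g. for a periodic backup to the database. */
    synchronized List<Tuple> backupCopy() {
        return new ArrayList<>(tuples);
    }
}
```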
Access: Request Distribution
A peer-to-peer-based message distribution methodology is exploited for redirecting a client request to the appropriate replica server:
• use of the pub-sub system for request distribution
• broadcast-based dissemination of context access requests
• servers that can satisfy the query unicast a response with a copy of the requested context
Advantages: it does not keep track of the location of every single data item, it makes use of redundant copies kept only for fault-tolerance reasons, and it improves responsiveness.
Practical problem: if the number of repetitive queries that require probing the network increases, network consumption may grow and affect system performance.
Approach: use dynamic replication to move or replicate highly demanded copies into the proximity of their requestors, minimizing the need to probe the network.
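The broadcast-and-unicast access pattern above can be simulated with the JDK's `ExecutorCompletionService`: the request goes to every replica server at once, and the client keeps the first copy that comes back. This is an assumption-laden simulation, not the system's code: each "server" is just an in-memory map, and a server that lacks the context answers with an empty result instead of staying silent. `BroadcastAccessSketch` is a hypothetical name.

```java
import java.util.List;
import java.util.Map;
import java.util.Optional;
import java.util.concurrent.CompletionService;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorCompletionService;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

/** Hypothetical simulation of broadcast-based context access over in-memory "replica servers". */
class BroadcastAccessSketch {
    /** Sends the request to every server and returns the first copy that comes back, if any. */
    static Optional<String> fetch(List<Map<String, String>> servers, String contextId) {
        if (servers.isEmpty()) {
            return Optional.empty();
        }
        ExecutorService pool = Executors.newFixedThreadPool(servers.size());
        try {
            CompletionService<Optional<String>> answers = new ExecutorCompletionService<>(pool);
            for (Map<String, String> server : servers) {
                // Broadcast: every server sees the request; only holders answer with a copy.
                answers.submit(() -> Optional.ofNullable(server.get(contextId)));
            }
            for (int i = 0; i < servers.size(); i++) {
                Optional<String> copy;
                try {
                    copy = answers.take().get(); // answers arrive in completion order
                } catch (ExecutionException e) {
                    continue; // a failing server is treated like one without the context
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                    return Optional.empty();
                }
                if (copy.isPresent()) {
                    return copy; // the first answering holder wins; later answers are ignored
                }
            }
            return Optional.empty();
        } finally {
            pool.shutdownNow();
        }
    }
}
```

The client never needs to know which server holds the context; the price is that every request reaches every server, which is the network cost that dynamic replication is meant to reduce.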
Storage: Replica Placement
A peer-to-peer-based message distribution methodology is exploited for creating the initial permanent copies of a context:
• use of the pub-sub system for permanent replication
• use of non-blocking replica placement
• 1st step: the initiator creates a temporary copy at every capable replica server
• 2nd step: the initiator keeps permanent copies only at the first few answering replica servers, for fault tolerance
Advantages: [1] the publishing client does not block until replication is completed; [2] a temporary full-replication methodology is exploited to improve responsiveness; [3] permanent copies remain as a backup facility to meet the fault-tolerance requirement.
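The two-step, non-blocking placement can be sketched with `CompletableFuture`: `place` returns immediately, every server is sent a temporary copy, and the future completes with the first `k` servers to acknowledge, which keep permanent copies. Server names, the value of `k`, and the simulated acknowledgments are hypothetical; in the described system the copies and acknowledgments travel over the publish-subscribe network, and in this sketch the other servers simply keep their temporary copies.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CompletableFuture;

/** Hypothetical two-step, non-blocking replica placement; "servers" are just names here. */
class ReplicaPlacementSketch {
    /**
     * Step 1: send a temporary copy to every capable server (simulated by asynchronous acks).
     * Step 2: the first k servers to acknowledge keep permanent copies.
     * Returns at once; the future completes when k servers have answered.
     */
    static CompletableFuture<List<String>> place(List<String> servers, int k) {
        if (k < 1 || k > servers.size()) {
            throw new IllegalArgumentException("k must be between 1 and the number of servers");
        }
        CompletableFuture<List<String>> permanent = new CompletableFuture<>();
        List<String> answered = new ArrayList<>();
        for (String server : servers) {
            CompletableFuture.supplyAsync(() -> server) // stands in for "store temporary copy, then ack"
                    .thenAccept(ack -> {
                        synchronized (answered) {
                            if (answered.size() < k) {
                                answered.add(ack);
                                if (answered.size() == k) {
                                    permanent.complete(new ArrayList<>(answered));
                                }
                            }
                        }
                    });
        }
        return permanent;
    }
}
```

The publisher can continue immediately and only consult the future if it later needs to know where the permanent copies ended up.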
Storage: Dynamic Replication
A dynamic replication methodology is exploited for creating server-initiated (temporary) copies of a context:
• use of the pub-sub system for server-initiated replication
• the replication decision belongs to the server (autonomous)
• we keep a popularity record (number of access requests) for each copy of a context and flush it at regular time intervals
• unpopular server-initiated copies of a context are deleted
• popular copies of a context are moved into the proximity of their requestors (where the requests originate)
• very popular copies of a context are replicated into the proximity of their requestors (where the requests originate)
Advantages: [1] this strategy exploits locality, which in turn improves responsiveness; [2] it also captures dynamism by adjusting the system to changing user demands.
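The popularity-driven decisions above can be sketched per server for its server-initiated copies. The thresholds (`unpopularBelow`, `popularAtLeast`, `veryPopularAtLeast`), the reading of "proximity" as the origin that sent the most requests, and the class `DynamicReplicationSketch` are all hypothetical, since the slide gives no concrete values or proximity measure; actually moving or copying the context is left out.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical popularity bookkeeping on one server for its server-initiated (temporary) copies. */
class DynamicReplicationSketch {
    enum Action { DELETE, KEEP, MIGRATE, REPLICATE }

    /** An action plus the origin that sent the most requests, i.e. where to move or copy the context. */
    static final class Decision {
        final Action action;
        final String target; // null when nothing needs to move

        Decision(Action action, String target) {
            this.action = action;
            this.target = target;
        }
    }

    private final Map<String, Map<String, Integer>> requestsByOrigin = new HashMap<>();
    private final int unpopularBelow;     // hypothetical thresholds, in requests per interval
    private final int popularAtLeast;
    private final int veryPopularAtLeast;

    DynamicReplicationSketch(int unpopularBelow, int popularAtLeast, int veryPopularAtLeast) {
        this.unpopularBelow = unpopularBelow;
        this.popularAtLeast = popularAtLeast;
        this.veryPopularAtLeast = veryPopularAtLeast;
    }

    void recordAccess(String contextId, String origin) {
        requestsByOrigin.computeIfAbsent(contextId, k -> new HashMap<>()).merge(origin, 1, Integer::sum);
    }

    /** Decides what to do with one copy at the end of an interval and flushes its counters. */
    Decision evaluateAndFlush(String contextId) {
        Map<String, Integer> origins = requestsByOrigin.remove(contextId);
        int total = 0;
        int topCount = 0;
        String topOrigin = null;
        if (origins != null) {
            for (Map.Entry<String, Integer> e : origins.entrySet()) {
                total += e.getValue();
                if (e.getValue() > topCount) {
                    topCount = e.getValue();
                    topOrigin = e.getKey();
                }
            }
        }
        if (total >= veryPopularAtLeast) {
            return new Decision(Action.REPLICATE, topOrigin); // add a copy near the requestors
        }
        if (total >= popularAtLeast) {
            return new Decision(Action.MIGRATE, topOrigin); // move the copy near the requestors
        }
        if (total < unpopularBelow) {
            return new Decision(Action.DELETE, null);
        }
        return new Decision(Action.KEEP, null);
    }
}
```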
Consistency Enforcement
Consistency enforcement methodologies are exploited to keep the copies of a context consistent:
• use of a weak consistency model: copies of a context may differ; however, updates are propagated to replicas whenever a consistent view of the information is needed
• use of the pub-sub system for update propagation
• use of a primary-copy approach: all updates for a specific context are initiated at a single server
• use of synchronized timestamps (as versions) to sequence each published context, imposing an order on concurrent write operations on the same data
• updates are pulled by a replica server from the primary copy if the replica server realizes that it has a stale copy
• updates are pushed (broadcast) by the primary copy if it realizes that there is a server that has not yet been updated
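A minimal sketch of the primary-copy scheme with timestamps as versions, assuming a single process and no network: the primary rejects writes whose timestamp does not increase, replicas apply pushed updates only if they are newer than their local copy, and a replica can pull from the primary when it is stale. `PrimaryCopySketch` and its nested classes are hypothetical, and plain `long` timestamps stand in for the synchronized NTP-based timestamps described above.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical primary-copy replication with timestamps as versions (single process, no network). */
class PrimaryCopySketch {
    /** A context value tagged with the synchronized timestamp that serves as its version. */
    static final class Versioned {
        final String value;
        final long timestamp;

        Versioned(String value, long timestamp) {
            this.value = value;
            this.timestamp = timestamp;
        }
    }

    /** The single server at which all updates of a context are initiated. */
    static final class Primary {
        private final Map<String, Versioned> store = new HashMap<>();

        /** Accepts a write only if its timestamp is newer; the result would then be pushed to replicas. */
        Versioned update(String contextId, String value, long timestamp) {
            Versioned current = store.get(contextId);
            if (current != null && timestamp <= current.timestamp) {
                throw new IllegalArgumentException("timestamps must increase for each context");
            }
            Versioned v = new Versioned(value, timestamp);
            store.put(contextId, v);
            return v;
        }

        Versioned latest(String contextId) {
            return store.get(contextId);
        }
    }

    /** A server holding a copy that may lag behind the primary (weak consistency). */
    static final class Replica {
        private final Map<String, Versioned> store = new HashMap<>();

        /** Push path: applies an update unless the local copy is already as new or newer. */
        boolean apply(String contextId, Versioned update) {
            Versioned local = store.get(contextId);
            if (local != null && local.timestamp >= update.timestamp) {
                return false; // duplicate or late-arriving older update
            }
            store.put(contextId, update);
            return true;
        }

        /** Pull path: fetches the primary's copy if the local one is stale. */
        boolean refreshIfStale(String contextId, Primary primary) {
            Versioned newest = primary.latest(contextId);
            return newest != null && apply(contextId, newest);
        }

        String read(String contextId) {
            Versioned v = store.get(contextId);
            return v == null ? null : v.value;
        }
    }
}
```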
Consistency Enforcement - II
Advantage: this strategy employs a non-blocking primary-copy approach, so the publisher does not block until an update operation is completed, which in turn improves responsiveness.
Practical problems: [1] with this strategy, one cannot update a data item more frequently than once per 30 milliseconds, which is the accuracy of the NaradaBrokering NTP-protocol-based synchronized timestamps; [2] with this strategy, a client cannot be sure that the update operation was carried out correctly.
Approach: one update operation per 30 milliseconds is an acceptable update rate for our application domains. As performance is a requirement, we favor solutions that do not require blocking client applications.
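The 30 ms limit translates directly into a maximum update rate for a single data item; this is arithmetic on the figure stated above, not a separately measured result.

```latex
f_{\max} = \frac{1}{\Delta t} = \frac{1}{30\,\text{ms}} = \frac{1}{0.030\,\text{s}} \approx 33.3\ \text{updates per second per data item}
```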
Prototype Evaluation
We evaluated the prototype implementation for three distinct aspects of distributed systems:
• Performance: the baseline performance, and the effect of network latency on the baseline performance
• Scalability: the performance degradation of the system under increasing message sizes or message rates, and the scalability gain, both in numbers and in performance, when moving from a centralized system to a distributed system under the same workload
• Fault tolerance: the empirical cost of fault tolerance in terms of the execution time of standard operations on a tight cluster or on a network with significant network distances
TESTBED: Cluster node configuration (Machine Configurations)
Processor: Intel® Xeon™ CPU (2.40GHz)
RAM: 2GB total
Network bandwidth: 900 Mbits/sec [1] (among the cluster nodes)
OS: GNU/Linux (kernel release 2.4.22)
Java version: Java 2 Platform, Standard Edition (1.4.2-beta-b19)
SOAP engine: Axis 2 (in Tomcat 5.5.8)
RESPONSIVENESS EXPERIMENT
Figure: test setups. In each test, a single-threaded client (1 user/1000 transactions) invokes a service through its WSDL interface:
Test-1. Dummy Server
Test-2. Hybrid-WSContext inquiry/publication without database access
Test-3. Hybrid-WSContext inquiry/publication with database access
Test-4. Extended UDDI inquiry/publication
(Hybrid-WSContext Service components shown: Publishing/Querying Module, JDBC Handler, Expeditor)
If a query can be satisfied by the JavaSpaces cache, it is answered in < 1 ms plus the few milliseconds of Web service overhead.
The performance of standard operations is comparable with that of existing metadata management services.
Figure: Round Trip Time Chart for Inquiry Requests (y-axis: average response time per request (msec)). Series: Test-1: Dummy service; Test-2: WS-Context inquiry with memory access; Test-3: WS-Context inquiry with database access; Test-4: UDDI inquiry. Test-2 minus Test-1 is the JavaSpaces overhead.
Average inquiry latency of existing metadata services:
Metadata service | Avg. latency for inquiries
JUDDI | 40 ms
UDDI-MT | 20.37 ms
JWSD | 18.99 ms
SCALABILITY TEST-1
TEST-1: Hybrid-WSContext inquiry/publication with increasing message sizes
TEST-2: Hybrid-WSContext inquiry/publication with increasing message rates (# of messages per second)
Figure: a single-threaded WS-Context client (1 user/100 transactions), and 5 clients distributed to cluster nodes 1 to 5 with each running 1 to 15 threads, invoke the Hybrid-WSContext Service (Publishing/Querying Module, JDBC Handler, Expeditor) over HTTP(S) through a WSDL thread pool.
Figure: average round trip time (milliseconds) vs. context payload size (KB) for Tinquiry = T(RTT) and Tpublication = T(RTT); data-point annotations: Stdev = 1.42, 2.68, 3.09, 11.03, 11.54, 8.27, 6.95, 6.72, 10.07, 13.01.
The results indicate that the system performs well for small context payloads.
The results also indicate that the cost of inquiry and publication operations remains the same as the context's payload size increases from 100 bytes up to 10 KB.
The system can scale up to 940 simultaneous querying clients and 222 simultaneous publishing clients, where each client sends one message per second, for small context payloads with a 30-millisecond backup interval for fault tolerance.
Multi-core hosts are expected to improve performance dramatically.
Figure: average round trip time (ms) vs. message rate (messages per second) for the inquiry and publication message rates; data-point annotations: Stdev = 10.31, 39.49, 53, 0.65, 0.97, 0.91, 33.52.
SCALABILITY TEST-2
5 clients distributed to cluster nodes 1 to 5, each running 1 to 15 threads, fire messages over HTTP(S) to randomly selected servers.
We investigate scalability when moving from a centralized server to a distributed one under heavy workloads.
Five different FTHPIS systems were tested, with N ranging from 1 to 5, under the same workload. In the figure, numbered rectangles correspond to an N-node FTHPIS system with various publish-subscribe topologies (the topology does NOT affect performance).
In each test case, the same volume of data is evenly distributed among the nodes.
The scalability of the metadata store can be increased by moving from a centralized service to a distributed system.
Figure: message rate (msg/second) vs. number of nodes for the Hybrid WS-Context inquiry operation.
Hybrid WS-Context inquiry operation:
# of nodes | message rate (msg/s) | mean ± error (ms) | Stdev (ms)
1 | 940 | 47.05 ± 0.24 | 33.52
2 | 1005 | 40.76 ± 0.43 | 38.22
3 | 1082 | 38.58 ± 0.45 | 34.93
4 | 1148 | 36.28 ± 0.42 | 32.24
5 | 1221 | 34.13 ± 0.4 | 30.76
Note: the caching algorithm is non-optimal, as it performs database access BEFORE publish-subscribe. Reversing this choice should lead to throughput that is linear in the number of nodes. The pub-sub overhead is ~2 ms.
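The table can be summarized with a few ratios, where R_N is the message rate and L_N the mean round trip time with N nodes; these are computed only from the table above. The single-node rate of 940 msg/s also matches the 940 querying clients at one message per second reported earlier.

```latex
\begin{aligned}
\frac{R_5}{R_1} &= \frac{1221}{940} \approx 1.30
  && \text{(ideal linear scaling would give } 5\text{)} \\
\frac{R_5}{5\,R_1} &= \frac{1221}{4700} \approx 0.26
  && \text{(scaling efficiency at 5 nodes)} \\
\frac{L_1 - L_5}{L_1} &= \frac{47.05 - 34.13}{47.05} = \frac{12.92}{47.05} \approx 0.27
  && \text{(reduction in mean round trip time)}
\end{aligned}
```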
FAULT-TOLERANCE TEST
Test-1. LAN experiment: all nodes and the client are located on a tightly coupled local area network.
Test-2. WAN experiment: nodes are located on a loosely coupled wide area network.
Node locations (WAN experiment):
client: Bloomington, IN, CGL
node-1: Indianapolis, IN
node-2: Tallahassee, FL
node-3: Austin, TX
node-4: San Diego, CA
node-5: Bloomington, IN, CGL
Link latencies: link-1 0.83 ms; link-2 11.3 ms; link-3 15.3 ms; link-4 31.4 ms
FAULT-TOLERANCE TEST RESULTS
Figure: time (msec) vs. number of replicas (1 to 5). Series: Test1: LAN testing case, publication; Test2: WAN testing case, publication; Test3: inquiry operation (request granted locally with memory access); Test4: inquiry operation (request granted locally with database access).
Fault tolerance vs. performance:
• The lower the level of fault tolerance, the higher the performance.
• A high degree of replication can be achieved (by utilizing an asynchronous communication model) without increasing the cost of fault tolerance.
Summary of Contributions
Specification on managing all service metadata
• a method to achieve a uniform programming interface to both interaction-independent and session-related metadata; this method also introduces a data model for storing session-related metadata
Specification on managing interaction-independent service metadata
• a method to achieve Geographical Information Systems compatible, domain-independent, and metadata-oriented management of interaction-independent service metadata
Fault tolerant and high performance information service
• a method to achieve management of dynamic metadata and Context in subgrids: dynamically assembled, collaborating, modest numbers of services put together to perform a particular task
Future Work
Transaction scheduling
• investigate how to minimize the time required to complete transactions on two different metadata systems with different time constraints
Evaluation of dynamic replication
• carry out simulations to evaluate dynamic replication
Optimal caching methodologies
• implement and test more optimal caching methodologies
Smoothing the impact of backups on performance
• investigate how to minimize the impact of the time spent on backups (high peaks) on average transaction response time
Questions?
Appendix
FAULT-TOLERANCE EXPERIMENT TEST BED
Summary of machine configurations
Machine | Location | Processor | RAM | OS | Java version
gf6.ucs.indiana.edu | Bloomington, IN, USA | Intel® Xeon™ CPU (2.40GHz) | 2GB | GNU/Linux (kernel release 2.4.22) | Java 2, Standard Edition (1.4.2-beta-b19)
complexity.ucs.indiana.edu | Indianapolis, IN, USA | Sun-Fire-880, sun4u sparc SUNW | 16GB | SunOS 5.9 | Java HotSpot(TM) 64-Bit Server VM (1.4.2-01)
lonestar.tacc.utexas.edu | Austin, TX, USA | Intel(R) Xeon(TM) CPU 3.20GHz | 4GB | GNU/Linux (kernel release 2.6.9) | Java 2, Standard Edition (1.4.2-beta-b19)
tg-login.sdsc.teragrid.org | San Diego, CA, USA | GenuineIntel IA-64, Itanium 2, 4 processors | 8GB | GNU/Linux | Java 2, Standard Edition (1.4.2-beta-b19)
vlab2.scs.fsu.edu | Tallahassee, FL, USA | Dual Core AMD Opteron(tm) Processor 270 | 2GB | GNU/Linux (kernel release 2.6.16) | Java 2, Standard Edition (1.4.2-beta-b19)
<?xml version="1.0" encoding="UTF-8"?>
<soap:Envelope xmlns:soap="http://www.w3...">
  <soap:Header encodingStyle="URL" mustUnderstand="true">
    <context xmlns="ctxt schema" timeout="100">
      <context-id>http..</context-id>
      <context-service>http..</context-service>
      <context-manager>http..</context-manager>
      <activity-list mustUnderstand="true" mustPropagate="true">
        <p-service>http://../WMS</p-service>
        <p-service>http://../HPSearch</p-service>
      </activity-list>
    </context>
  </soap:Header>
SOAP header for Context
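Services taking part in a session read these header fields on each call. A minimal sketch using the JDK's built-in DOM parser (`javax.xml.parsers`) to pull the context id, the `timeout` attribute, and the activity-list services out of such a header; the class `ContextHeaderReader` is hypothetical, and the usage below relies on hypothetical `urn:example:` and `example.org` URIs because the slide elides the real ones.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

/** Hypothetical reader for WS-Context-style header fields; element names follow the slide. */
class ContextHeaderReader {
    static Document parse(String xml) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true); // lets us match elements by local name in any namespace
            return factory.newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
        } catch (Exception e) {
            throw new IllegalArgumentException("malformed context header", e);
        }
    }

    /** Trimmed text of the first element with this local name, or null if there is none. */
    static String firstText(Document doc, String localName) {
        NodeList nodes = doc.getElementsByTagNameNS("*", localName);
        return nodes.getLength() == 0 ? null : nodes.item(0).getTextContent().trim();
    }

    /** The timeout attribute of the first <context> element, or -1 if absent. */
    static int timeout(Document doc) {
        NodeList nodes = doc.getElementsByTagNameNS("*", "context");
        if (nodes.getLength() == 0) {
            return -1;
        }
        String value = ((Element) nodes.item(0)).getAttribute("timeout");
        return value.isEmpty() ? -1 : Integer.parseInt(value);
    }

    /** Service URIs listed as <p-service> children of every <activity-list>. */
    static List<String> activityServices(Document doc) {
        List<String> services = new ArrayList<>();
        NodeList lists = doc.getElementsByTagNameNS("*", "activity-list");
        for (int i = 0; i < lists.getLength(); i++) {
            NodeList entries = ((Element) lists.item(i)).getElementsByTagNameNS("*", "p-service");
            for (int j = 0; j < entries.getLength(); j++) {
                services.add(entries.item(j).getTextContent().trim());
            }
        }
        return services;
    }
}
```

Matching on local names with the `*` namespace wildcard keeps the reader independent of the context schema's namespace URI, which the slide does not give.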
The Pattern Informatics GIS-SOA based workflow application
5, 6: WMS starts a session and invokes HPSearch, with a session id, to run the workflow script for the PI Code
7, 8, 9: HPSearch runs the workflow script and generates an output file in GML format (and PDF format) as the result
10: HPSearch writes the URI of the output file into Context
11: WMS polls the information from the Context Service
12: WMS retrieves the output file generated by the workflow script and generates a map
Service-associated context:
<context xsd:type="ContextType" timeout="100">
  <context-service>http://.../HPSearch</context-service>
  <content>HPSearch associated additional data generated during execution of workflow.</content>
</context>
Session context:
<context xsd:type="ContextType" timeout="100">
  <context-service>http://.../WMS</context-service>
  <activity-list mustUnderstand="true" mustPropagate="true">
    <service>http://.../WMS</service>
    <service>http://.../HPSearch</service>
  </activity-list>
</context>
User-profile context:
<context xsd:type="ContextType" timeout="100">
  <context-service>http://.../HPSearch</context-service>
  <parent-context>http://../abcdef:012345</parent-context>
  <content>profile information related to WMS</content>
</context>
Shared-state context:
<context xsd:type="ContextType" timeout="100">
  <context-id>http://../abcdef:012345</context-id>
  <context-service>http://.../HPSearch</context-service>
  <content>http://danube.ucs.indiana.edu:8080\x.xml</content>
</context>
Activity context:
<context xsd:type="ContextType" timeout="100">
  <context-service>http://.../HPSearch</context-service>
  <parent-context>http://../abcdef:012345</parent-context>
  <content>shared data for HPSearch activity</content>
  <activity-list mustUnderstand="true" mustPropagate="true">
    <service>http://.../DataFilter1</service>
    <service>http://.../PICode</service>
    <service>http://.../DataFilter2</service>
  </activity-list>
</context>
Figure: Dynamic Metadata Examples for a GIS Workflow (components: WMS Client, WMS, WFS, HPSearch, Data Filter, PI Code, Data Filter, Context Information Service, Extended UDDI; numbered interactions 0 to 12; intermediate data at http://..../..../..txt and http://..../..../x.gml)