1 Managing Dynamic Metadata and Context Mehmet S. Aktas Computer Science, Informatics, Pervasive...


Managing Dynamic Metadata and Context

Mehmet S. Aktas

Computer Science, Informatics, Pervasive Technology Laboratories

Indiana University, Bloomington, IN 47401


Outline

• Motivation
• Research Issues
• Proposed Approach
• Evaluation
• Conclusions
• Future Work

Context as Service Metadata in Gaggle of Services

Context is metadata associated with both services and their activities:
• interaction-independent
  - slowly varying, quasi-static context. Ex: the type or endpoint of a service, which is unlikely to change
• interaction-dependent, generated as a result of the interactions of services
  - dynamic, frequently updated context information associated with a single service, a session (service activity), or both. Ex: session-id, URI of the coordinator of a workflow session

Gaggle of Services
• a set of actively collaborating managed services, dynamically assembled for specific tasks
• generates events as a result of interactions
• a very small part of the whole Grid

Collaboration Grids

Multimedia collaboration domain
• collaborative A/V sessions with varying types of dynamic metadata
  - describing the group of participants
  - real-time metadata describing audio/video streams
• Collaboration Grids also have static metadata
  - information about services, available sessions, and media servers
• need a distributed, real-time session metadata management system

Characteristics of the domain
• widely distributed services
• metadata of events (archival data)
  - mostly read-only and persistent, but its lifetime is bounded by the lifetime of the events
• QoS metadata associated with A/V services, media servers, etc.

GIS/Sensor Grids

Workflow-style applications in Geographic Information System and Sensor Grids
• sensor grid data services generate events when an event of a certain magnitude occurs
  - firing off various codes, filtering and analyzing raw data, generating images and maps
• need a distributed workflow session metadata management system to correlate workflow activities

Characteristics of the domain
• any number of widely distributed services can be involved
• conversation metadata
  - transient, multiple writers
• rarely changing descriptive and prescriptive service metadata

Problem Space and Requirements

Practical problem: we need to manage all information associated with services in a Gaggle of Services for
• correlating the activities of widely distributed services (1, 2)
• enabling uniform query capabilities over both dialog and monolog context information (3, 4)
  - “Give me the list of services satisfying C:{a,b,c..} QoS requirements and participating in S:{x,y,z..} sessions”
• managing events, especially in multimedia collaboration
  - providing information to enable (5) real-time replay/playback and session failure recovery capabilities

Requirements
1) dynamism
2) performance
3) uniformity
4) interoperability
5) persistence

Different Metadata Systems - I

There are different standards defining interaction-independent metadata, such as UDDI and its extensions.

There are also many different implementations, from (extended) UDDI through MCAT of the Storage Resource Broker, and of course representations including RDF and OWL.

Further, there is system metadata (such as UDDI for core services), and there are metadata catalogs for each application domain, such as WRS (Web Registry Service) for GIS.

These systems have different scopes and different QoS trade-offs, e.g.
• Distributed Hash Tables (Chord) to achieve scalability in large-scale networks
• UDDI extensions

Different Metadata Systems - II

There are various technologies addressing interaction-dependent metadata:

Point-to-Point
• WS-MetadataExchange
• WS-Resource Framework

Centralized
• WS-Context

Point-to-point methodologies
• are limited to exchanging metadata between only the two communicating services
• do not scale to managing the activities of widely distributed services in workflow-style Grid applications

WS-Context is promising, but it has limitations:
• limited query capability
• lack of support for interaction-independent metadata
• centralized: single point of failure, performance bottleneck

Managing Context: UDDI & Its Extensions vs. WS-Context

Purpose
• UDDI & its extensions: a standard way of publishing and discovering generic Web Service information
• WS-Context: a standard way of maintaining distributed session state information

Metadata characteristics
• UDDI & its extensions: interaction-independent, rarely changing, small size
• WS-Context: interaction-dependent, highly dynamic, small size

Types of typical queries
• UDDI & its extensions: a high degree of complexity in inquiry arguments, to improve selectivity and increase the precision of search results
• WS-Context: simple inquiry arguments, mostly key-based retrieval queries; the selectivity of queries is one

Scalability
• UDDI & its extensions: the whole Grid; UDDI is a domain-independent service for generic service metadata
• WS-Context: sub-Grids, i.e. a modest number of interacting Web Services participating in an activity

Most desired features
• UDDI & its extensions: better expressive power (e.g., RDF-enabled UDDI registries), up-to-date service entries, metadata-oriented discovery capabilities, domain-specific capabilities (e.g., geospatial query capabilities), etc.
• WS-Context: high performance, lightweight storage, up-to-date entries, notification (members of an activity should be notified of the distributed state information), synchronous callback (support for loose coupling of services), etc.

Motivations

Lack of support for providing a uniform programming interface (with advanced query capabilities) to
• large-scale, relatively static metadata, as in a searchable repository of all the world’s services, and
• session-related dynamic metadata

Lack of support for managing small-scale, highly dynamic metadata, as in dynamic workflows for sensor integration and collaboration
• fault tolerance and the ability to support dynamic changes with a delay of a few milliseconds
• but only a modest number of involved services (up to 1000s in a session)
• the ability to adapt to instantaneous changes in client demands
• need Session, NOT Service/Resource, metadata

Research Issues

How can we achieve a standard way of publishing and inquiring both interaction-independent and conversation-based service metadata through a uniform programming interface?

What is a novel architecture for a decentralized Information Service managing the dynamic session-related metadata of widely distributed services?

For building a decentralized metadata system, we investigate research issues related to
• performance
• scalability
• fault tolerance
• consistency enforcement

Our approach: Hybrid WS-Context XML Metadata Service

We designed and built a WS-Context compliant XML Metadata Service supporting distributed or centralized paradigms. This service is a Fault Tolerant and High Performance Information Service (FTHPIS).

It supports the extensive metadata requirements of richly interacting systems, such as
• correlating activities of widely distributed services, EX: workflow-style GIS Service Oriented Architectures, AND
• optimizing Grid/Web Service messaging performance, EX: mobile computing environments, AND
• managing dynamic events, especially in multimedia collaboration, EX: collaboration Grid/Web Service applications, AND
• providing information to enable session failure recovery capabilities.

Hybrid XML Metadata Service: WS-Context + UDDI

We combine the extended functionalities of two services, WS-Context AND UDDI, in one hybrid service to manage Context (service metadata):
• extended WS-Context controlling a workflow
• extended UDDI providing a searchable repository for services
• this approach meets the interoperability and uniformity requirements of the problem

Our approach enables advanced query capabilities on service metadata:
• hybrid functions operating on both metadata spaces
• extended WS-Context functions operating on session metadata (parent-child relationships are implemented)
• extended UDDI functions operating on interaction-independent metadata
• information security functions providing a simple authentication and authorization mechanism for the shared data
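The hybrid idea can be illustrated with a minimal Python sketch that keeps the two metadata spaces as in-memory dictionaries and answers a query of the form “services satisfying QoS requirements C and participating in sessions S”. All class and method names, the treatment of QoS properties as numeric lower bounds, and the example.org URIs are hypothetical; this is not the FTHPIS interface.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Set


@dataclass
class Context:
    """Session-related (interaction-dependent) metadata, in the spirit of WS-Context."""
    context_id: str
    content: str
    parent_id: Optional[str] = None                   # parent-child relationship
    services: Set[str] = field(default_factory=set)   # activity-list members


class HybridMetadataService:
    """Hypothetical facade over two metadata spaces (illustrative, not the FTHPIS API)."""

    def __init__(self) -> None:
        self._services: Dict[str, Dict[str, float]] = {}  # UDDI-like: service URI -> QoS properties
        self._contexts: Dict[str, Context] = {}           # WS-Context-like: context id -> Context

    # Extended-UDDI-style function: interaction-independent metadata.
    def publish_service(self, uri: str, qos: Dict[str, float]) -> None:
        self._services[uri] = dict(qos)

    # Extended-WS-Context-style functions: session metadata with parent-child links.
    def set_context(self, ctx: Context) -> None:
        if ctx.parent_id is not None and ctx.parent_id not in self._contexts:
            raise KeyError(f"unknown parent context {ctx.parent_id}")
        self._contexts[ctx.context_id] = ctx

    def get_context(self, context_id: str) -> Context:
        return self._contexts[context_id]

    def children(self, context_id: str) -> List[str]:
        return sorted(c.context_id for c in self._contexts.values()
                      if c.parent_id == context_id)

    # Hybrid function: one query that spans both metadata spaces.
    def find_services(self, min_qos: Dict[str, float], session_ids: List[str]) -> List[str]:
        """Services meeting every QoS lower bound AND participating in every given session."""
        matches = []
        for uri, qos in self._services.items():
            meets_qos = all(qos.get(k, float("-inf")) >= v for k, v in min_qos.items())
            in_sessions = all(uri in self._contexts[s].services for s in session_ids)
            if meets_qos and in_sessions:
                matches.append(uri)
        return sorted(matches)
```

A single call such as `find_services({"availability": 0.95}, ["session-1"])` touches both spaces at once, which is the kind of uniform query the separate UDDI and WS-Context interfaces do not offer on their own.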

[Figure: Hybrid WS-Context XML Metadata Service. FTHPIS clients reach the Hybrid WSContext Service (interface uddi_wscontext.wsdl, combining the Extended UDDI and WS-Context WSDL descriptions) and the Extended UDDI Service (interface uddi_extended.wsdl) over HTTP(S); each service stores its data in a database via JDBC.]

Extended UDDI XML Metadata Services

We also designed and implemented an extended UDDI XML Metadata Service (an alternative to OGC Web Registry Services). This service supports
• a GIS Metadata Catalog (functional metadata)
• user-defined metadata ((name, value) pairs)
• up-to-date service information (leasing)
• dynamic aggregation of geospatial services

Our approach enables advanced query capabilities:
• geospatial and temporal queries
• metadata-oriented queries
• domain-independent queries, such as XPath queries on the metadata catalog
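As an illustration of these query styles, the following Python sketch runs a domain-independent XPath query, a metadata-oriented (name, value) query, and a point-in-bounding-box geospatial query over a tiny catalog. The catalog layout (`service`, `metadata`, `bbox` elements), the coordinates, and all function names are assumptions; note that ElementTree supports only a subset of XPath.

```python
import xml.etree.ElementTree as ET

# Hypothetical catalog entries: user-defined (name, value) metadata plus a bounding box.
CATALOG = """<catalog>
  <service name="WMS-1" type="WMS">
    <metadata name="region" value="California"/>
    <bbox minx="-124.4" miny="32.5" maxx="-114.1" maxy="42.0"/>
  </service>
  <service name="WFS-1" type="WFS">
    <metadata name="region" value="Indiana"/>
    <bbox minx="-88.1" miny="37.8" maxx="-84.8" maxy="41.8"/>
  </service>
</catalog>"""


def services_by_xpath(root: ET.Element, path: str) -> list:
    """Domain-independent query using ElementTree's limited XPath subset."""
    return [svc.get("name") for svc in root.findall(path)]


def services_with_metadata(root: ET.Element, name: str, value: str) -> list:
    """Metadata-oriented query over user-defined (name, value) pairs."""
    pattern = f"metadata[@name='{name}'][@value='{value}']"
    return [svc.get("name") for svc in root.iter("service")
            if svc.find(pattern) is not None]


def services_covering(root: ET.Element, lon: float, lat: float) -> list:
    """Simple geospatial query: services whose bounding box contains (lon, lat)."""
    hits = []
    for svc in root.iter("service"):
        box = svc.find("bbox")
        if (float(box.get("minx")) <= lon <= float(box.get("maxx"))
                and float(box.get("miny")) <= lat <= float(box.get("maxy"))):
            hits.append(svc.get("name"))
    return hits
```

For example, `services_by_xpath(ET.fromstring(CATALOG), ".//service[@type='WFS']")` selects the WFS entry without any GIS-specific logic, while `services_covering` is the kind of domain-specific query a plain UDDI registry lacks.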

Key Design Features

Message dissemination
• the communication method among the nodes of the network

Caching
• use of built-in in-memory storage running on each node, to minimize latency and meet the performance requirement

Access
• the methodology for redirecting a client request to an appropriate replica server, to meet the dynamism and performance requirements

Storage
• the methodology for replicating data, to meet the fault-tolerance and performance requirements

Consistency enforcement
• the methodology for ensuring that all replicas of a context are the same

Message Dissemination

Publish-subscribe is exploited to support replicated storage, e.g.
• initial storage of context
• dissemination of context access requests
• dissemination of updates to keep copies consistent

We used the open source NaradaBrokering software to provide a multi-publisher multicast communication mechanism:
• a topic-based publish/subscribe messaging system
• runs on a network of cooperating broker nodes
• provides support for a variety of QoS features, such as low latency, reliable message delivery, multiple transfer protocols, security, and so forth
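To show what topic-based dissemination means for replicated storage, here is an in-process Python sketch. The `TopicBroker` class is a hypothetical stand-in, not the NaradaBrokering API, and the topic names `context/store` and `context/update` are invented for the example; a real deployment routes these messages through a network of brokers.

```python
from collections import defaultdict
from typing import Callable, DefaultDict, Dict, List

Handler = Callable[[str, dict], None]


class TopicBroker:
    """In-process stand-in for a topic-based publish/subscribe broker.

    This is NOT the NaradaBrokering API; it only mimics topic-based delivery.
    """

    def __init__(self) -> None:
        self._subscribers: DefaultDict[str, List[Handler]] = defaultdict(list)

    def subscribe(self, topic: str, handler: Handler) -> None:
        self._subscribers[topic].append(handler)

    def publish(self, topic: str, message: dict) -> int:
        """Deliver message to every subscriber of topic; return the number of deliveries."""
        handlers = list(self._subscribers.get(topic, []))
        for handler in handlers:
            handler(topic, message)
        return len(handlers)


class ReplicaServer:
    """A replica that stores contexts published on hypothetical topic names."""

    def __init__(self, name: str, broker: TopicBroker) -> None:
        self.name = name
        self.store: Dict[str, str] = {}
        broker.subscribe("context/store", self._on_context)   # initial storage
        broker.subscribe("context/update", self._on_context)  # update dissemination

    def _on_context(self, topic: str, message: dict) -> None:
        self.store[message["context_id"]] = message["content"]
```

The publisher never needs to know which replicas exist: one `publish` call on a topic reaches every subscribed replica, which is what makes multi-publisher multicast a natural fit for replication.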

[Figure: Distributed Hybrid WS-Context XML Metadata Services. Clients reach Replica Server-1 to Replica Server-N over HTTP(S) through WSDL interfaces. Each replica runs a Hybrid-WSContext Service backed by a database via JDBC; an Extended UDDI Service with its own database is also shown. The replicas act as publishers and subscribers on a topic-based publish-subscribe messaging system.]

Caching Strategy

The TupleSpaces paradigm is exploited to support caching:
• asynchronous communication
• pioneered by David Gelernter
• communication units are tuples: data structures consisting of one or more typed fields

The Hybrid WS-Context Service employs and extends TupleSpaces:
• uses a lightweight implementation of JavaSpaces
• all accesses are memory accesses, so the overhead is negligible (less than 1 msec for inquiries)
• data sharing: mutually exclusive access to tuples
• associative lookup: content-based search, appropriate for key-based caching
• temporal and spatial uncoupling of communicating parties
• e.g. the tuple ("context_id", Context) has two fields: a) a string, "context_id", and b) a Java object, "Context"
• backed up at frequent time intervals for fault tolerance
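The tuple-space operations listed above (associative lookup, mutually exclusive access) can be sketched as follows. This is a hypothetical, minimal Python stand-in rather than a JavaSpaces implementation: `None` plays the role of a wildcard template field, loosely mirroring how JavaSpaces templates match entries, and `snapshot()` is an assumed hook for the periodic backup.

```python
import threading
from typing import Any, List, Optional, Tuple

Template = Tuple[Optional[Any], ...]


class TupleSpace:
    """Minimal in-memory tuple space with write, associative read, and take.

    A template matches a tuple of the same length when every non-None
    template field equals the corresponding tuple field (None = wildcard).
    """

    def __init__(self) -> None:
        self._tuples: List[Tuple[Any, ...]] = []
        self._lock = threading.Lock()  # mutually exclusive access to tuples

    @staticmethod
    def _matches(template: Template, tup: Tuple[Any, ...]) -> bool:
        return len(template) == len(tup) and all(
            t is None or t == v for t, v in zip(template, tup))

    def write(self, tup: Tuple[Any, ...]) -> None:
        with self._lock:
            self._tuples.append(tup)

    def read(self, template: Template) -> Optional[Tuple[Any, ...]]:
        """Return a matching tuple without removing it, or None (non-blocking)."""
        with self._lock:
            return next((t for t in self._tuples if self._matches(template, t)), None)

    def take(self, template: Template) -> Optional[Tuple[Any, ...]]:
        """Remove and return a matching tuple, or None (non-blocking)."""
        with self._lock:
            for i, tup in enumerate(self._tuples):
                if self._matches(template, tup):
                    return self._tuples.pop(i)
            return None

    def snapshot(self) -> List[Tuple[Any, ...]]:
        """Copy of all tuples, e.g. for a periodic backup to persistent storage."""
        with self._lock:
            return list(self._tuples)
```

Key-based caching falls out of associative lookup: `read(("ctx-1", None))` finds the context stored under that id without any separate index.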

Access: Request Distribution

A peer-to-peer based message distribution methodology is exploited for redirecting a client request to the appropriate replica server:
• use of the pub-sub system for request distribution
• broadcast-based dissemination of context access requests
• servers that can satisfy the query unicast a response with a copy of the requested context

Advantages: no need to keep track of the location of every single data item; makes use of redundant copies that are kept only for fault-tolerance reasons; improves responsiveness.

Practical problem: if the number of repetitive queries that require probing the network increases, this may amplify network consumption and affect system performance.

Approach: use dynamic replication to move or replicate highly demanded copies into the proximity of their requestors, minimizing the need to probe the network.
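A minimal Python sketch of broadcast-based access, under simplifying assumptions: the broadcast is modeled as a loop over replicas, each reply carries a hypothetical arrival delay, and the client keeps the earliest reply. Class and function names are illustrative only.

```python
from typing import Dict, List, Optional, Tuple


class Replica:
    """A replica server that may or may not hold a copy of a given context."""

    def __init__(self, name: str, delay_ms: float, contexts: Dict[str, str]) -> None:
        self.name = name
        self.delay_ms = delay_ms      # hypothetical time for its response to arrive
        self.contexts = contexts

    def on_access_request(self, context_id: str) -> Optional[Tuple[float, str, str]]:
        """Respond (unicast) only if a copy is held; otherwise stay silent."""
        if context_id in self.contexts:
            return (self.delay_ms, self.name, self.contexts[context_id])
        return None


def broadcast_access(replicas: List[Replica], context_id: str) -> Optional[Tuple[str, str]]:
    """Broadcast an access request to every replica and keep the earliest response.

    No location directory is consulted: any replica holding a copy (including
    copies kept only for fault tolerance) can answer.
    """
    responses = [r for r in (rep.on_access_request(context_id) for rep in replicas)
                 if r is not None]
    if not responses:
        return None
    _, name, content = min(responses)   # earliest arrival wins
    return name, content
```

The sketch also makes the practical problem visible: every lookup probes all replicas, so repeated queries for the same context multiply network traffic unless popular copies are moved closer to the requestors.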

Storage: Replica placement

A peer-to-peer based message distribution methodology is exploited for creating the initial permanent copies of a context:
• use of the pub-sub system for permanent replication
• use of non-blocking replica placement
• 1st step: the initiator creates a temporary copy at every capable replica server
• 2nd step: the initiator keeps permanent copies only at the first few answering replica servers, for fault tolerance

Advantages: [1] the publishing client does not block until replication is completed; [2] a temporary full-replication methodology is exploited to improve responsiveness; [3] permanent copies remain as a backup facility to meet the fault-tolerance requirement.
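The two-step, non-blocking replica placement can be sketched in Python with a thread pool. The acknowledgement delays, the parameter `k` (how many of the first answering servers keep permanent copies), and all names are assumptions made for illustration.

```python
import concurrent.futures as cf
import time
from typing import Dict, List


class Server:
    """A capable replica server; delay_s is a hypothetical time to acknowledge a copy."""

    def __init__(self, name: str, delay_s: float) -> None:
        self.name = name
        self.delay_s = delay_s
        self.copies: Dict[str, str] = {}   # context id -> "temporary" | "permanent"

    def store_temporary(self, context_id: str) -> str:
        time.sleep(self.delay_s)
        self.copies[context_id] = "temporary"
        return self.name


def place_replicas(servers: List[Server], context_id: str, k: int) -> List[str]:
    """Step 1: temporary copy on every server. Step 2: first k to answer become permanent."""
    by_name = {s.name: s for s in servers}
    first_k: List[str] = []
    with cf.ThreadPoolExecutor(max_workers=len(servers)) as pool:
        futures = [pool.submit(s.store_temporary, context_id) for s in servers]
        for fut in cf.as_completed(futures):   # yields in completion order
            if len(first_k) < k:
                first_k.append(fut.result())
    for name in first_k:
        by_name[name].copies[context_id] = "permanent"
    return first_k


_background = cf.ThreadPoolExecutor(max_workers=2)


def publish_async(servers: List[Server], context_id: str, k: int) -> cf.Future:
    """The publishing client gets a Future back at once instead of blocking."""
    return _background.submit(place_replicas, servers, context_id, k)
```

The slower servers keep their temporary copies, which is what lets the dynamic-replication step later decide whether to delete, move, or keep them.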

Storage: Dynamic replication

A dynamic replication methodology is exploited for creating server-initiated (temporary) copies of a context:
• use of the pub-sub system for server-initiated replication
• the replication decision belongs to the server (autonomous)
• we keep a popularity record (# of access requests) for each copy of a context and flush it at regular time intervals
• unpopular server-initiated copies of a context are deleted
• popular copies of a context are moved into the proximity of their requestors (where the requests originate)
• very popular copies of a context are replicated into the proximity of their requestors (where the requests originate)

Advantages: [1] this strategy exploits locality, which in turn improves responsiveness; [2] it also captures dynamism by adjusting the system to changing user demands.
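The popularity bookkeeping behind dynamic replication might look like the following Python sketch. The three thresholds and the delete/keep/move/replicate decision order are hypothetical; the slide does not specify the actual values or rules.

```python
from collections import Counter
from typing import Tuple

# Hypothetical thresholds, in access requests per flush interval.
DELETE_BELOW = 2
MOVE_AT = 10
REPLICATE_AT = 50


class CopyPopularity:
    """Popularity record for one server-initiated copy of a context."""

    def __init__(self) -> None:
        self.requests_by_origin: Counter = Counter()

    def record(self, origin: str) -> None:
        """Count one access request, keyed by where it originated."""
        self.requests_by_origin[origin] += 1

    def decide_and_flush(self) -> Tuple[str, str]:
        """Decide what to do with the copy, then reset the record for the next interval.

        Returns (action, target): action is "delete", "keep", "move" or "replicate";
        target is the origin that sent the most requests ("" when not relevant).
        """
        total = sum(self.requests_by_origin.values())
        busiest = self.requests_by_origin.most_common(1)[0][0] if total else ""
        self.requests_by_origin.clear()          # flush on a regular interval
        if total < DELETE_BELOW:
            return "delete", ""                  # unpopular: remove the copy
        if total >= REPLICATE_AT:
            return "replicate", busiest          # very popular: add a copy near requestors
        if total >= MOVE_AT:
            return "move", busiest               # popular: migrate the copy near requestors
        return "keep", ""
```

Keying the counts by origin is what allows a "move" or "replicate" decision to target the region where the demand actually comes from.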

Consistency enforcement

Consistency enforcement methodologies are exploited to keep the copies of a context consistent:
• use of a weak consistency model: copies of a context can differ, but updates are propagated to replicas whenever needed for a consistent view of the information
• use of the pub-sub system for update propagation
• use of a primary-copy approach: all updates for a specific context are initiated at a single server
• use of synchronized timestamps (as versions) to give a sequence to each published context, imposing an order on concurrent write operations on the same data
• updates are pulled by a replica server from the primary copy if the replica server realizes that it has a stale copy
• updates are pushed (broadcast) by the primary copy if it realizes that some server has not yet been updated
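A simplified Python sketch of primary-copy update propagation with timestamps as versions. It assumes the primary learns which replicas hold which version through an `acked` table (a stand-in for acknowledgements over the pub-sub system); all names are hypothetical.

```python
from typing import Dict, List


class PrimaryCopy:
    """All updates for one context are initiated here, each stamped with a timestamp."""

    def __init__(self, value: str, timestamp_ms: int) -> None:
        self.value, self.timestamp_ms = value, timestamp_ms
        self.acked: Dict[str, int] = {}  # replica name -> newest timestamp it holds

    def update(self, value: str, timestamp_ms: int) -> None:
        if timestamp_ms <= self.timestamp_ms:
            raise ValueError("timestamps must increase")  # imposes an order on writes
        self.value, self.timestamp_ms = value, timestamp_ms

    def stale_replicas(self) -> List[str]:
        """Replicas not yet holding the newest version: candidates for a push."""
        return sorted(n for n, ts in self.acked.items() if ts < self.timestamp_ms)


class ContextReplica:
    def __init__(self, name: str, primary: PrimaryCopy) -> None:
        self.name, self.primary = name, primary
        self.value, self.timestamp_ms = primary.value, primary.timestamp_ms
        primary.acked[name] = primary.timestamp_ms

    def apply(self, value: str, timestamp_ms: int) -> bool:
        """Apply an update only if it is newer than the local copy."""
        if timestamp_ms <= self.timestamp_ms:
            return False
        self.value, self.timestamp_ms = value, timestamp_ms
        self.primary.acked[self.name] = timestamp_ms
        return True

    def pull_if_stale(self, seen_timestamp_ms: int) -> bool:
        """Pull from the primary copy after noticing a newer version exists."""
        if seen_timestamp_ms > self.timestamp_ms:
            return self.apply(self.primary.value, self.primary.timestamp_ms)
        return False
```

Because replicas compare timestamps before applying anything, a late-arriving older update is simply ignored, so push and pull can both run without coordinating with each other.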

Consistency enforcement - II

Advantage: this strategy employs a non-blocking primary-copy approach, so the publisher does not block until an update operation is completed, which in turn improves responsiveness.

Practical problems: [1] with this strategy, one cannot update a data item more frequently than once per 30 milliseconds, which is the accuracy of the NaradaBrokering NTP-protocol based synchronized timestamps; [2] with this strategy, a client cannot be sure whether the update operation was carried out correctly.

Approach: 1 update operation per 30 milliseconds is an acceptable update rate for our application domains. As performance is a requirement, we favor solutions that do not require blocking client applications.
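The 30 ms timestamp accuracy translates into a per-context update ceiling by simple arithmetic; it bounds updates to a single data item, not total system throughput:

```latex
r_{\max} = \frac{1\ \text{update}}{30\ \text{ms}}
         = \frac{1000\ \text{ms/s}}{30\ \text{ms/update}}
         \approx 33.3\ \text{updates/s per context}
```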

Prototype Evaluation

We evaluated the prototype implementation for three distinct aspects of distributed systems:

Performance
• baseline performance
• effect of network latency on the baseline performance

Scalability
• performance degradation of the system under increasing message sizes or message rates
• scalability gain, both in numbers and in performance, when moving from a centralized system to a distributed system under the same workload

Fault tolerance
• the empirical cost of fault tolerance, in terms of the execution time of standard operations on a tight cluster or on a network with significant network distances

TESTBED: Cluster node configuration (Machine Configurations)

Processor | Intel® Xeon™ CPU (2.40GHz)
RAM | 2GB total
Network bandwidth | 900 Mbits/sec [1] (among the cluster nodes)
OS | GNU/Linux (kernel release 2.4.22)
Java version | Java 2 Platform, Standard Edition (1.4.2-beta-b19)
SOAP engine | Axis 2 (in Tomcat 5.5.8)

RESPONSIVENESS EXPERIMENT

[Figure: test setups. In each test a single-threaded client (1 user/1000 transactions) calls the server through its WSDL interface.]
• Test-1. Dummy Server
• Test-2. Hybrid-WSContext inquiry/publication without database access
• Test-3. Hybrid-WSContext inquiry/publication with database access
• Test-4. Extended UDDI inquiry/publication
Server components shown: PublishingQueryingModule, JDBC Handler and Expeditor (Hybrid-WSContext); Extended UDDI Server Engine (Extended UDDI).

If a query can be satisfied by the JavaSpaces cache, it is satisfied in < 1 ms plus a few milliseconds of Web Service overhead.

Performance for standard operations is comparable with that of existing metadata management services.

[Figure: Round Trip Time Chart for Inquiry Requests. Average response time (msec) per request for Test-1: dummy service; Test-2: WS-Context inquiry with memory access; Test-3: WS-Context inquiry with database access; Test-4: UDDI inquiry.]

Metadata service | Avg. latency for inquiries
JUDDI | 40 ms
UDDI-MT | 20.37 ms
JWSD | 18.99 ms

Test-2 minus Test-1 gives the JavaSpaces overhead.
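The responsiveness tests derive overheads by subtracting a dummy-service baseline (Test-2 minus Test-1 isolates the JavaSpaces overhead). A minimal Python sketch of such a measurement, assuming `call` wraps one synchronous request; the function names are hypothetical, and a real client would call the service through its WSDL-generated stub.

```python
import statistics
import time
from typing import Callable, Tuple


def measure_rtt(call: Callable[[], object], transactions: int = 1000) -> Tuple[float, float]:
    """Invoke `call` sequentially (one user) and return (mean, stdev) in milliseconds."""
    samples = []
    for _ in range(transactions):
        start = time.perf_counter()
        call()                                   # one request/response round trip
        samples.append((time.perf_counter() - start) * 1000.0)
    return statistics.mean(samples), statistics.stdev(samples)


def overhead_ms(service_call: Callable[[], object],
                dummy_call: Callable[[], object],
                transactions: int = 1000) -> float:
    """Mean extra latency of a service relative to a dummy baseline (e.g. Test-2 minus Test-1)."""
    service_mean, _ = measure_rtt(service_call, transactions)
    dummy_mean, _ = measure_rtt(dummy_call, transactions)
    return service_mean - dummy_mean
```

Subtracting the dummy baseline removes the shared Web Service and network costs, so what remains is attributable to the component under test.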

SCALABILITY TEST-1

TEST-1: Hybrid-WSContext inquiry/publication with increasing message sizes
TEST-2: Hybrid-WSContext inquiry/publication with increasing message rates (# of messages per second)

[Figure: test setup. WS-Context clients (a single-threaded client; 5 clients distributed to cluster nodes 1 to 5, each running 1 to 15 threads) call the Hybrid FTHPIS-WSContext Service (WSDL thread pool, JDBC Handler, Expeditor) over HTTP(S).]

[Figure: average round trip time (milliseconds) versus context payload size (KB, log scale from 0.1 to 100) for inquiry (Tinquiry = T(RTT)) and publication (Tpublication = T(RTT)). Per-point standard deviations shown on the chart range from 1.42 to 13.01 ms.]

The results indicate that the system performs well for small context payloads.

The results also indicate that the cost of inquiry and publication operations remains the same as the context’s payload size increases from 100 bytes up to 10 KB.

The system can scale up to 940 simultaneous querying clients and 222 simultaneous publishing clients, with each client sending one query per second, for small context payloads and a 30-millisecond backup interval for fault tolerance.

Multi-core hosts are expected to improve performance dramatically.

[Figure: average round trip time (ms) versus message rate (messages per second, 0 to 1000) for inquiry and publication operations. Per-point standard deviations shown on the chart range from 0.65 to 53 ms.]

SCALABILITY TEST-2

Five clients are distributed to cluster nodes 1 to 5, each running 1 to 15 threads that fire messages at randomly selected servers.

We investigate scalability when moving from a centralized server to a distributed one under heavy workloads.

[Figure: numbered rectangles correspond to N-node FTHPIS systems (N = 1 to 5, nodes node-1 to node-5) connected by various publish-subscribe topologies; clients reach the nodes over HTTP(S) through a WSDL thread pool.]
The choice of publish-subscribe topology does NOT affect performance.

Five different FTHPIS systems, with N ranging from 1 to 5, were tested under the same workload. In each test case, the same volume of data is evenly distributed among the nodes.

The scalability of the metadata store can be increased by moving from a centralized service to a distributed system.

[Figure: message rate (msg/second) for the Hybrid WS-Context inquiry operation versus number of nodes (1 to 5).]

Hybrid WS-Context inquiry operation
# of nodes | message rate (msg/second) | mean ± error (ms) | Stdev (ms)
1 | 940 | 47.05 ± 0.24 | 33.52
2 | 1005 | 40.76 ± 0.43 | 38.22
3 | 1082 | 38.58 ± 0.45 | 34.93
4 | 1148 | 36.28 ± 0.42 | 32.24
5 | 1221 | 34.13 ± 0.4 | 30.76

Notes: the caching algorithm is non-optimal, as it does database access BEFORE publish-subscribe. Reversing this choice should lead to throughput linear in the number of nodes. Pub-sub overhead is ~2 ms.
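From the inquiry table, the relative throughput gain S(N) of an N-node system over the centralized one works out as follows; for comparison, proportional scaling would give S(5) = 5, i.e. about 5 × 940 = 4700 msg/second:

```latex
S(N) = \frac{R(N)}{R(1)}, \qquad
S(5) = \frac{1221}{940} \approx 1.30, \qquad
\frac{R(5) - R(1)}{5 - 1} = \frac{281}{4} \approx 70\ \text{msg/s per added node}
```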

FAULT-TOLERANCE TEST

[Figure: a client and nodes node-1 to node-5; in the WAN case they are connected by links link-1 to link-4.]

Test-1. LAN experiment. All nodes and the client are located on a tightly coupled local area network.
Test-2. WAN experiment. Nodes are located on a loosely coupled wide area network.

WAN node locations
node-1 | Indianapolis, IN
node-2 | Tallahassee, FL
node-3 | Austin, TX
node-4 | San Diego, CA
node-5 | Bloomington, IN, CGL
client | Bloomington, IN, CGL

Link latencies
link-1 | 0.83 ms
link-2 | 11.3 ms
link-3 | 15.3 ms
link-4 | 31.4 ms

FAULT-TOLERANCE TEST RESULTS

[Figure: time (msec) versus number of replicas (1 to 5) for Test1: LAN testing case, publication; Test2: WAN testing case, publication; Test3: inquiry operation (request granted locally with memory access); Test4: inquiry operation (request granted locally with database access).]

Fault tolerance vs. performance: the lower the level of fault tolerance, the higher the performance.

A high degree of replication can be achieved (by utilizing an asynchronous communication model) without increasing the cost of fault tolerance.

Summary of Contributions

Specification on managing all service metadata
• a method to achieve a uniform programming interface to both interaction-independent and session-related metadata. This method also introduces a data model for storing session-related metadata

Specification on managing interaction-independent service metadata
• a method to achieve Geographical Information Systems compatible, domain-independent and metadata-oriented management of interaction-independent service metadata

Fault tolerant and high performance information service
• a method to achieve management of dynamic metadata and Context in subgrids (dynamically assembled, collaborating, modest numbers of services put together to perform a particular task)

Future work

Transaction scheduling
• investigate how to minimize the time required to complete transactions on two different metadata systems with different time constraints

Evaluation of dynamic replication
• carry out simulations to evaluate dynamic replication

Optimal caching methodologies
• implement and test more optimal caching methodologies

Smoothing the impact of backups on performance
• investigate how to minimize the impact of the time spent on backups (high peaks) on average transaction response time

Questions?


Appendix


FAULT-TOLERANCE EXPERIMENT TEST BED: Summary of machine configurations

Machine | Location | Processor | RAM | OS | Java Version
gf6.ucs.indiana.edu | Bloomington, IN, USA | Intel® Xeon™ CPU (2.40GHz) | 2GB | GNU/Linux (kernel release 2.4.22) | Java 2, Standard Edition (1.4.2-beta-b19)
complexity.ucs.indiana.edu | Indianapolis, IN, USA | Sun-Fire-880, sun4u sparc SUNW | 16GB | SunOS 5.9 | Java HotSpot(TM) 64-Bit Server VM (1.4.2-01)
lonestar.tacc.utexas.edu | Austin, TX, USA | Intel(R) Xeon(TM) CPU 3.20GHz | n/a | n/a | n/a
tg-login.sdsc.teragrid.org | San Diego, CA, USA | GenuineIntel IA-64, Itanium 2, 4 processors | 8GB | GNU/Linux | Java 2, Standard Edition (1.4.2-beta-b19)
vlab2.scs.fsu.edu | Tallahassee, FL, USA | Dual Core AMD Opteron(tm) Processor 270 | n/a | n/a | n/a

SOAP header for Context:

<?xml version="1.0" encoding="UTF-8"?>
<soap:Envelope xmlns:soap="http://www.w3...">
  <soap:Header encodingStyle="URL" mustUnderstand="true">
    <context xmlns="ctxt schema" timeout="100">
      <context-id>http..</context-id>
      <context-service>http..</context-service>
      <context-manager>http..</context-manager>
      <activity-list mustUnderstand="true" mustPropagate="true">
        <p-service>http://../WMS</p-service>
        <p-service>http://../HPSearch</p-service>
      </activity-list>
    </context>
  </soap:Header>
</soap:Envelope>

The Pattern Informatics GIS-SOA based workflow application

5,6: WMS starts a session and invokes HPSearch to run the workflow script for the PI Code with a session id
7,8,9: HPSearch runs the workflow script and generates an output file in GML format (& PDF format) as a result
10: HPSearch writes the URI of the output file into Context
11: WMS polls the information from the Context Service
12: WMS retrieves the output file generated by the workflow script and generates a map
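Steps 7 to 11 can be simulated in a short Python sketch: one thread plays HPSearch and writes the output URI into a context store, while the WMS side polls for it. The store, its key `output-uri`, the example.org URI, and the polling interval are all hypothetical.

```python
import threading
import time
from typing import Dict, Optional, Tuple


class ContextService:
    """Hypothetical in-memory context store keyed by (session id, key)."""

    def __init__(self) -> None:
        self._data: Dict[Tuple[str, str], str] = {}
        self._lock = threading.Lock()

    def set_context(self, session_id: str, key: str, value: str) -> None:
        with self._lock:
            self._data[(session_id, key)] = value

    def get_context(self, session_id: str, key: str) -> Optional[str]:
        with self._lock:
            return self._data.get((session_id, key))


def hpsearch_run(ctx: ContextService, session_id: str, work_s: float) -> None:
    """Steps 7-10: run the workflow, then write the output file URI into Context."""
    time.sleep(work_s)  # stands in for running the PI Code workflow script
    ctx.set_context(session_id, "output-uri", "http://example.org/output/x.gml")


def wms_poll(ctx: ContextService, session_id: str,
             interval_s: float = 0.01, timeout_s: float = 2.0) -> Optional[str]:
    """Step 11: poll the Context Service until the output URI appears or time runs out."""
    deadline = time.monotonic() + timeout_s
    while time.monotonic() < deadline:
        uri = ctx.get_context(session_id, "output-uri")
        if uri is not None:
            return uri
        time.sleep(interval_s)
    return None
```

The two services never call each other directly after step 6; the session-scoped context entry is the only thing they share, which is the correlation role the Context Information Service plays in this workflow.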

Service associated:

<context xsd:type="ContextType" timeout="100">
  <context-service>http://.../HPSearch</context-service>
  <content>HPSearch associated additional data generated during execution of workflow.</content>
</context>

Session:

<context xsd:type="ContextType" timeout="100">
  <context-service>http://.../WMS</context-service>
  <activity-list mustUnderstand="true" mustPropagate="true">
    <service>http://.../WMS</service>
    <service>http://.../HPSearch</service>
  </activity-list>
</context>

User profile:

<context xsd:type="ContextType" timeout="100">
  <context-service>http://.../HPSearch</context-service>
  <parent-context>http://../abcdef:012345</parent-context>
  <content>profile information related to WMS</content>
</context>

Shared state:

<context xsd:type="ContextType" timeout="100">
  <context-id>http://../abcdef:012345</context-id>
  <context-service>http://.../HPSearch</context-service>
  <content>http://danube.ucs.indiana.edu:8080\x.xml</content>
</context>

Activity:

<context xsd:type="ContextType" timeout="100">
  <context-service>http://.../HPSearch</context-service>
  <parent-context>http://../abcdef:012345</parent-context>
  <content>shared data for HPSearch activity</content>
  <activity-list mustUnderstand="true" mustPropagate="true">
    <service>http://.../DataFilter1</service>
    <service>http://.../PICode</service>
    <service>http://.../DataFilter2</service>
  </activity-list>
</context>

[Figure: Dynamic Metadata Examples for a GIS Workflow. Components: WMS Client, WMS, WFS, HPSearch, Data Filter, PI Code, Data Filter, Context Information Service and Extended UDDI; data files http://..../..../..txt and http://..../..../x.gml; numbered arrows mark the workflow steps.]
