
Rapid Prototyping and Deployment of Distributed Web / Grid Services in a Service Oriented Architecture using Scripting

Thesis Proposal
Harshawardhan Gadgil
[email protected]
http://www.hpsearch.org





Outline

- Motivation
- Literature Survey
- Research Issues
- HPSearch Architecture
- Contributions and Milestones
- Applications
- Summary


Motivation

Critical infrastructure systems connect disparate data sources, high-performance computing applications and visualization services for real-time data processing.

- Real-time data processing
  - Results are required in real time.
  - Data is available in streams.
  - Pre-processing is required (e.g. filtering data to remove unwanted parts).
- Scalability
  - Potentially large number of data sources (static, dynamic) or data processing elements (services)
- Unpredictable behavior
  - Fault tolerance is a key factor, e.g. incorporating new data sources or processing units on the fly


Motivation (contd.)

- System management
  - Increasing application complexity implies more metadata.
  - Proper management is required to ensure the smooth functioning of the system.
  - Easy access is required to manage system characteristics.


Motivation: Streaming Data Processing

- Critical infrastructure systems (scientific applications)
  - Real-time streaming sources exist, e.g. sensors, satellite stations
  - OR static data sources (databases containing previously warehoused observations)
  - Data filtering / transformation is essential in most cases to convert data to the proper format for the processing application
  - Real-time processing is required; this is crucial for critical infrastructure applications
- Audio/video applications
  - Real-time sources, e.g. collaborative sessions
  - OR static data sources (stored A/V files)
  - Pre-processing is required to modify A/V characteristics: format (encoding), bit rate (quality), etc.
  - Real-time processing is crucial for collaborative environments


Literature Survey

- Services (Web / Grid)
- Scripting languages
  - Benefits
  - Possible problems
- Handling data flow in applications
  - File-based vs. streaming
- Workflow systems
  - Enable gluing high-performance components
  - GUI-based building and programming flavor
- Component-based architectures
- Messaging systems (for high-throughput data transfer)
- System management


Service

“Service is a logical manifestation of a logical /physical resource (DB, programs, devices, humans etc) and/or some application logic exposed to network”

- Web Service Grids: An Evolutionary Approach (2004)

- Web Services
  - Simple mechanism for distributed computing
  - Language independent, firewall friendly
- Grid Services
  - Are essentially Web Services
  - Transient: can be created, destroyed, or die naturally
  - State: maintained between calls to the Web Service
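To make the Web Service vs. Grid Service distinction concrete, here is a minimal sketch in plain JavaScript (the scripting language HPSearch uses via Rhino; this sketch runs under Node.js). It assumes a hypothetical `GridServiceFactory` with an expiry-based lifetime; the class, its methods and the timing model are illustrative only and are not part of OGSI, WSRF or any toolkit API. Time is passed in explicitly so the behavior is deterministic.

```javascript
// A stateless Web Service operation: nothing is remembered between calls.
function statelessAdd(a, b) {
  return a + b;
}

// Hypothetical factory for stateful, transient service instances.
// Each instance keeps a running total (its state) and an expiry time
// after which it "dies naturally".
class GridServiceFactory {
  constructor() {
    this.instances = new Map(); // instance id -> { total, expiresAt }
    this.nextId = 1;
  }

  // Create a new instance that lives for lifetimeMs starting at time `now`.
  create(lifetimeMs, now) {
    const id = `instance-${this.nextId++}`;
    this.instances.set(id, { total: 0, expiresAt: now + lifetimeMs });
    return id;
  }

  // Stateful call: the total persists between invocations until expiry.
  add(id, value, now) {
    const inst = this.instances.get(id);
    if (!inst || now >= inst.expiresAt) {
      this.instances.delete(id); // expired instances are discarded
      throw new Error(`${id} no longer exists`);
    }
    inst.total += value;
    return inst.total;
  }

  // Explicit destruction by the client.
  destroy(id) {
    this.instances.delete(id);
  }
}

const factory = new GridServiceFactory();
const id = factory.create(1000, 0);
factory.add(id, 5, 100); // 5
factory.add(id, 7, 200); // 12: state kept between calls
```

A call at time 1500 or later would throw, because the instance's lifetime has run out.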


Scripting Languages

Benefits
- Enable rapid prototyping (smaller code size and shorter development time)
- Less effort to perform complex tasks
  - Interface with the OS (hosting environment)
  - Glue code to tie programs together
- Usually portable
- Primarily for plugging existing components together

However, there are some disadvantages too
- Weak typing
- Less structure, so scripts are difficult to maintain

Some examples
- Rhino (JavaScript for Java), Perl, VBScript, Python / Jython

Scripting vs. GUI builders
- GUI builders: ease the involvement of novice design engineers
- Scripting: provides more flexibility through direct access


Scripting Environments: Hosting Services

- OGSI::Lite & WSRF::Lite
  - Based on Perl
  - Rapidly deploy grid services
- Matlab / Jython from GEODISE
  - GEODISE: a suite of CAD tools integrated with distributed grid-enabled computing, data, analysis and knowledge resources
  - Uses Matlab to provide programmatic access to GEODISE functions along with an existing suite of Matlab tools
  - Jython is used to provide a hosting environment using the Java CoG Kit


Data flow in applications

- Real-time processing is required.
- Typically, data transfer involves temporary storage of data, which may be transferred using files (e.g. GridFTP). Every component in the chain processes data from an input file and writes the processed data to an output file.
- Time and space are critical in real-time applications, hence file-based transfer is undesirable for them.
- Tools exist to automate data transfer and invoke applications (e.g. GridAnt, Karajan).
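The difference between file-based and streaming transfer can be sketched as follows, assuming a two-stage chain (drop negative readings, then scale) chosen purely for illustration. Arrays stand in for intermediate files and JavaScript generators stand in for streams; no actual files, GridFTP or HPSearch objects are involved.

```javascript
// Two illustrative processing stages: drop negative readings, then scale.
const keep = (x) => x >= 0;
const scale = (x) => x * 10;

// File-based style: each stage materializes its complete output (like writing
// an output file) before the next stage may start.
function fileBased(input) {
  const afterFilter = input.filter(keep); // intermediate "file" 1
  return afterFilter.map(scale);          // intermediate "file" 2
}

// Streaming style: each record passes through every stage as soon as it arrives.
function* filterStage(source) {
  for (const x of source) if (keep(x)) yield x;
}
function* scaleStage(source) {
  for (const x of source) yield scale(x);
}

// An unbounded source, e.g. a sensor that never "finishes": -1, 0, 1, -1, 0, 1, ...
function* sensor() {
  let t = 0;
  while (true) yield (t++ % 3) - 1;
}

const stream = scaleStage(filterStage(sensor()));
const firstThree = [stream.next().value, stream.next().value, stream.next().value]; // [0, 10, 0]
```

Because the file-based version needs its whole input before producing any output, it cannot consume an unbounded sensor source at all, whereas the streaming chain delivers each record as soon as it arrives.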


Workflow Architectures

- Triana: graphical PSE (problem solving environment) to compose scientific applications
  - Composed of one or more Triana engines
  - Distributed version: data transfer takes place using JXTA pipes
- Taverna
  - Can interact with arbitrary services; plugins mediate / operate the service in each case
  - Uses the XScufl workflow language (derived from WSFL)
- Kepler
  - Java packages for designing and executing workflows
  - Has a graphical interface for composing complex workflows
  - Can wrap existing code written in different languages, e.g. a Perl script or Matlab script


Component Architectures

- XCAT @ IU-Extreme
  - Connects components (Provides and Uses ports)
  - Jython-based scripting to do application management tasks (create application, set properties, invoke application)
  - Data transfer between components by GridFTP; Globus Reliable File Transfer (fault tolerance)
- Many other systems
  - Focus mainly on invocation of services, as in a workflow
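The provides/uses port model that XCAT connects can be sketched in a few lines, assuming a simplified in-process framework. The `Component` class, the `connect` function and the port names are hypothetical; real XCAT components are distributed and are wired through the framework rather than through direct object references.

```javascript
// Hypothetical component with provides ports (interfaces it implements)
// and uses ports (interfaces it calls, filled in by the framework).
class Component {
  constructor(name) {
    this.name = name;
    this.provides = new Map(); // port name -> implementation object
    this.uses = new Map();     // port name -> connected implementation, or null
  }
  addProvidesPort(port, impl) {
    this.provides.set(port, impl);
  }
  registerUsesPort(port) {
    this.uses.set(port, null); // declared but not yet connected
  }
  getPort(port) {
    const impl = this.uses.get(port);
    if (!impl) throw new Error(`${this.name}: uses port "${port}" is not connected`);
    return impl;
  }
}

// The framework connects a user's uses port to a provider's provides port.
function connect(user, usesPort, provider, providesPort) {
  const impl = provider.provides.get(providesPort);
  if (!impl) throw new Error(`${provider.name} has no provides port "${providesPort}"`);
  user.uses.set(usesPort, impl);
}

const source = new Component("Filter");
source.addProvidesPort("data", { read: () => [1.5, 2.5] });
const analyzer = new Component("Analyzer");
analyzer.registerUsesPort("input");
connect(analyzer, "input", source, "data");
const values = analyzer.getPort("input").read(); // [1.5, 2.5]
```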


Messaging systems

- JXTA: P2P middleware
- JMS for communication
- Pastry
  - Fault-tolerant P2P middleware
  - Based on distributed hash tables
  - No real-time routing possible
- NaradaBrokering @ IU (http://www.naradabrokering.org)
  - Event-brokering system designed to run on a large network of co-operating brokers
  - Implements high-performance protocols (message transit time < 1 ms per broker)
  - Order-preserving, optimized message transport
  - Interface with reliable storage for persistent events
  - Fault-tolerant data transport
  - Support for different underlying transport implementations such as TCP, UDP, Multicast, SSL, RTP
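A minimal sketch of topic-based publish/subscribe, the style of messaging NaradaBrokering provides, assuming a single in-process broker. The `Broker` class and the topic names are hypothetical and are not NaradaBrokering's API; a real broker network adds routing between brokers, persistence and the transport options listed above.

```javascript
// Hypothetical single-process, topic-based publish/subscribe broker.
class Broker {
  constructor() {
    this.subscribers = new Map(); // topic -> array of callbacks
  }
  subscribe(topic, callback) {
    if (!this.subscribers.has(topic)) this.subscribers.set(topic, []);
    this.subscribers.get(topic).push(callback);
  }
  publish(topic, event) {
    // Synchronous, in-order delivery: every subscriber sees the events on a
    // topic in the order they were published.
    for (const callback of this.subscribers.get(topic) || []) callback(event);
  }
}

const broker = new Broker();
const plotter = [];
const archiver = [];
broker.subscribe("gps/filtered", (e) => plotter.push(e));
broker.subscribe("gps/filtered", (e) => archiver.push(e));
broker.publish("gps/filtered", "sample-1");
broker.publish("gps/raw", "ignored"); // no subscribers on this topic
broker.publish("gps/filtered", "sample-2");
// plotter and archiver both hold ["sample-1", "sample-2"]
```

Publishers and subscribers only share a topic name, which is what lets components be rewired without either side knowing the other's location.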


System Management

- Increasing complexity of systems implies an increasing amount of metadata to be managed
- Provide access to the system and management of system metadata (WS-Management)
  - E.g. performance metrics, logs, service metadata
- Require the ability to query system data and take actions affecting the characteristics of the system
  - E.g. Perl provides hooks to query system data
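The "query system data, then act" pattern can be sketched as below. Everything here is an assumption for illustration: the metric names, the latency threshold and the `addBroker` action are hypothetical, and a real system would obtain the snapshot from its performance-monitoring service rather than from a literal array.

```javascript
// Hypothetical snapshot of performance metrics, one entry per broker.
const snapshot = [
  { broker: "brokerA", avgLatencyMs: 4, clients: 120 },
  { broker: "brokerB", avgLatencyMs: 35, clients: 900 },
  { broker: "brokerC", avgLatencyMs: 12, clients: 300 },
];

// Query step: select brokers whose latency exceeds the threshold.
function overloaded(metrics, maxLatencyMs) {
  return metrics.filter((m) => m.avgLatencyMs > maxLatencyMs);
}

// Action step: turn each query result into a management action
// (here, a proposal to add a broker next to the overloaded one).
function planActions(metrics, maxLatencyMs) {
  return overloaded(metrics, maxLatencyMs).map((m) => ({
    action: "addBroker",
    near: m.broker,
  }));
}

const actions = planActions(snapshot, 20); // [{ action: "addBroker", near: "brokerB" }]
```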


Research Issues

- Support for streaming data processing
  - Data transfer and processing in real time
  - Data transfer carried out directly between the end-points (sender and recipient) without the flow engine mediating (cf. Grid Services Flow Language)
- Design a run-time system that allows merging data sources, data filtering and processing applications, and visualization tools in a service-oriented architecture
  - Assume all components are available as Web (Grid) services
- Scalability: adding data sources or processing applications (services) should not degrade system performance
- Fault tolerance: services and data sources may be lost; allow the system to detect faults, and to discover and incorporate new components
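One common way to let a system detect faults is a heartbeat-based failure detector: each component reports periodically, and a component that stays silent longer than a timeout is suspected to have failed. This is a generic technique offered as an assumption, not necessarily how HPSearch detects faults; the component names and the 3000 ms timeout are hypothetical, and timestamps are passed in explicitly.

```javascript
// Heartbeat-based failure detector (generic technique, for illustration).
// Timestamps are in milliseconds and supplied by the caller.
class FailureDetector {
  constructor(timeoutMs) {
    this.timeoutMs = timeoutMs;
    this.lastSeen = new Map(); // component id -> time of last heartbeat
  }

  // Record that a component reported in at time `now`.
  heartbeat(id, now) {
    this.lastSeen.set(id, now);
  }

  // Components silent for longer than the timeout are suspected to have failed.
  suspected(now) {
    const failed = [];
    for (const [id, last] of this.lastSeen) {
      if (now - last > this.timeoutMs) failed.push(id);
    }
    return failed;
  }
}

const detector = new FailureDetector(3000);
detector.heartbeat("gps-source", 0);
detector.heartbeat("data-filter", 0);
detector.heartbeat("gps-source", 2500); // the source keeps reporting; the filter goes quiet
const suspects = detector.suspected(4000); // ["data-filter"]
```

A suspected component would then be the trigger for discovering and incorporating a replacement.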


Research Issues

- System management interface: allow access to the system, and manipulate its characteristics by querying system metadata
  - Create a virtual topology for application deployment
  - Query performance metrics to design policies that change routing substrate characteristics (e.g. add new brokers, or links between existing brokers, to aid efficient routing)
  - Discover services / brokers / topics of interest, to dynamically rewire components with data streams
- Replay events
  - Useful for achieving recovery after failure
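Replay-based recovery can be sketched as an append-only event log with sequence numbers: a subscriber that fails remembers the last sequence number it processed and, after restarting, asks for everything newer. The `EventLog` class is a hypothetical illustration, not the interface of NaradaBrokering's replay service.

```javascript
// Hypothetical append-only event log supporting replay from a sequence number.
class EventLog {
  constructor() {
    this.events = []; // { seq, payload } in publication order
  }

  // Store an event and return its sequence number (1, 2, 3, ...).
  publish(payload) {
    const seq = this.events.length + 1;
    this.events.push({ seq, payload });
    return seq;
  }

  // Return every event published after lastSeq, in order.
  replayAfter(lastSeq) {
    return this.events.filter((e) => e.seq > lastSeq);
  }
}

const eventLog = new EventLog();
["e1", "e2", "e3", "e4"].forEach((p) => eventLog.publish(p));

// A subscriber failed after processing event 2; on restart it requests a replay.
const missed = eventLog.replayAfter(2).map((e) => e.payload); // ["e3", "e4"]
```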


HPSearch: binds URIs to a scripting language

- We use Mozilla Rhino (a JavaScript implementation; see http://www.mozilla.org/rhino), but the principles may be applied to any other scripting language.
- Every resource may be identified by a URI, and HPSearch allows us to manipulate the resource using its URI.
- For example, read from a web address and write the contents to a local file:

```javascript
x = "http://trex.ucs.indiana.edu/data.txt";
y = "file:///u/hgadgil/data.txt";
var r = new Resource("Copier");
r.port[0].subscribeFrom(x);  /* read from */
r.port[0].publishTo(y);      /* write to */

// Create a flow that starts with the copier resource and run it
f = new Flow();
f.addStartActivities(r);
f.start("1");
```

- Adding support for the WS-Addressing construct is under investigation.


HPSearch (contd.)

Currently provides bindings for the following:
- file://
- socket://ip:port
- http://, ftp://
- topic://
- jdbc:

Host objects to do specific tasks:
- WSDL: invoke Web Services using SOAP
- PerfMetrics: bind NaradaBrokering performance metrics; store published metrics and allow querying
- Resource: every data source / filter / sink is a resource
- Flow: create a data flow between resources

For more information, visit http://www.hpsearch.org
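Binding URIs to handlers by scheme can be sketched with the standard `URL` class (available globally in Node.js and in browsers). The handler table and the strings the handlers return are hypothetical placeholders; a real binding would open files, sockets, topics or database connections instead of describing them.

```javascript
// Hypothetical table binding URI schemes to handlers. Each handler here just
// describes what it would do; real handlers would open the resource.
const handlers = new Map([
  ["file:", (u) => `file handler: path ${u.pathname}`],
  ["http:", (u) => `http handler: GET ${u.href}`],
  ["ftp:", (u) => `ftp handler: fetch ${u.pathname} from ${u.host}`],
  ["socket:", (u) => `socket handler: connect ${u.hostname}:${u.port}`],
  ["topic:", (u) => `topic handler: subscribe ${u.host}${u.pathname}`],
  ["jdbc:", (u) => `jdbc handler: query ${u.href}`],
]);

// Parse the URI and dispatch on its scheme (URL.protocol includes the colon).
function resolve(uriString) {
  const uri = new URL(uriString);
  const handler = handlers.get(uri.protocol);
  if (!handler) throw new Error(`no binding for scheme ${uri.protocol}`);
  return handler(uri);
}

resolve("file:///u/hgadgil/data.txt");   // "file handler: path /u/hgadgil/data.txt"
resolve("socket://156.56.104.170:5045"); // "socket handler: connect 156.56.104.170:5045"
```

Adding a new binding then amounts to registering one more scheme in the table, without touching the scripts that use URIs.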


Architecture

Consists of:
- SHELL: front end to scripting
- TASK_SCHEDULER (FLOW_ENGINE): distributes tasks among co-operating engines for load-balancing purposes
- WSPROXY: an AXIS Web Service that wraps an actual service
  - The behavior of the service can be controlled by making simple WS calls to this proxy, so it can be controlled by any workflow engine.
  - WSProxy handles streaming data communication on behalf of the service. The service only sees input and output streams; these could be files, a remote data stream, a file transferred via HTTP / FTP, or results from a database query.
  - Can be deployed in standard Web Service containers (such as Tomcat)
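Load-balancing task distribution among co-operating engines could, for example, use a greedy least-loaded policy: each task goes to the engine with the smallest accumulated cost. The policy, the task names and their relative costs are assumptions for illustration; the slides do not specify the TASK_SCHEDULER's actual algorithm.

```javascript
// Greedy least-loaded assignment (an assumed policy, for illustration).
function distribute(tasks, engines) {
  const load = new Map(engines.map((e) => [e, 0]));        // engine -> accumulated cost
  const assignment = new Map(engines.map((e) => [e, []])); // engine -> task names
  for (const task of tasks) {
    // Pick the engine with the smallest current load; ties go to the earlier engine.
    let best = engines[0];
    for (const e of engines) {
      if (load.get(e) < load.get(best)) best = e;
    }
    assignment.get(best).push(task.name);
    load.set(best, load.get(best) + task.cost);
  }
  return assignment;
}

// Two engines and four tasks, with hypothetical relative costs.
const plan = distribute(
  [
    { name: "filter", cost: 1 },
    { name: "rdahmm", cost: 3 },
    { name: "plot", cost: 1 },
    { name: "archive", cost: 1 },
  ],
  ["engineA", "engineB"]
);
// plan: engineA -> ["filter", "plot", "archive"], engineB -> ["rdahmm"]
```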


Architecture: WSProxy Interfaces

- Runnable
  - More control over execution (start, suspend, resume, stop, ...)
  - Basic idea: read a block of data, process it, write it out
  - Ideal for designing quick filtering applications that process data in streams
- Wrapped
  - Wraps an existing service (executables [*.exe], Matlab scripts, shell / Perl scripts, etc.)
  - Less control: can only start and stop
  - Ideal for wrapping existing programs / services to expose them as pluggable components / Web Services
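The Runnable interface's basic idea (read a block, process it, write it out, under start / suspend / resume / stop control) can be sketched as a small state machine. The `RunnableFilter` class and its method names are hypothetical, arrays stand in for the input and output streams, and `step()` processes one block per call so the control flow is easy to follow. A Wrapped service, by contrast, would expose only start and stop.

```javascript
// Hypothetical "Runnable"-style stream filter: read a block, process it,
// write it out, under start/suspend/resume/stop control.
class RunnableFilter {
  constructor(input, output, processBlock) {
    this.input = input;               // array of blocks standing in for an input stream
    this.output = output;             // array collecting processed blocks
    this.processBlock = processBlock; // the filter applied to each block
    this.position = 0;                // index of the next block to read
    this.state = "created";
  }

  start()   { if (this.state === "created") this.state = "running"; }
  suspend() { if (this.state === "running") this.state = "suspended"; }
  resume()  { if (this.state === "suspended") this.state = "running"; }
  stop()    { this.state = "stopped"; }

  // Handle at most one block; returns true if a block was processed.
  step() {
    if (this.state !== "running" || this.position >= this.input.length) return false;
    const block = this.input[this.position++];
    this.output.push(this.processBlock(block));
    return true;
  }
}

const results = [];
const filter = new RunnableFilter(["a", "b", "c", "d"], results, (s) => s.toUpperCase());
filter.start();
filter.step();   // processes "a"
filter.suspend();
filter.step();   // ignored while suspended
filter.resume();
filter.step();   // processes "b"
filter.stop();
filter.step();   // ignored after stop
// results is now ["A", "B"]
```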


HPSearch Architecture Overview

[Figure: several HPSearch Kernels connected through a Broker Network. Each kernel contains a Request Handler, a JavaScript Shell, a Task Scheduler / Flow Handler, a Web Service endpoint, other objects, and URI, DB, WSDL and WSProxy handlers. The kernels access databases, Web Services, files / sockets / topics, and services wrapped by WSProxy.]


So what is the overhead? (Partial results as of now)

Measured on a 1.6 GHz Pentium 4 machine with 256 MB RAM running Java 1.4.1_02, NB version 0.98 rc2 and Rhino 1.5R3.

- Shell initialization: 2085 ms (average)
- RDAHMM script (26 lines, small script): takes about 15 ms per line (average) to execute
- Task distribution (2 engines, 4 tasks): 3897.645 ms
- WSProxy initialization (depends on the number of streams to initialize): 700 to 2000 ms (approximate values measured using System.currentTimeMillis)
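A back-of-the-envelope reading of these measurements, assuming the 15 ms per-line average applies uniformly to all 26 lines of the RDAHMM script and that shell initialization is a one-off cost:

```latex
\begin{aligned}
T_{\text{script}} &\approx 26 \times 15\ \text{ms} = 390\ \text{ms} \\
T_{\text{init}} + T_{\text{script}} &\approx 2085\ \text{ms} + 390\ \text{ms} = 2475\ \text{ms}
\end{aligned}
```

Under these assumptions, shell initialization dominates the cost of running a small script, and a single WSProxy initialization (700 to 2000 ms) is of comparable magnitude to the whole script run.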


Contribution of this Thesis

- Stream and service management: program data flows
  - Incorporate static and dynamic data sources
  - WSProxy ensures that data flows directly between components (services) without the HPSearch engine mediating. This is useful for streaming large amounts of data without overloading the controller.
- Scalable?
  - We use NB (NaradaBrokering) as our messaging substrate, which can handle a large number of clients.
  - All components (data sources, data processing and visualization applications) are clients. HPSearch manages streams, and connects and steers components.
- Fault tolerant?
  - Data source and data filter (processing application) failures are possible.
  - HPSearch can use the discovery service to invoke new services (in lieu of failed services) and reconnect components via streams to continue the data flow.
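The recovery step (discover a replacement and reconnect it via the same streams) can be sketched as follows, assuming a hypothetical discovery registry that lists candidate instances and their liveness. All service identifiers and topic names are invented for illustration. Because the replacement takes over the failed stage's input and output streams, neighboring stages do not need to change.

```javascript
// Hypothetical discovery registry: service type -> known instances and liveness.
const registry = {
  filter: [
    { id: "filter@hostA", alive: false }, // this instance has failed
    { id: "filter@hostB", alive: true },
  ],
  rdahmm: [{ id: "rdahmm@hostC", alive: true }],
};

// The flow: each stage reads one stream (topic) and writes another.
const flow = [
  { type: "filter", instance: "filter@hostA", input: "topic://gps/raw", output: "topic://gps/filtered" },
  { type: "rdahmm", instance: "rdahmm@hostC", input: "topic://gps/filtered", output: "topic://gps/states" },
];

function isAlive(instanceId) {
  return Object.values(registry).some((list) =>
    list.some((s) => s.id === instanceId && s.alive)
  );
}

// Replace each failed stage with a live instance of the same type, keeping
// its input and output streams so neighboring stages are unaffected.
function recover(stages) {
  const rewired = [];
  for (const stage of stages) {
    if (isAlive(stage.instance)) continue;
    const replacement = (registry[stage.type] || []).find((s) => s.alive);
    if (!replacement) throw new Error(`no live ${stage.type} service found`);
    rewired.push(`${stage.instance} -> ${replacement.id} (${stage.input} => ${stage.output})`);
    stage.instance = replacement.id;
  }
  return rewired;
}

const changes = recover(flow);
// changes: ["filter@hostA -> filter@hostB (topic://gps/raw => topic://gps/filtered)"]
```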


Contribution of this Thesis (contd.)

- System management: scripting admin tasks
  - Creating the network (virtual broker network) topology
  - Querying performance metrics
  - Topic / broker discovery
- Rapid deployment of applications
  - Deploy the network topology
  - Set application properties
  - Deploy the application
- In short: provide alternative programmatic (scripting) access to remote services / resources


Milestones

- Implement a WS front-end to the shell
  - Remotely submit a script for execution, possibly through a portal
- WSProxy / Handler: fault tolerance to handle situations when
  - the machine hosting the WSProxy dies
  - the broker used by the proxy dies
  - the HPSearch engine dies
- Design an application interface
  - Allow users to create applications using this interface
  - Set application properties; allow modification of application properties at runtime using scripting
- NB admin objects
  - NaradaBroker, PerfMetrics, NBDiscovery, ReplayService


Milestones (contd.)

- Design a stream negotiation module to allow WSProxy to negotiate stream characteristics
  - Select the best possible transport and other QoS elements for data transfer between two services (for a particular stream)
- Applications to demonstrate the use
  - Audio / video mixer application
  - Multiple data sources and data filtering applications joined in a chain
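A stream negotiation module might, for instance, pick the first transport in the sender's preference order that the receiver also supports and that meets the stream's requirements. This is a sketch under stated assumptions: only two QoS properties (reliability and security) are modeled, the property table is a simplification, and the `negotiate` function is hypothetical rather than part of WSProxy.

```javascript
// Simplified transport properties (illustrative only).
const TRANSPORTS = {
  TCP: { reliable: true, secure: false },
  UDP: { reliable: false, secure: false },
  SSL: { reliable: true, secure: true },
  RTP: { reliable: false, secure: false },
};

// Pick the first transport in the sender's preference order that the
// receiver also supports and that satisfies every required property.
function negotiate(senderPrefs, receiverSupports, required) {
  for (const name of senderPrefs) {
    if (!receiverSupports.includes(name)) continue;
    const props = TRANSPORTS[name];
    if (!props) continue;
    // A property set to true in `required` must also be true for the transport.
    const ok = Object.entries(required).every(([key, value]) => !value || props[key] === true);
    if (ok) return name;
  }
  return null; // no common transport meets the requirements
}

negotiate(["UDP", "TCP", "SSL"], ["TCP", "SSL"], { reliable: true });               // "TCP"
negotiate(["UDP", "TCP", "SSL"], ["TCP", "SSL"], { reliable: true, secure: true }); // "SSL"
```

For an audio / video stream that tolerates loss, `negotiate(["RTP", "UDP", "TCP"], ["TCP", "RTP"], {})` returns `"RTP"`; requiring `{ reliable: true }` instead falls back to `"TCP"`.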


Applications: Streaming Data Filtering

[Figure: a sensor source produces GPS data, which flows through a chain of (distributed) services: Data Filter (filters the input data to get only the estimate and error values) → RDAHMM (analyzes the data) → Matlab plotting script → graph. The services are managed by several HPSearch kernels (labelled Kernel - TSE).]
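The Data Filter stage can be sketched as a generator that keeps only the estimate and error values of each record. The input layout is an assumption: the slides do not give the GPS record format, so this sketch supposes whitespace-separated fields of the form `station time estimate error ...`, and the sample values are made up.

```javascript
// Hypothetical record layout: "station time estimate error [more fields...]".
// Yields "estimate error" for each well-formed line and skips the rest.
function* estimateErrorFilter(lines) {
  for (const line of lines) {
    const fields = line.trim().split(/\s+/);
    if (fields.length < 4) continue; // skip blank or malformed lines
    const [, , estimate, error] = fields;
    yield `${estimate} ${error}`;
  }
}

// Made-up sample input standing in for the sensor stream.
const sample = ["SITE1 0.0 12.5 0.3 q1", "", "SITE1 1.0 12.7 0.2 q2"];
const filtered = [...estimateErrorFilter(sample)]; // ["12.5 0.3", "12.7 0.2"]
```

Because it is a generator, the same stage works on an unbounded stream and could feed the next stage (RDAHMM) record by record.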


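Applications: Creating a Virtual Broker Network for deploying applications

```javascript
// Create a broker on school.cs.indiana.edu
b = new NaradaBroker("school.cs.indiana.edu");
b.create("");  /* OR b.create("file:///u/hgadgil/alternateConfig.conf"); */
// Connect it to the broker at 156.56.104.170, port 5045
b.connectTo("156.56.104.170", "5045", "t", "");
b.requestNodeAddress("156-56-104-170.bl-dhcp.indiana.edu:5045", "0");

// Create a second broker on trex.ucs.indiana.edu and connect it to the same broker
c = new NaradaBroker("trex.ucs.indiana.edu");
c.create("");
c.connectTo("156.56.104.170", "5045", "t", "");
c.requestNodeAddress("tcp://156-56-104-170.bl-dhcp.indiana.edu:5045", "0");
```

[Figure: the resulting virtual broker network, with brokers on school.cs.indiana.edu and trex.ucs.indiana.edu connected to 156.56.104.170; the figure also labels an HPSearch Shell at trex.cs.indiana.edu.]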


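Applications: Invoking Arbitrary Web Services

```javascript
// loanAmt is assumed to be supplied to the script as input
approved = false;
userID = "111-22-3333";
if (loanAmt < 10000)
    approved = true;   // small loans are approved directly
else {
    // Invoke the remote risk assessment Web Service through its WSDL
    wsRA = new WSDL("http://www.riskAssessor.com/services/RiskAssessor");
    risk = wsRA.invoke("assessRisk", userID, loanAmt);
    if (risk > 50)
        approved = false;
    else
        approved = true;
}

print("Loan Approved: " + approved);
```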

[Flowchart: if loanAmt < 10000, approved = true; otherwise risk = WS_riskAssessor(userID, loanAmt), and approved = false if risk > 50, else approved = true; finally, print the result.]
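The loan script depends on HPSearch's `WSDL` host object and a live service. To check the decision logic on its own, here is a plain JavaScript sketch in which the risk assessment is passed in as a function; `assessRiskStub` and its scoring rule are invented stand-ins for the remote `assessRisk` operation.

```javascript
// Stand-in for the remote RiskAssessor service. The scoring rule is invented
// purely so that the example runs without network access.
function assessRiskStub(userID, loanAmt) {
  return loanAmt > 50000 ? 80 : 20;
}

// The script's decision logic, with the risk service passed in as a function.
function approveLoan(userID, loanAmt, assessRisk) {
  if (loanAmt < 10000) return true; // small loans skip the risk check
  const risk = assessRisk(userID, loanAmt);
  return !(risk > 50);              // approve unless risk exceeds 50
}

const userID = "111-22-3333";
console.log("Loan Approved: " + approveLoan(userID, 5000, assessRiskStub));  // true
console.log("Loan Approved: " + approveLoan(userID, 20000, assessRiskStub)); // true
console.log("Loan Approved: " + approveLoan(userID, 60000, assessRiskStub)); // false
```

Injecting the service as a function keeps the control flow identical to the script while letting the real `WSDL` call be swapped in unchanged.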


Summary

This thesis addresses:
- Managing data streams (dynamic and static)
- Enabling the connection of data sources and data processing components (available as Web Services) for processing data in real time for critical infrastructure applications
- Developing a general-purpose scripting architecture (like Perl) for a multitude of tasks

The goal is to create an architecture that is:
- Pluggable / extensible
- Manageable and programmable
- Similar to the UNIX pipe-filter architecture, but implemented on a distributed scale