Grid Monitoring Futures with Globus Jennifer M. Schopf Argonne National Lab April 2003.

53
Grid Monitoring Futures Grid Monitoring Futures with Globus with Globus Jennifer M. Schopf Jennifer M. Schopf Argonne National Lab Argonne National Lab April 2003 April 2003

Transcript of Grid Monitoring Futures with Globus Jennifer M. Schopf Argonne National Lab April 2003.

Page 1: Grid Monitoring Futures with Globus Jennifer M. Schopf Argonne National Lab April 2003.

Grid Monitoring Futures Grid Monitoring Futures with Globuswith Globus

Jennifer M. SchopfJennifer M. Schopf

Argonne National LabArgonne National Lab

April 2003April 2003

Page 2: Grid Monitoring Futures with Globus Jennifer M. Schopf Argonne National Lab April 2003.

2

My DefinitionsMy Definitions

Grid:– Shared resources– Coordinated problem solving– Multiple sites (multiple institutions)

Monitoring:– Discovery

> Registry service> Contains descriptions of data that is available

– Expression of data> Access to sensors, archives, etc.

Page 3: Grid Monitoring Futures with Globus Jennifer M. Schopf Argonne National Lab April 2003.

3

What do *I* mean by What do *I* mean by GridGrid monitoring? monitoring?

Different levels of monitoring needed:– Application specific– Node level– Cluster/site Level– Grid level

Grid level monitoring concerns data – Shared between administrative domains– For use by multiple people– (think scalability)

Page 4: Grid Monitoring Futures with Globus Jennifer M. Schopf Argonne National Lab April 2003.

4

Grid Monitoring Does Not Include…Grid Monitoring Does Not Include…

All the data about every node of every site Years of utilization logs to use for planning next

hardware purchase Low-level application progress details for a single

user Application debugging data (except perhaps

notification of a failure of a heartbeat) Point-to-point sharing of all data over all sites

Page 5: Grid Monitoring Futures with Globus Jennifer M. Schopf Argonne National Lab April 2003.

5

Overview of This TalkOverview of This Talk

Evaluation of information infrastructures– Globus Toolkit MDS2, R-GMA, Hawkeye

– Insights into performance issues

– (publication at HPDC 2003) What monitoring and discovery could be Next-generation information architecture

– Open Grid Services Architecture mechanisms

– Integrated monitoring & discovery arch for GT3

Page 6: Grid Monitoring Futures with Globus Jennifer M. Schopf Argonne National Lab April 2003.

6

Performance and the GridPerformance and the Grid

It’s not enough to use the Grid, it has to perform – otherwise, why bother?

First prototypes rarely consider performance (tradeoff with dev’t time)– MDS1–centralized LDAP

– MDS2–decentralized LDAP

– MDS3–decentralized Grid service Often performance is simply not known

Page 7: Grid Monitoring Futures with Globus Jennifer M. Schopf Argonne National Lab April 2003.

7

Globus Monitoring andGlobus Monitoring andDiscovery Service (MDS2)Discovery Service (MDS2)

Part of Globus Toolkit, compatible with other elements

Used most often for resource selection– aid user/agent to identify host(s) on which to run an

application Standard mechanism for publishing and discovery Decentralized, hierarchical structure Soft-state protocols Caching Grid Security Infrastructure credentials

Page 8: Grid Monitoring Futures with Globus Jennifer M. Schopf Argonne National Lab April 2003.

8

MDS2 ArchitectureMDS2 Architecture

GI IS

Cache contains info fromA and B

GI IS requests infofrom GRIS services

Client 1 Client 2

Client 2 uses GI IS for searching collective information

GRIS register with GI IS

Resource A

GRIS

IPIP Resource B

GRIS

IPIP

IP

Client 1 searchesthe GRIS directly

GI IS

Cache contains info fromA and B

GI IS requests infofrom GRIS services

Client 1 Client 2

Client 2 uses GI IS for searching collective information

GRIS register with GI IS

Resource A

GRIS

IPIPResource A

GRIS

IPIP Resource B

GRIS

IPIP

IP

Resource B

GRIS

IPIP

IP

Client 1 searchesthe GRIS directly

Page 9: Grid Monitoring Futures with Globus Jennifer M. Schopf Argonne National Lab April 2003.

9

Relational Grid Monitoring Relational Grid Monitoring Architecture (R-GMA)Architecture (R-GMA)

Implementation of the Grid Monitoring Architecture (GMA) defined within the Global Grid Forum (GGF)

Three components

– Consumers

– Producers

– Registry

GMA as defined currently does not specify the protocols or the underlying data model to be used.

Page 10: Grid Monitoring Futures with Globus Jennifer M. Schopf Argonne National Lab April 2003.

11

R-GMAR-GMA

Monitoring used in the EU Datagrid Project

– Steve Fisher, RAL, and James Magowan, IBM-UK

Based on the relational data model Used Java Servlet technologies Focus on notification of events User can subscribe to a flow of data with specific

properties directly from a data source

Page 11: Grid Monitoring Futures with Globus Jennifer M. Schopf Argonne National Lab April 2003.

12

R-GMA ArchitectureR-GMA Architecture

Page 12: Grid Monitoring Futures with Globus Jennifer M. Schopf Argonne National Lab April 2003.

13

HawkeyeHawkeye

Developed by Condor Group Focus – automatic problem detection Underlying infrastructure builds on the Condor and

ClassAd technologies

– Condor ClassAd Language to identify resources in a pool

– ClassAd Matchmaking to execute jobs based on attribute values of resources to identify problems in a pool

Page 13: Grid Monitoring Futures with Globus Jennifer M. Schopf Argonne National Lab April 2003.

14

Hawkeye ArchitectureHawkeye Architecture

Page 14: Grid Monitoring Futures with Globus Jennifer M. Schopf Argonne National Lab April 2003.

15

Comparing Information SystemsComparing Information Systems 

 

MDS2 R-GMA Hawkeye

Info

Collector

Information Provider

Producer Module

Info

Server

GRIS Producer Servlet

Agent

Aggregate Info Server

GIIS (Archiver)

Manager

Directory Server

GIIS Registry Manager

Page 15: Grid Monitoring Futures with Globus Jennifer M. Schopf Argonne National Lab April 2003.

16

Some Architecture ConsiderationsSome Architecture Considerations

Similar functional components– Grid-wide for MDS2, R-GMA; Pool for Hawkeye– Global schema

Different use cases will lead to different strengths– GIIS for decentralized registry; no standard protocol

to distribute multiple R-GMA registries– R-GMA meant for streaming data – currently used for

NW data; Hawkeye and MDS2 for single queries Push vs Pull

– MDS2 is PULL only– R-GMA allows push and pull– Hawkeye allows triggers – push model

Page 16: Grid Monitoring Futures with Globus Jennifer M. Schopf Argonne National Lab April 2003.

17

ExperimentsExperiments

How many users can query an information server at a time?

How many users can query a directory server?

How does an information server scale with the amount of data in it?

How does an aggregator scale with the number of information servers registered to it?

Page 17: Grid Monitoring Futures with Globus Jennifer M. Schopf Argonne National Lab April 2003.

18

TestbedTestbed

Lucky cluster at Argonne– 7 nodes, each has two 1133 MHz Intel PIII CPUs (with

a 512 KB cache) and 512 MB main memory Users simulated at the UC nodes

– 20 P3 Linux nodes, mostly 1.1 GHz– R-GMA has an issue with the shared file system, so

we also simulated users on Lucky nodes All figures are 10 minute averages Queries happening with a one second wait

between each query (think synchronous send with a 1 second wait)

Page 18: Grid Monitoring Futures with Globus Jennifer M. Schopf Argonne National Lab April 2003.

19

MetricsMetrics

Throughput– Number of requests processed per second

Response time– Average amount of time (in sec) to handle a request

Load– percentage of CPU cycles spent in user mode and

system mode, recorded by Ganglia– High when running small number compute intensive aps

Load1– average number of processes in the ready queue

waiting to run, 1 minute average, from Ganglia– High when large number of aps blocking on I/O

Page 19: Grid Monitoring Futures with Globus Jennifer M. Schopf Argonne National Lab April 2003.

20

Performance of Information Performance of Information Servers vs. Number of UsersServers vs. Number of Users

0

20

40

60

80

100

120

140

160

0 100 200 300 400 500 600No. of Users

Thro

ughp

ut(q

uerie

s/se

c)

MDS GRIS (cache) MDS GRIS (nocache)Hawkeye Agent R-GMA ProducerServlet(lucky)R-GMA ProducerServlet(UC)

Page 20: Grid Monitoring Futures with Globus Jennifer M. Schopf Argonne National Lab April 2003.

21

Experiment 1 SummaryExperiment 1 Summary

Caching can significantly improve performance of the information server– Particularly desirable if one wishes the server to scale well

with an increasing number of users When setting up an information server, care should be

taken to make sure the server is on a well-connected machine– Network behavior plays a larger role than expected

– If this is not an option, thought should be given to duplicating the server if more than 200 users are expected to query it

Page 21: Grid Monitoring Futures with Globus Jennifer M. Schopf Argonne National Lab April 2003.

24

Information Service Throughput Information Service Throughput vs. Num. of Information Collectorsvs. Num. of Information Collectors

0

1

2

3

4

5

6

7

8

0 20 40 60 80 100# of information Collectors

Thro

ugh

put(

que

ries

/sec

)

MDS GRIS(cache) MDS GRIS(no cache)Hawkeye Agent R-GMA ProducerServlet

Page 22: Grid Monitoring Futures with Globus Jennifer M. Schopf Argonne National Lab April 2003.

25

Experiment 3 SummaryExperiment 3 Summary

Too many information collectors is a performance bottleneck

Caching data helps Alternatively, register to more instances of

information servers with each handling a subset of the collectors

Page 23: Grid Monitoring Futures with Globus Jennifer M. Schopf Argonne National Lab April 2003.

26

Overall ResultsOverall Results

Performance can be a matter of deployment – Effect of background load

– Effect of network bandwidth Performance can be affected by underlying

infrastructure– LDAP/Java strengths and weaknesses

Performance can be improved using standard techniques– Caching; multi-threading; etc.

Page 24: Grid Monitoring Futures with Globus Jennifer M. Schopf Argonne National Lab April 2003.

27

So what could monitoring be?So what could monitoring be?

Basic functionality– Push and pull (subscription and notification)– Aggregation and Caching

More information available More higher-level services

– Triggers like Hawkeye– Viz of archive data like Ganglia

Plug and Play– Well defined protocols, interfaces and schemas

Performance considerations– Easy searching– Keep load off of clients

Page 25: Grid Monitoring Futures with Globus Jennifer M. Schopf Argonne National Lab April 2003.

28

TopicsTopics

Evaluation of information infrastructures– Globus Toolkit MDS2, RGMA, Hawkeye

– Throughput, response time, load

– Insights into performance issues What monitoring and discovery could be Next-generation information architecture

– Open Grid Services Architecture mechanisms

– Integrated monitoring & discovery arch for GT3

Page 26: Grid Monitoring Futures with Globus Jennifer M. Schopf Argonne National Lab April 2003.

29

Open Grid Services Architecture Open Grid Services Architecture (OGSA)(OGSA)

Defines standard interfaces and behaviors for distributed system integration, especially:– Standard XML-based service information

model

– Standard interfaces for push and pull mode access to service data

> Notification and subscription

Page 27: Grid Monitoring Futures with Globus Jennifer M. Schopf Argonne National Lab April 2003.

30

Key OGSI concept - serviceDataKey OGSI concept - serviceData

Every service has it’s own service data– OGSA has common mechanism to expose a

service instance’s state data to service requestors for query, update and change notification

– Monitoring data is “baked right in”

– Service-level concept, not host-level concept

Page 28: Grid Monitoring Futures with Globus Jennifer M. Schopf Argonne National Lab April 2003.

31

serviceDataserviceData

Every Grid Service can expose internal state as serviceData elements– An XML element of arbitrary complexity

Each service has a serviceData set– The collection of serviceData Elements

(SDEs) Example: state of a host is exposed as an

SDE by GRAM. Similar to MDS2 GRIS functionality, but in

each service (rather than once per host)

Page 29: Grid Monitoring Futures with Globus Jennifer M. Schopf Argonne National Lab April 2003.

32

Example:Example:Reliable File Transfer ServiceReliable File Transfer Service

Performance

Policy

Faults

servicedataelements

Pending

FileTransfer

InternalState

GridService

Notf’nSource

Policy

interfacesQuery &/orsubscribe

to service data

FaultMonitor

Perf.Monitor

Client Client Client

Request and manage file transfer operations

Data transfer operations

Page 30: Grid Monitoring Futures with Globus Jennifer M. Schopf Argonne National Lab April 2003.

33

MDS3MDS3 Monitoring and Discovery System Monitoring and Discovery System

Consists of a various components– Core functionality

– Information providers

– Higher level services

– Clients

Page 31: Grid Monitoring Futures with Globus Jennifer M. Schopf Argonne National Lab April 2003.

34

Core FunctionalityCore Functionality

Xpath support – XPath is a language that describes a way to locate

and process items in XML docs by using an addressing syntax based on a path through the document's logical structure or hierarchy

Xindice support – native XML database Registry support

Page 32: Grid Monitoring Futures with Globus Jennifer M. Schopf Argonne National Lab April 2003.

35

Schema IssuesSchema Issues

Need to keep track of service data schema– Avoid conflicts

– Find the data easier Should really have unified naming

approach

All of the tool are schema-agnostic, but interoperability needs a well-understood common language

Page 33: Grid Monitoring Futures with Globus Jennifer M. Schopf Argonne National Lab April 2003.

36

MDS3 Information Providers in MDS3 Information Providers in June ReleaseJune Release

All the data currently in core MDS2 Full data in the GLUE schema for compute

elements (CE)– Ganglia information provider for cluster data will also

be available from Ganglia folks (with luck) Service data from RFT, RLS, GRAM GT2 to GT3 work

– GridFTP server data– Software version and path data – Documentation for translating your GT2 information

provider to a GT3 information provider

Page 34: Grid Monitoring Futures with Globus Jennifer M. Schopf Argonne National Lab April 2003.

37

MDS3 Higher Level ProductsMDS3 Higher Level Products

Higher-level services can perform actions on service data collected from other services

Part of this functionality can be provided by a set of building blocks provided– Provider interface: GRIS-style API for writing

information providers

– Service Data Aggregator: set up subscriptions to data for other services, and publish it as a single data stream

– Hierarchy Builder: allow for hierarchy of aggregators

Page 35: Grid Monitoring Futures with Globus Jennifer M. Schopf Argonne National Lab April 2003.

38

MDS3 Index ServerMDS3 Index Server

Simplest higher-level service is the caching index service

Much like the GIIS in MDS2 Will have configurablity like an GIIS

hierarchy Will also have PHP-style scripts, much as

available today

Page 36: Grid Monitoring Futures with Globus Jennifer M. Schopf Argonne National Lab April 2003.

39

Page 37: Grid Monitoring Futures with Globus Jennifer M. Schopf Argonne National Lab April 2003.

40

Clients Clients currently in GT3currently in GT3

findServiceData command line client– Same functionality of grid-info-search

C bindings – Core C bindings provide findServiceData C

function

– findServiceData command line client gives an example of using it to parse out information (in this case, registry contents)

Page 38: Grid Monitoring Futures with Globus Jennifer M. Schopf Argonne National Lab April 2003.

41

Service Data BrowserService Data Browser

GUI client to display service data from any service

Extensible for data-specific visualization A version was released with GT3 alpha 

http://www.globus.org/ogsa/releases/ alpha/docs/infosvcs/sdbquickstart.html

Page 39: Grid Monitoring Futures with Globus Jennifer M. Schopf Argonne National Lab April 2003.

42

Comparing Information SystemsComparing Information Systems 

 

MDS2 R-GMA Hawkeye MDS3

Info

Collector

Info. Provider Producer Module Service (SDEs)

Info

Server

GRIS Producer Servlet

Agent Service (service data set)

Agg. Info Server

GIIS (Archiver) Manager Index Service

Directory Server

GIIS Registry Manager Index Service

Page 40: Grid Monitoring Futures with Globus Jennifer M. Schopf Argonne National Lab April 2003.

43

Is this enough?Is this enough?

No! Many places where additional help

developing MDS3 is needed

Page 41: Grid Monitoring Futures with Globus Jennifer M. Schopf Argonne National Lab April 2003.

44

We Need More Basic InformationWe Need More Basic Information

Interfaces to other sources of data– GPT data– Other monitoring systems– Others?

Service data from other components– Every service has service data– OGSA-DAI

Will need to interface on schema

Page 42: Grid Monitoring Futures with Globus Jennifer M. Schopf Argonne National Lab April 2003.

45

We Will Need More We Will Need More GUIs and ClientsGUIs and Clients

Additional GUI visualizers may be implemented to display service data specific to a particular port type (as part of service data browser)

Additional Client interfaces possibly– Integration into current portals, brokers

Page 43: Grid Monitoring Futures with Globus Jennifer M. Schopf Argonne National Lab April 2003.

46

We Need MoreWe Need MoreHigher Level ServicesHigher Level Services

We have a couple planned– Archiving service

– Trigger template

Page 44: Grid Monitoring Futures with Globus Jennifer M. Schopf Argonne National Lab April 2003.

47

Post-3.0 release: Archiving ServicePost-3.0 release: Archiving Service

Will allow subscription to service data Logging in a flexible way Well defined interfaces for mining Open questions:

– Best way to store time-series of arbitrary XML?

– Best way to query this archive?– Link to OGSA-DAI?– Link to other archivers?

Page 45: Grid Monitoring Futures with Globus Jennifer M. Schopf Argonne National Lab April 2003.

48

Post-3.0 release: Trigger TemplatePost-3.0 release: Trigger Template

Will provide a template to allow subscription to data, reasoning about that data, and a course of action to take place

Essentially, a gateway service between OGSA Notifications and some other notification framework, with filtering of notifications

Example: Subscribe to disk space information, send mail to sys admin when it reached 90% full

Needed: trigger template and several small examples of common triggers, and documentation for how users could extend them or write new ones.

Page 46: Grid Monitoring Futures with Globus Jennifer M. Schopf Argonne National Lab April 2003.

49

Other Possible HigherOther Possible HigherLevel ServicesLevel Services

Site Validation Service Job Tracking Service Interfacing to Netlogger?

Page 47: Grid Monitoring Futures with Globus Jennifer M. Schopf Argonne National Lab April 2003.

50

We Need SecurityWe Need Security

Need I say more?

Page 48: Grid Monitoring Futures with Globus Jennifer M. Schopf Argonne National Lab April 2003.

51

SummarySummary

Current monitoring systems– Insights into performance issues

What we really want for monitoring and discovery is a combination of all the current systems

Next-generation information architecture– Open Grid Services Architecture

mechanisms– MDS3 plans

Additional work needed!

Page 49: Grid Monitoring Futures with Globus Jennifer M. Schopf Argonne National Lab April 2003.

52

ThanksThanks

Testbed/Experiment support and comments– John Mcgee, ISI; James Magowan, IBM-UK; Alain Roy

and Nick LeRoy at University of Wisconsin, Madison;Scott Gose and Charles Bacon, ANL; Steve Fisher, RAL; Brian Tierney and Dan Gunter, LBNL.

This work was supported in part by the Mathematical, Information, and Computational Sciences Division subprogram of the Office of Advanced Scientific Computing Research, U.S. Department of Energy, under contract W-31-109-Eng-38. This work also supported by DOESG SciDAC Grant, iVDGL from NSF, and others.

Page 50: Grid Monitoring Futures with Globus Jennifer M. Schopf Argonne National Lab April 2003.

53

Additional InformationAdditional Information

MDS3 technology coordinators – Ben Clifford ([email protected]) – Jennifer Schopf ([email protected])

Zhang, Freschl and Schopf, “A Performance Study of Monitoring and Information Services for Distributed Systems”, to appear in HPDC 2003– http://people.cs.uchicago.edu/~hai/hpdcv25.doc

MDS-3 information– Soon at www.globus.org/mds

Page 51: Grid Monitoring Futures with Globus Jennifer M. Schopf Argonne National Lab April 2003.

Extra SlidesExtra Slides

Page 52: Grid Monitoring Futures with Globus Jennifer M. Schopf Argonne National Lab April 2003.

56

Performance of GIS Information Performance of GIS Information Servers vs. Number of UsersServers vs. Number of Users

0

20

40

60

80

100

120

140

160

0 100 200 300 400 500 600No. of Users

Thro

ughp

ut(q

uerie

s/se

c)

MDS GRIS (cache) MDS GRIS (nocache)Hawkeye Agent R-GMA ProducerServlet(lucky)R-GMA ProducerServlet(UC)

Page 53: Grid Monitoring Futures with Globus Jennifer M. Schopf Argonne National Lab April 2003.

62

Directory Server ScalabilityDirectory Server Scalability

00.5

11.5

22.5

33.5

44.5

1 10 50 100 200 300 400 500 600No. of Users

Load

1

MDS GIIS Hawkeye ManagerR-GMA Registry(lucky) R-GMA Registry(UC)