Data Intensive Computing Information Based Computing Digital Libraries / Metacomputing Services


Page 1: Data Intensive Computing Information Based Computing Digital Libraries / Metacomputing Services

National Partnership for Advanced Computational Infrastructure

Data Intensive Computing

Information Based Computing

Digital Libraries / Metacomputing Services

Reagan W. Moore, San Diego Supercomputer Center

http://www.npaci.edu/DICE

Page 2

Distributed Archives

Application

Digital Library

Data Mining

Information Based Computing

Information Discovery

Collection Building

Page 3

Co-evolution of Technology

• Supercomputer centers and digital libraries
  • Both support large-scale processing and storage of data

• Will the supercomputer centers of the future be digital libraries?

Page 4

Researchers

Chaitanya Baru, Amarnath Gupta

Bertram Ludaescher, Richard Marciano

Yannis Papakonstantinou, Arcot Rajasekar

Wayne Schroeder, Michael Wan

Page 5

Outline

• Two views of computing
  • Execution environment - metacomputing systems
  • Data management environment - digital library

• Analysis for moving data to the process or the process to the data

• Data Management Environment
  • Information Based Computing

Page 6

Digital Libraries

Multimedia / GIS / MVD / XML / LDAP / CORBA / Z39.50

Publication / Services Environment

Presentation Interface

Object Based Information Model

Data Management for publication

Data Resources

Parallel I/O - MPI

Constructors: turning data sets into objects

Data Resources

Data Management for execution

Metacomputing Environment

Execution Environment

Page 7

Choice between Environments

• Should we provide services for manipulating information?
  • Move the process to the data

• Should we provide execution environments?
  • Move data to the process

Page 8

Data Distribution Comparison

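Data → (B) → Data Handling Platform → (b) → Supercomputer

Execution rates are r (data handling platform) and R (supercomputer), with r < R
Bandwidths linking the systems are B (data source to data handling platform) and b (data handling platform to supercomputer)
Operations per bit for analysis is O
Operations per bit for data transfer is o

Reduce the size of the data from S bytes to s bytes and analyze it

Should the data reduction be done before transmission?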

Page 9

Distributing Services

Compare times for analyzing data with size reduction from S to s (terms divided by r run on the data handling platform, terms divided by R on the supercomputer)

Reduce on the data handling platform, then transmit the reduced data:
Read Data S/B → Reduce Data OS/r → Transmit Data os/r → Network s/b → Receive Data os/R

Transmit all of the data, then reduce on the supercomputer:
Read Data S/B → Transmit Data oS/r → Network S/b → Receive Data oS/R → Reduce Data OS/R

Page 10

Comparison of Time

Processing at supercomputer

T(Super) = S/B + oS/r + S/b + oS/R + OS/R

Processing at archive

T(Archive) = S/B + OS/r + os/r + s/b + os/R

Page 11

Optimization Parameter Selection

This gives an algebraic inequality in eight independent variables: S, s, B, b, r, R, O, o.

T(Super) < T(Archive)

S/B + oS/r + S/b + oS/R + OS/R < S/B + OS/r + os/r + s/b + os/R

Which variable provides the simplest optimization criterion?
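The two completion times can be evaluated directly to see which side of the inequality wins for a given design. This is a minimal Python sketch; the function names are hypothetical, and the parameter values in the usage lines are illustrative, not taken from the slides. It assumes, as the S/B and s/b terms imply, that B is the bandwidth from the data source to the data handling platform and b is the link to the supercomputer. Units only need to be consistent.

```python
def time_reduce_at_archive(S, s, B, b, r, R, O, o):
    """Reduce S -> s on the data handling platform, then ship only s bytes."""
    read = S / B            # read the full data set from storage
    analysis = O * S / r    # analysis on the slower platform (rate r)
    send = o * s / r        # prepare the reduced data for transmission
    network = s / b         # move the reduced data over the network
    receive = o * s / R     # receive the reduced data on the supercomputer
    return read + analysis + send + network + receive


def time_move_all_data(S, s, B, b, r, R, O, o):
    """Ship all S bytes to the supercomputer, then reduce there.

    s is unused; it is accepted so both functions share one signature.
    """
    read = S / B
    send = o * S / r
    network = S / b
    receive = o * S / R
    analysis = O * S / R    # analysis on the faster supercomputer (rate R)
    return read + send + network + receive + analysis


if __name__ == "__main__":
    params = dict(S=1000, s=10, B=100, b=10, r=100, R=1000, O=50, o=1)
    print(time_reduce_at_archive(**params))
    print(time_move_all_data(**params))  # 171.0
```

With these numbers the analysis is complex enough (O = 50) that shipping everything to the supercomputer finishes first (171 versus about 511 time units).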

Page 12

Scaling Parameters

Data size reduction ratio: s/S

Execution slowdown ratio: r/R

Problem complexity: o/O

Communication/execution balance: r/(ob)

When r/(ob) = 1, the data processing rate is the same as the data transmission rate.

Optimal designs have r/(ob) = 1

Note that r/o (r operations/sec divided by o operations per bit) is the number of bits/sec the data handling platform can process for transmission.
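The four scaling ratios, and the bandwidth at which r/(ob) = 1, can be computed in a few lines. A small Python sketch; the function names and dictionary keys are hypothetical labels, and the sample values are illustrative only.

```python
def scaling_parameters(S, s, b, r, R, O, o):
    """Return the four dimensionless ratios used in the optimization."""
    return {
        "data_reduction": s / S,   # s/S
        "slowdown": r / R,         # r/R, below 1 when the platform is slower
        "complexity": o / O,       # o/O
        "balance": r / (o * b),    # r/(ob)
    }


def balanced_network_bandwidth(r, o):
    """Bandwidth b at which r/(ob) = 1.

    r/o is the rate in bits/sec at which the platform can prepare data for
    transfer, so the network is balanced when b equals r/o.
    """
    return r / o


ratios = scaling_parameters(S=1000, s=10, b=10, r=100, R=1000, O=50, o=1)
print(ratios["balance"])                 # 10.0
print(balanced_network_bandwidth(100, 1))  # 100.0
```

Here r/(ob) = 10 means the platform could feed a network ten times faster than b, so by the balance criterion the network is the bottleneck.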

Page 13

Complexity Analysis

Moving all of the data is faster, T(Super) < T(Archive), when the analysis is sufficiently complex:

O > o (1 - s/S) [1 + r/R + r/(ob)] / (1 - r/R)

Note that as the execution ratio r/R approaches 1, the required complexity becomes infinite.

Also, as the amount of data reduction goes to zero (s/S approaches 1), the required complexity goes to zero.
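The complexity criterion can be checked against a direct comparison of the two total times (the common S/B term cancels). A Python sketch with hypothetical function names and illustrative values; for these values the threshold is O = 12.21, so O = 13 should favor moving all the data and O = 12 should not.

```python
def complexity_threshold(S, s, b, r, R, o):
    """Smallest O for which moving all of the data is faster (requires r < R)."""
    if r >= R:
        raise ValueError("the criterion assumes the platform is slower: r < R")
    x = s / S
    return o * (1 - x) * (1 + r / R + r / (o * b)) / (1 - r / R)


def move_all_is_faster(S, s, B, b, r, R, O, o):
    """Direct comparison of the two total times; the S/B terms cancel."""
    move_all = o * S / r + S / b + o * S / R + O * S / R
    reduce_first = O * S / r + o * s / r + s / b + o * s / R
    return move_all < reduce_first


print(complexity_threshold(S=1000, s=10, b=10, r=100, R=1000, o=1))
print(move_all_is_faster(S=1000, s=10, B=100, b=10, r=100, R=1000, O=13, o=1))  # True
print(move_all_is_faster(S=1000, s=10, B=100, b=10, r=100, R=1000, O=12, o=1))  # False
```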

Page 14

Bandwidth Optimization

Moving all of the data is faster, T(Super) < T(Archive), when the network is sufficiently fast:

b > (r/O) (1 - s/S) / [1 - r/R - (o/O) (1 + r/R) (1 - s/S)]

Note that the denominator changes sign when

O < o (1 + r/R) (1 - s/S) / (1 - r/R)

Even with an infinitely fast network, it is better to do the processing at the archive if the complexity is too small.
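The bandwidth criterion translates directly into code. In this Python sketch (hypothetical function name, illustrative values), returning None stands for the case where the denominator is not positive, so no network is fast enough.

```python
def bandwidth_threshold(S, s, r, R, O, o):
    """Smallest network bandwidth b for which moving all of the data is faster.

    Returns None when the denominator is not positive, i.e. when the analysis
    is too simple for any network to make moving all the data worthwhile.
    """
    x = s / S
    denominator = 1 - r / R - (o / O) * (1 + r / R) * (1 - x)
    if denominator <= 0:
        return None
    return (r / O) * (1 - x) / denominator


print(bandwidth_threshold(S=1000, s=10, r=100, R=1000, O=50, o=1))
print(bandwidth_threshold(S=1000, s=10, r=100, R=1000, O=1, o=1))  # None
```

With O = 50 any b above roughly 2.25 favors moving all the data; with O = 1 the denominator is negative, matching the note that a faster network cannot help a low-complexity analysis.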

Page 15

Execution Rate Optimization

Moving all of the data is faster, T(Super) < T(Archive), when the supercomputer is sufficiently fast:

R > r [1 + (o/O) (1 - s/S)] / [1 - (o/O) (1 - s/S) (1 + r/(ob))]

Note that the denominator changes sign when O < o (1 - s/S) [1 + r/(ob)]

Even with an infinitely fast supercomputer, it is better to process at the archive if the complexity is too small.
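The same pattern gives the minimum supercomputer execution rate. A Python sketch with a hypothetical function name and illustrative values; None again marks a non-positive denominator.

```python
def rate_threshold(S, s, r, b, O, o):
    """Smallest supercomputer rate R for which moving all of the data is faster.

    Returns None when the denominator is not positive: the analysis is then too
    simple for even an infinitely fast supercomputer to win.
    """
    x = s / S
    ratio = (o / O) * (1 - x)
    denominator = 1 - ratio * (1 + r / (o * b))
    if denominator <= 0:
        return None
    return r * (1 + ratio) / denominator


print(rate_threshold(S=1000, s=10, r=100, b=10, O=50, o=1))
print(rate_threshold(S=1000, s=10, r=100, b=10, O=5, o=1))  # None
```

For O = 50 the supercomputer needs to be only about 1.3 times faster than the platform; for O = 5 no execution rate is sufficient.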

Page 16

Data Reduction Optimization

Moving all of the data is faster, T(Super) < T(Archive), when the data reduction is small enough:

s > S {1 - (O/o) (1 - r/R) / [1 + r/R + r/(ob)]}

Note that the criterion changes sign when O > o [1 + r/R + r/(ob)] / (1 - r/R)

When the complexity is sufficiently large, it is faster to process on the supercomputer even when the data can be reduced to one bit.
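The data reduction criterion gives a size threshold for s. In this Python sketch (hypothetical function name, illustrative values) a negative bound is clamped to zero, which is one way to express the sign change: every achievable s then favors moving all the data.

```python
def reduced_size_threshold(S, r, R, b, O, o):
    """Reduced size s above which moving all of the data is faster.

    A result of 0.0 means the criterion has changed sign: the analysis is
    complex enough that shipping everything wins however small s could be made.
    """
    bound = S * (1 - (O / o) * (1 - r / R) / (1 + r / R + r / (o * b)))
    return max(bound, 0.0)


print(reduced_size_threshold(S=1000, r=100, R=1000, b=10, O=5, o=1))
print(reduced_size_threshold(S=1000, r=100, R=1000, b=10, O=50, o=1))  # 0.0
```

With O = 5, reducing 1000 bytes to anything above about 595 bytes is not worth doing at the archive; with O = 50 the supercomputer wins for any s.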

Page 17

Is the Future Environment a Metacomputer or a Digital Library?

• Sufficiently high complexity
  • Move data to the processing engine
    • Digital library execution of remote services
    • Traditional supercomputer processing of applications

• Sufficiently low complexity
  • Move the process to the data source
    • Metacomputing execution of remote applications
    • Traditional digital library service

Page 18

The IBM Digital Library Architecture

Application (DL client)

Metadata in DB2 or Oracle

Videocharger DB2 ADSM Oracle

Library Server

Text and Image indices

“Federated” search

Object Server

Distributed storage resources

(SRB) (MCAT)

Page 19

Generalization of Digital Library

• Scaling transparency
  • Support for arbitrary size data sets
  • Support for arbitrary data type

• Location transparency
  • Access to remote data
  • Access to heterogeneous (non-uniform) storage systems
  • Remove restriction of local disk space size

• Name service transparency
  • Support for multiple views (naming conventions) for data

• Presentation transparency
  • Support for alternate representations of data

Page 20

Describing Information Content

Information Level | Infrastructure - Scientific Data | Infrastructure - Text
Federation | Ontology | Digital Library
Data Collection | Schema | Dublin Core
Data Set | Metadata | Provenance
Features | XML | XML
Logical type | Vector bundle | MIME type
Structure | MPI Datatype | DTD
File Format | HDF v5 | Electronic record

Page 21

State-of-the-art Information Management: Digital Library

Infrastructure Levels Language

Data Flow Systems Data Control

Format Presentation

Ontologies

Schema Definition

Schema Manipulation

Access Discovery

Metadata

Metadata Definition

Metadata Manipulation

Database Handling

Archive

Collection Layout

Storage Management

Media Storage

Page 22

High Performance Storage

• Provide access to tertiary storage to scale the size of the repository
  • Disk caches
  • Tape robots
  • Manage migration of data between disk and tape

• High Performance Storage System (HPSS) - IBM
  • Provides service classes
  • Support for parallel I/O
  • Support for terabyte-sized data sets
  • Provides a recoverable name space

Page 23

State-of-the-art Storage: HPSS

• Store teraflops computer output
  • Growth: 200 TB of data per year
  • Data access rate: 7 TB/day ≈ 80 MB/sec
  • 2-week data cache: 10 TB

• Scalable control platform
  • 8-node SP (32 processors)

• Support digital libraries
  • Support for millions of data sets
  • Integration with database metadata catalogs
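The capacity figures can be sanity-checked with a little arithmetic. This Python sketch assumes decimal units (1 TB = 10^12 bytes) and steady growth, and it reads the 2-week cache as holding newly written data; these are assumptions, not statements from the slide.

```python
SECONDS_PER_DAY = 24 * 60 * 60
DAYS_PER_YEAR = 365


def daily_volume_to_rate_mb_s(tb_per_day):
    """Convert TB/day to MB/s using decimal units."""
    return tb_per_day * 10**12 / SECONDS_PER_DAY / 10**6


def new_data_in_window_tb(tb_per_year, days):
    """Volume of newly written data over a window, assuming steady growth."""
    return tb_per_year * days / DAYS_PER_YEAR


print(round(daily_volume_to_rate_mb_s(7), 1))    # 81.0
print(round(new_data_in_window_tb(200, 14), 1))  # 7.7
```

7 TB/day works out to about 81 MB/s, consistent with the quoted 80 MB/sec, and two weeks of growth at 200 TB/year is about 7.7 TB, which fits within a 10 TB cache.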

Page 24

HPSS Archival Storage System

Diagram components:
• Silver Node: Storage / Purge, Bitfile / Migration, Nameservice / PVL, Log Daemon
• Silver Nodes: Tape / disk mover, DCE / FTP / HIS, Log Client
• High Performance Gateway Node
• High Node: Disk Mover, HiPPI driver
• Wide Node: Disk Mover, HiPPI driver
• RS6000: Tape Mover, PVR (9490)
• HiPPI Switch; Trail-Blazer3 Switch
• Disk: SSA RAID arrays (54 GB, 108 GB, and 160 GB units) and an 830 GB MaxStrat RAID
• Tape: 9490 robot with four 3490 tape drives; 9490 robots with eight and seven Magstar 3590 tape drives

Page 25

HPSS Bandwidths

• SDSC has achieved:
  • Node-HPGN: 90 MB/s
  • Texas Memory Box: 80 MB/s
  • MaxStrat disk: 60 MB/s
  • SSA RAID: 20-30 MB/s

• Striping required to achieve desired I/O rates
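A rough way to see why striping is required is to count how many devices must be striped to reach the 80 MB/sec archive access rate quoted for HPSS. This Python sketch assumes aggregate bandwidth scales linearly with stripe width and ignores controller and protocol overhead, so the counts are lower bounds; the function name is hypothetical.

```python
import math


def stripes_needed(target_mb_s, per_device_mb_s):
    """Devices to stripe across to reach a target rate (linear-scaling estimate)."""
    return math.ceil(target_mb_s / per_device_mb_s)


# Device rates are the measured figures listed on this slide.
for device, rate in [("SSA RAID (low)", 20), ("SSA RAID (high)", 30), ("MaxStrat disk", 60)]:
    print(device, stripes_needed(80, rate))
```

Under these assumptions, 80 MB/s needs three to four SSA RAID arrays or two MaxStrat disks.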

Page 26

Turning Archives into Digital Libraries

• Metadata-based access to data sets
• Support for application of methods (procedures) to data sets
• Support for information discovery
• Support for publication of data sets

• Research issue - optimization of data distribution between database and archive

Page 27

DB2/HPSS Integration

Diagram: Database Table (columns C1-C5); DB2; HPSS; DB2 disk buffer; HPSS disk cache

• Collaboration with IBM TJ Watson Research Center
  • Ming-Ling Lo, Sriram Padmanabhan, Vibby Gottemukkala

• Features:
  • Prototype, works with DB2 UDB (Version 5)
  • DB2 is able to use an HPSS file as a tablespace container
  • DB2 handles DCE authentication to HPSS
  • Regular as well as long (LOB) data can be stored in HPSS
  • Optional disk buffer between DB2 and HPSS

Page 28

Generalizing Digital Libraries

• SRB - Location transparency
  • Access to heterogeneous systems
  • Access to remote systems

• MCAT - Name service transparency
  • Extensible schema support

• MIX - Presentation transparency
  • Mediation of information with XML
  • Support for semi-structured data

• Access scaling
  • MPI-I/O access to data sets using parallel I/O

Page 29

SRB Software Architecture

Application (SRB client)

SRB APIs

SRB: User Authentication, Dataset Location, Access Control, Type, Replication, Logging

Metadata Catalog (MCAT)

Storage systems: UniTree, HPSS, DB2, Illustra, Unix

Page 30

14 Installed SRB Sites

U Michigan

U Maryland

Washington U

U Texas

U Houston

UC Davis

UC Berkeley

UC Santa Barbara

UCLA

UCSD

Caltech

Rutgers

NCSA

Montana State University

Large Archives

Page 31

SRB / MCAT Features

• Support for collection hierarchy
  • Allows grouping of heterogeneous data sets into a single logical collection
  • Hierarchical access control, with ticket mechanism

• Replication
  • Optional replication at the time of creation
  • Can choose replica on read

• Proxy operations
  • Supports proxy (remote) move and copy operations

• Monitoring capability

• Supports storing/querying of system- and user-defined “metadata” for data sets and resources

• API for ad hoc querying of metadata

• Ability to extend schemas and define new schemas

• Ability to associate data sets with multiple metadata schemas

• Ability to relate attributes across schemas

• Implemented in Oracle and DB2
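The collection hierarchy, replica selection on read, and metadata querying described above can be illustrated with a toy in-memory model. This Python sketch is not the SRB or MCAT API; every class, function, resource name, and path in it is hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Replica:
    resource: str  # hypothetical storage resource name
    path: str      # physical path on that resource


@dataclass
class DataSet:
    name: str
    replicas: list = field(default_factory=list)
    metadata: dict = field(default_factory=dict)


@dataclass
class Collection:
    name: str
    children: dict = field(default_factory=dict)  # sub-collections by name
    datasets: dict = field(default_factory=dict)  # data sets by name

    def subcollection(self, name):
        """Return the named sub-collection, creating it if needed."""
        return self.children.setdefault(name, Collection(name))

    def add(self, dataset):
        self.datasets[dataset.name] = dataset


def resolve(root, logical_path):
    """Map a logical path such as /astro/images/m31 to a DataSet."""
    *collections, name = logical_path.strip("/").split("/")
    node = root
    for part in collections:
        node = node.children[part]
    return node.datasets[name]


def choose_replica(dataset, preference):
    """Pick the replica whose resource ranks first in the preference list."""
    def rank(replica):
        if replica.resource in preference:
            return preference.index(replica.resource)
        return len(preference)
    return min(dataset.replicas, key=rank)


def query(collection, **conditions):
    """Names of data sets (searched recursively) whose metadata match."""
    hits = [d.name for d in collection.datasets.values()
            if all(d.metadata.get(k) == v for k, v in conditions.items())]
    for child in collection.children.values():
        hits.extend(query(child, **conditions))
    return hits


root = Collection("/")
images = root.subcollection("astro").subcollection("images")
images.add(DataSet("m31",
                   replicas=[Replica("unix-disk", "/data/m31"),
                             Replica("hpss-archive", "/hpss/m31")],
                   metadata={"band": "K"}))
print(choose_replica(resolve(root, "/astro/images/m31"), ["hpss-archive"]).resource)  # hpss-archive
print(query(root, band="K"))  # ['m31']
```

The logical path is independent of where the bytes live, which is the point of location transparency: the replica is chosen only at read time.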

Page 32

MCAT Schema Integration

• Publish schema for each collection
  • Clusters of attributes form a table
  • Tables implement the schema

• Use tokens to define semantic meaning
  • Associate a token with each attribute

• Use a DAG to automate queries
  • Specify directed linkage between clusters of attributes
  • Tokens - Clusters - Attributes
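One way to picture "use a DAG to automate queries" is a breadth-first search over the directed links between attribute clusters, which yields the tables a query must join. This Python sketch uses a hypothetical schema; the token, attribute, and cluster names are invented for illustration and are not MCAT's.

```python
from collections import deque

# Hypothetical schema: token -> attribute, attribute -> cluster (table),
# and directed linkage between clusters.
TOKENS = {"COLLECTION": "coll_name", "DATA_NAME": "data_name",
          "STORED_ON": "resource", "BAND": "band"}
ATTRIBUTE_CLUSTER = {"coll_name": "collection", "data_name": "dataset",
                     "resource": "replica", "band": "sky_survey"}
LINKS = {"collection": ["dataset"], "dataset": ["replica", "sky_survey"]}


def join_path(links, start, goal):
    """Breadth-first search for the shortest directed path of clusters."""
    previous = {start: None}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if node == goal:
            path = []
            while node is not None:
                path.append(node)
                node = previous[node]
            return path[::-1]
        for nxt in links.get(node, []):
            if nxt not in previous:
                previous[nxt] = node
                queue.append(nxt)
    return None


def clusters_to_join(token_a, token_b):
    """Tables that a query relating two tokens would need to join."""
    a = ATTRIBUTE_CLUSTER[TOKENS[token_a]]
    b = ATTRIBUTE_CLUSTER[TOKENS[token_b]]
    return join_path(LINKS, a, b) or join_path(LINKS, b, a)


print(clusters_to_join("BAND", "COLLECTION"))  # ['collection', 'dataset', 'sky_survey']
```

This simplified search only follows a single directed chain; tokens in sibling clusters (for example STORED_ON and BAND) return None here, whereas a fuller planner would join them through a common ancestor.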

Page 33

Publishing a New Schema

Page 34

Adding Attributes to the New Schema

Page 35

Displaying Attributes from Selected Schemas

Page 36

Security

• Integration of SDSC Encryption Authentication system (SEA) with Globus GSI
  • Kerberos within a security domain
  • Globus for inter-realm authentication

• Access control lists per data set
• Audit trails of usage

• Need support for third-party authentication
  • User A accesses data under the control of digital library B when the data is stored at site C
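The third-party case can be sketched as a delegation pattern: library B checks its per-data-set access control list and records an audit entry, then reads from site C under its own identity on the user's behalf. This Python sketch uses plain names where real systems (SEA, GSI, Kerberos) use cryptographic credentials; all classes and names are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class StorageSite:
    """Site C: stores objects and trusts only registered digital libraries."""
    name: str
    trusted_libraries: set
    objects: dict

    def read(self, library_name, object_id):
        if library_name not in self.trusted_libraries:
            raise PermissionError(f"{self.name} does not trust {library_name}")
        return self.objects[object_id]


@dataclass
class DigitalLibrary:
    """Library B: owns the access control lists and the audit trail."""
    name: str
    acl: dict       # data set -> set of permitted users
    location: dict  # data set -> (StorageSite, object id)
    audit: list = field(default_factory=list)

    def get(self, user, dataset):
        allowed = user in self.acl.get(dataset, set())
        self.audit.append((user, dataset, "granted" if allowed else "denied"))
        if not allowed:
            raise PermissionError(f"{user} may not read {dataset}")
        site, object_id = self.location[dataset]
        # B reads from C under its own identity on the user's behalf.
        return site.read(self.name, object_id)


site_c = StorageSite("C", {"B"}, {"obj-1": b"survey data"})
library_b = DigitalLibrary("B", {"survey": {"A"}}, {"survey": (site_c, "obj-1")})
print(library_b.get("A", "survey"))  # b'survey data'
```

User A never needs an account at site C; C only has to trust B, and B's audit trail records every request, granted or denied.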

Page 37

MIX: Mediation of Information using XML

Components: BBQ Interface (two instances), ActiveView 1, ActiveView 2, Mediator, Wrappers, Local Data Repository

Sources: SQL Database, Spreadsheet, HTML files

• XMAS queries go to the mediator, which passes XMAS query “fragments” to the wrappers and receives XML data
• Each wrapper converts the XMAS query to the local query language, and data in native format to XML
• Support for “active” views
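The wrapper/mediator split can be sketched with the Python standard library: each wrapper exports its source as XML, and the mediator runs one query against every wrapped source and merges the results. XMAS is not implemented here; ElementTree's limited XPath predicate syntax stands in for it, and all function and element names are hypothetical.

```python
import csv
import io
import sqlite3
import xml.etree.ElementTree as ET


def sqlite_wrapper(conn, table, record_tag):
    """Wrap an SQL table: export each row as an XML element."""
    # The table name is trusted here; never interpolate user input into SQL.
    cursor = conn.execute(f"SELECT * FROM {table}")
    columns = [d[0] for d in cursor.description]
    root = ET.Element("source")
    for row in cursor:
        record = ET.SubElement(root, record_tag)
        for column, value in zip(columns, row):
            ET.SubElement(record, column).text = str(value)
    return root


def csv_wrapper(text, record_tag):
    """Wrap a spreadsheet exported as CSV text."""
    root = ET.Element("source")
    for row in csv.DictReader(io.StringIO(text)):
        record = ET.SubElement(root, record_tag)
        for column, value in row.items():
            ET.SubElement(record, column).text = value
    return root


def mediate(sources, path):
    """Evaluate the same path query on every wrapped source and merge results."""
    view = ET.Element("view")
    for source in sources:
        view.extend(source.findall(path))
    return view


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE datasets (name TEXT, band TEXT)")
conn.executemany("INSERT INTO datasets VALUES (?, ?)", [("a", "K"), ("b", "J")])
sources = [sqlite_wrapper(conn, "datasets", "dataset"),
           csv_wrapper("name,band\nc,K\n", "dataset")]
view = mediate(sources, "dataset[band='K']")
print([record.findtext("name") for record in view])  # ['a', 'c']
```

The mediator never sees SQL or CSV; it only handles XML, which is what lets new source types be added by writing another wrapper.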

Page 38

Integration of Digital Library with Metacomputing Systems

• NTON OC-192 network (LLNL - Caltech - SDSC)
• HPSS archive
• Globus metacomputing system
• SRB data handling system
• MCAT extensible metadata
• MIX semi-structured data mediation using XML
• ICE collaboration environment
• Feature extraction

Page 39

INFORMATION SERVICES

Data Intensive and High-Performance Distributed Computing

Local Resource Management

Data Repositories

Resources Layer

Fault Detection

Resource Management

Generic Services Layer

Domain Specific Services Layer

Application Toolkits

Network Caching

Metadata

Communication Libs. Grid-enabled Libs Visualization

Resource Discovery Resource Brokering

End-to-End QoS

Remote Data Access

Interdomain Security

Scheduling

Page 40

Research Activities

• Support for remote execution of data manipulation procedures
  • Globus - SRB integration

• Automated feature extraction
  • XML-based tagging of features
  • XML query language for storing attributes into the Intelligent Archive

• Integration with RIO - parallel I/O transport

Page 41

Views of Software Infrastructure

• Software infrastructure supports user applications

• Reason for existence of software is to provide explicit capabilities required by applications

• What is the user perspective for building new software systems?

• Is the integration of digital library and metacomputing systems the final version?

Page 42

Software Integration Projects

• NSF
  • Computational Grid - middleware using distributed state information to support metacomputing services

• DOE
  • Data Visualization Corridor - collaboratively visualize multi-terabyte-sized data sets

• NASA
  • Information Power Grid - integrate data repositories with applications and visualization systems

• DARPA
  • Quorum - provide quality-of-service guarantees

Page 43

User Requirements - Five Software Environments

• Code Development
  • Resources support

• Run-time
  • Parallel tools and libraries

• Distributed Run-Time
  • Metacomputing environment

• Interaction Environments
  • Collaboration, presentation

• Publication / Discovery / Retrieval
  • Data intensive computing environment

Page 44

Metacomputing Environment Data Flow Perspective

Archival Storage System

Remote Data Manipulation

Data Handling System

Data Staging System

Data Caching System

Distributed Execution Environment

Object Oriented Interface

Application

Page 45

Publication Environment Data Flow Perspective

Archival Storage System

Remote Data Manipulation

Data Handling System

Collection Management Software

Digital Library Services

Data Set Constructor

Run-time Access

Application

Page 46

Run-time Environment Data Flow Perspective

Archival Storage System

Data Handling System

Data Caching System

Library Interoperation

Data Structures Library

Memory Tiling

Parallel I/O Library

Application

Page 47

Interaction Environment Data Flow Perspective

Archival Storage System

Data Manipulation System

Data Caching System

Data Formatting System

Rendering System

Visualization Environment

Collaboration Environment

Application

Page 48

Taxonomy of User Requirements

Environments: Code Development | Run-time | Distributed Run-Time / Metacomputing | Collaboratories / Interaction / Presentation | Publication / Discovery / Retrieval

Environment capabilities:

Data manipulation: data caching, data subsetting, data analysis

Data discovery API: common directory, information discovery

Data naming / aggregation: location transparency, file system federation, collection federation

Data access: small file manipulation, parallel I/O, remote I/O, remote data access, distributed data access

Data organization: data structures, data format, schemas

Archives: version management, high-performance archive, large data storage, persistent archive

Page 49

Comparison of Environments

Environments: Code Development | Run-time | Distributed Run-Time / Metacomputing | Collaboratories / Interaction / Presentation | Publication / Discovery / Retrieval

Environment capabilities:

Product sharing: application publication, persistent objects, visualization modules, data collection building

Reusable software: math / thread libraries, parallel thread libraries, distributed thread libraries

Application building: debuggers, compilers; task graph; data flow systems

Interoperability: shared data, language interoperation, view control, schema interoperability

Performance: performance, resource utilization

Usability: GUI

Look and feel: desktop environment, distributed desktop, presentation architecture, digital library workspace

Reservation: resource reservation, instrument reservation, disk space reservation

Queuing: local queuing, global queuing

Scheduling: job mix scheduling, distributed scheduling

Page 50

Comparison of Environments

Environments: Code Development | Run-time | Distributed Run-Time / Metacomputing | Collaboratories / Interaction / Presentation | Publication / Discovery / Retrieval

Environment capabilities:

Communication software: heterogeneous network

Dynamic control: real-time steering, teleinstrumentation

Execution: job execution, load balancing, distributed execution, collaboration service, remote service

Operating system: clusters, distributed clusters

Authorization: access control, global access, access control

Authentication: authentication for CPU, single sign-on, authentication for data

Page 51

PACI Environments

Environments: Code Development | Run-time | Distributed Run-Time / Metacomputing | Collaboratories / Interaction / Presentation | Publication / Discovery / Retrieval

Environment capabilities (with example systems):

Data manipulation: data caching - ADR; data subsetting - SRB, Vis5D; data analysis tools - Rocke

Data discovery API: common directory structure; object ID - Legion, pathname - Globus; information discovery - MCAT, Infobus

Data naming / aggregation: location transparency - DFS; file system federation - Legion; collection federation - MCAT

Data access: small file manipulation - Unix; parallel I/O - MPI, PANDA; remote I/O; remote data access - CORBA/SRB; distributed data access - SRB, Infobus

Data organization: data structures - KeLP, SDDA, CARTE; data format - HDF v5; schemas - MCAT

Archives: version management - CVS, RCS; GASS; large data storage - UDB; persistent archive, HPSS

Page 52

PACI Environments

Environments: Code Development | Run-time | Distributed Run-Time / Metacomputing | Collaboratories / Interaction / Presentation | Publication / Discovery / Retrieval

Environment capabilities (with example systems):

Product sharing: application publication - LDAP; persistent objects - Legion; visualization modules - AVS; data collection building - MCAT

Reusable software: NetSolve, Symera DCOM support

Application building: debuggers, compilers - Titanium, P compiler; task graph, AppLeS, TreadMarks - distributed shared memory; data flow systems - AVS

Interoperability: shared data - HPSS; language interoperation - Metachaos; view control - ICE; schema interoperability - MCAT

Performance: performance - Pablo, Paradyn; resource utilization - NWS

Usability: documentation; GUI - Pancake

Look and feel: desktop environment - Unix; distributed desktop - ?; presentation architecture - ICE, CORBA; digital library workspace - ADL, ELIB, UMDL, MSU, MSD, ESA

Page 53

PACI Environments

Environments: Code Development | Run-time | Distributed Run-Time / Metacomputing | Collaboratories / Interaction / Presentation | Publication / Discovery / Retrieval

Environment capabilities (with example systems):

Reservation: job performance monitor; resource reservation - Maui scheduler; instrument reservation - ?; disk space reservation - ?

Queuing: local queuing - LSF, LoadLeveler; generic batch interface - Globus

Scheduling: job mix scheduling - Maui; distributed scheduling - Vernon

Communication software: heterogeneous network - Nexus

Dynamic control: real-time steering - ?; teleinstrumentation - ICE

Execution: job execution - Unix; load balancing - KeLP; distributed execution - Globus, Legion, Condor, HPVM, High-performance Java; collaboration service - ICE, Java, Habanero, Tango, Virtual Director; remote service - ELIB, ADL

Operating system: clusters - NOW; distributed clusters - Millennium

Authorization: access control - Unix; global access - Globus/LDAP; access control - MCAT/SEA

Page 54

Future Systems

• Automation of
  • Information discovery
  • Application execution
  • Publication of results

• Integration of
  • Code development
  • Run-time support
  • Distributed computing
  • Collaborative analysis
  • Information publication

Page 55

Further Information

http://www.npaci.edu/DICE