Autonomic Grid Computing: Autonomics for eScience
Manish Parashar, Rutgers University, USA & Omer Rana, Cardiff University, UK
Tutorial at eScience '07, Bangalore, India, 12/10/07
Pervasive Computation Ecosystems & Next Generation eScience/Engineering
• Pervasive Computational Ecosystems (Pervasive Grids)
– address applications in an end-to-end manner
– on-demand/ad-hoc access to geographically distributed computing, communication and information resources
• computers, networks, data archives, instruments, observatories, experiments, sensors/actuators, ambient information, etc.
• Knowledge-based, information/data-driven, context/content-aware, computationally intensive, pervasive applications
– symbiotically and opportunistically combine services/computations, real-time information, experiments, and observations to manage, control, predict, adapt, optimize, …
• crisis management; monitoring and predicting natural phenomena; monitoring and managing engineered systems; optimizing business processes
Autonomic Grid Ecosystems and Knowledge-based, Dynamic, Information-driven Investigation
Information-driven Management of Subsurface Geosystems: The Instrumented Oil Field (with UT-CSM, UT-IG, OSU, UMD, ANL)
• A data-driven + model-driven closed loop:
– detect and track changes in data during production
– invert data for reservoir properties
– detect and track reservoir changes
– assimilate data and reservoir properties into the evolving reservoir model
– use simulation and optimization to guide future production
Management of Subsurface/Near-Surface Geosystems
• Landfills, oilfields, underground pollution, and undersea reservoirs, coupled through models, simulation, data and control
• Vision: diverse geosystems – similar solutions
Management of the Ruby Gulch Waste Repository (with UT-CSM, INL, OU)
• Ruby Gulch Waste Repository / Gilt Edge Mine, South Dakota
– ~20 million cubic yards of waste rock
– AMD (acid mine drainage) impacting drinking water supplies
• Monitoring system
– multi-electrode resistivity system (523 electrodes); one data point every 2.4 seconds from any 4 electrodes
– temperature & moisture sensors in four wells
– flowmeter at the bottom of the dump
– weather station
– manually sampled chemical/air ports in wells
– approx. 40K measurements/day
“Towards Dynamic Data-Driven Management of the Ruby Gulch Waste Repository,” M. Parashar, et al., DDDAS Workshop, ICCS 2006, Reading, UK, LNCS, Springer Verlag, Vol. 3993, pp. 384–392, May 2006.
Adaptive Fusion of Stochastic Information for Imaging Fractured Vadose Zones
• Near-real-time monitoring, characterization and prediction of flow through fractured rocks
• Closed loop: inverse modeling of system responses yields parameters, boundary & initial conditions; forward modeling yields a prediction; the prediction is compared with observations; a good match feeds the application, a bad match triggers redesign of the sensing network, and the cycle repeats
Data-Driven Forest Fire Simulation (U of AZ)
• Predict the behavior and spread of wildfires (intensity, propagation speed and direction, modes of spread)
– based on both dynamic and static environmental and vegetation conditions
– factors include fuel characteristics and configurations, chemical reactions, balances between different modes of heat transfer, topography, and fire/atmosphere interactions
“Self-Optimization of Large Scale Wild Fire Simulations,” J. Yang, H. Chen, S. Hariri and M. Parashar, Proceedings of the 5th International Conference on Computational Science (ICCS 2005), Atlanta, GA, USA, Springer-Verlag, May 2005.
The Instrumented Ocean Ecosystem
• A loop linking observations, scientific understanding, and forecasts
NJ Turnpike Delaware River Bridge – Focus of Fatigue Study
System for Laser Treatment of Cancer – UT Austin (Source: L. Demkowicz, UT Austin)
Synthetic Environment for Continuous Experimentation – Purdue University (Source: A. Chaturvedi, Purdue Univ.)
Structural Health Monitoring and Critical Event Prediction – Stanford University (Source: C. Farhat, SU)
• On-board sensing & processing and on-line prognosis & processing, with damage detection and material characterization
• High-performance off-line prognosis by modeling and simulation, fed by data for model update
• Outputs: pilot display of crisis; on-demand maintenance with detailed report
Integrated Wireless Phone Based Emergency Response System – Notre Dame (Source: G. Madey, ND)
• Detect abnormal patterns in mobile call activity and locations
• Initiate dynamic data-driven simulations to predict the evolution of the abnormality
• Initiate higher-resolution data collection in localities of interest
• Interface with emergency response Decision Support Systems
Auto-Steered Information-Decision Processes for Electrical Systems Asset Management – Iowa State Univ. (Source: J. McCalley, Iowa State)
• Develop a hardware-software prototype for auto-steering the information-decision cycles inherent to managing operations, maintenance, & planning of high-voltage electric power transmission systems
• The architecture spans wireless sensor network transceivers (2.4 or 5 GHz ISM band) and local communication boxes, base stations with port servers and network switches, a data processor, wide-area links (satellite, Ethernet radio, optical fiber, power line, Internet), secure access, and a remote control headquarters
Many Application Areas …
• Hazard prevention, mitigation and response
– earthquakes, hurricanes, tornados, wild fires, floods, landslides, tsunamis, terrorist attacks
• Critical infrastructure systems
– condition monitoring and prediction of future capability
• Transportation of humans and goods
– safe, speedy, and cost-effective transportation networks and vehicles (air, ground, space)
• Energy and environment
– safe and efficient power grids; safe and efficient operation of regional collections of buildings
• Health
– reliable and cost-effective health care systems with improved outcomes
• Enterprise-wide decision making
– coordination of dynamic distributed decisions for supply chains under uncertainty
• Next-generation communication systems
– reliable wireless networks for homes and businesses
• …
Report of the Workshop on Dynamic Data Driven Applications Systems, F. Darema et al., March 2006, www.dddas.org (Source: M. Rotea, NSF)
Some Application Characteristics (Vectors)
• Close the loop!
– acquire, assimilate, analyze/predict, control, …
• Manage uncertainty
– incomplete knowledge, system/application dynamics, …
• Opportunistic
– applications do not depend on the availability of the information, but can opportunistically use information if it is available
• Compositional
– multiple (independent) application entities cooperate with each other to collectively provide the end-to-end capabilities
• Control
– provide sensor steering/autonomic control capabilities using actuation devices; a minimal loop sketch follows below
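These vectors share a common runtime shape: a continuous acquire/assimilate/predict/control loop. A minimal, illustrative Python sketch of that loop; every function here is an invented placeholder, not any particular system's API:

```python
# Minimal close-the-loop sketch: acquire, assimilate, analyze/predict,
# control. All functions are illustrative placeholders.
import random

def acquire():                    # read sensors/instruments
    return {"reading": random.gauss(0.0, 1.0)}

def assimilate(model, obs):       # fold the observation into the model state
    model["state"] = 0.9 * model["state"] + 0.1 * obs["reading"]
    return model

def predict(model):               # analyze/predict from the updated model
    return model["state"] * 1.1

def control(forecast):            # steer sensors/actuators when needed
    if abs(forecast) > 1.0:
        print(f"actuate: correct drift, forecast={forecast:.2f}")

model = {"state": 0.0}
for _ in range(5):                # in practice the loop runs continuously
    model = assimilate(model, acquire())
    control(predict(model))
```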
The Research Challenge: Addressing Uncertainty!
• System uncertainty
– very large scales
– ad hoc structures/behaviors
– dynamics
– heterogeneity
• capability, connectivity, reliability, guarantees, QoS
– lack of guarantees
– lack of common/complete knowledge
• Information uncertainty
– availability, resolution, quality of information
– device capability, operation, calibration
– trust in data, data models
– semantics
• Application uncertainty
– dynamic behaviors
• space-time adaptivity
– dynamic and complex couplings
– dynamic and complex (opportunistic) interactions
– software/systems engineering issues
• emergent rather than by design
• Must be addressed at multiple levels
– algorithms
• asynchronous/chaotic, failure-tolerant, …
• uncertainty estimation/characterization (probabilistic collocation)
– programming systems
• adaptive, application/system-aware, proactive, …
– infrastructure
• decoupled, self-managing, resilient, …
Pervasive Grid Computing – Research Issues, Opportunities
• Programming systems/models for data integration and runtime self-management
– components and compositions capable of adapting behavior, interactions and information
– correctness, consistency, performance, quality-of-service constraints
• Content-based asynchronous and decentralized discovery and access services
– semantics, metadata definition, indexing, querying, notification
• Data management mechanisms for data acquisition and transport with real-time, space and data-quality constraints
– high data volumes/rates, heterogeneous data qualities, sources
– in-network aggregation, integration, assimilation, caching
• Runtime execution services that guarantee correct, reliable execution with predictable and controllable response time
– data assimilation, injection, adaptation
• Security, trust, access control, data provenance, audit trails, accounting
A layered stack (bottom to top): organizations (physical resources); Grid middleware infrastructure; virtual organizations and virtual machines; virtualization (create and manage VOs and VMs); abstraction (support abstract behaviors/assumptions); programming system (programming model + abstract machine); applications.
Programming Pervasive Grid Systems
– Programming system
• programming model, languages/abstractions – syntax + semantics
– entities, operations, rules of composition, models of coordination/communication
• abstract machine – execution context and assumptions
• infrastructure, middleware and runtime
– Hide vs. expose uncertainty?
• virtual homogeneity, ease of programming
• robustness
• cross-layer adaptation and optimization
– the inverted stack …
“Conceptual and Implementation Models for the Grid,” M. Parashar and J.C. Browne, Proceedings of the IEEE, Special Issue on Grid Computing, IEEE Press, Vol. 93, No. 3, March 2005.
Project AutoMate: Autonomics for Applications
• Conceptual models and implementation architectures
– programming systems based on popular programming models
– asynchronous computational engines
– content-based coordination and messaging middleware
– amorphous and emergent overlays
• WWW: http://automate.rutgers.edu
• Architecture layers:
– Autonomic Grid Applications
– Programming System: autonomic components, dynamic composition, opportunistic interactions, collaborative monitoring/control (Accord programming framework)
– Decentralized Coordination Engine: agent framework, decentralized reactive tuple space (Rudder coordination middleware)
– Semantic Middleware Services: content-based discovery, associative messaging (Meteor/Squid content-based middleware; ontology, taxonomy)
– Content Overlay: content-based routing engine, self-organizing overlay
– crosscutting Sesame/DAIS protection service
“AutoMate: Enabling Autonomic Grid Applications,” M. Parashar et al., Cluster Computing: The Journal of Networks, Software Tools, and Applications, Vol. 9, No. 2, pp. 161–174, 2006.
Efforts @ TASSL
• Programming abstractions and systems (GridMap/iZone, Accord, Seine, Salsa)
– discovering, querying, interacting with, and controlling instrumented physical systems
– adapting behaviors and interactions/compositions of application components/services
• Data quality and uncertainty management
– online characterization of information quality and estimation of uncertainty, so that it can effectively drive decision making
• Autonomic infrastructure (CometG, Meteor, Squid)
– asynchronous computation, communication and coordination
– data acquisition, wide-area streaming and in-transit manipulation
– content-based data/service discovery and composition
– decoupled associative messaging
– application-driven dynamic management of sensor/actuator systems
• overlay management, iso-clustering, computation/communication/power tradeoffs
Accord: Rule-Based Programming System
• Accord is a programming system that supports the development of autonomic applications.
– Enables definition of autonomic components with programmable behaviors and interactions.
– Enables runtime composition and autonomic management of these components using dynamically defined rules:
• dynamic specification of adaptation behaviors using rules
• enforcement of adaptation behaviors by invoking sensors and actuators
• runtime conflict detection and resolution
• 3 prototypes: object-based, component-based (CCA), and service-based (Web Services); a minimal rule-evaluation sketch follows the citation below.
“Accord: A Programming Framework for Autonomic Applications,” H. Liu and M. Parashar, IEEE Transactions on Systems, Man and Cybernetics, Special Issue on Engineering Autonomic Systems, IEEE Press, Vol. 36, No. 3, pp. 341–352, 2006.
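The paper describes Accord rules as if-condition-then-action expressions evaluated by an element manager over a component's sensors and actuators, with conflict detection among fired rules. A minimal sketch of that pattern; the class and method names (`Rule`, `ElementManager`, `inject_rule`) are illustrative, not Accord's actual API:

```python
# Illustrative sketch of Accord-style rule evaluation; names are
# hypothetical, not the actual Accord API.
from dataclasses import dataclass
from typing import Callable, Dict, List

@dataclass
class Rule:
    name: str
    priority: int                                   # for conflict resolution
    condition: Callable[[Dict[str, float]], bool]   # evaluated over sensors
    action: Callable[[], None]                      # invokes actuators

class ElementManager:
    """Monitors sensors and enforces dynamically defined adaptation rules."""
    def __init__(self) -> None:
        self.rules: List[Rule] = []

    def inject_rule(self, rule: Rule) -> None:      # rules arrive at runtime
        self.rules.append(rule)

    def step(self, sensors: Dict[str, float]) -> None:
        fired = [r for r in self.rules if r.condition(sensors)]
        if fired:  # naive conflict resolution: highest priority wins
            max(fired, key=lambda r: r.priority).action()

# Example: shrink an element's buffer when memory pressure is high.
mgr = ElementManager()
mgr.inject_rule(Rule(
    name="reduce-buffer-under-pressure", priority=10,
    condition=lambda s: s["mem_used_pct"] > 90,
    action=lambda: print("actuator: set_buffer_size(small)"),
))
mgr.step({"mem_used_pct": 95.0})
```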
Autonomic Element/Behaviors in Accord
• An autonomic element pairs a computational element with an element manager and exposes three ports:
– functional port (the element's computational interface)
– control port (sensors and actuators)
– operational port (rule injection and management)
• The element manager combines internal state, contextual state, and rules to drive event generation, actuator invocation, and invocation of other interfaces
• For composition, a composition manager takes the application workflow, application requirements and application strategies, and derives per-element interaction rules and behavior rules
LLC-Based Self-Management within Accord
• Element/service managers are augmented with LLC (limited look-ahead control) controllers
– the manager monitors the state/execution context of elements
– it enforces adaptation actions determined by the controller
– controller advice augments human-defined rules
• In a self-managing element, the element manager passes internal state and contextual state to the LLC controller, which evaluates a model of the computational element against an optimization function and returns advice
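To make the controller's role concrete, here is a rough, illustrative sketch of limited look-ahead control: enumerate action sequences over a short horizon against a model of the element, and return the first action of the cheapest sequence as advice. The one-dimensional state model, cost function, and action set are assumptions, not Accord's implementation:

```python
# Minimal limited look-ahead control (LLC) sketch; model, cost and action
# set are illustrative assumptions.
from itertools import product

def llc_advise(state, actions=(-1.0, 0.0, 1.0),
               model=lambda s, a: 0.9 * s + a,       # assumed element model
               cost=lambda s, a: s * s + 0.1 * a * a,
               horizon=3):
    """Search all action sequences over the horizon; return the first
    action of the cheapest sequence (receding-horizon control)."""
    best_seq, best_cost = None, float("inf")
    for seq in product(actions, repeat=horizon):
        s, total = state, 0.0
        for a in seq:
            total += cost(s, a)
            s = model(s, a)
        if total < best_cost:
            best_seq, best_cost = seq, total
    return best_seq[0]

print(llc_advise(5.0))   # the advice an element manager would enforce
```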
For Example: Optimizing Via Component Replacement
For Example: Optimizing Via Component Adaptation
For Example: Healing Via Component Replacement
Decentralized (Decoupled/Asynchronous) Content-based Middleware Services
Pervasive Grid Environment
Self-Managing Overlay (SquidTON)
The Instrumented Oil Field of the Future (UT-CSM, UT-IG, RU, OSU, UMD, ANL)
– Detect and track changes in data during production
– Invert data for reservoir properties
– Detect and track reservoir changes
– Assimilate data and reservoir properties into the evolving reservoir model
– Use simulation and optimization to guide future production and future data acquisition strategy
• Production of oil and gas can take advantage of installed sensors that monitor the reservoir's state as fluids are extracted
• Knowledge of the reservoir's state during production can result in better engineering decisions
– economic evaluation; physical characteristics (bypassed oil, high-pressure zones); production techniques for safe operating conditions in complex and difficult areas
“Application of Grid-Enabled Technologies for Solving Optimization Problems in Data-Driven Reservoir Studies,” M. Parashar, H. Klie, U. Catalyurek, T. Kurc, V. Matossian, J. Saltz and M. Wheeler, Future Generation Computer Systems (FGCS): The International Journal of Grid Computing: Theory, Methods and Applications, Elsevier Science Publishers, Vol. 21, Issue 1, pp. 19–26, 2005.
Effective Oil Reservoir Management: Well Placement/Configuration
• Why is it important?
– better utilization/cost-effectiveness of existing reservoirs
– minimizing adverse effects on the environment
• Better management leaves less bypassed oil; bad management leaves much bypassed oil
Dynamic Data-Driven Reservoir Management: “Closing the Loop” Using Optimization
• START: based on the present subsurface knowledge and numerical model, optimize economic revenue, environmental hazard, …
• Make a management decision and plan optimal data acquisition
• Acquire remote sensing data; improve knowledge of the subsurface to reduce uncertainty
• Update knowledge of the model, improve the numerical model, and repeat
• Supporting components: a dynamic decision system; dynamic data-driven assimilation (data assimilation, subsurface characterization, experimental design); autonomic Grid middleware; Grid data management and processing middleware
An Autonomic Well Placement/Configuration Workflow
• An optimization service (SPSA, VFSA, or exhaustive search) generates guesses and sends them to the IPARS factory
• If a guess is not in the MySQL database, the factory instantiates IPARS with the guess as parameter and starts parallel IPARS instances, which connect to DISCOVER
• If a guess is in the database, the response is sent to the clients and a new guess is requested from the optimizer
• DISCOVER notifies clients, and clients interact with IPARS
• The workflow runs on the AutoMate programming system/Grid middleware, drawing on history/archived data, sensor/context data, and external feeds (oil prices, weather, etc.); a sketch of the guess-caching loop follows below
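A schematic of that guess-caching loop: the optimizer never re-runs the simulator for a guess it has already evaluated. The service names follow the slide, but the in-memory dict standing in for the MySQL database and the `next_guess`/`ipars_simulate` callables are invented stand-ins, not real AutoMate APIs:

```python
# Sketch of the well-placement loop: cache simulator evaluations so the
# optimizer never re-runs IPARS for a guess it has already seen.
import random

def next_guess(best):  # e.g. a VFSA/SPSA step; here: random perturbation
    y, z = best
    return (y + random.randint(-2, 2), z + random.randint(-2, 2))

def ipars_simulate(guess):  # stand-in for a parallel IPARS instance
    y, z = guess
    return -((y - 5) ** 2 + (z - 7) ** 2)   # toy objective: revenue

cache = {}                        # plays the role of the MySQL database
best, best_val = (0, 0), float("-inf")
for _ in range(100):
    guess = next_guess(best)
    if guess not in cache:        # "if guess not in DB: instantiate IPARS"
        cache[guess] = ipars_simulate(guess)
    val = cache[guess]            # "if guess in DB: send response to clients"
    if val > best_val:
        best, best_val = guess, val
print(best, best_val)
```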
Autonomic Oil Well Placement/Configuration (VFSA)
• Permeability and pressure contours; 3 wells, 2D profile
• Contours of NEval(y,z,500): exhaustive search requires NY×NZ (450) evaluations, and the minimum appears in the contour plot
• The VFSA solution “walk” found it after 20 (81) evaluations
“An Autonomic Reservoir Framework for the Stochastic Optimization of Well Placement,” V. Matossian, M. Parashar, W. Bangerth, H. Klie, M.F. Wheeler, Cluster Computing: The Journal of Networks, Software Tools, and Applications, Kluwer Academic Publishers, Vol. 8, No. 4, pp. 255–269, 2005; “Autonomic Oil Reservoir Optimization on the Grid,” V. Matossian, V. Bhat, M. Parashar, M. Peszynska, M. Sen, P. Stoffa and M.F. Wheeler, Concurrency and Computation: Practice and Experience, John Wiley and Sons, Vol. 17, Issue 1, pp. 1–26, 2005.
Wide Area Data Streaming in the Fusion Simulation Project (FSP)
• Wide-area data streaming requirements
– enable high-throughput, low-latency data transfer to support near-real-time access to the data
– minimize related overhead on the executing simulation
– adapt to network conditions to maintain desired QoS
– handle network failures while eliminating data loss
• GTC runs on teraflop/petaflop supercomputers; an end-to-end system with monitoring routines, over 40 Gbps links, supports data archiving, data replication, large-scale data analysis, post-processing, user monitoring and visualization
Autonomic Data Streaming for Coupled Fusion Simulation Workflows
• Grid-based coupled fusion simulation workflow
– runs for days between ORNL (TN), NERSC (CA), PPPL (NJ) and Rutgers (NJ)
– generates multi-terabytes of data
– data coupled, analyzed and visualized at runtime
• Services, connected over a data-Grid middleware and logistical networking backbone:
– Simulation: Simulation Service (SS)
– Analysis/Viz: Data Analysis Service (DAS)
– Storage: Data Storage Service (DSS)
– Transfer: Autonomic Data Streaming Service (ADSS), a managed service comprising a Buffer Manager Service (BMS) and a Data Transfer Service (DTS)
ADSS Implementation
• The ADSS couples the BMS and DTS with an element/service manager and an LLC controller
• Measured inputs from the SS and the network: bandwidth, buffer size, and data rate (captured as service state and contextual state)
• The LLC model predicts buffer size and data rate; an optimization function drives the controller
• Rule-based adaptation (backed by a rule base) acts on the DTS/BMS, directing data either to the Grid middleware (logistical networking) backbone or to the local hard disk
Adaptive Data Transfer
• No congestion in intervals 1–9
– data transferred over the WAN
• Congestion at intervals 9–19
– the controller recognizes the congestion and advises the element manager, which in turn adapts the DTS to transfer data to local storage (LAN)
• Adaptation continues until the network is no longer congested
– data sent to local storage by the DTS falls to zero at the 19th controller interval
[Figure: data transferred by the DTS (MB) to WAN vs. LAN and available bandwidth (Mb/s) per controller interval (0–24), with the congestion window marked.]
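The WAN/LAN switch the plot describes reduces to a threshold rule. A sketch, with an invented bandwidth threshold and sink functions:

```python
# Sketch of the congestion-driven DTS adaptation: stream to the WAN while
# bandwidth is healthy, divert to local (LAN) storage under congestion.
# The 40 Mb/s threshold and the sink functions are illustrative.
CONGESTION_THRESHOLD_MBPS = 40.0

def send_wan(block): print(f"WAN <- {block}")
def send_lan(block): print(f"LAN <- {block}")   # local storage fallback

def dts_step(block, measured_bandwidth_mbps):
    if measured_bandwidth_mbps >= CONGESTION_THRESHOLD_MBPS:
        send_wan(block)          # no congestion: normal wide-area path
    else:
        send_lan(block)          # congested: divert until congestion clears

for interval, bw in enumerate([100, 95, 30, 25, 90], start=1):
    dts_step(f"block-{interval}", bw)
```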
Adaptive Buffer Management
• Uniform buffer management is used when the data generation rate is constant
• Aggregate buffer management is triggered when congestion increases
[Figure: blocks grouped by the BMS (1 block = 4 MB) and bandwidth (Mb/s) per controller interval (0–26): uniform buffer management before congestion, aggregate buffer management as congestion increases.]
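A toy version of that BMS policy: group more 4 MB blocks per transfer as congestion rises, so fewer, larger transfers hit the congested link. The thresholds and grouping function are assumptions, not the ADSS code:

```python
# Toy BMS policy sketch: aggregate more 4 MB blocks per transfer when
# the link is congested; numbers are illustrative.
BLOCK_MB = 4

def blocks_to_group(bandwidth_mbps):
    if bandwidth_mbps >= 80:      # uncongested: uniform buffering
        return 1
    # congested: aggregate, more aggressively as bandwidth drops
    return min(8, max(2, int(80 / max(bandwidth_mbps, 1))))

for bw in (100, 60, 30, 10):
    n = blocks_to_group(bw)
    print(f"{bw:>3} Mb/s -> group {n} blocks ({n * BLOCK_MB} MB per transfer)")
```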
Adaptation of the Workflow
• Create multiple instances of the Autonomic Data Streaming Service (ADSS) when the effective network transfer rate dips below a threshold (around 100 Mb/s in our case)
• The maximum number of instances is kept within a fixed limit
[Figure: % network throughput and number of ADSS instances vs. data generation rate (Mb/s); the simulation's transfer path fans out from ADSS-0 to additional instances (ADSS-1, ADSS-2), each with its own data transfer buffer.]
% network throughput is the difference between the maximum and the current network transfer rate; an instance-scaling sketch follows below.
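The scale-out rule can be sketched the same way; the threshold follows the slide, while the cap and the controller shape are invented:

```python
# Sketch of ADSS instance scaling: add an instance when the effective
# transfer rate drops below the threshold, up to a cap. Illustrative only.
THRESHOLD_MBPS, MAX_INSTANCES = 100.0, 4

def adjust_instances(current, effective_rate_mbps):
    if effective_rate_mbps < THRESHOLD_MBPS and current < MAX_INSTANCES:
        return current + 1       # fan out: spawn another streaming instance
    return current

instances = 1
for rate in (120, 95, 70, 60, 110):
    instances = adjust_instances(instances, rate)
    print(f"rate={rate} Mb/s -> {instances} ADSS instance(s)")
```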
Autonomic Data Streaming and In-Transit Data Manipulation: Two-Level Cooperative Self-Management
• First level: application-level data streaming
– proactive QoS management based on online control and user-defined policies
• Second level: adaptive runtime management strategies for in-transit data manipulation
– quick, opportunistic and reactive
– adaptive buffering and processing techniques performed at each in-transit node
– masks idle time of in-transit data by performing additional computations in the data pipeline
• Data blocks flow from the data producer (the simulation, with its LLC controller and service manager providing application-level proactive management) through in-transit nodes (process, buffer, forward, under in-transit-level reactive management) to the coupled data consumer or sink; a sketch of the in-transit node follows below
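A compact sketch of the in-transit node's reactive choice: process data opportunistically while the outgoing link has slack, and simply buffer and forward when the queue backs up. The queue limit is an assumed knob, not a value from the paper:

```python
# Sketch of an in-transit node: process data opportunistically while the
# outgoing link has slack, otherwise buffer and forward. Illustrative.
from collections import deque

class InTransitNode:
    def __init__(self, queue_limit=8):
        self.buffer = deque()
        self.queue_limit = queue_limit

    def on_block(self, block, link_busy):
        self.buffer.append(block)
        if link_busy or len(self.buffer) > self.queue_limit:
            self.forward()                   # reactive: drain under pressure
        else:
            self.process_then_forward()      # opportunistic extra computation

    def process_then_forward(self):
        block = self.buffer.popleft()
        print(f"processed+forwarded {block}")

    def forward(self):
        while self.buffer:
            print(f"forwarded {self.buffer.popleft()}")

node = InTransitNode()
node.on_block("block-1", link_busy=False)
node.on_block("block-2", link_busy=True)
```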
Summary of Results
• Experimental setup
– “in-transit” nodes (clusters): Rutgers (“Frea”) and PPPL (“Portalx” and “Gridn”)
– data generation: NERSC (“Seaborg”) and ORNL (“RAM”)
• Adaptive processing at in-transit nodes reduced idle time per data item from 40% to 2%
• Quality of data received at the sink improved with adaptation at in-transit nodes
• End-to-end cooperative management
– results in better QoS than standalone mechanisms at the two levels operating without cooperation
– 25% less data was written at the data generation end in the face of congestion at in-transit nodes
Healthcare Scenario – Patient-Driven Intervention
• Support for “continuous” monitoring of an individual: “individualised healthcare”
– post-operative care (short term)
– long-term home care
• Medical sensors of different types
– reaction to intake of particular medication
– manage progression of disease
– assess post-operative care
Key Issues
• Power management
– limited power usage (periodic transmissions)
– power through body movements
– mechanisms for energy storage and use of multiple power sources
• Data quality
– reduce total data volume transmitted
• Development of implantable “biocompatible” sensors
– use of specialist “blood analyte” sensors
– “biosensing”: signals generated from biological phenomena (glucose and lactate enzymes)
• Fusing data from multiple sensors
Devising ‘Trends’ Algorithms Based on Single Data Points (reality check: single points are gappy and error-prone)
• Compare continuous data: 576 blood glucose concentration measurements (every 5 min over 24 hours, then the same after 6 months of a drug intervention in the same individual)
[Figure: blood glucose concentration (mmol/L, 0–10) vs. time of day (00:00–23:20), pre-intervention and post-intervention traces.]
Healthcare@Home – Integrated Inputs (©2005 Cardiff University / WeSC; from Ed Conley)
• Sensors connect over Bluetooth
• Data may be recorded and maintained at various places
• Need for analysis at a central site
Logical Architecture
• End-to-end patient view
• Overall system/server architecture view
• Intended users: clinicians (doctors, nurses, etc.), patients, researchers, policy makers
@ Home – Sending Medical Readings to the Server
(1) Patient uses glucose meter
(2) Device sends reading via Bluetooth to a hub
(3) Hub sends reading via GPRS to the H@H server
On Receiving Medical Data from Sensors
• Readings arriving over GPRS are validated
• Validated data is saved to the DCCR (Diabetes Continuing Care Reference) and a notification is generated; a minimal ingest sketch follows below
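A minimal sketch of that ingest path; the plausibility range, alert threshold, and the `save_to_dccr`/`notify` stubs are all invented for illustration:

```python
# Illustrative ingest pipeline: validate a glucose reading, persist it to
# the DCCR store, and raise a notification. All names/ranges are assumed.
PLAUSIBLE_MMOL_L = (1.0, 30.0)   # reject sensor glitches outside this range

def save_to_dccr(patient_id, reading):
    print(f"DCCR <- {patient_id}: {reading} mmol/L")

def notify(patient_id, message):
    print(f"notify({patient_id}): {message}")

def on_reading(patient_id, reading):
    lo, hi = PLAUSIBLE_MMOL_L
    if not (lo <= reading <= hi):
        notify(patient_id, f"discarded implausible reading {reading}")
        return
    save_to_dccr(patient_id, reading)
    if reading > 11.0:           # assumed hyperglycemia alert threshold
        notify(patient_id, "glucose above alert threshold")

on_reading("patient-42", 12.3)
```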
Autonomic (Physics/Model/System-Driven) Application Runtime Management in V-Grid
• A Virtual Grid Resource Autonomic Runtime Manager (ARM) loops over each level of the Grid/application hierarchy:
– V-Grid Monitoring (self-observation, context-awareness): system states (CPU, memory, bandwidth, availability, etc.) and application states (computation/communication ratio, nature of applications, application dynamics)
– V-Grid Deduction (self-adaptation, self-optimization, self-healing): identify and characterize natural regions, define objective functions and management strategy, define VCUs
– V-Grid Execution: partition, map and tune
– self-learning feeds back across iterations
• NR: application natural regions; VCU: virtual computational unit
• V-Grid maps virtual resource units onto a capability-ordered resource hierarchy: computing nodes, divisional/departmental resources, and institutional resources over the WAN (e.g., IBM SP2, E10K, Beowulf Linux clusters, workstations)
Autonomic Runtime Management
• Self-observation & analysis drive self-optimization & execution: the dynamic driver application is partitioned/composed (and repartitioned/recomposed) by the autonomic partitioner into virtual computation units (VCUs), mapped onto virtual resource units of a heterogeneous, dynamic computational environment
• Monitoring & context-aware services: an application monitoring service and a resource monitoring service
• System state characterization: a performance prediction module and a system capability module (CPU, memory, bandwidth, availability, access policy) feed a resource history module and a system state synthesizer, yielding a normalized resource metric (NRM)
• Application state characterization: nature of adaptation, application dynamics, and computation/communication yield a normalized work metric (NWM)
• An objective function synthesizer turns both metrics into prescriptions for mapping, distribution, redistribution and execution
• Autonomic scheduling: deduction engines combine the current system state and current application state to drive global and local Grid scheduling, with VGTS (virtual Grid time scheduling) and VGSS (virtual Grid space scheduling) at each level
Experimental Evaluation – Autonomic Runtime Management
• Experiment setup: IBM SP4 cluster (DataStar at the San Diego Supercomputer Center, 1,632 processors in total); SP4 (p655) node: 8 processors (1.5 GHz), 16 GB memory, 6.0 GFlops
• RM3D: refinement levels = 3, refinement factor = 2
• Performance gain: AHMP 30%–42%, LPA 20%–29% (GPA: greedy; LPA: level-based)
[Figure: overall performance – execution time (sec) for the RM3D application (1000 time steps, size 256x64x64) on 64–1280 processors for GPA, LPA and SBC+AHMP.]
ARMaDA: Application-Sensitive Adaptations
• PAC tuple, 5-component metric
• Octant approach: application runtime state
• GrACE (ISP) and Vampire (pBD-ISP, GMISP+SP) partitioners
• ARMaDA framework selects partitioners based on
– computation/communication
– application dynamics
– nature of adaptation
• RM3D, 64 processors on “Blue Horizon”
– 100 steps, base grid 128x32x32
– 3 levels, RF = 2, regrid every 4 steps
ARMaDA: System-Sensitive Adaptations
• System characteristics obtained using NWS
• RM3D compressible turbulence application
– 128x64x64 base (coarse) grid
– 3 levels, factor-2 refinement
• System/environment: University of Texas at Austin (32 nodes), Rutgers (16 nodes)
• A resource monitoring tool feeds CPU, memory and bandwidth measurements to a capacity calculator; the system-sensitive partitioner combines the weights and capacities to produce partitions for the application:
C_k = w_p * P_k + w_m * M_k + w_b * B_k
(the capacity C_k of node k is a weighted sum of its processing, memory and bandwidth availability; a small sketch follows below)
[Figure: execution time (secs) on 4–32 processors, non-system-sensitive vs. system-sensitive partitioning, plus a table of static vs. dynamic sensing times (roughly 805.54 s vs. 423.72 s; per-processor values garbled in extraction).]
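To make the capacity formula concrete, a small sketch that normalizes per-node capacities into partition shares; the weights and readings are invented, not measured NWS values:

```python
# Sketch: turn per-node CPU/memory/bandwidth readings into relative
# capacities C_k = w_p*P_k + w_m*M_k + w_b*B_k, then into partition shares.
W_P, W_M, W_B = 0.5, 0.3, 0.2          # relative importance of P, M, B

nodes = {                               # normalized availability in [0, 1]
    "utexas-01": {"P": 0.9, "M": 0.8, "B": 0.6},
    "rutgers-01": {"P": 0.5, "M": 0.7, "B": 0.9},
}

capacity = {k: W_P * v["P"] + W_M * v["M"] + W_B * v["B"]
            for k, v in nodes.items()}
total = sum(capacity.values())
for k, c in capacity.items():
    print(f"{k}: capacity={c:.2f}, workload share={c / total:.1%}")
```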
Handle Different Resource Situations
• Objectives: efficiency, performance, survivability
• When resources are scarce (less space, more time): application-level out-of-core (ALOC)
• When resources are under-utilized (less time, more space): application-level pipelining (ALP)
• ALP trades space (resource) for time (performance); ALOC trades time (performance) for space (resource); a schematic mode test follows below
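The choice between the two modes reduces to a memory-pressure test. A schematic, with invented thresholds; the real runtime considers richer state:

```python
# Schematic mode selection between ALP and ALOC based on memory headroom.
# Thresholds are illustrative assumptions.
def select_mode(mem_free_fraction):
    if mem_free_fraction < 0.15:
        return "ALOC"   # scarce memory: page working set to disk, pay time
    if mem_free_fraction > 0.5:
        return "ALP"    # plentiful memory: pipeline stages, pay space
    return "NORMAL"

for free in (0.05, 0.3, 0.7):
    print(free, "->", select_mode(free))
```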
Experimental Results – ALP
• Experiment setup: IBM SP4 cluster (DataStar at the San Diego Supercomputer Center, 1,632 processors in total); SP4 (p655) node: 8 processors (1.5 GHz), 16 GB memory, 6.0 GFlops
• Performance gain: up to 40% on 512 processors
[Figure: execution time (sec) for the RM3D application (1000 time steps, size 128x32x32) on 16–512 processors, AHMP without ALP vs. AHMP with ALP.]
Effects of Finite Memory – ALOC
• Testbed: Intel Pentium 4 CPU, 1.70 GHz, Linux 2.4 kernel; cache size 256 KB, physical memory 512 MB, swap space 1 GB
[Figure: two plots of processing time (ms/operation) versus allocated memory (MB) on this testbed.]
Experimental Results – ALOC
• Testbed: Beowulf cluster (Frea at Rutgers, 64 processors); Intel Pentium 4 CPU, 1.70 GHz, Linux 2.4 kernel; cache size 256 KB, physical memory 512 MB, swap space 1 GB
[Figure: number of page faults and execution time for the RM3D application (100 time steps, size 128x32x32, 4 refinement levels) across regridding steps; the non-ALOC run reaches a crash point, while the ALOC run continues.]
Conclusion
• Pervasive computational ecosystems and eScience
– unprecedented opportunity
• can enable a new generation of knowledge-based, data- and information-driven, context-aware, computationally intensive, pervasive applications
– unprecedented research challenges
• scale, complexity, heterogeneity, dynamism, reliability, uncertainty, …
• applications, algorithms, measurements, data/information, software
• Autonomic Grid computing
– addresses the complexity of pervasive Grid environments
• Project AutoMate: autonomics for eScience on the Grid
– semantics + autonomics
– Accord, Rudder/Comet, Meteor, Squid, Topos, …
• More information, publications, software
– www.caip.rutgers.edu/~parashar/
– [email protected]
Thank You!