Dr. Frederica Darema, CISE/NSF: Performance Engineering Large Scale Computing Systems. SC07-APART Workshop on: Performance Analysis and Optimization of High-End Computing Systems

Page 1

Dr. Frederica Darema, CISE/NSF

Performance Engineering Large Scale Computing Systems

SC07-APART Workshop on: Performance Analysis and Optimization of High-End Computing Systems

Page 2

Outline

• The BIG PICTURE
• Applications Directions
• Computing Platforms Directions
• Research and Technology Directions
• Examples of some advances
• Future Challenges and Opportunities

Page 3

Science, Engineering, and “Commercial” Applications Environments: how are they shaping up for the future?

What does it entail for Large-Scale Computing, and for Large-Scale High-End Computing?

Page 4

Small-Scale and Large-Scale Systems: Increasing complexity of systems and applications …

• Processing at multiple levels
• Computation and data processing, both at the application and the instruments/sensors side
• New Computational Units
  – Beyond commodity microprocessors / superscalar / (D)MT GPU/(GP)2Us (MC-P), MT, FPGAs, GPUs, …
  – Populating: high-end platforms, workstations, visualization servers, data servers, etc.
• Potentially:
  – MC-Ps, FPGAs, GPUs at the application side
  – MC-Ps, FPGAs, GPUs at the data acquisition side
• One kind of processor EVERYWHERE???
• Or a mix of MC-Ps, FPGAs, GPUs???
• Pros & deficiencies in each; advances close gaps
• Complexity persists and increases

Page 5

Platforms Directions

Past / Present / Future progression:
– Vector Processors
– SIMD MPPs
– Distributed Memory MPs
– Shared Memory MPs
– Distributed Platforms, Heterogeneous Computers and Networks
– Petaflops Platform (Grid-in-a-Box)

• Latencies
  – variable (internode, intranode)
• Bandwidths
  – different for different links
  – different based on traffic
• Heterogeneity
  – architecture (computer & network)
  – node power (supernodes, MCP)

[Figure: Distributed Platform, with nodes labeled MPP, NOW, SAR, tac-com, database, firecntl, alg accelerator, SP, …]

Page 6

Applications Directions

Past
– Mostly monolithic
– Mostly one programming language
– Computation Intensive
– Batch
– Hours/days

Present / Future
– Multi-Modular
– Multi-Language
– Multi-Developers
– Multi-Source Data
– Computation Intensive
– Data Intensive
– Real Time
– Few Minutes/hours
– Visualization
– Interactive Steering
– Integrated Simulations & Experiments

Dynamic Data Driven Applications Systems

Page 7

Example of new applications and systems directions:
Dynamic Data Driven Application Systems (DDDAS)
(www.cise.nsf.gov/dddas & www.dddas.org)

DDDAS: ability to dynamically incorporate additional data into an executing application, and in reverse, ability of an application to dynamically steer the measurement process

• Dynamic Integration of Computation & Measurements/Data (from the Real-Time to the High-End)
• Unification of Computing Platforms & Sensors/Instruments
• DDDAS guides sensor systems architectures

[Figure: Dynamic Feedback & Control Loop. Labels: Theory (First Principles); Simulations (Math. Modeling, Phenomenology, Observ'n Modeling, Design); Experiment; Measurements; Field-Data (on-line/archival); User]

Challenges:
• Application Simulations Methods
• Algorithmic Stability
• Measurement/Instrumentation Methods
• Computing Systems Software Support

Software Architecture Frameworks

Synergistic, Multidisciplinary Research
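The two-way coupling in the DDDAS definition above can be sketched as a small loop. This is only an illustration under stated assumptions, not code from any DDDAS project: the scalar "application state", the noisy sensor, the blending `gain`, the `tolerance` threshold, and the name `run_dddas_loop` are all hypothetical.

```python
import random


def run_dddas_loop(true_value, steps=50, gain=0.5, tolerance=0.5, seed=0):
    """Toy DDDAS loop: data steers the application, the application steers sampling.

    Hypothetical setup: the application tracks one scalar quantity, and the
    instrument is a noisy sensor whose sampling interval can be changed.
    """
    rng = random.Random(seed)
    estimate = 0.0      # state of the executing application
    interval = 1        # steps between measurements, chosen by the application
    next_sample = 0
    samples_taken = 0

    for step in range(steps):
        if step < next_sample:
            continue  # the application keeps running on its own model
        # Forward direction: inject a new measurement into the running model.
        measurement = true_value + rng.gauss(0.0, 0.1)
        error = measurement - estimate
        estimate += gain * error
        samples_taken += 1
        # Reverse direction: the application steers the measurement process.
        # Large model/data disagreement means sample densely, otherwise back off.
        interval = 1 if abs(error) > tolerance else min(interval * 2, 8)
        next_sample = step + interval

    return estimate, samples_taken
```

Calling `run_dddas_loop(10.0)` returns the final estimate and the number of measurements requested; the point of the design is that the measurement count drops once model and data agree.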

Page 8

TeraGrid

• A distributed system of unprecedented scale
  • 30+ TF, 1+ PB, 40 Gb/s net
• Unified user environment across resources
  • User software environment
  • User support resources
• Integrated new partners to introduce new capabilities
  • Additional computing, visualization capabilities
  • New types of resources: data collections, instruments
• Created an initial community of over 500 users, 80 PIs
• Created User Portal in collaboration with NMI

courtesy Charlie Catlett

Page 9

DDDAS: Beyond Grid Computing
“Extended Grid” – “SuperGRID”: the Application Platform is the computational & measurement system

[Figure labels: Applications; Computational Platforms; Instruments; Sensors; Archival/Stored Data]

Measurement Grids; Computational Grids
SuperGrids: Dynamically Coupled Networks of Data and Computations

Page 10

Examples of TeraGrid Applications

• Lattice-Boltzmann Simulations: Coveney, UCL; Bruce Boghosian, Tufts
• Reservoir Modeling: Wheeler/UTAustin, Saltz/OSU, Parashar/Rutgers
• Aquaporin Mechanism: Schulten, UIUC (animation pointed to by 2003 Nobel chemistry prize announcement)
• Groundwater/Flood Modeling: Maidment, Wells, UT
• Atmospheric Modeling: Droegemeier, OU

Advanced Support for TeraGrid Applications:
TeraGrid staff are “embedded” with applications to create
- Functionally distributed workflows
- Remote data access, storage and visualization
- Distributed data mining
- Ensemble and parameter sweep run and data management

courtesy Charlie Catlett

Page 11

To address the complexity of today’s and future systems, applications and their environments, we need systematic modeling and analysis approaches for designing, supporting the runtime, and management of such systems:

Systems Performance Engineering

Page 12

Background

• Systems Modeling and Analysis increasingly important:
  – systems design cycle and runtime
  – measurements (static and runtime)
  – functional correctness of hw, hw and sw performance, dependability, reliability, power management, security, debugging, …
• Traditionally/in the past (for example):
  – modeling specific aspects or components, rather than the full system
  – architectural simulators trade speed for accuracy
  – full-system simulators trade accuracy for speed
• Want modeling/simulation capabilities that allow
  – accurate, cycle-level resolution
  – complete modeling of the entire system
  – simulating execution of real workloads (full applications or realistic benchmarks) on top of real operating systems
  – users to probe features in the systems (hardware, systems software, application)
• A number of research efforts are addressing such challenges, and more…

Page 13

System Modeling and Analysis

Develop methods and tools for modeling, measuring, analyzing, evaluating, and predicting the performance, dependability, reliability, runtime management, debugging, security, etc., for design & runtime support of complex computing and communications systems

• Hardware and Software modeling
  – methods, tools and measurements providing multimodal, hierarchical or multilevel modeling and analysis capabilities for such systems;
  – methods that describe components of the system, but also the system as a whole, and enable assessment of the effects of individual hardware and software layers and components of these systems;
  – ability to describe the system in multiple levels of detail (characteristics and time-scales);
  – combining different (hybrid) methods of describing components and layers, from analytical and statistical to simulation, emulation, etc.;
  – performance specification languages and compilers
  – testing & validation of developed methods and tools
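As a rough illustration of combining hybrid description methods, the sketch below describes one component analytically and another by simulation, then composes them into a system-level figure. Everything here is an assumption made for illustration: the choice of an M/M/1 queue as the component model, the function names, and the series composition (which treats each stage's arrivals as Poisson at the same rate).

```python
import random


def mm1_response_time(arrival_rate, service_rate):
    """Analytical model: mean response time of a stable M/M/1 queue."""
    if arrival_rate >= service_rate:
        raise ValueError("unstable queue: arrival_rate must be < service_rate")
    return 1.0 / (service_rate - arrival_rate)


def simulated_response_time(arrival_rate, service_rate, jobs=100_000, seed=1):
    """Simulation model: the same queue, estimated with the Lindley recursion."""
    rng = random.Random(seed)
    wait = 0.0
    total = 0.0
    for _ in range(jobs):
        service = rng.expovariate(service_rate)
        total += wait + service                  # response = waiting + service
        gap = rng.expovariate(arrival_rate)      # time until the next arrival
        wait = max(0.0, wait + service - gap)    # waiting time of the next job
    return total / jobs


def end_to_end_latency(components):
    """System-level view: sum the mean latencies of components in series."""
    return sum(model(*params) for model, params in components)
```

A mixed system description could then look like `end_to_end_latency([(mm1_response_time, (50.0, 100.0)), (simulated_response_time, (50.0, 80.0))])`, where each layer is free to use whichever method suits it.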

Page 14

System Modeling and Analysis

• Modeling and measurement approaches
  – capabilities to describe, analyze and predict the behavior of the components as well as the systems;
  – analysis and prediction due to characteristics or changes in the application, system software, hardware;
  – multilevel approaches and multi-modal approaches
• Performance Frameworks
  – combine tools in “plug-and-play” fashion
  – multiple views of the system
• Use of systems modeling and analysis methods and tools beyond the design cycle, that is: to support optimized application composition, mapping, runtime with performance, dependability, fault-tolerance

Page 15

[Figure: layered systems stack; vertical label: Performance Frameworks]
• Distributed Applications
• Collaboration Environments
• Services: Authentication/Authorization, Fault Recovery Services, Distributed Systems Management, Visualization, Scalable I/O, Data Management, Archiving/Retrieval Services, . . .
• Prog. Models, Libraries, Tools, Compilers
• Distributed, Heterogeneous, Dynamic, Adaptive Computing Platforms and Networks
• Device Technology, CPU Technology, Memory Technology, . . .
• Systems Modeling and Analysis: Application Models, File/IO Models, OS Scheduler Models, Architecture/Network Models, Memory Models

Page 16

Multiple views of the system: The Operating Systems’ view

[Figure: layered systems stack]
• Distributed Applications
• Collaboration Environments
• Services: Authentication/Authorization, Dependability Services, Distributed Systems Management, Visualization, Scalable I/O, Data Management, Archiving/Retrieval Services, Other Services . . .
• Languages, Libraries, Tools, Compilers, . . .
• Distributed, Heterogeneous, Dynamic, Adaptive Computing Platforms and Networks
• Device Technology, CPU Technology, Memory Technology, . . .
• Models: Application Models, Architecture/Network Models, Memory Models, OS Scheduler Models, IO/File Models

Page 17

Technology for integrated feedback & control: Runtime Compiling System (RCS) and Dynamic Application Composition

[Figure: Runtime Compiling System. Components shown: Application Model; Application Program; Compiler Front-End; Application Intermediate Representation; Compiler Back-End; Performance Measurements & Models; Distributed Programming Model; Application Components & Frameworks; Dynamic Analysis Situation; Launch Application(s); Dynamically Link & Execute; Distributed Platform; Adaptable Computing Systems Infrastructure; Distributed Computing Resources (MPP, NOW, SAR, tac-com, database, firecntl, alg accelerator, SP, …)]

Page 18

A great set of efforts is developing systems modeling methods along these directions, leading to performance frameworks

Emphasis on Multidisciplinary Research (across sub-areas of CS)

Application-driven validation of research and technology advances

Collaborations with industry are fruitful

Projects can be found in the proceedings of the Next Generation Software Workshop Series, organized every year in conjunction with IPDPS

Page 19

GrADS Project & VGrADS
PI: Ken Kennedy (& Dan Reed, Andrew Chien, Fran Berman, Dennis Gannon, Ian Foster, Jack Dongarra, et al.)

Project Goals: To develop program preparation system support for computational Grid applications and technologies to support efficient run-time management of computational Grid resources, and achieve reliable performance under varying load.

Performance Contracts - At the Heart of the GrADS Model:
• Fundamental mechanism for managing mapping and execution
What are they?
• Mappings from resources to performance
• Mechanisms for determining when to interrupt and reschedule
Abstract Definition
• Random Variable: r(A,I,C,t0) with a probability distribution
  • A = app, I = input, C = configuration, t0 = time of initiation
• Important statistics: lower and upper bounds (95% confidence)
Challenge
• When should a contract be violated?
  • Strict adherence balanced against cost of reconfiguration

GrADSoft Architecture
[Figure labels: Program Preparation System; Program Execution System; Source Application; Whole-Program Compiler; Libraries; Software Components; Configurable Object Program; Service Negotiator; Scheduler; Grid Runtime System; Dynamic Optimizer; Real-time Performance Monitor; Performance Problem; Performance Feedback; Negotiation]
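The performance-contract idea above (a random variable with 95% bounds, and a violation decision weighed against reconfiguration cost) can be sketched as follows. This is not the GrADS implementation: the normal approximation for the bounds, the rate-based cost comparison, and the names `contract_bounds` and `should_reschedule` are assumptions made for illustration.

```python
import statistics


def contract_bounds(predicted_samples, z=1.96):
    """Lower/upper bounds covering about 95% of a roughly normal metric."""
    mean = statistics.fmean(predicted_samples)
    spread = statistics.stdev(predicted_samples)
    return mean - z * spread, mean + z * spread


def should_reschedule(observed_rate, bounds, remaining_work,
                      alternative_rate, reconfiguration_cost):
    """Decide whether a contract violation is worth acting on.

    Rates are work units per second; reconfiguration_cost is in seconds.
    """
    lower, _upper = bounds
    if observed_rate >= lower:
        return False  # contract holds; running faster than promised is fine
    if observed_rate <= 0:
        return True   # no progress at all, so any finite alternative wins
    time_if_stay = remaining_work / observed_rate
    time_if_move = reconfiguration_cost + remaining_work / alternative_rate
    # Strict adherence is balanced against the cost of reconfiguring.
    return time_if_move < time_if_stay
```

With predicted rates `[100, 102, 98, 101, 99]`, an observed rate of 50 triggers rescheduling only when the move pays for itself over the remaining work.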

Page 20

Dynamic Adaptive Systems Software for Robust and Dependable Large-Scale Systems

{Adve & Sanders}

Page 21

Montage - An Integrated End-to-End Design and Development Framework for Wireless Networks

PI: Rappaport (& Browne, Shakkottai, Ramakrishnan, Varadarajan) {UTAustin, VTech}

• Project advanced the state of the art in fast and efficient methods for simulating large-scale networks
• Deliverables:
  – Generated a wide range of analytical and simulation-based modeling methods
  – Developed a wireless channel simulator (the Site Specific Software Simulator for Wireless - S^4W)
    • S^4W was used by the PIs to develop more powerful and efficient techniques for improved end-to-end network performance for users of both wired and wireless networks
    • S^4W has been used by several universities (in US and Canada), industry (Boeing) and NASA, and commercial business (Schlotzsky’s deli)
• Developed fast simulation capabilities of networks
• Fast hybrid network simulation using spatiotemporal dilations
• FluNet: hybrid simulation-emulation environment, based on combined fluid models
• Developed scalable parallel discrete event simulator (Shakkottai, Ramakrishnan)
• Open Network Emulator
  – Highly scalable distributed direct code execution environment; supports both simulation and emulation in a single tool; novel method, using the notion of Relativistic Time, so that the global virtual time is derived by dilating the real (wall-clock) time
  – Productivity with Performance through Components & Composition (Browne)
• P-COM^2 environment: automated compile-time/runtime composition of parallel programs, applied here to performance modeling
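The notion of deriving virtual time by dilating wall-clock time can be illustrated with a small clock object. This is a hedged sketch, not the Open Network Emulator's mechanism: the class name `DilatedClock`, the convention that a dilation of `d` means `d` wall-clock seconds per virtual second, and the re-anchoring on rate changes are all assumptions.

```python
class DilatedClock:
    """Virtual clock derived from wall-clock time by a dilation factor.

    Assumed convention: dilation d means d wall-clock seconds correspond to
    one second of virtual time, so d > 1 slows virtual time down.
    """

    def __init__(self, dilation=1.0, wall_start=0.0):
        if dilation <= 0:
            raise ValueError("dilation must be positive")
        self._dilation = dilation
        self._wall_anchor = wall_start
        self._virtual_anchor = 0.0

    def virtual_time(self, wall_now):
        """Map a wall-clock reading (e.g. time.monotonic()) to virtual time."""
        return self._virtual_anchor + (wall_now - self._wall_anchor) / self._dilation

    def set_dilation(self, dilation, wall_now):
        """Change the dilation without making virtual time jump."""
        if dilation <= 0:
            raise ValueError("dilation must be positive")
        self._virtual_anchor = self.virtual_time(wall_now)
        self._wall_anchor = wall_now
        self._dilation = dilation
```

Passing the wall-clock reading in as an argument keeps the sketch deterministic; a real deployment would presumably read a shared physical clock on every node.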

Page 22

A Fast, Cycle-Accurate Computer System Technology

Page 23

Fast and Accurate Simulation of Scalable Computer Systems

(a) Hybrid Emulation

(b) Multiple-context Interleaved Emulation

ProtoFlex addresses full-system and scaling complexity for FPGA-based simulation in two ways. Hybrid emulation (a) avoids reconstruction of the entire system on FPGAs. Interleaved emulation (b) lets us decouple the size and complexity of the simulated system from that of the underlying FPGA host.

{Falsafi & Hoe}
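The interleaving idea in (b) can be pictured in software: one execution engine is time-multiplexed over many simulated processor contexts, so the number of simulated processors is independent of the number of engines. The sketch below is a software analogy only, with a made-up two-instruction ISA; `Context`, `step`, and `run_interleaved` are hypothetical names and nothing here reflects ProtoFlex's actual FPGA design.

```python
from dataclasses import dataclass


@dataclass
class Context:
    """Architectural state of one simulated processor (hypothetical tiny ISA)."""
    program: list          # list of (opcode, operand) tuples
    pc: int = 0
    acc: int = 0

    def halted(self):
        return self.pc >= len(self.program)


def step(ctx):
    """The single shared engine: execute one instruction of one context."""
    opcode, operand = ctx.program[ctx.pc]
    if opcode == "add":
        ctx.acc += operand
    elif opcode == "mul":
        ctx.acc *= operand
    else:
        raise ValueError(f"unknown opcode: {opcode}")
    ctx.pc += 1


def run_interleaved(contexts):
    """Round-robin one engine over all contexts; return engine steps used."""
    host_steps = 0
    while not all(ctx.halted() for ctx in contexts):
        for ctx in contexts:
            if not ctx.halted():
                step(ctx)
                host_steps += 1
    return host_steps
```

Adding more simulated processors only lengthens the list of contexts; the engine itself stays the same size, at the cost of more steps per simulated cycle.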

Page 24

Examples of Modeling & Analysis Efforts (Performance Modeling Frameworks)

• FPGA Accelerated Simulation Technologies – functional simulator + timing model (implemented in FPGAs) for fastest cycle-accurate, full system simulator (within 1-3 orders of real hw)

• Fast and accurate simulator through sampling, checkpointing to capture the microarchitectural state, and performing cycle-accurate simulation in the selected sampled regions, to simulate full (unmodified) applications

• Structural and composable performance simulation of complex systems: the effort constructs simulators from system descriptions and component libraries (e.g., an Itanium2 simulator accurate to 3% of actual hardware was produced in 11 weeks)

• Real-time large-scale network simulation environment, through a hybrid of continuous and event-driven simulation paradigms: a fluid-model representation of the mean traffic combined with a packet-oriented simulation. The hybrid testbed will combine advantages of analytical models, simulation and emulation, and physical network testbeds.

• Component based software environment for simulation, emulation and synthesis of network protocols, integrating model-checking with event-driven simulations to allow performance evaluation and protocol validation in a unified way

• End-to-end design and development framework for large-scale wireless networks, composed from capabilities developed under problem solving environments: application compile-time and runtime composition methods for composing the simulation and emulation systems used to set up experimental testbeds; performance engineering methods (of the POEMS project); the Weaves runtime and the P-COM for parallel/distributed execution of discrete event simulations; integration of low-level channel models with higher-level protocol layers; and the relativistic time temporal model developed under the collaboration.
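The sampling approach in the list above can be sketched in a few lines: run the expensive detailed model only on periodically sampled windows and extrapolate. This is a simplified illustration under loud assumptions: the per-instruction costs in `detailed_cycles` are invented, the names are hypothetical, and the checkpointing and warming of microarchitectural state that the real efforts rely on is omitted entirely.

```python
def detailed_cycles(instruction):
    """Stand-in for an expensive cycle-accurate model (hypothetical costs)."""
    return {"alu": 1, "load": 4, "branch": 2}[instruction]


def sampled_cpi(trace, window=100, period=1000):
    """Estimate cycles per instruction from periodically sampled windows.

    Only `window` out of every `period` instructions go through the detailed
    model; the rest of the trace is skipped.
    """
    if not trace:
        raise ValueError("trace must not be empty")
    cycles = 0
    measured = 0
    for start in range(0, len(trace), period):
        for instruction in trace[start:start + window]:
            cycles += detailed_cycles(instruction)
            measured += 1
    return cycles / measured
```

With the default parameters only a tenth of the instructions are simulated in detail; how representative the estimate is depends on how well the sampled windows cover the program's phases.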

Page 25

Examples of Modeling & Analysis Efforts (Application modeling, resource management, …)

• Modeling system for enabling algorithm designers and programmers to develop, evaluate and compare application algorithms for CMP/CMT systems

• Software tools to enable access to coordinated information collected through hardware-based profiling of local and remote memory access of application computation and communication patterns

• Dynamic profiling of application phases for optimizing power consumption under set performance constraints for reconfigurable multi-core environments and data servers

• Cross platform performance estimation by partial execution of applications, capturing computation and communication parameters, and generalizing prediction to problem-scaling scenarios, in parallel and distributed platforms

• Language support for continuous monitoring of distributed systems, grids and other data-centric and network systems

• Adaptive resource sharing mechanisms autonomically matching resources to dynamically changing needs via statistical and stochastic approaches

• Data driven resource allocation in complex systems, through workload characterization, analytical models and policy development

• Compiler enabled model- and measurement-driven adaptation environment for dependability and performance (performability)

• Engineering reliability at software design time by coupling software component architectural models with statistical methods to address uncertainties in design stage

• Tools for pro-active runtime system health monitoring and enhancement for large-scale parallel systems: collecting and analyzing, through on-line models, data gathered over extended periods of time and in real time; filtering and correlating evolving failure data with respect to factors such as workload and operating temperature; and using this information to schedule or checkpoint jobs
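For the performance-estimation item in the list above, the problem-scaling step might look like the following: time short runs at small problem sizes, fit a scaling law, and extrapolate. This is a sketch under assumptions, not the method of the cited effort: the power-law form `time = a * size**b`, the log-log least-squares fit, and the function names are all illustrative choices.

```python
import math


def fit_power_law(sizes, times):
    """Least-squares fit of time = a * size**b in log-log space."""
    xs = [math.log(n) for n in sizes]
    ys = [math.log(t) for t in times]
    x_mean = sum(xs) / len(xs)
    y_mean = sum(ys) / len(ys)
    slope = (sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
             / sum((x - x_mean) ** 2 for x in xs))
    intercept = y_mean - slope * x_mean
    return math.exp(intercept), slope   # (a, b)


def predict_time(a, b, size):
    """Extrapolate the fitted law to a problem size that was never run."""
    return a * size ** b
```

The fitted exponent `b` is also a quick sanity check on the application itself, since it should roughly match the expected algorithmic complexity.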

Page 26

Summary Thoughts

• Large scale high-end systems cannot be treated as isolated platforms

• Such systems demand: enhanced and optimized computation, communication and data management capabilities, in the presence of resource heterogeneity, dynamicity, adaptivity

• Need to advance the technologies that will automate the mapping of complex and dynamic applications on complex platforms with multiple and heterogeneous levels of processors, memory, and networks

• Modeling and Analysis Methods – Performance Engineering of systems are crucial in enabling optimized design, runtime, and management of such systems

Page 27

Dynamic Adaptive Systems Software for Robust and Dependable Large-Scale Systems

• Award 0406351: A Compiler-Enabled Model- and Measurement-Driven Adaptation Environment for Dependability and Performance
William Sanders and Vikram Adve
Develops compiler-controlled performance data monitoring together with performance models for adaptive and optimized runtime support, in environments where the underlying computational, communication, and storage resources may be changing, as well as environments where the application requirements may also be changing

Combines and advances in novel directions work on dynamic runtime compilation methods (LLVM) developed by Adve in 0093426 (CAREER) - NGS: Techniques and Applications of Dynamic Compilation; and system level integrated performance methods developed by Sanders in 0228762 - Next Generation Software: An Integrated Framework for Performance Engineering and Resource-Aware Compilation

In addition to the multidisciplinary work from two sub-areas of computer science (compilers, and performance modeling and analysis), the project includes collaboration with industry, specifically with two senior researchers from ATT Labs-Research, which provides resources such as production-level software to drive and validate the research methods, and also provides opportunities for student internships at the ATT Research Lab.

Other Technical impacts of the individual projects: The LLVM compiler infrastructure has been publicly distributed since October 2003 and downloaded well over 2000 times since. It has attracted at least 40 serious users in academia (instructors and researchers) and industry (startups and established companies). Apple Computer has adopted LLVM and has set up an active group of developers working on incorporating LLVM in Apple’s products, such as the next release of MacOS due in Spring 2007. A paper, Automatic Pool Allocation, on novel methods developed under the project and incorporated in LLVM, won a Best Paper award at the ACM SIGPLAN Conference on Programming Language Design and Implementation (PLDI), the premier conference in the area of compilers.

Other Technical impacts of the individual projects: Möbius is a performance engineering framework and tool for the evaluation of distributed and parallel computing systems, accounting for system components including the application software itself, the operating system, and the underlying computing and communication hardware. The framework provides a means by which multiple, heterogeneous models can be composed together, each representing a different module (software or hardware), component, or view of the system.

Möbius has made a significant worldwide impact in the research area of stochastic model analysis. The impact spans both academic and commercial domains. In addition to being the principal tool used in the graduate-level system reliability courses at the University of Illinois, USA and the Univ. of Florence, Italy, Möbius has been licensed to over 150 university sites throughout the world for teaching and research purposes. International Partnerships: research groups from the Univ. of Twente, Dortmund University, University of the Federal Armed Forces München, and Saarland University are partnering with the Möbius team to develop plug-in modules for the Möbius framework. The first International Möbius Developer’s Working group meeting was held in Sept. 2004, further increasing the number of groups that use Möbius in their research.

Möbius has also been licensed for commercial use to many companies, including: Motorola, Iridium, Pioneer Hybrids, Windber Research Institute, General Dynamics and Boeing. For example, Möbius has been used for numerous telecommunications and computer system applications at Motorola and was designated one of three company-wide system availability modeling packages.

Recently, researchers have begun to use Möbius for biological applications; over 25 universities and Pioneer Hybrid (the world's largest seed producer) and Windber Research Incorporated (non-profit research organization with projects studying the disease progression of breast cancer) have licensed it for use with biological systems.