Transcript: High Performance Embedded Computing Software Initiative (HPEC-SI), Dr. Jeremy Kepner, MIT Lincoln Laboratory (41 slides)

Slide-1: HPEC-SI

MITRE, AFRL, MIT Lincoln Laboratory

www.hpec-si.org

High Performance Embedded Computing Software Initiative (HPEC-SI)

This work is sponsored by the High Performance Computing Modernization Office under Air Force Contract F19628-00-C-0002. Opinions, interpretations, conclusions, and recommendations are those of the author and are not necessarily endorsed by the United States Government.

Dr. Jeremy Kepner, MIT Lincoln Laboratory

Slide-2: Outline

• Introduction
  – DoD Need
  – Program Structure

• Software Standards

• Parallel VSIPL++

• Future Challenges

• Summary

Slide-3: Overview - High Performance Embedded Computing (HPEC) Initiative

[Figure: the HPEC Software Initiative spans DARPA Applied Research, Development, and Demonstration programs. Example systems include the Enhanced Tactical Radar Correlator (ETRAC), ASARS-2, and the Common Imagery Processor (CIP), running on shared-memory servers and embedded multi-processors.]

Challenge: Transition advanced software technology and practices into major defense acquisition programs

Slide-4: Why Is DoD Concerned with Embedded Software?

[Chart: estimated DoD expenditures for embedded signal and image processing hardware and software, $0B to $3B. Source: “HPEC Market Study”, March 2001]

• COTS acquisition practices have shifted the burden from “point design” hardware to “point design” software

• Software costs for embedded systems could be reduced by one-third with improved programming models, methodologies, and standards

Slide-5: Issues with Current HPEC Development: Inadequacy of Software Practices & Standards

[Figure: DoD platforms relying on HPEC (NSSN, AEGIS, Rivet Joint, Standard Missile, Predator, Global Hawk, U-2, JSTARS, MSAT-Air, P-3/APS-137, F-16, MK-48 Torpedo) overlaid on system development/acquisition stages (program milestones roughly 4 years apart: System Tech. Development; System Field Demonstration; Engineering/Manufacturing Development; Insertion to Military Asset) and on signal processor evolution from 1st through 6th generation.]

Today – Embedded Software Is:
• Not portable
• Not scalable
• Difficult to develop
• Expensive to maintain

• High Performance Embedded Computing pervasive through DoD applications
  – Airborne radar insertion program: 85% software rewrite for each hardware platform
  – Missile common processor: processor board costs < $100k; software development costs > $100M
  – Torpedo upgrade: two software rewrites required after changes in hardware design

Slide-6: Evolution of Software Support Towards “Write Once, Run Anywhere/Anysize”

[Figure: evolution from 1990 (application tied directly to vendor software) through 2000 (applications layered on program-specific middleware) to 2005 (applications layered on middleware built on embedded software standards), contrasting DoD software development with COTS development.]

• Application software has traditionally been tied to the hardware

• Many acquisition programs are developing stove-piped middleware “standards”

• Open software standards can provide portability, performance, and productivity benefits

• Support “Write Once, Run Anywhere/Anysize”

Slide-7: Overall Initiative Goals & Impact

[Figure: HPEC Software Initiative cycle of Demonstrate, Develop, Prototype, built on interoperable and scalable object-oriented open standards, targeting Performance (1.5x), Portability (3x), and Productivity (3x).]

Program Goals
• Develop and integrate software technologies for embedded parallel systems to address portability, productivity, and performance
• Engage acquisition community to promote technology insertion
• Deliver quantifiable benefits

Metrics
• Portability: reduction in lines-of-code to change, port, or scale to a new system
• Productivity: reduction in overall lines-of-code
• Performance: computation and communication benchmarks

Slide-8: HPEC-SI Path to Success

Benefit to DoD Programs
• Reduces software cost & schedule
• Enables rapid COTS insertion
• Improves cross-program interoperability
• Basis for improved capabilities

Benefit to DoD Contractors
• Reduces software complexity & risk
• Easier comparisons/more competition
• Increased functionality

Benefit to Embedded Vendors
• Lower software barrier to entry
• Reduced software maintenance costs
• Evolution of open standards

HPEC Software Initiative builds on:
• Proven technology
• Business models
• Better software practices

Slide-9: Organization

Government Lead: Dr. Rich Linderman, AFRL

Executive Committee: Dr. Charles Holland, PADUSD(S+T); RADM Paul Sullivan, N77

Technical Advisory Board: Dr. Rich Linderman, AFRL; Dr. Richard Games, MITRE; Mr. John Grosh, OSD; Mr. Bob Graybill, DARPA/ITO; Dr. Keith Bromley, SPAWAR; Dr. Mark Richards, GTRI; Dr. Jeremy Kepner, MIT/LL

Demonstration: Dr. Keith Bromley, SPAWAR; Dr. Richard Games, MITRE; Dr. Jeremy Kepner, MIT/LL; Mr. Brian Sroka, MITRE; Mr. Ron Williams, MITRE; ...

Development: Dr. James Lebak, MIT/LL; Dr. Mark Richards, GTRI; Mr. Dan Campbell, GTRI; Mr. Ken Cain, MERCURY; Mr. Randy Judd, SPAWAR; ...

Applied Research: Mr. Bob Bond, MIT/LL; Mr. Ken Flowers, MERCURY; Dr. Spaanenburg, PENTUM; Mr. Dennis Cottel, SPAWAR; Capt. Bergmann, AFRL; Dr. Tony Skjellum, MPISoft; ...

Advanced Research: Mr. Bob Graybill, DARPA

• Partnership with ODUSD(S&T), Government Labs, FFRDCs, Universities, Contractors, Vendors, and DoD programs
• Over 100 participants from over 20 organizations

Slide-10: Outline

• Introduction

• Software Standards
  – Standards Overview
  – Future Standards

• Parallel VSIPL++

• Future Challenges

• Summary

Slide-11: Emergence of Component Standards

[Figure: a parallel embedded processor (node controller, processors P0–P3, system controller, consoles, and other computers) with standards at each layer: control communication (CORBA, HP-CORBA); data communication (MPI, MPI/RT, DRI); computation (VSIPL, VSIPL++, ||VSIPL++).]

Definitions
• VSIPL = Vector, Signal, and Image Processing Library
• ||VSIPL++ = Parallel Object-Oriented VSIPL
• MPI = Message Passing Interface
• MPI/RT = MPI Real-Time
• DRI = Data Reorganization Interface
• CORBA = Common Object Request Broker Architecture
• HP-CORBA = High Performance CORBA

HPEC Initiative builds on completed research and existing standards and libraries.
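The data-communication layer named above is the plain MPI standard. As a point of reference only (this example is written for this transcript, not taken from the briefing), a minimal MPI point-to-point exchange of a block of samples looks like this:

// Illustrative only: the kind of message passing the "data communication"
// layer standardizes. Process 0 sends a block of samples to process 1.
// Compile and launch with an MPI toolchain (e.g., mpicxx / mpirun -np 2).
#include <mpi.h>
#include <vector>
#include <cstdio>

int main(int argc, char** argv) {
    MPI_Init(&argc, &argv);

    int rank = 0;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    const int N = 1024;
    std::vector<float> samples(N, 0.0f);

    if (rank == 0) {
        // Producer: fill the block and send it to rank 1.
        for (int i = 0; i < N; ++i) samples[i] = static_cast<float>(i);
        MPI_Send(samples.data(), N, MPI_FLOAT, 1, 0, MPI_COMM_WORLD);
    } else if (rank == 1) {
        // Consumer: receive the block from rank 0.
        MPI_Recv(samples.data(), N, MPI_FLOAT, 0, 0, MPI_COMM_WORLD,
                 MPI_STATUS_IGNORE);
        std::printf("rank 1 received %d samples\n", N);
    }

    MPI_Finalize();
    return 0;
}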

Slide-12: The Path to Parallel VSIPL++

[Figure: functionality vs. time across three phases, showing VSIPL and MPI leading to a VSIPL++ prototype and standard, then a Parallel VSIPL++ prototype and standard (the world's first parallel object-oriented standard).]

Phase 1
• Demonstration: Existing Standards (demonstrate insertions into fielded systems, e.g., CIP; demonstrate 3x portability)
• Development: Object-Oriented Standards (high-level code abstraction; reduce code size 3x)
• Applied Research: Unified Computation/Communication Library (unified embedded computation/communication standard; demonstrate scalability)

Phase 2
• Demonstration: Object-Oriented Standards
• Development: Unified Computation/Communication Library
• Applied Research: Fault Tolerance

Phase 3
• Demonstration: Unified Computation/Communication Library
• Development: Fault Tolerance
• Applied Research: Self-Optimization

Status
• First demo successfully completed
• VSIPL++ v0.5 spec completed
• VSIPL++ v0.1 code available
• Parallel VSIPL++ spec in progress
• High performance C++ demonstrated

Slide-13: Working Group Technical Scope

Development: VSIPL++
• Mapping (data parallelism)
• Early binding (computations)
• Compatibility (backward/forward)
• Local knowledge (accessing local data)
• Extensibility (adding new functions)
• Remote procedure calls (CORBA)
• C++ compiler support
• Test suite
• Adoption incentives (vendor, integrator)

Applied Research: Parallel VSIPL++
• Mapping (task/pipeline parallel)
• Reconfiguration (for fault tolerance)
• Threads
• Reliability/availability
• Data permutation (DRI functionality)
• Tools (profiles, timers, ...)
• Quality of service

Slide-14: Overall Technical Tasks and Schedule

[Gantt chart: tasks spanning FY01–FY08 (near, mid, and long term). VSIPL (Vector, Signal, and Image Processing Library); MPI (Message Passing Interface); VSIPL++ (object-oriented): v0.1 spec, v0.1 code, v0.5 spec & code, v1.0 spec & code; Parallel VSIPL++: v0.1 spec, v0.1 code, v0.5 spec & code, v1.0 spec & code; fault-tolerant/self-optimizing software. Applied Research feeds Development, which feeds Demonstrations: CIP demos, then Demos 2 through 6.]

Slide-15: HPEC-SI Goals vs. 1st Demo Achievements

[Figure: HPEC Software Initiative cycle of Demonstrate, Develop, Prototype, built on interoperable and scalable object-oriented open standards.]

• Performance: goal 1.5x, achieved 2x (3x reduced cost or form factor)
• Portability: goal 3x, achieved 10x+ (zero code changes required)
• Productivity: goal 3x, achieved 6x* (DRI code 6x smaller vs. MPI; *estimated)

Slide-16: Outline

• Introduction

• Software Standards

• Parallel VSIPL++
  – Technical Basis
  – Examples

• Future Challenges

• Summary

Slide-17: Parallel Pipeline

Signal processing algorithm (three pipeline stages):
• Filter: XOUT = FIR(XIN)
• Beamform: XOUT = w * XIN
• Detect: XOUT = |XIN| > c

[Figure: each stage of the algorithm is mapped onto a group of processors of a parallel computer.]

• Data parallel within stages
• Task/pipeline parallel across stages
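To make the three stages concrete, here is a minimal serial C++ sketch written for this transcript (it is not PVL or VSIPL++ code; the FIR taps, weights, and threshold are placeholder values, and the beamform stage is reduced to an element-wise weighting). The parallel versions distribute these same loops across processors.

// Illustrative only: serial rendering of the Filter -> Beamform -> Detect pipeline.
#include <complex>
#include <vector>
#include <cmath>

using cfloat = std::complex<float>;

// Filter: XOUT = FIR(XIN), a direct-form FIR convolution.
std::vector<cfloat> fir(const std::vector<cfloat>& x, const std::vector<cfloat>& h) {
    std::vector<cfloat> y(x.size(), cfloat(0, 0));
    for (std::size_t n = 0; n < x.size(); ++n)
        for (std::size_t k = 0; k < h.size() && k <= n; ++k)
            y[n] += h[k] * x[n - k];
    return y;
}

// Beamform: XOUT = w * XIN, here an element-wise weighting of the samples.
std::vector<cfloat> beamform(const std::vector<cfloat>& x, const std::vector<cfloat>& w) {
    std::vector<cfloat> y(x.size());
    for (std::size_t n = 0; n < x.size(); ++n) y[n] = w[n] * x[n];
    return y;
}

// Detect: XOUT = |XIN| > c, a magnitude threshold test.
std::vector<bool> detect(const std::vector<cfloat>& x, float c) {
    std::vector<bool> hits(x.size());
    for (std::size_t n = 0; n < x.size(); ++n) hits[n] = std::abs(x[n]) > c;
    return hits;
}

int main() {
    std::vector<cfloat> xin(4096, cfloat(1, 0));         // placeholder input
    std::vector<cfloat> taps(16, cfloat(1.0f / 16, 0));  // placeholder FIR taps
    std::vector<cfloat> w(xin.size(), cfloat(0.5f, 0));  // placeholder weights
    auto hits = detect(beamform(fir(xin, taps), w), 0.1f); // the three-stage pipeline
    return hits.empty() ? 1 : 0;
}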

Slide-18: Types of Parallelism

[Figure: input feeds a bank of FIR filters, then two beamformers, then a scheduler feeding two detectors, illustrating data parallelism (within the FIR filters), pipeline parallelism (across stages), task parallelism (beamformer 1 and beamformer 2), and round-robin scheduling (across the detectors).]
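As a rough illustration of the data-parallel case only (written for this transcript, not from the briefing), the sketch below splits the rows of a data cube across worker threads running the same stage; the per-row routine is a placeholder, not a real FIR filter. Task or pipeline parallelism would instead assign whole stages to different processors.

// Illustrative only: data parallelism across rows within one stage.
#include <cstddef>
#include <thread>
#include <vector>

// Placeholder per-row work standing in for an FIR filter stage.
void fir_row(std::vector<float>& row) {
    for (std::size_t i = 1; i < row.size(); ++i) row[i] = 0.5f * (row[i] + row[i - 1]);
}

int main() {
    const std::size_t rows = 64, cols = 4096, workers = 4;
    std::vector<std::vector<float>> data(rows, std::vector<float>(cols, 1.0f));

    std::vector<std::thread> pool;
    for (std::size_t w = 0; w < workers; ++w) {
        pool.emplace_back([&, w] {
            // Each worker processes an interleaved subset of rows
            // (a round-robin assignment, analogous to the scheduler in the figure).
            for (std::size_t r = w; r < rows; r += workers) fir_row(data[r]);
        });
    }
    for (auto& t : pool) t.join();
    return 0;
}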

Slide-19: Current Approach to Parallel Code

• Algorithm and hardware mapping are linked
• Resulting code is non-scalable and non-portable

Algorithm + mapping code for a four-processor mapping (Stage 1 on processors 1 and 2, Stage 2 on processors 3 and 4):

while (!done) {
  if (rank() == 1 || rank() == 2)
    stage1();
  else if (rank() == 3 || rank() == 4)
    stage2();
}

Scaling Stage 2 to four processors (adding processors 5 and 6) requires editing the code:

while (!done) {
  if (rank() == 1 || rank() == 2)
    stage1();
  else if (rank() == 3 || rank() == 4 ||
           rank() == 5 || rank() == 6)
    stage2();
}

Slide-20: Scalable Approach – Lincoln Parallel Vector Library (PVL)

Single-processor and multi-processor mappings of A = B + C use the same source:

#include <Vector.h>
#include <AddPvl.h>

void addVectors(const Map& aMap, const Map& bMap, const Map& cMap) {
  Vector< Complex<Float> > a('a', aMap, LENGTH);
  Vector< Complex<Float> > b('b', bMap, LENGTH);
  Vector< Complex<Float> > c('c', cMap, LENGTH);

  b = 1;
  c = 2;
  a = b + c;
}

• Single-processor and multi-processor code are the same
• Maps can be changed without changing software
• High-level code is compact
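The point of the slide is that only the map arguments change between runs. A hypothetical driver is sketched below; the Map constructor arguments are invented for illustration and do not reflect the actual PVL interface.

// Hypothetical driver, for illustration only: the same addVectors() is called
// with different Map objects, so the computation code never changes.
// The Map construction shown here is made up; the real PVL API differs.
int main() {
  Map serialMap({0});          // all three vectors on processor 0
  addVectors(serialMap, serialMap, serialMap);

  Map blockMap({0, 1, 2, 3});  // vectors distributed over four processors
  addVectors(blockMap, blockMap, blockMap);
  return 0;
}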

Slide-21: C++ Expression Templates and PETE

Expression: A = B + C * D

Expression template type (parse tree):
BinaryNode<OpAssign, Vector,
  BinaryNode<OpAdd, Vector,
    BinaryNode<OpMultiply, Vector, Vector> > >

[Figure: evaluation of A = B + C. 1. Pass B and C references to operator+; 2. create expression parse tree; 3. return expression parse tree; 4. pass expression tree reference to operator=; 5. calculate result and perform assignment. Parse trees, not vectors, are created.]

• Expression templates enhance performance by allowing temporary variables to be avoided
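A stripped-down sketch of the expression-template technique follows; it is written for this transcript and is not PETE or VSIPL++ source. The overloaded operators build the parse-tree type, and assignment collapses it into a single element-wise loop with no temporary vectors.

// Illustrative only: minimal expression templates for A = B + C * D.
#include <cstddef>
#include <vector>

struct Vec;  // forward declaration

// Node representing elementwise (Left op Right); Op supplies apply().
template <class Op, class L, class R>
struct BinaryNode {
    const L& l;
    const R& r;
    double operator[](std::size_t i) const { return Op::apply(l[i], r[i]); }
};

struct OpAdd      { static double apply(double a, double b) { return a + b; } };
struct OpMultiply { static double apply(double a, double b) { return a * b; } };

struct Vec {
    std::vector<double> data;
    explicit Vec(std::size_t n, double v = 0.0) : data(n, v) {}
    double operator[](std::size_t i) const { return data[i]; }
    double& operator[](std::size_t i) { return data[i]; }
    std::size_t size() const { return data.size(); }

    // Assignment from any expression: the single loop that replaces temporaries.
    template <class Expr>
    Vec& operator=(const Expr& e) {
        for (std::size_t i = 0; i < size(); ++i) data[i] = e[i];
        return *this;
    }
};

template <class L, class R>
BinaryNode<OpAdd, L, R> operator+(const L& l, const R& r) { return {l, r}; }

template <class L, class R>
BinaryNode<OpMultiply, L, R> operator*(const L& l, const R& r) { return {l, r}; }

int main() {
    Vec A(8), B(8, 1.0), C(8, 2.0), D(8, 3.0);
    A = B + C * D;  // builds BinaryNode<OpAdd, Vec, BinaryNode<OpMultiply, Vec, Vec>>
    return A[0] == 7.0 ? 0 : 1;
}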

Slide-22: PETE Linux Cluster Experiments

[Charts: relative execution time vs. vector length (32 to 131072) for three expressions (A=B+C, A=B+C*D, and A=B+C*D/E+fft(F)), comparing VSIPL, PVL/VSIPL, and PVL/PETE implementations; relative times range from roughly 0.6 to 1.3.]

• PVL with VSIPL has a small overhead
• PVL with PETE can surpass VSIPL

Slide-23: PowerPC AltiVec Experiments

Expressions benchmarked: A=B+C, A=B+C*D, A=B+C*D+E*F, A=B+C*D+E/F

Software technologies compared:
• AltiVec loop: C; for loop; direct use of AltiVec extensions; assumes unit stride; assumes vector alignment
• VSIPL (vendor optimized): C; AltiVec-aware VSIPro Core Lite (www.mpi-softtech.com); no multiply-add; cannot assume unit stride; cannot assume vector alignment
• PETE with AltiVec: C++; PETE operators; indirect use of AltiVec extensions; assumes unit stride; assumes vector alignment

Results
• Hand-coded loop achieves good performance, but is problem specific and low level
• Optimized VSIPL performs well for simple expressions, worse for more complex expressions
• PETE-style array operators perform almost as well as the hand-coded loop and are general, can be composed, and are high level
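For reference, the "hand-coded loop" style amounts to something like the fragment below (an illustration written for this transcript, not the benchmark code). A vendor AltiVec version would replace the scalar loop body with 4-wide vector intrinsics, including a fused multiply-add, which is exactly where the problem-specific assumptions about stride and alignment enter.

// Illustrative only: hand-coded scalar loop for A = B + C * D on float arrays.
#include <cstddef>

void hand_coded(float* A, const float* B, const float* C, const float* D,
                std::size_t n) {
    for (std::size_t i = 0; i < n; ++i)
        A[i] = B[i] + C[i] * D[i];  // one multiply-add per element
}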

Slide-24: Outline

• Introduction

• Software Standards

• Parallel VSIPL++
  – Technical Basis
  – Examples

• Future Challenges

• Summary

Slide-25

A = sin(A) + 2 * B;

• Generated code (no temporaries):

for (index i = 0; i < A.size(); ++i)
  A.put(i, sin(A.get(i)) + 2 * B.get(i));

• Apply inlining to transform to:

for (index i = 0; i < A.size(); ++i)
  Ablock[i] = sin(Ablock[i]) + 2 * Bblock[i];

• Apply more inlining to transform to:

T* Bp = &(Bblock[0]);
T* Aend = &(Ablock[A.size()]);
for (T* Ap = &(Ablock[0]); Ap < Aend; ++Ap, ++Bp)
  *Ap = fmadd (2, *Bp, sin(*Ap));

• Or apply PowerPC AltiVec extensions

• Each step can be automatically generated
• Optimization level is whatever the vendor desires

Slide-26: BLAS zherk Routine

• BLAS = Basic Linear Algebra Subprograms
• Hermitian matrix M: conjug(M) = M^t
• zherk performs a rank-k update of Hermitian matrix C:
  C <- alpha * A * conjug(A)^t + beta * C

• VSIPL code:

A = vsip_cmcreate_d(10, 15, VSIP_ROW, MEM_NONE);
C = vsip_cmcreate_d(10, 10, VSIP_ROW, MEM_NONE);
tmp = vsip_cmcreate_d(10, 10, VSIP_ROW, MEM_NONE);
vsip_cmprodh_d(A, A, tmp);        /* A * conjug(A)^t */
vsip_rscmmul_d(alpha, tmp, tmp);  /* alpha * A * conjug(A)^t */
vsip_rscmmul_d(beta, C, C);       /* beta * C */
vsip_cmadd_d(tmp, C, C);          /* alpha * A * conjug(A)^t + beta * C */
vsip_cblockdestroy(vsip_cmdestroy_d(tmp));
vsip_cblockdestroy(vsip_cmdestroy_d(C));
vsip_cblockdestroy(vsip_cmdestroy_d(A));

• VSIPL++ code (also parallel):

Matrix<complex<double> > A(10, 15);
Matrix<complex<double> > C(10, 10);
C = alpha * prodh(A, A) + beta * C;
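To make the math concrete, here is a plain-C++ reference for what zherk computes, written for this transcript; it is not the BLAS, VSIPL, or VSIPL++ implementation (in particular, the real zherk updates only one triangle of C).

// Illustrative only: C <- alpha * A * conjug(A)^t + beta * C,
// with A of size n x k and C of size n x n.
#include <complex>
#include <vector>

using cd = std::complex<double>;
using Mat = std::vector<std::vector<cd>>;  // row-major dense matrix

void zherk_reference(double alpha, const Mat& A, double beta, Mat& C) {
    const std::size_t n = A.size();     // rows of A and C
    const std::size_t k = A[0].size();  // columns of A
    for (std::size_t i = 0; i < n; ++i) {
        for (std::size_t j = 0; j < n; ++j) {
            cd sum(0.0, 0.0);
            for (std::size_t p = 0; p < k; ++p)
                sum += A[i][p] * std::conj(A[j][p]);  // (A * conjug(A)^t)(i,j)
            C[i][j] = alpha * sum + beta * C[i][j];
        }
    }
}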

Slide-27: Simple Filtering Application

int main ()
{
  using namespace vsip;
  const length ROWS = 64;
  const length COLS = 4096;
  vsipl v;

  FFT<Matrix, complex<double>, complex<double>, FORWARD, 0, MULTIPLE, alg_hint ()>
    forward_fft (Domain<2>(ROWS, COLS), 1.0);
  FFT<Matrix, complex<double>, complex<double>, INVERSE, 0, MULTIPLE, alg_hint ()>
    inverse_fft (Domain<2>(ROWS, COLS), 1.0);

  const Matrix<complex<double> > weights (load_weights (ROWS, COLS));

  try {
    while (1)
      output (inverse_fft (forward_fft (input ()) * weights));
  }
  catch (std::runtime_error) {
    // Successfully caught access outside domain.
  }
}

Slide-28: Explicit Parallel Filter

#include <vsiplpp.h>
using namespace VSIPL;

const int ROWS = 64;
const int COLS = 4096;

int main (int argc, char **argv)
{
  Matrix<Complex<Float> > W (ROWS, COLS, "WMap"); // weights matrix
  Matrix<Complex<Float> > X (ROWS, COLS, "WMap"); // input matrix
  Matrix<Complex<Float> > Y (ROWS, COLS, "WMap"); // output matrix
  load_weights (W);

  try {
    while (1) {
      input (X);                    // some input function
      Y = IFFT (mul (FFT (X), W));
      output (Y);                   // some output function
    }
  }
  catch (Exception &e) { cerr << e << endl; }
}

Slide-29: Multi-Stage Filter (main)

using namespace vsip;

const length ROWS = 64;
const length COLS = 4096;

int main (int argc, char **argv)
{
  sample_low_pass_filter<complex<float> > LPF;
  sample_beamform<complex<float> > BF;
  sample_matched_filter<complex<float> > MF;

  try {
    while (1)
      output (MF (BF (LPF (input ()))));
  }
  catch (std::runtime_error) {
    // Successfully caught access outside domain.
  }
}

Slide-30: Multi-Stage Filter (low pass filter)

template<typename T>
class sample_low_pass_filter
{
public:
  sample_low_pass_filter()
    : FIR1_(load_w1 (W1_LENGTH), FIR1_LENGTH),
      FIR2_(load_w2 (W2_LENGTH), FIR2_LENGTH)
  { }

  Matrix<T> operator () (const Matrix<T>& Input)
  {
    Matrix<T> output (ROWS, COLS);
    for (index row = 0; row < ROWS; row++)
      output.row (row) = FIR2_(FIR1_(Input.row (row)).second).second;
    return output;
  }

private:
  FIR<T, SYMMETRIC_ODD, FIR1_DECIMATION, CONTINUOUS, alg_hint()> FIR1_;
  FIR<T, SYMMETRIC_ODD, FIR2_DECIMATION, CONTINUOUS, alg_hint()> FIR2_;
};

Slide-31: Multi-Stage Filter (beam former)

template<typename T>
class sample_beamform
{
public:
  sample_beamform() : W3_(load_w3 (ROWS, COLS)) { }

  Matrix<T> operator () (const Matrix<T>& Input) const
  {
    return W3_ * Input;
  }

private:
  const Matrix<T> W3_;
};

Slide-32: Multi-Stage Filter (matched filter)

template<typename T>
class sample_matched_filter
{
public:
  sample_matched_filter()
    : W4_(load_w4 (ROWS, COLS)),
      forward_fft_ (Domain<2>(ROWS, COLS), 1.0),
      inverse_fft_ (Domain<2>(ROWS, COLS), 1.0)
  { }

  Matrix<T> operator () (const Matrix<T>& Input) const
  {
    return inverse_fft_ (forward_fft_ (Input) * W4_);
  }

private:
  const Matrix<T> W4_;
  FFT<Matrix<T>, complex<double>, complex<double>, FORWARD, 0, MULTIPLE, alg_hint()>
    forward_fft_;
  FFT<Matrix<T>, complex<double>, complex<double>, INVERSE, 0, MULTIPLE, alg_hint()>
    inverse_fft_;
};

Slide-33: Outline

• Introduction

• Software Standards

• Parallel VSIPL++

• Future Challenges
  – Fault Tolerance
  – Self Optimization
  – High Level Languages

• Summary

Slide-34: Dynamic Mapping for Fault Tolerance

[Figure: a parallel processor with a spare node and a node failure. The input task's map is switched from Map0 (nodes 0, 1) to Map1 (nodes 0, 2); the output task uses Map2 (nodes 1, 3). Data flows from XIN through the tasks to XOUT.]

• Switching processors is accomplished by switching maps
• No change to algorithm required
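A minimal sketch of the idea follows (written for this transcript; it is not the PVL or VSIPL++ API). The task owns a map, here just a list of node IDs, and fault recovery amounts to handing it a different map while the compute code stays unchanged.

// Illustrative only: dynamic mapping reduced to its essentials.
#include <cstdio>
#include <vector>

struct Map {
    std::vector<int> nodes;  // node IDs this task's data is distributed over
};

struct Task {
    Map map;
    void rebind(const Map& m) { map = m; }  // switch maps, nothing else
    void run_iteration() const {
        // Placeholder for the real computation; it only consults map.nodes and
        // never hard-codes node numbers the way the Slide-19 example did.
        std::printf("running on %zu nodes\n", map.nodes.size());
    }
};

int main() {
    Task input_task{Map{{0, 1}}};   // Map0: nodes 0, 1
    input_task.run_iteration();

    // Node 1 reported failed: switch to the spare by rebinding to Map1.
    input_task.rebind(Map{{0, 2}}); // Map1: nodes 0, 2
    input_task.run_iteration();     // same algorithm, new mapping
    return 0;
}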

Slide-35: Dynamic Mapping Performance Results

[Chart: relative time (log scale) vs. data size (32 to 2048) for three dynamic mapping strategies relative to a static baseline: Remap/Baseline, Redundant/Baseline, and Rebind/Baseline.]

• Good dynamic mapping performance is possible

Slide-36: Optimal Mapping of Complex Algorithms

[Figure: application pipeline of Input (XIN); Low Pass Filter (FIR1 with weights W1, FIR2 with weights W2); Beamform (multiply by weights W3); Matched Filter (FFT, multiply by W4, IFFT); Output (XOUT), mapped onto different hardware: workstation, embedded multi-computer, PowerPC cluster, embedded board, Intel cluster. Each hardware target has a different optimal map.]

• Need to automate process of mapping algorithm to hardware

Slide-37: Self-optimizing Software for Signal Processing

• Find
  – Min(latency | #CPU)
  – Max(throughput | #CPU)
• S3P selects correct optimal mapping (a brute-force sketch of this kind of mapping search appears after the chart summary below)
• Excellent agreement between S3P predicted and achieved latencies and throughputs

[Charts: latency (seconds) and throughput (frames/sec), predicted vs. achieved, as a function of #CPU (4 to 8) for a large (48x128K) and a small (48x4K) problem size; each point is labeled with the selected per-stage CPU allocation, e.g., 1-1-1-1, 1-1-2-1, 1-2-2-2, 1-3-2-2, 2-2-2-2.]
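The sketch referenced above is a brute-force version of the mapping search that S3P automates, written for this transcript and not taken from S3P itself: enumerate per-stage CPU allocations for a four-stage pipeline and keep the one with the best pipeline throughput. The per-stage throughput model is a placeholder, not measured data.

// Illustrative only: exhaustive search over per-stage CPU allocations.
#include <array>
#include <cstdio>

// Placeholder model: throughput of stage s when given c CPUs (frames/sec).
double stage_throughput(int s, int c) {
    static const double base[4] = {2.0, 0.8, 1.5, 1.0};  // made-up stage rates
    return base[s] * c;                                   // assume linear scaling
}

int main() {
    const int total_cpus = 8;
    std::array<int, 4> best{};
    double best_rate = 0.0;

    // Enumerate all allocations c0+c1+c2+c3 == total_cpus with each stage >= 1.
    for (int c0 = 1; c0 < total_cpus; ++c0)
      for (int c1 = 1; c0 + c1 < total_cpus; ++c1)
        for (int c2 = 1; c0 + c1 + c2 < total_cpus; ++c2) {
            int c3 = total_cpus - c0 - c1 - c2;
            std::array<int, 4> alloc = {c0, c1, c2, c3};
            double rate = 1e30;
            for (int s = 0; s < 4; ++s) {
                double t = stage_throughput(s, alloc[s]);
                if (t < rate) rate = t;  // pipeline is limited by its slowest stage
            }
            if (rate > best_rate) { best_rate = rate; best = alloc; }
        }

    std::printf("best mapping %d-%d-%d-%d, throughput %.2f frames/sec\n",
                best[0], best[1], best[2], best[3], best_rate);
    return 0;
}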

Slide-38: High Level Languages

[Figure: a Parallel Matlab Toolbox sits between the user interface and the hardware interface, supporting high performance Matlab applications (DoD sensor processing, DoD mission planning, scientific simulation, commercial applications) on parallel computing hardware.]

• Parallel Matlab need has been identified
  – HPCMO (OSU)
• Required user interface has been demonstrated
  – Matlab*P (MIT/LCS)
  – PVL (MIT/LL)
• Required hardware interface has been demonstrated
  – MatlabMPI (MIT/LL)

• Parallel Matlab Toolbox can now be realized

Slide-39: MatlabMPI deployment (speedup)

• Maui
  – Image filtering benchmark (300x on 304 CPUs)
• Lincoln
  – Signal processing (7.8x on 8 CPUs)
  – Radar simulations (7.5x on 8 CPUs)
  – Hyperspectral (2.9x on 3 CPUs)
• MIT
  – LCS Beowulf (11x Gflops on 9 duals)
  – AI Lab face recognition (10x on 8 duals)
• Other
  – Ohio St. EM simulations
  – ARL SAR image enhancement
  – Wash U hearing aid simulations
  – So. Ill. benchmarking
  – JHU digital beamforming
  – ISL radar simulation
  – URI heart modeling

[Chart: performance (Gigaflops) vs. number of processors (1 to 1000) for image filtering on the IBM SP at the Maui Computing Center, with MatlabMPI tracking the linear-speedup line.]

• Rapidly growing MatlabMPI user base demonstrates need for parallel Matlab
• Demonstrated scaling to 300 processors

Slide-40: Summary

• HPEC-SI Expected benefit

– Open software libraries, programming models, and standards that provide portability (3x), productivity (3x), and performance (1.5x) benefits to multiple DoD programs

• Invitation to Participate

– DoD Program offices with Signal/Image Processing needs

– Academic and Government Researchers interested in high performance embedded computing

– Contact: [email protected]

Slide-41: The Links

High Performance Embedded Computing Workshop: http://www.ll.mit.edu/HPEC
High Performance Embedded Computing Software Initiative: http://www.hpec-si.org/
Vector, Signal, and Image Processing Library: http://www.vsipl.org/
MPI Software Technologies, Inc.: http://www.mpi-softtech.com/
Data Reorganization Initiative: http://www.data-re.org/
CodeSourcery, LLC: http://www.codesourcery.com/
MatlabMPI: http://www.ll.mit.edu/MatlabMPI