Transcript: High Performance Embedded Computing Software Initiative (HPEC-SI), Dr. Jeremy Kepner, MIT Lincoln Laboratory (41 slides)

Slide-1: HPEC-SI

MITRE, AFRL, MIT Lincoln Laboratory

www.hpec-si.org

High Performance Embedded Computing Software Initiative (HPEC-SI)

This work is sponsored by the High Performance Computing Modernization Office under Air Force Contract F19628-00-C-0002. Opinions, interpretations, conclusions, and recommendations are those of the author and are not necessarily endorsed by the United States Government.

Dr. Jeremy Kepner, MIT Lincoln Laboratory

Slide-2: Outline

• Introduction
  – DoD Need
  – Program Structure

• Software Standards

• Parallel VSIPL++

• Future Challenges

• Summary

Slide-3: Overview - High Performance Embedded Computing (HPEC) Initiative

[Figure: the HPEC Software Initiative spans DARPA Applied Research, Development, and Demonstration programs. Example systems include the Enhanced Tactical Radar Correlator (ETRAC), ASARS-2, and the Common Imagery Processor (CIP), running on shared-memory servers and embedded multi-processors.]

Challenge: Transition advanced software technology and practices into major defense acquisition programs

Slide-4: Why Is DoD Concerned with Embedded Software?

[Chart: estimated DoD expenditures for embedded signal and image processing hardware and software, $0B to $3B. Source: “HPEC Market Study”, March 2001]

• COTS acquisition practices have shifted the burden from “point design” hardware to “point design” software

• Software costs for embedded systems could be reduced by one-third with improved programming models, methodologies, and standards

Slide-5: Issues with Current HPEC Development: Inadequacy of Software Practices & Standards

[Figure: DoD platforms relying on HPEC (NSSN, AEGIS, Rivet Joint, Standard Missile, Predator, Global Hawk, U-2, JSTARS, MSAT-Air, P-3/APS-137, F-16, MK-48 Torpedo) overlaid on system development/acquisition stages (program milestones roughly 4 years apart: System Tech. Development; System Field Demonstration; Engineering/Manufacturing Development; Insertion to Military Asset) and on signal processor evolution from 1st through 6th generation.]

Today – Embedded Software Is:
• Not portable
• Not scalable
• Difficult to develop
• Expensive to maintain

• High Performance Embedded Computing pervasive through DoD applications
  – Airborne radar insertion program: 85% software rewrite for each hardware platform
  – Missile common processor: processor board costs < $100k; software development costs > $100M
  – Torpedo upgrade: two software rewrites required after changes in hardware design

Slide-6: Evolution of Software Support Towards “Write Once, Run Anywhere/Anysize”

[Figure: evolution from 1990 (application tied directly to vendor software) through 2000 (applications layered on program-specific middleware) to 2005 (applications layered on middleware built on embedded software standards), contrasting DoD software development with COTS development.]

• Application software has traditionally been tied to the hardware

• Many acquisition programs are developing stove-piped middleware “standards”

• Open software standards can provide portability, performance, and productivity benefits

• Support “Write Once, Run Anywhere/Anysize”

Slide-7: Overall Initiative Goals & Impact

[Figure: HPEC Software Initiative cycle of Demonstrate, Develop, Prototype, built on interoperable and scalable object-oriented open standards, targeting Performance (1.5x), Portability (3x), and Productivity (3x).]

Program Goals
• Develop and integrate software technologies for embedded parallel systems to address portability, productivity, and performance
• Engage acquisition community to promote technology insertion
• Deliver quantifiable benefits

Metrics
• Portability: reduction in lines-of-code to change, port, or scale to a new system
• Productivity: reduction in overall lines-of-code
• Performance: computation and communication benchmarks

Slide-8: HPEC-SI Path to Success

Benefit to DoD Programs
• Reduces software cost & schedule
• Enables rapid COTS insertion
• Improves cross-program interoperability
• Basis for improved capabilities

Benefit to DoD Contractors
• Reduces software complexity & risk
• Easier comparisons/more competition
• Increased functionality

Benefit to Embedded Vendors
• Lower software barrier to entry
• Reduced software maintenance costs
• Evolution of open standards

HPEC Software Initiative builds on:
• Proven technology
• Business models
• Better software practices

Slide-9: Organization

Government Lead: Dr. Rich Linderman, AFRL

Executive Committee: Dr. Charles Holland, PADUSD(S+T); RADM Paul Sullivan, N77

Technical Advisory Board: Dr. Rich Linderman, AFRL; Dr. Richard Games, MITRE; Mr. John Grosh, OSD; Mr. Bob Graybill, DARPA/ITO; Dr. Keith Bromley, SPAWAR; Dr. Mark Richards, GTRI; Dr. Jeremy Kepner, MIT/LL

Demonstration: Dr. Keith Bromley, SPAWAR; Dr. Richard Games, MITRE; Dr. Jeremy Kepner, MIT/LL; Mr. Brian Sroka, MITRE; Mr. Ron Williams, MITRE; ...

Development: Dr. James Lebak, MIT/LL; Dr. Mark Richards, GTRI; Mr. Dan Campbell, GTRI; Mr. Ken Cain, MERCURY; Mr. Randy Judd, SPAWAR; ...

Applied Research: Mr. Bob Bond, MIT/LL; Mr. Ken Flowers, MERCURY; Dr. Spaanenburg, PENTUM; Mr. Dennis Cottel, SPAWAR; Capt. Bergmann, AFRL; Dr. Tony Skjellum, MPISoft; ...

Advanced Research: Mr. Bob Graybill, DARPA

• Partnership with ODUSD(S&T), Government Labs, FFRDCs, Universities, Contractors, Vendors, and DoD programs
• Over 100 participants from over 20 organizations

Slide-10: Outline

• Introduction

• Software Standards
  – Standards Overview
  – Future Standards

• Parallel VSIPL++

• Future Challenges

• Summary

Slide-11: Emergence of Component Standards

[Figure: a parallel embedded processor (node controller, processors P0–P3, system controller, consoles, and other computers) with standards at each layer: control communication (CORBA, HP-CORBA); data communication (MPI, MPI/RT, DRI); computation (VSIPL, VSIPL++, ||VSIPL++).]

Definitions
• VSIPL = Vector, Signal, and Image Processing Library
• ||VSIPL++ = Parallel Object-Oriented VSIPL
• MPI = Message Passing Interface
• MPI/RT = MPI Real-Time
• DRI = Data Reorganization Interface
• CORBA = Common Object Request Broker Architecture
• HP-CORBA = High Performance CORBA

HPEC Initiative builds on completed research and existing standards and libraries.
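The data-communication layer named above is the plain MPI standard. As a point of reference only (this example is written for this transcript, not taken from the briefing), a minimal MPI point-to-point exchange of a block of samples looks like this:

// Illustrative only: the kind of message passing the "data communication"
// layer standardizes. Process 0 sends a block of samples to process 1.
// Compile and launch with an MPI toolchain (e.g., mpicxx / mpirun -np 2).
#include <mpi.h>
#include <vector>
#include <cstdio>

int main(int argc, char** argv) {
    MPI_Init(&argc, &argv);

    int rank = 0;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    const int N = 1024;
    std::vector<float> samples(N, 0.0f);

    if (rank == 0) {
        // Producer: fill the block and send it to rank 1.
        for (int i = 0; i < N; ++i) samples[i] = static_cast<float>(i);
        MPI_Send(samples.data(), N, MPI_FLOAT, 1, 0, MPI_COMM_WORLD);
    } else if (rank == 1) {
        // Consumer: receive the block from rank 0.
        MPI_Recv(samples.data(), N, MPI_FLOAT, 0, 0, MPI_COMM_WORLD,
                 MPI_STATUS_IGNORE);
        std::printf("rank 1 received %d samples\n", N);
    }

    MPI_Finalize();
    return 0;
}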

Slide-12: The Path to Parallel VSIPL++

[Figure: functionality vs. time across three phases, showing VSIPL and MPI leading to a VSIPL++ prototype and standard, then a Parallel VSIPL++ prototype and standard (the world's first parallel object-oriented standard).]

Phase 1
• Demonstration: Existing Standards (demonstrate insertions into fielded systems, e.g., CIP; demonstrate 3x portability)
• Development: Object-Oriented Standards (high-level code abstraction; reduce code size 3x)
• Applied Research: Unified Computation/Communication Library (unified embedded computation/communication standard; demonstrate scalability)

Phase 2
• Demonstration: Object-Oriented Standards
• Development: Unified Computation/Communication Library
• Applied Research: Fault Tolerance

Phase 3
• Demonstration: Unified Computation/Communication Library
• Development: Fault Tolerance
• Applied Research: Self-Optimization

Status
• First demo successfully completed
• VSIPL++ v0.5 spec completed
• VSIPL++ v0.1 code available
• Parallel VSIPL++ spec in progress
• High performance C++ demonstrated

Slide-13: Working Group Technical Scope

Development: VSIPL++
• Mapping (data parallelism)
• Early binding (computations)
• Compatibility (backward/forward)
• Local knowledge (accessing local data)
• Extensibility (adding new functions)
• Remote procedure calls (CORBA)
• C++ compiler support
• Test suite
• Adoption incentives (vendor, integrator)

Applied Research: Parallel VSIPL++
• Mapping (task/pipeline parallel)
• Reconfiguration (for fault tolerance)
• Threads
• Reliability/availability
• Data permutation (DRI functionality)
• Tools (profiles, timers, ...)
• Quality of service

Slide-14: Overall Technical Tasks and Schedule

[Gantt chart: tasks spanning FY01–FY08 (near, mid, and long term). VSIPL (Vector, Signal, and Image Processing Library); MPI (Message Passing Interface); VSIPL++ (object-oriented): v0.1 spec, v0.1 code, v0.5 spec & code, v1.0 spec & code; Parallel VSIPL++: v0.1 spec, v0.1 code, v0.5 spec & code, v1.0 spec & code; fault-tolerant/self-optimizing software. Applied Research feeds Development, which feeds Demonstrations: CIP demos, then Demos 2 through 6.]

Slide-15: HPEC-SI Goals vs. 1st Demo Achievements

[Figure: HPEC Software Initiative cycle of Demonstrate, Develop, Prototype, built on interoperable and scalable object-oriented open standards.]

• Performance: goal 1.5x, achieved 2x (3x reduced cost or form factor)
• Portability: goal 3x, achieved 10x+ (zero code changes required)
• Productivity: goal 3x, achieved 6x* (DRI code 6x smaller vs. MPI; *estimated)

Slide-16: Outline

• Introduction

• Software Standards

• Parallel VSIPL++
  – Technical Basis
  – Examples

• Future Challenges

• Summary

Slide-17: Parallel Pipeline

Signal processing algorithm (three pipeline stages):
• Filter: XOUT = FIR(XIN)
• Beamform: XOUT = w * XIN
• Detect: XOUT = |XIN| > c

[Figure: each stage of the algorithm is mapped onto a group of processors of a parallel computer.]

• Data parallel within stages
• Task/pipeline parallel across stages
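To make the three stages concrete, here is a minimal serial C++ sketch written for this transcript (it is not PVL or VSIPL++ code; the FIR taps, weights, and threshold are placeholder values, and the beamform stage is reduced to an element-wise weighting). The parallel versions distribute these same loops across processors.

// Illustrative only: serial rendering of the Filter -> Beamform -> Detect pipeline.
#include <complex>
#include <vector>
#include <cmath>

using cfloat = std::complex<float>;

// Filter: XOUT = FIR(XIN), a direct-form FIR convolution.
std::vector<cfloat> fir(const std::vector<cfloat>& x, const std::vector<cfloat>& h) {
    std::vector<cfloat> y(x.size(), cfloat(0, 0));
    for (std::size_t n = 0; n < x.size(); ++n)
        for (std::size_t k = 0; k < h.size() && k <= n; ++k)
            y[n] += h[k] * x[n - k];
    return y;
}

// Beamform: XOUT = w * XIN, here an element-wise weighting of the samples.
std::vector<cfloat> beamform(const std::vector<cfloat>& x, const std::vector<cfloat>& w) {
    std::vector<cfloat> y(x.size());
    for (std::size_t n = 0; n < x.size(); ++n) y[n] = w[n] * x[n];
    return y;
}

// Detect: XOUT = |XIN| > c, a magnitude threshold test.
std::vector<bool> detect(const std::vector<cfloat>& x, float c) {
    std::vector<bool> hits(x.size());
    for (std::size_t n = 0; n < x.size(); ++n) hits[n] = std::abs(x[n]) > c;
    return hits;
}

int main() {
    std::vector<cfloat> xin(4096, cfloat(1, 0));         // placeholder input
    std::vector<cfloat> taps(16, cfloat(1.0f / 16, 0));  // placeholder FIR taps
    std::vector<cfloat> w(xin.size(), cfloat(0.5f, 0));  // placeholder weights
    auto hits = detect(beamform(fir(xin, taps), w), 0.1f); // the three-stage pipeline
    return hits.empty() ? 1 : 0;
}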

Slide-18: Types of Parallelism

[Figure: input feeds a bank of FIR filters, then two beamformers, then a scheduler feeding two detectors, illustrating data parallelism (within the FIR filters), pipeline parallelism (across stages), task parallelism (beamformer 1 and beamformer 2), and round-robin scheduling (across the detectors).]
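As a rough illustration of the data-parallel case only (written for this transcript, not from the briefing), the sketch below splits the rows of a data cube across worker threads running the same stage; the per-row routine is a placeholder, not a real FIR filter. Task or pipeline parallelism would instead assign whole stages to different processors.

// Illustrative only: data parallelism across rows within one stage.
#include <cstddef>
#include <thread>
#include <vector>

// Placeholder per-row work standing in for an FIR filter stage.
void fir_row(std::vector<float>& row) {
    for (std::size_t i = 1; i < row.size(); ++i) row[i] = 0.5f * (row[i] + row[i - 1]);
}

int main() {
    const std::size_t rows = 64, cols = 4096, workers = 4;
    std::vector<std::vector<float>> data(rows, std::vector<float>(cols, 1.0f));

    std::vector<std::thread> pool;
    for (std::size_t w = 0; w < workers; ++w) {
        pool.emplace_back([&, w] {
            // Each worker processes an interleaved subset of rows
            // (a round-robin assignment, analogous to the scheduler in the figure).
            for (std::size_t r = w; r < rows; r += workers) fir_row(data[r]);
        });
    }
    for (auto& t : pool) t.join();
    return 0;
}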

Slide-19: Current Approach to Parallel Code

• Algorithm and hardware mapping are linked
• Resulting code is non-scalable and non-portable

Algorithm + mapping code for a four-processor mapping (Stage 1 on processors 1 and 2, Stage 2 on processors 3 and 4):

while (!done) {
  if (rank() == 1 || rank() == 2)
    stage1();
  else if (rank() == 3 || rank() == 4)
    stage2();
}

Scaling Stage 2 to four processors (adding processors 5 and 6) requires editing the code:

while (!done) {
  if (rank() == 1 || rank() == 2)
    stage1();
  else if (rank() == 3 || rank() == 4 ||
           rank() == 5 || rank() == 6)
    stage2();
}

Slide-20: Scalable Approach – Lincoln Parallel Vector Library (PVL)

Single-processor and multi-processor mappings of A = B + C use the same source:

#include <Vector.h>
#include <AddPvl.h>

void addVectors(const Map& aMap, const Map& bMap, const Map& cMap) {
  Vector< Complex<Float> > a('a', aMap, LENGTH);
  Vector< Complex<Float> > b('b', bMap, LENGTH);
  Vector< Complex<Float> > c('c', cMap, LENGTH);

  b = 1;
  c = 2;
  a = b + c;
}

• Single-processor and multi-processor code are the same
• Maps can be changed without changing software
• High-level code is compact
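The point of the slide is that only the map arguments change between runs. A hypothetical driver is sketched below; the Map constructor arguments are invented for illustration and do not reflect the actual PVL interface.

// Hypothetical driver, for illustration only: the same addVectors() is called
// with different Map objects, so the computation code never changes.
// The Map construction shown here is made up; the real PVL API differs.
int main() {
  Map serialMap({0});          // all three vectors on processor 0
  addVectors(serialMap, serialMap, serialMap);

  Map blockMap({0, 1, 2, 3});  // vectors distributed over four processors
  addVectors(blockMap, blockMap, blockMap);
  return 0;
}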

Slide-21: C++ Expression Templates and PETE

Expression: A = B + C * D

Expression template type (parse tree):
BinaryNode<OpAssign, Vector,
  BinaryNode<OpAdd, Vector,
    BinaryNode<OpMultiply, Vector, Vector> > >

[Figure: evaluation of A = B + C. 1. Pass B and C references to operator+; 2. create expression parse tree; 3. return expression parse tree; 4. pass expression tree reference to operator=; 5. calculate result and perform assignment. Parse trees, not vectors, are created.]

• Expression templates enhance performance by allowing temporary variables to be avoided
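A stripped-down sketch of the expression-template technique follows; it is written for this transcript and is not PETE or VSIPL++ source. The overloaded operators build the parse-tree type, and assignment collapses it into a single element-wise loop with no temporary vectors.

// Illustrative only: minimal expression templates for A = B + C * D.
#include <cstddef>
#include <vector>

struct Vec;  // forward declaration

// Node representing elementwise (Left op Right); Op supplies apply().
template <class Op, class L, class R>
struct BinaryNode {
    const L& l;
    const R& r;
    double operator[](std::size_t i) const { return Op::apply(l[i], r[i]); }
};

struct OpAdd      { static double apply(double a, double b) { return a + b; } };
struct OpMultiply { static double apply(double a, double b) { return a * b; } };

struct Vec {
    std::vector<double> data;
    explicit Vec(std::size_t n, double v = 0.0) : data(n, v) {}
    double operator[](std::size_t i) const { return data[i]; }
    double& operator[](std::size_t i) { return data[i]; }
    std::size_t size() const { return data.size(); }

    // Assignment from any expression: the single loop that replaces temporaries.
    template <class Expr>
    Vec& operator=(const Expr& e) {
        for (std::size_t i = 0; i < size(); ++i) data[i] = e[i];
        return *this;
    }
};

template <class L, class R>
BinaryNode<OpAdd, L, R> operator+(const L& l, const R& r) { return {l, r}; }

template <class L, class R>
BinaryNode<OpMultiply, L, R> operator*(const L& l, const R& r) { return {l, r}; }

int main() {
    Vec A(8), B(8, 1.0), C(8, 2.0), D(8, 3.0);
    A = B + C * D;  // builds BinaryNode<OpAdd, Vec, BinaryNode<OpMultiply, Vec, Vec>>
    return A[0] == 7.0 ? 0 : 1;
}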

Slide-22: PETE Linux Cluster Experiments

[Charts: relative execution time vs. vector length (32 to 131072) for three expressions (A=B+C, A=B+C*D, and A=B+C*D/E+fft(F)), comparing VSIPL, PVL/VSIPL, and PVL/PETE implementations; relative times range from roughly 0.6 to 1.3.]

• PVL with VSIPL has a small overhead
• PVL with PETE can surpass VSIPL

Slide-23: PowerPC AltiVec Experiments

Expressions benchmarked: A=B+C, A=B+C*D, A=B+C*D+E*F, A=B+C*D+E/F

Software technologies compared:
• AltiVec loop: C; for loop; direct use of AltiVec extensions; assumes unit stride; assumes vector alignment
• VSIPL (vendor optimized): C; AltiVec-aware VSIPro Core Lite (www.mpi-softtech.com); no multiply-add; cannot assume unit stride; cannot assume vector alignment
• PETE with AltiVec: C++; PETE operators; indirect use of AltiVec extensions; assumes unit stride; assumes vector alignment

Results
• Hand-coded loop achieves good performance, but is problem specific and low level
• Optimized VSIPL performs well for simple expressions, worse for more complex expressions
• PETE-style array operators perform almost as well as the hand-coded loop and are general, can be composed, and are high level
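For reference, the "hand-coded loop" style amounts to something like the fragment below (an illustration written for this transcript, not the benchmark code). A vendor AltiVec version would replace the scalar loop body with 4-wide vector intrinsics, including a fused multiply-add, which is exactly where the problem-specific assumptions about stride and alignment enter.

// Illustrative only: hand-coded scalar loop for A = B + C * D on float arrays.
#include <cstddef>

void hand_coded(float* A, const float* B, const float* C, const float* D,
                std::size_t n) {
    for (std::size_t i = 0; i < n; ++i)
        A[i] = B[i] + C[i] * D[i];  // one multiply-add per element
}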

Slide-24: Outline

• Introduction

• Software Standards

• Parallel VSIPL++
  – Technical Basis
  – Examples

• Future Challenges

• Summary

Slide-25

A = sin(A) + 2 * B;

• Generated code (no temporaries):

for (index i = 0; i < A.size(); ++i)
  A.put(i, sin(A.get(i)) + 2 * B.get(i));

• Apply inlining to transform to:

for (index i = 0; i < A.size(); ++i)
  Ablock[i] = sin(Ablock[i]) + 2 * Bblock[i];

• Apply more inlining to transform to:

T* Bp = &(Bblock[0]);
T* Aend = &(Ablock[A.size()]);
for (T* Ap = &(Ablock[0]); Ap < Aend; ++Ap, ++Bp)
  *Ap = fmadd (2, *Bp, sin(*Ap));

• Or apply PowerPC AltiVec extensions

• Each step can be automatically generated
• Optimization level is whatever the vendor desires

Slide-26: BLAS zherk Routine

• BLAS = Basic Linear Algebra Subprograms
• Hermitian matrix M: conjug(M) = M^t
• zherk performs a rank-k update of Hermitian matrix C:
  C <- alpha * A * conjug(A)^t + beta * C

• VSIPL code:

A = vsip_cmcreate_d(10, 15, VSIP_ROW, MEM_NONE);
C = vsip_cmcreate_d(10, 10, VSIP_ROW, MEM_NONE);
tmp = vsip_cmcreate_d(10, 10, VSIP_ROW, MEM_NONE);
vsip_cmprodh_d(A, A, tmp);        /* A * conjug(A)^t */
vsip_rscmmul_d(alpha, tmp, tmp);  /* alpha * A * conjug(A)^t */
vsip_rscmmul_d(beta, C, C);       /* beta * C */
vsip_cmadd_d(tmp, C, C);          /* alpha * A * conjug(A)^t + beta * C */
vsip_cblockdestroy(vsip_cmdestroy_d(tmp));
vsip_cblockdestroy(vsip_cmdestroy_d(C));
vsip_cblockdestroy(vsip_cmdestroy_d(A));

• VSIPL++ code (also parallel):

Matrix<complex<double> > A(10, 15);
Matrix<complex<double> > C(10, 10);
C = alpha * prodh(A, A) + beta * C;
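To make the math concrete, here is a plain-C++ reference for what zherk computes, written for this transcript; it is not the BLAS, VSIPL, or VSIPL++ implementation (in particular, the real zherk updates only one triangle of C).

// Illustrative only: C <- alpha * A * conjug(A)^t + beta * C,
// with A of size n x k and C of size n x n.
#include <complex>
#include <vector>

using cd = std::complex<double>;
using Mat = std::vector<std::vector<cd>>;  // row-major dense matrix

void zherk_reference(double alpha, const Mat& A, double beta, Mat& C) {
    const std::size_t n = A.size();     // rows of A and C
    const std::size_t k = A[0].size();  // columns of A
    for (std::size_t i = 0; i < n; ++i) {
        for (std::size_t j = 0; j < n; ++j) {
            cd sum(0.0, 0.0);
            for (std::size_t p = 0; p < k; ++p)
                sum += A[i][p] * std::conj(A[j][p]);  // (A * conjug(A)^t)(i,j)
            C[i][j] = alpha * sum + beta * C[i][j];
        }
    }
}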

Slide-27: Simple Filtering Application

int main ()
{
  using namespace vsip;
  const length ROWS = 64;
  const length COLS = 4096;
  vsipl v;

  FFT<Matrix, complex<double>, complex<double>, FORWARD, 0, MULTIPLE, alg_hint ()>
    forward_fft (Domain<2>(ROWS, COLS), 1.0);
  FFT<Matrix, complex<double>, complex<double>, INVERSE, 0, MULTIPLE, alg_hint ()>
    inverse_fft (Domain<2>(ROWS, COLS), 1.0);

  const Matrix<complex<double> > weights (load_weights (ROWS, COLS));

  try {
    while (1)
      output (inverse_fft (forward_fft (input ()) * weights));
  }
  catch (std::runtime_error) {
    // Successfully caught access outside domain.
  }
}

Slide-28: Explicit Parallel Filter

#include <vsiplpp.h>
using namespace VSIPL;

const int ROWS = 64;
const int COLS = 4096;

int main (int argc, char **argv)
{
  Matrix<Complex<Float> > W (ROWS, COLS, "WMap"); // weights matrix
  Matrix<Complex<Float> > X (ROWS, COLS, "WMap"); // input matrix
  Matrix<Complex<Float> > Y (ROWS, COLS, "WMap"); // output matrix
  load_weights (W);

  try {
    while (1) {
      input (X);                    // some input function
      Y = IFFT (mul (FFT (X), W));
      output (Y);                   // some output function
    }
  }
  catch (Exception &e) { cerr << e << endl; }
}

Slide-29: Multi-Stage Filter (main)

using namespace vsip;

const length ROWS = 64;
const length COLS = 4096;

int main (int argc, char **argv)
{
  sample_low_pass_filter<complex<float> > LPF;
  sample_beamform<complex<float> > BF;
  sample_matched_filter<complex<float> > MF;

  try {
    while (1)
      output (MF (BF (LPF (input ()))));
  }
  catch (std::runtime_error) {
    // Successfully caught access outside domain.
  }
}

Slide-30: Multi-Stage Filter (low pass filter)

template<typename T>
class sample_low_pass_filter
{
public:
  sample_low_pass_filter()
    : FIR1_(load_w1 (W1_LENGTH), FIR1_LENGTH),
      FIR2_(load_w2 (W2_LENGTH), FIR2_LENGTH)
  { }

  Matrix<T> operator () (const Matrix<T>& Input)
  {
    Matrix<T> output (ROWS, COLS);
    for (index row = 0; row < ROWS; row++)
      output.row (row) = FIR2_(FIR1_(Input.row (row)).second).second;
    return output;
  }

private:
  FIR<T, SYMMETRIC_ODD, FIR1_DECIMATION, CONTINUOUS, alg_hint()> FIR1_;
  FIR<T, SYMMETRIC_ODD, FIR2_DECIMATION, CONTINUOUS, alg_hint()> FIR2_;
};

Slide-31: Multi-Stage Filter (beam former)

template<typename T>
class sample_beamform
{
public:
  sample_beamform() : W3_(load_w3 (ROWS, COLS)) { }

  Matrix<T> operator () (const Matrix<T>& Input) const
  {
    return W3_ * Input;
  }

private:
  const Matrix<T> W3_;
};

Slide-32: Multi-Stage Filter (matched filter)

template<typename T>
class sample_matched_filter
{
public:
  sample_matched_filter()
    : W4_(load_w4 (ROWS, COLS)),
      forward_fft_ (Domain<2>(ROWS, COLS), 1.0),
      inverse_fft_ (Domain<2>(ROWS, COLS), 1.0)
  { }

  Matrix<T> operator () (const Matrix<T>& Input) const
  {
    return inverse_fft_ (forward_fft_ (Input) * W4_);
  }

private:
  const Matrix<T> W4_;
  FFT<Matrix<T>, complex<double>, complex<double>, FORWARD, 0, MULTIPLE, alg_hint()>
    forward_fft_;
  FFT<Matrix<T>, complex<double>, complex<double>, INVERSE, 0, MULTIPLE, alg_hint()>
    inverse_fft_;
};

Slide-33: Outline

• Introduction

• Software Standards

• Parallel VSIPL++

• Future Challenges
  – Fault Tolerance
  – Self Optimization
  – High Level Languages

• Summary

Slide-34: Dynamic Mapping for Fault Tolerance

[Figure: a parallel processor with a spare node and a node failure. The input task's map is switched from Map0 (nodes 0, 1) to Map1 (nodes 0, 2); the output task uses Map2 (nodes 1, 3). Data flows from XIN through the tasks to XOUT.]

• Switching processors is accomplished by switching maps
• No change to algorithm required
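A minimal sketch of the idea follows (written for this transcript; it is not the PVL or VSIPL++ API). The task owns a map, here just a list of node IDs, and fault recovery amounts to handing it a different map while the compute code stays unchanged.

// Illustrative only: dynamic mapping reduced to its essentials.
#include <cstdio>
#include <vector>

struct Map {
    std::vector<int> nodes;  // node IDs this task's data is distributed over
};

struct Task {
    Map map;
    void rebind(const Map& m) { map = m; }  // switch maps, nothing else
    void run_iteration() const {
        // Placeholder for the real computation; it only consults map.nodes and
        // never hard-codes node numbers the way the Slide-19 example did.
        std::printf("running on %zu nodes\n", map.nodes.size());
    }
};

int main() {
    Task input_task{Map{{0, 1}}};   // Map0: nodes 0, 1
    input_task.run_iteration();

    // Node 1 reported failed: switch to the spare by rebinding to Map1.
    input_task.rebind(Map{{0, 2}}); // Map1: nodes 0, 2
    input_task.run_iteration();     // same algorithm, new mapping
    return 0;
}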

Slide-35: Dynamic Mapping Performance Results

[Chart: relative time (log scale) vs. data size (32 to 2048) for three dynamic mapping strategies relative to a static baseline: Remap/Baseline, Redundant/Baseline, and Rebind/Baseline.]

• Good dynamic mapping performance is possible

Slide-36: Optimal Mapping of Complex Algorithms

[Figure: application pipeline of Input (XIN); Low Pass Filter (FIR1 with weights W1, FIR2 with weights W2); Beamform (multiply by weights W3); Matched Filter (FFT, multiply by W4, IFFT); Output (XOUT), mapped onto different hardware: workstation, embedded multi-computer, PowerPC cluster, embedded board, Intel cluster. Each hardware target has a different optimal map.]

• Need to automate process of mapping algorithm to hardware

Slide-37: Self-optimizing Software for Signal Processing

• Find
  – Min(latency | #CPU)
  – Max(throughput | #CPU)
• S3P selects correct optimal mapping (a brute-force sketch of this kind of mapping search appears after the chart summary below)
• Excellent agreement between S3P predicted and achieved latencies and throughputs

[Charts: latency (seconds) and throughput (frames/sec), predicted vs. achieved, as a function of #CPU (4 to 8) for a large (48x128K) and a small (48x4K) problem size; each point is labeled with the selected per-stage CPU allocation, e.g., 1-1-1-1, 1-1-2-1, 1-2-2-2, 1-3-2-2, 2-2-2-2.]
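The sketch referenced above is a brute-force version of the mapping search that S3P automates, written for this transcript and not taken from S3P itself: enumerate per-stage CPU allocations for a four-stage pipeline and keep the one with the best pipeline throughput. The per-stage throughput model is a placeholder, not measured data.

// Illustrative only: exhaustive search over per-stage CPU allocations.
#include <array>
#include <cstdio>

// Placeholder model: throughput of stage s when given c CPUs (frames/sec).
double stage_throughput(int s, int c) {
    static const double base[4] = {2.0, 0.8, 1.5, 1.0};  // made-up stage rates
    return base[s] * c;                                   // assume linear scaling
}

int main() {
    const int total_cpus = 8;
    std::array<int, 4> best{};
    double best_rate = 0.0;

    // Enumerate all allocations c0+c1+c2+c3 == total_cpus with each stage >= 1.
    for (int c0 = 1; c0 < total_cpus; ++c0)
      for (int c1 = 1; c0 + c1 < total_cpus; ++c1)
        for (int c2 = 1; c0 + c1 + c2 < total_cpus; ++c2) {
            int c3 = total_cpus - c0 - c1 - c2;
            std::array<int, 4> alloc = {c0, c1, c2, c3};
            double rate = 1e30;
            for (int s = 0; s < 4; ++s) {
                double t = stage_throughput(s, alloc[s]);
                if (t < rate) rate = t;  // pipeline is limited by its slowest stage
            }
            if (rate > best_rate) { best_rate = rate; best = alloc; }
        }

    std::printf("best mapping %d-%d-%d-%d, throughput %.2f frames/sec\n",
                best[0], best[1], best[2], best[3], best_rate);
    return 0;
}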

Slide-38: High Level Languages

[Figure: a Parallel Matlab Toolbox sits between the user interface and the hardware interface, supporting high performance Matlab applications (DoD sensor processing, DoD mission planning, scientific simulation, commercial applications) on parallel computing hardware.]

• Parallel Matlab need has been identified
  – HPCMO (OSU)
• Required user interface has been demonstrated
  – Matlab*P (MIT/LCS)
  – PVL (MIT/LL)
• Required hardware interface has been demonstrated
  – MatlabMPI (MIT/LL)

• Parallel Matlab Toolbox can now be realized

Slide-39: MatlabMPI deployment (speedup)

• Maui
  – Image filtering benchmark (300x on 304 CPUs)
• Lincoln
  – Signal processing (7.8x on 8 CPUs)
  – Radar simulations (7.5x on 8 CPUs)
  – Hyperspectral (2.9x on 3 CPUs)
• MIT
  – LCS Beowulf (11x Gflops on 9 duals)
  – AI Lab face recognition (10x on 8 duals)
• Other
  – Ohio St. EM simulations
  – ARL SAR image enhancement
  – Wash U hearing aid simulations
  – So. Ill. benchmarking
  – JHU digital beamforming
  – ISL radar simulation
  – URI heart modeling

[Chart: performance (Gigaflops) vs. number of processors (1 to 1000) for image filtering on the IBM SP at the Maui Computing Center, with MatlabMPI tracking the linear-speedup line.]

• Rapidly growing MatlabMPI user base demonstrates need for parallel Matlab
• Demonstrated scaling to 300 processors

Slide-40: Summary

• HPEC-SI Expected benefit

– Open software libraries, programming models, and standards that provide portability (3x), productivity (3x), and performance (1.5x) benefits to multiple DoD programs

• Invitation to Participate

– DoD Program offices with Signal/Image Processing needs

– Academic and Government Researchers interested in high performance embedded computing

– Contact: [email protected]

Slide-41: The Links

High Performance Embedded Computing Workshop: http://www.ll.mit.edu/HPEC
High Performance Embedded Computing Software Initiative: http://www.hpec-si.org/
Vector, Signal, and Image Processing Library: http://www.vsipl.org/
MPI Software Technologies, Inc.: http://www.mpi-softtech.com/
Data Reorganization Initiative: http://www.data-re.org/
CodeSourcery, LLC: http://www.codesourcery.com/
MatlabMPI: http://www.ll.mit.edu/MatlabMPI