Slide-1
High Performance Embedded Computing Software Initiative (HPEC-SI)
Dr. Jeremy Kepner, MIT Lincoln Laboratory
MITRE / AFRL / MIT Lincoln Laboratory
www.hpec-si.org
This work is sponsored by the High Performance Computing Modernization Office under Air Force Contract F19628-00-C-0002. Opinions, interpretations, conclusions, and recommendations are those of the author and are not necessarily endorsed by the United States Government.
Slide-2
Outline
• Introduction
  – DoD Need
  – Program Structure
• Software Standards
• Parallel VSIPL++
• Future Challenges
• Summary
Slide-3
Overview: High Performance Embedded Computing (HPEC) Initiative
[Figure: HPEC Software Initiative program structure spanning Applied Research, Development, and Demonstration; DARPA programs; example systems include the Enhanced Tactical Radar Correlator (ETRAC) on a shared-memory server, ASARS-2, and the Common Imagery Processor (CIP) on an embedded multi-processor]
Challenge: Transition advanced software technology and practices into major defense acquisition programs
Slide-4
Why Is DoD Concerned with Embedded Software?
[Figure: estimated DoD expenditures for embedded signal and image processing hardware and software, $0-$3B. Source: "HPEC Market Study," March 2001]
• COTS acquisition practices have shifted the burden from "point design" hardware to "point design" software
• Software costs for embedded systems could be reduced by one-third with improved programming models, methodologies, and standards
Slide-5
Issues with Current HPEC Development: Inadequacy of Software Practices & Standards
[Figure: DoD systems using HPEC, including NSSN, AEGIS, Rivet Joint, Standard Missile, Predator, Global Hawk, U-2, JSTARS, MSAT-Air, P-3/APS-137, F-16, and the MK-48 torpedo. Timeline: system development/acquisition stages of 4 years each (System Tech. Development; System Field Demonstration; Engineering/Manufacturing Development; Insertion to Military Asset) against six generations of signal processor evolution]
Today, embedded software is:
• Not portable
• Not scalable
• Difficult to develop
• Expensive to maintain
High Performance Embedded Computing is pervasive through DoD applications:
– Airborne radar insertion program: 85% software rewrite for each hardware platform
– Missile common processor: processor board costs < $100k; software development costs > $100M
– Torpedo upgrade: two software rewrites required after changes in hardware design
Slide-6
Evolution of Software Support Towards "Write Once, Run Anywhere/Anysize"
[Figure: DoD and COTS software development from 1990 to 2000 to 2005: applications tied directly to vendor software evolve into applications layered on middleware, and finally on middleware over embedded software standards]
• Application software has traditionally been tied to the hardware
• Many acquisition programs are developing stove-piped middleware "standards"
• Open software standards can provide portability, performance, and productivity benefits
• Goal: support "Write Once, Run Anywhere/Anysize"
Slide-7
Overall Initiative Goals & Impact
[Figure: HPEC Software Initiative cycle of Demonstrate, Develop, Prototype built on Object Oriented, Open Standards, Interoperable & Scalable technology]
Program goals:
• Develop and integrate software technologies for embedded parallel systems to address portability, productivity, and performance
• Engage acquisition community to promote technology insertion
• Deliver quantifiable benefits
Metrics:
• Portability (3x): reduction in lines-of-code to port/scale to a new system
• Productivity (3x): reduction in overall lines-of-code
• Performance (1.5x): computation and communication benchmarks
Slide-8
HPEC-SI Path to Success
Benefit to DoD Programs
• Reduces software cost & schedule
• Enables rapid COTS insertion
• Improves cross-program interoperability
• Basis for improved capabilities
Benefit to DoD Contractors
• Reduces software complexity & risk
• Easier comparisons/more competition
• Increased functionality
Benefit to Embedded Vendors
• Lower software barrier to entry
• Reduced software maintenance costs
• Evolution of open standards
The HPEC Software Initiative builds on:
• Proven technology
• Business models
• Better software practices
Slide-9
Organization
Government Lead: Dr. Rich Linderman, AFRL
Executive Committee: Dr. Charles Holland, PADUSD(S&T); RADM Paul Sullivan, N77
Technical Advisory Board: Dr. Rich Linderman, AFRL; Dr. Richard Games, MITRE; Mr. John Grosh, OSD; Mr. Bob Graybill, DARPA/ITO; Dr. Keith Bromley, SPAWAR; Dr. Mark Richards, GTRI; Dr. Jeremy Kepner, MIT/LL
Demonstration: Dr. Keith Bromley, SPAWAR; Dr. Richard Games, MITRE; Dr. Jeremy Kepner, MIT/LL; Mr. Brian Sroka, MITRE; Mr. Ron Williams, MITRE; ...
Development: Dr. James Lebak, MIT/LL; Dr. Mark Richards, GTRI; Mr. Dan Campbell, GTRI; Mr. Ken Cain, MERCURY; Mr. Randy Judd, SPAWAR; ...
Applied Research: Mr. Bob Bond, MIT/LL; Mr. Ken Flowers, MERCURY; Dr. Spaanenburg, PENTUM; Mr. Dennis Cottel, SPAWAR; Capt. Bergmann, AFRL; Dr. Tony Skjellum, MPISoft; ...
Advanced Research: Mr. Bob Graybill, DARPA
• Partnership with ODUSD(S&T), Government Labs, FFRDCs, Universities, Contractors, Vendors and DoD programs
• Over 100 participants from over 20 organizations
Slide-10
Outline
• Introduction
• Software Standards
  – Standards Overview
  – Future Standards
• Parallel VSIPL++
• Future Challenges
• Summary
Slide-11
Emergence of Component Standards
[Figure: a parallel embedded processor (nodes P0-P3 with a node controller) and a system controller connected to consoles and other computers. Control communication: CORBA, HP-CORBA. Data communication: MPI, MPI/RT, DRI. Computation: VSIPL, VSIPL++, ||VSIPL++]
Definitions:
• VSIPL = Vector, Signal, and Image Processing Library
• ||VSIPL++ = Parallel Object Oriented VSIPL
• MPI = Message Passing Interface
• MPI/RT = MPI Real-Time
• DRI = Data Reorganization Interface
• CORBA = Common Object Request Broker Architecture
• HP-CORBA = High Performance CORBA
The HPEC Initiative builds on completed research and existing standards and libraries.
Slide-12
The Path to Parallel VSIPL++
Phase 1: Demonstration of existing standards (VSIPL, MPI); Development of object-oriented standards (VSIPL++ prototype); Applied Research on a unified computation/communication library. Goal: demonstrate insertions into fielded systems (e.g., CIP) and 3x portability.
Phase 2: Demonstration of object-oriented standards (VSIPL++); Development of the unified computation/communication library (Parallel VSIPL++ prototype); Applied Research on fault tolerance. Goal: high-level code abstraction; reduce code size 3x.
Phase 3: Demonstration of the unified computation/communication library (Parallel VSIPL++, the world's first parallel object-oriented standard); Development of fault tolerance; Applied Research on self-optimization. Goal: a unified embedded computation/communication standard; demonstrate scalability.
Status:
• First demo successfully completed
• VSIPL++ v0.5 spec completed
• VSIPL++ v0.1 code available
• Parallel VSIPL++ spec in progress
• High performance C++ demonstrated
Slide-13
Working Group Technical Scope
Development (VSIPL++):
– MAPPING (data parallelism)
– Early binding (computations)
– Compatibility (backward/forward)
– Local knowledge (accessing local data)
– Extensibility (adding new functions)
– Remote procedure calls (CORBA)
– C++ compiler support
– Test suite
– Adoption incentives (vendor, integrator)
Applied Research (Parallel VSIPL++):
– MAPPING (task/pipeline parallel)
– Reconfiguration (for fault tolerance)
– Threads
– Reliability/availability
– Data permutation (DRI functionality)
– Tools (profiles, timers, ...)
– Quality of service
Slide-14
Overall Technical Tasks and Schedule
[Gantt chart, FY01-FY08 (near/mid/long term):
• VSIPL (Vector, Signal, and Image Processing Library)
• MPI (Message Passing Interface)
• VSIPL++ (object oriented): v0.1 spec, v0.1 code, v0.5 spec & code, v1.0 spec & code
• Parallel VSIPL++: v0.1 spec, v0.1 code, v0.5 spec & code, v1.0 spec & code
• Fault tolerance / self-optimizing software
• Demonstrations: CIP, then Demo 2 through Demo 6, across the Applied Research, Development, and Demonstrate tracks]
Slide-15
HPEC-SI Goals: 1st Demo Achievements
• Performance: goal 1.5x, achieved 2x (3x reduced cost or form factor)
• Portability: goal 3x, achieved 10x+ (zero code changes required)
• Productivity: goal 3x, achieved 6x* (DRI code 6x smaller vs. MPI; *estimated)
[Figure: HPEC Software Initiative cycle of Demonstrate, Develop, Prototype built on Object Oriented, Open Standards, Interoperable & Scalable technology]
Slide-16
Outline
• Introduction
• Software Standards
• Parallel VSIPL++
  – Technical Basis
  – Examples
• Future Challenges
• Summary
Slide-17
Parallel Pipeline
[Figure: a signal processing algorithm mapped onto a parallel computer]
Signal processing algorithm stages:
• Filter: XOUT = FIR(XIN)
• Beamform: XOUT = w * XIN
• Detect: XOUT = |XIN| > c
Mapping:
• Data parallel within stages
• Task/pipeline parallel across stages
Slide-18
Types of Parallelism
[Figure: input feeds FIR filters (data parallel); a round-robin scheduler feeds beamformer 1 and beamformer 2 (task parallel); the beamformers feed detector 1 and detector 2 (pipeline parallel)]
Slide-19
Current Approach to Parallel Code
• Algorithm and hardware mapping are linked
• Resulting code is non-scalable and non-portable
Algorithm + mapping code, with Stage 1 on processors 1-2 and Stage 2 on processors 3-4:

  while (!done) {
    if ( rank()==1 || rank()==2 )
      stage1();
    else if ( rank()==3 || rank()==4 )
      stage2();
  }

Scaling Stage 2 onto processors 3-6 requires rewriting the code:

  while (!done) {
    if ( rank()==1 || rank()==2 )
      stage1();
    else if ( rank()==3 || rank()==4 ||
              rank()==5 || rank()==6 )
      stage2();
  }
Slide-20
Scalable Approach: Lincoln Parallel Vector Library (PVL)
The same code computes A = B + C under both a single-processor and a multi-processor mapping:

  #include <Vector.h>
  #include <AddPvl.h>

  void addVectors(const Map& aMap, const Map& bMap, const Map& cMap) {
    Vector< Complex<Float> > a('a', aMap, LENGTH);
    Vector< Complex<Float> > b('b', bMap, LENGTH);
    Vector< Complex<Float> > c('c', cMap, LENGTH);
    b = 1;
    c = 2;
    a = b + c;
  }

• Single-processor and multi-processor code are the same
• Maps can be changed without changing software
• High-level code is compact
Slide-21
C++ Expression Templates and PETE
A = B + C * D is represented by the expression template type:

  BinaryNode<OpAssign, Vector,
    BinaryNode<OpAdd, Vector,
      BinaryNode<OpMultiply, Vector, Vector> > >

[Figure: parse tree for the expression B + C assigned to A]
1. Pass B and C references to operator+
2. Create expression parse tree
3. Return expression parse tree
4. Pass expression tree reference to operator=
5. Calculate result and perform assignment
Parse trees, not vectors, are created.
• Expression templates enhance performance by allowing temporary variables to be avoided
Slide-22
PETE Linux Cluster Experiments
[Figure: relative execution time vs. vector length (32 to 131072) for VSIPL, PVL/VSIPL, and PVL/PETE on three expressions: A=B+C, A=B+C*D, and A=B+C*D/E+fft(F)]
• PVL with VSIPL has a small overhead
• PVL with PETE can surpass VSIPL
Slide-23
PowerPC AltiVec Experiments
Software technologies compared on A=B+C, A=B+C*D, A=B+C*D+E*F, and A=B+C*D+E/F:
• AltiVec loop: C, for loop, direct use of AltiVec extensions; assumes unit stride and vector alignment
• VSIPL (vendor optimized): C, AltiVec-aware VSIPro Core Lite (www.mpi-softtech.com); no multiply-add; cannot assume unit stride or vector alignment
• PETE with AltiVec: C++, PETE operators, indirect use of AltiVec extensions; assumes unit stride and vector alignment
Results:
• The hand-coded loop achieves good performance, but is problem specific and low level
• Optimized VSIPL performs well for simple expressions, worse for more complex expressions
• PETE-style array operators perform almost as well as the hand-coded loop and are general, can be composed, and are high level
Slide-24
Outline
• Introduction
• Software Standards
• Parallel VSIPL++
  – Technical Basis
  – Examples
• Future Challenges
• Summary
Slide-25
A = sin(A) + 2 * B;
• Generated code (no temporaries):

  for (index i = 0; i < A.size(); ++i)
    A.put(i, sin(A.get(i)) + 2 * B.get(i));

• Apply inlining to transform to:

  for (index i = 0; i < A.size(); ++i)
    Ablock[i] = sin(Ablock[i]) + 2 * Bblock[i];

• Apply more inlining to transform to:

  T* Bp = &(Bblock[0]);
  T* Aend = &(Ablock[A.size()]);
  for (T* Ap = &(Ablock[0]); Ap < Aend; ++Ap, ++Bp)
    *Ap = fmadd (2, *Bp, sin(*Ap));

• Or apply PowerPC AltiVec extensions
• Each step can be automatically generated
• The optimization level is whatever the vendor desires
Slide-26
BLAS zherk Routine
• BLAS = Basic Linear Algebra Subprograms
• Hermitian matrix M: conjug(M) = M^t
• zherk performs a rank-k update of Hermitian matrix C:
  C <- alpha * A * conjug(A)^t + beta * C
• VSIPL code:

  A = vsip_cmcreate_d(10,15,VSIP_ROW,MEM_NONE);
  C = vsip_cmcreate_d(10,10,VSIP_ROW,MEM_NONE);
  tmp = vsip_cmcreate_d(10,10,VSIP_ROW,MEM_NONE);
  vsip_cmprodh_d(A,A,tmp);       /* A*conjug(A)^t */
  vsip_rscmmul_d(alpha,tmp,tmp); /* alpha*A*conjug(A)^t */
  vsip_rscmmul_d(beta,C,C);      /* beta*C */
  vsip_cmadd_d(tmp,C,C);         /* alpha*A*conjug(A)^t + beta*C */
  vsip_cblockdestroy(vsip_cmdestroy_d(tmp));
  vsip_cblockdestroy(vsip_cmdestroy_d(C));
  vsip_cblockdestroy(vsip_cmdestroy_d(A));

• VSIPL++ code (also parallel):

  Matrix<complex<double> > A(10,15);
  Matrix<complex<double> > C(10,10);
  C = alpha * prodh(A,A) + beta * C;
Slide-27
Simple Filtering Application

  int main ()
  {
    using namespace vsip;
    const length ROWS = 64;
    const length COLS = 4096;
    vsipl v;
    FFT<Matrix, complex<double>, complex<double>,
        FORWARD, 0, MULTIPLE, alg_hint ()>
      forward_fft (Domain<2>(ROWS,COLS), 1.0);
    FFT<Matrix, complex<double>, complex<double>,
        INVERSE, 0, MULTIPLE, alg_hint ()>
      inverse_fft (Domain<2>(ROWS,COLS), 1.0);
    const Matrix<complex<double> > weights (load_weights (ROWS, COLS));
    try {
      while (1)
        output (inverse_fft (forward_fft (input ()) * weights));
    }
    catch (std::runtime_error) {
      // Successfully caught access outside domain.
    }
  }
Slide-28
Explicit Parallel Filter

  #include <vsiplpp.h>
  using namespace VSIPL;
  const int ROWS = 64;
  const int COLS = 4096;
  int main (int argc, char **argv)
  {
    Matrix<Complex<Float> > W (ROWS, COLS, "WMap"); // weights matrix
    Matrix<Complex<Float> > X (ROWS, COLS, "WMap"); // input matrix
    Matrix<Complex<Float> > Y (ROWS, COLS, "WMap"); // output matrix
    load_weights (W);
    try {
      while (1) {
        input (X);                    // some input function
        Y = IFFT (mul (FFT(X), W));
        output (Y);                   // some output function
      }
    }
    catch (Exception &e) { cerr << e << endl; }
  }
Slide-29
Multi-Stage Filter (main)

  using namespace vsip;
  const length ROWS = 64;
  const length COLS = 4096;

  int main (int argc, char **argv)
  {
    sample_low_pass_filter<complex<float> > LPF;
    sample_beamform<complex<float> > BF;
    sample_matched_filter<complex<float> > MF;
    try {
      while (1)
        output (MF(BF(LPF(input ()))));
    }
    catch (std::runtime_error) {
      // Successfully caught access outside domain.
    }
  }
Slide-30
Multi-Stage Filter (low pass filter)

  template<typename T>
  class sample_low_pass_filter
  {
  public:
    sample_low_pass_filter()
      : FIR1_(load_w1 (W1_LENGTH), FIR1_LENGTH),
        FIR2_(load_w2 (W2_LENGTH), FIR2_LENGTH)
    { }

    Matrix<T> operator () (const Matrix<T>& Input)
    {
      Matrix<T> output(ROWS, COLS);
      for (index row=0; row<ROWS; row++)
        output.row(row) =
          FIR2_(FIR1_(Input.row(row)).second).second;
      return output;
    }

  private:
    FIR<T, SYMMETRIC_ODD, FIR1_DECIMATION, CONTINUOUS, alg_hint()> FIR1_;
    FIR<T, SYMMETRIC_ODD, FIR2_DECIMATION, CONTINUOUS, alg_hint()> FIR2_;
  };
Slide-31
Multi-Stage Filter (beam former)

  template<typename T>
  class sample_beamform
  {
  public:
    sample_beamform() : W3_(load_w3 (ROWS,COLS)) { }
    Matrix<T> operator () (const Matrix<T>& Input) const
    { return W3_ * Input; }
  private:
    const Matrix<T> W3_;
  };
Slide-32
Multi-Stage Filter (matched filter)

  template<typename T>
  class sample_matched_filter
  {
  public:
    sample_matched_filter()
      : W4_(load_w4 (ROWS,COLS)),
        forward_fft_ (Domain<2>(ROWS,COLS), 1.0),
        inverse_fft_ (Domain<2>(ROWS,COLS), 1.0)
    { }

    Matrix<T> operator () (const Matrix<T>& Input) const
    { return inverse_fft_ (forward_fft_ (Input) * W4_); }

  private:
    const Matrix<T> W4_;
    FFT<Matrix<T>, complex<double>, complex<double>,
        FORWARD, 0, MULTIPLE, alg_hint()> forward_fft_;
    FFT<Matrix<T>, complex<double>, complex<double>,
        INVERSE, 0, MULTIPLE, alg_hint()> inverse_fft_;
  };
Slide-33
Outline
• Introduction
• Software Standards
• Parallel VSIPL++
• Future Challenges
  – Fault Tolerance
  – Self Optimization
  – High Level Languages
• Summary
Slide-34
Dynamic Mapping for Fault Tolerance
[Figure: a parallel processor with a spare node. The input task XIN runs on Map0 (nodes 0,1) and the output task XOUT on Map2 (nodes 1,3); after a failure, computation rebinds to Map1 (nodes 0,2)]
• Switching processors is accomplished by switching maps
• No change to the algorithm is required
Slide-35
Dynamic Mapping Performance Results
[Figure: relative time vs. data size (32 to 2048) for remap/baseline, redundant/baseline, and rebind/baseline]
• Good dynamic mapping performance is possible
Slide-36
Optimal Mapping of Complex Algorithms
[Figure: an application pipeline (Input; Low Pass Filter: FIR1 with weights W1, then FIR2 with weights W2; Beamform: multiply by weights W3; Matched Filter: FFT, multiply by weights W4, IFFT) mapped onto different hardware (workstation, embedded multi-computer, PowerPC cluster, embedded board, Intel cluster), each requiring a different optimal map]
• Need to automate the process of mapping algorithms to hardware
Slide-37
Self-Optimizing Software for Signal Processing
• Find:
  – Min(latency | #CPU)
  – Max(throughput | #CPU)
• S3P selects the correct optimal mapping
• Excellent agreement between S3P-predicted and achieved latencies and throughputs
[Figure: predicted vs. achieved latency (seconds) and throughput (frames/sec) for 4-8 CPUs on large (48x128K) and small (48x4K) problem sizes, annotated with the selected per-stage CPU allocations, e.g., 1-1-1-1 through 2-2-2-2]
Slide-38
High Level Languages
[Figure: a Parallel Matlab Toolbox connects high performance Matlab applications (DoD sensor processing, DoD mission planning, scientific simulation, commercial applications) through a user interface and a hardware interface to parallel computing hardware]
• Parallel Matlab need has been identified: HPCMO (OSU)
• Required user interface has been demonstrated: Matlab*P (MIT/LCS), PVL (MIT/LL)
• Required hardware interface has been demonstrated: MatlabMPI (MIT/LL)
• A Parallel Matlab Toolbox can now be realized
Slide-39
MatlabMPI Deployment (speedup)
• Maui: image filtering benchmark (300x on 304 CPUs)
• Lincoln: signal processing (7.8x on 8 CPUs); radar simulations (7.5x on 8 CPUs); hyperspectral (2.9x on 3 CPUs)
• MIT: LCS Beowulf (11x Gflops on 9 duals); AI Lab face recognition (10x on 8 duals)
• Other: Ohio St. EM simulations; ARL SAR image enhancement; Wash. U. hearing aid simulations; So. Ill. benchmarking; JHU digital beamforming; ISL radar simulation; URI heart modeling
[Figure: performance (Gigaflops) vs. number of processors (1 to 1000) for image filtering on the IBM SP at the Maui Computing Center; MatlabMPI tracks the linear speedup curve]
• Rapidly growing MatlabMPI user base demonstrates the need for parallel Matlab
• Demonstrated scaling to 300 processors
Slide-40
Summary
• HPEC-SI expected benefit
  – Open software libraries, programming models, and standards that provide portability (3x), productivity (3x), and performance (1.5x) benefits to multiple DoD programs
• Invitation to participate
  – DoD program offices with signal/image processing needs
  – Academic and government researchers interested in high performance embedded computing
  – Contact: [email protected]
Slide-41
The Links
High Performance Embedded Computing Workshop: http://www.ll.mit.edu/HPEC
High Performance Embedded Computing Software Initiative: http://www.hpec-si.org/
Vector, Signal, and Image Processing Library: http://www.vsipl.org/
MPI Software Technologies, Inc.: http://www.mpi-softtech.com/
Data Reorganization Initiative: http://www.data-re.org/
CodeSourcery, LLC: http://www.codesourcery.com/
MatlabMPI: http://www.ll.mit.edu/MatlabMPI