Chapter 5 Synchronous Sequential Logic 5-1 Sequential Circuits
Saman Amarasinghe. Lets stick with current sequential languages Parallel Programming is hard!...
-
date post
21-Dec-2015 -
Category
Documents
-
view
213 -
download
0
Transcript of Saman Amarasinghe. Lets stick with current sequential languages Parallel Programming is hard!...
Saman Amarasinghe
Lets stick with current sequential languagesParallel Programming is hard!Billons of LOC written in sequential languages
Let the compiler do all the workMaintain the current strong machine abstraction
SUIF Parallelizing CompilerMonica Lam and the Stanford SUIF team 1993 – 1997
Automatically extract parallelism from sequential programs
Heroic AnalysisInterprocedural analysisArray and scalar data-flow analysisReduction and recurrence recognition C to FORTRAN
Achieved Best SPEC results of the dayVector processor Cray C90 540Uniprocessor Digital 21164508SUIF on 8 processors Digital 8400 1,016
But… Techniques were not robust for general use
spic
e 2g6
dodu
c
fppp
p
ora
md l
jdp2
wa v
e 5
md l
jsp2
a lvi
nn
n as a
7
ear
h ydr
o2d
su2 c
or
tom
catv
swm
2 56
Nu
mb
er o
f P
r oce
sso
rs
1
2
3
4
5
678
0
200
400
600
800
1000
1200
MF
LO
PS
Composition is key to building large systems
Implemented naturally via time-multiplexing
The framework for parallelizing sequential programs
Sequential parts at outermostGlobal barriers
Speedup = 1/(1– p + p/N)Utilization = 1/(p + N*(1 – p))
Util
izat
ion
Number of cores
Expected Year
Speedup = 1/(1– p + p/N)Utilization = 1/(p + N*(1 – p))
Util
izat
ion
Number of cores
Expected Year
% parallel
Speedup = 1/(1– p + p/N)Utilization = 1/(p + N*(1 – p))
Util
izat
ion
Number of cores
Expected Year
% parallel
Speedup = 1/(1– p + p/N)Utilization = 1/(p + N*(1 – p))
Util
izat
ion
Number of cores
Expected Year
% parallel
Currently… Theory, algorithms, languages, tools all centered around the sequential paradigmA well enforced machine abstraction
Move to muticore is a fundamental shift Akin to analog design to digital shift
Need a new abstraction where parallelism is the primary form of expression
Parallelism is simpleParallelism is naturalCommunication is intuitive
Parallel composition of sequential segmentsWith possible space-multiplexed execution
Parallel programming still in the dark agesElite community of practitioners Active open research, little stable consensusAssumption: we don’t know how to teach parallel programming!
Aim for a “Mead and Conway” type revolutionDevelop simple, cookbook approachesIf we can’t teach them, they’re too complex!Make them accessibleCarefully thought-out courseware, tools, texts, coursesFocus on the educational communityExporting, proselytizing, workshops, conferences,
journals, …
1. Move to a truly parallel world (long term)Natural world is extremely parallel learn to emulate itCan we make sequential programs a special case of parallel programming?
2. Rejoice when parallelism is natural (medium term)Switch to parallel languages if using them is easier than sequential languages
3. Help migrate legacy application (short term)Existing large body of code – cannot ignore!Written in sequential languages – need to work with them
Some domains are inherently parallelCoding them using a sequential language is…
Harder than using the right parallel abstraction All information on inherent parallelism is lost
There are win-win situations Increasing the programmer productivity while extracting parallel performance
Streaming domain and the StreamIt experience
Picture Reorder
joiner
joiner
IDCT
IQuantization
splitter
splitter
VLD
macroblocks, motion vectors
frequency encodedmacroblocks
differentially coded motion vectors
motion vectorsspatially encoded macroblocks
recovered picture
ZigZag
Saturation
Channel Upsample Channel Upsample
Motion Vector Decode
Y Cb Cr
quantization coefficients
picture type
<QC>
<QC>
reference picture
Motion Compensation
<PT1> referencepicture
Motion Compensation
<PT1>reference picture
Motion Compensation
<PT1>
<PT2>
Repeat
Color Space Conversion
<PT1, PT2>
MPEG bit stream
Structured block level diagram describes computation and flow of data
Conceptually easy to understand
Clean abstraction of functionality
Mapping to C (sequentialization) destroys this simple view
MPEG-2 Decoder
add VLD(QC, PT1, PT2);
add splitjoin { split roundrobin(NB, V);
add pipeline { add ZigZag(B); add IQuantization(B) to QC; add IDCT(B); add Saturation(B); } add pipeline { add MotionVectorDecode(); add Repeat(V, N); }
join roundrobin(B, V);}
add splitjoin { split roundrobin(4(B+V), B+V, B+V);
add MotionCompensation(4(B+V)) to PT1; for (int i = 0; i < 2; i++) { add pipeline { add MotionCompensation(B+V) to PT1; add ChannelUpsample(B); } }
join roundrobin(1, 1, 1);}
add PictureReorder(3WH) to PT2;
add ColorSpaceConversion(3WH);
Picture Reorder
joiner
joiner
IDCT
IQuantization
splitter
splitter
VLD
macroblocks, motion vectors
frequency encodedmacroblocks
differentially coded motion vectors
motion vectorsspatially encoded macroblocks
recovered picture
ZigZag
Saturation
Channel Upsample Channel Upsample
Motion Vector Decode
Y Cb Cr
quantization coefficients
picture type
<QC>
<QC>
reference picture
Motion Compensation
<PT1> referencepicture
Motion Compensation
<PT1>reference picture
Motion Compensation
<PT1>
<PT2>
Repeat
Color Space Conversion
<PT1, PT2>
MPEG bit stream
MPEG-2 Decoder
Task ParallelismThread (fork/join) parallelismParallelism explicit in algorithmBetween filters without producer/consumer relationship
Data ParallelismData parallel loop (forall)Between iterations of a stateless filter Can’t parallelize filters with state
Pipeline ParallelismUsually exploited in hardwareBetween producers and consumersStateful filters can be parallelized
MPEG-2 Decoder
Picture Reorder
joiner
joiner
IDCT
IQuantization
splitter
splitter
VLD
macroblocks, motion vectors
frequency encodedmacroblocks
differentially coded motion vectors
motion vectorsspatially encoded macroblocks
recovered picture
ZigZag
Saturation
Channel Upsample Channel Upsample
Motion Vector Decode
Y Cb Cr
quantization coefficients
picture type
<QC>
<QC>
reference picture
Motion Compensation
<PT1> referencepicture
Motion Compensation
<PT1>reference picture
Motion Compensation
<PT1>
<PT2>
Repeat
Color Space Conversion
<PT1, PT2>
MPEG bit stream
0
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
Vocod
erFFT
FMRad
ioTDE
Bitonic
Sort
MPEG2D
ecod
er
Chann
elVoc
oder
DESDCT
Filterb
ank
Serpe
nt
Radar
Geom
etric
Mea
n
Benchmarks
Th
rou
gh
pu
t N
orm
aliz
ed t
o S
ing
le C
ore
Str
eam
It
.
On a 16 core MIT Raw Processor (http://cag.csail.mit.edu/raw)
Don’t modify a code segment if…The performance impact is insignificant and is isolated from the rest Automatic parallelizer works perfectly
Modify and annotate a segment if…Automatic parallelizer needs a little help
Otherwise rewrite the segment
Program ReincarnationA new body with the same old soul
Still inExistingSequentialLanguages
Use a ParallelLanguage
.exe.exeOriginal
Compiler
OriginalCompiler
OriginalBinary
AutomaticParallelization
AutomaticParallelization
StaticAnalysis
StaticAnalysis
Dynamic analysis Managed program executionProgram invariant inference Application knowledge database
Assisted parallelizationGUI tool
Correctness in reincarnated Test GenerationDivergence Analysis
Static analysisAutomatic parallelization info for program understanding
Learn about the domainFlag domain specific issuesGenerate domain-specific hints
Bring programs to modern ageBlock diagramRefactoring identification
Instrumenter andBinary interpreter
Instrumenter andBinary interpreter
ManagedProgram
Execution
ManagedProgram
Execution
Program InvariantInference Engine
Program InvariantInference Engine
.log.log Application
Knowledge(program
representation & invariants)
ApplicationKnowledge
(program representation &
invariants)Known IdiomIdentification
&Domain
Hint Generation
Known IdiomIdentification
&Domain
Hint Generation
DomainKnowledgeDatabase
DomainKnowledgeDatabase
DomainKnowledgeExtraction
DomainKnowledgeExtraction
Compiler &Instrumenter
Compiler &Instrumenter
Reincarna
ted .c
Reincarna
ted .c.exe.exe
Assisted Application
Reincarnation Tool
ManagedProgram
Execution
ManagedProgram
Execution
.log.log Test GenerationTest GenerationDivergence
Analysis
DivergenceAnalysis
RefactoringIdentification
RefactoringIdentification
Block DiagramRepresentation
Block DiagramRepresentation
Legacy Program Source File
.c.c
Multicore menace will impact all of us in a big way
Parallelism need to keep up with Moore’s curve
Will definitely need new parallel languages where parallelism is the primary form of composition
Low hanging fruit when parallelism is the natural form of expression
However, cannot ignore the past investments
© 2007 Microsoft Corporation. All rights reserved. Microsoft, Windows, Windows Vista and other product names are or may be registered trademarks and/or trademarks in the U.S. and/or other countries.
The information herein is for informational purposes only and represents the current view of Microsoft Corporation as of the date of this presentation. Because Microsoft must respond to changing market conditions, it should not be interpreted to be a commitment on the part of Microsoft, and Microsoft cannot guarantee the accuracy of any information provided after
the date of this presentation. MICROSOFT MAKES NO WARRANTIES, EXPRESS, IMPLIED OR STATUTORY, AS TO THE INFORMATION IN THIS PRESENTATION.