Farms, Pipes, Streams and Reforestation · RePhrase Project: Refactoring Parallel Heterogeneous...

Post on 11-Jul-2020

Farms, Pipes, Streams and Reforestation: Type-Directed Parallelisation

David Castro, Kevin Hammond and Susmit Sarkar, University of St Andrews, UK

kevin@kevinhammond.net

IFIP Working Group 2.11 Meeting, Bloomington, Indiana, 23/8/16

RePhrase Project: Refactoring Parallel Heterogeneous Software - a Software Engineering Approach (ICT-644235), 2015-2018, €3.6M budget

8 partners, 6 European countries: UK, Spain, Italy, Austria, Hungary, Israel

Coordinated by Kevin Hammond, St Andrews

ParaFormance Project: Parallel Patterns for Heterogeneous Multicore Systems (ICT-288570), 2015-2018, £537K budget

Goal: Formation of High-Growth Company of Scale by 2023

Coordinated by Kevin Hammond, St Andrews

The Problem

• We need to choose the best parallel abstractions
• Algorithmic skeletons [Cole 1989] implement patterns

• We need a formal way to reason about parallel structure
§ Correctness of transformations
§ Reasoning about performance

[Figure: alternative parallel arrangements of two functions, f1 and f2]

Example Skeleton: Parallel Task Farms

§ Task farms use a fixed number of workers
§ Each worker applies the same operation (f)
§ f is applied to each of the inputs in a stream.
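The farm behaviour described above can be sketched in a few lines of Python. This is only an illustration, not the skeleton implementation used in the talk: the names `farm` and `square` are hypothetical, and a standard-library thread pool stands in for the fixed set of workers.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Iterable, List, TypeVar

A = TypeVar("A")
B = TypeVar("B")


def farm(n_workers: int, f: Callable[[A], B], inputs: Iterable[A]) -> List[B]:
    """Apply f to every input using a fixed pool of n_workers workers.

    Executor.map yields results in input order, so the farm returns the
    same list as the sequential [f(x) for x in inputs].
    """
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        return list(pool.map(f, inputs))


def square(x: int) -> int:
    # Hypothetical worker function, used only for illustration.
    return x * x


results = farm(4, square, range(8))
```

Because the results come back in input order, replacing the farm by a plain sequential map does not change the output; only the degree of parallelism differs.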

[Figure: a task farm. The input stream …, x10, x9, x8 is distributed across workers that each apply f; results leave as the output stream f x1, f x0, …]

Example Skeleton: Parallel Pipeline

§ Parallel pipelines compose two operations (f and g) over the elements of an input stream
§ f and g are run in parallel
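A two-stage pipeline of this kind can be sketched with one thread per stage and queues between them. This is a minimal illustration under assumptions of my own (the names `pipeline`, `_stage` and `_DONE` are hypothetical), not the implementation behind the slides.

```python
import queue
import threading
from typing import Callable, Iterable, List

_DONE = object()  # sentinel marking the end of the stream


def _stage(fn: Callable, inbox: queue.Queue, outbox: queue.Queue) -> None:
    """Read items from inbox, apply fn, and forward the results to outbox."""
    while True:
        item = inbox.get()
        if item is _DONE:
            outbox.put(_DONE)  # propagate end-of-stream downstream
            return
        outbox.put(fn(item))


def pipeline(f: Callable, g: Callable, inputs: Iterable) -> List:
    """Run f and g as two concurrent stages over a stream of inputs."""
    q_in, q_mid, q_out = queue.Queue(), queue.Queue(), queue.Queue()
    stages = [
        threading.Thread(target=_stage, args=(f, q_in, q_mid)),
        threading.Thread(target=_stage, args=(g, q_mid, q_out)),
    ]
    for t in stages:
        t.start()
    for x in inputs:
        q_in.put(x)
    q_in.put(_DONE)

    results = []
    while True:
        item = q_out.get()
        if item is _DONE:
            break
        results.append(item)
    for t in stages:
        t.join()
    return results
```

While g works on f(x_i), f can already start on x_(i+1). Each stage is a single thread reading a FIFO queue, so the output order matches the input order and the result equals the sequential composition g(f(x)) for each x.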

[Figure: a two-stage pipeline. Inputs … x9, x8 flow through stage f and then stage g, producing g(f x1), g(f x0).]

Example

Image Merge

Image merging composes two operations, merge and mark

Possible implementations include:

Choosing an Implementation

Decorate the function type with IM(n,m)

where

Now the type system automatically selects

We can guarantee that this is functionally equivalent to imgMerge

Inferring parallel structures

We can leave holes in the types, e.g.

IM(n,m) = _ || FARM m _

replaces _ with the simplest possible structures

IM(n,m) = min cost (_ || FARM m _)

replaces _ with the least cost structures.

We can choose the provably least cost skeleton
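To make the idea of filling holes by cost concrete, here is a small Python sketch. Everything in it is an assumption made for illustration: the cost model (per-item time divided by the number of farm workers, with a pipeline bounded by its slowest stage), the candidate set for the first hole, and the names `Func`, `Farm`, `Pipe`, `cost`, `workers_used` and `fill_holes`. It is not the cost model or the inference algorithm from the talk.

```python
from dataclasses import dataclass
from typing import List, Union


@dataclass(frozen=True)
class Func:
    time: float  # assumed per-item time of an atomic function


@dataclass(frozen=True)
class Farm:
    workers: int
    body: "Skel"


@dataclass(frozen=True)
class Pipe:
    first: "Skel"
    second: "Skel"


Skel = Union[Func, Farm, Pipe]


def cost(s: Skel) -> float:
    """Toy steady-state cost per stream item (an assumption)."""
    if isinstance(s, Func):
        return s.time
    if isinstance(s, Farm):
        return cost(s.body) / s.workers
    return max(cost(s.first), cost(s.second))  # slowest stage dominates


def workers_used(s: Skel) -> int:
    if isinstance(s, Func):
        return 1
    if isinstance(s, Farm):
        return s.workers * workers_used(s.body)
    return workers_used(s.first) + workers_used(s.second)


def fill_holes(t1: float, t2: float, m: int, max_workers: int) -> Skel:
    """Fill `_ || FARM m _`: the second hole becomes Func(t2); the first is
    Func(t1) or Farm k (Func(t1)). Pick the least cost, then fewest workers."""
    second = Farm(m, Func(t2))
    candidates: List[Skel] = [Pipe(Func(t1), second)]
    candidates += [Pipe(Farm(k, Func(t1)), second) for k in range(1, max_workers + 1)]
    return min(candidates, key=lambda s: (cost(s), workers_used(s)))


best = fill_holes(t1=4.0, t2=6.0, m=3, max_workers=4)
```

With these made-up timings the second stage costs 2.0 per item, so a farm of two workers on the first stage already balances the pipeline; adding more workers would not lower the cost under this toy model.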

Basic Semantics

Syntax of Skeletons

fun_T lifts an atomic function to a collection type T
dc represents divide and conquer over collection T
fb introduces feedback

Skeleton Denotational Semantics

Base semantics, S (ρ is a global environment of function definitions)

Lifted to a streaming form, P, over collection type T

Morphisms for Divide-and-Conquer

Catamorphism (fold)

Anamorphism (unfold)
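As a reminder of what the two morphisms do, here is a minimal Python sketch specialised to lists. The names `cata`, `ana`, `countdown` and `total` are hypothetical, and the slides define the morphisms over general functors rather than lists only.

```python
from typing import Callable, List, Optional, Tuple, TypeVar

A = TypeVar("A")
B = TypeVar("B")
S = TypeVar("S")


def cata(combine: Callable[[A, B], B], base: B, xs: List[A]) -> B:
    """Catamorphism (fold): replace each cons by combine and nil by base."""
    result = base
    for x in reversed(xs):
        result = combine(x, result)
    return result


def ana(step: Callable[[S], Optional[Tuple[A, S]]], seed: S) -> List[A]:
    """Anamorphism (unfold): grow a list from a seed until step returns None."""
    out: List[A] = []
    while True:
        nxt = step(seed)
        if nxt is None:
            return out
        value, seed = nxt
        out.append(value)


def countdown(n: int) -> List[int]:
    # Unfold n into [n, n-1, ..., 1].
    return ana(lambda k: None if k == 0 else (k, k - 1), n)


def total(xs: List[int]) -> int:
    # Fold a list into the sum of its elements.
    return cata(lambda x, acc: x + acc, 0, xs)
```

Here `total(countdown(4))` first unfolds 4 into `[4, 3, 2, 1]` and then folds that list to 10.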

Morphisms for Streams

Given a bifunctor, G, maps over collection T are

Iteration

Easy to define using the fix-point combinator, Y f = f (Y f)
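The fix-point equation can be tried out directly. The sketch below is an illustration in Python, with hypothetical names `fix` and `iterate_while`; because Python evaluates eagerly, the recursive unfolding is delayed behind a lambda.

```python
from typing import Callable


def fix(f: Callable) -> Callable:
    """Fix-point combinator: fix(f) behaves like f(fix(f)).

    The lambda delays the unfolding of fix(f) until an argument arrives,
    which is needed because Python is strict.
    """
    return lambda x: f(fix(f))(x)


def iterate_while(cond: Callable, body: Callable) -> Callable:
    """Iteration as a fix point: keep applying body while cond holds."""
    return fix(lambda loop: lambda x: loop(body(x)) if cond(x) else x)


halve_until_small = iterate_while(lambda x: x > 10, lambda x: x // 2)
```

Applied to 100, `halve_until_small` steps through 50, 25 and 12 and stops at 6, the first value for which the condition fails.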

Generalising Recursion Patterns

Hylomorphisms

map, cata and ana are just special cases of hylomorphisms

Hylomorphisms are general recursion patterns

For hylo_F f g:
μF is the recursive call tree
g is how inputs are split
f is how results are combined
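A hylomorphism can be written generically once the functor's map is supplied. The following Python sketch is an illustration only: `hylo`, `fmap_list`, `count_down`, `product` and `unpack` are hypothetical names, and the functor is encoded as `None` for the base case or a `(head, rest)` pair for the recursive case.

```python
from typing import Callable


def hylo(fmap: Callable, combine: Callable, split: Callable) -> Callable:
    """hylo f g = f . F(hylo f g) . g, with f = combine and g = split."""
    def run(x):
        return combine(fmap(run, split(x)))
    return run


def fmap_list(h: Callable, node):
    """Functor action for F X = None | (head, X)."""
    return None if node is None else (node[0], h(node[1]))


def count_down(n: int):
    # g: split n into a head and a smaller problem, stopping at 0.
    return None if n == 0 else (n, n - 1)


def product(node):
    # f: combine a head with the result computed for the rest.
    return 1 if node is None else node[0] * node[1]


factorial = hylo(fmap_list, product, count_down)


def unpack(xs):
    # For a catamorphism the split only unpacks data that already exists.
    return None if not xs else (xs[0], xs[1:])


list_product = hylo(fmap_list, product, unpack)
```

In `factorial` the call tree is built by `count_down` and consumed by `product` without ever being materialised. `list_product` shows the catamorphism special case, where the splitting function merely takes an existing list apart.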

Example: Quicksort

or

All the World's a Hylomorphism!

Structure in Types

Introducing Parallel Patterns

§ The type system uses a structure-equivalence relation that describes when two programs are extensionally equivalent.

§ The type-checking algorithm needs to decide these structure-equivalences.

§ The type-checking algorithm also needs to unify structures, modulo this structure-equivalence relation.

[Diagram: the program structure and the target parallel structure are each reduced to a sequential normalised structure; a "?" marks the comparison between the two.]

Syntax of Structured Types

Structure-Annotated Type Rules

Convertibility

Plus some other rules derived from the hylomorphism laws. We use this to produce a confluent rewriting system.

Parallelism Erasure

Rewrite rules derived from convertibility

Repeated to produce a confluent rewriting system

Normalisation

The rewrite rules are derived from basic laws

Used to define a normalisation procedure

We can now prove equivalence of two parallel terms by:
i) erasing parallelism using erase,
ii) normalising using norm, and
iii) testing for equivalence
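The three-step check can be sketched on a toy skeleton language. This Python sketch is a simplification under stated assumptions: it covers only atomic functions, sequential composition, pipelines and farms, its `norm` only flattens composition by associativity, and the class names and the rule set are mine, not the rewriting system from the talk.

```python
from dataclasses import dataclass
from typing import Tuple, Union


@dataclass(frozen=True)
class Fun:
    name: str  # an atomic function, identified by name


@dataclass(frozen=True)
class Seq:
    first: "Skel"
    second: "Skel"


@dataclass(frozen=True)
class Pipe:
    first: "Skel"
    second: "Skel"


@dataclass(frozen=True)
class Farm:
    workers: int
    body: "Skel"


Skel = Union[Fun, Seq, Pipe, Farm]


def erase(s: Skel) -> Skel:
    """Step i: drop farms and turn pipelines into sequential composition."""
    if isinstance(s, Farm):
        return erase(s.body)
    if isinstance(s, (Pipe, Seq)):
        return Seq(erase(s.first), erase(s.second))
    return s


def norm(s: Skel) -> Tuple[str, ...]:
    """Step ii: flatten nested compositions into one sequence of names."""
    if isinstance(s, Seq):
        return norm(s.first) + norm(s.second)
    return (s.name,)


def equivalent(a: Skel, b: Skel) -> bool:
    """Step iii: compare the normal forms of the erased terms."""
    return norm(erase(a)) == norm(erase(b))


f, g = Fun("f"), Fun("g")
same = equivalent(Farm(4, Seq(f, g)), Pipe(Farm(2, f), g))
different = equivalent(Pipe(g, f), Pipe(f, g))
```

Both `Farm 4 (f then g)` and `Pipe (Farm 2 f) g` erase and normalise to the sequence `("f", "g")`, so they are reported as equivalent, while swapping the stages is not.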

Example

QuickSort Revisited

Start with a sequential version

To create a parallel divide-and-conquer version, we need to decide

This is easily done using a simple parallelism erasure
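To illustrate the relationship between the two versions, the Python sketch below defines a sequential quicksort and a divide-and-conquer variant that sorts the two partitions concurrently for the first few levels. The names `split`, `join`, `qsort_seq` and `qsort_dc`, the `depth` parameter and the use of thread pools are assumptions for illustration; the talk works at the level of types and structures, not this code.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import List, Optional, Tuple

Node = Optional[Tuple[List[int], int, List[int]]]


def split(xs: List[int]) -> Node:
    """Divide: choose the first element as pivot and partition the rest."""
    if not xs:
        return None
    pivot, rest = xs[0], xs[1:]
    return ([x for x in rest if x <= pivot], pivot, [x for x in rest if x > pivot])


def join(left: List[int], pivot: int, right: List[int]) -> List[int]:
    """Conquer: concatenate the sorted partitions around the pivot."""
    return left + [pivot] + right


def qsort_seq(xs: List[int]) -> List[int]:
    node = split(xs)
    if node is None:
        return []
    left, pivot, right = node
    return join(qsort_seq(left), pivot, qsort_seq(right))


def qsort_dc(xs: List[int], depth: int = 2) -> List[int]:
    """Sort both partitions concurrently for the first `depth` levels."""
    node = split(xs)
    if node is None:
        return []
    left, pivot, right = node
    if depth == 0:
        return join(qsort_seq(left), pivot, qsort_seq(right))
    # Each level owns its pool, so waiting on children cannot starve them.
    with ThreadPoolExecutor(max_workers=2) as pool:
        l = pool.submit(qsort_dc, left, depth - 1)
        r = pool.submit(qsort_dc, right, depth - 1)
        return join(l.result(), pivot, r.result())


data = [5, 3, 8, 1, 9, 2, 7, 3]
```

Removing the thread pool from `qsort_dc` leaves exactly `qsort_seq`, which is the informal content of parallelism erasure in this small example.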

Inferring More Complex Parallel Structure

Now consider a more complex structure

Parallelism erasure on the RHS gives

Normalisation of the LHS (using HYLO-SPLIT etc) gives

Normalisation of the RHS gives

Inferring More Complex Parallel Structure (2)

We need to unify the normalised forms

and

Inferring More Complex Parallel Structure (3)

Substituting back gives us the desired parallel form

We can then use equivalence to give the actual program

Costs

For 1000 lists of 30,000,000 elements

Predicted v. Actual Speedup

Image Convolution of 500 images on titanic, a 2.4GHz 24-core AMD Opteron 6176 architecture, running CentOS Linux 2.6.18-274.e15. Dashed lines are predictions.

7.3 Image convolution

Image convolution is widely used in image processing applications. The convolution algorithm is the composition of two functions, read and process. The function read reads an image from a file, and process processes the image.

tread, tproc : Timing
IC, BIC : SkelTy
IC = Pipe (Farm (Func {ti=tread})) (Farm (Func {ti=tproc}))
BIC = bestInst titanic IC

imageConv1 : Par BIC FilePath Img
imageConv1 = skel [readI, processI]

imageConv2 : Auto titanic FilePath Img
imageConv2 = bestSkel [readI, processI] [tread, tproc]

[Fig. 6. Different Parallel Structures for Image Convolution, 500 Images 1024 * 1024. Panels: (a) titanic, (b) lovelace. Axes: n2 workers against speedup. Curves: Pipe (Farm 6 Func) (Farm n2 Func) on titanic; Pipe (Farm 12 Func) (Farm n2 Func) on lovelace; Farm n2 (Seq Func Func) and Farm n2 (Pipe Func Func) on both.]

8 Related Work

Work has also gone into boiling down skeletons into a small set (included in the skeletons we consider) which can express a variety of patterns [17].

Scaife et al [33] present the design and implementation of a parallelising compiler that automatically extracts parallelism for Standard ML. They exploit parallelism in the familiar map and fold HOFs by using nested parallel skeletons.

Refactoring. Roughly, rewriting systems consist of a set of objects with some relations on how to transform those objects. Rewriting rules for transforming different parallel skeletons into other kinds of parallelism have been used for the implementation of different refactoring techniques [1, 2, 11]. In this work we provide a formalisation of rewriting rules for parallel skeletons that allow us to ensure that no incorrect parallel structure is introduced.

[Fig. 5. Speedups vs predictions (titanic). Panels: (a) Matrix Multiplication, Farm n Func, for N = 1024 and N = 2048, n workers against speedup; (b) Image Merge, Farm n (Pipe Func Func), workers against speedup; (c) Image Conv, Pipe (Farm n1 Func) (Farm n2 Func), n2 workers against speedup, for n1 = 1, 2, 4, 6, 8.]

Conclusions

• First-ever treatment of parallelism that reflects parallel structure in types

• Several advantages to exposing parallel structure in types
• clear separation between the structure and the functionality
• documentation of how a program was parallelized
• easy to change the parallel structure of a program without modifying the functional behaviour

• Reasoning about costs of different parallel structures is very powerful
§ Automatically find suitable parallel structures
§ Compile-time information about the run-time behaviour
§ Automatically rewrite programs to minimize costs

Future Work

• Other patterns, e.g. stencil and bulk synchronous parallelism

• More detailed cost models (see e.g. Hammond et al, 2016)

• Dynamic analysis is also possible

• Allow (certain kinds of) side effects in the workers

• Implement back-ends. Run our structured programs!

• Larger case studies

Paper Available on Request

To appear in ICFP 2016

THANK YOU!

http://rephrase-ict.eu

@rephrase_eu

http://paraphrase-ict.eu
