Massively Parallel Landscape- Evolution Modelling using ......Evolution Modelling using General...

Post on 10-Oct-2020

4 views 0 download

Transcript of Massively Parallel Landscape- Evolution Modelling using ......Evolution Modelling using General...

MassivelyParallelLandscape-EvolutionModellingusingGeneralPurposeGraphicalProcessingUnits

S7331Tuesday9th May2017

GPUTechnologyConferenceA.StephenMcGough,DarrelMaddy

J.Wainwright,S.Liang,M.Rapoportas,A.Trueman,R.Grey,G.KumarVinod,andJamesBell

DurhamUniversity,UKNewcastleUniversity,UK

Outline

• WhatisLandscapeEvolutionModelling(LEM)• ParallelizationofLEM• PreliminaryResults

LandscapeEvolutionModeling

• Landscapeschangeovertimeduetowater/weathering• PhysicalandChemicalWeatheringrequirewatertobreakdownmaterial• HigherenergyflowingwaterbothErodesandTransportsmaterialuntildecreasingenergyconditionsresultinDepositionofmaterial

• Theseprocessestakealongtime• Manyglacial-InterglacialCycles

• Cyclesare~100kaforlast800ka,priorto800kacycleswere~40kainlength

• Wewanttouseretrodiction toworkouthowthelandscapehaschanged

LandscapeEvolutionModeling

• Useasimulationtomodelhowthelandscapechanges• 3DLandscapeisdiscretizedasaregular2Dgrid(x,y)withcellvaluesrepresentingsurfaceheights(z)derivedfromadigitalelevationmodel(DEM)

• Cellscanbe10mx10morlarger

31 22 32

33 32 25 33 34

29 26 27 39 36

27 26 41 44 50

45 44 40 51 55

39 44 46

N

7 10 7

105 8 9

5 9 6

4 6 7 8 4

87

9

8 7

9 8 4 6 5

6

Figure 2.2: The figure illustrates the water accumulation modelling. The amount ofwater accumulates on a cell is the sum of water of all adjacent cells whichhave assigned a direction towards it. Hence computing water accumu-lation on one cell is only ready to be performed when the water accu-mulation of all of its flowing in neighbours have been computed. Imagecourtesy: Gregory E. Tucker and Gregory R. Hancock, 2009.

2.4 Bottlenecks and the Potential of Parallel Solutions

The time complexity of the flow direction computation on non-flat cells is O(n),where n is the total number of cells in the DEM, since the algorithm performs boundedoperations on each cell. The run time e�ciency of the algorithm for water accumu-lation however depends on the longest drainage path in the DEM. Although themodelling is su�ciently e�cient on a small scale DEM, however, it faces a severecomputational challenge when processing massive size grids. The resolution of aDEM has to be high enough to achieve su�cient accuracy, which makes the size ofthe grid to grow substantially to represent a fairly large terrain. Also, geologists ex-pect computers to perform a large number of iterations of water flow directions andaccumulations computation to model the change of the landscape over a very longperiod. Due to these facts, the spatial and temporal scalability of landscape evolutionmodelling depends on the computational power of hardware. Unfortunately, as thefree lunch of Moore’s law is over, it will be unwise to expect the hardware performanceimprovement will satisfy the computational demand in the near future.

Computer scientists have made a couple of attempts to overcome this computa-tional bottleneck. TerraFlow has implemented the algorithms with significant I/Ooptimisations for massive size DEMs. [1] However, it still consumes minutes formillion size DEM on a computer with 500MHz processor and 1GB memory. Sincecomputing water flow direction on each grid is a total independent process, and waterflow accumulations on one drainage path does not depend on others, these computa-tions are possible to be performed in parallel. Chase Wallis team has implemented

9

LandscapeEvolutionModeling(simplified)Eachiterationofthesimulation:

FlowRouting

FlowAccumulation

Erosion/Deposition

1 1 3 1 1

7 2 1 1 5

1 1 1 1 1

2 1 1 1 1

1 1 6 1 2

How much material will be removed?How much material will be deposited?

Current sequential versionis much slower than this…

• Each step is ‘fairly’ fast…• But we want to do lots of them 120K to

1M years• On landscapes of 6-56M cells• If we could simulate 1 year in 1 minute

this would take 83 – 694 days!• assuming 1 year = 1 iteration• may need more

ExecutionanalysisofSequentialLEM

• WestartedfromanexistingsequentialLEM• 51x100cellsforjust120Kyearstook72hours

• estimatefor25Mcells64,000years• Thiswasnon-optimalcode

• Reducedexecutiontimefrom72to4.7hours• 64,000yearsdownto300years

• Butthisisstillnotenoughforourneeds

ExecutionanalysisofSequentialLEM

• PerformanceAnalysis:• ~74%oftimespentroutingandaccumulating• Needordersofmagnitudespeedup

• Sofocuswasonflowrouting/accumulation

Outline

• WhatisLandscapeEvolutionModelling(LEM)• ParallelizationofLEM• PreliminaryResults

ParallelFlowRouting

• Eachcellcanbedoneindependentlyofallothers• SFD

• 100%flowinthedirectionofsteepestdecent(normallylowestneighbour)

• MFD• Flowisproportionedbetweenalllowerneighbours

• Proportionaltoslopetoeachneighbours

• Almostlinearspeed-up• Problemswithcodedivergence

• CUDAWarpssplitwhencodecontainsafork

3 2 47 5 87 1 9

3 2 47 5 87 1 9

SingleflowdirectionvsmultipleflowdirectionMFDis‘better’butmuchmorecomputationallydemanding

ParallelAccumulation:CorrectFlow

• Iterate:• Donotcomputeacelluntilithasnoincorrectcellsflowingintoit• Sumallinputsandaddself• Allcellscanworkindependentlyofeachother

• Somerestrictiononupdatesnothappeningimmediately

FlowRouting Accumulation Correct

1 1

1 11 1 1 1 1

1 1

3 2 2

2 2

4 6 3

4

5 6 197 14

Cellvaluesarenotnormally1,buttheinitialrainfallonthecell

Notthewholestory…

• SinksandPlateaus

• Can’tworkoutflowroutingonsinksandplateaus• Needto‘fake’aflowrouting

• Fillasinkuntilitcanflowout• Turneditintoaplateau

• Fakeflowdirectionsonaplateautotheoutlet

ParallelPlateaurouting

• Needtofindtheoutflowofaplateauandflowallwatertoit• Acommonsolutionistouseabreadthfirstsearchalgorithm

• Parallelimplementation• Thoughresultdoeslook‘unnatural’• Alternativepatternsarepossible– butacceptable

• Weareinvestigatingalternativesolutions

Sinkfilling

• Dealingwithasinglesinkis(relatively)simple• Fillsinkuntilweendupwithaplateau(lake)

• Butwhatifwehavemultiplenestedsinks?

NestedSinkfilling

• ImplementedparallelversionofthesinkfillingalgorithmproposedbyArger etal[2003]

• Identifyeachsink(parallel)• Determinewhichcellsflowintothissink- watershed(parallel)• Determinethelowestcelljoiningeachpairofsinks(parallel/sequential)• WorkouthowhighcellsineachsinkneedtoberaisedtotoallowallcellstoflowoutoftheDEM(sequential)

• Fillallsinkcellstothisheight(parallel)

GPGPUSolution:PARALLEM

• MassivelyparallelversionoftheLEM• ForDirection(includingplateauandsinks)andAccumulation

• Processhasnowbeenparallelized• onNVIDIAFermi/Keplerbasedgraphicscards

• TeslaC2050,GTX580,K20,K40,K80• ~twoordersofmagnitudespeedupovertheoptimizedsequentialcode(upto56mcells)

Outline

• WhatisLandscapeEvolutionModelling(LEM)• ParallelizationofLEM• PreliminaryResults

Results:Performance• Overallperformance

0.01

0.1

1

10

100

1000

0.01 0.1 1 10 100

Tim

e (s

)

DEM size

CybErosion-slimTesla single iteration580 single iteration

Tesla average 10580 average 10

(millions)

Results:Performance• FlowDirection

• Includingsink&plateausolution

0.0001

0.001

0.01

0.1

1

10

100

1000

0.001 0.01 0.1 1 10 100

Tim

e (s

)

DEM size (millions)

Sequential Flow DirectionTesla Flow Direction580 Flow Direction

Terraflow Flow Direction

0.001

0.01

0.1

1

10

100

1000

0.001 0.01 0.1 1 10 100

Tim

e (s

)

DEM size (millions)

Sequential Flow AccumulationTesla Flow Accumulation580 Flow Accumulation

Terraflow Flow Accumulation

Results:Performance• FlowAccumulation

0.001

0.01

0.1

1

10

100

1000

0.001 0.01 0.1 1 10 100

Tim

e (s

)

DEM size (millions)

Sequential Flow AccumulationTesla Flow Accumulation580 Flow Accumulation

Terraflow Flow AccumulationTesla K20 Flow Accumulation

0.001

0.01

0.1

1

10

100

1000

0.001 0.01 0.1 1 10 100

Tim

e (s

)

DEM size (millions)

Sequential Flow AccumulationTesla Flow Accumulation580 Flow Accumulation

Terraflow Flow AccumulationTesla K20 Flow Accumulation

Tesla K20 - removed conditionals

3. PARALLEMpipeline

4. UpperThamesDigitalElevationmodel

TheCurrentSimulation

• CoreModelnowextendedwithprocesses• Mostonlyaffectindividualcells(weathering,vegetation)

• SomehavecrossDEMeffects(massmovement)butcanusesameprocessasbefore

12.SpatialpatternsofLandscapechangeovertime

125-120k

20-15k

Notewhiteisdepositiontodarkerosion.

TheCurrentSimulation• ActivelyrunninglandscapemodelsonK40/K80GPGPUs• Taking~7weekstorunourmodel(MFD)

• Leadingtointerestingresults• Notseenasmodelshavetraditionallybeenmuchsmaller

• Currentlyrunningonjust1GPGPU• Runningmultiplemodelssimultaneously

• Nowhaveamulti-GPGPUcodeforrunningflowaccumulation

• Designedto‘sweep’overthelandscape

UpperThamesValley+120K

0

10

20

30

40

50

60

70

80

90

100

0 500 1000 1500 2000 2500 3000 3500 4000 4500 5000

138

Problem:LandscapeCuttingwithSFD

0

10

20

30

40

50

60

70

80

90

100

0 500 1000 1500 2000 2500 3000 3500 4000 4500 5000

125 138

0

10

20

30

40

50

60

70

80

90

100

0 500 1000 1500 2000 2500 3000 3500 4000 4500 5000

115 125 138

0

10

20

30

40

50

60

70

80

90

100

0 500 1000 1500 2000 2500 3000 3500 4000 4500 5000

70 115 125 138

0

10

20

30

40

50

60

70

80

90

100

0 500 1000 1500 2000 2500 3000 3500 4000 4500 5000

50 70 115 125 138

0

10

20

30

40

50

60

70

80

90

100

0 500 1000 1500 2000 2500 3000 3500 4000 4500 5000

25 50 70 115 125 138

0

10

20

30

40

50

60

70

80

90

100

0 500 1000 1500 2000 2500 3000 3500 4000 4500 5000

20 25 50 70 115 125 138

0

10

20

30

40

50

60

70

80

90

100

0 500 1000 1500 2000 2500 3000 3500 4000 4500 5000

15 20 25 50 70 115 125 138

0

10

20

30

40

50

60

70

80

90

100

0 500 1000 1500 2000 2500 3000 3500 4000 4500 5000

10 15 20 25 50 70 115 125 138

0

10

20

30

40

50

60

70

80

90

100

0 500 1000 1500 2000 2500 3000 3500 4000 4500 5000

5 10 15 20 25 50 70 115 125 138

SFD MFD

Comparing‘cutin’betweenSFDandMFD

70

72

74

76

78

80

82

84

0 500 1000 1500 2000 2500 3000

1k 20kSFD 20kMFD

Problem:AlgorithmSlow-down

• Correctflowalgorithmrequiresallinputcellstobecorrectbeforeprogressing

• Becomesaproblemforrivers

0

20

40

60

80

100

1 10 100 1000 10000

Perc

enta

ge C

ompl

ete

Iteration

• Correctflowcompletionprofile

Summary

• Abletoshow2+ordersofmagnitudespeedupPARALLEM• Significantpotentialforfurtherspeedup

• Optimizationoftheprocesses• Removesequentialization ofcorrectflow

• TheuseofGPGPUshasallowedustoredresstheexecutionrestrictionwhichhaspreventedusdoingMFD– leadingto‘better’landscapes

stephen.mcgough@newcastle.ac.ukdarrel.maddy@newcastle.ac.ukJ.Wainwright,S.Liang,M.Rapoportas,A.Trueman,R.Grey,G.KumarVinod,andJamesBell

WeArerecruiting:- 2 PostDoc (MachineLearning)- 1PostDoc (ParallelProgramming)- AlwayslookingforgoodPhDCandidates