Photos placed in horizontal position between photos and...

25
Sandia National Laboratories is a multi-program laboratory managed and operated by Sandia Corporation, a wholly owned subsidiary of Lockheed Martin Corporation, for the U.S. Department of Energy’s National Nuclear Security Administration under contract DE-AC04-94AL85000. SAND NO. SAND2017-4325 C The Impact of Increasing Memory System Diversity on Applications Gwen Voskuilen Arun Rodrigues, Mike Frank, Si Hammond 4/25/2017

Transcript of Photos placed in horizontal position between photos and...

Page 1: Photos placed in horizontal position between photos and ...salishan.ahsc-nm.org/uploads/4/9/7/0/49704495/2017-voskuilen.pdf · 0.4 0.6 0.8 1 1.2 1.4 addMFRPU addMFU addMRPU addRAND

Photos placed in horizontal position with even amount of white space

between photos and header

Sandia National Laboratories is a multi-program laboratory managed and operated by Sandia Corporation, a wholly owned subsidiary of Lockheed Martin Corporation, for the U.S. Department of Energy’s National Nuclear Security Administration under contract DE-AC04-94AL85000. SAND NO. SAND2017-4325 C

TheImpactofIncreasingMemorySystemDiversityonApplications

GwenVoskuilenArun Rodrigues,MikeFrank,SiHammond

4/25/2017

Page 2: Photos placed in horizontal position between photos and ...salishan.ahsc-nm.org/uploads/4/9/7/0/49704495/2017-voskuilen.pdf · 0.4 0.6 0.8 1 1.2 1.4 addMFRPU addMFU addMRPU addRAND

Diversityinmemoryhierarchy

§ Newmemorytechnologies– stackedDRAM,non-volatile§ Advantages:higherbandwidth,persistent,etc.§ Disadvantages:expensive,higherlatencies

§ Solution:multi-levelmemory(MLM)§ Mixmemoriestoexposeadvantages,hidedisadvantages§ Reallyhardinpractice

§ Mustcarefullylocatedatabasedondatacharacteristics§ Whichdatagoesintowhichmemory?Whodecides?

2

Automatic Manual

Hardware Hardwaredecides

Software OS/runtimedecides

Application decides

Page 3: Photos placed in horizontal position between photos and ...salishan.ahsc-nm.org/uploads/4/9/7/0/49704495/2017-voskuilen.pdf · 0.4 0.6 0.8 1 1.2 1.4 addMFRPU addMFU addMRPU addRAND

ManagingMLM

§ TrinityKNL:SmallstackedDRAM+largeDDRDRAM§ Applicationdatafootprint>>stackedDRAMcapacity

§ Managementrequiresidentifying andselectivelyallocatingdataneedingbandwidthintostackedDRAM

§ Study1:Softwaremanagementpolicies§ Rangingfromsimpletocomplex§ Developedanalysistool,MemSieve, toevaluatememorybehavior

§ Study2:Hardwarecaching§ Insertion/evictionpolicies

3

Page 4: Photos placed in horizontal position between photos and ...salishan.ahsc-nm.org/uploads/4/9/7/0/49704495/2017-voskuilen.pdf · 0.4 0.6 0.8 1 1.2 1.4 addMFRPU addMFU addMRPU addRAND

Outline

§ Methodology

§ Software/manuallymanagedMLM(malloc()based)

§ Hardware/automaticmanagedMLM

§ Conclusions

4

Page 5: Photos placed in horizontal position between photos and ...salishan.ahsc-nm.org/uploads/4/9/7/0/49704495/2017-voskuilen.pdf · 0.4 0.6 0.8 1 1.2 1.4 addMFRPU addMFU addMRPU addRAND

SimulationMethodology

§ MiniApps:HPCG,PENNANT,SNAP§ Memoryfootprintsof1-8GBà samplingrequired

§ SimulatedontwoarchitecturesusingSST§ Lightweight:72smallcores,mesh,privateL1,semi-privateL2§ Heavyweight:8bigcores,ring,privateL1+L2,sharedL3§ Both:DDRandHMCmemories

5

TileHMCDDR

C CL1 L1

L2

L3Core

Mem

Core

L3

Core

Core

CoreCore

Core

Core

L3

L3

L3

L3

L3

L3

Mem

Mem

Mem

Core

L1

L2

Page 6: Photos placed in horizontal position between photos and ...salishan.ahsc-nm.org/uploads/4/9/7/0/49704495/2017-voskuilen.pdf · 0.4 0.6 0.8 1 1.2 1.4 addMFRPU addMFU addMRPU addRAND

Outline

§ Methodology

§ Software/manuallymanagedMLM

§ Hardware/automaticmanagedMLM

§ Conclusions

6

Page 7: Photos placed in horizontal position between photos and ...salishan.ahsc-nm.org/uploads/4/9/7/0/49704495/2017-voskuilen.pdf · 0.4 0.6 0.8 1 1.2 1.4 addMFRPU addMFU addMRPU addRAND

Softwareapproaches

Software management

OS / Runtime

Static

Greedily insert pages

into HMC

Greedily insert mallocs

into HMC

Dynamic(future work)

Programmer

Static

Direct "best" mallocs to

HMC

Dynamic

Migrate to put current "best"

to HMC

7Increased performance?

Page 8: Photos placed in horizontal position between photos and ...salishan.ahsc-nm.org/uploads/4/9/7/0/49704495/2017-voskuilen.pdf · 0.4 0.6 0.8 1 1.2 1.4 addMFRPU addMFU addMRPU addRAND

Tradeoff

§ Automatic(OS)§ Easierforprogrammer§ Abletocaptureallocationsnotunderprogrammercontrol

§ Library,pre-programstart,etc.§ Page-tablecomplexity;potentiallyexpensivere-mapping§ Noprogramknowledgeà worseperformance?

§ Coulduseprogrammerhintsorruntimeprofilingbutmorework

§ Manual(programmer)§ Moreworkforprogrammer,pervasive(?)changes§ Notabletohandleallallocations§ Possibleconflictsbetweenapplicationandlibraryallocations

§ Whatiflibrariesdecidetomanageallocationforinternalstructurestoo?§ Knowledgeofprogrambehaviorà betterperformance?

8

Page 9: Photos placed in horizontal position between photos and ...salishan.ahsc-nm.org/uploads/4/9/7/0/49704495/2017-voskuilen.pdf · 0.4 0.6 0.8 1 1.2 1.4 addMFRPU addMFU addMRPU addRAND

Analysistool:MemSieve

§ Capturesanapplication’smemoryaccessesandcorrelatestotheapplication’smemoryallocations§ Filtersoutcachehits§ Withoutsimulatingfullmemoryhierarchyà 2.5X+faster

§ Keymeasurement:malloc density§ #accesses/size

§ Hypothesis:densemallocs shouldbeputinHMC§ AssumingsimilarlatenciesbetweenHMCandDDR

9

Page 10: Photos placed in horizontal position between photos and ...salishan.ahsc-nm.org/uploads/4/9/7/0/49704495/2017-voskuilen.pdf · 0.4 0.6 0.8 1 1.2 1.4 addMFRPU addMFU addMRPU addRAND

Malloc analysis

Pennant HPCG Snap * MiniPICMalloc count 8B 23M 1B 438KMalloc size 32.1 TB 7.43 GB 30GB 7.9GBDistinct traces 248 612 323 39043Accessed traces 140 146 90 10794Size of accessed traces as % total

89.7% 99.987% 89.6% 84%

10

*Iterations from beginning & middle only

§ Manymallocs butfewdistinctmalloc calltraces(locations)§ Reasonsmallocs arenotaccessed

§ Sameaddressmalloc’d repeatedlyà cache-resident§ Malloc wasnotaccessedinprofiledsectionofapplication

Page 11: Photos placed in horizontal position between photos and ...salishan.ahsc-nm.org/uploads/4/9/7/0/49704495/2017-voskuilen.pdf · 0.4 0.6 0.8 1 1.2 1.4 addMFRPU addMFU addMRPU addRAND

Idealmalloc behavior§ Good: Afew,small,verydensemallocs§ Bad:Many,equallydensemallocs;densestarebig

11

Den

sity

Mallocs

Big density variation: less work to manage

% a

cces

ses

(cum

ulat

ive)

Size

Mallocs

% a

cces

ses

(cum

ulat

ive)

Lots of accesses in a very small region

More dense Less dense

Page 12: Photos placed in horizontal position between photos and ...salishan.ahsc-nm.org/uploads/4/9/7/0/49704495/2017-voskuilen.pdf · 0.4 0.6 0.8 1 1.2 1.4 addMFRPU addMFU addMRPU addRAND

Malloc density

12

0%10%20%30%40%50%60%70%80%90%100%

0

0.5

1

1.5

2

2.5

3

3.5

4

% o

f tot

al a

cces

ses

Den

sity

(acc

esse

s/by

te)

Malloc call sites, from most to least dense

DensityAccesses

0%

20%

40%

60%

80%

100%

0

0.25

0.5

0.75

1

1.25

1.5

1.75

%of

tota

l acc

esse

s

Den

sity

(acc

esse

s/by

te)

Malloc call sites, from most to least dense

Density

Accesses

0%10%20%30%40%50%60%70%80%90%100%

0

8

16

24

32

% o

f tot

al a

cces

ses

Cum

ulat

ive

size

(TB

)

Malloc call sites, from most to least dense

Size (TB)Accesses

0%

20%

40%

60%

80%

100%

0

1

2

3

4

5

6

7

8

% o

f tot

al a

cces

ses

Cum

ulat

ive

size

(GB

)

Malloc call sites, from most to least dense

Size (MB)Accesses

HPCGPENNANT

Page 13: Photos placed in horizontal position between photos and ...salishan.ahsc-nm.org/uploads/4/9/7/0/49704495/2017-voskuilen.pdf · 0.4 0.6 0.8 1 1.2 1.4 addMFRPU addMFU addMRPU addRAND

HMCPotentialPerformance§ Max:8X§ Trendsdonotchangewithdatasetsize(1-8GB)

13

0

2

4

6

MiniPIC charge

MiniPIC field

MiniPIC move

SNAP p0 SNAP p1/2 HPCG PENNANTSpee

dup

over

all

DD

R

Heavyweight Architecture: Performance with all HMC

0

2

4

6

MiniPIC charge

MiniPIC field

MiniPIC move

SNAP p0 SNAP p1 HPCG PENNANTSpee

dup

over

all

DD

R

Lightweight Architecture: Performance with all HMC

Page 14: Photos placed in horizontal position between photos and ...salishan.ahsc-nm.org/uploads/4/9/7/0/49704495/2017-voskuilen.pdf · 0.4 0.6 0.8 1 1.2 1.4 addMFRPU addMFU addMRPU addRAND

Manualallocation:PENNANT

§ Largeperformancejumpfrom25%to50%HMC§ Dynamicmigrationnecessary

14

0

1

2

3

4

5

6

7

Greedy - page Greedy - malloc Static Dynamic

Spee

dup

over

DD

R o

nly

PENNANT

12.50% 25% 50% 100%

Page 15: Photos placed in horizontal position between photos and ...salishan.ahsc-nm.org/uploads/4/9/7/0/49704495/2017-voskuilen.pdf · 0.4 0.6 0.8 1 1.2 1.4 addMFRPU addMFU addMRPU addRAND

Manualallocation:HPCG

§ Again,largejumpfrom25%to50%HMC§ Page-based&staticperformsimilarly§ Dynamicnotbetter

§ Butgranularityofmigrationislarge

15

0

1

2

3

4

5

6

Greedy - page Greedy - malloc Static Dynamic

Spee

dup

over

DD

R o

nly

HPCG

12.50% 25% 50% 100%

Page 16: Photos placed in horizontal position between photos and ...salishan.ahsc-nm.org/uploads/4/9/7/0/49704495/2017-voskuilen.pdf · 0.4 0.6 0.8 1 1.2 1.4 addMFRPU addMFU addMRPU addRAND

Manualallocation:SNAP

§ Greedy-pageperformsthebest§ Twolargemallocs inSNAP,each42%oftotal

§ Medium/lowdensity§ Oncetheydon’tfit,HMCsize/malloc strategydoesn’tmatter

§ Suggestedcodechange§ Breakuplargemallocs toimproveHMCutilization

16

0

0.5

1

1.5

2

Greedy - page Greedy - malloc Static DynamicSpee

dup

over

DD

R o

nly

SNAP

12.50% 25% 50% 100%

Page 17: Photos placed in horizontal position between photos and ...salishan.ahsc-nm.org/uploads/4/9/7/0/49704495/2017-voskuilen.pdf · 0.4 0.6 0.8 1 1.2 1.4 addMFRPU addMFU addMRPU addRAND

Topologycomparison

§ Similartrend

17

0123456

Greedy -page

Greedy -malloc

Static Dynamic

Spee

dup

over

DD

R o

nly

PENNANT

12.50% 25% 50% 100%

01234567

Greedy -page

Greedy -malloc

Static Dynamic

Spee

dup

over

DD

R o

nly

PENNANT

12.50% 25% 50% 100%

Heavyweight Lightweight

Page 18: Photos placed in horizontal position between photos and ...salishan.ahsc-nm.org/uploads/4/9/7/0/49704495/2017-voskuilen.pdf · 0.4 0.6 0.8 1 1.2 1.4 addMFRPU addMFU addMRPU addRAND

Outline

§ Methodology

§ Software/manuallymanagedMLM

§ Hardware/automaticmanagedMLM

§ Conclusions

18

Page 19: Photos placed in horizontal position between photos and ...salishan.ahsc-nm.org/uploads/4/9/7/0/49704495/2017-voskuilen.pdf · 0.4 0.6 0.8 1 1.2 1.4 addMFRPU addMFU addMRPU addRAND

Hardwaremanagement

§ HardwaremanagementofMLMatthepagelevel§ CachepagesinHMC,pagestillresidesinDDR§ Comparedtoblocklevel:lowertrackingoverheadbuthigher

add/removeoverhead

§ Focuswashardwarecaching§ But,alsopossibletodocachingviaOS§ Usually,lessinformation(hits,misses,etc.)

19

Page 20: Photos placed in horizontal position between photos and ...salishan.ahsc-nm.org/uploads/4/9/7/0/49704495/2017-voskuilen.pdf · 0.4 0.6 0.8 1 1.2 1.4 addMFRPU addMFU addMRPU addRAND

AutomaticPage-LevelSwapping§ Additionpolicies

§ Replacementpolicies

20

Directory Controller

DDR Fast Memory

MLM Unit

Mapping Table

DMAPolicy

Dispatcher

Addition Policies Replacement Policies• addT: Simple Threshold• addMFU: Most Frequently Used• addRAND: 1:8192 chance• addMRPU: More Recent Previous Use• addMFRPU: More Frequent + More

Recent Previous Use• addSC: Deprioritize streams• addSCF: as addSC + More Frequent

• FIFO: First-in, First-out• LRU: Least Recently Used• LFU: Least Frequently Used• LFU8: LFU w/ 8-bit counter• BiLRU: BiModal LRU• SCLRU: Deprioritize streams

Page 21: Photos placed in horizontal position between photos and ...salishan.ahsc-nm.org/uploads/4/9/7/0/49704495/2017-voskuilen.pdf · 0.4 0.6 0.8 1 1.2 1.4 addMFRPU addMFU addMRPU addRAND

Performancevs.Policy

21

0.2

0.4

0.6

0.8

1

1.2

1.4

addMFRPU addMFU addMRPU addRAND addSC addT addSCF

Performan

ce

AddPolicy

Lulesh:MLMPerformancevsPolicy

BiLRU FIFO

LRU SCLRU

LFU8 LFU

0.2

0.4

0.6

0.8

1

1.2

1.4

addMFRPU addMFU addMRPU addRAND addSC addT addSCF

Performan

ce

AddPolicy

MiniFE:MLMPerformancevsPolicy

BiLRU FIFO

LRU SCLRU

LFU8 LFU

Addition policy: big variation

Replacement policy: little variation

“What you put in matters more than what you take out”

Page 22: Photos placed in horizontal position between photos and ...salishan.ahsc-nm.org/uploads/4/9/7/0/49704495/2017-voskuilen.pdf · 0.4 0.6 0.8 1 1.2 1.4 addMFRPU addMFU addMRPU addRAND

Largerdatasets

§ Lookedathighestperformingadditionpolicies§ Variantsofmost-frequentlyused§ Baseline:random§ LRUreplacement

22

01234567

1024 8192 65536

Perf(1

=nofastm

em)

Pages

Pennant-bPerformance:AdditionaddMFRPU AllFastaddRand addSCFaddMFU

0

0.5

1

1.5

2

1024 8192 65536

Perf(1

=nofastm

em)

Pages

Snap-p0Performance:Addition

addMFRPU AllFastaddRand addSCFAddMFU

0

2

4

6

1024 8192 65536Perf(1

=nofastm

em)

Pages

HPCGPerformance:AdditionaddMFRPU Series2addRAND addSCFaddMFU

Page 23: Photos placed in horizontal position between photos and ...salishan.ahsc-nm.org/uploads/4/9/7/0/49704495/2017-voskuilen.pdf · 0.4 0.6 0.8 1 1.2 1.4 addMFRPU addMFU addMRPU addRAND

FineTuning1. Thresholds

2. Pagesize3. Throttling

23

0

0.5

1

1.5

2

0 20 40 60 80

Performance

Threshold

PennantThreshold

0

1

2

3

4

5

9 11 13 15

Rel.Pe

form

PageSize(2x B)

PennantPagesizeEffects

128M256M

1.45

1.5

1.55

1.6

9 11 13 15

Rel.Pe

form

PageSize(2x B)

snap-p0PagesizeEffects

128M256M

0.5

1.5

2.5

3.5

0 200 400 600 800 1000

Perfo

rman

ce

Threshold

MLMPerformacevs.Threshold(addT/LRU)

CoMDlammpsluleshminiFE

0

0.2

0.4

0.6

0.8

1

CoMD lammps lulesh miniFEPe

rforman

ce

SwapThro0ling

Thro;le

NoThro;le

Page 24: Photos placed in horizontal position between photos and ...salishan.ahsc-nm.org/uploads/4/9/7/0/49704495/2017-voskuilen.pdf · 0.4 0.6 0.8 1 1.2 1.4 addMFRPU addMFU addMRPU addRAND

Conclusions§ Applicationbehaviorvariessignificantly§ Softwaremanagementisfeasible

§ Smalltomoderatenumberofdense callsites§ Staticallocationsufficientinsomecases,dynamicnecessaryinothers

§ Forhardwaremanagement,additionpolicymattersmost§ Forautomatic&manual,profilingisinstrumental

§ Applicationmanaged– helpsidentifyhigh-bandwidthdata§ Automaticallymanaged– helpsidentifyplaceswhereapplication

changeswillimproveperformance

24

Page 25: Photos placed in horizontal position between photos and ...salishan.ahsc-nm.org/uploads/4/9/7/0/49704495/2017-voskuilen.pdf · 0.4 0.6 0.8 1 1.2 1.4 addMFRPU addMFU addMRPU addRAND