I/O Profiling Towards the Exascale


Transcript of I/O Profiling Towards the Exascale

Page 1: I/O Profiling Towards the Exascale

I/O Profiling Towards the Exascale
[email protected], ZIH, Technische Universität Dresden
NEXTGenIO & SAGE: Working towards Exascale I/O
Barcelona, May 19th, 2017

Page 2: I/O Profiling Towards the Exascale

NEXTGenIO facts

Project
•  Research & Innovation Action
•  36-month duration
•  €8.1 million

Partners
•  EPCC
•  Intel
•  Fujitsu
•  BSC
•  TUD
•  Allinea
•  ECMWF
•  ARCTUR


Page 3: I/O Profiling Towards the Exascale

Approx. 50% committed to hardware development

•  Note: final configuration may differ


Page 4: I/O Profiling Towards the Exascale

Intel™ DIMMs are a key feature

•  Non-volatile RAM
  •  3D XPoint technology
•  Much larger capacity than DRAM
•  Slower than DRAM
  •  By a certain factor
  •  Significantly faster than SSDs
•  12 DIMM slots per socket
  •  Combination of DDR4 and Intel™ DIMMs


Page 5: I/O Profiling Towards the Exascale

Three usage models

•  The “memory” usage model
  •  Extension of the main memory
  •  Data is volatile like normal main memory
•  The “storage” usage model
  •  Classic persistent block device
  •  Like a very fast SSD
•  The “application direct” usage model
  •  Maps persistent storage into the address space
  •  Direct CPU load/store instructions (see the sketch below)
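Below is a minimal sketch of the “application direct” model using libpmem from Intel's PMDK (formerly NVML). The path /mnt/pmem/example and the mapping size are placeholder assumptions; the project's actual systemware interfaces may differ.

```c
/* Application-direct sketch: map a persistent-memory file into the
 * address space and access it with ordinary CPU load/store instructions.
 * Path and size are placeholders. */
#include <libpmem.h>
#include <stdio.h>
#include <string.h>

int main(void)
{
    size_t mapped_len;
    int is_pmem;

    char *buf = pmem_map_file("/mnt/pmem/example", 4096,
                              PMEM_FILE_CREATE, 0666,
                              &mapped_len, &is_pmem);
    if (buf == NULL) {
        perror("pmem_map_file");
        return 1;
    }

    /* Ordinary stores write the data ... */
    strcpy(buf, "hello, persistent world");

    /* ... and an explicit flush makes it durable. */
    if (is_pmem)
        pmem_persist(buf, mapped_len);
    else
        pmem_msync(buf, mapped_len);

    pmem_unmap(buf, mapped_len);
    return 0;
}
```

The “memory” and “storage” models need no such changes: the former is transparent to applications, the latter looks like a conventional block device.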


Page 6: I/O Profiling Towards the Exascale

New members in memory hierarchy

•  New memory technology
•  Changes the memory hierarchy we have
•  Impact on applications, e.g. simulations?
•  I/O performance is one of the critical components for scaling up HPC applications and enabling HPDA applications at scale

[Figure: Memory & Storage Latency Gaps. HPC systems today: CPU (register, cache), DRAM, spinning storage disk, storage tape. HPC systems of the future: CPU (register, cache), DRAM, NVRAM DIMMs on the sockets, SSD storage, spinning disk (MAID), storage tape; the NVRAM tier narrows the latency gap between memory and storage.]


Page 7: I/O Profiling Towards the Exascale

Remote memory access on top

•  Network hardware will support remote access
•  Data in NVDIMMs
  •  To be shared between nodes
•  Systemware
  •  Support remote access
  •  Data partitioning and replication


Page 8: I/O Profiling Towards the Exascale

[Diagram: compute nodes, each with local memory, connected over the network to a global filesystem.]

Using distributed storage

•  Global file system
  •  No changes to apps
•  Required functionality
  •  Create and tear down file systems for jobs
  •  Works across nodes
  •  Preload and post-move filesystems
  •  Support multiple filesystems across the system
•  I/O performance
  •  Sum of many layers


Page 9: I/O Profiling Towards the Exascale

[Diagram: compute nodes, each with local memory, connected over the network to an object store and the filesystem.]

Using an object store

•  Needs changes in apps
•  Needs the same functionality as a global filesystem
•  Removes the need for POSIX functionality
•  I/O performance
  •  Different type of abstraction
  •  Mapping to objects
  •  Different kind of instrumentation


Page 10: I/O Profiling Towards the Exascale

[Diagram: multiple jobs (Job1 through Job4) sharing resident data sets through the filesystem.]

Towards workflows

•  Resident data sets
  •  Sharing preloaded data across a range of jobs
  •  Data analytic workflows
  •  How to control access/authorisation/security/etc.?
•  Workflows
  •  Producer-consumer model
  •  Remove the file system from intermediate stages
•  I/O performance
  •  Data merging/integration?


Page 11: I/O Profiling Towards the Exascale

Tools have three key objectives

•  Analysis tools need to
  •  Reveal performance interdependencies in I/O and the memory hierarchy
  •  Support workflow visualization
  •  Exploit NVRAM to store their own data
•  (Workload modelling)


Page 12: I/O Profiling Towards the Exascale

Vampir & Score-P


Page 13: I/O Profiling Towards the Exascale

How to meet the objectives?

•  File I/O, NVRAM performance
•  Monitoring (data acquisition)
  •  Sampling
  •  Tracing
•  Statistical analysis (profiles)
•  Time series analysis
•  Multiple layers
  •  Simultaneously
  •  Topology context
•  Workflow support
  •  Merge and relate performance data
•  Data sources


Page 14: I/O Profiling Towards the Exascale

Tapping the I/O layers

•  I/O layers
  •  POSIX
  •  MPI-I/O
  •  HDF5
  •  NetCDF
  •  PNetCDF
  •  File system (Lustre, ADIOS)
•  Data of interest
  •  Open/create/close operations (metadata)
  •  Data transfer operations (see the POSIX interception sketch below)
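One common way to tap the POSIX layer is to interpose wrappers on the I/O calls, as measurement libraries such as Score-P or Darshan do. The sketch below is an illustrative LD_PRELOAD wrapper, not the actual Score-P implementation.

```c
/* Illustrative LD_PRELOAD interception of the POSIX read() call: forward
 * to the real libc symbol and record what a profiler would log as an
 * I/O event (here just printed to stderr). */
#define _GNU_SOURCE
#include <dlfcn.h>
#include <stdio.h>
#include <unistd.h>

static ssize_t (*real_read)(int, void *, size_t) = NULL;

ssize_t read(int fd, void *buf, size_t count)
{
    if (real_read == NULL)  /* resolve the next "read" in the link chain */
        real_read = (ssize_t (*)(int, void *, size_t))dlsym(RTLD_NEXT, "read");

    ssize_t ret = real_read(fd, buf, count);

    /* A real tool would store a timestamped trace event instead. */
    fprintf(stderr, "read(fd=%d, requested=%zu) -> %zd bytes\n",
            fd, count, ret);
    return ret;
}
```

Built as a shared object (e.g. gcc -shared -fPIC -o libiowrap.so iowrap.c -ldl) and preloaded with LD_PRELOAD, this catches every POSIX read of the application. Higher layers are tapped differently, e.g. MPI-I/O through the PMPI profiling interface.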


Page 15: I/O Profiling Towards the Exascale

What the NVM library tells us

•  Allocation and free events (illustrated below)
  •  Information
    •  Memory size (requested, usable)
    •  High-water-mark metric
    •  Size and number of elements in memory
•  NVRAM health status
  •  Not measurable at high frequencies
•  Individual NVRAM loads/stores
  •  Remain out of scope (e.g. memory-mapped files)
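As a concrete illustration of the events such a library exposes, here is a minimal allocation/free pair using memkind's file-backed PMEM kind; the NVM library used in the project may differ, and the directory /mnt/pmem and the sizes are assumptions.

```c
/* Allocation/free events on NVRAM via memkind, as a profiler would see
 * them: requested size and element count per allocation. Paths and sizes
 * are placeholders. */
#include <memkind.h>
#include <stdio.h>

int main(void)
{
    memkind_t pmem_kind;

    /* Create a kind backed by a directory on a pmem-aware file system. */
    if (memkind_create_pmem("/mnt/pmem", 64 * 1024 * 1024, &pmem_kind) != 0) {
        fprintf(stderr, "memkind_create_pmem failed\n");
        return 1;
    }

    size_t n = 1u << 20;  /* roughly one million doubles */
    double *field = memkind_malloc(pmem_kind, n * sizeof(double));
    if (field == NULL) {
        fprintf(stderr, "memkind_malloc failed\n");
        return 1;
    }
    printf("allocated %zu elements (%zu bytes) on NVRAM\n",
           n, n * sizeof(double));

    memkind_free(pmem_kind, field);
    memkind_destroy_kind(pmem_kind);
    return 0;
}
```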


Page 16: I/O Profiling Towards the Exascale

Memory Access Statistics

•  Memory access hotspots when using DRAM and NVRAM?
  •  Where? When? Which type of memory?
•  Metric collection needs to be extended
  1.  DRAM local access
  2.  DRAM remote access (on a different socket)
  3.  NVRAM local access
  4.  NVRAM remote access (on a different socket)


Page 17: I/O Profiling Towards the Exascale

Access to PMU using perf

•  Architecture-independent counters
  •  May introduce some overhead
  •  MEM_TRANS_RETIRED.LOAD_LATENCY
  •  MEM_TRANS_RETIRED.PRECISE_STORE
  •  Guess: it will also work for NVRAM?
•  Architecture-dependent counters
  •  Counters for DRAM
    •  MEM_LOAD_UOPS_L3_MISS_RETIRED.REMOTE_DRAM
    •  MEM_LOAD_UOPS_L3_MISS_RETIRED.LOCAL_DRAM
  •  MEM_LOAD_UOPS_*.REMOTE_NVRAM ?
  •  MEM_LOAD_UOPS_*.LOCAL_NVRAM ?
A minimal perf_event_open sketch for reading such counters follows below.
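The sketch counts a generic hardware cache-miss event via perf_event_open(2); mapping the specific events listed above (MEM_TRANS_RETIRED.*, MEM_LOAD_UOPS_*) requires their CPU-specific raw codes, e.g. looked up with libpfm or perf list.

```c
/* Count a hardware PMU event around a region of interest using
 * perf_event_open(2). The event here is the generic cache-miss counter;
 * the NVRAM-related events remain hypothetical, as noted above. */
#define _GNU_SOURCE
#include <linux/perf_event.h>
#include <sys/syscall.h>
#include <sys/ioctl.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

static int perf_event_open(struct perf_event_attr *attr, pid_t pid,
                           int cpu, int group_fd, unsigned long flags)
{
    return syscall(__NR_perf_event_open, attr, pid, cpu, group_fd, flags);
}

int main(void)
{
    struct perf_event_attr attr;
    memset(&attr, 0, sizeof(attr));
    attr.type = PERF_TYPE_HARDWARE;
    attr.size = sizeof(attr);
    attr.config = PERF_COUNT_HW_CACHE_MISSES;
    attr.disabled = 1;
    attr.exclude_kernel = 1;

    int fd = perf_event_open(&attr, 0, -1, -1, 0);  /* this process, any CPU */
    if (fd < 0) {
        perror("perf_event_open");
        return 1;
    }

    ioctl(fd, PERF_EVENT_IOC_RESET, 0);
    ioctl(fd, PERF_EVENT_IOC_ENABLE, 0);

    /* ... memory-bound region of interest ... */

    ioctl(fd, PERF_EVENT_IOC_DISABLE, 0);
    uint64_t count = 0;
    if (read(fd, &count, sizeof(count)) == sizeof(count))
        printf("cache misses: %llu\n", (unsigned long long)count);

    close(fd);
    return 0;
}
```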


Page 18: I/O Profiling Towards the Exascale

I/O operations over time


[Screenshot labels: Individual I/O operation; I/O runtime contribution]

Page 19: I/O Profiling Towards the Exascale

I/O data rate over time


[Screenshot label: I/O data rate of a single thread]

Page 20: I/O Profiling Towards the Exascale

I/O summaries with totals


Other metrics:
•  IOPS
•  I/O time
•  I/O size
•  I/O bandwidth

Page 21: I/O Profiling Towards the Exascale

I/O summaries per file


Page 22: I/O Profiling Towards the Exascale

I/O operations per file


[Screenshot labels: Focus on a specific resource; Show all resources]

Page 23: I/O Profiling Towards the Exascale

Taken from my daily work...

•  Bringing the system I/O down with a single (serial) application
•  Higher I/O demand than the IOR benchmark
•  Why?


Page 24: I/O Profiling Towards the Exascale

Coarse-grained time series reveal some clues, but...


Page 25: I/O Profiling Towards the Exascale

Details make a difference


[Screenshot annotation: A single NetCDF get_vara_float triggers ... 15(!) POSIX read operations]

Page 26: I/O Profiling Towards the Exascale

Approaching the real cause


[Screenshot annotations: A single NetCDF get_vara_float triggers ... 15(!) POSIX read operations. Even worse: NetCDF reads 136 KB to provide just 2 KB.]
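For reference, the call under discussion looks like the following; the file name, variable name and extents are placeholder assumptions. The profiling result above is that one such call fanned out into 15 small POSIX reads.

```c
/* Reading a small hyperslab with nc_get_vara_float(); here 512 floats
 * (2 KB), which underneath triggered many small POSIX reads. Names and
 * extents are illustrative. */
#include <netcdf.h>
#include <stdio.h>

int main(void)
{
    int ncid, varid, status;

    if ((status = nc_open("data.nc", NC_NOWRITE, &ncid)) != NC_NOERR) {
        fprintf(stderr, "nc_open: %s\n", nc_strerror(status));
        return 1;
    }
    if ((status = nc_inq_varid(ncid, "temperature", &varid)) != NC_NOERR) {
        fprintf(stderr, "nc_inq_varid: %s\n", nc_strerror(status));
        return 1;
    }

    size_t start[3] = {0, 0, 0};
    size_t count[3] = {1, 1, 512};   /* 512 floats = 2 KB */
    float slab[512];

    status = nc_get_vara_float(ncid, varid, start, count, slab);
    if (status != NC_NOERR) {
        fprintf(stderr, "nc_get_vara_float: %s\n", nc_strerror(status));
        return 1;
    }

    nc_close(ncid);
    return 0;
}
```

Read amplification of this kind often stems from a mismatch between the access pattern and the file's internal layout or the library's buffering.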

Page 27: I/O Profiling Towards the Exascale

Before and after…


Page 28: I/O Profiling Towards the Exascale

Summary

•  NEXTGenIO is developing a full hardware and software solution

•  Performance focus
  •  Consider the complete I/O stack
  •  Incorporate new I/O paradigms
  •  Study implications of NVRAM
•  Reduce I/O costs
•  New usage models for HPC and HPDA
