BDVA & ETP4HPC Workshop - EXDCI
6-7-2017, www.bdva.eu
Agenda (agenda item, actions, time)

Welcome & Introductions (08:00-08:15)

1. Common HPC and BD Glossary: determine a common reference to understand possible relationships between a typical HPC stack and a BD Analytics stack
   BDVA lead: Jim Kenneally (Intel Corp.)
   HPC lead: Mark Asch (Univ. de Picardie), Hans-Christian Hoppe (Intel)
2. Cross-Pollination of HPC and BD Technologies: which technologies/approaches from an HPC stack or a BD Analytics stack can benefit the other's needs, e.g. hybrid systems that incorporate select elements from either HPC or BD technologies/approaches
   BDVA lead: Nenad Stojanovic (Nissatech), supported by Gabriel Antoniu (Inria) & Alexandru Costan (Irisa)
   HPC lead: Costas Bekas (IBM Research, available at 10am), Mark Asch (Univ. de Picardie)
3. Extreme BD Workloads: understand bottlenecks through better appreciation of centralised and decentralised processing of extreme big data workloads
   BDVA lead: Maria Perez (UPM)
   HPC lead: Mark Asch, Stephane Requena (GENCI)
4. Collaboration between HPC CoEs and BD CoEs: facilitate better communication between HPC CoEs (https://exdci.eu/collaboration/coe) and BD CoEs (http://i-know.tugraz.at/european-network/)
   BDVA lead: Paul Czech (Know-Center)
   HPC lead: Erwin Laure (PDC-KTH), available 11-12
5. Extreme Scale Demonstrators (ESDs) and Exascale co-design: information update only
   HPC lead: Michael Malms (ETP4HPC)
6. User engagement, including understanding the user base (UX analysis), skills development and business models
   BDVA lead: Andrea Manieri (ENG)
   HPC lead: Francois Bodin (IRISA), Catherine Inglis (EPCC)
7. Explore options for possible collaborations in view of the forthcoming WP18-20
   Lead: all

Close (12:00)
Timings for the numbered items: to be agreed
July 4th, 2017 EXDCI WP25
HPC, Big Data & Deep Learning Stacks

[Figure: three technology stacks shown side by side (HPC, Big Data, Deep Learning), each divided into layers labelled Applications, Middleware & Mgmt., System SW and Hardware. The components recoverable from the figure are listed below per stack, in the order they appear in the extracted text.]

HPC: Infiniband & OPA fabrics; storage & I/O nodes; x86 nodes, GPUs, FPGAs; Linux OS variant; containers; PFS (Lustre etc.); MPI; OpenMP, threading; accelerator APIs; numerical libraries; performance & debugging; domain-specific libraries; compiled languages (C, C++, FORTRAN); scripting languages (Python); IDEs & frameworks (PETSc, …); compiled in-house, commercial & OSS applications; cluster management (OpenHPC); batch scheduling (SLURM …)

Big Data: Linux OS variant (some Windows); Ethernet fabrics; local storage; x86 hyper-convergent nodes; virtualization: hypervisor or containers (Docker, Kubernetes, …); VMM and container management; I/O libraries (HDF5, …); orchestration and RMS; cloud service I/F; storage systems (DFS, key/value, …); Map-Reduce processing (Hadoop, Spark); data stream processing (Storm, …); distributed coordination (Zookeeper, …); workflows combining many application elements; compiled languages (C++); traditional ML (Mahout); scripting & WF languages (R, Python, Java, …)

Deep Learning: Linux OS variant (Windows?); Ethernet (traditional); x86 + GPU/FPGA, TPU; Infiniband + OPA (scale-out); virtualization: hypervisor or containers (Docker, Kubernetes, …); VMM and container management; orchestration and RMS; cloud service I/F; storage systems (DFS, key/value, …); numerical libraries (dense LA); neural network frameworks (Caffe, Torch, Theano, …); load distribution layer; accelerator APIs; scripting languages (Python, …); inference engines (low precision); defined and instantiated/trained neural networks (annotated in the figure: "Can be part of")

Source: Hans-Christian Hoppe, Intel Corp
Historical Differences between Big Data and HPC

Workload type: Big Data
Typical workload focus is: Data-intensive: devote most of their processing time to I/O and manipulation of data
Design principles for infrastructure and software are: optimised for cost (IOPS) first, rather than maximum performance

Workload type: HPC
Typical workload focus is: Compute-intensive: devote most of their execution time to computation
Design principles for infrastructure and software are: optimised for performance (FLOPS) first, rather than for minimal cost.
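The distinction above rests on where a workload spends most of its time. A minimal sketch of that rule follows; the function name, the two timing inputs and the "balanced" label for an exact tie are assumptions made for illustration, not part of the workshop material.

```python
def classify_workload(compute_seconds: float, io_seconds: float) -> str:
    """Label a job by where it spends most of its run time.

    Hypothetical helper: the simple majority split mirrors the
    "devote most of their time to ..." wording of the slide.
    """
    total = compute_seconds + io_seconds
    if total <= 0:
        raise ValueError("total run time must be positive")
    io_share = io_seconds / total
    if io_share > 0.5:
        return "data-intensive"      # typical Big Data profile
    if io_share < 0.5:
        return "compute-intensive"   # typical HPC profile
    return "balanced"                # exact tie, neither dominates


# Example: a log-scanning job versus a simulation run (made-up timings).
print(classify_workload(compute_seconds=120, io_seconds=900))
print(classify_workload(compute_seconds=3600, io_seconds=200))
```

In practice the timings would come from a profiler or job accounting data; the point is only that the label follows from measured time shares, not from the tools used.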
[Figure: four workload quadrants: Traditional Big Data, Extreme Data Analytics, Enterprise IT, HPC.]

Traditional Big Data: data-intensive workloads
[Example] Inferring new insights from big data-sets, e.g. pattern recognition across suppliers, consumers, etc. for data-driven insights and innovation

Extreme Data Analytics: compute- and data-intensive workloads
[Example] Reshaping healthcare through advanced analytics and artificial intelligence, leading to predictive and personalized medicine

Enterprise IT: 'regular' workloads
[Example] Running the enterprise: HR, Legal, Payroll, Finance, etc.

HPC: compute-intensive workloads
[Example] Modelling and simulation focusing on interaction amongst parts of a system and the system as a whole, e.g. product design

The hyper-growth area of Machine and Deep Learning and AI sits at the intersection of HPDA and HPC. Key workloads include video analysis, image, speech & text analytics, medicine, IoT, ADAS and security.

Real-World Use Cases
• Fraud/error anomaly detection, e.g. FSI
• Intelligence community, e.g. anti-terrorism, anti-crime
• Cyber security
• Data-driven science/engineering (e.g. biology)
• Knowledge discovery, e.g. ML/DL, cognitive, AI
Cross-Pollination of HPC and BD Technologies
Cross-pollination of the BD and HPC platforms to build, respectively, platforms for:
compute-intensive analytics (BD)
data-driven simulations (HPC)
Complex scenarios of this type of computation are emerging.
The entire engineering domain based on digital twins is full of scenarios requiring a hybrid system.
Digital twins use data from sensors installed on physical objects to represent their near real-time status, working condition or position.
They are increasingly used for improving the real-time operation of complex products/systems.
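To make the digital twin idea concrete, here is a minimal sketch of a twin that mirrors one physical object from its sensor readings. The class, its field names and the example sensor names are all hypothetical; a real twin would also carry a simulation model, which is omitted here.

```python
from dataclasses import dataclass, field


@dataclass
class DigitalTwin:
    """Mirror of one physical object, refreshed from its sensor readings."""
    object_id: str
    state: dict = field(default_factory=dict)       # latest value per sensor
    updated_at: dict = field(default_factory=dict)  # timestamp of that value

    def ingest(self, sensor: str, value: float, timestamp: float) -> None:
        # Keep only the newest reading per sensor; drop late arrivals.
        if timestamp >= self.updated_at.get(sensor, float("-inf")):
            self.state[sensor] = value
            self.updated_at[sensor] = timestamp

    def staleness(self, now: float) -> dict:
        """Seconds since each sensor last reported (near real-time check)."""
        return {s: now - t for s, t in self.updated_at.items()}


twin = DigitalTwin("pump-7")
twin.ingest("temperature_c", 71.5, timestamp=100.0)
twin.ingest("temperature_c", 70.9, timestamp=90.0)   # out of order, ignored
twin.ingest("rpm", 1480, timestamp=101.0)
print(twin.state)
print(twin.staleness(now=105.0))
```

The staleness check is one simple way to express "near real-time": the twin is only as current as its least recently updated sensor.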
[Figure: HPC and BD (separate). Labels: "DIGITAL TWIN = digital model exists"; "1000+ parameters"; "10+ parameters". HPC: compute-intensive, data less-intensive. BD: compute less-intensive, data-intensive.]
[Figure: HPC and BD integrated. Labels, in the order they appear in the extracted text: "DIGITAL TWIN = digital model exists"; compute-intensive; "1000+ parameters"; "10+ parameters"; data less-intensive; data-intensive; TBs/hour; compute-intensive; PBs; HPC; BD; extreme data analytics; behaviour simulations; model of normal behaviour (predictive); behaviour predictions; "DATA TWIN".]
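The figure labels mention a "model of normal behaviour (predictive)" learned from data. One simple way such a model could work is sketched below, as an assumption rather than the method shown at the workshop: a running mean and standard deviation (Welford's online algorithm) updated per reading, with a z-score threshold to flag deviations. The class name, threshold and warm-up length are illustrative choices.

```python
import math


class NormalBehaviourModel:
    """Running mean/variance of a signal; flags readings far from normal."""

    def __init__(self, threshold: float = 3.0, warmup: int = 5):
        self.threshold = threshold  # allowed deviation, in standard deviations
        self.warmup = warmup        # readings needed before flagging starts
        self.n = 0
        self.mean = 0.0
        self.m2 = 0.0               # sum of squared deviations from the mean

    def _is_anomalous(self, x: float) -> bool:
        if self.n < self.warmup:
            return False
        std = math.sqrt(self.m2 / (self.n - 1))
        return std > 0 and abs(x - self.mean) > self.threshold * std

    def _update(self, x: float) -> None:
        # Welford's update: no past readings need to be stored.
        self.n += 1
        delta = x - self.mean
        self.mean += delta / self.n
        self.m2 += delta * (x - self.mean)

    def observe(self, x: float) -> bool:
        """Return True if x deviates from normal; learn from it otherwise."""
        flagged = self._is_anomalous(x)
        if not flagged:
            self._update(x)
        return flagged


model = NormalBehaviourModel()
readings = [70.1, 70.4, 69.8, 70.0, 70.3, 70.2, 95.0, 70.1]  # made-up data
flags = [model.observe(r) for r in readings]
print(flags)
```

Flagged readings are not folded into the model, so a single outlier does not shift what counts as normal.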
Connected car example
380 million connected cars will be on the road by 2021
Ford: predicting data storage requirements of 200 PB by 2021, growing from today's 13 PB
[Figure: connected-car data flow with components tagged BD or HPC. Labels: 1 TB data/hour; PB data; edge analytics; streaming analytics; extreme data analytics; data twin; digital twin; HPC simulations.]
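As a rough worked example of what the Ford storage figures imply, the snippet below computes the growth factor and the equivalent compound annual growth rate. It assumes that "today" means 2017 (the date on the slides), giving a four-year horizon; that interpretation is ours, not stated in the source.

```python
start_pb, end_pb = 13.0, 200.0  # storage figures quoted for Ford
years = 2021 - 2017             # assumption: "today" is the 2017 slide date

growth_factor = end_pb / start_pb
annual_rate = growth_factor ** (1 / years) - 1  # compound annual growth rate

print(f"total growth: {growth_factor:.1f}x")
print(f"implied annual growth: {annual_rate:.0%}")
```

Under that assumption, storage would need to roughly double every year to reach the 2021 figure.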
What are the BD advantages? (connected car example)
Stream processing
  Efficient complex pipelines for real-time processing (e.g. Storm)
  Real-time stream analytics (on-the-fly, no storage)
Edge analytics (real-time, on-the-fly, no storage)
  Methods for real-time stream analytics can be downsized to work efficiently on the edge
Service logistics
  Analytics on different levels
  Combining real-time and batch processing (lambda architecture)
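The lambda architecture mentioned above combines a batch layer, which periodically recomputes results from the full event log, with a speed layer that updates results on the fly for recent events; queries merge the two. A toy single-process sketch follows. The class and method names are hypothetical, and a real deployment would use distributed components (for example a batch engine plus a stream processor) rather than in-memory counters.

```python
from collections import Counter


class LambdaCounter:
    """Toy lambda architecture: count events per vehicle."""

    def __init__(self):
        self.master_log = []         # batch layer input: append-only event log
        self.batch_view = Counter()  # recomputed from the log on each batch run
        self.speed_view = Counter()  # incremental counts since the last batch

    def on_event(self, vehicle_id: str) -> None:
        self.master_log.append(vehicle_id)
        self.speed_view[vehicle_id] += 1  # real-time path, updated on the fly

    def run_batch(self) -> None:
        # Batch path: recompute from scratch, then reset the speed layer,
        # since everything it held is now covered by the batch view.
        self.batch_view = Counter(self.master_log)
        self.speed_view.clear()

    def query(self, vehicle_id: str) -> int:
        # Serving layer: merge the batch result with the real-time delta.
        return self.batch_view[vehicle_id] + self.speed_view[vehicle_id]


counts = LambdaCounter()
for vid in ["car-1", "car-2", "car-1"]:
    counts.on_event(vid)
counts.run_batch()
counts.on_event("car-1")       # arrives after the batch run
print(counts.query("car-1"))   # batch view (2) + speed view (1)
```

The design choice is that the batch view is always rebuilt from the raw log, so errors in the fast path are corrected at the next batch run.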
What are the challenges for BD?
Streaming analytics
  The streams and context can be dynamic, implying the need for a dynamically changing processing infrastructure (e.g. Storm has a static topology)
  Self-adaptivity is the goal
Intelligent service placement
  Edge off-loading
Efficiency in processing extremely large datasets
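As an illustration of what self-adaptivity could mean for a streaming pipeline, the sketch below recomputes the desired number of parallel workers from the observed input rate, instead of fixing it when the topology is defined. The function name, the per-worker capacity and the worker limits are hypothetical values chosen for the example; it is not tied to any particular stream processor's API.

```python
import math


def target_workers(events_per_sec: float, capacity_per_worker: float,
                   min_workers: int = 1, max_workers: int = 64) -> int:
    """Workers needed to keep up with the current rate, within fixed limits."""
    if capacity_per_worker <= 0:
        raise ValueError("capacity_per_worker must be positive")
    needed = math.ceil(events_per_sec / capacity_per_worker)
    return max(min_workers, min(max_workers, needed))


# Simulated control loop: the observed rates are made-up sample values.
current = 2
for rate in [800, 4500, 12000, 300]:
    wanted = target_workers(rate, capacity_per_worker=1000)
    if wanted != current:
        print(f"rate={rate}/s: rescale {current} -> {wanted} workers")
        current = wanted
```

A production controller would also smooth the rate signal and limit how often it rescales, to avoid oscillating on short bursts.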