High Performance Distributed Computing
Transcript of High Performance Distributed Computing
Henri Bal, Vrije Universiteit Amsterdam
Outline
1. Development of the field
2. Highlights VU-HPDC group
3. Links to data science cycle
4. Conclusions
Developments
• Multiple types of data explosions:
– Big data: huge processing/transportation demands
– Complex heterogeneous data
LOFAR: ~15 PB/year; SKA: >300 PB/year, exascale processing
Complex data
Developments
• Infrastructure explosion:
– High complexity: heterogeneous systems with a diversity of processors, systems, networks
VU HPDC Group
• Bridge the gap between demanding applications and complex infrastructure
• Distributed programming systems for:
– Clusters, grids, clouds
– Accelerators (GPUs)
– Heterogeneous systems ("jungles")
– Clouds & mobile devices
• Applications: multimedia, semantic web, model checking, games, astronomy, astrophysics, climate modeling, ...
Highlights VU-HPDC group
1st Prize: SCALE 2008
AAAI-VC 2007
DACH 2008 - BS
DACH 2008 - FT
3rd Prize: ISWC 2008
1st Prize: SCALE 2010
EYR 2011 Sustainability Award
Solved Awari 2002
Links to data science cycle
[Diagram: the data science cycle — Understand and decide; Analyze and model; Store and process — surrounded by contributing fields: Reasoning, Knowledge representation, Multimedia Retrieval, Modeling and simulation, Machine Learning, Information Retrieval, Decision Theory, Perception, Cognition, Visual Analytics, Distributed Processing, Large Scale Databases, Software Eng., System/Network Eng. Highlighted HPDC topics: Distributed reasoning, Jungle computing, MapReduce]
Reasoning – Semantic Web
• Make the Web smarter by injecting meaning so that machines can "understand" it
– Initial idea by Tim Berners-Lee in 2001
• Has since attracted the interest of big IT companies
Google Example
Distributed Reasoning
• WebPIE: web-scale distributed reasoner doing full materialization
• QueryPIE: distributed reasoning with backward chaining + pre-materialization of schema triples
• DynamiTE: maintains materialization after updates (additions & removals)
Challenge: real-time incremental reasoning on web scale, combining new (streaming) data & existing historic data
With: Jacopo Urbani, Alessandro Margara, Frank van Harmelen
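The full materialization that WebPIE performs can be sketched as a fixpoint loop: keep applying inference rules until no new triples appear. The toy Python version below applies two standard RDFS rules (subclass transitivity and type inheritance) to a hand-made triple set; the triples and helper names are illustrative, and the real system runs such rules over billions of triples with MapReduce.

```python
# Minimal sketch of forward-chaining materialization: apply RDFS rules
# to a set of (subject, predicate, object) triples until fixpoint.
# Example data is invented; not WebPIE's actual code.

RDFS_SUBCLASS = "rdfs:subClassOf"
RDF_TYPE = "rdf:type"

def materialize(triples):
    """Repeatedly apply rules until no new triples are derived."""
    closed = set(triples)
    while True:
        new = set()
        sub = {(s, o) for s, p, o in closed if p == RDFS_SUBCLASS}
        # Rule rdfs11: subClassOf is transitive.
        for a, b in sub:
            for c, d in sub:
                if b == c:
                    new.add((a, RDFS_SUBCLASS, d))
        # Rule rdfs9: instances belong to all superclasses.
        for s, p, o in closed:
            if p == RDF_TYPE:
                for a, b in sub:
                    if o == a:
                        new.add((s, RDF_TYPE, b))
        if new <= closed:          # fixpoint reached
            return closed
        closed |= new

kb = {
    ("Dog", RDFS_SUBCLASS, "Mammal"),
    ("Mammal", RDFS_SUBCLASS, "Animal"),
    ("rex", RDF_TYPE, "Dog"),
}
full = materialize(kb)
assert ("Dog", RDFS_SUBCLASS, "Animal") in full   # derived by rdfs11
assert ("rex", RDF_TYPE, "Animal") in full        # derived by rdfs9
```

Backward chaining, as in QueryPIE, would instead apply these rules on demand, starting from a query rather than deriving everything up front.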
Glasswing: MapReduce on Accelerators
• Use accelerators as a mainstream feature
• Massive out-of-core data sets
• Scale vertically & horizontally
• Code portability using OpenCL
• Maintain MapReduce abstraction
With: Ismail El Helw, Rutger Hofman
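The MapReduce abstraction that Glasswing preserves can be sketched in a few lines of plain Python: the user supplies only a map function and a reduce function, and the framework handles grouping. This toy version runs the phases sequentially on one machine; Glasswing's point is that the same abstraction can drive OpenCL kernels on accelerators.

```python
# Toy MapReduce engine: illustrates the programming model only,
# not Glasswing's implementation.
from collections import defaultdict

def map_reduce(records, map_fn, reduce_fn):
    # Map phase: each record yields (key, value) pairs.
    groups = defaultdict(list)
    for record in records:
        for key, value in map_fn(record):
            groups[key].append(value)          # shuffle: group by key
    # Reduce phase: combine all values for each key.
    return {k: reduce_fn(k, vs) for k, vs in groups.items()}

# Classic word count expressed in this model.
lines = ["big data", "big compute"]
counts = map_reduce(
    lines,
    map_fn=lambda line: [(w, 1) for w in line.split()],
    reduce_fn=lambda word, ones: sum(ones),
)
assert counts == {"big": 2, "data": 1, "compute": 1}
```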
Glasswing Pipeline
• Overlaps computation, communication & disk access
• Supports multiple buffering levels
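The overlap idea can be sketched with a bounded queue of buffers: while the compute stage processes one chunk, a reader thread fills the next. Stage names and data here are illustrative; Glasswing's actual pipeline has more stages (disk, host memory, device memory) and tunable buffer counts.

```python
# Double-buffering sketch: overlap disk I/O with computation
# using a bounded queue. Illustrative only.
import queue
import threading

def read_stage(chunks, buf_q):
    for chunk in chunks:
        buf_q.put(chunk)       # blocks when both buffers are full
    buf_q.put(None)            # end-of-stream marker

def run_pipeline(chunks, compute):
    buf_q = queue.Queue(maxsize=2)   # two buffers: I/O overlaps compute
    reader = threading.Thread(target=read_stage, args=(chunks, buf_q))
    reader.start()
    results = []
    while (chunk := buf_q.get()) is not None:
        # Compute on this chunk while the reader fills the next buffer.
        results.append(compute(chunk))
    reader.join()
    return results

assert run_pipeline([[1, 2], [3, 4]], sum) == [3, 7]
```

Raising `maxsize` corresponds to the "multiple buffering levels" above: more buffers tolerate more jitter in the producer, at the cost of memory.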
Evaluation of Glasswing
• Glasswing uses CPU, memory & disk resources more efficiently than Hadoop
• Compute-bound applications benefit dramatically from GPUs
• Better scalability than Hadoop
• Runs on a variety of accelerators
• E.g. k-means clustering: 8.5× (1 node) vs. 15.5× (64 nodes) vs. 107× (GPU node)
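The k-means benchmark fits the MapReduce model naturally: the map phase assigns points to their nearest centroid, and the reduce phase averages each cluster. A toy one-dimensional Python sketch of a single iteration (data and function name invented for illustration):

```python
# One k-means iteration phrased as map/reduce, on 1-D points.
# Toy sketch; real runs use multi-dimensional data across many nodes.
def kmeans_step(points, centroids):
    # Map phase: assign each point to its nearest centroid.
    assign = {}
    for p in points:
        c = min(range(len(centroids)), key=lambda i: abs(p - centroids[i]))
        assign.setdefault(c, []).append(p)
    # Reduce phase: new centroid = mean of its assigned points.
    return [sum(ps) / len(ps) for c, ps in sorted(assign.items())]

pts = [1.0, 2.0, 10.0, 11.0]
assert kmeans_step(pts, [0.0, 12.0]) == [1.5, 10.5]
```

Because the map phase is embarrassingly parallel distance arithmetic, it is exactly the kind of compute-bound kernel that profits from GPUs in the numbers above.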