Stories About Spark, HPC and Barcelona by Jordi Torres
STORIES ABOUT SPARK, HPC & BARCELONA
Jordi Torres
Barcelona Supercomputing Center
UPC Barcelona Tech
www.JordiTorres.eu - @JordiTorresBCN
Scientists have always needed the best instruments that the technology of their time allowed them to build.
Microscope (Santiago Ramón y Cajal) · Large Hadron Collider (CERN)
And supercomputers today can be considered the ultimate scientific instrument, one that enables progress across science.
The Evolution of The Research Paradigm
High Performance Computing means numerical simulation and big data analysis, which allow us to:
• Reduce expense
• Avoid dangerous experiments
• Build knowledge where experiments are impossible or unaffordable
HPC is an enabler for all scientific fields
Life Sciences & Medicine
Earth Sciences
Astro, High Energy & Plasma Physics
Materials, Chemistry & Nanoscience
Engineering
Neuroscience
An emerging focus on big data requires a transition of computing facilities toward a data-centric paradigm too.
However, traditional HPC systems are designed according to the compute-centric paradigm
We have experimented with this in our HPC facility in Barcelona.
And this is what I’m going to talk about today!
How can existing traditional HPC infrastructure evolve to meet the new demands?
Joint Research Centres with IT Companies
BSC-Microsoft Research Centre
BSC-IBM Technology Center for Supercomputing
Intel-BSC Exascale Lab
BSC-NVIDIA CUDA Center of Excellence
The MareNostrum 3 Supercomputer
Over 10^15 floating-point operations per second (1 Petaflop)
– Nearly 50,000 cores
– 100.8 TB of memory
– 2,000 TB of disk storage
The third of three brothers
• 2004: MareNostrum 1
  – Nearly 5×10^13 floating-point operations per second
  – Nearly 5,000 cores
  – 236 TB of disk storage
• 2006: MareNostrum 2
  – Nearly 10^14 floating-point operations per second
  – Over 10,000 cores
  – 460 TB of disk storage
• 2012: MareNostrum 3
MareNostrum ancestors in the chapel
A parallel system inside the same chapel:
• Grandparent: processing capacity of over 1,000 operation-beats per minute; a parallel system with 8 parallel typewriter units.
• Grandmother: storage capacity of over 100 Mb; a parallel storage system with 14 drawer devices.
Until now, the typical MN3 workloads have been numerical applications.
• MN3 basic software stack:
  – OpenMP
  – MPI
  – Threads
  – …
How can MN3 evolve to meet new Big Data Analytics demand?
New module developed at BSC
MareNostrum Supercomputer + Spark
Spark4MN in action
• We performed a system-level performance evaluation & tuning on MN3
• Example of some results:
  – Speed-up
  – Scale-up
  – Parallelism
Example 1: K-means speed-up
• More dimensions mean a smaller speed-up, because of increased shuffling (the same number of centroids to shuffle, but each one bigger).
• Times for running k-means for 10 iterations.
• Problem size held constant at 100 GB (10M1000D = 10M vectors of 1000 dimensions).
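As a rough illustration of the benchmark above, here is a minimal Spark MLlib k-means sketch in local mode, with a tiny synthetic dataset standing in for the 100 GB input and the iteration count fixed at 10 as in the experiment (the class and data are ours for illustration, not the Spark4MN code):

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.mllib.clustering.KMeans
import org.apache.spark.mllib.linalg.Vectors

object KMeansSketch {
  // Runs k-means for 10 iterations and returns the number of centroids found.
  def run(): Int = {
    val sc = new SparkContext(
      new SparkConf().setAppName("kmeans-sketch").setMaster("local[*]"))
    // Tiny synthetic stand-in for the 10M x 1000D benchmark vectors.
    val data = sc.parallelize(Seq(
      Vectors.dense(0.0, 0.0), Vectors.dense(0.1, 0.1),
      Vectors.dense(9.0, 9.0), Vectors.dense(9.1, 9.1))).cache()
    // k = 2 centroids, 10 iterations (the benchmark's iteration count).
    val model = KMeans.train(data, k = 2, maxIterations = 10)
    val k = model.clusterCenters.length
    sc.stop()
    k
  }

  def main(args: Array[String]): Unit =
    println(s"centroids found: ${run()}")
}
```

The shuffle effect in the slide follows from this setup: the centroid count is fixed, but higher-dimensional vectors make each shuffled centroid larger.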
Example 2: K-means scale-up
• We modify both the number of records and the number of machines.
• Ideally, all the plots should be horizontal; our system behaves close to that.
Example 3: Configuring task parallelism
• Varying the number of tasks over the same number of cores.
• For k-means, the best-performing configuration has as many partitions as cores: 1 task per core is better!
• Median times for running k-means for 10 iterations with different numbers of partitions.
• In our benchmarks the number of tasks is equal to the number of RDD partitions.
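The one-task-per-core rule corresponds to giving an RDD exactly as many partitions as the context exposes cores; a minimal sketch in local mode (the 4-core master string is illustrative):

```scala
import org.apache.spark.{SparkConf, SparkContext}

object PartitionSketch {
  // Returns (number of partitions, number of cores) for a one-partition-per-core RDD.
  def run(): (Int, Int) = {
    val sc = new SparkContext(
      new SparkConf().setAppName("partition-sketch").setMaster("local[4]"))
    // defaultParallelism reflects the cores available to this context.
    val cores = sc.defaultParallelism
    // One partition per core => one task per core in each stage.
    val rdd = sc.parallelize(1 to 1000000, numSlices = cores)
    val result = (rdd.partitions.length, cores)
    sc.stop()
    result
  }

  def main(args: Array[String]): Unit = println(run())
}
```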
Example 3: Configuring task parallelism
• Using sort-by-key: a more shuffle-intensive scenario
  – We sort 1 billion records using 64 nodes and different partition sizes.
  – Contrary to the previous case, we observe speed-ups when there are 2 partitions per core.
Example 4: sort-by-key
• How many concurrent tasks can an executor supervise?
• Having two 8-core executors instead of eight 2-core ones improves the running time by a factor of 2.79, leaving all the other parameters the same.
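When submitting through YARN or a similar resource manager, the executor geometry compared in Example 4 is expressed with two standard Spark configuration keys; a sketch of the two layouts (memory sizing omitted, values taken from the example):

```scala
import org.apache.spark.SparkConf

object ExecutorLayouts {
  // "Fat" layout: two 8-core executors -- the configuration that ran 2.79x faster.
  def fat(): SparkConf = new SparkConf()
    .set("spark.executor.instances", "2")
    .set("spark.executor.cores", "8")

  // "Thin" layout: eight 2-core executors, same total core count.
  def thin(): SparkConf = new SparkConf()
    .set("spark.executor.instances", "8")
    .set("spark.executor.cores", "2")
}
```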
More results on Friday at the Santa Clara conference!
2015 IEEE International Conference on Big Data October 29-November 1, Santa Clara, CA, USA
New Architecture Support for Big Data Analytics
Exponential increase in core count
New promising technologies (Hybrid Memory Cubes, NVRAM, etc.)
Our Research Goal
• Improve the node-level performance of state-of-the-art scale-out data processing frameworks
• Scale data processing capacity to large input datasets
The performance of Spark workloads degrades with large volumes of data due to a substantial increase in garbage collection and file I/O time.
Spark workloads do not saturate the available bandwidth, and hence their performance is bound by DRAM latency.
More results on
• A. J. Awan, M. Brorsson, V. Vlassov and E. Ayguade, "Performance Characterization of In-Memory Data Analytics on a Modern Cloud Server", in 5th IEEE International Conference on Big Data and Cloud Computing (BDCloud), Aug 2015, Dalian, China (Best Paper Award)
• A. J. Awan, M. Brorsson, V. Vlassov and E. Ayguade, "How Data Volume Affects Spark Based Data Analytics on a Scale-up Server", in 6th International Workshop on Big Data Benchmarks, Performance Optimization and Emerging Hardware (BpoE), held in conjunction with 41st International Conference on Very Large Data Bases, Sep 2015, Hawaii, USA.
BSC programming model COMPSs
– Sequential programming model
– Abstracts the application from the underlying distributed infrastructure
– Exploits the inherent parallelism at runtime
We are studying the comparison and interaction between these two programming models on platforms like MareNostrum 3.
Profiling Spark with BSC’s HPC tools
• Relying on over 20 years of HPC experience and tools for profiling
• Preliminary work: developed the Hadoop Instrumentation Toolkit
• Metrics captured: CPU, memory, page faults, processes and communication
Project ALOJA: Benchmarking Spark
• An open initiative to explore and produce a systematic study of Hadoop/Spark efficiency on different software and hardware
• An online repository that allows comparing, side by side, all execution parameters (50,000+ runs over 100+ hardware configurations)
Preliminary work
• Multimedia big data computing: working with three kinds of data at the same time:
  – social network relationships
  – audiovisual content
  – metadata
Preliminary case study: multimodal data analytics systems
E.g., latent user attribute inference to predict Desigual followers
Example of tools created: vectorization
Necessary for visual similarity search, visual clustering, classification, etc.
Available in our GitHub: bsc.spark.image

scala> import bsc.spark.image.ImageUtils
scala> import org.apache.spark.mllib.classification.NaiveBayes
scala> val images = ImageUtils.seqFile("hdfs://...", sc)
scala> val dictionary = ImageUtils.BoWDictionary(images)
scala> val vectors = dictionary.getBags(images)
scala> val splits = vectors.randomSplit(Array(0.6, 0.4), seed = 11L)
scala> val training = splits(0)
scala> val test = splits(1)
scala> val model = NaiveBayes.train(training, lambda = 1.0)
Applications: Locality-Sensitive Hashing
E.g., near-replica detection (visual spam detection, copyright infringement)
[Figure: image patches (PATCH 1–4) → keypoints (KP1–4) → feature detection → feature description (SIFT, SURF, ORB, etc.) → binary sketches]
• Features are sketched, i.e. embedded into a Hamming space.
• Similar features are hashed into similar buckets in a hash table.
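The sketch-and-bucket idea can be illustrated in plain Scala: restrict each binary sketch to a fixed sample of bit positions, so sketches that differ only outside the sampled positions land in the same bucket (the 10-bit sketches and sampled positions below are illustrative, not BSC's actual parameters):

```scala
object LshSketch {
  // Hamming distance between two equal-width binary sketches.
  def hamming(a: Int, b: Int): Int = Integer.bitCount(a ^ b)

  // Bucket id = the sketch restricted to a fixed sample of bit positions.
  def bucket(sketch: Int, positions: Seq[Int]): Int =
    positions.zipWithIndex.foldLeft(0) { case (acc, (p, i)) =>
      acc | (((sketch >> p) & 1) << i)
    }

  def main(args: Array[String]): Unit = {
    val positions = Seq(0, 2, 5, 9)           // illustrative sampled positions
    val a = Integer.parseInt("0110100101", 2)
    val b = Integer.parseInt("0110100111", 2) // differs from a only at bit 1
    println(hamming(a, b))                     // 1: near-replicas
    println(bucket(a, positions) == bucket(b, positions)) // true: same bucket
  }
}
```

In a real index, several such bucket functions are used side by side so that true near-replicas collide in at least one of them.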
Current work: Computer Vision
• Makes very productive use of (convolutional) neural networks
• SIFT features (used for decades) have become unnecessary
BSC vision: giving computers a greater ability to understand information, and to learn, to reason, and to act upon it.
Old wine in a new bottle?
• The term itself dates from the 1950s.
• Periods of hype and high expectations have alternated with periods of setback and disappointment.
Artificial Intelligence plays an important role.
Why now?
1. With the explosion of data, algorithms can now be "trained" by exposing them to large data sets that were previously unavailable.
2. And the computing power necessary to implement these algorithms is now available.
Evolution of computing power (FLOP/second):
• 1988: Cray Y-MP (8 processors)
• 1998: Cray T3E (1,024 processors)
• 2008: Cray XT5 (15,000 processors)
• ~2019?: (1×10^7 processors)
This new type of computing requires:
1. the continuous development of supercomputing systems,
2. enabling the convergence of advanced analytic algorithms
3. and big data technologies.
Cognitive Computing requires a transition of computing facilities into a new paradigm too
Name? … We use Cognitive Computing
Yesterday Today Tomorrow
Welcome to our academic activities
• Teaching Spark @ Master courses
• Using Spark @ Final Master Thesis
• Using Spark @ Research activity
• NEW Spark book in Spanish
  – Editorial UOC
  – Presentation: November 3, 2015
Foreword by Matei Zaharia