
Performance Issues in Parallelizing Data-Intensive Applications on a Multi-core Cluster

Vignesh Ravi and Gagan Agrawal

{raviv,agrawal}@cse.ohio-state.edu

OUTLINE

• Motivation
• FREERIDE Middleware
• Generalized Reduction structure
• Shared Memory Parallelization techniques
• Scalability results – k-means, Apriori & E-M
• Performance Analysis results
• Related work & Conclusion

Motivation

• Availability of huge amounts of data – data-intensive applications
• Advent of multi-core processors
• Need for abstractions and parallel programming systems
• The best Shared Memory Parallelization (SMP) technique is still not clear

Context: FREERIDE

• A middleware for parallelizing data-intensive applications

• Motivated by the difficulties of implementing parallel data mining applications

• Provides high-level APIs for easier parallel programming

• Based on the observation that many data mining and scientific applications share a similar generalized reduction structure

FREERIDE – Core

• Reduction Object – A shared data structure where results from processed data instances are stored

Types of Reduction:
• Local Reduction – reduction within a single node
• Global Reduction – reduction among a cluster of nodes

Generalized Reduction structure
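The original slide presents this structure as a figure. Below is a minimal C++-style sketch of the same pattern; the names (DataInstance, process, ReductionObject) are placeholders, not the actual FREERIDE interface. Every data instance contributes a fine-grained update to the reduction object during local reduction, and per-node objects are then combined globally.

    #include <cstddef>
    #include <utility>
    #include <vector>

    // Illustrative only: names are placeholders, not the FREERIDE API.
    struct DataInstance { double x, y, z; };

    struct ReductionObject {
        std::vector<double> elems;                  // shared accumulation state
        void reduce(std::size_t i, double val) {    // commutative, associative update
            elems[i] += val;
        }
    };

    // Application-supplied: maps a data instance to (element index, contribution).
    std::pair<std::size_t, double> process(const DataInstance& e);

    // Local reduction: every instance read on this node updates the
    // (logically shared) reduction object.
    void local_reduction(const std::vector<DataInstance>& chunk,
                         ReductionObject& robj) {
        for (const DataInstance& e : chunk) {
            auto [i, val] = process(e);
            robj.reduce(i, val);
        }
    }

    // Global reduction: per-node reduction objects are then combined
    // across the cluster into the final result.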

Parallelization Challenges

• Reduction object cannot be statically partitioned between threads/nodes
  – Data races must be handled at runtime
• The size of the reduction object can be large
  – Replication can cause memory overhead
• Updates to the reduction object are fine-grained
  – Locking schemes can cause significant overhead

Techniques in FREERIDE

• Full replication (f-r) – see the sketch below
• Locking-based techniques
  – Full locking (f-l)
  – Optimized full locking (o-f-l)
  – Cache-sensitive locking (cs-l)
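As a rough illustration of the first two options (a sketch only, assuming the reduction object is an array of doubles; the optimized and cache-sensitive variants change the memory layout, as the next slide notes):

    #include <cstddef>
    #include <mutex>
    #include <vector>

    // Full replication (f-r): each thread updates a private copy of the
    // reduction object; the copies are merged after the local reduction.
    struct ReplicatedRObj {
        std::vector<std::vector<double>> copies;    // one copy per thread
        void reduce(int tid, std::size_t i, double val) {
            copies[tid][i] += val;                  // no synchronization needed
        }
        void merge(std::vector<double>& out) {      // merge cost paid once;
            for (const auto& c : copies)            // out assumed zero-initialized
                for (std::size_t i = 0; i < out.size(); ++i)
                    out[i] += c[i];
        }
    };

    // Full locking (f-l): one shared copy with a lock per element, so every
    // fine-grained update pays a lock/unlock.
    struct LockedRObj {
        std::vector<double> elems;
        std::vector<std::mutex> locks;              // one lock per element
        void reduce(std::size_t i, double val) {
            std::lock_guard<std::mutex> g(locks[i]);
            elems[i] += val;
        }
    };

The trade-off measured later is visible here: f-r avoids synchronization but replicates memory and pays a merge cost, while f-l keeps a single copy but pays a locking cost on every update.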

Memory Layout of locking schemes
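The layout figure itself is not reproduced in this transcript; the idea behind the optimized and cache-sensitive variants can be sketched as follows (the 64-byte cache line and the 1-byte spinlock are illustrative assumptions, not the exact FREERIDE layout):

    #include <atomic>

    // A 1-byte test-and-set spinlock keeps the layout compact (illustrative).
    struct SpinLock {
        std::atomic_flag f = ATOMIC_FLAG_INIT;
        void lock()   { while (f.test_and_set(std::memory_order_acquire)) {} }
        void unlock() { f.clear(std::memory_order_release); }
    };

    // Optimized full locking (o-f-l): each element's lock is placed next to
    // the element, so an update touches one cache line instead of two.
    struct LockedElem {
        SpinLock lock;
        double   value;
    };

    // Cache-sensitive locking (cs-l): one lock guards all elements that share
    // a cache line, reducing both the number of locks and the memory overhead.
    struct alignas(64) CacheLineBucket {    // 64-byte line assumed
        SpinLock lock;
        double   values[7];                 // lock + 7 elements fill the line
    };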

Applications Implemented on FREERIDE

• Apriori (association mining)
• K-means (clustering based) – see the sketch below
• Expectation Maximization (E-M) (clustering based)
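To make the mapping concrete, the following sketch shows how one k-means iteration fits the generalized reduction structure above (illustrative code, not the FREERIDE implementation; the reduction object holds per-cluster coordinate sums and counts):

    #include <array>
    #include <cstddef>
    #include <vector>

    struct Point { double x, y, z; };               // 3-D points, as in the datasets used

    // Reduction object for k-means: per-cluster coordinate sums and counts,
    // sized to the number of clusters (e.g., 250).
    struct KmeansRObj {
        std::vector<std::array<double, 3>> sums;    // one entry per cluster
        std::vector<long>                  counts;
    };

    // Local reduction for one iteration: each point updates only the entry of
    // its nearest centroid, a fine-grained update of the reduction object.
    void kmeans_local(const std::vector<Point>& pts,
                      const std::vector<Point>& centroids,
                      KmeansRObj& robj) {
        for (const Point& p : pts) {
            std::size_t best = 0;
            double best_d = 1e300;
            for (std::size_t c = 0; c < centroids.size(); ++c) {
                double dx = p.x - centroids[c].x;
                double dy = p.y - centroids[c].y;
                double dz = p.z - centroids[c].z;
                double d  = dx * dx + dy * dy + dz * dz;
                if (d < best_d) { best_d = d; best = c; }
            }
            robj.sums[best][0] += p.x;
            robj.sums[best][1] += p.y;
            robj.sums[best][2] += p.z;
            robj.counts[best]  += 1;
        }
    }

    // After local and global reduction, new centroids are sums[c] / counts[c].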

Goals in Experimental Study

• Scalability of data-intensive applications on multi-core

• Comparison of different shared memory parallelization (SMP) techniques and MPI

• Performance analysis of SMP techniques

Experimental setup

Each node in the cluster has:
• Two quad-core Intel Xeon E5345 CPUs (8 cores per node)
• Each core running at 2.33 GHz
• 6 GB main memory
Nodes in the cluster are connected by InfiniBand.

Experiments

Two sets of experiments:
• Comparison of scalability results for f-r, cs-l, o-f-l, and MPI with k-means, Apriori, and E-M
  – Single node
  – Cluster of nodes
• Performance analysis results with k-means, Apriori, and E-M

Applications data setup

• Apriori
  – Dataset size: 900 MB
  – Support = 3%, confidence = 9%
• K-means
  – Dataset size: 6.4 GB
  – 3-dimensional points
  – Number of clusters: 250
• E-M
  – Dataset size: 6.4 GB
  – 3-dimensional points
  – Number of clusters: 60

Apriori (single node)

Apriori (cluster)

K-means (single node)

K-means (cluster)

E-M (single node)

E-M (cluster)

Performance Analysis of SMP techniques

• Given an application, can we predict the factors that determine the best SMP technique?

• Why do locking techniques suffer with Apriori, but compete well for the other applications?

• What factors limit the overall scalability of data-intensive applications?

Performance Analysis setup

• Valgrind used for dynamic binary analysis
• Cachegrind used for the analysis of cache utilization

Performance Analysis

Locking vs Merge Overhead

Performance Analysis (contd.) – Relative L2 misses for reduction object

Performance Analysis (contd.) – Total program read/write misses

Analysis
• Important trade-off
  – Memory needs of the application
  – Frequency of updates to the reduction object
• E-M is compute- and memory-intensive
  – Locking overhead is very low
  – Replication overhead is high
• Apriori has a high update fraction and very little computation
  – Locking overhead is extremely high
  – Replication performs the best

Related Work

• Google MapReduce
• Yahoo Hadoop
• Phoenix – Stanford University
• SALSA – Indiana University

Conclusion
• Replication and locking schemes can each outperform the other, depending on the application
• Locking schemes have huge overhead when there is little computation between updates to the reduction object
• MPI processes compete well up to 4 threads, but experience communication overheads with 8 threads
• Performance analysis shows that an application's memory needs and update fraction are significant factors for scalability

Thank you! Questions?