Data-centric cloudification of scientific applications with many-task computing and map-reduce
Silvina Caíno-Lores
Computer Architecture, Communication and Systems Area, Department of Computer Science, University Carlos III of Madrid
February 11, 2016
1 Introduction
2 Data-Centric Transformation Methodology
3 Enabling Large-Scale Parallelism
4 Evaluation
  Application Analysis and Adaptation
  Execution Environments
  Assessing the Cloudified Application
  Scalability Study of the Many-Task Deployment
5 Conclusions
6 The HGS Case Study
Introduction
Context: Supercomputers
High-performance computing (HPC) targets complex computational problems and large amounts of data by aggregating computing resources and developing parallel processing techniques.
Limited by power consumption and hardware architecture.
[Photo: the Tianhe-2 supercomputer]
Context: Cloud Computing
Cloud computing relies on resource sharing and virtualization to provide on-demand elastic resources.
Highly scalable alternative to grid/cluster infrastructures.
Abstraction layers as service models:
  Infrastructure-as-a-Service (IaaS): raw computing resources.
  Platform-as-a-Service (PaaS): computing frameworks.
  Software-as-a-Service (SaaS): production-ready applications.
  Anything-as-a-Service (XaaS): databases, networks, security, simulations...
A popular provider example: Amazon Elastic Compute Cloud (EC2).
  Reddit, Twitch, IMDb, NASA, Pinterest...
Motivation
Scientific simulations are widely used to model real-world phenomena.
  Resource intensive.
  I/O and intermediate data volumes keep increasing (Lang et al., 2009).
  Traditionally run on HPC infrastructures (limited by underlying resources).
One simulation is not sufficient:
  Expert systems.
  Several variables and domains.
Cloud device aggregation can make up for lack of HPC scalability.
Objectives
1 Migrate scientific simulators to the Cloud while retaining performance.
2 Minimise impact on the original code.
Benefits:
Increase performance and throughput.
Address larger problems.
Reduce economic and environmental costs.
Trends in Cloudification Techniques
Several migration options:
  VM bundling (Srirama et al., 2013; Yu et al., 2011; D’Angelo, 2011).
    Middleware overhead.
  Code redesign (Srirama et al., 2012; Ibrahim et al., 2010; Zhang et al., 2015).
    Expensive development.
Our proposal:
Data-centric, generalist wrapper deployed as a many-task framework (Caíno-Lores et al., 2015; Carretero et al., 2015).
Proposal Overview
1 Rely on map-reduce to induce parallelism.
  Minimise code manipulation.
  Immediate data-awareness.
  Simulation partitioning and distribution (Caíno-Lores et al., 2014a,b).
2 Follow an MTC deployment to overlap experiments.
  Increased granularity and task overlapping.
  Better utilisation and balance (Zhang et al., 2011).
  Suitable for distributed scientific computing (Manuali et al., 2012; Ogasawara et al., 2009).
  Fits parameter-based simulations nicely (Abramson et al., 2011; Dias et al., 2010).
3 Help the user to estimate the cluster size.
  Minimise deadline, cost, or a trade-off.
  Maximise resource usage.
Data-Centric Transformation Methodology
Map-Reduce in a Nutshell
[Diagram: persistent storage → input reading → MAP → shuffle → REDUCE → output generation → persistent storage]
map and reduce run independently and autonomously.
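The flow above can be sketched in a few lines of Python. This is an illustrative toy (our own function names, not the Hadoop API used later in the talk) showing why map and reduce invocations are independent: each map call sees one record, each reduce call sees one key's group.

```python
from collections import defaultdict

def map_phase(records, map_fn):
    """Apply the user's map function to every input record independently."""
    pairs = []
    for record in records:
        pairs.extend(map_fn(record))
    return pairs

def shuffle(pairs):
    """Group intermediate (key, value) pairs by key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(groups, reduce_fn):
    """Apply the user's reduce function to each key's group independently."""
    return {key: reduce_fn(key, values) for key, values in groups.items()}

# Toy usage: sum per-experiment simulation costs.
records = [("exp1", 3.0), ("exp2", 1.5), ("exp1", 2.0)]
pairs = map_phase(records, lambda r: [(r[0], r[1])])
result = reduce_phase(shuffle(pairs), lambda k, vs: sum(vs))
# result == {"exp1": 5.0, "exp2": 1.5}
```

Because no map call depends on another (and likewise for reduce), a framework is free to run them on different nodes, which is what the methodology exploits.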
Transformation Procedure
[Diagram: simulation parameters from databases and files enter an adaptation job (read input data in the map, format input data in the reduce) after selecting the independent variable Tx; the intermediate data, indexed by Tx, feeds a simulation job (simulation kernel for Ti in the map, output formatting in the reduce) that writes results to databases and files according to a user-defined output format.]
Partition the application: run the same simulation kernel on a fragmentof the domain.
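The two-job procedure can be sketched as follows. Everything here is a hypothetical stand-in (the `simulate` kernel, the record formats, the in-memory grouping) meant only to show how keying by the independent variable Tx lets the unmodified kernel run once per fragment:

```python
from collections import defaultdict

def simulate(tx, params):
    """Hypothetical stand-in for the unmodified simulation kernel,
    invoked on the fragment of the domain identified by Tx."""
    return sum(params)  # placeholder computation

# --- Adaptation job ---
def adaptation_map(raw_records):
    """Map: parse each raw input record and key it by the chosen
    independent variable Tx."""
    return [(tx, value) for tx, value in raw_records]

def adaptation_reduce(pairs):
    """Reduce: group values per Tx into the kernel's input format."""
    fragments = defaultdict(list)
    for tx, value in pairs:
        fragments[tx].append(value)
    return fragments  # intermediate data, indexed by Tx

# --- Simulation job ---
def simulation_map(fragments):
    """Map: run the kernel once per fragment; fragments are
    independent, so the framework can run them in parallel."""
    return {tx: simulate(tx, params) for tx, params in fragments.items()}

raw = [(0, 1.0), (1, 2.0), (0, 3.0)]
results = simulation_map(adaptation_reduce(adaptation_map(raw)))
# results == {0: 4.0, 1: 2.0}
```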
Enabling Large-Scale Parallelism
Many-Task Approach
Better utilisation due to granularity, but depends on platform tuning.
Deployment Scheme
[Diagram: the user submits an experiment pool to the management infrastructure, where a coordinator provides experiment partitions (P1 ... Pp) and a master distributes experiment subsets (E1 ... Ee) as inner jobs (J1 ... Jj) to the clients; the clients submit job tasks (T1 ... Tt) to the execution infrastructure, whose slaves, managed by the master, execute them.]
Exploit map-reduce's multi-tenancy to maximise resource usage.
Dimensioning Model
Select the client and slave types to meet the following objectives:
1 Balance both master-worker schemes.
  Minimise the difference between the runnable tasks and schedulable tasks.
2 Optimise performance.
  Maximise the number of tasks that can be run concurrently.
3 Minimise the virtual cluster's operational costs.
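As an illustration only, the objectives can be combined in a toy brute-force search. The function name, the uniform-task assumption, and the runtime-proportional billing are ours, not part of the actual dimensioning model (real clouds often bill per started hour):

```python
import math

def dimension_cluster(total_tasks, task_time_s, deadline_s,
                      slots_per_slave, cost_per_slave_hour,
                      max_slaves=64):
    """Toy search: pick the cheapest slave count whose concurrent
    task slots let the experiment pool finish within the deadline.
    Assumes uniform task durations and runtime-proportional billing."""
    best = None
    for slaves in range(1, max_slaves + 1):
        slots = slaves * slots_per_slave           # schedulable tasks at once
        waves = math.ceil(total_tasks / slots)     # sequential task waves
        makespan = waves * task_time_s
        if makespan > deadline_s:
            continue                               # misses the deadline
        cost = slaves * cost_per_slave_hour * makespan / 3600
        if best is None or cost < best["cost"]:
            best = {"slaves": slaves, "makespan_s": makespan, "cost": cost}
    return best

# 64 tasks of 10 min, 2 slots per slave, 1 h deadline, $0.35/slave-hour:
plan = dimension_cluster(64, 600, 3600, 2, 0.35)
# plan["slaves"] == 8 -> 16 slots, 4 waves of 10 min = 40 min makespan
```

Note that the cheapest feasible cluster is not necessarily the smallest one: fewer slaves run longer, so cost must be evaluated over all feasible sizes.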
Evaluation
Target Application
[Diagram: the simulator's data module reads the scenario, consumers, and parameters from the experiment DB, the simulation parameters, and the train movement files, supported by an ontology module; a thread scheduler assigns workloads W0 ... Wn; for each instant Tn,i from Tn,0 to Tn,f, the algorithm module's simulation kernel allocates consumers and solves the circuit in an iterative process, then writes results, which are finally merged into the simulation results.]
Memory-bound railway electric power consumption simulator.
Relies on:
1 Description of the railway infrastructure.
2 Instantaneous train position and power demand.
Adaptation
[Diagram: MR Job 1 (input adaptation) turns train files 1 ... I, each holding "instant | parameters" rows, plus the infrastructure file into adapted input files 1 ... J holding "instant | parameter list" rows; MR Job 2 (simulation execution) consumes them and produces output files 1 ... K.]
The temporal variable becomes the independent variable, Tx.
One simulation per instant.
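A toy sketch of this adaptation step, with hypothetical record contents (real train files hold numeric simulation parameters, not the strings used here): rows from all train files are keyed by instant and merged, so each instant yields one self-contained simulation input.

```python
from collections import defaultdict

def adapt_inputs(train_files, infrastructure):
    """Toy input-adaptation job: key every train-file row by its
    instant (the independent variable Tx) and merge them into one
    per-instant input record paired with the shared infrastructure."""
    by_instant = defaultdict(list)
    for rows in train_files:              # map: key each row by instant
        for instant, params in rows:
            by_instant[instant].append(params)
    # reduce: one simulation input per instant
    return {t: {"infrastructure": infrastructure, "trains": p}
            for t, p in by_instant.items()}

# Two hypothetical train files, two instants each:
trains = [
    [(0, "train A @ km 1"), (1, "train A @ km 2")],
    [(0, "train B @ km 9"), (1, "train B @ km 8")],
]
inputs = adapt_inputs(trains, "line topology")
# inputs[0]["trains"] == ["train A @ km 1", "train B @ km 9"]
```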
Execution Environments
Configuration | Platform      | Infrastructure
1             | Multi-thread  | Cluster node¹
2             | Hadoop 2.2.0  | Cluster node¹
3             | Hadoop 2.2.0  | EC2

Type      | Role   | Virtual CPUs | Memory (GB) | Local storage (GB)
m1.medium | master | 1            | 3.75        | 410
m2.xlarge | slave² | 2            | 17.1        | 420

¹ 48 Xeon E7 cores and 110 GB of RAM.
² Five slaves used to match the RAM in the cluster node.
Performance Evaluation
[Plot: total execution time (min, 0–1400) for experiments I–IV, comparing MR on the local node, MR on EC2, and the original application.]
Includes input data upload (data replication and balancing).
Performance with MR in the local node and in the cloud is remarkably better than the original (68% and 85% less time, respectively).
Platform overhead is significant with small experiments.
Scalability
[Plot: speed-up over one node vs. number of experiments (1–64) for 4, 16, and 64 nodes; y-axis 0–60.]
Performance does not scale up linearly with the number of nodes.
Resources become underutilised.
The infrastructure must fit the experiment pool.
Efficiency
[Plot: efficiency over one node vs. number of experiments (1–64) for 4, 16, and 64 nodes; y-axis 0.2–1.1.]

Per-node efficiency: normalised speed-up with relation to the number of slaves (Gunarathne et al., 2011).

e = t0 / (n · t)    (1)

where t0 is the execution time on one node, n the number of slaves, and t the execution time on n slaves.
System becomes underutilised as nodes are added with the sameexperiments.
Better efficiency with more experiments (even superlinear).
We can scale to thousands of experiments!
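The speed-up and efficiency metrics used in these slides reduce to two one-liners; the timings below are made up purely for illustration.

```python
def speedup(t_one_node, t_n_nodes):
    """Speed-up of the n-node run over the single-node run."""
    return t_one_node / t_n_nodes

def efficiency(t_one_node, t_n_nodes, n_slaves):
    """Per-node efficiency, e = t0 / (n * t): speed-up normalised
    by the number of slaves (Gunarathne et al., 2011)."""
    return t_one_node / (n_slaves * t_n_nodes)

# Illustrative timings: 1 node takes 1200 s, 4 nodes take 400 s.
s = speedup(1200, 400)        # 3.0
e = efficiency(1200, 400, 4)  # 0.75 -> nodes 75% utilised
```

An efficiency above 1 (the superlinear cases in the plot) means the n-node run was more than n times faster than the single-node baseline.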
Conclusions
Highlights
Scientific applications require an increasing amount of computingresources.
Migrating applications to the Cloud could overcome theselimitations.
We proposed a methodology able to:
1 Cloudify production-ready simulators.
2 Maximise code reuse.
3 Improve scalability.
4 Support multidimensional/parametric studies efficiently.
5 Minimise execution costs.
According to our results with a real application:
  Better performance and scalability.
  High efficiency and resource utilisation under heavy workload.
  The platform shows scalability issues depending on the cluster configuration.
Future Works
Improve the adaptation model.
  Multi-key mechanisms (several independent variables).
  More complex base functions (partition, group-by-key...).
Study MTC in federated infrastructures (private and public, hybrid).
Expand the dimensioning model.
  Stage-based.
  Support for spot pricing.
  Mix with brokering systems.
Adopt the in-memory computing perspective.
Other applications (CPU bound, network intensive...).
The HGS Case Study
The HGS Case Study: Background
Compute-intensive MPI scientific application from the hydrogeology domain.
Kernel contained in an Ensemble Kalman Filter.
Iterative in nature, but pleasingly parallel within each step.
  By requirement, the kernel is a black box.
  Data can only be modified in files by an external library.
The HGS Case Study: Approach
Current approach:
  Distribute realizations, take post-processing as a barrier (i.e. iterative MR).
  Towards in-memory computing: Spark (advanced MR) + Tachyon (faster I/O).
Potential issues:
  Fault-tolerance.
  Platform and streaming overhead.
  Loss of flexible data-locality.
[Diagram: starting from initial data at t = T0, each step pre-processes the input data, runs realizations 0 ... r-1 in parallel, and post-processes their output data; the loop repeats until t == Tf.]
References
Abramson, D., Bethwaite, B., Enticott, C., Garic, S., and Peachey, T. (2011). Parameter exploration in science and engineering using many-task computing. IEEE Transactions on Parallel and Distributed Systems, 22(6):960–973.

Caíno-Lores, S., García, A., García-Carballeira, F., and Carretero, J. (2014a). A cloudification methodology for numerical simulations. In Euro-Par 2014: Parallel Processing Workshops, Porto, Portugal, August 25–26, 2014, Revised Selected Papers, Part II, pages 375–386.

Carretero, J., Caíno-Lores, S., García-Carballeira, F., and García, A. (2015). A multi-objective simulator for optimal power dimensioning on electric railways using cloud computing. In Proceedings of the 5th International Conference on Simulation and Modeling Methodologies, Technologies and Applications, pages 428–438.

Caíno-Lores, S., Fernández, A. G., García-Carballeira, F., and Pérez, J. C. (2015). A cloudification methodology for multidimensional analysis: Implementation and application to a railway power simulator. Simulation Modelling Practice and Theory, 55:46–62.

Caíno-Lores, S., García, A., García-Carballeira, F., and Carretero, J. (2014b). Breaking data dependencies in numerical simulations using MapReduce. In XXV Jornadas de Paralelismo.
D’Angelo, G. (2011). Parallel and distributed simulation from many cores to the public cloud. In High Performance Computing and Simulation (HPCS), 2011 International Conference on, pages 14–23.

Dias, J., Ogasawara, E., de Oliveira, D., Pacitti, E., and Mattoso, M. (2010). Improving many-task computing in scientific workflows using P2P techniques. In Many-Task Computing on Grids and Supercomputers (MTAGS), 2010 IEEE Workshop on, pages 1–10.

Gunarathne, T., Wu, T.-L., Choi, J. Y., Bae, S.-H., and Qiu, J. (2011). Cloud computing paradigms for pleasingly parallel biomedical applications. Concurrency and Computation: Practice and Experience, 23(17):2338–2354.

Ibrahim, S., Jin, H., Lu, L., Wu, S., He, B., and Qi, L. (2010). LEEN: Locality/fairness-aware key partitioning for MapReduce in the cloud. In Cloud Computing Technology and Science (CloudCom), 2010 IEEE Second International Conference on, pages 17–24.

Lang, S., Carns, P., Latham, R., Ross, R., Harms, K., and Allcock, W. (2009). I/O performance challenges at leadership scale. In Proceedings of the Conference on High Performance Computing Networking, Storage and Analysis, SC ’09, pages 40:1–40:12.
Manuali, C., Costantini, A., Laganà, A., Cecchi, M., Ghiselli, A., Carpene, M., and Rossi, E. (2012). Efficient workload distribution bridging HTC and HPC in scientific computing. In Murgante, B., Gervasi, O., Misra, S., Nedjah, N., Rocha, A., Taniar, D., and Apduhan, B., editors, Computational Science and Its Applications – ICCSA 2012, volume 7333 of Lecture Notes in Computer Science, pages 345–357. Springer Berlin Heidelberg.

Ogasawara, E., de Oliveira, D., Chirigati, F., Barbosa, C. E., Elias, R., Braganholo, V., Coutinho, A., and Mattoso, M. (2009). Exploring many task computing in scientific workflows. In Proceedings of the 2nd Workshop on Many-Task Computing on Grids and Supercomputers, MTAGS ’09, pages 2:1–2:10, New York, NY, USA. ACM.

Srirama, S., Ivanistsev, V., Jakovits, P., and Willmore, C. (2013). Direct migration of scientific computing experiments to the cloud. In High Performance Computing and Simulation (HPCS), 2013 International Conference on, pages 27–34.

Srirama, S. N., Jakovits, P., and Vainikko, E. (2012). Adapting scientific computing problems to clouds using MapReduce. Future Generation Computer Systems, 28(1):184–192.

Yu, D., Wang, J., Hu, B., Liu, J., Zhang, X., He, K., and Zhang, L.-J. (2011). A practical architecture of cloudification of legacy applications. In Services (SERVICES), 2011 IEEE World Congress on, pages 17–24.
Zhang, Z., Barbary, K., Nothaft, F. A., Sparks, E., Zahn, O., Franklin, M. J., Patterson, D. A., and Perlmutter, S. (2015). Scientific computing meets big data technology: An astronomy use case. arXiv preprint arXiv:1507.03325.

Zhang, Z., Katz, D. S., Ripeanu, M., Wilde, M., and Foster, I. T. (2011). AME: An anyscale many-task computing engine. In Proceedings of the 6th Workshop on Workflows in Support of Large-scale Science, WORKS ’11, pages 137–146, New York, NY, USA. ACM.