April 21, 2023
Performance Analysis of Cloud Computing Services for Many-Tasks Scientific Computing
Berkeley, CA, USA
Alexandru Iosup, Nezih Yigitbasi, Dick Epema
Parallel and Distributed Systems Group, Delft University of Technology, The Netherlands
Simon Ostermann, Radu Prodan, Thomas Fahringer
Distributed and Parallel Systems, University of Innsbruck, Austria
About the Team
• Team’s Recent Work in Performance
  • The Grid Workloads Archive (Nov 2006)
  • The Failure Trace Archive (Nov 2009)
  • The Peer-to-Peer Trace Archive (Apr 2010)
  • Tools: GrenchMark workload-based grid benchmarking, other Monitoring and Perf. Eval. tools
• Speaker: Alexandru Iosup
  • Systems work: Tribler (P2P file sharing), Koala (grid scheduling), POGGI and CAMEO (massively multiplayer online gaming)
  • Grid and Peer-to-Peer workload characterization and modeling
Many-Tasks Scientific Computing
• Jobs comprising many tasks (1,000s) necessary to achieve some meaningful scientific goal
• Jobs submitted as bags-of-tasks or over short periods of time
• High-volume users over long periods of time
• Common in grid workloads [Ios06][Ios08]
• No practical definition (ranging from “many” to “10,000/h”)
Cloud Futures Workshop 2010 – Cloud Computing Support for Massively Social Gaming
The Real Cloud
• “The path to abundance”
  • On-demand capacity
  • Cheap for short-term tasks
  • Great for web apps (EIP, web crawl, DB ops, I/O)

vs.

• “The killer cyclone”
  • Not-so-great performance for scientific applications¹ (compute- or data-intensive)
  • Long-term performance variability²

Image: Tropical Cyclone Nargis (NASA, ISSS, 04/29/08), http://www.flickr.com/photos/dimitrisotiropoulos/4204766418/

1. Iosup et al., Performance Analysis of Cloud Computing Services for Many-Tasks Scientific Computing (under submission).
2. Iosup et al., On the Performance Variability of Production Cloud Services, Technical Report PDS-2010-002. [Online]. Available: http://pds.twi.tudelft.nl/reports/2010/PDS-2010-002.pdf
Research Question and Previous Work

Do clouds and Many-Tasks Scientific Computing fit well, performance-wise?

• Virtualization overhead
  • Loss below 5% for computation [Barham03] [Clark04]
  • Loss below 15% for networking [Barham03] [Menon05]
  • Loss below 30% for parallel I/O [Vetter08]
  • Negligible for compute-intensive HPC kernels [You06] [Panda06]
• Cloud performance evaluation
  • Performance and cost of executing scientific workflows [Dee08]
  • Study of Amazon S3 [Palankar08]
  • Amazon EC2 for the NPB benchmark suite [Walker08] or selected HPC benchmarks [Hill08]

Theory: just use the virtualization-overhead results. Practice?
Agenda
1. Introduction & Motivation
2. Proto-Many Task Users
3. Performance Evaluation of Four Clouds
4. Clouds vs Other Environments
5. Take Home Message
Proto-Many Task Users
MTC user:
• At least J jobs in B bags-of-tasks

Trace-based analysis:
• 6 grid traces, 4 parallel production environment traces
• Various criteria (combinations of values for J and B)

Results:
• “number of BoTs submitted ≥ 1,000 and number of tasks submitted ≥ 10,000”
• Easy to grasp + dominate most traces (jobs and CPU time) + 1-CPU jobs
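The selection criterion above can be sketched in a few lines. A minimal illustration, assuming a trace represented as (user, bot_id) records; this record format is an assumption for the example, not the Grid Workloads Archive format:

```python
from collections import defaultdict

def proto_mtc_users(trace, min_bots=1_000, min_tasks=10_000):
    """Return users with at least min_bots bags-of-tasks
    and at least min_tasks submitted tasks in total."""
    bots = defaultdict(set)   # user -> distinct bag-of-tasks ids
    tasks = defaultdict(int)  # user -> total task count
    for user, bot_id in trace:
        bots[user].add(bot_id)
        tasks[user] += 1
    return {u for u in tasks
            if len(bots[u]) >= min_bots and tasks[u] >= min_tasks}

# Toy trace with lowered thresholds:
trace = [("alice", b) for b in range(5)] * 3 + [("bob", 0)] * 4
print(proto_mtc_users(trace, min_bots=5, min_tasks=15))  # {'alice'}
```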
Agenda
1. Introduction & Motivation
2. Proto-Many Task Users
3. Performance Evaluation of Four Clouds
   1. Experimental Setup
   2. Selected Results
4. Clouds vs Other Environments
5. Take Home Message
Experimental Setup: Environments

Four commercial IaaS clouds (NIST definitions):
• Amazon EC2
• GoGrid
• Elastic Hosts
• Mosso

No Cluster instances (not released in Dec’08-Jan’09)
Experimental Setup: Experiment Design

Principles:
• Use complete test suites
• Repeat 10 times
• Use defaults, not tuning
• Use common benchmarks
• Compare results with results for other systems

Types of experiments:
• Resource acquisition and release
• Single-Instance (SI) benchmarking
• Multiple-Instance (MI) benchmarking
Resource Acquisition: Can Matter

• Acquisition overhead can be significant
  • For single instances (GoGrid)
  • For multiple instances (all clouds)
• Short-term variability can be high (GoGrid)
• Slow long-term growth
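Acquisition overhead here is simply the wall-clock time from the allocation request until the instance is usable. A minimal measurement sketch; `provision` and `wait_until_ready` are hypothetical placeholders for provider-specific calls, not a real cloud API:

```python
import time

def acquisition_latency(provision, wait_until_ready):
    """Wall-clock seconds from allocation request to usable instance.
    Both callables are provider-specific stubs supplied by the caller."""
    t0 = time.monotonic()
    handle = provision()       # e.g., issue the instance-creation request
    wait_until_ready(handle)   # e.g., poll until the instance is reachable
    return time.monotonic() - t0

# Toy usage with no-op stubs:
print(acquisition_latency(lambda: None, lambda h: None) >= 0.0)  # True
```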
Single Instances: Compute Performance Lower Than Expected

• ECU = 4.4 GFLOPS (at 100% efficient code) = 1.1 GHz 2007 Opteron x 4 FLOPS/cycle (full pipeline)
• In our tests: 0.6-0.8 GFLOPS
  • Sharing of the same physical machines (working set)
  • Lack of code optimizations beyond -O3 -funroll-loops
  • Metering requires more clarification
• Instances with excellent float/double addition performance may have poor multiplication performance (c1.medium, c1.xlarge)
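The arithmetic behind the ECU rating and the measured shortfall can be checked directly:

```python
# Worked arithmetic from the slide: one ECU is rated like a 1.1 GHz
# 2007 Opteron issuing 4 FLOPS/cycle with a full pipeline.
GHZ = 1.1
FLOPS_PER_CYCLE = 4
peak = GHZ * FLOPS_PER_CYCLE  # 4.4 GFLOPS per ECU at 100% efficiency

# Measured 0.6-0.8 GFLOPS is only this fraction of the rated peak:
low, high = 0.6 / peak, 0.8 / peak
print(f"{low:.0%}-{high:.0%} of peak")  # roughly 14%-18%
```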
Multi-Instance: Low Efficiency in HPL

Peak performance:
• 2 x c1.xlarge (16 cores) @ 176 GFLOPS; HPCC-227 (Cisco, 16 cores) @ 102; HPCC-286 (Intel, 16 cores) @ 180
• 16 x c1.xlarge (128 cores) @ 1,408 GFLOPS; HPCC-224 (Cisco, 128 cores) @ 819; HPCC-289 (Intel, 128 cores) @ 1,433

Efficiency:
• Cloud: 15-50%, even for small (<128) instance counts
• HPC: 60-70%
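The peak figures above follow from the ECU rating (a c1.xlarge advertises 20 ECUs, i.e. 8 cores x 2.5 ECU, at 4.4 GFLOPS per ECU). A quick check, with a hypothetical achieved HPL result to illustrate the efficiency computation:

```python
ECU_GFLOPS = 4.4
ECUS_PER_C1_XLARGE = 20  # 8 cores x 2.5 ECU

def peak_gflops(n_instances):
    """Theoretical peak for a cluster of c1.xlarge instances."""
    return n_instances * ECUS_PER_C1_XLARGE * ECU_GFLOPS

print(round(peak_gflops(2), 1))   # 176.0  (16 cores)
print(round(peak_gflops(16), 1))  # 1408.0 (128 cores)

# HPL efficiency = achieved / peak; 60 GFLOPS achieved on 2 instances
# (a hypothetical value) lands inside the 15-50% cloud range:
print(f"{60 / peak_gflops(2):.0%}")  # 34%
```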
Cloud Performance Variability
• Performance variability of production cloud services
  • Infrastructure: Amazon Web Services
  • Platform: Google App Engine
• Year-long performance information for nine services
• Finding: about half of the cloud services investigated in this work exhibit yearly and daily patterns; the impact of performance variability depends on the application.

A. Iosup, N. Yigitbasi, and D. Epema, On the Performance Variability of Production Cloud Services (under submission).

Figure: Amazon S3, GET US HI operations.
Agenda
1. Introduction & Motivation
2. Proto-Many Task Users
3. Performance Evaluation of Four Clouds
4. Clouds vs Other Environments
5. Take Home Message
Clouds vs Other Environments
• Trace-based simulation with the DGSim grid simulator
• Compute-intensive workloads, no data I/O
• Compared: source environment vs cloud with source-like performance vs cloud with real (measured) performance
  • Slowdown: sequential jobs 7x, parallel jobs 1-10x
• Results:
  • Response time 4-10 times higher in real clouds
  • Good for short-term, deadline-driven projects
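The comparison above amounts to stretching job runtimes by a measured slowdown factor and observing the effect on response time. A sketch of one plausible reading of that model, using the slide's 7x sequential factor; this is an illustration, not the actual DGSim implementation:

```python
def cloud_response_time(source_runtime_s, wait_time_s, slowdown):
    """Response time = wait time + runtime; in the 'cloud with real
    performance' scenario the runtime is stretched by the slowdown."""
    return wait_time_s + source_runtime_s * slowdown

# A 1-hour sequential job with 10 minutes of queue wait (hypothetical):
src = cloud_response_time(3600, 600, 1.0)   # source-like cloud: 4200 s
real = cloud_response_time(3600, 600, 7.0)  # real cloud: 25800 s
print(round(real / src, 1))  # 6.1x higher response time
```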
Take Home Message
• Many-Tasks Scientific Computing
  • Quantitative definition: J jobs and B bags-of-tasks
  • Extracted proto-MT users from grid and parallel production environments
• Performance Evaluation of Four Commercial Clouds
  • Amazon EC2, GoGrid, Elastic Hosts, Mosso
  • Resource acquisition, Single- and Multi-Instance benchmarking
  • Low compute and networking performance
• Clouds vs Other Environments
  • An order of magnitude better performance needed for clouds
  • Clouds already good for short-term, deadline-driven scientific computing
Potential for Collaboration
• Other performance evaluation studies of clouds
  • The new Amazon EC2 instance type: Cluster Compute
  • Other clouds?
  • Data-intensive benchmarks
• General logs
  • Failure Trace Archive
  • Grid Workloads Archive
• …
Thank you! Questions? Observations?
More Information:
• The Grid Workloads Archive: gwa.ewi.tudelft.nl
• The Failure Trace Archive: fta.inria.fr
• The GrenchMark perf. eval. tool: grenchmark.st.ewi.tudelft.nl
• Cloud research: www.st.ewi.tudelft.nl/~iosup/research_cloud.html
• See the PDS publication database at www.pds.twi.tudelft.nl/
email: A.Iosup@tudelft.nl
Big thanks to our collaborators: U. Wisc.-Madison, U Chicago, U Dortmund, U Innsbruck, LRI/INRIA Paris, INRIA Grenoble, U Leiden, Politehnica University of Bucharest, Technion, …