Materials to be Covered include


1. Distributed Systems Modeling
2. Computer Clustering
3. Virtual Machines and Virtualization
4. Cloud Platform Architectures
5. Service-Oriented Architecture
6. Cloud Programming and Software Environments


Copyright © 2012, Elsevier Inc. All rights reserved. 1 - 2

Distributed and Cloud Computing K. Hwang, G. Fox and J. Dongarra

Chapter 1: Enabling Technologies and Distributed System Models (suggested for use in 2 lectures of 50 minutes each)

Prepared by Kai Hwang

Modified by S. J. Kao Sept. 15, 2013


From Desktop/HPC/Grids to Internet Clouds in 30 Years (1/2)

• HPC has moved from centralized supercomputers to geographically distributed desktops, desksides, clusters, and grids, and then to clouds, over the last 30 years.

• R&D efforts on HPC, clusters, grids, P2P, and virtual machines have laid the foundation of cloud computing, which has been widely advocated since 2007.

• Computing infrastructure is being located in areas with lower costs for hardware, software, datasets, space, and power, as computing moves from desktops to datacenter-based clouds.


From Desktop/HPC/Grids to Internet Clouds in 30 Years (2/2)

• With the introduction of SOA, Web 2.0 services, RFID, GPS, and sensor technologies, the Internet of Things (IoT) is becoming a reality.

• P2P, cloud computing, and web service platforms focus more on HTC applications than on HPC applications.

• Cloud computing overlaps with distributed, centralized, and parallel computing.

• Internet clouds are the result of moving desktop computing to service-oriented computing using server clusters and huge databases at data centers.


History Notes

• 1960: John McCarthy formulated the idea that computing may be organized as a public utility, like water and electricity.

• 1992: Gordon Bell gave an address at a conference on parallel computation with the provocative title “Massively parallel computers: why not parallel computers for the masses?”

• 1998: Google Inc. was founded by Larry Page and Sergey Brin.

• 2006: Amazon EC2 was initially released; in 2008 Microsoft announced Windows Azure; in 2011 Apple announced iCloud; and in 2012 Oracle announced the Oracle Cloud.


Clouds and Internet of Things

HPC: High-Performance Computing

HTC: High-Throughput Computing

P2P: Peer to Peer

MPP: Massively Parallel Processors

Source: K. Hwang, G. Fox, and J. Dongarra, Distributed and Cloud Computing, Morgan Kaufmann, 2012.


Network-centric Computing (1/3)

• Information processing can be done more efficiently on large farms of computing and storage systems accessible via the Internet.
  – Grid computing: initiated by the National Labs in the early 1990s; targeted primarily at scientific computing.
  – Utility computing: initiated in 2005-2006 by IT companies; targeted at enterprise computing.

• The focus of utility computing is on the business model for providing computing services; it often requires a cloud-like infrastructure.

• Cloud computing is a path to utility computing embraced by major IT companies, including Amazon, HP, IBM, Microsoft, Oracle, and others.


Network-centric Computing (2/3)

• Content: any type or volume of media, be it static or dynamic, monolithic or modular, live or stored, produced by aggregation, or mixed.

• The “Future Internet” will be content-centric. The creation and consumption of audio and visual content is likely to transform the Internet to support increased quality in terms of resolution, frame rate, color depth, and stereoscopic information.


Network-centric Computing (3/3)

• Data-intensive: large-scale simulations in science and engineering require large volumes of data, and multimedia streaming transfers large volumes of data.

• Network-intensive: transferring large volumes of data requires high-bandwidth networks.

• Low-latency: data streaming, parallel computing, and computation steering require low-latency networks.

• The systems are accessed using thin clients running on systems with limited resources, e.g., wireless devices such as smartphones and tablets.

• The infrastructure should support some form of workflow management.
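To make the data-intensive and network-intensive points concrete, the short Python sketch below estimates how long it takes to move a dataset over links of different speeds. It is an idealized model (latency plus size divided by bandwidth, with no protocol overhead or congestion), and the 1 TB dataset and the link speeds are hypothetical values chosen only for illustration.

```python
# Idealized data-transfer model: time = latency + size / bandwidth.
# Protocol overhead, congestion, and disk speed are deliberately ignored.

TB = 10 ** 12  # bytes in a (decimal) terabyte


def transfer_time_s(data_bytes: float, bandwidth_bps: float, latency_s: float = 0.0) -> float:
    """Seconds needed to push data_bytes through a link of bandwidth_bps bits per second."""
    if bandwidth_bps <= 0:
        raise ValueError("bandwidth must be positive")
    return latency_s + (data_bytes * 8) / bandwidth_bps


# Hypothetical link speeds, from a slow WAN path to a fast data center link.
for label, bps in [("100 Mb/s", 100e6), ("1 Gb/s", 1e9), ("10 Gb/s", 10e9)]:
    hours = transfer_time_s(1 * TB, bps) / 3600
    print(f"1 TB over {label:>8}: {hours:6.2f} h")
```

Even at 1 Gb/s, moving 1 TB takes roughly 2.2 hours (8,000 seconds) in this idealized model, which is why high-bandwidth networks matter for data-intensive workloads.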


Computing Paradigms

1. Centralized computing
2. Parallel computing
3. Distributed computing
4. Cloud computing

Alternatives include concurrent computing, ubiquitous computing, and Internet computing.

• IoT = a networked collection of everyday objects, including computers, sensors, humans, etc.

• Internet clouds = the result of moving desktop computing to service-oriented computing using server clusters and huge databases at data centers.


3 New Computing Developments

1. Web 2.0 services (SOA)
2. Internet clouds (virtualization)
3. Internet of Things (RFID, GPS, …)

• “The network is the computer”: John Gage, 1984 (Sun Microsystems)

• “The data center is the computer”: David Patterson, 2008 (UC Berkeley)

• “The cloud is the computer”: Rajkumar Buyya, recently (University of Melbourne)


Technology Convergence toward HPC for Science and HTC for Business

(Courtesy of Raj Buyya, University of Melbourne, 2011)



HPC and HTC

• HPC (high-performance computing) is oriented toward scientific computing, such as engineering and manufacturing applications running at Gflops to Pflops speeds.

• HTC (high-throughput computing) is oriented toward business computing, such as Internet searches and web services.

Both HPC and HTC systems emphasize parallelism and distributed computing. Future systems must also satisfy the huge demand for computing power in terms of throughput, efficiency, scalability, and reliability.


2011 Gartner “IT Hype Cycle” for Emerging Technologies

(Figure panels: 2007, 2008, 2009, 2010, 2011)



Hype Cycle of New Technologies

• In 2010, cloud technology had just crossed the peak of inflated expectations, and it was expected to take 2 to 5 more years to reach the plateau of productivity.

• Broadband over power line technology was expected to become obsolete before leaving the trough of disillusionment in 2010.

• Promising future technologies may include clouds, biometric authentication, interactive TV, speech recognition, predictive analytics, and media tablets.


2 Internet Development Trends

• Internet of Things (IoT) (introduced at MIT in 1999): IoT refers to the networked interconnection of everyday objects, tools, devices, or computers. One can view the IoT as a wireless network of sensors that interconnects all things in our lives.

• Cyber-physical systems (CPS) result from the interaction between computational processes and the physical world. A CPS merges the 3C technologies (computation, communication, and control) into an intelligent closed feedback system between the physical world and the information world.

• IoT emphasizes various networking connections among physical objects, while CPS emphasizes exploration of virtual reality (VR) applications in the physical world.


Process Speed and Network Bandwidth


Multicore CPU Chip


Multithreading Technologies


Multicore CPU and Many-core GPU

• Both multicore CPUs and many-core GPUs can handle multiple instruction threads, but at different magnitudes.

• Examples of currently available multicore CPUs:
  – Intel i7, Xeon, AMD Opteron
  – Sun Niagara, IBM Power 6, X cell

• A GPU is a graphics coprocessor or accelerator mounted on a computer’s graphics card or video card. The NVIDIA GeForce 256, released in 1999, was the pioneer.

• Traditional CPUs are structured with only a few cores, while a modern GPU chip can be built with hundreds of processing cores.


Memory and Disk Evolution


Architecture of a Many-Core Multiprocessor GPU Interacting with a CPU Processor


Streaming Multiprocessor in NVIDIA Fermi GPU


Datacenter and Server Cost Distribution


Data Center Costs

• According to a 2009 IDC report, data center costs break down as follows:
  – 30% goes toward purchasing IT equipment
  – 33% is attributed to the chiller
  – 18% to the uninterruptible power supply (UPS)
  – 9% to computer room air conditioning
  – 7% to power distribution, lighting, and transformers

• About 60% of the cost to run the data center is allocated to management and maintenance.

• Over the past 15 years, server purchase costs did not increase much, while the cost of electricity and cooling increased from 5% to 14%.
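The IDC percentages can be turned into a simple cost split. The sketch below applies them to a hypothetical $10 million total (an assumption for illustration, not a figure from the report) and also sums the listed shares, which come to 97%, so 3% of the cost is not itemized on this slide.

```python
# Itemized shares of data center cost, as listed from the 2009 IDC report above.
COST_SHARES = {
    "IT equipment": 0.30,
    "chiller": 0.33,
    "UPS": 0.18,
    "computer room air conditioning": 0.09,
    "power distribution, lighting, transformers": 0.07,
}


def allocate(total_cost: float) -> dict:
    """Split a total cost across the itemized categories by their listed shares."""
    return {name: total_cost * share for name, share in COST_SHARES.items()}


itemized = sum(COST_SHARES.values())
print(f"Itemized share: {itemized:.0%}")  # the listed items do not quite reach 100%

# Hypothetical total budget of $10 million, used only to illustrate the split.
for name, cost in allocate(10_000_000).items():
    print(f"{name:<45} ${cost:>12,.0f}")
```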


Virtual Machine Architecture

(Courtesy of VMware, 2010)


Primitive Operations in Virtual Machines


Convergence of Technologies (1/2)

• Cloud computing is enabled by the convergence of technologies in the following areas:

1. Hardware virtualization and multi-core chips
2. Utility and grid computing
3. SOA, Web 2.0, and WS mashups
4. Autonomic computing and data center automation


Convergence of Technologies (2/2)

(Courtesy of Judy Qiu, Indiana University, 2011)


IDC report, 2009

Clusters, P2P, Grids, and Internet Clouds


Clusters and Grids

• A cluster consists of interconnected standalone computers that work cooperatively as a single integrated computing resource. It is designed to handle heavy workloads with large data sets.

• A computing grid offers an infrastructure that couples computers, software, middleware, special instruments, people, and sensors, and it is usually constructed across a LAN, WAN, or Internet backbone. Grids fall into two categories: computational/data grids and P2P grids.


A Typical Cluster Architecture


An Example of a Computational Grid


P2P Systems and Overlay Networks

• In a P2P system, every node acts as both a client and a server, providing part of the system resources. No master-slave relationship, central coordination, or central database is needed. P2P networks do not use a dedicated interconnection network.

• Based on communication or file-sharing needs, the peer nodes (identified by peer IDs) form an overlay network at the logical level. Overlay networks are either unstructured or structured.
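The slide does not say how a structured overlay decides which peer is responsible for which data item. One well-known technique, used by Chord-style distributed hash tables, is consistent hashing: peer IDs and keys are hashed onto the same identifier ring, and each key is stored on its successor, the first peer clockwise from the key. The minimal sketch below assumes a 16-bit identifier space and hypothetical peer names, and it computes placement with full knowledge of the membership; a real DHT would instead route toward the successor hop by hop (for example, with finger tables).

```python
import bisect
import hashlib

ID_BITS = 16  # hypothetical identifier space: IDs range over 0 .. 2**16 - 1


def ring_id(name: str) -> int:
    """Hash a peer name or a data key onto the identifier ring."""
    digest = hashlib.sha1(name.encode("utf-8")).digest()
    return int.from_bytes(digest, "big") % (2 ** ID_BITS)


class StructuredOverlay:
    """Chord-style key placement: a key lives on its successor peer,
    the first peer whose ID is >= the key's ID, wrapping around the ring."""

    def __init__(self, peers):
        self.ring = sorted((ring_id(p), p) for p in peers)  # (ID, peer) in ring order
        self.ids = [pid for pid, _ in self.ring]

    def successor(self, key: str) -> str:
        k = ring_id(key)
        i = bisect.bisect_left(self.ids, k)
        return self.ring[i % len(self.ring)][1]  # past the largest ID, wrap to the smallest


overlay = StructuredOverlay([f"peer{i}" for i in range(8)])
for key in ("song.mp3", "video.avi", "paper.pdf"):
    print(key, "->", overlay.successor(key))
```

Because placement depends only on hashes, every peer that knows the membership computes the same owner for a key, and when a peer joins or leaves only the keys adjacent to it on the ring move. An unstructured overlay has no such placement rule and must locate data by searching, for example by flooding queries.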


An Example of an Overlay Network


P2P Application Families

1. Distributed file sharing of digital content, such as music and videos
   – Gnutella, BitTorrent, Napster, etc.
2. Collaborative platforms
   – MSN, Skype, instant messaging, etc.
3. Distributed P2P computing in specific applications
   – SETI@home, Genome@home, etc.
4. Other P2P platforms
   – JXTA, .NET, FightingAIDS@home, etc.



P2P Computing Challenges

1. Three types of heterogeneity problems:
   • Hardware models and architectures
   • Different network connections and protocols
   • Incompatibility between software and the OS
2. P2P performance is affected by routing efficiency and self-organization by participating peers.
3. Fault tolerance, failure management, and load balancing
4. Security, privacy, and copyright violations (due to lack of trust among users)



Internet Clouds

• The basic idea is to move desktop computing to a service-oriented platform using server clusters and huge databases at data centers.

• Some consider the cloud a server cluster that participates in distributed computing; others treat it as a centralized resource pool.



The Cloud (1/2)

• Historical roots in today’s Internet apps
  – Search, email, social networks
  – File storage (Live Mesh, MobileMe, Flickr, …)

• A cloud infrastructure provides a framework to manage scalable, reliable, on-demand access to applications.

• A cloud is the “invisible” backend to many mobile applications.

• A model of computation and data storage based on “pay as you go” access to “unlimited” remote data center capabilities


The Cloud (2/2)

• Computational science is becoming data-intensive.

• Working with large data sets will mean sending the computations (programs) to the data, rather than copying the data to workstations.

• A cloud is a pool of virtualized computer resources that can host a variety of workloads, including batch and interactive applications.

• Cloud computing serves as an on-demand computing paradigm that relieves users of traditional operational problems.



Virtualized Platform of Internet Clouds


The Next Revolution in IT: Cloud Computing

• Classical computing
  – Buy & own
    • Hardware, system software, and applications, often to meet peak needs
  – Install, configure, test, verify, evaluate
  – Manage
  – ..
  – Finally, use it
  – $$$$....$ (high CapEx)

• Cloud computing
  – Subscribe
  – Use
  – $: pay for what you use, based on QoS

Every 18 months?


(Courtesy of Raj Buyya, 2012)
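The contrast between high up-front CapEx and pay-for-what-you-use can be framed as a break-even question. The sketch below is a simplified model under stated assumptions: a one-time hardware purchase plus a fixed monthly operating cost, versus a fixed monthly cloud bill. All dollar figures are hypothetical, and real pricing (reserved instances, hardware refresh, staff costs) is not modeled.

```python
def break_even_month(capex: float, own_monthly: float, cloud_monthly: float,
                     horizon_months: int = 120):
    """First month in which buying and owning becomes no more expensive than
    renting the equivalent cloud capacity; None if that never happens in the horizon."""
    for month in range(1, horizon_months + 1):
        own_total = capex + own_monthly * month   # classical: pay up front, then run it
        cloud_total = cloud_monthly * month       # cloud: pay as you go
        if own_total <= cloud_total:
            return month
    return None


# Hypothetical figures: $100,000 of hardware sized for peak load, $1,000/month to
# operate it, versus $4,000/month of cloud capacity billed for actual use.
print(break_even_month(100_000, 1_000, 4_000))
```

With these made-up numbers, owning only pays off after 34 months; if the hardware must be replaced sooner than that, or if usage (and so the cloud bill) drops during off-peak periods, the pay-as-you-go model stays cheaper.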


Three Cloud Service Models (1/2)

• IaaS (Infrastructure as a Service): puts together the infrastructure demanded by users, namely servers, storage, networks, and the data center fabric.

• PaaS (Platform as a Service): enables the user to deploy user-built applications onto a virtualized cloud platform. The platform supplies both hardware and software, integrated with specific programming interfaces.

• SaaS (Software as a Service): refers to browser-initiated application software delivered to thousands of paying cloud customers. SaaS applies to business processes, industry applications, CRM, ERP, etc.



Three Cloud Service Models (2/2)


Cloud Computing Challenges: Dealing with too many issues (Courtesy of R. Buyya)

(Figure labels: Resource Metering, Billing, Pricing, Provisioning on Demand, Utility & Risk Management, Scalability, Reliability, Energy Efficiency, Security, Privacy, Legal & Regulatory, Software Eng. Complexity, Programming Env. & Application Dev.)


The Internet of Things (IoT)

(Figure labels: The Internet, Internet Clouds, Internet of Things, Smart Earth; “Smart Earth: An IBM Dream”)


Opportunities of IoT in 3 Dimensions

(courtesy of Wikipedia, 2010)


System Scalability vs. OS Multiplicity


System Availability vs. Configuration Size


Amdahl’s Law and System Availability

Amdahl’s Law gives the speedup factor of an n-processor system over a single processor, assuming that:

1. A given program executes on a uniprocessor workstation with a total execution time T.
2. A fraction α of the code must be executed sequentially, called the sequential bottleneck; the remaining fraction (1 − α) can be compiled for parallel execution by n processors.
3. I/O time and exception handling time are ignored.

Speedup S = T / [αT + (1 − α)T/n] = 1 / [α + (1 − α)/n]
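Letting n grow without limit in the formula above gives S → 1/α, so the sequential bottleneck alone caps the achievable speedup. The short Python sketch below evaluates the formula directly; the sequential fraction α = 0.25 is a hypothetical value chosen for illustration.

```python
def amdahl_speedup(alpha: float, n: int) -> float:
    """Amdahl's Law: speedup of n processors over one, where alpha is the
    sequential (non-parallelizable) fraction of the execution time T."""
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    if n < 1:
        raise ValueError("n must be at least 1")
    return 1.0 / (alpha + (1.0 - alpha) / n)


# Hypothetical workload in which 25% of the code is sequential.
ALPHA = 0.25
for n in (1, 4, 16, 256, 4096):
    print(f"n = {n:5d}: S = {amdahl_speedup(ALPHA, n):.2f}")
print(f"upper bound 1/alpha = {1 / ALPHA:.2f}")
```

With α = 0.25, even 4,096 processors deliver a speedup just under 4, which is why shrinking the sequential fraction matters more than adding processors once n is large.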

System availability is defined as

Availability = MTTF / (MTTF + MTTR),

where MTTF denotes the mean time to failure and MTTR denotes the mean time to repair.
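The definition shows two levers for higher availability: fail less often (raise MTTF) or recover faster (cut MTTR). The sketch below computes availability and the resulting expected downtime per year; the 1,000-hour MTTF and the two repair times are hypothetical values.

```python
HOURS_PER_YEAR = 365 * 24


def availability(mttf_hours: float, mttr_hours: float) -> float:
    """System availability = MTTF / (MTTF + MTTR)."""
    if mttf_hours <= 0 or mttr_hours < 0:
        raise ValueError("MTTF must be positive and MTTR non-negative")
    return mttf_hours / (mttf_hours + mttr_hours)


def downtime_hours_per_year(avail: float) -> float:
    """Expected hours per year during which the system is unavailable."""
    return (1.0 - avail) * HOURS_PER_YEAR


# Hypothetical node: fails on average every 1,000 hours; compare two repair times.
for mttr in (8.0, 2.0):
    a = availability(1_000.0, mttr)
    print(f"MTTR = {mttr:4.1f} h -> availability {a:.4%}, "
          f"downtime {downtime_hours_per_year(a):.1f} h/year")
```

Cutting the repair time from 8 hours to 2 hours in this example reduces expected downtime roughly fourfold without making the hardware any more reliable.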


Three Distributed Operating Systems


Transparent Cloud Computing Environment


Parallel and Distributed Programming


Grid Standards and Middleware


Two Methods for Reducing Power Consumption

1. Dynamic Power Management (DPM): hardware devices, such as the CPU, can switch from idle mode to a lower-power mode.

2. Dynamic Voltage-Frequency Scaling (DVFS): energy savings are achieved based on the fact that the dynamic power consumption of CMOS circuits is directly proportional to the clock frequency and to the square of the supply voltage.
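The DVFS relationship can be checked numerically. The sketch below assumes the usual dynamic-power model P = C_eff · V² · f, with leakage ignored; the capacitance, voltages, frequencies, and task size are hypothetical. It also shows why the voltage term matters most: for a fixed amount of work, lowering f stretches the run time, so frequency reduction alone does not reduce energy in this model, while lowering V does.

```python
def dynamic_power(c_eff: float, voltage: float, freq_hz: float) -> float:
    """Dynamic CMOS power in watts: P = C_eff * V**2 * f (leakage ignored)."""
    return c_eff * voltage ** 2 * freq_hz


def task_energy(c_eff: float, voltage: float, freq_hz: float, cycles: float) -> float:
    """Energy in joules to run a task of `cycles` clock cycles at one DVFS setting."""
    runtime_s = cycles / freq_hz
    return dynamic_power(c_eff, voltage, freq_hz) * runtime_s


C_EFF = 1e-9          # farads (hypothetical)
CYCLES = 2e9          # size of the task in clock cycles (hypothetical)
HIGH = (1.2, 2.0e9)   # (volts, hertz) at full speed
LOW = (0.9, 1.0e9)    # scaled-down setting

power_ratio = dynamic_power(C_EFF, *LOW) / dynamic_power(C_EFF, *HIGH)
energy_ratio = task_energy(C_EFF, *LOW, CYCLES) / task_energy(C_EFF, *HIGH, CYCLES)
print(f"power at LOW  = {power_ratio:.3f} x power at HIGH")
print(f"energy at LOW = {energy_ratio:.3f} x energy at HIGH")
```

Here the low setting draws about 28% of the power but, because the task takes twice as long, uses about 56% of the energy; the whole saving comes from the (0.9/1.2)² voltage term.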


Energy Efficiency (DVFS)

Here v, C_eff, K, and v_t are the supply voltage, the circuit switching capacitance, a technology-dependent factor, and the threshold voltage, respectively.
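The slide's equations did not survive extraction. The sketch below assumes the DVFS model commonly paired with these symbols, E = C_eff · f · v² · t with f = K(v − v_t)² / v, where t is the task's execution time at frequency f; treat the model and every numeric value (K, v_t, C_eff, the voltage levels, the task size, and the deadline) as assumptions. It picks the lowest-energy voltage level whose frequency still meets a deadline, which is the basic idea behind deadline-aware DVFS.

```python
def clock_frequency(v: float, k: float, v_t: float) -> float:
    """Assumed model: f = K * (v - v_t)**2 / v, defined only for v > v_t."""
    if v <= v_t:
        raise ValueError("supply voltage must exceed the threshold voltage")
    return k * (v - v_t) ** 2 / v


def energy(c_eff: float, f: float, v: float, t: float) -> float:
    """Assumed model: E = C_eff * f * v**2 * t."""
    return c_eff * f * v ** 2 * t


def best_setting(levels, c_eff, k, v_t, cycles, deadline_s):
    """Return (energy, v, f, t) for the cheapest level that meets the deadline, or None."""
    feasible = []
    for v in levels:
        f = clock_frequency(v, k, v_t)
        t = cycles / f                      # execution time at this frequency
        if t <= deadline_s:
            feasible.append((energy(c_eff, f, v, t), v, f, t))
    return min(feasible) if feasible else None


# Hypothetical parameters for illustration only.
LEVELS = [0.8, 0.9, 1.0, 1.1, 1.2]        # available supply voltages (V)
print(best_setting(LEVELS, c_eff=1e-9, k=5e9, v_t=0.3, cycles=3e9, deadline_s=1.6))
```

Because f · t equals the cycle count, energy in this model scales with v², so the lowest voltage that still meets the deadline wins: here 0.9 V, since 0.8 V would need about 1.92 s.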


System Attacks and Network Threats


Five Reference Books:

1. D. C. Marinescu, Cloud Computing, Morgan Kaufmann Publishers, 2013.
2. K. Hwang, G. Fox, and J. Dongarra, Distributed and Cloud Computing: From Parallel Processing to the Internet of Things, Morgan Kaufmann Publishers, 2011.
3. R. Buyya, J. Broberg, and A. Goscinski (eds.), Cloud Computing: Principles and Paradigms, Wiley Press, USA, February 2011. ISBN-13: 978-0470887998.
4. T. Chou, Introduction to Cloud Computing: Business and Technology, Lecture Notes at Stanford University and at Tsinghua University, Active Book Press, 2010.
5. T. Hey, S. Tansley, and K. Tolle (eds.), The Fourth Paradigm: Data-Intensive Scientific Discovery, Microsoft Research, 2009.