Cloud Computing: Technologies and Opportunities
H. T. Kung
Harvard School of Engineering and Applied Sciences
January 21, 2010 at NTHU
Copyright © 2010 by H. T. Kung
“Cloud Computing” Is Hot, At Least the Term
(This chart and much of the other material presented in this talk is based on literature openly available from Google)

[Chart: Google Trends search-term popularity, 2004–2008, for "cloud computing", "grid computing", and "utility computing"]
Server Racks in Google Data Center
Source: Edward Chang, Google (MMDS 08)
First, Some Definitions
• “Cloud computing” refers to both the applications delivered as services over the Internet and the hardware and systems software in the datacenters that provide those services
• A “cloud” is a specific instance of a cloud computing service, e.g., Amazon’s Elastic Compute Cloud (EC2)
• “Data centers” can hold tens to hundreds of thousands of servers that concurrently support a large number of cloud computing services (e.g., search, email, and utility computing)
Advantages of Cloud Computing
• Utility computing
• Reduced capital expenditure for users
• Resource sharing and scaling
• Simplified IT management
• Device and location independence in access
• Reliability through multiple redundant sites
• Security improvement via centralization
Cloud Computing Can Potentially Offer Some Solutions to Computer Attacks on the Internet
Nine Goals of Computer Utility
1. Convenient remote terminal access as the normal mode of system usage
2. A view of continuous operation analogous to that of the electric power and telephone companies
3. A wide range of capacity to allow growth or contraction without either system or user reorganization
4. An internal file system so reliable that users trust their only copy of programs and data to be stored in it
5. Sufficient control of access to allow selective sharing of information
6. The ability to structure hierarchically both the logical storage of information as well as the administration of the system
7. The capability of serving large and small users without inefficiency to either
8. The ability to support different programming environments and human interfaces within a single system
9. The flexibility and generality of system organization required for evolution through successive waves of technological improvements and the inevitable growth of user expectations
These Nine Goals Were Actually Articulated More Than 40 Years Ago
• The nine goals listed on the preceding slide were excerpts from a 1972 paper:
  – “Multics – The first seven years,” by F. J. Corbato and J. H. Saltzer, MIT, and C. T. Clingen, Honeywell Information Systems
• Thus visions similar to the cloud computing vision have been around for a long time
• It is the incredible scalability of cloud computing implementations in recent years that is new
• Therefore, to understand cloud computing, it is instructive to focus on its enabling technologies, beyond just concept discussions
Traditional Examples of Cloud Computing
This “everything X as a Service” (XaaS) formulation can still be difficult to understand, and is in fact a little boring
So what is really the essence of cloud computing?
The Essence: In Cloud Computing We Program or Run a Cloud as a System
• For example, we program a cloud to allow parallel execution of tasks, or run a cloud to attain load balancing
• Thus a cloud is much more than merely a collection of servers
• More precisely, a cloud takes on a set of tasks expressed in a certain abstraction (“virtual tasks”) and assigns them to physical resources for their execution. These tasks can be, for example:
  – Virtual machines (for IaaS)
  – Virtual services (for SaaS)
  – Virtual platforms (for PaaS)
• The result is that hardware and software in a cloud work in concert to deliver scalable performance
Outline for the Rest of the Presentation
• Challenges
• Enabling Technologies (or Best Current Practices)
• Opportunities
Some Areas of Challenges
• User-experienced performance
• Scalable implementation of services
• Privacy
• Distributed modular deployment for the network edge (to meet future needs)
• Data lock-in
Challenge: Programming a Massive Number of Servers
• Parallel programming, while managing non-uniform inter-node communications with high latency disparity
Challenge: Programming under High Workload Churn
• Because Internet services are still a relatively new field, new products and services frequently emerge (e.g., the YouTube and iPhone phenomena) and fade away (e.g., lately, people have been saying P2P is dying as far as attracting new investment is concerned)
• Google’s front-end Web server binaries are released on a weekly cycle, with nearly a thousand independent code changes checked in by hundreds of developers—the core of Google’s search services has been reimplemented nearly from scratch every 2 to 3 years
Challenge: Managing Power Consumption
• We need to drive servers as well as the high-speed, redundant networks connecting servers, racks, and clusters
• Large data centers are literally both gigantic heaters and air conditioners at the same time
• Due to the high heat concentration, power-hungry heat-exchange mechanisms are needed, consuming about 40% of the total center’s power
Power Consumption Can Be Very Large Just for Data Storage Alone
Consider, e.g., a large-scale email service:
• 140M users
• 10GB storage per user, assuming 10X online redundancy
• 1TB disk drives with 8W standby power consumption (e.g., Seagate ST31000340AS)
• Server power consumption: 10X
• Datacenter power and cooling: 1.4X (chillers/pumps, power distribution, UPS, etc.)

This means 140M × 0.001 × 8 × 10 × 1.4 W = 15.7 megawatts
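The slide's back-of-envelope arithmetic can be reproduced directly. All figures below are the slide's own assumptions (including the 0.001 per-user drive factor exactly as written), not measured data:

```python
# Reproducing the slide's power estimate for a large-scale email service.
users = 140e6                # 140M users
drives_per_user = 0.001      # the slide's per-user share of 1TB drives
disk_watts = 8               # standby power of one 1TB drive (ST31000340AS)
server_multiplier = 10       # server power relative to disk power
facility_multiplier = 1.4    # power distribution and cooling overhead

total_watts = (users * drives_per_user * disk_watts
               * server_multiplier * facility_multiplier)
print(round(total_watts / 1e6, 1), "megawatts")  # 15.7 megawatts
```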
Data Centers Power Consumption:A Very Serious Issue
• In 2007, the U.S. Environmental Protection Agency estimated that data centers use 7 gigawatts of electricity during peak loads
• These data centers consume about 1.5% of the total electricity used on the planet, and this amount is expected to grow rapidly. Environmental agencies worldwide are considering standards to regulate server and data center power
• For the sake of the environment, we will need to rethink our use of data centers and their deployment. By analogy: even if our income allows us to drive big cars, that does not mean we should abuse our use of resources
Challenge: Distributed Cloud Computing at the Network Edge
• Processing at the network edge is advantageous when backhaul networking for reaching back is seriously constrained and/or low-latency responses are essential
• Local enterprise computing may be needed for control and security/privacy reasons
• Sensor data sets may be too large to move
• Heat is dissipated over distributed edge data centers, no longer highly concentrated
• Someone should do a serious calculation of the energy-consumption implications when most desktops are migrated into clouds
[Figure: a central cloud surrounded by multiple edge data centers]
Challenge: Handling Failures
• Disk drives, for example, can exhibit annualized failure rates higher than 4%
• In addition, there are application bugs, operating system bugs, human errors, and the failures of memory, connectors, networking, and power supplies
• Different deployments have reported between 1.2 and 16 average server-level restarts per year
• Component failures are the norm rather than the exception
• Therefore, constant monitoring, error detection, fault tolerance, and automatic recovery must be integral to the system
Challenge: Scheduling
• In parallel computing, a single slower worker can determine the response time of a huge parallel task. To deal with “stragglers,” toward the end of a job we may need to identify such situations and speculatively start redundant workers only for those slower jobs
• We may also need to preempt a running job in order to accommodate higher-priority tasks such as a real-time interactive task. Preemption is actually a major source of system instability
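The straggler effect and the payoff of speculative backups can be sketched with a toy task model; the durations and the backup-launch time below are illustrative, not drawn from any real scheduler:

```python
# Toy model: a task finishes at min(original time, backup launch + backup time).
# Launching backups for every task is equivalent here, since min() ignores
# a backup for any task that already finished before the launch time.
def job_finish_time(task_times, backup_launch, backup_times):
    finish = [min(orig, backup_launch + backup)
              for orig, backup in zip(task_times, backup_times)]
    return max(finish)  # the job ends when its last task ends

# Ten parallel tasks of ~10s each, except one 100s straggler.
times = [10] * 9 + [100]

no_spec = max(times)  # without speculation, the straggler dominates: 100s
# With speculation: at t=12s, start redundant 10s workers for the slow tasks.
with_spec = job_finish_time(times, backup_launch=12, backup_times=[10] * 10)
print(no_spec, with_spec)  # 100 22
```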
Challenge: Migration
• We migrate machines for balancing or consolidating load
  – Note that less loaded servers normally don’t decrease their power proportionally. Thus consolidating their loads onto a single server will save power
• In addition, migration facilitates fault management, load balancing, and low-level system maintenance
• But migration should minimize its disruption to normal operations
• Furthermore, when a server CPU is put to sleep, we may still need to access the data on its local disks (what is your solution?)
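The consolidation argument above rests on servers not being energy-proportional. A minimal numeric sketch, assuming a linear power model (the idle and peak wattages are illustrative):

```python
# Assumed linear model: P(u) = P_idle + (P_peak - P_idle) * utilization.
# The large idle floor is why consolidating load saves power.
P_IDLE, P_PEAK = 200.0, 350.0  # watts; illustrative figures

def server_power(utilization):
    return P_IDLE + (P_PEAK - P_IDLE) * utilization

two_servers = 2 * server_power(0.30)  # two lightly loaded servers
consolidated = server_power(0.60)     # one server takes both loads, one sleeps
print(two_servers, consolidated)      # 490.0 290.0
```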
Challenge: Performance Assurance in Virtualization
• When virtual machines of uncooperative users share the same resources, these virtual machines must be isolated from one another
• In particular, the execution of one virtual machine should not adversely affect the performance of another
Challenge: Data Center Networking
• Layer-3 IP networking is scalable, but cloud-computing services and end-host virtualization cannot be tied to layer-3 IP addresses. This is because we want to allow a virtual machine/service/application to be run on any server
• This means that we need to use a flat layer-2 network to enable any server to be assigned to any service
• However, layer-2 networks are generally not scalable and they do not allow multiple paths required for high-bandwidth traffic patterns
Challenge: Data Lock-in
• Can you imagine the day when you need to move tons of your emails to a new email provider?
• Can you see the strategies of Apple behind its MobileMe service?
• Data lock-in is likely an even more serious issue than application/software lock-in
• Data lock-in over the cloud has become one of the most important control points for many businesses
Outline
• Challenges
• Enabling Technologies (or Best Current Practices)
• Opportunities
Current Practice: High-level Programming Models and Implementation
• Provide a programming model layer to facilitate application development
• The most famous programming model in cloud computing is MapReduce. Thousands of data parallel applications have been programmed in MapReduce
• Run-time implementation is the key. It takes care of scheduling and fault-tolerance for the application
[Diagram: applications (e.g., page ranking) running on top of a programming-model layer (e.g., MapReduce)]
Current Practice: Master-slave Model in MapReduce Execution
The Master server dispatches map and reduce tasks to worker servers, monitors their progress, and reassigns workers when failures occur
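The programming model itself can be illustrated with a single-process toy word count; an actual MapReduce run-time would dispatch these map and reduce tasks from the Master to worker servers and reassign them on failure, as described above:

```python
from collections import defaultdict

def map_phase(documents):
    """Map: emit (word, 1) pairs from each input document."""
    for doc in documents:
        for word in doc.split():
            yield word, 1

def shuffle(pairs):
    """Group intermediate values by key, as the framework would."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(groups):
    """Reduce: sum the counts for each word."""
    return {word: sum(counts) for word, counts in groups.items()}

docs = ["the cloud", "the data center", "the cloud scales"]
counts = reduce_phase(shuffle(map_phase(docs)))
print(counts["the"], counts["cloud"])  # 3 2
```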
Current Practice: Software Infrastructure To Support Programming Model Implementation
• For example, with GFS support, a MapReduce implementation can focus on its Master-based scheduling, leaving system issues, such as file fault tolerance and data locality support, to GFS
• That is, MapReduce can now write files to a "persistent" file system distributed over racks of servers in a data center
[Diagram: applications (e.g., page ranking) on top of programming models (e.g., MapReduce), on top of software infrastructure (e.g., Google File System)]
Current Practice: Google File System (GFS)
• In GFS, every file chunk is replicated three times automatically
• The master server can potentially be a performance bottleneck. In principle, it could be replaced with a set of distributed master servers using, e.g., a distributed hash table (DHT). But this could complicate management due to the lack of centralized knowledge
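One way such a hypothetical set of distributed master servers could partition metadata is consistent hashing on a DHT-style ring. A minimal sketch (the server names and chunk path are illustrative; GFS itself keeps a single master):

```python
import bisect
import hashlib

def h(key):
    """Hash a string to a point on the ring."""
    return int(hashlib.md5(key.encode()).hexdigest(), 16)

class ConsistentHashRing:
    def __init__(self, nodes, vnodes=64):
        # Place each node at many virtual points to balance load.
        self.ring = sorted((h(f"{n}#{i}"), n)
                           for n in nodes for i in range(vnodes))
        self.keys = [k for k, _ in self.ring]

    def lookup(self, chunk_id):
        """Return the metadata server responsible for a chunk."""
        i = bisect.bisect(self.keys, h(chunk_id)) % len(self.ring)
        return self.ring[i][1]

ring = ConsistentHashRing(["master-a", "master-b", "master-c"])
owner = ring.lookup("/some/file/chunk-0007")
# Adding or removing a master remaps only nearby ring segments, but note
# the slide's caveat: no single node now holds the complete metadata picture.
```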
Current Practice: Data Center Network Wiring
• Multi-rooted Tree
Current Practice:Data Center Network Protocols
• We could use virtual layer-2 networks which may actually be implemented with layer-3 or even layer-4 (yes, TCP!) protocols and nodes
• Note that normal layer-3 and layer-4 protocols will be on top of this virtual layer 2 in supporting applications!
[Diagram, top to bottom: applications (e.g., page ranking); layer 4 (TCP/UDP); layer 3 (IP); virtual layer 2, implemented with layers 3 and 4; layer 1 (e.g., Ethernet)]
Current Practice: Virtualization Technologies
• For example, Xen’s virtual machine hypervisor (used by Amazon’s EC2) can multiplex physical resources at the granularity of an entire operating system
• Virtualization allows individual users to run unmodified binaries, or collections of binaries, in a resource controlled fashion (for instance an Apache server along with a PostgreSQL backend)
• Furthermore it provides an extremely high level of flexibility since the user can dynamically create the precise execution environment their software requires. Unfortunate configuration interactions between various services and applications are avoided (for example, each Windows instance maintains its own registry)
Fast Live Migration with Pre-copy
• Pages of memory are iteratively copied from the source machine to the destination host, all without ever stopping the execution of the virtual machine being migrated. Page-level protection hardware is used to ensure a consistent snapshot is transferred, and a rate-adaptive algorithm is used to control the impact of migration traffic on running services
• The final phase pauses the virtual machine, copies any remaining pages to the destination, and resumes execution there
• Migrating entire OS instances on a commodity cluster can incur service downtimes as low as 60ms
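The iterative pre-copy loop described above can be modeled in a few lines. This is a toy convergence model, not Xen's algorithm: the re-dirty fraction and stop threshold are illustrative assumptions:

```python
def live_migrate(total_pages=1024, redirty_fraction=0.05, stop_threshold=16):
    """Copy pages in rounds while the VM runs; pause only for the remainder."""
    dirty = total_pages              # round 1 copies every page
    rounds = 0
    while dirty > stop_threshold:    # keep iterating while the VM executes
        rounds += 1
        # While this round's pages were in flight, the still-running VM
        # re-dirtied a fraction of them; those go into the next round.
        dirty = int(dirty * redirty_fraction)
    # Stop-and-copy: pause the VM, send the few remaining dirty pages,
    # and resume on the destination. Downtime scales with `dirty`.
    return rounds, dirty

rounds, downtime_pages = live_migrate()
print(rounds, downtime_pages)  # 2 2
```

If the workload re-dirties pages faster than they can be copied, the loop never converges; a real system caps the number of rounds and accepts a larger stop-and-copy phase.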
Current Practice: Container-based Datacenter
• Placing the server racks (thousands of servers) into a standard shipping container and integrating heat exchange and power distribution into the container
• Air handling is similar to in-rack cooling and typically allows higher power densities than regular raised-floor datacenters
• The container-based facility has achieved extremely high energy efficiency ratings compared with typical datacenters today
Microsoft Data Center Near Chicago(9/30/2009)
Source: http://www.datacenterknowledge.com/archives/2009/09/30/microsoft-unveils-its-container-powered-cloud
Packaging Related Issues
• A container-based datacenter will likely be prefabricated in a factory. It is rather difficult, if not impossible, for humans to service it once it is deployed in the field, due to operational and space constraints
  – A “data center in a shipping container” is analogous to a “system on a chip” built with low-power transistors which may fail
• Suppose we want to stack 4x4x4 units together. How should we network them together?
Opportunities in Using a Wireless Network as a Backplane
• Ubiquity of wireless networking
  – Wireless LANs such as 802.11 (aka “Wi-Fi”) have become widely available on mobile devices
• Inherent advantages of the wireless medium
  – No wires! So it can support flexible and rapid deployment
  – Convenience in broadcast/multicast
Use of Wireless Connections
• By definition, edge computing is more ad-hoc and less enterprise; wireless networking will give us the required flexibility
• Potentially, by using wireless, we can get rid of most of the wires at the bottom of the interconnection hierarchy without significant performance impact
[Figure: a rack interconnection hierarchy whose bottom-level wiring “may use wireless instead”]
Related Work at Harvard for “Wireless Computing at the Edge”
• We are developing two systems
  – Wireless Ad-Hoc File System (AHFS)
  – TCP over multiple coded wireless links
• These systems incorporate advanced technologies such as localization, clustering, network coding, and geographic routing
Wireless MapReduce Implementation
• We believe that wireless broadcast is natural in distributing data to multiple map worker nodes, whereas wireless remote procedure calls can efficiently facilitate communications for reduce workers
• As a proof of concept, we have a preliminary MapReduce implementation of the distributed speaker ID application we built in the past
Testbeds to Support Wireless Networking Research at Harvard
[Photos: indoor testbed at Harvard (Maxwell Dworkin and Pierce buildings); outdoor testbed (“cloud computing in the air”!) with four MIDs on an airplane]
Two Architecture Primitives of AHFS
• Cluster-oriented file operation
  – Nodes in a cluster can talk to each other well, so they can provide file redundancy
  – Model use: “Write a file into a cluster”
  – A file is associated with a cluster of nodes, where clients read from/write to the file. That is, the cluster is a “rendezvous” point for users of the file
• Location-oriented file operation
  – Put files in the proximity of their expected users in order to minimize transmission distance
  – Model use: “An airplane writes a file to ground nodes at location X”
TCP over Multiple Network-coded Wireless Links: Exploiting Space-Time Redundancy

[Figure: source S sends a generation of four packets to destination D over three transmitter/receiver pairs (TX-1/RX-1, TX-2/RX-2, TX-3/RX-3); across time and space, the three links carry twelve rate-1/2 network-coded packets ([E1]–[E4], [F1]–[F4], [G1]–[G4])]

• A generation of four packets is encoded into twelve network-coded packets
• It is sufficient for D to receive just four of the twelve packets
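The "any four of twelve" property comes from linear coding over a finite field. A minimal sketch over GF(2); the twelve coefficient vectors are fixed here for clarity, whereas a real encoder would typically draw them at random, often over GF(256):

```python
GEN = 4  # generation size: four source packets

COEFFS = [
    0b0001, 0b0010, 0b0100, 0b1000,  # path E: the sources themselves
    0b0011, 0b0110, 0b1100, 0b1001,  # path F: pairwise XOR mixtures
    0b0111, 0b1110, 0b1011, 0b1111,  # path G: heavier mixtures
]

def encode(packets):
    """Each coded packet: (coefficient bits, XOR of the selected sources)."""
    coded = []
    for coeffs in COEFFS:
        payload = 0
        for i in range(GEN):
            if coeffs >> i & 1:
                payload ^= packets[i]
        coded.append((coeffs, payload))
    return coded

def decode(received):
    """Gaussian elimination over GF(2); needs GEN independent packets."""
    basis = {}  # pivot bit -> (coeffs, payload); pivot = highest set bit
    for coeffs, payload in received:
        for piv in sorted(basis, reverse=True):   # clear high pivots first
            if coeffs >> piv & 1:
                coeffs ^= basis[piv][0]
                payload ^= basis[piv][1]
        if coeffs:  # linearly independent of everything seen so far
            basis[coeffs.bit_length() - 1] = (coeffs, payload)
    if len(basis) < GEN:
        return None  # not enough independent packets yet
    for piv in sorted(basis):  # back-substitute to isolate each source
        coeffs, payload = basis[piv]
        for low in sorted(basis):
            if low < piv and coeffs >> low & 1:
                coeffs ^= basis[low][0]
                payload ^= basis[low][1]
        basis[piv] = (coeffs, payload)
    return [basis[i][1] for i in range(GEN)]

sources = [0xAA, 0xBB, 0xCC, 0xDD]  # one generation of four packets
coded = encode(sources)             # twelve coded packets over three paths
# Four linearly independent survivors recover the whole generation,
# even though eight of the twelve packets were lost:
survivors = [coded[0], coded[4], coded[8], coded[11]]
print(decode(survivors) == sources)  # True
```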
Outline
• Challenges
• Enabling Technologies (or Best Current Practices)
• Opportunities
General Comments on Opportunities
• Cloud computing is here to stay – one of the most efficient ways to do processing on a large scale
  – E.g., a driver of further power-efficiency development
• Clouds are expanding into new domains
  – E.g., the network edge: smart phones, set-top boxes, netbooks
• Cloud computing drives demand for ever-increasing bandwidth and access flexibility
  – Data center network infrastructure
  – Wireless networks to provide cloud access
  – Faster storage technologies such as flash
• It is not unlike the dot-com period: many business models and applications are being proposed, and it will likely take several years to sort through these ideas
Opportunities (1/2)
1. Cloud end-devices with services/applications/data from the cloud, while being able to use resources in local environments (e.g., TVs and desktops) via, e.g., 300 Mbps Wi-Fi wireless links
2. Fabric computing rather than traditional rack-based blade servers (e.g., network- or switch-centric servers rather than CPU-centric servers for better power management and space use)
3. Programming models beyond MapReduce, e.g., synchronized message passing
Opportunities (2/2)
4. GPU-based clouds for large scientific computing to complement x86-based multicore clouds
5. Cloud services capable of making use of private storage. For example, run Google Docs on Pogoplug servers at home
6. Fault-tolerant file systems for flash storage under $$ that can survive faulty blocks present in flash storage (this is different from the usual wear-leveling file system)
7. Cloud Computing for Everything (cc-for-x) where x can be e-commerce, healthcare, sensor networks, mobile phones, etc.
Conclusion (1/2)
1. Today’s cloud computing results from decades of technology advances in areas such as server CPUs, operating systems, programming models, networks, fault-tolerant software, virtualization, data center management, power management, etc.
2. In fact, similar visions were actually known some 40 years ago, e.g., from Project MAC at MIT
3. It is the implementation that has made the difference. In particular, it is the highly scalable technologies developed in recent years that have suddenly made cloud computing a hot area