Cloud Computing: Technologies and Opportunities
H. T. Kung
Harvard School of Engineering and Applied Sciences
January 21, 2010 at NTHU
Copyright © 2010 by H. T. Kung
“Cloud Computing” Is Hot, At Least the Term
(This chart and much of the other material presented in this talk is based on literature openly available from Google)

[Chart: Google Trends search-term popularity, 2004–2008, for "cloud computing", "grid computing", and "utility computing"]
Server Racks in Google Data Center
Source: Edward Chang, Google (MMDS 08)
First, Some Definitions
• “Cloud computing” refers to both the applications delivered as services over the Internet and the hardware and systems software in the datacenters that provide those services
• A “cloud” is a specific instance of a cloud computing service, e.g., Amazon’s Elastic Compute Cloud (EC2)
• “Data centers” can hold tens to hundreds of thousands of servers that concurrently support a large number of cloud computing services (e.g., search, email, and utility computing)
Advantages of Cloud Computing
• Utility computing
• Reduced capital expenditure for users
• Resource sharing and scaling
• Simplified IT management
• Device and location independence in access
• Reliability through multiple redundant sites
• Security improvement via centralization
Cloud Computing Can Potentially Offer Some Solutions to Computer Attacks on the Internet
Nine Goals of Computer Utility
1. Convenient remote terminal access as the normal mode of system usage
2. A view of continuous operation analogous to that of the electric power and telephone companies
3. A wide range of capacity to allow growth or contraction without either system or user reorganization
4. An internal file system so reliable that users trust their only copy of programs and data to be stored in it
5. Sufficient control of access to allow selective sharing of information
6. The ability to structure hierarchically both the logical storage of information as well as the administration of the system
7. The capability of serving large and small users without inefficiency to either
8. The ability to support different programming environments and human interfaces within a single system
9. The flexibility and generality of system organization required for evolution through successive waves of technological improvements and the inevitable growth of user expectations
These Nine Goals Were Actually Articulated More Than 40 Years Ago
• The nine goals listed on the preceding slide were excerpts from a 1972 paper:
  – “Multics – The first seven years,” by F. J. Corbato and J. H. Saltzer, MIT, and C. T. Clingen, Honeywell Information Systems
• Thus visions similar to the cloud computing vision have been around for a long time
• It is the incredible scalability of cloud computing implementations in recent years that is new
• Therefore, to understand cloud computing, it is instructive to focus on its enabling technologies, beyond just concept discussions
Traditional Examples of Cloud Computing
This “everything X as a Service” (XaaS) formulation can still be difficult to understand, and is in fact a little boring
So what is really the essence of cloud computing?
The Essence: In Cloud Computing We Program or Run a Cloud as a System
• For example, we program a cloud to allow parallel execution of tasks, or run a cloud to attain load balancing
• Thus a cloud is much more than merely a collection of servers
• More precisely, a cloud takes on a set of tasks expressed in a certain abstraction (“virtual tasks”) and assigns them to physical resources for their execution. These tasks can be, for example:
  – Virtual machines (for IaaS)
  – Virtual services (for SaaS)
  – Virtual platforms (for PaaS)
• The result is that hardware and software in a cloud work in concert to deliver scalable performance
Outline for the Rest of the Presentation
• Challenges
• Enabling Technologies (or Best Current Practices)
• Opportunities
Some Areas of Challenges
• User-experienced performance
• Scalable implementation of services
• Privacy
• Distributed modular deployment for the network edge (to meet future needs)
• Data lock-in
Challenge: Programming a Massive Number of Servers
• Parallel programming, while managing non-uniform inter-node communications with high latency disparity
Challenge: Programming under High Workload Churn
• Because Internet services are still a relatively new field, new products and services frequently emerge (e.g., the YouTube and iPhone phenomena) and fade away (e.g., lately, people have been saying P2P is dying as far as attracting new investment is concerned)
• Google’s front-end Web server binaries are released on a weekly cycle, with nearly a thousand independent code changes checked in by hundreds of developers—the core of Google’s search services has been reimplemented nearly from scratch every 2 to 3 years
Challenge: Managing Power Consumption
• We need to drive servers as well as the high-speed, redundant networks connecting servers, racks, and clusters
• Large data centers are literally both gigantic heaters and air conditioners at the same time
• Due to the high heat concentration, power-hungry heat-exchange mechanisms are needed, consuming about 40% of the total center’s power
Power Consumption Can Be Very Large Just for Data Storage Alone
Consider, e.g., a large-scale email service:
• 140M users
• 10GB storage per user, assuming 10X online redundancy
• 1TB disk drives with 8W standby power consumption (e.g., Seagate ST31000340AS)
• Server power consumption: 10X
• Datacenter power and cooling: 1.4X (chillers/pumps, power distribution, UPS, etc.)

This means 140M × 0.001 × 8 × 10 × 1.4 W = 15.7 megawatts
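The slide's back-of-envelope arithmetic can be reproduced directly. All figures below are the slide's own assumptions (including the 0.001 per-user drive factor exactly as written), not measured data:

```python
# Reproducing the slide's power estimate for a large-scale email service.
users = 140e6                # 140M users
drives_per_user = 0.001      # the slide's per-user share of 1TB drives
disk_watts = 8               # standby power of one 1TB drive (ST31000340AS)
server_multiplier = 10       # server power relative to disk power
facility_multiplier = 1.4    # power distribution and cooling overhead

total_watts = (users * drives_per_user * disk_watts
               * server_multiplier * facility_multiplier)
print(round(total_watts / 1e6, 1), "megawatts")  # 15.7 megawatts
```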
Data Centers Power Consumption:A Very Serious Issue
• In 2007, the U.S. Environmental Protection Agency estimated that data centers use 7 gigawatts of electricity during peak loads
• These data centers consume about 1.5% of the total electricity used on the planet, and this amount is expected to grow rapidly. Environmental agencies worldwide are considering standards to regulate server and data center power
• For the sake of the environment, we will need to rethink our use of data centers and their deployment. By analogy: even if our income allows us to drive big cars, that does not mean we should abuse our use of resources
Challenge: Distributed Cloud Computing at the Network Edge
• Processing at the network edge is advantageous when backhaul networking for reaching back is seriously constrained and/or low-latency responses are essential
• Local enterprise computing may be needed for control and security/privacy reasons
• Sensor data sets may be too large to move
• Heat is dissipated over distributed edge data centers, no longer highly concentrated
• Someone should do a serious calculation of the energy-consumption implications when most desktops are migrated into clouds
[Figure: a central cloud surrounded by multiple edge data centers]
Challenge: Handling Failures
• Disk drives, for example, can exhibit annualized failure rates higher than 4%
• In addition, there are application bugs, operating system bugs, human errors, and the failures of memory, connectors, networking, and power supplies
• Different deployments have reported between 1.2 and 16 average server-level restarts per year
• Component failures are the norm rather than the exception
• Therefore, constant monitoring, error detection, fault tolerance, and automatic recovery must be integral to the system
Challenge: Scheduling
• In parallel computing, a single slower worker can determine the response time of a huge parallel task. To deal with “stragglers,” toward the end of a job we may need to identify such situations and speculatively start redundant workers only for those slower jobs
• We may also need to preempt a running job in order to accommodate higher-priority tasks such as a real-time interactive task. Preemption is actually a major source of system instability
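The straggler effect and the payoff of speculative backups can be sketched with a toy task model; the durations and the backup-launch time below are illustrative, not drawn from any real scheduler:

```python
# Toy model: a task finishes at min(original time, backup launch + backup time).
# Launching backups for every task is equivalent here, since min() ignores
# a backup for any task that already finished before the launch time.
def job_finish_time(task_times, backup_launch, backup_times):
    finish = [min(orig, backup_launch + backup)
              for orig, backup in zip(task_times, backup_times)]
    return max(finish)  # the job ends when its last task ends

# Ten parallel tasks of ~10s each, except one 100s straggler.
times = [10] * 9 + [100]

no_spec = max(times)  # without speculation, the straggler dominates: 100s
# With speculation: at t=12s, start redundant 10s workers for the slow tasks.
with_spec = job_finish_time(times, backup_launch=12, backup_times=[10] * 10)
print(no_spec, with_spec)  # 100 22
```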
Challenge: Migration
• We migrate machines for balancing or consolidating load
  – Note that less loaded servers normally don’t decrease their power proportionally. Thus consolidating their loads onto a single server will save power
• In addition, migration facilitates fault management, load balancing, and low-level system maintenance
• But migration should minimize its disruption to normal operations
• Furthermore, when a server CPU is put to sleep, we may still need to access the data on its local disks (what is your solution?)
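The consolidation argument above rests on servers not being energy-proportional. A minimal numeric sketch, assuming a linear power model (the idle and peak wattages are illustrative):

```python
# Assumed linear model: P(u) = P_idle + (P_peak - P_idle) * utilization.
# The large idle floor is why consolidating load saves power.
P_IDLE, P_PEAK = 200.0, 350.0  # watts; illustrative figures

def server_power(utilization):
    return P_IDLE + (P_PEAK - P_IDLE) * utilization

two_servers = 2 * server_power(0.30)  # two lightly loaded servers
consolidated = server_power(0.60)     # one server takes both loads, one sleeps
print(two_servers, consolidated)      # 490.0 290.0
```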
Challenge: Performance Assurance in Virtualization
• When virtual machines of uncooperative users share the same resources, these virtual machines must be isolated from one another
• In particular, the execution of one virtual machine should not adversely affect the performance of another
Challenge: Data Center Networking
• Layer-3 IP networking is scalable, but cloud-computing services and end-host virtualization cannot be tied to layer-3 IP addresses. This is because we want to allow a virtual machine/service/application to be run on any server
• This means that we need to use a flat layer-2 network to enable any server to be assigned to any service
• However, layer-2 networks are generally not scalable and they do not allow multiple paths required for high-bandwidth traffic patterns
Challenge: Data Lock-in
• Can you imagine the day when you need to move tons of your emails to a new email provider?
• Can you see the strategies of Apple behind its MobileMe service?
• Data lock-in is likely an even more serious issue than application/software lock-in
• Data lock-in over the cloud has become one of the most important control points for many businesses
Outline
• Challenges
• Enabling Technologies (or Best Current Practices)
• Opportunities
Current Practice: High-level Programming Models and Implementation
• Provide a programming model layer to facilitate application development
• The most famous programming model in cloud computing is MapReduce. Thousands of data parallel applications have been programmed in MapReduce
• Run-time implementation is the key. It takes care of scheduling and fault-tolerance for the application
[Diagram: applications (e.g., page ranking) running on top of a programming-model layer (e.g., MapReduce)]
Current Practice: Master-slave Model in MapReduce Execution
The Master server dispatches map and reduce tasks to worker servers, monitors their progress, and reassigns workers when failures occur
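The programming model itself can be illustrated with a single-process toy word count; an actual MapReduce run-time would dispatch these map and reduce tasks from the Master to worker servers and reassign them on failure, as described above:

```python
from collections import defaultdict

def map_phase(documents):
    """Map: emit (word, 1) pairs from each input document."""
    for doc in documents:
        for word in doc.split():
            yield word, 1

def shuffle(pairs):
    """Group intermediate values by key, as the framework would."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(groups):
    """Reduce: sum the counts for each word."""
    return {word: sum(counts) for word, counts in groups.items()}

docs = ["the cloud", "the data center", "the cloud scales"]
counts = reduce_phase(shuffle(map_phase(docs)))
print(counts["the"], counts["cloud"])  # 3 2
```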
Current Practice: Software Infrastructure To Support Programming Model Implementation
• For example, with GFS support, a MapReduce implementation can focus on its Master-based scheduling, leaving system issues, such as file fault tolerance and data locality support, to GFS
• That is, MapReduce can now write files to a "persistent" file system distributed over racks of servers in a data center
[Diagram: applications (e.g., page ranking) on top of programming models (e.g., MapReduce), on top of software infrastructure (e.g., Google File System)]
Current Practice: Google File System (GFS)
• In GFS, every file chunk is replicated three times automatically
• The master server can potentially be a performance bottleneck. In principle, it could be replaced with a set of distributed master servers using, e.g., a distributed hash table (DHT). But this could complicate management due to the lack of centralized knowledge
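One way such a hypothetical set of distributed master servers could partition metadata is consistent hashing on a DHT-style ring. A minimal sketch (the server names and chunk path are illustrative; GFS itself keeps a single master):

```python
import bisect
import hashlib

def h(key):
    """Hash a string to a point on the ring."""
    return int(hashlib.md5(key.encode()).hexdigest(), 16)

class ConsistentHashRing:
    def __init__(self, nodes, vnodes=64):
        # Place each node at many virtual points to balance load.
        self.ring = sorted((h(f"{n}#{i}"), n)
                           for n in nodes for i in range(vnodes))
        self.keys = [k for k, _ in self.ring]

    def lookup(self, chunk_id):
        """Return the metadata server responsible for a chunk."""
        i = bisect.bisect(self.keys, h(chunk_id)) % len(self.ring)
        return self.ring[i][1]

ring = ConsistentHashRing(["master-a", "master-b", "master-c"])
owner = ring.lookup("/some/file/chunk-0007")
# Adding or removing a master remaps only nearby ring segments, but note
# the slide's caveat: no single node now holds the complete metadata picture.
```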
Current Practice: Data Center Network Wiring
• Multi-rooted Tree
Current Practice:Data Center Network Protocols
• We could use virtual layer-2 networks which may actually be implemented with layer-3 or even layer-4 (yes, TCP!) protocols and nodes
• Note that normal layer-3 and layer-4 protocols will be on top of this virtual layer 2 in supporting applications!
[Diagram, top to bottom: applications (e.g., page ranking); layer 4 (TCP/UDP); layer 3 (IP); virtual layer 2, implemented with layers 3 and 4; layer 1 (e.g., Ethernet)]
Current Practice: Virtualization Technologies
• For example, Xen’s virtual machine hypervisor (used by Amazon’s EC2) can multiplex physical resources at the granularity of an entire operating system
• Virtualization allows individual users to run unmodified binaries, or collections of binaries, in a resource controlled fashion (for instance an Apache server along with a PostgreSQL backend)
• Furthermore it provides an extremely high level of flexibility since the user can dynamically create the precise execution environment their software requires. Unfortunate configuration interactions between various services and applications are avoided (for example, each Windows instance maintains its own registry)
Fast Live Migration with Pre-copy
• Pages of memory are iteratively copied from the source machine to the destination host, all without ever stopping the execution of the virtual machine being migrated. Page-level protection hardware is used to ensure a consistent snapshot is transferred, and a rate-adaptive algorithm is used to control the impact of migration traffic on running services
• The final phase pauses the virtual machine, copies any remaining pages to the destination, and resumes execution there
• Migrating entire OS instances on a commodity cluster can incur service downtimes as low as 60ms
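The iterative pre-copy loop described above can be modeled in a few lines. This is a toy convergence model, not Xen's algorithm: the re-dirty fraction and stop threshold are illustrative assumptions:

```python
def live_migrate(total_pages=1024, redirty_fraction=0.05, stop_threshold=16):
    """Copy pages in rounds while the VM runs; pause only for the remainder."""
    dirty = total_pages              # round 1 copies every page
    rounds = 0
    while dirty > stop_threshold:    # keep iterating while the VM executes
        rounds += 1
        # While this round's pages were in flight, the still-running VM
        # re-dirtied a fraction of them; those go into the next round.
        dirty = int(dirty * redirty_fraction)
    # Stop-and-copy: pause the VM, send the few remaining dirty pages,
    # and resume on the destination. Downtime scales with `dirty`.
    return rounds, dirty

rounds, downtime_pages = live_migrate()
print(rounds, downtime_pages)  # 2 2
```

If the workload re-dirties pages faster than they can be copied, the loop never converges; a real system caps the number of rounds and accepts a larger stop-and-copy phase.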
Current Practice: Container-based Datacenter
• Placing the server racks (thousands of servers) into a standard shipping container and integrating heat exchange and power distribution into the container
• Air handling is similar to in-rack cooling and typically allows higher power densities than regular raised-floor datacenters
• The container-based facility has achieved extremely high energy efficiency ratings compared with typical datacenters today
Microsoft Data Center Near Chicago(9/30/2009)
Source: http://www.datacenterknowledge.com/archives/2009/09/30/microsoft-unveils-its-container-powered-cloud
Packaging Related Issues
• A container-based datacenter will likely be prefabricated in a factory. It is rather difficult, if not impossible, for humans to service it once it is deployed in the field, due to operational and space constraints
  – A “data center in a shipping container” is analogous to a “system on a chip” built with low-power transistors which may fail
• Suppose we want to stack 4x4x4 units together. How should we network them together?
Opportunities in Using a Wireless Network as a Backplane
• Ubiquity of wireless networking
  – Wireless LANs such as 802.11 (aka “Wi-Fi”) have become widely available on mobile devices
• Inherent advantages of the wireless medium
  – No wires! So it can support flexible and rapid deployment
  – Convenience in broadcast/multicast
Use of Wireless Connections
• By definition, edge computing is more ad-hoc and less enterprise; wireless networking will give us the required flexibility
• Potentially, by using wireless, we can get rid of most of the wires at the bottom of the interconnection hierarchy without significant performance impact
[Figure: a rack interconnection hierarchy whose bottom-level wiring “may use wireless instead”]
Related Work at Harvard for “Wireless Computing at the Edge”
• We are developing two systems
  – Wireless Ad-Hoc File System (AHFS)
  – TCP over multiple coded wireless links
• These systems incorporate advanced technologies such as localization, clustering, network coding, and geographic routing
Wireless MapReduce Implementation
• We believe that wireless broadcast is natural in distributing data to multiple map worker nodes, whereas wireless remote procedure calls can efficiently facilitate communications for reduce workers
• As a proof of concept, we have a preliminary MapReduce implementation of the distributed speaker ID application we built in the past
Testbeds to Support Wireless Networking Research at Harvard
[Photos: indoor testbed at Harvard (Maxwell Dworkin and Pierce buildings); outdoor testbed (“cloud computing in the air”!) with four MIDs on an airplane]
Two Architecture Primitives of AHFS
• Cluster-oriented file operation
  – Nodes in a cluster can talk to each other well, so they can provide file redundancy
  – Model use: “Write a file into a cluster”
  – A file is associated with a cluster of nodes, where clients read from/write to the file. That is, the cluster is a “rendezvous” point for users of the file
• Location-oriented file operation
  – Put files in the proximity of their expected users in order to minimize transmission distance
  – Model use: “An airplane writes a file to ground nodes at location X”
TCP over Multiple Network-coded Wireless Links: Exploiting Space-Time Redundancy

[Figure: source S sends a generation of four packets to destination D over three transmitter/receiver pairs (TX-1/RX-1, TX-2/RX-2, TX-3/RX-3); across time and space, the three links carry twelve rate-1/2 network-coded packets ([E1]–[E4], [F1]–[F4], [G1]–[G4])]

• A generation of four packets is encoded into twelve network-coded packets
• It is sufficient for D to receive just four of the twelve packets
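The "any four of twelve" property comes from linear coding over a finite field. A minimal sketch over GF(2); the twelve coefficient vectors are fixed here for clarity, whereas a real encoder would typically draw them at random, often over GF(256):

```python
GEN = 4  # generation size: four source packets

COEFFS = [
    0b0001, 0b0010, 0b0100, 0b1000,  # path E: the sources themselves
    0b0011, 0b0110, 0b1100, 0b1001,  # path F: pairwise XOR mixtures
    0b0111, 0b1110, 0b1011, 0b1111,  # path G: heavier mixtures
]

def encode(packets):
    """Each coded packet: (coefficient bits, XOR of the selected sources)."""
    coded = []
    for coeffs in COEFFS:
        payload = 0
        for i in range(GEN):
            if coeffs >> i & 1:
                payload ^= packets[i]
        coded.append((coeffs, payload))
    return coded

def decode(received):
    """Gaussian elimination over GF(2); needs GEN independent packets."""
    basis = {}  # pivot bit -> (coeffs, payload); pivot = highest set bit
    for coeffs, payload in received:
        for piv in sorted(basis, reverse=True):   # clear high pivots first
            if coeffs >> piv & 1:
                coeffs ^= basis[piv][0]
                payload ^= basis[piv][1]
        if coeffs:  # linearly independent of everything seen so far
            basis[coeffs.bit_length() - 1] = (coeffs, payload)
    if len(basis) < GEN:
        return None  # not enough independent packets yet
    for piv in sorted(basis):  # back-substitute to isolate each source
        coeffs, payload = basis[piv]
        for low in sorted(basis):
            if low < piv and coeffs >> low & 1:
                coeffs ^= basis[low][0]
                payload ^= basis[low][1]
        basis[piv] = (coeffs, payload)
    return [basis[i][1] for i in range(GEN)]

sources = [0xAA, 0xBB, 0xCC, 0xDD]  # one generation of four packets
coded = encode(sources)             # twelve coded packets over three paths
# Four linearly independent survivors recover the whole generation,
# even though eight of the twelve packets were lost:
survivors = [coded[0], coded[4], coded[8], coded[11]]
print(decode(survivors) == sources)  # True
```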
Outline
• Challenges
• Enabling Technologies (or Best Current Practices)
• Opportunities
General Comments on Opportunities
• Cloud computing is here to stay – one of the most efficient ways to do processing on a large scale
  – E.g., a driver of further power-efficiency development
• Clouds are expanding into new domains
  – E.g., the network edge: smart phones, set-top boxes, netbooks
• Cloud computing drives demand for ever-increasing bandwidth and access flexibility
  – Data center network infrastructure
  – Wireless networks to provide cloud access
  – Faster storage technologies such as flash
• It is not unlike the dot-com period: many business models and applications are being proposed, and it will likely take several years to sort through these ideas
Opportunities (1/2)
1. Cloud end-devices with services/applications/data from the cloud, while being able to use resources in local environments (e.g., TVs and desktops) via, e.g., 300 Mbps Wi-Fi wireless links
2. Fabric computing rather than traditional rack-based blade servers (e.g., network- or switch-centric servers rather than CPU-centric servers for better power management and space use)
3. Programming models beyond MapReduce, e.g., synchronized message passing
Opportunities (2/2)
4. GPU-based clouds for large scientific computing to complement x86-based multicore clouds
5. Cloud services capable of making use of private storage. For example, run Google Docs on Pogoplug servers at home
6. Fault-tolerant file systems for flash storage under $$ that can survive faulty blocks present in flash storage (this is different from the usual wear-leveling file system)
7. Cloud Computing for Everything (cc-for-x) where x can be e-commerce, healthcare, sensor networks, mobile phones, etc.
Conclusion (1/2)
1. Today’s cloud computing results from decades of technology advances in areas such as server CPUs, operating systems, programming models, networks, fault-tolerant software, virtualization, data center management, power management, etc.
2. In fact, similar visions were actually known some 40 years ago, e.g., from Project MAC at MIT
3. It is the implementation that has made the difference. In particular, it is the highly scalable technologies developed in recent years that have suddenly made cloud computing a hot area