Rethinking the cloud_-_limitations_and_oppotunities_-_2011_nexcom

Woohyun Kim Cloud Platform Team S-Core 2011-05-18 Rethinking the Cloud - A View of Virtualization, Storage, Network, and Platform

description

This was presented on the subject "Rethinking the Cloud" at NEXCOM 2011. The presentation includes cloud business and strategy, and covers overall cloud technologies such as virtualization, storage, network, and cloud platform.

Transcript of Rethinking the cloud_-_limitations_and_oppotunities_-_2011_nexcom

Page 1: Rethinking the cloud_-_limitations_and_oppotunities_-_2011_nexcom

Woohyun Kim

Cloud Platform Team

S-Core

2011-05-18

Rethinking the Cloud - A View of Virtualization, Storage, Network, and Platform

Page 2: Rethinking the cloud_-_limitations_and_oppotunities_-_2011_nexcom

Cloud Success Stories

Page 3: Rethinking the cloud_-_limitations_and_oppotunities_-_2011_nexcom

What is the Cloud?

• A computing environment that elastically provides virtualized resources as a service over the Internet in a pay-as-you-go manner

Page 4: Rethinking the cloud_-_limitations_and_oppotunities_-_2011_nexcom

Amazon’s Challenge and Paradigm Shift

Page 5: Rethinking the cloud_-_limitations_and_oppotunities_-_2011_nexcom

Success Cases in Amazon

SmugMug (http://www.smugmug.com/)
• an online photo storage application that stores more than half a petabyte of data on S3
• estimates cost savings on service and storage to be close to $1 million

37Signals (http://37signals.com/)
• maker of the popular online project-management software Basecamp; uses S3 for its storage needs

New York Times (http://www.nytimes.com)
• used hundreds of EC2 instances to process terabytes of archival data within 36 hours

Animoto (http://animoto.com/)
• an online presentation video generator that needs large amounts of computing power for video processing
• withstood a surge in Web traffic that would kill most companies’ systems by quickly scaling up its processing power using EC2 with RightScale
• ramped from 25,000 users to 250,000 users in three days, signing up 20,000 new users per hour at peak
• using RightScale, its EC2 instances automatically scaled out from 40 to 4,000 during that time
• for more detail, refer to http://blog.rightscale.com/2008/04/23/animoto-facebook-scale-up/

Page 6: Rethinking the cloud_-_limitations_and_oppotunities_-_2011_nexcom

Start-up Company: Powerset

• Powerset had a great idea: “Natural Language Search”
• It had to index millions of pages of data and content
• They knew that this would require a massive datacenter and extensive computing power: CPUs, terminal switches, cables, racks, datacenters, hosting, power, maintenance, staff
• But they needed to keep infrastructure costs to a minimum

“By using Amazon EC2, Powerset is able to match the infrastructure of large scale search companies on a startup budget.” - Barney Pell, Founder and CEO of Powerset

“Amazon EC2 is a complete game-changer. EC2 and Amazon Web Services make it easy for start-ups to build a complete infrastructure without having to spend much on capital.” - Paul Hammann

Page 7: Rethinking the cloud_-_limitations_and_oppotunities_-_2011_nexcom

$100 million

Page 8: Rethinking the cloud_-_limitations_and_oppotunities_-_2011_nexcom

Temporary & Data-intensive: The New York Times

• The New York Times is a 150-year-old company and serves the largest newspaper website, NYTimes.com
  – 1 billion page views per month
  – 20+ million monthly unique visitors
• They tried to convert TIFF images to PDFs
  – TIFF images (405,000)
  – Articles (3.3 million) in SGML
  – XML files (405,000) mapping articles to TIFFs
  – PNG images (810,000)
  – JavaScript files (405,000)

“I got access to a few more machines and churned through all 11 million articles in just under 24 hours using 100 EC2 instances, and generated another 1.5TB of data to store in S3. It just costs $3000.” - Derek Gottfrid

“I had was this: upload 4TB of source data into S3, write some code that would run on numerous EC2 instances to read the source data, create PDFs, and store the results back into S3. S3 would then be used to serve the PDFs to the general public.” - Derek Gottfrid

Page 10: Rethinking the cloud_-_limitations_and_oppotunities_-_2011_nexcom

₩ 64,865,381,400

Page 11: Rethinking the cloud_-_limitations_and_oppotunities_-_2011_nexcom

Cloud Skepticism

Page 12: Rethinking the cloud_-_limitations_and_oppotunities_-_2011_nexcom

The Cloud is Falling

Amazon’s cloud outages receive a lot of exposure …

• April 21 ~ 22, 2011: A networking glitch made storage volumes automatically create backups of themselves, filling up storage capacity and causing connectivity issues; the outage lasted two days
  – Amazon’s customers include start-ups like the social networking site Foursquare, but also big companies like Pfizer, Netflix, and Nasdaq
  – Affected web sites included Quora.com, Reddit.com, GroupMe.com, and Scvngr.com
• July 20, 2008: Failure due to stranded zombies; lasted 5 hours
• Feb 15, 2008: Authentication overload led to a two-hour service outage
• October 2007: Service failure lasted two days
• October 2006: Security breach where users could see other users’ data

… and their current SLAs don’t match those (99.99%) of enterprises

• Amazon EC2: 99.95%
• Amazon S3: 99.9%
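To make the gap between these SLA figures concrete, the percentages can be converted into permitted downtime per year. The sketch below is only arithmetic on the three numbers quoted on this slide; it assumes a 365-day year and ignores how a real SLA defines its measurement window or exclusions.

```python
HOURS_PER_YEAR = 365 * 24  # 8,760 hours; leap years ignored (assumption)


def allowed_downtime_hours(availability_percent: float) -> float:
    """Return the yearly downtime, in hours, that an availability level permits."""
    return HOURS_PER_YEAR * (1 - availability_percent / 100)


# The three availability levels quoted on the slide.
for name, sla in [("Enterprise target", 99.99),
                  ("Amazon EC2", 99.95),
                  ("Amazon S3", 99.9)]:
    print(f"{name}: {sla}% -> {allowed_downtime_hours(sla):.2f} hours/year")
```

Under these assumptions a 99.99% target leaves under an hour of downtime per year, while 99.9% leaves several hours, which is why the two-day outages listed above stand out.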

Page 13: Rethinking the cloud_-_limitations_and_oppotunities_-_2011_nexcom

Cloud Is NOT A Brand-New Technology

Utility Computing

Amazon EC2 (August 2006)

Amazon S3 (March 2006)

Google App Engine (April 2008)

Microsoft Azure (Oct 2008)

GFS, MapReduce, BigTable, Hadoop

Page 14: Rethinking the cloud_-_limitations_and_oppotunities_-_2011_nexcom

Cloud is just Buzz, and Marketing Hype Campaign

• Cloud computing is simply a buzzword used to repackage grid computing and utility computing, both of which have existed for decades

- Definition of Cloud Computing, whatis.com

• What is it? What is it? ... Is it - 'Oh, I am going to access data on a server on the Internet.' That is cloud computing?

• The interesting thing about cloud computing is that we’ve redefined cloud computing to include everything that we already do

- During Oracle’s Analyst Day, Larry Ellison

• .. cloud computing was simply a trap aimed at forcing more people to buy into locked, proprietary systems that would cost them more and more over time

• It's stupidity. It's worse than stupidity: it's a marketing hype campaign

- GNU founder, Richard Stallman

• Server revenue for public cloud computing will grow from $582 million in 2009 to $718 million in 2014

• Server revenue for the much larger private cloud market will grow from $7.3 billion to $11.8 billion in the same time period

- Worldwide Enterprise Server Cloud Computing 2010-2014 Forecast, IDC
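The IDC figures are easier to compare as annual growth rates. The sketch below computes a compound annual growth rate (CAGR) from the start and end values quoted above; the CAGR framing is an assumption added here, not something the forecast itself is quoted as stating.

```python
def cagr(start_value: float, end_value: float, years: int) -> float:
    """Compound annual growth rate between two values over a number of years."""
    return (end_value / start_value) ** (1 / years) - 1


YEARS = 2014 - 2009

# Server revenue figures as quoted on the slide.
public_growth = cagr(582e6, 718e6, YEARS)    # public cloud
private_growth = cagr(7.3e9, 11.8e9, YEARS)  # private cloud

print(f"public cloud server revenue:  {public_growth:.1%} per year")
print(f"private cloud server revenue: {private_growth:.1%} per year")
```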

Page 15: Rethinking the cloud_-_limitations_and_oppotunities_-_2011_nexcom

Cloud Wars and Strategies

Page 16: Rethinking the cloud_-_limitations_and_oppotunities_-_2011_nexcom

The Cloud Wars

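[Diagram: vendor groups with their products, acquisitions, and deal sizes as labeled on the slide]

• EMC, VMware: SpringSource, CloudFoundry, Isilon, Greenplum, GemStone, RabbitMQ, Hyperic; 650M, 450M
• SalesForce.com: Force.com, VMForce, Heroku (212M)
• RedHat: OpenShift, JBoss (420M), Qumranet (107M; KVM, SPICE)
• Citrix: XenSource, Xen, NetScaler; 300M
• Oracle: Sun, Java, MySQL
• NexentaStor, ZFS, BtrFS, Ceph
• OpenStack, Rackspace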

Page 17: Rethinking the cloud_-_limitations_and_oppotunities_-_2011_nexcom

rPath rBuilder

Quest vControl

3Tera Applogic

Infra is getting more Programmable

Page 18: Rethinking the cloud_-_limitations_and_oppotunities_-_2011_nexcom

Cloud Disruptors

• RightScale & enStratus

Page 19: Rethinking the cloud_-_limitations_and_oppotunities_-_2011_nexcom

Cloud Disruptors (cont’d)

• Cloud Mgmt. Functionality in enStratus

Page 20: Rethinking the cloud_-_limitations_and_oppotunities_-_2011_nexcom

Cloudburst and Hybrid Cloud

• CloudSwitch

Page 21: Rethinking the cloud_-_limitations_and_oppotunities_-_2011_nexcom

Who Moved My Cheese?

Disrupting or Disrupted??!

Page 22: Rethinking the cloud_-_limitations_and_oppotunities_-_2011_nexcom

Virtualization

Page 23: Rethinking the cloud_-_limitations_and_oppotunities_-_2011_nexcom

Virtualization on x86 Architecture

• VMM (Virtual Machine Monitor) or Hypervisor
  – Since the VMM must run at the privileged level (ring 0), the guest OS is moved to a non-privileged level (ring 1 or 3)

[Diagram: three virtual machines, each running apps on its own operating system (#1 Win-XP, #2 Mac-OS, #3 Linux), on top of a Virtual Machine Manager that sits on the CPU, memory, NIC, and disk]

• Problems on x86 Architecture
  – Privileged Instruction
    • Traps when called from CPU user mode, so the VMM can emulate its effect
  – Sensitive Non-privileged Instruction
    • Causes the physical state of the CPU to leak

        smsw %eax             # reads CR0 into EAX
        mov  %cr0, %edx       # reads CR0 into EDX
        sub  %eax, %edx       # what's the difference?
        jnz  emulation_flaw   # it ought to be zero!!

    • No trap, no emulation => the VMM eventually crashes
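The difference between the two instruction classes can be sketched with a toy model. This is not real x86 emulation: the instruction names, the `ToyMachine` type, and the register values are all hypothetical, and the model ignores that `smsw` actually returns only the low 16 bits of CR0. It only shows why an instruction that reads privileged state without trapping lets the guest see a mismatch.

```python
from dataclasses import dataclass

# Toy instruction classes (names are illustrative, not real opcodes).
PRIVILEGED = {"mov_from_cr0"}       # traps when run outside ring 0
SENSITIVE_UNPRIVILEGED = {"smsw"}   # reads CR0 state but never traps


@dataclass
class ToyMachine:
    real_cr0: int     # physical register value, owned by the VMM
    virtual_cr0: int  # value the guest OS believes it has set


def execute_deprivileged(machine: ToyMachine, instruction: str) -> int:
    """Run one guest instruction with the guest OS outside ring 0."""
    if instruction in PRIVILEGED:
        # Trap: control passes to the VMM, which emulates the effect
        # against the guest's virtual state.
        return machine.virtual_cr0
    if instruction in SENSITIVE_UNPRIVILEGED:
        # No trap: the instruction runs directly on the hardware and
        # leaks the physical state to the guest.
        return machine.real_cr0
    raise ValueError(f"unknown instruction: {instruction}")


def guest_sees_emulation_flaw(machine: ToyMachine) -> bool:
    """Mirror the slide's check: the two reads of CR0 ought to match."""
    eax = execute_deprivileged(machine, "smsw")
    edx = execute_deprivileged(machine, "mov_from_cr0")
    return (edx - eax) != 0


machine = ToyMachine(real_cr0=0x3B, virtual_cr0=0x33)
print(guest_sees_emulation_flaw(machine))
```

When the physical and virtual values happen to agree, the check passes; as soon as the VMM changes the real register for its own purposes, the untrapped read exposes the difference.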

Page 24: Rethinking the cloud_-_limitations_and_oppotunities_-_2011_nexcom

CPU Virtualization on x86 Architecture

• How to handle nonvirtualizable instructions
  – Full virtualization using binary translation
  – Paravirtualization using hypercalls
  – Hardware-assisted virtualization using root/non-root mode
    • VT-x : Virtualization Technology for x86 CPUs (IA-32 and Intel 64)
    • VT-i : Virtualization Technology for Itanium CPUs (IA-64)
    • VT-d : Virtualization Technology for Directed I/O
    • VT-c : Virtualization Technology for Connectivity
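Whether a Linux host exposes hardware-assisted virtualization can be checked from the CPU flags. The sketch below assumes a Linux system where `/proc/cpuinfo` lists a `flags` line; `vmx` is the flag for Intel VT-x, and `svm` (AMD-V, not covered on this slide) is included only for completeness. The function name is hypothetical.

```python
from pathlib import Path
from typing import Optional


def hw_virt_support(cpuinfo_text: str) -> Optional[str]:
    """Return 'VT-x' or 'AMD-V' if the CPU flags advertise it, else None."""
    for line in cpuinfo_text.splitlines():
        if line.startswith("flags") and ":" in line:
            flags = set(line.split(":", 1)[1].split())
            if "vmx" in flags:   # Intel VT-x
                return "VT-x"
            if "svm" in flags:   # AMD-V (not on the slide; listed for completeness)
                return "AMD-V"
    return None


if __name__ == "__main__":
    cpuinfo = Path("/proc/cpuinfo")
    if cpuinfo.exists():  # Linux only
        print(hw_virt_support(cpuinfo.read_text()) or "no hardware virtualization flag found")
```

A missing flag does not always mean the CPU lacks the feature; it may be disabled in firmware, which this check cannot distinguish.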

Page 25: Rethinking the cloud_-_limitations_and_oppotunities_-_2011_nexcom

Virtualization on x86 Architecture

Page 26: Rethinking the cloud_-_limitations_and_oppotunities_-_2011_nexcom

Hurdles in Server Virtualization

• Storage Allocation & Interfacing – On-demand, Pre-allocation

– NAS, iSCSI, Local Storage

• VM Management – Snapshot, Fast Cloning, Thin Provisioning, Live Migration, DRS

• Virtual Network – L2/L3 Network Design, Directed, Bridged, NAT, VLAN, Load-Balance, Firewall

• Resource Sharing – Resource Pool, High Availability, Scheduling, Workload Mgmt.

• Migration – P2V, V2V

• Hardware-Assisted Support – Privileged instruction virtualization

• De-privileging or ring compression to handle privileged instructions

– Memory virtualization

• Memory partitioning and allocation of physical memory

– Device and I/O virtualization

• Routing I/O requests between virtual devices and physical hardware

Page 27: Rethinking the cloud_-_limitations_and_oppotunities_-_2011_nexcom

Storage

Page 28: Rethinking the cloud_-_limitations_and_oppotunities_-_2011_nexcom

File System with Shared Storage

Cluster File Systems
• GFS2 – DLM, scaling to 100
• GlusterFS – FUSE, poor performance
• Lustre – dfs

Page 29: Rethinking the cloud_-_limitations_and_oppotunities_-_2011_nexcom

Unified Storage using Virtual Block Pool

• NexentaStor based on ZFS

Page 30: Rethinking the cloud_-_limitations_and_oppotunities_-_2011_nexcom

• GlusterFS

Cluster File System using Virtual Block Pool

Page 31: Rethinking the cloud_-_limitations_and_oppotunities_-_2011_nexcom

A Feasible SAN File System

• IBM TotalStorage SAN File System

Pooled Storage

• GlusterFS
• ZFS
• Openfiler
• iSCSI+GNBD+DRBD

Page 32: Rethinking the cloud_-_limitations_and_oppotunities_-_2011_nexcom

Network

Page 33: Rethinking the cloud_-_limitations_and_oppotunities_-_2011_nexcom

Basic Virtual Network

Tap vs. Tun
• TAP – simulates an Ethernet device and operates on layer 2 packets such as Ethernet frames
• TUN(nel) – simulates a network layer device and operates on layer 3 packets such as IP packets
• TAP is used to create a network bridge, while TUN is used with routing

VDE (Virtual Distributed Ethernet) and VDE Switch

IPTables vs. Bridging
• IPTables – lets the host forward packets between the taps, each of which is on its own subnet
• Bridging – connects all the taps to a specific bridge to put them on the same subnet
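On Linux, the choice between TAP and TUN comes down to one flag passed when the device is created. The sketch below builds the request for the `TUNSETIFF` ioctl on `/dev/net/tun`; the constants are taken from `<linux/if_tun.h>` as found on common architectures such as x86, the helper names are hypothetical, and actually opening a device needs root (or CAP_NET_ADMIN), so only the request builder runs by default.

```python
import struct

# Constants from the Linux header <linux/if_tun.h> (values as on x86/ARM).
TUNSETIFF = 0x400454CA
IFF_TUN = 0x0001    # layer 3 device: IP packets
IFF_TAP = 0x0002    # layer 2 device: Ethernet frames
IFF_NO_PI = 0x1000  # do not prepend the packet-information header


def build_ifreq(name: str, mode: str) -> bytes:
    """Pack a 40-byte ifreq: 16-byte interface name, flags, then padding."""
    flags = {"tun": IFF_TUN, "tap": IFF_TAP}[mode] | IFF_NO_PI
    return struct.pack("16sH22x", name.encode(), flags)


def open_device(name: str, mode: str) -> int:
    """Create a TUN or TAP device and return its file descriptor (needs root)."""
    import fcntl
    import os

    fd = os.open("/dev/net/tun", os.O_RDWR)
    fcntl.ioctl(fd, TUNSETIFF, build_ifreq(name, mode))
    return fd


# Reads on a "tap" descriptor yield Ethernet frames (suited to bridging);
# reads on a "tun" descriptor yield IP packets (suited to routing).
print(len(build_ifreq("tap0", "tap")))
```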

Page 34: Rethinking the cloud_-_limitations_and_oppotunities_-_2011_nexcom

OpenStack Nova Network Virtualization

[Diagram: physical machines hosting VMs whose eth interfaces attach to a software bridge (br100, br101) in front of the NIC. Other labels: VLAN-1, public IP, private IP from dnsmasq, supports VLAN tagging, manual config, auto eth0, dnsmasq, dhcpdiscover, VPN VM, Nova users]

① Flat Mode
• manual config. of bridge
• get fixed public IP from the pool

② Flat DHCP Mode
• auto config. of bridge
• get fixed public IP

③ VLAN DHCP Mode (default)
• auto config. of bridge
• auto config. of VLAN: range of private IPs for project VLAN
• get fixed private IP: iptables + NAT (private/public)
• VLAN: cloudpipe (= OpenVPN VM template, TAP/TUN)
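The VLAN DHCP mode gives each project its own VLAN and its own range of private IPs. The sketch below is not Nova code; it only illustrates that allocation idea with the standard `ipaddress` module. The private range, prefix length, starting VLAN ID, and the `br<vlan>` bridge naming are assumptions chosen to echo the br100/br101 labels in the diagram.

```python
import ipaddress
from itertools import count


def plan_project_networks(projects, private_range="10.0.0.0/16",
                          prefix=24, first_vlan=100):
    """Assign each project a VLAN ID, a bridge name, and a private subnet."""
    subnets = ipaddress.ip_network(private_range).subnets(new_prefix=prefix)
    plan = {}
    for project, vlan_id, subnet in zip(projects, count(first_vlan), subnets):
        hosts = iter(subnet.hosts())
        plan[project] = {
            "vlan": vlan_id,
            "bridge": f"br{vlan_id}",
            "subnet": subnet,
            "gateway": next(hosts),         # first usable address
            "first_fixed_ip": next(hosts),  # first address handed to a VM
        }
    return plan


for project, net in plan_project_networks(["alpha", "beta"]).items():
    print(project, net["vlan"], net["bridge"], net["subnet"], net["gateway"])
```

Each VM would then receive a fixed private IP from its project's subnet, with NAT on the host mapping private to public addresses as the slide describes.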

Page 35: Rethinking the cloud_-_limitations_and_oppotunities_-_2011_nexcom

CloudStack Network Virtualization

Virtual Private Network for Each Account

Page 36: Rethinking the cloud_-_limitations_and_oppotunities_-_2011_nexcom

CloudStack Network Virtualization

Detail Virtual Private Network in Node A and Node B

Page 37: Rethinking the cloud_-_limitations_and_oppotunities_-_2011_nexcom

Hurdles in Network Virtualization

• L2 Network
  – Problem: Scalability, Performance, Security
  – Solution: VLAN (for Scalability and Security), RBridge (for Scalability and STP Limitation), L2 over L3
• Multi-tier Networking Design vs. Migration Limitation
  – Limitation of Spanning Tree Protocol
    • Keep Layer 2 networks relatively small and join them together via Layer 3 segments
    • But VM migration cannot be live across the multi-tier networks
  – Port Consistency
    • Map the settings such as VLAN, ACL, QoS, and security profiles to all the network ports
    • But some VMs are not able to meet required service levels
• L2 (Switching) and L3 (Routing) Networking Design
  – Scalability and Efficiency on the service provider side
    • Amazon EC2 using L3: 500,000 VMs on 60,000 PMs
  – Legacy Support on the service consumer side
    • Amazon VPC, 3Tera AppLogic
      – Define virtual network topology
      – Select IP address range
      – Create public subnets and private subnets
      – Configure route table and network gateway
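The four consumer-side steps (topology, IP range, public and private subnets, route table and gateway) can be modeled in a few lines. This is a plain data sketch using the standard `ipaddress` module, not a call into Amazon VPC or AppLogic; the address range and the target names `local` and `internet-gateway` are assumptions for illustration.

```python
import ipaddress

# Step 1-2: pick the address range for the virtual network (assumed value).
vpc_range = ipaddress.ip_network("10.0.0.0/16")

# Step 3: carve one public and one private subnet out of the range.
subnets = vpc_range.subnets(new_prefix=24)
public_subnet = next(subnets)
private_subnet = next(subnets)

# Step 4: one route table per subnet; only the public one reaches a gateway.
route_tables = {
    "public": [(vpc_range, "local"),
               (ipaddress.ip_network("0.0.0.0/0"), "internet-gateway")],
    "private": [(vpc_range, "local")],
}


def next_hop(routes, destination: str):
    """Pick the target of the most specific (longest-prefix) matching route."""
    dest = ipaddress.ip_address(destination)
    matches = [(net, target) for net, target in routes if dest in net]
    if not matches:
        return None  # no route: the packet is dropped
    return max(matches, key=lambda match: match[0].prefixlen)[1]


print(public_subnet, private_subnet)
print(next_hop(route_tables["public"], "8.8.8.8"))
print(next_hop(route_tables["private"], "8.8.8.8"))
```

The private subnet has no default route, so traffic to outside addresses has no next hop, which is the property that makes it private in this model.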

Page 38: Rethinking the cloud_-_limitations_and_oppotunities_-_2011_nexcom

Cloud Platform

Page 39: Rethinking the cloud_-_limitations_and_oppotunities_-_2011_nexcom

Cloud Technologies

Anatomy of Cloud Technologies

Page 40: Rethinking the cloud_-_limitations_and_oppotunities_-_2011_nexcom

OpenStack Nova Architecture

API Server
• acts as the Web services front end for the cloud controller

Compute Controller
• provides compute server resources

Object Store
• provides storage services

Auth Manager
• provides authentication and authorization services

Volume Controller
• provides fast and permanent block-level storage for the compute servers

Cloud Controller
• represents the global state and interacts with all other components

Network Controller
• provides virtual networks to enable compute servers to interact with each other and with the public network

Scheduler
• selects the most suitable compute controller to host an instance
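The Scheduler's role, selecting the most suitable compute controller for an instance, can be illustrated with a minimal placement function. This is not Nova's scheduler: the `ComputeHost` type, the capacity fields, and the "most free RAM wins" rule are all assumptions made for the sketch.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class ComputeHost:
    name: str
    free_ram_mb: int
    free_vcpus: int


def select_host(hosts: List[ComputeHost], ram_mb: int,
                vcpus: int) -> Optional[ComputeHost]:
    """Filter hosts that can fit the instance, then pick the one with most free RAM."""
    candidates = [h for h in hosts
                  if h.free_ram_mb >= ram_mb and h.free_vcpus >= vcpus]
    if not candidates:
        return None  # no compute controller can host the instance
    return max(candidates, key=lambda h: h.free_ram_mb)


hosts = [
    ComputeHost("compute-1", free_ram_mb=2048, free_vcpus=2),
    ComputeHost("compute-2", free_ram_mb=8192, free_vcpus=4),
    ComputeHost("compute-3", free_ram_mb=16384, free_vcpus=1),
]
chosen = select_host(hosts, ram_mb=4096, vcpus=2)
print(chosen.name if chosen else "no valid host")
```

Splitting the decision into a filter step and a ranking step keeps the "can it fit" rules separate from the "which is best" policy, so either can change without touching the other.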

Page 41: Rethinking the cloud_-_limitations_and_oppotunities_-_2011_nexcom

CloudStack Architecture

[Diagram: Management Servers (Mgmt. Server VMs on Host A and Host B, behind a Load Balancer) in front of a Zone. Within the Zone, Computing Nodes (Host X, Host Y, Host Z) run Guest VMs with Volumes attached from Primary Shared Storage; Guest VMs live-migrate between nodes, and nodes can be added dynamically. Secondary Shared Storage holds Templates, ISO images, and VM Image snapshots (copy, create, boot, attach).]

• Guest VM: Max(6*Volumes) per Guest VM
• Cluster: Max(16*Computing Nodes) per Cluster; Computing Nodes should be in the same subnet, and homogeneous
• Pod: Computing Nodes should be in the same subnet, and have no limit to number of nodes
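The limits stated on this slide (16 computing nodes per cluster, 6 volumes per guest VM, nodes of a cluster in the same subnet) can be expressed as simple checks. The sketch below is not part of CloudStack; the function names and the input format are hypothetical, and the limits are the ones quoted on the slide for the version presented.

```python
import ipaddress

MAX_NODES_PER_CLUSTER = 16  # limit quoted on the slide
MAX_VOLUMES_PER_VM = 6      # limit quoted on the slide


def check_cluster(nodes):
    """Check a cluster given {node name: 'address/prefix'}; return a list of problems."""
    problems = []
    if len(nodes) > MAX_NODES_PER_CLUSTER:
        problems.append(f"{len(nodes)} nodes exceed the limit of {MAX_NODES_PER_CLUSTER}")
    subnets = {ipaddress.ip_interface(addr).network for addr in nodes.values()}
    if len(subnets) > 1:
        problems.append("computing nodes are not all in the same subnet")
    return problems


def can_attach_volume(attached_volumes: int) -> bool:
    """A guest VM may take another volume only while below the limit."""
    return attached_volumes < MAX_VOLUMES_PER_VM


cluster = {"host-x": "192.168.10.11/24",
           "host-y": "192.168.10.12/24",
           "host-z": "192.168.20.13/24"}
print(check_cluster(cluster))
print(can_attach_volume(5), can_attach_volume(6))
```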

Page 42: Rethinking the cloud_-_limitations_and_oppotunities_-_2011_nexcom

Conclusion

Page 43: Rethinking the cloud_-_limitations_and_oppotunities_-_2011_nexcom

Top 10 Cloud Obstacles and Opportunities

• A View of Cloud Computing, ACM, April, 2010

Page 44: Rethinking the cloud_-_limitations_and_oppotunities_-_2011_nexcom

2011 Predictions of IaaS, PaaS, and NoSQL

• IaaS Prediction

• Hybrid is the way to go: The public-private cloud discussion isn’t relevant anymore

• OpenStack will dominate the open IaaS offering

• PaaS Prediction

• 2011 is the year of PaaS

• CloudFoundry – VMware

• OpenShift – RedHat

• A new PaaS category will emerge – Building your own PaaS

• CEAP (Cloud Enabled Application Platform) is being specifically designed to handle multi-tenancy, scalability, and on-demand provisioning, but not a higher degree of flexibility and control

• Application servers will change their name to PaaS – But won’t change their stripes

• VMForce will fail to deliver on its promise => Already open Cloud

• NoSQL (+Big Data) predictions

• NoSQL will become compatible with SQL

• More applications will run entirely In-Memory

• Real-time/stream-based analytics will replace the majority of MapReduce batch processing

• e.g., Yahoo S4, Google’s Percolator

written by Nati Shalom at Gigaspaces http://natishalom.typepad.com/nati_shaloms_blog/2010/12/2011-cloud-paas-nosql-predictions.html

Page 45: Rethinking the cloud_-_limitations_and_oppotunities_-_2011_nexcom
Page 46: Rethinking the cloud_-_limitations_and_oppotunities_-_2011_nexcom

Thank you.