Rethinking the Cloud: Limitations and Opportunities (NEXCOM 2011)

Post on 20-May-2015


This talk, on the subject of Rethinking the Cloud, was presented at NEXCOM 2011. It covers cloud business and strategy and gives an overview of cloud technologies such as virtualization, storage, networking, and cloud platforms.


Woohyun Kim

Cloud Platform Team

S-Core

2011-05-18

Rethinking the Cloud - A View of Virtualization, Storage, Network, and Platform

Cloud Success Stories

What is the Cloud?

• A computing environment to elastically provide virtualized resources as a service over the Internet in a pay-as-you-go manner

Amazon’s Challenge and Paradigm Shift

Success Cases in Amazon

SmugMug (http://www.smugmug.com/)

• an online photo storage application that stores more than half a petabyte of data on S3

• estimates its cost savings on service and storage to be close to $1 million

37Signals (http://37signals.com/)

• maker of the popular online project-management software Basecamp; uses S3 for its storage needs

New York Times (http://www.nytimes.com)

• used hundreds of EC2 instances to process terabytes of archival data within 36 hours

Animoto (http://animoto.com/)

• an online presentation video generator that needs large amounts of computing power for video processing

• withstood a surge in Web traffic that would have overwhelmed most companies’ systems by quickly scaling up its processing power using EC2 with RightScale

• ramped from 25,000 users to 250,000 users in three days, signing up 20,000 new users per hour at peak

• with RightScale, its EC2 instances automatically scaled out from 40 to 4,000 during that time

• For more detail, refer to http://blog.rightscale.com/2008/04/23/animoto-facebook-scale-up/

Start-up Company: Powerset

Powerset had a great idea: “Natural Language Search”

It had to index millions of pages of data and content

They knew that this would require a massive datacenter and extensive computing power: CPUs, terminal switches, cables, racks, datacenters, hosting, power, maintenance, and staff

But they needed to keep infrastructure costs to a minimum

“By using Amazon EC2, Powerset is able to match the infrastructure of large scale search companies on a startup budget.” - Barney Pell, Founder and CEO of Powerset

“Amazon EC2 is a complete game-changer. EC2 and Amazon Web Services make it easy for start-ups to build a complete infrastructure without having to spend much on capital.” - Paul Hammann


$100 million

Temporary & Data-intensive: The New York Times

The New York Times is a 150-year-old company that runs the largest newspaper website, NYTimes.com

• 1 billion page views per month

• 20+ million monthly unique visitors

They set out to convert TIFF images to PDFs

• TIFF images (405,000)

• Articles (3.3 million) in SGML

• PNG images (810,000)

• XML files (405,000) mapping articles to TIFFs

• JavaScript files (405,000)

“I got access to a few more machines and churned through all 11 million articles in just under 24 hours using 100 EC2 instances, and generated another 1.5TB of data to store in S3. It just costs $3000.” - Derek Gottfrid

“… I had was this: upload 4TB of source data into S3, write some code that would run on numerous EC2 instances to read the source data, create PDFs, and store the results back into S3. S3 would then be used to serve the PDFs to the general public.” - Derek Gottfrid


₩64,865,381,400

Cloud Skepticism

Amazon’s cloud outages receive a lot of exposure …

• April 21-22, 2011: a networking glitch made storage volumes automatically create backups of themselves, filling up storage capacity and causing connectivity issues; the outage lasted two days

  – Amazon’s customers include start-ups like the social networking site Foursquare, but also big companies like Pfizer, Netflix, and Nasdaq

  – Affected web sites included Quora.com, Reddit.com, GroupMe.com, and Scvngr.com

• July 20, 2008: failure due to stranded zombies; lasted 5 hours

• Feb 15, 2008: authentication overload led to a two-hour service outage

• October 2007: service failure lasted two days

• October 2006: security breach in which users could see other users’ data

… and their current SLAs don’t match those (99.99%) of enterprises

• Amazon EC2: 99.95%

• Amazon S3: 99.9%
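To make the SLA gap concrete, the percentages above can be turned into a yearly downtime budget. The following is a minimal arithmetic sketch; the function name and the 365-day year are assumptions made for illustration, not part of any provider's SLA definition.

```python
# Downtime budget implied by an availability percentage (illustrative arithmetic only).
HOURS_PER_YEAR = 365 * 24  # 8760 hours, ignoring leap years


def allowed_downtime_hours(availability_percent):
    """Hours per year a service may be down while still meeting the given availability."""
    return HOURS_PER_YEAR * (1 - availability_percent / 100)


# Percentages taken from the slide; the labels are just for printing.
slas = {
    "Amazon S3": 99.9,
    "Amazon EC2": 99.95,
    "Enterprise target": 99.99,
}

for name, percent in slas.items():
    print(f"{name} ({percent}%): {allowed_downtime_hours(percent):.2f} hours/year")
```

By this arithmetic, a single two-day outage is already several times larger than a full year's budget at any of the three levels.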

The Cloud is Falling

Cloud Is NOT A Brand-New Technology

Utility Computing

Amazon EC2 (August 2006)

Amazon S3 (March 2006)

Google App Engine (April 2008)

Microsoft Azure (Oct 2008)

GFS, MapReduce, BigTable, Hadoop

Cloud Is Just a Buzzword and a Marketing Hype Campaign

• Cloud computing is simply a buzzword used to repackage grid computing and utility computing, both of which have existed for decades

- Definition of Cloud Computing, whatis.com

• What is it? What is it? ... Is it - 'Oh, I am going to access data on a server on the Internet.' That is cloud computing?

• The interesting thing about cloud computing is that we’ve redefined cloud computing to include everything that we already do

- During Oracle’s Analyst Day, Larry Ellison

• … cloud computing was simply a trap aimed at forcing more people to buy into locked, proprietary systems that would cost them more and more over time

• It's stupidity. It's worse than stupidity: it's a marketing hype campaign

- GNU founder, Richard Stallman

• Server revenue for public cloud computing will grow from $582 million in 2009 to $718 million in 2014

• Server revenue for the much larger private cloud market will grow from $7.3 billion to $11.8 billion in the same time period

- Worldwide Enterprise Server Cloud Computing 2010-2014 Forecast, IDC

Cloud Wars and Strategies

The Cloud Wars

[Figure: map of the cloud wars, showing vendors with their acquisitions and products. Labels, grouped by vendor: EMC, VMware, SpringSource, CloudFoundry, Isilon, Greenplum, GemStone, RabbitMQ, Hyperic, 650M, 450M; SalesForce.com, Force.com, VMForce, Heroku (212M); RedHat, OpenShift, JBoss (420M), Qumranet (107M), KVM, SPICE; Citrix, XenSource, Xen, NetScaler, 300M; Oracle, Sun, Java, MySQL; NexentaStor, ZFS, BtrFS, Ceph; OpenStack, Rackspace; rPath rBuilder; Quest vControl; 3Tera AppLogic]

Infra is getting more Programmable

Cloud Disruptors

• RightScale & enStratus

Cloud Disruptors (cont’d)

• Cloud Mgmt. Functionality in enStratus

Cloudburst and Hybrid Cloud

• CloudSwitch

Who Moved My Cheese?

Disrupting or Disrupted??!

Virtualization

Virtualization on x86 Architecture

• VMM (Virtual Machine Monitor) or Hypervisor

  – Since the VMM must run at the most privileged level (ring 0), the guest OS is moved to a less privileged level (ring 1 or 3)

[Figure: three virtual machines, each running apps on its own operating system (#1 Win-XP, #2 Mac-OS, #3 Linux), on top of the Virtual Machine Manager, which runs on the physical CPU, memory, NIC, and disk]

• Problems on x86 Architecture

  – Privileged instructions

    • Trap when executed in CPU user mode, so the VMM can emulate their effect

  – Sensitive non-privileged instructions

    • Leak the physical state of the CPU to the guest

        smsw %eax            # reads CR0 into EAX
        mov %cr0, %edx       # reads CR0 into EDX
        sub %eax, %edx       # what’s the difference?
        jnz emulation_flaw   # it ought to be zero!!

    • No trap, no emulation => the VMM eventually crashes
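The difference between the two instruction classes can be illustrated with a toy model. This is a sketch only, not real x86 semantics: the register values, the instruction names, and the functions `cpu_execute` and `vmm_run_guest` are all hypothetical, chosen to show why an instruction that never traps cannot be emulated.

```python
# Toy model of trap-and-emulate; not real x86 behavior.
REAL_CR0 = 0x80050033      # hypothetical value of the physical CR0
VIRTUAL_CR0 = 0x00000011   # hypothetical value the VMM wants the guest to see


class Trap(Exception):
    """Raised when a privileged instruction runs outside ring 0."""


def cpu_execute(instr, ring):
    """Simulated hardware executing one instruction at the given privilege ring."""
    if instr == "mov_from_cr0":        # privileged: faults outside ring 0
        if ring != 0:
            raise Trap(instr)
        return REAL_CR0
    if instr == "smsw":                # sensitive but not privileged: never faults
        return REAL_CR0 & 0xFFFF       # low 16 bits of the physical CR0
    raise ValueError(f"unknown instruction: {instr}")


def vmm_run_guest(instr):
    """Run a guest instruction de-privileged (ring 1); emulate it only if it traps."""
    try:
        return cpu_execute(instr, ring=1)
    except Trap:
        return VIRTUAL_CR0             # emulated against the guest's virtual state


seen_by_mov = vmm_run_guest("mov_from_cr0") & 0xFFFF   # trapped and emulated
seen_by_smsw = vmm_run_guest("smsw")                    # ran directly on the "hardware"
print(hex(seen_by_mov), hex(seen_by_smsw))
```

In this model the two reads disagree, which is exactly the check the assembly fragment performs: the guest sees virtual state through the trapped instruction and physical state through the one that does not trap.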

CPU Virtualization on x86 Architecture

• How to handle nonvirtualizable instructions

  – Full virtualization using binary translation

  – Paravirtualization using hypercalls

  – Hardware-assisted virtualization using root/non-root mode

    • VT-x: Virtualization Technology for x86 (IA-32 and Intel 64) CPUs

    • VT-i: Virtualization Technology for Itanium (IA-64) CPUs

    • VT-d: Virtualization Technology for Directed I/O

    • VT-c: Virtualization Technology for Connectivity


Hurdles in Server Virtualization

• Storage Allocation & Interfacing

  – On-demand, Pre-allocation

  – NAS, iSCSI, Local Storage

• VM Management

  – Snapshot, Fast Cloning, Thin Provisioning, Live Migration, DRS

• Virtual Network

  – L2/L3 Network Design, Directed, Bridged, NAT, VLAN, Load-Balance, Firewall

• Resource Sharing

  – Resource Pool, High Availability, Scheduling, Workload Mgmt.

• Migration

  – P2V, V2V

• Hardware-Assisted Support

  – Privileged instruction virtualization

    • De-privileging or ring compression to handle privileged instructions

  – Memory virtualization

    • Memory partitioning and allocation of physical memory

  – Device and I/O virtualization

    • Routing I/O requests between virtual devices and physical hardware

Storage

File System with Shared Storage

Cluster File Systems

• GFS2 – DLM, scaling to 100

• GlusterFS – FUSE, poor performance

• Lustre – DFS

Unified Storage using Virtual Block Pool

• NexentaStor based on ZFS

• GlusterFS

Cluster File System using Virtual Block Pool

A Feasible SAN File System

• IBM TotalStorage SAN File System

Pooled Storage

• GlusterFS

• ZFS

• Openfiler

• iSCSI+GNBD+DRBD

Network

Basic Virtual Network

Tap vs. Tun

• TAP – simulates an Ethernet device and operates on layer 2 packets such as Ethernet frames

• TUN (tunnel) – simulates a network-layer device and operates on layer 3 packets such as IP packets

• TAP is used to create a network bridge, while TUN is used with routing.
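The layer 2 versus layer 3 distinction shows up in what a program reads from each device. The sketch below does not open a real TAP or TUN device; it builds a sample IPv4 packet and Ethernet frame in memory and parses them the way a reader of each device would, ignoring the optional packet-information prefix a TUN/TAP read can carry. All function names, addresses, and MACs are made up for illustration.

```python
import socket
import struct

ETHERTYPE_IPV4 = 0x0800


def build_ipv4_packet(src, dst, payload):
    """Minimal 20-byte IPv4 header (checksum left at 0) followed by the payload."""
    version_ihl = (4 << 4) | 5              # version 4, header length 5 x 32-bit words
    header = struct.pack(
        "!BBHHHBBH4s4s",
        version_ihl, 0, 20 + len(payload),  # version/IHL, TOS, total length
        0, 0,                               # identification, flags/fragment offset
        64, 17, 0,                          # TTL, protocol (17 = UDP), checksum
        socket.inet_aton(src), socket.inet_aton(dst),
    )
    return header + payload


def build_ethernet_frame(dst_mac, src_mac, ip_packet):
    """Prepend a 14-byte Ethernet II header to an IP packet."""
    return dst_mac + src_mac + struct.pack("!H", ETHERTYPE_IPV4) + ip_packet


def parse_tun_read(packet):
    """TUN view: the buffer starts at the IP header (layer 3)."""
    src, dst = struct.unpack("!4s4s", packet[12:20])
    return socket.inet_ntoa(src), socket.inet_ntoa(dst)


def format_mac(raw):
    return ":".join(f"{byte:02x}" for byte in raw)


def parse_tap_read(frame):
    """TAP view: Ethernet header first (layer 2), then the IP packet."""
    dst_mac, src_mac = frame[0:6], frame[6:12]
    (ethertype,) = struct.unpack("!H", frame[12:14])
    return format_mac(dst_mac), format_mac(src_mac), ethertype, parse_tun_read(frame[14:])


packet = build_ipv4_packet("10.0.0.1", "10.0.0.2", b"hello")
frame = build_ethernet_frame(
    bytes.fromhex("ffffffffffff"), bytes.fromhex("02000000aa01"), packet
)
print(parse_tun_read(packet))
print(parse_tap_read(frame))
```

Because the TAP view carries MAC addresses, it can be plugged into a bridge; the TUN view only carries IP addresses, so it has to be routed.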

VDE(Virtual Distributed Ethernet) and VDE Switch

IPTables vs. Bridging

• IPTables – lets the host forward packets between taps that each sit on their own subnet

• Bridging – connects all the taps to a specific bridge so that they share the same subnet

[Figure: OpenStack Nova network modes on physical machines. On each physical machine, the VMs’ eth interfaces attach to a software bridge (br100, br101) behind the NIC. Labels: VLAN-1; public IP; private IP from dnsmasq; supports VLAN tagging; manual config; auto eth0; dnsmasq; VPN VM; Nova users; dhcpdiscover]

① Flat Mode

• manual config. of bridge

• get fixed public IP from the pool

② Flat DHCP Mode

• auto config. of bridge

• get fixed public IP

③ VLAN DHCP Mode (default)

• auto config. of bridge

• auto config. of VLAN: range of private IPs for project VLAN

• get fixed private IP: iptables + NAT (private/public)

• VLAN: cloudpipe (= OpenVPN VM template, TAP/TUN)

OpenStack Nova Network Virtualization
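The per-project allocation in VLAN DHCP mode can be sketched as a small planning function. This is not Nova's code: the function name, the `10.0.0.0/16` range, the starting VLAN ID of 100, and the `/24` size per project are assumptions chosen only so the result lines up with bridge names like br100 and br101.

```python
import ipaddress


def plan_project_networks(projects, fixed_range="10.0.0.0/16", vlan_start=100, prefix=24):
    """Give each project its own VLAN ID, bridge name, and private subnet (hypothetical scheme)."""
    subnets = ipaddress.ip_network(fixed_range).subnets(new_prefix=prefix)
    plan = {}
    for offset, (project, subnet) in enumerate(zip(projects, subnets)):
        vlan_id = vlan_start + offset
        plan[project] = {
            "vlan": vlan_id,
            "bridge": f"br{vlan_id}",                       # one bridge per project VLAN
            "subnet": str(subnet),                          # private range served by DHCP
            "gateway": str(subnet.network_address + 1),     # first usable address
            "first_vm_ip": str(subnet.network_address + 2),
        }
    return plan


plan = plan_project_networks(["alpha", "beta"])
for project, network in plan.items():
    print(project, network)
```

Each project ends up isolated on its own VLAN and subnet; reaching a VM from outside would still need the NAT or VPN step listed in the mode description.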

CloudStack Network Virtualization

Virtual Private Network for Each Account

CloudStack Network Virtualization

Detail Virtual Private Network in Node A and Node B

Hurdles in Network Virtualization

• L2 Network

  – Problem: Scalability, Performance, Security

  – Solution: VLAN (for Scalability and Security), RBridge (for Scalability and STP Limitation), L2 over L3

• Multi-tier Networking Design vs. Migration Limitation

  – Limitation of Spanning Tree Protocol

    • Keep Layer 2 networks relatively small and join them together via Layer 3 segments

    • But VMs cannot be live-migrated across the multi-tier networks

  – Port Consistency

    • Map settings such as VLAN, ACL, QoS, and security profiles to all the network ports

    • But some VMs are not able to meet required service levels

• L2 (Switching) and L3 (Routing) Networking Design

  – Scalability and Efficiency on the service provider side

    • Amazon EC2 using L3

      – 500,000 VMs on 60,000 physical machines

  – Legacy Support on the service consumer side

    • Amazon VPC, 3Tera AppLogic

      – Define virtual network topology

      – Select IP address range

      – Create public subnets and private subnets

      – Configure route table and network gateway
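The four consumer-side steps listed above can be walked through with plain address arithmetic. This sketch does not call any cloud provider's API; the `10.0.0.0/16` range, the gateway label, and the `next_hop` helper are assumptions used only to show how public and private subnets differ in their route tables.

```python
import ipaddress

# Select an IP address range for the virtual network (hypothetical choice).
vpc = ipaddress.ip_network("10.0.0.0/16")

# Create one public and one private subnet from that range.
subnets = vpc.subnets(new_prefix=24)
public_subnet = next(subnets)
private_subnet = next(subnets)

# Configure route tables: only the public subnet gets a default route to a gateway.
route_tables = {
    "public": {str(vpc): "local", "0.0.0.0/0": "internet-gateway"},
    "private": {str(vpc): "local"},
}


def next_hop(route_table, destination):
    """Longest-prefix match of a destination address against a route table."""
    address = ipaddress.ip_address(destination)
    matches = [
        ipaddress.ip_network(cidr)
        for cidr in route_table
        if address in ipaddress.ip_network(cidr)
    ]
    if not matches:
        return None                      # no route: traffic is dropped
    best = max(matches, key=lambda network: network.prefixlen)
    return route_table[str(best)]


print(public_subnet, private_subnet)
print(next_hop(route_tables["public"], "8.8.8.8"))
print(next_hop(route_tables["private"], "8.8.8.8"))
```

The private subnet can still reach anything inside the virtual network through the local route, but it has no path to the Internet unless a gateway route is added.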

Cloud Platform

Cloud Technologies

Anatomy of Cloud Technologies

API Server

• acts as the Web services front end for the cloud controller

Compute Controller

• provides compute server resources

Object Store

• provides storage services

Auth Manager

• provides authentication and authorization services

Volume Controller

• provides fast and permanent block-level storage for the compute servers

Cloud Controller

• represents the global state and interacts with all other components

Network Controller

• provides virtual networks to enable compute servers to interact with each other and with the public network

Scheduler

• selects the most suitable compute controller to host an instance

OpenStack Nova Architecture
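The scheduler's role can be sketched in a few lines. This is not Nova's scheduler: "most suitable" is defined here, as an assumption, to mean the host with the most free RAM among those that can fit the instance, and the `ComputeHost` type and `schedule` function are hypothetical names.

```python
from dataclasses import dataclass


@dataclass
class ComputeHost:
    name: str
    free_ram_mb: int
    free_vcpus: int


def schedule(hosts, ram_mb, vcpus):
    """Pick a host for a new instance: filter by capacity, then prefer the most free RAM."""
    candidates = [
        host for host in hosts
        if host.free_ram_mb >= ram_mb and host.free_vcpus >= vcpus
    ]
    if not candidates:
        return None  # no host can take the instance
    return max(candidates, key=lambda host: host.free_ram_mb)


hosts = [
    ComputeHost("compute-1", free_ram_mb=2048, free_vcpus=2),
    ComputeHost("compute-2", free_ram_mb=8192, free_vcpus=1),
    ComputeHost("compute-3", free_ram_mb=4096, free_vcpus=4),
]

chosen = schedule(hosts, ram_mb=2048, vcpus=2)
print(chosen.name if chosen else "no capacity")
```

The filter-then-rank shape leaves room for other policies, such as packing instances tightly or spreading them across hosts, by swapping the ranking key.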

[Figure: CloudStack deployment layout. Management servers (Mgmt. Server VMs on Host A and Host B) sit behind a load balancer. A Zone contains Pods, and a Pod contains Clusters of computing nodes (Host X, Host Y, Host Z) that run guest VMs with attached volumes, keep VM images on primary shared storage, support live migration, and can have nodes added dynamically. Secondary shared storage holds templates, ISO images, and snapshots of VM images (copy, create, boot, attach).]

• Max 6 volumes per guest VM

• Max 16 computing nodes per cluster; computing nodes in a cluster should be in the same subnet and homogeneous

• Computing nodes in a pod should be in the same subnet; there is no limit on the number of nodes

CloudStack Architecture

Conclusion

Top 10 Cloud Obstacles and Opportunities

• A View of Cloud Computing, ACM, April, 2010

2011 Predictions of IaaS, PaaS, and NoSQL

• IaaS Prediction

• Hybrid is the way to go: The public-private cloud discussion isn’t relevant anymore

• OpenStack will dominate the open IaaS offering

• PaaS Prediction

• 2011 is the year of PaaS

• CloudFoundry – VMware

• OpenShift – Red Hat

• A new PaaS category will emerge – Building your own PaaS

• CEAP (Cloud Enabled Application Platform) is being specifically designed to handle multi-tenancy, scalability, and on-demand provisioning, but not a higher degree of flexibility and control

• Application servers will change their name to PaaS – But won’t change their stripes

• VMForce will fail to deliver on its promise => Already open Cloud

• NoSQL (+Big Data) predictions

• NoSQL will become compatible with SQL

• More applications will run entirely In-Memory

• Real-time/stream-based big data analytics will replace the majority of MapReduce batch processing

• e.g., Yahoo S4, Google’s Percolator

written by Nati Shalom at Gigaspaces http://natishalom.typepad.com/nati_shaloms_blog/2010/12/2011-cloud-paas-nosql-predictions.html

Thank you.