Intro to Cloud Computing


Transcript of Intro to Cloud Computing

Page 1: Intro to Cloud Computing

Intro to Cloud Computing

Page 2: Intro to Cloud Computing

Source: http://www.free-pictures-photos.com/

Page 3: Intro to Cloud Computing

Cloud Computing
• No longer the next big thing – the current big thing
  – Began in 2007 – IBM and Google “Blue Cloud”
  – The name “cloud” was inspired by the cloud symbol representing the internet in diagrams

Page 4: Intro to Cloud Computing

What is Cloud Computing?

• But what is it?
• Everyone has a different opinion on what it is
• Is it trendy?
• “The computer industry is the only industry that is more fashion-driven than women’s fashion” – Larry Ellison

Page 5: Intro to Cloud Computing

Questions to answer

• What clouds have you used today (yesterday)?

• What is a cloud?

Page 6: Intro to Cloud Computing

Applications

• What does cloud computing actually do?
  – Consider applications you may currently be running on a laptop, desktop, phone, or server
  – The cloud has them also, or can potentially bring them to you
  – Brings applications; views, manipulates, and shares data

Page 7: Intro to Cloud Computing

Cloud Computing

• Everyone has an opinion on what to use a cloud for
  – Applications on the internet – email, tax prep
  – Storage for business, personal data
  – Web services for photos, maps, GPS
  – Rent a virtual server, load software on it, turn it on/off, clone it if there is sudden workload demand
  – Store, secure data for authorized access (really?)
  – Use a platform including OS, Apache, MySQL, Python, PHP

Page 8: Intro to Cloud Computing

Cloud Computing Characteristics

• So what are its characteristics?
• AKA on-demand computing, pay as you go, software as a service, utility computing
• Typically accessed through the internet
• Distributed and highly parallel approach
• Usually costs $$$, but cost-effective
• Virtualization
• Elastic
• Replication, replication, replication …

Page 9: Intro to Cloud Computing

Cloud Components

• 3 components
  – Clients
    • Mobile, thin, thick
  – Datacenter
  – Distributed servers

Page 10: Intro to Cloud Computing

Data Center

• Data Center
  – Collection of servers
  – In a large room in your building, or across the world
• Distributed Servers
  – Distributed data centers
    • Geographically disparate
    • Robust if failure
    • Dynamic datacenter, so can increase as needed

Page 11: Intro to Cloud Computing

Clouds

• Allow access to applications that are not on the local computer or internet-connected device
• Instead, a company hosts your application – advantages?
  – No more licenses, service packs, etc.
  – Less hardware, etc.
  – Can access anywhere, but
  – Works only as long as you have an internet connection
  – Lose control – can’t optimize

Page 12: Intro to Cloud Computing

Cloud Computing Characteristics

• Cost-effective – a start-up company can use a cloud instead of buying computers, hiring IT people, etc.

• Elastic Computing – when a company has a temporary surge in business, it can use a cloud instead of investing in new computing equipment

Page 13: Intro to Cloud Computing

Virtualization
• What is virtualization?
  – Software implementation of a computer that executes programs like a physical machine
  – Installation of one machine runs on another
  – All software in the cloud runs on a server within a virtual machine
  – AMD Virtualization (AMD-V) and Intel Virtualization Technology (Intel VT) extensions made it doable

Page 14: Intro to Cloud Computing

Virtualization
• Virtual Machine (VM)
  – Isolated guest OS installation within a normal host OS
  – Object of deployment
• Virtual Machine Image
  – Static data containing software (OS, apps, data files) the VM will run once started
  – Used to create a VM instance
  – Typically stored on disk
• Virtual Machine Instance
  – Running virtual machine
  – Started from an image; runs OS and processes, computes, etc.
  – Dynamic object you can interact with
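The image/instance distinction can be sketched in a few lines of Python. This is a toy model, not a real hypervisor or cloud API: the class names `VMImage` and `VMInstance`, their fields, and the sample software stack are all assumptions made for illustration.

```python
from dataclasses import dataclass, field
from itertools import count

_ids = count(1)  # hypothetical instance-id generator


@dataclass(frozen=True)
class VMImage:
    """Static data on disk (OS, apps, data files); frozen because it never changes."""
    name: str
    software: tuple


@dataclass
class VMInstance:
    """A running VM started from an image; its state changes over time."""
    image: VMImage
    instance_id: int = field(default_factory=lambda: next(_ids))
    running: bool = True
    processes: list = field(default_factory=list)

    def run(self, program: str) -> None:
        if not self.running:
            raise RuntimeError("instance is stopped")
        self.processes.append(program)

    def stop(self) -> None:
        self.running = False
        self.processes.clear()


image = VMImage("web-server", ("Linux", "Apache", "MySQL"))
a = VMInstance(image)
b = VMInstance(image)  # a second instance cloned from the same image
a.run("httpd")         # only instance a changes; the image and b are untouched
```

One image can back any number of instances, which is what makes "clone it if sudden workload demand" cheap.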

Page 15: Intro to Cloud Computing

Virtualization
• Hypervisor – Virtual Machine Manager (VMM)
  • One level higher than supervisory program
  • Installed on server hardware
• Easily create copies of existing environments
• Can exist on same servers or different machines
• Single server, multiple OS instances; minimizes CPU idle time

[Figure: Traditional Stack (Hardware → Operating System → Apps) vs. Virtualized Stack (Hardware → Hypervisor → multiple OSes → Apps)]

Page 16: Intro to Cloud Computing

Elastic - Cloud Computing Characteristics

• Use what you need
  – Hardware, platform (OS), software
• Cloud infrastructure used depends on application
  – Massive number of servers needed
    OR
  – Only need one server to run a small job
• Company has a temporary surge in business: use the cloud instead of investing in new computing equipment
• Company has a decline in business: don’t have to maintain unused equipment
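A minimal sketch of the elastic idea: size the fleet to the current load instead of to the peak. The function name `servers_needed`, the per-server capacity of 100 requests per second, and the load figures are hypothetical values chosen for illustration, not measurements.

```python
import math


def servers_needed(requests_per_sec: float, capacity_per_server: float,
                   min_servers: int = 1) -> int:
    """Smallest number of servers that can absorb the current load."""
    if capacity_per_server <= 0:
        raise ValueError("capacity_per_server must be positive")
    return max(min_servers, math.ceil(requests_per_sec / capacity_per_server))


# Hypothetical load samples over a day: quiet, surge, decline.
load = [40, 55, 900, 1200, 300, 35]
fleet = [servers_needed(r, capacity_per_server=100) for r in load]
print(fleet)  # [1, 1, 9, 12, 3, 1]
```

A company that bought for the peak would own 12 servers all day; an elastic deployment runs 1 for most of it.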

Page 17: Intro to Cloud Computing

Cloud Computing Characteristics
• Redundancy
  – Redundancy is the key to the success of clouds
  – Google approach – cheap components that fail, so replicate all processing and storage
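Why replication makes cheap, failure-prone components acceptable can be shown with simple arithmetic. The sketch below assumes replicas fail independently (a strong simplification), and the 5% failure probability is a made-up figure; the function names are hypothetical.

```python
def prob_all_fail(p_fail: float, replicas: int) -> float:
    """Probability that every replica is down at once, assuming independent failures."""
    if not 0.0 < p_fail < 1.0:
        raise ValueError("p_fail must be between 0 and 1")
    if replicas < 1:
        raise ValueError("need at least one replica")
    return p_fail ** replicas


def replicas_needed(p_fail: float, target: float) -> int:
    """Fewest replicas whose joint failure probability is at most `target`."""
    if target <= 0.0:
        raise ValueError("target must be positive")
    n = 1
    while prob_all_fail(p_fail, n) > target:
        n += 1
    return n


# A cheap component that is down 5% of the time (assumed figure):
print(replicas_needed(0.05, 1e-6))  # 5
```

Each added replica multiplies the chance of total loss by the single-component failure probability, so a handful of unreliable machines can beat one expensive one.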

Page 18: Intro to Cloud Computing

What Motivated Cloud Computing

Initial motivation:
  – Web-scale problems – data intensive

Solutions:
  – Large data centers

How to access:
  – Highly interactive Web applications (thin client)

Next step:
  – Different models of computing

Page 19: Intro to Cloud Computing

Data Intensive - How much data?
• CERN’s LHC will generate 15 PB a year
• Facebook – 2.5 PB, growing at 15 TB per day in 2012?
• 25 TB – 1000 times the volume of mail delivered by USPS
• Sloan Digital Sky Survey – 0.5 PB/month in 2015
• “All words ever spoken by human beings”
  • ~5 EB (1 EB = 10^18 bytes)

Page 20: Intro to Cloud Computing

Solution: Large Data Centers
• Although Google is famous for innovating web searching, Google’s architecture is as much a revolution
  – Instead of a few expensive servers, use many cheap servers ($5000 instead of $100,000)
    • 1/2M servers in ~12 locations
• With thin, wide network
• Cloud – robust and self-healing
  – Uses a lot of power
    • Need cheaper power solutions

Page 21: Intro to Cloud Computing

The Result: Different Computing Model

Software-as-a-Service (SaaS)

Infrastructure-as-a-Service (IaaS)

Platform-as-a-Service (PaaS)

“Why do it yourself if you can pay someone to do it for you?”

Page 22: Intro to Cloud Computing

IaaS

• Infrastructure as a Service (IaaS) – aka Hardware as a Service (HaaS) and utility computing
  – Why buy machines when you can rent cycles?
  – Utility computing billing – based on what is used
  – Provides basic storage and compute capabilities as a service
    • Servers, storage systems, CPU cycles, switches, routers, etc.
    • Ex: Amazon’s EC2
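"Why buy machines when you can rent cycles?" comes down to a break-even calculation. The sketch below is deliberately crude: the hourly rate and purchase price are hypothetical, and it ignores power, staff, and depreciation, all of which push the real comparison further toward renting for light use.

```python
def rental_cost(hours_used: float, rate_per_hour: float) -> float:
    """Utility billing: pay only for the hours actually used."""
    return hours_used * rate_per_hour


def break_even_hours(purchase_price: float, rate_per_hour: float) -> float:
    """Hours of use at which renting has cost as much as buying outright."""
    return purchase_price / rate_per_hour


RATE = 0.25      # hypothetical rental price, dollars per server-hour
PRICE = 5000.0   # hypothetical purchase price of one server, dollars

print(rental_cost(200, RATE))          # 50.0
print(break_even_hours(PRICE, RATE))   # 20000.0
```

Under these assumed numbers, a job that needs 200 server-hours costs $50 to rent, and a server would have to be busy for 20,000 hours (over two years of continuous use) before buying it pays off.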

Page 23: Intro to Cloud Computing

IaaS
• Does not provide applications to customers (SaaS and PaaS do)
• Saves cost of purchasing
• Infrastructure can be scaled up or down
• Multiple tenants can use equipment at the same time
• Device independence – access systems on different hardware
• Low barriers to entry, example?
  – e.g. Samba

Page 24: Intro to Cloud Computing

PaaS

• Platform as a Service (PaaS), aka cloudware
  – Supplies all resources needed to build apps and services without having to download or install software
  – Provides a computing platform and solution stack
  – Customer interacts with platform through API
  – Layer of software encapsulated and provided as a service to build higher-level services
  – Ex: Google App Engine

Page 25: Intro to Cloud Computing

PaaS provides

• Development teams across the world can work together
• Merge web services from multiple sources
• Cost savings from using built-in security, scalability and failover
• Cost savings from using higher-level programming abstractions

Page 26: Intro to Cloud Computing

SaaS

• Software as a Service (SaaS) – web-based applications
  – Software available on cloud for use
  – Application hosted as a service to customers who access it via the internet
  – Single instance runs and services multiple end users
  – Ex: salesforce.com, Gmail

Page 27: Intro to Cloud Computing

SaaS

• Pros/Cons
  – Customer doesn’t have to maintain or support SW
  – Out of customer’s hands when hosting service changes it
  – Use software out of the box
  – Instead of just paying for it once, billed on an ongoing basis
  – Don’t have to pay as much up front; cheaper, more reliable
  – Security (SSL used), don’t need VPNs (virtual private networks) on back-end

Page 28: Intro to Cloud Computing

Benefits to SaaS
• Everyone knows WWW, little training needed
• Smaller IT staff needed
• Easier to customize
• Better marketing by providers, accommodate more
• Security (SSL used), don’t need VPNs (virtual private networks) on back-end
• But:
  • Specific computational need not addressed – may have to buy own
  • Lock-in – can’t move to new vendor without penalty

Page 29: Intro to Cloud Computing

Future of SaaS

• Move all processing power to the cloud and carry ultralight input device
  – Already happening?
    • E-mail
    • Google Docs
    • Implications for Microsoft, software as purchasable local application
  – Windows Live (Microsoft’s cloud)
  – Adobe web-based Photoshop

Page 30: Intro to Cloud Computing

IaaS, PaaS, SaaS

Page 31: Intro to Cloud Computing

When not to use a Cloud

• Legislative Issues
  – Laws and policy allow freer access to data on a cloud than on a private server
    • FBI can access data without warrant or owner’s consent
• Geopolitical concerns
  – If in Canada, cannot store data on U.S. cloud – Why?
    • (because of the Patriot Act…)
  – What about storing your data on clouds outside of USA?

Page 32: Intro to Cloud Computing

Types of Clouds

• Public, Private, Hybrid Clouds
• Names do not necessarily dictate location
• Type may depend on whether temporary or permanent

Page 33: Intro to Cloud Computing

Data Bases in Cloud Environments

Based on: Md. Ashfakul Islam
Department of Computer Science
The University of Alabama

Page 34: Intro to Cloud Computing

Issues to Consider

• Distributed or centralized application?
• How can ACID guarantees be maintained?
• CAP theorem
  – Consistency, Availability, Partition tolerance
  – Data availability (even if network partition) is achieved by compromising consistency
  – Traditional consistency techniques become obsolete
• Consistency becomes bottleneck of data management deployment in cloud
  – Costly to maintain
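The availability-versus-consistency trade-off can be made concrete with a toy two-replica store that keeps accepting writes during a network partition. This is a sketch under stated assumptions, not any vendor's design: the class names `Replica` and `AvailableStore`, the single global version counter, and the "last writer wins" rule are all illustrative choices.

```python
class Replica:
    def __init__(self, name):
        self.name = name
        self.data = {}  # key -> (version, value)

    def write(self, key, value, version):
        current_version, _ = self.data.get(key, (0, None))
        if version > current_version:  # last writer wins, by version number
            self.data[key] = (version, value)

    def read(self, key):
        return self.data.get(key, (0, None))[1]


class AvailableStore:
    """Always accepts writes; replica B catches up later (availability over consistency)."""

    def __init__(self):
        self.a, self.b = Replica("A"), Replica("B")
        self.partitioned = False
        self.pending = []  # updates not yet delivered to replica B
        self.version = 0

    def write(self, key, value):
        self.version += 1
        self.a.write(key, value, self.version)
        if self.partitioned:
            self.pending.append((key, value, self.version))
        else:
            self.b.write(key, value, self.version)

    def heal(self):
        """Partition ends: deliver the queued updates so the replicas converge."""
        self.partitioned = False
        for update in self.pending:
            self.b.write(*update)
        self.pending.clear()


store = AvailableStore()
store.write("balance", 100)
store.partitioned = True
store.write("balance", 80)             # still accepted during the partition
stale = store.b.read("balance")        # replica B has not seen the update yet
store.heal()
converged = store.b.read("balance")    # replicas agree again
```

Between the partition and the heal, a client reading from replica B sees the old balance. That window of staleness is the consistency given up to stay available.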

Page 35: Intro to Cloud Computing

Analytical DBs - Data Warehousing

• Data Warehousing (DW) – popular application of Hadoop
• Typically DW is relational (OLAP)
  – but also semi-structured, unstructured data
• Can also be parallel DBs (Teradata)
  – Column oriented
  – Can be expensive, e.g. TBs of data
• Hadoop for DW
  – Facebook abandoned Oracle for Hadoop (Hive)
  – Also Pig – for semi-structured data

Page 36: Intro to Cloud Computing

Evaluation of Analytical DB
• Analytical DB handles historical data with few or no updates – no ACID properties needed.
• Elasticity
  – Since no ACID – easier
    • E.g. no updates, so locking not needed
  – A number of commercial products support elasticity.
• Security
  – Requires sensitive and detailed data
  – Third-party vendor stores data
  – Potential risk of data leakage and privacy violation
• Replication
  – Recent snapshot of DB serves purpose.
  – Strong consistency isn’t required.

Page 37: Intro to Cloud Computing

Transactional Data Management

Needed because:
• Transactional Data Management
  – Heart of database industry
  – Almost all financial transactions are conducted through it
  – Relies on ACID guarantees
• ACID properties are the main challenge in transactional DM deployment in the cloud.

Page 38: Intro to Cloud Computing

Relational Joins

• Hadoop is not a DB
• Debate between parallel DBs and MapReduce (MR) for OLAP
  – DeWitt/Stonebraker call MR a “step backwards”
  – Parallel DBs faster because they can create indexes

Page 39: Intro to Cloud Computing

Consistency in Clouds

• A consistent database must remain consistent after execution of successful operations.
• Inconsistency may cause problems
• Consistency is often sacrificed to achieve availability and scalability.
• Strong consistency maintenance in cloud is very costly.

Page 40: Intro to Cloud Computing

DBs in the Cloud

• Slow start for DBs – why?
• Considered “Scalable Transactions for Web Applications in the Cloud”
• Two important properties of Web applications
  – All transactions are short-lived
  – Data request can be responded to with a small set of well-identified data items
• Eventual consistency acceptable

Page 41: Intro to Cloud Computing

Cloud Provider DB Options

Page 42: Intro to Cloud Computing

Windows Azure

Page 43: Intro to Cloud Computing
Page 44: Intro to Cloud Computing
Page 45: Intro to Cloud Computing

Data Management

• Can run SQL Server or another DBMS in a VM created with Azure Virtual Machines

• Free to run NoSQL technologies such as MongoDB and Cassandra

• Running your own database system is straightforward, but it also requires handling the administration of that DBMS

Page 46: Intro to Cloud Computing

Data Management Options

• Figure 3: For data management, Windows Azure provides relational storage, scalable NoSQL tables, and unstructured binary storage.

Page 47: Intro to Cloud Computing

Data Management Options

• Each of the three options addresses a different need:
  – Relational storage
  – Fast access to potentially large amounts of simple typed data
  – Unstructured binary storage

• In all cases, data is automatically replicated across three different computers in an Azure datacenter

• All three options can be accessed either by Windows Azure applications or by applications running elsewhere, such as an on-premises datacenter, a laptop, or phone.

Page 48: Intro to Cloud Computing

Relational Storage – SQL Database

• Provides all of the key features of a relational database management system, including
  – Atomic transactions, concurrent data access by multiple users with data integrity, ANSI SQL queries, and a familiar programming model.
  – If you know SQL Server, using SQL Database is straightforward.
  – Can be accessed using Entity Framework, ADO.NET, JDBC

Page 49: Intro to Cloud Computing

SQL Database

• But SQL Database isn't just a DBMS in the cloud – it's a PaaS service.
• You control your data and who can access it, and SQL Database takes care of the administrative grunt work
  – such as managing the hardware infrastructure and automatically keeping the database and operating system software up to date.
• SQL Database provides a federation option that distributes data across multiple servers.
  – Spreads data access requests across multiple servers for better performance.
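The general idea of distributing data across servers can be sketched with hash partitioning. This is only an illustration of the concept: it is not a description of how SQL Database federation assigns rows, and the function names `server_for` and `distribute`, the key format, and the server count are assumptions.

```python
import hashlib
from collections import defaultdict


def server_for(key: str, num_servers: int) -> int:
    """Map a key to a server index with a stable hash (same key, same server)."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_servers


def distribute(keys, num_servers):
    """Group keys by the server that would store them."""
    placement = defaultdict(list)
    for key in keys:
        placement[server_for(key, num_servers)].append(key)
    return dict(placement)


customers = [f"customer-{i}" for i in range(1000)]  # hypothetical keys
placement = distribute(customers, num_servers=4)
for server, keys in sorted(placement.items()):
    print(server, len(keys))  # each server holds a share of the keys
```

Because a request for one customer touches only the server that owns that key, requests for different customers spread across the servers instead of queuing on one.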

Page 50: Intro to Cloud Computing

Tables

• For an application that needs fast access to lots of typed data, but doesn't need to perform complex SQL queries
• For storing data and retrieving it in simple ways
• NOT relational
• Very scalable: a single table can hold as much as a terabyte of data

Page 51: Intro to Cloud Computing

Blobs

• Designed to store unstructured binary data.
• Like Tables, Blobs provide inexpensive storage
• A single blob can be as large as one terabyte
• Application sees ordinary Windows files, but the contents are stored in a blob

Page 52: Intro to Cloud Computing

Amazon

Page 53: Intro to Cloud Computing

Amazon
• Simple Storage Service (S3)
  – Low-level put/get interface
  – Store items up to 5 GB
• AWS MySQL – traditional model (non-cloud) on EC2
• AWS MySQL/R – durability of the data guaranteed by the replication architecture
  – Application server maintains a connection to the Master copy and a connection to one DB server
  – Update transactions handled by Master
  – Read-only transactions issued to DB server associated with application server
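The read/write routing described for the replicated MySQL setup can be sketched as follows. The classes are toy stand-ins (no real MySQL connection is made), replication is applied immediately for simplicity, and every name here is hypothetical.

```python
class Database:
    def __init__(self, name):
        self.name = name
        self.rows = {}


class ReplicatedCluster:
    """One master plus read replicas; the master pushes every update to the replicas."""

    def __init__(self, num_replicas):
        self.master = Database("master")
        self.replicas = [Database(f"replica-{i}") for i in range(num_replicas)]

    def update(self, key, value):
        self.master.rows[key] = value
        for replica in self.replicas:  # simplification: replicate synchronously
            replica.rows[key] = value
        return self.master.name


class AppServer:
    """Holds a connection to the master and to exactly one replica."""

    def __init__(self, cluster, replica_index):
        self.cluster = cluster
        self.replica = cluster.replicas[replica_index]

    def write(self, key, value):
        return self.cluster.update(key, value)  # update transactions go to the master

    def read(self, key):
        return self.replica.name, self.replica.rows.get(key)  # reads go to own replica


cluster = ReplicatedCluster(num_replicas=2)
app0, app1 = AppServer(cluster, 0), AppServer(cluster, 1)
handled_by = app0.write("order-1", "paid")
seen_by_app1 = app1.read("order-1")
```

Writes funnel through one master, so the master limits update throughput, while read capacity grows with each added replica.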

Page 54: Intro to Cloud Computing

Amazon
• AWS RDS – relational database service, implements same as AWS MySQL
  – RDS is pre-packaged, so users don’t have to worry about managing deployment of VMs, SW upgrades, etc.
• AWS SimpleDB – retrieve records based on key values or ranges on primary and secondary keys
  – Does not synchronize concurrent read/write access to different copies of same data
  – Web service for running queries on structured data
  – Eventual data consistency is maintained
  – Does not support SQL
  – Works with S3 and EC2 to store, process, query

Page 55: Intro to Cloud Computing

Google

Page 56: Intro to Cloud Computing

Google - App Engine (Megastore)

• Google has PaaS strategy
• App Engine uses the data engine Megastore
  – Scalable structured data store
  – Built on BigTable
  – Partitioned into space of small DBs, each with own log
    • Log stored across Paxos cluster (Paxos – protocol for solving consensus in an unreliable network)
    • Full ACID semantics within partitions
  – Adopted a combined partitioning and replication architecture
  – Lower consistency across partitions
  – 3B write, 20B read transactions per day as of 1/11
  – Tables can be arranged hierarchically
  – Support for secondary indexes

Page 57: Intro to Cloud Computing

Google - App Engine (Megastore)
  – 3 levels of read consistency
    • Current – last committed value
    • Snapshot – value as of start of read transaction
    • Inconsistent reads – used for cross entity group reads
  – Updates within entity group
    • Write updates to WAL (write-ahead log) of entity group, then apply to data
    • Limited by log contention: one winner, one loser
      – Paxos accepts limited update rate (10**2 per sec)
  – Across entity groups
    • 2PC (two-phase commit)
  – Support for backup and recovery
    • Synchronous replication, snapshots and incremental log backups
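The "one winner, one loser" behaviour of the entity group log can be sketched as an optimistic append: two writers read the same next log position, and only the first append to that position succeeds. In Megastore the winner is decided by Paxos across replicas; here a local position check stands in for that, and the class and method names are hypothetical.

```python
class EntityGroupLog:
    """Toy write-ahead log for a single entity group."""

    def __init__(self):
        self.entries = []

    def next_position(self):
        return len(self.entries)

    def try_append(self, position, mutation):
        """Accept the write only if nobody else has claimed this log position."""
        if position != len(self.entries):
            return False  # lost the race; the caller must re-read and retry
        self.entries.append(mutation)
        return True


log = EntityGroupLog()
pos = log.next_position()                        # both writers read the same position
first = log.try_append(pos, "txn-1: set x=1")    # winner
second = log.try_append(pos, "txn-2: set x=2")   # loser: position already taken
retry = log.try_append(log.next_position(), "txn-2: set x=2")
```

Since every committed write to an entity group must claim the next log slot, writes to one group are serialized, which is why the per-group update rate is limited.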

Page 58: Intro to Cloud Computing

Google - App Engine

• AppEngine supports Python, Java with embedded SQL
  – Used to support simplified SQL dialect, GQL
  – GQL – no aggregate functions or joins

Page 59: Intro to Cloud Computing

                    | AWS MySQL       | AWS MySQL/R     | AWS RDS         | AWS SimpleDB         | AWS S3               | Google AppEng      | MS Azure
Business Model      | IaaS            | IaaS            | PaaS            | PaaS                 | IaaS                 | PaaS               | PaaS
Cloud Provider      | Flexible        | Flexible        | Amazon          | Amazon               | Flexible             | Google             | Microsoft
Web/app server      | Tomcat          | Tomcat          | Tomcat          | Tomcat               | Tomcat               | AppEngine          | .Net Azure
Database            | MySQL           | MySQL Rep       | MySQL           | SimpleDB             | none                 | DataStore          | SQL Azure
Storage / File Sys. | EBS             | EC2 & EBS       | -               | -                    | S3                   | GFS                | Windows Azure
Consistency         | Repeatable Read | Repeatable Read | Repeatable Read | Eventual Consistency | Eventual Consistency | Snapshot Isolation | Snapshot Isolation
App-Language        | Java            | Java            | Java            | Java                 | Java                 | Java/AppEngine     | C#
DB-Language         | SQL             | SQL             | SQL             | SimpleDB Queries     | low-level API        | GQL                | SQL
Architecture        | Classic         | Replication     | Classic         | Part.+Repl.          | Distr. Control       | Part.+Repl.(+C)    | Replication
HW Config.          | manual          | manual          | manual          | manual/automatic     | manual               | automatic          | manual/automatic

Table 1: Overview of Cloud Services

Page 61: Intro to Cloud Computing

Cloud SQL

• Google Cloud SQL
  – Available – one of App Engine’s most requested features
    • Simple way to develop traditional DB-driven applications
  – Quicker path to jump off App Engine platform
  – DB import/export, so can move existing MySQL DBs to cloud
  – Support for both Java JDBC and Python DB-API connections, so less code change required
  – No support for PHP on AppEngine; can put PHP apps in cloud using Quercus

Page 62: Intro to Cloud Computing

Google - Spanner
• Previous complaints – no cross-row transactions
• 2PC too expensive to support because of performance or availability problems
• What is Spanner?
• A huge semi-relational database
  – Built on top of Colossus (GFS2)
  – Seriously, it's huge!
  – Scales up to millions of machines
  – Shards across multiple data centers
  – Data centers across multiple continents
  – Lock-free reads
  – Externally consistent writes (transactions)
  – Relational schema
  – SQL-like query language
  – Reasonable performance

Page 63: Intro to Cloud Computing

Google - Spanner

• A Layered System
  – Relational
  – Key-Value
  – Paxos TrueTime Colossus

• Google says that the biggest new idea is the TrueTime API

Page 64: Intro to Cloud Computing

Google - Spanner
• A table must have a primary key (ordered set of columns)
• A table must be marked as a directory or be interleaved in a parent table
• Interleaved data is actually attached to a row in the parent table
• Data is actually stored as key-values (heterogeneous/interleaved)
• ON DELETE CASCADE means to delete when parent row is deleted
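Interleaving can be pictured as child rows sharing the parent's key prefix, so that a sorted key-value store places them directly after the parent row. The sketch below is a toy model, not Spanner's storage format: the `Users`/`Albums` tables, the tuple keys, and the function names are assumptions chosen for illustration.

```python
store = {}  # key tuple -> row; sorting the keys gives the physical order


def put_user(user_id, name):
    store[(user_id,)] = {"table": "Users", "name": name}


def put_album(user_id, album_id, title):
    # The child key starts with the parent key, so it sorts right after the parent row.
    store[(user_id, album_id)] = {"table": "Albums", "title": title}


def delete_user_cascade(user_id):
    """ON DELETE CASCADE: removing the parent also removes its interleaved rows."""
    for key in [k for k in store if k[0] == user_id]:
        del store[key]


def scan():
    return sorted(store)


put_user(2, "bob")
put_album(1, 2, "second album")
put_user(1, "alice")
put_album(2, 1, "bob's album")
put_album(1, 1, "first album")
layout = scan()             # parent rows followed by their children
delete_user_cascade(1)
after_delete = scan()
```

Keeping a parent and its children adjacent means a read of "a user and all their albums" touches one contiguous key range rather than two separate tables.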

Page 65: Intro to Cloud Computing

Google - Spanner

• Lock-free Read
• Lock-free reads using timestamps
• Read Transaction System uses latest non-blocking timestamp
• Special non-blocking write transaction
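Reading by timestamp without locks can be sketched with a multi-version store: each write adds a new version, and a read at timestamp t returns the newest version written at or before t. This is a generic multi-version sketch, not Spanner's implementation; plain integers stand in for the timestamps that Spanner obtains from TrueTime, and the class name is hypothetical.

```python
import bisect


class MultiVersionStore:
    def __init__(self):
        self.timestamps = {}  # key -> sorted list of commit timestamps
        self.values = {}      # key -> values, parallel to timestamps

    def write(self, key, value, timestamp):
        ts = self.timestamps.setdefault(key, [])
        vals = self.values.setdefault(key, [])
        i = bisect.bisect_right(ts, timestamp)
        ts.insert(i, timestamp)   # older versions are kept, never overwritten
        vals.insert(i, value)

    def read(self, key, timestamp):
        """Newest value committed at or before `timestamp`; takes no lock."""
        ts = self.timestamps.get(key, [])
        i = bisect.bisect_right(ts, timestamp)
        return self.values[key][i - 1] if i else None


db = MultiVersionStore()
db.write("x", "a", timestamp=10)
db.write("x", "b", timestamp=20)
snapshot_before = db.read("x", 15)
db.write("x", "c", timestamp=30)   # a later write does not disturb the old snapshot
snapshot_after = db.read("x", 15)
```

Because a reader at timestamp 15 only ever looks at versions up to 15, later writers cannot change what it sees, so readers and writers never need to block each other.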