
Infrastructure for Cloud Computing

Dahai Li

2008/06/12

Agenda

• About Cloud Computing

• Tools for Cloud Computing in Google

• Google’s partnerships with universities

What’s new?


Advantages

• Data safety and reliability

• Data synchronization between different devices

• Low requirements on the end device

• Unlimited potential of the cloud

Google Cloud: cloud for the end user

Google Cloud: cloud for the web developer (APIs)

Example: Earthquake map based on the Maps API


Agenda

• About Cloud Computing

• Tools for Cloud Computing in Google

• Google’s partnerships with universities

google.stanford.edu (circa 1997)

google.com (1999)

Google Data Center (circa 2000)

Google File System (GFS)


Why GFS?

• Google has unusual requirements

• Unfair advantage

• Fun and challenging to build large-scale systems


GFS Architecture

[Diagram: many GFS clients contact the GFS master (with replica masters) for metadata, and exchange chunk data directly with chunkservers; each chunk (C0, C1, C2, C3, C5, …) is replicated across several chunkservers (Chunkserver 1, Chunkserver 2, …, Chunkserver N).]

Master

• Maintain Metadata:

– File namespace

– Access control info

– Maps files to chunks

• Control system activities:

– Monitor state of chunkservers

– Chunk allocation and placement

– Initiate chunk recovery and rebalancing

– Garbage collect dead chunks

– Collect and display stats; admin functions
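As a rough picture of what that metadata might look like in memory, here is a minimal Python sketch; the class and field names (ChunkInfo, FileInfo, MasterState) are hypothetical illustrations, not the actual GFS data structures.

from dataclasses import dataclass, field
from typing import Dict, List

# Hypothetical sketch of a GFS-like master's in-memory metadata.
@dataclass
class ChunkInfo:
    handle: int                                             # globally unique chunk handle
    locations: List[str] = field(default_factory=list)      # chunkservers holding replicas
    version: int = 0                                         # helps detect stale replicas

@dataclass
class FileInfo:
    owner: str                                               # access-control info (simplified)
    chunk_handles: List[int] = field(default_factory=list)  # file -> ordered list of chunks

@dataclass
class MasterState:
    namespace: Dict[str, FileInfo] = field(default_factory=dict)  # path -> file metadata
    chunks: Dict[int, ChunkInfo] = field(default_factory=dict)    # handle -> chunk metadata

    def chunk_for(self, path: str, chunk_index: int) -> ChunkInfo:
        """Map (file, chunk index) to chunk metadata, as the read protocol requires."""
        handle = self.namespace[path].chunk_handles[chunk_index]
        return self.chunks[handle]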

Client

• Protocol implemented by the client library

• Read protocol (a sketch follows below)
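To make the read path concrete, here is a minimal Python sketch of how a GFS-style read could proceed, matching the architecture above; the master.lookup RPC, pick_closest helper, and read_chunk call are assumptions for illustration, not the real client library.

CHUNK_SIZE = 64 * 1024 * 1024   # GFS files are split into fixed-size 64 MB chunks

def gfs_read(master, filename, offset, length):
    """Sketch of a GFS-style read (single chunk, no caching or retries)."""
    chunk_index = offset // CHUNK_SIZE
    # 1. Ask the master which chunk this is and where its replicas live.
    #    master.lookup is a hypothetical RPC; real clients cache its result.
    handle, replicas = master.lookup(filename, chunk_index)
    # 2. Fetch the bytes directly from one replica; the master never
    #    sits on the data path.
    chunkserver = pick_closest(replicas)
    return chunkserver.read_chunk(handle, offset % CHUNK_SIZE, length)

def pick_closest(replicas):
    # Placeholder policy: a real client prefers a nearby replica
    # (same rack or switch); here we simply take the first one.
    return replicas[0]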


GFS Usage in Google Cloud

• 50+ clusters

• Filesystem clusters of up to 1000+ machines

• Pools of 1000+ clients

• 10+ GB/s read/write load

– in the presence of frequent hardware failures


MapReduce


What’s MapReduce

• A simple programming model that applies to many large-scale computing problems

• Hide messy details in MapReduce runtime library


Typical problem solved by MapReduce

• Read a lot of data

• Map: extract something you care about from each record

• Shuffle and Sort

• Reduce: aggregate, summarize, filter, or transform

• Write the results


More specifically…

• Programmer specifies two primary methods:

– map(k, v) → <k', v'>*

– reduce(k', <v'>*) → <k', v'>*

• All v' with the same k' are reduced together, in order.
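Written as rough Python type signatures (an illustration only, assuming keys and values are strings as in the word-count example that follows), the two methods look like this:

from typing import Callable, Iterable, Iterator, Tuple

# map(k, v) -> <k', v'>*: one input record in, zero or more intermediate pairs out.
MapFn = Callable[[str, str], Iterator[Tuple[str, str]]]

# reduce(k', <v'>*) -> <k', v'>*: all intermediate values for one key in,
# zero or more output values out (usually exactly one), re-paired with the key.
ReduceFn = Callable[[str, Iterable[str]], Iterator[str]]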


Example: Word Frequencies in Web Pages

• Input is files with one document per record

• Specify a map function that takes a key/value pair

– key = document URL

– value = document contents

• Output of map function is (potentially many) key/value pairs.

– In our case, output (word, “1”) once per word in the document


– Example input record: <“网页1”, “是也不是”> (“web page 1”, whose contents are “是也不是”)

– Map output: <“是”, “1”> <“也”, “1”> <“不”, “1”> …

Continued: Word Frequencies in Web Pages

• The MapReduce library gathers together all pairs with the same key (shuffle/sort):

– key = “是”, values = “1”, “1”

– key = “也”, values = “1”

– key = “不”, values = “1”

• The reduce function combines the values for a key; in our case, it computes the sum:

– <“是”, “2”> <“也”, “1”> <“不”, “1”>

• Output of reduce (usually 0 or 1 value) is paired with the key and saved

Example: Pseudo-code

Map(String input_key, String input_value):
  // input_key: document name
  // input_value: document contents
  for each word w in input_value:
    EmitIntermediate(w, "1");

Reduce(String key, Iterator intermediate_values):
  // key: a word, same for input and output
  // intermediate_values: a list of counts
  int result = 0;
  for each v in intermediate_values:
    result += ParseInt(v);
  Emit(AsString(result));
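For readers who want to run the word-count example end to end, here is a small self-contained Python sketch that mimics the map, shuffle/sort, and reduce phases in memory; it illustrates the programming model only and is not Google's MapReduce library.

from collections import defaultdict

def map_fn(doc_name, contents):
    """map(k, v) -> <k', v'>*: emit (word, "1") once per word in the document."""
    for word in contents.split():
        yield (word, "1")

def reduce_fn(word, counts):
    """reduce(k', <v'>*) -> <k', v'>*: sum the counts for a single word."""
    yield (word, str(sum(int(c) for c in counts)))

def run_mapreduce(documents):
    # Map phase: apply map_fn to every (doc_name, contents) record.
    intermediate = defaultdict(list)
    for doc_name, contents in documents:
        for key, value in map_fn(doc_name, contents):
            intermediate[key].append(value)      # shuffle: group values by key
    # Reduce phase: feed each key's value list to reduce_fn, in key order.
    results = []
    for key in sorted(intermediate):
        results.extend(reduce_fn(key, intermediate[key]))
    return results

print(run_mapreduce([("doc1", "to be or not to be")]))
# -> [('be', '2'), ('not', '1'), ('or', '1'), ('to', '2')]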

Conclusion to MapReduce

• MapReduce has proven to be a remarkably-useful abstraction

• Greatly simplifies large-scale computations at Google

• Fun to use: focus on problem, let library deal with messy details

• Many thousands of parallel programs written by hundreds of different programmers in the last few years

– Many had no prior parallel or distributed programming experience


BigTable


Overview

• Structured data storage, not a database

• Wide applicability

• Scalability

• High performance

• High availability


Basic Data Model

• Distributed multi-dimensional sparse map

(row, column, timestamp) → cell contents

[Diagram: rows (e.g. “www.cnn.com”) and columns (e.g. “contents”) index cells, each holding multiple timestamped versions (t1, t2, t3) of contents such as “<html>…”.]

• Good match for most of our applications
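One minimal way to picture the model is a nested dictionary keyed by row, then column, then timestamp; the Python sketch below is purely illustrative and implies nothing about how BigTable actually stores or distributes data.

from collections import defaultdict

# (row, column, timestamp) -> cell contents, modeled as nested dicts.
# Missing cells simply have no entry, which is what makes the map sparse.
table = defaultdict(lambda: defaultdict(dict))

table["www.cnn.com"]["contents"][1] = "<html>..."        # version written at timestamp t1
table["www.cnn.com"]["contents"][3] = "<html>...(new)"   # newer version at timestamp t3

def lookup(row, column, timestamp=None):
    """Return the cell value at the given timestamp, or the newest version."""
    versions = table[row][column]
    ts = timestamp if timestamp is not None else max(versions)
    return versions[ts]

print(lookup("www.cnn.com", "contents"))   # -> "<html>...(new)"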

BigTable API

• Metadata operations

– Create/delete tables, column families, change metadata

• Writes (atomic)

– Set(): write cells in a row

– DeleteCells(): delete cells in a row

– DeleteRow(): delete all cells in a row

• Reads

– Scanner: read arbitrary cells in a bigtable
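To illustrate how client code might exercise the operations named above, here is a hypothetical in-memory Python stand-in; the real BigTable client API is not Python and its signatures differ, so treat the names below only as a sketch of the call pattern.

# Hypothetical stand-in for a BigTable-like API; the method names mirror the
# operations listed above but are not the real client library.
class Table:
    def __init__(self):
        self.rows = {}                                   # row key -> {column -> value}

    def set(self, row, column, value):
        """Set(): write a cell in a row (writes within one row are atomic)."""
        self.rows.setdefault(row, {})[column] = value

    def delete_cells(self, row, columns):
        """DeleteCells(): delete selected cells in a row."""
        for column in columns:
            self.rows.get(row, {}).pop(column, None)

    def delete_row(self, row):
        """DeleteRow(): delete all cells in a row."""
        self.rows.pop(row, None)

    def scan(self, row_prefix=""):
        """Scanner: iterate over rows (and their cells) in sorted row order."""
        for row in sorted(self.rows):
            if row.startswith(row_prefix):
                yield row, self.rows[row]

t = Table()
t.set("www.cnn.com", "contents", "<html>...")
t.set("www.cnn.com", "anchor:cnnsi.com", "CNN")
print(list(t.scan("www.cnn")))
t.delete_row("www.cnn.com")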


System Structure

• Bigtable client, using the Bigtable client library: calls Open() on a table, then reads and writes data through the tablet servers

• Bigtable cell, consisting of:

– Bigtable master: performs metadata ops and load balancing

– Bigtable tablet servers: serve data (many per cell)

• Cluster Scheduling Master: handles failover and monitoring

• GFS: holds tablet data and logs

• Lock service: holds metadata and handles master election

Current status of BigTable

• Design/initial implementation started at the beginning of 2004

• Currently ~100 BigTable cells

• Production use or active development for many projects:

– Google Print

– My Search History

– Orkut

– Crawling/indexing pipeline

– Google Maps/Google Earth

– Blogger

– …

• Largest bigtable cell manages ~200TB of data spread over several thousand machines (larger cells planned)


Typical Cluster

[Diagram: a typical cluster runs cluster-wide scheduling masters, a GFS master, and a lock service; each machine (Machine 1, Machine 2, …, Machine N) runs Linux with a GFS chunkserver, a scheduler slave, and several user apps (user app1, app2, app3) sharing the hardware.]

Agenda

• About Cloud Computing

• Tools for Cloud Computing in Google

• Google’s partnerships with universities

ACCI in Oct. 2007

• Stands for Academic Cloud Computing Initiative

• An IBM and Google partnership

• Helps universities teach distributed-systems programming skills

• Started at the University of Washington and is scaling to many other universities


Google’s ACCI activities in Greater China

• Google Greater China helped create a cloud computing course at Tsinghua in the summer of 2007

• Now scaling to other universities in mainland China and Taiwan

Example: THU MR Course, Fall 2007

• “Massive Data Processing” course based on Google Cloud technology

• Google employees gave lectures during the course offering

• Got interesting results from the smart students

• http://hpc.cs.tsinghua.edu.cn/dpcourse/

Continued: THU MR Course, Fall 2007

Students presenting the course project “Simulating the operation of the solar system based on MapReduce technology” at the Google office

Massive data processing to simulate the operation of the solar system

THANK YOU

More info on http://code.google.com/intl/zh-CN/