Google Cloud Computing on Google Developer 2008 Day
Transcript of Google Cloud Computing on Google Developer 2008 Day
![Page 1: Google Cloud Computing on Google Developer 2008 Day](https://reader033.fdocuments.us/reader033/viewer/2022051314/54c7f5dc4a7959fe278b4588/html5/thumbnails/1.jpg)
Cloud Computing
Ping Yeh
June 14, 2008
![Page 2: Google Cloud Computing on Google Developer 2008 Day](https://reader033.fdocuments.us/reader033/viewer/2022051314/54c7f5dc4a7959fe278b4588/html5/thumbnails/2.jpg)
Evolution of Computing with the Network
Cluster Computing
Network Computing
Grid Computing
Utility Computing
Network is computer (client - server)
Separation of Functionalities
Cluster and grid images are from Fermilab and CERN, respectively.
![Page 3: Google Cloud Computing on Google Developer 2008 Day](https://reader033.fdocuments.us/reader033/viewer/2022051314/54c7f5dc4a7959fe278b4588/html5/thumbnails/3.jpg)
Evolution of Computing with the Network
Cluster Computing
Network Computing
Grid Computing
Utility Computing
Network is computer (client - server)
Tightly coupled computing resources: CPU, storage, data, etc.
Usually connected within a LAN
Managed as a single resource
Separation of Functionalities
Commodity, Open Source
Cluster and grid images are from Fermilab and CERN, respectively.
![Page 4: Google Cloud Computing on Google Developer 2008 Day](https://reader033.fdocuments.us/reader033/viewer/2022051314/54c7f5dc4a7959fe278b4588/html5/thumbnails/4.jpg)
Evolution of Computing with the Network
Cluster Computing
Network Computing
Grid Computing
Utility Computing
Network is computer (client - server)
Tightly coupled computing resources: CPU, storage, data, etc.
Usually connected within a LAN
Managed as a single resource
Resource sharing across administrative domains
Decentralized, open standards, non-trivial service
Separation of Functionalities
Commodity, Open Source
Global Resource Sharing
Cluster and grid images are from Fermilab and CERN, respectively.
![Page 5: Google Cloud Computing on Google Developer 2008 Day](https://reader033.fdocuments.us/reader033/viewer/2022051314/54c7f5dc4a7959fe278b4588/html5/thumbnails/5.jpg)
Evolution of Computing with the Network
Network Computing
• Network is computer (client-server)
• Separation of Functionalities
Cluster Computing
• Tightly coupled computing resources: CPU, storage, data, etc.
• Usually connected within a LAN
• Managed as a single resource
• Commodity, Open Source
Grid Computing
• Resource sharing across administrative domains
• Decentralized, open standards, non-trivial service
• Global Resource Sharing
Utility Computing
• Don't buy computers, lease computing power
• Upload, run, download
• Ownership Model
Cluster and grid images are from Fermilab and CERN, respectively.
![Page 6: Google Cloud Computing on Google Developer 2008 Day](https://reader033.fdocuments.us/reader033/viewer/2022051314/54c7f5dc4a7959fe278b4588/html5/thumbnails/6.jpg)
The Next Step: Cloud Computing
Services and data are in the cloud, accessible with any device connected to the cloud with a browser
![Page 7: Google Cloud Computing on Google Developer 2008 Day](https://reader033.fdocuments.us/reader033/viewer/2022051314/54c7f5dc4a7959fe278b4588/html5/thumbnails/7.jpg)
The Next Step: Cloud Computing
Services and data are in the cloud, accessible with any device connected to the cloud with a browser
A key technical issue for developers:
Scalability
![Page 8: Google Cloud Computing on Google Developer 2008 Day](https://reader033.fdocuments.us/reader033/viewer/2022051314/54c7f5dc4a7959fe278b4588/html5/thumbnails/8.jpg)
Applications on the Web
Your user
Your Coolest Web Application
internet splat map: http://flickr.com/photos/jurvetson/916142/, CC-by 2.0
baby picture: http://flickr.com/photos/cdharrison/280252512/, CC-by-sa 2.0
![Page 9: Google Cloud Computing on Google Developer 2008 Day](https://reader033.fdocuments.us/reader033/viewer/2022051314/54c7f5dc4a7959fe278b4588/html5/thumbnails/9.jpg)
Applications on the Web
Your user
The Cloud
Your Coolest Web Application
internet splat map: http://flickr.com/photos/jurvetson/916142/, CC-by 2.0
baby picture: http://flickr.com/photos/cdharrison/280252512/, CC-by-sa 2.0
![Page 10: Google Cloud Computing on Google Developer 2008 Day](https://reader033.fdocuments.us/reader033/viewer/2022051314/54c7f5dc4a7959fe278b4588/html5/thumbnails/10.jpg)
松下問童子
言師採藥去
只在此山中
雲深不知處
賈島《尋隱者不遇》
I asked the kid under the pine tree,
"Where might your master be?"
"He is picking herbs in the mountain," he said,
"the cloud is too deep to know where."
Jia Dao, "Didn't meet the master," written around 800 AD
![Page 11: Google Cloud Computing on Google Developer 2008 Day](https://reader033.fdocuments.us/reader033/viewer/2022051314/54c7f5dc4a7959fe278b4588/html5/thumbnails/11.jpg)
The Cloud
How many users do you want to have?
Your Coolest Web Application
![Page 12: Google Cloud Computing on Google Developer 2008 Day](https://reader033.fdocuments.us/reader033/viewer/2022051314/54c7f5dc4a7959fe278b4588/html5/thumbnails/12.jpg)
The Cloud
How many users do you want to have?
Your Coolest Web Application
![Page 13: Google Cloud Computing on Google Developer 2008 Day](https://reader033.fdocuments.us/reader033/viewer/2022051314/54c7f5dc4a7959fe278b4588/html5/thumbnails/13.jpg)
Google Growth
• Nov. '98: 10,000 queries on 25 computers
• Apr. '99: 500,000 queries on 300 computers
• Sep. '99: 3,000,000 queries on 2100 computers
![Page 14: Google Cloud Computing on Google Developer 2008 Day](https://reader033.fdocuments.us/reader033/viewer/2022051314/54c7f5dc4a7959fe278b4588/html5/thumbnails/14.jpg)
Scalability matters
![Page 15: Google Cloud Computing on Google Developer 2008 Day](https://reader033.fdocuments.us/reader033/viewer/2022051314/54c7f5dc4a7959fe278b4588/html5/thumbnails/15.jpg)
Counting the numbers
Client / Server
One : Many
Personal Computer
One : One
![Page 16: Google Cloud Computing on Google Developer 2008 Day](https://reader033.fdocuments.us/reader033/viewer/2022051314/54c7f5dc4a7959fe278b4588/html5/thumbnails/16.jpg)
Counting the numbers
Client / Server
One : Many
Personal Computer
One : One
Cloud Computing
Many : Many
Developer transition
![Page 17: Google Cloud Computing on Google Developer 2008 Day](https://reader033.fdocuments.us/reader033/viewer/2022051314/54c7f5dc4a7959fe278b4588/html5/thumbnails/17.jpg)
What Powers Cloud Computing?
Commodity Hardware
• Performance: single machine not interesting
• Reliability:
  o Most reliable hardware will still fail: fault-tolerant software needed
  o Fault-tolerant software enables use of commodity components
• Standardization: use standardized machines to run all kinds of applications
Infrastructure Software
• Distributed storage: Google File System (GFS)
• Distributed semi-structured data system: BigTable
• Distributed data processing system: MapReduce
[Figure: file /foo/bar split into chunks]
![Page 18: Google Cloud Computing on Google Developer 2008 Day](https://reader033.fdocuments.us/reader033/viewer/2022051314/54c7f5dc4a7959fe278b4588/html5/thumbnails/18.jpg)
google.stanford.edu (circa 1997)
![Page 19: Google Cloud Computing on Google Developer 2008 Day](https://reader033.fdocuments.us/reader033/viewer/2022051314/54c7f5dc4a7959fe278b4588/html5/thumbnails/19.jpg)
google.com (1999)
"cork boards"
![Page 20: Google Cloud Computing on Google Developer 2008 Day](https://reader033.fdocuments.us/reader033/viewer/2022051314/54c7f5dc4a7959fe278b4588/html5/thumbnails/20.jpg)
Google Data Center (circa 2000)
![Page 21: Google Cloud Computing on Google Developer 2008 Day](https://reader033.fdocuments.us/reader033/viewer/2022051314/54c7f5dc4a7959fe278b4588/html5/thumbnails/21.jpg)
google.com (new data center 2001)
![Page 22: Google Cloud Computing on Google Developer 2008 Day](https://reader033.fdocuments.us/reader033/viewer/2022051314/54c7f5dc4a7959fe278b4588/html5/thumbnails/22.jpg)
google.com (3 days later)
![Page 23: Google Cloud Computing on Google Developer 2008 Day](https://reader033.fdocuments.us/reader033/viewer/2022051314/54c7f5dc4a7959fe278b4588/html5/thumbnails/23.jpg)
Current Design
• In-house rack design
• PC-class motherboards
• Low-end storage and networking hardware
• Linux
• + in-house software
![Page 24: Google Cloud Computing on Google Developer 2008 Day](https://reader033.fdocuments.us/reader033/viewer/2022051314/54c7f5dc4a7959fe278b4588/html5/thumbnails/24.jpg)
How to develop a web application that scales?
Google's solution/replacement
• Storage: Google File System
• Database: BigTable
• Data Processing: MapReduce
• Serving: Google AppEngine
![Page 25: Google Cloud Computing on Google Developer 2008 Day](https://reader033.fdocuments.us/reader033/viewer/2022051314/54c7f5dc4a7959fe278b4588/html5/thumbnails/25.jpg)
How to develop a web application that scales?
Google's solution/replacement
• Storage: Google File System (published paper)
• Database: BigTable (published paper)
• Data Processing: MapReduce (published paper)
• Serving: Google AppEngine (opened on 2008/5/28)
hadoop: open source implementation
![Page 26: Google Cloud Computing on Google Developer 2008 Day](https://reader033.fdocuments.us/reader033/viewer/2022051314/54c7f5dc4a7959fe278b4588/html5/thumbnails/26.jpg)
Google File System
[Figure: GFS architecture. Applications talk to GFS clients; GFS masters (with replicas) hold the file namespace, e.g. /foo/bar mapped to chunk 2ef7 and further chunks; chunkservers store the chunks (C0, C1, C2, C3, C5, …)]
• Files broken into chunks (typically 64 MB)
• Chunks triplicated across three machines for safety (tunable)
• Master manages metadata
• Data transfers happen directly between clients and chunkservers
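The chunking and replication bullets above can be illustrated with a toy placement routine. This is a minimal sketch, not GFS code: `plan_chunks`, the round-robin placement and the server names are assumptions made for illustration; only the 64 MB chunk size and the three-way replication come from the slide.

```python
CHUNK_SIZE = 64 * 1024 * 1024  # 64 MB, the typical chunk size quoted on the slide
REPLICAS = 3                   # default replication factor quoted on the slide


def plan_chunks(file_size, servers, chunk_size=CHUNK_SIZE, replicas=REPLICAS):
    """Return {chunk_index: [server, ...]} using simple round-robin placement.

    Round-robin is an illustrative assumption, not the real master's policy.
    """
    if replicas > len(servers):
        raise ValueError("need at least as many servers as replicas")
    num_chunks = max(1, -(-file_size // chunk_size))  # ceiling division
    placement = {}
    for index in range(num_chunks):
        # Each replica of a chunk lands on a different server.
        placement[index] = [servers[(index + r) % len(servers)] for r in range(replicas)]
    return placement


servers = ["cs0", "cs1", "cs2", "cs3"]               # hypothetical chunkserver names
placement = plan_chunks(200 * 1024 * 1024, servers)  # a 200 MB file needs 4 chunks
```

In this toy model the returned dictionary plays the role of the master's metadata: it records where chunks live, while the chunk bytes themselves would move directly between clients and chunkservers.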
![Page 27: Google Cloud Computing on Google Developer 2008 Day](https://reader033.fdocuments.us/reader033/viewer/2022051314/54c7f5dc4a7959fe278b4588/html5/thumbnails/27.jpg)
GFS Usage @ Google
• 200+ clusters
• Filesystem clusters of up to 5000+ machines
• Pools of 10000+ clients
• 5+ PB Filesystems
All in the presence of frequent HW failures
![Page 28: Google Cloud Computing on Google Developer 2008 Day](https://reader033.fdocuments.us/reader033/viewer/2022051314/54c7f5dc4a7959fe278b4588/html5/thumbnails/28.jpg)
BigTable
[Figure: row "www.cnn.com", column "contents:", with versions at timestamps t3, t11 and t17, each holding "<html>…"]
• Distributed multi-level sparse map: fault-tolerant, persistent
• Scalable:
  o Thousands of servers
  o Terabytes of in-memory data, petabytes of disk-based data
• Self-managing
  o Servers can be added/removed dynamically
  o Servers adjust to load imbalance
Data model: (row, column, timestamp) ➔ cell contents
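The data model above can be sketched as a small in-memory structure. This is a toy illustration, not the BigTable API: the `SparseTable` class, its `put`/`get` methods and the cell values are hypothetical; the row, column and timestamps mirror the slide's figure.

```python
from collections import defaultdict


class SparseTable:
    """Toy in-memory map: (row, column, timestamp) -> cell contents."""

    def __init__(self):
        # row -> column -> {timestamp: value}; absent cells take no space
        self._rows = defaultdict(lambda: defaultdict(dict))

    def put(self, row, column, timestamp, value):
        self._rows[row][column][timestamp] = value

    def get(self, row, column, timestamp=None):
        """Return the newest value at or before `timestamp` (latest if None)."""
        versions = self._rows.get(row, {}).get(column, {})
        candidates = [t for t in versions if timestamp is None or t <= timestamp]
        return versions[max(candidates)] if candidates else None


table = SparseTable()
# Timestamps 3, 11 and 17 follow the figure; the values are made up.
table.put("www.cnn.com", "contents:", 3, "<html>v1")
table.put("www.cnn.com", "contents:", 11, "<html>v2")
table.put("www.cnn.com", "contents:", 17, "<html>v3")
```

Reading with `table.get("www.cnn.com", "contents:", timestamp=12)` returns the version written at timestamp 11, which is how a timestamped cell keeps several versions of the same page.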
![Page 29: Google Cloud Computing on Google Developer 2008 Day](https://reader033.fdocuments.us/reader033/viewer/2022051314/54c7f5dc4a7959fe278b4588/html5/thumbnails/29.jpg)
Why not just use commercial DB?
• Scale is too large or cost is too high for most commercial databases
• Low-level storage optimizations help performance significantly
  o Much harder to do when running on top of a database layer
  o Also fun and challenging to build large-scale systems :)
![Page 30: Google Cloud Computing on Google Developer 2008 Day](https://reader033.fdocuments.us/reader033/viewer/2022051314/54c7f5dc4a7959fe278b4588/html5/thumbnails/30.jpg)
System Structure
BigTable Cell
• Bigtable master: performs metadata ops + load balancing
• Bigtable tablet servers (several): serve data
• Cluster scheduling system: handles failover, monitoring
• GFS: holds tablet data, logs
• Lock service: holds metadata, handles master-election
![Page 31: Google Cloud Computing on Google Developer 2008 Day](https://reader033.fdocuments.us/reader033/viewer/2022051314/54c7f5dc4a7959fe278b4588/html5/thumbnails/31.jpg)
System Structure
BigTable Cell
• Bigtable master: performs metadata ops + load balancing
• Bigtable tablet servers (several): serve data
• Cluster scheduling system: handles failover, monitoring
• GFS: holds tablet data, logs
• Lock service: holds metadata, handles master-election
Bigtable client
• Bigtable client library: Open(), read/write, metadata ops
![Page 32: Google Cloud Computing on Google Developer 2008 Day](https://reader033.fdocuments.us/reader033/viewer/2022051314/54c7f5dc4a7959fe278b4588/html5/thumbnails/32.jpg)
BigTable Summary
• Data model applicable to broad range of clients
  o Actively deployed in many of Google's services
• System provides high performance storage system on a large scale
  o Self-managing
  o Thousands of servers
  o Millions of ops/second
  o Multiple GB/s reading/writing
• Currently ~500 BigTable cells
• Largest BigTable cell manages ~3PB of data spread over several thousand machines (larger cells planned)
![Page 33: Google Cloud Computing on Google Developer 2008 Day](https://reader033.fdocuments.us/reader033/viewer/2022051314/54c7f5dc4a7959fe278b4588/html5/thumbnails/33.jpg)
Distributed Data Processing
How do you process 1 month of apache logs to find the usage pattern numRequest[minuteOfTheWeek]?
• Input files: N rotated logs
• Size: O(TB) for popular sites; multiple physical disks
• Processing phase 1: launch M processes
  o input: N/M log files
  o output: one file of numRequest[minuteOfTheWeek]
• Processing phase 2: merge M output files of step 1
![Page 34: Google Cloud Computing on Google Developer 2008 Day](https://reader033.fdocuments.us/reader033/viewer/2022051314/54c7f5dc4a7959fe278b4588/html5/thumbnails/34.jpg)
Pseudo Codes for Phase 1 and 2
Phase 1:

    import sys

    def findBucket(requestTime):
        # return minute of the week
        pass  # body not shown on the slide

    # findTime(line), which extracts the request time from a log line,
    # is likewise not shown on the slide.

    numRequest = [0] * (1440*7)  # one counter per minute of the week
    for filename in sys.argv[2:]:
        for line in open(filename):
            minuteBucket = findBucket(findTime(line))
            numRequest[minuteBucket] += 1
    outFile = open(sys.argv[1], 'w')
    for i in range(1440*7):
        outFile.write("%d %d\n" % (i, numRequest[i]))
    outFile.close()

Phase 2:

    import sys

    numRequest = [0] * (1440*7)  # one counter per minute of the week
    for filename in sys.argv[2:]:
        for line in open(filename):
            col = line.split()
            i, count = int(col[0]), int(col[1])
            numRequest[i] += count
    # write out numRequest[] like phase 1
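The slide leaves the two helpers, findBucket and findTime, unspecified. One possible way to fill them in is sketched below, assuming log lines carry a bracketed timestamp in the style of Apache's Common Log Format. The snake_case names, the sample line and the choice of Monday 00:00 as bucket 0 are assumptions for illustration.

```python
from datetime import datetime


def find_time(line):
    """Extract the request time, assuming a '[10/Oct/2000:13:55:36 -0700]' field."""
    stamp = line[line.index("[") + 1 : line.index("]")]
    return datetime.strptime(stamp, "%d/%b/%Y:%H:%M:%S %z")


def find_bucket(request_time):
    """Return the minute of the week: Monday 00:00 is 0, Sunday 23:59 is 10079."""
    return request_time.weekday() * 1440 + request_time.hour * 60 + request_time.minute


# Sample line in Common Log Format; 10 Oct 2000 was a Tuesday.
sample = '127.0.0.1 - - [10/Oct/2000:13:55:36 -0700] "GET /index.html HTTP/1.0" 200 2326'
bucket = find_bucket(find_time(sample))
```

The bucket is computed in the timestamp's own UTC offset; a real analysis would first decide whether to normalise all requests to one time zone.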
![Page 35: Google Cloud Computing on Google Developer 2008 Day](https://reader033.fdocuments.us/reader033/viewer/2022051314/54c7f5dc4a7959fe278b4588/html5/thumbnails/35.jpg)
Task Management
• Logistics:
  o Decide which computers to run phase 1, make sure the log files are accessible (NFS-like or copy)
  o Similar for phase 2
• Execution:
  o Launch the phase 1 programs with appropriate command line flags, re-launch failed tasks until phase 1 is done
  o Similar for phase 2
• Automation: build task scripts on top of existing batch system (PBS, Condor, GridEngine, LoadLeveler, etc.)
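The "re-launch failed tasks until phase 1 is done" step can be sketched as a small driver loop. This is only an illustration of the bookkeeping a hand-written task script would need; `run_until_done`, `flaky` and the retry limit are hypothetical, and a real setup would delegate launching to the batch system.

```python
def run_until_done(tasks, run_task, max_attempts=3):
    """Run every task, re-launching failures; return {task: result}."""
    results = {}
    attempts = {task: 0 for task in tasks}
    pending = list(tasks)
    while pending:
        task = pending.pop(0)
        attempts[task] += 1
        try:
            results[task] = run_task(task)
        except Exception:
            if attempts[task] >= max_attempts:
                raise  # give up after too many failures
            pending.append(task)  # re-queue the failed task
    return results


failed_once = set()


def flaky(task):
    # Hypothetical task: crashes the first time it processes "log2".
    if task == "log2" and task not in failed_once:
        failed_once.add(task)
        raise RuntimeError("simulated crash")
    return task.upper()


results = run_until_done(["log1", "log2", "log3"], flaky)
```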
![Page 36: Google Cloud Computing on Google Developer 2008 Day](https://reader033.fdocuments.us/reader033/viewer/2022051314/54c7f5dc4a7959fe278b4588/html5/thumbnails/36.jpg)
Technical Issues
• File management: where to store files?
  o Store all logs on the same file server ➔ Bottleneck!
  o Distributed file system: opportunity to run locally
• Granularity: how to decide N and M?
  o Performance ➚ when M ➚ until M == N if no I/O contention
  o Can M > N? Yes! Careful log splitting. Is it faster?
• Job allocation: assign which task to which node?
  o Prefer local job: knowledge of file system
• Fault-recovery: what if a node crashes?
  o Redundancy of data a must
  o Crash-detection and job re-allocation necessary
Performance, Robustness, Reusability
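The "prefer local job" idea can be sketched as a greedy assignment that uses knowledge of where each file is stored. This is a simplified illustration, not any real scheduler: `assign_tasks`, the node names and the file-to-node map are hypothetical.

```python
def assign_tasks(file_locations, idle_nodes):
    """Greedy assignment: prefer an idle node that already stores the file."""
    free = list(idle_nodes)
    assignment = {}
    for name, holders in file_locations.items():
        if not free:
            break  # no idle nodes left; remaining files wait
        local = [node for node in free if node in holders]
        node = local[0] if local else free[0]  # fall back to a remote node
        assignment[name] = node
        free.remove(node)
    return assignment


# Hypothetical layout: which nodes hold a copy of each log file.
file_locations = {"log1": {"a", "b"}, "log2": {"b"}, "log3": {"c"}}
assignment = assign_tasks(file_locations, ["a", "b", "c"])
```

A greedy pass like this can miss the best overall placement when nodes are scarce, which is one reason the slide treats job allocation as a technical issue in its own right.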
![Page 37: Google Cloud Computing on Google Developer 2008 Day](https://reader033.fdocuments.us/reader033/viewer/2022051314/54c7f5dc4a7959fe278b4588/html5/thumbnails/37.jpg)
MapReduce – A New Model and System
Two phases of data processing
• Map: (in_key, in_value) ➔ { (key_j, value_j) | j = 1, ..., K }
• Reduce: (key, [value_1, ..., value_L]) ➔ (key, f_value)
![Page 38: Google Cloud Computing on Google Developer 2008 Day](https://reader033.fdocuments.us/reader033/viewer/2022051314/54c7f5dc4a7959fe278b4588/html5/thumbnails/38.jpg)
MapReduce Programming Model
• Borrowed from functional programming
  map(f, [x1, x2, ...]) = [f(x1), f(x2), ...]
  reduce(f, x0, [x1, x2, x3, ...]) = reduce(f, f(x0, x1), [x2, ...]) = ... (continue until the list is exhausted)
• Users implement two functions:
  map (in_key, in_value) ➔ (key_j, value_j) list
  reduce (key, [value_1, ..., value_L]) ➔ f_value
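The functional-programming roots can be seen with Python's built-in `map` and `functools.reduce`. This is a small illustration with made-up numbers. Note that `functools.reduce` takes its arguments as (function, list, initial value), a different order from the slide's reduce(f, x0, list).

```python
from functools import reduce

# map applies a function to every element independently.
squares = list(map(lambda x: x * x, [1, 2, 3, 4]))

# reduce folds the list into one value, starting from the initial value 0.
total = reduce(lambda acc, x: acc + x, squares, 0)
```

Because each `map` call is independent of the others, the elements can be processed on different machines, which is the property the MapReduce system exploits.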
![Page 39: Google Cloud Computing on Google Developer 2008 Day](https://reader033.fdocuments.us/reader033/viewer/2022051314/54c7f5dc4a7959fe278b4588/html5/thumbnails/39.jpg)
MapReduce Version of Pseudo Code
    def findBucket(requestTime):
        # return minute of the week
        pass  # body not shown on the slide

    class LogMinuteCounter(MapReduction):
        def Map(key, value, output):  # key is location
            minuteBucket = findBucket(findTime(value))
            output.collect(str(minuteBucket), "1")

        def Reduce(key, iter, output):
            sum = 0
            while not iter.done():
                sum += 1
                iter.next()  # advance the iterator (method name assumed; omitted on the slide)
            output.collect(key, str(sum))

• Look, Mom, no file I/O!
• Only data processing logic...
... and gets much more than that!
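To see the map, shuffle and reduce steps end to end, here is a single-process stand-in for the framework. It is a sketch for experimentation only: `run_mapreduce`, `log_mapper` and `log_reducer` are hypothetical names, the input records are made up, and nothing here reflects Google's actual library.

```python
from collections import defaultdict


def run_mapreduce(records, mapper, reducer):
    """Single-process stand-in: map each record, group by key, then reduce."""
    groups = defaultdict(list)
    for key, value in records:
        for out_key, out_value in mapper(key, value):
            groups[out_key].append(out_value)  # the "shuffle" step
    return {k: reducer(k, vs) for k, vs in sorted(groups.items())}


def log_mapper(location, minute_bucket):
    # Emit one count per request, keyed by its minute-of-the-week bucket.
    yield str(minute_bucket), 1


def log_reducer(key, values):
    return sum(values)


# Hypothetical input: (location, minute bucket already extracted from the line).
records = [("log1:0", 5), ("log1:1", 5), ("log2:0", 7)]
counts = run_mapreduce(records, log_mapper, log_reducer)
```

The user-supplied mapper and reducer contain no file handling or task bookkeeping; in this sketch that work sits entirely in `run_mapreduce`, mirroring the division of labour the slide describes.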
![Page 40: Google Cloud Computing on Google Developer 2008 Day](https://reader033.fdocuments.us/reader033/viewer/2022051314/54c7f5dc4a7959fe278b4588/html5/thumbnails/40.jpg)
MapReduce Framework
For certain classes of problems, the MapReduce framework provides:
• Automatic & efficient parallelization/distribution
• I/O scheduling: Run mapper close to input data (same node or same rack when possible, with GFS)
• Fault-tolerance: restart failed mapper or reducer tasks on the same or different nodes
• Robustness: tolerate even massive failures, e.g. large-scale network maintenance: once lost 1800 out of 2000 machines
• Status/monitoring
![Page 41: Google Cloud Computing on Google Developer 2008 Day](https://reader033.fdocuments.us/reader033/viewer/2022051314/54c7f5dc4a7959fe278b4588/html5/thumbnails/41.jpg)
Task Granularity And Pipelining
• Fine granularity tasks: many more map tasks than machines
  o Minimizes time for fault recovery
  o Can pipeline shuffling with map execution
  o Better dynamic load balancing
• Often use 200,000 map/5000 reduce tasks w/ 2000 machines
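A quick back-of-the-envelope calculation shows why fine granularity helps recovery. It uses the figures quoted above and assumes, purely for illustration, that map tasks are spread evenly over the machines.

```python
map_tasks, machines = 200_000, 2_000

# With an even spread, each machine handles this many map tasks.
tasks_per_machine = map_tasks // machines

# If one machine fails, its tasks can be shared among the survivors,
# so each surviving machine picks up only a small fraction of one task's worth.
extra_per_survivor = tasks_per_machine / (machines - 1)
```

With coarse tasks (one per machine), a failure would instead force a single survivor to redo an entire machine's share of the work.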
![Page 42: Google Cloud Computing on Google Developer 2008 Day](https://reader033.fdocuments.us/reader033/viewer/2022051314/54c7f5dc4a7959fe278b4588/html5/thumbnails/42.jpg)
![Page 43: Google Cloud Computing on Google Developer 2008 Day](https://reader033.fdocuments.us/reader033/viewer/2022051314/54c7f5dc4a7959fe278b4588/html5/thumbnails/43.jpg)
![Page 44: Google Cloud Computing on Google Developer 2008 Day](https://reader033.fdocuments.us/reader033/viewer/2022051314/54c7f5dc4a7959fe278b4588/html5/thumbnails/44.jpg)
![Page 45: Google Cloud Computing on Google Developer 2008 Day](https://reader033.fdocuments.us/reader033/viewer/2022051314/54c7f5dc4a7959fe278b4588/html5/thumbnails/45.jpg)
![Page 46: Google Cloud Computing on Google Developer 2008 Day](https://reader033.fdocuments.us/reader033/viewer/2022051314/54c7f5dc4a7959fe278b4588/html5/thumbnails/46.jpg)
![Page 47: Google Cloud Computing on Google Developer 2008 Day](https://reader033.fdocuments.us/reader033/viewer/2022051314/54c7f5dc4a7959fe278b4588/html5/thumbnails/47.jpg)
![Page 48: Google Cloud Computing on Google Developer 2008 Day](https://reader033.fdocuments.us/reader033/viewer/2022051314/54c7f5dc4a7959fe278b4588/html5/thumbnails/48.jpg)
![Page 49: Google Cloud Computing on Google Developer 2008 Day](https://reader033.fdocuments.us/reader033/viewer/2022051314/54c7f5dc4a7959fe278b4588/html5/thumbnails/49.jpg)
![Page 50: Google Cloud Computing on Google Developer 2008 Day](https://reader033.fdocuments.us/reader033/viewer/2022051314/54c7f5dc4a7959fe278b4588/html5/thumbnails/50.jpg)
![Page 51: Google Cloud Computing on Google Developer 2008 Day](https://reader033.fdocuments.us/reader033/viewer/2022051314/54c7f5dc4a7959fe278b4588/html5/thumbnails/51.jpg)
![Page 52: Google Cloud Computing on Google Developer 2008 Day](https://reader033.fdocuments.us/reader033/viewer/2022051314/54c7f5dc4a7959fe278b4588/html5/thumbnails/52.jpg)
![Page 53: Google Cloud Computing on Google Developer 2008 Day](https://reader033.fdocuments.us/reader033/viewer/2022051314/54c7f5dc4a7959fe278b4588/html5/thumbnails/53.jpg)
MapReduce: Adoption at Google
[Figure: MapReduce Programs in Google's Source Tree]
[Figure: New MapReduce Programs Per Month, annotated "Summer intern effect"]
![Page 54: Google Cloud Computing on Google Developer 2008 Day](https://reader033.fdocuments.us/reader033/viewer/2022051314/54c7f5dc4a7959fe278b4588/html5/thumbnails/54.jpg)
MapReduce: Uses at Google
• Typical configuration: 200,000 mappers, 500 reducers on 2,000 nodes
• Broad applicability has been a pleasant surprise
  o Quality experiments, log analysis, machine translation, ad-hoc data processing, …
  o Production indexing system: rewritten w/ MapReduce
    ~10 MapReductions, much simpler than old code
![Page 55: Google Cloud Computing on Google Developer 2008 Day](https://reader033.fdocuments.us/reader033/viewer/2022051314/54c7f5dc4a7959fe278b4588/html5/thumbnails/55.jpg)
MapReduce Summary
• MapReduce has proven to be a useful abstraction
• Greatly simplifies large-scale computations at Google
• Fun to use: focus on problem, let library deal with messy details
• Published
![Page 56: Google Cloud Computing on Google Developer 2008 Day](https://reader033.fdocuments.us/reader033/viewer/2022051314/54c7f5dc4a7959fe278b4588/html5/thumbnails/56.jpg)
A Data Playground
• Substantial fraction of internet available for processing
• Easy-to-use teraflops/petabytes, quick turn-around
• Cool problems, great colleagues
MapReduce + BigTable + GFS = Data playground
![Page 57: Google Cloud Computing on Google Developer 2008 Day](https://reader033.fdocuments.us/reader033/viewer/2022051314/54c7f5dc4a7959fe278b4588/html5/thumbnails/57.jpg)
Query Frequency Over Time
![Page 58: Google Cloud Computing on Google Developer 2008 Day](https://reader033.fdocuments.us/reader033/viewer/2022051314/54c7f5dc4a7959fe278b4588/html5/thumbnails/58.jpg)
Learning From Data
Searching for Britney Spears...
![Page 59: Google Cloud Computing on Google Developer 2008 Day](https://reader033.fdocuments.us/reader033/viewer/2022051314/54c7f5dc4a7959fe278b4588/html5/thumbnails/59.jpg)
Open Source Cloud Software: Project Hadoop
• Google published papers on GFS ('03), MapReduce ('04) and BigTable ('06)
• Project Hadoop
  o An open source project with the Apache Software Foundation
  o Implement Google's Cloud technologies in Java
  o HDFS ("GFS") and Hadoop MapReduce are available, HBase ("BigTable") is being developed.
• Google is not directly involved in the development: avoid conflict of interest.
![Page 60: Google Cloud Computing on Google Developer 2008 Day](https://reader033.fdocuments.us/reader033/viewer/2022051314/54c7f5dc4a7959fe278b4588/html5/thumbnails/60.jpg)
Industrial interest in Hadoop
• Yahoo! hired core Hadoop developers
  o Announced on Feb 19, 2008 that their Webmap is produced on a Hadoop cluster with 2000 hosts (dual/quad cores).
• Amazon EC2 (Elastic Compute Cloud) supports Hadoop
  o Write your mapper and reducer, upload your data and program, run and pay by resource utilisation
  o TIFF-to-PDF conversion of 11 million scanned New York Times articles (1851-1922) done in 24 hours on Amazon S3/EC2 with Hadoop on 100 EC2 machines. http://open.nytimes.com/2007/11/01/self-service-prorated-super-computing-fun/
  o Many Silicon Valley startups are using EC2 and starting to use Hadoop for their coolest ideas on internet-scale data
• IBM announced "Blue Cloud," which will include Hadoop among other software components
![Page 61: Google Cloud Computing on Google Developer 2008 Day](https://reader033.fdocuments.us/reader033/viewer/2022051314/54c7f5dc4a7959fe278b4588/html5/thumbnails/61.jpg)
![Page 62: Google Cloud Computing on Google Developer 2008 Day](https://reader033.fdocuments.us/reader033/viewer/2022051314/54c7f5dc4a7959fe278b4588/html5/thumbnails/62.jpg)
![Page 63: Google Cloud Computing on Google Developer 2008 Day](https://reader033.fdocuments.us/reader033/viewer/2022051314/54c7f5dc4a7959fe278b4588/html5/thumbnails/63.jpg)
AppEngine
• Run your applications on Google infrastructure and data centers
  o Focus on your application, forget about machines, operating systems, web server software, database setup / maintenance, load balancing, etc.
• Opened for public sign-up on 2008/5/28
• Python API to Datastore (on top of BigTable) and Users
• Free to start, pay as you expand
• More details can be found in the AppEngine talks.
• http://code.google.com/appengine/
![Page 64: Google Cloud Computing on Google Developer 2008 Day](https://reader033.fdocuments.us/reader033/viewer/2022051314/54c7f5dc4a7959fe278b4588/html5/thumbnails/64.jpg)
Academic Cloud Computing Initiative
• Google works with top universities on teaching GFS, BigTable and MapReduce in courses
  o UW, MIT, Stanford, Berkeley, CMU, Maryland
  o First wave in Taiwan: NTU, NCTU
    "Parallel Programming" by Professor Pangfeng Liu (NTU)
    "Web Services and Applications" by Professors Wen-Chih Peng and Jiun-Lung Huang (NCTU)
  o Google offers course materials, technical seminars and student mentoring by Google engineers.
• Google and IBM provide a data center for academic use
  o Software stack: Linux + Hadoop + IBM's cluster management software
![Page 65: Google Cloud Computing on Google Developer 2008 Day](https://reader033.fdocuments.us/reader033/viewer/2022051314/54c7f5dc4a7959fe278b4588/html5/thumbnails/65.jpg)
References
• "The Google File System," Sanjay Ghemawat, Howard Gobioff, Shun-Tak Leung, Proceedings of the 19th ACM Symposium on Operating Systems Principles, 2003, pp. 29-43. http://research.google.com/archive/gfs-sosp2003.pdf
• "MapReduce: Simplified Data Processing on Large Clusters," Jeffrey Dean, Sanjay Ghemawat, Communications of the ACM, vol. 51, no. 1 (2008), pp. 107-113. http://labs.google.com/papers/mapreduce-osdi04.pdf
• "Bigtable: A Distributed Storage System for Structured Data," Fay Chang et al., 7th USENIX Symposium on Operating Systems Design and Implementation (OSDI), 2006, pp. 205-218. http://research.google.com/archive/bigtable-osdi06.pdf
• Distributed systems course materials (slides, videos): http://code.google.com/edu/parallel
![Page 66: Google Cloud Computing on Google Developer 2008 Day](https://reader033.fdocuments.us/reader033/viewer/2022051314/54c7f5dc4a7959fe278b4588/html5/thumbnails/66.jpg)
Summary
• Cloud Computing is about scalable web applications (and data processing needed to make apps interesting)
• Lots of commodity PCs: good for scalability and cost
• Build web applications to be scalable from the start
  o AppEngine allows developers to use Google's scalable infrastructure and data centers
  o Hadoop enables scalable data processing
![Page 67: Google Cloud Computing on Google Developer 2008 Day](https://reader033.fdocuments.us/reader033/viewer/2022051314/54c7f5dc4a7959fe278b4588/html5/thumbnails/67.jpg)
The era of Cloud Computing is here!
Photo by mr.hero on panoramio (http://www.panoramio.com/photo/1127015)
[Figure labels: news, people, book search, photo, product search, video, maps, e-mails, mobile, blogs, groups, calendar, scholar, Earth, Sky, web, desktop, translate, messages]