Stephen McHenry - Chancellor of Site Reliability Engineering, Google
Google Confidential and Proprietary
Woulda, Coulda, Shoulda: The World of Tera, Peta & Exa
Stephen McHenry
Chancellor of Site Reliability Engineering
April 22, 2009
Overview
• Mission Statement
• Some History
• Planning for
  • Failure
  • Expansion
• Applications
• Infrastructure
• Hardware
• The Future
Google’s Mission
To organize the world’s information and make it universally accessible and useful
Overview
• Mission Statement
• Some History
• Planning for
  • Failure
  • Expansion
• Applications
• Infrastructure
• Hardware
• The Future
One of our earliest storage systems
Lego Disk Case
Peak of google.stanford.edu (circa 1997)
The Infamous “Corkboard”
Many Corkboards (1999)
A Data Center in 1999…
Another Data Center, Spring 2000
Note the Cooling
google.com (new data center 2001)
google.com (3 days later)
Current Data Center
Overview
• Mission Statement
• Some History
• The Challenge
• Planning for
  • Failure
  • Expansion
• Applications
• Infrastructure
• Hardware
• The Future
Just For Reference
Terabyte – 10^12 Bytes – 1,000,000,000,000 Bytes
Petabyte – 10^15 Bytes – 1,000 Terabytes – 1,000,000,000,000,000 Bytes
Exabyte – 10^18 Bytes – 1 Million Terabytes – 1,000,000,000,000,000,000 Bytes
Zettabyte – 10^21 Bytes – 1 Billion Terabytes – 1,000,000,000,000,000,000,000 Bytes
Yottabyte – 10^24 Bytes – 1 Trillion Terabytes – 1,000,000,000,000,000,000,000,000 Bytes
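The scale table above can be sketched as a tiny helper that names decimal (SI) byte prefixes; this is an illustrative aside, not part of the original deck:

```python
# A minimal sketch: naming the decimal byte scales in the table above.
UNITS = ["B", "KB", "MB", "GB", "TB", "PB", "EB", "ZB", "YB"]

def human_bytes(n: int) -> str:
    """Render a byte count using decimal (SI) prefixes, 1000 per step."""
    exp = 0
    while n >= 1000 and exp < len(UNITS) - 1:
        n //= 1000
        exp += 1
    return f"{n} {UNITS[exp]}"

print(human_bytes(10**12))  # 1 TB
print(human_bytes(10**15))  # 1 PB
print(human_bytes(10**18))  # 1 EB
```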
How much information is out there?
How large is the Web?
• Tens of billions of documents? Hundreds of billions?
• ~10KB/doc => 100s of Terabytes
Then there's everything else:
• Email, personal files, closed databases, broadcast media, print, etc.
• Estimated 5 Exabytes/year (growing at 30%)*
• 800MB/year/person – ~90% in magnetic media
The Web is just a tiny starting point
* Source: How Much Information? 2003
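The slide's estimate checks out as back-of-envelope arithmetic; the document count below is an assumed midpoint of "tens of billions":

```python
# Sanity check: tens of billions of documents at ~10 KB each lands in
# the hundreds of terabytes, as the slide says.
docs = 30e9           # "tens of billions" of documents (assumed midpoint)
bytes_per_doc = 10e3  # ~10 KB per document
total = docs * bytes_per_doc
print(total / 1e12)   # 300.0 terabytes
```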
Google takes its mission seriously
Started with the Web (html)
Added various document formats:
• Images
• Commercial data: ads and shopping (Froogle)
• Enterprise (corporate data)
• News
• Email (Gmail)
• Scholarly publications
• Local information
• Maps
• Yellow pages
• Satellite images
• Instant messaging and VoIP
• Communities (Orkut)
• Printed media
• …
Ever-Increasing Computation Needs
[Diagram: a virtuous cycle – more queries → more data → better results → more queries]
Every Google service sees continuing growth in computational needs:
• More queries – more users, happier users
• More data – bigger web, mailbox, blog, etc.
• Better results – find the right information, and find it faster
Overview
• Mission Statement
• Some History
• The Challenge
• Planning for
  • Failure
  • Expansion
• Applications
• Infrastructure
• Hardware
• The Future
When Your Data Center Reaches 170° F
The Joys of Real Hardware
Typical first year for a new cluster:
~0.5 overheating (power down most machines in <5 mins, ~1-2 days to recover)
~1 PDU failure (~500-1000 machines suddenly disappear, ~6 hours to come back)
~1 rack-move (plenty of warning, ~500-1000 machines powered down, ~6 hours)
~1 network rewiring (rolling ~5% of machines down over 2-day span)
~20 rack failures (40-80 machines instantly disappear, 1-6 hours to get back)
~5 racks go wonky (40-80 machines see 50% packet loss)
~8 network maintenances (4 might cause ~30-minute random connectivity losses)
~12 router reloads (takes out DNS and external VIPs for a couple minutes)
~3 router failures (have to immediately pull traffic for an hour)
~dozens of minor 30-second blips for DNS
~1000 individual machine failures
~thousands of hard drive failures
plus slow disks, bad memory, misconfigured machines, flaky machines, etc.
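A quick, illustrative calculation shows why software must assume failure: at ~1000 individual machine failures per year in one cluster, virtually any long-running job will overlap with at least one failure. The figures below are a sketch from the slide's numbers, not a Google-published model:

```python
# With ~1000 machine failures/year cluster-wide, what is the chance a
# 24-hour job sees at least one failure somewhere in the cluster?
# (Poisson approximation; illustrative only.)
import math

failures_per_year = 1000
rate_per_hour = failures_per_year / (365 * 24)  # ~0.11 failures/hour
p_any = 1 - math.exp(-rate_per_hour * 24)       # at least one in 24h
print(f"{p_any:.0%}")  # well above 90%
```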
Overview
• Mission Statement
• Some History
• The Challenge
• Planning for
  • Failure
  • Expansion
• Applications
• Infrastructure
• Hardware
• The Future
Components of Web Search
Crawler (Spider): collects the documents
• Tradeoff between size and speed
• High networking bandwidth requirements
• Be gentle to serving hosts while doing it
Indexer: generates the index – similar to the back of a book (but big!)
• Requires several days on thousands of computers
• More than 20 billion web documents (Web, Images, News, Usenet messages, …)
• Pre-compute query-independent ranking (PageRank, etc.)
Query serving: processes user queries
• Finding all relevant documents – search over tens of Terabytes, 1000s of times/second
• Scoring – a mix of query-dependent and query-independent factors
[Diagram: the crawling process – get a link from the list of links to explore, fetch the page, parse the page to extract links, and add new URLs to the queue; expired pages from the index re-enter the list]
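The crawl loop in the diagram can be sketched in a few lines. This is a toy illustration of the queue-based cycle, not Google's crawler; `fetch` and `extract_links` are caller-supplied stand-ins:

```python
# A minimal sketch of the crawl loop: take a URL from the queue, fetch
# the page, parse out links, and queue any URLs not yet seen.
from collections import deque

def crawl(seed_urls, fetch, extract_links, limit=100):
    queue = deque(seed_urls)   # "list of links to explore"
    seen = set(seed_urls)
    pages = {}
    while queue and len(pages) < limit:
        url = queue.popleft()  # get link from list
        page = fetch(url)      # fetch page (be gentle to serving hosts)
        pages[url] = page
        for link in extract_links(page):  # parse page to extract links
            if link not in seen:
                seen.add(link)
                queue.append(link)        # add URL to queue
    return pages

# Usage on a three-page fake web:
web = {"a": ["b", "c"], "b": ["c", "a"], "c": []}
pages = crawl(["a"], fetch=web.get, extract_links=lambda page: page)
print(sorted(pages))  # ['a', 'b', 'c']
```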
Google Query Serving Infrastructure
[Diagram: a query enters the Google Web Server, which also consults the spell checker, ad server, and misc. servers; index servers hold index shards I0…IN and doc servers hold doc shards D0…DM, each replicated across many rows of machines]
Elapsed time: 0.25s, machines involved: 1000+
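The sharded serving pattern in the diagram is scatter/gather: fan the query out to every index shard, then merge the per-shard results. The sketch below illustrates that pattern only; shard contents and scores are invented, and real serving runs the shards in parallel across replicas:

```python
# Toy scatter/gather over index shards: each shard scores its slice of
# the corpus, and the web server merges the top hits.
def search(query, shards, top_k=3):
    hits = []
    for shard in shards:                   # in production: in parallel
        hits.extend(shard.get(query, []))  # (doc_id, score) pairs
    hits.sort(key=lambda h: h[1], reverse=True)
    return [doc for doc, score in hits[:top_k]]

shards = [
    {"tires": [("doc1", 0.9), ("doc4", 0.2)]},
    {"tires": [("doc7", 0.5)]},
    {"tires": [("doc9", 0.8)]},
]
print(search("tires", shards))  # ['doc1', 'doc9', 'doc7']
```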
Ads System
As challenging as search
• But with some transactional semantics
Problem: find useful ads based on what the user is interested in at that moment
• A form of mind reading
Two systems:
• Ads for search results pages (search for tires or restaurants)
• Ads for web browsing/email (or 'content ads')
  o Extract a contextual meaning from web pages
  o Do the same thing for data from a gazillion advertisers
  o Match those up and score them
  o Do it faster than the original content provider can respond to the web page!
Example: Sunday NY Times
Language Translation (by Machine)
Information is more useful if more people can understand it
Translation is a long-standing, challenging Artificial Intelligence problem
Key insight:
• Transform it into a statistical modeling problem
• Train it with tons of data!
[Chart: Arabic-English and Chinese-English translation quality – doubling the training corpus size yields a ~0.5% higher score]
Data + CPUs = Playground
Substantial fraction of internet available for processing
Easy-to-use teraflops/petabytes
Cool problems, great fun…
Learning From Data: Searching for Britney Spears…
Query Frequency Over Time
[Charts: query frequency over time for queries containing "eclipse", "full moon", "watermelon", "opteron", "summer olympics", and "world series"]
WhiteHouse.gov/openforquestions
A Simple Challenge For Our Computing Platform
1. Create the world’s largest computing infrastructure
2. Make sure we can afford it
Need to drive efficiency of the computing infrastructure to unprecedented levels:
• indices containing more documents
• updated more often
• faster queries
• faster product development cycles
• …
Overview
• Mission Statement
• Some History
• The Challenge
• Planning for
  • Failure
  • Expansion
• Applications
• Infrastructure
• Hardware
• The Future
Systems Infrastructure
Google File System (GFS)
MapReduce
BigTable
GFS: Google File System
Planning – for unprecedented quantities of data storage & failure(s)
Google has unique FS requirements:
• Huge read/write bandwidth
• Reliability over thousands of nodes
• Mostly operating on large data blocks
• Need efficient distributed operations
GFS usage @ Google:
• Many clusters
• Filesystem clusters of up to 5000+ machines
• Pools of 10000+ clients
• 5+ PB filesystems
• 40 GB/s read/write load in a single cluster (in the presence of frequent HW failures)
GFS Setup
• Master manages metadata
• Data transfers happen directly between clients/machines
[Diagram: clients and misc. servers contact replicated GFS masters for metadata; data chunks (C0, C1, C2, C3, C5) are stored with replicas spread across Machine 1, Machine 2, … Machine N]
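The control/data split above can be sketched in a toy form: the client asks a master only for metadata (which machines hold which chunks), then reads chunk bytes directly from those machines. All class and chunk names below are illustrative, not GFS internals:

```python
# Sketch of the GFS read path: metadata from the master, data direct
# from chunk servers.
class Master:
    """Metadata only: filename -> list of (chunk_id, replica machines)."""
    def __init__(self, chunk_map):
        self.chunk_map = chunk_map

    def locate(self, filename):
        return self.chunk_map[filename]

class ChunkServer:
    """Holds the actual chunk bytes."""
    def __init__(self, chunks):
        self.chunks = chunks  # chunk_id -> bytes

    def read(self, chunk_id):
        return self.chunks[chunk_id]

def read_file(master, servers, filename):
    data = b""
    for chunk_id, replicas in master.locate(filename):
        data += servers[replicas[0]].read(chunk_id)  # bypasses the master
    return data

# Usage: a two-chunk file spread over two machines.
servers = {"m1": ChunkServer({"C0": b"hel"}), "m2": ChunkServer({"C1": b"lo"})}
master = Master({"/logs/day1": [("C0", ["m1"]), ("C1", ["m2"])]})
print(read_file(master, servers, "/logs/day1"))  # b'hello'
```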
MapReduce – Large Scale Processing
Okay, GFS lets us store lots of data… now what?
We need to process that data in new and interesting ways!
• Fast: locality optimization, optimized sorter, lots of tuning work done...
• Robust: handles machine failure, bad records, …
• Easy to use: little boilerplate, supports many formats, …
• Scalable: can easily add more machines to handle more data or reduce the run-time
• Widely applicable: can solve a broad range of problems
• Monitoring: status page, counters, …
The Plan – Develop a robust compute infrastructure that allows rapid development of complex analyses, and is tolerant to failure(s)
MapReduce – Large Scale Processing
MapReduce:
• A framework to simplify large-scale computations on large clusters
• Good for batch operations
• User writes two simple functions: map and reduce
• Underlying library/framework takes care of messy details
• Greatly simplifies large, distributed data processing
Lots of uses inside Google: Sawmill (logs analysis), Search My History, search quality, spelling, web search indexing, Ads, Froogle, Google Earth, Google Local, Google News, Google Print, machine translation, …many other internal projects
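The two user-supplied functions can be shown with the canonical word-count example. This single-process sketch only illustrates the programming model; the real library handles distribution, shuffling, and machine failure across thousands of nodes:

```python
# Word count in the MapReduce style: user writes map and reduce, the
# framework groups intermediate values by key in between.
from collections import defaultdict

def map_fn(doc):              # user code: emit (word, 1) for each word
    for word in doc.split():
        yield word, 1

def reduce_fn(word, counts):  # user code: sum the counts per word
    return word, sum(counts)

def mapreduce(docs, map_fn, reduce_fn):
    groups = defaultdict(list)
    for doc in docs:                          # "map" phase
        for key, value in map_fn(doc):
            groups[key].append(value)
    return dict(reduce_fn(k, v) for k, v in groups.items())  # "reduce"

print(mapreduce(["to be or not to be"], map_fn, reduce_fn))
# {'to': 2, 'be': 2, 'or': 1, 'not': 1}
```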
Large Scale Processing – (semi) Structured Data
Okay, traditional relational databases are woefully inadequate at this scale… now what?
Why not just use a commercial DB?
• Scale is too large for most commercial databases
• Even if it weren't, cost would be very high
• Building internally means the system can be applied across many projects for low incremental cost
• Low-level storage optimizations help performance significantly – much harder to do when running on top of a database layer
The Plan – Build a large-scale, distributed solution for semi-structured data that is resistant to failure(s)
Large Scale Processing – (semi) Structured Data
BigTable:
• A large-scale storage system for semi-structured data
• Database-like model, but data stored on thousands of machines
• Fault-tolerant, persistent
• Scalable:
  o Thousands of servers
  o Terabytes of in-memory data
  o Petabytes of disk-based data
  o Millions of reads/writes per second, efficient scans
  o Billions of URLs, many versions/page (~20K/version)
  o Hundreds of millions of users, thousands of queries/sec
  o 100TB+ of satellite image data
• Self-managing:
  o Servers can be added/removed dynamically
  o Servers adjust to load imbalance
• Design/initial implementation started beginning of 2004
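BigTable's data model is often described as a sparse, sorted, multi-dimensional map keyed by (row, column, timestamp). The toy class below illustrates just that model with multi-version cells; it is a sketch, not the real system, and the example row/column names are invented:

```python
# Toy BigTable-style cell store: (row, column) -> timestamped versions,
# newest first; reads return the latest version.
import time

class TinyTable:
    def __init__(self):
        self.cells = {}  # (row, column) -> [(timestamp, value), ...]

    def put(self, row, column, value, ts=None):
        ts = ts if ts is not None else time.time()
        versions = self.cells.setdefault((row, column), [])
        versions.append((ts, value))
        versions.sort(reverse=True)        # newest version first

    def get(self, row, column):
        versions = self.cells.get((row, column))
        return versions[0][1] if versions else None  # latest version

t = TinyTable()
t.put("com.example/index", "contents:", "<html>v1</html>", ts=1)
t.put("com.example/index", "contents:", "<html>v2</html>", ts=2)
print(t.get("com.example/index", "contents:"))  # <html>v2</html>
```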
BigTable Usage
Useful for structured/semi-structured data:
• URLs – contents, crawl metadata, links, anchors, PageRank, …
• Per-user data – user preference settings, recent queries/search results, …
• Geographic data – physical entities, roads, satellite imagery, annotations, …
Production use or active development for ~70 projects: Google Print, My Search History, Orkut, crawling/indexing pipeline, Google Maps/Google Earth, Blogger, …
Currently ~500 BigTable cells; the largest cell manages ~3000TB of data spread over several thousand machines (larger cells planned)
Overview
• Mission Statement
• Some History
• The Challenge
• Planning for
  • Failure
  • Expansion
• Applications
• Infrastructure
• Hardware
• The Future
A Simple Challenge For Our Computing Platform
1. Create the world’s largest computing infrastructure
2. Make sure we can afford it
Need to drive efficiency of the computing infrastructure to unprecedented levels:
• indices containing more documents
• updated more often
• faster queries
• faster product development cycles
• …
Innovative Solutions Needed In Several Areas
Server design and architecture
Power efficiency
System software
Large scale networking
Performance tuning and optimization
System management and repairs automation
Pictorial History
• Brainstorming circa 2003
• Container-based data centers
• Battery per server instead of traditional UPS
  o 99.9% efficient backup power!
• Application of best practices leads to PUE below 1.2
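PUE (Power Usage Effectiveness) is total facility power divided by IT load, so "below 1.2" means less than 20% overhead on top of the computers themselves. The numbers below are illustrative, chosen only to show the arithmetic:

```python
# PUE = total facility power / IT load. A facility drawing 11.8 MW to
# deliver 10 MW of IT load has PUE 1.18 -- below the 1.2 mark.
it_load_mw = 10.0
overhead_mw = 1.8   # cooling, power conversion, lighting (assumed figure)
pue = (it_load_mw + overhead_mw) / it_load_mw
print(round(pue, 2))  # 1.18
```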
Pictorial History
Prototype arriving at Google, Jan 2005
Pictorial History
The first crane was too small -- Take 2
Pictorial History
Google prototypes first airborne data center
Pictorial History
And into the parking garage we go
Data Center Vitals
• Capacity: 10 MW IT load
• Area: 75,000 sq ft total under roof
• Overall power density: 133W/sq ft
• Prototype container delivered January 2005
• Data center built 2004-2005
• Construction completed September, 2005
• Went live November 21, 2005
Additional Vitals
• 45 containers, approx. 40,000 servers
• Single and 2-story on facing sides of hangar
• Bridge crane for container handling
Overview
• Mission Statement
• Some History
• The Challenge
• Planning for
  • Failure
  • Expansion
• Applications
• Infrastructure
• Hardware
• The Future
Planning for the Future
• Manage Total Cost of Ownership
• Reduce Water Usage
• Reduce Power Consumption
• Manage E-Waste
Total Cost of Ownership - TCO
Earnings and sustainability are (often) aligned
• Careful application of best practices leads to much lower energy use, which leads to lower TCO for facilities. Examples:
  o Manage air flow – avoid hot/cold mixing
  o Raise the inlet temperature
  o Use free cooling (Belgium has no chillers!)
  o Optimize power distribution
• Don't need exotic technologies
• But: need to break down traditional silos
  o Between capex and opex
  o Between facilities and IT
  o Manage everyone by impact on TCO
Water resources management is the next "elephant in the room" we are all going to have to address.
[Photos: Lake Powell at 53% full; Shasta Lake (from ESPN!)]
A Great Wave Rising: The coming U.S. crisis in water policy
[Photos: Lake Oroville's new docks; Lake Mead historical levels – now 45% full]
Lake Mead water could dry up by 2021*
* Scripps Institution of Oceanography, UCSD, Feb 2008
Georgia's Lake Lanier
[Photos: March 4, 2007 vs. February 11, 2008]
Lake Hartwell, GA – November 2008
Water – The Next "Big Elephant"
Why?
• Water resources are becoming (a lot) scarcer and more variable
How do data centers fit in?
• For every 10 MW consumed, the average data center uses ~150,000 gallons of water per day for cooling.
• Upstream of the data center, the same 10 MW of delivered power consumes 480,000 gallons of water per day to generate that power.
References:
• U.S. Dept. of Energy – Energy Demands on Water Resources – Dec. 2006
• National Renewable Energy Laboratory – Consumptive Water Use for U.S. Power Production – Dec. 2003
• USGS – Water Use at Home – Jan. 2009
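Putting the slide's two figures on a per-kilowatt-hour basis makes them easier to compare; this is just the slide's own numbers rearranged:

```python
# 10 MW running for a day is 240,000 kWh, so the quoted gallons/day
# translate to gallons per kWh as follows.
kwh_per_day = 10_000 * 24    # 10 MW for 24 hours
onsite_gpd = 150_000         # cooling water at the data center
upstream_gpd = 480_000       # water consumed generating the power
print(onsite_gpd / kwh_per_day)    # 0.625 gallons/kWh on site
print(upstream_gpd / kwh_per_day)  # 2.0 gallons/kWh upstream
```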
Factoid: The typical 'water-less' DC uses about a third more water than the evaporatively cooled Google DC
Using less power is the most significant factor for reducing water consumption
[Chart: water consumption (gpd) by DC type]
Water Recycling: our data center in St. Ghislain, Belgium
Google's data center in Belgium uses 100% reclaimed water from an industrial canal
Power – Cutting Waste / Smarter Computing
Fact: The typical PC wastes half the electricity it uses
Fact: Over 60% of all corporate PCs are left on overnight
• End-user devices are the largest portion of the IT footprint
• Power efficiency is critical as billions of devices are deployed
• The technology exists today to save energy and money
  o Buy power-efficient laptops / PCs / servers – Google saves $30 per server every year
  o Enable power management – power management suites: ROI < 1 year
  o Transition to lightweight devices – reduce power from 150W to less than 5W
Potential: 50% emissions reduction
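The 150W-to-5W bullet is easy to quantify. The always-on duty cycle and electricity price below are assumptions for illustration, not figures from the talk:

```python
# Savings from replacing a 150 W device with a 5 W lightweight device,
# assuming it runs 24/7 at an assumed $0.10/kWh.
watts_saved = 150 - 5
kwh_per_year = watts_saved * 24 * 365 / 1000
print(round(kwh_per_year))            # ~1270 kWh/year per device
print(round(kwh_per_year * 0.10, 2))  # ~$127/year at $0.10/kWh
```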
E-waste is a Growing Problem
• Hazardous
• High volume because of obsolescence
• Ubiquitous (computers, appliances, consumer electronics, cell phones)
Solutions:
• 4 R's: Reduce, reuse, repair, recycle
• Dispose of remainder responsibly
Thank you!