Open Talk Series
presents
A series of illuminating talks and
interactions that open our minds to new
ideas and concepts; that make us look for
newer or better ways of doing what we
did; or point us to exciting things we have
never done before. A range of topics on
Technology, Business, Fun and Life.
Be part of the learning experience at Aditi.
Join the talks. It's free. Free as in freedom at work, not free beer.
It's not training. It's a mind-opener.
Speak at these events. Or bring an
expert/friend to talk.
Mail [email protected] with topic and
availability.
Usually at 4.30 PM on Wednesdays.
Learning and Development
HOW TO ENJOY A TALK
Switch OFF mobile Switch ON mind
Sign attendance sheet
Bring coffee & friends
THANK the Talker
SHARE your wisdom QUESTION notions
SPREAD the good word
Aditi Technologies | Partnering Innovation
New Champion
Sahil Sagar
Agenda
• We are not talking about the crawler
• No discussion on PageRank… maybe?
The art of scale
10-50 users | 100-500 users | 500-10000 users
Scale ????
Largest Linux Base
800,000 Machines
• What gives us this scale?
Good Code?
More servers?
Powerful Servers?
• Let's see what gives Google the scale
[Architecture diagram: the Google stack]
Bands: The apps on top of it; The Secret Sauce; Infrastructure
Boxes shown:
• SERVER HARDWARE; RACK; DC
• RHEL 2.6.X PAE
• INTERIOR NETWORK IPv6; Exterior Network; GWQ
• GFS / GFS II
• BigTable; Mapreduce; Chubby Lock
• Python, Java, C++, Sawzall, other
• GOOGLE APP ENGINE (Python, Java, C++)
• GOOGLE APPS (SEARCH, INDEX, CRAWL, GMAIL...)
Scale in Google
1. The first touch
2. Size does matter
3. The Safe
4. Operating System Implementation
5. Interior Network Architecture
The first touch to the services
The first touch to the service
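[Diagram: components on the request path, listed from the cell out to the client]
• Cell: Interior Network, GFS II etc.
• Firewall: 80/443
• NetScaler: HTTP multiplexing
• Squid: Reverse Proxy
• GWS: Web Server Farm
• Firewall: DMZ / Perimeter
• Client Browser: 80/443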
The touch is not always real
• Uses Squid Reverse Proxy
• Perimeter cache hit rates of 30-60% = Huge!
• Hit rate depends on search complexity, user preferences and traffic type
• All image thumbnails cached, much multimedia cached
• Expensive common queries (common words like 'Obama') are cached, as they require significant back-end processing
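The effect of a perimeter cache can be sketched in a few lines. This is only an illustration that assumes a simple least-recently-used eviction policy; the names PerimeterCache and slow_search and the capacity of 2 are hypothetical, and nothing here reflects how Squid itself is configured or implemented.

```python
from collections import OrderedDict


class PerimeterCache:
    """Toy LRU cache that sits in front of an expensive backend call."""

    def __init__(self, backend, capacity=2):
        self.backend = backend          # hypothetical expensive function
        self.capacity = capacity        # maximum number of cached responses
        self.store = OrderedDict()
        self.hits = 0
        self.misses = 0

    def get(self, query):
        if query in self.store:
            self.hits += 1
            self.store.move_to_end(query)   # mark as most recently used
            return self.store[query]
        self.misses += 1
        response = self.backend(query)
        self.store[query] = response
        if len(self.store) > self.capacity:
            self.store.popitem(last=False)  # evict the least recently used entry
        return response

    def hit_rate(self):
        total = self.hits + self.misses
        return self.hits / total if total else 0.0


def slow_search(query):
    # Stand-in for significant back-end processing.
    return "results for " + query


cache = PerimeterCache(slow_search, capacity=2)
for q in ["obama", "obama", "linux", "obama", "gfs", "linux"]:
    cache.get(q)
print(f"hit rate: {cache.hit_rate():.0%}")
```

Every hit is a request that never reaches the back end, which is why a 30-60% hit rate at the perimeter matters so much.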
Size does matter
Worldwide Data Centres
The last estimate was 36 Data Centres, 300+ GFS II Clusters and upwards of 800K machines.
The Modular Data Centre
A standard Google Modular DC (Cell) holds 1160 servers in 30 racks (40U) and consumes 250 kW of power.
This is the "atomic" Data Centre building block of Google.
A Data Centre would consist of 100s of Modular Cells.
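A quick back-of-the-envelope check of these figures, as a sketch only. It assumes the whole 250 kW is spread evenly over the servers (it may well include cooling and networking) and reuses the "upwards of 800K machines" estimate quoted earlier in the talk.

```python
SERVERS_PER_CELL = 1160
POWER_PER_CELL_KW = 250
RACKS_PER_CELL = 30
TOTAL_MACHINES = 800_000   # "upwards of 800K machines", an estimate

# Assumption: all of the cell's power budget goes to the servers.
watts_per_server = POWER_PER_CELL_KW * 1000 / SERVERS_PER_CELL
servers_per_rack = SERVERS_PER_CELL / RACKS_PER_CELL
cells_needed = TOTAL_MACHINES / SERVERS_PER_CELL

print(f"{watts_per_server:.0f} W per server")
print(f"{servers_per_rack:.1f} servers per rack")
print(f"{cells_needed:.0f} cells for {TOTAL_MACHINES} machines")
```

Under these assumptions the numbers work out to roughly 216 W per server, just under 39 servers per 40U rack, and about 690 cells in total.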
THE Safe
How is a server stored in the Data Centre?
Google Rack (GOOG rack)
EVERYTHING custom!
• Optimized Motherboards
• Have their own HW builds
• Build redundancy on top of failure
• Motherboard directly mounted into Rack
• Servers have no casing - just bare boards
• This helps with heat dispersal
THE OPERATING SYSTEM
The Core Software on each of those servers
OPERATING SYSTEM
- 100% Red Hat Linux based since the 1998 inception - RHEL
- 2.6.X Kernel - PAE
- Custom glibc.. rpc... ipvs...
- Custom FS (GFS II)
- Custom Kerberos
- Custom NFS
- Custom CUPS
- Custom gPXE bootloader
- Custom EVERYTHING.....
Kernel/Subsystem Modifications
- tcmalloc – replaces glibc 2.3 malloc – much faster! Works very well with threads...
- rpc – the rpc layer extensively modified to provide higher performance and lower latency (52%/40%)
- Significantly modified Kernel and Subsystems – all IPv6 enabled
THE Secret Sauce
Section II – Google's Major Glue
1. Google File System Architecture – GFS II
2. Google Database - Bigtable
3. Google Computation - Mapreduce
GOOGLE FILE SYSTEM
Manages the underlying Data on behalf of the upper layers and ultimately the applications
GFS versus NFS
Network File System (NFS)
• Single machine makes part of its file system available to other machines
• Sequential or random access
• PRO: Simplicity, generality, transparency
• CON: Storage capacity and throughput limited by single server
Google File System (GFS)
• Single virtual file system spread over many machines
• Optimized for sequential read and local accesses
• PRO: High throughput, high capacity
• "CON": Specialized for particular types of applications
University of Pennsylvania
FILE SYSTEM I – GFS II
• Elegant Master Failover
• Chunk size is now 1MB
• Only ever lost one 64MB chunk (in GFS I) during its entire production deployment, so it is assumed to be extremely reliable
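To see what the chunk size change means in practice, here is a small sketch of how a byte offset in a file maps to a chunk. It assumes fixed-size chunks and binary megabytes; the helper names locate and chunk_count are made up for illustration and do not describe the real GFS master's metadata lookup.

```python
MB = 1024 * 1024


def locate(offset, chunk_size):
    """Return (chunk index, offset inside that chunk) for a byte offset."""
    return divmod(offset, chunk_size)


def chunk_count(file_size, chunk_size):
    """Number of chunks needed to hold file_size bytes (ceiling division)."""
    return -(-file_size // chunk_size)


file_size = 1024 * MB                      # a hypothetical 1 GB file
for chunk_size in (64 * MB, 1 * MB):
    index, inner = locate(200 * MB + 5, chunk_size)
    print(chunk_size // MB, "MB chunks:",
          chunk_count(file_size, chunk_size), "chunks in total;",
          "byte 200MB+5 is in chunk", index, "at offset", inner)
```

Smaller chunks mean many more chunks per file, so more metadata to track, but each chunk is cheaper to move or re-replicate.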
CAP Theorem (Brewer's theorem)
• Consistency: All nodes see the same data at the same time
• Availability: Node failures do not prevent survivors from continuing to operate
• Partition tolerance: The system continues to operate despite arbitrary message loss
GOOGLE DATABASE
Accesses the underlying Data on behalf of the upper layers and ultimately the applications
Why not commercial DB?
• Scale is too large for most commercial databases
• Cost would be very high
– Building internally means the system can be applied across many projects for low incremental cost
• Low-level storage optimizations help performance significantly
– Much harder to do when running on top of a database layer
“Also fun and challenging to build large-scale systems”
BigTable
• A distributed storage system for managing structured data.
• Scalable
– Thousands of servers
– Terabytes of in-memory data
– Petabyte of disk-based data
– Millions of reads/writes per second, efficient scans
• Self-managing
– Servers can be added/removed dynamically
– Servers adjust to load imbalance
• Used for many Google projects
– Web indexing, Personalized Search, Google Earth, Google Analytics, Google Finance, …
BigTable
• Physically sorted on row-key – like a row-store
• Column families - like column-stores
• Variable (record-by-record) columns within a column family
• Column-values versioned; stored in reverse chronological order
• Designed to store hyperlink structure of web
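The data model described in these bullets can be mimicked with a toy in-memory table. This is a sketch under stated assumptions: ToyTable and its methods are hypothetical names, columns are written as 'family:qualifier' strings, and the row keys are illustrative reversed host names. It is not Bigtable's API.

```python
import bisect


class ToyTable:
    """In-memory toy: sorted row keys, 'family:qualifier' columns, versioned cells."""

    def __init__(self):
        self.row_keys = []      # kept sorted, like the physical row order
        self.rows = {}          # row key -> {column -> [(timestamp, value), ...]}

    def put(self, row, column, timestamp, value):
        if row not in self.rows:
            bisect.insort(self.row_keys, row)
            self.rows[row] = {}
        versions = self.rows[row].setdefault(column, [])
        versions.append((timestamp, value))
        versions.sort(key=lambda tv: tv[0], reverse=True)   # newest first

    def get(self, row, column):
        """Return the most recent value, or None if the cell is absent."""
        versions = self.rows.get(row, {}).get(column)
        return versions[0][1] if versions else None

    def scan(self, start, end):
        """Return the row keys with start <= key < end, in sorted order."""
        lo = bisect.bisect_left(self.row_keys, start)
        hi = bisect.bisect_left(self.row_keys, end)
        return self.row_keys[lo:hi]


t = ToyTable()
t.put("com.cnn.www", "contents:", 1, "<html>v1")
t.put("com.cnn.www", "contents:", 2, "<html>v2")
t.put("com.cnn.www", "anchor:cnnsi.com", 1, "CNN")   # a column only this row has
t.put("com.bbc.www", "contents:", 1, "<html>bbc")
print(t.get("com.cnn.www", "contents:"))
print(t.scan("com.b", "com.d"))
```

Because rows are kept sorted by key, a range scan is just a slice, and each row is free to carry its own set of columns within a family.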
GOOGLE MAPREDUCE
Computes the underlying Data on behalf of the applications
Mapreduce I
Map Reduction can be seen as a way to exploit massive parallelism by breaking a task down into constituent parts and executing them on multiple processors.
The major functions are MAP and REDUCE (with a number of intermediate steps):
• MAP: break the task down into parallel steps
• REDUCE: combine the results into the final output
Shown is a 2-pipeline Map Reduction (there are 24 Map Reductions in the indexing pipeline).
Mappers and Reducers usually run on separate processors (even with a 90% loss of reducers, the job still completed!).
Word-Count using MapReduce
Problem: determine the frequency of each word in a large document collection
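A single-process sketch of the word-count problem, written in the MAP / intermediate step / REDUCE shape. The function names and the three sample documents are made up for illustration; a real MapReduce run would spread the map and reduce calls over many machines, which this toy does not attempt.

```python
from collections import defaultdict


def map_phase(document):
    """MAP: emit a (word, 1) pair for every word in one document."""
    for word in document.lower().split():
        yield word, 1


def shuffle(pairs):
    """Intermediate step: group all emitted values by key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(word, counts):
    """REDUCE: combine the values for one word into a single total."""
    return word, sum(counts)


def word_count(documents):
    pairs = (pair for doc in documents for pair in map_phase(doc))
    return dict(reduce_phase(w, c) for w, c in shuffle(pairs).items())


docs = ["the quick brown fox", "the lazy dog", "The fox"]
counts = word_count(docs)
print(counts)
```

Each call to map_phase depends only on its own document and each call to reduce_phase only on its own word, which is what lets the work be split across processors.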
What runs on top of all this
PageRank: Intuition
• Imagine a contest for The Web's Best Page
– Initially, each page has one vote
– Each page votes for all the pages it has a link to
– To ensure fairness, pages voting for more than one page must split their vote equally between them
– Voting proceeds in rounds; in each round, each page has the number of votes it received in the previous round
– In practice, it's a little more complicated - but not much!
[Figure: example link graph with pages A to J]
Shouldn't E's vote be worth more than F's?
How many levels should we consider?
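The voting contest can be played out directly. The links of the slide's A to J graph are not recoverable here, so this sketch uses a hypothetical three-page web in which every page links to at least one other page; vote_round is an illustrative name.

```python
def vote_round(links, votes):
    """One round: each page splits its current votes equally among the pages it links to."""
    received = {page: 0.0 for page in links}
    for page, targets in links.items():
        share = votes[page] / len(targets)
        for target in targets:
            received[target] += share
    return received


# Hypothetical three-page web; every page links to at least one other page.
links = {"A": ["B", "C"], "B": ["C"], "C": ["A"]}
votes = {page: 1.0 for page in links}   # initially, each page has one vote
for _ in range(50):
    votes = vote_round(links, votes)
print({page: round(v, 3) for page, v in votes.items()})
```

Running many rounds is one answer to "how many levels should we consider?": a vote passed on by a well-voted page automatically counts for more in the next round.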
Random Surfer Model
• PageRank has an intuitive basis in random walks on graphs
• Imagine a random surfer, who starts on a random page and, in each step,
– with probability d, clicks on a random link on the page
– with probability 1-d, jumps to a random page (bored?)
• The PageRank of a page can be interpreted as the fraction of steps the surfer spends on the corresponding page
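A minimal sketch of the random surfer model as a repeated update. It assumes d = 0.85, a value commonly quoted for PageRank but not given on the slide, a hypothetical three-page web, and that every page has at least one outgoing link; it is not a description of Google's production ranking.

```python
def pagerank(links, d=0.85, rounds=100):
    """Repeated update for the random-surfer model.

    links maps each page to the list of pages it links to; every page is
    assumed to have at least one outgoing link (no dangling pages).
    """
    n = len(links)
    rank = {page: 1.0 / n for page in links}
    for _ in range(rounds):
        # With probability 1-d the surfer jumps to a random page.
        new_rank = {page: (1.0 - d) / n for page in links}
        for page, targets in links.items():
            # With probability d the surfer clicks a random link on the page.
            share = d * rank[page] / len(targets)
            for target in targets:
                new_rank[target] += share
        rank = new_rank
    return rank


links = {"A": ["B", "C"], "B": ["C"], "C": ["A"]}
rank = pagerank(links)
print({page: round(r, 3) for page, r in rank.items()})
```

The ranks sum to 1, so each one can be read as the fraction of steps the surfer spends on that page.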
BUILD YOUR OWN GOOGLE
The Basic Open Source Tools
The Google Stack (vs Yahoo'ish/Open Source)
Conceptual Overview: Google vs. Open Source
Google Architecture | Open Source (Yahoo'ish) Architecture
Exterior Network | Exterior Network
GWQ | Job Tracker
GOOGLE APPS (SEARCH, INDEX, CRAWL, GMAIL...) | CLIENT APPLICATION
Python, Java, C++, Sawzall, other | Pig Latin, Python, PHP, Java .... anything
Mapreduce | Hadoop Framework Mapreduce
BigTable | Hbase (Bigtable equiv.)
Chubby Lock | (none shown)
GFS / GFS II | HDFS (hadoop)
INTERIOR NETWORK IPv6 | INTERIOR NETWORK IPv6
RHEL 2.6.X PAE | CentOS 2.6.X PAE
SERVER HARDWARE | SERVER HARDWARE
RACK | RACK
DC | DC
Labels on the diagram: Google's Secret Sauce (Google side); Hadoop (open source side)
Google side also shows APP ENGINE: BigTable; Python, Java, C++; Task Queue
Open Source: other tools such as crawlers and indexers are readily available
END
(Thank you)
Pre-Presentation: The Google Philosophy (according to Ed)
• Jedis build their own lightsabres (the MS Eat your own Dog Food)
• Parallelize Everything
• Distribute Everything (to atomic level if possible)
• Compress Everything (CPU cheaper than bandwidth)
• Secure Everything (you can never be too paranoid)
• Cache (almost) Everything
• Redundantize Everything (in triplicate usually)
• Latency is VERY evil
The Anatomy of the Google Architecture: "The Unofficial Version"
V1.0 November 2009
• Ed Austin • {ed, edik} @i-dot.com
Special Thanks to ….
Keep Learning
For any suggestions on topics, feedback etc., contact [email protected]