BigTable - courses.cs.washington.edu
BigTable
CS 452
Announcements
Thursday’s sections are help sessions for Lab 3
No class on Friday so that you can focus on Lab 3
BigTable
In the early 2000s, Google had way more data than anybody else did
Traditional databases couldn’t scale
Want something better than a filesystem (GFS)
BigTable optimized for:
- Lots of data, large infrastructure
- Relatively simple queries
Relies on Chubby, GFS
Chubby
Distributed coordination service
Goal: allow client applications to synchronize and manage dynamic configuration state
Intuition: only some parts of an app need consensus!
- Lab 2: Highly available view service
- Master election in a distributed FS (e.g. GFS)
- Metadata for sharded services
Implementation: (Multi-)Paxos SMR
Why Chubby?
Many applications need coordination (locking, metadata, etc).
Every sufficiently complicated distributed system contains an ad-hoc, informally-specified, bug-ridden, slow implementation of Paxos
Paxos is a known good solution
(Multi-)Paxos is hard to implement and use
How to do consensus as a service
Chubby provides:
- Small files
- Locking
- “Sequencers”
Filesystem-like API
- Open, Close, Poison
- GetContents, SetContents, Delete
- Acquire, TryAcquire, Release
- GetSequencer, SetSequencer, CheckSequencer
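The locking and sequencer calls above can be illustrated with a toy, in-memory lock service. This is only a sketch under stated assumptions: `ToyLockService` and its method signatures are hypothetical and are not the real Chubby API, and the "generation counter" is just one simple way to model a sequencer (a token that stops being valid once the lock changes hands).

```python
class ToyLockService:
    """In-memory stand-in for a lock service. Hypothetical, NOT the Chubby API."""

    def __init__(self):
        self._holder = {}      # lock name -> client currently holding it
        self._generation = {}  # lock name -> how many times it has been acquired

    def try_acquire(self, name, client):
        """Take the lock if it is free; return True on success, False otherwise."""
        if name in self._holder:
            return False
        self._holder[name] = client
        self._generation[name] = self._generation.get(name, 0) + 1
        return True

    def release(self, name, client):
        """Release the lock, but only if `client` is the current holder."""
        if self._holder.get(name) == client:
            del self._holder[name]

    def get_sequencer(self, name, client):
        """Return a token describing the lock as currently held by `client`."""
        if self._holder.get(name) != client:
            raise PermissionError("client does not hold the lock")
        return (name, self._generation[name])

    def check_sequencer(self, sequencer):
        """True only if the token refers to the current holding of the lock."""
        name, generation = sequencer
        return name in self._holder and self._generation.get(name) == generation
```

A server that receives a request carrying a stale sequencer can reject it, which protects against a former lock holder that has not yet noticed it lost the lock.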
Back to BigTable
Uninterpreted strings in rows and columns
(r : string) -> (c : string) -> (t : int64) -> string
Mostly schema-less; column “families” for access
Data sorted by row name
- lexicographically close names likely to be nearby
Each piece of data versioned via timestamps
- Either user- or server-generated
- Control garbage-collection
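A minimal sketch of the data model above, assuming a plain in-memory structure: `ToyTable`, its method names, and the "keep the newest N versions" garbage-collection policy are illustrative choices, not BigTable's actual interface.

```python
from collections import defaultdict


class ToyTable:
    """(row: str) -> (column: str) -> (timestamp: int) -> str, kept in memory."""

    def __init__(self, max_versions=3):
        self.max_versions = max_versions
        # row -> column -> list of (timestamp, value), newest first
        self.rows = defaultdict(lambda: defaultdict(list))

    def put(self, row, column, timestamp, value):
        cell = self.rows[row][column]
        cell.append((timestamp, value))
        cell.sort(key=lambda tv: tv[0], reverse=True)
        del cell[self.max_versions:]  # garbage-collect the oldest versions

    def get(self, row, column, timestamp=None):
        """Newest value at or before `timestamp` (newest overall if None)."""
        for ts, value in self.rows.get(row, {}).get(column, []):
            if timestamp is None or ts <= timestamp:
                return value
        return None

    def scan(self, start_row, end_row):
        """Yield row names in [start_row, end_row) in lexicographic order."""
        for row in sorted(self.rows):
            if start_row <= row < end_row:
                yield row
```

Because rows are sorted by name, a range scan over a common prefix touches neighbouring rows, which is why lexicographically close names tend to be stored near each other.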
BigTable components
- Client
- Master
- Tablet servers (six in the diagram)
- GFS
Tablets
Each table composed of one or more tablets
Starts at one, splits once it’s big enough
- Split at row boundaries
Tablets ~100MB-200MB
One tablet: a data, b data, c data, d data
After row e arrives, the tablet splits at a row boundary:
Tablet 1: a data, b data
Tablet 2: c data, d data, e data
Tablets
A tablet is indexed by its range of keys
- <START> - “c”
- “c” - <END>
Each tablet lives on at most one tablet server
Master coordinates assignments of tablets to servers
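Indexing tablets by key range can be sketched with a sorted list and binary search. This is an assumption-laden illustration: `TabletIndex`, the server names, and the choice to index by start key (with the empty string standing in for `<START>`) are all hypothetical.

```python
import bisect


class TabletIndex:
    """Maps a row key to the tablet whose key range contains it."""

    def __init__(self):
        self.start_keys = [""]       # sorted; "" stands in for <START>
        self.servers = ["server-1"]  # servers[i] serves the tablet at start_keys[i]

    def locate(self, row):
        """Return (start_key, server) of the tablet covering `row`."""
        i = bisect.bisect_right(self.start_keys, row) - 1
        return self.start_keys[i], self.servers[i]

    def split(self, split_key, new_server):
        """Split the tablet containing `split_key` at that row boundary."""
        i = bisect.bisect_right(self.start_keys, split_key)
        if self.start_keys[i - 1] == split_key:
            raise ValueError("a tablet already starts at this key")
        self.start_keys.insert(i, split_key)
        self.servers.insert(i, new_server)
```

Each range appears exactly once in the index, which mirrors the rule that a tablet lives on at most one tablet server.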
Tablets
Tablet locations stored in METADATA table
Root tablet stores locations of METADATA tablets
Root tablet location stored in Chubby
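The three-level location hierarchy can be sketched as three chained lookups. Everything concrete here is made up for illustration: the `chubby` dictionary key, the server names, the `table/row` key encoding, and the use of each range's last key as its index entry are assumptions, not details from the slides.

```python
import bisect


def lookup(sorted_entries, key):
    """sorted_entries: list of (last_key_in_range, location), sorted by key.
    Return the location of the first range whose last key is >= key."""
    keys = [k for k, _ in sorted_entries]
    i = bisect.bisect_left(keys, key)
    return sorted_entries[i][1]


# Level 1: Chubby stores where the root tablet lives (hypothetical key name).
chubby = {"root_tablet_location": "tablet-server-2"}

# Level 2: the root tablet maps key ranges to the servers of METADATA tablets.
root_tablet = {
    "tablet-server-2": [("T/m", "tablet-server-1"), ("\xff", "tablet-server-4")],
}

# Level 3: each METADATA tablet maps key ranges to servers of user tablets.
metadata_tablets = {
    "tablet-server-1": [("T/c", "tablet-server-5"), ("T/m", "tablet-server-3")],
    "tablet-server-4": [("\xff", "tablet-server-6")],
}


def locate(table, row):
    """Return the tablet server holding `row` of `table`."""
    key = f"{table}/{row}"
    root_server = chubby["root_tablet_location"]
    meta_server = lookup(root_tablet[root_server], key)
    return lookup(metadata_tablets[meta_server], key)
```

In this toy layout, `locate("T", "h")` goes through tablet server 2 (root tablet), then tablet server 1 (METADATA tablet), and ends at tablet server 3.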
Tablet serving
Tablet data persisted to GFS
- GFS writes replicated to 3 nodes
- One of these nodes should be the tablet server!
Three important data structures:
- memtable: in-memory map
- SSTable: immutable, on-disk map
- Commit log: operation log used for recovery
Tablet serving
Writes go to the commit log, then to the memtable
Reads see a merged view of memtable + SSTables
- Data could be in memtable or on disk
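The write and read paths above can be sketched as follows. `ToyTabletServer` is a hypothetical in-memory model: the commit log is a Python list rather than a file in GFS, SSTables are plain dictionaries, and recovery replays the whole log, which assumes the log only holds writes not yet flushed to an SSTable.

```python
class ToyTabletServer:
    """Toy model of the commit log, memtable, and SSTables for one tablet."""

    def __init__(self):
        self.commit_log = []  # stands in for a log persisted to GFS
        self.memtable = {}    # recent writes, in memory
        self.sstables = []    # immutable maps, oldest first

    def write(self, key, value):
        self.commit_log.append((key, value))  # 1. log first, for recovery
        self.memtable[key] = value            # 2. then apply in memory

    def read(self, key):
        """Merged view: memtable first, then SSTables from newest to oldest."""
        if key in self.memtable:
            return self.memtable[key]
        for sstable in reversed(self.sstables):
            if key in sstable:
                return sstable[key]
        return None

    def recover(self):
        """Rebuild the memtable after a crash by replaying the commit log."""
        self.memtable = {}
        for key, value in self.commit_log:
            self.memtable[key] = value
```

Writing to the log before the memtable is what makes the in-memory state recoverable: anything acknowledged to a client can be replayed.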
Compaction and compression
Memtables spilled to disk once they grow too big
- “minor compaction”: converted to SSTable
Periodically, all SSTables for a tablet compacted
- “major compaction”: many SSTables -> one
Compression: each block of an SSTable compressed
- Can get enormous ratios with text data
- Locality helps—similar web pages in same block
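A rough sketch of the two compactions and of per-block compression. The function names are hypothetical, SSTables are modelled as sorted tuples of `(key, value)` pairs, and `zlib` is only a stand-in codec since the slides do not name the compression scheme.

```python
import zlib


def minor_compaction(memtable, sstables):
    """Freeze the memtable into a new sorted, immutable SSTable."""
    sstables.append(tuple(sorted(memtable.items())))
    memtable.clear()


def major_compaction(sstables):
    """Merge all SSTables into one; newer tables override older ones."""
    merged = {}
    for sstable in sstables:    # oldest first
        merged.update(sstable)  # later (newer) entries overwrite earlier ones
    sstables[:] = [tuple(sorted(merged.items()))]


def compress_blocks(sstable, block_size=2):
    """Compress an SSTable block by block, so a read decompresses one block."""
    blocks = []
    for i in range(0, len(sstable), block_size):
        raw = repr(sstable[i:i + block_size]).encode()
        blocks.append(zlib.compress(raw))
    return blocks
```

Since rows are sorted, pages from the same site land in the same block, and their shared boilerplate is what lets a general-purpose compressor do well.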
Master
Tracks tablet servers (using Chubby)
Assigns tablets to servers
Handles tablet server failures
Master startup
- Acquire master lock in Chubby
- Find live tablet servers (each tablet server writes its identity to a directory in Chubby)
- Communicate with live servers to find out who has which tablet
- Scan METADATA tablets to find unassigned tablets
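The startup steps above can be sketched with plain data structures. This is a hedged illustration only: `master_startup` and its parameters are hypothetical, the Chubby lock is modelled as a dictionary, and the "communicate with live servers" step is replaced by reading a prepared dictionary of what each server reports.

```python
def master_startup(master_lock, live_server_dir, reported_tablets, all_tablets):
    """
    master_lock:      dict such as {"holder": None}, standing in for the Chubby lock
    live_server_dir:  set of server names found in the Chubby directory
    reported_tablets: dict server -> set of tablets that server says it serves
    all_tablets:      set of tablets found by scanning METADATA
    Returns (assignment, unassigned).
    """
    # 1. Acquire the master lock, so there is at most one active master.
    if master_lock["holder"] is not None:
        raise RuntimeError("another master holds the lock")
    master_lock["holder"] = "this-master"

    # 2. Find live tablet servers.
    live = sorted(live_server_dir)

    # 3. Ask each live server which tablets it has.
    assignment = {}
    for server in live:
        for tablet in reported_tablets.get(server, ()):
            assignment[tablet] = server

    # 4. Anything in METADATA that nobody serves is unassigned.
    unassigned = sorted(all_tablets - set(assignment))
    return assignment, unassigned
```

Tablets reported only by servers missing from the Chubby directory are treated as unassigned, since those servers are not considered live.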
Master operation
Detect tablet server failures
- Assign tablets to other servers
Merge tablets (if they fall below a size threshold)
Handle split tablets
- Splits initiated by tablet servers
- Master responsible for assigning new tablet
Clients never read from master
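Reassigning tablets after a tablet server failure can be sketched as below. The function name and the "give each tablet to the least-loaded live server" policy are assumptions made for illustration; the slides do not say how the master picks a target.

```python
def reassign_after_failure(assignment, live_servers, failed_server):
    """Move every tablet of `failed_server` to the live server with the fewest
    tablets (ties broken by server name). Mutates and returns `assignment`."""
    candidates = [s for s in live_servers if s != failed_server]
    if not candidates:
        raise RuntimeError("no live tablet servers")

    # Count how many tablets each surviving server already has.
    load = {s: 0 for s in candidates}
    for server in assignment.values():
        if server in load:
            load[server] += 1

    for tablet in sorted(assignment):
        if assignment[tablet] == failed_server:
            target = min(candidates, key=lambda s: (load[s], s))
            assignment[tablet] = target
            load[target] += 1
    return assignment
```

Because tablet data is persisted to GFS, moving a tablet only changes which server is responsible for it; no data has to be copied off the failed machine.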
BigTable components
Client lookup and read, step by step (diagram: Client, Master, Tablet Servers, GFS):
- Client asks: Where is the root tablet?
- Answer: Tablet server 2
- Client asks: Where is the METADATA tablet for table T row R?
- Answer: Tablet server 1
- Client asks: Where is table T row R?
- Answer: Tablet server 3
- Client sends: Read table T row R
- Answer: Row
Optimizations
Clients cache tablet locations
- Tablet servers only respond if their Chubby session is active, so this is safe
Locality groups
- Put column families that are rarely accessed together in separate SSTables
Smart caching on tablet servers
Bloom filters on SSTables
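A Bloom filter lets a tablet server skip an SSTable that definitely does not contain a key, avoiding a disk read. The sketch below is a generic Bloom filter, not BigTable's implementation: the class name, the sizes, and the use of SHA-256 to derive bit positions are all illustrative assumptions.

```python
import hashlib


class BloomFilter:
    """Set membership with no false negatives and occasional false positives."""

    def __init__(self, num_bits=1024, num_hashes=3):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray(num_bits // 8 + 1)

    def _positions(self, key):
        # Derive num_hashes bit positions by salting the key with an index.
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{key}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, key):
        for pos in self._positions(key):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def might_contain(self, key):
        """False means definitely absent; True means possibly present."""
        return all(
            self.bits[pos // 8] & (1 << (pos % 8))
            for pos in self._positions(key)
        )
```

A read would consult the filter for each SSTable and only open those where `might_contain` returns True.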