Bigtable: A Distributed Storage System
Presenter: Ku. Devyani B. Vaidya
Dr. Panjabrao Deshamukh, Amravati (CO-6G)
Dec 8th, 2011
Bigtable: A Distributed Storage System
1. Introduction
2. What is a Bigtable?
3. Why not a DBMS?
4. Data model: Row, Column, Timestamps
5. APIs
6. Building Blocks
7. Real Applications
8. Conclusion
Introduction
• Bigtable is a distributed storage system for managing structured data.
• Designed to scale to a very large size: petabytes of data across thousands of commodity servers
• Used for many Google projects: Web indexing, Personalized Search, Google Earth, Google Analytics, Google Finance, …
• A flexible, high-performance solution for all of these Google products
What is a Bigtable?
• “A BigTable is a sparse, distributed, persistent multidimensional sorted map. The map is indexed by a row key, a column key, and a timestamp; each value in the map is an uninterpreted array of bytes.”
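The quoted definition can be sketched as a sorted map keyed by (row, column, timestamp). The class below is a hypothetical in-memory illustration, not Bigtable's implementation or client API: the name `SortedMapSketch` and its methods are assumptions. Keys are ordered by row, then column, then decreasing timestamp, so the newest version of a cell is found first.

```python
from bisect import bisect_left, insort
from typing import Dict, Iterator, List, Optional, Tuple

# Internal key: (row, column, -timestamp). Negating the timestamp makes an
# ascending sort put newer versions of the same cell first.
Key = Tuple[str, str, int]
MIN_NEG_TS = -(2**63)  # sorts before any negated 64-bit timestamp


class SortedMapSketch:
    """Hypothetical toy map: (row, column, timestamp) -> uninterpreted bytes."""

    def __init__(self) -> None:
        self._keys: List[Key] = []          # kept sorted at all times
        self._values: Dict[Key, bytes] = {}

    def put(self, row: str, column: str, ts: int, value: bytes) -> None:
        key = (row, column, -ts)
        if key not in self._values:
            insort(self._keys, key)
        self._values[key] = value           # same timestamp: overwrite

    def get(self, row: str, column: str, ts: Optional[int] = None) -> Optional[bytes]:
        # Newest version with timestamp <= ts, or newest overall if ts is None.
        bound = MIN_NEG_TS if ts is None else -ts
        i = bisect_left(self._keys, (row, column, bound))
        if i < len(self._keys) and self._keys[i][:2] == (row, column):
            return self._values[self._keys[i]]
        return None

    def scan_row(self, row: str) -> Iterator[Tuple[str, int, bytes]]:
        # All cells of one row in sorted order: column asc, timestamp desc.
        i = bisect_left(self._keys, (row, "", MIN_NEG_TS))
        while i < len(self._keys) and self._keys[i][0] == row:
            _, column, neg_ts = self._keys[i]
            yield column, -neg_ts, self._values[self._keys[i]]
            i += 1
```

Sparsity falls out naturally: only cells that were written occupy space, and a row has no fixed set of columns.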
Why not a DBMS?
• Few DBMSs support the requisite scale
– Google required a database with wide scalability, wide applicability, high performance, and high availability
• Google couldn't afford one even if it existed
– Most DBMSs require very expensive infrastructure
• DBMSs provide more than Google needs
– E.g., full transactions, SQL
• Google has highly optimized lower-level systems that could be exploited
– GFS, Chubby, MapReduce, job scheduling
Data model: Row
• Row keys are arbitrary strings
• The row is the unit of transactional consistency
• Data is maintained in lexicographic order by row key
• Rows with consecutive keys (a row range) are grouped together as "tablets"
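Because rows are kept in lexicographic order, a tablet is just a contiguous key range, and finding the tablet for a row is a binary search over the split points. The sketch below is a hypothetical illustration: `tablet_for` and the split-key convention (tablet i covers `[split_keys[i-1], split_keys[i])`) are assumptions, not Bigtable's METADATA lookup. The reversed-hostname row keys follow the Web-indexing example from the Bigtable paper, which groups pages of the same domain into adjacent rows.

```python
from bisect import bisect_right
from typing import List


def reverse_hostname(host: str) -> str:
    # "maps.google.com" -> "com.google.maps": pages from the same domain
    # get adjacent row keys and therefore tend to land in the same tablet.
    return ".".join(reversed(host.split(".")))


def tablet_for(row_key: str, split_keys: List[str]) -> int:
    """Index of the tablet holding row_key (hypothetical helper).

    split_keys must be sorted. Tablet 0 holds rows < split_keys[0];
    tablet i holds split_keys[i-1] <= row < split_keys[i]; the last
    tablet holds everything >= split_keys[-1].
    """
    return bisect_right(split_keys, row_key)
```

Splitting a tablet only means inserting one more split key; no rows need to be re-sorted.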
Data model: Column
• Column keys are grouped into sets called "column families", which form the unit of access control
• A column key is named using the syntax family:qualifier
• Access control and disk/memory accounting are performed at the column-family level
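A column key can be split on its first colon into family and qualifier, and permission checks then only need the family. The helpers below are hypothetical (`parse_column_key`, `can_read`, and the ACL dictionary are assumptions for illustration); they assume family names contain no colon, while qualifiers may be empty or contain further colons.

```python
from typing import Dict, Set, Tuple


def parse_column_key(column: str) -> Tuple[str, str]:
    # Split "family:qualifier" on the FIRST colon only.
    family, sep, qualifier = column.partition(":")
    if not sep or not family:
        raise ValueError(f"not a family:qualifier key: {column!r}")
    return family, qualifier


def can_read(user: str, column: str, acl: Dict[str, Set[str]]) -> bool:
    # Access is granted per column family, so every qualifier in a
    # family shares the same permission check.
    family, _ = parse_column_key(column)
    return user in acl.get(family, set())
```

Keeping the permission at family granularity means the ACL stays small even when a family has millions of qualifiers.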
Data model: Timestamps
• Each cell in Bigtable can contain multiple versions of data, each indexed by timestamp
• Timestamps are 64-bit integers, assigned by:
– Bigtable, or
– the client application
• Data is stored in decreasing timestamp order, so that the most recent version is read first
– The application specifies how many versions (n) of a data item are kept in a cell; Bigtable garbage-collects older cell versions automatically
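The versioning rule above can be sketched for a single cell: versions are kept newest first, and anything beyond the n most recent is dropped. This `Cell` class is a hypothetical illustration; in particular, it garbage-collects eagerly on every write, which is a simplification and not a claim about when Bigtable actually reclaims old versions.

```python
from bisect import insort
from typing import List, Optional, Tuple


class Cell:
    """Hypothetical cell: versions newest first, at most max_versions kept."""

    def __init__(self, max_versions: int) -> None:
        self.max_versions = max_versions
        # (-timestamp, value): ascending order = decreasing timestamp.
        self._versions: List[Tuple[int, bytes]] = []

    def write(self, ts: int, value: bytes) -> None:
        # Writing the same timestamp again replaces that version.
        self._versions = [v for v in self._versions if v[0] != -ts]
        insort(self._versions, (-ts, value))
        self._gc()

    def _gc(self) -> None:
        # Keep only the n most recent versions (eager, for simplicity).
        del self._versions[self.max_versions:]

    def latest(self) -> Optional[bytes]:
        return self._versions[0][1] if self._versions else None

    def timestamps(self) -> List[int]:
        return [-neg_ts for neg_ts, _ in self._versions]
```

Storing versions newest first makes "read the latest value" a constant-time lookup at the front of the list.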
Data Model Example: Web Indexing
[Figure: example Web-indexing table, built up over several slides that highlight the row, columns, cells, timestamps, column families, and family:qualifier column keys]
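The figure's elements can be written out as plain data. The row below is modeled on the Webtable example from the Bigtable paper (a reversed-URL row key, a `contents:` column holding page versions, and an `anchor` family whose qualifiers name referring sites); the values and timestamps are illustrative placeholders, and `latest_in_family` is a hypothetical helper.

```python
from typing import Dict

# column key ("family:qualifier") -> {timestamp: value}
Row = Dict[str, Dict[int, bytes]]

webtable: Dict[str, Row] = {
    "com.cnn.www": {                           # row key: reversed URL
        "contents:": {                         # family "contents", empty qualifier
            6: b"<html>...v6",                 # three versions of the page
            5: b"<html>...v5",
            3: b"<html>...v3",
        },
        "anchor:cnnsi.com": {9: b"CNN"},       # family "anchor", qualifier = referrer
        "anchor:my.look.ca": {8: b"CNN.com"},  # cell value = anchor text
    }
}


def latest_in_family(row: Row, family: str) -> Dict[str, bytes]:
    """Newest value of every column in one family, keyed by qualifier."""
    out: Dict[str, bytes] = {}
    for column, versions in row.items():
        fam, _, qualifier = column.partition(":")
        if fam == family:
            out[qualifier] = versions[max(versions)]
    return out
```

Reading the whole `anchor` family of a row answers "which sites link to this page, and with what text" in one row access.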
APIs
• The Bigtable API provides functions for:
– Creating and deleting tables and column families
– Changing cluster, table, and column family metadata
– Single-row transactions (atomic read-modify-write on the data under one row key)
– Using cells as integer counters
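Single-row transactions and counters can be illustrated with one lock per row: a read-modify-write holds that row's lock, so it is atomic for that row but never spans rows. `ToyTable`, its methods, and the 8-byte big-endian counter encoding are all hypothetical assumptions; this is not the Bigtable client library.

```python
import threading
from collections import defaultdict
from typing import Callable, Dict

Columns = Dict[str, bytes]


class ToyTable:
    """Hypothetical in-memory table; NOT the real Bigtable client API."""

    def __init__(self) -> None:
        self._rows: Dict[str, Columns] = defaultdict(dict)
        self._locks = defaultdict(threading.Lock)  # one lock per row key
        self._guard = threading.Lock()             # protects lock creation

    def _row_lock(self, row: str):
        with self._guard:
            return self._locks[row]

    def read_modify_write(self, row: str, fn: Callable[[Columns], Columns]) -> None:
        # Atomic for a single row only: no other writer can change this row
        # between the read and the write. Nothing here spans multiple rows.
        with self._row_lock(row):
            self._rows[row] = fn(dict(self._rows[row]))

    def increment(self, row: str, column: str, delta: int = 1) -> int:
        # Assumed counter encoding: 8-byte big-endian signed integer.
        result = 0

        def bump(cols: Columns) -> Columns:
            nonlocal result
            raw = cols.get(column, bytes(8))
            result = int.from_bytes(raw, "big", signed=True) + delta
            cols[column] = result.to_bytes(8, "big", signed=True)
            return cols

        self.read_modify_write(row, bump)
        return result

    def counter(self, row: str, column: str) -> int:
        with self._row_lock(row):
            raw = self._rows[row].get(column, bytes(8))
        return int.from_bytes(raw, "big", signed=True)
```

Without the per-row lock, concurrent increments could read the same old value and lose updates; restricting atomicity to one row keeps the locking local to a single key.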
Building Blocks
• Bigtable uses the distributed Google File System (GFS) to store log and data files
• The Google SSTable file format is used internally to store Bigtable data
• An SSTable provides a persistent, ordered, immutable map from keys to values, where both keys and values are arbitrary byte strings
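An SSTable's two core operations are looking up one key and iterating over a key range, both of which rely on the keys being written in sorted order and never modified afterwards. The sketch below uses a made-up length-prefixed record format (`write_sstable` and `SSTable` are hypothetical names); it is not Google's on-disk layout, and for brevity it loads every record into memory instead of using a block index.

```python
import struct
from bisect import bisect_left
from typing import Dict, Iterator, List, Optional, Tuple


def write_sstable(items: Dict[bytes, bytes]) -> bytes:
    """Serialize pairs in sorted key order (toy format, not Google's)."""
    out = bytearray()
    for key in sorted(items):
        value = items[key]
        out += struct.pack(">II", len(key), len(value))  # 4-byte lengths
        out += key + value
    return bytes(out)


class SSTable:
    """Read-only view over write_sstable() output: no put, no delete."""

    def __init__(self, data: bytes) -> None:
        self._keys: List[bytes] = []
        self._values: List[bytes] = []
        pos = 0
        while pos < len(data):
            klen, vlen = struct.unpack_from(">II", data, pos)
            pos += 8
            self._keys.append(data[pos:pos + klen])
            self._values.append(data[pos + klen:pos + klen + vlen])
            pos += klen + vlen

    def get(self, key: bytes) -> Optional[bytes]:
        # Binary search works because keys were written in sorted order.
        i = bisect_left(self._keys, key)
        if i < len(self._keys) and self._keys[i] == key:
            return self._values[i]
        return None

    def scan(self, start: bytes, end: bytes) -> Iterator[Tuple[bytes, bytes]]:
        # Yields pairs with start <= key < end, in key order.
        i = bisect_left(self._keys, start)
        while i < len(self._keys) and self._keys[i] < end:
            yield self._keys[i], self._values[i]
            i += 1
```

Because an SSTable is immutable, updates must go somewhere else and be merged in later; readers never need locks on the file itself.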
Real Applications
• Google Analytics: http://analytics.google.com
• Google Earth & Google Maps: http://earth.google.com
• Personalized Search: www.google.com/psearch
• Web Indexing
• Google Finance
• Orkut
• Writely
Conclusion
• Bigtable has achieved its goals of high performance, data availability, and scalability. It has been successfully deployed in real applications (Personalized Search, Orkut, Google Maps, …)
• Building its own storage system gave Google significant advantages: flexibility in designing the data model, and control over both the implementation and the other infrastructure on which Bigtable relies.
©2007 The Board of Regents of the University of Nebraska. All rights reserved.
Thanks