Bigtable a distributed storage system

Page 1: Bigtable a distributed storage system

Bigtable: A Distributed Storage System

Presenter: Ku. Devyani B. Vaidya

Dr. Panjabrao Deshamukh, Amravati (CO-6G)

Page 2: Bigtable a distributed storage system

Dec 8th, 2011

Bigtable: A Distributed Storage System

1. Introduction

2. What is a Bigtable?

3. Why not a DBMS?

4. Data model: Row, Column, Timestamps

5. APIs

6. Building Blocks

7. Real Applications

8. Conclusion

Page 3: Bigtable a distributed storage system

Introduction

• Bigtable is a distributed storage system for managing structured data.

• Designed to scale to a very large size: petabytes of data across thousands of servers

• Used for many Google projects: Web indexing, Personalized Search, Google Earth, Google Analytics, Google Finance, …

• Flexible, high-performance solution for all of these Google products

Page 4: Bigtable a distributed storage system

What is a Bigtable?

• “A BigTable is a sparse, distributed, persistent multidimensional sorted map. The map is indexed by a row key, a column key, and a timestamp; each value in the map is an uninterpreted array of bytes.”
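
The definition above can be made concrete with a toy in-memory model. This is only a sketch under stated assumptions: the class name ToyBigtable, its methods, and the sample row and column keys are hypothetical and are not Bigtable's real API. It shows a sparse map keyed by (row key, column key, timestamp) whose values are raw bytes, scanned in sorted order.

```python
from typing import Dict, Iterator, Optional, Tuple

# Hypothetical key type: (row key, column key, timestamp).
CellKey = Tuple[str, str, int]


class ToyBigtable:
    """Toy in-memory model of the sorted map (not Google's implementation)."""

    def __init__(self) -> None:
        # Sparse: only cells that were actually written take up space.
        self._cells: Dict[CellKey, bytes] = {}

    def put(self, row: str, column: str, timestamp: int, value: bytes) -> None:
        # Values are uninterpreted bytes; the map never inspects them.
        self._cells[(row, column, timestamp)] = value

    def get(self, row: str, column: str, timestamp: int) -> Optional[bytes]:
        return self._cells.get((row, column, timestamp))

    def scan(self) -> Iterator[Tuple[CellKey, bytes]]:
        """Yield cells by row key, then column key, then newest timestamp first."""
        ordered = sorted(self._cells, key=lambda k: (k[0], k[1], -k[2]))
        for key in ordered:
            yield key, self._cells[key]


# Illustrative data only.
table = ToyBigtable()
table.put("com.cnn.www", "contents:", 5, b"<html>v5</html>")
table.put("com.cnn.www", "contents:", 6, b"<html>v6</html>")
table.put("com.cnn.www", "anchor:cnnsi.com", 9, b"CNN")
scan_result = list(table.scan())
```

The sketch keeps everything in one process, so it models only the "sparse" and "sorted map" parts of the definition, not distribution or persistence.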

Page 5: Bigtable a distributed storage system

Why not a DBMS?

• Few DBMSs support the requisite scale

– Required a DB with wide scalability, wide applicability, high performance and high availability

• Couldn’t afford it if there was one

– Most DBMSs require very expensive infrastructure

• DBMSs provide more than Google needs

– E.g., full transactions, SQL

• Google has highly optimized lower-level systems that could be exploited

– GFS, Chubby, MapReduce, job scheduling

Page 6: Bigtable a distributed storage system

Data model: Row

• Row keys are arbitrary strings

• Row is the unit of transactional consistency

• Data is maintained in lexicographic order by row key

• Rows with consecutive keys (Row Range) are grouped together as “tablets”.
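
The row ordering and tablet grouping described above can be sketched as follows. This is an illustration under assumptions, not how Bigtable actually splits tablets: the function names, the fixed rows_per_tablet parameter, and the sample keys are all hypothetical.

```python
import bisect
from typing import List


def split_into_tablets(row_keys: List[str], rows_per_tablet: int) -> List[List[str]]:
    """Sort row keys lexicographically and cut them into contiguous row ranges."""
    ordered = sorted(row_keys)
    return [ordered[i:i + rows_per_tablet]
            for i in range(0, len(ordered), rows_per_tablet)]


def find_tablet(tablets: List[List[str]], row_key: str) -> int:
    """Return the index of the tablet whose row range would hold row_key."""
    # The last key of each tablet acts as the upper bound of its row range.
    end_keys = [tablet[-1] for tablet in tablets]
    index = bisect.bisect_left(end_keys, row_key)
    # Keys beyond the last range fall into the final tablet.
    return min(index, len(tablets) - 1)


# Illustrative row keys only.
keys = ["com.google.maps", "com.cnn.www", "org.wikipedia.en",
        "com.cnn.money", "com.google.www"]
tablets = split_into_tablets(keys, rows_per_tablet=2)
```

Because the keys are kept sorted, rows whose keys share a prefix end up next to each other, usually in the same tablet, so a scan over a short row range touches few tablets.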

Page 7: Bigtable a distributed storage system

Data model: Column

• Column keys are grouped into sets called “column families”, which form the unit of access control.

• A column key is named using the syntax family:qualifier

• Access control and disk/memory accounting are performed at the column family level
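
A minimal sketch of the family:qualifier syntax and of access control at the column family level. The helper names, the FAMILY_READERS table, and the user names are hypothetical and chosen only for illustration.

```python
from typing import Dict, Set, Tuple


def parse_column_key(column_key: str) -> Tuple[str, str]:
    """Split 'family:qualifier' into its two parts; the qualifier may be empty."""
    family, _, qualifier = column_key.partition(":")
    if not family:
        raise ValueError(f"column key has no family: {column_key!r}")
    return family, qualifier


# Hypothetical access-control list, kept per column family (not per column).
FAMILY_READERS: Dict[str, Set[str]] = {
    "anchor": {"crawler", "indexer"},
    "contents": {"indexer"},
}


def can_read(user: str, column_key: str) -> bool:
    """Decide access from the family alone, ignoring the qualifier."""
    family, _ = parse_column_key(column_key)
    return user in FAMILY_READERS.get(family, set())
```

For example, can_read("crawler", "anchor:cnnsi.com") is allowed while can_read("crawler", "contents:") is not, because the decision depends only on the family part of the key.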

Page 8: Bigtable a distributed storage system

Data model: Timestamps

• Each cell in Bigtable can contain multiple versions of data, each indexed by timestamp

• Timestamps are 64-bit integers

• Assigned by:

– Bigtable

– Client application

• Data is stored in decreasing timestamp order, so that the most recent data is easily accessed

– Application specifies how many versions (n) of data items are maintained in a cell

– Bigtable garbage-collects cell versions automatically.
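
The versioning rules above can be sketched with a single cell that keeps its versions newest-first and discards all but the last n. The class name VersionedCell and its methods are hypothetical; real garbage collection in Bigtable is configured per column family and runs inside the system, not in client code like this.

```python
from typing import List, Optional, Tuple


class VersionedCell:
    """Toy cell that keeps at most max_versions versions, newest first."""

    def __init__(self, max_versions: int) -> None:
        self.max_versions = max_versions
        self._versions: List[Tuple[int, bytes]] = []

    def write(self, timestamp: int, value: bytes) -> None:
        # Overwrite any existing version that carries the same timestamp.
        self._versions = [v for v in self._versions if v[0] != timestamp]
        self._versions.append((timestamp, value))
        # Store in decreasing timestamp order so the newest version is first.
        self._versions.sort(key=lambda v: v[0], reverse=True)
        # "Garbage-collect" everything beyond the newest max_versions entries.
        del self._versions[self.max_versions:]

    def latest(self) -> Optional[Tuple[int, bytes]]:
        return self._versions[0] if self._versions else None

    def versions(self) -> List[Tuple[int, bytes]]:
        return list(self._versions)


cell = VersionedCell(max_versions=2)
cell.write(3, b"v3")
cell.write(5, b"v5")
cell.write(4, b"v4")  # The oldest version (timestamp 3) is dropped here.
```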

Page 9: Bigtable a distributed storage system

Data Model

Example: Web Indexing

Pages 10-17: Bigtable a distributed storage system

Data Model

[Figures: data model diagrams labelled Row, Columns, Cells, Timestamps, Column family, and family:qualifier]
Page 18: Bigtable a distributed storage system

APIs

• The Bigtable API provides functions for:

- Creating and deleting tables and column families

- Changing cluster, table and column family metadata

- Performing single-row transactions

- Using cells as integer counters
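
To illustrate what single-row transactions and integer counters mean in practice, here is a toy sketch that serializes all changes to one row behind a per-row lock. Every name in it (ToyRowStore, mutate_row, increment, read_counter, the "stats:hits" column) is hypothetical; it is not the Bigtable client API, only a model of the guarantee that a read-modify-write on a single row is atomic.

```python
import threading
from collections import defaultdict
from typing import Callable, Dict


class ToyRowStore:
    """Toy store where each row can be updated atomically (hypothetical API)."""

    def __init__(self) -> None:
        self._rows: Dict[str, Dict[str, bytes]] = defaultdict(dict)
        self._locks: Dict[str, threading.Lock] = defaultdict(threading.Lock)
        self._locks_guard = threading.Lock()

    def _lock_for(self, row: str) -> threading.Lock:
        # Guard the lock table itself so two threads never create two locks for one row.
        with self._locks_guard:
            return self._locks[row]

    def mutate_row(self, row: str, mutation: Callable[[Dict[str, bytes]], None]) -> None:
        """Apply a read-modify-write to one row while holding that row's lock."""
        with self._lock_for(row):
            mutation(self._rows[row])

    def increment(self, row: str, column: str, delta: int = 1) -> int:
        """Treat the cell as an 8-byte big-endian signed integer counter."""
        with self._lock_for(row):
            cells = self._rows[row]
            current = int.from_bytes(cells.get(column, b"\x00" * 8), "big", signed=True)
            updated = current + delta
            cells[column] = updated.to_bytes(8, "big", signed=True)
            return updated

    def read_counter(self, row: str, column: str) -> int:
        with self._lock_for(row):
            raw = self._rows[row].get(column, b"\x00" * 8)
            return int.from_bytes(raw, "big", signed=True)


store = ToyRowStore()
for _ in range(3):
    store.increment("com.cnn.www", "stats:hits")
```

The lock is per row, so updates to different rows do not block each other, but nothing here spans two rows atomically; that matches the "single row" scope stated above.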

Page 19: Bigtable a distributed storage system

Building Blocks

• Bigtable uses the distributed Google File System (GFS) to store log and data files

• The Google SSTable file format is used internally to store Bigtable data

• An SSTable provides a persistent, ordered, immutable map from keys to values
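
The "ordered, immutable map" idea behind an SSTable can be sketched in memory as below. The class ToySSTable and its methods are hypothetical, and the sketch does not model persistence or the real on-disk file format; it only shows why sorted, never-modified keys make point lookups and range scans cheap.

```python
import bisect
from typing import Dict, Iterator, Optional, Tuple


class ToySSTable:
    """Toy ordered, immutable key/value map (in memory only, not the real format)."""

    def __init__(self, entries: Dict[bytes, bytes]) -> None:
        items = sorted(entries.items())
        # Tuples cannot be modified after construction, mirroring immutability.
        self._keys = tuple(key for key, _ in items)
        self._values = tuple(value for _, value in items)

    def get(self, key: bytes) -> Optional[bytes]:
        """Binary-search the sorted keys for an exact match."""
        i = bisect.bisect_left(self._keys, key)
        if i < len(self._keys) and self._keys[i] == key:
            return self._values[i]
        return None

    def scan(self, start: bytes, end: bytes) -> Iterator[Tuple[bytes, bytes]]:
        """Yield all entries with start <= key < end, in key order."""
        lo = bisect.bisect_left(self._keys, start)
        hi = bisect.bisect_left(self._keys, end)
        for i in range(lo, hi):
            yield self._keys[i], self._values[i]


sstable = ToySSTable({b"b": b"2", b"a": b"1", b"d": b"4"})
```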

Page 20: Bigtable a distributed storage system

Real Applications

• Google Analytics: http://analytics.google.com

• Google Earth & Google Maps: http://earth.google.com

• Personalized Search: www.google.com/psearch

• Web Indexing

• Google Finance

• Orkut

• Writely

Page 21: Bigtable a distributed storage system

Conclusion

• Bigtable has achieved its goals of high performance, data availability and scalability.

• It has been successfully deployed in real applications (Personalized Search, Orkut, Google Maps, …)

• Building its own storage system gave Google significant advantages: flexibility in designing the data model, and control over the implementation and the other infrastructure on which Bigtable relies.

Page 22: Bigtable a distributed storage system

Source

1. www.google.com

2. www.studymafia.org

Page 23: Bigtable a distributed storage system

Thanks