Google Spanner - Synchronously-Replicated, Globally-Distributed, Multi-Version Database

Internet-scale Distributed Systems


22.01.2013 Maciej Jozwiak

Presented by: Maciej Jozwiak


Agenda

• Problem description

• Overview of available solutions

• Globally-distributed database

• Architecture

• How is data replicated?

• Data model

• TrueTime API

• Transactions

• Summary



Problem – Need for Scalable MySQL

• Google’s advertising backend

– Based on MySQL

• Relations

• Query language

– Manually sharded

• Resharding is very costly

– Global distribution


SHARDING:

Sharding is another name for "horizontal partitioning" of a database. The rows of a table are split into groups that are held separately; each group forms a partition (a shard), which can be located on a separate database server or at a separate physical location.
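To make the definition concrete, here is a minimal sketch of hash-based horizontal partitioning. The names `shard_for`, `partition`, and `moved_keys`, and the `customer-N` keys, are illustrative assumptions, not taken from Google's MySQL setup. The last helper hints at why resharding is costly: with a simple modulo scheme, changing the shard count relocates most rows.

```python
import hashlib


def shard_for(key: str, num_shards: int) -> int:
    """Map a row key to a shard index using a stable hash."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_shards


def partition(rows: dict, num_shards: int) -> list:
    """Split rows (key -> value) into num_shards horizontal partitions."""
    shards = [{} for _ in range(num_shards)]
    for key, value in rows.items():
        shards[shard_for(key, num_shards)][key] = value
    return shards


def moved_keys(keys, old_count: int, new_count: int) -> int:
    """Count keys that land on a different shard after the shard count changes."""
    return sum(
        1 for k in keys if shard_for(k, old_count) != shard_for(k, new_count)
    )


# Hypothetical data: 1000 customer rows spread over 4 shards.
rows = {f"customer-{i}": {"id": i} for i in range(1000)}
shards = partition(rows, 4)

# Growing from 4 to 5 shards forces many rows to move between servers.
moved = moved_keys(rows, 4, 5)
print(f"{moved} of {len(rows)} rows change shard when going from 4 to 5 shards")
```

Schemes such as consistent hashing or range-based splitting reduce this movement; the modulo scheme is used here only because it is the simplest to read.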


Overview of Available Solutions

Google Megastore

• Replicated ACID transactions

• Schematized semi-relational tables

• Synchronous replication support across data-centers

• Performance

• Lack of query language

Google Bigtable

• Scalability

• Throughput

• Performance

• Eventually-consistent replication support across data-centers


Bridging the gap between Megastore and Bigtable


Solution: Google Spanner

• Removes the need to manually partition data

• Synchronous replication and automatic failover

• Strong transactional semantics

• SQL-based query language

• Semi-relational, schematized tables


Globally-Distributed Database


Future scale:

• one million to 10 million servers

• 100s to 1000s of locations around the world

• 10^13 directories

• 10^18 bytes of storage

Cross-datacenter replicated data management:

• high availability

• minimize latency of data reads and writes

• replication configuration dynamically controlled at a fine grain by applications


Spanner Deployment - Universe


Universe master (status + interactive debugging)

Placement driver (moves data across zones automatically)


How Is Data Replicated?


Paxos: a family of protocols for solving consensus in a network of unreliable processors. Consensus is the process of agreeing on one result among a group of participants. The problem becomes difficult when the participants or their communication medium may fail.
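The consensus idea above can be sketched as a textbook single-decree Paxos round run in memory. This is not Spanner's implementation: the `Acceptor` class, the `propose` function, and the `write-A` / `write-B` values are assumptions made for illustration, and real deployments add networking, retries, and durable state.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Acceptor:
    promised: int = -1                    # highest proposal number promised so far
    accepted_n: int = -1                  # number of the accepted proposal, if any
    accepted_value: Optional[str] = None  # value of the accepted proposal, if any
    alive: bool = True                    # False simulates a crashed replica

    def prepare(self, n: int) -> Optional[Tuple[int, Optional[str]]]:
        """Phase 1: promise to ignore proposals numbered below n."""
        if not self.alive or n <= self.promised:
            return None
        self.promised = n
        return (self.accepted_n, self.accepted_value)

    def accept(self, n: int, value: str) -> bool:
        """Phase 2: accept unless a higher-numbered promise was already made."""
        if not self.alive or n < self.promised:
            return False
        self.promised = n
        self.accepted_n = n
        self.accepted_value = value
        return True


def propose(acceptors: List[Acceptor], n: int, value: str) -> Optional[str]:
    """Run one proposal round; return the chosen value, or None without a majority."""
    majority = len(acceptors) // 2 + 1
    promises = [p for p in (a.prepare(n) for a in acceptors) if p is not None]
    if len(promises) < majority:
        return None
    # If any acceptor already accepted a value, the proposer must adopt the
    # value belonging to the highest-numbered accepted proposal.
    prior_n, prior_value = max(promises, key=lambda p: p[0])
    if prior_n >= 0:
        value = prior_value
    acks = sum(1 for a in acceptors if a.accept(n, value))
    return value if acks >= majority else None


acceptors = [Acceptor() for _ in range(5)]
acceptors[4].alive = False  # one failed replica; four of five still form a majority

first = propose(acceptors, n=1, value="write-A")
second = propose(acceptors, n=2, value="write-B")  # a later proposer must keep write-A
print(first, second)
```

The second round shows the safety property: once a value has been chosen by a majority, a later proposal with a different value still ends up agreeing on the original one.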

Spanserver software stack


Replication Configuration

• Replication configurations for data can be dynamically controlled at a fine grain by applications

• Applications can specify constraints to control:

– which datacenters contain which data

– how far data is from user (to control read latency)

– how far replicas are from each other (to control write latency)

– how many replicas are maintained (to control durability, availability, and read performance)

• North America: 5 replicas, Europe: 2 replicas
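The trade-offs in the list above can be explored with a small sketch. `PlacementPolicy` and its methods are hypothetical and are not a Spanner API; the sketch also assumes that all replicas vote in a single majority-quorum group, which a real configuration need not do. It uses the 5 + 2 replica example from the slide.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass
class PlacementPolicy:
    """Hypothetical per-application replication constraints (not a Spanner API)."""
    replicas_per_region: Dict[str, int]

    def total_replicas(self) -> int:
        return sum(self.replicas_per_region.values())

    def write_quorum(self) -> int:
        """Majority of all replicas, assuming every replica votes."""
        return self.total_replicas() // 2 + 1

    def tolerated_failures(self) -> int:
        """Replicas that can fail while a majority is still reachable."""
        return self.total_replicas() - self.write_quorum()

    def survives_region_loss(self, region: str) -> bool:
        """True if a majority remains after losing every replica in one region."""
        remaining = self.total_replicas() - self.replicas_per_region.get(region, 0)
        return remaining >= self.write_quorum()


# The example from the slide: 5 replicas in North America, 2 in Europe.
policy = PlacementPolicy({"north-america": 5, "europe": 2})

print("total replicas:", policy.total_replicas())
print("write quorum:", policy.write_quorum())
print("tolerated failures:", policy.tolerated_failures())
print("survives loss of Europe:", policy.survives_region_loss("europe"))
print("survives loss of North America:", policy.survives_region_loss("north-america"))
```

Under these assumptions, more replicas improve durability and read capacity, while concentrating most of them in one region keeps write latency low at the price of depending on that region for a majority.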



Hierarchical Data Model

• Universe (Spanner deployment)

– Database

• Tables

– Rows and columns

– Must have an ordered set of one or more primary-key columns

– Primary key uniquely identifies each row

• Hierarchies of tables

– Tables must be partitioned by the client into one or more hierarchies of tables (INTERLEAVE IN)

– The table at the top of a hierarchy is a directory table



Storing Photo Metadata





(Figure: example schema for photo metadata, with the directory tables labeled)
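The effect of interleaving tables in a hierarchy can be sketched with photo metadata. The `users` and `albums` tables and their contents are illustrative assumptions, and the tuple keys only mimic the idea that a child row's primary key starts with its parent's key. Sorting by key then places each directory-table row directly before its child rows, so the group can be stored and moved together.

```python
# Parent (directory) table: primary key is (user_id,).
users = {
    (1,): {"email": "ann@example.com"},
    (2,): {"email": "bob@example.com"},
}

# Child table interleaved in the parent: primary key is (user_id, album_id).
albums = {
    (1, 10): {"name": "Holiday"},
    (1, 11): {"name": "Family"},
    (2, 20): {"name": "Work"},
}


def interleave(parent_rows: dict, child_rows: dict) -> list:
    """Merge parent and child rows into a single key-ordered sequence."""
    tagged = [(key, "users", row) for key, row in parent_rows.items()]
    tagged += [(key, "albums", row) for key, row in child_rows.items()]
    # Tuple comparison puts (1,) before (1, 10), so a parent precedes its children.
    return sorted(tagged, key=lambda entry: entry[0])


def directories(ordered_rows: list) -> dict:
    """Group rows by the leading key component, i.e. by directory-table row."""
    groups = {}
    for key, table, _row in ordered_rows:
        groups.setdefault(key[0], []).append((table, key))
    return groups


ordered = interleave(users, albums)
for key, table, row in ordered:
    print(table, key, row)

dirs = directories(ordered)
```

Each entry of `dirs` corresponds to one row of the top-level table plus all descendant rows sharing its key prefix.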
