pddb

Parallel and distributed databases

R & G Chapter 22

What is a distributed database?

Why distribute a database?

Scalability and performance

Resilience to failures


Why distribute a database?

Data is already distributed

Or needs to be distributed

Data is in multiple systems

Why not distribute a database?

You must earn your complexity!

Communication is needed

Must build a complex infrastructure

Unpredictable latencies must be masked

More types of failures: more components to fail, network failures, congestion, timeouts

More complex planning: communication cost plus I/O cost

May have to deal with heterogeneity: different types of systems, different schemas (possibly incompatible), different administrative domains

Types of distributed databases

The old days: mainframes

Definitely not distributed!

Client-server

User interaction (client) and data processing (server), connected by a network

Parallel database

Primary/secondary

Multidatabase

How do they work?

What is shared?

How to distribute the data?

How to process the data?

How to update the data?

What is shared? Memory

Figure: several CPUs sharing the same RAM and disk

Most modern DBMSs

What is shared? Disk

Oracle RAC

What is shared? Nothing

Search engines, Teradata

How to distribute the data?

Example records to spread across Server 1, Server 2, Server 3, and Server 4:

Bike      $86     6/2/07    636353
Chair     $10     6/5/07    662113
Couch     $570    6/1/07    424252
Car       $1123   6/1/07    256623
Lamp      $19     6/7/07    121113
Bike      $56     6/9/07    887734
Scooter   $18     6/11/07   252111
Hammer    $8000   6/11/07   116458

How to distribute the data?

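Hash partitioning: apply Hash() to the key of each (key, value) record; the hash value picks one of Servers 1 to 4.

Range partitioning: compare the key of each (key, value) record with split points, e.g. keys <= X go to one server and keys > X go to another.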
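A minimal Python sketch of both schemes, assuming records are (item, price) pairs taken from the example table and four servers numbered 0 to 3. The function names and the split points [15, 60, 600] are hypothetical, chosen only for illustration; md5 stands in for whatever hash function a real system would use.

```python
import hashlib
from bisect import bisect_left

# Example records from the slide: (item, price). Illustrative only.
RECORDS = [
    ("Bike", 86), ("Chair", 10), ("Couch", 570), ("Car", 1123),
    ("Lamp", 19), ("Bike", 56), ("Scooter", 18), ("Hammer", 8000),
]


def hash_partition(key, num_servers):
    """Return a server index in [0, num_servers) by hashing the key.

    md5 is used instead of the built-in hash() so that placement is the
    same on every run (str hashes are randomized per process).
    """
    digest = hashlib.md5(str(key).encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_servers


def range_partition(key, split_points):
    """Return a server index by comparing the key with sorted split points.

    With split_points == [X], keys <= X go to server 0 and keys > X to server 1.
    """
    return bisect_left(split_points, key)


def distribute(records, assign):
    """Group records by the server index that assign(record) returns."""
    servers = {}
    for record in records:
        servers.setdefault(assign(record), []).append(record)
    return servers


if __name__ == "__main__":
    by_name = distribute(RECORDS, lambda r: hash_partition(r[0], 4))
    by_price = distribute(RECORDS, lambda r: range_partition(r[1], [15, 60, 600]))
    print(by_name)
    print(by_price)
```

Hashing tends to spread keys evenly but scatters a key range over every server; range partitioning keeps nearby keys together (useful for range queries) but can leave servers unevenly loaded if the split points are poorly chosen.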

How to distribute the data?

Figure: the example records (e.g. Scooter and Hammer, with their dates and six-digit numbers) placed on the servers

Query processing

Intra-operator parallelism

Inter-operator parallelism

Parallel scanning

Figure: one filter per partition runs in parallel; the filtered outputs are combined into the result
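A small Python sketch of a parallel scan, assuming the table is already split into in-memory partitions. Threads stand in for servers here (a shared-nothing system would run one scan per machine on local disk); the function names are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor


def scan_partition(partition, predicate):
    """Scan one partition and keep the rows that satisfy the predicate."""
    return [row for row in partition if predicate(row)]


def parallel_scan(partitions, predicate):
    """Run one filter per partition concurrently, then combine the outputs."""
    with ThreadPoolExecutor(max_workers=max(1, len(partitions))) as pool:
        futures = [pool.submit(scan_partition, p, predicate) for p in partitions]
        result = []
        for future in futures:
            # Outputs are concatenated in partition order.
            result.extend(future.result())
    return result
```

For example, `parallel_scan([[1, 5, 9], [2, 6], [10, 3]], lambda x: x > 4)` filters each partition independently and returns `[5, 9, 6, 10]`.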

Sorting
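One common way to parallelize sorting is to range-partition on the sort key, sort each range independently, and concatenate the ranges in order. The sketch below assumes the split points are given (a real system would typically choose them by sampling); `parallel_sort` is a hypothetical name and threads stand in for servers.

```python
from bisect import bisect_left
from concurrent.futures import ThreadPoolExecutor


def parallel_sort(values, split_points):
    """Range-partition on the sort key, sort each range locally, concatenate."""
    # Step 1: route each value to the range (server) that owns it.
    buckets = [[] for _ in range(len(split_points) + 1)]
    for value in values:
        buckets[bisect_left(split_points, value)].append(value)
    # Step 2: sort every range concurrently (one worker per range).
    with ThreadPoolExecutor(max_workers=len(buckets)) as pool:
        sorted_buckets = list(pool.map(sorted, buckets))
    # Step 3: ranges are disjoint and ordered, so concatenation is fully sorted.
    return [value for bucket in sorted_buckets for value in bucket]
```

If the split points are skewed, one range gets most of the data and the sort is only as fast as that slowest server.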

Parallel hash join

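Hash() both inputs on the join key, so matching tuples land on the same server; each server then joins its own pair of partitions locally.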
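A minimal Python sketch of a parallel hash join, assuming rows are tuples and the join column is given by index. Both inputs are partitioned with the same hash function, so each pair of partitions can be joined independently; threads stand in for servers, and all names are hypothetical.

```python
import hashlib
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor


def server_for(key, num_servers):
    """Hash a join key to a server index (stable across runs)."""
    digest = hashlib.md5(repr(key).encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_servers


def hash_partition(rows, key_index, num_servers):
    """Split rows into num_servers lists by hashing the join column."""
    parts = [[] for _ in range(num_servers)]
    for row in rows:
        parts[server_for(row[key_index], num_servers)].append(row)
    return parts


def local_hash_join(left, right, lkey, rkey):
    """In-memory hash join: build a table on left, probe it with right."""
    table = defaultdict(list)
    for row in left:                      # build phase
        table[row[lkey]].append(row)
    out = []
    for row in right:                     # probe phase
        for match in table.get(row[rkey], []):
            out.append(match + row)
    return out


def parallel_hash_join(left, right, lkey, rkey, num_servers=4):
    """Partition both inputs identically, then join partition pairs in parallel."""
    lparts = hash_partition(left, lkey, num_servers)
    rparts = hash_partition(right, rkey, num_servers)
    with ThreadPoolExecutor(max_workers=num_servers) as pool:
        pieces = pool.map(local_hash_join, lparts, rparts,
                          [lkey] * num_servers, [rkey] * num_servers)
        return [row for piece in pieces for row in piece]
```

Because equal keys always hash to the same server, no matching pair is missed even though each server sees only a fraction of each input.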

Semi-join
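A semi-join cuts communication when joining tables stored at different sites: ship only the join keys of one table to the other site, keep the remote rows that match, and ship back just those. The sketch below assumes two sites, tuple rows, and join columns given by index; the function names are hypothetical and "shipping" is simulated by passing Python values.

```python
from collections import defaultdict


def semi_join_reduce(remote_rows, keys, key_index):
    """At the remote site: keep only rows whose join key is in keys."""
    return [row for row in remote_rows if row[key_index] in keys]


def join_with_semijoin(local_rows, remote_rows, lkey, rkey):
    """Join local_rows with remote_rows, shipping only matching remote rows.

    Returns (join result, number of remote rows shipped back).
    """
    # Step 1: project the local join keys and ship them to the remote site.
    keys = {row[lkey] for row in local_rows}
    # Step 2: the remote site computes the semi-join and ships back the matches.
    reduced = semi_join_reduce(remote_rows, keys, rkey)
    # Step 3: finish the join locally on the reduced input.
    by_key = defaultdict(list)
    for row in reduced:
        by_key[row[rkey]].append(row)
    result = [l + r for l in local_rows for r in by_key.get(l[lkey], [])]
    return result, len(reduced)
```

With two local rows and five remote rows of which two match, only the two matching remote rows travel back instead of all five; the saving grows as the join becomes more selective.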

Inter-operator parallelism
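One form of inter-operator parallelism is pipelining: different operators of the same plan run at the same time, each consuming the previous operator's output as it is produced. The sketch below assumes a two-operator plan (scan feeding a filter) with threads connected by bounded queues; the operator names and the end-of-stream sentinel are hypothetical.

```python
import queue
import threading

_DONE = object()  # sentinel marking the end of a stream


def scan_op(rows, out_q):
    """Producer: emit every row, then the end-of-stream marker."""
    for row in rows:
        out_q.put(row)
    out_q.put(_DONE)


def filter_op(predicate, in_q, out_q):
    """Consumer/producer: forward rows that satisfy the predicate."""
    while True:
        row = in_q.get()
        if row is _DONE:
            out_q.put(_DONE)
            return
        if predicate(row):
            out_q.put(row)


def run_pipeline(rows, predicate):
    """Run scan and filter concurrently and collect the filter's output."""
    q1 = queue.Queue(maxsize=16)  # bounded: a fast scan waits for the filter
    q2 = queue.Queue(maxsize=16)
    threads = [
        threading.Thread(target=scan_op, args=(rows, q1)),
        threading.Thread(target=filter_op, args=(predicate, q1, q2)),
    ]
    for t in threads:
        t.start()
    result = []
    while True:
        row = q2.get()
        if row is _DONE:
            break
        result.append(row)
    for t in threads:
        t.join()
    return result
```

For example, `run_pipeline(range(10), lambda x: x % 2 == 0)` returns `[0, 2, 4, 6, 8]`, with the filter working on early rows while the scan is still producing later ones.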

Updating distributed data

Synchronous: read-any-write-all

Reads are fast (any single copy will do), but every write must update all copies
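A toy Python sketch of read-any-write-all over in-memory replicas, assuming each replica has an `up` flag that simulates reachability. It omits the atomic commit a real system would need across the copies; the class names are hypothetical.

```python
class Replica:
    """One copy of the data; up=False simulates an unreachable site."""

    def __init__(self):
        self.data = {}
        self.up = True


class ReadAnyWriteAll:
    def __init__(self, replicas):
        self.replicas = replicas

    def read(self, key):
        # Any reachable copy is current, because every write updated all copies.
        for rep in self.replicas:
            if rep.up:
                return rep.data.get(key)
        raise RuntimeError("no replica reachable")

    def write(self, key, value):
        # A write must reach every copy, so one unreachable copy blocks it.
        if not all(rep.up for rep in self.replicas):
            raise RuntimeError("write needs every replica")
        for rep in self.replicas:
            rep.data[key] = value
```

This shows the trade-off on the slide: reads survive the loss of all but one copy, while a single disconnected copy stops every write.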

Updating distributed data

Synchronous: voting

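Writes are tolerant to disconnection: a write only needs a quorum (e.g. a majority) of the copies, and reads consult enough copies to overlap with the latest write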
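A minimal Python sketch of quorum voting, assuming n in-memory copies that store (version, value) pairs, a read quorum r and a write quorum w with r + w > n and 2w > n. The class names and the "first reachable copies form the quorum" rule are hypothetical simplifications; partial write failures are not modeled.

```python
class VersionedReplica:
    """One copy of the data; each key maps to a (version, value) pair."""

    def __init__(self):
        self.data = {}
        self.up = True


class QuorumStore:
    """Voting over n copies with read quorum r and write quorum w."""

    def __init__(self, replicas, r, w):
        n = len(replicas)
        # r + w > n: every read quorum overlaps the latest write quorum.
        # 2w > n: any two write quorums overlap, so versions keep increasing.
        if r + w <= n or 2 * w <= n:
            raise ValueError("quorums must overlap: need r + w > n and 2w > n")
        self.replicas, self.r, self.w = replicas, r, w

    def _quorum(self, needed):
        live = [rep for rep in self.replicas if rep.up]
        if len(live) < needed:
            raise RuntimeError("not enough reachable copies for a quorum")
        return live[:needed]

    def write(self, key, value):
        quorum = self._quorum(self.w)
        # This quorum overlaps the previous write quorum, so it sees the newest version.
        version = max(rep.data.get(key, (0, None))[0] for rep in quorum) + 1
        for rep in quorum:
            rep.data[key] = (version, value)

    def read(self, key):
        quorum = self._quorum(self.r)
        # Return the value with the highest version number in the read quorum.
        _, value = max((rep.data.get(key, (0, None)) for rep in quorum),
                       key=lambda pair: pair[0])
        return value
```

With five copies and r = w = 3, writes keep working while any two copies are disconnected, and a later read from a different set of three copies still finds the newest version.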

Consistency of distributed data

Should provide ACID

Primary/secondary

Two-phase commit

The coordinator sends PREPARE to every participant; all participants reply PREPARED; the coordinator sends COMMIT

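Two-phase commit

The coordinator sends PREPARE; one participant replies PREPARED but another replies ABORT, so the transaction is aborted everywhere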

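Two-phase commit

The coordinator sends PREPARE, but only one PREPARED reply arrives; after a timeout, the missing vote is treated as ABORT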

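Two-phase commit

The coordinator sends PREPARE and all participants reply PREPARED, but no decision follows (e.g. the coordinator fails); prepared participants are blocked until they learn the outcome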
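A toy, single-process Python sketch of the coordinator logic, assuming participants are local objects whose failures are simulated with flags. It covers the commit, explicit-abort, and missing-vote cases; coordinator failure (the blocking case) and durable logging are not modeled, and all names are hypothetical.

```python
from enum import Enum


class Vote(Enum):
    PREPARED = "PREPARED"
    ABORT = "ABORT"


class Participant:
    """A toy participant; responds=False simulates a crashed or unreachable site."""

    def __init__(self, name, can_commit=True, responds=True):
        self.name = name
        self.can_commit = can_commit
        self.responds = responds
        self.state = "INIT"

    def prepare(self):
        if not self.responds:
            return None                  # no reply; the coordinator times out
        if self.can_commit:
            self.state = "PREPARED"      # promised to commit if told to; now waits
            return Vote.PREPARED
        self.state = "ABORT"             # votes no and aborts locally
        return Vote.ABORT

    def finish(self, decision):
        if self.responds:
            self.state = decision


def two_phase_commit(participants):
    """Phase 1: collect votes. Phase 2: send COMMIT only if every vote is PREPARED."""
    votes = [p.prepare() for p in participants]
    # A missing vote (None) is treated the same as ABORT.
    decision = "COMMIT" if all(v is Vote.PREPARED for v in votes) else "ABORT"
    # A real coordinator force-writes the decision to its log before sending it.
    for p in participants:
        p.finish(decision)
    return decision
```

A single ABORT vote or a single missing reply is enough to abort; only a unanimous PREPARED lets the coordinator commit.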

Conclusion

Parallelism and distribution are very useful: performance, fault tolerance, scale

But complex! Many aspects of the system must be rethought. You must earn the complexity
