pddb

Parallel and distributed databases R & G Chapter 22

Page 1: pddb

Parallel and distributed databases

R & G Chapter 22

Page 2: pddb

What is a distributed database?

Page 3: pddb

Why distribute a database?

Scalability and performance

Resilience to failures

[Figure: throughput versus data size]

Page 4: pddb

Why distribute a database?

Data is already distributed

Or needs to be distributed

Data is in multiple systems

Page 5: pddb

Why not distribute a database?

You must earn your complexity!

Communication needed: must build a complex infrastructure; unpredictable latencies must be masked

More types of failures: more components to fail; network failures; congestion, timeouts

More complex planning: communication cost plus I/O cost

May have to deal with heterogeneity: different types of systems; different schemas, possibly incompatible; different administrative domains

Page 6: pddb

Types of distributed databases

Page 7: pddb

The old days: mainframes

Definitely not distributed!

Page 8: pddb

Client-server

User interaction

Data processing

Network

Page 9: pddb

Parallel database

Page 10: pddb

Primary/secondary

X

Page 11: pddb

Multidatabase

Page 12: pddb

How do they work?

What is shared?

How to distribute the data?

How to process the data?

How to update the data?

Page 13: pddb

What is shared? Memory

CPUs RAM Disk

Most modern DBMSs

Page 14: pddb

What is shared? Disk

RAM

Oracle RAC

Page 15: pddb

What is shared? Nothing

RAM

Search engines, Teradata

Page 16: pddb

How to distribute the data?

Server 1 Server 2 Server 3 Server 4

Bike     $86     6/2/07   636353
Chair    $10     6/5/07   662113
Couch    $570    6/1/07   424252
Car      $1123   6/1/07   256623
Lamp     $19     6/7/07   121113
Bike     $56     6/9/07   887734
Scooter  $18     6/11/07  252111
Hammer   $8000   6/11/07  116458

Page 17: pddb

How to distribute the data?

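Hash partitioning: each (key, value) pair is sent to the server chosen by Hash(key).

Range partitioning: each (key, value) pair is sent to a server by key range, e.g. keys <= X to one server and keys > X to another.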
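The two schemes can be sketched as functions that map a key to a server number. This is a minimal illustration, not the deck's own code: the function names, the use of zlib.crc32 as Hash(), and the split points "C", "H", "M" are all assumptions made for the example. With a single split point X, range_partition reduces to the slide's two-way "<= X / > X" rule.

```python
import zlib
from typing import List


def hash_partition(key: str, num_servers: int) -> int:
    """Hash partitioning: server = Hash(key) mod number of servers."""
    # crc32 stands in for Hash(); unlike Python's built-in hash() on strings,
    # it returns the same value in every process.
    return zlib.crc32(key.encode("utf-8")) % num_servers


def range_partition(key: str, split_points: List[str]) -> int:
    """Range partitioning: split_points must be sorted; a key goes to the
    first server i with key <= split_points[i]."""
    for server, upper in enumerate(split_points):
        if key <= upper:
            return server
    return len(split_points)  # keys above every split point go to the last server


# Usage with item names from the example table (split points are made up).
for item in ["Bike", "Chair", "Couch", "Car", "Lamp", "Scooter", "Hammer"]:
    print(item, hash_partition(item, 4), range_partition(item, ["C", "H", "M"]))
```

Range partitioning keeps neighbouring keys on the same server, so a range query touches few servers, but skewed keys can overload one of them; hashing spreads keys evenly, at the price of sending every range query to all servers.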

Page 18: pddb

How to distribute the data?

Server 1 Server 2 Server 3 Server 4

[Figure: the same eight records as the previous slide (item, price, date, number), shown column by column]

Page 19: pddb

Query processing

Intra-operator parallelism

Inter-operator parallelism

Page 20: pddb

Parallel scanning

[Figure: six filter operators, one per partition, running in parallel and feeding a single Result]
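A parallel scan is intra-operator parallelism: the same filter runs on every partition at once, and the outputs are concatenated into the result. The sketch below assumes a hypothetical placement of the example (item, price) rows on four servers and uses threads to stand in for separate nodes; it only shows the data flow, not real speed-up.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical placement of the example (item, price) rows on four servers.
partitions = [
    [("Bike", 86), ("Chair", 10)],
    [("Couch", 570), ("Car", 1123)],
    [("Lamp", 19), ("Bike", 56)],
    [("Scooter", 18), ("Hammer", 8000)],
]


def scan_and_filter(rows, predicate):
    """Work done by one node: scan its local partition, keep matching rows."""
    return [row for row in rows if predicate(row)]


def parallel_scan(partitions, predicate):
    """Run one filter per partition at the same time, then merge the outputs."""
    with ThreadPoolExecutor(max_workers=max(1, len(partitions))) as pool:
        partial_results = list(
            pool.map(lambda rows: scan_and_filter(rows, predicate), partitions)
        )
    # Merging is a plain concatenation: each row was filtered exactly once.
    return [row for part in partial_results for row in part]


cheap = parallel_scan(partitions, lambda row: row[1] < 50)
print(cheap)  # [('Chair', 10), ('Lamp', 19), ('Scooter', 18)]
```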

Page 21: pddb

Sorting

Page 22: pddb

Sorting
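The sorting slides carry no text in this transcript. One common scheme for parallel sorting, assumed here rather than taken from the slides, is to range-partition the data using split points drawn from a sample, sort each partition locally, and concatenate the partitions in order. The function name, sample size and seed below are illustrative choices.

```python
import bisect
import random
from concurrent.futures import ThreadPoolExecutor


def parallel_sort(values, num_nodes, sample_size=16, seed=0):
    """Sample-based range-partitioned sort (a sketch; one node per bucket)."""
    if not values:
        return []
    # 1. Pick num_nodes - 1 split points from a small sorted random sample.
    rng = random.Random(seed)
    sample = sorted(rng.sample(values, min(sample_size, len(values))))
    splits = [sample[i * len(sample) // num_nodes] for i in range(1, num_nodes)]
    # 2. Redistribute: bucket i gets the values with exactly i split points <= value.
    buckets = [[] for _ in range(num_nodes)]
    for v in values:
        buckets[bisect.bisect_right(splits, v)].append(v)
    # 3. Every node sorts its own bucket locally, all at the same time.
    with ThreadPoolExecutor(max_workers=num_nodes) as pool:
        sorted_buckets = list(pool.map(sorted, buckets))
    # 4. Buckets hold disjoint, increasing ranges, so concatenation is a global sort.
    return [v for bucket in sorted_buckets for v in bucket]


prices = [86, 10, 570, 1123, 19, 56, 18, 8000]
print(parallel_sort(prices, 4))  # [10, 18, 19, 56, 86, 570, 1123, 8000]
```

The split points only affect how evenly work is spread across nodes; correctness holds for any sorted set of split points.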

Page 23: pddb

Parallel hash join

Hash()
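In a parallel hash join, both relations are partitioned with the same Hash() on the join key, so tuples that can match end up on the same node, and each node then runs an ordinary hash join on its share. The sketch below is illustrative: the function names, the `categories` relation and the three-node setup are hypothetical, and the per-node joins run in a loop where a real system would run them concurrently.

```python
from collections import defaultdict


def hash_partition_rows(rows, key_index, num_nodes):
    """Send each row to node hash(join key) % num_nodes."""
    # Python's hash() on strings varies between processes but is stable within
    # one, which is enough here because both inputs are partitioned in-process.
    parts = [[] for _ in range(num_nodes)]
    for row in rows:
        parts[hash(row[key_index]) % num_nodes].append(row)
    return parts


def local_hash_join(left, right, lkey, rkey):
    """Ordinary build/probe hash join on one node."""
    table = defaultdict(list)
    for row in left:  # build phase on the left input
        table[row[lkey]].append(row)
    # Probe phase: look up each right row's key in the table.
    return [l + r for r in right for l in table.get(r[rkey], [])]


def parallel_hash_join(left, right, lkey, rkey, num_nodes=3):
    # Same Hash() on both sides, so matching keys meet on the same node.
    left_parts = hash_partition_rows(left, lkey, num_nodes)
    right_parts = hash_partition_rows(right, rkey, num_nodes)
    result = []
    for node in range(num_nodes):  # these local joins are independent
        result.extend(local_hash_join(left_parts[node], right_parts[node], lkey, rkey))
    return result


sales = [("Bike", 86), ("Lamp", 19), ("Bike", 56)]
categories = [("Bike", "outdoor"), ("Lamp", "home")]  # hypothetical relation
print(parallel_hash_join(sales, categories, 0, 0))
```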

Page 24: pddb

Join

Page 25: pddb

Semi-join
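A semi-join cuts communication cost when relations live on different sites: instead of shipping a whole relation, one site ships only its join keys, the other site returns just the tuples that will match, and the final join happens locally. The sketch below assumes R sits at site A and S at site B, both joined on their first column; the function name and data are hypothetical, and "items shipped" is only a rough proxy for network cost.

```python
def semi_join_plan(r_at_site_a, s_at_site_b):
    """Join R (site A) with S (site B) on column 0 using a semi-join.

    Returns the join result and a rough count of items sent over the network."""
    # Step 1 (A -> B): ship only the distinct join keys of R.
    keys = {row[0] for row in r_at_site_a}
    # Step 2 (at B): keep only the S tuples that will find a partner (S semi-join R).
    s_reduced = [row for row in s_at_site_b if row[0] in keys]
    # Step 3 (B -> A): ship the reduced S back; A finishes the join locally.
    matches = {}
    for row in s_reduced:
        matches.setdefault(row[0], []).append(row)
    result = [r + s[1:] for r in r_at_site_a for s in matches.get(r[0], [])]
    shipped = len(keys) + len(s_reduced)
    return result, shipped


# Hypothetical data: R is small, S is large, and only two S tuples join.
r = [(1, "x"), (2, "y")]
s = [(k, f"s{k}") for k in range(1000)]
result, shipped = semi_join_plan(r, s)
print(result)  # [(1, 'x', 's1'), (2, 'y', 's2')]
print(shipped, "items shipped, versus", len(s), "tuples to ship all of S")
```

The saving depends on selectivity: if most S tuples join anyway, the extra round trip for the keys buys little.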

Page 26: pddb

Inter-operator parallelism
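Inter-operator parallelism runs different operators of the same plan at the same time. One form of it, pipelining, is sketched below under stated assumptions: scan, filter and a sum aggregate each run in their own thread (standing in for separate processors) and pass rows through queues, so the filter starts work before the scan has finished. All names here are hypothetical.

```python
import queue
import threading

DONE = object()  # end-of-stream marker passed down the pipeline


def scan(rows, out_q):
    """Producer: emit rows one at a time."""
    for row in rows:
        out_q.put(row)
    out_q.put(DONE)


def filter_op(predicate, in_q, out_q):
    """Filters rows as they arrive, while the scan may still be running."""
    while (row := in_q.get()) is not DONE:
        if predicate(row):
            out_q.put(row)
    out_q.put(DONE)


def sum_prices(in_q, result):
    """Aggregate: sum the price column of every row that reaches it."""
    total = 0
    while (row := in_q.get()) is not DONE:
        total += row[1]
    result.append(total)


def run_pipeline(rows, predicate):
    scan_to_filter, filter_to_sum = queue.Queue(), queue.Queue()
    result = []
    threads = [
        threading.Thread(target=scan, args=(rows, scan_to_filter)),
        threading.Thread(target=filter_op, args=(predicate, scan_to_filter, filter_to_sum)),
        threading.Thread(target=sum_prices, args=(filter_to_sum, result)),
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return result[0]


rows = [("Bike", 86), ("Chair", 10), ("Couch", 570), ("Car", 1123),
        ("Lamp", 19), ("Bike", 56), ("Scooter", 18), ("Hammer", 8000)]
print(run_pipeline(rows, lambda row: row[1] < 100))  # 189
```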

Page 27: pddb

Updating distributed data

Synchronous: read-any-write-all

Reads are fast
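Under read-any-write-all, every write updates every copy before it succeeds, so any single copy is current and a read can use whichever replica is closest. The sketch below models one replicated item in memory; the class name and the `up` flags that simulate unreachable replicas are hypothetical. It also shows the cost of this scheme: one unreachable replica blocks all writes.

```python
import random


class ReadAnyWriteAll:
    """One data item replicated on several sites (hypothetical class name)."""

    def __init__(self, num_replicas):
        self.copies = [None] * num_replicas
        self.up = [True] * num_replicas  # which replicas are currently reachable

    def write(self, value):
        # Synchronous write-all: every copy must be updated before the write
        # succeeds, so a single unreachable replica blocks all writes.
        if not all(self.up):
            raise RuntimeError("write failed: a replica is unreachable")
        for i in range(len(self.copies)):
            self.copies[i] = value

    def read(self):
        # Every copy is current, so reading any one reachable replica is enough.
        live = [i for i, ok in enumerate(self.up) if ok]
        if not live:
            raise RuntimeError("read failed: no replica reachable")
        return self.copies[random.choice(live)]


item = ReadAnyWriteAll(3)
item.write("v1")
item.up[2] = False  # one replica becomes unreachable
print(item.read())  # v1
try:
    item.write("v2")
except RuntimeError as err:
    print(err)  # write failed: a replica is unreachable
```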

Page 28: pddb

Updating distributed data

Synchronous: voting

Page 29: pddb

Updating distributed data

Synchronous: voting

Writes tolerant to disconnection
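Voting replaces "write all" with quorums: with n copies, a write must reach w of them and a read must consult r, where r + w > n (every read sees the latest write) and 2w > n (two writes cannot succeed on disjoint copies). Each copy carries a version number and a read returns the highest-versioned value. The sketch below is a single-process model with hypothetical names; it shows why writes tolerate disconnected replicas.

```python
class QuorumItem:
    """One data item on n replicas, updated by voting (hypothetical class name).

    Each copy stores (version, value). Writes go to w replicas, reads consult r."""

    def __init__(self, n, r, w):
        # r + w > n: every read quorum overlaps every write quorum.
        # 2 * w > n: two writes can never both succeed on disjoint replicas.
        if not (r + w > n and 2 * w > n):
            raise ValueError("quorums must overlap")
        self.n, self.r, self.w = n, r, w
        self.copies = [(0, None)] * n
        self.up = [True] * n  # False simulates a disconnected replica

    def _live(self):
        return [i for i in range(self.n) if self.up[i]]

    def write(self, value):
        live = self._live()
        if len(live) < max(self.r, self.w):
            raise RuntimeError("write failed: no quorum reachable")
        # Read a read quorum to learn the newest version, then bump it.
        version = max(self.copies[i][0] for i in live[: self.r]) + 1
        for i in live[: self.w]:
            self.copies[i] = (version, value)

    def read(self):
        live = self._live()
        if len(live) < self.r:
            raise RuntimeError("read failed: no quorum reachable")
        # Any r copies include one from the latest write quorum, so the copy
        # with the highest version holds the current value.
        return max((self.copies[i] for i in live[: self.r]), key=lambda c: c[0])[1]


q = QuorumItem(n=5, r=3, w=3)
q.write("a")
q.up[0] = q.up[1] = False  # two replicas disconnected
q.write("b")               # still succeeds: three replicas form a write quorum
q.up[0] = q.up[1] = True
q.up[4] = False
print(q.read())            # b
```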

Page 30: pddb

Consistency of distributed data

Should provide ACID

Page 31: pddb

Primary/secondary
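In primary/secondary replication, all updates go to the primary, which records them in an ordered log; secondaries replay that log to stay in sync. The slide does not say whether propagation is synchronous, so this sketch assumes asynchronous log shipping, which is why a secondary can briefly return stale data. Class and method names are hypothetical.

```python
class Primary:
    """Accepts all writes and records them in an ordered log (hypothetical names)."""

    def __init__(self):
        self.data = {}
        self.log = []  # (key, value) updates in commit order

    def write(self, key, value):
        self.data[key] = value
        self.log.append((key, value))


class Secondary:
    """Read-only copy that replays the primary's log, possibly lagging behind."""

    def __init__(self):
        self.data = {}
        self.applied = 0  # number of log entries already replayed

    def catch_up(self, primary):
        # Replay only the new entries, in log order, so the copy converges
        # to the primary's state.
        for key, value in primary.log[self.applied:]:
            self.data[key] = value
        self.applied = len(primary.log)


p, s = Primary(), Secondary()
p.write("Bike", 86)
print(s.data)  # {}
s.catch_up(p)
print(s.data)  # {'Bike': 86}
```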

Page 32: pddb

Two-phase commit

PREPARE

PREPARED PREPARED

COMMIT

Page 33: pddb

Two-phase commit

PREPARE

PREPARED ABORT

ABORT

Page 34: pddb

Two-phase commit

PREPARE

PREPARED

ABORT

Page 35: pddb

Two-phase commit

PREPARE

PREPARED PREPARED

X
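The four two-phase commit slides can be traced with a small simulation: the coordinator sends PREPARE, commits only if every participant answers PREPARED, and aborts if any participant votes ABORT or fails to answer. Reading the final slide's X as a coordinator failure after the votes are in (an interpretation, not stated in the transcript), the prepared participants are left blocked. All names below are hypothetical, and timeouts are modelled as an exception rather than real network waits.

```python
class Participant:
    """A site taking part in one distributed transaction (hypothetical)."""

    def __init__(self, name, can_commit=True, reachable=True):
        self.name = name
        self.can_commit = can_commit  # would this site vote PREPARED?
        self.reachable = reachable    # False simulates a crashed or partitioned site
        self.state = "INIT"

    def prepare(self):
        if not self.reachable:
            raise TimeoutError(f"{self.name} did not answer PREPARE")
        if self.can_commit:
            # Once PREPARED, the site has promised to commit if told to and
            # may no longer decide on its own.
            self.state = "PREPARED"
            return "PREPARED"
        self.state = "ABORTED"  # a site voting ABORT may abort unilaterally
        return "ABORT"

    def finish(self, decision):
        if self.reachable:
            self.state = decision


def two_phase_commit(participants, coordinator_fails=False):
    """Coordinator side. Returns "COMMITTED", "ABORTED", or None if it crashes."""
    # Phase 1: send PREPARE and collect votes; no answer counts as ABORT.
    votes = []
    for p in participants:
        try:
            votes.append(p.prepare())
        except TimeoutError:
            votes.append("ABORT")
    if coordinator_fails:
        # No decision is ever sent: PREPARED sites stay blocked until the
        # coordinator recovers, since they cannot safely commit or abort alone.
        return None
    decision = "COMMITTED" if all(v == "PREPARED" for v in votes) else "ABORTED"
    # Phase 2: send the decision (COMMIT or ABORT) to every participant.
    for p in participants:
        p.finish(decision)
    return decision


print(two_phase_commit([Participant("A"), Participant("B")]))                     # COMMITTED
print(two_phase_commit([Participant("A"), Participant("B", can_commit=False)]))   # ABORTED
print(two_phase_commit([Participant("A"), Participant("B", reachable=False)]))    # ABORTED
```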

Page 36: pddb

Conclusion

Parallelism and distribution are very useful: performance, fault tolerance, scale.

But complex! Many aspects of the system must be rethought. You must earn the complexity.