Peer-to-peer archival data trading Brian Cooper and Hector Garcia-Molina Stanford University.


Page 1

Peer-to-peer archival data trading

Brian Cooper and Hector Garcia-Molina

Stanford University

Page 2

Problem: Fragile Data

Data: easy to create, hard to preserve
  Broken tapes
  Human deletions
  Going out of business

Page 3

Replication-based preservation

Page 4

Replication-based preservation

Page 5

Motivation

Several systems use replication
  Preserve digital collections
  SAV, others
Archival part of digital library
  Individual organizations cooperate
  Not a lot of money to spend

Page 6

Goal
  Reliable replication of digital collections
Given that
  Resources are limited
  Sites are autonomous
  Not all sites are equal
Traditional methods
  Central control
  Random
  Replicate popular
Metric
  Reliability
  Not necessarily “efficiency”

Page 7

Our solution

Data trading
  “I’ll store a copy of your collection if you’ll store a copy of mine”
Sites make local decisions
  Who to trade with
  How many copies to make
  How much space to provide
  Etc.

Page 8

Trading network
  A series of binary, peer-to-peer trading links

[Figure: trading network of sites A through H]

Page 9

Architecture

[Figure: a local archive and a remote archive connected over the Internet. Labeled components: Users, Filesystem, InfoMonitor, SAV Archive, Reliability layer, Archived data.]

Page 10

Overview

Trading model
Trading algorithm
Simulating trading
Simulation results

Page 11

Trading model

Page 16

Trading model

Archive site: an autonomous archiving provider
Digital collection: a set of related digital materials
Archival storage: stores locally and remotely owned digital collections
Archiving client: deposit and retrieve materials
Data reliability: probability that data is not lost

Page 17

Deeds
  A right to use space at another site
  Bookkeeping mechanism for trades
  Used, saved, split, or transferred
Trading algorithm
  Sites trade deeds
  Sites exercise deeds to replicate collections

[Figure: sample deed reading "Deed for space. For use by: Library of Congress, or for transfer. 623 gigabytes. Stanford University."]
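The deed operations listed on this slide (used, saved, split, transferred) can be sketched as a small immutable record. This is only an illustration under stated assumptions: the class name `Deed`, its fields, and the method names are hypothetical, and reading "Stanford University" on the sample deed as the site that issues the space is a guess, not something the slide states.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Deed:
    """A right to use `size_gb` of space at `issuer`, held by `holder` (hypothetical model)."""
    issuer: str    # site that must provide the space
    holder: str    # site currently entitled to use it
    size_gb: int   # amount of space the deed covers

    def transfer(self, new_holder: str) -> Deed:
        """Hand the whole deed to another site."""
        return Deed(self.issuer, new_holder, self.size_gb)

    def split(self, part_gb: int) -> tuple[Deed, Deed]:
        """Split into two deeds whose sizes sum to the original size."""
        if not 0 < part_gb < self.size_gb:
            raise ValueError("part must be positive and smaller than the deed")
        return (Deed(self.issuer, self.holder, part_gb),
                Deed(self.issuer, self.holder, self.size_gb - part_gb))

    def use(self, collection_gb: int) -> Deed | None:
        """Exercise the deed to store a collection; return the unused remainder, if any."""
        if collection_gb > self.size_gb:
            raise ValueError("collection does not fit in this deed")
        left = self.size_gb - collection_gb
        return Deed(self.issuer, self.holder, left) if left else None


# Mirrors the sample deed on the slide; "saving" a deed is simply keeping the object.
deed = Deed(issuer="Stanford University", holder="Library of Congress", size_gb=623)
small, rest = deed.split(200)
```

Keeping the record immutable means every trade produces a new deed, which makes the bookkeeping role of deeds easy to audit.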

Page 18

Deed trading

[Figure: sites A, B, and C, with Collection 1, Collection 2, and Collection 3 each shown twice]

Page 19

The challenge

[Figure: sites A, B, and C, with Collection 1, Collection 2, and Collection 3 each shown twice]

Page 20

The challenge

[Figure: sites A, B, and C, with Collection 1 and Collection 2 each shown twice and Collection 3 shown three times]

Page 21

Alternative solutions

Are there other ways besides trading?

Page 22

Other solutions: central control

[Figure: sites A, B, and C, with Collection 1 and Collection 2 each shown twice and Collection 3 shown three times]

Page 23

Other solutions: client-based

[Figure: sites A, B, and C, with Collection 1 and Collection 2 each shown twice and Collection 3 shown three times]

Page 24

Other solutions: random

[Figure: sites A, B, and C, with Collection 1 and Collection 2 each shown twice and Collection 3 shown three times]

Page 25

Why is trading good?

High reliability
  Framework for replication
Site autonomy
  Make local decisions
  No submission to external authority
Fairness
  Contribute more = more reliability
  Must contribute resources

[Figure: trading network of sites A through H]

Page 26

Decisions facing an archive

Who to trade with
Providing space
Advertising space
Picking a number of copies
Joining a cluster
Coping with varying site reliabilities

Page 27

How do we evaluate policies?

Trading simulator
  Generate scenario
  Simulate trading with different policies
  Evaluate reliability for each policy
  Compare each policy

Page 28

Simulation parameters

Parameter                  Values
Number of sites            2 to 15
Site reliability           0.5 to 0.8
Collections per site       4 to 25
Data per collection        50 GB to 1000 GB
Space per site             2x data to 7x data
Replication goal           2 to 15 copies
Scenarios per simulation   200
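As a rough illustration of the "generate scenario" step, the parameter ranges on this slide could drive a generator like the one below. The slide does not say how the original simulator draws its values, so uniform sampling per site is an assumption, and the names `Site` and `generate_scenario` are hypothetical. The replication goal is left out because it is a policy input rather than a property of a site.

```python
import random
from dataclasses import dataclass


@dataclass
class Site:
    name: str
    reliability: float          # probability that the site does not fail
    collections_gb: list[int]   # size of each locally owned collection, in gigabytes
    space_gb: int               # total archival storage the site offers


def generate_scenario(rng: random.Random) -> list[Site]:
    """Draw one scenario from the slide's ranges (uniform sampling is assumed)."""
    sites = []
    for i in range(rng.randint(2, 15)):               # number of sites: 2 to 15
        sizes = [rng.randint(50, 1000)                # data per collection
                 for _ in range(rng.randint(4, 25))]  # collections per site: 4 to 25
        factor = rng.uniform(2.0, 7.0)                # space per site: 2x to 7x data
        sites.append(Site(
            name=f"site{i}",
            reliability=rng.uniform(0.5, 0.8),        # site reliability: 0.5 to 0.8
            collections_gb=sizes,
            space_gb=int(factor * sum(sizes)),
        ))
    return sites


# 200 scenarios per simulation, each seeded so that runs are repeatable.
scenarios = [generate_scenario(random.Random(seed)) for seed in range(200)]
```

Seeding each scenario separately lets different policies be compared on exactly the same set of scenarios.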

Page 29

Reliability

Site reliability
  Will a site fail?
  Example: 0.9 = 10% chance of failure
Data reliability
  How safe is the data?
  Despite site failures
  Example: 320 year MTTF
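One simple way to connect the two notions on this slide: if site failures are assumed to be independent, a collection is lost only when every site holding a copy fails. The sketch below computes that probability. The independence assumption and the function name `data_reliability` are not from the slides, and the sketch does not attempt to reproduce the MTTF figure quoted above.

```python
from math import prod


def data_reliability(site_reliabilities: list[float]) -> float:
    """Probability that at least one copy survives, assuming independent site failures."""
    # The data is lost only if all holding sites fail: multiply the failure probabilities.
    return 1.0 - prod(1.0 - r for r in site_reliabilities)


# One copy at a site with reliability 0.9, then three copies at such sites.
single = data_reliability([0.9])
triple = data_reliability([0.9, 0.9, 0.9])
```

With one copy the data is as reliable as its site. With three copies at sites of reliability 0.9, the chance of losing all of them is 0.1 cubed, or 0.001.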

Page 30

Example: trading strategy

Who should we try to trade with?
  The most reliable sites?
  Sites with reliability close to ours?
  The sites we have traded with before?
  Some other policy (like random)?
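Each question on this slide amounts to a different way of ordering candidate trading partners. The functions below sketch the four orderings. All function names, the `reliability` mapping, and the `history` counter are hypothetical, and the slides do not say how the original simulator implements these policies.

```python
import random


def _others(local: str, reliability: dict[str, float]) -> list[str]:
    """All known sites except the local one."""
    return [s for s in reliability if s != local]


def most_reliable(local: str, reliability: dict[str, float]) -> list[str]:
    """Prefer the most reliable sites."""
    return sorted(_others(local, reliability),
                  key=lambda s: reliability[s], reverse=True)


def closest_reliability(local: str, reliability: dict[str, float]) -> list[str]:
    """Prefer sites whose reliability is closest to the local site's."""
    return sorted(_others(local, reliability),
                  key=lambda s: abs(reliability[s] - reliability[local]))


def previously_traded(local: str, reliability: dict[str, float],
                      history: dict[str, int]) -> list[str]:
    """Prefer sites the local site has traded with most often."""
    return sorted(_others(local, reliability),
                  key=lambda s: history.get(s, 0), reverse=True)


def random_order(local: str, reliability: dict[str, float],
                 rng: random.Random) -> list[str]:
    """Try partners in random order."""
    sites = _others(local, reliability)
    rng.shuffle(sites)
    return sites


# Made-up reliabilities for four sites, seen from site "A".
rel = {"A": 0.6, "B": 0.9, "C": 0.65, "D": 0.5}
print(most_reliable("A", rel))
print(closest_reliability("A", rel))
```

Returning a ranked list rather than a single site lets the caller fall back to the next candidate when a trade is refused.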

Page 31

Example: trading strategy

[Figure: average local data MTTF (log scale, 1 to 10000) versus local site reliability (0.5 to 0.9) for the Clustering, MostReliable, and ClosestReliability policies. Annotation: R=0.8.]

Page 32 (slide 35)

Results

Clusters of sites?
  Social or political clusters
  E.g. all universities within a particular state
  Is the cluster big enough?
  What if it isn’t?
Result
  A few archives are sufficient
  E.g. 5 archives to make 3 copies
  Too many sites is counter-productive

Page 33 (slide 36)

Trading clusters

Page 34 (slide 39)

Current and future work

Bidding versus direct trading
  Local site holds an auction
  Bids = size of local site’s deed
“Deviant” sites
  Greedy sites
  Follow protocol but do not play nice
Access
  Support searching over collections
  Distribute indexes via trading

Page 35 (slide 40)

Current and future work

Security
  Will sites actually preserve data?
  Will they give it to others?
  Can I protect sensitive information?
  What if I fail and lose my keys?
  Can I authenticate myself?

Page 36 (slide 41)

Other parts of SAV project

SAV data model
  Write-once objects
  Signature-based naming
How to get objects into SAV
  InfoMonitor – filesystem
  Other inputs (Web, DBMS, etc.)
Modeling archival repositories
  Arturo Crespo
  Choose best components and design
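The slide does not define "write-once objects" or "signature-based naming". One common reading is that each object is named by a digest of its own bytes and is never overwritten. The sketch below illustrates that reading only: the class `WriteOnceStore` is hypothetical, and the choice of SHA-256 is an assumption, not a description of how SAV computes signatures.

```python
import hashlib


class WriteOnceStore:
    """Toy store in which objects are named by a hash of their bytes (assumed scheme)."""

    def __init__(self) -> None:
        self._objects: dict[str, bytes] = {}

    def put(self, data: bytes) -> str:
        """Store the object once and return its content-derived name."""
        name = hashlib.sha256(data).hexdigest()  # SHA-256 is an assumption
        self._objects.setdefault(name, data)     # an existing object is never replaced
        return name

    def get(self, name: str) -> bytes:
        """Return the object, checking that it still matches its name."""
        data = self._objects[name]
        if hashlib.sha256(data).hexdigest() != name:
            raise ValueError("stored object does not match its name")
        return data


store = WriteOnceStore()
name = store.put(b"annual report 1999")
```

Under this reading, any site holding a replica can check a copy against its name without consulting the owner.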

Page 37 (slide 42)

Related work

Peer-to-peer replication
  SAV, Intermemory, LOCKSS, OceanStore…
Fault tolerant systems
  RAID, mirrored disks, replicated databases
Caching systems (Andrew, Coda)
Barter/auction based systems
  ContractNet
Distributed resource allocation
  File Allocation Problem

Page 38 (slide 43)

Conclusion

Important, exciting area
  Preservation critical
  Difficult to accomplish
Many decisions are ad hoc today
  An effective framework is needed
  Scientific evaluation of decisions
Trading networks replicate data
  Model for trading networks
  Trading algorithm
  Simulation results

[Figure: trading network of sites A through H]

Page 39 (slide 44)

For more information

[email protected] http://www-diglib.stanford.edu/