Data Staging on Untrusted Surrogates



Jason Flinn

Shafeeq Sinnamohideen

Niraj Tolia

Mahadev Satyanarayanan

Intel Research Pittsburgh, University of Michigan, Carnegie Mellon University

Mobile Data Access: Expectation vs. Reality

- Mobile computers are increasingly connected
  - Expectation of ubiquitous data access
  - Distributed file systems can help
- Does reality match expectations?
  - Size, weight, and energy constraints
  - Less storage, processing power, etc.
- How to match reality and expectations?
  - Use untrusted, unmanaged infrastructure!

Problem: Limited Storage

- Latency is often the real performance killer
  - File systems: many sequential RPCs
  - Network latency not improving (much)!
- What if one can’t cache all files of interest?
  - Borrow storage from a nearby surrogate
  - Use it as an “L2 file cache”

[Figure: client, surrogate, file server]

Problem: Limited Battery Energy

- File system consumes a lot of energy:
  - Network communication
  - Storage (disk spin-ups, reads, writes)
- Surrogate helps preserve client battery
  - Use surrogate cache to avoid disk spin-ups
  - Prefetch updates to surrogate, not client

Problem: Limited Bandwidth

- How to fetch large updates in a short window?
  - Example: passing through airport gate
  - 11 Mbps (or more) local wireless bandwidth
  - Wide-area Internet bandwidth often less
- InfoStation (Wu, Badrinath, et al.)
  - Cache updates before mobile user arrives
  - Blast data as user passes through cell
- Surrogate: mechanism for caching file data

Location, Location, Location

- Requirement: surrogate located near the client!
  - Must be opportunistic (use what’s there)
- Vision: surrogates ubiquitously deployed
  - Computers getting ever cheaper
  - Already 802.11b wireless networks in cafes
  - Can’t trust or assume good behavior!

Outline

- Motivation
- Architecture and design
- Implementation
- Evaluation
- Related work and conclusions

Data Staging Architecture

[Figure: architecture diagram. Components: wimpy client (proxy, file client), surrogate (staging server), desktop (data pump, file client), and file server, with the file clients and file server labeled Coda. Labeled flows: encrypted files; staged reads; modifications and unstaged reads; file keys and hashes (via secure channel); file system traffic. One link is labeled high latency.]

Trust (or Lack Thereof)

Trusted: client, file server, desktop, file system

Untrusted: surrogate, network

How to deal with an untrusted surrogate?

- End-to-end encryption (privacy)
- Cryptographic hashes (authenticity)
- Read-only data (can’t “lose” updates)
- Monitor performance (mitigate DoS)
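A minimal sketch of the hash check, to make the authenticity point concrete. The slides do not name a hash function or a fallback policy; SHA-256, the function names, and the "fall back to the file server on mismatch" rule are all assumptions made for illustration.

```python
import hashlib
import hmac

def verify_staged_data(data: bytes, expected_hash: bytes) -> bool:
    """Return True only if data matches the hash obtained over the secure channel.

    SHA-256 is an assumption; the slides only say "cryptographic hashes".
    """
    actual = hashlib.sha256(data).digest()
    # Constant-time comparison avoids leaking how many bytes matched.
    return hmac.compare_digest(actual, expected_hash)

def read_file(fetch_from_surrogate, fetch_from_server, expected_hash: bytes) -> bytes:
    """Try the untrusted surrogate first; use the trusted file server otherwise.

    Both fetch arguments are hypothetical zero-argument callables that return
    bytes (the surrogate fetch may return None when it has no copy).
    """
    data = fetch_from_surrogate()
    if data is not None and verify_staged_data(data, expected_hash):
        return data
    # Staged data is a read-only copy, so discarding it loses nothing.
    return fetch_from_server()
```

Because the surrogate holds only read-only copies, a failed check costs one slower fetch from the file server and never costs data.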

Ease of Management

Can’t require a system administrator!

1. Build on commodity software
   - Apache with Perl scripts (643 LoC)

2. No long-term state
   - OK to trip over power cord!

3. Allow file system diversity
   - Minimalist API
   - Currently support Coda and NFS

Surrogate API

Register() Get lease, quota for surrogate

Renew() Renew a lease

Deregister() Explicitly stop using surrogate

Stage() Put data on the surrogate

Unstage() Remove data from surrogate

Get() Retrieve data from surrogate
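The six calls above can be sketched as an in-memory class. Only the call names and their one-line descriptions come from the slide; the parameters, return values, the per-client lease model, and the quota check are assumptions, and the real surrogate is described as Apache with Perl scripts rather than Python.

```python
import time
from dataclasses import dataclass, field

@dataclass
class Registration:
    quota_bytes: int
    lease_expires: float
    files: dict = field(default_factory=dict)  # file id -> encrypted bytes

class Surrogate:
    """Hypothetical in-memory surrogate; holds no long-term state."""

    def __init__(self, quota_bytes=64 * 2**20, lease_seconds=300.0, clock=time.monotonic):
        self.quota_bytes = quota_bytes
        self.lease_seconds = lease_seconds
        self.clock = clock
        self.clients = {}

    def _live(self, client_id):
        # Look up a registration, discarding it if the lease has run out.
        reg = self.clients.get(client_id)
        if reg is None or self.clock() >= reg.lease_expires:
            self.clients.pop(client_id, None)
            raise KeyError("no live lease for client")
        return reg

    def register(self, client_id):
        """Get lease, quota for surrogate."""
        reg = Registration(self.quota_bytes, self.clock() + self.lease_seconds)
        self.clients[client_id] = reg
        return reg.lease_expires, reg.quota_bytes

    def renew(self, client_id):
        """Renew a lease."""
        reg = self._live(client_id)
        reg.lease_expires = self.clock() + self.lease_seconds
        return reg.lease_expires

    def deregister(self, client_id):
        """Explicitly stop using surrogate."""
        self.clients.pop(client_id, None)

    def stage(self, client_id, file_id, data):
        """Put data on the surrogate, within the client's quota."""
        reg = self._live(client_id)
        used = sum(len(d) for fid, d in reg.files.items() if fid != file_id)
        if used + len(data) > reg.quota_bytes:
            raise ValueError("quota exceeded")
        reg.files[file_id] = data

    def unstage(self, client_id, file_id):
        """Remove data from surrogate."""
        self._live(client_id).files.pop(file_id, None)

    def get(self, client_id, file_id):
        """Retrieve data from surrogate (None if not staged)."""
        return self._live(client_id).files.get(file_id)
```

Tying stored data to a lease means an abandoned registration simply expires, which fits the "no long-term state" goal.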

Which Files to Stage?

Must predict the files most likely to be accessed

- Prediction orthogonal to data staging
  - Client proxy has hooks for prediction code
  - Hoarding: user manually specifies files, dirs
  - Clustering: per-activity LRU caching

[Figure: spectrum of approaches from less transparent to more transparent: manual copy, Coda hoarding, user-driven clustering, SEER]

Client Proxy Data Structures

Client proxy is the final arbiter of validity

For each staged file, maintains:

- Valid bit
- Data length
- Encryption key and secure hash

File id   Valid?   Length   Key       Hash
0x3fdc    Yes      32,558   0xeabc…   0xea67…
0x3fe6    No       23,458   0xabc3…   0x7345…
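The per-file record can be sketched as a small table keyed by file id. The field set follows the slide; the class and method names are hypothetical, and clearing the valid bit when a file is modified is an assumption, since the slide does not say what invalidates an entry.

```python
from dataclasses import dataclass

@dataclass
class StagedFile:
    valid: bool     # valid bit
    length: int     # data length in bytes
    key: bytes      # encryption key
    digest: bytes   # secure hash

class StagingTable:
    """Hypothetical client-proxy table: file id -> StagedFile."""

    def __init__(self):
        self.entries = {}

    def record(self, file_id, length, key, digest):
        # Called when the data pump reports a newly staged file.
        self.entries[file_id] = StagedFile(True, length, key, digest)

    def invalidate(self, file_id):
        # Assumed trigger: the file changed, so the staged copy is stale.
        entry = self.entries.get(file_id)
        if entry is not None:
            entry.valid = False

    def usable(self, file_id):
        """Return the entry if the staged copy may be used, else None."""
        entry = self.entries.get(file_id)
        return entry if entry is not None and entry.valid else None
```

With the two rows from the slide, `usable(0x3fdc)` would return an entry and `usable(0x3fe6)` would return None, sending that read to the file server instead.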

Staging Data

Client proxy sends list of files to data pump

For each file, data pump:

1. Reads file and attributes from file system

2. Encrypts file, generates hash over data

3. Sends encrypted data to surrogate

4. Sends key, hash, length to client

Staging is asynchronous with client file accesses

- If file is staged, client gets it from surrogate
- Otherwise, gets it from file server
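The four data pump steps can be sketched as a single loop. Everything beyond the step order is an assumption: the slides do not name the cipher or hash, so a toy SHA-256 keystream stands in for real encryption, the hash is taken over the ciphertext, file attributes are omitted, and the three callables are hypothetical transport hooks.

```python
import hashlib
import os

def keystream_xor(key: bytes, data: bytes) -> bytes:
    """Toy stream cipher (SHA-256 over key and offset).

    Placeholder only, to keep the sketch dependency-free; a real system
    would use a vetted cipher. Applying it twice restores the input.
    """
    out = bytearray()
    for offset in range(0, len(data), 32):
        pad = hashlib.sha256(key + offset.to_bytes(8, "big")).digest()
        chunk = data[offset:offset + 32]
        out.extend(b ^ p for b, p in zip(chunk, pad))
    return bytes(out)

def stage_files(file_ids, read_file, send_to_surrogate, send_to_client):
    """Run the per-file staging steps for each requested file."""
    for file_id in file_ids:
        plaintext = read_file(file_id)                  # 1. read from file system
        key = os.urandom(32)                            # fresh per-file key
        ciphertext = keystream_xor(key, plaintext)      # 2. encrypt ...
        digest = hashlib.sha256(ciphertext).digest()    #    ... and hash
        send_to_surrogate(file_id, ciphertext)          # 3. surrogate sees ciphertext only
        send_to_client(file_id, key, digest, len(plaintext))  # 4. key, hash, length
```

The surrogate never receives the key, so it can store and serve the file without being able to read it; the client decrypts with the same `keystream_xor` call after checking the hash.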

Outline

- Motivation
- Architecture and design
- Implementation
- Evaluation
- Related work and conclusions

Experimental Setup

[Figure: testbed diagram. Labels: Coda file server; Ethernet; client (iPAQ 3850, 64 MB Coda cache); 802.11b wireless access point; 30 ms delay; surrogate]

Cold cache: no data on client or surrogate

Warm cache: data initially on client and surrogate

Benchmark: Image Trace

- Record accesses to digital photo library in Coda
  - Take the first 10,148 accesses
  - 150 MB unique data, 401 MB total data read
  - Replay trace as fast as possible (DFSTrace)
- Variables:
  - Wastage ratio: extra data prefetched
  - Miss ratio: amount of data never prefetched
  - Assume wastage ratio 33%, miss ratio 0%
  - Then do sensitivity analysis

Baseline Image Results

[Figure: bar chart of time (seconds, axis 0 to 1600) for cold and warm caches, comparing No Staging with Staging]

Staging reduces execution time 45-48%!

Sensitivity Analysis

[Figure: two plots of time (seconds, axis 0 to 1600), one against wastage ratio (0 to 1) and one against miss ratio (0 to 1)]

Higher miss ratio has relatively greater effect

Longer-Duration File Traces

- Used Mummert’s Coda file system traces
  - Traces of client activity (open, mkdir, etc.)
  - Duration: 16-55 hours
  - Working set size: 57-254 MB
- Methodology:
  - Keep inter-request delays when prefetching
  - Eliminate delays afterwards

File Trace Results

[Figure: bar chart of cumulative delay (seconds, axis 0 to 2000) for the Purcell, Messiaen, Robin, and Berlioz traces; series: Cold / No Staging, Cold / Staging, Warm / No Staging, Warm / Staging]

Up to 48% reduction in cumulative file access delay

Request Latency Breakdown

[Figure: cumulative fraction (axis 0.6 to 1) against operation latency (0 to 250 ms); series: Warm / Staging, Cold / Staging, Warm / No Staging, Cold / No Staging]

Related Work

- Web Caching (Akamai, Squid)
  - Different data access patterns, consistency
- Fluid Replication (Kim02)
  - Assume more trust and management
- OceanStore (Kubiatowicz02)
  - Staging minimalist, file-system agnostic
- Builds on work in file prefetching, InfoStations

Conclusion

Possible to significantly improve distributed file system performance with untrusted, unmanaged infrastructure!

Future work:

- Grow set of supported file systems
- Surrogate discovery and migration
- Support for energy-awareness

http://info.pittsburgh.intel-research.net