Peer-to-Peer (P2P) and Sensor Networks

Shivkumar Kalyanaraman
Rensselaer Polytechnic Institute
[email protected]
http://www.ecse.rpi.edu/Homepages/shivkuma

Based in part upon slides of Don Towsley, Ion Stoica, Scott Shenker, Joe Hellerstein, Jim Kurose, Hung-Chang Hsiao, Chung-Ta King



Overview

● P2P networks: Napster, Gnutella, Kazaa
● Distributed Hash Tables (DHTs)
● Database perspectives: data-centricity, data-independence
● Sensor networks and their connection to P2P


P2P: Key Idea

● Share the content, storage and bandwidth of individual (home) users

[Figure: Internet]


What is P2P (Peer-to-Peer)?

● P2P as a mindset
  ▪ Slashdot
● P2P as a model
  ▪ Gnutella
● P2P as an implementation choice
  ▪ Application-layer multicast
● P2P as an inherent property
  ▪ Ad-hoc networks


P2P Application Taxonomy

P2P Systems
● Distributed Computing: SETI@home
● File Sharing: Gnutella
● Collaboration: Jabber
● Platforms: JXTA


How to Find an Object in a Network?

[Figure: Network]


A Straightforward Idea

● Use a BIG server
● Store the object
● Provide a directory

[Figure: Network]

How to do it in a distributed way?


Why Distributed?

● Client-server model:
  ▪ Client is dumb
  ▪ Server does most things (compute, store, control)
  ▪ Centralization makes things simple, but introduces a single point of failure, performance bottleneck, tighter control, access fee and management cost, …
● ad hoc participation?
● Estimate of net PCs
  ▪ 10 billions of MHz CPUs
  ▪ 10000 terabytes of storage
● Clients are not that dumb after all
● Use the resources in the clients (at net edges)


First Idea: Napster

● Distributing objects, centralizing directory:

[Figure: Network]
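A minimal sketch of "distributing objects, centralizing directory": peers keep the files, and one server only maps file names to the peers that hold them. The class and method names are hypothetical, chosen for illustration; this is not Napster's actual protocol.

```python
class CentralDirectory:
    """Toy central index: file name -> set of peers that store the file."""

    def __init__(self):
        self._index = {}

    def register(self, peer, filenames):
        # A peer announces the files it is willing to share.
        for name in filenames:
            self._index.setdefault(name, set()).add(peer)

    def unregister(self, peer):
        # A departing peer is removed from every entry.
        for peers in self._index.values():
            peers.discard(peer)

    def lookup(self, filename):
        # Only peer names come back; the transfer itself would then
        # happen directly between the two peers.
        return sorted(self._index.get(filename, set()))


directory = CentralDirectory()
directory.register("peer-a", ["song.mp3", "talk.mp3"])
directory.register("peer-b", ["song.mp3"])
print(directory.lookup("song.mp3"))
```

The directory is the single point of failure and control that the earlier "Why Distributed?" slide warns about; the objects themselves never pass through it.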


Today: P2P Video traffic is dominant

Source: CacheLogic; video, BitTorrent, eDonkey!


40-60%+ P2P traffic


2006 P2P Data

● Between 50 and 65 percent of all download traffic is P2P related.
● Between 75 and 90 percent of all upload traffic is P2P related.
● And it seems that more people are using P2P today
  ▪ In 2004, 1 CacheLogic server registered 3 million IP addresses in 30 days
  ▪ In 2006, 1 CacheLogic server registered 3 million IP addresses in 8 days
● So what do people download?
  ▪ 61.4 percent video
  ▪ 11.3 percent audio
  ▪ 27.2 percent is games/software/etc.
● The average file size of shared files is 1 gigabyte!

Source: http://torrentfreak.com/peer-to-peer-traffic-statistics/


A More Aggressive Idea

● Distributing objects and directory:

[Figure: Network]

How to find objects w/o directory? Blind flooding!


Gnutella

● Distribute file location
● Idea: flood the request
● How to find a file:
  ▪ Send request to all neighbors
  ▪ Neighbors recursively multicast the request
  ▪ Eventually a machine that has the file receives the request, and it sends back the answer
● Advantages:
  ▪ Totally decentralized, highly robust
● Disadvantages:
  ▪ Not scalable; the entire network can be swamped with requests (to alleviate this problem, each request has a TTL)
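The flooding search can be sketched as a breadth-first walk with a hop budget. This is an illustration under assumptions, not Gnutella's wire protocol: the overlay graph, the peer names and the function name are invented, and duplicate suppression is modelled with a simple "seen" set.

```python
from collections import deque


def flood_query(graph, start, has_file, ttl):
    """Flood a request from `start`; return (peers that answer, messages sent)."""
    seen = {start}
    hits = []
    messages = 0
    queue = deque([(start, ttl)])
    while queue:
        node, remaining = queue.popleft()
        if node != start and node in has_file:
            hits.append(node)          # this peer would send back the answer
        if remaining == 0:
            continue                   # TTL exhausted: do not forward further
        for neighbor in graph[node]:
            messages += 1              # every forwarded copy costs a message
            if neighbor not in seen:   # duplicates are received but dropped
                seen.add(neighbor)
                queue.append((neighbor, remaining - 1))
    return hits, messages


# Hypothetical overlay: A-B, A-C, B-D, C-D, D-E; only E has the file.
overlay = {
    "A": ["B", "C"],
    "B": ["A", "D"],
    "C": ["A", "D"],
    "D": ["B", "C", "E"],
    "E": ["D"],
}
print(flood_query(overlay, "A", {"E"}, ttl=2))
print(flood_query(overlay, "A", {"E"}, ttl=3))
```

With a TTL of 2 the request never reaches E, so the file is not found even though it exists; raising the TTL finds it at the cost of more messages.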


Gnutella: Unstructured P2P

● Ad-hoc topology
● Queries are flooded for bounded number of hops
● No guarantees on recall

[Figure: Query: “xyz”]


Now BitTorrent & eDonkey2000! (2006)


Lessons and Limitations

● Client-Server performs well
  ▪ But not always feasible
  ▪ Ideal performance is often not the key issue!
● Things that flood-based systems do well
  ▪ Organic scaling
  ▪ Decentralization of visibility and liability
  ▪ Finding popular stuff
  ▪ Fancy local queries
● Things that flood-based systems do poorly
  ▪ Finding unpopular stuff [Loo, et al VLDB 04]
  ▪ Fancy distributed queries
  ▪ Vulnerabilities: data poisoning, tracking, etc.
  ▪ Guarantees about anything (answer quality, privacy, etc.)


Detour … BitTorrent


BitTorrent – joining a torrent

Peers divided into:
● seeds: have the entire file
● leechers: still downloading

[Figure: new leecher, website (metadata file), tracker (join, peer list), seed/leecher (data request); arrows numbered 1 to 4 as in the steps below]

1. obtain the metadata file
2. contact the tracker
3. obtain a peer list (contains seeds & leechers)
4. contact peers from that list for data


BitTorrent – exchanging data

● Verify pieces using hashes
● Download sub-pieces in parallel
● Advertise received pieces to the entire peer list
● Look for the rarest pieces

[Figure: seed and leechers A, B, C; leecher A announces "I have!"]
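Two of the points above, verifying pieces with hashes and looking for the rarest pieces, can be sketched as follows. The function names and the toy peer table are hypothetical; SHA-1 is used because the original BitTorrent metadata format stores a SHA-1 hash per piece, and the tie-break by lowest index is an assumption made only to keep the example deterministic.

```python
import hashlib
from collections import Counter


def verify_piece(data, expected_sha1_hex):
    """Accept a downloaded piece only if its SHA-1 matches the metadata file."""
    return hashlib.sha1(data).hexdigest() == expected_sha1_hex


def rarest_first(have, peer_pieces):
    """Pick the missing piece that the fewest peers advertise.

    have: set of piece indices already downloaded.
    peer_pieces: dict peer -> set of piece indices that peer advertises.
    """
    counts = Counter()
    for pieces in peer_pieces.values():
        counts.update(pieces)
    candidates = [p for p in counts if p not in have]
    if not candidates:
        return None
    # Fewest holders first; ties broken by lowest index (an assumption).
    return min(candidates, key=lambda p: (counts[p], p))


peers = {"seed": {0, 1, 2, 3}, "leecher B": {0, 1}, "leecher C": {0, 2}}
print(rarest_first({0}, peers))
```

In this toy table piece 3 is held only by the seed, so it is requested first, which spreads scarce pieces before the seed leaves.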


BitTorrent - unchoking

● Periodically calculate data-receiving rates
● Upload to (unchoke) the fastest downloaders
● Optimistic unchoking
  ▪ periodically select a peer at random and upload to it
  ▪ continuously look for the fastest partners

[Figure: seed and leechers A, B, C, D]
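One round of the unchoking decision might look like the sketch below. It assumes peers are ranked by the measured data-receiving rate from the first bullet, and that there are three regular slots plus one optimistic slot; the slot count, the function name and the sample rates are assumptions for illustration.

```python
import random


def choose_unchoked(rates, regular_slots=3, rng=random):
    """Return (regular unchokes, optimistic unchoke) for one round.

    rates: dict peer -> measured data-receiving rate for that peer.
    """
    ranked = sorted(rates, key=lambda peer: (-rates[peer], peer))
    unchoked = ranked[:regular_slots]        # fastest partners get the slots
    choked = ranked[regular_slots:]
    # Optimistic unchoke: one random choked peer also gets an upload slot,
    # so that a possibly faster partner can be discovered.
    optimistic = rng.choice(choked) if choked else None
    return unchoked, optimistic


rates = {"A": 50, "B": 10, "C": 80, "D": 30, "E": 5}
print(choose_unchoked(rates, rng=random.Random(0)))
```

Repeating the round periodically lets a peer that was picked optimistically move into a regular slot once its measured rate is high enough.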


End of Detour ….


Back to … P2P Structures

● Unstructured P2P architecture
  ▪ Napster, Gnutella, Freenet
  ▪ No “logically” deterministic structures to organize the participating peers
  ▪ No guarantee that objects will be found
● How to find objects within some no. of hops?
  ▪ Extend hashing
● Structured P2P architecture
  ▪ CAN, Chord, Pastry, Tapestry, Tornado, …
  ▪ Viewed as a distributed hash table for directory


How to Bound Search Quality?

● Many ideas …, again

[Figure: Network]

Work on placement!


High-Level Idea: Indirection

● Indirection in space
  ▪ Logical (content-based) IDs, routing to those IDs
    ▪ “Content-addressable” network
  ▪ Tolerant of churn
    ▪ nodes joining and leaving the network
● Indirection in time
  ▪ Want some scheme to temporally decouple send and receive
  ▪ Persistence required. Typical Internet solution: soft state
    ▪ Combo of persistence via storage and via retry
    ▪ “Publisher” requests TTL on storage
    ▪ Republishes as needed
● Metaphor: Distributed Hash Table

[Figure labels: to h, z, h=z]
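The soft-state idea (the publisher requests a TTL on storage and republishes as needed) can be sketched with a store whose entries silently expire. The class name, method names and the explicit `now` argument are assumptions made so the example is deterministic.

```python
class SoftStateStore:
    """Toy store in which every entry expires unless it is republished."""

    def __init__(self):
        self._items = {}  # key -> (value, expiry time)

    def publish(self, key, value, ttl, now):
        # Publishing again simply overwrites the entry and extends its life.
        self._items[key] = (value, now + ttl)

    def get(self, key, now):
        entry = self._items.get(key)
        if entry is None:
            return None
        value, expires_at = entry
        if now >= expires_at:
            del self._items[key]   # state decays without any explicit delete
            return None
        return value


store = SoftStateStore()
store.publish("h", "item", ttl=10, now=0)
print(store.get("h", now=5))    # still stored
print(store.get("h", now=10))   # expired: the publisher did not refresh it
store.publish("h", "item", ttl=10, now=10)
print(store.get("h", now=15))   # available again after republishing
```

Because stale entries vanish on their own, a crashed publisher or storage node needs no cleanup protocol; the cost is periodic republish traffic.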


Basic Idea

● Objects have hash keys
● Peer nodes also have hash keys in the same hash space
● Place each object at the peer with the closest hash key

[Figure: object “y” with hash key H(y) and peer “x” with hash key H(x) in a P2P network; Join(H(x)), Publish(H(y))]
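A minimal sketch of the placement rule: hash peers and objects into one key space and store each object at the peer whose key is closest. The 32-bit space, the use of SHA-1, circular distance as the meaning of "closest", and all names below are assumptions for illustration; real designs each define their own metric.

```python
import hashlib

BITS = 32            # hypothetical small key space for readability
SPACE = 2 ** BITS


def h(name):
    """Hash a peer or object name into the shared key space."""
    digest = hashlib.sha1(name.encode("utf-8")).digest()
    return int.from_bytes(digest, "big") % SPACE


def closest_peer(peer_keys, key):
    """Peer key with the smallest circular distance to `key`."""
    def ring_distance(a, b):
        d = abs(a - b)
        return min(d, SPACE - d)
    return min(peer_keys, key=lambda p: (ring_distance(p, key), p))


class ToyDHT:
    """All peers live in one process; only the placement rule is modelled."""

    def __init__(self, peer_names):
        self.peers = {h(name): {} for name in peer_names}   # Join(H(x))

    def insert(self, name, item):                           # Publish(H(y))
        self.peers[closest_peer(self.peers, h(name))][name] = item

    def lookup(self, name):
        return self.peers[closest_peer(self.peers, h(name))].get(name)


dht = ToyDHT(["x1", "x2", "x3"])
dht.insert("y", "object y")
print(dht.lookup("y"))
```

Insert and lookup compute the same hash and so arrive at the same peer, which is why no directory server is needed.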


Distributed Hash Tables (DHTs)

● Abstraction: a distributed hash-table data structure
  ▪ insert(id, item);
  ▪ item = query(id); (or lookup(id);)
  ▪ Note: item can be anything: a data object, document, file, pointer to a file…
● Proposals
  ▪ CAN, Chord, Kademlia, Pastry, Tapestry, etc
● Goals:
  ▪ Make sure that an item (file) identified is always found
  ▪ Scales to hundreds of thousands of nodes
  ▪ Handles rapid arrival and failure of nodes


Viewed as a Distributed Hash Table

● Each peer node is responsible for a range of the hash table, according to the peer hash key
● Objects are placed in the peer with the closest key
● Note that peers are Internet edges

[Figure: hash table spanning 0 to 2^128-1, divided among peer nodes connected through the Internet]


How to Find an Object?

● Simplest idea: Everyone knows everyone else!
  ▪ one hop to find the object
● Want to keep only a few entries!

[Figure: hash table spanning 0 to 2^128-1, divided among peer nodes]


Structured Networks

● Distributed Hash Tables (DHTs)
● Hash table interface: put(key,item), get(key)
● O(log n) hops
● Guarantees on recall

[Figure: overlay nodes each holding (K, I) pairs; put(K1,I1) stores (K1,I1), get(K1) returns I1]


Content Addressable Network, CAN

● Distributed hash table
● Hash table as in a Cartesian coordinate space
● A peer only needs to know its logical neighbors
● Dimensional-ordered multihop routing

[Figure: hash table spanning 0 to 2^128-1, divided among peer nodes]


Content Addressable Network (CAN)

● Associate to each node and item a unique id in a d-dimensional Cartesian space on a d-torus
● Properties
  ▪ Routing table size O(d)
  ▪ Guarantees that a file is found in at most d*n^(1/d) steps, where n is the total number of nodes
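To get a feel for the trade-off between routing table size O(d) and the step bound d*n^(1/d), the bound can simply be evaluated for a few dimensions. The network size n = 4096 is an arbitrary choice for illustration, and the function name is hypothetical.

```python
def can_max_steps(n, d):
    """Step bound d * n**(1/d) quoted above for n nodes and d dimensions."""
    return d * n ** (1.0 / d)


n = 4096  # assumed network size, chosen only for the example
for d in (1, 2, 3, 4, 6):
    print(f"d={d}: about {can_max_steps(n, d):.0f} steps at most")
```

Raising d shortens paths quickly at first (4096, 128, 48, 32, 24 steps for the values above) while each node keeps only a number of neighbors proportional to d.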


CAN Example: Two Dimensional Space

● Space divided between nodes
● All nodes together cover the entire space
● Each node covers either a square or a rectangular area of ratios 1:2 or 2:1
● Example:
  ▪ Node n1:(1, 2) is the first node that joins; it covers the entire space

[Figure: 2-d coordinate space, both axes 0 to 7; n1 covers the whole space]


CAN Example: Two Dimensional Space

● Node n2:(4, 2) joins; space is divided between n1 and n2

[Figure: 2-d coordinate space, both axes 0 to 7; zones of n1 and n2]


CAN Example: Two Dimensional Space

● Node n3:(3, 5) joins; space is divided between n1 and n3

[Figure: 2-d coordinate space, both axes 0 to 7; zones of n1, n2 and n3]


CAN Example: Two Dimensional Space

● Nodes n4:(5, 5) and n5:(6, 6) join

[Figure: 2-d coordinate space, both axes 0 to 7; zones of n1 to n5]


CAN Example: Two Dimensional Space

● Nodes: n1:(1, 2); n2:(4, 2); n3:(3, 5); n4:(5, 5); n5:(6, 6)
● Items: f1:(2, 3); f2:(5, 1); f3:(2, 1); f4:(7, 5)

[Figure: 2-d coordinate space, both axes 0 to 7; nodes n1 to n5 and items f1 to f4]


CAN Example: Two Dimensional Space

● Each item is stored by the node that owns its mapping in the space

[Figure: 2-d coordinate space, both axes 0 to 7; nodes n1 to n5 and items f1 to f4]


CAN: Query Example

● Each node knows its neighbors in the d-space
● Forward query to the neighbor that is closest to the query id
● Example: assume n1 queries f4
● Can route around some failures

[Figure: 2-d coordinate space, both axes 0 to 7; nodes n1 to n5 and items f1 to f4; query routed from n1 to f4]
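The query example can be replayed with a small sketch of greedy CAN routing. The zone boundaries below are an assumption: the figure is not available, so they were reconstructed from the join order n1 to n5 on an 8x8 space, splitting each zone in half when a node joins. Torus wrap-around is ignored for simplicity, and all function names are hypothetical.

```python
from math import hypot

# Zones as (x_min, y_min, x_max, y_max), half-open on the max side.
# ASSUMPTION: reconstructed from the join order, not read off the figure.
ZONES = {
    "n1": (0, 0, 4, 4),
    "n2": (4, 0, 8, 4),
    "n3": (0, 4, 4, 8),
    "n4": (4, 4, 6, 8),
    "n5": (6, 4, 8, 8),
}


def contains(zone, p):
    x0, y0, x1, y1 = zone
    return x0 <= p[0] < x1 and y0 <= p[1] < y1


def distance(zone, p):
    """Euclidean distance from point p to the nearest point of the zone."""
    x0, y0, x1, y1 = zone
    dx = max(x0 - p[0], 0, p[0] - x1)
    dy = max(y0 - p[1], 0, p[1] - y1)
    return hypot(dx, dy)


def are_neighbors(a, b):
    """Zones are neighbors if they share a border segment (not just a corner)."""
    ax0, ay0, ax1, ay1 = a
    bx0, by0, bx1, by1 = b
    side = (ax1 == bx0 or bx1 == ax0) and min(ay1, by1) > max(ay0, by0)
    top = (ay1 == by0 or by1 == ay0) and min(ax1, bx1) > max(ax0, bx0)
    return side or top


def owner(p):
    return next(name for name, zone in ZONES.items() if contains(zone, p))


def route(start, target):
    """Forward greedily to the neighbor closest to the target point."""
    path, current = [start], start
    while not contains(ZONES[current], target):
        neighbors = [n for n in ZONES
                     if n != current and are_neighbors(ZONES[current], ZONES[n])]
        current = min(neighbors, key=lambda n: distance(ZONES[n], target))
        path.append(current)
        if len(path) > len(ZONES):
            raise RuntimeError("routing did not converge")
    return path


print(route("n1", (7, 5)))   # n1 queries f4:(7, 5)
```

Under these assumed zones, f4 is stored at n5 and the query from n1 travels n1, n2, n5; each node only consults its own neighbor list.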


Another Design: Chord

● Node and object keys:
  ▪ random location around a circle
● Neighbors:
  ▪ nodes 2^-i around the circle
  ▪ found by routing to desired key
● Routing: greedy
  ▪ pick nbr closest to destination
● Storage: “own” interval
  ▪ node owns key range between her key and previous node’s key

[Figure: circle with one node's ownership range marked]
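The Chord rules above can be sketched on a tiny ring. This is a simulation under assumptions, not the published pseudocode: a 6-bit identifier circle, a hand-picked set of node ids, and neighbors ("fingers") taken at distances 2^i, which are the fractions 2^-i of the circle mentioned above.

```python
M = 6              # identifier bits (assumed small for illustration)
RING = 2 ** M


def successor(nodes, key):
    """First node at or clockwise after key; that node owns the key."""
    for n in sorted(nodes):
        if n >= key:
            return n
    return min(nodes)  # wrap around the circle


def between(x, a, b, inclusive_right=False):
    """True if x lies on the clockwise arc from a to b (a itself excluded)."""
    if inclusive_right and x == b:
        return True
    if a < b:
        return a < x < b
    return x > a or x < b


def fingers(nodes, n):
    """Neighbors of n: the owners of the points n + 2**i around the circle."""
    return [successor(nodes, (n + 2 ** i) % RING) for i in range(M)]


def lookup(nodes, start, key):
    """Greedy routing; returns (owner of key, nodes visited on the way)."""
    n, path = start, [start]
    while True:
        succ = successor(nodes, (n + 1) % RING)
        if between(key, n, succ, inclusive_right=True):
            return succ, path
        # Take the neighbor that gets closest to the key without passing it.
        n = next((f for f in reversed(fingers(nodes, n))
                  if between(f, n, key)), succ)
        path.append(n)


nodes = [1, 8, 14, 21, 32, 38, 42, 48, 51, 56]   # hypothetical node ids
print(lookup(nodes, 8, 54))
```

Each hop at least halves the remaining distance around the circle, which is where the O(log n) hop count of structured networks comes from.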


OpenDHT

● A shared DHT service
  ▪ The Bamboo DHT
  ▪ Hosted on PlanetLab
  ▪ Simple RPC API
  ▪ You don’t need to deploy or host to play with a real DHT!


Review: DHTs vs Unstructured P2P

● DHTs good at:
  ▪ exact match for “rare” items
● DHTs bad at:
  ▪ keyword search, etc. [can’t construct DHT-based Google]
  ▪ tolerating extreme churn
● Gnutella etc. (unstructured P2P) good at:
  ▪ general search
  ▪ finding common objects
  ▪ very dynamic environments
● Gnutella etc. bad at:
  ▪ finding “rare” items


Distributed Systems Pre-Internet

● Connected by LANs (low loss and delay)
● Small scale (10s, maybe 100s per server)
● PODC literature focused on algorithms to achieve strict semantics in the face of failures
  ▪ Two-phase commits
  ▪ Synchronization
  ▪ Byzantine agreement
  ▪ Etc.


Distributed Systems Post-Internet

● Very different context:
  ▪ Huge scales (thousands if not millions)
  ▪ Highly variable connectivity
  ▪ Failures common
  ▪ Organic growth
● Abandoned distributed strict semantics
  ▪ Adaptive apps rather than “guaranteed” infrastructure
● Adopted pairwise client-server approach
  ▪ Server is centralized (even if server farm)
  ▪ Relatively primitive approach (no sophisticated dist. algms.)
  ▪ Little support from infrastructure or middleware


A Database viewpoint on DHTs…: Towards Data-centricity, Data Independence


Host-centric Protocols

● Protocols defined in terms of IP addresses:
  ▪ Unicast: IP address = host
  ▪ Multicast: IP address = set of hosts
● Destination address is given to protocol
● Protocol delivers data from one host to another
  ▪ unicast: conceptually trivial
  ▪ multicast: address is logical, not physical


Host-centric Applications

● Classic applications: destination is “intrinsic”
  ▪ telnet: target machine
  ▪ FTP: location of files
  ▪ electronic mail: email address turns into mail server
  ▪ multimedia conferencing: machines of participants
● Destination is specified by user (not network)
● Usually specified by hostname not address
  ▪ DNS translates names into addresses


Domain Name System (DNS)

● DNS is built around recursive delegation
  ▪ Top level domains (TLDs): .com, .net, .edu, etc.
  ▪ TLDs delegate authority to subdomains
    ▪ berkeley.edu
  ▪ Subdomains can further delegate
    ▪ cs.berkeley.edu
● Hierarchy fits host administrative structure
  ▪ Local decentralized control
  ▪ Crucial to efficient hostname resolution
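Recursive delegation can be pictured as walking a tree from the top level domain down, one label at a time. The sketch below is a toy model, not a DNS implementation: the nested dictionary stands in for separate authoritative servers, and the host names under the delegated domains and their addresses (taken from the 192.0.2.0/24 documentation range) are invented.

```python
# Each zone maps a label either to a delegated sub-zone (dict) or to an
# address (str). ASSUMPTION: hosts and addresses are made up for the example.
ROOT = {
    "edu": {
        "berkeley": {
            "www": "192.0.2.20",
            "cs": {"www": "192.0.2.10"},
        },
    },
}


def resolve(name, root=ROOT):
    """Follow delegations right to left; return (address or None, labels matched)."""
    zone, trail = root, []
    for label in reversed(name.split(".")):
        if not isinstance(zone, dict) or label not in zone:
            return None, trail
        zone = zone[label]      # hand the question to the delegated zone
        trail.append(label)
    return (zone if isinstance(zone, str) else None), trail


print(resolve("www.cs.berkeley.edu"))
print(resolve("www.mit.edu"))
```

The lookup is efficient because the name itself says whom to ask next; flat data names carry no such hint.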


Modern Web Data-Centricity

● URLs often function as names of data
  ▪ users think of www.cnn.com as data, not a host
  ▪ Fact that www.cnn.com is a hostname is irrelevant
● Users want data, not access to particular host
● The web is now data-centric


Data-centric App in Host-centric World

● Data still associated with host names (URLs)
  ▪ administrative structure of data same as hosts
  ▪ weak point in current web
● Key enabler: search engines
  ▪ Searchable databases map keywords to URLs
  ▪ Allowed users to find desired data
● Networkers focused on technical problems:
  ▪ HTTP, persistence (URNs), replication (CDNs), ...


A DNS for Data? DHTs…

● Can we map data names into addresses?
  ▪ a data-centric DNS, distributed and scalable
  ▪ doesn’t alter net protocols, but aids data location
  ▪ not just about stolen music, but a general facility
● A formidable challenge:
  ▪ Data does not have a clear administrative hierarchy
  ▪ Likely need to support a flat namespace
  ▪ Can one do this scalably?
● Data-centrism requires scalable flat lookups => DHTs


Data Independence In DB Design

● Decouple app-level API from data organization
  ▪ Can make changes to data layout without modifying applications
  ▪ Simple version: location-independent names
  ▪ Fancier: declarative queries

“As clear a paradigm shift as we can hope to find in computer science” - C. Papadimitriou


The Pillars of Data Independence

● Indexes
  ▪ Value-based lookups have to compete with direct access
  ▪ Must adapt to shifting data distributions
  ▪ Must guarantee performance
  ▪ DBMS: B-Tree
● Query Optimization
  ▪ Support declarative queries beyond lookup/search
  ▪ Must adapt to shifting data distributions
  ▪ Must adapt to changes in environment
  ▪ DBMS: Join Ordering, AM Selection, etc.


Generalizing Data Independence

● A classic “level of indirection” scheme
  ▪ Indexes are exactly that
  ▪ Complex queries are a richer indirection
● The key for data independence:
  ▪ It’s all about rates of change
● Hellerstein’s Data Independence Inequality:
  ▪ Data independence matters when d(environment)/dt >> d(app)/dt


Data Independence in Networks

d(environment)/dt >> d(app)/dt

● In databases, the RHS is unusually small
  ▪ This drove the relational database revolution
● In extreme networked systems, LHS is unusually high
  ▪ And the applications increasingly complex and data-driven
  ▪ Simple indirections (e.g. local lookaside tables) insufficient


Hierarchical Networks (& Queries)

● IP
  ▪ Hierarchical name space (www.vldb.org, 141.12.12.51)
  ▪ Hierarchical routing
    ▪ Autonomous Systems correlate with name space (though not perfectly)
● DNS
  ▪ Hierarchical name space (“clients” + hierarchy of servers)
  ▪ Hierarchical routing w/aggressive caching
    ▪ 13 managed “root servers”
● Traditional pros/cons of Hierarchical data mgmt
  ▪ Works well for things aligned with the hierarchy
    ▪ Esp. physical locality a la Astrolabe
  ▪ Inflexible
    ▪ No data independence!


The Pillars of Data Independence

● Indexes
  ▪ Value-based lookups have to compete with direct access
  ▪ Must adapt to shifting data distributions
  ▪ Must guarantee performance
  ▪ DBMS: B-Tree
  ▪ P2P: Content-Addressable Overlay Networks (DHTs)
● Query Optimization
  ▪ Support declarative queries beyond lookup/search
  ▪ Must adapt to shifting data distributions
  ▪ Must adapt to changes in environment
  ▪ DBMS: Join Ordering, AM Selection, etc.
  ▪ P2P: Multiquery dataflow sharing?


Sensor Networks: The Internet Meets the Environment


Today: Internet meets Mobile Wireless Computing

● Computing: smaller, faster
● Disks: larger size, small form
● Communications: wireless voice, data
● Multimedia integration: voice, data, video, games

[Figures: Samsung Cameraphone w/ camcorder; iPod: impact of disk size/cost; Blackberry: phone + PDA; SONY PSP: mobile gaming]


Tomorrow: Embedded Networked Sensing Apps

● Micro-sensors, on-board processing, wireless interfaces feasible at very small scale; can monitor phenomena “up close”
● Enables spatially and temporally dense environmental monitoring
● Embedded Networked Sensing will reveal previously unobservable phenomena

[Figures: Seismic Structure response; Contaminant Transport; Marine Microorganisms; Ecosystems, Biocomplexity]


Embedded Networked Sensing: Motivation

Imagine:
● high-rise buildings self-detect structural faults (e.g., weld cracks)
● schools detect airborne toxins at low concentrations, trace contaminant transport to source
● buoys alert swimmers to dangerous bacterial levels
● earthquake-rubbled building infiltrated with robots and sensors: locate survivors, evaluate structural damage
● ecosystems infused with chemical, physical, acoustic, image sensors to track global change parameters
● battlefield sprinkled with sensors that identify and track friendly/foe air, ground vehicles, personnel


Embedded Sensor Nets: Enabling Technologies

● Embedded: embed numerous distributed devices to monitor and interact with physical world
● Networked: network devices to coordinate and perform higher-level tasks
● Sensing: exploit spatially/temporally dense, in situ/remote, sensing/actuation

[Figure labels: Control system w/ small form factor, untethered nodes; Exploit collaborative sensing, action; Tightly coupled to physical world]


Sensornets

● Vision:
  ▪ Many sensing devices with radio and processor
  ▪ Enable fine-grained measurements over large areas
  ▪ Huge potential impact on science, and society
● Technical challenges:
  ▪ untethered: power consumption must be limited
  ▪ unattended: robust and self-configuring
  ▪ wireless: ad hoc networking


Similarity w/ P2P Networks

● Sensornets are inherently data-centric
  ▪ Users know what data they want, not where it is
  ▪ Estrin, Govindan, Heidemann (2000, etc.)
● Centralized database infeasible
  ▪ vast amount of data, constantly being updated
  ▪ small fraction of data will ever be queried
  ▪ sending to single site expends too much energy


Sensor Nets: New Design Themes

● Self configuring systems that adapt to unpredictable environment
  ▪ dynamic, messy (hard to model), environments preclude pre-configured behavior
● Leverage data processing inside the network
  ▪ exploit computation near data to reduce communication
  ▪ collaborative signal processing
  ▪ achieve desired global behavior with localized algorithms (distributed control)
● Long-lived, unattended, untethered, low duty cycle systems
  ▪ energy a central concern
  ▪ communication primary consumer of scarce energy resource
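Why processing inside the network reduces communication can be shown with a toy routing tree: if every node merges its children's partial results before forwarding, each link carries one message instead of one per reading. The tree, the readings and the function names are invented for illustration, and a message count stands in for energy cost.

```python
# Hypothetical routing tree rooted at the sink, and one reading per sensor.
TREE = {"sink": ["a", "b"], "a": ["c", "d"], "b": ["e"], "c": [], "d": [], "e": []}
READINGS = {"a": 20.0, "b": 22.0, "c": 19.0, "d": 21.0, "e": 23.0}


def aggregate(node):
    """In-network averaging: return (sum, count, messages sent) for a subtree."""
    total = READINGS.get(node, 0.0)
    count = 1 if node in READINGS else 0
    messages = 0
    for child in TREE[node]:
        child_sum, child_count, child_messages = aggregate(child)
        total += child_sum
        count += child_count
        messages += child_messages + 1   # one partial (sum, count) per link
    return total, count, messages


def raw_forwarding_messages(node="sink", depth=0):
    """Messages needed if every raw reading is relayed hop by hop to the sink."""
    own = depth if node in READINGS else 0
    return own + sum(raw_forwarding_messages(c, depth + 1) for c in TREE[node])


total, count, messages = aggregate("sink")
print(total / count, messages, raw_forwarding_messages())
```

Both approaches give the same average at the sink; in this small tree aggregation uses 5 messages against 8 for raw forwarding, and the gap widens as the tree gets deeper.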


From Embedded Sensing to Embedded Control

● embedded in unattended “control systems”
  ▪ control network, and act in environment
● critical apps extend beyond sensing to control and actuation
  ▪ transportation, precision agriculture, medical monitoring and drug delivery, battlefield apps
● concerns extend beyond traditional networked systems and apps: usability, reliability, safety
● need systems architecture to manage interactions
  ▪ current system development: one-off, incrementally tuned, stove-piped
  ▪ repercussions for piecemeal uncoordinated design: insufficient longevity, interoperability, safety, robustness, scaling

Page 78: Shivkumar Kalyanaraman Rensselaer Polytechnic Institute 1 Peer-to-Peer (P2P) and Sensor Networks Shivkumar…


Why can't we simply adapt Internet protocols, "end to end" architecture?

Internet routes data using IP addresses in packets and lookup tables in routers
  humans get data by "naming data" to a search engine
  many levels of indirection between name and IP address
  embedded, energy-constrained (un-tethered, small-form-factor), unattended systems can't tolerate communication overhead of indirection

special purpose system function(s): don't need or want Internet general purpose functionality designed for elastic applications

Page 79: Shivkumar Kalyanaraman Rensselaer Polytechnic Institute 1 Peer-to-Peer (P2P) and Sensor Networks Shivkumar…


Sample Layered Architecture

Resource constraints call for more tightly integrated layers

Open Question: What are defining Architectural Principles?

Layers shown in the figure:
  User Queries, External Database
  In-network: Application processing, Data aggregation, Query processing
  Data dissemination, storage, caching
  Adaptive topology, Geo-Routing
  MAC, Time, Location
  Phy: comm, sensing, actuation, SP

Page 80: Shivkumar Kalyanaraman Rensselaer Polytechnic Institute 1 Peer-to-Peer (P2P) and Sensor Networks Shivkumar…


Coverage measures

Given: sensor field (either known sensor locations, or spatial density)

area coverage: fraction of area covered by sensors
detectability: probability sensors detect moving objects
node coverage: fraction of sensors covered by other sensors
control:
  where to add new nodes for max coverage
  how to move existing nodes for max coverage

[Figure: sensor field with labels S, D and x]
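The coverage measures above can be sketched numerically. The code below assumes a simple disc sensing model (each sensor covers everything within a fixed `radius`), which the slides do not specify. The function names, the grid-sampling estimate and the greedy placement helper are illustrative assumptions, not a method taken from the deck.

```python
import math


def area_coverage(sensors, radius, width, height, step=0.5):
    """Estimate the fraction of a width x height field that lies within
    `radius` of at least one sensor, by testing points on a regular grid."""
    covered = 0
    total = 0
    for i in range(int(width / step)):
        for j in range(int(height / step)):
            # Sample the centre of each grid cell.
            px = (i + 0.5) * step
            py = (j + 0.5) * step
            total += 1
            if any(math.hypot(px - sx, py - sy) <= radius for sx, sy in sensors):
                covered += 1
    return covered / total if total else 0.0


def node_coverage(sensors, radius):
    """Fraction of sensors that lie within `radius` of some other sensor."""
    if not sensors:
        return 0.0
    covered = 0
    for idx, (x, y) in enumerate(sensors):
        others = (s for k, s in enumerate(sensors) if k != idx)
        if any(math.hypot(x - ox, y - oy) <= radius for ox, oy in others):
            covered += 1
    return covered / len(sensors)


def best_new_location(sensors, radius, width, height, candidates):
    """Greedy control step: pick the candidate position whose addition
    gives the highest estimated area coverage."""
    return max(
        candidates,
        key=lambda c: area_coverage(list(sensors) + [c], radius, width, height),
    )


# Hypothetical 10 x 10 field with one sensor on the left side.
field = [(2.5, 5.0)]
print(area_coverage(field, 2.5, 10, 10))
print(best_new_location(field, 2.5, 10, 10, [(2.5, 5.0), (7.5, 5.0)]))
```

The greedy helper addresses only the "where to add new nodes" question, and only one node at a time. Moving existing nodes, or estimating detectability for moving objects, would need a different model.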

Page 81: Shivkumar Kalyanaraman Rensselaer Polytechnic Institute 1 Peer-to-Peer (P2P) and Sensor Networks Shivkumar…


In Network Processing

communication expensive when limited
  power
  bandwidth

perform (data) processing in network
  close to (at) data
  forward fused/synthesized results
  e.g., find max. of data

distributed data, distributed computation
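The "find max" example can be sketched as follows. The code compares relaying every raw reading to the sink against forwarding only a per-subtree maximum, counting one transmission per link crossed. The routing tree, node names and readings are hypothetical, and the transmission count is a simplification that ignores packet sizes, losses and retransmissions.

```python
def collect_raw(tree, readings, node, depth=0):
    """Baseline: every reading is relayed hop by hop to the root.
    Returns (all readings in the subtree, number of transmissions used)."""
    values = [readings[node]]
    sent = depth  # this node's own reading crosses `depth` links
    for child in tree.get(node, []):
        child_values, child_sent = collect_raw(tree, readings, child, depth + 1)
        values.extend(child_values)
        sent += child_sent
    return values, sent


def aggregate_max(tree, readings, node):
    """In-network: each node forwards only the max of its subtree.
    Returns (subtree max, number of transmissions used)."""
    best = readings[node]
    sent = 0
    for child in tree.get(node, []):
        child_best, child_sent = aggregate_max(tree, readings, child)
        best = max(best, child_best)
        sent += child_sent + 1  # one message from the child to this node
    return best, sent


# Hypothetical routing tree rooted at the sink "A" (parent -> children).
tree = {"A": ["B", "C"], "B": ["D", "E"], "C": ["F"], "F": ["G"]}
readings = {"A": 17, "B": 21, "C": 19, "D": 30, "E": 12, "F": 25, "G": 28}

raw_values, raw_msgs = collect_raw(tree, readings, "A")
best, agg_msgs = aggregate_max(tree, readings, "A")
print(max(raw_values), raw_msgs)  # 30 11
print(best, agg_msgs)             # 30 6
```

Both approaches return the same maximum, but on this tree the baseline needs 11 link transmissions while the aggregated version needs 6, one per tree edge. The gap widens as the tree gets deeper.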

Page 82: Shivkumar Kalyanaraman Rensselaer Polytechnic Institute 1 Peer-to-Peer (P2P) and Sensor Networks Shivkumar…


Distributed Representation and Storage

Data Centric Protocols, In-network Processing
  goal: interpretation of spatially distributed data (per-node processing alone is not enough)
  network does in-network processing based on distribution of data
  queries automatically directed towards nodes that maintain relevant/matching data

pattern-triggered data collection
  Multi-resolution data storage and retrieval
  Distributed edge/feature detection
  Index data for easy temporal and spatial searching
  Finding global statistics (e.g., distribution)

[Figure: key/value (K V) pairs stored across nodes, with a Time axis]

Page 83: Shivkumar Kalyanaraman Rensselaer Polytechnic Institute 1 Peer-to-Peer (P2P) and Sensor Networks Shivkumar…


Directed Diffusion: Data Centric Routing

Basic idea
  name data (not nodes) with externally relevant attributes: data type, time, location of node, SNR
  diffuse requests and responses across network using application driven routing (e.g., geo sensitive or not)
  support in-network aggregation and processing

data sources publish data, data clients subscribe to data
  however, all nodes may play both roles
    node that aggregates/combines/processes incoming sensor node data becomes a source of new data
    node that only publishes when combination of conditions arise, is client for triggering event data
  true peer to peer system?
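A heavily simplified sketch of the idea: data is named by attributes, a sink floods an interest, each node remembers the neighbor it first heard the interest from (a gradient), and matching data follows the gradients back. This is an illustration only. It omits reinforcement, multiple gradients per node, interest refresh and in-network aggregation, and the topology, attribute names and function names are all hypothetical.

```python
from collections import deque


def matches(interest, data):
    """True if the data's attributes satisfy every constraint in the interest.
    A constraint is either an exact value or a (low, high) range."""
    for attr, wanted in interest.items():
        if attr not in data:
            return False
        value = data[attr]
        if isinstance(wanted, tuple):
            low, high = wanted
            if not low <= value <= high:
                return False
        elif value != wanted:
            return False
    return True


def diffuse_interest(neighbors, sink):
    """Flood an interest from the sink. Each node records a gradient:
    the neighbor it first heard the interest from."""
    gradient = {sink: None}
    queue = deque([sink])
    while queue:
        node = queue.popleft()
        for nxt in neighbors[node]:
            if nxt not in gradient:
                gradient[nxt] = node
                queue.append(nxt)
    return gradient


def deliver(gradient, source, interest, data):
    """Follow gradients from the source toward the sink if the data matches.
    Returns the path taken, or None when the data is not of interest."""
    if not matches(interest, data):
        return None
    path = [source]
    while gradient[path[-1]] is not None:
        path.append(gradient[path[-1]])
    return path


# Hypothetical 5-node topology and attribute names.
neighbors = {
    "sink": ["n1", "n2"],
    "n1": ["sink", "n3"],
    "n2": ["sink", "n3"],
    "n3": ["n1", "n2", "n4"],
    "n4": ["n3"],
}
interest = {"type": "temperature", "x": (0, 50), "y": (0, 50)}
gradient = diffuse_interest(neighbors, "sink")

reading = {"type": "temperature", "x": 40, "y": 10, "value": 31.5}
print(deliver(gradient, "n4", interest, reading))
```

Note that neither the interest nor the reading mentions a node address: the sink asks for a kind of data in a region, and whichever node holds matching data answers.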

Page 84: Shivkumar Kalyanaraman Rensselaer Polytechnic Institute 1 Peer-to-Peer (P2P) and Sensor Networks Shivkumar…


Traditional Approach: Warehousing

data extracted from sensors, stored on server
Query processing takes place on server

[Figure: Sensor Nodes, Front-end, Warehouse]

Page 85: Shivkumar Kalyanaraman Rensselaer Polytechnic Institute 1 Peer-to-Peer (P2P) and Sensor Networks Shivkumar…


Sensor Database System

Sensor Database System supports distributed query processing over sensor network

[Figure: Front-end and Sensor Nodes, each sensor node labeled SensorDB]
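One way to picture distributed query processing, in contrast to warehousing, is to push a query fragment to each node and merge partial results at the front-end. The sketch below does this for an average with a filter. The function names, node names and readings are hypothetical, and real sensor database systems handle much more (routing, failures, streaming windows).

```python
def run_fragment(readings, predicate):
    """Query fragment executed on one sensor node: filter local readings
    and return a partial aggregate (sum, count) instead of raw tuples."""
    selected = [r for r in readings if predicate(r)]
    return sum(selected), len(selected)


def merge(partials):
    """Front-end step: combine partial aggregates into a global average.
    Returns None when no reading matched on any node."""
    total = sum(s for s, _ in partials)
    count = sum(c for _, c in partials)
    return total / count if count else None


# Hypothetical per-node temperature readings.
node_data = {
    "node1": [20, 22, 35],
    "node2": [31, 33],
    "node3": [18, 19],
}

# Roughly: SELECT AVG(temp) FROM sensors WHERE temp > 30
partials = [run_fragment(r, lambda t: t > 30) for r in node_data.values()]
print(merge(partials))  # 33.0
```

Each node ships one (sum, count) pair regardless of how many readings it holds. Sending sums and counts rather than local averages is what keeps the merged result exact.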

Page 86: Shivkumar Kalyanaraman Rensselaer Polytechnic Institute 1 Peer-to-Peer (P2P) and Sensor Networks Shivkumar…


Sensor Database System

Characteristics of a Sensor Network:
  Streams of data
  Uncertain data
  Large number of nodes
  Multi-hop network
  No global knowledge about the network
  Node failure and interference are common
  Energy is the scarce resource
  Limited memory
  No administration, …

Can existing database techniques be reused? What are the new problems and solutions?
  Representing sensor data
  Representing sensor queries
  Processing query fragments on sensor nodes
  Distributing query fragments
  Adapting to changing network conditions
  Dealing with site and communication failures
  Deploying and managing a sensor database system

Page 87: Shivkumar Kalyanaraman Rensselaer Polytechnic Institute 1 Peer-to-Peer (P2P) and Sensor Networks Shivkumar…


Summary

P2P networks: Napster, Gnutella, Kazaa
Distributed Hash Tables (DHTs)
Database perspectives: data-centricity, data-independence
Sensor networks and their connection to P2P