CS6601 DISTRIBUTED SYSTEMS


UNIT – III

Dr.A.Kathirvel, Professor, Computer Science and Engg.

M N M Jain Engineering College, Chennai

Unit III: PEER TO PEER SERVICES AND FILE SYSTEM

Peer-to-peer Systems – Introduction – Napster and its legacy – Peer-to-peer Middleware – Routing overlays. Overlay case studies: Pastry, Tapestry – Distributed File Systems – Introduction – File service architecture – Andrew File System. File System: Features – File model – File accessing models – File sharing semantics. Naming: Identifiers, Addresses, Name Resolution – Name Space Implementation – Name Caches – LDAP.

Text: Coulouris, Dollimore, Kindberg and Blair, Distributed Systems: Concepts and Design, 5th edition, Addison-Wesley, 2012.

Peer-to-Peer Systems

Fig 10.1: Distinctions between IP and overlay routing for peer-to-peer applications

Fig 10.2: Napster: peer-to-peer file sharing with a centralized, replicated index

[Diagram: a requesting peer contacts a Napster server holding the index. Steps: 1. file location request; 2. list of peers offering the file; 3. file request to a peer; 4. file delivered; 5. index update.]

Fig 10.3: Distribution of information in a routing overlay

[Diagram: objects and nodes placed in the identifier space; each of the nodes A, B, C and D holds routing knowledge about a portion of the space.]

Fig 10.4: Basic programming interface for a distributed hash table (DHT) as implemented by the PAST API over Pastry

put(GUID, data)
  The data is stored in replicas at all nodes responsible for the object identified by GUID.

remove(GUID)
  Deletes all references to GUID and the associated data.

value = get(GUID)
  The data associated with GUID is retrieved from one of the nodes responsible for it.
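The put/get/remove interface above can be sketched as a toy, single-process model (the ToyDHT class and the SHA-1 GUID derivation are illustrative assumptions, not the PAST implementation):

```python
import hashlib

# Toy single-process model of the PAST-style DHT interface.
# In a real DHT the key space is partitioned across many nodes; here a
# single dict stands in for "the nodes responsible for each GUID".
class ToyDHT:
    def __init__(self):
        self.store = {}

    def put(self, guid, data):
        # In PAST, stored in replicas at all responsible nodes.
        self.store[guid] = data

    def remove(self, guid):
        # Deletes all references to GUID and the associated data.
        self.store.pop(guid, None)

    def get(self, guid):
        # Retrieved from one of the responsible nodes.
        return self.store.get(guid)

def guid_for(data: bytes) -> str:
    # GUIDs are commonly derived as secure hashes of object content.
    return hashlib.sha1(data).hexdigest()

dht = ToyDHT()
g = guid_for(b"hello")
dht.put(g, b"hello")
assert dht.get(g) == b"hello"
```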

Fig 10.5: Basic programming interface for distributed object location and routing (DOLR) as implemented by Tapestry

publish(GUID)
  GUID can be computed from the object (or some part of it, e.g. its name). This function makes the node performing the publish operation the host for the object corresponding to GUID.

unpublish(GUID)
  Makes the object corresponding to GUID inaccessible.

sendToObj(msg, GUID, [n])
  Following the object-oriented paradigm, an invocation message is sent to an object in order to access it. This might be a request to open a TCP connection for data transfer or to return a message containing all or part of the object's state. The final optional parameter [n], if present, requests the delivery of the same message to n replicas of the object.

Fig 10.6: Circular routing alone is correct but inefficient

The dots depict live nodes. The space is considered as circular: node 0 is adjacent to node (2^128 – 1). The diagram illustrates the routing of a message from node 65A1FC to D46A1C using leaf set information alone, assuming leaf sets of size 8 (l = 4). This is a degenerate type of routing that would scale very poorly; it is not used in practice.

[Diagram: GUID ring from 0 to FFFFF...F (2^128 – 1), with nodes 65A1FC, D13DA3, D467C4, D471F1 and D46A1C marked.]

Fig 10.7: First four rows of a Pastry routing table

The routing table is located at a node whose GUID begins 65A1. Digits are in hexadecimal. The n's represent [GUID, IP address] pairs specifying the next hop to be taken by messages addressed to GUIDs that match each given prefix. Grey-shaded entries indicate that the prefix matches the current GUID up to the given value of p: the next row down or the leaf set should be examined to find a route. Although there are a maximum of 128 rows in the table, only log16 N rows will be populated on average in a network with N active nodes.

Fig 10.8: Pastry routing example

[Diagram: GUID ring from 0 to FFFFF...F (2^128 – 1), with nodes 65A1FC, D13DA3, D4213F, D462BA, D467C4, D471F1 and D46A1C marked along the route.]

Routing a message from node 65A1FC to D46A1C: with the aid of a well-populated routing table the message can be delivered in ~ log16(N) hops.

Fig 10.9: Pastry's routing algorithm

To handle a message M addressed to a node D (where R[p,i] is the element at column i, row p of the routing table):

1. If (L-l < D < Ll) {  // the destination is within the leaf set or is the current node
2.   Forward M to the element Li of the leaf set with GUID closest to D, or the current node A.
3. } else {  // use the routing table to despatch M to a node with a closer GUID
4.   Find p, the length of the longest common prefix of D and A, and i, the (p+1)th hexadecimal digit of D.
5.   If (R[p,i] ≠ null) forward M to R[p,i]  // route M to a node with a longer common prefix
6.   else {  // there is no entry in the routing table
7.     Forward M to any node in L or R with a common prefix of length p, but a GUID that is numerically closer.
     }
   }
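The prefix-matching rule in steps 4–7 can be sketched in a simplified, hypothetical model (hex-string GUIDs, with one set standing in for both the routing table and the leaf set):

```python
# Sketch of one Pastry-style routing step (simplified model, not the
# real Pastry data structures): forward to a known node sharing a
# longer prefix with the destination, else to a numerically closer one.

def common_prefix_len(a, b):
    n = 0
    for x, y in zip(a, b):
        if x != y:
            break
        n += 1
    return n

def next_hop(current, dest, known):
    # 'known' models the union of routing table and leaf set entries.
    p = common_prefix_len(current, dest)
    # Step 5: prefer a node whose prefix match with dest exceeds ours.
    better = [n for n in known if common_prefix_len(n, dest) > p]
    if better:
        return max(better, key=lambda n: common_prefix_len(n, dest))
    # Step 7: fall back to any known node numerically closer to dest.
    d = int(dest, 16)
    closer = [n for n in known
              if abs(int(n, 16) - d) < abs(int(current, 16) - d)]
    return min(closer, key=lambda n: abs(int(n, 16) - d)) if closer else current

hop = next_hop("65A1FC", "D46A1C", {"D13DA3", "D4213F"})
assert hop == "D4213F"  # D4213F shares the longest prefix "D4" with D46A1C
```

Each such hop lengthens the matched prefix by at least one hex digit, which is why delivery takes about log16(N) hops.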

Fig 10.10: Tapestry routing

[Diagram: nodes 4228, 4377, 437A, 4361, 43FE, 4664, 4B4F, E791, 4A6D and AA93, showing Tapestry routings for 4377, publish paths, location mappings for 4378, and the routes actually taken by send(4378). Node 4377 is the root for 4378.]

Replicas of the file Phil's Books (G=4378) are hosted at nodes 4228 and AA93. Node 4377 is the root node for object 4378. The Tapestry routings shown are some of the entries in routing tables. The publish paths show routes followed by the publish messages laying down cached location mappings for object 4378. The location mappings are subsequently used to route messages sent to 4378.

Fig 10.11: Structured vs unstructured peer-to-peer systems

Fig 10.12: Key elements in the Gnutella protocol

Fig 10.13: Storage organization of OceanStore objects

[Diagram: a certificate carrying the AGUID points to the VGUID of the current version; each version's root block and indirection blocks point to data blocks d1–d5. Version i+1 shares unchanged blocks with version i (copy on write), and the root block also holds the VGUID of version i-1. All unlabelled arrows are BGUIDs.]

Version i+1 has been updated in blocks d1, d2 and d3. The certificate and the root blocks include some metadata not shown.

Fig 10.14: Types of identifier used in OceanStore

Name    Meaning        Description
BGUID   block GUID     Secure hash of a data block
VGUID   version GUID   BGUID of the root block of a version
AGUID   active GUID    Uniquely identifies all the versions of an object

Fig 10.15: Performance evaluation of the Pond prototype emulating NFS

                 LAN                  WAN
Phase      Linux NFS   Pond    Linux NFS   Pond    Predominant operations in benchmark
1            0.0        1.9      0.9        2.8    Read and write
2            0.3       11.0      9.4       16.8    Read and write
3            1.1        1.8      8.3        1.8    Read
4            0.5        1.5      6.9        1.5    Read
5            2.6       21.0     21.5       32.0    Read and write
Total        4.5       37.2     47.0       54.9

Fig 10.16: Ivy system architecture

[Diagram: at each Ivy node, an application talks through a modified NFS client module in the kernel to a local Ivy server, which stores and retrieves blocks via a network of DHash servers.]

Distributed File Systems

Fig 12.1: Storage systems and their properties

Storage system                Consistency   Example
Main memory                   1             RAM
File system                   1             UNIX file system
Distributed file system       ≈             Sun NFS
Web                           ≈             Web server
Distributed shared memory     ≈             Ivy (DSM, Ch. 18)
Remote objects (RMI/ORB)      1             CORBA
Persistent object store       1             CORBA Persistent Object Service
Peer-to-peer storage system   2             OceanStore (Ch. 10)

(The full figure also classifies each system by sharing, persistence and distributed cache/replicas.)

Types of consistency: 1: strict one-copy. ≈: slightly weaker guarantees. 2: considerably weaker guarantees.

Fig 12.2: File system modules

Directory module:       relates file names to file IDs
File module:            relates file IDs to particular files
Access control module:  checks permission for operation requested
File access module:     reads or writes file data or attributes
Block module:           accesses and allocates disk blocks
Device module:          disk I/O and buffering

Fig 12.3: File attribute record structure

File length

Creation timestamp

Read timestamp

Write timestamp

Attribute timestamp

Reference count

Owner

File type

Access control list


Fig 12.4: UNIX file system operations

filedes = open(name, mode)
filedes = creat(name, mode)
  Opens an existing file with the given name. / Creates a new file with the given name. Both operations deliver a file descriptor referencing the open file. The mode is read, write or both.

status = close(filedes)
  Closes the open file filedes.

count = read(filedes, buffer, n)
count = write(filedes, buffer, n)
  Transfers n bytes from the file referenced by filedes to buffer. / Transfers n bytes to the file referenced by filedes from buffer. Both operations deliver the number of bytes actually transferred and advance the read-write pointer.

pos = lseek(filedes, offset, whence)
  Moves the read-write pointer to offset (relative or absolute, depending on whence).

status = unlink(name)
  Removes the file name from the directory structure. If the file has no other names, it is deleted.

status = link(name1, name2)
  Adds a new name (name2) for a file (name1).

status = stat(name, buffer)
  Gets the file attributes for file name into buffer.
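These calls can be exercised directly through Python's os module, which wraps the same UNIX system calls (the file names used here are temporary and purely illustrative):

```python
import os
import tempfile

# Walk through the Fig 12.4 operations via the os module's wrappers
# for open/creat, read, write, lseek, link, stat, unlink and close.
path = os.path.join(tempfile.mkdtemp(), "demo.txt")

fd = os.open(path, os.O_CREAT | os.O_RDWR, 0o644)  # open/creat
os.write(fd, b"hello world")                       # write n bytes
os.lseek(fd, 0, os.SEEK_SET)                       # move the r/w pointer
data = os.read(fd, 5)                              # read up to n bytes
os.close(fd)                                       # close

os.link(path, path + ".alias")  # add a second name for the same file
size = os.stat(path).st_size    # fetch file attributes
os.unlink(path)                 # remove one name; the file survives
                                # because the alias still refers to it
assert data == b"hello"
assert size == 11
```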

Fig 12.5: File service architecture

[Diagram: application programs on a client computer call a client module, which communicates with the flat file service and the directory service on a server computer.]

Fig 12.6: Flat file service operations

Read(FileId, i, n) -> Data — throws BadPosition
  If 1 ≤ i ≤ Length(File): reads a sequence of up to n items from a file starting at item i and returns it in Data.

Write(FileId, i, Data) — throws BadPosition
  If 1 ≤ i ≤ Length(File)+1: writes a sequence of Data to a file, starting at item i, extending the file if necessary.

Create() -> FileId
  Creates a new file of length 0 and delivers a UFID for it.

Delete(FileId)
  Removes the file from the file store.

GetAttributes(FileId) -> Attr
  Returns the file attributes for the file.

SetAttributes(FileId, Attr)
  Sets the file attributes (only those attributes that are not shaded in Figure 12.3).
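A minimal in-memory sketch of the flat file service above, assuming integer UFIDs and byte items with 1-based positions as in the figure (a toy model, not a real file server):

```python
import itertools

# Toy model of the flat file service interface of Fig 12.6.
# Files are sequences of items (bytes here); UFIDs are integers
# handed out by create(); positions i are 1-based.
class FlatFileService:
    _ids = itertools.count(1)

    def __init__(self):
        self.files = {}  # UFID -> bytearray

    def create(self):
        ufid = next(self._ids)
        self.files[ufid] = bytearray()  # new file of length 0
        return ufid

    def read(self, ufid, i, n):
        f = self.files[ufid]
        if not 1 <= i <= len(f):
            raise IndexError("BadPosition")
        return bytes(f[i - 1:i - 1 + n])  # up to n items from item i

    def write(self, ufid, i, data):
        f = self.files[ufid]
        if not 1 <= i <= len(f) + 1:
            raise IndexError("BadPosition")
        f[i - 1:i - 1 + len(data)] = data  # overwrite, extending if needed

    def delete(self, ufid):
        del self.files[ufid]

svc = FlatFileService()
ufid = svc.create()
svc.write(ufid, 1, b"hello")
assert svc.read(ufid, 1, 5) == b"hello"
```

Note that, as in the figure, Read and Write are idempotent given explicit positions, which is what lets a stateless server repeat them safely.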

Fig 12.7: Directory service operations

Lookup(Dir, Name) -> FileId — throws NotFound
  Locates the text name in the directory and returns the relevant UFID. If Name is not in the directory, throws an exception.

AddName(Dir, Name, FileId) — throws NameDuplicate
  If Name is not in the directory, adds (Name, File) to the directory and updates the file's attribute record. If Name is already in the directory: throws an exception.

UnName(Dir, Name) — throws NotFound
  If Name is in the directory: the entry containing Name is removed from the directory. If Name is not in the directory: throws an exception.

GetNames(Dir, Pattern) -> NameSeq
  Returns all the text names in the directory that match the regular expression Pattern.

Fig 12.8: NFS architecture

[Diagram: on both client and server computers, application system calls enter the UNIX kernel and pass through a virtual file system layer. On the client, local accesses go to the UNIX file system (or other file systems) and remote accesses go to the NFS client, which talks to the NFS server over the NFS protocol; the NFS server invokes the server's UNIX file system.]

Fig 12.9: NFS server operations (1)

lookup(dirfh, name) -> fh, attr
  Returns file handle and attributes for the file name in the directory dirfh.

create(dirfh, name, attr) -> newfh, attr
  Creates a new file name in directory dirfh with attributes attr and returns the new file handle and attributes.

remove(dirfh, name) -> status
  Removes file name from directory dirfh.

getattr(fh) -> attr
  Returns file attributes of file fh. (Similar to the UNIX stat system call.)

setattr(fh, attr) -> attr
  Sets the attributes (mode, user id, group id, size, access time and modify time of a file). Setting the size to 0 truncates the file.

read(fh, offset, count) -> attr, data
  Returns up to count bytes of data from a file starting at offset. Also returns the latest attributes of the file.

write(fh, offset, count, data) -> attr
  Writes count bytes of data to a file starting at offset. Returns the attributes of the file after the write has taken place.

rename(dirfh, name, todirfh, toname) -> status
  Changes the name of file name in directory dirfh to toname in directory todirfh.

link(newdirfh, newname, dirfh, name) -> status
  Creates an entry newname in the directory newdirfh which refers to the file name in the directory dirfh.

Continues on next slide ...

Fig 12.9: NFS server operations (2)

symlink(newdirfh, newname, string) -> status
  Creates an entry newname in the directory newdirfh of type symbolic link with the value string. The server does not interpret the string but makes a symbolic link file to hold it.

readlink(fh) -> string
  Returns the string that is associated with the symbolic link file identified by fh.

mkdir(dirfh, name, attr) -> newfh, attr
  Creates a new directory name with attributes attr and returns the new file handle and attributes.

rmdir(dirfh, name) -> status
  Removes the empty directory name from the parent directory dirfh. Fails if the directory is not empty.

readdir(dirfh, cookie, count) -> entries
  Returns up to count bytes of directory entries from the directory dirfh. Each entry contains a file name, a file handle, and an opaque pointer to the next directory entry, called a cookie. The cookie is used in subsequent readdir calls to start reading from the following entry. If the value of cookie is 0, reads from the first entry in the directory.

statfs(fh) -> fsstats
  Returns file system information (such as block size, number of free blocks and so on) for the file system containing a file fh.

Fig 12.10: Local and remote file systems accessible on an NFS client

[Diagram: the client's root contains /usr and /vmunix; remote mounts attach at /usr/students and /usr/staff. Server 1's root contains /export/people (with big, jon, bob, ...); Server 2's root contains /nfs/users (with jim, jane, joe, ann, ...).]

Note: The file system mounted at /usr/students in the client is actually the sub-tree located at /export/people in Server 1; the file system mounted at /usr/staff in the client is actually the sub-tree located at /nfs/users in Server 2.

Fig 12.11: Distribution of processes in the Andrew File System

[Diagram: workstations each run user programs and a Venus process over the UNIX kernel; servers each run a Vice process over the UNIX kernel; all communicate across the network.]

Fig 12.12: File name space seen by clients of AFS

[Diagram: the local root contains /tmp, /bin, /vmunix and /cmu; the shared file space sits under /cmu, with symbolic links (e.g. /bin) from the local space into it.]

Fig 12.13: System call interception in AFS

[Diagram: a user program issues UNIX file system calls to the kernel on the workstation; non-local file operations are passed to the Venus process, while local ones go to the UNIX file system on the local disk.]

Fig 12.14: Implementation of file system calls in AFS

open(FileName, mode)
  UNIX kernel: if FileName refers to a file in shared file space, pass the request to Venus. Open the local file and return the file descriptor to the application.
  Venus: check list of files in local cache. If not present or there is no valid callback promise, send a request for the file to the Vice server that is custodian of the volume containing the file. Place the copy of the file in the local file system, enter its local name in the local cache list and return the local name to UNIX.
  Vice: transfer a copy of the file and a callback promise to the workstation. Log the callback promise.

read(FileDescriptor, Buffer, length)
  UNIX kernel: perform a normal UNIX read operation on the local copy.

write(FileDescriptor, Buffer, length)
  UNIX kernel: perform a normal UNIX write operation on the local copy.

close(FileDescriptor)
  UNIX kernel: close the local copy and notify Venus that the file has been closed.
  Venus: if the local copy has been changed, send a copy to the Vice server that is the custodian of the file.
  Vice: replace the file contents and send a callback to all other clients holding callback promises on the file.

Fig 12.15: The main components of the Vice service interface

Fetch(fid) -> attr, data
  Returns the attributes (status) and, optionally, the contents of the file identified by fid and records a callback promise on it.

Store(fid, attr, data)
  Updates the attributes and (optionally) the contents of a specified file.

Create() -> fid
  Creates a new file and records a callback promise on it.

Remove(fid)
  Deletes the specified file.

SetLock(fid, mode)
  Sets a lock on the specified file or directory. The mode of the lock may be shared or exclusive. Locks that are not removed expire after 30 minutes.

ReleaseLock(fid)
  Unlocks the specified file or directory.

RemoveCallback(fid)
  Informs the server that a Venus process has flushed a file from its cache.

BreakCallback(fid)
  This call is made by a Vice server to a Venus process. It cancels the callback promise on the relevant file.
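The callback-promise logic of Figs 12.14 and 12.15 can be sketched as a toy Venus-side cache (class and function names here are illustrative, not AFS code):

```python
# Toy model of the Venus cache-validity check: a cached file may be
# used without contacting Vice only while its callback promise holds.
class VenusCache:
    def __init__(self):
        self.cache = {}  # fid -> (data, callback_promise_valid)

    def open(self, fid, fetch_from_vice):
        entry = self.cache.get(fid)
        if entry and entry[1]:
            return entry[0]          # valid promise: use the local copy
        data = fetch_from_vice(fid)  # else fetch; Vice logs a promise
        self.cache[fid] = (data, True)
        return data

    def break_callback(self, fid):
        # Vice sends BreakCallback when another client stores a
        # new version of the file; the promise is cancelled.
        if fid in self.cache:
            data, _ = self.cache[fid]
            self.cache[fid] = (data, False)

fetches = []
def fetch(fid):
    # Stand-in for a Fetch(fid) RPC to the custodian Vice server.
    fetches.append(fid)
    return b"version-%d" % len(fetches)

v = VenusCache()
assert v.open(1, fetch) == b"version-1"
assert v.open(1, fetch) == b"version-1"   # served from cache, no RPC
v.break_callback(1)
assert v.open(1, fetch) == b"version-2"   # promise cancelled: refetch
```

This is what makes AFS scale: in the common case a reopened file costs no server traffic at all, and the server only speaks up when someone else changes the file.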

Name Services

Fig 13.1: Composed naming domains used to access a resource from a URL

[Diagram: the URL http://www.cdk5.net:8888/WebExamples/earth.html is resolved by a DNS lookup to a resource ID (IP number 55.55.55.55, port number 8888, pathname WebExamples/earth.html); the network address 2:60:8c:2:b0:5a and a socket connect the request to the file on the Web server.]

Fig 13.2: Iterative navigation

[Diagram: a client iteratively contacts name servers NS1, NS2 and NS3 (steps 1–3) in order to resolve a name.]

Fig 13.3: Non-recursive and recursive server-controlled navigation

[Diagram: in non-recursive server-controlled navigation, NS1 contacts NS2 and NS3 in turn on behalf of the client; in recursive server-controlled navigation, NS1 asks NS2, which in turn asks NS3, with replies propagating back.]

A name server NS1 communicates with other name servers on behalf of a client.
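The two navigation styles can be sketched with a hypothetical chain of name servers, each of which either answers or refers onward (server names and the answer value are illustrative):

```python
# Sketch of iterative vs recursive name resolution (Figs 13.2/13.3).
# Each server either holds the answer or a referral to the next server.
SERVERS = {
    "NS1": {"referral": "NS2"},
    "NS2": {"referral": "NS3"},
    "NS3": {"answer": "55.55.55.55"},
}

def resolve_iterative(name, start="NS1"):
    # Iterative: the CLIENT follows each referral itself.
    server = start
    while True:
        record = SERVERS[server]
        if "answer" in record:
            return record["answer"]
        server = record["referral"]

def resolve_recursive(name, server="NS1"):
    # Recursive server-controlled: each SERVER resolves on the
    # client's behalf, passing the query onward and the reply back.
    record = SERVERS[server]
    if "answer" in record:
        return record["answer"]
    return resolve_recursive(name, record["referral"])

assert resolve_iterative("www.cdk5.net") == "55.55.55.55"
assert resolve_recursive("www.cdk5.net") == "55.55.55.55"
```

The results are identical; the styles differ in who bears the load and holds the state, which is why heavily loaded root servers prefer the iterative (referral) style.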

Fig 13.4: DNS name servers

[Diagram: a.root-servers.net (root) holds entries for uk, purdue.edu, yahoo.com, ...; ns1.nic.uk (uk) for ac.uk, co.uk, ...; ns0.ja.net (ac.uk) for qmul.ac.uk, ic.ac.uk, ...; alpha.qmul.ac.uk (qmul.ac.uk) for dcs.qmul.ac.uk and *.qmul.ac.uk; dns0.dcs.qmul.ac.uk (dcs.qmul.ac.uk) for *.dcs.qmul.ac.uk; dns0-doc.ic.ac.uk (ic.ac.uk) for *.ic.ac.uk; ns.purdue.edu (purdue.edu) for *.purdue.edu.]

Note: Name server names are in italics, and the corresponding domains are in parentheses. Arrows denote name server entries.

Fig 13.5: DNS resource records

Record type   Meaning                                 Main contents
A             A computer address                      IP number
NS            An authoritative name server            Domain name for server
CNAME         The canonical name for an alias         Domain name for alias
SOA           Marks the start of data for a zone      Parameters governing the zone
WKS           A well-known service description        List of service names and protocols
PTR           Domain name pointer (reverse lookups)   Domain name
HINFO         Host information                        Machine architecture and operating system
MX            Mail exchange                           List of <preference, host> pairs
TXT           Text string                             Arbitrary text

Fig 13.6: DNS zone data records

domain name      time to live  class  type   value
dcs.qmul.ac.uk   1D            IN     NS     dns0
dcs.qmul.ac.uk   1D            IN     NS     dns1
dcs.qmul.ac.uk   1D            IN     NS     cancer.ucs.ed.ac.uk
dcs.qmul.ac.uk   1D            IN     MX     1 mail1.qmul.ac.uk
dcs.qmul.ac.uk   1D            IN     MX     2 mail2.qmul.ac.uk

domain name      time to live  class  type   value
www              1D            IN     CNAME  apricot
apricot          1D            IN     A      138.37.88.248
dcs              1D            IN     NS     dns0.dcs
dns0.dcs         1D            IN     A      138.37.88.249
dcs              1D            IN     NS     dns1.dcs
dns1.dcs         1D            IN     A      138.37.94.248

Fig 13.7: GNS directory tree and value tree for user Peter.Smith

[Diagram: directory DI: 599 (EC) has children UK (DI: 574) and FR (DI: 543); under UK, AC (DI: 437) and then QMW (DI: 322), which holds the entry Peter.Smith; its value tree has password and mailboxes subtrees, the mailboxes being Alpha, Beta and Gamma.]

Fig 13.8: Merging trees under a new root

[Diagram: a new root DI: 633 (WORLD) has children EC (DI: 599, with UK DI: 574 and FR DI: 543) and NORTH AMERICA (DI: 642, with US DI: 457 and CANADA DI: 732).]

Well-known directories: #599 = #633/EC; #642 = #633/NORTH AMERICA

Fig 13.9: Restructuring the directory

[Diagram: the tree of Fig 13.8 after restructuring; the US directory (DI: 457) is now reachable under EC, and the entry for US in NORTH AMERICA refers to it as #633/EC/US.]

Well-known directories: #599 = #633/EC; #642 = #633/NORTH AMERICA

Fig 13.10: X.500 service architecture

[Diagram: Directory User Agents (DUAs) access the directory through Directory Service Agents (DSAs), which are interconnected with one another.]

Fig 13.11: Part of the X.500 Directory Information Tree

[Diagram: under the X.500 Service (root) are countries (... France, Great Britain, Greece ...); under Great Britain, organizations such as BT Plc and University of Gormenghast; under the university, organizationalUnits (Department of Computer Science, Computing Service, Engineering Department, ...); under the computer science department, Departmental Staff, Research Students and the applicationProcess ely; under Departmental Staff, persons such as Alice Flintstone, Pat King, James Healey and Janet Papworth.]

Fig 13.12: An X.500 DIB Entry

info: Alice Flintstone, Departmental Staff, Department of Computer Science, University of Gormenghast, GB
commonName: Alice.L.Flintstone; Alice.Flintstone; Alice Flintstone; A. Flintstone
surname: Flintstone
telephoneNumber: +44 986 33 4604
uid: alf
mail: [email protected]; [email protected]
roomNumber: Z42
userClass: Research Fellow


Questions?