CS6601 DISTRIBUTED SYSTEMS
UNIT – III
Dr.A.Kathirvel, Professor, Computer Science and Engg.
M N M Jain Engineering College, Chennai
Unit III: PEER TO PEER SERVICES AND FILE SYSTEM
Peer-to-peer Systems – Introduction – Napster and its legacy – Peer-to-peer Middleware – Routing overlays. Overlay case studies: Pastry, Tapestry – Distributed File Systems – Introduction – File service architecture – Andrew File System. File System: Features – File model – File accessing models – File sharing semantics. Naming: Identifiers, Addresses, Name Resolution – Name Space Implementation – Name Caches – LDAP.
Coulouris, Dollimore, Kindberg and Blair, Distributed Systems: Concepts and Design, 5th edition, Addison-Wesley, 2012.
Fig10.1: Distinctions between IP and overlay routing for peer-to-peer applications
Peer-to-Peer Systems
Fig10.2: Napster: peer-to-peer file sharing with a centralized, replicated index
Steps shown in the figure: 1. file location request to a Napster index server; 2. list of peers offering the file; 3. file request to a peer; 4. file delivered; 5. index update.
Fig10.3: Distribution of information in a routing overlay
The diagram shows nodes A, B, C and D, an object, and each node's routing knowledge about where the object is stored.
Fig10.4: Basic programming interface for a distributed hash table (DHT) as implemented by the PAST API over Pastry

put(GUID, data)
  The data is stored in replicas at all nodes responsible for the object identified by GUID.
remove(GUID)
  Deletes all references to GUID and the associated data.
value = get(GUID)
  The data associated with GUID is retrieved from one of the nodes responsible for it.
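A minimal single-process Python sketch of this put/get/remove interface (an illustration only, not the PAST implementation; real PAST replicates each entry at the live nodes whose GUIDs are numerically closest to the object's GUID):

```python
import hashlib

def make_guid(data: bytes) -> str:
    # In PAST-style systems the GUID is a secure hash of the object (SHA-1 here).
    return hashlib.sha1(data).hexdigest()

class ToyDHT:
    """Single-process stand-in for the put/get/remove interface of Fig 10.4;
    distribution and replication across responsible nodes are omitted."""
    def __init__(self):
        self._store = {}

    def put(self, guid: str, data: bytes) -> None:
        self._store[guid] = data          # real DHTs store replicas at several nodes

    def get(self, guid: str) -> bytes:
        return self._store[guid]          # real DHTs fetch from one responsible node

    def remove(self, guid: str) -> None:
        self._store.pop(guid, None)       # deletes the GUID and its data

dht = ToyDHT()
data = b"hello, overlay"
guid = make_guid(data)
dht.put(guid, data)
assert dht.get(guid) == data
```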
Fig10.5: Basic programming interface for distributed object location and routing (DOLR) as implemented by Tapestry

publish(GUID)
  GUID can be computed from the object (or some part of it, e.g. its name). This function makes the node performing a publish operation the host for the object corresponding to GUID.
unpublish(GUID)
  Makes the object corresponding to GUID inaccessible.
sendToObj(msg, GUID, [n])
  Following the object-oriented paradigm, an invocation message is sent to an object in order to access it. This might be a request to open a TCP connection for data transfer or to return a message containing all or part of the object's state. The final optional parameter [n], if present, requests the delivery of the same message to n replicas of the object.
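To make the contrast with the DHT interface concrete, here is a similarly minimal sketch of the DOLR operations. It assumes host objects with a deliver() method (an illustrative assumption) and uses send_to_obj as a Python-style name for sendToObj; the point is that the overlay only keeps location mappings while the objects stay on their hosts.

```python
class ToyDOLR:
    """Single-process sketch of the Tapestry publish/unpublish/sendToObj interface."""
    def __init__(self):
        self._hosts = {}                      # GUID -> list of hosting nodes

    def publish(self, guid, host):
        self._hosts.setdefault(guid, []).append(host)   # register host as a replica holder

    def unpublish(self, guid, host):
        self._hosts.get(guid, []).remove(host)           # object no longer reachable here

    def send_to_obj(self, msg, guid, n=1):
        # Deliver msg to up to n hosts holding replicas of the object.
        for host in self._hosts.get(guid, [])[:n]:
            host.deliver(msg, guid)
```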
Figure 10.6: Circular routing alone is correct but inefficient
The dots depict live nodes. The space is considered as circular: node 0 is adjacent to node 2^128 – 1. The diagram illustrates the routing of a message from node 65A1FC to D46A1C using leaf set information alone, assuming leaf sets of size 8 (l = 4). This is a degenerate type of routing that would scale very poorly; it is not used in practice.
Fig10.7: First four rows of a Pastry routing table
The routing table is located at a node whose GUID begins 65A1. Digits are in hexadecimal. The n's represent [GUID, IP address] pairs specifying the next hop to be taken by messages addressed to GUIDs that match each given prefix. Grey-shaded entries indicate that the prefix matches the current GUID up to the given value of p: the next row down or the leaf set should be examined to find a route. Although there are a maximum of 128 rows in the table, only log16 N rows will be populated on average in a network with N active nodes.
Fig10.8: Pastry routing example
Routing a message from node 65A1FC to D46A1C. With the aid of a well-populated routing table the message can be delivered in ~log16(N) hops.
Fig10.9: Pastry's routing algorithm
To handle a message M addressed to a node D (where R[p,i] is the element at column i, row p of the routing table):
1. If (L_-l < D < L_l) {   // the destination is within the leaf set or is the current node
2.    Forward M to the element L_i of the leaf set with GUID closest to D, or to the current node A.
3. } else {   // use the routing table to despatch M to a node with a closer GUID
4.    Find p, the length of the longest common prefix of D and A, and i, the (p+1)th hexadecimal digit of D.
5.    If (R[p,i] ≠ null) forward M to R[p,i]   // route M to a node with a longer common prefix
6.    else {   // there is no entry in the routing table
7.       Forward M to any node in L or R with a common prefix of length p, but a GUID that is numerically closer.
      }
   }
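A hedged Python sketch of the same per-hop decision (an illustration, not Pastry's implementation). It assumes GUIDs are fixed-length uppercase hex strings, leaf_set is a list of GUID strings, and R[p][i] holds the GUID of the routing-table entry or None.

```python
def common_prefix_len(a: str, b: str) -> int:
    """Length of the longest common hexadecimal prefix of two GUIDs."""
    n = 0
    for x, y in zip(a, b):
        if x != y:
            break
        n += 1
    return n

def next_hop(D: str, A: str, leaf_set, R):
    """One Pastry routing step for a message addressed to D at node A."""
    guids = list(leaf_set) + [A]
    # 1-2. If D falls within the span of the leaf set, go straight to the
    #      numerically closest leaf GUID (possibly the current node A).
    if min(guids) <= D <= max(guids):
        return min(guids, key=lambda g: abs(int(g, 16) - int(D, 16)))
    # 4-5. Otherwise use the routing table: the shared prefix length p and the
    #      (p+1)th hex digit i of D select the candidate entry.
    p = common_prefix_len(D, A)
    i = int(D[p], 16)
    if R[p][i] is not None:
        return R[p][i]
    # 6-7. Fall back to any known node sharing a prefix of length >= p that is
    #      numerically closer to D than A is.
    for g in guids:
        if common_prefix_len(g, D) >= p and abs(int(g, 16) - int(D, 16)) < abs(int(A, 16) - int(D, 16)):
            return g
    return A   # no better node known; A is numerically closest to D
```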
Fig10.10: Tapestry routing
Replicas of the file Phil's Books (G=4378) are hosted at nodes 4228 and AA93. Node 4377 is the root node for object 4378. The Tapestry routings shown are some of the entries in routing tables. The publish paths show routes followed by the publish messages laying down cached location mappings for object 4378. The location mappings are subsequently used to route messages sent to 4378.
Fig10.11: Structured vs unstructured peer-to-peer systems
Fig10.12: Key elements in the Gnutella protocol
Fig10.13: Storage organization of OceanStore objects
An object's AGUID refers, via a certificate, to the VGUID of its current version; each version's root block and indirection blocks point to the data blocks (d1, d2, ...) through BGUIDs, and to the VGUID of the previous version (copy on write). Version i+1 has been updated in blocks d1, d2 and d3. The certificate and the root blocks include some metadata not shown. All unlabelled arrows are BGUIDs.
Fig10.14: Types of identifier used in OceanStore

Name    Meaning        Description
BGUID   block GUID     Secure hash of a data block
VGUID   version GUID   BGUID of the root block of a version
AGUID   active GUID    Uniquely identifies all the versions of an object
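Since a BGUID is simply a secure hash of a block's contents, it can be computed with any cryptographic hash; a minimal sketch (SHA-1 chosen here purely for illustration) shows why updated blocks get fresh, self-certifying identifiers under copy on write:

```python
import hashlib

def bguid(block: bytes) -> str:
    # A block GUID is the secure hash of the block's contents, so any change
    # to the data yields a different GUID (the old block is never overwritten).
    return hashlib.sha1(block).hexdigest()

print(bguid(b"block d1"))       # stable for identical contents
print(bguid(b"block d1, v2"))   # differs after an update (copy on write)
```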
Fig10.15: Performance evaluation of the Pond prototype emulating NFS

Phase   LAN: Linux NFS   LAN: Pond   WAN: Linux NFS   WAN: Pond   Predominant operations in benchmark
1       0.0              1.9         0.9              2.8         Read and write
2       0.3              11.0        9.4              16.8        Read and write
3       1.1              1.8         8.3              1.8         Read
4       0.5              1.5         6.9              1.5         Read
5       2.6              21.0        21.5             32.0        Read and write
Total   4.5              37.2        47.0             54.9
Fig10.16: Ivy system architecture
On an Ivy node, an application issues file operations through a modified NFS client module in the kernel; these are handled by the Ivy server, which stores and retrieves data through a set of DHash servers.
Fig12.1: Storage systems and their properties

Storage system                  Example
Main memory                     RAM
File system                     UNIX file system
Distributed file system         Sun NFS
Web                             Web server
Distributed shared memory       Ivy (DSM, Ch. 18)
Remote objects (RMI/ORB)        CORBA
Persistent object store         CORBA Persistent Object Service
Peer-to-peer storage system     OceanStore (Ch. 10)

For each type the figure compares sharing, persistence, use of distributed caches/replicas and consistency maintenance. Types of consistency: 1: strict one-copy; ✓: slightly weaker guarantees; 2: considerably weaker guarantees.
Distributed File Systems
Fig12.2: File system modules
Directory module: relates file names to file IDs
File module: relates file IDs to particular files
Access control module: checks permission for operation requested
File access module: reads or writes file data or attributes
Block module: accesses and allocates disk blocks
Device module: disk I/O and buffering
Fig12.3: File attribute record structure
File length
Creation timestamp
Read timestamp
Write timestamp
Attribute timestamp
Reference count
Owner
File type
Access control list
Fig12.4: UNIX file system operations
filedes = open(name, mode)
filedes = creat(name, mode)
  Opens an existing file with the given name / creates a new file with the given name. Both operations deliver a file descriptor referencing the open file. The mode is read, write or both.
status = close(filedes)
  Closes the open file filedes.
count = read(filedes, buffer, n)
count = write(filedes, buffer, n)
  Transfers n bytes from the file referenced by filedes to buffer / transfers n bytes to the file referenced by filedes from buffer. Both operations deliver the number of bytes actually transferred and advance the read-write pointer.
pos = lseek(filedes, offset, whence)
  Moves the read-write pointer to offset (relative or absolute, depending on whence).
status = unlink(name)
  Removes the file name from the directory structure. If the file has no other names, it is deleted.
status = link(name1, name2)
  Adds a new name (name2) for a file (name1).
status = stat(name, buffer)
  Gets the file attributes for file name into buffer.
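Python's os module exposes these UNIX calls almost one-for-one, so the operations in Fig 12.4 can be exercised directly; a small illustrative sequence (the file names are arbitrary):

```python
import os

fd = os.open("example.txt", os.O_CREAT | os.O_RDWR, 0o644)   # creat/open
os.write(fd, b"hello distributed file systems\n")             # write, advances pointer
os.lseek(fd, 0, os.SEEK_SET)                                   # move read-write pointer
data = os.read(fd, 100)                                        # read up to 100 bytes
os.close(fd)                                                   # close

os.link("example.txt", "alias.txt")      # add a second name for the file
print(os.stat("alias.txt").st_size)      # file attributes (here the length)
os.unlink("alias.txt")                   # remove one name
os.unlink("example.txt")                 # last name removed: file is deleted
```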
Fig12.5: File service architecture
Application programs on the client computer use a client module, which communicates with the flat file service and the directory service on the server computer.
Fig12.6: Flat file service operations

Read(FileId, i, n) -> Data — throws BadPosition
  If 1 ≤ i ≤ Length(File): reads a sequence of up to n items from a file starting at item i and returns it in Data.
Write(FileId, i, Data) — throws BadPosition
  If 1 ≤ i ≤ Length(File)+1: writes a sequence of Data to a file, starting at item i, extending the file if necessary.
Create() -> FileId
  Creates a new file of length 0 and delivers a UFID for it.
Delete(FileId)
  Removes the file from the file store.
GetAttributes(FileId) -> Attr
  Returns the file attributes for the file.
SetAttributes(FileId, Attr)
  Sets the file attributes (only those attributes that are not shaded in Figure 12.3).
Fig12.7: Directory service operations

Lookup(Dir, Name) -> FileId — throws NotFound
  Locates the text name in the directory and returns the relevant UFID. If Name is not in the directory, throws an exception.
AddName(Dir, Name, FileId) — throws NameDuplicate
  If Name is not in the directory, adds (Name, File) to the directory and updates the file's attribute record. If Name is already in the directory, throws an exception.
UnName(Dir, Name) — throws NotFound
  If Name is in the directory, the entry containing Name is removed from the directory. If Name is not in the directory, throws an exception.
GetNames(Dir, Pattern) -> NameSeq
  Returns all the text names in the directory that match the regular expression Pattern.
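To show how the two services combine, here is a hedged sketch of pathname resolution and reading in this model: each path component is resolved with Lookup against the directory service, then the file is read through the flat file service. The directory_service and file_service objects are assumed client stubs introduced only for illustration; they are not part of the model's definition.

```python
def resolve(path, root_dir, directory_service):
    """Resolve a '/'-separated pathname to a UFID by repeated Lookup calls."""
    ufid = root_dir
    for name in path.strip("/").split("/"):
        ufid = directory_service.lookup(ufid, name)     # may raise NotFound
    return ufid

def read_whole_file(path, root_dir, directory_service, file_service, chunk=1024):
    """Fetch a file's contents through the flat file service, chunk by chunk."""
    ufid = resolve(path, root_dir, directory_service)
    length = file_service.get_attributes(ufid).length   # file length attribute (Fig 12.3)
    data, i = b"", 1                                     # items are numbered from 1 (Fig 12.6)
    while i <= length:
        part = file_service.read(ufid, i, min(chunk, length - i + 1))
        data += part
        i += len(part)
    return data
```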
Fig12.8: NFS architecture
On the client computer, application programs make system calls to the UNIX kernel; a virtual file system layer directs local operations to the UNIX file system (or other file systems) and remote operations to the NFS client, which communicates with the NFS server on the server computer using the NFS protocol. The server's virtual file system passes requests on to its local UNIX file system.
Fig12.9: NFS server operations – 1

lookup(dirfh, name) -> fh, attr
  Returns file handle and attributes for the file name in the directory dirfh.
create(dirfh, name, attr) -> newfh, attr
  Creates a new file name in directory dirfh with attributes attr and returns the new file handle and attributes.
remove(dirfh, name) -> status
  Removes file name from directory dirfh.
getattr(fh) -> attr
  Returns file attributes of file fh. (Similar to the UNIX stat system call.)
setattr(fh, attr) -> attr
  Sets the attributes (mode, user id, group id, size, access time and modify time) of a file. Setting the size to 0 truncates the file.
read(fh, offset, count) -> attr, data
  Returns up to count bytes of data from a file starting at offset. Also returns the latest attributes of the file.
write(fh, offset, count, data) -> attr
  Writes count bytes of data to a file starting at offset. Returns the attributes of the file after the write has taken place.
rename(dirfh, name, todirfh, toname) -> status
  Changes the name of file name in directory dirfh to toname in directory todirfh.
link(newdirfh, newname, dirfh, name) -> status
  Creates an entry newname in the directory newdirfh which refers to the file name in the directory dirfh.
Continues on next slide ...
Fig12.9: NFS server operations – 2

symlink(newdirfh, newname, string) -> status
  Creates an entry newname in the directory newdirfh of type symbolic link with the value string. The server does not interpret the string but makes a symbolic link file to hold it.
readlink(fh) -> string
  Returns the string that is associated with the symbolic link file identified by fh.
mkdir(dirfh, name, attr) -> newfh, attr
  Creates a new directory name with attributes attr and returns the new file handle and attributes.
rmdir(dirfh, name) -> status
  Removes the empty directory name from the parent directory dirfh. Fails if the directory is not empty.
readdir(dirfh, cookie, count) -> entries
  Returns up to count bytes of directory entries from the directory dirfh. Each entry contains a file name, a file handle, and an opaque pointer to the next directory entry, called a cookie. The cookie is used in subsequent readdir calls to start reading from the following entry. If the value of cookie is 0, reads from the first entry in the directory.
statfs(fh) -> fsstats
  Returns file system information (such as block size, number of free blocks and so on) for the file system containing a file fh.
Fig12.10: Local and remote file systems accessible on an NFS client
Note: The file system mounted at /usr/students in the client is actually the sub-tree located at /export/people in Server 1; the file system mounted at /usr/staff in the client is actually the sub-tree located at /nfs/users in Server 2.
Fig12.11: Distribution of processes in the Andrew File System
User programs and the Venus process run on the workstations; the Vice server processes run on the servers; on each machine these run above the UNIX kernel, and workstations and servers communicate over the network.
Fig12.12: File name space seen by clients of AFS
The root contains local directories (tmp, bin, vmunix, ...) and the shared directory cmu; symbolic links from the local name space point into the shared space.
Fig12.13: System call interception in AFS
A user program's UNIX file system calls are intercepted in the UNIX kernel on the workstation: non-local file operations are passed to Venus, while local files are accessed on the local disk through the UNIX file system.
Fig12.14: Implementation of file system calls in AFS

open(FileName, mode)
  UNIX kernel: If FileName refers to a file in shared file space, pass the request to Venus. Open the local file and return the file descriptor to the application.
  Venus: Check list of files in local cache. If not present or there is no valid callback promise, send a request for the file to the Vice server that is custodian of the volume containing the file. Place the copy of the file in the local file system, enter its local name in the local cache list and return the local name to UNIX.
  Vice: Transfer a copy of the file and a callback promise to the workstation. Log the callback promise.
read(FileDescriptor, Buffer, length)
  UNIX kernel: Perform a normal UNIX read operation on the local copy.
write(FileDescriptor, Buffer, length)
  UNIX kernel: Perform a normal UNIX write operation on the local copy.
close(FileDescriptor)
  UNIX kernel: Close the local copy and notify Venus that the file has been closed.
  Venus: If the local copy has been changed, send a copy to the Vice server that is the custodian of the file.
  Vice: Replace the file contents and send a callback to all other clients holding callback promises on the file.
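A hedged sketch of the open() path as Venus might handle it, assuming hypothetical cache and Vice-stub helpers (illustrative only, not real AFS code):

```python
class VenusSketch:
    """Simplified cache-manager logic for opening a file in AFS shared space."""
    def __init__(self, cache, vice):
        self.cache = cache    # hypothetical local cache: fid -> entry
        self.vice = vice      # hypothetical RPC stub for the file's custodian Vice server

    def open_shared(self, fid):
        entry = self.cache.lookup(fid)
        if entry is not None and entry.callback_valid:
            return entry.local_name                  # hit: a valid callback promise is held
        # Miss or broken promise: fetch a whole-file copy plus a callback promise.
        attr, data = self.vice.fetch(fid)            # Fetch(fid) -> attr, data (Fig 12.15)
        local_name = self.cache.install(fid, data)   # place the copy in the local file system
        self.cache.set_callback(fid, True)           # remember the callback promise
        return local_name                            # the UNIX kernel then opens this copy
```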
Fig12.15: The main components of the Vice service interface

Fetch(fid) -> attr, data
  Returns the attributes (status) and, optionally, the contents of the file identified by fid and records a callback promise on it.
Store(fid, attr, data)
  Updates the attributes and (optionally) the contents of a specified file.
Create() -> fid
  Creates a new file and records a callback promise on it.
Remove(fid)
  Deletes the specified file.
SetLock(fid, mode)
  Sets a lock on the specified file or directory. The mode of the lock may be shared or exclusive. Locks that are not removed expire after 30 minutes.
ReleaseLock(fid)
  Unlocks the specified file or directory.
RemoveCallback(fid)
  Informs the server that a Venus process has flushed a file from its cache.
BreakCallback(fid)
  This call is made by a Vice server to a Venus process. It cancels the callback promise on the relevant file.
Fig13.1: Composed naming domains used to access a resource from a URL
The URL http://www.cdk5.net:8888/WebExamples/earth.html is resolved in stages: a DNS lookup maps www.cdk5.net to an IP address, the resulting network address and port number (e.g. 55.55.55.55, port 8888) identify a socket on the web server, and the pathname WebExamples/earth.html identifies the file.
Name Services
Fig13.2: Iterative navigation
A client iteratively contacts name servers NS1–NS3 in order to resolve a name.
Fig13.3: Non-recursive and recursive server-controlled navigation
A name server NS1 communicates with the other name servers (NS2, NS3) on behalf of a client, either non-recursively or recursively.
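A hedged sketch of iterative navigation (Fig 13.2): the client asks each server in turn, and a server that cannot complete the resolution returns a referral to a better-placed server. The server stub and the reply fields (is_answer, value, referral) are assumptions made for illustration; real DNS resolvers behave along these lines.

```python
def resolve_iteratively(name, server):
    """Iterative navigation: the client itself follows referrals."""
    while True:
        reply = server.lookup(name)      # ask the current name server
        if reply.is_answer:
            return reply.value           # resolution complete
        server = reply.referral          # otherwise try the server it points us at
```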
Fig13.4: DNS name servers
Note: Name server names are in italics, and the corresponding domains are in parentheses. Arrows denote name server entries. The servers shown include a.root-servers.net (root), ns1.nic.uk (uk), ns0.ja.net (ac.uk), alpha.qmul.ac.uk (qmul.ac.uk), dns0.dcs.qmul.ac.uk (dcs.qmul.ac.uk), dns0-doc.ic.ac.uk (ic.ac.uk) and ns.purdue.edu (purdue.edu).
Fig13.5: DNS resource records

Record type   Meaning                                 Main contents
A             A computer address                      IP number
NS            An authoritative name server            Domain name for server
CNAME         The canonical name for an alias         Domain name for alias
SOA           Marks the start of data for a zone      Parameters governing the zone
WKS           A well-known service description        List of service names and protocols
PTR           Domain name pointer (reverse lookups)   Domain name
HINFO         Host information                        Machine architecture and operating system
MX            Mail exchange                           List of <preference, host> pairs
TXT           Text string                             Arbitrary text
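Resolving an address (A) record from a program needs only the standard library; a minimal example using Python's socket module (the host name is arbitrary):

```python
import socket

# getaddrinfo consults DNS (among other sources) and returns address records.
for family, _, _, _, sockaddr in socket.getaddrinfo("www.qmul.ac.uk", 80,
                                                    proto=socket.IPPROTO_TCP):
    print(family, sockaddr[0])   # address family and the resolved IP address
```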
Fig13.6: DNS zone data records

domain name       time to live   class   type   value
dcs.qmul.ac.uk    1D             IN      NS     dns0
dcs.qmul.ac.uk    1D             IN      NS     dns1
dcs.qmul.ac.uk    1D             IN      NS     cancer.ucs.ed.ac.uk
dcs.qmul.ac.uk    1D             IN      MX     1 mail1.qmul.ac.uk
dcs.qmul.ac.uk    1D             IN      MX     2 mail2.qmul.ac.uk

domain name       time to live   class   type    value
www               1D             IN      CNAME   apricot
apricot           1D             IN      A       138.37.88.248
dcs               1D             IN      NS      dns0.dcs
dns0.dcs          1D             IN      A       138.37.88.249
dcs               1D             IN      NS      dns1.dcs
dns1.dcs          1D             IN      A       138.37.94.248
Fig13.7: GNS directory tree and value tree for user Peter.Smith
The directory tree runs from the root EC (DI: 599) to UK (DI: 574) and FR (DI: 543); under UK, AC (DI: 437) and QMW (DI: 322) lead to the entry Peter.Smith, whose value tree holds attributes such as password and mailboxes (Alpha, Beta, Gamma).
Fig13.8: Merging trees under a new root
A new root WORLD (DI: 633) is added above EC (DI: 599) and NORTH AMERICA (DI: 642); NORTH AMERICA contains US (DI: 457) and CANADA (DI: 732). Well-known directories: #599 = #633/EC, #642 = #633/NORTH AMERICA.
Fig13.9: Restructuring the directory
After restructuring, the US directory (formerly reached only under NORTH AMERICA) also appears under EC, reachable as #633/EC/US. Well-known directories: #599 = #633/EC, #642 = #633/NORTH AMERICA.
Fig13.10: X.500 service architecture
Directory User Agents (DUAs) access the directory through Directory Service Agents (DSAs), which cooperate with one another to resolve requests.
Fig13.11: Part of the X.500 Directory Information Tree

X.500 Service (root)
  ... France (country), Great Britain (country), Greece (country) ...
    BT Plc (organization), University of Gormenghast (organization) ...
      Department of Computer Science (organizationalUnit), Computing Service (organizationalUnit), Engineering Department (organizationalUnit) ...
        Departmental Staff (organizationalUnit), Research Students (organizationalUnit), ely (applicationProcess) ...
          Alice Flintstone (person), Pat King (person), James Healey (person), Janet Papworth (person) ...
Fig13.12: An X.500 DIB entry

info: Alice Flintstone, Departmental Staff, Department of Computer Science, University of Gormenghast, GB
commonName: Alice.L.Flintstone, Alice.Flintstone, Alice Flintstone, A. Flintstone
surname: Flintstone
telephoneNumber: +44 986 33 4604
uid: alf
roomNumber: Z42
userClass: Research Fellow
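LDAP, listed in the unit syllabus, is the lightweight protocol commonly used to query such a DIB. A hedged sketch using the third-party ldap3 package; the server address and base DN below are made up for illustration and the bind is anonymous:

```python
from ldap3 import Server, Connection, ALL   # third-party package: pip install ldap3

server = Server("ldap.gormenghast.example", get_info=ALL)   # hypothetical directory server
conn = Connection(server, auto_bind=True)                    # anonymous bind for the example

# Search the Departmental Staff subtree for entries whose commonName starts with "Alice".
conn.search(
    "ou=Departmental Staff, ou=Department of Computer Science, o=University of Gormenghast, c=GB",
    "(commonName=Alice*)",
    attributes=["commonName", "surname", "telephoneNumber", "uid", "roomNumber"],
)
for entry in conn.entries:
    print(entry.commonName, entry.telephoneNumber)
```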
Questions?