5 NAMING The difference between naming in distributed systems and nondistributed systems lies in the way naming systems are implemented



5 NAMING

The difference between naming in distributed systems and nondistributed systems lies in the way naming systems are implemented


Contents

• 5.1 NAMES, IDENTIFIERS, AND ADDRESSES
• 5.2 FLAT NAMING
• 5.3 STRUCTURED NAMING
• 5.4 ATTRIBUTE-BASED NAMING


5.1 NAMES, IDENTIFIERS, AND ADDRESSES

• Entity
• Access point
• Address
• Identifier
• Name


Entity

• Entity
  – An entity in a distributed system can be practically anything
  – Examples: processes, users, mailboxes, newsgroups, Web pages, graphical windows, messages, network connections
• Access point
  – An access point is yet another, but special, kind of entity in a distributed system
  – It is the entity through which an ordinary entity is accessed: it hosts the ordinary entity, which belongs to it
  – The name of an access point is called an address
• An address of that entity
  – The address of an access point of an entity is also simply called the address of that entity


Reference an entity

• By an address
  – An entity may easily change its access point, e.g., a telephone number
  – An entity may offer more than one access point; which one should be used?
  – A name should be independent of the entity's address
• By an identifier
  – 1. An identifier refers to at most one entity.
  – 2. Each entity is referred to by at most one identifier.
  – 3. An identifier always refers to the same entity (i.e., it is never reused)
• By a name
  – A human-friendly name is generally represented as a character string
  – It is defined entirely by the user, e.g., file names, DNS names


Name of an entity

• Name
  – A name in a distributed system is a string of bits or characters that is used to refer to an entity
• Types of names
  – Address
  – Identifier
  – Human-friendly name


How are names and identifiers resolved to addresses?

• Name-to-address binding
  – In the simplest form: a table of (name, address) pairs
  – In distributed systems a centralized table is not going to work
• Instead, what often happens is that a name is decomposed into several parts, such as ftp.cs.vu.nl, and name resolution takes place through a recursive lookup of those parts


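DNS resolution: process and principle

• Step 1: The client issues a domain name resolution request and sends it to the local name server.
• Step 2: When the local name server receives the request, it first checks its local cache. If the record is there, the local name server returns the result directly.
• Step 3: If the record is not in the local cache, the local name server sends the request to a root name server, which returns the address of the primary name server for the queried domain (a subdomain of the root).
• Step 4: The local name server then sends the request to the name server returned in the previous step. The server that receives the request checks its own cache and, if it has no such record, returns the address of the relevant lower-level name server.
• Step 5: Repeat step 4 until the correct record is found.
• Step 6: The local name server stores the returned result in its cache for later use and also returns the result to the client.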
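The six steps above can be sketched as a small simulation. This is only an illustration under assumptions: the server names, the zone tables, and the address 192.0.2.10 are hypothetical, and real DNS messages, record types, and time-to-live handling are left out.

```python
# Hypothetical zone data. Each server maps a name suffix either to a
# referral (the next server to ask) or to a final address.
SERVERS = {
    "root":        {"nl": ("referral", "ns.nl")},
    "ns.nl":       {"vu.nl": ("referral", "ns.vu.nl")},
    "ns.vu.nl":    {"cs.vu.nl": ("referral", "ns.cs.vu.nl")},
    "ns.cs.vu.nl": {"ftp.cs.vu.nl": ("address", "192.0.2.10")},
}


def resolve(name, cache):
    """Resolve `name` as the local name server would.

    Returns (address, number of queries sent to other servers).
    """
    if name in cache:                        # step 2: answer from the cache
        return cache[name], 0
    labels = name.split(".")
    server, queries = "root", 0              # step 3: start at the root
    # Walk from the rightmost label towards the full name.
    for i in range(len(labels) - 1, -1, -1):
        suffix = ".".join(labels[i:])
        kind, value = SERVERS[server][suffix]
        queries += 1
        if kind == "address":
            cache[name] = value              # step 6: remember the answer
            return value, queries
        server = value                       # steps 4 and 5: follow the referral
    raise LookupError(name)


cache = {}
print(resolve("ftp.cs.vu.nl", cache))   # first lookup walks the hierarchy
print(resolve("ftp.cs.vu.nl", cache))   # second lookup is served from the cache
```

The second call sends no queries at all, which is the point of step 6.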


5.2 FLAT NAMING

• Identifiers
  – Flat names
    • They are simply random bit strings, which we conveniently refer to as unstructured names
  – An important property of such a name is that it does not contain any information on how to locate the access point of its associated entity


5.2.1 Simple Solutions

• Broadcasting and Multicasting
• Forwarding Pointers
• Both solutions are applicable only to local-area networks


Broadcasting

• Broadcasting
  – In essence, a machine broadcasts a packet on the local network asking who is the owner of a given IP address. When the message arrives at a machine, the receiver checks whether it should listen to the requested IP address. If so, it sends a reply packet containing, for example, its Ethernet address
• Inefficiency of broadcasting
  – Network bandwidth is wasted
  – Too many hosts may be interrupted by requests


Multicasting

• Multicasting: only a restricted group of hosts receives the request
  – Hosts are allowed to join a specific multicast group
  – Such groups are identified by a multicast address
  – When a host sends a message to a multicast address, the network layer provides a best-effort service to deliver that message to all group members
• A multicast address can be used as a general location service for multiple entities
• A multicast address can also be associated with a replicated entity


Forwarding Pointers

• When an entity moves from A to B, it leaves behind in A a reference to its new location at B
  – Advantage
    • Simplicity
  – Drawbacks
    • A chain for a highly mobile entity can become so long that locating that entity is prohibitively expensive
    • All intermediate locations in a chain will have to maintain their part of the chain of forwarding pointers as long as needed
    • The vulnerability to broken links
• An important issue
  – To keep chains relatively short, and to ensure that forwarding pointers are robust


An example of Forwarding Pointers

• Each forwarding pointer is implemented as a (client stub, server stub) pair

• A server stub contains either a local reference to the actual object or a local reference to a remote client stub for that object

• migration is completely transparent to a client


short-cut the chain

• A client-stub identification consists of the client's transport-level address, combined with a locally generated number to identify that stub

• When the invocation reaches the object at its current location, a response (carrying the current location) is sent back to the client stub

• The client stub adjusts its companion server stub to the new location
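A minimal sketch of following a chain of forwarding pointers and then short-cutting it. The classes `Location` and `ClientStub` are hypothetical simplifications: the (client stub, server stub) pair is collapsed into a single `forward` reference, and no network communication takes place.

```python
class Location:
    """A place the object has lived: holds the object or a forwarding pointer."""

    def __init__(self, name):
        self.name = name
        self.has_object = False
        self.forward = None   # next Location in the chain once the object moves on


def move(src, dst):
    """Migrate the object from src to dst, leaving a forwarding pointer behind."""
    src.has_object, dst.has_object = False, True
    src.forward = dst


class ClientStub:
    def __init__(self, target):
        self.target = target   # where this client currently believes the object is

    def invoke(self):
        """Follow the chain to the object, then short-cut it.

        Returns the number of forwarding pointers that had to be followed.
        """
        hops, here = 0, self.target
        while not here.has_object:
            here = here.forward
            hops += 1
        self.target = here     # shortcut: remember the current location
        return hops


a, b, c = Location("A"), Location("B"), Location("C")
a.has_object = True
stub = ClientStub(a)
move(a, b)
move(b, c)
print(stub.invoke())   # follows A -> B -> C
print(stub.invoke())   # goes straight to C after the shortcut
```

After the first invocation the stub no longer depends on A or B, so their forwarding pointers could be reclaimed once no other client refers to them.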


Redirecting a forwarding pointer by storing a shortcut in a client stub


reverse path of forwarding pointers

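• Sending the response along the reverse path allows adjustment of all intermediate stubs
• Server stubs that are no longer referenced can be removed by distributed garbage collection
• Trade-off: replying directly to the initiating client stub is faster, but then only that stub is adjusted
• Problems arise when a process in a chain of (client stub, server stub) pairs crashes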


Problems of broadcasting and forwarding pointers

• Broadcasting or multicasting is difficult to implement efficiently in large-scale networks

• Long chains of forwarding pointers introduce performance problems and are susceptible to broken links


5.2.2 Home-Based Approaches

• Home location
  – Where an entity was created
  – It keeps track of the current location of an entity

• The home-based approach is used as a fall-back mechanism for location services based on forwarding pointers

• An example: Mobile IP


Mobile IPv6: a home-based approach

• A mobile node has a home network for which it has a stable address, known as its home address (HoA).

• This home network has a special router attached, known as the home agent, which will take care of traffic to the mobile node when it is away.

• When a mobile node attaches to a foreign network, it will receive a temporary care-of address (CoA)

• This care-of address is reported to the node's home agent, which will then see to it that all traffic is forwarded to the mobile node


The invocation process of Mobile IP

• When the home agent receives a packet for the mobile host, it looks up the host's current location
  – If the host is on the current local network, the packet is simply forwarded
  – Otherwise, it is tunneled to the host's current location
• At the same time, the sender of the packet is informed of the host's current location
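The behavior of the home agent can be sketched as a lookup table from home address to care-of address. This is a toy model under assumptions: the class `HomeAgent`, its method names, and the addresses (taken from the IPv6 documentation prefix 2001:db8::/32) are hypothetical, and no real tunneling or signaling is performed.

```python
class HomeAgent:
    """Toy home agent: tracks where each mobile host currently is."""

    def __init__(self):
        self.care_of = {}   # home address -> care-of address while away

    def register(self, home_addr, care_of_addr):
        """The mobile host reports the care-of address it received abroad."""
        self.care_of[home_addr] = care_of_addr

    def deregister(self, home_addr):
        """The mobile host is back on its home network."""
        self.care_of.pop(home_addr, None)

    def deliver(self, home_addr, packet):
        """Return (action, next hop, location to report to the sender)."""
        coa = self.care_of.get(home_addr)
        if coa is None:
            # Host is at home: forward on the local network, nothing to report.
            return ("forward", home_addr, None)
        # Host is away: tunnel the packet and tell the sender where it is.
        return ("tunnel", coa, coa)


agent = HomeAgent()
hoa, coa = "2001:db8:1::10", "2001:db8:2::99"
print(agent.deliver(hoa, b"ping"))
agent.register(hoa, coa)
print(agent.deliver(hoa, b"ping"))
```

Note that the sender always addresses the packet to the home address; only the home agent needs to know the care-of address until the sender has been informed.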


Note that the IP address is effectively used as an identifier for the mobile host.

The principle of Mobile IP


Drawback of home-based approaches

• A client first has to contact the home, which increases communication latency
• The home location is fixed and must always exist
• It would have been better if the home could have moved along with the host
• A solution to this problem
  – Register the home at a traditional naming service and let a client first look up the location of the home
  – The location can be effectively cached after it has been looked up


5.2.3 Distributed Hash Tables

• General Mechanism
• Linear approach
• Chord
  – Examples
  – Keeping the finger tables up-to-date

• Exploiting Network Proximity


General Mechanism

• Chord uses an m-bit identifier space to assign randomly-chosen identifiers to nodes as well as keys to specific entities

• An entity with key k falls under the jurisdiction of the node with the smallest identifier id >= k. This node is referred to as the successor of k and denoted as succ(k)

• The main issue in DHT-based systems is to efficiently resolve a key k to the address of succ(k).


Linear approach

• An obvious nonscalable approach is to let each node p keep track of the successor succ(p+1) as well as its predecessor pred(p)

• When a node p receives a request to resolve key k, it will simply forward the request to one of its two neighbors, whichever one is appropriate, unless pred(p) < k <= p, in which case node p should return its own address to the process that initiated the resolution of key k


Chord

• Instead of the linear approach toward key lookup, each Chord node maintains a finger table of at most m entries
  – FT_p[i] = succ(p + 2^(i-1))

• To look up a key k, node p will then immediately forward the request to node q with index j in p's finger table, where
  – q = FT_p[j] <= k < FT_p[j+1]
  – or q = FT_p[1] when p < k < FT_p[1]


Example

• Resolving k = 26 from node 1
• Node 1 looks up k = 26 in its finger table: FT_1[5] => node 18
• FT_18[2] <= k < FT_18[3] => node 20
• 20 => 21
• 21 => 28
• A lookup will generally require O(log(N)) steps, with N being the number of nodes in the system
• Another example: looking up k = 12 from node 28 follows 28 -> 4 -> 9 -> 11 -> 14
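The two lookups above can be reproduced with a short simulation. It is a sketch under assumptions: the figure with the actual ring is not part of this transcript, so the node set below (1, 4, 9, 11, 14, 18, 20, 21, 28 with m = 5) is assumed because it is consistent with every hop listed in the example. The loop uses global knowledge (`succ(k)`) to decide when to stop, whereas a real node would compare k against its own predecessor.

```python
from bisect import bisect_left

M = 5                                               # identifier bits: ids 0..31
NODES = sorted([1, 4, 9, 11, 14, 18, 20, 21, 28])   # assumed node set


def succ(k):
    """Smallest node identifier >= k, wrapping around the ring."""
    i = bisect_left(NODES, k % 2**M)
    return NODES[i % len(NODES)]


def finger_table(p):
    """FT_p[i] = succ(p + 2^(i-1)) for i = 1..M (returned as a 0-based list)."""
    return [succ(p + 2**(i - 1)) for i in range(1, M + 1)]


def in_half_open(x, a, b):
    """True if x lies in the ring interval (a, b], assuming a != b."""
    return 0 < (x - a) % 2**M <= (b - a) % 2**M


def lookup(start, k):
    """Return the list of nodes visited while resolving key k from `start`."""
    path, p = [start], start
    while succ(k) != p:                     # simulation shortcut, see lead-in
        ft = finger_table(p)
        if in_half_open(k, p, ft[0]):
            p = ft[0]                       # p's successor is responsible for k
        else:
            # Farthest finger that does not overshoot k: FT_p[j] <= k < FT_p[j+1]
            p = [q for q in ft if in_half_open(q, p, k)][-1]
        path.append(p)
    return path


print(finger_table(1))   # [4, 4, 9, 9, 18]
print(lookup(1, 26))     # [1, 18, 20, 21, 28]
print(lookup(28, 12))    # [28, 4, 9, 11, 14]
```

Each hop roughly halves the remaining distance on the ring, which is where the O(log(N)) bound comes from.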


Changing

In large distributed systems the collection of participating nodes can be expected to change all the time.

Not only will nodes join and leave voluntarily, we also need to consider the case of nodes failing, to later recover again


Keeping the finger tables up-to-date

• Most important is that for every node q, FT_q[1] is correct
  – Each node q regularly runs a simple procedure to check
  – q == pred(succ(q+1)) ?
• The other entries FT_q[i] are refreshed by issuing a request to resolve succ(q + 2^(i-1))
  – Such requests are issued regularly by means of a background process
• Likewise, each node q will regularly check whether its predecessor is alive; if not, it records the predecessor as "unknown"
• If node q finds that the predecessor of succ(q+1) has been set to "unknown," it will simply notify succ(q+1) that it suspects it to be the predecessor


Exploiting Network Proximity

• One of the potential problems
  – Requests may be routed erratically across the Internet
  – For example, assume that node 1 in Fig. 5-4 is placed in Amsterdam, The Netherlands; node 18 in San Diego, California; node 20 in Amsterdam again; and node 21 in San Diego. The result of resolving key 26 will then incur three wide-area message transfers
  – This can be reduced to at most one
• To minimize these pathological cases, designing a DHT-based system requires taking the underlying network into account: network proximity


Three Ways

• Topology-based assignment of node identifiers
• Proximity routing
• Proximity neighbor selection


Topology-based assignment

– Identifiers are assigned such that two nearby nodes will have identifiers that are also close to each other

– This is hard when node identifiers are sampled from a one-dimensional space, because a logical ring must be mapped to the Internet

– It also exposes correlated failures: nodes on the same enterprise network will have identifiers from a relatively small interval. When that network becomes unreachable, we suddenly have a gap in the otherwise uniform distribution of identifiers


Proximity routing

• Each node in Chord could equally well keep track of r successors

• This redundancy can be applied for every entry in the finger table

• For node p, FT_p[i] points to the first node in the range [p + 2^(i-1), p + 2^i - 1]; there is no reason why p cannot keep track of r nodes in that range
  – A node can then pick the one of the r successors that is closest to itself
  – Node failures need not immediately lead to failures of lookups
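A small sketch of the selection step in proximity routing: among the r candidates kept for one finger-table entry, forward to the one with the lowest measured latency and skip candidates that appear to have failed. The function name `pick_finger`, the latency table, and the use of `None` to mark an unreachable node are assumptions made for illustration.

```python
def pick_finger(candidates, latency_ms):
    """Return the reachable candidate with the lowest latency, or None.

    candidates  -- node ids kept for one finger-table entry (r of them)
    latency_ms  -- node id -> measured round-trip time, None if unreachable
    """
    reachable = [n for n in candidates if latency_ms.get(n) is not None]
    if not reachable:
        return None
    return min(reachable, key=lambda n: latency_ms[n])


# Hypothetical measurements taken from a node in Amsterdam.
latency = {18: 90.0, 20: 4.0, 21: 85.0}
print(pick_finger([18, 20, 21], latency))   # nearby node 20 is preferred

latency[20] = None                          # node 20 fails
print(pick_finger([18, 20, 21], latency))   # the lookup still has a next hop
```

Keeping r candidates therefore buys both lower latency and some fault tolerance, at the cost of a larger table to maintain.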


Proximity neighbor selection

• Optimize routing tables such that the nearest node is selected as neighbor

• When a node joins it receives information about the current overlay from multiple other Nodes

• This information is used by the new node to construct a routing table


5.2.4 Hierarchical Approaches

A leaf domain: A local-area network in a computer network or a cell in a mobile telephone network


A tree of directory nodes

• A network is divided into a collection of domains
• There is a single top-level domain
• Each domain can be subdivided into subdomains
• A lowest-level domain, called a leaf domain
  – a local-area network
  – a cell in a mobile telephone network
• Each domain D has an associated directory node dir(D)
  – It keeps track of the entities in that domain


To keep track of an entity

• To keep track of the whereabouts of an entity, each entity currently located in a domain D is represented by a location record in the directory node dir(D)

• Key notions: entity, location record, domain, directory node
• The root node will have a location record for each entity
  – Each location record stores a pointer to the directory node of the next lower-level subdomain


Multiple addresses of an entity


Looking up a location


Update operations

• The lookup operation exploits locality. Update operations exploit locality in a similar fashion

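• Consider an entity E that has created a replica in leaf domain D, for which it needs to insert its address
  – The insertion is initiated at the leaf node dir(D) of D
  – dir(D) forwards the insert request to its parent
  – The request is forwarded upward until it reaches a directory node M that already stores a location record for E
  – Node M stores a pointer in the location record for E, referring to the child node from which the request came
  – That child in turn creates a location record with a pointer to the next lower-level node; this process continues until we reach the leaf node from which the insert was initiated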


Update operations


Others

• Alternative: create a location record before passing the insert request to the parent node
• The advantage of this alternative is that an address becomes available for lookups as soon as possible
  – If a parent node is temporarily unreachable, the address can still be looked up within the domain represented by the current node
• A delete operation is analogous to an insert operation
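The insert and lookup operations of the hierarchical location service can be sketched as follows. This is an illustrative model under assumptions: the class `DirNode`, the domain names, and the address strings are hypothetical; the insert uses the bottom-up variant (records are created on the way up); and a lookup simply follows the first pointer it finds.

```python
class DirNode:
    """Directory node dir(D) for a domain D in the tree."""

    def __init__(self, name, parent=None):
        self.name = name
        self.parent = parent
        # entity -> list of child DirNodes (pointers) or, at a leaf, addresses
        self.records = {}


def insert(leaf, entity, address):
    """Store the address at the leaf and install pointers on the way up,
    stopping at the first node M that already had a record for the entity."""
    leaf.records.setdefault(entity, []).append(address)
    child, node = leaf, leaf.parent
    while node is not None:
        pointers = node.records.setdefault(entity, [])
        already_known = bool(pointers)
        if child not in pointers:
            pointers.append(child)
        if already_known:
            break                    # node M reached: ancestors already know E
        child, node = node, node.parent


def lookup(leaf, entity):
    """Climb until some node knows the entity, then follow pointers down."""
    node = leaf
    while node is not None and entity not in node.records:
        node = node.parent
    if node is None:
        return None
    record = node.records[entity][0]
    while isinstance(record, DirNode):
        record = record.records[entity][0]
    return record


root = DirNode("root")
eu, us = DirNode("eu", root), DirNode("us", root)
ams, par, sd = DirNode("ams", eu), DirNode("par", eu), DirNode("sd", us)

insert(ams, "E", "addr-in-ams")
print(lookup(par, "E"))   # found via eu, without going to the root
print(lookup(sd, "E"))    # has to climb to the root first
insert(sd, "E", "addr-in-sd")
print(lookup(sd, "E"))    # now answered locally by the new replica
```

Both operations exploit locality: a request only climbs as high as the smallest domain that contains both the requester and a known copy of the entity.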


5.3 STRUCTURED NAMING

Flat names are good for machines, but are generally not very convenient for humans to use

As an alternative, naming systems generally support structured names that are composed from simple, human-readable names


5.3.1 Name Spaces

• Name space
  – Name spaces for structured names can be represented as a labeled, directed graph with two types of nodes
  – A leaf node represents a named entity and has the property that it has no outgoing edges
  – A directory node has a number of outgoing edges, each labeled with a name


A general naming graph

• A directory node
  – an associated identifier
  – a directory table of (edge label, node identifier) pairs
• Naming graph
  – Root
  – Path
    • absolute path name
    • relative path name


A general naming graph


Naming fashion

• Global name
  – a name that denotes the same entity, no matter where that name is used in a system
• Local name
  – a name whose interpretation depends on where that name is being used
• Compare with the file system
  – Different paths, same node (hard link)
  – All resources, such as processes, hosts, I/O devices, and network interfaces, are named in the same fashion as traditional files
  – Directed acyclic graph


Compare to the file system

• The index number of an inode corresponds to a node identifier in the naming graph


5.3.2 Name Resolution

• Name resolution
  – The process of looking up a name
• For example, N:<label-1, label-2, ..., label-n>
  – Resolution starts at node N, where the name label-1 is looked up in the directory table, which returns the identifier of the node to which label-1 refers
  – Resolution then continues at the identified node by looking up the name label-2 in its directory table, and so on
  – Resolution stops at the last node referred to by label-n
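Resolving a path name N:<label-1, ..., label-n> amounts to a chain of directory-table lookups. The sketch below assumes a small hypothetical naming graph; the node identifiers n0 to n5 and the labels are made up, and leaf nodes are modeled simply by having no directory table.

```python
# Hypothetical naming graph: node identifier -> directory table,
# where a directory table maps edge labels to node identifiers.
# Nodes without an entry (n3, n4, n5) are leaf nodes.
GRAPH = {
    "n0": {"home": "n1", "keys": "n5"},
    "n1": {"steen": "n2", "max": "n3"},
    "n2": {"mbox": "n4", "keys": "n5"},
}


def resolve(start, path):
    """Resolve start:<path[0], ..., path[-1]> and return the final node id."""
    node = start
    for label in path:
        table = GRAPH.get(node)            # directory table of the current node
        if table is None or label not in table:
            raise KeyError(f"cannot resolve {label!r} at node {node}")
        node = table[label]                # continue at the identified node
    return node


print(resolve("n0", ["home", "steen", "mbox"]))   # n4
print(resolve("n0", ["keys"]))                    # n5
print(resolve("n0", ["home", "steen", "keys"]))   # n5 again: two paths, one node
```

The last two calls show a hard link in this model: two different absolute path names lead to the same node.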


Closure Mechanism

• Closure mechanism
  – Deals with selecting the initial node in a name space from which name resolution is to start
• Closure mechanisms are dependent on the system, necessarily partly implicit, and may be very different when comparing them to each other
  – The inode of the root directory, located through its actual byte offset on disk
  – The telephone number 0031204430784
  – The variable named HOME
• In short, a closure mechanism is a set of implicit information that forms a complete context


Linking and Mounting

• An alias
  – Another name for the same entity
  – There are two different ways to implement an alias
• Hard links
  – Allow multiple absolute path names to refer to the same node in a naming graph
• Symbolic links
  – Represent an entity by a leaf node, say N, but instead of storing the address or state of that entity, the node stores an absolute path name


Symbolic Link


Mount

• Let a directory node store the identifier of a directory node from a different name space
• The directory node storing the node identifier is called a mount point
• Accordingly, the directory node in the foreign name space is called a mounting point
• The principle of mounting can be generalized to other name spaces as well


Mounting in distributed file systems

• Each name space is implemented by a different server, possibly running on a different machine
• To mount a foreign name space in a distributed system requires at least the following information
  – 1. The name of an access protocol.
  – 2. The name of the server.
  – 3. The name of the mounting point in the foreign name space.
• In nondistributed systems, none of the three points may actually be needed


URL (Uniform Resource Locator)

• To mount a foreign name space requires
  – The name of an access protocol
  – The name of the server
  – The name of the mounting point in the foreign name space
• One possibility is to represent the three names listed above as a URL
  – nfs://flits.cs.vu.nl//home/steen
• Name resolution
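As an illustration, the three names can be pulled out of such a URL with Python's standard `urllib.parse` module. The helper name `parse_mount_url` is hypothetical, and the URL is the NFS example from the slide with its extraction damage repaired; nothing is actually mounted.

```python
from urllib.parse import urlsplit


def parse_mount_url(url):
    """Split a mount URL into (access protocol, server, mounting point)."""
    parts = urlsplit(url)
    # The doubled slash after the host leaves the path as "//home/steen";
    # collapse the leading slashes to get an absolute path on the server.
    mounting_point = "/" + parts.path.lstrip("/")
    return parts.scheme, parts.hostname, mounting_point


protocol, server, mounting_point = parse_mount_url("nfs://flits.cs.vu.nl//home/steen")
print(protocol)         # nfs
print(server)           # flits.cs.vu.nl
print(mounting_point)   # /home/steen
```

Each of the three parts still has to be resolved in its own way: the protocol name to an implementation, the server name to an address, and the mounting point within the server's own name space.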


Mounting remote name spaces


cd /remote/vu
ls -l

The beauty of all this is that the user is spared the details of the actual access to the remote server

Ideally, only some loss in performance is noticed

The name space rooted on the local machine, and the one rooted at /home/steen, form a single name space


5.3.3 The Implementation of a Name Space

• Naming service
  – allows users and processes to add, remove, and look up names
  – is implemented by name servers
• A name space forms the heart of a naming service
• Local-area network
  – a single name server
• In large-scale distributed systems
  – a name space is distributed over multiple name servers


Name Space Distribution

• Name spaces are organized hierarchically
• The global layer
  – is formed by highest-level nodes
• The administrational layer
  – is formed by directory nodes that together are managed within a single organization
• The managerial layer
  – consists of nodes that may typically change regularly


The global layer

• The root node and other directory nodes logically close to the root

• Stability: directory tables are rarely changed
• Such nodes may represent organizations, or groups of organizations, for which names are stored in the name space


The administrational layer

• The nodes represent groups of entities that belong to the same organization or administrational unit
  – a directory node for a department
  – a directory node from which all hosts can be found
  – a directory node that may be used as the starting point for naming all users
• The nodes in the administrational layer are relatively stable


The managerial layer

• Nodes representing hosts in the local network belong to this layer

• Nodes representing shared files
• Nodes that represent user-defined directories and files
• The nodes in the managerial layer are maintained not only by system administrators, but also by individual end users of a distributed system


DNS name space

A zone is a part of the name space that is implemented by a separate name server


Availability and performance in the global layer

• High availability is especially critical for name servers in the global layer
• Performance
  – Name servers in the global layer do not have to respond quickly to a single lookup request, because results rarely change and can be cached by clients
  – Throughput may be important, especially in large-scale systems with millions of users
• Replicating servers


Availability and performance in the administrational layer

• Availability
  – Availability for a name server in the administrational layer is primarily important for clients in the same organization
  – It may be less important that resources in an organization are temporarily unreachable for users outside that organization
• With respect to performance, name servers in the administrational layer have similar characteristics as those in the global layer


These requirements can often be met by using high-performance machines to run name servers. In addition, client-side caching should be applied, combined with replication for increased overall availability


Availability and performance in the managerial layer

• Availability requirements for name servers at the managerial level are generally less demanding

• Performance is crucial. Users expect operations to take place immediately. Because updates occur regularly, client-side caching is often less effective


A comparison between name servers for implementing nodes


Implementation of Name Resolution

• The distribution of a name space affects the implementation of name resolution

• Example: ftp://ftp.cs.vu.nl/pub/globe/index.txt
• Iterative name resolution
  – The user process hands over the path <nl, vu, cs, ftp> to the client's name resolver
  – The name resolver returns the FTP server's address to the user process
  – The user process then contacts the FTP server


Similarity between LDAP and DNS

• Directory User Agents (DUA)
• A DUA is similar to a name resolver in structured-naming services


Difference between LDAP and DNS

• LDAP: lookup by a set of criteria that the attributes of the searched entries should meet
  – answer = search("&(C=NL)(O=Vrije Universiteit)(OU=*)(CN=Main server)")
  – Searching in a directory service is generally an expensive operation
  – We need to access several leaf nodes of a DIT to get an answer. In contrast, naming services can often be implemented in such a way that a lookup requires accessing only a single leaf node

• DNS: lookup by name
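The search above can be imitated on an in-memory list of entries to show why it is costly: every entry has to be compared against the criteria. This is not the LDAP protocol or any real LDAP library; the entries, the `Host_Name` attribute, and the keyword-argument form of `search` are assumptions made for illustration.

```python
# Hypothetical directory entries, each a set of (attribute, value) pairs.
ENTRIES = [
    {"C": "NL", "O": "Vrije Universiteit", "OU": "Comp. Sc.",
     "CN": "Main server", "Host_Name": "star"},
    {"C": "NL", "O": "Vrije Universiteit", "OU": "Comp. Sc.",
     "CN": "Main server", "Host_Name": "zephyr"},
    {"C": "NL", "O": "Vrije Universiteit", "OU": "Math.",
     "CN": "Mail server", "Host_Name": "pi"},
]


def search(**criteria):
    """Return all entries whose attributes satisfy every criterion.

    A value of "*" only requires the attribute to be present.
    """
    def matches(entry):
        return all(
            attr in entry and (want == "*" or entry[attr] == want)
            for attr, want in criteria.items()
        )
    return [entry for entry in ENTRIES if matches(entry)]


answer = search(C="NL", O="Vrije Universiteit", OU="*", CN="Main server")
print([entry["Host_Name"] for entry in answer])   # ['star', 'zephyr']
```

A name lookup, by contrast, follows one path to one node, which is why naming services are usually much cheaper to query than directory services.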


A Forest of LDAP domains

• A forest allows several LDAP trees to coexist while also being linked to each other.

• Active Directory (Microsoft) usually assumes there is a global index server (called a global catalog) that can be searched first. The index will indicate which LDAP domains need to be searched further

• Every tree in LDAP needs to be accessible at the root. The root is often known under a DNS name

Page 76:

The principle of iterative name resolution

Page 77:

The principle of recursive name resolution

• A name server passes the result to the next name server it finds

• When the root name server finds the address of the name server implementing the node named nl, it requests that name server to resolve the path name nl:<vu, cs, ftp, pub, globe, index.html>

• That name server resolves the complete path in the same way and eventually returns the file index.html to the root server, which, in turn, passes that file to the client's name resolver
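A minimal sketch of the recursive scheme, under simplifying assumptions: the server names and the address 192.0.2.21 are made up, and the path is resolved only up to the host ftp rather than down to the file. Each server forwards the remainder of the path to the next server and relays the answer back.

```python
# Hypothetical name servers: each maps a label either to the next name
# server or, at the last step, to the address of the named host.
NAME_SERVERS = {
    "root": {"nl": "ns-nl"},
    "ns-nl": {"vu": "ns-vu"},
    "ns-vu": {"cs": "ns-cs"},
    "ns-cs": {"ftp": "192.0.2.21"},  # made-up address of the FTP server
}


def resolve_recursively(server, path, trace):
    """Resolve the first label here, then hand the rest to the next server."""
    trace.append(server)
    head, rest = path[0], path[1:]
    result = NAME_SERVERS[server][head]
    if not rest:
        return result  # final answer, passed back up the chain of callers
    return resolve_recursively(result, rest, trace)


trace = []
# The client's name resolver sends a single request, to the root server.
address = resolve_recursively("root", ["nl", "vu", "cs", "ftp"], trace)
print(address)
print(trace)
```

The same servers are involved as in iterative resolution, but the client's resolver only ever talks to the root server.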

Page 78:

The principle of recursive name resolution

Page 79:

The principle of recursive name resolution

• The first advantage is that caching results is more effective compared to iterative name resolution

• The second advantage is that it is often cheaper with respect to communication

Page 80:

5.4 ATTRIBUTE-BASED NAMING

• Location independence and human friendliness are not the only criteria for naming entities

• Searching for entities effectively requires that a user can provide merely a description of what he is looking for

• Attribute-based naming
– To describe an entity in terms of (attribute, value) pairs

Page 81:

5.4.1 Directory Services

• Attribute-based naming systems are also known as directory services

• Systems that support structured naming are generally called naming systems

• Designing an appropriate set of attributes is not trivial
– In most cases, attribute design has to be done manually
• Setting the values is also a problem

Page 82:

Resource Description Framework

• Resources are described as triplets consisting of a subject, a predicate, and an object
– (Person, name, Alice) describes a resource Person whose name is Alice
• In RDF, each subject, predicate, or object can be a resource itself
• Query on resource description
– Separate techniques need to be applied when the data is distributed across multiple, potentially dispersed computers
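Querying such descriptions can be sketched, for the simple case where all triplets sit on one machine, as pattern matching with wildcards. Only the triplet (Person, name, Alice) comes from the slide; the other triplets and the query helper are assumptions for illustration, and this is not an RDF library.

```python
# (subject, predicate, object) triplets; only the first one is from the slide.
TRIPLES = [
    ("Person", "name", "Alice"),
    ("Person", "email", "alice@example.org"),  # made-up value
    ("Book", "author", "Person"),              # the object is itself a resource
]


def query(subject=None, predicate=None, obj=None):
    """Return all triplets matching the pattern; None acts as a wildcard."""
    pattern = (subject, predicate, obj)
    return [
        triple for triple in TRIPLES
        if all(p is None or p == v for p, v in zip(pattern, triple))
    ]


print(query(predicate="name"))
print(query(obj="Person"))
```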

Page 83:

5.4.2 Hierarchical Implementations: LDAP

• LDAP (Lightweight Directory Access Protocol)
• An LDAP directory service consists of a number of records (directory entries)
• A directory entry
– Is comparable to a resource record in DNS
– Is made up of a collection of (attribute, value) pairs
– Each attribute has an associated type and is either single-valued or multiple-valued (array or list)

Page 84:

An LDAP directory entry

Page 85:

Globally unique name

• The collection of all directory entries in an LDAP directory service is called a directory information base (DIB)

• Each record (in the DIB) is uniquely named
• Such a globally unique name appears as a sequence of naming attributes
– Country, Organization, and Organizational Unit could be used to form the globally unique name
– /C=NL/O=Vrije Universiteit/OU=Comp. Sc.

• Each naming attribute is called a relative distinguished name, or RDN for short
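As a small illustration, a name written in the slash-separated form used above can be split into its RDNs. The helper name is hypothetical, and the sketch assumes that values contain neither "/" nor "=".

```python
def split_into_rdns(name):
    """Split a name such as /C=NL/O=.../OU=... into (attribute, value) RDNs."""
    rdns = []
    for part in name.strip("/").split("/"):
        attribute, _, value = part.partition("=")
        rdns.append((attribute, value))
    return rdns


rdns = split_into_rdns("/C=NL/O=Vrije Universiteit/OU=Comp. Sc.")
print(rdns)
```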

Page 86:

Directory Information Tree (DIT)

• A DIT essentially forms the naming graph of an LDAP directory service

• Each node represents a directory entry
• A node may also act as a directory in the traditional sense
• A node is thus both a directory and a collection of attributes (a directory entry)
• Operations on the node
– read: read a single record given its path name in the DIT
– list: list the names of all outgoing edges of a given node
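The two operations can be sketched on a toy in-memory tree in which every node holds a record and a set of labeled outgoing edges. The class, the helper functions, and the Locality attribute are assumptions for illustration only, not part of LDAP.

```python
class Node:
    """A DIT node: a record (attributes) plus outgoing edges to child nodes."""

    def __init__(self, attributes=None):
        self.attributes = dict(attributes or {})
        self.children = {}  # edge label (an RDN such as "OU=Math.") -> Node


def add(root, path, attributes):
    """Create any missing nodes along the path and store the record at its end."""
    node = root
    for rdn in path:
        node = node.children.setdefault(rdn, Node())
    node.attributes.update(attributes)


def find(root, path):
    node = root
    for rdn in path:
        node = node.children[rdn]
    return node


def read(root, path):
    """read: return the single record stored at the given path name."""
    return find(root, path).attributes


def list_edges(root, path):
    """list: return the names of all outgoing edges of the given node."""
    return sorted(find(root, path).children)


dit = Node()
add(dit, ["C=NL", "O=Vrije Universiteit", "OU=Comp. Sc."], {"Locality": "Amsterdam"})
add(dit, ["C=NL", "O=Vrije Universiteit", "OU=Math."], {})
print(read(dit, ["C=NL", "O=Vrije Universiteit", "OU=Comp. Sc."]))
print(list_edges(dit, ["C=NL", "O=Vrije Universiteit"]))
```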

Page 87:
Page 88:

Implementing --- Compared with DNS

• An LDAP directory service is implemented in much the same way as a naming service such as DNS

• LDAP supports more lookup operations
• When dealing with a large-scale directory,
– The DIT is usually partitioned and distributed across several servers, known as directory service agents (DSA)

– Each part of a partitioned DIT thus corresponds to a zone in DNS

– Each DSA behaves very much the same as a normal name server

Page 89:

5.4.3 Decentralized Implementations

• Decentralized attribute-based naming systems
– Peer to peer

• The key issue here is that (attribute, value) pairs need to be efficiently mapped so that searching can be done efficiently, that is, by avoiding an exhaustive search through the entire attribute space

Page 90:

Mapping to Distributed Hash Tables

• Queries consist of a conjunction of (attribute, value) pairs
– A user specifies a list of attributes, along with the unique value wanted for each attribute
– Each entity is assumed to be described by hierarchically organized attributes

Page 91:

Mapping to Distributed Hash Tables

• The main issue is to transform the attribute-value trees (AVTrees) that describe entities into a collection of keys that can be looked up in a DHT system

• Every path originating in the root is assigned a unique hash value
– h1: hash(type-book)
– h2: hash(type-book-author)
– h3: hash(type-book-author-Tolkien)
– h4: hash(type-book-title)
– h5: hash(type-book-title-LOTR)
– h6: hash(genre-fantasy)

• A node responsible for hash value hi will keep (a reference to) the actual resource
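The key generation can be sketched as a walk over the tree that hashes every path starting at the root. Two assumptions are made here: the AVTree is encoded as nested dictionaries whose top-level labels are the pairs type-book and genre-fantasy, and SHA-1 stands in for whatever hash function the DHT actually uses.

```python
import hashlib

# Assumed encoding of the example AVTree as nested dictionaries.
AVTREE = {
    "type-book": {
        "author": {"Tolkien": {}},
        "title": {"LOTR": {}},
    },
    "genre-fantasy": {},
}


def paths(tree, prefix=()):
    """Yield every path that starts at the root, as a dash-joined string."""
    for label, subtree in tree.items():
        path = prefix + (label,)
        yield "-".join(path)
        yield from paths(subtree, path)


def key(path):
    """Hash a path into a DHT key (SHA-1 is an assumption, not prescribed)."""
    return hashlib.sha1(path.encode("utf-8")).hexdigest()


keys = {path: key(path) for path in paths(AVTREE)}
for path, value in keys.items():
    print(path, value[:8])
```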

Page 92:

Example

• "Return books written by Tolkien"
• This query is translated into an AVTree
• Leading to computing the following three hashes
– h1: hash(type-book)
– h2: hash(type-book-author)
– h3: hash(type-book-author-Tolkien)
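A rough sketch of how such a query could be answered, with a plain dictionary standing in for the DHT. The second book (Dune by Herbert), the register helper, the use of SHA-1, and the choice to intersect the three result sets are all assumptions made for illustration.

```python
import hashlib


def key(path):
    """Hash a path into a DHT key (SHA-1 is an assumption)."""
    return hashlib.sha1(path.encode("utf-8")).hexdigest()


dht = {}  # stand-in for the DHT: key -> set of resource references


def register(resource, resource_paths):
    """Store a reference to the resource under the key of each of its paths."""
    for path in resource_paths:
        dht.setdefault(key(path), set()).add(resource)


register("LOTR", ["type-book", "type-book-author", "type-book-author-Tolkien",
                  "type-book-title", "type-book-title-LOTR", "genre-fantasy"])
# Made-up second resource, so that the query has something to filter out.
register("Dune", ["type-book", "type-book-author", "type-book-author-Herbert",
                  "type-book-title", "type-book-title-Dune", "genre-scifi"])

# "Return books written by Tolkien" leads to three hashes: h1, h2, h3.
query_paths = ["type-book", "type-book-author", "type-book-author-Tolkien"]
results = [dht.get(key(path), set()) for path in query_paths]
matches = set.intersection(*results)
print(matches)
```

The general keys h1 and h2 match every book, so in this sketch it is the most specific key, h3, that narrows the answer down.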

Page 93:

Range specifications for attribute values

• Someone looking for a house will generally want to specify that the price must fall within a specific range