Chapter 5 - Naming

Introduction

- names play an important role to:
  - share resources
  - uniquely identify entities
  - refer to locations
  - etc.
- an important issue is that a name can be resolved to the entity it refers to
- to resolve names, it is necessary to implement a naming system
- in a distributed system, the implementation of a naming system is itself often distributed, unlike in nondistributed systems
- efficiency and scalability of the naming system are the main issues

Objectives of the Chapter

- we will discuss
  - some general issues in naming
  - how human-friendly names are organized and implemented; e.g., those for file systems and the WWW
  - classes of naming systems: flat naming, structured naming, and attribute-based naming

5.1 Names, Identifiers, and Addresses

- a name in a distributed system is a string of bits or characters that is used to refer to an entity
- an entity is anything; e.g., resources such as hosts, printers, disks, files, objects, processes, users, Web pages, newsgroups, mailboxes, network connections, ...
- entities can be operated on
  - e.g., a resource such as a printer offers an interface containing operations for printing a document, requesting the status of a job, etc.
  - a network connection may provide operations for sending and receiving data, setting quality of service parameters, etc.
- to operate on an entity, it is necessary to access it through its access point, which is itself a (special) entity

access point

- the name of an access point is called an address (such as the IP address and port number used by the transport layer)
- the address of the access point of an entity is also referred to as the address of the entity
- an entity can have more than one access point (similar to reaching an individual through different telephone numbers)
- an entity may change its access point in the course of time (e.g., a mobile computer getting a new IP address as it moves)

- an address is a special kind of name: it refers to the access point of at most one entity at a time
- an address is a poor name for the entity itself: an entity may have several addresses (e.g., when it is replicated, as with Web pages), an entity may change its access point, and an access point may be reassigned to a different entity (like telephone numbers in offices)
- separating the name of an entity from its address is easier and more flexible; such a name is called location independent
- there are also other types of names that uniquely identify an entity; a true identifier is a name with the following properties
  - it refers to at most one entity
  - each entity is referred to by at most one identifier
  - it always refers to the same entity (it is never reused)
- identifiers allow us to unambiguously refer to an entity

examples

- name of an FTP server (entity)
  - the URL of the FTP server
- address of the FTP server
  - IP number:port number
- the address of the FTP server may change

there are three classes of naming systems: flat naming, structured naming, and attribute-based naming

5.2 Flat Naming

- a name is a sequence of characters without structure; like human names? maybe, if it is not an Ethiopian name!
- difficult to use in a large system, since it must be centrally controlled to avoid duplication
- moreover, it does not contain any information on how to locate the access point of its associated entity
- how are flat names resolved (i.e., how is an entity located when a flat name is given)?
- name resolution: mapping a name to an address or an address to a name is called name-address resolution
- possible solutions: simple solutions, home-based approaches, and hierarchical approaches

1. Simple Solutions

- two solutions (for LANs only): Broadcasting and Multicasting, and Forwarding Pointers

a. Broadcasting and Multicasting

- broadcast a message containing the identifier of an entity; only machines that can offer an access point for the entity send a reply
- e.g., ARP (Address Resolution Protocol) in the Internet, used to find the data link address (MAC address) of a machine
  - a computer that wants to access another computer whose IP address it knows broadcasts this address
  - the owner responds by sending its Ethernet address
- broadcasting becomes inefficient as the network grows (wasted bandwidth and too many interruptions to other machines)
- multicasting is better when the network grows: send only to a restricted group of hosts
- multicasting can also be used to locate the nearest replica: choose the one whose reply comes in first
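The broadcast idea can be sketched as a toy model in Python. This is only an illustration under stated assumptions: the Host class, the function name, and all IP and MAC values are made up, and no real network traffic is sent.

```python
# Toy model of broadcast-based resolution (ARP-like); hypothetical names and addresses.
class Host:
    def __init__(self, ip, mac):
        self.ip = ip
        self.mac = mac

    def on_broadcast(self, wanted_ip):
        # Only the owner of the requested IP address sends a reply.
        return self.mac if self.ip == wanted_ip else None


def resolve_by_broadcast(hosts, wanted_ip):
    """Deliver the request to every host; return (reply, number of hosts interrupted)."""
    reply = None
    for host in hosts:
        answer = host.on_broadcast(wanted_ip)
        if answer is not None:
            reply = answer
    return reply, len(hosts)


lan = [
    Host("10.0.0.1", "aa:aa:aa:aa:aa:01"),
    Host("10.0.0.2", "aa:aa:aa:aa:aa:02"),
    Host("10.0.0.3", "aa:aa:aa:aa:aa:03"),
]
mac, interrupted = resolve_by_broadcast(lan, "10.0.0.2")
```

Every host on the LAN is interrupted for a single lookup, which is why the approach stops scaling as the network grows.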

b. Forwarding Pointers

- how to look for mobile entities
- when an entity moves from A to B, it leaves behind a reference to its new location
- advantage
  - simple: as soon as the entity's first location has been found using a traditional naming service, the chain of forwarding pointers can be followed to find the current address
- drawbacks
  - the chain can become too long, so locating becomes expensive
  - all the intermediate locations in a chain have to maintain their pointers
  - vulnerability if links are broken
- hence, making sure that chains are short and that forwarding pointers are robust is an important issue
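A minimal Python sketch of following a chain of forwarding pointers. The Location class and the pointer shortcutting step at the end of locate() are assumptions for illustration; shortcutting is one possible way to keep chains short, not something the slides prescribe.

```python
class Location:
    def __init__(self, name):
        self.name = name
        self.forward = None        # set when the entity moves away from here
        self.hosts_entity = False  # True only at the entity's current location


def move(src, dst):
    """The entity moves from src to dst and leaves a forwarding pointer behind."""
    src.hosts_entity = False
    src.forward = dst
    dst.hosts_entity = True


def locate(start):
    """Follow the chain from start; return (current location, number of hops)."""
    node, hops, visited = start, 0, []
    while not node.hosts_entity:
        if node.forward is None:
            raise LookupError("broken chain at " + node.name)
        visited.append(node)
        node = node.forward
        hops += 1
    for old in visited:            # shortcut: point every visited location at the end
        old.forward = node
    return node, hops


a, b, c, d = Location("A"), Location("B"), Location("C"), Location("D")
a.hosts_entity = True
move(a, b)
move(b, c)
move(c, d)
first = locate(a)    # walks A -> B -> C -> D
second = locate(a)   # after shortcutting, A points directly at D
```

The first lookup pays for the whole chain; the second one needs a single hop. A missing pointer anywhere in the chain makes the entity unreachable, which is the vulnerability listed above.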

2. Home-Based Approaches

- broadcasting and multicasting have scalability problems; performance and broken links are problems with forwarding pointers
- a home location keeps track of the current location of an entity; often it is the place where the entity was created
- it is a two-tiered approach
- an example where it is used is Mobile IP
  - each mobile host uses a fixed IP address
  - all communication to that IP address is initially directed to the host’s home agent, located on the LAN corresponding to the network address contained in the mobile host’s IP address
  - whenever the mobile host moves to another network, it requests a temporary address in the new network (called a care-of address) and registers the new address with the home agent

  - when the home agent receives a message for the mobile host (from a correspondent), it forwards the message to the host’s current address (if it has moved) and also informs the sender of the host’s current location for sending subsequent packets

Figure: home-based approach: the principle of Mobile IP
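The two tiers can be sketched in Python as follows. The class names, method names, and both addresses are hypothetical; real Mobile IP involves tunnelling and registration messages that are not modelled here.

```python
class HomeAgent:
    def __init__(self):
        self.care_of = {}   # fixed home address -> current care-of address

    def register(self, home_address, care_of_address):
        self.care_of[home_address] = care_of_address

    def current_location(self, home_address):
        # If the host never moved, it is still reachable at its home address.
        return self.care_of.get(home_address, home_address)


class Correspondent:
    def __init__(self, agent):
        self.agent = agent
        self.known = {}     # home address -> location learned from the home agent

    def send(self, home_address):
        """Return the list of addresses the packet passes through."""
        if home_address in self.known:
            return [self.known[home_address]]        # direct delivery
        current = self.agent.current_location(home_address)
        self.known[home_address] = current
        if current == home_address:
            return [home_address]
        return [home_address, current]               # via the home: triangle routing


agent = HomeAgent()
agent.register("130.37.24.8", "192.0.2.17")          # made-up addresses
sender = Correspondent(agent)
first_route = sender.send("130.37.24.8")
second_route = sender.send("130.37.24.8")
```

Only the first packet takes the detour through the home network; later packets use the location the sender was told about.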

- problems:
  - creates communication latency (triangle routing: correspondent - home network - mobile host)
  - the home location must always exist; the host is unreachable if the home no longer exists (e.g., the host has moved permanently); the solution is to register the home at a traditional name service and let a client first look up the location of the home

3. Hierarchical Approaches

- a generalization of the two-tiered approach into multiple layers
- a network is divided into a collection of domains, similar to DNS
- a single top-level domain spans the entire network
- each domain can be subdivided into multiple, smaller domains
- the lowest-level domain is called a leaf domain; typically a LAN
- each domain D has an associated directory node dir(D) that keeps track of the entities in that domain, leading to a tree of directory nodes
- the root (directory) node knows about all entities

Figure: hierarchical organization of a location service into domains, each having an associated directory node

- each entity located in a domain D is represented by a location record in the directory node dir(D), which keeps track of its whereabouts
- a location record for an entity in a leaf domain contains the entity’s current address; the directory nodes of all higher-level domains store only a pointer to the child directory node on the path to that address; this means the root node stores only pointers, one or more for every entity
- an entity may have multiple addresses, for instance, if it is replicated; the directory node of the smallest domain containing the two subdomains where the entity has addresses will have two pointers

Figure: an example of storing information of an entity having two addresses in different leaf domains D1 and D2

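Figure: looking up a location in a hierarchically organized location service

- example of a lookup operation: a client (in domain D) would like to locate an entity E
  - the client sends the request to the directory node of its own leaf domain D
  - if that node has no location record for E, the request is forwarded to the parent, and so on up the tree until a directory node with a record for E is reached
  - from there, the pointers are followed downward to the leaf node that stores E's address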
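A lookup in such a tree of directory nodes might be sketched as below. The DirNode class, the domain names, and the address string are assumptions, and the sketch keeps a single address per entity for brevity.

```python
class DirNode:
    def __init__(self, name, parent=None):
        self.name = name
        self.parent = parent
        self.records = {}   # entity -> address (leaf) or child DirNode (pointer)


def insert(leaf, entity, address):
    """Store the address in the leaf and a chain of pointers up to the root."""
    leaf.records[entity] = address
    child, node = leaf, leaf.parent
    while node is not None:
        node.records[entity] = child
        child, node = node, node.parent


def lookup(leaf, entity):
    """Go up until a record is found, then follow pointers down to the address."""
    node, path = leaf, [leaf.name]
    while entity not in node.records:
        if node.parent is None:
            raise KeyError(entity)
        node = node.parent
        path.append(node.name)
    record = node.records[entity]
    while isinstance(record, DirNode):
        path.append(record.name)
        record = record.records[entity]
    return record, path


root = DirNode("root")
d1, d2 = DirNode("D1", root), DirNode("D2", root)
leaf1, leaf3 = DirNode("L1", d1), DirNode("L3", d2)
insert(leaf3, "E", "addr-E")
address, path = lookup(leaf1, "E")
```

A client in the same leaf domain as E never leaves that domain, while a distant client climbs only as far as the smallest domain that contains both of them.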

- update operations (i.e., inserting and deleting addresses): read pages 194 - 195
- in addition to the three methods discussed so far (simple solutions, home-based approaches, and hierarchical approaches), another approach for resolution in flat naming is Distributed Hash Tables (DHT): read pages 188 - 191

5.3 Structured Naming

- flat names are not convenient for humans

Name Spaces

- names are organized into a name space
- each name is made of several parts; the first may define the nature of the organization, the second the name, the third departments, ...
- the authority to assign and control the name spaces can be decentralized, with a central authority assigning only the first two parts
- a name space is generally organized as a labeled, directed graph with two types of nodes
  - leaf node: represents the named entity and stores information such as its address or the state of that entity
  - directory node: a special entity that has a number of outgoing edges, each labeled with a name
- each node in a naming graph is considered as another entity with an identifier

Figure: a general naming graph with a single root node, n0

- a directory node stores a table, called a directory table, in which each outgoing edge is represented as a pair (edge label, node identifier)
- each path in a naming graph can be referred to by the sequence of labels corresponding to the edges of the path, together with the first node in the path, such as
  N:<label-1, label-2, ..., label-n>, where N refers to the first node in the path

- such a sequence is called a path name
- if the first node is the root of the naming graph, it is called an absolute path name; otherwise it is a relative path name
- instead of the path name n0:<home, steen, mbox>, we often use its string representation /home/steen/mbox
- there may also be several paths leading to the same node, e.g., node n5 can be referred to as /keys or /home/steen/keys
- although the above naming graph is a directed acyclic graph (a node can have more than one incoming edge, but cycles are not permitted), the common way is to use a tree (hierarchy) with a single root (as is used in file systems)
  - in a tree structure, each node except the root has exactly one incoming edge; the root has no incoming edges
  - each node also has exactly one associated (absolute) path name
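Path name resolution over directory tables can be sketched in a few lines of Python. Nodes n0 and n5 and the labels home, steen, mbox, and keys come from the slides; the other node identifiers (n1, n4, n7) are assumed here only to complete the example.

```python
# Directory tables: node identifier -> {edge label: node identifier}.
directory_tables = {
    "n0": {"home": "n1", "keys": "n5"},
    "n1": {"steen": "n4"},
    "n4": {"mbox": "n7", "keys": "n5"},
}


def resolve(start, labels):
    """Resolve the path name start:<labels...> one edge label at a time."""
    node = start
    for label in labels:
        table = directory_tables.get(node)
        if table is None or label not in table:
            raise KeyError(f"cannot resolve {label!r} at node {node}")
        node = table[label]
    return node


def resolve_absolute(path):
    """Resolve the string form of an absolute path name, starting at root n0."""
    labels = [part for part in path.split("/") if part]
    return resolve("n0", labels)


mbox_node = resolve_absolute("/home/steen/mbox")
keys_via_root = resolve_absolute("/keys")
keys_via_home = resolve_absolute("/home/steen/keys")
relative = resolve("n1", ["steen", "keys"])   # relative path name n1:<steen, keys>
```

Both /keys and /home/steen/keys end at the same node, which is what makes this graph a directed acyclic graph rather than a tree.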

- e.g., file naming in the UNIX file system
  - a directory node represents a directory and a leaf node represents a file
  - there is a single root directory, represented in the naming graph by the root node
  - the implementation uses a contiguous series of blocks from a logical disk
  - the boot block is used to load the operating system
  - the superblock contains information on the entire file system, such as its size, etc.
  - inodes are referred to by an index number, starting at number zero, which is reserved for the inode representing the root directory
  - given the index number of an inode, it is possible to access its associated file

Name Resolution

- given a path name, the process of looking up the information stored in the node it refers to is called name resolution; it consists of finding the address when the name is given (by following the path)
- knowing how and where to start name resolution is referred to as a closure mechanism; e.g., in the UNIX file system, resolution starts at the inode of the root directory

Linking and Mounting

- Linking: giving another name for the same entity (an alias)
  - e.g., environment variables in UNIX such as HOME, which refers to the home directory of a user
- two types of links (or two ways to implement an alias): hard link and symbolic link
- hard link: allows multiple absolute path names to refer to the same node in a naming graph
  - e.g., in the previous graph, there are two different path names for node n5: /keys and /home/steen/keys

Figure: the concept of a symbolic link explained in a naming graph

- symbolic link: the entity is represented by a leaf node, but instead of storing the address or state of the entity, the node stores an absolute path name
- when first resolving an absolute path name that leads to such a node (e.g., /home/steen/keys leading to node n6), name resolution returns the path name stored in the node (/keys), at which point it continues by resolving that new path name, starting again from the closure mechanism
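Resolving through a symbolic link can be sketched as a restart of the resolution with the stored path name. Nodes n5 and n6 and the two path names follow the slide; the remaining node identifiers and the max_links limit are assumptions, and only a link in the last position of a path is handled.

```python
graph = {
    "n0": {"home": "n1", "keys": "n5"},
    "n1": {"steen": "n4"},
    "n4": {"keys": "n6"},
}
symlinks = {"n6": "/keys"}   # leaf node n6 stores a path name instead of an address


def resolve(path, max_links=8):
    """Resolve an absolute path name, following a symbolic link found at its end."""
    for _ in range(max_links + 1):
        node = "n0"                          # closure mechanism: start at the root
        for label in path.strip("/").split("/"):
            node = graph[node][label]
        if node not in symlinks:
            return node
        path = symlinks[node]                # continue with the stored path name
    raise RuntimeError("too many levels of symbolic links")


target = resolve("/home/steen/keys")
```

The limit on the number of links followed is a common safeguard against links that point at each other; it is not part of the slide material.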

- so far name resolution was discussed as taking place within a single name space
- name resolution can also be used to merge different name spaces in a transparent way
- the solution is to use mounting

Mounting

- as an example, consider a mounted file system; the idea can be generalized to other name spaces as well
- let a directory node store the identifier of a directory node from a different (foreign) name space
- the directory node storing the node identifier is called a mount point
- the directory node in the foreign name space is called a mounting point; it is normally the root of a name space
- during name resolution, the mounting point is looked up and resolution proceeds by accessing its directory table

- consider a collection of name spaces distributed across different machines (each name space implemented by a different server)
- to mount a foreign name space in a distributed system, at least the following are required
  - the name of an access protocol (for communication)
  - the name of the server
  - the name of the mounting point in the foreign name space
- each of these names needs to be resolved
  - to the implementation of the protocol, so that communication can take place properly
  - to an address where the server can be reached
  - to a node identifier in the foreign name space (to be resolved by the server of the foreign name space)
- the three names can be listed as a URL

- example: Sun’s Network File System (NFS) is a distributed file system with a protocol that describes how a client can access a file stored on a (remote) NFS file server
- an NFS URL may look like nfs://flits.cs.vu.nl/home/steen
  - nfs is the name of the access protocol
  - flits.cs.vu.nl is a server name to be resolved using DNS
  - /home/steen is resolved by the foreign server
- e.g., the subdirectory /remote includes mount points for foreign name spaces on the client machine
  - a directory node named /remote/vu is used to store nfs://flits.cs.vu.nl/home/steen
  - consider /remote/vu/mbox
  - this name is resolved by starting at the root directory on the client’s machine and continuing until node /remote/vu, which returns the URL nfs://flits.cs.vu.nl/home/steen
  - this leads the client machine to contact flits.cs.vu.nl using the NFS protocol
  - then the file mbox is read in the directory /home/steen
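The resolution of /remote/vu/mbox can be sketched with Python's standard urllib.parse module. The mount_table dictionary and the resolve function are hypothetical helpers; the sketch only splits the names and does not contact any server.

```python
from urllib.parse import urlparse

# Mount point on the client -> URL naming protocol, server, and mounting point.
mount_table = {"/remote/vu": "nfs://flits.cs.vu.nl/home/steen"}


def resolve(path):
    """Return (protocol, server, remote path), or None if the path is local."""
    for mount_point, url in mount_table.items():
        if path == mount_point or path.startswith(mount_point + "/"):
            parts = urlparse(url)
            rest = path[len(mount_point):]   # the part left for the foreign server
            return parts.scheme, parts.netloc, parts.path + rest
    return None


protocol, server, remote_path = resolve("/remote/vu/mbox")
```

The three returned values correspond to the three names that must each be resolved: the protocol to its implementation, the server name through DNS, and the remaining path by the foreign server.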

Figure: mounting remote name spaces through a specific access protocol (labels in the figure: mount point, mounting point)

- distributed systems that allow mounting a remote file system also allow commands to be executed on it
- example commands to access the file system
    cd /remote/vu    # change to the directory on the remote machine
    ls -l            # list the files on the remote machine
- by doing so, the user need not worry about the details of the actual access; the name spaces on the local machine and on the remote machine appear to form a single name space

The Implementation of a Name Space

- a name space forms the heart of a naming service
- a naming service allows users and processes to add, remove, and look up names
- a naming service is implemented by name servers
- for a distributed system on a single LAN, a single server might suffice; for a large-scale distributed system, the implementation of a name space is distributed over multiple name servers

Name Space Distribution

- in large-scale distributed systems, it is necessary to distribute the name service over multiple name servers, usually organized hierarchically
- a name service can be partitioned into logical layers
- the following three layers can be distinguished (according to Cheriton and Mann)

- global layer
  - formed by the highest-level nodes (the root node and nodes close to it, i.e., its children)
  - nodes on this layer are characterized by their stability, i.e., directory tables are rarely changed
  - they may represent organizations, or groups of organizations, whose names are stored in the name space
- administrational layer
  - groups of entities that belong to the same organization or administrational unit, e.g., departments
  - relatively stable
- managerial layer
  - nodes that may change regularly, e.g., nodes representing hosts of a LAN, shared files such as libraries or binaries, …
  - nodes are managed not only by system administrators, but also by end users

Figure: an example partitioning of the DNS name space, including Internet-accessible files, into three layers

- the name space is divided into nonoverlapping parts, called zones in DNS
- a zone is a part of the name space that is implemented by a separate name server
- some requirements of servers at different layers: performance (responsiveness to lookups), availability (failure rate), etc.
  - high availability is critical for the global layer, since name resolution cannot proceed beyond a failing server; it is also important at the administrational layer for clients in the same organization
  - performance is very important in the lowest layer, where requests should be handled immediately; in the higher layers, results of lookups can be cached and reused because of the relative stability of those layers
  - availability and performance may be enhanced by client-side caching (for the global and administrational layers, since names do not change often) and by replication; both create implementation problems since they may introduce inconsistency (see Chapter 7)

Table: a comparison between name servers for implementing nodes from a large-scale name space partitioned into a global layer, an administrational layer, and a managerial layer

Item                             Global      Administrational  Managerial
Geographical scale of network    Worldwide   Organization      Department
Total number of nodes            Few         Many              Vast numbers
Responsiveness to lookups        Seconds     Milliseconds      Immediate
Update propagation               Lazy        Immediate         Immediate
Availability requirement         Very high   High              Low
Number of replicas               Many        None or few       None
Is client-side caching applied?  Yes         Yes               Sometimes

Implementation of Name Resolution

- recall that name resolution consists of finding the address when the name is given
- assume that name servers are not replicated and that no client-side caches are allowed
- each client has access to a local name resolver, responsible for ensuring that the name resolution process is carried out
- e.g., assume the path name root:<nl, vu, cs, ftp, pub, globe, index.txt> is to be resolved; in URL notation, this path name corresponds to ftp://ftp.cs.vu.nl/pub/globe/index.txt

- a host that needs to map a name to an address calls a DNS client named a resolver (and provides it with the name to be resolved, ftp.cs.vu.nl)
- the resolver accesses the closest DNS server with a mapping request
- if the server has the information, it satisfies the resolver; otherwise, it either refers the resolver to other servers (called Iterative Resolution) or asks other servers to provide it with the information (called Recursive Resolution)

Iterative Resolution

- a name resolver hands over the complete name to the root name server
- the root name server resolves the name as far as it can and returns the result to the client; at the minimum it can resolve the first level, and it sends the address of the first-level name server to the client
- the client calls the first-level name server, then the second, ..., until it finds the address of the entity

Figure: the principle of iterative name resolution
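Iterative resolution of <nl, vu, cs, ftp> can be sketched as a loop run by the client. The SERVERS dictionary stands in for four name servers, each knowing only its direct children; the final address is made up.

```python
# Each hypothetical name server resolves exactly one label.
SERVERS = {
    "root": {"nl": ("server", "nl")},
    "nl": {"vu": ("server", "vu")},
    "vu": {"cs": ("server", "cs")},
    "cs": {"ftp": ("address", "192.0.2.10")},   # made-up address
}


def ask(server, labels):
    """One request: the server resolves the first label and returns the rest."""
    kind, value = SERVERS[server][labels[0]]
    return kind, value, labels[1:]


def resolve_iteratively(labels):
    """The client contacts each server in turn; return (address, requests sent)."""
    server, requests, remaining = "root", 0, list(labels)
    while remaining:
        kind, value, remaining = ask(server, remaining)
        requests += 1
        if kind == "address":
            return value, requests
        server = value          # the client itself contacts the next server
    raise LookupError("name does not lead to an entity")


address, requests = resolve_iteratively(["nl", "vu", "cs", "ftp"])
```

All four requests originate at the client, so every one of them crosses the (possibly long) distance between client and name servers.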

Recursive Resolution

- a name resolver hands over the whole name to the root name server
- the root name server tries to resolve the name, and if it cannot, it requests the first-level name server to resolve it and to return the address
- the first level does the same thing recursively

Figure: the principle of recursive name resolution

Advantages and drawbacks

- recursive name resolution puts a higher performance demand on each name server; hence name servers in the global layer support only iterative name resolution
- caching is more effective with recursive name resolution
  - each name server gradually learns the address of each name server responsible for implementing lower-level nodes
  - eventually lookup operations can be handled efficiently

Table: recursive name resolution of <nl, vu, cs, ftp>; name servers cache intermediate results for subsequent lookups

Server for node  Should resolve   Looks up  Passes to child  Receives and caches            Returns to requester
cs               <ftp>            #<ftp>    --               --                             #<ftp>
vu               <cs,ftp>         #<cs>     <ftp>            #<ftp>                         #<cs>, #<cs,ftp>
nl               <vu,cs,ftp>      #<vu>     <cs,ftp>         #<cs>, #<cs,ftp>               #<vu>, #<vu,cs>, #<vu,cs,ftp>
root             <nl,vu,cs,ftp>   #<nl>     <vu,cs,ftp>      #<vu>, #<vu,cs>, #<vu,cs,ftp>  #<nl>, #<nl,vu>, #<nl,vu,cs>, #<nl,vu,cs,ftp>
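A simplified Python sketch of recursive resolution with caching. The NameServer class and the address are assumptions, and unlike the table above the sketch caches only the final answer for each suffix it was asked to resolve, not the addresses of intermediate name servers.

```python
class NameServer:
    def __init__(self, name, children=None, addresses=None):
        self.name = name
        self.children = children or {}     # label -> NameServer one level down
        self.addresses = addresses or {}   # label -> address of a named entity
        self.cache = {}                    # tuple of labels -> resolved address
        self.requests = 0                  # number of requests this server handled

    def resolve(self, labels):
        self.requests += 1
        key = tuple(labels)
        if key not in self.cache:
            head, rest = labels[0], labels[1:]
            if rest:
                # Pass the remainder to the child server and cache its answer.
                self.cache[key] = self.children[head].resolve(rest)
            else:
                self.cache[key] = self.addresses[head]
        return self.cache[key]


cs = NameServer("cs", addresses={"ftp": "192.0.2.10"})   # made-up address
vu = NameServer("vu", children={"cs": cs})
nl = NameServer("nl", children={"vu": vu})
root = NameServer("root", children={"nl": nl})

first = root.resolve(["nl", "vu", "cs", "ftp"])
second = root.resolve(["nl", "vu", "cs", "ftp"])   # answered from root's cache
```

After the first lookup, the second one stops at the root: the lower-level servers are not contacted again, which is why caching pays off more with recursive resolution.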

Figure: the comparison between recursive and iterative name resolution with respect to communication costs; assume the client is in Ethiopia and the name servers are in the Netherlands

- communication costs may be reduced in recursive name resolution

Summary

Method     Advantages
Recursive  less communication cost; caching is more effective
Iterative  less performance demand on name servers

Example - The Domain Name System (DNS)

- one of the largest distributed naming services is the Internet DNS
- it is used for looking up host addresses and mail servers
- hierarchical, defined in an inverted tree structure with the root at the top
- the tree can have at most 128 levels

Label

- each node has a label, a string with a maximum of 63 characters (case insensitive)
- the root label is the null (empty) string
- children of a node must have different labels (to guarantee uniqueness)

Domain Name

- each node has a domain name; it is the path name from the node to the root
- a full domain name is a sequence of labels separated by dots (the last character is a dot)
- domain names are read from the node up to the root
- full domain names must not exceed 255 characters
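The length rules above can be checked with a short Python function. The function name is hypothetical, and only the limits stated on the slide (63 characters per label, 255 per full name, trailing dot, case insensitivity) are checked, not the full set of DNS syntax rules.

```python
def check_domain_name(name):
    """Return the lowercased labels of a full domain name, or raise ValueError."""
    if len(name) > 255:
        raise ValueError("domain name longer than 255 characters")
    if not name.endswith("."):
        raise ValueError("a full domain name ends with a dot")
    if name == ".":
        return []                       # the root: its label is the empty string
    labels = name[:-1].split(".")
    for label in labels:
        if not 1 <= len(label) <= 63:
            raise ValueError(f"label {label!r} must have 1 to 63 characters")
    return [label.lower() for label in labels]   # labels are case insensitive


labels = check_domain_name("ftp.CS.vu.nl.")
path_from_root = list(reversed(labels))
```

Reversing the labels gives the order in which the name is resolved from the root, matching the path name <nl, vu, cs, ftp> used earlier.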

- the contents of a node are formed by a collection of resource records; the important ones are the following

Type of record            Associated entity  Description
SOA (start of authority)  Zone               Holds information on the represented zone, such as an e-mail address of the system administrator
A (address)               Host               Contains an IP address of the host this node represents
MX (mail exchange)        Domain             Refers to a mail server to handle mail addressed to this node; it is a symbolic link; e.g., the name of a mail server
SRV                       Domain             Refers to a server handling a specific service
NS (name server)          Zone               Refers to a name server that implements the represented zone
CNAME                     Node               Contains the canonical name of a host; an alias
PTR (pointer)             Host               Symbolic link with the primary name of the represented node; for mapping an IP address to a name
HINFO (host info)         Host               Holds information on the host this node represents, such as machine type and OS
TXT                       Any kind           Contains any entity-specific information considered useful; cannot be automatically processed

Figure: an excerpt from the DNS database for the zone cs.vu.nl

- cs.vu.nl represents the domain as well as the zone; it has 4 name servers (ns, star, top, solo) and 3 mail servers
- star is a name server for this zone with 2 network addresses
- mail servers: the numbers preceding the names show priorities; the one with the lowest number is tried first
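The priority rule for mail servers can be sketched as a sort over (priority, name) pairs. The server names and priorities below are placeholders, not the records from the figure, and is_reachable stands in for an actual connection attempt.

```python
def order_mail_servers(mx_records):
    """Sort (priority, server name) pairs so the lowest number comes first."""
    return [name for priority, name in sorted(mx_records)]


def pick_mail_server(mx_records, is_reachable):
    """Try the servers in priority order; return the first reachable one."""
    for server in order_mail_servers(mx_records):
        if is_reachable(server):
            return server
    return None


# Placeholder records; the real zone data is in the figure, not reproduced here.
mx_records = [(2, "mail-b.example."), (1, "mail-a.example."), (3, "mail-c.example.")]
order = order_mail_servers(mx_records)
fallback = pick_mail_server(mx_records, lambda server: server != "mail-a.example.")
```

If the preferred server is down, delivery falls back to the server with the next lowest number.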

Figure: an excerpt from the DNS database for the zone cs.vu.nl, cont’d

- a Web server and an FTP server, implemented by a single machine (soling)
- older server clusters (vucs-das1)
- two printers (inkt and pen) with a local address; i.e., they cannot be accessed from outside

Figure: part of the description for the vu.nl domain, which contains the cs.vu.nl domain

- cs.vu.nl is implemented as a single zone
- hence, the records in the previous slides do not include references to other zones
- nodes in a subdomain that are implemented in a different zone are specified by giving the domain name and IP address

5.4 Attribute-Based Naming

- flat naming: provides a unique and location-independent way of referring to entities
- structured naming: also provides a unique and location-independent way of referring to entities, as well as human-friendly names
- but neither allows searching for entities by giving a description of an entity
- in attribute-based naming, each entity is assumed to have a collection of attributes that say something about the entity
- a user can then search for an entity by specifying (attribute, value) pairs; this is known as attribute-based naming

Directory Services

- attribute-based naming systems are also called directory services, whereas systems that support structured naming are called naming systems

- how are resources described? one possibility is to use RDF (Resource Description Framework), which uses triples consisting of a subject, a predicate, and an object
  - e.g., (Person, name, Alice) to describe a resource Person whose name is Alice
  - or in e-mail systems, we can use sender, recipient, subject, etc. for searching
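Searching a set of (subject, predicate, object) triples can be sketched in plain Python. The subject identifiers, the e-mail address, and the match function are invented for illustration; this is not an RDF library.

```python
# Hypothetical triples in the spirit of (Person, name, Alice).
triples = [
    ("person1", "name", "Alice"),
    ("person1", "email", "alice@example.org"),
    ("person2", "name", "Bob"),
]


def match(triples, subject=None, predicate=None, obj=None):
    """Return the triples that agree with every given field; None is a wildcard."""
    return [
        t for t in triples
        if (subject is None or t[0] == subject)
        and (predicate is None or t[1] == predicate)
        and (obj is None or t[2] == obj)
    ]


named_alice = match(triples, predicate="name", obj="Alice")
about_person1 = match(triples, subject="person1")
```

The query describes the entity by what is known about it rather than by its name, which is the point of attribute-based naming.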

Hierarchical Implementations: LDAP

- distributed directory services are implemented by combining structured naming with attribute-based naming
- e.g., Microsoft’s Active Directory service
- such systems rely on the Lightweight Directory Access Protocol (LDAP), which is derived from OSI’s X.500 directory service
- an LDAP directory service consists of a number of records called directory entries; each entry is made up of (attribute, value) pairs, similar to a resource record in DNS; an attribute can be single- or multiple-valued (e.g., Mail_Servers on the next slide)

Table: a simple example of an LDAP directory entry using LDAP naming conventions to identify the network addresses of some servers

Attribute           Abbr.  Value
Country             C      NL
Locality            L      Amsterdam
Organization        O      Vrije Universiteit
OrganizationalUnit  OU     Comp. Sc.
CommonName          CN     Main server
Mail_Servers        --     137.37.20.3, 130.37.24.6, 137.37.20.10
FTP_Server          --     130.37.20.20
WWW_Server          --     130.37.20.20

- the collection of all directory entries is called a Directory Information Base (DIB)
- each record is uniquely named so that it can be looked up
- each naming attribute is called a Relative Distinguished Name (RDN); these are the first 5 attributes of the entry above
- a globally unique name is formed using abbreviations of naming attributes, e.g., /C=NL/O=Vrije Universiteit/OU=Comp. Sc.
- this is similar to the DNS name nl.vu.cs
- listing RDNs in sequence leads to a hierarchy of the collection of directory entries, called a Directory Information Tree (DIT)
- a DIT forms the naming graph of an LDAP directory service, where each node represents a directory entry
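How a sequence of RDNs yields both a globally unique name and a position in the tree can be sketched as follows. The two helper functions and the nested-dictionary representation of the DIT are assumptions made for this example, not LDAP API calls.

```python
def distinguished_name(rdns):
    """Join (abbreviation, value) pairs into a name such as /C=NL/O=..."""
    return "".join(f"/{attr}={value}" for attr, value in rdns)


def add_to_tree(tree, rdns):
    """Insert a path of RDNs into a nested-dict DIT; return the node reached."""
    node = tree
    for rdn in rdns:
        node = node.setdefault(rdn, {})
    return node


rdns = [("C", "NL"), ("O", "Vrije Universiteit"), ("OU", "Comp. Sc.")]
name = distinguished_name(rdns)

dit = {}
add_to_tree(dit, rdns + [("CN", "Main server")])
add_to_tree(dit, rdns + [("CN", "Another server")])   # hypothetical sibling entry
siblings = dit[("C", "NL")][("O", "Vrije Universiteit")][("OU", "Comp. Sc.")]
```

Entries that share a prefix of RDNs end up under the same node, which is how the flat collection of entries becomes a tree.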

Figure: part of the directory information tree

- node N corresponds to the directory entry shown earlier; it also acts as a parent of other directory entries that have an additional attribute, Host_Name; such entries may be used to represent hosts

Table: two directory entries having Host_Name as RDN

Attribute           Value               Attribute           Value
Country             NL                  Country             NL
Locality            Amsterdam           Locality            Amsterdam
Organization        Vrije Universiteit  Organization        Vrije Universiteit
OrganizationalUnit  Comp. Sc.           OrganizationalUnit  Comp. Sc.
CommonName          Main server         CommonName          Main server
Host_Name           star                Host_Name           zephyr
Host_Address        192.31.231.42       Host_Address        137.37.20.10
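A lookup over these two entries by (attribute, value) pairs can be sketched in Python. The entries are copied from the table above; the search function is a hypothetical stand-in for a directory search operation, not LDAP filter syntax.

```python
common = {
    "Country": "NL",
    "Locality": "Amsterdam",
    "Organization": "Vrije Universiteit",
    "OrganizationalUnit": "Comp. Sc.",
    "CommonName": "Main server",
}
entries = [
    {**common, "Host_Name": "star", "Host_Address": "192.31.231.42"},
    {**common, "Host_Name": "zephyr", "Host_Address": "137.37.20.10"},
]


def search(entries, **wanted):
    """Return the entries whose attributes match all given (attribute, value) pairs."""
    return [
        entry for entry in entries
        if all(entry.get(attr) == value for attr, value in wanted.items())
    ]


star = search(entries, Host_Name="star")
main_servers = search(entries, Country="NL", CommonName="Main server")
```

Searching on the shared attributes returns both hosts, while adding Host_Name narrows the result to a single entry.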

read pages 221 - 226 about Decentralized Implementations
