Chapter 5 naming

Click here to load reader

Embed Size (px)

description

jhlkj;l; jhkjk

Transcript of Chapter 5 naming

  • Chapter 5 - Naming
  • Introduction names play an important role to: share resources uniquely identify entities refer to locations etc. an important issue is that a name can be resolved to the entity it refers to to resolve names, it is necessary to implement a naming system in a distributed system, the implementation of a naming system is itself often distributed, unlike in nondistributed systems efficiency and scalability of the naming system are the main issues 2
  • Objectives of the Chapter we will discuss some general issues in naming how human-friendly names are organized and implemented; e.g., those for file systems and the WWW; classes of naming systems flat naming structured naming, and attribute-based naming 3
  • 5.1 Names, Identifiers, and Addresses a name in a distributed system is a string of bits or characters that is used to refer to an entity an entity is anything; e.g., resources such as hosts, printers, disks, files, objects, processes, users, Web pages, newsgroups, mailboxes, network connections, ... entities can be operated on e.g., a resource such as a printer offers an interface containing operations for printing a document, requesting the status of a job, etc. a network connection may provide operations for sending and receiving data, setting quality of service parameters, etc. to operate on an entity, it is necessary to access it through its access point, itself an entity (special) 4
  • access point the name of an access point is called an address (such as IP address and port number as used by the transport layer) the address of the access point of an entity is also referred to as the address of the entity an entity can have more than one access point (similar to accessing an individual through different telephone numbers) an entity may change its access point in the course of time (e.g., a mobile computer getting a new IP address as it moves) 5
  • an address is a special kind of name it refers to at most one entity each entity is referred by at most one address; even when replicated such as in Web pages an entity may change an access point, or an access point may be reassigned to a different entity (like telephone numbers in offices) separating the name of an entity and its address makes it easier and more flexible; such a name is called location independent there are also other types of names that uniquely identify an entity; in any case a true identifier is a name with the following properties it refers to at most one entity each entity is referred by at most one identifier it always refers to the same entity (never reused) identifiers allow us to unambiguously refer to an entity 6
  • examples name of an FTP server (entity) URL of the FTP server address of the FTP server IP number:port number the address of the FTP server may change there are three classes on naming systems: flat naming, structured naming, and attribute-based naming 7
  • 5.2 Flat Naming a name is a sequence of characters without structure; like human names? may be if it is not an Ethiopian name! difficult to be used in a large system since it must be centrally controlled to avoid duplication moreover, it does not contain any information on how to locate the access point of its associated entity how are flat names resolved (or how to locate an entity when a flat name is given) name resolution: mapping a name to an address or an address to a name is called name-address resolution possible solutions: simple solutions, home-based approaches, and hierarchical approaches 8
  • 1. Simple Solutions two solutions (for LANs only): Broadcasting and Multicasting, and Forwarding Pointers a. Broadcasting and Multicasting broadcast a message containing the identifier of an entity; only machines that can offer an access point for the entity send a reply e.g., ARP (Address Resolution Protocol) in the Internet to find the data link address (MAC address) of a machine a computer that wants to access another computer for which it knows its IP address broadcasts this address the owner responds by sending its Ethernet address broadcasting is inefficient when the network grows (wastage of bandwidth and too much interruption to other machines) multicasting is better when the network grows - send only to a restricted group of hosts 9
  • multicasting can also be used to locate the nearest replica - choose the one whose reply comes in first b. Forwarding Pointers how to look for mobile entities when an entity moves from A to B, it leaves behind a reference to its new location advantage simple: as soon as the first name is located using traditional naming service, the chain of forwarding pointers can be used to find the current address drawbacks the chain can be too long - locating becomes expensive all the intermediary locations in a chain have to maintain their pointers vulnerability if links are broken hence, making sure that chains are short and that forwarding pointers are robust is an important issue 10
  • 2. Home-Based Approaches broadcasting and multicasting have scalability problems; performance and broken links are problems in forwarding pointers a home location keeps track of the current location of an entity; often it is the place where an entity was created it is a two-tiered approach an example where it is used in Mobile IP each mobile host uses a fixed IP address all communication to that IP address is initially directly sent to the hosts home agent located on the LAN corresponding to the network address contained in the mobile hosts IP address whenever the mobile host moves to another network, it requests a temporary address in the new network (called care-of-address) and informs the new address to the home agent 11
  • when the home agent receives a message for the mobile host (from a correspondent agent) it forwards it to its new address (if it has moved) and also informs the sender the hosts current location for sending other packets home-based approach: the principle of Mobile IP 12
  • problems: creates communication latency (Triangle routing: correspondent-home network-mobile) the home location must always exist; the host is unreachable if the home does no more exist (permanently changed); the solution is to register the home at a traditional name service and let a client first look up the location of the home 13
  • 3. Hierarchical Approaches a generalization of the two-tiered approach into multiple layers a network is divided into a collection of domains, similar to DNS a single top-level domain spans the entire network each domain can be subdivided into multiple, smaller domains the lowest-level domain is called a leaf domain; typically a LAN each domain D has an associated directory node dir(D) that keeps track of the entities in that domain leading to a tree of directory nodes the root (directory) node knows about all entities 14
  • hierarchical organization of a location service into domains, each having an associated directory node 15
  • each entity is represented by a location record in the directory node dir(D) to keep track of its whereabouts a location record for an entity in a leaf domain contains the entitys current address; all other high-level domains will have only pointers to this address; this means the root node will store only pointers to all entities an entity may have multiple addresses, for instance, if it is replicated; a higher level domain containing the two subdomains where the entity has addresses will have two pointers 16
  • an example of storing information of an entity having two addresses in different leaf domains D1 and D2 17
  • example of a look up operation a client (in Domain D) would like to locate an entity E looking up a location in a hierarchically organized location service 18
  • update operations (i.e., inserting and deleting addresses) read pages 194 - 195) in addition to the three methods discussed so far (simple solutions, home-based approaches, and hierarchical approaches), another approach for resolution in flat naming is Distributed Hash Tables (DHT) read pages 188 - 191 19
  • 5.3 Structured Naming flat names are not convenient for humans Name Spaces names are organized into a name space each name is made of several parts; the first may define the nature of the organization, the second the name, the third departments, ... the authority to assign and control the name spaces can be decentralized where a central authority assigns only the first two parts a name space is generally organized as a labeled, directed graph with two types of nodes leaf node: represents the named entity and stores information such as its address or the state of that entity directory node: a special entity that has a number of outgoing edges, each labeled with a name each node in a naming graph is considered as another entity with an identifier 20
  • a general naming graph with a single root node, no a directory node stores a table in which an outgoing edge is represented as a pair (edge label, node identifier), called a directory table each path in a naming graph can be referred to by the sequence of labels corresponding to the edges of the path and the first node in the path, such as N:, where N refers to the first node in the path 21
  • such a sequence is called a path name if the first node is the root of the naming graph, it is called an absolute path name; otherwise it is a relative path name instead of the path name n0:, we often use its string representation /home/steen/mbox there may also be several paths leading to the same node, e.g., node n5 can be represented as /keys or /home/steen/keys although the above naming graph is directed acyclic graph (a node can have more than one incoming edge but is not permitted to have a cycle), the common way is to use a tree (hierarchical) with a single root (as is used in file systems) in a tree structure, each node except the root has exactly one incoming edge; the root has no incoming edges each node also has exactly one associated (absolute) path name 22
  • e.g., file naming in UNIX file system a directory node represents a directory and a leaf node represents a file there is a single root directory, represented in the naming graph by the root node we have a contiguous series of blocks from a logical disk the boot block is used to load the operating system the superblock contains information on the entire file system such as its size, etc. inodes are referred to by an index number, starting at number zero, which is for the inode representing the root directory given the index number of an inode, it is possible to access its associated file 23
  • Name Resolution given a path name, the process of looking up a name stored in the node is referred to as name resolution; it consists of finding the address when the name is given (by following the path) knowing how and where to start name resolution is referred to as closure mechanism; e.g., UNIX file system Linking and Mounting Linking: giving another name for the same entity (an alias) e.g., environment variables in UNIX such as HOME that refer to the home directory of a user two types of links (or two ways to implement an alias): hard link and symbolic link hard link: to allow multiple absolute path names to refer to the same node in a naming graph e.g., in the previous graph, there are two different path names for node n5: /keys and /home/steen/keys 24
  • symbolic link: representing an entity by a leaf node and instead of storing the address or state of the entity, the node stores an absolute path name the concept of a symbolic link explained in a naming graph when first resolving an absolute path name stored in a node (e.g., /home/steen/keys in node n6), name resolution will return the path name stored in the node (/keys), at which point it can continue with resolving that new path name, i.e., closure mechanism 25
  • so far name resolution was discussed as taking place within a single name space name resolution can also be used to merge different name spaces in a transparent way the solution is to use mounting Mounting as an example, consider a mounted file system, which can be generalized to other name spaces as well let a directory node store the directory node from a different (foreign) name space the directory node storing the node identifier is called a mount point the directory node in the foreign name space is called a mounting point, normally the root of a name space during name resolution, the mounting point is looked up and resolution proceeds by accessing its directory table 26
  • consider a collection of name spaces distributed across different machines (each name space implemented by a different server) to mount a foreign name space in a distributed system, the following are at least required the name of an access protocol (for communication) the name of the server the name of the mounting point in the foreign name space each of these names needs to be resolved to the implementation of the protocol so that communication can take place properly to an address where the server can be reached to a node identifier in the foreign name space (to be resolved by the server of the foreign name space) the three names can be listed as a URL 27
  • example: Suns Network File System (NFS) is a distributed file system with a protocol that describes how a client can access a file stored on a (remote) NFS file server an NFS URL may look like nfs://flits.cs.vu.nl/home/steen - nfs is an implementation of a protocol - flits.cs.vu.nl is a server name to be resolved using DNS - /home/steen is resolved by the foreign server e.g., the subdirectory /remote includes mount points for foreign name spaces on the client machine a directory node named /remote/vu is used to store nfs://flits.cs.vu.nl/home/steen consider /remote/vu/mbox this name is resolved by starting at the root directory on the clients machine until node /remote/vu, which returns the URL nfs://flits.cs.vu.nl/home/steen this leads the client machine to contact flits.cs.vu.nl using the NFS protocol then the file mbox is read in the directory /home/steen 28
  • mount point mounting point mounting remote name spaces through a specific process protocol 29
  • distributed systems that allow mounting a remote file system also allow to execute some commands example commands to access the file system cd /remote/vu /*changing directory on the remote machine ls -l /*listing the files on the remote machine by doing so the user is not supposed to worry about the details of the actual access; the name space on the local machine and that on the remote machine look to form a single name space 30
  • The Implementation of a Name Space a name space forms the heart of a naming service a naming service allows users and processes to add, remove, and lookup names a naming service is implemented by name servers for a distributed system on a single LAN, a single server might suffice; for a large-scale distributed system the implementation of a name space is distributed over multiple name servers Name Space Distribution in large scale distributed systems, it is necessary to distribute the name service over multiple name servers, usually organized hierarchically a name service can be partitioned into logical layers the following three layers can be distinguished (according to Cheriton and Mann) 31
  • global layer formed by highest level nodes (root node and nodes close to it or its children) nodes on this layer are characterized by their stability, i.e., directory tables are rarely changed they may represent organizations, groups of organizations, ..., where names are stored in the name space administrational layer groups of entities that belong to the same organization or administrational unit, e.g., departments relatively stable managerial layer nodes that may change regularly, e.g., nodes representing hosts of a LAN, shared files such as libraries or binaries, nodes are managed not only by system administrators, but also by end users 32
  • an example partitioning of the DNS name space, including Internetaccessible files, into three layers 33
  • the name space is divided into nonoverlapping parts, called zones in DNS a zone is a part of the name space that is implemented by a separate name server some requirements of servers at different layers: performance (responsiveness to lookups), availability (failure rate), etc. high availability is critical for the global layer, since name resolution cannot proceed beyond the failing server; it is also important at the administrational layer for clients in the same organization performance is very important in the lowest layer, since results of lookups can be cached and used due to the relative stability of the higher layers they may be enhanced by client side caching (for global and administrational layers since names do not change often) and replication; they create implementation problems since they may introduce inconsistency (see Chapter 7) 34
  • Item Global Administrational Managerial Geographical scale of network Worldwide Organization Department Total number of nodes Few Many Vast numbers Responsiveness to lookups Seconds Milliseconds Immediate Update propagation Lazy Immediate Immediate Availability requirement Very High High low Number of replicas Many None or few None Yes Sometimes Is client-side caching applied? Yes a comparison between name servers for implementing nodes from a largescale name space partitioned into a global layer, an administrational layer, and a managerial layer 35
  • Implementation of Name Resolution recall that name resolution consists of finding the address when the name is given assume that name servers are not replicated and that no client-side caches are allowed each client has access to a local name resolver, responsible for ensuring that the name resolution process is carried out e.g., assume the path name root: is to be resolved or using a URL notation, this path name would correspond to ftp://ftp.cs.vu.nl/pub/globe/index.txt 36
  • a host that needs to map a name to an address calls a DNS client named a resolver (and provides it the name to be resolved - ftp.cs.vu.nl) the resolver accesses the closest DNS server with a mapping request if the server has the information it satisfies the resolver; otherwise, it either refers the resolver to other servers (called Iterative Resolution) or asks other servers to provide it with the information (called Recursive Resolution) Iterative Resolution a name resolver hands over the complete name to the root name server the root name server will resolve the name as far as it can and return the result to the client; at the minimum it can resolve the first level and sends the name of the first level name server to the client the client calls the first level name server, then the second, ..., until it finds the address of the entity 37
  • the principle of iterative name resolution 38
  • Recursive Resolution a name resolver hands over the whole name to the root name server the root name server will try to resolve the name and if it cant, it requests the first level name server to resolve it and to return the address the first level will do the same thing recursively the principle of recursive name resolution 39
  • Advantages and drawbacks recursive name resolution puts a higher performance demand on each name server; hence name servers in the global layer support only iterative name resolution caching is more effective with recursive name resolution each name server gradually learns the address of each name server responsible for implementing lower-level nodes eventually lookup operations can be handled efficiently 40
  • Server for node Should resolve Looks up Passes to child Receives and caches Returns to requester cs vu # # - -# nl # # # root # # # # # # # # # # # # # # recursive name resolution of ; name servers cache intermediate results for subsequent lookups 41
  • communication costs may be reduced in recursive name resolution the comparison between recursive and iterative name resolution with respect to communication costs; assume the client is in Ethiopia and the name servers in the Netherlands Summary Method Advantages Recursive Less Communication cost; Caching is more effective Iterative Less performance demand on name servers 42
  • Example - The Domain Name System (DNS) one of the largest distributed naming services is the Internet DNS it is used for looking up host addresses and mail servers hierarchical, defined in an inverted tree structure with the root at the top the tree can have only 128 levels 43
  • Label each node has a label, a string with a maximum of 63 characters (case insensitive) the root label is null (has no label) children of a node must have different names (to guarantee uniqueness) Domain Name each node has a domain name; it is a path name to its root node a full domain name is a sequence of labels separated by dots (the last character is a dot) domain names are read from the node up to the root full path names must not exceed 255 characters 44
  • the contents of a node is formed by a collection of resource records; the important ones are the following Type of record Associated Description entity SOA (start of authority) Zone Holds information on the represented zone, such as an e-mail address of the system administrator A (address) Host Contains an IP address of the host this node represents MX (mail exchange) Domain Refers to a mail server to handle mail addressed to this node; it is a symbolic link; e.g. name of a mail server SRV Domain Refers to a server handling a specific service NS (name server) Zone Refers to a name server that implements the represented zone CNAME Node Contains the canonical name of a host; an alias PTR (pointer) Host Symbolic link with the primary name of the represented node; for mapping an IP address to a name HINFO (host info) Host Holds information on the host this node represents; such as machine type and OS TXT Any kind Contains any entity-specific information considered useful; cannot be automatically processed 45
  • cs.vu.nl represents the domain as well as the zone; it has 4 name servers (ns, star, top, solo) and 3 mail servers name server for this zone with 2 network addresses (star) mail servers; the numbers preceding the name show priorities; first the one with the lowest number is tried an excerpt from the DNS database for the zone cs.vu.nl 46
  • a Web server and an FTP server, implemented by a single machine (soling) older server clusters (vucs-das1) two printers (inkt and pen) with a local address; i.e., they cannot be accessed from outside an excerpt from the DNS database for the zone cs.vu.nl, contd 47
  • cs.vu.nl is implemented as a single zone hence, the records in the previous slides do not include references to other zones nodes in a subdomain that are implemented in a different zone are specified by giving the domain name and IP address part of the description for the vu.nl domain which contains the cs.vu.nl domain 48
  • 5.4 Attribute-Based Naming flat naming: provides a unique and location-independent way of referring to entities structured naming: also provides a unique and locationindependent way of referring to entities as well as humanfriendly names but both do not allow searching entities by giving a description of an entity in attribute-based naming, each entity is assumed to have a collection of attributes that say something about the entity then a user can search an entity by specifying (attribute, value) pairs known as attribute-based naming Directory Services attribute-based naming systems are also called directory services whereas systems that support structured naming are called naming systems 49
  • how are resources described? one possibility is to use RDF (Resource Description Framework) that uses triplets consisting of a subject, a predicate, and an object e.g., (person, name, Alice) to describe a resource Person whose Name is Alice or in e-mail systems, we can use sender, recipient, subject, etc. for searching Hierarchical Implementations: LDAP distributed directory services are implemented by combining structured naming with attribute-based naming e.g., Microsofts Active Directory service such systems rely on the lightweight directory access protocol or LDAP which is derived from OSIs X.500 directory service a LDAP directory service consists of a number of records called directory entries (attribute, value) pairs, similar to a resource record in DNS; could be single- or multiple-valued (e.g., Mail_Servers on next slide) 50
  • Attribute Abbr. Value Country C NL Locality L Amsterdam Organization O Vrije Universiteit OrganizationalUnit OU Comp. Sc. CommonName CN Main server Mail_Servers -- 137.37.20.3, 130.37.24.6,137.37.20.10 FTP_Server -- 130.37.20.20 WWW_Server -- 130.37.20.20 a simple example of an LDAP directory entry using LDAP naming conventions to identify the network addresses of some servers 51
  • the collection of all directory entries is called a Directory Information Base (DIB) each record is uniquely named so that it can be looked up each naming attribute is called a Relative Distinguished Name (RDN); the first 5 entries above a globally unique name is formed using abbreviations of naming attributes, e.g., /C=NL/O=Vrije Universiteit/OU=Comp. Sc. this is similar to the DNS name nl.vu.cs listing RDNs in sequence leads to a hierarchy of the collection of directory entries, called a Directory Information Tree (DIT) a DIT forms the naming graph of an LDAP directory service where each node represents a directory entry 52
  • node N corresponds to the directory entry shown earlier; it also acts as a parent of other directory entries that have an additional attribute, Host_Name; such entries may be used to represent hosts part of the directory information tree 53
  • Attribute Value Attribute Value Country NL Country NL Locality Amsterdam Locality Amsterdam Organization Vrije Universiteit Organization Vrije Universiteit OrganizationalUnit Comp. Sc. OrganizationalUnit Comp. Sc. CommonName Main server CommonName Main server Host_Name star Host_Name zephyr Host_Address 192.31.231.42 Host_Address 137.37.20.10 two directory entries having Host_Name as RDN read pages 221 - 226 about Decentralized Implementations 54