
VPFS 2003 DCNDS Group 3

Page 1 of 93

Executive Summary

Introduction This document specifies the characteristics of the VPFS (Virtual Policy-driven File Store) system, a distributed file system for a Grid scenario that provides a reliable file store and retrieval facility. It provides location transparency: it hides from users the actual location of their files by presenting multiple physically distributed storage devices as a single local peer. It utilises Sun Microsystems' JXTA architecture and is therefore platform independent and easily extensible. Existing nodes can search for and discover new ones that have dynamically entered the network. All messages and configuration data are manipulated and stored as XML structures. The system has been divided into five major components, each of which performs a particular required service and may run on any of the participating peers:

a. Communication module: handles the low-level communication of the peers via peer-to-peer messages or broadcasted advertisements.

b. File Manipulation module: handles the transfer and storage of files within the VPFS network.

c. VFAT module: manages the details of individual files, their storage location and provides mapping from file path name to physical storage location.

d. User Profile module: handles the manipulation and storage of the user profiles.

e. VPFS Access module: handles the co-ordination between the modules of different peers.

One of the fundamental features of the design of VPFS is that it enables users to specify the characteristics of the storage used for particular files, giving them more control over the way their files are stored. These characteristics are specified using well-defined storage policies, on the basis of which the system selects the locations at which to store the files. Due to time constraints, only one policy, namely replication, has been implemented, but others can fairly easily be added as the framework already exists. One serious drawback of the system is that stored files have to be retrieved locally before being manipulated; no method for remote access has been implemented, although the client required one. The system therefore works like an FTP application, which makes it behave more like an archive than a file store.
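For illustration, since all configuration data in VPFS is stored as XML, a per-file replication policy of this kind might be expressed along the following lines. The element and attribute names shown here are hypothetical, intended only to convey the idea, and do not reproduce the actual VPFS schema described in the design chapters:

```xml
<!-- Hypothetical sketch of a per-file storage policy -->
<StoragePolicy file="/results/output1.dat">
  <Policy name="replication">
    <!-- number of peers that should each hold a full copy of the file -->
    <Copies>3</Copies>
  </Policy>
</StoragePolicy>
```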

Project Methodology The project has been approached in two main phases:

• Analysis of Requirements: During regular meetings with the supervisor, the requirements of the end-system were analysed, documented and agreed upon.

• Design/Implementation/Testing: Based on the requirements agreed during the previous phase, the team followed an iterative approach of design, implementation and testing of the individual system components. At the end of each iteration the new component would be integrated into the system, which would then be tested as a whole.


Recommendations and Conclusions As discussed in the document, the end-product is a system that can be used as a standalone tool and that, with the implementation of an API, can also interface with existing applications. It is well suited to collaborative environments with little central administration, where peers dynamically enter and exit. Its drawback of offering a service similar to FTP can be overcome by implementing the read/write API that has been designed.


Table of Contents

EXECUTIVE SUMMARY ..... 1
INTRODUCTION ..... 1
PROJECT METHODOLOGY ..... 1
RECOMMENDATIONS AND CONCLUSIONS ..... 2

TABLE OF CONTENTS ..... 3
FIGURES ..... 4
TABLES ..... 5

CHAPTER 1: INTRODUCTION ..... 6
1.1. PURPOSE ..... 6
1.2. CLIENT BACKGROUND ..... 6
1.3. SCOPE ..... 6
1.4. MOTIVATION OF OUR WORK ..... 7
1.5. ACKNOWLEDGEMENT OF PREVIOUS WORK ..... 7
1.6. OVERVIEW ..... 7

CHAPTER 2: BACKGROUND ..... 9
2.1. INTRODUCTION ..... 9
2.2. FILE SYSTEMS ..... 9
2.3. GRID ENVIRONMENT ..... 10
2.4. PEER-TO-PEER ARCHITECTURE (P2P) ..... 11
2.5. EXISTING DISTRIBUTED FILE SYSTEMS ..... 12

CHAPTER 3: OBJECTIVES OF VPFS ..... 15
3.1. INTRODUCTION ..... 15
3.2. OBJECTIVES ..... 15

CHAPTER 4: CLIENT'S SYSTEM ..... 17
4.1. INTRODUCTION ..... 17
4.2. OVERALL SYSTEM ..... 17
4.3. SYSTEM AT UCL PREMISES ..... 18

CHAPTER 5: JXTA FRAMEWORK ..... 20
5.1. PEER-TO-PEER FRAMEWORK ..... 20
5.2. JXTA ..... 20
5.3. FEATURES OF THE JXTA PLATFORM ..... 21

CHAPTER 6: REQUIREMENTS SPECIFICATION ..... 27
6.1. INTRODUCTION ..... 27
6.2. ACTORS ..... 27
6.3. GROUPS ..... 29
6.4. FILE OPERATIONS ..... 29
6.5. FILE AND DIRECTORY PROPERTIES ..... 30
6.6. DIRECTORY STRUCTURE ..... 31
6.7. SECURITY ISSUES ..... 31
6.8. USER INTERFACE ..... 31
6.9. NON-FUNCTIONAL REQUIREMENTS ..... 31
6.10. PROFILE DEFINITIONS ..... 32

CHAPTER 7: SYSTEM ARCHITECTURE ..... 34
7.1. INTRODUCTION ..... 34
7.2. MODULES ..... 34
7.3. OVERVIEW OF THE SYSTEM ..... 36
7.4. STRUCTURE OF A PEER ..... 38


7.5. DEPLOYMENT PEER GROUPS AND COMMUNICATION BETWEEN PEERS ..... 39

CHAPTER 8: SYSTEM DESIGN ..... 40
8.1. FILE MANIPULATION SERVICE MODULE ..... 40
8.2. VIRTUAL FILE ALLOCATION TABLE SERVICE MODULE (VFAT) ..... 44
8.3. USER PROFILE SERVICE MODULE (UPM) ..... 54
8.4. VPFS ACCESS MODULE ..... 61
8.5. VPFS INITIALISATION ..... 64
8.6. PEER COMMUNICATION MODULE ..... 66

CHAPTER 9: TEST PLAN ..... 80
9.1. INTRODUCTION ..... 80
9.2. STRATEGY ..... 80
9.3. TESTING ENVIRONMENTS ..... 82

CHAPTER 10: EVALUATIONS AND FUTURE DEVELOPMENTS ..... 83
10.1. INTRODUCTION ..... 83
10.2. COMPARISONS WITH PREVIOUS WORK ..... 83
10.3. OBJECTIVES MET ..... 85
10.4. FUTURE ENHANCEMENTS ..... 87

CHAPTER 11: CONCLUSIONS ..... 89
REFERENCES ..... 92

Figures

Figure 2-1: An Example Hierarchical File System ..... 9
Figure 2-2: Client-Server Model ..... 11
Figure 2-3: P2P Model ..... 12
Figure 2-4: SRB Architecture ..... 13
Figure 4-1: GenTHREADER System ..... 17
Figure 4-2: GenTHREADER UCL Premises ..... 18
Figure 5-1: Example of a JXTA Service Advertisement ..... 24
Figure 6-1: Users Use Case Model ..... 28
Figure 6-2: Administrators Use Case Model ..... 29
Figure 7-1: Layers of the System ..... 37
Figure 7-2: Detailed Structure ..... 38
Figure 7-3: Peer Structure ..... 39
Figure 8-1: The FileManipulationService Interface ..... 42
Figure 8-2: The FileStorage Class ..... 43
Figure 8-3: FMM Class Diagram ..... 44
Figure 8-4: An Example VFAT Tree ..... 46
Figure 8-5: Inode Class Diagram ..... 47
Figure 8-6: A Fragment that is moved ..... 48
Figure 8-7: Total inodes held by each directory ..... 49
Figure 8-8: Fragmentation of the VFAT tree stored on host A ..... 49
Figure 8-9: A Remote Directory is created on host A after fragmentation ..... 50
Figure 8-10: Closest Root Pathname Match Approach ..... 51
Figure 8-11: The VFATService Interface ..... 52
Figure 8-12: The VFAT Service section of the Peer Profile ..... 53
Figure 8-13: The basic structure of the VFAT ..... 53
Figure 8-14: VFAT Class Diagram ..... 54
Figure 8-15: The User Profile tree ..... 55
Figure 8-16: The UserProfileService Interface ..... 57


Figure 8-17: Profiles Class diagram ..... 58
Figure 8-18: User Profile Module class diagram ..... 59
Figure 8-19: The basic UPM structure ..... 60
Figure 8-20: The UPM section of the Peer Profile ..... 60
Figure 8-21: Login Operation Sequence Diagram ..... 61
Figure 8-22: "Put File" operation Sequence Diagram ..... 63
Figure 8-23: Initialisation Module class diagram ..... 66
Figure 8-24: The QueryHandler Interface ..... 67
Figure 8-25: Messages Class Diagram ..... 67
Figure 8-26: FMM Request Class Diagram ..... 69
Figure 8-27: FMM Response Class Diagram ..... 69
Figure 8-28: A GetFileRequest Message ..... 70
Figure 8-29: VFAT Request Class Diagram ..... 71
Figure 8-30: VFAT Response Class Diagram ..... 72
Figure 8-31: A GetInodeRequest Message ..... 72
Figure 8-32: UPM Request Class Diagram ..... 74
Figure 8-33: UPM Response Class Diagram ..... 75
Figure 8-34: A GetProfileRequest Message ..... 75
Figure 8-35: Advertisements Class Diagram ..... 76
Figure 8-36: AdvertisementHandler Class Diagram ..... 77
Figure 8-37: An FMM Advertisement ..... 77
Figure 8-38: A VFAT Advertisement ..... 78
Figure 8-39: A UPM Advertisement ..... 79
Figure 11-1: VPFS in conjunction with GenTHREADER ..... 90
Figure 11-2: Consolidation of Data ..... 91

Tables

Table 2-1: Comparison between SRB and VPFS ..... 14
Table 6-1: VPFS Permissions ..... 30
Table 6-2: User Profile Contents ..... 32
Table 6-3: Group Profile Contents ..... 33
Table 6-4: Peer Profile ..... 33
Table 8-1: Replication Thresholds ..... 41
Table 8-2: Closest Path Matching ..... 56


Chapter 1: Introduction

1.1. Purpose The purpose of this report is to describe the project carried out by Group 3. The report details the processes adopted and the decisions made by the group in order to create a file store system that fulfils the requirements of our client. It also incorporates background information on the technologies that have been used, to familiarise the reader with them and so give a better overall understanding of the project. The report outlines the objectives and the architecture and goes into a detailed design description of the overall system. The last chapters cover the tests carried out, whether the objectives have been met, and enhancements that the group considers would be useful for future implementations of the system.

1.2. Client Background Client: Stefano Street The client of the project is Stefano Street, a member of the research staff of the computer science department of UCL. Stefano is involved in a project known as e-Protein GenTHREADER. The GenTHREADER system holds information that helps researchers specialising in bioinformatics to understand the biological functions of gene products. The system is dispersed across three university sites: UCL, Imperial College and Cambridge. Stefano is responsible for the site at the UCL premises.

1.2.1. Client Problem Our client manages a cluster of approximately 170 machines. Currently one of these machines holds a large database. The rest of the cluster retrieves data from this database machine, processes it, and sends the results back to the same machine. Two problems follow from this arrangement: first, the database machine is overloaded, since it not only holds the database but also stores all the results; second, the hard disk capacity of the remaining machines is poorly utilised.

1.3. Scope The system that we are building is named the Virtual Policy-driven File Store (VPFS). Our task is to build this system on top of the client's existing system, with the goal of reducing the load on the database machine and improving the utilisation of hard disk capacity within the cluster. The VPFS allows the processing machines to store their results on any of the other machines in the cluster, thus establishing a better utilisation of resources. The VPFS system expands upon the concept of distributed file systems to provide a reliable data store and retrieval facility among the machines of our client. But the scope of the project goes beyond that: it aims to fulfil the requirement of improving the usability of a Grid infrastructure. One such requirement is to provide the Grid infrastructure with a distributed file storage facility. Even though existing distributed file systems could be used in such an infrastructure, they do not provide a complete solution to Grid-related issues. For example, current distributed file systems are based on a client/server model. As such, they are not capable of supporting dynamic connections and disconnections of resource providers, decentralised administration, or the ability to run over

Page 7: Z15 Group 3 - VPFS 2003 - Group Report · User Profile module: handles the manipulation and storage of the user profiles. e. VPFS Access module: handles the co-ordination between

VPFS 2003 DCNDS Group 3

Page 7 of 93

heterogeneous environments. The VPFS system aims to fill this gap by providing a distributed file system created with Grid scenarios in mind. The VPFS system is based on a peer-to-peer network model, which is ideal for environments with decentralised administration. The peer-to-peer model creates a dynamic environment in which hosts can take the role of both client and server. Hosts can also connect or disconnect at any time without affecting the deployment of the system. In addition, services can be replicated within the system, which eliminates single points of failure. The VPFS system is designed to be platform independent in order to support Grid environments where multiple platforms are used; it is written in Java to achieve this platform independence. The VPFS system also provides a set of commands through its interface that allow users to manipulate the files and directories stored on the VPFS system. In addition, the commands have been enhanced with extra parameters that allow users to specify storage policies for their files.
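As a sketch only, a policy-aware command of this kind might look something like the following. The command names, flags and syntax are illustrative pseudocode and do not reproduce the actual VPFS interface:

```
vpfs> put results.dat /experiments/run1 -policy replication -copies 3
vpfs> get /experiments/run1/results.dat
```

Here the extra parameters would tell the system to store three full copies of the file on different peers, while retrieval remains independent of where the copies physically reside.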

1.4. Motivation of Our Work The concept of Grid environments is relatively new, and hence the majority of technical services do not address Grid-like scenarios. One such technical service is the provision of a file store facility. In order to enhance Grid environments, we are developing a file store system that places files on physical stores based on criteria defined by the user. Such criteria could include latency requirements, security needs and high availability. The complete system encompasses user commands enhanced with parameters that define policy specifications. These parameters control the way in which the commands place files on physical storage. For example, if the user specifies a high-availability policy for a file, the system will copy the file across a number of machines instead of storing it on just one.
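To make the placement idea concrete, the following Java fragment sketches how a replication-style placement decision might be made. The class, method and parameter names are hypothetical and are not taken from the VPFS source; in particular, the real system discovers candidate peers dynamically via JXTA advertisements rather than receiving them as ready-made lists.

```java
// Hypothetical illustration only: names do not come from the VPFS source.
import java.util.ArrayList;
import java.util.List;

public class PlacementPlanner {

    /**
     * Select up to 'copies' peers that each have room for a full replica
     * of the file. A real implementation would rank candidates (free
     * space, latency, ...) instead of taking them in discovery order.
     */
    public static List<String> selectPeers(List<String> peers,
                                           List<Long> freeBytes,
                                           long fileSize,
                                           int copies) {
        List<String> chosen = new ArrayList<>();
        for (int i = 0; i < peers.size() && chosen.size() < copies; i++) {
            // Skip peers that cannot hold a complete copy of the file
            if (freeBytes.get(i) >= fileSize) {
                chosen.add(peers.get(i));
            }
        }
        return chosen;
    }
}
```

With three candidate peers and a two-copy replication policy, the planner skips any peer without enough free space and returns the first two that qualify; if no peer has room, it returns an empty plan rather than failing.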

1.5. Acknowledgement of Previous Work This project is based on work carried out by students in a previous year. The system has been expanded with extra capabilities and better-structured services in order to be more robust. In addition, this project has a client interested in the services of the system and, as such, tries to meet his requirements. A comparison between the two systems, illustrating the limitations of the previous design, is given in the Evaluation chapter.

1.6. Overview The rest of the report is organised as follows:

• Chapter 2: Background This chapter includes background information on different technological aspects that were used in the project to help the reader with the overall understanding of the system.

• Chapter 3: Objectives of VPFS This chapter introduces the system and identifies the objectives that we are trying to achieve.


• Chapter 4: Client’s system This chapter introduces the system of our client and describes the problems that their system currently faces.

• Chapter 5: JXTA Framework The JXTA framework was used to develop the peer-to-peer infrastructure of the system. This chapter describes the framework and the features of it that were utilised in the project.

• Chapter 6: Requirements Specification This chapter identifies and describes the requirements for the system.

• Chapter 7: System Architecture An overview of the architecture of the system is described in this chapter. It introduces the different components of the system and what each component does.

• Chapter 8: System Design This chapter expands on Chapter 7. It incorporates the detailed design of each component.

• Chapter 9: Test Plan This chapter includes all the tests that were carried out.

• Chapter 10: Evaluations and Future Developments In this chapter we evaluate the system against its objectives. It also encompasses a comparison between our project and one carried out in previous years on the same topic, and concludes with future enhancements for the system.

• Chapter 11: Conclusions


Chapter 2: Background

2.1. Introduction This chapter gives background information on different technological aspects that relate to this project. It gives an introduction to file systems and distinguishes between local file systems and distributed ones. It then outlines the concept of Grid environments and introduces the notion of a peer-to-peer infrastructure. Lastly, it presents an existing distributed file system and compares it with VPFS.

2.2. File Systems A file system is the generic name given to the logical structures of the data stored on physical storage, together with the software routines used to control access to that data. File systems play an essential role in the operating system: they provide the mechanisms for the organisation and management of data and programs stored on the hard disk, including storage, retrieval and file-naming conventions. Different operating systems, such as Windows, Unix-based systems and Macintosh, may use different file systems, which are independent of the specific hardware being used.

2.2.1. Hierarchical File Systems These file systems provide a hierarchical structure for organising files by allowing the creation of directories, which can contain files as well as other directories. In this way, users can group related files together in directories, as shown in the diagram below.

Figure 2-1: An Example Hierarchical File System

Hierarchical file systems can have two types of interfaces:

• Command line interfaces These interfaces require the user to input commands in order to manipulate data. These commands take the form of special words or letters and are considered more flexible than graphically driven interfaces. On the other hand, they are more difficult to learn due to the large number of complex commands.


• Graphical interfaces These interfaces are the antithesis of command-driven interfaces. The user executes data-manipulation commands from menus, which frees them from having to learn complex command languages.

2.2.2. Local/Distributed File Systems File systems can be either local or distributed. A local file system stores all the files on the same computer, which implies a centralised file system. A distributed file system, on the other hand, permits the storage of files across several systems in different geographical locations and provides important benefits over local file systems. The location of the files is transparent to the users: even if the files are located on another computer, the users perceive the location as local. The VPFS system is based on a hierarchical structure for the organisation of files and incorporates a command line interface. The system utilises a distributed infrastructure, where different components of the system are dispersed across a number of computers.

2.3. Grid Environment A Grid environment brings together computational resources that are dispersed across different geographical locations and organisations for the purpose of problem solving and sharing these resources in a co-ordinated way. The sharing encompasses a variety of resources such as computational power, data, software and storage facilities. The Grid model emphasises that this sharing should be highly controlled, with a clear definition of resource providers and resource users: what resources are shared, who is permitted to share them and the circumstances under which sharing happens. The groups of people or organisations that establish such collaborations and are bound by such sharing rules form what are known as Virtual Organisations (VO). Current distributed computing technologies can also achieve a network of shared resources between institutions, just like Grid technologies. They do not, however, address the co-ordination of this resource sharing between different organisations pursuing a common goal, which is a fundamental aspect of Grid technologies. Internet technologies in today’s industry deal with cross-organisational communication and information sharing, but do not provide mechanisms for co-ordinating these resources or policies to govern the sharing. The main motivation for creating the VPFS system is to provide a storage facility that takes into consideration the nature and challenges of the Grid environment. The system aims to provide this facility in an environment where peers are geographically dispersed and capable of linking cross-organisational storage. These storage facilities can connect to and disconnect from the network dynamically; the system takes this into account and establishes mechanisms for discovering and allocating storage resources accordingly, while maintaining robustness and reliability.
It encompasses policies to govern the storage of files according to the user’s needs and incorporates an access control list to establish different access rights to the information that it holds.
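A minimal sketch of such an access control list is given below, assuming the kind of per-user read/write permissions described here. The class and method names are our own illustration, not the actual VPFS code.

```java
import java.util.*;

// Illustrative sketch of a per-file access control list: the owner has
// full access, and other users only the permissions explicitly granted.
public class FileAcl {
    private final String owner;
    private final Map<String, Set<String>> rights = new HashMap<>(); // user -> {"read","write"}

    FileAcl(String owner) { this.owner = owner; }

    void grant(String user, String permission) {
        rights.computeIfAbsent(user, u -> new HashSet<>()).add(permission);
    }

    boolean allows(String user, String permission) {
        if (user.equals(owner)) return true; // the data owner always has access
        return rights.getOrDefault(user, Collections.<String>emptySet()).contains(permission);
    }

    public static void main(String[] args) {
        FileAcl acl = new FileAcl("alice");
        acl.grant("bob", "read");
        System.out.println(acl.allows("bob", "read"));  // true
        System.out.println(acl.allows("bob", "write")); // false
    }
}
```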

Page 11: Z15 Group 3 - VPFS 2003 - Group Report · User Profile module: handles the manipulation and storage of the user profiles. e. VPFS Access module: handles the co-ordination between

VPFS 2003 DCNDS Group 3

Page 11 of 93

The features mentioned above contribute to the establishment of a dynamically co-ordinated storage-resource-sharing network in a Grid environment. Developing such a distributed file system makes this a very challenging project.

2.4. Peer-to-Peer Architecture (P2P) Over the past two decades, networking applications have been developed primarily in a hierarchical structure based on the client-server model. Clients connect to a server using a specific communication protocol such as the File Transfer Protocol (FTP) in order to share resources.

Figure 2-2: Client-Server Model (multiple clients connected to a single server)

The client-server model shown in the figure above was the starting point for the evolution of networks providing an infrastructure for sharing resources, and it has significantly changed the way organisations operate. Although this model has brought significant change, it has major weaknesses when it comes to distributed computing. One major weakness of the model is that it contains a single point of failure, since the clients connect to a server to share resources: if the server goes down, the entire network infrastructure is disrupted. In addition, the clients perform very little computation, as the server performs the work of servicing the clients’ requests; distributed systems therefore cannot take advantage of clients with high computational power. Lastly, the client-server model lacks scalability. The peer-to-peer (P2P) model provides solutions to the limitations of the client-server model. The most important one is that it avoids having a single point of failure. In a P2P environment, clients connect directly to each other to share resources, and each client is capable of servicing requests from other clients. Thus if one goes down, another that performs the same operations can take over.


Figure 2-3: P2P Model

A P2P infrastructure offers many benefits for handling the growing number of connected users and devices. It provides more direct communication channels between all the devices within a network, thus simplifying the process of locating and sharing resources. Today, P2P applications and services facilitate interactive communication and collaboration with almost any device on the Internet. The P2P architecture aims to establish networks that are more adaptive to changes in their environment, where clients connect or disconnect dynamically, making this model ideal for distributed systems. The VPFS system utilises the P2P infrastructure in order to provide a distributed file system that users perceive as a single logical file store. Due to the benefits of the P2P infrastructure, the VPFS system can provide a dynamic environment for sharing data, while maintaining persistence, robustness and reliability. Currently, companies are developing P2P frameworks that aid the development of P2P applications. The VPFS system utilises one of them, the JXTA framework developed by Sun Microsystems, which is described in Chapter 5.

2.5. Existing Distributed File Systems

2.5.1. Storage Resource Broker (SRB) This distributed file system was developed by the SDSC Data Intensive Computing Environments (DICE) group. SRB’s infrastructure is based on the client-server model. It provides a platform-independent file system for Grid environments, accommodating transparent access to storage resources. It encompasses a uniform API, allowing distributed clients to access different types of data storage across local and wide-area networks in a heterogeneous computing environment. The major components of SRB are:

• An SRB server - allows distributed clients to access diverse storage resources in a heterogeneous computing environment.

• A client library - provides client applications with an API in order to communicate with the SRB server.

• An MCAT server - provides directory and access control information in order to allow location and access transparency.

The SRB utilises a set of UNIX-like utilities such as ls, cp and chmod for manipulating and querying collections and datasets held in the SRB space. The following diagram illustrates the architecture of SRB:

Figure 2-4: SRB Architecture (an application acting as an SRB client communicates with the SRB server and the MCAT server, which front storage systems such as DB2, Oracle, Illustra, ObjectStore, HPSS, UniTree, Unix file systems and FTP)

Overall, SRB provides a way to access data sets and resources using their attributes rather than their physical location, thereby supporting location transparency alongside better reliability, availability and fault tolerance. Nevertheless, the SRB could be improved considerably on two levels:

1. It does not give users enough freedom when it comes to data storage policies. Apart from file replication and access permissions, the SRB does not provide the user with the ability to specify the type of storage that they require for their files.

2. In terms of architecture, it is advantageous to move away from its centralised architecture towards a peer-to-peer communication model, where each party involved has essentially the same capabilities and acts as both client and server. A peer-to-peer model would provide a more robust system, with no single point of failure. In addition, it can, in many cases, make more efficient use of resources and allow communities to deploy the system between them with more ease. Part of this project is to investigate and design ways in which this can be achieved in the VPFS system.

The following table illustrates the main differences between SRB and the VPFS system to be implemented.


Transparency
    SRB:  Not total - the user specifies which replica to open by giving a replica number.
    VPFS: Total.

Availability
    SRB:  Automatic replication by joining two or more physical resources into a logical resource group.
    VPFS: Replication of files across peers.

Architecture
    SRB:  Client-server - file servers and clients.
    VPFS: Peer-to-peer - each peer can be a client, a server or both.

File allocation tracking
    SRB:  MCAT server maps logical names to physical locations.
    VPFS: Virtual File Allocation Tables, fragmented across peers, map logical names to physical locations.

Security
    SRB:  Supports four authentication schemes: GSI, SEA, plain-text password and encrypted password.
    VPFS: Uses its own basic authentication mechanism.

Sharing / access control
    SRB:  Issuing of a ticket from the data owner; read-only access.
    VPFS: Data owner specifies read/write permissions for users or groups.

Storage configuration
    SRB:  Collections - contain a group of physically distributed data sets and/or sub-collections.
    VPFS: Groups of peers.

Table 2-1: Comparison between SRB and VPFS


Chapter 3: Objectives of VPFS

3.1. Introduction The VPFS system aims to provide a policy-based distributed file system within a Grid environment. The specific objectives are detailed in this chapter.

3.2. Objectives The main objectives of the VPFS system are the following:

• Provide a distributed file storage facility.
• Storage of files driven by a policy specification.
• Interoperability in a Grid environment.
• Meet the client’s needs.

Each main objective described above constitutes a number of further sub-objectives that are explained below:

3.2.1. Objective 1: Distributed File Storage Facility

• Provide a hierarchical storage structure for the organisation of files, where files are ordered in directories.
• Provide access control mechanisms allowing users to specify access permissions.
• Provide file sharing between groups of users and concurrent access to files.
• Provide basic file manipulation commands such as cp, mv and ls.
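The hierarchical namespace and path-to-location mapping behind such commands can be sketched as follows. The class name and layout are our own illustration of the idea; the actual VFAT module is described later in the report.

```java
import java.util.*;

// Illustrative sketch of a virtual file-allocation mapping: logical path
// names in a hierarchical namespace mapped to physical locations
// (peer identifier plus a local path on that peer).
public class VirtualFat {
    private final Map<String, String> table = new HashMap<>(); // logical path -> "peerId:localPath"

    void store(String logicalPath, String peerId, String localPath) {
        table.put(logicalPath, peerId + ":" + localPath);
    }

    String locate(String logicalPath) { return table.get(logicalPath); }

    // A simple 'ls': list file entries directly under a logical directory.
    List<String> list(String dir) {
        String prefix = dir.endsWith("/") ? dir : dir + "/";
        List<String> names = new ArrayList<>();
        for (String path : table.keySet()) {
            // keep only entries with no further '/' after the directory prefix
            if (path.startsWith(prefix) && path.indexOf('/', prefix.length()) < 0) {
                names.add(path.substring(prefix.length()));
            }
        }
        Collections.sort(names);
        return names;
    }

    public static void main(String[] args) {
        VirtualFat fat = new VirtualFat();
        fat.store("/home/a.txt", "peer1", "/data/123");
        fat.store("/home/b.txt", "peer2", "/data/456");
        System.out.println(fat.locate("/home/a.txt")); // peer1:/data/123
        System.out.println(fat.list("/home"));         // [a.txt, b.txt]
    }
}
```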

3.2.2. Objective 2: Storage of Files Driven by a Policy Specification

• Establish mechanisms through which users can specify storage policies for files. For example, one policy could be high availability – thus the system will store the file with a higher replication value.

• Establish mechanisms to allow administrators to specify properties for individual physical storage devices. For example, one property could be security – storage devices with such a property may require authentication mechanisms in order to be accessed.

3.2.3. Objective 3: Interoperability in a Grid Environment

• Provide location transparency, where users view the system as a single virtual file store.
• Provide a platform-independent environment.
• Provide resource sharing among users in different geographical locations.
• Establish a dynamic environment where hosts can connect to or disconnect from the system without any effect on the deployment of the network.
• Provide reliability and high availability of the stored files by implementing replication mechanisms, thus eliminating the possibility of a single point of failure.
• Provide better utilisation of storage capacity, as all machines participating in the system can be used to store files.


3.2.4. Objective 4: Meet Client’s Needs

• Manipulate large files:
  o Storing
  o Reading
  o Appending
• Better utilisation of available cluster storage.
• Improve the system’s scalability.
• Provide a system API.


Chapter 4: Client’s System

4.1. Introduction Our client, as mentioned in Chapter 1, is Stefano Street, a member of the research staff in the Computer Science department at UCL. Our client’s application is called GenTHREADER. The GenTHREADER application processes information that helps researchers specialising in bioinformatics understand the biological functions of gene products. The system is dispersed across three university sites: UCL, Imperial College and Cambridge. The main goal of the system is to share computational power among the three universities in order to process bioinformatics data. The bioinformatics datasets are so large, and the associated processing requirements so vast, that a single machine would take days or weeks to process them; with this system, researchers can process the data within hours. Stefano is responsible for the system’s infrastructure at the UCL premises and for connectivity with the other sites. This chapter describes how the overall system operates and then expands on the infrastructure of the system at the UCL premises only.

4.2. Overall System The overall system can be illustrated by the following diagram:

Figure 4-1: GenTHREADER System (users reach the system through web interfaces at UCL, IC and EBI, with a user interface via DAS; a resource-management layer under GLOBUS dispatches jobs to CPUs at UCL-CS, IC, EBI and Biochemistry, and each site holds its own relational database of annotation)

Each site has its own annotation database and, currently, access to the databases is not possible across sites, so data has to be copied between sites regularly. Users access the system via a web interface. Once in the system, they choose what is called a “sequence file”, which is stored on their local machine. They then choose which computational resources the system should use (UCL, IC or Cambridge) and send the data for processing. The Resource Management component then takes the data and allocates job processes to the machines dispersed across the three universities.


4.3. System at UCL premises The GenTHREADER system at the UCL premises is dispersed across two sites: one at the Biochemistry department and the other at the Computer Science department. The two sites are connected via the JANET network. The system incorporates two main servers, TITIN and FREKE. TITIN holds the main database, while FREKE is the Master Scheduling Queuing server. Both servers connect to a cluster of about 170 machines whose role is to process data. The system architecture is shown below:

Figure 4-2: GenTHREADER UCL Premises (the servers FREKE and TITIN and the machine GERE connect through Summit48 switches and a Cisco router to JANET, linking the Computer Science and Biochemistry sites)

All the machines have mount points to the server FREKE in order to access data remotely and avoid sending the entire dataset to individual machines. The Master Scheduling Queuing server (FREKE) runs software that breaks the data down into a number of sub-jobs ready for processing and assigns each job to a machine that is idle. If all the machines are busy processing, the jobs are queued up at FREKE. The machines process independently of each other. Each machine that is assigned a job needs to access data from the TITIN server via the FREKE server; this is done through a symbolic link from FREKE to TITIN. Once the processing is finished, the results are not stored locally, but are stored back on the server FREKE. We can identify here the major issues that the current system faces:

1. Scalability issues. The system is not easily scalable. The machines are required to know which servers hold which data, and the appropriate mount points must be created accordingly. Thus, if data needs to be moved to a new or an existing server, each of the machines that process the data must be manually reconfigured with the new mount points. As the system comprises a cluster of 170 machines, this is a heavy administrative burden. There is also an NFS limitation on the number of times a server can be mounted.

2. Bottleneck issues. The operational structure of the system can suffer from bottlenecks. The two servers can become overloaded: machines access them to retrieve data and access them again to store the results. The network between FREKE and the cluster is therefore a bottleneck.

3. Poor utilisation of storage resources. Currently the machines do not store the results locally, as it would then be difficult to track which machine holds which results. Instead, the results are stored back on the servers, so the storage resources of the processing machines are wasted.
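The master-scheduler behaviour described in Section 4.3, splitting work into sub-jobs, handing them to idle machines and queuing the rest, can be sketched as follows. This is only our illustration of the scheduling idea, not FREKE's actual software.

```java
import java.util.*;

// Illustrative sketch of a master scheduling queue: jobs are assigned to
// idle machines and queued when all machines are busy.
public class MasterScheduler {
    private final Deque<String> queue = new ArrayDeque<>();
    private final Set<String> idle = new LinkedHashSet<>();
    private final Map<String, String> running = new HashMap<>(); // machine -> job

    MasterScheduler(Collection<String> machines) { idle.addAll(machines); }

    // Submit a sub-job: assign it to an idle machine, otherwise queue it.
    void submit(String job) {
        Iterator<String> it = idle.iterator();
        if (it.hasNext()) {
            String machine = it.next();
            it.remove();
            running.put(machine, job);
        } else {
            queue.addLast(job);
        }
    }

    // A machine finished its job; hand it the next queued job, if any.
    void finished(String machine) {
        running.remove(machine);
        if (!queue.isEmpty()) running.put(machine, queue.pollFirst());
        else idle.add(machine);
    }

    int queuedJobs() { return queue.size(); }

    public static void main(String[] args) {
        MasterScheduler s = new MasterScheduler(Arrays.asList("m1", "m2"));
        s.submit("job1"); s.submit("job2"); s.submit("job3");
        System.out.println(s.queuedJobs()); // 1 (job3 waits, both machines busy)
        s.finished("m1");
        System.out.println(s.queuedJobs()); // 0 (job3 now runs on m1)
    }
}
```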


Chapter 5: JXTA Framework

5.1. Peer-to-peer frameworks P2P frameworks are platforms on which to build applications based on a peer-to-peer communication model. Currently there are a few such frameworks on the market which are open source and can be used by anyone who wants to develop peer-to-peer applications. Research was conducted on three frameworks: Jini, Globus and JXTA. Each framework was evaluated against criteria that the group considered vital. These criteria are listed below:

1. Platform independence – the group was looking for a framework that is not limited to specific platforms.

2. Java Programming Language – the group preferred a framework that was based on the Java Programming Language as it is platform independent and the majority of the group were familiar with the language.

3. Open source – this would give the group a better understanding of the workings of the framework.

4. Ease of set up – the group preferred a framework that could be easily configured, given the time limit for bringing the system to completion.

5. Technological aspects – the group preferred a framework that incorporated certain mechanisms, making it capable of performing the following:

a. Discovery of peers/groups.
b. Group development.
c. Broadcasting messages.
d. Secure transfer of messages.

All the frameworks satisfied the above criteria. The chosen framework was JXTA, and the reasons for choosing it were mainly organisational:

• JXTA carries the least risk of all the platforms. It was used in a previous project (VPFS2002), and therefore the group could see examples of what its features can do or can be made to do.

• We can consult with the supervisors of the VPFS2002 project about it, which may not be possible with any other frameworks.

• There is a JXTA community that we can use as a source of help as well.

This chapter continues with a description of the JXTA framework. The JXTA framework is large and complex and cannot be covered in full detail here. This section gives an introduction to this peer-to-peer framework, in order to help the reader better understand the underlying architecture of the VPFS system.

5.2. JXTA JXTA incorporates a collection of concepts and protocol specifications which aim to help developers build peer-to-peer applications and provide an abstraction from the complexity of designing correct and robust communication protocols. The basic protocols defined by JXTA are:

• Peer Discovery Protocol - allows peers to advertise their own resources as well as to discover resources from other peers in the network.


• Peer Information Protocol - allows peers to obtain status information such as uptime, state, recent traffic from other peers in the network.

• Peer Resolver Protocol - enables peers to send, receive and process generic queries to one or more peers.

• Pipe Binding Protocol - allows peers to launch a virtual channel of communication, which is known as a pipe, between one or more peers.

• Endpoint Routing Protocol - enables peers to discover route information to other peers.

• Rendezvous Protocol – provides a mechanism for propagating messages between peers.

All these protocols are asynchronous and are used by peers to query for services, route paths and status information about other peers, as well as to advertise their resources within the group. They are based on a query/response model, where peers send queries to one or more peers in the peer group and in return may receive zero or more responses.
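The query/response pattern shared by these protocols can be sketched with a minimal dispatcher: a query is registered with a callback, and any responses that arrive later are delivered to it. The class below is our own illustration of the pattern, not part of JXTA's API.

```java
import java.util.*;
import java.util.function.Consumer;

// Illustrative sketch of asynchronous query/response handling: a peer
// registers a listener per query; zero or more responses may arrive later.
public class QueryDispatcher {
    private final Map<Integer, Consumer<String>> pending = new HashMap<>();
    private int nextId = 0;

    // Send a query; the listener is invoked for each response that arrives.
    int sendQuery(String query, Consumer<String> listener) {
        int id = nextId++;
        pending.put(id, listener);
        // ...the query would be propagated to other peers here...
        return id;
    }

    // Called when a response message comes back from the network.
    void onResponse(int queryId, String response) {
        Consumer<String> listener = pending.get(queryId);
        if (listener != null) listener.accept(response); // zero responses is also valid
    }

    public static void main(String[] args) {
        QueryDispatcher d = new QueryDispatcher();
        int id = d.sendQuery("find file-manipulation peers",
                r -> System.out.println("response: " + r));
        d.onResponse(id, "peerX"); // prints "response: peerX"
    }
}
```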

5.3. Features of the JXTA Platform

5.3.1. Peers The JXTA network consists of a series of interconnected hosts known as peers. Peers are capable of joining or leaving the network at any time. Peers can be categorised into four types according to the way they administer communication messages among themselves. These types are listed below:

1. Minimal edge peer - A minimal edge peer has limited resources and as such does not cache information about the network resources (peers, peer groups, pipes, services). These peers just send and receive messages.

2. Full-featured edge peer - A full-featured edge peer is capable not only of sending and receiving messages but also of caching information about network resources. Hence such peers can reply to discovery requests from other peers with information found in their cache, but they cannot forward discovery requests to other peers.

3. Rendezvous peer - A rendezvous peer is capable of performing the same operations as a full-featured edge peer. In addition, it is capable of forwarding discovery requests to help other peers discover resources. Each peer group must have at least one rendezvous peer. If a rendezvous peer cannot respond to a request, it forwards the request to another one.

4. Relay peer - A relay peer is capable of maintaining route information to other peers. When a peer wants to send information to another, it will first look in its local cache for route information. If it does not find any, it will send a query to a relay peer, which in turn will respond with the route information.

Peers can be configured to take any of the forms mentioned above. The organisation of the network is not dictated by the JXTA framework, so developers have the flexibility to combine any of the peer types to suit their needs. For example, they can configure a peer to provide both relay and rendezvous services, or set up a computer that has very limited hard-disk capacity as a minimal edge peer.


5.3.2. Peer Groups Peer groups are collections of peers that have agreed upon a common set of services. For example, the VPFS system is an agreement between a set of peers to provide a file sharing service. Groups can be of two types:

• Unrestricted – no boundaries are defined for the group and as such any peer may join the group, without going through any security mechanisms.

• Restricted – boundaries are defined; these groups provide a secure domain environment for their peer members, allowing them to access and publish protected services. For a peer to join a restricted group, it has to go through the security mechanism that the particular group implements.

Peers may create, join or leave peer groups at any time, and it is also possible for peers to belong to more than one peer group. Peer groups are organised in a hierarchical structure and thus sustain a hierarchical parent-child relationship among them. Messages propagate not only within the group itself but also within the parent group. There is a generic peer group that all peers belong to when they initially log in to the JXTA network, called the World Peer Group. From there, peers may join or create other peer groups. When a peer first joins a group, it contacts a rendezvous peer within that group to discover resources as well as to publish its own resources to the group. When a peer creates a new group, it automatically becomes the rendezvous peer for that group.

5.3.3. Identifiers All the resources within the JXTA network such as peers, peer groups and pipes are given specific JXTA identifiers which have two roles:

1. Uniquely identify a component within the network. 2. Serve as the official way to reference that component.

JXTA utilises UUIDs (Universally Unique IDentifiers) for its identifiers. UUIDs are 128-bit numbers that are assigned to each entity within the JXTA network and are guaranteed to be unique within a local runtime environment. These identifiers are generated by the JXTA J2SE platform binding. However, the JXTA community does not guarantee that a UUID will be unique across an entire global community incorporating millions of peers.
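Java's standard UUID class generates the same kind of 128-bit identifier. The urn formatting in the sketch below only imitates the style seen in JXTA advertisements; it is not JXTA's exact encoding, which the J2SE platform binding produces itself.

```java
import java.util.UUID;

// Illustration of 128-bit UUID identifiers like those JXTA assigns to
// peers, groups and pipes. The urn prefix below is indicative only.
public class PeerIdExample {
    static String newPeerUrn() {
        UUID id = UUID.randomUUID(); // random 128-bit identifier
        // render the two 64-bit halves as 32 hex characters
        String hex = String.format("%016x%016x",
                id.getMostSignificantBits(), id.getLeastSignificantBits());
        return "urn:jxta:uuid-" + hex.toUpperCase();
    }

    public static void main(String[] args) {
        System.out.println(newPeerUrn());
    }
}
```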

5.3.4. Services A service defines a functionality that a peer can provide. Services represent modules, where each module signifies a piece of code used to implement a particular behaviour. A service may be executed by a single peer, or it may require the collaboration of multiple peers to execute instances of that service. In the former case the service is known as a peer service, whereas in the latter it is known as a peer group service. The JXTA framework defines a fundamental set of services. These are listed below:


• Discovery Service - This service deal with the discovery of peer group resources, such as peers, peer groups, pipes and services.

• Membership Service - This service handles membership applications for peers to join new groups.

• Access Service - This service deals with the validation of requests by verifying whether the peer making the request is permitted to have access to the specified service.

• Pipe Service - This service administers pipe connections between members of a peer group.

• Resolver Service - This service deals with generic query requests.

• Monitoring Service - This service enables peers to monitor other peers within the group.

Not all of the above core services need to be implemented. Depending on the needs of a network, a developer is free to choose which ones to implement. In addition, it is possible to create new services as well. The VPFS system utilises a combination of the core services mentioned above and new ones that the group developed in order to satisfy the objectives of the system.

5.3.5. Advertisements

Advertisements, which are basically XML-structured documents, are used by the JXTA framework to describe all the JXTA network resources. They are a form of communication between peers that helps them understand what is happening within their environment. For example, when a peer joins a group, an advertisement is sent to the rendezvous peer with information about that peer, such as its ID and the services it provides. The rendezvous peer, which is capable of distributing the advertisement within the network, helps other peers discover the resource that just joined the group. The following is an example of an advertisement. It is a service advertisement from a specific peer: it includes the ID of the peer, and states that the service it provides is File Manipulation. The types of services provided by peers in the VPFS system are discussed in later chapters.

Page 24: Z15 Group 3 - VPFS 2003 - Group Report · User Profile module: handles the manipulation and storage of the user profiles. e. VPFS Access module: handles the co-ordination between

VPFS 2003 DCNDS Group 3

Page 24 of 93

<?xml version="1.0"?>
<!DOCTYPE FileManipulationAdvertisement>
<FileManipulationAdvertisement>
  <jxta:PipeAdvertisement xmlns:jxta="http://jxta.org">
    <Id>
      urn:jxta:uuid-C66B3A5631A34634A1F0915E2A3BB8F22BB14CD9BE9942968726B019DF02403D04
    </Id>
    <Type>JxtaUnicast</Type>
    <Name>FileManipulationServiceAcceptPipe.end1</Name>
  </jxta:PipeAdvertisement>
  <PeerID>
    urn:jxta:uuid-59616261646162614A787461503250334344AEF057B149CFBD81B50E976C9A0E03
  </PeerID>
  <AvailableSpace>104836207</AvailableSpace>
  <Status>1</Status>
  <IsFM>true</IsFM>
</FileManipulationAdvertisement>

Figure 5-1: Example of a JXTA Service Advertisement
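For illustration, the sketch below reads one field out of a trimmed-down advertisement of this shape using the standard DOM parser. Actual JXTA code would use its own Advertisement classes rather than raw DOM, so this is only a sketch of the XML structure.

```java
import java.io.ByteArrayInputStream;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;

// Sketch: pulling a field out of a service advertisement with the standard
// DOM parser. Real JXTA code uses its own Advertisement classes; this only
// illustrates how the XML above could be read.
public class AdvertisementReader {
    // A trimmed-down version of the advertisement shown in Figure 5-1.
    public static final String SAMPLE =
        "<?xml version=\"1.0\"?>"
      + "<FileManipulationAdvertisement>"
      + "<AvailableSpace> 104836207 </AvailableSpace>"
      + "<IsFM> true </IsFM>"
      + "</FileManipulationAdvertisement>";

    // Returns the advertised free space in bytes, or -1 if unreadable.
    public static long availableSpace(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml.getBytes("UTF-8")));
            String text = doc.getElementsByTagName("AvailableSpace")
                    .item(0).getTextContent().trim();
            return Long.parseLong(text);
        } catch (Exception e) {
            return -1L;
        }
    }
}
```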

Once peers discover resources via the use of advertisements, they may cache the discovered advertisements locally for later use. The JXTA framework takes into consideration that some advertisements could become invalid as peers join or leave dynamically. To sustain an environment where obsolete resources are removed from the network, advertisements are published with a lifetime: advertisements cached locally at peers are only valid for a specific period of time. When that period expires, peers no longer use the information in those advertisements and will instead search for the corresponding new ones. The JXTA framework defines eight types of advertisements, but it is also possible to define new ones, which is a desirable feature for developers.
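The lifetime mechanism can be sketched as a small cache keyed by advertisement ID, where lookups evict expired entries. The class below is a simplified illustration, not JXTA's actual cache implementation.

```java
import java.util.HashMap;
import java.util.Map;

// Simplified local advertisement cache: entries carry an absolute expiry
// time, and expired entries are evicted on lookup so the peer will go
// back to the network for a fresh advertisement.
public class AdvertisementCache {
    private static final class Entry {
        final String xml;
        final long expiresAtMillis;
        Entry(String xml, long expiresAtMillis) {
            this.xml = xml;
            this.expiresAtMillis = expiresAtMillis;
        }
    }

    private final Map<String, Entry> cache = new HashMap<>();

    // Cache an advertisement under its ID with the given lifetime.
    public void publish(String id, String xml, long lifetimeMillis, long nowMillis) {
        cache.put(id, new Entry(xml, nowMillis + lifetimeMillis));
    }

    // Return the cached advertisement, or null if absent or expired.
    public String lookup(String id, long nowMillis) {
        Entry e = cache.get(id);
        if (e == null) return null;
        if (nowMillis >= e.expiresAtMillis) {
            cache.remove(id);
            return null;
        }
        return e.xml;
    }
}
```

Passing the clock in explicitly keeps the expiry behaviour easy to exercise without waiting for real time to pass.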

5.3.6. Communication

The JXTA framework supports the use of three transport protocols for communication between peers: TCP/IP, HTTP and TLS (Transport Layer Security). The developer has the flexibility of choosing which protocol, or combination of protocols, to use for their JXTA network. The VPFS system utilises the TCP/IP protocol and incorporates three types of communication between peers:

1. Advertisements.
2. Query/Response.
3. File transfer.


Each type of communication is handled by a different service:

• Discovery service – Advertisement communication
• Pipe service – File transfer communication
• Resolver service – Query/response communication

Discovery Service - Advertisements

The Discovery service is used to circulate advertisements between peers. It is the default service implemented by the JXTA framework to support the discovery of resources within a group. It utilises the Peer Discovery Protocol, the default protocol in the JXTA framework for probing peers for advertisements. The JXTA framework uses a combination of IP multicast and rendezvous peers to circulate advertisements around a group. The rendezvous peers, capable of forwarding requests from one peer to another, help peers discover information dynamically. This eliminates the need to administer the network when peers join or leave the group. Peers send discovery query requests in order to discover resources within the group. These requests include the credentials of the peer sending them, which aid the recipient in identifying the sender. The discovery messages can be sent to any peer whose presence is known (i.e. it is stored in the cache of the sending peer), or to any rendezvous points. In return the sending peer may receive zero or more responses.

Pipes – File Transfer

Pipes are virtual communication channels that can be set up between peers in order to communicate. Pipes support asynchronous, unidirectional transfer of messages and can be used to transport any object, such as binary code, data strings and Java technology-based objects. The VPFS system utilises pipes for the transfer of files between peers and for the exchange of control messages during file transfer. Pipes provide two modes of communication, as described below:

• Point-to-point mode – where a pipe connects two peers directly.
• Propagate mode – where a pipe connects one peer to multiple peers.

A peer is capable of creating several pipes that simultaneously share a single port on that peer. The existence of these pipes is established through pipe advertisements. In the case of the VPFS system, peers that want to share files need each other's pipe advertisements in order to send or receive a file. Typically the peer that wishes to receive a file will generate a pipe advertisement containing the information that allows other peers to search for it. It then publishes the advertisement via the discovery service and waits for the file to arrive.

Resolver Service – Query/Response Messages

This service is provided by JXTA and allows peers to send queries to specific peers or broadcast a query to all peers in a group. For example, in the VPFS system, a peer may send a query to other peers to locate where a particular file is stored. Query messages can be sent to any peer whose presence is known (i.e. it is stored in the cache of the sending peer) or to any rendezvous points.


The resolver service utilises the Peer Resolver Protocol, which handles these query messages and identifies matching response messages. Query/response messages are XML documents that contain the credentials of the sender, a unique query/response ID and the query or response itself. A peer is capable of sending multiple query messages and in return may receive zero or more responses.


Chapter 6: Requirements Specification

6.1. Introduction

This chapter incorporates the specification of the requirements for the VPFS system. It describes the possible actors that will use the system and the file operations available to them. It then expands on the properties that the files and directories held in the system will have, and explains the security issues involved. It also covers the non-functional requirements of the system and introduces the notion of profiles.

6.2. Actors

There are two main actors that will interact with the system directly. The primary actors are the standard users, who utilise the system in their daily activities. The secondary actors are the administrators, who enable the primary actors to use the VPFS system. The following describes what each actor should be able to do with the system.

6.2.1. Users

Users are empowered with the basic functions to utilise the system. Their interaction with the system involves the following:

• Authenticating their identity.
• Organising and manipulating files and directories.
• Choosing policies to store their files.
• Manipulating the access control lists of files and directories.

In order for users to utilise the system, they must first be registered. Registration incorporates the creation of their user profiles, which are created by the Administrator (this is explained in the next section). Upon registration each user is allocated a certain amount of file space, defined in MB, which is independent of any actual physical storage devices used. In addition, each user is assigned a home directory, to which only they have access.

Once users are registered, they can log onto the VPFS system. The login process incorporates an authentication mechanism, where the system verifies the user's identity through the input of a username and password. Once verified, users access their home directory. From there they are able to manage and manipulate their files using Unix-style commands.

Users are also empowered with the capability of defining access permissions to their files and directories. These access permissions define which users have what rights to their files. In addition, users can define storage policies, which determine the storage characteristics of files. Each user is capable of modifying the permissions and policies of their own files only. The diagram below illustrates the users' interaction with the system:
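The login check might look like the following sketch, which assumes user profiles store a SHA-256 digest of the password. The report does not specify how passwords are actually stored, so the class and its storage scheme are illustrative.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.util.HashMap;
import java.util.Map;

// Sketch of the authentication step, assuming profiles hold a SHA-256
// digest of the password rather than the plaintext (an assumption; the
// report does not state the storage format).
public class LoginSketch {
    private final Map<String, String> passwordDigests = new HashMap<>();

    public void register(String username, String password) {
        passwordDigests.put(username, digest(password));
    }

    // A user authenticates when the digest of the supplied password
    // matches the digest stored in their profile.
    public boolean authenticate(String username, String password) {
        String stored = passwordDigests.get(username);
        return stored != null && stored.equals(digest(password));
    }

    private static String digest(String s) {
        try {
            byte[] d = MessageDigest.getInstance("SHA-256")
                    .digest(s.getBytes(StandardCharsets.UTF_8));
            StringBuilder sb = new StringBuilder();
            for (byte b : d) sb.append(String.format("%02x", b));
            return sb.toString();
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }
}
```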


[Figure: use case diagram showing the User and Administrator actors and the Authentication, Registration, and Organising/Manipulating files and directories use cases within the VPFS System, linked by <<extends>> relationships.]

Figure 6-1: Users Use Case Model

6.2.2. Administrators

Administrators are defined as users empowered with more functions and privileges. As such, they have the same capabilities as the standard users but are also entrusted with administrative abilities. They too have to be registered in order to use the system. Their administrative interaction with the system involves the following:

• Organising the services of the VPFS system among peers of the group.
• Creating and maintaining accounts of users and groups.
• Creating and maintaining policies.

Administrators are accountable for setting up and organising the peers that their group will use. Each peer will be configured with a specific service, or a combination of services, that the VPFS system will provide. For example, some peers may be initialised as storage devices where files can be stored, while other peers may be configured to provide other services. As the number of files increases, administrators can initialise more peers to provide storage. These services are introduced in a later section of this chapter.

Administrators are also responsible for creating and maintaining accounts for users and groups. In order to register new users with the system, the administrator has to create user accounts for them, which involves the creation of user profiles. In the same way, new groups can be created, which requires the creation of group profiles. There will be a main root administrator for each peer group who will have the power to assign administrative capabilities to other users.

Lastly, administrators are accountable for creating and maintaining policies. Even though the VPFS system incorporates three pre-defined policies, administrators have the capability to create new ones applicable to their group.


[Figure: use case diagram showing the Administrator actor and the Access System, Organising Services, Creating and maintaining accounts, Creating and maintaining policies, and Registration use cases within the VPFS System, linked by <<extends>> relationships.]

Figure 6-2: Administrators Use Case Model

6.3. Groups

Users may wish to make particular files or directories available for other users to read or write to. We hence need to introduce the notion of user groups. A user group consists of a list of users that belong to the group. Groups are set up by administrators. A user may belong to one or more groups, but a file can only be made available to one group at a time.

6.4. File Operations

A set of Unix-based commands will be implemented in order to allow users to organise and manipulate their files and directories. These commands are listed below:

• cd: change directory.
• ls: list the files within the current directory.
• mkdir: create a directory.
• rm: delete a file or directory from the system.
• copy: copy a file to a new file in the same or another location.
• mv: move a file from one location to another.
• chmod: modify access permissions.
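As a minimal sketch of how the user shell might recognise these command words (the class and method names are invented for illustration):

```java
import java.util.Arrays;
import java.util.List;

// Sketch of how the user shell might recognise a command word; the class
// and method names are hypothetical, not taken from the implementation.
public class ShellSketch {
    private static final List<String> COMMANDS =
            Arrays.asList("cd", "ls", "mkdir", "rm", "copy", "mv", "chmod");

    // Splits a shell line on whitespace and checks the first token
    // against the set of supported commands.
    public static boolean isKnownCommand(String line) {
        String[] parts = line.trim().split("\\s+");
        return parts.length > 0 && COMMANDS.contains(parts[0]);
    }
}
```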

An API (Application Programmer Interface) for the VPFS will not be implemented. This could be a future enhancement to the system, allowing other applications to use its services. For the scope of this project, the above file operations are sufficient to allow users to organise and manipulate their files.

6.5. File and Directory Properties

Each file or directory held in the VPFS system will have certain properties attached to it. We define here the two most important ones: the access permissions of the file or directory, and the policies attached to it.

6.5.1. File and Directory Access Control

Users will be able to define the type of access permissions their files should have. There will be two kinds of access permissions: read (r) and write (w). In addition, users can define who should have access to their files: they can grant permissions to specific users, to the whole group, or to any user outside the group. The system utilises a Unix-like access control mechanism where there are three kinds of users for a file: owner, group and others. Users manipulate the access permissions of their files using the chmod command: depending on the type of access permission they want to grant, and to whom, the corresponding number is input after the chmod command, followed by the filename. Some examples are illustrated in the table below:

     PERMISSION           COMMAND
  U      G      O
  rw     rw     rw        chmod 333 filename
  rw     rw     r-        chmod 332 filename
  rw     r-     r-        chmod 322 filename

U = User, G = Group, O = Others
r = readable, w = writable, - = no permission

Table 6-1: VPFS Permissions

Users may choose to have their home directory available for the group or they may choose to make certain files and directories within their home directory available for anyone to read/write. The default access permission of files is that only the owner of those files would have read/write access to them.
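The permission digits in Table 6-1 are consistent with an encoding of read = 2 and write = 1 (so 3 = rw and 2 = r-). Assuming that scheme, a mode string can be decoded as follows; the encoding and the helper class are inferences for illustration, not stated explicitly in the report.

```java
// Decodes mode strings under the digit scheme suggested by Table 6-1
// (read = 2, write = 1, so 3 = rw, 2 = r-, 1 = -w, 0 = --). The encoding
// is inferred from the table, not stated explicitly in the report.
public class ChmodSketch {
    static String decodeDigit(char d) {
        int v = d - '0';
        return ((v & 2) != 0 ? "r" : "-") + ((v & 1) != 0 ? "w" : "-");
    }

    // "332" -> "rw rw r-" for user, group and others respectively.
    public static String decode(String mode) {
        return decodeDigit(mode.charAt(0)) + " "
             + decodeDigit(mode.charAt(1)) + " "
             + decodeDigit(mode.charAt(2));
    }
}
```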

6.5.2. Storage Policy Definitions

Policies determine the type of storage for files. Three policies will be implemented, and users will have the flexibility of choosing a combination of them for their files. These pre-defined policies will be globally accessible by all groups within the VPFS network. In addition, a group can define its own additional policies, applicable to that group only. The three policies that will be implemented are listed below:

• Level of Availability - the system should replicate the files onto more hosts so that they remain available even though hosts may join or leave the system at any given time.

• Level of Access Speed - the system should save these files to faster, more efficient storage devices. This can be done for critical files.


• Level of Security - the system should store these files on a secure device, which requires encryption as well as authentication procedures to be performed. This can be done for sensitive data.

6.6. Directory Structure

To a user, file pathnames and the general directory structure should appear location independent. Users should not be aware of where their files are stored, nor of how file pathnames are mapped to physical locations. Users should be capable of viewing what appears to be a typical directory structure, where they can organise their files into directories and sub-directories, as they would on any local file system. A user is never aware of the physical location of their files and refers to them using their virtual file name and pathname.

6.7. Security Issues

The VPFS will provide a file sharing facility where users could be in different geographical locations. This imposes some security issues that need to be identified. They can be separated into three areas:

1. Authentication and access control.
2. Secure storage of files.
3. Secure transfer of files.

Authentication processes must ensure that the identity of a user is correct: the system should be capable of detecting unauthorised users and should not allow them to log on. Access controls are also vital for providing a control mechanism when sharing files. The system should provide secure storage of files: in particular, it should prevent users who are not logged onto the system from accessing VPFS files stored on their local machine via their local file system. Lastly, the system should provide secure transfer of files between users. All three types of security are closely related, and thus no one type should be provided without the other two.

6.8. User Interface

Users interact with the system via a command line interface. Additionally, a graphical user interface allowing users to view their directories and manipulate their contents could be built. Administrators interact with the system via the same command line interface but with additional commands available. This interface allows them not only to perform the operations of a normal user but also to perform administrative functions such as creating user/group profiles and policies.

6.9. Non-Functional Requirements

Systems designed for Grids should be capable of sustaining their services in very dynamic, cross-organisational environments. As such, some very important non-functional requirements have been taken into consideration when designing the VPFS system. These are listed below:

• Platform Independence - The system must not be limited to particular platforms.

• Transparency - The system must provide transparency to the user – the user must be unaware that the processes are not taking place on their local drive.

• Robustness - System should be able to adapt to changes in its environment, e.g. peers connecting or disconnecting from the system.

• Reliability - The system must have predictable behaviour: it performs the exact same operations for a given request.

• Availability - The system should make stored files available to users.

• Load Balancing - Traffic (files) is distributed efficiently among peers based on the capacity of the peers' resources as well as their response time.

Specific technologies will be used, and certain mechanisms implemented, in the system in order to support the above non-functional requirements.

6.10. Profile Definitions

Profiles are XML documents and their role is to maintain information about the following entities:

• Users/Administrators
• Groups
• Peers

The information held by these profiles will be used for authentication purposes. For example, when a user logs on, the VPFS system will authenticate their identity by looking at their user profile. More importantly, the information that the profiles hold will help the system maintain the control essential for managing the storage resources and services it provides. For example, the system specifies a disk quota for each user and can check from the user's profile whether this limit has been exceeded.

6.10.1. User Profile

These profiles are created by the administrator for each user. They will contain personal information about the user as well as some other attributes required by the system. These attributes are illustrated below:

User Profile
  Username                     Disk quota
  Full name                    Used file space
  Password                     Home directory pathname
  Group name                   Administrative capabilities

Table 6-2: User Profile Contents
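A user profile serialised as XML might look like the sketch below; the element names are invented for illustration, since the report does not give the actual profile schema.

```java
// Sketch of a user profile serialised as XML. Element names are invented
// for illustration; the report does not specify the actual schema.
public class UserProfileSketch {
    public static String toXml(String username, String fullName, String group,
                               long quotaMb, long usedMb, String homeDir) {
        return "<UserProfile>"
             + "<Username>" + username + "</Username>"
             + "<FullName>" + fullName + "</FullName>"
             + "<GroupName>" + group + "</GroupName>"
             + "<DiskQuotaMB>" + quotaMb + "</DiskQuotaMB>"
             + "<UsedFileSpaceMB>" + usedMb + "</UsedFileSpaceMB>"
             + "<HomeDirectory>" + homeDir + "</HomeDirectory>"
             + "</UserProfile>";
    }
}
```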


6.10.2. Group Profile

This profile will be created by the administrator of the group. The profile will contain the list of the group's members, which the administrator will be able to add to or remove from. A group profile contains the following details:

Group Profile
  Name of the group
  List of the members

Table 6-3: Group Profile Contents

6.10.3. Peer Profile

Peer profiles will be created by the administrator for each peer in the group. These profiles state what services the peer will provide as well as the properties of those services. The VPFS system will provide four services, and each peer should be capable of providing one of these services or a combination of them, depending on how the administrator configures it. These services are listed below:

• File Manipulation Service – a peer with this service will provide file storage.

• User Profile Service – a peer with this service will store and manage user/group profiles.

• Virtual File Allocation Table Service – a peer with this service will store and manage the virtual file allocation table (VFAT) of the VPFS.

• Policy Service – a peer with this service will store and manage policies.

The following table illustrates the information held in a peer profile:

Peer Profile
  File Manipulation Service:
    - Total available file space.
    - Total used file space.
    - Security level provided.
    - Level of access speed of its file store.
  Virtual File Allocation Table Service:
    - Maximum number of VFAT entries that the host can hold.
    - Number of VFAT entries it currently holds.
  User Profile Service:
    - Maximum number of profiles that the host can hold.
    - Number of profiles it currently holds.
  Policy Service:
    - Maximum number of policies that the host can hold.
    - Number of policies it currently holds.

Table 6-4: Peer Profile


Chapter 7: System Architecture

7.1. Introduction

This chapter incorporates an overview of the design. It illustrates how the system was broken down into a number of relatively independent modules. We then proceed to detail the design and implementation of each of the components of the system in the following chapters.

7.2. Modules

The VPFS system incorporates a number of modules. Each module provides a specific service, either to the other modules or to the application using the VPFS system. Most of the services are performed internally and are completely transparent to the user. We can identify four major modules, listed below:

• File Manipulation Module
• Virtual File Allocation Table Module
• User Profile Module
• Policy Module

7.2.1. File Manipulation Module (FMM)

This module is responsible for three major services:

• Management of files.
• Allocation of physical storage.
• Sustaining availability of files.

A well-defined interface to this module has been created, enabling the provision of the above services. The file management service incorporates a set of commands which allow a user to manipulate files in the VPFS system, such as:

• Placing files onto the VPFS system from the local file store.
• Copying files from one virtual location to another within the VPFS system.
• Deleting and retrieving files.
• Finding the appropriate data store for files based on the policies chosen by the user.
• Manipulating directories.

This module is also responsible for the allocation of peers' physical storage to the VPFS system. Each peer that participates in a VPFS network must define whether or not it provides physical storage that can be utilised by the VPFS system. This definition is performed by this module, enabling the VPFS system to store files on that peer.

In addition, the FMM, in conjunction with the VFAT module (described below), is accountable for the availability of files held in the VPFS system. As peers connect to or disconnect from a network dynamically, the availability of files may be affected. The FMM incorporates a replication mechanism which enables the replication of files across peers and thus aids in maintaining the availability of files.
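One plausible form of the storage-allocation decision, assuming peers advertise their available space (as in the advertisement of Figure 5-1), is to pick the candidate with the most free room. The report does not specify the actual selection rule, so the class below is only a sketch.

```java
import java.util.Map;

// Sketch of one plausible allocation rule: among candidate storage peers
// (peer ID -> advertised available space in bytes), pick the peer with
// the most free space that can still hold the file. The actual selection
// algorithm used by the FMM is not specified in the report.
public class StoreSelector {
    // Returns the chosen peer ID, or null if no peer has enough room.
    public static String pickPeer(Map<String, Long> availableSpace, long fileSize) {
        String best = null;
        long bestFree = -1;
        for (Map.Entry<String, Long> e : availableSpace.entrySet()) {
            if (e.getValue() >= fileSize && e.getValue() > bestFree) {
                best = e.getKey();
                bestFree = e.getValue();
            }
        }
        return best;
    }
}
```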


7.2.2. Virtual File Allocation Table Module (VFAT)

This module is responsible for the following services:

• Maintaining the directory tree.
• Providing file location transparency.
• Maintaining attributes of files.
• Sustaining availability of files.

The defined interface of this module allows the provision of the above services. The VFAT module's primary role is to maintain the directory tree structure of the files and directories held in the VPFS system. There is a single directory tree structure, common to all peers that participate in the VPFS system, that enables users to organise their files and directories. This organisation of files is completely location independent via the use of virtual directories and virtual pathnames. Maintenance of this directory tree structure involves the VFAT performing the following functions:

• Creating new entries in the tree structure when users create new files/directories.

• Modifying existing entries in the tree structure when users copy files/directories from one virtual location to another. Modification of the tree structure can also occur when the replication mechanism takes place or when peers connect to or disconnect from the network.

• Deleting existing entries in the tree structure when users delete files/directories.

The VFAT module is also accountable for providing file location transparency to the users. Users should be unaware that files are stored on different machines; they should view the system as a single integrated storage facility. In order to provide file location transparency, the VFAT module incorporates mechanisms to map virtual pathnames of files to their physical locations. This eliminates the need for users to know which machines hold which files. Each entry in the directory tree structure constitutes what we call an inode. This inode contains all the attributes of the particular file/directory for which the entry in the directory tree structure has been made. Some examples of such attributes are the following:

• Owner.
• Access control.
• Name.
• Virtual pathname.
• Physical location.
• Number of replicas.

The VFAT module, in conjunction with the FMM, is responsible for maintaining the availability of files (as mentioned in the FMM section above). Via the use of the inodes, the VFAT module can identify the number of replicas each file has. When this number drops below a defined threshold, the VFAT module informs the FMM in order to deploy the replication mechanism.
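The replica-threshold check can be sketched as below; the field names and the threshold value of two copies are illustrative assumptions, not values taken from the implementation.

```java
import java.util.ArrayList;
import java.util.List;

// Sketch of an inode entry holding some of the attributes listed above,
// plus the replica check that would trigger the FMM's replication
// mechanism. Field names and the threshold of two copies are assumptions.
public class InodeSketch {
    public static final int REPLICA_THRESHOLD = 2; // assumed minimum copies

    public String owner;
    public String virtualPath;
    public final List<String> physicalLocations = new ArrayList<>(); // peer IDs

    public int replicaCount() {
        return physicalLocations.size();
    }

    // True when the VFAT module should ask the FMM for more replicas.
    public boolean needsReplication() {
        return replicaCount() < REPLICA_THRESHOLD;
    }
}
```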

7.2.3. User Profile Module (UPM)

This module is responsible for the following service:


• Maintaining profiles.

The UPM is responsible for providing and maintaining user/group profiles. The user profiles contain information about the user, such as their name, username and password, and the administrative capabilities they have. For each user, a user profile is created. Group profiles, on the other hand, contain information and characteristics of the group, such as a list of all its members. The UPM is accessed by the administrator of a group and allows them to perform the following operations:

• Create new user/group profiles.
• Modify user/group profiles.
• Delete user/group profiles.

The UPM is also used for authentication processes. When users log onto the VPFS system, the UPM will be accessed in order to authenticate the credentials of the user.

7.2.4. Policy Module (PM)

This module is responsible for the following services:

• Providing policies.
• Maintaining policies.

The VPFS system allows users to store files based on policies they choose. These policies are provided by the Policy Module. The FMM is responsible for choosing the appropriate file store based on the policy defined by the user: it contacts the PM to obtain information about the chosen policy, the PM sends the corresponding policy object, and the FMM executes it. The PM is also accessed by the administrator and allows them to perform the following operations:

• Create new policies.
• Modify existing policies.
• Delete policies.

In order to give administrators the capability of creating new policies, a master policy is held in the PM, and all new policies must conform to its structure. The master policy has the following attributes:

• Name of the policy.
• Description of the policy.
• The parameters of the policy.
• The code of the policy.

7.3. Overview of the System

A high-level overview of the system is shown below, where four main layers of operation are identified:


[Figure: the layers of the system — the Application Layer, on top of the VPFS Layer, on top of the Communication Layer, all built on the OS + existing file systems.]

Figure 7-1: Layers of the System

The Application layer represents any application developed to use the VPFS system as its underlying file system. This could be a Unix-style command tool or a window-based graphical tool adapted to use VPFS. The VPFS service layer is the core layer: it is the layer in which the system provides its services through the modules (FMM, VFAT, UPM and PM) defined in the previous section. These services are offered to the application layer via a well-defined interface, enabling users to organise and manage their files and directories. The communication layer represents the JXTA framework, which provides the establishment of a peer-to-peer environment. This layer incorporates the means by which hosts in the system communicate with each other via the exchange of request/response messages, and allows files to be transferred across hosts. Finally, all the above-mentioned layers are built on top of existing file systems and operating systems. The layers that comprise the VPFS system itself are the application and VPFS ones. These layers, when broken down into their individual components, give a more detailed structure of the overall system, as illustrated below.


[Figure 7-2 breaks the application and VPFS layers into their components: the application layer contains command line tools (a user shell) and adapted applications; the VPFS layer exposes the VPFS Access Interface, beneath which sit the File Manipulation, User Profile, VFAT and Policy service modules.]

Figure 7-2: Detailed Structure

Each service provided by the VPFS system corresponds to a module. Each individual module handles the communication between peers that is required to take place in order to provide the corresponding service. Hence communication between peers always takes place within the context of a particular service. The VPFS access interface is the interface through which applications use the VPFS system. It defines a set of file operations, administration operations, and user authentication operations that can be called by those applications. It is within this layer that we define the activities that each service module must perform when a particular operation is called.

7.4. Structure of a Peer

Peers that connect to the VPFS system should be capable of providing any of the VPFS services. The VPFS system should allow administrators of groups to set which peers provide which services; for example, they could allow only certain peers to provide the VFAT service. The set of services provided by a particular peer is defined by its Peer Profile. A Peer Profile, which is a simple XML document, is created for, and held on, each peer of the VPFS system. It defines whether or not the peer provides a particular VPFS service and the characteristics of the service provided. The Peer Profile also specifies whether the host provides users with an access point to the VPFS system. The initialisation module is responsible for initialising each of the service modules based on the values in the Peer Profile. If the peer provides one or more of the services described above, the service modules are responsible for advertising the characteristics of the services provided. The initialisation module is dynamic: at any time the administrator can change the services provided by a peer within the group, and the initialisation module will then start up or shut down the appropriate service. The following diagram illustrates the structure of a node:


[Figure 7-3 shows the structure of a peer: an application sits on top of the VPFS Access Interface; the initialisation module reads the Peer Profile and starts the File Manipulation, Policy, User Profile and VFAT service modules; the JXTA framework (services and communication channels) sits beneath them; and the OS with its local file system is at the bottom.]

Figure 7-3: Peer Structure
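A Peer Profile of the kind described in this section might, as a hypothetical sketch (the element names are ours, not the report's schema), describe a peer that stores files and offers users an access point but provides no other services:

```xml
<PeerProfile>
  <AccessPoint>true</AccessPoint>
  <Services>
    <FileStorage enabled="true" home="/usr/vpfs/files" secure="false"/>
    <VFAT enabled="false"/>
    <UserProfile enabled="false"/>
    <Policy enabled="false"/>
  </Services>
</PeerProfile>
```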

7.5. Deployment: Peer Groups and Communication Between Peers

The creation of a VPFS network of peers is achieved via the JXTA framework. This framework allows us to create logical groups, which have two main roles:

• Act as a control mechanism, which will restrict who may have access to services of the VPFS system.

• Allow us to discover the set of peers forming the group and their locations, and to establish communication channels between them by specifying specific rendezvous points, which act as gateways to the rest of the network.

It is possible to create peer groups within peer groups. This would be useful, for example, if a network were to span two cities, e.g. New York and London: a user who accesses the system from London would have their files and replicas stored on data stores in the London subgroup. We have decided, however, that this is not a sufficiently important feature to have in the core of the system. In our vision of the VPFS system, the organisation of groups is simplified. A user can create their own VPFS group of peers, for which they will be the root administrator, and they also specify a root directory. Each peer may only belong to a single peer group at a time. These groups are such that several VPFS peer groups may coexist within the same LAN or other environment without interfering with each other.


Chapter 8: System Design

8.1. File Manipulation Service Module

The File Manipulation Service Module (FMM) provides the core services of the VPFS system that allow users to manipulate files within their file space. Its main responsibilities are peer selection, file transfer and file storage, carried out according to the storage policies specified by the user. The functions offered by the FMM are as follows:

Peer selection: the FMM selects peers for file storage according to the storage policies defined by the user.
File transfer: after peer selection, the FMM transfers the file to the selected peers.
File storage: when the destination peer receives the file, the FMM stores it locally in the file space allocated to the VPFS system.
File encryption: if the user specifies a security level for a particular file, that file is encrypted before it is transferred to the destination peer.
File retrieval: files stored on the VPFS system can be retrieved by the user and stored on their local file system.
File deletion: users can delete files from their file space in the VPFS system; deletion covers all replicas of the file as well as the file inode.

In order to achieve location transparency, all files and directories stored in the VPFS system are handled by their virtual path names. The mapping between virtual and physical path names is stored in the VFAT service module, described later in this chapter. The FMM is responsible for returning the physical locations of a file and its replicas to the VFAT.

8.1.1. Peer Selection for File Storage

In order to store a file and its replicas on the VPFS system, the FMM needs to choose a set of peers with appropriate file storage properties. The first property the FMM checks is whether the peer provides the file storage service, since not all peers are capable of storing users' files. For example, a peer in the VPFS network may be a personal computer with very limited hard disk space, so it cannot provide the file storage service. Whether a peer provides the file storage service is recorded in the peer profile, which is created by the administrator during initialisation. Beyond this, some other characteristics need to be checked during peer selection:

• Available space: the peers with the most available space are preferred as candidates.
• Security level: since not all peers provide secure file storage, this factor is taken into account when the user specifies a security level for a file.


8.1.2. Replication

Currently the VPFS system has three replication levels (1, 2 and 3), which can be specified by the user. When putting a file onto the VPFS system, a user can specify a replication level for the destination file. For example:

put hello.txt test.txt -rep:2

In this case replication level 2 is assigned to destination file test.txt. According to this replication level, the FMM can obtain the number of replicas that need to be created for destination file test.txt from the StoragePolicies class in the Policy Module. Moreover, each replication level has a high threshold and a low threshold. This is described as follows:

Replication level    Low threshold (replicas)    High threshold (replicas)
        1                      1                            2
        2                      2                            4
        3                      4                            8

Table 8-1: Replication Thresholds

The main purpose of setting low and high thresholds for each replication level is to make the VPFS system more robust. In the above example, when the FMM receives the request to put a file with destination name test.txt and replication level 2, it first tries to find 4 appropriate peers to store the 4 replicas (the high threshold). Sometimes, however, the FMM might not find enough peers to store all the replicas; in that case, if it can find 2 appropriate peers (the low threshold), the put operation still succeeds rather than throwing an exception.

In order to maintain the replication level of each file, the VFAT module monitors the number of replicas currently available for the file. If it reaches the low threshold of the specified replication level, the VFAT calls the FMM to create more replicas, up to the high threshold. Continuing the example, if the current number of replicas for test.txt is 1, which is lower than the low threshold of replication level 2, the VFAT will call the FMM to create another 3 replicas to meet the high threshold. Had there been a single fixed value for each replication level, the replication process would need to be initiated whenever a peer holding a replica disconnected, which would be wasteful if that peer reconnected within a short amount of time. The threshold mechanism ensures that the replication process occurs less frequently.
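The threshold bookkeeping described above can be sketched as follows; the class and method names are illustrative, not taken from the actual implementation:

```java
public class ReplicationThresholds {
    // Table 8-1: replication level -> { low threshold, high threshold }
    private static final int[][] THRESHOLDS = { { 1, 2 }, { 2, 4 }, { 4, 8 } };

    public static int low(int level)  { return THRESHOLDS[level - 1][0]; }
    public static int high(int level) { return THRESHOLDS[level - 1][1]; }

    // Number of new replicas the VFAT asks the FMM to create: none while the
    // current count stays above the low threshold, otherwise enough to
    // restore the high threshold.
    public static int replicasToCreate(int level, int current) {
        if (current > low(level)) return 0;
        return high(level) - current;
    }
}
```

For example, with level 2 and only one replica left, replicasToCreate returns 3, matching the test.txt scenario in the text.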

8.1.3. The FileManipulationService Interface

The following functions are defined in the File Manipulation Module interface:


public String[] putFile(String fileName, String destVPath, StoragePolicies sp)
public void getFile(String peerID, String sourceFileName, String targetFileName)
public boolean deleteFile(Vector peerIDs, String vPathName)
public String[] copyFile(String sourceID, String sourcePathName, String vPathName, StoragePolicies sp)

Figure 8-1: The FileManipulationService Interface

The FMM provides the following services:

• Put File: a user can put a file from their local file system onto the VPFS system. To do so, the user specifies the path name of the source file in the local file system and the virtual path name in the VPFS system. The FMM finds a set of peers to store the replicas of the new file, returns the locations of those peers, and then creates a new inode for the file in the VFAT module. As mentioned above, the mappings between the virtual path name and the physical locations are stored in the VFAT module. To maintain the integrity of the system, a two-phase commit protocol is used: the final storage request is sent only when all the selected peers are ready to receive the file; otherwise a rollback is sent.

• Get File: a user can obtain a file from the VPFS system and store it on their local file system by specifying the virtual path name of the file. On receiving the request, the FMM retrieves the file from one of the peers holding a replica and stores it in the user's local file system.

• Delete File: a user can delete a file from the VPFS system by specifying the virtual path name of the file. The FMM then deletes the file from all the peers holding replicas, as well as the inode in the VFAT module. Here there is a difference from Put File: if a peer fails to delete the file, the delete operation as a whole still succeeds. The inconsistent state is recovered during synchronisation, i.e. when the peer synchronises its files with the VFAT upon start-up, it will find that the file no longer exists in the VPFS system and will delete it straight away.

• Copy File: a user can create a copy of an existing file in the VPFS system by specifying the source and target file names. This operation is slightly more involved than the others. On receiving the request, the FMM finds a peer holding one of the replicas of the source file and sends the request to that peer. The FMM on that peer then finds a set of appropriate peers to store the new replicas, based on the storage policies specified by the user. Finally, the FMM creates an inode for the new file.
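The two-phase commit used by Put File can be sketched as follows; the Peer interface here is a stand-in for the real peer communication, not the report's actual classes:

```java
import java.util.List;

public class TwoPhaseCommitPut {

    // Hypothetical view of a selected storage peer.
    interface Peer {
        boolean prepareToReceive(String vPath); // phase 1: "ready to receive?"
        void commitStore(String vPath);         // phase 2: final storage request
        void rollback(String vPath);            // abort: discard any preparation
    }

    // Returns true only if every selected peer was ready and the file was stored.
    public static boolean put(List<Peer> selected, String vPath) {
        // Phase 1: ask every selected peer whether it is ready for the file.
        for (Peer p : selected) {
            if (!p.prepareToReceive(vPath)) {
                // One peer is not ready: roll back on all peers and fail the put.
                for (Peer q : selected) q.rollback(vPath);
                return false;
            }
        }
        // Phase 2: all peers answered "ready", so send the final storage request.
        for (Peer p : selected) p.commitStore(vPath);
        return true;
    }
}
```

The design choice matches the text: the file is only committed when every replica holder has acknowledged readiness, so a partial store never becomes visible.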

8.1.4. File Handling

This section describes in detail how a file is handled in the FMM.

• File Transfer

File transfer is one of the critical requirements of the FMM. Within JXTA, communication between peers can be achieved through pipes, and as the VPFS system is built on top of the JXTA platform, pipes are used for file transfer. When sending a file to peers, several bi-directional pipes need to be created; these pipes are then used for the file transfer as well as for control messages.


If a peer provides the file storage service, an accept pipe will be created during initialisation. This pipe will be continually listening for incoming connection requests for file transfer. Moreover, this accept pipe must be advertised using a pipe advertisement so that other peers in the group can be aware of its existence and establish a connection to it. When transferring a file, all the information about this file as well as file data is encapsulated into an object called FileStorage. Within FileStorage, a couple of methods have been defined to build and parse a message which is the unit of transfer used by a JXTA pipe. The class is represented as follows:

FileStorage
  - fileData : byte[]
  - vPathName : String
  + store() : void
  + readFile() : FileStorage
  + makeJXTAMessage() : Message
  + parseJXTAMessage() : FileStorage
  + encryptFile() : byte[]
  + decryptFile() : byte[]

Figure 8-2: The FileStorage Class
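The makeJXTAMessage/parseJXTAMessage pair amounts to a build/parse roundtrip of the file's metadata and data. A minimal sketch, with a Map standing in for the JXTA Message type (the element names are our assumption), could look like this:

```java
import java.nio.charset.StandardCharsets;
import java.util.HashMap;
import java.util.Map;

public class FileStorageSketch {
    final String vPathName;
    final byte[] fileData;

    FileStorageSketch(String vPathName, byte[] fileData) {
        this.vPathName = vPathName;
        this.fileData = fileData;
    }

    // makeJXTAMessage: encapsulate the file's metadata and data into a message.
    Map<String, byte[]> toMessage() {
        Map<String, byte[]> msg = new HashMap<>();
        msg.put("vPathName", vPathName.getBytes(StandardCharsets.UTF_8));
        msg.put("fileData", fileData);
        return msg;
    }

    // parseJXTAMessage: rebuild a FileStorage object on the receiving peer.
    static FileStorageSketch fromMessage(Map<String, byte[]> msg) {
        return new FileStorageSketch(
            new String(msg.get("vPathName"), StandardCharsets.UTF_8),
            msg.get("fileData"));
    }
}
```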

In our design, we also provide a way to fragment files, if necessary: a FileStorage object can be split so that each object contains a fragment of the file. This is very useful when transferring a large file, as the fragments can be distributed to several peers rather than the whole file being put onto one peer, making the system more robust.

• File Storage

File storage in VPFS still relies on existing local file systems, so a mapping is needed between the virtual pathname in VPFS and the physical pathname used on the local file system. In our design, every peer providing the file storage service creates a directory on its local file system, referred to as VPFS_HOME, under which all VPFS files are stored. The physical location of VPFS_HOME for each peer is recorded in the peer profile. Within the VPFS_HOME directory, the organisation of the files maps to the virtual pathnames exactly. For instance, if a file is created in the VPFS system with the path name /VPFSGroup/David/personal/hello.txt and the path name of VPFS_HOME on the peer's local file system is /usr/vpfs/files, then the path name of hello.txt on that peer's local disk will be /usr/vpfs/files/VPFSGroup/David/personal/hello.txt.

• Synchronisation

Within the VPFS system, the replicas of a file are distributed across several peers, so each peer must synchronise its files with the VFAT upon start-up: while the peer was offline, the files for which it holds replicas could have been modified. The peer therefore verifies the hash value of each replica it holds against the one stored in the VFAT. If the two values differ, the peer removes the local replica, as the replication mechanism would already have created the appropriate number of replicas if needed.
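The path mapping and the start-up hash check can be sketched as follows; the method names and the choice of SHA-1 are our assumptions, as the report does not name the hash algorithm:

```java
import java.nio.file.Path;
import java.security.MessageDigest;

public class ReplicaSync {

    // e.g. VPFS_HOME = /usr/vpfs/files and vPath = /VPFSGroup/David/personal/hello.txt
    // map to /usr/vpfs/files/VPFSGroup/David/personal/hello.txt
    public static Path physicalPath(Path vpfsHome, String vPath) {
        return vpfsHome.resolve(vPath.substring(1)); // drop the leading '/'
    }

    public static String hashOf(byte[] data) throws Exception {
        StringBuilder hex = new StringBuilder();
        for (byte b : MessageDigest.getInstance("SHA-1").digest(data))
            hex.append(String.format("%02x", b));
        return hex.toString();
    }

    // A replica is stale (and should be removed) when its hash no longer
    // matches the hash recorded in the VFAT inode.
    public static boolean isStale(byte[] localReplica, String vfatHash) throws Exception {
        return !hashOf(localReplica).equals(vfatHash);
    }
}
```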

8.1.5. File Manipulation Module Component Overview

The following figure illustrates the individual components of the FMM and their relationships. Each component is described in detail below:

interface FileManipulationService
  + putFile() : String[]
  + getFile() : void
  + deleteFile() : boolean

interface QueryHandler
  + processResponse() : void
  + processQuery() : int

FileManipulationServiceImpl  (implements both interfaces; creates the listener)
  - advHandler : FMMAdvertisementHandler
  - discovery : DiscoveryService

FMMPipeListener  (spawns a ConnectionHandler per connection)
  - acceptPipe : BidirectionalPipeService.AcceptPipe
  - connectionHandlers : Vector
  + run() : void
  + kill() : void

ConnectionHandler
  - chPipe : BidirectionalPipeService.Pipe
  + run() : void
  + end() : void

Figure 8-3: FMM Class Diagram

The FileManipulationServiceImpl is the implementation of the FileManipulationService interface, and handles most of the functionality of the module. During initialisation of this service, the FileManipulationServiceImpl will proceed to get peer properties and can determine whether or not the peer provides a file store service. The FileManipulationServiceImpl will also generate a new FMMAdvertisementHandler object to discover and publish advertisements. If the peer profile states that the peer offers the file storage service, the FileManipulationServiceImpl will create a new FMMPipeListener thread. This thread will be listening for incoming file transfer connection requests. When a connection has been established, a new JXTA pipe is created between the two peers. The FMMPipeListener will generate a new ConnectionHandler thread to deal with the pipe that was created, handle the receipt of the file and the exchange of control messages. The FileManipulationServiceImpl also implements the QueryHandler interface of the JXTA framework, which allows the exchange of control messages with the FileManipulationServiceImpl in other peers.
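The listener/handler threading pattern just described can be sketched as below. A BlockingQueue stands in for the JXTA accept pipe; the real module uses BidirectionalPipeService pipes, so this illustrates only the spawning behaviour:

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.CountDownLatch;

public class ListenerSketch {

    // One handler thread per established connection: it would receive the
    // file and exchange control messages over the pipe it was given.
    static class ConnectionHandler extends Thread {
        private final String connection;
        private final CountDownLatch done;
        ConnectionHandler(String connection, CountDownLatch done) {
            this.connection = connection;
            this.done = done;
        }
        @Override public void run() {
            // ...receive file data and control messages for 'connection'...
            done.countDown();
        }
    }

    // The listener blocks on the accept pipe and spawns a handler per request.
    static class PipeListener extends Thread {
        private final BlockingQueue<String> acceptPipe;
        private final CountDownLatch done;
        PipeListener(BlockingQueue<String> acceptPipe, CountDownLatch done) {
            this.acceptPipe = acceptPipe;
            this.done = done;
        }
        @Override public void run() {
            try {
                while (true) {
                    String conn = acceptPipe.take(); // wait for a connection
                    new ConnectionHandler(conn, done).start();
                }
            } catch (InterruptedException e) {
                // kill() interrupts take(); the listener shuts down here
            }
        }
        void kill() { interrupt(); }
    }
}
```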

8.2. Virtual File Allocation Table Service Module (VFAT)

The VFAT service module organises files hierarchically, i.e. in the typical directory tree structure used in most file systems. It maintains and manages this directory structure along with the properties of every file and directory in it. It consists of a table which holds the mappings of virtual to physical pathnames. Virtual pathnames are the names that users give to files or directories, whereas physical pathnames are the addresses of the locations in the data store where files or directories are held. Thus location transparency is achieved, as a user need never be aware of the physical location of a file: to access it, they need only provide its virtual pathname. The VFAT is a collection of structures called inodes, which hold the attributes of a file or directory. In the case of files, these attributes are:

• File name
• Virtual pathname
• Physical pathnames (files are replicated across several hosts; see section 8.1.2)
• Access control lists
• Storage characteristics
• Creation date
• Last modification date
• Flag indicating whether the file has been locked

In the case of directories, these attributes are:

• Directory name
• Virtual pathname
• Access control lists
• Storage characteristics of the files residing within

The VFAT is stored in XML documents on a peer's file system, with the contents organised as a tree to reflect the directory structure. As the VFAT grows, though, it may become inefficient to hold it on a single peer: traversing down a large tree to locate an inode becomes time-consuming, and processing such a large file may also become memory and CPU intensive. Thus, to achieve scalability and robustness, the VFAT has been designed to be self-fragmenting: as inodes are created and the structure grows to a pre-specified limit, some inodes are moved to other peers that are capable of, and have been authorised by the administrator to, provide the VFAT service. A peer that provides this service therefore not only has to maintain its own VFAT fragment, but also has to locate and retrieve, on request, inodes held on other peers. In order for a peer to be able to locate inodes on a remote host, the latter has to advertise the fragment of the VFAT it holds.
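A file inode holding the attributes listed above might be serialised along the following lines; this is a hypothetical sketch (element names and values are ours, as the report does not give the XML schema):

```xml
<FileInode>
  <Name>hello.txt</Name>
  <VirtualPath>/VPFSGroup/David/personal/hello.txt</VirtualPath>
  <PhysicalPaths>
    <Peer id="peerB"/>
    <Peer id="peerC"/>
  </PhysicalPaths>
  <ACL>
    <Entry user="David" permissions="rw"/>
  </ACL>
  <StorageCharacteristics replication="2"/>
  <CreationDate>2003-02-14</CreationDate>
  <LastModified>2003-03-01</LastModified>
  <Locked>false</Locked>
</FileInode>
```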

8.2.1. VFAT Module Design Overview

During the design of the VFAT, there were several important issues that needed to be dealt with. Probably the most important one is the management of inodes: done correctly, it allows for more efficient storage and retrieval of inodes within the structure. Moreover, as mentioned above, it would be inefficient to store the VFAT on a single peer, so a clever and efficient algorithm for fragmenting it and merging it (as inodes get deleted) needs to be devised.


Managing Inodes

As mentioned above, it was decided to store inodes in a tree structure. One of the main reasons for this approach is that it provides an efficient way of traversing the nodes for insertion and deletion. Moreover, it is similar to the file structures used in most file systems, and is thus easy to visualise and comprehend. As such, it has a parent node that points to several child nodes, which can themselves be parent nodes of their own child nodes. Since the VFAT is fragmented across several peers, overall it will have several parent nodes. The tree model in the figure below gives a visual example of the structure a VFAT fragment might have:

Figure 8-4: An Example VFAT Tree

A parent node is a directory, whereas its child nodes can either be files or subdirectories. Files are modelled as leaf nodes that have no child nodes. Their full pathnames can be formed using a concatenation of the node names that are encountered as the tree is traversed from its parent node to the file in question. For example, the full pathname of file q.java is /ucl/cs/student/dcnds/q.java. A directory inode can only be deleted if it is empty. Otherwise, all its files and directories have to be deleted first. During the initialisation process, a single main root directory is created and it only exists on the first host that has been initialised. Thus any files or directories created thereafter exist within the context of this root directory. As VFAT is fragmented, different peers will have different root directories. These represent directory inodes that have been moved there as part of the fragmentation process and have become the root directories in the VFAT fragments of those peers. The figure below presents the structure of the inode and XML related classes.

[Figure 8-4 depicts a VFAT fragment rooted at directory ucl, with subdirectories eng, cs, mech, student, dcnds and year1, and file (leaf) nodes a.txt, k.doc, m.sip, d.gs, q.java, s.txt and f.doc.]


Inode
  - name : String
  - owner : String
  - virtualPath : String
  - creationDate : Calendar
  - lastModDate : Calendar
  - groupName : String
  - permissions : Boolean[]
  - replication : int
  + setName() : void
  + setVPath() : void
  + setOwner() : void

FileNode
  - hash : String
  - storeLocation : ArrayList
  - fileSize : Long
  - writeLock : Boolean
  + setHash() : void
  + addStoreLocation() : void
  + setFileSize() : void

DirNode
  - contents : Vector
  - isRoot : Boolean
  - isLocal : Boolean
  - noOfFiles : int
  - totalNoOfFiles : int
  + addEntry() : void
  + setIsRoot() : void
  + setIsLocal() : void
  + setNoOfFiles() : void

interface XMLInode
  + readDocument() : void
  + getDocument() : Document
  + getInodeType() : String

XMLFileNode and XMLDirNode implement XMLInode.

Figure 8-5: Inode Class Diagram

Fragmentation

Assuming the VPFS system starts off with its VFAT residing on a single peer, as the number of file and directory inodes grows over time there will come a point at which manipulating the inodes becomes expensive. To avoid this, fragmentation can be used: some file and directory inodes are moved from the current host to another. The moved inodes must then be accessible transparently, as if they were still on the original host. In determining the fragmentation process, there are a couple of issues that need to be addressed:

a. Which files and directories should be moved: Clearly, there need to be some criteria by which files and directories are selected for fragmentation, otherwise there could be inconsistencies between the hosts. Assume, for example, that some files are randomly removed from directory /ucl/cs, which resides on host A, and are placed on host B. The two hosts now have different views of the same directory, as they hold different contents. Such an inconsistency may not cause any errors, but will most likely result in delays in file requests, caused by inefficient searches, as files now have to be searched for across multiple hosts.


b. How many files and directories should be moved:

The number of inodes to be moved should be carefully considered. If the number of inodes moved is too big, the new host might be overloaded, whereas if it is too small, the original host will soon be overloaded again.

In consideration to the issues above, the following design decisions have been made:

a. Maximum and minimum fragment sizes: Maximum and minimum fragment sizes will be set for each host. The first refers to the maximum number of inodes that a host can hold before fragmenting its VFAT, whereas the second refers to the minimum number of inodes that will be moved. By setting a limit to the minimum number of inodes to be moved, the original host is allowed to grow for quite some time before reaching its maximum fragment size again. Moreover, as already mentioned, it prevents the new host from being immediately overloaded. The algorithm for choosing suitable fragment values could well be a small project on its own. Thus, it has not been investigated and it is assumed, for now, that the administrator specifies them during the host’s initialisation.

b. Unit of fragmentation: It has been decided that fragmentation should only occur at directory level, which means that only one directory inode and its branches will be moved, instead of moving single files from different directories. Thus, delays resulting from inefficient searches, as mentioned above, are avoided and all related files reside in the same host. The figure below gives an example of a directory inode that is moved, along with all its branches:

[Figure 8-6 shows a fragment being moved: in a tree rooted at ucl, with children eng and cs and with student below cs, directory inode cs is chosen, and all its files and subdirectories are moved with it.]

Figure 8-6: A Fragment that is moved

c. Ideal fragment size: When choosing the directory inode to be moved, it seems reasonable to choose one whose total number of inodes is closest to, and above, the minimum fragment size. Thus, each directory needs to keep track of the number of file and directory inodes it holds, including the number of inodes held by all its subdirectories. The total number of inodes on a host is, therefore, determined recursively by the sum of the inodes of all the parent directory's child nodes. If two candidates satisfy the criteria, the most recent directory inode is chosen as the ideal one. Assuming the minimum fragment size on the host is 6, the figure below shows the ideal directory to be moved (next to each directory, the total number of inodes it holds is shown):

Directory     Total inodes held
student       2
cs            4
eng           1
ucl (root)    7

Ideal directory to be moved is cs

Figure 8-7: Total inodes held by each directory
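The "closest to and above the minimum" selection rule can be sketched as follows; the names are illustrative, and the tie-break on the most recent directory is omitted:

```java
import java.util.Map;

public class FragmentSelector {

    // Pick the directory whose total inode count is the smallest value that
    // is still at or above the minimum fragment size. Returns null if no
    // directory is large enough (the host then keeps growing and retries
    // at the next fragmentation attempt).
    public static String idealDirectory(Map<String, Integer> totals, int minFragmentSize) {
        String best = null;
        for (Map.Entry<String, Integer> e : totals.entrySet()) {
            if (e.getValue() < minFragmentSize) continue;      // too small to move
            if (best == null || e.getValue() < totals.get(best))
                best = e.getKey();                             // closest from above
        }
        return best;
    }
}
```

With the non-root counts from Figure 8-8 (student 2, cs 4, eng 1) and a minimum fragment size of 3, the method selects cs.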

The fragmentation process involves 2 steps:

a. Move the fragment: Assume that host A in the figure below has a minimum fragment size of 3 and a maximum fragment size of 7. If a file is added to /ucl/eng, the host will have exceeded its maximum fragment size of 7, so fragmentation must occur and an ideal directory to be moved must be chosen:

Figure 8-8: Fragmentation of the VFAT tree stored on host A

[Figure 8-8 shows host A's tree rooted at ucl (7 inodes in total), with children eng (1) and cs (4), and student (2) below cs. A file is added in /ucl/eng, so the total number of inodes becomes 8 and host A needs to fragment: the cs fragment is chosen and placed on host B, where /ucl/cs becomes a root directory.]


As is shown above, root directories are created through fragmentation. Each one is unique and will be used as the access point to search for files or directories in the host they reside. Their full pathnames are advertised in JXTA advertisements. A fragment can be moved to any empty host which has sufficient space to store it. However, if no such host exists, the fragment will not be removed from the original host. The latter will be allowed to grow and another attempt for fragmentation will be made the next time round, where hopefully there will exist another directory with smaller, but still suitable, fragment size.

b. Create a remote directory:

As can be seen in the figure above, after the VFAT is fragmented there is no indication that directory cs exists below ucl, or that they used to be linked as parent and child. That produces inconsistent views of the VFAT model on the two hosts: the model on host A shows that ucl has only eng as its child, which is not correct. Moreover, if directory eng were deleted at some point, an attempt to delete directory inode ucl would wrongly appear to be valid, even though ucl still has directory cs as a child and should not be removed.

It is therefore necessary for some kind of link to be indicated on host A. This could take the form of an empty directory created in the place where the moved directory used to exist, named after it; this is referred to as a remote directory. A remote directory is sufficient to indicate that ucl used to have a child called cs that has been moved to another host. Therefore, assuming that eng does not exist, a request to delete ucl will be rejected, as it will be clear that the latter still has another child. The tree models on both hosts now provide consistent views of the links between them:

Figure 8-9: A Remote Directory is created on host A after fragmentation

The only case where the creation of a remote directory will not be required is if the root directory is moved to a new host, as it does not have a parent.

[Figure content: host A now holds ucl → eng together with an empty remote directory cs, while host B holds the real cs subtree with student below it. A remote directory for cs has been created on host A.]
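The delete check that the remote directory enables can be sketched as follows; the class and method names are hypothetical and the structure deliberately minimal, but the rule mirrors the text above: a remote directory counts as a child, so deleting ucl is refused even after the real cs subtree has migrated to another host.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative sketch of the remote-directory delete check described above.
public class RemoteDirCheck {
    enum Kind { LOCAL, REMOTE }

    static class Dir {
        final String name;
        final Kind kind;
        final List<Dir> children = new ArrayList<>();
        Dir(String name, Kind kind) { this.name = name; this.kind = kind; }
    }

    // A directory may only be deleted when it has no children at all,
    // whether real local subtrees or remote-directory placeholders.
    static boolean canDelete(Dir d) { return d.children.isEmpty(); }

    public static void main(String[] args) {
        Dir ucl = new Dir("ucl", Kind.LOCAL);
        ucl.children.add(new Dir("cs", Kind.REMOTE)); // placeholder left by fragmentation
        System.out.println("can delete ucl: " + canDelete(ucl)); // false: remote child remains
    }
}
```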


Locating Inodes
As shown above, the fragmentation results in the inodes being distributed across the network. Thus, whenever there is a request for an inode, the latter will have to be located first. As mentioned above, each host advertises its root directories via JXTA advertisements. Thus, a request can discover the host whose root pathname best matches the pathname of the inode in question. This is called the closest root pathname match. In this way, the root directory of the host that is most likely to be the parent of the requested inode, or the inode itself, can be identified. This approach is best illustrated by the following example:

Figure 8-10: Closest Root Pathname Match Approach

As can be seen from the figure above, host A requests that a file is created in directory /ucl/cs/dcnds/z15. After receiving all advertisements from hosts B, C and D, it will make a comparison between the advertised roots and the required directory inode (i.e. /ucl/cs/dcnds/z15) to find which root matches it closest. This comparison is shown below:

Root advertised      Path needed            Match
/aston/eng/mech      /ucl/cs/dcnds/z15      No match
/ucl                 /ucl/cs/dcnds/z15      Matched by 1
/ucl/cs/dcnds        /ucl/cs/dcnds/z15      Matched by 3

Thus, root directory /ucl/cs/dcnds in host D is found to provide the closest root match. Therefore, the request is sent to it to locate /ucl/cs/dcnds/z15 and create the inode.
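The closest-root-pathname-match comparison can be sketched as follows. The method names are illustrative, not the actual VPFS API; the rule assumed here, consistent with the match tables in this chapter, is that an advertised root matches only when it is a full prefix of the requested pathname, and the match depth is the number of shared path components.

```java
import java.util.Arrays;
import java.util.List;

// Sketch of the "closest root pathname match" comparison described above.
public class ClosestRootMatch {
    // Number of path components an advertised root shares with the requested
    // pathname; 0 unless the whole root is a prefix of the path.
    static int matchDepth(String root, String path) {
        String[] r = root.split("/"), p = path.split("/");
        if (p.length < r.length) return 0;     // root cannot be a prefix
        int depth = 0;
        for (int i = 0; i < r.length; i++) {
            if (!r[i].equals(p[i])) return 0;  // diverges: no match at all
            if (!r[i].isEmpty()) depth++;      // skip the empty leading component
        }
        return depth;
    }

    // Pick the advertised root with the deepest match, or null if none match.
    static String closestRoot(List<String> advertisedRoots, String path) {
        String best = null;
        int bestDepth = 0;
        for (String root : advertisedRoots) {
            int d = matchDepth(root, path);
            if (d > bestDepth) { bestDepth = d; best = root; }
        }
        return best;
    }

    public static void main(String[] args) {
        List<String> roots = Arrays.asList("/aston/eng/mech", "/ucl", "/ucl/cs/dcnds");
        System.out.println(closestRoot(roots, "/ucl/cs/dcnds/z15")); // /ucl/cs/dcnds
    }
}
```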

[Figure content: host A broadcasts the request "Create a file inode in /ucl/cs/dcnds/z15"; hosts B, C and D respond with advertisements for the roots /ucl, /aston/eng/mech and /ucl/cs/dcnds (the latter held by host D), and the request is then forwarded to the host providing the closest match.]

Replication
It has been illustrated so far that the fragmentation process involves moving a fragment of the VFAT from one host to another. What happens, though, if the host where the moved fragment resides fails? All of the fragment's inodes will be permanently lost, and only the name of the fragment's root directory could possibly be recovered, as it is also created as a remote directory on the original host. Therefore, to make the system more robust, it would be ideal to move the same fragment to more than one host, hence replicating the fragments. Due to the project's time constraints, the VFAT's design involves fragmenting only to a single host. However, as long as other hosts with available space exist, applying the fragmentation to more than one host should not be extremely hard, as it would mainly involve transporting the same fragment to those hosts.

8.2.2. VFAT Module Implementation
An inode is represented by an abstract Inode class, which includes all the attributes and methods (common attributes are manipulated in the same way) that are common to files and directories. This class has two child classes: FileNode for files and DirNode for directories.

The VFATService Interface
The interface of the VFAT module must allow any peer in the network to process transparently the inodes held, i.e. create a new one, retrieve and/or modify an existing one, or delete one. Thus a peer need not be aware of how the inodes are stored, nor does it need to search for an inode within the structure in which it is stored. Providing an inode name should be enough to retrieve the requested inode. Moreover, the fragmentation process should be transparent to external modules and no methods for that purpose should be included in the module's interface. The interface of the VFAT module (VFATServiceInterface) defines the following methods:

public int createInode(Inode inode, String vPathName)
public int modifyInode(String vPathName, int attr_type, String attr_value)
public int modifyInode(Inode inode, String vPathName)
public void deleteInode(String vPathName)
public Inode getInode(String vPathName)
public Vector listInodes(String dirName)
public void startService()

Figure 8-11: The VFATService Interface
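To illustrate the location-transparent, pathname-keyed access that this interface promises, here is a deliberately minimal in-memory sketch of a subset of it. The ToyVFATService name and the Map-backed storage are illustrative stand-ins, not the actual VPFS implementation, which distributes inodes across fragments as described above.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Vector;

// Minimal in-memory sketch of pathname-keyed inode access: callers supply
// only a virtual pathname and never see where or how the inode is stored.
public class ToyVFATService {
    static class Inode {
        final String name;
        final boolean isDir;
        Inode(String name, boolean isDir) { this.name = name; this.isDir = isDir; }
    }

    private final Map<String, Inode> store = new HashMap<>();

    public int createInode(Inode inode, String vPathName) {
        if (store.containsKey(vPathName)) return -1; // already exists
        store.put(vPathName, inode);
        return 0;
    }

    public Inode getInode(String vPathName) { return store.get(vPathName); }

    public void deleteInode(String vPathName) { store.remove(vPathName); }

    // Vector mirrors the return type of listInodes in the interface above.
    public Vector<String> listInodes(String dirName) {
        Vector<String> out = new Vector<>();
        String prefix = dirName.endsWith("/") ? dirName : dirName + "/";
        for (String path : store.keySet())
            if (path.startsWith(prefix)) out.add(path);
        return out;
    }

    public static void main(String[] args) {
        ToyVFATService vfat = new ToyVFATService();
        vfat.createInode(new Inode("report.txt", false), "/ucl/cs/dcnds/report.txt");
        System.out.println(vfat.getInode("/ucl/cs/dcnds/report.txt").name);
        System.out.println(vfat.listInodes("/ucl/cs/dcnds"));
    }
}
```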

Service Implementation
Not all peers should be authorised to provide the VFAT service and thus manage inodes. Typically, it would be preferable if only trusted peers could do so. A host's client profile states whether that host may provide the VFAT service or not. Those that provide it reserve space for inodes and can store fragments of the overall VFAT structure. Assuming a peer may provide the VFAT service, its profile needs to list the path of the XML document in which inodes are stored persistently, and the minimum and maximum fragment sizes.


The following code represents the part of a peer profile that refers to the VFAT service:

<PeerProfile>
  ...
  <VFATService ProvidesService="True">
    <MaxFragSize>10</MaxFragSize>
    <MinFragSize>5</MinFragSize>
    <XMLFilePath>VFAT.xml</XMLFilePath>
  </VFATService>
  ...
</PeerProfile>

Figure 8-12: The VFAT Service section of the Peer Profile
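A service module could read its section of the Peer Profile with the standard DOM API along the following lines. The element names come from the profile above; the helper methods themselves are illustrative, not the actual VPFS code.

```java
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;

// Sketch of reading the VFATService section of a Peer Profile via DOM.
public class VFATProfileReader {
    // Parse the profile document and return the requested service element.
    static Element section(String profileXml, String serviceTag) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
            .parse(new ByteArrayInputStream(profileXml.getBytes(StandardCharsets.UTF_8)));
        return (Element) doc.getElementsByTagName(serviceTag).item(0);
    }

    // Read an integer child element such as MaxFragSize or MinFragSize.
    static int intChild(Element service, String tag) {
        return Integer.parseInt(
            service.getElementsByTagName(tag).item(0).getTextContent().trim());
    }

    public static void main(String[] args) throws Exception {
        String profile =
            "<PeerProfile>" +
            " <VFATService ProvidesService=\"True\">" +
            "  <MaxFragSize>10</MaxFragSize>" +
            "  <MinFragSize>5</MinFragSize>" +
            "  <XMLFilePath>VFAT.xml</XMLFilePath>" +
            " </VFATService>" +
            "</PeerProfile>";
        Element vfat = section(profile, "VFATService");
        boolean provides = Boolean.parseBoolean(vfat.getAttribute("ProvidesService"));
        String path = vfat.getElementsByTagName("XMLFilePath").item(0).getTextContent();
        System.out.println(provides + " " + intChild(vfat, "MinFragSize") + "-"
            + intChild(vfat, "MaxFragSize") + " " + path);
    }
}
```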

The VFAT module is composed of three main classes:

a. VFATServiceImpl: is responsible for handling communication aspects, i.e. the exchange of request/response messages and advertisements (using the VFATServiceAdvertisementHandler). Moreover, it implements the module's interface methods.

b. VFATManager: is initialised by VFATServiceImpl to manage the inodes stored. It encapsulates methods to create, retrieve, update or delete the inodes that are stored locally and communicates with the XMLModel, explained below, to write inodes to the XML storage file. Moreover, it is responsible for initiating and carrying out the fragmentation process.

c. XMLModel: is responsible for handling the XML documents related to the inodes. It provides methods to retrieve information of an inode from the XML file and store it into the DOM structure in memory, where it can be manipulated. The following code represents the hierarchical tree structure in which an inode is arranged within an XML file:

<?xml version="1.0"?>
<vfat node="node1">
  <inode name="file/dir Name" pathname="ucl/cs/dcnds" type="file/dir">
    <ownername>Name of Owner</ownername>
    <ownerread>true/false</ownerread>
    <ownerwrite>true/false</ownerwrite>
    <!-- ... additional access properties of the inode ... -->
    <replication>1/2/3</replication>
    <!-- ... additional VPFS-related policies ... -->
    <!-- ... additional directory-related policies if the inode is a directory ... -->
    <inode name="file/dir name" pathname="ucl/cs/dcnds/s10" type="file">
      <ownername>ownerName</ownername>
      <!-- ... additional access-related properties ... -->
      <!-- ... additional VPFS-related policies ... -->
      <!-- ... additional file-related policies if the inode is a file ... -->
    </inode>
  </inode>
</vfat>

Figure 8-13: The basic structure of the VFAT

Finally, the figure below illustrates the components of the VFAT module:


[Class diagram content: VFATServiceImpl implements the VFATService and QueryHandler interfaces; it uses the VFATServiceAdvHandler (findClosestMatch, findSpace) for advertisements and manages local inodes through the VFATManager (createRoot, checkFragment, checkDirIsLink), which in turn uses the XMLModel (createElement, createAttributeElement, getAttribute) to store the inodes.]

Figure 8-14: VFAT Class Diagram

8.3. User Profile Service Module (UPM)
The details of all users of the VPFS system are stored in user profiles, which are created by administrators. The types of details in these profiles include the username, full name, administrative capabilities, storage limitations and others. These profiles are referred to when carrying out functions such as logging in and executing commands.

Users are logically organised into groups. These groups also have profiles. A group represents a set of users who have something in common, such as being in the same department, and who may therefore wish to share files. A user may make files available for reading and/or writing to members of their group. All information about a group is contained within the group profile, which is created by an administrator just as a user profile would be. All users must belong to a group. These groups are organised hierarchically in a tree structure similar to the structure of the VFAT. Groups may therefore contain subgroups as well as users.

Profiles should be spread across multiple machines in order to increase robustness. The User Profile tree will therefore be fragmented across the network. The User Profile tree structure and the VFAT tree structure have many similarities, and therefore the designs of the two modules are closely linked. As with the VFAT structure,


the User Profile structure will be self-fragmenting. As more and more profiles are added, the tree will fragment, and the fragments will migrate to other peers that run the UPM and have available space. Like inodes in the VFAT module, user and group profiles will be stored persistently on disk in XML documents.

8.3.1. User-profile Module Design Overview

Managing User Profiles
The entities in the user profile module are similar to those in the VFAT module. Group Profiles correspond to Directory Inodes and User Profiles correspond to File Inodes. The Profile structure is therefore a tree, just like the VFAT tree. A group refers to the scope of accessibility a user has. It can correspond to people within a particular department, or users with a particular common interest, for example. A user must belong to a group and a group is always part of a larger group. Both entities require information to be kept about them, so User Profiles and Group Profiles respectively are created.

Figure 8-15: The User Profile tree

The above diagram illustrates the tree structure created by user and group profiles and how the profiles relate to each other. Group profiles are shown as nodes and user profiles are shown as leaves. The children of a node are therefore the users and sub-groups of the group the node represents. Unlike the VFAT tree, the structure of the Profile tree represents the concept of user hierarchy. A parent group has precedence over a child group. This hierarchy is useful for administrative scope. An administrator will be able to carry out administrative tasks within the group they belong to and any sub-groups below. They cannot carry out these tasks in the group above theirs. In the example tree above, an administrator belonging to group dcnds would be able to create users in group Z15 but not in group cs.

[Figure content: the root group ucl has subgroups eng and cs; eng contains mech, and cs contains dcnds, which in turn contains the groups z15 and year1. User profiles (Alice, Bob, Carol, Dave, Ella, Fred, George) appear as leaves of their group nodes, e.g. Fred and George in z15.]


The tree model is traversed in exactly the same way as the VFAT tree model. Every profile has a name and a complete parent path, which together uniquely identify the profile. A profile will be created as a child under a particular group node. In the example tree above, users Fred and George belong to group /ucl/cs/dcnds/z15, and their profiles are therefore created as leaves of the /ucl/cs/dcnds/z15 node. Initially the tree will contain a root group with the root administrator's profile within it. These will have been created by the root administrator during initialisation of the system.
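The administrative-scope rule described above (an administrator may act within their own group and any subgroup below it, but not above) reduces to a simple path-prefix test, sketched here with hypothetical names:

```java
// Illustrative sketch of the administrative-scope rule described above.
public class AdminScope {
    // True when 'target' equals the administrator's group or lies beneath it.
    static boolean inScope(String adminGroup, String target) {
        return target.equals(adminGroup) || target.startsWith(adminGroup + "/");
    }

    public static void main(String[] args) {
        String admin = "/ucl/cs/dcnds";
        System.out.println(inScope(admin, "/ucl/cs/dcnds/z15")); // true: subgroup
        System.out.println(inScope(admin, "/ucl/cs"));           // false: parent group
    }
}
```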

Fragmentation
The fragmentation of the User Profile tree is carried out in exactly the same way as for the VFAT tree. Please refer to section 8.2.1 for further details.

Locating User Profiles
As profiles can be distributed across a number of peers, a mechanism is needed to locate an individual profile. When a user or group profile is needed, the first step is to find the peer that holds the parent group. Each peer that provides the User Profile service advertises the roots it holds. These JXTA advertisements are used to locate the parent group. A request for the user profile of user Carol in group /ucl/cs will look at all the currently available advertisements and find the one that provides the closest pathname match for /ucl/cs. The required profile will be found on that particular peer. The mechanism used to find the closest match was described in the VFAT section, but is recapped here:

Peer A makes a request to get the user profile of Carol in domain /ucl/cs.
Peer B advertises root groups /ucl/cs/dcnds and /ucl/phy.
Peer C advertises root group /ucl/cs.

Peer   Root advertised   Path needed   Match
B      /ucl/cs/dcnds     /ucl/cs       No match
B      /ucl/phy          /ucl/cs       No match
C      /ucl/cs           /ucl/cs       Matched by 2 directories

Table 8-2: Closest Path Matching

Peer C provides the closest match for the required group /ucl/cs and therefore the profile for Carol must be on that peer. Peer A will then send the request for the user profile to Peer C.

Replication
Due to time constraints, the current design only incorporates fragmenting the profile tree to a single host. This can, however, be problematic: if a peer containing a fragment goes offline, none of the users contained within that fragment will be able to log in until that peer comes back up. The solution to this problem would be to replicate the Profile tree. This could be achieved during the fragmentation step by fragmenting to more than one peer.


8.3.2. User-profile Implementation
Due to the similarities between the User Profile structure and the VFAT structure, the implementation of the User Profile service follows that of the VFAT service in terms of components, communication and other aspects.

The UserProfileService Interface
The interface to the User Profile Service allows creation, modification and deletion of profiles from any peer within the network, regardless of where the profiles are actually stored or any fragmentation that may have occurred. The methods defined in the interface are as follows:

public void addUserProfile(UserProfile user)
public void addGroupProfile(GroupProfile group)
public void deleteUserProfile(String username, String groupPath)
public void deleteGroupProfile(String groupName)
public UserProfile getUserProfile(String username, String groupPath)
public GroupProfile getGroupProfile(String groupName)
public void modifyUserProfile(String username, int attr_id, String attr_Val)
public void modifyUserProfile(UserProfile user, String groupPath)
public void modifyGroupProfile(String groupName, int attr_id, String attr_val)
public void modifyGroupProfile(GroupProfile group)

Figure 8-16: The UserProfileService Interface

For User and Group profiles, corresponding UserProfile and GroupProfile classes are created. Their relation is shown in the diagram below. Note that XMLUserProfile and XMLGroupProfile are XML versions of the UserProfile and GroupProfile classes for use in the User Profile Service request/response messages as described in section 8.6.


[Class diagram content: an abstract Profile class (name) has two subclasses: UserProfile (homeDirectory, maxFileSpace, admin) and GroupProfile (members, isRoot, isLocal, noOfUsers; addMember, removeMember). XMLUserProfile and XMLGroupProfile extend these and implement the XMLProfile interface (USER_PROFILE, GROUP_PROFILE; getDocument, readDocument, getProfileType).]

Figure 8-17: Profiles Class diagram

Service Implementation
The User Profile service comprises three sections:
1. The service implementation, UserProfileServiceImpl. This handles all the communication necessary to run the service, such as sending request/response messages, and deals with processing those messages. This class also creates an instance of the UserProfileManager.
2. The UserProfileManager, which is responsible for the local storage of profiles as well as fragmenting the profile tree when needed.
3. The XMLModel class, which is responsible for creating and parsing XML data to store profiles persistently in an XML document.
All the components described are shown in the class diagram below:


[Class diagram content: UserProfileServiceImpl implements the UserProfileService, QueryHandler and Module interfaces; it uses the UserProfileServiceAdvHandler (findClosestMatch) and manages local profiles through the UserProfileManager (createRoot, checkMerge, receiveMoveCommand), which uses the XMLModel (createElement, createAttributeElement, getAttribute) to store the profiles.]

Figure 8-18: User Profile Module class diagram

The UserProfileManager and XMLModel classes together handle the persistent storage of profiles and are described in more detail below.

UserProfileManager
The UserProfileManager class handles the creation, deletion, retrieval and modification of locally stored profiles. It calls the XMLModel class to do the actual manipulation of the XML document. The design of the User Profile module is similar to that of the VFAT module, and it therefore also filters the full pathnames it receives. Filtering removes the parent groups that are not found locally. This is needed because the local profile storage on a peer may contain multiple fragments, and the method of storage does not reflect the full pathname. The XMLModel class therefore requires filtered pathnames in order to handle the local storage.
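The filtering step can be sketched as follows, under the assumption that "filtering" strips the path components above the locally held root fragment so that the XMLModel sees paths relative to what is actually stored on this peer. The names and the exact cut-off point are illustrative, not taken from the actual VPFS code.

```java
// Hypothetical sketch of the pathname filtering described above.
public class PathFilter {
    // localRoot: the root directory of a fragment held on this peer,
    // e.g. "/ucl/cs/dcnds"; fullPath: the complete virtual pathname.
    // Returns the path with the non-local parent groups removed,
    // or null when the path does not fall under this fragment.
    static String filter(String localRoot, String fullPath) {
        String parent = localRoot.substring(0, localRoot.lastIndexOf('/') + 1);
        if (!fullPath.startsWith(parent)) return null; // not held here
        return fullPath.substring(parent.length());
    }

    public static void main(String[] args) {
        System.out.println(filter("/ucl/cs/dcnds", "/ucl/cs/dcnds/z15/Fred"));
        // dcnds/z15/Fred : the parent groups /ucl/cs are removed
    }
}
```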

XMLModel
Profiles are stored persistently in XML documents, and the XMLModel class is responsible for reading from and writing to this persistent storage. The document consists of the two profile elements: the User Profile and the Group Profile. The elements are nested to reflect the


position within the user profile tree. All the elements are enclosed within the main UPM element. An example of the basic structure of the document is as follows:

<UPM>
  <GROUP> <!-- Group details -->
    <GROUP> <!-- Group details -->
      <USER> <!-- User details --> </USER>
    </GROUP>
    <USER> <!-- User details --> </USER>
  </GROUP>
  <GROUP> <!-- Group details -->
    <USER> <!-- User details --> </USER>
  </GROUP>
</UPM>

Figure 8-19: The basic UPM structure

The diagram above omits any of the details of specific profiles but shows the fact that profiles may be nested and that Group Profiles may contain both User and Group Profiles. Please refer to the appendix for a sample User Profile XML document.
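The nested GROUP/USER layout can be traversed recursively with the standard DOM API, for example to list every user together with its nesting depth. The element names come from the structure above; the walker itself is an illustrative sketch, not the actual XMLModel code.

```java
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

// Sketch of recursively walking the nested UPM structure shown above.
public class UPMWalker {
    // Depth-first walk: recurse into GROUP elements, record USER elements.
    static void walk(Element e, int depth, List<String> out) {
        NodeList kids = e.getChildNodes();
        for (int i = 0; i < kids.getLength(); i++) {
            if (!(kids.item(i) instanceof Element)) continue;
            Element child = (Element) kids.item(i);
            if (child.getTagName().equals("USER")) out.add("user at depth " + depth);
            else if (child.getTagName().equals("GROUP")) walk(child, depth + 1, out);
        }
    }

    static List<String> listUsers(String xml) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
            .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
        List<String> out = new ArrayList<>();
        walk(doc.getDocumentElement(), 0, out);
        return out;
    }

    public static void main(String[] args) throws Exception {
        String upm = "<UPM><GROUP><GROUP><USER/></GROUP><USER/></GROUP></UPM>";
        System.out.println(listUsers(upm)); // [user at depth 2, user at depth 1]
    }
}
```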

Providing the service
As with all services, not all peers will be able, or required, to run this service. Whether a particular peer provides the User Profile service is specified in the Peer Profile. For the User Profile service, the peer profile specifies the following information:
• The path of the XML document that will store the profiles.
• The maximum fragment size: the maximum number of profiles that can be held by the peer.
• The minimum fragment size: the preferred number of profiles to be held by the peer.
This information is represented in the Peer Profile as follows:

<PeerProfile>
  <!-- Other Services -->
  <UserProfileService ProvidesService="True">
    <MaxFragSize>10</MaxFragSize>
    <MinFragSize>5</MinFragSize>
    <XMLFilePath>UserProfiles.xml</XMLFilePath>
  </UserProfileService>
  <!-- Other Services -->
</PeerProfile>

Figure 8-20: The UPM section of the Peer Profile


8.4. VPFS Access Module
The VPFS Access module sits on top of the FMM, VFAT, UPM and policy service modules. The major role of this module is to authenticate users, process user input and co-ordinate all file operations within the VPFS system. As a result, the Access module is the interface through which VPFS applications perform file manipulations, and it combines the four VPFS service modules underneath it. Within the VPFS Access module, a critical class called UserState is defined. The purpose of this class is to maintain the profile information of a user who has logged into the VPFS system. In this way, the user state information can be served by the same peer that the user logged in to. There are two main types of operations that can be performed through VPFS Access: logging in and the manipulation of files.

8.4.1. Operations in VPFS Access

Login Operations

[Sequence diagram content: 1: loginUser (UserShell → LoginManager); 1.1: getUserProfile (LoginManager → UserProfileService); 1.2: checkPassword; 2: new UserState.]

Figure 8-21: Login Operation Sequence Diagram

As shown in the diagram above, three objects are involved in the login operation. The UserShell object is responsible for receiving the username, password and group path name entered by the user. It passes these three parameters to the LoginManager object. The LoginManager is responsible for obtaining the user profile from the UPM, verifying the username and password, and logging the user into the system. If these three steps succeed, the LoginManager creates the UserState object, which is responsible for maintaining the user state information.
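The three-step login sequence can be sketched as follows. The class names mirror the sequence diagram; the in-memory profile store and the plain-text password check are stand-ins for the real UPM lookup and verification, not the actual VPFS implementation.

```java
import java.util.HashMap;
import java.util.Map;

// Sketch of the login sequence: getUserProfile, checkPassword, new UserState.
public class LoginSketch {
    static class UserProfile {
        final String username, password;
        UserProfile(String u, String p) { username = u; password = p; }
    }

    static class UserState {
        final UserProfile profile;
        UserState(UserProfile p) { profile = p; }
    }

    // Stand-in for the UserProfileService lookup (step 1.1).
    static final Map<String, UserProfile> upm = new HashMap<>();

    // Fetch the profile, verify the password, then create the UserState.
    static UserState loginUser(String username, String groupPath, String password) {
        UserProfile profile = upm.get(groupPath + "/" + username); // 1.1: getUserProfile
        if (profile == null || !profile.password.equals(password)) // 1.2: checkPassword
            return null; // login rejected
        return new UserState(profile); // 2: new UserState
    }

    public static void main(String[] args) {
        upm.put("/ucl/cs/Carol", new UserProfile("Carol", "secret"));
        System.out.println(loginUser("Carol", "/ucl/cs", "secret") != null); // true
        System.out.println(loginUser("Carol", "/ucl/cs", "wrong") != null);  // false
    }
}
```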


File / Directory Operations
Within the VPFS Access module, a class named FileOperations is designed to co-ordinate all the file operations. The operations described in the FMM section of this chapter (Put, Get, Delete and Copy) are invoked using the VPFS Access module. In addition, the following operations are also available:
• Move File: This operation can be used to move a file or directory from one directory to another within the VPFS system. It is based on two other operations: copying the file to the directory with the specified virtual path name and deleting the original one.

• Make Directory: This operation can be used to create a new directory in the VPFS system. This includes creating a new directory inode in the VFAT module.

• Change Directory: This operation is used to allow a user to change their current working directory. This includes checking whether the user has the access permissions of that directory.

• List File/Directory: This operation can be used to list the contents of an existing directory. The contents include files and/or subdirectories. In order to achieve this, the list operation needs to get the inodes of all the files and directories from the VFAT module. Users can also see the access permissions of each file or directory by specifying the -l option. This operation is similar to the Unix command ls.

• Change Access Permission: This operation can be used to change the access permissions of a file or directory within the VPFS system. This operation is similar to the Unix chmod command.

As mentioned at the beginning of this section, the VPFS Access module combines the four VPFS service modules and co-ordinates all file manipulations. It therefore needs to interact correctly with the four service modules during the execution of a file operation, which is why the FileOperations class is designed. Describing the sequence of events for every file operation listed above is not practical here, so the Put File operation is described as an example of how the FileOperations object communicates with the other service modules. The Put File operation proceeds as follows:

1. When the Put File operation is called, firstly, the putFile method in FileOperations class will check the access permission of the source file on the local file system. For example, if the user does not have read access permissions, the Put File operation will throw an exception.

2. Then the Put File method will try to get the inode of the specified virtual path name from the VFAT module to check whether the specified target file already exists. If it does exist, the system will ask the user whether to overwrite the existing file, otherwise it will just carry on processing the command.

3. The inode of the parent directory will then be retrieved from the VFAT to check whether the specified directory exists in the VPFS system and whether the user has write permissions of that directory. If the user does not have write permissions in that directory, the operation will throw an exception.

4. The FileManipulationService module will be called to find appropriate peers and store the file replicas. This method call will return the physical locations of the replicas.

5. If the above steps succeed, the operation will then try to create the new file inode in the VFAT with the physical locations returned from the FMM.

6. If all five steps above succeed, the operation succeeds. Otherwise it will roll back all the steps carried out.

The sequence diagram of this operation is detailed as follows:

[Sequence diagram content: 1: putFile (UserShell → FileOperations); 1.1: getUserProfile (→ UserState); 1.2: checkWriteAccessPermissions; 2: getInode (→ VFATService, checking if the inode already exists); 3: putFile (→ FMMService); 4: createInode (→ VFATService).]

Figure 8-22: "Put File" operation Sequence Diagram
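The six Put File steps above can be sketched as a single method over stub services. Apart from the method names quoted in the text (putFile, getInode, createInode), all names are illustrative stand-ins, and the rollback of step 6 is only hinted at in a comment.

```java
import java.util.Arrays;
import java.util.List;

// Sketch of the Put File orchestration described above, over stub services.
public class PutFileSketch {
    interface VFAT {
        boolean inodeExists(String vPath);
        boolean parentWritable(String vPath);
        void createInode(String vPath, List<String> replicaLocations);
    }

    interface FMM {
        List<String> storeReplicas(String localFile);
    }

    static boolean putFile(String localFile, String vPath, boolean localReadable,
                           boolean overwrite, VFAT vfat, FMM fmm) {
        if (!localReadable) return false;                        // 1: local read permission
        if (vfat.inodeExists(vPath) && !overwrite) return false; // 2: target already exists?
        if (!vfat.parentWritable(vPath)) return false;           // 3: parent dir writable?
        List<String> replicas = fmm.storeReplicas(localFile);    // 4: store the replicas
        vfat.createInode(vPath, replicas);                       // 5: record their locations
        return true;                                             // 6: success (else roll back)
    }

    public static void main(String[] args) {
        VFAT vfat = new VFAT() {
            public boolean inodeExists(String p) { return false; }
            public boolean parentWritable(String p) { return true; }
            public void createInode(String p, List<String> locs) {
                System.out.println("inode " + p + " -> " + locs);
            }
        };
        FMM fmm = f -> Arrays.asList("peerB", "peerC");
        System.out.println(putFile("report.txt", "/ucl/cs/report.txt",
                                   true, false, vfat, fmm)); // true
    }
}
```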

8.4.2. The User Shell
The VPFS user shell is very similar to the Unix user shell. It provides an environment for the user to perform file operations. The commands available in the user shell are as follows:
• cd – allows users to change the current working directory, calling the changeDirectory method in the FileOperations class.
• rm – allows users to remove a file or directory, calling the delete method in the FileOperations class. The -r option can be specified to delete a directory recursively.
• ls – allows users to list the contents of a directory. If a user wants to check the access permissions and other information about a file, the -l option can be specified.
• mkdir – allows users to create a new directory.
• chmod – allows users to change the access permissions of a file or directory.
• help – displays all the commands available to the user. If a user does not know how to use a command, they can type help followed by the name of the command to get a brief usage message.
If a user specifies a policy parameter for a particular command, the parameters are parsed and the appropriate storage policy object is created.


8.5. VPFS Initialisation
VPFS is built on a peer-to-peer network model because, with decentralised administration, such networks are not only more robust but also generally simpler to deploy than a client-server solution, especially as the number of members increases. Our aim is to create a system in which any peer, with minimal user input, can easily be configured to provide and/or access any of the VPFS services. The system should be able to dynamically discover peers in the VPFS network and the services that they provide.

The JXTA framework provides all the necessary means to achieve this. In particular, it has the concept of peer groups, which are groups of peers that interact and provide services to each other. It also implements discovery services which allow peers, groups and the services they provide to be discovered dynamically via JXTA advertisements.

Peer groups are used in the VPFS system to allow an administrator to set up a group of machines that will all share a common set of data stores, a directory structure and user profiles. The administrator is given the means to create a root directory, a root user group and a root administrator for the particular peer group. Once these are defined, any peer may join the peer group dynamically and access and/or provide VPFS services. Each peer in a group has a Peer Profile which states which services it provides and the properties of those services. These are created as JXTA services so that they can be controlled by the JXTA framework, ensuring that no peers outside the relevant group may access the services.

8.5.1. Peer Discovery
Peer groups in the JXTA framework are organised hierarchically. All peers by default belong to the World peer group, in which they are assigned peer IDs. These IDs are unique locally and have a high probability of being unique globally. The next level down introduces net peer groups, which are groups created by users to allow sets of peers to communicate. Within a peer group at least one peer must be designated as a rendezvous peer. These peers help the initialisation of other peers by propagating advertisements about peers, peer groups and services, thereby allowing them to interact with each other.

8.5.2. Peer Groups
Peer groups have two main properties: a unique ID and a list of the services they provide. In order for other peers to join a group, these properties must be advertised using peer group advertisements via a rendezvous peer. The JXTA framework provides a rich set of group management features that allow peers to join or resign from a group, as well as restricting membership dynamically. In its simplest form, though, creating or joining a group consists of each participating peer sending out the peer group advertisement. In addition to this group advertisement, peers must also identify themselves. They therefore also send out peer advertisements, which contain their group ID and peer ID as well as other pertinent information. This allows peers to find other members of the same group.

Page 65: Z15 Group 3 - VPFS 2003 - Group Report · User Profile module: handles the manipulation and storage of the user profiles. e. VPFS Access module: handles the co-ordination between

VPFS 2003 DCNDS Group 3

Page 65 of 93

Within the VPFS system, if a user wishes to create a new peer group, they must generate a file which contains the IDs for the new group. This file must be copied to all machines participating in the peer group to ensure that they all use the same group IDs. When a peer is initialised, it reads the IDs from this file and is then able to create the appropriate group advertisements, and can therefore interact with the peers in the same group. In order to get the VPFS group started, the first peer in the group must provide the following pieces of information:
• Details of the root administrator
• The root directory
• The root user group
This information is gathered via the GenerateGroup application, which also generates the VPFSGroupConf.dat file. The identity of this peer does not need to be stated explicitly, as other peers will be able to obtain its advertisements via other rendezvous peers.
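The shared-ID file mechanism can be sketched as follows. The actual layout of VPFSGroupConf.dat is produced by the IDGenerator class and is not reproduced here; this sketch assumes a simple key=value layout, and all key names are illustrative.

```java
import java.io.*;
import java.util.Properties;

// Hypothetical sketch of writing and re-reading the shared group ID file.
// The real file is generated by GenerateGroup/IDGenerator; the key=value
// format and key names used here are assumptions for illustration only.
public class GroupConf {
    // Write the shared IDs so every participating peer uses the same values.
    public static void write(File f, String groupId, String rootDir, String rootAdmin)
            throws IOException {
        Properties p = new Properties();
        p.setProperty("peerGroupID", groupId);
        p.setProperty("rootDirectory", rootDir);
        p.setProperty("rootAdministrator", rootAdmin);
        try (OutputStream out = new FileOutputStream(f)) {
            p.store(out, "VPFS group configuration");
        }
    }

    // Read the IDs back when a peer is initialised, before it builds
    // its group advertisements.
    public static Properties read(File f) throws IOException {
        Properties p = new Properties();
        try (InputStream in = new FileInputStream(f)) {
            p.load(in);
        }
        return p;
    }
}
```

Because every peer loads the same file, all peers derive identical group advertisements and can therefore find one another.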

8.5.3. Peer Initialisation
For each peer in the VPFS system, a Peer Profile must be created. This is an XML document that describes the services provided by a peer as well as the properties of each service. Each service module reads the relevant section of the Peer Profile and acts accordingly. For example, if a peer provides the File Manipulation service, it reads from the Peer Profile the properties of the service, such as the amount of free space available on the peer. This information is then advertised to other peers. The Peer Profile is generated when a peer is first initialised. The default profile states that the peer provides none of the services; if any services need to be enabled, the profile must be edited appropriately and the application restarted.
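A Peer Profile along these lines might look as follows. The element and attribute names are illustrative assumptions; the actual schema is defined by the PeerProfile class.

```xml
<?xml version="1.0"?>
<!DOCTYPE PeerProfile>
<PeerProfile>
  <Service name="FileManipulation">
    <Provides>true</Provides>
    <Property name="FreeSpace">512000</Property>
  </Service>
  <Service name="VFAT">
    <Provides>false</Provides>
  </Service>
  <Service name="UserProfile">
    <Provides>false</Provides>
  </Service>
</PeerProfile>
```

Enabling the VFAT service on this peer would simply mean setting its Provides element to true and restarting the application.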

8.5.4. The Initialisation Module
The initialisation module is composed of the following components:
• The DeploymentManager class is responsible for creating/joining a group and publishing peer group advertisements for the various services available. It also registers a new PeerInit object as a JXTA Application.
• The PeerInit class is registered as a JXTA Application, which means it is the first piece of code run when a group is joined successfully. It is therefore used, as its name suggests, to initialise a peer: it obtains a reference to each of the VPFS services, starts up any that are located on the current peer, and starts the user shell if necessary.

• The IDGenerator class is responsible for generating IDs for peer groups and the various services provided by the system. It is also used to create the VPFSGroupConf.dat group ID file as well as reading IDs from the file.

• The PeerProfile class represents the Peer Profile found on each peer. The Peer Profile is an XML document that describes the services of the VPFS system, which services the peer provides and the properties of the services. This class is used to not only create the Profile, but also to read it and access/modify service specific sections of the profile.


A class diagram of the initialisation module is given below.

[Class diagram: StartVPFS (+main) is the entry point into the VPFS system and starts the DeploymentManager (-newGroup, -netPeerGroup; +initializeJXTA, +createCustomPeerGroup, +createModuleImplAdv), which initialises PeerInit (a net.jxta.platform.Application; -fmService; +init, +startApp). PeerInit starts the UserShell, reads values from PeerProfile (-doc, -root; +createPeerProfile, +getServiceNode, +getServiceAttribute, +getServiceProperty, +getProvidesService), and starts the FMMService, VFATService and UserProfileService implementations of the net.jxta.service.Service interface. The JXTA discovery service locates the appropriate implementation classes for these interfaces.]

Figure 8-23: Initialisation Module class diagram

8.6. Peer Communication Module
The communication taking place between peers within the context of a particular service should be transparent to components outside the module in question. Whenever a function of an interface is called, the particular implementations of that service running on the communicating peers exchange messages. These messages are XML documents sent via the Resolver service of the JXTA framework. When necessary, the peers open pipes through which they can send the required files. Moreover, each service provider in a peer advertises its service properties by propagating XML documents via the Discovery services of the JXTA framework. The structure and content of those documents correspond to the service's properties and their values.


8.6.1. Request/Response Messages
The request/response messages are handled via two methods provided by the QueryHandler interface of the JXTA framework:

public ResolverResponseMsg processQuery(ResolverQueryMsg query)
public void processResponse(ResolverResponseMsg response)

Figure 8-24: The QueryHandler Interface
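As a rough, JXTA-free illustration of how these two callbacks cooperate, the sketch below routes a query from one registered handler to another and the response back again. The in-memory MiniResolver stands in for the real Resolver service and rendezvous propagation; all names other than processQuery/processResponse are assumptions.

```java
import java.util.HashMap;
import java.util.Map;

// Simplified stand-in for JXTA's QueryHandler, using plain strings
// instead of ResolverQueryMsg/ResolverResponseMsg.
interface QueryHandler {
    String processQuery(String query);      // invoked on the target peer
    void processResponse(String response);  // invoked back on the requester
}

// In-memory stand-in for the Resolver service: handlers are registered
// under a common name, keyed here by (peerId, handlerName).
class MiniResolver {
    private final Map<String, QueryHandler> handlers = new HashMap<>();

    void registerHandler(String peerId, String name, QueryHandler h) {
        handlers.put(peerId + "/" + name, h);
    }

    // Peer `from` sends a query to peer `to`; the response is routed back
    // to the sender's handler, mimicking the Resolver round trip.
    void sendQuery(String from, String to, String name, String query) {
        String response = handlers.get(to + "/" + name).processQuery(query);
        handlers.get(from + "/" + name).processResponse(response);
    }
}

public class ResolverDemo {
    static String lastResponse;

    // Round-trip demo: peer B answers queries, peer A records the response.
    public static String roundTrip(String query) {
        MiniResolver r = new MiniResolver();
        r.registerHandler("peerB", "vfat", new QueryHandler() {
            public String processQuery(String q) { return "response-to:" + q; }
            public void processResponse(String resp) { }
        });
        r.registerHandler("peerA", "vfat", new QueryHandler() {
            public String processQuery(String q) { return null; }
            public void processResponse(String resp) { lastResponse = resp; }
        });
        r.sendQuery("peerA", "peerB", "vfat", query);
        return lastResponse;
    }
}
```

The real mechanism is asynchronous and the propagation passes through rendezvous peers, as described below.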

Messages are embedded within the ResolverQueryMsg object, and the mechanism provided allows for asynchronous communication. Each peer group has a reference to the Resolver service, which is initialised by the JXTA platform. In order to exchange messages within the context of a particular service, implementations of the QueryHandler interface must be registered with this service. Each object is registered under a common name, which allows differentiation between several implementations of the QueryHandler interface on the same peer. The Resolver service can then be used to send queries to registered peers and return their responses. When peer A sends a query via this service, with a specified query message and target peer ID as parameters, the Resolver service propagates the query to its destination (peer B). This propagation is done via rendezvous peers. Receipt of the query invokes the processQuery method at peer B, and an appropriate response is created and returned to peer A. Receipt of this response invokes the processResponse method at peer A. The figure below presents the structure of the Message classes that allow a peer to deal with the exchange of messages.

[Class diagram: the abstract class VPFSMessage (-document:TextElement; +getDocument, +readDocument, +toString) is extended by the abstract classes VPFSRequest (+getRequestType) and VPFSResponse (+getResponseType). These are in turn extended by the service-specific pairs VFATRequest/VFATResponse (+getInodeName, +getInodePath), FMMRequest/FMMResponse (+getSourcePathName, +setSourcePathName) and UPMRequest/UPMResponse.]

Figure 8-25: Messages Class Diagram


There is a parent abstract class, VPFSMessage, which defines the methods that all its children must implement, along with the attributes they inherit. On the level below, two more abstract classes are defined, corresponding to the request and response messages that will be exchanged, namely VPFSRequest and VPFSResponse. Each defines attributes and methods applicable to its type of message. Finally, on the bottom level are the request/response classes relevant to each particular service, which inherit all attributes and methods from their parent abstract classes. Each of these classes is extended by the specific messages used by each of the VPFS services. These messages are described below under the corresponding modules.
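The hierarchy just described can be condensed into a small sketch. The real classes build JXTA TextElement documents; plain strings are used here for illustration, and the GetFileRequest constructor is an assumed convenience.

```java
// Condensed sketch of the message hierarchy; strings replace JXTA documents.
abstract class VPFSMessage {
    abstract String getDocument();  // XML form of the message
}

abstract class VPFSRequest extends VPFSMessage {
    abstract String getRequestType();
}

abstract class VPFSResponse extends VPFSMessage {
    abstract String getResponseType();
}

// Service-specific level: FMM requests carry a source path name.
abstract class FMMRequest extends VPFSRequest {
    private String sourcePathName;
    String getSourcePathName() { return sourcePathName; }
    void setSourcePathName(String p) { sourcePathName = p; }
}

// Concrete message used by the Get operation; its XML form mirrors
// the GetFileRequest document shown in Figure 8-28.
public class GetFileRequest extends FMMRequest {
    GetFileRequest(String path) { setSourcePathName(path); }
    String getRequestType() { return "GetFileRequest"; }
    String getDocument() {
        return "<GetFileRequest><SourcePathName>" + getSourcePathName()
             + "</SourcePathName></GetFileRequest>";
    }
}
```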

File Manipulation Module
Three types of request/response messages are exchanged between peers that provide this service:

a. Get Request: used when a peer requires a file held by another peer to be sent to it, i.e. stored locally. The requesting peer sends a request message to the peer that holds the required file. The request contains the name of the file to be sent, along with an advertisement of the pipe created for the file transfer. The receiving peer sends the file through this pipe.

b. Copy Request/Response: used when a peer requires a file to be copied from one location to another. The requesting peer will send a request that includes the file name and the storage policies to another one that provides the File Manipulation service. The response from the latter will contain the set of peers that will store the file.

c. Delete Request/Response: used when a peer requires all the replicas of a file held on several peers, which run the File Manipulation service, to be deleted. The request contains information about the file to be deleted, whereas the response holds information as to whether the operation was successful or not.

The figures below present the structure of the FMM Request and Response classes that allow a peer providing the File Manipulation service to deal with the exchange of messages.


[Class diagram: FMMRequest (+getSourcePathName, +setSourcePathName, +getDocument, +readDocument, +getRequestType) is extended by CopyFileRequest (-sChars:StoragePolicies; +getStoragePolicies, +setStoragePolicies, +getTargetPathName, +setTargetPathName), DeleteFileRequest (-currentPhase; +getCurrentPhase, +setCurrentPhase) and GetFileRequest (-pipeAdv:PipeAdvertisement; +getPipeAdvertisement, +setPipeAdvertisement).]

Figure 8-26: FMM Request Class Diagram

[Class diagram: FMMResponse (+getSourcePathName, +setSourcePathName, +getDocument, +readDocument, +getResponseType) is extended by CopyFileResponse (-targetIDs:Vector, -targetPathName; +getTargetPathName) and DeleteFileResponse (-peerID, -success; +getPeerID, +setPeerID, +getSuccess).]

Figure 8-27: FMM Response Class Diagram


The XML code below presents a Get Request sent by a peer to another that provides the File Manipulation service:

<?xml version="1.0"?>
<!DOCTYPE GetFileRequest>
<GetFileRequest>
  <SourcePathName>/ucl/profile.dat</SourcePathName>
</GetFileRequest>

Figure 8-28: A GetFileRequest Message

VFAT Module
The types of request/response messages exchanged between peers that provide this service are listed below. They all return a response stating whether or not the operation was successful; in case of an error, a string detailing the error is included:

a. Create Inode: Used when a peer requires an inode of a file or directory to be created. The requesting peer sends a request to another one that provides the VFAT service. The request contains the contents of the Inode object that it wishes to create. The receiving peer will return a response.

b. Get Inode: Used when a peer requires an inode. The requesting peer will send a request to another peer that provides the VFAT service, specifying the name of the inode in question and its path. The receiving peer will return a response containing the requested inode.

c. Modify Inode: Used when a peer requires the modification of an inode. The requesting peer sends a request that contains the modified Inode object to the VFAT peer holding the particular inode. The latter will return a response.

d. Delete Inode: Used when a peer requires the deletion of an inode. The requesting peer sends a request to the VFAT peer holding the inode to be deleted, specifying the name of the inode, its type (file or directory) and its path. The receiving peer will return a response.

e. List Inodes: Used when a peer requires a listing of the inodes contained in a directory. The requesting peer sends a request to the VFAT peer that holds the particular directory information (obtained through advertisements, see section 8.6.2) specifying the directory path whose contents are required. The receiving peer will return a response that contains a Vector object with all the inodes in the particular directory.

f. Move Fragment: Used when a peer requires moving a fragment (a branch of the directory tree) of the VFAT table it holds to another peer that provides the VFAT service. The requesting peer will send a request containing the actual fragment to be moved. The receiving peer will send a response specifying whether the fragment has been moved, or not. The response will also contain information about the top-level directory of the fragment moved.

The figures below present the structure of the VFAT Request and Response classes that allow a peer providing the VFAT service to deal with the exchange of messages.


[Class diagram: VFATRequest (+getInodeName, +getInodePath) is extended by CreateInodeRequest (-inode:XMLInode; +setInodeContents), DeleteInodeRequest (-isLink, -currentPhase), GetInodeRequest, ModifyInodeRequest (-inode:XMLInode), ListInodesRequest (-dirPath) and MoveFragmentRequest (-dirPath, -dirName, -moveItem:MoveItem).]

Figure 8-29: VFAT Request Class Diagram


[Class diagram: VFATResponse (+getInodeName, +getInodePath) is extended by CreateInodeResponse and ModifyInodeResponse (each with -success and -exceptionString), DeleteInodeResponse (-success, -exceptionString, -peerID), GetResponse (-exceptionString, -inode:XMLInode), ListInodesResponse (-dirPath, -inodes:Vector) and MoveFragmentResponse (-exceptionString, -success, -dirPath, -dirName).]

Figure 8-30: VFAT Response Class Diagram

The XML code below presents a Get Inode Request for the file /z15_3/vpfs.doc sent by a peer to another that provides the VFAT service:

<?xml version="1.0"?>
<!DOCTYPE GetInodeRequest>
<GetInodeRequest>
  <InodePath>/z15_3/</InodePath>
  <InodeType>file</InodeType>
  <InodeName>vpfs.doc</InodeName>
</GetInodeRequest>

Figure 8-31: A GetInodeRequest Message

User Profile Module
The types of request/response messages exchanged between peers that provide this service are listed below. They all return a response stating whether or not the operation was successful; in case of an error, a string detailing the error is included:

a. Create Profile: Used when a peer requires the creation of a profile for a user or group. The requesting peer sends a request to another peer that provides the User Profile service. This request contains the Profile object to be created. The receiving peer sends a response specifying whether the profile has been created, or not.


b. Get Profile: Used when a peer needs to obtain a user or group profile. The requesting peer sends a request to another peer that provides the User Profile service, specifying the name of the profile, its type (user or group) and its path. The receiving peer sends a response containing the requested Profile object.

c. Delete Profile: Used when a peer requires the deletion of a profile. The requesting peer sends a request to another peer that provides the User Profile service specifying the name of the profile, its type (user or group) and its path. The receiving peer sends a response merely as a confirmation that it could delete the profile. The response contains the ID of the requesting peer, the name of the profile, its type (user or group) and its path and a Boolean specifying whether the operation was successful or not.

d. Modify Profile: Used when a peer requires the modification of a profile. The requesting peer sends a request to the peer that holds the User Profile in question. The request contains the modified UserProfile object. The receiving peer sends a response specifying whether the profile has been modified, or not.

e. List Profiles: Used when a peer requires the listing of all the profiles within a group. The requesting peer sends a request to the peer holding the User Profiles in question, specifying the name of the group whose profiles are required. The receiving peer sends a response that contains a Vector object with all the relevant User Profiles. If the group specified is empty, the Vector object returned is also empty.

f. Move Fragment: Used when a peer requires moving a fragment (a branch of the directory tree) of the User Profiles it holds to another peer that provides the User Profile service. The requesting peer will send a request containing the actual fragment to be moved. The receiving peer will send a response specifying whether the fragment has been moved, or not. If the operation was successful, the response also contains information about the top-level directory of the fragment moved.

The figures below present the structure of the UPM Request and Response classes that allow a peer providing the User Profile service to deal with the exchange of messages.


[Class diagram: UPMRequest is extended by CreateProfileRequest (-profilePath, -profileName, -profileType, -profile:XMLProfile), DeleteProfileRequest (-profilePath, -profileType, -profileName, -isLink, -currentPhase), GetProfileRequest (-profilePath, -profileType, -profileName), ListProfilesRequest (-groupPath, -profilesType) and MoveFragmentRequest (-groupPath, -groupName, -moveItem:MoveItem).]

Figure 8-32: UPM Request Class Diagram


[Class diagram: UPMResponse is extended by CreateProfileResponse, ModifyProfileResponse and DeleteProfileResponse (each with -profilePath, -profileName, -profileType, -success and -exceptionString; DeleteProfileResponse also holds -peerID), GetProfileResponse (-profilePath, -profileName, -profileType, -exceptionString, -profile:XMLProfile), ListProfilesResponse (-groupPath, -profilesType, -profiles:Vector) and MoveFragmentResponse (-profilePath, -profileName, -success, -exceptionString).]

Figure 8-33: UPM Response Class Diagram

The XML code below presents a Get Profile Request sent by one peer to another that provides the UPM service:

<?xml version="1.0"?>
<!DOCTYPE GetProfileRequest>
<GetProfileRequest>
  <ProfilePath>/ucl/</ProfilePath>
  <ProfileType>user</ProfileType>
  <ProfileName>john</ProfileName>
</GetProfileRequest>

Figure 8-34: A GetProfileRequest Message

8.6.2. Advertisements
Whenever a peer requires a particular service from another, e.g. it needs to store a file on a peer that provides the File Manipulation service, it needs to be able to obtain information about the target peer, such as its free space in the file-storing scenario. During the design phase of the system, three main ways of doing this were identified:

a. Peers could be queried as and when necessary.
b. Peer properties could be obtained from a central repository that holds all peer profiles.
c. Peers could advertise their properties through broadcast.


Each of the above options had advantages and disadvantages over the others, depending on the service it would be used for. Thus, for a peer that provides the File Manipulation service and wants to make its free space known to others, option c seems most appropriate, as the peer's free space is likely to change regularly. Querying the peer would introduce a lot of traffic and would not scale as the number of peers increased, whereas obtaining the value from a central repository would mean that the information would be out of date in most cases. Having a central repository would also introduce a single point of failure, which is what we are trying to avoid. In the case of a peer providing the Policy service, the most appropriate solution seemed to be to query the peer and cache the response locally, as policies are not likely to change regularly, if at all. Overall, though, advertisements are suitable in both cases, and it was therefore decided that all information sharing amongst peers would be via advertisements. The JXTA framework provides advertisement types, but they need to be extended so that they can be used in our context. Thus, a new advertisement type, VPFSAdvertisement, is defined, which extends the Advertisement type provided by JXTA. Figure 8-35 below presents the structure of the Advertisement classes that allow a peer to deal with advertisements:

[Class diagram: VPFSAdvertisement (-PeerID; +getPeerID, +setPeerID, +toString) is extended by FMMAdvertisement (-availableSpace:long, -pipeAdv; +getAvailableSpace, +getPipeAdvertisement, +getDocument, +readAdvertisement), UPMAdvertisement (-availableSpace:int, -roots:Vector; +getAvailableSpace, +getRoots, +getDocument, +readAdvertisement) and VFATAdvertisement (-availableSpace:int, -roots:Vector; +getAvailableSpace, +getRoots, +getDocument, +readAdvertisement).]

Figure 8-35: Advertisements Class Diagram

An Advertisement Handler runs on each peer and is responsible for discovering a particular service. It is also responsible for publishing the services provided by the local peer using the Discovery service of the JXTA framework. The figure below presents the structure of the Advertisement Handler classes:


[Class diagram: the VPFSAdvertisementHandler interface (+advertiseService) is implemented by FMMAdvertisementHandler (-localPeerID, -properties:PeerProperties, -discovery:DiscoveryService, -pipeAdv; +findDataStores:FMMAdvertisement[ ]), VFATServiceAdvHandler (-localPeerID, -discovery; +findClosestMatch:VFATAdvertisement, +findSpace:VFATAdvertisement) and UserProfileServiceAdvHandler (-localPeerID, -discovery; +findClosestMatch:UPMAdvertisement).]

Figure 8-36: AdvertisementHandler Class Diagram

File Manipulation Module
The FMMAdvertisement class is used to advertise the properties of a peer that provides the File Manipulation service (FM peer). Moreover, through these advertisements, the means by which files can be sent to a peer are advertised. These take the form of bi-directional pipes, on which each peer has to listen for incoming file transfer connections. The XML code below presents an FMM Advertisement sent by an FM peer:

<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE FileManipulationAdvertisement>
<FileManipulationAdvertisement>
  <PeerID>..</PeerID>
  <AvailableSpace>..</AvailableSpace>
  <Status>..</Status>
  <jxta:PipeAdvertisement>..</jxta:PipeAdvertisement>
</FileManipulationAdvertisement>

Figure 8-37: An FMM Advertisement

The AvailableSpace element allows a peer that wishes to send a file to choose between several FM peers based on whether they can handle the file in question. A limited form of load balancing is thus achieved, as the load is distributed fairly across all peers rather than being targeted at those which happen to be physically close, for example. The Status element gives information about the current state of the FM peer. It can take the values READY (ready to accept requests), DISABLED (temporarily offline) and BUSY (possibly in communication with many peers, so it cannot handle any more requests). The status of an FM peer is likely to change frequently, so the expiry time of the advertisements is set to a relatively low value, i.e. 2-3 minutes.
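To make the selection concrete, the sketch below combines the AvailableSpace and Status elements with the short advertisement expiry just described. The selection policy (largest usable free space) and the exact lifetime constant are assumptions for illustration, not the FMMAdvertisementHandler's actual algorithm.

```java
import java.util.List;

// Illustrative data-store selection over cached FMM advertisements. Field
// names follow the FMMAdvertisement class; the policy and the 2-minute
// lifetime are assumptions made for this sketch.
public class StoreSelector {
    static final long EXPIRY_MS = 2 * 60 * 1000;  // assumed "relatively low" expiry

    static class FMMAdvertisement {
        final String peerID;
        final long availableSpace;
        final String status;        // READY, DISABLED or BUSY
        final long publishedAtMs;   // when the advertisement was received
        FMMAdvertisement(String id, long space, String status, long at) {
            this.peerID = id; this.availableSpace = space;
            this.status = status; this.publishedAtMs = at;
        }
    }

    // Pick the READY, non-expired peer with the most free space for the file.
    static FMMAdvertisement select(List<FMMAdvertisement> ads, long fileSize, long nowMs) {
        FMMAdvertisement best = null;
        for (FMMAdvertisement ad : ads) {
            if (!"READY".equals(ad.status)) continue;           // skip BUSY/DISABLED peers
            if (nowMs - ad.publishedAtMs > EXPIRY_MS) continue; // stale advertisement
            if (ad.availableSpace < fileSize) continue;         // cannot hold the file
            if (best == null || ad.availableSpace > best.availableSpace) best = ad;
        }
        return best;  // null if no suitable FM peer is currently known
    }
}
```

A requesting peer that finds no suitable advertisement would simply wait for fresh ones to be propagated.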


VFAT Module
A peer that provides the VFAT service (VFAT peer) will need to advertise the following properties:

a. Fragments Held: The fragments of the overall VFAT structure that it holds. This will allow other peers to locate a particular inode on a closest match basis.

b. Available Space: The remaining space for storing inodes. Since a VFAT peer that is reaching its maximum capacity has to move one or more of its fragments to other peers, advertising the available space allows it to select peers that can handle the fragments to be moved.

The XML code below presents a VFAT Advertisement sent by a VFAT peer:

<?xml version="1.0"?>
<!DOCTYPE VFATAdvertisement>
<VFATAdvertisement>
  <PeerID> the identifier of the peer providing this service </PeerID>
  <AvailableSpace>99</AvailableSpace>
  <DirectoryRoot>/ucl/sonet/staff/research/</DirectoryRoot>
  <DirectoryRoot>/ucl/cs/dcnds</DirectoryRoot>
  <IsVFAT>true</IsVFAT>
</VFATAdvertisement>

Figure 8-38: A VFAT Advertisement

User Profile Module
A peer that provides the User Profile service (UP peer) will need to advertise the following properties:

a. Fragments Held: The fragments of the overall profile structure that it holds. This will allow other peers to locate a particular profile holder on a closest match basis.

b. Available Space: The remaining space for storing profiles. Since a UP peer that is reaching its maximum capacity has to move one or more of its fragments to other peers, advertising the available space allows it to select peers that can handle the fragments to be moved.

The XML code below presents a UserProfile Advertisement sent by a UP peer:


<?xml version="1.0"?>
<!DOCTYPE UPMAdvertisement>
<UPMAdvertisement>
  <PeerID> the identifier of the peer providing this service </PeerID>
  <AvailableSpace>99</AvailableSpace>
  <DirectoryRoot>/ucl/is/</DirectoryRoot>
  <DirectoryRoot>/ucl/cs/dcnds</DirectoryRoot>
  <IsUserProfileManager>true</IsUserProfileManager>
</UPMAdvertisement>

Figure 8-39: A UPM Advertisement


Chapter 9: Test Plan

9.1. Introduction
The aim of this test plan is to verify the features of the entire VPFS system. Our testing mainly focused on the file operation commands in the VPFS user shell, such as Put, Get and Delete. We also tested whether the various service modules within VPFS collaborate correctly during the execution of a command; the major focus here is the File Manipulation Service module. The User Shell is responsible for receiving user input, checking access permissions, obtaining file inodes, etc. The File Manipulation Service module is responsible for managing files, selecting data stores and transferring files between peers in the group, based on the storage policies specified by the user. The main testing points are described as follows:
• Starting the System: Testing whether all the service modules in the system can be started correctly.
• User Shell Testing: Testing that when a user types a command, the user shell reacts correctly.
• File Operation Testing: The core service of the VPFS system is to allow users to manipulate their files. Here we attempt to test the following major commands under different test conditions:
o Put
o Get
o Delete
o Change directory
o List directory
o Change mode
o Make directory

• Concurrency Testing: Testing the system’s stability and reliability when several users manipulate the same file at the same time.

• Platform Independence Testing: Testing the system’s ability to run under a number of different architectures and platforms.

• Wide Area Network Testing: Testing the system's ability to function over geographically dispersed sites.

9.2. Strategy

9.2.1. Testing Tools

Java Logger
The Java Logging APIs in the java.util.logging package were used during the testing of the system. The Logger class can generate customisable log messages to help observe the high-level flow of operations within the system. Each message contains the time at which it was created as well as the names of the class and method that created it. Some of the main features of the package are:


• Multiple Ways to Store, Display and Represent Log Messages: Log messages can be sent to the console, memory, an output stream, a file or even a listening socket. The messages can also be formatted as either simple text messages or as XML.

• Controlling Messages Logged: The package allows you to assign a priority level to each log message. There are seven levels: SEVERE, WARNING, INFO, CONFIG, FINE, FINER and FINEST, with SEVERE being the highest priority level and FINEST the lowest. The number of messages produced can be controlled by changing the logging threshold. If the threshold is set to INFO, only messages of level INFO and above are logged.

• Message Filtering: Messages can be filtered using custom filters. An example would be a filter that does not allow the logging of consecutive messages with the same level.

For this system, log messages of level INFO or higher are displayed on screen and all messages are stored in a log file. Through these levels, the logging messages indicate their type and severity. For example, a message of type SEVERE would be used if there was a serious error such as the inability to successfully initialise the JXTA framework.
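The configuration described above can be sketched with the standard java.util.logging API. This is a minimal illustration; the logger name and log file name are assumptions, not taken from the VPFS source.

```java
import java.io.IOException;
import java.util.logging.ConsoleHandler;
import java.util.logging.FileHandler;
import java.util.logging.Level;
import java.util.logging.Logger;

class VpfsLogging {
    // Configure a logger so that INFO and above appear on screen while
    // every message, down to FINEST, is written to the log file.
    static Logger configure(String logFile) {
        Logger logger = Logger.getLogger("vpfs");
        logger.setUseParentHandlers(false); // avoid duplicate console output
        logger.setLevel(Level.ALL);         // let the handlers do the filtering

        ConsoleHandler console = new ConsoleHandler();
        console.setLevel(Level.INFO);       // on-screen threshold
        logger.addHandler(console);

        try {
            FileHandler file = new FileHandler(logFile);
            file.setLevel(Level.ALL);       // everything goes to the file
            logger.addHandler(file);
        } catch (IOException e) {
            throw new IllegalStateException("cannot open log file", e);
        }
        return logger;
    }
}
```

With this setup, logger.severe(...) would report a serious error such as a failed JXTA initialisation, while logger.finest(...) messages reach only the log file.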

Eclipse Debugger

Logging messages are useful for seeing the high-level flow within a system, but to see finer detail or to track down specific problems, a debugger was used. The development tool Eclipse contains a built-in debugger, which was used extensively. It allowed us to do the following:
• Pause and resume program execution.
• Trace through code on a line-by-line basis.
• Analyse variables and their values at specific instances.
• Assign values to variables manually.
• Evaluate expressions based on live objects.

9.2.2. Unit Testing A unit test refers to testing the methods within an individual class. Unit testing ensures that individual methods work as expected and, therefore, that the overall class works as expected. It is done at the unit stage rather than at the integration stage so that any problem can be more easily pinpointed. Unit testing was done by creating Test classes containing static main methods that called the appropriate methods within the class under test. The unit tests were carried out individually by those responsible for the particular classes.
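The pattern can be illustrated with a small example. Both PathResolver and PathResolverTest are hypothetical stand-ins, not classes from the VPFS source; the point is the shape of a Test class with its static main method.

```java
// Hypothetical unit under test: resolves a name against a VPFS directory path.
class PathResolver {
    static String resolve(String cwd, String name) {
        if (name.startsWith("/")) return name;       // already absolute
        if (cwd.endsWith("/")) return cwd + name;
        return cwd + "/" + name;
    }
}

// A Test class in the style described above: a static main method that
// calls the methods of the class under test and reports the outcome.
class PathResolverTest {
    public static void main(String[] args) {
        check(PathResolver.resolve("/home/user", "data.txt").equals("/home/user/data.txt"));
        check(PathResolver.resolve("/", "data.txt").equals("/data.txt"));
        check(PathResolver.resolve("/home", "/etc/motd").equals("/etc/motd"));
        System.out.println("PathResolver: all tests passed");
    }

    private static void check(boolean ok) {
        if (!ok) throw new AssertionError("unit test failed");
    }
}
```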

9.2.3. Integration Testing Integration testing took place when integrating several related components. Each component was tested during unit testing and then combined with other classes to be tested again using a new test class. This was to ensure that the individual components interfaced correctly with others to form a cohesive integrated module.


9.2.4. Systems Testing When all the individual modules were completed, they were integrated into a working system. This system was tested by testing the individual user functions available under a variety of circumstances. The results of these tests are given in the Appendix.

9.3. Testing Environments Three testing environments were used during the Systems testing. The majority of testing was carried out in the MSc DCNDS lab B01.

9.3.1. Environment A
• Four Sun machines running Solaris 8
• One machine running the VFAT service
• One machine running the User Profile service
• All four machines running the File Manipulation service

9.3.2. Environment B
• One Sun machine running Solaris 8
• One machine running Windows XP
• One machine running Linux 2.4
• Each machine running a number of VPFS services, alternating periodically.

9.3.3. Environment C
• One Sun machine running Solaris 8, running all VPFS services.
• One machine running Linux 2.4, running only the User service. This machine was located outside the UCL network.


Chapter 10: Evaluations and Future Developments

10.1. Introduction This chapter incorporates a comparison between the work carried out in the previous project and the current work. It includes an evaluation of the system based on the results of the tests carried out. It then analyses whether the objectives have been met, and the last section describes the future enhancements that can be made to the VPFS system.

10.2. Comparisons with Previous Work A version of this project was carried out in a previous year, as mentioned in Chapter 1. This section therefore discusses what was reused from that project and what was changed. It also outlines the flaws in the previous project and how they were overcome.

10.2.1. General Differences The main difference between our project and the previous one is the emphasis on modularity and robustness. Increased modularity means that the system is easier to understand and more easily extendable. A system of this nature also requires a high level of robustness; this aspect was not examined in great detail in the previous project, so we have introduced a higher degree of robustness to all aspects of the project.

10.2.2. Communication and Advertisements There were a few problems in this section, some related to a change in a key JXTA API and others to a lack of structured design for the communications layer.

The first problem was a change in the method signature of one of the methods in the JXTA QueryHandler interface. This interface is implemented by all the VPFS services and is responsible for processing query/response messages; it is therefore a core component of the communication layer. This was solved by analysing the basic structure and functions of the method and re-implementing it to match the revised interface.

The second problem was that components of the communication layer were scattered throughout the other service modules. The majority of this code duplicated functionality, often verbatim. Such a structure is undesirable because any change in behaviour would have to be made in each of the affected locations; it would be easy to forget to update one of them, leaving the system in an inconsistent state and making it harder to track down problems. This was solved by creating a dedicated communications module with its own package. The existing code was analysed to see where it was duplicated, and a structured communications hierarchy was created. This consolidated the repeated code in classes "higher up" in the hierarchy and moved some of the communications infrastructure out of the separate service modules into the new communications module. The specific request/response messages for each service were left where they were, as it did not make sense to move them into the generic communications module; the individual messages were modified to fit the new communications hierarchy and reused.

10.2.3. Hash Values File integrity was not taken into account in the previous project. For example, a file may have been replicated across a number of peers. When it comes to retrieving a replica of the file, there is no provision for checking whether the replica being received is identical to that originally created, or even that it is the same as the other existing replicas. The file may have been corrupted due to problems on a particular disk, or even modified by someone with local access to a peer holding a replica. This was solved with the introduction of MD5 hashes being stored for each file. This hash value is stored in the file inode along with the other attributes of the file. Whenever a file is placed onto the VPFS system, the hash value of the file is calculated and stored. When file retrieval is attempted, the hash value of the replica is compared with that stored in the inode. If they differ, then this replica is clearly no longer valid and an attempt is made to get the next replica. The invalid replica location is also removed from the inode. If this removal were to be logged, it could serve as an indication of which peers are consistently providing invalid replicas. This could indicate a failing disk on that peer or some other similar error. While not the ideal solution, the use of hash values offers rudimentary file integrity and is useful as it will work on any type of data, such as encrypted data.
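The hash computation and the replica check described above can be sketched with the standard java.security API; the class and method names here are illustrative, not those of the actual implementation.

```java
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

class FileHash {
    // Compute the MD5 digest of the file data and render it as the hex
    // string stored in the inode alongside the other file attributes.
    static String md5(byte[] data) {
        try {
            byte[] digest = MessageDigest.getInstance("MD5").digest(data);
            StringBuilder hex = new StringBuilder();
            for (byte b : digest) hex.append(String.format("%02x", b));
            return hex.toString();
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("MD5 unavailable", e);
        }
    }

    // A replica is valid only if its hash matches the one in the inode;
    // on a mismatch the system falls back to the next replica.
    static boolean replicaValid(byte[] replicaData, String inodeHash) {
        return md5(replicaData).equals(inodeHash);
    }
}
```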

10.2.4. Replication Thresholds Replication of files is an integral part of both projects. In the previous project though, each replication level corresponded to a fixed number of replicas. This was problematic because if a peer that held a replica disconnected, the system would try to create a new replica so as to meet the required replication level. Moreover, it would need to create new replicas for all other files located on that peer. This would introduce unnecessary traffic as that peer might reconnect within a short period of time. This was solved by the introduction of replication thresholds for each level. For example, replication level three has a lower threshold of 4 replicas and an upper threshold of 8. With a range, the system becomes more immune to the effect of peers connecting and disconnecting dynamically. With the dynamic nature of the network, guaranteeing a replica range is more viable than a specific number of replicas.
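The threshold idea can be sketched as follows. The class name and the mapping from level to thresholds are assumptions, apart from the level-three example of 4 to 8 replicas given above.

```java
// Hypothetical representation of a replication level as a replica range.
class ReplicationLevel {
    final int lower;
    final int upper;

    ReplicationLevel(int lower, int upper) {
        this.lower = lower;
        this.upper = upper;
    }

    // A new replica is created only when the count drops below the lower
    // threshold, so a briefly disconnected peer causes no extra traffic.
    boolean needsNewReplica(int currentReplicas) {
        return currentReplicas < lower;
    }

    boolean withinRange(int currentReplicas) {
        return currentReplicas >= lower && currentReplicas <= upper;
    }
}
```

For replication level three, new ReplicationLevel(4, 8) tolerates the loss of replicas down to four before any re-replication is triggered.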

10.2.5. File Store Synchronisation As mentioned earlier, the previous project did not take into account the potentially dynamic nature of the network. In particular, they did not take into account what actually happens when a peer that has previously been a file store, reconnects to the network. There is no indication of what the behaviour should be.


This has been solved by introducing the idea of file store synchronisation. This is a process that will happen when a file store peer first connects to the network. The peer will examine the files within its file store and retrieve the corresponding inodes from the VFAT. It will then compare the hash value of the local files with those in the VFAT, and either keep or discard the files in the file store based on the result. At the end of synchronisation, the peer will hold only valid files in its file store and may begin processing FMM requests.
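In outline, the synchronisation step amounts to a hash comparison between the local file store and the VFAT inodes. The sketch below models both sides as maps from file name to hash; the class name and data shapes are assumptions, not the actual FMM code.

```java
import java.util.HashMap;
import java.util.Map;

class FileStoreSync {
    // localFiles: file name -> MD5 hash of the local copy
    // vfatHashes: file name -> hash recorded in the corresponding inode
    // Returns the files to keep; everything else is discarded, so at the
    // end of synchronisation the file store holds only valid files.
    static Map<String, String> synchronise(Map<String, String> localFiles,
                                           Map<String, String> vfatHashes) {
        Map<String, String> valid = new HashMap<>();
        for (Map.Entry<String, String> entry : localFiles.entrySet()) {
            String inodeHash = vfatHashes.get(entry.getKey());
            if (inodeHash != null && inodeHash.equals(entry.getValue())) {
                valid.put(entry.getKey(), entry.getValue());
            }
        }
        return valid;
    }
}
```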

10.2.6. File Transfer The previous project used file transfer to replicate files across the network. While this worked in theory, in practice there was a major flaw in the implementation: only 64 bytes of any file were ever stored in the system. This was solved by redesigning the way the file transfer was carried out, ensuring that the entire file is transferred correctly. In addition, as the previous project used JXTA version 1.0, there were no security features that could be used to achieve secure file transfer. JXTA version 2.0, however, provides a set of security algorithms that can be used for this purpose. Within our design, an encryption mechanism was included in the FileStorage class which can encrypt the file data if the user specifies a security parameter.
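The optional encryption step can be sketched with the standard Java cryptography API rather than the JXTA-specific classes; the class name and the choice of AES are assumptions, as the report does not specify the algorithm used.

```java
import java.security.GeneralSecurityException;
import javax.crypto.Cipher;
import javax.crypto.KeyGenerator;
import javax.crypto.SecretKey;

class FileStorageCrypto {
    // Generate a fresh symmetric key for the transfer.
    static SecretKey newKey() {
        try {
            return KeyGenerator.getInstance("AES").generateKey();
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException(e);
        }
    }

    // Encrypt the file data before transfer when the user has specified
    // the security parameter.
    static byte[] encrypt(SecretKey key, byte[] data) {
        return run(Cipher.ENCRYPT_MODE, key, data);
    }

    // Decrypt the data again on retrieval.
    static byte[] decrypt(SecretKey key, byte[] data) {
        return run(Cipher.DECRYPT_MODE, key, data);
    }

    private static byte[] run(int mode, SecretKey key, byte[] data) {
        try {
            Cipher cipher = Cipher.getInstance("AES");
            cipher.init(mode, key);
            return cipher.doFinal(data);
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException("cipher operation failed", e);
        }
    }
}
```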

10.2.7. File Storage Policies In the previous project there was a fixed number of file storage policies: availability, permanency, security and connectivity. These policies were hard coded into the system and it was therefore not easily extensible. This was solved by creating a new Policy module to manage an arbitrary number of policies. Each of the policies would be represented by a Policy object which would contain a name, a description and some code. The name would be displayed to users when they wished to see a list of available policies. The description would contain not only a description of the policy, but also the parameters the policies would accept and their meaning. Again, this would be displayed to users when requested. The code would be the implementation of the actual policy. For example, the code for implementing a security policy would be different from that for an availability policy. The introduction of a separate module meant that policies could be easily added to, or removed from, the system. Due to time constraints, the implementation of the policy module had to be abandoned, although the framework is still there, ready for implementation.
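The Policy abstraction described above can be sketched as follows. The field and method names are assumptions based on the description (a name, a description and some code); only the replication policy reflects what was actually implemented.

```java
import java.util.List;

// Each policy carries a user-visible name, a description (including the
// parameters it accepts) and the code implementing the policy itself.
abstract class Policy {
    final String name;
    final String description;

    Policy(String name, String description) {
        this.name = name;
        this.description = description;
    }

    // The policy-specific code: choose the peers on which to store a file.
    abstract List<String> apply(List<String> candidatePeers, int parameter);
}

// Sketch of the replication policy: store the file on up to `parameter`
// of the candidate peers.
class ReplicationPolicy extends Policy {
    ReplicationPolicy() {
        super("replication", "parameter = desired replication level");
    }

    @Override
    List<String> apply(List<String> candidatePeers, int parameter) {
        return candidatePeers.subList(0, Math.min(parameter, candidatePeers.size()));
    }
}
```

A Policy module would then keep a registry of such objects, so that policies can be added to, or removed from, the system without touching the rest of the code.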

10.3. Objectives Met In order to evaluate the system, it is useful to analyse the original stated aim of the project. As presented at the beginning of Chapter 1, the goal of this project was to build a platform-independent file store system in a Grid environment. The system should place files on physical stores based on policies defined by the user and provide transparent access to the different storage resources. To achieve such a goal, certain objectives were set by the group, as described in Chapter 3. These objectives are used in this chapter as the basis for evaluating the system.

10.3.1. Objective 1 – Distributed File Store This objective has been fully met. The VPFS system provides a file store facility allowing users to share files. A well defined interface of the system provides a set of commands that allow users to manipulate files. The storage structure of files is hierarchical, enabling files to be ordered in directories. It also encompasses access control mechanisms that enable users to specify access permissions for their files.

10.3.2. Objective 2 - Storage of Files Driven by a Policy Specification This objective has been partially met. It was given a low priority as the client did not need such a high degree of control for file storage. Therefore, we only implemented one policy, the level of availability, which we deemed to be important to achieve system robustness. Thus the storage of files is driven only by this policy. The user is able to specify what level of replication the system should use for their files. The system can be enhanced in the future with more policies.

10.3.3. Objective 3 - Interoperability in a Grid Environment This objective has been largely met. Firstly, the VPFS system supports location transparency, hiding from the user the fact that the system is distributed: users view it as a single local file store. Secondly, the system is platform independent. Utilising technologies such as JXTA for the peer-to-peer infrastructure and the Java programming language, both of which are platform independent, enabled the deployment of the system across multi-platform infrastructures. The peer-to-peer infrastructure of the VPFS system enables peers to connect or disconnect dynamically without affecting the deployment of the system, thus providing a certain level of reliability. The system also supports replication mechanisms, providing high availability of files and eliminating a single point of failure. In addition, it makes good use of storage resources, as it allows all the machines participating in the system to store files. Unfortunately, the system does not support users dispersed across different geographical locations, which limits its use to within a local area network.

10.3.4. Objective 4 – Meet Client’s Needs Only a relatively small part of this objective has been met. In terms of manipulation of files, our system is capable of storing files transparently across a number of hosts. It also partially improves the scalability of our client’s system, as it no longer relies on mount points in order to write result data. It is only partially successful because mount points are still needed to read the main database; we have not incorporated the ability to read data remotely, only to retrieve it. Finally, due to time constraints, the group was not able to develop an API for the VPFS system, whose purpose would be to enable integration with the client’s application. As such, the VPFS system is a standalone tool.

10.4. Future Enhancements Due to the project’s limited timescale, certain functions have yet to be implemented before VPFS can be a fully functioning system. The group therefore believes that any future work should concentrate on the following areas, not necessarily in the order shown:

a. Creation of the system API: In order for the system to be integrated with an existing application rather than functioning as a stand-alone tool, an API should be designed to make use of the current implementation.

b. Implementation of the read/write API: As the system is designed, a user uses the put function to place a file onto VPFS and the get function to retrieve it locally before viewing or processing it. If the files handled are large, however, and especially if a user only needs to retrieve a file for viewing, transferring it from the remote host to the local one is inefficient and time consuming. A skeleton API for a read function has therefore been added and needs to be implemented, so that a user can view a file straight from the remote host. For similar reasons a skeleton write API has also been added and needs to be implemented.

c. Security integration on all levels of the software: • Profile Authenticity:

Authenticating the profiles is essential in a system that provides security. One way this could be achieved is through the use of certification services that make use of a Public Key Infrastructure. A group of nodes could be designated to provide that service, in a similar way to how other peers provide the VFAT, User Profile or File Manipulation services. Those peers would create digital certificates for peer profiles with a specified validity period, and the VPFS system could be implemented to accept only peers with a certified profile. Having a group of peers provide this service, instead of a single one, is preferable as it eliminates a single point of failure.

• Secure File Transfer: The JXTA framework provides the means to achieve secure communication via the use of secure pipes. They have been included in the recent version of JXTA and could be easily used to replace the pipes currently used in the VPFS system.

• Encryption/Decryption of request/response messages:

Apart from securing the file transfer itself, one could argue that encrypting the request/response messages prior to transfer and decrypting them on receipt is also essential. That is especially the case if the files transferred contain sensitive data. Otherwise, someone could intercept the messages, realise that a file is about to be transferred to an FM peer and gain access to the file’s data.

• Secure Storage: When a user puts a file onto the VPFS system, that file could end up on any of the participating peers. It might therefore be better if files were encrypted, to prevent access by anyone with local access to the participating peers. Any file modification can already be detected, as the files are hashed and this value is stored in their inode.

d. Replication of the tree models of VFAT/UP modules:

To improve the system’s robustness, as mentioned in the System Design chapter, the VFAT/UP fragments need to be replicated to several hosts. Brief research in this area indicates that there may be Java packages to assist programmers in accomplishing this task.

e. Introduction and implementation of more policies:

So far, the only policy actually implemented has been the replication one. Others could be easily added, as the framework for their implementation already exists.


Chapter 11: Conclusions The aim of this project was to design and implement a virtual policy-driven file store that would reside on top of our client’s system, with the goal of solving the problems it currently faces, as discussed in Chapter 4. GenTHREADER suffers from the following problems:

• Scalability
• A bottleneck server
• Poor utilisation of storage

VPFS was intended to make the system much more flexible, provide better utilisation of storage resources and eliminate any bottleneck issues. The current version of our system does help to eliminate some of the problems of our client’s system; as a system in itself, it does not suffer from any of the problems that GenTHREADER currently faces. Unfortunately, though, it does not fulfil all the requirements of our client, and thus VPFS cannot be integrated with GenTHREADER at the moment.

One requirement that VPFS does not fulfil is remote read/write operations. This requirement is essential for GenTHREADER, as the cluster of machines must be able to read and process data from the servers remotely, as it currently does via the use of mount points. In addition, the current version of VPFS operates as a stand-alone program; an API is needed in order for GenTHREADER to utilise our system.

The way our system currently works can instead be seen as a stand-alone virtual archiving file system, where users can organise, manipulate and share their archived files. Users archive their files by transferring them from the local file system of their machines to the virtual archive store. To retrieve a file, they use the get command, which performs a service similar to an ftp operation and transfers the file from any of the machines on which it may reside to the user’s local machine. Users view the system as a single virtual store, while the system may utilise a number of machines.

The VPFS system can be extended in the future to incorporate an API, support for the remote read/write operations and an SSH authentication process. These aspects, when incorporated, would aid the integration of the two systems. The following section illustrates how the two systems would work together.

As described in Chapter 4, our client has two main servers (FREKE and TITIN) and a cluster of approximately 170 machines. The cluster connects only to FREKE; if the machines need data from TITIN, a symbolic link is established between FREKE and TITIN. The machines in the cluster are used as computational resources where data are processed, and the data results are then stored back on FREKE. The way in which the VPFS system was intended to integrate with GenTHREADER is illustrated in Figure 11-1 below.


[Figure: the cluster of Dell PowerEdge 2450 machines and the FREKE and TITIN servers, most running the FMM service and some the VFAT service. Arrows show the message flows: requesting an inode, sending the inode, requesting the file of the inode, sending the file of the specified inode, and processing/locating a file store on which to store the data results.]

Figure 11-1: VPFS in conjunction with GenTHREADER

Most of the machines in the cluster will be configured to provide the File Manipulation service (FMM), as shown above. This will enable them to act as file stores, so that the data results can be stored on any of these machines. The machines will process the data and, once complete, will put the result data into the VPFS network. This eliminates the need to send the data back to FREKE every time an analysis is completed, removing a big load from this server. In addition, the cluster’s hard disk storage capacity (in the terabyte range) will not be wasted. The servers will also be configured with the FMM service.

Some other machines, as illustrated in the diagram above, will be configured with the VFAT service. These machines will perform the mapping between the virtual and physical locations of files, which helps in locating which machine, including the servers, holds what data. In this way, we remove the need to configure mount points between the cluster machines and the servers and hence make the system more easily scalable. In addition, the cluster machines will be able to access the TITIN server directly when needed. This eliminates the procedure of establishing symbolic links between the servers when machines need data from TITIN, and as such removes extra load from FREKE.

In the above scenario, the cluster machines store results within the VPFS network rather than on the server, which reduces the traffic between the cluster and the server. However, the results must eventually be consolidated on the server. The server could therefore be configured to perform the get command periodically (e.g. once a week) to retrieve all the current result data from the VPFS network. The data stored in VPFS could then be removed, clearing the way for new data to be stored. This is illustrated in the diagram below:


[Figure: the FREKE server, running the FMM service, issues the get command to FMM peers in the cluster (Dell PowerEdge 2450 machines), which transfer the result data back to the server.]

Figure 11-2: Consolidation of Data

The JXTA framework will handle the communication between hosts and, in collaboration with the service that the VFAT module provides, will make GenTHREADER much more scalable. The administrator will be able to add new servers, remove existing ones or even migrate data from one server to another easily. Through the JXTA framework, new hosts will advertise their resources to the rest of the hosts in the network. The machines that hold the VFAT module will then update their directory trees (adding new inode entries and removing or modifying existing ones) based on the advertisements they receive.

The VPFS system will improve the overall system of our client. It will help to eliminate the issues it currently faces:

1. The FREKE server will no longer be a bottleneck for the network.
2. The utilisation of storage resources will be improved dramatically.
3. GenTHREADER will be more scalable.


References

Web Sites

P2P Frameworks
JXTA Community, JXTA project, [online], http://www.jxta.org/
Jini Community, The Community Resource for Jini Technology, [online], http://www.jini.org/about/technology.html
JINI, AR - Jini Architecture Specification, [online], http://wwws.sun.com/software/jini/specs/jini1.2html/jini-spec.html#1029470
Globus Community, The Globus Project, [online], http://www.globus.org/
Kelly Truelove, (2001, April 25th), The JuxtaNet, [online], http://www.openp2p.com/pub/a/p2p/2001/04/25/juxtanet.html

P2P
StratVantage, P2P4B2B - Non-Commercial Peer-to-Peer Efforts, [online], http://www.stratvantage.com/directories/p2pworkgroups.htm
Susan Breidenbach, (2001, July 30th), Feature: Peer-to-peer potential, [online], http://www.nwfusion.com/research/2001/0730feat.html

File Systems
N. Dwight Barnette, (2001, May 29th), File System Types, [online], http://courses.cs.vt.edu/~internet/notes/chap1/ostypes.html
Martin Hinner, (2000, August 22nd), Filesystems How To, [online], http://www.tldp.org/HOWTO/Filesystems-HOWTO.html
Charles M. Kozierok, Hard Disk Logical Structures and File Systems, [online], http://www.pcguide.com/ref/hdd/file/

Distributed File Systems
James Gwertzman, (1995, April 25th), Distributed File Systems, [online], http://www.eecs.harvard.edu/~vino/web/push.cache/node7.html
CSCI.4210 Operating Systems, Distributed File Systems, [online], http://www.cs.rpi.edu/~ingallsr/os/mod10.2.html
Daniel A. Menascé, (1997, August 27th), Distributed File Systems Part I, [online], http://cs.gmu.edu/~menasce/osbook/dfs1/
NPACI Research Technology Thrust Data-Intensive Computing, SRB - The Storage Resource Broker (Version 1.1.8), [online], http://www.npaci.edu/DICE/SRB/OldReleases/SRB1_1_8/SRB.htm#Major%20features%20of%20the%20SRB


Grid
IBM, Power Grid - Enabling on demand business with Grid computing, [online], http://www-1.ibm.com/servers/eserver/central/feature.html?ca=eservercentral&me=w&met=inba&p_creative=b
Ian Foster, Carl Kesselman, Steven Tuecke, The Anatomy of the Grid: Enabling Scalable Virtual Organizations, [online], http://www.globus.org/research/papers/anatomy.pdf
SDSC, SRB Storage Resource Broker, [online], http://www.npaci.edu/DICE/SRB/