
EggBasket: A Peer-to-Peer Backup Solution Using Crowd-Sourced Storage

Prepared by: Andrew McCallum

100708412

COMP4905 – Honours Project, Winter 2011

Carleton University, Ottawa, Canada

Supervisor: Prof. Michel Barbeau

Date: April 13th, 2011


Abstract

Backups are a critical piece in the use of computers, yet many consumers do not even contemplate the

situation they may find themselves in should their computer fail. The goal of this project was to design

and implement a peer-to-peer backup solution using storage that is sourced from members of the same

backup (or storage) community and provide similar functionality to popular cloud-based storage

solutions. To facilitate the peer-to-peer component of the proposed solution the BitTorrent protocol

was used, while a management protocol was developed to facilitate other facets of the solution such as

storage community and member management. Using BitTorrent technology and the developed

management protocol, EggBasket was born. Currently the solution supports backing up and recovery of

shared folders, as well as folder watching for modifications.


Table of Contents

1 Introduction
    1.1 Proposed Solution
    1.2 Summary of Results
    1.3 Outline of the Report
2 Technology Background
    2.1 BitTorrent
3 Implementation
    3.1 Structure of the Solution
        3.1.1 Client-side
        3.1.2 Server-side
    3.2 Third-Party Libraries
    3.3 EggBasket Management Protocol
        3.3.1 RegisterPacket
        3.3.2 TorrentPacket
        3.3.3 StoragePacket
        3.3.4 RestorePacket
        3.3.5 ResponsePacket
    3.4 Server Configuration
    3.5 Client Configuration
4 Results
    4.1 Performance
    4.2 User Interface
    4.3 Normal Program Flow
5 Future Work
6 Conclusion
7 Works Cited
8 Appendix A – Class Diagrams
    8.1 EggBasket Server
    8.2 EggBasket Client
    8.3 Packet Types


1 Introduction

The cost per megabyte of hard drive storage has been decreasing as a result of technological

innovations since their introduction in 1956 (Grochowski & Halem, 2003); larger capacities are now

possible and available. As a result, computers nowadays are being delivered with ever-increasing

capacity hard disk drives. For the average consumer, it might prove to be impossible for them to make

use of that available capacity under normal usage and over the expected lifetime of the system.

With hard drives being cheaply available in large capacities, the overall quality and intended lifetime of

the drive should be questioned; especially with consumer grade drives. Consumer grade drives are not

engineered and produced with the same requirements in mind compared to enterprise grade drives,

and as such they are priced comparatively lower. A hard drive failure could be catastrophic to the average

user, with all of their “eggs in one basket” (e.g. family pictures and videos). The only real way to avoid

this is simply to back up the data, but to where? The average user may have no method to

back up their data; they may not even think about backing up their data until it's too late. Even if there is

a backup method in place, they are not protected from the threat of physical damage (e.g. fire or flooding);

off-site backups add yet another layer of complexity to the mix.

Of course, the popular option that comes to mind is cloud storage. Dropbox is a good (and popular)

example of the cloud storage solution that provides the functionality of syncing files between all of your

computers and the cloud; not only backing up your files but allowing you to access them from a web

interface on the go. The service is available for free and you receive 2 GB of free storage to start with,

but any storage requirements above that are charged for on a subscription-based model. Cloud-based

storage is a good option, but if you’re looking for more storage capacity and are trying to avoid

extra monthly costs you’re out of luck. An easy and free option that should be available is to back up

your friends’ files and have them back up yours in turn: a peer-to-peer backup solution. Everyone has some

free space available on their hard drives, so why not let peers back up to your system and have yours

backed up in the process?

1.1 Proposed Solution

To remedy the absence of a free peer-to-peer backup solution, this project aims to implement one using

crowd-sourced storage, with multiple users sharing some of the available free capacity of their hard

drive with others, in effect creating a “storage community”. Using this solution, the threat of hard drive

failures can be mitigated by offloading copies of critical files to other trusted computer systems while

distributing pieces of them amongst all peers to maintain the confidentiality of files. In addition to this,

users may choose to allocate more of their provisioned amount of storage in this community to

redundancy (i.e. using more provisioned space for extra copies) but have less usable provisioned storage

as a trade-off.

Configuration of the client should be simple and straightforward, and after initial configuration of the

client there should be no further intervention required on the part of the user. The only indication that

the user should have that the client is running and functioning properly is the presence of an icon in the

system tray.


1.2 Summary of Results

As a result of this project a client/server solution entitled “EggBasket” has been developed to allow for

peer-to-peer file backups. It has been implemented using the BitTorrent protocol, in addition to the

development of the necessary client/server components.

The server-side software is responsible for running the

BitTorrent tracker and managing storage communities &

their members. On the client-side, the software is

responsible for downloading the torrents of other

community members, seeding its own backup data, as

well as watching its shared folders for changes. All of

the communication between the server and its clients is

facilitated by both the BitTorrent protocol and a

management protocol developed solely for this software.

1.3 Outline of the Report

For the remainder of this report we will look at the technology behind this solution, the implementation

of the solution, results from this project, and finally we will investigate future work that could be

conducted on this project. The Technology Background mainly looks into the BitTorrent protocol that

this solution heavily relies on. For Implementation, the report outlines the architecture of the solution,

the design of the management protocol, as well as the third-party libraries used and server/client

configuration. Results will look into the performance and user experience associated with this software,

while Future Work investigates improvements that can be made to this project.

Figure 1 Initial configuration of EggBasket client


2 Technology Background

The largest technological piece that this solution relies on is BitTorrent; without this technology the

solution would not be architected in the same way as it has been. BitTorrent technology allows for a

mesh-based peer-to-peer network to be created to share files, relying only on a centralised server

to keep track of peers available.

2.1 BitTorrent

Since its introduction in 2001 (Stone, 2007), BitTorrent technology has become one of the de facto

technologies in use by the peer-to-peer file sharing community. This can be seen by the fact that as of

October 2010, BitTorrent traffic in North America accounts for 34.31% of upstream traffic and 8.39% of

downstream traffic during peak hours. (BitTorrent Still Dominates Global Internet Traffic, 2010)

The technology works by first having a server in place that is accessible on the Internet, known as the

tracker. The tracker is responsible for keeping a record of the torrents it is “tracking” and a record of all

the peers who are active on those torrents. With regards to peers, they can be sub-divided into two

classifications, known as “leechers” and “seeders”. The terms leeching and seeding are inherent to

BitTorrent technology, with leeching implying that the peer does not have 100% of the torrent

downloaded and is thus downloading (leeching) off of other peers, whereas peers who are seeding have

a full copy of the torrent’s contents and are sharing the contents with leechers. The feature that

differentiates BitTorrent technology from standard file downloading is the fact that even if a peer is

leeching (i.e. doesn’t have everything downloaded), other leechers can download pieces of the data that

they do have and therefore drastically decrease the amount of time required to download the full

contents of the torrent. The performance of BitTorrent in comparison to the standard methods of

downloading files from the Internet can have measurable results; for example, the download of the film

“X-Men 3” took one hour less from BitTorrent’s now defunct Torrent Entertainment Network (TEN) than

with the download methods of traditional online movie rental locations. (Stone, 2007)

Other BitTorrent terminology includes:

• Pieces – All of the data contents defined for a given torrent are broken up into pieces of a specified size (usually 512KB, 1MB, 2MB, or 4MB) so as to make it easier to distribute amongst peers and not have to deal with differing file sizes.

• Swarm – The number of peers that are active for a given torrent; includes both seeders and leechers in the statistic.

• Share ratio – The ratio for a client defined by the amount the client has uploaded to the swarm divided by the amount it has downloaded. Any value less than 1.0 is viewed as a negative share ratio. (BitTorrent vocabulary, 2011)
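
As a rough illustration of the terminology above, the following sketch computes how many pieces a torrent of a given size would contain and what a client's share ratio would be. The class and method names are hypothetical and do not come from EggBasket or the Bitext library; the share ratio is assumed here to be the amount uploaded divided by the amount downloaded.

```java
// Hypothetical helper for illustration only; not part of EggBasket or Bitext.
class TorrentMath {
    // Number of fixed-size pieces needed to cover totalBytes
    // (the last piece may be shorter than pieceLength).
    static long pieceCount(long totalBytes, long pieceLength) {
        if (pieceLength <= 0) {
            throw new IllegalArgumentException("pieceLength must be positive");
        }
        return (totalBytes + pieceLength - 1) / pieceLength;
    }

    // Share ratio assumed to be uploaded / downloaded; a value below 1.0
    // means the client has taken more from the swarm than it has given.
    static double shareRatio(long uploadedBytes, long downloadedBytes) {
        if (downloadedBytes == 0) {
            return uploadedBytes > 0 ? Double.POSITIVE_INFINITY : 0.0;
        }
        return (double) uploadedBytes / downloadedBytes;
    }

    public static void main(String[] args) {
        long oneMiB = 1024L * 1024L;
        // 700 MiB plus one byte needs one extra (short) piece.
        System.out.println(pieceCount(700 * oneMiB + 1, oneMiB));     // prints 701
        System.out.println(shareRatio(512 * oneMiB, 1024 * oneMiB));  // prints 0.5
    }
}
```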

Recently, BitTorrent has come under a lot of scrutiny in North America, with some ISPs turning to traffic-

shaping practices to limit the amount of traffic being used by the technology. In Canada specifically, Bell

Canada uses traffic-shaping practices on its own Bell Sympatico customers and has extended them to

wholesale customers as of 2008, citing that it first started traffic-shaping practices “in an effort to curb


abuse of the network by a small minority of users who were using peer-to-peer applications to share

large files, including movies, which was slowing down speeds for all users”. (Bell rejects call to curb

traffic shaping, 2008) This kind of traffic management is detrimental to the legal uses of BitTorrent

technology, such as the distribution of game updates and Linux ISO images.

3 Implementation

For the implementation of the project, the Java programming language (specifically the Java 6 API) was

used, with the Eclipse IDE as the development environment. In addition to these tools,

the implementation of this project could not have been completed without the use of third-party

libraries, one of which provided the BitTorrent client/tracker implementation.

3.1 Structure of the Solution

Figure 2 Architecture of EggBasket solution

From an overall structure point-of-view, the architecture of this solution looks relatively simple as per

the BitTorrent type of architecture. Behind the scenes though, the picture is very different; the

implementation relies heavily upon multi-threading as there are many functions that need to be carried

out constantly. Within the implementation, the source code for the server and client components is

fairly disjoint with the exception of the basic network implementation (i.e. packet definitions, etc.) being

used in both.

3.1.1 Client-side

On the client-side, the application thread initialises the

GUI components and ensures that the client has been

previously registered with the server before starting the

actual client. In the case that the client has not

previously registered, the GUI components take over

and have the user configure the application for the first

use.

Figure 3 Threads for EggBasket client

Figure 4 Threads for EggBasket server

After the client has been configured for a storage community, the main thread of the client is responsible

for initiating the connection with the management server. After the connection has been established,

the client performs the following actions:

• registers with the management server by sending a HELLO RegisterPacket,

• refreshes the server with a newly-generated torrent for the client’s shared folders,

• spawns a new TorrentTask to seed the torrent,

• asks for all of the torrents for other community members and spawns TorrentTasks for all of the torrents received, and

• spawns a single instance of FolderWatcher to watch all of the shared folders for changes.

Once all components have been initialised, the client thread simply waits until a modification in one of

the shared folders occurs. If a file/folder in one of the shared folders is modified in some way, the client

will review the events and then refresh the torrent and send the updated torrent to the management

server for refreshing of all the other community members.

Besides ensuring that the client’s torrent is in sync with the shared folders, the client also has to receive

and process any refreshes that have occurred on other clients. This functionality is provided by the

management socket thread, which receives the REFRESH TorrentPacket and starts a new download task

using the new torrent provided.

3.1.2 Server-side

Upon starting up the EggBasket server component, the

following actions occur:

• clear out the previous list of torrents and peers,

• load previously created storage communities and members from semi-persistent storage,

• start the thread for the server’s listening socket to accept incoming requests, and

• spawn a thread to start the tracker service.

The order of thread spawning within the server component can be seen in Figure 4, with the main

thread at the top and each level underneath representing the threads created by the thread above.

The server component does not have any GUI components as it is part of the backend pieces of this

solution and therefore does not necessarily have to present a nice appearance. The main purpose of the

EggBasket server is to run the tracker, to communicate with all of the EggBasket clients, and maintain

the model consisting of storage communities and community members.

When a request comes into the management server, the listening socket accepts the connection and

spawns a new socket processor thread to process and act upon the contents of the packet. By doing this,

the server can handle multiple requests at one time, as opposed to processing all incoming packets in a

single thread, which would in effect create a large bottleneck for the server.
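
The thread-per-connection approach described above can be sketched roughly as follows. This is only an illustration using the standard java.net classes: the class name ManagementListener and the handle method are hypothetical, and the placeholder handler merely writes a single acknowledgement byte rather than processing a real packet.

```java
import java.io.IOException;
import java.net.ServerSocket;
import java.net.Socket;

// Hypothetical sketch of a thread-per-connection listener; names are not EggBasket's.
class ManagementListener implements Runnable {
    private final ServerSocket serverSocket;

    ManagementListener(ServerSocket serverSocket) {
        this.serverSocket = serverSocket;
    }

    @Override
    public void run() {
        while (!serverSocket.isClosed()) {
            try {
                // Blocks until a client connects.
                final Socket client = serverSocket.accept();
                // Hand the connection to its own thread so the accept loop is free again.
                new Thread(new Runnable() {
                    @Override
                    public void run() {
                        handle(client);
                    }
                }, "socket-processor").start();
            } catch (IOException e) {
                // accept() fails once the server socket is closed (or on error); stop listening.
                break;
            }
        }
    }

    // Placeholder for reading and acting upon a packet; it only acknowledges and closes.
    void handle(Socket client) {
        try {
            client.getOutputStream().write('K');
            client.getOutputStream().flush();
        } catch (IOException ignored) {
            // The client went away; nothing to do in this sketch.
        } finally {
            try {
                client.close();
            } catch (IOException ignored) {
                // Ignore failures while closing.
            }
        }
    }
}
```

A fixed-size thread pool (java.util.concurrent.ExecutorService) would be one alternative if the number of simultaneous clients needed to be bounded.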


3.2 Third-Party Libraries

In order to reduce the time for implementation of the solution (and to reduce the rewriting of code

where possible), the following third party libraries were used in this project:

• Apache Commons IO,

• XStream (and its required libraries),

• jpathwatch, and

• Bitext BitTorrent Client/Tracker implementation library (and its required libraries).

The Apache Commons IO library is used within the project to perform file-related functions that are

supplementary to the standard Java IO functions readily available with the standard Java Development

Kit (JDK). This library facilitates the avoidance of re-implementing functionality that is already widely

available.

As mentioned in the previous section, the EggBasket server provides a level of semi-persistence to the

storage community and community member data. This functionality allows the server to load the

aforementioned objects back into the current instance of the JVM. The library that has been utilised for

this functionality is the XStream library. In order to use the XStream library to save and load objects to

the file system, the classes that are going to be persisted had to be annotated with the necessary

XStream annotations. Compared to other methods of achieving this functionality, the XStream library is

a good option as the persisted objects are stored as XML. Compared to saving serialised objects to the

file system, objects stored as XML are human-readable and neatly formatted.

To watch the shared folders specified by the client in an efficient manner the jpathwatch library proved

to be an excellent choice as it relies upon JNI (Java Native Interface) to use the underlying operating

system functions intended for the folder-watching functionality. This functionality is supposed to be

included in the release of Java 7, but this library is compatible with Java 5. This compatibility is excellent

to have as users could potentially have Java 5 installed, and it allows the project to remain on the Java

6 API rather than forcing the solution to be implemented using the Java 7 API.
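
For readers unfamiliar with operating-system-backed folder watching, the sketch below shows the general idea using the standard java.nio.file.WatchService API that did ship with Java 7. It is not the jpathwatch code used in EggBasket, and the class and method names are hypothetical.

```java
import java.io.IOException;
import java.nio.file.Path;
import java.nio.file.StandardWatchEventKinds;
import java.nio.file.WatchEvent;
import java.nio.file.WatchKey;
import java.nio.file.WatchService;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.TimeUnit;

// Sketch using the standard java.nio.file API (Java 7+), not jpathwatch itself.
class FolderWatchSketch {
    // Registers dir for create, modify and delete notifications.
    static WatchService watch(Path dir) throws IOException {
        WatchService ws = dir.getFileSystem().newWatchService();
        dir.register(ws,
                StandardWatchEventKinds.ENTRY_CREATE,
                StandardWatchEventKinds.ENTRY_MODIFY,
                StandardWatchEventKinds.ENTRY_DELETE);
        return ws;
    }

    // Waits up to timeoutMs for one batch of events and returns them as "KIND name" strings.
    static List<String> nextEvents(WatchService ws, long timeoutMs) throws InterruptedException {
        List<String> out = new ArrayList<String>();
        WatchKey key = ws.poll(timeoutMs, TimeUnit.MILLISECONDS);
        if (key == null) {
            return out; // nothing changed within the timeout
        }
        for (WatchEvent<?> ev : key.pollEvents()) {
            out.add(ev.kind().name() + " " + ev.context());
        }
        key.reset(); // re-arm the key so further events are delivered
        return out;
    }
}
```

In a client such as the one described here, a non-empty batch of events would be the trigger for regenerating the torrent of the shared folders.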

The library that is core to the solution being developed for this project is the BitTorrent protocol, client,

and tracker implementation provided by the Bitext BitTorrent library. This library is one of the few Java

BitTorrent implementations that are tuned towards being used as a library. In order to use the Bitext

library in this project, the source version had to be used in order to make a number of modifications to the

code. Most of the modifications were cosmetic, along with changes to make use of generics, which were

introduced in Java 5.0. Aside from these house-keeping changes, there were a lot of modifications to

make the library compatible for the solution that had been envisioned, as well as some bug and

arithmetic fixes. Some functions of the library were extracted from specific classes and changed into

interfaces, allowing for specific functionality to be swapped in with ease. This allowed for functionality

like an ACL for piece authorisations to be introduced into the code.


3.3 EggBasket Management Protocol

In order to implement and allow for the extensive functionality that was planned for, a separate

protocol had to be developed. The BitTorrent protocol does allow for some extensions, but as the

project planned to have a management server managing storage communities and their members it

seemed a lot simpler and less complicated to have both protocols run side-by-side.

In addition to this, rather than uploading client torrents to the BitTorrent tracker using the conventional

HTTP methods, the torrent is instead “tunnelled” to the tracker through the management protocol. By

uploading client torrents using this method, one potential security hole in the overall system is closed as

there is no longer a need for a file upload component of the tracker to be running.

When implementing the management protocol, a design decision was made to use Java object

serialisation rather than writing directly to the output stream of the socket. This proves to be the better

(and easier to implement) method when the transferring of torrents over the socket connection to

clients is taken into consideration. Multiple request types were required and they have been grouped

into multiple packet types. The packet types include: RegisterPacket, TorrentPacket, StoragePacket,

RestorePacket, and ResponsePacket; these are outlined below as well as in a class diagram (see Appendix A – Class

Diagrams).
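
To illustrate why object serialisation is convenient for carrying a torrent over a socket, the following minimal sketch round-trips a packet through ObjectOutputStream and ObjectInputStream. The class TorrentPacketSketch and its fields are assumptions made for this example; only the message type names (REFRESH, REMOVE, GETALL) come from the report, and the actual EggBasket classes may be laid out differently.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.OutputStream;
import java.io.Serializable;

// Hypothetical packet shape; the field names are assumptions of this sketch.
class TorrentPacketSketch implements Serializable {
    private static final long serialVersionUID = 1L;

    enum Message { REFRESH, REMOVE, GETALL }

    final Message message;
    final String clientId;
    final byte[] torrentBytes; // raw torrent contents being "tunnelled"

    TorrentPacketSketch(Message message, String clientId, byte[] torrentBytes) {
        this.message = message;
        this.clientId = clientId;
        this.torrentBytes = torrentBytes;
    }

    // Writes one packet to any stream, for example socket.getOutputStream().
    static void send(OutputStream out, TorrentPacketSketch packet) throws IOException {
        ObjectOutputStream oos = new ObjectOutputStream(out);
        oos.writeObject(packet);
        oos.flush();
    }

    // Reads one packet written by send().
    static TorrentPacketSketch receive(InputStream in) throws IOException, ClassNotFoundException {
        return (TorrentPacketSketch) new ObjectInputStream(in).readObject();
    }
}
```

The whole object graph, including the binary torrent data, travels in a single writeObject call, which avoids designing a custom wire format for each packet type.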

3.3.1 RegisterPacket

The RegisterPacket packet type is responsible for handling messages of the following types:

REGISTER_COMMUNITY, JOIN_COMMUNITY, and HELLO. As expected, the REGISTER_COMMUNITY and

JOIN_COMMUNITY messages have to do with the creation of and joining a storage community

respectively. The HELLO message is the first thing clients send to the management server to register

themselves as active; the server verifies that they have access to the community they are configured to

be a part of at this point. Clients are the only side of the solution that utilises this packet type, as the

server has no need to send messages of this type to other systems.

3.3.2 TorrentPacket

The TorrentPacket packet type is responsible for handling messages of the following types: REFRESH,

REMOVE, and GETALL. When the files in a watched directory change, the peers within the storage

community have to be notified and updated with a “refreshed” torrent of the client’s files. As a result, a

TorrentPacket with the REFRESH message is sent to the management server with a new torrent and the

management server fires off a copy of the torrent to all of the other peers in the storage community.

The REMOVE message type is currently unused but is meant to be used in the instance that a peer has

failed or left the storage community and as a result the torrent for the peer in question should be

removed from all of the members of the storage community. The GETALL message is used to

retrieve the torrents for all of the members of the storage community; clients use this message when

they first start in order to get the most up-to-date versions of the files being seeded by their peers.

3.3.3 StoragePacket

The StoragePacket packet type is responsible for handling messages of the following types: ALLOCATION,

and REDUNDANCY. The ALLOCATION message indicates a change in the storage allocation of the client


to the storage community, be it an increase or decrease. In contrast to the ALLOCATION message, the

REDUNDANCY message serves as a notification to the management server to either increase or decrease

the redundancy level applied to the client, which in turn affects how the pieces are distributed and

assigned to peers within the storage community. The only logic that is required on the client-side with

regards to redundancy is to take the level into account when calculating the client’s allocated space in

the storage community (as a larger redundancy level will decrease the amount of usable storage

available to the client).
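
The report does not state how the redundancy level is converted into usable space. As a sketch only, the example below assumes the simplest possible rule, namely that usable storage is the allocated amount divided by the redundancy level, so that a level of 2.0 halves the usable space. The class and method names are hypothetical.

```java
// Hypothetical calculation; the division rule is an assumption, not EggBasket's documented logic.
class StorageMath {
    // Usable bytes when each backed-up byte is stored `redundancy` times.
    static long usableBytes(long allocatedBytes, double redundancy) {
        if (allocatedBytes < 0) {
            throw new IllegalArgumentException("allocatedBytes must not be negative");
        }
        if (redundancy < 1.0) {
            throw new IllegalArgumentException("redundancy must be at least 1.0");
        }
        return (long) Math.floor(allocatedBytes / redundancy);
    }

    public static void main(String[] args) {
        System.out.println(usableBytes(10000L, 1.0)); // prints 10000
        System.out.println(usableBytes(10000L, 2.0)); // prints 5000
    }
}
```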

3.3.4 RestorePacket

The RestorePacket packet type is responsible for all functionality relating to the restoration of data in

the event that a failure has occurred. There are two messages associated with this packet:

CAN_RESTORE, and DO_RESTORE. The difference between the two messages is that the first message is

meant to verify that the specified client ID can be restored, whereas the latter message DO_RESTORE is

meant to signal that the client will be conducting a restore. In order for the client to be able to perform

a restore, the resulting ResponsePacket is populated with the latest torrent that is registered with the

management server.

3.3.5 ResponsePacket

The ResponsePacket packet type is the type of packet that is always received as a result of a request

that requires a response. The available response types for this packet are: SUCCESS, CONFIRM, and

FAILURE. Depending on the result returned, the data contained within the packet is populated

accordingly. The main difference between the two positive outcome results is that SUCCESS implies that

there is a piece of specific data/object that is being returned, whereas a CONFIRM response only implies

that the functionality requested was completed with no issues or that the required information was

updated.
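
The distinction between the response types can be sketched as a small value class. This is an illustration under assumptions: the class name, the data field, and the factory methods are hypothetical, and carrying a reason string with FAILURE is a choice made for this example rather than something the report specifies.

```java
import java.io.Serializable;

// Hypothetical sketch of the response semantics; names are assumptions of this example.
class ResponseSketch implements Serializable {
    private static final long serialVersionUID = 1L;

    enum Type { SUCCESS, CONFIRM, FAILURE }

    final Type type;
    final Serializable data; // populated for SUCCESS, null for CONFIRM

    private ResponseSketch(Type type, Serializable data) {
        this.type = type;
        this.data = data;
    }

    // SUCCESS: a specific object (for example a torrent) is returned to the caller.
    static ResponseSketch success(Serializable data) {
        if (data == null) {
            throw new IllegalArgumentException("SUCCESS must carry data");
        }
        return new ResponseSketch(Type.SUCCESS, data);
    }

    // CONFIRM: the request completed or the information was updated; nothing is returned.
    static ResponseSketch confirm() {
        return new ResponseSketch(Type.CONFIRM, null);
    }

    // FAILURE: carrying a reason string is an assumption of this sketch.
    static ResponseSketch failure(String reason) {
        return new ResponseSketch(Type.FAILURE, reason);
    }
}
```

Under this reading, a GETALL or DO_RESTORE request would be answered with success(...) carrying torrent data, while a REFRESH or ALLOCATION request would be answered with confirm().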

3.4 Server Configuration

For configuration on the server-side, all that is required is the configuration of the tracker component

provided by the Bitext BitTorrent library. The two files in question are Mapper.xml and tracker-config.xml,

which provide the class mapping for the HTTP service and the storage directory specification (along

with other constants), respectively.

The class mapping in the Mapper.xml configuration file used by the Simple Web library provides the

necessary information to set up the tracker service implemented by the TrackerService class. This

information includes a service definition which points to the TrackerService class (which in turn

implements the Service interface), and a resolution definition which defines which path in an HTTP URL

maps to the specified service. In the case of the Bitext tracker implementation, the “/announce*” path

maps all requests to http://tracker-server/announce to the TrackerService service.

In addition to the tracker web service configuration outlined above, there is also the Bitext-specific

configuration for the tracker service. Specifically, the tracker-config.xml file defines directories and files

for storing pertinent torrent-related data, as well as the name of the tracker and what port it listens on.

With regards to files and directories, the only required directory is the store for the torrent files; two

XML documents are also required: one to keep track of the torrents being tracked on the tracker and a

second to record the peers that are currently active on specific torrents.

3.5 Client Configuration

On the client side, there is no configuration

required on the part of the end user that is

completed outside of the GUI provided.

The client provides an initial configuration

window when the software is run for the

first time on the computer (refer to Figure 1)

and asks for a storage community name to

join/create. After that initial configuration,

the user only has the options presented to

them in the main options window available

to customise (see Figure 5); this includes options for defining shared folders, setting the level of shared

storage and redundancy level desired, as well as the option to reset membership (i.e. start over at the

initial configuration).

On the backend, the client options are stored in the registry (in the case of Windows) under the

following key:

[HKEY_CURRENT_USER\Software\JavaSoft\Prefs\eggbasket]

Under this key, there are multiple entries that correspond to the options presented in the options

window. The configuration entries include:

• client_id – The unique identifier for the instance of the EggBasket client installed on the local computer.

• community – The storage community in which the client with the above client identifier is registered as a member.

• storage_alloc – The amount of storage that the user has decided to share with the other members of their storage community.

• redundancy – The amount of redundancy for the files being backed up, as configured by the user; 1.0 is the default setting for this option.

• folders – The folders to be backed up by peers in the user’s storage community, represented in the registry as a vertical-bar-separated list of folders (e.g. shared|docs|home).
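The registry key above is the location where the Windows implementation of the standard java.util.prefs API keeps user preferences, so the entries could be read along the following lines. This is a minimal sketch and not the EggBasket source: the class and method names are hypothetical, and the only defaults taken from this report are the 1.0 redundancy and the "shared" folder.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.prefs.Preferences;

// Hypothetical helper; class and method names are not from the EggBasket source.
final class ClientOptionsSketch {

    // On Windows, this node is stored under
    // HKEY_CURRENT_USER\Software\JavaSoft\Prefs\eggbasket.
    static Preferences node() {
        return Preferences.userRoot().node("eggbasket");
    }

    // Reads the redundancy entry, falling back to the documented default of 1.0.
    static double redundancy(Preferences prefs) {
        return prefs.getDouble("redundancy", 1.0);
    }

    // Reads the folders entry, falling back to the default "shared" folder.
    static List<String> folders(Preferences prefs) {
        return parseFolders(prefs.get("folders", "shared"));
    }

    // Splits a vertical-bar-separated value such as "shared|docs|home".
    static List<String> parseFolders(String stored) {
        List<String> result = new ArrayList<>();
        for (String part : stored.split("\\|")) {
            String name = part.trim();
            if (!name.isEmpty()) {
                result.add(name);
            }
        }
        return result;
    }

    // Joins folder names back into the stored representation.
    static String joinFolders(List<String> folders) {
        return String.join("|", folders);
    }
}
```

Keeping the parsing separate from the Preferences lookup means the list format can be checked without touching the registry.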

Figure 5 Main options window for EggBasket client

4 Results

This project was successful in implementing a tangible solution, albeit with only basic functionality and lots of room for future work. Currently, the core functionality of the solution works: users are able to back up their shared folders to peers within the storage community and, if required, the client can be used to restore data in the event of a failure. As for the storage allocation and piece ACLs, the groundwork is in place but the logic is still required in order to have those two features functioning properly (refer to Figure 6 for an example of a storage allocation scheme).

With the implementation concentrating on the core functionality of the proposed solution, currently only the default “shared” folder is allowed. That being said, the folder watching functionality is implemented and does trigger the refreshing of torrents. Torrent refreshing functionally works at the moment, but the threads from seeding the previous version of the torrent do not clean up after being told to shut down, and the method for file storage does not easily allow for torrent updates. The latter is because data from peers is currently stored in its raw piece form, which was intended to make implementing piece ACLs easier; to avoid re-downloading data, however, a method is needed to map old pieces to new pieces (as the hashes would have changed, etc.).
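One possible approach to the piece-mapping problem is to compare piece hashes between the old and new versions of a torrent: a new piece whose hash also appears in the old torrent is already held locally and need not be downloaded again. The sketch below illustrates only that idea and is not part of EggBasket. The class and method names are hypothetical, each piece is assumed to be identified by the hex string of its SHA-1 hash, and content that shifts across piece boundaries is not handled.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch; not taken from the EggBasket source.
final class PieceMapperSketch {

    // Returns a map from new piece index to old piece index for every
    // new piece whose hash is unchanged from some piece in the old torrent.
    static Map<Integer, Integer> mapUnchangedPieces(List<String> oldHashes,
                                                    List<String> newHashes) {
        // Index the old pieces by hash; keep the first index if a hash repeats.
        Map<String, Integer> oldIndexByHash = new HashMap<>();
        for (int i = 0; i < oldHashes.size(); i++) {
            oldIndexByHash.putIfAbsent(oldHashes.get(i), i);
        }

        // Any new piece found in the index can be reused instead of downloaded.
        Map<Integer, Integer> reusable = new HashMap<>();
        for (int j = 0; j < newHashes.size(); j++) {
            Integer oldIndex = oldIndexByHash.get(newHashes.get(j));
            if (oldIndex != null) {
                reusable.put(j, oldIndex);
            }
        }
        return reusable;
    }
}
```

New piece indices that are absent from the returned map are the ones that would still have to be fetched from peers.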

4.1 Performance

From a performance perspective, the solution functions as expected, providing fast downloads in a LAN

environment. Most of the waiting for downloading torrent data is due to timed intervals for tracker

updates and the like. In terms of efficiency of the source code, the jpathwatch library allows us to watch

folders using the native functions provided by the operating system. This method is recommended as

the only other method is to poll the file system periodically, wasting resources and not keeping the

torrent refreshed as files change.
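To illustrate event-driven folder watching as opposed to polling, the sketch below uses the JDK's own java.nio.file.WatchService (Java 7 and later) in place of jpathwatch, whose exact API is not shown in this report. The class and method names are hypothetical; in the client, a non-empty result would be the point at which a torrent refresh is triggered.

```java
import java.io.IOException;
import java.nio.file.Path;
import java.nio.file.StandardWatchEventKinds;
import java.nio.file.WatchEvent;
import java.nio.file.WatchKey;
import java.nio.file.WatchService;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.TimeUnit;

// Hypothetical sketch using the JDK WatchService rather than jpathwatch.
final class FolderWatcherSketch implements AutoCloseable {
    private final WatchService watcher;

    // Registers the folder for create, modify and delete notifications.
    FolderWatcherSketch(Path folder) throws IOException {
        watcher = folder.getFileSystem().newWatchService();
        folder.register(watcher,
                StandardWatchEventKinds.ENTRY_CREATE,
                StandardWatchEventKinds.ENTRY_MODIFY,
                StandardWatchEventKinds.ENTRY_DELETE);
    }

    // Waits up to the given timeout for the next batch of events and
    // returns the names of the changed entries (empty if nothing happened).
    List<String> awaitChanges(long timeout, TimeUnit unit) throws InterruptedException {
        List<String> changed = new ArrayList<>();
        WatchKey key = watcher.poll(timeout, unit);
        if (key == null) {
            return changed;
        }
        for (WatchEvent<?> event : key.pollEvents()) {
            if (event.kind() == StandardWatchEventKinds.OVERFLOW) {
                continue; // events were lost; there is no entry name to report
            }
            changed.add(event.context().toString());
        }
        key.reset(); // re-arm the key so later events are delivered
        return changed;
    }

    @Override
    public void close() throws IOException {
        watcher.close();
    }
}
```

The calling thread sleeps inside poll until the operating system reports a change, which is the resource saving over periodic scans described above.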

4.2 User Interface

The user interface component of this project was intended to be very minimal and not in the way of the user (see Figure 7). This is why a system tray icon was chosen to display that the client is running as opposed to no indication of the client’s status at all.

The only other interface components besides the tray icon are the two options-related windows for

configuration of the client. That being said, aside from configuration needs, the tray icon should be able

to indicate the status of the client in all instances.

Figure 7 EggBasket system tray icon

Rows: allocate from. Columns: allocate to.

Shared   From    A      B      C      D      Total
20       A       -      2      3      11     16
10       B       2      -      1.5    5.5    9
15       C       3      1.5    -      8.25   12.75
55       D       11     5.5    8.25   -      24.75
         Total   16     9      12.75  24.75

Figure 6 Example of storage allocation scheme
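The report does not state the formula behind Figure 6, but every cell in the table is consistent with member i allocating shared_i × shared_j / total to member j, where total is the sum of all shared amounts (for example, A to D is 20 × 55 / 100 = 11). The sketch below reproduces the table under that inferred rule; the rule itself is an assumption, and the class and method names are hypothetical.

```java
// Hypothetical sketch of the allocation rule inferred from Figure 6.
final class AllocationSketch {

    // Returns a matrix where result[i][j] is the storage that member i
    // allocates to member j. A member allocates nothing to itself.
    static double[][] allocate(double[] shared) {
        double total = 0.0;
        for (double s : shared) {
            total += s;
        }
        if (total <= 0.0) {
            throw new IllegalArgumentException("total shared storage must be positive");
        }

        int n = shared.length;
        double[][] result = new double[n][n];
        for (int i = 0; i < n; i++) {
            for (int j = 0; j < n; j++) {
                if (i != j) {
                    // Inferred rule: proportional to both members' shares.
                    result[i][j] = shared[i] * shared[j] / total;
                }
            }
        }
        return result;
    }

    // Sums one row, i.e. the total storage a member allocates to others.
    static double rowTotal(double[] row) {
        double sum = 0.0;
        for (double value : row) {
            sum += value;
        }
        return sum;
    }
}
```

With the shared amounts 20, 10, 15 and 55 from Figure 6, the rule is symmetric: the amount A allocates to D equals the amount D allocates to A, which matches the mirrored values in the table.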

4.3 Normal Program Flow

When a client connects to the management server initially, it first has to register with the server and

then upload a newly generated torrent to refresh all other peers. After that is all completed, the client

simply seeds the data (refer to Figure 8).

Figure 8 Client registration and seeding

When another peer connects to the existing swarm (or has to refresh its torrent due to file modification),

the existing peers are updated with the new torrent (see Figure 9). Once the peers have refreshed the client’s torrent, the downloading of pieces can commence.

Figure 9 Refreshing a client's torrent, piece downloaded by peer

The above case will repeat multiple times over the lifetime of the client, as other peers will be modifying

their files and will require a refresh.
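The flow described in this section can be summarised as a small state machine. This is only an illustrative sketch: the state and event names are hypothetical and are not taken from the EggBasket management protocol.

```java
// Hypothetical sketch of the client lifecycle; names are illustrative only.
final class ClientFlowSketch {

    enum State {
        REGISTERING, UPLOADING_TORRENT, SEEDING,
        REFRESHING_PEER_TORRENT, DOWNLOADING_PIECES
    }

    enum Event {
        REGISTERED, TORRENT_UPLOADED, LOCAL_FILES_CHANGED,
        PEER_TORRENT_CHANGED, TORRENT_REFRESHED, PIECES_DOWNLOADED
    }

    // Returns the state that follows the given event, or throws if the
    // event is not expected in the current state.
    static State next(State state, Event event) {
        switch (state) {
            case REGISTERING:
                if (event == Event.REGISTERED) return State.UPLOADING_TORRENT;
                break;
            case UPLOADING_TORRENT:
                if (event == Event.TORRENT_UPLOADED) return State.SEEDING;
                break;
            case SEEDING:
                // A local change means a new torrent has to be uploaded.
                if (event == Event.LOCAL_FILES_CHANGED) return State.UPLOADING_TORRENT;
                // Another peer joined or modified its files.
                if (event == Event.PEER_TORRENT_CHANGED) return State.REFRESHING_PEER_TORRENT;
                break;
            case REFRESHING_PEER_TORRENT:
                if (event == Event.TORRENT_REFRESHED) return State.DOWNLOADING_PIECES;
                break;
            case DOWNLOADING_PIECES:
                if (event == Event.PIECES_DOWNLOADED) return State.SEEDING;
                break;
        }
        throw new IllegalStateException(event + " is not valid in state " + state);
    }
}
```

The loop from SEEDING through the refresh and download states and back corresponds to the case that repeats over the lifetime of the client.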

5 Future Work

Over the course of this project, some intended features could not be included due to time constraints and implementation setbacks. These include compression/encryption, measurement of peer availability, and storage community recovery from a peer failure.

In addition to the features that were intended to be included, there are also features outside the scope of this project that could be implemented. One possible idea is a web front end for managing storage communities and user accounts. Another would be to implement UPnP (Universal Plug and Play) within the EggBasket client to allow for automatic port forwarding. A quick search on the Internet revealed the Cling library (http://teleal.org/projects/cling), which could be integrated into the solution to provide UPnP functionality.

6 Conclusion

In closing, the EggBasket solution shows a lot of potential and demonstrates that this type of peer-to-peer backup solution using BitTorrent technology does work. Not all of the points in the original project proposal were implemented, but the core functionality is in place and is ready to be built upon and tweaked for a stable release.

8 Appendix A – Class Diagrams

8.1 EggBasket Server

8.2 EggBasket Client

8.3 Packet Types