General-Purpose, Internet-Scale Distributed Computing with Linked Process

Marko A. Rodriguez

T-5/Center for Nonlinear Studies, Los Alamos National Laboratory

http://markorodriguez.com

September 10, 2009



Abstract

There are many distributed computing protocols in existence today. Some serve as a solution for scientific computing, some as a middleware solution to large-scale systems engineering, and others as an “easy-to-use” service solution on the Web. What most of these protocols have in common is that they require a strong “handshake” between the machines utilizing each other’s resources. This coupling has left many distributed protocols useful only for a collection of machines owned and operated by a single organization (e.g. MPI/PVM computing) or for use by foreign machines with a very specific use case (e.g. RPC/Web Services computing). The former allows for general-purpose distributed computing and the latter allows for Internet-scale distributed computing. What if both types of functionality were to be merged? What does a general-purpose, Internet-scale distributed computing protocol look like?

Linked Process [ http://linkedprocess.org ]

Center for Nonlinear Studies – Los Alamos, New Mexico – September 10, 2009


A General-Purpose Requirement

• General-purpose: it is required that the code executed is not necessarily defined by the executing device, but instead can be defined by the requesting device.

  - Language-agnostic: it is required that distributed code can, in principle, be written in any computer language.

  - Safe: it is required that the execution of code be confined by clearly specified permissions on the executing device.

  - Accessible: it is required that various types of computing resources be accessible when permissions allow.

The notion of “general-purpose” is not defined according to a single dimension as there are various general-purpose approaches which each attain certain types of generality. Please be generous in your interpretation of this term for the time being.


An Internet-Scale Requirement

• Internet-scale: it is required that any device with an Internet connection (from a cell phone to a supercomputer) be able to contribute and leverage computing resources.

  - Decentralized: it is required that the computing resources are not centralized or controlled by any one party.

  - Discoverable: it is required that devices be discoverable by other devices needing to leverage their resources.

  - Transient: it is required that devices coming online and offline are easily incorporated and removed.

The extreme notion of “Internet-scale” goes beyond the 32-bit addresses of IPv4. There are more than 4,294,967,296 (2^32) devices on the Internet. Thus, at the extreme, “Internet-scale” refers to all devices that can communicate and be communicated with through the Internet.


Outline

• An Introduction to Other Distributed Computing Protocols

  - General-Purpose Distributed Computing with MPI
  - Internet-Scale Distributed Computing with Web Services

• An Introduction to the Linked Process Protocol

  - Internet-Scale Distributed Computing with Linked Process
  - General-Purpose Distributed Computing with Linked Process

• An Introduction to the Linked Process Protocol Implementation

• Current and Future State of Linked Process


A Short Note on Other Protocol Discussions

• The symbol “ ” means that this feature is good with respect to the two previous requirements.

• The symbol “ ” means that this feature is bad with respect to the two previous requirements.

There are always tradeoffs in computing. These are not “objective” valuations of the protocols discussed next. Valuations are in terms of the requirements set forth for the design of Linked Process. Linked Process won’t solve all problems: it is “yet another distributed computing protocol” that has a collection of unique features that make it useful for problems with the aforementioned requirements.


General-Purpose: Distributed Computing with MPI

• The Message Passing Interface (MPI) is a language-agnostic protocol for inter-process communication.

• Processes (i.e. threads of execution) communicate by passing data between each other (i.e. messages).

  - send(&x, p2): send data pointed to by x to process p2.

  - recv(&y, p1): receive data from process p1 and store it at y.

process 1 and process 2 run side by side over time:

process 1:

    int x[100];
    ...
    send(&x, 2);

process 2:

    int y[100];
    ...
    recv(&y, 1);
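The send/recv pairing above can be imitated without an MPI installation. The following is only a sketch under stated assumptions: threads stand in for processes, `queue.Queue` mailboxes stand in for the MPI transport, and the `send`/`recv` signatures (with explicit `src` and `dest`) are hypothetical simplifications of the slide's pseudocode, not the MPI API.

```python
import queue
import threading

# One mailbox per (source, destination) pair so recv only sees that sender's data.
# Ranks 1 and 2 mirror "process 1" and "process 2" on the slide.
mailboxes = {(1, 2): queue.Queue(), (2, 1): queue.Queue()}

def send(data, src, dest):
    """Deliver a copy of data from process src to process dest."""
    mailboxes[(src, dest)].put(list(data))

def recv(src, dest):
    """Block until process dest receives a message from process src."""
    return mailboxes[(src, dest)].get(timeout=5)

received = {}

def process_1():
    x = list(range(100))                 # int x[100];
    send(x, src=1, dest=2)               # send(&x, 2);

def process_2():
    received["y"] = recv(src=1, dest=2)  # recv(&y, 1);

threads = [threading.Thread(target=process_1), threading.Thread(target=process_2)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(len(received["y"]))
```

The receiver blocks until the matching send arrives, which is the coupling between the two processes that the slide's time axis depicts.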


General-Purpose: Distributed Computing with MPI

marko> more hosts.txt
machine1
machine2

marko> mpirun -machinefile hosts.txt -np 3 myProgram
spawning myProgram on machine1...
spawning myProgram on machine2...
spawning myProgram on machine1...
Executing...
Done. Thank you, compute again.

marko>

Within myProgram the code branches depending on which “rank” its process is (e.g., with respect to the above example, rank is either 0, 1, or 2). This way, each processor is doing a task particular to its rank.
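Branching on rank might look roughly like the sketch below. It assumes ranks are numbered from 0, as MPI implementations do, and simulates the three copies of the program in a loop instead of launching them with mpirun; `my_program` and its messages are hypothetical.

```python
def my_program(rank, size):
    """Hypothetical body of myProgram: the same code, branching on rank."""
    if rank == 0:
        return f"rank 0 of {size}: split the input and hand out work"
    return f"rank {rank} of {size}: work on slice {rank}"

# mpirun would start one copy per process; here the copies are simulated in a loop.
SIZE = 3
outputs = [my_program(rank, SIZE) for rank in range(SIZE)]
for line in outputs:
    print(line)
```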


General-Purpose: Distributed Computing with MPI

MPI has been around since the early 1990s and is a thoroughly applied protocol with various language ports (however, MPI tends to be more “C/Fortran”-ish as its intended use is high-performance computing). [Language-agnostic]

MPI developers have access to all machine resources, the limiting factor being the operating system. [Accessible]

MPI implementations have large libraries of useful distributed computing patterns (e.g. scatter/gather, broadcast, reduce, etc.). What you can think of is what you can do. [General-purpose]


General-Purpose: Distributed Computing with MPI

MPI requires the MPI agent mpirun to have “ssh” access to the physical devices that processes will be spawned on (i.e. the operating system becomes the security manager). [Safe]

MPI processes have low-level access to the computing resources of the underlying machine and thus introduce a security risk for foreign/unknown code. In short, you most likely own all your machines. [Decentralized]

MPI requires a set of machines and the compilation of all code before execution. [Transient]


An Artist’s Interpretation of MPI

[Figure: three machines (127.0.0.1, 127.0.0.2, 127.0.0.3) passing data messages to one another, such as [1,2,7,9], ['c','b'], [1], and [42,31].]


Internet-Scale: Distributed Computing with Web Services

“Web services are frequently just Internet Application Programming Interfaces (API) that can be accessed over a network, such as the Internet, and executed on a remote system hosting the requested services.”

From Wikipedia’s Web Services article.

• A Web Service is like an API.

• A Web Service is hosted by a remote device and can be accessed by anyone over the network.


Internet-Scale: Distributed Computing with Web Services

• REST-based (REpresentational State Transfer) Web Services make use of simple HTTP-based APIs. REST “verbs” are GET, PUT, POST, and DELETE.

http://chart.apis.google.com/chart?cht=p3&chd=t:60,40&chs=250x100&chl=LANL|Sandia

resource: http://chart.apis.google.com/chart

parameters: cht=p3&chd=t:60,40&chs=250x100&chl=LANL|Sandia
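The split of a REST URL into a resource and its parameters can be reproduced with Python's standard `urllib.parse` module. This is a minimal sketch that only parses the URL from the slide; it makes no network request, and the variable names are illustrative.

```python
from urllib.parse import urlsplit, parse_qs

url = ("http://chart.apis.google.com/chart"
       "?cht=p3&chd=t:60,40&chs=250x100&chl=LANL|Sandia")

parts = urlsplit(url)
# The resource is everything before the "?"; the parameters are the query string.
resource = f"{parts.scheme}://{parts.netloc}{parts.path}"
parameters = {key: values[0] for key, values in parse_qs(parts.query).items()}

print(resource)
print(parameters)
```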


Internet-Scale: Distributed Computing with Web Services

• RPC-based (Remote Procedure Call) Web Services perform a set of functions with a specification for sending process requests and receiving process results (e.g. Web Service Description Language – WSDL).

Web Service (127.0.0.2):

    boolean aMethod(String x, int y);
    double bMethod(double z);

Service Requestor (127.0.0.1):

    void cMethod() {
      Object[] params = {"marko", 29};
      Stub s = new Stub("aMethod", params);
      boolean b = s.execute();
    }


Internet-Scale: Distributed Computing with Web Services

Most modern languages have libraries to support the various Web Services models. They usually make use of “Web” protocols (e.g. HTTP) and encodings (e.g. XML, SOAP, JSON). [Language-agnostic]

Limited functionality and strict interfaces ensure that underlying devices cannot be compromised. [Safe]

Web Service models have discovery mechanisms to locate services that perform a particular function, take particular types of inputs, and produce particular types of outputs (e.g. UDDI). [Discoverability]

Web Services are web addressable and intended for use by anyone (not just the developer of the service). [Internet-scale]


Internet-Scale: Distributed Computing with Web Services

Web Services are defined for particular use cases and thus the computing resources offered by a Web Service are defined by the developer of the Web Service. [General-purpose]

• e.g. the Google Charts codebase is defined and can be used, but only for what it was created for (namely, to make graphical charts).

Web Services are tied to the Internet Protocol for device addressing, which reduces the types of devices that can offer services. [Internet-scale]

• i.e. it’s difficult to run a typical HTTP-based Web Service off my cell phone without some intermediary gateway mechanism.


An Artist’s Interpretation of Web Services

[Figure: three hosts (127.0.0.1, 127.0.0.2, 127.0.0.3). The services expose the functions f(x) and g(x); the calls f(object) and g(object) each return an object.]


Core Features of Linked Process

• Linked Process entities are not addressed by IP addresses. Their addressing scheme is location independent.

  - Implication: Any device with an Internet connection can support or leverage a Linked Process cloud. Linked Process clouds support Internet-scale distributed computing.

• Linked Process allows users to execute any code on a remote device as long as the code does not violate set security permissions.

  - Implication: Code is migrated to remote devices for execution. Linked Process clouds support general-purpose distributed computing.


An Artist’s Interpretation of Linked Process

[Figure: three devices labeled 127.0.0.1, 127.0.0.2, and 127.0.0.3.]


Internet-Scale: XMPP Communication Protocol

• Linked Process rides atop the eXtensible Messaging and Presence Protocol (XMPP). This is what gives Linked Process its Internet-scale quality.

• XMPP was developed as an open protocol for Instant Messaging (GTalk, iChat, Jabber, etc.). Devices from servers to cell phones can send and receive chat messages.

• Interesting aspects of XMPP that make it useful for Internet-scale distributed computing:

  - XMPP creates a communication layer of abstraction above IP.
  - XMPP servers are XML packet routers between XMPP clients.
  - XMPP is an asynchronous message passing protocol.


Internet-Scale: XMPP creates a communication layer of abstraction above IP

• XMPP clients are identified by Jabber IDs (JID).

  - an example XMPP client JID is [email protected]
  - XMPP clients are IP independent.

• XMPP clients log into XMPP servers.

  - an example XMPP server JID is lanl.gov
  - XMPP servers are IP dependent.

• XMPP clients maintain the same JID irrespective of their physical location (i.e. IP address). Think of how your IM chat client operates.

  - [email protected] is my JID irrespective of whether it is logged into the XMPP server from a Los Alamos IP, New York IP, or Swedish IP.
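The structure of a JID (an optional local part, a server domain, and an optional resource that singles out one client session) can be sketched as a small parser. The addresses below are hypothetical placeholders, and the parser performs no validation; it only illustrates why the identifier carries no IP address.

```python
from typing import NamedTuple, Optional

class JID(NamedTuple):
    local: Optional[str]     # part before "@", absent for a server JID
    domain: str              # the XMPP server, a DNS name
    resource: Optional[str]  # part after the first "/", one client session

    @property
    def bare(self) -> str:
        """The JID without its resource."""
        return f"{self.local}@{self.domain}" if self.local else self.domain

def parse_jid(text: str) -> JID:
    """Split 'local@domain/resource' into its parts (no validation)."""
    rest, _, resource = text.partition("/")
    local, at, domain = rest.rpartition("@")
    return JID(local if at else None, domain, resource or None)

full = parse_jid("alice@example.org/1234")   # hypothetical fully-qualified JID
server = parse_jid("example.org")            # hypothetical server JID
print(full.bare, full.resource, server.bare)
```

Only the domain part has to resolve to a reachable server; the client behind the JID can connect from any network location.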


Internet-Scale: XMPP servers are XML packet routers between XMPP clients

[Figure: a packet, <packet to="[email protected]" from="[email protected]" />, is relayed in three hops (t=1, t=2, t=3) from one client to the other by way of the XMPP servers rpi.edu and lanl.gov. The clients are [email protected]/1234 and [email protected]/5678; the four hosts are labeled 127.0.0.1 through 127.0.0.4.]

[email protected]/1234 and [email protected]/5678 are fully-qualified client JIDs. Many clients (i.e. applications) can exist off the same bare JID (e.g. [email protected]). Also, addresses can be fully-qualified to route the packet to a particular client.


Internet-Scale: XMPP is an asynchronous message passing protocol

[Figure: the same two clients ([email protected]/1234, [email protected]/5678) and two servers (rpi.edu, lanl.gov) on hosts 127.0.0.1 through 127.0.0.4, with open <stream>... elements on each client-server and server-server link.]


Internet-Scale: XMPP is an asynchronous message passing protocol

• My outgoing stream from my [email protected] XMPP client to the lanl.gov XMPP server. Note that any XML can be sent between clients. This is what makes XMPP “extensible.”

<stream>
  <!-- here is a packet -->
  <message from="[email protected]" to="[email protected]">
    <body>It is a near must that you read my blog.</body>
  </message>
  <!-- here is a packet -->
  <iq from="[email protected]" to="[email protected]">
    <spawn_vm vm_species="groovy" vm_id="ABCD" />
  </iq>
  <!-- here is a packet -->
  <message from="[email protected]" to="[email protected]">
    <body>What is up with that Mike guy?</body>
  </message>
</stream>
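To see how a receiver can tell chat messages from Linked Process packets in the same stream, the sketch below parses a small closed stream with Python's standard XML library. It is a simplification: a real XMPP stream stays open and is parsed incrementally, and the addresses used here are hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical addresses; the stanzas mirror the kinds of packets shown above.
STREAM = """
<stream>
  <!-- here is a packet -->
  <message from="alice@example.org" to="bob@example.org">
    <body>hello</body>
  </message>
  <!-- here is a packet -->
  <iq from="alice@example.org" to="farm@example.org">
    <spawn_vm vm_species="groovy" vm_id="ABCD" />
  </iq>
</stream>
"""

def summarize(stream_xml):
    """Return (stanza type, recipient, payload element name) for each packet."""
    root = ET.fromstring(stream_xml.strip())
    return [(stanza.tag, stanza.get("to"), stanza[0].tag) for stanza in root]

packets = summarize(STREAM)
for packet in packets:
    print(packet)
```

The server only needs the `to` attribute to route a packet; the payload element is interpreted by the receiving client.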


General-Purpose: Language-Agnostic Code Migration

• Linked Process supports the migration of code (i.e. software, computing instructions, etc.) between devices.

• Migrated code is intended to make use of the computing resources of the device (e.g. clock cycles, software APIs, hardware components, data sets, etc.).

• Migrated code can be in any computer language as long as the executing device maintains an appropriate virtual machine to execute code in that language.

• Devices in Linked Process serve as “computing sandboxes” that can be leveraged for the execution of any code as long as the code does not violate set security permissions.


General-Purpose: Code Permissions

permission          description                                          type
job timeout         milliseconds for which a job may execute             text-single
vm time to live     milliseconds for which a virtual machine may exist   text-single
shutdown farm       exit the farm process                                boolean
execute program     execute a program                                    boolean
read file           read from a file                                     list-multi
write file          write to a file                                      list-multi
delete file         delete from a file                                   list-multi
open connection     open a socket connection                             boolean
listen connection   wait for a connection request                        boolean
access print job    initiate a print job request                         boolean
...                 ...                                                  ...

This set of permissions is not exhaustive. The Linked Process specification has a collection of REQUIRED and RECOMMENDED permissions. Moreover, deployers may wish to extend the collection to support environment-specific conditions (e.g. database access). Finally, these permissions are made available through disco#info service discovery (XEP-0030).
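How a farm might consult such permissions before letting a job touch a file can be sketched as follows. Everything here is an assumption made for illustration: the field names, the default values, and the rule that a list-multi entry grants access to a path and everything beneath it are not taken from the Linked Process specification.

```python
from dataclasses import dataclass, field

@dataclass
class Permissions:
    job_timeout_ms: int = 10_000                    # "job timeout" (text-single)
    execute_program: bool = False                   # a boolean permission
    read_file: list = field(default_factory=list)   # list-multi: readable paths
    write_file: list = field(default_factory=list)  # list-multi: writable paths

def may_read(perms: Permissions, path: str) -> bool:
    """A path is readable only if it equals or sits under a listed entry."""
    return any(path == allowed or path.startswith(allowed.rstrip("/") + "/")
               for allowed in perms.read_file)

perms = Permissions(read_file=["/data/public"])
print(may_read(perms, "/data/public/graph.xml"))
print(may_read(perms, "/etc/passwd"))
```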


General-Purpose: Linked Process Hierarchy

[Figure: containment hierarchy of the Linked Process entities Cloud, Countryside, Villein, Farm, Registry, Virtual Machine, and Job.]

Linked Process entities contain/maintain/manage/etc. other Linked Process entities.


General-Purpose: Linked Process Entities

Cloud: A top-level construct which groups all farms, registries, and virtual machines to which a villein has access.

Countryside: Many entities can exist on a single countryside (a bare JID).

Farm: A farm is the gateway to the device’s resources and exists on a countryside. In general, there is one farm for each device. [SUPPORTS A CLOUD]

Virtual Machine: A virtual machine is spawned from a farm and is the primary engine of computation in a cloud. [SUPPORTS A CLOUD]

Villein: A villein is an application that leverages a cloud for computational resources (e.g. clock cycles, software, data sets, etc.). [LEVERAGES A CLOUD]

Registry: A registry is responsible for maintaining a roster of countrysides and publishing only those countrysides that have active farms on them (based on <presence/>).


General-Purpose: Linked Process Packets

• <spawn_vm/>: create a virtual machine of a particular species (i.e. language such as JavaScript, Ruby, Python, Groovy, etc.).

• <submit_job/>: execute the provided instructions/expressions.

• <ping_job/>: determine the status of a previously submitted job.

• <abort_job/>: cancel a previously submitted job.

• <manage_bindings/>: set or get virtual machine variables.

• <terminate_vm/>: destroy the virtual machine process.

NOTE: This presentation does not discuss interactions with registries, just farms and virtual machines.


General-Purpose: A Villein and Farm

[email protected]/1234 [email protected]/LoPFarm/60KES71Y

This is a screenshot from the LoPSideD GUI package. In practice, villeins and farms usually don’t have this GUI front-end. This package was developed to make it easier for developers to debug their Linked Process code. However, for farm providers, it’s a way to see villeins communicating with their farm and to inspect the flow of packets and the state of existing virtual machines.


General-Purpose: A Villein and Farm

[email protected]/1234 [email protected]/LoPFarm/60KES71Y

This villein and farm are on different physical devices. The villein is made aware of the farm because the villein is subscribed to the farm’s countryside. Thus, all <presence/> packets coming from the countryside are delivered to the villein. The subscriptions of the villein are available in its roster.


General-Purpose: Basic Communication Sequence

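[Figure: sequence diagram between the villein [email protected]/1234 and the farm [email protected], which hosts virtual machine f472fb16...; the two hosts are labeled 127.0.0.1 and 127.0.0.2. Three exchanges occur in order, each an iq of type get answered by an iq of type result: <spawn_vm/>, <submit_job/>, <terminate_vm/>.]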


General-Purpose: Spawning a Virtual Machine

• GET sent from a villein to farm...

<iq from="[email protected]/1234" to="[email protected]" type="get" id="xxxx">
  <spawn_vm xmlns="http://linkedprocess.org/2009/06/Farm#" vm_species="javascript" />
</iq>

• RESULT sent from the farm to the villein...

<iq from="[email protected]" to="[email protected]/1234" type="result" id="xxxx">
  <spawn_vm xmlns="http://linkedprocess.org/2009/06/Farm#" vm_id="f472fb16..." vm_species="javascript" />
</iq>
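A villein could assemble the GET packet and read the vm_id out of the RESULT packet along these lines. This is a sketch with the standard library only: the helper names, the JIDs, and the vm_id value "abc123" are hypothetical, and no XMPP connection is opened. The namespace is the one shown in the packets above.

```python
import xml.etree.ElementTree as ET

FARM_NS = "http://linkedprocess.org/2009/06/Farm#"  # namespace from the packets above

def spawn_vm_request(villein_jid, farm_jid, species, packet_id):
    """Build the <iq type="get"> stanza that asks a farm for a new VM."""
    iq = ET.Element("iq", {"from": villein_jid, "to": farm_jid,
                           "type": "get", "id": packet_id})
    ET.SubElement(iq, f"{{{FARM_NS}}}spawn_vm", {"vm_species": species})
    return iq

def vm_id_from_result(result_xml):
    """Extract vm_id from the farm's <iq type="result"> reply."""
    reply = ET.fromstring(result_xml)
    return reply.find(f"{{{FARM_NS}}}spawn_vm").get("vm_id")

request = spawn_vm_request("villein@example.org/1234", "farm@example.org",
                           "javascript", "xxxx")
reply = ('<iq from="farm@example.org" to="villein@example.org/1234" '
         'type="result" id="xxxx"><spawn_vm xmlns="%s" vm_id="abc123" '
         'vm_species="javascript" /></iq>' % FARM_NS)
print(ET.tostring(request, encoding="unicode"))
print(vm_id_from_result(reply))
```

The matching `id` on the request and the reply is what lets the villein pair an asynchronous result with the packet that caused it.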


General-Purpose: Spawning a Virtual Machine

[email protected]/1234 [email protected]/LoPFarm/60KES71Y

The use of disco#info (XEP-0030) allows a villein to discover what features and other information a farm supports. This is how the villein knows that the farm allows for the spawning of Python, JavaScript, Groovy, and Ruby virtual machines.


General-Purpose: Spawning a Virtual Machine

[email protected]/1234 [email protected]/LoPFarm/60KES71Y

The spawned virtual machine has an identifier that is unique to its parent farm. The virtual machine maintains a state that is altered through job submissions and binding updates. The virtual machine’s state is destroyed when the virtual machine is terminated.


General-Purpose: On the Nature of Virtual Machines

• Virtual machines are controlled by a farm; the farm serves as the “operating system” to control resource consumption and permissions of a virtual machine.

• Virtual machines maintain their state throughout their lifetime. In other words, in general, the order in which jobs are executed matters.

• Virtual machines are specific to a particular computer language and can be naturally thought of as an “XMPP-wrapped” runtime terminal (e.g. groovy> 1 + 2;).
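The idea that a virtual machine keeps its state between jobs, so that job order matters, can be illustrated with a toy interpreter session. This is a sketch only: `ToyVirtualMachine` is a hypothetical class, it uses Python's `eval`/`exec` with no sandboxing at all, and a real farm would enforce permissions and timeouts around every job.

```python
class ToyVirtualMachine:
    """Keeps one namespace alive across jobs, like a wrapped interpreter session."""

    def __init__(self, vm_id):
        self.vm_id = vm_id
        self.bindings = {}          # state shared by every job on this VM

    def submit_job(self, code):
        try:
            # Expressions return a value ...
            return eval(code, self.bindings)
        except SyntaxError:
            # ... statements only change the state.
            exec(code, self.bindings)
            return None

    def terminate(self):
        self.bindings.clear()       # terminating a VM destroys its state

vm = ToyVirtualMachine("ABCD")      # hypothetical vm_id
vm.submit_job("temp = 1 + 2")
vm.submit_job("temp = temp * 10")   # depends on the job before it
print(vm.submit_job("temp"))
```

Swapping the first two jobs would fail, because `temp` would not yet exist when the multiplication runs.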


General-Purpose: Submitting a Job

• GET sent from a villein to virtual machine (indirectly through the farm)...

<iq from="[email protected]/1234" to="[email protected]" type="get" id="xxxx">
  <submit_job xmlns="http://linkedprocess.org/2009/06/Farm#" vm_id="f472fb16...">
    var temp=0;
    for(i=0; i&lt;10; i++) {
      temp = temp + 1;
    }
    temp;
  </submit_job>
</iq>


• RESULT sent from the virtual machine to the villein (indirectly through the farm)...

<iq from="[email protected]" to="[email protected]/1234" type="result" id="xxxx">
  <submit_job xmlns="http://linkedprocess.org/2009/06/Farm#" vm_id="f472fb16...">
    10.0
  </submit_job>
</iq>


General-Purpose: On the Nature of Jobs

• Jobs are executed synchronously: they are processed in the order of a FIFO (first in, first out) queue.

• A job is a “chunk” of code in the language of the virtual machine. Jobs range from as simple as setting a variable (e.g. i = 1 + 2;) to as complex as a class definition or full program.

• Jobs can make use of the software packages (APIs) existing on the device. For example, Groovy code can import Java classes made available by the farm and instantiate them.

• If the expressions/code in a job violates the permissions of the virtual machine, that job is rejected with a permission denied error.
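The FIFO processing and the permission-denied rejection can be sketched together. The substring check below is a deliberately naive, hypothetical stand-in for a real security manager, and `run_jobs` is an illustrative name; the point is only the ordering and the fact that a rejected job does not stop the jobs behind it.

```python
from collections import deque

class PermissionDenied(Exception):
    pass

def run_jobs(jobs, forbidden=("open(", "import os")):
    """Run jobs one at a time in submission order; reject forbidden code."""
    pending = deque(jobs)           # FIFO: jobs leave in the order they arrived
    state, results = {}, []
    while pending:
        job_id, code = pending.popleft()
        try:
            if any(token in code for token in forbidden):
                raise PermissionDenied(job_id)
            exec(code, state)       # jobs share the virtual machine's state
            results.append((job_id, "ok"))
        except PermissionDenied:
            results.append((job_id, "permission denied"))
    return state, results

state, results = run_jobs([
    ("job-1", "i = 1 + 2"),
    ("job-2", "open('/etc/passwd')"),
    ("job-3", "i = i * 2"),
])
print(state["i"], results)
```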


General-Purpose: Terminating a Virtual Machine

• GET sent from a villein to a virtual machine (indirectly through the farm)...

<iq from="[email protected]/1234" to="[email protected]" type="get" id="xxxx">
  <terminate_vm xmlns="http://linkedprocess.org/2009/06/Farm#" vm_id="f472fb16..." />
</iq>

• RESULT sent from the virtual machine to the villein (indirectly through the farm)...

<iq from="[email protected]" to="[email protected]/1234" type="result" id="xxxx">
  <terminate_vm xmlns="http://linkedprocess.org/2009/06/Farm#" vm_id="f472fb16..." />
</iq>
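
The two stanzas above can be built and checked with a few lines of code. The following is a minimal sketch using only Python's standard library; the function names, the example.org JIDs, and the vm_id value are hypothetical placeholders chosen for illustration, and no XMPP connection is made. The namespace is the one shown on the slide.

```python
import xml.etree.ElementTree as ET

# Namespace exactly as it appears in the stanzas on the slide.
FARM_NS = "http://linkedprocess.org/2009/06/Farm#"


def build_terminate_vm(villein_jid, vm_jid, vm_id, stanza_id):
    """Serialize the GET stanza a villein sends to terminate a virtual machine."""
    iq = ET.Element("iq", {"from": villein_jid, "to": vm_jid,
                           "type": "get", "id": stanza_id})
    # xmlns is written as a plain attribute so the output mirrors the slide.
    ET.SubElement(iq, "terminate_vm", {"xmlns": FARM_NS, "vm_id": vm_id})
    return ET.tostring(iq, encoding="unicode")


def confirms_termination(response_xml, stanza_id, vm_id):
    """Return True if the response is a RESULT for the same request and VM."""
    iq = ET.fromstring(response_xml)
    child = iq.find("{%s}terminate_vm" % FARM_NS)
    return (iq.get("type") == "result"
            and iq.get("id") == stanza_id
            and child is not None
            and child.get("vm_id") == vm_id)


# Hypothetical JIDs and identifiers, for illustration only.
request = build_terminate_vm("villein@example.org/1234",
                             "farm@example.org/LoPFarm/vm1",
                             "example-vm-id", "req-1")
response = ('<iq from="farm@example.org/LoPFarm/vm1" '
            'to="villein@example.org/1234" type="result" id="req-1">'
            '<terminate_vm xmlns="%s" vm_id="example-vm-id"/></iq>' % FARM_NS)
terminated = confirms_termination(response, "req-1", "example-vm-id")
```

Matching on the stanza id as well as the vm_id ties the RESULT to the specific GET that requested the termination.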


General-Purpose: Terminating a Virtual Machine

[Figure: message exchange between [email protected]/1234 and [email protected]/LoPFarm/60KES71Y]

A terminated virtual machine releases all of its resources.

Outline

• An Introduction to Other Distributed Computing Protocols

  ⋆ General-Purpose Distributed Computing with MPI
  ⋆ Internet-Scale Distributed Computing with Web Services

• An Introduction to the Linked Process Protocol

  ⋆ Internet-Scale Distributed Computing with Linked Process
  ⋆ General-Purpose Distributed Computing with Linked Process

• An Introduction to the Linked Process Protocol Implementation

• Current and Future State of Linked Process

LoPSideD: A Java Implementation of the Linked Process Protocol

• LoPSideD Farm: A farm that currently supports JavaScript, Ruby, Python, and Groovy virtual machines.

• LoPSideD Registry: A registry for advertising and locating farms.

• LoPSideD Villein API (Application Programming Interface): Classes to build villeins that leverage a Linked Process cloud.

• LoPSideD Farm/Villein GUI (Graphical User Interface): A user interface for managing a farm, for communicating with a farm, and for general debugging (e.g. XMPP packet sniffing mechanisms).

LoPSideD Villein API

• Commands: Provides support to spawn/terminate virtual machines, submit/ping/abort jobs, and manage bindings.

• Proxies: Provides a collection of proxy data structures that make the underlying XMPP protocol relatively invisible to the developer.

• Patterns: Provides support for various distributed computing patterns such as asynchronous, synchronous, scatter/gather, etc.

• Demos: Provides a collection of simple demos such as distributed prime finding and distributed Web of Data analysis.
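
The scatter/gather pattern and the prime-finding demo mentioned above can be sketched locally as follows. This is not the LoPSideD Villein API: a thread pool stands in for a set of remote virtual machines, and primes_in and scatter_gather are hypothetical names chosen for illustration. The sketch assumes hi > lo and at least one worker.

```python
from concurrent.futures import ThreadPoolExecutor


def primes_in(lo, hi):
    """Job body: return the primes in the half-open interval [lo, hi)."""
    found = []
    for n in range(max(lo, 2), hi):
        # Trial division by every candidate divisor up to sqrt(n).
        if all(n % d for d in range(2, int(n ** 0.5) + 1)):
            found.append(n)
    return found


def scatter_gather(lo, hi, workers):
    """Split [lo, hi) into one chunk per worker, run the chunks, merge in order."""
    step = -(-(hi - lo) // workers)  # ceiling division
    chunks = [(start, min(start + step, hi)) for start in range(lo, hi, step)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # Scatter: each chunk plays the role of a job sent to a separate VM.
        parts = list(pool.map(lambda chunk: primes_in(*chunk), chunks))
    # Gather: concatenate the partial results in chunk order.
    return [p for part in parts for p in part]


primes = scatter_gather(0, 30, workers=4)
```

In a Linked Process setting, each chunk would presumably be submitted as a job to a virtual machine on some farm, with the villein gathering the results as they return.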


The Current State of Linked Process

• The Linked Process protocol specification is nearly complete for submission to the standards track of the XMPP Standards Foundation. This means that the protocol as presented is still in a relatively volatile state, and various mechanics of the protocol may change through this standards process.

• The LoPSideD implementation is nearly ready for a first version release.

• An experiment demonstrating the use of Linked Process for distributed computing on the Web of Data is currently being conducted.

The Future State of Linked Process

• Move the Linked Process specification through the standards process from experimental, to draft, and ultimately to standard status.

• Develop implementations of the Linked Process Villein API in other languages (currently there are plans for Ruby and Python ports).

• Add more virtual machine species to the LoPSideD Farm implementation: Scheme/Lisp, Tcl, PHP, Smalltalk, etc.

• Work with projects that are in need of the distributed computing solution offered by Linked Process.

• Work with more developers to expand the implementation base.

Acknowledgements

• Joshua Shinavier (Rensselaer Polytechnic Institute): co-designer of the protocol and co-developer of LoPSideD.

• Peter Neubauer (Neo Technology): evangelist and tester of the LoPSideD codebase.

• Mick Thompson (Santa Fe Complex): provided the machines for the deployment of the first Linked Process cloud.

• Jack Moffitt and Peter Saint-Andre (XMPP Standards Foundation): for support through the standards process.

Please visit Linked Process at http://linkedprocess.org.
