Actor Model and Data Flow Programming Paradigm (using Ported Network Graphs) Brett Viren Physics Department DUNE DAQ DFWG – 21 Feb 2020


Actor Model and Data Flow Programming Paradigm

(using Ported Network Graphs)

Brett Viren
Physics Department

DUNE DAQ DFWG – 21 Feb 2020


Outline

Actor Model

Data Flow Programming Paradigm

Brett Viren (BNL) actor + graph 21 Feb 2020 2 / 22


Motivation

Our distributed DAQ software needs cohesive design patterns in order to keep software development, configuration, operation, etc manageable.

The DAQ workshop began the process of discussing some likely patterns and many of them are already in use in existing DUNE-related offline and online software.

This presentation covers two such patterns identified in our recent discussions.


Two Behavioral Structure Patterns

• Actor Model describes a set of independent code units that intercommunicate asynchronously.

• Data Flow Programming (DFP) Paradigm describes a graph with edges providing data transfer between nodes representing code units.
→ Will specialize “graph” into “ported graph” (PGraph) and then “ported network graph” (PNGraph).

It is natural, but not required, to implement DFP nodes as actors.


Actor Model

Actor Model (Hewitt, Bishop, Steiger, 1973)

[Diagram: the Application holds one “pipe” socket and the “actor” thread; it creates the «thread» Actor Function, which holds the other “pipe” socket, by calling actor_function(pipe, userdata).]

An actor is a function started in a thread communicating with its creator over a bidirectional pipe following a message passing protocol.

• Typically, the actor function is called with some user data to use for configuration/initialization.

• The pipe is an exclusive-pair of connected sockets, one end for the actor and one end for the application.

• After thread launch, the pipe is a tether for the actor protocol with the app. Typically, the actor protocol is very simple.

• The actor may also communicate with other actors or in general with the “outside world”.
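A minimal sketch of this definition, assuming only the Python standard library rather than ZeroMQ: socket.socketpair() stands in for the exclusive pair of connected sockets, and create_actor() and the one-request echo protocol are hypothetical names and choices made for illustration.

```python
import socket
import threading


def actor_function(pipe, userdata):
    """Runs in its own thread and talks to its creator only through `pipe`."""
    greeting = userdata["greeting"]        # user data configures the actor
    request = pipe.recv(1024)              # block until the application sends a message
    pipe.sendall(greeting.encode() + b" " + request)
    pipe.close()


def create_actor(function, userdata):
    """Make the connected socket pair and start the actor thread on one end."""
    app_end, actor_end = socket.socketpair()
    thread = threading.Thread(target=function, args=(actor_end, userdata))
    thread.start()
    return app_end, thread


pipe, thread = create_actor(actor_function, {"greeting": "hello"})
pipe.sendall(b"world")                     # message from the application to the actor
reply = pipe.recv(1024)                    # message from the actor back to the application
thread.join()
pipe.close()
```

The application never touches the actor's state directly; everything it learns arrives as a message on its end of the pipe.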


Actor Model

Simple Actor Protocol and Lifetime

[Sequence diagram: participants Application, Actor, World. Labels in order: create, ready, message, compute, go do other things, message, terminate, shutdown.]

• App and actor communicate over a pipe (pipe not explicitly drawn).

• The “ready” message lets the app delay until the actor has started; typically the actor notifies immediately.

• App then goes on to do other things.

• Actor enters its “main loop”: processes external input, computes, polls pipe for input from app.

• App sends “terminate” via pipe, actor performs any cleanup and the actor function exits.

• Actor may also shut down on its own but still waits for “terminate” prior to function exit.

• Some cases may need more complexity.
→ Actor protocol may be more complex.
→ App may have many actors.
→ Actors may have actors.
→ App may respond to actor termination.
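The lifetime above can be sketched with the standard library alone. This is an illustration under assumptions, not the CZMQ protocol: the byte strings b"ready" and b"terminate", the queue standing in for the “world”, and the 10 ms poll interval are all choices made here. The case where the actor shuts down on its own is left out.

```python
import queue
import select
import socket
import threading


def actor_function(pipe, world, results):
    pipe.sendall(b"ready")                       # notify the creator immediately
    while True:                                  # the actor's main loop
        # Poll the pipe briefly for input from the application.
        readable, _, _ = select.select([pipe], [], [], 0.01)
        if readable:
            message = pipe.recv(1024)
            if message in (b"terminate", b""):   # b"" means the app end was closed
                break
        # Process external input if any has arrived.
        try:
            item = world.get_nowait()
        except queue.Empty:
            continue
        results.append(item * item)              # "compute"
        world.task_done()
    pipe.close()                                 # cleanup, then the function exits


app_pipe, actor_pipe = socket.socketpair()
world = queue.Queue()
results = []
actor = threading.Thread(target=actor_function, args=(actor_pipe, world, results))
actor.start()                                    # create

ready = app_pipe.recv(1024)                      # app waits for "ready"
for value in (1, 2, 3):                          # app goes on to do other things
    world.put(value)
world.join()                                     # wait until the actor has consumed the input
app_pipe.sendall(b"terminate")
actor.join()
app_pipe.close()
```

The short select() timeout is what keeps the actor responsive to “terminate” while it is otherwise busy with external input.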


Actor Model

Synchronous API to Asynchronous Actor Protocol

A detail:
✓ Actors are nicely object oriented, simplifying the app by hiding behavior.
× But now the app must have actor protocol message handling code!

• Hide message handling behind a synchronous API.
◦ App calls method, API sends message, waits to recv() reply (if appropriate), interprets message, returns result to app, app continues.

• Fact of life: not all async can be totally hidden.
◦ Expose pipe to app for explicit polling.
◦ Provide a poll() type method which the app calls periodically. It returns either the value from the oldest message sitting in the recv() socket queue or a “false” result when the queue is empty.

[Sequence diagram: participants Application, API (implementation), pipe, Actor. Labels in order: construct, create, method call, msg, msg, method return.]


Data Flow Programming Paradigm

Data Flow Programming Paradigm

Structure the overall job as code units which receive and/or send data to other code units, forming a directed (and possibly cyclic) graph. “Program” by drawing lines (graph edges) between code units (graph nodes).

Simplistic view of DFP paradigm.


Data Flow Programming Paradigm

Ported Graph

Specialize from simple, directed graph to ported graph.

[Diagram: node A with output ports o1 and o2; nodes B1 and B2, each with ports “in” and “out”; node C with input ports i1 and i2.]

A port is an identified, edge-attachment point on a node. Depending on the DFP system policy, a port may:

• follow a specific protocol (flow-in, flow-out, query/response, etc).

• pass only specific data types or operate in a type-free manner.

• restrict edge multiplicity (allow zero, require exactly one, allow multiple).

Note: policy requires validation! Think about ways to perform this.
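One possible way to perform such validation, sketched under assumptions: the Port record, its direction and min_edges/max_edges fields, and the validate() function are hypothetical, and the wiring A:o1→B1:in, A:o2→B2:in, B1:out→C:i1, B2:out→C:i2 is inferred from the diagram rather than stated on the slide.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Port:
    node: str
    name: str
    direction: str          # protocol policy, reduced here to "out" or "in"
    min_edges: int = 1      # multiplicity policy: 0 would mean "allow zero"
    max_edges: int = 1      # None would mean "allow multiple"


def validate(ports, edges):
    """Return a list of policy violations; an empty list means the graph is valid."""
    by_key = {(p.node, p.name): p for p in ports}
    problems = []
    counts = Counter()
    for tail, head in edges:
        for key, expected in ((tail, "out"), (head, "in")):
            port = by_key.get(key)
            if port is None:
                problems.append(f"unknown port {key}")
            elif port.direction != expected:
                problems.append(f"{key} used as '{expected}' but is '{port.direction}'")
            counts[key] += 1
    for key, port in by_key.items():
        n = counts[key]
        if n < port.min_edges or (port.max_edges is not None and n > port.max_edges):
            problems.append(f"{key} has {n} edge(s)")
    return problems


ports = [
    Port("A", "o1", "out"), Port("A", "o2", "out"),
    Port("B1", "in", "in"), Port("B1", "out", "out"),
    Port("B2", "in", "in"), Port("B2", "out", "out"),
    Port("C", "i1", "in"), Port("C", "i2", "in"),
]
edges = [
    (("A", "o1"), ("B1", "in")),
    (("A", "o2"), ("B2", "in")),
    (("B1", "out"), ("C", "i1")),
    (("B2", "out"), ("C", "i2")),
]
ok = validate(ports, edges)             # complete graph: no violations
broken = validate(ports, edges[:-1])    # drop the last edge: two ports left unconnected
```

Data type policy is not modeled here; it could be added as another Port field checked per edge.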


Data Flow Programming Paradigm

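Ported Graph Abstraction

A powerful, practical feature of ported graphs

[Diagram: the subgraph of A (ports o1, o2), B1 and B2 (ports “in” and “out” each) is abstracted (−→) to a single node X exposing ports B1:out and B2:out.]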

X represents all of A and the input ports of B1 and B2.

• A ported subgraph can be abstracted by “removing” all fully-populated ports and presenting a new graph with fewer nodes and ports.

• Resulting subgraph is (apparently) much simpler.

• In practice: experts configure their subsystems in detail and provide abstracted subgraphs. These are connected to produce another graph which can be abstracted, etc, until the entire system is configured.

Note: this allows validation to follow a divide-and-conquer strategy.
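The abstraction step can be sketched as a small function. This assumes every port takes exactly one edge, so “fully populated” simply means “has an edge”; abstract() and the tuple layout are hypothetical, and the internal edges are inferred from the diagram.

```python
def abstract(name, ports, edges):
    """Collapse a subgraph into one node that exposes only its unconnected ports.

    `ports` lists the (node, port) pairs of the subgraph; `edges` lists its
    internal ((node, port), (node, port)) connections.
    """
    connected = {end for edge in edges for end in edge}
    exposed = [f"{node}:{port}" for node, port in ports if (node, port) not in connected]
    return {"node": name, "ports": exposed}


subgraph_ports = [
    ("A", "o1"), ("A", "o2"),
    ("B1", "in"), ("B1", "out"),
    ("B2", "in"), ("B2", "out"),
]
internal_edges = [
    (("A", "o1"), ("B1", "in")),
    (("A", "o2"), ("B2", "in")),
]
x = abstract("X", subgraph_ports, internal_edges)
# x == {"node": "X", "ports": ["B1:out", "B2:out"]}
```

Because the result has the same shape as any other ported node, it can be wired into a larger graph and abstracted again.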


Data Flow Programming Paradigm

Ported Network Graph

Networking adds some complexity.

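[Diagram: the same graph with address nodes placed on the edges: tcp://a.b.c.d:1234 and tcp://a.b.c.d:1235 next to A's ports o1 and o2, tcp://a.b.c.e:1236 and tcp://a.b.c.e:1237 next to C's ports i1 and i2. One marker (•) denotes bind() and a second marker, lost in this transcript, denotes connect().]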

• Networking requires a socket to bind() or connect() via an address.

• PNGraph edges must conceptually “pass through” this address.

• An address is a node, thus PNGraphs are bipartite:
◦ ported nodes: data transformation
◦ address nodes: data transportation.


Data Flow Programming Paradigm

Address Resolution

Network addressing makes PNGraphs more complex than PGraphs. Specifying explicit addresses is brittle (eg, collisions are possible).

[Diagram: the same graph with the explicit addresses replaced by address nodes named after the ports they serve: (A,o1), (A,o2), (C,i1) and (C,i2).]

Robust simplicity: discover addresses given node/port names.

• Every port known by (node, port) name 2-tuple.

• bind() needs no configuration, pick first unused TCP/IP port number.
◦ publish 3-tuple: (node, port, address)

• connect() configured with 2-tuple: (node, port) names
◦ Ports resolve node/port names to address via discovery mechanism.
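A rough sketch of this scheme with plain TCP sockets, under several assumptions: the in-process registry dict is a hypothetical stand-in for a real discovery mechanism, bind_port() and connect_port() are illustrative names, and binding to port 0 lets the operating system choose a free port, which is not literally “first unused”.

```python
import socket

registry = {}   # hypothetical stand-in for the discovery mechanism


def bind_port(node, port):
    """Bind with no configured address and publish the (node, port, address) 3-tuple."""
    server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    server.bind(("127.0.0.1", 0))          # port 0: the OS chooses an unused TCP port
    server.listen()
    host, number = server.getsockname()
    registry[(node, port)] = f"tcp://{host}:{number}"
    return server


def connect_port(node, port):
    """Resolve the (node, port) 2-tuple to an address, then connect to it."""
    address = registry[(node, port)]
    host, number = address[len("tcp://"):].rsplit(":", 1)
    return socket.create_connection((host, int(number)))


server = bind_port("A", "o1")              # binding side needs no address configuration
client = connect_port("A", "o1")           # connecting side is configured with names only
connection, _ = server.accept()
client.sendall(b"data")
received = connection.recv(1024)
for sock in (client, connection, server):
    sock.close()
```

Neither side ever sees a hard-coded port number, which is what removes the collision risk of explicit addresses.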


Data Flow Programming Paradigm

Node Resolution

Further abstraction and simplification: peers discover port addresses based on attribute matching rules instead of hard-wiring the (node,port) name 2-tuple in configuration.

• Node publishes arbitrary key/value attributes (“discovery headers”).
◦ A node may have a “type” or a “role” or a “class” or “category” or a “favorite color”....

• Peers discover node attributes and apply attribute matching rules.

• Matching port addresses also held in node’s “discovery headers”.

→ Peer, “I want to connect to all PUB ports of Hit Finders of APA 123”

→ Discovery mechanism, “you want this list of addresses: [...]”
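The matching step might look like the following sketch. The layout of the published headers, the attribute names “role” and “apa”, the node names and the addresses are all invented for illustration; a real system would receive these headers from the discovery mechanism.

```python
def resolve(published, port_type, **wanted):
    """Return addresses of all ports of `port_type` on nodes whose attributes match."""
    addresses = []
    for headers in published.values():
        attributes = headers["attributes"]
        if all(attributes.get(key) == value for key, value in wanted.items()):
            for port in headers["ports"]:
                if port["type"] == port_type:
                    addresses.append(port["address"])
    return addresses


# Hypothetical discovery headers, keyed by node name.
published = {
    "hf-123-a": {
        "attributes": {"role": "hit-finder", "apa": "123"},
        "ports": [
            {"name": "hits", "type": "PUB", "address": "tcp://10.0.0.1:5001"},
            {"name": "ctrl", "type": "REP", "address": "tcp://10.0.0.1:5002"},
        ],
    },
    "hf-123-b": {
        "attributes": {"role": "hit-finder", "apa": "123"},
        "ports": [{"name": "hits", "type": "PUB", "address": "tcp://10.0.0.2:5001"}],
    },
    "hf-456-a": {
        "attributes": {"role": "hit-finder", "apa": "456"},
        "ports": [{"name": "hits", "type": "PUB", "address": "tcp://10.0.0.3:5001"}],
    },
}

# "I want to connect to all PUB ports of Hit Finders of APA 123"
found = resolve(published, "PUB", role="hit-finder", apa="123")
# found == ["tcp://10.0.0.1:5001", "tcp://10.0.0.2:5001"]
```

Adding a third hit finder for APA 123 would change the result without touching the peer's configuration.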


Data Flow Programming Paradigm

Discovery Mechanisms

Two basic approaches:

• centralized service (eg, DNS): the service binds to a “well known address”; a peer must connect and CRUD records, and propagation may be required if the service is redundant (latency). A peer must poll/query the service to learn the records of other peers.

• distributed protocol (eg, Zyre): “network is the service”; a peer publishes to the network, peers get immediate updates (no poll/query), no single point of failure, fundamental redundancy, sub-second latency possible. Bonus: the network learns when peers appear, disappear or go quiet.

Note: Zyre is implemented as a ZeroMQ actor, so an application or individual node can use it with very little coding to worry about.


Data Flow Programming Paradigm

Modeling Our Graphs

Even with simplifying strategies, our graphs will still be complex.

We should develop a way to model our graphs independent from “merely” producing configuration files.

• Express model in some data language.
◦ (eg, PTMP and Wire-Cell Toolkit use Jsonnet)

• Maintain models in version control.
◦ (use a text-based modeling language)

• Develop parameterized models (eg, “use Napa per TC alg”).
◦ (eg, exploit Jsonnet’s functional programming)

• Produce visualization for debugging model and for operational displays.
◦ (Jsonnet to GraphViz dot conversion is easy and exists)

• Validate policy (eg, find unconnected ports).
◦ (eg, apply constraints with Jsonnet functions)

• Transform to application configuration files/objects.
◦ (eg, compile to JSON, load to DB, load Jsonnet directly, etc)
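To make the workflow concrete without Jsonnet, here is a sketch in Python of the same idea: a parameterized model function, a conversion to GraphViz dot text and a compile-to-JSON step. The node names, the "node:port" string convention and both functions are assumptions made for this illustration, not the PTMP or Wire-Cell tooling.

```python
import json


def model(n_finders):
    """Parameterized model: one source fans out to `n_finders` nodes, merged by one sink."""
    nodes = ["source"] + [f"finder{i}" for i in range(n_finders)] + ["sink"]
    edges = []
    for i in range(n_finders):
        edges.append({"tail": f"source:o{i}", "head": f"finder{i}:in"})
        edges.append({"tail": f"finder{i}:out", "head": f"sink:i{i}"})
    return {"nodes": nodes, "edges": edges}


def to_dot(graph):
    """Render the model as GraphViz dot text, labeling each edge end with its port."""
    lines = ["digraph model {"]
    for node in graph["nodes"]:
        lines.append(f'  "{node}";')
    for edge in graph["edges"]:
        tail_node, tail_port = edge["tail"].split(":")
        head_node, head_port = edge["head"].split(":")
        lines.append(
            f'  "{tail_node}" -> "{head_node}" '
            f'[taillabel="{tail_port}", headlabel="{head_port}"];'
        )
    lines.append("}")
    return "\n".join(lines)


graph = model(2)                              # change the parameter, regenerate everything
dot_text = to_dot(graph)                      # visualization for debugging and displays
config_text = json.dumps(graph, indent=2)     # configuration object for the applications
```

The single model is the source of truth; the picture and the configuration are both derived from it, so they cannot drift apart.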


Data Flow Programming Paradigm

ZIO: an implementation of Ported Network Graph

ZIO is a next generation PTMP and also applies to problems with highly parallel offline applications (Wire-Cell Toolkit).

It supports the Ported Network Graph pattern with three classes:

• Node: an identified, coherent set of ports; provides port creation, online/offline transitions.
• Port: a light wrapper around a ZeroMQ socket; bind()/connect(), online/offline.
• Peer: a simplifying wrapper around ZeroMQ’s Zyre mechanism; discovery header caching and matching.

Available in Python and C++ flavors. Both taste similar.
https://brettviren.github.io/zio


Data Flow Programming Paradigm

ZIO node/port/peer Example

Shown here in Python; the C++ is similar.

    import zmq
    import zio

    # One node binds a PUB port and goes online with a discovery header.
    node = zio.Node("nodename")
    log = node.port("logger", zmq.PUB)
    log.bind()
    node.online(favorite_color="purple")

    msg = zio.Message(...)
    log.send(msg)

    # Another node connects a SUB port using the (node, port) names.
    node = zio.Node("other")
    port = node.port("slurp", zmq.SUB)
    port.connect("nodename", "logger")
    node.online()

    logmsg = port.recv()

A zio.Peer is used inside zio.Node to resolve (node,port) to an address. Resolution can be extended to support the node resolution described above.


Last Slide

• Actor Model and the DFP paradigm in general, and the Ported Network Graph pattern in particular, are two behavioral structure patterns which are central to distributed systems. They will show up, either implicitly or explicitly.
• As concepts alone, they at least provide a language with which we can define and discuss our DAQ design.
• Implementations are available for adoption or inspiration (in Wire-Cell Toolkit, PTMP and ZIO, and of course elsewhere).
• Providing an implementation shared by all/most DAQ apps will assist in simplifying software development, opening it up to a larger pool of developers.

End note: additional patterns (eg, Interface, Factory, Plugin, (H)FSM, ...) may be worth future presentations.


FIN


ZeroMQ Actor Construction Examples

In C/C++ (CZMQ)

    void actor_func(zsock_t* pipe, void* args);
    struct UD {...} ud;
    zactor_t* actor = zactor_new(actor_func, (void*)&ud);
    zsock_signal(actor, 0); // terminate

In Python (PyZMQ/Pyre)

    def actor_func(ctx, pipe, arg1, arg2): pass
    actor = ZActor(ctx, actor_func, "name", 42)
    actor.pipe.signal() # terminate

The high-level ZeroMQ C++ interface package ZMQPP also provides direct support for actors.
The simpler CPPZMQ does not, but one can easily DIY with std::thread.


Application Construction Patterns

Plugin: dynamically loaded shared libraries provide classes or functions following some contracted interface.

Factory: construct and possibly later retrieve objects as an interface type based on an implementation type and possibly an instance name.

These two may interact:

→ App asks Factory for an unknown implementation type

→ Factory asks Plugin to load plugins until found.

These are very useful, if rather pedestrian. I assume we will have them in some shape. We can dive into details another day.
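As a taste of how the two might interact, here is a sketch in Python where importable modules play the role of shared-library plugins. Everything named here is hypothetical: the Factory class, the demo_plugin module (registered in memory so the example is self-contained) and the Upper implementation type.

```python
import importlib
import sys
import types


class Upper:
    """Example implementation type that a plugin provides."""
    def apply(self, text):
        return text.upper()


# Stand-in for a plugin on disk: an in-memory module under a made-up name.
plugin = types.ModuleType("demo_plugin")
plugin.Upper = Upper
sys.modules["demo_plugin"] = plugin


class Factory:
    def __init__(self, plugin_names):
        self.plugin_names = plugin_names
        self.instances = {}                 # (type name, instance name) -> object

    def find_type(self, type_name):
        """Load plugins one by one until one provides the requested type."""
        for module_name in self.plugin_names:
            module = importlib.import_module(module_name)
            if hasattr(module, type_name):
                return getattr(module, type_name)
        raise LookupError(f"no plugin provides {type_name}")

    def get(self, type_name, instance_name=""):
        """Construct on first request; return the same object on later requests."""
        key = (type_name, instance_name)
        if key not in self.instances:
            self.instances[key] = self.find_type(type_name)()
        return self.instances[key]


factory = Factory(["demo_plugin"])
first = factory.get("Upper", "first")       # constructed via the plugin
again = factory.get("Upper", "first")       # retrieved, not constructed again
shouted = first.apply("hit")
```

The application names only the implementation type and an instance name; where the code lives is the plugin layer's concern.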


Details on Synchronous API to Asynchronous Actor

«class» SomeActorAPI

    Thread _actor;

    SomeActorAPI(userdata);
    void terminate();
    Value query();
    Result check();

    Socket pipe();    (optional)

• Constructor creates actor thread and waits for “ready” message, then execution returns to app.

• Actor function is then asynchronously running.

• The terminate() method takes care of creating and sending the “terminate” message.

• A query method may directly wait for and return a reply value.

• Async results via check method: app calls periodically and uses the result only if valid; socket buffering soaks up delays.

• May provide app-side pipe socket so app may poll it with complex protocols.
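A compact sketch of such a class, under assumptions: two queue.Queue objects stand in for the two directions of the socket pipe, the (kind, value) message tuples are an invented protocol, and the demo actor emits one unsolicited “result” per query so that check() has something to return. The optional pipe() accessor is omitted.

```python
import queue
import threading


def actor_function(inbox, outbox, userdata):
    """Hypothetical actor: answers "query" and emits one async result per query."""
    outbox.put(("ready", None))
    served = 0
    while True:
        message = inbox.get()
        if message == "terminate":
            break                                       # cleanup would go here
        if message == "query":
            served += 1
            outbox.put(("result", f"served {served}"))  # asynchronous by-product
            outbox.put(("reply", userdata["answer"]))   # reply to the request


class SomeActorAPI:
    def __init__(self, userdata):
        self._inbox, self._outbox = queue.Queue(), queue.Queue()
        self._results = []            # async results set aside while awaiting a reply
        self._actor = threading.Thread(
            target=actor_function, args=(self._inbox, self._outbox, userdata))
        self._actor.start()
        kind, _ = self._outbox.get()  # block until the actor reports "ready"
        if kind != "ready":
            raise RuntimeError("actor failed to start")

    def query(self):
        """Send a request and wait for its reply, setting other messages aside."""
        self._inbox.put("query")
        while True:
            kind, value = self._outbox.get()
            if kind == "reply":
                return value
            self._results.append(value)

    def check(self):
        """Return the oldest async result, or None when nothing is waiting."""
        if self._results:
            return self._results.pop(0)
        try:
            _, value = self._outbox.get_nowait()
        except queue.Empty:
            return None
        return value

    def terminate(self):
        self._inbox.put("terminate")
        self._actor.join()


api = SomeActorAPI({"answer": 42})
answer = api.query()      # synchronous: returns the reply value
first = api.check()       # oldest async result
second = api.check()      # nothing left, so None
api.terminate()
```

The application sees only ordinary method calls; the message handling the slide warns about lives entirely inside the class.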
