Scientific Applications of The Data Distribution Service

Svetlana Shasharina#, Nanbor Wang, Rooparani Pundaleeka, James Matykiewicz and Steve Goldhaber
# [email protected]
http://www.txcorp.com

Description

This presentation from Tech-X shows a series of use cases highlighting the use of OpenSplice DDS in scientific applications.

Transcript of Scientific Applications of The Data Distribution Service

Page 1: Scientific Applications of The Data Distribution Service

Scientific Applications of Data Distribution Service

Svetlana Shasharina#, Nanbor Wang, Rooparani Pundaleeka, James Matykiewicz and

Steve Goldhaber# [email protected]

http://www.txcorp.com

Page 2: Scientific Applications of The Data Distribution Service

Outline

• Introduction
  – Tech-X Corporation
  – Some driving scientific applications
  – Common requirements

• QuIDS project
• Workflow management
• Security experiments
• Python DDS implementation

• Summary

Page 3: Scientific Applications of The Data Distribution Service

Tech-X Corporation
• Founded in 1994, located in Boulder, CO
• 65 people (mostly computational physics, applied math, applied computer science)
• Have been merging CS (C++, CORBA, Grid, MPI, GPU, complex visualization and data management) with physics, and are looking at DDS
• Funded by DOE, NASA, DOD and sales
• Applications in
  – Plasma modeling (accelerators, lasers, fusion devices, semiconductors) and beam physics
  – Nanotechnology
  – Data analysis

http://www.txcorp.com

Page 4: Scientific Applications of The Data Distribution Service

Large Synoptic Survey Telescope (LSST)

• Ground-based digital camera to be built in Chile, starting in 2020 (?). Funded by DOE, NASA, universities, the private sector
• Up to 2000 images/day + calibration data -> 30 TB/day
• Processed locally, then reprocessed and archived in Illinois (National Center for Supercomputing Applications)
• Uses OpenSplice for control software
• Can we help with data management: orchestration of steps and monitoring of data processing?

Page 5: Scientific Applications of The Data Distribution Service

NOvA
• NOvA: NuMI Off-axis νe (electron neutrino) Appearance experiment
• Will generate neutrino beams at FNAL and send them to a detector in Ash River, Minnesota (500 miles in < 2 ms)
• DOE funded (many labs and universities)
• RMS (Responsive Messaging System) is a DDS-based system to pass control and status messages in the NOvA data acquisition system (two topic types, but many actual topics to implement point-to-point communications)
• Will eventually need to go over a WAN, and provide 10 Hz status transmissions between ~100 applications
• Simplifies OpenSplice usage with traits (like simd-cxx) to minimize the number of data types, and maps topics to strings
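The traits/strings idea above — few data types, many logical topics distinguished by a string — can be sketched outside of DDS. This is a hypothetical pure-Python illustration of the pattern, not the actual RMS code:

```python
# Many logical topics multiplexed over one generic message type, keyed by a
# topic-name string (illustrative only; not the NOvA RMS implementation).
from collections import defaultdict

class StringKeyedBus:
    def __init__(self):
        self._subs = defaultdict(list)   # topic string -> list of callbacks

    def subscribe(self, topic_name, callback):
        self._subs[topic_name].append(callback)

    def publish(self, topic_name, payload):
        # a single generic type carries every message; the "topic" is the key
        for cb in self._subs[topic_name]:
            cb(payload)

bus = StringKeyedBus()
received = []
bus.subscribe("dcm01/status", received.append)
bus.publish("dcm01/status", {"rate_hz": 10})
bus.publish("dcm02/status", {"rate_hz": 9})   # no subscriber for this key
```

Keeping the key a string means adding a new point-to-point channel becomes a naming decision rather than a new IDL type, which matches the minimization goal on the slide.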

Page 6: Scientific Applications of The Data Distribution Service

SciDAC-II LQCD• LQCD: Lattice Quantum Chromodynamics

(computational version of QCD: a theory of stronginteraction involving quarks and gluons making uphadrons like protons and neutrons)

• DDS is used to perform monitoring of clusters doingLQCD calculations (detect job failures, evaluate nodesloads and performance, start/kill etc)

• Topics for monitoring and controls of jobs andresources

• Use OpenSplice

Page 7: Scientific Applications of The Data Distribution Service

Common themes for scientific apps and DDS

• RT issues are not well estimated
• Common usability needs
  – Support for scientific data formats and data products (into and out of topics): domain schemas and data transformation tools
  – Control and monitoring topics (can we come up with a reusable schema?)
  – Simple APIs corresponding to the expectations of scientists
  – Ease of modification (evolving systems, not just production systems)
  – QoS cookbook (how to get correct combinations)
  – General education
• Is DDS good for point-to-point (Bill talks only to Pete)?
• How does one use DDS without killing the system (memory etc.)?
• Other requirements
  – Site- and community-specific security
  – WAN operation (Chicago and Berkeley, for example)
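On the point-to-point question above ("Bill talks only to Pete"), one common answer keeps the topic one-to-many but filters each sample on a recipient field, in the spirit of a DDS content-filtered topic. A hedged sketch in plain Python (no OpenSplice API implied):

```python
# Emulating point-to-point delivery on a broadcast topic by filtering each
# sample on a "to" field, as a DDS content-filtered topic would.
class FilteredReader:
    def __init__(self, my_id):
        self.my_id = my_id
        self.inbox = []

    def on_sample(self, sample):
        # keep only samples addressed to this reader
        if sample["to"] == self.my_id:
            self.inbox.append(sample)

readers = [FilteredReader("Bill"), FilteredReader("Pete")]
sample = {"to": "Pete", "from": "Bill", "body": "hello"}
for r in readers:   # the topic itself still reaches every reader
    r.on_sample(sample)
```

With a real content-filtered topic the middleware (or even the writer side) can drop non-matching samples, so the network cost need not stay one-to-many.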

Page 8: Scientific Applications of The Data Distribution Service

Common extra expectations
• Automated test harness:
  – How does one test for correct behavior and QoS?
  – Avoid regression in a rapidly evolving system modified by a team
• Interacting with databases (all data should be archived and allow queries)
• Can we do everything using DDS, to minimize external dependencies?
  – Efficient bulk data transfer (usually point-to-point and BIG triangles :-)
  – Workflow engine (workflow: loosely coupled applications, often communicating through files, and can be distributed, while simulations are typically tightly coupled on an HPC resource)
• Interacting with Web interfaces and Web Services

Page 9: Scientific Applications of The Data Distribution Service

QuIDS: to address some issues
• QuIDS: Quality Information Distribution System
• Helping the applications above through a Phase II SBIR from DOE (HEP office)
• Collaboration of Tech-X and Fermilab
• Goals (we will talk about the ones in red in the rest of this talk):
  – Implement a DDS-based system to monitor distributed processing of astronomical data
    • Simplified C++ (done with simd-cxx?) and Python APIs
    • Support for FITS and monitoring and control data, images, histograms, spectra
    • Security
    • WAN
    • Testing harness
  – Investigate use of DDS for workflow management

Page 10: Scientific Applications of The Data Distribution Service

QuIDS, bird's-eye view (courtesy of FNAL)

Page 11: Scientific Applications of The Data Distribution Service

QuIDS at the FNAL computational domain

[Diagram: within the computational domain, a Campaign Manager, a Monitor, and the application(s) (a workflow of apps) communicate through DDS topics. MCTopic instances carry monitoring/control traffic and SciTopic instances carry scientific data; each participant attaches writers (W) and readers (R) to the topics it produces or consumes.]

Page 12: Scientific Applications of The Data Distribution Service

Generic workflows: do we need all?
• Workflow is something outside of HPC (loosely coupled and can tolerate even a WAN, while a simulation is something that goes to qsub…)
• Kepler (de-facto workflow engine expected for DOE applications):
  – Support for complex workflows
  – Java based
  – Heavy and hard to learn
  – Not portable to future platforms (DOE supercomputers might not have Java at all)
• Real workflows in astronomy are simple (they do not need the expressiveness of a full programming language or the pi-calculus)
  – Pipelines
  – DAGs
• How does one implement such workflows using DDS?
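Since the slide claims pipelines and DAGs are enough, the scheduling core for the DAG case reduces to a topological ordering. A minimal sketch using Kahn's algorithm (the task names are made up):

```python
# Ordering a simple DAG workflow with Kahn's algorithm: repeatedly run any
# task whose dependencies have all completed.
from collections import deque

def topo_order(deps):
    """deps: task -> set of tasks it depends on. Returns a runnable order."""
    indeg = {t: len(d) for t, d in deps.items()}
    children = {t: [] for t in deps}
    for t, d in deps.items():
        for p in d:
            children[p].append(t)
    ready = deque(t for t, n in indeg.items() if n == 0)
    order = []
    while ready:
        t = ready.popleft()
        order.append(t)
        for c in children[t]:
            indeg[c] -= 1
            if indeg[c] == 0:
                ready.append(c)
    if len(order) != len(deps):
        raise ValueError("cycle detected: not a DAG")
    return order

dag = {"split": set(), "calibrate": {"split"}, "stack": {"split"},
       "merge": {"calibrate", "stack"}}
order = topo_order(dag)
```

A DDS-based engine would publish a "ready" ticket for each task as its dependency count reaches zero, instead of appending to a local queue.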

Page 13: Scientific Applications of The Data Distribution Service

Parallel pipeline: most astronomy workflows

[Diagram: an Initialize step fans out into parallel task sequences Task(0) → Task(1) → … → Task(N-1), which merge into a Finalize step. Tasks can be continued by different worker processes, with data passed between them: e.g., Worker(1) performs Task(1) using data from Worker(0).]

Page 14: Scientific Applications of The Data Distribution Service

ddspipe: current implementation of workflow engine

• A parallel pipeline job consists of
  – An initialization phase (splitting data into manageable pieces) running on one node
  – Parallel worker processes doing data processing tasks (possibly not sharing the same address space)
  – A finalization step (merging data into a final image or movie)
• There is an expected order to the tasks, so tasks can be numbered and the output data of a previous step used as input to the next
• Design decisions for now:
  – Workers do not communicate with each other
  – Workers are given work by a mediating entity, the tuple space manager (no self-organization)
  – No scheduling for now (round-robin: tasks and workers are queued in the server)
  – Workers can get data coming from a task completed by a different worker (we do not address the problem of data transfer now)
• All communication is via DDS topics, while the data to work on is available to all workers through local files

Page 15: Scientific Applications of The Data Distribution Service

GDS = Tuple Space, but we want more (?)

• Tickets:
  – Task ticket (id, in-data, out-data, status)
  – Task status: standby, ready, running, finished
  – Worker ticket (id, task ticket, status)
  – Worker status: ready, busy
• A classic tuple space = a set of task tickets, and we could use only those, but… instead of dealing with a self-organized (wild) system, we would like to implement
  – Workflow: M sequences of tasks with matching in- and out-data
  – Scheduling (based on policies, resources, location of workers)
  – Fault tolerance (detecting and rescheduling incomplete tasks)
• Hence: we decided to have a class TupleSpaceServer to address these (currently just pipelines and queues, and no FT)
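The tickets above can be written down as plain data structures; a sketch in Python (class and field names are illustrative, not the actual QuIDS IDL types):

```python
# Sketch of the task and worker tickets described on this slide.
from dataclasses import dataclass
from enum import Enum
from typing import Optional

class TaskStatus(Enum):
    STANDBY = "standby"
    READY = "ready"
    RUNNING = "running"
    FINISHED = "finished"

class WorkerStatus(Enum):
    READY = "ready"
    BUSY = "busy"

@dataclass
class TaskTicket:
    task_id: int
    in_data: str                        # input data set for the task
    out_data: str                       # output data set it produces
    status: TaskStatus = TaskStatus.STANDBY

@dataclass
class WorkerTicket:
    worker_id: int
    task: Optional[TaskTicket] = None   # the task currently assigned
    status: WorkerStatus = WorkerStatus.READY

t = TaskTicket(0, "raw/chunk0", "calib/chunk0")
```

In a DDS system each ticket would be a keyed topic sample, so the tuple space is simply the current set of instances.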

Page 16: Scientific Applications of The Data Distribution Service

ddspipe classes:
• Initializer
  – Splits data, possibly creates workers and workspaces, and publishes all initial work tickets with the correct specification of the workflow
• TupleSpaceServer
  – Changes status in task tickets in accordance with the workflow order: once a worker reports that task n is done, the ticket for task n+1 with <n+1 in-data> = <n out-data> is changed to ready, and a worker topic is published (with the worker id next in the queue). Once a worker reports that it is doing this work, the status is changed to running, etc.
• Workers
  – Publish their status
  – Listen for task assignment (matching their id to the one in the worker ticket)
• Finalizer
  – Whatever is needed to finish up (merge data and clean)

Page 17: Scientific Applications of The Data Distribution Service

States of Tasks in Tuple Space

[State diagram: each task moves Standby → Ready → Running → Completed. Task 0 becomes ready on the initial (jobStatus, Run) message; task n becomes ready on (taskStatus, Task n-1, Completed); the Ready → Running transition is the tuple space's internal scheduling. Tasks within a sequence execute sequentially, while sequences proceed independently in parallel.]

Page 18: Scientific Applications of The Data Distribution Service

Tuple Space Manager Maintains Task Tickets and Schedules Tasks

[Diagram: the Tuple Space Manager maintains tickets (jobID, taskID, seqID, status). A JobInitializer publishes the initial tickets and a jobStatus of Run; workers publish workerStatus and receive workTickets, with idle workers held in a queue; when the jobStatus moves from Run to Completed, the tickets become disposable and a JobFinalizer wraps up the job.]

Page 19: Scientific Applications of The Data Distribution Service

Status and next steps of ddspipe (beyond what bash can do :-)

• Prototype working
  – Although we do have some issues with memory and bugs
  – I would like to experiment with no queuing: the next task is open for grabs once one of the tasks of the previous stage is finished
• Next steps
  – User definition of workflows
  – Multiple jobs
  – Separation of the worker manager from the task manager?
  – Implementing workers doing slit spectroscopy based on running R
  – DAG support
  – Some scheduling (balance between data transfer and mixing data between slow and fast workers?)
  – Data transfer implementation and in-memory data exchange

Page 20: Scientific Applications of The Data Distribution Service

Security for scientific projects: from nothing to everything

• The OpenSplice enterprise edition provides a Secure Networking Service:
  – Security parameters (i.e., data encryption and authentication) are defined in Security Profiles in the OpenSplice configuration file.
  – Node authentication and access control are also specified in the configuration profile.
  – Security Profiles are attached to partitions, which set the security parameters in use for that partition.
  – Secure networking is activated as a domain service
• Scientists are not used to paying
• They are used to:
  – Authenticating and authorizing on connection
  – Rules kept in the admin area of a virtual organization (which DDS should consult)

Page 21: Scientific Applications of The Data Distribution Service

Providing security in the community edition of OpenSplice

• Tried to replace the lower networking layer of OpenSplice with OpenSSL and see how one can provide authentication, authorization and encryption
• OpenSSL is an open source toolkit:
  – Secure Sockets Layer and Transport Layer Security (the newer standard, replacing SSL)
  – General purpose cryptography library
  – Public Key Infrastructure (PKI): e.g., certificate creation, signing and checking, CA management
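For a feel of the configuration involved, Python's stdlib ssl module (itself an OpenSSL wrapper) can express the mutually authenticated setup such a tunnel needs; the certificate paths below are placeholders, not real QuIDS files:

```python
# A TLS context with mandatory peer authentication: the kind of per-link
# configuration an OpenSSL-based secure tunnel requires.
import ssl

ctx = ssl.SSLContext(ssl.PROTOCOL_TLS_CLIENT)
ctx.minimum_version = ssl.TLSVersion.TLSv1_2
ctx.verify_mode = ssl.CERT_REQUIRED   # reject unauthenticated peers
ctx.check_hostname = True

# Site-specific trust anchors and this node's credentials would be loaded
# from the virtual organization's admin area (placeholder paths):
# ctx.load_verify_locations("/etc/quids/ca.pem")
# ctx.load_cert_chain("/etc/quids/node.pem", "/etc/quids/node.key")
```

Because the setup is per pair of nodes, it scales quadratically with the number of communicating sites.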

Page 22: Scientific Applications of The Data Distribution Service

Switching to OpenSSL in the networking layer of community OpenSplice

[Diagram, before: Applications on top of Data Centric Publish/Subscribe, Real Time Publish/Subscribe, and the networking services (RT Net, DDSi) over UDP/IP. After: the same stack with the UDP/IP layer replaced by OpenSSL beneath RT Net and DDSi.]

• The UDP/IP layer handles the interface to the operating system's socket interface
• Switching to OpenSSL allowed us to establish a secure tunnel between two sites
• But the configuration must be done for each pair of nodes!

Page 23: Scientific Applications of The Data Distribution Service

Future Directions in Security Work

• Waiting for new developments, and will be happy to use them to implement the security expected by DOE labs (our collaborators)
• Explore user-data etc. fields to address application specifics if this is not addressed by the security profile

Page 24: Scientific Applications of The Data Distribution Service

PyDDS: Python bindings for DDS communications

• Started with SWIG for wrapping generated bindings: works fine, but needed manual wrapping of multiple generated classes
• Next worked with Boost.Python for wrapping the communication layer (the set of classes that the IDLPP-generated code uses to call into OpenSplice for communication), so that there is no need to wrap the generated bindings
• Problem: need to take care of inheritance manually and deal with several handles that are unexposed C structs used in forward declarations

Page 25: Scientific Applications of The Data Distribution Service

Status and next steps for PyDDS

• Hand-wrapping of C++ bindings using SWIG works:

    #!/usr/bin/python
    import time
    import TxddsPy

    qos = TxddsPy.Qos()
    qos.setTopicReliable()
    qos.setWriterReliable()

    writer = TxddsPy.RawImageWriter("rawImage")
    writer.setTopicQos(qos)
    writer.setWriterQos(qos)

    data = TxddsPy.rawImage()
    writer.writeData(data)

• Next:
  – Investigate wrapping the communication classes so that we expose the minimum for Boost.Python
  – Or develop tools for generating the glue code needed by Boost.Python, using a string list of topics

Page 26: Scientific Applications of The Data Distribution Service

QuIDS summary and future directions

• We have prototyped DDS-based solutions for astronomy data processing applications
  – Tools for bringing FITS data into DDS
  – Simple QoS scenarios (getting pre-published data)
  – Parallel pipeline workflows
  – Security studies
  – Python APIs
• Possible next steps
  – A language to describe workflows for user input into the system
  – More complex workflows and concrete implementations
  – Implementing FNAL security requirements
  – WAN communication between FNAL and LBNL going through firewalls
  – Streamlining glue code generation for Python
  – Testing harness
  – Bulk data transfers
  – Archiving data into databases
  – Web interfaces

Page 27: Scientific Applications of The Data Distribution Service

Acknowledgements

• Ground Data System (GDS) team from Fermi National Accelerator Laboratory
• PrismTech
• Nikolay Malitsky (BNL)
• OpenSplice mailing list and its contributors
• US Department of Energy, Office of High Energy Physics