Tigres: Template Interfaces for Agile Parallel Data-Intensive Science

Lavanya Ramakrishnan

LRamakrishnan@lbl.gov

http://tigres.lbl.gov

(CS Biased) View of Workflow Challenges: Gene2Life Molecular Biology Analysis

•  Mostly simple sequential workflow
•  Repetitive
•  Tracking
–  Provenance, metadata, etc.
•  “Iterative”
–  Swap programs and data sets
•  Desktop to HPC/

[Figure: Gene2Life workflow. Input: nucleotide or amino acid. Stages: search, alignment, analysis, visualization. Programs: blast, clustalw, dnapars, protpars, drawgram. Tree files (.ps and .pdf files).]

(CS Biased) View of Workflow Challenges: MotifNetwork

•  Mid-sized “compute-intensive” workflow
•  Mix of single processor and multiprocessor tasks
•  Intermediate data formatting/logic
•  Move
•  Share

[Figure: MotifNetwork workflow. Steps: Pre Interproscan, Interproscan (multiple instances), Post Interproscan; annotation: 256 processors. Stages: data preparation, analysis, aggregation, processing.]

People use ad-hoc scripts, keep notes in text files and encode metadata in file names

Source: Jeff Tilson

Big Data is here …

•  Larger volumes of data
•  More dynamic content
•  Significant variety in data types
•  Large amounts of unstructured data
•  Increased rate of data arrival
•  Need for faster data processing rates
•  …

… It is not getting easier

MapReduce and Hadoop Ecosystem

•  Computation performed on large volumes of data in parallel
•  Provides scaling, data locality, fault tolerance
•  Higher-level tools have evolved for specific data analysis
•  There are challenges in using MapReduce/Hadoop for scientific workflows
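To make the MapReduce model concrete, here is a minimal single-process sketch in Python of the map, shuffle, and reduce phases applied to a word count. The function names (`map_phase`, `shuffle`, `reduce_phase`, `word_count`) and the sample input are illustrative assumptions. This is not Hadoop code; a real framework would distribute each phase across machines.

```python
from collections import defaultdict


def map_phase(record):
    """Emit (key, value) pairs for one input record: here, (word, 1)."""
    return [(word.lower(), 1) for word in record.split()]


def shuffle(pairs):
    """Group values by key, as a framework does between map and reduce."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Combine all values for one key: here, sum the counts."""
    return key, sum(values)


def word_count(records):
    # Each record is mapped independently of the others, which is what
    # lets a real framework run the map phase in parallel.
    mapped = [pair for record in records for pair in map_phase(record)]
    grouped = shuffle(mapped)
    return dict(reduce_phase(k, v) for k, v in grouped.items())


if __name__ == "__main__":
    print(word_count(["big data is here", "data is big"]))
```

The sketch shows why the model scales: the map step has no shared state, and the reduce step only needs the values for a single key.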

Tigres: Design templates for common scientific workflow patterns

[Figure labels: Base Tigres templates; "LightSrc" Domain templates; Application "LightSrc-1"; Application "LightSrc-2"; Create and Debug; Scale up.]

Implement templates as a library in an existing language

Tigres Templates

[Figure: template diagrams. Sequence: Task1 ... Taskn. Parallel: Task1 ... Taskn.]

Key Aspects of Tigres

•  Targeted for large-scale data-intensive workflows
–  Motivated by “MapReduce” model
–  No centralized managed model
•  Library model embedded in existing languages such as Python and C
–  “Extend current scripting/programming tools”
–  API-based, embedded in code
•  Light-weight execution framework
–  “As easy to run as an MPI program on an HPC resource”
–  No persistent services
•  Scientist-Centered Design Process
–  Get feedback from user before writing all the code

Tigres Design Process

[Figure labels: Design; API; Implementation; Execution Environment; Optimizations; Scientist-Centered Design Process.]

Create a workflow

1.  Define input types
2.  Define task
3.  Assign input values
4.  Repeat 1-3 above for other tasks
5.  Create appropriate input arrays
6.  Create appropriate task arrays
7.  Create (and run) the template

[Figure: example workflow with tasks Task40 ... Task45 and Task50 ... Task55. Task inputs are typed, e.g. input1_task1 {Type: Object_a}, input2_task1 {Type: int}.]
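The seven steps for creating a workflow can be walked through in plain Python. The classes below (`InputTypes`, `InputValues`, `Task`) and the `sequence` function are hypothetical stand-ins that borrow names from the slides; they are not the Tigres library. The rule that a task with no assigned values receives the previous task's output is an assumption made only for this illustration.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class InputTypes:          # step 1: declare what a task accepts
    name: str
    types: List[type]


@dataclass
class Task:                # step 2: bind a name and types to an implementation
    name: str
    impl: Callable
    input_types: InputTypes


@dataclass
class InputValues:         # step 3: concrete values for one task
    name: str
    values: list


def sequence(name, task_array, input_array):
    """Hypothetical stand-in for a Sequence template: run tasks in order.

    Assumption: a task whose InputValues is empty receives the previous
    task's output as its single argument.
    """
    output = None
    for task, inputs in zip(task_array, input_array):
        args = inputs.values if inputs.values else [output]
        output = task.impl(*args)
    return output


# Steps 1-3 for the first task.
types_1 = InputTypes("Types1", [int, int])
task_1 = Task("add", lambda a, b: a + b, types_1)
values_1 = InputValues("Values1", [2, 3])

# Step 4: repeat for a second task, which consumes the first task's output.
types_2 = InputTypes("Types2", [int])
task_2 = Task("square", lambda x: x * x, types_2)
values_2 = InputValues("Values2", [])

# Steps 5-6: build the input array and the task array.
input_array_12 = [values_1, values_2]
task_array_12 = [task_1, task_2]

# Step 7: create and run the template.
result = sequence("my seq", task_array_12, input_array_12)
print(result)
```

Keeping tasks and their inputs in parallel arrays, as the steps describe, means the same pair of arrays could be handed to a different template without redefining the tasks.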

Templates

•  Sequence ( name, task_array, input_array )
–  e.g., output[ ] = Sequence(“my seq”, task_array_12, input_array_12)
•  Parallel ( name, task_array, input_array )
–  e.g., output[ ] = Parallel(“abc”, task_array_12, input_array_12)
•  Split ( name, split_task, split_input_values, task_array, task_array_in )
–  e.g., Split(“split”, task_x1, input_value_1, spl_t_arr, spl_i_arr)
•  Merge ( name, task_array, input_array, merge_task, merge_input_values )
–  e.g., Merge(“merge”, syn_t_arr, syn_i_arr, task_x1, input_value_1)
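One way to read the four templates is as small combinators over ordinary functions. The sketch below is an illustration under stated assumptions, not the Tigres implementation: the lowercase helpers `sequence`, `parallel`, `split`, and `merge` are hypothetical, they take plain callables and argument tuples instead of the task and input arrays on the slide, and they use Python's standard `ThreadPoolExecutor` for concurrency.

```python
from concurrent.futures import ThreadPoolExecutor


def sequence(tasks, first_args):
    """Run tasks one after another, feeding each output to the next task."""
    result = tasks[0](*first_args)
    for task in tasks[1:]:
        result = task(result)
    return result


def parallel(tasks, args_list):
    """Run independent tasks concurrently; results keep the task order."""
    with ThreadPoolExecutor() as pool:
        futures = [pool.submit(t, *a) for t, a in zip(tasks, args_list)]
        return [f.result() for f in futures]


def split(split_task, split_args, tasks):
    """Run one task, then fan its output out to a set of parallel tasks."""
    seed = split_task(*split_args)
    return parallel(tasks, [(seed,)] * len(tasks))


def merge(tasks, args_list, merge_task):
    """Run a set of parallel tasks, then combine their outputs in one task."""
    return merge_task(parallel(tasks, args_list))


if __name__ == "__main__":
    double = lambda x: 2 * x
    inc = lambda x: x + 1
    print(sequence([double, inc], (5,)))
    print(parallel([double, inc], [(5,), (5,)]))
    print(split(lambda: 10, (), [double, inc]))
    print(merge([double, inc], [(1,), (2,)], sum))
```

In this reading, Split and Merge are compositions of a single task with a Parallel block, which is why their signatures on the slide carry both a task array and one extra task.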

Scientist-Centered Design Process

•  Use Google Docs for interactive step-by-step exercises with a “facilitator” and a “human compiler”
–  white/black board didn’t work
•  Preparation
–  terminology, basic template, example, exercise
–  15 minutes preparation time
•  Testers (~6)
–  Developers, web design/UI staff, application scientists

Impact of Scientist-Centered Design

•  Concept understanding by user
•  Changes to nomenclature
•  Support in C also important
•  Priorities for first prototype:
–  Desktop to NERSC
–  Monitoring
–  Intermediate state management

What did we learn?

•  Documentation clarity was key
–  Majority of our participants “coded by example”
•  Nomenclature was important
–  Confusion with initial terminology
•  Keep API simple
–  Dependencies/output: two different styles within and outside a template can be confusing
•  Support extended API
–  Optional parameters, different programming styles
•  Execution semantics were important
–  Monitoring, logging

It took days for first stub implementation rather than weeks (or months)!

Summary

•  “Scientist-friendly” programming API to manage workflows

•  Plan to test API with different user groups

Tigres Team

•  Core team
–  Deb Agarwal (PI), Lavanya Ramakrishnan, Daniel Gunter
–  Gilberto Pastorello, Valerie Hendrix, Ryan Rodriguez
•  CS Research groups
–  Shane Canon
–  John Shalf
•  Science research groups
–  Cosmology - Alex Kim, Rollin Thomas, Stephen Bailey
–  Gamma Ray - Dan Chivers
–  Advanced Light Source - Dula Parkinson
–  HEP - Paolo Calafiura
–  Materials - Kristin Persson

Website: http://tigres.lbl.gov
Contact: LRamakrishnan@lbl.gov

Monitoring

•  Initialize
–  init (tigres-destination, user-destination)
•  User Logging
–  setLevel(level): enumeration from FATAL up to TRACE
–  write(level, name, key-value pairs)
•  Query
–  getStatus(type, names)
–  getInfo(name, key-value-pairs)
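The logging and query calls can be pictured with a small in-memory sketch. Everything here is a hypothetical stand-in, not the Tigres monitoring API: the `Monitor` class, the snake_case method names, and the levels between FATAL and TRACE are assumptions (the slide names only the two endpoints), and the log destinations passed to `init` are left out so that records simply stay in a list.

```python
from enum import IntEnum


class Level(IntEnum):
    # The slide names only FATAL and TRACE; the levels in between are assumed.
    FATAL = 0
    ERROR = 1
    WARN = 2
    INFO = 3
    DEBUG = 4
    TRACE = 5


class Monitor:
    """Hypothetical in-memory stand-in for the monitoring calls on the slide."""

    def __init__(self):
        self.level = Level.INFO
        self.records = []

    def set_level(self, level):
        self.level = level

    def write(self, level, name, **pairs):
        # Keep only records at or above the configured severity.
        if level <= self.level:
            self.records.append({"level": level.name, "name": name, **pairs})

    def get_info(self, name, **pairs):
        """Return records for `name` whose fields match all given pairs."""
        return [r for r in self.records
                if r["name"] == name
                and all(r.get(k) == v for k, v in pairs.items())]


monitor = Monitor()
monitor.set_level(Level.DEBUG)
monitor.write(Level.INFO, "task_f1", state="RUN", attempt=1)
monitor.write(Level.TRACE, "task_f1", state="RUN", detail="dropped")
monitor.write(Level.INFO, "task_f1", state="DONE", attempt=1)
print(monitor.get_info("task_f1", state="DONE"))
```

Storing each record as key-value pairs is what makes the query side simple: a lookup is a filter over fields, with no log-line parsing.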

Input and Task

•  InputTypes ( name, types[ ] )
–  e.g., input_type1 = InputTypes(“Types1”, {“int”, “string”})
•  InputValues ( name, values[ ] )
–  e.g., input_value1 = InputValues(“Values1”, {1, “hello”})
•  InputArray ( name, input_values[ ] )
–  e.g., input_array_12 = InputArray(“Array12”, {input_value_1, input_value_2})
•  Task ( name, type, impl_name, input_types, env )
–  e.g., task_f1 = Task(“A”, FUNCTION, “myfunc”, input_type1)
•  TaskArray ( name, task[ ] )
–  e.g., task_array_xy = TaskArray(“xy”, {task_f1, task_x1})
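Declaring input types separately from input values makes it possible to check a task's arguments before it runs. The sketch below illustrates that idea with hypothetical dataclasses named after the slide entries; it is not the Tigres library. Two simplifications are assumptions: the task holds the Python callable directly where the slide passes the string “myfunc”, and `FUNCTION` is modelled as a plain string constant.

```python
from dataclasses import dataclass
from typing import Callable, List

FUNCTION = "FUNCTION"  # task kind named on the slide; the string value is assumed


@dataclass
class InputTypes:
    name: str
    types: List[type]


@dataclass
class InputValues:
    name: str
    values: list


@dataclass
class Task:
    name: str
    task_type: str
    impl: Callable
    input_types: InputTypes

    def run(self, input_values):
        """Validate values against the declared types, then call the function."""
        expected = self.input_types.types
        if len(input_values.values) != len(expected):
            raise TypeError(f"{self.name}: expected {len(expected)} inputs")
        for value, typ in zip(input_values.values, expected):
            if not isinstance(value, typ):
                raise TypeError(f"{self.name}: {value!r} is not {typ.__name__}")
        return self.impl(*input_values.values)


def myfunc(count, text):
    return text * count


input_type1 = InputTypes("Types1", [int, str])
input_value1 = InputValues("Values1", [2, "hello"])
task_f1 = Task("A", FUNCTION, myfunc, input_type1)
print(task_f1.run(input_value1))
```

A mismatch, such as passing a string where an `int` is declared, raises `TypeError` before `myfunc` is called.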
