Perl Object Layer & Pipelines

Pipelines – Steve Fischer, John Iodice, Deborah Pinney, Mark Heiges, Ed Robinson

Perl Object Layer – Brian Brunk, Mark Gibson, Dave Barkan

Pipeline Introduction

• Sequential steps of:
– Plugin calls
– Script calls
– Cluster jobs

• Purpose:
– Codifies the process of creating the data set
– Reduces the human resources required
– Reduces human error and omissions

Two Pipeline Types

• Resources pipeline:
– Downloads resources from external sources
– Loads resources into the database
– Example: NRDB files

• Analysis pipeline:
– Extracts data from the database
– Runs analysis programs on the data, on the main server or on the cluster
– Puts value-added data back into the database

Resource Pipeline

• Invoked by:
– loadresources xmlfile propfile

• Take a tour of a resources XML file

Resources Repository

• Destination of downloads
• Houses files in a file system
• Serves as a cache for files
• Has an API to access files by name and version
• If you request an existing file by name and version, the repository returns it without downloading
– But the wget arguments must match (these are remembered by the repository)
• Particularly useful if multiple projects want to synchronize their data input
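The name-and-version caching rule can be sketched as a tiny in-memory model. This is hypothetical: `RepoCacheSketch`, `fetch`, and the `$download` callback are illustrative names, not the repository's real API, and a real repository keeps files on disk rather than in a hash.

```perl
use strict;
use warnings;

# Hypothetical in-memory model of the resources repository cache
# (not the real repository API).
package RepoCacheSketch;

sub new {
    my ($class) = @_;
    # entries: "name\0version" => { wget_args => ..., file => ... }
    return bless { entries => {}, downloads => 0 }, $class;
}

# Return the file for (name, version). Reuse the cached copy only if the
# wget arguments match the ones remembered for it; otherwise download again.
sub fetch {
    my ($self, $name, $version, $wgetArgs, $download) = @_;
    my $key   = join("\0", $name, $version);
    my $entry = $self->{entries}{$key};
    if ($entry && $entry->{wget_args} eq $wgetArgs) {
        return $entry->{file};
    }
    my $file = $download->($name, $version, $wgetArgs);
    $self->{downloads}++;
    $self->{entries}{$key} = { wget_args => $wgetArgs, file => $file };
    return $file;
}

package main;

my $repo = RepoCacheSketch->new;
my $fake = sub { my ($n, $v) = @_; return "/repo/$n/$v/data" };
$repo->fetch('NRDB', '2003-01', '-N', $fake);   # downloads
$repo->fetch('NRDB', '2003-01', '-N', $fake);   # served from cache
$repo->fetch('NRDB', '2003-01', '-r', $fake);   # args differ: downloads again
print "downloads: $repo->{downloads}\n";        # prints "downloads: 2"
```

Keying on the wget arguments as well as name and version is what lets two projects share one cached download only when they asked for it the same way.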

Analysis Pipeline

• Take a tour of the analysis pipeline file

• Take a tour of the Steps.pm file

• Take a tour of the property file

Pipeline Directory Structure

• The directory that houses all the information for the pipeline, including:
– Input data
– Logs
– Result data
– Pipeline control information:
• Which steps have been completed
• Property files to control the cluster

• Structured for easy comprehension
• Take a tour of the directory structure

Analysis Pipeline API

• GUS::Pipeline::Manager.pm
– Declares properties
– Prevents steps from being rerun
– Calls plugins
– Executes commands
– Eases communication with the cluster

• GUS::Pipeline::MakeTaskDirs.pm
– Helps make the directories expected by distribjob on the cluster

• GUS::Pipeline::TaskRunAndValidate.pm
– Helps run a series of tasks on the cluster
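A minimal sketch of how a manager can keep completed steps from rerunning, assuming completion is recorded as one signal file per step. The `runStep` name and the signal-file layout are hypothetical, not the `GUS::Pipeline::Manager` API.

```perl
use strict;
use warnings;
use File::Spec;
use File::Temp qw(tempdir);

# Hypothetical sketch: each completed step leaves a signal file, and a
# step whose signal file already exists is skipped.
sub runStep {
    my ($signalDir, $stepName, $action) = @_;
    my $signal = File::Spec->catfile($signalDir, $stepName);
    if (-e $signal) {
        print "skipping $stepName (already done)\n";
        return 0;
    }
    $action->();    # the plugin call or command; die() on failure
    open(my $fh, '>', $signal) or die "cannot write $signal: $!";
    close($fh);
    return 1;
}

my $signalDir = tempdir(CLEANUP => 1);
my $runs = 0;
runStep($signalDir, 'loadGenome', sub { $runs++ });
runStep($signalDir, 'loadGenome', sub { $runs++ });  # skipped on rerun
print "runs: $runs\n";    # prints "runs: 1"
```

Because the signal file is written only after the action returns, a step that dies leaves no signal and is retried on the next run.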

• Manages the distribution of tasks across a compute cluster

• Handles the case of a very large number of inputs that are processed independently and uniformly

• For example, blasting a set of ESTs against a genome

• Now available for clusters using the PBS cluster scheduler

• http://core.pcbi.upenn.edu/tools/liniactools.html
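The core idea of cutting many independent, uniform inputs into subtasks can be sketched as follows. `makeSubtasks` and the subtask size are hypothetical; the real tool also submits subtasks to the scheduler and validates their results, which is not modeled here.

```perl
use strict;
use warnings;

# Hypothetical sketch: a large set of independent inputs is cut into
# fixed-size subtasks that could each run on a separate node.
sub makeSubtasks {
    my ($inputs, $subtaskSize) = @_;
    die "subtaskSize must be positive" unless $subtaskSize > 0;
    my @subtasks;
    for (my $i = 0; $i < @$inputs; $i += $subtaskSize) {
        my $last = $i + $subtaskSize - 1;
        $last = $#$inputs if $last > $#$inputs;   # final subtask may be short
        push @subtasks, [ @{$inputs}[$i .. $last] ];
    }
    return \@subtasks;
}

my @ests = map { "EST$_" } 1 .. 10;
my $subtasks = makeSubtasks(\@ests, 4);
printf "subtask %d: %s\n", $_ + 1, join(',', @{ $subtasks->[$_] })
    for 0 .. $#$subtasks;
```

Since the inputs are processed independently, any subtask can be rerun on its own if a node fails.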

Perl Object Layer

http://www.cbil.upenn.edu/~brunkb/PERL_Objects.html

Perl Object Layer

• Simplifies database interaction

• Manages parent-child relationships

• Manages submits (inserts, updates and deletes)
– Submits children recursively
– Automatic versioning
– Sets default attributes (e.g. row_user_id)

• Enforces read/write permissions

• Code generator: generated objects stay consistent with the database

• Extracts metadata from the database

• Prints objects to XML and parses XML into objects

DbiDatabase Module

• Creates a login to the database

• Allows use of all database objects

• Has methods to get metadata
– e.g. getTable(tableName) returns a DbiTable for access to FK and PK attributes

• A DbiDatabase object is automatically instantiated by plugins

• DbiDatabase objects must be explicitly instantiated in scripts

Object Constructor

• TableName->new($hashRef)

Retrieving objects from DB

• retrieveFromDB(\@attributesToNotRetrieve)
– The attribute values already set on the object constrain the query

• Returns 1 if successful (exactly one matching row)

• Returns 0 if not successful (no rows or multiple rows)
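The return-value contract above can be sketched against an in-memory "table". This is a hypothetical model (`retrieveSketch` and the row data are illustrative, and \@attributesToNotRetrieve is not modeled): attributes already set act as constraints, and only a unique match succeeds.

```perl
use strict;
use warnings;

# Hypothetical in-memory "table" standing in for the database.
my @table = (
    { na_sequence_id => 1, source_id => 'AB000001', taxon_id => 9606 },
    { na_sequence_id => 2, source_id => 'AB000002', taxon_id => 9606 },
);

# Succeeds (returns 1) only when exactly one row matches every attribute
# already set on the object; that row then fills in the other attributes.
sub retrieveSketch {
    my ($object, $table) = @_;
    my @matches = grep {
        my $row = $_;
        !grep { !defined $row->{$_} || $row->{$_} ne $object->{$_} } keys %$object;
    } @$table;
    return 0 unless @matches == 1;              # no rows or multiple rows
    %$object = (%$object, %{ $matches[0] });
    return 1;
}

my %bySource = (source_id => 'AB000002');
print retrieveSketch(\%bySource, \@table), "\n";   # prints 1
my %byTaxon = (taxon_id => 9606);
print retrieveSketch(\%byTaxon, \@table), "\n";    # prints 0 (two rows match)
```

This is why a constraint on a non-unique attribute such as a taxon returns 0 even though rows exist.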

Getting and Setting Attributes

• Attributes can be set using the object's attribute-specific methods
– Preferred, for additional functionality
– e.g. setRowUserId($userId);

• Attributes can be set using the generic superclass method
– set('row_user_id', $userId);

• Get methods use similar syntax
– getRowUserId()
– get('row_user_id')
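To show how the constructor and the two accessor styles relate, here is a hypothetical stand-in class. `Sketch::Row` and its AUTOLOAD-based name mapping are illustrative only; real GUS classes are produced by the code generator and may implement accessors differently.

```perl
use strict;
use warnings;

# Hypothetical stand-in for a generated table class; it mimics only the
# calling conventions TableName->new($hashRef), set/get and setXxx/getXxx.
package Sketch::Row;
our $AUTOLOAD;

sub new {
    my ($class, $hashRef) = @_;
    my $self = bless { attributes => {} }, $class;
    $self->set($_, $hashRef->{$_}) for keys %{ $hashRef || {} };
    return $self;
}

# Generic superclass-style accessors
sub set { my ($self, $att, $value) = @_; $self->{attributes}{$att} = $value; return $value }
sub get { my ($self, $att) = @_; return $self->{attributes}{$att} }

sub DESTROY { }   # keep AUTOLOAD from intercepting object destruction

# Map setRowUserId(...) / getRowUserId() onto set/get('row_user_id')
sub AUTOLOAD {
    my $self = shift;
    (my $method = $AUTOLOAD) =~ s/.*:://;
    my ($verb, $camel) = $method =~ /^(get|set)([A-Z]\w*)$/
        or die "no method $method";
    (my $att = $camel) =~ s/([a-z0-9])([A-Z])/$1_$2/g;
    return $self->$verb(lc $att, @_);
}

package main;

my $row = Sketch::Row->new({ source_id => 'AB000001' });
$row->setRowUserId(7);                       # same as set('row_user_id', 7)
printf "%s\n", $row->get('row_user_id');     # prints 7
printf "%s\n", $row->getSourceId();          # prints AB000001
```

Both styles end up touching the same attribute; the attribute-specific form is where a generated class can add extra behavior.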

Managing submits to database

• submit($notDeep, $noTran)
– $notDeep = 1 submits only self, not children
– $noTran = 1 does not begin or commit a transaction

• addToSubmitList($object)
– The additional $object is submitted after the main object and its children are submitted
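The submit ordering can be sketched with a hypothetical class whose "writes" are just log entries. `SubmitSketch` and `_write` are illustrative names; whether extra objects on the submit list are themselves submitted deep is an assumption here.

```perl
use strict;
use warnings;

# Hypothetical sketch of submit() ordering; writing a row appends its name
# to @log, and BEGIN/COMMIT mark the transaction boundaries.
package SubmitSketch;
our @log;

sub new { my ($class, $name) = @_; bless { name => $name, children => [], submitList => [] }, $class }
sub addChild        { my ($self, $c) = @_; push @{ $self->{children} },   $c; return $c }
sub addToSubmitList { my ($self, $o) = @_; push @{ $self->{submitList} }, $o; return $o }

sub submit {
    my ($self, $notDeep, $noTran) = @_;
    push @log, 'BEGIN' unless $noTran;
    $self->_write($notDeep);
    $_->_write(0) for @{ $self->{submitList} };   # after self and children
    push @log, 'COMMIT' unless $noTran;
}

sub _write {
    my ($self, $notDeep) = @_;
    push @log, $self->{name};
    return if $notDeep;
    $_->_write(0) for @{ $self->{children} };     # children recursively
}

package main;

my $seq = SubmitSketch->new('NASequence');
$seq->addChild(SubmitSketch->new('NAFeature'));
$seq->addToSubmitList(SubmitSketch->new('Evidence'));
$seq->submit();
print join(' ', @SubmitSketch::log), "\n";
# prints "BEGIN NASequence NAFeature Evidence COMMIT"
```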

Managing Parents

• setParent($p)

• getParent($className, $retrieveIfNoParent, \@doNotRetrieveAttributes)

• retrieveParentFromDB($className, \@doNotRetrieveAttributes)

Managing Memory

• undefPointerCache()
– MUST be called in each loop to allow garbage collection
– Removes all child and parent pointers so they cannot be retrieved

• All other methods are automatic:
– addToPointerCache($ob)
– getFromPointerCache($object_reference)
– removeFromPointerCache($ob)
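Why the cache must be cleared in each loop can be sketched with a hypothetical global cache. `newCachedObject` and `undefPointerCacheSketch` are illustrative names: while an object is referenced from the cache it stays reachable, so Perl cannot free it until the cache is emptied.

```perl
use strict;
use warnings;

# Hypothetical global pointer cache; each new object registers itself.
my %pointerCache;

sub newCachedObject {
    my ($id) = @_;
    my $ob = { id => $id };
    $pointerCache{"$ob"} = $ob;   # keyed by stringified reference
    return $ob;
}

sub undefPointerCacheSketch { %pointerCache = () }

my $maxSeen = 0;
for my $id (1 .. 1000) {
    my $ob = newCachedObject($id);
    # ... build children, submit, etc. ...
    my $size = keys %pointerCache;
    $maxSeen = $size if $size > $maxSeen;
    undefPointerCacheSketch();    # call once per iteration
}
print "largest cache size: $maxSeen\n";   # prints 1
```

Without the per-iteration call, the cache would hold all 1000 objects by the end of the loop.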

Managing deletes

• Deletes occur in two steps

• markDeleted($doChildren)
– Marks self as deleted
– If $doChildren = 1, also marks children recursively

• The rows are deleted when submit() is called
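The two-step delete can be sketched with a hypothetical class. `DeleteSketch` is illustrative, and deleting children before their parent (so foreign-key constraints hold) is an assumed order, not one stated on the slide.

```perl
use strict;
use warnings;

# Hypothetical sketch: markDeleted() only flags objects; the deletes are
# issued on submit(), children first (an assumed order).
package DeleteSketch;

sub new { my ($class, $name, @kids) = @_; bless { name => $name, children => \@kids, deleted => 0 }, $class }

sub markDeleted {
    my ($self, $doChildren) = @_;
    $self->{deleted} = 1;
    if ($doChildren) { $_->markDeleted(1) for @{ $self->{children} } }
}

sub submit {
    my ($self, $log) = @_;
    $_->submit($log) for @{ $self->{children} };
    push @$log, "DELETE $self->{name}" if $self->{deleted};
}

package main;

my $gene = DeleteSketch->new('Gene', DeleteSketch->new('Exon1'), DeleteSketch->new('Exon2'));
$gene->markDeleted(1);     # step 1: flag self and children
my @log;
$gene->submit(\@log);      # step 2: deletes happen on submit
print "$_\n" for @log;     # DELETE Exon1, DELETE Exon2, DELETE Gene
```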

Managing Children

• getChildren($className, $retrieveIfNoChildren, $getDeletedToo, $where, \@doNotRetrieveAttributes)

• getAllChildren($retrieve, $getDeletedToo, $where)

• retrieveChildrenFromDB($className, $resetIfHave, $where, \@doNotRetrieveAttributes)

• retrieveAllChildrenFromDB($recursive, $resetIfHave)
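The effect of getChildren's $retrieveIfNoChildren flag can be sketched with a hypothetical class whose `$loader` callback stands in for the database query. `ChildSketch` is illustrative, and the other getChildren arguments are not modeled.

```perl
use strict;
use warnings;

# Hypothetical sketch: children in memory are returned as-is; if there are
# none, they are loaded only when $retrieveIfNoChildren is set.
package ChildSketch;

sub new { my ($class, %args) = @_; bless { children => {}, loader => $args{loader} }, $class }

sub getChildren {
    my ($self, $className, $retrieveIfNoChildren) = @_;
    my $kids = $self->{children}{$className} || [];
    if (!@$kids && $retrieveIfNoChildren) {
        $kids = $self->{children}{$className} = [ $self->{loader}->($className) ];
    }
    return @$kids;
}

package main;

my $queries = 0;
my $seq = ChildSketch->new(loader => sub { $queries++; return ('feature1', 'feature2') });
my @none  = $seq->getChildren('NAFeature');      # nothing in memory, no query
my @kids  = $seq->getChildren('NAFeature', 1);   # loads from the "database"
my @again = $seq->getChildren('NAFeature', 1);   # served from memory
print scalar(@kids), " children, $queries query\n";   # prints "2 children, 1 query"
```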

Methods for dealing with sequence

• getSequence()

• setSequence($sequence)
– Removes line breaks and non-sequence characters, then sets the sequence

• getFeatureSequence()
– Retrieves the substring of the sequence to which the feature points

• toFasta($type)
– If $type = 1, the ID used is the aa_sequence_id (or na_sequence_id); otherwise it is the source_id
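The three sequence helpers can be sketched as plain functions. The names are hypothetical, and so are these assumptions: sequence characters are letters plus '*' and '-', feature coordinates are 1-based and inclusive, and FASTA lines wrap at 60 columns.

```perl
use strict;
use warnings;

# Like setSequence: drop line breaks, digits, spaces and other
# non-sequence characters (assumed set: letters, '*', '-').
sub cleanSequence {
    my ($raw) = @_;
    (my $seq = $raw) =~ s/[^A-Za-z*\-]//g;
    return $seq;
}

# Like getFeatureSequence: substring for 1-based, inclusive coordinates.
sub featureSubsequence {
    my ($seq, $start, $end) = @_;
    return substr($seq, $start - 1, $end - $start + 1);
}

# Like toFasta: $type true uses na_sequence_id, otherwise source_id.
sub toFastaSketch {
    my ($seqRow, $type) = @_;
    my $id  = $type ? $seqRow->{na_sequence_id} : $seqRow->{source_id};
    my $wrapped = join("\n", $seqRow->{sequence} =~ /(.{1,60})/g);
    return ">$id\n$wrapped\n";
}

my $row = {
    na_sequence_id => 42,
    source_id      => 'AB000001',
    sequence       => cleanSequence("ACGT ACGT\n10 GGCC\r\n"),
};
print featureSubsequence($row->{sequence}, 3, 6), "\n";  # prints GTAC
print toFastaSketch($row, 1);   # header ">42"
print toFastaSketch($row, 0);   # header ">AB000001"
```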

Printing

• toString()

• toXML($indent, $suppressDef, $doXmlIds, $family)
– $suppressDef = 1: default attributes below modification_date are suppressed
– $doXmlIds = 1: XML IDs are printed in the object tags
– $family = 1: parent/child relationships are printed in the object tags rather than by nesting children

Checking read and write permissions

• checkReadPermission()

• checkWritePermission()
