GUS Plugin System

GUS Plugin System Michael Saffitz Genomics Unified Schema Workshop July 6-8th, Philadelphia, Pennsylvania



Page 1: GUS Plugin System

GUS Plugin System

Michael Saffitz

Genomics Unified Schema Workshop

July 6-8th, Philadelphia, Pennsylvania

Page 2: GUS Plugin System

Plugin Overview

Small Perl programs that load and manipulate data within GUS

Written using the GUS Plugin API and Perl Object Layer

Provide automatic support for: data provenance, object layer and database connectivity, standardized documentation, command line argument processing, logging, and error handling

“Supported” and “Community” Plugins provided with GUS

Page 3: GUS Plugin System

Supported Plugins

Have been tested in Oracle and Postgres and are confirmed to work

Portable

Useful beyond the site that developed them

Meet the GUS Plugin Standard

Page 4: GUS Plugin System

Community Plugins

Fail to meet one or more of the criteria above

Have not been tested

Provided as a general resource to the community

Page 5: GUS Plugin System

Plugin Life Cycle

Plugin Initialization: documentation, command line arguments

Data Loading: reading, parsing, querying

Data Manipulation: insert or update? Restart logic

Data Submission

Page 6: GUS Plugin System

GUS Supported Plugins

InsertArrayDesignControl.pm
InsertAssayControl.pm
InsertBlastSimilarities.pm
InsertExternalDatabase.pm
InsertExternalDatabaseRls.pm
InsertGOEvidenceCode.pm
InsertGeneOntology.pm
InsertGeneOntologyAssoc.pm
InsertRadAnalysis.pm
InsertReviewStatus.pm
InsertSecondaryStructure.pm
InsertSequenceOntology.pm
LoadArrayDesign.pm
LoadArrayResults.pm
LoadFastaSequences.pm
LoadGusXml.pm
LoadNRDB.pm
LoadRow.pm
LoadTaxon.pm

Page 7: GUS Plugin System

Plugin Shell

package GUS::Supported::Plugin::LoadRow;

@ISA = qw(GUS::PluginMgr::Plugin);

use strict;
use GUS::PluginMgr::Plugin;

sub new { … }

sub run { … }

Page 8: GUS Plugin System

Plugin Initialization

sub new {
  my ($class) = @_;
  my $self = {};
  bless($self, $class);

  $self->initialize({ requiredDbVersion => 3.5,
                      cvsRevision => '$Revision: 2934 $',
                      name => ref($self),
                      argsDeclaration => $argsDeclaration,
                      documentation => $documentation });

  return $self;
}
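
The shell and constructor above can be tried out without a GUS installation. The following is a minimal sketch under stated assumptions: Sketch::PluginBase is a stand-in written for this example, not GUS::PluginMgr::Plugin, and the HelloRows plugin and its data are hypothetical.

```perl
use strict;
use warnings;

# Stand-in for GUS::PluginMgr::Plugin so this sketch runs without GUS installed.
# The real base class does far more (provenance, argument parsing, logging).
package Sketch::PluginBase;

sub initialize {
    my ($self, $config) = @_;
    # Copy the configuration hash onto the plugin object.
    $self->{$_} = $config->{$_} for keys %$config;
    return $self;
}

# Hypothetical example plugin, modeled on the shell shown on the slides.
package Sketch::Plugin::HelloRows;
our @ISA = ('Sketch::PluginBase');

my $argsDeclaration = [];   # no arguments in this sketch
my $documentation   = { purposeBrief => 'Count rows in a list (illustration only)' };

sub new {
    my ($class) = @_;
    my $self = {};
    bless($self, $class);

    $self->initialize({ requiredDbVersion => 3.5,
                        name              => ref($self),
                        argsDeclaration   => $argsDeclaration,
                        documentation     => $documentation });
    return $self;
}

sub run {
    my ($self) = @_;
    my @rows = ('a', 'b', 'c');   # placeholder for data that would be loaded
    return 'Inserted ' . scalar(@rows) . ' rows';
}

package main;

my $plugin = Sketch::Plugin::HelloRows->new();
my $result = $plugin->run();
print "$plugin->{name}: $result\n";
```

The point of the sketch is the division of labor: new() only declares what the plugin is, and run() does the work.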

Page 9: GUS Plugin System

Declaring Arguments

stringArg({name => 'externalDatabaseVersion',
           descr => 'sres.externaldatabaserelease.version for this instance of NRDB',
           constraintFunc => undef,
           reqd => 1,
           isList => 0 }),

fileArg({name => 'gitax',
         descr => 'pathname for the gi_taxid_prot.dmp file',
         constraintFunc => undef,
         reqd => 1,
         isList => 0,
         mustExist => 1,
         format => 'Text' }),
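
To show how such a declaration can drive argument checking, here is a rough sketch. The stringArg and fileArg subs below are local stand-ins that only tag each spec with a type; they are not the GUS implementations. The missingRequired helper and the sample version value are hypothetical.

```perl
use strict;
use warnings;

# Stand-ins for the GUS argument constructors; here they only tag the spec with a type.
sub stringArg { my ($spec) = @_; return +{ %$spec, type => 'string' }; }
sub fileArg   { my ($spec) = @_; return +{ %$spec, type => 'file' }; }

my $argsDeclaration = [
    stringArg({ name => 'externalDatabaseVersion',
                descr => 'sres.externaldatabaserelease.version for this instance of NRDB',
                constraintFunc => undef, reqd => 1, isList => 0 }),
    fileArg({ name => 'gitax',
              descr => 'pathname for the gi_taxid_prot.dmp file',
              constraintFunc => undef, reqd => 1, isList => 0,
              mustExist => 1, format => 'Text' }),
];

# Hypothetical helper: names of required arguments absent from the supplied values.
sub missingRequired {
    my ($decls, $supplied) = @_;
    return grep { !defined $supplied->{$_} }
           map  { $_->{name} }
           grep { $_->{reqd} } @$decls;
}

# Placeholder command line values; 'gitax' is deliberately left out.
my @missing = missingRequired($argsDeclaration,
                              { externalDatabaseVersion => 'placeholder-version' });
print "Missing required arguments: @missing\n";
```

Because the declaration is plain data, the same structure can feed both validation and generated usage text.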

Page 10: GUS Plugin System

Argument Types

String
Integer
Boolean
Table Name
Float
File
Enumeration
Controlled Vocab: local/database term pairs for “dinky” CVs

Page 11: GUS Plugin System

Declaring Documentation

my $tablesDependedOn = [
  ['GUS::Model::DoTS::NRDBEntry',
   'pulls aa_sequence_id from here when id and extDbId match requested']
];

my $documentation = {
  purposeBrief => $purposeBrief,
  purpose => $purpose,
  tablesAffected => $tablesAffected,
  tablesDependedOn => $tablesDependedOn,
  howToRestart => $howToRestart,
  failureCases => $failureCases,
  notes => $notes
};

Page 12: GUS Plugin System

Plugin Initialization

sub new {
  my ($class) = @_;
  my $self = {};
  bless($self, $class);

  $self->initialize({ requiredDbVersion => 3.5,
                      cvsRevision => '$Revision: 2934 $',
                      name => ref($self),
                      argsDeclaration => $argsDeclaration,
                      documentation => $documentation });

  return $self;
}

Page 13: GUS Plugin System

Plugin Shell

package GUS::Supported::Plugin::LoadRow;

@ISA = qw(GUS::PluginMgr::Plugin);

use strict;
use GUS::PluginMgr::Plugin;

sub new { … }

sub run { … }

Page 14: GUS Plugin System

Run Method

“Entry point” for the plugin

Concise overview/“table of contents” for the plugin:

sub run {
  my ($self) = @_;
  my $rows = 0;
  my $rawData = $self->readData();
  my @parsedData = $self->parseData($rawData);
  foreach my $data (@parsedData) {
    $data->submit();
    $rows++;
  }
  return "Inserted $rows ";
}
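
A runnable sketch of the same read, parse, submit flow follows. Everything in it is a stand-in: Sketch::Row mimics an object with a submit() method, and readData and parseData are hypothetical helpers with made-up tab-delimited input, since the slides do not show their bodies.

```perl
use strict;
use warnings;

# Stand-in for a GUS model object; submit() here only records that it was called.
package Sketch::Row;
sub new    { my ($class, %f) = @_; return bless { %f, submitted => 0 }, $class; }
sub submit { my ($self) = @_; $self->{submitted} = 1; return 1; }

package Sketch::Plugin;
sub new { my ($class) = @_; return bless {}, $class; }

# Hypothetical helper: returns raw tab-delimited text (a real plugin would read a file).
sub readData { return "1\talpha\n2\tbeta\n3\tgamma\n"; }

# Hypothetical helper: turns each non-empty line into one row object.
sub parseData {
    my ($self, $rawData) = @_;
    my @rows;
    foreach my $line (grep { length } split /\n/, $rawData) {
        my ($id, $name) = split /\t/, $line;
        push @rows, Sketch::Row->new(id => $id, name => $name);
    }
    return @rows;
}

# Same shape as the run method on the slide: read, parse, submit, report.
sub run {
    my ($self) = @_;
    my $rows = 0;
    my @parsedData = $self->parseData($self->readData());
    foreach my $data (@parsedData) {
        $data->submit();
        $rows++;
    }
    return "Inserted $rows rows";
}

package main;

my $message = Sketch::Plugin->new()->run();
print "$message\n";
```

Keeping run() this short means a reader can see the whole plugin's flow at a glance and find the details in the helpers.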

Page 15: GUS Plugin System

Accessing Data

Command line arguments:

$self->getArg('nrdbFile');

Through objects:

my $preExtAASeq = GUS::Model::DoTS::ExternalAASequence->new({'aa_sequence_id' => $aa_seq_id});
$preExtAASeq->retrieveFromDB();

Direct database access:

my $dbh = $self->getQueryHandle();
my $sth = $dbh->prepare(…);

Page 16: GUS Plugin System

Persisting Data

Saving and updating:

$obj->submit();

submit() cascades and also submits child objects.

Deleting:

$obj->markDeleted(1);
$obj->submit();
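
The cascade and the mark-then-submit deletion pattern can be illustrated with a small mock. This is only a sketch: Sketch::Obj is a stand-in, not the GUS object layer, its submit() takes a log array purely so the order of operations can be printed, and addChild is a hypothetical method name.

```perl
use strict;
use warnings;

# Stand-in for a GUS object-layer class. The real submit() writes to the database;
# this one appends to a log so the cascade order is visible.
package Sketch::Obj;

sub new {
    my ($class, $name) = @_;
    return bless { name => $name, children => [], deleted => 0 }, $class;
}

# Hypothetical method for attaching a child object.
sub addChild {
    my ($self, $child) = @_;
    push @{ $self->{children} }, $child;
    return $self;
}

sub markDeleted {
    my ($self, $flag) = @_;
    $self->{deleted} = $flag ? 1 : 0;
    return $self;
}

# Record this object's action, then cascade to its children.
sub submit {
    my ($self, $log) = @_;
    push @$log, ($self->{deleted} ? 'delete ' : 'save ') . $self->{name};
    $_->submit($log) for @{ $self->{children} };
    return 1;
}

package main;

my $parent = Sketch::Obj->new('parent');
my $child1 = Sketch::Obj->new('child1');
my $child2 = Sketch::Obj->new('child2');
$parent->addChild($child1)->addChild($child2);

$child2->markDeleted(1);    # flag for deletion, then submit as usual

my @log;
$parent->submit(\@log);     # one call on the parent reaches both children
print join(', ', @log), "\n";
```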

Page 17: GUS Plugin System

Logging and Error Handling

For general logging, use the logging functions, which print to STDERR:

$self->log("message");

For error handling, either die() immediately or, for recoverable errors, write the errors to a file.

Restart functionality:

  Check for object existence
  Check for existence, but ensure the object was loaded by a valid invocation
  Store data from the previous run and use it as a filter
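
The third restart option, filtering on data stored by a previous run, might look roughly like the sketch below. The loadedIds helper, the one-ID-per-line format, and the sample IDs are all assumptions made for illustration; a real plugin would read the IDs from a file or the database and submit objects instead of collecting IDs.

```perl
use strict;
use warnings;

# Hypothetical helper: build a lookup of IDs recorded by an earlier run, one per line.
sub loadedIds {
    my ($text) = @_;
    my %seen = map { $_ => 1 } grep { length } split /\n/, $text;
    return \%seen;
}

my $previousRun = "id1\nid2\n";           # placeholder for a restart file's contents
my @input       = qw(id1 id2 id3 id4);    # placeholder for the IDs in the input data

my $seen = loadedIds($previousRun);
my (@toLoad, @skipped);

foreach my $id (@input) {
    if ($seen->{$id}) {
        push @skipped, $id;               # already loaded by the earlier run
        next;
    }
    push @toLoad, $id;                    # a real plugin would build and submit an object here
}

print "Skipped: @skipped\n";
print "Loaded: @toLoad\n";
```

Compared with querying the database for each object, a stored filter avoids a lookup per row, at the cost of having to keep the record of the earlier run accurate.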

Page 18: GUS Plugin System

Clearing the Cache

Historical: Perl previously had poor garbage collection support

Default capacity of 10000 objects

At the bottom of the outermost loop: $self->undefPointerCache();
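
Where the call goes in a loop can be shown with a mock cache. This sketch assumes a stand-in plugin whose remember() method plays the role of the object cache and dies once it holds more than 10000 objects; that failure behavior, the remember() method, and the batch sizes are assumptions for illustration. Only the undefPointerCache() name comes from the slide.

```perl
use strict;
use warnings;

package Sketch::Plugin;

sub new {
    my ($class) = @_;
    return bless { cache => [], capacity => 10000, peak => 0 }, $class;
}

# Stand-in for the object layer remembering every object it creates.
sub remember {
    my ($self, $obj) = @_;
    push @{ $self->{cache} }, $obj;
    my $size = scalar @{ $self->{cache} };
    die "object cache over capacity ($size)\n" if $size > $self->{capacity};
    $self->{peak} = $size if $size > $self->{peak};
    return $obj;
}

# Same name as the call on the slide; here it just empties the stand-in cache.
sub undefPointerCache { my ($self) = @_; $self->{cache} = []; return; }

package main;

my $plugin = Sketch::Plugin->new();
my $total  = 0;

foreach my $batch (1 .. 30) {              # outermost loop
    foreach my $i (1 .. 1000) {            # objects created per batch
        $plugin->remember({ batch => $batch, item => $i });
        $total++;
    }
    $plugin->undefPointerCache();          # bottom of the outermost loop
}

print "Created $total objects; peak cache size $plugin->{peak}\n";
```

Without the call at the bottom of the loop, this stand-in would exceed its capacity during the eleventh batch.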

Page 19: GUS Plugin System

Data Provenance

Tracks plugin revisions: name, checksum, revision

Tracks parameters that a specific plugin is executed with

Algorithm

AlgorithmImplementation

AlgorithmInvocation

AlgorithmParamKey

AlgorithmParamKeyType

AlgorithmParam

Page 20: GUS Plugin System

Plugin Evolution

Changes abound: data file formats, schema

Be flexible in writing plugins: use command line configuration

Be clear about what schema objects you use

Page 21: GUS Plugin System

Plugin Standard

See Developer’s Guide: http://gusdb.org/documentation/3.5/developers/developersguide.html