
Page 1: knb.ecoinformatics                                       seek.ecoinformatics

http://knb.ecoinformatics.org http://seek.ecoinformatics.org

Science Environment for Ecological Knowledge: EcoGrid

Matthew B. Jones

National Center for Ecological Analysis and Synthesis

University of California Santa Barbara

Page 2: knb.ecoinformatics                                       seek.ecoinformatics

Science Environment for Ecological Knowledge

Research Objectives

• Access to ecological, environmental, and biodiversity data
    • Enable data sharing & re-use
    • Enhance data discovery at global scales
• Scalable analysis and synthesis
    • Taxonomic, spatial, temporal, and conceptual integration of data
    • Address data heterogeneity issues
    • Enable communication and collaboration for analysis
    • Enable re-use of analytical components

Collaborators
• NCEAS, UNM, SDSC, U Kansas
• Vermont, Napier, ASU, UNC

Page 3: knb.ecoinformatics                                       seek.ecoinformatics

SEEK Components

Science Environment for Ecological Knowledge

• Kepler: modeling scientific workflows
• EcoGrid: making diverse environmental data systems interoperate
• Semantic Mediation System: “smart” data discovery and integration
• Knowledge Representation WG
• Taxon WG
• BEAM WG
• Education, Outreach, Training

Page 4: knb.ecoinformatics                                       seek.ecoinformatics

Scientific Workflows

• Model the way scientists work with their data now
    • Mentally coordinate export and import of data among software systems
• Workflows emphasize data flow
• Output generation includes creating appropriate metadata
    • The analysis workflow itself becomes metadata
    • The workflow describes the data lineage as it has been transformed
    • Derived data sets can be stored in EcoGrid with provenance

[Figure labels: "Query EcoGrid to find data"; "Archive output to EcoGrid with workflow metadata"]

Page 5: knb.ecoinformatics                                       seek.ecoinformatics

Kepler: scientific workflows

• Collaborative effort of SEEK, SciDAC/SDM, GEON, Ptolemy Project

Page 6: knb.ecoinformatics                                       seek.ecoinformatics

Kepler understands EML data

Page 7: knb.ecoinformatics                                       seek.ecoinformatics

Kepler: molecular biology example

Page 8: knb.ecoinformatics                                       seek.ecoinformatics

SEEK EcoGrid

Goal: allow diverse environmental data systems to interoperate

• Hides complexity of underlying systems using lightweight interfaces
    • We have standardized data via EML; we now need standard APIs
• Integrate diverse data networks from ecology, biodiversity, and environmental sciences
• Data systems
    • Any system can implement these interfaces
    • Prototyping using Metacat, SRB, DiGIR, Xanthoria, etc.
• Supports multiple metadata standards
    • EML and Darwin Core as foci

Page 9: knb.ecoinformatics                                       seek.ecoinformatics

EcoGrid client interactions

• Modes of interaction
    • Client-server
    • Fully distributed
    • Peer-to-peer
• EcoGrid Registry
    • Node discovery
    • Service discovery
• Aggregation services
    • Centralized access
    • Reliability
    • Data preservation

Page 10: knb.ecoinformatics                                       seek.ecoinformatics

EcoGrid Query Interfaces

Provides a mechanism for search and retrieval of metadata and federated data

Supports third party interaction with search results – forwarding of result set identifiers to another service instance for retrieval

Different levels of compliance
    • Low barrier for participation
    • Bulk of data will be accessible through Level I

Page 11: knb.ecoinformatics                                       seek.ecoinformatics

Query Interfaces Implemented

Initial prototype to support query and retrieval from:
    • Storage Resource Broker (SRB)
    • Metacat
    • Distributed Generic Information Retrieval (DiGIR)
    • Xanthoria

Encourage additional experimentation with and feedback based on other system implementations

Page 12: knb.ecoinformatics                                       seek.ecoinformatics

EcoGrid Query Level I

Basic, entry level exposure of data and metadata for EcoGrid and SEEK

Response contains data – intended for direct communications rather than 3rd party indirection

ResultsetType query(SessionID,QueryType)

byte[] get(SessionID,objectID)
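The two Level I operations can be illustrated with a toy in-memory node. This is only a sketch under stated assumptions: the class name `LevelOneNode`, the dictionary-based query format, and the sample records are all hypothetical, and the slides do not define the wire protocol or the real types behind `SessionID`, `QueryType`, and `ResultsetType`.

```python
from dataclasses import dataclass, field


@dataclass
class LevelOneNode:
    """Hypothetical in-memory stand-in for a node exposing query() and get()."""
    records: dict = field(default_factory=dict)   # objectID -> metadata dict
    objects: dict = field(default_factory=dict)   # objectID -> raw bytes

    def query(self, session_id, conditions):
        """Return matching records directly in the response (no third-party handle)."""
        return [
            {"identifier": oid, **meta}
            for oid, meta in self.records.items()
            if all(meta.get(key) == value for key, value in conditions.items())
        ]

    def get(self, session_id, object_id):
        """Return the bytes of one object, mirroring byte[] get(SessionID,objectID)."""
        return self.objects[object_id]


# Made-up sample content for illustration only.
node = LevelOneNode(
    records={"mvz1": {"Genus": "Peromyscus"}, "mvz2": {"Genus": "Microtus"}},
    objects={"mvz1": b"<record/>", "mvz2": b"<record/>"},
)
hits = node.query("session-1", {"Genus": "Peromyscus"})
payload = node.get("session-1", hits[0]["identifier"])
```

Because the response carries the data itself, the client that issued the query is also the one that consumes the result, which matches the "direct communications" intent stated above.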


Page 13: knb.ecoinformatics                                       seek.ecoinformatics

Query Conditions

Language independent representation of a query structure

Transformed into the appropriate native language of the data store

Example:

    <AND>
      <condition operator="LIKE" concept="ScientificName">peromyscus%</condition>
      <condition operator="NOT EQUALS" concept="DecimalLatitude">NULL</condition>
    </AND>
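One way to picture the transformation into a native query language is a small translator from the condition tree to a parameterized SQL fragment. This is a sketch, not the EcoGrid implementation: the operator table, the `to_sql` function, and the special handling of `NULL` are assumptions made for illustration, and a real node would map concepts onto its own schema.

```python
import xml.etree.ElementTree as ET

# Hypothetical operator mapping; the slides do not list the full operator set.
SQL_OPERATORS = {"LIKE": "LIKE", "EQUALS": "=", "NOT EQUALS": "<>"}


def to_sql(node, params):
    """Recursively render a condition tree as a parameterized SQL fragment."""
    if node.tag in ("AND", "OR"):
        parts = [to_sql(child, params) for child in node]
        return "(" + f" {node.tag} ".join(parts) + ")"
    concept = node.get("concept")
    if not concept.isidentifier():
        raise ValueError(f"unsafe concept name: {concept!r}")
    operator = node.get("operator")
    value = (node.text or "").strip()
    if value == "NULL":
        # SQL tests for NULL with IS / IS NOT rather than = / <>.
        return f"{concept} IS NOT NULL" if operator == "NOT EQUALS" else f"{concept} IS NULL"
    params.append(value)  # values travel as bind parameters, never inline
    return f"{concept} {SQL_OPERATORS[operator]} ?"


tree = ET.fromstring(
    "<AND>"
    '<condition operator="LIKE" concept="ScientificName">peromyscus%</condition>'
    '<condition operator="NOT EQUALS" concept="DecimalLatitude">NULL</condition>'
    "</AND>"
)
params = []
where = to_sql(tree, params)
```

The same tree could be rendered into a different target language by swapping the leaf rendering, which is the point of keeping the query structure language independent.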


Page 14: knb.ecoinformatics                                       seek.ecoinformatics

Specifying the Resultset

Specify the list of concepts (fields) to be returned in the resultset

Simple paths used to identify elements or document subtrees

Effectively flattens the structure of the records, but allows generic representation

Example:

    <returnfield>/ScientificName</returnfield>
    <returnfield>/Longitude</returnfield>
    <returnfield>/Latitude</returnfield>
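The flattening effect of return fields can be sketched by applying simple paths to a nested record. The record layout (a `Locality` wrapper element), the sample values, and the `flatten` helper are hypothetical; the slides only show root-level paths, and the nested paths here are added to make the flattening visible.

```python
import xml.etree.ElementTree as ET

# Made-up nested record, for illustration only.
record = ET.fromstring(
    "<record>"
    "<ScientificName>Peromyscus leucopus</ScientificName>"
    "<Locality><Longitude>-119.7</Longitude><Latitude>34.4</Latitude></Locality>"
    "</record>"
)


def flatten(record, return_fields):
    """Resolve each simple path and key the value by the last path step."""
    row = {}
    for path in return_fields:
        element = record.find("." + path)          # "/A/B" becomes "./A/B"
        name = path.rsplit("/", 1)[-1]
        row[name] = element.text if element is not None else None
    return row


row = flatten(record, ["/ScientificName", "/Locality/Longitude", "/Locality/Latitude"])
```

The nesting of the source record disappears from the result, which is what allows records from differently structured systems to share one generic representation.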


Page 15: knb.ecoinformatics                                       seek.ecoinformatics

Full Query Example

    <egq:query queryId="query-digir.1.1" system="http://knb.ecoinformatics.org"
        xmlns:egq="ecogrid://ecoinformatics.org/ecogrid-query-1.0.0beta1"
        xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
        xsi:schemaLocation="ecogrid://ecoinformatics.org/ecogrid-query-1.0.0beta1 ../../src/xsd/query.xsd">
      <namespace prefix="darwin">http://digir.net/schema/conceptual/darwin/2003/1.0</namespace>
      <returnfield>/ScientificName</returnfield>
      <returnfield>/Longitude</returnfield>
      <returnfield>/Latitude</returnfield>
      <title>Peromyscus genus query</title>
      <condition operator="LIKE" concept="Genus">Peromyscus</condition>
    </egq:query>
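A receiving node has to pull the return fields and conditions out of such a document before it can translate them. The sketch below does this with Python's standard `xml.etree.ElementTree`; the `summary` dictionary is a hypothetical intermediate form, and the `xsi:schemaLocation` attribute is left out of the embedded copy for brevity.

```python
import xml.etree.ElementTree as ET

QUERY_XML = """<egq:query queryId="query-digir.1.1" system="http://knb.ecoinformatics.org"
    xmlns:egq="ecogrid://ecoinformatics.org/ecogrid-query-1.0.0beta1">
  <namespace prefix="darwin">http://digir.net/schema/conceptual/darwin/2003/1.0</namespace>
  <returnfield>/ScientificName</returnfield>
  <returnfield>/Longitude</returnfield>
  <returnfield>/Latitude</returnfield>
  <title>Peromyscus genus query</title>
  <condition operator="LIKE" concept="Genus">Peromyscus</condition>
</egq:query>"""

root = ET.fromstring(QUERY_XML)

# Only the root element is namespace-qualified; its children are plain names.
summary = {
    "queryId": root.get("queryId"),
    "title": root.findtext("title"),
    "returnfields": [e.text for e in root.findall("returnfield")],
    "conditions": [
        (c.get("concept"), c.get("operator"), c.text) for c in root.iter("condition")
    ],
}
```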


Page 16: knb.ecoinformatics                                       seek.ecoinformatics

Query Result Set Structure

    <rs:resultset resultsetId="foo.1.1" system="urn:not://sure/what/to/put/here"
        xmlns:rs="ecogrid://ecoinformatics.org/ecogrid-resultset-1.0.0beta1"
        xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
        xsi:schemaLocation="ecogrid://ecoinformatics.org/ecogrid-resultset-1.0.0beta1 ../../src/xsd/resultset.xsd">
      <resultsetMetadata>
        <sendTime>2003-05-02T16:45:50-09:00</sendTime>
        <startRecord>1</startRecord>
        <endRecord>2</endRecord>
        <recordCount>2</recordCount>
        <namespace>http://digir.net/schema/conceptual/darwin/2003/1.0</namespace>
        <system id="1">http://speciesanalyst.net/digir/DiGIR.php?resource=MammalsDwC2</system>
      </resultsetMetadata>
      <record number="1" system="1" identifier="mvz1">
        <returnField name="ScientificName">PEROMYSCUS LEUCOPUS NOVEBORACENSIS</returnField>
        <returnField name="Longitude">100</returnField>
        <returnField name="Latitude">200</returnField>
      </record>
      …
    </rs:resultset>
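On the client side, a resultset like this reduces to one flat row per record, with the `system` attribute resolved against the metadata block. The sketch below uses a trimmed copy of the document (one record, `recordCount` adjusted to match, schema attributes omitted); the `rows` structure and the `source` key are illustrative choices, not part of the resultset schema.

```python
import xml.etree.ElementTree as ET

RESULTSET_XML = """<rs:resultset resultsetId="foo.1.1"
    xmlns:rs="ecogrid://ecoinformatics.org/ecogrid-resultset-1.0.0beta1">
  <resultsetMetadata>
    <recordCount>1</recordCount>
    <system id="1">http://speciesanalyst.net/digir/DiGIR.php?resource=MammalsDwC2</system>
  </resultsetMetadata>
  <record number="1" system="1" identifier="mvz1">
    <returnField name="ScientificName">PEROMYSCUS LEUCOPUS NOVEBORACENSIS</returnField>
    <returnField name="Longitude">100</returnField>
    <returnField name="Latitude">200</returnField>
  </record>
</rs:resultset>"""

root = ET.fromstring(RESULTSET_XML)

# Map each system id to the endpoint it stands for.
systems = {s.get("id"): s.text for s in root.findall("resultsetMetadata/system")}

rows = []
for record in root.findall("record"):
    row = {f.get("name"): f.text for f in record.findall("returnField")}
    row["identifier"] = record.get("identifier")
    row["source"] = systems[record.get("system")]   # resolve the per-record system reference
    rows.append(row)
```

Keeping the source system on every row matters in a federated setting, since a single resultset may combine records from several nodes.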


Page 17: knb.ecoinformatics                                       seek.ecoinformatics

EcoGrid Query Level II

More detailed handling of results
    • Uses RSIDs (result set identifiers) to identify resultsets: handles that can be passed to a third party

RSID search(SessionID,query)

Resultset retrieve(SessionID,RSID,start,numrecs)

query decodeResultsetIdentifier(SessionID,RSID)

statusinfo getResultStatus(SessionID)

int transfer(SessionID,sourceURL,destURL,ObjectID)
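The handle-based pattern can be sketched with a toy node in which one session runs the search and a different session pages through the result using only the RSID. Everything here is an assumption made for illustration: the class `LevelTwoNode`, the dictionary query format, the 1-based `start` convention, and the use of random hex strings as RSIDs. `getResultStatus` and `transfer` are left out.

```python
import uuid


class LevelTwoNode:
    """Hypothetical in-memory node that hands out resultset identifiers."""

    def __init__(self, records):
        self._records = records        # list of record dicts
        self._resultsets = {}          # RSID -> (query, matching records)

    def search(self, session_id, query):
        """Run the query, keep the result server-side, and return only a handle."""
        matches = [
            r for r in self._records
            if all(r.get(key) == value for key, value in query.items())
        ]
        rsid = uuid.uuid4().hex
        self._resultsets[rsid] = (dict(query), matches)
        return rsid

    def retrieve(self, session_id, rsid, start, numrecs):
        """Return numrecs records beginning at the 1-based position start."""
        _, matches = self._resultsets[rsid]
        return matches[start - 1:start - 1 + numrecs]

    def decode_resultset_identifier(self, session_id, rsid):
        """Return the query that produced the resultset behind this handle."""
        query, _ = self._resultsets[rsid]
        return query


# Made-up records for illustration only.
node = LevelTwoNode(
    [{"id": i, "Genus": "Peromyscus"} for i in (1, 2, 3)] + [{"id": 4, "Genus": "Microtus"}]
)
rsid = node.search("client-a", {"Genus": "Peromyscus"})
page = node.retrieve("client-b", rsid, 2, 2)       # a different party uses the handle
original_query = node.decode_resultset_identifier("client-b", rsid)
```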

Page 18: knb.ecoinformatics                                       seek.ecoinformatics

EcoGrid Write

Used to push data back to sources (e.g. publishing EML documents)

Depends on the availability of an authentication and access control system

put(sessionID, objectID, object, type)

delete(sessionID,objectID)
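The dependency on access control can be made concrete with a toy node that checks the session before every write. This is a minimal sketch: the `WritableNode` class and its set of permitted session IDs are hypothetical stand-ins for whatever authentication and authorization system a real EcoGrid node would use.

```python
class WritableNode:
    """Hypothetical node that only accepts writes from authorized sessions."""

    def __init__(self, writers):
        self._writers = set(writers)   # session IDs allowed to write (stand-in for real access control)
        self._store = {}               # objectID -> (type, object bytes)

    def _check(self, session_id):
        if session_id not in self._writers:
            raise PermissionError(f"session {session_id!r} may not write")

    def put(self, session_id, object_id, obj, type_):
        """Store or overwrite an object after the access check."""
        self._check(session_id)
        self._store[object_id] = (type_, obj)

    def delete(self, session_id, object_id):
        """Remove an object after the access check."""
        self._check(session_id)
        del self._store[object_id]


node = WritableNode(writers={"session-editor"})
node.put("session-editor", "doc.1.1", b"<eml/>", "eml")

try:
    node.delete("session-guest", "doc.1.1")
    denied = False
except PermissionError:
    denied = True
```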

Page 19: knb.ecoinformatics                                       seek.ecoinformatics

Data Instance Query

New requirement to support direct query and retrieval with arbitrary data sets

Generally no common schemas between different instances

Could either:
    • Push data instance to a service that can query the object (e.g. the SRB)
    • Implement interface at the data instance location

Simple JDBC / SQL interface?

dbSchema getDataSchema(sessionID,objectID)

dbResultset search(sessionID,objectID,SQL)
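The second option, implementing the interface where the data instance lives, might look like the sketch below, which uses Python's standard `sqlite3` module as a stand-in for a JDBC-style backend. The `DataInstanceNode` class, its `load` helper, and the sample table are hypothetical. A real service would also need to restrict the SQL it accepts, which this sketch does not attempt.

```python
import sqlite3


class DataInstanceNode:
    """Hypothetical node exposing one queryable database per data object."""

    def __init__(self):
        self._instances = {}           # objectID -> sqlite3 connection

    def load(self, object_id, table, columns, rows):
        """Illustrative helper: load one data set into its own in-memory database."""
        conn = sqlite3.connect(":memory:")
        conn.execute(f"CREATE TABLE {table} ({', '.join(columns)})")
        placeholders = ", ".join("?" for _ in columns)
        conn.executemany(f"INSERT INTO {table} VALUES ({placeholders})", rows)
        self._instances[object_id] = conn

    def get_data_schema(self, session_id, object_id):
        """Describe the tables and columns of one data instance."""
        conn = self._instances[object_id]
        schema = {}
        tables = conn.execute("SELECT name FROM sqlite_master WHERE type = 'table'").fetchall()
        for (table,) in tables:
            cols = conn.execute(f"PRAGMA table_info({table})").fetchall()
            schema[table] = [col[1] for col in cols]   # col[1] is the column name
        return schema

    def search(self, session_id, object_id, sql):
        """Run caller-supplied SQL against one data instance."""
        return self._instances[object_id].execute(sql).fetchall()


# Made-up data set for illustration only.
node = DataInstanceNode()
node.load(
    "survey.1.1",
    "captures",
    ["species TEXT", "n INTEGER"],
    [("Peromyscus leucopus", 4), ("Microtus pennsylvanicus", 1)],
)
schema = node.get_data_schema("session-1", "survey.1.1")
result = node.search("session-1", "survey.1.1", "SELECT species FROM captures WHERE n > 2")
```

Because there is generally no common schema between instances, a client would typically call the schema operation first and build its SQL from the answer.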

Page 20: knb.ecoinformatics                                       seek.ecoinformatics

Building the EcoGrid

[Figure: map of EcoGrid nodes. Legend: Metacat node, SRB node, DiGIR node, VegBank node, Xanthoria node, legacy system. Site labels shown: AND, LUQ, NTL, VCR, HBR.]

• LTER Network (24)
• Natural History Collections (>> 100)
• Organization of Biological Field Stations (180)
• UC Natural Reserve System (36)
• Partnership for Interdisciplinary Studies of Coastal Oceans (4)
• Multi-agency Rocky Intertidal Network (60)

Page 21: knb.ecoinformatics                                       seek.ecoinformatics

Metadata-driven analysis cycle

Page 22: knb.ecoinformatics                                       seek.ecoinformatics

Acknowledgements

This material is based upon work supported by:

The National Science Foundation under Grant Numbers 9980154, 9904777, 0131178, 9905838, 0129792, and 0225676.

The National Center for Ecological Analysis and Synthesis, a Center funded by NSF (Grant Number 0072909), the University of California, and the UC Santa Barbara campus.

The Andrew W. Mellon Foundation.

PBI Collaborators: NCEAS, University of New Mexico (Long Term Ecological Research Network Office), San Diego Supercomputer Center, University of Kansas (Center for Biodiversity Research)