eagle-i: a national network of biomedical research...

www.eagle-i.org

eagle-i: a national network of biomedical research resources

Cambridge Semantic Web Meetup, June 2011

Daniela Bourges-Waldegg, eagle-i system architect, on behalf of the eagle-i Consortium


Page 1: eagle-i: a national network of biomedical research resourcesfiles.meetup.com/1336198/eagle-i@SemWebMeetup.pdf · eagle-i: a national network of biomedical research resources Cambridge



Outline

Introduction and motivation

•  The eagle-i consortium and network

•  Why eagle-i?

The eagle-i architecture and software stack

•  Layered ontology model

•  Ontology-driven development

Challenges of producing and consuming linked data

Concluding remarks


eagle-i Consortium – a national network

9 institutions diverse in geography, culture and resources


Why eagle-i? the problem

Researcher A: Starting new project

needs:

1.  Expertise (technical skill set)

2.  Knowledge (understanding of domain)

3.  Material Resources (plasmids, antibodies, organisms, equipment, services…)


Why eagle-i? the problem

Researcher A: Starting new project

needs:

1.  Expertise (technical skill set) ---- ✔

2.  Knowledge (understanding of domain) ---- ✔

3.  Material Resources (plasmids, antibodies, organisms, equipment, services…)


Why eagle-i? the problem

Researcher A: Starting new project

needs:

1.  Expertise

2.  Knowledge

3.  Material Resources

•  Create
    •  start now
    •  control quality
    •  time
    •  money

•  Purchase
    •  fast and easy
    •  costly
    •  may not be available

•  Borrow/Collaborate
    •  free
    •  faster than remaking
    •  collaborative
    •  uncertainty



Why eagle-i? the problem

Researcher B: Finishing a project

Has produced:

1.  Expertise

2.  Knowledge

3.  Material Resources

Next Project, Publications


Why eagle-i? the problem

Researcher B: Finishing a project

Has produced:

1.  Expertise

2.  Knowledge

3.  Material Resources

1.  Deep Freeze
    •  always have it
    •  never know where to find it

2.  Toss
    •  reduce clutter
    •  save on space and energy
    •  Gone for good – may need it again

3.  Organize
    •  always have it
    •  always find it
    •  easily share/collaborate
    •  save time and money in long run
    •  takes time in the short run


Why eagle-i? the problem

1.  Deep Freeze
    •  always have it
    •  never know where to find it

2.  Toss
    •  reduce clutter
    •  save on space and energy
    •  Gone for good – may need it again

3.  Organize
    •  always have it
    •  always find it
    •  easily share/collaborate
    •  save time and money in long run
    •  takes time in the short run

1.  Create
    •  start now
    •  control quality
    •  time
    •  money

2.  Purchase
    •  fast and easy
    •  costly
    •  may not be available

3.  Borrow/Collaborate
    •  free
    •  faster than remaking
    •  collaborative
    •  uncertainty


The goal of eagle-i

Provide a mechanism that connects researchers who need resources with researchers who have them.

Reduce redundancy in resource development.

Connect researchers with resources they don’t know they need.


The eagle-i architecture

[Diagram components: JSU Data Center; eagle-i ontology; Search Application; Federated Network (SPIN); Repository (RDF); Data Tools; NIF, PubMed, Entrez Gene, etc.]


eagle-i design principles

Ontology-centric architecture

•  Data collection and search user interfaces driven by ontology

•  Repository performs certain types of ontology-based reasoning

•  ETL components transform data to ontology-conformant instances

Why?

•  Applications can seamlessly adapt to ontology evolution without code changes

Data is stored as RDF and follows Linked Open Data principles

•  Query any eagle-i repository via a SPARQL endpoint

•  All eagle-i resource instances are linkable (an instance is simply a URI)

Why?

•  Storage model best adapted to ontology-conformant data

•  Flexibility, extensibility
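
As a rough illustration of querying a repository through a SPARQL endpoint, the sketch below builds a standard SPARQL Protocol GET request using only the Python standard library. The endpoint address is a placeholder (the slides do not give one), and the query simply asks for labelled resources; it is not taken from eagle-i itself.

```python
from urllib.parse import urlencode
from urllib.request import Request

# Placeholder address: the slides do not name a concrete endpoint.
ENDPOINT = "https://repository.example.org/sparql"

# Illustrative query: any ten resources that carry an rdfs:label.
QUERY = """
SELECT ?resource ?label WHERE {
  ?resource <http://www.w3.org/2000/01/rdf-schema#label> ?label .
} LIMIT 10
"""

def build_sparql_request(endpoint: str, query: str) -> Request:
    """Build a SPARQL Protocol GET request that asks for JSON results."""
    url = endpoint + "?" + urlencode({"query": query})
    return Request(url, headers={"Accept": "application/sparql-results+json"})

request = build_sparql_request(ENDPOINT, QUERY)
print(request.full_url)
```

Passing the request to `urllib.request.urlopen` would send it; that step is left out here because the endpoint is only a placeholder.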


The eagle-i software stack

[Diagram components: Data collection clients; Data tools; eagle-i ontology; Search Application; Sesame RDF store]


The eagle-i software stack

[Diagram components]

•  REST API
•  Sesame RDF store
•  Data tools
•  Search Application
•  Data collection webapp (GWT)
•  Data management webapp (GWT)
•  ETL
•  Lucene
•  Search UI (GWT)
•  Application-specific Ontologies: eagle-i-app-dataTools.owl, eagle-i-app.owl
•  Ontology Memory Model: EIOntModel API, Jena/Pellet
•  Domain Ontologies: ero.owl, mesh-diseases.owl, ro.owl, iao.owl, Bfo.owl, etc…


eagle-i ontology


eagle-i data collection tool

Type browser: allows navigation of an ontology branch

eagle-i primary types

Object property: ontology term

Object property: instance list

Embedded instance

Required property

Datatype property
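
To make the idea of an ontology-driven form concrete, here is a minimal sketch of how the property kinds labelled on this slide could be mapped to form widgets. Everything in it is hypothetical: the `PropertySpec` class, the widget names, and the sample properties are illustrations, not the eagle-i implementation.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class PropertySpec:
    """A property as a form generator might see it after reading the ontology."""
    label: str
    kind: str            # "datatype", "object_term", or "object_instance"
    required: bool = False

# Hypothetical widget names, one per property kind named on the slide.
WIDGETS = {
    "datatype": "text_field",
    "object_term": "ontology_term_picker",
    "object_instance": "instance_list",
}

def build_form(properties):
    """Return (label, widget, required) rows with required fields first."""
    rows = [(p.label, WIDGETS[p.kind], p.required) for p in properties]
    # sorted() is stable, so non-required rows keep their original order.
    return sorted(rows, key=lambda row: not row[2])

# Hypothetical properties for an instrument record.
instrument = [
    PropertySpec("Model number", "datatype"),
    PropertySpec("Instrument type", "object_term", required=True),
    PropertySpec("Located in laboratory", "object_instance"),
]

for label, widget, required in build_form(instrument):
    print(f"{label}: {widget}{' (required)' if required else ''}")
```

Because the form is derived from the property list, adding a property to the ontology would add a field without touching the form code, which is the point of the ontology-driven design.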


eagle-i data collection tool

Workflow support


eagle-i search

Faceted search

Autocomplete from instances and ontology


eagle-i search

Instance pages with materialized properties


Layered ontology model

Modeling dichotomy

•  The eagle-i ontology is a domain model aimed at capturing biological knowledge

•  The application needs a model from which to derive behavior

Complexity

•  The eagle-i ontology is interoperable; it builds on an upper ontology and imports numerous terms

•  Not all ontology constructs translate into user-level constructs

Layered ontology model

• Application ontologies annotate domain ontologies with application-specific information and restrictions


Example

[Diagram: ontology class hierarchy. Classes shown: Thing; Entity; Occurrent; Processual entity; Planned process; Research Project; Human Study; Epidemiological study; Qualitative human study; Quantitative human study; GWAS. Also shown: Property 1; Property 2]
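
The layered model can be sketched with two plain dictionaries: one for the domain hierarchy and one for application annotations kept apart from it. The parent-child edges below are an assumption (the transcript lists the class names but not how they connect, and placing GWAS under "Quantitative human study" is a guess), and the `primary_type` annotation name is hypothetical.

```python
# Domain layer: child -> parent. Edges are assumed, not taken from the slide.
DOMAIN_PARENT = {
    "Entity": "Thing",
    "Occurrent": "Entity",
    "Processual entity": "Occurrent",
    "Planned process": "Processual entity",
    "Research Project": "Planned process",
    "Human Study": "Research Project",
    "Epidemiological study": "Human Study",
    "Qualitative human study": "Human Study",
    "Quantitative human study": "Human Study",
    "GWAS": "Quantitative human study",
}

# Application layer: annotations stored separately from the domain ontology.
# "primary_type" is a hypothetical annotation name.
APP_ANNOTATIONS = {
    "Research Project": {"primary_type": True},
}

def ancestors(cls):
    """Return the superclass chain of cls, nearest parent first."""
    chain = []
    while cls in DOMAIN_PARENT:
        cls = DOMAIN_PARENT[cls]
        chain.append(cls)
    return chain

def user_path(cls):
    """Path from the nearest annotated primary type down to cls.

    Upper-ontology classes above the primary type are hidden from the user.
    Returns an empty list when no primary type sits at or above cls.
    """
    lineage = [cls] + ancestors(cls)  # most specific class first
    for depth, name in enumerate(lineage):
        if APP_ANNOTATIONS.get(name, {}).get("primary_type"):
            return list(reversed(lineage[: depth + 1]))
    return []

print(" > ".join(user_path("GWAS")))
```

The domain dictionary is never modified; changing what users see only means editing the annotation layer, which mirrors the separation between domain and application ontologies.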


Ontology-driven development: process observations

Developing ontology-driven applications requires close collaboration between software developers and ontologists

•  Separation of concerns principle

•  Process for owning, editing and annotating ontology files

•  Annotations with a pure UI goal that require domain knowledge can be problematic

The applications provide ontology developers with a mechanism to rapidly test and refine their models for different usage scenarios

•  Data collection

•  Data retrieval


Challenges of producing and consuming linked data

Producing Linked Data

•  Need to enforce ontology constraints

•  ETL: in addition to producing ontology-conforming class instances, ETL processes need to inter-link them
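
A small sketch of both producer-side concerns, under stated assumptions: the namespace, the short predicate names, the `REQUIRED` constraint table, and the sample row are all invented for illustration. The ETL step links an instance to its laboratory by URI rather than by name, and a separate check reports missing required properties.

```python
from collections import namedtuple

Triple = namedtuple("Triple", "subject predicate obj")

BASE = "http://example.org/i/"  # hypothetical instance namespace

# Hypothetical constraint table: class -> properties every instance must carry.
REQUIRED = {"Antibody": {"label", "located_in"}}

def to_triples(row, lab_uris):
    """Turn one source row into triples, linking the lab by URI, not by name."""
    subject = BASE + row["id"]
    return [
        Triple(subject, "type", row["type"]),
        Triple(subject, "label", row["name"]),
        # Inter-linking step: raises KeyError if the lab has no known URI.
        Triple(subject, "located_in", lab_uris[row["lab"]]),
    ]

def missing_required(triples):
    """Return the required properties that the described instance lacks."""
    cls = next(t.obj for t in triples if t.predicate == "type")
    present = {t.predicate for t in triples}
    return REQUIRED.get(cls, set()) - present

# Invented sample data.
lab_uris = {"Example Lab": BASE + "lab-1"}
row = {"id": "ab-7", "type": "Antibody", "name": "Example antibody", "lab": "Example Lab"}

triples = to_triples(row, lab_uris)
print(missing_required(triples))
```

Running the check before data is written is one way to keep non-conforming instances out of the repository; the slides do not say where eagle-i performs this enforcement.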

Consuming Linked Data

•  Need to view the data through an ontology lens

•  Filter out administrative and non-conforming triples
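
On the consumer side, viewing data "through an ontology lens" can be sketched as sorting triples by predicate. The namespaces and predicate URIs below are placeholders; a real client would presumably derive the set of known properties from the ontology itself.

```python
# Placeholder URIs; a real client would read known properties from the ontology.
ONTOLOGY_PREDICATES = {
    "http://example.org/onto/label",
    "http://example.org/onto/located_in",
}
ADMIN_PREFIX = "http://example.org/admin/"

def ontology_view(triples):
    """Sort (subject, predicate, object) triples into three groups."""
    view = {"conforming": [], "administrative": [], "non_conforming": []}
    for triple in triples:
        predicate = triple[1]
        if predicate.startswith(ADMIN_PREFIX):
            view["administrative"].append(triple)
        elif predicate in ONTOLOGY_PREDICATES:
            view["conforming"].append(triple)
        else:
            view["non_conforming"].append(triple)
    return view

# Invented sample data.
sample = [
    ("http://example.org/i/ab-7", "http://example.org/onto/label", "Example antibody"),
    ("http://example.org/i/ab-7", "http://example.org/admin/createdBy", "curator-1"),
    ("http://example.org/i/ab-7", "http://example.org/legacy/notes", "free text"),
]

view = ontology_view(sample)
print({group: len(items) for group, items in view.items()})
```

Only the `conforming` group would be shown to users; keeping the other two groups around, rather than discarding them, makes it easier to audit what was filtered.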


Concluding remarks

eagle-i is a proof-of-concept system

•  A software suite

•  A network of institutions

•  An operational system with curated data

The eagle-i software and know-how are applicable to other problem spaces and domains

•  Ontology-driven framework goal: instantiate software stack for any ontology

•  No code changes to core framework

•  Annotate new domain ontology with eagle-i application ontology

eagle-i coming soon to open.med.harvard.edu


Demo scenarios


Overview

o  Scenario description

o  Entry of data into the Web Tool

o  Curation and publishing of data

o  Searching on data in the repository

o  How ontology integration makes resources visible


Scenario

Primary Scenario: Relapsing Fever – Host-Pathogen Interactions & Human Exposure

Dr. Olivier Lucas studies mechanisms of and ecological risk for infection with Borrelia hermsii, the tick-borne Relapsing Fever agent. He believes he has identified a role for IL-17 in disease resolution in a mouse model and would like to examine contributing immune cell populations. He’s also hoping to begin a study assessing B. hermsii exposure/seroconversion within rural populations in Montana. Lastly, he has received some departmental funds to support a work-study position in his lab.

Dr. Lucas wants to…

1. Advertise his vacant work-study research opportunity.

2. Obtain an IL-17 receptor antibody for his mouse work.

3. Locate a source of human biospecimens from MT for his seroconversion study.


Supporting Scenarios

Supporting Scenario A: Mucosal Immunity and Th17 Populations

Dr. David Pascual studies mucosal immunity and contributing T cell populations. He has developed a monoclonal antibody for the IL-17 receptor and, now that this work has been published, would like to share his antibody.

Dr. Pascual wants to…

1. Advertise his IL-17 receptor mAb to potential collaborators.

Supporting Scenario B: Lipid Profiles and Cardiovascular Disease Risk

Dr. Donna Williams is a human health researcher studying cardiovascular disease risk factors in rural, geographically isolated Montana communities. Some time ago she completed a study in which blood draws were obtained to assess total lipid profiles. Sera from these individuals were collected and frozen for a potential analysis of inflammatory mediators, but she has since shifted her research focus.

Dr. Williams wants to…

1. Put these frozen sera to good use.