BioIT Europe 2010 - BioCatalogue

BioCatalogue presentation at BioIT Europe Hannover 2010 by prof Carole Goble

Page 1: BioIT Europe 2010 - BioCatalogue

The Reality of Web Services in the Life Sciences

Professor Carole Goble

University of Manchester, UK

myGrid Project

BioIT World Europe 2010, Hannover

http://www.biocatalogue.org

Page 2: BioIT Europe 2010 - BioCatalogue

Web Services

• Programmatic Interfaces to Services.

• Machine-Machine communication

• Software Lego™ that works across the web and underpins enterprise SOA.

• Standard interfaces.

• Two big families:
– SOAP and REST.

Page 3: BioIT Europe 2010 - BioCatalogue

Programmatic Interfaces to Services on the up…..

• Specialisation and segregation of methods from monolithic servers.

• Component packaging.

• Publishing data and analyses.

• Tools / resources integration.

• Applications, analytic workflows, workbenches and enterprise platforms

• Agile software development

• Remote and in house execution

• Loosely coupled systems.

http://www.myexperiment.org/workflows/158.html

Page 4: BioIT Europe 2010 - BioCatalogue

Service Providers and Consumers

• Core facility (EMBL-EBI, DDBJ, NCBI …)

• EMBL-EBI 8-10 million hits/month

• 329 services

• Community projects and labs

• Single Investigator projects

• Enterprises (e.g. Pharmas)

Public Private

Page 5: BioIT Europe 2010 - BioCatalogue

Web Service Rhetoric

• Pistoia Alliance

• BioIT Alliance

• ELIXIR

• But not all rosy … see Christian Hauck’s talk 16.00 Thursday.

Page 6: BioIT Europe 2010 - BioCatalogue

Web Service Technology Standards

• Simple Object Access Protocol
– Remote Procedure Call based
– HTTP transport protocol only
– Web Service Description Language in XML, UDDI registry
– Extensible

• Representational State Transfer
– Resource (document) style
– HTTP and URI application protocol
– XML and JSON responses, usually
– GET / PUT / POST
– Lightweight, webby
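To make the contrast between the two families concrete, here is a minimal Python sketch that expresses one logical call in both styles without contacting any server. The operation `getSequence`, the namespace `urn:example`, the host `example.org` and the parameter names are all hypothetical; only the SOAP 1.1 envelope namespace is standard.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode

# Standard SOAP 1.1 envelope namespace.
SOAP_ENV = "http://schemas.xmlsoap.org/soap/envelope/"


def soap_request(namespace, operation, params):
    """Build an RPC-style SOAP envelope: the operation name lives in the body."""
    ET.register_namespace("soapenv", SOAP_ENV)
    envelope = ET.Element(f"{{{SOAP_ENV}}}Envelope")
    body = ET.SubElement(envelope, f"{{{SOAP_ENV}}}Body")
    call = ET.SubElement(body, f"{{{namespace}}}{operation}")
    for name, value in params.items():
        ET.SubElement(call, name).text = str(value)
    return ET.tostring(envelope, encoding="unicode")


def rest_request(base_url, resource, params):
    """Build a resource-style request: HTTP verb plus a URI naming the resource."""
    return "GET", f"{base_url}/{resource}?{urlencode(params)}"


if __name__ == "__main__":
    # Hypothetical operation, namespace, host and parameters.
    print(soap_request("urn:example", "getSequence", {"id": "P12345"}))
    print(rest_request("http://example.org", "sequences/P12345", {"format": "fasta"}))
```

In the SOAP form the verb is hidden inside the XML payload, whereas in the REST form the HTTP method and the URI carry the intent, which is one reason the slide calls REST "lightweight, webby".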

Page 7: BioIT Europe 2010 - BioCatalogue

Bio Service Special Flavours

• Distributed Annotation Services (www.biodas.org)

• BioMOBY (www.biomoby.org)

• SADI

• SSWAP (iPlant Collaborative)

Page 8: BioIT Europe 2010 - BioCatalogue

Where…can I find them? advertise mine?

What…do they do? can I use them?

How…do they work? up to date? reliable?

Who…provides them? recommends them? knows about them?

Reusing Public and Third Party Web Services

Page 9: BioIT Europe 2010 - BioCatalogue

Web Service Description Language

<wsdl:message name="getGlimmersResponse">
  <wsdl:part name="getGlimmersReturn" type="xsd:string"/>
</wsdl:message>
<wsdl:message name="aboutServiceRequest"/>
<wsdl:message name="getGlimmersRequest">
  <wsdl:part name="in0" type="xsd:string"/>
  <wsdl:part name="in1" type="xsd:string"/>
  <wsdl:part name="in2" type="xsd:string"/>
  <wsdl:part name="in3" type="xsd:string"/>
  <wsdl:part name="in4" type="xsd:string"/>
  <wsdl:part name="in5" type="xsd:string"/>
  <wsdl:part name="in6" type="xsd:string"/>
  <wsdl:part name="in7" type="xsd:int"/>
  <wsdl:part name="in8" type="xsd:string"/>

Pathport Web service from the Virginia Bioinformatics Institute http://pathport.vbi.vt.edu/services/wsdls/beta/glimmer.wsd

Name of the service

Uninformative names for parameters

What kind of string?
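The complaint about uninformative parameter names can be checked mechanically. The following Python sketch parses an abbreviated copy of the WSDL fragment and lists message parts whose names look auto-generated. The wrapping `wsdl:definitions` element, the closing tag on the last message and the pattern used to define "generic" are assumptions added so that the fragment parses on its own; this is not part of BioCatalogue.

```python
import re
import xml.etree.ElementTree as ET

WSDL_NS = "http://schemas.xmlsoap.org/wsdl/"  # WSDL 1.1 namespace

# Abbreviated fragment from the slide. The root element and the closing
# </wsdl:message> on the request are added for illustration only.
FRAGMENT = """
<wsdl:definitions xmlns:wsdl="http://schemas.xmlsoap.org/wsdl/"
                  xmlns:xsd="http://www.w3.org/2001/XMLSchema">
  <wsdl:message name="getGlimmersResponse">
    <wsdl:part name="getGlimmersReturn" type="xsd:string"/>
  </wsdl:message>
  <wsdl:message name="getGlimmersRequest">
    <wsdl:part name="in0" type="xsd:string"/>
    <wsdl:part name="in7" type="xsd:int"/>
  </wsdl:message>
</wsdl:definitions>
"""

# Assumed heuristic: a generic prefix followed only by digits.
GENERIC_NAME = re.compile(r"^(in|out|arg|param|input|output)\d+$", re.IGNORECASE)


def uninformative_parts(wsdl_text):
    """Return (message, part, type) for every part with a generic name."""
    root = ET.fromstring(wsdl_text)
    findings = []
    for message in root.iter(f"{{{WSDL_NS}}}message"):
        for part in message.findall(f"{{{WSDL_NS}}}part"):
            name = part.get("name", "")
            if GENERIC_NAME.match(name):
                findings.append((message.get("name"), name, part.get("type")))
    return findings


if __name__ == "__main__":
    for message, part, xsd_type in uninformative_parts(FRAGMENT):
        print(f"{message}: parameter '{part}' ({xsd_type}) needs a description")
```

A check like this only finds the gap; answering "what kind of string?" still needs a human or provider-supplied annotation.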

Page 10: BioIT Europe 2010 - BioCatalogue

Services In the Wild

Find
• EMBOSS clustalw program called ‘emma’

Execute
• SOAP / REST / Quasi-REST / REST-like

Understand
• Input0: string, Output0: string
• What does SeqRet actually do?
• Example data? Parameter configurations? Input-Output correlations?

Use
• Quality of Service, Monitoring, Robustness
• Volatility, Sustained, License, Conditions of Use

Page 11: BioIT Europe 2010 - BioCatalogue

Cataloguing to avoid reinvention

• Investigator and project specific registries

• Community lists

• Specialist registries

• General catalogues and search engines

Page 12: BioIT Europe 2010 - BioCatalogue

An Open, Public, Curated, Boutique Catalogue for Web Services serving the Life Sciences for the Bioinformatics Community

http://www.biocatalogue.org

Launched June 2009

Nucl Acids Res, June 2010, Web Servers issue doi: 10.1093/nar/gkq394

Page 13: BioIT Europe 2010 - BioCatalogue
Page 14: BioIT Europe 2010 - BioCatalogue
Page 15: BioIT Europe 2010 - BioCatalogue

UNDERSTAND and USE

Page 16: BioIT Europe 2010 - BioCatalogue

[Chart axis labels (service categories): Protein Seq. Alignment, Protein Structure Prediction, Protein Function Prediction, Nucleotide Seq. Alignment, RNA structure prediction, Gene Prediction, Text Mining, Ontology, Phylogeny, Microarray, Sequence Retrieval, Identifier Retrieval, Structure Retrieval, Literature Retrieval, Genomics, Proteomics, Systems Biology, Biostatistics, Chemoinformatics]

Service Coverage

1719 services – SOAP and REST
– 92% with service description
– 57.5% with all ops/methods described

>60 classifications

Big players: EBI, NCBI, DDBJ etc.

60 operations on chemistry and chem-informatics data

Page 17: BioIT Europe 2010 - BioCatalogue

[June 09 - Sep10]

Steady use: 2K+ unique IPs/month.

Page 18: BioIT Europe 2010 - BioCatalogue

Building Content and Community

• Chiefly public services

• Community contributed
– Service Providers: 127
– Third Parties: 92 submitters
– 420 registered members
– 27 countries (UK>Spain>USA>Canada)

• Partners and registries
– EMBRACE Registry, SeekDa!, (BioMOBY, DAS)

• Automated crawling

• Manual mining

Page 19: BioIT Europe 2010 - BioCatalogue

EMBL-EBI

DDBJ

NCBI

But these statistics have to be interpreted…..

Page 20: BioIT Europe 2010 - BioCatalogue

Curation

Change logs

Quantitative Annotations

Tags

Semantic Annotations

Ontologies

Functional Capabilities

Provenance

Operational Capabilities

Operational Metrics

Use Policy

Social Status

Ratings

Attribution

Free text

Instrumentation

Usable and Useful

Understandable

Annotations

Bio-Services
• EDAM
• myGrid
• BioMOBY…

Bio-ontologies
• OBO Foundry
• BioPortal…

Services
• WSMO
• SAWSDL
• SA-REST…

Page 21: BioIT Europe 2010 - BioCatalogue

Incremental Annotation

50,672

• accumulate, aggregation, types, attribution

Page 22: BioIT Europe 2010 - BioCatalogue

Archived Service

Annotations

Attribution

Page 23: BioIT Europe 2010 - BioCatalogue

Tagging

Social

Annotate Anything

Categories

Page 24: BioIT Europe 2010 - BioCatalogue

Operations
Inputs
Outputs

Example use

Page 25: BioIT Europe 2010 - BioCatalogue

• Availability
• API changes
• Test script sandbox

• Based on EMBRACE Registry Monitoring Framework

Social Sharing
Feeds
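The availability and API-change monitoring listed on this slide can be illustrated with a small Python sketch. It is not the EMBRACE Registry Monitoring Framework or BioCatalogue code. It assumes that a monitor stores a hash of each service description and a history of boolean up/down checks; the function names are hypothetical.

```python
import hashlib


def fingerprint(description_text):
    """Hash a service description (e.g. the text of a WSDL document)."""
    return hashlib.sha256(description_text.encode("utf-8")).hexdigest()


def description_changed(previous_fingerprint, current_text):
    """Flag a possible silent API change: the description no longer matches."""
    return fingerprint(current_text) != previous_fingerprint


def availability(check_results):
    """Percentage of monitoring checks in which the service responded."""
    if not check_results:
        raise ValueError("no monitoring checks recorded")
    return 100.0 * sum(1 for ok in check_results if ok) / len(check_results)


if __name__ == "__main__":
    stored = fingerprint("<definitions><message name='run'/></definitions>")
    latest = "<definitions><message name='runJob'/></definitions>"
    print("API changed:", description_changed(stored, latest))
    print("Availability: %.1f%%" % availability([True] * 9 + [False]))
```

A byte-level hash is deliberately crude: it also fires on whitespace or comment changes, so a real monitor would probably normalise the description before comparing.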

Page 26: BioIT Europe 2010 - BioCatalogue

WSDL, SAWSDL, SA-REST, WSMO
RDF and SPARQL

Service annotation formats

Gadgets, Apps

Customised and Private instances

A service / resource

Open Source (BSD)
Open Platform

Read (Write) REST APIs

EDAM, BioMOBY, myGrid, OBO family, BioXSD

Annotation Ontologies

Page 27: BioIT Europe 2010 - BioCatalogue

People Powered Content
Reward and Attribution
Sensitivities

Tools

Bringing a Community together

Automation

Core Contribution & Curation
Coordination
Governance

Content Capture & Curation

Page 28: BioIT Europe 2010 - BioCatalogue

Governance Blackhole

• Submission
• Content
• Ownership / submitter / curator responsibilities
• Responsibility migrations
• Service update
• Metadata update
• Notifications
• Withdrawal
• Take-down
• Archiving
• Preservation

Page 29: BioIT Europe 2010 - BioCatalogue

Curating third party services is HARD

The Reality of Web Services in the Life Sciences

The Reality of (Expert) Crowd Sourcing Contributions

for a Web Service Catalogue

Page 30: BioIT Europe 2010 - BioCatalogue

Eight years ago Lincoln Stein said…

“An interface is a contract between data provider and

data consumer”

Stein L Creating a bioinformatics nation. Nature 2002;417:119-120.

Page 31: BioIT Europe 2010 - BioCatalogue

A Public interface means a Public Service

• Thinking local not global
– Local configuration bake-ins
– Scalability – I/O and load
– Interface granularity and interaction chattiness

• Interface churn
– Silent API volatility
– BioCatalogue Change logs
– Web Interface trumps API
– Local application trumps dependent external ones

Ensembl API: updated on every release, not backward compatible with obscured versioning.

BioMART: exposed internal identifier formats and then changed them.

Page 32: BioIT Europe 2010 - BioCatalogue

Preservation

(Public) Service Sustainability

Staff/funding/project churn

• 2 year availability, responsibility migration/hole, service decay -> application decay
• 58% developed by students, 24% stated not maintained (Schultheiss et al. (2010) PLoS Comp Biol (in review))
• 146 services archived, >90% availability

Sustainability strategy

• Make it portable
• Provide documentation
• Use existing frameworks and practices
• Involve the community and know your users
• Plan sunset or migration
• Funding models for sustainability

Page 33: BioIT Europe 2010 - BioCatalogue

Schultheiss et al. (2010) PLoS Comp Biol (in review)

Page 34: BioIT Europe 2010 - BioCatalogue

Geek Usability
Quasi-Standards

• http://xml.nig.ac.jp/rest/Invoke?service={x}&method={y}&...

• Which service? Need to know precisely what is expected for every service at the same endpoint

• http://xml.nig.ac.jp/{service}/{method}?...

• Service-method pairs

y

like

http://BASE/op?parameter={value}
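The two URL conventions on this slide differ only in where the service and method names go. A minimal Python sketch, assuming hypothetical service, method and parameter names (nothing is sent over the network), shows why a client has to know which convention an endpoint follows:

```python
from urllib.parse import urlencode

BASE = "http://xml.nig.ac.jp"


def invoke_style_url(service, method, params):
    """Single shared endpoint: service and method travel as query parameters."""
    query = urlencode({"service": service, "method": method, **params})
    return f"{BASE}/rest/Invoke?{query}"


def path_style_url(service, method, params):
    """Service-method pair encoded in the path; only arguments in the query."""
    return f"{BASE}/{service}/{method}?{urlencode(params)}"


if __name__ == "__main__":
    # "ExampleService", "exampleMethod" and "id" are placeholders, not real names.
    args = {"id": "X1"}
    print(invoke_style_url("ExampleService", "exampleMethod", args))
    print(path_style_url("ExampleService", "exampleMethod", args))
```

Both forms are RPC calls dressed as URLs rather than resource-oriented REST, which is the sense of "quasi-standards" here.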

Page 35: BioIT Europe 2010 - BioCatalogue

Usability: The What and How are Implicit knowledge

• No or lots of docs, poor examples
• Complexity
• Interfaces and Operation
• Service families

Service

Operation (×10)

Input

Output

Parameters

Errors

Page 36: BioIT Europe 2010 - BioCatalogue
Page 37: BioIT Europe 2010 - BioCatalogue

Behaviour families

Function

Polymorphic

Patterns

e.g. KEGG, TFmodeller

e.g. searchSimple operation in BLAST DDBJ

e.g. InterProScan (EBI), RapidMiner, Soaplab Server

Domain Tasks

Invocable operations

Page 38: BioIT Europe 2010 - BioCatalogue

Polymorphic: one operation, multiple functional units

BLAST (DDBJ)

1 operation: searchSimple (query, database, program)

5 functional units

PD: protein sequence database
ND: nucleotide sequence database

Functional unit | Program | Sequence type | Database
proteinBlast | blastp | protein | PD
nucleotideBlast | blastn | nucleotide | ND
proteinNucleotideBlast | tblastn | nucleotide | ND
nucleotideProteinBlast | blastx | protein | PD
nucleotideBlastFrameTranslation | tblastx | nucleotide | ND
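The point of this slide is that one invocable operation hides five distinct tasks, selected by the value of a parameter. A minimal Python sketch of that mapping, using the program names and functional-unit names from the slide (the lookup function itself is hypothetical and is not part of the DDBJ service):

```python
# program value -> (functional unit, database kind), as listed on the slide.
# PD = protein sequence database, ND = nucleotide sequence database.
FUNCTIONAL_UNITS = {
    "blastp": ("proteinBlast", "PD"),
    "blastn": ("nucleotideBlast", "ND"),
    "tblastn": ("proteinNucleotideBlast", "ND"),
    "blastx": ("nucleotideProteinBlast", "PD"),
    "tblastx": ("nucleotideBlastFrameTranslation", "ND"),
}


def functional_unit(program):
    """Name the task that a searchSimple call performs for a given program."""
    try:
        return FUNCTIONAL_UNITS[program][0]
    except KeyError:
        raise ValueError(f"unknown BLAST program: {program!r}") from None


def expected_database_kind(program):
    """Return 'PD' or 'ND', the kind of database the program searches."""
    if program not in FUNCTIONAL_UNITS:
        raise ValueError(f"unknown BLAST program: {program!r}")
    return FUNCTIONAL_UNITS[program][1]


if __name__ == "__main__":
    for program in FUNCTIONAL_UNITS:
        print(program, "->", functional_unit(program), expected_database_kind(program))
```

A catalogue that annotates only the operation sees a single entry, while annotating the functional units exposes all five tasks to search.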

Page 39: BioIT Europe 2010 - BioCatalogue

Server Wrapper Pattern

• Soaplab service operations

• clear | describe | getLastEvent | getResults | getResultsInfo | getStatus | run | runAndWaitFor | terminate | waitfor

• All 100 or so services have same WSDL document.
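Because every wrapped tool exposes the same operations, one piece of client code can drive any of them, and the interface alone says nothing about what a tool does. The Python sketch below illustrates this with an in-memory stand-in. `FakeWrappedTool` is not a Soaplab client, the status strings are assumptions, and only the operation names `run`, `getStatus` and `getResults` are taken from the slide.

```python
class FakeWrappedTool:
    """In-memory stand-in for one wrapped tool; it mimics the shared interface."""

    def __init__(self, tool):
        self._tool = tool    # the function being wrapped
        self._results = {}   # job id -> result
        self._polls = {}     # job id -> number of status polls so far

    def run(self, inputs):
        job_id = f"job-{len(self._results) + 1}"
        self._results[job_id] = self._tool(inputs)
        self._polls[job_id] = 0
        return job_id

    def getStatus(self, job_id):
        # Pretend the job needs two polls to finish (assumed status values).
        self._polls[job_id] += 1
        return "COMPLETED" if self._polls[job_id] >= 2 else "RUNNING"

    def getResults(self, job_id):
        return self._results[job_id]


def run_and_wait(service, inputs, max_polls=10):
    """Generic client: identical for every tool behind the wrapper."""
    job_id = service.run(inputs)
    for _ in range(max_polls):
        if service.getStatus(job_id) == "COMPLETED":
            return service.getResults(job_id)
    raise TimeoutError(f"{job_id} did not complete after {max_polls} polls")


if __name__ == "__main__":
    reverse = FakeWrappedTool(lambda d: {"output": d["sequence"][::-1]})
    lower = FakeWrappedTool(lambda d: {"output": d["sequence"].lower()})
    print(run_and_wait(reverse, {"sequence": "ACGT"}))
    print(run_and_wait(lower, {"sequence": "ACGT"}))
```

The two tools do different things, yet they are indistinguishable at the interface level, which is why identical WSDL documents make such services hard to catalogue by description alone.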

Page 40: BioIT Europe 2010 - BioCatalogue

The SOAP/REST technical view over services is not enough

Need a functional / task-oriented view

Page 41: BioIT Europe 2010 - BioCatalogue

Functional Unit annotation

• Service description abstraction

• Services as functional tasks

• Within the boundary of a service

• Independent from technology used

Service

Operation (×10)

WSDL, REST, DAS

[Missier, et al 2010 Functional Units: Abstractions for Web Service Annotations]

Page 42: BioIT Europe 2010 - BioCatalogue

Complexity because it’s a database really

SABIO–RK Service only

Taverna workflow

find chemical reactions that are associated with a given metabolite, and the kinetics associated with those reactions.

Page 43: BioIT Europe 2010 - BioCatalogue

Reflections

• Writing reusable, reliable (public) services with good and stable interfaces for others is hard

• A service interface is different to a web interface or a database query interface.

• Public interfaces – internal interfaces mismatch

• Publishing an interface is a publishing step.

• Technologist – User mismatch

• Eat your own dog food

• Takes resource, time and trouble

• But will pay off! We can’t afford to reinvent.

Page 44: BioIT Europe 2010 - BioCatalogue

Enterprise Concerns: real or perceived?

• Security
– HTTPS trusted peers inside a firewall
– WS-Security and OAuth (REST)
– Or is it fear of using external data?

• Performance
– Signature granularity and chattiness
– Data shipping vs reference shipping
– XML and JSON are not the only formats

• Governance
– Service Level Agreements

Technical or social issues?

Page 45: BioIT Europe 2010 - BioCatalogue

Collaborative Curating

• Socialising the community

• Rewarding contributors

• 10:90 long tail rule

• Content feedback spiral

• Feedback sensitivities

• Reputation protection

• Widen - Smart application feeds

• Resourced core content team

Page 46: BioIT Europe 2010 - BioCatalogue

Cost of Crowd Curation

Page 47: BioIT Europe 2010 - BioCatalogue

Take home

• Emerging, evolving, exciting and challenging Web service ecosystem

• BioCatalogue draws together services, knowledge and community to provide intelligence.

• Crowd collaboration to scale contribution, core to coordinate

• Open effort – contribute or adopt

• Core resource – for Alliances and Journals

• Social + technical challenges

• Christian Hauck’s talk 16.00 Thursday.

Page 48: BioIT Europe 2010 - BioCatalogue

Credits

Thomas Laurent
Hamish McWilliams
Franck Tanoh
Jiten Bhagat
Carole Goble
Rodrigo Lopez
Eric Nzuobontane
Steve Pettifer
Katy Wolstencroft
Robert Stevens
David De Roure
Mannie Tagarira
Jerzy Orlowski
Sergejs Aleksejevs

Page 49: BioIT Europe 2010 - BioCatalogue
Page 50: BioIT Europe 2010 - BioCatalogue

Thank You

http://www.biocatalogue.org

About Us - http://wiki.biocatalogue.org

API Docs - http://apidocs.biocatalogue.org

Bhagat, J., Tanoh, F., Nzuobontane, E., Laurent, T., Orlowski, J., Roos, M., Wolstencroft, K., Aleksejevs, S., Stevens, R., Pettifer, S., Lopez, R., Goble, C.A.: BioCatalogue: a universal catalogue of web services for the life sciences, Nucl. Acids Res., 2010.

doi:10.1093/nar/gkq394