BioIT Europe 2010 - BioCatalogue
The Reality of Web Services in the Life Sciences
Professor Carole Goble
University of Manchester, UK
myGrid Project
BioIT World Europe 2010, Hannover
http://www.biocatalogue.org
Web Services
• Programmatic Interfaces to Services.
• Machine-Machine communication
• Software Lego™ that works across the web and underpins enterprise SOA.
• Standard interfaces.
• Two big families:
– SOAP and REST.
Programmatic Interfaces to Services on the up…..
• Specialisation and segregation of methods from monolithic servers.
• Component packaging.
• Publishing data and analyses.
• Tools / resources integration.
• Applications, analytic workflows, workbenches and enterprise platforms
• Agile software development
• Remote and in-house execution
• Loosely coupled systems.
http://www.myexperiment.org/workflows/158.html
Service Providers and Consumers
• Core facility (EMBL-EBI, DDBJ, NCBI …)
• EMBL-EBI: 8-10 million hits/month
• 329 services
• Community projects and labs
• Single Investigator projects
• Enterprises (e.g. Pharmas)
Public Private
Web Service Rhetoric
• Pistoia Alliance
• BioIT Alliance
• ELIXIR
• But not all rosy … see Christian Hauck’s talk 16.00 Thursday.
Web Service Technology Standards
• Simple Object Access Protocol (SOAP)
– Remote Procedure Call based
– HTTP transport protocol only
– Web Service Description Language in XML, UDDI registry
– Extensible
• Representational State Transfer (REST)
– Resource (document) style
– HTTP and URI application protocol
– XML and JSON responses, usually
– GET / PUT / POST
– Lightweight, webby
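To make the contrast between the two families concrete, here is a minimal sketch of how the same call might be packaged in each style. The operation name, parameter, namespace and URLs in the usage note are hypothetical and not taken from any service in this talk; nothing is sent over the network.

```python
from urllib.parse import urlencode
import xml.etree.ElementTree as ET

# Standard SOAP 1.1 envelope namespace.
SOAP_ENV = "http://schemas.xmlsoap.org/soap/envelope/"


def build_soap_request(operation, params, namespace):
    """RPC style: wrap an operation call and its arguments in a SOAP envelope."""
    envelope = ET.Element(f"{{{SOAP_ENV}}}Envelope")
    body = ET.SubElement(envelope, f"{{{SOAP_ENV}}}Body")
    call = ET.SubElement(body, f"{{{namespace}}}{operation}")
    for name, value in params.items():
        ET.SubElement(call, name).text = str(value)
    return ET.tostring(envelope, encoding="unicode")


def build_rest_request(base_url, resource, query):
    """Resource style: the URI names the resource, the HTTP verb names the action."""
    return "GET", f"{base_url.rstrip('/')}/{resource}?{urlencode(query)}"
```

For example, `build_soap_request("getSequence", {"id": "P12345"}, "http://example.org/seq")` yields an XML document to POST to a single endpoint, whereas `build_rest_request("http://example.org/api/", "sequences/P12345", {"format": "fasta"})` yields a plain URL that can be fetched directly.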
Bio Service Special Flavours
• Distributed Annotation Services (www.biodas.org)
• BioMOBY (www.biomoby.org)
• SADI
• SSWAP (iPlant Collaborative)
Where…can I find them? advertise mine?
What…do they do? can I use them?
How…do they work? up to date? reliable?
Who…provides them? recommends them? knows about them?
Reusing Public and Third Party Web Services
Web Service Description Language
<wsdl:message name="getGlimmersResponse">
  <wsdl:part name="getGlimmersReturn" type="xsd:string"/>
</wsdl:message>
<wsdl:message name="aboutServiceRequest"/>
<wsdl:message name="getGlimmersRequest">
  <wsdl:part name="in0" type="xsd:string"/>
  <wsdl:part name="in1" type="xsd:string"/>
  <wsdl:part name="in2" type="xsd:string"/>
  <wsdl:part name="in3" type="xsd:string"/>
  <wsdl:part name="in4" type="xsd:string"/>
  <wsdl:part name="in5" type="xsd:string"/>
  <wsdl:part name="in6" type="xsd:string"/>
  <wsdl:part name="in7" type="xsd:int"/>
  <wsdl:part name="in8" type="xsd:string"/>
</wsdl:message>
Pathport Web service from the Virginia Bioinformatics Institute http://pathport.vbi.vt.edu/services/wsdls/beta/glimmer.wsd
Name of the service
Uninformative names for parameters
What kind of string?
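The complaint about uninformative parameter names can be checked mechanically. The sketch below is one possible heuristic, not a BioCatalogue feature: it scans WSDL 1.1 messages and reports parts whose names look auto-generated. The naming pattern and the `SAMPLE` document (a cut-down version of the fragment above, wrapped in a `definitions` element so it parses) are assumptions made for illustration.

```python
import re
import xml.etree.ElementTree as ET

# Standard WSDL 1.1 namespace.
WSDL_NS = "http://schemas.xmlsoap.org/wsdl/"

# Assumed heuristic: names such as in0, arg3 or param12 carry no meaning.
GENERIC_NAME = re.compile(r"^(in|out|arg|param|input|output)\d+$", re.IGNORECASE)


def uninformative_parts(wsdl_text):
    """Return (message name, part name, declared type) for each generic part name."""
    root = ET.fromstring(wsdl_text)
    findings = []
    for message in root.iter(f"{{{WSDL_NS}}}message"):
        for part in message.findall(f"{{{WSDL_NS}}}part"):
            name = part.get("name", "")
            if GENERIC_NAME.match(name):
                findings.append((message.get("name"), name, part.get("type")))
    return findings


SAMPLE = """<wsdl:definitions xmlns:wsdl="http://schemas.xmlsoap.org/wsdl/"
    xmlns:xsd="http://www.w3.org/2001/XMLSchema">
  <wsdl:message name="getGlimmersRequest">
    <wsdl:part name="in0" type="xsd:string"/>
    <wsdl:part name="in7" type="xsd:int"/>
  </wsdl:message>
  <wsdl:message name="getGlimmersResponse">
    <wsdl:part name="getGlimmersReturn" type="xsd:string"/>
  </wsdl:message>
</wsdl:definitions>"""

for message_name, part_name, part_type in uninformative_parts(SAMPLE):
    print(f"{message_name}: '{part_name}' ({part_type}) says nothing about its meaning")
```

A check like this only finds the symptom. Knowing what kind of string `in0` expects still requires human or semantic annotation.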
Services In the Wild
Find
• EMBOSS clustalw program called ‘emma’
Execute
• SOAP / REST / Quasi-REST / REST-like
Understand
• Input0: string, Output0: string
• What does SeqRet actually do?
• Example data? Parameter configurations? Input-Output correlations?
Use
• Quality of Service, Monitoring, Robustness
• Volatility, Sustained, License, Conditions of Use
Cataloguing to avoid reinvention
• Investigator and project specific registries
• Community lists
• Specialist registries
• General catalogues and search engines
An Open, Public, Curated, Boutique Catalogue for Web Services serving the Life Sciences for the Bioinformatics Community
http://www.biocatalogue.org
Launched June 2009
Nucl Acids Res, June 2010, Web Servers issue doi: 10.1093/nar/gkq394
UNDERSTAND and USE
[Chart axis categories: Protein Seq. Alignment, Protein Structure Prediction, Protein Function Prediction, Nucleotide Seq. Alignment, RNA Structure Prediction, Gene Prediction, Text Mining, Ontology, Phylogeny, Microarray, Sequence Retrieval, Identifier Retrieval, Structure Retrieval, Literature Retrieval, Genomics, Proteomics, Systems Biology, Biostatistics, Chemoinformatics]
Service Coverage
• 1719 services (SOAP and REST)
– 92% with service description
– 57.5% with all ops/methods described
• >60 classifications
• Big players: EBI, NCBI, DDBJ etc.
• 60 operations on chemistry and chem-informatics data
[June 09 - Sep 10]
Steady use: 2K+ unique IPs/month.
Building Content and Community
• Chiefly public services
• Community contributed
– Service Providers: 127
– Third Parties: 92 submitters
– 420 registered members
– 27 countries (UK>Spain>USA>Canada)
• Partners and registries
– EMBRACE Registry, SeekDa!, (BioMOBY, DAS)
• Automated crawling
• Manual mining
EMBL-EBI
DDBJ
NCBI
But these statistics have to be interpreted…..
Curation
Change logs
Quantitative Annotations
Tags
Semantic Annotations
Ontologies
Functional Capabilities
Provenance
Operational Capabilities
Operational Metrics
Use Policy
Social Status
Ratings
Attribution
Free text
Instrumentation
Usable and Useful
Understandable
Annotations
Bio-Services
• EDAM
• myGrid
• BioMOBY…
Bio-ontologies
• OBO Foundry
• BioPortal…
Services
• WSMO
• SAWSDL
• SA-REST…
Incremental Annotation
50,672
• accumulate, aggregation, types, attribution
Archived Service
Annotations
Attribution
Tagging
Social
Annotate Anything
Categories
Operations, Inputs, Outputs
Example use
• Availability
• API changes
• Test script sandbox
• Based on EMBRACE Registry Monitoring Framework
Social Sharing
Feeds
WSDL, SAWSDL, SA-REST, WSMO
RDF and SPARQL
Service annotation formats
Gadgets, Apps
Customised and Private instances
A service / resource
Open Source (BSD)
Open Platform
Read (Write) REST APIs
EDAM, BioMOBY, myGrid, OBO family, BioXSD
Annotation Ontologies
People Powered Content
Reward and Attribution
Sensitivities
Tools
Bringing a Community together
Automation
Core Contribution & Curation
Coordination
Governance
Content Capture & Curation
Governance Black Hole
• Submission
• Content
• Ownership / submitter / curator responsibilities
• Responsibility migrations
• Service update
• Metadata update
• Notifications
• Withdrawal
• Take-down
• Archiving
• Preservation
Curating third party services is HARD
The Reality of Web Services in the Life Sciences
The Reality of (Expert) Crowd Sourcing Contributions
for a Web Service Catalogue
Eight years ago Lincoln Stein said…
“An interface is a contract between data provider and data consumer”
Stein L Creating a bioinformatics nation. Nature 2002;417:119-120.
A Public interface means a Public Service
• Thinking local not global
– Local configuration bake-ins
– Scalability: I/O and load
– Interface granularity and interaction chattiness
• Interface churn
– Silent API volatility
– BioCatalogue Change logs
– Web Interface trumps API
– Local application trumps dependent external ones
Ensembl API: updated on every release, not backward compatible with obscured versioning.
BioMART: exposed internal identifier formats and then changed them.
Preservation
(Public) Service Sustainability
Staff/funding/project churn
• 2 year availability, responsibility migration/hole, service decay -> application decay
• 58% developed by students, 24% stated not maintained (Schultheiss et al. (2010) PLoS Comp Biol (in review))
• 146 services archived, >90% availability
Sustainability strategy
• Make it portable
• Provide documentation
• Use existing frameworks and practices
• Involve the community and know your users
• Plan sunset or migration
• Funding models for sustainability
Schultheiss et al. (2010) PLoS Comp Biol (in review)
Geek Usability
Quasi-Standards
• http://xml.nig.ac.jp/rest/Invoke?service={x}&method={y}&...
• Which service? Need to know precisely what is expected for every service at the same endpoint
• http://xml.nig.ac.jp/{service}/{method}?...
• Service-method pairs
y
like
http://BASE/op?parameter={value}
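The difference between the two URL conventions is easier to see when both are built from the same inputs. This is a minimal sketch: the host and the two URL shapes come from the slide, while the service name `Blast` in the usage note and the helper names are hypothetical. No request is sent.

```python
from urllib.parse import urlencode, urlsplit, parse_qs

BASE = "http://xml.nig.ac.jp"  # host shown on the slide


def dispatcher_url(service, method, params):
    """Single-endpoint style: service and method are hidden in the query string."""
    query = {"service": service, "method": method, **params}
    return f"{BASE}/rest/Invoke?{urlencode(query)}"


def path_url(service, method, params):
    """Path style: each service-method pair has its own address."""
    return f"{BASE}/{service}/{method}?{urlencode(params)}"


def describe(url):
    """Split a URL into the two parts a client must interpret: path and query."""
    parts = urlsplit(url)
    return parts.path, parse_qs(parts.query)
```

With `dispatcher_url("Blast", "searchSimple", {"program": "blastp"})` every call shares the path `/rest/Invoke`, so a catalogue or monitor cannot tell services apart without parsing the query. With `path_url` the same call becomes `/Blast/searchSimple`, which identifies the service-method pair on its own.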
Usability: The What and How are Implicit knowledge
• No or lots of docs, poor examples
• Complexity
• Interfaces and Operation
• Service families
Service
Operation (×10)
Input
Output
Parameters
Errors
Behaviour families
Function
Polymorphic
Patterns
e.g. KEGG, TFmodeller
e.g. searchSimple operation in BLAST DDBJ
e.g. InterProScan (EBI), RapidMiner, Soaplab Server
Domain Tasks
Invocable operations
query database program
searchSimple
Polymorphic: one operation, multiple functional units
BLAST (DDBJ)
1 Operation: searchSimple
5 Functional units
PD: protein sequence database
ND: nucleotide sequence database

Functional unit                    Program   Sequence type   Database
proteinBlast                       blastp    protein         PD
nucleotideBlast                    blastn    nucleotide      ND
proteinNucleotideBlast             tblastn   nucleotide      ND
nucleotideProteinBlast             blastx    protein         PD
nucleotideBlastFrameTranslation    tblastx   nucleotide      ND
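One way to picture a functional-unit annotation is as a lookup that turns a task name into the arguments of the single polymorphic operation. The sketch below is an illustration only, not the BioCatalogue data model. The unit names, programs and PD/ND database classes come from the slide, and the argument names follow its query / database / program labels. The query types are filled in from the standard definitions of the BLAST programs rather than from the slide.

```python
from typing import NamedTuple


class FunctionalUnit(NamedTuple):
    program: str     # BLAST program passed to searchSimple
    query_type: str  # kind of sequence the caller supplies (standard BLAST semantics)
    database: str    # PD = protein sequence database, ND = nucleotide sequence database


FUNCTIONAL_UNITS = {
    "proteinBlast": FunctionalUnit("blastp", "protein", "PD"),
    "nucleotideBlast": FunctionalUnit("blastn", "nucleotide", "ND"),
    "proteinNucleotideBlast": FunctionalUnit("tblastn", "protein", "ND"),
    "nucleotideProteinBlast": FunctionalUnit("blastx", "nucleotide", "PD"),
    "nucleotideBlastFrameTranslation": FunctionalUnit("tblastx", "nucleotide", "ND"),
}


def search_simple_args(unit_name, query):
    """Translate a task-level functional unit into searchSimple arguments."""
    unit = FUNCTIONAL_UNITS[unit_name]
    return {"program": unit.program, "database": unit.database, "query": query}
```

A user who wants "search a protein against nucleotide sequences" can then ask for `proteinNucleotideBlast` without knowing that the underlying operation is `searchSimple` with `program` set to `tblastn`.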
Server Wrapper Pattern
• Soaplab service operations
• clear | describe | getLastEvent | getResults | getResultsInfo | getStatus | run | runAndWaitFor | terminate | waitfor
• All 100 or so services have the same WSDL document.
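The consequence of the wrapper pattern can be sketched with an in-memory stand-in. This is not the real Soaplab API: only the operation names `run`, `getStatus` and `getResults` are taken from the slide, while their signatures, the status string and the two toy analyses are assumptions for illustration.

```python
class FakeAnalysisService:
    """In-memory stand-in for a wrapped tool; signatures are hypothetical."""

    def __init__(self, compute):
        self._compute = compute  # the actual analysis hidden behind the wrapper
        self._jobs = {}

    def run(self, inputs):
        job_id = f"job-{len(self._jobs) + 1}"
        self._jobs[job_id] = self._compute(inputs)
        return job_id

    def getStatus(self, job_id):
        return "COMPLETED" if job_id in self._jobs else "UNKNOWN"

    def getResults(self, job_id):
        return self._jobs[job_id]


def run_and_fetch(service, inputs):
    """One generic client works for every wrapped service."""
    job_id = service.run(inputs)
    if service.getStatus(job_id) != "COMPLETED":
        raise RuntimeError(f"{job_id} did not complete")
    return service.getResults(job_id)


def public_operations(service):
    """The interface as a catalogue would see it: operation names only."""
    return sorted(name for name in dir(service) if not name.startswith("_"))


# Two different analyses behind an identical interface.
reverse = FakeAnalysisService(lambda inputs: {"sequence": inputs["sequence"][::-1]})
length = FakeAnalysisService(lambda inputs: {"length": len(inputs["sequence"])})
```

Both objects expose exactly the same operations, so the interface alone cannot say which one reverses a sequence and which one measures it. That gap is what a functional annotation has to fill.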
The SOAP/REST technical view over services is not enough
Need a functional / task-oriented view
Functional Unit annotation
• Service description abstraction
• Services as functional tasks
• Within the boundary of a service
• Independent from technology used
Service
Operation (×10)
WSDL, REST, DAS
[Missier, et al 2010 Functional Units: Abstractions for Web Service Annotations]
Complexity because it’s a database really
SABIO–RK Service only
Taverna workflow
find chemical reactions that are associated with a given metabolite, and the kinetics associated with those reactions.
Reflections
• Writing reusable, reliable (public) services with good and stable interfaces for others is hard
• A service interface is different to a web interface or a database query interface.
• Public interfaces – internal interfaces mismatch
• Publishing an interface is a publishing step.
• Technologist – User mismatch
• Eat your own dog food
• Takes resource, time and trouble
• But will pay off! We can’t afford to reinvent.
Enterprise Concerns: real or perceived?
• Security
– HTTPS trusted peers inside a firewall
– WS-Security and OAuth (REST)
– Or is it fear of using external data?
• Performance
– Signature granularity and chattiness
– Data shipping vs reference shipping
– XML and JSON are not the only formats
• Governance
– Service Level Agreements
Technical or social issues?
Collaborative Curating
• Socialising the community
• Rewarding contributors
• 10:90 long tail rule
• Content feedback spiral
• Feedback sensitivities
• Reputation protection
• Widen - Smart application feeds
• Resourced core content team
Cost of Crowd Curation
Take home
• Emerging, evolving, exciting and challenging Web service ecosystem
• BioCatalogue draws together services, knowledge and community to provide intelligence.
• Crowd collaboration to scale contribution, core to coordinate
• Open effort – contribute or adopt
• Core resource – for Alliances and Journals
• Social + technical challenges
• Christian Hauck’s talk 16.00 Thursday.
Credits
Thomas Laurent
Hamish McWilliams
Franck Tanoh
Jiten Bhagat
Carole Goble
Rodrigo Lopez
Eric Nzuobontane
Steve Pettifer
Katy Wolstencroft
Robert Stevens
David De Roure
Mannie Tagarira
Jerzy Orlowski
Sergejs Aleksejevs
Thank You
http://www.biocatalogue.org
About Us - http://wiki.biocatalogue.org
API Docs - http://apidocs.biocatalogue.org
ISMB 10, 11th July 2010
Bhagat, J., Tanoh, F., Nzuobontane, E., Laurent, T., Orlowski, J., Roos, M., Wolstencroft, K., Aleksejevs, S., Stevens, R., Pettifer, S., Lopez, R., Goble, C.A.: BioCatalogue: a universal catalogue of web services for the life sciences, Nucl. Acids Res., 2010.
doi:10.1093/nar/gkq394