GGF Summer School 24th July 2004, Italy
Part 4: Semantics and Metadata
• Semantic publication and discovery
• Provenance metadata
• Semantic Web and the Grid
Professor Carole Goble
University of Manchester
http://www.mygrid.org.uk
Virtual organisations and (Re)use
[Diagram labels: Service Providers; Bioinformaticians; Biologists; Annotation providers; Tool & middleware developers; Service & Platform Administrators; Service; Information; Workflow; Registries; mIRs; Resources]
Finding and selecting services
Activation energy gradient
Unregistered services
• Scavenging
• URLs and Soaplab endpoints
– Introspection
Registered services
• Word-based searching
• Semantic annotation for later discovery and (re)use by friends and strangers in your VO (Part 3)
Drag and drop services onto Taverna workbench
Registry View Service
• Registry
• Third party registries
• Third party services
• Third party annotation (RDF)
• Views over federated registries
• UDDI interfaces extended with RDF
• Federated views
– Updated via Notification Service
– Personalized based on Annotation
• Authorisation and IPR
Semantic discovery
• User chooses services
• A common ontology is used to annotate and query any myGrid object, including services.
• Discover workflows and services described in the registry via Taverna.
• Look for all workflows that accept an input of semantic type nucleotide sequence.
• Aim to have semantic discovery over a public view on the Web.
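The "nucleotide sequence" query above can be sketched as a lookup that also accepts more general input types. This is only an illustration under stated assumptions: the toy ontology, the registry entries and the helper `find_workflows_accepting` are hypothetical and are not part of myGrid, Taverna or Feta.

```python
# Toy is-a hierarchy: child concept -> parent concept (hypothetical names).
SUBCLASS_OF = {
    "nucleotide_sequence": "biological_sequence",
    "protein_sequence": "biological_sequence",
    "dna_sequence": "nucleotide_sequence",
}

# Toy registry: workflow name -> semantic types of its inputs (hypothetical).
REGISTRY = {
    "blast_and_annotate": ["nucleotide_sequence"],
    "protein_family_lookup": ["protein_sequence"],
    "generic_sequence_stats": ["biological_sequence"],
    "dna_masking": ["dna_sequence"],
}


def ancestors(concept):
    """Return the concept itself followed by every concept that subsumes it."""
    seen = [concept]
    while concept in SUBCLASS_OF:
        concept = SUBCLASS_OF[concept]
        seen.append(concept)
    return seen


def find_workflows_accepting(semantic_type):
    """Workflows with an input annotated with the query type or a more general type."""
    acceptable = set(ancestors(semantic_type))
    return sorted(
        name
        for name, input_types in REGISTRY.items()
        if any(t in acceptable for t in input_types)
    )
```

A workflow annotated with a narrower type (here `dna_masking`) is deliberately not returned for a `nucleotide_sequence` query, since an arbitrary nucleotide sequence might not satisfy the narrower input.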
Workflow and service annotation
• Add structured metadata to a workflow registration so that others can discover and reuse it more effectively, e.g. the semantic type of input it accepts.
Can you guess what it is yet?
Service Registration
http://pedro.man.ac.uk
Semantic Discovery
• Drag a workflow entry into the explorer pane and the workflow loads.
• Drag a service/workflow to the scavenger window for inclusion in the workflow.
Annotation
[Diagram labels: Registry plug-in; Taverna Workbench; Registry (Personalised View); Pedro Annotation tool; Ontology Store; Vocabulary; Others; WSDL; Soaplab; Service Providers; Description extraction; Ontologists; Registry; Interface Description; Annotation/description; Annotation providers]
Annotation
[Diagram labels: Store plug-in; Taverna Workbench; Pedro Annotation tool; Annotation providers; Ontology Store; Vocabulary; Ontologists; Annotation/description; mIR; Haystack Provenance Browser; Scientists]
[Diagram labels: Taverna Workbench; Registry plug-in; Feta plug-in; Registry (Personalised View); Registry; Ontology Store; Vocabulary; Others; WSDL; Soaplab; Service Providers; Ontologists; Feta Semantic Discovery; FreeFluo WfEE Workflow Execution; Bioinformaticians; mIR Store data & metadata; invoking]
Layered Semantics
• Domain semantics layered on top of a domain-neutral but scientific data model
• Reducing the activation energy, lowering barriers to entry.
[Diagram labels: Workflow metadata; Provenance metadata; Service metadata; Data metadata; Syntax; Domain Semantics (Ontologies); Workflow; OGSA-DQP; Format (XSD types, MIME types); Experiment Semantics (IMv2)]
Model of services
• Service: name, description, author, organisation
• Operation: name, description, task, method, resource, application
• Parameter: name, description, semantic type, format, transport type, collection type, collection format
[Diagram relations: hasInput; hasOutput; subclass]
[Diagram labels: WSDL based Web service; WSDL based operation; Soaplab service; bioMoby service; workflow; Local Java code]
Tiered specifications
Task; Service class; Specific services
• Task: Sequence similarity search
• Service class: BLAST service
• Specific services: SOAPLAB service or IBM Life Sciences service
• Operations shown: createJob(), setProgram(), run(), getResults(), setDatabase(), setE_value(), blastQuery()
• Classes of services: Domain “semantic”, “Unexecutable”, “Potentials”
• Instances of services: Business “operational”, “Executable”, “Actuals”
Matrix of metadata in workflow lifecycle
Stratified metadata
• Service Type and Class (OWL)
• Service Instance (RDF)
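One way to picture the class/instance split is as two record types: a class-level description that says what a kind of service does, and instance-level descriptions that say where an executable copy lives. The sketch below is illustrative only; the field names, the `instances_for_task` helper and the example.org endpoints are assumptions, and only the service names and the task come from the slides.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ServiceClass:
    """Class-level ("semantic", unexecutable) description."""
    name: str
    performs_task: str


@dataclass(frozen=True)
class ServiceInstance:
    """Instance-level ("operational", executable) description."""
    name: str
    service_class: ServiceClass
    endpoint: str  # hypothetical endpoint, for illustration only


BLAST = ServiceClass("BLAST service", "Sequence similarity search")

INSTANCES = [
    ServiceInstance("SOAPLAB service", BLAST, "http://example.org/soaplab/blast"),
    ServiceInstance("IBM Life Sciences service", BLAST, "http://example.org/ibm/blast"),
]


def instances_for_task(task):
    """Names of executable instances whose class performs the given task."""
    return [i.name for i in INSTANCES if i.service_class.performs_task == task]
```

Keeping the two levels separate means a user can pick the class once and still swap one instance for another without changing what the experiment means.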
Service and Workflow registration
• Description scheme
• RDFS & DAML+OIL / OWL ontologies of services & biology
• Based on DAML-S
• Reasoning over OWL descriptions
• Query over RDF
• Aim to have semantic discovery over a public view on the Web.
Workflow registry entry
• Operational descriptions: cost, QoS, access rights…
• Workflow executive summary descriptions: inputs, outputs, tasks, component resources
• Syntactic descriptions, e.g. MIME types
• Invokable interface descriptions, e.g. XML data types
• Conceptual descriptions
• Provenance descriptions: authors, creation date, institution…
[Diagram labels: RDF; OWL; OWL/RDF; RDF Store; stored; encoded; Scufl; URI; WSDL]
Workflow registration allows peer review and publication of e-Science methods.
Service Ontology Suite
[Diagram labels: Upper level ontology; Informatics ontology; Molecular biology ontology; Bioinformatics ontology; Publishing ontology; Organisation ontology; Task ontology; Web service ontology]
• parameters: input, output, precondition, effect
• performs_task
• uses-resource
• is_function_of
Inspired by DAML-S
Current work: joint development of an Open Biological Ontologies BioService Ontology. http://obo.sourceforge.net/
Reflections
• Adverts for services and workflows turn out to be tricky
– Describing different executable objects
  • Workflows and Services
– Stratification of metadata
  • Classes and Instances of services and workflows
– Service execution
  • Complex state-based invocation models
  • Parametric polymorphism of services
  • Executable process models vs discovery process models
  • Multiple dimensions of service composition.
Reflections
• Multiple descriptions, multiple interfaces
– Users’ needs vs machine needs
• The dimensions of Service Class substitution
– Biologists choose experimentally meaningful services and do not want “semantically similar” substitutions; they only substitute one instance for another
– Experimentally neutral “glue” services that can be substituted are comparatively few
– If users are choosing services you don’t need many kinds of metadata to eliminate 90% of options.
Reuse and Repurposing
• Describing for reuse is challenging
– Reuse depends on semantic descriptions and these are costly to produce
– Describing for someone else’s benefit
– Reuse by multiple stakeholders
• Licensing workflows for reuse.
• Authorisation models
• But reuse does happen!
• Metadata pays off but it needs a network effect and there is a cost.
So far, Using Concepts
• Controlled vocabulary for advertisements for workflows and services
• Indexes into registries and mIR
– Semantic discovery of services and workflows
– Semantic discovery of repository entries
• Type management for composition
– Semantic workflow construction: guidance and validation
• Navigation paths between data and knowledge holdings
– Semantic “glue” between repository entries
– Semantic annotation and linking of workflow provenance logs
Part 4: Semantics and Metadata
• Semantic publication and discovery
• Provenance metadata
• Semantic Web and the Grid
Provenance
In silico experiments: experiments are performed repeatedly, at different sites, at different times, by different users or groups.
[Diagram label: Scientists]
A large repository of records about experiments!
• verification of data;
• “recipes” for experiment designs;
• explanation for the impact of changes;
• ownership;
• performance of services;
• data quality;
Provenance Web
[Diagram labels: serviceInvocation1; serviceInvocation2; data1; data2; data3; data4; dataAnother; WSDL; GenomicProject; Process provenance; Data provenance; Organisation provenance; Knowledge provenance]
Representing links
• Identify each resource
– Life science identifier: URI with associated data and metadata retrieval protocols.
– Understanding that underlying data will not change
urn:lsid:taverna.sf.net:datathing:45fg6
urn:lsid:taverna.sf.net:datathing:23ty3
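An LSID has the shape `urn:lsid:<authority>:<namespace>:<object>` with an optional trailing revision. As a rough sketch of how a tool might split the identifiers above into their parts (the `LSID` tuple and `parse_lsid` function are illustrative names, not an existing library API):

```python
from typing import NamedTuple, Optional


class LSID(NamedTuple):
    authority: str
    namespace: str
    object_id: str
    revision: Optional[str] = None


def parse_lsid(text: str) -> LSID:
    """Split 'urn:lsid:authority:namespace:object[:revision]' into its parts."""
    parts = text.split(":")
    if len(parts) not in (5, 6):
        raise ValueError(f"not an LSID: {text!r}")
    if parts[0].lower() != "urn" or parts[1].lower() != "lsid":
        raise ValueError(f"not an LSID: {text!r}")
    if any(part == "" for part in parts[2:]):
        raise ValueError(f"empty component in LSID: {text!r}")
    return LSID(*parts[2:])
```

For example, `parse_lsid("urn:lsid:taverna.sf.net:datathing:45fg6")` yields the authority `taverna.sf.net`, the namespace `datathing` and the object identifier `45fg6`, with no revision.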
Representing links II
• Identify link type
– Again use URI
– Allows us to use RDF infrastructure
  • Repositories
  • Ontologies
urn:lsid:taverna.sf.net:datathing:45fg6
urn:lsid:taverna.sf.net:datathing:23ty3
http://www.mygrid.org.uk/ontology#derived_from
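Because both the resources and the link type are URIs, a link is simply an RDF triple. The sketch below writes one such triple in N-Triples syntax. Which of the two data items is the subject is not stated on the slide, so the direction used here is an assumption, and the names `ITEM_A`, `ITEM_B`, `to_ntriples` and `objects_of` are illustrative.

```python
# Predicate URI taken from the slide.
DERIVED_FROM = "http://www.mygrid.org.uk/ontology#derived_from"

# The two identifiers from the slide; treating ITEM_A as the derived item is an assumption.
ITEM_A = "urn:lsid:taverna.sf.net:datathing:45fg6"
ITEM_B = "urn:lsid:taverna.sf.net:datathing:23ty3"

links = [(ITEM_A, DERIVED_FROM, ITEM_B)]


def to_ntriples(triples):
    """Serialise (subject, predicate, object) URI triples, one N-Triples statement per line."""
    return "\n".join(f"<{s}> <{p}> <{o}> ." for s, p, o in triples)


def objects_of(subject, predicate, triples):
    """Follow a typed link from a subject to the resources it points at."""
    return [o for s, p, o in triples if s == subject and p == predicate]
```

Any RDF repository or toolkit that reads N-Triples could load the output of `to_ntriples(links)`, which is the practical benefit of using URIs for link types.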
Provenance Pyramid
Knowledge Level
Organisation Level
Data Level
Process Level
[Diagram: provenance entities and links]
Entities: Organisation; Person; Project; Experiment design; Workflow design; Workflow run; Process; Service; Event; Data item
Levels: Organisation level provenance; Process level provenance; Data/knowledge level provenance
Links:
• data derivation, e.g. output data derived from input data
• knowledge statements, e.g. similar protein sequence to
• instanceOf
• partOf
• componentProcess, e.g. web service invocation of BLAST @ NCBI
• componentEvent, e.g. completion of a web service invocation at 12.04pm
• runBy, e.g. BLAST @ NCBI
• run for
User can add templates to each workflow process to determine links between data items.
[Screenshot: BLAST report listing]
19747251  AC005089.3  831  Homo sapiens BAC clone CTA-315H11 from 7, complete sequence
15145617  AC073846.6  815  Homo sapiens BAC clone RP11-622P13 from 7, complete sequence
15384807  AL365366.20  46.1  Human DNA sequence from clone RP11-553N16 on chromosome 1, complete sequence
7717376  AL163282.2  44.1  Homo sapiens chromosome 21 segment HS21C082
16304790  AL133523.5  44.1  Human chromosome 14 DNA sequence BAC R-775G15 of library RPCI-11 from chromosome 14 of Homo sapiens (Human), complete sequence
34367431  BX648272.1  44.1  Homo sapiens mRNA; cDNA DKFZp686G08119 (from clone DKFZp686G08119)
5629923  AC007298.17  44.1  Homo sapiens 12q22 BAC RPCI11-256L6 (Roswell Park Cancer Institute Human BAC Library) complete sequence
34533695  AK126986.1  44.1  Homo sapiens cDNA FLJ45040 fis, clone BRAWH3020486
20377057  AC069363.10  44.1  Homo sapiens chromosome 17, clone RP11-104J23, complete sequence
4191263  AL031674.1  44.1  Human DNA sequence from clone RP4-715N11 on chromosome 20q13.1-13.2 Contains two putative novel genes, ESTs, STSs and GSSs, complete sequence
17977487  AC093690.5  44.1  Homo sapiens BAC clone RP11-731I19 from 2, complete sequence
17048246  AC012568.7  44.1  Homo sapiens chromosome 15, clone RP11-342M21, complete sequence
14485328  AL355339.7  44.1  Human DNA sequence from clone RP11-461K13 on chromosome 10, complete sequence
5757554  AC007074.2  44.1  Homo sapiens PAC clone RP3-368G6 from X, complete sequence
4176355  AC005509.1  44.1  Homo sapiens chromosome 4 clone B200N5 map 4q25, complete sequence
2829108  AF042090.1  44.1  Homo sapiens chromosome 21q22.3 PAC 171F15, complete sequence
>gi|19747251|gb|AC005089.3| Homo sapiens BAC clone CTA-315H11 from 7, complete sequence
AAGCTTTTCTGGCACTGTTTCCTTCTTCCTGATAACCAGAGAAGGAAAAGATCTCCATTTTACAGATGAGGAAACAGGCTCAGAGAGGTCAAGGCTCTGGCTCAAGGTCACACAGCCTGGGAACGGCAAAGCTGATATTCAAACCCAAGCATCTTGGCTCCAAAGCCCTGGTTTCTGTTCCCACTACTGTCAGTGACCTTGGCAAGCCCTGTCCTCCTCCGGGCTTCACTCTGCACACCTGTAACCTGGGGTTAAATGGGCTCACCTGGACTGTTGAGCG
[Diagram: RDF graph around the BLAST report]
Data items: urn:lsid:taverna:datathing:15 (rdf:type ..BLAST_Report); urn:lsid:taverna:datathing:13 (rdf:type ..nucleotide_sequence); A; B
Related classes of information: service invocation; workflow invocation; workflow definition; experiment definition; project; person; group; organisation; service description
Properties: ..similar_sequences_to; ..created_by; ..described_by; ..run_during; ..invocation_of; ..part_of; ..works_for; ..author; ..run_for; ..masked_sequence_of; ..filtered_version_of
Relationship BLAST report has with other items in the repository
Other classes of information related to BLAST report
Provenance tracking
• Automated generation of this web of links
• Workflow enactor generates
– LSIDs
– Data derivation links
– Knowledge links
– Process links
– Organisation links
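The list above can be pictured as a recorder that an enactor calls each time a service is invoked. The sketch below is a hedged illustration, not the Taverna or FreeFluo implementation: the `ProvenanceRecorder` class, the example.org authority and the example.org property namespace are all assumptions, with property names borrowed loosely from the diagram labels (`created_by`, `run_during`, `invocation_of`, `derived_from`).

```python
import itertools

NS = "http://example.org/provenance#"  # hypothetical property namespace


class ProvenanceRecorder:
    """Mints identifiers and records link triples as a workflow runs."""

    def __init__(self, authority="example.org"):  # hypothetical LSID authority
        self._authority = authority
        self._counter = itertools.count(1)
        self.triples = []

    def mint(self, namespace):
        """Create a fresh LSID-shaped identifier in the given namespace."""
        return f"urn:lsid:{self._authority}:{namespace}:{next(self._counter)}"

    def record_invocation(self, service_name, input_ids, workflow_run):
        """Record one service invocation and return the id minted for its output."""
        invocation = self.mint("invocation")
        output = self.mint("datathing")
        # Process links: what was invoked, and during which workflow run.
        self.triples.append((invocation, NS + "invocation_of", service_name))
        self.triples.append((invocation, NS + "run_during", workflow_run))
        # Data derivation links: where the output came from.
        self.triples.append((output, NS + "created_by", invocation))
        for input_id in input_ids:
            self.triples.append((output, NS + "derived_from", input_id))
        return output
```

The recorder does not call any service itself; it only captures the links, so a real enactor would invoke the service and then pass the identifiers involved to `record_invocation`.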
Haystack (IBM/MIT)
[Screenshot callouts: GenBank record; Portion of the Web of provenance; Managing collection of sequences for review]
Reflections
• Visualisation of results usually domain specific
• Provenance browsing and querying needs to fit with that visualisation
• Generic graphical presentation limited to small, low complexity result sets
• Layered provenance for different purposes and different stakeholders
– Detailed process for debugging and usage statistics for QoS
– Data and Knowledge for the Scientist
• Migration with data objects
• Versioning
• Using provenance to its maximum potential
Map of Context
[Diagram labels: Literature relevant to provenance study or data in this workflow; OWL Ontologies mapping between objects; Experiment Notes; Interlinking graph of the workflow that generates the provenance logs; Web page of people who have interests related to those of the workflow owner; Provenance record of a workflow run]
[Formats and identifiers shown: RDF; HTML; XML; PDF; LSID; URI]
Provenance metadata
• Outside objects
– RDF store
• Within objects
– LSID metadata.
[Diagram labels: LSID metadata; URI; LSID]
Linked Provenance Resources
The subsumed concepts
The subsuming concepts
Link to the log annotated with more general concept
Link to the log annotated with more specific concept
Generating Links
The generated link to related provenance document
The concept
The name of the data
Semantics
• Ontology-aided workflow construction
• RDF-based service and data registries
• RDF-based metadata for ALL experimental components
• RDF-based provenance graphs
• OWL based controlled vocabularies for database content
• OWL based integration of experiment entities
• RDF-based semantic mark up of results, logs, notes, data entries
Standards
• By tapping into (de facto) standards (LSID, RDF, WS-I) and communities we can leverage others’ results and tools
– Haystack, Pedro, Jena, CHEF/Sakai.
• The Grid standards are confusing and volatile
– The choice of vanilla Web Services was good.
– We didn’t jump to OGSI. We won’t jump to WSRF until it’s necessary.
• And workflow standards have been untimely.
Role of Ontologies
[Diagram: Ontologies at the centre, surrounded by the roles below]
• Composing and validating workflows and service compositions & negotiations
• Describing & Linking Provenance records
• Change & event Notification topics
• Resource annotations
• Service & resource registration & discovery
• Schema mediation
• Controlling contents of metadata and data
• Knowledge-based guidance and recommendation
• Service matching and provisioning
• Help
Part 4: Semantics and Metadata
• Semantic publication and discovery
• Provenance metadata
• Semantic Web and the Grid
A pioneer of the…
The Semantic Grid is an extension of the current Grid in which information and services are given well-defined and explicitly represented meaning, better enabling computers and people to work in cooperation.
Semantics in and on the Grid
The semantics of knowledge
• Semantic Grids
– Grids and Grid middleware that make use of semantics for their installation, deployment, running etc.
– I.e. Semantics IN the Grid FOR the Grid.
• Knowledge Grids
– A virtual knowledge base derived by using the Grid resources, in the same spirit as a data grid is a virtual data resource and a compute grid a virtual computer. Knowledge Grids include services for knowledge mining.
– I.e. Semantics ON the Grid arising from the USE of the Grid.
Sources of Knowledge
[Diagram labels: Knowledge Stakeholders (Computer Scientists; Scientists; Service Providers); Scientific Applications; Grid Middleware; Grid platform and resources; Security policies; standards; Knowledge for the Grid Application; Semantics for the Grid]
“The Semantic Web is an extension of the current Web in which information is given well-defined meaning, better enabling computers and people to work in cooperation. It is based on the idea of having data on the Web defined and linked such that it can be used for more effective discovery, automation, integration, and reuse across various applications.”
Hendler, J., Berners-Lee, T., and Miller, E. Integrating Applications on the Semantic Web, 2002,
http://www.w3.org/2002/07/swint.
Big Vision
The Web today is:
• A hypermedia digital library
– Collection of linked web pages
• Ubiquitous interface to applications
– Amazon.com
• A platform for multimedia
– BBC Radio 4 in my room!
• A naming scheme
– Unique identity for resources
A place where people do the work, filtering, linking and interpreting. Computers do the presentation.
Why not make the computers do the work?
From machine readable resources for humans to computable resources for machines
[Diagram: RDF graph with nodes http://www.amia.org/meetings/..., http://www.marriott.com/epp/..., http://www.amia.org/about/, Washington and USA; classes conference, event, hotel, city, country and period; properties hasvenue, haslocation, locatedin, organisedby and dates]
• Expose the meaning of resources by assertions in a common data model…
• Publish and share consensually agreed ontologies so we can share the metadata and add in background knowledge
• Then we can query, filter, integrate and aggregate the metadata …
• and reason over it to infer more metadata using rules …
• and attribute trust to the metadata.
Infrastructure enablers for e-Research
Grid Computing
• On demand transparently constructed multi-organisational federations of distributed services
• Distributed computing middleware
• Computational Integration
• Sharing Resources
Semantic Web
• An automatically processable, machine understandable web
• Distributed knowledge and information management
• Information integration
• Sharing information
Semantic Web layers
[Layer diagram labels: Web; Deep web; Metadata; Annotation; Ontologies; Rules (p -> a; p=a); Trust; Agents; Search engines and filters; Applications]
Semantic Grid layers
[Layer diagram labels: Grid Services; Grid State; Metadata; Annotation; Ontologies; Rules (p -> a; p=a); Trust; Agents; Search engines and filters; Applications]
Languages
OWL, RDF, XML
We are here!
RDF in a nutshell
• Resource Description Framework
• W3C Recommendation (http://www.w3.org/RDF)
• Graphical formalism ( + XML syntax + semantics)
– for representing metadata
– for describing the semantics of information in a machine-accessible way
• RDFS extends RDF with “schema vocabulary”, e.g.:
– Class, Property
– type, subClassOf, subPropertyOf
– range, domain
• Statements are <subject, predicate, object> triples: <Ian, hasColleague, Uli>
• Statements describe properties of resources
• A resource is any object that can be pointed to by a URI
• Properties themselves are also resources (URIs)
[Diagram: node Ian linked to node Uli by an arc labelled hasColleague]
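To make the triple model concrete, the sketch below stores the `<Ian, hasColleague, Uli>` statement in a plain Python set, queries it by pattern, and applies the RDFS `domain` and `range` rules to infer types. It is a toy under stated assumptions: real RDF would use URIs rather than bare names, the `Person` class and the `domain`/`range` statements are invented for illustration, and a toolkit such as Jena would normally do this work.

```python
# Triples as (subject, predicate, object); bare names stand in for URIs.
TRIPLES = {
    ("Ian", "hasColleague", "Uli"),          # statement from the slide
    ("hasColleague", "domain", "Person"),    # hypothetical schema statement
    ("hasColleague", "range", "Person"),     # hypothetical schema statement
}


def match(triples, s=None, p=None, o=None):
    """Return the triples that agree with every position that is not None."""
    return sorted(
        t for t in triples
        if (s is None or t[0] == s)
        and (p is None or t[1] == p)
        and (o is None or t[2] == o)
    )


def infer_types(triples):
    """Apply the RDFS domain and range rules once and return the new type triples."""
    inferred = set()
    for prop, _, cls in match(triples, p="domain"):
        for subj, _, _ in match(triples, p=prop):
            inferred.add((subj, "type", cls))
    for prop, _, cls in match(triples, p="range"):
        for _, _, obj in match(triples, p=prop):
            inferred.add((obj, "type", cls))
    return inferred - triples
```

Here `infer_types(TRIPLES)` adds that both Ian and Uli are of type `Person`, which is the kind of extra metadata a schema vocabulary makes available without anyone asserting it directly.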
RDF in a nutshell
• Common model for metadata
• A graph of triples
• Query over and link together
• RDQL, repositories, integration tools, presentation tools
• Jena, Haystack
http://www.w3.org/RDF/
Connected by concepts
http://www.w3.org/2003/Talks/0521-www-keynote-tbl/slide22-0.html
Tim Berners-Lee, 2003
W3C Web Ontology language OWL
• The Ontology Language du jour
• Continuum of expressivity
– Concepts, roles, individuals, axioms
– From simple frames to description logics
– Sound and complete formal semantics
• Supports reasoning to infer classification
– Based on the SHIQ description logic
• Eas(ier) to extend and evolve and merge ontologies
• Known in the Bioinformatics world e.g. OBO
• Layered on top of RDF
• Tools, tools, tools.
http://www.w3.org/TR/2004/REC-owl-features-20040210/
Coupling Semantic Web and e-Science/Grid
• Expose the meaning of Grid services, resources and entities by assertions in a common data model … RDF
• Publish and share consensually agreed ontologies so we can share the metadata and add in background knowledge … RDF(S), OWL
• Then we can query, filter, integrate and aggregate the metadata … RDQL
• and reason over it to infer more metadata using rules … DL Reasoning, SWRL
• and attribute trust to the metadata.
[Diagram labels: Grid services; Semantic Web Services; Web Services; Grid; Semantic Web; Semantic Grid; Semantics for the Grid; Grid services for Semantic Web; Plumbers; Developers; Engineers; Aesthetics; Theoreticians]