Scientific Lenses over Linked Data An approach to support multiple integrated views
![Page 1: Scientific Lenses over Linked Data An approach to support multiple integrated views](https://reader033.fdocuments.us/reader033/viewer/2022060121/559419791a28ab730d8b4628/html5/thumbnails/1.jpg)
Scientific Lenses over Linked Data: An approach to support multiple integrated views
Alasdair J G Gray
alasdairjggray.co.uk
@gray_alasdair
![Page 2: Scientific Lenses over Linked Data An approach to support multiple integrated views](https://reader033.fdocuments.us/reader033/viewer/2022060121/559419791a28ab730d8b4628/html5/thumbnails/2.jpg)
Open PHACTS Use Case
“Let me compare MW, logP
and PSA for launched
inhibitors of human &
mouse oxidoreductases”
Chemical Properties (Chemspider)
Launched drugs (Drugbank)
Human => Mouse (Homologene)
Protein Families (Enzyme)
Bioactivity Data (ChEMBL)
… other info (Uniprot/Entrez etc.)
16 October 2014 Scientific Lenses – A. J. G. Gray 1
![Page 3: Scientific Lenses over Linked Data An approach to support multiple integrated views](https://reader033.fdocuments.us/reader033/viewer/2022060121/559419791a28ab730d8b4628/html5/thumbnails/3.jpg)
Discovery Platform
Drug Discovery Platform
Apps
Domain API
Interactive
responses
Production quality
integration platform
Method
Calls
![Page 4: Scientific Lenses over Linked Data An approach to support multiple integrated views](https://reader033.fdocuments.us/reader033/viewer/2022060121/559419791a28ab730d8b4628/html5/thumbnails/4.jpg)
App Ecosystem: An “App Store”?
http://www.openphactsfoundation.org/apps.html
Explorer Explorer2 ChemBioNavigator Target Dossier Pharmatrek Helium
MOE Collector Cytophacts Utopia Garfield SciBite
KNIME Mol. Data Sheets PipelinePilot scinav.it Taverna
![Page 5: Scientific Lenses over Linked Data An approach to support multiple integrated views](https://reader033.fdocuments.us/reader033/viewer/2022060121/559419791a28ab730d8b4628/html5/thumbnails/5.jpg)
API Hits
April 2013 – March 2014: 15.8m
April 2014 – Sept 2014: 14m
Total: 29.8 million
![Page 6: Scientific Lenses over Linked Data An approach to support multiple integrated views](https://reader033.fdocuments.us/reader033/viewer/2022060121/559419791a28ab730d8b4628/html5/thumbnails/6.jpg)
Linked Data API
Drug
Disease (1.4)
Pathway
Target
https://dev.openphacts.org/
![Page 7: Scientific Lenses over Linked Data An approach to support multiple integrated views](https://reader033.fdocuments.us/reader033/viewer/2022060121/559419791a28ab730d8b4628/html5/thumbnails/7.jpg)
Open PHACTS Data
| Source | Initial Records | Triples | Properties |
|---|---|---|---|
| ChEMBL | 1,481,473 | 304,360,749 | 77 |
| DrugBank | 19,628 | 517,584 | 74 |
| UniProt | 564,246 | 405,473,138 | 82 |
| ENZYME | 6,187 | 73,838 | 2 |
| ChEBI | 40,575 | 1,673,863 | 2 |
| GeneOntology | 38,137 | 2,447,682 | 26 |
| GOA | 661,232 | 1,765,622,393 | 15 |
| ChemSpider | 1,361,568 | 215,193,441 | 23 |
| ConceptWiki | 2,828,966 | 4,291,131 | 1 |
| WikiPathways | 946 | 1,949,074 | 34 |
![Page 8: Scientific Lenses over Linked Data An approach to support multiple integrated views](https://reader033.fdocuments.us/reader033/viewer/2022060121/559419791a28ab730d8b4628/html5/thumbnails/8.jpg)
14 January 2013 OPS Dataset Descriptions – A. J. G. Gray 7
Dataset Descriptions in the Open Pharmacological Space
Being replaced by W3C
HCLS community profile
http://tiny.cc/hcls-datadesc-ed
![Page 9: Scientific Lenses over Linked Data An approach to support multiple integrated views](https://reader033.fdocuments.us/reader033/viewer/2022060121/559419791a28ab730d8b4628/html5/thumbnails/9.jpg)
OPS Discovery Platform
[Architecture diagram. Components shown:]
Apps
Linked Data API (RDF/XML, TTL, JSON)
Domain Specific Services
Semantic Workflow Engine
Core Platform
• Identity Resolution Service
• Chemistry Registration, Normalisation & Q/C
• Identifier Management Service
• Indexing
Data Cache (Virtuoso Triple Store)
Data sources: Public Content, Commercial, Public Ontologies, User Annotations
Source formats shown: Nanopub, Db, VoID
Example identifiers: P12374, EC2.43.4, CS4532, “Adenosine receptor 2a”
![Page 10: Scientific Lenses over Linked Data An approach to support multiple integrated views](https://reader033.fdocuments.us/reader033/viewer/2022060121/559419791a28ab730d8b4628/html5/thumbnails/10.jpg)
Multiple Identities
P12047
X31045
GB:29384
Are these the same thing?
Andy Law's Third Law
“The number of unique identifiers assigned to an individual is never less than the number of Institutions involved in the study”
http://bioinformatics.roslin.ac.uk/lawslaws/
![Page 11: Scientific Lenses over Linked Data An approach to support multiple integrated views](https://reader033.fdocuments.us/reader033/viewer/2022060121/559419791a28ab730d8b4628/html5/thumbnails/11.jpg)
Gleevec®: Imatinib Mesylate
Drugbank ChemSpider PubChem
Imatinib
Mesylate
Imatinib Mesylate
YLMAHDNUQAMNNX-UHFFFAOYSA-N
![Page 12: Scientific Lenses over Linked Data An approach to support multiple integrated views](https://reader033.fdocuments.us/reader033/viewer/2022060121/559419791a28ab730d8b4628/html5/thumbnails/12.jpg)
Are these records the same?
It depends upon your task!
![Page 13: Scientific Lenses over Linked Data An approach to support multiple integrated views](https://reader033.fdocuments.us/reader033/viewer/2022060121/559419791a28ab730d8b4628/html5/thumbnails/13.jpg)
BRCA1: Chromosome 17
Breast cancer type 1 susceptibility protein
http://en.wikipedia.org/wiki/File:Protein_BRCA1_PDB_1jm7.png
http://en.wikipedia.org/wiki/File:BRCA1_en.png
Genes == Proteins?
![Page 14: Scientific Lenses over Linked Data An approach to support multiple integrated views](https://reader033.fdocuments.us/reader033/viewer/2022060121/559419791a28ab730d8b4628/html5/thumbnails/14.jpg)
Are these records the same?
It depends upon your task!
![Page 15: Scientific Lenses over Linked Data An approach to support multiple integrated views](https://reader033.fdocuments.us/reader033/viewer/2022060121/559419791a28ab730d8b4628/html5/thumbnails/15.jpg)
Example Use Cases
I need to perform an analysis, give me details of the active compound in Gleevec.
Which targets are known to interact with Gleevec?
![Page 16: Scientific Lenses over Linked Data An approach to support multiple integrated views](https://reader033.fdocuments.us/reader033/viewer/2022060121/559419791a28ab730d8b4628/html5/thumbnails/16.jpg)
Structure Lens
I need to perform an analysis, give me details of the active compound in Gleevec.
skos:exactMatch (InChI)
Strict Relaxed
Analysing Browsing
![Page 17: Scientific Lenses over Linked Data An approach to support multiple integrated views](https://reader033.fdocuments.us/reader033/viewer/2022060121/559419791a28ab730d8b4628/html5/thumbnails/17.jpg)
Name Lens
Which targets are known to interact with Gleevec?
skos:closeMatch (Drug Name)
skos:closeMatch (Drug Name)
skos:exactMatch (InChI)
Strict Relaxed
Analysing Browsing
![Page 18: Scientific Lenses over Linked Data An approach to support multiple integrated views](https://reader033.fdocuments.us/reader033/viewer/2022060121/559419791a28ab730d8b4628/html5/thumbnails/18.jpg)
What is a Scientific Lens?
A lens defines a conceptual view over the data
Specifies operational equivalence conditions
Consists of:
Identifier (URI)
Title (dct:title)
Description (dct:description)
Documentation link (dcat:landingPage)
Creator (pav:createdBy)
Timestamp (pav:createdOn)
Equivalence rules (bdb:linksetJustification)
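The fields listed above could be held in a small record type. The following is a minimal sketch in Python, not the Open PHACTS implementation: the class name, the `accepts` helper, and every `example.org` URI are hypothetical placeholders chosen for illustration.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass(frozen=True)
class Lens:
    """Metadata for one lens, mirroring the fields listed on the slide."""
    uri: str               # Identifier (URI)
    title: str             # dct:title
    description: str       # dct:description
    landing_page: str      # dcat:landingPage
    created_by: str        # pav:createdBy
    created_on: datetime   # pav:createdOn
    # bdb:linksetJustification values this lens treats as equivalences
    justifications: frozenset = field(default_factory=frozenset)

    def accepts(self, justification: str) -> bool:
        """True if links carrying this justification count as equivalences."""
        return justification in self.justifications


# Hypothetical lens: all URIs below are placeholders, not real identifiers.
structure_lens = Lens(
    uri="http://example.org/lens/structure",
    title="Structure Lens",
    description="Treat records as equivalent only when their structures match.",
    landing_page="http://example.org/docs/structure-lens",
    created_by="http://example.org/people/curator",
    created_on=datetime(2014, 10, 16, tzinfo=timezone.utc),
    justifications=frozenset({"http://example.org/justification/inchi-match"}),
)
```

Keeping the justifications as a set makes the lens a simple membership test: a stricter lens accepts fewer justifications, a more relaxed one accepts more.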
![Page 19: Scientific Lenses over Linked Data An approach to support multiple integrated views](https://reader033.fdocuments.us/reader033/viewer/2022060121/559419791a28ab730d8b4628/html5/thumbnails/19.jpg)
Lens Effects: Ibuprofen
CHEMBL427526
CHEMBL521
CHEMBL175
Ibuprofen consists of two equally active stereoisomers.
• Stereoisomers not always represented in data
Users wish to retrieve information for any stereoisomer.
![Page 20: Scientific Lenses over Linked Data An approach to support multiple integrated views](https://reader033.fdocuments.us/reader033/viewer/2022060121/559419791a28ab730d8b4628/html5/thumbnails/20.jpg)
Default Lens
![Page 21: Scientific Lenses over Linked Data An approach to support multiple integrated views](https://reader033.fdocuments.us/reader033/viewer/2022060121/559419791a28ab730d8b4628/html5/thumbnails/21.jpg)
Stereoisomer Lens
![Page 22: Scientific Lenses over Linked Data An approach to support multiple integrated views](https://reader033.fdocuments.us/reader033/viewer/2022060121/559419791a28ab730d8b4628/html5/thumbnails/22.jpg)
Mapping Generation
ops:OPS437281
✔
ops:OPS380297
has_stereoundefined_parent[ci:CHEMINF_000456]
ops:OPS380292
is_stereoisomer_of[ci:CHEMINF_000461]
Other relationships
• has part
• is tautomer of
• uncharged counterpart
• isotope
…
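One way to picture how typed relationships like these feed a lens is to tag each link with its relationship and let the lens decide which tags count as equivalence. This is a sketch under stated assumptions: the `ex:` identifiers, the `same_structure` tag, and the two lens definitions are hypothetical; only the relationship names `has_stereoundefined_parent` and `is_stereoisomer_of` come from the slide.

```python
from collections import namedtuple

# A link between two records, tagged with why it was asserted.
Link = namedtuple("Link", ["source", "target", "justification"])

# Hypothetical links; the "ex:" identifiers are placeholders.
LINKS = [
    Link("ex:ibuprofen", "ex:ibuprofen-db", "same_structure"),
    Link("ex:S-ibuprofen", "ex:ibuprofen", "has_stereoundefined_parent"),
    Link("ex:R-ibuprofen", "ex:ibuprofen", "has_stereoundefined_parent"),
    Link("ex:S-ibuprofen", "ex:R-ibuprofen", "is_stereoisomer_of"),
]

# Each lens is modelled as the set of justifications it accepts (an assumption).
DEFAULT_LENS = {"same_structure"}
STEREOISOMER_LENS = {
    "same_structure",
    "has_stereoundefined_parent",
    "is_stereoisomer_of",
}


def equivalents(identifier, lens, links=LINKS):
    """Return records directly linked to `identifier` by an accepted justification."""
    found = set()
    for link in links:
        if link.justification not in lens:
            continue  # this lens does not treat the relationship as equivalence
        if link.source == identifier:
            found.add(link.target)
        elif link.target == identifier:
            found.add(link.source)
    return found
```

With these placeholder links, `equivalents("ex:ibuprofen", DEFAULT_LENS)` only reaches the same-structure record, while the stereoisomer lens also reaches both stereoisomer records.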
![Page 23: Scientific Lenses over Linked Data An approach to support multiple integrated views](https://reader033.fdocuments.us/reader033/viewer/2022060121/559419791a28ab730d8b4628/html5/thumbnails/23.jpg)
Initial Connectivity
Datasets 37
Linksets 104
Links 7,096,712
Justifications 7
![Page 24: Scientific Lenses over Linked Data An approach to support multiple integrated views](https://reader033.fdocuments.us/reader033/viewer/2022060121/559419791a28ab730d8b4628/html5/thumbnails/24.jpg)
Compound Information
![Page 25: Scientific Lenses over Linked Data An approach to support multiple integrated views](https://reader033.fdocuments.us/reader033/viewer/2022060121/559419791a28ab730d8b4628/html5/thumbnails/25.jpg)
Proceed with Caution!
![Page 26: Scientific Lenses over Linked Data An approach to support multiple integrated views](https://reader033.fdocuments.us/reader033/viewer/2022060121/559419791a28ab730d8b4628/html5/thumbnails/26.jpg)
Co-reference Computation
Rules ensure
• Unrestricted transitivity within conceptual type
• Restricted crossing of conceptual types
Based on justifications
Provenance captured
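The two rules can be sketched as a graph search that follows same-type links freely but limits how often a path may cross between conceptual types. The limit of one crossing, the function name, and the `ex:` identifiers are assumptions made for illustration; the slide does not state the exact restriction used.

```python
from collections import deque


def co_referents(start, links, type_of, max_crossings=1):
    """Find records reachable from `start` with at most `max_crossings` type changes."""
    # Treat links as undirected edges.
    adjacency = {}
    for a, b in links:
        adjacency.setdefault(a, set()).add(b)
        adjacency.setdefault(b, set()).add(a)

    fewest = {start: 0}  # fewest type crossings needed to reach each record
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for neighbour in adjacency.get(node, ()):
            # A crossing costs 1; staying within the same type costs 0.
            cost = fewest[node] + (type_of[node] != type_of[neighbour])
            if cost > max_crossings:
                continue
            if neighbour not in fewest or cost < fewest[neighbour]:
                fewest[neighbour] = cost
                queue.append(neighbour)
    return set(fewest) - {start}


# Hypothetical records and links.
TYPES = {
    "ex:cs1": "compound", "ex:chembl1": "compound", "ex:db1": "compound",
    "ex:gene1": "gene", "ex:prot1": "protein", "ex:gene2": "gene",
}
EDGES = [
    ("ex:cs1", "ex:chembl1"), ("ex:chembl1", "ex:db1"),
    ("ex:gene1", "ex:prot1"), ("ex:prot1", "ex:gene2"),
]
```

Here the three compound records are all co-referent through chaining, whereas `ex:gene1` reaches `ex:prot1` but not `ex:gene2`, because that path would cross a type boundary twice.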
![Page 27: Scientific Lenses over Linked Data An approach to support multiple integrated views](https://reader033.fdocuments.us/reader033/viewer/2022060121/559419791a28ab730d8b4628/html5/thumbnails/27.jpg)
Initial Connectivity
Datasets 37
Linksets 104
Links 7,096,712
Justifications 7
![Page 28: Scientific Lenses over Linked Data An approach to support multiple integrated views](https://reader033.fdocuments.us/reader033/viewer/2022060121/559419791a28ab730d8b4628/html5/thumbnails/28.jpg)
Inferred Connectivity
Datasets 37
Linksets 883
Links 17,383,846
Justifications 7
![Page 29: Scientific Lenses over Linked Data An approach to support multiple integrated views](https://reader033.fdocuments.us/reader033/viewer/2022060121/559419791a28ab730d8b4628/html5/thumbnails/29.jpg)
BridgeDb
![Page 30: Scientific Lenses over Linked Data An approach to support multiple integrated views](https://reader033.fdocuments.us/reader033/viewer/2022060121/559419791a28ab730d8b4628/html5/thumbnails/30.jpg)
Lenses: Under the hood
Input query Q, with lens L1:
cw:979b545d-f9a9 cheminf:logd ?logd .
The Query Expander Service sends (cw:979b545d-f9a9, L1) to the Identity Mapping Service (BridgeDB), which consults Profiles and Mappings and returns:
[cw:979b545d-f9a9,
 cs:2157,
 chembl:1280,
 db:db00945]
Expanded query Q’:
GRAPH <http://rdf.chemspider.com> {
  ?iri cheminf:logd ?logd .
  FILTER (?iri = cw:979b545d-f9a9 ||
          ?iri = cs:2157 ||
          ?iri = chembl:1280 ||
          ?iri = db:db00945 )
}
GRAPH <http://…
• Can also be achieved through UNION
• IMS call adds overhead
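The FILTER-based expansion can be sketched as a small string rewrite. This is an illustration only: the function names are hypothetical, the mapping service is replaced by a local stub that returns the four IRIs shown on the slide, and prefix declarations are left out.

```python
def stub_identity_mapping_service(iri, lens):
    """Stand-in for the IMS call; returns the co-referent IRIs shown on the slide."""
    mappings = {
        ("cw:979b545d-f9a9", "L1"): [
            "cw:979b545d-f9a9", "cs:2157", "chembl:1280", "db:db00945",
        ],
    }
    # Unknown IRIs map only to themselves.
    return mappings.get((iri, lens), [iri])


def expand_with_filter(iri, predicate, object_var, lens,
                       lookup=stub_identity_mapping_service):
    """Replace a fixed subject IRI with a variable constrained by FILTER."""
    equivalent_iris = lookup(iri, lens)
    conditions = " ||\n        ".join(f"?iri = {e}" for e in equivalent_iris)
    return f"?iri {predicate} ?{object_var} .\nFILTER ({conditions})"


expanded = expand_with_filter("cw:979b545d-f9a9", "cheminf:logd", "logd", "L1")
print(expanded)
```

Passing the lookup as a parameter keeps the rewrite separate from the service call, which is where the slide locates the extra overhead.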
![Page 31: Scientific Lenses over Linked Data An approach to support multiple integrated views](https://reader033.fdocuments.us/reader033/viewer/2022060121/559419791a28ab730d8b4628/html5/thumbnails/31.jpg)
Experiment
Is it feasible to use a stand-off mapping service?
Baselines (no external call):
• “Perfect” URIs
• Linked data querying
Expansion approaches (external service call):
• FILTER by Graph
• UNION by Graph
C. Y. A. Brenninkmeijer, C. A. Goble, A. J. G. Gray, P. T. Groth, A. Loizou, S.
Pettifer: Including Co-referent URIs in a SPARQL Query. COLD 2013.
http://ceur-ws.org/Vol-1034/BrenninkmeijerEtAl_COLD2013.pdf
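For the UNION by Graph approach, a rough sketch is to emit one graph-scoped branch per co-referent IRI. The function name is hypothetical, the abbreviated graph names `chemspider` and `chembl` follow the baseline queries in this deck, and applying the same predicate in every branch is a simplifying assumption.

```python
def expand_with_union(predicate, object_var, iri_by_graph):
    """Build one GRAPH-scoped branch per co-referent IRI and join them with UNION."""
    branches = [
        f"{{ GRAPH <{graph}> {{ {iri} {predicate} ?{object_var} . }} }}"
        for graph, iri in iri_by_graph
    ]
    return "\nUNION\n".join(branches)


pattern = expand_with_union(
    "cheminf:logp",
    "logp",
    [("chemspider", "cs:2157"), ("chembl", "chembl_mol:m1280")],
)
print(pattern)
```

Each branch names its IRI directly, so no FILTER is needed; the trade-off is that the query text grows with the number of co-referent IRIs.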
![Page 32: Scientific Lenses over Linked Data An approach to support multiple integrated views](https://reader033.fdocuments.us/reader033/viewer/2022060121/559419791a28ab730d8b4628/html5/thumbnails/32.jpg)
“Perfect” URI Baseline
WHERE {
  GRAPH <chemspider> {
    cs:2157 cheminf:logp ?logp .
  }
  GRAPH <chembl> {
    chembl_mol:m1280 cheminf:mw ?mw .
  }
}
![Page 33: Scientific Lenses over Linked Data An approach to support multiple integrated views](https://reader033.fdocuments.us/reader033/viewer/2022060121/559419791a28ab730d8b4628/html5/thumbnails/33.jpg)
Linked Data Baseline
WHERE {
  GRAPH <chemspider> {
    cs:2157 cheminf:logp ?logp .
  }
  GRAPH <chembl> {
    ?chemblid cheminf:mw ?mw .
  }
  cs:2157 skos:exactMatch ?chemblid .
}
![Page 34: Scientific Lenses over Linked Data An approach to support multiple integrated views](https://reader033.fdocuments.us/reader033/viewer/2022060121/559419791a28ab730d8b4628/html5/thumbnails/34.jpg)
Queries
Drawn from Open PHACTS API:
1. Simple compound information (1)
2. Compound information (1)
3. Compound pharmacology (M)
4. Simple target information (1)
5. Target information (1)
6. Target pharmacology (M)
![Page 35: Scientific Lenses over Linked Data An approach to support multiple integrated views](https://reader033.fdocuments.us/reader033/viewer/2022060121/559419791a28ab730d8b4628/html5/thumbnails/35.jpg)
![Page 36: Scientific Lenses over Linked Data An approach to support multiple integrated views](https://reader033.fdocuments.us/reader033/viewer/2022060121/559419791a28ab730d8b4628/html5/thumbnails/36.jpg)
Experiment Data
Data: 167,783,592 triples
Mappings: 2,114,584 triples
Lenses: 1
![Page 37: Scientific Lenses over Linked Data An approach to support multiple integrated views](https://reader033.fdocuments.us/reader033/viewer/2022060121/559419791a28ab730d8b4628/html5/thumbnails/37.jpg)
Average execution times
![Page 38: Scientific Lenses over Linked Data An approach to support multiple integrated views](https://reader033.fdocuments.us/reader033/viewer/2022060121/559419791a28ab730d8b4628/html5/thumbnails/38.jpg)
Average execution times
![Page 39: Scientific Lenses over Linked Data An approach to support multiple integrated views](https://reader033.fdocuments.us/reader033/viewer/2022060121/559419791a28ab730d8b4628/html5/thumbnails/39.jpg)
Q6: Target Pharmacology
![Page 40: Scientific Lenses over Linked Data An approach to support multiple integrated views](https://reader033.fdocuments.us/reader033/viewer/2022060121/559419791a28ab730d8b4628/html5/thumbnails/40.jpg)
Explorer Screenshot
![Page 41: Scientific Lenses over Linked Data An approach to support multiple integrated views](https://reader033.fdocuments.us/reader033/viewer/2022060121/559419791a28ab730d8b4628/html5/thumbnails/41.jpg)
![Page 42: Scientific Lenses over Linked Data An approach to support multiple integrated views](https://reader033.fdocuments.us/reader033/viewer/2022060121/559419791a28ab730d8b4628/html5/thumbnails/42.jpg)
Conclusions
Scientific data is complex and messy
Requires flexibility in linking
Equivalence depends upon context
Lenses provide support for operational equivalence
Chemical structures support automatic computing of links with justification
![Page 43: Scientific Lenses over Linked Data An approach to support multiple integrated views](https://reader033.fdocuments.us/reader033/viewer/2022060121/559419791a28ab730d8b4628/html5/thumbnails/43.jpg)
Acknowledgements
Royal Society of Chemistry
Colin Batchelor
Karen Karapetyan
Jon Steele
Valery Tkachenko
Antony Williams
University of Manchester
Christian Brenninkmeijer
Ian Dunlop
Carole Goble
Steve Pettifer
Robert Stevens
Swiss Institute of Bioinformatics
Christine Chichester
European Bioinformatics Institute
Mark Davies
Anna Gaulton
John Overington
University of Vienna
Daniela Digles
Maastricht University
Chris Evelo
Andra Waagmeester
Egon Willighagen
VU University of Amsterdam
Paul Groth
Antonis Loizou
Connected Discovery
Lee Harland
![Page 44: Scientific Lenses over Linked Data An approach to support multiple integrated views](https://reader033.fdocuments.us/reader033/viewer/2022060121/559419791a28ab730d8b4628/html5/thumbnails/44.jpg)
Questions
Alasdair J G Gray
alasdairjggray.co.uk
@gray_alasdair
Open PHACTS
openphacts.org
@open_phacts