Open PHACTS
“Data integration for all”
Andrew Leach
Task, workflow and results
AUREUS search targets: voltage-gated potassium channels
Apply filters (MW, cLogP, Lipinski
+ remove undesirable target) ⇒ ~1000 molecules
Similarity searches (RG, TP, Daylight) Cluster analysis
⇒ ~10000 molecules selected
IonWorks© single shot screening
240 single shot hits progressed into full curve assay
5 full curve actives (in at least one test occasion)
Series for lead optimisation
Stefan Senger, ca. 2004
Task: create a focussed set to identify leads against voltage-gated potassium channels
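The filtering step in the workflow above (MW, cLogP, Lipinski) can be sketched roughly as follows. This is only an illustration, not the workflow used in 2004: the `Molecule` record, its precomputed descriptor fields, the example compounds and the `max_violations` default are all hypothetical; the thresholds are the usual rule-of-five cut-offs.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Molecule:
    """Hypothetical record with descriptors assumed to be precomputed."""
    name: str
    mol_weight: float
    clogp: float
    h_bond_donors: int
    h_bond_acceptors: int


def lipinski_violations(mol):
    """Count how many rule-of-five criteria the molecule breaks."""
    checks = [
        mol.mol_weight > 500,
        mol.clogp > 5,
        mol.h_bond_donors > 5,
        mol.h_bond_acceptors > 10,
    ]
    return sum(checks)


def apply_filters(molecules, max_violations=1):
    """Keep molecules with at most max_violations violations (an assumed policy)."""
    return [m for m in molecules if lipinski_violations(m) <= max_violations]


# Made-up candidates, for illustration only.
candidates = [
    Molecule("cmpd-A", 320.4, 2.1, 2, 5),
    Molecule("cmpd-B", 712.9, 6.3, 6, 12),
    Molecule("cmpd-C", 540.0, 4.8, 1, 7),
]
selected = apply_filters(candidates)
print([m.name for m in selected])
```

A real pipeline would compute the descriptors from structures with a cheminformatics toolkit and add the target-based exclusions mentioned on the slide.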
We (may) know where the data is, but integrating is a pain, bespoke, and often only for experts
Q: Identify all oxidoreductase inhibitors with an activity <100nM in both
mouse and human
Q: The current Factor Xa lead series is characterised by substructure X.
Retrieve all bioactivity data in serine protease assays for molecules
that contain substructure X.
Q: For a given interaction profile, give me compounds similar to it.
ChEMBL, DrugBank, Gene Ontology, WikiPathways, UniProt, ChemSpider, UMLS, ConceptWiki, ChEBI, etc.
Internal
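Once the data is integrated, the first question above (oxidoreductase inhibitors with activity <100nM in both mouse and human) reduces to a simple filter. The sketch below assumes a flat list of already-integrated records; the record layout, field names and values are hypothetical and do not reflect any real database schema.

```python
from collections import defaultdict

# Hypothetical, hand-made records standing in for integrated bioactivity data.
ACTIVITIES = [
    {"compound": "C1", "target_class": "oxidoreductase", "species": "human", "activity_nm": 12.0},
    {"compound": "C1", "target_class": "oxidoreductase", "species": "mouse", "activity_nm": 85.0},
    {"compound": "C2", "target_class": "oxidoreductase", "species": "human", "activity_nm": 40.0},
    {"compound": "C2", "target_class": "oxidoreductase", "species": "mouse", "activity_nm": 450.0},
    {"compound": "C3", "target_class": "serine protease", "species": "human", "activity_nm": 5.0},
    {"compound": "C3", "target_class": "serine protease", "species": "mouse", "activity_nm": 7.0},
]


def potent_in_all_species(records, target_class, species, threshold_nm=100.0):
    """Return compounds with activity below threshold_nm in every listed species."""
    hits = defaultdict(set)
    for rec in records:
        if rec["target_class"] == target_class and rec["activity_nm"] < threshold_nm:
            hits[rec["compound"]].add(rec["species"])
    wanted = set(species)
    # A compound qualifies only if all requested species are among its hits.
    return sorted(c for c, seen in hits.items() if wanted <= seen)


result = potent_in_all_species(ACTIVITIES, "oxidoreductase", ["human", "mouse"])
print(result)
```

The hard part in practice is not this filter but getting records from different sources into one comparable shape, which is the integration problem the slide describes.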
The Innovative Medicines Initiative
Biggest public-private partnership in area of medicine
Collaboration between European Commission and European Federation of Pharmaceutical Industries and Associations (EFPIA)
Promotion of medical innovation in Europe
Tackle key bottlenecks
Recognises “in kind” contributions
Focus on key problems
– Efficacy, Safety, Education & Training, Knowledge Management
Public Domain Drug Discovery Data: Pharma are accessing, processing, storing & re-processing
Literature, PubChem, GenBank, Patents, Databases, Downloads
Data Integration, Data Analysis, Firewalled Databases
Why repeat at each company?
GSK
AZ
Pfizer
Merck
Information Tombs
– Built for primary use-case
– Tailored indexes
– Tailored GUIs
– Unique language & metadata
– Poor interoperability/integration
Literature HR Synthesis Portfolio SAR Docs Safety In vivo Etc
Pfizer Limited – Coordinator
Universität Wien – Managing entity
Technical University of Denmark
University of Hamburg, Center for Bioinformatics
BioSolveIT GmbH
Consorci Mar Parc de Salut de Barcelona
Leiden University Medical Centre
Royal Society of Chemistry
Vrije Universiteit Amsterdam
Spanish National Cancer Research Centre
University of Manchester
Maastricht University
Aqnowledge
University of Santiago de Compostela
Rheinische Friedrich-Wilhelms-Universität Bonn
AstraZeneca
GlaxoSmithKline
Esteve
Novartis
Merck Serono
H. Lundbeck A/S
Eli Lilly
Netherlands Bioinformatics Centre
Swiss Institute of Bioinformatics
ConnectedDiscovery
EMBL-European Bioinformatics Institute
Janssen
OpenLink
Project Partners
A use-case driven approach, focussed on delivery
for the real world
Main architecture, technical implementation and primary
capabilities driven by a set of prioritised research questions
Based on the main research questions define prioritised data
sources
Develop three Exemplars to demonstrate the capabilities of
the Open PHACTS System and to define interfaces and
input/output standards
Work Streams
Build: Service layer and resource integration
Drive: Development of exemplar work packages & Applications
Sustain: Community engagement and long-term sustainability
Assertion & Meta Data Mgmt
Transform / Translate
Integrator
OPS Service Layer
Corpus 1
‘Consumer’
Firewall
Supplier
Firewall
Db 2
Db 3
Db 4
Corpus 5
Std Public Vocabularies
Target Dossier
Compound Dossier
Pharmacological Networks
Business Rules
Work Stream 1: Open Pharmacological Space (OPS) Service Layer
Standardised software layer to allow public DD resource integration
– Define standards and construct OPS service layer
– Develop interface (API) for data access, integration and analysis
– Develop secure access models
Existing Drug Discovery (DD) Resource Integration
Work Stream 2: Exemplar Drug Discovery Informatics tools
Develop exemplar services to test OPS Service Layer
Target Dossier (Data Integration)
Pharmacological Network Navigator (Data Visualisation)
Compound Dossier (Data Analysis)
Platform Explorer
Standards
Apps
API
Number  Sum  Nr of 1  Question
15      12   9        All oxidoreductase inhibitors active <100nM in both human and mouse
18      14   8        Given compound X, what is its predicted secondary pharmacology? What are the on- and off-target safety concerns for a compound? What is the evidence and how reliable is that evidence (journal impact factor, KOL) for findings associated with a compound?
24      13   8        Given a target find me all actives against that target. Find/predict polypharmacology of actives. Determine ADMET profile of actives.
32      13   8        For a given interaction profile, give me compounds similar to it.
37      13   8        The current Factor Xa lead series is characterised by substructure X. Retrieve all bioactivity data in serine protease assays for molecules that contain substructure X.
38      13   8        Retrieve all experimental and clinical data for a given list of compounds defined by their chemical structure (with options to match stereochemistry or not).
41      13   8        A project is considering Protein Kinase C Alpha (PRKCA) as a target. What are all the compounds known to modulate the target directly? What are the compounds that may modulate the target directly? i.e. return all compounds active in assays where the resolution is at least at the level of the target family (i.e. PKC) both from structured assay databases and the literature.
44      13   8        Give me all active compounds on a given target with the relevant assay data
46      13   8        Give me the compound(s) which hit most specifically the multiple targets in a given pathway (disease)
59      14   8        Identify all known protein-protein interaction inhibitors
Prioritised research questions
Kamal Azzaoui et al, DDT in press 2013
Pathways
Pharmacological Activities
Biological Processes
Transcripts
Pathological Processes
Diseases
Genes
Proteins
Interactions
Clinical Drug Applications
Indications
Drugs
Compounds
Chemicals
Open PHACTS will be built upon semantic technologies and
standards, providing an opportunity to:
• Demonstrate that semantic technologies can perform to the same degree
as existing systems
• Provide an open platform to address common drug discovery questions;
expose pharma’s use-cases and knowledge
• Create a pre-competitive infrastructure that can be sustained and
expanded into new areas; providing the platform for future collaboration
Why Semantic Technologies?
• Rapidly developing technology, powerful algorithms for integration and
querying of data
• “schema free”
• Open standards – facilitating sharing public, private, commercial
• A community of developers, leverage work going on elsewhere
Key architecture components
User Interfaces & Applications
Linked Data API
Linked Data Cache
Identity Mapping Service
Identity Resolution Service
Domain Specific Services
Data
[Architecture diagram, Oct. 2012]
Apps: Open PHACTS Explorer; 1st Gen Apps; Partner Apps; App Framework
Core Platform: Linked Data API (RDF/XML, TTL, JSON); Semantic Workflow Engine (LARKC); Data Cache (Triple Store); Domain Specific Services
Identity Resolution Service (ConceptWiki)
Chemistry Normalisation & Q/C, ChemSpider
Identifier Management Service (BridgeDb+)
Example identifiers: P12374; EC2.43.4; CS4532; “Adenosine receptor 2a”
Data Import: VoID, Nanopub, Db
Sources: Public Content; Commercial; Public Ontologies; User Annotations
Building Quality
High quality chemical names and synonyms. Leverage ChemSpider and
ConceptWiki curation, Q/C and mapping
ChemSpider Validation and Standardization Platform (CVSP) for flagging
chemical representation issues
Basic curation interface for editing concept terms available through
ConceptWiki
Data quality issues detected in data sources reported back to depositors for
their evaluation
STANDARD_TYPE  UNIT_COUNT
-------------  ----------
AC50           7
Activity       421
EC50           39
IC50           46
ID50           42
Ki             23
Log IC50       4
Log Ki         7
Potency        11
log IC50       0
STANDARD_TYPE  STANDARD_UNITS   COUNT(*)
-------------  ---------------  --------
IC50           nM               829448
IC50           ug.mL-1          41000
IC50                            38521
IC50           ug/ml            2038
IC50           ug ml-1          509
IC50           mg kg-1          295
IC50           molar ratio      178
IC50           ug               117
IC50           %                113
IC50           uM well-1        52
IC50           p.p.m.           51
IC50           ppm              36
IC50           uM-1             25
IC50           nM kg-1          25
IC50           milliequivalent  22
IC50           kJ m-2           20
~100 units, >5000 types
Implemented using the QUDT (Quantities, Units, Dimensions and Types) ontology (http://www.qudt.org/)
Quantitative Data Challenges
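The unit problem shown in the IC50 table can be illustrated with a small normalisation routine. This sketch does not use QUDT; a hand-written lookup table stands in for it. The function name, the alias table and the choice of nM as the target unit are assumptions made for illustration, and `uM` is included as an extra example unit. Mass concentrations can only be converted when a molecular weight is supplied, and units that are not concentrations are left unconverted rather than guessed.

```python
# Spelling variants (several taken from the table above) mapped to one canonical unit.
UNIT_ALIASES = {
    "nM": "nM",
    "uM": "uM",
    "ug.mL-1": "ug/mL",
    "ug/ml": "ug/mL",
    "ug ml-1": "ug/mL",
}

# Conversion factors from molar units to nM.
NM_PER_UNIT = {"nM": 1.0, "uM": 1000.0}


def to_nanomolar(value, unit, mol_weight=None):
    """Convert an IC50 value to nM, or return None if it cannot be converted.

    mol_weight is in g/mol and is required for mass concentrations.
    """
    canonical = UNIT_ALIASES.get(unit.strip())
    if canonical is None:
        return None  # unknown or non-concentration unit, e.g. "kJ m-2"
    if canonical in NM_PER_UNIT:
        return value * NM_PER_UNIT[canonical]
    # Remaining case is ug/mL: 1 ug/mL = 1e-3 g/L, so nM = value * 1e6 / MW.
    if mol_weight is None or mol_weight <= 0:
        return None
    return value * 1e6 / mol_weight


print(to_nanomolar(2.5, "uM"))
print(to_nanomolar(1.0, "ug/ml", mol_weight=500.0))
print(to_nanomolar(3.0, "kJ m-2"))
```

Returning `None` for unconvertible rows keeps them visible for curation instead of silently mixing incompatible values.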
Chemistry within Open PHACTS
The challenges associated with handling chemistry data require the
support of a publicly accessible platform to integrate, standardise and
host the data.
ChemSpider, an online database from the Royal Society of Chemistry,
hosts the chemical compound collection underpinning Open PHACTS
and is responsible for standardising the chemical compounds and
providing both regular updates and ongoing data curation.
To serve the Open PHACTS platform, a structure validation and
standardisation platform (CVSP) has been developed to ensure
chemical structures are normalised to rules derived from the FDA
structure standardisation guidelines and modified based on input from the
EFPIA members.
The many challenges of chemistry representation…
Identities within Open PHACTS
Open PHACTS integrates information from multiple different databases, many of which use unique identifiers. The Identity Mapping Service (IMS) ensures these identifiers are linked and available for use interchangeably throughout the Open PHACTS platform.
To maintain vocabulary heterogeneity and provide interoperability, the ConceptWiki is used. The ConceptWiki is an open access system that accepts essentially unlimited numbers of synonyms, in multiple languages, and then maps all the terms correctly back to one unique concept identifier, alleviating vocabulary problems and identifier differences.
Synonyms:
Aspirin
Dispril
2-Acetoxybenzoic acid
Acetyl salicylic acid
Salicylic acid, acetyl-
ChemSpider ID: 2157
Explorer
FDA: 16030
ChEBI ID: CHEBI:15365
DrugBank ID: APRD00264
IMS
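The idea of interchangeable identifiers can be sketched with a toy equivalence store. This is not the IMS implementation or its API; the `IdentityMap` class and the `namespace:identifier` string convention are hypothetical. The identifiers are the aspirin examples listed above.

```python
class IdentityMap:
    """Toy equivalence store: identifiers linked together share one group."""

    def __init__(self):
        # identifier -> the shared set of all identifiers equivalent to it
        self._group_of = {}

    def link(self, first, second):
        """Declare two identifiers equivalent, merging their groups."""
        group_a = self._group_of.get(first, {first})
        group_b = self._group_of.get(second, {second})
        merged = group_a | group_b
        for identifier in merged:
            self._group_of[identifier] = merged

    def equivalents(self, identifier):
        """Return every identifier known to denote the same thing."""
        return sorted(self._group_of.get(identifier, {identifier}))


ims = IdentityMap()
ims.link("ChemSpider:2157", "CHEBI:15365")
ims.link("CHEBI:15365", "DrugBank:APRD00264")
print(ims.equivalents("DrugBank:APRD00264"))
```

Because links are merged transitively, a query that starts from the DrugBank identifier also reaches the ChemSpider record even though the two were never linked directly.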
Why Provenance Matters
Using a community specification known as
“VoID” (Vocabulary of Interlinked Datasets)
Record version, author, derivations
Builds trust with users – know what you are
querying (and why it might have changed)
Provides mechanism to provide usage
statistics back to providers, help them
understand the value
Easier to track errors and ensure quality
Actively participating in community
provenance programme (W3C)
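A dataset description recording version, author and derivation might look roughly like the output of the sketch below, which writes a small Turtle snippet by hand. The choice of `pav:version`, `dcterms:creator` and `prov:wasDerivedFrom` for these three facts is an assumption, not something the slide specifies; the `example.org` URIs are hypothetical, and the triple count is only an approximation of the 146.08 million reported for ChEMBL_v13 in this deck.

```python
PREFIXES = """@prefix void: <http://rdfs.org/ns/void#> .
@prefix dcterms: <http://purl.org/dc/terms/> .
@prefix pav: <http://purl.org/pav/> .
@prefix prov: <http://www.w3.org/ns/prov#> .
"""


def void_description(uri, title, version, creator, derived_from, triples):
    """Build a minimal Turtle description of one dataset (no escaping is done)."""
    lines = [
        f"<{uri}> a void:Dataset ;",
        f'    dcterms:title "{title}" ;',
        f'    pav:version "{version}" ;',
        f"    dcterms:creator <{creator}> ;",
        f"    prov:wasDerivedFrom <{derived_from}> ;",
        f"    void:triples {triples} .",
    ]
    return PREFIXES + "\n" + "\n".join(lines) + "\n"


# All URIs below are placeholders, not real dataset locations.
turtle = void_description(
    uri="http://example.org/datasets/chembl-rdf",
    title="ChEMBL RDF conversion",
    version="13",
    creator="http://example.org/people/converter",
    derived_from="http://example.org/datasets/chembl-source",
    triples=146_080_000,
)
print(turtle)
```

A production system would use an RDF library to handle escaping and validation rather than string formatting.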
What does Open PHACTS do?
Currently integrated databases
Database                Number of triples (million)
ACD Labs / ChemSpider   161.34
ChEBI                   0.91
ChEMBL_v13              146.08
ConceptWiki             3.74
DrugBank                0.52
Enzyme                  0.07
Gene Ontology           0.85
SwissProt               156.57
WikiPathways            0.14
TOTAL                   470.21
Open PHACTS draws together multiple sources of publicly available pharmacological and chemical data, allowing public access to the information via the Open PHACTS Explorer, an intuitive interface.
Licensing: 3 “public” databases
Comparative Toxicogenomics Database
OMIM
DrugBank
All are available as “open” RDF you can download right now. But:
“CUTTING THE GORDIAN KNOT”
What are the problems with licensing we had to address?
– To make the data and software generated by the project usable and reusable
– Multiplicity of unclear or non-standard licenses on original data sources
• ‘Public’ can mean use but not redistribute, use in commercial environment,
• Legal position on use and reuse extremely unclear
• Different issues than just linking to data
– What is the legal status of integrated collections of the above, and of derived knowledge from
such a collection?
– Appropriate software license selection
– Legal clarity for EFPIA and end users
– Approaches for commercial data integration, EFPIA in-house data
AIM: to enable maximum possible dissemination and usability of the integrated data and
architecture generated by the project - with approaches that will be applicable in other
data integration projects
Chose John Wilbanks as consultant
A framework built around STANDARD well-understood
Creative Commons licences – and how they interoperate
Deal with the problems by:
Interoperable licences
Appropriate terms
Declare expectations to users and
data publishers
One size won’t fit all requirements
Data Licensing Solution
Development partnerships: Influence on API developments
Opportunities to demo ideas & use cases to core team
Need MoU and annexe
Associated partners: Support, information
Exchange of ideas, data, technology
Opportunities to demo at community webinars
Need MoU
Associated partners
Development partnerships
Consortium
MoU
+Annexe
Consortium 28 current members
Open PHACTS and the
scientific community
Example applications
Advanced analytics
ChemBioNavigator: Navigating at the interface of chemical and biological data with sorting and plotting options
TargetDossier: Interconnecting Open PHACTS with multiple target-centric services. Exploring target similarity using diverse criteria
PharmaTrek: Interactive polypharmacology space of experimental annotations
UTOPIA: Semantic enrichment of scientific PDFs
Predictions
GARFIELD: Prediction of target pharmacology based on the Similarity Ensemble Approach
eTOX collector: Automatic extraction of data for building predictive toxicology models in eTOX project
ChemBioNavigator
Matthias Rarey et al
PharmaTrek
Jordi Mestres et al
Call for expressions of interest
Open PHACTS ENSO proposal
Open PHACTS intends to submit a
proposal for IMI ENSO funding.
We are currently drafting our ENSO
proposal and invite all EFPIA
companies with an interest in Open
PHACTS to contact us to discuss
opportunities for involvement.
The Open PHACTS Foundation
Open PHACTS has a successor
organisation, the Open PHACTS
Foundation.
Please register your interest with us
for further information on membership
and other opportunities to get involved
within Open PHACTS.
For more information and/or to register interest email us at
Acknowledgements
Stefan Senger
Gerhard Ecker
The Open PHACTS consortium
Data: Targets; Chemistry; Pharmacology; Literature; Patents
Standards: Ontology/taxonomy; Minimum information guide; Dictionaries; Interchange mapping
Assertions: e.g. Gene-to-Disease; Compound-to-Target; Compound-to-ADR
Application (Knowledge): Fact Visualisation e.g. Target Dossiers; SAR Visualisation
SERVICES
After Barnes et al., Nature Reviews Drug Discovery 2009, doi:10.1038/nrd2944
Nanopublications – Capturing scientific information in
the Triple Store
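In the general nanopublication model, a single assertion is bundled with its provenance and with information about the publication itself, each held in its own named graph. The sketch below shows that shape only; it is not how Open PHACTS stores nanopublications. The class, the `ex:` identifiers and the graph-naming convention are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import List, Tuple

Triple = Tuple[str, str, str]


@dataclass
class Nanopublication:
    """One assertion plus where it came from and who published it."""
    uri: str
    assertion: List[Triple]
    provenance: List[Triple] = field(default_factory=list)
    publication_info: List[Triple] = field(default_factory=list)

    def quads(self):
        """Flatten into (subject, predicate, object, graph) tuples for a quad store."""
        graphs = {
            "assertion": self.assertion,
            "provenance": self.provenance,
            "pubinfo": self.publication_info,
        }
        return [
            (s, p, o, f"{self.uri}#{name}")
            for name, triples in graphs.items()
            for (s, p, o) in triples
        ]


# Placeholder identifiers for a Compound-to-Target style assertion.
np1 = Nanopublication(
    uri="ex:np1",
    assertion=[("ex:compoundX", "ex:modulates", "ex:targetY")],
    provenance=[("ex:np1#assertion", "prov:wasDerivedFrom", "ex:sourceDataset")],
    publication_info=[("ex:np1", "dcterms:creator", "ex:curator")],
)
for quad in np1.quads():
    print(quad)
```

Keeping the assertion in its own graph lets the provenance triples refer to it as a unit, which is what makes the claim citable and traceable.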