Posted on 17-Aug-2020
Nexus 2.2: Overview
Dr Nicholas Marchetti – Product Manager, nik.marchetti@lhasalimited.org
Dr Adrian Fowkes – Senior Scientist, adrian.fowkes@lhasalimited.org
Derek Nexus 6.0 Update
Dr Nicholas Marchetti
Product Manager
nik.marchetti@lhasalimited.org
New Features and Enhancements: Derek Nexus 6.0
• An update to the current knowledge base, including the addition of new alerts.
  • Endpoints of focus have been Mutagenicity in-vivo, Mutagenicity in-vitro, Carcinogenicity, Chromosome damage and Skin sensitisation
• Negative Predictions:
  • Now available for the Skin Sensitisation endpoint
  • References for nearest neighbours are now included in this feature
• New endpoints of Glucocorticoid Receptor Agonism and Androgen Receptor Modulation have been included.
NEGATIVE PREDICTIONS
Why do we need negative predictions?
• Previously, a lack of alerts firing would always lead to a “Nothing to Report” result
• For endpoints that are well developed, we wanted to provide a stronger prediction and more information for expert review
Negative Predictions Workflow in Derek
Figure (Derek KB 2018.1):
1. Query Compound
2. Match alert or example?
   • Yes: Prediction for/against toxicity
   • No: Compare structural features to publicly available data, then make negative predictions and provide information for expert assessment
Confidence in Negative Predictions
• Bacterial Mutagenicity in-vitro and Skin Sensitisation will not provide a “Nothing to Report”
• In the absence of an alert, we compare the structure to an external dataset, to assess if there could be any other cause for concern:
  • No misclassified or unclassified features: highly confident negative prediction
  • Contains misclassified features, or contains unclassified features: slightly lower confidence – some features may be a cause for concern
References
• We are now providing references for nearest neighbours containing misclassified features, for both mutagenicity and skin sensitisation.
Skin sensitisation: Lhasa Skin Sensitisation Negative Prediction Dataset
• A fragment library is generated from an internal skin sensitisation dataset of > 2500 chemicals
• Dataset consists of human, mouse and guinea pig data
• Overall experimental call is derived using a hierarchical approach
• Human: BgVV; Basketter
• Standard animal: LLNA (OECD Guidelines); GPMT (OECD Guidelines)
• Non-standard animal: Non-standard LLNA; Non-radioactive LLNA; Buehler (closed patch) test; Freund’s complete adjuvant test; Freund’s complete adjuvant test (modified); Split adjuvant test; Single injection adjuvant test; Single injection adjuvant test (modified); Maurer optimisation test; Optimisation test; Open epicutaneous test; Closed epicutaneous test
• Other animal (positive data only): Draize test; Draize test (altered); Mouse ear swelling test
Hierarchy of Skin Sensitisation Assays in dataset
Chemical of interest:
1. Is there any human data?
   • Yes: Call assigned as human result
   • No: go to 2
2. Is there any standard assay data?
   • Yes: Conservative call between standard assay results
   • No: go to 3
3. Is there any non-standard assay data?
   • Yes: Conservative call between non-standard assay results
   • No: go to 4
4. Is there positive data from other assays?
   • Yes: Call assigned as positive
   • No: No call assigned
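The hierarchy above can be sketched in a few lines of Python. This is an illustrative reading of the flowchart, not Lhasa's implementation: the tier names, the `overall_call` function and the interpretation of a "conservative call" as "positive if any result in the tier is positive" are all assumptions, and the same rule is applied to the human tier for simplicity.

```python
from typing import Dict, List, Optional

# Tiers in decreasing priority, following the hierarchy on the slide.
# The tier names are hypothetical labels chosen for this sketch.
TIERS = ("human", "standard", "non_standard")


def overall_call(results: Dict[str, List[bool]]) -> Optional[bool]:
    """Return True (sensitiser), False (non-sensitiser) or None (no call).

    `results` maps a tier name to the experimental outcomes recorded
    for that tier (True = positive result).
    """
    for tier in TIERS:
        outcomes = results.get(tier, [])
        if outcomes:
            # Assumed meaning of "conservative call": any positive wins.
            return any(outcomes)
    # Other assays contribute positive data only, so a negative result
    # from this tier never produces a call.
    if any(results.get("other", [])):
        return True
    return None


print(overall_call({"human": [False], "standard": [True]}))  # False
print(overall_call({"standard": [True, False]}))             # True
print(overall_call({"other": [False]}))                      # None
```

The first usage line shows the point of the hierarchy: a human result takes precedence over a conflicting animal result.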
No misclassified or unclassified features
• This type of prediction is given for compounds where all features in the molecule are found in accurately classified compounds from the data set.
Figure (Derek KB 2018.1): Query Compound, search datasets, result: Non-sensitiser
Contains misclassified features
• Misclassified features are those that have been derived from a non-alerting positive compound in the data set
• To get this type of prediction, your query compound has a feature in common with a non-alerting positive compound
• A non-alerting positive compound is experimentally positive in a particular assay (e.g. Ames), but is not covered by an alert in the Derek Knowledgebase. Derek does not have a mechanistic explanation as to why the data set example is positive, so expert assessment is required
Misclassified features: Skin Sensitisation Workflow
Figure (Derek KB 2018.1):
1. Query Compound: No Alert Fired
2. Search Lhasa Skin Sensitisation Negative Prediction Dataset
3. Highlighted feature found in Sensitiser
4. Result: Non-sensitiser (contains misclassified features)
5. Expert review: Is shared feature the cause of sensitisation?
Contains unclassified features
• Unclassified features are those that have not been found in the data set.
• To get this type of prediction, your query compound has fired no alerts but does contain a structural fragment that is not covered in the respective data sets. Although Derek has found no mechanistic reason for this compound to be positive, the unclassified feature could be of concern, so expert assessment is required
Unclassified features: Skin Sensitisation Workflow
Figure (Derek KB 2018.1):
1. Query Compound: No Alert Fired
2. Search Lhasa Skin Sensitisation Negative Prediction Dataset
3. Features not found: Unclassified Features
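The three kinds of negative prediction can be summarised as a simple set comparison. The sketch below is a simplification under stated assumptions: features are represented as plain strings rather than structural fragments, the function name is hypothetical, and checking misclassified features before unclassified ones is an arbitrary choice, since the slides do not say which label applies when a compound contains both.

```python
from typing import Set


def negative_prediction_type(
    query_features: Set[str],
    dataset_features: Set[str],
    misclassified_features: Set[str],
) -> str:
    """Label a query compound that fired no alert.

    dataset_features: every feature found in the reference data set.
    misclassified_features: features derived from non-alerting
        positive compounds in that data set.
    """
    if query_features & misclassified_features:
        # Shared feature with a non-alerting positive compound.
        return "non-sensitiser (contains misclassified features)"
    if query_features - dataset_features:
        # At least one feature never seen in the data set.
        return "non-sensitiser (contains unclassified features)"
    # Every feature is found in accurately classified compounds.
    return "non-sensitiser"


dataset = {"ester", "phenyl", "nitrile"}
misclassified = {"nitrile"}
print(negative_prediction_type({"ester", "phenyl"}, dataset, misclassified))
print(negative_prediction_type({"ester", "nitrile"}, dataset, misclassified))
print(negative_prediction_type({"ester", "azetidine"}, dataset, misclassified))
```

The feature names in the usage example are made up for illustration and do not come from the Lhasa data set.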
Negative Predictions Performance – Skin Sensitisation – 5 fold cross validation
• Misclassified and unclassified features occur infrequently and represent areas of increased uncertainty, which may require further scrutiny

How often are non-sensitisers correctly predicted? (bar chart, negative predictivity in %)
• Derek ‘no alert’ (non-alerting compound): 73.9%
• Derek with skin negative predictions:
  • Non-sensitiser: 77.3%
  • Non-sensitiser with misclassified features: 52.1%
  • Non-sensitiser with unclassified features: 65.7%

How often does each type of negative prediction occur? (pie chart)
• Non-sensitiser: 80%
• Non-sensitiser with misclassified features: 8%
• Non-sensitiser with unclassified features: 12%

prevalence = 51%
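As a quick consistency check on the two charts, the sketch below assumes that the pie-chart shares (80%, 8%, 12%) are fractions of all non-alerting compounds and that the three bar values (77.3%, 52.1%, 65.7%) are the negative predictivities within each group. Under that assumption, the share-weighted mean should reproduce the overall ‘no alert’ figure of 73.9%. The dictionary keys are labels invented for this example.

```python
# Values read from the two charts; keys are hypothetical labels.
share = {"plain": 0.80, "misclassified": 0.08, "unclassified": 0.12}
negative_predictivity = {"plain": 77.3, "misclassified": 52.1, "unclassified": 65.7}

# Share-weighted mean of the per-category negative predictivities.
overall = sum(share[k] * negative_predictivity[k] for k in share)
print(round(overall, 1))  # 73.9
```

The result matches the ‘no alert’ bar, which suggests the split does not change the overall performance but shows where the uncertainty is concentrated.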
SKIN SENSITISATION
Derek KB 2018.1 – What’s new?
• 10 new Skin Sensitisation alerts
  • 7 of which were built using member donated data
• 12 alerts modified
  • e.g. Expanding/refining the scope of alerts
Structural alerts – Performance
• Ongoing development of the skin sensitisation endpoint:
• Alert performance has improved over recent years, due to:
  • Continued analysis of the available public data
  • Extraction of (anonymous) knowledge from proprietary data shared by members e.g. Bristol-Myers Squibb

Derek KB      Acc  Se  Sp  PP  NP  No. of alerts
Derek 2014*   72   73  71  72  73  73
Derek 2015*   74   78  70  72  76  80
Derek 2018*   75   79  72  73  77  90

*Analysis based on an in-house dataset of 1267 sensitisers and 1282 non-sensitisers, based on a conservative combination of results from the LLNA and/or guinea pig assays.
Acc = Accuracy, Se = Sensitivity, Sp = Specificity, PP = Positive Predictivity, NP = Negative Predictivity
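For reference, the five metrics are the standard confusion-matrix ratios. The sketch below uses made-up counts purely for illustration; they are not the counts behind the table, and the function name is hypothetical.

```python
from typing import Dict


def performance(tp: int, fp: int, tn: int, fn: int) -> Dict[str, float]:
    """Standard binary-classification metrics from confusion-matrix counts."""
    total = tp + fp + tn + fn
    return {
        "Acc": (tp + tn) / total,  # accuracy
        "Se": tp / (tp + fn),      # sensitivity
        "Sp": tn / (tn + fp),      # specificity
        "PP": tp / (tp + fp),      # positive predictivity
        "NP": tn / (tn + fn),      # negative predictivity
    }


# Illustrative counts only (100 sensitisers, 100 non-sensitisers).
metrics = performance(tp=80, fp=30, tn=70, fn=20)
for name, value in metrics.items():
    print(f"{name}: {value:.1%}")
```

Sensitivity and specificity are conditioned on the experimental result, whereas the two predictivity values are conditioned on the prediction, so the latter depend on the prevalence of sensitisers in the test set.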
Alert example
Alert 444 - Imine or alpha-beta unsaturated imine
• Alert was made more specific by:
  1. Narrowing the scope to exclude ketimines/tertiary imines…
  2. …but still include alpha, beta-unsaturated imines, which can react through Michael addition
• This reduced the number of false positives by 83% when tested against the members’ data, and by 86% when tested against public data
Figure: 1. No alert fires 2. Alert fires
GENOTOXICITY
Derek KB 2018.1 - What’s new?
• 18 Mutagenicity in-vitro alerts modified
  • e.g. Expanding/refining the scope of alerts
• 12 new Mutagenicity in-vitro alerts
  • 10 of which were built using member donated data
• 6 Chromosome damage alerts modified
  • 4 of which were modified using member donated data
• Extended 12 Mutagenicity in-vitro alerts to also apply to Mutagenicity in-vivo
  • Using newly publicly available transgenic rodent mutation assay data
Alert example
Alert 746: Arylboronic acid or derivative
• Alert was made more specific by:
  1. Narrowing the scope to exclude aryl boronic acids with bulky para substituents
  2. Alert will also no longer fire if there is a fused non-aromatic ring at the para position
• This reduced the number of false positives by 27% when tested against members’ data, and by 6% when tested against public data
Figure: No alert fires
CARCINOGENICITY
Derek KB 2018.1 - What’s new?
• 2 new carcinogenicity alerts
  • Aniline or precursor
    • This alert was validated against 3 public datasets, giving an average (mean) positive predictivity of 91%
  • Uracil, thymine or precursor
    • This alert was validated against 3 public datasets, giving an average (mean) positive predictivity of 100%
REPRODUCTIVE TOXICITY
Derek KB 2018.1 - What’s new?
• 1 new Teratogenicity alert
  • 17-Hydroxyprogesterone derivative
• 2 new endpoints relating to Teratogenicity
  • Glucocorticoid receptor agonism
  • Androgen receptor modulation
• These molecular initiating event-based endpoints were designed to provide better coverage for the Teratogenicity model
Meteor Nexus 3.0 Update
Dr Nicholas Marchetti
Product Manager
nik.marchetti@lhasalimited.org
What’s new in Meteor Nexus 3.1.0?
• New Biotransformation
  • 566 – Hydrolysis of Cyclic Peptides
• Modified Biotransformations
  • 563 – N-Glucuronidation of Amides and Related Compounds
  • 41 – Conjugation of Hydrazines, Hydrazides and Related Compounds with Pyruvic Acid
    • Biotransformations 41 and 42 have been merged
  • 43 – Conjugation of Hydrazines, Hydrazides and Related Compounds with alpha-Ketoglutaric Acid
    • Biotransformations 43 and 44 have been merged
  • 245 – Oxidative N-Dealkylation
  • 371 – Epoxidation of 1,1-Disubstituted Haloalkanes
Metabolism Dataset
• Metabolic data collected from the following journals:
  • Drug Metabolism and Disposition
  • Xenobiotica
  • Biochemical Pharmacology
  • Journal of Pharmacology and Experimental Therapeutics
  • Chemical Research in Toxicology
  • Journal of Medicinal Chemistry
  • Journal of Agricultural and Food Chemistry
• 2,608 papers
• 18,379 reactions
Sarah Nexus 3.0 Update
Dr Adrian Fowkes
Senior Scientist
adrian.fowkes@lhasalimited.org
Sarah Nexus 3.0
1. Improvements to predictions
a) Structure standardisation
b) Sarah Nexus training set
2. Improvements to interpretability
a) Additional information for example compounds
b) Strain profile information
c) Additional compounds for analysis
3. Improvements to model building
a) Additional curation options
Structure Standardisation
Additional structure standardisation rules have been implemented into Sarah Nexus
Structure Standardisation
• Structure standardisation is beneficial for two main reasons
1. Appropriate curation of structures, to ensure the activity of compounds is accurately reflected during model building
   • 21% of compounds with CAS numbers in the training set have at least 2 structure representations before any standardisation
2. To ensure that the same prediction is produced however the user draws a query structure
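A statistic like the 21% above could be obtained by grouping records by CAS number and counting how many distinct structure strings each group holds. The sketch below is only an illustration of that counting step: the function name and the sample records are hypothetical, and it compares raw strings rather than applying any of the Sarah Nexus standardisation rules.

```python
from collections import defaultdict
from typing import Dict, Iterable, Set, Tuple


def fraction_with_multiple_representations(
    records: Iterable[Tuple[str, str]],
) -> float:
    """records: (cas_number, structure_string) pairs.

    Returns the fraction of CAS numbers that appear with two or more
    distinct structure strings.
    """
    structures_by_cas: Dict[str, Set[str]] = defaultdict(set)
    for cas, structure in records:
        structures_by_cas[cas].add(structure)
    if not structures_by_cas:
        return 0.0
    multiple = sum(1 for s in structures_by_cas.values() if len(s) >= 2)
    return multiple / len(structures_by_cas)


# Hypothetical records: acetic acid is drawn two different ways.
records = [
    ("64-19-7", "CC(=O)O"),
    ("64-19-7", "CC(O)=O"),
    ("71-43-2", "c1ccccc1"),
    ("71-43-2", "c1ccccc1"),
    ("67-64-1", "CC(C)=O"),
    ("50-00-0", "C=O"),
]
print(fraction_with_multiple_representations(records))  # 0.25
```

After standardisation, both acetic acid records would ideally map to one representation, so the same compound contributes one consistent activity to the model.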
Sarah Nexus Training Set
• The training set is larger, thanks to data donated by member organisations to improve performance and to curation of the public literature
Data Source: Conflicted / Negative / Positive / Equivocal / Unreliable (rows with fewer than five values had blank cells in the table; values are listed in the order given)
• Acid Halide Mutagenicity Dataset: 5, 31, 3
• Bursi Mutagenicity Dataset: 1925, 2389
• CGX Mutagenicity Dataset: 1, 329, 354, 15
• Derek Nexus Example Compounds: 11, 317
• FDA CFSAN Mutagenicity Dataset: 7, 4052, 4238, 6
• Feng Mutagenicity Dataset: 2, 956, 892
• Hansen Mutagenicity Dataset: 2, 2976, 3477
• Helma Mutagenicity Dataset: 343, 341
• ISSSTY Mutagenicity Dataset: 5, 3216, 3316, 161
• Marketed Pharmaceuticals Dataset: 483, 39, 20
• Member Data: 1, 47, 24
• Vitic Nexus NTP Table: 3, 1220, 697, 69
• Vitic Nexus Summary Call Table: 875, 3185, 2905, 19, 57
• Sarah Model 2.0 Training Set: 864, 5166, 4716, 65, 40
• Previous Training Set: 4879, 4628
https://www.lhasalimited.org/publications/improving-chemical-space-coverage-of-an-in-silico-prediction-system-by-targeted-inclusion-of-fragments-absent-from-the-training-set/4465
Sarah Nexus - Validation
Figure: Sarah Nexus vs proprietary data. Bar chart of performance metrics (%): BA, Sens, Spec and Cov, for Sarah Model 1.0.1 and Sarah Model 2.0.
Sarah Nexus 3.0
1. Improvements to predictions
2. Improvements to interpretability
a) Additional information for example compounds
b) Strain profile information
c) Additional compounds for analysis
Additional Information for Compounds
Hypotheses
Training set examples
Additional Information for Compounds
• Toggle between published and standardised structure
• Examine data source and follow up references
Strain Profile Information
• There is lots of detailed strain data available for compounds in the Sarah Nexus training set
• Sarah Nexus allows users to explore supporting Ames strain profiles for both hypotheses and individual structures
• Reduce uncertainty
• Better decision making
Additional Information for Compounds
Addition Of Detailed Strain Data
• Overall strain data for the hypothesis
• Strain data for the individual example
Additional Information for Compounds
Additional Compounds For Review
Compounds whose activity was not resolved for inclusion into the training set are now available for review in the Nexus interface.
Compounds in this panel have access to the new features in Nexus. For example, viewing strain profiles and references.
Sarah Nexus 3.0
1. Improvements to predictions
2. Improvements to interpretability
3. Improvements to model building
a) Additional curation options
Model Building
• Sarah Nexus allows for the creation of new models using its SOHN (Self-Organising Hypothesis Network) methodology
  • Supplement the Sarah Nexus training set with additional data
  • Build models from new data sets
• New options have been implemented into the model building workflow to support the model building process
Model Building
Add meta data for new compounds which can be viewed in the compound information panel during the review of predictions
Model Building
Structure standardisation rules developed by Lhasa Limited can be applied to imported datasets
Model Building
Dataset curation is further supported by options to handle the experimental activities present in the model dataset and the imported dataset
Conclusions
• The new features implemented in Sarah Nexus 3.0 further support its use as a statistical system for ICH M7
• Expansion of the training set to support Sarah Model 2.0
• Improved structure standardisation rules to improve consistent representation of structures and their experimental activity
• Increased interpretability to support expert review
  • Strain information
  • Additional compounds for review
  • Compound meta data
Lhasa Limited
Granary Wharf House, 2 Canal Wharf
Leeds, LS11 5PS
Registered Charity (290866)
Company Registration Number 01765239
+44(0)113 394 6020
info@lhasalimited.org
www.lhasalimited.org
Questions?