Predicting Adverse Drug Reactions Using PubChem Screening Data
Yannick Pouliot (with significant contributions from Annie Chiang)
8/31/2010
It’s Back: Predicting Adverse Drug Reactions Using PubChem Screening Data
Motivation
Short-term: Determine feasibility of predicting specific classes of adverse drug reactions (ADRs) using machine learning and compound screening data
Long-term: Use collection of simple screens to assess likelihood of tissue-specific ADRs
Understanding “BioAssay” Notion
• Usually, BioAssay = collection of activity measurements for compounds screened against a specific target in a cell type at one or more concentrations
• However, scope of BioAssay DB goes beyond compound screening:
▫ Cell-free assays
▫ In vivo assays
What’s a SOC?
• SOC = System Organ Class
• A SOC groups “… adverse reaction Preferred Terms pertaining to the same system-organ”.
• Example: SOC C0236104 - “Resistance Mechanism Disorders”
Knowns
• Drugs frequently show elevated rates of tissue-specific ADRs, beyond generic liver and kidney damage.
• PubChem BioAssay DB offers a large number of assays covering a significant number of protein targets
Hypothesis
H1: Drugs with increased frequency of SOC-specific ADRs can be identified from patterns of reactivity in PubChem BioAssay screens.
H0: Reactivity patterns in PubChem BioAssay do not distinguish drugs with increased frequency of tissue-specific ADRs.
Data Features
• For a given SOC, matrix of
▫ PRR
▫ drug CUI
▫ BioAssay ID (“AID”)
• Sparse matrix: most compounds have been screened in only a few assays
▫ limited overlap between CVAR and BioAssay
• Very large data sets (more later)
Data Integration
Analytical Process
Binarized PRR (PRR >= 2: 1, else: 0)
Selected Statistic: Proportional Reporting Ratio (PRR)
                     Drug of interest    Other drugs
Event of interest    A                   B
Other events         C                   D
• PRR = OBS/EXP = [A / (A+C)] / [B / (B+D)]
• Serious ADR threshold: PRR ≥ 2, with at least 3 cases reported
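The PRR formula and the serious-ADR threshold above can be sketched in a few lines of Python. This is a minimal illustration, not the code used for the talk: the names `Counts`, `prr` and `is_serious_signal` are hypothetical, and it assumes that "cases reported" refers to cell A (reports pairing the drug of interest with the event of interest).

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Counts:
    """2x2 contingency table of report counts (hypothetical helper type)."""
    a: int  # drug of interest, event of interest
    b: int  # other drugs, event of interest
    c: int  # drug of interest, other events
    d: int  # other drugs, other events


def prr(n: Counts) -> float:
    """PRR = [A / (A + C)] / [B / (B + D)]; NaN when the ratio is undefined."""
    if n.a + n.c == 0 or n.b + n.d == 0 or n.b == 0:
        return math.nan
    drug_rate = n.a / (n.a + n.c)    # share of the drug's reports with the event
    other_rate = n.b / (n.b + n.d)   # same share among all other drugs
    return drug_rate / other_rate


def is_serious_signal(n: Counts, prr_threshold: float = 2.0, min_cases: int = 3) -> bool:
    """Slide threshold: PRR >= 2 with at least 3 cases (assumed to mean cell A)."""
    value = prr(n)
    return n.a >= min_cases and not math.isnan(value) and value >= prr_threshold
```

For example, `Counts(a=30, b=100, c=90, d=700)` has a drug rate of 0.25 against a background rate of 0.125, so it would be flagged, whereas the same ratio built on only two cases would not.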
Results: Max PRR by SOC for Statins
Active Ingredient    PRR (no. cases)    SOC
Atorvastatin         6.2 (958)          Musculo-skeletal
Cerivastatin         10.47 (284)        Musculo-skeletal
Fluvastatin          5.12 (7)           Musculo-skeletal
Lovastatin           4.62 (11)          Musculo-skeletal
Pravastatin          5.13 (104)         Musculo-skeletal
Rosuvastatin         7.34 (803)         Musculo-skeletal
Simvastatin          5.79 (186)         Musculo-skeletal
Addressing Zero ADR
• Many drugs do not have a SOC-specific PRR
▫ Unclear if this means they are unusually safe (could be due to e.g. low prescription volume)
▫ Approach: Assign SOC-specific PRR = 0 if at least 10 ADR reports exist overall
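The zero-ADR rule above, combined with the PRR >= 2 binarization used elsewhere in the deck, might look like the following. This is a sketch under assumptions: the function names are hypothetical, a missing PRR is represented as `None`, and drugs with fewer than 10 reports overall are assumed to be left out rather than labeled.

```python
from typing import Optional

MIN_TOTAL_REPORTS = 10  # "at least 10 ADR reports exist overall"


def soc_prr_or_zero(soc_prr: Optional[float], total_reports: int) -> Optional[float]:
    """Return the SOC-specific PRR, filling in 0 only when the drug is well reported."""
    if soc_prr is not None:
        return soc_prr
    if total_reports >= MIN_TOTAL_REPORTS:
        return 0.0  # enough reports overall, none in this SOC
    return None     # too few reports to tell; assumed to be excluded


def binarize(prr_value: float, threshold: float = 2.0) -> int:
    """Binarized PRR: 1 if PRR >= 2, else 0."""
    return 1 if prr_value >= threshold else 0
```

Keeping the "unknown" case as `None` rather than 0 avoids treating a rarely prescribed drug as evidence of safety, which is the concern the slide raises.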
Results Since Last Meeting
Properties of CVAR drug ingredients (number of ingredients)
• Ingredients with drug reports in CVAR: 2,901
• Ingredients with drug reports in CVAR WITH `health_product_role` = 'suspect' and `reaction_type` = 'Adverse Reaction': 2,746
• Ingredients with drug reports in CVAR with `health_product_role` = 'suspect' and `reaction_type` = 'Adverse Reaction' AND whoart_soc_cui is not null: 2,731
• Ingredients with drug reports in CVAR with `health_product_role` = 'suspect' and `reaction_type` = 'Adverse Reaction' and whoart_soc_cui is not null AND total_number_reports >= 10: 1,550
• Ingredients with drug reports in CVAR with `health_product_role` = 'suspect' and `reaction_type` = 'Adverse Reaction' and whoart_soc_cui is not null and total_number_reports >= 10 AND present in PUBCHEM_BIOASSAY: 485
BioAssay Subset Properties
Assays and Drugs in PubChem BioAssay with SOC-identified CVAR drug ingredients and ADR reports >=10
AssayType            NumberOfAssays    NumberCVARCmpds
confirmatory         545               664
in vivo_screening    81                341
other                93                790
screening            466               629
summary              6                 202
Total:               1,191             2,626
Mapping Results
• All SIDs: 913,742
• CVAR drug ingredients mapped to SIDs: 7,913
• CVAR drug ingredients with SOC-identified ADRs mapped to SIDs: 4,382
• CVAR drug ingredients with SOC-identified ADRs and >= 10 reports mapped to SIDs: 3,136
SOC ID      SOCName                                    Avg Model AUC      InitCmpds    CmpdsRetained
C0236104    resistance mechanism disorders             0.92 (0.000593)    468          70
C0221016    red blood cell disorders                   0.79 (0.000318)    468          185
C0236099    reproductive disorders - male              0.77 (0.000167)    468          271
C0027651    neoplasms                                  0.76 (0.000802)    468          115
C0035204    respiratory system disorders               0.76 (0.000465)    468          177
C0027765    centr & periph nervous system disorders    0.74 (0.000272)    468          376
C0014130    endocrine disorders                        0.72 (0.000721)    468          126
C0042790    vision disorders                           0.72 (0.000174)    468          286
C0037272    skin and appendages disorders              0.7 (0.000196)     468          250
Predictive Modeling Results - 1
Columns: SOCName, Avg Model AUC, AID1, AssayType, Objective, Target, Avg p-value AID1, Avg Coeff AID1
• resistance mechanism disorders
▫ Avg Model AUC: 0.92 (0.000593)
▫ AID1: AID119
▫ AssayType: confirmatory
▫ Objective: Small molecule inhibitors of tumor cell growth in implanted CCRF-CEM leukemia cells in mice
▫ Target: (none listed)
▫ Avg p-value AID1: 2.95E-004 (1.05E-005)
▫ Avg Coeff AID1: 1.15E+000 (5.08E-003)
• red blood cell disorders
▫ Avg Model AUC: 0.79 (0.000318)
▫ AID1: AID330
▫ AssayType: in vivo screening
▫ Objective: Small molecule inhibitors of tumor cell growth in implanted P388 leukemia CD2F1 (CDF1) tumors in mice
▫ Target: (none listed)
▫ Avg p-value AID1: 1.15E-004 (1.30E-006)
▫ Avg Coeff AID1: 2.34E-001 (6.33E-004)
• reproductive disorders - male
▫ Avg Model AUC: 0.77 (0.000167)
▫ AID1: AID1461
▫ AssayType: confirmatory
▫ Objective: Small molecule inhibitors of neuropeptide S receptor (NPSR) signaling
▫ Target: G protein-coupled receptor for asthma susceptibility isoform A (NPRS A) [Homo sapiens]
▫ Avg p-value AID1: 7.49E-008 (1.45E-009)
▫ Avg Coeff AID1: 5.55E-001 (3.87E-004)
• neoplasms
▫ Avg Model AUC: 0.76 (0.000802)
▫ AID1: AID543
▫ AssayType: confirmatory
▫ Objective: Small molecules cytotoxic to H-4-II-E rat hepatoma cell line
▫ Target: (none listed)
▫ Avg p-value AID1: 2.16E-005 (8.19E-007)
▫ Avg Coeff AID1: 9.39E-001 (2.21E-003)
• respiratory system disorders
▫ Avg Model AUC: 0.76 (0.000465)
▫ AID1: AID774
▫ AssayType: other
▫ Objective: Small molecule inhibitors of enzymes frequently used to reach a NAD/NADH endpoint
▫ Target: (none listed)
▫ Avg p-value AID1: 2.14E-003 (3.85E-005)
▫ Avg Coeff AID1: -9.21E+000 (2.31E-002)
• centr & periph nervous system disorders
▫ Avg Model AUC: 0.74 (0.000272)
▫ AID1: AID1672
▫ AssayType: screening
▫ Objective: Small molecule inhibitors of inward-rectifying potassium ion channel Kir2.1 in HEK293 cells (human embryonic kidney)
▫ Target: potassium inwardly-rectifying channel J2 [Mus musculus]
▫ Avg p-value AID1: 1.70E-007 (5.27E-009)
▫ Avg Coeff AID1: 3.08E-001 (1.70E-004)
• endocrine disorders
▫ Avg Model AUC: 0.72 (0.000721)
▫ AID1: AID885
▫ AssayType: confirmatory
▫ Objective: Small molecule inhibitors of cytochrome P450 3A4 (cell-free)
▫ Target: cytochrome P450, subfamily IIIA, polypeptide 4 [Homo sapiens]
▫ Avg p-value AID1: 1.22E-003 (6.53E-005)
▫ Avg Coeff AID1: 8.75E-001 (2.65E-003)
• vision disorders
▫ Avg Model AUC: 0.72 (0.000174)
▫ AID1: AID2553
▫ AssayType: screening
▫ Objective: Small molecule inhibitors of transient receptor potential cation channel C6 (TRPC6) in HEK293 cells
▫ Target: short transient receptor potential channel 6 [Mus musculus]
▫ Avg p-value AID1: 5.04E-004 (1.09E-005)
▫ Avg Coeff AID1: 1.97E-001 (2.23E-004)
• skin and appendages disorders
▫ Avg Model AUC: 0.7 (0.000196)
▫ AID1: AID781
▫ AssayType: screening
▫ Objective: Small molecule inhibitors of 14-3-3/Bad interactions (cell-free)
▫ Target: tyrosine 3-monooxygenase/tryptophan 5-monooxygenase activation protein-zeta polypeptide [Bos taurus]
▫ Avg p-value AID1: 6.23E-005 (1.37E-006)
▫ Avg Coeff AID1: 4.95E-001 (4.66E-004)
ROC AUC For C0236104 - “resistance mechanism disorders”
LOOCV Validation For C0236104 - “resistance mechanism disorders”
Universe of Data For SOC C0236104
Disorders Associated With SOC C0236104 (“Resistance Mechanism Disorders”)
Allergic conditions
Autoimmune disorders
Immune disorders NEC
Immunodeficiency syndromes
Ancillary infectious topics
Bacterial infectious disorders
Chlamydial infectious disorders
Ectoparasitic disorders
Fungal infectious disorders
Helminthic disorders
Infections - pathogen unspecified
Mycobacterial infectious disorders
Mycoplasmal infectious disorders
Protozoal infectious disorders
Rickettsial infectious disorders
Viral infectious disorders
Indications For Drugs Correlated with Model For SOC C0236104 (“Resistance Mechanism Disorders”)
Antineoplastic Agents
Anti-Bacterial Agents
Anti-Inflammatory Agents
Anticholesteremic Agents
Anti-Inflammatory Agents, Non-Steroidal
Anti-Allergic Agents
Analgesics
Anti-Dyskinesia Agents
Lessons Learned
• Limitation of relational databases sans partitioning
▫ Queries won’t return if >50M rows
• Sneaky MySQL loader
▫ Can fail to load records w/o reporting error
▫ Problem when one can’t easily verify expected number of records from XML files
▫ Solution: Write your own loader (can include data validation)
• BMIR cluster has serious NFS problems
▫ Couldn’t run more than a few parsing jobs at same time
• … and my favorite: The dreaded NCBI surprise!
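The "write your own loader" suggestion above could be sketched roughly as follows: count the records parsed from the XML, validate each one, and refuse to finish if the table did not grow by the same number of rows. Everything specific here is an assumption for illustration: the `<record>` tag, the `sid` and `outcome` attributes and the `activity` table are hypothetical, and `sqlite3` stands in for MySQL so the example is self-contained.

```python
import io
import sqlite3
import xml.etree.ElementTree as ET


def load_records(xml_source, conn, record_tag="record"):
    """Load every <record> element and verify that none was silently dropped."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS activity (sid INTEGER NOT NULL, outcome TEXT NOT NULL)"
    )
    before = conn.execute("SELECT COUNT(*) FROM activity").fetchone()[0]
    parsed = 0
    # Stream the file so that large XML dumps do not have to fit in memory.
    for _event, elem in ET.iterparse(xml_source, events=("end",)):
        if elem.tag != record_tag:
            continue
        parsed += 1
        sid, outcome = elem.get("sid"), elem.get("outcome")
        # Data validation: fail loudly instead of skipping a bad record.
        if sid is None or not sid.isdigit() or not outcome:
            raise ValueError(f"record {parsed} failed validation: sid={sid!r}, outcome={outcome!r}")
        conn.execute("INSERT INTO activity (sid, outcome) VALUES (?, ?)", (int(sid), outcome))
        elem.clear()  # release the parsed element
    after = conn.execute("SELECT COUNT(*) FROM activity").fetchone()[0]
    if after - before != parsed:
        raise RuntimeError(f"parsed {parsed} records but table grew by {after - before}")
    conn.commit()
    return parsed
```

A call such as `load_records(io.BytesIO(xml_bytes), sqlite3.connect(":memory:"))` returns the number of records loaded, so the caller can log it next to the file name and compare runs.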
The Case Of The Missing Atorvastatin
• Problem: Why were some statins missing from my dataset?
▫ E.g.: Atorvastatin
• Answer: It is present, but there is no way to identify it as such
• Example from AID 881: Atorvastatin SID = 29215408
… and no synonyms!
And Now For Some Test Marketing
Acknowledgements
• Alex and Chirag, for contributing secret R knowledge
• Atul, for being helpfully skeptical and patient
• Alex S for quickly addressing DB issues
• NCBI, for providing DBs and messing up my life
Need To Standardize And Normalize Assay Activity Metrics
Types of activity metrics (substr 1-12)
% Cell Viabi
% cellular A
% CPE Inhibi
% Inhibition
%Activity at
%displacemen
%Efficacy at
%Inhibition
%Response of
Activity at
AF_20uM
AreaNm
AreaoftheNuc
Ave %Efficac
Ave %Inhibit
AverageInteg
AverageInten
AverageSpots
Baseline-Act
Cell-Activit
CellCount
CellsNucInte
Donor-Activi
Fed-Activity
FP-Activity
F_Ratio
GFP-Activity
Mean High
Mean Low
Mean_NC
Mean_PC
MPIPiCm
MPIPiNm
MS % Inhibit
NucleiNucAre
NumberofCell
Parental-Act
PercentagePo
PiNmbyPiCm
Primary % In
Rate-Activit
Ratio-Activi
RatioofSpoti
RFP-Activity
STD Deviatio
Std.Err(Repe
StdDev_NC
StdDev_PC
TIINiNM
TotalCytopla
TotalIntegra
TotalSpotInt
Total_fluore
TSHR-Activit
W460-Activit
W530-Activit
ZScore
ZScore at 10
ZScore at 20
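A list like the one above can be produced by truncating each activity-metric name to its first 12 characters and counting how many full names collapse into each prefix. The sketch below assumes the full metric names are available as plain strings; the function name and the sample names in the usage note are hypothetical, chosen only so that their prefixes match entries in the list.

```python
from collections import Counter
from typing import Iterable


def metric_prefix_counts(metric_names: Iterable[str], width: int = 12) -> Counter:
    """Count metric names by their first `width` characters (the slide's substr 1-12)."""
    return Counter(name[:width] for name in metric_names)
```

Calling it on names such as "% Inhibition at 10 uM", "% Inhibition at 20 uM" and "ZScore at 10 uM" groups the first two under "% Inhibition" and keeps the third under "ZScore at 10". That shows why prefix grouping is only a first pass: concentration-specific variants of one metric can still end up in separate groups.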