Data mining methodologies for pharmacovigilance

Abdelfattah Al Zaqqa, School of Computer Science, Princess Sumaya University for Technology (PSUT), Amman, Jordan


Transcript of Data mining methodologies for pharmacovigilance

Page 1: Data mining methodologies for pharmacovigilance

Data Mining Methodologies for Pharmacovigilance

ABDELFATTAH AL ZAQQA

SCHOOL OF COMPUTER SCIENCE

PRINCESS SUMAYA UNIVERSITY FOR TECHNOLOGY

Abdelfattah Al Zaqqa, PSUT-Amman-Jordan

Page 2: Data mining methodologies for pharmacovigilance

Agenda

Introduction

Examples

Some facts about ADRs and drugs

Pharmacovigilance

PhV methodologies

Data mining

Computational methodology-Pre-Marketing

Computational methodology-Post Marketing

Future perspectives


Page 3: Data mining methodologies for pharmacovigilance

Introduction

Medicine is the applied science or practice of the diagnosis, treatment, and prevention of disease.

Most medicines have both good and bad effects.

The bad effects are called adverse drug reactions (ADRs); they differ from side effects, which may be either therapeutic or adverse.

ADRs cause over 700,000 emergency department visits each year in the United States.

Page 4: Data mining methodologies for pharmacovigilance

Example of ADRs and side effects

Desired and undesired effects of aspirin therapy:

Reduces headache or fever (desired)

Reduces the ability of the blood to clot

Bleeding of the intestine (undesired)

Page 5: Data mining methodologies for pharmacovigilance

Facts

Developing a new drug may take 10 years and billions of dollars.

ADRs may lead to drug withdrawals.

Drug interactions may also increase the risk of ADRs.

ADRs may cause over 100,000 deaths among hospitalized patients each year.

ADRs are the fourth largest cause of death in the US.

The annual cost of ADRs in the US is $136 billion.

Page 6: Data mining methodologies for pharmacovigilance

Pharmacovigilance (PhV)

Pharmacovigilance (PhV) is the science concerned with the detection, assessment, understanding, and prevention of ADRs.

Pharmacovigilance (PhV) = drug safety surveillance

Surveillance covers both premarketing (i.e., data from preclinical and clinical trials) and post-marketing (i.e., throughout a drug's market life).

The trend in PhV is to link preclinical human-safety information with information from post-marketing.

Page 7: Data mining methodologies for pharmacovigilance

PhV methodologies

PhV has historically relied on biological experiments or manual review of case reports.

In vitro safety pharmacology profiling (SPP), which tests compounds with biochemical and cellular assays, is one of the fundamental preclinical methods.

SPP is still not efficient in cost or time.

Page 8: Data mining methodologies for pharmacovigilance

Computational methodologies for PhV

The data to be analyzed are vast in quantity and complexity.

Computational methods at both the premarketing and post-marketing stages are more efficient in time and cost (i.e., they can accurately detect ADRs in a timely fashion).

SPP is still not efficient (cost and time).

Datasets are available.

EMA and NCA are examples of specialized organizations that maintain and develop databases of ADRs.

Page 9: Data mining methodologies for pharmacovigilance

What is data mining?

Data mining is the process of extracting previously unknown, valid, and actionable information from large information sources or databases.

So what do we need for this process?

Project goals: detection and prevention of ADRs

Dataset acquisition: available

Data cleaning and preprocessing: organize the raw data obtained

Data mining: extract useful information

Data interpretation: analysis of the data

Utilization: putting the results to use

Page 10: Data mining methodologies for pharmacovigilance

Computational methodology-Pre-Marketing

Most existing research is devoted to developing computational methods.

This research can be categorized into:

I. Protein target-based approaches.

II. Chemical structure-based approaches.

III. Integrative approaches.

Page 11: Data mining methodologies for pharmacovigilance

Computational methodology-Pre-Marketing-Protein target-based

Drugs typically work by activating or inhibiting the function of a protein, which in turn results in therapeutic benefits to a patient.

Drugs with similar in vitro protein binding profiles tend to have similar side effects (Fliri et al.).

Fukuzaki et al. proposed a method to predict ADRs using sub-pathways called "cooperative pathways" (pathways that function together).

They developed an algorithm called CoopeRativE Pathway Enumerator (CREPE) to select combinations of sub-pathways.

The method depends on the availability of gene-expression data observed under identical conditions.

Page 12: Data mining methodologies for pharmacovigilance

CoopeRativE Pathway Enumerator (CREPE)

Figure legend: V = vertex, I = itemset (activation conditions)

Page 13: Data mining methodologies for pharmacovigilance

Computational methodology-Pre-Marketing-Protein target-based

More recently, Brouwers et al. proposed that the side-effect similarity of drugs could be attributed to their target proteins being close in a molecular network.

They proposed a pathway neighborhood measure to assess the closest distance between drug pairs according to their target proteins in the human protein-protein interaction network, and found that network neighborhoods account for only 5.8% of the side-effect similarities, compared to 64% for shared drug targets.
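The idea of a closest distance between the targets of two drugs in an interaction network can be sketched with a breadth-first search. This is only an illustration on a made-up toy network: the function name, the adjacency-dictionary format, and the protein labels P1 to P5 are assumptions, and it is not the actual measure or data used by Brouwers et al.

```python
from collections import deque


def closest_target_distance(ppi, targets_a, targets_b):
    """Fewest interaction edges between any target of drug A and any target of drug B.

    ppi is an undirected network given as {protein: [neighbouring proteins]}.
    Returns 0 when the drugs share a target and None when no path exists.
    """
    targets_b = set(targets_b)
    seen = set(targets_a)
    # Multi-source BFS: every target of drug A starts at distance 0.
    queue = deque((protein, 0) for protein in seen)
    while queue:
        protein, distance = queue.popleft()
        if protein in targets_b:
            return distance
        for neighbour in ppi.get(protein, ()):
            if neighbour not in seen:
                seen.add(neighbour)
                queue.append((neighbour, distance + 1))
    return None


# Hypothetical toy network: a chain P1 - P2 - P3 - P4 and an isolated P5.
toy_ppi = {
    "P1": ["P2"],
    "P2": ["P1", "P3"],
    "P3": ["P2", "P4"],
    "P4": ["P3"],
    "P5": [],
}

print(closest_target_distance(toy_ppi, {"P1"}, {"P4"}))
print(closest_target_distance(toy_ppi, {"P1"}, {"P1", "P4"}))
```

A distance of 0 corresponds to the shared-target case, while small positive distances correspond to the network-neighborhood case discussed on this slide.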


Page 14: Data mining methodologies for pharmacovigilance

Computational methodology-Pre-Marketing-Protein target-based

Pouliot et al. applied logistic regression (LR) models to identify potential ADRs manifesting in 19 specific system organ classes (SOCs), as defined by the Medical Dictionary for Regulatory Activities, across 485 compounds in 508 BioAssays in the PubChem database.

The models were evaluated using leave-one-out cross-validation. The mean AUCs (area under the receiver operating characteristic curve) ranged from 0.60 to 0.92 across different SOCs.
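For readers unfamiliar with the AUC, the sketch below computes it directly from its probabilistic meaning: the chance that a randomly chosen positive case receives a higher score than a randomly chosen negative case, with ties counted as one half. The function name and the example labels and scores are illustrative assumptions and are not data from Pouliot et al.

```python
def roc_auc(labels, scores):
    """Area under the ROC curve by pairwise comparison.

    labels: 1 for compounds with the ADR, 0 for compounds without it.
    scores: model outputs, e.g. held-out predicted probabilities.
    """
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("AUC needs at least one positive and one negative case")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0   # positive ranked above negative
            elif p == n:
                wins += 0.5   # tie counts as half
    return wins / (len(positives) * len(negatives))


# Made-up held-out predictions for four compounds.
example_labels = [1, 1, 0, 0]
example_scores = [0.9, 0.4, 0.6, 0.1]
print(roc_auc(example_labels, example_scores))
```

An AUC of 0.5 corresponds to random ranking and 1.0 to perfect ranking, which gives a scale for reading the 0.60 to 0.92 range reported above.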


Page 15: Data mining methodologies for pharmacovigilance

Chemical Structure-based Approach-premarketing

This approach attempts to link ADRs to the chemical structure of drugs.

Bender et al. explored this correlation; the positive predictive value was quite low (under 0.5), but the study at least proved the concept.

Hammann et al. employed decision trees to determine the chemical, physical, and structural properties of compounds that predispose them to causing ADRs.

Hammann et al. focused on ADRs in the central nervous system (CNS), liver, and kidney.

Their decision tree models achieved predictive accuracies ranging from 78.9% to 90.2%.

Page 16: Data mining methodologies for pharmacovigilance

Chemical Structure-based Approach-premarketing

Pauwels et al. developed a sparse canonical correlation analysis (SCCA) method to predict high-dimensional side-effect profiles of drug molecules based on their chemical structures.

SCCA examines the relationships among many variables of different types simultaneously.

They predicted 1385 side effects in the SIDER database from the chemical structures of 888 approved drugs.

The best resulting AUCs (area under the curve) ranged from 0.6088 to 0.8932.

Page 17: Data mining methodologies for pharmacovigilance

Integrative Approach-premarketing

Huang et al. proposed a new computational framework to predict ADRs by integrating systems biology data, including protein targets, the protein-protein interaction network, gene ontology (GO) annotation, and reported side effects. They predicted heart-related ADRs (i.e., cardiotoxicity), with a highest AUC of 0.771.

Recently, Liu et al. investigated the use of phenotypic information, together with chemical and biological properties of drugs, to predict ADRs using five machine learning algorithms: LR, Naïve Bayes (NB), K-Nearest Neighbor (KNN), Random Forest (RF), and support vector machine (SVM).

Page 18: Data mining methodologies for pharmacovigilance

Integrative Approach

The integration of chemical, biological, and phenotypic properties outperforms the chemical structure-based method (from 0.9054 to 0.9524 with SVM) and has the potential to detect clinically important ADRs at both the preclinical and post-market phases of drug surveillance.

Page 19: Data mining methodologies for pharmacovigilance

Post Marketing

Many ADRs may still be missed because clinical trials are often small, short, and biased by the exclusion of patients with comorbid diseases.

Trials do not mirror actual clinical use situations for diverse populations (e.g., inpatients).

Thus it is important to continue surveillance post-market.

Page 20: Data mining methodologies for pharmacovigilance

Computational methodology-Post marketing-Data sources

Spontaneous reporting systems (SRSs) have been the core data-collection system for post-marketing drug surveillance since 1960. The US FDA and VigiBase maintain such reports.

The World Health Organization (WHO) manages these SRSs.

Page 21: Data mining methodologies for pharmacovigilance

Post marketing-Spontaneous Reports

Disproportionality analysis (DPA) involves frequency analyses of 2x2 contingency tables to quantify the degree to which a drug and an ADR co-occur "disproportionally" compared with what would be expected if there were no association.

            ADR          No ADR     Total
Drug        a            b          n = a + b
No drug     c            d          c + d
Total       m = a + c    b + d      t = a + b + c + d

Page 22: Data mining methodologies for pharmacovigilance

Post marketing-Spontaneous Reports

Many approaches have been applied; the most straightforward is the calculation of frequentist metrics.

Definitions of the frequentist measures of association, where a, b, c, and d are the cell counts of the 2x2 contingency table, n = a + b, m = a + c, and t = a + b + c + d:

Association measure                  Definition
Relative Reporting Ratio (RRR)       (t * a) / (m * n)
Proportional Reporting Ratio (PRR)   (a * (t - n)) / (c * n)
Reporting Odds Ratio (ROR)           (a * d) / (c * b)
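The three definitions above translate directly into code. The following is a minimal sketch, assuming the four cell counts of the 2x2 table are already available; the function name and the example counts are made up for illustration and do not come from any real reporting database.

```python
def dpa_measures(a, b, c, d):
    """Frequentist disproportionality measures for one drug-ADR pair.

    a: reports with the drug and the ADR
    b: reports with the drug, without the ADR
    c: reports without the drug, with the ADR
    d: reports with neither
    A zero in b, c, or a margin raises ZeroDivisionError; real systems
    usually handle such sparse cells separately.
    """
    n = a + b          # all reports mentioning the drug
    m = a + c          # all reports mentioning the ADR
    t = a + b + c + d  # all reports
    return {
        "RRR": (t * a) / (m * n),
        "PRR": (a * (t - n)) / (c * n),
        "ROR": (a * d) / (c * b),
    }


# Hypothetical counts: 20 of 100 drug reports mention the ADR,
# against 100 of 900 reports for all other drugs.
for name, value in dpa_measures(a=20, b=80, c=100, d=800).items():
    print(f"{name} = {value:.3f}")
```

Values near 1 indicate that the drug and the ADR are reported together about as often as expected under no association; larger values indicate disproportionate co-reporting.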


Page 23: Data mining methodologies for pharmacovigilance

Post marketing-Spontaneous Reports

Other, more complex algorithms have also been developed, such as the gamma-Poisson shrinker (GPS) and the multi-item gamma-Poisson shrinker (MGPS).

DPA methods are effective in detecting single drug-ADR associations.

Page 24: Data mining methodologies for pharmacovigilance

Data Mining Algorithms

DPA methods are effective in detecting single drug-ADR associations.

Data mining algorithms are used for multi-item ADR associations.

Harpaz et al. identified 1167 multi-item ADR associations using a set of 162,744 reports submitted to the FDA in 2008; 67% were validated by a domain expert.

Tatonetti et al. applied a biclustering algorithm to identify drug groups that share a common set of ADRs in SRS data.

They discovered ADRs arising between drugs that could not be discovered using DPA methods (e.g., pravastatin and paroxetine had an effect on blood glucose).

Page 25: Data mining methodologies for pharmacovigilance

Post marketing-Electronic Medical Records

An electronic medical record (EMR) is a computerized medical record created in an organization that delivers care. EMRs contain not only detailed patient information but also copious longitudinal clinical data.

EMR databases consist of data in two types of formats:

(1) structured (e.g., laboratory data)

Several groups have employed computational methods on structured or coded data in EMRs to identify specific ADR signals.

(2) unstructured (narrative clinical notes).

Page 26: Data mining methodologies for pharmacovigilance

Structured & unstructured

Page 27: Data mining methodologies for pharmacovigilance

Post marketing-Electronic Medical Records-structured data

Yoon et al. demonstrated that laboratory abnormalities are a valuable source for PhV by examining the odds ratio of laboratory abnormalities between a drug-exposed group and a matched unexposed group, using 10 years of EMR data.

Evaluation of their algorithm on 470 randomly selected drug-and-abnormal-lab-event pairs produced a positive predictive value of 0.837 and a negative predictive value of 0.659.
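The quantities named on this slide can be written out as short functions. This is a sketch under stated assumptions: the function and parameter names are hypothetical, the counts are invented, and it shows only the textbook definitions of the odds ratio and the predictive values, not the algorithm of Yoon et al.

```python
def odds_ratio(exposed_abnormal, exposed_normal, unexposed_abnormal, unexposed_normal):
    """Odds of a laboratory abnormality in the exposed group divided by the
    odds in the matched unexposed group."""
    return (exposed_abnormal * unexposed_normal) / (exposed_normal * unexposed_abnormal)


def predictive_values(tp, fp, tn, fn):
    """Return (PPV, NPV) from the counts of an evaluation against a reference.

    PPV: share of flagged drug-event pairs that are true ADRs.
    NPV: share of unflagged pairs that are truly not ADRs.
    """
    ppv = tp / (tp + fp)
    npv = tn / (tn + fn)
    return ppv, npv


# Invented counts: 30 of 100 exposed patients and 10 of 100 matched
# unexposed patients show the abnormal laboratory result.
print(round(odds_ratio(30, 70, 10, 90), 3))

# Invented evaluation of 100 drug-event pairs.
print(predictive_values(tp=40, fp=10, tn=30, fn=20))
```

An odds ratio well above 1 suggests the abnormality is more common under exposure, which is the kind of signal this approach looks for in structured EMR data.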


Page 28: Data mining methodologies for pharmacovigilance

Post marketing-Electronic Medical Records-Unstructured Data

Natural language processing (NLP) techniques are required to extract the needed information from unstructured data.

Wang et al. first employed NLP techniques to extract drug-ADR links.

Page 29: Data mining methodologies for pharmacovigilance

Non-conventional Data Sources-Post marketing

1. Biomedical Literature

Shetty and Dalal retrieved articles (published between 1949 and 2009) for prioritizing drug-ADR associations.

DPA was applied to identify statistically significant pairs from the thousands of pairs in the remaining articles.

Evaluation showed that the method identified true associations with 0.41 precision and 0.71 recall.
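Precision and recall for extracted drug-ADR pairs can be computed by comparing the predicted pairs with a reference set. The sketch below assumes both are available as sets of (drug, ADR) tuples; the function name and the example pairs are made up and are not taken from the study.

```python
def precision_recall(predicted_pairs, reference_pairs):
    """Precision and recall of predicted drug-ADR pairs against a reference set."""
    predicted = set(predicted_pairs)
    reference = set(reference_pairs)
    true_positives = len(predicted & reference)
    # Guard against empty sets so the function never divides by zero.
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(reference) if reference else 0.0
    return precision, recall


# Hypothetical output of a literature-mining step and a hypothetical reference.
predicted = {
    ("drugA", "nausea"),
    ("drugA", "rash"),
    ("drugB", "headache"),
    ("drugC", "dizziness"),
}
reference = {
    ("drugA", "nausea"),
    ("drugB", "headache"),
    ("drugD", "insomnia"),
}

precision, recall = precision_recall(predicted, reference)
print(f"precision = {precision:.2f}, recall = {recall:.2f}")
```

A result such as the reported 0.41 precision with 0.71 recall means most true associations were found, but more than half of the flagged pairs were not true associations.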


Page 30: Data mining methodologies for pharmacovigilance

Non-conventional Data Sources

2. Health Forums

Data posted by users on health-related websites may also contain valuable drug safety information.

Drug-ADR pairs can be mined from health-related websites (e.g., DailyStrength, http://www.dailystrength.org/).

System evaluation was conducted on a manually annotated set of 3600 user posts corresponding to 6 drugs. The system achieved 0.78 precision and 0.70 recall.

Page 31: Data mining methodologies for pharmacovigilance

Non-conventional Data Sources

Chee et al. aggregated individuals' opinions and reviews of drugs and used NLP techniques to group drugs.

Some drugs were withdrawn from the market based on these messages.

Page 32: Data mining methodologies for pharmacovigilance

Future perspectives

This presentation provides a general overview of the current computational methodologies applied to PhV, introducing basic concepts and highlighting some representative work.

It is desirable to incorporate various data sources into one framework to understand ADRs.

Data mining algorithms are applicable and useful for detecting drug interactions.

EMR data for ADR prediction are not readily accessible for data mining; more sophisticated studies and NLP techniques are needed.

Establishing cause-and-effect relationships is an intrinsically hard problem in data mining and needs to be further investigated for the PhV application.

Page 33: Data mining methodologies for pharmacovigilance

Useful links

https://www.mediguard.org

http://www.jmedicalcasereports.com

http://www-stat.stanford.edu/~tibs/Correlate/

http://www.iom.edu/

http://www.smartlogic.com/

http://blogs.sas.com/content/jmp/2012/06/04/disproportionality-analysis-is-coming-in-jmp-clinical-4-0/

Page 34: Data mining methodologies for pharmacovigilance

References

Oxford English Dictionary, definition of "medicine".

The Importance of Pharmacovigilance. WHO, 2002.

Budnitz, D.S., Pollock, D.A., Weidenbach, K.N., Mendelsohn, A.B., Schroeder, T.J. and Annest, J.L. National surveillance of emergency department visits for outpatient adverse drug events. JAMA, 296, 15 (Oct 18 2006), 1858-1866.

Hopkins, A.L. Network pharmacology: the next paradigm in drug discovery. Nat Chem Biol, 4, 11 (Nov 2008), 682-690.

Helma, C., Gottmann, E. and Kramer, S. Knowledge discovery and data mining in toxicology. Stat Meth Med Res, 9 (2000), 329-358.

http://articles.mercola.com/sites/articles/archive/2012/02/11/leading-causes-of-death-cost-for-us-economy.aspx

Mutsumi Fukuzaki, Mio Seki. Side Effect Prediction using Cooperative Pathways.

Page 35: Data mining methodologies for pharmacovigilance

Thank you!