
Where Innovation Is Tradition

Meta-analysis of genetic associations using knowledge representation

J. Enrique Herrera-Galeano

Jeff Solka

Colloquium

Bioinformatics and Computational Biology

Systems Biology

George Mason University

September 24th, 2013

Outline

1. Background

2. The problem & hypothesis

3. Motivating examples

4. Results

5. OGA application

Genetics

Hippocrates (460-370 BC), Celsus (25 BC-50 AD), and Galen (130-201 AD): descriptions of the human body

Mendel distinguished between the internal state (genotype) and the external state (phenotype)

Mendelian inheritance, biochemical pathway defects, metabolic disorders

Phenylketonuria, described by Ivar Asbjørn Følling in 1934, is a good example of an autosomal recessive disorder caused by mutations in a single gene

• This success prompted the search for “the gene for” everything

Genetic epidemiology

• Segregation analysis: analysis of pedigrees

• PCR (1980s): short tandem repeats (STRs), which are highly polymorphic and neutral to selection

• Whole-genome mapping (WGM), or linkage analysis

• 1990s: linkage of breast cancer to chromosome 17q (markers D17S588 and D17S250)
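Linkage analysis summarizes the evidence for linkage with a LOD score. The sketch below is a minimal illustration, assuming phase-known, fully informative meioses so that recombinants (R) and non-recombinants (NR) can simply be counted; the function names and the toy counts are ours, not from the talk. LOD(θ) = log10[θ^R (1 − θ)^NR / 0.5^(R+NR)] compares linkage at recombination fraction θ with free recombination (θ = 0.5).

```python
import math


def lod_score(recombinants: int, nonrecombinants: int, theta: float) -> float:
    """LOD score for phase-known, fully informative meioses.

    LOD(theta) = log10[theta^R * (1 - theta)^NR / 0.5^(R + NR)]
    """
    if not 0.0 <= theta <= 0.5:
        raise ValueError("theta must lie in [0, 0.5]")
    n = recombinants + nonrecombinants
    null = -n * math.log10(0.5)  # log10 of 1 / 0.5^n (free recombination)
    if theta == 0.0:
        # An observed recombinant is impossible when theta = 0.
        return float("-inf") if recombinants else null
    return (recombinants * math.log10(theta)
            + nonrecombinants * math.log10(1.0 - theta)
            + null)


def max_lod(recombinants: int, nonrecombinants: int, steps: int = 50):
    """Scan theta on the grid 0.5/steps, 2*0.5/steps, ..., 0.5; return (theta, LOD)."""
    grid = [0.5 * i / steps for i in range(1, steps + 1)]
    return max(((t, lod_score(recombinants, nonrecombinants, t)) for t in grid),
               key=lambda pair: pair[1])


if __name__ == "__main__":
    # Hypothetical family data: 2 recombinants out of 10 scored meioses.
    theta_hat, lod = max_lod(2, 8)
    print(f"theta = {theta_hat:.2f}, LOD = {lod:.2f}")
```

A maximum LOD of 3 or more (odds of about 1,000:1 in favor of linkage) is the conventional threshold for declaring linkage; real pedigrees with unknown phase or missing genotypes need full likelihood software rather than simple counts.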

BRCA1 (chromosome 17q) and BRCA2 (chromosome 13q)

Not so simple

• Janine Altmüller best summarized these observations: “Positional cloning based on whole-genome screens in complex human disease has proved more difficult than originally had been envisioned…” (Altmüller, 2001)

Candidate Gene Approach

• 1990s: due to the limited success of WGM, take all the genes associated with the phenotype by different methods, find their polymorphisms, and genotype them.

• 2000s: human genome sequencing yields SNPs; Illumina GoldenGate arrays genotype thousands of SNPs across hundreds of genes, which raises the SNP selection problem (NP-complete).

Candidate Gene Approach

Herrera-Galeano, 2008

Metropolis Markov chain Monte Carlo (MCMC)

Min σ(distance)*

*The probability of a SNP being real: p = 0.3L + 0.2H + 0.2S + 0.1M + 0.1V, where L = Illumina score, H = heterozygosity (from dbSNP), S = success rate (from dbSNP), M = 1 if the SNP is a tag SNP in HapMap (0 otherwise), and V = the number of validation sources/10.
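The score p above is a plain weighted sum, and the selection step is a Metropolis search. The sketch below computes p exactly as written; everything else is an assumption for illustration, not a reconstruction of the 2008 method: the objective is taken to be the standard deviation of the gaps between consecutive selected SNP positions, SNPs are pre-filtered with a hypothetical cutoff on p, and the proposal move (swap one selected SNP for an unselected one), temperature and iteration count are our choices.

```python
import math
import random
import statistics


def snp_score(L, H, S, M, n_validation_sources):
    """p = 0.3L + 0.2H + 0.2S + 0.1M + 0.1V, with V = validation sources / 10."""
    V = n_validation_sources / 10.0
    return 0.3 * L + 0.2 * H + 0.2 * S + 0.1 * M + 0.1 * V


def gap_sd(positions):
    """Population standard deviation of gaps between consecutive selected SNPs."""
    pos = sorted(positions)
    gaps = [b - a for a, b in zip(pos, pos[1:])]
    return statistics.pstdev(gaps)


def metropolis_select(positions, k, n_iter=5000, temperature=1.0, seed=0):
    """Pick k of the (distinct) candidate positions, minimizing gap_sd.

    Each step swaps one selected SNP for an unselected one and accepts the
    move with the Metropolis rule min(1, exp(-delta / temperature)).
    Returns the best subset visited (sorted) and its cost.
    """
    if not 2 <= k <= len(positions):
        raise ValueError("need 2 <= k <= number of candidates")
    rng = random.Random(seed)
    candidates = list(positions)
    current = rng.sample(candidates, k)
    current_cost = gap_sd(current)
    best, best_cost = list(current), current_cost
    for _ in range(n_iter):
        outside = [pos for pos in candidates if pos not in current]
        if not outside:
            break
        proposal = list(current)
        proposal[rng.randrange(k)] = rng.choice(outside)
        cost = gap_sd(proposal)
        delta = cost - current_cost
        if delta <= 0 or rng.random() < math.exp(-delta / temperature):
            current, current_cost = proposal, cost
            if cost < best_cost:
                best, best_cost = list(proposal), cost
    return sorted(best), best_cost


if __name__ == "__main__":
    # Toy candidates (position, L, H, S, M, validation sources); invented numbers.
    snps = [(100, 0.9, 0.4, 0.95, 1, 3), (900, 0.8, 0.3, 0.90, 1, 2),
            (2100, 0.2, 0.1, 0.50, 0, 0), (3000, 0.9, 0.5, 0.99, 1, 4),
            (4000, 0.7, 0.2, 0.85, 0, 1), (5100, 0.95, 0.45, 0.97, 1, 5)]
    # Assumption: keep only SNPs whose quality score clears a cutoff of 0.5.
    kept = [pos for pos, *q in snps if snp_score(*q) >= 0.5]
    print(metropolis_select(kept, k=3, temperature=100.0))
```

Because the search is stochastic, the function returns the best subset it visits rather than the final state of the chain; the temperature controls how readily worse spacings are accepted and must be on the same scale as the gap standard deviation (base pairs here).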

Candidate Gene Approach

Example:

• PEAR1 (Herrera-Galeano, ATVB 2008)

Complex Human Disease

Neurological abnormalities (schizophrenia, depression), high blood pressure, LDL cholesterol, height, weight, BMI

Vp = Vg + Ve, where Vp = phenotypic variance, Vg = genetic variance, and Ve = environmental variance

Heritability in the broad sense: H = Vg/Vp (Falconer, 1993)
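The variance decomposition can be checked numerically. As a minimal sketch, assuming genetic and environmental effects are independent and normally distributed (so Vp = Vg + Ve holds with no covariance or interaction term), the simulation below draws G and E for a toy population and recovers H = Vg/Vp from the sample variances. The function names are hypothetical; in real data G is not observed, which is why in practice heritability is estimated from the resemblance between relatives.

```python
import random
import statistics


def broad_sense_heritability(vg: float, ve: float) -> float:
    """H = Vg / Vp with Vp = Vg + Ve (no G-E covariance or interaction term)."""
    return vg / (vg + ve)


def simulated_heritability(vg: float, ve: float, n: int = 50_000, seed: int = 1) -> float:
    """Simulate P = G + E with independent normal G and E; return var(G) / var(P)."""
    rng = random.Random(seed)
    g = [rng.gauss(0.0, vg ** 0.5) for _ in range(n)]
    e = [rng.gauss(0.0, ve ** 0.5) for _ in range(n)]
    p = [gi + ei for gi, ei in zip(g, e)]
    return statistics.pvariance(g) / statistics.pvariance(p)


if __name__ == "__main__":
    print(broad_sense_heritability(0.6, 0.4))  # exact value from the variance components
    print(simulated_heritability(0.6, 0.4))    # sampling estimate, close to the exact value
```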


Genome-wide association studies (GWAS)

• High-density arrays now allow millions of SNPs to be genotyped, leaving the SNP selection problem behind.

• Missing heritability

GWAS

Solutions to the missing heritability problem:

Epigenomics, other omics…

Epistatic effects:

1. Brute-force search with MapReduce in the cloud (Wang, 2011)

2. Random handfuls (Province, 2008)

3. Machine learning (Lin, 2012)

4. Information theory (Lee, 2012)
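The brute-force approach to epistasis (item 1) amounts to testing every pair of SNPs. The single-machine sketch below, assuming genotypes coded 0/1/2 and case/control status coded 1/0, scores each pair with a Pearson chi-square on a 2 × 9 table of status by joint genotype; a MapReduce job would distribute the same pair loop across workers. The function names are hypothetical, and this 8-df test picks up main effects as well as interaction, so it is a screen rather than a pure epistasis test.

```python
from itertools import combinations


def chi_square(table):
    """Pearson chi-square statistic for a contingency table given as a list of rows."""
    row_tot = [sum(row) for row in table]
    col_tot = [sum(col) for col in zip(*table)]
    n = sum(row_tot)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_tot[i] * col_tot[j] / n
            if expected > 0:  # skip genotype combinations nobody carries
                stat += (observed - expected) ** 2 / expected
    return stat


def pair_table(snp_a, snp_b, status):
    """2 x 9 table: rows = control (0) / case (1), columns = joint genotype 3*a + b."""
    table = [[0] * 9 for _ in range(2)]
    for a, b, y in zip(snp_a, snp_b, status):
        table[y][3 * a + b] += 1
    return table


def brute_force_pairs(genotypes, status, top=5):
    """Test every SNP pair exhaustively; return the `top` (statistic, i, j) tuples."""
    scores = [
        (chi_square(pair_table(genotypes[i], genotypes[j], status)), i, j)
        for i, j in combinations(range(len(genotypes)), 2)
    ]
    scores.sort(reverse=True)
    return scores[:top]
```

With m SNPs there are m(m − 1)/2 pairs (about 5 × 10^11 for a million SNPs), which is why this search is pushed to clusters or the cloud.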

The problem/hypothesis

• All of these approaches focus on the search space of genotypes; the relationships among phenotypes are currently unused.

• Are closely related phenotypes associated with the same genes?

• What methodology can be used to answer such a question?

GWAS General Well Being

A QTL clearly related to mental disorders: what if a related SNP were associated with a related phenotype?

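GWAS General Well Being Example

Columns: ObsHET = observed heterozygosity; MAF = minor allele frequency; HWpval = Hardy-Weinberg equilibrium P-value; Fxn_Class = functional class; NA = not available.

RsNumber Pvalue Position ObsHET MAF HWpval Genes Fxn_Class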

rs11588923 0.04847 147983660 0.066 0.034 1 LOC729130 intron

rs1046332 2.00E-07 148084132 0.038 0.019 1 NA NA

rs15931 5.90E-10 148122974 0.032 0.016 1 HIST2H2BE mrna-utr

rs1451641 2.30E-10 148132504 0.031 0.016 1 NA NA

rs1349532 2.30E-10 148137627 0.031 0.016 1 BOLA1 locus-region

rs12078573 0.00402 148170233 0.092 0.052 0.1476 MTMR11 intron

rs10494363 5.60E-11 148176119 0.03 0.015 1 NA NA

rs16841623 0.04478 148204570 0.116 0.059 0.3557 OTUD7B intron

rs16841697 0.04478 148205144 0.116 0.059 0.3557 OTUD7B intron

rs16832993 0.03906 148234790 0.116 0.059 0.3557 OTUD7B intron
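Two of the columns are linked by Hardy-Weinberg expectations: under equilibrium the expected heterozygosity is 2pq, and HWpval tests whether genotype counts depart from p², 2pq, q². A minimal sketch, assuming a biallelic SNP and raw genotype counts (which the table does not list), with function names of our own:

```python
import math


def expected_heterozygosity(maf: float) -> float:
    """Hardy-Weinberg expected heterozygosity 2pq for a biallelic SNP."""
    return 2.0 * maf * (1.0 - maf)


def hwe_chi_square(n_hom_major: int, n_het: int, n_hom_minor: int) -> float:
    """1-df Pearson chi-square for departure from Hardy-Weinberg proportions."""
    n = n_hom_major + n_het + n_hom_minor
    p = (2 * n_hom_major + n_het) / (2 * n)
    q = 1.0 - p
    expected = [n * p * p, 2 * n * p * q, n * q * q]
    observed = [n_hom_major, n_het, n_hom_minor]
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected) if e > 0)


def chi_square_1df_pvalue(stat: float) -> float:
    """Upper-tail P-value of a chi-square with 1 df: erfc(sqrt(stat / 2))."""
    return math.erfc(math.sqrt(stat / 2.0))
```

For the first row (MAF 0.034), 2pq ≈ 0.066, in line with the reported ObsHET. With minor allele frequencies this low, an exact HWE test is usually preferred to the chi-square approximation, so this sketch would not necessarily reproduce the HWpval column.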

Ontologies and Genetic Associations

Requirements:

• A phenotype ontology: the Human Phenotype Ontology (HPO; Robinson, 2010)

• A database of genetic associations (NCBI Genetic Association Database, GAD)

Ontologies and Genetic Associations

Columbia Medical Entity Dictionary (MED): a semantic network built from ICD-10, SNOMED, and UMLS

Is-a relationships
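Is-a links turn the ontology into a directed acyclic graph, so a query about a broad concept should also cover every concept beneath it. A minimal sketch, assuming the hierarchy is available as (child, parent) pairs; the function name and the toy hierarchy in the example are invented for illustration and are not taken from HPO or MED:

```python
from collections import deque


def descendants(is_a_edges, root):
    """Return `root` plus every concept below it, following is-a links downward.

    `is_a_edges` holds (child, parent) pairs; a concept may have several parents,
    so the hierarchy is a DAG and `seen` prevents visiting a node twice.
    """
    children = {}
    for child, parent in is_a_edges:
        children.setdefault(parent, []).append(child)
    seen = {root}
    queue = deque([root])
    while queue:
        node = queue.popleft()
        for child in children.get(node, []):
            if child not in seen:
                seen.add(child)
                queue.append(child)
    return seen


if __name__ == "__main__":
    # Toy hierarchy for illustration only; "Psychosis" has two parents.
    edges = [("Mood disorder", "Mental disorder"),
             ("Depression", "Mood disorder"),
             ("Schizophrenia", "Mental disorder"),
             ("Psychosis", "Schizophrenia"),
             ("Psychosis", "Mood disorder"),
             ("Hypertension", "Cardiovascular abnormality")]
    print(sorted(descendants(edges, "Mental disorder")))
```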


Human Phenotype Ontology

Linking HPO with GAD

• How can the ontology concepts be matched with the genetic association database entries?

Overlapping matching sets:

Figure: overlapping sets of concepts matching “Coronary”, “Artery”, and “Disease”, highlighting the concepts that match “Coronary Artery Disease”.

Linking HPO with GAD

Pattern matching: find a string s in a text T

Finite-state automaton (grep)

BLAST

Suffix tree/array
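Of the pattern-matching options listed, the suffix array is the one used here. The sketch below shows the idea: sort all suffix start positions of T, then binary-search for the block of suffixes that begin with s. It is a minimal illustration with hypothetical function names; the naive construction is O(n² log n), whereas production tools use faster algorithms, and OGA's own index format (Concepts.data.bis) is not reproduced.

```python
def build_suffix_array(text):
    """Naive construction: sort suffix start positions by the suffix they begin."""
    return sorted(range(len(text)), key=lambda i: text[i:])


def find_occurrences(text, sa, pattern):
    """Binary-search the suffix array for all start positions of `pattern`."""
    m = len(pattern)
    # Lower bound: first suffix whose length-m prefix is >= pattern.
    lo, hi = 0, len(sa)
    while lo < hi:
        mid = (lo + hi) // 2
        if text[sa[mid]:sa[mid] + m] < pattern:
            lo = mid + 1
        else:
            hi = mid
    start = lo
    # Upper bound: first suffix whose length-m prefix is > pattern.
    hi = len(sa)
    while lo < hi:
        mid = (lo + hi) // 2
        if text[sa[mid]:sa[mid] + m] <= pattern:
            lo = mid + 1
        else:
            hi = mid
    return sorted(sa[start:lo])


if __name__ == "__main__":
    # Concept names joined with "|" so a pattern without "|" cannot span two names.
    names = ["coronary artery disease", "kidney disease", "artery calcification"]
    text = "|".join(names)
    sa = build_suffix_array(text)
    print(find_occurrences(text, sa, "artery"))
```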

Linking HPO with GAD

Suffix array:

One common word: percentage of assignment (41.1% vs. 27.5%), error rate 30% (one sample, n = 1,000)

Complete string matching: percentage of assignment 19%, error rate ~2% (5 samples, n = 1,000 each)
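The two assignment rules trade coverage for accuracy. A minimal sketch, assuming lower-cased whitespace tokenization and no stop-word list (the talk does not give these details; the function names are hypothetical), reusing the coronary artery disease example:

```python
def words(text):
    """Lower-cased set of whitespace-separated words."""
    return set(text.lower().split())


def match_one_common_word(concepts, entry):
    """Loose rule: every concept sharing at least one word with the entry."""
    entry_words = words(entry)
    return [c for c in concepts if words(c) & entry_words]


def match_complete_string(concepts, entry):
    """Strict rule: concepts whose full name occurs as a substring of the entry."""
    text = entry.lower()
    return [c for c in concepts if c.lower() in text]


if __name__ == "__main__":
    concepts = ["Coronary artery disease", "Kidney disease", "Artery calcification"]
    entry = "Coronary artery disease in smokers"
    print(match_one_common_word(concepts, entry))
    print(match_complete_string(concepts, entry))
```

The loose rule assigns this entry to all three concepts, while the strict rule keeps only the exact concept, which mirrors the higher assignment rate and higher error rate of one-word matching. The substring test is not word-bounded, so a real implementation would also check word boundaries.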

OGA Entity Relationship Diagram

SQLite DBs

OGA Simplified UML Diagram

Mockup OGA

OGA Implementation

OGA Package contents

File: Description

oga.jar: the Java JAR file containing all the classes needed to run the application

merge.db: the SQLite database that implements the database design (see Methods)

Concepts.data: the names of the HPO concepts

Concepts.data.bis: the index supporting suffix-array-based pattern matching

Libraries: sqlite-jdbc-3.7.2.jar, a dependency for connecting to the SQLite database
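merge.db stores the ontology and the associations in SQLite. The schema below is hypothetical (table and column names are ours, since the entity relationship diagram is not reproduced here), but it shows how a recursive common table expression can collect every gene linked to a concept or any of its descendants and count phenotypes per gene, using Python's built-in sqlite3 module and placeholder genes. Recursive CTEs require SQLite 3.8.3 or later, which is newer than the sqlite-jdbc-3.7.2 dependency listed above.

```python
import sqlite3

SCHEMA = """
CREATE TABLE concept (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE is_a (child_id INTEGER, parent_id INTEGER);
CREATE TABLE association (concept_id INTEGER, gene TEXT);
"""

# Walk down the is-a hierarchy from the named concept, then count, for each
# gene, how many distinct concepts in that subtree it is associated with.
QUERY = """
WITH RECURSIVE subtree(id) AS (
    SELECT id FROM concept WHERE name = ?
    UNION
    SELECT is_a.child_id FROM is_a JOIN subtree ON is_a.parent_id = subtree.id
)
SELECT gene, COUNT(DISTINCT concept_id) AS n_phenotypes
FROM association
WHERE concept_id IN (SELECT id FROM subtree)
GROUP BY gene
ORDER BY n_phenotypes DESC, gene
"""


def genes_under(conn, concept_name):
    """Genes linked to a concept or any of its descendants, with phenotype counts."""
    return conn.execute(QUERY, (concept_name,)).fetchall()


def toy_database():
    """In-memory database with a small invented hierarchy and placeholder genes."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.executemany("INSERT INTO concept VALUES (?, ?)", [
        (1, "Mental disorder"), (2, "Mood disorder"), (3, "Depression"),
        (4, "Schizophrenia"), (5, "Hypertension")])
    conn.executemany("INSERT INTO is_a VALUES (?, ?)", [(2, 1), (3, 2), (4, 1)])
    conn.executemany("INSERT INTO association VALUES (?, ?)", [
        (3, "GENE_A"), (4, "GENE_A"), (2, "GENE_B"), (5, "GENE_C"), (5, "GENE_A")])
    return conn


if __name__ == "__main__":
    print(genes_under(toy_database(), "Mental disorder"))
```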

Genetic Associations on the Phenotype Ontology

Why these nine genes?

Gene Symbol Gene Name

BDNF brain-derived neurotrophic factor

CLOCK circadian locomoter output cycles kaput

CNR1 cannabinoid receptor 1

GHRL ghrelin/obestatin prepropeptide

HTR1B 5-hydroxytryptamine (serotonin) receptor 1B

HTR2A 5-hydroxytryptamine (serotonin) receptor 2A

HTR2C 5-hydroxytryptamine (serotonin) receptor 2C

SLC6A4 neurotransmitter transporter

TPH1 tryptophan hydroxylase 1

OGA: Ontology of Genetic Associations

OGA allows answering questions such as:

• What genes are associated with Mental Disorder?

• What is the intersection of genes between two or more phenotypes of interest?
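The second question reduces to set intersection once each phenotype's gene set has been collected (for example, including its child concepts). A minimal sketch with hypothetical function names and placeholder gene sets:

```python
from itertools import combinations


def shared_genes(gene_sets, phenotypes):
    """Genes associated with every phenotype in `phenotypes` (at least one required)."""
    return set.intersection(*(gene_sets[p] for p in phenotypes))


def pairwise_overlap(gene_sets):
    """Number of shared genes for each unordered pair of phenotypes."""
    return {(a, b): len(gene_sets[a] & gene_sets[b])
            for a, b in combinations(sorted(gene_sets), 2)}


if __name__ == "__main__":
    # Placeholder phenotypes and genes, invented for illustration.
    gene_sets = {"P1": {"G1", "G2", "G3"}, "P2": {"G2", "G3", "G4"}, "P3": {"G3", "G5"}}
    print(shared_genes(gene_sets, ["P1", "P2", "P3"]))
    print(pairwise_overlap(gene_sets))
```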

Observed count of phenotypes per gene: Mental Disorder concept

Gene (gene name): phenotype count

SLC6A4 (neurotransmitter transporter): 20

NOS1 (nitric oxide synthase 1): 16

HLA-A (major histocompatibility complex, class I, A): 13

APOE (apolipoprotein E): 11

HLA-DRB1 (major histocompatibility complex, class II, DR beta 1): 10

NOS2A (nitric oxide synthase 2, inducible): 10

TOR1A (torsin family 1, member A): 10

TOR1B (torsin family 1, member B): 10

BCHE (butyrylcholinesterase): 9

CCL2 (chemokine (C-C motif) ligand 2): 9

SERPINI1 (serpin peptidase inhibitor, clade I): 9

VLDLR (very low density lipoprotein receptor): 9

MAOA (monoamine oxidase A): 8

Are the phenotype counts found by chance?

• Empirical p-value

Empirical p-value = 1 / (C'_1 + C'_2 + ... + C'_n)
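The slide does not define C'_i. The figures on the following slide (SLC6A4: 178 iterations, p-value = 0.0056; NOS1: 41 iterations, p-value = 0.02) are consistent with each iteration contributing C'_i = 1, so that p = 1/n; the sketch below assumes that reading. For comparison it also includes the common permutation estimator (b + 1)/(n + 1), a standard formula that is not the one on the slide. Both function names are hypothetical.

```python
def empirical_p_reciprocal(n_iterations: int) -> float:
    """Slide formula 1 / sum(C'_i), assuming every iteration contributes C'_i = 1."""
    return 1.0 / n_iterations


def empirical_p_permutation(null_stats, observed) -> float:
    """Standard permutation estimate (b + 1) / (n + 1), where b counts null
    statistics at least as extreme as the observed one. Shown for comparison;
    this is not the formula on the slide."""
    b = sum(1 for s in null_stats if s >= observed)
    return (b + 1) / (len(null_stats) + 1)


if __name__ == "__main__":
    print(round(empirical_p_reciprocal(178), 4))  # SLC6A4 figures: 0.0056
    print(round(empirical_p_reciprocal(41), 2))   # NOS1 figures: 0.02
```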

OGA preliminary statistics

GAD has 84,558 entries

23,303 unique matches (27.5%)

SLC6A4: 20 phenotypes, 178 iterations, p-value = 0.0056

NOS1: 16 phenotypes, 41 iterations, p-value = 0.02

All others: p-value > 0.05

SLC6A4, MAOA, NOS1, NOS2A and NOS3

Information network

Figure: network linking SLC6A4 (regulates serotonin), MAOA (degrades serotonin), and an “oxidase” link to NOS1 and NOS2A

Antioxidants and depression?


Neurocarta


OGA vs Neurocarta

OGA Neurocarta

Number of links 98,698 30,000

Number of concepts 2,708 2,000

Number of genes 4,666 7,000

Backbone HPO HPO, DO, MPO

Curated No Yes

Statistical analysis Yes No

Interface Standalone Website


Top 10 genes by phenotype count

Gene Phenotypes in OGA

ACE 1,923

NOS3 1,659

APOE 1,573

GJB2 1,042

HLA-DRB1 1,008

AGT 971

MTHFR 960

NOS1 866

TNF 770

HLA-DQB1 689


Top 10 phenotypes by gene count

Phenotype Genes in OGA

Alzheimer’s Disease 2,433

Schizophrenia 1,816

Colorectal Cancer 1,581

Hypertension 1,251

Breast Cancer 1,211

Asthma 911

Osteosclerosis 798

Rheumatoid arthritis 687

Myocardial infarction 643

Obesity 641


Motivating examples

1. Colon cancer and Helicobacter pylori infection susceptibility

2. Lipid metabolism, diabetes, obesity, and hypertension

3. Schizophrenia, bulimia, depression and psychosis

4. Autism and Cerebral palsy

Motivating examples

1. Colon cancer and Helicobacter pylori infection susceptibility

Strofilas et al., 2012: colon cancer and H. pylori infection

O'Donoghue, 2011: CYP2C19 and H. pylori

Yamamoto et al., 2013: CYP2C19 and cancer

CYP2C19 is the symbol for the cytochrome P450, family 2, subfamily C, polypeptide 19 gene

Motivating examples

2. Lipid metabolism, diabetes, obesity, and hypertension

Gene symbol (gene name): comment (associations according to OMIM)

APOE (apolipoprotein E): Alzheimer disease-2; hyperlipoproteinemia, type III; myocardial infarction susceptibility

ACE (angiotensin I-converting enzyme): myocardial infarction susceptibility; Alzheimer disease; stroke

CETP (cholesteryl ester transfer protein): hyperalphalipoproteinemia

AGT (angiotensinogen): hypertension

IL6 (interleukin 6): diabetes

FGB (fibrinogen B, beta polypeptide): none

PON1 (paraoxonase 1): coronary artery disease; microvascular complications of diabetes

LPL (lipoprotein lipase): combined hyperlipidemia, familial

MTHFR (5,10-methylenetetrahydrofolate reductase): vascular disease; schizophrenia

Motivating examples

2. Lipid metabolism, diabetes, obesity, and hypertension

Figure: network view (Cytoscape)

Motivating examples

3. Schizophrenia, bulimia, depression and psychosis

Gene symbol (gene name): comment

HTR2A (5-hydroxytryptamine [serotonin] receptor 2A, G protein-coupled): a neurotransmitter receptor associated with depression, schizophrenia, and anorexia

SLC6A3 (solute carrier family 6 [neurotransmitter transporter, dopamine], member 3): eating disorders; attention deficit-hyperactivity disorder; major affective disorder 1

SLC6A4 (solute carrier family 6 [neurotransmitter transporter, serotonin], member 4): anxiety; obsessive-compulsive disorder

Empirical p-value < 0.001


Motivating examples

3. Schizophrenia, bulimia, depression and psychosis

Motivating examples

4. Autism and cerebral palsy

Gene symbol (gene name): comment (OMIM)

PTGS2 (prostaglandin-endoperoxide synthase 2 [prostaglandin G/H synthase and cyclooxygenase]): prostaglandin synthesis

APOE (apolipoprotein E): Alzheimer disease-2; hyperlipoproteinemia, type III; myocardial infarction susceptibility

SERPINE1 (serpin peptidase inhibitor, clade E [nexin, plasminogen activator inhibitor type 1], member 1): plasminogen activator inhibitor-1 deficiency (613329); {transcription of plasminogen activator inhibitor, modulator of}

TFPI (tissue factor pathway inhibitor [lipoprotein-associated coagulation inhibitor]): also known as lipoprotein-associated coagulation inhibitor

ITGB3 (integrin, beta 3 [platelet glycoprotein IIIa, antigen CD61]): Glanzmann thrombasthenia; purpura, posttransfusion; thrombocytopenia, neonatal alloimmune; susceptibility to myocardial infarction

TNF (tumor necrosis factor): asthma; cardiovascular disease

MTHFR (5,10-methylenetetrahydrofolate reductase): vascular disease; schizophrenia

Conclusions

1. An indexing algorithm for pattern matching has been implemented to link HPO and GAD with a low error rate (~2%)

2. OGA has been implemented and released

3. One motivating example shows results that deviate from the null hypothesis

4. A Bioinformatics paper is under review

Future directions

1. Evaluation of observations by network analysis or combined-effect models

2. Extension of OGA by adding GWASdb

3. Extension of the backbone to include the Disease Ontology (DO)

4. Automated updates

5. Automated model evaluation using dbGaP data


Thank you

Dr. Jeffrey Solka

Dr. Iosif Vaisman

Dr. Patrick M Gillevet

Dr. David Hirschberg

Dr. Vishwesh Mokashi