Post on 25-Dec-2015
Where Innovation Is Tradition
Meta-analysis of genetic associations using knowledge representation
J. Enrique Herrera-Galeano
Jeff Solka
Colloquium
Bioinformatics and Computational Biology
Systems Biology
George Mason University
September 24th, 2013
Outline
1. Background
2. The problem & Hypothesis
3. Motivating examples
4. Results
5. OGA application
Genetics
Hippocrates (460-370 BC), Celsus (25 BC-50 AD), and Galen (130-201 AD): description of the human body
Mendel distinguished between the internal state (genotype) and the external state (phenotype)
Mendelian inheritance, biochemical pathway defects, metabolic disorders
Phenylketonuria, described by Ivar Asbjørn Følling in 1934, is a good example of an autosomal recessive disorder caused by a single mutation
• <do> add gene </do>
• <do> add SNP </do>
• This prompted the search for "the gene for" everything
Genetic epidemiology
• Segregation analysis = analysis of pedigrees
• PCR (1980s): short tandem repeats (STRs), highly polymorphic and neutral to selection
• Whole Genome Mapping (WGM) or linkage analysis
• 1990s: linkage of breast cancer to chromosome 17q (D17S588 and D17S250)
BRCA1 (chromosome 17q) and BRCA2 (chromosome 13q)
Not as simple
• Janine Altmüller in 2001 best summarized these observations by stating “Positional cloning based on whole-genome screens in complex human disease has proved more difficult than originally had been envisioned…” (Altmüller, 2001)
Candidate Gene Approach
• 1990s: due to the limited success of WGM, take all the genes associated with the phenotype by different methods, find polymorphisms, and genotype them.
• 2000s: human genome sequencing -> SNPs. Illumina GoldenGate array: thousands of SNPs, hundreds of genes. SNP selection problem (NP-complete).
Candidate Gene Approach
Herrera-Galeano, 2008
Metropolis Markov chain Monte Carlo
Min(σ(distance))*
* The probability of a SNP being real: p = 0.3L + 0.2H + 0.2S + 0.1M + 0.1V, where
L = Illumina score
H = heterozygosity (from dbSNP)
S = success rate (from dbSNP)
M = 1 if present as a tag SNP in the HapMap, or zero if not
V = the number of validation sources / 10
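A minimal Python sketch of the two ingredients on this slide: the SNP score p and a Metropolis search that seeks evenly spaced SNPs (low standard deviation of the gaps between selected positions). The function names, the swap proposal, the temperature, and the candidate positions are all assumptions for illustration. The slide does not say how p and the spacing objective were combined, so the sketch keeps them separate.

```python
import math
import random
import statistics


def snp_probability(L, H, S, M, n_validation_sources):
    """Score from the slide: p = 0.3L + 0.2H + 0.2S + 0.1M + 0.1V, with V = sources / 10."""
    V = n_validation_sources / 10
    return 0.3 * L + 0.2 * H + 0.2 * S + 0.1 * M + 0.1 * V


def spacing_sd(positions):
    """Standard deviation of the gaps between consecutive selected positions."""
    ordered = sorted(positions)
    gaps = [b - a for a, b in zip(ordered, ordered[1:])]
    return statistics.pstdev(gaps)


def metropolis_select(candidates, k, steps=5000, temperature=50.0, seed=0):
    """Pick k of the candidate positions (k < len(candidates)), seeking a low spacing_sd."""
    rng = random.Random(seed)
    current = set(rng.sample(candidates, k))
    current_cost = spacing_sd(current)
    best, best_cost = set(current), current_cost
    for _ in range(steps):
        # Propose swapping one selected SNP for one unselected SNP.
        out_snp = rng.choice(sorted(current))
        in_snp = rng.choice([c for c in candidates if c not in current])
        proposal = (current - {out_snp}) | {in_snp}
        cost = spacing_sd(proposal)
        # Metropolis rule: always accept improvements, sometimes accept worse moves.
        accept_worse = math.exp((current_cost - cost) / temperature)
        if cost <= current_cost or rng.random() < accept_worse:
            current, current_cost = proposal, cost
            if cost < best_cost:
                best, best_cost = set(proposal), cost
    return sorted(best), best_cost


# Hypothetical candidate SNP positions along one gene region.
candidates = list(range(0, 10000, 137))
selected, cost = metropolis_select(candidates, k=8)
example_p = snp_probability(L=1.0, H=0.5, S=1.0, M=1, n_validation_sources=5)
```

The search keeps the best state seen so far, since a Metropolis chain may wander to a worse state by design.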
Candidate Gene Approach
Example:
• PEAR1 Herrera-Galeano, ATVB 2008
Complex Human Disease
Neurological abnormalities: schizophrenia, depression
High blood pressure
LDL cholesterol
Height
Weight
BMI
Vp = Vg + Ve
Vp = Phenotypic variance
Vg = Genetic variance
Ve = Environmental variance
Heritability in the broad sense H = Vg/Vp (Falconer, 1993)
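As a small worked example of the two formulas above, the following Python sketch computes broad-sense heritability from the variance components. The function name and the variance values are made up for illustration.

```python
def broad_sense_heritability(v_g, v_e):
    """H = Vg / Vp, with Vp = Vg + Ve as defined on the slide."""
    v_p = v_g + v_e
    if v_p <= 0:
        raise ValueError("phenotypic variance must be positive")
    return v_g / v_p


# Hypothetical variance components: Vg = 6, Ve = 4, so Vp = 10.
h = broad_sense_heritability(v_g=6.0, v_e=4.0)
```

With these made-up numbers, 60% of the phenotypic variance would be attributed to genetic variance.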
Genome-wide association studies (GWAS)
• High-density arrays now allow for millions of SNPs, leaving the SNP selection problem behind.
• Missing heritability
GWAS
Solutions to the missing heritability problem:
Epigenomics, other omics…
Epistatic effect:
1. Map/reduce for cloud brute force (Wang, 2011)
2. Random handfuls (Province, 2008)
3. Machine learning (Lin, 2012)
4. Information theory (Lee, 2012)
The problem/hypothesis
• All of these focus on the search space of the genotypes; the relationships among phenotypes are currently unutilized
• Are closely related phenotypes associated to the same genes?
• What methodology can be utilized to answer such a question?
GWAS General Well Being
QTL clearly related to mental disorders: what if a related SNP were associated with a related phenotype?
GWAS General Well Being Example
RsNumber    Pvalue    Position   ObsHET  MAF    HWpval  Genes      Fxn_Class
rs11588923  0.04847   147983660  0.066   0.034  1       LOC729130  intron
rs1046332   2.00E-07  148084132  0.038   0.019  1       NA         NA
rs15931     5.90E-10  148122974  0.032   0.016  1       HIST2H2BE  mrna-utr
rs1451641   2.30E-10  148132504  0.031   0.016  1       NA         NA
rs1349532   2.30E-10  148137627  0.031   0.016  1       BOLA1      locus-region
rs12078573  0.00402   148170233  0.092   0.052  0.1476  MTMR11     intron
rs10494363  5.60E-11  148176119  0.03    0.015  1       NA         NA
rs16841623  0.04478   148204570  0.116   0.059  0.3557  OTUD7B     intron
rs16841697  0.04478   148205144  0.116   0.059  0.3557  OTUD7B     intron
rs16832993  0.03906   148234790  0.116   0.059  0.3557  OTUD7B     intron
Ontologies and Genetic association
Requirements:
• Phenotype ontology: Human Phenotype Ontology (HPO) (Robinson, 2010)
• Database of genetic associations: NCBI Genetic Association Database (GAD)
Ontologies and Genetic association
Columbia Medical Entity Dictionary (MED): a semantic network from ICD-10, SNOMED, UMLS
Is-a relationship
Human Phenotype Ontology
Linking HPO with GAD
• How to match the ontology concepts with the genetic association database entries?
Overlapping matching sets:
Coronary
Artery
Disease
Concepts that match Coronary Artery Disease
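One way to read the overlapping sets above is as a per-word lookup followed by a set intersection. The Python sketch below illustrates that reading only; the function names and the short concept list are hypothetical and are not taken from OGA.

```python
from collections import defaultdict


def build_word_index(concepts):
    """Map each lowercase word to the set of concept names containing it."""
    index = defaultdict(set)
    for concept in concepts:
        for word in concept.lower().split():
            index[word].add(concept)
    return index


def concepts_matching_all_words(phrase, index):
    """Intersect the per-word match sets for every word in the phrase."""
    word_sets = [index.get(word, set()) for word in phrase.lower().split()]
    if not word_sets:
        return set()
    return set.intersection(*word_sets)


# Hypothetical concept names, for illustration only.
concepts = [
    "Coronary artery disease",
    "Coronary artery calcification",
    "Renal artery stenosis",
    "Celiac disease",
]
index = build_word_index(concepts)
matches = concepts_matching_all_words("Coronary Artery Disease", index)
```

Each single word matches several concepts, but only the concept containing all three words survives the intersection.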
Linking HPO with GAD
Pattern matching: Find string s in text T
Finite-state automaton (grep)
BLAST
Suffix tree/array
Linking HPO with GAD
Suffix array:
One common word: percentage of assignment (41.1% vs. 27.5%), error rate 30%, one sample of n = 1,000
Complete string matching: percentage of assignment 19%, error rate ~2% on 5 samples of n = 1,000
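For readers unfamiliar with suffix arrays, here is a minimal Python sketch of exact string matching with one. It uses a naive construction by sorting suffixes and a binary search for the pattern. It is not the OGA implementation, and the sample text is hypothetical.

```python
def build_suffix_array(text):
    """Naive construction: sort all suffix start positions by the suffix text."""
    return sorted(range(len(text)), key=lambda i: text[i:])


def find_occurrences(text, suffix_array, pattern):
    """Return the sorted start positions of pattern in text via binary search."""
    m = len(pattern)

    def prefix(rank):
        start = suffix_array[rank]
        return text[start:start + m]

    # Lower bound: first suffix whose m-character prefix is >= pattern.
    lo, hi = 0, len(suffix_array)
    while lo < hi:
        mid = (lo + hi) // 2
        if prefix(mid) < pattern:
            lo = mid + 1
        else:
            hi = mid
    first = lo
    # Upper bound: first suffix whose m-character prefix is > pattern.
    hi = len(suffix_array)
    while lo < hi:
        mid = (lo + hi) // 2
        if prefix(mid) <= pattern:
            lo = mid + 1
        else:
            hi = mid
    return sorted(suffix_array[first:lo])


# Hypothetical concept names joined into one text, one per line.
text = "coronary artery disease\ncarotid artery stenosis"
suffix_array = build_suffix_array(text)
hits = find_occurrences(text, suffix_array, "artery")
```

The index is built once and reused for every query, which is the reason to prefer it over scanning the text for each of the many database entries. A production index would use a faster construction than sorting full suffixes.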
OGA Entity Relationship Diagram
SQLite DBs
3/28/12
OGA Simplified UML Diagram
Mockup OGA
3/28/12
OGA Implementation
OGA Package contents
File | Description
oga.jar | The Java jar file that contains all the classes necessary to run the application
merge.db | The SQLite database that implements the database design (see methods)
Concepts.data | The names of the HPO concepts
Concepts.data.bis | The index to support the suffix array based pattern matching
Libraries | sqlite-jdbc-3.7.2.jar, a dependency to connect to the SQLite database
Genetic Associations on the Phenotype Ontology
Why these nine genes?
Gene Symbol Gene Name
BDNF brain-derived neurotrophic factor
CLOCK circadian locomotor output cycles
CNR1 cannabinoid receptor 1
GHRL ghrelin/obestatin prepropeptide
HTR1B 5-hydroxytryptamine (serotonin)
HTR2A 5-hydroxytryptamine (serotonin)
HTR2C 5-hydroxytryptamine (serotonin)
SLC6A4 neurotransmitter transporter
TPH1 tryptophan hydroxylase 1
OGA: Ontology of Genetic Associations
Allows for answering questions such as:
• What Genes are associated with Mental Disorder?
• What is the intersection of genes between two or more phenotypes of interest?
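The second question above can be sketched as a single SQL query. The Python example below uses the standard sqlite3 module with an in-memory database; the table name `association`, its columns, and the placeholder rows are assumptions for illustration and do not reflect the actual schema of merge.db.

```python
import sqlite3


def genes_shared_by(conn, phenotypes):
    """Return genes associated with every phenotype in the (distinct) list."""
    placeholders = ",".join("?" for _ in phenotypes)
    sql = (
        "SELECT gene FROM association "
        f"WHERE phenotype IN ({placeholders}) "
        "GROUP BY gene "
        "HAVING COUNT(DISTINCT phenotype) = ? "
        "ORDER BY gene"
    )
    rows = conn.execute(sql, (*phenotypes, len(phenotypes)))
    return [row[0] for row in rows]


# Hypothetical schema and placeholder rows, for illustration only.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE association (gene TEXT, phenotype TEXT)")
conn.executemany(
    "INSERT INTO association VALUES (?, ?)",
    [
        ("GENE_A", "phenotype_1"),
        ("GENE_A", "phenotype_2"),
        ("GENE_B", "phenotype_1"),
        ("GENE_C", "phenotype_2"),
        ("GENE_C", "phenotype_1"),
    ],
)
shared = genes_shared_by(conn, ["phenotype_1", "phenotype_2"])
```

Grouping by gene and counting distinct phenotypes scales to three or more phenotypes without changing the query shape.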
Observed count of phenotypes per gene : Mental Disorder Concept
Gene | Gene Name | Phenotype Count
SLC6A4 | neurotransmitter transporter | 20
NOS1 | nitric oxide synthase 1 | 16
HLA-A | major histocompatibility complex, class I, A | 13
APOE | apolipoprotein E | 11
HLA-DRB1 | major histocompatibility complex, class II, DR beta 1 | 10
NOS2A | nitric oxide synthase 2, inducible | 10
TOR1A | torsin family 1, member A | 10
TOR1B | torsin family 1, member B | 10
BCHE | butyrylcholinesterase | 9
CCL2 | chemokine (C-C motif) ligand 2 | 9
SERPINI1 | serpin peptidase inhibitor, clade I | 9
VLDLR | very low density lipoprotein receptor | 9
MAOA | monoamine oxidase A | 8
Phenotype counts found by chance?
• Empirical p-value
Empirical p-value: $p = 1 / \sum_{i=1}^{n} C'_i$
OGA preliminary stats
GAD has 84,558 entries
23,303 unique matches (27.5%)
SLC6A4 -> 20 phenotypes, 178 iterations, p-value = 0.0056
NOS1 -> 16 phenotypes, 41 iterations, p-value = 0.02
All others > 0.05
SLC6A4, MAOA, NOS1, NOS2A and NOS3
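The reported p-values are numerically consistent with taking one over the reported iteration count. The short Python check below assumes that reading; it is an inference from the numbers on this slide, not a definition given in the slides, and the function name is hypothetical.

```python
def empirical_p_from_iterations(iterations):
    """Assumed reading: empirical p-value = 1 / number of iterations."""
    if iterations < 1:
        raise ValueError("iterations must be at least 1")
    return 1 / iterations


# (iterations, reported p-value) as stated on the slide.
reported = {"SLC6A4": (178, 0.0056), "NOS1": (41, 0.02)}
computed = {
    gene: empirical_p_from_iterations(iterations)
    for gene, (iterations, _) in reported.items()
}
for gene, (iterations, p_reported) in reported.items():
    print(gene, iterations, round(computed[gene], 4), p_reported)
```

Under this reading, 1/178 rounds to the reported 0.0056 and 1/41 rounds to the reported 0.02.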
INFORMATION NETWORK
[Network diagram; labels: SLC6A4, Regulates, SEROTONIN, MAOA, Degrades, Oxidase, NOS1, NOS2A]
Antioxidants and depression?
Neurocarta
OGA vs Neurocarta
Feature | OGA | Neurocarta
Number of links | 98,698 | 30,000
Number of concepts | 2,708 | 2,000
Number of genes | 4,666 | 7,000
Backbone | HPO | HPO, DO, MPO
Curated | No | Yes
Statistical analysis | Yes | No
Interface | Standalone | Website
Top 10 genes by phenotype count
Gene Phenotypes in OGA
ACE 1,923
NOS3 1,659
APOE 1,573
GJB2 1,042
HLA-DRB1 1,008
AGT 971
MTHFR 960
NOS1 866
TNF 770
HLA-DQB1 689
Top 10 phenotypes by gene count
Phenotype Genes in OGA
Alzheimer’s Disease 2,433
Schizophrenia 1,816
Colorectal Cancer 1,581
Hypertension 1,251
Breast Cancer 1,211
Asthma 911
Osteosclerosis 798
Rheumatoid arthritis 687
Myocardial infarction 643
Obesity 641
Motivating examples
1. Colon cancer and Helicobacter pylori infection susceptibility
2. Lipid metabolism, diabetes, obesity, and hypertension
3. Schizophrenia, bulimia, depression and psychosis
4. Autism and Cerebral palsy
Motivating examples
1. Colon cancer and Helicobacter pylori infection susceptibility
Strofilas et al., 2012: colon cancer & H. pylori infection
O'Donoghue, 2011: CYP2C19 and H. pylori
Yamamoto et al., 2013: CYP2C19 and cancer
CYP2C19 is the gene symbol for the Cytochrome P450, family 2, subfamily C, polypeptide 19 gene
Motivating examples
2. Lipid metabolism, diabetes, obesity, and hypertension
Gene Symbol | Gene Name | Comment (associations according to OMIM)
APOE | Apolipoprotein E | Alzheimer disease-2, Hyperlipoproteinemia, type III, Myocardial infarction susceptibility
ACE | Angiotensin I-converting enzyme | Myocardial infarction susceptibility, Alzheimer disease, Stroke
CETP | Cholesteryl ester transfer protein | Hyperalphalipoproteinemia
AGT | Angiotensinogen | Hypertension
IL6 | Interleukin 6 | Diabetes
FGB | Fibrinogen B, beta polypeptide | None
PON1 | Paraoxonase 1 | Coronary artery disease, Microvascular complications of diabetes
LPL | Lipoprotein lipase | Combined hyperlipidemia, familial
MTHFR | 5,10-Methylenetetrahydrofolate reductase | Vascular disease, Schizophrenia
Motivating examples
2. Lipid metabolism, diabetes, obesity, and hypertension
Cytoscape
Motivating examples
3. Schizophrenia, bulimia, depression and psychosis
Gene Symbol | Gene Name | Comment
HTR2A | 5-hydroxytryptamine (serotonin) receptor 2A, G protein-coupled | A neurotransmitter receptor associated with depression, schizophrenia, anorexia
SLC6A3 | Solute carrier family 6 (neurotransmitter transporter, dopamine), member 3 | Eating disorders, attention deficit-hyperactivity disorder, Major affective disorder 1
SLC6A4 | Solute carrier family 6 (neurotransmitter transporter, serotonin), member 4 | Anxiety, Obsessive-compulsive disorder
Empirical p-value < 0.001
Motivating examples
4. Autism and Cerebral palsy
Gene Symbol | Gene Name | Comment (OMIM)
PTGS2 | Prostaglandin-endoperoxide synthase 2 (prostaglandin G/H synthase and cyclooxygenase) | Prostaglandin synthesis
APOE | Apolipoprotein E | Alzheimer disease-2, Hyperlipoproteinemia, type III, Myocardial infarction susceptibility
SERPINE1 | Serpin peptidase inhibitor, clade E (nexin, plasminogen activator inhibitor type 1), member 1 | Plasminogen activator inhibitor-1 deficiency 613329 {Transcription of plasminogen activator inhibitor, modulator of}
TFPI | Tissue factor pathway inhibitor (lipoprotein-associated coagulation inhibitor) | Also known as lipoprotein-associated coagulation inhibitor
ITGB3 | Integrin, beta 3 (platelet glycoprotein IIIa, antigen CD61) | Glanzmann thrombasthenia, purpura posttransfusion, thrombocytopenia, neonatal alloimmune, susceptibility to Myocardial infarction
TNF | Tumor necrosis factor | Asthma, cardiovascular disease
MTHFR | 5,10-Methylenetetrahydrofolate reductase | Vascular disease, Schizophrenia
Conclusions
1. An indexing algorithm for pattern matching has been successfully implemented to link HPO and GAD with a low error rate (~2%)
2. OGA has been implemented and released
3. One motivating example has results deviating from the null
4. Bioinformatics paper is under review
Future directions
1. Evaluation of observations by network analysis or combined effect models
2. Extension of OGA by adding GWASdb
3. Extension of backbone to include DO
4. Automated updates
5. Automated model evaluation from dbGaP
Thank you
Dr. Jeffrey Solka
Dr. Iosif Vaisman
Dr. Patrick M Gillevet
Dr. David Hirschberg
Dr. Vishwesh Mokashi