ISMB/ECCB 2007, 24/07/2007
Bayesian association of haplotypes and non-genetic factors to
regulatory and phenotypic variation in human populations
Anitha Kannan and John Winn
Jim Huang*
Probabilistic and Statistical Inference Group, Edward S. Rogers Department of Electrical and Computer Engineering, University of Toronto, Toronto, ON, Canada
Microsoft Research Cambridge, Machine Learning and Perception Group, Cambridge, UK
Outline
• Main contributions:
• Joint Bayesian modelling of genetic variation data and quantitative trait measurements
• Rich probabilistic model for genotype data
• State-of-the-art results on predicting missing genotypes
Outline
Genotype: Unordered pair of SNPs along both chromosomes
Haplotype: Ordered set of SNPs along a chromosome
Presence of recombination hotspots partitions haplotypes into blocks [Daly, 2001]
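The distinction between the two definitions above can be made concrete with a small sketch. It assumes biallelic SNPs coded 0/1; the function names are hypothetical and only illustrate why phase information is lost when two haplotypes are collapsed into a genotype.

```python
from typing import List, Sequence, Tuple

def genotype_from_haplotypes(h1: Sequence[int], h2: Sequence[int]) -> List[Tuple[int, int]]:
    """Collapse two haplotypes (ordered alleles along each chromosome)
    into a genotype: one unordered allele pair per locus."""
    if len(h1) != len(h2):
        raise ValueError("haplotypes must cover the same loci")
    # Sorting each pair discards which chromosome carried which allele.
    return [tuple(sorted((a, b))) for a, b in zip(h1, h2)]

def is_phase_ambiguous(genotype: Sequence[Tuple[int, int]]) -> bool:
    """With two or more heterozygous loci, several haplotype pairs
    are consistent with the same genotype."""
    heterozygous = sum(1 for a, b in genotype if a != b)
    return heterozygous >= 2

# Toy alleles (assumption: 0 = reference allele, 1 = alternative allele).
maternal = (0, 1, 1, 0)
paternal = (1, 1, 0, 0)
genotype = genotype_from_haplotypes(maternal, paternal)

# A different haplotype pair that yields the same genotype.
other = genotype_from_haplotypes((0, 1, 0, 0), (1, 1, 1, 0))
```

Because `genotype` and `other` are identical, the haplotypes cannot be read off the genotype alone; this is why a model has to account for phase.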
Part I: Learning haplotype block structure
• Our model for genotype data should:
– Account for phase & parent-child information
– Account for uncertainty in ancestral haplotypes
– Account for uncertainty in block structure
– Account for population-specific haplotype block statistics
– Allow for prior knowledge of haplotype block structure
Previous models for genotype data
• Previous methods learn a low-dimensional representation of the genotype data:
• HAPLOBLOCK (Greenspan, G. and Geiger, D. RECOMB 2003)
– Hard partitioning of data into a set of haplotype blocks using low-dimensional “ancestral” haplotypes
• fastPHASE (Scheet, P. and Stephens, M. Am J Hum Genet 2006)
– Learn ancestral haplotypes from high-dimensional genotype data while accounting for uncertainty in haplotype blocks
• Jojic, N., Jojic, V. and Heckerman, D. UAI 2004.
Probabilistic generative model for genotype data
[Figure labels: low-dimensional latent representation; high-dimensional data; unsupervised learning via maximum likelihood]
Predicting missing genotype data
• Have we learned a good density model for genotype data?
• Gains from
– Accounting for uncertainty in haplotype block structure
– Accounting for uncertainty in ancestral haplotypes
– Accounting for parental relationships
• Assess model using cross-validation/test prediction error
Predicting missing genotype data
• Crohn’s/5q31 data set (Daly et al., 2001)
– Crohn’s disease data from Chromosome 5q31 containing genotypes for 129 children + 258 parents across 103 loci (phases given for children)
• For each test set, make a fraction ρ of the data missing
• Retain the model parameters learned from the training data, then draw 1000 samples over the missing data
• Compute the fill-in error rate over the 1000 samples, for all missing data
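The evaluation protocol above (hide a fraction ρ of the entries, draw samples for the hidden entries, average the mismatch rate) can be sketched as follows. This is a minimal sketch under stated assumptions: genotypes are coded 0/1/2, `MISSING` is a hypothetical marker value, and `per_locus_frequency_sampler` is a deliberately naive stand-in for sampling from the learned model, which the slides do not specify in code.

```python
import numpy as np

MISSING = -1  # marker for a hidden genotype (assumption, not from the slides)

def per_locus_frequency_sampler(observed, mask, rng):
    """Stand-in sampler: fill each hidden entry by drawing from the
    genotypes still observed at the same locus."""
    filled = observed.copy()
    for k in range(observed.shape[1]):
        hidden = mask[:, k]
        if not hidden.any():
            continue
        seen = observed[~hidden, k]
        # Fall back to a uniform draw if the whole column was hidden.
        pool = seen if seen.size else np.array([0, 1, 2])
        filled[hidden, k] = rng.choice(pool, size=int(hidden.sum()))
    return filled

def fill_in_error_rate(genotypes, rho, sample_missing, n_samples=1000, seed=0):
    """Hide a fraction rho of entries, draw n_samples completions and
    return the average fraction of hidden entries filled in wrongly."""
    rng = np.random.default_rng(seed)
    mask = rng.random(genotypes.shape) < rho  # True marks a hidden entry
    if not mask.any():
        raise ValueError("no entries were masked; increase rho")
    observed = np.where(mask, MISSING, genotypes)
    truth = genotypes[mask]
    wrong = 0
    for _ in range(n_samples):
        filled = sample_missing(observed, mask, rng)
        wrong += int(np.count_nonzero(filled[mask] != truth))
    return wrong / (n_samples * truth.size)

# Random toy matrix (individuals x loci); not real genotype data.
toy = np.random.default_rng(1).integers(0, 3, size=(40, 12))
rate = fill_in_error_rate(toy, rho=0.1,
                          sample_missing=per_locus_frequency_sampler,
                          n_samples=50)
```

Swapping `per_locus_frequency_sampler` for a sampler backed by a fitted model leaves the rest of the protocol unchanged, which is what makes the error rate comparable across methods.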
Prediction error for Crohn’s/5q31 data
Comparative performance for Crohn’s/5q31 data
Establishing haplotype block boundaries
• Define the recombination prior γ on transition probabilities
– Different γ correspond to different “blockiness” of data
• For each locus k, can compute the probability of transition p_k
– Can establish a threshold t and establish block boundaries
• Once blocks are defined, can assign block labels l_b = (m,n)
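The thresholding step above can be illustrated with a short sketch. It assumes that `p_transition[k]` is the transition probability between locus k and locus k+1 (the slides only say "for each locus k") and that a boundary is placed wherever this probability exceeds t; the function name and the toy values are hypothetical.

```python
from typing import List, Sequence, Tuple

def block_boundaries(p_transition: Sequence[float], t: float) -> List[Tuple[int, int]]:
    """Split loci 0..K into contiguous blocks, cutting after locus k
    whenever the transition probability there exceeds the threshold t.
    Returns inclusive (first_locus, last_locus) index pairs."""
    n_loci = len(p_transition) + 1
    blocks = []
    start = 0
    for k, p in enumerate(p_transition):
        if p > t:
            blocks.append((start, k))
            start = k + 1
    blocks.append((start, n_loci - 1))  # close the final block
    return blocks

# Toy transition probabilities for 8 loci (illustrative values only).
p = [0.01, 0.02, 0.90, 0.05, 0.03, 0.75, 0.02]
blocks = block_boundaries(p, t=0.5)
```

Lowering t produces more, shorter blocks; raising it merges them, which mirrors the effect the slide attributes to different degrees of "blockiness".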
Haplotype block structure in the ENm006 region
• 573 SNP markers for 270 individuals from 3 sub-populations:
– 90 Yoruba individuals (30 parent-parent-offspring trios) from Ibadan, Nigeria (YRI)
– 90 individuals (30 trios) of European descent from Utah (CEU)
– 45 Han Chinese individuals from Beijing (CHB) and 45 Japanese individuals from Tokyo (JPT), pooled as CHB+JPT
Part II: Linking haplotype block structure and gene expression data
A model for linking haplotype structure to quantitative trait measurements
[Figure: for Individuals 1 to 5, the observed quantitative trait profile equals the sum over haplotype blocks (Haplotype block 1, Haplotype block 2) of each block's latent block profile, selected by the block labels (Label 1 to Label 4), multiplied by that block's relevance variable (shown as x 1.0 and x 0.0)]
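The additive structure shown in the figure can be sketched as a small simulation. This is an illustration under assumptions, not the authors' implementation: each block has one latent profile value per block label, the relevance variable is a 0/1 weight per block, and Gaussian noise with a given precision is added. All names and numbers are hypothetical.

```python
import numpy as np

def simulate_trait(labels, profiles, relevance, noise_precision, rng):
    """Generate one quantitative trait for J individuals.

    labels:    (B, J) int array, block label of each individual in each block
    profiles:  (B, L) float array, latent profile value per block and label
    relevance: (B,)   array of 0/1 weights switching blocks on or off
    """
    # Look up, for every block, the profile value selected by each individual's label.
    per_block = np.take_along_axis(profiles, labels, axis=1)  # shape (B, J)
    mean = relevance @ per_block                              # shape (J,)
    noise = rng.normal(0.0, 1.0 / np.sqrt(noise_precision), size=mean.shape)
    return mean + noise

# Two blocks, five individuals, four possible labels (toy values).
labels = np.array([[0, 1, 2, 3, 0],
                   [1, 1, 0, 2, 3]])
profiles = np.array([[-1.0, 0.0, 1.0, 2.0],
                     [5.0, -5.0, 3.0, 0.5]])
relevance = np.array([1.0, 0.0])  # block 1 relevant, block 2 switched off

trait = simulate_trait(labels, profiles, relevance,
                       noise_precision=1e12,
                       rng=np.random.default_rng(0))
```

With the second relevance weight at 0.0, the trait follows only the labels of block 1, matching the "x 1.0" and "x 0.0" annotations in the figure.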
A Bayesian model for linking haplotype structure to quantitative measurements
[Figure: graphical model with plates over individuals j = 1,…,J, blocks b = 1,…,B and quantitative traits g = 1,…,G. Node symbols: S_bj, T_bj, z_gj, μ_bg, w_bg, ρ_g. Hyperparameters: α0, β0; τ0, μ0; π0. Node captions: block label, observed trait, latent block profile, relevance variable, noise precision]
Linking haplotype blocks to phenotype
• 387 individuals with Crohn’s (+1) or non-Crohn’s (-1) phenotype
• Link 10 haplotype blocks from 5q31 to phenotype
• Average cross-validation error: 23.1% ± 3.45%
• Haplotype blocks 2 and 10 most relevant to Crohn’s phenotype (p < 4.76 × 10^-5)
[Figure axes: test cases (sorted); test data splits]
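An average cross-validation error with a spread, as reported above, is typically obtained by averaging the test error over held-out splits. The sketch below shows one way to compute such a figure. The number of splits, the use of the sample standard deviation as the spread, and the majority-class stand-in classifier are all assumptions; the slides do not state them.

```python
import numpy as np

def k_fold_error(X, y, fit, predict, n_splits=10, seed=0):
    """Return (mean, sample standard deviation) of the misclassification
    rate across n_splits held-out folds."""
    rng = np.random.default_rng(seed)
    order = rng.permutation(len(y))
    folds = np.array_split(order, n_splits)
    errors = []
    for i, test_idx in enumerate(folds):
        train_idx = np.concatenate([f for j, f in enumerate(folds) if j != i])
        model = fit(X[train_idx], y[train_idx])
        errors.append(np.mean(predict(model, X[test_idx]) != y[test_idx]))
    errors = np.asarray(errors)
    return errors.mean(), errors.std(ddof=1)

# Stand-in classifier for labels in {+1, -1}: always predict the majority class.
def fit_majority(X_train, y_train):
    return 1 if np.sum(y_train) >= 0 else -1

def predict_majority(model, X_test):
    return np.full(len(X_test), model)

# Toy data only; not the Crohn's data set.
X_toy = np.zeros((20, 1))
y_toy = np.array([1] * 14 + [-1] * 6)
mean_err, std_err = k_fold_error(X_toy, y_toy, fit_majority, predict_majority)
```

Replacing the two stand-in functions with a model that predicts phenotype from block labels would give a figure directly comparable in form to the one on the slide.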
Linking haplotype blocks to gene expression
• ENm006 data set:
– 19 haplotype blocks (573 SNPs)
– 28 gene expression profiles in ENm006 region (Stranger et al., 2007)
Addressing population stratification
The population variable affects phenotype/gene expression…
…whereas variation between individuals is the effect we’re interested in
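One simple way to see the point is to remove each population's mean from the trait so that only variation between individuals remains. The sketch below does exactly that. It is a simplified stand-in, not the approach of the talk, which treats population as a variable in the model; the function name and values are hypothetical.

```python
import numpy as np

def center_within_population(values, population):
    """Subtract each population's mean so that the remaining signal is
    variation between individuals within a population."""
    values = np.asarray(values, dtype=float)
    population = np.asarray(population)
    centered = values.copy()
    for pop in np.unique(population):
        members = population == pop
        centered[members] -= values[members].mean()
    return centered

# Toy expression values with a strong population-level offset.
expression = np.array([5.0, 5.2, 4.8, 9.0, 9.3, 8.7])
pops = np.array(["YRI", "YRI", "YRI", "CEU", "CEU", "CEU"])
residual = center_within_population(expression, pops)
```

Without this step, any haplotype block whose labels differ in frequency between populations would appear associated with the trait simply because it tracks population membership.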
Associations between haplotype blocks and gene expression
GDI1 - HapBlock2 (YRI): p < 2.5 × 10^-4
GDI1 - HapBlock5 (CHB+JPT): p < 3.33 × 10^-4
Summary
• Enhanced version of Jojic et al. (UAI 2004) model for haplotype inference/discovering block structure
• Novel Bayesian model for associating haplotype blocks to gene expression
• We re-discover population-specific block structures across populations in the HapMap data
• Predictions for Crohn’s disease from Chromosome 5q31 data
• Cis- associations between blocks and gene expression in ENm006 in presence of non-genetic factors
• Cis- association between HapBlocks 2 and 5 and GDI1
The road ahead…
• Applying to larger portions of the HapMap data
• Finding trans- associations
• Non-linear models for associating block structure to quantitative traits
• Joint learning of haplotype block structure and associations
• Accounting for patterns of gene co-expression/similar phenotypes
Acknowledgements
• Manolis Dermitzakis and Richard Durbin, Wellcome Trust Sanger Institute
• Nebojsa Jojic, Microsoft Research Redmond
• Paul Scheet, University of Michigan - Ann Arbor
• US National Science Foundation (NSF)