Regulatory variation and eQTLs
description
Transcript of Regulatory variation and eQTLs
Regulatory variation and eQTLs
Chris [email protected]
Regulatory variation
• What do trait-associated variants do?• Genetic changes to:
– Coding sequence **– Gene expression levels– Splice isomer levels– Methylation patterns– Chromatin accessibility– Transcription factor binding kinetics– Cell signaling– Protein-protein interactions
Regulatory
BASIC CONCEPTSHistory, eQTL, mQTL, others
Within a population
• Damerval et al 1994• 42/72 protein levels differ in maize• 2D electrophoresis, eyeball spot quantitation• Problems:
– genome coverage– quantitation– post-translational modifications
• Solution: use expression levels instead!
Usual mapping tools available
gene 3
Whole-genome eQTL analysis is an independent GWAS for expression of each gene
gene 2
gene N
gene 5
gene 4
gene 1
• cis-eQTL– The position of the eQTL maps near
the physical position of the gene.– Promoter polymorphism?– Insertion/Deletion?– Methylation, chromatin conformation?
• trans-eQTL– The position of the eQTL does not
map near the physical position of the gene.
– Regulator?– Direct or indirect?
Modified from Cheung and Spielman 2009 Nat Gen
Genetics of gene expression (eQTL)
eQTL – THE ARRAY ERAyeast, mouse, maize, human
Yeast
• Brem et al Science 2002• Linkage in 40 offspring of lab x wild strain
cross • 1528/6215 DE between parents• 570 map in cross
– multiple QTLs– 32% of 570 have cis linkage
• 262 not DE in parents also map
trans hotspots
Brem et al Science 2002
Yvert et al Nat Genet 2003
Mammals I
• F2 mice on atherogenic diet• Expression arrays; WG linkage
Schadt et alNature 2003
Mammals II
Chesler et al Nat Genet 2005
10% !!
Mammals III
• No major trans loci in humans– Cheung et al Nature 2003– Monks et al AJHG 2004– Stranger et al PLoS Genet 2005, Science 2007
WHERE ARE THE TRANS eQTLS?Open question
gene 3
Whole-genome eQTL analysis is an independent GWAS for expression of each gene
gene 2
gene N
gene 5
gene 4
gene 1
Issues with trans mapping
• Power– Genome-wide significance is 5e-8
– Multiple testing on ~20K genes– Sample sizes clearly inadequate
• Data structure– Bias corrections deflate variance– Non-normal distributions
• Sample sizes– Far too small
But…
• Assume that trans eQTLs affect many genes…
• …and you can use cross-trait methods!
Association data
Z1,1 Z1,2 … … Z1,p
Z2,1
::
Zs,1 Zs,p
Cross-phenotype meta-analysis
SCPMA ~L(data | λ≠1)
L(data | λ=1)
Cotsapas et al, PLoS Genetics
CPMA detects trans mixtures
Open research questions
• Do trans effects exist?– Yes – heritability estimates suggest so.– Can we detect them?
• Larger cohorts?– Most eQTL studies ~50-500 individuals– See later, GTEx Project
• Better methods?– Collapsing data?– PCA, summary statistics, modeling?
CAN WE LEARN REGULATORY VARIATION FROM eQTL?
First, let’s define the question
• Can we use genetic perturbations as a way to understand how genes are regulated?
• In what groups, in which tissues? • To what stimuli/signaling events? • Do cis eQTLs perturb promoter elements?• Do trans perturb TFs? Signaling cascades?
Most significant SNP per gene 0.001 permutation threshold
Significant associations are symmetrically distributed around TSS
Stranger et al., PLoS Gen 2012
268 271 262
73 85 82
86 86 86
Cell type-specific and cell type-shared gene associations(0.001 permutation threshold)
cell type
No.
of c
ell t
ypes
with
gen
e as
soci
ation
69-80% of cis associations are cell type-specific
• cis association sharing increases slightly when significance thresholds are relaxed• Cell type specificity verified experimentally for subset of eQTLs
Dimas et al Science 2009
Dimas et al Science 2009Slide courtesy Antigone Dimas
Open research questions
• Do cis eQTLs perturb functional elements?– Given each is independent, how can we know?
• Do tissue-specific effects correlate with the expression of a gene across tissues? Or a regulator?– Perhaps a gene is expressed, but in response to different
regulators across tissues?• If we ever find trans eQTLs…
– Common regulators of coregulated genes?– Tissue specificity?– Mechanisms?
APPLICATION TO GWASCandidate genes, perturbations underlying organismal phenotypes
eQTLs as intermediate traits
Schadt et al Nat Genet 2005
Modified from Nica and Dermitzakis Hum Mol Genet 2008
Exploring eQTLs in the relevant cell type is important for disease association studies
cell type not relevant for diseaserelevant cell type for disease
Importance of cataloguing regulatory variation in multiple cell types
Slide courtesy Antigone Dimas
Barrett et al 2008de Jager et al 2007
Franke et al 2010Anderson et al 2011
POPULATION DIFFERENCES
Shared association in 8 HapMap populations
APOH: apolipoprotein H Stranger et al., PLoS Gen 2012
Number of genes with cis-eQTL associations8 extended HapMap populations
Spearman Rank CorrelationEnsGene
0.01 0.001 0.0001# genes FDR # genes FDR # genes FDR
CEU 2869 0.06 657 0.03 313 0.01CHB 2832 0.06 774 0.02 378 0.00GIH 2959 0.06 698 0.03 300 0.01JPT 2900 0.06 795 0.02 386 0.00
LWK 3818 0.05 773 0.02 311 0.01MEX 2609 0.07 472 0.04 165 0.01MKK 4222 0.04 947 0.02 411 0.00YRI 3961 0.05 799 0.02 328 0.01
non-redundant 12494 3130 1132>2 pops 6889 0.55 1074 0.34 547 0.488 pops 151 0.01 63 0.02 28 0.02
4 PCA + 4 non-PCAGenes are Ensembl genes.Generated on July 22, 2009, modified July 28th because of MKK error.These numbers differ from v2 because all assocs where dist to true TSS > 1Mb were removed.This removed some genes. I redid the MostSigSnpEnsGene after removing those SNPs. This is in column MostSigSnpEnsGene_1
SRC: permutation threshold
Stranger et al., PLoS Gen 2012
Direction of allelic effectsame SNP-gene combination across populations
AGREEMENT
OPPOSITE
Population 1
rs40915
THAP
5
TTCTCC
8.00
7.75
7.50
7.25
7.00
6.75
6.50
Population 2
rs40915
THAP
5
TTCTCC
8.00
7.75
7.50
7.25
7.00
6.75
6.50
log 2 e
xpre
ssio
n
log 2 e
xpre
ssio
n
rs40915
THAP
5
TTCTCC
8.00
7.75
7.50
7.25
7.00
6.75
6.50
log 2 e
xpre
ssio
n
rs40915
THAP
5
TTCTCC
8.00
7.75
7.50
7.25
7.00
6.75
6.50
log 2 e
xpre
ssio
n
Stranger et al., PLoS Gen 2012
Slide courtesy Alkes Price
Population differences could have non-genetic basis
• Differences due to environment? (Idaghdour et al. 2008)
• Differences in cell line preparation? (Stranger et al. 2007)
• Differences due to batch effects? (Akey et al. 2007)
(Reviewed in Gilad et al. 2008)
Slide courtesy Alkes Price
Gene expression experiment
Does gene expression in 60 CEU + 60 YRI vary with ancestry?
Does gene expression in 89 AA vary with % Eur ancestry?
60 CEU + 60 YRI from HapMap, 89 AA from Coriell HD100AAGene expression measurements at 4,197 genes obtained using Affymetrix Focus array
c
Slide courtesy Alkes Price
Gene expression differences in African Americans validate CEU-YRI differences
c = 0.43 (± 0.02)(P-value < 10-25)
12% ± 3%
in cis
Slide courtesy Alkes Price
EMERGING EFFORTSRNAseq, GTEx
RNAseq questions
• Standard eQTLs – Montgomery et al, Pickrell et al Nature 2010
• Isoform eQTLs– Depth of sequence!
• Long genes are preferentially sequenced• Abundant genes/isoforms ditto• Power!?• Mapping biases due to SNPs
Strategies for transcript assembly
Garber et al. Nat Methods 8:469 (2011)
GTEx – Genotype-Tissue EXpressionAn NIH common fund project
Current: 35 tissues from 50 donors
Scale up: 20K tissues from 900 donors.
Novel methods groups: 5 current + RFA
RNAseq combined with other techs
• Regulons: TF gene sets via CHiP/seq– Look for trans effects
• Open chromatin states (Dnase I; methylation)– Find active genes– Changes in epigenetic marks correlated to RNA– Genetic effects
• RNA/DNA comparisons – Simultaneous SNP detection/genotyping– RNA editing ???