Population Structure and Disease Associations
02-223 How to Analyze Your Own Genome
Fall 2013
-
Population Structure and Genome-wide Association Analysis
• The mutation that confers the lactose persistence phenotype is more common in Caucasian populations than in Asian populations
• The allele for blonde hair color is also more common in Caucasian populations than in Asian populations
-
Population Structure and Genome-wide Association Analysis
• Population structure in the data causes false positives in GWAS
  – If the samples in the case group are more closely related to each other (come from the same population group), any SNP whose allele is more common in that population will appear significantly associated with the trait, even if it has no effect on the trait.
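To make the false-positive mechanism concrete, here is a minimal Python sketch with made-up allele counts (not from any real study). Within each of two hypothetical populations the SNP has the same allele frequency in cases and controls, but population 1 contributes most of the cases. The function names are illustrative, and the test is a plain allelic chi-square test without continuity correction.

```python
from math import erfc, sqrt


def chi2_2x2(a, b, c, d):
    """Pearson chi-square statistic (1 df, no continuity correction) for the
    2x2 allele-count table [[a, b], [c, d]]:
    rows = cases / controls, columns = allele A / allele a."""
    n = a + b + c + d
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    return n * (a * d - b * c) ** 2 / denom


def chi2_pvalue_1df(stat):
    """Upper-tail p-value of a chi-square statistic with 1 degree of freedom."""
    return erfc(sqrt(stat / 2))


# Hypothetical (A count, a count) per group; each individual contributes 2 alleles.
# Population 1: allele A frequency 0.8 in both cases (80 people) and controls (20).
# Population 2: allele A frequency 0.2 in both cases (20 people) and controls (80).
pop1 = {"case": (128, 32), "control": (32, 8)}
pop2 = {"case": (8, 32), "control": (32, 128)}

# A naive GWAS pools everyone into one table.
case_A = pop1["case"][0] + pop2["case"][0]
case_a = pop1["case"][1] + pop2["case"][1]
ctrl_A = pop1["control"][0] + pop2["control"][0]
ctrl_a = pop1["control"][1] + pop2["control"][1]

pooled = chi2_2x2(case_A, case_a, ctrl_A, ctrl_a)
print(f"pooled chi2 = {pooled:.2f}, p = {chi2_pvalue_1df(pooled):.1e}")
```

Pooling gives a chi-square statistic of 51.84 on 1 degree of freedom, a highly significant result, even though the SNP has no effect within either population.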
-
Population Structure and Genome-wide Association Analysis
• What if we perform GWAS within each population group?
Half of the people have “aa” in both case and control groups
Half of the people have “AA” in both case and control groups
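A minimal sketch of testing within each population group, using hypothetical allele counts in which population 1 (allele A frequency 0.8) contributes most cases and population 2 (allele A frequency 0.2) contributes most controls, while within each population cases and controls have the same allele frequency. The group names and counts are made up for illustration.

```python
def chi2_2x2(a, b, c, d):
    """Pearson chi-square statistic (1 df, no continuity correction) for the
    2x2 allele-count table [[a, b], [c, d]] (rows: cases/controls,
    columns: allele A/allele a)."""
    n = a + b + c + d
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    return n * (a * d - b * c) ** 2 / denom


# Hypothetical (A count, a count) per group: allele A frequency is 0.8 in
# population 1 and 0.2 in population 2, identical in cases and controls.
groups = {
    "population 1": {"case": (128, 32), "control": (32, 8)},
    "population 2": {"case": (8, 32), "control": (32, 128)},
}

# Test each population group separately instead of pooling.
within = {}
for name, table in groups.items():
    (case_A, case_a), (ctrl_A, ctrl_a) = table["case"], table["control"]
    within[name] = chi2_2x2(case_A, case_a, ctrl_A, ctrl_a)
    print(f"{name}: chi2 = {within[name]:.2f}")
```

Both within-group statistics are exactly 0, because in each group the case and control allele frequencies are identical; pooling the same counts into a single table would instead give a statistic of 51.84.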
-
Accounting for Population Structure in Association Analysis
• Association mapping needs to account for population structure
• During data collection, the study should be designed so that each population is represented in a balanced way in the case and control groups
  – In practice, this can be hard to control
  – Cryptic population structure (structure the investigators are unaware of) can still confound the analysis
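When population labels are known, the balance of a design can be checked by comparing the fraction of cases in each population. A minimal sketch, assuming one population label and one case/control flag per individual; the labels "pop1"/"pop2" and the counts are hypothetical.

```python
from collections import Counter


def case_fraction_by_population(populations, is_case):
    """Fraction of cases within each population.

    populations: one population label per individual (assumed known here)
    is_case: one boolean per individual, True for cases
    """
    totals = Counter(populations)
    cases = Counter(p for p, c in zip(populations, is_case) if c)
    return {p: cases[p] / totals[p] for p in totals}


# Hypothetical study: 100 individuals per population.
pops = ["pop1"] * 100 + ["pop2"] * 100
status = [True] * 80 + [False] * 20 + [True] * 20 + [False] * 80
fractions = case_fraction_by_population(pops, status)
print(fractions)  # a balanced design would give similar fractions
```

Here the case fractions are 0.8 and 0.2, an unbalanced design. With cryptic structure no such labels are available, which is why genomic control and structured association work from the genotype data instead.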
-
Genomic Control (GC)
• Idea: use the SNPs that are not associated with the trait to remove the effect of population stratification
• The genotype data consist of
  – candidate genes to be tested for associations
  – L supplementary loci (null loci) for estimating the inflation factor λ
• GC corrects the association statistic of each SNP in a candidate gene by dividing it by the inflation factor λ
• Limitation: λ is assumed to be the same across the whole genome, which ignores population admixture
Devlin & Roeder, Biometrics 1999
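A minimal sketch of the GC correction, assuming 1-df association statistics have already been computed at the L null loci and at one candidate SNP. λ is estimated as the median null statistic divided by the median of a χ² distribution with 1 degree of freedom (about 0.455), following Devlin & Roeder; clamping λ at 1 is a common convention rather than something stated on this slide. All statistics below are made up.

```python
from math import erfc, sqrt
from statistics import NormalDist, median

# Median of the chi-square distribution with 1 df, (Phi^-1(0.75))^2, about 0.455.
CHI2_1DF_MEDIAN = NormalDist().inv_cdf(0.75) ** 2


def inflation_factor(null_stats):
    """Estimate lambda from 1-df statistics at the null loci.
    Clamping at 1 (a common convention) keeps GC from inflating statistics."""
    return max(1.0, median(null_stats) / CHI2_1DF_MEDIAN)


def gc_correct(stat, lam):
    """Genomic-control corrected statistic for a candidate SNP."""
    return stat / lam


def chi2_pvalue_1df(stat):
    """Upper-tail p-value of a chi-square statistic with 1 df."""
    return erfc(sqrt(stat / 2))


# Hypothetical statistics at L = 5 null loci and one candidate SNP.
null_stats = [0.3, 0.8, 1.1, 1.6, 2.9]
candidate = 12.0

lam = inflation_factor(null_stats)
corrected = gc_correct(candidate, lam)
print(f"lambda = {lam:.2f}")
print(f"p before GC = {chi2_pvalue_1df(candidate):.1e}, "
      f"after GC = {chi2_pvalue_1df(corrected):.1e}")
```

Because a single λ rescales every candidate statistic, a locus whose allele frequencies differ unusually strongly between populations receives the same correction as every other locus.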
-
Genomic Control (GC)
[Figure: p-value threshold before GC correction and p-value threshold after GC correction]
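Because GC divides each statistic by λ̂, comparing the corrected statistic with the usual threshold is the same as comparing the uncorrected statistic with a threshold raised by a factor of λ̂. Writing c_α for the 1-df χ² critical value at significance level α (about 3.84 for α = 0.05):

```latex
\tilde{\chi}^2 = \frac{\chi^2}{\hat{\lambda}}, \qquad
\tilde{\chi}^2 > c_\alpha \iff \chi^2 > \hat{\lambda}\, c_\alpha
```

For λ̂ > 1 the effective threshold on the uncorrected statistic is stricter, so fewer SNPs pass after correction.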
-
Structured Association
• Idea: within each subpopulation, an association between a genetic marker and the trait cannot be produced by population structure, so it can be treated as a true association.
• Two-stage method
  – Step 1:
    • estimate the population structure by applying clustering algorithms to the genotype data
    • assign the sampled individuals to population groups
  – Step 2:
    • test for phenotype association within each population inferred in Step 1
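A simplified sketch of the two-stage method in Python. Step 1 clusters individuals with plain k-means on their genotype vectors (one possible clustering choice; the experiments in this lecture use Structure followed by K-means), and Step 2 runs an allelic chi-square test for one SNP within each cluster. The genotype coding (0/1/2 copies of allele A), the farthest-point initialization, the function names, and the toy data are all assumptions made for illustration.

```python
def sq_dist(u, v):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(u, v))


def kmeans(points, k, n_iter=100):
    """Lloyd's k-means with deterministic farthest-point initialization."""
    centers = [list(points[0])]
    while len(centers) < k:
        # Next center: the point farthest from all centers chosen so far.
        far = max(points, key=lambda p: min(sq_dist(p, c) for c in centers))
        centers.append(list(far))
    labels = None
    for _ in range(n_iter):
        # Assignment step: each point goes to its nearest center.
        new_labels = [min(range(k), key=lambda j: sq_dist(p, centers[j]))
                      for p in points]
        if new_labels == labels:
            break  # converged
        labels = new_labels
        # Update step: move each non-empty cluster's center to its mean.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                centers[j] = [sum(col) / len(members) for col in zip(*members)]
    return labels


def allele_counts(genotypes, is_case, snp):
    """Allele counts (case A, case a, control A, control a) at one SNP."""
    case_A = case_a = ctrl_A = ctrl_a = 0
    for g, case in zip(genotypes, is_case):
        if case:
            case_A += g[snp]
            case_a += 2 - g[snp]
        else:
            ctrl_A += g[snp]
            ctrl_a += 2 - g[snp]
    return case_A, case_a, ctrl_A, ctrl_a


def chi2_2x2(a, b, c, d):
    """Allelic chi-square statistic (1 df); 0.0 if a margin is empty,
    e.g. when the SNP is monomorphic within a cluster."""
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    return (a + b + c + d) * (a * d - b * c) ** 2 / denom if denom else 0.0


def structured_association(genotypes, is_case, k, snp):
    # Step 1: infer population groups by clustering the genotype vectors.
    labels = kmeans([[float(x) for x in g] for g in genotypes], k)
    # Step 2: test the SNP for association within each inferred group.
    stats = {}
    for j in sorted(set(labels)):
        idx = [i for i, lab in enumerate(labels) if lab == j]
        counts = allele_counts([genotypes[i] for i in idx],
                               [is_case[i] for i in idx], snp)
        stats[j] = chi2_2x2(*counts)
    return labels, stats


# Hypothetical data: columns 0-2 separate two ancestries, column 3 is tested.
# Population 1 supplies most cases, population 2 most controls, but within
# each population the tested SNP has the same allele frequency in both.
pop1 = [[2, 2, 2, g] for g in (2, 2, 2, 1, 1, 1, 2, 1)]
pop2 = [[0, 0, 0, g] for g in (0, 1, 0, 0, 0, 1, 1, 1)]
genotypes = pop1 + pop2
is_case = [True] * 6 + [False] * 2 + [True] * 2 + [False] * 6

labels, stats = structured_association(genotypes, is_case, k=2, snp=3)
print(labels)
print(stats)
```

On these toy data the two ancestry groups are recovered and the tested SNP shows no association within either group.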
-
Structured Association
• Cluster individuals into population groups and perform GWAS within each population group
Half of the people have “aa” in both case and control groups
Half of the people have “AA” in both case and control groups
-
Experiments: Lactose Persistence Phenotype
• Data: 1,400 individuals from the control group of the WTCCC dataset, all of European descent (The Wellcome Trust Case Control Consortium, Nature 2007)
• Genotypes: the 135.16-136.82 Mb region on chromosome 2, which is known to show geographical variation
• Phenotype: lactose persistence, which is fully determined by a particular mutation near the LCT gene (Enattah et al., 2002)
• Associated marker: SNP rs4988243 is in strong linkage disequilibrium (r² > 0.9) with this known genetic variant
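The r² above is the standard squared correlation between alleles at two loci, r² = D² / (p_A(1 − p_A) p_B(1 − p_B)) with D = p_AB − p_A p_B. A minimal sketch, assuming phased haplotypes coded as (allele at locus 1, allele at locus 2) with 1 for the counted allele; the haplotype counts are hypothetical.

```python
def ld_r2(haplotypes):
    """r^2 between two biallelic loci from phased haplotypes.

    haplotypes: list of (allele at locus 1, allele at locus 2) pairs, where 1
    marks the counted allele (A at locus 1, B at locus 2) and 0 the other.
    """
    n = len(haplotypes)
    p_a = sum(h[0] for h in haplotypes) / n          # frequency of A
    p_b = sum(h[1] for h in haplotypes) / n          # frequency of B
    p_ab = sum(h[0] * h[1] for h in haplotypes) / n  # frequency of A-B haplotype
    d = p_ab - p_a * p_b                             # disequilibrium coefficient D
    return d * d / (p_a * (1 - p_a) * p_b * (1 - p_b))


# Hypothetical sample of 100 haplotypes: the two loci almost always co-occur.
haps = [(1, 1)] * 49 + [(1, 0)] + [(0, 1)] + [(0, 0)] * 49
print(f"r^2 = {ld_r2(haps):.4f}")
```

These made-up haplotypes give r² ≈ 0.92, which would count as strong linkage disequilibrium by the r² > 0.9 criterion used for rs4988243.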
-
Experiments: Lactose Persistence
• Admixture clustering of the genotype data with Structure (Pritchard et al., Genetics 2000), assuming four populations
• Using the Structure output (the estimated genome composition of each individual, shown as one column per individual in the figure), individuals are grouped into four populations with the K-means algorithm
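Structure's admixture model (Pritchard et al., 2000) describes each individual's genome composition as ancestry proportions q over K populations. A minimal sketch of the resulting genotype likelihood, assuming biallelic loci coded as 0/1/2 copies of a counted allele: each allele copy comes from population k with probability q_k, so the counted allele has probability Σ_k q_k f_kl at locus l, and the genotype is binomial with 2 trials. Function and variable names are hypothetical; Structure itself estimates q and the allele frequencies jointly by MCMC, which this sketch does not attempt.

```python
from math import comb, log


def admixture_loglik(genotypes, q, freqs):
    """Log-likelihood of one individual's genotypes under the admixture model.

    genotypes: copies (0, 1, 2) of the counted allele at each locus
    q: ancestry proportions over K populations (non-negative, summing to 1)
    freqs: freqs[k][locus] = counted-allele frequency in population k;
           assumed strictly between 0 and 1 so the logarithm is defined
    """
    ll = 0.0
    for locus, g in enumerate(genotypes):
        # Each allele copy originates in population k with probability q[k],
        # so the counted allele has marginal probability sum_k q[k] * f[k][locus].
        p = sum(q_k * freqs[k][locus] for k, q_k in enumerate(q))
        # The two copies are independent given q and freqs: Binomial(2, p).
        ll += log(comb(2, g) * p ** g * (1 - p) ** (2 - g))
    return ll


# Hypothetical K = 2 populations and 3 loci.
freqs = [[0.9, 0.8, 0.1],   # population 1
         [0.1, 0.3, 0.6]]   # population 2
genotypes = [2, 1, 0]
for q in ([1.0, 0.0], [0.5, 0.5], [0.0, 1.0]):
    print(q, round(admixture_loglik(genotypes, q, freqs), 3))
```

The estimated q vectors are the per-individual genome compositions that K-means then groups into four populations in this experiment.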
-
Experiments: Lactose Persistence
• Detecting the mutation that confers the lactose persistence phenotype on an individual
• Genomic control failed to detect the truly associated SNP, partly because it ignores admixture
[Figure: Genomic Control results, annotated with the correct SNP for the lactose persistence phenotype]
-
Experiments: Lactose Persistence
• Detecting the mutation that confers the lactose persistence phenotype on an individual
• Once Structure has inferred the population structure, sparse multivariate regression (Lasso) is run on each group separately
[Figure: Lasso for structured association, fit on each subpopulation discovered by Structure, annotated with the correct SNP for the lactose persistence phenotype]
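A minimal sketch of running Lasso separately on each inferred group, written in plain Python. The solver (cyclic coordinate descent on (1/(2n))‖y − Xb‖² + α‖b‖₁), the per-group centering used in place of an intercept, and the function names are assumptions for illustration, not the method used to produce the figure. The toy data are hypothetical: the phenotype depends on SNP 0 in group 0 and on SNP 1 in group 1.

```python
def soft_threshold(z, t):
    """Soft-thresholding operator S(z, t) used by the Lasso update."""
    if z > t:
        return z - t
    if z < -t:
        return z + t
    return 0.0


def lasso_cd(X, y, alpha, n_iter=1000, tol=1e-10):
    """Lasso by cyclic coordinate descent, minimizing
    (1 / (2n)) * ||y - X b||^2 + alpha * ||b||_1 with no intercept
    (X and y are assumed to be centered by the caller)."""
    n, p = len(X), len(X[0])
    b = [0.0] * p
    r = list(y)  # current residuals y - X b
    col_sq = [sum(row[j] ** 2 for row in X) / n for j in range(p)]
    for _ in range(n_iter):
        max_change = 0.0
        for j in range(p):
            if col_sq[j] == 0:
                continue  # SNP is monomorphic in this group: keep b[j] = 0
            # Correlation of SNP j with the residual that excludes SNP j.
            rho = sum(X[i][j] * (r[i] + X[i][j] * b[j]) for i in range(n)) / n
            new = soft_threshold(rho, alpha) / col_sq[j]
            delta = new - b[j]
            if delta != 0.0:
                for i in range(n):
                    r[i] -= X[i][j] * delta
                b[j] = new
            max_change = max(max_change, abs(delta))
        if max_change < tol:
            break
    return b


def lasso_by_group(X, y, groups, alpha):
    """Fit a separate Lasso in each group (e.g. each inferred population)."""
    fits = {}
    for g in sorted(set(groups)):
        idx = [i for i, gi in enumerate(groups) if gi == g]
        Xg = [X[i] for i in idx]
        yg = [y[i] for i in idx]
        # Center genotypes and phenotype within the group instead of an intercept.
        means = [sum(col) / len(idx) for col in zip(*Xg)]
        y_mean = sum(yg) / len(idx)
        Xc = [[x - m for x, m in zip(row, means)] for row in Xg]
        yc = [v - y_mean for v in yg]
        fits[g] = lasso_cd(Xc, yc, alpha)
    return fits


# Hypothetical genotypes (0/1/2) at 3 SNPs for 4 people in each of two groups.
rows = [[2, 2, 2], [0, 2, 0], [2, 0, 0], [0, 0, 2]]
X = rows + rows
y = [8, 2, 8, 2,      # group 0: phenotype driven by SNP 0
     13, 13, 7, 7]    # group 1: phenotype driven by SNP 1
groups = [0] * 4 + [1] * 4

fits = lasso_by_group(X, y, groups, alpha=0.5)
print(fits)
```

With α = 0.5 each group's fit selects only its own SNP, with coefficient 2.5 (the true effect of 3 shrunk by α, since the centered columns have unit mean square), and the other coefficients are exactly 0.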
-
Summary
• Population structure and association studies
  – Alleles that are represented at different frequencies in different populations can appear falsely associated with the phenotype of interest
  – It is important to detect the population structure in genomes and to take this information into account in association analysis
• Statistical methods for correcting for population structure
  – Genomic control
  – Structured association