Introduction to Gene-Finding: Linkage and Association Danielle Dick, Sarah Medland, (Ben Neale)
![Page 1: Introduction to Gene-Finding: Linkage and Association Danielle Dick, Sarah Medland, (Ben Neale)](https://reader035.fdocuments.us/reader035/viewer/2022062322/56649d095503460f949dbd06/html5/thumbnails/1.jpg)
Introduction to Gene-Finding: Linkage and Association
Danielle Dick, Sarah Medland, (Ben Neale)
![Page 2: Introduction to Gene-Finding: Linkage and Association Danielle Dick, Sarah Medland, (Ben Neale)](https://reader035.fdocuments.us/reader035/viewer/2022062322/56649d095503460f949dbd06/html5/thumbnails/2.jpg)
Aim of QTL mapping…
LOCALIZE and then IDENTIFY a locus that regulates a trait (QTL)
• Locus: Nucleotide or sequence of nucleotides with variation in the population, with different variants associated with different trait levels.
![Page 3: Introduction to Gene-Finding: Linkage and Association Danielle Dick, Sarah Medland, (Ben Neale)](https://reader035.fdocuments.us/reader035/viewer/2022062322/56649d095503460f949dbd06/html5/thumbnails/3.jpg)
Location and Identification
• Linkage
• localize region of the genome where a QTL that regulates the trait is likely to be harboured
• Family-specific phenomenon: Affected individuals in a family share the same ancestral predisposing DNA segment at a given QTL
![Page 4: Introduction to Gene-Finding: Linkage and Association Danielle Dick, Sarah Medland, (Ben Neale)](https://reader035.fdocuments.us/reader035/viewer/2022062322/56649d095503460f949dbd06/html5/thumbnails/4.jpg)
Location and Identification
• Association
• identify a QTL that regulates the trait
• Population-specific phenomenon: Affected individuals in a population share the same ancestral predisposing DNA segment at a given QTL
![Page 5: Introduction to Gene-Finding: Linkage and Association Danielle Dick, Sarah Medland, (Ben Neale)](https://reader035.fdocuments.us/reader035/viewer/2022062322/56649d095503460f949dbd06/html5/thumbnails/5.jpg)
Linkage
Overview
![Page 6: Introduction to Gene-Finding: Linkage and Association Danielle Dick, Sarah Medland, (Ben Neale)](https://reader035.fdocuments.us/reader035/viewer/2022062322/56649d095503460f949dbd06/html5/thumbnails/6.jpg)
Progress of the Human Genome Project
Human Chromosome 4
![Page 7: Introduction to Gene-Finding: Linkage and Association Danielle Dick, Sarah Medland, (Ben Neale)](https://reader035.fdocuments.us/reader035/viewer/2022062322/56649d095503460f949dbd06/html5/thumbnails/7.jpg)
Genetic markers (DNA polymorphisms)
ATGCTTGCCACGCE
ATGCTTGCCATGCE
Single Nucleotide Polymorphism
ATGCTTGCCACGCE
ATGCTTCTTGCCATGCE
Microsatellite Markers
can be di (2), tri (3), or tetra (4) nucleotide repeats
![Page 8: Introduction to Gene-Finding: Linkage and Association Danielle Dick, Sarah Medland, (Ben Neale)](https://reader035.fdocuments.us/reader035/viewer/2022062322/56649d095503460f949dbd06/html5/thumbnails/8.jpg)
DNA polymorphisms
Can occur in gene, but be silent
Can change gene product (protein)
• Alter amino acid sequence (a lot or a little)
Can regulate gene product
• Upregulate or downregulate protein production
• Turn off or on gene
Can occur in noncoding region
• This happens most often!
![Page 9: Introduction to Gene-Finding: Linkage and Association Danielle Dick, Sarah Medland, (Ben Neale)](https://reader035.fdocuments.us/reader035/viewer/2022062322/56649d095503460f949dbd06/html5/thumbnails/9.jpg)
Mutations
![Page 10: Introduction to Gene-Finding: Linkage and Association Danielle Dick, Sarah Medland, (Ben Neale)](https://reader035.fdocuments.us/reader035/viewer/2022062322/56649d095503460f949dbd06/html5/thumbnails/10.jpg)
How do we map genes?
Deviation from Mendel’s Independent Assortment Law
Aa & Bb = ¼ AB, ¼ Ab, ¼ aB, ¼ ab
We’re looking for variation from this
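As a rough illustration of what "variation from this" looks like, the sketch below computes gamete frequencies for a double heterozygote as a function of the recombination fraction. The function name, the parameter `theta`, and the assumed parental phase (AB on one chromosome, ab on the other) are illustrative assumptions, not part of the slides.

```python
def gamete_frequencies(theta):
    """Gamete frequencies from an AaBb parent whose haplotypes are AB and ab.

    theta is the recombination fraction between the two loci
    (0.5 = independent assortment, smaller values = linkage).
    The AB/ab phase is an assumption made for this sketch.
    """
    if not 0.0 <= theta <= 0.5:
        raise ValueError("theta must lie between 0 and 0.5")
    non_recombinant = (1.0 - theta) / 2.0
    recombinant = theta / 2.0
    return {
        "AB": non_recombinant,
        "ab": non_recombinant,
        "Ab": recombinant,
        "aB": recombinant,
    }


independent = gamete_frequencies(0.5)  # Mendel's expectation: a quarter each
linked = gamete_frequencies(0.1)       # parental haplotypes are over-represented
```

With `theta = 0.5` every gamete has frequency ¼, which is the independent-assortment expectation; any smaller value shifts frequency toward the parental haplotypes.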
![Page 11: Introduction to Gene-Finding: Linkage and Association Danielle Dick, Sarah Medland, (Ben Neale)](https://reader035.fdocuments.us/reader035/viewer/2022062322/56649d095503460f949dbd06/html5/thumbnails/11.jpg)
Recombination
![Page 12: Introduction to Gene-Finding: Linkage and Association Danielle Dick, Sarah Medland, (Ben Neale)](https://reader035.fdocuments.us/reader035/viewer/2022062322/56649d095503460f949dbd06/html5/thumbnails/12.jpg)
Recombination
Another way of introducing genetic diversity
Allows us to map genes!
Crossovers more likely to occur between genes that are further away; likelihood of a recombination event is proportional to the distance
Interference – tend not to see 2 crossovers in a small area
Alleles that are very close together are more likely to stay together, don’t assort independently
![Page 13: Introduction to Gene-Finding: Linkage and Association Danielle Dick, Sarah Medland, (Ben Neale)](https://reader035.fdocuments.us/reader035/viewer/2022062322/56649d095503460f949dbd06/html5/thumbnails/13.jpg)
Linkage Mapping (is a marker “linked” to the disease gene)
Collect families with affected individuals
Genome Scan - Test markers evenly spaced across the entire genome (~every 10cM, ~400 markers)
Lod score (“log of the odds”) – the log (base 10) of the odds of observing the family marker data if the marker is linked to the disease (less recombination than expected) compared to if the marker is not linked to the disease
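A minimal sketch of the lod score idea, under the simplifying assumption that every meiosis is fully informative and phase-known so recombinants can simply be counted. The function and argument names are hypothetical; real linkage programs work from pedigree likelihoods rather than direct counts.

```python
import math


def lod_score(recombinants, meioses, theta):
    """log10 of L(theta) / L(theta = 0.5) for counted recombinants.

    Assumes phase-known, fully informative meioses (a simplification).
    """
    if not 0.0 < theta <= 0.5:
        raise ValueError("theta must be in (0, 0.5]")
    if not 0 <= recombinants <= meioses:
        raise ValueError("recombinants must be between 0 and meioses")
    non_recombinants = meioses - recombinants
    log_linked = (recombinants * math.log10(theta)
                  + non_recombinants * math.log10(1.0 - theta))
    log_unlinked = meioses * math.log10(0.5)
    return log_linked - log_unlinked


# Hypothetical data: 1 recombinant observed in 10 meioses, evaluated at theta = 0.1
example_lod = lod_score(recombinants=1, meioses=10, theta=0.1)
```

Evaluating the function over a grid of `theta` values and taking the maximum gives the usual maximum lod score for the marker.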
![Page 14: Introduction to Gene-Finding: Linkage and Association Danielle Dick, Sarah Medland, (Ben Neale)](https://reader035.fdocuments.us/reader035/viewer/2022062322/56649d095503460f949dbd06/html5/thumbnails/14.jpg)
Thomas Hunt Morgan – discoverer of linkage
![Page 15: Introduction to Gene-Finding: Linkage and Association Danielle Dick, Sarah Medland, (Ben Neale)](https://reader035.fdocuments.us/reader035/viewer/2022062322/56649d095503460f949dbd06/html5/thumbnails/15.jpg)
Linkage = Co-segregation
A2A4
A3A4
A1A3
A1A2
A2A3
A1A2 A1A4 A3A4 A3A2
Marker allele A1 cosegregates with dominant disease
![Page 16: Introduction to Gene-Finding: Linkage and Association Danielle Dick, Sarah Medland, (Ben Neale)](https://reader035.fdocuments.us/reader035/viewer/2022062322/56649d095503460f949dbd06/html5/thumbnails/16.jpg)
Lod scores
>3.0 evidence for linkage
<-2.0 can rule out linkage
In between – inconclusive, collect more families
![Page 17: Introduction to Gene-Finding: Linkage and Association Danielle Dick, Sarah Medland, (Ben Neale)](https://reader035.fdocuments.us/reader035/viewer/2022062322/56649d095503460f949dbd06/html5/thumbnails/17.jpg)
Linkage = Co-segregation
A2A4
A3A4
A1A3
A1A2
A2A3
A1A2 A1A4 A3A4 A3A2
•Parametric Linkage used very successfully to map disease genes for Mendelian disorders
•Problematic for complex disorders: requires disease model, penetrance, assumes gene of major effect, phenotypic precision
![Page 18: Introduction to Gene-Finding: Linkage and Association Danielle Dick, Sarah Medland, (Ben Neale)](https://reader035.fdocuments.us/reader035/viewer/2022062322/56649d095503460f949dbd06/html5/thumbnails/18.jpg)
Nonparametric Linkage
Based on allele-sharing
More appropriate for phenotypes with multiple genes of small effect, environment, no disease model assumed
Basic unit of data: affected relative (often sibling) pairs
![Page 19: Introduction to Gene-Finding: Linkage and Association Danielle Dick, Sarah Medland, (Ben Neale)](https://reader035.fdocuments.us/reader035/viewer/2022062322/56649d095503460f949dbd06/html5/thumbnails/19.jpg)
x
1/4 1/4 1/4 1/4
![Page 20: Introduction to Gene-Finding: Linkage and Association Danielle Dick, Sarah Medland, (Ben Neale)](https://reader035.fdocuments.us/reader035/viewer/2022062322/56649d095503460f949dbd06/html5/thumbnails/20.jpg)
IDENTITY BY DESCENT
Sib 1
Sib 2
4/16 = 1/4 sibs share BOTH parental alleles IBD = 2
8/16 = 1/2 sibs share ONE parental allele IBD = 1
4/16 = 1/4 sibs share NO parental alleles IBD = 0
[Figure: 4 × 4 grid of Sib 1 by Sib 2 genotypes, each cell labelled with the number of alleles shared IBD (2, 1 or 0)]
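The 4/16, 8/16 and 4/16 split can be checked by brute-force enumeration. The sketch below assumes both parents are fully informative (four distinguishable parental alleles), and the 0/1 labels for which allele each parent transmits are purely illustrative.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

# Each sib receives one of two paternal alleles and one of two maternal
# alleles; (paternal, maternal) is coded as a pair of 0/1 labels.
transmissions = list(product((0, 1), repeat=2))  # 4 equally likely genotypes

ibd_counts = Counter()
for sib1, sib2 in product(transmissions, repeat=2):  # 16 equally likely pairs
    # Count the parental alleles that both sibs inherited.
    shared = int(sib1[0] == sib2[0]) + int(sib1[1] == sib2[1])
    ibd_counts[shared] += 1

ibd_probabilities = {ibd: Fraction(n, 16) for ibd, n in sorted(ibd_counts.items())}
```

The resulting probabilities (¼, ½, ¼ for IBD = 0, 1, 2) are the null expectation that allele-sharing linkage methods compare observed sharing against.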
![Page 21: Introduction to Gene-Finding: Linkage and Association Danielle Dick, Sarah Medland, (Ben Neale)](https://reader035.fdocuments.us/reader035/viewer/2022062322/56649d095503460f949dbd06/html5/thumbnails/21.jpg)
Genotypic similarity between relatives
IBS: Alleles shared Identical By State “look the same”; they may have the same DNA sequence but are not necessarily derived from a known common ancestor - focus for association
IBD: Alleles shared Identical By Descent are a copy of the same ancestor allele - focus for linkage
[Figure: pedigree with marker (M) and QTL (Q) haplotypes M1Q1, M2Q2, M3Q3 and M3Q4; for the sib pair shown, IBS = 2 and IBD = 1]
![Page 22: Introduction to Gene-Finding: Linkage and Association Danielle Dick, Sarah Medland, (Ben Neale)](https://reader035.fdocuments.us/reader035/viewer/2022062322/56649d095503460f949dbd06/html5/thumbnails/22.jpg)
Genotypic similarity – basic principles
Loci that are close together are more likely to be inherited together than loci that are further apart
Loci are likely to be inherited in context – i.e. with their surrounding loci
Because of this, knowing that a locus is transmitted from a common ancestor is more informative than simply observing that it is the same allele
Critical to have parental data when possible
![Page 23: Introduction to Gene-Finding: Linkage and Association Danielle Dick, Sarah Medland, (Ben Neale)](https://reader035.fdocuments.us/reader035/viewer/2022062322/56649d095503460f949dbd06/html5/thumbnails/23.jpg)
Linkage Markers…
![Page 24: Introduction to Gene-Finding: Linkage and Association Danielle Dick, Sarah Medland, (Ben Neale)](https://reader035.fdocuments.us/reader035/viewer/2022062322/56649d095503460f949dbd06/html5/thumbnails/24.jpg)
For disease traits (affected/unaffected)
Affected sib pairs selected
[Figure: number of affected sib pairs (0 to 1000) in the IBD = 0, IBD = 1 and IBD = 2 groups, shown for the expected distribution and for markers 1, 2, 3, 127 and 310]
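One simple way to compare observed sharing in affected sib pairs with the expected ¼ : ½ : ¼ split is a goodness-of-fit statistic. This is only a sketch: the counts are invented, the function name is hypothetical, and it assumes IBD status is known exactly for every pair, which is rarely true in practice.

```python
def ibd_chi_square(n_ibd0, n_ibd1, n_ibd2):
    """Chi-square goodness-of-fit of sib-pair IBD counts to 1/4, 1/2, 1/4."""
    observed = (n_ibd0, n_ibd1, n_ibd2)
    total = sum(observed)
    if total == 0:
        raise ValueError("no sib pairs supplied")
    expected = (0.25 * total, 0.5 * total, 0.25 * total)
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


# Hypothetical marker: 1000 affected sib pairs with excess sharing
null_marker = ibd_chi_square(250, 500, 250)    # matches expectation
linked_marker = ibd_chi_square(200, 500, 300)  # more IBD = 2, fewer IBD = 0
```

Under no linkage the statistic is compared against a chi-square distribution with 2 degrees of freedom; excess sharing at a marker inflates it.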
![Page 25: Introduction to Gene-Finding: Linkage and Association Danielle Dick, Sarah Medland, (Ben Neale)](https://reader035.fdocuments.us/reader035/viewer/2022062322/56649d095503460f949dbd06/html5/thumbnails/25.jpg)
For continuous measures
Unselected sib pairs
[Figure: correlation between sibs (0.00 to 1.00) for the IBD = 0, IBD = 1 and IBD = 2 groups]
![Page 26: Introduction to Gene-Finding: Linkage and Association Danielle Dick, Sarah Medland, (Ben Neale)](https://reader035.fdocuments.us/reader035/viewer/2022062322/56649d095503460f949dbd06/html5/thumbnails/26.jpg)
So how does all this fit into Mx?
![Page 27: Introduction to Gene-Finding: Linkage and Association Danielle Dick, Sarah Medland, (Ben Neale)](https://reader035.fdocuments.us/reader035/viewer/2022062322/56649d095503460f949dbd06/html5/thumbnails/27.jpg)
IDENTITY BY DESCENT
Sib 1
Sib 2
4/16 = 1/4 sibs share BOTH parental alleles IBD = 2
8/16 = 1/2 sibs share ONE parental allele IBD = 1
4/16 = 1/4 sibs share NO parental alleles IBD = 0
[Figure: 4 × 4 grid of Sib 1 by Sib 2 genotypes, each cell labelled with the number of alleles shared IBD (2, 1 or 0)]
![Page 28: Introduction to Gene-Finding: Linkage and Association Danielle Dick, Sarah Medland, (Ben Neale)](https://reader035.fdocuments.us/reader035/viewer/2022062322/56649d095503460f949dbd06/html5/thumbnails/28.jpg)
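[Figure: ACE path diagram for twins T1 and T2. Latent factors A, C and E (variance 1) load on each trait through paths a, c and e; A correlates 1 or .5 and C correlates 1 across twins]
In biometrical modeling A is correlated at 1 for MZ twins and .5 for DZ twins
.5 is the average genome-wide sharing of genes between full siblings (DZ twin relationship)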
![Page 29: Introduction to Gene-Finding: Linkage and Association Danielle Dick, Sarah Medland, (Ben Neale)](https://reader035.fdocuments.us/reader035/viewer/2022062322/56649d095503460f949dbd06/html5/thumbnails/29.jpg)
In linkage analysis we will be estimating an additional variance component Q
For each locus under analysis the coefficient of sharing for this parameter will vary for each pair of siblings
The coefficient will be the probability that the pair of siblings have both inherited the same alleles from a common ancestor: pi-hat (π̂)
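To make the role of the pair-specific coefficient concrete, here is a sketch of the expected trait covariance for a pair once Q is added to the ACE model. It assumes the standard variance-components form (Q weighted by pi-hat, A by 1 or .5, C by 1); the function names and the example path values are hypothetical.

```python
def expected_pair_covariance(pi_hat, q, a, c, zygosity="DZ"):
    """Expected covariance between two sibs/twins under a Q + A + C + E model.

    pi_hat   : estimated proportion of alleles shared IBD at the locus
    q, a, c  : path coefficients (variance components are their squares)
    zygosity : "MZ" sets the A correlation to 1.0, anything else to 0.5
    """
    a_correlation = 1.0 if zygosity == "MZ" else 0.5
    return pi_hat * q ** 2 + a_correlation * a ** 2 + c ** 2


def expected_variance(q, a, c, e):
    """Expected phenotypic variance: the sum of all four components."""
    return q ** 2 + a ** 2 + c ** 2 + e ** 2


# Hypothetical paths: each component contributes a variance of 0.25
cov_ibd0 = expected_pair_covariance(0.0, q=0.5, a=0.5, c=0.5)
cov_ibd1 = expected_pair_covariance(0.5, q=0.5, a=0.5, c=0.5)
cov_ibd2 = expected_pair_covariance(1.0, q=0.5, a=0.5, c=0.5)
```

If the locus influences the trait (q > 0), pairs with higher pi-hat are expected to be more similar; if q = 0 the covariance does not depend on pi-hat at all, which is the contrast a linkage test relies on.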
![Page 30: Introduction to Gene-Finding: Linkage and Association Danielle Dick, Sarah Medland, (Ben Neale)](https://reader035.fdocuments.us/reader035/viewer/2022062322/56649d095503460f949dbd06/html5/thumbnails/30.jpg)
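[Figure: path diagram with latent factors Q, A, C and E (variance 1) for PTwin1 and PTwin2, with paths q, a, c and e. Correlations across twins: A is MZ = 1.0, DZ = 0.5; C is MZ & DZ = 1.0; Q is π̂]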
![Page 31: Introduction to Gene-Finding: Linkage and Association Danielle Dick, Sarah Medland, (Ben Neale)](https://reader035.fdocuments.us/reader035/viewer/2022062322/56649d095503460f949dbd06/html5/thumbnails/31.jpg)
Linkage
How do we do this?
1. Genotyping data.
Breakdown of time spent during a linkage/association study
[Figure: chart with segments for cleaning & preparing genotype data, running linkage analyses, and estimating significance]
![Page 32: Introduction to Gene-Finding: Linkage and Association Danielle Dick, Sarah Medland, (Ben Neale)](https://reader035.fdocuments.us/reader035/viewer/2022062322/56649d095503460f949dbd06/html5/thumbnails/32.jpg)
Microsatellite data
• Ideally positioned at equal genetic distances across chromosome
• Mostly di/tri nucleotide repeats
http://research.marshfieldclinic.org/genetics/GeneticResearch/screeningsets.asp
![Page 33: Introduction to Gene-Finding: Linkage and Association Danielle Dick, Sarah Medland, (Ben Neale)](https://reader035.fdocuments.us/reader035/viewer/2022062322/56649d095503460f949dbd06/html5/thumbnails/33.jpg)
Microsatellite data
• Raw data consists of allele lengths/calls (bp)
• Different primers give different lengths
• So to compare data you MUST know which primers were used
http://research.marshfieldclinic.org/genetics/GeneticResearch/screeningsets.asp
![Page 34: Introduction to Gene-Finding: Linkage and Association Danielle Dick, Sarah Medland, (Ben Neale)](https://reader035.fdocuments.us/reader035/viewer/2022062322/56649d095503460f949dbd06/html5/thumbnails/34.jpg)
Binning
Raw allele lengths are converted to allele numbers or lengths
Example: D1S1646 tri-nucleotide repeat, size range 130-150
• Logically: Work with binned lengths
• Commonly: Assign allele 1 to 130 allele, 2 to 133 allele …
• Commercially: Allele numbers often assigned based on reference populations (CEPH). So if the first CEPH allele was 136 that would be assigned 1, and 130 & 133 would be assigned the next free allele number
Conclusions: whenever possible start from the RAW allele size and work with allele length
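A small sketch of the "logical" and "common" binning schemes for the D1S1646 example. It assumes raw calls arrive as fractional base-pair lengths and that the smallest allele is 130 with a repeat unit of 3; the function name and the sample calls are made up for illustration.

```python
def bin_alleles(raw_lengths, smallest=130, repeat_unit=3):
    """Map each raw allele length to (binned length, allele number).

    Lengths are snapped to the nearest multiple of the repeat unit above
    the smallest allele; allele 1 is the smallest bin, allele 2 the next.
    """
    binned = {}
    for length in raw_lengths:
        steps = round((length - smallest) / repeat_unit)
        if steps < 0:
            raise ValueError(f"{length} is below the smallest allele")
        binned[length] = (smallest + steps * repeat_unit, steps + 1)
    return binned


# Hypothetical raw calls from a genotyping run
calls = bin_alleles([130.2, 132.8, 136.1])
```

Keeping the binned length alongside the allele number makes it possible to re-derive numbers later if a different numbering scheme (for example one anchored on a reference population) is needed.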
![Page 35: Introduction to Gene-Finding: Linkage and Association Danielle Dick, Sarah Medland, (Ben Neale)](https://reader035.fdocuments.us/reader035/viewer/2022062322/56649d095503460f949dbd06/html5/thumbnails/35.jpg)
Error checking
After binning check for errors
• Family relationships (GRR, Rel-pair)
• Mendelian Errors (Sib-pair)
• Double Recombinants (MENDEL, ASPEX, ALLEGRO)
An iterative process
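To illustrate what a Mendelian error is, the sketch below checks a single parent-parent-child trio at one marker. It is not how the programs named above work internally: it ignores missing genotypes, sex chromosomes and larger pedigrees, and the function name is hypothetical.

```python
def is_mendelian_consistent(father, mother, child):
    """Return True if the child's genotype can be formed by taking one
    allele from the father and one from the mother.

    Genotypes are 2-tuples of allele labels, e.g. (130, 133).
    """
    first, second = child
    return ((first in father and second in mother)
            or (second in father and first in mother))


# Hypothetical trio: parents 1/2 and 3/3
ok_child = is_mendelian_consistent((1, 2), (3, 3), (2, 3))
bad_child = is_mendelian_consistent((1, 2), (3, 3), (1, 2))
```

A failed check does not say which of the three genotypes is wrong, which is one reason error checking tends to be iterative.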
![Page 36: Introduction to Gene-Finding: Linkage and Association Danielle Dick, Sarah Medland, (Ben Neale)](https://reader035.fdocuments.us/reader035/viewer/2022062322/56649d095503460f949dbd06/html5/thumbnails/36.jpg)
‘Clean’ data ped file
Family, individual, father, mother, sex, dummy, genotypes
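The column order above can be read with a few lines of code. This sketch assumes a whitespace-delimited file with two allele columns per marker after the six leading fields; both of those layout details, and all names in the code, are assumptions rather than something the slide specifies.

```python
from typing import List, NamedTuple, Tuple


class PedRecord(NamedTuple):
    family: str
    individual: str
    father: str
    mother: str
    sex: str
    dummy: str
    genotypes: List[Tuple[str, str]]


def parse_ped_line(line):
    """Split one ped-file line into its six leading fields and genotypes."""
    fields = line.split()
    if len(fields) < 6:
        raise ValueError("expected at least six leading fields")
    alleles = fields[6:]
    if len(alleles) % 2 != 0:
        raise ValueError("expected two allele columns per marker")
    genotypes = [(alleles[i], alleles[i + 1]) for i in range(0, len(alleles), 2)]
    return PedRecord(*fields[:6], genotypes)


# Hypothetical line: family 1, individual 3, parents 1 and 2, two markers
record = parse_ped_line("1 3 1 2 1 0 130 133 145 145")
```

Alleles are kept as strings so that missing-data codes and allele lengths pass through unchanged.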
![Page 37: Introduction to Gene-Finding: Linkage and Association Danielle Dick, Sarah Medland, (Ben Neale)](https://reader035.fdocuments.us/reader035/viewer/2022062322/56649d095503460f949dbd06/html5/thumbnails/37.jpg)
Estimating genotypic sharing…
The ped file is used with ‘map’ files to obtain estimates of genotypic sharing between relatives at each of the locations under analysis
![Page 38: Introduction to Gene-Finding: Linkage and Association Danielle Dick, Sarah Medland, (Ben Neale)](https://reader035.fdocuments.us/reader035/viewer/2022062322/56649d095503460f949dbd06/html5/thumbnails/38.jpg)
Estimating genotypic sharing…
Merlin will give you probabilities of sharing 0, 1, 2 alleles for every pair of individuals.
![Page 39: Introduction to Gene-Finding: Linkage and Association Danielle Dick, Sarah Medland, (Ben Neale)](https://reader035.fdocuments.us/reader035/viewer/2022062322/56649d095503460f949dbd06/html5/thumbnails/39.jpg)
Estimating genotypic sharing… Output
![Page 40: Introduction to Gene-Finding: Linkage and Association Danielle Dick, Sarah Medland, (Ben Neale)](https://reader035.fdocuments.us/reader035/viewer/2022062322/56649d095503460f949dbd06/html5/thumbnails/40.jpg)
Estimating genotypic sharing… Output
Why isn’t P0, P1, P2 exact for everyone?
![Page 41: Introduction to Gene-Finding: Linkage and Association Danielle Dick, Sarah Medland, (Ben Neale)](https://reader035.fdocuments.us/reader035/viewer/2022062322/56649d095503460f949dbd06/html5/thumbnails/41.jpg)
Estimating genotypic sharing… Output
Why isn’t P0, P1, P2 exact for everyone?
- missing parental genotypes
- low informativeness at marker
1/2 2/2
1/2 2/2
![Page 42: Introduction to Gene-Finding: Linkage and Association Danielle Dick, Sarah Medland, (Ben Neale)](https://reader035.fdocuments.us/reader035/viewer/2022062322/56649d095503460f949dbd06/html5/thumbnails/42.jpg)
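[Figure: path diagram with latent factors Q, A, C and E (variance 1) for PTwin1 and PTwin2, with paths q, a, c and e. Correlations across twins: A is MZ = 1.0, DZ = 0.5; C is MZ & DZ = 1.0; Q is π̂]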
![Page 43: Introduction to Gene-Finding: Linkage and Association Danielle Dick, Sarah Medland, (Ben Neale)](https://reader035.fdocuments.us/reader035/viewer/2022062322/56649d095503460f949dbd06/html5/thumbnails/43.jpg)
Genotypic similarity between relatives
IBD Alleles shared Identical By Descent are a copy of the same ancestor allele
Pairs of siblings may share 0, 1 or 2 alleles IBD
The probability of a pair of relatives being IBD is known as pi-hat
[Figure: pedigree with marker (M) and QTL (Q) haplotypes M1Q1, M2Q2, M3Q3 and M3Q4; for the sib pair shown, IBS = 2 and IBD = 1]
$\hat{\pi} = p(\mathrm{IBD}=2) + 0.5 \times p(\mathrm{IBD}=1)$
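The pi-hat formula is a one-line calculation once the three sharing probabilities are available. In this sketch the function name and the check that the probabilities sum to 1 are illustrative additions.

```python
def pi_hat(p_ibd0, p_ibd1, p_ibd2):
    """Proportion of alleles shared IBD: p(IBD=2) + 0.5 * p(IBD=1)."""
    if abs(p_ibd0 + p_ibd1 + p_ibd2 - 1.0) > 1e-6:
        raise ValueError("IBD probabilities should sum to 1")
    return p_ibd2 + 0.5 * p_ibd1


uninformative = pi_hat(0.25, 0.5, 0.25)  # no marker information: prior values
certain_ibd2 = pi_hat(0.0, 0.0, 1.0)     # pair known to share both alleles
```

When a marker carries no information the probabilities stay at ¼, ½, ¼ and pi-hat sits at 0.5, the genome-wide average for full siblings.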
![Page 44: Introduction to Gene-Finding: Linkage and Association Danielle Dick, Sarah Medland, (Ben Neale)](https://reader035.fdocuments.us/reader035/viewer/2022062322/56649d095503460f949dbd06/html5/thumbnails/44.jpg)
Estimating genotypic sharing… Output
$\hat{\pi} = p(\mathrm{IBD}=2) + 0.5 \times p(\mathrm{IBD}=1)$
$\hat{\pi} = ?$
$\hat{\pi} = ?$
![Page 45: Introduction to Gene-Finding: Linkage and Association Danielle Dick, Sarah Medland, (Ben Neale)](https://reader035.fdocuments.us/reader035/viewer/2022062322/56649d095503460f949dbd06/html5/thumbnails/45.jpg)
Distribution of pi-hat
Adult Dutch DZ pairs: distribution of pi-hat at 65 cM on chromosome 19
pi-hat < 0.25: IBD=0 group
pi-hat > 0.75: IBD=2 group
others: IBD=1 group
pi65cat = (0,1,2)
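The grouping rule for pi65cat can be written directly from the cut-offs given. The function name is hypothetical, and how exact boundary values (0.25 and 0.75) are handled follows the "others" wording on the slide.

```python
def pi_hat_category(pi_hat):
    """Collapse a continuous pi-hat into an IBD group coded 0, 1 or 2."""
    if pi_hat < 0.25:
        return 0
    if pi_hat > 0.75:
        return 2
    return 1  # everything else, including exactly 0.25 and 0.75


# Hypothetical pi-hat values for five DZ pairs
pi65cat = [pi_hat_category(value) for value in (0.02, 0.25, 0.5, 0.76, 1.0)]
```

Categorising loses information compared with using pi-hat itself, but it makes it easy to compare sib correlations across the three groups.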
![Page 46: Introduction to Gene-Finding: Linkage and Association Danielle Dick, Sarah Medland, (Ben Neale)](https://reader035.fdocuments.us/reader035/viewer/2022062322/56649d095503460f949dbd06/html5/thumbnails/46.jpg)
Linkage Analyses
Advantage
• Systematically scan the genome
Disadvantages
• Not very powerful
• Need hundreds – thousands of family members
• Broad peaks
![Page 47: Introduction to Gene-Finding: Linkage and Association Danielle Dick, Sarah Medland, (Ben Neale)](https://reader035.fdocuments.us/reader035/viewer/2022062322/56649d095503460f949dbd06/html5/thumbnails/47.jpg)
Lod scores
1 cM ≈ 1 Mb (rough average)
1 Mb = 1000 kb
1 kb = 1000 bp
1 cM ≈ 1,000,000 bp
![Page 48: Introduction to Gene-Finding: Linkage and Association Danielle Dick, Sarah Medland, (Ben Neale)](https://reader035.fdocuments.us/reader035/viewer/2022062322/56649d095503460f949dbd06/html5/thumbnails/48.jpg)
Strategy
[Figure: lod scores (0 to 2.5) by map position (0 to 160 cM) for Wave 1, Wave 2 and Combined]
1. Ascertain families with multiple affecteds
2. Linkage analyses to identify chromosomal regions
3. Association analyses to identify specific genes
allele-sharing among affecteds within a family
Gene A Gene B Gene C
![Page 49: Introduction to Gene-Finding: Linkage and Association Danielle Dick, Sarah Medland, (Ben Neale)](https://reader035.fdocuments.us/reader035/viewer/2022062322/56649d095503460f949dbd06/html5/thumbnails/49.jpg)
BREAK
![Page 50: Introduction to Gene-Finding: Linkage and Association Danielle Dick, Sarah Medland, (Ben Neale)](https://reader035.fdocuments.us/reader035/viewer/2022062322/56649d095503460f949dbd06/html5/thumbnails/50.jpg)
Linkage vs. Association
Linkage analyses look for relationship between a marker and disease within a family (could be different marker in each family)
Association analyses look for relationship between a marker and disease between families (must be same marker in all families)
![Page 51: Introduction to Gene-Finding: Linkage and Association Danielle Dick, Sarah Medland, (Ben Neale)](https://reader035.fdocuments.us/reader035/viewer/2022062322/56649d095503460f949dbd06/html5/thumbnails/51.jpg)
Allelic Association: Extension of linkage to the population
3/5 2/6
3/2 5/2
3/5 2/6
3/6 5/6
Both families are ‘linked’ with the marker, but a different allele is involved
![Page 52: Introduction to Gene-Finding: Linkage and Association Danielle Dick, Sarah Medland, (Ben Neale)](https://reader035.fdocuments.us/reader035/viewer/2022062322/56649d095503460f949dbd06/html5/thumbnails/52.jpg)
Allelic Association: Extension of linkage to the population
3/6 2/4
3/2 6/2
3/5 2/6
3/6 5/6
All families are ‘linked’ with the marker
Allele 6 is ‘associated’ with disease
4/6 2/6
6/6 6/6
![Page 53: Introduction to Gene-Finding: Linkage and Association Danielle Dick, Sarah Medland, (Ben Neale)](https://reader035.fdocuments.us/reader035/viewer/2022062322/56649d095503460f949dbd06/html5/thumbnails/53.jpg)
Localization
Linkage analysis yields broad chromosome regions harbouring many genes
• Resolution comes from recombination events (meioses) in families assessed
• ‘Good’ in terms of needing few markers, ‘poor’ in terms of finding specific variants involved

Association analysis yields fine-scale resolution of genetic variants
• Resolution comes from ancestral recombination events
• ‘Good’ in terms of finding specific variants, ‘poor’ in terms of needing many markers
![Page 54: Introduction to Gene-Finding: Linkage and Association Danielle Dick, Sarah Medland, (Ben Neale)](https://reader035.fdocuments.us/reader035/viewer/2022062322/56649d095503460f949dbd06/html5/thumbnails/54.jpg)
Allelic Association: Three Common Forms
• Direct Association
  • Mutant or ‘susceptible’ polymorphism
  • Allele of interest is itself involved in phenotype
• Indirect Association
  • Allele itself is not involved, but a nearby correlated marker changes phenotype
• Spurious association
  • Apparent association not related to genetic aetiology (most common outcome…)
![Page 55: Introduction to Gene-Finding: Linkage and Association Danielle Dick, Sarah Medland, (Ben Neale)](https://reader035.fdocuments.us/reader035/viewer/2022062322/56649d095503460f949dbd06/html5/thumbnails/55.jpg)
Indirect and Direct Allelic Association
Direct Association
Measure disease relevance (*) directly, ignoring correlated markers nearby
Indirect Association & LD
Assess trait effects on D via correlated markers (Mi) rather than susceptibility/etiologic variants.
Semantic distinction between
• Linkage Disequilibrium: correlation between (any) markers in population
• Allelic Association: correlation between marker allele and trait
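The "correlation between markers" that defines linkage disequilibrium is often summarised as D and r². The sketch below computes both from haplotype counts at two biallelic markers; it assumes phase is known, and the function name and counts are hypothetical.

```python
def linkage_disequilibrium(n_AB, n_Ab, n_aB, n_ab):
    """Return (D, r_squared) for two biallelic markers from haplotype counts."""
    total = n_AB + n_Ab + n_aB + n_ab
    if total == 0:
        raise ValueError("no haplotypes supplied")
    p_AB = n_AB / total
    p_A = (n_AB + n_Ab) / total
    p_B = (n_AB + n_aB) / total
    denominator = p_A * (1 - p_A) * p_B * (1 - p_B)
    if denominator == 0:
        raise ValueError("both markers must be polymorphic")
    d = p_AB - p_A * p_B  # deviation from independence of the two alleles
    return d, d ** 2 / denominator


# Hypothetical samples of 100 haplotypes
complete_ld = linkage_disequilibrium(50, 0, 0, 50)     # markers perfectly correlated
no_ld = linkage_disequilibrium(25, 25, 25, 25)         # markers independent
```

An r² near 1 means one marker is an almost perfect proxy for the other, which is what makes indirect association through a nearby marker possible.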
![Page 56: Introduction to Gene-Finding: Linkage and Association Danielle Dick, Sarah Medland, (Ben Neale)](https://reader035.fdocuments.us/reader035/viewer/2022062322/56649d095503460f949dbd06/html5/thumbnails/56.jpg)
Decay of Linkage Disequilibrium
Reich et al., Nature 2001
![Page 57: Introduction to Gene-Finding: Linkage and Association Danielle Dick, Sarah Medland, (Ben Neale)](https://reader035.fdocuments.us/reader035/viewer/2022062322/56649d095503460f949dbd06/html5/thumbnails/57.jpg)
Average Levels of LD along chromosomes
[Figure: average LD (0.00 to 1.00) against distance (0 to 30) on Chr22 for the CEPH, W. Eur and Estonian samples]
Dawson et al., Nature 2002
![Page 58: Introduction to Gene-Finding: Linkage and Association Danielle Dick, Sarah Medland, (Ben Neale)](https://reader035.fdocuments.us/reader035/viewer/2022062322/56649d095503460f949dbd06/html5/thumbnails/58.jpg)
Characterizing Patterns of Linkage Disequilibrium
Average LD decay vs physical distance
Mean trends along chromosomes
Haplotype Blocks
![Page 59: Introduction to Gene-Finding: Linkage and Association Danielle Dick, Sarah Medland, (Ben Neale)](https://reader035.fdocuments.us/reader035/viewer/2022062322/56649d095503460f949dbd06/html5/thumbnails/59.jpg)
Linkage Disequilibrium Maps & Allelic Association
Primary Aim of LD maps: Use relationships amongst background markers (M1, M2, M3, …Mn) to learn something about D for association studies
Something =
* Efficient association study design by reduced genotyping
* Predict approx location (fine-map) disease loci
* Assess complexity of local regions
* Attempt to quantify/predict underlying (unobserved) patterns
Marker 1 2 3 n
LD
D
![Page 60: Introduction to Gene-Finding: Linkage and Association Danielle Dick, Sarah Medland, (Ben Neale)](https://reader035.fdocuments.us/reader035/viewer/2022062322/56649d095503460f949dbd06/html5/thumbnails/60.jpg)
Deliverables: Sets of haplotype tagging SNPs
![Page 61: Introduction to Gene-Finding: Linkage and Association Danielle Dick, Sarah Medland, (Ben Neale)](https://reader035.fdocuments.us/reader035/viewer/2022062322/56649d095503460f949dbd06/html5/thumbnails/61.jpg)
1. Human Genome Project
Good for consensus, not good for individual differences
2. Identify genetic variants
Anonymous with respect to traits.
3. Assay genetic variants
Verify polymorphisms, catalogue correlations amongst sites
Anonymous with respect to traits
Sept 01 Feb 02 April 04 Oct 04
April 1999 – Dec 01
Oct 2002 - present
Building Haplotype Maps for Gene-finding
![Page 62: Introduction to Gene-Finding: Linkage and Association Danielle Dick, Sarah Medland, (Ben Neale)](https://reader035.fdocuments.us/reader035/viewer/2022062322/56649d095503460f949dbd06/html5/thumbnails/62.jpg)
Haplotype Tagging for Efficient Genotyping
Cardon & Abecasis, TIG 2003
• Some genetic variants within haplotype blocks give redundant information
• A subset of variants, ‘htSNPs’, can be used to ‘tag’ the conserved haplotypes with little loss of information (Johnson et al., Nat Genet, 2001)
• … Initial detection of htSNPs should facilitate future genetic association studies
![Page 63: Introduction to Gene-Finding: Linkage and Association Danielle Dick, Sarah Medland, (Ben Neale)](https://reader035.fdocuments.us/reader035/viewer/2022062322/56649d095503460f949dbd06/html5/thumbnails/63.jpg)
HapMap Strategy
Samples
• Four populations, small samples
Genotyping
• 5 kb initial density across genome (600K markers)
• Subsequent focus on low LD regions
• Recent NIH RFA for deeper coverage
![Page 64: Introduction to Gene-Finding: Linkage and Association Danielle Dick, Sarah Medland, (Ben Neale)](https://reader035.fdocuments.us/reader035/viewer/2022062322/56649d095503460f949dbd06/html5/thumbnails/64.jpg)
Hapmap validating millions of SNPs.
Are they the right SNPs?
[Figure: population frequency (0 to 0.6) by minor allele frequency bin (1-10% to 41-50%), comparing the frequency of public markers with the expected frequency in the population]
Distribution of allele frequencies in public markers is biased toward common alleles
Phillips et al. Nat Genet 2003
Updated with phase 2—more similar to expectation
![Page 65: Introduction to Gene-Finding: Linkage and Association Danielle Dick, Sarah Medland, (Ben Neale)](https://reader035.fdocuments.us/reader035/viewer/2022062322/56649d095503460f949dbd06/html5/thumbnails/65.jpg)
Summary of Role of Linkage Disequilibrium on Association Studies
Marker characterization is becoming extensive and genotyping throughput is high
Tagging studies will yield panels for immediate use
• Need to be clear about assumptions/aims of each panel
Density of eventual HapMap will probably cover much of genome in high LD, but not all
Challenges
• Just having more markers doesn’t mean that success rate will improve
• Expectations of association success via LD are too high.
![Page 66: Introduction to Gene-Finding: Linkage and Association Danielle Dick, Sarah Medland, (Ben Neale)](https://reader035.fdocuments.us/reader035/viewer/2022062322/56649d095503460f949dbd06/html5/thumbnails/66.jpg)
Two types of association studies
• Case-control
• Family-based
![Page 67: Introduction to Gene-Finding: Linkage and Association Danielle Dick, Sarah Medland, (Ben Neale)](https://reader035.fdocuments.us/reader035/viewer/2022062322/56649d095503460f949dbd06/html5/thumbnails/67.jpg)
Allelic Association
[Figure: marker genotypes for individuals grouped as Controls and Cases]
Allele 6 is ‘associated’ with disease
![Page 68: Introduction to Gene-Finding: Linkage and Association Danielle Dick, Sarah Medland, (Ben Neale)](https://reader035.fdocuments.us/reader035/viewer/2022062322/56649d095503460f949dbd06/html5/thumbnails/68.jpg)
Main Blame
Primary Concern with Case-Control Analyses
Population stratification
Analysis of mixed samples having different allele frequencies is a primary concern in human genetics, as it leads to false evidence for allelic association.
![Page 69: Introduction to Gene-Finding: Linkage and Association Danielle Dick, Sarah Medland, (Ben Neale)](https://reader035.fdocuments.us/reader035/viewer/2022062322/56649d095503460f949dbd06/html5/thumbnails/69.jpg)
Population Stratification
Leads to spurious association
Requirements:
• Group differences in allele frequencies AND
• Group differences in outcome
In epidemiology, this is a classic matching problem, with genetics as a confounding variable
![Page 70: Introduction to Gene-Finding: Linkage and Association Danielle Dick, Sarah Medland, (Ben Neale)](https://reader035.fdocuments.us/reader035/viewer/2022062322/56649d095503460f949dbd06/html5/thumbnails/70.jpg)
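Population Stratification

Sample ‘B’

| | M | m | Freq. |
|---|---|---|---|
| Affected | 1 | 9 | .01 |
| Unaffected | 99 | 891 | .99 |
| Allele freq. | .10 | .90 | |

χ²₁ is n.s.

Pooled sample (‘+’ on the slide)

| | M | m | Freq. |
|---|---|---|---|
| Affected | 51 | 59 | .055 |
| Unaffected | 549 | 1341 | .945 |
| Allele freq. | .30 | .70 | |

χ²₁ = 14.84, p < 0.001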
Spurious Association
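The stratification example can be reproduced with a plain Pearson chi-square on the 2 × 2 allele-count tables. The Sample B and pooled counts are taken from the slide; the second sample is not shown in this transcript, so the code derives it as "pooled minus B", which is an inference rather than something stated in the source. Function and variable names are hypothetical.

```python
def chi_square_2x2(table):
    """Pearson chi-square (1 df, no continuity correction) for a 2 x 2 table."""
    (a, b), (c, d) = table
    n = a + b + c + d
    row_totals = (a + b, c + d)
    col_totals = (a + c, b + d)
    statistic = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            statistic += (observed - expected) ** 2 / expected
    return statistic


# Rows: affected, unaffected. Columns: allele M, allele m.
sample_b = ((1, 9), (99, 891))
pooled = ((51, 59), (549, 1341))
# Inferred, not given in the transcript: the other subsample is pooled minus B.
sample_a = tuple(tuple(p - b for p, b in zip(p_row, b_row))
                 for p_row, b_row in zip(pooled, sample_b))

chi_b = chi_square_2x2(sample_b)
chi_a = chi_square_2x2(sample_a)
chi_pooled = chi_square_2x2(pooled)
```

Neither subsample shows any association on its own, yet pooling them produces a large statistic because the groups differ in both allele frequency and disease rate.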
![Page 71: Introduction to Gene-Finding: Linkage and Association Danielle Dick, Sarah Medland, (Ben Neale)](https://reader035.fdocuments.us/reader035/viewer/2022062322/56649d095503460f949dbd06/html5/thumbnails/71.jpg)
Family-based association methods
TDT – Transmission Disequilibrium Test
1/2 3/3
2/3
• 50/50 chance the 2 is transmitted
• Looking for overtransmission of a particular allele across affected individuals (undertransmission to unaffecteds)
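Over-transmission is usually tested with a McNemar-type statistic on the counts of heterozygous parents who did and did not transmit the allele of interest to an affected child. The sketch below assumes those counts have already been tallied; the function name and the numbers are hypothetical.

```python
def tdt_statistic(transmitted, not_transmitted):
    """TDT chi-square (1 df): (b - c)^2 / (b + c).

    transmitted     : heterozygous parents who passed the allele on
    not_transmitted : heterozygous parents who passed on the other allele
    """
    informative = transmitted + not_transmitted
    if informative == 0:
        raise ValueError("no heterozygous (informative) parents")
    return (transmitted - not_transmitted) ** 2 / informative


# Hypothetical tallies from 100 heterozygous parents of affected children
no_distortion = tdt_statistic(50, 50)
over_transmission = tdt_statistic(60, 40)
```

Only heterozygous parents enter the calculation, since a homozygous parent (such as the 3/3 parent in the example trio) transmits the same allele regardless.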
![Page 72: Introduction to Gene-Finding: Linkage and Association Danielle Dick, Sarah Medland, (Ben Neale)](https://reader035.fdocuments.us/reader035/viewer/2022062322/56649d095503460f949dbd06/html5/thumbnails/72.jpg)
TDT Advantages/Disadvantages
Advantages
• Robust to stratification
• Genotyping error detectable via Mendelian inconsistencies
• Estimates of haplotypes possible

Disadvantages
• Detection/elimination of genotyping errors causes bias (Gordon et al., 2001)
• Uses only heterozygous parents
• Inefficient for genotyping
  • 3 individuals yield 2 founders: 1/3 information not used
• Can be difficult/impossible to collect
  • Late-onset disorders, psychiatric conditions, pharmacogenetic applications
![Page 73: Introduction to Gene-Finding: Linkage and Association Danielle Dick, Sarah Medland, (Ben Neale)](https://reader035.fdocuments.us/reader035/viewer/2022062322/56649d095503460f949dbd06/html5/thumbnails/73.jpg)
Association studies < 2000: TDT
• TDT virtually ubiquitous over past decade
  Grant, manuscript referees & editors mandated design
• View of case/control association studies greatly diminished due to perceived role of stratification

Association Studies 2000+: Return to population
• Case/controls, using extra genotyping
• +families, when available
![Page 74: Introduction to Gene-Finding: Linkage and Association Danielle Dick, Sarah Medland, (Ben Neale)](https://reader035.fdocuments.us/reader035/viewer/2022062322/56649d095503460f949dbd06/html5/thumbnails/74.jpg)
Detecting and Controlling for Population Stratification with Genetic Markers
Idea
• Take advantage of availability of large N genetic markers
• Use case/control design
• Genotype genetic markers across genome (Number depends on different factors)
• Check whether there is any evidence of background population substructure and account for it
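The slides do not name a specific method here. One widely used approach that fits this idea is genomic control: estimate an inflation factor from test statistics at markers assumed to be unrelated to the trait, then deflate the statistic at the marker of interest. The sketch below assumes 1-df chi-square association statistics; the function names and input values are hypothetical.

```python
import statistics

# Median of a chi-square distribution with 1 df.
CHI2_1DF_MEDIAN = 0.4549364


def inflation_factor(null_marker_chi2: list[float]) -> float:
    """Genomic-control lambda: observed median / expected median under the null."""
    return statistics.median(null_marker_chi2) / CHI2_1DF_MEDIAN


def corrected_statistic(candidate_chi2: float, lam: float) -> float:
    """Deflate a candidate statistic; lambda below 1 is treated as no inflation."""
    return candidate_chi2 / max(lam, 1.0)


# Hypothetical chi-square values from unlinked markers typed across the genome.
null_stats = [0.2, 0.5, 0.9, 1.4, 3.1]
lam = inflation_factor(null_stats)
print(lam, corrected_statistic(10.0, lam))
```

A lambda near 1 suggests little background substructure. Values well above 1 suggest that case/control allele-frequency differences are inflated across the genome.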
![Page 75: Introduction to Gene-Finding: Linkage and Association Danielle Dick, Sarah Medland, (Ben Neale)](https://reader035.fdocuments.us/reader035/viewer/2022062322/56649d095503460f949dbd06/html5/thumbnails/75.jpg)
Two types of association studies
Case-control
• Adv: more powerful
• Disadv: population stratification; limited by case/control definition
Family-based
• Adv: population stratification not a problem
• Disadv: less powerful; hard to collect parents for some phenotypes
![Page 76: Introduction to Gene-Finding: Linkage and Association Danielle Dick, Sarah Medland, (Ben Neale)](https://reader035.fdocuments.us/reader035/viewer/2022062322/56649d095503460f949dbd06/html5/thumbnails/76.jpg)
Association Analyses vs Linkage
Advantage
• More powerful
Disadvantage
• Not systematic (in the past)
• Now! Genome-wide association scans
![Page 77: Introduction to Gene-Finding: Linkage and Association Danielle Dick, Sarah Medland, (Ben Neale)](https://reader035.fdocuments.us/reader035/viewer/2022062322/56649d095503460f949dbd06/html5/thumbnails/77.jpg)
Current Association Study Challenges
1) Genome-wide screen or candidate gene
Genome-wide screen
• Hypothesis-free
• High-cost: large genotyping requirements
• Multiple-testing issues
• Possibly many false positives, fewer misses
Candidate gene
• Hypothesis-driven
• Low-cost: small genotyping requirements
• Multiple-testing less important
• Possibly many misses, fewer false positives
![Page 78: Introduction to Gene-Finding: Linkage and Association Danielle Dick, Sarah Medland, (Ben Neale)](https://reader035.fdocuments.us/reader035/viewer/2022062322/56649d095503460f949dbd06/html5/thumbnails/78.jpg)
Current Association Study Challenges
2) What constitutes a replication?
GOLD Standard for association studies
Replicating association results in different laboratories is often seen as most compelling piece of evidence for ‘true’ finding
But… in any sample, we measure
• Multiple traits
• Multiple genes
• Multiple markers in genes
and we analyse all this using multiple statistical tests
What is a true replication?
![Page 79: Introduction to Gene-Finding: Linkage and Association Danielle Dick, Sarah Medland, (Ben Neale)](https://reader035.fdocuments.us/reader035/viewer/2022062322/56649d095503460f949dbd06/html5/thumbnails/79.jpg)
What is a true replication?
| Replication Outcome | Explanation |
|---|---|
| Association to same trait, but different gene | Genetic heterogeneity |
| Association to same trait, same gene, different SNPs (or haplotypes) | Allelic heterogeneity |
| Association to same trait, same gene, same SNP, but in opposite direction (protective vs. disease) | Allelic heterogeneity/pop differences |
| Association to different, but correlated phenotype(s) | Phenotypic heterogeneity |
| No association at all | Sample size too small |
![Page 80: Introduction to Gene-Finding: Linkage and Association Danielle Dick, Sarah Medland, (Ben Neale)](https://reader035.fdocuments.us/reader035/viewer/2022062322/56649d095503460f949dbd06/html5/thumbnails/80.jpg)
Current Association Study Challenges
3) Do we have the best set of genetic markers
There exist 6+ million putative SNPs in the public domain. Are they the right markers?
[Figure: two panels, "Frequency of public markers" and "Expected frequency in population". x-axis: minor allele frequency (1-10%, 11-20%, 21-30%, 31-40%, 41-50%); y-axis: population frequency (0 to 0.6)]
Allele frequency distribution is biased toward common alleles
![Page 81: Introduction to Gene-Finding: Linkage and Association Danielle Dick, Sarah Medland, (Ben Neale)](https://reader035.fdocuments.us/reader035/viewer/2022062322/56649d095503460f949dbd06/html5/thumbnails/81.jpg)
Current Association Study Challenges
3) Do we have the best set of genetic markers
Tabor et al, Nat Rev Genet 2003
![Page 82: Introduction to Gene-Finding: Linkage and Association Danielle Dick, Sarah Medland, (Ben Neale)](https://reader035.fdocuments.us/reader035/viewer/2022062322/56649d095503460f949dbd06/html5/thumbnails/82.jpg)
Greatest power comes from markers that match allele freq with trait loci
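Disease Allele Frequency / Marker Allele Frequency

| | 0.1 | 0.3 | 0.5 | 0.7 | 0.9 |
|---|---|---|---|---|---|
| 0.1 | 248 | 626 | 1306 | 2893 | 10830 |
| 0.3 | 1018 | 238 | 466 | 996 | 3651 |
| 0.5 | 2874 | 702 | 267 | 556 | 2002 |
| 0.7 | 9169 | 2299 | 925 | 337 | 1187 |
| 0.9 | 73783 | 18908 | 7933 | 3229 | 616 |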
s = 1.5, α = 5 × 10⁻⁸, Spielman TDT (Müller-Myhsok and Abel, 1997)
![Page 83: Introduction to Gene-Finding: Linkage and Association Danielle Dick, Sarah Medland, (Ben Neale)](https://reader035.fdocuments.us/reader035/viewer/2022062322/56649d095503460f949dbd06/html5/thumbnails/83.jpg)
Current Association Study Challenges
4) Integrating the sampling, LD and genetic effects
Questions that don’t stand alone:
How much LD is needed to detect complex disease genes?
What effect size is big enough to be detected?
How common (rare) must a disease variant(s) be to be identifiable?
What marker allele frequency threshold should be used to find complex disease genes?
![Page 84: Introduction to Gene-Finding: Linkage and Association Danielle Dick, Sarah Medland, (Ben Neale)](https://reader035.fdocuments.us/reader035/viewer/2022062322/56649d095503460f949dbd06/html5/thumbnails/84.jpg)
Complexity of System
• In any indirect association study, we measure marker alleles that are correlated with trait variants…
We do not measure the trait variants themselves
• But, for study design and power, we concern ourselves with frequencies and effect sizes at the trait locus….
This can only lead to underpowered studies and inflated expectations
• We should concern ourselves with the apparent effect size at the marker, which results from
1) difference in frequency of marker and trait alleles
2) LD between the marker and trait loci
3) effect size of trait allele
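A rough sketch of how the first two factors interact, using the standard definition of the LD measure r² = D² / (pA(1 − pA) pB(1 − pB)). Even when two loci are in the strongest possible positive LD, r² is capped by the mismatch in allele frequencies. A common rule of thumb is that the sample size needed at the marker is inflated by about 1/r² relative to typing the trait variant directly. The function names and frequencies below are illustrative assumptions, not values from the slides.

```python
def max_r2(trait_freq: float, marker_freq: float) -> float:
    """Largest r^2 attainable between two biallelic loci with these allele
    frequencies, assuming the two alleles are positively associated."""
    p_a, p_b = trait_freq, marker_freq
    # Largest positive disequilibrium D allowed by the allele frequencies.
    d_max = min(p_a * (1 - p_b), (1 - p_a) * p_b)
    return d_max ** 2 / (p_a * (1 - p_a) * p_b * (1 - p_b))


def sample_size_inflation(r2: float) -> float:
    """Approximate factor by which N must grow when testing a marker in LD
    (r^2) with the trait variant instead of the trait variant itself."""
    return 1.0 / r2


# Matched frequencies allow complete LD; mismatched ones do not.
for trait, marker in [(0.3, 0.3), (0.1, 0.5), (0.1, 0.9)]:
    r2 = max_r2(trait, marker)
    print(trait, marker, round(r2, 4), round(sample_size_inflation(r2), 1))
```

With a trait allele at 0.1 and a marker allele at 0.5, r² cannot exceed 1/9, so the marker-based study needs roughly nine times the sample even in the best case.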
![Page 85: Introduction to Gene-Finding: Linkage and Association Danielle Dick, Sarah Medland, (Ben Neale)](https://reader035.fdocuments.us/reader035/viewer/2022062322/56649d095503460f949dbd06/html5/thumbnails/85.jpg)
Practical Implications of Allele Frequencies
Strongest argument for using common markers is not CD-CV. It is practical:
• For small effects, common markers are the only ones for which sufficient sample sizes can be collected
There are situations where indirect association analysis will not work
• Discrepant marker/disease freqs, low LD, heterogeneity, …
• Linkage approach may be only genetics approach in these cases
At present, no way to know when association will/will not work
• Balance with linkage
![Page 86: Introduction to Gene-Finding: Linkage and Association Danielle Dick, Sarah Medland, (Ben Neale)](https://reader035.fdocuments.us/reader035/viewer/2022062322/56649d095503460f949dbd06/html5/thumbnails/86.jpg)
Current Association Study Challenges
5) How to analyse the data
Allele-based test? 2 alleles, 1 df
E(Y) = a + bX, X = 0/1 for absence/presence of the allele
Genotype-based test? 3 genotypes, 2 df
E(Y) = a + b1A + b2D, A = 0/1 additive (hom); D = 0/1 dom (het)
Haplotype-based test? For M markers, 2^M possible haplotypes, 2^M − 1 df
E(Y) = a + bH, H coded for haplotype effects
Multilocus test? Epistasis, G x E interactions, many possibilities
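The codings above can be made concrete with a small sketch. It assumes one possible reading of the genotype model: the homozygote for the reference allele is the baseline, A flags the other homozygote and D flags the heterozygote. The helper names are hypothetical.

```python
def genotype_design(test_allele_count: int) -> tuple[int, int]:
    """Return (A, D) indicators for a genotype given as 0, 1 or 2 copies of
    the test allele. Baseline (0, 0) is the homozygote with 0 copies."""
    if test_allele_count not in (0, 1, 2):
        raise ValueError("genotype must be 0, 1 or 2 copies of the allele")
    a = 1 if test_allele_count == 2 else 0  # homozygous for the test allele
    d = 1 if test_allele_count == 1 else 0  # heterozygous
    return a, d


def haplotype_test_df(n_markers: int) -> int:
    """Degrees of freedom when every one of the 2^M haplotypes gets its own
    effect, with one haplotype serving as the reference."""
    return 2 ** n_markers - 1


print([genotype_design(g) for g in (0, 1, 2)])
print([haplotype_test_df(m) for m in (1, 2, 5, 10)])
```

The haplotype degrees of freedom grow quickly with M, which is one reason haplotype tests on many markers are costly in power.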
![Page 87: Introduction to Gene-Finding: Linkage and Association Danielle Dick, Sarah Medland, (Ben Neale)](https://reader035.fdocuments.us/reader035/viewer/2022062322/56649d095503460f949dbd06/html5/thumbnails/87.jpg)
Current Association Study Challenges
6) Multiple Testing
• Candidate genes: a few tests (probably correlated)
• Linkage regions: 100s – 1000s of tests (some correlated)
• Whole genome association: 100,000s – 1,000,000s of tests (many correlated)
What to do?
• Bonferroni (conservative)
• False discovery rate?
• Permutations?
…. Area of active research
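Two of the options named above can be sketched briefly: the Bonferroni per-test threshold and the Benjamini-Hochberg step-up procedure for false discovery rate control. Both treat tests as independent or mildly dependent, which correlated markers may violate, so treat this as an illustration only. The p-values are made up.

```python
def bonferroni_threshold(alpha: float, n_tests: int) -> float:
    """Per-test significance level that keeps the family-wise error at alpha."""
    return alpha / n_tests


def benjamini_hochberg(p_values: list[float], q: float = 0.05) -> list[bool]:
    """Benjamini-Hochberg step-up: reject the k smallest p-values, where k is
    the largest rank with p_(k) <= q * k / m."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    cutoff_rank = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= q * rank / m:
            cutoff_rank = rank
    rejected = [False] * m
    for idx in order[:cutoff_rank]:
        rejected[idx] = True
    return rejected


# Hypothetical p-values from ten tests.
p = [0.001, 0.008, 0.039, 0.041, 0.042, 0.06, 0.074, 0.205, 0.212, 0.216]
print(bonferroni_threshold(0.05, len(p)))
print(benjamini_hochberg(p, q=0.05))
```

For one million tests at a family-wise level of 0.05, the Bonferroni threshold works out to 5 × 10⁻⁸ per test.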
![Page 88: Introduction to Gene-Finding: Linkage and Association Danielle Dick, Sarah Medland, (Ben Neale)](https://reader035.fdocuments.us/reader035/viewer/2022062322/56649d095503460f949dbd06/html5/thumbnails/88.jpg)
Despite challenges: upcoming association studies hold some promise
• Availability of millions of genetic markers
• Genotyping costs decreasing rapidly
  Cost per SNP: 2001 ($0.25), 2003 ($0.10), 2004 ($0.01)
• Background LD patterns being characterized
  International HapMap and other projects
![Page 89: Introduction to Gene-Finding: Linkage and Association Danielle Dick, Sarah Medland, (Ben Neale)](https://reader035.fdocuments.us/reader035/viewer/2022062322/56649d095503460f949dbd06/html5/thumbnails/89.jpg)
Genome Wide Association Studies (GWAS) Underway:
Genetic Association Information Network (GAIN)
• Psoriasis, ADHD, Schizophrenia, Bipolar Disorder, Depression, Type 1 Diabetes
Wellcome Trust Case Control Consortium
• Bipolar Disorder, Coronary Artery Disease, Crohn’s Disease, Rheumatoid Arthritis, Type 1 Diabetes, Type 2 Diabetes
Genes, Environment, & Health Initiative (Gene/Environment Association Studies: GENEVA)
• Addiction, Diabetes, Heart Disease, Oral Clefts, Maternal Metabolism and Birth Weight, Lung Cancer, Pre-Term Birth, Dental Caries
Genes, Environment, & Development Initiative (GEDI)