Some current issues in QTL identification
Lon Cardon
Wellcome Trust Centre for Human Genetics
University of Oxford
Acknowledgements: Goncalo Abecasis, Stacey Cherny, Twin course faculty
Positional Cloning
[Figure: LOD curve. Sib pairs, Chromosome Region, Association Study]
Genetics
Genomics
Physical Mapping/Sequencing
Candidate Gene Selection/Polymorphism Detection
Mutation Characterization/Functional Annotation
Inflammatory Bowel Disease Genome Screen
Hampe et al., Am J Hum Genet, 64:808-816, 1999
Susceptibility locus mapped for Crohn’s Disease
Genome Screens for Linkage in Sib-pairs
1997/98
- Diabetes (IDDM + NIDDM)
- Asthma/atopy
- Osteoporosis
- Obesity
- Multiple Sclerosis
- Rheumatoid arthritis
- Systemic lupus erythematosus
- Ankylosing spondylitis
- Epilepsy
- Inflammatory Bowel Disease
- Celiac Disease
- Psychiatric Disorders (incl. Scz, bipolar)
- Behavioral traits (incl. Personality, panic)
- others missed...
1999
- NIDDM
- Asthma/atopy
- Psoriasis
- Inflammatory Bowel Disease
- Osteoporosis/Bone Mineral Density
- Obesity
- Epilepsy
- Thyroid disease
- Pre-eclampsia
- Blood pressure
- Psychiatric disorders (incl. Scz, bipolar)
- Behavioral traits (incl. smoking, alcoholism, autism)
- Familial combined hyperlipidemia
- Tourette syndrome
- Systemic lupus erythematosus
- others missed…
Human QTL Linkage Gene Identification Successes
0
Well, at least < 5
Why so few successes in human QTL mapping?
Many valid reasons proposed:
• Phenotypic complexity (not measured well)
• Genetic complexity (many genes of small effect, GxE, epistasis)
• Genotype error
• Sampling design
• Statistical methods
• ….
Most linkage studies have been under-powered (and over-hyped)
QTL mapping has very low power!
1000 sibs, no parents: markers every 10 cM, each marker H=0.8
QTL h2=0.33
Kruglyak L, Lander ES. (1995). Am J Hum Genet 57: 439-454
Increasing power to detect linkage in sib-pairs
• Phenotypic selection
– Carey & Williamson, 1991, AJHG
– Eaves & Meyer, 1994, Behav Genet
– Cardon & Fulker, 1994, AJHG
– Risch & Zhang, 1996, AJHG
Equivalent full sample N for 200 selected pairs from 10,000 (QTL allele freq = .2)
            Concordant   Discordant   Combined
Additive    1400         3300         5000
Recessive   6000         3100         9500
Dominant    1400         3100         4400
Information Score for Additive Gene Action (p=0.5)
[Figure: information score (axis 100-350) by decile ranking (1-10) of Sib 1 and Sib 2]
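The selection schemes tabulated above pick the most informative pairs out of a large screened cohort. A minimal sketch of that step, assuming deciles are computed over all screened individuals pooled and that "extreme" means the bottom and top decile. Both choices are assumptions of this example, and `select_extreme_pairs` is a hypothetical helper, not a function from the cited papers:

```python
from bisect import bisect_left


def select_extreme_pairs(pairs, low=1, high=10):
    """Split sib pairs into extreme-concordant and extreme-discordant sets.

    pairs: list of (trait_sib1, trait_sib2).
    low, high: decile cutoffs (assumed defaults: bottom and top decile).
    """
    # Pool every individual's trait value to define the decile scale.
    values = sorted(v for pair in pairs for v in pair)
    n = len(values)
    if n < 10:
        raise ValueError("need at least 10 individuals to form deciles")

    def decile(x):
        # Rank of x among all individuals, mapped to 1..10.
        return min(9, bisect_left(values, x) * 10 // n) + 1

    concordant, discordant = [], []
    for s1, s2 in pairs:
        lo_d, hi_d = sorted((decile(s1), decile(s2)))
        if hi_d <= low or lo_d >= high:
            concordant.append((s1, s2))   # both sibs in the same tail
        elif lo_d <= low and hi_d >= high:
            discordant.append((s1, s2))   # sibs in opposite tails
    return concordant, discordant
```

Pairs falling in neither list are the ones a selected-sample design would leave ungenotyped.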
Linkage Analysis of QTLs - Summary -
• Spotted history. Few, if any, bona fide successes
• Power has been a large problem
• Of the few replicated loci, most have used some form of selection
• EDAC, other selection schemes from large cohorts now underway
• Genome-scans coming soon
Promising beginning for QTL linkage mapping
Positional Cloning
[Figure: LOD curve. Sib pairs, Chromosome Region, Association Study]
Genetics
Genomics
Physical Mapping/Sequencing
Candidate Gene Selection/Polymorphism Detection
Mutation Characterization/Functional Annotation
Association Analysis
• Simple genetic basis
Short unit of resemblance
Population-specific
• One of the easiest genetic study designs
Correlate allele frequencies with traits/diseases
At core of monogenic & oligo/polygenic trait models
• Widely used in past 20 years
HLA, candidate genes, pharmacogenetics, positional cloning
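"Correlate allele frequencies with traits/diseases" can be illustrated with the simplest case-control test. This is a sketch of a 1-df Pearson chi-square on a 2x2 table of allele counts, assuming a biallelic marker and unrelated cases and controls; the function name and the counts in the usage note are invented for illustration:

```python
import math


def allelic_chi_square(case_counts, control_counts):
    """Pearson chi-square (1 df) comparing allele counts in cases vs controls.

    case_counts, control_counts: (count of allele A, count of allele a).
    Returns (chi2, p_value).
    """
    a, b = case_counts
    c, d = control_counts
    n = a + b + c + d
    margins = (a + b) * (c + d) * (a + c) * (b + d)
    if margins == 0:
        raise ValueError("every row and column of the table must be non-empty")
    chi2 = n * (a * d - b * c) ** 2 / margins
    # Upper tail of a chi-square with 1 df, via the complementary error function.
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    return chi2, p_value
```

With hypothetical counts of (60, 40) in cases and (40, 60) in controls, `allelic_chi_square((60, 40), (40, 60))` gives a chi-square of 8.0. The test ignores population stratification, which is why family-based designs are covered separately in the course.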
Angiotensin-1 Converting Enzyme
Keavney et al. (1998) Hum Mol Genet, 7:1745-1751
Evidence for Linkage
[Figure: LOD (axis 0-10) at ACE polymorphisms T-5991C, A-5466C, T-3892C, A-240T, T-93C, T1237C, G2215A, I/D, G2350A, 4656(CT)3/2]
Results of ACE analysis using VC association model
[Figure: LOD (axis 0-15) for linkage and for association at the same ACE polymorphisms]
Alzheimer's and ApoE4
Roses, Nature 2000
Association Resolution by Position
Roses, Nature 2000
Decay of Linkage Disequilibrium in a Small Set of Genes
Toward a linkage disequilibrium map of the human genome
History of LD studies in humans:
• > 10 years ago, emphasis mainly on theory - LD measures, decay, population comparisons, …
• 1989: 1st use of LD for disease mapping: Cystic Fibrosis
• In recent years, gene-based haplotypes used widely for monogenic mapping
• Last 2 years: larger scale assessment of common alleles in reference populations
LD/haplotype map objective: find regions of high and low ancestral conservation to clarify signal/noise in allelic association studies
Haplotype Map: Data/Interpretations
Distribution of pairwise LD ‘average extent of LD’
LD differences in genes
Eaves et al, Nat Genet 2000
Taillon-Miller et al, Nat Genet 2000
Stephens et al, Science 2001
Reich et al, Nature 2001
Johnson et al, Nat Genet 2001
Abecasis et al, AJHG 2001
Haplotype Map: Data/Interpretations
Local patterns of LD … Conserved haplotype segments ... ‘Blocks’
5q31. Daly et al, Nat Genet 2001
MHC class II. Jeffreys et al, Nat Genet 2001
Chr21. Patil et al, Science 2001
Current Status: Data/Interpretations
• How to define ‘useful’ LD is still unclear
• Easier to focus on pairwise LD rather than haplotypes. Is this efficient?
• For common alleles, D’ measure, LD extends ~ 50-60 kb on average
For rare alleles, ?
• There is great variability in regional patterns of LD
Explanations, predictors yet unknown
• Haplotype blocks are detectable and present broadly
• Size of blocks? How best to define them? Utility of htSNPs?
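The answers to these questions depend on which pairwise measure is used. A minimal sketch of the standard definitions of D, |D'| and r2 for two biallelic loci, assuming haplotype and allele frequencies are already known (in practice they are estimated from genotypes; `pairwise_ld` is a name chosen for this example):

```python
def pairwise_ld(p_ab, p_a, p_b):
    """Return (D, |D'|, r2) for two biallelic loci.

    p_ab: frequency of the haplotype carrying allele A (locus 1) and B (locus 2).
    p_a, p_b: frequencies of allele A and allele B.
    """
    d = p_ab - p_a * p_b
    # D' scales D by its largest possible magnitude given the allele frequencies.
    if d >= 0:
        d_max = min(p_a * (1 - p_b), (1 - p_a) * p_b)
    else:
        d_max = min(p_a * p_b, (1 - p_a) * (1 - p_b))
    d_prime = abs(d) / d_max if d_max > 0 else 0.0
    # r2 is the squared correlation between the two alleles.
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    r2 = d * d / denom if denom > 0 else 0.0
    return d, d_prime, r2
```

For a rare allele that occurs only on one background, for example `pairwise_ld(0.1, 0.5, 0.1)`, |D'| is at its maximum while r2 stays low, which is one reason the "extent of LD" differs between the two measures.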
Human Genome Haplotype Map
1. NIH/TSC/Wellcome Trust funded international collaboration (likely)
- follow-on from human sequencing project & SNP consortium
2. Hierarchical strategy
- ‘sparse-map’ then more fine
- Initially use available SNPs
3. Multiple populations
- some family-based, most likely to be unrelateds
4. Aim is to catalog regions of high LD down to very fine-scale (i.e., find big and small blocks)
Human Chromosome 22
• First human chromosome to be “fully” sequenced
• Extensive knowledge of genomic landscape
• Abundance of SNPs and other variants/bp
~34.5 Mb on q-arm; p-arm mostly structural RNA; 679 genes on q
Dunham et al, Nature, 1999
Samples
• 7 x 3 generation CEPH families
– 77 Individuals
– 59 founder chromosomes
– 1505 SNPs successfully genotyped
• 90 Unrelated Caucasian Individuals
– 1286 SNPs genotyped (1261 overlapping with CEPHs)
• 51 Unrelated Estonian Individuals
– 908 SNPs genotyped (594 overlapping with CEPHs)
Marker spacing
N = 1505 markers. Median spacing = 15.07 kb. 4 gaps > 200 kb. Smallest = 12 bp; largest = 293 kb.
[Figure: histogram of marker count (0-600) by spacing bin, from < 5 kb to > 150 kb, N=1505]
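Spacing statistics of this kind follow directly from the ordered marker positions. A small sketch, assuming positions are given in base pairs on a single chromosome; the function name and the 200 kb default are choices made for this example:

```python
import statistics


def spacing_summary(positions_bp, gap_threshold_bp=200_000):
    """Summarise distances between adjacent markers on one chromosome."""
    ordered = sorted(positions_bp)
    gaps = [b - a for a, b in zip(ordered, ordered[1:])]
    if not gaps:
        raise ValueError("need at least two markers")
    return {
        "n_markers": len(ordered),
        "median_bp": statistics.median(gaps),
        "smallest_bp": min(gaps),
        "largest_bp": max(gaps),
        # Number of intervals longer than the chosen gap threshold.
        "n_large_gaps": sum(g > gap_threshold_bp for g in gaps),
    }
```

Large gaps matter later in the analysis because pairwise LD cannot be assessed at short range where no marker pairs exist.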
Allele frequencies on Chromosome 22: CEPH founders
[Figure: histogram of frequency (0-0.25) by allele-frequency category: < 0.10, .11-.20, .21-.30, .31-.40, < 0.50]
Variability in Pairwise LD
[Figure: two panels, D’ and r2 (0.00-1.00) against physical distance (0-1000 kb)]
Decay of LD on chromosome 22
Means in CEPHs, Unrelateds, Combined & Estonian Samples
Representing LD along a chromosome
Following several trends in genetics, genotyping technology outpaced ability to analyze LD information…
How to characterize regions of ‘interesting’ linkage disequilibrium?
1. Simply examine average levels across region/chromosome?
2. Fit models to data, look at expectations & specific predictions
3. Consider ‘interesting’ LD tracts as long runs of LD – borrow from extant statistical approaches
4. Look for ‘blocks’ of LD in the genome
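Option 2 in the list above (fitting a model) can be sketched with a one-parameter decay curve. The example assumes LD falls off exponentially with physical distance, D’(d) = 2^(-d/h), and estimates the half-life h by least squares on the log scale. The exponential form and the function name are assumptions of this sketch; the slides do not state which model was fitted.

```python
import math


def fit_half_life(distances_kb, ld_values):
    """Estimate the distance (kb) at which LD halves, assuming LD = 2**(-d / h).

    Fits log(LD) = slope * d through the origin and converts the slope to h.
    Pairs with non-positive distance or LD are skipped (their log is undefined).
    """
    sxy = 0.0
    sxx = 0.0
    for d, v in zip(distances_kb, ld_values):
        if d <= 0 or v <= 0:
            continue
        sxy += d * math.log(v)
        sxx += d * d
    if sxx == 0.0 or sxy >= 0.0:
        raise ValueError("data show no decay of LD with distance")
    slope = sxy / sxx
    return -math.log(2) / slope
```

Applying the fit in sliding windows along the chromosome would give a local half-life of the kind plotted against position, under the same exponential assumption.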
LD Along Chromosome 22
[Figure: two panels against position (0-30 Mb). Average D’ (0.00-1.00) and predicted D’ half-life (0-600 kb)]
Disequilibrium Fingerprint
Chromosome 22 Haplotype Blocks
Plus 3 individual blocks:
Position     SNPs   Haplos   Length
4.6-4.8 M    11     6        231 kb
8.2-8.4 M    8      4        264 kb
34.3 M       11     3        82 kb
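One simple way to look for candidate blocks is to scan for runs of adjacent markers in strong LD. The sketch below uses a deliberately simplified rule (every adjacent pair in the run has D’ at or above a threshold); it is not the block definition used for the results above, which the slides do not specify, and the threshold, minimum SNP count and function name are assumptions:

```python
def find_ld_runs(positions_kb, adjacent_dprime, threshold=0.8, min_snps=3):
    """Find maximal runs of consecutive markers whose adjacent D' >= threshold.

    positions_kb: ordered marker positions.
    adjacent_dprime[i]: D' between marker i and marker i + 1.
    Returns a list of dicts with start/end position, SNP count and length.
    """
    if len(adjacent_dprime) != len(positions_kb) - 1:
        raise ValueError("need exactly one D' value per adjacent marker pair")
    runs = []
    start = 0
    for i, dp in enumerate(adjacent_dprime):
        if dp < threshold:
            # The run covering markers start..i ends here.
            if i - start + 1 >= min_snps:
                runs.append((start, i))
            start = i + 1
    last = len(positions_kb) - 1
    if last - start + 1 >= min_snps:
        runs.append((start, last))
    return [
        {
            "start_kb": positions_kb[s],
            "end_kb": positions_kb[e],
            "n_snps": e - s + 1,
            "length_kb": positions_kb[e] - positions_kb[s],
        }
        for s, e in runs
    ]
```

Changing `threshold` or `min_snps` changes how many blocks are reported and how long they are, which is the practical side of the question "How best to define them?".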
Chr22 High LD: 22-27 Mb
Chr22 Low LD: 27-32 Mb
Recombination Pattern on Chromosome 22
[Figure: microsatellite genetic distance (cM, 0-60) against sequence position (Mb, 0-35), with a 1 Mb/cM reference line]
Recombination and Gene Density on Chromosome 22
[Figure: same axes as above, with gene density overlaid]
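The slope of genetic against physical position gives the local recombination rate. A minimal sketch, assuming markers that have both a sequence position (Mb) and a genetic map position (cM) and are listed in physical order; the function name is hypothetical:

```python
def local_recombination_rates(physical_mb, genetic_cm):
    """Return the cM/Mb rate for each interval between consecutive markers."""
    if len(physical_mb) != len(genetic_cm):
        raise ValueError("need one genetic position per physical position")
    rates = []
    for i in range(len(physical_mb) - 1):
        d_mb = physical_mb[i + 1] - physical_mb[i]
        d_cm = genetic_cm[i + 1] - genetic_cm[i]
        if d_mb <= 0:
            raise ValueError("physical positions must be strictly increasing")
        rates.append(d_cm / d_mb)
    return rates
```

Intervals with rates well above 1 cM/Mb would be expected to show faster LD decay, and intervals well below it slower decay, which is the comparison the figures draw.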
Linkage Disequilibrium Map of Chromosome 22 - Summary -
• LD ‘half-length’ ~ 50 kb, but depends on measure & what is “useful” LD
• Family & unrelated samples yield consistent patterns
• Different analytical tools provide complementary views of long blocks
• 15% of chromosome 22 in long LD blocks in these samples (40% in shorter blocks)
Why? Selection, selective sweeps? Chromosome structure? Population age?
• LD correlated with gene-density, GC content and related repeats.
Gene/GC correlations almost entirely collinear with genetic distance.
LD patterns can immediately assist positional association studies:
Prioritise candidate regions.
Use extant genetic maps and simple repeat structures in design & power.
Mapping QTLs in families: Summary
• Linkage and association studies follow directly from fundamental biometrical principles.
• Linkage studies of complex traits can work: All principles of this course apply
- power, study design, careful phenotype selection/modelling, comparison of statistical models
• New information about LD patterns should facilitate association studies
- help form a priori hypotheses and guide replication.
16th Annual Course on Methodology for Twins and Families
Advanced workshop: Boulder, Colorado, March 2003
Monday, 5 March 2001
Eaves 9:00-10:30 Introduction: Cause of human variation
Amos & Heath 11:00-12:00 Basic Statistics: Likelihood models
Lessem 12:00-12:30 Introduction: Computer System P
Eaves & Sham 13:30-15:00 Genetic Theory
Neale, Martin & Boomsma 15:30-17:00 MX practical P
Tuesday, 6 March 2001
Sham 9:00-10:30 Linkage: Basic Principles
Abecasis, Cherny & Cardon 11:00-12:30 IBD estimation: Theory and Practice P
Martin & Maes 13:30-15:00 QTL Linkage Analysis in Sibships P
Eaves 15:30-17:00 Introduction to Bayesian Methods P
Wednesday, 7 March 2001
Neale & Heath 9:00-10:30 Linkage on Selected Samples P
Purcell & Sham 11:00-12:30 Power Calculation in Linkage Analysis P
Boomsma & van Baal 13:30-15:00 Multivariate Applications P
Purcell & Sham 15:30-17:00 Epistasis/Multi-locus modelling P
Thursday, 8 March 2001
Rice & Heath 9:00-10:30 Association Study Principles P
Cherny & Abecasis 11:00-12:30 Family Based Association Studies P
van den Oord 13:30-15:00 Population Stratification and General Association P
Sham & Abecasis 15:30-17:00 Power for Association Analysis P
Friday, 9 March 2001
Cardon & Sham 9:00-10:30 Bioinformatics and Genome Patterns of Disequilibrium
Rice 11:00-12:30 Multiple Testing: Power and Type I Error
Flint 13:30-15:00 Animal models of complex traits
Cherny, Purcell & Abecasis 15:30-17:00 General computational issues P
http://ibgwww.colorado.edu/twins2001/schedule.html