Haplotypessssykim/teaching/f14/slides/haplotypes.pdf · JPT Japanese in Tokyo, Japan 91 82 LWK...
Haplotypes
02-223 Personalized Medicine:
Understanding Your Own Genome Fall 2014
Terminology Review
• Allele: a different form of genetic variation at a given gene or genetic locus
  – Locus 1 has two alleles, A and C, and Locus 2 has two alleles, T and G
• Genotype: the specific allelic make-up of an individual's genome
  – Individual 1 has genotype AA at Locus 1 and genotype TG at Locus 2
• Heterozygous/Homozygous
  – Locus 1 of Individual 1 is homozygous, and Locus 2 is heterozygous
[Figure: Individual 1 carries A/A at Locus 1 and T/G at Locus 2; Individual 2 carries A/C at Locus 1 and T/T at Locus 2]
Single Nucleotide Polymorphism (SNP)
[Figure: a diploid individual carries two copies of each chromosome, Cp and Cm]
• SNP: a "binary" nucleotide substitution at a single locus on a chromosome
• Each variant is called an "allele"
GATCTTCGTACTGAGT
GATCTTCGTACTGAGT
GATTTTCGTACGGAAT
GATTTTCGTACTGAGT
GATCTTCGTACTGAAT
GATTTTCGTACGGAAT
GATTTTCGTACGGAAT
GATCTTCGTACTGAAT
From SNPs to Haplotypes
[Figure: a diploid individual with chromosomes Cp and Cm]
• Haplotype: a stretch of consecutive nucleotides that lie on the same chromosome
• What are the alleles here? (the same eight sequences as on the previous slide)
Haplotypes from SNP Array?
[Figure: a heterozygous diploid individual whose two chromosomes, Cp and Cm, carry CTA and TGA at three SNPs; label: ATGC sequencing]
• Genotype g: pairs of alleles, with the association of alleles to chromosomes unknown: TC, TG, AA
• Haplotype h ≡ (h1, h2): a possible association of alleles to chromosomes, e.g., (CTA, TGA) or (CGA, TTA)
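The ambiguity in the genotype g can be made concrete by enumerating every haplotype pair that explains it. The sketch below uses a hypothetical helper, `compatible_pairs` (not from any library), and assumes a genotype is written as one two-letter allele string per SNP. It tries all 2^J assignments of alleles to chromosomes, so it is only practical for a handful of SNPs.

```python
from itertools import product
from typing import List, Set, Tuple


def compatible_pairs(genotype: List[str]) -> Set[Tuple[str, ...]]:
    """Return every unordered haplotype pair (h1, h2) consistent with an
    unphased genotype such as ["TC", "TG", "AA"] (hypothetical helper)."""
    pairs = set()
    # Each choice bit says which of the two alleles at a SNP goes to h1.
    for choice in product((0, 1), repeat=len(genotype)):
        h1 = "".join(alleles[c] for alleles, c in zip(genotype, choice))
        h2 = "".join(alleles[1 - c] for alleles, c in zip(genotype, choice))
        # Sort so that (h1, h2) and (h2, h1) count as the same pair.
        pairs.add(tuple(sorted((h1, h2))))
    return pairs


if __name__ == "__main__":
    for pair in sorted(compatible_pairs(["TC", "TG", "AA"])):
        print(pair)
```

For the genotype TC, TG, AA this yields the two candidate pairs (CGA, TTA) and (CTA, TGA); in general, an individual heterozygous at k SNPs has 2^(k-1) candidate pairs.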
Why Haplotypes?
• Haplotypes have greater power for discriminating genomic regions
  – Consider J binary markers (e.g., SNPs) in a genomic region: there are 2^J possible haplotypes
  – SNPs have only two alleles, whereas haplotypes have a larger number of alleles
  – Good genetic markers for population, evolution, and hereditary diseases
Haplotypes and SNPs
GATCTTCGTACTGAGT
GATCTTCGTACTGAGT
GATTTTCGTACGGAAT
GATTTTCGTACTGAGT
GATCTTCGTACTGAAT
GATTTTCGTACGGAAT
GATTTTCGTACGGAAT
GATCTTCGTACTGAAT
Haplotypes (alleles at the three SNPs on each chromosome) and their frequencies: CTG 3/8, TGA 3/8, CTA 2/8
• SNPs can distinguish between two groups of individuals (a group with C, another group with T)
• Haplotypes can distinguish between three groups of individuals (groups with CTG, TGA, and CTA)
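A minimal sketch of how the haplotype frequencies are obtained: find the columns where the aligned sequences differ (the SNPs), read each chromosome's alleles at those columns, and tally the resulting strings. It assumes equal-length, already aligned sequences; the function names are illustrative. Run on the eight sequences exactly as transcribed here, it finds SNPs at positions 4, 12 and 15 (1-based) and counts CTG 2, TGA 3, CTA 2 and TTG 1: the fourth transcribed sequence carries T at the first SNP, so either the transcription or the slide's 3/8 for CTG contains a typo.

```python
from collections import Counter
from typing import List

# The eight chromosome sequences as transcribed on the slide.
SEQUENCES = [
    "GATCTTCGTACTGAGT", "GATCTTCGTACTGAGT", "GATTTTCGTACGGAAT",
    "GATTTTCGTACTGAGT", "GATCTTCGTACTGAAT", "GATTTTCGTACGGAAT",
    "GATTTTCGTACGGAAT", "GATCTTCGTACTGAAT",
]


def snp_positions(seqs: List[str]) -> List[int]:
    """0-based columns where the aligned sequences do not all agree."""
    return [i for i in range(len(seqs[0])) if len({s[i] for s in seqs}) > 1]


def haplotype_counts(seqs: List[str]) -> Counter:
    """Count the haplotypes formed by each sequence's alleles at the SNPs."""
    cols = snp_positions(seqs)
    return Counter("".join(s[i] for i in cols) for s in seqs)


if __name__ == "__main__":
    print(snp_positions(SEQUENCES))
    for hap, n in haplotype_counts(SEQUENCES).most_common():
        print(f"{hap} {n}/{len(SEQUENCES)}")
```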
Haplotypes and SNPs
(the same eight sequences as above)
Haplotype frequencies: CTG 3/8 (disease X), TGA 3/8 (healthy), CTA 2/8 (healthy)
• Haplotypes can have greater power to detect disease-related genome regions
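Inferring Haplotypes from SNP Array Data
• Genotype: AC//AA//TG
  – Maternal genotype: CA//AA//TT
  – Paternal genotype: CC//AA//TG
  – The father is homozygous CC at the first locus, so the child's A came from the mother; the mother is homozygous TT at the third locus, so the child's G came from the father
  – Then the haplotypes are AAT (maternal) and CAG (paternal)
• Genotype: AC//AA//TG
  – Maternal genotype: AC//AA//TG
  – Paternal genotype: AC//AA//TG
  – Cannot determine a unique haplotype pair
• Problem: how can we determine haplotypes without parental genotypes?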
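The trio reasoning in the first example can be written as a per-locus rule: assign a child allele to the mother only if she carries it, and require the other allele to be present in the father. This is a minimal sketch assuming genotypes in the slide's `AC//AA//TG` notation; `parse` and `phase_trio` are illustrative names, and it ignores missing data and genotyping errors, which real trio-phasing tools must handle.

```python
from typing import List, Optional, Tuple


def parse(genotype: str) -> List[str]:
    """Split a genotype written as 'AC//AA//TG' into per-locus allele pairs."""
    return genotype.split("//")


def phase_trio(child: List[str], mother: List[str],
               father: List[str]) -> Optional[Tuple[str, str]]:
    """Return the child's (maternal, paternal) haplotypes, or None if some
    locus cannot be resolved from the parents' genotypes."""
    maternal, paternal = [], []
    for c, m, f in zip(child, mother, father):
        # Try both ways of assigning the child's two alleles to the parents.
        options = {(a, b) for a, b in (c, c[::-1]) if a in m and b in f}
        if len(options) != 1:
            # 0 options: Mendelian inconsistency; 2 options: ambiguous locus.
            return None
        (a, b), = options
        maternal.append(a)
        paternal.append(b)
    return "".join(maternal), "".join(paternal)


if __name__ == "__main__":
    print(phase_trio(parse("AC//AA//TG"), parse("CA//AA//TT"), parse("CC//AA//TG")))
    print(phase_trio(parse("AC//AA//TG"), parse("AC//AA//TG"), parse("AC//AA//TG")))
```

For the first example it returns maternal AAT and paternal CAG; for the second it returns None, because at the heterozygous loci both assignments are consistent with the parents.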
Phasing: Inferring Haplotypes from SNP Data
• Given multilocus genotypes at a set of SNPs for many individuals, phasing means to
  – reconstruct haplotypes for all individuals
  – estimate frequencies of all possible haplotypes
• Haplotype reconstruction algorithm
  – Clark's parsimony algorithm (Clark, Mol. Biol. Evol. 1990)
Identifiability
Genotype representation:
0/0 → 0
1/1 → 1
0/1 → 2
Genotypes of 14 individuals:
21 2 222 02
02 1 111 22
11 0 000 01
02 1 111 22
21 2 222 02
02 1 111 22
11 0 000 01
02 1 111 22
21 2 222 02
22 2 222 21
21 1 222 02
02 1 111 22
22 2 222 21
21 2 222 02
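A short sketch of the 0/1/2 encoding and of why identifiability is a problem. `encode` and `n_phasings` are illustrative names, and the sketch assumes both haplotypes use the same spacing to group SNPs. With the haplotypes 01 1 111 00 and 11 0 000 01 (both appear in the haplotype listings on the next slide), `encode` reproduces the first genotype row, 21 2 222 02; that row has six heterozygous sites, so 2^5 = 32 distinct haplotype pairs are consistent with it.

```python
def encode(h1: str, h2: str) -> str:
    """Encode two 0/1 haplotypes with the slide's scheme:
    0/0 -> 0, 1/1 -> 1, 0/1 -> 2. Grouping spaces are kept."""
    out = []
    for a, b in zip(h1, h2):
        if a == " " or a == b:
            out.append(a)  # grouping space, or homozygous 0 or 1
        else:
            out.append("2")  # heterozygous site
    return "".join(out)


def n_phasings(genotype: str) -> int:
    """Number of distinct unordered haplotype pairs giving this genotype."""
    k = genotype.count("2")  # number of heterozygous sites
    return 2 ** (k - 1) if k > 0 else 1


if __name__ == "__main__":
    print(encode("01 1 111 00", "11 0 000 01"))
    print(n_phasings("21 2 222 02"))
```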
Identifiability (continued)
A haplotype assignment for the 14 individuals (28 haplotypes, two per individual, in the same order as the genotypes):
01 1 000 00 11 0 000 01 01 1 000 00 00 1 111 11 11 0 000 01 11 0 000 01 01 1 000 00 00 1 111 11 01 1 000 00 11 0 000 01 00 1 111 11 01 1 000 00 11 0 000 01 11 0 000 01 01 1 000 00 00 1 111 11 01 1 000 00 11 0 000 01 00 1 111 11 11 0 000 01 11 0 000 01 01 1 000 00 00 1 111 11 01 1 000 00 11 0 000 01 00 1 111 11 01 1 000 00 11 0 000 01
Distinct haplotypes (count):
11 0 000 01 (11)
01 1 000 00 (10)
00 1 111 11 (7)
Alternative assignments:
01 1 111 00 11 0 000 01 01 1 111 00 00 1 111 11 11 0 000 01 11 0 000 01 01 1 111 00 00 1 111 11
01 1 111 00 11 0 000 01 00 1 111 11 01 1 111 00 11 0 000 01 11 0 000 01 01 1 111 00 00 1 111 11 01 1 111 00 11 0 000 01
00 1 111 11 11 0 000 01 11 1 000 01 01 1 111 00 00 1 111 11 01 1 111 00 11 0 000 01 00 1 111 11 01 1 111 00 11 0 000 01
Distinct haplotypes (count):
11 0 000 01 (8)
11 0 010 01 (1)
11 1 000 01 (1)
11 0 000 11 (1)
01 1 111 00 (8)
01 1 101 00 (1)
01 0 111 00 (1)
00 1 111 11 (6)
00 1 111 01 (1)
01 1 101 00 11 0 010 01 01 1 111 00 00 1 111 11 11 0 000 01 11 0 000 01 01 1 111 00 00 1 111 11
01 0 111 00 11 1 000 01 00 1 111 11 01 1 111 00 11 0 000 01 11 0 000 01 01 1 111 00 00 1 111 11 01 1 111 00 11 0 000 01
00 1 111 11 11 0 000 01 11 1 000 01 01 1 111 00 00 1 111 11 01 1 111 00 11 0 000 11 00 1 111 01 01 1 111 00 11 0 000 01
Parsimonious solution
Haplotype Reconstruction Algorithm by Clark (1990)
• Choose individuals that are homozygous at every locus (e.g., TT//AA//CC)
  – Haplotype: TAC
• Choose individuals that are heterozygous at just one locus (e.g., TT//AA//CG)
  – Haplotypes: TAC and TAG
• Tally the resulting known haplotypes.
• For each known haplotype, look at all remaining unresolved cases: is there a combination that makes this haplotype?
  – Known haplotype: TAC
    • Unresolved pattern: AT//AA//CG
    • Inferred haplotypes: TAC/AAG. Add AAG to the list.
  – Known haplotypes: TAC and TAG
    • Unresolved pattern: AT//AA//CG
    • Candidate resolutions: TAC/AAG or TAG/AAC, so the inferred haplotype depends on which known haplotype is tried first.
• Continue until all haplotypes have been recovered or no new haplotypes can be found this way.
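The steps of Clark's algorithm can be sketched in Python as follows. Genotypes use the slide's `TT//AA//CG` notation; `parse`, `complement` and `clark` are illustrative names. Known haplotypes are tried in the order they were found, which makes the result order-dependent (compare the case with known haplotypes TAC and TAG); published implementations of Clark's method differ in how they order individuals and haplotypes.

```python
from typing import Dict, List, Optional, Tuple


def parse(genotype: str) -> List[str]:
    """'AT//AA//CG' -> ['AT', 'AA', 'CG'] (one allele pair per locus)."""
    return genotype.split("//")


def complement(hap: str, genotype: List[str]) -> Optional[str]:
    """Haplotype that pairs with `hap` to produce `genotype`, or None."""
    other = []
    for allele, pair in zip(hap, genotype):
        if allele == pair[0]:
            other.append(pair[1])
        elif allele == pair[1]:
            other.append(pair[0])
        else:
            return None  # hap carries an allele this individual lacks
    return "".join(other)


def clark(genotypes: List[List[str]]) -> Tuple[Dict[int, Tuple[str, str]], List[int]]:
    resolved: Dict[int, Tuple[str, str]] = {}
    known: List[str] = []
    # At most one heterozygous locus: the phase is unambiguous.
    for i, g in enumerate(genotypes):
        if sum(p[0] != p[1] for p in g) <= 1:
            h1 = "".join(p[0] for p in g)
            h2 = "".join(p[1] for p in g)
            resolved[i] = (h1, h2)
            for h in (h1, h2):
                if h not in known:
                    known.append(h)
    # Resolve remaining individuals with known haplotypes; repeat while
    # new resolutions (and possibly new haplotypes) keep appearing.
    changed = True
    while changed:
        changed = False
        for i, g in enumerate(genotypes):
            if i in resolved:
                continue
            for h in known:
                other = complement(h, g)
                if other is not None:
                    resolved[i] = (h, other)
                    if other not in known:
                        known.append(other)
                    changed = True
                    break  # stop scanning `known` after modifying it
    unresolved = [i for i in range(len(genotypes)) if i not in resolved]
    return resolved, unresolved


if __name__ == "__main__":
    data = [parse(g) for g in ("TT//AA//CC", "TT//AA//CG", "AT//AA//CG")]
    resolved, unresolved = clark(data)
    print(resolved)
    print(unresolved)
```

On the slide's three example genotypes, the third individual is resolved as TAC/AAG because TAC was the first known haplotype.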
Problems: Clark (1990)
• Many unresolved haplotypes may remain at the end
• Ignores recombination
  – Error in haplotype inference if a crossover of two actual haplotypes is identical to another true haplotype
  – Frequency of such errors depends on the recombination rate
• Clark (1990) reports that the algorithm "performs well" even with small sample sizes.
RECOMBINATION & LINKAGE DISEQUILIBRIUM
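Morgan's Fruit Fly Experiment
Morgan's fruit fly data (1909): 2,839 flies
• Eye color: A = red, a = purple
• Wing length: B = normal, b = vestigial
Crosses: AABB × aabb, then AaBb × aabb
Offspring   AaBb    Aabb   aaBb   aabb
Expected    710     710    710    710
Observed    1,339   151    154    1,195
(Expected counts assume independent assortment: 2,839 / 4 ≈ 710 per class.)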
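A quick check of how far the observed counts above are from the 1:1:1:1 expectation, assuming a standard Pearson chi-square goodness-of-fit test (this test is not part of the original slide; the names are illustrative).

```python
# Morgan's test-cross counts (AaBb x aabb) from the slide.
OBSERVED = {"AaBb": 1339, "Aabb": 151, "aaBb": 154, "aabb": 1195}


def chi_square_uniform(counts):
    """Pearson chi-square statistic against equal expected counts
    (1:1:1:1 under independent assortment)."""
    n = sum(counts.values())
    expected = n / len(counts)
    return sum((o - expected) ** 2 / expected for o in counts.values())


if __name__ == "__main__":
    n = sum(OBSERVED.values())
    print(n, n / 4)
    print(round(chi_square_uniform(OBSERVED), 1))
```

The statistic is roughly 1,765 with 3 degrees of freedom, far above the 5% critical value of about 7.81, so independent assortment of the two genes is strongly rejected.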
“Linked” Genes
• When two genes lie on the same chromosome, they are transmitted to offspring in a non-independent manner
[Figure: one chromosome of a pair carries A and B, its homolog carries a and b]
Morgan's Explanation: Recombination
[Figure: chromosome diagrams of the crosses]
• Parents: AB/AB × ab/ab
• F1: AB/ab, crossed with ab/ab
• F2: AB/ab and ab/ab (parental combinations); Ab/ab and aB/ab (recombination has taken place)
Recombination
• Parental types: AaBb, aabb
• Recombinants: Aabb, aaBb
  – The proportion of recombinants between the two genes (or characters) is called the recombination fraction between these two genes.
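Applying this definition to Morgan's counts gives (151 + 154) / 2,839 ≈ 0.107, well below the 0.5 expected for genes that assort independently. A minimal sketch, with illustrative names:

```python
# Test-cross counts from Morgan's experiment.
COUNTS = {"AaBb": 1339, "Aabb": 151, "aaBb": 154, "aabb": 1195}
RECOMBINANTS = ("Aabb", "aaBb")


def recombination_fraction(counts, recombinants=RECOMBINANTS):
    """Proportion of offspring that are recombinant types."""
    return sum(counts[k] for k in recombinants) / sum(counts.values())


if __name__ == "__main__":
    r = recombination_fraction(COUNTS)
    print(f"{r:.3f}")  # compare with 0.5 for unlinked genes
```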
Review: Correlation
• "GPA" and "TV in hours per week" are negatively correlated
[Table: GPA and weekly TV hours; means 3.02 (GPA) and 13.8 (hours)]
How can we quantify the level of correlation?
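Covariance and Correlation
• Degree of association between two variables x and y
• Given observations x1, …, xn and y1, …, yn
  – Covariance:
$$\mathrm{Cov}(x, y) = \frac{1}{n-1}\sum_{i=1}^{n}(x_i - \bar{x})(y_i - \bar{y})$$
  – Correlation coefficient:
$$r = \frac{\sum_{i=1}^{n}(x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum_{i=1}^{n}(x_i - \bar{x})^2}\,\sqrt{\sum_{i=1}^{n}(y_i - \bar{y})^2}}$$
    where $\sum_{i}(x_i - \bar{x})^2$ = (variance of the $x_i$'s) × (n - 1), and likewise for the $y_i$'s
    • Falls between -1 and +1, with the sign indicating the direction of association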
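A direct transcription of the two formulas, as a sketch on toy data (not the GPA table): the covariance uses the n - 1 denominator, and r divides the sum of cross-products by the square roots of the two sums of squares, each of which equals a variance times n - 1. The function names are illustrative.

```python
import math
from typing import Sequence


def covariance(x: Sequence[float], y: Sequence[float]) -> float:
    """Sample covariance with the n - 1 denominator."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    return sum((a - mx) * (b - my) for a, b in zip(x, y)) / (n - 1)


def correlation(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient r, between -1 and +1."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)  # (variance of x) * (n - 1)
    syy = sum((b - my) ** 2 for b in y)  # (variance of y) * (n - 1)
    return sxy / math.sqrt(sxx * syy)


if __name__ == "__main__":
    x = [1, 2, 3, 4]
    print(covariance(x, [2, 4, 6, 8]), correlation(x, [2, 4, 6, 8]))
    print(correlation(x, [8, 6, 4, 2]))
```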
Correlation between X1 and X2
[Figure: paired values of X1 and X2]
Basic Concepts
[Figure: Parent 1 and Parent 2 each carry the haplotypes A B and a b]
• High LD → no recombination (r² = 1): offspring receive only the parental haplotypes A B and a b, so SNP1 "tags" SNP2
• Low LD → recombination: many possibilities, e.g., offspring may also receive A b or a B
Linkage Disequilibrium (LD)
• LD reflects the relationship between alleles at different loci.
• Often, r² (the squared correlation coefficient) is used as a measure of LD.
[Figure: alleles at Locus A and Locus B]
How to Compute r² on SNP Data
SNP data (rows: 8 individuals; columns: SNP1, SNP2, SNP3):
1 1 0
1 1 1
1 1 0
1 1 1
0 0 0
0 0 1
0 0 0
0 0 1
r² matrix:
        SNP1   SNP2   SNP3
SNP1    1.0    1.0    0.0
SNP2    1.0    1.0    0.0
SNP3    0.0    0.0    1.0
(r² = 1.0 between SNP1 and SNP2; r² = 0.0 between SNP3 and either SNP1 or SNP2)
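A sketch of the r² computation on this example. It assumes the 24 values on the slide are read row by row as 8 individuals × 3 SNPs (the reading that reproduces the r² matrix shown); the names are illustrative, and returning NaN for a monomorphic SNP is a design choice, since r² is undefined there.

```python
import math
from typing import List, Sequence

# Rows: individuals; columns: SNP1, SNP2, SNP3 (0/1 alleles from the slide).
DATA = [
    [1, 1, 0], [1, 1, 1], [1, 1, 0], [1, 1, 1],
    [0, 0, 0], [0, 0, 1], [0, 0, 0], [0, 0, 1],
]


def r_squared(x: Sequence[int], y: Sequence[int]) -> float:
    """Squared Pearson correlation between two SNP columns."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return math.nan  # undefined for a monomorphic SNP
    return sxy ** 2 / (sxx * syy)


def r_squared_matrix(data: List[List[int]]) -> List[List[float]]:
    columns = list(zip(*data))  # one tuple of alleles per SNP
    return [[r_squared(a, b) for b in columns] for a in columns]


if __name__ == "__main__":
    for row in r_squared_matrix(DATA):
        print(" ".join(f"{v:.1f}" for v in row))
```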
Linkage Disequilibrium in SNP Data
• r² in SNP data from a population of individuals (black: r² = 1, white: r² = 0)
[Figure: r² heatmaps along the genome for Population 1 and Population 2]
Reducing Genotyping Costs with Tag SNPs
• Nearby SNPs in the genome are in linkage disequilibrium (LD), and thus contain redundant information.
• If we know which SNPs are in LD, we can pre-select a representative SNP for each LD block of the chromosome and genotype only those SNPs.
[Figure: r² values along the genome (black: r² = 1, white: r² = 0); two highlighted SNPs are in high LD and thus redundant]
Reducing Genotyping Costs with Tag SNPs
• Two-stage data collection process
  – Stage 1:
    • Collect genotype data for a dense set of SNPs for multiple individuals
    • Select a non-redundant set of tag SNPs by examining the LD pattern
  – Stage 2:
    • Collect genotype data only for the tag SNPs for a large number of individuals
Algorithm for Selecting Tag SNPs
• Greedy algorithm
  – Randomly select a tag SNP from the remaining candidates
  – Find the SNPs in high LD with the previously selected tag SNP (r² > 0.8) and remove those SNPs from the set of candidate tag SNPs
  – Iterate until the set of candidate tag SNPs is empty
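The greedy procedure above can be sketched as follows, assuming a precomputed r² matrix as input; `greedy_tag_snps` and its `seed` parameter are illustrative, and the seeded random generator only makes the random choices repeatable. Because the choice is random and greedy, different seeds can give different tag sets, and the result is not guaranteed to be the smallest possible set.

```python
import random
from typing import List, Sequence


def greedy_tag_snps(r2: Sequence[Sequence[float]], threshold: float = 0.8,
                    seed: int = 0) -> List[int]:
    """Greedy tag-SNP selection from an r^2 matrix (hypothetical helper).

    Repeatedly picks a random remaining candidate as a tag and discards
    every candidate whose r^2 with that tag exceeds `threshold`.
    """
    rng = random.Random(seed)  # seeded so the random choices are repeatable
    candidates = set(range(len(r2)))
    tags = []
    while candidates:
        tag = rng.choice(sorted(candidates))
        tags.append(tag)
        # The tag itself and all SNPs it "tags" leave the candidate set.
        candidates -= {j for j in candidates if j == tag or r2[tag][j] > threshold}
    return sorted(tags)


if __name__ == "__main__":
    # r^2 matrix from the three-SNP example (SNP1, SNP2, SNP3).
    R2 = [[1.0, 1.0, 0.0], [1.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
    print(greedy_tag_snps(R2))
```

On the three-SNP example, every seed yields two tags: SNP3 plus one of SNP1 or SNP2.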
Recombination and Haplotypes
• Remember that Clark's method does not take recombination into account
• How can we find haplotypes from SNP data collected for a population of individuals under recombination?
  – Assume haplotypes of ancestor chromosomes and treat modern individuals' chromosomes as a mosaic of ancestor chromosomes
  – However, ancestor chromosomes cannot be observed!
• Key idea:
  – The haplotype of each individual is a mosaic of other individuals' haplotypes
  – Unresolved haplotypes are similar to known haplotypes
Recombination and Haplotypes
• h1, h2, h3: unobserved ancestral haplotypes (we have no SNP data for them)
• h4A, h4B: unobserved haplotypes of modern individuals (the haplotypes are unobserved, but we have SNP data)
• Circles: mutations
ATCGAAATTTTAAACGTTACGTGATAAAAGTATTACTGAAAAAATTACTAGATAAGATCGATAAATC
ATCGAAATTTTATTCTTTATGCGATAAAAGTATTACTGACTGACATTACTAGATAAGATCGATAAATC
[Figure: each modern haplotype is a mosaic of ancestor chromosomes]
PHASE Model as an HMM
• Inferring the unobserved state label for each observed SNP amounts to haplotype reconstruction
ATCGAAATTTTAAACGTTACGTGATAAAAGTATTACTGAAAAAATTACTAGATAAGATCGATAAATC
ATCGAAATTTTATTCTTTATGCGATAAAAGTATTACTGACTGACATTACTAGATAAGATCGATAAATC
h3h3h3h3h3h3h3h3h3h3h3h3h2h2h2h2h2h2h2………
h3h3h3h3h3h1h1h1h1h2h2h2h2h2h2h2h2h2h3h3h3….
PHASE Model as an HMM
• States: h1, h2, h3 (unobserved ancestral haplotypes)
• Transition probabilities (from SNP $X_l$ to $X_{l+1}$) depend on
  – the distance between adjacent SNPs, $d_l$
  – the recombination rate between adjacent SNPs, $\rho_l$
• Emission probabilities: mutation model
• Task: infer hidden state labels for each locus of each individual (h4A, h4B)
[Figure: state space h1, h2, h3 with possible transitions]
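To make the HMM concrete, here is a simplified Viterbi decoder for a single observed haplotype. It assumes a copying model in the style of Li and Stephens, not PHASE's exact parameterization: the hidden state is which ancestral haplotype is being copied, the probability of continuing to copy the same one between adjacent SNPs is exp(-rho * d) plus a uniform jump term, and the emission copies the ancestral allele with probability 1 - theta. The constant `rho`, the mismatch rate `theta`, and all names are illustrative assumptions; real phasing decodes the pair h4A, h4B jointly from a genotype, which this sketch does not attempt.

```python
import math
from typing import List, Sequence


def viterbi_mosaic(target: Sequence[int], ancestors: Sequence[Sequence[int]],
                   dist: Sequence[float], rho: float, theta: float) -> List[int]:
    """Most probable ancestral-haplotype label (index into `ancestors`) at each
    SNP of `target`, under a simplified copying HMM (illustrative only).
    Assumes 0 < theta < 1 and rho * dist > 0."""
    K, L = len(ancestors), len(target)

    def log_emit(k: int, l: int) -> float:
        # Copy the ancestral allele with prob 1 - theta, mismatch with prob theta.
        return math.log(1 - theta if ancestors[k][l] == target[l] else theta)

    def log_or_ninf(p: float) -> float:
        return math.log(p) if p > 0 else -math.inf

    score = [math.log(1 / K) + log_emit(k, 0) for k in range(K)]  # uniform start
    back: List[List[int]] = []
    for l in range(1, L):
        stay = math.exp(-rho * dist[l - 1])  # no switch between SNP l-1 and l
        new_score, pointers = [], []
        for k in range(K):
            best_j, best = 0, -math.inf
            for j in range(K):
                # Keep copying j (prob stay) or jump to a uniformly chosen state.
                p = (stay if j == k else 0.0) + (1 - stay) / K
                cand = score[j] + log_or_ninf(p)
                if cand > best:
                    best_j, best = j, cand
            new_score.append(best + log_emit(k, l))
            pointers.append(best_j)
        score = new_score
        back.append(pointers)
    # Trace the best path backwards from the highest-scoring final state.
    k = max(range(K), key=lambda i: score[i])
    path = [k]
    for pointers in reversed(back):
        k = pointers[k]
        path.append(k)
    return path[::-1]


if __name__ == "__main__":
    h1, h2, h3 = [0] * 6, [1] * 6, [0, 1, 0, 1, 0, 1]
    target = [0, 0, 0, 1, 1, 1]
    labels = viterbi_mosaic(target, [h1, h2, h3], [1.0] * 5, rho=0.1, theta=0.01)
    print(["h%d" % (k + 1) for k in labels])
```

In this toy example the decoded mosaic copies h1 for the first three SNPs and h2 for the last three, i.e., one inferred switch (recombination) point.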
INTERNATIONAL HAPMAP PROJECT (HAPMAP.ORG)
HapMap Phase 3 Samples
| Label | Population sample | # samples | QC+ (Draft 1) |
|---|---|---|---|
| ASW* | African ancestry in Southwest USA | 90 | 71 |
| CEU* | Utah residents with Northern and Western European ancestry from the CEPH collection | 180 | 162 |
| CHB | Han Chinese in Beijing, China | 90 | 82 |
| CHD | Chinese in Metropolitan Denver, Colorado | 100 | 70 |
| GIH | Gujarati Indians in Houston, Texas | 100 | 83 |
| JPT | Japanese in Tokyo, Japan | 91 | 82 |
| LWK | Luhya in Webuye, Kenya | 100 | 83 |
| MEX* | Mexican ancestry in Los Angeles, California | 90 | 71 |
| MKK* | Maasai in Kinyawa, Kenya | 180 | 171 |
| TSI | Toscans in Italy | 100 | 77 |
| YRI* | Yoruba in Ibadan, Nigeria | 180 | 163 |
| Total | | 1,301 | 1,115 |

* Population is made of family trios
Haplotype Structure and Recombination Rate Estimates: HapMap I vs. HapMap II
HapMap: Allele Frequencies in Different Populations
• Comparison of allele frequencies for individuals from pairs of populations
• The red regions show that many SNPs have similar low frequencies in each pair of analysis panels/populations.
• CHB (Chinese) and JPT (Japanese) have similar allele frequencies
Why Haplotypes?
• Haplotypes have greater power for discriminating genomic regions
  – Consider J binary markers (e.g., SNPs) in a genomic region: there are 2^J possible haplotypes
    • but in fact, far fewer are seen in the human population
  – SNPs have only two alleles, whereas haplotypes have a larger number of alleles
  – Good genetic markers for population, evolution, and hereditary diseases
Summary
• Haplotype: a set of genetic markers that lie on the same chromosome
• How can we find haplotypes from SNPs?
• Recombination, linkage disequilibrium, and how to take advantage of them
  – Haplotypes as a set of linked SNPs with greater discriminative power
  – Tag SNPs for saving genotyping cost
• HapMap Project