010101100010010100001010101010011011100110001100101000100101
Introduction: Human Population Genomics
ACGTTTGACTGAGGAGTTTACGGGAGCAAAGCGGCGTCATTGCTATTCGTATCTGTTTAG
• Cost
• Killer apps
• Roadblocks?
How soon will we all be sequenced?
Time
2013? 2018?
Cost
Applications
The Hominid Lineage
Human population migrations
• Out of Africa, Replacement
  – Single mother of all humans (Eve) ~150,000 yr
  – Single father of all humans (Adam) ~70,000 yr
  – Humans out of Africa ~50,000 years ago replaced others (e.g., Neanderthals)
• Multiregional Evolution
  – Generally debunked, however,
  – ~5% of human genome in Europeans, Asians is Neanderthal, Denisova
Coalescence
Y-chromosome coalescence
Why humans are so similar
A small population that interbred reduced the genetic variation
Out of Africa ~ 50,000 years ago
Out of Africa
Migration of Humans
http://info.med.yale.edu/genetics/kkidd/point.html
Some Key Definitions
Mary: AGCCCGTACG
John: AGCCCGTACG
Josh: AGCCCGTACG
Kate: AGCCCGTACG
Pete: AGCCCGTACG
Anne: AGCCCGTACG
Mimi: AGCCCGTACG
Mike: AGCCCTTACG
Olga: AGCCCTTACG
Tony: AGCCCTTACG
Alleles: G, T
Major Allele: G
Minor Allele: T
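Heterozygosity: Prob[2 alleles picked at random with replacement are different]
H = 2pq = 2 * 0.75 * 0.25 = 0.375, with p(G) = 15/20 and q(T) = 5/20 counted over the diploid genotypes listed on this slide
H = 4Nu/(1+4Nu) at mutation-drift equilibrium (N: effective population size, u: mutation rate)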
Genotypes: G/G G/G G/T G/G G/G G/G G/T T/T G/T G/G
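The heterozygosity arithmetic on this slide can be checked with a short script. This is a minimal sketch, not the lecturer's code: the function names are illustrative, and the values of N and u in the last lines are placeholders chosen only to exercise the formula.

```python
from collections import Counter

# Diploid genotypes as listed on the slide (10 individuals, 20 alleles).
genotypes = ["G/G", "G/G", "G/T", "G/G", "G/G",
             "G/G", "G/T", "T/T", "G/T", "G/G"]


def allele_frequencies(genotypes):
    """Return {allele: frequency} counted over every allele in the list."""
    counts = Counter(a for g in genotypes for a in g.split("/"))
    total = sum(counts.values())
    return {allele: n / total for allele, n in counts.items()}


def heterozygosity(freqs):
    """Probability that two alleles drawn with replacement differ."""
    return 1.0 - sum(p * p for p in freqs.values())


def equilibrium_heterozygosity(N, u):
    """H = 4Nu / (1 + 4Nu), the formula quoted on the slide."""
    theta = 4 * N * u
    return theta / (1 + theta)


freqs = allele_frequencies(genotypes)
H = heterozygosity(freqs)
print(freqs)  # G: 0.75, T: 0.25
print(H)      # 0.375, matching 2 * 0.75 * 0.25

# Placeholder values (assumptions, not from the slides).
print(equilibrium_heterozygosity(N=10_000, u=1e-8))
```

For two alleles, 1 - (p^2 + q^2) equals 2pq, so the general form used here agrees with the slide's calculation.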
Recombinations:
• At least 1/chromosome
• On average ~1/100 Mb
Linkage Disequilibrium: The degree of correlation between two SNP locations
Mom Dad
Human Genome Variation
• SNP: TGCTGAGA vs. TGCCGAGA
• Novel Sequence: TGCTCGGAGA vs. TGC - - - GAGA
• Microdeletion: TGC - - AGA vs. TGCCGAGA
• Inversion
• Mobile Element or Pseudogene Insertion
• Translocation
• Tandem Duplication
• Transposition
• Large Deletion
• Novel Sequence at Breakpoint
The Fall in Heterozygosity
FST = (H - HPOP) / H
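A small numeric sketch of the FST formula may help. It assumes, beyond what the slide states, that H is the heterozygosity of the pooled population, that HPOP is the average heterozygosity within subpopulations, and that the subpopulations are equally sized with a single biallelic site. The allele frequencies are made up for illustration.

```python
def het(p):
    """Heterozygosity 2p(1-p) of a biallelic site with allele frequency p."""
    return 2 * p * (1 - p)


def fst(subpop_freqs):
    """FST = (H - HPOP) / H, assuming equally sized subpopulations."""
    k = len(subpop_freqs)
    p_total = sum(subpop_freqs) / k                # pooled allele frequency
    h_total = het(p_total)                         # H
    h_pop = sum(het(p) for p in subpop_freqs) / k  # HPOP
    if h_total == 0:
        raise ValueError("site is monomorphic; FST is undefined")
    return (h_total - h_pop) / h_total


# Hypothetical frequencies: two strongly diverged subpopulations.
diverged = fst([0.9, 0.1])
# Identical subpopulations show no fall in heterozygosity.
identical = fst([0.5, 0.5])
print(diverged, identical)
```

With frequencies 0.9 and 0.1 the pooled H is 0.5 while HPOP is about 0.18, so FST comes out near 0.64; identical subpopulations give 0.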
The HapMap Project
Code  Population                                  Samples
ASW   African ancestry in Southwest USA           90
CEU   Northern and Western Europeans (Utah)       180
CHB   Han Chinese in Beijing, China               90
CHD   Chinese in Metropolitan Denver              100
GIH   Gujarati Indians in Houston, Texas          100
JPT   Japanese in Tokyo, Japan                    91
LWK   Luhya in Webuye, Kenya                      100
MXL   Mexican ancestry in Los Angeles             90
MKK   Maasai in Kinyawa, Kenya                    180
TSI   Toscani in Italia                           100
YRI   Yoruba in Ibadan, Nigeria                   100
Genotyping: Probe a limited number (~1M) of known highly variable positions of the human genome
Linkage Disequilibrium & Haplotype Blocks
Linkage Disequilibrium (LD):
D = P(A and G) - pA * pG
where pA and pG are the frequencies of the minor alleles A and G at the two SNPs
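The LD coefficient D can be computed directly from phased haplotypes. The sketch below is illustrative only: the haplotype lists are invented, and each haplotype is assumed to be a two-character string giving the allele at the first and second SNP.

```python
def linkage_disequilibrium(haplotypes, allele1, allele2):
    """D = P(allele1 and allele2) - p(allele1) * p(allele2)."""
    n = len(haplotypes)
    p1 = sum(h[0] == allele1 for h in haplotypes) / n
    p2 = sum(h[1] == allele2 for h in haplotypes) / n
    p12 = sum(h[0] == allele1 and h[1] == allele2 for h in haplotypes) / n
    return p12 - p1 * p2


# Hypothetical sample: A and G travel together more often than chance.
linked = ["AG"] * 3 + ["CT"] * 5 + ["AT", "CG"]
# Hypothetical sample: alleles at the two SNPs are independent.
unlinked = ["AG", "AT", "CG", "CT"]

d_linked = linkage_disequilibrium(linked, "A", "G")
d_unlinked = linkage_disequilibrium(unlinked, "A", "G")
print(d_linked, d_unlinked)
```

In the first sample pA = pG = 0.4 and P(A and G) = 0.3, so D is about 0.14; in the second, D is 0 because the alleles co-occur exactly as often as their frequencies predict.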
Population Sequencing – 1000 Genomes Project
The 1000 Genomes Project Consortium et al. Nature 467, 1061-1073 (2010) doi:10.1038/nature09534
Association Studies
Control: A/G A/G G/G G/G A/G G/G G/G
Disease: A/A A/G A/A A/G A/G A/A A/A

Genotype  Control  Disease
AA        0        4
AG        3        3
GG        4        0

p-value
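One way to turn the genotype counts on this slide into a p-value is a chi-square test of independence. The slide does not say which test was used, so this is an assumption. With 3 genotypes and 2 groups there are 2 degrees of freedom, for which the chi-square tail probability is exactly exp(-x/2). The counts are far too small for the approximation to be trustworthy, so treat this purely as an illustration.

```python
import math

# Genotype counts from the slide: genotype -> (control, disease).
table = {"AA": (0, 4), "AG": (3, 3), "GG": (4, 0)}


def chi_square_3x2(table):
    """Pearson chi-square statistic for a genotype-by-status table.

    Assumes every row and column has a non-zero total.
    """
    rows = list(table.values())
    col_totals = [sum(r[j] for r in rows) for j in range(2)]
    n = sum(col_totals)
    stat = 0.0
    for r in rows:
        row_total = sum(r)
        for j in range(2):
            expected = row_total * col_totals[j] / n
            stat += (r[j] - expected) ** 2 / expected
    return stat


chi2 = chi_square_3x2(table)
# Survival function of the chi-square distribution with 2 degrees of freedom.
p_value = math.exp(-chi2 / 2)
print(chi2, p_value)
```

For this table the statistic is 8.0 and the p-value is exp(-4), roughly 0.018.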
Wellcome Trust Case Control
Nature 447, 661-678 (7 June 2007)
Nature 464, 713-720 (1 April 2010)
Many associations with small effect sizes (odds ratios < 1.5)
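Disease Clustering
Disease                        Genotyping
Multiple Sclerosis (MS)        Illumina chip, 15K non-synon SNPs
Ankylosing Spondylitis (AS)    Illumina chip, 15K non-synon SNPs
Autoimmune Thyroid (ATD)       Illumina chip, 15K non-synon SNPs
Breast Cancer (BC)             Illumina chip, 15K non-synon SNPs
Rheumatoid Arthritis (RA)      Affy 500K array
Bipolar Disorder (BD)          Affy 500K array
Crohn's Disease (CD)           Affy 500K array
Coronary Artery (CAD)          Affy 500K array
Hypertension (HT)              Affy 500K array
Type 1 Diabetes (T1D)          Affy 500K array
Type 2 Diabetes (T2D)          Affy 500K array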
Randomization to determine significance
Use results as a distance metric for clustering diseases
Compute disease-disease correlations
PLoS Genet 5(12): e1000792. doi:10.1371/journal.pgen.1000792. 2009.
Disease Clustering
• RA vs. ATD
• RA vs. MS
  – No recorded co-occurrence of RA and MS
Genetic Variation Score (GVS) per SNP allele and disease data set
SNP - Allele | Gene Symbol | RA (NARAC) | RA | AS | T1D | ATD | MS (IMSGC) | MS
rs11752919 - C ZSCAN23 -3.48 -3.21 -9.39 1.10 0.70 3.25 2.99
rs3130981 - A CDSN -0.46 -1.00 -9.47 -4.94 0.33 10.00 13.41
rs151719 - G HLA-DMB -6.71 -4.77 -1.08 -13.63 0.34 8.58 17.76
rs10484565 - T TAP2 25.52 8.37 1.34 15.74 -1.36 -0.56 -0.30
rs1264303 - G VARS2 11.51 7.36 18.76 0.89 -1.76 -1.85 -1.75
rs1265048 - C CDSN 6.59 2.97 50.13 6.34 -0.85 -2.39 -4.16
rs2071286 - A NOTCH4 5.30 0.78 6.42 4.04 -0.03 -1.89 -2.45
rs2076530 - G BTNL2 67.49 56.46 14.06 13.58 -6.41 -9.50 -18.52
rs757262 - T TRIM40 14.58 9.11 6.27 1.56 -0.79 -2.05 -7.34
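The disease-disease correlation step can be illustrated with the nine GVS rows shown in the table. This is a rough sketch, not the published method: it uses a plain Pearson correlation over these nine SNPs only, and the distance 1 - r is an assumed choice for illustration.

```python
import math

# GVS columns transcribed from the table; rows follow the SNP order shown.
gvs = {
    "RA (NARAC)": [-3.48, -0.46, -6.71, 25.52, 11.51, 6.59, 5.30, 67.49, 14.58],
    "RA":         [-3.21, -1.00, -4.77, 8.37, 7.36, 2.97, 0.78, 56.46, 9.11],
    "AS":         [-9.39, -9.47, -1.08, 1.34, 18.76, 50.13, 6.42, 14.06, 6.27],
    "T1D":        [1.10, -4.94, -13.63, 15.74, 0.89, 6.34, 4.04, 13.58, 1.56],
    "ATD":        [0.70, 0.33, 0.34, -1.36, -1.76, -0.85, -0.03, -6.41, -0.79],
    "MS (IMSGC)": [3.25, 10.00, 8.58, -0.56, -1.85, -2.39, -1.89, -9.50, -2.05],
    "MS":         [2.99, 13.41, 17.76, -0.30, -1.75, -4.16, -2.45, -18.52, -7.34],
}


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length lists."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def distance(d1, d2):
    """Assumed distance for clustering: 1 - correlation (range 0..2)."""
    return 1.0 - pearson(gvs[d1], gvs[d2])


r_replicate = pearson(gvs["RA"], gvs["RA (NARAC)"])  # two RA data sets
r_opposed = pearson(gvs["RA"], gvs["MS"])            # RA vs. MS
print(r_replicate, r_opposed, distance("RA", "MS"))
```

On these rows the two RA data sets correlate positively, while RA and MS correlate negatively, in line with the RA vs. MS contrast noted on the slide.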
Ancestry Inference
?
Danish, French, Spanish, Mexican
Global Ancestry Inference
Fixation, Positive & Negative Selection
Neutral Drift
Positive Selection
Negative Selection
How can we detect negative selection?
How can we detect positive selection?
Conservation and Human SNPs
CNSs (conserved non-coding sequences) have fewer SNPs
SNPs in CNSs have shifted allele frequency spectra
Neutral CNS
How can we detect positive selection?
Ka/Ks ratio: Ratio of nonsynonymous to synonymous substitutions
Very old, persistent, strong positive selection for a protein that keeps adapting
Examples: immune response, spermatogenesis
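A simplified version of the Ka/Ks idea can be sketched for two aligned coding sequences. This is an approximation under stated assumptions, not a full estimator: it counts sites in the style of Nei and Gojobori, skips codons that differ at more than one position, applies no multiple-hit correction, and treats changes to stop codons as nonsynonymous. The two example sequences are invented.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ...
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa
               for c, aa in zip(product(BASES, repeat=3), AMINO)}


def synonymous_sites(codon):
    """Number of synonymous sites in a codon (each position counts 0..1)."""
    sites = 0.0
    for pos in range(3):
        for base in BASES:
            if base == codon[pos]:
                continue
            mutant = codon[:pos] + base + codon[pos + 1:]
            if CODON_TABLE[mutant] == CODON_TABLE[codon]:
                sites += 1 / 3
    return sites


def pn_ps(seq1, seq2):
    """Return (pN, pS): nonsynonymous and synonymous differences per site.

    Assumes equal-length, in-frame sequences with at least one
    synonymous site.
    """
    codons1 = [seq1[i:i + 3] for i in range(0, len(seq1), 3)]
    codons2 = [seq2[i:i + 3] for i in range(0, len(seq2), 3)]
    S = sum(synonymous_sites(c) for c in codons1 + codons2) / 2
    N = 3 * len(codons1) - S
    syn = nonsyn = 0
    for c1, c2 in zip(codons1, codons2):
        if sum(a != b for a, b in zip(c1, c2)) != 1:
            continue  # identical codon, or multi-difference codon (skipped)
        if CODON_TABLE[c1] == CODON_TABLE[c2]:
            syn += 1
        else:
            nonsyn += 1
    return nonsyn / N, syn / S


# Hypothetical pair: GCT->GCC is synonymous (Ala), GAT->GAA is not (Asp->Glu).
pN, pS = pn_ps("ATGGCTAAAGAT", "ATGGCCAAAGAA")
print(pN, pS, pN / pS)
```

A ratio well below 1, as in this toy pair, suggests purifying (negative) selection; the slide's point is that a ratio persistently above 1 points to positive selection. The final division fails if the pair has no synonymous differences, so real data needs a guard there.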
How can we detect positive selection?
Long Haplotypes: iHS test
Less time:
• Fewer mutations
• Fewer recombinations