Introduction: Human Population Genomics

Post on 04-Jan-2016


ACGTTTGACTGAGGAGTTTACGGGAGCAAAGCGGCGTCATTGCTATTCGTATCTGTTTAG

• Cost
• Killer apps
• Roadblocks?

How soon will we all be sequenced?

[Figure: Cost and Applications plotted against Time, with "2013?" and "2018?" marked on the time axis]

The Hominid Lineage

Human population migrations

• Out of Africa, Replacement
– Single mother of all humans (Eve): ~150,000 years ago
– Single father of all humans (Adam): ~70,000 years ago
– Humans who left Africa ~50,000 years ago replaced others (e.g., Neanderthals)

• Multiregional Evolution
– Generally debunked; however,
– a few percent (up to ~5%) of the genome of Europeans and Asians is of Neanderthal or Denisovan origin

Coalescence

Y-chromosome coalescence
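The slides name coalescence without giving a model. A minimal sketch, assuming the standard Kingman coalescent for a haploid locus such as the Y chromosome in a population of constant size (`pop_size`, the sample size and the function names are hypothetical): with k lineages, each pair merges at rate 1/`pop_size`, so lineages coalesce one pair at a time until a single common ancestor remains.

```python
import random

def simulate_tmrca(n, pop_size, rng):
    """Time (in generations) to the most recent common ancestor of n sampled
    lineages under the standard Kingman coalescent, assuming a haploid locus
    (e.g. the Y chromosome) in a population of constant size pop_size."""
    t = 0.0
    k = n
    while k > 1:
        # With k lineages, each of the k(k-1)/2 pairs coalesces at rate 1/pop_size.
        rate = k * (k - 1) / 2 / pop_size
        t += rng.expovariate(rate)  # waiting time until the next coalescence
        k -= 1                      # two lineages merge into one
    return t

def expected_tmrca(n, pop_size):
    # Sum over k = 2..n of 2*pop_size / (k*(k-1)) telescopes to 2*pop_size*(1 - 1/n).
    return 2 * pop_size * (1 - 1 / n)

if __name__ == "__main__":
    rng = random.Random(42)
    n, pop_size = 20, 10_000  # hypothetical sample size and effective population size
    sims = [simulate_tmrca(n, pop_size, rng) for _ in range(5_000)]
    print(sum(sims) / len(sims), expected_tmrca(n, pop_size))
```

Most of the expected depth comes from the last two lineages (mean `pop_size` generations), which is why the sampled ancestry converges on one "Adam" (or, for mitochondria, one "Eve") surprisingly quickly.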

Why humans are so similar

Descent from a small, interbreeding ancestral population reduced genetic variation

Out of Africa ~50,000 years ago
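Why a small population loses variation can be made concrete with the classical drift result H_t = H_0 (1 - 1/(2N))^t for a diploid population of constant effective size N with no mutation. A minimal sketch under that assumption; the starting heterozygosity, population sizes and generation counts below are hypothetical.

```python
def heterozygosity_after_drift(h0, n_diploid, generations):
    """Expected heterozygosity after `generations` of pure genetic drift
    (no mutation, migration or selection) in a diploid population of constant
    effective size n_diploid: H_t = H_0 * (1 - 1/(2N))^t."""
    return h0 * (1 - 1 / (2 * n_diploid)) ** generations

def heterozygosity_through_phases(h0, phases):
    """Apply a sequence of (n_diploid, generations) phases, e.g. a large
    ancestral population followed by a small founding (bottleneck) population."""
    h = h0
    for n_diploid, generations in phases:
        h = heterozygosity_after_drift(h, n_diploid, generations)
    return h

if __name__ == "__main__":
    # Hypothetical numbers: 100 generations at N = 100 vs. N = 10,000.
    print(heterozygosity_after_drift(0.5, 100, 100))     # bottlenecked: ≈ 0.30
    print(heterozygosity_after_drift(0.5, 10_000, 100))  # large population: ≈ 0.50
```

A short stretch at small N removes far more heterozygosity than a long stretch at large N, which is the sense in which a small founding population leaves all descendants genetically similar.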

Out of Africa

Migration of Humans

http://info.med.yale.edu/genetics/kkidd/point.html

Some Key Definitions

Mary: AGCCCGTACG
John: AGCCCGTACG
Josh: AGCCCGTACG
Kate: AGCCCGTACG
Pete: AGCCCGTACG
Anne: AGCCCGTACG
Mimi: AGCCCGTACG
Mike: AGCCCTTACG
Olga: AGCCCTTACG
Tony: AGCCCTTACG

Alleles: G, T

Major Allele: G
Minor Allele: T

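Heterozygosity: Prob[2 alleles picked at random, with replacement, are different]

For allele frequencies 0.75 (G) and 0.25 (T): H = 2 × 0.75 × 0.25 = 0.375

At mutation-drift equilibrium (infinite-alleles model): $H = \frac{4Nu}{1 + 4Nu}$, where N is the effective population size and u the mutation rate per generation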

Genotypes: G/G G/G G/T G/G G/G G/G G/G T/T T/G T/G
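The heterozygosity definition can be checked directly against the ten genotypes listed above. A minimal sketch; the function names are hypothetical, and `equilibrium_heterozygosity` simply evaluates the infinite-alleles formula for given N and u.

```python
from collections import Counter

def allele_frequencies(genotypes):
    """Allele frequencies from unphased diploid genotypes written as "X/Y"."""
    counts = Counter()
    for g in genotypes:
        counts.update(g.split("/"))
    total = sum(counts.values())
    return {allele: c / total for allele, c in counts.items()}

def expected_heterozygosity(freqs):
    # Prob[two alleles drawn at random, with replacement, differ] = 1 - sum_i p_i^2,
    # which equals 2pq for a biallelic SNP.
    return 1 - sum(p * p for p in freqs.values())

def equilibrium_heterozygosity(n, u):
    # Mutation-drift equilibrium under the infinite-alleles model, theta = 4Nu.
    theta = 4 * n * u
    return theta / (1 + theta)

genotypes = ["G/G", "G/G", "G/T", "G/G", "G/G", "G/G", "G/G", "T/T", "T/G", "T/G"]
freqs = allele_frequencies(genotypes)  # {'G': 0.75, 'T': 0.25}
print(expected_heterozygosity(freqs))  # 0.375
```

The 20 alleles contain 15 G and 5 T, which gives the 0.75/0.25 frequencies used in the slide's calculation.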

Recombinations (per meiosis): at least 1 per chromosome; on average ~1 per 100 Mb

Linkage Disequilibrium: the degree of correlation between alleles at two SNP locations

[Figure: Mom's and Dad's chromosomes]

Human Genome Variation

• SNP: TGCTGAGA vs. TGCCGAGA
• Novel sequence: TGCTCGGAGA vs. TGC - - - GAGA
• Inversion
• Mobile element or pseudogene insertion
• Translocation
• Tandem duplication
• Microdeletion: TGC - - AGA vs. TGCCGAGA
• Transposition
• Large deletion
• Novel sequence at breakpoint

The Fall in Heterozygosity

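$F_{ST} = \frac{H - H_{POP}}{H}$, where $H$ is the heterozygosity of the pooled population and $H_{POP}$ is the average heterozygosity within subpopulations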
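The F_ST formula measures how much heterozygosity falls when the pooled population is split into subpopulations. A minimal sketch for one biallelic SNP, assuming equally sized subpopulations and Hardy-Weinberg heterozygosity 2p(1 - p) within each; the function name and example frequencies are hypothetical.

```python
def fst(pop_freqs):
    """F_ST = (H - H_POP) / H for one biallelic SNP.

    pop_freqs: frequency of one allele in each subpopulation; subpopulations
    are weighted equally (a simplifying assumption)."""
    k = len(pop_freqs)
    # H_POP: average heterozygosity within subpopulations
    h_pop = sum(2 * p * (1 - p) for p in pop_freqs) / k
    # H: heterozygosity of the pooled population, from the mean allele frequency
    p_bar = sum(pop_freqs) / k
    h_total = 2 * p_bar * (1 - p_bar)
    if h_total == 0:
        return 0.0  # monomorphic SNP: no heterozygosity to lose
    return (h_total - h_pop) / h_total

print(fst([0.5, 0.5]))  # 0.0: identical subpopulations
print(fst([0.0, 1.0]))  # 1.0: subpopulations fixed for different alleles
print(fst([0.2, 0.8]))  # ≈ 0.36
```

Genome-wide estimates average this over many SNPs, usually with sample-size corrections that this sketch omits.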

The HapMap Project

Population (sample size):
ASW – African ancestry in Southwest USA: 90
CEU – Northern and Western Europeans (Utah): 180
CHB – Han Chinese in Beijing, China: 90
CHD – Chinese in Metropolitan Denver: 100
GIH – Gujarati Indians in Houston, Texas: 100
JPT – Japanese in Tokyo, Japan: 91
LWK – Luhya in Webuye, Kenya: 100
MXL – Mexican ancestry in Los Angeles: 90
MKK – Maasai in Kinyawa, Kenya: 180
TSI – Toscani in Italia: 100
YRI – Yoruba in Ibadan, Nigeria: 100

Genotyping: probe a limited number (~1M) of known, highly variable positions in the human genome

Linkage Disequilibrium & Haplotype Blocks

Linkage Disequilibrium (LD), for the minor alleles A (first SNP) and G (second SNP), with frequencies $p_A$ and $p_G$:

$D = P(A \text{ and } G) - p_A p_G$
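The LD coefficient D can be computed from phased two-SNP haplotypes, together with its two common normalizations, D' (D divided by its maximum possible magnitude) and r². A minimal sketch; the haplotype data and function name are hypothetical, and D' is returned with its sign.

```python
def ld_stats(haplotypes, allele1, allele2):
    """LD between two SNPs from phased two-SNP haplotypes such as "AG".

    allele1 / allele2: the alleles of interest at SNP 1 / SNP 2 (e.g. the minor alleles).
    Returns (D, signed D', r^2)."""
    n = len(haplotypes)
    p1 = sum(h[0] == allele1 for h in haplotypes) / n
    p2 = sum(h[1] == allele2 for h in haplotypes) / n
    p12 = sum(h[0] == allele1 and h[1] == allele2 for h in haplotypes) / n
    d = p12 - p1 * p2
    # D' rescales D by its maximum possible magnitude given the allele frequencies.
    if d >= 0:
        d_max = min(p1 * (1 - p2), (1 - p1) * p2)
    else:
        d_max = min(p1 * p2, (1 - p1) * (1 - p2))
    d_prime = d / d_max if d_max > 0 else 0.0
    denom = p1 * (1 - p1) * p2 * (1 - p2)
    r2 = d * d / denom if denom > 0 else 0.0
    return d, d_prime, r2

# Hypothetical haplotypes in which A (SNP 1) always travels with G (SNP 2).
haps = ["AG"] * 3 + ["TC"] * 7
print(ld_stats(haps, "A", "G"))  # D ≈ 0.21, D' ≈ 1, r^2 ≈ 1
```

Runs of SNPs with high pairwise D' or r² form the haplotype blocks that let ~1M genotyped SNPs stand in for many untyped ones.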

Population Sequencing – 1000 Genomes Project

The 1000 Genomes Project Consortium et al. Nature 467, 1061-1073 (2010) doi:10.1038/nature09534

Association Studies

Control: A/G A/G G/G G/G A/G G/G G/G

Disease: A/A A/G A/A A/G A/G A/A A/A

Genotype  Control  Disease
AA        0        4
AG        3        3
GG        4        0

p-value
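The p-value for the genotype counts above could be computed in several ways, and the slides do not say which test is used. A minimal sketch using an allelic 2x2 chi-square test (one common choice, assumed here), which compares allele counts in cases and controls; the function names are hypothetical.

```python
import math

def allele_counts(genotypes, allele):
    """Return (copies of `allele`, copies of any other allele) in "X/Y" genotypes."""
    hits = sum(g.split("/").count(allele) for g in genotypes)
    return hits, 2 * len(genotypes) - hits

def allelic_chi_square(cases, controls, allele):
    """2x2 allelic association test (cases vs. controls, allele vs. other).
    Returns (chi-square statistic, p-value with 1 degree of freedom)."""
    table = [allele_counts(cases, allele), allele_counts(controls, allele)]
    n = sum(sum(row) for row in table)
    row_tot = [sum(row) for row in table]
    col_tot = [table[0][j] + table[1][j] for j in range(2)]
    chi2 = 0.0
    for i in range(2):
        for j in range(2):
            expected = row_tot[i] * col_tot[j] / n
            chi2 += (table[i][j] - expected) ** 2 / expected
    # Survival function of chi-square with 1 df: P(Z^2 > x) = erfc(sqrt(x / 2)).
    return chi2, math.erfc(math.sqrt(chi2 / 2))

control = ["A/G", "A/G", "G/G", "G/G", "A/G", "G/G", "G/G"]
disease = ["A/A", "A/G", "A/A", "A/G", "A/G", "A/A", "A/A"]
chi2, p = allelic_chi_square(disease, control, "A")
print(chi2, p)  # chi2 = 64/7 ≈ 9.14, p < 0.01
```

Here the disease group carries 11 A and 3 G alleles against 3 A and 11 G in controls. The chi-square approximation assumes reasonably large expected counts; with very small samples an exact test is safer.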

Wellcome Trust Case Control Consortium (WTCCC)

Nature 447, 661-678 (7 June 2007); Nature 464, 713-720 (1 April 2010)

Many associations of small effect size (odds ratios < 1.5)

Disease Clustering

Disease Genotyping

Illumina chip, 15K non-synonymous SNPs:
• Multiple Sclerosis (MS)
• Ankylosing Spondylitis (AS)
• Autoimmune Thyroid Disease (ATD)
• Breast Cancer (BC)

Affy 500K array:
• Rheumatoid Arthritis (RA)
• Bipolar Disorder (BD)
• Crohn's Disease (CD)
• Coronary Artery Disease (CAD)
• Hypertension (HT)
• Type 1 Diabetes (T1D)
• Type 2 Diabetes (T2D)

• Compute disease-disease correlations
• Randomization to determine significance
• Use results as a distance metric for clustering diseases

PLoS Genet 5(12): e1000792. doi:10.1371/journal.pgen.1000792. 2009. 

Disease Clustering

• RA vs. ATD
• RA vs. MS
– No recorded co-occurrence of RA and MS

Genetic Variation Score (GVS) per SNP allele and disease:

SNP - Allele  Gene Symbol  RA (NARAC)  RA  AS  T1D  ATD  MS (IMSGC)  MS

rs11752919 - C ZSCAN23 -3.48 -3.21 -9.39 1.10 0.70 3.25 2.99

rs3130981 - A CDSN -0.46 -1.00 -9.47 -4.94 0.33 10.00 13.41

rs151719 - G HLA-DMB -6.71 -4.77 -1.08 -13.63 0.34 8.58 17.76

rs10484565 - T TAP2 25.52 8.37 1.34 15.74 -1.36 -0.56 -0.30

rs1264303 - G VARS2 11.51 7.36 18.76 0.89 -1.76 -1.85 -1.75

rs1265048 - C CDSN 6.59 2.97 50.13 6.34 -0.85 -2.39 -4.16

rs2071286 - A NOTCH4 5.30 0.78 6.42 4.04 -0.03 -1.89 -2.45

rs2076530 - G BTNL2 67.49 56.46 14.06 13.58 -6.41 -9.50 -18.52

rs757262 - T TRIM40 14.58 9.11 6.27 1.56 -0.79 -2.05 -7.34
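The clustering steps (compute a disease-disease correlation, use randomization to judge significance, then turn the correlation into a distance) can be sketched on the GVS values in the table above. This is a minimal illustration, not necessarily the cited paper's exact procedure: it assumes Pearson correlation, a label-shuffling permutation test and the distance 1 - r, and it uses only the nine SNPs shown here rather than a genome-wide set.

```python
import math
import random

# GVS values copied from two columns of the table above (first column: RA,
# last column: MS), one entry per SNP row in table order.
ra = [-3.48, -0.46, -6.71, 25.52, 11.51, 6.59, 5.30, 67.49, 14.58]
ms = [2.99, 13.41, 17.76, -0.30, -1.75, -4.16, -2.45, -18.52, -7.34]

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def permutation_test(x, y, n_perm, rng):
    """Two-sided permutation p-value for the correlation of x and y:
    shuffle y to break any x-y pairing and count correlations at least as extreme."""
    observed = pearson(x, y)
    shuffled = list(y)
    extreme = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if abs(pearson(x, shuffled)) >= abs(observed):
            extreme += 1
    # The +1 terms count the observed labelling itself, so p is never exactly 0.
    return observed, (extreme + 1) / (n_perm + 1)

r, p = permutation_test(ra, ms, 10_000, random.Random(0))
distance = 1 - r  # 0 for identical profiles, 2 for perfectly opposite ones
print(r, p, distance)
```

On these rows the RA and MS scores are strongly anti-correlated (alleles that raise RA risk tend to lower MS risk), so the two diseases end up far apart under this distance.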

Heritability & Environment

Bienvenu OJ, Davydow DS, & Kendler KS (2011). Psychological Medicine, 41(1), 33-40. PMID:

Ancestry Inference

[Figure: an individual of unknown ancestry (?) and reference populations Danish, French, Spanish, Mexican]

Global Ancestry Inference

Nature. 2008 November 6; 456(7218): 98–101.
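Global ancestry inference asks which reference population best explains an individual's genotypes. A minimal sketch of one simple approach, a likelihood-based assignment that is not necessarily the method of the cited study. It assumes Hardy-Weinberg proportions and independent SNPs; the population labels and allele frequencies below are hypothetical, made-up values.

```python
import math

def genotype_log_likelihood(genotype, freqs):
    """Log-likelihood of one individual's genotypes under a reference population.

    genotype: copies (0, 1 or 2) of the reference allele at each SNP.
    freqs:    frequency of that allele at each SNP in the population.
    Assumes Hardy-Weinberg proportions and independent SNPs (no LD)."""
    ll = 0.0
    for g, p in zip(genotype, freqs):
        p = min(max(p, 1e-6), 1 - 1e-6)  # keep log() finite for fixed alleles
        if g == 2:
            ll += 2 * math.log(p)
        elif g == 1:
            ll += math.log(2 * p * (1 - p))
        else:
            ll += 2 * math.log(1 - p)
    return ll

def assign_population(genotype, reference):
    """Return the reference population with the highest likelihood, plus all scores."""
    scores = {pop: genotype_log_likelihood(genotype, f) for pop, f in reference.items()}
    return max(scores, key=scores.get), scores

# Hypothetical reference panel: made-up allele frequencies at four SNPs.
reference = {
    "POP_1": [0.9, 0.1, 0.8, 0.2],
    "POP_2": [0.1, 0.9, 0.2, 0.8],
}
best, scores = assign_population([2, 0, 2, 1], reference)
print(best)  # POP_1
```

With many SNPs, even small frequency differences between closely related populations add up in the log-likelihood, which is what makes fine-scale assignment possible.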

Ancestry Painting

[Figure: an individual of unknown ancestry (?) painted by segment with reference populations Danish, French, Spanish, Mexican]

Ancestry Painting – Haplotype-based

HAPAA, HAPMIX

HAPAA: Genome Res. 2008. 18: 676-682
HAPMIX: PLoS Genet 5(6): e1000519, 2009
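Ancestry painting is naturally framed as a hidden Markov model: the hidden state at each SNP is the ancestral population, and ancestry switches occasionally along the chromosome. The sketch below is a deliberately simplified frequency-based HMM decoded with Viterbi. It is not HAPAA or HAPMIX, which model reference haplotypes and recombination distances. It assumes independent SNPs, a constant per-interval switch probability and at least two populations; all names and numbers are hypothetical.

```python
import math

def paint_ancestry(alleles, freqs, switch_prob):
    """Viterbi path of ancestry labels along one haploid chromosome.

    alleles:     0/1 allele at each SNP, in chromosome order.
    freqs:       {population: [frequency of allele 1 at each SNP]}, >= 2 populations.
    switch_prob: per-interval probability of switching ancestry (a stand-in
                 for recombination since admixture); assumed > 0 and < 1.
    """
    pops = list(freqs)
    stay = math.log(1 - switch_prob)
    move = math.log(switch_prob / (len(pops) - 1))

    def emit(pop, i):
        p = min(max(freqs[pop][i], 1e-6), 1 - 1e-6)
        return math.log(p if alleles[i] == 1 else 1 - p)

    def trans(a, b):
        return stay if a == b else move

    score = {pop: -math.log(len(pops)) + emit(pop, 0) for pop in pops}
    backptrs = []
    for i in range(1, len(alleles)):
        new_score, ptr = {}, {}
        for pop in pops:
            # Best previous state for reaching `pop` at SNP i.
            prev = max(pops, key=lambda q: score[q] + trans(q, pop))
            new_score[pop] = score[prev] + trans(prev, pop) + emit(pop, i)
            ptr[pop] = prev
        score = new_score
        backptrs.append(ptr)
    # Trace back from the best final state.
    state = max(pops, key=score.get)
    path = [state]
    for ptr in reversed(backptrs):
        state = ptr[state]
        path.append(state)
    return path[::-1]

freqs = {"POP_1": [0.9] * 10, "POP_2": [0.1] * 10}  # hypothetical frequencies
print(paint_ancestry([1] * 5 + [0] * 5, freqs, switch_prob=0.01))
```

A small `switch_prob` favors long ancestry segments, so a single switch is inferred only where the allele evidence clearly changes.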

Fixation, Positive & Negative Selection

[Figure panels: Neutral Drift, Positive Selection, Negative Selection]

How can we detect negative selection?

How can we detect positive selection?

Conservation and Human SNPs

CNSs (conserved noncoding sequences) have fewer SNPs

SNPs in CNSs have shifted allele frequency spectra

[Figure legend: Neutral, CNS]

How can we detect positive selection?

Ka/Ks ratio: ratio of the nonsynonymous to the synonymous substitution rate

Detects very old, persistent, strong positive selection on a protein that keeps adapting

Examples: immune response, spermatogenesis
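Once nonsynonymous and synonymous sites and differences have been counted between two aligned coding sequences, Ka/Ks is a ratio of multiple-hit-corrected divergences. A minimal sketch of that final step, assuming the site and difference counts are already available (for example from a Nei-Gojobori-style codon count, not implemented here) and using the Jukes-Cantor correction; the counts below are hypothetical.

```python
import math

def jukes_cantor(p):
    """Correct an observed proportion of differing sites p for multiple hits."""
    if p >= 0.75:
        raise ValueError("too divergent for the Jukes-Cantor correction")
    return -0.75 * math.log(1 - 4 * p / 3)

def ka_ks(nonsyn_sites, syn_sites, nonsyn_diffs, syn_diffs):
    """Ka/Ks from counts of nonsynonymous/synonymous sites and observed differences
    (the counts are taken as given; computing them from codons is out of scope)."""
    if syn_diffs == 0:
        raise ValueError("Ks is zero; the ratio is undefined")
    ka = jukes_cantor(nonsyn_diffs / nonsyn_sites)
    ks = jukes_cantor(syn_diffs / syn_sites)
    return ka / ks

# Hypothetical counts for a pair of aligned coding sequences.
print(ka_ks(nonsyn_sites=750, syn_sites=250, nonsyn_diffs=30, syn_diffs=5))
```

A ratio above 1 suggests recurrent positive selection, a ratio well below 1 suggests purifying (negative) selection, and a ratio near 1 is consistent with neutrality. Because the ratio averages over the whole protein and a long evolutionary time, it mainly picks up old, persistent selection.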

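Long Haplotypes – iHS test

Less time (an allele under recent positive selection rises to high frequency quickly, so its haplotype is young):
• Fewer mutations
• Fewer recombinations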
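The long-haplotype idea can be quantified with extended haplotype homozygosity (EHH): among chromosomes carrying a given core allele, the probability that two of them are identical from the core out to a given SNP. iHS compares the area under the EHH curve (iHH) for the ancestral and derived alleles. A minimal, unstandardized sketch: it assumes unit spacing between SNPs (real analyses use genetic-map distances), integrates to the ends of the region instead of stopping at an EHH cutoff, and omits the frequency-bin standardization. The haplotypes are hypothetical.

```python
import math
from collections import Counter

def ehh(haplotypes, core, allele, end):
    """Prob. that two carriers of `allele` at `core` (drawn without replacement)
    are identical over the whole span between `core` and `end`."""
    carriers = [h for h in haplotypes if h[core] == allele]
    n = len(carriers)
    if n < 2:
        return 0.0
    lo, hi = sorted((core, end))
    groups = Counter(h[lo:hi + 1] for h in carriers)
    return sum(c * (c - 1) for c in groups.values()) / (n * (n - 1))

def ihh(haplotypes, core, allele):
    # Trapezoid-rule area under the EHH curve in both directions (unit SNP spacing).
    total = 0.0
    for step in (1, -1):
        prev = 1.0  # EHH at the core itself
        pos = core + step
        while 0 <= pos < len(haplotypes[0]):
            cur = ehh(haplotypes, core, allele, pos)
            total += (prev + cur) / 2
            prev = cur
            pos += step
    return total

def unstandardized_ihs(haplotypes, core, ancestral, derived):
    # Negative values: the derived allele sits on unusually long haplotypes.
    return math.log(ihh(haplotypes, core, ancestral) / ihh(haplotypes, core, derived))

# Hypothetical data: derived allele "1" at SNP 2 on one long shared haplotype.
haps = ["00100", "00100", "00100", "01011", "10001", "11010"]
print(unstandardized_ihs(haps, core=2, ancestral="0", derived="1"))
```

In the toy data the derived haplotypes stay identical across the whole region while the ancestral ones break up within one SNP, so the score is negative, the signature of a recent sweep.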