Methods in genome wide association studies. Norú Moreno CS374:: Algorithms in Biology Professor:...


Page 1: Methods in genome wide association studies. Norú Moreno CS374:: Algorithms in Biology Professor: Serafim Batzoglou.

Methods in genome wide association studies.
Norú Moreno

CS374:: Algorithms in Biology
Professor: Serafim Batzoglou

Page 2:

Agenda
• GWA
• Polymorphisms
• HapMap Project
• Genotyping chip
• Integrating CNVs and SNPs
• Imputation
• Resolving Individuals Contributing Trace Amounts of DNA to Highly Complex Mixtures Using High-Density SNP Genotyping Microarrays

Page 3:

Genome-wide Association Study (GWA study or GWAS)
• Completion of the Human Genome Project in 2003
• Examination of genetic variation across a given genome
• Objective: identify genetic associations with observable traits

Page 4:

GWAS
• Scan SNPs across many individuals to associate alleles with a particular disease
• Use a detected association to detect, treat and prevent the disease
• Pharmacogenomics
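To make the idea of associating alleles with a disease concrete, here is a minimal sketch of one common approach: a Pearson chi-square test on a 2x2 table of allele counts in cases versus controls for a single SNP. The function name and the counts are illustrative assumptions, not taken from the slides; a real study would repeat this over every SNP and correct for multiple testing.

```python
def allelic_chi_square(case_a, case_b, control_a, control_b):
    """Pearson chi-square statistic (1 degree of freedom) for a 2x2 allele-count table."""
    table = [[case_a, case_b], [control_a, control_b]]
    total = case_a + case_b + control_a + control_b
    row_sums = [sum(row) for row in table]
    col_sums = [case_a + control_a, case_b + control_b]
    stat = 0.0
    for r in range(2):
        for c in range(2):
            # Expected count if allele and disease status were independent.
            expected = row_sums[r] * col_sums[c] / total
            stat += (table[r][c] - expected) ** 2 / expected
    return stat


# Hypothetical allele counts for one SNP (not real data).
chi2 = allelic_chi_square(case_a=60, case_b=40, control_a=40, control_b=60)
print(chi2)
```

A larger statistic means the allele frequencies differ more between cases and controls than chance alone would suggest. The sketch assumes both alleles are observed at least once, otherwise an expected count is zero.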

Page 5:

Polymorphisms
• A specific sequence variation that some individuals possess
• Some variations are common, others are rare
• Examples:
  ◦ Blood types
  ◦ Height
  ◦ Skin color
  ◦ Etc.

Page 6:

Types of polymorphisms
1. Copy Number Variation (CNV)
• Segments of DNA that are found in different numbers of copies among individuals
• Substantial regions, not single nucleotides

[Figure: one region in three individuals: A B C; A C (segment B missing); A B B B C (segment B in three copies)]

Page 7:

Types of polymorphisms
2. Single Nucleotide Polymorphism (SNP)

(Murray 2007)

Page 8:

HapMap
• Two unrelated people share about 99.5% of their DNA sequence.
• HapMap focuses only on common SNPs, those found in at least 1% of the population.
• 269 individuals, ~4M SNPs
• Genotyped the individuals for these SNPs and published the results

Page 9:

Genotyping chip

[Figure: the sequence ACTGGGCTAATCGATCGACTAGCTAGCTAGTCTCGATCAAT broken into short fragments (AC, TG, GG, CTA, ...), labeled "Probes"]

Page 10:

Genotyping chip

(Liu 2007) (Affymetrix)

Page 11:

Genotyping chip

(Affymetrix)

Page 12:

Genotyping chip

[Figure: allele A intensity versus allele B intensity, with three genotype clusters: BB (0), AB (0.5), AA (1)]

Page 13:

Genotyping chip
• Affymetrix 100k chip set
  ◦ Entire genome with 100 000 SNPs (low density)
• Affymetrix 500k chip (SNP array 5.0)
  ◦ Entire genome with 500 000 SNPs (high density)
• Affymetrix 1M chip (SNP array 6.0)
  ◦ Entire genome with 1 000 000 SNPs (very high density)

Page 14:

Integrated genotype calling and association analysis of SNPs, common copy number polymorphisms and rare CNVs (Birdsuite)
Korn, et al.

Page 15:

Birdsuite
• Takes both CNVs and SNPs into account
• Input: raw data from the genotyping chip
• Output: integrated CNV and SNP genotype per locus
• CNVs and SNPs coexist.
• Both common and rare variants are needed to understand the role of genetic variation in disease.

Page 16:

Birdsuite

[Figure: SNPs (AA, AB, BB) combined with CNPs give a new genotype, e.g. A-null, AAAB, BBBB]

Page 17:

Birdsuite - 4 Stages
• Canary - ‘Genotypes’ common copy-number polymorphisms (CNPs)
• Birdseed - Genotypes SNPs using the classical AA, AB, and BB genotypes
• Birdseye - Identifies rare CNVs via HMMs
• Fawkes - Integrates CNV information to produce mutually consistent SNP genotypes (i.e. including genotypes such as A-null and AAB)

Page 18:

Birdsuite - Canary
• Determines the copy number of each individual at each predefined CNP locus.
• CNP = copy number polymorphism: a CNV with >1% frequency in the population

Locus   Number of copies
A       1
B       3
C       1

[Figure: the region A B B B C, with segment B in three copies]

Page 19:

Canary

(Korn, p.1255)

Page 20:

Birdsuite - Birdseed
• We expect only AA, AB or BB.
• From Canary: only loci with 2 copies, no fewer and no extra.

[Figure: genotype clusters BB, AA, AB (Korn, p.1257)]

Page 21:

Birdsuite - Birdseye
• Using Canary and Birdseed:
  ◦ Identify rare and de novo CNVs
  ◦ Small number of real CNVs at unknown sites
• Search for consistent evidence of copy number variation across multiple neighboring probes.
• Implement an HMM-based algorithm to find strong, consistent evidence for altered copy number states.

Page 22:

Birdsuite - Birdseye
• HMM to find regions of variable copy number in a sample.
• Hidden state: the true copy number of the individual’s genome.
• Observations: the normalized intensity measurements of each probe on the array.
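The HMM view can be sketched with a small Viterbi decoder: hidden states are copy numbers, and each probe's normalized intensity is the observation. Everything numeric here is an assumption made for illustration (Gaussian emissions with mean copy_number / 2, a fixed standard deviation, a fixed probability of staying in the same state); these are not Birdseye's actual parameters.

```python
import math


def viterbi_copy_number(intensities, states=(1, 2, 3), stay=0.98, sd=0.15):
    """Most likely copy-number path for a run of neighboring probes.

    Assumptions (illustrative only): a probe's normalized intensity is Gaussian
    with mean copy_number / 2 and standard deviation `sd`; the copy number
    persists between adjacent probes with probability `stay`.
    """
    n = len(states)
    switch = (1.0 - stay) / (n - 1)

    def log_emit(x, cn):
        mean = cn / 2.0
        return -0.5 * ((x - mean) / sd) ** 2 - math.log(sd * math.sqrt(2 * math.pi))

    def log_trans(i, j):
        return math.log(stay if i == j else switch)

    # Uniform prior over the initial state.
    score = [math.log(1.0 / n) + log_emit(intensities[0], s) for s in states]
    back = []
    for x in intensities[1:]:
        new_score, pointers = [], []
        for j, s in enumerate(states):
            best_i = max(range(n), key=lambda i: score[i] + log_trans(i, j))
            new_score.append(score[best_i] + log_trans(best_i, j) + log_emit(x, s))
            pointers.append(best_i)
        score = new_score
        back.append(pointers)

    # Trace back from the best final state.
    idx = max(range(n), key=lambda i: score[i])
    path = [idx]
    for pointers in reversed(back):
        idx = pointers[idx]
        path.append(idx)
    path.reverse()
    return [states[i] for i in path]


# Hypothetical intensities: four neighboring probes drop to about half signal.
probes = [1.02, 0.97, 1.05, 0.52, 0.48, 0.55, 0.47, 1.01, 0.98]
calls = viterbi_copy_number(probes)
print(calls)
```

With these settings a single low probe is not enough to change state, while several consecutive low probes are, which matches the slide's emphasis on consistent evidence across multiple neighboring probes.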

Page 23:

Birdsuite - Fawkes
• Merge all the results.
• Show the copy number at each SNP.
• Utilize the imputed locations (in A/B intensity space) of copy-variable clusters.
• Assign an allele-specific copy number genotype at each SNP (e.g. AAB, ABBB, A or B).
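As a small illustration of what an allele-specific copy number genotype is, the sketch below lists every A/B combination that is consistent with a given total copy number. The function name is hypothetical, and this only enumerates the candidate genotypes; it does not reproduce how Fawkes chooses among them from the intensity clusters.

```python
def allele_specific_genotypes(copy_number):
    """All A/B allele combinations consistent with a given total copy number."""
    if copy_number == 0:
        return ["null"]  # both copies deleted: no allele to report
    return ["A" * a + "B" * (copy_number - a) for a in range(copy_number, -1, -1)]


for cn in range(5):
    print(cn, allele_specific_genotypes(cn))
```

Copy number 1 yields A or B (written A-null or B-null elsewhere in the slides), copy number 2 yields the classical AA, AB and BB, and copy numbers 3 and 4 include genotypes such as AAB and ABBB.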

Page 24:

Fawkes

(Korn, p. 1254,1257)

Page 25:

(Affymetrix website screenshot)

Page 26:

Imputation
• Dealing with missing data points by filling in values.
• In SNPs: T A G G T ? T G C C T A G C G T
• Why?
  - Cost-saving
  - Avoid re-genotyping
  - Keep effective sample size
  - SNP comparisons between existing platforms

Page 27:

Imputation
• High rate of occurrence.
  ◦ ‘Direct’ imputation.

T A G G T ? T G C C T A G C G T

T A G G T A T G C C T A G C G T
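One way to read 'direct' imputation by rate of occurrence is to fill each missing position with the allele seen most often at that position in a set of reference sequences. The sketch below follows that reading; the function name and the reference panel are assumptions for illustration.

```python
from collections import Counter


def impute_most_frequent(sequence, reference_panel, missing="?"):
    """Fill each missing position with the most common allele at that position in the panel."""
    filled = []
    for pos, allele in enumerate(sequence):
        if allele == missing:
            counts = Counter(ref[pos] for ref in reference_panel if ref[pos] != missing)
            # Leave the gap in place if the panel has no information either.
            allele = counts.most_common(1)[0][0] if counts else missing
        filled.append(allele)
    return "".join(filled)


# Hypothetical reference sequences (not real data).
panel = ["TAGGTATG", "TAGGTATG", "TAGGTCTG"]
result = impute_most_frequent("TAGGT?TG", panel)
print(result)
```

This ignores correlation between neighboring SNPs, so it only works well when one allele is much more frequent than the others.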

Page 28:

Imputation
• Linkage disequilibrium (LD)
  ◦ Non-random association of alleles at two or more loci.

[Figure: region of LD around the SNP of interest]
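Non-random association between two loci is usually quantified with the coefficient D and the squared correlation r². The sketch below computes both from phased two-locus haplotypes. The function name and the input encoding (one two-character string per haplotype) are assumptions for illustration.

```python
def ld_r_squared(haplotypes, allele_1, allele_2):
    """Return (D, r^2) between allele_1 at locus 1 and allele_2 at locus 2."""
    n = len(haplotypes)
    p1 = sum(h[0] == allele_1 for h in haplotypes) / n
    p2 = sum(h[1] == allele_2 for h in haplotypes) / n
    p12 = sum(h[0] == allele_1 and h[1] == allele_2 for h in haplotypes) / n
    d = p12 - p1 * p2  # zero when the two loci are independent
    denom = p1 * (1 - p1) * p2 * (1 - p2)
    r2 = d * d / denom if denom > 0 else float("nan")
    return d, r2


# Hypothetical haplotypes: A always travels with G, C with T (perfect LD).
d_linked, r2_linked = ld_r_squared(["AG", "AG", "CT", "CT"], "A", "G")
# Hypothetical haplotypes with every combination equally common (no LD).
d_free, r2_free = ld_r_squared(["AG", "AT", "CG", "CT"], "A", "G")
print(r2_linked, r2_free)
```

When r² is close to 1, a genotyped SNP predicts the untyped SNP of interest almost perfectly, which is what makes LD-based imputation possible.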

Page 29:

Resolving Individuals Contributing Trace Amounts of DNA to Highly Complex Mixtures Using High-Density SNP Genotyping Microarrays
Homer, et al.

Page 30:

The DNA Detective

Is an individual genome present in a DNA mixture?

[Figure labels: Query, Mixed DNA, Population]

Page 31:

DNA Detective

We have:
• Different laboratories reach different conclusions.
• Usually not accurate at all.
• Hard, and cannot be automated.

Page 32:

DNA Detective - Methodology
Summary:
• Cumulative sum of allele shifts over all available SNPs.
• The sign of each shift indicates whether the individual of interest is closer to a reference sample or closer to a given mixture.
• First genotype a single SNP for a single person, then adapt the approach to all mixtures and pooled data.

Page 33:

DNA Detective - Single SNP, Single person
• Raw preprocessed data give the allele intensity (how much of A and how much of B we have).

1. Transform normalized data into a ratio.

Yi is the estimate of allele frequency

Genotype   BB   AB     AA
Yi         ~0   ~0.5   ~1
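The ratio step can be sketched as follows, assuming the estimate is the A intensity divided by the total intensity, so that BB lands near 0, AB near 0.5 and AA near 1 as in the table above. The optional factor `k` is an assumed per-SNP correction for unequal probe response (k = 1 gives the plain ratio), and both function names are hypothetical.

```python
def allele_ratio(a_intensity, b_intensity, k=1.0):
    """Estimate of the A-allele frequency from normalized probe intensities."""
    total = a_intensity + k * b_intensity
    if total <= 0:
        raise ValueError("intensities must sum to a positive value")
    return a_intensity / total


def nearest_genotype(y):
    """Pick the genotype whose expected ratio (BB=0, AB=0.5, AA=1) is closest to y."""
    expected = {"BB": 0.0, "AB": 0.5, "AA": 1.0}
    return min(expected, key=lambda g: abs(y - expected[g]))


# Hypothetical intensity pairs (A, B) for three people at one SNP.
for a, b in [(900, 100), (480, 520), (50, 950)]:
    y = allele_ratio(a, b)
    print(round(y, 2), nearest_genotype(y))
```

For a single person the ratio clusters around three values; for a mixture or pooled sample the same ratio becomes a continuous estimate of the allele frequency in the pool.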

Page 34:

DNA Detective - Methodology
• Use relative probe intensity data.
• Compare allele frequency estimates from the mixture (M) with those of a reference population (Pop).
• Assume the reference population (Pop) and the mixture have similar, interchangeable ancestral components.

Page 35:

DNA Detective - Methodology

Distance measure for individual Yi

Page 36:

DNA Detective - Methodology
• Null hypothesis: the individual is not in the mixture, D(Yi,j) ~ 0
• Alternative hypothesis: D(Yi,j) > 0
  ◦ Yi,j is more similar to M than to Pop
• D(Yi,j) < 0: Yi,j is more ancestrally similar to Pop than to M.
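A minimal sketch of the test follows. It assumes the per-SNP distance has the form D(Yi,j) = |Yi,j - Popj| - |Yi,j - Mj|, which agrees with the sign convention above (positive when the individual is closer to M than to Pop), and that the distances are summarized over all SNPs with a one-sample t-statistic against a mean of 0. The function names and the numbers are illustrative, not data from Homer et al.

```python
import math


def distance(y, pop, mix):
    """Per-SNP shift: positive if y is closer to the mixture than to the reference."""
    return abs(y - pop) - abs(y - mix)


def t_statistic(y_values, pop_freqs, mix_freqs):
    """One-sample t-statistic of the per-SNP distances against a null mean of 0."""
    d = [distance(y, p, m) for y, p, m in zip(y_values, pop_freqs, mix_freqs)]
    s = len(d)
    mean = sum(d) / s
    var = sum((x - mean) ** 2 for x in d) / (s - 1)  # sample variance
    return mean / math.sqrt(var / s)


# Hypothetical values for five SNPs (a real analysis uses hundreds of thousands).
y = [1.0, 0.5, 0.0, 1.0, 0.5]          # individual's allele frequency estimates
pop = [0.5, 0.6, 0.5, 0.5, 0.4]        # reference population
mix = [0.7, 0.5, 0.3, 0.6, 0.55]       # mixture
t = t_statistic(y, pop, mix)
print(t)
```

A clearly positive statistic favors the alternative hypothesis that the individual contributed to the mixture; a value near zero is consistent with the null. The sketch assumes at least two SNPs and distances that are not all identical, otherwise the variance is zero.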

Page 37:

(Homer, p.4)

Page 38:

DNA Detective - Results
• Accurate findings.
• Determined whether a trace amount (<1%) of DNA is present in a DNA mixture.
• Tested with different kinds of mixtures from publicly available data.

Page 39:

DNA Detective - Implications
• Forensics application.
• Traceability.
• Leak of private information.
  ◦ Public data from many studies: summary statistics of allele frequency.
• Political implications.
  ◦ How to share the data now?

Page 40:

Thank You!

Page 41:

References
• Korn J, et al. Integrated genotype calling and association analysis of SNPs, common copy number polymorphisms and rare CNVs. Nature Genetics. 2008 Oct;40(10):1253-60.
• Homer N, et al. Resolving individuals contributing trace amounts of DNA to highly complex mixtures using high-density SNP genotyping microarrays. PLoS Genet. 2008 Aug 29;4(8):e1000167.
• Liu Y, DPhil, Prchal F. SNP-Chip-Based Genome-Wide Analysis of Genetic Alterations in Hematologic Disorders: The Way Forward?. The Hematologist. 2007.
• Murray, E. IST 341 Issues in Human Genetics. http://www.science.marshall.edu/murraye/341/snps/Human%20Genetics%20MTHFR%20SNP%20Page.html