Statistical Issues in Human Genetics Jonathan L. Haines Ph.D. Center for Human Genetics Research Vanderbilt University Medical Center

Page 1

Statistical Issues in Human Genetics

Jonathan L. Haines Ph.D.

Center for Human Genetics Research

Vanderbilt University Medical Center

Page 2

COMMON COMPLEX DISEASE

Complex Disease

Environment    Genes

Page 4

What Can The Genes Tell Us?

• Give us a better understanding of the underlying biology of the trait in question

• Serve as direct targets for better treatments
  – Pharmacogenetics
  – Interventions

• Give us better predictions of who might develop disease

• Give us better predictions of the course of the disease

• Lead to knowledge that can help find a cure or prevention

Page 5

• Watson and Crick started it all in 1953 with their description of the structure of DNA

• The 53rd anniversary of the paper will be in April

• Both won the Nobel Prize

Page 6
Page 7

The DNA between individuals is about 99.9% identical. All differences are in the 0.1% of DNA that varies.

ACCGTCCAGG

ACCGTGCAGG

It’s hard to believe sometimes!
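As a small illustration of the slide above, the two ten-base sequences can be compared position by position to locate the single difference. This is only a sketch; the function name `differing_positions` is hypothetical and not part of the talk.

```python
def differing_positions(seq_a, seq_b):
    """Return (1-based position, base in seq_a, base in seq_b) for each mismatch."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must have the same length")
    return [
        (i, a, b)
        for i, (a, b) in enumerate(zip(seq_a, seq_b), start=1)
        if a != b
    ]


# The two sequences shown on the slide.
diffs = differing_positions("ACCGTCCAGG", "ACCGTGCAGG")
print(diffs)  # [(6, 'C', 'G')]
```

The two sequences differ only at position 6 (C versus G), which is the kind of single-base difference the following slides call a SNP.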

Page 8

HUMAN CHROMOSOMES

Page 9

Single-Nucleotide Polymorphisms (SNPs)

One of the most common types of variation

GATCCTGTAGCT

GATCCTCTAGCT

Extremely frequent across the genome (~1/400 bp) -> high resolution

Easy to genotype -> high-throughput techniques

G/C

1st Chromosome

2nd Chromosome

GATCCTGTAGCT GATCCTGTAGCT

GATCCTCTAGCT GATCCTCTAGCT

Normal Affected

< Normal

< Disease

Page 10

What Are We Looking For?

Human Genome    Chromosome    Band      Gene (DNA)

Earth           City          Street    Address

Page 11

640 cubic yards

1/100 cubic inch

3,000 MB

1 x 10^-6 MB

It really is like finding a needle in a haystack! (And a very BIG haystack, at that.)
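The two scales on this slide can be checked with a little arithmetic. The sketch below assumes the figures pair up as a 640 cubic yard haystack against a 1/100 cubic inch needle, and a 3,000 MB genome against a single base (1 x 10^-6 MB); that pairing is an assumption about the slide's layout, not something the slide states.

```python
CUBIC_INCHES_PER_CUBIC_YARD = 36 ** 3  # 1 yard = 36 inches

# Haystack versus needle, both in cubic inches.
haystack_in3 = 640 * CUBIC_INCHES_PER_CUBIC_YARD
needle_in3 = 1 / 100
haystack_ratio = haystack_in3 / needle_in3

# Genome versus a single base, both in megabases.
genome_mb = 3_000
single_base_mb = 1e-6
genome_ratio = genome_mb / single_base_mb

print(f"haystack / needle: {haystack_ratio:.2e}")
print(f"genome / one base: {genome_ratio:.2e}")
```

Under these assumptions both ratios come out near 3 x 10^9, so the haystack analogy is roughly to scale.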

Page 12

The Genome Sequence is not THE answer!

Page 13

Disease Gene Discovery In Complex Disease

1. Define Phenotype
   a. Consistency
   b. Accuracy

2. Define the Genetic Component
   a. Twin Studies
   b. Adoption Studies
   c. Family Studies
   d. Heritability
   e. Segregation Analysis

3. Define Experimental Design

4. Ascertain Families
   a. Case-Control
   b. Singleton
   c. Sib Pairs
   d. Affected Relative Pairs

5. Collect Data
   a. Family Histories
   b. Clinical Results
   c. Risk Factors
   d. DNA Samples

6. Perform Genotype Generation
   a. Genomic Screen
   b. Candidate Gene

7. Analyze Data
   a. Model-dependent: lod score
   b. Model-independent: sib-pair, relative pair
   c. Association studies: case-control, family-based

8. Identify, Test, and Localize Regions of Interest

9. Bioinformatics and Gene Identification

10. Identify Susceptibility Variation(s)

11. Define Interactions
   a. Gene-Gene
   b. Gene-Environment

Page 14

CLASSES OF HUMAN GENETIC DISEASE

• Diseases of Simple Genetic Architecture
  – Can tell how trait is passed in a family: follows a recognizable pattern
  – One gene per family
  – Often called Mendelian disease
  – Usually quite rare in population
  – “Causative” gene

• Diseases of Complex Genetic Architecture
  – No clear pattern of inheritance
  – Moderate to strong evidence of being inherited
  – Common in population: cancer, heart disease, dementia, etc.
  – Involves many genes or genes and environment
  – “Susceptibility” genes

Page 16

Modes of Inheritance

• Autosomal Dominant
  – Huntington disease

• Autosomal Recessive
  – Cystic fibrosis

• X-linked
  – Duchenne muscular dystrophy

• Mitochondrial
  – Leber optic atrophy

• Additive
  – HLA-DR in multiple sclerosis

• Combinations of the above
  – RP (39 loci), nonsyndromic deafness

Page 17

Linkage Analysis

• Traces the segregation of the trait through a family

• Traces the segregation of the chromosomes through a family

• Statistically measures the correlation of the segregation of the trait with the segregation of the chromosome

Page 18

A SAMPLE PEDIGREE

The RED chromosome is key

Page 19

Measures of Linkage: Parametric vs. Non-Parametric

• Two major approaches toward linkage analysis

• Parametric: Defines a genetic model of the action of the trait locus (loci). This allows more complete use of the available data (inheritance patterns and phenotype information).
  – The historical approach toward linkage analysis. Development driven by the need to map simple Mendelian diseases
  – Quite powerful when the model is correctly defined

• Non-Parametric: Uses either a partial genetic model or no genetic model. Relies on estimates of allele/haplotype/region sharing across relatives. Makes far fewer assumptions about the action of the underlying trait locus (loci).

Page 20

Linkage Analysis

• Families
  – Affected sibpairs
  – Affected relative pairs
  – Extended families

• Traits
  – Qualitative (affected or not)
  – Quantitative (ordinal, continuous)

• There are numerous different methods that can be applied

• These methods differ dramatically depending on the types of families and traits

Page 21

Recombination: Nature’s way of making new combinations of genetic variants

A. A diploid cell.
B. DNA replication and pairing of homologous chromosomes to form a bivalent.
C. Chiasmata are formed between the chromatids of homologous chromosomes.
D. Recombination is complete by the end of prophase I.

Page 22

Linkage Analysis in Humans

• Measure the rate of recombination between two or more loci on a chromosome

• Can be done with any loci, but primary application is to find the location of a trait variant by measuring linkage to known marker variants.

Page 23

LOD Score Analysis

The likelihood ratio as defined by Morton (1955):

$$\mathrm{L.R.} = \frac{L(\text{pedigree} \mid \theta = x)}{L(\text{pedigree} \mid \theta = 0.50)}$$

where $\theta$ represents the recombination fraction and where $0 \le x \le 0.49$.

When all meioses are “scorable”, the LR is constructed as:

$$\mathrm{L.R.} = \frac{\theta^{R}\,(1-\theta)^{NR}}{(0.5)^{N}}$$

where $R$ is the number of recombinant meioses, $NR$ the number of non-recombinant meioses, and $N = R + NR$.

The LOD score ($z$) is the $\log_{10}(\mathrm{L.R.})$

$z(\theta)$: the lod score at a particular value of the recombination fraction

$z(\hat{\theta})$: the maximum lod score, which occurs at the MLE of the recombination fraction
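The fully scorable case can be sketched in a few lines of code. The example below is a minimal illustration, assuming R recombinant and NR non-recombinant meioses have already been counted; the function names `lod_score` and `max_lod` and the example counts are hypothetical and not from the talk.

```python
import math


def lod_score(recombinants, nonrecombinants, theta):
    """LOD score z(theta) when every meiosis is scorable.

    L.R. = theta^R * (1 - theta)^NR / 0.5^N, with N = R + NR.
    """
    if not 0.0 <= theta <= 0.5:
        raise ValueError("theta must lie in [0, 0.5]")
    n = recombinants + nonrecombinants
    # Work in log10 space; a zero exponent contributes nothing.
    log_num = 0.0
    if recombinants:
        if theta == 0.0:
            return float("-inf")  # a recombinant is impossible at theta = 0
        log_num += recombinants * math.log10(theta)
    if nonrecombinants:
        log_num += nonrecombinants * math.log10(1.0 - theta)
    return log_num - n * math.log10(0.5)


def max_lod(recombinants, nonrecombinants):
    """Return (theta_hat, z(theta_hat)); the MLE is R / N, capped at 0.5."""
    n = recombinants + nonrecombinants
    if n == 0:
        raise ValueError("need at least one scorable meiosis")
    theta_hat = min(recombinants / n, 0.5)
    return theta_hat, lod_score(recombinants, nonrecombinants, theta_hat)


# Hypothetical data: 2 recombinants among 20 scorable meioses.
theta_hat, z_max = max_lod(2, 18)
print(f"theta_hat = {theta_hat:.2f}, max LOD = {z_max:.2f}")
```

At theta = 0.5 the numerator and denominator coincide, so the LOD score is 0, which is a quick sanity check on any implementation.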

Page 25

Study Designs

• Linkage Analysis
  – Large Families
  – Small Families

• Association Studies
  – Family-Based
  – Case-Control

Page 26

Linkage vs. Association

Linkage: Shared within Families

Association: Shared across Families

Page 27

TESTING CANDIDATE GENES

Disease Normal

5/20 5/20

Gene is not important

Page 28

TESTING CANDIDATE GENES

Disease Normal

10/20 5/20

Gene may be important
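One way to put a number on "may be important" is an exact test on the 2x2 table of counts. The sketch below uses Fisher's exact test, which is a choice made here for illustration and not a method named on the slide; it also assumes that 10/20 and 5/20 mean carriers of a variant out of 20 individuals per group. The function name is hypothetical.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher exact p-value for the 2x2 table [[a, b], [c, d]].

    Sums the hypergeometric probabilities of all tables with the same
    margins that are no more probable than the observed one.
    """
    row1, row2, col1 = a + b, c + d, a + c
    total = row1 + row2
    denom = comb(total, col1)

    def prob(x):
        # Probability that the first row holds x of the col1 carriers.
        return comb(row1, x) * comb(row2, col1 - x) / denom

    p_obs = prob(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    # Small relative tolerance so ties with the observed table are included.
    return sum(prob(x) for x in range(lo, hi + 1) if prob(x) <= p_obs * (1 + 1e-9))


# Carriers / non-carriers among 20 diseased and 20 normal individuals.
p_same = fisher_exact_two_sided(5, 15, 5, 15)    # 5/20 vs 5/20
p_diff = fisher_exact_two_sided(10, 10, 5, 15)   # 10/20 vs 5/20
print(p_same, p_diff)
```

With only 20 individuals per group, the 10/20 versus 5/20 comparison gives a two-sided p-value of roughly 0.19, which fits the cautious wording "may be important".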

Page 29

Two Basic Study Designs for Association Analysis

• Case-Control

• Advantages
  – Power
  – Ascertainment

• Disadvantages
  – Sensitivity to assumptions
  – Matching

• Family-Based
  – Parent-child Trio
  – Discordant sibpairs

• Advantages
  – Use existing samples
  – Robustness to assumptions

• Disadvantages
  – Ascertainment
  – Power

Page 30

METHODS FOR FAMILY-BASED ASSOCIATION STUDIES

• Parent-Child
  – AFBAC
  – TDT
  – HHRR
  – QTDT

• Sibpair
  – S-TDT
  – DAT

• Sibship
  – SDT
  – WSDT
  – FBAT

• Pedigree
  – Transmit
  – PDT
  – FBAT

Page 31

TRANSMISSION DISEQUILIBRIUM TEST (TDT)

• Examines transmission of alleles to affected individuals

• Requires:
  – Linkage (transmission through meioses); and
  – Association (specific alleles)

• Test of linkage if association assumed

• Test of association if linkage assumed

• Test of linkage AND association if neither assumed

• Uses the non-transmitted alleles, effectively, as the control group. Can make a “pseudocontrol” by creating a genotype from the two non-transmitted alleles

• Requires phenotype only for the child

Page 32

TDT calculation

Trio: parents 12 and 12, child 11

                        Transmitted
                        1     2
Non-transmitted   1     A     B
                  2     C     D

$$\mathrm{TDT} = \frac{(B - C)^2}{B + C}$$

With > 5 per cell, this follows a $\chi^2$ distribution with 1 df

Page 33

TDT

Trio: parents 12 and 12, child 11

                        Transmitted
                        1     2
Not transmitted   1     0     0
                  2     2     0

Page 34

TDT

Trio: parents 22 and 12, child 12

                        Transmitted
                        1     2
Not transmitted   1     0     0
                  2     1     1

Page 35

TDT

Trio: parents 22 and 11, child 12

                        Transmitted
                        1     2
Not transmitted   1     1     0
                  2     0     1
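The three small tables above can be reproduced by tallying, for each parent, which allele was transmitted to the child and which was not. The sketch below assumes that the first two genotypes on each slide are the parents and the third is the child, and that the marker has only the two alleles 1 and 2; the function name `tdt_counts` is hypothetical.

```python
def tdt_counts(trios):
    """Tally transmitted vs. non-transmitted alleles over parent-child trios.

    Each trio is (parent1, parent2, child), every genotype a 2-character
    string over the alleles "1" and "2" (for example "12").
    Returns table[non_transmitted][transmitted] as nested dicts.
    """
    table = {"1": {"1": 0, "2": 0}, "2": {"1": 0, "2": 0}}
    for parent1, parent2, child in trios:
        c1, c2 = child
        # Try both ways of assigning the child's alleles to the parents.
        for from_p1, from_p2 in ((c1, c2), (c2, c1)):
            if from_p1 in parent1 and from_p2 in parent2:
                break
        else:
            raise ValueError(f"child {child} is inconsistent with parents")
        for parent, transmitted in ((parent1, from_p1), (parent2, from_p2)):
            # The non-transmitted allele is whatever remains of the genotype.
            remaining = list(parent)
            remaining.remove(transmitted)
            table[remaining[0]][transmitted] += 1
    return table


# The three trios from the slides: (parent, parent, child).
for trio in [("12", "12", "11"), ("22", "12", "12"), ("22", "11", "12")]:
    print(trio, tdt_counts([trio]))
```

Only heterozygous parents add to the off-diagonal cells, which are the only cells the TDT statistic uses; homozygous parents land on the diagonal and are uninformative.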

Page 36

TDT Example

                        Transmitted
                        1     2
Non-transmitted   1     A     B
                  2     C     D

$$\mathrm{TDT} = \frac{(B - C)^2}{B + C}$$

                        Transmitted
                        1     2
Non-transmitted   1     25    42
                  2     25    42

$$\mathrm{TDT} = \frac{(42 - 25)^2}{42 + 25} = 4.31$$
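The worked example can be checked numerically, together with the p-value implied by the 1 df chi-square reference distribution mentioned earlier in the talk. This is a minimal sketch using only the standard library; the function names are hypothetical.

```python
import math


def tdt_statistic(b, c):
    """TDT chi-square statistic (B - C)^2 / (B + C) from the discordant cells."""
    if b + c == 0:
        raise ValueError("no informative transmissions")
    return (b - c) ** 2 / (b + c)


def chi2_1df_pvalue(x):
    """Upper-tail p-value of a chi-square variable with 1 df.

    For 1 df, P(X > x) = erfc(sqrt(x / 2)).
    """
    return math.erfc(math.sqrt(x / 2.0))


stat = tdt_statistic(42, 25)
p_value = chi2_1df_pvalue(stat)
print(f"TDT = {stat:.2f}, p = {p_value:.3f}")
```

The statistic of 4.31 exceeds 3.84, the 5% critical value for 1 df, so the p-value falls just below 0.05.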

Page 38

Analysis of Case-Control Data

• Standard epidemiological approaches can be used

• Qualitative trait
  – Logistic regression

• Quantitative trait
  – Linear regression

• The usual concerns about matching apply, but one must also worry about false positives from population substructure
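A toy calculation shows how population substructure can create a false positive. The counts below are entirely hypothetical: two populations differ in both carrier frequency and disease rate, and the variant has no effect within either one, yet pooling them produces an apparent association.

```python
def odds_ratio(case_carriers, case_total, control_carriers, control_total):
    """Odds ratio of carrying the variant in cases vs. controls."""
    case_non = case_total - case_carriers
    control_non = control_total - control_carriers
    return (case_carriers * control_non) / (case_non * control_carriers)


# Hypothetical counts: (case carriers, cases, control carriers, controls).
# Carrier frequency is 80% in population A and 20% in population B,
# identical in cases and controls within each population.
pop_a = (64, 80, 16, 20)
pop_b = (4, 20, 16, 80)
pooled = tuple(a + b for a, b in zip(pop_a, pop_b))

print("population A:", odds_ratio(*pop_a))
print("population B:", odds_ratio(*pop_b))
print("pooled:      ", odds_ratio(*pooled))
```

Within each population the odds ratio is exactly 1, while the pooled odds ratio is about 4.5, purely because cases are drawn mostly from the population where the variant is common.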

Page 39

Incorporating Genetics into Your Studies

• Obtain appropriate IRB approval
  – DNA studies are quite common
  – Template language exists for IRB approval and consent forms
  – Genetic Studies Ascertainment Core (GSAC) can help
  – Kelly Taylor: [email protected]

• Collect family history information

• Obtain DNA sample
  – Venipuncture
  – Buccal wash/swab
  – Finger stick

• Extract/Store DNA
  – DNA Resources Core can help
  – Cara Sutcliffe: [email protected]

• http://chgr.mc.vanderbilt.edu/

Page 40

What Can The Genes Tell Us?

• Give us a better understanding of the underlying biology of the trait in question

• Serve as direct targets for better treatments
  – Pharmacogenetics
  – Interventions

• Give us better predictions of who might develop disease

• Give us better predictions of the course of the disease

• Lead to knowledge that can help find a cure or prevention