The Complexities of Data Analysis in Human Genetics
Marylyn DeRiggi Ritchie, Ph.D.
Center for Human Genetics Research
Vanderbilt University
Nashville, TN
Biology is complex
BioCarta
Single nucleotide polymorphisms (SNPs)
Mendelian Traits
[Figure: pedigree with genotypes at Locus 1 (A/a) and Locus 2 (B/b)]
Locus 1 x Locus 2 genotypes (three marked "affected" in the figure):
AABB AABb AAbb
AaBB AaBb Aabb
aaBB aaBb aabb
Complex Traits
[Figure: pedigree with genotypes at Locus 1 (A/a) and Locus 2 (B/b)]
Locus 1 x Locus 2 genotypes (two marked "affected" in the figure):
AABB AABb AAbb
AaBB AaBb Aabb
aaBB aaBb aabb
Complex Traits
• Complex trait implies the involvement of multiple genes and/or environmental factors
• Mendelian trait implies a single mutation
• Mendelian traits are generally rare
• Complex traits are common and of substantial public health impact
Genetic Analysis
• Two main areas of genetic analysis
1. Linkage analysis
2. Association analysis
• Methods have been developed for each approach for a variety of different study designs
Association Analysis
• In disease studies, when the disease gene is unknown, we look for association between genetic markers and the disease
• If a marker occurs more frequently or less frequently in affected individuals than in unaffected individuals, then it is associated with the disease.
Association Analysis
• Case-control studies
– Test for association between marker alleles and the disease phenotype in a group of affected and unaffected individuals sampled randomly from the population
• Family-based studies
– Test for association between marker alleles and the disease phenotype in a group of affected individuals and their unaffected family members
Case-control data structure
Status SNP1 SNP2 SNP3 SNP4 SNP5 SNP6 SNP7 SNP8 SNP9 SNP10
1 1 2 2 1 2 1 2 2 1 2
1 0 0 0 1 0 0 0 0 1 0
1 0 2 0 1 1 0 2 0 1 1
1 2 0 1 1 0 2 0 1 1 0
1 2 1 1 0 0 2 1 1 0 0
1 1 0 0 0 0 1 0 0 0 0
1 1 1 0 1 2 1 1 0 1 2
1 1 0 1 0 2 1 0 1 0 2
1 0 0 0 2 0 0 0 0 2 0
1 0 0 1 0 1 0 0 1 0 1
0 2 1 0 1 0 2 1 0 1 0
0 0 1 1 0 0 0 1 1 0 0
0 1 1 0 2 1 1 1 0 2 1
0 0 0 2 0 1 0 0 2 0 1
0 2 1 0 1 1 2 1 0 1 1
0 0 0 2 0 0 0 0 2 0 0
0 1 0 0 1 2 1 0 0 1 2
0 0 1 1 1 2 0 1 1 1 2
0 1 1 0 0 2 1 1 0 0 2
0 0 1 2 0 0 0 1 2 0 0
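As a rough illustration of how a table like this can be summarized, the sketch below tabulates genotype counts by status for one SNP. It assumes, since the slide does not say so, that Status 1 marks cases and 0 marks controls, and that the genotype codes 0/1/2 count copies of one allele. The names `DATA`, `parse`, and `genotype_counts` are made up for this example.

```python
# Rows copied from the slide: Status followed by SNP1..SNP10.
# Assumption (not stated on the slide): Status 1 = case, 0 = control,
# and genotype codes 0/1/2 count copies of one allele.
DATA = """\
1 1 2 2 1 2 1 2 2 1 2
1 0 0 0 1 0 0 0 0 1 0
1 0 2 0 1 1 0 2 0 1 1
1 2 0 1 1 0 2 0 1 1 0
1 2 1 1 0 0 2 1 1 0 0
1 1 0 0 0 0 1 0 0 0 0
1 1 1 0 1 2 1 1 0 1 2
1 1 0 1 0 2 1 0 1 0 2
1 0 0 0 2 0 0 0 0 2 0
1 0 0 1 0 1 0 0 1 0 1
0 2 1 0 1 0 2 1 0 1 0
0 0 1 1 0 0 0 1 1 0 0
0 1 1 0 2 1 1 1 0 2 1
0 0 0 2 0 1 0 0 2 0 1
0 2 1 0 1 1 2 1 0 1 1
0 0 0 2 0 0 0 0 2 0 0
0 1 0 0 1 2 1 0 0 1 2
0 0 1 1 1 2 0 1 1 1 2
0 1 1 0 0 2 1 1 0 0 2
0 0 1 2 0 0 0 1 2 0 0
"""

def parse(text):
    """Split each row into (status, [genotypes])."""
    records = []
    for line in text.strip().splitlines():
        values = [int(v) for v in line.split()]
        records.append((values[0], values[1:]))
    return records

def genotype_counts(records, snp_index):
    """Return {status: [n_0, n_1, n_2]} for the SNP at snp_index (0-based)."""
    counts = {0: [0, 0, 0], 1: [0, 0, 0]}
    for status, genotypes in records:
        counts[status][genotypes[snp_index]] += 1
    return counts

records = parse(DATA)
table = genotype_counts(records, 0)  # SNP1
print("cases:   ", table[1])
print("controls:", table[0])
```

The resulting 2 x 3 table of counts is the usual starting point for a test of association between one marker and disease status.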
Association Analysis
• Single marker tests
• Haplotype association
• Epistasis
Single marker tests
[Figure: SNP1, SNP2, and SNP3 are each tested separately for association with Disease]
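One common way to test a single marker is a Pearson chi-square test on the table of genotype counts in cases and controls. The slide does not name a specific test, so the sketch below is only one possibility; the counts are hypothetical and `chi_square` is a name chosen for this example.

```python
import math

def chi_square(table):
    """Pearson chi-square statistic and degrees of freedom for a contingency table.

    `table` is a list of rows (e.g. cases, controls) of genotype counts.
    Columns whose total is zero are dropped; every row is assumed to
    contain at least one observation.
    """
    cols = [c for c in zip(*table) if sum(c) > 0]
    rows = [list(r) for r in zip(*cols)]
    row_totals = [sum(r) for r in rows]
    col_totals = [sum(c) for c in cols]
    n = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(rows):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            stat += (observed - expected) ** 2 / expected
    df = (len(rows) - 1) * (len(cols) - 1)
    return stat, df

# Hypothetical genotype counts (AA, Aa, aa) for one SNP.
cases = [10, 20, 20]
controls = [20, 20, 10]
stat, df = chi_square([cases, controls])
# With 2 degrees of freedom the chi-square tail probability is exp(-stat / 2).
p_value = math.exp(-stat / 2) if df == 2 else None
print(f"chi-square = {stat:.3f}, df = {df}, p = {p_value}")
```

Running the same test once per SNP gives the single-marker scan; it ignores any joint effect of several markers.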
Haplotype
Haplotype Analysis
• May be able to increase power by testing for association with marker haplotype
• A haplotype is a block of DNA that stays intact through generations
• Marker haplotypes are not directly observed
• Use likelihood methods to infer them
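The slide does not say which likelihood method is meant. One widely used option is the expectation-maximization (EM) algorithm, sketched below for the simplest case of two biallelic SNPs, where only double heterozygotes have ambiguous phase. Genotypes are assumed to be coded as copies (0, 1, 2) of one allele at each SNP, and `em_haplotype_freqs` is a hypothetical name.

```python
from collections import Counter

HAPLOTYPES = [(0, 0), (0, 1), (1, 0), (1, 1)]  # (allele at SNP 1, allele at SNP 2)

def em_haplotype_freqs(genotypes, n_iter=50):
    """Estimate two-SNP haplotype frequencies from unphased genotypes by EM."""
    known = Counter()      # haplotypes whose phase is unambiguous
    n_double_het = 0       # genotype (1, 1): phase is ambiguous
    for g1, g2 in genotypes:
        if g1 == 1 and g2 == 1:
            n_double_het += 1
            continue
        a = [0, 0] if g1 == 0 else [1, 1] if g1 == 2 else [0, 1]
        b = [0, 0] if g2 == 0 else [1, 1] if g2 == 2 else [0, 1]
        # At most one SNP is heterozygous here, so the pairing is determined.
        known[(a[0], b[0])] += 1
        known[(a[1], b[1])] += 1

    total = 2 * len(genotypes)
    freqs = {h: 0.25 for h in HAPLOTYPES}
    for _ in range(n_iter):
        # E-step: probability that a double heterozygote carries 00 + 11
        # rather than 01 + 10, given the current frequencies.
        cis = freqs[(0, 0)] * freqs[(1, 1)]
        trans = freqs[(0, 1)] * freqs[(1, 0)]
        w = cis / (cis + trans) if cis + trans > 0 else 0.5
        # M-step: re-estimate frequencies from expected haplotype counts.
        counts = {h: float(known[h]) for h in HAPLOTYPES}
        counts[(0, 0)] += n_double_het * w
        counts[(1, 1)] += n_double_het * w
        counts[(0, 1)] += n_double_het * (1 - w)
        counts[(1, 0)] += n_double_het * (1 - w)
        freqs = {h: counts[h] / total for h in HAPLOTYPES}
    return freqs

# Hypothetical sample: 4 x (0,0), 4 x (2,2), and 2 double heterozygotes.
sample = [(0, 0)] * 4 + [(2, 2)] * 4 + [(1, 1)] * 2
print(em_haplotype_freqs(sample))
```

With more than two SNPs the number of possible haplotype pairs per individual grows quickly, which is why dedicated software is normally used.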
Epistasis: Gene-Gene Interactions
W. Bateson, Mendel’s Principles of Heredity (1909)
A.R. Templeton, In: Wade et al. (eds), Epistasis and the Evolutionary Process (2000)
• The term epistasis was first used by William Bateson (1909)
• Literal translation is “standing upon” (i.e. one gene masks the effects of another gene)
Genotype at Locus A (rows) by genotype at Locus B (columns):
     BB     Bb    bb
AA   White  Grey  Grey
Aa   Black  Grey  Grey
aa   Black  Grey  Grey
Cordell, Human Molecular Genetics 11:2463-8 (2002)
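The masking in the table above can be checked mechanically. The sketch below encodes the table as a lookup, taking the third row to be genotype aa, and asks for which Locus B genotypes the Locus A genotype makes any difference. `PHENOTYPE` and `locus_a_matters` are names invented for this example.

```python
# Phenotype by two-locus genotype, following the table above
# (assumption: the third row is genotype aa).
PHENOTYPE = {
    ("AA", "BB"): "White", ("AA", "Bb"): "Grey", ("AA", "bb"): "Grey",
    ("Aa", "BB"): "Black", ("Aa", "Bb"): "Grey", ("Aa", "bb"): "Grey",
    ("aa", "BB"): "Black", ("aa", "Bb"): "Grey", ("aa", "bb"): "Grey",
}

def locus_a_matters(genotype_b):
    """True if changing the Locus A genotype changes the phenotype
    for the given Locus B genotype."""
    phenotypes = {PHENOTYPE[(a, genotype_b)] for a in ("AA", "Aa", "aa")}
    return len(phenotypes) > 1

for b in ("BB", "Bb", "bb"):
    print(b, "-> Locus A effect visible:", locus_a_matters(b))
```

Only in BB individuals does Locus A show an effect; any copy of b masks it, which is the sense of "standing upon".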
Gene-gene Interactions
• Searching for gene-gene interactions brings about a whole new suite of problems and challenges
• Types of interactions
– Additive
– Multiplicative
– Epistatic
• Curse of dimensionality – big problem
Curse of Dimensionality
[Figure, three panels, each with N = 100 (50 cases, 50 controls)]
1. SNP 1 alone: 3 genotype classes (AA, Aa, aa)
2. SNP 1 x SNP 2: 3 x 3 = 9 genotype combinations
3. SNP 1 x SNP 2 x SNP 3 x SNP 4: 3^4 = 81 genotype combinations
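The arithmetic behind these panels is short enough to sketch: with three genotypes per biallelic SNP, the number of multilocus cells is 3^k, while the sample size stays fixed. The function name below is made up for this example, and the 10-SNP row is added only to extend the pattern.

```python
N = 100  # 50 cases + 50 controls, as on the slides

def cells_and_mean_count(n_snps, n_samples=N):
    """Number of multilocus genotype cells (3 per biallelic SNP)
    and the mean number of samples per cell."""
    n_cells = 3 ** n_snps
    return n_cells, n_samples / n_cells

for k in (1, 2, 4, 10):
    n_cells, mean_count = cells_and_mean_count(k)
    print(f"{k:>2} SNPs: {n_cells:>6} cells, {mean_count:.4f} samples per cell on average")
```

At four SNPs the average cell already holds little more than one individual, so many cells are empty and cell-wise estimates become unreliable.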
Three Other Issues to Consider
1. Variable selection
2. Model selection
3. Interpretation
1. Variable Selection
• How can you determine which variables to select?
• Not computationally feasible to evaluate all possible combinations
• Need to select correct variables to detect interactions
How many combinations are there?
• ~500,000 SNPs span 80% of common variation in genome (HapMap)
SNPs in each subset   Number of possible combinations
1                     5 x 10^5
2                     1 x 10^11
3                     2 x 10^16
4                     3 x 10^21
5                     2 x 10^26
Evaluating every 5-SNP subset:
2 x 10^26 combinations
* 1 combination per second
* 86,400 seconds per day
---------
2.979536 x 10^21 days to complete
(8.163113 x 10^18 years)
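These orders of magnitude can be reproduced with a few lines. The sketch below assumes exactly 500,000 SNPs and 365-day years; the slide's SNP count is approximate, so the printed digits need not match the slide exactly. The constant and function names are made up for this example.

```python
import math

N_SNPS = 500_000          # approximate figure from the slide
SECONDS_PER_DAY = 86_400
DAYS_PER_YEAR = 365       # assumption: 365-day years

def n_subsets(k, n=N_SNPS):
    """Number of distinct k-SNP subsets that can be drawn from n SNPs."""
    return math.comb(n, k)

for k in range(1, 6):
    print(f"{k} SNPs per subset: {n_subsets(k):.1e} combinations")

# Time to evaluate every 5-SNP subset at one combination per second.
days = n_subsets(5) / SECONDS_PER_DAY
print(f"{days:.2e} days, {days / DAYS_PER_YEAR:.2e} years")
```

Even a millionfold speed-up would leave the 5-SNP search on the order of 10^12 years, so exhaustive evaluation is not an option at this scale.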
2. Model Selection
• For each variable subset, evaluate a statistical model
• Goal is to identify the best subset of variables that compose the best model
Finding the best model
Choose variable subset
Choose statistical model
Evaluate model fitness
Best model
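The steps above can be sketched as a small exhaustive search. This is only an illustration under stated assumptions: the "statistical model" is a simple majority vote within each genotype cell, the "fitness" is training accuracy (real analyses would use cross-validation or a similar guard against overfitting), and the data and function names are hypothetical.

```python
from collections import defaultdict
from itertools import combinations

def cell_majority_accuracy(rows, status, subset):
    """Fitness of one variable subset: training accuracy when each
    genotype cell predicts its majority status."""
    cells = defaultdict(lambda: [0, 0])  # genotype combination -> [controls, cases]
    for row, s in zip(rows, status):
        cells[tuple(row[i] for i in subset)][s] += 1
    correct = sum(max(counts) for counts in cells.values())
    return correct / len(rows)

def best_subset(rows, status, size):
    """Evaluate every SNP subset of the given size; return (fitness, subset)."""
    n_snps = len(rows[0])
    scored = (
        (cell_majority_accuracy(rows, status, subset), subset)
        for subset in combinations(range(n_snps), size)
    )
    return max(scored)

# Hypothetical data: status depends on SNPs 0 and 1 jointly, on neither alone.
rows = [(0, 0, 0), (0, 1, 0), (1, 0, 1), (1, 1, 1),
        (0, 0, 1), (0, 1, 1), (1, 0, 0), (1, 1, 0)]
status = [0, 1, 1, 0, 0, 1, 1, 0]

print(best_subset(rows, status, 1))
print(best_subset(rows, status, 2))
```

In this toy data no single SNP does better than chance, while the pair (0, 1) classifies perfectly, which is why single-marker screening alone can miss interacting variables.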
Simple Fitness Landscape
[Figure: Fitness (y-axis) against Model (x-axis)]
Complex Fitness Landscape
[Figure: Fitness (y-axis) against Model (x-axis)]
3. Interpretation
• Selection of best statistical model in a vast search space of possible models
• Statistical or computational model may not translate into biology
• May not be able to identify prevention or treatment strategies directly
• Wet lab experiments will be necessary, but may not be sufficient
• Strategies to assess biological interpretation of gene-gene interaction models
1. Consider current knowledge about the biochemistry of the system and the biological plausibility of the models
2. Perform experiments in the wet lab to measure the effect of small perturbations to the system
3. Use computer simulation algorithms to model biochemical systems
Additional Challenges (true of all association studies)
• Sample size and power/type I error
• Population specific effects
– Age, gender
• Poorly matched cases and controls
– Ethnic background
– Controls must be “at risk”
• Bias
• Heterogeneity
Heterogeneity
• Phenotypic (Clinical, Trait)
– Affected individuals vary in clinical expression
• Genetic
– Different inheritance patterns for same disease
• Locus
– Different genes lead to the same disease
• Allelic
– Different alleles at the same gene lead to same/different disease
Thornton-Wells TA, Moore JH, Haines JL. Trends in Genetics, 2004;20(12):640-7.
New Statistical Approaches
• Data Reduction
– Combinatorial Partitioning Method (CPM)
– Multifactor Dimensionality Reduction (MDR)
– Detection of informative combined effects (DICE)
– Logic Regression
– Set Association Analysis
• Pattern Recognition
– Symbolic Discriminant Analysis (SDA)
– Cellular Automata (CA)
– Neural Networks (NN)
Areas of Future Work (possible collaborations)
• More analytical methods for gene-gene and gene-environment interactions
– Especially including categorical and continuous variables simultaneously
• Inclusion of pathway information into analyses
• Ways of dealing with heterogeneity of all kinds