Multivariate analysis in a nutshell Applications to genetic data Genetic diversity of pathogen populations
Multivariate analysis of genetic data: an introduction
Thibaut Jombart
MRC Centre for Outbreak Analysis and Modelling, Imperial College London
XXIV Simposio Internacional De Estadística, Bogotá, 25th July 2014
1/34
Outline
Multivariate analysis in a nutshell
Applications to genetic data
Genetic diversity of pathogen populations
2/34
Multivariate data: some examples
Association between individuals? Correlations between variables?
4/34
Multivariate analysis to summarize diversity
5/34
Multivariate analysis: an overview
Multivariate analysis, a.k.a.:
• “dimension reduction techniques”
• “ordinations in reduced space”
• “factorial methods”
Purposes:
• summarize diversity amongst observations
• summarize correlations between variables
6/34
Most common methods
Differences lie in input data:
• quantitative/binary variables: Principal Component Analysis (PCA)
• 2 categorical variables (contingency table): Correspondence Analysis (CA)
• >2 categorical variables: Multiple Correspondence Analysis (MCA)
• Euclidean distance matrix: Principal Coordinates Analysis (PCoA) / Metric Multidimensional Scaling (MDS)
Many other methods exist for ≥ 2 data tables, spatial analysis, phylogenetic analysis, etc.
7/34
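As an illustration of the last case, here is a minimal NumPy sketch of PCoA (classical metric MDS), assuming the input is a symmetric matrix of Euclidean distances. `pcoa` is a hypothetical helper written for this example, not a function from any package cited in these slides.

```python
import numpy as np


def pcoa(dist, n_axes=2):
    """Principal Coordinates Analysis (classical metric MDS) of a distance matrix."""
    d = np.asarray(dist, dtype=float)
    n = d.shape[0]
    # Gower's double centring of A = -0.5 * d^2: B = J A J, with J = I - 11^T / n
    j = np.eye(n) - np.ones((n, n)) / n
    b = j @ (-0.5 * d ** 2) @ j
    # B is symmetric, so eigh applies; sort eigenvalues in decreasing order
    eigval, eigvec = np.linalg.eigh(b)
    order = np.argsort(eigval)[::-1]
    eigval, eigvec = eigval[order], eigvec[:, order]
    # principal coordinates: eigenvectors scaled by sqrt(eigenvalue);
    # tiny negative values caused by rounding are clipped to zero
    coords = eigvec[:, :n_axes] * np.sqrt(np.clip(eigval[:n_axes], 0.0, None))
    return coords, eigval


# Usage: points in the plane -> Euclidean distances -> PCoA recovers the layout
rng = np.random.default_rng(0)
pts = rng.normal(size=(6, 2))
dist = np.sqrt(((pts[:, None, :] - pts[None, :, :]) ** 2).sum(axis=-1))
coords, eigval = pcoa(dist, n_axes=2)
```

Because the distances here come from 2-dimensional points, two principal coordinates reproduce them exactly (up to rotation and reflection), and the remaining eigenvalues are numerically zero.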
1 dimension, 2 dimensions, P dimensions
We need to find the most informative directions in a P-dimensional space.
8/34
Reducing P dimensions into 1
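• $\mathbf{X} \in \mathbb{R}^{N \times P}$; $\mathbf{X} = [\mathbf{x}_1 | \dots | \mathbf{x}_P]$: data matrix
• $\mathbf{Q} \in \mathbb{R}^{P \times P}$: metric in $\mathbb{R}^P$; $\mathbf{D} \in \mathbb{R}^{N \times N}$: metric in $\mathbb{R}^N$
• $\mathbf{u} \in \mathbb{R}^P$; $\mathbf{u} = [u_1, \dots, u_P]$: principal axis ($\|\mathbf{u}\|^2_{\mathbf{Q}} = 1$)
• $\mathbf{v} \in \mathbb{R}^N$; $\mathbf{v} = \mathbf{X}\mathbf{Q}\mathbf{u}$: principal component
→ find $\mathbf{u}$ so that $\|\mathbf{v}\|^2_{\mathbf{D}}$ is maximum.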
9/34
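The maximisation above can be solved with a Lagrange multiplier. A short derivation, assuming $\mathbf{Q}$ and $\mathbf{D}$ are symmetric positive definite (as they are for the usual diagonal weight matrices):

$$
\begin{aligned}
\|\mathbf{v}\|^2_{\mathbf{D}} &= \mathbf{v}^T\mathbf{D}\mathbf{v} = \mathbf{u}^T\mathbf{Q}\mathbf{X}^T\mathbf{D}\mathbf{X}\mathbf{Q}\mathbf{u}, \qquad \|\mathbf{u}\|^2_{\mathbf{Q}} = \mathbf{u}^T\mathbf{Q}\mathbf{u} = 1 \\
\mathcal{L}(\mathbf{u}, \lambda) &= \mathbf{u}^T\mathbf{Q}\mathbf{X}^T\mathbf{D}\mathbf{X}\mathbf{Q}\mathbf{u} - \lambda\,(\mathbf{u}^T\mathbf{Q}\mathbf{u} - 1) \\
\frac{\partial \mathcal{L}}{\partial \mathbf{u}} &= 2\,\mathbf{Q}\mathbf{X}^T\mathbf{D}\mathbf{X}\mathbf{Q}\mathbf{u} - 2\lambda\,\mathbf{Q}\mathbf{u} = \mathbf{0}
\;\Longrightarrow\; \mathbf{X}^T\mathbf{D}\mathbf{X}\mathbf{Q}\mathbf{u} = \lambda\,\mathbf{u} \\
\|\mathbf{v}\|^2_{\mathbf{D}} &= \mathbf{u}^T\mathbf{Q}\,(\lambda\,\mathbf{u}) = \lambda\,\mathbf{u}^T\mathbf{Q}\mathbf{u} = \lambda
\end{aligned}
$$

So the optimal $\mathbf{u}$ is an eigenvector of $\mathbf{X}^T\mathbf{D}\mathbf{X}\mathbf{Q}$, and the quantity being maximised equals its eigenvalue: the best axis is the one with the largest eigenvalue.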
Keeping more than one principal component
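• $\mathbf{u}_1$ and $\mathbf{v}_1$: 1st principal axis and component
• $\mathbf{u}_2$ and $\mathbf{v}_2$: 2nd principal axis and component
→ constraint: $\mathbf{u}_1 \perp \mathbf{u}_2$ (i.e., $\langle \mathbf{u}_1, \mathbf{u}_2 \rangle_{\mathbf{Q}} = 0$)
→ find $\mathbf{u}_2$ so that $\|\mathbf{v}_2\|^2_{\mathbf{D}}$ is maximum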
10/34
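The sequential "find $\mathbf{u}_1$, then the best $\mathbf{u}_2$ orthogonal to it" logic can be sketched with power iteration and deflation, in the simple PCA case $\mathbf{Q} = \mathbf{I}$, $\mathbf{D} = \mathbf{I}/N$. This is a swapped-in textbook technique chosen for illustration, not how the packages cited in these slides compute axes; the helper names are hypothetical.

```python
import numpy as np


def leading_axis(c, n_iter=1000, seed=0):
    """Power iteration: unit vector u maximising u^T C u for a symmetric PSD matrix C."""
    rng = np.random.default_rng(seed)
    u = rng.normal(size=c.shape[0])
    u /= np.linalg.norm(u)
    for _ in range(n_iter):
        u = c @ u
        u /= np.linalg.norm(u)
    return u, float(u @ c @ u)  # axis and the maximised quantity (its eigenvalue)


def first_two_axes(x):
    """Find u1, then u2 orthogonal to u1, in the PCA case Q = I, D = I/N."""
    xc = x - x.mean(axis=0)            # centred data
    c = xc.T @ xc / xc.shape[0]        # X^T D X Q with D = I/N and Q = I
    u1, lam1 = leading_axis(c)
    # deflation: remove the u1 direction, so the next maximum is orthogonal to u1
    c2 = c - lam1 * np.outer(u1, u1)
    u2, lam2 = leading_axis(c2, seed=1)
    v1, v2 = xc @ u1, xc @ u2          # principal components v = XQu with Q = I
    return (u1, v1, lam1), (u2, v2, lam2)


# Usage on simulated data with three variables of decreasing spread
rng = np.random.default_rng(42)
x = rng.normal(size=(200, 3)) * np.array([3.0, 2.0, 0.5])
(u1, v1, lam1), (u2, v2, lam2) = first_two_axes(x)
```

Power iteration converges quickly here because the eigenvalues are well separated; with near-equal eigenvalues it would need many more iterations, which is one reason a direct eigensolver is preferred in practice.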
How do we do this?
Things that don’t change:
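• take $\mathbf{u}_i$, the $i$-th eigenvector of the $\mathbf{Q}$-symmetric matrix $\mathbf{X}^T\mathbf{D}\mathbf{X}\mathbf{Q}$
• (alternatively) take $\mathbf{v}_i$, the $i$-th eigenvector of the $\mathbf{D}$-symmetric matrix $\mathbf{X}\mathbf{Q}\mathbf{X}^T\mathbf{D}$
Things that change:
• pre-transformations of $\mathbf{X}$ (recoding, standardisation, etc.)
• metrics $\mathbf{Q}$ and $\mathbf{D}$ (implicitly defining distances in $\mathbb{R}^P$ and $\mathbb{R}^N$)
• most usual analyses are defined by a triplet $(\mathbf{X}, \mathbf{Q}, \mathbf{D})$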
R packages: ade4, vegan
11/34
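A minimal NumPy sketch of the generic triplet analysis, assuming both metrics are diagonal (column weights for $\mathbf{Q}$, row weights for $\mathbf{D}$), which covers PCA and many related methods. Since $\mathbf{X}^T\mathbf{D}\mathbf{X}\mathbf{Q}$ is not symmetric, the sketch diagonalises the symmetric matrix $\mathbf{Q}^{1/2}\mathbf{X}^T\mathbf{D}\mathbf{X}\mathbf{Q}^{1/2}$ and back-transforms. `triplet_analysis` is a hypothetical helper, not code taken from ade4 or vegan.

```python
import numpy as np


def triplet_analysis(x, col_w, row_w, n_axes=2):
    """Eigen-analysis of a triplet (X, Q, D) with diagonal metrics Q and D.

    x     : N x P data matrix, already pre-transformed (centred, scaled, recoded...)
    col_w : length-P diagonal of Q (column weights)
    row_w : length-N diagonal of D (row weights)
    Returns eigenvalues, principal axes u (P x k) and principal components v (N x k).
    """
    x = np.asarray(x, dtype=float)
    q = np.asarray(col_w, dtype=float)
    d = np.asarray(row_w, dtype=float)
    q_half = np.sqrt(q)
    # X^T D X Q is only Q-symmetric; diagonalise the symmetric matrix
    # M = Q^{1/2} X^T D X Q^{1/2}, which has the same eigenvalues
    xq = x * q_half                           # X Q^{1/2}
    m = xq.T @ (d[:, None] * xq)
    eigval, w = np.linalg.eigh(m)
    order = np.argsort(eigval)[::-1][:n_axes]
    eigval, w = eigval[order], w[:, order]
    u = w / q_half[:, None]                   # u = Q^{-1/2} w, so that u^T Q u = 1
    v = x @ (q[:, None] * u)                  # v = X Q u
    return eigval, u, v


# Usage: standard (covariance) PCA is the triplet (centred X, I, I/N)
rng = np.random.default_rng(1)
x = rng.normal(size=(50, 4))
xc = x - x.mean(axis=0)
n, p = xc.shape
eigval, u, v = triplet_analysis(xc, np.ones(p), np.full(n, 1.0 / n))
```

Changing only the pre-transformation of `x` or the weight vectors yields other analyses, which is the point of the (X, Q, D) framework: the eigen-step stays the same.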
How many principal components to retain?
Choice based on the “screeplot”, a barplot of eigenvalues
Retain only “significant” structures... but not trivial ones.
12/34
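The screeplot only displays eigenvalues; deciding where to stop remains a judgment call. A small sketch that computes what the screeplot shows, the share of total inertia per axis, and renders it as a text barplot. `screeplot_text` and the simulated data are hypothetical.

```python
import numpy as np


def screeplot_text(eigval, width=40):
    """Proportion of total inertia per axis, plus a text barplot of the eigenvalues."""
    eigval = np.asarray(eigval, dtype=float)
    prop = eigval / eigval.sum()
    rows = []
    for i, p in enumerate(prop, start=1):
        bar = "#" * int(round(p * width))     # bar length proportional to eigenvalue
        rows.append(f"PC{i:<3d}{p:7.1%} {bar}")
    return prop, "\n".join(rows)


# Usage: eigenvalues of a covariance PCA on simulated data with 2 strong directions
rng = np.random.default_rng(5)
x = rng.normal(size=(100, 5)) * np.array([4.0, 3.0, 1.0, 1.0, 1.0])
xc = x - x.mean(axis=0)
eigval = np.linalg.eigvalsh(xc.T @ xc / len(x))[::-1]
prop, plot = screeplot_text(eigval)
print(plot)
```

On such data the first two bars stand out and the remaining ones form a flat "scree", suggesting two axes are worth retaining.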
Outputs of multivariate analyses: an overview
Main outputs:
• principal components: diversity amongst individuals
• principal axes: nature of the structures
• eigenvalues: magnitude of structures
13/34
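A sketch of how the three outputs come out of a covariance PCA ($\mathbf{Q} = \mathbf{I}$, $\mathbf{D} = \mathbf{I}/N$), computed here through an SVD of the centred data rather than an explicit eigendecomposition; the two give the same axes and eigenvalues. `pca_outputs` and the simulated data are hypothetical.

```python
import numpy as np


def pca_outputs(x):
    """Covariance PCA (Q = I, D = I/N) via SVD, returning its three main outputs."""
    x = np.asarray(x, dtype=float)
    n = x.shape[0]
    xc = x - x.mean(axis=0)
    # SVD of X / sqrt(N): X^T D X = V S^2 V^T, so the columns of V are the axes
    _, s, vt = np.linalg.svd(xc / np.sqrt(n), full_matrices=False)
    eigval = s ** 2        # eigenvalues: magnitude of each structure
    axes = vt.T            # principal axes (P x k): how variables build each structure
    comps = xc @ axes      # principal components (N x k): position of each individual
    return eigval, axes, comps


rng = np.random.default_rng(2)
mix = np.array([[2.0, 0.5, 0.0], [0.0, 1.0, 0.3], [0.0, 0.0, 0.2]])
x = rng.normal(size=(80, 3)) @ mix
eigval, axes, comps = pca_outputs(x)
```

The outputs are linked: each eigenvalue equals the variance of the corresponding principal component, and the axes are orthonormal.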
Usual summary of an analysis: the biplot
Biplot: principal components (points) + loadings (arrows)
• groups of individuals
• structuring variables (longest arrows)
• magnitude of the structures
14/34
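A sketch of the coordinates behind a biplot for a normed PCA (standardised variables), with loadings scaled as correlations between variables and principal components. This is one of several biplot scaling conventions, chosen here as an assumption; `biplot_coordinates` and the simulated data are hypothetical.

```python
import numpy as np


def biplot_coordinates(x, n_axes=2):
    """Scores (points) and variable loadings (arrows) for a normed PCA biplot."""
    x = np.asarray(x, dtype=float)
    z = (x - x.mean(axis=0)) / x.std(axis=0)   # standardised variables
    n = z.shape[0]
    _, s, vt = np.linalg.svd(z / np.sqrt(n), full_matrices=False)
    axes = vt[:n_axes].T                       # principal axes (P x k)
    scores = z @ axes                          # individuals: the points
    # loading = sqrt(eigenvalue) * axis coordinate = correlation(variable, PC)
    loadings = axes * s[:n_axes]               # variables: the arrows (P x k)
    arrow_length = np.linalg.norm(loadings, axis=1)
    return scores, loadings, arrow_length


# Usage: variables 0 and 1 share a common signal, variable 2 is independent noise
rng = np.random.default_rng(3)
base = rng.normal(size=300)
x = np.column_stack([base + 0.1 * rng.normal(size=300),
                     base + 0.1 * rng.normal(size=300),
                     rng.normal(size=300)])
scores, loadings, length = biplot_coordinates(x, n_axes=2)
```

On the first axis alone, the two correlated variables get the longest arrows, marking them as the variables that structure that axis.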
Multivariate analysis in a nutshell
• variety of methods for different types of variables
• principal components (PCs) summarize diversity
• variable loadings identify discriminating variables
• other uses of PCs: maps (spatial structures), models (response variables or predictors), ...
15/34
Outline
Multivariate analysis in a nutshell
Applications to genetic data
Genetic diversity of pathogen populations
16/34
From DNA sequences to patterns of biological diversity
17/34
DNA sequences: a rich source of information
• hundreds to thousands of individuals
• up to millions of single nucleotide polymorphisms (SNPs)
• more generally, most genetic data can be treated as frequencies
⇒ Multivariate analysis is used to summarize genetic diversity.
18/34
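A sketch of the "genetic data as frequencies" idea, assuming biallelic SNP genotypes coded as allele counts (0, 1 or 2 copies in a diploid), which are divided by the ploidy to give per-individual allele frequencies before a centred PCA. This coding is one common choice, not necessarily the one used in the lecture; the helpers and the two simulated populations are hypothetical.

```python
import numpy as np


def genotypes_to_frequencies(genotypes, ploidy=2):
    """Convert allele counts (0..ploidy per locus) into per-individual allele frequencies."""
    return np.asarray(genotypes, dtype=float) / ploidy


def pca_scores(freq, n_axes=2):
    """Centred PCA of an individuals x loci frequency table (Q = I, D = I/N)."""
    xc = freq - freq.mean(axis=0)
    _, s, vt = np.linalg.svd(xc, full_matrices=False)
    eigval = s ** 2 / xc.shape[0]
    return xc @ vt[:n_axes].T, eigval[:n_axes]


# Hypothetical example: two simulated populations typed at 200 biallelic SNPs
rng = np.random.default_rng(7)
n_loci, n_per_pop = 200, 30
p_a = rng.uniform(0.05, 0.95, n_loci)   # allele frequencies in population A
p_b = rng.uniform(0.05, 0.95, n_loci)   # allele frequencies in population B
geno = np.vstack([rng.binomial(2, p_a, size=(n_per_pop, n_loci)),
                  rng.binomial(2, p_b, size=(n_per_pop, n_loci))])
scores, eigval = pca_scores(genotypes_to_frequencies(geno))
pc1 = scores[:, 0]
```

With such strongly differentiated simulated populations, the first principal component alone separates the two groups of individuals.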
First application of multivariate analysis in genetics
PCA of genetic data from native human populations (Cavalli-Sforza 1966, Proc B)
The first two principal components separate populations by continent.
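The computation behind such a plot can be sketched as a PCA of a populations × loci table of allele frequencies. The frequencies below are invented for illustration. To keep the sketch within the Python standard library, the eigenvectors are found by power iteration. Real analyses would use a library SVD or eigendecomposition instead.

```python
import random


def center(rows):
    """Subtract each column (locus) mean from every row (population)."""
    n = len(rows)
    means = [sum(col) / n for col in zip(*rows)]
    return [[x - m for x, m in zip(r, means)] for r in rows]


def covariance(x):
    """Sample covariance matrix of already-centred rows."""
    n, p = len(x), len(x[0])
    return [[sum(r[i] * r[j] for r in x) / (n - 1) for j in range(p)]
            for i in range(p)]


def leading_eigenvectors(mat, k, n_iter=500, seed=0):
    """Top-k eigenvectors of a symmetric matrix by power iteration + deflation."""
    rng = random.Random(seed)
    p = len(mat)
    a = [row[:] for row in mat]
    vectors = []
    for _ in range(k):
        v = [rng.random() for _ in range(p)]
        norm = sum(x * x for x in v) ** 0.5
        v = [x / norm for x in v]
        for _ in range(n_iter):
            w = [sum(a[i][j] * v[j] for j in range(p)) for i in range(p)]
            norm = sum(x * x for x in w) ** 0.5
            if norm < 1e-15:
                break  # nothing left to extract (rank-deficient matrix)
            v = [x / norm for x in w]
        # Rayleigh quotient: the eigenvalue, i.e. the variance along v.
        lam = sum(v[i] * a[i][j] * v[j] for i in range(p) for j in range(p))
        vectors.append(v)
        # Deflation: remove the component just found before the next search.
        a = [[a[i][j] - lam * v[i] * v[j] for j in range(p)] for i in range(p)]
    return vectors


def pca_scores(rows, k=2):
    """Coordinates of each row on the first k principal components."""
    x = center(rows)
    axes = leading_eigenvectors(covariance(x), k)
    return [[sum(xi * vi for xi, vi in zip(r, v)) for v in axes] for r in x]


# Hypothetical allele frequencies at 4 loci in 6 populations from 2 regions.
pops = {
    "A1": [0.90, 0.80, 0.10, 0.20],
    "A2": [0.85, 0.75, 0.15, 0.30],
    "A3": [0.95, 0.70, 0.05, 0.25],
    "B1": [0.10, 0.20, 0.90, 0.80],
    "B2": [0.20, 0.25, 0.85, 0.70],
    "B3": [0.15, 0.30, 0.80, 0.75],
}
scores = pca_scores(list(pops.values()))
for name, (pc1, pc2) in zip(pops, scores):
    print(f"{name}: PC1={pc1:+.3f} PC2={pc2:+.3f}")
```

The sign of each principal component is arbitrary. Only the relative positions of populations along an axis carry meaning.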
19/34
Applications: some examples
PCA of genetic data + colored maps of principal components (Cavalli-Sforza et al. 1993, Science)
Signatures of human expansion out of Africa.
20/34
Since then...
Multivariate methods used in genetics
• Principal Component Analysis (PCA)
• Principal Coordinates Analysis (PCoA) / Metric Multidimensional Scaling (MDS)
• Correspondence Analysis (CA)
• Discriminant Analysis (DA)
• Canonical Correlation Analysis (CCA)
• ...
R packages: adegenet, ade4, pegas
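PCoA (classical metric MDS) starts from a matrix of pairwise distances rather than from a data table. It double-centres the squared distances and uses the leading eigenvectors, scaled by the square roots of their eigenvalues, as coordinates. Below is a minimal standard-library sketch that assumes a symmetric distance matrix. The toy distances come from hypothetical positions on a line, so the expected answer is known in advance.

```python
import random


def double_centre(d):
    """B = -1/2 * J D^2 J, with J the centring matrix (centred squared distances)."""
    n = len(d)
    sq = [[x * x for x in row] for row in d]
    row_means = [sum(row) / n for row in sq]  # equal to column means: D is symmetric
    grand = sum(row_means) / n
    return [[-0.5 * (sq[i][j] - row_means[i] - row_means[j] + grand)
             for j in range(n)] for i in range(n)]


def leading_eigenpairs(mat, k, n_iter=1000, seed=0):
    """Top-k (eigenvalue, eigenvector) pairs by power iteration + deflation."""
    rng = random.Random(seed)
    n = len(mat)
    a = [row[:] for row in mat]
    pairs = []
    for _ in range(k):
        v = [rng.random() - 0.5 for _ in range(n)]
        norm = sum(x * x for x in v) ** 0.5
        v = [x / norm for x in v]
        for _ in range(n_iter):
            w = [sum(a[i][j] * v[j] for j in range(n)) for i in range(n)]
            norm = sum(x * x for x in w) ** 0.5
            if norm < 1e-12:
                break  # remaining matrix is (numerically) zero
            v = [x / norm for x in w]
        lam = sum(v[i] * a[i][j] * v[j] for i in range(n) for j in range(n))
        pairs.append((lam, v))
        a = [[a[i][j] - lam * v[i] * v[j] for j in range(n)] for i in range(n)]
    return pairs


def pcoa(d, k=2):
    """Principal coordinates: eigenvectors of B scaled by sqrt(eigenvalue)."""
    coords = [[0.0] * k for _ in d]
    for axis, (lam, v) in enumerate(leading_eigenpairs(double_centre(d), k)):
        scale = max(lam, 0.0) ** 0.5  # negative eigenvalues are given zero length
        for i, vi in enumerate(v):
            coords[i][axis] = vi * scale
    return coords


# Hypothetical example: distances between 4 isolates derived from positions
# 0, 1, 3 and 6 on a line, so PCoA should recover the centred positions.
positions = [0, 1, 3, 6]
dist = [[abs(a - b) for b in positions] for a in positions]
axis1 = [c[0] for c in pcoa(dist, k=1)]
print([round(x, 6) for x in axis1])  # [-2.5, -1.5, 0.5, 3.5] or its mirror image
```

Non-Euclidean genetic distances can produce negative eigenvalues. This sketch gives them zero length. Power iteration also returns the eigenvalue of largest magnitude, so the sketch only suits well-behaved examples.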
21/34
Since then...
Applications
• reveal spatial structures (historical spread)
• explore genetic diversity
• identify cryptic species
• discover genotype-phenotype associations
• ...
• review in Jombart et al. 2009, Heredity 102: 330-341
Applications to the genetics of pathogen populations.
22/34
Outline
Multivariate analysis in a nutshell
Applications to genetic data
Genetic diversity of pathogen populations
23/34
Why investigate the diversity of pathogen populations?
Genetic data: increasingly important in infectious disease epidemiology
Purposes
• classify pathogens, describe their relationships
• assess the spatio-temporal dynamics of infectious diseases
• reconstruct epidemiological processes (transmission)
24/34
Different questions at different scales
Where and how can multivariate analysis of pathogen genetic data be useful?
25/34
Describing pathogen populations
Population genetics: identify populations of organisms and describe their relationships
What is a population?
• Usual definition: set of organisms mating at random
• Problem: no “mating” in most pathogens (e.g. viruses, bacteria)
• Genetic clusters: set of genetically related pathogens (e.g. same outbreak, same epidemic).
⇒ aim: identify and describe genetic clusters
26/34
Genetic clustering using K-means & BIC (Jombart et al. 2010, BMC Genetics)
Variance partitioning model (ANOVA):
$\text{total variance} = \text{between-group variance} + \text{within-group variance}$
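The partition holds exactly for sums of squares (SS), and therefore for variances computed with a common denominator. K-means looks for the grouping that minimises the within-group term. Here is a worked check on hypothetical 2-D coordinates:

```python
def mean_row(rows):
    """Centroid (column means) of a list of equal-length rows."""
    return [sum(col) / len(rows) for col in zip(*rows)]


def sum_sq(rows, centre):
    """Sum of squared Euclidean distances from each row to `centre`."""
    return sum(sum((x - c) ** 2 for x, c in zip(r, centre)) for r in rows)


def partition(groups):
    """Return (total, between, within) sums of squares for grouped rows."""
    all_rows = [r for g in groups for r in g]
    grand = mean_row(all_rows)
    total = sum_sq(all_rows, grand)
    within = sum(sum_sq(g, mean_row(g)) for g in groups)
    # Between: each group centroid's squared distance to the grand mean,
    # weighted by the group size.
    between = sum(len(g) * sum_sq([mean_row(g)], grand) for g in groups)
    return total, between, within


# Hypothetical coordinates (e.g. scores on two PCs) for two groups.
groups = [[[1, 0], [2, 0], [3, 0]], [[7, 1], [8, 1], [9, 1]]]
total, between, within = partition(groups)
print(total, between, within)  # 59.5 55.5 4.0
```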
Performance:
• K-means performs as well as or better than STRUCTURE on simulated data (various island and stepping-stone models)
• orders of magnitude faster (seconds vs. hours/days)
R package: adegenet, function find.clusters
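Below is a minimal sketch of K-means with a BIC-based choice of the number of clusters. It assumes the common approximation BIC = n·ln(WSS/n) + k·ln(n), where WSS is the within-group sum of squares; whether find.clusters uses exactly this form is an assumption. For brevity, K-means runs directly on 0/1 SNP calls, whereas find.clusters first reduces the data with PCA. The isolates are hypothetical.

```python
import math
import random


def sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def mean_row(rows):
    return [sum(col) / len(rows) for col in zip(*rows)]


def assign(points, centroids):
    """Index of the nearest centroid for every point."""
    return [min(range(len(centroids)), key=lambda j: sq_dist(p, centroids[j]))
            for p in points]


def kmeans_once(points, k, rng, n_iter=100):
    """One run of Lloyd's algorithm from k-means++ seeds; returns (labels, WSS)."""
    centroids = [list(rng.choice(points))]
    while len(centroids) < k:
        # k-means++: favour points far from the centroids chosen so far.
        d2 = [min(sq_dist(p, c) for c in centroids) for p in points]
        centroids.append(list(rng.choices(points, weights=d2)[0]))
    for _ in range(n_iter):
        labels = assign(points, centroids)
        new = []
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            new.append(mean_row(members) if members else centroids[j])
        if new == centroids:
            break
        centroids = new
    labels = assign(points, centroids)
    wss = sum(sq_dist(p, centroids[lab]) for p, lab in zip(points, labels))
    return labels, wss


def best_kmeans(points, k, n_start=10, seed=0):
    """Keep the run with the smallest within-group sum of squares (WSS)."""
    rng = random.Random(seed)
    return min((kmeans_once(points, k, rng) for _ in range(n_start)),
               key=lambda r: r[1])


def bic(wss, n, k):
    # Assumed BIC form for K-means: n * ln(WSS / n) + k * ln(n).
    return n * math.log(wss / n) + k * math.log(n)


# Hypothetical haploid isolates: 3 lineages x 10 isolates, 60 SNPs coded 0/1.
# SNPs 0-29 define the lineages (10 each); SNPs 30-59 carry one private
# mutation per isolate, so isolates in a lineage are similar but not identical.
points = []
for lineage in range(3):
    for i in range(10):
        row = [0] * 60
        for s in range(10 * lineage, 10 * lineage + 10):
            row[s] = 1
        row[30 + 10 * lineage + i] = 1
        points.append(row)

n = len(points)
bics = {k: bic(best_kmeans(points, k)[1], n, k) for k in range(1, 7)}
best_k = min(bics, key=bics.get)
print(best_k)  # 3
```

The lowest BIC marks the preferred number of clusters. With real data the BIC curve can be flat near its minimum, so inspecting the whole curve is safer than taking the minimum blindly.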
27/34
PCA of seasonal influenza (A/H3N2) data
Data: seasonal influenza (A/H3N2), 500 HA segments.
Little temporal evolution; a burst of diversity in 2002?
28/34
Which diversity to represent?
Total diversity is not the relevant quantity for analysing clusters.
Discriminant Analysis of Principal Components (DAPC) (Jombart et al. 2010, BMC Genetics):
• maximizes group discrimination (“between/within” ratio)
• provides group membership probabilities (prediction possible)
• as computationally efficient as PCA
R package: adegenet, function dapc
29/34
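DAPC first summarises the data with PCA and then runs a discriminant analysis on the retained principal components. The sketch below covers the discriminant step only, for two groups, and assumes scores on two retained PCs are already available (the scores are hypothetical). It uses Fisher's linear discriminant, w = S_within⁻¹(m1 − m2), and Gaussian membership probabilities with equal priors. It is not adegenet's dapc implementation.

```python
import math


def mean_row(rows):
    return [sum(col) / len(rows) for col in zip(*rows)]


def pooled_cov_2d(groups):
    """Pooled within-group covariance matrix of 2-D scores."""
    s = [[0.0, 0.0], [0.0, 0.0]]
    dof = 0
    for g in groups:
        m = mean_row(g)
        for x in g:
            d = (x[0] - m[0], x[1] - m[1])
            for i in range(2):
                for j in range(2):
                    s[i][j] += d[i] * d[j]
        dof += len(g) - 1
    return [[v / dof for v in row] for row in s]


def discriminant_axis(g1, g2):
    """Fisher's axis w = S_within^-1 (m1 - m2): maximises between/within."""
    (a, b), (c, d) = pooled_cov_2d([g1, g2])
    det = a * d - b * c
    m1, m2 = mean_row(g1), mean_row(g2)
    dx, dy = m1[0] - m2[0], m1[1] - m2[1]
    return [(d * dx - b * dy) / det, (-c * dx + a * dy) / det]


def axis_stats(g1, g2, w):
    """Group means and pooled within-group variance of projections onto w."""
    y1 = [w[0] * p[0] + w[1] * p[1] for p in g1]
    y2 = [w[0] * p[0] + w[1] * p[1] for p in g2]
    m1, m2 = sum(y1) / len(y1), sum(y2) / len(y2)
    ss = sum((y - m1) ** 2 for y in y1) + sum((y - m2) ** 2 for y in y2)
    return m1, m2, ss / (len(y1) + len(y2) - 2)


def membership_prob(x, g1, g2, w):
    """P(group 1 | x) with equal priors and equal-variance Gaussian groups."""
    m1, m2, var = axis_stats(g1, g2, w)
    y = w[0] * x[0] + w[1] * x[1]
    log_odds = ((y - m2) ** 2 - (y - m1) ** 2) / (2 * var)
    # Numerically stable logistic function.
    if log_odds >= 0:
        return 1.0 / (1.0 + math.exp(-log_odds))
    e = math.exp(log_odds)
    return e / (1.0 + e)


# Hypothetical scores of isolates from two groups on two retained PCs.
g1 = [[0, 0], [1, 1], [1, 0], [0, 1]]
g2 = [[3, 0], [4, 1], [4, 0], [3, 1]]
w = discriminant_axis(g1, g2)
m1, m2, var = axis_stats(g1, g2, w)
print("between/within ratio:", (m1 - m2) ** 2 / var)
print("P(group 1) near group 1:", membership_prob([0.5, 0.5], g1, g2, w))
print("P(group 1) at the midpoint:", membership_prob([2.0, 0.5], g1, g2, w))
```

Membership probabilities near 0 or 1 indicate clear assignments. Values near 0.5 flag isolates lying between groups.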
DAPC of seasonal influenza (A/H3N2) data
Strong temporal signal; the 2006 isolates stand out (new alleles).
30/34
Identifying antigenic clusters in influenza (A/H3N2)
Antigenic clusters identified directly from amino acid (AA) sequences.
31/34
DAPC to identify structuring alleles
DAPC finds the combinations of alleles that differ most between groups.
Simulated data (Jombart & Ahmed 2011, Bioinformatics):
• 2 clusters, 50 isolates each
• 1,000,000 non-structured SNPs
• 1,000 structured SNPs (i.e. with different frequencies between groups)
Possible applications to pathogen genome-wide association studies (GWAS), e.g. SNPs related to antibiotic resistance in bacteria.
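In DAPC, the loadings of each allele show how much it contributes to a discriminant function. The sketch below uses a simpler stand-in: a diagonal linear discriminant, in which each SNP is scaled by its pooled within-group variance and correlations between SNPs are ignored. SNPs are then ranked by their share of the squared weights. The simulation mimics the design above at a much smaller, hypothetical scale (200 non-structured and 5 structured SNPs) so that it runs quickly in pure Python.

```python
import random


def simulate(n_per_group=50, n_neutral=200, n_structured=5, seed=1):
    """Hypothetical haploid 0/1 SNP calls for two groups of isolates.

    Non-structured SNPs have frequency 0.5 in both groups; structured SNPs
    (the last n_structured columns) have frequency 0.1 vs 0.9.
    """
    rng = random.Random(seed)

    def isolate(freq_structured):
        neutral = [int(rng.random() < 0.5) for _ in range(n_neutral)]
        structured = [int(rng.random() < freq_structured)
                      for _ in range(n_structured)]
        return neutral + structured

    g1 = [isolate(0.1) for _ in range(n_per_group)]
    g2 = [isolate(0.9) for _ in range(n_per_group)]
    return g1, g2


def snp_contributions(g1, g2, eps=1e-12):
    """Share of each SNP in a diagonal discriminant, w_j = (m1 - m2) / s2_j."""
    n1, n2 = len(g1), len(g2)
    weights = []
    for j in range(len(g1[0])):
        a = [row[j] for row in g1]
        b = [row[j] for row in g2]
        m1, m2 = sum(a) / n1, sum(b) / n2
        s2 = (sum((x - m1) ** 2 for x in a)
              + sum((x - m2) ** 2 for x in b)) / (n1 + n2 - 2)
        weights.append(((m1 - m2) / max(s2, eps)) ** 2)
    total = sum(weights)
    return [w / total for w in weights]


g1, g2 = simulate()
contrib = snp_contributions(g1, g2)
top5 = sorted(range(len(contrib)), key=contrib.__getitem__, reverse=True)[:5]
print(sorted(top5))  # [200, 201, 202, 203, 204]: the structured SNPs
```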
32/34
Limits of multivariate analysis
Methicillin-resistant Staphylococcus aureus (MRSA) outbreak within a hospital, Thailand: ∼200 full-genome sequences, ∼1,000 SNPs.
Observations:
• greater diversity than expected
• genetic clusters can be defined
• transmission occurs at the within-cluster level
• multivariate analysis = loss of information
Multivariate analysis is usually not informative about small-scale processes.
33/34
Summary
• multivariate analysis has been used for ∼50 years in genetics and is still an active field of methodological development
• increasingly useful as datasets grow
• specific applications to pathogen genetic data
• limits reached when reconstructing fine-scale processes
• more at: http://adegenet.r-forge.r-project.org/
34/34