

Dmitry Grapov (1,2), John W. Newman (2,3)

(1) Agricultural and Environmental Chemistry, University of California Davis
(2) Obesity & Metabolism Research Unit, USDA-ARS Western Human Nutrition Research Center, Davis, CA
(3) Nutrition, University of California Davis, Davis, CA

Exploring Data Multidimensionally

This work was supported by the National Institutes of Health [T32-GM008799, R01DK078328-01] and the United States Department of Agriculture [5306-51530-019-00D].

imDEV: interactive modules for Data Exploration and Visualization (http://sourceforge.net/projects/imdev/)

Type 2 Diabetes Trends

[Figure: Type 2 diabetes trends for 1994, 2000 and 2009. Legend: No Data, <4.5%, 4.5-5.9%, 6.0-7.4%, 7.5-8.9%, >9.0%]

Hypotheses: Environment + Genotype = Disease Phenotype

Unknowns: Markers of Disease Processes

Gathering Information: Quantitative Metabolomics

Interpretation: Multivariate Data Analysis and Visualization

Mathematical Dimensions: 1-D, 2-D, n-D

Connection

Screenshot of the imDEV interface embedded in MS Excel, using R to generate: A) A trellis plot of PCA scores and loadings for the first three components. Subject scores (bottom left) are colored by gender and sized relative to each individual's CRF value, with an outlier highlighted in red; loadings (top right) are sized by each variable's p-value from a Mann-Whitney U test for gender. B) Variable distributions and a scatter plot matrix used to evaluate the effects of the covariate adjustment. C) Overview plots used to evaluate the gender-adjusted PLS predictive model for CRF performance by comparing the model's Q2 and RMSEP statistics to their respective permuted null distributions. D) Multidimensionally scaled two- and three-dimensional variable correlation networks used to visualize intercorrelations among variables and variable groups (polygons).
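The permutation comparison described in panel C can be illustrated with a minimal sketch. This is not imDEV's R implementation: to stay within the Python standard library it swaps the PLS model's Q2 and RMSEP for a much simpler statistic, the absolute Pearson correlation between one predictor and the response. The function names (`pearson_r`, `permutation_p_value`) and the synthetic data are hypothetical and chosen only for illustration.

```python
import random
import statistics


def pearson_r(x, y):
    """Pearson correlation between two equal-length sequences."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def permutation_p_value(x, y, n_permutations=999, seed=0):
    """Compare the observed |r| to a null distribution built by shuffling y.

    Shuffling the response breaks any real association with x, so the
    permuted statistics show what the statistic looks like by chance.
    """
    rng = random.Random(seed)
    observed = abs(pearson_r(x, y))
    shuffled = list(y)
    exceed = 0
    for _ in range(n_permutations):
        rng.shuffle(shuffled)
        if abs(pearson_r(x, shuffled)) >= observed:
            exceed += 1
    # Add-one correction so the estimated p-value is never exactly zero.
    return observed, (exceed + 1) / (n_permutations + 1)


# Synthetic data (not from the poster): y depends on x plus noise.
data_rng = random.Random(1)
x = [data_rng.gauss(0, 1) for _ in range(40)]
y = [2.0 * a + data_rng.gauss(0, 1) for a in x]

observed, p_value = permutation_p_value(x, y)
print(f"|r| = {observed:.3f}, permutation p = {p_value:.3f}")
```

The same pattern would apply to a model-level statistic: refit the model on each permuted response, record the statistic, and see where the real model's value falls in that null distribution.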

Prediction

Projection

Correlation

Visual

Analytical Dimensions

Complex pathological states often involve systems of perturbations. In order to fully characterize, diagnose and treat these maladies, researchers are turning to ‘Omic’ methods to measure large arrays of genetic and biochemical reporters. Advances in analytical technologies are increasingly shifting the bottleneck for scientific discovery to data management, analysis and interpretation.

The challenge of ‘Big Data’ may be met through multivariate data analysis and visualization, which allow many variables to be analyzed simultaneously. Interactive modules for Data Exploration and Visualization (imDEV; http://sourceforge.net/apps/mediawiki/imdev) is an open source graphical user interface for multidimensional data analysis and visualization.

Biological