My first 100 Tb of data
![Page 1: My first 100 Tb of data](https://reader035.fdocuments.us/reader035/viewer/2022081514/56816094550346895dcfbb91/html5/thumbnails/1.jpg)
My first 100 Tb of data
STATISTICAL METHODS FOR NEW TECHNOLOGY WORKING GROUP
Ciprian M. Crainiceanu, Johns Hopkins University
http://www.biostat.jhsph.edu/smnt
![Page 2: My first 100 Tb of data](https://reader035.fdocuments.us/reader035/viewer/2022081514/56816094550346895dcfbb91/html5/thumbnails/2.jpg)
Members of the group
• Key personnel
  • C.M. Crainiceanu, B.S. Caffo, A.-M. Staicu, S. Greven, D. Ruppert, C.-Z. Di
• Senior students
  • V. Zipunnikov, J.-A. Goldsmith
• Other statisticians (>20)
• Scientific collaborators
  • Direct collaboration
  • Solving important scientific problems
  • Diverse scientific applications
![Page 3: My first 100 Tb of data](https://reader035.fdocuments.us/reader035/viewer/2022081514/56816094550346895dcfbb91/html5/thumbnails/3.jpg)
Scientific Collaborators
• Susan Bassett – fMRI, Alzheimer’s
• Danny Reich – DTI, DCE-MRI, MS
• Brian Schwartz – lead exposure, VBM, DTI, white matter imaging
• Stewart Mostofsky – fMRI, rsfcMRI, autism, ADHD, Tourette’s
• Naresh Punjabi – EEG, sleep, sleep diseases
• Dzung Pham / Pilou Bazin – cortical shape, thickness, lesion detection, MS
• Dean Wong – PET, fMRI, substance abuse
• Susan Resnick – BLSA
• Jerry Prince – BLSA, ADNI
• Jim Pekar, Peter Van Zijl – 7T MRI, fMRI, rsfcMRI preprocessing, scanner physics
• Christos Davatzikos – RAVENS
• Susumu Mori – DTI, tractography
• Dana Boatman – ECoG, EEG, epilepsy
• Graham Redgrave – fMRI, DTI, Huntington’s, anorexia/bulimia
• Tudor Badea, Bruno Jedynak – neuron classification, morphometry, 3D structure and shape
• Tom Glass – Gizmos
• Merck – EEG, neuroimaging
• Pfizer – imaging biomarkers?
![Page 4: My first 100 Tb of data](https://reader035.fdocuments.us/reader035/viewer/2022081514/56816094550346895dcfbb91/html5/thumbnails/4.jpg)
Observational Studies 2.0
![Page 5: My first 100 Tb of data](https://reader035.fdocuments.us/reader035/viewer/2022081514/56816094550346895dcfbb91/html5/thumbnails/5.jpg)
![Page 6: My first 100 Tb of data](https://reader035.fdocuments.us/reader035/viewer/2022081514/56816094550346895dcfbb91/html5/thumbnails/6.jpg)
![Page 7: My first 100 Tb of data](https://reader035.fdocuments.us/reader035/viewer/2022081514/56816094550346895dcfbb91/html5/thumbnails/7.jpg)
Longitudinal Functional Principal Component Analysis (LFPCA)
• I = 1000, J = 4, D = 100: 15 min
• I = 1000, J = 8, D = 200: 70 min
Greven, Crainiceanu, Caffo, Reich, 2010. LFPCA, EJS, to appear
![Page 8: My first 100 Tb of data](https://reader035.fdocuments.us/reader035/viewer/2022081514/56816094550346895dcfbb91/html5/thumbnails/8.jpg)
A simple regression formula
• Data compression via longitudinal PCA
• Method-of-moments (MoM) estimators of covariance matrices, smoothing
• Need: all covariance operators
• Solution: regress $Y_{ij}(d)Y_{ik}(d')$ on $1$, $T_{ik}$, $T_{ij}$, $T_{ik}T_{ij}$, $\delta_{jk}$
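The regression in the last bullet can be sketched in NumPy. This is a minimal illustration, not the published LFPCA code: it assumes a demeaned random-intercept, random-slope model $Y_{ij}(d) = X_{i0}(d) + X_{i1}(d)T_{ij} + U_{ij}(d) + \epsilon_{ij}(d)$, takes $\delta_{jk}$ to be 1 when $j = k$ and 0 otherwise, fits all $(d, d')$ pairs with one least-squares call, and skips the smoothing step. The function name `lfpca_cov_regression` and the output names are hypothetical.

```python
import numpy as np


def lfpca_cov_regression(Y, T):
    """Method-of-moments covariance estimates from one least-squares fit.

    Y : (I, J, D) array of demeaned observations Y_ij(d)
    T : (I, J) array of visit times T_ij
    Returns (K0, K01, K10, K1, KU), each (D, D), under the assumed model
    E[Y_ij(d) Y_ik(d')] = K0 + K01*T_ik + K10*T_ij + K1*T_ij*T_ik + KU*delta_jk.
    KU absorbs any white-noise variance on its diagonal.
    """
    I, J, D = Y.shape
    design, response = [], []
    for i in range(I):
        for j in range(J):
            for k in range(J):
                t_ij, t_ik = T[i, j], T[i, k]
                design.append([1.0, t_ik, t_ij, t_ik * t_ij, float(j == k)])
                # All D*D cross products Y_ij(d) * Y_ik(d') for this (i, j, k)
                response.append(np.outer(Y[i, j], Y[i, k]).ravel())
    X = np.asarray(design)    # (I*J*J, 5) shared design matrix
    R = np.asarray(response)  # (I*J*J, D*D), one column per (d, d') pair
    coef, *_ = np.linalg.lstsq(X, R, rcond=None)
    K0, K01, K10, K1, KU = (c.reshape(D, D) for c in coef)
    return K0, K01, K10, K1, KU
```

Because every $(d, d')$ pair shares the same design, a single solve with $D^2$ right-hand sides replaces $D^2$ separate regressions. The sketch still forms all $D^2$ cross products explicitly, so it is only practical for small $D$.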
![Page 9: My first 100 Tb of data](https://reader035.fdocuments.us/reader035/viewer/2022081514/56816094550346895dcfbb91/html5/thumbnails/9.jpg)
Variance explained (FA, 3 years of longitudinal data)
![Page 10: My first 100 Tb of data](https://reader035.fdocuments.us/reader035/viewer/2022081514/56816094550346895dcfbb91/html5/thumbnails/10.jpg)
Longitudinal Penalized Functional Regression
![Page 11: My first 100 Tb of data](https://reader035.fdocuments.us/reader035/viewer/2022081514/56816094550346895dcfbb91/html5/thumbnails/11.jpg)
LPFR: recipe and ingredients
![Page 12: My first 100 Tb of data](https://reader035.fdocuments.us/reader035/viewer/2022081514/56816094550346895dcfbb91/html5/thumbnails/12.jpg)
PASAT/MD (corpus callosum), PD (corticospinal tract)
![Page 13: My first 100 Tb of data](https://reader035.fdocuments.us/reader035/viewer/2022081514/56816094550346895dcfbb91/html5/thumbnails/13.jpg)
Functional regression
• No paper on longitudinal functional regression
• No paper published with this data structure
• Longitudinal extensions are not “simple”
• Technical details are hard without the correct “recipe” for known and published “ingredients”
• No available method that scales up
Goldsmith, Feder, Crainiceanu, Caffo, Reich, 2010. PFR, JCGS, to appear
Goldsmith, Crainiceanu, Caffo, Reich, 2010. LPFR, to appear?
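As a rough illustration of the cross-sectional building block behind functional regression, a scalar outcome can be regressed on a function by discretizing $\int X_i(t)\beta(t)\,dt$ on an equally spaced grid and penalizing the roughness of $\beta$. This is not the PFR or LPFR estimator from the papers cited above; the grid representation of $\beta$, the squared second-difference penalty, the fixed weight `lam` and the function name are all assumptions of this sketch.

```python
import numpy as np


def penalized_functional_regression(y, X, t, lam):
    """Scalar-on-function regression with a roughness penalty on beta(t).

    y   : (n,) scalar outcomes
    X   : (n, p) functional predictors X_i(t) sampled on the grid t
    t   : (p,) equally spaced grid
    lam : weight on the squared second differences of beta
    Returns (alpha, beta) with beta evaluated on t.
    """
    n, p = X.shape
    h = t[1] - t[0]                            # grid spacing (Riemann sum)
    Z = np.hstack([np.ones((n, 1)), X * h])    # intercept + integral approximation
    D2 = np.diff(np.eye(p), n=2, axis=0)       # (p-2, p) second-difference operator
    Pen = np.zeros((p + 1, p + 1))
    Pen[1:, 1:] = D2.T @ D2                    # the intercept is not penalized
    coef = np.linalg.solve(Z.T @ Z + lam * Pen, Z.T @ y)
    return coef[0], coef[1:]
```

The penalty shrinks $\beta$ toward a straight line, so a linear coefficient function is recovered without bias; larger `lam` trades variance for smoothness.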
![Page 14: My first 100 Tb of data](https://reader035.fdocuments.us/reader035/viewer/2022081514/56816094550346895dcfbb91/html5/thumbnails/14.jpg)
Population Value Decomposition (PVD)
![Page 15: My first 100 Tb of data](https://reader035.fdocuments.us/reader035/viewer/2022081514/56816094550346895dcfbb91/html5/thumbnails/15.jpg)
PVD
$Y_i = P V_i D + E_i$
• $P$ is $T \times A$
• $D$ is $B \times F$
• $V_i$ is $A \times B$
• $A \ll T$, $B \ll F$
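If the columns of $P$ and the rows of $D$ are orthonormal (assumed here, which holds when both come from an SVD), the subject-level matrix $V_i$ is recovered by projecting $Y_i$ onto the two bases, and each subject is stored as an $A \times B$ matrix instead of $T \times F$:

```latex
\begin{aligned}
P^\top P &= I_A, \qquad D D^\top = I_B,\\
\widehat{V}_i &= P^\top Y_i D^\top = V_i + P^\top E_i D^\top \quad (A \times B),\\
\text{storage per subject: } & T F \;\longrightarrow\; A B, \qquad A \ll T,\; B \ll F.
\end{aligned}
```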
![Page 16: My first 100 Tb of data](https://reader035.fdocuments.us/reader035/viewer/2022081514/56816094550346895dcfbb91/html5/thumbnails/16.jpg)
Singular Value Decomposition (SVD) summarizes variance
Figure (one subject): subject-specific data (time × frequency) = eigenvariates × diagonal matrix × eigenfrequencies.
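A minimal sketch of the single-subject SVD, assuming the subject's data is a time-by-frequency matrix `Y`. The function name is hypothetical. "Variance" here means the share of the total sum of squares, which equals variance explained only if `Y` has been centered beforehand.

```python
import numpy as np


def svd_variance_summary(Y, k):
    """Rank-k SVD summary of one subject's time-by-frequency matrix Y.

    Returns the rank-k approximation and the fraction of the total sum of
    squares (squared Frobenius norm) captured by each singular value.
    """
    # Y = U diag(s) Vt: columns of U are eigenvariates (over time),
    # rows of Vt are eigenfrequencies (over frequency).
    U, s, Vt = np.linalg.svd(Y, full_matrices=False)
    frac = s**2 / np.sum(s**2)
    Y_k = U[:, :k] @ np.diag(s[:k]) @ Vt[:k, :]
    return Y_k, frac
```

Keeping the leading `k` components is the best rank-`k` approximation in squared error, and the discarded share of the sum of squares is exactly `1 - frac[:k].sum()`.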
![Page 17: My first 100 Tb of data](https://reader035.fdocuments.us/reader035/viewer/2022081514/56816094550346895dcfbb91/html5/thumbnails/17.jpg)
Caffo BS, Crainiceanu CM, Verduzco G, Joel SE, Mostofsky SH, Bassett SS, Pekar JJ. Two-Stage decompositions for the analysis of functional connectivity for fMRI with application to Alzheimer’s disease risk. NeuroImage (In Press).
Default PVD
• (Start here) SVD of each subject’s data: low-rank approximation with eigenvariates and eigenfrequencies
• Stack the subject-level bases across subjects and take the SVD: population decomposition
• Project the original subject-specific data onto the population bases
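The default PVD pipeline can be sketched as two rounds of SVD followed by a projection. This is an illustration under stated assumptions, not the authors' implementation: it stacks the unweighted leading singular vectors of each subject (the published method may weight or stack them differently), and the function name and the truncation parameters `L`, `R`, `A`, `B` are hypothetical.

```python
import numpy as np


def pvd(Ys, L, R, A, B):
    """Two-stage population value decomposition sketch.

    Ys   : list of (T, F) subject matrices Y_i
    L, R : number of left / right singular vectors kept per subject
    A, B : number of population basis vectors (A <= T, B <= F)
    Returns P (T, A), D (B, F) and the list of V_i = P^T Y_i D^T (A, B).
    """
    lefts, rights = [], []
    for Y in Ys:
        # Stage 1: subject-level SVD, keep the leading singular vectors
        U, s, Vt = np.linalg.svd(Y, full_matrices=False)
        lefts.append(U[:, :L])        # (T, L) eigenvariates
        rights.append(Vt[:R, :].T)    # (F, R) eigenfrequencies
    # Stage 2: stack across subjects and take the SVD of each stack
    P = np.linalg.svd(np.hstack(lefts), full_matrices=False)[0][:, :A]
    D = np.linalg.svd(np.hstack(rights), full_matrices=False)[0][:, :B].T
    # Project each subject's original data onto the population bases
    Vs = [P.T @ Y @ D.T for Y in Ys]
    return P, D, Vs
```

Only the stacked bases, of size $T \times nL$ and $F \times nR$, are decomposed at the population level, so no matrix that grows with $T \times F \times n$ is ever formed.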
![Page 18: My first 100 Tb of data](https://reader035.fdocuments.us/reader035/viewer/2022081514/56816094550346895dcfbb91/html5/thumbnails/18.jpg)
Population eigenimages
![Page 19: My first 100 Tb of data](https://reader035.fdocuments.us/reader035/viewer/2022081514/56816094550346895dcfbb91/html5/thumbnails/19.jpg)
Currently:
• Deploying PVD to the 1000 Functional Connectomes Project (http://www.nitrc.org/projects/fcon_1000/)
• Comparing rsfcMRI in stroke versus normal subjects
![Page 20: My first 100 Tb of data](https://reader035.fdocuments.us/reader035/viewer/2022081514/56816094550346895dcfbb91/html5/thumbnails/20.jpg)
HD-MFPCA/RAVENS Images
![Page 21: My first 100 Tb of data](https://reader035.fdocuments.us/reader035/viewer/2022081514/56816094550346895dcfbb91/html5/thumbnails/21.jpg)
Multilevel Functional Principal Component Analysis (MFPCA)
![Page 22: My first 100 Tb of data](https://reader035.fdocuments.us/reader035/viewer/2022081514/56816094550346895dcfbb91/html5/thumbnails/22.jpg)
MFPCA
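The slides give no formulas for MFPCA. For reference, here is a common two-level formulation, stated as an assumption: subject $i$, visit $j$, functional argument $t$, a subject-level process $Z_i$ and a visit-level process $W_{ij}$ that are uncorrelated and have mean zero. Cross products from different visits of the same subject identify the between-subject covariance; the within-subject covariance is the remainder of the total.

```latex
\begin{aligned}
Y_{ij}(t) &= \mu(t) + \eta_j(t) + Z_i(t) + W_{ij}(t) + \epsilon_{ij}(t),\\
K_B(s,t) &= \operatorname{Cov}\{Y_{ij}(s), Y_{ik}(t)\}, \qquad j \neq k,\\
K_T(s,t) &= \operatorname{Cov}\{Y_{ij}(s), Y_{ij}(t)\} = K_B(s,t) + K_W(s,t) + \sigma^2 \mathbf{1}\{s = t\},\\
K_B(s,t) &= \textstyle\sum_{k} \lambda^{(1)}_k \phi^{(1)}_k(s)\,\phi^{(1)}_k(t), \qquad
K_W(s,t) = \textstyle\sum_{l} \lambda^{(2)}_l \phi^{(2)}_l(s)\,\phi^{(2)}_l(t).
\end{aligned}
```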
![Page 23: My first 100 Tb of data](https://reader035.fdocuments.us/reader035/viewer/2022081514/56816094550346895dcfbb91/html5/thumbnails/23.jpg)
HD-MFPCA
![Page 24: My first 100 Tb of data](https://reader035.fdocuments.us/reader035/viewer/2022081514/56816094550346895dcfbb91/html5/thumbnails/24.jpg)
HD-MFPCA, Step 1
![Page 25: My first 100 Tb of data](https://reader035.fdocuments.us/reader035/viewer/2022081514/56816094550346895dcfbb91/html5/thumbnails/25.jpg)
HD-MFPCA, Step 2
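The slides show only the step titles, so the following sketch rests on assumptions. Step 1 is taken to be a thin SVD of the $D \times N$ matrix of all $N = IJ$ demeaned images (one column per subject-visit). Step 2 computes the between-subject moment estimator (the average of cross products at different visits of the same subject) on the $N$-dimensional SVD scores and maps the eigenvectors back through the left singular vectors. Because that estimator is linear in each argument, the result matches the direct $D \times D$ computation while never forming a $D \times D$ matrix. A balanced design with $J \ge 2$ visits is assumed; the function names are hypothetical.

```python
import numpy as np


def between_cov(C):
    """Average of c_ij c_ik^T over all visit pairs j != k of each subject.

    C : (I, J, p) array of demeaned vectors (subject i, visit j).
    """
    I, J, p = C.shape
    S = C.sum(axis=1)                          # (I, p): sum over visits
    total = np.einsum('ip,iq->pq', S, S)       # all (j, k) pairs
    same = np.einsum('ijp,ijq->pq', C, C)      # j == k pairs only
    return (total - same) / (I * J * (J - 1))


def hd_mfpca_between(Y, n_comp):
    """HD-MFPCA sketch for the level-1 (between-subject) eigenimages.

    Y : (I, J, D) demeaned data, D possibly very large.
    Returns eigenvalues (n_comp,) and eigenimages (D, n_comp).
    """
    I, J, D = Y.shape
    M = Y.reshape(I * J, D).T                  # (D, N), one column per image
    # Step 1: thin SVD, M = V diag(s) Ut; the scores are N-dimensional
    V, s, Ut = np.linalg.svd(M, full_matrices=False)
    scores = (s[:, None] * Ut).T.reshape(I, J, -1)
    # Step 2: moment estimator and eigendecomposition in score space
    Kb = between_cov(scores)
    evals, Q = np.linalg.eigh(Kb)
    order = np.argsort(evals)[::-1][:n_comp]
    # Map eigenvectors back to image space: phi = V q
    return evals[order], V @ Q[:, order]
```

The expensive part is the thin SVD in Step 1; everything in Step 2 works with $N \times N$ matrices, where $N$ is the number of images rather than the number of voxels.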
![Page 26: My first 100 Tb of data](https://reader035.fdocuments.us/reader035/viewer/2022081514/56816094550346895dcfbb91/html5/thumbnails/26.jpg)
![Page 27: My first 100 Tb of data](https://reader035.fdocuments.us/reader035/viewer/2022081514/56816094550346895dcfbb91/html5/thumbnails/27.jpg)
Main message, backed by 100 Tb of data
• Eventually, good technology makes it into observational studies and clinical trials
• Longitudinal/Multilevel FDA is the natural next step in FDA
• Data is changing the way we do business: availability, size, complexity
• Likely: funding will be based much more on relevance than on technical ability