Accelerating Sparse Canonical Correlation Analysis for Large Brain Imaging Genetics Data
![Page 1: Accelerating Sparse Canonical Correlation Analysis for Large Brain Imaging Genetics Data](https://reader034.fdocuments.us/reader034/viewer/2022051317/568161eb550346895dd21d1a/html5/thumbnails/1.jpg)
ACCELERATING SPARSE CANONICAL CORRELATION ANALYSIS FOR LARGE BRAIN IMAGING GENETICS DATA
Jingwen Yan, Hui Zhang, Lei Du, Eric Wernert, Andrew J. Saykin, Li Shen
![Page 2: Accelerating Sparse Canonical Correlation Analysis for Large Brain Imaging Genetics Data](https://reader034.fdocuments.us/reader034/viewer/2022051317/568161eb550346895dd21d1a/html5/thumbnails/2.jpg)
OUTLINE
• Imaging Genetics
• Sparse Canonical Correlation Analysis (SCCA)
• Computational Challenges and Methods
• Data Simulation
• Experimental Results
![Page 3: Accelerating Sparse Canonical Correlation Analysis for Large Brain Imaging Genetics Data](https://reader034.fdocuments.us/reader034/viewer/2022051317/568161eb550346895dd21d1a/html5/thumbnails/3.jpg)
IMAGING GENETICS
[Figure: levels of analysis: Genes, Cells, Systems, Behavior (complex interactions, phenomena, disorders, diseases). Source: UCI, S. Potkin et al.]
![Page 4: Accelerating Sparse Canonical Correlation Analysis for Large Brain Imaging Genetics Data](https://reader034.fdocuments.us/reader034/viewer/2022051317/568161eb550346895dd21d1a/html5/thumbnails/4.jpg)
IMAGING GENETICS
Underlying Biological Pathway and Mechanism
![Page 5: Accelerating Sparse Canonical Correlation Analysis for Large Brain Imaging Genetics Data](https://reader034.fdocuments.us/reader034/viewer/2022051317/568161eb550346895dd21d1a/html5/thumbnails/5.jpg)
IMAGING GENETICS
[Figure: prior studies arranged by imaging scale (Single ROI, Circuit, Whole Brain) and by genetic scale (Candidate Gene/SNP, Biological Pathway, Genome-wide)]
• Risacher et al. 2010; Sloan et al. 2010; Potkin et al. 2009; Saykin et al. 2010
• Risacher et al. 2013: AV45 ROIs & APOE
• Swaminathan et al. 2012: PiB ROIs & amyloid pathway
• Potkin et al. 2009, Mol Psych: schizophrenia study
• Ho et al. 2010: FTO; Reiman et al., PNAS 2009
• Chiang et al. 2012: SNP/gene networks & WM integrity
• Shen et al. 2010: ROIs; Stein et al. 2010: voxels
![Page 6: Accelerating Sparse Canonical Correlation Analysis for Large Brain Imaging Genetics Data](https://reader034.fdocuments.us/reader034/viewer/2022051317/568161eb550346895dd21d1a/html5/thumbnails/6.jpg)
![Page 7: Accelerating Sparse Canonical Correlation Analysis for Large Brain Imaging Genetics Data](https://reader034.fdocuments.us/reader034/viewer/2022051317/568161eb550346895dd21d1a/html5/thumbnails/7.jpg)
SCCA
[Figure: three ways to relate the features X1, ..., Xn to the features Y1, ..., Yn]
• Massive Univariate Analysis
• Multivariate Multiple Regression (W'X)
• Canonical Correlation Analysis (correlation R between Xu and Yv)
![Page 8: Accelerating Sparse Canonical Correlation Analysis for Large Brain Imaging Genetics Data](https://reader034.fdocuments.us/reader034/viewer/2022051317/568161eb550346895dd21d1a/html5/thumbnails/8.jpg)
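SCCA
• Sparse canonical correlation analysis (SCCA)
  • R package: Penalized Multivariate Analysis (PMA) (Witten et al., 2009)
  • Problem: $\max_{u,v}\; u^T X^T Y v$ subject to $\|u\|_2^2 = 1,\ \|v\|_2^2 = 1,\ P_1(u) \le c_1,\ P_2(v) \le c_2$
• $X$, $Y$: imaging and genetics data, respectively
• $P_1$, $P_2$: sparse penalties, mostly the $\ell_1$ norm
• For simplicity, assuming $X^T X = I$ and $Y^T Y = I$
• Bi-convex and non-differentiable problem
• Iterative solution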
![Page 9: Accelerating Sparse Canonical Correlation Analysis for Large Brain Imaging Genetics Data](https://reader034.fdocuments.us/reader034/viewer/2022051317/568161eb550346895dd21d1a/html5/thumbnails/9.jpg)
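SCCA
• Sparse canonical correlation analysis (SCCA)
  • Problem: $\max_{u,v}\; u^T X^T Y v$ subject to $\|u\|_2^2 = 1,\ \|v\|_2^2 = 1,\ \|u\|_1 \le c_1,\ \|v\|_1 \le c_2$
• Iterative solution
  1. $u \leftarrow \arg\max_u\; u^T X^T Y v$ subject to $\|u\|_2^2 = 1,\ \|u\|_1 \le c_1$
  2. $v \leftarrow \arg\max_v\; u^T X^T Y v$ subject to $\|v\|_2^2 = 1,\ \|v\|_1 \le c_2$
• Each step has the closed form $u = S(X^T Y v, \Delta_1) / \|S(X^T Y v, \Delta_1)\|_2$ (analogously for $v$ with $Y^T X u$ and $\Delta_2$), where $S(a, \Delta) = \mathrm{sgn}(a)\,(|a| - \Delta)_+$ is the soft thresholding operator and $\Delta$ is chosen so that the $\ell_1$ constraint holds ($\Delta = 0$ if it already does)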
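The alternating soft-thresholding scheme on this slide can be sketched in a few lines. The block below is a minimal pure-Python illustration, not the PMA implementation: it assumes L1 penalties with bounds `c1` and `c2` (hypothetical names, each at least 1), finds the threshold by binary search, and starts from a uniform `v` instead of a singular vector. The toy data and all function names are made up for the example.

```python
import math
import random

def soft_threshold(a, delta):
    """Elementwise S(a, delta) = sign(a) * max(|a| - delta, 0)."""
    return [math.copysign(max(abs(x) - delta, 0.0), x) for x in a]

def l2_normalize(a):
    norm = math.sqrt(sum(x * x for x in a))
    return [x / norm for x in a] if norm > 0 else list(a)

def sparse_update(a, c, iters=60):
    """Return S(a, delta) / ||S(a, delta)||_2 for the smallest delta >= 0
    (found by binary search) whose result has L1 norm at most c."""
    u = l2_normalize(a)
    if sum(abs(x) for x in u) <= c:
        return u  # delta = 0 already satisfies the L1 constraint
    lo, hi = 0.0, max(abs(x) for x in a)
    for _ in range(iters):
        mid = (lo + hi) / 2.0
        u = l2_normalize(soft_threshold(a, mid))
        if sum(abs(x) for x in u) > c:
            lo = mid
        else:
            hi = mid
    return l2_normalize(soft_threshold(a, hi))

def matvec(M, v):
    return [sum(m * x for m, x in zip(row, v)) for row in M]

def transpose(M):
    return [list(col) for col in zip(*M)]

def scca_first_pair(X, Y, c1, c2, n_iter=20):
    """Alternate the u and v updates on K = X^T Y (assumes X^T X = I, Y^T Y = I)."""
    Xt, Yt = transpose(X), transpose(Y)
    K = [[sum(a * b for a, b in zip(xcol, ycol)) for ycol in Yt] for xcol in Xt]
    Kt = transpose(K)
    v = l2_normalize([1.0] * len(Kt))  # simplification: uniform start
    u = []
    for _ in range(n_iter):
        u = sparse_update(matvec(K, v), c1)
        v = sparse_update(matvec(Kt, u), c2)
    return u, v

# Toy data: column 0 of X and column 1 of Y share a latent signal z.
rng = random.Random(0)
n = 100
z = [rng.gauss(0, 1) for _ in range(n)]
X = [[3 * z[i] + rng.gauss(0, 1)] + [rng.gauss(0, 1) for _ in range(3)] for i in range(n)]
Y = [[rng.gauss(0, 1), 3 * z[i] + rng.gauss(0, 1), rng.gauss(0, 1)] for i in range(n)]
u, v = scca_first_pair(X, Y, c1=1.2, c2=1.2)
print("u =", [round(x, 3) for x in u])
print("v =", [round(x, 3) for x in v])
```

With tight L1 bounds the weight vectors concentrate on the two columns that carry the shared signal, which is the sparsity the penalties are meant to induce.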
![Page 10: Accelerating Sparse Canonical Correlation Analysis for Large Brain Imaging Genetics Data](https://reader034.fdocuments.us/reader034/viewer/2022051317/568161eb550346895dd21d1a/html5/thumbnails/10.jpg)
![Page 11: Accelerating Sparse Canonical Correlation Analysis for Large Brain Imaging Genetics Data](https://reader034.fdocuments.us/reader034/viewer/2022051317/568161eb550346895dd21d1a/html5/thumbnails/11.jpg)
COMPUTATIONAL CHALLENGES
• Example SCCA run at a small scale
  • Participants: 1,000
  • Genotype: 3,200 SNPs
  • Phenotype: 10,000 voxels
  • Permutation: 10,000 permutation tests
  • Running time: more than 12,000 hours
• Scale up
  • Genotype (array): 6M SNPs
  • Genotype (NGS): 40M variants
  • Phenotype: 200K voxels, imaging, cognitive and biomarker
  • Permutation: 10M permutations to reach p = 10^-7
• Parameter tuning via cross-validation
  • 10-fold cross-validation coupled with an 11-by-11 grid search
  • SCCA runs: 10×11×11 = 1,210
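The run count and the permutation budget on this slide follow from simple arithmetic, which the short sketch below reproduces. The grid values are hypothetical placeholders (the slide gives only the grid size), and the p-value floor uses the common (b + 1) / (B + 1) permutation estimate, which is an assumption about the convention rather than something the slide states.

```python
from itertools import product

n_folds = 10
# Hypothetical 11-point grids for the two sparsity parameters.
grid_c1 = [i / 10 for i in range(11)]
grid_c2 = [i / 10 for i in range(11)]

# One SCCA fit per (fold, c1, c2) combination.
runs = list(product(range(n_folds), grid_c1, grid_c2))
n_runs = len(runs)

def min_p_value(n_perm):
    """Smallest p-value a test with n_perm permutations can report,
    using the (b + 1) / (n_perm + 1) estimate with b = 0 exceedances."""
    return 1.0 / (n_perm + 1)

print("SCCA runs for tuning:", n_runs)
print("p-value floor with 10,000 permutations:", min_p_value(10_000))
print("p-value floor with 10M permutations:", min_p_value(10_000_000))
```

Under this convention the resolution of a permutation test is roughly one over the number of permutations, which is why reaching p = 10^-7 takes on the order of 10M permutations.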
![Page 12: Accelerating Sparse Canonical Correlation Analysis for Large Brain Imaging Genetics Data](https://reader034.fdocuments.us/reader034/viewer/2022051317/568161eb550346895dd21d1a/html5/thumbnails/12.jpg)
ACCELERATION WITH MKL
• Intel Math Kernel Library (MKL)
  • Accelerates application performance and reduces development time
  • Highly vectorized and threaded linear algebra, fast Fourier transforms (FFT), vector math and statistics functions
• MKL has been optimized to utilize
  • multiple processing cores
  • wider vector units
  • more varied architectures available in a high-end system
• MKL can provide parallelism transparently and speed up programs with supported math routines without changing code.
• Compiling R with MKL
![Page 13: Accelerating Sparse Canonical Correlation Analysis for Large Brain Imaging Genetics Data](https://reader034.fdocuments.us/reader034/viewer/2022051317/568161eb550346895dd21d1a/html5/thumbnails/13.jpg)
ACCELERATION WITH OFFLOAD MODEL
• Xeon Phi SE10P Coprocessor
  • 61 cores with 8GB GDDR5
  • Intel x86 instruction set
  • Usage of familiar programming models, software, and tools
• Pros
  • The host system can partially offload its computing workload to the Xeon Phi
  • The Xeon Phi can also run a compatible program independently
![Page 14: Accelerating Sparse Canonical Correlation Analysis for Large Brain Imaging Genetics Data](https://reader034.fdocuments.us/reader034/viewer/2022051317/568161eb550346895dd21d1a/html5/thumbnails/14.jpg)
COMPUTATIONAL PLATFORM
• Texas Advanced Computing Center Stampede cluster
  • MKL + offload
• Each computing node
  • Two Intel Xeon E5-2680 processors, each with eight cores @ 2.7GHz
  • 32GB DDR3 memory
  • The Xeon Phi SE10P Coprocessor has 61 cores with 8GB GDDR5
  • The NVIDIA K20 GPUs on each node have 5GB of on-board GDDR5
• Software
  • CentOS 6.3
  • Stock R 3.0.1 package compiled with the Intel compilers (v.13) and built with MKL v.11
![Page 15: Accelerating Sparse Canonical Correlation Analysis for Large Brain Imaging Genetics Data](https://reader034.fdocuments.us/reader034/viewer/2022051317/568161eb550346895dd21d1a/html5/thumbnails/15.jpg)
![Page 16: Accelerating Sparse Canonical Correlation Analysis for Large Brain Imaging Genetics Data](https://reader034.fdocuments.us/reader034/viewer/2022051317/568161eb550346895dd21d1a/html5/thumbnails/16.jpg)
SYNTHETIC DATA (GENETICS)
• FREGENE genome simulator
  • Simulates sequence-like data over large genomic regions in large diploid populations
• Simulated data
  • N=1,000 diploid individuals over 20,000 generations
  • 10 Mb genome with an average mutation rate of 2.5e-8 per site per generation
  • 3,274 SNPs with minor allele frequency (MAF) greater than 0.05 included
  • Four SNP data sets (i.e., g500, g1000, g2000, and g3274), obtained by taking the first 500, 1,000, 2,000, and 3,274 SNPs from the entire data, respectively
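The MAF filter and the nested SNP subsets described on this slide can be illustrated as follows. This is only a sketch under stated assumptions: genotypes are coded as 0/1/2 allele counts per individual, the tiny matrix is invented for the example, and the helper names are hypothetical. It does not reproduce the FREGENE simulation itself.

```python
def minor_allele_frequency(column):
    """MAF of one SNP coded as 0/1/2 allele counts in diploid individuals."""
    freq = sum(column) / (2.0 * len(column))
    return min(freq, 1.0 - freq)

def filter_by_maf(genotypes, threshold=0.05):
    """Keep SNP columns whose MAF is strictly greater than the threshold."""
    n_snps = len(genotypes[0])
    keep = [
        j for j in range(n_snps)
        if minor_allele_frequency([row[j] for row in genotypes]) > threshold
    ]
    return [[row[j] for j in keep] for row in genotypes], keep

def first_k_snps(genotypes, k):
    """Nested subset: the first k SNP columns (as for g500, g1000, ...)."""
    return [row[:k] for row in genotypes]

# Invented toy matrix: 4 individuals x 3 SNPs; SNP 0 is monomorphic.
toy = [
    [0, 1, 2],
    [0, 1, 2],
    [0, 0, 2],
    [0, 1, 0],
]
filtered, kept = filter_by_maf(toy)
subset = first_k_snps(filtered, 1)
print("kept SNP indices:", kept)
print("first-1 subset:", subset)
```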
![Page 17: Accelerating Sparse Canonical Correlation Analysis for Large Brain Imaging Genetics Data](https://reader034.fdocuments.us/reader034/viewer/2022051317/568161eb550346895dd21d1a/html5/thumbnails/17.jpg)
SYNTHETIC DATA (GENETICS)
![Page 18: Accelerating Sparse Canonical Correlation Analysis for Large Brain Imaging Genetics Data](https://reader034.fdocuments.us/reader034/viewer/2022051317/568161eb550346895dd21d1a/html5/thumbnails/18.jpg)
SYNTHETIC DATA (IMAGING)
• Assumption
  • Each image has multiple regions of interest (ROIs)
  • Voxels within each ROI are highly correlated
• Simulation
  • Random positive definite covariance matrix with a non-overlapping group structure
  • Apply Cholesky decomposition to obtain the background imaging data
  • Individuals: N=1,000; image size: 100x100
  • We created three sets of phenotypic imaging data (i.e., p1000, p5000, and p10000), consisting of 1,000, 5,000 and 10,000 voxels respectively
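The Cholesky step on this slide turns independent Gaussian noise into voxels with a prescribed covariance. A minimal sketch, assuming a block-diagonal covariance with a fixed within-ROI correlation `rho` (a simplification of the random group-structured matrix described above; all names and sizes are hypothetical):

```python
import math
import random

def block_covariance(n_voxels, roi_size, rho):
    """1 on the diagonal, rho within an ROI, 0 across ROIs."""
    return [
        [1.0 if i == j else (rho if i // roi_size == j // roi_size else 0.0)
         for j in range(n_voxels)]
        for i in range(n_voxels)
    ]

def cholesky(A):
    """Lower-triangular L with L L^T = A for a positive definite A."""
    n = len(A)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(L[i][k] * L[j][k] for k in range(j))
            if i == j:
                L[i][j] = math.sqrt(A[i][i] - s)
            else:
                L[i][j] = (A[i][j] - s) / L[j][j]
    return L

def simulate_images(n_subjects, L, rng):
    """Each row is x = L z with z ~ N(0, I), so Cov(x) = L L^T."""
    n = len(L)
    data = []
    for _ in range(n_subjects):
        z = [rng.gauss(0, 1) for _ in range(n)]
        data.append([sum(L[i][k] * z[k] for k in range(i + 1)) for i in range(n)])
    return data

sigma = block_covariance(n_voxels=6, roi_size=3, rho=0.9)  # two ROIs of 3 voxels
chol = cholesky(sigma)
images = simulate_images(500, chol, random.Random(1))
print(len(images), "subjects x", len(images[0]), "voxels")
```

Voxels in the same ROI come out strongly correlated and voxels in different ROIs nearly uncorrelated, matching the assumption stated on the slide.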
![Page 19: Accelerating Sparse Canonical Correlation Analysis for Large Brain Imaging Genetics Data](https://reader034.fdocuments.us/reader034/viewer/2022051317/568161eb550346895dd21d1a/html5/thumbnails/19.jpg)
SYNTHETIC DATA (IMAGING)
![Page 20: Accelerating Sparse Canonical Correlation Analysis for Large Brain Imaging Genetics Data](https://reader034.fdocuments.us/reader034/viewer/2022051317/568161eb550346895dd21d1a/html5/thumbnails/20.jpg)
![Page 21: Accelerating Sparse Canonical Correlation Analysis for Large Brain Imaging Genetics Data](https://reader034.fdocuments.us/reader034/viewer/2022051317/568161eb550346895dd21d1a/html5/thumbnails/21.jpg)
RESULTS
• R snowfall package (sfLapply) with MKL and offload model
[Figure panels: Baseline; Parallel (MKL + offload)]
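The pattern behind sfLapply is a parallel map over independent tasks, which suits permutation testing because each permutation can be evaluated on its own. The sketch below is a Python analogue, not the R code used in this work: it uses a thread pool from the standard library, and a plain Pearson correlation stands in for the SCCA canonical correlation. All names and the toy data are hypothetical.

```python
import math
import random
from concurrent.futures import ThreadPoolExecutor

def pearson(a, b):
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    num = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    den = math.sqrt(sum((x - ma) ** 2 for x in a) * sum((y - mb) ** 2 for y in b))
    return num / den

def permuted_statistic(task):
    """One independent task: shuffle y with its own seed, recompute the statistic."""
    x, y, seed = task
    y_perm = list(y)
    random.Random(seed).shuffle(y_perm)
    return abs(pearson(x, y_perm))

def permutation_p_value(x, y, n_perm=200, workers=4):
    observed = abs(pearson(x, y))
    tasks = [(x, y, seed) for seed in range(n_perm)]
    # The map over tasks is the step a parallel lapply would distribute.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        null = list(pool.map(permuted_statistic, tasks))
    exceed = sum(1 for s in null if s >= observed)
    return (exceed + 1) / (n_perm + 1)

x = [float(i) for i in range(30)]
y = [2.0 * xi + 1.0 for xi in x]  # perfectly correlated toy signal
p = permutation_p_value(x, y)
print("permutation p-value:", p)
```

Threads keep the example self-contained; a process pool or cluster workers would be the closer match to how snowfall distributes work, since pure-Python threads do not run CPU-bound code in parallel.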
![Page 22: Accelerating Sparse Canonical Correlation Analysis for Large Brain Imaging Genetics Data](https://reader034.fdocuments.us/reader034/viewer/2022051317/568161eb550346895dd21d1a/html5/thumbnails/22.jpg)
RESULTS
• Accelerated SCCA implementations yielded the same results
• These correlation coefficients are close to the ground truth value of 1
Correlation coefficient between the first pair of canonical components
![Page 23: Accelerating Sparse Canonical Correlation Analysis for Large Brain Imaging Genetics Data](https://reader034.fdocuments.us/reader034/viewer/2022051317/568161eb550346895dd21d1a/html5/thumbnails/23.jpg)
RESULTS
![Page 24: Accelerating Sparse Canonical Correlation Analysis for Large Brain Imaging Genetics Data](https://reader034.fdocuments.us/reader034/viewer/2022051317/568161eb550346895dd21d1a/html5/thumbnails/24.jpg)
CONCLUSION
• Initial steps to accelerate the SCCA implementation for brain imaging genetics applications.
• Parallelism is achieved at the system implementation level, accelerating linear algebra computation with the Math Kernel Library (MKL) and partially offloading the computing workload.
• The 2-fold speedup, although encouraging, is still insufficient to handle extremely large-scale neuroimaging genetics data
  • millions of image voxels and millions of SNPs.
• Future work
  • Big data analytic strategies at the parallel computing model level
  • Parallelization of multiplicative algorithms using MapReduce and CUDA.
  • Application to accelerate enhanced SCCA models as well as other bi-multivariate statistical models for analyzing brain imaging genetics data.
![Page 25: Accelerating Sparse Canonical Correlation Analysis for Large Brain Imaging Genetics Data](https://reader034.fdocuments.us/reader034/viewer/2022051317/568161eb550346895dd21d1a/html5/thumbnails/25.jpg)
ACKNOWLEDGEMENT
This research was supported by
• NIH R01 LM011360
• NIH U01 AG024904
• NIH RC2 AG036535
• NIH R01 AG19771
• NIH P30 AG10133
• NSF IIS-1117335
![Page 26: Accelerating Sparse Canonical Correlation Analysis for Large Brain Imaging Genetics Data](https://reader034.fdocuments.us/reader034/viewer/2022051317/568161eb550346895dd21d1a/html5/thumbnails/26.jpg)
Thank you