CPAC Connectome Analysis in the Cloud

Harnessing cloud computing for high capacity analysis of neuroimaging data Cameron Craddock, PhD Computational Neuroimaging Lab Center for Biomedical Imaging and Neuromodulation Nathan S. Kline Institute for Psychiatric Research Center for the Developing Brain Child Mind Institute

Transcript of CPAC Connectome Analysis in the Cloud

Page 1: CPAC Connectome Analysis in the Cloud

Harnessing cloud computing for high capacity analysis of neuroimaging data

Cameron Craddock, PhD

Computational Neuroimaging Lab

Center for Biomedical Imaging and Neuromodulation

Nathan S. Kline Institute for Psychiatric Research

Center for the Developing Brain

Child Mind Institute

Page 2: CPAC Connectome Analysis in the Cloud

Discovery science in Psychiatric Neuroimaging

1. Characterizing inter-individual variation in connectomes (Kelly et al. 2012)

2. Identifying biomarkers of disease state, severity, and prognosis (Craddock 2009)

3. Re-defining mental health in terms of neurophenotypes, e.g. RDoC (Castellanos 2013)

Data is often shared only in its raw form – must be preprocessed to remove nuisance variation and to be made comparable across individuals and sites.

Page 3: CPAC Connectome Analysis in the Cloud

Connectomics is Big Data

Page 4: CPAC Connectome Analysis in the Cloud

Configurable Pipeline for the Analysis of Connectomes (CPAC)

• Pipeline to automate preprocessing and analysis of large-scale datasets

• Most cutting edge functional connectivity preprocessing and analysis algorithms

• Configurable to enable “plurality” – evaluate different processing parameters and strategies

• Automatically identifies and takes advantage of parallelism on multi-threaded, multi-core, and cluster architectures

• “Warm restarts” – only re-compute what has changed

• Open science – open source

• http://fcp-indi.github.io

Nipype

Page 5: CPAC Connectome Analysis in the Cloud

Computing in the Amazon Cloud

• No hardware capital cost

• No hardware maintenance

• No software installation or configuration*

• Resources scale to meet need for no overhead

• Available everywhere and to everybody

• Allows access to exotic architectures, such as GPUs

*If appropriate AMI is available

Page 6: CPAC Connectome Analysis in the Cloud

Amazon EC2 - Instance

• The hardware on which your processing will run:

Page 7: CPAC Connectome Analysis in the Cloud

Instance Pricing

• On-demand Pricing

– Always available, fixed price, non-interruptible, most stable

• Spot instances

– Market to sell otherwise unused time, variable price, interruptible

Page 8: CPAC Connectome Analysis in the Cloud

Spot Instances

• Prices fluctuate over time

• If price exceeds the max you are willing to pay, your instances are terminated
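
The termination rule above can be sketched in a few lines. This is a minimal illustration, not AWS's actual market mechanism: the function name `first_interruption` and the hourly price series are hypothetical, and the prices are made-up numbers rather than real spot history.

```python
from typing import Optional, Sequence


def first_interruption(prices: Sequence[float], max_bid: float) -> Optional[int]:
    """Return the index of the first period whose spot price exceeds max_bid.

    Returns None if the price never exceeds the bid, i.e. the instance
    would have run uninterrupted over the whole series.
    """
    for period, price in enumerate(prices):
        if price > max_bid:
            return period
    return None


if __name__ == "__main__":
    # Hypothetical hourly spot prices in USD (illustrative only).
    example_prices = [0.26, 0.27, 0.31, 0.45, 0.29]
    for bid in (0.40, 0.50):
        hit = first_interruption(example_prices, bid)
        status = "never interrupted" if hit is None else f"terminated in hour {hit}"
        print(f"max bid ${bid:.2f}: {status}")
```

A higher maximum bid lowers the chance of interruption, at the cost of possibly paying more when the market price rises.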

Page 9: CPAC Connectome Analysis in the Cloud

Storage

• S3 – Simple Storage Service

– Secure and stable storage with a web service interface, pay for what you use

– Big and slow, $0.03 per GB/Month

– Can be accessed from anywhere

• EBS – Elastic Block Store

– Provisioned storage (SSD HD) directly connected to instance, pay for what you provision

– Fast and expensive, $0.10 per GB/Month

– Persistent and transferrable

• Instance Storage

– SSD storage provided with some instances, included in instance price

– Fast and free

– Non-persistent and non-transferrable – good for cache
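
To make the price difference between the three tiers concrete, here is a small sketch that uses only the per-GB monthly prices quoted on this slide. The function name `monthly_storage_cost` and the 500 GB example size are assumptions for illustration; current AWS prices will differ.

```python
# USD per GB-month as quoted on the slide. Instance storage is bundled into
# the instance price, so its marginal cost is modeled here as zero.
PRICE_PER_GB_MONTH = {"s3": 0.03, "ebs": 0.10, "instance": 0.00}


def monthly_storage_cost(size_gb: float, tier: str) -> float:
    """Monthly cost in USD for one storage tier.

    For S3 pass the GB actually stored (pay for what you use);
    for EBS pass the GB provisioned (pay for what you provision).
    """
    if size_gb < 0:
        raise ValueError("size_gb must be non-negative")
    return size_gb * PRICE_PER_GB_MONTH[tier]


if __name__ == "__main__":
    # Hypothetical 500 GB working set.
    for tier in PRICE_PER_GB_MONTH:
        print(f"{tier}: ${monthly_storage_cost(500, tier):.2f} per month")
```

Because EBS is billed on provisioned size, an over-provisioned volume costs the same whether or not it is full, which is one reason to keep long-term results in S3.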

Page 10: CPAC Connectome Analysis in the Cloud

Amazon EC2 - Instance

• The hardware on which your processing will run:

Page 11: CPAC Connectome Analysis in the Cloud

Data Transfer

• In general, free in - pay out

– Out to other Amazon services such as S3, EBS, etc. is free

– Out to Internet is $0.09 per GB (becomes slightly cheaper after 10TB or so)
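
The "free in, pay out" rule can be written as a tiny cost function. This sketch uses the flat $0.09 per GB figure from the slide and deliberately ignores the volume discount past roughly 10TB, since the slide does not give that rate. The function name and destination labels are hypothetical.

```python
INTERNET_EGRESS_USD_PER_GB = 0.09  # flat rate quoted on the slide


def transfer_cost(size_gb: float, destination: str) -> float:
    """Transfer cost in USD under the slide's simplified pricing.

    destination is one of:
      "inbound"  - data uploaded into AWS (free)
      "aws"      - data sent to another Amazon service such as S3 or EBS (free)
      "internet" - data downloaded out of AWS (charged per GB)
    The discount above ~10TB is not modeled.
    """
    if size_gb < 0:
        raise ValueError("size_gb must be non-negative")
    if destination in ("inbound", "aws"):
        return 0.0
    if destination == "internet":
        return size_gb * INTERNET_EGRESS_USD_PER_GB
    raise ValueError(f"unknown destination: {destination!r}")


if __name__ == "__main__":
    # Hypothetical 100 GB download of results to a local workstation.
    print(f"${transfer_cost(100, 'internet'):.2f}")
```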

Page 12: CPAC Connectome Analysis in the Cloud

Amazon Machine Images

• Virtual machines that provide the software environment for your processing

• You can build your own, or use one maintained by others

Page 13: CPAC Connectome Analysis in the Cloud

StarCluster

• StarCluster simplifies the process of building a Sun Grid Engine based cluster in EC2

– Dynamically add and remove compute nodes

– Uses spot instances

– Provides scripts for performing many administrative tasks

Page 14: CPAC Connectome Analysis in the Cloud

C-PAC Amazon Machine Image

Nipype

Page 15: CPAC Connectome Analysis in the Cloud

Proof of concept

• Preprocessed 1,112 datasets from ABIDE with C-PAC

– 4 different preprocessing strategies (+/- temporal filter, +/- global signal regression)

– 24 derivatives:

• ReHo, ALFF, fALFF, 10 RSNs, VMHC, binary degree centrality, weighted degree centrality, lFCD, time courses for 5 atlases (AAL, TT, EZ, HO, CC200, CC400)

http://preprocessed-connectomes-project.github.io/abide
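
The four strategies are simply every combination of the two on/off options named above. A minimal sketch of that enumeration follows; the dictionary keys are illustrative labels, not C-PAC configuration keys, and the total counts dataset-strategy output sets rather than separate pipeline runs.

```python
from itertools import product

N_DATASETS = 1112  # ABIDE datasets preprocessed, from the slide

# Two binary processing choices; the key names are hypothetical labels.
options = {
    "temporal_filter": (True, False),
    "global_signal_regression": (True, False),
}

# Cartesian product of the option values gives every strategy.
strategies = [dict(zip(options, combo)) for combo in product(*options.values())]

if __name__ == "__main__":
    for strategy in strategies:
        print(strategy)
    print(f"{len(strategies)} strategies x {N_DATASETS} datasets = "
          f"{len(strategies) * N_DATASETS} output sets")
```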

Page 16: CPAC Connectome Analysis in the Cloud

Model Parameters

• Requires 45 minutes to process 1 dataset

• 3 datasets can be processed in parallel

• Processing results in 0.5GB of data
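
These three parameters are enough for a back-of-the-envelope estimate of compute time and output volume. The sketch below is not the model used in the talk. It assumes the 0.5GB figure is per dataset, that running 3 datasets side by side on one instance does not slow any of them down, and that work divides evenly into batches. The function name and the instance counts are hypothetical.

```python
import math


def processing_estimate(n_datasets: int,
                        n_instances: int = 1,
                        minutes_per_dataset: float = 45,
                        parallel_per_instance: int = 3,
                        gb_per_dataset: float = 0.5):
    """Return (wall_hours, instance_hours, output_gb) for a processing run.

    Datasets are processed in batches of parallel_per_instance * n_instances;
    each batch takes minutes_per_dataset of wall-clock time.
    """
    batch_size = parallel_per_instance * n_instances
    batches = math.ceil(n_datasets / batch_size)
    wall_hours = batches * minutes_per_dataset / 60
    instance_hours = wall_hours * n_instances
    output_gb = n_datasets * gb_per_dataset
    return wall_hours, instance_hours, output_gb


if __name__ == "__main__":
    # 1,112 ABIDE datasets on 1 instance versus a hypothetical 20-node cluster.
    for nodes in (1, 20):
        wall, billed, out_gb = processing_estimate(1112, n_instances=nodes)
        print(f"{nodes:>2} instance(s): {wall:.2f} h wall clock, "
              f"{billed:.2f} instance-hours, {out_gb:.0f} GB output")
```

Multiplying the instance-hours by an hourly on-demand or spot price, and the output volume by the storage and transfer prices, gives a rough total cost for a run.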

Page 17: CPAC Connectome Analysis in the Cloud

Cloud vs. Traditional Computing

Page 18: CPAC Connectome Analysis in the Cloud

Impact of Spot Instances

Simulations using past 90 days of spot price history

Page 19: CPAC Connectome Analysis in the Cloud

Other Pipelines

Page 20: CPAC Connectome Analysis in the Cloud

What about HIPAA?

• Amazon AWS meets FedRAMP and NIST 800-53 standards, which are more rigorous than HIPAA

– Access to instances controlled using 256-bit AES

– Default firewalls deny all outside access

– EC2, EBS, and S3 storage are compatible with encryption

• AWS HIPAA whitepaper

– http://d0.awsstatic.com/whitepapers/compliance/AWS_HIPAA_Compliance_Whitepaper.pdf

Page 21: CPAC Connectome Analysis in the Cloud

C-PAC Amazon Machine Image

Nipype

Page 22: CPAC Connectome Analysis in the Cloud

Preprocessed INDI Data in the Cloud

http://preprocessed-connectomes-project.github.io/

• Available through S3 Bucket generously provided by AWS

• Raw INDI data will be available soon

Page 23: CPAC Connectome Analysis in the Cloud

- HCP Data available in the cloud:

- https://wiki.humanconnectome.org/display/PublicData/Home

- Receive $100 AWS Credits at the HCP workshop in Hawaii

- http://humanconnectome.org/course-registration/2015/exploring-the-human-connectome.php

Page 24: CPAC Connectome Analysis in the Cloud

Acknowledgements

CPAC Team: Daniel Clark, Steven Giavasis and Michael Milham.

NDAR “Cloud Team”: Christian Haselgrove, Dave Kennedy, and Jack van Horn.

NDAR Team: Dan Hall, Brian Koser, David Obenshain, Svetlana Novikova, and Malcom Jackson.

CPAC-NDAR integration was funded by a contract from NDAR.

ABIDE Preprocessed data is hosted in a Public S3 Bucket provided by AWS.