A Data Driven Approach to Tackling Big Data Connectomics · A Data Driven Approach to Tackling Big...
Transcript of A Data Driven Approach to Tackling Big Data Connectomics · A Data Driven Approach to Tackling Big...
A Data Driven Approach to Tackling Big Data ConnectomicsGreg Kiar
McGill University Healthy Brains for Healthy Lives Fellow,McGill Centre for Integrative Neuroscience, Montreal Neurological Institute,Ph.D. student McGill University,M.S.E. Johns Hopkins University,B.Eng Carleton University
2018-02-05
Outline
2018-02-05 2
• Context
• The common approach
• An approach based on accessibility, robustness, and scalability
Publicly available MRI datasets• ADNI• ABCD• ABIDE• ADHD-200• Age-ility• AIBL• BRAINS• CamCAN• CMI-HBN• COBRE• CoRR/FCP-INDI• DLBS
• fBIRN• GSP• HCP• IXI• Kirby21• MASSIVE• MindBoggle-101• MIRIAD• MPI-LMBB• MSC• NACC• NCANDA
• NKIRS• OASIS-CS• OASIS-Long• OpenfMRI• PING• PNC• PTBP• SALD• SchizConnect• StudyForrest• UK-Biobank
2018-02-05 3
Source:https://github.com/cMadan/openMorph
Publicly supported BIDS apps• AFNI• ANTS Cortical Thickness• Baracus• Brainiak-srm• BROCCOLI• CPAC• DPARSF• Fibre Density and Cross-section• fMRIprep• Freesurfer• FSL Tools• HCP Pipelines• Hyper Alignment
• MAGeTbrain• MindBoggle• MRIQC• MRtrix3 Connectome• ndmg• NIAK• OPPNI• SRM• SPM• Tracula• QAP
2018-02-05 4
Source:http://bids-apps.neuroimaging.io/apps/
Common approach1. Pose a hypothesis2. Collect + curate dataset3. Manually perform QC on dataset4. Pick processing pipeline and parameters5. Process random subset with pipeline in 4.6. Manually perform QC on derivatives7. Redo from 4. if not happy with 6.8. Process all data with pipeline in 4.9. Answer statistical question10. Publish claim11. Get tenure
2018-02-05 5
Common Honest approach1. Poste ahoc hypothesis2. Collect + curate dataset3. Undergrads Manually perform QC on dataset4. Pick processing pipeline and parameters5. Process random subset first subject with pipeline in 4.6. Grad students Manually perform QC on derivatives7. Don’t redo from 4. if not happy with 6.8. Process all data with pipeline in 4.9. Answer statistical question (see updated 1.)10. Publish claim11. Get tenure
2018-02-05 6
Problems with this approach
1. Manual QC is subjective; we need LOTS to be reliable
2. Pipelines and parameters aren’t ”optimized” objectively
3. Datasets aren’t homogeneous
4. Incentive to publish/graduate rather than redo experiments
5. Computer infrastructures are expensive, and not always equal
2018-02-05 7
Proposed solution
1. Optimize pipeline for stability and robustness; remove bias
2. Automate QC where possible
3. Evaluate on public data with known “truths” (i.e. TestReTest)
4. Automate pipeline deployment
5. Automate data discovery*
2018-02-05 8
Proposed solution
1. Optimize pipeline for stability and robustness; remove bias
2. Automate QC where possible
3. Evaluate on public data with known “truths” (i.e. TestReTest)
4. Automate pipeline deployment
5. Automate data discovery*
2018-02-05 9
ndmg: one-click connectomes
2018-02-05 10
Registration to MNI152
2018-02-05 11
DWI corr. DWI
T1wint. DWI
Template
xfmreg. DWI QAExternal Dep.
SubjectIntermediateOutput
2018-02-05 12
Compute Tensor Field
2018-02-05 13
bvals
grad. table
reg. DWI
Mask
Tensors QA
bvecs
External Dep.SubjectIntermediateOutput
2018-02-05 14
Estimate Streamlines
2018-02-05 15
FA thresh.
Mask
Fibers QA
Tensors
External Dep.SubjectIntermediateOutput
2018-02-05 16
Graph Generation
2018-02-05 17
Parcellation
Connectome QA
Fibers
External Dep.SubjectIntermediateOutput
2018-02-05 18
Optimizing for discriminability
2018-02-05 19
g: connectomei: class label (i.e. subject)j: observation label (i.e. session)
“My brain looks more like my brain than my brain looks like your brain”or
“My brain looks more like brains of the same {dataset, sex, age, handedness, etc.} than brains of another {“, “, “,”}”
Reliable structural connectivity
2018-02-05 20
2018-02-05 21
2018-02-05 22
2018-02-05 23
easy to use/install$ # Installable on Python2.7 if you have FSL installed…$ pip install ndmg$$ # Run on your dataset$ ndmg_bids /data /outs session $ ndmg_bids /data /outs group$$ # Or install and run through Docker$ docker run –ti –v /data –v /outs bids/ndmg /data /outs session$ docker run –ti –v /data –v /outs bids/ndmg /data /outs group
2018-02-05 24
Proposed solution
1. Optimize pipeline for stability and robustness; remove bias
2. Automate QC where possible
3. Evaluate on public data with known “truths” (i.e. TestReTest)
4. Automate pipeline deployment
5. Automate data discovery*
2018-02-05 25
Boutiques
2018-02-05 26
Boutiques
2018-02-05 27
{
"name": "echo",
"tool-version": "1.0",
"description": "A simple script to test output files",
"command-line": "echo [PARAM] > output.txt",
"schema-version": "0.5",
"inputs": [{ "id": "param",
"name": "Parameter",
"value-key": "[PARAM]",
"type": "Number" }],
"output-files": [{ "id": "output_file",
"name": "Output file",
"path-template": "output.txt" }]
}
Clowdr
2018-02-05 28
2018-02-05 29
2018-02-05 30
Boutiques on pip, Clowdr soon
2018-02-05 31
$ # Installable on Python2 or 3…$ pip install boutiques$$ # Describe, validate, launch your tool, or more!$ bosh validate descriptor.json$ bosh exec simulate descriptor.json –r$ bosh exec launch descriptor.json invocation.json$$ # Soon… (currently the API is a bit uglier than this)$ clowdr deploy descriptor.json invocation.json s3://dataset
Proposed solution
1. Optimize pipeline for stability and robustness; remove bias
2. Automate QC where possible
3. Evaluate on public data with known “truths” (i.e. TestReTest)
4. Automate pipeline deployment
5. Automate data discovery*
2018-02-05 32
Apine
2018-02-05 33
So, why are we doing this again?• If tools and platforms are made to be reproducible and robust…
SCIENTISTS CAN FOCUS ON SCIENCE!
• Free validation and summary of the quality of work being done• Enables scaling to datasets beyond reach of manual curation
2018-02-05 34
The dream• Go to a website• Pick a dataset• Pick an analysis• Design a hypothesis• Launch it• Go outside & run around• Come back to your answer• Share the results, form new hypotheses, and collect new data
2018-02-05 35
Acknowledgements• McGill Centre for Integrative Neuroscience (Alan Evans, et al.)• Big-Data for Neuroinformatics Lab (Tristan Glatard, et al.)• Jean-Baptiste Poline, Pierre Bellec, Christine Tardif• Montreal Neurological Institute/The Neuro• Healthy Brains for Healthy Lives• Lab-mates, Family, Friends, Universe
2018-02-05 36
All code demonstrated in this presentation is publicly available on GitHub.
Thanks!Find me @
gkiar
g_kiar
2018-02-05 37