Machine Learning and Deep Contemplation of Data
![Page 1: Machine Learning and Deep Contemplation of Data](https://reader035.fdocuments.us/reader035/viewer/2022062901/58f141521a28abb8488b4585/html5/thumbnails/1.jpg)
Machine Learning and Deep Contemplation of Data
Joel Saltz
Department of Biomedical Informatics
Stony Brook University
CCDSC
October 5, 2016
![Page 2: Machine Learning and Deep Contemplation of Data](https://reader035.fdocuments.us/reader035/viewer/2022062901/58f141521a28abb8488b4585/html5/thumbnails/2.jpg)
From BDEC: “Domain”: Spatio-temporal Sensor Integration, Analysis, Classification
• Multi-scale material/tissue structural, molecular, functional characterization. Design of materials with specific structural, energy storage properties, brain, regenerative medicine, cancer
• Integrative multi-scale analyses of the earth, oceans, atmosphere, cities, vegetation etc – cameras and sensors on satellites, aircraft, drones, land vehicles, stationary cameras
• Digital astronomy
• Hydrocarbon exploration, exploitation, pollution remediation
• Solid printing integrative data analyses
• Data generated by numerical simulation codes – PDEs, particle methods
![Page 3: Machine Learning and Deep Contemplation of Data](https://reader035.fdocuments.us/reader035/viewer/2022062901/58f141521a28abb8488b4585/html5/thumbnails/3.jpg)
Things that Need to be Done with Spatio Temporal Data
• Generation of Features
• Sanity Checking and Data Cleaning
• Qualitative Exploration
• Descriptive Statistics
• Classification
• Identification of Interesting Phenomena
• Prediction
• Control
• Save Data for Later (Compression)
![Page 4: Machine Learning and Deep Contemplation of Data](https://reader035.fdocuments.us/reader035/viewer/2022062901/58f141521a28abb8488b4585/html5/thumbnails/4.jpg)
Precision Medicine Meta Application
• Predict treatment outcome, select, monitor treatments
• Reduce inter-observer variability in diagnosis
• Computer assisted exploration of new classification schemes
• Multi-scale cancer simulations
Multi-Scale Precision Medicine
![Page 5: Machine Learning and Deep Contemplation of Data](https://reader035.fdocuments.us/reader035/viewer/2022062901/58f141521a28abb8488b4585/html5/thumbnails/5.jpg)
Imaging and Precision Medicine - Pathomics, Radiomics
• Identify and segment trillions of objects – nuclei, glands, ducts, nodules, tumor niches … from Pathology, Radiology imaging datasets
• Extract features from objects and spatio-temporal regions
• Support queries against ensembles of features extracted from multiple datasets
• Statistical analyses and machine learning to link Radiology/Pathology features to “omics” and outcome biological phenomena
• Principle-based analyses to bridge spatio-temporal scales – linked Pathology, Radiology studies
![Page 6: Machine Learning and Deep Contemplation of Data](https://reader035.fdocuments.us/reader035/viewer/2022062901/58f141521a28abb8488b4585/html5/thumbnails/6.jpg)
![Page 7: Machine Learning and Deep Contemplation of Data](https://reader035.fdocuments.us/reader035/viewer/2022062901/58f141521a28abb8488b4585/html5/thumbnails/7.jpg)
Current Driving Applications
• Checkpoint Inhibitors – when to use, when to stop
• Pathology, Imaging data obtained prior to and during treatment
• Integration of “omics”, tissue and imaging to manage treatment
• Non Small Cell Lung Cancer, Melanoma, Brain
• Virtual Tissue Repository
• SEER Cancer Epidemiology
• 500K Cancer Patients per year
• DOE/NCI pilot involving text
• Our co-located companion Virtual Tissue Repository pilot targets SEER images
![Page 8: Machine Learning and Deep Contemplation of Data](https://reader035.fdocuments.us/reader035/viewer/2022062901/58f141521a28abb8488b4585/html5/thumbnails/8.jpg)
Radiomics
Decoding tumour phenotype by noninvasive imaging using a quantitative radiomics approach
Hugo J. W. L. Aerts et al., Nature Communications 5, Article number: 4006, doi:10.1038/ncomms5006
Features
Patients
![Page 9: Machine Learning and Deep Contemplation of Data](https://reader035.fdocuments.us/reader035/viewer/2022062901/58f141521a28abb8488b4585/html5/thumbnails/9.jpg)
Integrative Morphology/“omics”
Quantitative Feature Analysis in Pathology: Emory In Silico Center for Brain Tumor Research (PI = Dan Brat, PD = Joel Saltz)
NLM/NCI: Integrative Analysis/Digital Pathology R01LM011119, R01LM009239 (Dual PIs Joel Saltz, David Foran)
J Am Med Inform Assoc. 2012 Integrated morphologic analysis for the identification and characterization of disease subtypes.
Pathomics
Lee Cooper, Jun Kong
![Page 10: Machine Learning and Deep Contemplation of Data](https://reader035.fdocuments.us/reader035/viewer/2022062901/58f141521a28abb8488b4585/html5/thumbnails/10.jpg)
![Page 11: Machine Learning and Deep Contemplation of Data](https://reader035.fdocuments.us/reader035/viewer/2022062901/58f141521a28abb8488b4585/html5/thumbnails/11.jpg)
![Page 12: Machine Learning and Deep Contemplation of Data](https://reader035.fdocuments.us/reader035/viewer/2022062901/58f141521a28abb8488b4585/html5/thumbnails/12.jpg)
Robust Nuclear Segmentation
• Robust ensemble algorithm to segment nuclei across tissue types
• Optimized algorithm tuning methods
• Parameter exploration to optimize quality
• Systematic Quality Control pipeline encompassing tissue image quality, human generated ground truth, convolutional neural network critique
• Yi Gao, Allen Tannenbaum, Dimitris Samaras, Le Hou, Tahsin Kurc
![Page 13: Machine Learning and Deep Contemplation of Data](https://reader035.fdocuments.us/reader035/viewer/2022062901/58f141521a28abb8488b4585/html5/thumbnails/13.jpg)
Cell Morphometry Features
![Page 14: Machine Learning and Deep Contemplation of Data](https://reader035.fdocuments.us/reader035/viewer/2022062901/58f141521a28abb8488b4585/html5/thumbnails/14.jpg)
![Page 15: Machine Learning and Deep Contemplation of Data](https://reader035.fdocuments.us/reader035/viewer/2022062901/58f141521a28abb8488b4585/html5/thumbnails/15.jpg)
3D Slicer Pathology – Generate High Quality Ground Truth
![Page 16: Machine Learning and Deep Contemplation of Data](https://reader035.fdocuments.us/reader035/viewer/2022062901/58f141521a28abb8488b4585/html5/thumbnails/16.jpg)
Apply Segmentation Algorithm
![Page 17: Machine Learning and Deep Contemplation of Data](https://reader035.fdocuments.us/reader035/viewer/2022062901/58f141521a28abb8488b4585/html5/thumbnails/17.jpg)
Adjust algorithm parameters, manual fine tuning
![Page 18: Machine Learning and Deep Contemplation of Data](https://reader035.fdocuments.us/reader035/viewer/2022062901/58f141521a28abb8488b4585/html5/thumbnails/18.jpg)
Sanity Check Features
Relationship Between Image and Features
![Page 19: Machine Learning and Deep Contemplation of Data](https://reader035.fdocuments.us/reader035/viewer/2022062901/58f141521a28abb8488b4585/html5/thumbnails/19.jpg)
![Page 20: Machine Learning and Deep Contemplation of Data](https://reader035.fdocuments.us/reader035/viewer/2022062901/58f141521a28abb8488b4585/html5/thumbnails/20.jpg)
Select Feature Pair – dots correspond to nuclei
![Page 21: Machine Learning and Deep Contemplation of Data](https://reader035.fdocuments.us/reader035/viewer/2022062901/58f141521a28abb8488b4585/html5/thumbnails/21.jpg)
Subregion selected – form of gating analogous to flow cytometry
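The gating step can be sketched as a filter over a pair of per-nucleus features. This is a minimal illustration, not the Feature Explorer implementation: the `Nucleus` record, the feature names `area` and `eccentricity`, and the rectangular gate are all assumptions made for the example.

```python
from typing import NamedTuple


class Nucleus(NamedTuple):
    nucleus_id: int
    area: float          # hypothetical feature on the x axis
    eccentricity: float  # hypothetical feature on the y axis


def gate(nuclei, area_range, ecc_range):
    """Return the nuclei whose feature pair falls inside a rectangular gate."""
    a_lo, a_hi = area_range
    e_lo, e_hi = ecc_range
    return [
        n for n in nuclei
        if a_lo <= n.area <= a_hi and e_lo <= n.eccentricity <= e_hi
    ]


# Toy data: each entry corresponds to one dot in the scatter plot.
nuclei = [
    Nucleus(1, 35.0, 0.20),
    Nucleus(2, 80.0, 0.75),
    Nucleus(3, 90.0, 0.80),
    Nucleus(4, 40.0, 0.90),
]

gated = gate(nuclei, area_range=(70.0, 100.0), ecc_range=(0.7, 1.0))
gated_ids = [n.nucleus_id for n in gated]
print(gated_ids)
```

The ids that survive the gate are what a viewer would then use to pull sample nuclei and show them in their tissue context.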
![Page 22: Machine Learning and Deep Contemplation of Data](https://reader035.fdocuments.us/reader035/viewer/2022062901/58f141521a28abb8488b4585/html5/thumbnails/22.jpg)
Sample Nuclei from Gated Region
![Page 23: Machine Learning and Deep Contemplation of Data](https://reader035.fdocuments.us/reader035/viewer/2022062901/58f141521a28abb8488b4585/html5/thumbnails/23.jpg)
Gated Nuclei in Context
![Page 24: Machine Learning and Deep Contemplation of Data](https://reader035.fdocuments.us/reader035/viewer/2022062901/58f141521a28abb8488b4585/html5/thumbnails/24.jpg)
Compare Algorithm Results
![Page 25: Machine Learning and Deep Contemplation of Data](https://reader035.fdocuments.us/reader035/viewer/2022062901/58f141521a28abb8488b4585/html5/thumbnails/25.jpg)
Heatmap – Depicts Agreement Between Algorithms
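One way to build such an agreement heatmap is to score each tile with an overlap measure between the two algorithms' masks. The slide does not name the measure; the sketch below assumes the Dice coefficient and uses small nested lists in place of real segmentation masks.

```python
def dice(mask_a, mask_b):
    """Dice coefficient between two equally sized binary masks (lists of rows)."""
    a_sum = sum(sum(row) for row in mask_a)
    b_sum = sum(sum(row) for row in mask_b)
    overlap = sum(a & b for ra, rb in zip(mask_a, mask_b) for a, b in zip(ra, rb))
    if a_sum + b_sum == 0:
        return 1.0  # both masks empty: treat as full agreement
    return 2.0 * overlap / (a_sum + b_sum)


def tile(mask, r0, c0, size):
    """Cut a size x size tile out of a mask."""
    return [row[c0:c0 + size] for row in mask[r0:r0 + size]]


def agreement_heatmap(mask_a, mask_b, size):
    """One Dice score per tile; each score becomes one heatmap cell."""
    rows, cols = len(mask_a), len(mask_a[0])
    return [
        [dice(tile(mask_a, r, c, size), tile(mask_b, r, c, size))
         for c in range(0, cols, size)]
        for r in range(0, rows, size)
    ]


# Toy 4x4 masks from two hypothetical segmentation algorithms.
mask_a = [[1, 1, 0, 0],
          [1, 1, 0, 0],
          [0, 0, 1, 0],
          [0, 0, 0, 0]]
mask_b = [[1, 1, 0, 0],
          [1, 0, 0, 0],
          [0, 0, 0, 0],
          [0, 0, 0, 1]]

heatmap = agreement_heatmap(mask_a, mask_b, size=2)
print(heatmap)
```

Low-scoring tiles are the ones worth a closer look when comparing algorithm results.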
![Page 26: Machine Learning and Deep Contemplation of Data](https://reader035.fdocuments.us/reader035/viewer/2022062901/58f141521a28abb8488b4585/html5/thumbnails/26.jpg)
![Page 27: Machine Learning and Deep Contemplation of Data](https://reader035.fdocuments.us/reader035/viewer/2022062901/58f141521a28abb8488b4585/html5/thumbnails/27.jpg)
Auto-tuning and feature extraction
• Goal – correctly segment trillions of objects (nuclei)
• Adjust algorithm parameters
• Autotuning – finds parameters that best match ground truth in an image patch
• Region template runtime support to optimize generation and management of multi-parameter algorithm results
• Eliminates redundant computation, manages locality
• Active Harmony – Jeff Hollingsworth!!
• Collaboration – George Teodoro, Tahsin Kurc
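The auto-tuning idea, picking the parameters whose result best matches ground truth on an image patch, can be sketched with an exhaustive search. This is a stand-in only: the deck's pipeline uses Active Harmony for the search, while the toy threshold segmentation, the Dice score, and the candidate list below are assumptions for illustration.

```python
def segment(patch, threshold):
    """Toy segmentation: a pixel is nucleus when its intensity is below threshold."""
    return [[1 if px < threshold else 0 for px in row] for row in patch]


def dice(mask_a, mask_b):
    """Dice coefficient between two equally sized binary masks."""
    a_sum = sum(sum(row) for row in mask_a)
    b_sum = sum(sum(row) for row in mask_b)
    overlap = sum(a & b for ra, rb in zip(mask_a, mask_b) for a, b in zip(ra, rb))
    if a_sum + b_sum == 0:
        return 1.0
    return 2.0 * overlap / (a_sum + b_sum)


def autotune(patch, truth, candidates):
    """Return the candidate threshold that best matches the ground-truth mask."""
    scored = [(dice(segment(patch, t), truth), t) for t in candidates]
    best_score, best_t = max(scored, key=lambda pair: pair[0])
    return best_t, best_score


# Toy grayscale patch (dark pixels are nuclei) and its hand-drawn ground truth.
patch = [[40, 60, 200],
         [50, 120, 210],
         [180, 190, 220]]
truth = [[1, 1, 0],
         [1, 1, 0],
         [0, 0, 0]]

best_t, best_score = autotune(patch, truth, candidates=[50, 100, 150, 200])
print(best_t, best_score)
```

A real tuner replaces the exhaustive loop with a guided search, because each evaluation is a full segmentation run.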
![Page 28: Machine Learning and Deep Contemplation of Data](https://reader035.fdocuments.us/reader035/viewer/2022062901/58f141521a28abb8488b4585/html5/thumbnails/28.jpg)
![Page 29: Machine Learning and Deep Contemplation of Data](https://reader035.fdocuments.us/reader035/viewer/2022062901/58f141521a28abb8488b4585/html5/thumbnails/29.jpg)
E = Eliminate Duplicate Computations
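When many parameter sets share their early pipeline stages, those stages need to run only once. The sketch below shows the idea with plain memoization; it is an analogy, not the region template runtime, and the stage names and parameters are hypothetical.

```python
from functools import lru_cache

calls = {"normalize": 0, "segment": 0}


@lru_cache(maxsize=None)
def normalize(tile_id, target):
    """Hypothetical first stage; the cache ensures one run per distinct input."""
    calls["normalize"] += 1
    return f"norm({tile_id},{target})"


@lru_cache(maxsize=None)
def segment(normalized, threshold):
    """Hypothetical second stage, keyed by its input and its parameter."""
    calls["segment"] += 1
    return f"seg({normalized},{threshold})"


def run(tile_id, params):
    """Run the three-stage pipeline for one parameter set."""
    target, threshold, min_size = params
    seg = segment(normalize(tile_id, target), threshold)
    return f"filter({seg},{min_size})"


param_sets = [
    (0.5, 100, 10),
    (0.5, 100, 20),  # differs only in the last stage
    (0.5, 150, 10),  # shares the normalization stage
]
results = [run("tile-0", p) for p in param_sets]
print(calls)
```

Three parameter sets need one normalization and two segmentations here, instead of three of each.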
![Page 30: Machine Learning and Deep Contemplation of Data](https://reader035.fdocuments.us/reader035/viewer/2022062901/58f141521a28abb8488b4585/html5/thumbnails/30.jpg)
Performance Optimization
256 nodes of Stampede. Each node of the cluster has dual-socket Intel Xeon E5-2680 processors, an Intel Xeon Phi SE10P co-processor and 32GB RAM. The nodes are interconnected via Mellanox FDR InfiniBand switches.
![Page 31: Machine Learning and Deep Contemplation of Data](https://reader035.fdocuments.us/reader035/viewer/2022062901/58f141521a28abb8488b4585/html5/thumbnails/31.jpg)
Machine Learning and Quality Critiquing
SVM Approach

| | Good | Bad |
|---|---|---|
| Test as Good | 2916 | 33 |
| Test as Bad | 28 | 2094 |
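The four counts on this slide are enough to derive the usual summary metrics. The sketch assumes that the columns (Good, Bad) are the ground-truth quality labels and the rows (Test as Good, Test as Bad) are the SVM's calls; the slide does not state the orientation explicitly.

```python
# Assumed layout: rows are SVM calls, columns are ground truth.
tp, fp = 2916, 33    # "Test as Good": truly Good, truly Bad
fn, tn = 28, 2094    # "Test as Bad":  truly Good, truly Bad

total = tp + fp + fn + tn
accuracy = (tp + tn) / total        # share of all calls that are correct
sensitivity = tp / (tp + fn)        # share of Good segmentations called Good
specificity = tn / (tn + fp)        # share of Bad segmentations called Bad
precision = tp / (tp + fp)          # share of "Good" calls that are truly Good

print(f"n={total} accuracy={accuracy:.4f} sensitivity={sensitivity:.4f} "
      f"specificity={specificity:.4f} precision={precision:.4f}")
```

Under that reading the classifier is correct on 5010 of 5071 cases, about 98.8%.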
![Page 32: Machine Learning and Deep Contemplation of Data](https://reader035.fdocuments.us/reader035/viewer/2022062901/58f141521a28abb8488b4585/html5/thumbnails/32.jpg)
![Page 33: Machine Learning and Deep Contemplation of Data](https://reader035.fdocuments.us/reader035/viewer/2022062901/58f141521a28abb8488b4585/html5/thumbnails/33.jpg)
Feature Explorer - Integrated Pathomics Features, Outcomes and “omics” – TCGA NSCLC Adenocarcinoma Patients
![Page 34: Machine Learning and Deep Contemplation of Data](https://reader035.fdocuments.us/reader035/viewer/2022062901/58f141521a28abb8488b4585/html5/thumbnails/34.jpg)
Feature Explorer - Integrated Pathomics Features, Outcomes and “omics” – TCGA NSCLC Adenocarcinoma Patients
![Page 35: Machine Learning and Deep Contemplation of Data](https://reader035.fdocuments.us/reader035/viewer/2022062901/58f141521a28abb8488b4585/html5/thumbnails/35.jpg)
Collaboration with MGH – Feature Explorer – Radiology Brain MR/Pathology Features
![Page 36: Machine Learning and Deep Contemplation of Data](https://reader035.fdocuments.us/reader035/viewer/2022062901/58f141521a28abb8488b4585/html5/thumbnails/36.jpg)
Collaboration with SBU Radiology – TCGA NSCLC Adenocarcinoma
Integrative Radiology, Pathology, “omics”, outcome
Mary Saltz, Mark Schweitzer SBU Radiology
![Page 37: Machine Learning and Deep Contemplation of Data](https://reader035.fdocuments.us/reader035/viewer/2022062901/58f141521a28abb8488b4585/html5/thumbnails/37.jpg)
![Page 38: Machine Learning and Deep Contemplation of Data](https://reader035.fdocuments.us/reader035/viewer/2022062901/58f141521a28abb8488b4585/html5/thumbnails/38.jpg)
Classification
• Automated or semi-automated identification of tissue or cell type
• Variety of machine learning and deep learning methods
• Classification of Neuroblastoma
• Classification of Gliomas
• Quantification of lymphocyte infiltration
![Page 39: Machine Learning and Deep Contemplation of Data](https://reader035.fdocuments.us/reader035/viewer/2022062901/58f141521a28abb8488b4585/html5/thumbnails/39.jpg)
Classification and Characterization of Heterogeneity
Gurcan, Shimada, Kong, Saltz
Hiro Shimada, Metin Gurcan, Jun Kong, Lee Cooper, Joel Saltz
BISTI/NIBIB Center for Grid Enabled Image Analysis - P20 EB000591, PI Saltz
![Page 40: Machine Learning and Deep Contemplation of Data](https://reader035.fdocuments.us/reader035/viewer/2022062901/58f141521a28abb8488b4585/html5/thumbnails/40.jpg)
Neuroblastoma Classification
FH: favorable histology; UH: unfavorable histology. CANCER 2003; 98:2274-81
Decision tree:
• Schwannian development ≥50%
  • Grossly visible nodule(s) absent
    • Microscopic neuroblastic foci absent: Ganglioneuroma (Schwannian stroma-dominant), maturing subtype or mature subtype: FH
    • Microscopic neuroblastic foci present: Ganglioneuroblastoma, Intermixed (Schwannian stroma-rich): FH
  • Grossly visible nodule(s) present: Ganglioneuroblastoma, Nodular (composite, Schwannian stroma-rich/stroma-dominant and stroma-poor): UH/FH*, variant forms*
• Schwannian development none to <50%: Neuroblastoma (Schwannian stroma-poor)
  • Undifferentiated subtype: any age UH
  • Poorly differentiated subtype, by mitotic & karyorrhectic cells
    • ≥200/5,000 cells: any age UH
    • 100-200/5,000 cells or <100/5,000 cells: ≥1.5 yr UH; <1.5 yr FH
  • Differentiating subtype, by mitotic & karyorrhectic cells
    • ≥200/5,000 cells: any age UH
    • 100-200/5,000 cells: ≥1.5 yr UH; <1.5 yr FH
    • <100/5,000 cells: ≥5 yr UH; <5 yr FH
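The decision tree on this slide can be read as a small rule-based function. The sketch below encodes one reading of the chart, with hypothetical category and subtype strings; the nodular ganglioneuroblastoma branch returns "UH/FH" because the slide marks it with an asterisk rather than a single outcome.

```python
def shimada_histology(category, subtype=None, mki_per_5000=None, age_years=None):
    """Return "FH", "UH" or "UH/FH" for one reading of the slide's decision tree.

    mki_per_5000 is the count of mitotic and karyorrhectic cells per 5,000 cells.
    """
    if category in ("ganglioneuroma", "ganglioneuroblastoma_intermixed"):
        return "FH"
    if category == "ganglioneuroblastoma_nodular":
        return "UH/FH"  # outcome depends on the nodule; starred on the slide
    if category != "neuroblastoma":
        raise ValueError(f"unknown category: {category}")

    if subtype == "undifferentiated":
        return "UH"  # any age
    if subtype not in ("poorly_differentiated", "differentiating"):
        raise ValueError(f"unknown subtype: {subtype}")
    if mki_per_5000 >= 200:
        return "UH"  # any age
    if subtype == "poorly_differentiated" or mki_per_5000 >= 100:
        return "UH" if age_years >= 1.5 else "FH"
    # Differentiating subtype with fewer than 100 per 5,000 cells.
    return "UH" if age_years >= 5 else "FH"


print(shimada_histology("neuroblastoma", "differentiating", 50, 3.0))
print(shimada_histology("neuroblastoma", "poorly_differentiated", 50, 3.0))
```

The automated system in the deck targets the inputs to these rules (stroma content, differentiation grade), which are the parts that require looking at the tissue.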
![Page 41: Machine Learning and Deep Contemplation of Data](https://reader035.fdocuments.us/reader035/viewer/2022062901/58f141521a28abb8488b4585/html5/thumbnails/41.jpg)
Multi-Scale Machine Learning Based Shimada Classification System
• Background Identification
• Image Decomposition (Multi-resolution levels)
• Image Segmentation (EMLDA)
• Feature Construction (2nd order statistics, Tonal Features)
• Feature Extraction (LDA) + Classification (Bayesian)
• Multi-resolution Layer Controller (Confidence Region)
[Flowchart]
TRAINING: Training Tiles → Down-sampling → Segmentation → Feature Construction → Feature Extraction → Classifier Training
TESTING: Image Tile Initialization (I = L) → Background? (Yes: Label; No: continue) → Create Image I(L) → Segmentation → Feature Construction → Feature Extraction → Classification → Within Confidence Region? (Yes: Label; No: continue) → I > 1? (Yes: I = I - 1 and repeat from Create Image; No: Label)
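The testing path of the flowchart is a coarse-to-fine loop: classify at the lowest resolution first and move to a finer level only when the decision is not confident. The sketch below assumes a simple confidence threshold in place of the confidence region test, and the `classify` and `is_background` callables are hypothetical stand-ins for the real stages.

```python
def classify_tile(pyramid, classify, is_background, confidence_threshold=0.9):
    """Coarse-to-fine classification of one image tile.

    pyramid maps level -> image, where level 1 is full resolution and
    larger levels are coarser. classify(image) returns (label, confidence).
    Returns the label and the level at which it was accepted.
    """
    level = max(pyramid)  # I = L: start at the coarsest level
    if is_background(pyramid[level]):
        return "background", level
    while True:
        label, confidence = classify(pyramid[level])
        # Accept when confident, or when no finer level is left (I == 1).
        if confidence >= confidence_threshold or level == 1:
            return label, level
        level -= 1  # I = I - 1: retry at the next finer resolution


# Toy stand-ins: images are strings, the classifier is a lookup table.
toy_scores = {
    "coarse": ("undifferentiated", 0.55),
    "medium": ("poorly differentiated", 0.92),
    "fine": ("poorly differentiated", 0.99),
}
pyramid = {3: "coarse", 2: "medium", 1: "fine"}

label, level_used = classify_tile(
    pyramid,
    classify=toy_scores.__getitem__,
    is_background=lambda image: False,
)
print(label, level_used)
```

Most tiles stop at a coarse level, so the expensive full-resolution pass runs only where it is needed.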
![Page 42: Machine Learning and Deep Contemplation of Data](https://reader035.fdocuments.us/reader035/viewer/2022062901/58f141521a28abb8488b4585/html5/thumbnails/42.jpg)
![Page 43: Machine Learning and Deep Contemplation of Data](https://reader035.fdocuments.us/reader035/viewer/2022062901/58f141521a28abb8488b4585/html5/thumbnails/43.jpg)
Brain Tumor Classification – CVPR 2016
![Page 44: Machine Learning and Deep Contemplation of Data](https://reader035.fdocuments.us/reader035/viewer/2022062901/58f141521a28abb8488b4585/html5/thumbnails/44.jpg)
Combining Information from Patches
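The slide does not spell out how patch-level predictions are combined into a slide-level call. A common baseline, sketched here as an assumption rather than the method shown on the slide, is to build a histogram of predicted patch classes and take the most frequent one; the class names are placeholders.

```python
from collections import Counter


def combine_patches(patch_labels, classes):
    """Summarize patch predictions as a class histogram and a majority label."""
    counts = Counter(patch_labels)
    histogram = [counts[c] / len(patch_labels) for c in classes]
    slide_label = max(classes, key=lambda c: counts[c])
    return histogram, slide_label


# Toy patch-level predictions for one whole-slide image.
classes = ["class_a", "class_b"]
patch_labels = ["class_a", "class_a", "class_b", "class_a", "class_b"]

histogram, slide_label = combine_patches(patch_labels, classes)
print(histogram, slide_label)
```

The histogram can also be fed to a second-stage classifier instead of a plain majority vote, which lets the model learn that a minority of patches of one type may be decisive.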
![Page 45: Machine Learning and Deep Contemplation of Data](https://reader035.fdocuments.us/reader035/viewer/2022062901/58f141521a28abb8488b4585/html5/thumbnails/45.jpg)
Brain Tumor Classification Results
Le Hou, Dimitris Samaras, Tahsin Kurc, Yi Gao, Liz Vanner, James Davis, Joel Saltz
![Page 46: Machine Learning and Deep Contemplation of Data](https://reader035.fdocuments.us/reader035/viewer/2022062901/58f141521a28abb8488b4585/html5/thumbnails/46.jpg)
Tumor Infiltrating Lymphocyte quantification
• Convolutional neural network to classify lymphocyte infiltration in tissue patches
• Convolutional neural network and random forest to classify individual segmented nuclei
• Extensive collection of ground truth
• Joint work with Emory and TCGA PanCanAtlas Immune group
Unsupervised Autoencoder – 100 feature dimensions
![Page 47: Machine Learning and Deep Contemplation of Data](https://reader035.fdocuments.us/reader035/viewer/2022062901/58f141521a28abb8488b4585/html5/thumbnails/47.jpg)
Lymphocyte identification
Lymphocyte Infiltration | No Lymphocyte Infiltration
![Page 48: Machine Learning and Deep Contemplation of Data](https://reader035.fdocuments.us/reader035/viewer/2022062901/58f141521a28abb8488b4585/html5/thumbnails/48.jpg)
Receiver Operating Characteristic – Area Under Curve – 95%
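For reference, the area under the ROC curve equals the probability that a randomly chosen positive patch receives a higher score than a randomly chosen negative one. The sketch below computes it that way on made-up scores; it illustrates the metric and does not reproduce the 95% figure.

```python
def roc_auc(scores, labels):
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    Ties count as half. labels use 1 for infiltrated and 0 for not infiltrated.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos for n in neg
    )
    return wins / (len(pos) * len(neg))


# Toy classifier scores for six patches.
scores = [0.9, 0.8, 0.7, 0.4, 0.3, 0.1]
labels = [1, 1, 0, 1, 0, 0]

auc = roc_auc(scores, labels)
print(round(auc, 3))
```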
![Page 49: Machine Learning and Deep Contemplation of Data](https://reader035.fdocuments.us/reader035/viewer/2022062901/58f141521a28abb8488b4585/html5/thumbnails/49.jpg)
Lymphocyte Classification Heat Map
Trained with 22.2K image patches
Pathologist corrects and edits
![Page 50: Machine Learning and Deep Contemplation of Data](https://reader035.fdocuments.us/reader035/viewer/2022062901/58f141521a28abb8488b4585/html5/thumbnails/50.jpg)
Commonalities
• Provided a quick but fairly deep dive into aspects of spatio-temporal data analytics
• Requirements, methods and, I think, core infrastructure can be shared between disparate application classes
• These application classes are definitely data applications, but their spatio-temporal aspects are friendly to the HPC community context
• Most of this holds for analysis of data generated by scientific programs – ORNL Klasky collaborations
![Page 51: Machine Learning and Deep Contemplation of Data](https://reader035.fdocuments.us/reader035/viewer/2022062901/58f141521a28abb8488b4585/html5/thumbnails/51.jpg)
ITCR Team
Stony Brook University: Joel Saltz, Tahsin Kurc, Yi Gao, Allen Tannenbaum, Erich Bremer, Jonas Almeida, Alina Jasniewski, Fusheng Wang, Tammy DiPrima, Andrew White, Le Hou, Furqan Baig, Mary Saltz
Emory University: Ashish Sharma, Adam Marcus
Oak Ridge National Laboratory: Scott Klasky, Dave Pugmire, Jeremy Logan
Yale University: Michael Krauthammer
Harvard University: Rick Cummings
![Page 52: Machine Learning and Deep Contemplation of Data](https://reader035.fdocuments.us/reader035/viewer/2022062901/58f141521a28abb8488b4585/html5/thumbnails/52.jpg)
Funding – Thanks!
• This work was supported in part by U24CA180924-01, NCIP/Leidos 14X138 and HHSN261200800001E from the NCI; R01LM011119-01 and R01LM009239 from the NLM
• This research used resources provided by the National Science Foundation XSEDE Science Gateways program under grant TG-ASC130023 and the Keeneland Computing Facility at the Georgia Institute of Technology, which is supported by the NSF under Contract OCI-0910735.
![Page 53: Machine Learning and Deep Contemplation of Data](https://reader035.fdocuments.us/reader035/viewer/2022062901/58f141521a28abb8488b4585/html5/thumbnails/53.jpg)
Thanks!