Applications of Statistical Data Analysis at CCNY and the ... · 7/24/2009 · Applications of...
Applications of Statistical Data Analysis at CCNY and the Graphyte Toolkit
Irina Gladkova
Michael Grossberg
Dept. of Computer Science, CCNY, CUNY
NOAA/CREST
Thursday, July 23, 2009
Flood of Data
GOES 9,10,12
NOAA-15,16,17,18
LandSat 5,7
DMSP F13,14,15,16
Meteosat 6,7,8,9
CBERS-2,2B
SPOT-2,4,5
ENVISAT
Resourcesat 1
CARTOSAT-1,2,2A
RADARSAT-1,2
KOMPSAT-1
THEO-1
GOMs
GMS-5
METEOR-3
OKEAN
Feng-Yun
> 50 Multi-sensor Platforms
1 Sensor (MODIS) = 125 GB/DAY
Moore’s Law
WJet System - 3376 CPUs
NOAA High Performance Computing Systems
Exponential Growth Continues
Complex Relationships
• High Dimensionality:
• Hyper-spectral images
• High resolution
• Non-linear relationships
• Statistical Analysis:
• Starting point for physical modeling
• Pre-processes for visualizations
Application and Data Driven
• Built tools
• Developed expertise
• Applying statistical analysis to NOAA data and problems in collaboration with NOAA Scientists
Reality: Detectors Break
Band 6: 1628 - 1652 nm
• Manufacturing Flaws
• Launch Damage
• Space is Harsh
Band 6: 15/20 Detectors Noisy or Totally Non-Functional
Lost Opportunity
• NASA MYD10_L2: "Aqua MODIS band 7 is used in the algorithm. The test for snow in dense vegetation in the algorithm was disabled because it was observed to result in frequent erroneous snow mapping in some situations." (http://modis-snow-ice.gsfc.nasa.gov/val.html)
• The National Snow and Ice Data Center: "Version 4 (V004) MYD29 data, the most current version available, uses Aqua/MODIS band 7 instead of band 6." (http://www-nsidc.colorado.edu/data/myd29.html)
• NOAA/STAR: "On Aqua the retrievals are made in band 7 (2.119 µm) because of poor quality data from band 6." (Ignatov A., et al., "Two MODIS Aerosol Products over Ocean on the Terra and Aqua CERES SSF Datasets")
What is ‘Plan B’?
NASA: Column-wise Interpolation?
Bad: Visible Artifacts
Worse: Derivatives (Gradient) Fully Corrupted
Essential Features Destroyed
Simulate Aqua Damage with Terra for Evaluation
[Figure: columns Damaged, Interpolated, Ground Truth; rows Values, Gradient]
Hue = Gradient Direction, Value = Gradient Magnitude
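The evaluation idea on this slide (damage an intact image the way Aqua band 6 is damaged, fill the gaps column by column, then compare with the untouched original) can be sketched roughly as follows. This is a minimal illustration, not the authors' code: the 20-rows-per-scan layout, the choice of which 15 detectors are broken, and the synthetic image are all assumptions made for the example.

```python
import math

DETECTORS = 20                                # detectors per scan (15 of 20 are bad per the slides)
BAD = set(range(20)) - {0, 4, 8, 12, 16}      # hypothetical choice of the 15 broken detectors


def damage(image):
    """Return a copy of image with every row from a bad detector set to None."""
    return [
        [None] * len(row) if (r % DETECTORS) in BAD else list(row)
        for r, row in enumerate(image)
    ]


def interpolate_columns(image):
    """Fill None entries by linear interpolation along each column."""
    rows, cols = len(image), len(image[0])
    out = [list(row) for row in image]
    for c in range(cols):
        good = [r for r in range(rows) if image[r][c] is not None]
        for r in range(rows):
            if image[r][c] is not None:
                continue
            below = [g for g in good if g < r]
            above = [g for g in good if g > r]
            if below and above:
                r0, r1 = below[-1], above[0]
                t = (r - r0) / (r1 - r0)
                out[r][c] = (1 - t) * image[r0][c] + t * image[r1][c]
            else:
                # No good row on one side: copy the nearest good value.
                nearest = below[-1] if below else above[0]
                out[r][c] = image[nearest][c]
    return out


def rmse(a, b):
    """Root-mean-square difference between two images of equal size."""
    n = len(a) * len(a[0])
    total = sum((x - y) ** 2 for ra, rb in zip(a, b) for x, y in zip(ra, rb))
    return math.sqrt(total / n)


# Synthetic "ground truth": a diagonal wave that varies quickly from row to row.
truth = [[math.sin(0.9 * r + 0.3 * c) for c in range(30)] for r in range(40)]
restored = interpolate_columns(damage(truth))
print(f"RMSE of column-wise interpolation: {rmse(restored, truth):.3f}")
```

Because the ground truth is known, the error of any restoration method can be measured directly, which is the point of simulating the damage on Terra.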
Gradients
Hue = Gradient Direction, Value = Gradient Magnitude
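The color coding named here can be sketched as below: hue encodes gradient direction and value (brightness) encodes gradient magnitude. The finite-difference scheme, full saturation, and normalization by the largest magnitude are assumptions for illustration; the slides do not say how the figures were produced.

```python
import colorsys
import math


def gradient(image):
    """Central-difference gradient (gy, gx), one-sided at the borders.

    Assumes the image has at least 2 rows and 2 columns.
    """
    rows, cols = len(image), len(image[0])
    gy = [[0.0] * cols for _ in range(rows)]
    gx = [[0.0] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            r0, r1 = max(r - 1, 0), min(r + 1, rows - 1)
            c0, c1 = max(c - 1, 0), min(c + 1, cols - 1)
            gy[r][c] = (image[r1][c] - image[r0][c]) / (r1 - r0)
            gx[r][c] = (image[r][c1] - image[r][c0]) / (c1 - c0)
    return gy, gx


def gradient_to_rgb(image):
    """Map each pixel's gradient to RGB: hue = direction, value = magnitude."""
    gy, gx = gradient(image)
    rows, cols = len(image), len(image[0])
    mags = [[math.hypot(gx[r][c], gy[r][c]) for c in range(cols)] for r in range(rows)]
    peak = max(max(row) for row in mags) or 1.0   # avoid dividing by zero on flat images
    rgb = []
    for r in range(rows):
        line = []
        for c in range(cols):
            angle = math.atan2(gy[r][c], gx[r][c])            # direction in (-pi, pi]
            hue = (angle % (2 * math.pi)) / (2 * math.pi)     # wrapped onto the hue circle
            line.append(colorsys.hsv_to_rgb(hue, 1.0, mags[r][c] / peak))
        rgb.append(line)
    return rgb


# A left-to-right ramp has a constant gradient pointing along +x, so every pixel is pure red.
ramp = [[float(c) for c in range(4)] for _ in range(3)]
print(gradient_to_rgb(ramp)[0][0])
```

In this encoding a smooth field shows up as slowly varying color, while striping from bad interpolation appears as abrupt hue changes.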
Not Much Proposed
• Only 2 papers try to fix the problem
• Both Use Band 7 to Predict Band 6
• 2006: Global Polynomial Regression
• 2009: Local Polynomial Regression
Fundamental Problem: Band 6 is not a function of Band 7
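One way to see why this matters: if pixels with the same band 7 value have different band 6 values, then no function of band 7 alone, polynomial or otherwise, can predict band 6 exactly. The sketch below computes the lowest mean squared error any such function could reach. The samples are synthetic and the dependence on band 5 is a made-up example; real radiances are continuous and would need binning before grouping.

```python
from collections import defaultdict

# Hypothetical samples (band5, band7, band6): here band 6 depends on both inputs.
samples = [
    (b5, b7, 0.5 * b5 + 0.5 * b7)
    for b5 in (0.1, 0.3, 0.5)
    for b7 in (0.2, 0.4)
]


def best_possible_mse(pairs):
    """Lowest MSE that any function y = f(x) can reach on the given (x, y) pairs.

    The optimum predicts the mean y within each group of identical x, so the
    remaining error is the spread of y inside those groups.
    """
    groups = defaultdict(list)
    for x, y in pairs:
        groups[x].append(y)
    squared_error = 0.0
    for ys in groups.values():
        mean = sum(ys) / len(ys)
        squared_error += sum((y - mean) ** 2 for y in ys)
    return squared_error / len(pairs)


only_b7 = best_possible_mse([(b7, b6) for _, b7, b6 in samples])
b5_and_b7 = best_possible_mse([((b5, b7), b6) for b5, b7, b6 in samples])
print(f"best MSE from band 7 alone:   {only_b7:.5f}")
print(f"best MSE from bands 5 and 7:  {b5_and_b7:.5f}")
```

The first number is an error floor that a better regression on band 7 cannot remove; only additional inputs can.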
More Information Available
• 500m Bands have Significant Correlations
• Why not use all available information?
[Figure: pairwise comparison of bands 3, 4, 5, 6, 7]
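A quick way to check how strongly the bands are related is a pairwise Pearson correlation matrix. The sketch below uses synthetic data in which all five bands share a common signal; the scale factors and noise level are invented for illustration and say nothing about real MODIS radiances.

```python
import math
import random


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def correlation_matrix(bands):
    """Map every (band, band) pair to its correlation."""
    names = sorted(bands)
    return {(p, q): pearson(bands[p], bands[q]) for p in names for q in names}


# Synthetic stand-in for per-pixel radiances in the 500 m bands.
rng = random.Random(0)
base = [rng.random() for _ in range(500)]
scales = ((3, 1.0), (4, 0.9), (5, 0.7), (6, 0.6), (7, 0.5))   # hypothetical
bands = {b: [v * s + rng.gauss(0, 0.05) for v in base] for b, s in scales}

corr = correlation_matrix(bands)
for p in sorted(bands):
    print(p, " ".join(f"{corr[(p, q)]:.2f}" for q in sorted(bands)))
```

High off-diagonal values for band 6 against several bands are what motivate using all of them as predictors. Note that correlation only captures linear dependence, so it can understate non-linear relationships.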
Statistical Approach
• Hypothesis:
We can predict band 6 from bands 3,4,5,7.
• MODIS on Terra has same bands
• Quantify prediction accuracy from test data (not used to build predictor)
Train Using Terra
Training: Terra Radiance Band 3,4,5,6,7 → Training Data → Training → Predictor Parameters
Testing: Terra Radiance Band 3,4,5,6,7 → Testing Data → Band 3,4,5,7 → Prediction → Predicted Band 6
Evaluate Errors: Predicted Band 6 vs. Band 6
Prediction used for Quantitative restoration
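The train-and-evaluate workflow can be sketched as below. The slides do not name the predictor, so ordinary linear least squares is used here purely as a stand-in; the data are synthetic, and the coefficients that generate "band 6" are invented for the example.

```python
import random


def solve(a, b):
    """Solve the square system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for k in range(col, n + 1):
                m[r][k] -= f * m[col][k]
    x = [0.0] * n
    for r in reversed(range(n)):
        x[r] = (m[r][n] - sum(m[r][k] * x[k] for k in range(r + 1, n))) / m[r][r]
    return x


def fit(inputs, target):
    """Least-squares weights (intercept first) via the normal equations."""
    rows = [[1.0] + list(x) for x in inputs]
    d = len(rows[0])
    ata = [[sum(r[i] * r[j] for r in rows) for j in range(d)] for i in range(d)]
    atb = [sum(r[i] * t for r, t in zip(rows, target)) for i in range(d)]
    return solve(ata, atb)


def predict(weights, x):
    return weights[0] + sum(w * v for w, v in zip(weights[1:], x))


def rmse(weights, inputs, target):
    errors = [(predict(weights, x) - t) ** 2 for x, t in zip(inputs, target)]
    return (sum(errors) / len(errors)) ** 0.5


# Synthetic stand-in for Terra radiances: inputs are bands 3, 4, 5, 7; target is band 6.
rng = random.Random(0)
inputs = [[rng.random() for _ in range(4)] for _ in range(400)]
band6 = [
    0.1 + 0.2 * x[0] - 0.1 * x[1] + 0.5 * x[2] + 0.4 * x[3] + rng.gauss(0, 0.01)
    for x in inputs
]

# Hold out test data that is never used to build the predictor.
train_x, test_x = inputs[:300], inputs[300:]
train_y, test_y = band6[:300], band6[300:]

weights = fit(train_x, train_y)
print(f"held-out RMSE: {rmse(weights, test_x, test_y):.4f}")
```

Keeping the test pixels out of the fit is what makes the reported error an honest estimate of how the predictor would behave on Aqua, where band 6 ground truth is missing.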
Preliminary Terra Evaluation
[Figure: columns Damaged, Interpolated, Predicted (Restored), Ground Truth; rows Values, Gradient]
Histogram Of Angles
Aqua Restoration
Evaluate For Products
• Work with STAR to assess potential impact on aerosol M and A products
• Investigate use for snow mapping, and cloud mask algorithms
• Adapt prediction for products directly
• Collaborate with STAR to explain physical models driving prediction
Sensor Synthesis
Available Bands (e.g., Band 3,4,5,7) → Statistical Prediction → Desired Band (e.g., Band 6)
Old Elements: Prediction ~ Regression ~ Estimation
New Elements:
• More and higher quality data
• Much faster computers
• Able to handle non-linear multivariate problems in higher dimensions
No Green on GOES-R
6 Channels close to visible
Why is Green Band Important?
• Primary reason: generate color images (RGB)
• GOES-R will have Red 640nm, and Cyan 470nm
• Current methods use lookup tables to predict green then produce RGB
Problem: Human color vision is not based on narrow-band RGB
Vision: Wide Band Response
Tristimulus and XYZ
I(λ), I’(λ): spectral power distributions
Two objects have the same color <=> XYZ = X’Y’Z’
Don’t estimate green! Estimate XYZ and get accurate RGB
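For reference, the standard CIE definition behind this slide writes the tristimulus values as projections of the spectral power distribution I(λ) onto three wide-band color matching functions. The symbols x̄, ȳ, z̄ do not appear on the slide and are introduced here only to state the definition:

```latex
X = \int I(\lambda)\,\bar{x}(\lambda)\,d\lambda, \qquad
Y = \int I(\lambda)\,\bar{y}(\lambda)\,d\lambda, \qquad
Z = \int I(\lambda)\,\bar{z}(\lambda)\,d\lambda
```

Two different spectra I(λ) and I’(λ) look the same exactly when all three integrals agree, which is why estimating XYZ directly targets what the eye perceives rather than a single narrow green band.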
Hyperion as Spectrometer
Hyperion Data, 220 bands → Spectral Projection → XYZ
Hyperion Data, 220 bands → Spectral Projection → GOES-R Bands 1,2,3,4,5,6
GOES-R Bands 1,2,3,4,5,6 → Statistical Prediction → XYZ
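Spectral projection here means collapsing a finely sampled spectrum into one number per band by weighting it with that band's response curve. A minimal sketch follows; the Gaussian response shapes, their widths, the wavelength grid, and the synthetic spectrum are all assumptions for illustration and are not real Hyperion, GOES-R, or CIE curves.

```python
import math


def gaussian_response(center_nm, width_nm):
    """Hypothetical band response: a Gaussian in wavelength (not a real sensor curve)."""
    return lambda nm: math.exp(-0.5 * ((nm - center_nm) / width_nm) ** 2)


def project(wavelengths, spectrum, response):
    """Response-weighted average of a sampled spectrum (discrete spectral projection)."""
    weights = [response(nm) for nm in wavelengths]
    return sum(w * s for w, s in zip(weights, spectrum)) / sum(weights)


# 220 evenly spaced samples standing in for one hyperspectral pixel (grid is an assumption).
wavelengths = [400 + 10 * i for i in range(220)]
spectrum = [0.2 + 0.1 * math.sin(nm / 150) for nm in wavelengths]   # synthetic spectrum

# Narrow bands at the two visible centers named on the slides (widths assumed).
red = project(wavelengths, spectrum, gaussian_response(640, 20))
cyan = project(wavelengths, spectrum, gaussian_response(470, 20))
# A wide response, standing in for one of the eye's broad sensitivity curves.
wide = project(wavelengths, spectrum, gaussian_response(550, 60))

print(f"red={red:.4f} cyan={cyan:.4f} wide={wide:.4f}")
```

Projecting the same hyperspectral pixels onto both the sensor bands and the color matching functions yields paired samples, which is what a statistical predictor from bands to XYZ would be trained on.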
Proof of Concept Results
Equalized Images
Equalization is used only to magnify differences
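Histogram equalization stretches the gray levels so that small differences between two images become easier to see. The sketch below is the textbook version for integer gray levels, offered as an illustration; the slides do not say which variant was used.

```python
def equalize(image, levels=256):
    """Histogram-equalize an image of integer gray levels in [0, levels)."""
    flat = [v for row in image for v in row]
    hist = [0] * levels
    for v in flat:
        hist[v] += 1

    # Cumulative distribution of gray levels.
    cdf, total = [], 0
    for count in hist:
        total += count
        cdf.append(total)

    cdf_min = next(c for c in cdf if c > 0)
    span = len(flat) - cdf_min
    if span == 0:                       # constant image: nothing to stretch
        return [list(row) for row in image]

    # Lookup table that spreads the occupied levels over the full range.
    lut = [max(0, round((c - cdf_min) / span * (levels - 1))) for c in cdf]
    return [[lut[v] for v in row] for row in image]


# Four nearly identical gray levels are spread across the whole range.
print(equalize([[100, 101], [102, 103]]))
```

Because the mapping depends on the image's own histogram, equalized images are useful for visual comparison but should not be read as quantitative values.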
Beyond Prediction
• Statistical Estimation applies to clustering and classification tasks
• Example Clustering Problem (from Paul Menzel)
• What bands are most important for separating different cloud states?
• How do statistical clusters compare with those predicted by physics models?
Library of Algorithms
• Many different statistical clustering algorithms
• Hard to evaluate: what defines a good cluster?
• We built a library: implements/wraps major clustering algorithms
Agglomerative
Agglomerative Hierarchical
Average Link
Best One Element Move Consensus
Best of K Consensus
CC Average Link
CC Pivot
Competitive Learning
Connected Component
Connected Components
Expectation Maximization
Fuzzy K-means
Graph Cut
Hierarchical Dimensionality Reduction
K-means
Leader Follower
Majority Rule Consensus
Mean Shift
Multi-Dimensional Scaling
Spectral Clustering
Stepwise Optimal Hierarchical
Available Clustering Algorithms
Eg: Competitive Learning
Input: 7 dimensions/pixel
MODIS
Cluster 1
Cluster 2
Cluster 3
Cluster 4
Cluster 5
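Competitive learning can be sketched as an online, winner-take-all update: each pixel is shown to a set of prototype vectors, and only the closest prototype moves toward it. The version below is a generic textbook form, not the toolkit's implementation; the learning rate, epoch count, initialization, and synthetic 7-dimensional pixels are assumptions.

```python
import random


def squared_distance(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def competitive_learning(points, k, epochs=20, rate=0.1, seed=0):
    """Learn k prototype vectors with winner-take-all updates."""
    rng = random.Random(seed)
    prototypes = [list(p) for p in rng.sample(points, k)]   # start from k data points
    for _ in range(epochs):
        order = points[:]
        rng.shuffle(order)
        for p in order:
            winner = min(prototypes, key=lambda w: squared_distance(w, p))
            for i, value in enumerate(p):
                winner[i] += rate * (value - winner[i])     # move only the winner
    return prototypes


def assign(prototypes, p):
    """Index of the prototype closest to pixel p."""
    return min(range(len(prototypes)), key=lambda j: squared_distance(prototypes[j], p))


# Synthetic stand-in for MODIS pixels: 5 groups of 7-dimensional vectors.
rng = random.Random(1)
centers = [[rng.random() for _ in range(7)] for _ in range(5)]
pixels = [[c + rng.gauss(0, 0.02) for c in center] for center in centers for _ in range(40)]

prototypes = competitive_learning(pixels, k=5)
labels = [assign(prototypes, p) for p in pixels]
print(sorted(set(labels)))
```

The result depends on initialization: if two prototypes start in the same group, another group may be left without one, which is one reason the outputs of several runs or algorithms are often combined.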
Consensus Clustering
Consensus Label Agreement
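Consensus clustering combines several labelings of the same pixels into one. The sketch below follows one common reading of majority-rule consensus: two items are linked when more than half of the input clusterings put them together, and the linked groups become the consensus clusters. This is an assumption about the general technique, not a description of the library's implementation.

```python
from itertools import combinations


def majority_rule_consensus(labelings):
    """Merge items that share a cluster in a strict majority of labelings."""
    n = len(labelings[0])
    parent = list(range(n))

    def find(i):
        # Union-find lookup with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i, j in combinations(range(n), 2):
        together = sum(lab[i] == lab[j] for lab in labelings)
        if 2 * together > len(labelings):
            parent[find(i)] = find(j)

    # Relabel the groups 0, 1, 2, ... in order of first appearance.
    roots = {}
    return [roots.setdefault(find(i), len(roots)) for i in range(n)]


# Three clusterings of five items; label names differ between runs, which is fine
# because only co-membership is compared.
runs = [
    [0, 0, 1, 1, 2],
    [1, 1, 0, 0, 0],
    [0, 0, 0, 1, 1],
]
print(majority_rule_consensus(runs))   # [0, 0, 1, 1, 1]
```

The per-pair fraction `together / len(labelings)` can also be reported on its own as a measure of how strongly the input clusterings agree.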
Data Clustering For Classification
What multispectral signatures correlate with presence of a bloom?
Algae Bloom Bulletin: bloom outlined in red
Clustering Result: bloom shown in red
Algae Blooms
Modeled Remote Sensing Reflectance Spectra
The solid green spectra are when chlorophyll fluorescence is excluded from the simulation and solid red spectra are when fluorescence is included in the simulation assuming 0.75% quantum yield. Band 13 and 14 are MODIS bands centered at 667nm and 678nm respectively.
S. Ahmet et al., “Novel optical techniques for detecting and classifying toxic dinoflagellate Karenia brevis blooms using satellite imagery”
Graphyte Tool Kit
• Web based interface to:
• Data
• Computation
• Algorithms
• 2D/3D graphical interactive tools
• Data Exploration
• Data Visualization
Hardware Architecture
Edit Code In Browser
Software Architecture
Interactive 3D Scatter Plot
Rich Internet Application
Edit/Run Code Through Browser
Near Universal Availability
Conclusion
• Provide Expertise
• High Dimensionality
• Large Data Sets
• Statistical Clustering, Estimation, Classification
• Provide Tools for
• Computation
• Data Access
• Visualization
• Remote Collaboration