S7582 - DEEP LEARNING APPLICATIONS ACROSS THE BREADTH...
Transcript of S7582 - DEEP LEARNING APPLICATIONS ACROSS THE BREADTH...
S7582 - DEEP LEARNING APPLICATIONS ACROSS THE BREADTH OF GEOINT
WILL RORRER
PROGRAM MANAGER
DAVID GORODETZKY
RESEARCH LEAD FOR REMOTE SENSING AND MACHINE LEARNING
NVIDIA GPU Technology Conference08 May 2017
ENVIUS | ENVI Classified User Symposium2S7582 - Deep Learning Applications Across the Breadth of GEOINT NON-Export Controlled Information
• Harris Corporation Introduction• What Is Deep Learning to Harris?• Harris Deep Learning Solutions And Research
• Label Data Burden
• Automatic Target / Feature Detection
• 2D Overhead Satellite Imagery
• Forward Facing Drone Imagery
• Determine Road Conditions
• High Temporal Resolution / Frequent Revisit Data
• Airfield Detection And Activity
• Airport Monitoring
• LIDAR Data
• Multiple Stream
• Point Cloud 3D Object Detection
• Land Use Classification
• Road Classification
• Q & A
Agenda
ENVIUS | ENVI Classified User Symposium3S7582 - Deep Learning Applications Across the Breadth of GEOINT NON-Export Controlled Information
Segment overviews
Space and Intelligence Systems
$1.9BComplete Earth observation, weather, geospatial,
space protection and intelligence solutions from
advanced sensors, antennas and payloads, as well as
ground processing and information analytics
Electronic Systems
$2.1BExtensive portfolio serving the defense industry with
electronic warfare, avionics, robotics, advanced
communications and maritime systems as well as air
traffic management solutions for the civil aviation industry
Communication Systems $1.9BTactical and airborne radios, night vision technology,
and defense and public safety networks
Pro forma revenue FY16
ENVIUS | ENVI Classified User Symposium4S7582 - Deep Learning Applications Across the Breadth of GEOINT NON-Export Controlled Information
Harris Corp. has been internally investing in taking
state-of-the-art deep learning technologies and
applying them to remote sensing and geospatial
intelligence customer problems
How long?
• Harris has been working on deep learning for five years
How much?
• Multimillion dollar internal research and development
investment in the last three years
• Additional commercialization investment
Investment Approach
• Research
• Pilot Projects
• Software tool development
• Commercialization
Focus Areas:
Reducing the Cost of Training
• Reduce dollar cost, human cost, and
computer cost of building new models
Extensibility
• Ability to quickly redeploy and
repackage tools to support new problem
sets
Support multiple types of data
• New sensors and data fusion
Automation
• Ability for processes to interface to tools,
removing human from the loop
Deep Learning at Harris
ENVIUS | ENVI Classified User Symposium5S7582 - Deep Learning Applications Across the Breadth of GEOINT NON-Export Controlled Information
• Harris Corporation Introduction• What Is Deep Learning To Harris?• Harris Deep Learning Solutions and Research
• Label Data Burden
• Automatic Target / Feature Detection
• 2D Overhead Satellite Imagery
• Forward Facing Drone Imagery
• Determine Road Conditions
• High Temporal Resolution / Frequent Revisit Data
• Airfield Detection And Activity
• Airport Monitoring
• LIDAR Data
• Multiple Stream
• Point Cloud 3D Object Detection
• Land Use Classification
• Road Classification
• Q & A
Agenda
ENVIUS | ENVI Classified User Symposium6S7582 - Deep Learning Applications Across the Breadth of GEOINT NON-Export Controlled Information
1.2M training images
1,000 object categories
2016 ImageNet Challenge
Object Classification Winner:
Trimps-Soushen
2.99% Error Rate
Label Data Burden Reduction Background
Graph from www.image-net.org
ENVIUS | ENVI Classified User Symposium7S7582 - Deep Learning Applications Across the Breadth of GEOINT NON-Export Controlled Information
Labeled Data is the Currency of ML
Real world problems don’t have an ImageNet for training.
How do we overcome the burden of labeled data?
Do More With Less
• Pretrain the network and Transfer Learn where possible
• Enforce Domain Adaptation if necessary
• Use synthetic data from models or simulators
• Use Curriculum Learning and Dynamically Balanced datasets
Use Unsupervised and Active Learning
• Use a deep embedding to optimize clustering of unlabeled data
• Iterate training of weakly supervised models to progressively improve training set
• Use Convolutional Autoencoders or GANs to generate embedding from unlabeled data
• Use Siamese NN to find similar labels or extract an embedding
Auto-ML• Automated methods to search for optimal model hyperparameters
ENVIUS | ENVI Classified User Symposium8S7582 - Deep Learning Applications Across the Breadth of GEOINT NON-Export Controlled Information
DIRSIG
10 am, 7 degree look angle, Jan 1, Scene
Azimuth 0
2 pm, 7 degree look angle, Jan 1, Scene
Azimuth 225
Use of Synthetic Training Data
• 100% of training data synthesized using CAD models and Scene Simulator
• The trained model is applied to real imagery
• Successful detector produced for fighter jets in WV-2 Pan imagery
• Limiting factors: (1) Content of scene generator (2) Quality of Simulation
6 CAD models used
Objects placed in scene at
various geometries
Heat Map for fighter jets in IKONOS
Pan Imagery
ENVIUS | ENVI Classified User Symposium9S7582 - Deep Learning Applications Across the Breadth of GEOINT NON-Export Controlled Information
Auto-Clustering in Deep Embedding Space
+Drop
365
fc7
Deep Embedding from VGG-16 trained on Places365 scenes used to cluster mixed
RGB imagery into like labels
VGG-16$
* Places: An Image Database for Deep Scene Understanding, B. Zhou, A. Khosla, A. Lapedriza, A. Torralba and A. Oliva arXiv:1610.02055$ Very Deep Convolutional Networks for Large-Scale Image Recognition K. Simonyan, A. Zisserman arXiv:1409.1556
*
Unlabeled, mixed RGB
images
Roads
Golf, Baseball
Harbors,
Parking Lots
ENVIUS | ENVI Classified User Symposium10S7582 - Deep Learning Applications Across the Breadth of GEOINT NON-Export Controlled Information
Improvements to Conv-Autoencoder
One of the few unsupervised methods for training CNNs.
Can they be as discriminative as supervised training?
• When conv-autoencoder is applied to satellite imagery tend to learn simple convolutional features
• Far less effective for transfer learning then supervised classification CNNs
output predicts input
VGG16 (ImageNet training: 1.2M samples, 1000 classes), Conv-0
Typical “Winner-Take-All” Conv-AE on Satellite RGB imagery – poor Conv-0 structure
Improvements in Low-level Conv Filters
Improved Harris Conv-AE on Satellite RGB imagery – better Conv-0 structure, closer to VGG16Mods: Train all layers together, Enforce Sparsity, Selectively block gradient
ENVIUS | ENVI Classified User Symposium11S7582 - Deep Learning Applications Across the Breadth of GEOINT NON-Export Controlled Information
• Harris Corporation Introduction• What is Deep Learning to Harris• Harris Deep Learning Solutions and Research
• Label Data Burden
• Automatic Target / Feature Detection
• 2D Overhead Satellite Imagery
• Forward Facing Drone Imagery
• Determine Road Conditions
• High Temporal Resolution / Frequent Revisit Data
• Airfield Detection And Activity
• Airport Monitoring
• LIDAR Data
• Multiple Stream
• Point Cloud 3D Object Detection
• Land Use Classification
• Road Classification
• Q & A
Agenda
ENVIUS | ENVI Classified User Symposium12S7582 - Deep Learning Applications Across the Breadth of GEOINT NON-Export Controlled Information
• Near ceiling performance in Pan, RGB, MSI
• Robust against occlusions, orientation, image quality
Sample Targets Tested:
o Airplanes
o Storage Tanks
o Sports Stadiums
o Athletic Fields
o Smokestacks
o Cooling Towers
o Clouds
o Crosswalks
o Swimming Pools
o Buildings
o Paved Roads
o Overpasses/Cloverleafs
o Tollbooths
o Beaches
o Cemeteries
o Wind Turbines
o Orchards & Row Crops
Tokyo Int’l Airport IKONOS Pan
20/20 Planes detected, 1 FP
Automatic Target / Feature Detection – 2D Overhead
Sau Paulo, Brazil 30cm WV-3 © 2010 DigitalGlobe, Inc.
43/43 crosswalks detected, 0 FP© 2010 DigitalGlobe, Inc.
ENVIUS | ENVI Classified User Symposium13S7582 - Deep Learning Applications Across the Breadth of GEOINT NON-Export Controlled Information
Automatic Feature Detection - Wind Turbine Blade Inspection
Which wind turbines need repair in this wind farm?
• Drone Captured Data
• Thousands of raw images organized
• Deep learning analyzed
• Damage identified and categorized
• Prioritize repair needs and budget
• Change detection through time
• Turbine lifecycle management
Copyright © 2016 EdgeData, LLC
http://edgedata.net/bladeedge/
ENVIUS | ENVI Classified User Symposium14S7582 - Deep Learning Applications Across the Breadth of GEOINT NON-Export Controlled Information
A Closer Look – BladeEdge’s Dashboard
Copyright © 2016 EdgeData, LLC
Co-registered mosaic
ENVIUS | ENVI Classified User Symposium15S7582 - Deep Learning Applications Across the Breadth of GEOINT NON-Export Controlled Information
A Closer Look – BladeEdge’s Dashboard
Copyright © 2016 EdgeData, LLC
Co-registered mosaic
Co-Registered Mosaic
ENVIUS | ENVI Classified User Symposium16S7582 - Deep Learning Applications Across the Breadth of GEOINT NON-Export Controlled Information
Tokyo Int’l Airport IKONOS Pan
20/20 Planes detected, 1 FP
• Real-time ground weather intelligence system
• Data from traffic-cam videos• Scene detection instead of target detection
Is it raining? Are the roads wet? Is snow present?
Dramatic performance improvement
p(Wet) AUC = 98.9%
In use today in operational
commercial product
https://www.helios.earth/
Wet
Dry
Traffic-Cam / Road Conditions
ENVIUS | ENVI Classified User Symposium17S7582 - Deep Learning Applications Across the Breadth of GEOINT NON-Export Controlled Information
Helios for Autonomous Vehicles
Taking Harris Analytics to Connected Cars at the Consumer Electronics Show
ENVIUS | ENVI Classified User Symposium18S7582 - Deep Learning Applications Across the Breadth of GEOINT NON-Export Controlled Information
• Harris Corporation Introduction• What Is Deep Learning To Harris?• Harris Deep Learning Solutions and Research
• Label Data Burden
• Automatic Target / Feature Detection
• 2D Overhead Satellite Imagery
• Forward Facing Drone Imagery
• Determine Road Conditions
• High Temporal Resolution / Frequent Revisit Data
• Airfield Detection And Activity
• Airport Monitoring
• LIDAR Data
• Multiple Stream
• Point Cloud 3D Object Detection
• Land Use Classification
• Road Classification
• Q & A
Agenda
ENVIUS | ENVI Classified User Symposium19S7582 - Deep Learning Applications Across the Breadth of GEOINT NON-Export Controlled Information
Problem: Clandestine airfields in South American countries used for illegal narcotic trafficking
Goal: Detect new airfields and determine activity levels with high temporal resolution data ( Planet )
Case Study – Clandestine Airfields
ENVIUS | ENVI Classified User Symposium20S7582 - Deep Learning Applications Across the Breadth of GEOINT NON-Export Controlled Information
Leverage Harris Geospatial expertise in human geography and open source analysis to identify known airfields for training, tipping and queuing and validation
Compare and collect high revisit data over the areas of interest, tipped by human geo information
Explore multiple network architectures for optimal signature recognition
Preliminary results encouraging
Sources: • Planet Imagery• DigitalGlobe EGD Imagery• Ecuadorean geoportal shapefile of known remote landing strips• Google Earth Imagery• Wikimapia• OpenStreetMap
Case Study – Clandestine Airfields
Temporal ConvNet Temporal LSTM Temporal GRU
• Temporal ConvNet
• Temporal LSTM
• Temporal GRU
ENVIUS | ENVI Classified User Symposium21S7582 - Deep Learning Applications Across the Breadth of GEOINT NON-Export Controlled Information
Case Study – Airport Traffic Monitoring
Goal – Identify broad trends in air traffic
• Deep Learning a critical building block
• Object ID: Find airplanes on tarmac
• Combined with other methods to make monitoring feasible
• Too time consuming for analysts to look at every image
• Need a way to compress information
• Aggregation of multiple detection algorithm results over time to identify patterns and anomalies in behaviour
• Applicable to many similar monitoring problems
Test Case:
• Data: WorldView-3 Pan
• Airport: London Southend
• Multiple observations over several
months
Object Detection
• LeNet-style 5-layer model converted
for fully convolutional operation
• Detects all large airplanes
ENVIUS | ENVI Classified User Symposium22S7582 - Deep Learning Applications Across the Breadth of GEOINT NON-Export Controlled Information
Dashboard: Collects detection results for each observation
ENVIUS | ENVI Classified User Symposium23S7582 - Deep Learning Applications Across the Breadth of GEOINT NON-Export Controlled Information
Collecting Training Data for Meta-Model
Observational
Data
@time n
Object Detection Algorithms
Dashboard
Analyst “label” for observation n
me
tad
ata
(d
ate
, tim
e, lo
catio
n)
image sample
(ch
ips o
r d
ow
nsiz
ed im
ag
e)
new training sample for decision
support model is collected each
time an analyst looks at
dashboard and assigns a label,
e.g.,
• Condition normal
• Emergency response
• Snowstorm
• Flood
• Power outage
• Traffic jam
• Suspicious activity
ENVIUS | ENVI Classified User Symposium24S7582 - Deep Learning Applications Across the Breadth of GEOINT NON-Export Controlled Information
• Harris Corporation Introduction• What Is Deep Learning To Harris?• Harris Deep Learning Solutions and Research
• Label Data Burden
• Automatic Target / Feature Detection
• 2D Overhead Satellite Imagery
• Forward Facing Drone Imagery
• Determine Road Conditions
• High Temporal Resolution / Frequent Revisit Data
• Airfield Detection And Activity
• Airport Monitoring
• LIDAR Data
• Multiple Stream
• Point Cloud 3D Object Detection
• Land Use Classification
• Road Classification
• Q & A
Agenda
ENVIUS | ENVI Classified User Symposium25S7582 - Deep Learning Applications Across the Breadth of GEOINT NON-Export Controlled Information
LiDAR Point Cloud is Dense with Information
LiDAR sensors have very powerful discriminatory capabilities.
Can DL models learn to recognize objects and terrain types?
Many Challenges
• Raw 3D point clouds are difficult to use compared to images.
• Typically very large and spatially dense, yet sparse in Z-dim.
• Complementary to imagery, could be used together if available.
• Harder to label
• By individual point, or 3D bounding box?
• Software tools less developed (or don’t exist).
• Not much DL research on airborne LiDAR classification.
3D point cloud color-coded by height
ENVIUS | ENVI Classified User Symposium26S7582 - Deep Learning Applications Across the Breadth of GEOINT NON-Export Controlled Information
LiDAR in ConvNets
Data must be transformed into ConvNet-consumable form.
How to Preserve LiDAR context in space?
• Satellite imagery is almost always nadir.
• Natural Images are usually from human perspective.
• 3D point cloud is disconnected from sensor location.
Geiger-Mode LiDAR point cloud
Voxel Grid Overlay
Voxel map extracted from point cloud w/discretized 3D grid:
• Binary (is there a return from this region?)
• Aggregate ( relative density of returns in region)
• Multichannel (density + intensity, rgb, other stats)
ENVIUS | ENVI Classified User Symposium27S7582 - Deep Learning Applications Across the Breadth of GEOINT NON-Export Controlled Information
Multi-Planar Reprojection (MPR) for LiDAR
2D ConvNet Model• Early approach to object detection in 3D point clouds• Very successful, but somewhat limited applicability
Multi-Planar Reconstruction (MPR)• Flatten 3D Voxel Map into 2D views• 2D views capture mean density across collapsed dim• Some 3D signal is retained – like an X-Ray• Number and orientation of planes are configurable• Commonly use orthogonal planes: [XY, XZ, YZ]• Some data amenable to augmentation via rotation about Z-Axis
• Equivalent to using rotated planes• Increases number of observed orientations
Railroad Signal
Voxel Map
ENVIUS | ENVI Classified User Symposium28S7582 - Deep Learning Applications Across the Breadth of GEOINT NON-Export Controlled Information
MPR Model Data Flow
High-density point cloudapprox 2000-4000 pts/m2
5 cm voxel gridmean intensity/voxel[80 x 80 x 350]
MPR Views
XZ YZ
XY
Lowest 50 cm “ground” data are ignored
Target Object : Railroad Signals
• Volumetric Sample (ROI): 4m x 4m x 17.5 m
• External Sliding Window for Test Volume
European railroad asset inventory project
• High density terrestrial LiDAR from moving railcar
Sig
na
l
Conv 2
Conv 1
Conv 2
Conv 1
Fu
lly-C
onnecte
d
Not-S
ignal
Conv 1
Con
v 2
ENVIUS | ENVI Classified User Symposium29S7582 - Deep Learning Applications Across the Breadth of GEOINT NON-Export Controlled Information
RR Signals Heat Map
• Extension of 2D ConvNet to 3D data source
• Finds variety of 3D objects (signals, crossings, boxes, poles)
• Ongoing model development for customer
• 2nd Generation Model
• 3D ConvNet
• Faster and more flexible
• Less sensitive to location and size of objects
Heat Map
(all detections shown)
Full point cloud(held-out test area)
Test region
~ 200m x 130m
Test Target: Railroad
Signals
ENVIUS | ENVI Classified User Symposium30S7582 - Deep Learning Applications Across the Breadth of GEOINT NON-Export Controlled Information
Large-Scale Point Cloud Classification Benchmark
• Public domain challenge dataset (http://www.semantic3d.net/)
• 15 urban and suburban scenes with point-level labelling
• Data includes intensity and RGB values
• Highly imbalanced classes
• Some ambiguity between classes
• Submissions judged on Intersection-over-Union and Overall Accuracy
Classes:
1. man-made terrain
2. natural terrain
3. high vegetation
4. low vegetation
5. buildings
6. hard scape
7. scanning artifacts
8. cars
Example scene from challenge dataset
ENVIUS | ENVI Classified User Symposium31S7582 - Deep Learning Applications Across the Breadth of GEOINT NON-Export Controlled Information
Evaluation of Initial LiDAR Models
Data used internally to test performance of early 2D and 3D ConvNet models
ENVIUS | ENVI Classified User Symposium32S7582 - Deep Learning Applications Across the Breadth of GEOINT NON-Export Controlled Information
Current Leaderboard
HarrisNet model was top of leaderboard at time of Submission (Oct’16)
• Multi-scale 3D ConvNet of Voxel Neighborhood Metrics
• Participation was halted after initial submission
man
made
terrain
natural
terrain hi-veg lo-veg bldg hardscape artifacts cars
ENVIUS | ENVI Classified User Symposium33S7582 - Deep Learning Applications Across the Breadth of GEOINT NON-Export Controlled Information
Geigermode LIDAR Applications
• Bare ground point labeling
• Improvements to base terrain classifications
• Utility Infrastructure (transmission, distribution, and wires)
• Building footprints
• Building rooftop geometry
DL used as building blocks for developing advanced products.
Utility, Insurance, Municipal Mapping, Urban Planning.
ENVIUS | ENVI Classified User Symposium34S7582 - Deep Learning Applications Across the Breadth of GEOINT NON-Export Controlled Information
New Class of Models Under Development
Multi-scale Fully Convolutional Segmentation
Best of U-Net and DenseNet
• Multi-scale feature representation that can
be used for fine grained predictions
• Inspired by work from Ronneberger et al.
(2015) and Jégou et al. (2016)
• Convolutional feature extraction trunk
• Scale-specific predictors upsampled or
deconvolved to yield better resolution
• Data from earlier layers carried forward
O. Ronneberger, P. Fischer, and T. Brox, “U-net: Convolutional networks for biomedical image segmentation,” in International Conference
on Medical Image Computing and Computer-Assisted Intervention (MICAI), 2015. (https://arxiv.org/pdf/1505.04597.pdf)
S. Jégou, M. Drozdzal, D. Vazquez, A. Romero, and Y. Bengio, “The One Hundred Layers Tiramisu: Fully Convolutional DenseNets for
Semantic Segmentation,” CVPR, 2016. (https://arxiv.org/pdf/1611.09326.pdf)
ENVIUS | ENVI Classified User Symposium35S7582 - Deep Learning Applications Across the Breadth of GEOINT NON-Export Controlled Information
Promising results from new models
Multi-scale Fully Convolutional Segmentation Results
Bare Ground point
classification in GM-LiDAR
Building Footprints
in WV-3 imagery
(SpaceNet)
TP
TN
FP
FN
Contingency Matrix
ENVIUS | ENVI Classified User Symposium36S7582 - Deep Learning Applications Across the Breadth of GEOINT NON-Export Controlled Information
• Harris Corporation Introduction• What Is Deep Learning To Harris?• Harris Deep Learning Solutions and Research
• Label Data Burden
• Automatic Target / Feature Detection
• 2D Overhead Satellite Imagery
• Forward Facing Drone Imagery
• Determine Road Conditions
• High Temporal Resolution / Frequent Revisit Data
• Airfield Detection And Activity
• Airport Monitoring
• LIDAR Data
• Multiple Stream
• Point Cloud 3D Object Detection
• Land Use Classification
• Road Classification
• Q & A
Agenda
ENVIUS | ENVI Classified User Symposium37S7582 - Deep Learning Applications Across the Breadth of GEOINT NON-Export Controlled Information
Multiple Data Inputs & Conv Features
What’s the best way to handle multiple data inputs?
1. different data dimensions
• input 1 = [200, 200, 3] Panchromatic mage
• input 2 = [50, 50] Multispectral image
2. different sensor modalities
• input 1 = Multispectral image
• input 2 = LiDAR points (or derived image product)
• input 3 = SAR image
3. disparate data sources
• input 1 = natural image
• input 2 = audiogram
• input 3 = RADON transform
MSI/HSI
SAR
TIME
SERIES
LiDAR
PAN
RADON
AUDIO
ENVIUS | ENVI Classified User Symposium38S7582 - Deep Learning Applications Across the Breadth of GEOINT NON-Export Controlled Information
Multiple Data Inputs & Conv Features
• Will resizing change the dataset’s features?
• Padding will increase the memory requirements and waste model capacity.
• Will combining data into a single input reduce or enhance the quality of the features?
• Could muddy the waters (blur or average-out any single inputs features)?
• Or, each feature map could end up being specific to only one input channel?
• Deeper into the model the features could be combined in unique and potentially powerful ways.
• Perhaps more likely when the sensor or data sources are similar?
Alternate Approach: Keep the inputs separate during the Conv feature extraction stage.
Input 2
Input 1
Stacked into single
input with 2 channels
Input 1
Input 2
Input 2
Input 1
or
ENVIUS | ENVI Classified User Symposium39S7582 - Deep Learning Applications Across the Breadth of GEOINT NON-Export Controlled Information
Multi-Stream Model Architecture
Class 1
Fully-Connected Layer
Input 1 Input 2 Input 3 Input 4
Fully-Connected Layer
Class 2 Class 3
Individual
Convolutional
Feature
Extractions
A very flexible model architecture
• Enables natural data fusion – all data input in original form
• Maximizes feature extraction from complimentary data independently
• Effective way of providing critical context for targeted task
“Late” Data Fusion
ENVIUS | ENVI Classified User Symposium40S7582 - Deep Learning Applications Across the Breadth of GEOINT NON-Export Controlled Information
Image and Derived Data in Multi-Stream
Conv250 filters / Kernel 5 / Stride 1
Rect-Linear / Pool 2
Conv140 filters / Kernel 5 / Stride 2
Rect-Linear / Pool 2
Fully-Connected 11000 units / Rect-Linear / Dropout
Fully-Connected 2500 units / Rect-Linear / Dropout
Conv250 filters / Kernel 5 / Stride 1
Rect-Linear / Pool 2
Conv250 filters / Kernel 5 / Stride 1
Rect-Linear / Pool 2
Conv140 filters / Kernel 5 / Stride 2
Rect-Linear / Pool 2
Conv140 filters / Kernel 5 / Stride 2
Rect-Linear / Pool 2
PanRadon Transform
Log Power Spectrum
Airplane Not-airplane
Data Fusion for Optimal Detector Performance with Minimal Training Data• Derived datasets have different failure modes (precision/recall performance)
• Combining all 3 into single model yields most robust classifier
• Effective means of defeating most persistent false positivesSuccessful proof-of-concept for
fusing disparate data types.
ENVIUS | ENVI Classified User Symposium41S7582 - Deep Learning Applications Across the Breadth of GEOINT NON-Export Controlled Information
Heat Map Improvements w/Multi-Stream
Tokyo Int’l Airport IKONOS Pan
Persistent false positives overcome with use of derived
data in multi-stream model with highly limited training data
Multi-Stream Inputs Padded & Stacked Inputs
ENVIUS | ENVI Classified User Symposium42S7582 - Deep Learning Applications Across the Breadth of GEOINT NON-Export Controlled Information
Multiple 2D Density Maps from LiDAR Point Cloud
XZ YZ
XY
Voxel Map density collapsed
onto 3 orthogonal planes
Not-T
arg
etC
onv2
50 filte
rs / K
ern
el 5
/ Strid
e 1
Rect-L
inear / P
ool 2
Conv1
40 filte
rs / K
ern
el 5
/ Strid
e 2
Rect-L
inear / P
ool 2
Conv2
50 filte
rs / K
ern
el 5
/ Strid
e 1
Rect-L
inear / P
ool 2
Conv1
40 filte
rs / K
ern
el 5
/ Strid
e 2
Rect-L
ine
ar / P
oo
l 2
Fully
-Connecte
d1000 u
nits
/ Rect-L
inear / D
ropout
Targ
et
Con
v1
40 filte
rs / K
ern
el 5
/ Strid
e 2
Rect-L
inear / P
ool 2
Conv2
50
filters
/ Ke
rne
l 5 / S
tride
1
Rect-L
inear / P
ool 2
RR-Signals or other
object in 3D Point Cloud
2D Views
Fully
-Connecte
d500 u
nits
/ Rect-L
inear / D
ropout
ENVIUS | ENVI Classified User Symposium43S7582 - Deep Learning Applications Across the Breadth of GEOINT NON-Export Controlled Information
Multi-Stream for Foveal Vision Models
Foveal (central vision)
Hi-Res (fine acuity)
16 m scale @ 0.5 m GSD
object identity is ambiguous
Context (peripheral vision)
Low-Res
128 m scale @ 2m GSD
object identity is clear
Adding context almost always improves vision performance.
Need both scales in
many cases to make
correct object ID
ENVIUS | ENVI Classified User Symposium44S7582 - Deep Learning Applications Across the Breadth of GEOINT NON-Export Controlled Information
Foveal Multi-Stream for Land Use Classification
HARRIS ImageLinks, 8-bit, 4-band WV-2, Melbourne, FL Analyst-marked Ground Truth (5-class)
Bare Ground
Water
Veg
Buildings
Transportation
Worldview2 4-Band Multispectral Imagery
ENVIUS | ENVI Classified User Symposium45S7582 - Deep Learning Applications Across the Breadth of GEOINT NON-Export Controlled Information
Multiple Input Arrangements Investigated
Best practices spectral mapping / (pansharp > QUAC > SAM)
Bldg: 14% Improvement over benchmark*
Trans: 25% Improvement over benchmark
Landcover Classification Results
Multi-stream ConvNet
Confusion Matrix
*Game-theoretic combination of 4 algorithms w/CRF post-processing.
Optimizing Supervised Learning for Pixel Labeling and Material Classification, Rahmes, et al 2014.
ENVIUS | ENVI Classified User Symposium46S7582 - Deep Learning Applications Across the Breadth of GEOINT NON-Export Controlled Information
Mapping Roads in Mombassa w/Foveal Multi-stream
WV-2 Dataset
• 8-band MSI (2m/pix)
• Pan (0.5 m/pix)
• ~10k pix DIM
Complicated Road Networks
• Highly variable in
appearance, materials,
markings, size
• Dense, sprawling
development creates many
obstructions
No labeled data
ENVIUS | ENVI Classified User Symposium47S7582 - Deep Learning Applications Across the Breadth of GEOINT NON-Export Controlled Information
Training Data Preparation
No Training (labeled) Data Provided
• Implemented a semi-automated approach
• Manually digitize approximate polyline vectors of major roads
• Define sampling distance (16-pixels in ½ m Pan)
• Extract “POS” road image patches along road vectors
• Compute Buffer Zone Image for road polylines
• Extract ‘NEG” non-road image patches at intersection of buffer zone and
sampling grid
Training data is sloppy
• Smaller roads in NEG
• Initial classifier is weak
• Iterate to improve training
1. Use initial model to find high
probability POS/NEG results
2. Update training data
3. Retrain…
25-pix road buffer (blue) 16-pix sampling grid (red)
375k NEG / 15k POS
POS Aug 10x via rot to 150k
ENVIUS | ENVI Classified User Symposium48S7582 - Deep Learning Applications Across the Breadth of GEOINT NON-Export Controlled Information
Multi-Scale Foveal Model & Results
4-Input Data Streams
• Progressive scales of observation
• Precise co-registration not required
• Spectrum provided as separate input
Road networks in Mombassa• R0: Detected roads overlaid on TrueColor RGB
Foveal Vision Model
ENVIUS | ENVI Classified User Symposium49S7582 - Deep Learning Applications Across the Breadth of GEOINT NON-Export Controlled Information
Questions?
Will RorrerMachine Learning Business [email protected]
Look for upcoming blog posts about Machine Learning at Harris’ Blog website:
http://www.harrisgeospatial.com/Company/PressRoom/Blogs.aspx
David GorodetzkyResearch Lead for Remote Sensing and Machine [email protected]
Trademarks are registered marks of their respective companies.
ENVIUS | ENVI Classified User Symposium50S7582 - Deep Learning Applications Across the Breadth of GEOINT NON-Export Controlled Information
Backup Material
ENVIUS | ENVI Classified User Symposium51S7582 - Deep Learning Applications Across the Breadth of GEOINT NON-Export Controlled Information
Technology to Connect, Inform and ProtectTM
Advanced technologies for customers whose missions
are vital to the world’s safety and security
Technology innovator with industry leading
commitment to research and development
Agile, commercial mindset to meet the most
demanding budgets and deadlines
The result: innovation with a purpose and missions
that succeed
Our promise is simple: to connect, inform and protect
ENVIUS | ENVI Classified User Symposium52S7582 - Deep Learning Applications Across the Breadth of GEOINT NON-Export Controlled Information
About Harris
NYSE: HRS
17,000 employees worldwide
7,700 engineers and scientists
Supports customers in more
than 100 countries
Industry leading R&D investment
Broad portfolio, strong franchises, key experts, and
a proven track record of innovation and success
*Pro forma
United States
International
FY16
~$6 Billion
Annual
Revenue*
ENVIUS | ENVI Classified User Symposium53S7582 - Deep Learning Applications Across the Breadth of GEOINT NON-Export Controlled Information
Harris Geospatial Solutions
From sensors and software to actionable information,
our solutions and products help you make informed
decisions – when and where they are needed.
• Off-the-shelf and custom solutions for advanced geospatial
analysis
• Tools for hosting and managing distributed data processing in a
high-performance computing environment
• An online marketplace that puts the best commercially available
geospatial data and imagery at your fingers.
• Innovative Geiger-mode LiDAR sensor to collect data faster and
at 10x the resolution of traditional LiDAR.
• Value-added services (VAS) that give you actionable information.
Geospatial
Marketplace
Geiger-mode
LiDAR
Custom
Services
Jagwire
ENVI
IDL
ENVIUS | ENVI Classified User Symposium54S7582 - Deep Learning Applications Across the Breadth of GEOINT NON-Export Controlled Information
Input image patch (pixels)
What is Deep Learning to Harris?
ENVIUS | ENVI Classified User Symposium55S7582 - Deep Learning Applications Across the Breadth of GEOINT NON-Export Controlled Information
Object Detection Automatic Language Translation
Driverless CarsCaption Generation
Social Media & Online Shopping
Deep Learning Applications
ENVIUS | ENVI Classified User Symposium56S7582 - Deep Learning Applications Across the Breadth of GEOINT NON-Export Controlled Information
• Harris Corporation Introduction• What Is Deep Learning To Harris?• Harris Deep Learning Solutions and Research
• Label Data Burden
• Automatic Target / Feature Detection
• 2D Overhead Satellite Imagery
• Forward Facing Drone Imagery
• Determine Road Conditions
• High Temporal Resolution / Frequent Revisit Data
• Airfield Detection And Activity
• Airport Monitoring
• LIDAR Data
• Multiple Stream
• Point Cloud 3D Object Detection
• Land Use Classification
• Road Classification
• Q & A
Agenda
ENVIUS | ENVI Classified User Symposium57S7582 - Deep Learning Applications Across the Breadth of GEOINT NON-Export Controlled Information
• Harris Corporation Introduction• What Is Deep Learning To Harris?• Harris Deep Learning Solutions and Research
• Label Data Burden
• Automatic Target / Feature Detection
• 2D Overhead Satellite Imagery
• Forward Facing Drone Imagery
• Determine Road Conditions
• High Temporal Resolution / Frequent Revisit Data
• Airfield Detection And Activity
• Airport Monitoring
• LIDAR Data
• Multiple Stream
• Point Cloud 3D Object Detection
• Land Use Classification
• Road Classification
• Q & A
Agenda
ENVIUS | ENVI Classified User Symposium58S7582 - Deep Learning Applications Across the Breadth of GEOINT NON-Export Controlled Information
• Harris Corporation Introduction• What Is Deep Learning To Harris?• Harris Deep Learning Solutions and Research
• Label Data Burden
• Automatic Target / Feature Detection
• 2D Overhead Satellite Imagery
• Forward Facing Drone Imagery
• Determine Road Conditions
• High Temporal Resolution / Frequent Revisit Data
• Airfield Detection And Activity
• Airport Monitoring
• LIDAR Data
• Multiple Stream
• Point Cloud 3D Object Detection
• Land Use Classification
• Road Classification
• Q & A
Agenda
ENVIUS | ENVI Classified User Symposium59S7582 - Deep Learning Applications Across the Breadth of GEOINT NON-Export Controlled Information
• Harris Corporation Introduction• What Is Deep Learning To Harris?• Harris Deep Learning Solutions and Research
• Label Data Burden
• Automatic Target / Feature Detection
• 2D Overhead Satellite Imagery
• Forward Facing Drone Imagery
• Determine Road Conditions
• High Temporal Resolution / Frequent Revisit Data
• Airfield Detection And Activity
• Airport Monitoring
• LIDAR Data
• Multiple Stream
• Point Cloud 3D Object Detection
• Land Use Classification
• Road Classification
• Q & A
Agenda
ENVIUS | ENVI Classified User Symposium60S7582 - Deep Learning Applications Across the Breadth of GEOINT NON-Export Controlled Information
Planet Data Deep Stack –15 July 2016 - 3 Jan 2017
ENVIUS | ENVI Classified User Symposium61S7582 - Deep Learning Applications Across the Breadth of GEOINT NON-Export Controlled Information
Harvest Training Data
ENVIUS | ENVI Classified User Symposium62S7582 - Deep Learning Applications Across the Breadth of GEOINT NON-Export Controlled Information
Data Set Collection and Annotation
Date May 2 ‘16 July 15 ‘16 Aug 2 ‘16 Oct 11’16 Jan 3 ‘17
Scene
Annotation
Activity No Airfield Airfield
Operational
Airfield
“Cratered”
Airfield
Operational
Airfield
“Cratered”
ENVIUS | ENVI Classified User Symposium63S7582 - Deep Learning Applications Across the Breadth of GEOINT NON-Export Controlled Information
Data Augmentation (flips and 90 degree rotations)
Original Scene: Horizontal Flip:
Vertical Flip: Rotation, 90 deg: Rotation, 90 deg + vertical flip:
- Airfield (Cratered or Operational)
- No Airfield
ENVIUS | ENVI Classified User Symposium64S7582 - Deep Learning Applications Across the Breadth of GEOINT NON-Export Controlled Information
Network Architectures
ConvNet
Input
ConvNet
Input
ConvNet
Input
ConvNet
Input
ConvNet
Input
Output
Network
Time
3 Network Architectures Explored:
• Temporal ConvNet (Dense)• ConvNet Parameters are shared,
extension of Siamese
• 3 Layers of VGG16
• Temporal LSTM
• Memory unit based
• GRU
• Memory based
• More efficient that LSTM
ENVIUS | ENVI Classified User Symposium65S7582 - Deep Learning Applications Across the Breadth of GEOINT NON-Export Controlled Information
Preliminary Results - Heatmaps
Architecture 1: Temporal ConvNet
Architecture 2: Temporal LSTM
Architecture 2: GRU
Next Steps:
• Need more positive samples (Annotate
more data)
• There is a lot of room to fine tune model
parameters (both the feature extraction
block and the memory blocks)
• DenseNet/ResNet–style skip connections
may prove beneficial
• Incorporate metadata into decision loop
(time since last frame, time of the day, etc.)
ENVIUS | ENVI Classified User Symposium66S7582 - Deep Learning Applications Across the Breadth of GEOINT NON-Export Controlled Information
Results
Accuracy,
Overall
Accuracy,
Positive
Accuracy,
Negative
ConvNet 0.62 0.28 0.97
LSTM 0.93 0.89 0.96
GRU 0.91 0.87 0.94
ENVIUS | ENVI Classified User Symposium67S7582 - Deep Learning Applications Across the Breadth of GEOINT NON-Export Controlled Information
Results
Test Case:
• Data: WorldView-3 Pan
• Airport: London Southend
• Multiple observations over several months
Object Detection
• LeNet-style 5-layer model converted for fully convolutional operation
• Detects all large airplanes
ENVIUS | ENVI Classified User Symposium68S7582 - Deep Learning Applications Across the Breadth of GEOINT NON-Export Controlled Information
• Harris Corporation Introduction• What Is Deep Learning To Harris?• Harris Deep Learning Solutions and Research
• Label Data Burden
• Automatic Target / Feature Detection
• 2D Overhead Satellite Imagery
• Forward Facing Drone Imagery
• Determine Road Conditions
• High Temporal Resolution / Frequent Revisit Data
• Airfield Detection And Activity
• Airport Monitoring
• LIDAR Data
• Multiple Stream
• Point Cloud 3D Object Detection
• Land Use Classification
• Road Classification
• Q & A
Agenda
ENVIUS | ENVI Classified User Symposium69S7582 - Deep Learning Applications Across the Breadth of GEOINT NON-Export Controlled Information
• Harris Corporation Introduction• What Is Deep Learning To Harris?• Harris Deep Learning Solutions and Research
• Label Data Burden
• Automatic Target / Feature Detection
• 2D Overhead Satellite Imagery
• Forward Facing Drone Imagery
• Determine Road Conditions
• High Temporal Resolution / Frequent Revisit Data
• Airfield Detection And Activity
• Airport Monitoring
• LIDAR Data
• Multiple Stream
• Point Cloud 3D Object Detection
• Land Use Classification
• Road Classification
• Q & A
Agenda
ENVIUS | ENVI Classified User Symposium70S7582 - Deep Learning Applications Across the Breadth of GEOINT NON-Export Controlled Information
Ground Truth5 Classes
Bare Ground
Water
Veg
Buildings
Transportation
ENVIUS | ENVI Classified User Symposium71S7582 - Deep Learning Applications Across the Breadth of GEOINT NON-Export Controlled Information
Confusion Between Classes
Existing classifiers had trouble with class confusion:
Buildings v. Transportation
• Spectrally similar (made from same materials in some cases)
• Textures can be similar
• Scale of observation is critical to interpretation
• Provides aspect ratio, shape, size info
Veg v. Bare Ground (less problematic but still significant)
• Spectrally similar when veg is unhealthy, senescent, or data poorly calibrated
• Object shape is often not diagnostic
• Small plants hard to resolve spatially (leading to similar textures)
Use multi-stream inputs at progressively lower resolution to provide context
ENVIUS | ENVI Classified User Symposium72S7582 - Deep Learning Applications Across the Breadth of GEOINT NON-Export Controlled Information
ImageLinks Model Architectures
water
FC1 (4096)
spectrum 64-pix
4-band
0.5m
neigh
128 pix
4-band
1m
neigh
FC2 (500)
bare bldg transveg
4
Experiment w/inserting 4-band spectrum into embedding at various depths
Dropout @ 0.5
Dropout @ 0.5
ENVIUS | ENVI Classified User Symposium73S7582 - Deep Learning Applications Across the Breadth of GEOINT NON-Export Controlled Information
ImageLinks Model Architectures
A variety of architectures were evaluated for a proof-of-concept using 4-band pan-sharpened imagery:
Exp1
Stream 1: 16-pixel neighborhood [4x16x16]
Stream 2: 64-pixel neighborhood [4x64x64]
Stream 3: 256 pixel neighborhood resampled to 1m GSD [4x128x128]
Exp2
Stream 1: spectrum of central pixel introduced at FC1 (4096)
Stream 2/3: same as above
Exp3
Stream 1: spectrum of central pixel introduced at FC2 (500)
Stream 2/3: same as above
Exp4
Stream 1: spectrum of central pixel introduced after FC2 (500) <after dropout>
Stream 2/3: same as above
ENVIUS | ENVI Classified User Symposium74S7582 - Deep Learning Applications Across the Breadth of GEOINT NON-Export Controlled Information
Caffe ConvNet Architecture Details
Stream 1 (16-pix neighborhood @ 0.5 m GSD)
Conv-1: 40, 3x3, stride 1 + ReLU + LRN + Max Pool (2x2, stride 2)
Conv-2: 50, 5x5, stride 1 + ReLU + LRN + Max Pool (2x2, stride 2)
Stream 2 (64-pix neighborhood @ 0.5 m GSD)
Identical to Stream 1
Stream 3 (128-pix neighborhood resampled to 1m)*
Conv-1: 64, 12x12, stride 4 + ReLU + LRN + Max Pool (3x3, stride 1)
Conv-2: 112, 4x4, stride 1 + ReLU + LRN + Max Pool (2x2, stride 2)*
Conv-3: 112, 3x3, stride 1 + ReLU + LRN
FC Layers
4096 + ReLU + Dropout (0.5)
500 + ReLU + Dropout (0.5)
5
*Based on Vlad Mnih, PhD Dissertation, U Toronto, 2013
ENVIUS | ENVI Classified User Symposium75S7582 - Deep Learning Applications Across the Breadth of GEOINT NON-Export Controlled Information
ImageLinks Data Prep
8-bit 4-band WV-2 Data [7900 x 5700]
Pansharpened using Gram-Schmidt Sharpening (4-band output)
Mean-centered
No augmentation
Random uniform sampling across 75% of image at 100k locations
- Approx 0.2 % of image sampled for training
- Training data were unbalanced
- Proportion approx. matched distribution in image
85k for training, 5k for validation, and 10k for test
Compared to “SAM Baseline”
- Best practices spectral classification
- Gram-Schmidt Sharpening
- QUAC atmospheric correction (8-bit data is not radiometrically calibrated)
- SAM using spectral means from training data
Compared to Game-theoretic competition among 4 supervised classifiers
- Optimizing Supervised Learning for Pixel Labeling and Material Classification, Rahmes, et al 2014.
ENVIUS | ENVI Classified User Symposium76S7582 - Deep Learning Applications Across the Breadth of GEOINT NON-Export Controlled Information
Ground Truth
Multi-Stream
Model (MSM)
SAM “Baseline”(GS-Sharp > QUAC)
Full Scene Results
*Rahmes, et al Classification Image Not Available
ENVIUS | ENVI Classified User Symposium77S7582 - Deep Learning Applications Across the Breadth of GEOINT NON-Export Controlled Information
Ground Truth
MSM
SAM(confusion in Bldg , Trans, Bare Ground)
Significant Improvement in Confused Classes
ENVIUS | ENVI Classified User Symposium78S7582 - Deep Learning Applications Across the Breadth of GEOINT NON-Export Controlled Information
(Exp1)
Confusion Matrix Comparison
MSM
Rahmes, et al
ENVIUS | ENVI Classified User Symposium79S7582 - Deep Learning Applications Across the Breadth of GEOINT NON-Export Controlled Information
ImageLinks Experiment
Done for proof-of-concept only
- Models never improved or optimized after first results
- No augmentation
- Not tested on other datasets
- Would be good to see how well this model generalizes
- The impact of training set balancing (bias) on independent test area
Why Pansharp first? Use each dataset in original form
- Didn’t do the experiment to compare results, but this makes sense!
- Pansharp algorithm will alter the data and potentially introduce artifacts
- NN will learn relationships b/w the Pan and MSI inputs
All models did very well (about 94% for validation set accuracy)
- Exp1 (i.e., no spectrum) had best confusion matrix on held out test set
- Why is use of spectrum not helping?
- 4-band spectrum is too weak of a contributor to embedding vector?
- Spectral data poorly calibrated thus spectral quality not great?
- Too much spectral variability w/in class?
- Spectral signal adequately learned through 4-band spatial data?
ENVIUS | ENVI Classified User Symposium80S7582 - Deep Learning Applications Across the Breadth of GEOINT NON-Export Controlled Information
Mombassa Road Extraction
Caffe / CNN Model Architecture and Params
Pan Streams
Conv-1: 40, 3x3, stride 1 + ReLU + LRN + Max Pool (2x2, stride 2)
Conv-2: 50, 5x5, stride 1 + ReLU + LRN + Max Pool (2x2, stride 2)
CIR Streams*
Conv-1: 64, 12x12, stride 4 + ReLU + LRN + Max Pool (3x3, stride 1)
Conv-2: 112, 4x4, stride 1 + ReLU + LRN + Max Pool (2x2, stride 2)
Conv-3: 112, 3x3, stride 1 + ReLU + LRN
FC Layers
4096 + ReLU + Dropout (0.5)
500 + ReLU + Dropout (0.5)
2
All weight inits with xavier / all bias init as constant
LRN: across channels, local size 5, alpha 1, beta 5
Base LR: 0.01, policy: “inv”
Momentum: 0.9
Pre-Processing
• calibrated to at-sensor radiance > linear stretch to 8-bit
*Architecture based on Vlad Mnih PhD dissertation, U Toronto, 2013
Weight Decay: 0.0005
Gamma: 0.001
Power: 0.75
Mini batch size: 128
ENVIUS | ENVI Classified User Symposium81S7582 - Deep Learning Applications Across the Breadth of GEOINT NON-Export Controlled Information
Mombassa Road Extraction
Results from 4-stream and 2-Stream* models nearly identical in performance:
Stream 1: 32-pix Pan patch at 0.5 m/pix (16x16 m coverage)*
Stream 2: 50-pix Pan patch at 1 m/pix (50x50 m coverage)
Stream 3: 50-pix CIR patch at 2 m/pix (100x100 m coverage)*
Stream 4: 50-pix CIR patch at 4 m/pix (200x200 m coverage)
Best validation accuracy 4-stream: 99.39% @ iter 738kEER = 2.85 | AUC = 99.4
Sensitivity (Recall) = 95.5
Specificity (Precision) = 98.7
Best validation accuracy 2-stream: 98.69% @ iter 910kEER = 2.46 | AUC = 99.6
Sensitivity (Recall) = 95.5
Specificity (Precision) = 98
Proof-of-Concept Model Only
• Model was not refined w/improved training data or optimal arch/params
• Model not applied to other data
prediction = POS True Positive False Positive
predicion = NEG False Negative True Negative
189 54
9 3983
prediction = POS True Positive False Positive
predicion = NEG False Negative True Negative
189 82
9 3955