S7582 - DEEP LEARNING APPLICATIONS ACROSS THE BREADTH...

S7582 - DEEP LEARNING APPLICATIONS ACROSS THE BREADTH OF GEOINT

WILL RORRER

PROGRAM MANAGER

DAVID GORODETZKY

RESEARCH LEAD FOR REMOTE SENSING AND MACHINE LEARNING

NVIDIA GPU Technology Conference08 May 2017

ENVIUS | ENVI Classified User Symposium2S7582 - Deep Learning Applications Across the Breadth of GEOINT NON-Export Controlled Information

• Harris Corporation Introduction• What Is Deep Learning to Harris?• Harris Deep Learning Solutions And Research

• Label Data Burden

• Automatic Target / Feature Detection

• 2D Overhead Satellite Imagery

• Forward Facing Drone Imagery

• Determine Road Conditions

• High Temporal Resolution / Frequent Revisit Data

• Airfield Detection And Activity

• Airport Monitoring

• LIDAR Data

• Multiple Stream

• Point Cloud 3D Object Detection

• Land Use Classification

• Road Classification

• Q & A

Agenda


Segment overviews

Space and Intelligence Systems

$1.9BComplete Earth observation, weather, geospatial,

space protection and intelligence solutions from

advanced sensors, antennas and payloads, as well as

ground processing and information analytics

Electronic Systems

$2.1BExtensive portfolio serving the defense industry with

electronic warfare, avionics, robotics, advanced

communications and maritime systems as well as air

traffic management solutions for the civil aviation industry

Communication Systems $1.9BTactical and airborne radios, night vision technology,

and defense and public safety networks

Pro forma revenue FY16


Harris Corp. has been internally investing in taking

state-of-the-art deep learning technologies and

applying them to remote sensing and geospatial

intelligence customer problems

How long?

• Harris has been working on deep learning for five years

How much?

• Multimillion dollar internal research and development

investment in the last three years

• Additional commercialization investment

Investment Approach

• Research

• Pilot Projects

• Software tool development

• Commercialization

Focus Areas:

Reducing the Cost of Training

• Reduce dollar cost, human cost, and

computer cost of building new models

Extensibility

• Ability to quickly redeploy and

repackage tools to support new problem

sets

Support multiple types of data

• New sensors and data fusion

Automation

• Ability for processes to interface to tools,

removing human from the loop

Deep Learning at Harris


• Harris Corporation Introduction• What Is Deep Learning To Harris?• Harris Deep Learning Solutions and Research









• LIDAR Data

• Multiple Stream




• Q & A

Agenda


1.2M training images

1,000 object categories

2016 ImageNet Challenge

Object Classification Winner:

Trimps-Soushen

2.99% Error Rate

Label Data Burden Reduction Background

Graph from www.image-net.org

http://www.image-net.org/


Labeled Data is the Currency of ML

Real world problems don’t have an ImageNet for training.

How do we overcome the burden of labeled data?

Do More With Less

• Pretrain the network and Transfer Learn where possible

• Enforce Domain Adaptation if necessary

• Use synthetic data from models or simulators

• Use Curriculum Learning and Dynamically Balanced datasets

Use Unsupervised and Active Learning

• Use a deep embedding to optimize clustering of unlabeled data

• Iterate training of weakly supervised models to progressively improve training set

• Use Convolutional Autoencoders or GANs to generate embedding from unlabeled data

• Use Siamese NN to find similar labels or extract an embedding

Auto-ML• Automated methods to search for optimal model hyperparameters


DIRSIG

10 am, 7 degree look angle, Jan 1, Scene

Azimuth 0

2 pm, 7 degree look angle, Jan 1, Scene

Azimuth 225

Use of Synthetic Training Data

• 100% of training data synthesized using CAD models and Scene Simulator

• The trained model is applied to real imagery

• Successful detector produced for fighter jets in WV-2 Pan imagery

• Limiting factors: (1) Content of scene generator (2) Quality of Simulation

6 CAD models used

Objects placed in scene at

various geometries

Heat Map for fighter jets in IKONOS

Pan Imagery


Auto-Clustering in Deep Embedding Space

+Drop

365

fc7

Deep Embedding from VGG-16 trained on Places365 scenes used to cluster mixed

RGB imagery into like labels

VGG-16$

* Places: An Image Database for Deep Scene Understanding, B. Zhou, A. Khosla, A. Lapedriza, A. Torralba and A. Oliva arXiv:1610.02055$ Very Deep Convolutional Networks for Large-Scale Image Recognition K. Simonyan, A. Zisserman arXiv:1409.1556

*

Unlabeled, mixed RGB

images

Roads

Golf, Baseball

Harbors,

Parking Lots


Improvements to Conv-Autoencoder

One of the few unsupervised methods for training CNNs.

Can they be as discriminative as supervised training?

• When conv-autoencoder is applied to satellite imagery tend to learn simple convolutional features

• Far less effective for transfer learning then supervised classification CNNs

output predicts input

VGG16 (ImageNet training: 1.2M samples, 1000 classes), Conv-0

Typical “Winner-Take-All” Conv-AE on Satellite RGB imagery – poor Conv-0 structure

Improvements in Low-level Conv Filters

Improved Harris Conv-AE on Satellite RGB imagery – better Conv-0 structure, closer to VGG16Mods: Train all layers together, Enforce Sparsity, Selectively block gradient


• Harris Corporation Introduction• What is Deep Learning to Harris• Harris Deep Learning Solutions and Research









• LIDAR Data

• Multiple Stream




• Q & A

Agenda


• Near ceiling performance in Pan, RGB, MSI

• Robust against occlusions, orientation, image quality

Sample Targets Tested:

o Airplanes

o Storage Tanks

o Sports Stadiums

o Athletic Fields

o Smokestacks

o Cooling Towers

o Clouds

o Crosswalks

o Swimming Pools

o Buildings

o Paved Roads

o Overpasses/Cloverleafs

o Tollbooths

o Beaches

o Cemeteries

o Wind Turbines

o Orchards & Row Crops

Tokyo Int’l Airport IKONOS Pan

20/20 Planes detected, 1 FP

Automatic Target / Feature Detection – 2D Overhead

Sau Paulo, Brazil 30cm WV-3 © 2010 DigitalGlobe, Inc.

43/43 crosswalks detected, 0 FP© 2010 DigitalGlobe, Inc.


Automatic Feature Detection - Wind Turbine Blade Inspection

Which wind turbines need repair in this wind farm?

• Drone Captured Data

• Thousands of raw images organized

• Deep learning analyzed

• Damage identified and categorized

• Prioritize repair needs and budget

• Change detection through time

• Turbine lifecycle management

Copyright © 2016 EdgeData, LLC

http://edgedata.net/bladeedge/

http://edgedata.net/bladeedge/


A Closer Look – BladeEdge’s Dashboard


Co-registered mosaic


A Closer Look – BladeEdge’s Dashboard


Co-registered mosaic

Co-Registered Mosaic



20/20 Planes detected, 1 FP

• Real-time ground weather intelligence system

• Data from traffic-cam videos• Scene detection instead of target detection

Is it raining? Are the roads wet? Is snow present?

Dramatic performance improvement

p(Wet) AUC = 98.9%

In use today in operational

commercial product

https://www.helios.earth/

Wet

Dry

Traffic-Cam / Road Conditions

https://www.helios.earth/


Helios for Autonomous Vehicles

Taking Harris Analytics to Connected Cars at the Consumer Electronics Show

https://www.harris.com/impact/2016/12/taking-harris-analytics-to-connected-cars-at-the-consumer-electronics-show











• LIDAR Data

• Multiple Stream




• Q & A

Agenda


Problem: Clandestine airfields in South American countries used for illegal narcotic trafficking

Goal: Detect new airfields and determine activity levels with high temporal resolution data ( Planet )

Case Study – Clandestine Airfields


Leverage Harris Geospatial expertise in human geography and open source analysis to identify known airfields for training, tipping and queuing and validation

Compare and collect high revisit data over the areas of interest, tipped by human geo information

Explore multiple network architectures for optimal signature recognition

Preliminary results encouraging

Sources: • Planet Imagery• DigitalGlobe EGD Imagery• Ecuadorean geoportal shapefile of known remote landing strips• Google Earth Imagery• Wikimapia• OpenStreetMap

Case Study – Clandestine Airfields

Temporal ConvNet Temporal LSTM Temporal GRU

• Temporal ConvNet

• Temporal LSTM

• Temporal GRU


Case Study – Airport Traffic Monitoring

Goal – Identify broad trends in air traffic

• Deep Learning a critical building block

• Object ID: Find airplanes on tarmac

• Combined with other methods to make monitoring feasible

• Too time consuming for analysts to look at every image

• Need a way to compress information

• Aggregation of multiple detection algorithm results over time to identify patterns and anomalies in behaviour

• Applicable to many similar monitoring problems

Test Case:

• Data: WorldView-3 Pan

• Airport: London Southend

• Multiple observations over several

months

Object Detection

• LeNet-style 5-layer model converted

for fully convolutional operation

• Detects all large airplanes


Dashboard: Collects detection results for each observation


Collecting Training Data for Meta-Model

Observational

Data

@time n

Object Detection Algorithms

Dashboard

Analyst “label” for observation n

me

tad

ata

(d

ate

, tim

e, lo

catio

n)

image sample

(ch

ips o

r d

ow

nsiz

ed im

ag

e)

new training sample for decision

support model is collected each

time an analyst looks at

dashboard and assigns a label,

e.g.,

• Condition normal

• Emergency response

• Snowstorm

• Flood

• Power outage

• Traffic jam

• Suspicious activity











• LIDAR Data

• Multiple Stream




• Q & A

Agenda


LiDAR Point Cloud is Dense with Information

LiDAR sensors have very powerful discriminatory capabilities.

Can DL models learn to recognize objects and terrain types?

Many Challenges

• Raw 3D point clouds are difficult to use compared to images.

• Typically very large and spatially dense, yet sparse in Z-dim.

• Complementary to imagery, could be used together if available.

• Harder to label

• By individual point, or 3D bounding box?

• Software tools less developed (or don’t exist).

• Not much DL research on airborne LiDAR classification.

3D point cloud color-coded by height


LiDAR in ConvNets

Data must be transformed into ConvNet-consumable form.

How to Preserve LiDAR context in space?

• Satellite imagery is almost always nadir.

• Natural Images are usually from human perspective.

• 3D point cloud is disconnected from sensor location.

Geiger-Mode LiDAR point cloud

Voxel Grid Overlay

Voxel map extracted from point cloud w/discretized 3D grid:

• Binary (is there a return from this region?)

• Aggregate ( relative density of returns in region)

• Multichannel (density + intensity, rgb, other stats)


Multi-Planar Reprojection (MPR) for LiDAR

2D ConvNet Model• Early approach to object detection in 3D point clouds• Very successful, but somewhat limited applicability

Multi-Planar Reconstruction (MPR)• Flatten 3D Voxel Map into 2D views• 2D views capture mean density across collapsed dim• Some 3D signal is retained – like an X-Ray• Number and orientation of planes are configurable• Commonly use orthogonal planes: [XY, XZ, YZ]• Some data amenable to augmentation via rotation about Z-Axis

• Equivalent to using rotated planes• Increases number of observed orientations

Railroad Signal

Voxel Map


MPR Model Data Flow

High-density point cloudapprox 2000-4000 pts/m2

5 cm voxel gridmean intensity/voxel[80 x 80 x 350]

MPR Views

XZ YZ

XY

Lowest 50 cm “ground” data are ignored

Target Object : Railroad Signals

• Volumetric Sample (ROI): 4m x 4m x 17.5 m

• External Sliding Window for Test Volume

European railroad asset inventory project

• High density terrestrial LiDAR from moving railcar

Sig

na

l

Conv 2

Conv 1

Conv 2

Conv 1

Fu

lly-C

onnecte

d

Not-S

ignal

Conv 1

Con

v 2


RR Signals Heat Map

• Extension of 2D ConvNet to 3D data source

• Finds variety of 3D objects (signals, crossings, boxes, poles)

• Ongoing model development for customer

• 2nd Generation Model

• 3D ConvNet

• Faster and more flexible

• Less sensitive to location and size of objects

Heat Map

(all detections shown)

Full point cloud(held-out test area)

Test region

~ 200m x 130m

Test Target: Railroad

Signals


Large-Scale Point Cloud Classification Benchmark

• Public domain challenge dataset (http://www.semantic3d.net/)

• 15 urban and suburban scenes with point-level labelling

• Data includes intensity and RGB values

• Highly imbalanced classes

• Some ambiguity between classes

• Submissions judged on Intersection-over-Union and Overall Accuracy

Classes:

1. man-made terrain

2. natural terrain

3. high vegetation

4. low vegetation

5. buildings

6. hard scape

7. scanning artifacts

8. cars

Example scene from challenge dataset


Evaluation of Initial LiDAR Models

Data used internally to test performance of early 2D and 3D ConvNet models


Current Leaderboard

HarrisNet model was top of leaderboard at time of Submission (Oct’16)

• Multi-scale 3D ConvNet of Voxel Neighborhood Metrics

• Participation was halted after initial submission

man

made

terrain

natural

terrain hi-veg lo-veg bldg hardscape artifacts cars


Geigermode LIDAR Applications

• Bare ground point labeling

• Improvements to base terrain classifications

• Utility Infrastructure (transmission, distribution, and wires)

• Building footprints

• Building rooftop geometry

DL used as building blocks for developing advanced products.

Utility, Insurance, Municipal Mapping, Urban Planning.


New Class of Models Under Development

Multi-scale Fully Convolutional Segmentation

Best of U-Net and DenseNet

• Multi-scale feature representation that can

be used for fine grained predictions

• Inspired by work from Ronneberger et al.

(2015) and Jégou et al. (2016)

• Convolutional feature extraction trunk

• Scale-specific predictors upsampled or

deconvolved to yield better resolution

• Data from earlier layers carried forward

O. Ronneberger, P. Fischer, and T. Brox, “U-net: Convolutional networks for biomedical image segmentation,” in International Conference

on Medical Image Computing and Computer-Assisted Intervention (MICAI), 2015. (https://arxiv.org/pdf/1505.04597.pdf)

S. Jégou, M. Drozdzal, D. Vazquez, A. Romero, and Y. Bengio, “The One Hundred Layers Tiramisu: Fully Convolutional DenseNets for

Semantic Segmentation,” CVPR, 2016. (https://arxiv.org/pdf/1611.09326.pdf)


Promising results from new models

Multi-scale Fully Convolutional Segmentation Results

Bare Ground point

classification in GM-LiDAR

Building Footprints

in WV-3 imagery

(SpaceNet)

TP

TN

FP

FN

Contingency Matrix











• LIDAR Data

• Multiple Stream




• Q & A

Agenda


Multiple Data Inputs & Conv Features

What’s the best way to handle multiple data inputs?

1. different data dimensions

• input 1 = [200, 200, 3] Panchromatic mage

• input 2 = [50, 50] Multispectral image

2. different sensor modalities

• input 1 = Multispectral image

• input 2 = LiDAR points (or derived image product)

• input 3 = SAR image

3. disparate data sources

• input 1 = natural image

• input 2 = audiogram

• input 3 = RADON transform

MSI/HSI

SAR

TIME

SERIES

LiDAR

PAN

RADON

AUDIO


Multiple Data Inputs & Conv Features

• Will resizing change the dataset’s features?

• Padding will increase the memory requirements and waste model capacity.

• Will combining data into a single input reduce or enhance the quality of the features?

• Could muddy the waters (blur or average-out any single inputs features)?

• Or, each feature map could end up being specific to only one input channel?

• Deeper into the model the features could be combined in unique and potentially powerful ways.

• Perhaps more likely when the sensor or data sources are similar?

Alternate Approach: Keep the inputs separate during the Conv feature extraction stage.

Input 2

Input 1

Stacked into single

input with 2 channels

Input 1

Input 2

Input 2

Input 1

or


Multi-Stream Model Architecture

Class 1

Fully-Connected Layer

Input 1 Input 2 Input 3 Input 4

Fully-Connected Layer

Class 2 Class 3

Individual

Convolutional

Feature

Extractions

A very flexible model architecture

• Enables natural data fusion – all data input in original form

• Maximizes feature extraction from complimentary data independently

• Effective way of providing critical context for targeted task

“Late” Data Fusion


Image and Derived Data in Multi-Stream

Conv250 filters / Kernel 5 / Stride 1

Rect-Linear / Pool 2



Fully-Connected 11000 units / Rect-Linear / Dropout

Fully-Connected 2500 units / Rect-Linear / Dropout









PanRadon Transform

Log Power Spectrum

Airplane Not-airplane

Data Fusion for Optimal Detector Performance with Minimal Training Data• Derived datasets have different failure modes (precision/recall performance)

• Combining all 3 into single model yields most robust classifier

• Effective means of defeating most persistent false positivesSuccessful proof-of-concept for

fusing disparate data types.


Heat Map Improvements w/Multi-Stream


Persistent false positives overcome with use of derived

data in multi-stream model with highly limited training data

Multi-Stream Inputs Padded & Stacked Inputs


Multiple 2D Density Maps from LiDAR Point Cloud

XZ YZ

XY

Voxel Map density collapsed

onto 3 orthogonal planes

Not-T

arg

etC

onv2

50 filte

rs / K

ern

el 5

/ Strid

e 1

Rect-L

inear / P

ool 2

Conv1

40 filte

rs / K

ern

el 5

/ Strid

e 2

Rect-L

inear / P

ool 2

Conv2

50 filte

rs / K

ern

el 5

/ Strid

e 1

Rect-L

inear / P

ool 2

Conv1

40 filte

rs / K

ern

el 5

/ Strid

e 2

Rect-L

ine

ar / P

oo

l 2

Fully

-Connecte

d1000 u

nits

/ Rect-L

inear / D

ropout

Targ

et

Con

v1

40 filte

rs / K

ern

el 5

/ Strid

e 2

Rect-L

inear / P

ool 2

Conv2

50

filters

/ Ke

rne

l 5 / S

tride

1

Rect-L

inear / P

ool 2

RR-Signals or other

object in 3D Point Cloud

2D Views

Fully

-Connecte

d500 u

nits

/ Rect-L

inear / D

ropout


Multi-Stream for Foveal Vision Models

Foveal (central vision)

Hi-Res (fine acuity)

16 m scale @ 0.5 m GSD

object identity is ambiguous

Context (peripheral vision)

Low-Res

128 m scale @ 2m GSD

object identity is clear

Adding context almost always improves vision performance.

Need both scales in

many cases to make

correct object ID


Foveal Multi-Stream for Land Use Classification

HARRIS ImageLinks, 8-bit, 4-band WV-2, Melbourne, FL Analyst-marked Ground Truth (5-class)

Bare Ground

Water

Veg

Buildings

Transportation

Worldview2 4-Band Multispectral Imagery


Multiple Input Arrangements Investigated

Best practices spectral mapping / (pansharp > QUAC > SAM)

Bldg: 14% Improvement over benchmark*

Trans: 25% Improvement over benchmark

Landcover Classification Results

Multi-stream ConvNet

Confusion Matrix

*Game-theoretic combination of 4 algorithms w/CRF post-processing.

Optimizing Supervised Learning for Pixel Labeling and Material Classification, Rahmes, et al 2014.


Mapping Roads in Mombassa w/Foveal Multi-stream

WV-2 Dataset

• 8-band MSI (2m/pix)

• Pan (0.5 m/pix)

• ~10k pix DIM

Complicated Road Networks

• Highly variable in

appearance, materials,

markings, size

• Dense, sprawling

development creates many

obstructions

No labeled data


Training Data Preparation

No Training (labeled) Data Provided

• Implemented a semi-automated approach

• Manually digitize approximate polyline vectors of major roads

• Define sampling distance (16-pixels in ½ m Pan)

• Extract “POS” road image patches along road vectors

• Compute Buffer Zone Image for road polylines

• Extract ‘NEG” non-road image patches at intersection of buffer zone and

sampling grid

Training data is sloppy

• Smaller roads in NEG

• Initial classifier is weak

• Iterate to improve training

1. Use initial model to find high

probability POS/NEG results

2. Update training data

3. Retrain…

25-pix road buffer (blue) 16-pix sampling grid (red)

375k NEG / 15k POS

POS Aug 10x via rot to 150k


Multi-Scale Foveal Model & Results

4-Input Data Streams

• Progressive scales of observation

• Precise co-registration not required

• Spectrum provided as separate input

Road networks in Mombassa• R0: Detected roads overlaid on TrueColor RGB

Foveal Vision Model


Questions?

Will RorrerMachine Learning Business [email protected]

Look for upcoming blog posts about Machine Learning at Harris’ Blog website:

http://www.harrisgeospatial.com/Company/PressRoom/Blogs.aspx

David GorodetzkyResearch Lead for Remote Sensing and Machine [email protected]

Trademarks are registered marks of their respective companies.

http://www.harrisgeospatial.com/Company/PressRoom/Blogs.aspx


Backup Material


Technology to Connect, Inform and ProtectTM

Advanced technologies for customers whose missions

are vital to the world’s safety and security

Technology innovator with industry leading

commitment to research and development

Agile, commercial mindset to meet the most

demanding budgets and deadlines

The result: innovation with a purpose and missions

that succeed

Our promise is simple: to connect, inform and protect


About Harris

NYSE: HRS

17,000 employees worldwide

7,700 engineers and scientists

Supports customers in more

than 100 countries

Industry leading R&D investment

Broad portfolio, strong franchises, key experts, and

a proven track record of innovation and success

*Pro forma

United States

International

FY16

~$6 Billion

Annual

Revenue*


Harris Geospatial Solutions

From sensors and software to actionable information,

our solutions and products help you make informed

decisions – when and where they are needed.

• Off-the-shelf and custom solutions for advanced geospatial

analysis

• Tools for hosting and managing distributed data processing in a

high-performance computing environment

• An online marketplace that puts the best commercially available

geospatial data and imagery at your fingers.

• Innovative Geiger-mode LiDAR sensor to collect data faster and

at 10x the resolution of traditional LiDAR.

• Value-added services (VAS) that give you actionable information.

Geospatial

Marketplace

Geiger-mode

LiDAR

Custom

Services

Jagwire

ENVI

IDL


Input image patch (pixels)

What is Deep Learning to Harris?


Object Detection Automatic Language Translation

Driverless CarsCaption Generation

Social Media & Online Shopping

Deep Learning Applications











• LIDAR Data

• Multiple Stream




• Q & A

Agenda


Planet Data Deep Stack –15 July 2016 - 3 Jan 2017


Harvest Training Data


Data Set Collection and Annotation

Date May 2 ‘16 July 15 ‘16 Aug 2 ‘16 Oct 11’16 Jan 3 ‘17

Scene

Annotation

Activity No Airfield Airfield

Operational

Airfield

“Cratered”

Airfield

Operational

Airfield

“Cratered”


Data Augmentation (flips and 90 degree rotations)

Original Scene: Horizontal Flip:

Vertical Flip: Rotation, 90 deg: Rotation, 90 deg + vertical flip:

- Airfield (Cratered or Operational)

- No Airfield


Network Architectures

ConvNet

Input

ConvNet

Input

ConvNet

Input

ConvNet

Input

ConvNet

Input

Output

Network

Time

3 Network Architectures Explored:

• Temporal ConvNet (Dense)• ConvNet Parameters are shared,

extension of Siamese

• 3 Layers of VGG16

• Temporal LSTM

• Memory unit based

• GRU

• Memory based

• More efficient that LSTM


Preliminary Results - Heatmaps

Architecture 1: Temporal ConvNet

Architecture 2: Temporal LSTM

Architecture 2: GRU

Next Steps:

• Need more positive samples (Annotate

more data)

• There is a lot of room to fine tune model

parameters (both the feature extraction

block and the memory blocks)

• DenseNet/ResNet–style skip connections

may prove beneficial

• Incorporate metadata into decision loop

(time since last frame, time of the day, etc.)


Results

Accuracy,

Overall

Accuracy,

Positive

Accuracy,

Negative

ConvNet 0.62 0.28 0.97

LSTM 0.93 0.89 0.96

GRU 0.91 0.87 0.94


Results

Test Case:

• Data: WorldView-3 Pan

• Airport: London Southend

• Multiple observations over several months

Object Detection

• LeNet-style 5-layer model converted for fully convolutional operation

• Detects all large airplanes











• LIDAR Data

• Multiple Stream




• Q & A

Agenda


Ground Truth5 Classes

Bare Ground

Water

Veg

Buildings

Transportation


Confusion Between Classes

Existing classifiers had trouble with class confusion:

Buildings v. Transportation

• Spectrally similar (made from same materials in some cases)

• Textures can be similar

• Scale of observation is critical to interpretation

• Provides aspect ratio, shape, size info

Veg v. Bare Ground (less problematic but still significant)

• Spectrally similar when veg is unhealthy, senescent, or data poorly calibrated

• Object shape is often not diagnostic

• Small plants hard to resolve spatially (leading to similar textures)

Use multi-stream inputs at progressively lower resolution to provide context


ImageLinks Model Architectures

water

FC1 (4096)

spectrum 64-pix

4-band

0.5m

neigh

128 pix

4-band

1m

neigh

FC2 (500)

bare bldg transveg

4

Experiment w/inserting 4-band spectrum into embedding at various depths

Dropout @ 0.5

Dropout @ 0.5


ImageLinks Model Architectures

A variety of architectures were evaluated for a proof-of-concept using 4-band pan-sharpened imagery:

Exp1

Stream 1: 16-pixel neighborhood [4x16x16]

Stream 2: 64-pixel neighborhood [4x64x64]

Stream 3: 256 pixel neighborhood resampled to 1m GSD [4x128x128]

Exp2

Stream 1: spectrum of central pixel introduced at FC1 (4096)

Stream 2/3: same as above

Exp3

Stream 1: spectrum of central pixel introduced at FC2 (500)


Exp4

Stream 1: spectrum of central pixel introduced after FC2 (500) <after dropout>



Caffe ConvNet Architecture Details

Stream 1 (16-pix neighborhood @ 0.5 m GSD)

Conv-1: 40, 3x3, stride 1 + ReLU + LRN + Max Pool (2x2, stride 2)


Stream 2 (64-pix neighborhood @ 0.5 m GSD)

Identical to Stream 1

Stream 3 (128-pix neighborhood resampled to 1m)*


Conv-2: 112, 4x4, stride 1 + ReLU + LRN + Max Pool (2x2, stride 2)*

Conv-3: 112, 3x3, stride 1 + ReLU + LRN

FC Layers

4096 + ReLU + Dropout (0.5)


5

*Based on Vlad Mnih, PhD Dissertation, U Toronto, 2013


ImageLinks Data Prep

8-bit 4-band WV-2 Data [7900 x 5700]

Pansharpened using Gram-Schmidt Sharpening (4-band output)

Mean-centered

No augmentation

Random uniform sampling across 75% of image at 100k locations

- Approx 0.2 % of image sampled for training

- Training data were unbalanced

- Proportion approx. matched distribution in image

85k for training, 5k for validation, and 10k for test

Compared to “SAM Baseline”

- Best practices spectral classification

- Gram-Schmidt Sharpening

- QUAC atmospheric correction (8-bit data is not radiometrically calibrated)

- SAM using spectral means from training data

Compared to Game-theoretic competition among 4 supervised classifiers

- Optimizing Supervised Learning for Pixel Labeling and Material Classification, Rahmes, et al 2014.


Ground Truth

Multi-Stream

Model (MSM)

SAM “Baseline”(GS-Sharp > QUAC)

Full Scene Results

*Rahmes, et al Classification Image Not Available


Ground Truth

MSM

SAM(confusion in Bldg , Trans, Bare Ground)

Significant Improvement in Confused Classes


(Exp1)

Confusion Matrix Comparison

MSM

Rahmes, et al


ImageLinks Experiment

Done for proof-of-concept only

- Models never improved or optimized after first results

- No augmentation

- Not tested on other datasets

- Would be good to see how well this model generalizes

- The impact of training set balancing (bias) on independent test area

Why Pansharp first? Use each dataset in original form

- Didn’t do the experiment to compare results, but this makes sense!

- Pansharp algorithm will alter the data and potentially introduce artifacts

- NN will learn relationships b/w the Pan and MSI inputs

All models did very well (about 94% for validation set accuracy)

- Exp1 (i.e., no spectrum) had best confusion matrix on held out test set

- Why is use of spectrum not helping?

- 4-band spectrum is too weak of a contributor to embedding vector?

- Spectral data poorly calibrated thus spectral quality not great?

- Too much spectral variability w/in class?

- Spectral signal adequately learned through 4-band spatial data?


Mombassa Road Extraction

Caffe / CNN Model Architecture and Params

Pan Streams



CIR Streams*



Conv-3: 112, 3x3, stride 1 + ReLU + LRN

FC Layers



2

All weight inits with xavier / all bias init as constant

LRN: across channels, local size 5, alpha 1, beta 5

Base LR: 0.01, policy: “inv”

Momentum: 0.9

Pre-Processing

• calibrated to at-sensor radiance > linear stretch to 8-bit

*Architecture based on Vlad Mnih PhD dissertation, U Toronto, 2013

Weight Decay: 0.0005

Gamma: 0.001

Power: 0.75

Mini batch size: 128


Mombassa Road Extraction

Results from 4-stream and 2-Stream* models nearly identical in performance:

Stream 1: 32-pix Pan patch at 0.5 m/pix (16x16 m coverage)*

Stream 2: 50-pix Pan patch at 1 m/pix (50x50 m coverage)

Stream 3: 50-pix CIR patch at 2 m/pix (100x100 m coverage)*

Stream 4: 50-pix CIR patch at 4 m/pix (200x200 m coverage)

Best validation accuracy 4-stream: 99.39% @ iter 738kEER = 2.85 | AUC = 99.4

Sensitivity (Recall) = 95.5

Specificity (Precision) = 98.7

Best validation accuracy 2-stream: 98.69% @ iter 910kEER = 2.46 | AUC = 99.6

Sensitivity (Recall) = 95.5

Specificity (Precision) = 98

Proof-of-Concept Model Only

• Model was not refined w/improved training data or optimal arch/params

• Model not applied to other data

prediction = POS True Positive False Positive

predicion = NEG False Negative True Negative

189 54

9 3983

prediction = POS True Positive False Positive

predicion = NEG False Negative True Negative

189 82

9 3955

S7582 - DEEP LEARNING APPLICATIONS ACROSS THE BREADTH...

Documents

Transcript of S7582 - DEEP LEARNING APPLICATIONS ACROSS THE BREADTH...