Machine Learning and Ecosystem Informatics: Challenges and...

62
Machine Learning and Ecosystem Informatics: Challenges and Opportunities Challenges and Opportunities Tom Dietterich School of Electrical Engineering and Computer Science Oregon State University http://web.engr.oregonstate.edu/~tgd [email protected]

Transcript of Machine Learning and Ecosystem Informatics: Challenges and...

Page 1: Machine Learning and Ecosystem Informatics: Challenges and …web.engr.oregonstate.edu/~tgd/talks/acml-2009.pdf · 2009-11-07 · Hierarchical Modeling and Inference in Ecology: The

Machine Learning and gEcosystem Informatics:

Challenges and OpportunitiesChallenges and Opportunities

Tom Dietterich

School of Electrical Engineering and Computer ScienceOregon State University

http://web.engr.oregonstate.edu/~tgdp g g g

[email protected]

Page 2: Machine Learning and Ecosystem Informatics: Challenges and …web.engr.oregonstate.edu/~tgd/talks/acml-2009.pdf · 2009-11-07 · Hierarchical Modeling and Inference in Ecology: The

Threats to the BiospherepPollution including Greenhouse GasesPollution including Greenhouse Gases Habitat Loss and FragmentationHabitat Loss and Fragmentation

OverOver--HarvestingHarvesting

ACML 2009 2211/7/2009

Page 3: Machine Learning and Ecosystem Informatics: Challenges and …web.engr.oregonstate.edu/~tgd/talks/acml-2009.pdf · 2009-11-07 · Hierarchical Modeling and Inference in Ecology: The

Lack of Scientific Knowledgeg

Our understanding of ecosystem structure andOur understanding of ecosystem structure and function is poor

Extremely complex interactionsOperate at many temporal and spatial scalesHard to do controlled experimentsI ibl t b iti l t tImpossible to observe critical past events

Long record of policy failures: “Ecological Surprises”Surprises

Doak et al. Ecology 39(4), 2008.“Surprises are common and extreme”Surprises are common and extreme

ACML 2009 311/7/2009

Page 4: Machine Learning and Ecosystem Informatics: Challenges and …web.engr.oregonstate.edu/~tgd/talks/acml-2009.pdf · 2009-11-07 · Hierarchical Modeling and Inference in Ecology: The

A Limiting Factor: Ecological DataEcological Data

Many ecological simulation models areMany ecological simulation models are based on little or no dataHistorical time series only extend back 100Historical time series only extend back 100 yearsF l t ll i th i l tiFor almost all species, their location, population size, and interactions are

b dunobserved

ACML 2009 411/7/2009

Page 5: Machine Learning and Ecosystem Informatics: Challenges and …web.engr.oregonstate.edu/~tgd/talks/acml-2009.pdf · 2009-11-07 · Hierarchical Modeling and Inference in Ecology: The

Transforming Ecosystem Scienceg y

od C

olle

ctio

nod

Col

lect

ion

OS

Arth

ropo

OS

Arth

ropo

rSco

perS

cope

HypothesisHypothesisDrivenDriven

DataDataDrivenDriven

Past approachesNaturalists: museum collectionsArtificial ecosystems (test tubes; barrels) Jo

n C

hase

Jon

Cha

se Sen

soS

enso

Artificial ecosystems (test tubes; barrels)Isotope tagging of fluxes

Emerging approachesIn-situ sensor networks

JJ

Radio/RFID tagging and tracking of organismsRadar ornithologyR t iRemote sensing

ACML 2009 511/7/2009

Page 6: Machine Learning and Ecosystem Informatics: Challenges and …web.engr.oregonstate.edu/~tgd/talks/acml-2009.pdf · 2009-11-07 · Hierarchical Modeling and Inference in Ecology: The

DataPi li

SensorSensorPlacementPlacementPipeline

DataDataCollectionCollection

PlacementPlacement

FeatureFeatureExtractionExtraction

CollectionCollection

ExtractionExtraction

DataDataCleaningCleaning

Model FittingModel Fitting

CleaningCleaning

PolicyPolicyOptimizationOptimization

ACML 2009 6

OptimizationOptimization

11/7/2009

Page 7: Machine Learning and Ecosystem Informatics: Challenges and …web.engr.oregonstate.edu/~tgd/talks/acml-2009.pdf · 2009-11-07 · Hierarchical Modeling and Inference in Ecology: The

DataPi li

SensorSensorPlacementPlacement Optimal Sensor PlacementOptimal Sensor Placement

PipelineDataData

CollectionCollection

PlacementPlacement

FeatureFeatureExtractionExtraction

CollectionCollection

ExtractionExtraction

DataDataCleaningCleaning

Model FittingModel Fitting

CleaningCleaning

PolicyPolicyOptimizationOptimization

ACML 2009 7

OptimizationOptimization

11/7/2009

Page 8: Machine Learning and Ecosystem Informatics: Challenges and …web.engr.oregonstate.edu/~tgd/talks/acml-2009.pdf · 2009-11-07 · Hierarchical Modeling and Inference in Ecology: The

Optimal Sensor Placement for Environmental Data CollectionEnvironmental Data Collection

ObjectivesObjectivesmaximize detection probabilityprobabilityimprove model accuracyaccuracyimprove causal understandinggimprove policy effectiveness

Leskovec et al, KDD2007Leskovec et al, KDD2007

ACML 2009 811/7/2009

Page 9: Machine Learning and Ecosystem Informatics: Challenges and …web.engr.oregonstate.edu/~tgd/talks/acml-2009.pdf · 2009-11-07 · Hierarchical Modeling and Inference in Ecology: The

DataPi li

SensorSensorPlacementPlacement Optimal Sensor PlacementOptimal Sensor Placement

PipelineDataData

CollectionCollection

PlacementPlacement

DetectabilityDetectabilityErrors / NoiseErrors / NoiseSampling BiasSampling Bias

FeatureFeatureExtractionExtraction

CollectionCollection Sampling BiasSampling Bias

ExtractionExtraction

DataDataCleaningCleaning

Model FittingModel Fitting

CleaningCleaning

PolicyPolicyOptimizationOptimization

ACML 2009 9

OptimizationOptimization

11/7/2009

Page 10: Machine Learning and Ecosystem Informatics: Challenges and …web.engr.oregonstate.edu/~tgd/talks/acml-2009.pdf · 2009-11-07 · Hierarchical Modeling and Inference in Ecology: The

Sampling Bias: ebird.orgp g g

Citizen science CardinalsCitizen science collected by amateur bird watchersStrong bias toward where people liveE li it d l fExplicit models of sampling bias

Phillips, Dudik, Elith, Graham, Lehmann, Leathwick, Ferrier: Sample Phillips, Dudik, Elith, Graham, Lehmann, Leathwick, Ferrier: Sample Selection Bias and PresenceSelection Bias and Presence--only Distribution models: implications for only Distribution models: implications for background and pseudobackground and pseudo--absence data. absence data. Ecological ApplicationsEcological Applications, 19(1), , 19(1), 181181--197 2009197 2009

ACML 2009 10

181181--197. 2009.197. 2009.

11/7/2009

Page 11: Machine Learning and Ecosystem Informatics: Challenges and …web.engr.oregonstate.edu/~tgd/talks/acml-2009.pdf · 2009-11-07 · Hierarchical Modeling and Inference in Ecology: The

Detectabilityy

Birds may be present but not detected by y p yobserverCoupled models of detectability and presence can be fit simultaneouslycan be fit simultaneouslyRoyle, Dorazio (2008). Hierarchical Modeling and Inference in Ecology: The Analysis of Data from Populations, Metapopulations and Communitiesand Communities.

t=1,…,Tt=1,…,T

presentpresentii observedobserveditit xoxoititxpxpii

ACML 2009 1111/7/2009

i=1,…,Mi=1,…,M

Page 12: Machine Learning and Ecosystem Informatics: Challenges and …web.engr.oregonstate.edu/~tgd/talks/acml-2009.pdf · 2009-11-07 · Hierarchical Modeling and Inference in Ecology: The

DataSensorSensor

PlacementPlacement Optimal Sensor PlacementOptimal Sensor PlacementDataPipeline

DataDataCollectionCollection

PlacementPlacement

DetectabilityDetectabilityErrors / NoiseErrors / NoiseSampling BiasSampling Bias

FeatureFeatureExtractionExtraction

CollectionCollection

Species classificationSpecies classificationRecognizing individualsRecognizing individualsT ki i di id lT ki i di id l

Sampling BiasSampling Bias

ExtractionExtraction

DataDataCleaningCleaning

Tracking individualsTracking individuals

Model FittingModel Fitting

CleaningCleaning

PolicyPolicyOptimizationOptimization

ACML 2009 12

OptimizationOptimization

11/7/2009

Page 13: Machine Learning and Ecosystem Informatics: Challenges and …web.engr.oregonstate.edu/~tgd/talks/acml-2009.pdf · 2009-11-07 · Hierarchical Modeling and Inference in Ecology: The

The BugID Project:Rapid Throughput Arthropod CountingRapid Throughput Arthropod CountingArthropods are a powerful data sourcesource

Found in virtually all environmentsstreams, lakes, oceans, soils, birds, mammals

Easy to collectProvide valuable information on ecosystem function

Consume the primary producers:Consume the primary producers: bacteria, fungi, plantsAre consumed by more charismatic organisms: birds, mammals, fish

Problem: Identification is time-Problem: Identification is timeconsuming and requires scarce expertiseSolution: Combine robotics, computer vision, and machine learning to automate classification and population counting

ACML 2009 1311/7/2009

Page 14: Machine Learning and Ecosystem Informatics: Challenges and …web.engr.oregonstate.edu/~tgd/talks/acml-2009.pdf · 2009-11-07 · Hierarchical Modeling and Inference in Ecology: The

DataPi li

SensorSensorPlacementPlacement Optimal Sensor PlacementOptimal Sensor Placement

PipelineDataData

CollectionCollection

PlacementPlacement

DetectabilityDetectabilityErrors / NoiseErrors / NoiseSampling BiasSampling Bias

FeatureFeatureExtractionExtraction

CollectionCollection

Species classificationSpecies classificationRecognizing individualsRecognizing individualsT ki i di id lT ki i di id l

Sampling BiasSampling Bias

ExtractionExtraction

DataDataCleaningCleaning

Tracking individualsTracking individuals

Sensor failuresSensor failuresNetworking failuresNetworking failuresRecognition errorsRecognition errors

Model FittingModel Fitting

CleaningCleaning Recognition errorsRecognition errors

PolicyPolicyOptimizationOptimization

ACML 2009 14

OptimizationOptimization

11/7/2009

Page 15: Machine Learning and Ecosystem Informatics: Challenges and …web.engr.oregonstate.edu/~tgd/talks/acml-2009.pdf · 2009-11-07 · Hierarchical Modeling and Inference in Ecology: The

Multi-Sensor Anomaly Detection[Dereszynski & Dietterich submitted][Dereszynski & Dietterich submitted][Dereszynski & Dietterich, submitted][Dereszynski & Dietterich, submitted]

15ACML 200911/7/2009

Page 16: Machine Learning and Ecosystem Informatics: Challenges and …web.engr.oregonstate.edu/~tgd/talks/acml-2009.pdf · 2009-11-07 · Hierarchical Modeling and Inference in Ecology: The

DataSensorSensor

PlacementPlacement Optimal Sensor PlacementOptimal Sensor PlacementDataPipeline

DataDataCollectionCollection

PlacementPlacement

DetectabilityDetectabilityErrors / NoiseErrors / NoiseSampling BiasSampling Bias

FeatureFeatureExtractionExtraction

CollectionCollection

Species classificationSpecies classificationRecognizing individualsRecognizing individualsT ki i di id lT ki i di id l

Sampling BiasSampling Bias

ExtractionExtraction

DataDataCleaningCleaning

Tracking individualsTracking individuals

Sensor failuresSensor failuresNetworking failuresNetworking failuresRecognition errorsRecognition errors

Model FittingModel Fitting

CleaningCleaning Recognition errorsRecognition errors

Species distribution modelsSpecies distribution modelsBehavioral modelsBehavioral models

PolicyPolicyOptimizationOptimization

Dynamical systems modelsDynamical systems models

Coupling Multiple Coupling Multiple ProblemsProblems

ACML 2009 16

OptimizationOptimizationProblemsProblems

11/7/2009

Page 17: Machine Learning and Ecosystem Informatics: Challenges and …web.engr.oregonstate.edu/~tgd/talks/acml-2009.pdf · 2009-11-07 · Hierarchical Modeling and Inference in Ecology: The

Species Distribution Modelsp

What are the environmental/biological grequirements for a species?Given:

E i t l f t ( l ti il tiEnvironmental features (elevation, soil properties, weather) of a sitePresence, presence/absence, or abundance of K , p ,species

Find:Probability that each of the K species will be foundProbability that each of the K species will be found at new sitesExtrapolation to global climate change scenarios

ACML 2009 1711/7/2009

Page 18: Machine Learning and Ecosystem Informatics: Challenges and …web.engr.oregonstate.edu/~tgd/talks/acml-2009.pdf · 2009-11-07 · Hierarchical Modeling and Inference in Ecology: The

Plants in Victoria[Arwen Lettkeman][Arwen Lettkeman]

5,605 plant species measured

Australian Data Sites

2800000

2900000

species measured at >113,000 sites83 environmental 2600000

2700000

2800000

Coo

rd

features2400000

2500000

Nor

thin

g

2200000

2300000

2100000 2200000 2300000 2400000 2500000 2600000 2700000 2800000 2900000 3000000

Easting Coord

11/7/2009 18ACML 2009

Source: Matt White, Arthur Rylah InstituteSource: Matt White, Arthur Rylah Institute

Page 19: Machine Learning and Ecosystem Informatics: Challenges and …web.engr.oregonstate.edu/~tgd/talks/acml-2009.pdf · 2009-11-07 · Hierarchical Modeling and Inference in Ecology: The

DataSensorSensor

PlacementPlacement Optimal Sensor PlacementOptimal Sensor PlacementDataPipeline

DataDataCollectionCollection

PlacementPlacement

DetectabilityDetectabilityErrors / NoiseErrors / NoiseSampling BiasSampling Bias

FeatureFeatureExtractionExtraction

CollectionCollection

Species classificationSpecies classificationRecognizing individualsRecognizing individualsT ki i di id lT ki i di id l

Sampling BiasSampling Bias

ExtractionExtraction

DataDataCleaningCleaning

Tracking individualsTracking individuals

Sensor failuresSensor failuresNetworking failuresNetworking failuresRecognition errorsRecognition errors

Model FittingModel Fitting

CleaningCleaning Recognition errorsRecognition errors

Species distribution modelsSpecies distribution modelsBehavioral modelsBehavioral models

PolicyPolicyOptimizationOptimization

Dynamical systems modelsDynamical systems models

Large spatioLarge spatio--temporal MDPstemporal MDPsOptimaOptima that are robustthat are robust

Coupling Multiple Coupling Multiple ProblemsProblems

ACML 2009 19

OptimizationOptimization Optima Optima that are robustthat are robustto model uncertaintyto model uncertainty

ProblemsProblems

11/7/2009

Page 20: Machine Learning and Ecosystem Informatics: Challenges and …web.engr.oregonstate.edu/~tgd/talks/acml-2009.pdf · 2009-11-07 · Hierarchical Modeling and Inference in Ecology: The

Robust Reserve DesigngGiven:

S i di ib i d lSpecies distribution modelBudget

Find:Set of reserves to purchase thatSet of reserves to purchase that are good habitat for the species and fit within the budget

Robust to uncertainties in the model (and climate etc )model (and climate, etc.)

Optimize the machine learning to be more accurate where land is cheaper to acquire?Joint optimization of model fitting Predicted winter distribution of treePredicted winter distribution of treeJoint optimization of model fitting and optimization?

Predicted winter distribution of tree Predicted winter distribution of tree swallows (Fink, et al., unpublished)swallows (Fink, et al., unpublished)

ACML 2009 2011/7/2009

Page 21: Machine Learning and Ecosystem Informatics: Challenges and …web.engr.oregonstate.edu/~tgd/talks/acml-2009.pdf · 2009-11-07 · Hierarchical Modeling and Inference in Ecology: The

Outline: Three Challenges for Machine LearningMachine Learning

Object Recognition for Arthropod CountingObject Recognition for Arthropod Counting

M lti l S i P di tiMultiple Species Prediction

Spatio-Temporal Optimization for Forest Management

ACML 2009 2111/7/2009

Page 22: Machine Learning and Ecosystem Informatics: Challenges and …web.engr.oregonstate.edu/~tgd/talks/acml-2009.pdf · 2009-11-07 · Hierarchical Modeling and Inference in Ecology: The

Automated Rapid-Throughput Arthropod Population CountingArthropod Population Counting

G lGoal: technician collects specimens in the field by various meansrobotic device automatically manipulates, photographs, classifies, and sorts the specimens

Two applications:stoneflies in freshwater streamssoil mesofaunasoil mesofauna

ACML 2009 2211/7/2009

Page 23: Machine Learning and Ecosystem Informatics: Challenges and …web.engr.oregonstate.edu/~tgd/talks/acml-2009.pdf · 2009-11-07 · Hierarchical Modeling and Inference in Ecology: The

Application 1: Stonefly populations in freshwater streamsfreshwater streams

• differentially sensitive to• differentially sensitive to many pollutants

• live in rivers; reliable indicator of stream health

• difficult and expensive for people to classify (particularly to genus or(particularly to genus or species levels)

f

ACML 2009

• hundreds of species

2311/7/2009

Page 24: Machine Learning and Ecosystem Informatics: Challenges and …web.engr.oregonstate.edu/~tgd/talks/acml-2009.pdf · 2009-11-07 · Hierarchical Modeling and Inference in Ecology: The

Application 2: Small arthropods in soil: “soil mesofauna”in soil: soil mesofauna

AchipteriaAAchipteriaA BdellozoniumIBdellozoniumI BelbaABelbaA BelbaIBelbaI CatoposurusACatoposurusA EniochthoniusAEniochthoniusA PtenothrixVPtenothrixV

E t b TME t b TM E id AE id A EpilohmanniaAEpilohmanniaA EpilohmanniaDEpilohmanniaD EpilohmanniaTEpilohmanniaT HypochthoniusLAHypochthoniusLA PtiliidAPtiliidAEntomobrgaTMEntomobrgaTM EpidamaeusAEpidamaeusA EpilohmanniaAEpilohmanniaA EpilohmanniaDEpilohmanniaD EpilohmanniaTEpilohmanniaT HypochthoniusLAHypochthoniusLA

HypogastruraAHypogastruraA

IsotomaAIsotomaAIsotomaVIIsotomaVI LiacarusRALiacarusRA MetrioppiaAMetrioppiaA

NothrusFNothrusF

QuadroppiaAQuadroppiaA

onychiurusAonychiurusAOppiellaAOppiellaA PeltenuialaAPeltenuialaA PhthiracarusAPhthiracarusA

PlatynothrusFPlatynothrusFPlatynothrusIPlatynothrusI SiroVISiroVITomocerusATomocerusA

24ACML 200911/7/2009

Page 25: Machine Learning and Ecosystem Informatics: Challenges and …web.engr.oregonstate.edu/~tgd/talks/acml-2009.pdf · 2009-11-07 · Hierarchical Modeling and Inference in Ecology: The

Computer Vision Challenges(1)p g ( )

Highly-articulated objects with deformationHighly articulated objects with deformation

ACML 2009 2511/7/2009

Page 26: Machine Learning and Ecosystem Informatics: Challenges and …web.engr.oregonstate.edu/~tgd/talks/acml-2009.pdf · 2009-11-07 · Hierarchical Modeling and Inference in Ecology: The

Computer Vision Challenges(2)p g ( )

Huge intra-class changes of appearances due toHuge intra class changes of appearances due to development and maturation

tergites wingsbecome

ACML 2009

g g

2611/7/2009

Page 27: Machine Learning and Ecosystem Informatics: Challenges and …web.engr.oregonstate.edu/~tgd/talks/acml-2009.pdf · 2009-11-07 · Hierarchical Modeling and Inference in Ecology: The

Computer Vision Challenges(3)p g ( )

Small between-class differencesSmall between class differences

Calinueria Doronueria

ACML 2009 2711/7/2009

Page 28: Machine Learning and Ecosystem Informatics: Challenges and …web.engr.oregonstate.edu/~tgd/talks/acml-2009.pdf · 2009-11-07 · Hierarchical Modeling and Inference in Ecology: The

Machine Learningg

Training Examples

New Examples

CalineuriaCalineuria

LearningAlgorithm Classifier

CalineuriaCalineuria

DoroneuriaDoroneuria Algorithm

DoroneuriaDoroneuria

ACML 2009

Doroneuria

2811/7/2009

Page 29: Machine Learning and Ecosystem Informatics: Challenges and …web.engr.oregonstate.edu/~tgd/talks/acml-2009.pdf · 2009-11-07 · Hierarchical Modeling and Inference in Ecology: The

State of the Art in Object RecognitionRecognition

“Bag of Keypoints” based on visualBag of Keypoints based on visual dictionaries

85% correct85% correct

New method: multiple-instance classification of bags of regions

95% correct

11/7/2009 ACML 2009 29

Page 30: Machine Learning and Ecosystem Informatics: Challenges and …web.engr.oregonstate.edu/~tgd/talks/acml-2009.pdf · 2009-11-07 · Hierarchical Modeling and Inference in Ecology: The

Region-Based Approaches:Convert Image to Bag of PatchesConvert Image to Bag of Patches

HandlesOcclusionRotation, translationScale (with scale-independent ( ppatch representation)Partial out-of-plane orientationArticulation / Pose

Problem:How to define the patches?pHow to represent each patch?How to classify a BAG of patches?

ACML 2009 3011/7/2009

Page 31: Machine Learning and Ecosystem Informatics: Challenges and …web.engr.oregonstate.edu/~tgd/talks/acml-2009.pdf · 2009-11-07 · Hierarchical Modeling and Inference in Ecology: The

Defining the Patches: Interest Region DetectorsInterest Region Detectors

ACML 2009 31

HessianHessian--Affine DetectorAffine Detector Kadir Entropy DetectorKadir Entropy Detector PCBR DetectorPCBR Detector

11/7/2009

Page 32: Machine Learning and Ecosystem Informatics: Challenges and …web.engr.oregonstate.edu/~tgd/talks/acml-2009.pdf · 2009-11-07 · Hierarchical Modeling and Inference in Ecology: The

Representing the Patches:SIFT (Lowe 1999)SIFT (Lowe, 1999)

(Lowe, 19

(Lowe, 19999)

999)

•• Morph ellipse into a circleMorph ellipse into a circle

•• Compute Compute intensity gradient at each pixel in 16x16 regionintensity gradient at each pixel in 16x16 region

R t t h l i l di t d i t i t it di tR t t h l i l di t d i t i t it di t•• Rotate whole circle according to dominant intensity gradientRotate whole circle according to dominant intensity gradient

•• Weight gradients by Weight gradients by a a Gaussian Gaussian distribution (indicated by circle)distribution (indicated by circle)

•• Collect into histograms within each 4x4 region (gives 16Collect into histograms within each 4x4 region (gives 16

ACML 2009

Collect into histograms within each 4x4 region (gives 16 Collect into histograms within each 4x4 region (gives 16 histograms)histograms)

•• Result: 128Result: 128--element vector normalized to have Euclidean norm 1element vector normalized to have Euclidean norm 13211/7/2009

Page 33: Machine Learning and Ecosystem Informatics: Challenges and …web.engr.oregonstate.edu/~tgd/talks/acml-2009.pdf · 2009-11-07 · Hierarchical Modeling and Inference in Ecology: The

Classify Bag of PatchesMethod 1: Visual DictionariesMethod 1: Visual Dictionaries

“look up” each patch in di ti d t i tdictionary and count into a feature vectorfeature vector is then given to the classifierto the classifier

1122

33

0 0 0 0 0 0 . . . . . 011224 2 6 4 9 0 . . . . . 3

44

classifierclassifier

ACML 2009 33

100100 ŷ=2ŷ=2

11/7/2009

Page 34: Machine Learning and Ecosystem Informatics: Challenges and …web.engr.oregonstate.edu/~tgd/talks/acml-2009.pdf · 2009-11-07 · Hierarchical Modeling and Inference in Ecology: The

Learn visual dictionary via clusteringclustering

Gaussian Mixture Model (k=100) with diagonal covariance ( ) gmatrices (EM, initialized with K-means)

abdomenabdomen

nosenose

eyeseyes

centers ofcenters oftergitestergites

legslegs

sides ofsides of

headhead

11/7/2009 ACML 2009 34

tergitestergites

100 clusters100 clusters

Page 35: Machine Learning and Ecosystem Informatics: Challenges and …web.engr.oregonstate.edu/~tgd/talks/acml-2009.pdf · 2009-11-07 · Hierarchical Modeling and Inference in Ecology: The

Issues with Visual Dictionaries

UnsupervisedUnsupervisedSeveral efforts to construct discriminative dictionaries (Moosman et al 2006)dictionaries (Moosman et al., 2006)

Lose information128 element SIFT contains 1024 bits a bag of128-element SIFT contains 1024 bits, a bag of 256 SIFTs contains 256K bitsKeyword histogram from 2700 elementKeyword histogram from 2700-element dictionary contains ~2700bits

11/7/2009 ACML 2009 35

Page 36: Machine Learning and Ecosystem Informatics: Challenges and …web.engr.oregonstate.edu/~tgd/talks/acml-2009.pdf · 2009-11-07 · Hierarchical Modeling and Inference in Ecology: The

Classify Bag of PatchesMethod 2: Multiple Instance ClassifierMethod 2: Multiple-Instance Classifier

The classifierThe classifier predicts the class of the imageof the image separately from each patch

classifierclassifier

each patchThese vote to make the final decisionthe final decision

0 0 0 0 0 0 0 0 011

ŷŷ=7=7ŷŷ=2=2

112 8 1 3 0 0 6 4 2 Final predictionFinal prediction ŷ 2ŷ 2

ACML 2009 36

0 0 0 0 0 0 0 0 0

votesvotes

11112 8 1 3 0 0 6 4 2 Final prediction: Final prediction: ŷ=2ŷ=2

11/7/2009

Page 37: Machine Learning and Ecosystem Informatics: Challenges and …web.engr.oregonstate.edu/~tgd/talks/acml-2009.pdf · 2009-11-07 · Hierarchical Modeling and Inference in Ecology: The

Improved Multiple-Instance ClassificationClassification

Evidence Trees: Like decision trees, but store ,the “evidence” in each leafGiven an input, output the evidence

xx1212 > 0.6> 0.6yesyes nono

xx109109 > 0.9> 0.9 xx6666 > 0.1> 0.1

100523 001232 000180 741030

nononono yesyesyesyes

ACML 2009 3711/7/2009

Page 38: Machine Learning and Ecosystem Informatics: Challenges and …web.engr.oregonstate.edu/~tgd/talks/acml-2009.pdf · 2009-11-07 · Hierarchical Modeling and Inference in Ecology: The

Classify Bag of PatchesVoted Evidence TreesVoted Evidence Trees

The classifierThe classifier predicts the class of the imageof the image separately from each patchclassifierclassifier each patchThese vote to make the final decision

100523 001232

the final decision

Final predictionFinal prediction ŷ 1ŷ 1

ACML 2009 38

0 0 0 0 0

votesvotes

Final prediction: Final prediction: ŷ=1ŷ=123 5 0 0 125 8 12 0 187 14 34 6 61

11/7/2009

Page 39: Machine Learning and Ecosystem Informatics: Challenges and …web.engr.oregonstate.edu/~tgd/talks/acml-2009.pdf · 2009-11-07 · Hierarchical Modeling and Inference in Ecology: The

Theorem: Voting Evidence is Better than Voting Decisionsthan Voting Decisions

Intuition: When voting gdecisions, there are two opportunities to make a mistake:1. Making the wrong

decision at each leaf2. Making the wrong2. Making the wrong

decision when combining the votes

With evidence trees,With evidence trees, the first opportunity is avoided

γγ = margin of decision tree nodes= margin of decision tree nodesππ = fraction of non= fraction of non--noise patchesnoise patches

ACML 2009 3911/7/2009

Page 40: Machine Learning and Ecosystem Informatics: Challenges and …web.engr.oregonstate.edu/~tgd/talks/acml-2009.pdf · 2009-11-07 · Hierarchical Modeling and Inference in Ecology: The

Voting Decisions vs. Voting EvidenceVoting Evidence

95.0

VO

CV

OC

80 0

85.0

90.0

2006

V20

06 V

70.0

75.0

80.0

AUC

decisions

CA

L 2

CA

L 2

55.0

60.0

65.0 evidence

PAS

PAS

50.0

55.0

11/7/2009 ACML 2009 40

Page 41: Machine Learning and Ecosystem Informatics: Challenges and …web.engr.oregonstate.edu/~tgd/talks/acml-2009.pdf · 2009-11-07 · Hierarchical Modeling and Inference in Ecology: The

Final Classifier:Stacked Random ForestsStacked Random Forests

1. Each patch is processed by a random forest of evidence trees

2. Evidence is summed and normalized to produce C3. C is classified by a second-level boosted decision3. C is classified by a second level boosted decision

tree ensemble

BagBagofoft ht h

NormalizedNormalizedCount vector CCount vector CΣΣ

weightedweightedvotevote ŷŷ

patchespatches

ACML 2009 41

Bootstrap/Random ForestBootstrap/Random ForestEnsembleEnsemble

Boosted EnsembleBoosted Ensemble

11/7/2009

Page 42: Machine Learning and Ecosystem Informatics: Challenges and …web.engr.oregonstate.edu/~tgd/talks/acml-2009.pdf · 2009-11-07 · Hierarchical Modeling and Inference in Ecology: The

Experimental Study9 Taxa of Stoneflies9 Taxa of Stoneflies

C lC l IsoIsoCalCal

DorDor

IsoIso

MosMos

HesHes PtePte

SweSwe

YorYor

ACML 2009 42

ZapZap

11/7/2009

Page 43: Machine Learning and Ecosystem Informatics: Challenges and …web.engr.oregonstate.edu/~tgd/talks/acml-2009.pdf · 2009-11-07 · Hierarchical Modeling and Inference in Ecology: The

STONEFLY9 Dataset

3826 images3826 images773 specimens9 l9 classesError estimation by 3-fold cross-validation

all images of a specimen belong to the same fold

11/7/2009 ACML 2009 43

Page 44: Machine Learning and Ecosystem Informatics: Challenges and …web.engr.oregonstate.edu/~tgd/talks/acml-2009.pdf · 2009-11-07 · Hierarchical Modeling and Inference in Ecology: The

Comparison of Methodsp

E R

16

18

Error  Rate

10

12

14

6

8

10

0

2

4

11/7/2009 ACML 2009 44

Visual Dictionary Stacked Evidence Trees

Page 45: Machine Learning and Ecosystem Informatics: Challenges and …web.engr.oregonstate.edu/~tgd/talks/acml-2009.pdf · 2009-11-07 · Hierarchical Modeling and Inference in Ecology: The

Outline: Three Challengesg

Object Recognition for Arthropod CountingObject Recognition for Arthropod Counting

M lti l S i P di tiMultiple Species Prediction

Spatio-Temporal Optimization for Forest Management

ACML 2009 4511/7/2009

Page 46: Machine Learning and Ecosystem Informatics: Challenges and …web.engr.oregonstate.edu/~tgd/talks/acml-2009.pdf · 2009-11-07 · Hierarchical Modeling and Inference in Ecology: The

Plants in Victoria[Arwen Lettkeman][Arwen Lettkeman]

5,605 plant species measured

Australian Data Sites

2800000

2900000

species measured at >113,000 sites83 environmental 2600000

2700000

2800000

Coo

rd

features2400000

2500000

Nor

thin

g

2200000

2300000

2100000 2200000 2300000 2400000 2500000 2600000 2700000 2800000 2900000 3000000

Easting Coord

11/7/2009 46ACML 2009

Source: Matt White, Arthur Rylah InstituteSource: Matt White, Arthur Rylah Institute

Page 47: Machine Learning and Ecosystem Informatics: Challenges and …web.engr.oregonstate.edu/~tgd/talks/acml-2009.pdf · 2009-11-07 · Hierarchical Modeling and Inference in Ecology: The

Labels are Sparsep

1200

1000

600

800

ber o

f Site

s

200

400Num

b

0

200

1 6 11 16 21 26 31 36 41 46 51 56 61 66 71 76 81 86 91 96

11/7/2009 ACML 2009 47

Species Richness

Page 48: Machine Learning and Ecosystem Informatics: Challenges and …web.engr.oregonstate.edu/~tgd/talks/acml-2009.pdf · 2009-11-07 · Hierarchical Modeling and Inference in Ecology: The

Many Species are Rare Many Species are CommonMany Species are Common

sent

sent

whe

re p

res

whe

re p

res

# of

site

s #

of s

ites

11/7/2009 ACML 2009 48

Species IndexSpecies Index

Page 49: Machine Learning and Ecosystem Informatics: Challenges and …web.engr.oregonstate.edu/~tgd/talks/acml-2009.pdf · 2009-11-07 · Hierarchical Modeling and Inference in Ecology: The

Exploiting Multiple Speciesp g p p

Experiment: Predict 0.45Experiment: Predict presence/absence of one species given

Environmental ib

0.35

0.4

attributesPresence/Absence of other speciesBoth

0.25

0.3 Score

Both

0.15

0.2

Kapp

a Other Species

Environmental Vars

0.05

0.1

0

1/300 1/200 1/100 1/75 1/50 1/40 1/30 1/20 1/10 1/9 1/8 1/7 1/6 1/5 1/4 1/3

Fraction of Training Data

Page 50: Machine Learning and Ecosystem Informatics: Challenges and …web.engr.oregonstate.edu/~tgd/talks/acml-2009.pdf · 2009-11-07 · Hierarchical Modeling and Inference in Ecology: The

Multi-Label ClassificationMultiple-output neural networksMultiple-response decision trees

Zhang, H. 1998. Classification Trees for Multiple Binary Responses, JASA.De’ath, G. 2002. Multivariate Regression Trees: A new technique for modeling species-environment relationships. Ecology.

Conditional random fieldsMcCallum, A., Ghamrawi, N. 2204. Collective multi-label text classification. Tech Report.

Conditional topic modelsMimno D McCallum A 2008 Topic models conditioned on arbitrary features withMimno, D., McCallum, A. 2008. Topic models conditioned on arbitrary features with Dirichlet-multinomial regression. UAI.

StackingWolpert, D. 1992. Stacked generalization. Neural Networks

Reduction to multi class classification problemsReduction to multi-class classification problemsRead, J., Pfaringer, B., Holmes, G. 2008. Multi-label classification using ensembles of pruned sets. ICDM.Tsoumakis, G., Vlahavas, I. 2007. Random k-label sets: an ensemble method for multilabel classification. ECML.

11/7/2009 ACML 2009 50

Page 51: Machine Learning and Ecosystem Informatics: Challenges and …web.engr.oregonstate.edu/~tgd/talks/acml-2009.pdf · 2009-11-07 · Hierarchical Modeling and Inference in Ecology: The

Multi-Task Learningg

Train one model for each species but useTrain one model for each species, but use the other species as auxiliary tasksTrain a joint model but choose separateTrain a joint model, but choose separate regularization constant and decision threshold for each speciesthreshold for each species

11/7/2009 ACML 2009 51

Page 52: Machine Learning and Ecosystem Informatics: Challenges and …web.engr.oregonstate.edu/~tgd/talks/acml-2009.pdf · 2009-11-07 · Hierarchical Modeling and Inference in Ecology: The

Status

No results yet!No results yet!

11/7/2009 ACML 2009 52

Page 53: Machine Learning and Ecosystem Informatics: Challenges and …web.engr.oregonstate.edu/~tgd/talks/acml-2009.pdf · 2009-11-07 · Hierarchical Modeling and Inference in Ecology: The

Outline: Three Challengesg

Object Recognition for Arthropod CountingObject Recognition for Arthropod Counting

M lti l S i P di tiMultiple Species Prediction

Spatio-Temporal Optimization for Ecosystem Management

ACML 2009 5311/7/2009

Page 54: Machine Learning and Ecosystem Informatics: Challenges and …web.engr.oregonstate.edu/~tgd/talks/acml-2009.pdf · 2009-11-07 · Hierarchical Modeling and Inference in Ecology: The

Fires in the Western US

Natural behavior –frequent low-intensity fires (every 15-20 years)

Fa ors Ponderosa PineFavors Ponderosa Pine forests

thick bark to survive low-intensity fireintensity fire

Takes out weaker trees – “natural thinning”Result: Open stands of big, valuable trees

11/7/2009 ACML 2009 54

Page 55: Machine Learning and Ecosystem Informatics: Challenges and …web.engr.oregonstate.edu/~tgd/talks/acml-2009.pdf · 2009-11-07 · Hierarchical Modeling and Inference in Ecology: The

Fire Suppression Policypp y

William Greeley USFS chief 1920-9:"the conviction was burned into me that that fire prevention is the number 1 job of American foresters (Greeley, WB. 1951. Forests and men. NY: Doubleday.)

10:00 am policy: Contain every wildfire by 10:00 am the day after it is reported regardless of cost.

11/7/2009 ACML 2009 55www.mtmultipleuse.org/images/smokey.jpg www.mtmultipleuse.org/images/smokey.jpg

Page 56: Machine Learning and Ecosystem Informatics: Challenges and …web.engr.oregonstate.edu/~tgd/talks/acml-2009.pdf · 2009-11-07 · Hierarchical Modeling and Inference in Ecology: The

Result of this Policyy

Lodgepole pine becomes g p pdominant tree

low economic valuevulnerable to pine barkvulnerable to pine bark beetledies and creates enormous fuel buildupsp

Fires become catastrophicmost vegetation killedmost soil organic mattermost soil organic matter destroyedvery long recovery timebig CO releasebig CO2 release

11/7/2009 ACML 2009 56

Page 57: Machine Learning and Ecosystem Informatics: Challenges and …web.engr.oregonstate.edu/~tgd/talks/acml-2009.pdf · 2009-11-07 · Hierarchical Modeling and Inference in Ecology: The

Adaptive Fire Treatmentp

Choose what fires to allow to burnChoose what fires to allow to burnPerform “mechanical thinning” to reduce fuel loadsloads

11/7/2009 ACML 2009 57

Page 58: Machine Learning and Ecosystem Informatics: Challenges and …web.engr.oregonstate.edu/~tgd/talks/acml-2009.pdf · 2009-11-07 · Hierarchical Modeling and Inference in Ecology: The

Formulation as a Markov Decision ProcessMarkov Decision Process

States:L d di id d i 100 i (MU )Landscape divided into 100 management units (MUs)Each MU has two state variables:

Age: age of trees {0-9, 10-19, 20-29, 30-39, 40-49}Fuel: fuel load {very-low, low, medium, high, very-high}25100 states

Actions:Every 10 years for each MU: {grow, cut, fuel}3100 actions3 actions

State transition function:Actions are deterministic, but then fire burns stochastically depending on states and spatial arrangement of states

R d F tiReward Function:Value of timber cut and soldCost of fuel treatments

11/7/2009 ACML 2009 58

Page 59: Machine Learning and Ecosystem Informatics: Challenges and …web.engr.oregonstate.edu/~tgd/talks/acml-2009.pdf · 2009-11-07 · Hierarchical Modeling and Inference in Ecology: The

Status

We have no algorithms that can handle suchWe have no algorithms that can handle such large spatio-temporal MDPs

And there are typically 2000-3000 MUs

11/7/2009 ACML 2009 59

Page 60: Machine Learning and Ecosystem Informatics: Challenges and …web.engr.oregonstate.edu/~tgd/talks/acml-2009.pdf · 2009-11-07 · Hierarchical Modeling and Inference in Ecology: The

SummarySensorSensor

PlacementPlacement Optimal Sensor PlacementOptimal Sensor PlacementSummary

DataDataCollectionCollection

PlacementPlacement

DetectabilityDetectabilityErrors / NoiseErrors / NoiseSampling BiasSampling Bias

FeatureFeatureExtractionExtraction

CollectionCollection

Species classificationSpecies classificationRecognizing individualsRecognizing individualsT ki i di id lT ki i di id l

Sampling BiasSampling Bias

ExtractionExtraction

DataDataCleaningCleaning

Tracking individualsTracking individuals

Sensor failuresSensor failuresNetworking failuresNetworking failuresRecognition errorsRecognition errors

Model FittingModel Fitting

CleaningCleaning Recognition errorsRecognition errors

Species distribution modelsSpecies distribution modelsBehavioral modelsBehavioral models

PolicyPolicyOptimizationOptimization

Dynamical systems modelsDynamical systems models

Large SpatioLarge Spatio--Temporal MDPTemporal MDPssOptima that are robustOptima that are robust

Coupling Multiple Coupling Multiple ProblemsProblems

ACML 2009 60

OptimizationOptimization Optima that are robustOptima that are robustto model uncertaintyto model uncertainty

ProblemsProblems

11/7/2009

Page 61: Machine Learning and Ecosystem Informatics: Challenges and …web.engr.oregonstate.edu/~tgd/talks/acml-2009.pdf · 2009-11-07 · Hierarchical Modeling and Inference in Ecology: The

AcknowledgementsgGrant Support: US National Science Foundation, US Forest Service

Species Distribution ModelsStudents: A. Lettkeman, P. WilkinsPostdocs: R HutchinsonPostdocs: R. HutchinsonFaculty: W-K. Wong, T. Dietterich

BugID: Students: N. Larios, H. Deng, W. Zhang, N. Payet, M. Sarpola, C. Fagan, J. Yuen, S. Ruiz CorreaPostdocs: G. Martínez-MuñozFaculty: R. Paasch, A. Moldenke, D. A. Lytle, E. Mortensen, L. G. Shapiro, y y pS. Todorovic, T. Dietterich

Fire ManagementStudents: A Gagnon R Houtman J Mark

ACML 2009

Students: A. Gagnon, R. Houtman, J. MarkFaculty: C. Montgomery, T. Dietterich, W-K. Wong

6111/7/2009

Page 62: Machine Learning and Ecosystem Informatics: Challenges and …web.engr.oregonstate.edu/~tgd/talks/acml-2009.pdf · 2009-11-07 · Hierarchical Modeling and Inference in Ecology: The

Questions?

ACML 2009 6211/7/2009