Alberta innovates pem_presentation_feb13_2012_ram_version1

Automated Predictive Mapping: Lessons Learned about the Process. R. A. MacMillan, LandMapper Environmental Solutions Inc.

description

This is a first draft of slides for a presentation to a session on Predictive Ecosystem Mapping at Alberta Innovates.


Page 1

Automated Predictive Mapping: Lessons Learned about the Process

R. A. MacMillan, LandMapper Environmental Solutions Inc.

Page 2

What is a PEM?

• Definition of Predictive Ecosystem Mapping (Jones et al., 1999)

Predictive Ecosystem Mapping (PEM): a computer, GIS and knowledge based method of stratifying landscapes into ecologically-oriented map units based on the overlaying of existing mapped themes and the processing of the resulting attributes by automated inferencing software using a formalized knowledge base containing ecological-landscape relationships.

[Figure: overlay on a 100 x 100 m grid. Layers: landscape curvature, vegetation, rainfall, geology, soils, land surface. Individual salinity hazard ratings for each layer are combined using layer weightings (2x, 1x, 2x, 1x, 3x) into a total salinity hazard rating, giving the salinity hazard map.]
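The overlay figure on this slide can be illustrated with a minimal sketch. It assumes the total rating is a weighted sum of the per-layer ratings; the transcript does not say which weight belongs to which layer, so the pairing in `WEIGHTS` and the toy ratings are hypothetical.

```python
# Hypothetical pairing of layers and weights. The slide lists five weights
# (2x, 1x, 2x, 1x, 3x) but the transcript does not tie them to layers.
WEIGHTS = {
    "curvature": 2,
    "vegetation": 1,
    "rainfall": 2,
    "geology": 1,
    "soils": 3,
}


def total_hazard(ratings, weights=WEIGHTS):
    """Weighted sum of the per-layer hazard ratings for one grid cell."""
    return sum(weights[name] * ratings[name] for name in weights)


def overlay(layers, weights=WEIGHTS):
    """Apply total_hazard cell by cell to same-shaped grids (lists of rows)."""
    names = list(weights)
    n_rows = len(layers[names[0]])
    n_cols = len(layers[names[0]][0])
    return [
        [
            total_hazard({n: layers[n][r][c] for n in names}, weights)
            for c in range(n_cols)
        ]
        for r in range(n_rows)
    ]


# Toy 1 x 2 grid of made-up ratings, one grid per layer.
layers = {
    "curvature": [[1, 3]],
    "vegetation": [[2, 1]],
    "rainfall": [[1, 2]],
    "geology": [[3, 1]],
    "soils": [[2, 3]],
}
hazard_map = overlay(layers)
```

In a real PEM the overlay would be followed by rule-based inference rather than a plain sum; the sum is only the simplest way to combine mapped themes.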

Page 3

Principles and Concepts

Fundamental Principles

Different Approaches to DSM

Page 4

Fundamental Principles of DSM

Pedotransfer functions (PTF)

Bouma (1989): “translating data we have into what we need”

Credit: Minasny & McBratney

Page 5

Fundamental Principles of DSM

Principle 1: Effort

Do not predict something that is easier to measure or map than the predictor

Credit: Minasny & McBratney

Page 6

Fundamental Principles of DSM

Principle 2: Uncertainty

- Do not use PTFs unless you can evaluate their uncertainty for a given problem.

- If a set of alternative PTFs is available, use the one with minimum variance (= optimized).

Credit: Minasny & McBratney

Page 7

Predictive Mapping Concepts

From: Dobos et al., 2006 JRC – EUR 22123

Page 8

A Spatial Soil Inference System (Lagacherie & McBratney, 2005)

[Diagram labels: DTM; RS image; existing soil map; scorpan layers; soil observations; Spatial Soil Information System; DSM function library (Scorpan F., Pedotransfer F., Class Content F., Allocation F.); function organiser; user interface; user data; predictor; OUTPUT]

Page 9

DSM Methods

Translating Concepts into Results

Page 10

Approaches to Producing Predictive Area-Class Maps

Page 11

Unsupervised Classification

• If You Do Not Know (or Are Not Confident That You Know)
  – What spatial entities are optimal to map
  – What their defining attributes are
  – Under what conditions (of input variables) they occur
• Then You Are Best Served by Adopting
  – An unsupervised classification approach
    • ISODATA (Irwin et al., 1997)
    • Fuzzy k-means (Burrough et al., 2002, 2003; Irwin et al., 1997)

Page 12

Concept of Fuzzy K-means Clustering

Source: J. Balkovič & G. Čemanová

Credit: Sobocká et al., 2003
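The concept of fuzzy k-means can be sketched in a few lines. This is a generic textbook formulation (often called fuzzy c-means), not the specific implementation used in the cited studies; the fuzziness exponent `m`, the starting centers and the toy points are all assumptions made for illustration.

```python
import math


def memberships(points, centers, m=2.0):
    """Fuzzy membership of each point in each cluster; every row sums to 1."""
    power = 2.0 / (m - 1.0)
    result = []
    for p in points:
        dists = [math.dist(p, c) for c in centers]
        if 0.0 in dists:
            # A point lying exactly on a center belongs fully to it
            # (shared equally if several centers coincide).
            zeros = dists.count(0.0)
            result.append([1.0 / zeros if d == 0.0 else 0.0 for d in dists])
            continue
        inv = [d ** -power for d in dists]
        total = sum(inv)
        result.append([v / total for v in inv])
    return result


def update_centers(points, u, m=2.0):
    """New centers: means of the points weighted by membership ** m."""
    dims = len(points[0])
    centers = []
    for k in range(len(u[0])):
        w = [row[k] ** m for row in u]
        total = sum(w)
        centers.append(
            tuple(sum(wi * p[d] for wi, p in zip(w, points)) / total
                  for d in range(dims))
        )
    return centers


def fuzzy_kmeans(points, centers, m=2.0, n_iter=50):
    """Alternate membership and center updates for a fixed number of passes."""
    for _ in range(n_iter):
        u = memberships(points, centers, m)
        centers = update_centers(points, u, m)
    return centers, memberships(points, centers, m)


# Toy data: two well-separated groups in a two-attribute space.
points = [(0.0, 0.0), (0.2, 0.1), (0.1, 0.3),
          (5.0, 5.0), (5.2, 4.9), (4.9, 5.3)]
centers, u = fuzzy_kmeans(points, [(0.0, 0.0), (5.0, 5.0)])
labels = [row.index(max(row)) for row in u]  # hardened classes
```

The membership matrix `u` is the useful output for mapping: each location keeps a graded similarity to every class, and hardening to a single label is optional.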

Page 13

Example of Application of Fuzzy K-means Unsupervised Classification

From: Burrough et al., 2001, Landscape Ecology

Note similarity of unsupervised classes to conceptual classes

Page 14

Supervised Classification

• If You ARE Confident That You Know
  – What spatial entities are optimal to map
  – That you can consistently identify “A” when you “see it”
• But You ARE NOT Confident That You Know
  – What the defining attributes of the entities are
  – That you can formally express under what conditions (of input variables) the entities to be predicted occur
• Then You Are Best Served by Adopting
  – A supervised (data mining) classification approach
    • Classification and Regression Trees (CART)
    • Bayesian Analysis of Evidence
    • Supervised Fuzzy-logic

Page 15

Supervised Classification Using Regression Trees

From: Zhou et al., 2004 JZUS

Note similarity of supervised rules and classes to typical soil-landform conceptual classes

Note numeric estimate of likelihood of occurrence of classes

Page 16

Supervised Classification Using Bayesian Analysis of Evidence

From: Zhou et al., 2004 JZUS

Note: ultimately this is just a way of establishing numerical measures of the likelihood of occurrence of each class to be predicted given the presence of a predictor class

Note: the final, overall probability value is computed as a weighted average of the individual probabilities of each potential output class given each input class on n input maps
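The weighted-average step in the second note can be sketched as follows. The map names, class names, conditional probabilities and weights are invented for illustration and are not taken from Zhou et al.

```python
def combined_probability(cell_inputs, cond_prob, map_weights):
    """Weighted average of P(output class | input class) over the input maps.

    cell_inputs: {map_name: input class observed at this cell}
    cond_prob:   {map_name: {input_class: {output_class: probability}}}
    map_weights: {map_name: weight}
    """
    total_w = sum(map_weights[m] for m in cell_inputs)
    combined = {}
    for m, in_class in cell_inputs.items():
        for out_class, p in cond_prob[m][in_class].items():
            share = map_weights[m] * p / total_w
            combined[out_class] = combined.get(out_class, 0.0) + share
    return combined


# Hypothetical conditional probabilities for two input maps.
cond_prob = {
    "slope": {"steep": {"crest": 0.7, "wetland": 0.3}},
    "geology": {"till": {"crest": 0.4, "wetland": 0.6}},
}
map_weights = {"slope": 2.0, "geology": 1.0}

probs = combined_probability(
    {"slope": "steep", "geology": "till"}, cond_prob, map_weights
)
best = max(probs, key=probs.get)  # class with the highest combined value
```

Because each input map contributes a probability distribution that sums to 1, the weighted average also sums to 1 and can be read as a likelihood of occurrence for each class.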

Page 17

Supervised Classification Using Bayesian Analysis of Evidence/Classification Trees

From: Zhou et al., 2004 JZUS

Page 18

Supervised Classification Using Fuzzy Logic

• Shi et al., 2004
  – Used multiple cases of reference sites
  – Each site was used to establish fuzzy similarity of unclassified locations to reference sites
  – Used Fuzzy-minimum function to compute fuzzy similarity
  – Hardened the class using the largest (Fuzzy-maximum) value
  – Considered distance to each reference site in computing Fuzzy-similarity

[Figure: fuzzy likelihood of being a broad ridge]
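The fuzzy-minimum and fuzzy-maximum steps listed on this slide can be sketched as below. The bell-shaped similarity curve, the attribute names, the widths and the reference values are all assumptions for illustration; the distance weighting mentioned in the last bullet is left out, and none of this is claimed to reproduce the method of Shi et al.

```python
import math


def attribute_similarity(value, reference, width):
    """Assumed bell-shaped similarity in (0, 1]; equals 1 at the reference."""
    return math.exp(-((value - reference) / width) ** 2)


def site_similarity(cell, site, widths):
    """Fuzzy minimum: a cell is only as similar as its worst attribute."""
    return min(
        attribute_similarity(cell[a], site["attrs"][a], widths[a])
        for a in widths
    )


def classify(cell, sites, widths):
    """Best similarity per class, then harden with the fuzzy maximum."""
    per_class = {}
    for site in sites:
        s = site_similarity(cell, site, widths)
        per_class[site["label"]] = max(s, per_class.get(site["label"], 0.0))
    return max(per_class, key=per_class.get), per_class


# Hypothetical reference sites and attribute widths.
sites = [
    {"label": "broad ridge", "attrs": {"slope": 2.0, "relief": 40.0}},
    {"label": "toe slope", "attrs": {"slope": 5.0, "relief": 5.0}},
]
widths = {"slope": 5.0, "relief": 20.0}

label, similarity = classify({"slope": 3.0, "relief": 35.0}, sites, widths)
```

Keeping the `similarity` dictionary alongside the hardened `label` preserves the graded values that a fuzzy likelihood map displays.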

Page 19

Knowledge-Based Classification

• If You ARE Confident That You Know
  – What spatial entities are optimal and desirable to map
• AND You ARE Confident That You Know
  – What the defining attributes of the entities are
  – That you have a pretty good idea of the conditions (of input variables) that the entities develop under
• Then You May Be Well Served by Adopting
  – A Knowledge-based (heuristic) classification approach
    • Pragmatic, subjective, semantically expressed knowledge
      – SoLIM (Zhu), LandMapR (MacMillan), SIE (Xun Shi)
    • Formal, theoretical, quantitatively defined knowledge
      – Shary et al., 2005 (GeoFis)

Page 20

Knowledge-Based Classification In SoLIM

From: Zhu, SoLIM Handbook

Page 21

From: MacMillan, 2005

Knowledge-Based Classification In LandMapR

Source: Steen and Coupé, 1997

Page 22

PEM DSS Classification Using LandMapR

Normal Mesic

Moist Foot Slope

Warm SW Slope

Shallow Crest

Organic Wetland

Wet Toe Slope

Cold Frosty Wet

Permanent Lake

Page 23

From: MacMillan, 2005

PEM from a knowledge-based approach can look like a normal PEM

Page 24

Predictive Mapping

10 Lessons Learned from my BC PEM Mapping Experience

Page 25

Lesson 1: Define What Constitutes Success!

• Key to Everything Else
  – Can’t achieve success if you don’t know what it is (and how to measure it)
• Measure Success
  – Need a way to measure success objectively
• Establish Standards
  – Need to set targets that can be realistically met
  – Figure out what you need and not what you feel you want

Page 26

Lesson 2: Organize for Success – Partition work

• Key is to split work up
  – Don’t try to do everything yourself
  – Do what you do best and let others do what they do best
  – Don’t give implementer control over time and budget
  – Check and verify

[Organization chart labels]
• Forest Industry Clients
• Government Funding Programs
• Dedicated Project Manager
• Project Technical Monitor
• External Compliance Auditors
• GIS Input Data Preparation Specialists
• Local Knowledge Expert
• Knowledge Engineer & Mapper
• Independent Field Accuracy Assessors
• Government Published Knowledge
• Government Digital Data Repository
• Government Published Standards
• Research and Development Environment: Theory, Methods, Data, Tools, Software

Page 27

Lesson 3: Test and Verify All of Your Assumptions – Objectively!

• PEM Pilot – 2002/03
  – Automated methods will be less costly than traditional manual ones
  – Intensive manual interpretation and field sampling will produce more accurate maps than those produced by automated modeling
• Canim Lake PEM Operational Scale-up – 2003/04
  – Automated predictive methods aren’t scalable for operational mapping
  – Finer resolution DEM data (5 & 10 vs. 25 m) will yield more accurate maps
• Quesnel Operational PEM – 2004/05
  – Unit costs can go down with efficiencies of scale as larger areas are mapped
  – Single sets of KB rules can apply to entire BEC subzones
• East Williams Lake Operational PEM – 2005/06
  – Local experts can agree on correct classification in the field at 100% of visited locations
  – Areas of elevated frost hazard can be predicted to occur in structural hollows
• East Quesnel and West Williams Lake Operational PEMs – 2006/07
  – Land Cover information from Landsat imagery is not useful for PEMs

Page 28

Lesson 4: There’s More Than One Way to Skin this Cat!

• Use Expert Knowledge to Predict PEM Entities

• Use Data Mining – To Develop Statistical Classification Rules

Page 29

Lesson 5: Select Appropriate Predictor Inputs!

• Predictors are More Important than Methods
  – Appropriate predictors co-vary with entities to be predicted (at that scale)
  – Multi-scale inputs are being used increasingly
    • Expanding windows
  – Measures of context and pattern are important
    • Replacing local measures of slope and shape
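One way to picture an expanding-window, multi-scale predictor is relative elevation: the height of a cell above or below the mean of its neighbourhood, computed for several window sizes. This is a minimal sketch of that idea only; the function names, the window radii and the toy DEM are assumptions, and the slide does not specify which multi-scale measures were used.

```python
def relative_elevation(dem, row, col, radius):
    """Cell elevation minus the mean elevation of the square window around it.

    The window is clipped at the grid edges.
    """
    n_rows, n_cols = len(dem), len(dem[0])
    window = [
        dem[r][c]
        for r in range(max(0, row - radius), min(n_rows, row + radius + 1))
        for c in range(max(0, col - radius), min(n_cols, col + radius + 1))
    ]
    return dem[row][col] - sum(window) / len(window)


def multiscale_signature(dem, row, col, radii=(1, 2, 4)):
    """Relative elevation at successively larger windows (context at scales)."""
    return [relative_elevation(dem, row, col, r) for r in radii]


# Toy 5 x 5 DEM: flat at 10 m with a single 15 m high point in the middle.
dem = [[10.0] * 5 for _ in range(5)]
dem[2][2] = 15.0

signature = multiscale_signature(dem, 2, 2, radii=(1, 2))
```

Positive values at every scale describe a cell that stands above its surroundings both locally and in context, which a single local slope value cannot express.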

Page 30

Lesson 6: DEMs Don’t Tell You Everything!

• You Need to Make Use of Ancillary Predictor Data Sets
  – DEMs can tell you:
    • SHAPE & SIZE (at a specific scale)
    • CONTEXT & PATTERN
    • ORIENTATION
  – DEMs can’t tell you:
    • SUB-SURFACE ATTRIBUTES
      – Texture or Mineralogy
      – Water table depth or seepage
    • SURFACE COVER ATTRIBUTES
      – Land use, land cover, vegetation

[Figure panels: 5 m DEM; 25 m DEM (labels: 800 m, 900 m)]

Page 31

Ancillary data sets are important and needed!

• Radiometrics for subsurface
• Imagery for surface cover

Page 32

Lesson 7: Hierarchies Establish Context!

• One Set of Rules Can’t Fit Everywhere
  – Sub-divide map areas into successively smaller and more homogeneous “classification domains”
  – Develop and apply different KB rules in different map domains
  – Knowing “where” you are tells you what to expect = CONTEXT!

Source: Steen and Coupé, 1997
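Applying different knowledge-based rules in different classification domains can be sketched as a simple dispatch. The domain names, the `relative_position` attribute and the thresholds are hypothetical; only the class labels are borrowed from the LandMapR legend shown earlier in the deck.

```python
def rules_dry_domain(cell):
    """Hypothetical rule set for a drier classification domain."""
    if cell["relative_position"] > 0.8:
        return "Shallow Crest"
    return "Normal Mesic"


def rules_wet_domain(cell):
    """Hypothetical rule set for a wetter classification domain."""
    if cell["relative_position"] < 0.2:
        return "Wet Toe Slope"
    return "Normal Mesic"


# Knowing which domain a cell falls in selects the rule set to apply.
RULES_BY_DOMAIN = {"dry": rules_dry_domain, "wet": rules_wet_domain}


def classify(cell):
    """Look up the cell's domain, then apply that domain's rules."""
    return RULES_BY_DOMAIN[cell["domain"]](cell)


low_cell_dry = classify({"domain": "dry", "relative_position": 0.1})
low_cell_wet = classify({"domain": "wet", "relative_position": 0.1})
```

The same landscape position yields different classes in the two domains, which is the point of the lesson: context decides which rules are valid.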

Page 33

Lesson 8: Don’t Model What You Can Directly Map More Efficiently!

Principle 1:

Do not predict something that is easier to measure or map than to predict!

So – if you can map it manually faster or better, do not hesitate to do so!

Page 34

Lesson 9: Don’t Expect Perfection!

• These are concepts and NOT reality
  – Any number of maps may be equally “good”
  – N experts will never agree at all locations
  – Input data and models are not perfect
  – Need to stop when map is “good enough”
  – More time and effort often give a poorer result

   T28B T28K T28O T28R T28M  B-K  B-O  B-R  K-O  K-R  O-R
00   12   10    5    7    0   10    5    7    5    7    5
01   28   31   35   50   40   28   28   28   31   31   34
02   19    7   23    4    6    7   19   19    7    4    4
03   20   12    5    2    9   12    5    2    5    2    2
04   10   19    1   15   26   10   10   10    1   15    1
05    1    9    0    1    7    1    1    1    0    1    0
06   10    3    4    0    2    3    3    0    3    0    0
07    0    6    5   21   10    0    0    0    5    6    5
08    0    3    3    0    0    0    0    0    3    0    0
09    0    0   21    0    0    0    0    0    0    0    0
                             71   71   67   60   66   51

Agreement between Ecologists: 64
Agreement between Map and Ecologists: 65
Overall Ecologist Agreement: 65
Overall Map Agreement: 67
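The arithmetic behind the pairwise totals in the table above can be checked with a short sketch. The values are copied from the six pairwise columns; reading the figure of 64 as the rounded mean of the six totals is an inference from the numbers, not something the slide states.

```python
# Pairwise overlap columns from the table, rows 00 to 09.
pairwise_overlap = {
    "B-K": [10, 28, 7, 12, 10, 1, 3, 0, 0, 0],
    "B-O": [5, 28, 19, 5, 10, 1, 3, 0, 0, 0],
    "B-R": [7, 28, 19, 2, 10, 1, 0, 0, 0, 0],
    "K-O": [5, 31, 7, 5, 1, 0, 3, 5, 3, 0],
    "K-R": [7, 31, 4, 2, 15, 1, 0, 6, 0, 0],
    "O-R": [5, 34, 4, 2, 1, 0, 0, 5, 0, 0],
}

# Column sums reproduce the totals row of the table.
totals = {pair: sum(values) for pair, values in pairwise_overlap.items()}

# Mean of the six pairwise totals (assumed to be how the 64 was obtained).
mean_agreement = sum(totals.values()) / len(totals)
```

The spread of the totals, from 51 to 71, shows how much pairs of ecologists differ from one another, which gives a ceiling for the agreement a map can reasonably be expected to reach.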

Page 35

Lesson 10: Discontinuities are Important!

From: Minar and Evans, 2008

Page 36

Predictive Mapping

Some lessons learned from my recent global soil mapping experiences

Page 37

Some Lessons Learned

• The process is more important than the product
  – The final predictive map output product is not the most important product.
  – The most important product is the process used to create the predictive maps.
  – The map is diminished in value if it cannot be easily updated, improved and replicated.
  – The process has to capture and retain all inputs, all procedures and all outputs.

Page 38

A proposal for a centralized ecological information facility

– Rationale:
  • Link individual components together to ensure that ecological information products are:
    – Complete: information of similar content and appearance is provided everywhere and not just for a scattered patchwork.
    – Consistent: products avoid glaring inconsistencies, abrupt discontinuities and clearly recognizable differences between different project areas or mapping entities.
    – Correct: all products are as correct, or accurate, as possible, assessed relative to real data using objectively defined criteria.
    – Current: all data are as up to date and current as possible and new versions of outputs can be regularly and easily produced.

Page 39

A proposal for a centralized ecological information facility

• Key objective
  • To provide an overarching methodological framework and operational platform for the production of consistent province-wide ecological information products.

Page 40

A proposal for a centralized ecological information facility

• Basic concept
  • Provide an overarching methodological framework that links individual components (bits) into an integrated whole whose functions interact intelligently to produce consistent outputs.
    – An Open Database of field observations and classifications
    – A repository of consistent gridded covariate maps
    – A linked library of complementary functions and utilities (mostly but not exclusively produced using R) for manipulating and processing the preceding data sets to automatically produce models and maps of ecological entity spatial patterns (and uncertainties) according to agreed specifications.
    – A platform and utilities for discovering, displaying and retrieving grid maps of soil properties for any area of interest.

Page 41

An Example of a Central Information Facility (GSIF)

Page 42

An Example of a Central Information Facility (GSIF)

Page 43

What do I recommend?

• Build and maintain the cyber-infrastructure
  – Think of Facebook, Twitter or Google
    • You provide the platform and the functionality
    • You establish the templates, standards and tools
    • Users contribute content and effort to create content
• Make it easy to use and rapid to update
  – Maps become best models of current reality
    • Based on automated analysis of latest inputs and data
    • Let the maps constantly grow and improve – not static
    • Move towards real-time dynamic maps not one-time static maps

Page 44

Why do I recommend this?

• Expert rule based models take time and effort
  – Slow and costly to redo or update
  – Subject to bias and differences in experience
• Statistical models from data mining
  – Will often produce results inferior to ones created using expert knowledge
  – But offer the ability to continuously and regularly improve the rules and update the maps automatically
  – Are optimized, in the sense that the output maps fit the available observations as closely as possible

Page 45

Thank You

And Good Luck!