Model evaluation 201606


Transcript of Model evaluation 201606

Page 1: Model evaluation 201606

Model Evaluation

A. Townsend Peterson, University of Kansas

Page 2: Model evaluation 201606

Generalities

• Calibration data and evaluation data must be independent

• It is important to establish whether the observed coincidence between model predictions and testing data is closer than expected at random

• Only once a model has been tested (successfully) should it be interpreted and explored

Page 3: Model evaluation 201606

Threshold-dependent or Not?

Thresholded

• PRO
– Simplicity of test
– Clear interpretation
– Computation is easy

• CON
– Assumptions required in thresholding (see the sketch after this list)
– Less well accepted by the community (who cares?)

Continuous

• PRO
– Avoids need for thresholding and its assumptions
– Very well accepted by community

• CON
– Less clear in interpretation
– Known problems with ROC AUC
– Computational challenges
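The thresholding step itself is easy to show concretely. A minimal sketch, assuming NumPy and using made-up suitability values and an illustrative cutoff of 0.5, of how a continuous prediction is reduced to a binary presence/absence map:

import numpy as np

# Illustrative continuous suitability surface (values in [0, 1]);
# in practice this comes from the fitted model's prediction grid.
suitability = np.array([0.05, 0.40, 0.72, 0.91, 0.18])

# The cutoff is an assumption (e.g., a fixed value, or one chosen to
# bound the omission rate on calibration data).
threshold = 0.5

# Binary prediction: True where the model predicts presence.
predicted_present = suitability >= threshold
print(predicted_present)  # [False False  True  True False]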

Page 4: Model evaluation 201606

Binomial Test

• Given a SINGLE threshold

• Proportional area predicted present determines the expected number of points correctly predicted

• Binomial test assesses whether the observed number of successes is greater than that expected by chance alone

Page 6: Model evaluation 201606

Threshold-dependent Approach

If the predicted suitable area covers 15% of the testing area, then 15% of evaluation points are expected to fall in the predicted suitable area by chance.

• p = proportion of area predicted suitable

• s = number of successes (evaluation points falling in the predicted suitable area)

• n = number of evaluation points

• p-value: =1-BINOMDIST(s-1,n,p,TRUE)

The cumulative binomial distribution gives the probability of obtaining at least s successes out of n trials when proportion p of the testing area is predicted present. If this probability is below 0.05, we interpret the model's predictions as significantly better than random.
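The same one-tailed test can be cross-checked outside the spreadsheet; a minimal sketch using SciPy's exact binomial test, with illustrative numbers (15% of area predicted suitable, 7 of 20 evaluation points falling inside):

from scipy.stats import binomtest

p = 0.15  # proportion of area predicted suitable (illustrative)
n = 20    # number of evaluation points (illustrative)
s = 7     # successes: points falling in the predicted suitable area

# One-tailed exact binomial test: P(X >= s) under chance expectation p.
result = binomtest(s, n, p, alternative="greater")
print(result.pvalue)  # ~0.02 here, i.e., better than random at the 5% level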

Page 7: Model evaluation 201606

Threshold-independent Approaches

Page 11: Model evaluation 201606

[Figure: ROC plot. Vertical axis: correct prediction of presence information (= avoidance of omission error). Horizontal axis: correct prediction of absence information (= avoidance of commission error).]
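The threshold-independent evaluation summarized by this plot is usually reported as the area under the ROC curve (AUC); a minimal sketch with scikit-learn's roc_auc_score, using made-up presence/absence labels and suitability scores:

from sklearn.metrics import roc_auc_score

# Illustrative data: 1 = presence, 0 = absence (or background/pseudo-absence),
# paired with the model's continuous suitability scores.
y_true = [1, 1, 1, 0, 0, 0, 0, 0]
scores = [0.9, 0.7, 0.4, 0.6, 0.3, 0.2, 0.2, 0.1]

# AUC = 0.5 corresponds to a random ranking; 1.0 to perfect separation.
auc = roc_auc_score(y_true, scores)
print(auc)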


ROC Problems

• Ignores predicted probability values … just a ranking of suitabilities

• Speaks to regions of ROC space (= predictions) that are not particularly relevant

• Weights omission and commission errors equally

• No information about spatial distribution of model errors

• Study area extent determines outcomes!


Significance vs Performance

• A prediction significantly better than random is important, and is a sine qua non for model interpretation

• BUT, it is also important to ensure that the model performs sufficiently well for the intended uses of the output

• Performance measures include omission rate, correct classification rate, etc.
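These performance measures come straight from the confusion matrix at a chosen threshold; a minimal sketch with hypothetical counts:

# Confusion-matrix counts at a chosen threshold (hypothetical values):
tp = 42  # presences correctly predicted present
fn = 8   # presences predicted absent (omission errors)
tn = 35  # absences correctly predicted absent
fp = 15  # absences predicted present (commission errors)

omission_rate = fn / (tp + fn)                                 # 0.16
correct_classification_rate = (tp + tn) / (tp + fn + tn + fp)  # 0.77
print(omission_rate, correct_classification_rate)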
