
Assessing Probabilities in Risk and Decision Analysis

Aron Larsson, SU/DSV and MIUN/ITM

Probabilities in risk analysis

”A measure of uncertainty of an observable quantity Y”

Probabilities are subjective, based on the assessor’s knowledge

There exists no ”true” probability assignment


The basic problem

Given a measurable quantity Y, we want to specify a probability distribution P(Y ≤ y) for y > 0

This is done given background information K, represented as hard data y1, …, yn and as expert knowledge. The hard data is more or less relevant

Evaluating probability assignments

Pragmatic criterion: accordance with observable data

Semantic criterion: calibration, accordance with future outcomes

Syntactic criterion: coherence, assigned probabilities should conform to the laws of probability theory


Using classical statistics

Let Y be a binary quantity (one or zero)

P(Y = 1) = (1/n) Σi yi = (y1 + y2 + … + yn)/n

Let Y be a real-valued quantity

P(Y ≤ y) = (1/n) Σi I(yi ≤ y), where I() is the indicator function

Needs n observations; n must be ”sufficiently large” (n ≥ 10, provided that not all yi are either 0 or 1)
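As a minimal sketch, here are these two empirical estimates in Python (the observations are made-up illustration data):

```python
import numpy as np

# Made-up observations of a binary quantity Y (n = 12, not all 0 or 1)
y_binary = np.array([1, 0, 1, 1, 0, 1, 0, 0, 1, 1, 0, 1])
p_one = y_binary.mean()                     # P(Y = 1) = (y1 + ... + yn)/n
print(f"P(Y = 1) ~ {p_one:.2f}")

# Made-up observations of a real-valued quantity Y
y_real = np.array([3.1, 4.7, 2.9, 5.2, 3.8, 4.1, 2.2, 6.0])
y = 4.0
p_leq = np.mean(y_real <= y)                # (1/n) sum of I(yi <= y)
print(f"P(Y <= {y}) ~ {p_leq:.2f}")
```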

Maximum likelihood estimation

Assume that we have data from a known parametric distribution (normal, Poisson, beta, etc.)

We wish to estimate the parameters θ = (θ1, …, θn) of the distribution

The MLE is the value of the parameters that makes the observed data most likely


Maximum likelihood estimation

We have n i.i.d. samples x1, …, xn

Specify the joint distribution

f(x1, …, xn | θ) = f(x1 | θ) f(x2 | θ) … f(xn | θ)

Now view x1, …, xn as the parameters and let θ vary; then a likelihood function for θ can be formulated as

L(θ | x1, …, xn ) = Πi f(xi | θ)

Maximum likelihood estimation

L(θ | x1, …, xn) = Πi f(xi | θ)

ln L(θ | x1, …, xn ) = Σi ln f(xi | θ)

Now we estimate θ by finding a value that maximises L

As it turns out, this is very easy for some parametric distributions (the normal, the exponential, the Poisson)
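As a sketch of why this is easy, take the exponential distribution: the MLE has a closed form, which we can double-check with a grid search over the log-likelihood (the sample here is simulated purely for illustration):

```python
import numpy as np

rng = np.random.default_rng(1)
x = rng.exponential(scale=2.0, size=1000)   # i.i.d. samples; true theta = 2.0

# For f(x | theta) = (1/theta) exp(-x/theta):
#   ln L(theta | x1, ..., xn) = -n ln(theta) - sum(xi)/theta
# Setting the derivative to zero gives the closed-form MLE theta_hat = mean(x).
theta_hat = x.mean()
print(f"closed-form MLE: {theta_hat:.3f}")

def log_lik(theta):
    return -len(x) * np.log(theta) - x.sum() / theta

grid = np.linspace(1.0, 3.0, 201)           # numerical check of the maximum
print(f"argmax on grid:  {grid[np.argmax(log_lik(grid))]:.3f}")
```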


Bayesian analysis

Update probabilities when new information becomes available

We are interested in the probability of Θ = θ, which may be updated by observing x

P(Ai | B) = P(B | Ai) P(Ai) / Σj P(B | Aj) P(Aj)

Prior probabilities

Let Θ = 0 mean ”not ill”, Θ = 1 mean ”moderately ill”, and Θ = 2 mean ”seriously ill”

We need a prior probability distribution π(Θ = θ), which we assume can be retrieved from, e.g., health statistics

The prior probability distribution gives the probabilities we have over the outcomes of Θ before observing new information


Likelihood principle

The likelihood principle in Bayesian analysis makes explicit the natural conditional idea that only the actually observed x should be relevant to conclusions or evidence about Θ

For observed data x, the function L(θ) = f(x | θ) is called the ”likelihood function”

Note: it is x given θ here

Likelihood principle (cont’d)

In making inferences or decisions about θ after x is observed, all relevant experimental information is contained in the likelihood function for the observed x.


Likelihood function

Assume we can conduct a test on a patient yielding positive (1) or negative (0)

We need to know about the dependencies, i.e. the conditional probabilities:

P(X = 1 | Θ = 2) = 0.9

P(X = 1 | Θ = 1) = 0.6

P(X = 1 | Θ = 0) = 0.1

We refer to this as the likelihood function L(x | θ).

Likelihood function and marginal

Knowing the likelihood function, we can simply obtain P(X = x), i.e. the marginal distribution of X, labelled m(x | π) or m(x). For example:

m(1) = P(X = 1 | Θ = 2)P(Θ = 2) + P(X = 1 | Θ = 1)P(Θ = 1) + P(X = 1 | Θ = 0)P(Θ = 0) = 0.9 · 0.02 + 0.6 · 0.1 + 0.1 · 0.88 = 0.166


Likelihood function and marginal

The marginal density of X is m(x) = ∫ f(x | θ) dF(θ)

In the discrete case, this is m(x) = Σi f(x | θi) π(θi)

In the continuous case, m(x) = ∫ f(x | θ) π(θ) dθ

Bayesian updating

”Knowing” π(θ), m(x), and L(θ), we are now interested in π(θ | x), or P(Θ = θ | X = x)

That is, we are interested in the probability of the outcomes θ having observed x


The posterior distribution

Let π(Θ = 2) = 0.02 be a prior probability. We now observe X = 1; then π(2 | 1) = f(1 | 2) π(2) / m(1) = 0.9 · 0.02 / 0.166 = 0.11

This is now our posterior probability of Θ = 2, or π(2 | 1) = 0.11

In this discrete case, this is called Bayes’ theorem
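The whole calculation fits in a few lines of Python, using the numbers from the slides:

```python
# Discrete Bayes' theorem with the slide's numbers
prior = {0: 0.88, 1: 0.10, 2: 0.02}         # pi(theta)
lik_x1 = {0: 0.1, 1: 0.6, 2: 0.9}           # P(X = 1 | Theta = theta)

m1 = sum(lik_x1[t] * prior[t] for t in prior)             # marginal m(1)
posterior = {t: lik_x1[t] * prior[t] / m1 for t in prior}
print(f"m(1) = {m1:.3f}")                   # 0.166
print(f"pi(2 | 1) = {posterior[2]:.2f}")    # 0.11
```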

Adding more information

So, we now ”know” that P(Θ = 2 | X = 1) = 0.11; what if that is not enough? Additional information can be sought

Let X := X1 and do another test X2 (which is conditionally independent of the first test)

Now we are interested in the following: P(Θ = 2 | X1 = 1, X2 = 1)


Likelihood function again

From the independence of the tests, the likelihood function’s properties are not changed:
P(X1 = 1 | Θ = 2, X2 = 1) = 0.9
P(X1 = 1 | Θ = 1, X2 = 1) = 0.6
P(X1 = 1 | Θ = 0, X2 = 1) = 0.1

Replace π(Θ = θ) with π(Θ = θ | X1 = 1) and update again

Posterior as new prior

Replacing π(θ):

P(Θ = 2 | X1 = 1) = 0.11

P(Θ = 1 | X1 = 1) = 0.6 · 0.10 / (0.9 · 0.02 + 0.6 · 0.1 + 0.1 · 0.88) = 0.36

P(Θ = 0 | X1 = 1) = 0.1 · 0.88 / (0.9 · 0.02 + 0.6 · 0.1 + 0.1 · 0.88) = 0.53


Bayesian updating again

P(Θ = 2 | X1 = 1, X2 = 1) = 0.9 · 0.11 / (0.9 · 0.11 + 0.6 · 0.36 + 0.1 · 0.53) = 0.27
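Because the posterior simply becomes the new prior, the two-test calculation is one small function applied twice (a sketch with the slide’s numbers):

```python
def update(prior, lik):
    # One step of discrete Bayesian updating:
    # posterior = likelihood * prior / marginal
    m = sum(lik[t] * prior[t] for t in prior)
    return {t: lik[t] * prior[t] / m for t in prior}

prior = {0: 0.88, 1: 0.10, 2: 0.02}
lik = {0: 0.1, 1: 0.6, 2: 0.9}              # same likelihood for both tests

after_test1 = update(prior, lik)            # {0: 0.53, 1: 0.36, 2: 0.11}
after_test2 = update(after_test1, lik)      # posterior becomes the new prior
print(f"P(Theta = 2 | X1 = 1, X2 = 1) = {after_test2[2]:.2f}")   # 0.27
```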

Observable parameters

Assume we make n observations collected in x = (x1, …, xn)

Each xi is independent of the others and identically distributed; then we have a joint distribution of the data

p(x1, …, xn | θ) = Πi p(xi | θ)


Bayesian updating in general

Let f be a distribution. In general, we can write the conditional distribution of θ given x, which is called the posterior distribution, as

π(θ | x) ∝ L(θ | x) π(θ)

π(θ | x) = f(x | θ) π(θ) / m(x)

Bayesian updating: Example

Quality engineering – sampling by attributes

We produce N items in a lot; we want at most 0.35% of these to be ”non-conforming” in terms of quality (the acceptance quality limit a is 0.35%)

We assume a prior distribution over a

Then we look at n items from N, the sample size

The probability of finding zero non-conforming items in our sample given a certain a is our likelihood function

Finding zero non-conforming items will then increase our confidence that the quality is better than a
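The slides do not state the prior actually used; as a sketch, assume a Beta(1, 300) prior over the non-conforming fraction a. Observing zero non-conforming items in a sample of n = 200 is a binomial observation with k = 0, so the beta prior updates conjugately:

```python
from scipy.stats import beta

# Hypothetical Beta(1, 300) prior over the non-conforming fraction a
a_prior = beta(1, 300)
aql = 0.0035                                 # acceptance quality limit, 0.35%
print(f"prior P(a <= AQL):     {a_prior.cdf(aql):.2%}")

# Zero non-conforming items (k = 0) in a sample of n = 200:
# conjugate update Beta(alpha + k, beta + n - k)
n, k = 200, 0
a_post = beta(1 + k, 300 + n - k)
print(f"posterior P(a <= AQL): {a_post.cdf(aql):.2%}")   # confidence increases
```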


Bayesian updating: Example

Lot size (500–10 000 m²): 5 000
AQL: 0.20%
SQL: 0.35%
Sample size: 200
Prior conf.: 80.22%
Post. conf.: 90.14%
Diff: 9.92%
Pr. find one: 33.54%
Sampling start cost: $0.00
Cost / m²: $2.00
Cost / %: $40.32
Tot. cost: $400.00

[Chart: ”Before” (prior) and ”After” (posterior) confidence distributions over 0–30 non-conforming items]

When data is missing

When data is lacking, or existing data is only partially relevant:

Expert elicitation
Direct assessment
Reference games
Pearson-Tukey


Expert elicitation: Probability wheel

Two adjustable sectors and a spinner for visually generating random events of specified probability.

When the expert feels that the probability of ending up in the blue sector is the same as the probability of the event of interest, the probability of the event “equals” the proportion of the blue sector.

Expert elicitation: Indifferent bets approach

Observe how experts behave in gambling situations

Assume that you are indifferent between these two bets:

Bet 1: Win €X if Italy win; lose €Y if France win
Bet 2: Lose €X if Italy win; win €Y if France win

Further assume ”risk-neutrality w.r.t. money”


Expert elicitation: Indifferent bets approach

Since their expected utility is equal, this yields: P(Italy win) = Y/(X + Y)

Why?

Bet 1: Win €X if Italy win; lose €Y if France win
Bet 2: Lose €X if Italy win; win €Y if France win
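Why: with risk neutrality and p = P(Italy win), indifference means the two bets have equal expected value, pX − (1 − p)Y = −pX + (1 − p)Y. Rearranging gives 2pX = 2(1 − p)Y, i.e. pX = (1 − p)Y, and hence p = Y/(X + Y).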

Expert elicitation: Indifferent bets approach

So, letting X = €10 and Y = €15, then P(Italy win) = 15/25 = 3/5

So, based on the behaviour, the elicited probability that Italy will win is 3/5

Bet 1: Win €X if Italy win; lose €Y if France win
Bet 2: Lose €X if Italy win; win €Y if France win


Expert elicitation: The reference lottery approach

Compare the two lotteries. Lottery 1:
If Italy wins you will get 2 weeks paid vacation (Prize A) in a very nice location
Otherwise you’ll get a glass of beer (Prize B)

With lottery 2:
Win Prize A with probability p
Win Prize B with probability 1 − p

Expert elicitation: The reference lottery approach (cont’d)

Adjust p until you are indifferent between the two lotteries

When you are indifferent, p is your subjective probability that Italy will win


Continuous probabilities

In the case of an uncertain but continuous quantity

For example: ”The outcome is a real number between 0 and 1000”, as opposed to ”the outcome will be either A, B, or C” as is the case for finite quantities

Continuous quantities often emerge in decision problems, for example in variables such as demand, sales, etc.

Cumulative assessment

Consider: ”The outcome x of random variable (event node) E is a real number between 0 and 1000”

Cumulative assessments would be
P(x ≤ 200) = 0.1
P(x ≤ 400) = 0.3
P(x ≤ 600) = 0.6
P(x ≤ 800) = 0.95
P(x ≤ 1000) = 1
as opposed to ”the outcome will be either A, B, or C”, as is the case for finite quantities
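One way to turn the five assessed points into a full distribution is to interpolate between them; below is a minimal sketch (linear interpolation, and P(x ≤ 0) = 0, are assumptions, not the only possible choices):

```python
import numpy as np

# Assessed points of the cumulative distribution from the slide
values = np.array([0, 200, 400, 600, 800, 1000])
cum_prob = np.array([0.0, 0.1, 0.3, 0.6, 0.95, 1.0])

def cdf(x):                                  # P(outcome <= x)
    return np.interp(x, values, cum_prob)

def inverse_cdf(p):                          # the p fractile
    return np.interp(p, cum_prob, values)

print(f"P(x <= 500) ~ {cdf(500):.2f}")       # 0.45
print(f"median ~ {inverse_cdf(0.5):.0f}")    # the 0.5 fractile, ~533
```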


Cumulative assessment graph

[Graph: cumulative probability (0–1) on the y-axis against value on the x-axis]

Fractiles

P(x ≤ a0.3) = 0.3
The number a0.3 is the 0.3 fractile of the distribution

The a0.5 is the median of the distribution: ending up with an outcome lower than a0.5 is just as likely as ending up with an outcome greater than a0.5


Quartiles

P(x ≤ a0.25) = 0.25 (first quartile)

P(x ≤ a0.5) = 0.5 (second quartile)

P(x ≤ a0.75) = 0.75 (third quartile)

Extended Pearson-Tukey method

A simple but useful three-point approximation

Suitable when the distribution is assumed to be symmetric

Uses the median and the 0.05 and 0.95 fractiles. Assign these three points specific probabilities:

P(a0.05) = 0.185
P(a0.5) = 0.63
P(a0.95) = 0.185
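Continuing the interpolated-CDF sketch from the cumulative assessment slide, the three-point approximation of the mean is just a weighted sum of the three fractiles:

```python
import numpy as np

values = np.array([0, 200, 400, 600, 800, 1000])
cum_prob = np.array([0.0, 0.1, 0.3, 0.6, 0.95, 1.0])

# Extended Pearson-Tukey: the 0.05, 0.5 and 0.95 fractiles,
# weighted 0.185, 0.63 and 0.185
fractiles = np.interp([0.05, 0.5, 0.95], cum_prob, values)
weights = np.array([0.185, 0.63, 0.185])
print(f"three-point approximation of E[x]: {weights @ fractiles:.0f}")
```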


Bracket medians

Another, fairly simple, technique for approximating a continuous distribution with a discrete one

Not as restricted to symmetric distributions as the Pearson-Tukey method

Consider P(a ≤ x ≤ b); the bracket median m* of this interval is where P(a ≤ x ≤ m*) = P(m* ≤ x ≤ b)

Using bracket medians

Break the continuous probability distribution into several equally likely intervals

Assess the bracket median for each such interval
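As a sketch with the same interpolated CDF as before: for an interval whose endpoints have cumulative probabilities c1 and c2, the bracket median sits at cumulative probability (c1 + c2)/2, so five equally likely intervals give bracket medians at 0.1, 0.3, 0.5, 0.7 and 0.9:

```python
import numpy as np

values = np.array([0, 200, 400, 600, 800, 1000])
cum_prob = np.array([0.0, 0.1, 0.3, 0.6, 0.95, 1.0])

# Bracket medians of five equally likely intervals; each value then
# carries probability 1/5 in the discrete approximation
bracket_medians = np.interp([0.1, 0.3, 0.5, 0.7, 0.9], cum_prob, values)
print(np.round(bracket_medians))
```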


Using bracket medians (cont’d)

[Graph: cumulative probability (0–1) against demand (0–1000)]

What is the bracket median for the interval [100, 500] in this probability distribution?

Scoring rules

A scoring rule measures the accuracy of probabilistic predictions

Judge how well calibrated a probability assessment is

Notation: x = 1 if the event does occur, x = 0 if it does not

q = probability of occurrence reported by the forecaster
p = forecaster’s private probability of occurrence

A proper scoring rule should provide maximum expected score when q = p


Scoring rule (cont’d)

Let the score be xq − q²/2, so that the assessor’s expected payoff is pq − q²/2

Derivative w.r.t. q: p − q. Setting it to 0 gives q = p

Note the second derivative is negative

The assessor is motivated to tell the truth: xq − q²/2 is a proper scoring rule

Brier quadratic scoring rule

1 − (x − q)²

Assessor’s expected payoff: 1 − p(1 − q)² − (1 − p)q²

Derivative w.r.t. q: −2pq + 2p − 2(1 − p)q = 2p − 2q

Setting to 0 gives q = p

So the quadratic scoring rule is also proper


Logarithmic scoring rule

x log q + (1 − x) log(1 − q)

Assessor’s expected payoff: p log q + (1 − p) log(1 − q)

Derivative w.r.t. q: p/q − (1 − p)/(1 − q)

Setting to 0 gives q = p

Note the second derivative is negative

So the logarithmic rule is also proper
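A quick numerical check that all three rules are proper, i.e. that the expected payoff peaks at q = p (p = 0.7 is an arbitrary choice for the check):

```python
import numpy as np

p = 0.7                                  # forecaster's private probability
q = np.linspace(0.01, 0.99, 99)          # reported probability

linear = p * q - q**2 / 2                            # E[xq - q^2/2]
brier = 1 - p * (1 - q)**2 - (1 - p) * q**2          # E[1 - (x - q)^2]
log_rule = p * np.log(q) + (1 - p) * np.log(1 - q)   # E[x log q + (1-x) log(1-q)]

# All three expected payoffs are maximised where the reported q equals p
for name, score in [("xq - q^2/2", linear), ("Brier", brier), ("logarithmic", log_rule)]:
    print(f"{name}: maximised at q = {q[np.argmax(score)]:.2f}")
```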

Scoring rule example

Trial No: x:
Trial 1: 1
Trial 2: 1
Trial 3: 1
Trial 4: 0
Trial 5: 1
Trial 6: 1
Trial 7: 0
Trial 8: 1
Trial 9: 1
Trial 10: 1
Trial 11: 1
Trial 12: 1
Trial 13: 0
Trial 14: 0
Trial 15: 1
Trial 16: 1
Trial 17: 1
Trial 18: 1
Trial 19: 0
Trial 20: 1

[Chart: Series1; y-axis 0–0.3, x-axis 0–1]


Readings

Aven, Chapter 4