Decision Analysis - MIT OpenCourseWare · Instructors: Professor Lucila Ohno-Machado and Professor...

Harvard-MIT Division of Health Sciences and Technology HST.951J: Medical Decision Support, Fall 2005
Instructors: Professor Lucila Ohno-Machado and Professor Staal Vinterbo

6.873/HST.951 Medical Decision Support Fall 2005

Decision Analysis (part 2 of 2)

Review Linear Regression

Lucila Ohno-Machado

Outline

• Homework clarification
• Sensitivity, specificity, prevalence
• Cost-effectiveness analysis
• Discounting cost and utilities
• Review of Linear Regression

2 x 2 table (contingency table)

          PPD+   PPD-   Total
TB           8      2      10
no TB        3     87      90
Total       11     89     100

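Prevalence of TB = 10/100 = 0.10
Sensitivity of test = 8/10 = 0.80 (TB cases that test PPD+)
Specificity of test = 87/90 ≈ 0.97 (non-TB cases that test PPD-)
Positive predictive value = 8/11 ≈ 0.73
Negative predictive value = 87/89 ≈ 0.98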

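[Figure: overlapping distributions of test results for the normal and disease populations on an axis from 0 to 10, with the decision threshold at 3mm. The threshold splits the normal curve into True Negative (TN) and FP regions, and the disease curve into FN and True Positive (TP) regions.]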

Sens = TP/(TP+FN)

Spec = TN/(TN+FP)

PPV = TP/(TP+FP)

NPV = TN/(TN+FN)

Accuracy = (TN+TP)/(TN+TP+FN+FP)

                 Truth: nl   Truth: D
Test “nl” (-)       TN          FN
Test “D”  (+)       FP          TP

Sens = TP/(TP+FN) = 40/50 = .8

Spec = TN/(TN+FP) = 45/50 = .9

PPV = TP/(TP+FP) = 40/45 = .89

NPV = TN/(TN+FN) = 45/55 = .82

Accuracy = (TN+TP)/total = 85/100 = .85

                 Truth: nl   Truth: D   Total
Test “nl”           45          10        55
Test “D”             5          40        45
Total               50          50       100
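The worked example above can be reproduced with a few lines of Python. This is only a sketch: the function name `diagnostic_metrics` and its argument names are assumptions for illustration, and the counts are the ones from the table (TP = 40, FN = 10, FP = 5, TN = 45).

```python
def diagnostic_metrics(tp, fn, fp, tn):
    """Summaries of a 2 x 2 table, using the definitions given on the slides."""
    total = tp + fn + fp + tn
    return {
        "sensitivity": tp / (tp + fn),   # diseased cases called "D"
        "specificity": tn / (tn + fp),   # normal cases called "nl"
        "ppv": tp / (tp + fp),           # calls of "D" that are correct
        "npv": tn / (tn + fn),           # calls of "nl" that are correct
        "accuracy": (tp + tn) / total,
    }

# Counts from the worked example
m = diagnostic_metrics(tp=40, fn=10, fp=5, tn=45)
for name, value in m.items():
    print(f"{name}: {value:.2f}")
```

Note that sensitivity and specificity are computed within the true-state columns, while PPV and NPV are computed within the test-result rows, so only the latter two change with prevalence.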

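[Figure: distributions of test results for nl and disease on a 0.0 to 1.0 scale, with the threshold at 0.4. Regions: TN, FP, TP (no FN).]

                 Truth: nl   Truth: D   Total
Test “nl”           40           0        40
Test “D”            10          50        60
Total               50          50       100

Sensitivity = 50/50 = 1
Specificity = 40/50 = 0.8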

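[Figure: distributions of test results for nl and disease on a 0.0 to 1.0 scale, with the threshold at 0.6. Regions: TN, FN, FP, TP.]

                 Truth: nl   Truth: D   Total
Test “nl”           45          10        55
Test “D”             5          40        45
Total               50          50       100

Sensitivity = 40/50 = .8
Specificity = 45/50 = .9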

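[Figure: distributions of test results for nl and disease on a 0.0 to 1.0 scale, with the threshold at 0.7. Regions: TN, FN, TP (no FP).]

                 Truth: nl   Truth: D   Total
Test “nl”           50          20        70
Test “D”             0          30        30
Total               50          50       100

Sensitivity = 30/50 = .6
Specificity = 50/50 = 1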
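The three thresholds above illustrate the trade-off between sensitivity and specificity: raising the threshold lowers sensitivity and raises specificity. A small sketch that tabulates this from the counts on the slides (the variable names are assumptions chosen for illustration):

```python
# (threshold, TP, FN, FP, TN) as read from the three tables on the slides
tables = [
    (0.4, 50, 0, 10, 40),
    (0.6, 40, 10, 5, 45),
    (0.7, 30, 20, 0, 50),
]

def sens_spec(tp, fn, fp, tn):
    """Return (sensitivity, specificity) for one 2 x 2 table."""
    return tp / (tp + fn), tn / (tn + fp)

results = {t: sens_spec(tp, fn, fp, tn) for t, tp, fn, fp, tn in tables}
for t, (sens, spec) in results.items():
    print(f"threshold {t}: sensitivity {sens:.1f}, specificity {spec:.1f}")
```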

Cost-effectiveness analysis

• Comparison of costs with health effects
  – Cost per Down syndrome case averted
  – Cost per year of life saved
• Perspectives (society, insurer, patient)
• Comparators
  – Comparison with doing nothing
  – Comparison with “standard of care”

Discounting costs

• It is better to spend $10 next year than today (its present value will be only $9.52, assuming a 5% rate)
• Even better to spend it 2 years from now ($9.07)
• For cost-effectiveness analysis spanning multiple years, recommended rate is usually 5%
• C = C0 + C1/(1+r)^1 + C2/(1+r)^2 + ...
• C0 are costs at time 0, C1 are costs one year out, and r is the discount rate
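As a quick check of the figures quoted above, the sketch below discounts a stream of yearly costs by dividing the cost in year t by (1 + r)^t, which is what reproduces $9.52 and $9.07 at a 5% rate. The function name `present_value` is an assumption for illustration.

```python
def present_value(costs, rate=0.05):
    """Discounted total of yearly costs; costs[t] is incurred t years from now."""
    return sum(c / (1 + rate) ** t for t, c in enumerate(costs))

print(round(present_value([0, 10]), 2))      # $10 spent next year -> 9.52
print(round(present_value([0, 0, 10]), 2))   # $10 spent in two years -> 9.07
```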

Discounting utilities

• Value for full mobility is 10 today (is it only 9.52 next year?)

• Should the discount rate be the same as for costs?

• If smaller, then it would always be better to wait one more year…

Levels of Evidence in Evidence-Based Medicine

US Task Force

• Level 1: at least 1 randomized controlled trial (RCT)
• Level 2-I: controlled trials (CT)
• Level 2-II: cohort or case-control study
• Level 2-III: multiple time series with or without the intervention
• Level 3: expert opinions

Examples

• Cost per year of life saved, life years per US$1M
• Bypass surgery, middle-aged man
  – $11k/year, 93
• CCU for low risk patients
  – $435k/year, 2

Importance of good stratification

• Bypass surgery (life years per US$1M)
  – Left main disease: 93
  – One vessel disease: 12
• CCU for chest pain (life years per US$1M)
  – 5% risk of MI: 2
  – 20% risk of MI: 10

Intro to Modeling

Univariate Linear Model

• Y is what we want to predict (dependent variable)
• X are the predictors (independent variables)
• Y = f(X), where f is a linear function

\[ y_1 = 1\beta_0 + x_1\beta_1 \]
\[ y_2 = 1\beta_0 + x_2\beta_1 \]
…

[Figure: plot of y against x.]

Univariate Linear Model

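• \(\beta_0\) is intercept
• \(\beta_1\) is slope

\[ y_1 = 1\beta_0 + x_1\beta_1 \]
\[ y_2 = 1\beta_0 + x_2\beta_1 \]
…

[Figure: line in the x-y plane, with the intercept \(\beta_0\) and the slope \(\beta_1\) marked.]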

Multivariate Model

• Simple model: structure and parameters

• 3 predictors, 4 parameters β

• one of the parameters (β0) is a constant

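\[ y_1 = 1\beta_0 + x_{11}\beta_1 + x_{12}\beta_2 + x_{13}\beta_3 \]
\[ y_2 = 1\beta_0 + x_{21}\beta_1 + x_{22}\beta_2 + x_{23}\beta_3 \]

\[
X = \begin{bmatrix} 1 & 1 \\ x_{11} & x_{21} \\ x_{12} & x_{22} \\ x_{13} & x_{23} \end{bmatrix},
\qquad
\beta = \begin{bmatrix} \beta_0 \\ \beta_1 \\ \beta_2 \\ \beta_3 \end{bmatrix}
\]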

Notation and Terminology

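• \(x_i\) is vector of j inputs, covariates, independent variables, or predictors for case i (i.e., what we know for all cases)

\[
x_1 = \begin{bmatrix} \text{age} \\ \text{salt} \\ \text{smoke} \end{bmatrix}
    = \begin{bmatrix} 30 \\ 10 \\ 1 \end{bmatrix}
    = \begin{bmatrix} x_{11} \\ x_{12} \\ x_{13} \end{bmatrix}
\]

• X is matrix of j x n, made of \(x_i\) column vectors (input data for each case)

\[
X = \begin{bmatrix} x_{11} & x_{21} \\ x_{12} & x_{22} \\ x_{13} & x_{23} \end{bmatrix},
\qquad
X^T = \begin{bmatrix} x_{11} & x_{12} & x_{13} \\ x_{21} & x_{22} & x_{23} \end{bmatrix}
\]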

Prediction

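• \(y_i\) is scalar: output, dependent variable (i.e., what we want to predict), e.g., mean blood pressure

\[
\begin{bmatrix} \text{pred}_{\text{pat1}} \\ \text{pred}_{\text{pat2}} \end{bmatrix}
= \begin{bmatrix} 100 \\ 98 \end{bmatrix}
= \begin{bmatrix} y_1 \\ y_2 \end{bmatrix}
\]

• Y is vector of \(y_i\)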

Multivariate Linear Model

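\[ y_1 = 1\beta_0 + x_{11}\beta_1 + x_{12}\beta_2 + x_{13}\beta_3 \]
\[ y_2 = 1\beta_0 + x_{21}\beta_1 + x_{22}\beta_2 + x_{23}\beta_3 \]

\(Y = X^T\beta\), where each \(x_i\) includes a term for 1 (constant) (\(x_{10} = 1\), \(x_{20} = 1\), etc.) to be multiplied by the intercept \(\beta_0\)

\[
\begin{bmatrix} y_1 \\ y_2 \end{bmatrix}
=
\begin{bmatrix} x_{10} & x_{11} & x_{12} & x_{13} \\ x_{20} & x_{21} & x_{22} & x_{23} \end{bmatrix}
\begin{bmatrix} \beta_0 \\ \beta_1 \\ \beta_2 \\ \beta_3 \end{bmatrix}
\]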
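The matrix form above amounts to prepending a constant 1 to each case and taking a dot product with the parameter vector. A minimal sketch in plain Python follows; the function name `predict`, the coefficient values, and the second case are hypothetical and chosen only for illustration (they are not from the lecture).

```python
def predict(cases, beta):
    """Linear predictions; each case is [x_i1, ..., x_ij], beta is [b0, b1, ..., bj]."""
    preds = []
    for x in cases:
        row = [1.0] + list(x)   # prepend the constant term x_i0 = 1
        preds.append(sum(b * v for b, v in zip(beta, row)))
    return preds

# Hypothetical parameters: intercept, then weights for age, salt, smoke
beta = [80.0, 0.5, 0.4, 1.0]
# Two cases as (age, salt, smoke); the second case is made up
cases = [[30, 10, 1], [20, 15, 0]]
print(predict(cases, beta))
```

Each prediction is one row of \(X^T\) multiplied by \(\beta\), so the output list plays the role of the vector Y.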

Regression and Classification

Regression: continuous outcome

• E.g., blood pressure

\( Y = f(X) \)

Classification: categorical outcome

• E.g., death (binary)

\( \hat{G} = G(X) \)

Loss function

• Y and X random variables

• f(x) is the model

• L(Y, f(X)) is the loss function (penalty for being wrong)

• It is a function of how much to pay for discrepancies between Y (real observation) and f(X) (estimated value for an observation)

• In several cases, we use only the error and leave the cost for the decision analysis model

Regression Problems

Let’s concentrate on simple errors for now:

• Expected Prediction Error (EPE), based on one of:
  – squared error \( [Y - f(X)]^2 \)
  – absolute error \( |Y - f(X)| \)

• These two error functions imply that errors in both directions are considered the same way.
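The symmetry of the two error functions can be seen in a short sketch. The function names and the numbers (a true value of 100 and predictions of 95 and 105) are assumptions made up for illustration.

```python
def squared_error(y, y_hat):
    """Squared difference between observation and prediction."""
    return (y - y_hat) ** 2

def absolute_error(y, y_hat):
    """Absolute difference between observation and prediction."""
    return abs(y - y_hat)

# Under- and over-predicting by the same amount gives the same penalty
for y_hat in (95, 105):
    print(y_hat, squared_error(100, y_hat), absolute_error(100, y_hat))
```

If over- and under-prediction had different consequences, an asymmetric loss would be needed, which is the part the slides leave to the decision analysis model.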

Univariate Linear Model

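\[ y_1 = 1\beta_0 + x_1\beta_1 \]
\[ y_2 = 1\beta_0 + x_2\beta_1 \]

\[ y_1 = 1\beta_0 + x_1\beta_1 + \text{error}_1 \]
\[ y_2 = 1\beta_0 + x_2\beta_1 + \text{error}_2 \]

[Figure: plot of y against x.]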

Squared Errors

\[ SSE = \sum_{i=1}^{n} (Y_i - \hat{Y}_i)^2 \]

\[ \hat{Y} = f(X) \]

\[ \hat{y}_i = \beta_0 + \beta_1 x_i \]

Linear regression

Conditioning on x

• x is a certain value

\[ \hat{y}(x) = \frac{1}{k} \sum_{x_i = x} y_i \]

\[ f(x) = \text{Ave}(y_i \mid x_i = x) \]

• Expectation is approximated by average

• Conditioning is on x

k-Nearest Neighbors

• N is neighborhood

\[ \hat{y}(x) = \frac{1}{k} \sum_{x_i \in N_k(x)} y_i \]

\[ f(x) = \text{Ave}(y_i \mid x_i \in N_k(x)) \]

• Expectation is approximated by average

• Conditioning is on neighborhood
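A minimal sketch of the k-nearest-neighbor average for a single predictor is given below. The function name `knn_predict`, the use of absolute distance on x, and the toy data are assumptions for illustration, not part of the lecture.

```python
def knn_predict(x0, xs, ys, k=3):
    """Average the y values of the k training points whose x is closest to x0."""
    nearest = sorted(zip(xs, ys), key=lambda pair: abs(pair[0] - x0))[:k]
    return sum(y for _, y in nearest) / k

# Hypothetical toy data
xs = [1, 2, 3, 4, 5, 6]
ys = [2, 4, 6, 8, 10, 12]
print(knn_predict(3.1, xs, ys, k=3))   # averages the y values at x = 3, 4, 2
```

With k = 1 the prediction simply copies the closest observation; larger k gives a smoother but more biased estimate.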

Nearest Neighbors

• N is continuous neighborhood

\[ \hat{Y}(x) = \frac{1}{n} \sum_{i} w_i y_i \]

\[ f(x) = \text{WeightedAve}(y_i \mid x) \]

• Expectation is approximated by weighted average

Minimize Sum of Squared Errors

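\[ \hat{y}_i = \beta_0 + \beta_1 x_i \]

\[
\begin{aligned}
SSE &= \sum_{i=1}^{n} (y - \hat{y})^2 = \sum_{i=1}^{n} [y - (\beta_0 + \beta_1 x)]^2 \\
    &= \sum_{i=1}^{n} \left( y^2 - 2y(\beta_0 + \beta_1 x) + (\beta_0 + \beta_1 x)^2 \right) \\
    &= \sum_{i=1}^{n} \left( y^2 - 2y\beta_0 - 2y\beta_1 x + \beta_0^2 + 2\beta_0\beta_1 x + \beta_1^2 x^2 \right)
\end{aligned}
\]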

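(derivative wrt \(\beta_0\)) = 0

\[ SSE = \sum_{i=1}^{n} \left( y^2 - 2y\beta_0 - 2y\beta_1 x + \beta_0^2 + 2\beta_0\beta_1 x + \beta_1^2 x^2 \right) \]

\[ \frac{\partial SSE}{\partial \beta_0} = 2\sum_{i=1}^{n} (-y + \beta_0 + \beta_1 x) = 0 \]

\[ \beta_0 n + \beta_1 \sum x = \sum y \qquad \text{(Normal equation 1)} \]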

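(derivative wrt \(\beta_1\)) = 0

\[ SSE = \sum_{i=1}^{n} \left( y^2 - 2y\beta_0 - 2y\beta_1 x + \beta_0^2 + 2\beta_0\beta_1 x + \beta_1^2 x^2 \right) \]

\[ \frac{\partial SSE}{\partial \beta_1} = 2\sum_{i=1}^{n} (-xy + \beta_0 x + \beta_1 x^2) = 0 \]

\[ \beta_0 \sum x + \beta_1 \sum x^2 = \sum xy \qquad \text{(Normal equation 2)} \]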

Solve system of normal equations

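\[ \beta_0 n + \beta_1 \sum x = \sum y \qquad \text{(Normal equation 1)} \]

\[ \beta_0 \sum x + \beta_1 \sum x^2 = \sum xy \qquad \text{(Normal equation 2)} \]

\[ \beta_0 = \bar{y} - \beta_1 \bar{x} \]

\[ \beta_1 = \frac{\sum (x - \bar{x})(y - \bar{y})}{\sum (x - \bar{x})^2} \]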
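The closed-form solution can be checked numerically. The sketch below computes the slope and intercept from the centered sums and then verifies that both normal equations hold; the function name `fit_line` and the toy data (points lying exactly on y = 1 + 2x) are assumptions for illustration.

```python
def fit_line(xs, ys):
    """Least-squares intercept and slope for a single predictor."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    beta1 = sxy / sxx
    beta0 = y_bar - beta1 * x_bar
    return beta0, beta1

# Hypothetical data on the line y = 1 + 2x
xs = [1, 2, 3, 4]
ys = [3, 5, 7, 9]
beta0, beta1 = fit_line(xs, ys)
print(beta0, beta1)

# Normal equation 1: beta0 * n + beta1 * sum(x) = sum(y)
lhs1 = beta0 * len(xs) + beta1 * sum(xs)
# Normal equation 2: beta0 * sum(x) + beta1 * sum(x^2) = sum(x * y)
lhs2 = beta0 * sum(xs) + beta1 * sum(x * x for x in xs)
print(lhs1, sum(ys))
print(lhs2, sum(x * y for x, y in zip(xs, ys)))
```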

Limitations of linear regression

• Assumes conditional probability p(Y|X) is normal

• Assumes equal variance in every X

• It’s linear ☺

(but we can always use interaction or transformed terms)

Linear Regression for Classification?

\( y = p \)

\[ y_i = \beta_0 + \beta_1 x_i \]

[Figure: fitted line plotted against x, with the level y = 1 marked.]

Linear Probability Model

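\[
\hat{y}_i =
\begin{cases}
\beta_0 + \beta_1 x_i & \text{for } 0 \le \beta_0 + \beta_1 x_i \le 1 \\
0 & \text{for } \beta_0 + \beta_1 x_i \le 0 \\
1 & \text{for } \beta_0 + \beta_1 x_i \ge 1
\end{cases}
\]

[Figure: the clipped line plotted against x, flat at 0 and at y = 1.]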
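The linear probability model is the linear predictor clipped to the interval from 0 to 1. A small sketch follows; the function name and the coefficient values are hypothetical and chosen only to show all three branches.

```python
def linear_probability(x, beta0, beta1):
    """Linear predictor beta0 + beta1 * x, clipped to the interval [0, 1]."""
    return min(1.0, max(0.0, beta0 + beta1 * x))

# Hypothetical coefficients
b0, b1 = -0.5, 0.25
for x in (0, 4, 10):
    print(x, linear_probability(x, b0, b1))
```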

Logit Model

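\[ p_i = \frac{1}{1 + e^{-(\beta_0 + \beta_1 x_i)}} \]

\[ p_i = \frac{e^{\beta_0 + \beta_1 x_i}}{e^{\beta_0 + \beta_1 x_i} + 1} \]

\[ \log\left[\frac{p_i}{1 - p_i}\right] = \beta_0 + \beta_1 x_i \]

\[ \log\left[\frac{p_i}{1 - p_i}\right] = \sum_{j} \beta_j x_{ij} \]

[Figure: two plots against x, the logit (a straight line) and the probability p (an S-shaped curve approaching p = 1).]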
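The logit model maps the linear predictor to a probability between 0 and 1, and the log odds recover the linear predictor. A minimal sketch, with function names and coefficient values that are assumptions for illustration:

```python
import math

def logistic(x, beta0, beta1):
    """Probability p = 1 / (1 + exp(-(beta0 + beta1 * x)))."""
    return 1.0 / (1.0 + math.exp(-(beta0 + beta1 * x)))

def logit(p):
    """Log odds, log(p / (1 - p))."""
    return math.log(p / (1.0 - p))

# Hypothetical coefficients
b0, b1 = -1.0, 0.5
for x in (-4, 2, 8):
    p = logistic(x, b0, b1)
    print(x, round(p, 3), round(logit(p), 3))   # the logit equals b0 + b1 * x
```

Unlike the linear probability model, the output never needs clipping: it approaches 0 and 1 only in the limit.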

Two dimensions

[Figure: plot with axes x1 and x2.]
