Logistic Regression - let.rug.nl

Inf. Stats

Logistic Regression

Idea: Predict categorical variable using regression

Examples
- surgery survival dependent on age, length of surgery, ...
- whether purchase occurs dependent on age, income, web-site characteristics, ...
- whether speech errors occur as alcohol level increases
- when linguistic rules apply (final [t] in Dutch) dependent on speed of utterance, stress, social group, ...

Very popular, especially in sociolinguistics.


Regression Techniques

Attractive:
- allow prediction of one variable value based on one or more others
- allow an estimation of the importance of various independent factors (cf. $r^2$)

Outline Logistic Regression

Idea: Predict categorical variable using regression

- core task: analyze dependency of categorical variable on others using regression
- problem: translating regression techniques to categorical domain
- key step: predict chance of categorical variable, transforming categorical to numeric variable
- note: independent variables may be numeric or categorical, as in regression in general, simple or multiple

Chance as Dependent Variable

Idea: Predict chance of categorical variable as dependent variable using regression

- real chances $p$ are positive numbers $\le 1$
- problem: how to keep predicted values in correct bounds
- solution: don't use chances directly, but rather a more complicated transformation

Logit(p)

$\mathrm{logit}(p) = \ln\frac{p}{1-p}$

[Figure: plot of logit(x) for x from 0 to 1; values from -5 to 5.]

Logit(p) vs. Logistic

- use of logit solves problems of bounds: we predict logit values (cf. chances $p \le 1$)
- logit is easily interpretable as "odds": "the odds of Real against Ajax are 4 to 1" means the probability is $0.8$, and $\mathrm{logit}(0.8) = \ln\frac{0.8}{0.2} = \ln 4 \approx 1.39$
- why the name 'logistic'?
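The odds reading of the logit can be checked numerically. The following is a minimal Python sketch; the function names `logit` and `odds_to_probability` are illustrative choices and do not come from the slides.

```python
import math


def logit(p: float) -> float:
    """Log-odds ln(p / (1 - p)) of a probability p with 0 < p < 1."""
    if not 0.0 < p < 1.0:
        raise ValueError("p must lie strictly between 0 and 1")
    return math.log(p / (1.0 - p))


def odds_to_probability(in_favour: float, against: float) -> float:
    """Convert odds of 'in_favour to against' into a probability."""
    return in_favour / (in_favour + against)


# Odds of 4 to 1 correspond to a probability of 4/5.
p_real = odds_to_probability(4, 1)
logit_real = logit(p_real)  # equals ln(4) up to rounding error
print(p_real, logit_real)
```

Even odds (probability 0.5) give a logit of 0; probabilities above 0.5 give positive logits and probabilities below 0.5 give negative ones.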


Why 'logistic'?

$\mathrm{logistic}(x) = \frac{e^x}{1 + e^x}$

[Figure: plot of logistic(x) for x from -10 to 10; values from 0 to 1.]

Similarly constrains predicted value $p$: $0 < p < 1$

Logistic vs. Logit Functions

$$\begin{aligned}
\mathrm{logit}(p) &= \ln\frac{p}{1-p} \\
e^{\mathrm{logit}(p)} &= \frac{p}{1-p} \\
(1-p)\, e^{\mathrm{logit}(p)} &= p \\
e^{\mathrm{logit}(p)} &= p \left(1 + e^{\mathrm{logit}(p)}\right) \\
p &= \frac{e^{\mathrm{logit}(p)}}{1 + e^{\mathrm{logit}(p)}} = \mathrm{logistic}(\mathrm{logit}(p))
\end{aligned}$$
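That the logistic function undoes the logit can be verified with a short round trip. This is a sketch in plain Python; the helper names and the test probabilities are arbitrary choices for illustration.

```python
import math


def logit(p: float) -> float:
    """Log-odds of a probability p with 0 < p < 1."""
    return math.log(p / (1.0 - p))


def logistic(x: float) -> float:
    """Inverse of logit: e^x / (1 + e^x), written as 1 / (1 + e^-x)."""
    return 1.0 / (1.0 + math.exp(-x))


probabilities = (0.01, 0.2, 0.5, 0.8, 0.99)
# Applying logistic after logit should return the original probability.
round_trip = [logistic(logit(p)) for p in probabilities]

# Whatever real value goes in, the logistic output stays between 0 and 1.
extremes = [logistic(x) for x in (-10.0, 0.0, 10.0)]
print(round_trip, extremes)
```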


Strategy: Predict Logit Values

- $\mathrm{logit}(p) = \beta_0 + \beta_1 x$, where $x$ is the independent variable
- try to find optimal $\beta_0, \beta_1$ given data
- note that we're seeking a nonlinear relationship

Example: Labov's NYC /r/ study

William Labov examined variant pronunciations of syllable-final /r/ in American English ([r] vs [ə]). New York used to be like Boston, where final /r/ is [ə], but it started changing in the 1950's and 1960's. Labov hypothesized a social basis for the change.

[Figure: /r/ allophones by store, ordered from high social stratum (Saks) through Macy's to low social stratum (S. Klein). Mixed [r, ə]: 32, 31, 17; all cons. [r]: 30, 20, 4; N = 68, 125, 71.]

Data on NYC /r/

                 Pronunciation of /r/
Social Status    cons. ([r])   vocalic ([ə])   mixed
high             30            6               32
medium           20            74              31
low              4             50              17

What stat. test is needed to ask whether soc. status influences pronunciation of /r/?

Analyzing Social Influence on /r/

What stat. test is needed to ask whether soc. status influences pronunciation of /r/?

- $\chi^2$ test of independence (see that section): is one nominal variable dependent on another?
- we exercise logistic regression for two reasons:
  - to measure the degree of dependence
  - to combine with questions of further dependence
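For comparison, the $\chi^2$ statistic for the full 3 x 3 table of social status by pronunciation can be computed by hand. The sketch below uses only the counts given in the data table; the variable names are illustrative, and the critical value 9.488 (df = 4, significance level 0.05) is taken from standard $\chi^2$ tables rather than from the slides.

```python
# Rows: high, medium, low social status.
# Columns: consonantal [r], vocalic, mixed.
observed = [
    [30, 6, 32],
    [20, 74, 31],
    [4, 50, 17],
]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
n = sum(row_totals)

chi_square = 0.0
for i, row in enumerate(observed):
    for j, count in enumerate(row):
        # Expected count if status and pronunciation were independent.
        expected = row_totals[i] * col_totals[j] / n
        chi_square += (count - expected) ** 2 / expected

df = (len(observed) - 1) * (len(observed[0]) - 1)
CRITICAL_05_DF4 = 9.488  # standard table value for df = 4, alpha = 0.05
print(chi_square, df, chi_square > CRITICAL_05_DF4)
```

A statistic above the critical value indicates dependence, but unlike logistic regression it does not say how strongly each status level shifts the odds of [r].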


Simplifying the Question

Eliminate the "mixed-r reports":

                 Pronunciation of /r/
Social Status    cons. ([r])   vocalic ([ə])   mixed
high             30            6               32
medium           20            74              31
low              4             50              17

- now we're predicting a dichotomous (two-valued) variable (instead of a polytomous one). Note that the predictor is still polytomous.
- this step would be questionable if the category being eliminated dominated

Coding

- we code /r/ as '0, vocalic' and '1, consonantal'
- remember the "weight by frequency" command
- SPSS offers several alternatives for the Independent Variable (Status); "dummy" coding (SPSS: "indicator") is recommended:

Status   explanation       dummy-1   dummy-2
1        (high, Saks)      1         0
2        (mid, Macy's)     0         1
3        (low, S. Klein)   0         0
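The dummy coding and the effect of weighting by frequency can be sketched in a few lines of Python. This is an illustration only: the function `dummy_code` and the tuple layout are assumptions made for the example, not SPSS behaviour.

```python
def dummy_code(status):
    """Indicator coding with status 3 (low, S. Klein) as the reference level."""
    return (1 if status == 1 else 0, 1 if status == 2 else 0)


# (status, r_code, frequency); r_code is 1 for consonantal, 0 for vocalic.
cells = [
    (1, 1, 30), (1, 0, 6),
    (2, 1, 20), (2, 0, 74),
    (3, 1, 4), (3, 0, 50),
]

# Weighting by frequency is equivalent to repeating each cell
# once per observation: (dummy-1, dummy-2, r_code).
rows = [
    (*dummy_code(status), r_code)
    for status, r_code, freq in cells
    for _ in range(freq)
]
print(len(rows), rows[0], rows[-1])
```

Two dummy variables suffice for three status levels because the reference level is identified by both dummies being 0.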


SPSS Output: Coding

    Dependent Variable Encoding:

    Original   Internal
    Value      Value
    0          0          [vocalic pronunciation]
    1          1          [consonantal "]

                                Parameter
                Value   Freq    Coding
                                (1)      (2)
    SOC_STAT    1       2       1.000    .000
                2       2       .000     1.000
                3       2       .000     .000

SPSS Output

    -------------------- Variables in the Equation -----------------

    Variable        B       S.E.    Wald     df   Sig    R     Exp(B)
    SOC_STAT                        43.90    2    .000   .42
     SOC_STAT(1)    4.13    .69     36.38    1    .000   .39   62.49
     SOC_STAT(2)    1.22    .58     4.44     1    .035   .10   3.38
    Constant        -2.53   .52     23.63    1    .000

Recall that we're finding the parameters to the following equation:

$\mathrm{logit}(p) = \beta_0 + \beta_1 x_1 + \beta_2 x_2 = -2.53 + 4.13 x_1 + 1.22 x_2$
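Because the only predictor is a fully dummy-coded categorical variable, the maximum-likelihood fit reproduces the observed proportion in each status group, so the coefficients can be recovered directly from the cell counts without an iterative fit. The sketch below relies on that property; it is a cross-check on the printed output, not a description of how SPSS estimates the model, and all names are illustrative.

```python
import math

# (consonantal, vocalic) counts per store, mixed reports excluded.
counts = {"high": (30, 6), "medium": (20, 74), "low": (4, 50)}


def log_odds(cons, vocalic):
    """Observed log-odds of consonantal [r] in one group."""
    return math.log(cons / vocalic)


# The reference level (low, S. Klein) gives the constant.
b0 = log_odds(*counts["low"])
# Each dummy coefficient is the difference in log-odds from the reference.
b1 = log_odds(*counts["high"]) - b0
b2 = log_odds(*counts["medium"]) - b0

# Exp(B) is the odds ratio relative to the reference level.
exp_b1 = math.exp(b1)
exp_b2 = math.exp(b2)
print(round(b0, 3), round(b1, 3), round(b2, 3), round(exp_b1, 2), round(exp_b2, 2))
```

The results agree with the SPSS columns B and Exp(B) up to rounding in the printed output.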


Interpreting SPSS Output

$\mathrm{logit}(p) = -2.53 + 4.13 x_1 + 1.22 x_2$

- Saks ($x_1 = 1, x_2 = 0$): $\mathrm{logit}(p) = -2.53 + 4.13 = 1.60$
- Macy's ($x_1 = 0, x_2 = 1$): $\mathrm{logit}(p) = -2.53 + 1.22 = -1.31$
- S. Klein ($x_1 = x_2 = 0$): $\mathrm{logit}(p) = -2.53$

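Checking Interpretation of Output

$\frac{p}{1-p} = e^{\mathrm{logit}(p)}$, so $p = \frac{e^{\mathrm{logit}(p)}}{1 + e^{\mathrm{logit}(p)}}$

- Saks: $e^{1.60} \approx 4.95$, $p \approx 0.83$
- Macy's: $e^{-1.31} \approx 0.27$, $p \approx 0.21$
- S. Klein: $e^{-2.53} \approx 0.08$, $p \approx 0.07$

These indeed match the data to be predicted.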
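The check can be repeated in code by pushing the printed coefficients through the logistic function and comparing with the observed proportions of [r]. A minimal sketch, assuming the coefficients as rounded in the SPSS output; the dictionary layout is an illustrative choice.

```python
import math


def logistic(x):
    """Map a logit back to a probability."""
    return 1.0 / (1.0 + math.exp(-x))


B0, B1, B2 = -2.53, 4.13, 1.22  # Constant, SOC_STAT(1), SOC_STAT(2)

# (dummy-1, dummy-2) and (consonantal, vocalic) counts per store.
stores = {
    "Saks": ((1, 0), (30, 6)),
    "Macy's": ((0, 1), (20, 74)),
    "S. Klein": ((0, 0), (4, 50)),
}

predicted = {}
observed = {}
for name, ((x1, x2), (cons, vocalic)) in stores.items():
    predicted[name] = logistic(B0 + B1 * x1 + B2 * x2)
    observed[name] = cons / (cons + vocalic)
    print(name, round(predicted[name], 3), round(observed[name], 3))
```

The small remaining differences come from the coefficients being rounded to two decimals.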


SPSS Output

    -------------------- Variables in the Equation -----------------

    Variable        B       S.E.    Wald     df   Sig    R     Exp(B)
    SOC_STAT                        43.90    2    .000   .42
     SOC_STAT(1)    4.13    .69     36.38    1    .000   .39   62.49
     SOC_STAT(2)    1.22    .58     4.44     1    .035   .10   3.38
    Constant        -2.53   .52     23.63    1    .000

Note that:
- all variables are significant
- a kind of $R$ ($\approx r$) is being estimated, without the certainty that $R^2$ ($\approx r^2$) indicates explained variance
- Exp(B) $= e^B$

Understanding SPSS Output

    Classification Table for UITSPRK
    The Cut Value is .50

                       Predicted
                     0         1       Percent Correct
    Observed     +-------+-------+
       0    0    I  124  I   6   I     95.38%
                 +-------+-------+
       1    1    I  24   I   30  I     55.56%
                 +-------+-------+
                               Overall 83.70%

Predictions, Correctness

                          Predicted
                     [@]            [r]     Percent Correct
                     Macy's/Klein   Saks
    Observed     +--------------+-------+
       0   [@]   I  124         I   6   I   95.38%
                 +--------------+-------+
       1   [r]   I  24          I   30  I   55.56%
                 +--------------+-------+
                                    Overall 83.70%

This shows the prediction of the variable coded for status.

Note that we're predicting that Saks's pronunciations should be all [r] and the others all [@] (schwa).
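The classification table follows from the cell counts once each store is assigned its predicted category at the .50 cut value. The sketch below assumes that the fitted probability for each store equals its observed proportion of [r], which holds for this dummy-coded model; variable names are illustrative.

```python
# (consonantal, vocalic) counts per store, mixed reports excluded.
counts = {"Saks": (30, 6), "Macy's": (20, 74), "S. Klein": (4, 50)}
CUT = 0.50

# table[(observed, predicted)] -> number of tokens; 1 = [r], 0 = vocalic.
table = {(0, 0): 0, (0, 1): 0, (1, 0): 0, (1, 1): 0}
for cons, vocalic in counts.values():
    fitted = cons / (cons + vocalic)
    predicted = 1 if fitted > CUT else 0
    table[(1, predicted)] += cons
    table[(0, predicted)] += vocalic

correct_0 = 100 * table[(0, 0)] / (table[(0, 0)] + table[(0, 1)])
correct_1 = 100 * table[(1, 1)] / (table[(1, 0)] + table[(1, 1)])
overall = 100 * (table[(0, 0)] + table[(1, 1)]) / sum(table.values())
print(table, round(correct_0, 2), round(correct_1, 2), round(overall, 2))
```

Only Saks has a fitted probability above .50, which is why every Saks token is predicted as [r] and every other token as vocalic.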


Log Likelihood

Variance in the binomial case is $p(1-p)$, and variance of the number of observations is $np(1-p)$, where the positive value [r] was seen $k$ times and the null value $n-k$ times. From this we derive the log likelihood $L$:

$L = \ln\left(p^k (1-p)^{n-k}\right) = k \ln p + (n-k) \ln(1-p)$

We measure the quality of the model using log likelihood, estimating the parameters to obtain the optimal value.

It also turns out that $-2L$ has a $\chi^2$ distribution with [...] degrees of freedom.

Log Probabilities

[Figure: plot of ln(x) for x from 0 to 1; values from -5 to 0.]

Very likely events ($p \approx 1$) contribute little to log likelihoods.

Log Likelihood

We measure the quality of the model using log likelihood, estimating the parameters to obtain the optimal value. We obtain the optimal value by using the overall frequencies as a best guess:

                 Pronunciation of /r/
Social Status    cons. ([r])   vocalic ([ə])
high             30            6
medium           20            74
low              4             50
totals           54            130
best guess       0.293         0.707

Simplest Model: No Social Class

We measure the quality of the model using log likelihood, estimating the parameters to obtain the optimal value.

$$\begin{aligned}
L &= k \ln p + (n-k) \ln(1-p) \\
  &= 54 \ln(0.293) + 130 \ln(0.707) \\
  &\approx -66.29 - 45.07 = -111.36 \\
-2L &\approx 222.7
\end{aligned}$$

This is the simplest model.

We then turn to the model which distinguishes Saks from everything else.
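The log likelihood of this simplest model is easy to recompute. The sketch below uses the exact proportion 54/184 rather than the rounded 0.293; the function name `log_likelihood` is an illustrative choice.

```python
import math


def log_likelihood(k, n):
    """Binomial log likelihood k ln p + (n - k) ln(1 - p) at p = k / n."""
    p = k / n
    return k * math.log(p) + (n - k) * math.log(1.0 - p)


k_total = 30 + 20 + 4      # consonantal [r] over all stores
n_total = k_total + 6 + 74 + 50

l_null = log_likelihood(k_total, n_total)
minus_two_l_null = -2.0 * l_null
print(round(l_null, 2), round(minus_two_l_null, 1))
```

The value of $-2L$ agrees with the initial -2 Log Likelihood of 222.7 that SPSS reports for the model without predictors.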


Parameters in New Model

We examine the new model, which distinguishes two classes, for which distinct "best guesses" are obtained, again using the empirical frequencies:

                 Pronunciation of /r/
Social Status    cons. ([r])   vocalic ([ə])   prop. r
high             30            6               0.833
nonhigh          24            124             0.162

Log Likelihood in New (Two-Class) Model

$$\begin{aligned}
L_{\text{high}} &= 30 \ln(0.833) + 6 \ln(0.167) \approx -5.48 - 10.74 = -16.22 \\
L_{\text{nonhigh}} &= 24 \ln(0.162) + 124 \ln(0.838) \approx -43.68 - 21.92 = -65.60 \\
\text{sum: } L &\approx -81.82 \\
-2L &\approx 163.6
\end{aligned}$$
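The two-class computation can be reproduced by summing one binomial log likelihood per class. As before this is a sketch with illustrative names, using exact proportions instead of the rounded ones.

```python
import math


def log_likelihood(k, n):
    """Binomial log likelihood k ln p + (n - k) ln(1 - p) at p = k / n."""
    p = k / n
    return k * math.log(p) + (n - k) * math.log(1.0 - p)


# (consonantal count, total count) per class.
classes = {"high": (30, 36), "nonhigh": (24, 148)}

per_class = {name: log_likelihood(k, n) for name, (k, n) in classes.items()}
l_two_class = sum(per_class.values())
minus_two_l_two_class = -2.0 * l_two_class
print({k: round(v, 2) for k, v in per_class.items()}, round(minus_two_l_two_class, 2))
```

Separating Saks from the other stores lowers $-2L$ from about 222.7 to about 163.6, so most of the improvement already comes from this one distinction.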


SPSS Report on Explained Variance

    Beginning Block Number 0. Initial Log Likelihood Function
    -2 Log Likelihood   222.7
    [...]
    Estimation terminated at iteration number 4 because L decreased ...
    -2 Log Likelihood   158.3

                Chi-Square   df   Significance
    Model       64.461       2    .0000

Reduction in $-2L$: $222.7 - 158.3 = 64.4$ is the best measure of the quality of the model.

$64.4$ is $29\%$ of the variance ($222.7$).
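The model chi-square is the drop in $-2L$ between the model without predictors and the model with all three status levels. The sketch below recomputes both from the cell counts, assuming the fitted probabilities equal the observed proportions per store; names are illustrative.

```python
import math


def log_likelihood(k, n):
    """Binomial log likelihood k ln p + (n - k) ln(1 - p) at p = k / n."""
    p = k / n
    return k * math.log(p) + (n - k) * math.log(1.0 - p)


# (consonantal count, total count) per store.
stores = {"Saks": (30, 36), "Macy's": (20, 94), "S. Klein": (4, 54)}

k_all = sum(k for k, _ in stores.values())
n_all = sum(n for _, n in stores.values())

minus_two_l_null = -2.0 * log_likelihood(k_all, n_all)
minus_two_l_full = -2.0 * sum(log_likelihood(k, n) for k, n in stores.values())

model_chi_square = minus_two_l_null - minus_two_l_full
share = model_chi_square / minus_two_l_null
print(round(minus_two_l_full, 1), round(model_chi_square, 3), round(100 * share))
```

The difference has 2 degrees of freedom, one per dummy variable added to the constant-only model.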


Visualizing Relations

[Figure not preserved in the transcript.]

Analysis of Residuals

Just as in linear regression, useful in order to see where predictions go wrong, where other/additional ideas might be useful.

SPSS can save residuals (false predictions).

Labov's data is not available except in the tabular form used, so we cannot examine the residuals here.

Logistic Regression

Idea: Predict categorical variable using regression

Example: whether linguistic rules apply, e.g., syllable-final [r] in NYC

- key step: predict chance of categorical variable, transforming categorical to numeric variable
- logit (log-odds) transformation used: $\mathrm{logit}(p) = \ln\frac{p}{1-p}$
- independent variables may be numeric or categorical
