Logistic Regression - let.rug.nl
Inf. Stats

Logistic Regression

Idea: Predict a categorical variable using regression.

Examples:
- surgery survival, dependent on age, length of surgery, ...
- whether a purchase occurs, dependent on age, income, web-site characteristics
- whether a speech error occurs as alcohol level increases
- when linguistic rules apply (final [t] in Dutch), dependent on speed of utterance, stress, social group, ...

Very popular, especially in sociolinguistics.

Regression Techniques

Attractive:
- allow prediction of one variable's value based on one or more others
- allow an estimation of the importance of various independent factors (cf. $\chi^2$)

Outline: Logistic Regression

Idea: Predict a categorical variable using regression.

- core task: analyze the dependency of a categorical variable on others using regression
- problem: translating regression techniques to the categorical domain
- key step: predict the chance of the categorical variable, transforming a categorical into a numeric variable
- note: independent variables may be numeric or categorical, as in regression in general, simple or multiple

Chance as Dependent Variable

Idea: Predict the chance of a categorical variable as the dependent variable, using regression.

- real chances $p$ are positive numbers $\le 1$
- problem: how to keep predicted values within the correct bounds
- solution: don't use chances directly, but rather a more complicated transformation

Logit(p)

[Figure: plot of logit(x) for x from 0 to 1; vertical axis from -5 to 5.]

$$\operatorname{logit}(p) = \ln\frac{p}{1-p} = \ln p - \ln(1-p)$$
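
A minimal Python sketch of the logit transformation defined above. The function name `logit` and the sample probabilities are chosen here for illustration only; they are not part of the slides.

```python
import math


def logit(p: float) -> float:
    """Return the log-odds ln(p / (1 - p)) of a probability p."""
    if not 0.0 < p < 1.0:
        raise ValueError("p must lie strictly between 0 and 1")
    return math.log(p / (1.0 - p))


# Probabilities near 0 map to large negative values, near 1 to large
# positive values, and p = 0.5 maps to exactly 0.
for p in (0.01, 0.2, 0.5, 0.8, 0.99):
    print(f"p = {p:4.2f}   logit(p) = {logit(p):+.3f}")
```

The printed values are symmetric around p = 0.5, which is what the plot on this slide shows.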

Logit(p) vs. Logistic

- use of logit solves the problem of bounds: we predict logit values, which are unbounded (cf. chances, which must lie between 0 and 1)
- logit is easily interpretable in terms of "odds": "the odds of Real against Ajax are 4 to 1" means the probability is $0.8$, and $\operatorname{logit}(0.8) = \ln\frac{0.8}{0.2} = \ln 4 \approx 1.39$
- why the name 'logistic'?
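
The odds example can be checked numerically. The sketch below is illustrative; the helper name `odds_to_probability` is an assumption made here, not something taken from the slides.

```python
import math


def odds_to_probability(in_favour: float, against: float) -> float:
    """Convert odds of 'in_favour to against' into a probability."""
    return in_favour / (in_favour + against)


# "The odds of Real against Ajax are 4 to 1"
p = odds_to_probability(4, 1)
log_odds = math.log(4 / 1)

print(f"probability = {p}")
print(f"log-odds    = {log_odds:.3f}")
print(f"logit(p)    = {math.log(p / (1 - p)):.3f}")
```

The last two printed values agree, which is why the logit is also called the log-odds.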

Why 'logistic'?

[Figure: plot of logistic(x) for x from -10 to 10; vertical axis from 0 to 1.]

$$\operatorname{logistic}(x) = \frac{e^{x}}{1 + e^{x}}$$

Similarly constrains the predicted value $p$: $0 < p < 1$.
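
Logistic vs. Logit Functions

$$\begin{aligned}
\operatorname{logit}(p) &= \ln\frac{p}{1-p} \\
e^{\operatorname{logit}(p)} &= \frac{p}{1-p} \\
(1-p)\,e^{\operatorname{logit}(p)} &= p \\
e^{\operatorname{logit}(p)} - p\,e^{\operatorname{logit}(p)} &= p \\
e^{\operatorname{logit}(p)} &= p + p\,e^{\operatorname{logit}(p)} \\
e^{\operatorname{logit}(p)} &= p\,\bigl(1 + e^{\operatorname{logit}(p)}\bigr) \\
p &= \frac{e^{\operatorname{logit}(p)}}{1 + e^{\operatorname{logit}(p)}}
\end{aligned}$$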
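
The relationship between the two functions can be checked numerically: applying the logistic function to a logit value should give back the original probability. This is a minimal sketch; the function names and test probabilities are illustrative assumptions.

```python
import math


def logit(p: float) -> float:
    """Log-odds of a probability strictly between 0 and 1."""
    return math.log(p / (1.0 - p))


def logistic(x: float) -> float:
    """e^x / (1 + e^x); maps any moderate real x into (0, 1)."""
    return math.exp(x) / (1.0 + math.exp(x))


# Round trip: logistic(logit(p)) should return p (up to rounding error).
for p in (0.05, 0.25, 0.5, 0.75, 0.95):
    back = logistic(logit(p))
    print(f"p = {p:4.2f}   logistic(logit(p)) = {back:.6f}")
```

For very large x, `math.exp(x)` overflows; the equivalent form 1 / (1 + e^(-x)) avoids that for positive x, but the form above is kept because it mirrors the formula on the slide.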

Strategy: Predict Logit Values

- $\operatorname{logit}(p) = \beta_0 + \beta_1 x$, where $x$ is the independent variable
- try to find the optimal $\beta_0, \beta_1$ given the data
- note that we're seeking a nonlinear relationship (between $p$ and $x$)

Example: Labov's NYC /r/ study

William Labov examined variant pronunciations of syllable-final /r/ in American English ([r] vs. [ə]). New York used to be like Boston, where final /r/ is [ə], but it started changing in the 1950's and 1960's. Labov hypothesized a social basis for the change.

[Figure: /r/ allophones (mixed [r, ə] vs. all consonantal [r]) by department store, ordered from high social stratum (Saks, N = 68) through Macy's (N = 125) to low social stratum (S. Klein, N = 71).]

Data on NYC /r/

                        Pronunciation of /r/
    Social Status   cons. ([r])   vocalic ([ə])   mixed
    high                 30              6          32
    medium               20             74          31
    low                   4             50          17

What stat. test is needed to ask whether social status influences the pronunciation of /r/?

Analyzing Social Influence on /r/

What stat. test is needed to ask whether social status influences the pronunciation of /r/?

- $\chi^2$ test of independence (see that section): is one nominal variable dependent on another?
- we exercise logistic regression for two reasons:
  – to measure the degree of dependence
  – to combine with questions of further dependence
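
As an illustration of the test of independence mentioned above, the sketch below computes the Pearson chi-square statistic by hand for the 3 x 3 table of counts from the "Data on NYC /r/" slide. The function name and the hard-coded critical value (9.488, the standard 5% point for 4 degrees of freedom) are choices made here, not part of the slides.

```python
def chi_square(table):
    """Pearson chi-square statistic and degrees of freedom for a 2-D table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    statistic = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count if rows and columns were independent.
            expected = row_totals[i] * col_totals[j] / n
            statistic += (observed - expected) ** 2 / expected
    df = (len(table) - 1) * (len(table[0]) - 1)
    return statistic, df


# Rows: high, medium, low status; columns: cons. [r], vocalic, mixed.
counts = [
    [30, 6, 32],
    [20, 74, 31],
    [4, 50, 17],
]

statistic, df = chi_square(counts)
CRITICAL_5_PERCENT_DF4 = 9.488  # standard table value for df = 4

print(f"chi-square = {statistic:.2f} with {df} degrees of freedom")
print("reject independence at 5%:", statistic > CRITICAL_5_PERCENT_DF4)
```

The statistic says whether there is a dependence, but not how strong it is or how it combines with other factors; that is the motivation given on the slide for turning to logistic regression.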

Simplifying the Question

Eliminate the "mixed-r reports":

                        Pronunciation of /r/
    Social Status   cons. ([r])   vocalic ([ə])   mixed
    high                 30              6          32
    medium               20             74          31
    low                   4             50          17

- now we're predicting a dichotomous (two-valued) variable (instead of a polytomous one). Note that the predictor is still polytomous.
- this step would be questionable if the category being eliminated dominated

Coding

- we code /r/ as '0, vocalic' and '1, consonantal'
- remember the "weight by frequency" command
- SPSS offers several alternatives for the Independent Variable (Status)
- "dummy" coding (SPSS: "indicator") is recommended:

    Status   explanation        dummy-1   dummy-2
    1        (high, Saks)          1         0
    2        (mid, Macy's)         0         1
    3        (low, S. Klein)       0         0
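
The dummy coding in the table above can be sketched in a few lines of Python. The function name `dummy_code` and its parameters are illustrative assumptions; SPSS produces this coding itself when "indicator" contrasts are selected.

```python
def dummy_code(status, levels=(1, 2, 3), reference=3):
    """Return indicator variables for `status`, omitting the reference level.

    With the defaults, status 1 -> [1, 0], 2 -> [0, 1], 3 -> [0, 0],
    so the last category (low, S. Klein) serves as the baseline.
    """
    if status not in levels:
        raise ValueError(f"unknown status: {status}")
    return [1 if status == level else 0
            for level in levels if level != reference]


labels = {1: "high, Saks", 2: "mid, Macy's", 3: "low, S. Klein"}
for status, label in labels.items():
    print(status, f"({label})", dummy_code(status))
```

A three-valued predictor needs only two dummy variables: the reference category is identified by both dummies being 0.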

SPSS Output: Coding

    Dependent Variable Encoding:

    Original   Internal
    Value      Value
    0          0          [vocalic pronunciation]
    1          1          [consonantal "]

                              Parameter
               Value   Freq   Coding
                              (1)      (2)
    SOC_STAT   1       2      1.000     .000
               2       2       .000    1.000
               3       2       .000     .000

SPSS Output

    -------------------- Variables in the Equation -----------------

    Variable          B     S.E.    Wald   df    Sig     R    Exp(B)
    SOC_STAT                       43.90    2   .000   .42
     SOC_STAT(1)    4.13    .69    36.38    1   .000   .39    62.49
     SOC_STAT(2)    1.22    .58     4.44    1   .035   .10     3.38
    Constant       -2.53    .52    23.63    1   .000

Recall that we're finding the parameters to the following equation:

$$\operatorname{logit}(p) = \beta_0 + \beta_1 x_1 + \beta_2 x_2 = 4.13\,x_1 + 1.22\,x_2 - 2.53$$
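
To make the parameter search concrete, here is a minimal sketch that fits the same three parameters by plain gradient ascent on the log likelihood, using the grouped counts from the simplified table (mixed reports removed). This is not SPSS's estimation procedure; the step size, iteration count, and all names are assumptions chosen for illustration.

```python
import math

# Per store: (consonantal [r] count, total count, dummy-1, dummy-2)
GROUPS = [
    (30, 36, 1, 0),  # high, Saks
    (20, 94, 0, 1),  # mid, Macy's
    (4, 54, 0, 0),   # low, S. Klein
]


def fit(groups, rate=0.01, steps=20000):
    """Maximize the binomial log likelihood by gradient ascent."""
    b0 = b1 = b2 = 0.0
    for _ in range(steps):
        g0 = g1 = g2 = 0.0
        for k, n, x1, x2 in groups:
            p = 1.0 / (1.0 + math.exp(-(b0 + b1 * x1 + b2 * x2)))
            residual = k - n * p  # observed minus expected [r] count
            g0 += residual
            g1 += residual * x1
            g2 += residual * x2
        b0 += rate * g0
        b1 += rate * g1
        b2 += rate * g2
    return b0, b1, b2


b0, b1, b2 = fit(GROUPS)
print(f"Constant    = {b0:.2f}")
print(f"SOC_STAT(1) = {b1:.2f}")
print(f"SOC_STAT(2) = {b2:.2f}")
```

The estimates agree with the B column of the SPSS table to within rounding. Gradient ascent is slow compared with the Newton-type iterations statistical packages typically use, but it keeps the sketch short.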
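
Interpreting SPSS Output

$$\operatorname{logit}(p) = 4.13\,x_1 + 1.22\,x_2 - 2.53$$

where $x_1 = 1$ for Saks, $x_2 = 1$ for Macy's, and $x_1 = x_2 = 0$ for S. Klein. Hence:

$$\operatorname{logit}(p) = \begin{cases}
4.13 - 2.53 = 1.60 & \text{Saks} \\
1.22 - 2.53 = -1.31 & \text{Macy's} \\
-2.53 & \text{S. Klein}
\end{cases}$$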
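
The per-store logits follow mechanically from the coefficients and the dummy coding. A small sketch, with the coefficient values copied from the SPSS table and all names chosen here for illustration:

```python
# Coefficients as printed in the SPSS output (rounded to two decimals).
CONSTANT = -2.53
B_DUMMY_1 = 4.13  # SOC_STAT(1): high status, Saks
B_DUMMY_2 = 1.22  # SOC_STAT(2): mid status, Macy's

# Dummy coding (x1, x2) for each store.
STORES = {"Saks": (1, 0), "Macy's": (0, 1), "S. Klein": (0, 0)}


def predicted_logit(x1, x2):
    """Evaluate logit(p) = constant + b1 * x1 + b2 * x2."""
    return CONSTANT + B_DUMMY_1 * x1 + B_DUMMY_2 * x2


logits = {store: predicted_logit(x1, x2) for store, (x1, x2) in STORES.items()}
for store, value in logits.items():
    print(f"{store:9s} logit = {value:+.2f}")
```

Because S. Klein is the reference category, its logit is just the constant, and the other two coefficients are differences from it.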
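
Checking Interpretation of Output

$$\operatorname{logit}(p) = \ln\frac{p}{1-p} = \begin{cases}
1.60 & \text{Saks} \\
-1.31 & \text{Macy's} \\
-2.53 & \text{S. Klein}
\end{cases}$$

$$p = \frac{e^{\operatorname{logit}(p)}}{1 + e^{\operatorname{logit}(p)}} \approx \begin{cases}
0.83 & \text{Saks} \\
0.21 & \text{Macy's} \\
0.07 & \text{S. Klein}
\end{cases}$$

These indeed match the data to be predicted (observed proportions of [r]: 30/36, 20/94, and 4/54).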
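
The check on this slide can be reproduced by converting each logit back to a probability and comparing it with the observed proportion of consonantal [r]. The names below are illustrative; the counts come from the simplified table and the logits from the previous slide.

```python
import math

LOGITS = {"Saks": 1.60, "Macy's": -1.31, "S. Klein": -2.53}
# (consonantal [r], vocalic) counts per store, mixed reports removed.
COUNTS = {"Saks": (30, 6), "Macy's": (20, 74), "S. Klein": (4, 50)}


def logistic(x):
    """Inverse of the logit: e^x / (1 + e^x)."""
    return math.exp(x) / (1.0 + math.exp(x))


predicted = {store: logistic(value) for store, value in LOGITS.items()}
observed = {store: r / (r + v) for store, (r, v) in COUNTS.items()}

for store in LOGITS:
    print(f"{store:9s} predicted = {predicted[store]:.3f}   "
          f"observed = {observed[store]:.3f}")
```

The small remaining differences come from using coefficients rounded to two decimals.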

SPSS Output

    -------------------- Variables in the Equation -----------------

    Variable          B     S.E.    Wald   df    Sig     R    Exp(B)
    SOC_STAT                       43.90    2   .000   .42
     SOC_STAT(1)    4.13    .69    36.38    1   .000   .39    62.49
     SOC_STAT(2)    1.22    .58     4.44    1   .035   .10     3.38
    Constant       -2.53    .52    23.63    1   .000

Note that:
- all variables are significant
- a kind of $R$ is being estimated, though without the certainty that $R^2$ indicates explained variance
- $\operatorname{Exp}(B) = e^{B}$
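
Two columns of the table can be approximately recomputed from B and S.E. The sketch below assumes the usual definition of the Wald statistic as (B / S.E.)^2, which is an assumption about how the column was produced; since the printed B and S.E. are rounded, the results only roughly match the printed values.

```python
import math

# (B, S.E.) as printed in the SPSS table.
ROWS = {
    "SOC_STAT(1)": (4.13, 0.69),
    "SOC_STAT(2)": (1.22, 0.58),
    "Constant": (-2.53, 0.52),
}

results = {}
for name, (b, se) in ROWS.items():
    exp_b = math.exp(b)   # change in odds relative to the reference category
    wald = (b / se) ** 2  # assumed definition of the Wald statistic
    results[name] = (exp_b, wald)
    print(f"{name:12s} Exp(B) = {exp_b:6.2f}   Wald = {wald:5.2f}")
```

Exp(B) is the easier column to interpret: it is the factor by which the odds of [r] change relative to the reference category, S. Klein.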

Understanding SPSS Output

    Classification Table for UITSPRK
    The Cut Value is .50

                         Predicted
                         0       1      Percent Correct
                         0   I   1
    Observed         +-------+-------+
       0      0      I  124  I    6  I   95.38%
                     +-------+-------+
       1      1      I   24  I   30  I   55.56%
                     +-------+-------+
                             Overall     83.70%

Predictions, Correctness

                            Predicted
                         [@]            [r]     Percent Correct
                     Macy's/Klein  I   Saks
    Observed        +--------------+-------+
       0    [@]     I     124      I    6  I   95.38%
                    +--------------+-------+
       1    [r]     I      24      I   30  I   55.56%
                    +--------------+-------+
                                   Overall     83.70%

This shows the prediction of the variable coded for status. Note that we're predicting that Saks's pronunciations should be all [r] and the others all [@] (schwa).
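
The classification table can be rebuilt from the counts. The sketch below assumes that the fitted probability for each store equals its observed proportion of [r] (which holds here because the model has one parameter per store) and applies the cut value of .50; the names are illustrative.

```python
CUT_VALUE = 0.5
# (consonantal [r], vocalic [@]) counts per store, mixed reports removed.
COUNTS = {"Saks": (30, 6), "Macy's": (20, 74), "S. Klein": (4, 50)}

# table[(observed, predicted)] with 0 = vocalic [@], 1 = consonantal [r]
table = {(0, 0): 0, (0, 1): 0, (1, 0): 0, (1, 1): 0}
for consonantal, vocalic in COUNTS.values():
    p_r = consonantal / (consonantal + vocalic)
    predicted = 1 if p_r > CUT_VALUE else 0  # same prediction for the whole store
    table[(1, predicted)] += consonantal
    table[(0, predicted)] += vocalic

correct = table[(0, 0)] + table[(1, 1)]
total = sum(table.values())
overall = correct / total

print("observed [@]:", table[(0, 0)], table[(0, 1)])
print("observed [r]:", table[(1, 0)], table[(1, 1)])
print(f"overall correct: {100 * overall:.2f}%")
```

Only Saks has a predicted probability above the cut value, so every Saks response is predicted [r] and every other response [@].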

Log Likelihood

The variance in the binomial case is $np(1-p)$, and the likelihood of the observations is $p^{k}(1-p)^{n-k}$, where the positive value [r] was seen $k$ times and the null value $n-k$ times. From this we derive the log likelihood $L$:

$$L = \ln\bigl(p^{k}(1-p)^{n-k}\bigr) = k \ln p + (n-k)\ln(1-p)$$

We measure the quality of the model using the log likelihood, estimating the parameters so as to obtain its optimal value.

It also turns out that $-2L$ has a $\chi^2$ distribution, with degrees of freedom equal to the number of observations minus the number of estimated parameters.
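
A minimal sketch of the log likelihood formula above, using made-up numbers (k = 3 positive outcomes in n = 5 observations) purely for illustration. The function name `log_likelihood` is an assumption of this sketch.

```python
import math


def log_likelihood(k, n, p):
    """L = k ln p + (n - k) ln(1 - p) for k positive outcomes in n trials."""
    return k * math.log(p) + (n - k) * math.log(1.0 - p)


k, n = 3, 5  # hypothetical counts, not from the slides

# The two forms of L agree: the log of the product equals the sum of logs.
direct = math.log(0.6 ** k * 0.4 ** (n - k))
print(f"ln(p^k (1-p)^(n-k)) = {direct:.4f}")
print(f"k ln p + (n-k) ln(1-p) = {log_likelihood(k, n, 0.6):.4f}")

# L is largest when p equals the empirical frequency k / n.
for p in (0.4, 0.5, 0.6, 0.7):
    print(f"p = {p:.1f}   L = {log_likelihood(k, n, p):.4f}")
```

The maximum at p = k/n is why the empirical frequencies serve as "best guesses" for the models compared in this part of the slides.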

Log Probabilities

[Figure: plot of ln(x) for x from 0 to 1; vertical axis from -5 to 0.]

Very likely events ($p \approx 1$) contribute little to log likelihoods.

Log Likelihood

We measure the quality of the model using the log likelihood, estimating the parameters so as to obtain its optimal value. We obtain the optimal value by using the overall frequencies as a best guess:

                        Pronunciation of /r/
    Social Status   cons. ([r])   vocalic ([ə])
    high                 30              6
    medium               20             74
    low                   4             50
    totals               54            130
    best guess         0.293          0.707

Simplest Model: No Social Class

We measure the quality of the model using the log likelihood, estimating the parameters so as to obtain its optimal value.

$$\begin{aligned}
L &= k \ln p + (n-k)\ln(1-p) \\
  &= 54 \ln(0.293) + 130 \ln(0.707) \\
  &\approx -66.29 - 45.07 \\
  &= -111.36 \\
-2L &\approx 222.7
\end{aligned}$$

This is the simplest model.

We then turn to the model which distinguishes Saks from everything else.
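
The arithmetic for the simplest model can be reproduced directly from the column totals (54 consonantal, 130 vocalic). The names below are illustrative.

```python
import math


def log_likelihood(k, n, p):
    """L = k ln p + (n - k) ln(1 - p)."""
    return k * math.log(p) + (n - k) * math.log(1.0 - p)


consonantal, vocalic = 54, 130
n = consonantal + vocalic
best_guess = consonantal / n  # overall proportion of [r]

null_L = log_likelihood(consonantal, n, best_guess)
print(f"best guess p = {best_guess:.3f}")
print(f"L   = {null_L:.2f}")
print(f"-2L = {-2 * null_L:.1f}")
```

The value of -2L agrees with the "Initial Log Likelihood Function" figure that SPSS reports for block 0.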

Parameters in New Model

We examine the new model, which distinguishes two classes, for which distinct "best guesses" are obtained, again using the empirical frequencies:

                        Pronunciation of /r/
    Social Status   cons. ([r])   vocalic ([ə])   prop. r
    high                 30              6         0.833
    nonhigh              24            124         0.162
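
$L$ in New (Two-Class) Model

High status:

$$\begin{aligned}
L &= k \ln p + (n-k)\ln(1-p) \\
  &= 30 \ln(0.833) + 6 \ln(0.167) \\
  &\approx -5.48 - 10.74 = -16.22
\end{aligned}$$

Nonhigh status:

$$\begin{aligned}
L &= k \ln p + (n-k)\ln(1-p) \\
  &= 24 \ln(0.162) + 124 \ln(0.838) \\
  &\approx -43.68 - 21.92 = -65.60
\end{aligned}$$

Sum: $L \approx -81.82$, so $-2L \approx 163.6$.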
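
The two-class computation can be sketched the same way, with one "best guess" per class and the class log likelihoods added together. The names are illustrative; the counts come from the two-class table.

```python
import math


def log_likelihood(k, n, p):
    """L = k ln p + (n - k) ln(1 - p)."""
    return k * math.log(p) + (n - k) * math.log(1.0 - p)


# (consonantal [r], vocalic) counts per class.
CLASSES = {"high": (30, 6), "nonhigh": (24, 124)}

two_class_L = 0.0
for name, (r, v) in CLASSES.items():
    n = r + v
    class_L = log_likelihood(r, n, r / n)  # empirical proportion as best guess
    two_class_L += class_L
    print(f"{name:8s} p = {r / n:.3f}   L = {class_L:.2f}")

print(f"sum L = {two_class_L:.2f}   -2L = {-2 * two_class_L:.1f}")
```

The sum is closer to zero than the log likelihood of the simplest model, so distinguishing Saks from the other stores improves the fit.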

SPSS Report on Explained Variance

    Beginning Block Number 0. Initial Log Likelihood Function
    -2 Log Likelihood    222.7
    [...]
    Estimation terminated at iteration number 4 because
    L decreased ...
    -2 Log Likelihood    158.3

                Chi-Square    df   Significance
    Model         64.461       2      .0000

Reduction in $-2L$: $222.7 - 158.3 = 64.4$ is the best measure of the quality of the model.

$64.4$ is $29\%$ of the variance ($222.7$).
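
The SPSS figures can be reproduced from the counts: the initial -2L uses one overall proportion, the final -2L one proportion per store. This sketch assumes, as in the earlier slides, that the fitted probabilities equal the empirical proportions; the names are illustrative.

```python
import math


def log_likelihood(k, n, p):
    """L = k ln p + (n - k) ln(1 - p)."""
    return k * math.log(p) + (n - k) * math.log(1.0 - p)


# (consonantal [r], vocalic) counts per store, mixed reports removed.
COUNTS = {"Saks": (30, 6), "Macy's": (20, 74), "S. Klein": (4, 50)}

total_r = sum(r for r, _ in COUNTS.values())
total_n = sum(r + v for r, v in COUNTS.values())

initial = -2 * log_likelihood(total_r, total_n, total_r / total_n)
final = -2 * sum(log_likelihood(r, r + v, r / (r + v))
                 for r, v in COUNTS.values())
reduction = initial - final
share = reduction / initial

print(f"initial -2L = {initial:.1f}")
print(f"final   -2L = {final:.1f}")
print(f"reduction   = {reduction:.2f}  ({100 * share:.0f}% of initial -2L)")
```

The reduction corresponds to the model chi-square in the SPSS report, with 2 degrees of freedom for the two dummy variables.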

Analysis of Residuals

- Just as in linear regression, useful in order to see where predictions go wrong and where other/additional ideas might be useful.
- SPSS can save residuals (false predictions).
- Labov's data is not available except in the tabular form used, so we cannot examine the residuals here.

Logistic Regression

Idea: Predict a categorical variable using regression.

- Example: whether linguistic rules apply, e.g., syllable-final [r] in NYC
- key step: predict the chance of the categorical variable, transforming a categorical into a numeric variable; the logit (log-odds) transformation is used:

$$\operatorname{logit}(p) = \ln\frac{p}{1-p}$$

- independent variables may be numeric or categorical