Moving further
- Word counts
- Speech error counts
- Metaphor counts
- Active construction counts
Moving further: Categorical count data
Hissing Koreans
Winter & Grawunder (2012)
No. of Cases
Bentz & Winter (2013)
Poisson Model
Siméon Poisson
1898: Ladislaus Bortkiewicz
Army Corps with few Horses → few deaths, low variability
Army Corps with lots of Horses → many deaths, high variability
The Poisson Distribution
Poisson Regression = generalized linear model with Poisson error structure and log link function
The Poisson Model: log(mean of Y) = b0 + b1*X1 + b2*X2
In R (in current versions of lme4, models with a non-Gaussian family are fit with glmer(), not lmer()):
glmer(my_counts ~ my_predictors + (1|subject), data = mydataset, family = "poisson")
Poisson model output comes as log values; exponentiate to get the predicted mean rate.
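A minimal sketch of this in R, using simulated count data and a plain glm() without random effects (the variable names and true coefficients here are invented for illustration):

```r
set.seed(42)
x <- runif(100, 0, 1)                          # a continuous predictor
counts <- rpois(100, lambda = exp(1 + 2 * x))  # counts whose log-mean is linear in x
mdl <- glm(counts ~ x, family = "poisson")     # Poisson regression with log link
coef(mdl)        # estimates on the log scale
exp(coef(mdl))   # exponentiate to get effects on the predicted mean rate
```

The exponentiated intercept is the predicted rate when x = 0, and the exponentiated slope is the multiplicative change in the rate per unit of x.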
- Focus vs. no-focus
- Yes vs. No
- Dative vs. genitive
- Correct vs. incorrect
Moving further: Binary categorical data
Bentz & Winter (2013)
Case yes vs. no ~ Percent L2 speakers
Logistic Regression = generalized linear model with binomial error structure and logit link function
The Logistic Model: p(Y) = logit⁻¹(b0 + b1*X1 + b2*X2)
In R (again, glmer() rather than lmer() in current lme4):
glmer(binary_variable ~ my_predictors + (1|subject), data = mydataset, family = "binomial")
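A minimal sketch with simulated binary data, again using a plain glm() without random effects (variable names and true coefficients invented for illustration):

```r
set.seed(1)
x <- runif(200, 0, 1)                       # a continuous predictor
p <- plogis(1.5 - 6.5 * x)                  # true probability: inverse logit of a linear predictor
y <- rbinom(200, size = 1, prob = p)        # binary outcome (0/1)
mdl <- glm(y ~ x, family = "binomial")      # logistic regression with logit link
coef(mdl)                                   # estimates on the log-odds scale
```

The coefficients come out on the log-odds scale, which the next slides unpack.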
Probabilities and Odds
Probability of an Event = (times the event happens) / (total number of trials)
Odds of an Event = (times the event happens) / (times it does not happen)
Intuition about Odds
N = 12
What are the odds that I pick a blue
marble?
Answer: 2/10 (2 blue marbles against 10 that are not blue)
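The marble arithmetic in R, showing that the odds equal p / (1 - p):

```r
blue <- 2
total <- 12
p <- blue / total              # probability of drawing blue: 2/12
odds <- blue / (total - blue)  # odds of blue vs. not blue: 2/10
odds                           # 0.2
p / (1 - p)                    # the same number, via the probability
```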
Log odds
= logit function
Representative values

Probability   Odds    Log odds (= “logits”)
0.1           0.111   -2.197
0.2           0.25    -1.386
0.3           0.428   -0.847
0.4           0.667   -0.405
0.5           1        0
0.6           1.5      0.405
0.7           2.33     0.847
0.8           4        1.386
0.9           9        2.197
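The table of representative values can be reproduced in a few lines of R; qlogis() is base R's built-in logit function:

```r
p <- seq(0.1, 0.9, by = 0.1)   # probabilities from 0.1 to 0.9
odds <- p / (1 - p)            # odds
logits <- log(odds)            # log odds = the logit function
round(cbind(p, odds, logits), 3)
# qlogis() computes the same logit transform directly from probabilities
max(abs(logits - qlogis(p)))
```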
Snijders & Bosker (1999: 212)
Bentz & Winter (2013)
             Estimate  Std. Error  z value  Pr(>|z|)
(Intercept)    1.4576      0.6831    2.134   0.03286
Percent.L2    -6.5728      2.0335   -3.232   0.00123
Case yes vs. no ~ Percent L2 speakers
Log odds when Percent.L2 = 0
How much the log odds decrease for each increase in Percent.L2 (= the slope)
Logits or “log odds” → exponentiate → Odds
Logits or “log odds” → inverse logit → Probabilities
Exponentiate the slope to transform it from log odds to odds: exp(-6.5728)
exp(-6.5728) = 0.001397878 (the slope expressed as odds)
Odds > 1: numerator more likely = event happens more often than not
Odds < 1: denominator more likely = event is more likely not to happen
Transforming the intercept back to a probability: logit.inv(1.4576) = 0.81
About 80% (makes sense)
logit.inv(1.4576) = 0.81
logit.inv(1.4576 + -6.5728 * 0.3) = 0.37
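These two predictions can be checked directly in R, using the logit.inv function defined as exp(x)/(1+exp(x)):

```r
logit.inv <- function(x) exp(x) / (1 + exp(x))  # inverse logit
logit.inv(1.4576)                  # ~0.81: predicted probability when Percent.L2 = 0
logit.inv(1.4576 + -6.5728 * 0.3)  # ~0.37: predicted probability when Percent.L2 = 0.3
```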
The logit function: logit(p) = log(p / (1 - p))
The inverse logit function, logit⁻¹, transforms log odds back to probabilities. This is the famous “logistic function”:
logit⁻¹(x) = exp(x) / (1 + exp(x))
In R, this defines the function:
logit.inv = function(x){exp(x)/(1+exp(x))}
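You do not actually have to define this yourself: base R's plogis() is the same inverse-logit (logistic) function, and qlogis() is the logit:

```r
logit.inv = function(x){exp(x)/(1+exp(x))}   # hand-rolled inverse logit
# plogis() gives the same result as the hand-rolled version:
all.equal(logit.inv(1.4576), plogis(1.4576))
qlogis(0.81)   # the logit: log(0.81 / 0.19)
```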
General Linear Model
Generalized Linear Model
Generalized Linear Mixed Model
Generalized Linear Model
= “Generalizing” the General Linear Model to cases that don’t include continuous response variables (in particular categorical ones)
= Consists of two things: (1) an error distribution, (2) a link function
(1) Error distribution: logistic regression uses the binomial distribution; Poisson regression uses the Poisson distribution
(2) Link function: logistic regression uses the logit link; Poisson regression uses the log link
lm(response ~ predictor)
glm(response ~ predictor, family = "binomial")
glm(response ~ predictor, family = "poisson")
Categorical Data
- Dichotomous/binary data → Logistic Regression
- Count data → Poisson Regression
General structure
Linear Model: continuous ~ any type of variable
Logistic Regression: dichotomous ~ any type of variable
Poisson Regression: count ~ any type of variable
For the generalized linear mixed model, you only have to specify the family (in current lme4, use glmer() rather than lmer() when a family is specified):
lmer(…)
glmer(…, family = "poisson")
glmer(…, family = "binomial")
That’s it (for now)