binary heap, d-ary heap, binomial heap, amortized analysis ...
Binomial (or Binary) Logistic Regression
Transcript of Binomial (or Binary) Logistic Regression
![Page 2: Binomial (or Binary) Logistic Regression](https://reader034.fdocuments.us/reader034/viewer/2022051405/588dbe5e1a28ab7f208be941/html5/thumbnails/2.jpg)
One independent variable, one (continuous) dependent variable.
Outcomei = Modeli + Errori
Yi = b0 + b1X1 + �i
b0: interception at y-axisb1: line gradientX1: predictor variable� : Error
X1 predicts Y.
Linear regression:Univariate
![Page 3: Binomial (or Binary) Logistic Regression](https://reader034.fdocuments.us/reader034/viewer/2022051405/588dbe5e1a28ab7f208be941/html5/thumbnails/3.jpg)
Linear regression:Multivariate
Several independent variables, one (continuous) dependent variable.
Yi = b0 + b1X1 + b2X2 + … + bnXn + �i
b0: interception at y-axisb1: line gradientbn: regression coefficient of Xn
X1: predictor variable� : Error
X1 predicts Y.
![Page 4: Binomial (or Binary) Logistic Regression](https://reader034.fdocuments.us/reader034/viewer/2022051405/588dbe5e1a28ab7f208be941/html5/thumbnails/4.jpg)
Assumption
• Linear regression assumes linear relationships between variables.• This assumption is usually violated when the dependent variable is
categorical.• The logistic regression equation expresses the multiple linear regression
equation in logarithmic terms and thereby overcomes the problem of violating the linearity assumption.
![Page 5: Binomial (or Binary) Logistic Regression](https://reader034.fdocuments.us/reader034/viewer/2022051405/588dbe5e1a28ab7f208be941/html5/thumbnails/5.jpg)
Assumption cont.
logbase[number]log216 = 4 => 24 = 2 x 2 x 2 x 2 = 16
‘natural logarithm’: lnln = loge[number] | e = Eulers constant 2,7182818284…ln[odds] => ‘logit’
logit(p) =
elogit(p) =elogit(p) (1-p) = p = elogit(p) - pelogit(p)
p + pelogit(p) = elogit(p)
p(1+ elogit(p)) = elogit(p)
1 .p = 1+e-logit(p)
≈
)1(ln
pp−
pp−1
![Page 6: Binomial (or Binary) Logistic Regression](https://reader034.fdocuments.us/reader034/viewer/2022051405/588dbe5e1a28ab7f208be941/html5/thumbnails/6.jpg)
Binary logistic regression:Univariate
One independent variable, one categorical dependent variable.
e xbbYP
)(1101
1)(
+−+=
P: probability of Y occuringe: natural logarithm base (= 2,7182818284…)b0: interception at y-axisb1: line gradient
X1 predicts the probability of Y.
![Page 7: Binomial (or Binary) Logistic Regression](https://reader034.fdocuments.us/reader034/viewer/2022051405/588dbe5e1a28ab7f208be941/html5/thumbnails/7.jpg)
Binary logistic regression:Univariate cont.
As P(Y) ranges from 0 to 1, the logit ranges from -� to +�.
http://data.princeton.edu/wws509/notes/c3s1.html
![Page 8: Binomial (or Binary) Logistic Regression](https://reader034.fdocuments.us/reader034/viewer/2022051405/588dbe5e1a28ab7f208be941/html5/thumbnails/8.jpg)
Binary logistic regression:Multivariate
Several independent variables, one categorical dependent variable.
P: probability of Y occuringe: natural logarithm baseb0: interception at y-axisb1: line gradientbn: regression coefficient of Xn
X1: predictor variable
X1 predicts the probability of Y.
ee
xbxbxbb
xbxbxbbYP
nn
nn
++++
++++
+=
...
...
22110
22110
1)(
![Page 9: Binomial (or Binary) Logistic Regression](https://reader034.fdocuments.us/reader034/viewer/2022051405/588dbe5e1a28ab7f208be941/html5/thumbnails/9.jpg)
=> Linear regression predicts the value that Y takes.
Instead, in logistic regression, the frequencies of values 0 and 1 are used to predict a value:
=> Logistic regression predicts the probability of Y taking a specific value.
Binary logistic regression:Multivariate cont.
![Page 10: Binomial (or Binary) Logistic Regression](https://reader034.fdocuments.us/reader034/viewer/2022051405/588dbe5e1a28ab7f208be941/html5/thumbnails/10.jpg)
Research question
• Broad: How intelligible is Danish to Swedish listeners without previous exposure? Here: Which factors predict whether a Danish word is easily decoded by Swedish pre-schoolers or not?
• Dependent variable: Word intelligibilityEvery word can be – Decoded (1), or– Not decoded (0)
• Independent variables:– Phonetic distance Continuous variable (0.00 - 1.00)– Toneme Binary variable (0,1)– Number of ‘difficult’ sounds for the listener Categorical variable (0,1,2,3,…)
Binary variable
![Page 11: Binomial (or Binary) Logistic Regression](https://reader034.fdocuments.us/reader034/viewer/2022051405/588dbe5e1a28ab7f208be941/html5/thumbnails/11.jpg)
Experiment
• 50 Danish word were auditorily presented to 12 Swedish children via headphones
• Similarly, 200 pictures (i.e. 4 pictures per sound) were presented visually on a touch screen
• The children were instructed to point to the corresponding picture• Resulting data: Intelligibility scores per word per subject
50 x 12 = 600 scores
![Page 12: Binomial (or Binary) Logistic Regression](https://reader034.fdocuments.us/reader034/viewer/2022051405/588dbe5e1a28ab7f208be941/html5/thumbnails/12.jpg)
Independent variables: Examples
• Phonetic distance: måne/måne hund/hund apa/abeSw. ������� ���� ����a�Da. ������� �� ��� ������
0% 50% 100%
• Swedish tonemes: Toneme 1 (e.g. bäbis ) Toneme 2 (e.g. äpple )(NB: Tonemes not found in test language,
only in listeners’ native language!)
Swedish Danish
bäbis vs äpple baby vs æble
• ‘Difficult sounds’: Danish sounds that have been shown to be significantly more difficult to decode for Swedes (Schüppert & Gooskens, in prep.): [�����������������
![Page 13: Binomial (or Binary) Logistic Regression](https://reader034.fdocuments.us/reader034/viewer/2022051405/588dbe5e1a28ab7f208be941/html5/thumbnails/13.jpg)
Data
![Page 14: Binomial (or Binary) Logistic Regression](https://reader034.fdocuments.us/reader034/viewer/2022051405/588dbe5e1a28ab7f208be941/html5/thumbnails/14.jpg)
Data Entry
![Page 15: Binomial (or Binary) Logistic Regression](https://reader034.fdocuments.us/reader034/viewer/2022051405/588dbe5e1a28ab7f208be941/html5/thumbnails/15.jpg)
Data Entry
Block 1
![Page 16: Binomial (or Binary) Logistic Regression](https://reader034.fdocuments.us/reader034/viewer/2022051405/588dbe5e1a28ab7f208be941/html5/thumbnails/16.jpg)
Data Entry
Block 2
![Page 17: Binomial (or Binary) Logistic Regression](https://reader034.fdocuments.us/reader034/viewer/2022051405/588dbe5e1a28ab7f208be941/html5/thumbnails/17.jpg)
Data Entry
![Page 18: Binomial (or Binary) Logistic Regression](https://reader034.fdocuments.us/reader034/viewer/2022051405/588dbe5e1a28ab7f208be941/html5/thumbnails/18.jpg)
Data Entry
![Page 19: Binomial (or Binary) Logistic Regression](https://reader034.fdocuments.us/reader034/viewer/2022051405/588dbe5e1a28ab7f208be941/html5/thumbnails/19.jpg)
Output: Block 1 (Phonetic distance)
Improvement is significant: predictor ‘phonetic distance’contributes to the model
12.02 =RCS17.02 =RN
Improvement through addedvariable‘phonetic distance’
-2LL:Amount of unexplained variance
![Page 20: Binomial (or Binary) Logistic Regression](https://reader034.fdocuments.us/reader034/viewer/2022051405/588dbe5e1a28ab7f208be941/html5/thumbnails/20.jpg)
Output: Block 1 (Phonetic distance)
76,80 + 498,34 = 575,14
Model predicts correctvalue in 75% of the cases.
Exp(B) < 1Indicates that phoneticdistance correlates negativelywith intelligibility.
![Page 21: Binomial (or Binary) Logistic Regression](https://reader034.fdocuments.us/reader034/viewer/2022051405/588dbe5e1a28ab7f208be941/html5/thumbnails/21.jpg)
Output: Block 1 (Phonetic distance)
76,80 + 498,34 = 575,14
Nondecoded stimuli seem to be difficult to predict (the zeroes should be concentrated further to left).
Decoded stimuli are more correctly predicted by the model (note the 1-columns on the right hand side of the plot).
![Page 22: Binomial (or Binary) Logistic Regression](https://reader034.fdocuments.us/reader034/viewer/2022051405/588dbe5e1a28ab7f208be941/html5/thumbnails/22.jpg)
Output: Block 2 (Phonetic distance, Toneme, Difficult Sounds)
14.02 =RCS21.02 =RN
Significant value:indicates that one orboth of the newpredictors improve thenew model.
Model has further improved through added variable(s)
(Block 1: )17.2 =RN12.02 =RCS
-2LL:Amount of unexplained variance is reduced from 513,09 to 498,34
![Page 23: Binomial (or Binary) Logistic Regression](https://reader034.fdocuments.us/reader034/viewer/2022051405/588dbe5e1a28ab7f208be941/html5/thumbnails/23.jpg)
Output: Block 2 (Phonetic distance, Toneme, Difficult Sounds)
76,80 + 498,34 = 575,14
Non-significant valueindicates that variable ‘toneme’does not improve the model.
Significant valueindicates that variable ‘difficultsounds’ improves the model.Exp(B) < 1 indicates a negativecorrelation.
![Page 24: Binomial (or Binary) Logistic Regression](https://reader034.fdocuments.us/reader034/viewer/2022051405/588dbe5e1a28ab7f208be941/html5/thumbnails/24.jpg)
Results and Conclusion
76,80 + 498,34 = 575,14
Phonetic distance correlates negatively with intelligibility and contributes significantly to the model.
Tonemes seem not to be contributing to the model. This phenomenon, that listeners are familiar with from their native language but that is missing in the test language, does not seem to puzzle the listeners.
The number of difficult sounds correlate negatively with intelligibility and contribute significantly to the model.
Together, phonetic distance and number of strange sounds account for 14%to 21% of the variance.
![Page 25: Binomial (or Binary) Logistic Regression](https://reader034.fdocuments.us/reader034/viewer/2022051405/588dbe5e1a28ab7f208be941/html5/thumbnails/25.jpg)
References
76,80 + 498,34 = 575,14
• Cohen, J., P. Cohen, S.G. West, L.S. Aiken. 2003. Applied Multiple Regression/Correlation Analysis for the Behavioral Sciences. London: Lawrence Erlbaum.
• Field, A. 2005. Discovering Statistics Using SPSS. London: Sage.• Rietveld, T., R. van Hout. 1993. Statistical Techniques for the Study of
Language and Language Behaviour. Berlin: Mouton de Gruyter.• Schüppert, A. & Gooskens, Ch. In prep. Easy sounds, difficult sounds.
Evidence from a word comprehension task with Swedish children listening to Danish.
• Sieben, I. Logistische regressie analyse: een handleiding. http://scm.ou.nl/Manual/logisticregreskun.html
• Tabachnick, B.G., L.S. Fidell. 2007. Using Multivariate Statistics. Boston: Pearson.