Autoterminal Śląsk Logistic Company profile. ATS Logistic – Shareholders 50%
C3.1.logistic intro
-
Upload
daniel-liao -
Category
Education
-
view
33 -
download
0
Transcript of C3.1.logistic intro
![Page 1: C3.1.logistic intro](https://reader031.fdocuments.us/reader031/viewer/2022022203/58729b411a28ab07208b49f3/html5/thumbnails/1.jpg)
Machine Learning Specialization
Linear classifiers: Logistic regression
Emily Fox & Carlos Guestrin Machine Learning Specialization
University of Washington
1 ©2015-2016 Emily Fox & Carlos Guestrin
![Page 2: C3.1.logistic intro](https://reader031.fdocuments.us/reader031/viewer/2022022203/58729b411a28ab07208b49f3/html5/thumbnails/2.jpg)
Machine Learning Specialization
Predicting sentiment by topic: An intelligent restaurant review system
2 ©2015-2016 Emily Fox & Carlos Guestrin
![Page 3: C3.1.logistic intro](https://reader031.fdocuments.us/reader031/viewer/2022022203/58729b411a28ab07208b49f3/html5/thumbnails/3.jpg)
Machine Learning Specialization 3
It’s a big day & I want to book a table at a nice Japanese restaurant
©2015-2016 Emily Fox & Carlos Guestrin
Seattle has many ★★★★
sushi restaurants
What are people saying about
the food? the ambiance?...
![Page 4: C3.1.logistic intro](https://reader031.fdocuments.us/reader031/viewer/2022022203/58729b411a28ab07208b49f3/html5/thumbnails/4.jpg)
Machine Learning Specialization 4
Positive reviews not positive about everything
©2015-2016 Emily Fox & Carlos Guestrin
Sample review: Watching the chefs create incredible edible art made the experience very unique.
My wife tried their ramen and it was pretty forgettable.
All the sushi was delicious! Easily best sushi in Seattle.
Experience
![Page 5: C3.1.logistic intro](https://reader031.fdocuments.us/reader031/viewer/2022022203/58729b411a28ab07208b49f3/html5/thumbnails/5.jpg)
Machine Learning Specialization 5
Classifying sentiment of review
©2015-2016 Emily Fox & Carlos Guestrin
Easily best sushi in Seattle.
Sentence Sentiment Classifier
![Page 6: C3.1.logistic intro](https://reader031.fdocuments.us/reader031/viewer/2022022203/58729b411a28ab07208b49f3/html5/thumbnails/6.jpg)
Machine Learning Specialization
Linear classifier: Intuition
6 ©2015-2016 Emily Fox & Carlos Guestrin
![Page 7: C3.1.logistic intro](https://reader031.fdocuments.us/reader031/viewer/2022022203/58729b411a28ab07208b49f3/html5/thumbnails/7.jpg)
Machine Learning Specialization 7
Classifier
©2015-2016 Emily Fox & Carlos Guestrin
Sentence from
review
Classifier MODEL
Input: x Output: y Predicted class
Note: we’ll start talking about 2 classes, and address multiclass later
ŷ = +1
ŷ = -1
![Page 8: C3.1.logistic intro](https://reader031.fdocuments.us/reader031/viewer/2022022203/58729b411a28ab07208b49f3/html5/thumbnails/8.jpg)
Machine Learning Specialization 8
A (linear) classifier • Will use training data to learn a weight
or coefficient for each word
©2015-2016 Emily Fox & Carlos Guestrin
Word Coefficient good 1.0
great 1.5
awesome 2.7
bad -1.0
terrible -2.1
awful -3.3
restaurant, the, we, where, … 0.0
… …
![Page 9: C3.1.logistic intro](https://reader031.fdocuments.us/reader031/viewer/2022022203/58729b411a28ab07208b49f3/html5/thumbnails/9.jpg)
Machine Learning Specialization 10
Scoring a sentence
©2015-2016 Emily Fox & Carlos Guestrin
Word Coefficient good 1.0
great 1.2
awesome 1.7
bad -1.0
terrible -2.1
awful -3.3
restaurant, the, we, where, …
0.0
… …
Input xi: Sushi was great, the food was awesome, but the service was terrible.
Called a linear classifier, because output is weighted sum of input.
![Page 10: C3.1.logistic intro](https://reader031.fdocuments.us/reader031/viewer/2022022203/58729b411a28ab07208b49f3/html5/thumbnails/10.jpg)
Machine Learning Specialization 12
Score(x) = weighted count of words in sentence
If Score (x) > 0:
ŷ = Else:
ŷ =
Word Coefficient
… …
©2015-2016 Emily Fox & Carlos Guestrin
Sentence from
review
Input: x
Simple linear classifier
![Page 11: C3.1.logistic intro](https://reader031.fdocuments.us/reader031/viewer/2022022203/58729b411a28ab07208b49f3/html5/thumbnails/11.jpg)
Machine Learning Specialization 13
Training a classifier = Learning the coefficients
©2015-2016 Emily Fox & Carlos Guestrin
Data
(x,y) (Sentence1, ) (Sentence2, )
…
Training set
Validation set
Learn classifier
Accuracy
Word Coefficient
good 1.0
awesome 1.7
bad -1.0
awful -3.3
… …
![Page 12: C3.1.logistic intro](https://reader031.fdocuments.us/reader031/viewer/2022022203/58729b411a28ab07208b49f3/html5/thumbnails/12.jpg)
Machine Learning Specialization
Decision boundaries
14 ©2015-2016 Emily Fox & Carlos Guestrin
![Page 13: C3.1.logistic intro](https://reader031.fdocuments.us/reader031/viewer/2022022203/58729b411a28ab07208b49f3/html5/thumbnails/13.jpg)
Machine Learning Specialization 15
Suppose only two words had non-zero coefficient
©2015-2016 Emily Fox & Carlos Guestrin
Word Coefficient
#awesome 1.0
#awful -1.5
#awesome 0 1 2 3 4 …
#aw
ful
0
1
2
3
4
…
Sushi was awesome, the food was awesome, but the service was awful.
Score(x) = 1.0 #awesome – 1.5 #awful
![Page 14: C3.1.logistic intro](https://reader031.fdocuments.us/reader031/viewer/2022022203/58729b411a28ab07208b49f3/html5/thumbnails/14.jpg)
Machine Learning Specialization 16
Decision boundary example
©2015-2016 Emily Fox & Carlos Guestrin
#awesome
#aw
ful
0 1 2 3 4 …
0
1
2
3
4
…
Score(x) > 0
Score(x) < 0
Word Coefficient
#awesome 1.0
#awful -1.5 Score(x) = 1.0 #awesome – 1.5 #awful
![Page 15: C3.1.logistic intro](https://reader031.fdocuments.us/reader031/viewer/2022022203/58729b411a28ab07208b49f3/html5/thumbnails/15.jpg)
Machine Learning Specialization 18
Decision boundary separates positive & negative predictions
• For linear classifiers: - When 2 coefficients are non-zero è line
- When 3 coefficients are non-zero è plane
- When many coefficients are non-zero è hyperplane
• For more general classifiers è more complicated shapes
©2015-2016 Emily Fox & Carlos Guestrin
![Page 16: C3.1.logistic intro](https://reader031.fdocuments.us/reader031/viewer/2022022203/58729b411a28ab07208b49f3/html5/thumbnails/16.jpg)
Machine Learning Specialization
Linear classifier: Model
20 ©2015-2016 Emily Fox & Carlos Guestrin
![Page 17: C3.1.logistic intro](https://reader031.fdocuments.us/reader031/viewer/2022022203/58729b411a28ab07208b49f3/html5/thumbnails/17.jpg)
Machine Learning Specialization 21 ©2015-2016 Emily Fox & Carlos Guestrin
Training Data
Feature extraction
ML model
Quality metric
ML algorithm
y
ŷ
ŵ
h(x) x
![Page 18: C3.1.logistic intro](https://reader031.fdocuments.us/reader031/viewer/2022022203/58729b411a28ab07208b49f3/html5/thumbnails/18.jpg)
Machine Learning Specialization 23
Coefficients of classifier
©2015-2016 Emily Fox & Carlos Guestrin
#awesome
#aw
ful
x[1]
x[3]
Score(x) = w0 + w1
#awesome + w2
#awful + w3
#great
x[2
]
![Page 19: C3.1.logistic intro](https://reader031.fdocuments.us/reader031/viewer/2022022203/58729b411a28ab07208b49f3/html5/thumbnails/19.jpg)
Machine Learning Specialization 25
General notation
Output: y Inputs: x = (x[1],x[2],…, x[d]) Notational conventions:
x[j] = jth input (scalar) hj(x) = jth feature (scalar) xi = input of ith data point (vector) xi[j] = jth input of ith data point (scalar)
©2015-2016 Emily Fox & Carlos Guestrin
d-dim vector
{-1,+1}
![Page 20: C3.1.logistic intro](https://reader031.fdocuments.us/reader031/viewer/2022022203/58729b411a28ab07208b49f3/html5/thumbnails/20.jpg)
Machine Learning Specialization 26
Simple hyperplane
©2015-2016 Emily Fox & Carlos Guestrin
Model: ŷi = sign(Score(xi))
Score(xi) = w0 + w1 xi[1] + … + wd
xi[d] feature 1 = 1
feature 2 = x[1] … e.g., #awesome
feature 3 = x[2] … e.g., #awful
…
feature d+1 = x[d] … e.g., #ramen
= wTxi
![Page 21: C3.1.logistic intro](https://reader031.fdocuments.us/reader031/viewer/2022022203/58729b411a28ab07208b49f3/html5/thumbnails/21.jpg)
Machine Learning Specialization 27
Decision boundary: effect of changing coefficients
©2015-2016 Emily Fox & Carlos Guestrin
Input Coefficient Value
w0 0.0
#awesome w1 1.0
#awful w2 -1.5
#awesome
#aw
ful
0 1 2 3 4 …
0
1
2
3
4
…
Score(x) = 1.0 #awesome – 1.5 #awful
Score(x) > 0
Score(x) < 0
![Page 22: C3.1.logistic intro](https://reader031.fdocuments.us/reader031/viewer/2022022203/58729b411a28ab07208b49f3/html5/thumbnails/22.jpg)
Machine Learning Specialization 28
Input Coefficient Value
w0
#awesome w1 1.0
#awful w2 -1.5
Decision boundary: effect of changing coefficients
©2015-2016 Emily Fox & Carlos Guestrin
#awesome
#aw
ful
0 1 2 3 4 …
0
1
2
3
4
…
Score(x) =
Score(x) > 0
Score(x) < 0
0.0
1.0 #awesome – 1.5 #awful 1.0 + 1.0
![Page 23: C3.1.logistic intro](https://reader031.fdocuments.us/reader031/viewer/2022022203/58729b411a28ab07208b49f3/html5/thumbnails/23.jpg)
Machine Learning Specialization 29
Input Coefficient Value
w0
#awesome w1 1.0
#awful w2
Decision boundary: effect of changing coefficients
©2015-2016 Emily Fox & Carlos Guestrin
#awesome
#aw
ful
0 1 2 3 4 …
0
1
2
3
4
…
Score(x) =
Score(x) > 0
Score(x) < 0
1.0 + 1.0 #awesome – 1.5 #awful
1.0
-1.5 -3.0 – 3.0
![Page 24: C3.1.logistic intro](https://reader031.fdocuments.us/reader031/viewer/2022022203/58729b411a28ab07208b49f3/html5/thumbnails/24.jpg)
Machine Learning Specialization 30
More generic features… D-dimensional hyperplane
©2015-2016 Emily Fox & Carlos Guestrin
feature 1 = h0(x) … e.g., 1
feature 2 = h1(x) … e.g., x[1] = #awesome
feature 3 = h2(x) … e.g., x[2] = #awful or, log(x[7]) x[2] = log(#bad) x #awful
or, tf-idf(“awful”)
…
feature D+1 = hD(x) … some other function of x[1],…, x[d]
DX
j=0
Model: ŷi = sign(Score(xi))
Score(xi) = w0 h0(xi) + w1 h1(xi) + … + wD
hD(xi)
= wj hj(xi) = wTh(xi)
![Page 25: C3.1.logistic intro](https://reader031.fdocuments.us/reader031/viewer/2022022203/58729b411a28ab07208b49f3/html5/thumbnails/25.jpg)
Machine Learning Specialization 31 ©2015-2016 Emily Fox & Carlos Guestrin
Training Data
Feature extraction
ML model
Quality metric
ML algorithm
y ŵ
(either -1 or +1)
ŷ=sign(ŵTh(x))h(x) x
![Page 26: C3.1.logistic intro](https://reader031.fdocuments.us/reader031/viewer/2022022203/58729b411a28ab07208b49f3/html5/thumbnails/26.jpg)
Machine Learning Specialization
Are you sure about the prediction? Class probability
32 ©2015-2016 Emily Fox & Carlos Guestrin
![Page 27: C3.1.logistic intro](https://reader031.fdocuments.us/reader031/viewer/2022022203/58729b411a28ab07208b49f3/html5/thumbnails/27.jpg)
Machine Learning Specialization 33
How confident is your prediction?
• Thus far, we’ve outputted a prediction or
• But, how sure are you about the prediction?
©2015-2016 Emily Fox & Carlos Guestrin
Definite
“The sushi & everything else were awesome!”
“The sushi was good, the service was OK”
Not sure
ŷ = +1 with high probability
ŷ = +1 with probability 0.5
![Page 28: C3.1.logistic intro](https://reader031.fdocuments.us/reader031/viewer/2022022203/58729b411a28ab07208b49f3/html5/thumbnails/28.jpg)
Machine Learning Specialization
Basics of probabilities – quick review
34 ©2015-2016 Emily Fox & Carlos Guestrin
![Page 29: C3.1.logistic intro](https://reader031.fdocuments.us/reader031/viewer/2022022203/58729b411a28ab07208b49f3/html5/thumbnails/29.jpg)
Machine Learning Specialization 36
Basic probability
©2015-2016 Emily Fox & Carlos Guestrin
x = review text
y = sentiment
All the sushi was delicious! Easily best sushi in Seattle. +1
The sushi & everything else were awesome! +1
My wife tried their ramen, it was pretty forgettable. -1
The sushi was good, the service was OK +1
… …
Probability a review is positive is 0.7
I expect 70% of rows to have y = +1
(Exact number will vary for each specific dataset)
![Page 30: C3.1.logistic intro](https://reader031.fdocuments.us/reader031/viewer/2022022203/58729b411a28ab07208b49f3/html5/thumbnails/30.jpg)
Machine Learning Specialization 37
Interpreting probabilities as degrees of belief
©2015-2016 Emily Fox & Carlos Guestrin
0.0 1.0
P(y=+1)
Absolutely sure reviews are positive
Absolutely sure reviews are negative
Not sure if reviews are positive or negative
0.5
![Page 31: C3.1.logistic intro](https://reader031.fdocuments.us/reader031/viewer/2022022203/58729b411a28ab07208b49f3/html5/thumbnails/31.jpg)
Machine Learning Specialization 38
Key properties of probabilities
©2015-2016 Emily Fox & Carlos Guestrin
Property Two class (e.g., y is +1 or -1)
Multiple classes (e.g., y is dog, cat or bird)
Probabilities always between 0 & 1
Probabilities sum up to 1
![Page 32: C3.1.logistic intro](https://reader031.fdocuments.us/reader031/viewer/2022022203/58729b411a28ab07208b49f3/html5/thumbnails/32.jpg)
Machine Learning Specialization 39
Conditional probability
©2015-2016 Emily Fox & Carlos Guestrin
x = review text y = sentiment All the sushi was delicious! Easily best sushi in Seattle. +1
Sushi was awesome & everything else was awesome! The service was awful, but overall awesome place!
+1
My wife tried their ramen, it was pretty forgettable. -1
The sushi was good, the service was OK +1
… …
awesome … awesome … awful … awesome +1
… …
awesome … awesome … awful … awesome -1
… …
… …
awesome … awesome … awful … awesome +1
x = review text y = sentiment All the sushi was delicious! Easily best sushi in Seattle. +1
Sushi was awesome & everything else was awesome! The service was awful, but overall awesome place!
+1
My wife tried their ramen, it was pretty forgettable. -1
The sushi was good, the service was OK +1
… …
awesome … awesome … awful … awesome +1
… …
awesome … awesome … awful … awesome -1
… …
… …
awesome … awesome … awful … awesome +1
I expect 90% of rows with reviews containing
3 “awesome” & 1 “awful” to have y = +1
(Exact number will vary for each specific dataset)
Probability a review with 3 “awesome” and 1 “awful” is positive is 0.9
![Page 33: C3.1.logistic intro](https://reader031.fdocuments.us/reader031/viewer/2022022203/58729b411a28ab07208b49f3/html5/thumbnails/33.jpg)
Machine Learning Specialization 40
Interpreting conditional probabilities
©2015-2016 Emily Fox & Carlos Guestrin
0.0 1.0
P(y=+1|xi=“All the sushi was delicious!”)
Absolutely sure review “All the sushi was delicious!”
is positive
Absolutely sure review “All the sushi was delicious!”
is negative
Not sure if review “All the sushi was delicious!”
is positive or negative
0.5
Output label Input sentence
![Page 34: C3.1.logistic intro](https://reader031.fdocuments.us/reader031/viewer/2022022203/58729b411a28ab07208b49f3/html5/thumbnails/34.jpg)
Machine Learning Specialization 41
Key properties of conditional probabilities
©2015-2016 Emily Fox & Carlos Guestrin
Property Two class
(e.g., y is +1 or -1, xi is review text)
Multiple classes (e.g., y is dog, cat or bird,
xi is image)
Conditional probabilities always between 0 & 1
Conditional probabilities sum up to 1 over y, but not over x
![Page 35: C3.1.logistic intro](https://reader031.fdocuments.us/reader031/viewer/2022022203/58729b411a28ab07208b49f3/html5/thumbnails/35.jpg)
Machine Learning Specialization
Using probabilities in classification
43 ©2015-2016 Emily Fox & Carlos Guestrin
![Page 36: C3.1.logistic intro](https://reader031.fdocuments.us/reader031/viewer/2022022203/58729b411a28ab07208b49f3/html5/thumbnails/36.jpg)
Machine Learning Specialization 44
How confident is your prediction?
©2015-2016 Emily Fox & Carlos Guestrin
Definite
“The sushi & everything else were awesome!”
“The sushi was good, the service was OK”
Not sure
P(y=+1| x= )
= 0.99
P(y=+1| x= )
= 0.55
“The sushi & everything else were awesome!”
“The sushi was good, the service was OK”
Many classifiers provide a degree of certainty:
P(y|x)
Extremely useful in practice
Output label Input sentence
![Page 37: C3.1.logistic intro](https://reader031.fdocuments.us/reader031/viewer/2022022203/58729b411a28ab07208b49f3/html5/thumbnails/37.jpg)
Machine Learning Specialization 45
Goal: Learn conditional probabilities from data
x[1] = #awesome x[2] = #awful y = sentiment
2 1 +1
0 2 -1
3 3 -1
4 1 +1
… … …
©2015-2016 Emily Fox & Carlos Guestrin
Training data: N observations (xi,yi)
Optimize quality metric on training data
Find best model P by finding best ŵ
⌃ Useful for
predicting ŷ
![Page 38: C3.1.logistic intro](https://reader031.fdocuments.us/reader031/viewer/2022022203/58729b411a28ab07208b49f3/html5/thumbnails/38.jpg)
Machine Learning Specialization 46
= estimate of class probabilities
If > 0.5:
ŷ = Else:
ŷ =
©2015-2016 Emily Fox & Carlos Guestrin
Sentence from
review
Input: x
Predict most likely class
P(y|x) ⌃
P(y=+1|x) ⌃
• Estimating improves interpretability: - Predict ŷ = +1 and tell me how sure you are
P(y|x) ⌃
![Page 39: C3.1.logistic intro](https://reader031.fdocuments.us/reader031/viewer/2022022203/58729b411a28ab07208b49f3/html5/thumbnails/39.jpg)
Machine Learning Specialization
Predicting class probabilities with generalized linear models
47 ©2015-2016 Emily Fox & Carlos Guestrin
![Page 40: C3.1.logistic intro](https://reader031.fdocuments.us/reader031/viewer/2022022203/58729b411a28ab07208b49f3/html5/thumbnails/40.jpg)
Machine Learning Specialization 48 ©2015-2016 Emily Fox & Carlos Guestrin
Training Data
Feature extraction
ML model
Quality metric
ML algorithm
y
h(x)
ŵ
x P(y|x) ⌃
![Page 41: C3.1.logistic intro](https://reader031.fdocuments.us/reader031/viewer/2022022203/58729b411a28ab07208b49f3/html5/thumbnails/41.jpg)
Machine Learning Specialization 49
Thus far, we focused on decision boundaries
©2015-2016 Emily Fox & Carlos Guestrin
#awesome
#aw
ful
0 1 2 3 4 …
0
1
2
3
4
…
Score(x) > 0
Score(x) < 0
Score(xi) = w0 h0(xi) + w1 h1(xi) + … + wD
hD(xi)
= wTh(xi)
Relate Score(xi) to
P(y=+1|x,ŵ)? ⌃
![Page 42: C3.1.logistic intro](https://reader031.fdocuments.us/reader031/viewer/2022022203/58729b411a28ab07208b49f3/html5/thumbnails/42.jpg)
Machine Learning Specialization 50
Interpreting Score(xi)
©2015-2016 Emily Fox & Carlos Guestrin
-∞ +∞
Score(xi)=wTh(xi)
ŷi = +1ŷi = -1
Very sure ŷi = +1
Very sure ŷi = -1
P(y=+1|xi) = 1 ⌃
P(y=+1|xi) = 0 ⌃
Not sure if
ŷi = -1 or +1
P(y=+1|xi) = 0.5 ⌃
0.0
![Page 43: C3.1.logistic intro](https://reader031.fdocuments.us/reader031/viewer/2022022203/58729b411a28ab07208b49f3/html5/thumbnails/43.jpg)
Machine Learning Specialization 51
Why not just use regression to build classifier?
©2015-2016 Emily Fox & Carlos Guestrin
Score(xi) = w0h0(xi) + w1 h1(xi) + … + wD
hD(xi)
-∞ +∞0.0
0.50.0 1.0
P(y=+1|xi)
-∞ < Score(xi) < +∞
But probabilities between 0 and 1
How do we link -∞,+∞
to 0,1???
![Page 44: C3.1.logistic intro](https://reader031.fdocuments.us/reader031/viewer/2022022203/58729b411a28ab07208b49f3/html5/thumbnails/44.jpg)
Machine Learning Specialization 52
Link function: squeeze real line into [0,1]
©2015-2016 Emily Fox & Carlos Guestrin
-∞ +∞
Score(xi) = wTh(xi)
0.0 1.0
0.0
0.5
g(wTh(xi))
Link function
P(y=+1|xi) = ⌃
Generalized linear model
![Page 45: C3.1.logistic intro](https://reader031.fdocuments.us/reader031/viewer/2022022203/58729b411a28ab07208b49f3/html5/thumbnails/45.jpg)
Machine Learning Specialization 53 ©2015-2016 Emily Fox & Carlos Guestrin
Training Data
Feature extraction
ML model
Quality metric
ML algorithm
y ŵ
P(y=+1|x,ŵ) =g(ŵTh(x))
⌃
h(x) x
![Page 46: C3.1.logistic intro](https://reader031.fdocuments.us/reader031/viewer/2022022203/58729b411a28ab07208b49f3/html5/thumbnails/46.jpg)
Machine Learning Specialization
Logistic regression classifier: linear score with logistic link function
54 ©2015-2016 Emily Fox & Carlos Guestrin
![Page 47: C3.1.logistic intro](https://reader031.fdocuments.us/reader031/viewer/2022022203/58729b411a28ab07208b49f3/html5/thumbnails/47.jpg)
Machine Learning Specialization 56
Logistic function (sigmoid, logit)
©2015-2016 Emily Fox & Carlos Guestrin
Score -∞ -2 0.0 +2 +∞
sigmoid(Score)
sigmoid(Score) =
1
1 + e
�Score
![Page 48: C3.1.logistic intro](https://reader031.fdocuments.us/reader031/viewer/2022022203/58729b411a28ab07208b49f3/html5/thumbnails/48.jpg)
Machine Learning Specialization 57
Logistic regression model
©2015-2016 Emily Fox & Carlos Guestrin
-∞ +∞
Score(xi) = wTh(xi)
0.0 1.0
0.0
0.5
P(y=+1|xi,w) = sigmoid(Score(xi))
![Page 49: C3.1.logistic intro](https://reader031.fdocuments.us/reader031/viewer/2022022203/58729b411a28ab07208b49f3/html5/thumbnails/49.jpg)
Machine Learning Specialization 58
Understanding the logistic regression model
©2015-2016 Emily Fox & Carlos Guestrin
P (y = +1 | x,w) =1
1 + e�w
>h(x)
1
1+
e�w
>h(x
)
w
>h(x)
Score(xi) P(y=+1|xi,w)
![Page 50: C3.1.logistic intro](https://reader031.fdocuments.us/reader031/viewer/2022022203/58729b411a28ab07208b49f3/html5/thumbnails/50.jpg)
Machine Learning Specialization 59
Logistic regression è Linear decision boundary
©2015-2016 Emily Fox & Carlos Guestrin
#awesome
#aw
ful
0 1 2 3 4 …
0
1
2
3
4
…
1
1+
e�w
>h(x
)
w
>h(x)
![Page 51: C3.1.logistic intro](https://reader031.fdocuments.us/reader031/viewer/2022022203/58729b411a28ab07208b49f3/html5/thumbnails/51.jpg)
Machine Learning Specialization 60
Effect of coefficients on logistic regression model
©2015-2016 Emily Fox & Carlos Guestrin
w0 0
w#awesome +1
w#awful -1
w0 0
w#awesome +3
w#awful -3
w0 -2
w#awesome +1
w#awful -1
#awesome - #awful
1
1+
e�w
>h(x
)
1
1+
e�w
>h(x
)
#awesome - #awful
1
1+
e�w
>h(x
)
#awesome - #awful
![Page 52: C3.1.logistic intro](https://reader031.fdocuments.us/reader031/viewer/2022022203/58729b411a28ab07208b49f3/html5/thumbnails/52.jpg)
Machine Learning Specialization 62 ©2015-2016 Emily Fox & Carlos Guestrin
Training Data
Feature extraction
ML model
Quality metric
ML algorithm
y ŵ
h(x) x
P(y=+1|x,ŵ) =sigmoid(ŵTh(x)) = 1 .
⌃
1 + e-ŵ h(x)T
![Page 53: C3.1.logistic intro](https://reader031.fdocuments.us/reader031/viewer/2022022203/58729b411a28ab07208b49f3/html5/thumbnails/53.jpg)
Machine Learning Specialization
Overview of learning logistic regression model
©2015-2016 Emily Fox & Carlos Guestrin
![Page 54: C3.1.logistic intro](https://reader031.fdocuments.us/reader031/viewer/2022022203/58729b411a28ab07208b49f3/html5/thumbnails/54.jpg)
Machine Learning Specialization 64
Training a classifier = Learning the coefficients
©2015-2016 Emily Fox & Carlos Guestrin
Data
(x,y) (Sentence1, ) (Sentence2, )
…
Training set
Validation set
Learn classifier
Quality
Word Coefficient Value
ŵ0 -2.0
good ŵ1 1.0
awesome ŵ2 1.7
bad ŵ3 -1.0
awful ŵ4 -3.3
… … …
P(y=+1|x,ŵ) =1 .
⌃
1 + e-ŵ h(x)T
![Page 55: C3.1.logistic intro](https://reader031.fdocuments.us/reader031/viewer/2022022203/58729b411a28ab07208b49f3/html5/thumbnails/55.jpg)
Machine Learning Specialization 66
Find “best” classifier = Maximize quality metric over all possible w0,w1,w2
©2015-2016 Emily Fox & Carlos Guestrin
#awesome
#aw
ful
0 1 2 3 4 …
0
1
2
3
4
…
ℓ(w0=1, w1=0.5, w2=-1.5) = 10-4
ℓ(w0=1, w1=1, w2=-1.5) = 10-5
ℓ(w0=0, w1=1, w2=-1.5) = 10-6
Best model: Highest likelihood ℓ(w) ŵ = (w0=1, w1=0.5, w2=-1.5)
Likelihood ℓ(w)
Find best model coefficients w with
gradient ascent!
![Page 56: C3.1.logistic intro](https://reader031.fdocuments.us/reader031/viewer/2022022203/58729b411a28ab07208b49f3/html5/thumbnails/56.jpg)
Machine Learning Specialization
Encoding categorical inputs
68 ©2015-2016 Emily Fox & Carlos Guestrin
![Page 57: C3.1.logistic intro](https://reader031.fdocuments.us/reader031/viewer/2022022203/58729b411a28ab07208b49f3/html5/thumbnails/57.jpg)
Machine Learning Specialization 69
Categorical inputs • Numeric inputs:
- #awesome, age, salary,…
- Intuitive when multiplied by coefficient • e.g., 1.5 #awesome
• Categorical inputs:
©2015-2016 Emily Fox & Carlos Guestrin
Gender (Male, Female,...)
Country of birth (Argentina, Brazil, USA,...)
Zipcode (10005, 98195,...)
Numeric value, but should be interpreted as category
(98195 not about 9x larger than 10005)
How do we multiply category by coefficient??? Must convert categorical inputs into numeric features
![Page 58: C3.1.logistic intro](https://reader031.fdocuments.us/reader031/viewer/2022022203/58729b411a28ab07208b49f3/html5/thumbnails/58.jpg)
Machine Learning Specialization 71
Encoding categories as numeric features
©2015-2016 Emily Fox & Carlos Guestrin
Country of birth (Argentina, Brazil, USA,...)
x =
196 categories
1-hot encoding x h1(x) h2(x) … h195(x) h196(x)
Brazil
Zimbabwe
196 features
10,000 words in vocabulary
Bag of words
x h1(x) h2(x) … h9999(x) h10000(x)
10,000 features
Restaurant review (Text data)
x =
![Page 59: C3.1.logistic intro](https://reader031.fdocuments.us/reader031/viewer/2022022203/58729b411a28ab07208b49f3/html5/thumbnails/59.jpg)
Machine Learning Specialization
Multiclass classification using 1 versus all
73 ©2015-2016 Emily Fox & Carlos Guestrin
![Page 60: C3.1.logistic intro](https://reader031.fdocuments.us/reader031/viewer/2022022203/58729b411a28ab07208b49f3/html5/thumbnails/60.jpg)
Machine Learning Specialization 74
Multiclass classification
©2015-2016 Emily Fox & Carlos Guestrin
Input: x Image pixels
Output: y Object in image
![Page 61: C3.1.logistic intro](https://reader031.fdocuments.us/reader031/viewer/2022022203/58729b411a28ab07208b49f3/html5/thumbnails/61.jpg)
Machine Learning Specialization 75
Multiclass classification formulation
• C possible classes: - y can be 1, 2,…, C
• N datapoints:
©2015-2016 Emily Fox & Carlos Guestrin
Data point
x[1] x[2] y
x1,y1 2 1
x2,y2 0 2
x3,y3 3 3
x4,y4 4 1
Learn:
P(y= |x) ⌃
P(y= |x) ⌃
P(y= |x) ⌃
![Page 62: C3.1.logistic intro](https://reader031.fdocuments.us/reader031/viewer/2022022203/58729b411a28ab07208b49f3/html5/thumbnails/62.jpg)
Machine Learning Specialization 77
1 versus all: Estimate using 2-class model
©2015-2016 Emily Fox & Carlos Guestrin
Predict:
Train classifier:
+1 class: points with yi= -1 class: points with yi= OR
P(y= |x) ⌃
P (y=+1|x) ⌃
P(y= |xi)= ⌃
P (y=+1|xi) ⌃
![Page 63: C3.1.logistic intro](https://reader031.fdocuments.us/reader031/viewer/2022022203/58729b411a28ab07208b49f3/html5/thumbnails/63.jpg)
Machine Learning Specialization 78
1 versus all: simple multiclass classification using C 2-class models
©2015-2016 Emily Fox & Carlos Guestrin
P(y= |xi) = ⌃
P(y= |xi) = ⌃
P(y= |xi) = ⌃
![Page 64: C3.1.logistic intro](https://reader031.fdocuments.us/reader031/viewer/2022022203/58729b411a28ab07208b49f3/html5/thumbnails/64.jpg)
Machine Learning Specialization 80
= estimate of 1 vs all model for each class
max_prob = 0; ŷ = 0
For c = 1,…,C:
If > max_prob:
ŷ = c
max_prob =
©2015-2016 Emily Fox & Carlos Guestrin
Input: xi
Multiclass training
Pc(y=+1|x) ⌃
Predict most likely class
Pc(y=+1|xi) ⌃
Pc(y=+1|xi) ⌃
![Page 65: C3.1.logistic intro](https://reader031.fdocuments.us/reader031/viewer/2022022203/58729b411a28ab07208b49f3/html5/thumbnails/65.jpg)
Machine Learning Specialization
Summary of logistic regression classifier
81 ©2015-2016 Emily Fox & Carlos Guestrin
![Page 66: C3.1.logistic intro](https://reader031.fdocuments.us/reader031/viewer/2022022203/58729b411a28ab07208b49f3/html5/thumbnails/66.jpg)
Machine Learning Specialization 83 ©2015-2016 Emily Fox & Carlos Guestrin
Training Data
Feature extraction
ML model
Quality metric
ML algorithm
y ŵ
h(x) x
P(y=+1|x,ŵ) = 1 .
⌃
1 + e-ŵ h(x)T
![Page 67: C3.1.logistic intro](https://reader031.fdocuments.us/reader031/viewer/2022022203/58729b411a28ab07208b49f3/html5/thumbnails/67.jpg)
Machine Learning Specialization 85
What you can do now… • Describe decision boundaries and linear
classifiers • Use class probability to express degree of
confidence in prediction • Define a logistic regression model • Interpret logistic regression outputs as class
probabilities • Describe impact of coefficient values on
logistic regression output • Use 1-hot encoding to represent categorical
inputs • Perform multiclass classification using the
1-versus-all approach ©2015-2016 Emily Fox & Carlos Guestrin