Maximum Entropy Models/Logistic Regression, CMSC 678, UMBC, February 7th, 2018


Maximum Entropy Models/Logistic Regression

CMSC 678, UMBC

February 7th, 2018


Announcements: Assignment 2

Due 11:59 AM, Monday 3/5

Topic: classification and neural models

Out: by Friday 2/9


Announcements: Course Project

Teams of 1-4

Three “checkins”:
• Proposal: 3/14
• Update: 4/9
• Final submission: 5/23

Some novel aspect is needed:
• Ex 1: reimplement existing technique and apply to new domain
• Ex 2: explore novel technique on existing problem
• Ex 3: explore novel task

If you run into compute issues, talk to course staff.

Options exist, e.g.,

https://aws.amazon.com/education/awseducate/


Recap from last time…

A Simple Linear Regression Model: From ERM

loss function: $\ell = (y_i - \hat{y}_i)^2$

$$\arg\min_{\mathbf{w}} \sum_i \left(y_i - \mathbf{w}^T x_i - b\right)^2 = \arg\min_{\mathbf{w}} \left(\mathbf{y} - \mathbf{X}\mathbf{w} - \mathbf{b}\right)^T \left(\mathbf{y} - \mathbf{X}\mathbf{w} - \mathbf{b}\right)$$

least squares estimation

ESL, Fig 3.1
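The least squares objective above can be sketched in a few lines of NumPy. This is a minimal illustration, not the course's reference code: the helper name `fit_least_squares` and the toy data are assumptions made up for the example, and the bias b is handled by appending a column of ones to X.

```python
import numpy as np

def fit_least_squares(X, y):
    """Return (w, b) minimizing sum_i (y_i - w^T x_i - b)^2."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    # Append a column of ones so the bias b is learned as one extra weight.
    X_aug = np.hstack([X, np.ones((X.shape[0], 1))])
    coef, *_ = np.linalg.lstsq(X_aug, y, rcond=None)
    return coef[:-1], coef[-1]

# Toy data (made up for illustration), generated exactly from y = 2x + 1.
X_toy = np.array([[0.0], [1.0], [2.0], [3.0]])
y_toy = np.array([1.0, 3.0, 5.0, 7.0])
w_hat, b_hat = fit_least_squares(X_toy, y_toy)
```

Because the toy data is noise-free, the recovered weight and bias should match the generating line up to floating-point error.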

A Simple Linear Model for Classification

$y_i = \mathbf{w}^T \mathbf{x}_i$

data point $\mathbf{x}_i$, as a vector of features

label $y_i$ (WLOG, binary {0, 1} value)

vector $\mathbf{w}$ of weights

loss function:

$$\ell = \begin{cases} 1, & y_i \mathbf{w}^T \mathbf{x}_i < 0 \\ 0, & y_i \mathbf{w}^T \mathbf{x}_i \ge 0 \end{cases} \;=\; \mathbf{1}[y_i \mathbf{w}^T \mathbf{x}_i < 0]$$

[Figure: the prediction $\hat{y}_i$ plotted against $\mathbf{w}^T \mathbf{x}_i$, with "predict 1" and "predict 0" regions; b = 0]

Loss Function Example: 0-1 Loss

$$\ell(y, \hat{y}) = \begin{cases} 0, & \text{if } y = \hat{y} \\ 1, & \text{if } y \ne \hat{y} \end{cases}$$

Problem: not differentiable wrt $\hat{y}$ (or θ)

Solution 1: is h(x) a conditional distribution p(y | x)? Use MAP

Solution 2: use a surrogate loss that approximates 0-1

Solution 3: is the data linearly separable? Perceptron can work

Classification Evaluation: Accuracy, Precision, and Recall

Accuracy: % of items correct = (TP + TN) / (TP + FP + FN + TN)

Precision: % of selected items that are correct = TP / (TP + FP)

Recall: % of correct items that are selected = TP / (TP + FN)

                            Actually Correct       Actually Incorrect
Selected/Guessed            True Positive (TP)     False Positive (FP)
Not selected/not guessed    False Negative (FN)    True Negative (TN)
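As a small sketch of the three definitions, the function below counts TP, FP, FN, and TN from paired gold and predicted labels. The function name, the toy labels, and the choice to return 0.0 when a denominator is zero are assumptions for illustration only.

```python
def classification_metrics(gold, predicted, positive=1):
    """Compute accuracy, precision, and recall from paired label lists."""
    tp = fp = fn = tn = 0
    for g, p in zip(gold, predicted):
        if p == positive and g == positive:
            tp += 1          # selected and actually correct
        elif p == positive:
            fp += 1          # selected but actually incorrect
        elif g == positive:
            fn += 1          # not selected but actually correct
        else:
            tn += 1          # not selected and actually incorrect
    total = tp + fp + fn + tn
    return {
        "accuracy": (tp + tn) / total,
        # Assumed convention: report 0.0 when nothing was selected / nothing is correct.
        "precision": tp / (tp + fp) if (tp + fp) else 0.0,
        "recall": tp / (tp + fn) if (tp + fn) else 0.0,
    }

# Toy labels (made up): 3 positives in gold, 3 items selected by the classifier.
gold = [1, 1, 1, 0, 0, 0, 0, 0]
pred = [1, 1, 0, 1, 0, 0, 0, 0]
metrics = classification_metrics(gold, pred)
```

On this toy input the counts are TP = 2, FP = 1, FN = 1, TN = 4, so accuracy is 6/8 while precision and recall are both 2/3.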

Maximum Entropy (Log-linear) Models

classify in one go

$p(x \mid y) \propto \exp(\theta \cdot f(x, y))$

Document Classification

Observed document: Three people have been fatally shot, and five people, including a mayor, were seriously wounded as a result of a Shining Path attack today against a community in Junin department, central Peruvian mountain region.

Label: ATTACK

Document Classification

Three people have been fatally shot, and five people, including a mayor, were seriously wounded as a result of a Shining Path attack today against a community in Junin department, central Peruvian mountain region.

ATTACK
• # killed:
• Type:
• Perp:

shot ATTACK

Document Classification

ATTACK

Three people have been fatally shot, and five people, including a mayor, were seriously wounded as a result of a Shining Path attack today against a community in Junin department, central Peruvian mountain region.

We need to score the different combinations.

Score and Combine Our Possibilities

score1(fatally shot, ATTACK)
score2(seriously wounded, ATTACK)
score3(Shining Path, ATTACK)
…
scorek(department, ATTACK)

COMBINE: posterior probability of ATTACK

Are all of these uncorrelated?

Score and Combine Our Possibilities

score1(fatally shot, ATTACK)
score2(seriously wounded, ATTACK)
score3(Shining Path, ATTACK)

COMBINE: posterior probability of ATTACK

Q: What are the score and combine functions for Naïve Bayes?

Scoring Our Possibilities

doc: Three people have been fatally shot, and five people, including a mayor, were seriously wounded as a result of a Shining Path attack today against a community in Junin department, central Peruvian mountain region.

score(doc, ATTACK) =
    score1(fatally shot, ATTACK)
  + score2(seriously wounded, ATTACK)
  + score3(Shining Path, ATTACK)

Maxent Modeling

doc: Three people have been fatally shot, and five people, including a mayor, were seriously wounded as a result of a Shining Path attack today against a community in Junin department, central Peruvian mountain region.

p(ATTACK | doc) ∝ SNAP(score(doc, ATTACK))

What function…

operates on any real number?

is never less than 0?

f(x) = exp(x)

Maxent Modeling

doc: Three people have been fatally shot, and five people, including a mayor, were seriously wounded as a result of a Shining Path attack today against a community in Junin department, central Peruvian mountain region.

p(ATTACK | doc) ∝ exp(score(doc, ATTACK))

p(ATTACK | doc) ∝ exp(score1(fatally shot, ATTACK) + score2(seriously wounded, ATTACK) + score3(Shining Path, ATTACK))

Learn the scores (but we’ll declare what combinations should be looked at)

p(ATTACK | doc) ∝ exp(weight1 * applies1(fatally shot, ATTACK) + weight2 * applies2(seriously wounded, ATTACK) + weight3 * applies3(Shining Path, ATTACK))

p(ATTACK | doc) = (1/Z) exp(weight1 * applies1(fatally shot, ATTACK) + weight2 * applies2(seriously wounded, ATTACK) + weight3 * applies3(Shining Path, ATTACK))

Q: How do we define Z?

Normalization for Classification

$$Z = \sum_{\text{label } x} \exp\big(\text{weight}_1 \cdot \text{applies}_1(\text{fatally shot}, x) + \text{weight}_2 \cdot \text{applies}_2(\text{seriously wounded}, x) + \text{weight}_3 \cdot \text{applies}_3(\text{Shining Path}, x) + \dots\big)$$

$p(x \mid y) \propto \exp(\theta \cdot f(x, y))$: classify doc y with label x in one go
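The normalization can be sketched directly: score every label, exponentiate, and divide by the sum Z over labels. Everything specific here is an assumption for illustration: the second label `OTHER`, the weight values, and the choice to implement `applies` as a substring test are not from the lecture.

```python
import math

# Hypothetical label set and hand-picked weights, for illustration only.
LABELS = ["ATTACK", "OTHER"]
WEIGHTS = {
    ("fatally shot", "ATTACK"): 1.5,
    ("seriously wounded", "ATTACK"): 1.0,
    ("Shining Path", "ATTACK"): 2.0,
    ("department", "OTHER"): 0.5,
}

def applies(phrase, doc):
    """Assumed feature test: 1 if the phrase occurs in the document, else 0."""
    return 1 if phrase in doc else 0

def score(doc, label):
    """Sum of weight * applies over the features declared for this label."""
    return sum(w * applies(phrase, doc)
               for (phrase, lab), w in WEIGHTS.items() if lab == label)

def posterior(doc):
    """p(label | doc) = exp(score(doc, label)) / Z, with Z summed over all labels."""
    unnormalized = {label: math.exp(score(doc, label)) for label in LABELS}
    z = sum(unnormalized.values())
    return {label: u / z for label, u in unnormalized.items()}

doc = ("Three people have been fatally shot, and five people, including a mayor, "
       "were seriously wounded as a result of a Shining Path attack today against "
       "a community in Junin department, central Peruvian mountain region.")
p = posterior(doc)
```

With these made-up weights the ATTACK score is 4.5 and the OTHER score is 0.5, so ATTACK receives most of the probability mass. A document in which no phrase applies scores 0 for every label and therefore gets a uniform distribution.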


Q: What if none of our features apply?

Guiding Principle for Log-Linear Models

“[The log-linear estimate] is the least biased estimate possible on the given information; i.e., it is maximally noncommittal with regard to missing information.” Edwin T. Jaynes, 1957

If no features apply, f = 0: exp(θ · f) = exp(θ · 0) = 1

Easier-to-write form

exp(θ1 * f1(fatally shot, ATTACK) + θ2 * f2(seriously wounded, ATTACK) + θ3 * f3(Shining Path, ATTACK))

K weights θ1, …, θK

K features f1, …, fK

exp(θ · f(doc, ATTACK))

θ: K-dimensional weight vector

f(doc, ATTACK): K-dimensional feature vector

θ · f: dot product

Log-Linear Models

$$p_\theta(x \mid y) \propto \exp(\theta^T f(x, y))$$

$f(x, y)$: feature function(s), sufficient statistics, “strength” function(s)

$\theta$: feature weights, natural parameters, distribution parameters


Log-Linear Models

$$p_\theta(x \mid y) = \frac{\exp(\theta^T f(x, y))}{\sum_{x'} \exp(\theta^T f(x', y))}$$
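A minimal NumPy sketch of this normalized form follows. The function name and the toy feature matrix are assumptions; subtracting the maximum score before exponentiating is a standard numerical trick that leaves the ratio unchanged, added here only to avoid overflow.

```python
import numpy as np

def log_linear_probs(theta, feats):
    """Return p_theta(x | y) for every candidate label x.

    feats[j] is the feature vector f(x_j, y) for the j-th candidate label.
    """
    scores = feats @ theta
    # Shifting all scores by a constant cancels in the ratio but avoids overflow.
    scores = scores - scores.max()
    unnormalized = np.exp(scores)
    return unnormalized / unnormalized.sum()

# Toy example (made up): three candidate labels, two features.
theta = np.array([800.0, -800.0])
feats = np.array([[1.0, 0.0],
                  [0.0, 1.0],
                  [0.0, 0.0]])
probs = log_linear_probs(theta, feats)
uniform = log_linear_probs(np.zeros(2), feats)
```

The large weights are chosen on purpose: exponentiating a score of 800 directly would overflow in double precision, while the shifted version stays finite. With all-zero weights every label gets the same probability.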


https://www.csee.umbc.edu/courses/graduate/678/spring18/loglin-tutorial

https://goo.gl/SsXQS2

Lesson 1

Connections to Other Techniques

Log-Linear Models
• as statistical regression: (Multinomial) logistic regression, Softmax regression

“Solution” 1: A Simple Probabilistic (Linear*) Classifier

turn responses into probabilities

loss function: $\ell = \mathbf{1}[y_i \, p(\hat{y}_i = 1 \mid x_i) < 0]$

minimize posterior 0-1 loss:

$$\arg\min_{\mathbf{w}} \sum_i \mathbb{E}_{\hat{y}_i}\big[\mathbf{1}[y_i \, p(\hat{y}_i = 1 \mid x_i) < 0]\big] = \arg\max_{\mathbf{w}} \sum_i p(\hat{y}_i = y_i \mid x_i)$$

(why MAP classifiers are reasonable)

decision rule:

$$\hat{y}_i = \begin{cases} 0, & \sigma(\mathbf{w}^T \mathbf{x}_i + b) < .5 \\ 1, & \sigma(\mathbf{w}^T \mathbf{x}_i + b) \ge .5 \end{cases}$$

Plot from https://towardsdatascience.com/multi-layer-neural-networks-with-sigmoid-function-deep-learning-for-rookies-2-bf464f09eb7f

*linear not strictly required
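The decision rule can be sketched as follows. The function names and the example weights are assumptions for illustration; the sigmoid is written in two branches only so that large negative inputs do not overflow.

```python
import math

def sigmoid(z):
    """Logistic function sigma(z) = 1 / (1 + exp(-z)), written to avoid overflow."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)

def predict(w, b, x):
    """Predict 1 if sigma(w^T x + b) >= .5, else 0."""
    z = sum(wj * xj for wj, xj in zip(w, x)) + b
    return 1 if sigmoid(z) >= 0.5 else 0

# Hypothetical weights and inputs.
w, b = [2.0, -1.0], 0.5
label_a = predict(w, b, [1.0, 1.0])    # w^T x + b = 1.5
label_b = predict(w, b, [-1.0, 1.0])   # w^T x + b = -2.5
```

Since σ(z) ≥ .5 exactly when z ≥ 0, thresholding the probability at .5 is the same as thresholding the linear response at 0.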

Connections to Other Techniques

Log-Linear Models
• as statistical regression: (Multinomial) logistic regression, Softmax regression
• based in information theory: Maximum Entropy models (MaxEnt)
• a form of: Generalized Linear Models
• viewed as: Discriminative Naïve Bayes
• to be cool today :): Very shallow (sigmoidal) neural nets

Objective = Full Likelihood?

$$\prod_i p_\theta(x_i \mid y_i) \propto \prod_i \exp(\theta^T f(x_i, y_i))$$

These values can have very small magnitude → underflow

Differentiating this product could be a pain

Log-Likelihood

$$\log \prod_i p_\theta(x_i \mid y_i) = \sum_i \log p_\theta(x_i \mid y_i) = \sum_i \left[\theta^T f(x_i, y_i) - \log Z(y_i)\right]$$

Wide range of (negative) numbers

Sums are more stable

Differentiating this becomes nicer (even though Z depends on θ)
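The underflow concern is easy to reproduce. In this sketch the per-example probabilities are hypothetical (2000 examples, each assigned probability 0.5); the product collapses to zero in double precision while the sum of logs stays an ordinary negative number.

```python
import math

# Hypothetical per-example probabilities p_theta(x_i | y_i).
probs = [0.5] * 2000

# Full likelihood: a product of many numbers below 1.
likelihood = 1.0
for p in probs:
    likelihood *= p        # eventually rounds to exactly 0.0

# Log-likelihood: a sum of logs, which stays representable.
log_likelihood = sum(math.log(p) for p in probs)
```

Here the true product is 2^-2000, far below the smallest positive double, so `likelihood` ends up as 0.0, whereas `log_likelihood` is about -1386.3.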

Log-Likelihood Gradient

Each component k is the difference between:

the total value of feature $f_k$ in the training data

and

the total value the current model $p_\theta$ thinks it computes for feature $f_k$

$$\frac{\partial}{\partial \theta_k} \sum_i \log p_\theta(x_i \mid y_i) = \sum_i f_k(x_i, y_i) - \sum_i \mathbb{E}_{x' \sim p_\theta(\cdot \mid y_i)}\left[f_k(x', y_i)\right]$$

“Moment Matching”: A1 Q2 & A1 Q3, Eq-1 (what were the feature functions?)
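The "observed minus expected" structure of the gradient can be sketched in NumPy. The array layout is an assumption made for this example: `feats[i, j]` holds the feature vector f(x_j, y_i) for candidate label j on example i, and `gold[i]` is the index of the observed label. The data is random and purely illustrative.

```python
import numpy as np

def log_likelihood(theta, feats, gold):
    """sum_i [theta^T f(x_i, y_i) - log Z(y_i)], with a stable log-sum-exp."""
    scores = feats @ theta                              # shape (N, L)
    m = scores.max(axis=1, keepdims=True)
    log_z = m[:, 0] + np.log(np.exp(scores - m).sum(axis=1))
    return float((scores[np.arange(len(gold)), gold] - log_z).sum())

def gradient(theta, feats, gold):
    """Observed feature totals minus the totals expected under p_theta."""
    scores = feats @ theta
    scores = scores - scores.max(axis=1, keepdims=True)
    probs = np.exp(scores)
    probs /= probs.sum(axis=1, keepdims=True)           # p_theta(x' | y_i)
    observed = feats[np.arange(len(gold)), gold].sum(axis=0)   # sum_i f(x_i, y_i)
    expected = np.einsum("il,ilk->k", probs, feats)             # sum_i E[f(x', y_i)]
    return observed - expected

# Random toy problem: 5 examples, 3 candidate labels, 4 features.
rng = np.random.default_rng(0)
feats = rng.normal(size=(5, 3, 4))
gold = np.array([0, 2, 1, 1, 0])
theta = rng.normal(size=4)
grad = gradient(theta, feats, gold)
```

A finite-difference comparison against `log_likelihood` is a quick sanity check that the two functions agree. When the gradient is zero, the observed and expected feature totals match, which is the moment-matching condition.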


https://www.csee.umbc.edu/courses/graduate/678/spring18/loglin-tutorial

https://goo.gl/SsXQS2

Lesson 6

Log-Likelihood Gradient Derivation

$$\nabla_\theta \sum_i \left[\theta^T f(x_i, y_i) - \log Z(y_i)\right]$$

$Z(y_i) = \sum_{x'} \exp(\theta \cdot f(x', y_i))$ depends on θ

use the (calculus) chain rule, with $g = \log$ and $h(\theta) = Z(y_i)$:

$$\frac{\partial}{\partial \theta} g(h(\theta)) = \frac{\partial g}{\partial h(\theta)} \frac{\partial h}{\partial \theta}$$

$$\nabla_\theta \sum_i \log Z(y_i) = \sum_i \sum_{x'} \frac{\exp(\theta^T f(x', y_i))}{Z(y_i)} f(x', y_i)$$

$\frac{\exp(\theta^T f(x', y_i))}{Z(y_i)}$: scalar $p(x' \mid y_i)$

$f(x', y_i)$: vector of functions

$$\nabla_\theta \sum_i \log p_\theta(x_i \mid y_i) = \sum_i f(x_i, y_i) - \sum_i \sum_{x'} \frac{\exp(\theta^T f(x', y_i))}{Z(y_i)} f(x', y_i)$$

Do we want these to fully match?

What does it mean if they do?

What if we have missing values in our data?


https://www.csee.umbc.edu/courses/graduate/678/spring18/loglin-tutorial

https://goo.gl/SsXQS2

Lesson 8---Regularization

Understanding Conditioning

$p(x \mid y) \propto \exp(\theta \cdot f(y))$

Is this a good posterior classifier? (no)
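One way to see why the answer is no: if the features depend only on the document y, every label x receives the same score, and the shared factor cancels in the normalization. The sketch below uses a hypothetical label set and an arbitrary score value to show the resulting uniform distribution.

```python
import math

def posterior(scores):
    """Normalize exp(score) over labels; scores maps label -> theta . f."""
    m = max(scores.values())
    unnormalized = {x: math.exp(s - m) for x, s in scores.items()}
    z = sum(unnormalized.values())
    return {x: u / z for x, u in unnormalized.items()}

# Hypothetical labels; theta . f(y) is one number that ignores the label x.
labels = ["ATTACK", "OTHER", "ELECTION"]
score_y_only = 3.7
p = posterior({x: score_y_only for x in labels})
```

Whatever value `score_y_only` takes, each of the three labels ends up with probability 1/3, so the model cannot prefer any label. Useful features must look at the label and the document together, as in f(x, y).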


https://www.csee.umbc.edu/courses/graduate/678/spring18/loglin-tutorial

https://goo.gl/SsXQS2

Lesson 11
