Generative Models Rong Jin. Statistical Inference Training ExamplesLearning a Statistical Model ...
Date posted: 21-Dec-2015
Generative Models
Rong Jin
Statistical Inference
• Training examples: $\{x_1, x_2, \ldots, x_n\}$
• Learning a statistical model: $p(x;\theta)$
• Prediction
[Figure: histogram of heights (x-axis: height, 1.2 to 2.1; y-axis: number of people). Female: Gaussian distribution $N(\mu_1, \sigma_1)$. Male: Gaussian distribution $N(\mu_2, \sigma_2)$. Quantities to predict: Pr(male | 1.67m) and Pr(female | 1.67m).]
Statistical Inference
• Training examples: $\{x_1, x_2, \ldots, x_n\}$
• Learning a statistical model: $p(y \mid x;\theta)$
• Prediction
[Figure: histogram of heights (x-axis: height, 1.2 to 2.1; y-axis: number of people). Male: Gaussian distribution $N(\mu_1, \sigma_1)$. Female: Gaussian distribution $N(\mu_2, \sigma_2)$. Quantities to predict: Pr(male | 1.67m) and Pr(female | 1.67m).]
Probabilistic Models for Classification Problems
Apply statistical inference methods:
• Given training examples $\{(x_i, y_i)\}_{i=1}^{n}$
• Assume a parametric model $p(y \mid x; \theta)$
• Learn the model parameters $\theta$ from the training examples using the maximum likelihood approach
• The class of a new instance $x$ is predicted by $y^* = \arg\max_{y \in \mathcal{Y}} p(y \mid x; \theta)$
Maximum Likelihood Estimation (MLE)
• Given training examples $(x_1, y_1), (x_2, y_2), \ldots, (x_n, y_n)$
• Compute the log-likelihood of the data:
$$l(D_{train}) = \sum_{i=1}^{n} \log p(y_i \mid x_i; \theta)$$
• Find the parameters that maximize the log-likelihood:
$$\theta^* = \arg\max_{\theta}\, l(D_{train}) = \arg\max_{\theta} \sum_{i=1}^{n} \log p(y_i \mid x_i; \theta)$$
• In many cases the maximizer of the log-likelihood has no closed-form expression, so MLE requires numerical calculation
Generative Models
• Most probabilistic distributions are joint distributions (i.e., $p(x;\theta)$), not conditional distributions (i.e., $p(y \mid x;\theta)$)
• Using Bayes' rule, the conditional distribution is obtained from the class-conditional distribution and the class prior: $p(y \mid x;\theta) \Leftarrow \{p(x \mid y;\theta),\ p(y;\theta)\}$
$$p(y \mid x;\theta) = \frac{p(y;\theta)\, p(x \mid y;\theta)}{p(x;\theta)}$$
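As a small numeric illustration of Bayes' rule on this slide, the sketch below turns class priors and class-conditional densities into a posterior. The function name `posterior` and the example numbers are assumptions made for illustration only; they do not come from the slides.

```python
def posterior(priors, likelihoods):
    """Apply Bayes' rule: p(y|x) = p(y) p(x|y) / p(x).

    priors      -- dict mapping class label to p(y)
    likelihoods -- dict mapping class label to p(x|y), evaluated at one fixed x
    """
    # Unnormalized posterior p(y) * p(x|y) for each class.
    joint = {y: priors[y] * likelihoods[y] for y in priors}
    # Evidence p(x) is the sum of p(y) p(x|y) over all classes.
    evidence = sum(joint.values())
    return {y: joint[y] / evidence for y in joint}


# Hypothetical numbers: equal priors, class "a" explains x three times better.
post = posterior({"a": 0.5, "b": 0.5}, {"a": 0.3, "b": 0.1})
print(post)
```

Because the evidence $p(x;\theta)$ is the same for every class, it only rescales the scores; the predicted class is already determined by the products $p(y;\theta)\,p(x \mid y;\theta)$.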
Generative Models (cont'd)
• Treatment of $p(x \mid y;\theta)$
• Let $y \in \mathcal{Y} = \{1, 2, \ldots, c\}$
• Allocate a separate set of parameters for each class: $\theta \rightarrow \{\theta_1, \theta_2, \ldots, \theta_c\}$
• $p(x \mid y;\theta) \rightarrow p(x;\theta_y)$
• Data in different classes have different input patterns
Generative Models (cont'd)
• Parameter space
• Parameters for distribution: $\{\theta_1, \theta_2, \ldots, \theta_c\}$
• Class priors: $\{p(y=1), p(y=2), \ldots, p(y=c)\}$
• Learn parameters from training examples using MLE
• Compute the log-likelihood, applying Bayes' rule $p(y \mid x;\theta) = \frac{p(y;\theta)\, p(x \mid y;\theta)}{p(x;\theta)}$ to each term:
$$l(D_{train}) = \sum_{i=1}^{n} \log p(y_i \mid x_i;\theta) = \sum_{i=1}^{n} \left[\log p(x_i;\theta_{y_i}) + \log p(y_i) - \log p(x_i)\right]$$
• Search for the optimal parameters by maximizing the log-likelihood; with the $\log p(x_i)$ term dropped, the objective is
$$\max\, l(D_{train}) \rightarrow \max \sum_{i=1}^{n} \log p(y_i)\, p(x_i;\theta_{y_i})$$
Example
• Task: predict the gender of individuals based on their heights
• Given
• 100 height examples of women
• 100 height examples of men
• Assume the heights of women and men follow different Gaussian distributions
[Figure: histograms of the empirical height data for male and for female; x-axis from 1.1 to 2.0, y-axis from 0 to 40.]
Example (cont'd)
• Gaussian distribution with parameters $(\mu, \sigma)$:
$$p(x) = \frac{1}{\sqrt{2\pi\sigma^2}} \exp\left(-\frac{(x-\mu)^2}{2\sigma^2}\right)$$
• Parameter space
• Gaussian distribution for male: $(\mu_m, \sigma_m)$
• Gaussian distribution for female: $(\mu_f, \sigma_f)$
• Class priors: $p_m = p(y=\text{male})$, $p_f = p(y=\text{female})$
• Objective:
$$\max\, l(D_{train}) \rightarrow \max \sum_{i=1}^{n} \log p(y_i)\, p(x_i \mid \theta_{y_i})$$
Example (cont'd)
Given training examples $\{h^m_1, h^m_2, \ldots, h^m_{N_m};\ h^f_1, h^f_2, \ldots, h^f_{N_f}\}$ with $N = N_m + N_f$, the log-likelihood is
$$
\begin{aligned}
l &= \sum_{i=1}^{N} \left[\log p(h_i \mid y_i) + \log p(y_i)\right] \\
  &= \sum_{i=1}^{N_m} \left[\log p(h^m_i; \mu_m, \sigma_m) + \log p_m\right] + \sum_{i=1}^{N_f} \left[\log p(h^f_i; \mu_f, \sigma_f) + \log p_f\right] \\
  &= \sum_{i=1}^{N_m} \log \frac{\exp\left(-\frac{(h^m_i - \mu_m)^2}{2\sigma_m^2}\right)}{\sqrt{2\pi\sigma_m^2}} + \sum_{i=1}^{N_f} \log \frac{\exp\left(-\frac{(h^f_i - \mu_f)^2}{2\sigma_f^2}\right)}{\sqrt{2\pi\sigma_f^2}} + N_m \log p_m + N_f \log p_f
\end{aligned}
$$
Example (cont'd)
Learn a Gaussian generative model:
$$(\mu_m, \sigma_m, p_m;\ \mu_f, \sigma_f, p_f)^* = \arg\max\, l$$
$$\mu_m = \frac{1}{N_m}\sum_{i=1}^{N_m} h^m_i, \qquad \sigma_m^2 = \frac{1}{N_m}\sum_{i=1}^{N_m} (h^m_i - \mu_m)^2, \qquad p_m = \frac{N_m}{N}$$
$$\mu_f = \frac{1}{N_f}\sum_{i=1}^{N_f} h^f_i, \qquad \sigma_f^2 = \frac{1}{N_f}\sum_{i=1}^{N_f} (h^f_i - \mu_f)^2, \qquad p_f = \frac{N_f}{N}$$
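The closed-form estimates on this slide (per-class sample mean, sample variance divided by the class count, and class frequency as the prior) can be sketched in a few lines. The helper name `fit_gaussian_class` and the toy heights are hypothetical and are not the data used in the slides.

```python
import math


def fit_gaussian_class(heights, n_total):
    """Closed-form MLE for one class: mean, standard deviation, class prior."""
    n = len(heights)
    mu = sum(heights) / n
    # The MLE of the variance divides by n (not n - 1).
    var = sum((h - mu) ** 2 for h in heights) / n
    return mu, math.sqrt(var), n / n_total


# Hypothetical toy heights in meters.
male = [1.70, 1.80, 1.75, 1.85]
female = [1.60, 1.65, 1.55, 1.70]
n_total = len(male) + len(female)

mu_m, sigma_m, p_m = fit_gaussian_class(male, n_total)
mu_f, sigma_f, p_f = fit_gaussian_class(female, n_total)
print(mu_m, sigma_m, p_m)
print(mu_f, sigma_f, p_f)
```

Each class is fitted independently of the other; only the prior couples them, through the total count $N$.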
Example (cont'd)
[Figure: empirical data and fitted Gaussian distributions for male and for female; x-axis from 1.1 to 2.0, y-axis from 0 to 40.]
Example (cont'd)
Predict the gender of an individual given his/her height $h$:
$$p(\text{male} \mid h) \propto p_m\, p(h \mid \mu_m, \sigma_m) = \frac{p_m}{\sqrt{2\pi\sigma_m^2}} \exp\left(-\frac{(h - \mu_m)^2}{2\sigma_m^2}\right)$$
$$p(\text{female} \mid h) \propto p_f\, p(h \mid \mu_f, \sigma_f) = \frac{p_f}{\sqrt{2\pi\sigma_f^2}} \exp\left(-\frac{(h - \mu_f)^2}{2\sigma_f^2}\right)$$
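To turn the two unnormalized scores on this slide into an actual probability such as Pr(male | 1.67m), it is enough to divide by their sum. The sketch below does this for hypothetical parameter values; the numbers and the names `gaussian_pdf` and `posterior_male` are assumptions for illustration, not values fitted in the slides.

```python
import math


def gaussian_pdf(h, mu, sigma):
    """Density of a one-dimensional Gaussian with mean mu and std sigma."""
    return math.exp(-((h - mu) ** 2) / (2 * sigma ** 2)) / math.sqrt(2 * math.pi * sigma ** 2)


def posterior_male(h, mu_m, sigma_m, p_m, mu_f, sigma_f, p_f):
    """p(male | h) obtained by normalizing the two prior-weighted densities."""
    score_m = p_m * gaussian_pdf(h, mu_m, sigma_m)
    score_f = p_f * gaussian_pdf(h, mu_f, sigma_f)
    return score_m / (score_m + score_f)


# Hypothetical parameters (heights in meters, equal priors and spreads).
params = dict(mu_m=1.75, sigma_m=0.07, p_m=0.5, mu_f=1.62, sigma_f=0.07, p_f=0.5)
print(posterior_male(1.67, **params))
```

With these made-up parameters the query height 1.67 lies closer to the female mean, so the male posterior comes out below one half.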
Example
• Decision boundary $h^*$
• Predict female when $h < h^*$
• Predict male when $h > h^*$
• Random when $h = h^*$
• Where is the decision boundary? It depends on the ratio $p_m / p_f$
[Figure: empirical data and fitted distributions for male and for female, with the decision boundary $h^*$ marked; annotations show the boundary for the cases $p_f < p_m$ and $p_f > p_m$.]
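One way to see how the boundary moves with the priors is to solve $p_m\, p(h \mid \mu_m, \sigma_m) = p_f\, p(h \mid \mu_f, \sigma_f)$ numerically. The sketch below uses bisection between the two means and assumes $\mu_f < \mu_m$ with exactly one crossing in that interval; the parameter values and function names are hypothetical.

```python
import math


def log_score(h, mu, sigma, prior):
    """Logarithm of prior * Gaussian density at h."""
    return (math.log(prior) - 0.5 * math.log(2 * math.pi * sigma ** 2)
            - (h - mu) ** 2 / (2 * sigma ** 2))


def decision_boundary(mu_m, sigma_m, p_m, mu_f, sigma_f, p_f, tol=1e-10):
    """Find h* in [mu_f, mu_m] where both classes score equally (bisection)."""
    def diff(h):
        return log_score(h, mu_m, sigma_m, p_m) - log_score(h, mu_f, sigma_f, p_f)

    lo, hi = mu_f, mu_m
    # Assumption: female wins at mu_f and male wins at mu_m.
    if not (diff(lo) < 0 < diff(hi)):
        raise ValueError("no sign change between the two means")
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if diff(mid) < 0:
            lo = mid  # still on the female side
        else:
            hi = mid  # already on the male side
    return (lo + hi) / 2


# Hypothetical parameters: same means and spreads, different priors.
equal_priors = decision_boundary(1.75, 0.07, 0.5, 1.62, 0.07, 0.5)
more_female = decision_boundary(1.75, 0.07, 0.3, 1.62, 0.07, 0.7)
print(equal_priors, more_female)
```

With equal priors and equal spreads the boundary sits halfway between the means; raising $p_f$ pushes $h^*$ toward the male mean, so a larger range of heights is predicted female.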
Gaussian Generative Model (II)
• Inputs contain multiple features: $x = (x_1, x_2, \ldots, x_d)$
• Example
• Task: predict whether an individual is overweight based on his/her salary and the number of hours spent watching TV
• Input: (s: salary, h: hours spent watching TV)
• Output: +1 (overweight), -1 (normal)
Multi-variate Gaussian Distribution
$$p(x; \mu, \Sigma) = \frac{1}{(2\pi)^{d/2} |\Sigma|^{1/2}} \exp\left(-\frac{1}{2}(x - \mu)^T \Sigma^{-1} (x - \mu)\right)$$
• Input: $x = (x_1, x_2, \ldots, x_d)$
• Mean: $\mu = (\mu_1, \mu_2, \ldots, \mu_d)$, with $\mu_i = \frac{1}{N}\sum_{k=1}^{N} x_{i,k}$
• Covariance matrix: $\Sigma = [\sigma_{i,j}]_{d \times d}$, with
$$\sigma_{i,j} = E\left[(x_i - \mu_i)(x_j - \mu_j)\right] \approx \frac{1}{N}\sum_{k=1}^{N} (x_{i,k} - \mu_i)(x_{j,k} - \mu_j)$$
• In matrix form:
$$\Sigma = \frac{1}{N}\sum_{k=1}^{N} (x_k - \mu)(x_k - \mu)^T$$
Properties of Covariance Matrix
$$\Sigma = \frac{1}{N}\sum_{k=1}^{N} (x_k - \mu)(x_k - \mu)^T, \qquad x = (x_1, x_2, \ldots, x_d)$$
• What if the number of data points $N < d$?
• How about $a^T \Sigma a$ for any vector $a$?
• Positive semi-definite matrix, since $a^T \Sigma a = \frac{1}{N}\sum_{k=1}^{N} \left(a^T (x_k - \mu)\right)^2 \ge 0$
• Number of different elements in $\Sigma$?
Gaussian Generative Model (II)
Joint distribution $p(s,h)$ for salary (s) and hours for watching TV (h):
$$p(x; \mu, \Sigma) = \frac{1}{(2\pi)^{2/2} |\Sigma|^{1/2}} \exp\left(-\frac{1}{2}(x - \mu)^T \Sigma^{-1} (x - \mu)\right)$$
• Input: $x = (x_s, x_h)$
• Mean: $\mu = (\mu_s, \mu_h)$, with $\mu_s = \frac{1}{N}\sum_{k=1}^{N} x_{s,k}$ and $\mu_h = \frac{1}{N}\sum_{k=1}^{N} x_{h,k}$
• Covariance matrix:
$$\Sigma = \begin{pmatrix} \sigma_{s,s} & \sigma_{s,h} \\ \sigma_{h,s} & \sigma_{h,h} \end{pmatrix}$$
$$\sigma_{s,s} = \frac{1}{N}\sum_{k=1}^{N} (x_{s,k} - \mu_s)^2, \qquad \sigma_{h,h} = \frac{1}{N}\sum_{k=1}^{N} (x_{h,k} - \mu_h)^2$$
$$\sigma_{s,h} = \sigma_{h,s} = \frac{1}{N}\sum_{k=1}^{N} (x_{s,k} - \mu_s)(x_{h,k} - \mu_h)$$
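For the two-feature case on this slide, the mean, the covariance entries, and the density can all be computed by hand without a linear-algebra library. The sketch below is one way to do it; the function names and the (salary, TV hours) samples are hypothetical, and the explicit 2x2 inverse only works when the determinant is nonzero.

```python
import math


def mean_and_cov(samples):
    """MLE mean and 2x2 covariance for a list of (s, h) pairs."""
    n = len(samples)
    mu_s = sum(s for s, _ in samples) / n
    mu_h = sum(h for _, h in samples) / n
    var_s = sum((s - mu_s) ** 2 for s, _ in samples) / n
    var_h = sum((h - mu_h) ** 2 for _, h in samples) / n
    cov_sh = sum((s - mu_s) * (h - mu_h) for s, h in samples) / n
    return (mu_s, mu_h), ((var_s, cov_sh), (cov_sh, var_h))


def gaussian2d_pdf(x, mu, cov):
    """Bivariate Gaussian density; requires a non-singular covariance."""
    (a, b), (c, d) = cov
    det = a * d - b * c
    ds, dh = x[0] - mu[0], x[1] - mu[1]
    # Quadratic form (x - mu)^T cov^{-1} (x - mu) via the explicit 2x2 inverse.
    quad = (d * ds * ds - (b + c) * ds * dh + a * dh * dh) / det
    return math.exp(-0.5 * quad) / (2 * math.pi * math.sqrt(det))


# Hypothetical samples: (salary in arbitrary units, daily TV hours).
samples = [(3.0, 4.0), (5.0, 2.0), (4.0, 3.5), (6.0, 1.0), (2.0, 4.5)]
mu, cov = mean_and_cov(samples)
print(mu, cov)
print(gaussian2d_pdf((4.0, 3.0), mu, cov))
```

In a generative classifier this fit would be repeated once per class, using only the samples that carry that class label.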
Multi-variate Gaussian Generative Model
• Input with multiple input features: $x = (x_1, x_2, \ldots, x_d)$
• A multi-variate Gaussian distribution for each class:
$$p(x \mid y; \theta) \sim N(\mu_y, \Sigma_y)$$
$$p(x \mid y; \theta) = \frac{1}{(2\pi)^{d/2} |\Sigma_y|^{1/2}} \exp\left(-\frac{1}{2}(x - \mu_y)^T \Sigma_y^{-1} (x - \mu_y)\right)$$
• Overweight: $\theta_o = (\mu_o, \Sigma_o, p_o = p(y = \text{overweight}))$
• Normal: $\theta_n = (\mu_n, \Sigma_n, p_n = p(y = \text{normal}))$
Improve Multivariate Gaussian Model
• How could we improve the prediction of the model for overweight?
• Multiple modes for each class
• Introduce more attributes of individuals: location, occupation, the number of children, house, age, …
Problems with Using Multi-variate Gaussian Generative Model
$$p(x \mid y; \theta) = \frac{1}{(2\pi)^{d/2} |\Sigma_y|^{1/2}} \exp\left(-\frac{1}{2}(x - \mu_y)^T \Sigma_y^{-1} (x - \mu_y)\right)$$
• $\Sigma$ is a matrix of size $d \times d$ and contains $d(d+1)/2$ independent variables
• $d = 100$: the number of variables in $\Sigma$ is 5,050
• $d = 1000$: the number of variables in $\Sigma$ is 500,500
• A large parameter space
• $\Sigma$ can be singular
• If $N < d$
• If two features are linearly correlated
• In either case $\Sigma^{-1}$ does not exist
Problems with Using Multi-variate Gaussian Generative Model
$$p(x \mid y; \theta) = \frac{1}{(2\pi)^{d/2} |\Sigma_y|^{1/2}} \exp\left(-\frac{1}{2}(x - \mu_y)^T \Sigma_y^{-1} (x - \mu_y)\right)$$
• Diagonalize $\Sigma$:
$$\Sigma = \begin{pmatrix} \sigma_1^2 & & 0 \\ & \ddots & \\ 0 & & \sigma_d^2 \end{pmatrix}, \qquad \sigma_i^2 = \frac{1}{N}\sum_{k=1}^{N} (x_{i,k} - \mu_i)^2, \qquad \Sigma^{-1} = \begin{pmatrix} \sigma_1^{-2} & & 0 \\ & \ddots & \\ 0 & & \sigma_d^{-2} \end{pmatrix}$$
• Feature independence assumption (Naïve Bayes assumption):
$$p(x \mid y; \theta) = \frac{1}{(2\pi)^{d/2} \prod_{i=1}^{d} \sigma_i} \exp\left(-\sum_{i=1}^{d} \frac{(x_i - \mu_i)^2}{2\sigma_i^2}\right)$$
• Smooth the covariance matrix: $\Sigma \leftarrow \Sigma + \epsilon I_d$, where $\epsilon > 0$ is a smoothing parameter
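The effect of adding a small multiple of the identity can be checked on a tiny case. The sketch below builds a hypothetical 2x2 covariance matrix for two perfectly correlated features, whose determinant is zero, and shows that the smoothed matrix has a positive determinant and is therefore invertible. The symbol `eps` and the helper names are illustrative assumptions.

```python
def det2(m):
    """Determinant of a 2x2 matrix given as nested tuples."""
    (a, b), (c, d) = m
    return a * d - b * c


def smooth(cov, eps):
    """Return cov + eps * I for a 2x2 covariance matrix."""
    (a, b), (c, d) = cov
    return ((a + eps, b), (c, d + eps))


# Hypothetical covariance of two features where the second is twice the first.
cov = ((1.0, 2.0), (2.0, 4.0))
print(det2(cov))                # determinant of the raw matrix
print(det2(smooth(cov, 0.01)))  # determinant after smoothing
```

The smoothing parameter trades fidelity for stability: a larger value makes the inverse better conditioned but moves the model further from the empirical covariance.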
Overfitting Issue
• Complex model vs. insufficient training
• Example: consider a classification problem with multiple inputs
• 100 input features
• 5 classes
• 1000 training examples
• Total number of parameters for a full Gaussian model:
• 5 class priors: 5 parameters
• 5 means: 500 parameters
• 5 covariance matrices: 25,250 parameters (5 × 5,050, with $d(d+1)/2 = 5{,}050$ distinct entries each)
• Total: 25,755 parameters, so 1000 examples are insufficient training data
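The parameter count for this example can be reproduced with simple arithmetic. The sketch below counts $d(d+1)/2$ distinct entries per symmetric covariance matrix, the same count the earlier "Problems" slide uses (5,050 for $d = 100$), and compares the full model with a diagonal (Naïve Bayes) one. The function names are hypothetical.

```python
def full_gaussian_param_count(d, num_classes):
    """Return (priors, means, covariances, total) for a full Gaussian model."""
    priors = num_classes
    means = num_classes * d
    # Each symmetric d x d covariance matrix has d(d+1)/2 distinct entries.
    covariances = num_classes * d * (d + 1) // 2
    return priors, means, covariances, priors + means + covariances


def naive_bayes_param_count(d, num_classes):
    """Total for a diagonal covariance: one mean and one variance per feature."""
    return num_classes + num_classes * d + num_classes * d


print(full_gaussian_param_count(100, 5))
print(naive_bayes_param_count(100, 5))
```

The covariance term grows quadratically in the number of features, which is why it dominates the total, while the diagonal model grows only linearly.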
Model Complexity Vs. Data
[Figures: a sequence of four plots on this topic; x-axis roughly from -8 to 8, y-axis roughly from -1 to 1.]
Problems with Using Multi-variate Gaussian Generative Model
• Diagonalize $\Sigma$
• Feature independence assumption
$$
\begin{aligned}
p(x \mid y; \theta) &= \frac{1}{(2\pi)^{d/2} \prod_{i=1}^{d} \sigma_i} \exp\left(-\sum_{i=1}^{d} \frac{(x_i - \mu_i)^2}{2\sigma_i^2}\right) \\
&= \prod_{i=1}^{d} \frac{1}{\sqrt{2\pi\sigma_i^2}} \exp\left(-\frac{(x_i - \mu_i)^2}{2\sigma_i^2}\right) = \prod_{i=1}^{d} p(x_i \mid y; \theta)
\end{aligned}
$$
$$p(x_i \mid y; \theta) \sim N(\mu_i, \sigma_i)$$
Naïve Bayes Model
• In general, for any generative model, we have to estimate $p(x \mid y; \theta)$ (or, $p(x \mid \theta_y)$)
• For $x$ in a high-dimensional space, this probability is hard to estimate
• In the Naïve Bayes model, we approximate
$$p(x \mid y; \theta) \approx \prod_{i=1}^{d} p(x_i \mid y; \theta) = p(x_1 \mid y; \theta)\, p(x_2 \mid y; \theta) \cdots p(x_d \mid y; \theta)$$
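The factorization on this slide means the class-conditional log-density is just a sum of one-dimensional log-densities. The sketch below assumes each feature is Gaussian within a class (as in the preceding Gaussian slides) and picks the class with the largest log prior plus log likelihood; the class parameters and function names are hypothetical.

```python
import math


def log_gaussian(x, mu, sigma):
    """Log density of a one-dimensional Gaussian."""
    return -0.5 * math.log(2 * math.pi * sigma ** 2) - (x - mu) ** 2 / (2 * sigma ** 2)


def naive_bayes_log_likelihood(x, mus, sigmas):
    """log p(x|y) = sum_i log p(x_i|y) under the feature independence assumption."""
    return sum(log_gaussian(xi, mu, s) for xi, mu, s in zip(x, mus, sigmas))


def predict(x, class_params):
    """class_params maps label -> (prior, mus, sigmas); returns the arg max label."""
    scores = {
        label: math.log(prior) + naive_bayes_log_likelihood(x, mus, sigmas)
        for label, (prior, mus, sigmas) in class_params.items()
    }
    return max(scores, key=scores.get)


# Hypothetical per-class parameters for two features (salary, TV hours).
class_params = {
    "overweight": (0.4, [3.0, 4.0], [1.0, 1.0]),
    "normal": (0.6, [5.0, 1.5], [1.0, 1.0]),
}
print(predict([3.2, 3.8], class_params))
```

Summing logs instead of multiplying densities gives the same arg max and avoids numerical underflow when the number of features is large.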
Text Categorization
• Learn to classify text into predefined categories
• Input x: a document
• Represented by a vector of words
• Example: {(president, 10), (bush, 2), (election, 5), …}
• Output y: whether the document is about politics or not
• +1 for a political document, -1 for a non-political document
Text Classification
• A generative model for text classification (TC): $p(y \mid doc) \propto p(y)\, p(doc \mid y)$
• Parameter space
• $p(+)$ and $p(-)$
• $p(doc \mid +;\theta)$, $p(doc \mid -;\theta)$
• It is difficult to estimate both $p(doc \mid +;\theta)$ and $p(doc \mid -;\theta)$
• Typical vocabulary size ~ 100,000
• Each document is a vector of 100,000 attributes!
• Too many words in a document
• A Naïve Bayes approach
![Page 59: Generative Models Rong Jin. Statistical Inference Training ExamplesLearning a Statistical Model Prediction p(x; ) Female: Gaussian distribution N(](https://reader036.fdocuments.us/reader036/viewer/2022081514/56649d5e5503460f94a3e077/html5/thumbnails/59.jpg)
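Text Classification
A Naïve Bayes approach
For a document in which word w_i occurs t_i times:
$$doc = \{(w_1, t_1), (w_2, t_2), \ldots, (w_n, t_n)\}$$
$$p(doc \mid +) = p(w_1 \mid +)^{t_1}\, p(w_2 \mid +)^{t_2} \cdots p(w_n \mid +)^{t_n} = \prod_{i=1}^{n} p(w_i \mid +)^{t_i}$$
$$p(doc \mid -) = p(w_1 \mid -)^{t_1}\, p(w_2 \mid -)^{t_2} \cdots p(w_n \mid -)^{t_n} = \prod_{i=1}^{n} p(w_i \mid -)^{t_i}$$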
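A minimal Python sketch of the factorized document likelihood, computed in log space so that long documents do not underflow. The word probabilities and the toy document are made-up illustration values, and `log_doc_likelihood` is a hypothetical helper name that does not come from the slides.

```python
import math

def log_doc_likelihood(counts, word_probs):
    """Return log p(doc | class) = sum_i t_i * log p(w_i | class).

    counts     -- dict mapping word w_i to its count t_i in the document
    word_probs -- dict mapping word w_i to p(w_i | class)
    """
    total = 0.0
    for word, t in counts.items():
        total += t * math.log(word_probs[word])
    return total

# Hypothetical toy values, for illustration only.
word_probs_pos = {"good": 0.5, "bad": 0.1, "movie": 0.4}
doc = {"good": 2, "movie": 1}

log_p = log_doc_likelihood(doc, word_probs_pos)
p = math.exp(log_p)  # equals 0.5**2 * 0.4
```

Working with logs turns the product over words into a sum, which is also the form used when the likelihood is maximized.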
![Page 60: Generative Models Rong Jin. Statistical Inference Training ExamplesLearning a Statistical Model Prediction p(x; ) Female: Gaussian distribution N(](https://reader036.fdocuments.us/reader036/viewer/2022081514/56649d5e5503460f94a3e077/html5/thumbnails/60.jpg)
Text Classification
The original parameter space
  p(+) and p(-)
  p(doc|+;θ), p(doc|-;θ)
Parameter space after Naïve Bayes simplification
  p(+) and p(-)
  {p(w1|+), p(w2|+), …, p(wn|+)}
  {p(w1|-), p(w2|-), …, p(wn|-)}
![Page 61: Generative Models Rong Jin. Statistical Inference Training ExamplesLearning a Statistical Model Prediction p(x; ) Female: Gaussian distribution N(](https://reader036.fdocuments.us/reader036/viewer/2022081514/56649d5e5503460f94a3e077/html5/thumbnails/61.jpg)
Text Classification
Learning parameters from training examples, with N = n_+ + n_-:
$$\{d_1^+, d_2^+, \ldots, d_{n_+}^+;\; d_1^-, d_2^-, \ldots, d_{n_-}^-\}$$
Each document
$$d_i^{\pm} = \{(w_1, t_{i,1}^{\pm}), (w_2, t_{i,2}^{\pm}), \ldots, (w_n, t_{i,n}^{\pm})\}$$
Learn parameters using maximum likelihood estimation
![Page 62: Generative Models Rong Jin. Statistical Inference Training ExamplesLearning a Statistical Model Prediction p(x; ) Female: Gaussian distribution N(](https://reader036.fdocuments.us/reader036/viewer/2022081514/56649d5e5503460f94a3e077/html5/thumbnails/62.jpg)
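Text Classification
Training examples, with N = n_+ + n_-:
$$\{d_1^+, d_2^+, \ldots, d_{n_+}^+;\; d_1^-, d_2^-, \ldots, d_{n_-}^-\}$$
$$d_i^{\pm} = \{(w_1, t_{i,1}^{\pm}), (w_2, t_{i,2}^{\pm}), \ldots, (w_n, t_{i,n}^{\pm})\}$$
Log-likelihood of the training data:
$$\begin{aligned}
l &= \sum_{i=1}^{n_+} \log\big[p(+)\, p(d_i^+ \mid +)\big] + \sum_{i=1}^{n_-} \log\big[p(-)\, p(d_i^- \mid -)\big] \\
  &= \sum_{i=1}^{n_+} \log\Big[p(+) \prod_{j=1}^{n} p(w_j \mid +)^{t_{i,j}^+}\Big] + \sum_{i=1}^{n_-} \log\Big[p(-) \prod_{j=1}^{n} p(w_j \mid -)^{t_{i,j}^-}\Big] \\
  &= \sum_{i=1}^{n_+} \Big[\log p(+) + \sum_{j=1}^{n} t_{i,j}^+ \log p(w_j \mid +)\Big] + \sum_{i=1}^{n_-} \Big[\log p(-) + \sum_{j=1}^{n} t_{i,j}^- \log p(w_j \mid -)\Big]
\end{aligned}$$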
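The step from this log-likelihood to the closed-form maximum likelihood estimates can be sketched with a Lagrange multiplier. The shorthand $c_j^+ = \sum_{i=1}^{n_+} t_{i,j}^+$ (total count of word $w_j$ in the positive documents) and the multiplier $\lambda$ are introduced only for this sketch and do not appear on the slides. The negative class and the priors $p(+)$, $p(-)$ follow by the same argument.

$$\begin{aligned}
&\max_{p(w_1 \mid +), \ldots, p(w_n \mid +)} \; \sum_{j=1}^{n} c_j^+ \log p(w_j \mid +) \quad \text{subject to} \quad \sum_{j=1}^{n} p(w_j \mid +) = 1 \\
&\mathcal{L} = \sum_{j=1}^{n} c_j^+ \log p(w_j \mid +) + \lambda \Big(1 - \sum_{j=1}^{n} p(w_j \mid +)\Big) \\
&\frac{\partial \mathcal{L}}{\partial p(w_j \mid +)} = \frac{c_j^+}{p(w_j \mid +)} - \lambda = 0 \;\Rightarrow\; p(w_j \mid +) = \frac{c_j^+}{\lambda} \\
&\sum_{j=1}^{n} p(w_j \mid +) = 1 \;\Rightarrow\; \lambda = \sum_{j'=1}^{n} c_{j'}^+ \;\Rightarrow\; p(w_j \mid +) = \frac{c_j^+}{\sum_{j'=1}^{n} c_{j'}^+}
\end{aligned}$$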
![Page 63: Generative Models Rong Jin. Statistical Inference Training ExamplesLearning a Statistical Model Prediction p(x; ) Female: Gaussian distribution N(](https://reader036.fdocuments.us/reader036/viewer/2022081514/56649d5e5503460f94a3e077/html5/thumbnails/63.jpg)
![Page 64: Generative Models Rong Jin. Statistical Inference Training ExamplesLearning a Statistical Model Prediction p(x; ) Female: Gaussian distribution N(](https://reader036.fdocuments.us/reader036/viewer/2022081514/56649d5e5503460f94a3e077/html5/thumbnails/64.jpg)
![Page 65: Generative Models Rong Jin. Statistical Inference Training ExamplesLearning a Statistical Model Prediction p(x; ) Female: Gaussian distribution N(](https://reader036.fdocuments.us/reader036/viewer/2022081514/56649d5e5503460f94a3e077/html5/thumbnails/65.jpg)
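Text Classification
The optimal solution that maximizes the likelihood of training data
$$p(+) = \frac{n_+}{N}, \qquad p(-) = \frac{n_-}{N}$$
$$p(w_j \mid +) = \frac{\sum_{i=1}^{n_+} t_{i,j}^+}{\sum_{j'=1}^{n} \sum_{i=1}^{n_+} t_{i,j'}^+}, \qquad p(w_j \mid -) = \frac{\sum_{i=1}^{n_-} t_{i,j}^-}{\sum_{j'=1}^{n} \sum_{i=1}^{n_-} t_{i,j'}^-}$$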
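These estimates are just normalized counts. A small Python sketch, assuming each document is stored as a dict of word counts; the function name `fit_naive_bayes`, the data layout, and the toy corpus are all hypothetical choices made for illustration.

```python
def fit_naive_bayes(docs_pos, docs_neg, vocab):
    """Maximum likelihood estimates for a two-class Naive Bayes text model.

    docs_pos, docs_neg -- lists of documents, each a dict {word: count}
    vocab              -- list of all words w_1 .. w_n
    """
    n_pos, n_neg = len(docs_pos), len(docs_neg)
    n_total = n_pos + n_neg
    # Class priors: fraction of training documents in each class.
    priors = {"+": n_pos / n_total, "-": n_neg / n_total}

    def word_probs(docs):
        # Total count of each word over all documents of one class.
        totals = {w: sum(d.get(w, 0) for d in docs) for w in vocab}
        grand_total = sum(totals.values())
        return {w: totals[w] / grand_total for w in vocab}

    cond = {"+": word_probs(docs_pos), "-": word_probs(docs_neg)}
    return priors, cond

# Hypothetical toy corpus.
vocab = ["good", "bad", "movie"]
docs_pos = [{"good": 2, "movie": 1}, {"good": 1, "movie": 2}]
docs_neg = [{"bad": 3, "movie": 1}]

priors, cond = fit_naive_bayes(docs_pos, docs_neg, vocab)
```

In this toy corpus "bad" never occurs in a positive document, so its estimated probability under the positive class is exactly zero.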
![Page 66: Generative Models Rong Jin. Statistical Inference Training ExamplesLearning a Statistical Model Prediction p(x; ) Female: Gaussian distribution N(](https://reader036.fdocuments.us/reader036/viewer/2022081514/56649d5e5503460f94a3e077/html5/thumbnails/66.jpg)
Text Classification
Twenty Newsgroups: An Example
![Page 67: Generative Models Rong Jin. Statistical Inference Training ExamplesLearning a Statistical Model Prediction p(x; ) Female: Gaussian distribution N(](https://reader036.fdocuments.us/reader036/viewer/2022081514/56649d5e5503460f94a3e077/html5/thumbnails/67.jpg)
Text Classification
Any problems with the Naïve Bayes text classifier?
Unseen words
  If word 'w' never appears in the training documents, what is the consequence?
  If word 'w' is unseen only in the documents of one class, what is the consequence?
  Related to the overfitting problem
Any suggestions?
Solution: word class approach
  Introduce word classes T = {t1, t2, …, tm}
  Compute p(ti|+), p(ti|-)
  When w has not been seen before, replace p(w|±) with p(ti|±), where ti is the word class of w
Introducing a prior for word probabilities
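The slides do not say which prior is intended. One common choice, assumed here, is add-alpha (Laplace) smoothing, which can be read as the posterior mean under a symmetric Dirichlet prior with parameter alpha. The sketch below is in Python; `smoothed_word_probs` and the toy data are hypothetical.

```python
def smoothed_word_probs(docs, vocab, alpha=1.0):
    """Estimate p(w | class) with add-alpha smoothing (an assumed prior).

    Every word gets alpha pseudo-counts, so no word in the vocabulary
    ends up with probability zero.
    """
    totals = {w: sum(d.get(w, 0) for d in docs) for w in vocab}
    denom = sum(totals.values()) + alpha * len(vocab)
    return {w: (totals[w] + alpha) / denom for w in vocab}

# Hypothetical toy data: "good" never occurs in the negative documents.
vocab = ["good", "bad", "movie"]
docs_neg = [{"bad": 3, "movie": 1}]

probs = smoothed_word_probs(docs_neg, vocab, alpha=1.0)
```

Without smoothing, a single unseen word would drive the whole product p(doc|-) to zero, regardless of the other words in the document.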
![Page 68: Generative Models Rong Jin. Statistical Inference Training ExamplesLearning a Statistical Model Prediction p(x; ) Female: Gaussian distribution N(](https://reader036.fdocuments.us/reader036/viewer/2022081514/56649d5e5503460f94a3e077/html5/thumbnails/68.jpg)
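Naïve Bayes Model
$$p(x \mid y; \theta) \approx \prod_{i=1}^{d} p(x_i \mid y; \theta)$$
This is a terrible approximation
$$\mu = \begin{pmatrix} 0 \\ 0 \end{pmatrix},\; \Sigma = \begin{pmatrix} 2 & 1 \\ 1 & 2 \end{pmatrix} \quad\longrightarrow\quad \mu = \begin{pmatrix} 0 \\ 0 \end{pmatrix},\; \Sigma = \begin{pmatrix} 2 & 0 \\ 0 & 2 \end{pmatrix}$$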
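A small Python sketch of why the factorization loses information. It assumes the numbers on this slide describe a zero-mean two-dimensional Gaussian with covariance [[2, 1], [1, 2]] and its Naïve Bayes (diagonal) counterpart with covariance [[2, 0], [0, 2]]; that reading of the slide, and the helper name `gauss2d_pdf`, are assumptions.

```python
import math

def gauss2d_pdf(x, cov):
    """Density of a zero-mean 2-D Gaussian with covariance [[a, b], [b, c]]."""
    (a, b), (_, c) = cov
    det = a * c - b * b
    # Quadratic form x^T cov^{-1} x, using the closed-form 2x2 inverse.
    q = (c * x[0] ** 2 - 2 * b * x[0] * x[1] + a * x[1] ** 2) / det
    return math.exp(-0.5 * q) / (2 * math.pi * math.sqrt(det))

full_cov = ((2.0, 1.0), (1.0, 2.0))   # correlated attributes (assumed)
naive_cov = ((2.0, 0.0), (0.0, 2.0))  # attributes treated as independent

# Two points that differ only in whether the attributes agree in sign.
same_sign, opposite_sign = (1.0, 1.0), (1.0, -1.0)

full_same = gauss2d_pdf(same_sign, full_cov)
full_opp = gauss2d_pdf(opposite_sign, full_cov)
naive_same = gauss2d_pdf(same_sign, naive_cov)
naive_opp = gauss2d_pdf(opposite_sign, naive_cov)
```

Under the correlated model the point (1, 1) is more likely than (1, -1), while the factorized model assigns both the same density because it ignores the correlation.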
![Page 69: Generative Models Rong Jin. Statistical Inference Training ExamplesLearning a Statistical Model Prediction p(x; ) Female: Gaussian distribution N(](https://reader036.fdocuments.us/reader036/viewer/2022081514/56649d5e5503460f94a3e077/html5/thumbnails/69.jpg)
Naïve Bayes Model
Why use Naïve Bayes Model?
We are essentially interested in p(y|x;θ), not p(x|y;θ)
$$p(y \mid x; \theta) = \frac{p(y; \theta)\, p(x \mid y; \theta)}{p(x; \theta)} = \frac{p(y; \theta)\, p(x \mid y; \theta)}{\sum_{y'=1}^{c} p(y'; \theta)\, p(x \mid y'; \theta)} = \frac{1}{\sum_{y'=1}^{c} \dfrac{p(y'; \theta)\, p(x \mid y'; \theta)}{p(y; \theta)\, p(x \mid y; \theta)}}$$
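A minimal Python sketch of this posterior computation. It works with log priors and log likelihoods and normalizes with the log-sum-exp trick, which is an implementation choice assumed here rather than something stated on the slides; `posterior` and the example numbers are hypothetical.

```python
import math

def posterior(log_priors, log_likelihoods):
    """Return p(y | x) for every class y from log p(y) and log p(x | y)."""
    scores = {y: log_priors[y] + log_likelihoods[y] for y in log_priors}
    # Subtract the largest score before exponentiating to avoid underflow.
    m = max(scores.values())
    log_z = m + math.log(sum(math.exp(s - m) for s in scores.values()))
    return {y: math.exp(s - log_z) for y, s in scores.items()}

log_priors = {"+": math.log(0.5), "-": math.log(0.5)}

# Small case: likelihoods 0.1 and 0.3 give p(+ | x) = 0.1 / 0.4.
post_small = posterior(log_priors, {"+": math.log(0.1), "-": math.log(0.3)})

# Long-document case: the likelihoods underflow to 0.0 if exponentiated
# directly, but their ratio is still well defined.
post_long = posterior(log_priors, {"+": -1000.0, "-": -1001.0})
```

Only the differences between class scores matter, which is the same point the last form of the equation makes with ratios.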
![Page 70: Generative Models Rong Jin. Statistical Inference Training ExamplesLearning a Statistical Model Prediction p(x; ) Female: Gaussian distribution N(](https://reader036.fdocuments.us/reader036/viewer/2022081514/56649d5e5503460f94a3e077/html5/thumbnails/70.jpg)
![Page 71: Generative Models Rong Jin. Statistical Inference Training ExamplesLearning a Statistical Model Prediction p(x; ) Female: Gaussian distribution N(](https://reader036.fdocuments.us/reader036/viewer/2022081514/56649d5e5503460f94a3e077/html5/thumbnails/71.jpg)
![Page 72: Generative Models Rong Jin. Statistical Inference Training ExamplesLearning a Statistical Model Prediction p(x; ) Female: Gaussian distribution N(](https://reader036.fdocuments.us/reader036/viewer/2022081514/56649d5e5503460f94a3e077/html5/thumbnails/72.jpg)
Naïve Bayes Model
The key for the prediction model is not p(x|y;θ), but the ratio p(x|y;θ)/p(x|y';θ)
Although the Naïve Bayes model does a poor job of estimating p(x|y;θ), it does a reasonably good job of estimating the ratio.
![Page 73: Generative Models Rong Jin. Statistical Inference Training ExamplesLearning a Statistical Model Prediction p(x; ) Female: Gaussian distribution N(](https://reader036.fdocuments.us/reader036/viewer/2022081514/56649d5e5503460f94a3e077/html5/thumbnails/73.jpg)
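The Ratio of Likelihood for Binary Classes
Assume that both classes share the same variance
$$\begin{aligned}
\log \frac{p(y = +1)\, p(x \mid y = +1)}{p(y = -1)\, p(x \mid y = -1)}
  &= \log \frac{p(y = +1)}{p(y = -1)} + \sum_{i=1}^{d} \left[ \frac{(x_i - \mu_{i,-})^2}{2\sigma_i^2} - \frac{(x_i - \mu_{i,+})^2}{2\sigma_i^2} \right] \\
  &= \sum_{i=1}^{d} \frac{\mu_{i,+} - \mu_{i,-}}{\sigma_i^2}\, x_i + \sum_{i=1}^{d} \frac{\mu_{i,-}^2 - \mu_{i,+}^2}{2\sigma_i^2} + \log \frac{p(y = +1)}{p(y = -1)}
\end{aligned}$$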
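A quick numerical check, sketched in Python, that the log ratio really is a linear function w·x + b when the two classes share per-attribute variances. The function names and all parameter values are hypothetical; the weights follow from expanding the two squared terms.

```python
import math

def log_gauss(x, mu, var):
    """Log density of a 1-D Gaussian N(mu, var) at x."""
    return -0.5 * math.log(2 * math.pi * var) - (x - mu) ** 2 / (2 * var)

def log_ratio_direct(x, mu_pos, mu_neg, var, p_pos):
    """log [p(y=+1) p(x|y=+1)] / [p(y=-1) p(x|y=-1)] under Naive Bayes."""
    ratio = math.log(p_pos / (1.0 - p_pos))
    for xi, mp, mn, v in zip(x, mu_pos, mu_neg, var):
        ratio += log_gauss(xi, mp, v) - log_gauss(xi, mn, v)
    return ratio

def linear_form(mu_pos, mu_neg, var, p_pos):
    """Weights w and bias b such that the log ratio equals w.x + b."""
    w = [(mp - mn) / v for mp, mn, v in zip(mu_pos, mu_neg, var)]
    b = sum((mn ** 2 - mp ** 2) / (2 * v)
            for mp, mn, v in zip(mu_pos, mu_neg, var))
    b += math.log(p_pos / (1.0 - p_pos))
    return w, b

# Hypothetical parameters: two attributes, shared variance per attribute.
mu_pos, mu_neg, var, p_pos = [1.0, 2.0], [0.0, -1.0], [1.5, 0.5], 0.3
x = [0.7, -0.2]

w, b = linear_form(mu_pos, mu_neg, var, p_pos)
direct = log_ratio_direct(x, mu_pos, mu_neg, var, p_pos)
linear = sum(wi * xi for wi, xi in zip(w, x)) + b
```

The quadratic terms in x cancel only because the variance is shared; with class-specific variances the boundary would be quadratic instead.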
![Page 74: Generative Models Rong Jin. Statistical Inference Training ExamplesLearning a Statistical Model Prediction p(x; ) Female: Gaussian distribution N(](https://reader036.fdocuments.us/reader036/viewer/2022081514/56649d5e5503460f94a3e077/html5/thumbnails/74.jpg)
![Page 75: Generative Models Rong Jin. Statistical Inference Training ExamplesLearning a Statistical Model Prediction p(x; ) Female: Gaussian distribution N(](https://reader036.fdocuments.us/reader036/viewer/2022081514/56649d5e5503460f94a3e077/html5/thumbnails/75.jpg)
The Ratio of Likelihood for Binary Classes Assume that both classes share the same variance
The log ratio is linear in x, so the Gaussian generative model is a linear model
![Page 76: Generative Models Rong Jin. Statistical Inference Training ExamplesLearning a Statistical Model Prediction p(x; ) Female: Gaussian distribution N(](https://reader036.fdocuments.us/reader036/viewer/2022081514/56649d5e5503460f94a3e077/html5/thumbnails/76.jpg)
Linear Decision Boundary
Gaussian generative models == finding a linear decision boundary
Why not directly estimate the decision boundary?