An Introduction to Probabilistic Modeling · 2014-10-29
Oliver Stegle and Karsten Borgwardt: Computational Approaches for Analysing Complex Biological Systems
An Introduction to Probabilistic Modeling
Oliver Stegle and Karsten Borgwardt
Machine Learning and Computational Biology Research Group,
Max Planck Institute for Biological Cybernetics and Max Planck Institute for Developmental Biology, Tübingen
Motivation
Why probabilistic modeling?

- Inferences from data are intrinsically uncertain.
- Probability theory: model uncertainty instead of ignoring it!
- Applications: machine learning, data mining, pattern recognition, etc.
- Goal of this part of the course:
  - Overview of probabilistic modeling
  - Key concepts
  - Focus on applications in bioinformatics

O. Stegle & K. Borgwardt, An introduction to probabilistic modeling, Tübingen
Motivation
Further reading, useful material

- Christopher M. Bishop: Pattern Recognition and Machine Learning.
  - Good background; covers most of the course material and much more.
  - Substantial parts of this tutorial borrow figures and ideas from this book.
- David J.C. MacKay: Information Theory, Inference, and Learning Algorithms.
  - Very worthwhile reading, though not quite the same degree of overlap with the lecture synopsis.
  - Freely available online.
Motivation
Lecture overview

1. An introduction to probabilistic modeling
2. Applications: linear models, hypothesis testing
3. An introduction to Gaussian processes
4. Applications: time series, model comparison
5. Applications: continued
Outline

Motivation
Prerequisites
Probability Theory
Parameter Inference for the Gaussian
Summary
Prerequisites
Key concepts: Data

- Let D denote a dataset consisting of N datapoints, D = {(x_n, y_n)}_{n=1}^N, with inputs x_n and outputs y_n.
- Typical in this course:
  - x = {x_1, ..., x_D} is multivariate, spanning D features for each observation (nodes in a graph, etc.).
  - y is univariate (fitness, expression level, etc.).
- Notation:
  - Scalars are printed as y.
  - Vectors are printed in bold: x.
  - Matrices are printed in capital bold: Σ.
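The notation above can be made concrete with a small sketch; the values below are made up purely for illustration:

```python
# Hypothetical toy dataset illustrating D = {(x_n, y_n)}_{n=1}^N:
# each input x_n is a D-dimensional feature vector, each output y_n a scalar.
dataset = [
    ([0.0, 1.0], 0.5),   # (x_1, y_1)
    ([1.0, 0.0], 1.2),   # (x_2, y_2)
    ([1.0, 1.0], 1.9),   # (x_3, y_3)
]

N = len(dataset)            # number of datapoints
D = len(dataset[0][0])      # number of features per input
inputs = [x for x, _ in dataset]
outputs = [y for _, y in dataset]
```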
Prerequisites
Key concepts: Predictions

- Observed dataset D = {(x_n, y_n)}_{n=1}^N, with inputs x_n and outputs y_n.
- Given D, what can we say about the output y* at an unseen test input x*?

[Figure: training data in the X-Y plane, with the unknown output marked "?" at the test input x*]
Prerequisites
Key concepts: Model

- Observed dataset D = {(x_n, y_n)}_{n=1}^N.
- Given D, what can we say about y* at an unseen test input x*?
- To make predictions we need to make assumptions.
- A model H encodes these assumptions and often depends on some parameters θ.
- Curve fitting: the model relates x to y,

  y = f(x | θ) = θ₀ + θ₁ · x   (example: a linear model)

[Figure: linear fit through the data, with the prediction at x*]
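The linear model above is easy to sketch in code; the parameter values and test input here are hypothetical:

```python
def f(x, theta):
    """Linear model y = theta_0 + theta_1 * x from the slide."""
    theta0, theta1 = theta
    return theta0 + theta1 * x

theta = (1.0, 2.0)          # hypothetical parameter values
x_star = 3.0                # unseen test input
y_star = f(x_star, theta)   # the model's prediction at x_star
```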
Prerequisites
Key concepts: Uncertainty

- Virtually all steps involve uncertainty:
  - Measurement uncertainty (D)
  - Parameter uncertainty (θ)
  - Uncertainty regarding the correct model (H)
- Uncertainty can occur in both inputs and outputs.
- How to represent uncertainty?

[Figure: measurement uncertainty in the X-Y data]
Probability Theory
Probability Theory
Probabilities

- Let X be a random variable, defined over a set 𝒳 or measurable space.
- P(X = x) denotes the probability that X takes value x, written p(x) for short.
- Probabilities are positive: P(X = x) ≥ 0.
- Probabilities sum to one:

  ∫_{x ∈ 𝒳} p(x) dx = 1   (continuous)      ∑_{x ∈ 𝒳} p(x) = 1   (discrete)

- Special case, no uncertainty: p(x) = δ(x − x₀) for a fixed value x₀.
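The two defining properties can be checked mechanically for a discrete distribution; the probabilities below are illustrative:

```python
# A discrete distribution over X = {'a', 'b', 'c'}: probabilities must be
# non-negative and sum to one. Values are made up for illustration.
p = {'a': 0.2, 'b': 0.5, 'c': 0.3}

assert all(v >= 0 for v in p.values())       # positivity
assert abs(sum(p.values()) - 1.0) < 1e-9     # normalization

# Degenerate "no uncertainty" case: all mass on a single value.
delta = {'a': 0.0, 'b': 1.0, 'c': 0.0}
```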
Probability Theory

Joint Probability
P(X = x_i, Y = y_j) = n_ij / N

Marginal Probability
P(X = x_i) = c_i / N

Conditional Probability
P(Y = y_j | X = x_i) = n_ij / c_i

Here n_ij counts the observations with X = x_i and Y = y_j, c_i = ∑_j n_ij, and N is the total number of observations.

(C.M. Bishop, Pattern Recognition and Machine Learning)
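These three definitions can be sketched directly from a table of counts; the counts below are made up for illustration:

```python
# Counts n_ij of jointly observing X = x_i and Y = y_j (illustrative numbers).
counts = {('x1', 'y1'): 3, ('x1', 'y2'): 1,
          ('x2', 'y1'): 2, ('x2', 'y2'): 4}
N = sum(counts.values())  # total number of observations

def joint(xi, yj):
    """P(X=xi, Y=yj) = n_ij / N"""
    return counts[(xi, yj)] / N

def marginal_x(xi):
    """P(X=xi) = c_i / N, with c_i = sum_j n_ij"""
    ci = sum(n for (x, _), n in counts.items() if x == xi)
    return ci / N

def conditional(yj, xi):
    """P(Y=yj | X=xi) = n_ij / c_i"""
    ci = sum(n for (x, _), n in counts.items() if x == xi)
    return counts[(xi, yj)] / ci
```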
Probability Theory

Product Rule
P(X = x_i, Y = y_j) = n_ij / N = (n_ij / c_i) · (c_i / N) = P(Y = y_j | X = x_i) P(X = x_i)
Sum Rule
P(X = x_i) = c_i / N = (1/N) ∑_{j=1}^{L} n_ij = ∑_j P(X = x_i, Y = y_j)

(C.M. Bishop, Pattern Recognition and Machine Learning)
Probability Theory
The Rules of Probability

Sum Rule:     p(x) = ∑_y p(x, y)
Product Rule: p(x, y) = p(y | x) p(x)
Probability Theory
The Rules of Probability

Bayes' Theorem

- Using the product rule we obtain

  p(y | x) = p(x | y) p(y) / p(x),   where   p(x) = ∑_y p(x | y) p(y)
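Bayes' theorem in action on a classic diagnostic-test setup; all the numbers here are invented for illustration:

```python
# y in {disease, healthy}; x = observing a positive test result.
prior = {'disease': 0.01, 'healthy': 0.99}
likelihood = {'disease': 0.95,   # p(positive | disease), made-up value
              'healthy': 0.05}   # p(positive | healthy), made-up value

# Evidence via the sum rule: p(x) = sum_y p(x | y) p(y)
evidence = sum(likelihood[y] * prior[y] for y in prior)

# Posterior via Bayes' theorem: p(y | x) = p(x | y) p(y) / p(x)
posterior = {y: likelihood[y] * prior[y] / evidence for y in prior}
```

Despite the accurate-looking test, the posterior probability of disease stays modest because the prior is small, which is exactly the kind of reasoning the theorem formalizes.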
Probability Theory
Bayesian probability calculus

- Bayes' rule is the basis for inference and learning.
- Assume we have a model with parameters θ, e.g. y = θ₀ + θ₁ · x.
- Goal: learn the parameters θ given data D:

  p(θ | D) = p(D | θ) p(θ) / p(D)

  posterior ∝ likelihood · prior

- Posterior: p(θ | D)
- Likelihood: p(D | θ)
- Prior: p(θ)
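A minimal sketch of this calculus: a grid approximation of the posterior over the mean of a Gaussian with known variance. The data, prior, and grid are all illustrative choices, not anything from the lecture:

```python
import math

data = [1.2, 0.8, 1.0, 1.4]
sigma2 = 1.0                                   # assumed known noise variance
grid = [i / 100 for i in range(-300, 301)]     # candidate values of mu

def log_lik(mu):
    """log p(D | mu) for Gaussian observations with variance sigma2."""
    return sum(-0.5 * math.log(2 * math.pi * sigma2)
               - (x - mu) ** 2 / (2 * sigma2) for x in data)

def log_prior(mu):
    """log of a broad N(0, 10) prior on mu (an arbitrary choice)."""
    return -0.5 * math.log(2 * math.pi * 10) - mu ** 2 / 20

unnorm = [math.exp(log_lik(mu) + log_prior(mu)) for mu in grid]
evidence = sum(unnorm)                         # grid approximation of p(D)
posterior = [u / evidence for u in unnorm]     # posterior ~ likelihood * prior
mu_map = grid[posterior.index(max(posterior))]
```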
Probability Theory
Information and Entropy

- Information is the reduction of uncertainty.
- Entropy H(X) is the quantitative description of uncertainty:
  - H(X) = 0: certainty about X.
  - H(X) is maximal if all possibilities are equally probable.
  - Uncertainty and information are additive.
- These conditions are fulfilled by the entropy function:

  H(X) = − ∑_{x ∈ 𝒳} P(X = x) log P(X = x)
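The listed conditions can be checked on small examples; the distributions below are illustrative:

```python
import math

def entropy(p, base=2):
    """H(X) = -sum_x P(x) log P(x); zero-probability terms contribute 0."""
    return -sum(px * math.log(px, base) for px in p if px > 0)

certain = [1.0, 0.0, 0.0, 0.0]   # H = 0: no uncertainty
uniform = [0.25] * 4             # H maximal: log2(4) = 2 bits
```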
Probability Theory
Definitions related to entropy and information

- Entropy is the average surprise:

  H(X) = ∑_{x ∈ 𝒳} P(X = x) · (− log P(X = x)),  where − log P(X = x) is the surprise.

- Conditional entropy:

  H(X | Y) = − ∑_{x ∈ 𝒳, y ∈ 𝒴} P(X = x, Y = y) log P(X = x | Y = y)

- Mutual information:

  I(X; Y) = H(X) − H(X | Y) = H(Y) − H(Y | X) = H(X) + H(Y) − H(X, Y)

- If X and Y are independent, p(x, y) = p(x) p(y), the mutual information vanishes.
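Mutual information via the identity I(X; Y) = H(X) + H(Y) − H(X, Y), on an illustrative joint distribution chosen so that X and Y are independent:

```python
import math

def H(p):
    """Entropy (in bits) of a distribution given as a list of probabilities."""
    return -sum(v * math.log(v, 2) for v in p if v > 0)

# Joint distribution p(x, y) as a nested list (illustrative values);
# here p(x, y) = p(x) p(y), so X and Y are independent.
joint = [[0.25, 0.25],
         [0.25, 0.25]]

px = [sum(row) for row in joint]        # marginal p(x) via the sum rule
py = [sum(col) for col in zip(*joint)]  # marginal p(y)
flat = [v for row in joint for v in row]

mi = H(px) + H(py) - H(flat)            # I(X; Y); zero under independence
```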
Probability Theory
Entropy in action: the optimal weighing problem

- Given 12 balls, all equal except for one that is lighter or heavier.
- What is the ideal weighing strategy, and how many weighings are needed to identify the odd ball?
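Without giving the strategy away, entropy provides a lower bound on the number of weighings, sketched here:

```python
import math

# There are 12 * 2 = 24 possible answers (which ball, lighter or heavier),
# and each weighing has at most 3 outcomes (left heavier, right heavier,
# balanced), so k weighings can distinguish at most 3**k cases.
outcomes = 12 * 2
k = math.ceil(math.log(outcomes) / math.log(3))  # smallest k with 3**k >= 24
```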
Probability Theory
Probability distributions

- Gaussian:

  p(x | µ, σ²) = N(x | µ, σ²) = 1/√(2πσ²) · exp(−(x − µ)²/(2σ²))

  [Figure: standard Gaussian density]

- Multivariate Gaussian:

  p(x | µ, Σ) = N(x | µ, Σ) = 1/√|2πΣ| · exp(−½ (x − µ)ᵀ Σ⁻¹ (x − µ))

  [Figure: contours of a bivariate Gaussian with Σ = ((1, 0.8), (0.8, 1))]
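The univariate density is a one-liner; evaluating it at the mean recovers the familiar peak height 1/√(2π) ≈ 0.3989 visible on the slide's plot:

```python
import math

def gauss_pdf(x, mu, sigma2):
    """Univariate Gaussian density N(x | mu, sigma^2) from the slide."""
    return math.exp(-(x - mu) ** 2 / (2 * sigma2)) / math.sqrt(2 * math.pi * sigma2)

peak = gauss_pdf(0.0, 0.0, 1.0)   # height of the standard Gaussian at x = 0
```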
Probability Theory
Probability distributions, continued

- Bernoulli:

  p(x | θ) = θˣ (1 − θ)^(1−x),  x ∈ {0, 1}

- Gamma:

  p(x | a, b) = bᵃ / Γ(a) · x^(a−1) e^(−bx)

  [Figure: Gamma density p(x | a = 1, b = 1)]
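Both densities are direct to code; note that for a = b = 1 the Gamma density reduces to the exponential e^(−x), matching the plotted p(x | a = 1, b = 1):

```python
import math

def bernoulli_pmf(x, theta):
    """p(x | theta) = theta^x (1 - theta)^(1 - x) for x in {0, 1}."""
    return theta ** x * (1 - theta) ** (1 - x)

def gamma_pdf(x, a, b):
    """p(x | a, b) = b^a / Gamma(a) * x^(a - 1) * exp(-b x)."""
    return b ** a / math.gamma(a) * x ** (a - 1) * math.exp(-b * x)
```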
Probability Theory
Probability distributions: the Gaussian revisited

- Gaussian PDF:

  N(x | µ, σ²) = 1/√(2πσ²) · exp(−(x − µ)²/(2σ²))

- Positive: N(x | µ, σ²) > 0.
- Normalized: ∫_{−∞}^{+∞} N(x | µ, σ²) dx = 1 (check).
- Expectation: ⟨x⟩ = ∫_{−∞}^{+∞} N(x | µ, σ²) x dx = µ.
- Variance: Var[x] = ⟨x²⟩ − ⟨x⟩² = µ² + σ² − µ² = σ².
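The three properties can be verified numerically with a crude Riemann sum; the particular values µ = 1, σ² = 2 are an arbitrary choice for the check:

```python
import math

def gauss_pdf(x, mu, sigma2):
    return math.exp(-(x - mu) ** 2 / (2 * sigma2)) / math.sqrt(2 * math.pi * sigma2)

mu, sigma2 = 1.0, 2.0
dx = 0.001
xs = [-20 + i * dx for i in range(int(40 / dx))]
ps = [gauss_pdf(x, mu, sigma2) for x in xs]

norm = sum(p * dx for p in ps)                   # should be close to 1
mean = sum(x * p * dx for x, p in zip(xs, ps))   # should be close to mu
var = sum((x - mean) ** 2 * p * dx
          for x, p in zip(xs, ps))               # should be close to sigma2
```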
Parameter Inference for the Gaussian
Parameter Inference for the Gaussian
Inference for the Gaussian: Ingredients

- Data:

  D = {x₁, ..., x_N}

- Model H_Gauss, the Gaussian PDF:

  N(x | µ, σ²) = 1/√(2πσ²) · exp(−(x − µ)²/(2σ²)),   θ = {µ, σ²}

- Likelihood:

  p(D | θ) = ∏_{n=1}^{N} N(x_n | µ, σ²)

  [Figure: each datapoint x_n contributes N(x_n | µ, σ²) to the likelihood; C.M. Bishop, Pattern Recognition and Machine Learning]
Parameter Inference for the Gaussian
Inference for the Gaussian: Maximum likelihood

- Likelihood:

  p(D | θ) = ∏_{n=1}^{N} N(x_n | µ, σ²)

- Maximum likelihood:

  θ̂ = argmax_θ p(D | θ)
θ̂ = argmax_θ p(D | θ) = argmax_θ ∏_{n=1}^{N} 1/√(2πσ²) · exp(−(x_n − µ)²/(2σ²))
θ̂ = argmax_θ ln p(D | θ) = argmax_θ ln ∏_{n=1}^{N} 1/√(2πσ²) · exp(−(x_n − µ)²/(2σ²))
θ̂ = argmax_θ ln p(D | θ) = argmax_θ [ −(N/2) ln(2π) − (N/2) ln σ² − (1/(2σ²)) ∑_{n=1}^{N} (x_n − µ)² ]
Setting the derivatives of the log likelihood to zero:

µ:  d/dµ ln p(D | µ) = 0        σ²:  d/dσ² ln p(D | σ²) = 0
![Page 50: An Introduction to Probabilistic modeling · 2014-10-29 · Motivation Why probabilistic modeling? I Inferences from data are intrinsicallyuncertain. I Probability theory: model uncertainty](https://reader033.fdocuments.us/reader033/viewer/2022042211/5eb22289acc03e17722823b0/html5/thumbnails/50.jpg)
Parameter Inference for the Gaussian
Inference for the GaussianMaximum likelihood
O. Stegle & K. Borgwardt An introduction to probabilistic modeling Tubingen 27
Parameter Inference for the Gaussian

Inference for the Gaussian: Maximum likelihood

- Maximum likelihood solutions:
$$\mu_{ML} = \frac{1}{N}\sum_{n=1}^{N} x_n \qquad\quad \sigma^2_{ML} = \frac{1}{N}\sum_{n=1}^{N}(x_n - \mu_{ML})^2$$
Equivalent to the common mean and variance estimators (almost).
- Maximum likelihood ignores parameter uncertainty.
- Think of the ML solution for a single observed datapoint $x_1$:
$$\mu_{ML} = x_1 \qquad\quad \sigma^2_{ML} = (x_1 - \mu_{ML})^2 = 0$$
- How about Bayesian inference?
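A minimal numerical sketch of these estimators (sample values and variable names are illustrative, not from the slides):

```python
import numpy as np

# Draw a synthetic sample from a Gaussian with known parameters.
rng = np.random.default_rng(0)
x = rng.normal(loc=2.0, scale=1.5, size=1000)

# Maximum-likelihood estimates: the sample mean and the biased (1/N) variance.
mu_ml = x.mean()
sigma2_ml = ((x - mu_ml) ** 2).mean()

# With a single observed datapoint the ML variance collapses to zero,
# illustrating that maximum likelihood ignores parameter uncertainty.
x1 = x[:1]
mu_1 = x1.mean()
sigma2_1 = ((x1 - mu_1) ** 2).mean()
assert sigma2_1 == 0.0
```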
Parameter Inference for the Gaussian

Bayesian Inference for the Gaussian: Ingredients

- Data:
$$\mathcal{D} = \{x_1, \ldots, x_N\}$$
- Model $\mathcal{H}_{Gauss}$, the Gaussian PDF with $\theta = \{\mu\}$:
$$\mathcal{N}(x \,|\, \mu, \sigma^2) = \frac{1}{\sqrt{2\pi\sigma^2}}\, e^{-\frac{1}{2\sigma^2}(x - \mu)^2}$$
For simplicity, assume the variance $\sigma^2$ is known.
- Likelihood:
$$p(\mathcal{D} \,|\, \mu) = \prod_{n=1}^{N} \mathcal{N}(x_n \,|\, \mu, \sigma^2)$$

[Figure: Gaussian density $p(x)$ with the likelihood term $\mathcal{N}(x_n \,|\, \mu, \sigma^2)$ marked at a datapoint $x_n$. (C.M. Bishop, Pattern Recognition and Machine Learning)]
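As a sketch, this likelihood can be evaluated directly in log form (variable names are assumptions, not from the slides):

```python
import numpy as np

def log_likelihood(x, mu, sigma2):
    """ln p(D | mu, sigma2) for i.i.d. Gaussian observations x."""
    n = len(x)
    return -0.5 * n * np.log(2 * np.pi * sigma2) - np.sum((x - mu) ** 2) / (2 * sigma2)

# Sanity check against the log of the product of individual densities.
x = np.array([0.2, -0.4, 1.1])
densities = np.exp(-(x - 0.0) ** 2 / 2.0) / np.sqrt(2 * np.pi)
assert np.isclose(log_likelihood(x, 0.0, 1.0), np.log(densities).sum())
```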
Parameter Inference for the Gaussian

Bayesian Inference for the Gaussian: Bayes rule

- Combine the likelihood with a Gaussian prior over $\mu$:
$$p(\mu) = \mathcal{N}(\mu \,|\, m_0, s_0^2)$$
- The posterior is proportional to:
$$p(\mu \,|\, \mathcal{D}, \sigma^2) \propto p(\mathcal{D} \,|\, \mu, \sigma^2)\, p(\mu)$$
Parameter Inference for the Gaussian

Bayesian Inference for the Gaussian

$$p(\mu \,|\, \mathcal{D}, \sigma^2) \propto p(\mathcal{D} \,|\, \mu)\, p(\mu)$$
$$= \left[\prod_{n=1}^{N} \frac{1}{\sqrt{2\pi\sigma^2}}\, e^{-\frac{1}{2\sigma^2}(x_n - \mu)^2}\right] \frac{1}{\sqrt{2\pi s_0^2}}\, e^{-\frac{1}{2 s_0^2}(\mu - m_0)^2}$$
$$= \underbrace{\left(\frac{1}{\sqrt{2\pi\sigma^2}}\right)^{\!N} \frac{1}{\sqrt{2\pi s_0^2}}}_{C_1} \exp\left[-\frac{1}{2 s_0^2}(\mu^2 - 2\mu m_0 + m_0^2) - \frac{1}{2\sigma^2}\sum_{n=1}^{N}(\mu^2 - 2\mu x_n + x_n^2)\right]$$
$$= C_2 \exp\Bigg[-\frac{1}{2}\underbrace{\left(\frac{1}{s_0^2} + \frac{N}{\sigma^2}\right)}_{1/\tilde{\sigma}^2}\Bigg(\mu^2 - 2\mu \underbrace{\tilde{\sigma}^2\left(\frac{1}{s_0^2} m_0 + \frac{1}{\sigma^2}\sum_{n=1}^{N} x_n\right)}_{\tilde{\mu}}\Bigg) + C_3\Bigg]$$

- The posterior parameters $\tilde{\mu}$ and $\tilde{\sigma}^2$ follow as the new coefficients.
- Note: all the constants dropped along the way yield the model evidence:
$$p(\mu \,|\, \mathcal{D}, \sigma^2) = \frac{p(\mathcal{D} \,|\, \mu)\, p(\mu)}{Z}$$
Parameter Inference for the Gaussian

Bayesian Inference for the Gaussian

- Posterior of the mean, after some rewriting: $p(\mu \,|\, \mathcal{D}, \sigma^2) \propto \mathcal{N}(\mu \,|\, \tilde{\mu}, \tilde{\sigma}^2)$, with
$$\tilde{\mu} = \frac{\sigma^2}{N s_0^2 + \sigma^2}\, m_0 + \frac{N s_0^2}{N s_0^2 + \sigma^2}\, \mu_{ML}, \qquad \mu_{ML} = \frac{1}{N}\sum_{n=1}^{N} x_n$$
$$\frac{1}{\tilde{\sigma}^2} = \frac{1}{s_0^2} + \frac{N}{\sigma^2}$$
- Limiting cases: with no data ($N = 0$) the posterior equals the prior, $\tilde{\mu} = m_0$ and $\tilde{\sigma}^2 = s_0^2$; with infinite data ($N \to \infty$), $\tilde{\mu} \to \mu_{ML}$ and $\tilde{\sigma}^2 \to 0$.
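These update equations can be sketched in a few lines (the function name `posterior_mean` and the test values are assumptions, not from the slides):

```python
import numpy as np

def posterior_mean(x, sigma2, m0, s02):
    """Posterior N(mu | mu_t, s2_t) for the Gaussian mean with known variance sigma2."""
    n = len(x)
    mu_ml = np.mean(x) if n > 0 else 0.0  # ML estimate; unused weight when n = 0
    mu_t = sigma2 / (n * s02 + sigma2) * m0 + n * s02 / (n * s02 + sigma2) * mu_ml
    s2_t = 1.0 / (1.0 / s02 + n / sigma2)
    return mu_t, s2_t

# N = 0: the posterior equals the prior.
assert posterior_mean([], 1.0, 0.5, 2.0) == (0.5, 2.0)

# Large N: the posterior mean approaches mu_ML and the variance shrinks towards 0.
m, s2 = posterior_mean(np.full(100_000, 3.0), 1.0, 0.0, 1.0)
assert abs(m - 3.0) < 1e-3 and s2 < 1e-4
```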
Parameter Inference for the Gaussian

Bayesian Inference for the Gaussian: Examples

- Posterior $p(\mu \,|\, \mathcal{D}, \sigma^2)$ for increasing data sizes: the posterior narrows as $N$ grows.

[Figure: posterior densities for $N = 0, 1, 2, 10$. (C.M. Bishop, Pattern Recognition and Machine Learning)]
Parameter Inference for the Gaussian

Conjugate priors

- It is not by chance that the posterior
$$p(\mu \,|\, \mathcal{D}, \sigma^2) \propto p(\mathcal{D} \,|\, \mu, \sigma^2)\, p(\mu)$$
is tractable in closed form for the Gaussian.

Conjugate prior: $p(\theta)$ is a conjugate prior for a particular likelihood $p(\mathcal{D} \,|\, \theta)$ if the posterior is of the same functional form as the prior.
Parameter Inference for the Gaussian

Conjugate priors: Exponential family distributions

- A large class of probability distributions (all of those in this course) belongs to the exponential family and can be written as
$$p(x \,|\, \boldsymbol{\theta}) = h(x)\, g(\boldsymbol{\theta})\, \exp\{\boldsymbol{\theta}^T \mathbf{u}(x)\}$$
- For example, for the Gaussian:
$$p(x \,|\, \mu, \sigma^2) = \frac{1}{\sqrt{2\pi\sigma^2}} \exp\left\{-\frac{1}{2\sigma^2}(x^2 - 2x\mu + \mu^2)\right\} = h(x)\, g(\boldsymbol{\theta})\, \exp\{\boldsymbol{\theta}^T \mathbf{u}(x)\}$$
with
$$\boldsymbol{\theta} = \begin{pmatrix} \mu/\sigma^2 \\ -1/(2\sigma^2) \end{pmatrix}, \quad h(x) = \frac{1}{\sqrt{2\pi}}, \quad \mathbf{u}(x) = \begin{pmatrix} x \\ x^2 \end{pmatrix}, \quad g(\boldsymbol{\theta}) = (-2\theta_2)^{1/2} \exp\left(\frac{\theta_1^2}{4\theta_2}\right)$$
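A quick numerical check of this factorization (the parameter values are illustrative, not from the slides):

```python
import numpy as np

mu, sigma2, x = 1.5, 0.8, 0.3

# Natural parameters and sufficient statistics of the Gaussian.
theta = np.array([mu / sigma2, -1.0 / (2 * sigma2)])
u = np.array([x, x ** 2])
h = 1.0 / np.sqrt(2 * np.pi)
g = np.sqrt(-2 * theta[1]) * np.exp(theta[0] ** 2 / (4 * theta[1]))

# The exponential-family form reproduces the standard Gaussian density.
p_expfam = h * g * np.exp(theta @ u)
p_gauss = np.exp(-(x - mu) ** 2 / (2 * sigma2)) / np.sqrt(2 * np.pi * sigma2)
assert np.isclose(p_expfam, p_gauss)
```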
Parameter Inference for the Gaussian

Conjugate priors: Exponential family distributions

Conjugacy and exponential family distributions

- For all members of the exponential family it is possible to construct a conjugate prior.
- Intuition: the exponential form ensures that we can construct a prior that keeps its functional form.
- Conjugate priors for the Gaussian $\mathcal{N}(x \,|\, \mu, \sigma^2)$:
$$p(\mu) = \mathcal{N}(\mu \,|\, m_0, s_0^2) \qquad\quad p(1/\sigma^2) = \Gamma(1/\sigma^2 \,|\, a_0, b_0)$$
Parameter Inference for the Gaussian

Bayesian Inference for the Gaussian: Sequential learning

- Bayes rule naturally lends itself to sequential learning.
- Assume multiple datasets become available one by one: $\mathcal{D}_1, \ldots, \mathcal{D}_S$:
$$p_1(\boldsymbol{\theta}) \propto p(\mathcal{D}_1 \,|\, \boldsymbol{\theta})\, p(\boldsymbol{\theta})$$
$$p_2(\boldsymbol{\theta}) \propto p(\mathcal{D}_2 \,|\, \boldsymbol{\theta})\, p_1(\boldsymbol{\theta})$$
$$\ldots$$
- Note: assuming the datasets are independent, sequential updates and a single batch learning step yield the same answer.
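A sketch of this equivalence for the Gaussian mean with known variance (function and variable names are assumptions, not from the slides):

```python
import numpy as np

def update(m, s2, x, sigma2):
    """Posterior N(mu | m', s2') after observing data batch x, given prior N(mu | m, s2)."""
    n = len(x)
    s2_new = 1.0 / (1.0 / s2 + n / sigma2)
    m_new = s2_new * (m / s2 + np.sum(x) / sigma2)
    return m_new, s2_new

rng = np.random.default_rng(1)
sigma2 = 1.0
d1, d2 = rng.normal(0.5, 1.0, 20), rng.normal(0.5, 1.0, 30)

# Sequential: prior -> D1 -> D2.
m, s2 = update(0.0, 10.0, d1, sigma2)
m, s2 = update(m, s2, d2, sigma2)

# Batch: prior -> D1 and D2 together.
m_b, s2_b = update(0.0, 10.0, np.concatenate([d1, d2]), sigma2)
assert np.allclose([m, s2], [m_b, s2_b])
```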
Summary

Outline

Motivation
Prerequisites
Probability Theory
Parameter Inference for the Gaussian
Summary
Summary

Summary

- Probability theory: the language of uncertainty.
- Key rules of probability: sum rule, product rule.
- Bayes rule forms the foundation of learning (posterior ∝ likelihood · prior).
- The entropy quantifies uncertainty.
- Parameter learning using maximum likelihood.
- Bayesian inference for the Gaussian.