Uncertainty and Probability
Uncertainty and Probability
MLAI: Week 1
Neil D. Lawrence
Department of Computer Science, University of Sheffield
29th September 2015
Outline

- Course Text
- Review: Basic Probability
Rogers and Girolami
Bishop
What is Machine Learning?

data + model = prediction

- data: observations, could be actively or passively acquired (meta-data).
- model: assumptions, based on previous experience (other data! transfer learning etc.), or beliefs about the regularities of the universe. Inductive bias.
- prediction: an action to be taken or a categorization or a quality score.
y = mx + c
Figure (pages 11–17): the straight line y = mx + c plotted against x and y (axes 0 to 5), with the intercept c and the gradient m annotated.
y = mx + c
point 1: x = 1, y = 3
3 = m + c
point 2: x = 3, y = 1
1 = 3m + c
point 3: x = 2, y = 2.5
2.5 = 2m + c
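Any two of these equations determine m and c; a quick numerical sketch (using NumPy, not part of the original slides) solves points 1 and 2 and shows that point 3 then fails to fit, i.e. the system is overdetermined:

```python
import numpy as np

# 3 = m + c and 1 = 3m + c, written as A @ [m, c] = b
A = np.array([[1.0, 1.0],
              [3.0, 1.0]])
b = np.array([3.0, 1.0])
m, c = np.linalg.solve(A, b)
print(m, c)  # m = -1.0, c = 4.0

# Point 3 (x = 2, y = 2.5) does not lie on this line:
print(m * 2 + c)  # 2.0, not 2.5
```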
From page 6 of Laplace's *A Philosophical Essay on Probabilities*:

> … "The day will come when, by study pursued through several ages, the things now concealed will appear with evidence; and posterity will be astonished that truths so clear had escaped us." Clairaut then undertook to submit to analysis the perturbations which the comet had experienced by the action of the two great planets, Jupiter and Saturn; after immense calculations he fixed its next passage at the perihelion toward the beginning of April, 1759, which was actually verified by observation. The regularity which astronomy shows us in the movements of the comets doubtless exists also in all phenomena.
>
> The curve described by a simple molecule of air or vapor is regulated in a manner just as certain as the planetary orbits; the only difference between them is that which comes from our ignorance.
>
> Probability is relative, in part to this ignorance, in part to our knowledge. We know that of three or a greater number of events a single one ought to occur; but nothing induces us to believe that one of them will occur rather than the others. In this state of indecision it is impossible for us to announce their occurrence with certainty. It is, however, probable that one of these events, chosen at will, will not occur because we see several cases equally possible which exclude its occurrence, while only a single one favors it.
>
> The theory of chance consists in reducing all the events of the same kind to a certain number of cases equally possible, that is to say, to such as we may be equally undecided about in regard to their existence, and in determining the number of cases favorable to the event whose probability is sought. The ratio of …
y = mx + c + ε
point 1: x = 1, y = 3
3 = m + c + ε1
point 2: x = 3, y = 1
1 = 3m + c + ε2
point 3: x = 2, y = 2.5
2.5 = 2m + c + ε3
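Once the noise terms ε are admitted, the three equations need not hold exactly, and a standard resolution (a sketch using NumPy's least-squares solver; the slides do not prescribe this) is to pick m and c minimising ε1² + ε2² + ε3²:

```python
import numpy as np

# Each row is [x_i, 1] so that A @ [m, c] approximates y
A = np.array([[1.0, 1.0],
              [3.0, 1.0],
              [2.0, 1.0]])
y = np.array([3.0, 1.0, 2.5])
(m, c), *_ = np.linalg.lstsq(A, y, rcond=None)
print(m, c)            # m = -1.0, c close to 4.17
print(y - A @ [m, c])  # the fitted noise values ε1, ε2, ε3
```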
Outline

- Course Text
- Review: Basic Probability
Probability Review I

- We are interested in trials which result in two random variables, X and Y, each of which has an 'outcome' denoted by x or y.
- We summarise the notation and terminology for these distributions in the following table.

| Terminology | Notation | Description |
| --- | --- | --- |
| Joint probability | P(X = x, Y = y) | 'The probability that X = x and Y = y' |
| Marginal probability | P(X = x) | 'The probability that X = x regardless of Y' |
| Conditional probability | P(X = x\|Y = y) | 'The probability that X = x given that Y = y' |

Table: The different basic probability distributions.
A Pictorial Definition of Probability
Figure: Representation of joint and conditional probabilities: N crosses total are scattered over a grid of X (1 to 6) against Y (1 to 4), with n_{Y=4} crosses in the row Y = 4, n_{X=5} in the column X = 5, and n_{X=3,Y=4} in the cell where X = 3 and Y = 4.
Different Distributions
| Terminology | Definition | Notation |
| --- | --- | --- |
| Joint probability | lim_{N→∞} n_{X=3,Y=4}/N | P(X = 3, Y = 4) |
| Marginal probability | lim_{N→∞} n_{X=5}/N | P(X = 5) |
| Conditional probability | lim_{N→∞} n_{X=3,Y=4}/n_{Y=4} | P(X = 3\|Y = 4) |

Table: Definition of probability distributions.
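These limiting ratios can be illustrated by counting over simulated trials. Below is a small sketch (not from the slides) with X uniform on 1–6 and Y uniform on 1–4, chosen so the true values are P(X = 3, Y = 4) = 1/24, P(X = 5) = 1/6 and P(X = 3|Y = 4) = 1/6:

```python
import random

random.seed(0)
N = 100_000
samples = [(random.randint(1, 6), random.randint(1, 4)) for _ in range(N)]

n_x3_y4 = sum(1 for x, y in samples if x == 3 and y == 4)
n_y4 = sum(1 for _, y in samples if y == 4)
n_x5 = sum(1 for x, _ in samples if x == 5)

print(n_x3_y4 / N)     # joint, close to 1/24
print(n_x5 / N)        # marginal, close to 1/6
print(n_x3_y4 / n_y4)  # conditional, close to 1/6
```

As N grows the ratios settle on the limiting values in the table.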
Notational Details
- Typically we should write out P(X = x, Y = y).
- In practice, we often use P(x, y).
- This looks very much like we might write a multivariate function, e.g. f(x, y) = x/y.
- For a multivariate function though, f(x, y) ≠ f(y, x).
- However P(x, y) = P(y, x) because P(X = x, Y = y) = P(Y = y, X = x).
- We now quickly review the 'rules of probability'.
Normalization
All distributions are normalized. This is clear from the fact that Σ_x n_x = N, which gives

Σ_x P(x) = (Σ_x n_x)/N = N/N = 1.

A similar result can be derived for the marginal and conditional distributions.
The Sum Rule
Ignoring the limit in our definitions:

- The marginal probability P(y) is n_y/N (ignoring the limit).
- The joint distribution P(x, y) is n_{x,y}/N.
- n_y = Σ_x n_{x,y}, so

n_y/N = Σ_x n_{x,y}/N,

in other words

P(y) = Σ_x P(x, y).

This is known as the sum rule of probability.
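A quick numerical check of the sum rule on a small made-up joint distribution (the numbers are illustrative only, not from the slides):

```python
# P(x, y) for x in {0, 1} and y in {0, 1}
P_joint = {(0, 0): 0.1, (0, 1): 0.2,
           (1, 0): 0.3, (1, 1): 0.4}

# Sum rule: P(y) = sum over x of P(x, y)
P_y = {}
for (x, y), p in P_joint.items():
    P_y[y] = P_y.get(y, 0.0) + p

print(P_y)  # P(y=0) = 0.4, P(y=1) = 0.6, summing to 1
```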
The Product Rule
- P(x|y) is n_{x,y}/n_y.
- P(x, y) is n_{x,y}/N = (n_{x,y}/n_y)(n_y/N),

or in other words

P(x, y) = P(x|y)P(y).

This is known as the product rule of probability.
Bayes’ Rule
- From the product rule,

P(y, x) = P(x, y) = P(x|y)P(y),

so

P(y|x)P(x) = P(x|y)P(y),

which leads to Bayes' rule,

P(y|x) = P(x|y)P(y)/P(x).
Bayes’ Theorem Example
- There are two barrels in front of you. Barrel One contains 20 apples and 4 oranges. Barrel Two contains 4 apples and 8 oranges. You choose a barrel at random and select a fruit. It is an apple. What is the probability that the barrel was Barrel One?
Bayes’ Theorem Example: Answer I
- We are given that:

P(F = A|B = 1) = 20/24
P(F = A|B = 2) = 4/12
P(B = 1) = 0.5
P(B = 2) = 0.5
Bayes’ Theorem Example: Answer II
- We use the sum rule to compute:

P(F = A) = P(F = A|B = 1)P(B = 1) + P(F = A|B = 2)P(B = 2)
         = 20/24 × 0.5 + 4/12 × 0.5 = 7/12

- And Bayes' theorem tells us that:

P(B = 1|F = A) = P(F = A|B = 1)P(B = 1)/P(F = A)
              = (20/24 × 0.5)/(7/12) = 5/7
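The whole calculation can be checked with exact arithmetic (a sketch using Python's `fractions` module; the variable names are mine, not the slides'):

```python
from fractions import Fraction

p_apple_given = {1: Fraction(20, 24), 2: Fraction(4, 12)}  # P(F=A | B=b)
p_barrel = {1: Fraction(1, 2), 2: Fraction(1, 2)}          # P(B=b)

# Sum rule: P(F=A) = sum over b of P(F=A | B=b) P(B=b)
p_apple = sum(p_apple_given[b] * p_barrel[b] for b in (1, 2))
print(p_apple)  # 7/12

# Bayes' rule: P(B=1 | F=A) = P(F=A | B=1) P(B=1) / P(F=A)
posterior = p_apple_given[1] * p_barrel[1] / p_apple
print(posterior)  # 5/7
```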
Reading & Exercises
Before Friday, review the example on Bayes' theorem!

- Read and understand Bishop on probability distributions: pages 12–17 (Section 1.2).
- Complete Exercise 1.3 in Bishop.
Distribution Representation
- We can represent probabilities as tables:

| y | 0 | 1 | 2 |
| --- | --- | --- | --- |
| P(y) | 0.2 | 0.5 | 0.3 |
Figure: Histogram representation of the simple distribution.
Expectations of Distributions
- Writing down the entire distribution is tedious.
- Can summarise through expectations:

⟨f(y)⟩_{P(y)} = Σ_y f(y)P(y)

- Consider:

| y | 0 | 1 | 2 |
| --- | --- | --- | --- |
| P(y) | 0.2 | 0.5 | 0.3 |

- We have ⟨y⟩_{P(y)} = 0.2 × 0 + 0.5 × 1 + 0.3 × 2 = 1.1.
- This is the first moment or mean of the distribution.
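The definition translates directly into code; a minimal sketch for the table's distribution (the function name is mine):

```python
P = {0: 0.2, 1: 0.5, 2: 0.3}

def expectation(f, P):
    """Expectation of f(y) under P(y): the sum of f(y) * P(y)."""
    return sum(f(y) * p for y, p in P.items())

mean = expectation(lambda y: y, P)
print(mean)  # first moment (mean): 1.1, up to floating point
```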
Figure: Histogram representation of the simple distribution including the expectation of y (red line), the mean of the distribution.
Variance and Standard Deviation
- Mean gives us the centre of the distribution.
- Consider:

| y | 0 | 1 | 2 |
| --- | --- | --- | --- |
| y² | 0 | 1 | 4 |
| P(y) | 0.2 | 0.5 | 0.3 |

- Second moment is ⟨y²⟩_{P(y)} = 0.2 × 0 + 0.5 × 1 + 0.3 × 4 = 1.7.
- Variance is ⟨y²⟩ − ⟨y⟩² = 1.7 − 1.1 × 1.1 = 0.49.
- Standard deviation is the square root of the variance.
- Standard deviation gives us the "width" of the distribution.
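The same table gives the second moment, variance and standard deviation; a short sketch of the calculation:

```python
P = {0: 0.2, 1: 0.5, 2: 0.3}

mean = sum(y * p for y, p in P.items())              # first moment, 1.1
second_moment = sum(y**2 * p for y, p in P.items())  # 1.7
variance = second_moment - mean**2                   # 0.49
std = variance ** 0.5                                # 0.7
print(variance, std)
```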
Figure: Histogram representation of the simple distribution including lines at one standard deviation from the mean of the distribution (magenta lines).
Expectation Computation Example
- Consider the following distribution:

| y | 1 | 2 | 3 | 4 |
| --- | --- | --- | --- | --- |
| P(y) | 0.3 | 0.2 | 0.1 | 0.4 |

- What is the mean of the distribution?
- What is the standard deviation of the distribution?
- Are the mean and standard deviation representative of the distribution form?
- What is the expected value of − log P(y)?
Expectations Example: Answer
- We are given that:

| y | 1 | 2 | 3 | 4 |
| --- | --- | --- | --- | --- |
| P(y) | 0.3 | 0.2 | 0.1 | 0.4 |
| y² | 1 | 4 | 9 | 16 |
| − log(P(y)) | 1.204 | 1.609 | 2.302 | 0.916 |

- Mean: 1 × 0.3 + 2 × 0.2 + 3 × 0.1 + 4 × 0.4 = 2.6
- Second moment: 1 × 0.3 + 4 × 0.2 + 9 × 0.1 + 16 × 0.4 = 8.4
- Variance: 8.4 − 2.6 × 2.6 = 1.64
- Standard deviation: √1.64 = 1.2806
- Expectation of − log(P(y)): 0.3 × 1.204 + 0.2 × 1.609 + 0.1 × 2.302 + 0.4 × 0.916 = 1.280
Sample Based Approximation Example
- You are given the following samples of students' heights:

| i | 1 | 2 | 3 | 4 | 5 | 6 |
| --- | --- | --- | --- | --- | --- | --- |
| yᵢ | 1.76 | 1.73 | 1.79 | 1.81 | 1.85 | 1.80 |

- What is the sample mean?
- What is the sample variance?
- Can you compute a sample-based approximation to the expected value of − log P(y)?
- Actually these "data" were sampled from a Gaussian with mean 1.7 and standard deviation 0.15. Are your estimates close to the real values? If not, why not?
Sample Based Approximation Example: Answer
- We can compute:

| i | 1 | 2 | 3 | 4 | 5 | 6 |
| --- | --- | --- | --- | --- | --- | --- |
| yᵢ | 1.76 | 1.73 | 1.79 | 1.81 | 1.85 | 1.80 |
| yᵢ² | 3.0976 | 2.9929 | 3.2041 | 3.2761 | 3.4225 | 3.2400 |

- Mean: (1.76 + 1.73 + 1.79 + 1.81 + 1.85 + 1.80)/6 = 1.79
- Second moment: (3.0976 + 2.9929 + 3.2041 + 3.2761 + 3.4225 + 3.2400)/6 = 3.2055
- Variance: 3.2055 − 1.79 × 1.79 = 1.43 × 10⁻³
- Standard deviation: 0.0379
- No, you can't compute the expectation of − log P(y): you don't have access to P(y) directly.
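The sample-based calculations can be reproduced directly:

```python
y = [1.76, 1.73, 1.79, 1.81, 1.85, 1.80]
n = len(y)

mean = sum(y) / n                         # 1.79
second_moment = sum(v**2 for v in y) / n  # about 3.2055
variance = second_moment - mean**2        # about 1.43e-3
std = variance ** 0.5                     # about 0.0379
print(mean, variance, std)
```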
Reading
- See the probability review at the end of the slides for reminders.
- Read and understand Rogers and Girolami on:
  1. Section 2.2 (pg 41–53).
  2. Section 2.4 (pg 55–58).
  3. Section 2.5.1 (pg 58–60).
  4. Section 2.5.3 (pg 61–62).
- For other material in Bishop read:
  1. Probability densities: Section 1.2.1 (pages 17–19).
  2. Expectations and covariances: Section 1.2.2 (pages 19–20).
  3. The Gaussian density: Section 1.2.4 (pages 24–28) (don't worry about material on bias).
  4. For material on information theory and KL divergence try Sections 1.6 & 1.6.1 of Bishop (pg 48 onwards).
- If you are unfamiliar with probabilities you should complete the following exercises:
  1. Bishop Exercise 1.7
  2. Bishop Exercise 1.8
  3. Bishop Exercise 1.9
References

C. M. Bishop. *Pattern Recognition and Machine Learning*. Springer-Verlag, 2006.

P. S. Laplace. *Essai philosophique sur les probabilités*. Courcier, Paris, 2nd edition, 1814. Sixth edition of 1840 translated and reprinted (1951) as *A Philosophical Essay on Probabilities*, New York: Dover; fifth edition of 1825 reprinted 1986 with notes by Bernard Bru, Paris: Christian Bourgois Éditeur; translated by Andrew Dale (1995) as *Philosophical Essay on Probabilities*, New York: Springer-Verlag.

S. Rogers and M. Girolami. *A First Course in Machine Learning*. CRC Press, 2011.