Lecture 4 Probability and what it has to do with data analysis.
Lecture 4
Probability and what it has to do with data analysis
Please Read
Doug Martinson’s
Chapter 2: ‘Probability Theory’
Available on Courseworks
Abstraction
Random variable, x
it has no set value, until you ‘realize’ it
its properties are described by a distribution, p(x)
When you realize x
the probability that the value you get is
between x and x+dx
is p(x) dx
Probability density distribution
The probability, P, that the value you get is between x1 and x2 is

P = ∫_{x1}^{x2} p(x) dx

Note that it is written with a capital P, and is a number between
0 = never
and
1 = always
[Figure: p(x) vs x, with the area under the curve between x1 and x2 shaded. The probability P that x is between x1 and x2 is proportional to this area.]
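As a sketch of this idea (not from the lecture), the area under p(x) can be estimated numerically; here a standard normal density stands in for an example p(x):

```python
import numpy as np

# Estimate P(x1 <= x <= x2) as the area under p(x),
# using a standard normal density as the example p(x).
def p(x):
    return np.exp(-x**2 / 2) / np.sqrt(2 * np.pi)

x1, x2 = -1.0, 1.0
x = np.linspace(x1, x2, 10001)
# trapezoid rule for the area between x1 and x2
P = np.sum((p(x[:-1]) + p(x[1:])) / 2 * np.diff(x))
print(round(P, 4))  # ~0.6827 for the standard normal within one sigma
```

The same recipe works for any p(x) you can evaluate on a grid.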
The probability that the value you get is something is unity:

∫_{-∞}^{+∞} p(x) dx = 1

(or whatever the allowable range of x is ...)

[Figure: p(x) vs x. The probability that x is between -∞ and +∞ is unity, so the total area under p(x) is 1.]
Why all this is relevant …
Any measurement that contains noise is treated as a random variable, x
The distribution p(x) embodies both the ‘true value’ of the quantity being measured and the measurement noise
All quantities derived from a random variable are themselves random variables, so …
The algebra of random variables allows you to understand how measurement noise affects inferences made from the data
Basic Description of Distributions
Mode
The x at which the distribution has its peak: the most likely value of x.

[Figure: p(x) vs x, with the peak marked at x = x_mode.]
But modes can be deceptive ...

100 realizations of x:

| x range | N  |
|---------|----|
| 0-1     | 3  |
| 1-2     | 18 |
| 2-3     | 11 |
| 3-4     | 8  |
| 4-5     | 11 |
| 5-6     | 14 |
| 6-7     | 8  |
| 7-8     | 7  |
| 8-9     | 11 |
| 9-10    | 9  |

Sure, the 1-2 range has the most counts, but most of the measurements are bigger than 2!
Median
50% chance x is smaller than x_median, 50% chance x is bigger than x_median.
No special reason the median needs to coincide with the peak.

[Figure: p(x) vs x, with the area split 50%/50% at x = x_median.]
Expected value or 'mean'
The x you would get if you took the mean of lots of realizations of x.

Let's examine a discrete distribution, for simplicity ...
Hypothetical table of 140 realizations of x:

| x     | N   |
|-------|-----|
| 1     | 20  |
| 2     | 80  |
| 3     | 40  |
| Total | 140 |

mean = [ 20×1 + 80×2 + 40×3 ] / 140
     = (20/140)×1 + (80/140)×2 + (40/140)×3
     = p(1)×1 + p(2)×2 + p(3)×3
     = Σi p(xi) xi
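The table's arithmetic can be sketched directly as a probability-weighted sum:

```python
# The mean as a probability-weighted sum, using the
# hypothetical table of 140 realizations above.
x_vals = [1, 2, 3]
counts = [20, 80, 40]
total = sum(counts)                    # 140
probs = [n / total for n in counts]    # p(x_i) = N_i / N
mean = sum(p * x for p, x in zip(probs, x_vals))
print(mean)  # same as (20*1 + 80*2 + 40*3) / 140 = 300/140
```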
By analogy, for a smooth distribution, the expected value of x is

E(x) = ∫_{-∞}^{+∞} x p(x) dx
By the way ... you can compute the expected ("mean") value of any function of x this way:

E(x) = ∫_{-∞}^{+∞} x p(x) dx
E(x²) = ∫_{-∞}^{+∞} x² p(x) dx

and in general E(f(x)) = ∫_{-∞}^{+∞} f(x) p(x) dx, etc.
Beware!

E(x²) ≠ [E(x)]²

and in general E(f(x)) ≠ f(E(x)), and so forth ...
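A quick numerical sketch of this warning (the normal distribution with mean 1 and standard deviation 2 is an assumption chosen for illustration):

```python
import numpy as np

# E(x^2) is not [E(x)]^2; their difference is the variance.
rng = np.random.default_rng(0)
x = rng.normal(loc=1.0, scale=2.0, size=200_000)  # mean 1, std 2

Ex = x.mean()        # ~1
Ex2 = (x**2).mean()  # ~5, since E(x^2) = variance + mean^2 = 4 + 1
print(Ex2 - Ex**2)   # ~4, the variance, not 0
```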
Width of a distribution
Here's a perfectly sensible way to define the width of a distribution: the interval W50 that contains the central 50% of the probability, with 25% of the area on either side ... it's not used much, though.

[Figure: p(x) vs x, with a central area of 50% spanning width W50, flanked by 25% on each side.]
Width of a distribution
Here's another way: multiply p(x) by the parabola [x − E(x)]² and integrate.

[Figure: p(x) vs x, with the parabola [x − E(x)]² centered at E(x).]
Variance: σ² = ∫_{-∞}^{+∞} [x − E(x)]² p(x) dx

[Figure: p(x), the parabola [x − E(x)]², and their product [x − E(x)]² p(x); the total area under the product is the variance.]

Compute this total area. The idea is that if the distribution is narrow, then most of the probability lines up with the low spot of the parabola. But if it is wide, then some of the probability lines up with the high parts of the parabola.
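As a numerical sketch of the variance integral (the triangular density on [0, 2] is an assumption for illustration, not from the slides):

```python
import numpy as np

# Evaluate var = integral of [x - E(x)]^2 p(x) dx on a grid,
# using a triangular density on [0, 2] whose variance is known (1/6).
x = np.linspace(0.0, 2.0, 20001)
p = 1.0 - np.abs(x - 1.0)          # triangular p(x); total area = 1
dx = x[1] - x[0]
Ex = np.sum(x * p) * dx            # E(x) = 1 by symmetry
var = np.sum((x - Ex)**2 * p) * dx
print(var)  # ~1/6: a narrow peak gives a small variance
```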
Variance = σ²: a measure of width ... though we don't immediately know its relationship to area.

[Figure: p(x) vs x, with a width of order σ marked around E(x).]
The Gaussian or normal distribution

p(x) = (1 / (√(2π) σ)) exp{ −(x − x̄)² / (2σ²) }

where x̄ is the expected value and σ² is the variance.

Memorize me!
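A sketch that checks the formula numerically: the density should integrate to 1 and have expected value x̄ (the grid and parameters are illustrative choices):

```python
import numpy as np

# The normal density, checked for unit area and for an
# expected value that matches its mean parameter.
def normal_pdf(x, mean, sigma):
    return np.exp(-(x - mean)**2 / (2 * sigma**2)) / (np.sqrt(2 * np.pi) * sigma)

x = np.linspace(-10.0, 10.0, 40001)
p = normal_pdf(x, mean=1.0, sigma=1.0)
dx = x[1] - x[0]
area = np.sum(p) * dx       # ~1
Ex = np.sum(x * p) * dx     # ~1, the mean parameter
print(area, Ex)
```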
Examples of Normal Distributions

[Figure: two normal curves p(x) vs x, one with x̄ = 1, σ = 1, and one with x̄ = 3, σ = 0.5.]
Properties of the normal distribution

Expectation = Median = Mode = x̄

95% of the probability lies within ±2σ of the expected value.

[Figure: normal p(x) vs x, with 95% of the area between x̄ − 2σ and x̄ + 2σ.]
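The 2σ rule can be checked numerically; a sketch with the standard normal:

```python
import numpy as np

# Check that ~95% of the probability of a normal distribution
# lies within 2 sigma of the expected value.
mu, sigma = 0.0, 1.0
x = np.linspace(mu - 2 * sigma, mu + 2 * sigma, 20001)
p = np.exp(-(x - mu)**2 / (2 * sigma**2)) / (np.sqrt(2 * np.pi) * sigma)
P = np.sum((p[:-1] + p[1:]) / 2) * (x[1] - x[0])  # trapezoid rule
print(P)  # ~0.9545
```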
Functions of a random variable
any function of a random variable is itself a random variable
If x has distribution p(x), then y(x) has distribution

p(y) = p[x(y)] |dx/dy|
This follows from the rule for transforming integrals ...

1 = ∫_{x1}^{x2} p(x) dx = ∫_{y1}^{y2} p[x(y)] |dx/dy| dy

with the limits chosen so that y1 = y(x1), etc.
Example

Let x have a uniform (white) distribution on [0, 1]:

p(x) = 1 for 0 ≤ x ≤ 1

Uniform probability that x is anywhere between 0 and 1.
Let y = x², so that x = y^½ with

y(x=0) = 0, y(x=1) = 1, and dx/dy = ½ y^(−½)

Since p[x(y)] = 1,

p(y) = ½ y^(−½) on the interval [0, 1]
Numerical test: histogram of 1000 random numbers

Histogram of x, generated with Excel's rand() function, which claims to be based upon a uniform distribution: plausible that it's uniform.

Histogram of x², generated by squaring the x's from above: plausible that it's proportional to 1/√y.
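The same experiment can be repeated in Python (a sketch; numpy's uniform generator stands in for Excel's rand()):

```python
import numpy as np

# x uniform on [0, 1]; y = x^2 should follow p(y) = 0.5 * y**(-0.5).
rng = np.random.default_rng(1)
x = rng.uniform(0.0, 1.0, size=500_000)
y = x**2

# The CDF of y is the integral of 0.5 * y**(-0.5) from 0 to a,
# which is sqrt(a); so half of the y values should fall below 0.25.
frac = np.mean(y < 0.25)
print(frac)  # ~0.5
```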
multivariate distributions
Example

Liberty Island is inhabited by both pigeons and seagulls.
40% of the birds are pigeons and 60% of the birds are gulls.
50% of pigeons are white and 50% are tan; 100% of gulls are white.
Two variables:
species s takes two values: pigeon p and gull g
color c takes two values: white w and tan t

Of 100 birds,
20 are white pigeons
20 are tan pigeons
60 are white gulls
0 are tan gulls
What is the probability that a (random) bird has species s and color c?

| P(s,c) | c = w | c = t |
|--------|-------|-------|
| s = p  | 20%   | 20%   |
| s = g  | 60%   | 0%    |

Note: the sum of all boxes is 100%.
This is called the Joint Probability and is written P(s,c).
Two continuous variables, say x1 and x2, have a joint probability distribution, written p(x1, x2), with

∫∫ p(x1, x2) dx1 dx2 = 1
The probability that x1 is between x1 and x1+dx1, and x2 is between x2 and x2+dx2, is p(x1, x2) dx1 dx2, so

∫∫ p(x1, x2) dx1 dx2 = 1
You would contour a joint probability distribution, and it would look something like:

[Figure: contour plot of p(x1, x2) in the (x1, x2) plane.]
What is the probability that a bird has color c?

Start with P(s,c):

| P(s,c) | c = w | c = t |
|--------|-------|-------|
| s = p  | 20%   | 20%   |
| s = g  | 60%   | 0%    |

and sum the columns to get P(c):

| P(c) | c = w | c = t |
|------|-------|-------|
|      | 80%   | 20%   |
What is the probability that a bird has species s?

Start with P(s,c):

| P(s,c) | c = w | c = t |
|--------|-------|-------|
| s = p  | 20%   | 20%   |
| s = g  | 60%   | 0%    |

and sum the rows to get P(s):

| P(s)  |     |
|-------|-----|
| s = p | 40% |
| s = g | 60% |
These operations make sense with distributions, too:

p(x1) = ∫ p(x1,x2) dx2   (the distribution of x1, irrespective of x2)

p(x2) = ∫ p(x1,x2) dx1   (the distribution of x2, irrespective of x1)

[Figure: the joint distribution p(x1,x2), with its two marginal distributions p(x1) and p(x2).]
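For the discrete bird table, these marginals are just row and column sums; a sketch:

```python
import numpy as np

# Marginals as row and column sums of the bird joint table.
# Rows are species (pigeon, gull); columns are color (white, tan).
P_sc = np.array([[0.20, 0.20],
                 [0.60, 0.00]])

P_c = P_sc.sum(axis=0)  # sum down each column -> P(c)
P_s = P_sc.sum(axis=1)  # sum across each row  -> P(s)
print(P_c)  # [0.8 0.2]
print(P_s)  # [0.4 0.6]
```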
Given that a bird is species s, what is the probability that it has color c?

| P(c\|s) | c = w | c = t |
|---------|-------|-------|
| s = p   | 50%   | 50%   |
| s = g   | 100%  | 0%    |

Note: all rows sum to 100%.
This is called the Conditional Probability of c given s, and is written P(c|s).

Similarly ...
Given that a bird is color c, what is the probability that it has species s?

| P(s\|c) | c = w | c = t |
|---------|-------|-------|
| s = p   | 25%   | 100%  |
| s = g   | 75%   | 0%    |

Note: all columns sum to 100%. So 25% of white birds are pigeons.
This is called the Conditional Probability of s given c, and is written P(s|c).
Beware! P(c|s) ≠ P(s|c)

| P(c\|s) | c = w | c = t |
|---------|-------|-------|
| s = p   | 50%   | 50%   |
| s = g   | 100%  | 0%    |

| P(s\|c) | c = w | c = t |
|---------|-------|-------|
| s = p   | 25%   | 100%  |
| s = g   | 75%   | 0%    |
Note:

P(s,c) = P(s|c) P(c)

| P(s,c) | c = w | c = t |
|--------|-------|-------|
| s = p  | 20    | 20    |
| s = g  | 60    | 0     |

equals

| P(s\|c) | c = w | c = t |
|---------|-------|-------|
| s = p   | 25    | 100   |
| s = g   | 75    | 0     |

times

| P(c) | c = w | c = t |
|------|-------|-------|
|      | 80    | 20    |

(e.g., 25% of 80 is 20)
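This factorization is easy to check numerically; a sketch that rebuilds the joint table from P(s|c) and P(c):

```python
import numpy as np

# Rebuild the joint table from the conditional P(s|c) and marginal P(c).
P_s_given_c = np.array([[0.25, 1.00],   # columns of P(s|c) each sum to 1
                        [0.75, 0.00]])
P_c = np.array([0.80, 0.20])

P_sc = P_s_given_c * P_c  # each column scaled by its P(c)
print(P_sc)  # matches the 20% / 20% / 60% / 0% table
```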
and

P(s,c) = P(c|s) P(s)

| P(s,c) | c = w | c = t |
|--------|-------|-------|
| s = p  | 20    | 20    |
| s = g  | 60    | 0     |

equals

| P(c\|s) | c = w | c = t |
|---------|-------|-------|
| s = p   | 50    | 50    |
| s = g   | 100   | 0     |

times

| P(s)  |    |
|-------|----|
| s = p | 40 |
| s = g | 60 |

(e.g., 50% of 40 is 20)
and if

P(s,c) = P(s|c) P(c) = P(c|s) P(s)

then

P(s|c) = P(c|s) P(s) / P(c)

and

P(c|s) = P(s|c) P(c) / P(s)

... which is called Bayes' Theorem.
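With the bird numbers from the slides, Bayes' Theorem reproduces the conditional table; a sketch:

```python
# Bayes' theorem with the bird numbers:
# P(pigeon | white) = P(white | pigeon) * P(pigeon) / P(white)
P_w_given_p = 0.5    # half of pigeons are white
P_p = 0.4            # 40% of birds are pigeons
P_w = 0.8            # 80% of birds are white

P_p_given_w = P_w_given_p * P_p / P_w
print(P_p_given_w)   # 0.25: 25% of white birds are pigeons
```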
Why Bayes Theorem is important
Consider the problem of fitting a straight line to data, d, where the intercept and slope are given by the vector m.
If we guess m and use it to predict d we are doing something like P(d|m)
But if we observe d and use it to estimate m then we are doing something like P(m|d)
Bayes Theorem provides a framework for relating what we do to get P(d|m) to what we do to get P(m|d)
Expectation, Variance, and Covariance of a multivariate distribution
The expected values of x1 and x2 are calculated in a fashion analogous to the one-variable case:

E(x1) = ∫∫ x1 p(x1,x2) dx1 dx2
E(x2) = ∫∫ x2 p(x1,x2) dx1 dx2

Note:

E(x1) = ∫∫ x1 p(x1,x2) dx1 dx2 = ∫ x1 [ ∫ p(x1,x2) dx2 ] dx1 = ∫ x1 p(x1) dx1

So the formula really is just the expectation of a one-variable distribution.
The variances of x1 and x2 are calculated in a fashion analogous to the one-variable case, too:

σ_x1² = ∫∫ (x1 − x̄1)² p(x1,x2) dx1 dx2, with x̄1 = E(x1)

and similarly for σ_x2².

Note, once again:

σ_x1² = ∫∫ (x1 − x̄1)² p(x1,x2) dx1 dx2 = ∫ (x1 − x̄1)² [ ∫ p(x1,x2) dx2 ] dx1 = ∫ (x1 − x̄1)² p(x1) dx1

So the formula really is just the variance of a one-variable distribution.
Note that in this distribution, if x1 is bigger than x̄1, then x2 tends to be bigger than x̄2, and if x1 is smaller than x̄1, then x2 tends to be smaller than x̄2.

This is a positive correlation.

[Figure: contours of p(x1,x2), elongated along a line of positive slope through the expected value (x̄1, x̄2).]
Conversely, in this distribution, if x1 is bigger than x̄1, then x2 tends to be smaller than x̄2, and if x1 is smaller than x̄1, then x2 tends to be bigger than x̄2.

This is a negative correlation.

[Figure: contours of p(x1,x2), elongated along a line of negative slope through the expected value (x̄1, x̄2).]
This correlation can be quantified by multiplying the distribution by a four-quadrant function, positive where (x1 − x̄1) and (x2 − x̄2) have the same sign and negative where they differ, and then integrating. The function (x1 − x̄1)(x2 − x̄2) works fine:

cov(x1,x2) = ∫∫ (x1 − x̄1)(x2 − x̄2) p(x1,x2) dx1 dx2

This is called the "covariance".
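The covariance integral can be estimated from samples; a sketch (the particular construction of x2 is an assumption chosen to produce a positive correlation):

```python
import numpy as np

# Estimate cov(x1, x2) from correlated samples: the average of
# the four-quadrant function (x1 - x1_bar)(x2 - x2_bar).
rng = np.random.default_rng(2)
x1 = rng.normal(size=200_000)
x2 = x1 + rng.normal(size=200_000)  # x2 tends to follow x1

cov = np.mean((x1 - x1.mean()) * (x2 - x2.mean()))
print(cov)  # ~1: positive, as the positive correlation suggests
```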
Note that the vector x̄ with elements

x̄i = E(xi) = ∫∫ xi p(x1,x2) dx1 dx2

is the expectation of x, and the matrix Cx with elements

Cx_ij = ∫∫ (xi − x̄i)(xj − x̄j) p(x1,x2) dx1 dx2

has diagonal elements equal to the variance of xi, Cx_ii = σ_xi², and off-diagonal elements equal to the covariance of xi and xj, Cx_ij = cov(xi,xj).
The "center" of a multivariate distribution is summarized by x̄, and its "width" and "correlatedness" by Cx. Together they summarize a lot (but not everything) about a multivariate distribution.
Functions of a set of random variables, x
A vector, x, of N random variables.
Given y(x), do you remember how to transform the integral?

∫ ... ∫ p(x) d^N x = ∫ ... ∫ ? d^N y
Given y(x), then

∫ ... ∫ p(x) d^N x = ∫ ... ∫ p[x(y)] |dx/dy| d^N y

where |dx/dy| is the Jacobian determinant, that is, the determinant of the matrix J whose elements are Jij = ∂xi/∂yj.
But here's something that's EASIER ...

Suppose y(x) is a linear function, y = Mx. Then we can easily calculate the expectation of y:

ȳi = E(yi) = ∫ ... ∫ yi p(x1 ... xN) dx1 ... dxN
           = ∫ ... ∫ Σj Mij xj p(x1 ... xN) dx1 ... dxN
           = Σj Mij ∫ ... ∫ xj p(x1 ... xN) dx1 ... dxN
           = Σj Mij E(xj) = Σj Mij x̄j

So ȳ = M x̄.
And we can easily calculate the covariance:

Cy_ij = ∫ ... ∫ (yi − ȳi)(yj − ȳj) p(x1 ... xN) dx1 ... dxN
      = ∫ ... ∫ Σp Mip (xp − x̄p) Σq Mjq (xq − x̄q) p(x1 ... xN) dx1 ... dxN
      = Σp Mip Σq Mjq ∫ ... ∫ (xp − x̄p)(xq − x̄q) p(x1 ... xN) dx1 ... dxN
      = Σp Mip Σq Mjq Cx_pq

So Cy = M Cx M^T.

Memorize!
Note that these rules work regardless of the distribution of x:

if y is linearly related to x, y = Mx, then

ȳ = M x̄ (rule for means)

Cy = M Cx M^T (rule for propagating error)
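Both rules can be checked against a direct Monte Carlo estimate; a sketch with an arbitrary illustrative M, x̄, and Cx (the choice of a multivariate normal for x is only a convenience, since the rules hold for any distribution):

```python
import numpy as np

# Check y_bar = M x_bar and Cy = M Cx M^T against sampled statistics.
rng = np.random.default_rng(3)
M = np.array([[1.0, 2.0],
              [0.0, 1.0]])
x_bar = np.array([1.0, -1.0])
Cx = np.array([[2.0, 0.5],
               [0.5, 1.0]])

x = rng.multivariate_normal(x_bar, Cx, size=500_000)  # realizations of x
y = x @ M.T                                           # y = M x, per sample

mean_ok = np.allclose(y.mean(axis=0), M @ x_bar, atol=0.02)
cov_ok = np.allclose(np.cov(y.T), M @ Cx @ M.T, atol=0.05)
print(mean_ok, cov_ok)  # both should be True
```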