Lec-1 (Review of Probability).pptce.sharif.edu/courses/86-87/1/ce695/resources/root/... · For two...
Transcript of Lec-1 (Review of Probability).pptce.sharif.edu/courses/86-87/1/ce695/resources/root/... · For two...
Stochastic Processes
R i f El P b bili
Hamid R. RabieeAli Jalali
Review of Elementary ProbabilityLecture I
Outline History/Philosophy Random Variables Density/Distribution Functions Joint/Conditional Distributions Correlation Important Theorems
History & Philosophy Started by gamblers’ dispute Probability as a game analyzer ! Formulated by B. Pascal and P. Fermet First Problem (1654) :
“Double Six” during 24 throws First Book (1657) :
Christian Huygens, “De Ratiociniis in LudoAleae”, In German, 1657.
History & Philosophy (Cont’d)
Rapid development during 18th Century
Major Contributions: J. Bernoulli (1654-1705) A. De Moivre (1667-1754)
History & Philosophy (Cont’d)
A renaissance: Generalizing the conceptsfrom mathematical analysis of games toanalyzing scientific and practicalproblems: P Laplace (1749 1827)problems: P. Laplace (1749-1827)
New approach first book: P. Laplace, “Théorie Analytique des
Probabilités”, In France, 1812.
History & Philosophy (Cont’d)
19th century’s developments: Theory of errors Actuarial mathematics St ti ti l h i Statistical mechanics
Other giants in the field: Chebyshev, Markov and Kolmogorov
History & Philosophy (Cont’d)
Modern theory of probability (20th) : A. Kolmogorov : Axiomatic approach
First modern book: First modern book: A. Kolmogorov, “Foundations of Probability
Theory”, Chelsea, New York, 1950
Nowadays, Probability theory as a partof a theory called Measure theory !
History & Philosophy (Cont’d)
Two major philosophies: Frequentist Philosophy
Observation is enough Bayesian Philosophy Bayesian Philosophy
Observation is NOT enough Prior knowledge is essential
Both are useful
History & Philosophy (Cont’d)
Frequentist philosophy There exist fixed
parameters like mean,θ. There is an underlying
distribution from which
Bayesian philosophy Parameters are variable Variation of the parameter
defined by the priorprobability
samples are drawn Likelihood functions(L(θ))
maximize parameter/data For Gaussian distribution
the L(θ) for the meanhappens to be 1/N∑ixi orthe average.
p y This is combined with
sample data p(X/θ) toupdate the posteriordistribution p(θ/X).
Mean of the posterior,p(θ/X),can be considered apoint estimate of θ.
History & Philosophy (Cont’d)
An Example: A coin is tossed 1000 times, yielding 800 heads and 200
tails. Let p = P(heads) be the bias of the coin. What is p? Bayesian Analysis
Our prior knowledge (believe) : (Uniform(0,1)) Our posterior knowledge :
Frequentist Analysis Answer is an estimator such that
Mean : Confidence Interval :
( ) 1=pπ( ) ( )200800 1 ppnObservatiop −=π
p[ ] 8.0ˆ =pE
( ) 95.0826.0ˆ774.0 ≥≤≤ pP
History & Philosophy (Cont’d)
Further reading: http://www.leidenuniv.nl/fsw/verduin/stat
hi t/ t thi t hthist/stathist.htm http://www.mrs.umn.edu/~sungurea/intro
stat/history/indexhistory.shtml www.cs.ucl.ac.uk/staff/D.Wischik/Talks/h
istprob.pdf
Outline History/Philosophy Random Variables Density/Distribution Functions Joint/Conditional Distributions Correlation Important Theorems
Random Variables Probability Space
A triple of represents a nonempty set, whose elements are
sometimes known as outcomes or states of nature
( )PF ,,ΩΩ
represents a set, whose elements are calledevents. The events are subsets of . should be a“Borel Field”.
represents the probability measure.
Fact:
FΩ F
P
( ) 1=ΩP
Random Variables (Cont’d)
Random variable is a “function” (“mapping”)from a set of possible outcomes of theexperiment to an interval of real (complex)numbers.
Ω IFXF In other words : ( )
=→
⊆Ω⊆
rXIFX
RIF
β:
:
Random Variables (Cont’d)
Example I : Mapping faces of a dice to the first six
natural numbers. Example II : Example II :
Mapping height of a man to the realinterval (0,3] (meter or something else).
Example III : Mapping success in an exam to the
discrete interval [0,20] by quantum 0.1 .
Random Variables (Cont’d)
Random Variables Discrete
Dice, Coin, Grade of a course, etc. Continuous Continuous
Temperature, Humidity, Length, etc.
Random Variables Real Complex
Outline History/Philosophy Random Variables Density/Distribution Functions Joint/Conditional Distributions Correlation Important Theorems
Density/Distribution Functions Probability Mass Function (PMF)
Discrete random variables Summation of impulses Th it d f h i l t The magnitude of each impulse represents
the probability of occurrence of the outcome
Example I: Rolling a fair dice
61
βX
( )XP
( )∑=
−=6
161
iiXPMF δ
Density/Distribution Functions (Cont’d)
Example II: Summation of two fair dices
61
( )XP
Note : Summation of all probabilities shouldbe equal to ONE. (Why?)
6
βX
Density/Distribution Functions (Cont’d)
Probability Density Function (PDF) Continuous random variables The probability of occurrence of
will be
+−∈
2,
20dxxdxxx
( )dxxP .
βX
( )XP( )xP
Density/Distribution Functions (Cont’d)
Some famous masses and densities Uniform Density
( ) ( ) ( )( )beginUendUa
xf −= .1 a1
( )XP
Gaussian (Normal) Density
a
βX
a
πσ 2.1
βX
( )XP
µ( )
( )( )σµ
πσσµ
,2.1 2
2
.2 Nexfx
==−
−
Density/Distribution Functions (Cont’d)
Binomial Density
( ) ( ) nNn ppnN
nf −−
= .1.
( )nf
Poisson Density
( ) ( )( ) !1:1
xxxNotex
exfx
=+Γ⇒ℵ∈+Γ
= − λλ
0 N.pn
N
x
( )xf
λ
Important Fact: ( ) ( )!..1.: .
npNepp
nN
NlargelySufficientForn
pNnnN −− ≈−
Density/Distribution Functions (Cont’d)
Cauchy Density
( )( ) 22
1γµ
γπ +−
×=x
xf
( )XP
Weibull Density
( ) γµ
βX
( )kxk
exkxf
−−
×
×= λ
λλ
1
Density/Distribution Functions (Cont’d)
Exponential Density
( ) ( )
<≥==
−−
000...
xxexUexf
xx
λλ λλ
Rayleigh Density
< 00 x
( ) 2
2 2
2
.σ
σx
exxf−
=
Density/Distribution Functions (Cont’d)
Expected Value The most likelihood value
[ ] ( )∫∞
∞−
= dxxfxXE X.
Linear Operator
Function of a random variable Expectation
( )[ ] ( ) ( )∫∞
∞−
= dxxfxgXgE X.
[ ] [ ] bXEabXaE +=+ ..
Density/Distribution Functions (Cont’d)
PDF of a function of random variables Assume RV “Y” such that The inverse equation may have more
than one solution called
( )XgY =
( )YgX 1−=
nXXX ,...,, 21
PDF of “Y” can be obtained from PDF of “X” asfollows
( ) ( )( )
∑=
=
=n
i
xx
iXY
i
xgdxd
xfyf1
Density/Distribution Functions (Cont’d)
Cumulative Distribution Function (CDF) Both Continuous and Discrete Could be defined as the integration of PDF
( ) ( ) ( )≤== X xXPxFxCDF
( ) ( )∫∞−
=x
XX dxxfxF .
βX
( )XPDF
Density/Distribution Functions (Cont’d)
Some CDF properties Non-decreasing Right Continuous F( i fi it ) 0 F(-infinity) = 0 F(infinity) = 1
Outline History/Philosophy Random Variables Density/Distribution Functions Joint/Conditional Distributions Correlation Important Theorems
Joint/Conditional Distributions Joint Probability Functions
Density Distribution
Example I
( ) ( )
( )∫ ∫=
≤≤=x y
YX
YX
dydxyxf
yYandxXPyxF
,
,
,
,
Example I In a rolling fair dice experiment represent the
outcome as a 3-bit digital number “xyz”.
( )
==
==
==
==
=
..0
1;161
0;131
1;031
0;061
,,
WO
yx
yx
yx
yx
yxf YX
1106
01130102
xyz
→→→→→→
10151004
0011
∞− ∞−
Joint/Conditional Distributions (Cont’d)
Example II Two normal random variables
( ) ( )( ) ( ) ( )( )
−−−
−+−
−−
= yx
yx
y
y
x
x yxryxr
YX eyxfσσ
µµσ
µσ
µ.
2121
2
2
2
2
21,
What is “r” ?
Independent Events (Strong Axiom)
( )−yx
YXr
yfσσπ 2,
1...2
( ) ( ) ( )yfxfyxf YXYX .,, =
Joint/Conditional Distributions (Cont’d)
Obtaining one variable densitydensity functions( ) ( )
( ) ( )∫
∫∞
∞
∞−
=
=
dxyxfyf
dyyxfxf
YXY
YXX ,,
DistributionDistribution functions can be obtained justfrom the density functions. (How?)
( ) ( )∫∞−
= dxyxfyf YXY ,,
Joint/Conditional Distributions (Cont’d)
Conditional Density Function Probability of occurrence of an event if another
event is observed (we know what “Y” is).
( ) ( )yxf YX ,
Bayes’ Rule
( ) ( )( )yfyxf
yxfY
YXYX
,,=
( ) ( ) ( )
( ) ( )∫∞
∞−
=dxxfxyf
xfxyfyxf
XXY
XXYYX
.
.
Joint/Conditional Distributions (Cont’d)
Example I Rolling a fair dice
X : the outcome is an even number Y : the outcome is a prime number
Example II Joint normal (Gaussian) random variables
( ) ( )( ) 3
1
2161, ===
YPYXPYXP
( ) ( )
−×−
−−
−
−=
2
2121
21..2
1 y
y
x
x yrx
r
xYX e
ryxf
σµ
σµ
σπ
Joint/Conditional Distributions (Cont’d)
Conditional Distribution Function( ) ( )
( )∫=
=≤=
dxyxf
yYwhilexXPyxFx
YX
YX
Note that “y” is a constant during the integration.
( )
( )
( )∫
∫
∫
∞
∞−
∞−
∞−
=dtytf
dtytf
YX
x
YX
YX
,
,
,
,
Joint/Conditional Distributions (Cont’d)
Independent Random Variables
( ) ( )( )yfyxf
yxfY
YXYX =
,,
Remember! Independency is NOT heuristic.
( )( ) ( )
( )( )xf
yfyfxf
X
Y
YX
=
= .
Joint/Conditional Distributions (Cont’d)
PDF of a functions of joint random variables Assume that The inverse equation set has a set of
solutions
( )YXgVU ,),( =
( )VUgYX ,),( 1−=
( ) ( ) ( )nn YXYXYX ,,...,,,, 2211
∂∂
VU Define Jacobean matrix as follows
The joint PDF will be
=
∂
∂
∂
∂
∂∂
VY
UX
XXJ
( ) ( )( ) ( )( )∑
= =
=n
i yxyx
iiYXVU
iiJtdeterminanabsolute
yxfvuf
1 ,,
,,
.
,,
Outline History/Philosophy Random Variables Density/Distribution Functions Joint/Conditional Distributions Correlation Important Theorems
Correlation Knowing about a random variable “X”, how
much information will we gain about theother random variable “Y” ?
Shows linear similarityy
More formal:
Covariance is normalized correlation
( ) [ ]YXEYXCrr ., =
( ) ( )[ ] [ ] YXYX YXEYXEYXCov µµµµ ...),( −=−−=
Correlation (cont’d)
Variance Covariance of a random variable with itself
( ) ( )[ ]22XX XEXVar µσ −==
Relation between correlation and covariance
Standard Deviation Square root of variance
[ ] 222XXXE µσ +=
Correlation (cont’d)
Moments nth order moment of a random variable “X” is the
expected value of “Xn”( )nn XEM =
Normalized form
Mean is first moment Variance is second moment added by the
square of the mean
( )( )nXn XEM µ−=
Outline History/Philosophy Random Variables Density/Distribution Functions Joint/Conditional Distributions Correlation Important Theorems
Important Theorems Central limit theorem
Suppose i.i.d. (Independent Identically Distributed) RVs“Xk” with finite variances
Let ∑=n
nnn XaS .
PDF of “Sn” converges to a normal distribution asnn increases, regardless to the density of RVs.
Exception : Cauchy Distribution (Why?)
∑=i
nnn1
Important Theorems (cont’d)
Law of Large Numbers (Weak) For i.i.d. RVs “Xk”
∑nX
0Prlim 10 =
>−∀∑=
∞→> εµε Xi
i
n n
X
Important Theorems (cont’d)
Law of Large Numbers (Strong) For i.i.d. RVs “Xk”
∑nX
Why this definition is stronger than before?
1limPr 1 =
=
∑=
∞→ Xi
i
n n
Xµ
Important Theorems (cont’d)
Chebyshev’s Inequality Let “X” be a nonnegative RV Let “c” be a positive number
[ ]XEX 1P
Another form:
It could be rewritten for negative RVs. (How?)
[ ]XEc
cX 1Pr ≤>
2
2Pr
εσεµ X
XX ≤>−
Important Theorems (cont’d)
Schwarz Inequality For two RVs “X” and “Y” with finite second
moments[ ] [ ] [ ]222 E.E.E YXYX ≤
Equality holds in case of linear dependency.
[ ] [ ] [ ]
Next Lecture
Elements of StochasticElements of StochasticProcessesProcesses