Short Resume of Statistical Terms
Fall 2013
By Yaohang Li, Ph.D.
Review
• Last Class
  – Introduction to Monte Carlo
• This Class
  – Important Statistics Terms
    • Random Events
      – Independence of Random Events
      – Axioms on Random Events
    • Random Variables
      – Independence of Random Variables
    • CDF
    • PDF
    • Expectation
      – Characteristics of Expectation
    • Moments of a Distribution
      – rth moment
      – rth central moment
    • Mean
    • Variance
    • Standard Deviation
    • Covariance
      – Characteristics of covariance
    • Review of Statistics and Probability Terms
    • Important Distributions
    • Central Limit Theorem
    • Estimand and Estimator
• Next Class
  – Monte Carlo for Integration
Random Events and Probability
• Random Event
  – An event which has a chance of happening
• Probability
  – A numerical measure of that chance
  – Lying between 0 and 1, both inclusive
• Terminology
  – P(A)
    • The probability that an event A occurs
  – P(A+B+…)
    • The probability that at least one of the events A, B, … occurs
  – P(AB…)
    • The probability that all the events A, B, … occur
  – P(A|B)
    • The probability that the event A occurs when it is known that the event B occurs
    • Conditional probability of A given B
Axioms in Probability
• P(A+B+…) ≤ P(A)+P(B)+…
  – If at most one of the events A, B, … can occur, they are called exclusive, and the equality holds
  – If at least one of the events A, B, … must occur, they are called exhaustive, and P(A+B+…)=1
• P(AB)=P(A|B)P(B)
  – If P(A|B)=P(A), A and B are independent
    • The chance of A occurring is uninfluenced by the occurrence of B
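The multiplication rule and the independence condition can be checked by exact enumeration. The sketch below is only an illustration: the two-dice sample space and the events A and B are choices made here, not part of the slides.

```python
from fractions import Fraction
from itertools import product

# Illustrative sample space: the 36 equally likely outcomes of two fair dice.
outcomes = list(product(range(1, 7), repeat=2))

def prob(event):
    """Exact probability of an event given as a predicate on one outcome."""
    return Fraction(sum(1 for o in outcomes if event(o)), len(outcomes))

def cond_prob(event, given):
    """P(event | given), computed by restricting the sample space to `given`."""
    restricted = [o for o in outcomes if given(o)]
    return Fraction(sum(1 for o in restricted if event(o)), len(restricted))

def A(o):
    return o[0] % 2 == 0          # first die is even

def B(o):
    return o[0] + o[1] == 7       # the two dice sum to 7

p_a = prob(A)
p_b = prob(B)
p_ab = prob(lambda o: A(o) and B(o))
p_a_given_b = cond_prob(A, B)

print("P(AB) = P(A|B)P(B):", p_ab == p_a_given_b * p_b)
print("A and B independent:", p_a_given_b == p_a)
```

Fractions keep the arithmetic exact, so the equalities are tested without rounding error.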
Random Variables and Distributions
• Random variable (η)
  – A number to characterize a set of exclusive and exhaustive events
• Cumulative Distribution Function (CDF)
  – F(y)=P(η ≤ y)
  – The probability that the event which occurs has a value not exceeding a prescribed y
  – F(+∞)=1 and F(−∞)=0
  – F(y) is a non-decreasing function of y
Expectation
• If g(η) is a function of η, the expectation (or mean value) of g is denoted and defined by
$$ E g(\eta) = \int g(y) \, dF(y) $$
  – Stieltjes integral
  – The integral is taken over all values of y
• Explanation
  – Continuous random events
    • F(y) is continuous and f(y) is its derivative
$$ E g(\eta) = \int g(y) f(y) \, dy $$
  – Discrete random events
    • F(y) is a step function with steps of height f_i at the points y_i
$$ E g(\eta) = \sum_i g(y_i) f_i $$
• Probability Density Function (pdf)
  – f(y) and f_i are the probability density functions
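Both forms of the expectation can be evaluated numerically. The following is a minimal sketch: the fair die, the Uniform(0, 1) density, the choice g(y) = y², and the midpoint rule are all assumptions made for illustration.

```python
from fractions import Fraction

def expectation_discrete(g, points, probs):
    """E g = sum_i g(y_i) f_i for a discrete distribution."""
    return sum(g(y) * f for y, f in zip(points, probs))

def expectation_continuous(g, f, lo, hi, steps=10_000):
    """E g = integral of g(y) f(y) dy, approximated with the midpoint rule."""
    h = (hi - lo) / steps
    total = 0.0
    for k in range(steps):
        y = lo + (k + 0.5) * h
        total += g(y) * f(y)
    return total * h

def square(y):
    return y * y

# Discrete case: a fair six-sided die, f_i = 1/6 at y_i = 1..6.
e_discrete = expectation_discrete(square, range(1, 7), [Fraction(1, 6)] * 6)

# Continuous case: Uniform(0, 1), f(y) = 1 on [0, 1].
e_continuous = expectation_continuous(square, lambda y: 1.0, 0.0, 1.0)

print(e_discrete, e_continuous)
```

The discrete result is exact; the continuous one carries a small quadrature error that shrinks as `steps` grows.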
More on Expectation
• The statistical physicist uses another notation for expectation
  – Suppose p_i is the probability density function
• How about if g(x) is a constant function?
Linear Combination of the Expectation Values
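Multi-dimensional Distribution
• Multi-dimensional Random Variable
  – Represented using a vector η
• Multi-dimensional CDF
  – F(y)=P(η ≤ y), where η and y are vectors
  – η ≤ y means that each coordinate of η is not greater than the corresponding coordinate of y
• Expectation
$$ E g(\boldsymbol{\eta}) = \int g(\mathbf{y}) \, dF(\mathbf{y}) $$
  – Continuous multidimensional events
$$ E g(\boldsymbol{\eta}) = \int g(\mathbf{y}) f(\mathbf{y}) \, d\mathbf{y} $$
    • where
$$ f(\mathbf{y}) = f(y_1, y_2, \ldots, y_k) = \frac{\partial^k F(y_1, y_2, \ldots, y_k)}{\partial y_1 \, \partial y_2 \cdots \partial y_k} $$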
Independence of Random Variables
• Consider a set of exhaustive and exclusive events, each characterized by a pair of numbers η and ζ, for which F(y,z) is the distribution. G(y) is the CDF of η and H(z) is the CDF of ζ.
  – F(y,z) = P(η ≤ y, ζ ≤ z)
  – G(y) = P(η ≤ y)
  – H(z) = P(ζ ≤ z)
• If it so happens that
  – F(y,z)=G(y)H(z) for all y and z
  – the random variables η and ζ are called independent
Characteristics of Expectations
• Holds regardless of whether or not the random variables η_i are independent
$$ E \sum_i g_i(\eta_i) = \sum_i E g_i(\eta_i) $$
• Holds if the η_i are mutually independent
$$ E \prod_i g_i(\eta_i) = \prod_i E g_i(\eta_i) $$
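The contrast between the sum rule and the product rule shows up in a small exact calculation. In this sketch the two joint distributions (an independent pair and a pair in which the second variable always equals the first) are hypothetical examples chosen for illustration.

```python
from fractions import Fraction
from itertools import product

values = [-1, 0, 1]  # each value taken with probability 1/3 (illustrative)

def E(g, joint):
    """Expectation of g(y, z) over a joint pmf given as {(y, z): probability}."""
    return sum(g(y, z) * p for (y, z), p in joint.items())

# Independent pair: the joint pmf is the product of the marginals.
independent = {(y, z): Fraction(1, 9) for y, z in product(values, repeat=2)}
# Fully dependent pair: the second variable always equals the first.
dependent = {(y, y): Fraction(1, 3) for y in values}

results = {}
for name, joint in [("independent", independent), ("dependent", dependent)]:
    e_first = E(lambda y, z: y, joint)
    e_second = E(lambda y, z: z, joint)
    sum_rule = E(lambda y, z: y + z, joint) == e_first + e_second
    product_rule = E(lambda y, z: y * z, joint) == e_first * e_second
    results[name] = (sum_rule, product_rule)
    print(name, "sum rule:", sum_rule, "product rule:", product_rule)
```

The sum rule holds in both cases, while the product rule fails for the dependent pair.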
Moments of Distribution
• rth moment of a distribution
  – μ′_r = E(η^r)
• Principal moment μ = E(η)
• rth central moment μ_r = E{(η − μ)^r}
• Most important moments
  – μ = E(η), known as the mean of η
    • Measure of location of a random variable
  – σ² = μ_2, known as the variance of η (usually abbreviated "var")
    • Measure of dispersion about the mean
  – standard deviation σ = √(σ²)
  – coefficient of variation σ/μ
Covariance
• Definition of covariance (usually abbreviated "cov")
  – If η and ζ are random variables with means μ and v, respectively, the quantity E{(η − μ)(ζ − v)} is called the covariance of η and ζ
  – If η and ζ are independent, the covariance is 0
    • Why?
  – Also, cov(η, η)=var(η)
    • Why?
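Both "Why?" questions follow from expanding the definition. A short derivation, writing η and ζ for the two random variables with means μ and v, and using the product rule for the expectation of independent variables:

```latex
\begin{align*}
\operatorname{cov}(\eta, \zeta)
  &= E\{(\eta - \mu)(\zeta - v)\} \\
  &= E(\eta\zeta) - \mu\, E(\zeta) - v\, E(\eta) + \mu v \\
  &= E(\eta\zeta) - \mu v .
\end{align*}
% If \eta and \zeta are independent, E(\eta\zeta) = E(\eta)\,E(\zeta) = \mu v,
% so the covariance is 0.
% Setting \zeta = \eta (and v = \mu) in the definition gives
\[
\operatorname{cov}(\eta, \eta) = E\{(\eta - \mu)^2\} = \operatorname{var}(\eta).
\]
```

The converse does not hold in general: zero covariance does not imply independence.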
Important Formula of Covariance
$$ \operatorname{var}\Big( \sum_{i=1}^{k} \eta_i \Big) = \sum_{i=1}^{k} \sum_{j=1}^{k} \operatorname{cov}(\eta_i, \eta_j) $$
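The formula can be verified exactly on a small data set. In this sketch the three columns are hypothetical variables observed on the same four equally likely outcomes, and the covariance is the population form (divisor equal to the number of outcomes).

```python
from fractions import Fraction

def mean(xs):
    return sum(xs) / len(xs)

def cov(xs, ys):
    """Population covariance E{(x - mean_x)(y - mean_y)} over paired values."""
    mx, my = mean(xs), mean(ys)
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / len(xs)

# Hypothetical values of three variables on four equally likely outcomes.
columns = [
    [Fraction(v) for v in (1, 2, 4, 7)],
    [Fraction(v) for v in (3, 1, 0, 2)],
    [Fraction(v) for v in (5, 5, 6, 9)],
]

# Left-hand side: variance of the sum, using var(x) = cov(x, x).
row_sums = [sum(row) for row in zip(*columns)]
lhs = cov(row_sums, row_sums)

# Right-hand side: double sum of cov over all ordered pairs (i, j).
rhs = sum(cov(a, b) for a in columns for b in columns)

print(lhs, rhs, lhs == rhs)
```

The diagonal terms of the double sum are the individual variances; the off-diagonal terms vanish when the variables are independent.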
Correlation Coefficient
• Definition
$$ \rho = \operatorname{cov}(\eta, \zeta) / \sqrt{\operatorname{var}\eta \; \operatorname{var}\zeta} $$
  – Always between +1 and −1
  – If ρ=0, they are not correlated
  – If ρ<0, they are negatively correlated
  – If ρ>0, they are positively correlated
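A small sketch of the correlation coefficient computed from paired values. The data are made up for illustration, and population (divide by n) variances are assumed.

```python
import math

def correlation(xs, ys):
    """Correlation coefficient cov(x, y) / sqrt(var x * var y), population form."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / n
    var_x = sum((x - mx) ** 2 for x in xs) / n
    var_y = sum((y - my) ** 2 for y in ys) / n
    return cov / math.sqrt(var_x * var_y)

x = [1.0, 2.0, 3.0, 4.0, 5.0]
rho_pos = correlation(x, [2.0 * v + 1.0 for v in x])   # increasing linear relation
rho_neg = correlation(x, [-v for v in x])              # decreasing linear relation
rho_zero = correlation(x, [(v - 3.0) ** 2 for v in x]) # symmetric, nonlinear relation

print(rho_pos, rho_neg, rho_zero)
```

The last case is worth noting: the second variable is a function of the first, so the two are clearly dependent, yet the correlation coefficient is zero. "Not correlated" is weaker than "independent".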
Important Distributions
• Uniform Distribution
• Exponential Distribution
• Binomial Distribution
• Poisson Distribution
• Normal Distribution
Uniform Distribution
• Uniform Distribution (Rectangular Distribution)
  – A distribution with constant probability density
  – Mean?
  – Variance?
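One way to answer the two questions, assuming the distribution is uniform on an interval [a, b] (the interval endpoints a and b are notation introduced here):

```latex
\[
f(y) = \frac{1}{b - a}, \qquad a \le y \le b
\]
\[
\mu = \int_a^b \frac{y}{b - a}\, dy = \frac{a + b}{2}
\]
\[
\sigma^2 = \int_a^b \frac{(y - \mu)^2}{b - a}\, dy = \frac{(b - a)^2}{12}
\]
```

For the standard case a = 0, b = 1 this gives mean 1/2 and variance 1/12.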
Exponential Distribution
• Exponential Distribution
  – mean 1/λ
  – variance 1/λ²
Binomial Distribution
• Binomial Distribution
  – Discrete probability distribution P_p(n|N) of obtaining exactly n successes out of N Bernoulli trials
  – Each Bernoulli trial is true with probability p and false with probability q=1−p
$$ P_p(n|N) = \binom{N}{n} p^n q^{N-n} = \frac{N!}{n!\,(N-n)!} \, p^n (1-p)^{N-n} $$
Poisson Distribution
• Poisson Distribution
  – The limit of the Binomial Distribution as N → ∞ with v = Np held fixed
$$ P_v(n) = \lim_{N \to \infty} P_B(n) = \frac{v^n e^{-v}}{n!} $$
  – Mean is v
  – Variance is v
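The limit can be observed numerically by fixing v = Np and letting N grow. This is a rough sketch; the values v = 2 and n = 3 are arbitrary choices for illustration.

```python
import math

def binomial_pmf(n, N, p):
    """Probability of exactly n successes in N Bernoulli trials."""
    return math.comb(N, n) * p ** n * (1.0 - p) ** (N - n)

def poisson_pmf(n, v):
    """Poisson probability v^n e^(-v) / n!."""
    return v ** n * math.exp(-v) / math.factorial(n)

v, n = 2.0, 3            # arbitrary illustrative values
trial_counts = (10, 100, 1000, 10000)

# Gap between the binomial pmf (with p = v/N) and the Poisson pmf for each N.
gaps = [abs(binomial_pmf(n, N, v / N) - poisson_pmf(n, v)) for N in trial_counts]

for N, gap in zip(trial_counts, gaps):
    print(N, binomial_pmf(n, N, v / N), poisson_pmf(n, v), gap)
```

The gap shrinks roughly in proportion to 1/N, which is why the Poisson form is a convenient stand-in for the binomial when N is large and p is small.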
Normal Distribution
• Normal Distribution (Gaussian Distribution)
  – Bell curve
  – De Moivre developed the normal distribution as an approximation to the binomial distribution
Normal Distribution in Data Analysis
• For normally distributed data:
  – About 68.27% of the data will be found within one SD either side of the mean (±1SD)
  – About 95.45% of the data will be found within two SD either side of the mean (±2SD)
  – About 99.73% of the data will be found within three SD either side of the mean (±3SD)
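These coverage figures come from the normal CDF: for a normal random variable, the probability of falling within k standard deviations of the mean is erf(k/√2). A minimal sketch using the standard library:

```python
import math

def normal_coverage(k):
    """P(|eta - mu| < k * sigma) for a normally distributed eta."""
    return math.erf(k / math.sqrt(2.0))

coverage = {k: normal_coverage(k) for k in (1, 2, 3)}

for k, p in coverage.items():
    print(f"within ±{k} SD: {100.0 * p:.2f}%")
```

The result does not depend on the particular mean or standard deviation, only on k.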
Central Limit Theorem
• Central Limit Theorem
  – The sum of n independent random variables has an approximately normal distribution when n is large
    • The random variables may follow an arbitrary distribution (with finite variance)
Central Limit Theorem in Practice
• In practice
  – n = 10 is a reasonably large number
  – n = 25 is rather large (effectively infinite)
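A quick simulation gives a feel for these rules of thumb. The sketch below assumes Uniform(0, 1) summands, a fixed arbitrary seed, and uses the fraction of standardized sums within one standard deviation as a crude check against the normal value of about 0.683.

```python
import math
import random

def standardized_sum(n, rng):
    """Sum of n Uniform(0, 1) draws, shifted and scaled to mean 0 and variance 1."""
    total = sum(rng.random() for _ in range(n))
    return (total - n * 0.5) / math.sqrt(n / 12.0)

rng = random.Random(2013)   # fixed seed so the run is repeatable (arbitrary value)
trials = 20_000

within_one_sd = {}
for n in (1, 10, 25):
    zs = [standardized_sum(n, rng) for _ in range(trials)]
    within_one_sd[n] = sum(1 for z in zs if abs(z) < 1.0) / trials
    print(n, within_one_sd[n])
```

For n = 1 the fraction reflects the uniform distribution itself (about 0.577); by n = 10 it is already close to the normal value. A single summary fraction is of course a weak test of normality; a histogram or quantile comparison would say more.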
Estimation
• Monte Carlo Computation
  – Goal: estimating the unknown numerical value of some parameter of some distribution
    • The parameter is called an estimand
• Sample
  – The available data (may consist of a number of observed random variables)
  – The number of observations in the sample is called the sample size
• Estimating the estimand
  – mean
    • (η_1+η_2+…+η_n)/n
  – weighted average
    • (w_1η_1+w_2η_2+…+w_nη_n)/(w_1+w_2+…+w_n)
    • May be a better estimator
• Connection between the sample and the estimand
  – The estimand is a parameter of the distribution of the random variables constituting the sample
Sampling Distribution
• Parent Distribution
  – We can represent the sample by a vector η with coordinates η_1, η_2, η_3, …, η_n
  – The distribution of η_1, η_2, η_3, …, η_n is called the Parent Distribution
  – To estimate the estimand θ (a parameter of the Parent Distribution), we use some function t(η)
    • t is an estimator
• Sampling Distribution
  – η is a random variable, and so is t(η)
    • If we repeated the experiment, we should expect to get a different value of η
  – Since η varies from experiment to experiment, t(η) has a distribution, called the sampling distribution
  – If t(η) is to be close to θ, then the sampling distribution ought to be closely concentrated around θ
Measuring Sampling Distribution
• The bias of t
  – The difference between θ and the average value of t(η): β = E{t(η) − θ}
  – t is an unbiased estimator if β=0
• The sampling variance of t
  – σ_t² = var{t(η)} = E{[t(η) − E t(η)]²} = E{[t − θ − β]²}
• If β and σ_t² are small, t is a good estimator
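Bias and sampling variance can be computed exactly for a tiny parent distribution by enumerating every possible sample. The sketch below assumes a fair coin (values 0 and 1, so the variance to be estimated is 1/4) and samples of size 3, and compares two variance estimators that differ only in their divisor.

```python
from fractions import Fraction
from itertools import product

values = [0, 1]                               # fair coin: parent variance is 1/4
n = 3                                         # sample size (illustrative)
samples = list(product(values, repeat=n))     # all 8 equally likely samples

def spread(sample, divisor):
    """Sum of squared deviations from the sample mean, divided by `divisor`."""
    m = Fraction(sum(sample), len(sample))
    return sum((x - m) ** 2 for x in sample) / divisor

def bias_and_variance(t, theta):
    """Bias E{t - theta} and sampling variance var{t} over all samples."""
    vals = [t(s) for s in samples]
    mean_t = sum(vals) / len(vals)
    beta = mean_t - theta
    var_t = sum((v - mean_t) ** 2 for v in vals) / len(vals)
    return beta, var_t

theta = Fraction(1, 4)
beta_n, var_n = bias_and_variance(lambda s: spread(s, n), theta)         # divisor n
beta_n1, var_n1 = bias_and_variance(lambda s: spread(s, n - 1), theta)   # divisor n-1

print("divisor n:  ", beta_n, var_n)
print("divisor n-1:", beta_n1, var_n1)
```

The divisor n−1 removes the bias, while the divisor n gives a biased estimator with a smaller sampling variance, so "good" involves a trade-off between the two measures.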
Important Estimators
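• Mean of the parent distribution
$$ \bar{\eta} = (\eta_1 + \eta_2 + \cdots + \eta_n)/n $$
  – standard error
$$ \sigma_{\bar{\eta}} = \sigma/\sqrt{n} $$
• Variance of the parent distribution
$$ s^2 = \{(\eta_1 - \bar{\eta})^2 + (\eta_2 - \bar{\eta})^2 + \cdots + (\eta_n - \bar{\eta})^2\}/(n-1) $$
  – standard error (approximate, for a near-normal parent distribution and large n)
$$ \sigma_{s^2} \approx \sigma^2/\sqrt{0.5\,n} = \sigma^2 \sqrt{2/n} $$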
Efficiency
• Goal of Monte Carlo Work– Obtain a respectably small standard error in the final result
– More random samples can lead to better accuracy
• Not very rewarding: the standard error falls only as 1/√n, so halving it takes four times as many samples
– Variance Reduction Method
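The cost of brute-force accuracy follows from the standard error of the sample mean, σ/√n. The helper names and the value of σ below are hypothetical, chosen only to show the scaling.

```python
import math

def standard_error(sigma, n):
    """Standard error of the sample mean for n independent observations."""
    return sigma / math.sqrt(n)

def samples_needed(sigma, target_error):
    """Smallest n (up to rounding) with sigma / sqrt(n) <= target_error."""
    return math.ceil((sigma / target_error) ** 2)

sigma = 1.0   # hypothetical standard deviation of a single observation
for target in (0.1, 0.01, 0.001):
    print(target, samples_needed(sigma, target))
```

Each extra decimal digit of accuracy multiplies the required sample size by about 100, which is the motivation for variance reduction: lowering σ is far cheaper than raising n.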
Summary
• Important Statistics Terms
  – Random Events
    • Independence of Random Events
    • Axioms on Random Events
  – Random Variables
    • Independence of Random Variables
  – CDF
  – PDF
  – Expectation
    • Characteristics of Expectation
  – Moments of a Distribution
    • rth moment
    • rth central moment
  – Mean
  – Variance
  – Standard Deviation
  – Covariance
    • Characteristics of covariance
  – Correlation Coefficient
Summary (Cont.)
• Important Distributions
  – Uniform Distribution
  – Exponential Distribution
  – Binomial Distribution
  – Poisson Distribution
  – Normal Distribution
• Estimation
  – Sample
  – Estimand
  – Parent Distribution
  – Sampling Distribution
  – Estimator
    • Important estimators
  – Buffon’s Needle
What do I want you to do?
• Review Slides
• Review basic probability/statistics concepts
• Work on your Assignment 1