Short Resume of Statistical Terms Fall 2012 By Yaohang Li, Ph.D.


Page 1: Short Resume of Statistical Terms Fall 2012 By Yaohang Li, Ph.D.

Short Resume of Statistical Terms

Fall 2012

By Yaohang Li, Ph.D.

Page 2: Short Resume of Statistical Terms Fall 2012 By Yaohang Li, Ph.D.

Review
• Last Class
  – Introduction to Monte Carlo
• This Class
  – Important Statistics Terms
    • Random Events
      – Independence of Random Events
      – Axioms on Random Events
    • Random Variables
      – Independence of Random Variables
    • CDF
    • PDF
    • Expectation
      – Characteristics of Expectation
    • Moments of a Distribution
      – rth moment
      – rth central moment
    • Mean
    • Variance
    • Standard Deviation
    • Covariance
      – Characteristics of covariance
    • Review of Statistics and Probability Terms
    • Important Distributions
    • Central Limit Theorem
    • Estimand and Estimator
• Next Class
  – Monte Carlo for Integration

Page 3: Short Resume of Statistical Terms Fall 2012 By Yaohang Li, Ph.D.

Random Events and Probability
• Random Event
  – An event which has a chance of happening
• Probability
  – A numerical measure of that chance
  – Lying between 0 and 1, both inclusive
• Terminology
  – P(A)
    • The probability that an event A occurs
  – P(A+B+…)
    • The probability that at least one of the events A, B, … occurs
  – P(AB…)
    • The probability that all the events A, B, … occur
  – P(A|B)
    • The probability that the event A occurs when it is known that the event B occurs
    • Conditional probability of A given B

Page 4: Short Resume of Statistical Terms Fall 2012 By Yaohang Li, Ph.D.

Axioms in Probability

• $P(A+B+\dots) \le P(A)+P(B)+\dots$
  – If only one of the events A, B, … can occur, they are called exclusive, and the equality holds
  – If at least one of the events A, B, … must occur, they are called exhaustive, and P(A+B+…)=1
• P(AB)=P(A|B)P(B)
  – If P(A|B)=P(A), A and B are independent
    • The chance of A occurring is uninfluenced by the occurrence of B
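The product rule and the independence condition can be checked by direct enumeration. The sketch below assumes a fair six-sided die as the sample space (an illustrative choice, not from the slides); the helpers `prob` and `cond_prob` and the events A, B, C are hypothetical names introduced only for this example.

```python
from fractions import Fraction

# Assumed sample space: one roll of a fair six-sided die (illustrative only).
OUTCOMES = frozenset(range(1, 7))

def prob(event):
    """Probability of an event (a set of outcomes) when all outcomes are equally likely."""
    return Fraction(len(set(event) & OUTCOMES), len(OUTCOMES))

def cond_prob(a, b):
    """Conditional probability P(A|B), defined only when P(B) > 0."""
    return prob(a & b) / prob(b)

A = {2, 4, 6}       # the roll is even
B = {1, 2, 3, 4}    # the roll is at most 4
C = {1, 3, 5}       # the roll is odd

# Product rule: P(AB) = P(A|B) P(B)
product_rule_holds = prob(A & B) == cond_prob(A, B) * prob(B)

# Independence: P(A|B) = P(A)
a_b_independent = cond_prob(A, B) == prob(A)

# A and C are exclusive and exhaustive, so P(A+C) = P(A) + P(C) = 1
union_ac = prob(A | C)

# A and B overlap, so the inequality is strict: P(A+B) < P(A) + P(B)
union_ab = prob(A | B)
```

Exact fractions are used instead of floats so that the equalities can be tested without rounding tolerance.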

Page 5: Short Resume of Statistical Terms Fall 2012 By Yaohang Li, Ph.D.

Random Variables and Distributions

• Random variable ($\eta$)
  – A number to characterize a set of exclusive and exhaustive events
• Cumulative Distribution Function (CDF)
  – $F(y) = P(\eta \le y)$
  – The probability that the event which occurs has a value not exceeding a prescribed y
  – $F(+\infty) = 1$ and $F(-\infty) = 0$
  – F(y) is a non-decreasing function of y

Page 6: Short Resume of Statistical Terms Fall 2012 By Yaohang Li, Ph.D.

Expectation
• If $g(\eta)$ is a function of $\eta$, the expectation (or mean value) of g is denoted and defined by
  $$E\,g(\eta) = \int g(y)\, dF(y)$$
  – Stieltjes integral
  – The integral is taken over all values of y
• Explanation
  – Continuous random events
    • F(y) is continuous and f(y) is its derivative
    $$E\,g(\eta) = \int g(y) f(y)\, dy$$
  – Discrete random events
    • F(y) is a step function and $f_i$ is the height of the step at the point $y_i$
    $$E\,g(\eta) = \sum_i g(y_i) f_i$$
• Probability Density Function (pdf)
  – f(y) and $f_i$ are the probability density functions
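As a rough illustration of the discrete and continuous forms of the expectation, the sketch below evaluates both numerically. The function names, the fair-die example, the uniform density on [0, 1], and the midpoint rule are all assumptions made for this example, not part of the slides.

```python
def expectation_discrete(g, points, probs):
    """E g(eta) = sum_i g(y_i) f_i for a discrete distribution."""
    return sum(g(y) * f for y, f in zip(points, probs))

def expectation_continuous(g, pdf, lo, hi, steps=100_000):
    """E g(eta) = integral of g(y) f(y) dy, approximated by the midpoint rule on [lo, hi]."""
    h = (hi - lo) / steps
    total = 0.0
    for i in range(steps):
        y = lo + (i + 0.5) * h      # midpoint of the i-th subinterval
        total += g(y) * pdf(y)
    return total * h

# Discrete example (assumed): a fair die, f_i = 1/6 at y_i = 1, ..., 6.
die_mean = expectation_discrete(lambda y: y, range(1, 7), [1 / 6] * 6)

# Continuous example (assumed): uniform density f(y) = 1 on [0, 1] with g(y) = y^2.
uniform_second_moment = expectation_continuous(lambda y: y * y, lambda y: 1.0, 0.0, 1.0)
```

The midpoint rule stands in for the integral here only because it is short; any quadrature routine would serve.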

Page 7: Short Resume of Statistical Terms Fall 2012 By Yaohang Li, Ph.D.

More on Expectation

• The statistical physicist uses another notation for expectation
  – Suppose $p_i$ is the probability density function

• How about if g(x) is a constant function?
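A sketch, assuming the notation meant here is the angle-bracket average common in statistical physics, with $p_i$ the probability attached to the value $x_i$. Under that assumption, the normalization $\sum_i p_i = 1$ answers the question about a constant function $g(x) = c$:

```latex
\langle g \rangle = \sum_i p_i \, g(x_i),
\qquad
g(x) = c \;\Rightarrow\; \langle g \rangle = c \sum_i p_i = c .
```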

Page 8: Short Resume of Statistical Terms Fall 2012 By Yaohang Li, Ph.D.

Linear Combination of the Expectation Values

Page 9: Short Resume of Statistical Terms Fall 2012 By Yaohang Li, Ph.D.

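Multi-dimensional Distribution
• Multi-dimensional Random Variable
  – Represented using a vector $\boldsymbol{\eta}$
• Multi-dimensional CDF
  – $F(\mathbf{y}) = P(\boldsymbol{\eta} \le \mathbf{y})$
  – $\boldsymbol{\eta} \le \mathbf{y}$ means that each coordinate of $\boldsymbol{\eta}$ is not greater than the corresponding coordinate of $\mathbf{y}$
• Expectation
  $$E\,g(\boldsymbol{\eta}) = \int g(\mathbf{y})\, dF(\mathbf{y})$$
  – Continuous multidimensional events
  $$E\,g(\boldsymbol{\eta}) = \int g(\mathbf{y}) f(\mathbf{y})\, d\mathbf{y}$$
    • where
  $$f(\mathbf{y}) = f(y_1, y_2, \dots, y_k) = \frac{\partial^k F(y_1, y_2, \dots, y_k)}{\partial y_1\, \partial y_2 \cdots \partial y_k}$$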

Page 10: Short Resume of Statistical Terms Fall 2012 By Yaohang Li, Ph.D.

Independence of Random Variables

• Consider a set of exhaustive and exclusive events, each characterized by a pair of numbers $\eta$ and $\zeta$, for which F(y,z) is the distribution. G(y) is the CDF of $\eta$ and H(z) is the CDF of $\zeta$.
  – $F(y,z) = P(\eta \le y,\ \zeta \le z)$
  – $G(y) = P(\eta \le y)$
  – $H(z) = P(\zeta \le z)$
• If it so happens that
  – F(y,z)=G(y)H(z) for all y and z
  – the random variables $\eta$ and $\zeta$ are called independent

Page 11: Short Resume of Statistical Terms Fall 2012 By Yaohang Li, Ph.D.

Characteristics of Expectations

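• Holds regardless of whether or not the random variables $\eta_i$ are independent
  $$E \sum_i g_i(\eta_i) = \sum_i E\,g_i(\eta_i)$$
• Holds if the $\eta_i$ are mutually independent (generally false otherwise)
  $$E \prod_i g_i(\eta_i) = \prod_i E\,g_i(\eta_i)$$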
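The difference between the sum rule and the product rule can be seen on a two-variable toy case. The sketch below is an assumed example (two 0/1 variables, once independent and once fully dependent); the helper `expect` and the dictionary layout of the joint distribution are hypothetical choices for illustration.

```python
from fractions import Fraction
from itertools import product

def expect(g, joint):
    """Expectation of g(x, y) under a joint distribution given as {(x, y): probability}."""
    return sum(p * g(x, y) for (x, y), p in joint.items())

half = Fraction(1, 2)

# Assumed example 1: two independent fair coin flips taking the values 0 or 1.
independent = {(x, y): half * half for x, y in product((0, 1), repeat=2)}

# Assumed example 2: a fully dependent pair, the second variable copies the first.
dependent = {(0, 0): half, (1, 1): half}

results = {}
for name, joint in (("independent", independent), ("dependent", dependent)):
    ex = expect(lambda x, y: x, joint)
    ey = expect(lambda x, y: y, joint)
    results[name] = {
        # E(x + y) == E x + E y
        "sum_rule": expect(lambda x, y: x + y, joint) == ex + ey,
        # E(x * y) == E x * E y
        "product_rule": expect(lambda x, y: x * y, joint) == ex * ey,
    }
```

In this toy case the sum rule holds for both distributions, while the product rule holds only for the independent pair.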

Page 12: Short Resume of Statistical Terms Fall 2012 By Yaohang Li, Ph.D.

Moments of Distribution
• rth moment of a distribution
  – $\mu_r' = E(\eta^r)$
• Principal moment
  – $\mu = E(\eta)$
• rth central moment
  – $\mu_r = E\{(\eta - \mu)^r\}$
• Most important moments
  – $\mu = E(\eta)$, known as the mean of $\eta$
    • Measure of location of a random variable
  – $\sigma^2 = \mu_2$, known as the variance of $\eta$ (usually abbreviated "var")
    • Measure of dispersion about the mean
  – standard deviation $\sigma = \sqrt{\sigma^2}$
  – coefficient of variation $\sigma/\mu$
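A minimal sketch of these definitions for a discrete distribution, assuming a fair die as the example; the function names `raw_moment` and `central_moment` are hypothetical.

```python
from fractions import Fraction

def raw_moment(r, points, probs):
    """r-th moment E(eta^r) of a discrete distribution."""
    return sum(p * y**r for y, p in zip(points, probs))

def central_moment(r, points, probs):
    """r-th central moment E{(eta - mu)^r}, with mu the mean."""
    mu = raw_moment(1, points, probs)
    return sum(p * (y - mu)**r for y, p in zip(points, probs))

# Assumed example: a fair six-sided die.
points = range(1, 7)
probs = [Fraction(1, 6)] * 6

mean = raw_moment(1, points, probs)           # measure of location
variance = central_moment(2, points, probs)   # measure of dispersion
std_dev = float(variance) ** 0.5
coeff_of_variation = std_dev / float(mean)
```

The variance computed this way agrees with the shortcut E(η²) − μ², which is a convenient check.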

Page 13: Short Resume of Statistical Terms Fall 2012 By Yaohang Li, Ph.D.

Covariance

• Definition of covariance (usually abbreviated "cov")
  – If $\eta$ and $\zeta$ are random variables with means $\mu$ and $\nu$, respectively, the quantity $E\{(\eta-\mu)(\zeta-\nu)\}$ is called the covariance of $\eta$ and $\zeta$
  – If $\eta$ and $\zeta$ are independent, the covariance is 0
    • Why?
  – Also, $\operatorname{cov}(\eta, \eta) = \operatorname{var}(\eta)$
    • Why?
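One way to answer both questions, sketched here with $\eta$ and $\zeta$ for the two random variables and $\mu$, $\nu$ for their means (symbols assumed where the transcript lost them): expand the product in the definition, then use the fact that the expectation of a product factorizes for independent variables.

```latex
\begin{aligned}
\operatorname{cov}(\eta,\zeta)
  &= E\{(\eta-\mu)(\zeta-\nu)\}
   = E(\eta\zeta) - \mu\,E(\zeta) - \nu\,E(\eta) + \mu\nu
   = E(\eta\zeta) - \mu\nu , \\
\text{independence} \;&\Rightarrow\; E(\eta\zeta) = E(\eta)\,E(\zeta) = \mu\nu
  \;\Rightarrow\; \operatorname{cov}(\eta,\zeta) = 0 , \\
\operatorname{cov}(\eta,\eta) &= E\{(\eta-\mu)^2\} = \operatorname{var}(\eta) .
\end{aligned}
```

The converse does not hold in general: zero covariance does not imply independence.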

Page 14: Short Resume of Statistical Terms Fall 2012 By Yaohang Li, Ph.D.

Important Formula of Covariance

$$\operatorname{var}\Big(\sum_{i=1}^{k} \eta_i\Big) = \sum_{i=1}^{k} \sum_{j=1}^{k} \operatorname{cov}(\eta_i, \eta_j)$$
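The identity that the variance of a sum equals the double sum of covariances can be checked on a small data set, since it also holds exactly for covariances computed from data. The toy numbers and the helper names below are assumptions for illustration only.

```python
def mean(xs):
    return sum(xs) / len(xs)

def cov(xs, ys):
    """Covariance of two equally long lists of observations (divides by n)."""
    mx, my = mean(xs), mean(ys)
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / len(xs)

# Assumed toy data: each inner list holds the observations of one variable eta_i.
variables = [
    [1.0, 2.0, 4.0, 7.0],
    [2.0, 1.0, 5.0, 3.0],
    [0.0, 3.0, 3.0, 6.0],
]

# Left-hand side: variance of the observation-wise sum eta_1 + eta_2 + eta_3.
totals = [sum(obs) for obs in zip(*variables)]
var_of_sum = cov(totals, totals)   # cov(x, x) is var(x)

# Right-hand side: double sum of cov(eta_i, eta_j) over all i and j.
double_sum = sum(cov(a, b) for a in variables for b in variables)
```

The double sum includes the diagonal terms i = j, which are the individual variances.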

Page 15: Short Resume of Statistical Terms Fall 2012 By Yaohang Li, Ph.D.

Correlation Coefficient

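• Definition
  $$\rho = \operatorname{cov}(\eta, \zeta) \big/ \sqrt{\operatorname{var}\eta \; \operatorname{var}\zeta}$$
  – Always between +1 and -1
  – If $\rho = 0$, they are not correlated
  – If $\rho < 0$, they are negatively correlated
  – If $\rho > 0$, they are positively correlated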

Page 16: Short Resume of Statistical Terms Fall 2012 By Yaohang Li, Ph.D.

Important Distributions

• Uniform Distribution
• Exponential Distribution
• Binomial Distribution
• Poisson Distribution
• Normal Distribution

Page 17: Short Resume of Statistical Terms Fall 2012 By Yaohang Li, Ph.D.

Uniform Distribution
• Uniform Distribution (Rectangular Distribution)
  – A distribution that has constant probability density over its range

– Mean?

– Variance?
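A worked answer to the two questions, assuming the distribution is uniform on an interval $[a, b]$ (the endpoints $a$ and $b$ are notation introduced here, not taken from the slide):

```latex
\begin{aligned}
f(y) &= \frac{1}{b-a}, \qquad a \le y \le b , \\
\mu &= \int_a^b \frac{y}{b-a}\, dy = \frac{a+b}{2} , \\
\sigma^2 &= \int_a^b \frac{(y-\mu)^2}{b-a}\, dy = \frac{(b-a)^2}{12} .
\end{aligned}
```

For the standard case $a = 0$, $b = 1$ this gives mean $1/2$ and variance $1/12$.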

Page 18: Short Resume of Statistical Terms Fall 2012 By Yaohang Li, Ph.D.

Exponential Distribution
• Exponential Distribution
  – mean $1/\lambda$, where $\lambda$ is the rate parameter
  – variance $1/\lambda^2$

Page 19: Short Resume of Statistical Terms Fall 2012 By Yaohang Li, Ph.D.

Binomial Distribution
• Binomial Distribution
  – Discrete probability distribution $P_p(n|N)$ of obtaining exactly n successes out of N Bernoulli trials
  – Each Bernoulli trial is true with probability p and false with probability q=1-p
  $$P_p(n|N) = \binom{N}{n} p^n q^{N-n} = \frac{N!}{n!\,(N-n)!}\, p^n (1-p)^{N-n}$$
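A minimal sketch of the binomial probability of exactly n successes in N Bernoulli trials, with the usual checks that the probabilities sum to 1 and that the mean and variance come out as Np and Npq. The function name and the values N = 10, p = 0.3 are assumptions chosen for illustration.

```python
from math import comb

def binomial_pmf(n, N, p):
    """Probability of exactly n successes in N Bernoulli trials with success probability p."""
    return comb(N, n) * p**n * (1 - p)**(N - n)

N, p = 10, 0.3   # assumed illustrative values
pmf = [binomial_pmf(n, N, p) for n in range(N + 1)]

total = sum(pmf)                                              # probabilities sum to 1
mean = sum(n * f for n, f in enumerate(pmf))                  # equals N * p
variance = sum((n - mean)**2 * f for n, f in enumerate(pmf))  # equals N * p * q
```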

Page 20: Short Resume of Statistical Terms Fall 2012 By Yaohang Li, Ph.D.

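Poisson Distribution
• Poisson Distribution
  – The limit of the Binomial Distribution as $N \to \infty$ with the mean $\nu = Np$ held fixed
  $$P_\nu(n) = \lim_{N \to \infty} P_B(n) = \frac{\nu^n e^{-\nu}}{n!}$$
  – Mean is $\nu$
  – Variance is $\nu$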
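The limiting relationship between the binomial and Poisson distributions can be observed numerically. The sketch below assumes a mean of 2 and compares the two probability functions for growing N with p set to the mean divided by N; the function names and the range n = 0..10 are illustrative choices.

```python
from math import comb, exp, factorial

def binomial_pmf(n, N, p):
    """Probability of exactly n successes in N Bernoulli trials."""
    return comb(N, n) * p**n * (1 - p)**(N - n)

def poisson_pmf(n, nu):
    """Poisson probability of the count n when the mean is nu."""
    return nu**n * exp(-nu) / factorial(n)

nu = 2.0   # assumed mean, kept fixed while N grows

def max_gap(N):
    """Largest |binomial - Poisson| over n = 0..10 when p = nu / N."""
    return max(abs(binomial_pmf(n, N, nu / N) - poisson_pmf(n, nu)) for n in range(11))

gaps = {N: max_gap(N) for N in (10, 100, 1000, 10000)}
```

The gap shrinks as N grows, which is the sense in which the Poisson distribution is the limit of the binomial.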

Page 21: Short Resume of Statistical Terms Fall 2012 By Yaohang Li, Ph.D.

Normal Distribution
• Normal Distribution (Gaussian Distribution)

– Bell curve

– De Moivre developed the normal distribution as an approximation to the binomial distribution

Page 22: Short Resume of Statistical Terms Fall 2012 By Yaohang Li, Ph.D.

Normal Distribution in Data Analysis
• 68.27% of the data will be found within one SD either side of the mean (±1SD)
• 95.45% of the data will be found within two SD either side of the mean (±2SD)
• 99.73% of the data will be found within three SD either side of the mean (±3SD)
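These coverage percentages follow from the normal CDF and can be recomputed with the error function from the standard library. The function name `coverage` is a hypothetical helper for this sketch.

```python
from math import erf, sqrt

def coverage(k):
    """Probability that a normal variable lies within k standard deviations of its mean."""
    return erf(k / sqrt(2))

# Percentages for one, two, and three standard deviations, rounded to two decimals.
within = {k: round(100 * coverage(k), 2) for k in (1, 2, 3)}
```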

Page 23: Short Resume of Statistical Terms Fall 2012 By Yaohang Li, Ph.D.

Central Limit Theorem

• Central Limit Theorem
  – The sum of n independent random variables has an approximately normal distribution when n is large
    • The random variables may follow an arbitrary distribution, provided their variances are finite

Page 24: Short Resume of Statistical Terms Fall 2012 By Yaohang Li, Ph.D.

Central Limit Theorem in Practice

• In practice
  – n = 10 is a reasonably large number
  – n = 25 is rather large (effectively infinite)
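A small simulation gives a feel for how quickly the normal approximation sets in. The sketch below assumes uniform(0, 1) summands, n = 10, 20,000 trials, and an arbitrary fixed seed; it measures the fraction of standardized sums falling within one and two standard deviations, which can be compared with the normal coverage figures.

```python
import random

def standardized_sums(n, trials, rng):
    """Draw `trials` sums of n uniform(0, 1) variables and standardize each sum."""
    mu = n * 0.5                 # exact mean of the sum
    sigma = (n / 12) ** 0.5      # exact standard deviation of the sum
    return [(sum(rng.random() for _ in range(n)) - mu) / sigma for _ in range(trials)]

rng = random.Random(2012)        # fixed seed, chosen arbitrarily for reproducibility
z = standardized_sums(n=10, trials=20_000, rng=rng)

frac_within_1sd = sum(abs(v) <= 1 for v in z) / len(z)
frac_within_2sd = sum(abs(v) <= 2 for v in z) / len(z)
```

The uniform distribution is a friendly case; strongly skewed summands would need a larger n for the same agreement.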

Page 25: Short Resume of Statistical Terms Fall 2012 By Yaohang Li, Ph.D.

Estimation
• Monte Carlo Computation
  – Goal: estimating the unknown numerical value of some parameter of some distribution
    • The parameter is called an estimand
• Sample
  – The available data (may consist of a number of observed random variables)
  – The number of observations in the sample is called the sample size
• Estimand
  – mean
    • $(\eta_1 + \eta_2 + \dots + \eta_n)/n$
  – weighted average
    • $(w_1\eta_1 + w_2\eta_2 + \dots + w_n\eta_n)/(w_1 + w_2 + \dots + w_n)$
    • May be a better estimator
• Connection between the sample and the estimand
  – The estimand is a parameter of the distribution of the random variables constituting the sample

Page 26: Short Resume of Statistical Terms Fall 2012 By Yaohang Li, Ph.D.

Sampling Distribution
• Parent Distribution
  – We can represent the sample by a vector $\boldsymbol{\eta}$ with coordinates $\eta_1, \eta_2, \eta_3, \dots, \eta_n$
  – The distribution of $\eta_1, \eta_2, \eta_3, \dots, \eta_n$ is called the Parent Distribution
  – To estimate the estimand $\theta$ (a parameter of the Parent Distribution), we use some function $t(\boldsymbol{\eta})$
    • t is an estimator
• Sampling Distribution
  – $\boldsymbol{\eta}$ is a random variable, so is $t(\boldsymbol{\eta})$
    • If we repeated the experiment, we should expect to get a different value of $\boldsymbol{\eta}$
  – Since $\boldsymbol{\eta}$ varies from experiment to experiment, $t(\boldsymbol{\eta})$ has a distribution, called the sampling distribution
  – If $t(\boldsymbol{\eta})$ is to be close to $\theta$, then the sampling distribution ought to be closely concentrated around $\theta$

Page 27: Short Resume of Statistical Terms Fall 2012 By Yaohang Li, Ph.D.

Measuring Sampling Distribution

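• The bias of t
  – The difference between $\theta$ and the average value of $t(\boldsymbol{\eta})$: $\beta = E\{t(\boldsymbol{\eta}) - \theta\}$
  – t is an unbiased estimator if $\beta = 0$
• The sampling variance of t
  – $\sigma_t^2 = \operatorname{var}\{t(\boldsymbol{\eta})\} = E\{[t(\boldsymbol{\eta}) - E\,t(\boldsymbol{\eta})]^2\} = E\{[t - \theta - \beta]^2\}$
• If $\beta$ and $\sigma_t^2$ are small, t is a good estimator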
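Bias can be computed exactly when the parent distribution is small enough to enumerate every possible sample. The sketch below assumes a fair die as the parent distribution, its variance as the estimand, and a sample size of 3; it compares two candidate estimators that divide the sum of squared deviations by n and by n − 1. All names are hypothetical.

```python
from fractions import Fraction
from itertools import product

FACES = range(1, 7)          # assumed parent distribution: a fair die
THETA = Fraction(35, 12)     # its true variance, taken as the estimand

def var_divide_by(sample, divisor):
    """Sum of squared deviations from the sample mean, divided by `divisor`."""
    m = Fraction(sum(sample), len(sample))
    return sum((x - m) ** 2 for x in sample) / divisor

def sampling_mean(t, n):
    """E t(eta): the average of t over all equally likely samples of size n."""
    samples = list(product(FACES, repeat=n))
    return sum(t(s) for s in samples) / len(samples)

n = 3
bias_n = sampling_mean(lambda s: var_divide_by(s, n), n) - THETA
bias_n_minus_1 = sampling_mean(lambda s: var_divide_by(s, n - 1), n) - THETA
```

Dividing by n − 1 gives zero bias in this example, while dividing by n underestimates the variance on average.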

Page 28: Short Resume of Statistical Terms Fall 2012 By Yaohang Li, Ph.D.

Important Estimators

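• Mean of the parent distribution
  $$\bar{\eta} = (\eta_1 + \eta_2 + \dots + \eta_n)/n$$
  – standard error $\sigma/\sqrt{n}$
• Variance of the parent distribution
  $$s^2 = \{(\eta_1 - \bar{\eta})^2 + (\eta_2 - \bar{\eta})^2 + \dots + (\eta_n - \bar{\eta})^2\}/(n-1)$$
  – standard error $\sigma^2 (2/n)^{0.5}$, approximately, for a normal parent distribution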
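A minimal sketch of the two estimators in code. Because the parent standard deviation is unknown in practice, the standard error of the mean is estimated here by substituting the sample value s; that substitution, the function names, and the toy data are assumptions for illustration.

```python
from math import sqrt

def sample_mean(xs):
    """Estimator of the parent mean: the arithmetic average of the observations."""
    return sum(xs) / len(xs)

def sample_variance(xs):
    """Estimator of the parent variance, dividing by n - 1."""
    m = sample_mean(xs)
    return sum((x - m) ** 2 for x in xs) / (len(xs) - 1)

def standard_error_of_mean(xs):
    """Estimated standard error of the sample mean, s / sqrt(n)."""
    return sqrt(sample_variance(xs) / len(xs))

data = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]   # assumed toy observations

m = sample_mean(data)
s2 = sample_variance(data)
se = standard_error_of_mean(data)
```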

Page 29: Short Resume of Statistical Terms Fall 2012 By Yaohang Li, Ph.D.

Efficiency

• Goal of Monte Carlo Work
  – Obtain a respectably small standard error in the final result
  – More random samples can lead to better accuracy
    • Not very rewarding: the standard error falls only as $1/\sqrt{n}$, so halving it takes four times as many samples
  – Variance Reduction Method

Page 30: Short Resume of Statistical Terms Fall 2012 By Yaohang Li, Ph.D.

Summary
• Important Statistics Terms
  – Random Events
    • Independence of Random Events
    • Axioms on Random Events
  – Random Variables
    • Independence of Random Variables
  – CDF
  – PDF
  – Expectation
    • Characteristics of Expectation
  – Moments of a Distribution
    • rth moment
    • rth central moment
  – Mean
  – Variance
  – Standard Deviation
  – Covariance
    • Characteristics of covariance
  – Correlation Coefficient

Page 31: Short Resume of Statistical Terms Fall 2012 By Yaohang Li, Ph.D.

Summary (Cont.)
• Important Distributions
  – Uniform Distribution
  – Exponential Distribution
  – Binomial Distribution
  – Poisson Distribution
  – Normal Distribution
• Estimation
  – Sample
  – Estimand
  – Parent Distribution
  – Sampling Distribution
  – Estimator
    • Important estimators
  – Buffon’s Needle

Page 32: Short Resume of Statistical Terms Fall 2012 By Yaohang Li, Ph.D.

What do I want you to do?

• Review Slides
• Review basic probability/statistics concepts
• Work on your Assignment 1