Chapter 4. Elements of Statistics # brief introduction to some concepts of statistics # descriptive...


Page 1:

Chapter 4. Elements of Statistics

# brief introduction to some concepts of statistics

# descriptive statistics; inductive statistics (statistical inference)

# Classification of the field of statistics
i) Sampling theory
ii) Estimation theory
iii) Hypothesis testing
iv) Curve fitting or regression
v) Analysis of variance

Page 2:

4.2 Sampling Theory – The Sample Mean

How many samples are required for a given degree of confidence in the result?

# Terminology

- population: N (size of the population), very large or ∞

- (random) sample: n (size of the sample)

# One of the most important quantities is the sample mean.

How close might the sample mean be to the average value of the population?

Page 3:

Let the sample have the numerical values x1, x2, …, xn.

Then the sample mean is given by

$$\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i$$

Note that we are interested in the statistical properties of arbitrary random samples rather than any particular sample.

That is, the sample mean becomes a random variable.

Therefore, it is appropriate to denote the sample mean as

$$\hat{\bar{X}} = \frac{1}{n}\sum_{i=1}^{n} X_i$$

Page 4:

We want the mean value of the sample mean to be close to the true mean value of the population:

$$E[\hat{\bar{X}}] = E\left[\frac{1}{n}\sum_{i=1}^{n} X_i\right] = \frac{1}{n}\sum_{i=1}^{n} E[X_i] = \frac{1}{n}\,n\bar{X} = \bar{X}$$

That is, the mean value of the sample mean = the true mean value of the population.

The sample mean is an unbiased estimate of the true mean.

But this is not sufficient to indicate whether the sample mean is a good estimator of the true population mean.

Page 5:

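What is the variance of the sample mean?

Assume N ≫ n (the characteristics of the population do not change during sampling).

Var = mean of the square − square of the mean:

$$\mathrm{Var}(\hat{\bar{X}}) = E\left[\frac{1}{n^2}\sum_{i=1}^{n}\sum_{j=1}^{n} X_i X_j\right] - (\bar{X})^2$$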

Page 6:

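$$\mathrm{Var}(\hat{\bar{X}}) = \frac{1}{n^2}\sum_{i=1}^{n}\sum_{j=1}^{n} E[X_i X_j] - (\bar{X})^2$$

Assumption: X_i and X_j are statistically independent for i ≠ j, so

$$E[X_i X_j] = \begin{cases} \overline{X^2}, & i = j \\ (\bar{X})^2, & i \ne j \end{cases}$$

Therefore

$$\mathrm{Var}(\hat{\bar{X}}) = \frac{1}{n^2}\left[n\,\overline{X^2} + (n^2 - n)(\bar{X})^2\right] - (\bar{X})^2 = \frac{\overline{X^2} - (\bar{X})^2}{n} = \frac{\sigma^2}{n} \quad (!)$$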

Page 7:

where σ² is the true variance of the population. As n → ∞, the variance → 0,

which means that large sample sizes lead to a better estimate.
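The two properties of the sample mean (unbiasedness, and a variance that shrinks as σ²/n) can be checked with a small simulation. This is only a sketch: the Gaussian population with mean 5 and σ = 2, the number of trials, and the function name `sample_mean_stats` are assumptions made for illustration, not part of the lecture.

```python
import random
import statistics

def sample_mean_stats(n, trials=5000, mu=5.0, sigma=2.0, seed=0):
    """Draw `trials` independent samples of size n from an assumed Gaussian
    population and return the empirical mean and variance of the sample mean."""
    rng = random.Random(seed)
    means = [
        sum(rng.gauss(mu, sigma) for _ in range(n)) / n
        for _ in range(trials)
    ]
    return statistics.fmean(means), statistics.pvariance(means)

for n in (4, 16, 64):
    m, v = sample_mean_stats(n)
    # Theory: E[sample mean] = mu = 5.0 and Var(sample mean) = sigma^2 / n = 4 / n
    print(f"n={n:3d}  mean={m:.3f}  var={v:.4f}  sigma^2/n={4.0 / n:.4f}")
```

The empirical variance should track σ²/n closely, up to simulation noise.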

* Note: 1) When N is not large, the same effect as a large N can be obtained by "sampling with replacement".

Page 8:

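2) When N is small and replacement is not possible,

$$\mathrm{Var}(\hat{\bar{X}}) = \frac{\sigma^2}{n}\left(\frac{N - n}{N - 1}\right)$$

As N → ∞ this converges to the previous expression σ²/n; when N = n it is 0 (naturally!).

Two examples: see textbook pp. 163-165.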
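As a quick numerical check of the two limiting cases, the finite-population formula can be written as a small function. This is a sketch; the name `var_sample_mean` and the convention that `N=None` stands for a very large population (or sampling with replacement) are assumptions introduced here.

```python
def var_sample_mean(sigma2, n, N=None):
    """Variance of the sample mean for a population with variance sigma2.

    N=None models a very large population (or sampling with replacement);
    otherwise the finite-population factor (N - n) / (N - 1) is applied.
    """
    if N is None:
        return sigma2 / n
    if N < 2 or not 1 <= n <= N:
        raise ValueError("need N >= 2 and 1 <= n <= N")
    return (sigma2 / n) * (N - n) / (N - 1)

print(var_sample_mean(4.0, 10))             # very large N: sigma^2 / n
print(var_sample_mean(4.0, 10, N=10**9))    # huge finite N: almost the same value
print(var_sample_mean(4.0, 10, N=10))       # N = n: the whole population is sampled
```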

Page 9:

4.3 Sampling Theory – The Sample Variance

The population variance is needed for determining the sample size required to achieve a desired variance of the sample mean (see eq. 4-4).

Definition (Sample Variance):

$$S^2 = \frac{1}{n}\sum_{i=1}^{n}\left(X_i - \hat{\bar{X}}\right)^2$$

The expected value of the sample variance,

$$E[S^2] = \frac{n-1}{n}\,\sigma^2,$$

can be derived easily using

$$\hat{\bar{X}} = \frac{1}{n}\sum_{j=1}^{n} X_j$$

This is not the true variance σ²; that is, S² is a biased estimate rather than an unbiased one.

Page 10:

Now, we redefine the sample variance to obtain an unbiased estimate of the population variance:

$$\tilde{S}^2 = \frac{n}{n-1}\,S^2 = \frac{1}{n-1}\sum_{i=1}^{n}\left(X_i - \hat{\bar{X}}\right)^2$$

Note that these hold for very large N, that is, N = ∞.

How about when the population size is not large?
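The relation between the biased and unbiased sample variances can be seen on a tiny data set. The five numbers below are a hypothetical sample chosen only so the arithmetic is easy to follow; `statistics.pvariance` divides by n and `statistics.variance` divides by n − 1.

```python
import statistics

data = [9.0, 11.0, 10.0, 12.0, 8.0]   # hypothetical sample, n = 5
n = len(data)

s2_biased = statistics.pvariance(data)    # divides by n      (S^2 in these notes)
s2_unbiased = statistics.variance(data)   # divides by n - 1  (the redefined estimate)

print(s2_biased, s2_unbiased)
# The two estimates differ exactly by the factor n / (n - 1).
print(abs(s2_unbiased - s2_biased * n / (n - 1)) < 1e-12)
```

Here the squared deviations from the mean 10 sum to 10, so the two estimates are 10/5 and 10/4.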

Page 11:

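# When N is not large, the expected value of S² is given by

$$E[S^2] = \frac{N}{N-1}\cdot\frac{n-1}{n}\,\sigma^2$$

To obtain an unbiased estimate, we redefine

$$\tilde{S}^2 = \frac{n}{n-1}\cdot\frac{N-1}{N}\,S^2$$

# The variance of the estimates of the variance:

the variance of S²:

$$\mathrm{Var}(S^2) = \frac{\mu_4 - \sigma^4}{n}$$

the variance of $\tilde{S}^2$:

$$\mathrm{Var}(\tilde{S}^2) = \frac{n(\mu_4 - \sigma^4)}{(n-1)^2}$$

where $\mu_4 = E[(X - \bar{X})^4]$ is the 4th central moment of the population.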

Page 12:

4.4 Sampling Distributions & Confidence Intervals

What is the probability that the estimates are within specified bounds?

We need to know the p.d.f. There are two kinds, and only for the sample mean!

Normalized sample mean:

$$Z = \frac{\hat{\bar{X}} - \bar{X}}{\sigma/\sqrt{n}}$$

When the X_i are Gaussian and independent, Z is Gaussian (0, 1).

Page 13:

Even if the X_i are not Gaussian, Z is asymptotically Gaussian as n → ∞ by the central limit theorem (as a rule of thumb, n should usually be at least 30).
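The rule of thumb can be illustrated by simulation. The sketch below assumes a Uniform(0, 1) population (clearly non-Gaussian) and n = 30; the function name `normalized_means`, the seed, and the number of trials are illustrative choices. If Z is close to Gaussian (0, 1), about 95% of the simulated values should fall within ±1.96.

```python
import math
import random

def normalized_means(n, trials=5000, seed=1):
    """Z = (sample mean - true mean) / (sigma / sqrt(n)) for Uniform(0, 1) data."""
    rng = random.Random(seed)
    mu, sigma = 0.5, math.sqrt(1.0 / 12.0)   # mean and std of Uniform(0, 1)
    zs = []
    for _ in range(trials):
        xbar = sum(rng.random() for _ in range(n)) / n
        zs.append((xbar - mu) / (sigma / math.sqrt(n)))
    return zs

zs = normalized_means(n=30)
inside = sum(abs(z) <= 1.96 for z in zs) / len(zs)
print(f"fraction of |Z| <= 1.96: {inside:.3f}  (Gaussian value: 0.950)")
```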

H.W.) Solve the following problems in Chapter 4: 4-2.1, 4-2.5, 4-3.1, 4-4.1, 4-5.1, 4-6.1

Page 14:

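When σ is unknown, replace σ with $\tilde{S}$:

$$T = \frac{\hat{\bar{X}} - \bar{X}}{\tilde{S}/\sqrt{n}} = \frac{\hat{\bar{X}} - \bar{X}}{S/\sqrt{n-1}}$$

However, T is no longer Gaussian ⇒ "Student's t distribution" with n − 1 degrees of freedom.

See Figure 4-2 on p. 170 of the textbook.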

Page 15:

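pdf of Student's t distribution:

$$f_T(t) = \frac{\Gamma\left(\frac{\nu+1}{2}\right)}{\sqrt{\pi\nu}\;\Gamma\left(\frac{\nu}{2}\right)}\left(1 + \frac{t^2}{\nu}\right)^{-\frac{\nu+1}{2}}, \qquad \nu = n - 1$$

where Γ(·) is the gamma function:

$$\Gamma(k+1) = k\,\Gamma(k) \ \text{for any } k, \qquad \Gamma(k+1) = k! \ \text{for integer } k$$

The t distribution has heavier tails than the Gaussian; for n ≥ 30 it is close to the Gaussian.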
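The density above can be evaluated directly with `math.gamma`, and critical values can be recovered numerically instead of being read from a table. The following is a minimal sketch, assuming Simpson's rule for the integral and bisection for the inverse; the function names, step count, and search interval are choices made here, not something from the lecture.

```python
import math

def t_pdf(t, nu):
    """Student's t density with nu degrees of freedom."""
    c = math.gamma((nu + 1) / 2) / (math.sqrt(math.pi * nu) * math.gamma(nu / 2))
    return c * (1 + t * t / nu) ** (-(nu + 1) / 2)

def t_cdf(t, nu, steps=2000):
    """CDF via Simpson's rule on [0, |t|], using the symmetry of the density."""
    a = abs(t)
    h = a / steps                      # steps must be even for Simpson's rule
    total = t_pdf(0.0, nu) + t_pdf(a, nu)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, nu)
    half = total * h / 3               # integral of the density from 0 to |t|
    return 0.5 + half if t >= 0 else 0.5 - half

def t_quantile(p, nu):
    """Find t with t_cdf(t, nu) = p by bisection (assumes 0.5 < p < 1)."""
    lo, hi = 0.0, 100.0
    for _ in range(60):
        mid = (lo + hi) / 2
        if t_cdf(mid, nu) < p:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

# nu = 8 corresponds to a sample of size n = 9.
print(f"two-sided 95% critical value: {t_quantile(0.975, 8):.3f}")
print(f"one-sided 99% critical value: {t_quantile(0.99, 8):.3f}")
```

For ν = 8 these should agree with the tabulated critical values 2.306 and 2.896 used in the hypothesis-testing examples of these notes.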

Page 16:

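(Naturally) what is a confidence interval?

An interval estimate asks with what probability the estimate lies within a given interval. A q-percent confidence interval holds with probability q/100 (the confidence level):

$$\bar{X} - \frac{k\sigma}{\sqrt{n}} \le \hat{\bar{X}} \le \bar{X} + \frac{k\sigma}{\sqrt{n}}$$

Gamma function values: $\Gamma(1) = \Gamma(2) = 1$, $\Gamma\left(\tfrac{1}{2}\right) = \sqrt{\pi}$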

Page 17:

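• Here k is a constant that depends on q and on the pdf of $\hat{\bar{X}}$:

$$q = 100\int_{\bar{X} - k\sigma/\sqrt{n}}^{\bar{X} + k\sigma/\sqrt{n}} f_{\hat{\bar{X}}}(x)\,dx$$

• For specific values of k, see Table 4-1 on p. 172.

• (The larger q is, the larger k becomes.)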

Page 18:

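• Example: q = 95% →

$$9.804 \le \hat{\bar{x}} \le 10.196$$

• The probability that $\hat{\bar{x}}$ lies in this interval is 0.95.
• The narrower the interval, the smaller the probability.
• (For q = 99%, with everything else the same, the interval becomes wider, but the information it provides is less useful for estimation!)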
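The 95% interval in this example can be reproduced with the standard library's `NormalDist`. The transcript does not state the population parameters, so the values below (true mean 10, σ = 1, n = 100) are assumptions chosen because they give a half-width of 1.96 × 0.1 = 0.196; the function name `confidence_interval` is likewise illustrative.

```python
import math
from statistics import NormalDist

def confidence_interval(true_mean, sigma, n, q):
    """q-percent interval for the sample mean of n Gaussian samples."""
    k = NormalDist().inv_cdf(0.5 + q / 200.0)   # two-sided: upper tail is (1 - q/100) / 2
    half_width = k * sigma / math.sqrt(n)
    return true_mean - half_width, true_mean + half_width

# Assumed parameters: mean 10, sigma 1, n 100 (not given in the transcript).
lo, hi = confidence_interval(true_mean=10.0, sigma=1.0, n=100, q=95)
print(f"{lo:.3f} <= sample mean <= {hi:.3f}")   # prints 9.804 <= sample mean <= 10.196
```

Raising q to 99 widens the interval, in line with the remark that a larger q gives a larger k.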

Page 19:

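• Note: q from the probability distribution function (PDF)

$$q = 100\left[F_{\hat{\bar{X}}}\left(\bar{X} + \frac{k\sigma}{\sqrt{n}}\right) - F_{\hat{\bar{X}}}\left(\bar{X} - \frac{k\sigma}{\sqrt{n}}\right)\right]$$

• Here F is the probability distribution function for Student's t.

• (See Appendix F or Table 4-2 on page 172 for ν = 8.)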

Page 20:

4.5 Hypothesis Testing

• The question arises: how does one decide to accept or reject a given hypothesis when the sample size and the confidence level are specified?

Page 21:

• Two steps: i) make some hypothesis about the population;

• ii) determine whether the observed sample confirms or rejects this hypothesis.

Page 22:

• Two tests: one-sided or two-sided.

One-sided example: the average lifetime of a light bulb ≥ 1000 hours.

Two-sided example: 100-ohm resistors, too high or too low.

Page 23:

One-sided test

Example: A capacitor manufacturer claims that the mean value of the breakdown voltage is ≥ 300 V.

• A sample of 100 capacitors → $(\hat{\bar{x}}, \tilde{s}^2) = (290\ \mathrm{V},\ 40^2\ \mathrm{V}^2)$

• A 99% confidence level is used.
• Question: Is the manufacturer's claim valid?
• Answer: We would reject the hypothesis!

Page 24:

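Normalized r.v. Z:

$$z = \frac{\hat{\bar{x}} - \bar{X}}{\tilde{s}/\sqrt{n}} = \frac{290 - 300}{40/\sqrt{100}} = -2.5$$

However, for the 99% confidence level the critical value z_c satisfies

$$\int_{z_c}^{\infty} f_Z(z)\,dz = 1 - F_Z(z_c) = 0.99$$

$$z_c = -2.33 > -2.5$$

[Figure: Gaussian density for $\bar{X} = 300\ \mathrm{V}$, $\tilde{s} = 40\ \mathrm{V}$, with axis marks at −2.5 and −2.33]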

Page 25:

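• If the confidence level were 99.5%:

$$z_c = -2.575 < -2.5$$

– accept the hypothesis

• The lower the confidence level, the narrower the interval and the less likely the hypothesis is to be accepted.

• That is, a lower confidence level imposes a more severe requirement.
• This seems contradictory in meaning.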
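The capacitor example at both confidence levels can be reproduced in a few lines. This is a sketch that treats the test statistic as Gaussian (reasonable for n = 100); the function name `one_sided_lower_test` and its return convention are assumptions introduced here.

```python
import math
from statistics import NormalDist

def one_sided_lower_test(sample_mean, claimed_mean, s, n, confidence):
    """Test the claim 'true mean >= claimed_mean' with a Gaussian statistic.

    Returns (z, z_c, accept), where z_c is the lower-tail critical value
    and the claim is accepted when z >= z_c.
    """
    z = (sample_mean - claimed_mean) / (s / math.sqrt(n))
    z_c = NormalDist().inv_cdf(1.0 - confidence)
    return z, z_c, z >= z_c

for conf in (0.99, 0.995):
    z, z_c, accept = one_sided_lower_test(290.0, 300.0, 40.0, 100, conf)
    print(f"confidence={conf}: z={z:.2f}, z_c={z_c:.3f}, accept={accept}")
```

With z = −2.5, the claim is rejected at 99% (critical value near −2.33) and accepted at 99.5% (critical value near −2.58), matching the slides.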

Page 26:

• So let us restate the test in terms of the level of significance,

• that is, (100% − confidence level).
• The larger the level of significance, the more severe the test!

Page 27:

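• Example (continued): sample size = 9, $(\hat{\bar{x}}, \tilde{s}^2) = (290,\ 40^2)$
• No longer Gaussian → Student's t distribution

• ν = n − 1 = 8 degrees of freedom
• 99% confidence level:

$$t = \frac{\hat{\bar{x}} - \bar{X}}{\tilde{s}/\sqrt{n}} = \frac{290 - 300}{40/\sqrt{9}} = -0.75$$

$$t_c = -2.896 < -0.75$$

– accept the hypothesis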

Page 28:

• A small sample size increases t (moves it toward zero),

• and the heavier tails of the t distribution decrease the critical value t_c,

so t is more likely to exceed the critical value: small-sample tests are less reliable (less severe) than large-sample tests.

Page 29:

Two-sided test

• Example: A manufacturer of Zener diodes claims that the true mean breakdown voltage = 10 V.

• Question: is the hypothesis accepted or rejected?

• 100 samples → sample mean 10.3 V, sample standard deviation 1.2 V
• 95% confidence level

Page 30:

• Answer: Rejected!

$$z = \frac{\hat{\bar{x}} - \bar{X}}{\tilde{s}/\sqrt{n}} = \frac{10.3 - 10}{1.2/\sqrt{100}} = 2.5$$

• z is outside the interval

$$-1.96 \le z \le 1.96$$

Page 31:

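• Question (continued): 9 samples, same sample mean 10.3 V and sample standard deviation 1.2 V

$$t = \frac{\hat{\bar{x}} - \bar{X}}{\tilde{s}/\sqrt{n}} = \frac{10.3 - 10}{1.2/\sqrt{9}} = 0.75$$

t is inside the interval

$$-2.306 \le t \le 2.306$$

• Accepted!
– Less severe than a large-sample test

[Figure: Student's t density with 95% between −t_c and t_c, 2.5% in each tail, t_c = 2.306]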
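The two Zener-diode cases differ only in the sample size and in which critical value applies. The sketch below takes the critical values straight from these notes (1.96 for the Gaussian case, 2.306 for Student's t with 8 degrees of freedom) rather than computing them; the function name `two_sided_test` is an assumption.

```python
import math

def two_sided_test(sample_mean, claimed_mean, s, n, critical):
    """Return (statistic, accept) for the claim 'true mean == claimed_mean'.

    `critical` is the positive critical value for the chosen confidence level,
    taken from a Gaussian table (large n) or a Student's t table (small n).
    """
    stat = (sample_mean - claimed_mean) / (s / math.sqrt(n))
    return stat, abs(stat) <= critical

z, accept_large = two_sided_test(10.3, 10.0, 1.2, 100, critical=1.96)
t, accept_small = two_sided_test(10.3, 10.0, 1.2, 9, critical=2.306)

print(f"n=100: z={z:.2f}, accept={accept_large}")
print(f"n=9:   t={t:.2f}, accept={accept_small}")
```

The same sample mean and standard deviation lead to rejection with 100 samples and acceptance with 9, which is the sense in which the small-sample test is less severe.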

Page 32:

4.6 Curve Fitting and Linear Regression

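• Regression is an analysis method that uses data to find, statistically, the functional relationship between variables (independent and dependent); that is, it examines how x and y are related by finding a suitable regression equation.

• Usually a first-order (linear) or second-order equation is used.
• In contrast, the correlation analysis of the next section examines how x and y are related by computing a correlation coefficient.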

Page 33:

• Terminology

– Scatter diagram: a plot of the data

– n samples: $x_1, x_2, \ldots, x_n$ and $y_1, y_2, \ldots, y_n$

Page 34:

– Curve fitting: finding a mathematical relationship between the variables

– Regression curve (equation): the resulting curve

Page 35:

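– What is the "best" fit? The best fit in a least squares sense.

– Let $\epsilon_i$ be the errors between the regression curve and the scatter diagram:

$$\epsilon_1^2 + \epsilon_2^2 + \cdots + \epsilon_n^2$$

– The problem is to determine the unknown coefficients that minimize this sum.

– First choose the type of equation to be fitted to the data, for example

$$y = a + bx + cx^2$$

Keeping the number of unknown coefficients much smaller than n gives a smoothing effect.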
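For the second-order form y = a + bx + cx², setting the derivatives of the squared-error sum to zero gives three linear equations in a, b, c. The sketch below builds and solves them with plain Gaussian elimination; the function name `fit_quadratic` and the toy data are assumptions for illustration, and this is only one of several ways to solve the least-squares problem.

```python
def fit_quadratic(xs, ys):
    """Least-squares coefficients (a, b, c) of y = a + b*x + c*x^2."""
    # Power sums needed by the normal equations: s[k] = sum of x^k.
    s = [sum(x ** k for x in xs) for k in range(5)]
    t = [sum(y * x ** k for x, y in zip(xs, ys)) for k in range(3)]
    # Augmented 3x3 system; row i is the condition dJ/d(coef_i) = 0.
    m = [[s[i + j] for j in range(3)] + [t[i]] for i in range(3)]
    # Gaussian elimination with partial pivoting.
    for col in range(3):
        pivot = max(range(col, 3), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, 3):
            f = m[r][col] / m[col][col]
            for k in range(col, 4):
                m[r][k] -= f * m[col][k]
    # Back substitution.
    coef = [0.0, 0.0, 0.0]
    for i in (2, 1, 0):
        acc = m[i][3] - sum(m[i][j] * coef[j] for j in range(i + 1, 3))
        coef[i] = acc / m[i][i]
    return tuple(coef)

# Toy data lying exactly on y = 1 + 2x + 3x^2, so the fit should recover it.
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [1 + 2 * x + 3 * x * x for x in xs]
print(fit_quadratic(xs, ys))
```

At least three distinct x values are needed; with fewer, the system is singular.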

Page 36:

• Linear regression

$$y = a + bx$$

• Which a and b minimize

$$J = \sum_{i=1}^{n}\left[y_i - (a + bx_i)\right]^2 \ ?$$

Page 37:

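• Solution:

$$\frac{\partial J}{\partial a} = 0 \;\Rightarrow\; \sum_{i=1}^{n} y_i = an + b\sum_{i=1}^{n} x_i$$

$$\frac{\partial J}{\partial b} = 0 \;\Rightarrow\; \sum_{i=1}^{n} x_i y_i = a\sum_{i=1}^{n} x_i + b\sum_{i=1}^{n} x_i^2$$

• Solving these simultaneous equations gives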

Page 38:

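$$b = \frac{n\sum_{i=1}^{n} x_i y_i - \sum_{i=1}^{n} x_i \sum_{i=1}^{n} y_i}{n\sum_{i=1}^{n} x_i^2 - \left(\sum_{i=1}^{n} x_i\right)^2}$$

$$a = \frac{\sum_{i=1}^{n} y_i - b\sum_{i=1}^{n} x_i}{n}$$

MATLAB built-in function: p = polyfit(x, y, n)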
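The closed-form solution for a and b translates directly into code. This is a minimal sketch in Python rather than MATLAB; the function name `fit_line` and the four data points are hypothetical.

```python
def fit_line(xs, ys):
    """Least-squares intercept a and slope b of y = a + b*x (closed form)."""
    n = len(xs)
    sx, sy = sum(xs), sum(ys)
    sxy = sum(x * y for x, y in zip(xs, ys))
    sxx = sum(x * x for x in xs)
    b = (n * sxy - sx * sy) / (n * sxx - sx ** 2)
    a = (sy - b * sx) / n
    return a, b

# Hypothetical points lying exactly on y = 1 + 2x.
a, b = fit_line([1.0, 2.0, 3.0, 4.0], [3.0, 5.0, 7.0, 9.0])
print(a, b)   # prints 1.0 2.0
```

The denominator is zero when all x values are identical, in which case no unique line exists.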

Page 39:

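• A second-order regression (textbook p. 180, Table 4-3, Figure 4-6):

$$v_B = c_2 T^2 + c_1 T + c_0, \qquad |c_2| = 0.0334,\ |c_1| = 0.6540,\ |c_0| = 426.0500$$

(see the textbook for the signs of the coefficients)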

Page 40:

4.7 Correlation between Two Sets of Data

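• Are two data sets correlated or not?

$$x_1, x_2, \ldots, x_n, \qquad \bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i$$

$$y_1, y_2, \ldots, y_n, \qquad \bar{y} = \frac{1}{n}\sum_{i=1}^{n} y_i$$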

Page 41:

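• Linear correlation coefficient, "Pearson's r":

$$r = \frac{\sum_{i=1}^{n}(x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum_{i=1}^{n}(x_i - \bar{x})^2 \sum_{i=1}^{n}(y_i - \bar{y})^2}}, \qquad |r| \le 1$$

r is also a random variable; for large n (around 500), r is approximately Gaussian.

Usage: useful in determining the sources of errors.

Example: a point-to-point digital communication link.

The quality of this link is judged by its BER (Bit Error Rate). The BER may fluctuate randomly due to wind.

Question: Is wind the error source? A correlation test between 20 wind-speed measurements and the resulting BER gives r = 0.891, which is large enough, so yes!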
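Pearson's r is straightforward to compute from the definition. The sketch below uses made-up toy data purely to show the two extreme cases r = +1 and r = −1; it is not the wind/BER data from the example, and the function name `pearson_r` is an assumption.

```python
import math

def pearson_r(xs, ys):
    """Linear correlation coefficient of two equally long data sets."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

xs = [1.0, 2.0, 3.0, 4.0, 5.0]               # toy data, not from the lecture
print(pearson_r(xs, [2 * x for x in xs]))    # perfectly correlated
print(pearson_r(xs, [-x for x in xs]))       # perfectly anti-correlated
```

The function divides by zero if either data set is constant, since r is undefined in that case.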