Probability and Statistics for Computer Engineer What is model? Type of Models Purpose of the Class...


Probability and Statistics for Computer Engineer

• What is a model?
• Types of models
• Purpose of the class
• Course overview


Model

• Model
  – Virtual system to explain phenomena or behavior
  – Example
    • Stock price and weather forecasting rules, Ohm's law
• Types of Models
  – Deterministic vs. Statistical (Stochastic)
  – Chaotic vs. Non-chaotic
  – Deterministic Model
    • Differential equations, functions, transforms


Model

– Statistical Model
  • Not data but statistics (mean, variance, probability density function)
• Uncertainty
  – Ambiguity due to lack of evidence
    • Relative Frequency
  – Vagueness inherent in language
• Probability
  – Mathematical model of relative frequency
  – Relative Frequency


Why do we need to study this?

• Purpose of Study
  – Tool for analyzing & understanding statistical models
• Related Courses in Computer Engineering
  – Statistical Pattern Recognition and Machine Learning
  – Data Mining
  – Data Communication
  – Artificial Intelligence
  – Simulation Engineering
  – Statistical Communication Theory
  – Digital Signal Processing
  – Image Processing


Lecture Plan

• Text: 공학인증을 위한 확률과 통계 (Probability and Statistics for Engineering Accreditation), Lee Jae-won et al., Chaos Book
• Topics to be covered
  – Descriptive Statistics
  – Probability and Random Variables
  – Sampling Distribution
  – Statistical Estimation
  – Hypothesis Testing


Lecture Plan

• Grading Policy
  – Exam I 25%, Exam II 25%, Exam III 25%
  – Homework with Programming 15%
  – Attendance 10%


Descriptive Statistics

• Graph for Data Analysis
• Sample Mean, Variance and Standard Deviation
• Histogram and Cumulative Histogram
• Measures of Central Tendency
• Bivariate Data and Scatter Diagram (Plot)
• Covariance and Correlation Coefficient
• Uniform Random Number for Simulation


Graph for Data Analysis

• Data Table, Graph
  – Data:
    • Summarized for some purpose and
  – Graph
    • Histogram of frequency
    • Dispersion plot
    • Cumulative histogram of frequency


Graph for Data Analysis

• Example: Sample weights of male students
  • {65, 67, 64, 66, 63, …, 62}
  • Ascending OS = {53, 58, 60, 61, …, 72}
  • Frequency Distribution

Class Range    Class Center (X)    Frequency (FR)
50.5-53.5      52                   1
53.5-56.5      55                   2
56.5-59.5      58                   6
59.5-62.5      61                  11
62.5-65.5      64                  16
65.5-68.5      67                   9
68.5-71.5      70                   4
71.5-74.5      73                   1


Graph for Data Analysis

– How to make a frequency table
  • Number of classes: 6-20
  • Class interval = [Range / Number of classes + 1], where Range = Max. data − Min. data
– Types of graphs for univariate data
  • Histogram of frequency
  • Relative frequency = frequency / total number of data
  • Frequency polygon
  • Cumulative relative frequency polygon
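The construction rule above can be sketched in Python. This is only an illustration under stated assumptions: the weights are made-up values rather than the slide's data set, the square brackets in the class-interval rule are read as "integer part", the first class is started half a unit below the minimum, and the helper name `frequency_table` is hypothetical.

```python
import math

def frequency_table(data, num_classes):
    """Return (class_width, rows); each row is (low, high, center, freq, rel_freq)."""
    lo, hi = min(data), max(data)
    # Assumption: [.] in the class-interval rule means "integer part".
    width = math.floor((hi - lo) / num_classes + 1)
    # Assumption: start half a unit below the minimum so that integer
    # data never fall exactly on a class boundary.
    start = lo - 0.5
    rows = []
    for k in range(num_classes):
        low = start + k * width
        high = low + width
        freq = sum(1 for x in data if low < x < high)
        rows.append((low, high, (low + high) / 2, freq, freq / len(data)))
    return width, rows

# Hypothetical integer weights in kg (illustration only).
weights = [53, 58, 60, 61, 62, 63, 64, 64, 65, 66, 67, 70, 72]
width, rows = frequency_table(weights, num_classes=8)
for low, high, center, freq, rel in rows:
    print(f"{low:5.1f}-{high:5.1f}  center={center:5.1f}  freq={freq:2d}  rel={rel:.3f}")
```

The relative frequencies in the last column sum to 1, which is what later makes them a natural stand-in for probabilities.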


How to calculate the sample mean?
What does the sample mean stand for?
Is anything else needed for a more precise description of the data?


Sample Mean, Variance and Standard Deviation

• Example:
  – Height data of all the students in this class (not a sample, but the population)
  – Weights of sampled male students at CBNU (sample)
• (Sample) mean of the data
  – A representative of the data
  – Simple, but not a sufficient description

For $x_i$, $i = 1, 2, 3, \ldots, n$:

$$\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i$$

Residual:

$$d_i = x_i - \bar{x}, \qquad \sum_i d_i = \sum_i (x_i - \bar{x}) = \sum_i x_i - n\bar{x} = 0$$
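As a small numerical check of the zero-sum property of the residuals, the sketch below uses only the first five weights listed in the earlier example ({65, 67, 64, 66, 63}); the variable names are illustrative.

```python
x = [65, 67, 64, 66, 63]              # first five weights from the earlier example
n = len(x)
mean = sum(x) / n                     # x-bar = (1/n) * sum of x_i
residuals = [xi - mean for xi in x]   # d_i = x_i - x-bar

print(mean)            # 65.0
print(residuals)       # [0.0, 2.0, -1.0, 1.0, -2.0]
print(sum(residuals))  # 0.0
```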


Sample Mean, Variance and Standard Deviation

• Note: the mean is optimal in the sense of the sum of squared residuals

$$E = \sum_i (x_i - C)^2$$

$$\frac{\partial E}{\partial C} = -2\sum_i (x_i - C) = 0 \quad\text{or}\quad \sum_i x_i = nC \quad\text{or}\quad C = \frac{1}{n}\sum_i x_i$$

• Sometimes it is poor: outlier data
  Example: 98 96 97 68 97
  Mean = 91.2. Is it reasonable?
• Kinds of representatives
  – Median of the data, trimmed mean of the data
  – Representatives other than the mean are needed
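The outlier example can be checked numerically. The sketch below uses the five scores from the slide and compares the sum of squared residuals at the mean with nearby values of C; the function name `sum_sq_residuals` is a hypothetical helper, not part of the course material.

```python
import statistics

scores = [98, 96, 97, 68, 97]   # data from the slide; 68 is the outlier

def sum_sq_residuals(c):
    """E(C) = sum over i of (x_i - C)^2."""
    return sum((x - c) ** 2 for x in scores)

mean = statistics.mean(scores)
median = statistics.median(scores)
print("mean   =", mean)      # 91.2
print("median =", median)    # 97

# E(C) is smallest at C = mean; moving C in either direction increases it.
for c in (mean - 1, mean, mean + 1, median):
    print(f"E({c:.1f}) = {sum_sq_residuals(c):.2f}")
```

The mean wins on squared error, yet four of the five scores lie well above it, which is why the median is the more believable "typical" score here.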


Sample Mean, Variance and Standard Deviation

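• Sample Variance and Standard Deviation

$$s_x^2 = \frac{1}{n-1}\sum_i (x_i - \bar{x})^2 = \frac{n\sum_i x_i^2 - \left(\sum_i x_i\right)^2}{n(n-1)} \quad \text{(Derive it!)}$$

$$s_x = \sqrt{\frac{1}{n-1}\sum_i (x_i - \bar{x})^2} = \sqrt{\frac{n\sum_i x_i^2 - \left(\sum_i x_i\right)^2}{n(n-1)}}$$

When $n$ is very large:

$$s_x^2 \approx \frac{1}{n}\sum_i (x_i - \bar{x})^2 = \frac{n\sum_i x_i^2 - \left(\sum_i x_i\right)^2}{n^2}$$

$$s_x \approx \sqrt{\frac{1}{n}\sum_i (x_i - \bar{x})^2} = \sqrt{\frac{n\sum_i x_i^2 - \left(\sum_i x_i\right)^2}{n^2}}$$

  – Unit of standard deviation = unit of data
  – A measure of the dispersion of the data
  – The mean and variance together are still not enough to describe the data.
  – Then how can the data be described completely?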
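A quick way to convince oneself of the "Derive it!" identity is to compute the sample variance both ways. The sketch below reuses the scores from the outlier example (98, 96, 97, 68, 97) and cross-checks against Python's `statistics.variance`, which also divides by n − 1.

```python
import math
import statistics

x = [98, 96, 97, 68, 97]   # scores from the outlier example
n = len(x)
mean = sum(x) / n

# Definition: s^2 = (1/(n-1)) * sum (x_i - x-bar)^2
s2_definition = sum((xi - mean) ** 2 for xi in x) / (n - 1)

# Shortcut: s^2 = (n * sum x_i^2 - (sum x_i)^2) / (n (n-1))
s2_shortcut = (n * sum(xi * xi for xi in x) - sum(x) ** 2) / (n * (n - 1))

s = math.sqrt(s2_shortcut)   # standard deviation, same unit as the data
print(s2_definition, s2_shortcut, statistics.variance(x), s)
```

The shortcut form needs only the running sums of x and x², so it can be accumulated in a single pass over the data.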


Histogram and Cumulative Histogram

• Frequency/Cumulative Frequency

Score    # of students    Cum. no.        Relative    Cum. Relative
         (Frequency)      (Cum. Freq.)    Freq.       Freq.
0-9        2                2             0.02        0.02
10-19      3                5             0.03        0.05
20-29      5               10             0.05        0.10
30-39      7               17             0.07        0.17
40-49      8               25             0.08        0.25
50-59     16               41             0.16        0.41
60-69     25               66             0.25        0.66
70-79     17               83             0.17        0.83
80-89     12               95             0.12        0.95
90-99      5              100             0.05        1.00
Total    100                              1.00
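The cumulative and relative columns of the table follow mechanically from the frequency column. A minimal sketch, using the frequencies from the table and the standard-library `itertools.accumulate`:

```python
from itertools import accumulate

# Frequencies for the score classes 0-9, 10-19, ..., 90-99 (from the table).
freq = [2, 3, 5, 7, 8, 16, 25, 17, 12, 5]

total = sum(freq)                    # total number of students
cum = list(accumulate(freq))         # cumulative frequency
rel = [f / total for f in freq]      # relative frequency
cum_rel = [c / total for c in cum]   # cumulative relative frequency

for f, c, r, cr in zip(freq, cum, rel, cum_rel):
    print(f"{f:3d} {c:4d} {r:5.2f} {cr:5.2f}")
```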


Histogram and Cumulative Histogram

[Figure: Histogram and Cumulative Histogram]

The area of the histogram = 100

The area of the relative frequency = 1.00

Non-decreasing property of cumulative histogram

Probability is a mathematical model of relative frequency.

The most precise description of data : Density or Distribution


Population and Sample

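• Population
  – The set of all possible observations or measurements of interest
    • Finite population (e.g., voters), infinite population (e.g., the space of natural numbers)
• Sample
  – A subset of the population selected according to some criterion
• Example: defect inspection at a smartphone factory
  – Population: all smartphones produced
  – Sample: a certain number of smartphones drawn at random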


Population and Sample

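• Parameter
  – A characteristic or summary value of the data obtained from the population
  – Examples: population mean ($\mu$), population variance ($\sigma^2$), population standard deviation ($\sigma$)
• Statistic
  – A numerical value describing a characteristic or property of the sample
  – Examples: sample mean ($\bar{X}$), sample variance ($s^2$), sample standard deviation ($s$), mode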


Population and Sample

• Summary

                              Population    Sample       Remarks
Size                          $N$           $n$
Mean                          $\mu$         $\bar{X}$    $\mu = E(\bar{X})$
Variance                      $\sigma^2$    $s^2$        $\sigma^2 = E(s^2)$
Standard deviation (S.D.)     $\sigma$      $s$


Measures of Central Tendency

• Arithmetic mean
  – Geometric mean
  – Harmonic mean
• Median
• Mode
• Weighted average
• Winsorized mean


Arithmetic Mean

• Mean in a frequency distribution
  – Frequency in population: $f_1, f_2, \ldots, f_L$
  – Sample frequency: $f_1, f_2, \ldots, f_l$
  – Class center of population: $x_1, x_2, \ldots, x_L$
  – Class center of sample: $x_1, x_2, \ldots, x_l$
  – Number of classes: $L$ (population), $l$ (sample)

$$\mu_w = \sum_{i=1}^{L} x_i f_i \Big/ \sum_{i=1}^{L} f_i$$

$$\bar{x}_w = \sum_{i=1}^{l} x_i f_i \Big/ \sum_{i=1}^{l} f_i$$

Remember these equations; they are needed to understand the expected value.


Arithmetic Mean

• Example: Number of dependent family members of a worker

Number    Class center    Freq.
0-2        1               3
3-5        4              26
6-8        7              23
9-11      10               1

$$\mu_w = \frac{1\cdot 3 + 4\cdot 26 + 7\cdot 23 + 10\cdot 1}{3 + 26 + 23 + 1} = \frac{278}{53} \approx 5.25$$
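The grouped mean in this example is just a frequency-weighted average of the class centers. A short sketch with the table's values:

```python
centers = [1, 4, 7, 10]    # class centers from the table
freqs = [3, 26, 23, 1]     # frequencies from the table

grouped_mean = sum(x * f for x, f in zip(centers, freqs)) / sum(freqs)
print(round(grouped_mean, 2))   # 5.25
```

Dividing each frequency by the total first gives the same number as a sum of "value times relative frequency", which is the shape the expected value will take later.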


Arithmetic Mean

• Features of the arithmetic mean
  – The simplest representative
  – Good estimate of central tendency
  – Optimal with respect to mean squared error

$$\min_{C} \frac{1}{N}\sum_{i=1}^{N} (x_i - C)^2 = \frac{1}{N}\sum_{i=1}^{N} (x_i - \mu)^2$$

  – Center of the range in a symmetric distribution
  – Sensitive to outliers


Median

• Median $M_e$: the center value after sorting by magnitude

For the sorted data $P = \{X_1, X_2, \ldots, X_{N-1}, X_N\}$:

If $N$ is odd, $M_e = X_{(N+1)/2}$
If $N$ is even, $M_e = (X_{N/2} + X_{(N/2)+1})/2$

Example: Med{3, 4, 10, 9} = (4+9)/2 = 6.5

P = {50, 75, 60, 55, 70, 200, 55, 55}
Arithmetic mean = 77.5, Median = (55+60)/2 = 57.5

Which one is better for central tendency? (Outlier = 200)
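The odd/even rule translates directly into code. The sketch below is a hand-written version (the standard library's `statistics.median` does the same job) and reproduces both examples from the slide.

```python
def median(values):
    """Middle value of the sorted data; average of the two middle values if N is even."""
    xs = sorted(values)
    n = len(xs)
    mid = n // 2
    if n % 2 == 1:
        return xs[mid]                    # X_((N+1)/2) in 1-based indexing
    return (xs[mid - 1] + xs[mid]) / 2    # (X_(N/2) + X_(N/2+1)) / 2

p = [50, 75, 60, 55, 70, 200, 55, 55]
print(median([3, 4, 10, 9]))   # 6.5
print(sum(p) / len(p))         # 77.5
print(median(p))               # 57.5
```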


Mode

• Mode $M_o$: the value that has the maximum frequency
  – Position of concentration in frequency
  – In a symmetric distribution: $M = M_e = M_o$
  – In a single-mode asymmetric distribution: $M - M_o \approx 3(M - M_e)$
  (Here $M$ is the arithmetic mean and $M_e$ the median.)

Example: Mode(2,3,2,1,4) = 2, Mode(5,6,7,8) = None
Mode(9,5,4,8,9,8) = 8 or 9
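The three mode examples can be reproduced with a frequency count. In this sketch the function name `modes` is hypothetical, and treating "every value occurs exactly once" as "no mode" is an assumption chosen to match the slide's Mode(5,6,7,8) = None; the input is assumed to be non-empty.

```python
from collections import Counter

def modes(values):
    """Return the list of most frequent values, or [] if no value repeats."""
    counts = Counter(values)
    top = max(counts.values())
    if top == 1:
        return []   # assumption: all values distinct means "no mode"
    return sorted(v for v, c in counts.items() if c == top)

print(modes([2, 3, 2, 1, 4]))      # [2]
print(modes([5, 6, 7, 8]))         # []
print(modes([9, 5, 4, 8, 9, 8]))   # [8, 9]
```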


Mode

• Example (figure not recovered): the values 75, 72, 70 appear with the labels $M$, $M_e$, $M_o$


Weighted Mean

• Data and weight: $\{(X_1, W_1), (X_2, W_2), \ldots, (X_n, W_n)\}$
• Weighted Mean

$$\text{Weighted Mean} = \frac{\sum_{i=1}^{n} W_i X_i}{\sum_{i=1}^{n} W_i}$$

• Example: English (4 credits, C (2 points)), Statistics (3 credits, A (4 points)), Physical Education (1 credit, A (4 points))
  Weighted Mean = (4×2 + 3×4 + 1×4)/(4+3+1) = 3 (B)
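The grade-point example works out as follows; credits act as the weights and grade points as the data. The tuple layout is just an illustrative choice.

```python
# (course, credits = weight W_i, grade points = data X_i), from the slide's example
courses = [
    ("English", 4, 2),
    ("Statistics", 3, 4),
    ("Physical Education", 1, 4),
]

total_weight = sum(w for _, w, _ in courses)
weighted_mean = sum(w * x for _, w, x in courses) / total_weight
print(weighted_mean)   # 3.0

# For comparison, the unweighted mean of the grade points ignores the credits.
plain_mean = sum(x for _, _, x in courses) / len(courses)
print(plain_mean)
```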


Winsorized Mean

• Winsorized Mean
  – Sort the data in order, replace the data below the ¼-th order value with the ¼-th value and the data above the ¾-th order value with the ¾-th value, then take the average
  – Example: S = {5, 6, 7, 8, 9, 11, 13}
    Winsorized data = {6, 6, 7, 8, 9, 11, 11}
    Winsorized Mean = Sum of Winsorized data / n = 58/7
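The procedure can be sketched as below. The slide does not say how the "¼-th" and "¾-th" positions are located when n is not a multiple of 4; taking the 1-based ranks ceil(n/4) and ceil(3n/4) is an assumption that happens to reproduce the slide's example, and the function name is hypothetical.

```python
import math

def winsorized_mean(values):
    """Clip the sorted data to the 1/4-th and 3/4-th order values, then average."""
    xs = sorted(values)
    n = len(xs)
    # Assumption: the quarter positions are the 1-based ranks
    # ceil(n/4) and ceil(3n/4); other conventions exist.
    low = xs[math.ceil(n / 4) - 1]
    high = xs[math.ceil(3 * n / 4) - 1]
    clipped = [min(max(x, low), high) for x in xs]
    return clipped, sum(clipped) / n

clipped, w_mean = winsorized_mean([5, 6, 7, 8, 9, 11, 13])
print(clipped)   # [6, 6, 7, 8, 9, 11, 11]
print(w_mean)    # 58/7, about 8.29
```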


Bivariate Data and Scatter Diagram

• Scatter Diagram (Plot) for Multivariate Data
  – Things to consider
    • Density: number of data points in a unit volume
    • Relation between variables:
      – Regression analysis
      – Correlations between variables


Covariance and Correlation Coefficient

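• Covariance and Correlation Coefficient

$$c_{xy} = \frac{1}{n-1}\sum_i (x_i - \bar{x})(y_i - \bar{y}) = \frac{n\sum_i x_i y_i - \sum_i x_i \sum_i y_i}{n(n-1)}$$

$$r_{xy} = \frac{c_{xy}}{s_x s_y} \quad \text{: normalized by standard deviations}$$

  – Properties

$$0 \le |r_{xy}| \le 1$$

    $r_{xy} > 0$: positive correlation
    $r_{xy} < 0$: negative correlation
    $r_{xy} = 0$: uncorrelated

• Factor Analysis

[Figure: scatter plot with axes x and y]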
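A minimal sketch of both quantities, written from the definitions (n − 1 in the denominator, correlation as covariance divided by the two standard deviations). The five (x, y) pairs are made-up illustration data, not from the slides.

```python
import math

def covariance(x, y):
    """Sample covariance c_xy with n - 1 in the denominator."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    return sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y)) / (n - 1)

def correlation(x, y):
    """r_xy = c_xy / (s_x * s_y); note that s_x^2 = c_xx."""
    return covariance(x, y) / math.sqrt(covariance(x, x) * covariance(y, y))

# Hypothetical data (illustration only).
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

print(covariance(x, y))                   # 1.5
print(correlation(x, y))                  # about 0.775
print(correlation(x, [-v for v in x]))    # about -1: perfect negative correlation
```

Because the correlation coefficient is normalized, it does not change if x or y is rescaled to different units, whereas the covariance does.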


Covariance and Correlation Coefficient

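• Just thinking about
  – 2-D or higher-dimensional (accumulated) histogram
• Linear Regression

Find the linear equation $y = \alpha x + \beta$ that minimizes

$$\varepsilon^2 = \frac{1}{n-1}\sum_i (y_i - \alpha x_i - \beta)^2 .$$

Solution:

$$\frac{\partial \varepsilon^2}{\partial \beta} = -\frac{2}{n-1}\sum_i (y_i - \alpha x_i - \beta) = 0 \quad\text{gives}\quad \beta = \frac{1}{n}\sum_i y_i - \alpha\,\frac{1}{n}\sum_i x_i$$

$$\frac{\partial \varepsilon^2}{\partial \alpha} = -\frac{2}{n-1}\sum_i x_i (y_i - \alpha x_i - \beta) = 0 \quad\text{gives}\quad \alpha = \frac{n\sum_i x_i y_i - \sum_i x_i \sum_i y_i}{n\sum_i x_i^2 - \left(\sum_i x_i\right)^2} = \frac{n\sum_i x_i y_i - \sum_i x_i \sum_i y_i}{n(n-1)\,s_x^2} = \frac{c_{xy}}{s_x^2}$$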
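The closed-form least-squares solution needs only four running sums. The sketch below is one way to code it; the function name `fit_line` and the sample data are assumptions made for illustration.

```python
def fit_line(x, y):
    """Least-squares slope and intercept of the line y = slope * x + intercept."""
    n = len(x)
    sum_x, sum_y = sum(x), sum(y)
    sum_xy = sum(a * b for a, b in zip(x, y))
    sum_xx = sum(a * a for a in x)
    slope = (n * sum_xy - sum_x * sum_y) / (n * sum_xx - sum_x ** 2)
    intercept = sum_y / n - slope * sum_x / n   # the line passes through (x-bar, y-bar)
    return slope, intercept

# Hypothetical data (illustration only).
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
slope, intercept = fit_line(x, y)
print(slope, intercept)   # approximately 0.6 and 2.2
```

The slope equals the covariance of x and y divided by the variance of x, so a regression fit and a correlation calculation share most of their arithmetic.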


Uniform Random Number

• Examples
  – Histogram of a fair die or coin

[Figure: histogram of fair-die outcomes (faces 1 to 6); frequency axis from 0 to 12000]

  – Note:
    • Cumulative histogram of the fair die
    • Law of Large Numbers
    • Random numbers with any distribution can be generated from uniform random numbers.
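A fair die is the simplest case of turning a uniform random number into another distribution: cut the interval [0, 1) into six equal pieces. The sketch below is in Python rather than Matlab, and the seed and the number of rolls are arbitrary choices made for illustration.

```python
import random
from collections import Counter
from itertools import accumulate

def roll_die(rng):
    """Map a uniform random number u in [0, 1) to a face 1..6."""
    u = rng.random()
    return int(u * 6) + 1

rng = random.Random(0)   # fixed seed (arbitrary) so the run is repeatable
n_rolls = 60000          # assumed number of rolls
counts = Counter(roll_die(rng) for _ in range(n_rolls))

freq = [counts[face] for face in range(1, 7)]   # histogram
rel = [f / n_rolls for f in freq]               # relative frequency, each near 1/6
cum = list(accumulate(freq))                    # cumulative histogram

for face, (f, r, c) in enumerate(zip(freq, rel, cum), start=1):
    print(face, f, round(r, 4), c)
```

Increasing `n_rolls` pulls every relative frequency closer to 1/6, which is the Law of Large Numbers at work.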


Homework #1

• Matlab Installation
• Calculation of
  – Sample Mean, Variance and Standard Deviation
  – Linear Regression
  – Covariance and Correlation Coefficients
• Program
  – Generate uniform random numbers
  – Make a fair die
  – Run the experiment and count the frequencies
  – Draw the histogram and cumulative histogram