Source: homes.ieu.edu.tr/~ytutuncu/Math211_2010_Lecture5.pdf

Lecture 5: Chebyshev’s Theorem and Exercises

Introduction to Probability and Statistics I

Review Example

Cruise agency – number of weekly specials to the Caribbean: 20, 73, 75, 80, 82

Compute the mean, median and mode, and interpret your results.

Review Example: Summary Statistics

20, 73, 75, 80, 82

Mean:

\[ \bar{x} = \frac{\sum x_i}{n} = \frac{330}{5} = 66 \]

Median: middlemost observation = 75

Mode: no unique mode exists

The median best describes the data due to the presence of the outlier of 20. This skews the distribution to the left. The manager should first check to see if the value ‘20’ is correct.
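The three summary statistics for the cruise data can be checked with a short sketch using Python's standard `statistics` module. The variable names are illustrative and not part of the lecture.

```python
import statistics

# Weekly specials to the Caribbean, from the review example
specials = [20, 73, 75, 80, 82]

mean = statistics.mean(specials)        # arithmetic mean
median = statistics.median(specials)    # middle value of the sorted data
modes = statistics.multimode(specials)  # every value tied for most frequent

print("mean:", mean)
print("median:", median)
# When every value occurs once, all values tie, so there is no unique mode
print("unique mode exists:", len(modes) == 1)
```

The mean is pulled well below the median by the single low value of 20, which is the reason the median is the better summary here.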

Review Example

Annual % returns:

common stocks: 4, 14.3, 19, -14.7, -26.5, 37.2, 23.8

treasury bills: 6.5, 4.4, 3.8, 6.9, 8, 5.8, 5.1

\[ \mu_{stocks} = \frac{\sum x_i}{N} = \frac{57.1}{7} = 8.16 \]

\[ \mu_{Tbills} = \frac{\sum x_i}{N} = \frac{40.50}{7} = 5.786 \]

The mean annual % return on stocks is higher than the return for U.S. Treasury bills

Review Example

Population standard deviation of the annual % returns:

\[ \sigma_{stocks} = \sqrt{\frac{\sum (x_i - \mu)^2}{N}} = \sqrt{\frac{(4.0 - 8.16)^2 + (14.3 - 8.16)^2 + (19 - 8.16)^2 + (-14.7 - 8.16)^2 + (-26.5 - 8.16)^2 + (37.2 - 8.16)^2 + (23.8 - 8.16)^2}{7}} = 20.648 \]

\[ \sigma_{Tbills} = \sqrt{\frac{\sum (x_i - \mu)^2}{N}} = \sqrt{\frac{(6.5 - 5.8)^2 + (4.4 - 5.8)^2 + (3.8 - 5.8)^2 + (6.9 - 5.8)^2 + (8.0 - 5.8)^2 + (5.8 - 5.8)^2 + (5.1 - 5.8)^2}{7}} = 1.362 \]

The variability of the returns on U.S. Treasury bills is much smaller than the variability of the returns on stocks.
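As a check on the arithmetic, the means and population standard deviations of the two return series can be recomputed with the standard `statistics` module. This is a minimal sketch; `statistics.pstdev` divides by N, which matches the population formula used in this example.

```python
import statistics

# Annual % returns from the review example
stocks = [4, 14.3, 19, -14.7, -26.5, 37.2, 23.8]
tbills = [6.5, 4.4, 3.8, 6.9, 8, 5.8, 5.1]

mean_stocks = statistics.fmean(stocks)
mean_tbills = statistics.fmean(tbills)

# Population standard deviation: sqrt(sum((x - mu)^2) / N)
sd_stocks = statistics.pstdev(stocks)
sd_tbills = statistics.pstdev(tbills)

print(round(mean_stocks, 2), round(sd_stocks, 3))
print(round(mean_tbills, 3), round(sd_tbills, 3))
```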

Chebyshev’s Theorem

For any population with mean μ and standard deviation σ, and k > 1, the percentage of observations that fall within the interval [μ ± kσ] is at least

\[ 100\left[1 - (1/k^2)\right]\% \]

Regardless of how the data are distributed, at least (1 - 1/k^2) of the values will fall within k standard deviations of the mean (for k > 1)

Examples:

At least               within

(1 - 1/1^2) = 0%       k=1 (μ ± 1σ)

(1 - 1/2^2) = 75%      k=2 (μ ± 2σ)

(1 - 1/3^2) = 89%      k=3 (μ ± 3σ)

For k = 1 the bound is 0%, which says nothing about the data. This is why the theorem is stated for k > 1.
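The bound is simple to evaluate for any k. The sketch below is illustrative only; the function name `chebyshev_min_percent` is an assumption, not something from the lecture.

```python
def chebyshev_min_percent(k: float) -> float:
    """Minimum percentage of observations within k standard deviations of the mean."""
    if k <= 1:
        # For k <= 1 the formula gives 0% or less, so the bound carries no information
        raise ValueError("Chebyshev's bound is informative only for k > 1")
    return 100 * (1 - 1 / k**2)

for k in (1.5, 2, 3):
    print(f"k = {k}: at least {chebyshev_min_percent(k):.1f}%")
```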

The Empirical Rule

If the data distribution is bell-shaped, then the interval:

μ ± 1σ contains about 68% of the values in the population or the sample

μ ± 2σ contains about 95% of the values in the population or the sample

μ ± 3σ contains about 99.7% of the values in the population or the sample
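The 68%, 95% and 99.7% figures are the rounded coverage values of a normal distribution. Assuming the bell shape is exactly normal (an assumption made here for illustration), the coverage within k standard deviations equals erf(k/√2), which can be evaluated with the standard library:

```python
import math

def normal_coverage_percent(k: float) -> float:
    """Percentage of a normal distribution lying within k standard deviations of its mean."""
    return 100 * math.erf(k / math.sqrt(2))

for k in (1, 2, 3):
    print(f"mu ± {k} sigma: about {normal_coverage_percent(k):.2f}%")
```

Real data that are only roughly bell-shaped will match these percentages approximately, which is why the rule says "about".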

Coefficient of Variation

Measures relative variation

Always in percentage (%)

Shows variation relative to mean

Can be used to compare two or more sets of data measured in different units

\[ CV = \left(\frac{s}{\bar{x}}\right) \cdot 100\% \]

Review Example

A random sample of data has mean = 75 and variance = 25.

Use Chebyshev’s theorem to determine the percent of observations between 65 and 85.

If the data are mound-shaped, use the empirical rule to find the approximate percent of observations between 65 and 85.

Review Example

A random sample of data has mean = 75 and variance = 25, so the standard deviation is 5 and the interval from 65 to 85 is the mean +/- 2 standard deviations (k = 2).

Chebyshev’s theorem, +/- 2 standard deviations: the proportion must be at least

\[ 100[1 - (1/k^2)]\% = 100[1 - (1/2^2)]\% = 75\% \]

The empirical rule, +/- 2 standard deviations: approximately 95% of the observations are within 2 standard deviations of the mean
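The step that is easy to miss in this example is converting the interval into a number of standard deviations. A minimal sketch of that conversion, with illustrative variable names:

```python
import math

mean, variance = 75, 25
lower, upper = 65, 85

std_dev = math.sqrt(variance)

# The interval must be symmetric about the mean for a single k to describe it
assert math.isclose(mean - lower, upper - mean)

k = (upper - mean) / std_dev             # number of standard deviations
chebyshev_percent = 100 * (1 - 1 / k**2)  # lower bound for any distribution

print(f"k = {k}, Chebyshev bound: at least {chebyshev_percent}%")
```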

Comparing Coefficient of Variation

Stock A:

Average price last year = $50

Standard deviation = $5

Stock B:

Average price last year = $100

Standard deviation = $5

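Both stocks have the same standard deviation, but stock B is less variable relative to its price

\[ CV_A = \left(\frac{s}{\bar{x}}\right) \cdot 100\% = \frac{\$5}{\$50} \cdot 100\% = 10\% \]

\[ CV_B = \left(\frac{s}{\bar{x}}\right) \cdot 100\% = \frac{\$5}{\$100} \cdot 100\% = 5\% \]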
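The comparison of the two stocks can be reproduced in a few lines. The helper name `coefficient_of_variation` is illustrative.

```python
def coefficient_of_variation(std_dev: float, mean: float) -> float:
    """Standard deviation expressed as a percentage of the mean."""
    return std_dev / mean * 100

cv_a = coefficient_of_variation(5, 50)    # Stock A: s = $5, mean price = $50
cv_b = coefficient_of_variation(5, 100)   # Stock B: s = $5, mean price = $100

print(f"CV_A = {cv_a}%, CV_B = {cv_b}%")
```

Because the units cancel in s divided by the mean, the same function works for data measured in any unit.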

Weighted Mean

The weighted mean of a set of data is

\[ \bar{x} = \frac{\sum_{i=1}^{n} w_i x_i}{\sum w_i} = \frac{w_1 x_1 + w_2 x_2 + \cdots + w_n x_n}{\sum w_i} \]

Where w_i is the weight of the ith observation

Use when data is already grouped into n classes, with w_i values in the ith class
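A minimal sketch of the weighted mean follows. The class values and weights are made-up numbers chosen only to illustrate the formula; they do not come from the lecture.

```python
def weighted_mean(values, weights):
    """Sum of w_i * x_i divided by the sum of the weights."""
    if len(values) != len(weights):
        raise ValueError("values and weights must have the same length")
    return sum(w * x for x, w in zip(values, weights)) / sum(weights)

# Hypothetical data: three classes with 1, 2 and 2 observations respectively
values = [10, 20, 30]
weights = [1, 2, 2]

result = weighted_mean(values, weights)
print(result)
```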

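Approximations for Grouped Data

Suppose a data set contains values m_1, m_2, . . ., m_K, occurring with frequencies f_1, f_2, . . ., f_K

For a population of N observations the mean is

\[ \mu = \frac{\sum_{i=1}^{K} f_i m_i}{N} \quad \text{where} \quad N = \sum_{i=1}^{K} f_i \]

For a sample of n observations, the mean is

\[ \bar{x} = \frac{\sum_{i=1}^{K} f_i m_i}{n} \quad \text{where} \quad n = \sum_{i=1}^{K} f_i \]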

Approximations for Grouped Data

Suppose a data set contains values m_1, m_2, . . ., m_K, occurring with frequencies f_1, f_2, . . ., f_K

For a population of N observations the variance is

\[ \sigma^2 = \frac{\sum_{i=1}^{K} f_i (m_i - \mu)^2}{N} \]

For a sample of n observations, the variance is

\[ s^2 = \frac{\sum_{i=1}^{K} f_i (m_i - \bar{x})^2}{n - 1} \]
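The grouped-data formulas for the mean and variance can be sketched as follows. The class values and frequencies are hypothetical and serve only to show the calculation.

```python
# Hypothetical grouped data: class values m_i and their frequencies f_i
midpoints = [5, 15, 25]
frequencies = [2, 5, 3]

n = sum(frequencies)  # total number of observations

# Grouped mean: sum of f_i * m_i divided by n
grouped_mean = sum(f * m for m, f in zip(midpoints, frequencies)) / n

# Sum of frequency-weighted squared deviations from the grouped mean
squared_dev = sum(f * (m - grouped_mean) ** 2 for m, f in zip(midpoints, frequencies))

population_variance = squared_dev / n     # divide by N for a population
sample_variance = squared_dev / (n - 1)   # divide by n - 1 for a sample

print(grouped_mean, population_variance, round(sample_variance, 3))
```

The only difference between the two variances is the divisor, so the sample variance is always slightly larger.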

The Sample Covariance

The covariance measures the strength of the linear relationship between two variables

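The population covariance:

\[ Cov(x,y) = \sigma_{xy} = \frac{\sum_{i=1}^{N} (x_i - \mu_x)(y_i - \mu_y)}{N} \]

The sample covariance:

\[ Cov(x,y) = s_{xy} = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{n - 1} \]

Only concerned with the strength of the relationship

No causal effect is implied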

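Interpreting Covariance

Covariance between two variables:

Cov(x,y) > 0: x and y tend to move in the same direction

Cov(x,y) < 0: x and y tend to move in opposite directions

Cov(x,y) = 0: x and y have no linear relationship (independent variables have zero covariance, but zero covariance alone does not establish independence)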
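A zero covariance rules out a linear relationship but not every relationship. The sketch below uses made-up data in which y is completely determined by x (y = x squared), yet the sample covariance is exactly zero.

```python
# Hypothetical data: y depends entirely on x, but not linearly
xs = [-2, -1, 0, 1, 2]
ys = [x**2 for x in xs]

n = len(xs)
x_bar = sum(xs) / n
y_bar = sum(ys) / n

# Sample covariance: sum of cross-deviations divided by n - 1
cov_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / (n - 1)

print(cov_xy)
```

The positive and negative cross-deviations cancel because the data are symmetric about x = 0.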

Coefficient of Correlation

Measures the relative strength of the linear relationship between two variables

Population correlation coefficient:

\[ \rho = \frac{Cov(x,y)}{\sigma_X \sigma_Y} \]

Sample correlation coefficient:

\[ r = \frac{Cov(x,y)}{s_X s_Y} \]

Features of Correlation Coefficient, r

Unit free

Ranges between –1 and 1

The closer to –1, the stronger the negative linear relationship

The closer to 1, the stronger the positive linear relationship

The closer to 0, the weaker the linear relationship

Scatter Plots of Data with Various Correlation Coefficients

[Figure: six scatter plots of Y against X, with r = -1, r = -.6, r = 0, r = +.3, r = +1, and r = 0]

Interpreting the Result

r = .733

There is a relatively strong positive linear relationship between test score #1 and test score #2

Students who scored high on the first test tended to score high on the second test

[Figure: Scatter Plot of Test Scores. Horizontal axis: Test #1 Score; vertical axis: Test #2 Score; both axes run from 70 to 100]

Obtaining Linear Relationships

An equation can be fit to show the best linear relationship between two variables:

Y = β0 + β1X

Where Y is the dependent variable and X is the independent variable

Least Squares Regression

Estimates for coefficients β0 and β1 are found to minimize the sum of the squared residuals

The least-squares regression line, based on sample data, is

\[ \hat{y} = b_0 + b_1 x \]

Where b1 is the slope of the line and b0 is the y-intercept:

\[ b_1 = \frac{Cov(x,y)}{s_x^2} = r \frac{s_y}{s_x} \]

\[ b_0 = \bar{y} - b_1 \bar{x} \]
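The two forms of the slope agree because of the definition of the sample correlation coefficient, r = Cov(x,y) / (s_x s_y). Substituting Cov(x,y) = r s_x s_y gives the short derivation below, written out as a sketch:

```latex
\begin{align*}
b_1 &= \frac{\mathrm{Cov}(x,y)}{s_x^2}
     = \frac{r\, s_x s_y}{s_x^2}
     = r\,\frac{s_y}{s_x}
\end{align*}
```

One consequence is that the slope always has the same sign as r, since both standard deviations are positive.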

Review Example

The following data give X, the price charged per piece of plywood ($), and Y, the quantity sold (in thousands):

(6,80) (7,60) (8,70) (9,40) (10,0)

Compute the covariance

Compute the correlation coefficient

Compute and interpret the regression coefficients.

What quantity of plywood is expected to be sold if the price were $7 per piece?

Review Example

(6,80) (7,60) (8,70) (9,40) (10,0)

Covariance = -45

Correlation coefficient = -.900. The correlation coefficient indicates the strength of the linear association between the two variables

Review Example

(6,80) (7,60) (8,70) (9,40) (10,0)

  x     y    (x_i - x̄)   (x_i - x̄)^2   (y_i - ȳ)   (y_i - ȳ)^2   (x_i - x̄)(y_i - ȳ)
  6    80       -2            4            30           900              -60
  7    60       -1            1            10           100              -10
  8    70        0            0            20           400                0
  9    40        1            1           -10           100              -10
 10     0        2            4           -50          2500             -100
 40   250        0           10             0          4000             -180   (column totals)

\[ \bar{x} = 8.00, \quad \bar{y} = 50.00, \quad s_x^2 = 2.5, \quad s_y^2 = 1000, \quad Cov(x,y) = -45 \]

\[ s_x = 1.5811, \quad s_y = 31.623 \]

\[ b_1 = \frac{Cov(x,y)}{s_x^2} = \frac{-45}{2.5} = -18.0 \]

For a one dollar increase in the price per piece of plywood, the quantity sold of plywood is estimated to decrease by 18 thousand pieces

\[ b_0 = \bar{y} - b_1 \bar{x} = 50.0 - (-18)(8.0) = 194.00 \]

What quantity of plywood is expected to be sold if the price were $7 per piece?

\[ \hat{y} = b_0 + b_1 x = 194.00 - 18.0(7) = 68 \]

That is, about 68 thousand pieces.
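The whole plywood calculation can be verified with a short sketch that applies the sample formulas directly (dividing by n - 1). The variable names are illustrative.

```python
import math

prices = [6, 7, 8, 9, 10]          # X: price per piece ($)
quantities = [80, 60, 70, 40, 0]   # Y: quantity sold (thousands)

n = len(prices)
x_bar = sum(prices) / n
y_bar = sum(quantities) / n

# Sample covariance and sample variances (divide by n - 1)
cov_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(prices, quantities)) / (n - 1)
var_x = sum((x - x_bar) ** 2 for x in prices) / (n - 1)
var_y = sum((y - y_bar) ** 2 for y in quantities) / (n - 1)

r = cov_xy / math.sqrt(var_x * var_y)   # sample correlation coefficient
b1 = cov_xy / var_x                     # slope
b0 = y_bar - b1 * x_bar                 # intercept
predicted_at_7 = b0 + b1 * 7            # expected quantity at a price of $7

print(cov_xy, round(r, 3), b1, b0, predicted_at_7)
```

The prediction at $7 lies inside the observed price range of $6 to $10, so it is an interpolation and not an extrapolation.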

Summary

Described measures of central tendency: mean, median, mode

Illustrated the shape of the distribution: symmetric, skewed

Described measures of variation: range, interquartile range, variance and standard deviation, coefficient of variation

Discussed measures of grouped data

Calculated measures of relationships between variables: covariance and correlation coefficient