Lecture 5
Chebyshev’s Theorem and Exercises
Introduction to Probability and Statistics I
Review Example
Cruise agency: number of weekly specials to the Caribbean: 20, 73, 75, 80, 82
Compute the mean, median, and mode, and interpret your results.
Review Example: Summary Statistics
Data: 20, 73, 75, 80, 82
Mean: \( \bar{x} = \frac{\sum x_i}{n} = \frac{330}{5} = 66 \)
Median: middlemost observation = 75
Mode: no unique mode exists
The median best describes the data because of the outlier of 20, which skews the
distribution to the left. The manager should first check whether the value ‘20’ is correct.
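The summary statistics above can be checked with a short sketch using Python's standard `statistics` module. The variable names are illustrative only; `multimode` returns every value tied for most frequent, so a result listing all five observations is how "no unique mode" shows up here.

```python
from statistics import mean, median, multimode

# Weekly specials from the cruise agency example.
specials = [20, 73, 75, 80, 82]

sample_mean = mean(specials)      # sum of the values divided by n
sample_median = median(specials)  # middle value of the sorted data
modes = multimode(specials)       # all values tied for the highest count

print("mean:", sample_mean)
print("median:", sample_median)
print("unique mode exists:", len(modes) == 1)
```

The mean sits well below four of the five observations, which is the pull of the outlier that the slide describes.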
Review Example
Annual % returns
common stocks: 4, 14.3, 19, -14.7, -26.5, 37.2, 23.8
treasury bills: 6.5, 4.4, 3.8, 6.9, 8, 5.8, 5.1
\( \mu_{stocks} = \frac{\sum x_i}{N} = \frac{57.1}{7} = 8.16 \)
\( \mu_{Tbills} = \frac{\sum x_i}{N} = \frac{40.5}{7} = 5.786 \)
The mean annual % return on stocks is higher than the
return for U.S. Treasury bills
Review Example (continued)
Population standard deviation of each series:
\( \sigma_{stocks} = \sqrt{\frac{\sum (x_i - \mu)^2}{N}} = \sqrt{\frac{(4.0 - 8.16)^2 + (14.3 - 8.16)^2 + (19 - 8.16)^2 + (-14.7 - 8.16)^2 + (-26.5 - 8.16)^2 + (37.2 - 8.16)^2 + (23.8 - 8.16)^2}{7}} = 20.648 \)
\( \sigma_{Tbills} = \sqrt{\frac{\sum (x_i - \mu)^2}{N}} = \sqrt{\frac{(6.5 - 5.8)^2 + (4.4 - 5.8)^2 + (3.8 - 5.8)^2 + (6.9 - 5.8)^2 + (8.0 - 5.8)^2 + (5.8 - 5.8)^2 + (5.1 - 5.8)^2}{7}} = 1.362 \)
The variability of the returns on U.S. Treasury bills is much smaller than that of the returns on stocks.
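As a hedged check of the figures in this example, the sketch below recomputes the population mean and population standard deviation of both return series with the standard `statistics` module. The dictionary layout and names are assumptions made for illustration; `pstdev` divides by N, which matches the population formula used on the slide.

```python
from statistics import fmean, pstdev

# Annual % returns from the review example.
returns = {
    "stocks": [4.0, 14.3, 19.0, -14.7, -26.5, 37.2, 23.8],
    "T-bills": [6.5, 4.4, 3.8, 6.9, 8.0, 5.8, 5.1],
}

# pstdev uses the population formula (divide by N, then take the square root).
summary = {name: (fmean(xs), pstdev(xs)) for name, xs in returns.items()}

for name, (mu, sigma) in summary.items():
    print(f"{name}: mean = {mu:.2f}, standard deviation = {sigma:.3f}")
```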
Chebyshev’s Theorem
For any population with mean μ and
standard deviation σ, and k > 1, the
percentage of observations that fall within
the interval
[μ ± kσ]
is at least
\( 100\left[1 - \frac{1}{k^2}\right]\% \)
Chebyshev’s Theorem (continued)
Regardless of how the data are distributed,
at least \( 1 - 1/k^2 \) of the values will fall
within k standard deviations of the mean
(for k > 1)
Examples (k = 1 is shown only as the limiting case, where the bound is trivial):
At least                     within
\( 1 - 1/1^2 \) = 0%         k = 1 (μ ± 1σ)
\( 1 - 1/2^2 \) = 75%        k = 2 (μ ± 2σ)
\( 1 - 1/3^2 \) = 89%        k = 3 (μ ± 3σ)
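The bound in the examples above is a one-line formula, sketched here as a small function. The function name is an illustrative choice, not something from the lecture.

```python
def chebyshev_lower_bound(k: float) -> float:
    """Minimum fraction of observations within k standard deviations of the mean."""
    if k <= 0:
        raise ValueError("k must be positive")
    # For k <= 1 the formula gives a non-positive value, so the bound is just 0.
    return max(0.0, 1.0 - 1.0 / k**2)


for k in (1, 2, 3):
    print(f"k = {k}: at least {100 * chebyshev_lower_bound(k):.0f}%")
```

Because the bound holds for any distribution, it is usually far more conservative than what a bell-shaped data set actually shows.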
The Empirical Rule
If the data distribution is bell-shaped, then
the interval:
μ ± 1σ contains about 68% of the values in
the population or the sample
μ ± 2σ contains about 95% of the values in
the population or the sample
μ ± 3σ contains about 99.7% of the values
in the population or the sample
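The percentages in the empirical rule can be compared with an exactly normal distribution, as a rough sketch. This assumes the data follow a normal curve, which is stronger than "bell-shaped"; `NormalDist` is part of Python's standard `statistics` module and the helper name is illustrative.

```python
from statistics import NormalDist

standard_normal = NormalDist(mu=0.0, sigma=1.0)


def share_within(k: float) -> float:
    """Probability that a normal value lies within k standard deviations of the mean."""
    return standard_normal.cdf(k) - standard_normal.cdf(-k)


for k in (1, 2, 3):
    print(f"within {k} standard deviation(s): {100 * share_within(k):.1f}%")
```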
Coefficient of Variation
Measures relative variation
Always in percentage (%)
Shows variation relative to mean
Can be used to compare two or more sets of
data measured in different units
\( CV = \left(\frac{s}{\bar{x}}\right) \cdot 100\% \)
Review Example
A random sample of data has mean = 75 and variance = 25.
Use Chebyshev’s theorem to determine the
percent of observations between 65 and 85.
If the data are mounded, use the empirical rule to
find the approximate percent of observations
between 65 and 85.
Review Example (continued)
A random sample of data has mean = 75 and variance = 25,
so the standard deviation is 5 and the interval 65 to 85 is the mean +/- 2 standard deviations.
Chebyshev’s theorem, +/- 2 standard deviations:
the proportion must be at least
\( 100[1 - (1/k^2)]\% = 100[1 - (1/2^2)]\% = 75\% \)
The empirical rule, +/- 2 standard deviations:
approximately 95% of the observations are within 2
standard deviations of the mean
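The only step not written out in this example is converting the interval into a number of standard deviations. A minimal sketch, with a hypothetical helper name and the assumption that the interval is centred on the mean:

```python
from math import sqrt


def k_for_interval(mean: float, variance: float, low: float, high: float) -> float:
    """Number of standard deviations from the mean to each end of a centred interval."""
    sd = sqrt(variance)
    k_low = (mean - low) / sd
    k_high = (high - mean) / sd
    if abs(k_low - k_high) > 1e-9:
        raise ValueError("interval is not symmetric about the mean")
    return k_high


k = k_for_interval(mean=75, variance=25, low=65, high=85)
chebyshev_percent = 100 * (1 - 1 / k**2)

print(f"k = {k}, Chebyshev bound = at least {chebyshev_percent:.0f}%")
```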
Comparing Coefficient of Variation
Stock A:
Average price last year = $50
Standard deviation = $5
\( CV_A = \left(\frac{s}{\bar{x}}\right) \cdot 100\% = \frac{\$5}{\$50} \cdot 100\% = 10\% \)
Stock B:
Average price last year = $100
Standard deviation = $5
\( CV_B = \left(\frac{s}{\bar{x}}\right) \cdot 100\% = \frac{\$5}{\$100} \cdot 100\% = 5\% \)
Both stocks have the same standard deviation, but stock B is less variable relative to its price
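The comparison of the two stocks can be reproduced with a small sketch. The function name is illustrative, and the inputs are the figures given on the slide.

```python
def coefficient_of_variation(std_dev: float, mean: float) -> float:
    """Standard deviation expressed as a percentage of the mean."""
    if mean == 0:
        raise ValueError("the coefficient of variation is undefined for a zero mean")
    return 100.0 * std_dev / mean


cv_a = coefficient_of_variation(std_dev=5, mean=50)   # Stock A
cv_b = coefficient_of_variation(std_dev=5, mean=100)  # Stock B

print(f"CV_A = {cv_a:.0f}%, CV_B = {cv_b:.0f}%")
```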
Weighted Mean
The weighted mean of a set of data is
\( \bar{x} = \frac{\sum_{i=1}^{n} w_i x_i}{\sum w_i} = \frac{w_1 x_1 + w_2 x_2 + \cdots + w_n x_n}{\sum w_i} \)
Where \( w_i \) is the weight of the ith observation
Use when data is already grouped into n classes, with \( w_i \) values in the ith class
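A minimal sketch of the weighted mean follows. The data are hypothetical and chosen only to make the arithmetic easy to follow; they do not come from the lecture.

```python
def weighted_mean(values, weights):
    """Return sum(w_i * x_i) / sum(w_i)."""
    if len(values) != len(weights):
        raise ValueError("values and weights must have the same length")
    total_weight = sum(weights)
    if total_weight == 0:
        raise ValueError("weights must not sum to zero")
    return sum(w * x for x, w in zip(values, weights)) / total_weight


# Hypothetical data (not from the lecture): three values and their weights.
values = [2, 4, 6]
weights = [1, 1, 2]
result = weighted_mean(values, weights)  # (1*2 + 1*4 + 2*6) / 4

print(result)
```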
Approximations for Grouped Data
Suppose a data set contains values \( m_1, m_2, \ldots, m_K \),
occurring with frequencies \( f_1, f_2, \ldots, f_K \)
For a population of N observations the mean is
\( \mu = \frac{\sum_{i=1}^{K} f_i m_i}{N} \), where \( N = \sum_{i=1}^{K} f_i \)
For a sample of n observations, the mean is
\( \bar{x} = \frac{\sum_{i=1}^{K} f_i m_i}{n} \), where \( n = \sum_{i=1}^{K} f_i \)
Approximations for Grouped Data (continued)
Suppose a data set contains values \( m_1, m_2, \ldots, m_K \),
occurring with frequencies \( f_1, f_2, \ldots, f_K \)
For a population of N observations the variance is
\( \sigma^2 = \frac{\sum_{i=1}^{K} f_i (m_i - \mu)^2}{N} \)
For a sample of n observations, the variance is
\( s^2 = \frac{\sum_{i=1}^{K} f_i (m_i - \bar{x})^2}{n - 1} \)
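The grouped-data formulas for the mean and variance can be sketched as follows. The class values and frequencies are hypothetical, not from the lecture, and the function names and the `sample` flag are illustrative choices.

```python
def grouped_mean(midpoints, freqs):
    """Mean of grouped data: sum(f_i * m_i) divided by the total frequency."""
    total = sum(freqs)
    return sum(f * m for m, f in zip(midpoints, freqs)) / total


def grouped_variance(midpoints, freqs, sample=True):
    """Variance of grouped data; divides by n - 1 for a sample, N for a population."""
    total = sum(freqs)
    center = grouped_mean(midpoints, freqs)
    squared = sum(f * (m - center) ** 2 for m, f in zip(midpoints, freqs))
    return squared / (total - 1 if sample else total)


# Hypothetical grouped data (not from the lecture).
midpoints = [5, 15, 25]
freqs = [2, 5, 3]

print("mean:", grouped_mean(midpoints, freqs))
print("sample variance:", grouped_variance(midpoints, freqs, sample=True))
print("population variance:", grouped_variance(midpoints, freqs, sample=False))
```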
The Sample Covariance
The covariance measures the strength of the linear relationship between two variables
The population covariance:
\( Cov(x, y) = \sigma_{xy} = \frac{\sum_{i=1}^{N} (x_i - \mu_x)(y_i - \mu_y)}{N} \)
The sample covariance:
\( Cov(x, y) = s_{xy} = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{n - 1} \)
Only concerned with the strength of the relationship
No causal effect is implied
Interpreting Covariance
Covariance between two variables:
Cov(x,y) > 0: x and y tend to move in the same direction
Cov(x,y) < 0: x and y tend to move in opposite directions
Cov(x,y) = 0: x and y have no linear relationship (this alone does not imply that they are independent)
Coefficient of Correlation
Measures the relative strength of the linear relationship between two variables
Population correlation coefficient:
\( \rho = \frac{Cov(x, y)}{\sigma_X \sigma_Y} \)
Sample correlation coefficient:
\( r = \frac{Cov(x, y)}{s_X s_Y} \)
Features of Correlation Coefficient, r
Unit free
Ranges between –1 and 1
The closer to –1, the stronger the negative linear
relationship
The closer to 1, the stronger the positive linear
relationship
The closer to 0, the weaker the linear
relationship
Scatter Plots of Data with Various Correlation Coefficients
[Figure: six scatter plots of Y against X, with panels labeled r = -1, r = -.6, r = 0, r = +.3, r = +1, and r = 0]
Interpreting the Result
r = .733
There is a relatively
strong positive linear
relationship between
test score #1
and test score #2
Students who scored high on the first test tended to score high on second test
[Figure: Scatter Plot of Test Scores; horizontal axis: Test #1 Score, vertical axis: Test #2 Score, both running from 70 to 100]
Obtaining Linear Relationships
An equation can be fit to show the best linear
relationship between two variables:
Y = β0 + β1X
Where Y is the dependent variable and X is the
independent variable
Least Squares Regression
Estimates for coefficients β0 and β1 are found to
minimize the sum of the squared residuals
The least-squares regression line, based on sample
data, is
\( \hat{y} = b_0 + b_1 x \)
Where \( b_1 \) is the slope of the line and \( b_0 \) is the y-intercept:
\( b_1 = \frac{Cov(x, y)}{s_x^2} = r \frac{s_y}{s_x} \)
\( b_0 = \bar{y} - b_1 \bar{x} \)
Review Example
The following data give X, the price charged per
piece of plywood ($), and Y, the quantity sold (in
thousands):
(6,80) (7,60) (8,70) (9,40) (10,0)
Compute the covariance.
Compute the correlation coefficient.
Compute and interpret the regression coefficients.
What quantity of plywood is expected to be sold if
the price were $7 per piece?
Review Example (continued)
(6,80) (7,60) (8,70) (9,40) (10,0)
x      y      (x - x̄)   (x - x̄)²   (y - ȳ)   (y - ȳ)²   (x - x̄)(y - ȳ)
6      80     -2        4          30        900        -60
7      60     -1        1          10        100        -10
8      70     0         0          20        400        0
9      40     1         1          -10       100        -10
10     0      2         4          -50       2500       -100
Sum:
40     250    0         10         0         4000       -180
\( \bar{x} = 8.00 \), \( \bar{y} = 50.00 \), \( s_x^2 = 2.5 \), \( s_y^2 = 1000 \)
\( s_x = 1.5811 \), \( s_y = 31.623 \)
Compute the covariance: \( Cov(x, y) = \frac{-180}{4} = -45 \)
Correlation coefficient: \( r = \frac{-45}{(1.5811)(31.623)} = -.900 \). The correlation coefficient indicates
the strength of the linear association between the two variables
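The covariance and correlation for the plywood data can be recomputed with a short sketch that follows the sample formulas (division by n - 1). The function and variable names are illustrative, not from the lecture.

```python
from math import sqrt


def sample_covariance(xs, ys):
    """Sum of products of deviations from the means, divided by n - 1."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    return sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / (n - 1)


def sample_correlation(xs, ys):
    """Covariance divided by the product of the two sample standard deviations."""
    s_x = sqrt(sample_covariance(xs, xs))  # covariance of x with itself is its variance
    s_y = sqrt(sample_covariance(ys, ys))
    return sample_covariance(xs, ys) / (s_x * s_y)


price = [6, 7, 8, 9, 10]         # X: price per piece of plywood ($)
quantity = [80, 60, 70, 40, 0]   # Y: quantity sold (thousands)

cov_xy = sample_covariance(price, quantity)
r = sample_correlation(price, quantity)

print(f"Cov(x, y) = {cov_xy:.1f}, r = {r:.3f}")
```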
Review Example (continued)
(6,80) (7,60) (8,70) (9,40) (10,0)
Compute and interpret regression coefficients.
\( b_1 = \frac{Cov(x, y)}{s_x^2} = \frac{-45}{2.5} = -18.0 \)
For a one-dollar increase in the price per piece of plywood, the
quantity of plywood sold is estimated to decrease by 18 thousand
pieces
\( b_0 = \bar{y} - b_1 \bar{x} = 50.0 - (-18)(8.0) = 194.00 \)
What quantity of plywood is expected to be sold if the price were $7
per piece?
\( \hat{y} = b_0 + b_1 x = 194.00 - 18.0(7) = 68 \) thousand pieces
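The regression coefficients and the prediction at a price of $7 can be reproduced with the sketch below, which applies the least-squares formulas for the slope (covariance over the variance of x) and the intercept. The function name is an illustrative assumption.

```python
def least_squares_line(xs, ys):
    """Return (b0, b1) for the line y_hat = b0 + b1 * x fitted by least squares."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / (n - 1)
    s_xx = sum((x - x_bar) ** 2 for x in xs) / (n - 1)
    b1 = s_xy / s_xx          # slope: covariance divided by the variance of x
    b0 = y_bar - b1 * x_bar   # intercept: the line passes through (x_bar, y_bar)
    return b0, b1


price = [6, 7, 8, 9, 10]         # X: price per piece of plywood ($)
quantity = [80, 60, 70, 40, 0]   # Y: quantity sold (thousands)

b0, b1 = least_squares_line(price, quantity)
predicted = b0 + b1 * 7  # expected quantity (thousands) at a price of $7

print(f"b0 = {b0:.2f}, b1 = {b1:.2f}, prediction at $7 = {predicted:.1f} thousand")
```

The observed quantity at $7 was 60 thousand, so the fitted line is an estimate of the average relationship rather than a restatement of any single data point.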
Summary
Described measures of central tendency: mean, median, mode
Illustrated the shape of the distribution: symmetric, skewed
Described measures of variation: range, interquartile range, variance and standard deviation, coefficient of variation
Discussed measures of grouped data
Calculated measures of relationships between variables: covariance and correlation coefficient