Chapter 3, Numerical Descriptive Measures Data analysis is objective –Should report the summary...
-
Upload
kaelyn-gallup -
Category
Documents
-
view
221 -
download
0
Transcript of Chapter 3, Numerical Descriptive Measures Data analysis is objective –Should report the summary...
![Page 1: Chapter 3, Numerical Descriptive Measures Data analysis is objective –Should report the summary measures that best meet the assumptions about the data.](https://reader036.fdocuments.us/reader036/viewer/2022062417/551aa129550346761a8b5fe3/html5/thumbnails/1.jpg)
Chapter 3, Numerical Descriptive Measures
• Data analysis is objective– Should report the summary measures that best
meet the assumptions about the data set
• Data interpretation is subjective– Should be done in fair, neutral and clear
manner
![Page 2: Chapter 3, Numerical Descriptive Measures Data analysis is objective –Should report the summary measures that best meet the assumptions about the data.](https://reader036.fdocuments.us/reader036/viewer/2022062417/551aa129550346761a8b5fe3/html5/thumbnails/2.jpg)
Summary Measures
Arithmetic Mean
Median
Mode
Describing Data Numerically
Variance
Standard Deviation
Coefficient of Variation
Range
Interquartile Range
Geometric Mean
Skewness
Central Tendency Variation Shape
Quartiles
![Page 3: Chapter 3, Numerical Descriptive Measures Data analysis is objective –Should report the summary measures that best meet the assumptions about the data.](https://reader036.fdocuments.us/reader036/viewer/2022062417/551aa129550346761a8b5fe3/html5/thumbnails/3.jpg)
Arithmetic Mean• The arithmetic mean (mean) is the most common measure of
central tendency
• Mean = sum of values divided by the number of values
• Affected by extreme values (outliers)
Sample size
n
XXX
n
XX n21
n
1ii
Observed values
![Page 4: Chapter 3, Numerical Descriptive Measures Data analysis is objective –Should report the summary measures that best meet the assumptions about the data.](https://reader036.fdocuments.us/reader036/viewer/2022062417/551aa129550346761a8b5fe3/html5/thumbnails/4.jpg)
Geometric Mean• Geometric mean
– Used to measure the rate of change of a variable over time
• Geometric mean rate of return– Measures the status of an investment over time
– Where Ri is the rate of return in time period I
n/1n21G )XXX(X
1)]R1()R1()R1[(R n/1n21G
![Page 5: Chapter 3, Numerical Descriptive Measures Data analysis is objective –Should report the summary measures that best meet the assumptions about the data.](https://reader036.fdocuments.us/reader036/viewer/2022062417/551aa129550346761a8b5fe3/html5/thumbnails/5.jpg)
Median: Position and Value• In an ordered array, the median is the “middle”
number (50% above, 50% below)• The location (position) of the median:
• The value of median is NOT affected by extreme values
dataorderedtheinposition2
1npositionMedian
![Page 6: Chapter 3, Numerical Descriptive Measures Data analysis is objective –Should report the summary measures that best meet the assumptions about the data.](https://reader036.fdocuments.us/reader036/viewer/2022062417/551aa129550346761a8b5fe3/html5/thumbnails/6.jpg)
Mode
• A measure of central tendency
• Value that occurs most often
• Not affected by extreme values
• Used for either numerical or categorical data
• There may may be no mode
• There may be several modes
![Page 7: Chapter 3, Numerical Descriptive Measures Data analysis is objective –Should report the summary measures that best meet the assumptions about the data.](https://reader036.fdocuments.us/reader036/viewer/2022062417/551aa129550346761a8b5fe3/html5/thumbnails/7.jpg)
Quartiles• Quartiles split the ranked data into 4 segments
with an equal number of values per segment• Find a quartile by determining the value in the
appropriate position in the ranked data, where
First quartile position: Q1 = (n+1)/4
Second quartile position: Q2 =2 (n+1)/4 (the median position)
Third quartile position: Q3 = 3(n+1)/4
where n is the number of observed values
![Page 8: Chapter 3, Numerical Descriptive Measures Data analysis is objective –Should report the summary measures that best meet the assumptions about the data.](https://reader036.fdocuments.us/reader036/viewer/2022062417/551aa129550346761a8b5fe3/html5/thumbnails/8.jpg)
Same center,
different variation
Measures of VariationVariation
Variance Standard Deviation
Coefficient of Variation
Range Interquartile Range
• Measures of variation give information on the spread or variability of the data values.
![Page 9: Chapter 3, Numerical Descriptive Measures Data analysis is objective –Should report the summary measures that best meet the assumptions about the data.](https://reader036.fdocuments.us/reader036/viewer/2022062417/551aa129550346761a8b5fe3/html5/thumbnails/9.jpg)
Range and Interquartile Rage
• Range– Simplest measure of variation
– Difference between the largest and the smallest observations:
Range = Xlargest – Xsmallest
– Ignores the way in which data are distributed
– Sensitive to outliers
• Interquartile Range– Eliminate some high- and low-valued observations and calculate
the range from the remaining values
– Interquartile range = 3rd quartile – 1st quartile
= Q3 – Q1
![Page 10: Chapter 3, Numerical Descriptive Measures Data analysis is objective –Should report the summary measures that best meet the assumptions about the data.](https://reader036.fdocuments.us/reader036/viewer/2022062417/551aa129550346761a8b5fe3/html5/thumbnails/10.jpg)
• Average (approximately) of squared deviations of values from the mean
– Sample variance:
Variance
1-n
)X(XS
n
1i
2i
2
Where = arithmetic mean
n = sample size
Xi = ith value of the variable X
X
![Page 11: Chapter 3, Numerical Descriptive Measures Data analysis is objective –Should report the summary measures that best meet the assumptions about the data.](https://reader036.fdocuments.us/reader036/viewer/2022062417/551aa129550346761a8b5fe3/html5/thumbnails/11.jpg)
Standard Deviation• Most commonly used measure of variation
• Shows variation about the mean
• Has the same units as the original data
• It is a measure of the “average” spread around the mean
• Sample standard deviation:
1-n
)X(XS
n
1i
2i
![Page 12: Chapter 3, Numerical Descriptive Measures Data analysis is objective –Should report the summary measures that best meet the assumptions about the data.](https://reader036.fdocuments.us/reader036/viewer/2022062417/551aa129550346761a8b5fe3/html5/thumbnails/12.jpg)
Coefficient of Variation
• Measures relative variation
• Always in percentage (%)
• Shows variation relative to mean
• Can be used to compare two or more sets of data measured in different units
100%X
SCV
![Page 13: Chapter 3, Numerical Descriptive Measures Data analysis is objective –Should report the summary measures that best meet the assumptions about the data.](https://reader036.fdocuments.us/reader036/viewer/2022062417/551aa129550346761a8b5fe3/html5/thumbnails/13.jpg)
Shape of a Distribution
• Describes how data are distributed
• Measures of shape– Symmetric or skewed
Mean = Median Mean < Median Median < Mean
Right-SkewedLeft-Skewed Symmetric
![Page 14: Chapter 3, Numerical Descriptive Measures Data analysis is objective –Should report the summary measures that best meet the assumptions about the data.](https://reader036.fdocuments.us/reader036/viewer/2022062417/551aa129550346761a8b5fe3/html5/thumbnails/14.jpg)
Using the Five-Number Summary to Explore the Shape
• Box-and-Whisker Plot: A Graphical display of data using 5-number summary:
• The Box and central line are centered between the endpoints if data are symmetric around the median
Minimum, Q1, Median, Q3, Maximum
Min Q1 Median Q3 Max
![Page 15: Chapter 3, Numerical Descriptive Measures Data analysis is objective –Should report the summary measures that best meet the assumptions about the data.](https://reader036.fdocuments.us/reader036/viewer/2022062417/551aa129550346761a8b5fe3/html5/thumbnails/15.jpg)
Distribution Shape and Box-and-Whisker Plot
Right-SkewedLeft-Skewed Symmetric
Q1 Q2 Q3 Q1 Q2 Q3 Q1 Q2 Q3
![Page 16: Chapter 3, Numerical Descriptive Measures Data analysis is objective –Should report the summary measures that best meet the assumptions about the data.](https://reader036.fdocuments.us/reader036/viewer/2022062417/551aa129550346761a8b5fe3/html5/thumbnails/16.jpg)
• If the data distribution is bell-shaped, then the interval:– contains about 68% of the values in the population or
the sample
– contains about 95% of the values in the population or the sample
– contains about 99.7% of the values in the population or the sample
Relationship between Std. Dev. And Shape: The Empirical Rule
1σμ
2σμ
3σμ
![Page 17: Chapter 3, Numerical Descriptive Measures Data analysis is objective –Should report the summary measures that best meet the assumptions about the data.](https://reader036.fdocuments.us/reader036/viewer/2022062417/551aa129550346761a8b5fe3/html5/thumbnails/17.jpg)
Population Mean and Variance
N
μ)(Xσ
N
1i
2i
2
Population variance
N
XXX
N
XN21
N
1ii
Population Mean
![Page 18: Chapter 3, Numerical Descriptive Measures Data analysis is objective –Should report the summary measures that best meet the assumptions about the data.](https://reader036.fdocuments.us/reader036/viewer/2022062417/551aa129550346761a8b5fe3/html5/thumbnails/18.jpg)
Covariance and Coefficient of Correlation
• The sample covariance measures the strength of the linear relationship between two variables (called bivariate data)
• The sample covariance:
• Only concerned with the strength of the relationship
• No causal effect is implied
1n
)YY)(XX()Y,X(cov
n
1iii
![Page 19: Chapter 3, Numerical Descriptive Measures Data analysis is objective –Should report the summary measures that best meet the assumptions about the data.](https://reader036.fdocuments.us/reader036/viewer/2022062417/551aa129550346761a8b5fe3/html5/thumbnails/19.jpg)
• Covariance between two random variables:• cov(X,Y) > 0 X and Y tend to move in the same direction
• cov(X,Y) < 0 X and Y tend to move in opposite directions
• cov(X,Y) = 0 X and Y are independent
• Covariance does not say anything about the relative strength of the relationship.
• Coefficient of Correlation measures the relative strength of the linear relationship between two variables
YXn
1i
2i
n
1i
2i
n
1iii
SS
)Y,X(cov
)YY()XX(
)YY)(XX(r
![Page 20: Chapter 3, Numerical Descriptive Measures Data analysis is objective –Should report the summary measures that best meet the assumptions about the data.](https://reader036.fdocuments.us/reader036/viewer/2022062417/551aa129550346761a8b5fe3/html5/thumbnails/20.jpg)
• Coefficient of Correlation:
– Is unit free
– Ranges between –1 (perfect negative) and 1(perfect
positive)
– The closer to –1, the stronger the negative linear
relationship
– The closer to 1, the stronger the positive linear
relationship
– The closer to 0, the weaker any positive linear
relationship
– At 0 there is no relationship at all
![Page 21: Chapter 3, Numerical Descriptive Measures Data analysis is objective –Should report the summary measures that best meet the assumptions about the data.](https://reader036.fdocuments.us/reader036/viewer/2022062417/551aa129550346761a8b5fe3/html5/thumbnails/21.jpg)
Correlation vs. Regression• A scatter plot (or scatter diagram) can be used
to show the relationship between two variables
• Correlation analysis is used to measure strength of the association (linear relationship) between two variables– Correlation is only concerned with strength of the
relationship
– No causal effect is implied with correlation