Data Collection and Analysis · PDF file5/2/2017 · Sampling Distribution and...
BUSINESS STATISTICS
Bijay Lal Pradhan, Ph.D.
MBA, Pokhara University
WHY BUSINESS STATISTICS
Most successful managers and decision
makers understand information and
know how to use it effectively.
COURSE CONTENT
Introduction and Data Collection
Summarization of Data
Grouping and Displaying Data
Basic Probability: Concepts and Applications.
Probability Distributions
Sampling Distribution and Estimation
Hypothesis Testing
Chi-Square Test and Analysis of Variance
Correlation and Regression Analysis
DATA
Data Type
Qualitative data
Quantitative data
Discrete data
Continuous data
Presenting data
Individual form
Discrete frequency form
Continuous frequency form
Upper limit included form
Upper limit excluded form
NUMERICAL DESCRIPTIVE MEASURE
Arithmetic Mean
Geometric Mean
Harmonic Mean
Median
Mode
SOME OTHER NUMERICAL DESCRIPTIVE MEASURES
Midhinge: average of the first and third quartiles
Midrange: average of the largest and smallest values
Quartiles: first and third quartiles
Range: difference between the largest and smallest items
Standard Deviation: positive square root of the mean of the squared deviations from the arithmetic mean (AM)
Variance: square of the SD
Coefficient of Variation: $CV = \frac{\sigma}{\bar{x}} \times 100$
Shape (Symmetric and Skewed): Skewness (difference between mean and mode)
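As a rough illustration of the measures listed on this slide, the following Python sketch computes the range, midrange, variance, standard deviation, and coefficient of variation for a small made-up data set. The function name `describe` and the sample values are assumptions for illustration only; the variance is assumed to divide by n, in line with the "mean of squared deviations" wording of the SD definition.

```python
import math

def describe(values):
    """Return basic dispersion measures for a list of numbers."""
    xs = sorted(values)
    n = len(xs)
    mean = sum(xs) / n
    # Variance: mean of the squared deviations from the arithmetic mean.
    variance = sum((x - mean) ** 2 for x in xs) / n
    sd = math.sqrt(variance)
    return {
        "range": xs[-1] - xs[0],            # largest minus smallest
        "midrange": (xs[0] + xs[-1]) / 2,   # average of the extremes
        "variance": variance,
        "sd": sd,
        "cv": sd / mean * 100,              # coefficient of variation, in percent
    }

# Hypothetical sample, not taken from the slides.
sample = [2, 4, 4, 4, 5, 5, 7, 9]
summary = describe(sample)
print(summary)
```

Quartile-based measures such as the midhinge are left out of this sketch because quartile conventions differ between textbooks and software.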
DETECTING OUTLIERS
If the relative measure $z = \frac{x - \mu}{\sigma}$ of any value is less
than -3 or more than 3, that value is said to be an
outlier and is taken out of the study.
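The z-score rule can be sketched in Python as follows. The helper names `z_scores` and `find_outliers` and the sample values are hypothetical; the population standard deviation (dividing by n) is assumed for σ.

```python
import math

def z_scores(values):
    """Return the z-score (x - mu) / sigma of every value."""
    n = len(values)
    mu = sum(values) / n
    sigma = math.sqrt(sum((x - mu) ** 2 for x in values) / n)
    if sigma == 0:
        # All values are identical, so no value deviates from the mean.
        return [0.0 for _ in values]
    return [(x - mu) / sigma for x in values]

def find_outliers(values, threshold=3.0):
    """Return the values whose z-score lies below -threshold or above +threshold."""
    return [x for x, z in zip(values, z_scores(values)) if abs(z) > threshold]

# Hypothetical sample with one extreme value.
sample = [12, 13, 13, 14, 15, 15, 16, 17, 18, 18, 19, 20, 21, 22, 95]
print(find_outliers(sample))
```

Note that with very small samples no value can reach |z| > 3 under this definition, so the rule is only informative for reasonably large data sets.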
EXPLORATORY DATA ANALYSIS
Five Number Summary: Three quartiles together
with the low and high data values give us a very
useful look at the data and their spread.
Box and Whisker Plot: uses a Five-Number
Summary to create a graphic sketch of the data.
[Figure: box and whisker diagram marking Xsmallest, Q1, Median, Q3, and Xlargest; the left whisker covers 25% of the data, the box 50%, and the right whisker 25%.]
BOX AND WHISKER PLOT
Box and Whisker Plot uses a Five-Number
Summary to create a graphic sketch of the data.
Box and Whisker Plot gives a graphical
representation of the data set contained between
the upper and lower limits. This plot shows
the degree of symmetry (or skewness) based on
the distances that separate the five numbers.
THE DISTRIBUTION IS POSITIVELY SKEWED IF
The distance from the median (md) to the third
Quartile (Q3) is greater than the distance from the
median (md) to the first quartile (Q1).
The distance from the median (md) to the largest
value is greater than the distance from the
median (md) to the smallest value of the data.
The distance from the third Quartile (Q3) to the
largest value is greater than the distance from
the first quartile (Q1) to the smallest value.
THE DISTRIBUTION IS NEGATIVELY SKEWED IF
The distance from the median (md) to the third
Quartile (Q3) is less than the distance from the
median (md) to the first quartile (Q1).
The distance from the median (md) to the largest
value is less than the distance from the median
(md) to the smallest value of the data.
The distance from the third Quartile (Q3) to the
largest value is less than the distance from the
first quartile (Q1) to the smallest value.
THE DISTRIBUTION IS PERFECTLY SYMMETRICAL IF
The distance from the median (md) to the third
Quartile (Q3) is equal to the distance from the
median (md) to the first quartile (Q1).
The distance from the median (md) to the largest
value is equal to the distance from the median
(md) to the smallest value of the data.
The distance from the third Quartile (Q3) to the
largest value is equal to the distance from the
first quartile (Q1) to the smallest value.
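The three distance comparisons used on the last three slides can be sketched as a small Python function. The names `compare` and `shape_from_five_numbers`, and the "mixed" label for cases where the three comparisons disagree, are assumptions made for this illustration; the slides do not say what to conclude in that case.

```python
import math

def compare(right_distance, left_distance):
    """Classify one pair of distances measured to the right and to the left."""
    if math.isclose(right_distance, left_distance):
        return "symmetric"
    return "positive" if right_distance > left_distance else "negative"

def shape_from_five_numbers(x_small, q1, md, q3, x_large):
    """Apply the three comparisons to a five-number summary."""
    checks = [
        compare(q3 - md, md - q1),            # median to Q3 vs median to Q1
        compare(x_large - md, md - x_small),  # median to largest vs median to smallest
        compare(x_large - q3, q1 - x_small),  # right whisker vs left whisker
    ]
    # A single verdict is reported only when all three comparisons agree.
    verdict = checks[0] if len(set(checks)) == 1 else "mixed"
    return verdict, checks

# Hypothetical five-number summary with equal distances on both sides.
print(shape_from_five_numbers(0, 25, 50, 75, 100))
```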
EXAMPLE: 1
List the five number summary and prepare a box and whisker plot from the following information. Are the data skewed?

| Size | 0-10 | 10-20 | 20-30 | 30-40 | 40-50 | 50-60 |
|---|---|---|---|---|---|---|
| Frequency | 10 | 12 | 25 | 35 | 40 | 50 |

Solution
Largest value XL = 60, smallest value Xs = 0

| Size | f | c.f. |
|---|---|---|
| 0-10 | 10 | 10 |
| 10-20 | 12 | 22 |
| 20-30 | 25 | 47 |
| 30-40 | 35 | 82 |
| 40-50 | 40 | 122 |
| 50-60 | 50 | 172 |
| N | 172 | |

Size of Md = $\frac{N}{2} = \frac{172}{2} = 86$th item
The c.f. just > 86 is 122. Hence class 40-50 is the median class.
$Md = L + \frac{\frac{N}{2} - c.f.}{f} \times h = 40 + \frac{86 - 82}{40} \times 10 = 40 + 1 = 41$

Size of Q1 = $\frac{N}{4} = \frac{172}{4} = 43$
The c.f. just > 43 is 47. Hence, Q1 lies in class 20-30.
$Q_1 = L + \frac{\frac{N}{4} - c.f.}{f} \times h = 20 + \frac{43 - 22}{25} \times 10 = 20 + 8.4 = 28.4$

Size of Q3 = $\frac{3N}{4} = \frac{3 \times 172}{4} = 129$
The c.f. just > 129 is 172. Hence, Q3 lies in class 50-60.
$Q_3 = L + \frac{\frac{3N}{4} - c.f.}{f} \times h = 50 + \frac{129 - 122}{50} \times 10 = 50 + 1.4 = 51.4$

The five number summary is {0, 28.4, 41, 51.4, 60}
[Figure: box and whisker plot on an axis from 0 to 60, marking Xs, Q1, Md, Q3, and Xl.]
Since the left whisker is longer than the right whisker, the given distribution is negatively (left) skewed.
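The interpolation formula used for Md, Q1, and Q3 in Example 1 can be checked with a short Python sketch. The function name `grouped_quantile` and its parameters are hypothetical; the class limits and frequencies are those of the example.

```python
def grouped_quantile(classes, freqs, fraction):
    """Interpolate a quantile from grouped data.

    classes  : list of (lower, upper) class limits
    freqs    : class frequencies, in the same order
    fraction : 0.25 for Q1, 0.5 for the median, 0.75 for Q3
    """
    n = sum(freqs)
    target = fraction * n          # e.g. N/4, N/2, or 3N/4
    cum = 0                        # c.f. of the classes before the current one
    for (lower, upper), f in zip(classes, freqs):
        if cum + f > target:       # first class whose c.f. is just greater than the target
            return lower + (target - cum) / f * (upper - lower)
        cum += f
    return classes[-1][1]          # target equals N: return the top limit

classes = [(0, 10), (10, 20), (20, 30), (30, 40), (40, 50), (50, 60)]
freqs = [10, 12, 25, 35, 40, 50]

q1 = grouped_quantile(classes, freqs, 0.25)
md = grouped_quantile(classes, freqs, 0.50)
q3 = grouped_quantile(classes, freqs, 0.75)
print(q1, md, q3)
```

With the data of Example 1 this reproduces Q1 = 28.4, Md = 41, and Q3 = 51.4, up to floating-point rounding.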
PU2016
To help determine the need for more golf courses, a survey was undertaken.
A sample of 50 self-declared golfers was asked how many rounds of golf they
played last year. These data are as follows:
18 15 25 18 29 28 23 20 13 26
29 24 13 13 28 26 18 30 24 23
26 17 18 30 28 18 19 16 15 35
21 15 17 24 26 22 32 38 20 15
35 18 14 25 19 28 28 12 30 36
(a) Summarize these data using a stem and leaf display.
(b) Divide the data set into five classes of equal width and construct a frequency
distribution and relative frequency distribution.
(c) Construct a frequency histogram and frequency polygon.
(d) Compute the mean and standard deviation of the grouped data set constructed
in (b).
(e) Construct a box and whisker plot of the grouped data set prepared in (b).
Then, describe the shape of the data.
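A stem and leaf display for the golf data can be sketched in Python as follows. The function name `stem_and_leaf` is hypothetical, and the sketch assumes two-digit non-negative integers, with the tens digit as the stem and the units digit as the leaf.

```python
from collections import defaultdict

rounds = [
    18, 15, 25, 18, 29, 28, 23, 20, 13, 26,
    29, 24, 13, 13, 28, 26, 18, 30, 24, 23,
    26, 17, 18, 30, 28, 18, 19, 16, 15, 35,
    21, 15, 17, 24, 26, 22, 32, 38, 20, 15,
    35, 18, 14, 25, 19, 28, 28, 12, 30, 36,
]

def stem_and_leaf(data):
    """Map each stem (tens digit) to its sorted leaves (units digits) as a string."""
    stems = defaultdict(list)
    for x in sorted(data):          # sorting first keeps the leaves in ascending order
        stems[x // 10].append(x % 10)
    return {stem: "".join(str(leaf) for leaf in leaves)
            for stem, leaves in sorted(stems.items())}

display = stem_and_leaf(rounds)
for stem, leaves in display.items():
    print(stem, "|", leaves)
```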
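Stem and leaf display

| Stem | Leaf |
|---|---|
| 1 | 23334555567788888899 |
| 2 | 0012334445566668888899 |
| 3 | 00025568 |

Data in ascending order (first 29 values, as listed on the slide): 12 13 13 13 14 15 15 15 15 16 17 17 18 18 18 18 18 18 19 19 20 20 21 22 23 23 24 24 24

Class interval = (L - S)/n = (38 - 12)/5 = 5.2

| Bin | Class | CF | FREQ | RF | m | fm | fm² |
|---|---|---|---|---|---|---|---|
| 17 | 12-18 | 12 | 12 | 0.24 | 15 | 180 | 2700 |
| 23 | 18-24 | 26 | 14 | 0.28 | 21 | 294 | 6174 |
| 29 | 24-30 | 42 | 16 | 0.32 | 27 | 432 | 11664 |
| 35 | 30-36 | 48 | 6 | 0.12 | 33 | 198 | 6534 |
| 42 | 36-42 | 50 | 2 | 0.04 | 39 | 78 | 3042 |
| Total | | | | | | 1182 | 30114 |

Mean = 23.64, SD = 6.59
Smallest = 12, largest = 38
Q1 = 15 + ((12.5 - 5)/15) × 5 = 17.5
Q2 = 20 + ((25 - 20)/9) × 5 = 22.78
Q3 = 25 + ((37.5 - 29)/13) × 5 = 28.27
(The quartile formulas use classes of width 5: 10-15, 15-20, 20-25, and 25-30, with frequencies 5, 15, 9, and 13.)
Five number summary: 12, 17.5, 22.8, 28.27, 38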
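The grouped frequency table and the mean and standard deviation in this worksheet can be reproduced with a short Python sketch. The function names are hypothetical; the class width of 6 starting at 12 follows the classes 12-18, 18-24, and so on used in the worksheet, with the upper limit of each class excluded.

```python
import math

rounds = [
    18, 15, 25, 18, 29, 28, 23, 20, 13, 26,
    29, 24, 13, 13, 28, 26, 18, 30, 24, 23,
    26, 17, 18, 30, 28, 18, 19, 16, 15, 35,
    21, 15, 17, 24, 26, 22, 32, 38, 20, 15,
    35, 18, 14, 25, 19, 28, 28, 12, 30, 36,
]

def frequency_distribution(data, start, width, n_classes):
    """Count values per class [start + i*width, start + (i+1)*width)."""
    freqs = [0] * n_classes
    for x in data:
        index = (x - start) // width
        if 0 <= index < n_classes:   # ignore values outside the class range
            freqs[index] += 1
    return freqs

def grouped_mean_sd(freqs, start, width):
    """Mean and SD from class midpoints m: mean = sum(fm)/N, SD = sqrt(sum(fm^2)/N - mean^2)."""
    n = sum(freqs)
    mids = [start + i * width + width / 2 for i in range(len(freqs))]
    sum_fm = sum(f * m for f, m in zip(freqs, mids))
    sum_fm2 = sum(f * m * m for f, m in zip(freqs, mids))
    mean = sum_fm / n
    return mean, math.sqrt(sum_fm2 / n - mean ** 2)

freqs = frequency_distribution(rounds, start=12, width=6, n_classes=5)
relative = [f / len(rounds) for f in freqs]
mean, sd = grouped_mean_sd(freqs, start=12, width=6)
print(freqs, relative, round(mean, 2), round(sd, 2))
```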
| Class (lower < upper) | frequency | percent |
|---|---|---|
| 10 < 15 | 5 | 10.0 |
| 15 < 20 | 15 | 30.0 |
| 20 < 25 | 9 | 18.0 |
| 25 < 30 | 13 | 26.0 |
| 30 < 35 | 4 | 8.0 |
| 35 < 40 | 4 | 8.0 |

[Figure: Histogram; x-axis: Data, y-axis: Percent (0 to 35)]
[Figure: Frequency Polygon; x-axis: Data (10 to 35), y-axis: Percent (0.0 to 35.0)]
BOX WHISKER PLOT
[Figure: Box Whisker Plot of the five number summary 12, 17.5, 22.8, 28.3, 38 on an axis from 10 to 40]
Since the right whisker is longer than the left whisker,
the given distribution is positively (right) skewed.
2014 PU
The following data represent the cost of electricity during June 2014 for a random sample of 50 one-bedroom apartments in a large city.
Raw Data on Utility Charges (Rs)
96 121 202 178 147 102 153 197 127 82
157 185 90 116 172 111 148 213 130 165
141 149 206 175 123 128 144 168 109 167
95 163 150 154 130 143 187 166 139 149
108 119 183 151 114 135 191 137 129 158
a) Construct a frequency distribution using the intervals 80-100, 100-120, and so on.
b) Construct a frequency histogram and frequency polygon. Around
what amount does the monthly electricity cost seem to be
concentrated?
c) Construct an ogive to find the value of the median.
d) Construct a box and whisker plot of the grouped data set prepared in (a). Then, describe the shape of the data.
e) Compare the median values of (c) and (d). Are they equal? If so,
what can be concluded?
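As a starting point for parts (a) and (c), the frequency distribution and an interpolated median can be sketched in Python. The function names are hypothetical, each class is assumed to exclude its upper limit (80-100 means 80 up to but not including 100), and the median is obtained with the grouped-data interpolation formula rather than read off a drawn ogive, so a graphical answer may differ slightly.

```python
costs = [
    96, 121, 202, 178, 147, 102, 153, 197, 127, 82,
    157, 185, 90, 116, 172, 111, 148, 213, 130, 165,
    141, 149, 206, 175, 123, 128, 144, 168, 109, 167,
    95, 163, 150, 154, 130, 143, 187, 166, 139, 149,
    108, 119, 183, 151, 114, 135, 191, 137, 129, 158,
]

def frequency_distribution(data, start, width, n_classes):
    """Count values per class [start + i*width, start + (i+1)*width)."""
    freqs = [0] * n_classes
    for x in data:
        index = (x - start) // width
        if 0 <= index < n_classes:   # ignore values outside the class range
            freqs[index] += 1
    return freqs

def grouped_median(freqs, start, width):
    """Median by interpolation inside the class whose c.f. first exceeds N/2."""
    target = sum(freqs) / 2
    cum = 0
    for i, f in enumerate(freqs):
        if cum + f > target:
            return start + i * width + (target - cum) / f * width
        cum += f
    return start + len(freqs) * width

freqs = frequency_distribution(costs, start=80, width=20, n_classes=7)
median = grouped_median(freqs, start=80, width=20)
print(freqs, round(median, 2))
```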
2016 SPRING PU
The president of Ocean Airlines is trying to estimate when the Federal Aviation Administration (FAA) is most likely to rule on the company's application for a new flight between Charlotte and Nashville. Assistants to the president have assembled the following waiting times for applications filed during the past year. The data are given in days from the date of application until an FAA ruling.
14 40 13 48 31 40 25 33 62 12
44 34 68 11 33 42 26 55 47 11
29 40 41 30 34 31 64 35 57 63
44 44 17 52 32 36 34 53 41 39
29 22 28 44 51 31 44 28 56 53
(i) Arrange the above data in ascending order by preparing a stem and leaf display.
(ii) Construct a frequency distribution using 6 equally spaced intervals. Also
construct a histogram and comment on the shape of the distribution.
(iii) Prepare a box and whisker plot from the grouped data set prepared in (ii) and then
describe the nature of the distribution of data points.
(iv) Compute the mean and coefficient of variation from the grouped data prepared
in (ii).
(v) Detect outliers, if any, using exploratory data analysis.