02_FrequencyHistogram
Transcript of 02_FrequencyHistogram
-
8/14/2019 02_FrequencyHistogram
STAT 201: Introduction to Business Statistics
Class 2: Describing Data by Graphs, Charts, Tables
-
Takeaways from Class 1
Statistics is the study of the collection, organization, analysis, interpretation, and presentation of data
It gives managers a better understanding of the business environment
It enables them to make more informed and better decisions
Descriptive Statistics: Summaries of data, which may be graphical, tabular, or numerical
Inferential Statistics: Procedures that help draw conclusions about a set of data from a subset of that data
-
Takeaways from Class 1
Population: the set of all items or individuals of interest
A parameter is a summary measure computed to describe a characteristic of the population
Sample: a subset of the population
A statistic is a summary measure computed to describe a characteristic of the sample drawn from the population
-
Takeaways from Class 1
Types of Data:
Qualitative (Categorical): Data grouped by specific categories (e.g., eye color, marital status), generally:
Nominal Scale: only labels
Ordinal Scale: can be ordered
Quantitative (Numerical): Data grouped by numerical values (e.g., number of children, weight), generally:
Interval Scale: meaningful intervals in addition to ordinal scale
Ratio Scale: meaningful ratios in addition to interval scale (there is a zero value)
Time Series Data: Collected over several time periods
-
Today's Focus
Introducing tabular and graphical methods commonly used to summarize both categorical and quantitative data.
Tabular and graphical summaries of data are found in:
Annual reports
Newspaper articles
Research studies
Sources: The Economist, RealClearPolitics.com
-
Summarizing Categorical Data
Frequency Distribution:
A frequency distribution is a tabular summary of data showing the number (frequency) of items in each of several non-overlapping classes.
-
Example: Soft Drink Purchases*
Data from a sample of 50 soft drink purchases
Coke Classic    Sprite          Pepsi
Diet Coke       Coke Classic    Coke Classic
Pepsi           Diet Coke       Coke Classic
Diet Coke       Coke Classic    Coke Classic
Coke Classic    Diet Coke       Pepsi
Coke Classic    Coke Classic    Dr. Pepper
Dr. Pepper      Sprite          Coke Classic
Diet Coke       Pepsi           Diet Coke
Pepsi           Coke Classic    Pepsi
Pepsi           Coke Classic    Pepsi
Coke Classic    Coke Classic    Pepsi
Dr. Pepper      Pepsi           Pepsi
Sprite          Coke Classic    Coke Classic
Coke Classic    Sprite          Dr. Pepper
Diet Coke       Dr. Pepper      Pepsi
Coke Classic    Pepsi           Sprite
Coke Classic    Diet Coke
*Source: Modern Business Statistics by Anderson, Sweeney, Williams
-
Summarizing Categorical Data
To develop a frequency distribution, count the number of times each item type appears in the data.
Soft Drink      Frequency
Coke Classic    19
Diet Coke       8
Dr. Pepper      5
Pepsi           13
Sprite          5
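The counting step can be sketched in Python. This is a minimal illustration rather than part of the original slides: the list `purchases` holds the 50 purchases from the example, and the variable names are assumptions chosen for this sketch.

```python
from collections import Counter

# The 50 soft drink purchases from the example, row by row.
purchases = [
    "Coke Classic", "Sprite", "Pepsi",
    "Diet Coke", "Coke Classic", "Coke Classic",
    "Pepsi", "Diet Coke", "Coke Classic",
    "Diet Coke", "Coke Classic", "Coke Classic",
    "Coke Classic", "Diet Coke", "Pepsi",
    "Coke Classic", "Coke Classic", "Dr. Pepper",
    "Dr. Pepper", "Sprite", "Coke Classic",
    "Diet Coke", "Pepsi", "Diet Coke",
    "Pepsi", "Coke Classic", "Pepsi",
    "Pepsi", "Coke Classic", "Pepsi",
    "Coke Classic", "Coke Classic", "Pepsi",
    "Dr. Pepper", "Pepsi", "Pepsi",
    "Sprite", "Coke Classic", "Coke Classic",
    "Coke Classic", "Sprite", "Dr. Pepper",
    "Diet Coke", "Dr. Pepper", "Pepsi",
    "Coke Classic", "Pepsi", "Sprite",
    "Coke Classic", "Diet Coke",
]

# Counter tallies how many times each item type appears in the data.
frequency = Counter(purchases)

# Print the frequency distribution in alphabetical order of soft drink.
for drink in sorted(frequency):
    print(f"{drink:<14}{frequency[drink]:>3}")
```

The tallies should agree with the frequency table: the five counts add up to the 50 purchases in the sample.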
-
Summarizing Categorical Data
What can we say by looking at this data? Who is the market leader?
Coke Classic is the market leader
Pepsi is second
Diet Coke is third
Soft Drink      Frequency
Coke Classic    19
Diet Coke       8
Dr. Pepper      5
Pepsi           13
Sprite          5
-
Summarizing Categorical Data
The summary provides more insight than the raw data!
-
Summarizing Categorical Data
Relative frequency of a class equals the fraction or
proportion of items belonging to a class.
For a data set with n observations:
Relative frequency of a class = Frequency of the class / n
-
Summarizing Categorical Data
Soft Drink      Frequency    Relative Frequency    Percent Frequency
Coke Classic    19           0.38                  38
Diet Coke       8            0.16                  16
Dr. Pepper      5            0.10                  10
Pepsi           13           0.26                  26
Sprite          5            0.10                  10
Total           50           1.00                  100
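As a rough sketch, the relative and percent frequencies can be computed from the frequencies in Python. The dictionary `frequency` repeats the counts from the soft drink table; the names `relative` and `percent` are assumptions made for this illustration.

```python
# Frequencies from the soft drink frequency distribution.
frequency = {
    "Coke Classic": 19,
    "Diet Coke": 8,
    "Dr. Pepper": 5,
    "Pepsi": 13,
    "Sprite": 5,
}

# n is the total number of observations.
n = sum(frequency.values())

# Relative frequency of a class = frequency of the class / n.
relative = {drink: count / n for drink, count in frequency.items()}

# Percent frequency is the relative frequency multiplied by 100.
percent = {drink: 100 * value for drink, value in relative.items()}

for drink, count in frequency.items():
    print(f"{drink:<14}{count:>4}{relative[drink]:>8.2f}{percent[drink]:>6.0f}")
```

The relative frequencies always sum to 1 and the percent frequencies to 100, which is a quick check on the arithmetic.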
-
Summarizing Quantitative Data
A frequency distribution is a tabular summary of data showing the number (frequency) of items in each of several non-overlapping classes.
Three steps are necessary to define the classes for a frequency distribution with quantitative data:
Determine the number of non-overlapping classes
Determine the width of each class
Determine the class limits
-
Summarizing Quantitative Data
Too many classes:
May yield a very jagged distribution with gaps from empty classes
Can give a poor indication of how frequency varies across classes
[Figure: histogram with too many classes. Horizontal axis: Temperature (classes of width 4 up to 60, then "More"); vertical axis: Frequency (0 to 4).]
-
Summarizing Quantitative Data
Too few classes:
May compress variation too much and yield a blocky distribution
Can obscure important patterns of variation
[Figure: histogram with too few classes. Horizontal axis: Temperature (classes 0, 30, 60, "More"); vertical axis: Frequency (0 to 12).]
-
Summarizing Quantitative Data
Number of classes:
Number of Data Points    Number of Classes
under 50                 5 - 7
50 - 100                 6 - 10
100 - 250                7 - 12
over 250                 10 - 20
Class widths can typically be reduced as the number of observations increases
Distributions with numerous observations are more likely to be smooth and have gaps filled since data are plentiful
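The guideline can be read as a simple lookup. The sketch below is only an illustration: the function name `suggested_class_range` is hypothetical, and because the table's ranges share their endpoints (100 appears in two rows), the code assumes that a boundary value belongs to the lower row.

```python
def suggested_class_range(n):
    """Return (fewest, most) suggested classes for n data points.

    Assumption: boundary values (100 and 250) are placed in the lower row
    of the guideline table, since the table itself does not say.
    """
    if n < 50:
        return (5, 7)
    if n <= 100:
        return (6, 10)
    if n <= 250:
        return (7, 12)
    return (10, 20)


for n in (30, 75, 200, 1000):
    low, high = suggested_class_range(n)
    print(f"{n} data points: {low} to {high} classes")
```

These ranges are rules of thumb, so the final choice within a range remains a judgment call.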
-
Summarizing Quantitative Data
Width of classes:
If possible, use the same width for each class.
Range of data = Largest data point - Smallest data point
Approximate class width = Range / (Number of classes)
Generally round to a convenient number
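A minimal sketch of the width calculation follows. The function name `approximate_class_width` and the temperature values are hypothetical, chosen only to illustrate the formula.

```python
def approximate_class_width(data, num_classes):
    """Range of the data divided by the number of classes."""
    data_range = max(data) - min(data)
    return data_range / num_classes


# Hypothetical temperature readings, for illustration only.
temperatures = [12, 18, 25, 31, 36, 44, 47, 58]

# Range = 58 - 12 = 46, so with 5 classes the approximate width is 46 / 5.
width = approximate_class_width(temperatures, 5)
print(f"Approximate class width: {width:.2f}")
```

Here the computed width is 9.2, and a convenient rounded width would be 10. The rounding step is left to judgment rather than coded, since "convenient" depends on the data.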
-
Summarizing Quantitative Data
Class Limits:
Class limits must be chosen so that each data item belongs to one and only one class.
Lower class limit is the smallest possible data value assigned to the class.
Upper class limit is the largest possible data value assigned to the class.
Class midpoint: The value halfway between the lower and upper class limits.
-
Summarizing Quantitative Data
Important Considerations for Selecting Classes:
Must be mutually exclusive
Must be all-inclusive
Categories (classes) should be of equal width
Avoid empty categories
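The first two considerations can be checked mechanically. The sketch below is an illustration under stated assumptions: classes are represented as (lower limit, upper limit) pairs with both limits included, and the function names and sample values are hypothetical.

```python
def classes_containing(value, classes):
    """Indices of every class whose limits contain the value."""
    return [
        i for i, (lower, upper) in enumerate(classes)
        if lower <= value <= upper
    ]


def each_value_in_exactly_one_class(data, classes):
    """True when the classes are mutually exclusive and all-inclusive for the data."""
    return all(len(classes_containing(v, classes)) == 1 for v in data)


good_classes = [(0, 9), (10, 19), (20, 29)]
overlapping_classes = [(0, 10), (10, 20), (20, 30)]  # 10 and 20 fall in two classes

data = [3, 10, 27]  # hypothetical values

print(each_value_in_exactly_one_class(data, good_classes))         # mutually exclusive
print(each_value_in_exactly_one_class(data, overlapping_classes))  # 10 is counted twice
print(each_value_in_exactly_one_class([35], good_classes))         # 35 fits no class
```

Only the first check passes: the second fails mutual exclusivity and the third fails all-inclusiveness.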
-
Example: Credit Card Balances
(See Class 02ExampleCredit Card Balances.xls)
Minimum data value: $99.00
Maximum data value: $1493.00
Range: $1493.00 - $99.00 = $1394.00
Approximate class size:
Approx. class size = Range / (# of classes) = 1394 / 9 = 154.89
For convenience and better representation, we will pick 200.00
-
Example: Credit Card Balances
Determining the class limits:
Class Lower Limit    Class Upper Limit
0                    199.99
200                  399.99
400                  599.99
600                  799.99
800                  999.99
1000                 1199.99
1200                 1399.99
1400                 1599.99
The 9th class ($1600 to $1799.99) is omitted because no data fall in it (the maximum data value is $1493). So we will use 8 classes in total.
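The class limits and the tallying step can be sketched in Python. This is an illustration, not the spreadsheet method used in class: the actual balances are in the course workbook and are not reproduced here, so the list `balances` is hypothetical apart from the minimum ($99) and maximum ($1493) quoted in the example. All names are assumptions made for this sketch.

```python
CLASS_WIDTH = 200
NUM_CLASSES = 8

# Lower limits 0, 200, ..., 1400; each upper limit is one cent below the next lower limit.
classes = [
    (k * CLASS_WIDTH, round((k + 1) * CLASS_WIDTH - 0.01, 2))
    for k in range(NUM_CLASSES)
]


def class_index(balance):
    """Index of the class containing the balance (0 for 0-199.99, 1 for 200-399.99, ...)."""
    return int(balance // CLASS_WIDTH)


# Hypothetical balances; only 99.00 and 1493.00 come from the example.
balances = [99.00, 250.75, 399.99, 400.00, 812.40, 1493.00]

# Tally how many balances fall in each class.
frequency = [0] * NUM_CLASSES
for balance in balances:
    frequency[class_index(balance)] += 1

for (lower, upper), count in zip(classes, frequency):
    print(f"{lower:>5} - {upper:>8.2f}  {count}")
```

Using floor division by the class width places each balance in one and only one class, so 399.99 lands in the second class and 400.00 in the third.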
-
Example: Credit Card Balances
(See Class 02ExampleCredit Card Balances.xls)
[Screenshot: Excel worksheet; callout: "Excel Array Formula!"]
-
Example: Credit Card Balances
(See Class 02ExampleCredit Card Balances.xls)
Using PhStat2
-
Summarizing Quantitative Data
Histogram: A histogram is constructed by placing the variable of interest on the horizontal axis and the frequency, relative frequency, or percent frequency on the vertical axis.
Rectangles are drawn with bases determined by the class limits on the horizontal axis and heights corresponding to the frequency, relative frequency, or percent frequency.
-
Summarizing Quantitative Data
Histogram:
Adjacent rectangles of a histogram touch one another. (Unlike a bar graph, there is no separation between the rectangles.)
Histograms provide information about the shape or
form of a distribution.
-
Summarizing Quantitative Data
Histogram:
[Figure: "Histogram". Horizontal axis: Upper Class Limit (199.99, 399.99, 599.99, 799.99, 999.99, 1199.99, 1399.99, 1599.99); left vertical axis: Frequency (0 to 90); right vertical axis: Cumulative % (0.00% to 120.00%). Series: Frequency, Cumulative %.]
-
Summarizing Quantitative Data
Cumulative Distributions (Ogives): show the number of data items with values less than or equal to the upper class limit of each class.
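A cumulative distribution is a running total of the class frequencies, which can be sketched as follows. The upper class limits are those of the credit card example, but the class frequencies are hypothetical values chosen for illustration, and the variable names are assumptions.

```python
from itertools import accumulate

# Upper class limits from the credit card example.
upper_limits = [199.99, 399.99, 599.99, 799.99, 999.99, 1199.99, 1399.99, 1599.99]

# Hypothetical class frequencies (not the actual credit card data).
frequency = [4, 9, 12, 10, 7, 5, 2, 1]

# Running total: number of items less than or equal to each upper class limit.
cumulative = list(accumulate(frequency))

# Cumulative percent relative to the total number of items.
total = cumulative[-1]
cumulative_percent = [100 * c / total for c in cumulative]

for limit, count, pct in zip(upper_limits, cumulative, cumulative_percent):
    print(f"<= {limit:>8.2f}  {count:>3}  {pct:>6.1f}%")
```

The last cumulative frequency equals the total number of items, so the last cumulative percent is always 100.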