02_FrequencyHistogram

  • 8/14/2019 02_FrequencyHistogram

    STAT 201: Introduction to Business Statistics

    Class 2: Describing Data by Graphs, Charts, Tables

    Takeaways from Class 1

    Statistics is the study of the collection, organization, analysis, interpretation, and presentation of data. It:

    gives managers a better understanding of the business environment

    enables them to make more informed and better decisions

    Descriptive Statistics: Summaries of data, which may be graphical, tabular, or numerical

    Inferential Statistics: Procedures that help draw conclusions about a set of data from a subset of that data

    Takeaways from Class 1

    Population: the set of all items or individuals of interest

    A parameter is a summary measure computed to describe a characteristic of the population

    Sample: a subset of the population

    A statistic is a summary measure computed to describe a characteristic of the sample drawn from the population

    Takeaways from Class 1

    Types of Data:

    Qualitative (Categorical): Data grouped by specific categories (e.g., eye color, marital status), generally measured on one of two scales:

    Nominal Scale: only labels

    Ordinal Scale: can be ordered

    Quantitative (Numerical): Data grouped by numerical values (e.g., number of children, weight), generally measured on one of two scales:

    Interval Scale: meaningful intervals in addition to ordinal scale

    Ratio Scale: meaningful ratios in addition to interval scale (there is a true zero value)

    Time Series Data: Collected over several time periods

    Today's Focus

    Introducing tabular and graphical methods commonly used to summarize both categorical and quantitative data.

    Tabular and graphical summaries of data are found in:

    Annual reports

    Newspaper articles

    Research studies

    Sources: The Economist, RealClearPolitics.com

    Summarizing Categorical Data

    Frequency Distribution:

    A frequency distribution is a tabular summary of data showing the number (frequency) of items in each of several non-overlapping classes.

    Example: Soft Drink Purchases*

    Data from a sample of 50 soft drink purchases

    Source: Modern Business Statistics by Anderson, Sweeney, Williams

    Coke Classic    Sprite          Pepsi
    Diet Coke       Coke Classic    Coke Classic
    Pepsi           Diet Coke       Coke Classic
    Diet Coke       Coke Classic    Coke Classic
    Coke Classic    Diet Coke       Pepsi
    Coke Classic    Coke Classic    Dr. Pepper
    Dr. Pepper      Sprite          Coke Classic
    Diet Coke       Pepsi           Diet Coke
    Pepsi           Coke Classic    Pepsi
    Pepsi           Coke Classic    Pepsi
    Coke Classic    Coke Classic    Pepsi
    Dr. Pepper      Pepsi           Pepsi
    Sprite          Coke Classic    Coke Classic
    Coke Classic    Sprite          Dr. Pepper
    Diet Coke       Dr. Pepper      Pepsi
    Coke Classic    Pepsi           Sprite
    Coke Classic    Diet Coke

    Summarizing Categorical Data

    To develop a frequency distribution, count the number of times each item type appears in the data.

    Soft Drink      Frequency
    Coke Classic    19
    Diet Coke        8
    Dr. Pepper       5
    Pepsi           13
    Sprite           5
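
    The counting step can be sketched in Python. This is an illustrative sketch rather than part of the original slides: it assumes the standard-library collections.Counter, and the names codes, names, purchases, and frequency are chosen here for illustration. The short codes abbreviate the 50 purchases listed in the sample.

```python
from collections import Counter

# Abbreviations used only in this sketch to keep the 50 purchases compact.
names = {"CC": "Coke Classic", "DC": "Diet Coke", "DP": "Dr. Pepper",
         "P": "Pepsi", "S": "Sprite"}

# The 50 purchases from the sample, three per group in the order listed.
codes = """
CC S P    DC CC CC   P DC CC    DC CC CC   CC DC P
CC CC DP  DP S CC    DC P DC    P CC P     P CC P
CC CC P   DP P P     S CC CC    CC S DP    DC DP P
CC P S    CC DC
""".split()

purchases = [names[code] for code in codes]

# Counter tallies how many times each item type appears in the data.
frequency = Counter(purchases)

for drink, count in sorted(frequency.items()):
    print(f"{drink:<14}{count:>3}")
```

    Run as written, this prints one row per soft drink with its count, which can be compared against the frequency column of the table.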

    Summarizing Categorical Data

    What can we say by looking at this data? Who is the market leader?

    Coke Classic is the market leader in this sample

    Pepsi is second

    Diet Coke is third

    Summarizing Categorical Data

    The summary provides more insight than the raw data!

    Summarizing Categorical Data

    Relative frequency of a class equals the fraction or

    proportion of items belonging to a class.

    For a data set with n observations:

    Relative frequency of a class = Frequency of class / n

    Soft Drink      Frequency    Relative Frequency    Percent Frequency
    Coke Classic    19           0.38                  38
    Diet Coke        8           0.16                  16
    Dr. Pepper       5           0.10                  10
    Pepsi           13           0.26                  26
    Sprite           5           0.10                  10
    Total           50           1.00                  100
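
    The relative and percent frequency columns follow directly from the counts. The sketch below is a minimal illustration, assuming the frequencies from the soft drink table; the names frequency, relative, and percent are chosen here and are not from the slides. Percent frequency is taken to be the relative frequency multiplied by 100.

```python
# Frequencies from the soft drink table.
frequency = {"Coke Classic": 19, "Diet Coke": 8, "Dr. Pepper": 5,
             "Pepsi": 13, "Sprite": 5}

n = sum(frequency.values())  # total number of observations

# Relative frequency of a class = frequency of the class / n
relative = {drink: count / n for drink, count in frequency.items()}

# Percent frequency = relative frequency * 100
percent = {drink: 100 * share for drink, share in relative.items()}

for drink in frequency:
    print(f"{drink:<14}{frequency[drink]:>4}{relative[drink]:>8.2f}{percent[drink]:>6.0f}")
```

    The relative frequencies always sum to 1 and the percent frequencies to 100, which is a quick check that no observation was missed or double counted.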

    Summarizing Quantitative Data

    A frequency distribution is a tabular summary of data showing the number (frequency) of items in each of several non-overlapping classes.

    Three steps are necessary to define the classes for a frequency distribution with quantitative data:

    Determine the number of non-overlapping classes

    Determine the width of each class

    Determine the class limits

    Summarizing Quantitative Data

    Too many classes:

    May yield a very jagged distribution with gaps from empty classes

    Can give a poor indication of how frequency varies across classes

    [Figure: histogram of Frequency (vertical axis) against Temperature (horizontal axis), drawn with too many classes]

    Summarizing Quantitative Data

    Too few classes:

    May compress variation too much and yield a blocky distribution

    Can obscure important patterns of variation

    [Figure: histogram of Frequency (vertical axis) against Temperature (horizontal axis), drawn with too few classes (0, 30, 60, More)]
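
    The effect of the number of classes can be checked by counting the same data under different class widths. The sketch below is only an illustration: the temperature readings are invented for this example (the data behind the slides' charts are not given), and class_frequencies is a hypothetical helper, not a function from the course materials.

```python
def class_frequencies(data, start, width, count):
    """Count the values falling in `count` equal-width classes.

    Class k covers the interval from start + k * width (inclusive)
    up to start + (k + 1) * width (exclusive).
    """
    freqs = [0] * count
    for x in data:
        index = int((x - start) // width)
        if 0 <= index < count:
            freqs[index] += 1
    return freqs

# Hypothetical temperature readings, invented for this illustration.
temps = [12, 15, 21, 22, 25, 27, 28, 31, 33, 34, 35, 38, 41, 42, 47, 55]

many = class_frequencies(temps, start=0, width=2, count=30)       # too many classes
few = class_frequencies(temps, start=0, width=30, count=2)        # too few classes
moderate = class_frequencies(temps, start=10, width=10, count=5)  # a middle ground

print("empty classes with width 2:", many.count(0))
print("frequencies with width 30:", few)
print("frequencies with width 10:", moderate)
```

    With these made-up readings, half of the 30 narrow classes are empty, the 2 wide classes hide almost all of the variation, and 5 classes of width 10 show a rise and fall in frequency.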

    Summarizing Quantitative Data

    Number of classes:

    Number of Data Points    Number of Classes
    under 50                 5 - 7
    50 - 100                 6 - 10
    100 - 250                7 - 12
    over 250                 10 - 20

    Class widths can typically be reduced as the number of observations increases

    Distributions with numerous observations are more likely to be smooth and have gaps filled since data are plentiful
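
    The guideline table can be written as a small lookup. This is a sketch under stated assumptions: suggested_classes is a hypothetical helper, and because the table's rows share the boundary values 100 and 250, the code assigns exactly 100 to the "50 - 100" row and exactly 250 to the "100 - 250" row. The slides do not say how such ties should be resolved.

```python
def suggested_classes(n):
    """Return the (low, high) guideline range for the number of classes."""
    # Assumption: a boundary size belongs to the row whose range it closes,
    # so 100 counts as "50 - 100" and 250 counts as "100 - 250".
    if n < 50:
        return (5, 7)
    if n <= 100:
        return (6, 10)
    if n <= 250:
        return (7, 12)
    return (10, 20)

for size in (30, 50, 180, 1000):
    low, high = suggested_classes(size)
    print(f"{size} data points: {low} to {high} classes")
```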

    Summarizing Quantitative Data

    Width of classes:

    If possible, use the same width for each class.

    Range of data = Largest data point - Smallest data point

    Approximate class width = Range / (Number of classes)

    Generally round to a convenient number

    Summarizing Quantitative Data

    Class Limits:

    Class limits must be chosen so that each data item belongs to one and only one class.

    Lower class limit is the smallest possible data value assigned to the class.

    Upper class limit is the largest possible data value assigned to the class.

    Class midpoint: The value halfway between the lower and upper class limits.

    Summarizing Quantitative Data

    Important Considerations for Selecting Classes:

    Must be mutually exclusive

    Must be all-inclusive

    Categories (classes) should be of equal width

    Avoid empty categories

    Example: Credit Card Balances

    (See Class 02ExampleCredit Card Balances.xls)

    Minimum data value: $99.00

    Maximum data value: $1493.00

    Range: $1493.00 - $99.00 = $1394.00

    Approximate class size (width):

    Approx. class size = Range / (# of classes) = 1394 / 9 = 154.89

    For convenience and better representation, we will pick $200.00
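
    The arithmetic above can be reproduced in a few lines. This is a minimal sketch using only the minimum, maximum, and number of classes stated on the slide; the variable names are chosen here for illustration. The final width of 200 is a judgment call made on the slide, not something the code derives.

```python
minimum = 99.00          # smallest balance in the data
maximum = 1493.00        # largest balance in the data
number_of_classes = 9    # number of classes used on the slide

data_range = maximum - minimum
approx_width = data_range / number_of_classes

print(f"Range: {data_range:.2f}")
print(f"Approximate class width: {approx_width:.2f}")

# The slide rounds the approximate width up to a convenient number.
chosen_width = 200.00
```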

    Example: Credit Card Balances

    Determining the class limits:

    Class Lower Limit    Class Upper Limit
    0                    199.99
    200                  399.99
    400                  599.99
    600                  799.99
    800                  999.99
    1000                 1199.99
    1200                 1399.99
    1400                 1599.99

    The 9th class ($1600 to $1799.99) is omitted because no data falls in it (the maximum data value is $1493). So we will use 8 classes in total.
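
    The class limits can also be generated from the chosen width and checked against the extreme data values. The sketch below is an illustration only: class_index is a hypothetical helper, and it assumes balances are recorded to the cent, so no value can fall between an upper limit such as 199.99 and the next lower limit of 200.

```python
width = 200              # chosen class width in dollars
number_of_classes = 8    # the 9th class is dropped because it would be empty

# Each class starts at a multiple of the width and ends one cent
# below the lower limit of the next class.
classes = [(k * width, (k + 1) * width - 0.01) for k in range(number_of_classes)]

def class_index(balance):
    """Return the position of the class containing `balance`, or None."""
    for i, (lower, upper) in enumerate(classes):
        if lower <= balance <= upper:
            return i
    return None

for lower, upper in classes:
    print(f"{lower:>5} {upper:>8.2f}")

# The smallest and largest balances must each land in exactly one class.
print(class_index(99.00))
print(class_index(1493.00))
```

    Because the minimum ($99) falls in the first class and the maximum ($1493) in the last, the 8 classes cover every data value.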

    Example: Credit Card Balances

    (See Class 02ExampleCredit Card Balances.xls)

    Excel array formula!

    Example: Credit Card Balances

    (See Class 02ExampleCredit Card Balances.xls)

    Using PhStat2

    Summarizing Quantitative Data

    Histogram: A histogram is constructed by placing the variable of interest on the horizontal axis and the frequency, relative frequency, or percent frequency on the vertical axis.

    Each class is drawn as a rectangle whose base is determined by the class limits on the horizontal axis and whose height corresponds to the frequency, relative frequency, or percent frequency.

    Summarizing Quantitative Data

    Histogram:

    Adjacent rectangles of a histogram touch one another. (Unlike a bar graph, there is no separation between the rectangles.)

    Histograms provide information about the shape or form of a distribution.
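
    A rough text version of a histogram can make the idea concrete. The sketch below is an illustration under stated assumptions: the classes and frequencies are invented (they are not the credit card data), text_histogram is a hypothetical helper, and the bars are drawn sideways, so each class occupies one row and the bar length plays the role of the rectangle's height.

```python
# Hypothetical equal-width classes and frequencies, invented for illustration.
classes = [(0, 10), (10, 20), (20, 30), (30, 40)]
frequencies = [2, 5, 4, 1]

def text_histogram(classes, frequencies):
    """Return one text row per class, with a bar as long as its frequency."""
    rows = []
    for (lower, upper), freq in zip(classes, frequencies):
        label = f"{lower:>3} to under {upper:<3}"
        rows.append(f"{label} | {'#' * freq} {freq}")
    return rows

for row in text_histogram(classes, frequencies):
    print(row)
```

    The rows are adjacent with no gaps between classes, mirroring the way the rectangles of a histogram touch one another.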

    Summarizing Quantitative Data

    Histogram:

    [Figure: Excel histogram of the credit card balances, showing Frequency and Cumulative % by Upper Class Limit (199.99 to 1599.99)]

    Summarizing Quantitative Data

    Cumulative Distributions (Ogives): show the number of data items with values less than or equal to the upper class limit of each class.
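
    The cumulative counts behind an ogive are running totals of the class frequencies. The sketch below is a minimal illustration using the standard-library itertools.accumulate; the upper limits and frequencies are invented for this example and are not the credit card data.

```python
from itertools import accumulate

# Hypothetical class upper limits and frequencies, invented for illustration.
upper_limits = [199.99, 399.99, 599.99, 799.99]
frequencies = [4, 9, 5, 2]

# Cumulative frequency: number of items at or below each upper class limit.
cumulative = list(accumulate(frequencies))

n = cumulative[-1]  # the last running total equals the number of observations
cumulative_percent = [100 * c / n for c in cumulative]

for limit, count, pct in zip(upper_limits, cumulative, cumulative_percent):
    print(f"<= {limit:>7.2f}: {count:>3} items ({pct:.0f}%)")
```

    Plotting the cumulative values against the upper class limits and joining the points gives the ogive; the last point always reaches the total count, or 100%.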
