1 Chapter 3 Looking at Data: Distributions Introduction 3.1 Displaying Distributions with Graphs...

20
1 Chapter 3 Looking at Data: Distributions Introduction 3.1 Displaying Distributions with Graphs hapter Three ooking At Data: istributions

Transcript of 1 Chapter 3 Looking at Data: Distributions Introduction 3.1 Displaying Distributions with Graphs...

Page 1: 1 Chapter 3 Looking at Data: Distributions Introduction 3.1 Displaying Distributions with Graphs Chapter Three Looking At Data: Distributions.

1

Chapter 3 Looking at Data:Distributions

Introduction

3.1 Displaying Distributions with Graphs

Chapter ThreeLooking At Data:Distributions

Page 2: 1 Chapter 3 Looking at Data: Distributions Introduction 3.1 Displaying Distributions with Graphs Chapter Three Looking At Data: Distributions.

3.1 Displaying Distributions with Graphs

Variables

Examining Distributions of Variables

Graphs for Categorical Variables Bar graphs Pie charts

Graphs for Quantitative Variables Histograms Stemplots Time plots

2

Page 3: 1 Chapter 3 Looking at Data: Distributions Introduction 3.1 Displaying Distributions with Graphs Chapter Three Looking At Data: Distributions.

Statistics

Statistics is the science of learning from data.

The first step in dealing with data is to organize your thinking about the data. An exploratory data analysis is the process of using statistical tools and ideas to examine data in order to describe their main features.

3

Exploring Data

Begin by examining each variable by itself. Then move on to study the relationships among the variables.

Begin with a graph or graphs. Then add numerical summaries of specific aspects of the data.

Exploring Data

Begin by examining each variable by itself. Then move on to study the relationships among the variables.

Begin with a graph or graphs. Then add numerical summaries of specific aspects of the data.

Page 4: 1 Chapter 3 Looking at Data: Distributions Introduction 3.1 Displaying Distributions with Graphs Chapter Three Looking At Data: Distributions.

VariablesWe construct a set of data by first deciding which cases or observations or individuals or units we want to study. For each case, we record information about characteristics that we call variables.

IndividualAn object described by data

IndividualAn object described by data

VariableCharacteristic of

the individual

VariableCharacteristic of

the individual

Categorical variablePlaces individual into one of several groups or categories.

Categorical variablePlaces individual into one of several groups or categories.

Quantitative variableTakes numerical values for which arithmetic operations

make sense.

Quantitative variableTakes numerical values for which arithmetic operations

make sense.

4

Page 5: 1 Chapter 3 Looking at Data: Distributions Introduction 3.1 Displaying Distributions with Graphs Chapter Three Looking At Data: Distributions.

Quantitative Variables

5

Quantitative variables can either be counts or measurements or rates.

SEE EXAMPLE 3.6 ON PAGE 84 (IN THE CHAPTER 3 INTRODUCTION) FOR WHY RATES ARE IMPORTANT…

RATE = # of occurrences of the event per X in the population of all possible occurrences (where X is a large number (10,000 ; 100,000 e.g.)

Murder rate in NH County = (#murders in NHC/#possible murders(i.e., population) * large number (like 100,000)

(in 2012, 9 murders, population estimate=209234; so the murder rate in NH County in 2012 is 9/209234 =0.00004301. Multiply by 100,000 to get the rate per 100,000. 0.00004301 * 100000 = 4.3 murders per 100,000 people (or per capita) – Guilford County: 26/500879 * 100000 = 5.2; etc….

Quantitative variables can either be counts or measurements or rates.

SEE EXAMPLE 3.6 ON PAGE 84 (IN THE CHAPTER 3 INTRODUCTION) FOR WHY RATES ARE IMPORTANT…

RATE = # of occurrences of the event per X in the population of all possible occurrences (where X is a large number (10,000 ; 100,000 e.g.)

Murder rate in NH County = (#murders in NHC/#possible murders(i.e., population) * large number (like 100,000)

(in 2012, 9 murders, population estimate=209234; so the murder rate in NH County in 2012 is 9/209234 =0.00004301. Multiply by 100,000 to get the rate per 100,000. 0.00004301 * 100000 = 4.3 murders per 100,000 people (or per capita) – Guilford County: 26/500879 * 100000 = 5.2; etc….

Page 6: 1 Chapter 3 Looking at Data: Distributions Introduction 3.1 Displaying Distributions with Graphs Chapter Three Looking At Data: Distributions.

Distribution of a Variable

6

To examine a single variable, we graphically display its distribution.

The distribution of a variable tells us what values it takes and with what frequency it takes on these values.

Distributions can be displayed using a variety of graphical tools. The proper choice of graph depends on the kind of the variable and how easy it is to draw them…. JMP makes easy work of graphing!

The distribution of a variable tells us what values it takes and with what frequency it takes on these values.

Distributions can be displayed using a variety of graphical tools. The proper choice of graph depends on the kind of the variable and how easy it is to draw them…. JMP makes easy work of graphing!

Categorical variablePie chart – I don’t recommend

pie charts!Bar graph – these are fine!

Categorical variablePie chart – I don’t recommend

pie charts!Bar graph – these are fine!

Quantitative variableHistogramStemplot

Quantitative variableHistogramStemplot

Page 7: 1 Chapter 3 Looking at Data: Distributions Introduction 3.1 Displaying Distributions with Graphs Chapter Three Looking At Data: Distributions.

Categorical Variables

7

The distribution of a categorical variable lists the categories and gives the count or percent of individuals who fall into that category.

Pie charts show the distribution of a categorical variable as a “pie” whose slices are sized by the counts or percents for the categories – hard to draw and hard to interpret!

Bar graphs represent each category as a bar whose heights show the category counts or percents.

Page 8: 1 Chapter 3 Looking at Data: Distributions Introduction 3.1 Displaying Distributions with Graphs Chapter Three Looking At Data: Distributions.

Pie Charts and Bar Graphs

8

Material Weight

(million tons)

Percent of total

Food scraps 25.9 11.2%

Glass 12.8 5.5%

Metals 18.0 7.8%

Paper, paperboard 86.7 37.4%

Plastics 24.7 10.7%

Rubber, leather, textiles

15.8 6.8%

Wood 12.7 5.5%

Yard trimmings 27.7 11.9%

Other 7.5 3.2%

Total 231.9 100.0%

Page 9: 1 Chapter 3 Looking at Data: Distributions Introduction 3.1 Displaying Distributions with Graphs Chapter Three Looking At Data: Distributions.

Quantitative Variables

9

The distribution of a quantitative variable tells us what values the variable takes on and the frequency with which it takes on those values.

Histograms show the distribution of a quantitative variable by using bars whose height represents the number of individuals who take on a value within a particular class.

Stemplots separate each observation into a stem and a leaf that are then plotted to display the distribution while maintaining the original values of the variable – original values are not hidden as in the histogram.

Time plots plot each observation (on the vertical axis) against the time at which it was measured (on the horizontal axis).

Page 10: 1 Chapter 3 Looking at Data: Distributions Introduction 3.1 Displaying Distributions with Graphs Chapter Three Looking At Data: Distributions.

10

To construct a stemplot:

Separate each observation into a stem (first part of the number) and a leaf (the remaining part of the number).

Write the stems in a vertical column; draw a vertical line to the right of the stems.

Write each leaf in the row to the right of its stem; order leaves if desired.

Stemplots

Page 11: 1 Chapter 3 Looking at Data: Distributions Introduction 3.1 Displaying Distributions with Graphs Chapter Three Looking At Data: Distributions.

Example: Weight data―Introductory Statistics class

Key

20|3 means203 pounds

Stems = 10sLeaves = 1s

10 016611 00912 003457813 0035914 0815 0025716 55517 00025518 00005556719 24520 321 02522 023242526 0

Stems Leaves

2

2

5

Stemplots

11

Page 12: 1 Chapter 3 Looking at Data: Distributions Introduction 3.1 Displaying Distributions with Graphs Chapter Three Looking At Data: Distributions.

12

If there are very few stems (when the data cover only a very small range of values), then we may want to create more stems by splitting the original stems.

Example: If all of the data values were between 150 and 179, then we may choose to use the following stems:

151516161717

Leaves 0–4 would go on each upper stem (first “15”), and leaves 5–9 would go on each lower stem (second “15”).

Stemplots

Page 13: 1 Chapter 3 Looking at Data: Distributions Introduction 3.1 Displaying Distributions with Graphs Chapter Three Looking At Data: Distributions.

13

For quantitative variables that take many values and/or large datasets:

Divide the possible values into classes (equal widths).

Count how many observations fall into each interval (may change to percents).

Draw picture representing the distribution―bar heights are equivalent to the number (percent) of observations in each interval.

JMP does all three of the above with a couple of clicks: Analyze -> Distribution -> choose the variable to be plotted.

Histograms

Page 14: 1 Chapter 3 Looking at Data: Distributions Introduction 3.1 Displaying Distributions with Graphs Chapter Three Looking At Data: Distributions.

14

Example: Weight data―Introductory Statistics class

Histograms

Page 15: 1 Chapter 3 Looking at Data: Distributions Introduction 3.1 Displaying Distributions with Graphs Chapter Three Looking At Data: Distributions.

15

In any graph of data, look for the overall pattern and for striking deviations from that pattern.

You can describe the overall pattern by its shape, center, and spread.

An important kind of deviation is an outlier, an individual that falls outside the overall pattern.

Examining Distributions

Page 16: 1 Chapter 3 Looking at Data: Distributions Introduction 3.1 Displaying Distributions with Graphs Chapter Three Looking At Data: Distributions.

16

A distribution is symmetric if the right and left sides of the graph are approximately mirror images of each other.

A distribution is skewed to the right (right-skewed) if the right side of the graph (containing the half of the observations with larger values) is much longer than the left side.

A distribution is skewed to the left (left-skewed) if the left side of the graph is much longer than the right side.

SymmetricSymmetric Skewed-leftSkewed-left Skewed-rightSkewed-right

Examining Distributions - Shape

Page 17: 1 Chapter 3 Looking at Data: Distributions Introduction 3.1 Displaying Distributions with Graphs Chapter Three Looking At Data: Distributions.

Alaska Florida

An important kind of deviation is an outlier. Outliers are observations that lie

outside the overall pattern of a distribution. Always look for outliers and try to explain

them.

This overall pattern is fairly

symmetrical, except for two

states that clearly do not

belong to the main trend.

Alaska and Florida have

unusual representation of

the elderly in their

populations.

Large gaps in the

distribution are places to

look for outliers….

Outliers

Page 18: 1 Chapter 3 Looking at Data: Distributions Introduction 3.1 Displaying Distributions with Graphs Chapter Three Looking At Data: Distributions.

18

A time plot shows behavior of a quantitative variable over time.

Time is always on the horizontal axis, and the variable being plotted is on the vertical axis.

Look for an overall pattern (trend), and deviations from this trend. Connecting the data points by lines may emphasize this trend.

Look for patterns that repeat at known regular intervals (seasonal variations)

Go over the US Regular Retail Gas Prices in JMP

Time Plots

Page 19: 1 Chapter 3 Looking at Data: Distributions Introduction 3.1 Displaying Distributions with Graphs Chapter Three Looking At Data: Distributions.

19

Time Plots

Look at the gas price data…

Page 20: 1 Chapter 3 Looking at Data: Distributions Introduction 3.1 Displaying Distributions with Graphs Chapter Three Looking At Data: Distributions.

20

HW – use JMP whenever possible to draw the graphs…

HW: Begin reading Intro. to Ch.3 and section 3.1;

work through the Examples in 3.1;

do the Exercises #3.7, 3.10-3.14, 3.21, 3.24, 3.25, 3.27, 3.32, 3.33-3.36, 3.38 (JMP), 3.39