Analyzing Categorical Data & Displaying Quantitative Data Section 1.1 & 1.2 Reference Text: The Practice of Statistics , Fourth Edition. Starnes, Yates, Moore


Page 1:

Analyzing Categorical Data & Displaying Quantitative Data

Section 1.1 & 1.2

Reference Text: The Practice of Statistics, Fourth Edition. Starnes, Yates, Moore

Page 2:

Starter Problem

• Antoinette plays a lot of golf. This summer she got a new driver and kept track of how far she hit her tee shots in several rounds. Look at these data (drive lengths in yards) and then write a few sentences that describe the lengths of her drives:

246 260 230 233 254 203 223 193 238 220 210 237

270 240 192 204 250 274 220 240 235 250 222

230 225 241 225 230 250 200 250 226 240

Page 3:

Today’s Objectives

• Analyze pie charts and bar graphs

• Two-way tables:
– Marginal Distribution
– Conditional Distribution

• A Titanic Disaster

• Analyze Dot Plots

• Describe with CUSS, your new best friend

• Stem and Leaf Plots: single, and back to back

• Histograms

Page 4:

Types of Variables

• Categorical variables record which group or category an individual belongs to.
– What color is your hair?
– What year are you in school?
– What city do you live in?
– Did the tee shot land in the fairway?
– It does NOT make sense to average the results.

• Quantitative variables take on numeric values.
– How tall is a person?
– What score did a person get on the SAT?
– How many desks are in a room?
– How long was the tee shot?
– It DOES make sense to average the results.

Page 5:

Visual Representation of Categorical Variables

• Categorical variables are typically represented by pie charts (for percents) or bar charts (percents or counts).

[Figure: bar chart of count in millions by marital status (S, M, W, D); legend: Single, Married, Widowed, Divorced]

Marital status    Count (millions)    Percent
Single            41.8                22.6
Married           113.3               61.1
Widowed           13.9                7.5
Divorced          16.3                8.8
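
As a quick check on the table above, the percents can be recomputed from the counts. This is only a minimal sketch; the variable names are illustrative and not part of the slides.

```python
# Counts (in millions) from the marital-status table on this slide.
counts = {"Single": 41.8, "Married": 113.3, "Widowed": 13.9, "Divorced": 16.3}

total = sum(counts.values())

# Percent for each category = category count / total count * 100.
percents = {status: round(100 * n / total, 1) for status, n in counts.items()}

for status, pct in percents.items():
    print(f"{status:<9}{counts[status]:>7.1f}{pct:>7.1f}%")
```

The same percents feed a pie chart, while either the counts or the percents can be used as bar heights.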

Page 6:

Dilbert comics

Use a pie chart only when you want to emphasize each category’s relation to the whole. Pie charts are awkward to make by hand, but technology will do the job for you.

Page 7:

What Makes a Good Bar Graph?

• Good
– All bars have the same width
– X and Y axes labeled
– Units
– Title of graph

• Bad
– Bars have different widths
– Pictures replacing the bars (see example next slide)
– No labels

Page 8:

Why Is This A Bad Bar Graph?

This ad for DIRECTV has multiple problems. How many can you point out?

Page 9:

Two Way Tables

Two-way tables are a visual representation of the possible relationships between two sets of categorical data. The categories are labeled at the top and the left side of the table, with the frequency information appearing in the interior cells of the table. The “totals” of each row appear at the right, and the “totals” of each column appear at the bottom.

Page 10:

“If you could have a new vehicle, would you want a sport utility vehicle or a sports car?”

Entries in the body of the table are called joint frequencies. The cells that contain the sums (the row and column totals) are called marginal frequencies.

Page 11:

Page 12:

Probability

• When looking at a relative frequency table, the percent or ratio is also the probability of that event happening out of the ENTIRE TOTAL.

If a random selection was made, what's the probability a male selects an SUV?

21/240

If a random selection was made, what's the probability a female selects an SUV?

135/240

If a random selection was made, what's the probability that an SUV is selected?

156/240
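
The three answers above are related: the two joint probabilities add up to the marginal probability for SUVs. The sketch below uses only the counts quoted on this slide. The survey table itself is not reproduced in this transcript, so the sports car counts are deliberately left out, and the variable names are illustrative.

```python
from fractions import Fraction

TABLE_TOTAL = 240   # everyone surveyed (the denominator on this slide)
male_suv = 21       # males who chose an SUV
female_suv = 135    # females who chose an SUV

# Joint probabilities: one interior cell divided by the table total.
p_male_and_suv = Fraction(male_suv, TABLE_TOTAL)
p_female_and_suv = Fraction(female_suv, TABLE_TOTAL)

# Marginal probability: the SUV total divided by the table total.
p_suv = Fraction(male_suv + female_suv, TABLE_TOTAL)

print(p_male_and_suv, p_female_and_suv, p_suv)
```

Using `Fraction` keeps the arithmetic exact, so the sum of the two joint probabilities can be compared with the marginal probability directly.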

Page 13:

Probability

• Notice how all the probabilities have a denominator of 240! It's out of the entire table total!

• Moral of the story… When asked for a probability that does not have a preexisting condition… take the count for the specific characteristics desired in the table and divide it by the table total.

Or you can look at it this way…

Page 14:

Conditional probability

• When we are calculating the probability of an event occurring given that another event has occurred, we are describing conditional probability.

• Certain conditions have been preselected, and now we must calculate the probability based on that condition already happening.

• When we have conditional probability, our denominator becomes the column total or the row total, depending on which condition is given.

• Example:

What is the probability of selecting a sports car given a male?

vs.

What's the probability a male selects an SUV?

Page 15:

Conditional Probability

• What if we knew one of the variables already? What is the probability that it’s a sports car GIVEN that it’s a male? Then our probability changes!!

P(sports car | male) = Probability(sports car given that it’s a male) = (males who chose a sports car) / (total males)

Page 16:

Comparing Two Different Questions

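What's the probability a male selects a sports car?

P(male selecting a sports car) = (males who chose a sports car) / (table total of 240)

What is the probability of selecting a sports car given a male?

P(sports car | male) = (males who chose a sports car) / (total males)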

Page 17:

Flashback! Titanic Disaster

• On April 15, 1912, the Titanic struck an iceberg and rapidly sank with only 710 of her 2,204 passengers and crew surviving. Data on survival of passengers are summarized in the table below.

                   Survival Status
Class of Travel    Survived    Died    Total
First Class        201         123
Second Class       118         166
Third Class        181         528
Total

Page 18:

Conditional Probability

201/324     528/709     201/500

• P(survived) = 500/1317

1) What is the percent of people who survived?
» Is this marginal or conditional?

2) Given the passenger survived, what are the percentages for each class?
» Is this marginal or conditional?

3) Of those who died in first class, what percent of them were males? Females?

                   Survival Status
Class of Travel    Survived    Died    Total
First Class        201         123     324
Second Class       118         166     284
Third Class        181         528     709
Total              500         817     1317
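
The marginal and conditional calculations on this slide can be checked against the completed table. The sketch below is one way to do that, assuming the counts are stored in a plain dictionary; the names are illustrative. The only thing that changes between the calculations is the denominator: the table total, the "Survived" column total, or a class's row total.

```python
from fractions import Fraction

# Counts from the completed Titanic table: (survived, died) per class.
table = {
    "First Class": (201, 123),
    "Second Class": (118, 166),
    "Third Class": (181, 528),
}

total_survived = sum(s for s, _ in table.values())
total_died = sum(d for _, d in table.values())
grand_total = total_survived + total_died

# Marginal: no condition given, so divide by the table total.
p_survived = Fraction(total_survived, grand_total)

# Conditional on survival: the denominator is the "Survived" column total.
class_given_survived = {
    cls: Fraction(s, total_survived) for cls, (s, _) in table.items()
}

# Conditional on class: the denominator is that class's row total.
survived_given_class = {
    cls: Fraction(s, s + d) for cls, (s, d) in table.items()
}

print(f"P(survived) = {float(p_survived):.1%}")
for cls in table:
    print(f"P({cls} | survived) = {float(class_given_survived[cls]):.1%}, "
          f"P(survived | {cls}) = {float(survived_given_class[cls]):.1%}")
```

Question 3 on the slide cannot be answered this way, because the table does not break the counts down by sex.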

Page 19:

Break!

- 5 Minutes

Page 20:

Section 1.2: Quantitative Data w/ Graphs

• Dotplots

• CUSS

• Histograms

• Stemplots

Page 21:

Types of Variables

• Categorical variables record which group or category an individual belongs to.
– What color is your hair?
– What year are you in school?
– What city do you live in?
– Did the tee shot land in the fairway?
– It does NOT make sense to average the results.

• Quantitative variables take on numeric values.
– How tall is a person?
– What score did a person get on the SAT?
– How many desks are in a room?
– How long was the tee shot?
– It DOES make sense to average the results.

Page 22:

Visual representation of Quantitative Variables: Dotplots

• The most basic method is a dotplot.
– Every data point can be seen on the plot.

• Construction method:
– Draw a horizontal axis with a scale that covers the full range of values for the variable.
– Put a dot on (or above) the axis for each data point.
– If data duplicate, stack them vertically.

• Construct a dotplot now of Antoinette’s drives:

246 260 230 233 254 203 223 193 238 220 210 237

270 240 192 204 250 274 220 240 235 250 222

230 225 241 225 230 250 200 250 226 240
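
For checking a hand-drawn dotplot, the stacking step can be sketched in a few lines. This is a rough text version, not the plot from the slides: it is rotated (one row per value instead of one column), and it lists only the values that occur, so the gaps between rows are not to scale.

```python
from collections import Counter

# Antoinette's drive lengths in yards, as listed on this slide.
drives = [246, 260, 230, 233, 254, 203, 223, 193, 238, 220, 210, 237,
          270, 240, 192, 204, 250, 274, 220, 240, 235, 250, 222,
          230, 225, 241, 225, 230, 250, 200, 250, 226, 240]

# Count how many times each distance occurs.
counts = Counter(drives)

# Rotated dotplot: one row per distinct value, one dot per data point,
# so repeated values stack up along the row.
for value in sorted(counts):
    print(f"{value:>3} | " + "•" * counts[value])
```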

Page 23:

Dotplot of Drive Data

• Based on the dotplot, estimate the center.
– We see it around 230 or 240 yards.

• Estimate the spread.
– Roughly from 190 to almost 280, so spread is about 90 yards.

• Describe the shape.
– It appears “mound-shaped” with most of the data clustered at the center and with tails at each end.

[Figure: Collection 1 dot plot of CalDrives, axis from 200 to 280]

Page 24:

C.U.S.S

• C: Center
– Median, where is it?
– Mean can also describe the center, but is not resistant…

• U: Unusual data points
– Outliers! Are there any? We can calculate them… later in 1.3

• S: Spread
– Describe the variability of the graph (largest value – smallest value)

• S: Shape
– How many peaks? Is the data clumped in a general location? Is the data stretching to the right (skewed right)? Is the data stretching to the left (skewed left)?

• LASTLY… Always, ALWAYS C.U.S.S it out when describing graphs of data
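
The "C" and first "S" of CUSS can be computed exactly for Antoinette's drive data from the starter problem. The sketch below uses Python's standard `statistics` module; the variable names are illustrative. Unusual points and shape are left to the graph, since the outlier calculation is deferred to Section 1.3.

```python
from statistics import mean, median

# Antoinette's drive lengths in yards (starter problem data).
drives = [246, 260, 230, 233, 254, 203, 223, 193, 238, 220, 210, 237,
          270, 240, 192, 204, 250, 274, 220, 240, 235, 250, 222,
          230, 225, 241, 225, 230, 250, 200, 250, 226, 240]

# C: Center. The median is resistant to extreme values; the mean is not.
center_median = median(drives)
center_mean = mean(drives)

# S: Spread, measured as largest value minus smallest value.
spread = max(drives) - min(drives)

print(f"median = {center_median}, mean = {center_mean:.1f}, range = {spread}")
```

The exact range (82 yards) is a little smaller than an estimate of about 90 read off a graph, because the graph's axis runs slightly past the smallest and largest drives.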

Page 25:

Histograms

• Another important method is a histogram.
– Individual data points cannot be seen on the plot.
– Many data points are grouped together in vertical bars.

• Construction method:
– Draw a horizontal axis with a scale that covers the full range of values for the variable.
– Decide bar width (also called class width) so that 5 to 10 bars will cover the full range of data.
– Set borders for bars, count frequencies, draw bars.
– Use a vertical axis to show the bar height.
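
The construction steps above can be sketched for the drive data as follows. The class width of 10 yards and the starting border of 190 are assumptions chosen for illustration; they give 9 bars, which fits the 5-to-10 guideline. Other choices are equally valid and will change the picture.

```python
# Antoinette's drive lengths in yards (starter problem data).
drives = [246, 260, 230, 233, 254, 203, 223, 193, 238, 220, 210, 237,
          270, 240, 192, 204, 250, 274, 220, 240, 235, 250, 222,
          230, 225, 241, 225, 230, 250, 200, 250, 226, 240]

CLASS_WIDTH = 10   # assumed bar width in yards
START = 190        # assumed lowest border, at or below the smallest drive

# Each bar covers [left, left + CLASS_WIDTH); count the values inside it.
bins = {}
left = START
while left <= max(drives):
    bins[left] = sum(left <= d < left + CLASS_WIDTH for d in drives)
    left += CLASS_WIDTH

# Text histogram: one row per bar, one '#' per data point.
for left, count in bins.items():
    print(f"{left}-{left + CLASS_WIDTH - 1}: " + "#" * count)
```

Each value lands in exactly one bar because the left border is included and the right border is excluded.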

Page 26:

Histogram of Drive Data

[Figure: Collection 1 histogram of CalDrives; horizontal axis from 180 to 300, vertical axis Count from 1 to 8]

• From a visual examination, estimate the center, unusual points, spread and the shape. (CUSS)
– As before, you should see the center around 230 to 240, no unusual points, the spread looks like 90, and the shape still looks like a mound.

Page 27:

Stemplots (AKA: Stem & Leaf Plots)

• One way to organize numerical data is to make a stemplot.

• Let's turn to the board and walk through how to make a stemplot of the following data, found on pg. 33

• 50 26 26 31 57 19 24 22 23 38

13 50 13 34 23 30 49 13 15 51
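
As a way to check the stemplot made on the board, here is a minimal sketch that splits each value into a tens-digit stem and a ones-digit leaf. It assumes two-digit whole numbers like the data above; the names are illustrative.

```python
from collections import defaultdict

# The 20 values listed on this slide.
data = [50, 26, 26, 31, 57, 19, 24, 22, 23, 38,
        13, 50, 13, 34, 23, 30, 49, 13, 15, 51]

# Stem = tens digit, leaf = ones digit. Sorting first keeps leaves in order.
stems = defaultdict(list)
for value in sorted(data):
    stem, leaf = divmod(value, 10)
    stems[stem].append(leaf)

# Print every stem from smallest to largest, even if it has no leaves.
for stem in range(min(stems), max(stems) + 1):
    leaves = " ".join(str(leaf) for leaf in stems[stem])
    print(f"{stem} | {leaves}")
print("Key: 1 | 3 = 13")
```

Printing empty stems matters: leaving them out would hide gaps in the distribution.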

Page 28:

Stemplots check list

• Did we make a stemplot?

• Did we talk about splitting stems?
– 1122334455… upper and lower bounds

• Did we talk about back to back stemplots?

• Good…now we can move on

Page 29:

Percent of Population Over 65 by State

 4 | 9
 5 |
 6 |
 7 |
 8 | 8
 9 |
10 | 0 0 2 9
11 | 0 1 1 3 4 4 4 6 9
12 | 0 0 3 4 4 5 5 5 6 6 6 6
13 | 0 1 3 3 4 4 5 6 7 7 9 9 9
14 | 2 3 4 5 5
15 | 2 3 7 9
16 |
17 |
18 | 6

Note: 4|9 = 4.9%

Page 30:

Today’s Objectives

• Analyze pie charts and bar graphs

• Two-way tables:
– Marginal Distribution
– Conditional Distribution

• A Titanic Disaster

• Analyze Dot Plots

• Describe with CUSS, your new best friend

• Stem and Leaf Plots: single, and back to back

• Histograms

California Standard 14.0: Students organize and describe distributions of data by using a number of different methods, including frequency tables, histograms, standard line graphs and bar graphs, stem-and-leaf displays, scatterplots, and box-and-whisker plots.

Page 31:

Homework