Introduction to the Practice of Statistics Fourth Edition Chapter 1: Exploring Data.

Post on 12-Jan-2016


Introduction to the Practice of Statistics

Fourth Edition

Chapter 1: Exploring Data

RESOURCES/SUPPLIES:

Our textbook is The Practice of Statistics (Starnes, Yates, Moore, 4th ed.). Pay careful attention to the examples, the calculator procedures, and the AP Exam tips that are located in the page margins.

• I will refer to this book as TPS4e throughout the year. This textbook is well-aligned to the AP Statistics curriculum and the sample problems and activities will prepare you well for the AP Statistics exam.

• Companion Web Site: www.whfreeman.com/tps4e


• You will need a TI-84+ graphing calculator. (I have a class set to use in the classroom.) I will be demonstrating problems using the TI-84 all year, and tips on how to use this calculator are provided throughout the TPS4e textbook. The textbook also explains how to use the TI-89 and the TI-Nspire.

• You will receive a packet with instructions for the TI-84 graphing calculator. Keep it in your binder since you will refer to it throughout the year.

• I recommend a large 2 ½” binder, since I will provide a large number of AP practice problems, handouts, and additional documents, which you will also find at my website www.hialeahhigh.org and may find helpful to print.


• Vocab Flash Cards

• Free Study Resources for AP Tests

• Textbook Website

• Free Response Questions

• Online Writing Lab, Quick Writing Reference

• Matching types of inference

• Khan Academy

I will be preparing you for the Advanced Placement Statistics Exam, taking place on Thursday, May 12, 2016 at 12 noon.

The exam is made up of 2 sections, for a total of 3 hours.

• Section I: 40 multiple-choice (MC) questions, 90 minutes, 50% of the exam score. No penalty for guessing.

• Section II: 6 Free Response (FR) questions, 90 minutes, 50% of the exam score. Questions 1-5 take about 13 minutes each and count for 75% of the Section II score. The last question, an “Investigative Task,” should take about 25 minutes and is worth 25% of the Section II score.

THE AP EXAM


You CAN be successful on this exam IF you put forth the effort ALL YEAR LONG.

I will provide you with LOTS of preparation materials as well as insight from the grading of the exam.

I need you to provide the effort...

TOPIC OUTLINE:

THE TOPICS FOR AP STATISTICS ARE DIVIDED INTO 4 MAJOR THEMES:

1. EXPLORATORY ANALYSIS (20-30%)

2. PLANNING AND CONDUCTING A STUDY (10-15%)

3. PROBABILITY (20-30%)

4. STATISTICAL INFERENCE (30-40%)

WHAT IS STATISTICS?

The Science of Learning from Data

The Collection and Analysis of Data

BRANCHES OF STATISTICS:

• Experimental Design: Chapter 4

• Descriptive Statistics (Data Exploration): Chapters 1, 2, 3

• Probability: Chapters 5, 6, 7

• Inferential Statistics: Chapters 8-12

THE PRACTICE OF STATISTICS, 4TH EDITION - FOR AP*

STARNES, YATES, MOORE

CHAPTER 1: EXPLORING DATA

Introduction: Data Analysis: Making Sense of Data

1.1 Analyzing Categorical Data

1.2 Displaying Quantitative Data with Graphs

1.3 Describing Quantitative Data with Numbers

INTRODUCTION

DATA ANALYSIS: MAKING SENSE OF DATA

After this section, you should be able to…

DEFINE “Individuals” and “Variables”

DISTINGUISH between “Categorical” and “Quantitative” variables

DEFINE “Distribution”

DESCRIBE the idea behind “Inference”

LEARNING OBJECTIVES

DATA ANALYSIS

Statistics is the science of data. Data Analysis is the process of organizing, displaying, summarizing, and asking questions about data.

Definitions:

Individuals – objects (people, animals, things) described by a set of data

Variable – any characteristic of an individual

Categorical Variable – places an individual into one of several groups or categories.

Quantitative Variable – takes numerical values for which it makes sense to find an average.


A variable generally takes on many different values. In data analysis, we are interested in how often a variable takes on each value.

Definition:

Distribution – tells us what values a variable takes and how often it takes those values

2009 Fuel Economy Guide

      MODEL                  MPG
  1   Acura RL               22
  2   Audi A6 Quattro        23
  3   Bentley Arnage         14
  4   BMW 528I               28
  5   Buick Lacrosse         28
  6   Cadillac CTS           25
  7   Chevrolet Malibu       33
  8   Chrysler Sebring       30
  9   Dodge Avenger          30
 10   Hyundai Elantra        33
 11   Jaguar XF              25
 12   Kia Optima             32
 13   Lexus GS 350           26
 14   Lincoln MKZ            28
 15   Mazda 6                29
 16   Mercedes-Benz E350     24
 17   Mercury Milan          29
 18   Mitsubishi Galant      27
 19   Nissan Maxima          26
 20   Rolls Royce Phantom    18
 21   Saturn Aura            33
 22   Toyota Camry           31
 23   Volkswagen Passat      29
 24   Volvo S80              25

Example: Dotplot of MPG Distribution

[Figure: 2009 Fuel Economy Guide dot plot. Variable of interest: MPG. Horizontal axis: MPG, marked from 14 to 34 in steps of 2.]
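To make the idea of a distribution concrete, here is a minimal Python sketch that tallies how often each MPG value occurs for the 24 models in the 2009 Fuel Economy Guide table and prints a sideways text dotplot. The helper names `distribution` and `text_dotplot` are made up for this illustration; they are not from the textbook or the TI-84.

```python
from collections import Counter

# MPG values for the 24 models in the 2009 Fuel Economy Guide table
mpg = [22, 23, 14, 28, 28, 25, 33, 30, 30, 33, 25, 32,
       26, 28, 29, 24, 29, 27, 26, 18, 33, 31, 29, 25]


def distribution(values):
    """Return {value: count}, sorted by value (hypothetical helper)."""
    return dict(sorted(Counter(values).items()))


def text_dotplot(values):
    """One row per distinct value, one star per observation (hypothetical helper)."""
    counts = distribution(values)
    return "\n".join(f"{v:>3} | {'*' * c}" for v, c in counts.items())


print(text_dotplot(mpg))
```

Each row of the printout answers the two questions in the definition: what value the variable takes, and how often it takes that value.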


How to Explore Data

Examine each variable by itself. Then study relationships among the variables.

Start with a graph or graphs.

Add numerical summaries.


• A population is the collection of all outcomes, responses, measurements, or counts that are of interest. A sample is a subset of a population.

Population

Sample

Collect data from a representative Sample...

Perform Data Analysis, keeping probability in mind…

Make an Inference about the Population.

ACTIVITY: HIRING DISCRIMINATION

Follow the directions on Page 5.

Perform 5 repetitions of your simulation.

Turn in your results to your teacher.

Teacher: Right-click (control-click) on the graph to edit the counts.

INTRODUCTION

DATA ANALYSIS: MAKING SENSE OF DATA

In this section, we learned that…

A dataset contains information on individuals.

For each individual, data give values for one or more variables.

Variables can be categorical or quantitative.

The distribution of a variable describes what values it takes and how often it takes them.

Inference is the process of making a conclusion about a population based on a sample set of data.

SUMMARY

LOOKING AHEAD…

In the next Section…

We’ll learn how to analyze categorical data.

• Bar Graphs

• Pie Charts

• Two-Way Tables

• Conditional Distributions

We’ll also learn how to organize a statistical problem.

Practice:

CW # 1. PG. 7 EXC. 2, 4, 6

HW # 2. PG. 7 EXC. 1, 3, 5, 7, 8

RECALL OUR EARLIER QUESTION 1

1. What percent of the 60 randomly chosen fifth grade students have an IQ score of at least 120?

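How to Represent Numerically?

(11+9+2)/60 = 22/60 ≈ 0.367, or 36.7%

How to Represent Graphically?

18.3% + 15% + 3.3% = 36.6% (a sum of rounded bar percents, so it differs slightly from the exact 36.7%)

The grey shaded region corresponds to this 36.7% of the data.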

What is Different From the Histogram we Generated In Class??

Let’s Look at the Distribution we Just Created:

• Overall Pattern:

Shape (modes, tails (skewness), symmetry)

Center (mean, median)

Spread (range, IQR, standard deviation)

• Deviations: Outliers

Descriptors we will be interested in for data and population distributions.

• Overall Pattern: Shape, Center, Spread?

• Deviations: Outliers?

Example 1.9 page 18-19

Data Analysis – An Interesting Example (Example 1.10, p. 9-10)

80 Calls

• Overall Pattern: Shape, Center, Spread?

• Deviations: Outliers?

Time Plots – For Data Collected Over Time…

Example: Mississippi River Discharge p.19 (data p. 21)

Example – Dealing with Seasonal Variation

EXTRA SLIDES FROM HOMEWORK

Problem 1.19

Problem 1.20

Problem 1.21

Problem 1.31

Problem 1.36

Problem 1.37-1.38

Problem 1.19, page 30

Problem 1.20, page 31

Problem 1.21, page 31

Problem 1.31, page 36

Problem 1.36, page 38

Problems 1.37 – 1.39

Section 1.2

Describing Distributions with Numbers

TYPES OF MEASURES

Measures of Center: Mean, Median, Mode

Measures of Spread: Range (Max-Min), Standard Deviation, Quartiles, IQR

MEANS AND MEDIANS

Consider the following sample of test scores from one of Dr. L.’s recent classes (max score = 100):

65, 65, 70, 75, 78, 80, 83, 87, 91, 94

What is the Average (or Mean) Test Score?

What is the Median Test Score?

Consider the following sample of test scores from one of Dr. L.’s recent classes (max score = 100):

65, 65, 70, 75, 78, 80, 83, 87, 91, 94

• Draw a Stem and Leaf Plot (Shape, Center, Spread?)

• Find the Mean and the Median

• Let’s Use our TI-83 Calculators! Enter data into a list via Stat|Edit, then use Stat|Calc|1-Var Stats

• What happens to the Mean and Median if the lowest score was 20 instead of 65?

• What happens to the Mean and Median if a low score of 20 is added to the data set (so we would now have 11 data points)?

What can we say about the Mean versus the Median?
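The questions above can be checked with a few lines of Python using the standard `statistics` module. This is only a sketch alongside the TI-83 steps. Since the data contain two scores of 65, it assumes that "the lowest score was 20 instead of 65" means replacing just one of them.

```python
from statistics import mean, median

scores = [65, 65, 70, 75, 78, 80, 83, 87, 91, 94]

# Case 1 (assumption): one of the two 65s is replaced by 20
replaced = [20] + scores[1:]

# Case 2: a low score of 20 is added, giving 11 data points
added = scores + [20]

for label, data in [("original", scores),
                    ("lowest = 20", replaced),
                    ("20 added", added)]:
    print(f"{label:12s} mean = {mean(data):6.2f}   median = {median(data)}")
```

The mean drops noticeably in both modified data sets, while the median barely moves. The mean is pulled toward extreme values and the median resists them.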

Quartiles: Measures of Position

A Graphical Representation of Position of Data (It really gives us an indication of how the data is spread among its values!)

Using Measures of Position to Get Measures of Spread

And what was the range again???

5 NUMBER SUMMARY, IQR, BOX PLOT, AND WHERE OUTLIERS WOULD BE FOR TEST SCORE DATA:

65, 65, 70, 75, 78, 80, 83, 87, 91, 94

What do we notice about symmetry?
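As a rough check on the hand computation, the sketch below finds the five-number summary for the test scores. It assumes quartiles are taken as the medians of the lower and upper halves of the sorted data, and that outliers are flagged with the usual 1.5 × IQR fences. Other software may use slightly different quartile rules. The function name `five_number_summary` is made up for this illustration.

```python
from statistics import median


def five_number_summary(data):
    """Return (min, Q1, median, Q3, max) using medians of the two halves."""
    xs = sorted(data)
    n = len(xs)
    lower = xs[: n // 2]         # observations below the median position
    upper = xs[(n + 1) // 2:]    # observations above the median position
    return xs[0], median(lower), median(xs), median(upper), xs[-1]


scores = [65, 65, 70, 75, 78, 80, 83, 87, 91, 94]
minimum, q1, med, q3, maximum = five_number_summary(scores)

iqr = q3 - q1
low_fence = q1 - 1.5 * iqr    # values below this would be flagged
high_fence = q3 + 1.5 * iqr   # values above this would be flagged
outliers = [x for x in scores if x < low_fence or x > high_fence]

print("Five-number summary:", minimum, q1, med, q3, maximum)
print("IQR:", iqr, " fences:", low_fence, high_fence, " outliers:", outliers)
```

With these conventions the fences sit well outside the data, so no test score is flagged as an outlier.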

HISTOGRAMS OF FLOWER LENGTHS, PROBLEM 1.58, GENERATED VIA MINITAB

[Figure: Histogram of Flower Length. Panel variable: variety (bihai, red, yellow). Horizontal axis: length, 36 to 51. Vertical axis: Percent, 0 to 48.]

BOX PLOT AND 5-NUMBER SUMMARY FOR FLOWER LENGTH DATA, GENERATED VIA BOX PLOT MACRO FOR EXCEL

[Figure: Box Plots for Flower Lengths. Horizontal axis: Flower Color (Bihai, Red, Yellow). Vertical axis: Lengths (in mm), 30 to 55.]

                   Bihai    Red     Yellow
Min or In Fence    46.34    37.4    34.57
Q1                 46.71    38.07   35.45
Median             47.12    39.16   36.11
Q3                 48.245   41.69   36.82
Max or In Fence    50.26    43.09   38.13

Outliers?

Remember this histogram from the Service Call Length Data on page 9? How do you expect the Mean and Median to compare for this data?

Mean 196.6, Median 103.5

Box Plot for Call Length Data

MORE ON MEASURES OF SPREAD

• Data Range (Max – Min)

• IQR (75% Quartile minus 25% Quartile; range of middle 50% of data)

• Standard Deviation (Variance): measures how the data deviates from the mean….hmm…how can we do this?

Recall the Sample Test Score Data: 65, 65, 70, 75, 78, 80, 83, 87, 91, 94

Recall the Sample Mean (X bar) was 78.8…

COMPUTING VARIANCE AND STD. DEV. BY HAND AND VIA THE TI83:

Recall the Sample Test Score Data:

65, 65, 70, 75, 78, 80, 83, 87, 91, 94

Recall the Sample Mean (X bar) was 78.8

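[Figure: number line for x from 65 to 95, marking the mean 78.8 and the observations 65 and 83, with deviations 65 - 78.8 = -13.8 and 83 - 78.8 = 4.2.]

What does the number 4.2 measure? How about -13.8?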
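The by-hand computation can be mirrored step by step in Python. This sketch uses the sample variance (dividing by n - 1); the variable names are chosen for this illustration only.

```python
from math import sqrt

scores = [65, 65, 70, 75, 78, 80, 83, 87, 91, 94]

n = len(scores)
x_bar = sum(scores) / n                      # sample mean

# Deviation of each observation from the mean (e.g. 65 - 78.8 and 83 - 78.8)
deviations = [x - x_bar for x in scores]

sum_sq = sum(d ** 2 for d in deviations)     # sum of squared deviations
variance = sum_sq / (n - 1)                  # sample variance s^2
s = sqrt(variance)                           # sample standard deviation s

print(f"mean = {x_bar:.1f}")
print("deviations:", [round(d, 1) for d in deviations])
print(f"s^2 = {variance:.1f}, s = {s:.2f}")
```

The deviations always sum to zero (up to rounding), which is why they are squared before averaging.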

Consider (again!) the following sample of test scores from one of Dr. L.’s recent classes (max score = 100):

65, 65, 70, 75, 78, 80, 83, 87, 91, 94

What happens to the standard deviation and the location of the 1st and 3rd quartiles if the lowest score was 20 instead of 65?

What happens to the standard deviation and the location of the 1st and 3rd quartiles if a low score of 20 is added to the data set (so we would now have 11 data points?)

What can we say about the effect of outliers on the standard deviation and the quartiles of a data set?
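A short Python sketch for exploring these questions follows. It assumes quartiles are the medians of the lower and upper halves of the sorted data, and that only one of the two 65s is changed to 20. The helper `quartiles` is a made-up name for this illustration.

```python
from statistics import median, stdev


def quartiles(data):
    """Return (Q1, Q3) as medians of the lower and upper halves."""
    xs = sorted(data)
    n = len(xs)
    return median(xs[: n // 2]), median(xs[(n + 1) // 2:])


scores = [65, 65, 70, 75, 78, 80, 83, 87, 91, 94]
replaced = [20] + scores[1:]   # assumption: one 65 becomes 20
added = [20] + scores          # a low score of 20 is added (11 points)

for label, data in [("original", scores),
                    ("lowest = 20", replaced),
                    ("20 added", added)]:
    q1, q3 = quartiles(data)
    print(f"{label:12s} s = {stdev(data):5.2f}   Q1 = {q1}   Q3 = {q3}")
```

In both modified data sets the standard deviation roughly doubles, while the quartiles change little or not at all.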

Effects of Outliers on the Standard Deviation

Example 1.18: Stemplots of Annual Returns for Stocks (a) and Treasury bills (b), on page 53 of text. What are the stem and leaf units?

Consider (again!) the following sample of test scores from one of Dr. L.’s recent classes (max score = 100):

65, 65, 70, 75, 78, 80, 83, 87, 91, 94

Xbar=78.8 s=10.2 (rounded)

Suppose we “curve” the grades by adding 5 points to every test score (i.e. Xnew=Xold+5). What will be new mean and standard deviation?

Suppose we “curve” the grades by multiplying every test score times 1.5 (i.e. Xnew=1.5*Xold). What will be the new mean and standard deviation?

Suppose we “curve” the grades by multiplying every test score times 1.5 and adding 5 points (i.e. Xnew=1.5*Xold+5). What will be the new mean and standard deviation?
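The three "curves" can be tried out numerically before stating the general rule. In the sketch below, `curve` is a hypothetical helper that applies Xnew = a*Xold + b to every score.

```python
from statistics import mean, stdev

scores = [65, 65, 70, 75, 78, 80, 83, 87, 91, 94]


def curve(data, a=1.0, b=0.0):
    """Apply the linear transformation x_new = a * x_old + b to each score."""
    return [a * x + b for x in data]


print(f"original:     mean = {mean(scores):.1f}, s = {stdev(scores):.2f}")
for a, b in [(1, 5), (1.5, 0), (1.5, 5)]:
    new = curve(scores, a, b)
    print(f"a={a}, b={b}:  mean = {mean(new):.1f}, s = {stdev(new):.2f}")
```

The output is consistent with the general pattern: adding b shifts the mean by b and leaves the standard deviation unchanged, while multiplying by a multiplies the mean by a and the standard deviation by |a|.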

Effects of Linear Transformations on the Mean and Standard Deviation

Box Plots for Problems 1.62-1.64

Section 1.3

Density Curves and Normal Distributions

BASIC IDEAS

One way to think of a density curve is as a smooth approximation to the irregular bars of a histogram.

It is an idealization that pictures the overall pattern of the data but ignores minor irregularities.

Oftentimes we will use density curves to describe the distribution of a single quantitative continuous variable for a population (sometimes our curves will be based on a histogram generated via a sample from the population).

Heights of American Women

SAT Scores

The bell-shaped normal curve will be our focus!

Shape?Center?Spread?

Density Curve

Page 64

Sample Size =105

Shape?Center?Spread?

Density Curve

Page 65

Sample Size=72 Guinea pigs

1. What proportion (or percent) of seventh graders from Gary, Indiana scored below 6?

2. What is the probability (i.e. how likely is it?) that a randomly chosen seventh grader from Gary, Indiana will have a test score less than 6?

Two Different but Related Questions!

Example 1.22, Page 66

Sample Size = 947

Relative “area under the curve” VERSUS relative “proportion of data” in histogram bars.

Page 67 of text

Shape?Center?Spread?

The classic “bell shaped” Density curve.

A “skewed” density curve.

Median separates area under curve into two equal areas (i.e. each has area ½)

What is the geometric interpretation of the mean?

The mean as “center of mass” or “balance point” of the density curve

The normal density curve! Shape? Center? Spread?

Area Under Curve?

How does the magnitude of the standard deviation affect a density curve?

How does the standard deviation affect the shape of the normal density curve?

Assume Same Scale on Horizontal and Vertical (not drawn) Axes.

The distribution of heights of young women (X) aged 18 to 24 is approximately normal with mean mu=64.5 inches and standard deviation sigma=2.5 inches (i.e. X~N(64.5, 2.5)). Let’s draw the density curve for X and observe the empirical rule!

(aka the “Empirical Rule”)
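The empirical rule says that roughly 68%, 95%, and 99.7% of a normal distribution lies within 1, 2, and 3 standard deviations of the mean. As a quick check for the heights example, this sketch uses `NormalDist` from Python's standard `statistics` module with the stated mu = 64.5 and sigma = 2.5; the variable names are illustrative.

```python
from statistics import NormalDist

heights = NormalDist(mu=64.5, sigma=2.5)   # X ~ N(64.5, 2.5), in inches

areas = {}
for k in (1, 2, 3):
    low = heights.mean - k * heights.stdev
    high = heights.mean + k * heights.stdev
    # Area under the density curve between low and high
    areas[k] = heights.cdf(high) - heights.cdf(low)
    print(f"within {k} sd: {low:.1f} to {high:.1f} inches, area = {areas[k]:.4f}")
```

For these heights, the three intervals are 62 to 67, 59.5 to 69.5, and 57 to 72 inches.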

Example 1.23, page 72

How many standard deviations from the mean height is the height of a woman who is 68 inches? Who is 58 inches?
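The question is asking for a z-score, z = (x - mu) / sigma. A minimal sketch with the heights distribution (mu = 64.5, sigma = 2.5) follows; the function name `z_score` is made up for this illustration.

```python
def z_score(x, mu, sigma):
    """Number of standard deviations that x lies above (+) or below (-) the mean."""
    return (x - mu) / sigma


z68 = z_score(68, 64.5, 2.5)   # woman who is 68 inches tall
z58 = z_score(58, 64.5, 2.5)   # woman who is 58 inches tall

print(f"68 inches: z = {z68:.1f}")
print(f"58 inches: z = {z58:.1f}")
```

A positive z-score means the height is above the mean, and a negative one means it is below.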

The Standard Normal Distribution

(mu=0 and sigma=1)

Horizontal axis in units of z-score!

Notation:Z~N(0,1)

Let’s find some proportions (probabilities) using normal distributions!

Example 1.25 (page 75)

Example 1.26 (page 76)

(slides follow)

Let’s draw the distributions by hand first!

Example 1.25, page 75

TI-83 Calculator Command: Distr|normalcdf

Syntax: normalcdf(left, right, mu, sigma) = area under curve from left to right

mu defaults to 0, sigma defaults to 1

Infinity is 1E99 (use the EE key), Minus Infinity is -1E99

Example 1.26, page 76

Let’s find the same probabilities using z-scores!

On the TI-83: normalcdf(720,820,1026,209)
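For anyone without a calculator at hand, the same areas can be approximated in Python. The function below is a stand-in written for this illustration, not TI code: it mimics the normalcdf(left, right, mu, sigma) syntax using `NormalDist.cdf` from the standard library, and then repeats the computation with z-scores.

```python
from statistics import NormalDist


def normalcdf(left, right, mu=0.0, sigma=1.0):
    """Area under the N(mu, sigma) curve between left and right."""
    dist = NormalDist(mu, sigma)
    return dist.cdf(right) - dist.cdf(left)


# Direct computation, as in normalcdf(720,820,1026,209)
p_direct = normalcdf(720, 820, 1026, 209)

# Same area via z-scores on the standard normal curve
z_left = (720 - 1026) / 209
z_right = (820 - 1026) / 209
p_z = normalcdf(z_left, z_right)

# "Infinity" as on the calculator: area to the right of z = 1
upper_tail = normalcdf(1.0, 1e99)

print(f"direct: {p_direct:.4f}   via z-scores: {p_z:.4f}")
print(f"area right of z = 1: {upper_tail:.4f}")
```

Both routes give the same area, which is the point of standardizing. Table-based answers may differ slightly because z is rounded to two decimals before the lookup.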

THE INVERSE PROBLEM: GIVEN A NORMAL DENSITY PROPORTION OR PROBABILITY, FIND THE CORRESPONDING Z-SCORE!

What is the z-score such that 90% of the data has a z-score less than that z-score?

(1) Draw picture!

(2) Understand what you are solving for!

(3) Solve approximately! (we will also use the invNorm key on the next slide)

Now try working Example 1.30 page 79! (slide follows)

TI-83: Use Distr|invNorm

Syntax: invNorm(area, mu, sigma) gives the value of x with area to the left of x under the normal curve with mean mu and standard deviation sigma.

invNorm(0.9,505,110)=?

invNorm(0.9)=?
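The two invNorm questions can be checked in Python with `NormalDist.inv_cdf`. The wrapper `inv_norm` below is a hypothetical name that only imitates the calculator's argument order.

```python
from statistics import NormalDist


def inv_norm(area, mu=0.0, sigma=1.0):
    """Value x such that the area to the left of x under N(mu, sigma) equals `area`."""
    return NormalDist(mu, sigma).inv_cdf(area)


z90 = inv_norm(0.9)             # standard normal: z-score with 90% of the area to its left
x90 = inv_norm(0.9, 505, 110)   # same percentile on the N(505, 110) scale

print(f"invNorm(0.9)         = {z90:.3f}")
print(f"invNorm(0.9,505,110) = {x90:.1f}")
print(f"check: 505 + 110*z   = {505 + 110 * z90:.1f}")
```

The last line shows the link between the two answers: x = mu + z * sigma, so un-standardizing the z-score reproduces the second result.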

Page 79

How can we use our TI-83s to solve this??

How can we tell if our data is “approximately normal?”

Box plots and histograms should show essentially symmetric, unimodal data. Normal Quantile plots are also used!
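A normal quantile plot pairs each sorted observation with the z-score it would be expected to have if the data were normal; points falling close to a straight line suggest approximate normality. The sketch below only computes the plot coordinates. It assumes the plotting position (i - 0.5)/n, which is one common convention (software packages differ), and it reuses the test scores purely as illustrative data.

```python
from statistics import NormalDist


def normal_quantile_points(data):
    """Return (expected z-score, observed value) pairs for a normal quantile plot."""
    xs = sorted(data)
    n = len(xs)
    std_normal = NormalDist()   # N(0, 1)
    # enumerate counts from 0, so (i + 0.5)/n is the (i - 0.5)/n rule for ranks 1..n
    return [(std_normal.inv_cdf((i + 0.5) / n), x) for i, x in enumerate(xs)]


scores = [65, 65, 70, 75, 78, 80, 83, 87, 91, 94]
points = normal_quantile_points(scores)

for z, x in points:
    print(f"z = {z:6.3f}   x = {x}")
```

Plotting x against z with any graphing tool gives the normal quantile plot.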

Histogram and Normal Quantile Plot for Breaking Strengths (in pounds) of Semiconductor Wires

(Pages 19 and 81 of text)

Histogram and Normal Quantile Plot for Survival Time of Guinea Pigs (in days) in a Medical Experiment

(Pages 38 (data table), 65 and 82 of text)

USING EXCEL TO GENERATE PLOTS

Example Problem 1.30 page 35

• Generate Histogram via Megastat

• Get Numerical Summary of Data via Megastat or Data Analysis Addin

• Generate Normal Quantile Plot via Macro (plot on next slide)

Normal Quantile plot for Problem 1.30 page 35

EXTRA SLIDES FROM HOMEWORK

Problem 1.80

Problem 1.82

Problem 1.119

Problem 1.120

Problem 1.121

Problem 1.122

Problem 1.129

Problem 1.135

Problem 1.80 page 84

Problem 1.83 page 85

Problem 1.119 page 90

Problem 1.120 page 90

Problem 1.121 page 92

Problem 1.122 page 92

Problem 1.129 page 94

Problem 1.135 page 95-96