Cedar Crest College Statistics 110 Lecture Notes Author: James Hammer E-Mail Address: [email protected] June 23, 2015


Cedar Crest College

Statistics 110

Lecture Notes

Author: James Hammer

E-Mail Address: [email protected]

June 23, 2015


Preface

This is meant as a teaching aid. It can be freely distributed and edited in any way. For a copy of the LaTeX document, please email the author.


Contents

1 Nature of Probability and Statistics
1.1 Descriptive and Inferential Statistics
1.2 Variables and Types of Data
1.3 Data Collection and Sampling Techniques
1.4 Experimental Design

2 Frequency Distribution and Graphs
2.1 Organizing Data
2.2 Histograms, Frequency Polygons, and Ogives
2.3 Other Types of Graphs

3 Data Description
3.1 Measures of Central Tendencies
3.2 Measures of Variation
3.3 Measures of Position
3.4 Exploratory Data Analysis

4 Probability and Counting Rules
4.1 Sample Spaces and Probability
4.2 The Addition Rules for Probability
4.3 The Multiplication Rules & Conditional Probability
4.4 Counting Rules
4.5 Probability and Counting Rules

5 Discrete Probability Distributions
5.1 Probability Distributions
5.2 Mean, Variance, Standard Deviation, and Expectation
5.3 The Binomial Distribution

6 The Normal Distribution
6.1 Normal Distributions
6.2 Applications of the Normal Distribution
6.3 The Central Limit Theorem

7 Confidence Intervals and Sample Size
7.1 Confidence Intervals for the Mean When σ is Known
7.2 Confidence Intervals for the Mean When σ is Unknown
7.3 Confidence Intervals and Sample Size for Populations

8 Hypothesis Testing
8.1 Traditional Method
8.2 z Test for a Mean
8.3 t Test for a Mean
8.4 z Test for a Proportion

10 Correlation and Regression
10.1 Scatter Plots and Correlation


Chapter 1

Nature of Probability and Statistics

1.1 Descriptive and Inferential Statistics

Definition 1 (Statistics). Statistics is the science of conducting studies to collect, organize, summarize, analyze, and draw conclusions from data.

Definition 2 (Variable). A variable is a characteristic or attribute that can assume different values.

Definition 3 (Data). The data are the values (measurements or observations) that the variables can assume. Variables whose values are determined by chance are called random variables.

Definition 4 (Population). A population consists of all subjects (human or otherwise) that are being studied.

Definition 5 (Sample). A sample is a group of subjects selected from a population.

Definition 6 (Descriptive Statistics). Descriptive statistics consists of the collection, organization, summarization, and presentation of data.

Definition 7 (Inferential Statistics). Inferential statistics consists of generalizing from samples to populations, performing estimations and hypothesis tests, determining relationships among variables, and making predictions.

Definition 8 (Probability). Probability is the measure of the likelihood that an event will occur. Probability is quantified as a number between 0 and 1. The higher the probability of an event, the more likely the event is to occur.

Definition 9 (Hypothesis Testing). The area of inferential statistics called hypothesis testing is a decision-making process for evaluating claims about a population based on information obtained from samples.


Problem 10. Determine whether descriptive statistics or inferential statistics were used. Explain why.

a. The average jackpot for the top five lottery winners was $367.6 million.

b. A study done by the American Academy of Neurology suggested that older people who had a high caloric diet more than doubled their risk of memory loss.

c. Based on a survey of 9317 consumers done by the National Retail Federation, the average amount that consumers spent on Valentine's Day in 2011 was $116.

d. Scientists at the University of Oxford in England found that a good laugh significantly raises a person's pain level tolerance.

Suggested Problems: 1-19.


1.2 Variables and Types of Data

Definition 11 (Qualitative Variables). Qualitative variables are variables that have distinct categories according to some characteristic or attribute.

Definition 12 (Quantitative Variables). Quantitative variables are variables that can be counted or measured.

Definition 13 (Discrete Variables). Discrete variables assume values that can be counted.

Definition 14 (Continuous Variables). Continuous variables can assume an infinite number of values between any two specific values. They are obtained by measuring. They often include fractions and decimals.

Problem 15. Classify each of the variables as a discrete variable or as a continuous variable.

a. The highest wind speed of a hurricane.

b. The weight of baggage on an airplane.

c. The number of pages in these lecture notes.

d. The amount of money a person spends per year for online purchases.


Definition 16 (Boundary). The boundary of a number is defined as a class in which a data value would be placed before the data values are rounded.

Problem 17. Find the boundaries of each variable.

a. 8.4 quarts

b. 138 mmHg

c. 137.63 mg/dL
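One way to check answers to Problem 17 is to note that a value recorded to a given decimal place could have been rounded from anywhere within half a unit of that place on either side. The sketch below applies that idea. The helper name `boundaries` and the choice to pass each reading as a string (so the recorded precision is preserved) are assumptions made for illustration.

```python
from decimal import Decimal


def boundaries(reading: str):
    """Return (lower, upper) boundaries for a reading given as text, e.g. "8.4"."""
    value = Decimal(reading)
    # exponent is -1 for "8.4", 0 for "138", -2 for "137.63"
    exponent = value.as_tuple().exponent
    # Half of one unit in the last recorded place: 0.05, 0.5, 0.005, ...
    half_unit = Decimal(5).scaleb(exponent - 1)
    return value - half_unit, value + half_unit


for reading in ("8.4", "138", "137.63"):
    lower, upper = boundaries(reading)
    print(f"{reading}: {lower} to {upper}")
```

Using `Decimal` rather than `float` keeps results such as 8.35 exact instead of introducing binary rounding noise.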

Definition 18 (Nominal Level of Measurement). The nominal level of measurement classifies data into mutually exclusive (nonoverlapping) categories in which no order or ranking can be imposed on the data.

Definition 19 (Ordinal Level of Measurement). The ordinal level of measurement classifies data into categories that can be ranked; however, precise differences between the ranks do not exist.

Definition 20 (Interval Level of Measurement). The interval level of measurement ranks data, and precise differences between units of measure do exist; however, there is no meaningful zero.

Definition 21 (Ratio Level of Measurement). The ratio level of measurement possesses all of the characteristics of interval measurement, and there exists a true zero. In addition, true ratios exist when the same variable is measured on two different members of the population.

Problem 22. What level of measurement would be used to measure each variable?

a. The age of patients in a local hospital.

b. The rating of movies released this month.

c. Colors of athletic shirts sold by the Iron Pigs.

d. Temperatures of hot tubs in local health clubs.

Suggested Problems: 5-22


1.3 Data Collection and Sampling Techniques

Definition 23 (Random Sample). A random sample is a sample in which all members of the population have an equal chance of being selected.

Definition 24 (Systematic Sample). A systematic sample is a sample obtained by selecting every kth member of the population.

Definition 25 (Stratified Sample). A stratified sample is a sample obtained by dividing the population into subgroups or strata according to some characteristic relevant to the study. The subjects are selected from each subgroup.

Definition 26 (Cluster Sample). A cluster sample is obtained by dividing the population into sections or clusters and then selecting one or more clusters and using all members in the cluster(s) as the members of the sample.

Definition 27 (Sampling Error). Sampling error is the difference between the results obtained from a sample and the results obtained from the population from which the sample was selected.

Definition 28 (Nonsampling Error). A nonsampling error occurs when the data are obtained erroneously or the sample is biased (non-representative).
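The four sampling methods in Definitions 23 through 26 can be contrasted with a small simulation. This is only a sketch: the population of 100 numbered subjects, the two strata, the cluster size of 20, and the seed are all hypothetical choices made so the example is small and repeatable.

```python
import random

rng = random.Random(110)              # fixed seed so the sketch is repeatable
population = list(range(1, 101))      # hypothetical subjects numbered 1..100

# Random sample: every subject has the same chance of selection.
random_sample = rng.sample(population, 10)

# Systematic sample: pick a random start, then take every k-th subject.
k = 10
start = rng.randrange(k)
systematic_sample = population[start::k]

# Stratified sample: split into strata, then sample within each stratum.
strata = {"first half": population[:50], "second half": population[50:]}
stratified_sample = [s for group in strata.values() for s in rng.sample(group, 5)]

# Cluster sample: split into clusters, then keep every member of a chosen cluster.
clusters = [population[i:i + 20] for i in range(0, 100, 20)]
cluster_sample = rng.choice(clusters)

print(random_sample, systematic_sample, stratified_sample, cluster_sample, sep="\n")
```

Note the structural difference: the stratified sample draws a few subjects from every subgroup, while the cluster sample takes all subjects from one subgroup.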

Problem 29. State which sampling method was used in the following:

a. Out of 10 hospitals in the municipality, a researcher selects one and collects records for a 24-hour period on the types of emergencies that were treated there.

b. A researcher divides a group of students according to gender; major field; and low, average, and high grade point average. Then she randomly selects six students from each group to answer questions in the survey.

c. The subscribers to a magazine are numbered. Then a sample of these people is selected using random numbers.

d. Every 10th bottle of Super-Duper Cola is selected, and the amount of liquid in the bottle is measured. The purpose is to see if the machines that fill the bottles are working properly.

Suggested Problems: 11-16


1.4 Experimental Design

Definition 30 (Observational Study). In an observational study, the researcher merely observes what is happening or what has happened in the past and tries to draw conclusions based on these observations.

Definition 31 (Experimental Study). In an experimental study, the researcher manipulates one of the variables and tries to determine how the manipulation influences other variables.

Definition 32 (Variable Types). The independent variable in an experimental study is the one that is being manipulated by the researcher. The independent variable is also called the explanatory variable. The resultant variable is called the dependent variable or the outcome variable.

Definition 33 (Confounding Variables). A confounding variable, sometimes called a nuisance variable, is one that influences the dependent (outcome) variable but was not separated from the independent variable.

Problem 34. Researchers randomly assigned 10 people to each of three different groups. Group 1 was instructed to write an essay about the hassles in their lives. Group 2 was instructed to write an essay about circumstances that made them feel thankful. Group 3 was asked to write an essay about events that they felt neutral about. After the exercise, they were given a questionnaire on their outlook on life. The researchers found that those who wrote about circumstances that made them feel thankful had a more optimistic outlook on life. The conclusion is that focusing on the positive makes you more optimistic about life in general. Based on this study, answer the following questions:

a. Was this an observational or an experimental study?

b. What is the independent variable?

c. What is the dependent variable?

d. What may be a confounding variable in this study?

e. What can you say about the sample size?

f. Do you agree with the conclusion? Please explain your answer.

Suggested Problems: 19-31, 37-42


Chapter 2

Frequency Distribution and Graphs

2.1 Organizing Data

Definition 35 (Raw Data). When the data are in original form, they are called raw data.

Definition 36 (Frequency Distribution). A frequency distribution is the organization of raw data in table form, using classes and frequencies.

Problem 37. Twenty-five army inductees were given a blood test to determine their blood type. The data set is:

A   B   B   AB  O
O   O   B   AB  B
B   B   O   A   O
A   O   O   O   AB
AB  A   O   B   A

Construct a frequency distribution for the data.
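For categorical data such as the blood types in Problem 37, a frequency distribution is just a tally of each category. A minimal sketch, assuming Python's standard `collections.Counter` is an acceptable stand-in for tallying by hand (the variable names are illustrative):

```python
from collections import Counter

rows = [
    "A B B AB O",
    "O O B AB B",
    "B B O A O",
    "A O O O AB",
    "AB A O B A",
]
blood_types = " ".join(rows).split()

frequency = Counter(blood_types)      # class -> frequency
total = len(blood_types)

print("Class  Frequency  Percent")
for blood_type in ("A", "B", "O", "AB"):
    count = frequency[blood_type]
    print(f"{blood_type:<5}  {count:>9}  {100 * count / total:>6.0f}%")
```

The percent column is optional; it is included because the frequencies should sum to the sample size and the percents to 100, which is a quick check on the tally.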


Definition 38 (Lower Class Limit). The lower class limit is the smallest data value that can be included in the class.

Definition 39 (Upper Class Limit). The upper class limit is the largest data value that can be included in the class.

Definition 40 (Class Boundaries). The class boundaries are ranges used to separate the classes so that there are no gaps in the frequency distribution.

Definition 41 (Class Width). The class width for a class in a frequency distribution is found by subtracting the lower (or upper) class limit of one class from the lower (or upper) class limit of the next class.

Definition 42 (Class Midpoint). The class midpoint $X_m$ is obtained by adding the lower and upper boundaries and dividing by two or adding the lower and upper limits and dividing by two.

Definition 43 (Open-Ended Distribution). An open-ended distribution is a frequency distribution where either the first class has no lower bound or the last class has no upper bound.

Problem 44. These data represent the record high temperatures in degrees Fahrenheit for each of the 50 states. Construct a grouped frequency distribution for the data, using 7 classes.

112 100 127 120 134 118 105 110 109 112
110 118 117 116 118 122 114 114 105 109
107 112 114 115 118 117 118 122 106 110
116 108 110 121 113 120 119 111 104 111
120 113 120 117 105 110 118 112 114 114
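The construction asked for in Problem 44 can be sketched in code. The recipe used here is an assumption, not something the notes prescribe: take the class width as the range divided by the number of classes, rounded up, and start the first class at the smallest data value. Variable names are illustrative.

```python
import math

temps = [
    112, 100, 127, 120, 134, 118, 105, 110, 109, 112,
    110, 118, 117, 116, 118, 122, 114, 114, 105, 109,
    107, 112, 114, 115, 118, 117, 118, 122, 106, 110,
    116, 108, 110, 121, 113, 120, 119, 111, 104, 111,
    120, 113, 120, 117, 105, 110, 118, 112, 114, 114,
]
num_classes = 7

# Class width: range / number of classes, rounded up (here 34 / 7 rounds up to 5).
class_width = math.ceil((max(temps) - min(temps)) / num_classes)

rows = []
for i in range(num_classes):
    lower_limit = min(temps) + i * class_width
    upper_limit = lower_limit + class_width - 1
    frequency = sum(lower_limit <= t <= upper_limit for t in temps)
    # Whole-number data, so boundaries sit half a unit outside the limits.
    rows.append((lower_limit, upper_limit, lower_limit - 0.5, upper_limit + 0.5, frequency))

print("Limits     Boundaries      Frequency")
for lo, hi, lo_b, hi_b, f in rows:
    print(f"{lo}-{hi}    {lo_b}-{hi_b}    {f}")
```

The resulting boundaries and frequencies agree with the table given later in Problem 54.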


Definition 45 (Cumulative Frequency Distribution). A cumulative frequency distribution is a distribution that shows the number of data values less than or equal to a specific value (usually an upper boundary).

Definition 46 (Ungrouped Frequency Distribution). An ungrouped frequency distribution is a distribution where the range of data values is relatively small; therefore, a frequency distribution can be constructed using single data values for each class.

Problem 47. The data shown here represent the number of miles per gallon (mpg) that 30 selected four-wheel-drive sports utility vehicles obtained in city driving. Construct a frequency distribution, and analyze the distribution.

12 17 12 14 16 18
16 18 12 16 17 15
15 16 12 15 16 16
12 14 15 12 15 15
19 13 16 18 16 14

Suggested Problems: 5-8, 16-26


2.2 Histograms, Frequency Polygons, and Ogives

Definition 48 (Histogram). A histogram is a graph that displays the data by using contiguous vertical bars (unless the frequency of a class is 0) of various heights to represent the frequencies of the classes.

Definition 49 (Frequency Polygon). The frequency polygon is a graph that displays the data by using lines that connect points plotted for the frequencies at the midpoints of the classes. The frequencies are represented by the heights of the points.

Definition 50 (Ogive). The ogive is a graph that represents the cumulative frequencies for the classes in a frequency distribution.

Definition 51 (Lower Class Limit). The lower class limit represents the smallest data value that can be included in the class.

Definition 52 (Upper Class Limit). The upper class limit represents the largest data value that can be included in the class.

Definition 53 (Class Midpoint). The class midpoint, $X_m$, is obtained by adding the lower and the upper boundaries and dividing by 2.

Problem 54. Construct a histogram, a frequency polygon, and an ogive to represent the data shown for the record high temperatures for each of the 50 states.

Class Boundaries    Frequency
99.5-104.5          2
104.5-109.5         8
109.5-114.5         18
114.5-119.5         13
119.5-124.5         7
124.5-129.5         1
129.5-134.5         1
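Before drawing the three graphs in Problem 54, it helps to compute the points each one plots: bar heights at the class boundaries for the histogram, (midpoint, frequency) pairs for the frequency polygon, and (upper boundary, cumulative frequency) pairs for the ogive. A minimal sketch with illustrative variable names; the actual plotting is left to paper or to whatever graphing tool is at hand.

```python
from itertools import accumulate

boundaries = [99.5, 104.5, 109.5, 114.5, 119.5, 124.5, 129.5, 134.5]
frequencies = [2, 8, 18, 13, 7, 1, 1]

# Frequency polygon: one point per class at the class midpoint X_m.
midpoints = [(lo + hi) / 2 for lo, hi in zip(boundaries, boundaries[1:])]
polygon_points = list(zip(midpoints, frequencies))

# Ogive: running totals plotted at the upper boundaries, starting from 0
# at the first lower boundary.
cumulative = list(accumulate(frequencies))
ogive_points = [(boundaries[0], 0)] + list(zip(boundaries[1:], cumulative))

print(polygon_points)
print(ogive_points)
```

The last cumulative frequency should equal the number of data values, which gives a simple check on the table.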


Definition 55 (Relative Frequency Graphs). Histograms, frequency polygons, and ogives can be constructed from raw data as done above or by using proportions. When proportions are used instead of raw data, the graphs are called relative frequency graphs.

Problem 56. Construct a histogram, frequency polygon, and ogive using relative frequencies for the distribution of miles that 20 randomly selected runners ran during a given week.

Class Boundaries    Frequency
5.5-10.5            1
10.5-15.5           2
15.5-20.5           3
20.5-25.5           5
25.5-30.5           4
30.5-35.5           3
35.5-40.5           2
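The proportions mentioned in Definition 55 come from dividing each class frequency by the total number of data values. The sketch below does this for the runners' data in Problem 56; the variable names are illustrative.

```python
from itertools import accumulate

frequencies = [1, 2, 3, 5, 4, 3, 2]
n = sum(frequencies)                                   # 20 runners

# Relative frequency of each class: f / n
relative = [f / n for f in frequencies]

# Cumulative relative frequency: running total of f, divided by n
cumulative_relative = [c / n for c in accumulate(frequencies)]

for rel, cum in zip(relative, cumulative_relative):
    print(f"{rel:.2f}  {cum:.2f}")
```

The relative frequencies sum to 1, so the relative frequency ogive always ends at a height of 1.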

Suggested Problems: 1-11


2.3 Other Types of Graphs

Definition 57 (Bar Graph). A bar graph represents the data by using vertical or horizontal bars whose heights or lengths represent the frequencies of the data.

Problem 58. The table shows the average money spent by first-year college students. Draw a horizontal and vertical bar graph for the data.

Electronics    $728
Dorm Decor     $344
Clothing       $141
Shoes          $72


Definition 59 (Pareto Chart). A Pareto chart is used to represent a frequency distribution for a categorical variable, and the frequencies are displayed by the heights of vertical bars, which are arranged in order from highest to lowest.

Problem 60. The data shown here consist of the number of police calls for specific categories that a local municipality received in 2011. Draw and analyze a Pareto chart for the data.

Category                       Number
Juvenile complaint             92
Loud noise/music/party         27
Drug offenses                  79
Driving under the influence    38
Disabled vehicle               65
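The only computation a Pareto chart needs is the ordering of the bars from highest to lowest frequency. A minimal sketch for the data in Problem 60 (the text-based bars are just an illustrative stand-in for a drawn chart):

```python
calls = {
    "Juvenile complaint": 92,
    "Loud noise/music/party": 27,
    "Drug offenses": 79,
    "Driving under the influence": 38,
    "Disabled vehicle": 65,
}

# Arrange categories from highest to lowest frequency.
pareto_order = sorted(calls.items(), key=lambda item: item[1], reverse=True)

for category, number in pareto_order:
    # One '#' per 5 calls gives a rough horizontal picture of the bar heights.
    print(f"{category:<28} {number:>3} {'#' * (number // 5)}")
```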


Definition 61 (Time Series Graph). A time series graph represents data that occur over a specific period of time.

Definition 62 (Compound Time Series Graph). When two or more data sets are compared on the same graph, the graph is called a compound time series graph.

Problem 63. The data show the percentage of U.S. adults who smoke. Draw and analyze a time series graph for the data.

Year       1970  1980  1990  2000  2010
Percent    37    33    25    23    19


Definition 64 (Pie Graph). A pie graph is a circle that is divided into sections or wedges according to the percentage of frequencies in each category of the distribution.

Problem 65. This frequency distribution shows the number of pounds of each snack food eaten during the Super Bowl. Construct a pie graph for the data.

Snack             Pounds (frequency)
Potato Chips      11.2 million
Tortilla Chips    8.2 million
Pretzels          4.3 million
Popcorn           3.8 million
Snack Nuts        2.5 million
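Each wedge of a pie graph takes the same share of the 360 degrees in a circle as its category takes of the total frequency. A minimal sketch of that arithmetic for Problem 65, with illustrative variable names:

```python
snacks = {                      # pounds eaten, in millions
    "Potato Chips": 11.2,
    "Tortilla Chips": 8.2,
    "Pretzels": 4.3,
    "Popcorn": 3.8,
    "Snack Nuts": 2.5,
}
total = sum(snacks.values())

# For each category: (percent of the whole, central angle in degrees)
slices = {
    name: (100 * pounds / total, 360 * pounds / total)
    for name, pounds in snacks.items()
}

for name, (percent, degrees) in slices.items():
    print(f"{name:<15} {percent:5.1f}%  {degrees:6.1f} degrees")
```

The percents should sum to 100 and the angles to 360, apart from rounding.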


Definition 66 (Dotplot). A dotplot is a statistical graph in which each data value is plotted as a point (dot) above the horizontal axis.

Problem 67. The data show the number of named storms each year for the last 40 years. Construct and analyze a dotplot for the data.

19 15 14  7  6 11 11
 9 16  8  8 11  9  8
16 12 13 14 13 12  7
15 15 19 11  4  6 13
10 15  7 12  6 10
28 12  8  7 12  9


Definition 68 (Stem and Leaf Plot). A stem and leaf plot is a data plot that uses part of the data values as the stem and part of the data values as the leaf to form groups or classes.

Problem 69. At an outpatient testing center, the number of cardiograms performed each day for 20 days is shown. Construct a stem and leaf plot for the data.

25 31 20 32 13
14 43 02 57 23
36 32 33 32 44
32 52 44 51 45

Problem 70. The number of stories in two selected samples of tall buildings in Atlanta and Philadelphia is shown. Construct a back to back stem and leaf plot and compare the distributions.

Atlanta           Philadelphia
55 70 44 36 40    61 40 38 32 30
63 40 44 34 38    58 40 40 25 30
60 47 52 32 32    54 40 36 30 30
50 53 32 28 31    53 39 36 34 33
52 32 34 32 50    50 38 36 39 32
26 29
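For two-digit data such as the cardiogram counts in Problem 69, the tens digit serves as the stem and the ones digit as the leaf. The sketch below groups the sorted data that way; the variable names are illustrative, and the same grouping applied to each city separately gives the two sides of a back to back plot.

```python
from collections import defaultdict

cardiograms = [
    25, 31, 20, 32, 13,
    14, 43, 2, 57, 23,
    36, 32, 33, 32, 44,
    32, 52, 44, 51, 45,
]

plot = defaultdict(list)
for value in sorted(cardiograms):
    stem, leaf = divmod(value, 10)    # tens digit, ones digit
    plot[stem].append(leaf)

for stem in sorted(plot):
    leaves = " ".join(str(leaf) for leaf in plot[stem])
    print(f"{stem} | {leaves}")
```

Sorting first means the leaves on each stem come out in increasing order, which makes the shape of the distribution easier to read.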

Suggested Problems: 1-20


Chapter 3

Data Description

3.1 Measures of Central Tendencies

Definition 71 (Statistic). A statistic is a characteristic or measure obtained by using the data values from a sample.

Definition 72 (Parameter). A parameter is a characteristic or measure obtained by using all the data values from a specific population.

Definition 73 (Mean). The mean is the sum of the values, divided by the total number of values. This is also called the arithmetic average.

Definition 74 (Sample Mean). Let n denote the number of data points. The sample mean, $\bar{X}$, is calculated by using the sample data. The sample mean is a statistic.

\[ \bar{X} = \frac{\sum_{i=1}^{n} X_i}{n}. \]

In a frequency distribution, the sample mean can be found by adding up the product of the frequency of each group, $f_i$, with the midpoint of that group, $X_m$, and then dividing by the number of participants. That is to say,

\[ \bar{X} = \frac{\sum_{i=1}^{\#\text{groups}} f_i \cdot X_m}{n}. \]

Definition 75 (Population Mean). Let N denote the total number of values in the population. The population mean, µ, is calculated by using all of the values in the population. The population mean is a parameter.

\[ \mu = \frac{\sum_{i=1}^{N} X_i}{N}. \]


Problem 76. The number of calls that a local police department responded to for a sample of 9 months is shown. Find the mean.

475, 447, 440, 761, 993, 1052, 783, 671, 621.

Problem 77. The data show the number of patients in a sample of six hospitals who acquired an infection while hospitalized. Find the mean.

110, 76, 29, 38, 105, 31.

Problem 78. Using the following frequency distribution, find the mean. The data represent the number of miles run during one week for a sample of 20 runners.

Class Boundaries    Frequency
5.5-10.5            1
10.5-15.5           2
15.5-20.5           3
20.5-25.5           5
25.5-30.5           4
30.5-35.5           3
35.5-40.5           2
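Both forms of the sample mean in Definition 74 can be checked numerically: the raw-data form on Problem 76 and the grouped form on Problem 78. This is a sketch with illustrative variable names. The grouped result is an approximation of the true mean, since every value in a class is treated as if it sat at the class midpoint.

```python
# Raw data (Problem 76): sum of the values divided by the number of values.
calls = [475, 447, 440, 761, 993, 1052, 783, 671, 621]
mean_calls = sum(calls) / len(calls)

# Grouped data (Problem 78): sum of frequency * midpoint, divided by n.
class_boundaries = [
    (5.5, 10.5), (10.5, 15.5), (15.5, 20.5), (20.5, 25.5),
    (25.5, 30.5), (30.5, 35.5), (35.5, 40.5),
]
frequencies = [1, 2, 3, 5, 4, 3, 2]

midpoints = [(lower + upper) / 2 for lower, upper in class_boundaries]
n = sum(frequencies)
grouped_mean = sum(f * xm for f, xm in zip(frequencies, midpoints)) / n

print(f"Mean number of calls: {mean_calls:.1f}")
print(f"Grouped mean miles:   {grouped_mean:.1f}")
```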


Definition 79 (Median). The median, MD, is the midpoint of the data array. Let n be the number of data points. If n is odd, select the middle data value as the median. If n is even, find the mean of the two middle values.

Problem 80. The number of police officers killed in the line of duty over the last 11 years is shown. Find the median.

177, 153, 122, 141, 189, 155, 162, 165, 149, 157, 240.

Problem 81. The number of tornadoes that have occurred in the U.S. over an 8-year period follows. Find the median.

684, 764, 656, 702, 856, 1133, 1132, 1303.
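The rule in Definition 79 depends on first sorting the data into an array. The sketch below writes the rule out by hand and compares it with `statistics.median` from the Python standard library; the function name `median_by_rule` is an illustrative assumption.

```python
from statistics import median


def median_by_rule(values):
    """Median via Definition 79: sort, then take the middle value(s)."""
    ordered = sorted(values)
    n = len(ordered)
    middle = n // 2
    if n % 2 == 1:
        return ordered[middle]                           # odd n: the middle value
    return (ordered[middle - 1] + ordered[middle]) / 2   # even n: mean of the two middle values


officers = [177, 153, 122, 141, 189, 155, 162, 165, 149, 157, 240]   # n = 11 (odd)
tornadoes = [684, 764, 656, 702, 856, 1133, 1132, 1303]              # n = 8 (even)

print(median_by_rule(officers), median(officers))
print(median_by_rule(tornadoes), median(tornadoes))
```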


Definition 82 (Mode). The value that occurs most often in a data set is called the mode. A data set that only has one value that occurs with the greatest frequency is called unimodal, a data set that has two values that occur with the same greatest frequency is called bimodal, and a data set that has more than two values that occur with the same greatest frequency is called multimodal. A data set where every value occurs only once is said to have no mode.

Problem 83. Find the mode of the signing bonuses of eight NFL players for a specific year. The bonuses are in millions of dollars.

18.0, 14.0, 34.5, 10.0, 11.3, 10.0, 12.4, 10.0.

Problem 84. The data show the number of licensed nuclear reactors in the U.S. for a recent 15-year period. Find the mode.

104 104 104 104 104
107 109 109 109 110
109 111 112 111 109

Problem 85. The number of accidental deaths due to firearms for a six-year period is shown. Find the mode.

649, 789, 642, 613, 610, 600.
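Problems 83 through 85 illustrate the unimodal, bimodal, and no-mode cases of Definition 82. The sketch below counts occurrences and returns every value tied for the greatest frequency, with an empty list standing for "no mode". The function name `modes` and that return convention are assumptions made for illustration.

```python
from collections import Counter


def modes(values):
    """Return the value(s) with the greatest frequency, or [] if every value occurs once."""
    counts = Counter(values)
    greatest = max(counts.values())
    if greatest == 1:
        return []                      # every value occurs only once: no mode
    return sorted(v for v, c in counts.items() if c == greatest)


bonuses = [18.0, 14.0, 34.5, 10.0, 11.3, 10.0, 12.4, 10.0]
reactors = [104, 104, 104, 104, 104, 107, 109, 109, 109, 110, 109, 111, 112, 111, 109]
firearm_deaths = [649, 789, 642, 613, 610, 600]

print(modes(bonuses))          # one mode
print(modes(reactors))         # two modes
print(modes(firearm_deaths))   # no mode
```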


Definition 86 (Modal Class). The mode for grouped data is the modal class. The modal class is the class with the largest frequency.

Problem 87. Find the modal class for the frequency distribution of miles that 20 runners ran in one week.

Class Boundaries    Frequency
5.5-10.5            1
10.5-15.5           2
15.5-20.5           3
20.5-25.5           5
25.5-30.5           4
30.5-35.5           3
35.5-40.5           2

Problem 88. A small company consists of the owner, the manager, the salesperson, and two technicians, all of whose annual salaries are listed here. Find the mean, median, and mode.

Staff          Salary
Owner          $100,000
Manager        $40,000
Salesperson    $24,000
Technician     $18,000
Technician     $18,000


Definition 89 (Midrange). The midrange is defined as the sum of the lowest and highest values in the data set, divided by 2. The symbol MR is used for the midrange.

\[ \mathrm{MR} = \frac{\text{lowest value} + \text{highest value}}{2}. \]

Problem 90. The number of bank failures for a recent five-year period is shown. Find the midrange.

3, 30, 148, 157, 71.

Problem 91. Find the midrange of data for NFL signing bonuses. The bonuses in millions of dollars are

18.0, 14.0, 34.5, 10.0, 11.3, 10.0, 12.4, 10.0.
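The midrange formula in Definition 89 uses only the two extreme values, so it is a one-line computation. A minimal sketch on the data from Problems 90 and 91 (the function name is illustrative):

```python
def midrange(values):
    """MR = (lowest value + highest value) / 2."""
    return (min(values) + max(values)) / 2


bank_failures = [3, 30, 148, 157, 71]
bonuses = [18.0, 14.0, 34.5, 10.0, 11.3, 10.0, 12.4, 10.0]

print(midrange(bank_failures))
print(midrange(bonuses))
```

Because only the extremes enter the formula, a single unusually large or small value can move the midrange a long way.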


Definition 92 (Weighted Mean). Let $w_1, w_2, \ldots, w_n$ be weights for the given values $X_1, X_2, \ldots, X_n$. The weighted mean of a variable X is found by multiplying each value by its corresponding weight and dividing the sum of the products by the sum of the weights.

\[ \bar{X} = \frac{\sum_{i=1}^{n} w_i X_i}{\sum_{i=1}^{n} w_i}. \]

Problem 93. A student received an A in English Composition I (3 credits), a C in Introduction to Psychology (3 credits), a B in Biology I (4 credits), and a D in Physical Education (2 credits). Assuming that an A is 4 grade points, a B is 3 grade points, a C is 2 grade points, a D is 1 grade point, and an F is 0 grade points, find the student's grade point average.
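A grade point average is a weighted mean in the sense of Definition 92, with the credits acting as the weights $w_i$ and the grade points as the values $X_i$. A minimal sketch of the computation for Problem 93, using illustrative variable names:

```python
grade_points = {"A": 4, "B": 3, "C": 2, "D": 1, "F": 0}

# (course, letter grade, credits); the credits are the weights
courses = [
    ("English Composition I", "A", 3),
    ("Introduction to Psychology", "C", 3),
    ("Biology I", "B", 4),
    ("Physical Education", "D", 2),
]

weighted_sum = sum(credits * grade_points[grade] for _, grade, credits in courses)
total_credits = sum(credits for _, _, credits in courses)
gpa = weighted_sum / total_credits

print(f"{weighted_sum} grade points over {total_credits} credits: GPA = {gpa:.2f}")
```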

Suggested Problems: 1-11, 23-28


3.2 Measures of Variation

Problem 94. A testing lab wishes to test two experimental brands of outdoor paint to see how long each will last before fading. The testing lab makes 6 gallons of each paint to test. Since different chemical agents are added to each group and only six cans are involved, these two groups constitute two small populations. The results (in months) are shown. Find the mean of each group. Then analyze the data with a dot plot.

Brand A   Brand B
10        35
60        45
50        30
30        35
40        40
20        25

Definition 95 (Range). The range is the highest value minus the lowest value. The symbol \(R\) is used for the range.

\[ R = \text{highest value} - \text{lowest value}. \]

Problem 96. Find the range for the paints in Problem 94.


Definition 97 (Population Variance). Let \(X\) denote an individual value, \(\mu\) denote the population mean, and \(N\) denote the population size. The population variance is the average of the squares of the distances of the values from the mean. It summarizes how far the data points lie from the mean. The symbol for the population variance is \(\sigma^2\).

\[ \sigma^2 = \frac{\sum_{i=1}^{N} (X_i - \mu)^2}{N}. \]

Definition 98 (Population Standard Deviation). The population standard deviation is the square root of the variance. The standard deviation measures the amount of variation or dispersion of a set of data values. The symbol for the population standard deviation is \(\sigma\).

\[ \sigma = \sqrt{\sigma^2} = \sqrt{\frac{\sum_{i=1}^{N} (X_i - \mu)^2}{N}}. \]
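The two population formulas above can be sketched in Python as follows. The function names and the data set are hypothetical and chosen only so that the arithmetic is easy to follow by hand.

```python
import math


def population_variance(data):
    """Average of the squared distances from the population mean."""
    mu = sum(data) / len(data)
    return sum((x - mu) ** 2 for x in data) / len(data)


def population_std(data):
    """Square root of the population variance."""
    return math.sqrt(population_variance(data))


# Illustrative values with mean 5; squared distances sum to 32.
data = [2, 4, 4, 4, 5, 5, 7, 9]
print(population_variance(data))  # 32 / 8 = 4.0
print(population_std(data))       # 2.0
```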

Note 99. When the means are equal, the larger the variance or standard deviation is, the more variable the data are.

Problem 100. Find the variance and the standard deviation for the data set for brand A paint in Problem 94. The number of months brand A lasted before fading was

10, 60, 50, 30, 40, 20.

Problem 101. Find the variance and the standard deviation for the data set for brand B paint in Problem 94. The number of months brand B lasted before fading was

35, 45, 35, 30, 40, 25.


Definition 102 (Sample Variance). Let \(X\) denote an individual value, \(\bar{X}\) denote the sample mean, and \(n\) denote the sample size. The formula for sample variance is

\[ s^2 = \frac{\sum \left(X - \bar{X}\right)^2}{n - 1}. \]

Definition 103 (Sample Standard Deviation). Let \(X\) denote an individual value, \(\bar{X}\) denote the sample mean, and \(n\) denote the sample size. The formula for sample standard deviation is

\[ s = \sqrt{\frac{\sum \left(X - \bar{X}\right)^2}{n - 1}}. \]

Problem 104. The number of public school teacher strikes in Pennsylvania for a random sample of school years is shown. Find the sample variance and the sample standard deviation.

9, 10, 14, 7, 8, 3.


Note 105. The shortcut formulas for computing the variance and standard deviation for data obtained from samples are as follows.

\[ s^2 = \frac{n\left(\sum X^2\right) - \left(\sum X\right)^2}{n(n-1)}, \qquad s = \sqrt{\frac{n\left(\sum X^2\right) - \left(\sum X\right)^2}{n(n-1)}}. \]
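To see that the shortcut formula agrees with the definition of the sample variance, the sketch below computes both on a small made-up data set. The function names are hypothetical, and the data are not from the problems in these notes.

```python
import math


def sample_variance(data):
    """Definitional form: sum of squared deviations divided by n - 1."""
    n = len(data)
    mean = sum(data) / n
    return sum((x - mean) ** 2 for x in data) / (n - 1)


def sample_variance_shortcut(data):
    """Shortcut form: (n * sum(X^2) - (sum X)^2) / (n * (n - 1))."""
    n = len(data)
    sum_x = sum(data)
    sum_x2 = sum(x * x for x in data)
    return (n * sum_x2 - sum_x ** 2) / (n * (n - 1))


data = [1, 2, 3, 4, 5]  # illustrative sample
print(sample_variance(data))           # 10 / 4 = 2.5
print(sample_variance_shortcut(data))  # (275 - 225) / 20 = 2.5
print(math.sqrt(sample_variance_shortcut(data)))  # sample standard deviation
```

Both forms give the same value in exact arithmetic; the shortcut only avoids computing the mean first.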

Problem 106. The number of public school teacher strikes in Pennsylvania for a random sample of school years is shown. Use the shortcut formulas to find the sample variance and the sample standard deviation.

9, 10, 14, 7, 8, 3.


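Definition 107 (Grouped Data Variance and Standard Deviation). Let \(X_m\) denote the class midpoint, \(f\) the class frequency, and \(n = \sum f\) the total number of values. Then the sample variance for grouped data is

\[ s^2 = \frac{n\left(\sum f \cdot X_m^2\right) - \left(\sum f \cdot X_m\right)^2}{n(n-1)} \]

and the sample standard deviation for grouped data is

\[ s = \sqrt{\frac{n\left(\sum f \cdot X_m^2\right) - \left(\sum f \cdot X_m\right)^2}{n(n-1)}}. \]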
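A minimal sketch of the grouped-data formula follows. It assumes that n is the sum of the class frequencies; the function name, midpoints, and frequencies are invented for illustration.

```python
import math


def grouped_sample_variance(midpoints, freqs):
    """Sample variance from class midpoints and class frequencies."""
    n = sum(freqs)  # assumption: n is the total frequency
    sum_fx = sum(f * x for f, x in zip(freqs, midpoints))
    sum_fx2 = sum(f * x * x for f, x in zip(freqs, midpoints))
    return (n * sum_fx2 - sum_fx ** 2) / (n * (n - 1))


# Illustrative classes: midpoints 2, 4, 6 with frequencies 1, 2, 1.
midpoints = [2, 4, 6]
freqs = [1, 2, 1]
variance = grouped_sample_variance(midpoints, freqs)
print(variance)             # (4 * 72 - 16**2) / 12 = 32 / 12
print(math.sqrt(variance))  # grouped sample standard deviation
```

With these numbers the result matches the ordinary sample variance of the list 2, 4, 4, 6, which is 8/3.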

Problem 108. Find the sample variance and the sample standard deviation for the frequency distribution for the following data obtained from the number of miles that 20 runners ran during one week.

Class Boundaries   Frequency
5.5-10.5           1
10.5-15.5          2
15.5-20.5          3
20.5-25.5          5
25.5-30.5          4
30.5-35.5          3
35.5-40.5          2


Definition 109 (Coefficient of Variation). The coefficient of variation is the standard deviation divided by the mean. The result is expressed as a percentage. This statistic allows you to compare standard deviations when the units are different. The coefficient of variation for sample data is

\[ CVar = \frac{s}{\bar{X}} \cdot 100 \]

and the coefficient of variation for a population is

\[ CVar = \frac{\sigma}{\mu} \cdot 100. \]
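The statistic defined above (standard deviation divided by the mean, times 100) can be sketched as a one-line Python function. The function name and the two pairs of numbers are hypothetical and serve only to show how two variables in different units can be compared.

```python
def coefficient_of_variation(std_dev, mean):
    """Standard deviation as a percentage of the mean."""
    if mean == 0:
        raise ValueError("mean must be nonzero")
    return std_dev / mean * 100


# Illustrative values: one variable with mean 16 and standard deviation 4,
# another (in different units) with mean 24 and standard deviation 3.
print(coefficient_of_variation(4, 16))  # 25.0
print(coefficient_of_variation(3, 24))  # 12.5
```

Here the first variable is relatively more variable, even though the two standard deviations are close in size.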

Problem 110. The mean of the number of sales of cars over a 3-month period is 87, and the standard deviation is 5. The mean of the commissions is $5,225, and the standard deviation is $773. Compare the variations of the two.


Theorem 111 (Chebyshev). The proportion of values from a data set that will fall within \(k\) standard deviations of the mean will be at least \(1 - \frac{1}{k^2}\), where \(k\) is a number greater than 1 (not necessarily an integer).
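As a quick illustration of the bound in Chebyshev's theorem, the sketch below evaluates 1 - 1/k^2 for a few values of k. The function name `chebyshev_lower_bound` is hypothetical.

```python
def chebyshev_lower_bound(k):
    """Minimum proportion of values within k standard deviations of the mean."""
    if k <= 1:
        raise ValueError("k must be greater than 1")
    return 1 - 1 / k ** 2


for k in (1.5, 2, 3):
    print(k, chebyshev_lower_bound(k))
```

For k = 2 the bound is 0.75, so at least three quarters of the values lie within two standard deviations of the mean, whatever the shape of the distribution.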

Problem 112. The mean price of houses in a certain neighborhood is $50,000, and the standard deviation is $10,000. Find the price range for which at least 75% of the houses will sell.

Problem 113. A survey of local companies found that the mean amount of travel allowance for couriers was $0.25 per mile. The standard deviation was $0.02. Find the minimum percentage of the data values that will fall between $0.20 and $0.30.

Suggested Problems: 1-20, 31-35


3.3 Measures of Position

Definition 114 (standard score or z-score). A standard score or z-score for a value is obtained by subtracting the mean from the value and dividing the result by the standard deviation.

\[ z = \frac{\text{value} - \text{mean}}{\text{standard deviation}}. \]

For samples, the formula is

\[ z = \frac{X - \bar{X}}{s}. \]

For populations, the formula is

\[ z = \frac{X - \mu}{\sigma}. \]

The z-score represents the number of standard deviations that a data value falls above or below the mean.
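A minimal sketch of the z-score computation is shown below. The function name and the numbers are made up for illustration; the same function covers the sample and population forms because only the symbols for the mean and standard deviation differ.

```python
def z_score(value, mean, std_dev):
    """Number of standard deviations that value lies above (+) or below (-) the mean."""
    if std_dev <= 0:
        raise ValueError("standard deviation must be positive")
    return (value - mean) / std_dev


# Illustrative test with mean 70 and standard deviation 10.
print(z_score(85, 70, 10))  # 1.5 standard deviations above the mean
print(z_score(55, 70, 10))  # -1.5, i.e. 1.5 standard deviations below the mean
```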

Problem 115. A student scored 65 on a calculus test that had a mean of 50 and a standard deviation of 10; she scored 30 on a history test with a mean of 25 and a standard deviation of 5. Compare her relative positions on the two tests.

Problem 116. Find the z-score for each test, and state which is higher.

Test A X = 40 s = 5

Test B X = 100 s = 10


Definition 117 (Percentiles). Percentiles divide the data set into 100 equal groups.

Problem 118. The frequency distribution for the systolic blood pressure readings (in millimeters of mercury, mmHg) of 200 randomly selected college students is shown here. Construct a percentile graph.

Class Boundary   Frequency   Cumulative Frequency   Cumulative Percentile
89.0-104.5       24
104.5-119.5      62
119.5-134.5      72
134.5-149.5      26
149.5-164.5      12
164.5-179.5      4


Definition 119 (Percentile Rank Formula). The percentile corresponding to a given value \(X\) is computed by using the formula

\[ \text{Percentile Rank} = \frac{(\text{number of values below } X) + 0.5}{\text{total number of values}} \cdot 100. \]
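The percentile rank formula above can be sketched in Python as follows. The function name and the scores are hypothetical and are not the data of the problems in these notes.

```python
def percentile_rank(data, x):
    """((number of values below x) + 0.5) / (total number of values) * 100."""
    below = sum(1 for value in data if value < x)
    return (below + 0.5) / len(data) * 100


# Illustrative scores: four of the eight values are below 5.
scores = [1, 2, 3, 4, 5, 6, 7, 8]
print(percentile_rank(scores, 5))  # (4 + 0.5) / 8 * 100 = 56.25
```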

Problem 120. A teacher gives a 20-point test to 10 students. The scores are shown here. Find the percentile rank of a score of 12.

18, 15, 12, 6, 8, 2, 3, 5, 20, 10.

Problem 121. Using the data in Problem 120, find the percentile rank for a score of 6.


Problem 122. Using the scores in Problem 120, find the value corresponding to the 25th percentile.

Problem 123. Using the data set in Problem 120, find the value that corresponds to the 60th percentile.

Definition 124 (Quartile). Quartiles divide the distribution into four equal groups, separated by \(Q_1\), \(Q_2\), and \(Q_3\).

Problem 125. Find \(Q_1\), \(Q_2\), and \(Q_3\) for the data set

15, 13, 6, 5, 12, 50, 22, 18.


Definition 126 (Interquartile Range). The interquartile range (IQR) is the difference between the third and the first quartiles.

\[ IQR = Q_3 - Q_1. \]

Problem 127. Find the interquartile range for the data set in Problem 125.

Definition 128 (Outlier). An outlier is an extremely high or an extremely low data value when compared with the rest of the data values.

Problem 129. Check the following data set for outliers.

5, 6, 12, 13, 15, 18, 22, 50

Suggested Problems: 1− 20.


3.4 Exploratory Data Analysis

Definition 130 (Boxplot). A boxplot is a graph of a data set obtained by drawing a horizontal line from the minimum data value to \(Q_1\), drawing a horizontal line from \(Q_3\) to the maximum data value, and drawing a box whose vertical sides pass through \(Q_1\) and \(Q_3\) with a vertical line inside the box passing through the median or \(Q_2\).

Problem 131. The number of meteorites found in 10 states of the United States is

89, 47, 164, 296, 30, 215, 138, 78, 48, 39.

Construct a boxplot for the data.


Problem 132. A dietitian is interested in comparing the sodium content of real cheese with the sodium content of a cheese substitute. The data for two random samples are shown. Compare the distributions, using boxplots.

Real Cheese            Cheese Substitute
310  420   45   40     270  180  250  290
220  240  180   90     130  260  340  310

Suggested Problems: 1− 15


Chapter 4

Probability and Counting Rules

4.1 Sample Spaces and Probability

Definition 133 (Probability Experiment). A probability experiment is a chance process that leads to well-defined results called outcomes.

Definition 134 (Outcome). An outcome is the result of a single trial of a probability experiment.

Definition 135 (Sample Space). A sample space is the set of all possible outcomes of a probability experiment.

Problem 136. Find the sample space for rolling two dice.

Problem 137. Find the sample space for drawing one card from an ordinary deck of cards.


Definition 138 (Tree Diagram). A tree diagram is a device consisting of line segments emanating from a starting point and also from an outcome point. It is used to determine all possible outcomes of a probability experiment.

Problem 139. Use a tree diagram to find the sample space for the gender of three children in a family.


Definition 140 (Event). An event consists of a set of outcomes of a probability experiment.

Definition 141 (Equally Likely Events). Equally likely events are events that have the same probability of occurring.

Definition 142 (Cardinality). The cardinality of an event or a sample space is the size of that event or sample space. That is to say that it is the number of things in that event or sample space. Let \(E\) be an event. The cardinality of an event is denoted by \(|E|\).

Formula 143 (Classical Probability). The probability of any event \(E\) happening in a sample space \(S\) is

\[ \Pr(E) = P(E) = \frac{|E|}{|S|}. \]
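As a small illustration of the classical probability formula, the sketch below enumerates a sample space with Python's `itertools.product` and counts the outcomes in an event. It assumes that every outcome is equally likely; the chosen event (a sum of 7 on two dice) is only an example.

```python
from fractions import Fraction
from itertools import product

# Sample space S: all ordered pairs (first die, second die).
sample_space = list(product(range(1, 7), repeat=2))

# Event E: the two dice sum to 7.
event = [outcome for outcome in sample_space if sum(outcome) == 7]

probability = Fraction(len(event), len(sample_space))  # |E| / |S|
print(len(sample_space))  # 36
print(len(event))         # 6
print(probability)        # 1/6
```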

Problem 144. Find the probability of getting a red face card (Jack, Queen, or King) when randomly drawing a card from an ordinary deck.

Problem 145. If a family has three children, find the probability that exactly two of the three children are girls.


True Fact 146 (Rules of Probability). The rules of probability are as follows:

• Let E denote an event. Then 0 ≤ Pr(E) ≤ 1.

• The sum of the probabilities of all of the outcomes in a sample space is 1.

• If an event E cannot occur, Pr(E) = 0.

• If an event E is certain, then Pr(E) = 1.

Problem 147. When a single die is rolled, find the probability of getting a 9.

Problem 148. When a single die is rolled, what is the probability of getting a number less than 7?


Definition 149 (Event Complement). The complement of an event \(E\) is the set of outcomes in the sample space that are not included in the outcomes of event \(E\). The complement of \(E\) is denoted by \(\overline{E}\).

Formula 150 (Rule for Event Complement).

\[ \Pr\left(\overline{E}\right) = 1 - \Pr(E). \]

Problem 151. In a study, it was found that 23% of the people surveyed said that vanilla was their favorite flavor of ice cream. If a person is selected randomly, find the probability that the person’s favorite ice cream is not vanilla.


Definition 152 (Empirical Probability). Empirical probability relies on actual experience to determine the likelihood of an outcome, as opposed to classical probability, which assumes that certain outcomes are equally likely.

Formula 153 (Rule for Empirical Probability). Given a frequency distribution, let \(f\) denote the frequency for a class and \(n\) denote the total of the frequencies in the distribution. The empirical probability (which is based on observation) for a given class is

\[ \Pr(E) = \frac{f}{n}. \]

Problem 154. A researcher for the American Automobile Association (AAA) asked 50 people who plan to travel over the Thanksgiving holiday how they will get to their destination. Find the probability that a person will travel by airplane over the Thanksgiving break.

Method         Frequency
Drive          41
Fly            6
Train or bus   3

Problem 155. In a sample of 50 people, 21 had type O blood, 22 had type A blood, 5 had type B blood, and 2 had type AB blood. Set up a frequency distribution and find the following probabilities.

a. A person has type O blood

b. A person has type A or B blood

c. A person has neither type A nor type O blood

d. A person does not have type AB blood.

Suggested Problems: 1− 25


4.2 The Addition Rules for Probability

Definition 156 (Mutually Exclusive). Two events are mutually exclusive events or disjoint events if they cannot occur at the same time (i.e., they have no outcomes in common).

Formula 157 (Addition of Mutually Exclusive Events). When two events \(A\) and \(B\) are mutually exclusive, the probability that \(A\) or \(B\) will occur is

\[ \Pr(A \text{ or } B) = \Pr(A) + \Pr(B). \]

Problem 158. A city has 9 coffee shops: 3 Starbucks, 2 Caribou Coffees, and 4 Crazy Mocho Coffees. If a person selects one shop at random to buy a cup of coffee, find the probability that it is either a Starbucks or a Crazy Mocho Coffee.

Problem 159. The corporate research and development centers for three local companies have the following number of employees:

U.S. Steel               110
Alcoa                    750
Bayer Material Science   250

If a research employee is selected at random, find the probability that the employee is employed by U.S. Steel or Alcoa.


Formula 160 (Addition Rule for Non-Mutually Exclusive Events). If \(A\) and \(B\) are not mutually exclusive, then

\[ \Pr(A \text{ or } B) = \Pr(A) + \Pr(B) - \Pr(A \text{ and } B). \]
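The addition rule can be checked by direct counting. The sketch below uses a single fair die with two overlapping events chosen purely for illustration (A: the roll is even, B: the roll is greater than 3); the helper `pr` is a hypothetical name.

```python
from fractions import Fraction

sample_space = set(range(1, 7))  # one fair die
A = {x for x in sample_space if x % 2 == 0}  # {2, 4, 6}
B = {x for x in sample_space if x > 3}       # {4, 5, 6}


def pr(event):
    """Classical probability of an event within sample_space."""
    return Fraction(len(event), len(sample_space))


by_counting = pr(A | B)                # count the outcomes in "A or B" directly
by_rule = pr(A) + pr(B) - pr(A & B)    # addition rule
print(by_counting, by_rule)            # 2/3 2/3
```

Subtracting Pr(A and B) removes the outcomes 4 and 6, which would otherwise be counted twice.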

Problem 161. A single card is drawn at random from an ordinary deck of cards. Find the probability that it is either an ace or a black card.

Problem 162. In a hospital unit there are 8 nurses and 5 physicians; 7 nurses and 3 physicians are females. If a staff person is selected, find the probability that the subject is a nurse or a male.

Problem 163. On New Year’s Eve, the probability of a person driving while intoxicated is 0.32, the probability of a person having a driving accident is 0.09, and the probability of a person having a driving accident while intoxicated is 0.06. What is the probability of a person driving while intoxicated or having a driving accident?


4.3 The Multiplication Rules & Conditional Probability

Definition 164 (Independent Events). Two events \(A\) and \(B\) are independent events if the fact that \(A\) occurs does not affect the probability of \(B\) occurring.

Formula 165 (Independent Multiplication Formula). When two events are independent, the probability of both occurring is

\[ \Pr(A \text{ and } B) = \Pr(A) \cdot \Pr(B). \]

Problem 166. A coin is flipped and a die is rolled. Find the probability of getting a head on the coin and a 4 on the die.

Problem 167. A card is drawn from a deck and replaced; then a second card is drawn. Find the probability of getting a queen and then an ace.


Problem 168. An urn contains 3 red balls, 2 blue balls, and 5 white balls. A ball is selected and its color noted. Then it is replaced. A second ball is selected and its color is noted. Find the probability of each of the following.

a. Selecting 2 blue balls

b. Selecting 1 blue ball and 1 white ball

c. Selecting 1 red ball and 1 blue ball.

Problem 169. A Harris poll found that 46% of Americans say they suffer great stress at least once a week. If three people are selected at random, find the probability that all three will say that they suffer great stress at least once a week.

Problem 170. Approximately 9% of men have a type of color blindness that prevents them from distinguishing between red and green. If 3 men are selected at random, find the probability that all of them will have this type of red-green color blindness.


Definition 171 (Dependent Events). When the outcome or occurrence of the first event affects the outcome or occurrence of the second event in such a way that the probability is changed, the events are said to be dependent events.

Definition 172 (Conditional Probability). The conditional probability of an event \(B\) in relationship to an event \(A\) is the probability that event \(B\) will occur after event \(A\) has already occurred. This is denoted as \(\Pr(B|A)\) and is sometimes read as, “The probability of \(B\) given \(A\).”

Problem 173. In a recent survey, 33% of the respondents said that they feel that they are overqualified (O) for their present job. Of these, 24% said that they were looking for a new job (J). If a person is selected at random, find the probability that the person feels that he or she is overqualified and is also looking for a new job.

Problem 174. World Wide Insurance Company found that 53% of the residents of a city had homeowner’s insurance (H) with the company. Of these clients, 27% also had automobile insurance (A) with the company. If a resident is selected at random, find the probability that the resident has both homeowner’s insurance and automobile insurance with World Wide Insurance Company.


Problem 175. Three cards are drawn from an ordinary deck and not replaced. Find the probability of

a. Getting 3 Jacks.

b. Getting an Ace, a King, and a Queen in that order.

c. Getting a Club, a Spade, and a Heart in that order.

d. Getting 3 Clubs.

Problem 176. Box 1 contains 2 red balls and 1 blue ball. Box 2 contains 3 blue balls and 1 red ball. A coin is tossed. If it falls heads up, Box 1 is selected and a ball is drawn. If it falls on tails, Box 2 is selected and a ball is drawn. Find the probability of selecting a red ball.


Formula 177 (Conditional Probability). The probability that the second event \(B\) occurs given that the first event \(A\) has occurred can be found by dividing the probability that both events occurred by the probability that the first event has occurred. The formula is

\[ \Pr(B|A) = \frac{\Pr(A \text{ and } B)}{\Pr(A)}. \]
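The conditional probability formula can be illustrated by counting outcomes. The sketch below uses two fair dice with events invented for this example (A: the first die shows 6, B: the sum is at least 10); the helper name `pr` is hypothetical.

```python
from fractions import Fraction
from itertools import product

sample_space = list(product(range(1, 7), repeat=2))  # two fair dice


def pr(event):
    """Classical probability of a list of outcomes from sample_space."""
    return Fraction(len(event), len(sample_space))


A = [o for o in sample_space if o[0] == 6]  # first die shows 6
A_and_B = [o for o in A if sum(o) >= 10]    # ...and the sum is at least 10

pr_B_given_A = pr(A_and_B) / pr(A)
print(pr(A_and_B), pr(A), pr_B_given_A)     # 1/12 1/6 1/2
```

Dividing by Pr(A) restricts attention to the six outcomes in which A occurred, three of which also satisfy B.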

Problem 178. A box contains black chips and white chips. A person selects two chips without replacement. If the probability of selecting a black chip and a white chip is \(\frac{15}{56}\) and the probability of selecting a black chip on the first draw is \(\frac{3}{8}\), find the probability of selecting the white chip on the second draw given that the first chip selected was a black chip.

Problem 179. The probability that Sam parks in a no-parking zone and gets a parking ticket is 0.06, and the probability that Sam cannot find a legal parking space and has to park in a no-parking zone is 0.20. On Tuesday, Sam arrives at school and has to park in a no-parking zone. Find the probability that he will get a parking ticket.


Problem 180. A recent survey asked 100 people if they thought that women in the armed forces should be permitted to participate in combat. The results of the survey are shown.

Gender   Yes   No   Total
Male     32    18   50
Female   8     42   50
Total    40    60   100

Find the following probabilities.

a. The respondent answered yes, given that the respondent was female.

b. The respondent was a male, given that the respondent answered no.


Problem 181. A person selects 3 cards from an ordinary deck and replaces each card after it is drawn. Find the probability that the person will get at least one heart. Hint: use the probability of the complement.

Problem 182. A coin is tossed 5 times. Find the probability of getting at least 1 tails.

Problem 183. The Neckwear Association of America reported that 3% of ties sold in the United States are bow ties. If 4 customers who purchased a tie are randomly selected, find the probability that at least 1 purchased a bow tie.

Suggested Problems: 3− 20, 32, 35− 40


4.4 Counting Rules

Formula 184 (Fundamental Counting Rule). In a sequence of \(n\) events in which the first one has \(k_1\) possibilities, the second event has \(k_2\) possibilities, and so forth, the total number of possibilities of the sequence will be

\[ k_1 \cdot k_2 \cdots k_n. \]
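The fundamental counting rule can be checked against a brute-force listing. In the sketch below the three stages (a made-up menu with 2 appetizers, 3 main courses, and 2 desserts) are purely illustrative.

```python
from itertools import product
from math import prod

# Hypothetical stages of a sequence of choices.
appetizers = ["soup", "salad"]
mains = ["pasta", "fish", "steak"]
desserts = ["cake", "fruit"]
stages = [appetizers, mains, desserts]

count_by_rule = prod(len(stage) for stage in stages)  # k1 * k2 * k3
count_by_listing = len(list(product(*stages)))        # enumerate every sequence
print(count_by_rule, count_by_listing)                # 12 12
```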

Problem 185. A coin is tossed and a die is rolled. Find the number of outcomes for the sequence.

Problem 186. A paint manufacturer wishes to manufacture several different paints. The categories include

Color     Red, Blue, White, Black, Green, Brown, Yellow
Type      Latex, Oil
Texture   Flat, Semigloss, High Gloss
Use       Outdoor, Indoor

How many different kinds of paint can be made if you can select one color, one type, one texture, and one use?


Problem 187. There are four blood types, A, B, AB, and O. Blood can also be Rh+ or Rh-. Finally, a blood donor can be classified as either male or female. How many different ways can a donor have his or her blood labeled?

Problem 188. The first year that the state of Pennsylvania issued railroad memorial license plates, the plate had a picture of a steam engine followed by four digits. Assuming that repetitions are allowed, how many railroad memorial plates could be issued?


Definition 189 (Factorial). For any counting number \(n\), \(n!\) is read “n factorial” and is computed as

\[ n! = n(n-1)(n-2) \cdots 1, \quad \text{and} \quad 0! = 1. \]

Definition 190 (Permutation). A permutation is an arrangement of \(n\) objects in a specific order.

Problem 191. Suppose a business owner has a choice of 5 locations in which to establish her business. She decides to rank each location according to certain criteria, such as price of the store and parking facilities. How many different ways can she rank the 5 locations?

Problem 192. Suppose the business owner in Problem 191 wishes to rank only the top 3 of the 5 locations. How many different ways can she rank them?


Formula 193 (Permutation). The arrangement of \(n\) objects in a specific order using \(r\) objects at a time is written as \({}_nP_r\) and is computed as

\[ {}_nP_r = \frac{n!}{(n-r)!}. \]
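As a sketch, the permutation formula can be compared with Python's built-in `math.perm` (available in Python 3.8 and later) and with an explicit listing. The values n = 7 and r = 2 are arbitrary and not taken from the problems.

```python
from itertools import permutations
from math import factorial, perm

n, r = 7, 2  # illustrative values

by_formula = factorial(n) // factorial(n - r)      # n! / (n - r)!
by_library = perm(n, r)                            # standard-library equivalent
by_listing = len(list(permutations(range(n), r)))  # count ordered selections
print(by_formula, by_library, by_listing)          # 42 42 42
```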

Problem 194. A radio talk show host can select 3 of 6 special guests for her program. The order of appearance of the guests is important. How many different ways can this be done?

Problem 195. A school musical director can select 2 musical plays to present next year. One will be presented in the fall, and one will be presented in the spring. If she has 9 to pick from, how many different possibilities are there?


Formula 196 (Permutation with Repeats). The number of permutations of \(n\) objects when \(r_1\) objects are identical, \(r_2\) objects are identical, \(\ldots\), \(r_p\) objects are identical is

\[ \frac{n!}{r_1!\, r_2! \cdots r_p!} \]

where \(r_1 + r_2 + \cdots + r_p = n\).
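A minimal sketch of the formula for permutations with repeated objects follows. The function name is hypothetical, and the word "level" is used only because it is short enough to verify by listing every distinct arrangement.

```python
from collections import Counter
from itertools import permutations
from math import factorial


def permutations_with_repeats(items):
    """n! divided by the factorial of each group of identical items."""
    result = factorial(len(items))
    for count in Counter(items).values():
        result //= factorial(count)
    return result


word = "level"  # l appears twice, e appears twice, v once
print(permutations_with_repeats(word))  # 5! / (2! * 2! * 1!) = 30
print(len(set(permutations(word))))     # 30 distinct arrangements by listing
```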

Problem 197. How many permutations of letters can be made from the word statistics?


Definition 198 (Combination). A selection of distinct objects without regard to order is called a combination.

Formula 199 (Combination). The number of combinations of \(r\) objects selected from \(n\) objects is denoted \({}_nC_r\) or \(\binom{n}{r}\). The formula is

\[ {}_nC_r = \binom{n}{r} = \frac{n!}{(n-r)!\, r!}. \]
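The combination formula can likewise be compared with Python's `math.comb` (Python 3.8 and later) and with an explicit listing of unordered selections. The values n = 6 and r = 2 are arbitrary illustrations.

```python
from itertools import combinations
from math import comb, factorial

n, r = 6, 2  # illustrative values

by_formula = factorial(n) // (factorial(n - r) * factorial(r))  # n! / ((n - r)! r!)
by_library = comb(n, r)                                         # standard-library equivalent
by_listing = len(list(combinations(range(n), r)))               # count unordered selections
print(by_formula, by_library, by_listing)                       # 15 15 15
```

Compared with the permutation count for the same n and r, the extra division by r! removes the different orderings of each selection.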

Problem 200. How many combinations of 4 objects are there taken 2 at a time?

Problem 201. An advertising executive must select 3 different photographs for an advertising flier. If she has 10 different photographs that can be used, how many ways can she select 3 of them?

Problem 202. In a club there are 7 women and 5 men. A committee of 3 women and 2 men is to be chosen. How many different possibilities are there?

Suggested Problems: 1− 40


4.5 Probability and Counting Rules

Problem 203. Find the probability of getting 4 aces when 5 cards are drawn from an ordinary deck of cards.

Problem 204. A box contains 24 transistors, 4 of which are defective. If 4 are sold at random, find the following probabilities.

a. Exactly 2 are defective

b. None are defective

c. All are defective

d. At least 1 is defective


Problem 205. A store has 6 TV Graphic magazines and 8 Newstime magazines on the counter. If two customers purchased a magazine, find the probability that one of each magazine was purchased.

Problem 206. In the Pennsylvania State Lottery, a person selects a three-digit number and repetitions are permitted. If a winning number is selected, find the probability that it will have all three digits the same.

Problem 207. There are 8 married couples in a tennis club. If 1 man and 1 woman are selected at random to plan the summer tournament, find the probability that they are married to each other.

Suggested Problems: 1− 12


Chapter 5

Discrete Probability Distributions

5.1 Probability Distributions

Definition 208 (Random Variable). A random variable is a variable whose values are determined by chance.

Definition 209 (Discrete Probability Distribution). A discrete probability distribution consists of the values a random variable can assume and the corresponding probabilities of the values. The probabilities are determined theoretically or by observation.

Problem 210. Construct a probability distribution for rolling a single die.

Problem 211. Represent graphically the probability distribution for the sample space of tossing three coins and landing heads.


Problem 212. The baseball World Series is played by the winner of the National League and that of the American League. The first team to win four games wins the World Series. In other words, the series will consist of four to seven games, depending on the individual victories. The data shown consists of 40 World Series events. The number of games played in each series is represented by the variable X. Find the probability Pr(X) for each X, construct a probability distribution, and draw a graph for the data.

X    Number of Series Played
4    8
5    7
6    9
7    16
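A distribution determined by observation is obtained by dividing each frequency by the total count. The sketch below applies this to the World Series table; the names `series_counts` and `to_distribution` are our own.

```python
from fractions import Fraction

# Frequencies from the table in Problem 212: games played -> number of series.
series_counts = {4: 8, 5: 7, 6: 9, 7: 16}

def to_distribution(counts: dict) -> dict:
    """Turn observed frequencies into relative frequencies (probabilities)."""
    total = sum(counts.values())
    return {x: Fraction(freq, total) for x, freq in counts.items()}

distribution = to_distribution(series_counts)
for x, p in distribution.items():
    print(f"X = {x}: Pr(X) = {p} = {float(p):.3f}")
```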

Problem 213. Determine whether each distribution is a probability distribution.

a.  X      5    8    11   14
    Pr(X)  0.2  0.6  0.1  0.3

b.  X      1    2    3    4    5
    Pr(X)  1/4  1/8  3/8  1/8  1/8

c.  X      1    2    3    4
    Pr(X)  1/4  1/4  1/4  1/4

d.  X      4     8    12
    Pr(X)  −0.5  0.6  0.4
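A table is a probability distribution only if every probability lies between 0 and 1 and the probabilities sum to 1. A minimal check in Python might look as follows; the function name and the small floating-point tolerance are our own assumptions.

```python
from fractions import Fraction
import math

def is_probability_distribution(probs) -> bool:
    """Check the two requirements: each value in [0, 1], and the total equals 1."""
    in_range = all(0 <= p <= 1 for p in probs)
    sums_to_one = math.isclose(float(sum(probs)), 1.0, abs_tol=1e-9)
    return in_range and sums_to_one

# The four tables from Problem 213.
table_a = [0.2, 0.6, 0.1, 0.3]
table_b = [Fraction(1, 4), Fraction(1, 8), Fraction(3, 8), Fraction(1, 8), Fraction(1, 8)]
table_c = [Fraction(1, 4)] * 4
table_d = [-0.5, 0.6, 0.4]

for name, table in [("a", table_a), ("b", table_b), ("c", table_c), ("d", table_d)]:
    print(name, is_probability_distribution(table))
```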

Suggested Problems: 7− 12, 19− 25


5.2 Mean, Variance, Standard Deviation, and Expectation

Formula 214 (Mean of Probability Distribution). The mean of a random variable with a discrete probability distribution is

\[ \mu = \sum_{i=1}^{n} X_i \cdot \Pr(X_i) \]

where $X_1, X_2, \ldots, X_n$ are the outcomes and $\Pr(X_1), \Pr(X_2), \ldots, \Pr(X_n)$ are the corresponding probabilities.

Problem 215. Find the mean of the number of spots (pips) that appear when a die is tossed.

Problem 216. In families with five children, find the mean number of children who will be girls.


Problem 217. If three coins are tossed, find the mean of the number of heads that occur.

Problem 218. The probability distribution shown represents the number of trips of five nights or more that American adults take per year. (That is, 6% do not take any trips lasting five nights or more, 70% take one trip lasting five nights or more, etc.) Find the mean.

Number of trips X    0     1     2     3     4
Probability Pr(X)    0.06  0.70  0.20  0.03  0.01
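The mean of a discrete distribution is a probability-weighted sum, which is short to express in code. The sketch below uses exact fractions so that no rounding enters; the helper name `distribution_mean` and the dictionary layout are our own choices, and the fair-die distribution assumes each face has probability 1/6.

```python
from fractions import Fraction

def distribution_mean(dist: dict) -> Fraction:
    """Sum of X * Pr(X) over every outcome X."""
    return sum(x * p for x, p in dist.items())

# Problem 215: a fair die, each face with probability 1/6.
die = {face: Fraction(1, 6) for face in range(1, 7)}

# Problem 218: the trips table, entered as exact decimals.
trips = {0: Fraction("0.06"), 1: Fraction("0.70"), 2: Fraction("0.20"),
         3: Fraction("0.03"), 4: Fraction("0.01")}

print(float(distribution_mean(die)), float(distribution_mean(trips)))
```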


Formula 219 (Variance of Probability Distribution). The variance of a probability distribution is found as follows:

\[ \sigma^2 = \sum_{i=1}^{n} \left[ X_i^2 \cdot \Pr(X_i) \right] - \mu^2. \]

The standard deviation is

\[ \sigma = \sqrt{\sigma^2} \quad \text{or} \quad \sigma = \sqrt{\sum_{i=1}^{n} \left[ X_i^2 \cdot \Pr(X_i) \right] - \mu^2}. \]

Problem 220. Compute the variance and standard deviation for the probability distribution found for the number of spots (pips) that appear when a die is tossed.

Problem 221. A box contains 5 balls. Two are numbered 3, one is numbered 4, and two are numbered 5. The balls are mixed and one is selected at random. After a ball is selected, its number is recorded. Then it is replaced. If the experiment is repeated many times, find the variance and standard deviation of the numbers on the balls.
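The variance formula translates directly into code: compute the mean first, then subtract its square from the weighted sum of squares. This is a sketch only; `distribution_variance` is a name of our own, and the ball probabilities are read off Problem 221 as 2/5, 1/5, and 2/5.

```python
from fractions import Fraction
import math

def distribution_variance(dist: dict) -> Fraction:
    """Sum of X^2 * Pr(X), minus the square of the mean."""
    mu = sum(x * p for x, p in dist.items())
    return sum(x ** 2 * p for x, p in dist.items()) - mu ** 2

# Problem 220: a fair die.
die = {face: Fraction(1, 6) for face in range(1, 7)}
die_variance = distribution_variance(die)
die_std_dev = math.sqrt(die_variance)

# Problem 221: two balls numbered 3, one numbered 4, two numbered 5.
balls = {3: Fraction(2, 5), 4: Fraction(1, 5), 5: Fraction(2, 5)}
balls_variance = distribution_variance(balls)
balls_std_dev = math.sqrt(balls_variance)

print(float(die_variance), die_std_dev, float(balls_variance), balls_std_dev)
```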


Problem 222. A talk radio station has four telephone lines. If the host is unable to talk or is talking to a person, the other callers are placed on hold. When all lines are in use, others who are trying to call in get a busy signal. The probability that 0, 1, 2, 3, or 4 people will get through is shown in the probability distribution. Find the variance and standard deviation of the distribution.

X      0     1     2     3     4
Pr(X)  0.18  0.34  0.23  0.21  0.04

Definition 223 (Expected Value). The expected value of a discrete random variable of a probability distribution is the theoretical average of the variable. The formula is

\[ \mu = E(X) = \sum_{i=1}^{n} X_i \cdot \Pr(X_i). \]

Problem 224. One thousand tickets are sold at $1 each for a color television valued at $350. What is the expected value of the gain if you purchase one ticket?
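For expected-value problems the outcomes are gains and losses. The sketch below sets up Problem 224 under one common reading, which is an assumption on our part: the gain is net of the $1 ticket price, so a win is worth $349 and every other ticket loses $1.

```python
from fractions import Fraction

def expected_value(dist: dict) -> Fraction:
    """Sum of outcome * probability over all outcomes."""
    return sum(x * p for x, p in dist.items())

# Assumed net gains: +349 with probability 1/1000, -1 with probability 999/1000.
raffle = {349: Fraction(1, 1000), -1: Fraction(999, 1000)}
expected_gain = expected_value(raffle)

print(f"Expected gain per ticket: ${float(expected_gain):.2f}")
```

A negative expected value means that, on average over many purchases, the buyer loses money on each ticket.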


Problem 225. Six balls numbered 1, 2, 3, 5, 8 and 13 are placed in a box. A ball is selected at random, and its number is recorded and then it is replaced. Find the expected value of the numbers that will occur.

Problem 226. A financial adviser suggests that his client select one of two types of bonds in which to invest $5,000. Bond X pays a return of 4% and has a default rate of 2%. Bond Y has a 2.5% return and a default rate of 1%. Find the expected rate of return and decide which bond would be a better investment. When the bond defaults, the investor loses all of the investment.

Suggested Problems: 1− 10.


5.3 The Binomial Distribution

Definition 227 (Binomial Experiment). A binomial experiment is a probability experiment that satisfies the following four requirements:

1. There must be a fixed number of trials.

2. Each trial can have only two outcomes. These outcomes can be considered as either success or failure.

3. The outcomes of each trial must be independent of one another.

4. The probability of a success must remain the same for each trial.

Problem 228. Decide whether each experiment is a binomial experiment. If not, state the reason why.

a. Selecting 20 university students and recording their class rank.

b. Selecting 20 students from a university and recording their gender.

c. Drawing five cards from a deck without replacement and recording whether they are red or black.

d. Selecting five students from a large school and asking them if they are on the dean's list.

e. Recording the number of children in 50 randomly selected families.


Definition 229 (Binomial Distribution). The outcomes of a binomial experiment and the corresponding probabilities of these outcomes are called a binomial distribution.

Notation 230. The following notation is used for the binomial distribution:

Pr(S)   The probability of Success
Pr(F)   The probability of Failure
p       The numerical probability of success
q       The numerical probability of failure
n       The number of trials
X       The number of successes in n trials

Formula 231 (Binomial Probability). In a binomial experiment, the probability of exactly X successes in n trials is

\[ \Pr(X) = \binom{n}{X} p^X q^{n-X}. \]

Problem 232. A coin is tossed 3 times. Find the probability of getting exactly two heads.

Problem 233. A survey from Teenage Research Unlimited found that 30% of teenage consumers receive their spending money from part-time jobs. If 5 teenagers are selected at random, find the probability that at least 3 of them will have part-time jobs.
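The binomial probability formula is straightforward to evaluate by machine. The following sketch assumes a fair coin (p = 0.5) for Problem 232; the function name `binomial_pmf` is our own. An "at least" probability is found by summing the exact probabilities for each qualifying X.

```python
from math import comb

def binomial_pmf(x: int, n: int, p: float) -> float:
    """Probability of exactly x successes in n independent trials."""
    q = 1 - p
    return comb(n, x) * p ** x * q ** (n - x)

# Problem 232: exactly two heads in 3 tosses of a fair coin.
two_heads = binomial_pmf(2, 3, 0.5)

# Problem 233: at least 3 of 5 teenagers, with p = 0.30.
at_least_three = sum(binomial_pmf(x, 5, 0.30) for x in range(3, 6))

print(two_heads, round(at_least_three, 5))
```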


Formula 234 (Binomial Distribution Mean, Variance, and Standard Deviation). The mean, variance, and standard deviation of a variable that has the binomial distribution can be found by using the following formulas:

\[ \text{Mean: } \mu = np \qquad \text{Variance: } \sigma^2 = npq \qquad \text{Standard Deviation: } \sigma = \sqrt{\sigma^2} = \sqrt{npq}. \]

Problem 235. A coin is tossed 4 times. Find the mean, variance, and standard deviation of the number of heads that will be obtained.

Problem 236. An 8-sided die is rolled 560 times. Find the mean, variance, and standard deviation of the number of 7's that will be rolled.
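These three formulas need only n and p. The sketch below assumes a fair coin and a fair 8-sided die (so the probability of a 7 is 1/8); `binomial_summary` is a name chosen here for illustration.

```python
import math

def binomial_summary(n: int, p: float):
    """Return (mean, variance, standard deviation) of a binomial variable."""
    q = 1 - p
    mean = n * p
    variance = n * p * q
    return mean, variance, math.sqrt(variance)

coin_mean, coin_var, coin_sd = binomial_summary(4, 0.5)    # Problem 235
die_mean, die_var, die_sd = binomial_summary(560, 1 / 8)   # Problem 236

print(coin_mean, coin_var, coin_sd)
print(die_mean, die_var, round(die_sd, 3))
```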

Suggested Problems: 1-10, 17-25


Chapter 6

The Normal Distribution

6.1 Normal Distributions

Definition 237 (Normal Distribution). If a random variable has a probability distribution whose graph is continuous, bell shaped, and symmetric, it is called a normal distribution. The graph is called a normal distribution curve.

Definition 238 (Standard Normal Distribution). The standard normal distribution is a normal distribution with a mean of 0 and a standard deviation of 1.

Problem 239. Use the table on pages 788 and 789 of your textbook to find the area under the standard normal distribution curve to the left of z = 2.09.

Problem 240. Find the area under the standard normal distribution curve to the right of z = −1.14.

Problem 241. Find the area under the standard normal distribution curve between z = 1.62 and z = −1.34.
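Table lookups can be cross-checked with Python's standard library, whose `statistics.NormalDist` class (Python 3.8 or later) provides the cumulative area to the left of a value through `cdf`. This is only a sketch for checking answers, not a replacement for the table; results may differ from table values in the fourth decimal place.

```python
from statistics import NormalDist

Z = NormalDist()  # standard normal: mean 0, standard deviation 1

# Problem 239: area to the left of z = 2.09.
left_of_2_09 = Z.cdf(2.09)

# Problem 240: area to the right is 1 minus the area to the left.
right_of_neg_1_14 = 1 - Z.cdf(-1.14)

# Problem 241: area between two z values is a difference of left areas.
between = Z.cdf(1.62) - Z.cdf(-1.34)

print(round(left_of_2_09, 4), round(right_of_neg_1_14, 4), round(between, 4))
```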


True Fact 242. The area under the standard normal distribution curve can also be thought of as a probability or as the proportion of the population with a given characteristic.

Problem 243. Find the probability for each assuming the standard normal distribution.

a. Pr(0 < z < 2.53)

b. Pr(z < 1.73)

c. Pr(z > 1.98)

Problem 244. Find the z value such that the area under the standard normal distribution curve between 0 and the z value is 0.2123.

Suggested Problems: 7− 20, 27− 35, 41− 43.


6.2 Applications of the Normal Distribution

Formula 245 (z-value). To solve problems by using the standard normal distribution, transform the original variable to a standard normal distribution variable by using the formula

\[ z = \frac{X - \mu}{\sigma}. \]

Problem 246. An adult has on average 5.2 liters of blood. Assume the variable is normally distributed and has a standard deviation of 0.3. Find the percentage of people who have less than 5.4 liters of blood in their system.
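The z-value transformation for Problem 246 can be sketched as below. The helper name `z_value` is our own; the area is computed without rounding z to two decimals, so it can differ slightly from a table answer based on z = 0.67.

```python
from statistics import NormalDist

def z_value(x: float, mu: float, sigma: float) -> float:
    """Standardize x: the number of standard deviations x lies from the mean."""
    return (x - mu) / sigma

z = z_value(5.4, 5.2, 0.3)
proportion_below = NormalDist().cdf(z)

print(f"z = {z:.2f}, proportion below 5.4 liters = {proportion_below:.4f}")
```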

Problem 247. Each month, an American household generates an average of 28 pounds of newspaper for garbage or recycling. Assume the variable is approximately normally distributed and the standard deviation is 2 pounds. If a household is selected at random, find the probability of its generating

a. Between 27 and 31 pounds per month.

b. More than 30.2 pounds per month.


Formula 248 (Normal Variable). The formula for finding the value of a normal variable X is

\[ X = z\sigma + \mu. \]

Problem 249. To qualify for a police academy, candidates must score in the top 10% on a general abilities test. Assume the test scores are normally distributed and the test has a mean of 200 and a standard deviation of 20. Find the lowest possible score to qualify.

Problem 250. For a medical study, a researcher wishes to select people in the middle 60% of the population based on blood pressure. Assuming that blood pressure readings are normally distributed and the mean systolic blood pressure is 120 and the standard deviation is 8, find the upper and lower readings that would qualify people to participate in the study.
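Working backward from an area to a value takes two steps: find the z value that cuts off the required area, then apply X = zσ + μ. The sketch below uses `NormalDist.inv_cdf` from the standard library in place of a reverse table lookup; the helper name `normal_value` is our own, and unrounded z values are used, so answers may differ slightly from table-based ones.

```python
from statistics import NormalDist

def normal_value(z: float, mu: float, sigma: float) -> float:
    """Convert a z value back to the original scale."""
    return z * sigma + mu

Z = NormalDist()

# Problem 249: top 10% means 90% of the area lies to the left of the cutoff.
cutoff_score = normal_value(Z.inv_cdf(0.90), 200, 20)

# Problem 250: the middle 60% leaves 20% in each tail.
z_mid = Z.inv_cdf(0.80)
lower_reading = normal_value(-z_mid, 120, 8)
upper_reading = normal_value(z_mid, 120, 8)

print(round(cutoff_score, 2), round(lower_reading, 2), round(upper_reading, 2))
```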


Definition 251 (Pearson Coefficient). The skewness of a distribution can be checked with the Pearson Coefficient of skewness. The formula is

\[ PC = \frac{3(\bar{X} - \text{median})}{s}. \]

If the index is greater than or equal to +1 or less than or equal to −1, it can be concluded that the data are significantly skewed.

Problem 252. A survey of 18 high-tech firms showed the number of days' inventory they had on hand. Determine if the data are approximately normally distributed.

5 29 34 44 45 63 68 74 74
81 88 91 97 98 113 118 151 158
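The Pearson coefficient for the inventory data can be computed with the `statistics` module. This sketch assumes s is the sample standard deviation (divisor n − 1), which is what `statistics.stdev` computes; `pearson_skewness` is a name chosen here. A full normality check would also look at a histogram and at outliers, which this sketch does not do.

```python
import statistics

def pearson_skewness(data) -> float:
    """Pearson coefficient: 3 * (mean - median) / sample standard deviation."""
    mean = statistics.mean(data)
    median = statistics.median(data)
    s = statistics.stdev(data)
    return 3 * (mean - median) / s

# Data from Problem 252: days of inventory on hand for 18 firms.
inventory_days = [5, 29, 34, 44, 45, 63, 68, 74, 74,
                  81, 88, 91, 97, 98, 113, 118, 151, 158]

pc = pearson_skewness(inventory_days)
significantly_skewed = pc >= 1 or pc <= -1

print(round(pc, 3), significantly_skewed)
```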

Problem 253. The data shown consists of the number of games played each year in the career of Baseball Hall of Famer Bill Mazeroski. Determine if the data are approximately normally distributed.

81 148 152 135 151 152
159 142 34 162 130 162
163 143 67 112 70

Suggested Problems: 1− 10, 18− 25, 39− 43


6.3 The Central Limit Theorem

Definition 254 (Sampling Distribution of Sample Means). A sampling distribution of sample means is a distribution using the means computed from all possible random samples of a specific size taken from a population.

Definition 255 (Sampling Error). Sampling error is the difference between the sample measure and the corresponding population measure due to the fact that the sample is not a perfect representation of the population.

Observation 256. Two observations can be made:

1. The mean of the sample means will be the same as the population mean.

2. The standard deviation of the sample means will be smaller than the standard deviation of the population, and it will be equal to the population standard deviation divided by the square root of the sample size.

Theorem 257 (Central Limit). As the sample size n increases without limit, the shape of the distribution of the sample means taken with replacement from a population with mean µ and standard deviation σ will approach a normal distribution. As previously shown, this distribution will have a mean µ and a standard deviation $\sigma/\sqrt{n}$.

Formula 258 (z Values for Sample Mean). The central limit theorem can be used to answer questions about sample means in the same way that a normal distribution can be used to answer questions about individual values.

\[ z = \frac{\bar{X} - \mu}{\sigma/\sqrt{n}}. \]

Problem 259. A.C. Neilsen reported that children between the ages of 2 and 5 watch an average of 25 hours of television per week. Assume the variable is normally distributed and the standard deviation is 3 hours. If 20 children between the ages of 2 and 5 are randomly selected, find the probability that the mean of the number of hours they watch television will be greater than 26.3 hours.
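For a sample mean, the only change from the individual-value case is that σ is replaced by σ/√n. The following sketch works through the setup of Problem 259; the function name `z_for_sample_mean` is our own, and z is left unrounded.

```python
import math
from statistics import NormalDist

def z_for_sample_mean(xbar: float, mu: float, sigma: float, n: int) -> float:
    """z value of a sample mean, using the standard error sigma / sqrt(n)."""
    standard_error = sigma / math.sqrt(n)
    return (xbar - mu) / standard_error

z = z_for_sample_mean(26.3, 25, 3, 20)
prob_greater = 1 - NormalDist().cdf(z)  # area to the right of z

print(f"z = {z:.2f}, probability = {prob_greater:.4f}")
```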


Problem 260. The average age of a vehicle registered in the United States is 8 years, or 96 months. Assume the standard deviation is 16 months. If a random sample of 36 vehicles is selected, find the probability that the mean of their age is between 90 and 100 months.

Problem 261. The average time spent by construction workers who work on weekends is 7.93 hours (over 2 days). Assume the distribution is approximately normal and has a standard deviation of 0.8 hours.

a. Find the probability that an individual who works at that trade works fewer than 8 hours on the weekend.

b. If a sample of 40 construction workers is randomly selected, find the probability that the mean of the sample will be less than 8 hours.

Suggested Homework: 7− 15.


Chapter 7

Confidence Intervals and Sample Size

7.1 Confidence Intervals for the Mean When σ is Known

Definition 262 (Point Estimate). A point estimate is a specific numerical value estimate of a parameter. The best point estimate of the population mean µ is the sample mean $\bar{X}$.

Properties 263. Three properties of a good estimator are

1. The estimator should be an unbiased estimator. That is, the expected value or the mean of the estimates obtained from samples of a given size is equal to the parameter being estimated.

2. The estimator should be consistent. For a consistent estimator, as sample size increases, the value of the estimator approaches the value of the parameter estimated.

3. The estimator should be a relatively efficient estimator. That is, of all the statistics that can be used to estimate a parameter, the relatively efficient estimator has the smallest variance.

Definition 264 (Interval Estimate). An interval estimate of a parameter is an interval or a range of values used to estimate the parameter. This estimate may or may not contain the value of the parameter being estimated.

Definition 265 (Confidence Level). The confidence level of an interval estimate of a parameter is the probability that the interval estimate will contain the parameter, assuming that a large number of samples are selected and that the estimation process on the same parameter is repeated.

Definition 266 (Confidence Interval). A confidence interval is a specific interval estimate of a parameter determined by using data obtained from a sample and by using the specific confidence level of the estimate.

Formula 267 (Confidence Interval of Mean where σ is known). Let α denote the total area in both tails of the standard normal distribution curve, and α/2 denote the area in each of the tails. We call $z_{\alpha/2}$ a critical value. The interval formula for a confidence interval of the mean for a specific α where σ is known is

\[ \bar{X} - z_{\alpha/2}\left(\frac{\sigma}{\sqrt{n}}\right) < \mu < \bar{X} + z_{\alpha/2}\left(\frac{\sigma}{\sqrt{n}}\right). \]

For a 90% confidence interval, $z_{\alpha/2} = 1.65$; for a 95% confidence interval, $z_{\alpha/2} = 1.96$; and for a 99% confidence interval, $z_{\alpha/2} = 2.58$.

Definition 268 (Margin of Error). The margin of error, also called the maximum error of the estimate, is the maximum likely difference between the point estimate of a parameter and the actual value of the parameter.

Problem 269. A researcher wishes to estimate the number of days it takes an automobile dealer to sell a Chevrolet Aveo. A random sample of 50 cars had a mean time on the dealer's lot of 54 days. Assume the population standard deviation to be 6.0 days. Find the best point estimate of the population mean and the 95% confidence interval of the population mean.

Problem 270. A large department store found that it averages 362 customers per hour. Assume that the standard deviation is 29.6 and a random sample of 40 hours was used to determine the average. Find the 99% confidence interval of the population mean.
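The interval formula amounts to "point estimate plus or minus margin of error." The sketch below uses the rounded critical values quoted in these notes (1.96 and 2.58) rather than computing them; the function name `z_interval` is our own.

```python
import math

def z_interval(xbar: float, sigma: float, n: int, z_crit: float):
    """Confidence interval for the mean when sigma is known."""
    margin = z_crit * sigma / math.sqrt(n)  # margin of error E
    return xbar - margin, xbar + margin

# Problem 269: 95% interval, so z = 1.96.
aveo_low, aveo_high = z_interval(54, 6.0, 50, 1.96)

# Problem 270: 99% interval, so z = 2.58.
store_low, store_high = z_interval(362, 29.6, 40, 2.58)

print(f"{aveo_low:.2f} < mu < {aveo_high:.2f}")
print(f"{store_low:.2f} < mu < {store_high:.2f}")
```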


Problem 271. The following data represent a random sample of the assets (in millions of dollars) of 30 credit unions in southwestern Pennsylvania. Assume that the population standard deviation is 14.405. Find the 90% confidence interval of the mean.

12.23  16.56   4.39
 2.89   1.24   2.17
13.19   9.16   1.42
73.25   1.91  14.64
11.59   6.69   1.06
 8.74   3.17  18.13
 7.92   4.78  16.85
40.22   2.42  21.58
 5.01   1.47  12.24
 2.27  12.77   2.76


Formula 272 (Minimum Sample Size Needed for Interval Estimate of the Population Mean). Let E be the margin of error and n denote the sample size. Then the minimum sample size n needed for an interval estimate of the population mean is

\[ n = \left( \frac{z_{\alpha/2} \cdot \sigma}{E} \right)^2. \]

Problem 273. A scientist wishes to estimate the average depth of a river. He wants to be 99% confident that the estimate is accurate within 2 feet. From a previous study, the standard deviation of the depths measured was 4.33 feet. How large a sample is required?
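Because a sample size must be a whole number, the result of the formula is always rounded up, never to the nearest integer. A sketch for Problem 273, using the notes' rounded critical value 2.58 for 99% confidence (the function name is our own):

```python
import math

def min_sample_size_mean(z_crit: float, sigma: float, margin: float) -> int:
    """Smallest n whose margin of error is at most `margin`."""
    return math.ceil((z_crit * sigma / margin) ** 2)

depth_samples = min_sample_size_mean(2.58, 4.33, 2)
print(depth_samples)
```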

Suggested Problems: 7− 15, 23− 26


7.2 Confidence Intervals for the Mean When σ is Unknown

Definition 274 (t Distribution). The t distribution is similar to the standard normal distribution in these ways:

1. It is bell-shaped.

2. It is symmetric about the mean.

3. The mean, median, and mode are equal to 0 and are located at the center of the distribution.

4. The curve approaches but never touches the x-axis.

The t distribution differs from the standard normal distribution in the following ways:

1. The variance is greater than 1.

2. The t distribution is actually a family of curves based on the concept of degrees of freedom, which is related to sample size.

3. As the sample size increases, the t distribution approaches the standard normal distribution.

Problem 275. Find the tα/2 value for a 95% confidence interval when the sample size is 22.


Formula 276 (Specific Confidence Interval for the Mean When σ is Unknown). The degrees of freedom are n − 1 and the confidence interval for the mean when σ is unknown is

\[ \bar{X} - t_{\alpha/2}\left(\frac{s}{\sqrt{n}}\right) < \mu < \bar{X} + t_{\alpha/2}\left(\frac{s}{\sqrt{n}}\right). \]

Problem 277. A random sample of 10 children found that their average growth for the first year was 9.8 inches. Assume the variable is normally distributed and the sample standard deviation is 0.96 inches. Find the 95% confidence interval of the population mean for growth during the first year.
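The t interval has the same shape as the z interval, with s in place of σ and a t critical value in place of z. Python's standard library has no t distribution, so in this sketch the critical value is passed in by hand; the value 2.262 (95% confidence, 9 degrees of freedom) is an assumption read from a standard t table, not something the code computes.

```python
import math

def t_interval(xbar: float, s: float, n: int, t_crit: float):
    """Confidence interval for the mean when sigma is unknown.

    t_crit must be looked up for n - 1 degrees of freedom.
    """
    margin = t_crit * s / math.sqrt(n)
    return xbar - margin, xbar + margin

# Problem 277: n = 10, so 9 degrees of freedom; table value assumed to be 2.262.
growth_low, growth_high = t_interval(9.8, 0.96, 10, 2.262)

print(f"{growth_low:.2f} < mu < {growth_high:.2f}")
```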

Problem 278. The data represent a random sample of the number of home fires started by candles for the past several years (Data are from the National Fire Protection Association). Find the 99% confidence interval for the mean number of home fires started by candles each year.

5460, 5900, 6090, 6310, 7160, 8440, 9930

Suggested Problems: 5− 10


7.3 Confidence Intervals and Sample Size for Populations

Notation 279. Let p denote the population proportion and p̂ denote the sample proportion. For a sample proportion,

\[ \hat{p} = \frac{X}{n} \quad \text{and} \quad \hat{q} = \frac{n - X}{n} \quad \text{or} \quad \hat{q} = 1 - \hat{p} \]

where X is the number of sample units that possess the characteristics of interest and n is the sample size.

Problem 280. A random sample of 200 workers found that 128 drove to work alone. Find p̂ and q̂, where p̂ is the proportion of workers who drove to work alone.


Formula 281 (Specific Confidence Interval for a Proportion). Let np̂ and nq̂ be greater than or equal to 5. Then

\[ \hat{p} - z_{\alpha/2}\sqrt{\frac{\hat{p}\hat{q}}{n}} < p < \hat{p} + z_{\alpha/2}\sqrt{\frac{\hat{p}\hat{q}}{n}}. \]

Problem 282. A survey conducted by Sallie Mae and Gallup of 1404 respondents found that 323 students paid for their education by student loans. Find the 90% confidence interval of the true proportion of students who paid for their education by student loans.

Problem 283. A survey of 1898 adults with lawns conducted by Harris Interactive Poll found that 45% of the adults said that dandelions were the toughest weeds to control in their yards. Find the 95% confidence interval of the true proportion who said that dandelions were the toughest weeds to control in their yards.
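The proportion interval can be sketched in the same way as the interval for a mean. The example uses the figures from Problem 283 and the notes' critical value 1.96 for 95% confidence; the function name and the explicit check of the np̂ and nq̂ condition are our own additions.

```python
import math

def proportion_interval(p_hat: float, n: int, z_crit: float):
    """Confidence interval for a population proportion."""
    q_hat = 1 - p_hat
    if n * p_hat < 5 or n * q_hat < 5:
        raise ValueError("n * p_hat and n * q_hat must both be at least 5")
    margin = z_crit * math.sqrt(p_hat * q_hat / n)
    return p_hat - margin, p_hat + margin

# Problem 283: p_hat = 0.45 from a sample of 1898 adults.
low, high = proportion_interval(0.45, 1898, 1.96)

print(f"{low:.3f} < p < {high:.3f}")
```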


Formula 284 (Minimum Sample Size Needed for Interval Estimate of Proportion).

\[ n = \hat{p}\hat{q} \left( \frac{z_{\alpha/2}}{E} \right)^2 \]

Problem 285. A researcher wishes to estimate, with 95% confidence, the proportion of people who own a home computer. A previous study shows that 40% of those interviewed had a computer at home. The researcher wishes to be accurate within 2% of the true proportion. Find the minimum sample size necessary.

Problem 286. In Problem 285, assume that no previous study was done. Find the minimum sample size necessary to be accurate within 2% of the true proportion.
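A sketch of the sample-size calculation follows. Two assumptions are ours: when no previous estimate exists, p̂ = 0.5 is used (the usual convention, since it makes p̂q̂ as large as possible), and the arithmetic is done with exact fractions so that floating-point error cannot push a whole-number result up to the next integer when rounding up.

```python
from fractions import Fraction
import math

def min_sample_size_proportion(p_hat, z_crit, margin) -> int:
    """Smallest n meeting the margin of error; always rounds up."""
    p = Fraction(str(p_hat))
    z = Fraction(str(z_crit))
    e = Fraction(str(margin))
    return math.ceil(p * (1 - p) * (z / e) ** 2)

with_prior_study = min_sample_size_proportion(0.40, 1.96, 0.02)  # Problem 285
no_prior_study = min_sample_size_proportion(0.5, 1.96, 0.02)     # Problem 286

print(with_prior_study, no_prior_study)
```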


Chapter 8

Hypothesis Testing

8.1 Traditional Method

Definition 287 (Statistical Hypothesis). A statistical hypothesis is a conjecture about a population parameter. This conjecture may or may not be true.

Definition 288 (Null Hypothesis). The null hypothesis, symbolized by H0, is a statistical hypothesis that states that there is no difference between a parameter and a specific value, or that there is no difference between two parameters.

Definition 289 (Alternative Hypothesis). The alternative hypothesis, symbolized by H1, is a statistical hypothesis that states the existence of a difference between a parameter and a specific value, or states that there is a difference between two parameters.

Problem 290. State the null and alternative hypotheses for each conjecture.

a. A researcher thinks that if expectant mothers use vitamin pills, the birth weight of the babies will increase. The average birth weight of the population is 8.6 pounds.

b. An engineer hypothesizes that the mean number of defects can be decreased in a manufacturing process of USB drives by using robots instead of humans for certain tasks. The mean number of defective drives per 1000 is 18.

c. A psychologist feels that playing soft music during a test will change the results of the test. The psychologist is not sure whether the grades will be higher or lower. In the past, the mean of the scores was 73.


Definition 291 (Statistical Test). A statistical test uses the data obtained from a sample to make a decision about whether the null hypothesis should be rejected.

Definition 292 (Test Value). The numerical value obtained from a statistical test is called the test value.

Definition 293 (Error Types). There are two types of errors with respect to the null hypothesis.

                    H0 true             H0 false
Reject H0           Type I error        Correct decision
Do not reject H0    Correct decision    Type II error

Definition 294 (Level of Significance). The level of significance is the maximum probability of committing a type I error. The probability is symbolized by α. That is, Pr(type I error) = α.

Definition 295 (Rejection Region). The critical or rejection region is the range of test values that indicates that there is a significant difference and that the null hypothesis should be rejected.

Definition 296 (Nonrejection Region). The noncritical or nonrejection region is the range of test values that indicates that the difference was probably due to chance and that the null hypothesis should not be rejected.

Definition 297 (Critical Value). The critical value separates the critical region from the noncritical region. The symbol for critical value is C.V.

Definition 298 (Tail Test). A one-tailed test indicates that the null hypothesis should be rejected when the test value is in the critical region on one side of the mean. A one-tailed test is either a right-tailed test or a left-tailed test, depending on the direction of the inequality of the alternative hypothesis.

Definition 299 (Two-tailed Test). In a two-tailed test, the null hypothesis should be rejected when the test value is in either of the two critical regions.

Problem 300. Find the critical values for each situation and draw the appropriate figure, showing the critical regions.

a. A left-tailed test with α = 0.10

b. A two-tailed test with α = 0.02

c. A right-tailed test with α = 0.005
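The critical values asked for in Problem 300 can be checked numerically. The following is a minimal sketch, assuming the test value follows the standard normal distribution. It uses `NormalDist` from Python's standard `statistics` module; the helper name `critical_values` and the tail labels are illustrative choices, not notation from these notes.

```python
from statistics import NormalDist

STANDARD_NORMAL = NormalDist(mu=0.0, sigma=1.0)


def critical_values(alpha, tail):
    """Return a tuple of z critical value(s) for a test at level alpha.

    tail is "left", "right", or "two" (hypothetical labels for this sketch).
    """
    if tail == "left":
        # All of alpha sits in the left tail.
        return (STANDARD_NORMAL.inv_cdf(alpha),)
    if tail == "right":
        # All of alpha sits in the right tail.
        return (STANDARD_NORMAL.inv_cdf(1.0 - alpha),)
    if tail == "two":
        # alpha is split evenly between the two tails.
        upper = STANDARD_NORMAL.inv_cdf(1.0 - alpha / 2.0)
        return (-upper, upper)
    raise ValueError("tail must be 'left', 'right', or 'two'")


left_cv = critical_values(0.10, "left")     # situation a
two_cv = critical_values(0.02, "two")       # situation b
right_cv = critical_values(0.005, "right")  # situation c

for label, values in [("a", left_cv), ("b", two_cv), ("c", right_cv)]:
    print(label, [round(value, 2) for value in values])
```

Rounded to two places, the values come out to about -1.28, ±2.33, and 2.58. A printed table may differ in the last digit because of table rounding.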

Suggested Problems: 7-14


8.2 z Test for a Mean

Formula 301 (z Test). The z test is a statistical test for the mean of a population. It can be used either when n ≥ 30 or when the population is normally distributed and σ is known. Let X̄ denote the sample mean, µ denote the hypothesized population mean, σ denote the population standard deviation, and n denote the sample size. Then the formula is

$$z = \frac{\bar{X} - \mu}{\sigma / \sqrt{n}}$$

Problem 302. In Pennsylvania, the average IQ score is 101.5. The variable is normally distributed, and the population standard deviation is 15. A school superintendent claims that the students in her school district have an IQ higher than the average. She selects a random sample of 30 students and finds the mean of the test scores is 106.4. Test the claim at α = 0.05.
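Formula 301 applied to the figures in Problem 302 can be sketched as follows. This is one possible check, not a prescribed solution method. The variable names are illustrative, and the critical value is taken from `statistics.NormalDist` rather than from a printed table.

```python
from math import sqrt
from statistics import NormalDist

# Figures taken from Problem 302.
mu_0 = 101.5         # hypothesized population mean
sigma = 15.0         # population standard deviation
n = 30               # sample size
sample_mean = 106.4  # observed sample mean
alpha = 0.05         # level of significance

# Formula 301: z = (sample mean - mu) / (sigma / sqrt(n))
z = (sample_mean - mu_0) / (sigma / sqrt(n))

# The claim is "higher than the average", so the test is right-tailed
# and all of alpha lies in the right tail.
critical_value = NormalDist().inv_cdf(1.0 - alpha)
reject_h0 = z > critical_value

print(f"z = {z:.3f}, C.V. = {critical_value:.3f}, reject H0: {reject_h0}")
```

Here the test value is about 1.79 and the critical value is about 1.645, so under these assumptions the test value falls in the critical region.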


Problem 303. For a specific year, the average score on the SAT Math test was 515. The variable is normally distributed, and the population standard deviation is 100. The same superintendent in Problem 302 wishes to see if her students scored significantly below the national average on the test. She randomly selected 36 student scores, as shown. At α = 0.10, is there enough evidence to support the claim?

496   506   507   505   438   499
505   522   531   762   513   493
522   668   543   519   349   506
519   516   714   517   511   551
287   523   576   516   515   500
243   509   523   503   414   504
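When the raw scores are given, as in Problem 303, the sample mean has to be computed before Formula 301 can be used. A minimal sketch, assuming the 36 values above are read correctly from the table and using illustrative variable names:

```python
from math import sqrt
from statistics import NormalDist, mean

# The 36 scores listed in Problem 303.
scores = [
    496, 506, 507, 505, 438, 499,
    505, 522, 531, 762, 513, 493,
    522, 668, 543, 519, 349, 506,
    519, 516, 714, 517, 511, 551,
    287, 523, 576, 516, 515, 500,
    243, 509, 523, 503, 414, 504,
]
mu_0 = 515.0   # national average (hypothesized mean)
sigma = 100.0  # population standard deviation
alpha = 0.10   # level of significance

n = len(scores)
sample_mean = mean(scores)
z = (sample_mean - mu_0) / (sigma / sqrt(n))

# The claim is "below the national average", so the test is left-tailed.
critical_value = NormalDist().inv_cdf(alpha)
reject_h0 = z < critical_value

print(f"n = {n}, mean = {sample_mean:.3f}, z = {z:.2f}, reject H0: {reject_h0}")
```

The sample mean is about 509.03 and the test value is about -0.36, which is not below the critical value of about -1.28, so this sketch does not reject the null hypothesis.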

Problem 304. The Medical Rehabilitation Education Foundation reports that the average cost of rehabilitation for stroke victims is $24,672. To see if the average cost of rehabilitation is different at a particular hospital, a researcher selects a random sample of 35 stroke victims at the hospital and finds that the average cost of their rehabilitation is $26,343. The standard deviation of the population is $3,251. At α = 0.01, can it be concluded that the average cost of stroke rehabilitation at a particular hospital is different from $24,672?


Definition 305 (P-value). The P-value (or probability value) is the probability of getting a sample statistic (such as the mean) or a more extreme sample statistic in the direction of the alternative hypothesis when the null hypothesis is true.

Formula 306 (Rejecting or Not Rejecting Null Hypothesis with P-value). If the P-value is less than or equal to α, then reject the null hypothesis. If the P-value is greater than α, then do not reject the null hypothesis.

Problem 307. A researcher wishes to test the claim that the average cost of tuition and fees at a four-year public college is greater than $5,700. She selects a random sample of 36 four-year public colleges and finds the mean to be $5,950. The population standard deviation is $659. Is there evidence to support the claim at α = 0.05? Use the P-value method.

Problem 308. A researcher claims that the average wind speed in a certain city is 8 miles per hour. A sample of 32 days has an average wind speed of 8.2 miles per hour. The standard deviation of the population is 0.6 miles per hour. At α = 0.05, is there enough evidence to reject the claim? Use the P-value method.
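The P-value method of Definition 305 and Formula 306 can be sketched for Problems 307 and 308 as follows. The helper names `z_value` and `p_value` and the tail labels are assumptions made for this illustration. The areas come from `statistics.NormalDist`, so they may differ slightly from values read off a table with z rounded to two places.

```python
from math import sqrt
from statistics import NormalDist

STANDARD_NORMAL = NormalDist()


def z_value(sample_mean, mu_0, sigma, n):
    """Test value for the z test for a mean (Formula 301)."""
    return (sample_mean - mu_0) / (sigma / sqrt(n))


def p_value(z, tail):
    """Area at least as extreme as z in the direction of the alternative."""
    if tail == "right":
        return 1.0 - STANDARD_NORMAL.cdf(z)
    if tail == "left":
        return STANDARD_NORMAL.cdf(z)
    if tail == "two":
        # Both tails count, so the one-tail area is doubled.
        return 2.0 * (1.0 - STANDARD_NORMAL.cdf(abs(z)))
    raise ValueError("tail must be 'left', 'right', or 'two'")


alpha = 0.05

# Problem 307: right-tailed, since the claim is "greater than $5,700".
z_307 = z_value(5950, 5700, 659, 36)
p_307 = p_value(z_307, "right")
reject_307 = p_307 <= alpha

# Problem 308: two-tailed, since the claim is "equal to 8 miles per hour".
z_308 = z_value(8.2, 8.0, 0.6, 32)
p_308 = p_value(z_308, "two")
reject_308 = p_308 <= alpha

print(f"Problem 307: z = {z_307:.2f}, P-value = {p_307:.4f}, reject H0: {reject_307}")
print(f"Problem 308: z = {z_308:.2f}, P-value = {p_308:.4f}, reject H0: {reject_308}")
```

Under these assumptions the P-value for Problem 307 is roughly 0.011, which is at most α, while the P-value for Problem 308 is roughly 0.059, which is greater than α.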

Suggested Problems: 1-5, 15-20


8.3 t Test for a Mean

Definition 309 (t Test). The t test is a statistical test for the mean of a population and is used when the population is normally or approximately normally distributed and σ is unknown. The formula for the t test is

$$t = \frac{\bar{X} - \mu}{s / \sqrt{n}}$$

where s is the sample standard deviation. The degrees of freedom are d.f. = n − 1.

Problem 310. Find the critical t value for α = 0.05 with d.f. = 16 for a right-tailed t test.

Problem 311. Find the critical t value for α = 0.01 with d.f. = 22 for a left-tailed t test.

Problem 312. Find the critical values for α = 0.10 with d.f. = 18 for a two-tailed t test.

Problem 313. Find the critical value for α = 0.05 with d.f. = 28 for a right-tailed t test.


Problem 314. A medical investigation claims that the average number of infections per week at a hospital in southwestern Pennsylvania is 16.3. A random sample of 10 weeks had a mean number of 17.7 infections. The sample standard deviation is 1.8. Is there enough evidence to reject the investigator's claim at α = 0.05? Assume the variable is normally distributed.
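The t test formula of Definition 309 applied to the summary figures in Problem 314 can be sketched as follows. Python's standard library has no t distribution, so the critical value 2.262 is entered by hand; it is the usual t-table entry for a two-tailed test with α = 0.05 and d.f. = 9. The variable names are illustrative.

```python
from math import sqrt

# Figures taken from Problem 314.
mu_0 = 16.3         # claimed population mean
n = 10              # sample size
sample_mean = 17.7  # observed sample mean
s = 1.8             # sample standard deviation

# Definition 309: t = (sample mean - mu) / (s / sqrt(n)), d.f. = n - 1
t = (sample_mean - mu_0) / (s / sqrt(n))
degrees_of_freedom = n - 1

# Assumption: value copied from a t table (two-tailed, alpha = 0.05, d.f. = 9).
critical_value = 2.262

# Two-tailed test: reject when t lies beyond either critical value.
reject_h0 = abs(t) > critical_value

print(f"t = {t:.2f}, d.f. = {degrees_of_freedom}, reject H0: {reject_h0}")
```

The test value is about 2.46, which lies beyond the tabled critical value, so this sketch rejects the null hypothesis.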

Problem 315. According to payscale.com, the average starting salary for a nurse practitioner is $79,500. A researcher wishes to test a claim that the starting salary is less than that. A random sample of 8 starting nurse practitioners is selected, and their starting salaries (in dollars) are shown. Is there enough evidence to support the researcher's claim at α = 0.10? Assume the variable is normally distributed.

82,000   68,000   70,200   75,600
83,500   64,300   75,600   79,000
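For Problem 315 the sample mean and sample standard deviation must be computed from the raw salaries before the t formula can be applied. A minimal sketch, using `statistics.stdev` (which divides by n − 1) and, as an assumption, the t-table critical value -1.415 for a left-tailed test with α = 0.10 and d.f. = 7:

```python
from math import sqrt
from statistics import mean, stdev

# The 8 starting salaries (in dollars) listed in Problem 315.
salaries = [82000, 68000, 70200, 75600, 83500, 64300, 75600, 79000]
mu_0 = 79500  # reported average starting salary

n = len(salaries)
sample_mean = mean(salaries)
s = stdev(salaries)  # sample standard deviation (n - 1 in the denominator)

t = (sample_mean - mu_0) / (s / sqrt(n))
degrees_of_freedom = n - 1

# Assumption: value copied from a t table (left-tailed, alpha = 0.10, d.f. = 7).
critical_value = -1.415
reject_h0 = t < critical_value

print(f"mean = {sample_mean}, s = {s:.2f}, t = {t:.2f}, reject H0: {reject_h0}")
```

The sample mean is 74,775, the sample standard deviation is about 6,804.36, and the test value is about -1.96, which is below the tabled critical value.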

Suggested Problems: 7-15


8.4 z Test for a Proportion

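Definition 316 (z Test for a Proportion). Let p̂ = X/n denote the sample proportion, where X is the number of sample units with the characteristic of interest. Let p denote the population proportion, q = 1 − p, and n denote the sample size. Then

$$z = \frac{\hat{p} - p}{\sqrt{pq / n}}$$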

Problem 317. A researcher claims that based on the information obtained from the Centers for Disease Control and Prevention, 17% of young people ages 2-19 are obese. To test the claim, she randomly selected 200 young people and found that 42 were obese. At α = 0.05, is there enough evidence to reject the claim?
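The proportion formula of Definition 316 applied to Problem 317 can be sketched as follows. The variable names are illustrative, q is taken to be 1 − p, and the critical value is computed with `statistics.NormalDist` instead of being read from a table.

```python
from math import sqrt
from statistics import NormalDist

# Figures taken from Problem 317.
p = 0.17      # claimed population proportion
n = 200       # sample size
x = 42        # number in the sample who were obese
alpha = 0.05  # level of significance

p_hat = x / n  # sample proportion
q = 1.0 - p    # assumption: q denotes 1 - p

# Definition 316: z = (p_hat - p) / sqrt(p * q / n)
z = (p_hat - p) / sqrt(p * q / n)

# The claim is "equal to 17%", so the test is two-tailed.
critical_value = NormalDist().inv_cdf(1.0 - alpha / 2.0)
reject_h0 = abs(z) > critical_value

print(f"p_hat = {p_hat:.2f}, z = {z:.2f}, C.V. = ±{critical_value:.2f}, reject H0: {reject_h0}")
```

The test value is about 1.51, which lies between the critical values of about ±1.96, so this sketch does not reject the null hypothesis.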

Problem 318. The Gallup Crime Survey stated that 23% of gun owners are women. A researcher believes that in the area where he lives, the percentage is less than 23%. He randomly selects a sample of 100 gun owners and finds that 11% of the gun owners are women. At α = 0.01, is the percentage of female gun owners in his area less than 23%?


Problem 319. A statistician reads that at least 77% of the population oppose replacing $1 bills with $1 coins. To see if the claim is valid, the statistician selected a random sample of 80 people and found that 55 were opposed to replacing the $1 bills. At α = 0.01, test the claim that at least 77% of the population are opposed to the change.

Problem 320. An attorney claims that more than 25% of all lawyers advertise. A random sample of 200 lawyers in a certain city showed that 63 had used some form of advertising. At α = 0.05, is there enough evidence to support the attorney's claim? Use the P-value method.
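The P-value method carries over to the proportion test. A minimal sketch for Problem 320, again with illustrative variable names and with the right-tail area taken from `statistics.NormalDist`:

```python
from math import sqrt
from statistics import NormalDist

# Figures taken from Problem 320.
p = 0.25      # proportion in the null hypothesis
n = 200       # sample size
x = 63        # lawyers in the sample who advertised
alpha = 0.05  # level of significance

p_hat = x / n
q = 1.0 - p
z = (p_hat - p) / sqrt(p * q / n)

# The claim is "more than 25%", so the P-value is the area to the right of z.
p_val = 1.0 - NormalDist().cdf(z)
reject_h0 = p_val <= alpha

print(f"p_hat = {p_hat:.3f}, z = {z:.2f}, P-value = {p_val:.4f}, reject H0: {reject_h0}")
```

The test value is about 2.12 and the P-value is roughly 0.017, which is at most α, so this sketch rejects the null hypothesis.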

Suggested Problems: 5-10


Chapter 10

Correlation and Regression

10.1 Scatter Plots and Correlation

Definition 321 (Scatter Plot). A scatter plot is a graph of the ordered pairs (x, y) of numbers consisting of the independent variable x and the dependent variable y.

Definition 322 (Correlation). The correlation between two or more attributes or measurements is the degree to which those attributes or measurements show a tendency to vary together.

Definition 323 (Positive Correlation). A positive correlation is a relationship between two variables where if one variable increases, the other one also increases.

Definition 324 (Negative Correlation). A negative correlation is a relationship between two variables where if one variable increases, the other one decreases.

Problem 325. Construct a scatter plot for the data shown for car rental companies in the United States for a recent year. Describe whether the data set seems to have any correlation.

Company   Cars (in ten thousands)   Revenue (in billions of dollars)
A         63.0                      7.0
B         29.0                      3.9
C         20.8                      2.1
D         19.1                      2.8
E         13.4                      1.4
F          8.5                      1.5


Problem 326. Construct a scatter plot for the data obtained in a study on the number of absences and the final grades of seven randomly selected students from a statistics class. Determine if there is any correlation. The data are shown here.

Student   Number of absences x   Final grade y (%)
A          6                     82
B          2                     86
C         15                     43
D          9                     74
E         12                     58
F          5                     90
G          8                     78

Problem 327. A researcher wishes to see if there is a relationship between the ages of the wealthiest people in the world and their net worth. A random sample of 10 persons was selected from the Forbes list of 400 richest people for a recent year. The data are shown. Draw a scatter plot for the data and determine if there is any notable correlation.

Person   Age x   Net worth y (in billions of dollars)
A        60      11
B        72      69
C        56      11.9
D        55      30
E        83      12.2
F        67      36
G        38      18.7
H        62      10.2
I        62      23.3
J        46      10.6


Definition 328 (Population Correlation Coefficient). The population correlation coefficient, ρ, is the correlation computed by using all possible pairs of data values (x, y) taken from a population.

Definition 329 (Linear Correlation Coefficient). The linear correlation coefficient computed from the sample data measures the strength and direction of a linear relationship between two quantitative variables. The symbol for the sample correlation coefficient is r.
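Definition 329 does not state how r is computed. The sketch below assumes the standard computational formula for the Pearson product-moment coefficient, r = [nΣxy − (Σx)(Σy)] / √([nΣx² − (Σx)²][nΣy² − (Σy)²]), and applies it to the absences and final grades from Problem 326. The function name `linear_correlation` is illustrative.

```python
from math import sqrt

# Data from Problem 326: absences (x) and final grades (y) of seven students.
absences = [6, 2, 15, 9, 12, 5, 8]
grades = [82, 86, 43, 74, 58, 90, 78]


def linear_correlation(xs, ys):
    """Sample correlation coefficient r (assumed Pearson formula)."""
    if len(xs) != len(ys):
        raise ValueError("xs and ys must have the same length")
    n = len(xs)
    sum_x = sum(xs)
    sum_y = sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)
    sum_y2 = sum(y * y for y in ys)
    numerator = n * sum_xy - sum_x * sum_y
    denominator = sqrt((n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2))
    return numerator / denominator


r = linear_correlation(absences, grades)
print(f"r = {r:.3f}")
```

For these data r is about -0.944. The negative sign matches Definition 324: as the number of absences increases, the final grade tends to decrease.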


Index

t Distribution, 87
z-score, 33
z-value, 77

Alternative Hypothesis, 93
Arithmetic Average, 19

Bar Graph, 12
Bimodal, 22
Binomial Distribution, 73
Binomial Distribution Mean, 74
Binomial Distribution Standard Deviation, 74
Binomial Distribution Variance, 74
Binomial Experiment, 72
Binomial Probability, 73
Boundary, 4
Boxplot, 38

Cardinality, 43
Class Boundaries, 8
Class Midpoint, 10
Class Width, 8
Cluster Sample, 5
Coefficient of Variance, 31
Compound Time Series Graph, 14
Conditional Probability, 51
Confidence Interval, 83
Confidence Interval σ Unknown, 88
Confidence Interval for Proportion, 90
Confidence Interval Known Mean, 83
Confidence Level, 83
Confounding Variable, 6
Continuous Variables, 3
Covariance, 31
Critical Region, 94
Critical Value, 94
Cumulative Frequency Distribution, 9

Data, 1
Dependent Events, 51
Dependent Variable, 6
Descriptive Statistics, 1
Discrete Probability Distribution, 65
Discrete Variables, 3
Disjoint Events, 47
Dotplot, 16

Empirical Probability, 46
Equally Likely Events, 43
Error Types, 94
Event, 43
Event Complement, 45
Expected Value, 70
Experimental Study, 6
Explanatory Variable, 6

Factorial, 58
Frequency Distribution, 7
Frequency Polygon, 10

Histogram, 10
Hypothesis Testing, 1

Independent Events, 49
Independent Variable, 6
Inferential Statistics, 1
Interquartile Range, 37
Interval Estimate, 83
Interval Level, 4

Level of Significance, 94
Lower Class Limit, 8, 10

Margin of Error, 84
Maximum Area of the Estimate, 84
Mean, 19
Mean of Distribution, 67
Median, 21


Midrange, 24
Minimum Sample of Population Mean, 86
Minimum Sample Size Proportion, 91
Mode, 22
Mode Class, 23
Multimodal, 22
Mutually Exclusive, 47

Nominal Level, 4
Noncritical Region, 94
Nonrejection Region, 94
Nonsampling Error, 5
Normal Distribution, 75
Normal Distribution Curve, 75
Normal Variable Formula, 78
Nuisance Variable, 6
Null Hypothesis, 93

Observational Study, 6
Ogive, 10
Open-Ended Distribution, 8
Ordinal Level, 4
Outcome, 41
Outcome Variable, 6
Outlier, 37

Parameter, 19
Pareto Chart, 13
Pearson Coefficient, 79
Percentile Rank Formula, 35
Percentiles, 34
Permutation, 58
Pie Graph, 15
Point Estimate, 83
population, 1
Population Mean, 19
Population Standard Deviation, 27
Population Variance, 27
Probability, 1
Probability Experiment, 41

Qualitative Variables, 3
Quantitative Variables, 3
Quartile, 36

Random Sample, 5
Random Variable, 65
Random Variables, 1
Range, 26
Ratio Level, 4
Raw Data, 7
Rejection Region, 94
Relative Frequency Graphs, 11

Sample, 1
Sample Mean, 19
Sample Mean z values, 80
Sample Space, 41
Sample Standard Deviation, 28
Sample Variance, 28
Sampling Distribution, 80
Sampling Error, 5, 80
Standard Deviation of Grouped Data, 30
Standard Normal Distribution, 75
standard score, 33
Statistic, 19
Statistical Hypothesis, 93
Statistical Test, 94
Statistics, 1
Stem and Leaf Plot, 17
Stratified Sample, 5
Systematic Sample, 5

Tail Test, 94
Test Value, 94
Time Series Graph, 14
Tree Diagram, 42
Two-tailed Test, 94

Ungrouped Frequency Distribution, 9
Unimodal, 22
Upper Class Limit, 8, 10

Variable, 1
Variance of Distribution, 69
Variance of Grouped Data, 30

Weighted Mean, 25
