Chapter 1
Introduction to Statistics
Larson/Farber 4th ed.
Chapter Outline
• 1.1 What is Statistics?
• 1.2 Random Samples
• 1.3 Experimental Design
Section 1.1
What is Statistics?
Section 1.1 Objectives
• Define statistics
• Define individual/observational unit
• Distinguish between a population and a sample
• Distinguish between a parameter and a statistic
• Distinguish between descriptive statistics and inferential statistics
• Distinguish between levels of measurement
What is Data?
Data
Consist of information coming from observations, counts, measurements, or responses. Statements are often based on data:
• “People who eat three daily servings of whole grains have been shown to reduce their risk of…stroke by 37%.” (Source: Whole Grains Council)
• “Seventy percent of the 1500 U.S. spinal cord injuries to minors result from vehicle accidents, and 68 percent were not wearing a seatbelt.” (Source: UPI)
What is Statistics?
Statistics
The science of collecting, organizing, analyzing, and interpreting data in order to make decisions.
Individuals and Variables
Individual / Observational Unit and its Variables (two pictured examples):
• Weight: 1.5 pounds; Sugars: 16 grams
• Age: 6 months; Weight: 16.53 pounds
Individuals and Variables
• Individuals are people or objects included in the study.
• Variables are characteristics of the individual to be measured or observed.
• Exercise 1: We want to do a study about the people who have climbed Mt. Everest. Identify the individuals and the variables.
Types of Data
Qualitative Data
Consists of attributes, labels, or nonnumerical entries.
Examples: Major, Place of birth, Eye color
Types of Data
Quantitative data
Numerical measurements or counts.
Guided Ex 1: Are the data quantitative or qualitative?
• Age
• Weight of a letter
• Temperature
Data Sets
Population
The collection of all outcomes, responses, measurements, or counts that are of interest.
Sample
A subset of the population.
Exercise 2: Identifying Data Sets
In a recent survey, 1708 adults in the United States were asked if they think global warming is a problem that requires immediate government action. Nine hundred thirty-nine of the adults said yes.
1. Identify the population and the sample.
2. Describe the data set. (Adapted from: Pew Research Center)
Solution: Identifying Data Sets
• The population consists of the responses of all adults in the U.S.
• The sample consists of the responses of the 1708 adults in the U.S. in the survey.
• The sample is a subset of the responses of all adults in the U.S.
• The data set consists of 939 yes’s and 769 no’s.
Responses of adults in the U.S. (population)
Responses of adults in survey (sample)
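The data set in this solution can be summarized with a little arithmetic. The following is a minimal sketch in Python; the variable names are illustrative and not from the textbook.

```python
# Counts taken from the survey described in Exercise 2.
sample_size = 1708
yes_count = 939
no_count = sample_size - yes_count  # the remaining adults answered no

# The proportion of "yes" answers describes the sample, so it is a statistic.
yes_proportion = yes_count / sample_size
print(f"yes: {yes_count}, no: {no_count}, proportion yes: {yes_proportion:.3f}")
```

Because the proportion is computed from the 1708 surveyed adults only, it estimates, but does not equal, the proportion for all adults in the U.S.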
Parameter and Statistic
Parameter
A number that describes a population characteristic.
Average age of all people in the United States
Statistic
A number that describes a sample characteristic.
Average age of people from a sample of three states
Example: Distinguish Parameter and Statistic
Decide whether the numerical value describes a population parameter or a sample statistic.
1. A recent survey of a sample of MBAs reported that the average salary for an MBA is more than $82,000. (Source: The Wall Street Journal)
Solution: Sample statistic (the average of $82,000 is based on a subset of the population)
Guided Exercise 2
Decide whether the numerical value describes a population parameter or a sample statistic.
2. Starting salaries for the 667 MBA graduates from the University of Chicago Graduate School of Business increased 8.5% from the previous year.
Solution: Population parameter (the percent increase of 8.5% is based on all 667 graduates’ starting salaries)
Branches of Statistics
Descriptive Statistics
Involves organizing, summarizing, and displaying data.
e.g. Tables, charts, averages
Inferential Statistics
Involves using sample data to draw conclusions about a population.
Example: Descriptive and Inferential Statistics
Decide which part of the study represents the descriptive branch of statistics. What conclusions might be drawn from the study using inferential statistics?
A large sample of men, aged 48, was studied for 18 years. For unmarried men, approximately 70% were alive at age 65. For married men, 90% were alive at age 65. (Source: The Journal of Family Issues)
Solution: Descriptive and Inferential Statistics
Descriptive statistics involves statements such as “For unmarried men, approximately 70% were alive at age 65” and “For married men, 90% were alive at 65.”
A possible inference drawn from the study is that being married is associated with a longer life for men.
Example: Classifying Data by Type
The base prices of several vehicles are shown in the table. Which data are qualitative data and which are quantitative data? (Source: Ford Motor Company)
Solution: Classifying Data by Type
Quantitative Data (Base prices of vehicle models are numerical entries)
Qualitative Data (Names of vehicle models are nonnumerical entries)
Levels of Measurement
Nominal level of measurement
• Qualitative data only
• Categorized using names, labels, or qualities
• No mathematical computations can be made
• Example: Which of the following food items do you tend to buy at least once per month? (Please tick)
Okra, Palm Oil, Milled Rice, Peppers, Prawns, Almond Milk
Levels of Measurement
Ordinal level of measurement
• Qualitative or quantitative data
• Data can be arranged in order
• Differences between data entries are not meaningful
• Example:
Order of preference   Brand of Pesticide
1                     Rambo
2                     R.I.P.
3                     BugsBeGone
4                     D.O.A.
5                     BestBugSpray
Example: Classifying Data by Level
Two data sets are shown. Which data set consists of data at the nominal level? Which data set consists of data at the ordinal level? (Source: Nielsen Media Research)
Solution: Classifying Data by Level
Ordinal level (lists the rank of five TV programs. Data can be ordered. Difference between ranks is not meaningful.)
Nominal level (lists the call letters of each network affiliate. Call letters are names of network affiliates.)
Levels of Measurement
Interval level of measurement
• Quantitative data
• Data can be ordered
• Differences between data entries are meaningful
• Zero represents a position on a scale (not an inherent zero; zero does not imply “none”)
Example: Time of day on a 12-hour clock
Example: Body temperatures in degrees Celsius
Levels of Measurement
Ratio level of measurement
• Zero entry is an inherent zero (implies “none”)
• A ratio of two data values can be formed
• One data value can be expressed as a multiple of another
• Examples: RULER: inches or centimeters; YEARS of work experience; INCOME: money earned last year; NUMBER of children; GPA: grade point average; TEMPERATURE: kelvins
• A person who earns $2K/week earns twice as much as a person who earns $1K/week
Levels of Measurement
Measurement at the interval or ratio level is desirable because we can use the more powerful statistical procedures available for means and standard deviations.
To gain this advantage, ordinal data are often treated as though they were interval; for example, subjective rating scales (1 = terrible, 2 = poor, 3 = fair, 4 = good, 5 = excellent).
Example: Classifying Data by Level
Two data sets are shown. Which data set consists of data at the interval level? Which data set consists of data at the ratio level? (Source: Major League Baseball)
Solution: Classifying Data by Level
Interval level (Quantitative data. Can find a difference between two dates, but a ratio does not make sense.)
Ratio level (Can find differences and write ratios.)
Guided Exercise 3
State the level of measurement for each of the following:
• The senator’s name is Sam Wilson.
• The senator is 58 years old.
• The senator was elected in 1963, 1969, 1981, and 1994.
• His taxable income is $278,314.19.
• The senator is married.
• The senator had divorces in 1965 and 1982.
• A newspaper ranked the senator 7th for his voting record on public education.
Exercise 4: Summary of Four Levels of Measurement
Level of       Put data in   Arrange data   Subtract      Determine if one data value
measurement    categories    in order       data values   is a multiple of another
Nominal
Ordinal
Interval
Ratio
Summary of Four Levels of Measurement
Level of       Put data in   Arrange data   Subtract      Determine if one data value
measurement    categories    in order       data values   is a multiple of another
Nominal        Yes           No             No            No
Ordinal        Yes           Yes            No            No
Interval       Yes           Yes            Yes           No
Ratio          Yes           Yes            Yes           Yes
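The summary table can also be written as a small lookup structure. This is only a sketch in Python: the dictionary `LEVELS`, the operation keys, and the helper `allows` are names made up for illustration, while the Yes/No values are copied from the table.

```python
# Which operations are meaningful at each level of measurement (from the summary table).
LEVELS = {
    "nominal":  {"categorize": True, "order": False, "subtract": False, "multiple": False},
    "ordinal":  {"categorize": True, "order": True,  "subtract": False, "multiple": False},
    "interval": {"categorize": True, "order": True,  "subtract": True,  "multiple": False},
    "ratio":    {"categorize": True, "order": True,  "subtract": True,  "multiple": True},
}

def allows(level, operation):
    """Return True if the operation is meaningful for data at the given level."""
    return LEVELS[level][operation]

# Differences are meaningful at the interval level, but ratios are not.
print(allows("interval", "subtract"), allows("interval", "multiple"))
```

Each level permits everything the level above it in the table permits, plus one more operation.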
Section 1.2
Random Samples
Section 1.2 Objectives
• Explain the importance of random samples
• Construct a simple random sample using random numbers
• Simulate a random process
• Describe stratified sampling, cluster sampling, systematic sampling, multi-stage and convenience sampling
Sampling Techniques
Simple Random Sample
• Every possible sample of the same size has the same chance of being selected.
• Every individual of the population has an equal chance of being selected.
Simple Random Sample
• Random numbers can be generated by a random number table, a software program or a calculator.
• Assign a number to each member of the population.
• Members of the population that correspond to these numbers become members of the sample.
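When a software program supplies the random numbers, the three steps above might look like the following Python sketch. The function name, the population of 50 members, and the sample size of 5 are assumptions chosen for illustration.

```python
import random

def simple_random_sample(population_size, sample_size, seed=None):
    """Number the members 1..population_size and draw a sample without replacement."""
    rng = random.Random(seed)  # a seed is used only to make the run repeatable
    # random.sample gives every subset of the requested size the same chance.
    return rng.sample(range(1, population_size + 1), sample_size)

# Hypothetical population of 50 numbered members; choose 5 of them.
sample = simple_random_sample(50, 5, seed=1)
print(sample)
```

Drawing without replacement means no member can appear in the sample twice.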
Guided Ex 1: Simple Random Sample
There are 731 students currently enrolled in statistics at your school. You wish to form a sample of eight students to answer some survey questions. Select the students who will belong to the simple random sample.
• Assign numbers 1 to 731 to each student taking statistics.
• On the table of random numbers, choose a starting place at random (suppose you start in the third row, second column.)
Solution: Simple Random Sample
• Read the digits in groups of three
• Ignore numbers greater than 731
The students assigned numbers 719, 662, 650, 4, 53, 589, 403, and 129 would make up the sample.
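The table-reading procedure can be sketched in Python. The digit string below is hypothetical: it is not the textbook's random number table, and was made up so that the walk-through ends with the same eight students. The sketch also skips 000 and repeated numbers, which is an assumption the slide does not state.

```python
def sample_from_digits(digits, population_size, sample_size, group=3):
    """Read digits in fixed-size groups, keeping valid, unrepeated numbers."""
    chosen = []
    for i in range(0, len(digits) - group + 1, group):
        number = int(digits[i:i + group])
        # Ignore 000, numbers above the population size, and repeats.
        if 1 <= number <= population_size and number not in chosen:
            chosen.append(number)
        if len(chosen) == sample_size:
            break
    return chosen

# Hypothetical row of digits (NOT taken from the textbook's table).
digits = "719982662650004875053589403129"
print(sample_from_digits(digits, population_size=731, sample_size=8))
```

In this made-up row, the groups 982 and 875 are ignored because they exceed 731.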
What is a Simulation?
A simulation is a numerical facsimile or representation of a real-world phenomenon.
Exercise 1: Simple Random Samples
Use a random number table to simulate each of the following:
a. Choose the numbers for the next lottery. That is, randomly choose six numbers from 1 to 52.
b. The outcomes of tossing a die 20 times.
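The exercise asks for a random number table, but the same two simulations can be sketched with a software generator. This Python version assumes the six lottery numbers must be distinct and that the die is a fair six-sided die; the seed value is arbitrary.

```python
import random

rng = random.Random(2024)  # arbitrary seed, used only so the run is repeatable

# a. Six distinct lottery numbers from 1 to 52 (sampling without replacement).
lottery_numbers = sorted(rng.sample(range(1, 53), 6))

# b. Twenty tosses of a die (each toss is independent, so values may repeat).
die_tosses = [rng.randint(1, 6) for _ in range(20)]

print("lottery:", lottery_numbers)
print("die tosses:", die_tosses)
```

The lottery draw is made without replacement, while the die tosses are made with replacement; that difference is the point of pairing the two parts.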
Exercise 2: TI-83 Random Number Generation
Exercise 3: Coin Toss Simulation using TI-83
Other Sampling Techniques
Stratified Sample
• Divide a population into groups (strata) and select a random sample from each group.
• To collect a stratified sample of the number of people who live in West Ridge County households, you could divide the households into socioeconomic levels and then randomly select households from each level.
Other Sampling Techniques
Stratified Sample
• We stratify to ensure that our sample represents different groups in the population
• Result: reduced sample variability
• Samples taken within a stratum vary less, so our estimates can be more precise
• Example: If we stratify by sex, we can create the sample so that the proportions of men and women within our sample match the proportions in the population
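One way to make the sample proportions match the population is to draw the same fraction from every stratum. The Python sketch below assumes a hypothetical population of 60 women and 40 men; the function name and the 10% sampling fraction are illustrative.

```python
import random

def stratified_sample(population, stratum_of, fraction, seed=None):
    """Draw the same fraction at random from every stratum."""
    rng = random.Random(seed)
    strata = {}
    for member in population:
        strata.setdefault(stratum_of(member), []).append(member)
    sample = []
    for members in strata.values():
        k = round(len(members) * fraction)  # stratum sample size
        sample.extend(rng.sample(members, k))
    return sample

# Hypothetical population: 60 women ("F") and 40 men ("M").
population = [("F", i) for i in range(60)] + [("M", i) for i in range(40)]
sample = stratified_sample(population, lambda person: person[0], 0.10, seed=0)
print(sample)
```

With a 10% fraction the sample holds 6 women and 4 men, the same 60/40 split as the population.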
Other Sampling Techniques
Cluster Sample
• Divide the population into groups (clusters) and select all of the members in one or more, but not all, of the clusters.
• In the West Ridge County example you could divide the households into clusters according to zip codes, then select all the households in one or more, but not all, zip codes.
Other Sampling Techniques
Cluster Sample
• If each cluster represents the population fairly, cluster sampling will be unbiased.
• Clusters are internally heterogeneous, each resembling the overall population.
• We select clusters to make sampling more practical or affordable.
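A cluster sample picks whole clusters at random and then keeps every member of the chosen clusters. The Python sketch below uses made-up zip code labels and household numbers; none of them come from the West Ridge County example.

```python
import random

def cluster_sample(clusters, n_clusters, seed=None):
    """Choose clusters at random, then include every member of each chosen cluster."""
    rng = random.Random(seed)
    chosen = rng.sample(sorted(clusters), n_clusters)
    sample = [member for name in chosen for member in clusters[name]]
    return chosen, sample

# Hypothetical households grouped by zip code.
clusters = {
    "zip_A": [1, 2, 3],
    "zip_B": [4, 5],
    "zip_C": [6, 7, 8, 9],
    "zip_D": [10],
}
chosen, sample = cluster_sample(clusters, 2, seed=0)
print(chosen, sample)
```

Unlike the stratified sample, randomness enters only when the clusters are chosen; nothing is sampled inside a cluster.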
Non-Random Sampling Technique
Convenience Sample
• One of the main types of non-probability sampling methods
• Made up of people who are easy to reach
• Usually not representative of the population
• Example: A pollster interviews shoppers at a local mall. If the mall was chosen because it was a convenient site from which to solicit survey participants, this would be a convenience sample.
Other Sampling Techniques
Systematic Sample
• Choose a starting value at random. Then choose every kth member of the population.
• In the West Ridge County example you could assign a different number to each household, randomly choose a starting number, then select every 100th household.
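The systematic procedure can be sketched in Python. The count of 1000 households is an assumption for illustration, and the sketch picks the random start among the first k households, which is one common convention rather than something the slide specifies.

```python
import random

def systematic_sample(population, k, seed=None):
    """Pick a random start among the first k members, then take every kth member."""
    rng = random.Random(seed)
    start = rng.randrange(k)  # index 0..k-1
    return population[start::k]

# Hypothetical list of household numbers 1..1000; select every 100th household.
households = list(range(1, 1001))
sample = systematic_sample(households, 100, seed=0)
print(sample)
```

Only the starting point is random; once it is fixed, the rest of the sample is determined.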
Exercise 4: Identifying Sampling Techniques
You are doing a study to determine the opinion of students at your school regarding stem cell research. Identify the sampling technique used:
You divide the student population with respect to majors and randomly select and question some students in each major.
Solution: Stratified sampling (the students are divided into strata (majors) and a sample is selected from each major)
Sampling Terminology
Sampling Frame:
• a list of individuals from which a sample is actually selected
• ideally, should match the population
Example: When doing a phone survey, the sampling frame might be the phone book
Sampling Undercoverage
Undercoverage: the condition resulting from omitting population members from the sampling frame
Example: The phone book might not be representative of all residents of a community
Population
Sampling Frame
Sampling Terminology
Sampling Error is the mismatch between
• measurements taken from samples
• corresponding measurements taken from the respective population
Sampling errors do not represent mistakes!
Nonsampling Error results from
- poor sample design
- sloppy data collection techniques
- bias in questions
Nonsampling errors are inadvertent errors.
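Sampling error, as defined above, can be illustrated with a short Python sketch. The population of 10,000 ages is simulated and purely hypothetical; the point is only that a correctly drawn random sample still gives a mean that differs from the population mean.

```python
import random
from statistics import mean

rng = random.Random(0)  # arbitrary seed so the run is repeatable

# Hypothetical population: 10,000 ages between 18 and 90.
population = [rng.randint(18, 90) for _ in range(10_000)]
parameter = mean(population)          # describes the whole population

sample = rng.sample(population, 100)  # simple random sample of 100
statistic = mean(sample)              # describes the sample only

sampling_error = statistic - parameter
print(f"parameter={parameter:.2f} statistic={statistic:.2f} error={sampling_error:.2f}")
```

Nothing was done incorrectly here, which is why sampling error is not a mistake: it comes from observing only part of the population.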
Multi-Stage Sampling
Multi-stage sampling involves selecting a sample in at least two stages
• Successively smaller groups are created at each stage
• Most surveys conducted by professional polling organizations use some combination of stratified and cluster sampling as well as simple random samples
Example: Multi-Stage Sampling
• Suppose that college freshmen are housed in separate freshman dorms
• Within a freshman dorm, men and women are housed on alternating floors
• You wish to sample their attitudes about the campus food by going to dorms at random, but you are still concerned about possible differences between men and women
• Can you design a suitable sampling plan?
Possible Solution: Multi-Stage Sampling
• Plan: Stratify sample dorms by sex
• Stage 1: Select some freshman dorms at random
• Stage 2: Within each dorm, select floors (i.e., sexes) at random
• Treat each floor as a cluster and interview everyone on that floor
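The two stages can be sketched in Python. The campus below (4 dorms, 4 floors each, 3 residents per floor) is hypothetical, and the sketch follows the stages exactly as listed: it picks floors at random and does not force an equal number of men's and women's floors.

```python
import random

def multistage_sample(dorms, n_dorms, floors_per_dorm, seed=None):
    """Stage 1: random dorms. Stage 2: random floors. Then interview whole floors."""
    rng = random.Random(seed)
    interviewed = []
    for dorm in rng.sample(sorted(dorms), n_dorms):                  # Stage 1
        floors = dorms[dorm]
        for floor in rng.sample(sorted(floors), floors_per_dorm):    # Stage 2
            interviewed.extend(floors[floor])  # each floor is a cluster
    return interviewed

# Hypothetical campus: resident labels look like "d1-f2-r0".
dorms = {
    f"dorm{d}": {floor: [f"d{d}-f{floor}-r{r}" for r in range(3)]
                 for floor in range(1, 5)}
    for d in range(1, 5)
}
sample = multistage_sample(dorms, n_dorms=2, floors_per_dorm=2, seed=0)
print(sample)
```

To guarantee both sexes are represented in each dorm, Stage 2 could instead pick one floor at random from the men's floors and one from the women's floors.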
Exercise 5: Sampling Strategy
In order to try to gauge freshman opinion about the food served on campus, Food Services suggests that you just stand outside the school cafeteria at lunchtime and stop people to ask them questions.
Critical Thinking:
1. What’s wrong with this sampling strategy?
2. Suggest some possible ways to improve the quality of the data.
Section 1.2 Summary
• Describe simple random samples
• Construct a simple random sample using random numbers
• Simulate a random process
• Describe stratified sampling, cluster sampling, systematic sampling, multi-stage and convenience sampling
Section 1.3
Experimental Design
Section 1.3 Objectives
• Discuss what it means to take a census
• Describe simulations, observational studies and experiments
• Identify control groups, placebo effects, completely randomized experiments
• Discuss potential pitfalls that might make your data unreliable
Designing a Statistical Study
In a statistical study, the researcher will
• Take measurements, or survey the population of interest
• Observe or manipulate the samples in some manner
Exercise 1: Designing a Statistical Study
A group of students is interested in knowing if the number of times they can sink a basketball is related to the color of the basketball. The students shoot a series of baskets and record their success using a regulation colored basketball. They then switch to a blue colored basketball and shoot the same series of baskets. A statistical analysis is performed.
What steps do you think might be needed in order to carry out such a statistical study?
Designing a Statistical Study
1. Identify the variable(s) of interest (the focus) and the population of the study.
2. Develop a detailed plan for collecting data. If you use a sample, make sure the sample is representative of the population.
3. Collect the data.
4. Describe the data using descriptive statistics techniques.
5. Interpret the data and make decisions about the population using inferential statistics.
6. Identify any possible errors.
Ways to Produce Data
• Census
- measurements/observations of entire population
• Sampling
- measurements/observations from representative part of population
• Simulation
• Experiment
- impose treatment, then measure variable of interest
• Observational Study
• Survey
Data Collection
Observational study
• A researcher observes and measures characteristics of interest of part of a population.
• Example: Researchers observed and recorded the mouthing behavior on nonfood objects of children up to three years old. (Source: Pediatric Magazine)
Data Collection
Experiment
• A treatment is applied to part of a population and responses are observed.
• Example: An experiment was performed in which diabetics took cinnamon extract daily while a control group took none. After 40 days, the diabetics who had the cinnamon reduced their risk of heart disease while the control group experienced no change. (Source: Diabetes Care)
Data Collection
Simulation
• Uses a mathematical or physical model to reproduce the conditions of a situation or process.
• Computers are often used
• Examples:
Automobile manufacturers use simulations with dummies to study the effects of crashes on humans.
Airlines use simulators to train pilots on different models of aircraft.
Example: Methods of Data Collection
Consider the following statistical studies. Which method of data collection would you use to collect data for each study?
1. A study of the effect of changing flight patterns on the number of airplane accidents.
Solution: Simulation (It is impractical to create this situation)
Example: Methods of Data Collection
2. A study of the effect of eating oatmeal on lowering blood pressure.
Solution: Experiment (Measure the effect of a treatment – eating oatmeal)
Example: Methods of Data Collection
3. A study of how fourth grade students solve a puzzle.
Solution: Observational study (observe and measure certain characteristics of part of a population)
Example: Methods of Data Collection
4. A study of U.S. residents’ approval rating of the U.S. president.
Solution: Survey (Ask “Do you approve of the way the president is handling his job?”)
Data Collection
Survey
• An investigation of one or more characteristics of a population.
• Commonly done by interview, mail, or telephone.
• Example: A survey is conducted on a sample of female physicians to determine whether the primary reason for their career choice is financial stability.
Surveys: Potential Pitfalls
• Non-response
• Truthfulness of Response
• Hidden Bias
• Vague Wording
e.g. “often”, “seldom”, “occasionally”
• Interviewer Influence
Tone of voice, body language, dress
• Voluntary / Non-voluntary response
Exercise 2: Elements of Experimental Design
Identify some activities that must be carried out in order to conduct an experiment:
• Randomly assign subjects to treatments
• Manipulate treatment factors (e.g. amount of medication dispensed)
• Compare responses of subject groups across different treatments
Key Elements of Experimental Design: Control
• Control Group
Used to account for the influence of other known or unknown variables that might be the cause of an underlying response in an experimental group
Control groups frequently receive a dummy treatment
Key Elements of Experimental Design:
Confounding variables
Occur when an experimenter cannot tell the difference between the effects of different factors on a variable.
Example: A coffee shop owner remodels her shop at the same time a nearby mall has its grand opening. If business at the coffee shop increases, it cannot be determined whether it is because of the remodeling or the new mall.
Experimental Design: Lurking Variables
MEASURED VARIABLES                              EXCLUDED
Gasoline (gallons)   Commute Time (minutes)     Traffic Congestion [LURKING]
2                    32                         low
3                    45                         moderate
4                    55                         high
Lurking Variable
• Variable for which no data have been collected
• Variable which has impact on other variables in a study
Key Elements of Experimental Design: Control
• Placebo effect
A subject reacts favorably to a placebo when in fact he or she has been given no medical treatment at all.
Blinding is a technique where the subject does not know whether he or she is receiving a treatment or a placebo.
Double-blind experiment: neither the subject nor the experimenter knows if the subject is receiving a treatment or a placebo.
Key Elements of Experimental Design: Randomization
• Randomization is a process of randomly assigning subjects to different treatment groups.
• Completely randomized design
All subjects are assigned to different treatment groups through random selection.
• Randomized block design
Divide subjects with similar characteristics into blocks, and then within each block, randomly assign subjects to treatment groups.
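A completely randomized design can be sketched in Python as a shuffle followed by dealing subjects into groups. The 20 numbered subjects and the two group labels are assumptions for illustration.

```python
import random

def completely_randomized(subjects, treatments, seed=None):
    """Shuffle all subjects, then deal them into the treatment groups in turn."""
    rng = random.Random(seed)
    shuffled = list(subjects)
    rng.shuffle(shuffled)
    groups = {t: [] for t in treatments}
    for i, subject in enumerate(shuffled):
        groups[treatments[i % len(treatments)]].append(subject)
    return groups

# Hypothetical subjects numbered 1..20, split into two groups.
groups = completely_randomized(range(1, 21), ["treatment", "control"], seed=0)
print(groups)
```

Dealing in turn after the shuffle keeps the group sizes equal while leaving group membership to chance.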
Key Elements of Experimental Design: Randomization
Randomized block design
• An experimenter testing the effects of a new weight loss drink may first divide the subjects into age categories. Then within each age group, randomly assign subjects to either the treatment group or control group.
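The weight loss drink example can be sketched in Python. The twelve subjects, their ages, and the two age categories below are hypothetical; only the idea of randomizing separately inside each age block comes from the slide.

```python
import random

def randomized_block(subjects, block_of, treatments, seed=None):
    """Group subjects into blocks, then randomize treatments within each block."""
    rng = random.Random(seed)
    blocks = {}
    for subject in subjects:
        blocks.setdefault(block_of(subject), []).append(subject)
    assignment = {}
    for members in blocks.values():
        rng.shuffle(members)  # randomness is applied inside the block only
        for i, subject in enumerate(members):
            assignment[subject] = treatments[i % len(treatments)]
    return assignment

def age_group(subject):
    """Hypothetical blocking rule on the subject's age."""
    return "under 40" if subject[1] < 40 else "40 and over"

# Hypothetical subjects as (id, age) pairs.
ages = [22, 25, 29, 31, 35, 38, 44, 47, 52, 58, 63, 67]
subjects = list(enumerate(ages))
assignment = randomized_block(subjects, age_group, ["treatment", "control"], seed=0)
print(assignment)
```

Each age block ends up with equal numbers in the treatment and control groups, so age cannot be confounded with the treatment.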
Key Elements of Experimental Design: Randomization
• Matched Pairs Design
Subjects are paired up according to a similarity. One subject in the pair is randomly selected to receive one treatment while the other subject receives a different treatment.
Example: Subjects exposed to the same toxins at a work site might be paired together
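The random choice inside each pair can be sketched in Python. The subject names and the labels "treatment A" and "treatment B" are made up for illustration, and the pairs are assumed to have been matched already.

```python
import random

def matched_pairs(pairs, seed=None):
    """Within each pair, randomly decide which subject gets which treatment."""
    rng = random.Random(seed)
    assignment = {}
    for first, second in pairs:
        if rng.random() < 0.5:  # a fair coin flip per pair
            first, second = second, first
        assignment[first] = "treatment A"
        assignment[second] = "treatment B"
    return assignment

# Hypothetical pairs of subjects who share a similarity.
pairs = [("Ana", "Ben"), ("Cal", "Dee"), ("Eve", "Fay")]
assignment = matched_pairs(pairs, seed=0)
print(assignment)
```

Every pair contributes one subject to each treatment, so the similarity used for pairing is balanced across treatments.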
Key Elements of Experimental Design: Replication
• Replication is the repetition of an experiment using a large group of subjects.
• Example: To test a vaccine against a strain of influenza, 10,000 people are given the vaccine and another 10,000 people are given a placebo. Because of the sample size, the effectiveness of the vaccine would most likely be observed. We are making two complete re-runs of the experiment!
Example: Experimental Design
A company wants to test the effectiveness of a new gum developed to help people quit smoking. Identify a potential problem with the given experimental design and suggest a way to improve it.
The company identifies one thousand adults who are heavy smokers. The subjects are divided into blocks according to gender. After two months, the female group has a significant number of subjects who have quit smoking.
Solution: Experimental Design
Problem:
The groups are not similar. The new gum may have a greater effect on women than men, or vice versa.
Correction:
The subjects can be divided into blocks according to gender, but then within each block, they must be randomly assigned to be in the treatment group or the control group.
Example: Bad Experimental Design
Goal: Test effectiveness of fertilizer on different plots of land
Issue: There will be no way to know whether the results are attributable to the treatment or soil type
Example: Randomized Block Design
Within each block, use randomness to determine which trees are treated with fertilizer and which trees are not treated
Example: Completely Randomized Design
Use randomness to determine which trees are treated with fertilizer and which are not
Exercise 3: Identify Best Data Gathering Technique
a. Study the effect of stopping the cooling process in a nuclear reactor.
b. Study the amount of time college students taking a full course load would spend watching TV.
c. Study the effect of a calcium supplement given to young girls on bone mass.
d. Study the number of academic clubs that every MATES student participates in
Exercise 4: Analyze a Statistical Study
In 1778 Captain James Cook introduced goats to the Hawaiian Islands. It was later observed that the Silver Sword plant appeared to be less and less common. Botanists suspected the goats to be the cause and conducted a statistical study. They set up stations around the islands with similar climate and soil conditions. Each station consisted of two plots of land, one with a fence around it to keep the goats out.
Identify the
a) Treatment
b) Experimental Group
c) Control Group
Section 1.3 Summary
• Sampling techniques
• Design of an experiment
• Data collection techniques
• What is a census?
• Describe simulations, observational studies and experiments
• Identify control groups, placebo effects, completely randomized experiments
• Discuss potential pitfalls that might make your data unreliable