The pitfalls in Experimentation Summary: Observationsan73773/SlidesClass2.pdfvalues in a group of...
2/18/2009
One pitfall of Experimentation: Lack of realism
Is the treatment appropriate for the response you want to study?
• Is studying the effects of eating red meat on cholesterol values in a group of middle-aged men a realistic way to study factors affecting heart disease in humans?
• What about studying the effects of hair spray on rats to determine what will happen to women with big hair?
1
The pitfalls in Experimentation
• Example 1: Suppose researchers want to determine if the drug Ecstasy causes memory loss. One possible design would be to take a group of volunteers and randomly assign some to take Ecstasy on a regular basis, while the others are given a placebo. Test them periodically to see if the Ecstasy group experiences more memory problems than the placebo group.
• The obvious flaw in this experiment is that it is unethical (and actually also illegal) to administer a dangerous drug like Ecstasy, even if the subjects are volunteers. The only feasible design to seek answers to this particular research question would be an observational study.
2
The pitfalls in Experimentation
• Example 2: Suppose researchers want to
determine whether females wash their hair
more frequently than males.
• It is impossible to assign some subjects to be
female and others male, and so an experiment
is not an option here. Again, an observational
study would be the way to proceed.
3
Summary: Observations
• explanatory variable's values allowed to occur
naturally
• because of the possibility of lurking variables,
it is difficult to establish causation
• if possible, control for suspected lurking
variables by studying groups of similar
individuals separately
• some lurking variables are difficult to control
for; others may not be identified.
4
Summary: Experiments
• explanatory variable's values are controlled by researchers (treatment is imposed)
• randomized assignment to treatments balances lurking variables across the treatment groups on average, including ones nobody has identified.
• making subjects blind avoids the placebo effect.
• making researchers blind avoids conscious or subconscious influences on their subjective assessment of responses.
• a randomized controlled double-blind experiment is generally optimal for establishing causation.
• lack of realism may prevent researchers from generalizing experimental results to real-life situations.
• non-compliance may undermine an experiment. A volunteer sample might solve this problem, at least partially.
• some treatments are impossible, impractical, or unethical to impose.
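Randomized assignment to treatments can be sketched in a few lines. This is a minimal illustration using Python's standard random module, not a procedure taken from the slides; the subject IDs, the group names and the seed are all hypothetical.

```python
import random

def randomize(subjects, treatments, seed=None):
    """Randomly assign subjects to treatments in (nearly) equal-sized groups."""
    rng = random.Random(seed)          # a seed makes the assignment reproducible
    shuffled = list(subjects)
    rng.shuffle(shuffled)              # chance, not the researcher, decides the order
    groups = {t: [] for t in treatments}
    for i, subject in enumerate(shuffled):
        # deal the shuffled subjects out to the treatments in turn
        groups[treatments[i % len(treatments)]].append(subject)
    return groups

# Hypothetical volunteers, for illustration only
volunteers = [f"subject_{i}" for i in range(1, 21)]
groups = randomize(volunteers, ["treatment", "placebo"], seed=0)
for name, members in groups.items():
    print(name, len(members))
```

Because the shuffle decides who gets which treatment, neither the researchers nor the subjects can steer particular kinds of individuals into one group.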
5
Explanatory variables
Hirsch and Johnston from the Smell & Taste Treatment and Research
Foundation in Chicago believe that the presence of floral scent can
improve a person’s learning ability in some situations. To test this
hypothesis, they set up an experiment in which each of 22 subjects
completed 2 sets of three pencil and paper mazes, one set while wearing a
floral-scented mask. Each subject wore a floral-scented mask and an
unscented mask, and the order was randomized. The researchers measured
the length of time it took each subject to complete the sets of mazes.
What is the explanatory variable?
a) The amount of scent.
b) Presence or absence of the floral scent.
c) Time to complete the pencil and paper mazes.
d) Whether the subject was able to complete the mazes quicker while wearing
the floral-scented mask.
6
Response variables
In the previous example what is the response
variable?
a) The amount of scent.
b) Presence or absence of the floral scent.
c) Time to complete the pencil and paper mazes.
d) Whether the subject was able to complete the mazes quicker while wearing the floral-scented mask.
7
Individuals
In the previous example what are the
individuals?
a) The masks (floral-scented or unscented).
b) The 22 subjects.
c) The mazes.
8
Control
In the previous example the researchers
incorporated control/comparison by
a) Giving each subject a floral-scented and an
unscented mask.
b) Randomly assigning half of the subjects to
wear a floral-scented mask only and the other
half to wear the unscented mask only.
c) Giving each subject two sets of mazes.
9
Randomization
In the previous example the researchers
incorporated randomization by
a) Randomly selecting the subjects to participate in the study.
b) Randomly assigning half of the subjects to wear the floral-scented mask and the other half to wear an unscented mask.
c) Randomly assigning the order that each subject receives the floral-scented and unscented masks.
10
Replication
In the previous example the researchers
incorporated replication by
a) Using two masks.
b) Using two sets of mazes.
c) Using three mazes within each set.
d) Using twenty-two subjects.
e) Repeating the entire experiment a second time.
11
Variables
If age affects whether the presence of a floral
scent improves learning ability and was not
included among the variables studied in the
experiment, then age is
a) An explanatory variable.
b) A response variable.
c) A lurking variable.
d) Confounded with floral scent.
12
Experimental design
The experimental design used in the floral scent
example is called a
a) Completely randomized design.
b) Randomized block design.
c) Matched pairs design.
13
Statistical significance
If there is a statistically significant difference between
the average times to complete the mazes while
wearing the floral-scented mask and the unscented
mask, then the difference in average times to
complete the mazes between the floral-scented mask
and the unscented mask is
a) Too large to be due to chance alone.
b) Too small to be due to chance alone.
c) So large that we can reasonably attribute it to chance.
d) So small that it is likely due to chance.
14
Experimental design
An Austrian study investigated whether maintaining a surgery
patient’s body temperature close to normal by heating the
patient during surgery decreases wound infection rates.
Patients included in the study were undergoing colon or rectal
surgery and were randomly assigned to one of two treatment
groups. In the normothermic group, patients’ core
temperatures were maintained near a normal 36.5 degrees
Celsius. In the hypothermic group, patients’ core temperatures
were allowed to decrease to about 34.5 degrees Celsius.
The design is called a
a) Completely randomized design
b) Randomized block design
c) Matched pairs design
15
Experimental design
In the previous experiment involving patients’
temperatures, the patients included both men and
women. If the men and women were
separately assigned to treatments, the design
would be a
a) Completely randomized design
b) Randomized block design
c) Matched pairs design
16
Randomized block design
In a randomized block design, a block contains
a) Individuals that are similar with respect to the characteristic that defines the block.
b) Individuals that are assigned to the same treatment.
c) Individuals that are similar with respect to the characteristic that defines the block and that are assigned to the same treatment.
17
Problems with experiments
A study claims that patients who receive surgery for intestinal cancer live much longer after treatment than patients who are treated without surgery. However, doctors operated only on patients in relatively good condition, so we cannot conclude from this study that surgery lengthens intestinal cancer patients’ lives.
This is an example of
a) Confounding.
b) A lurking variable.
c) A double-blind experiment.
d) The placebo effect.
18
Double-blind experiments
Medical experiments are often double-blind. This means that
a) All individual data are kept confidential.
b) Neither the subject nor the doctor/administrator knows which treatment the subject receives.
c) Doctors are not allowed to decide which treatment a patient will receive; subjects are randomly assigned to treatments.
d) The subjects in the control group receive a placebo treatment.
19
Experiments
An advantage of experiments over observational
studies is
a) An experiment can provide evidence of cause and effect.
b) An experiment can compare two or more groups.
c) An experiment can include explanatory and response variables.
20
Experiments
Which of the following principles of good
experimentation does an observational study
not incorporate?
a) Control or comparison
b) Random assignment to treatments
c) Replication
21 22
The Big Picture of Statistics
Chapter 1, Section 1-2: Types of Data
Chapter 2: Summarizing and Graphing Data
23
Types of Data
• Any data set contains information about some
group of individuals, and the information is
organized in variables.
– Individuals: people, animals or things
– Variables: weight, speed, age, color, gender,
concentration of a certain chemical, distance,
test scores, etc.
24
Variables
Categorical
• Places an individual into one of several categories.
• Examples: gender, color, favorite movie, type of car, religion, etc.
Quantitative
• Takes numerical values (in a unit of measurement) for which arithmetic operations make sense.
• Examples: height, weight, MPG, age, salary, etc.
Note: Quantitative variables are either
• continuous: can take on any numerical value in a range
• discrete: can take on only fixed values
25
Graphical Displays
Categorical
• Pie charts
• Bar graphs
Quantitative
• Dotplots and stemplots: show individual data points; okay for small data sets.
• Histograms and boxplots (details later): better for large data sets.
26
Graphical Displays for Categorical Variables
27
1. Pie Charts
• Pie charts are useful for summarizing a single categorical variable if there are not too many categories.
• A pie chart must include all the categories that make up a whole.
• Use a pie chart when you want to emphasize each category’s relation to the whole.
28
2. Bar Graphs
• A bar graph uses horizontal or vertical rectangular bars, each of which levels off at the appropriate level.
• Bar graphs are useful for summarizing one or two categorical variables and particularly useful for making comparisons when there are two categorical variables.
29
Graphical Displays for Quantitative Variables
30
1. Dotplots
• Work best when you
– Have a relatively small data set
– Want to see (approximately) individual values
– Want to see shape
– Have one group or a small number of groups to compare
31
1. Dotplots
• One Axis: only a horizontal axis
• Scale
– Tick marks with numerical labels
– Equally spaced
• Simply record a dot for each data point above an
appropriate axis.
• If the data value repeats, the dots are piled up at that
location, one dot for each repetition.
32
1. Dotplots
• Example: As part of a study on the effects of
calcium on blood pressure, the following 21
blood pressure readings were recorded:
• 107, 123, 102, 110, 112, 98, 136, 112, 119, 109, 111,
112, 129, 117, 110, 102, 130, 112, 123, 114, 107
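The construction described for dotplots (one dot per data point above a single horizontal axis, with repeated values piled up) can be sketched as plain text. This is only an illustrative sketch in Python applied to the 21 readings above; the function name text_dotplot and the use of "o" as the dot symbol are assumptions, not part of the slides.

```python
from collections import Counter

readings = [107, 123, 102, 110, 112, 98, 136, 112, 119, 109, 111,
            112, 129, 117, 110, 102, 130, 112, 123, 114, 107]

def text_dotplot(values):
    """Return the lines of a text dotplot: stacked dots over a horizontal axis."""
    counts = Counter(values)
    lo, hi = min(values), max(values)
    width = hi - lo + 1                      # one column per whole-number value
    lines = []
    # Draw from the tallest stack down to the axis
    for level in range(max(counts.values()), 0, -1):
        row = "".join("o" if counts[v] >= level else " " for v in range(lo, hi + 1))
        lines.append(row.rstrip())
    lines.append("-" * width)                # the single horizontal axis
    gap = width - len(str(lo)) - len(str(hi))
    lines.append(f"{lo}{' ' * gap}{hi}")     # label the two ends of the scale
    return lines

lines = text_dotplot(readings)
print("\n".join(lines))
```

Each column is one blood pressure value, so the tallest pile marks the most frequent reading.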
33
1. Dotplots
34
1. Dotplots
35
2. Stemplots
• are used for relatively small data sets of
quantitative variables.
• show exact values
36
Stemplot example
• Suppose we examine the following data: 55, 65,
66, 69, 71, 73, 79, 81, 83, 84, 84, 85, 86, 88, 89,
90, and 94
• The stems for these data are 5, 6, 7, 8, and 9
since the data start in the 50’s and end in the
90’s
37
Making the stemplot
5
6
7
8
9
Now, we record the leaves,
the ones digit for each value
38
Making the stemplot
Data set:
55, 65, 66, 69, 71, 73,
79, 81, 83, 84, 84,
85, 86, 88, 89, 90,
and 94
Stemplot:
5 5
6 569
7 139
8 13445689
9 04
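The stem-and-leaf steps above can be sketched in code. This is a minimal sketch assuming two-digit whole numbers, with the tens digit as the stem and the ones digit as the leaf; the function name stemplot is an illustrative choice.

```python
from collections import defaultdict

def stemplot(values):
    """Return stemplot rows: the tens digit is the stem, the ones digit is the leaf."""
    leaves = defaultdict(list)
    for v in sorted(values):                 # sorting puts leaves in increasing order
        leaves[v // 10].append(v % 10)
    rows = []
    # Include every stem from smallest to largest, even if it has no leaves
    for stem in range(min(leaves), max(leaves) + 1):
        row = f"{stem} " + "".join(str(d) for d in leaves.get(stem, []))
        rows.append(row.rstrip())
    return rows

data = [55, 65, 66, 69, 71, 73, 79, 81, 83, 84, 84, 85, 86, 88, 89, 90, 94]
rows = stemplot(data)
print("\n".join(rows))
```

For this data set the rows reproduce the stemplot shown above.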
39
Stemplots with split stem
• Split the stems so that the original stem
becomes two stems
– One for the digits 0, 1, 2, 3, 4 --placed on first
line of the stem
– One for digits 5, 6, 7, 8, 9 --placed on second
line of the stem
40
Making the split stemplot
Data set:
55, 65, 66, 69, 71, 73, 79, 81, 83, 84, 84, 85, 86, 88, 89, 90, and 94
Stemplot:
5 5
6 569
7 139
8 13445689
9 04
Split stemplot:
5
5 5
6
6 569
7 13
7 9
8 1344
8 5689
9 04
9
41
Stemplots to compare distributions
• Back-to-back stemplots
• Example: speed of predators and nonpredators
Predator: 30, 30, 39, 42, 50, 70
Nonpredator: 11, 12, 20, 25, 30, 32, 35, 40, 40, 40, 45, 48
Predator | Stem | Nonpredator
         |  1   | 12
         |  2   | 05
     900 |  3   | 025
       2 |  4   | 00058
       0 |  5   |
         |  6   |
       0 |  7   |
42
Stemplots: Example
• The age of Best Actress Oscar winners:
– 34 34 26 37 42 41 35 31 41 33 30 74 33 49 38 61 21 41 26 80 43 29 33 35 45 49 39 34 26 25 35 33
• To make a stemplot:
1. Separate each observation into a stem and a leaf.
2. Write the stems in a vertical column with the smallest at the top, and draw a vertical line at the right of this column.
3. Go through the data points, and write each leaf in the row to the right of its stem.
4. Rearrange the leaves in increasing order.
43
Stemplots: Example
• The age of Best Actress Oscar winners:
34 34 26 37 42 41 35 31 41 33 30 74 33 49 38 61
21 41 26 80 43 29 33 35 45 49 39 34 26 25 35 33
Split stemplot
2|1
2|56669
3|013333444
3|555789
4|11123
4|599
5|
5|
6|1
6|
7|4
7|
8|0
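A split stemplot like the one above can be generated with a short sketch. The function name split_stemplot is an illustrative choice, and the sketch assumes two-digit whole numbers; it puts leaves 0-4 on a stem's first line and leaves 5-9 on its second, and starts and ends at the first and last non-empty lines.

```python
ages = [34, 34, 26, 37, 42, 41, 35, 31, 41, 33, 30, 74, 33, 49, 38, 61,
        21, 41, 26, 80, 43, 29, 33, 35, 45, 49, 39, 34, 26, 25, 35, 33]

def split_stemplot(values):
    """Return split-stem rows: leaves 0-4 on the first line of a stem, 5-9 on the second."""
    rows = {}
    for v in sorted(values):
        half = 0 if v % 10 <= 4 else 1       # which of the stem's two lines
        rows.setdefault((v // 10, half), []).append(str(v % 10))
    first, last = min(rows), max(rows)       # first and last non-empty lines
    lines = []
    for stem in range(first[0], last[0] + 1):
        for half in (0, 1):
            if first <= (stem, half) <= last:
                lines.append(f"{stem}|" + "".join(rows.get((stem, half), [])))
    return lines

lines = split_stemplot(ages)
print("\n".join(lines))
```

The empty lines for the 50s, the upper 60s and the upper 70s are kept on purpose: the gaps are part of the shape of the distribution.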
44
3. Histograms
• Shows groups of cases as rectangles or bars
• A dot plot with bars
• All bars must be same width
• Bars must be touching
45
Histogram
Work best
– With a large number of values to plot
– Do not need to see individual values exactly
– Want to see general shape
– One distribution or small number of
distributions to examine
46
Histograms
Two Axes
• Horizontal Axis : The variable that you are analyzing
• Vertical Axis: Frequency (or Relative Frequency--
percent)
47
To make a histogram
To make a frequency histogram we
• choose the classes or bins (usually 5-15 bins, depending on the number of items, with equal bin widths)
• tally the data into the classes or bins (put each data point into one and only one bin)
• count the number of items in each bin
• draw each bar with its height proportional to the number of items in its bin
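The tallying step can be sketched as follows. This is an illustrative sketch rather than a prescribed method: the function name tally, the half-open bins [start, start + width) and the choice of five bins of width 10 are assumptions, and the data are the 17 values from the earlier stemplot example.

```python
def tally(values, start, width, n_bins):
    """Count values into n_bins equal-width bins [start + i*width, start + (i+1)*width)."""
    counts = [0] * n_bins
    for v in values:
        i = int((v - start) // width)        # index of the one bin that contains v
        if not 0 <= i < n_bins:
            raise ValueError(f"{v} falls outside the bins")
        counts[i] += 1
    return counts

data = [55, 65, 66, 69, 71, 73, 79, 81, 83, 84, 84, 85, 86, 88, 89, 90, 94]
counts = tally(data, start=50, width=10, n_bins=5)
for i, n in enumerate(counts):
    print(f"{50 + 10 * i}-{59 + 10 * i}: {n}")
```

Half-open bins guarantee that each data point lands in one and only one bin, including values that sit exactly on a boundary.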
48
Note about the bins
How many bins should we use?
• No one right choice
– Too few bins will give a “skyscraper” graph
– Too many bins will give a “pancake” graph
• Neither choice will give a good picture of the
shape of the distribution
• Use your judgment
49 50
Example
• Step 1: Choose the bins (classes)
The data in Table 1.1 range from 17.0 to 44.2, so we decide to use 6 bins:
15.1-20.0
20.1-25.0
25.1-30.0
30.1-35.0
35.1-40.0
40.1-45.0
51
Example
• Step 2: count the individuals in each class
Class Count
15.1-20.0 5
20.1-25.0 21
25.1-30.0 14
30.1-35.0 9
35.1-40.0 1
40.1-45.0 1
52
Example
• Step 3: draw the histogram
– Mark the scale for the percent of the state’s adults
with a college degree (horizontal axis). The scale
runs from 15 to 45.
– The vertical axis contains the scale of counts. The
scale runs from 0-21 (21 is the maximum count)
– Each bar represents a class—the bar height is the
class count
– No space between the bars unless a class is empty
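Step 3 can be sketched as text using the class counts from Step 2. For simplicity this sketch draws the bars sideways (one row per class) instead of upright as described on the slide; the function name bars and the "#" symbol are illustrative choices.

```python
classes = ["15.1-20.0", "20.1-25.0", "25.1-30.0", "30.1-35.0", "35.1-40.0", "40.1-45.0"]
counts = [5, 21, 14, 9, 1, 1]

def bars(labels, heights, symbol="#"):
    """Return one text bar per class; the bar length equals the class count."""
    return [f"{label:>9} | {symbol * n} ({n})" for label, n in zip(labels, heights)]

lines = bars(classes, counts)
print("\n".join(lines))
```

Because the rows are adjacent, the bars touch, and a class with a count of zero would show up as an empty row.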
53 54
Summary: Histograms
• Shows groups of cases as rectangles or bars
• A dot plot with bars
• All bars must be same width
• Bars must be touching
• No one right choice for the bins
– Too few bins will give a “skyscraper” graph
– Too many bins will give a “pancake” graph
• Use your judgment
Graphical Displays for Quantitative
Variables
5 5
6 569
7 139
8 13445689
9 04
Interpreting Graphical Displays
• Once the distribution has been displayed
graphically, we can describe the overall pattern
of the distribution and mention any striking
deviations from that pattern.
Shape
Look for:
• Symmetry/skewness of the distribution
• Peakedness (modality) - the number of peaks
(modes) the distribution has.
Examples of Symmetric Distributions
• Symmetric,
unimodal
distribution (one
peak)
• Example: test scores
Examples of Symmetric Distributions
• Symmetric,
bimodal
distribution (two
peaks)
• Example: life
expectancy in
Europe and Asia
Examples of Symmetric Distributions
• Symmetric,
uniform
distribution (no
peak)
• Example: random
numbers between 1
and 10 generated by
computer
Examples of Skewed Distributions
• Right-skewed distribution
– Example: salary
• Left-skewed distribution
– Example: age of death from natural causes
Center
• The center of the distribution is its midpoint -
the value that divides the distribution so that
approximately half the observations take
smaller values, and approximately half the
observations take larger values.
– Note that from looking at the histogram we can get
only a rough estimate for the center of the
distribution.
Spread
• The spread (also called variability) of the
distribution can be described by the approximate
range covered by the data. From looking at the
histogram we can approximate the smallest
observation (min), and the largest observation
(Max), and thus approximate the range.
• Range=Max-Min.
– (More exact ways of finding measures of spread will
be discussed in the next section.)
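When the raw data are available, the center (midpoint) and the range can be computed exactly instead of being read off a graph. A minimal sketch using Python's standard statistics module, applied to the 17 values from the earlier stemplot example:

```python
import statistics

data = [55, 65, 66, 69, 71, 73, 79, 81, 83, 84, 84, 85, 86, 88, 89, 90, 94]

# Center: the midpoint, with about half the observations on either side
center = statistics.median(data)

# Spread: Range = Max - Min
spread = max(data) - min(data)

print("center:", center)
print("range:", spread)
```

With 17 observations the midpoint is the 9th value in sorted order, 83, and the range is 94 - 55 = 39.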
Outliers
• Outliers are
observations that fall
outside the overall
pattern.
• One high outlier
Example
• Shape: Roughly symmetric
• Center is about 70
• Spread: range=approximate max-approximate min=95-45=50
• Outliers: no outliers
Another example
• Shape: right-skewed
• Center: about 6-7%
• Spread: the range is 27.5 - 2.5 = 25%
Another example
• Shape: neither symmetric nor skewed. There are three clusters.
• Center: about $22,000
• Spread: range = 32,500 - 5,500 = $27,000
• Note: the center and spread are not very useful here.
Individuals vs. variables
Airport administrators take a sample of airline baggage and record the number of bags that weigh more than 75 pounds. What is the individual?
a) Number of bags weighing more than 75 pounds.
b) Average weight of the bags.
c) Each piece of baggage.
d) The airport administrators.
Individuals vs. variables
Airport administrators take a sample of airline baggage and record the number of bags that weigh more than 75 pounds. What is the variable of interest?
a) Number of bags weighing more than 75 pounds.
b) Average weight of the bags.
c) Each piece of baggage.
d) The airport administrators.
Individuals vs. variables
In a study of commuting patterns of people in a large metropolitan area, respondents were asked to report the time they took to travel to their work on a specific day of the week. What is the individual?
a) Travel time.
b) A person.
c) Day of the week.
d) City in which they lived.
Individuals vs. variables
In a study of commuting patterns of people in a large metropolitan area, respondents were asked to report the time they took to travel to their work on a specific day of the week. What is the variable of interest?
a) Travel time.
b) A person.
c) Day of the week.
d) City in which they lived.
Categorical vs. quantitative variables
Would the variable “monthly rainfall in
Michigan” be considered a categorical or
quantitative variable?
a) categorical
b) quantitative
Categorical vs. quantitative variables
If we asked people to report their “weight,”
would that variable be considered a categorical
or quantitative variable?
a) categorical
b) quantitative
Categorical vs. quantitative variables
We then asked people to classify their weight as
underweight, normal, overweight, or obese.
Would this variable now be a categorical or
quantitative variable?
a) categorical
b) quantitative
Categorical vs. quantitative variables
What type of data is produced by the answer choices for this question?
a) categorical
b) quantitative
How many times have you
accessed the Internet this
week?
1) None
2) Once or twice
3) Three or four times
4) More than four times
Graphing
For the Internet access data in the previous
question, what is the BEST method of
displaying the data?
a) bar graph
b) boxplot
c) histogram
d) scatterplot
Stemplots
In the dataset represented by the following stemplot, how many times does the number “28” occur? Leaf unit = 1.0.
a) 0
b) 1
c) 3
d) 4
0 9
1 246999
2 111134567888999
3 000112222345666699
4 001445
5 0014
6 7
7 3
Histograms
Look at the following histogram. How many baseball players report a salary of less than $1,441,000?
a) 50
b) 170
c) 220
d) 350
Histograms
Look at the following histogram for salaries of baseball players. What shape would you say the data take?
a) Bi-modal
b) Left-skewed
c) Right-skewed
d) Symmetric
e) Uniform