Introduction
![Page 1: Introduction](https://reader036.fdocuments.us/reader036/viewer/2022070421/568161b9550346895dd18c65/html5/thumbnails/1.jpg)
Introduction
Population – the entire group of concern
Sample – only a part of the whole
Based on a sample, we’ll make a prediction about the population.
Bad sampling: convenience samples, biased samples, voluntary response samples
Good sampling: simple random sample (SRS)
Inferential Stats: making predictions or inferences about a population based on a sample
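A simple random sample gives every group of n individuals the same chance of being the one chosen. A minimal Python sketch, assuming a hypothetical population of 80 numbered student IDs (not data from the slides); `random.sample` draws without replacement, which is what an SRS requires.

```python
import random

def simple_random_sample(population, n, seed=None):
    """Draw an SRS of size n: every subset of size n is equally likely."""
    rng = random.Random(seed)          # seeded generator so the draw is reproducible
    return rng.sample(population, n)   # sampling without replacement

# Hypothetical population: student IDs 1..80
population = list(range(1, 81))
sample = simple_random_sample(population, 10, seed=1)
print(sample)
```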
![Page 2: Introduction](https://reader036.fdocuments.us/reader036/viewer/2022070421/568161b9550346895dd18c65/html5/thumbnails/2.jpg)
Experiments
Observation – no attempt to influence
Experiment – deliberately imposes some treatment
Basic design principles:
• Control the effects of lurking variables
• Randomize which subject gets which treatment
• Use a large sample size to reduce chance variation
Statistical Significance: An observed effect so big that it would rarely occur just by chance.
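The "randomize" principle can be sketched in a few lines: shuffle the subjects, then deal them out to the treatments in turn. This is a minimal sketch of one possible scheme (a completely randomized design with near-equal group sizes); the subject numbers and the treatment names are hypothetical.

```python
import random

def randomize_treatments(subjects, treatments, seed=None):
    """Randomly assign subjects to treatments in (nearly) equal-sized groups."""
    rng = random.Random(seed)
    shuffled = list(subjects)   # copy so the caller's data is not modified
    rng.shuffle(shuffled)       # chance alone decides who gets which treatment
    groups = {t: [] for t in treatments}
    for i, subject in enumerate(shuffled):
        # deal the shuffled subjects out in turn, like cards
        groups[treatments[i % len(treatments)]].append(subject)
    return groups

# Hypothetical example: 20 subjects, a new treatment vs. a control
groups = randomize_treatments(range(1, 21), ["treatment", "control"], seed=42)
print(groups)
```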
![Page 3: Introduction](https://reader036.fdocuments.us/reader036/viewer/2022070421/568161b9550346895dd18c65/html5/thumbnails/3.jpg)
Picturing Distributions with Graphs
• Individuals – objects described by data – can be
• Variables – characteristics of individuals that are of particular interest – can take different values for different individuals
What makes up any set of data?
![Page 4: Introduction](https://reader036.fdocuments.us/reader036/viewer/2022070421/568161b9550346895dd18c65/html5/thumbnails/4.jpg)
Two kinds of variables
Categorical (Qualitative)
– describes an individual by category or quality
– examples like
Numerical (Quantitative)
– describes an individual by number or quantity
– discrete for variables that are
– continuous for variables that are
– examples like
![Page 5: Introduction](https://reader036.fdocuments.us/reader036/viewer/2022070421/568161b9550346895dd18c65/html5/thumbnails/5.jpg)
Describing Categorical Variables
Tables summarize the data set by
– listing possible categories,
– giving the number of objects in each category,
– or showing the count as a percentage.
Picture the distribution of a categorical variable with
– Pie charts
– Bar graphs
![Page 6: Introduction](https://reader036.fdocuments.us/reader036/viewer/2022070421/568161b9550346895dd18c65/html5/thumbnails/6.jpg)
Pie Charts
The whole is split into slices whose sizes are proportional to each category's count or percentage.
![Page 7: Introduction](https://reader036.fdocuments.us/reader036/viewer/2022070421/568161b9550346895dd18c65/html5/thumbnails/7.jpg)
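Bar Graph
Horizontal line keeps track of the categorical values.
Vertical bars at each value keep track of the count (#) or percentage (%).
[Figure: sample bar graph with categories A, B, C, D, E, F on the horizontal axis; vertical axis in # and %]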
![Page 8: Introduction](https://reader036.fdocuments.us/reader036/viewer/2022070421/568161b9550346895dd18c65/html5/thumbnails/8.jpg)
Example 1
80 AASU students in an Elem. Stats class come from one of four colleges (S & T, Edu, Health, Lib. Arts) or are undeclared. The breakdown of these 80 students is given below.
College                 Count   Percent
Liberal Arts            17
Education               4
Health Professions      32
Science & Technology    23
Undeclared              4
Total                   80
![Page 9: Introduction](https://reader036.fdocuments.us/reader036/viewer/2022070421/568161b9550346895dd18c65/html5/thumbnails/9.jpg)
Ex1 – Pie Chart
College       Count   Percent
Lib Arts      17      21.25%
Edu           4       5%
Health        32      40%
S & T         23      28.75%
Undeclared    4       5%
Total         80      100%
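The percent column is each count divided by the total, and in a pie chart each slice's angle is that same share of 360 degrees. A short sketch using the Example 1 counts; the dictionary layout and variable names are illustrative only.

```python
# Counts from Example 1 (80 AASU students by college)
counts = {
    "Liberal Arts": 17,
    "Education": 4,
    "Health Professions": 32,
    "Science & Technology": 23,
    "Undeclared": 4,
}
total = sum(counts.values())  # 80

# For each college: (percent of all students, pie-slice angle in degrees)
table = {c: (100 * n / total, 360 * n / total) for c, n in counts.items()}

for college, (percent, angle) in table.items():
    print(f"{college:22s} {counts[college]:3d} {percent:6.2f}% {angle:6.1f} deg")
```

Liberal Arts, for instance, gets 17/80 = 21.25% of the students and a 76.5-degree slice.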
![Page 10: Introduction](https://reader036.fdocuments.us/reader036/viewer/2022070421/568161b9550346895dd18c65/html5/thumbnails/10.jpg)
Ex1 – Bar Graph
[Bar graph of the Example 1 percentages: colleges LA, E, H, ST, U on the horizontal axis; % on the vertical axis]
![Page 11: Introduction](https://reader036.fdocuments.us/reader036/viewer/2022070421/568161b9550346895dd18c65/html5/thumbnails/11.jpg)
Describing Quantitative Variables
Tables summarize the data set by
– listing possible intervals (ranges, classes),
– giving the number of individuals in each class,
– or showing the number as a percentage.
Picture the distribution of a quantitative variable with
– Histogram (similar to a bar graph, but now the vertical bars of neighboring classes touch: where one class ends, the next begins)
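Building a histogram table means deciding which class each observation falls in. A minimal sketch, assuming classes of equal width that include their lower endpoint and exclude their upper one (so a value sitting on a boundary goes into the higher class); that boundary convention is an assumption, since textbooks differ. The data are the 11 test scores used later in the median examples.

```python
def class_counts(data, start, width, n_classes):
    """Count observations in [start, start+width), [start+width, start+2*width), ...
    Assumed convention: a value on a boundary goes into the higher class.
    Values outside all classes are ignored."""
    counts = [0] * n_classes
    for x in data:
        k = int((x - start) // width)   # index of the class containing x
        if 0 <= k < n_classes:
            counts[k] += 1
    return counts

scores = [60, 77, 92, 80, 84, 93, 80, 95, 65, 66, 75]  # 11 test scores
counts = class_counts(scores, start=60, width=10, n_classes=4)
n = len(scores)
for k, c in enumerate(counts):
    lo = 60 + 10 * k
    print(f"{lo}-{lo + 10}: {c} ({100 * c / n:.1f}%)")
```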
![Page 12: Introduction](https://reader036.fdocuments.us/reader036/viewer/2022070421/568161b9550346895dd18c65/html5/thumbnails/12.jpg)
Example 2
Consider the ages of the full-time faculty in the math dept. The breakdown of these 19 individuals is given in the table.
Age Class   Count   Percent
20-30       5       26.3%
30-40       3       15.8%
40-50       5       26.3%
50-60       4       21.1%
60-70       2       10.5%
Total       19      100%
[Histogram: age on the horizontal axis (10 to 70); percent on the vertical axis (10 to 30%)]
![Page 13: Introduction](https://reader036.fdocuments.us/reader036/viewer/2022070421/568161b9550346895dd18c65/html5/thumbnails/13.jpg)
Info from histograms
Helps to describe a distribution with
– pattern (shape, center, spread)
– deviations (outliers) from the rest of the data
  • could result from an unusual observation or a typo
– For shape, look at symmetric vs. skewed
![Page 14: Introduction](https://reader036.fdocuments.us/reader036/viewer/2022070421/568161b9550346895dd18c65/html5/thumbnails/14.jpg)
Examples 3 and 4
[Figure: histograms for Examples 3 and 4, with percent on the vertical axes; Example 4's horizontal axis is labeled v]
![Page 15: Introduction](https://reader036.fdocuments.us/reader036/viewer/2022070421/568161b9550346895dd18c65/html5/thumbnails/15.jpg)
Example 4 without outliers
[Figure: histograms of v (horizontal axis 20 to 100, percent on the vertical axis) for Example 4 with and without its outliers]
![Page 16: Introduction](https://reader036.fdocuments.us/reader036/viewer/2022070421/568161b9550346895dd18c65/html5/thumbnails/16.jpg)
Describing Distributions with Numbers
Center: mean, median, mode
Spread: quartiles, standard deviation
There are better ways to describe a quantitative data set than estimating from a graph.
![Page 17: Introduction](https://reader036.fdocuments.us/reader036/viewer/2022070421/568161b9550346895dd18c65/html5/thumbnails/17.jpg)
Center: Mean
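The mean of a data set is the arithmetic average of all the observations.
Given a data set $x_1, x_2, \ldots, x_n$, the mean is
$\bar{x} = \frac{x_1 + x_2 + \cdots + x_n}{n} = \frac{1}{n}\sum_{i=1}^{n} x_i$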
![Page 18: Introduction](https://reader036.fdocuments.us/reader036/viewer/2022070421/568161b9550346895dd18c65/html5/thumbnails/18.jpg)
Mean – Example 1
Your test scores in a Stats Class are: 60, 75, 92, 80
Your mean score is:
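The mean formula translates directly into code: add the observations and divide by how many there are. A minimal sketch applied to the four test scores above.

```python
def mean(data):
    """Arithmetic average: sum of the observations divided by their number."""
    return sum(data) / len(data)

scores = [60, 75, 92, 80]   # test scores from Mean – Example 1
print(mean(scores))          # 307 / 4 → 76.75
```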
![Page 19: Introduction](https://reader036.fdocuments.us/reader036/viewer/2022070421/568161b9550346895dd18c65/html5/thumbnails/19.jpg)
Mean – Example 2
Compare high temperatures in Savannah for July 2010 and July 2011.
July 2010 high temps: 83, 87, 84, …, 97, 100, 92
$\bar{x}_{2010} = \frac{83 + 87 + \cdots + 92}{31}$
July 2011 high temps: 94, 91, 93, …, 97, 99, 99
$\bar{x}_{2011} = \frac{94 + 91 + \cdots + 99}{31}$
![Page 20: Introduction](https://reader036.fdocuments.us/reader036/viewer/2022070421/568161b9550346895dd18c65/html5/thumbnails/20.jpg)
Center: Median
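The median M of a data set is the middle value of all the (ordered) observations.
Given a data set $x_1, x_2, \ldots, x_n$ sorted from smallest to largest: if n is odd, M is the observation in position (n+1)/2; if n is even, M is the average of the two middle observations.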
![Page 21: Introduction](https://reader036.fdocuments.us/reader036/viewer/2022070421/568161b9550346895dd18c65/html5/thumbnails/21.jpg)
Median – Examples 3/4
11 tests: 60, 77, 92, 80, 84, 93, 80, 95, 65, 66, 75
Ordered data set: 60, 65, 66, 75, 77, 80, 80, 84, 92, 93, 95
10 dice rolls: 2, 4, 5, 5, 6, 7, 7, 8, 9, 10
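The odd/even rule for the median can be sketched as follows; the function sorts its input first, since the median is defined on the ordered data. Python's built-in `statistics.median` follows the same rule.

```python
def median(data):
    """Middle value of the ordered data; average of the two middle values if n is even."""
    xs = sorted(data)          # the median needs the ordered data set
    n = len(xs)
    mid = n // 2
    if n % 2 == 1:
        return xs[mid]         # odd n: the observation in position (n+1)/2
    return (xs[mid - 1] + xs[mid]) / 2   # even n: average of the two middle values

tests = [60, 77, 92, 80, 84, 93, 80, 95, 65, 66, 75]
dice = [2, 4, 5, 5, 6, 7, 7, 8, 9, 10]
print(median(tests))  # → 80
print(median(dice))   # → 6.5
```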
![Page 22: Introduction](https://reader036.fdocuments.us/reader036/viewer/2022070421/568161b9550346895dd18c65/html5/thumbnails/22.jpg)
Center: Mode
The mode of a data set is the value that appears the most.
Tests data set: 60, 65, 66, 75, 77, 80, 80, 84, 92, 93, 95
Dice rolls: 2, 4, 5, 5, 6, 7, 7, 8, 9, 10
2010 July High Temps mode:
2011 July High Temps mode:
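A data set can have more than one mode when several values tie for the highest count, as the dice rolls do. Python's `statistics.multimode` (available since Python 3.8) returns every tied value; the July temperature lists are only partly shown above, so this sketch covers just the tests and dice rolls.

```python
from statistics import multimode

tests = [60, 65, 66, 75, 77, 80, 80, 84, 92, 93, 95]
dice = [2, 4, 5, 5, 6, 7, 7, 8, 9, 10]

# multimode returns every value tied for the highest count
print(multimode(tests))  # → [80]
print(multimode(dice))   # → [5, 7]
```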
![Page 23: Introduction](https://reader036.fdocuments.us/reader036/viewer/2022070421/568161b9550346895dd18c65/html5/thumbnails/23.jpg)
Spread: Quartiles
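A measure of center is not useful by itself: are the other observations close to or far from the center?
Take an ordered data set and find:
– M, the median
– Q1, the median of the observations below M
– Q3, the median of the observations above M
– IQR = Q3 - Q1 (interquartile range)
Summary of the data in the “Five-Number Summary”: Min, Q1, M, Q3, Max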
![Page 24: Introduction](https://reader036.fdocuments.us/reader036/viewer/2022070421/568161b9550346895dd18c65/html5/thumbnails/24.jpg)
Quartiles – Example 5
11 tests: 60, 65, 66, 75, 77, 80, 80, 84, 92, 93, 95
5-num-sum:
Visualize the 5-num-sum with a boxplot.
• Draw a rectangle with ends at Q1 and Q3.
• Draw a line in the box for the median.
• Draw lines to the last observations within 1.5 × IQR of the quartiles.
• Observations more than 1.5 × IQR below Q1 or above Q3 are suspected outliers.
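A sketch of the five-number summary and the 1.5 × IQR rule, applied to the Example 5 test scores. It assumes the quartile convention of taking the median of each half and leaving the overall median out of both halves when n is odd; other conventions (for example, the interpolation used by many software packages) can give slightly different quartiles.

```python
def median_sorted(xs):
    """Median of an already-sorted list."""
    n = len(xs)
    mid = n // 2
    return xs[mid] if n % 2 else (xs[mid - 1] + xs[mid]) / 2

def five_number_summary(data):
    """Min, Q1, M, Q3, Max. Q1 and Q3 are the medians of the lower and upper
    halves; when n is odd the overall median is left out of both halves."""
    xs = sorted(data)
    n = len(xs)
    lower = xs[: n // 2]          # observations before the median's position
    upper = xs[(n + 1) // 2 :]    # observations after the median's position
    return xs[0], median_sorted(lower), median_sorted(xs), median_sorted(upper), xs[-1]

def outlier_fences(q1, q3):
    """Observations outside these limits are suspected outliers (1.5 x IQR rule)."""
    iqr = q3 - q1
    return q1 - 1.5 * iqr, q3 + 1.5 * iqr

tests = [60, 65, 66, 75, 77, 80, 80, 84, 92, 93, 95]
mn, q1, m, q3, mx = five_number_summary(tests)
low, high = outlier_fences(q1, q3)
outliers = [x for x in tests if x < low or x > high]
print((mn, q1, m, q3, mx))          # → (60, 66, 80, 92, 95)
print(q3 - q1, (low, high), outliers)  # → 26 (27.0, 131.0) []
```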
![Page 25: Introduction](https://reader036.fdocuments.us/reader036/viewer/2022070421/568161b9550346895dd18c65/html5/thumbnails/25.jpg)
Boxplot – Example 6
5-Num-Sum: 60, ____, 80, ____, 95
• Draw a rectangle with ends at Q1 and Q3
• Draw a line in the box for the median
• Draw lines to the last observations within 1.5 × IQR of the quartiles
• Observations more than 1.5 × IQR below Q1 or above Q3 are suspected outliers
[Boxplot axis: test scores from 50 to 100]
![Page 26: Introduction](https://reader036.fdocuments.us/reader036/viewer/2022070421/568161b9550346895dd18c65/html5/thumbnails/26.jpg)
Boxplot – Example 7
July 2010 5-Num-Sum: 83, 92, 94, 97, 102
2010 IQR = 97 - 92 = 5
July 2011 5-Num-Sum: 84, 91, 95, 98, 99
2011 IQR = 98 - 91 = 7
[Figure: side-by-side boxplots for 2010 and 2011 on a temperature axis from 80 to 105]
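Even with only the five-number summaries, the 1.5 × IQR fences show whether the minimum or maximum would be flagged as a suspected outlier. A short sketch using the Example 7 summaries; since the full temperature lists are not shown, only the two extremes are checked.

```python
def outlier_fences(q1, q3):
    """Lower and upper limits of the 1.5 x IQR rule for suspected outliers."""
    iqr = q3 - q1
    return q1 - 1.5 * iqr, q3 + 1.5 * iqr

# Five-number summaries from Example 7: (Min, Q1, M, Q3, Max)
summaries = {
    2010: (83, 92, 94, 97, 102),
    2011: (84, 91, 95, 98, 99),
}

results = {}
for year, (mn, q1, m, q3, mx) in summaries.items():
    low, high = outlier_fences(q1, q3)
    # Only the extremes are known here, so only Min and Max are checked
    results[year] = [x for x in (mn, mx) if x < low or x > high]
    print(year, "IQR =", q3 - q1, "fences =", (low, high), "flagged =", results[year])
```

For 2010 the lower fence is 92 - 7.5 = 84.5, so the minimum of 83 is a suspected outlier; for 2011 both extremes fall inside the fences.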
![Page 27: Introduction](https://reader036.fdocuments.us/reader036/viewer/2022070421/568161b9550346895dd18c65/html5/thumbnails/27.jpg)
Spread: Standard Deviation
A more common measure of spread (used in conjunction with the mean) is the standard deviation.
A single deviation from the mean looks like $x_i - \bar{x}$.
For every value in a data set, the deviation is either positive, negative, or zero.
Averaging the deviations is no help: when you add them together, you always get 0.
Example 1 data: 60, 75, 92, 80, with $\bar{x} = 76.75$
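A quick check of the "adding to zero" claim for the Example 1 data. The sketch uses `fractions.Fraction` so the arithmetic is exact and the sum is exactly 0 rather than a tiny floating-point remainder.

```python
from fractions import Fraction

data = [60, 75, 92, 80]                  # Example 1 test scores
xbar = Fraction(sum(data), len(data))    # exact mean: 307/4 = 76.75
deviations = [x - xbar for x in data]    # x_i - xbar for each observation

print([float(d) for d in deviations])   # → [-16.75, -1.75, 15.25, 3.25]
print(sum(deviations))                  # → 0
```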
![Page 28: Introduction](https://reader036.fdocuments.us/reader036/viewer/2022070421/568161b9550346895dd18c65/html5/thumbnails/28.jpg)
To deal with this “adding to zero”, we get rid of any negative terms by squaring each deviation.
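A single squared deviation from the mean looks like $(x_i - \bar{x})^2$.
The variance is the average of the squared deviations, computed by dividing by n - 1 rather than n:
$s^2 = \frac{1}{n-1}\sum_{i=1}^{n} (x_i - \bar{x})^2$
n - 1 is called the degrees of freedom: since the deviations add to zero, knowing the first n - 1 deviations automatically sets the last one.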
![Page 29: Introduction](https://reader036.fdocuments.us/reader036/viewer/2022070421/568161b9550346895dd18c65/html5/thumbnails/29.jpg)
The standard deviation is the square root of the variance:
$s = \sqrt{s^2} = \sqrt{\frac{1}{n-1}\sum_{i=1}^{n} (x_i - \bar{x})^2}$
Observations   Deviations   Squared Dev
60
75
92
80
mean = 76.75                s =
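The table above can be filled in mechanically: compute each deviation and its square, add the squares, divide by n - 1, then take the square root. A minimal sketch of that recipe for the Example 1 scores; Python's `statistics.stdev` uses the same n - 1 divisor.

```python
import math

def sample_variance(data):
    """s^2: sum of squared deviations from the mean, divided by n - 1."""
    n = len(data)
    xbar = sum(data) / n
    return sum((x - xbar) ** 2 for x in data) / (n - 1)

def sample_sd(data):
    """s: square root of the sample variance."""
    return math.sqrt(sample_variance(data))

data = [60, 75, 92, 80]          # Example 1 test scores
xbar = sum(data) / len(data)     # 76.75
for x in data:
    d = x - xbar
    print(f"{x:4d} {d:8.2f} {d * d:10.4f}")   # observation, deviation, squared deviation
print(sample_variance(data))     # 526.75 / 3, about 175.58
print(sample_sd(data))           # about 13.25
```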
![Page 30: Introduction](https://reader036.fdocuments.us/reader036/viewer/2022070421/568161b9550346895dd18c65/html5/thumbnails/30.jpg)
When to use what?
For skewed data:
For (nearly) symmetric data:
Outliers have a big impact on mean and std. dev.
Consider two data sets:
Set 1: 1, 1, 3, 5, 10
Set 2: 1, 1, 3, 5, 70
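Comparing the two sets shows the outlier's effect: changing only the largest value from 10 to 70 leaves the median alone but pulls the mean and standard deviation far upward. A short sketch using Python's `statistics` module (`stdev` is the sample standard deviation, dividing by n - 1).

```python
from statistics import mean, median, stdev

set1 = [1, 1, 3, 5, 10]
set2 = [1, 1, 3, 5, 70]   # same as Set 1 except the largest value

for name, data in [("Set 1", set1), ("Set 2", set2)]:
    print(name, "mean =", mean(data), "median =", median(data),
          "s =", round(stdev(data), 2))
```

Set 1 has mean 4, median 3, s ≈ 3.74; Set 2 has mean 16, median 3, s ≈ 30.23.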