Students will understand the definition of mean, median...
Students will understand the definitions of mean, median, mode and standard deviation and be able to calculate these measures for a given set of numbers. Also, students will understand why some measures of central tendency are more accurate than others.
1. Which non-descriptive research method would be the best way to show how brain damage affects people’s ability to form new memories?
a. Case study
b. Correlation
c. Survey
d. Naturalistic observation
2. Jocelyn is interested in a rare form of phobia. She is particularly interested in the factors associated with the development of this phobia. The research method that would be most useful for her is
a. Field study
b. Experiment
c. Case study
d. Naturalistic observation
3. Harvard Business School is famous for teaching MBA students about business by using the case study method. If a Harvard MBA tried to apply knowledge from a case study to a new situation, the MBA should keep in mind that case studies may not be
a. Detailed
b. Unique
c. Specific
d. Representative
4. The scatterplot to the right is most closely representative of which correlation coefficient?
a. +.10
b. -.40
c. +.89
d. +.45
Statistics
• Recording the results from our studies.
• Must use a common language so we all know what we are talking about.
Descriptive Statistics
• Just describes sets of data.
• You might create a frequency distribution.
• Bar graphs or histograms.
Tools for Describing Data
The bar graph is one simple display method, but even this tool can be manipulated.
Our brand of truck is better!
Our brand of truck is not so different…
Why is there a difference in the apparent result?
• The frequency (f) of a particular observation is the number of times the observation occurs in the data.
• The distribution of a variable is the pattern of frequencies of the observation.
• Frequency distributions are generally reported in tables or histograms
• Histogram: A graph that consists of a series of columns, each having a class interval as its base and frequency of occurrence as its height.
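The idea of frequency can be sketched in a few lines of Python. The scores below are hypothetical, invented only to illustrate the counting.

```python
from collections import Counter

# Hypothetical quiz scores, invented for illustration.
scores = [7, 8, 8, 9, 9, 9, 10, 10]

# The frequency (f) of each observation is how many times it occurs.
frequency = Counter(scores)

# Print the distribution as a small table, lowest score first.
for score in sorted(frequency):
    print(f"score {score}: f = {frequency[score]}")
```

The printed table is the frequency distribution; drawing one column per row would turn it into a graph.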
Frequency distribution and histograms
[Figure: histogram of the number of times calling out (vertical axis, 0 to 12) by time of day (horizontal axis: 7-8, 8-9, 9-10, 10-11, 11-12, 12-1, 1-2, 2-3, 3+)]
Histograms vs. bar graphs
• “Histograms look a lot like bar graphs.”
• Think of histograms as "sorting bins." You have one variable, and you sort data by this variable by placing them into "bins."
• Then you count how many pieces of data are in each bin. The height of the rectangle you draw on top of each bin is proportional to the number of pieces in that bin.
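Here is a minimal sketch of the "sorting bins" idea. The tree heights and the 5-meter bin width are assumptions chosen for illustration, not data from the lesson.

```python
from collections import Counter

# Hypothetical tree heights in meters, invented for illustration.
heights = [2.5, 4.0, 6.1, 7.3, 8.8, 11.0, 12.4, 12.9, 14.2, 16.5]
bin_width = 5  # each bin covers 5 meters: 0-5, 5-10, 10-15, ...

# Sort each height into its bin by integer-dividing by the bin width.
counts = Counter(int(h // bin_width) for h in heights)

# Draw a text histogram: one '#' per tree in each bin.
for index in sorted(counts):
    low = index * bin_width
    print(f"{low:>2}-{low + bin_width:<2} m | {'#' * counts[index]}")
```

Each row of '#' marks plays the role of one rectangle: its length is proportional to the number of trees in that bin.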
We want to compare total revenues of five different companies.
Key question: What is the revenue for each company?
Bar graph
We want to compare heights of ten oak trees in a city park.
Key question: What is the height of each tree?
Bar graph
We have measured revenues of several companies. We want to compare numbers of companies that make from 0 to 10,000; from 10,000 to 20,000; from 20,000 to 30,000 and so on.
Key question: How many companies are there in each class of revenues?
Histogram
We have measured several trees in a city park. We want to compare numbers of trees that are from 0 to 5 meters high; from 5 to 10; from 10 to 15 and so on.
Key question: How many trees are there in each class of heights?
Histogram
Bar graph or Histogram? (Both allow you to compare groups.)
SCENARIO:
- You are trying to decide whether to take a class based on how difficult the class is. You decide to use the grades of students who have taken the class previously as a measure of difficulty.
- What are some ways of looking at the data to make your decision?
Measures of Central Tendency
Median: The middle score in a rank-ordered distribution.
If the median score is 85%, would you consider this an easy class?
What if you found out that the grades were 42, 44, 50, 85, 85, 85, 85?
Is median a great measure of central tendency?
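A quick check of the median example, sketched with Python's standard `statistics` module. The variable names are ours.

```python
from statistics import median

grades = [42, 44, 50, 85, 85, 85, 85]  # grades from the example above

# median() sorts the scores and returns the middle one (the 4th of 7 here).
middle = median(grades)
print("median:", middle)

# Count how many students still scored 50 or below.
low_scores = [g for g in grades if g <= 50]
print("scores of 50 or below:", len(low_scores))
```

The median comes out at 85 even though three of the seven students scored 50 or below, which is why the median alone can make a class look easier than it is.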
Measures of Central Tendency
Mode: The most frequently occurring score in a distribution.
If you find a class with a mode of 86 would this be an easy class?
Here are the grades: 14,25,32,45,50,60,86,86.
Is mode a great measure of central tendency?
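The same kind of check works for the mode. This is a minimal sketch using the grades listed above.

```python
from statistics import mode

grades = [14, 25, 32, 45, 50, 60, 86, 86]  # grades from the example above

# mode() returns the most frequently occurring score.
most_common = mode(grades)
print("mode:", most_common)

# Count how many students scored below the mode.
below_mode = [g for g in grades if g < most_common]
print("scores below the mode:", len(below_mode))
```

The mode is 86, yet six of the eight students scored 60 or lower, so the mode says very little about the typical grade here.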
Measures of Central Tendency
Mean: The arithmetic average of scores in a distribution obtained by adding the scores and then dividing by the number of scores that were added together.
You have found a class with a mean of 85 and have decided that this must be an easy class.
The grades were: 70, 70, 100, 100. Would you feel confident that this is an easy class?
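The mean for this class can be worked out directly from the definition. A small sketch:

```python
grades = [70, 70, 100, 100]  # grades from the example above

# Mean: add the scores, then divide by the number of scores.
average = sum(grades) / len(grades)
print("mean:", average)

# Check whether any student actually earned the mean score.
at_average = [g for g in grades if g == average]
print("students who scored exactly the mean:", len(at_average))
```

The mean is 85, but nobody scored 85: half the class earned 70 and half earned 100.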
Measures of Central Tendency
• It is important to always note which measure of central tendency is being reported. If it is a mean, one must consider whether a few atypical scores could be distorting it, or causing a skewed distribution.
• Skewed distribution: When scores don’t distribute themselves evenly around the center. There are a few extremely high or low scores.
Measures of Central Tendency
A Skewed Distribution
Central Tendency
• Mean, Median and Mode.
• Watch out for extreme scores or outliers.
Let’s look at the salaries of the employees at Dunder Mifflin Paper in Scranton:
$25,000 - Pam
$25,000 - Kevin
$25,000 - Angela
$75,000 - Andy
$75,000 - Dwight
$75,000 - Jim
$350,000 - Michael
Measures of central tendency are quick and easy, but outliers may distort the numbers.
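To see the distortion, here is a sketch that computes all three measures for the salaries above, then recomputes the mean without the one extreme salary. Dropping Michael is our own illustration, not a step from the lesson.

```python
from statistics import mean, median, multimode

salaries = {
    "Pam": 25_000, "Kevin": 25_000, "Angela": 25_000,
    "Andy": 75_000, "Dwight": 75_000, "Jim": 75_000,
    "Michael": 350_000,
}
values = list(salaries.values())

print("mean:  ", round(mean(values)))   # pulled upward by the one large salary
print("median:", median(values))        # the middle (4th) salary of the 7
print("modes: ", multimode(values))     # two salaries tie for most frequent

# Illustration only: recompute the mean with the outlier left out.
without_outlier = [v for v in values if v < 350_000]
print("mean without the outlier:", mean(without_outlier))
```

The mean is about $92,857, a salary that six of the seven employees do not come close to earning. The median is $75,000, and without the outlier the mean drops to $50,000.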
Normal Distribution
• In a normal distribution, the mean, median and mode are all the same.
The “Bell Curve”
Distributions
• Outliers skew distributions.
• If a group has a high outlier, the curve has a positive skew (contains more low scores).
• If a group has a low outlier, the curve has a negative skew (contains more high scores).
Measures of variation
• Averages from scores with low variability are more reliable than those with high variability.
• Range: Difference between the highest and lowest scores in a distribution. As with the mean, a single extremely high or low score can produce a deceptively large range.
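A short sketch of how one unusual score changes the range. Both lists of scores are hypothetical.

```python
# Hypothetical test scores, invented for illustration.
scores = [70, 72, 75, 78, 80]

# Range: highest score minus lowest score.
score_range = max(scores) - min(scores)
print("range:", score_range)

# Add a single very low score and compute the range again.
with_outlier = scores + [20]
outlier_range = max(with_outlier) - min(with_outlier)
print("range with one low outlier:", outlier_range)
```

One extra score moves the range from 10 to 60, even though the other five scores have not changed.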
Measures of Variation
Standard Deviation: A computed measure of how much scores vary around the mean. Standard deviation uses information from each score, so it better represents the data.
Standard Deviation
Scores   Score - Mean   (Score - Mean)²
18       -6             36
20       -4             16
24        0              0
25        1              1
33        9             81
Mean: 24                Sum: 134
134 / 5 = 26.8
➢ 26.8 is the “variance.”
➢ Standard deviation is the “square root of the variance” (SD ≈ 5.18).
Variance: Gauges the spread of scores within a sample.
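The steps of the worked example can be sketched in Python as follows. Like the slide, this divides the sum of squares by the number of scores (5); the standard library's `pvariance` does the same, while `variance` would divide by one less.

```python
from math import sqrt
from statistics import pvariance

scores = [18, 20, 24, 25, 33]

# Step 1: the mean.
mean = sum(scores) / len(scores)

# Step 2: each score's deviation from the mean.
deviations = [s - mean for s in scores]

# Step 3: square each deviation and add them up.
squared = [d ** 2 for d in deviations]
sum_of_squares = sum(squared)

# Step 4: variance = sum of squares divided by the number of scores.
variance = sum_of_squares / len(scores)

# Step 5: standard deviation = square root of the variance.
sd = sqrt(variance)

print("mean:", mean)
print("sum of squares:", sum_of_squares)
print("variance:", variance)
print("standard deviation:", round(sd, 2))

# Cross-check against the library's population variance.
print("pvariance:", pvariance(scores))
```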
Normal Curve
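[Figure: normal curve with marks at -3, -2, -1, 0, +1, +2 and +3 standard deviations from the mean. From left to right, the regions hold 0.15% (below -3), 2.35%, 13.5%, 34%, 34%, 13.5%, 2.35% and 0.15% (above +3) of scores.]
- Each mark represents one standard deviation away from the mean.
- The numbers in red are the percentages of people whose scores fall within each region.
- 68% of people will fall within 1 standard deviation of the mean.
- 95% of people will fall within 2 standard deviations of the mean.
- 99.7% of people will fall within 3 standard deviations of the mean.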
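These percentages can be checked with Python's `statistics.NormalDist`. This is a sketch; the `shares` dictionary is our own naming.

```python
from statistics import NormalDist

# A standard normal curve: mean 0, standard deviation 1.
standard = NormalDist(mu=0, sigma=1)

# Share of scores within k standard deviations of the mean:
# area under the curve between -k and +k.
shares = {k: standard.cdf(k) - standard.cdf(-k) for k in (1, 2, 3)}

for k, share in shares.items():
    print(f"within {k} SD of the mean: {share:.1%}")
```

The computed values are slightly more precise than the rounded 68%, 95% and 99.7% on the slide, but they agree to the nearest percent.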
Normal Curve
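[Figure: the same normal curve with the marks relabeled 9, 14, 19, 24, 29, 34 and 39, that is, a mean of 24 with the standard deviation rounded to 5. The regions hold the same percentages as before: 0.15%, 2.35%, 13.5%, 34%, 34%, 13.5%, 2.35% and 0.15%.]
- Using the numbers from our standard deviation exercise, the normal curve would look like this.
- 68% would have scored within one standard deviation of the mean, or between 19 and 29.
- 95% would have scored within two standard deviations, or between 14 and 34.
- 99.7% would have scored within three standard deviations, or between 9 and 39.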
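The marks on this curve come from adding and subtracting whole standard deviations from the mean. A small sketch, assuming the standard deviation is rounded to 5 as the axis labels suggest:

```python
mean = 24
sd = 5  # assumption: the computed standard deviation, rounded to 5

# For k = 1, 2, 3, find the scores k standard deviations below and above the mean.
intervals = {k: (mean - k * sd, mean + k * sd) for k in (1, 2, 3)}

for k, (low, high) in intervals.items():
    print(f"within {k} SD: between {low} and {high}")
```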
9/28/16
1. Create the Normal Curve template with data.
2. A shop foreman found it took 40 minutes to complete a task with a standard deviation of 5, and the times for completing the task are normally distributed. What percentage of workers will take 50 minutes or more to complete the task?
3. The scores from the AP Physics exam had an average of 82, with a standard deviation of 3. People who scored within 2 standard deviations of the mean had a score between ____ and _____
The three curves below represent standard deviations of 1, 2 and 3.
[Figure: three normal curves labeled A, B and C]
Which curve below would represent a standard deviation of 1? How do you know?
Which curve would represent a standard deviation of 3? How do you know?
ESTIMATING VARIANCE
THE GREATER THE VARIANCE IN RESULTS, THE GREATER THE STANDARD DEVIATION.
Standard deviation, the normal curve and baseball.
Weighing the odds…
• 2 High School punters
– Kicker A:
• mean distance: 40.0 yds.
• Standard deviation: ± 16 yds.
– Kicker B:
• mean distance: 34.5 yds.
• Standard deviation: ± 4 yds.
• Which player do you play?
• Kicker B – team will know what to expect
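One way to see why Kicker B is the more predictable choice is to compute the span of one standard deviation on either side of each mean. The sketch below assumes punt distances are roughly normally distributed, which the slide does not state; under that assumption about 68% of kicks land inside each span.

```python
# (mean distance, standard deviation) in yards, from the slide.
kickers = {"A": (40.0, 16.0), "B": (34.5, 4.0)}

# Span covering one standard deviation below and above the mean.
spans = {name: (mean - sd, mean + sd) for name, (mean, sd) in kickers.items()}

for name, (low, high) in spans.items():
    print(f"Kicker {name}: most kicks between {low} and {high} yds "
          f"(a spread of {high - low} yds)")
```

Kicker A's typical kicks run anywhere from 24 to 56 yards, while Kicker B's stay between 30.5 and 38.5 yards.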
Applying the concepts
Try, with the help of this rough drawing below, to describe intelligence test scores at a high school and at a college using the concepts of range and standard deviation.
[Figure: rough drawing of two curves, one labeled “Intelligence test scores at a high school” and one labeled “Intelligence test scores at a college,” with the value 100 marked]
Want to take that class?
• So, if you were told that the mean in a class was 85%, with a standard deviation of 5, would you feel confident that you would get a “B”?
• The smaller the standard deviation, the more closely the scores are packed near the mean, and the steeper the curve would appear.
• What percentage of students got a B or A? 84%
• What would the standard deviation be if every score was the mean score? 0
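The 84% figure can be reproduced with a short sketch. It rests on two assumptions not spelled out on the slide: grades are normally distributed, and a “B” starts at 80%, which is one standard deviation below the mean.

```python
from statistics import NormalDist, pstdev

# Assumption: class grades follow a normal curve with mean 85 and SD 5.
grades = NormalDist(mu=85, sigma=5)

# Assumption: a "B" starts at 80%, one standard deviation below the mean.
b_cutoff = 80

# Share of students at or above the cutoff = area to the right of 80.
share_b_or_better = 1 - grades.cdf(b_cutoff)
print(f"B or better: {share_b_or_better:.0%}")

# If every score equals the mean, no score deviates from it, so the SD is 0.
identical_scores = [85, 85, 85, 85]
print("SD when every score is the mean:", pstdev(identical_scores))
```

The 84% is the 50% of students above the mean plus the 34% who fall between the mean and one standard deviation below it.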
Z-scores
• Sometimes being able to compare scores from different distributions is important.
• Z-scores measure distance of a score from the mean in units of standard deviation.
• Scores below the mean have negative z-scores, and those above have positive z-scores.
• FOR EXAMPLE: Test: Mean = 80 & SD = 8
Phineas got a 72%: z-score of -1
Ferb got an 84%: z-score of +0.5
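The two z-scores in this example can be computed with a one-line function. The function name `z_score` is ours.

```python
def z_score(score, mean, sd):
    """Distance of a score from the mean, in units of standard deviation."""
    return (score - mean) / sd

# Test with Mean = 80 and SD = 8, as in the example.
phineas = z_score(72, mean=80, sd=8)
ferb = z_score(84, mean=80, sd=8)

print("Phineas:", phineas)  # below the mean, so negative
print("Ferb:", ferb)        # above the mean, so positive
```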
Direction of a Z-score
• The sign of any Z-score indicates the direction of a score: whether that observation fell above the mean (the positive direction) or below the mean (the negative direction)
– If a raw score is below the mean, the z-score will be negative, and vice versa
Comparing variables with very different observed units of measure
• Example of comparing an SAT score to an ACT score
– Mary’s ACT score is 26. Jason’s SAT score is 900. Who did better?
– The mean SAT score is 1000 with a standard deviation of 100 SAT points. The mean ACT score is 22 with a standard deviation of 2 ACT points.
Let’s find the z-scores
Z = (Score - Mean) / SD
Jason: Z = (900 - 1000) / 100 = -1
Mary: Z = (26 - 22) / 2 = +2
• From these findings, we gather that Jason’s score is 1 standard deviation below the mean SAT score and Mary’s score is 2 standard deviations above the mean ACT score.
• Therefore, Mary’s score is relatively better.
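As a sketch of what these two z-scores imply, the code below converts each one to a percentile. This step assumes both tests' scores are normally distributed, which goes beyond what the example states.

```python
from statistics import NormalDist

# z = (score - mean) / SD, using the means and SDs given above.
jason_z = (900 - 1000) / 100   # SAT
mary_z = (26 - 22) / 2         # ACT

# Assumption: scores on both tests follow a normal curve.
# cdf(z) gives the share of test takers who scored below that z-score.
standard = NormalDist()  # mean 0, standard deviation 1
jason_percentile = standard.cdf(jason_z)
mary_percentile = standard.cdf(mary_z)

print(f"Jason: z = {jason_z:+.1f}, ahead of about {jason_percentile:.0%} of test takers")
print(f"Mary:  z = {mary_z:+.1f}, ahead of about {mary_percentile:.0%} of test takers")
```

Under that assumption Jason scored higher than roughly 16% of SAT takers and Mary higher than roughly 98% of ACT takers, even though 900 is the bigger raw number.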
Interpreting the graph
• For any normally distributed variable:
– 50% of the scores fall above the mean and 50% fall below.
– Approximately 68% of the scores fall within plus and minus 1 Z-score from the mean.
– Approximately 95% of the scores fall within plus and minus 2 Z-scores from the mean.
– 99.7% of the scores fall within plus and minus 3 Z-scores from the mean.
Z-Score Conclusions
• Z-score is defined as the number of standard deviations from the mean.
• Z-score is useful in comparing variables with very different observed units of measure.
(Like measures of central tendency and variation, z-scores can describe.)
- HOWEVER -
• Z-scores allow for precise predictions to be made of how many of a population’s scores fall within a score range in a normal distribution.
(So they are also inferential, because they can infer what might happen in the future.)
Types of statistics
• Descriptive statistics are used to reveal patterns through the analysis of numeric data.
– Measures of central tendency
– Measures of variation
– Z-scores
• Inferential statistics are used to draw conclusions and make predictions based on the analysis of numeric data.
– Z-scores
– t-tests
– These types of stats help us determine if chance played a role in our findings.
Statistically Significant: a result is called statistically significant if it is unlikely to have occurred by chance.
The “magic number” is p ≤ .05.
This means that, if chance alone were at work, a result at least this extreme would turn up no more than 5% of the time.
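To make "unlikely to have occurred by chance" concrete, here is a rough sketch of an exact permutation test. This is a different technique from the t-test this lesson names, chosen because it needs only counting, and the two groups of scores are hypothetical.

```python
from itertools import combinations

# Hypothetical test scores for two small groups, invented for illustration.
group_a = [78, 82, 85, 88, 91]
group_b = [70, 72, 75, 79, 80]


def mean_gap(first, second):
    """Absolute difference between the means of two groups."""
    return abs(sum(first) / len(first) - sum(second) / len(second))


observed = mean_gap(group_a, group_b)
pooled = group_a + group_b

# If group labels were pure chance, any 5 of the 10 scores could be "group A".
# Try every such split and count how often the gap is at least as large
# as the one actually observed.
extreme = 0
total = 0
for chosen in combinations(range(len(pooled)), len(group_a)):
    first = [pooled[i] for i in chosen]
    second = [pooled[i] for i in range(len(pooled)) if i not in chosen]
    total += 1
    if mean_gap(first, second) >= observed:
        extreme += 1

p_value = extreme / total
print(f"observed gap = {observed:.1f}, p = {p_value:.3f}")
print("statistically significant" if p_value <= 0.05 else "not significant")
```

The p-value here is the share of chance splits that produce a gap as large as the real one, so a small value means chance alone rarely produces such a difference.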
When is a Difference Significant?
Inferential Statistics
• The purpose is to discover whether the finding can be applied to the larger population from which the sample was collected.
p-values
Making Inferences
1. Large, representative samples are better than biased samples.
2. Observations with low variability are more reliable than those with high variability.
3. Many cases that support your data are better than fewer cases.
POINT TO REMEMBER: Don’t be overly impressed by a few anecdotes. Generalizations based on a few unrepresentative cases are unreliable.
When is an Observed Difference Reliable?