Sociology 5811: Lecture 3: Measures of Central Tendency and Dispersion Copyright © 2005 by Evan...
![Page 1: Sociology 5811: Lecture 3: Measures of Central Tendency and Dispersion Copyright © 2005 by Evan Schofer Do not copy or distribute without permission.](https://reader036.fdocuments.us/reader036/viewer/2022070412/56649f455503460f94c66f5c/html5/thumbnails/1.jpg)
Sociology 5811: Lecture 3: Measures of Central Tendency and Dispersion
Copyright © 2005 by Evan Schofer
Do not copy or distribute without permission
![Page 2: Sociology 5811: Lecture 3: Measures of Central Tendency and Dispersion Copyright © 2005 by Evan Schofer Do not copy or distribute without permission.](https://reader036.fdocuments.us/reader036/viewer/2022070412/56649f455503460f94c66f5c/html5/thumbnails/2.jpg)
Announcements
• First math problem set will be handed out in Lab on Monday…
• Due September 20
Today’s Class:
• The Mean (and relevant mathematical notation)
• Measures of Dispersion
![Page 3: Sociology 5811: Lecture 3: Measures of Central Tendency and Dispersion Copyright © 2005 by Evan Schofer Do not copy or distribute without permission.](https://reader036.fdocuments.us/reader036/viewer/2022070412/56649f455503460f94c66f5c/html5/thumbnails/3.jpg)
Review: Variables / Notation
• Each column of a dataset is considered a variable
• We’ll refer to a column generically as “Y”

| Person # | Guns owned (the variable “Y”) |
|---|---|
| 1 | 0 |
| 2 | 3 |
| 3 | 0 |
| 4 | 1 |
| 5 | 1 |

Note: The total number of cases in the dataset is referred to as “N”. Here, N = 5.
![Page 4: Sociology 5811: Lecture 3: Measures of Central Tendency and Dispersion Copyright © 2005 by Evan Schofer Do not copy or distribute without permission.](https://reader036.fdocuments.us/reader036/viewer/2022070412/56649f455503460f94c66f5c/html5/thumbnails/4.jpg)
Equation of Mean: Notation
• Each case can be identified by a subscript
• $Y_i$ represents the “ith” case of variable Y
• i goes from 1 to N
• $Y_1$ = value of Y for first case in spreadsheet
• $Y_2$ = value for second case, etc.
• $Y_N$ = value for last case

| Person # | Guns owned (Y) |
|---|---|
| 1 | $Y_1 = 0$ |
| 2 | $Y_2 = 3$ |
| 3 | $Y_3 = 0$ |
| 4 | $Y_4 = 1$ |
| 5 | $Y_5 = 1$ |
![Page 5: Sociology 5811: Lecture 3: Measures of Central Tendency and Dispersion Copyright © 2005 by Evan Schofer Do not copy or distribute without permission.](https://reader036.fdocuments.us/reader036/viewer/2022070412/56649f455503460f94c66f5c/html5/thumbnails/5.jpg)
Calculating the Mean
• Equation:

$$\bar{Y} = \frac{1}{N}\sum_{i=1}^{N} Y_i$$

• 1. Mean of variable Y represented by Y with a line on top – called “Y-bar”
• 2. Equals sign means equals: “is calculated by the following…”
• 3. N refers to the total number of cases for which there is data
• Summation (Σ) – will be explained next…
![Page 6: Sociology 5811: Lecture 3: Measures of Central Tendency and Dispersion Copyright © 2005 by Evan Schofer Do not copy or distribute without permission.](https://reader036.fdocuments.us/reader036/viewer/2022070412/56649f455503460f94c66f5c/html5/thumbnails/6.jpg)
Equation of Mean: Summation
• Sigma (Σ): Summation
  – Indicates that you should add up a series of numbers

$$\sum_{i=1}^{N} Y_i$$

• The thing on the right ($Y_i$) is the item to be added repeatedly
• The things on top and bottom tell you how many times to add up Y-sub-i… AND what numbers to substitute for i.
![Page 7: Sociology 5811: Lecture 3: Measures of Central Tendency and Dispersion Copyright © 2005 by Evan Schofer Do not copy or distribute without permission.](https://reader036.fdocuments.us/reader036/viewer/2022070412/56649f455503460f94c66f5c/html5/thumbnails/7.jpg)
Equation of Mean: Summation
• 1. Start with bottom: i = 1.
  – The first number to add is Y-sub-1
• 2. Then, allow i to increase by 1
  – The second number to add is Y-sub-2 (i = 2), then Y-sub-3 (i = 3)
• 3. Keep adding numbers until i = N
  – In this case N = 5, so stop at 5

$$\sum_{i=1}^{N} Y_i = Y_1 + Y_2 + Y_3 + Y_4 + Y_5$$
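
The three summation steps can also be read as a loop. The following is a minimal Python sketch, not part of the original lecture; the function name `summation` is made up for illustration, and the data are the gun-ownership values from the earlier table. Python lists start counting at 0, so Y-sub-i is stored at position i - 1.

```python
# Guns owned by each of the N = 5 people in the lecture's example.
Y = [0, 3, 0, 1, 1]
N = len(Y)


def summation(values):
    """Add up Y-sub-i for i = 1, 2, ..., N, one term at a time."""
    total = 0
    for i in range(1, len(values) + 1):  # step 1 and 2: i starts at 1, rises by 1
        total += values[i - 1]           # Y-sub-i lives at list position i - 1
    return total                         # step 3: the loop stops after i = N


total = summation(Y)
print(total)  # 0 + 3 + 0 + 1 + 1
```

In everyday code the built-in `sum(Y)` does the same thing; the explicit loop is only there to mirror the notation.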
![Page 8: Sociology 5811: Lecture 3: Measures of Central Tendency and Dispersion Copyright © 2005 by Evan Schofer Do not copy or distribute without permission.](https://reader036.fdocuments.us/reader036/viewer/2022070412/56649f455503460f94c66f5c/html5/thumbnails/8.jpg)
Equation of the Mean: Example 2
• Can you calculate mean for gun ownership?
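
| Person # | Guns owned (Y) |
|---|---|
| 1 | $Y_1 = 0$ |
| 2 | $Y_2 = 3$ |
| 3 | $Y_3 = 0$ |
| 4 | $Y_4 = 1$ |
| 5 | $Y_5 = 1$ |

$$\bar{Y} = \frac{1}{N}\sum_{i=1}^{N} Y_i$$

• Answer:

$$\bar{Y} = \frac{1}{5}(0 + 3 + 0 + 1 + 1) = \frac{1}{5}(5) = 1$$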
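
As a cross-check on the hand calculation, here is a short Python sketch (an illustration, not part of the original slides). The helper name `mean` is an assumption; `statistics.mean` is the standard-library function and should agree with it.

```python
import statistics

Y = [0, 3, 0, 1, 1]  # guns owned, from the lecture's table


def mean(values):
    """Y-bar: the sum of all cases divided by the number of cases N."""
    return sum(values) / len(values)


y_bar = mean(Y)
print(y_bar)
print(statistics.mean(Y))  # standard-library cross-check
```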
![Page 9: Sociology 5811: Lecture 3: Measures of Central Tendency and Dispersion Copyright © 2005 by Evan Schofer Do not copy or distribute without permission.](https://reader036.fdocuments.us/reader036/viewer/2022070412/56649f455503460f94c66f5c/html5/thumbnails/9.jpg)
Properties of the Mean
• The mean takes into account the value of every case to determine what is “typical”
  – In contrast to the mode & median
  – Probably the most commonly used measure of “central tendency”
    • But, it is often good to look at median & mode also!
• Disadvantages
  – Every case influences outcome… even unusual ones
  – Extreme cases affect results a lot
  – The mean doesn’t give you any information on the shape of the distribution
    • Cases could be very spread out, or very tightly clustered
![Page 10: Sociology 5811: Lecture 3: Measures of Central Tendency and Dispersion Copyright © 2005 by Evan Schofer Do not copy or distribute without permission.](https://reader036.fdocuments.us/reader036/viewer/2022070412/56649f455503460f94c66f5c/html5/thumbnails/10.jpg)
The Mean and Extreme Values
• Extreme values affect the mean a lot:

| Case | Num CDs | Num CDs 2 |
|---|---|---|
| 1 | 20 | 20 |
| 2 | 40 | 40 |
| 3 | 0 | 0 |
| 4 | 70 | 1000 |
| Mean | 32.5 | 265 |

Changing this one case (case 4, from 70 to 1000) really affects the mean a lot.
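
To see the effect numerically, the sketch below (an illustrative addition, not from the original slides) computes the mean and the median for both columns of the table using Python's standard `statistics` module. The variable names are made up for this example.

```python
import statistics

cds_original = [20, 40, 0, 70]
cds_extreme = [20, 40, 0, 1000]  # same data, but case 4 changed to 1000

for label, data in [("original", cds_original), ("extreme", cds_extreme)]:
    # The mean uses every value; the median only uses the middle of the ordering.
    print(label, statistics.mean(data), statistics.median(data))
```

The mean jumps from 32.5 to 265, while the median is 30 in both columns, which is one reason it is worth looking at the median alongside the mean.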
![Page 11: Sociology 5811: Lecture 3: Measures of Central Tendency and Dispersion Copyright © 2005 by Evan Schofer Do not copy or distribute without permission.](https://reader036.fdocuments.us/reader036/viewer/2022070412/56649f455503460f94c66f5c/html5/thumbnails/11.jpg)
Example 1
[Histogram: Number of CDs (Group 1). x-axis: number of CDs (0 to 200); y-axis: number of cases. Mean = 101, Std. Dev = 21.72, N = 23]
• And, very different groups can have nearly the same mean:
![Page 12: Sociology 5811: Lecture 3: Measures of Central Tendency and Dispersion Copyright © 2005 by Evan Schofer Do not copy or distribute without permission.](https://reader036.fdocuments.us/reader036/viewer/2022070412/56649f455503460f94c66f5c/html5/thumbnails/12.jpg)
Example 2
[Histogram: Number of CDs (Group 2). x-axis: number of CDs (0 to 200); y-axis: number of cases. Mean = 100.0, Std. Dev = 67.62, N = 23]
![Page 13: Sociology 5811: Lecture 3: Measures of Central Tendency and Dispersion Copyright © 2005 by Evan Schofer Do not copy or distribute without permission.](https://reader036.fdocuments.us/reader036/viewer/2022070412/56649f455503460f94c66f5c/html5/thumbnails/13.jpg)
Example 3
[Histogram: Number of CDs (Group 3). x-axis: number of CDs (0 to 200); y-axis: number of cases. Mean = 104, Std. Dev = 102.15, N = 23]
![Page 14: Sociology 5811: Lecture 3: Measures of Central Tendency and Dispersion Copyright © 2005 by Evan Schofer Do not copy or distribute without permission.](https://reader036.fdocuments.us/reader036/viewer/2022070412/56649f455503460f94c66f5c/html5/thumbnails/14.jpg)
Interpreting Dispersion
• Question: What are possible social interpretations of the different distributions (all with roughly the same mean, about 100)?
• Example 1: Individuals cluster around 100
• Example 2: Individuals distributed sporadically over range 0-200
• Example 3: Individuals in two groups – near zero and near 200
![Page 15: Sociology 5811: Lecture 3: Measures of Central Tendency and Dispersion Copyright © 2005 by Evan Schofer Do not copy or distribute without permission.](https://reader036.fdocuments.us/reader036/viewer/2022070412/56649f455503460f94c66f5c/html5/thumbnails/15.jpg)
Measures of Dispersion
• Remember: Goal is to understand your variable…
• Center of the distribution is only part of the story
• Important issue:
• How “spread out” are the cases around the mean?
  – How “dispersed”, “varied” are your cases?
  – Are most cases like the “typical” case? Or not?
![Page 16: Sociology 5811: Lecture 3: Measures of Central Tendency and Dispersion Copyright © 2005 by Evan Schofer Do not copy or distribute without permission.](https://reader036.fdocuments.us/reader036/viewer/2022070412/56649f455503460f94c66f5c/html5/thumbnails/16.jpg)
Measures of Dispersion
• Some measures of dispersion:
• 1. Range
  – Also related: Minimum and Maximum
• 2. Average absolute deviation
• 3. Variance
• 4. Standard deviation
![Page 17: Sociology 5811: Lecture 3: Measures of Central Tendency and Dispersion Copyright © 2005 by Evan Schofer Do not copy or distribute without permission.](https://reader036.fdocuments.us/reader036/viewer/2022070412/56649f455503460f94c66f5c/html5/thumbnails/17.jpg)
Minimum and Maximum
• Minimum: the lowest value of a variable represented in your data
• Maximum: the highest value of a variable represented in your data
• Example: In previous histograms about number of CDs owned, the minimum was 0, the maximum was 200.
![Page 18: Sociology 5811: Lecture 3: Measures of Central Tendency and Dispersion Copyright © 2005 by Evan Schofer Do not copy or distribute without permission.](https://reader036.fdocuments.us/reader036/viewer/2022070412/56649f455503460f94c66f5c/html5/thumbnails/18.jpg)
The Range
• The Range is calculated as the maximum minus the minimum
  – In case of CD ownership, 200 - 0 = 200
• Advantage:
  – Easy
• Disadvantage:
  – 1. Easily influenced by extreme values… may not be representative
  – 2. Doesn’t tell you anything about the middle cases
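
A minimal Python sketch of the range calculation follows. It is an illustration only: the raw data behind the histograms are not given, so it uses the small four-case CD example from the "extreme values" table instead, and the helper name `value_range` is made up.

```python
def value_range(values):
    """Range = maximum minus minimum."""
    return max(values) - min(values)


cds_original = [20, 40, 0, 70]
cds_extreme = [20, 40, 0, 1000]  # one extreme case

print(min(cds_original), max(cds_original), value_range(cds_original))
print(min(cds_extreme), max(cds_extreme), value_range(cds_extreme))
```

A single extreme case moves the range from 70 to 1000, even though the other three cases are unchanged, which illustrates the first disadvantage listed above.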
![Page 19: Sociology 5811: Lecture 3: Measures of Central Tendency and Dispersion Copyright © 2005 by Evan Schofer Do not copy or distribute without permission.](https://reader036.fdocuments.us/reader036/viewer/2022070412/56649f455503460f94c66f5c/html5/thumbnails/19.jpg)
The Idea of Deviation
• Deviation: How much a particular case differs from the mean of all cases
• Deviation of zero indicates the case has the same value as the mean of all cases
  – Positive deviation: case has higher value than mean
  – Negative deviation: case has lower value than mean
• Large positive or negative deviations indicate cases farther from the mean.
![Page 20: Sociology 5811: Lecture 3: Measures of Central Tendency and Dispersion Copyright © 2005 by Evan Schofer Do not copy or distribute without permission.](https://reader036.fdocuments.us/reader036/viewer/2022070412/56649f455503460f94c66f5c/html5/thumbnails/20.jpg)
Deviation of a Case
• Formula: $d_i = Y_i - \bar{Y}$
• Literally, it is the distance from the mean (Y-bar)
![Page 21: Sociology 5811: Lecture 3: Measures of Central Tendency and Dispersion Copyright © 2005 by Evan Schofer Do not copy or distribute without permission.](https://reader036.fdocuments.us/reader036/viewer/2022070412/56649f455503460f94c66f5c/html5/thumbnails/21.jpg)
Deviation Example

| Case | Num CDs | Deviation from mean (32.5) |
|---|---|---|
| 1 | 20 | -12.5 |
| 2 | 40 | 7.5 |
| 3 | 0 | -32.5 |
| 4 | 70 | 37.5 |
![Page 22: Sociology 5811: Lecture 3: Measures of Central Tendency and Dispersion Copyright © 2005 by Evan Schofer Do not copy or distribute without permission.](https://reader036.fdocuments.us/reader036/viewer/2022070412/56649f455503460f94c66f5c/html5/thumbnails/22.jpg)
Turning the Deviation into a Useful Measure of Dispersion
• Idea #1: Add it all up
  – The sum of deviation for all cases:

$$\sum_{i=1}^{N} d_i$$

• What is sum of the following? -12.5, 7.5, -32.5, 37.5
• Problem: Sum of deviation is always zero
  – Because mean is the exact center of all cases
  – Positive and negative deviations cancel out exactly
  – Conclusion: You can’t measure dispersion this way
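
Why the sum is always zero can be shown in one line. This short derivation is an added illustration; it uses only the definitions of the deviation, $d_i = Y_i - \bar{Y}$, and of the mean, which implies $\sum_{i=1}^{N} Y_i = N\bar{Y}$:

$$\sum_{i=1}^{N} d_i = \sum_{i=1}^{N} (Y_i - \bar{Y}) = \sum_{i=1}^{N} Y_i - N\bar{Y} = N\bar{Y} - N\bar{Y} = 0$$

For the four CD cases: $-12.5 + 7.5 - 32.5 + 37.5 = 0$.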
![Page 23: Sociology 5811: Lecture 3: Measures of Central Tendency and Dispersion Copyright © 2005 by Evan Schofer Do not copy or distribute without permission.](https://reader036.fdocuments.us/reader036/viewer/2022070412/56649f455503460f94c66f5c/html5/thumbnails/23.jpg)
Turning the Deviation into a Useful Measure of Dispersion
• Idea #2: Sum up “absolute value” of deviation
  – Absolute value makes negative values positive
  – Designated by vertical bars:

$$\sum_{i=1}^{N} |d_i|$$

• What is sum? -12.5, 7.5, -32.5, 37.5
• Answer: 90
  – In total, these 4 cases deviate by 90 CDs from the mean
• Problem: Sum of Absolute Deviation grows larger if you have more cases…
  – Doesn’t allow comparison across samples
![Page 24: Sociology 5811: Lecture 3: Measures of Central Tendency and Dispersion Copyright © 2005 by Evan Schofer Do not copy or distribute without permission.](https://reader036.fdocuments.us/reader036/viewer/2022070412/56649f455503460f94c66f5c/html5/thumbnails/24.jpg)
Turning the Deviation into a Useful Measure of Dispersion
• Idea #3: The Average Absolute Deviation
  – Calculate the sum, divide by total N of cases
  – Gives the average size of a case’s deviation
• Formula:

$$AAD = \frac{\sum_{i=1}^{N} |d_i|}{N} = \frac{\sum_{i=1}^{N} |Y_i - \bar{Y}|}{N}$$
![Page 25: Sociology 5811: Lecture 3: Measures of Central Tendency and Dispersion Copyright © 2005 by Evan Schofer Do not copy or distribute without permission.](https://reader036.fdocuments.us/reader036/viewer/2022070412/56649f455503460f94c66f5c/html5/thumbnails/25.jpg)
Turning the Deviation into a Useful Measure of Dispersion
• Digression: Here we have used the mean to determine “typical” size of case deviations
  – Originally, I introduced the mean as a way to analyze actual case values (e.g. # of CDs owned)
  – Now: Instead of looking at typical case values, we want to know what sort of deviation is typical
    • In other words, a statistic, the mean, is being used to analyze another statistic: a deviation
  – This is a general principle that we will use often: statistics can help us understand our raw data and also further summarize our statistical calculations!
![Page 26: Sociology 5811: Lecture 3: Measures of Central Tendency and Dispersion Copyright © 2005 by Evan Schofer Do not copy or distribute without permission.](https://reader036.fdocuments.us/reader036/viewer/2022070412/56649f455503460f94c66f5c/html5/thumbnails/26.jpg)
Average Absolute Deviation
• Example: Total Deviation = 90, N = 4
  – What is Average absolute deviation?
  – Answer: 90 / 4 = 22.5
• Advantages
  – Very intuitive interpretation:
    • Tells you how much cases differ from the mean, on average
• Disadvantages
  – Has non-ideal properties, according to statisticians
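
The AAD calculation can be sketched in a few lines of Python. This is an illustrative addition rather than part of the lecture; the function name `average_absolute_deviation` is made up, and the data are the four CD cases used in the deviation example.

```python
cds = [20, 40, 0, 70]


def average_absolute_deviation(values):
    """Mean of |Y_i - Y-bar| over all N cases."""
    n = len(values)
    y_bar = sum(values) / n
    abs_devs = [abs(y - y_bar) for y in values]  # drop the signs
    return sum(abs_devs) / n                     # divide by N, not N - 1


y_bar = sum(cds) / len(cds)
total_abs_dev = sum(abs(y - y_bar) for y in cds)
aad = average_absolute_deviation(cds)
print(total_abs_dev, aad)
```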
![Page 27: Sociology 5811: Lecture 3: Measures of Central Tendency and Dispersion Copyright © 2005 by Evan Schofer Do not copy or distribute without permission.](https://reader036.fdocuments.us/reader036/viewer/2022070412/56649f455503460f94c66f5c/html5/thumbnails/27.jpg)
Turning the Deviation into a Useful Measure of Dispersion
• Idea #4: Square the deviation to avoid problem of negative values
  – Sum of “squared” deviation
  – Divide by “N-1” (instead of N) to get the average
• Result: The “variance”:

$$s_Y^2 = \frac{\sum_{i=1}^{N} d_i^2}{N-1} = \frac{\sum_{i=1}^{N} (Y_i - \bar{Y})^2}{N-1}$$
![Page 28: Sociology 5811: Lecture 3: Measures of Central Tendency and Dispersion Copyright © 2005 by Evan Schofer Do not copy or distribute without permission.](https://reader036.fdocuments.us/reader036/viewer/2022070412/56649f455503460f94c66f5c/html5/thumbnails/28.jpg)
Calculating the Variance 1

| Case | Num CDs (Y) |
|---|---|
| 1 | 20 |
| 2 | 40 |
| 3 | 0 |
| 4 | 70 |
![Page 29: Sociology 5811: Lecture 3: Measures of Central Tendency and Dispersion Copyright © 2005 by Evan Schofer Do not copy or distribute without permission.](https://reader036.fdocuments.us/reader036/viewer/2022070412/56649f455503460f94c66f5c/html5/thumbnails/29.jpg)
Calculating the Variance 2

| Case | Num CDs (Y) | Mean (Y bar) |
|---|---|---|
| 1 | 20 | 32.5 |
| 2 | 40 | 32.5 |
| 3 | 0 | 32.5 |
| 4 | 70 | 32.5 |
![Page 30: Sociology 5811: Lecture 3: Measures of Central Tendency and Dispersion Copyright © 2005 by Evan Schofer Do not copy or distribute without permission.](https://reader036.fdocuments.us/reader036/viewer/2022070412/56649f455503460f94c66f5c/html5/thumbnails/30.jpg)
Calculating the Variance 3

| Case | Num CDs (Y) | Mean (Y bar) | Deviation (d) |
|---|---|---|---|
| 1 | 20 | 32.5 | -12.5 |
| 2 | 40 | 32.5 | 7.5 |
| 3 | 0 | 32.5 | -32.5 |
| 4 | 70 | 32.5 | 37.5 |
![Page 31: Sociology 5811: Lecture 3: Measures of Central Tendency and Dispersion Copyright © 2005 by Evan Schofer Do not copy or distribute without permission.](https://reader036.fdocuments.us/reader036/viewer/2022070412/56649f455503460f94c66f5c/html5/thumbnails/31.jpg)
Calculating the Variance 4

| Case | Num CDs (Y) | Mean (Y bar) | Deviation (d) | Squared Deviation (d²) |
|---|---|---|---|---|
| 1 | 20 | 32.5 | -12.5 | 156.25 |
| 2 | 40 | 32.5 | 7.5 | 56.25 |
| 3 | 0 | 32.5 | -32.5 | 1056.25 |
| 4 | 70 | 32.5 | 37.5 | 1406.25 |

![Page 32: Sociology 5811: Lecture 3: Measures of Central Tendency and Dispersion Copyright © 2005 by Evan Schofer Do not copy or distribute without permission.](https://reader036.fdocuments.us/reader036/viewer/2022070412/56649f455503460f94c66f5c/html5/thumbnails/32.jpg)
Calculating the Variance 5
• Variance = Average of “squared deviation”
  – Average = mean = sum up, divide by N
  – In this case, use N-1
• Sum of 156.25 + 56.25 + 1056.25 + 1406.25 = 2675
• Divide by N-1
  – N-1 = 4-1 = 3
• Compute variance:
• 2675 / 3 = 891.7 = variance = s²
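
The five steps can be retraced in Python as a check on the arithmetic. This sketch is an added illustration with made-up variable names; `statistics.variance` is the standard-library sample variance, which also divides by N - 1. Note that (-12.5)² is 156.25, so the squared deviations sum to 2675 and the variance is about 891.7.

```python
import statistics

cds = [20, 40, 0, 70]                      # step 1: the raw values Y
n = len(cds)
y_bar = sum(cds) / n                       # step 2: the mean, Y-bar
deviations = [y - y_bar for y in cds]      # step 3: d = Y - Y-bar
squared = [d ** 2 for d in deviations]     # step 4: squared deviations
sum_sq = sum(squared)
variance = sum_sq / (n - 1)                # step 5: divide by N - 1

print(squared)
print(sum_sq, round(variance, 1))
print(statistics.variance(cds))            # standard-library cross-check
```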
![Page 33: Sociology 5811: Lecture 3: Measures of Central Tendency and Dispersion Copyright © 2005 by Evan Schofer Do not copy or distribute without permission.](https://reader036.fdocuments.us/reader036/viewer/2022070412/56649f455503460f94c66f5c/html5/thumbnails/33.jpg)
The Variance
• Properties of the variance
  – Zero if all points cluster exactly on the mean
  – Increases the further points lie from the mean
  – Comparable across samples of different size
• Advantages
  – 1. Provides a good measure of dispersion
  – 2. Better mathematical characteristics than the AAD
• Disadvantages:
  – 1. Not as easy to interpret as AAD
  – 2. Values get large, due to “squaring”
![Page 34: Sociology 5811: Lecture 3: Measures of Central Tendency and Dispersion Copyright © 2005 by Evan Schofer Do not copy or distribute without permission.](https://reader036.fdocuments.us/reader036/viewer/2022070412/56649f455503460f94c66f5c/html5/thumbnails/34.jpg)
Turning the Deviation into a Useful Measure of Dispersion
• Idea #5: Take square root of Variance to shrink it back down
• Result: Standard Deviation
  – Denoted by lower-case s
  – Most commonly used measure of dispersion
• Formula:

$$s_Y = \sqrt{s_Y^2} = \sqrt{\frac{\sum_{i=1}^{N} (Y_i - \bar{Y})^2}{N-1}}$$
![Page 35: Sociology 5811: Lecture 3: Measures of Central Tendency and Dispersion Copyright © 2005 by Evan Schofer Do not copy or distribute without permission.](https://reader036.fdocuments.us/reader036/viewer/2022070412/56649f455503460f94c66f5c/html5/thumbnails/35.jpg)
Calculating the Standard Deviation
• Simply take the square root of the variance
• Example (four CD cases: squared deviations sum to 2675, N-1 = 3):
  – Variance = 2675 / 3 = 891.7
  – Square root of 891.7 = 29.9
• Properties:
  – Similar to Variance
  – Zero for perfectly concentrated distribution
  – Grows larger if cases are spread further from the mean
  – Comparable across different sample sizes
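
A minimal Python sketch of the standard deviation follows, added here as an illustration. The function name `standard_deviation` is made up; `statistics.stdev` is the standard-library equivalent, which also uses N - 1. The list `[5, 5, 5, 5]` is hypothetical data chosen only to show the "zero for a perfectly concentrated distribution" property.

```python
import math
import statistics

cds = [20, 40, 0, 70]


def standard_deviation(values):
    """Square root of the variance (sum of squared deviations over N - 1)."""
    n = len(values)
    y_bar = sum(values) / n
    variance = sum((y - y_bar) ** 2 for y in values) / (n - 1)
    return math.sqrt(variance)


s = standard_deviation(cds)
print(round(s, 2))
print(round(statistics.stdev(cds), 2))   # standard-library cross-check
print(standard_deviation([5, 5, 5, 5]))  # hypothetical data: every case on the mean
```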
![Page 36: Sociology 5811: Lecture 3: Measures of Central Tendency and Dispersion Copyright © 2005 by Evan Schofer Do not copy or distribute without permission.](https://reader036.fdocuments.us/reader036/viewer/2022070412/56649f455503460f94c66f5c/html5/thumbnails/36.jpg)
Example 1: s = 21.72
[Histogram: Number of CDs (Group 1). x-axis: number of CDs (0 to 200); y-axis: number of cases. Mean = 101, Std. Dev = 21.72, N = 23]
![Page 37: Sociology 5811: Lecture 3: Measures of Central Tendency and Dispersion Copyright © 2005 by Evan Schofer Do not copy or distribute without permission.](https://reader036.fdocuments.us/reader036/viewer/2022070412/56649f455503460f94c66f5c/html5/thumbnails/37.jpg)
Example 2: s = 67.62
[Histogram: Number of CDs (Group 2). x-axis: number of CDs (0 to 200); y-axis: number of cases. Mean = 100.0, Std. Dev = 67.62, N = 23]
![Page 38: Sociology 5811: Lecture 3: Measures of Central Tendency and Dispersion Copyright © 2005 by Evan Schofer Do not copy or distribute without permission.](https://reader036.fdocuments.us/reader036/viewer/2022070412/56649f455503460f94c66f5c/html5/thumbnails/38.jpg)
Example 3: s = 102.15
[Histogram: Number of CDs (Group 3). x-axis: number of CDs (0 to 200); y-axis: number of cases. Mean = 104, Std. Dev = 102.15, N = 23]
![Page 39: Sociology 5811: Lecture 3: Measures of Central Tendency and Dispersion Copyright © 2005 by Evan Schofer Do not copy or distribute without permission.](https://reader036.fdocuments.us/reader036/viewer/2022070412/56649f455503460f94c66f5c/html5/thumbnails/39.jpg)
Thinking About Dispersion
• Suppose we observe that the standard deviation of wealth is greater in the U.S. than in Sweden…
  – What can we conclude about the two countries?
• Guess which group has a higher standard deviation for income: Men or Women? Why?
• The standard deviation of a stock’s price is sometimes considered a measure of “risk”. Why?
• Suppose we polled people on two political issues and the S.D. was much higher for one
  – What are some possible interpretations?
• What are some other examples where the deviation would provide useful information?