Sociology 601(Martin) Lecture for week 2: September 9 - 11 Chapter 3.1: –Making Charts Chapter 3.2...

Sociology 601 (Martin) Lecture for week 2: September 9 - 11

• Chapter 3.1: Making Charts

• Chapter 3.2 – 3.5 (if time permits)
  – Measures of central tendency
  – Measures of variation

• Walk-through of the STATA graphical user interface.

Definitions for charts

• frequency distribution: a graph listing intervals of possible values for a variable (on the x-axis), and number of observations in each interval (on the y-axis).

• relative frequency distribution: as above, but the y-axis has the percent or proportion of observations in each interval.

• bar graph: the variable is ordinal or nominal scale.
  – The bars should not touch.

• histogram: the variable is interval scale.
  – The bars should touch.

General Rules for Relative Frequency Distributions

• Whether you are making a bar graph or histogram:
  – Make sure each observation is in one and only one category.
  – Use categories of equal width.
  – Choose an appealing number of categories.
  – Decide whether to provide labels.
  – Double-check your graph.

• If you use fewer bars to describe the distribution of a variable, you lose information but gain clarity.

Example from Text, p. 36

• Murders per 100,000 population, by State, for 1993

State          Rate   State          Rate   State          Rate
Alabama        11.6   Louisiana      20.3   Ohio            6.0
Alaska          9.0   Maine           1.6   Oklahoma        8.4
Arizona         8.6   Maryland       12.7   Oregon          4.6
Arkansas       10.2   Massachusetts   3.9   Pennsylvania    6.8
California     13.1   Michigan        9.8   Rhode Island    3.9
Colorado        5.8   Minnesota       3.4   South Carolina 10.3
Connecticut     6.3   Mississippi    13.5   South Dakota    3.4
Delaware        5.0   Missouri       11.3   Tennessee      10.2
Florida         8.9   Montana         3.0   Texas          11.9
Georgia        11.4   Nebraska        3.9   Utah            3.1
Hawaii          3.8   Nevada         10.4   Vermont         3.6
Idaho           3.5   New Hampshire   2.0   Virginia        8.3
Illinois       11.4   New Jersey      5.3   Washington      5.2
Indiana         7.5   New Mexico      8.0   West Virginia   6.9
Iowa            2.3   New York       13.3   Wisconsin       4.4
Kansas          6.4   North Carolina 11.3   Wyoming         3.4
Kentucky        6.6   North Dakota    1.7

Frequency Distribution

• Murders per 100,000 population for 1993, by State

• What have we lost? What have we gained?

[Figure: frequency distribution; x-axis: murder rate (0 to 20); y-axis: number of states (0 to 3)]

Relative Frequency Distribution

• Murders per 100,000 population, by State

[Figure: relative frequency distribution; x-axis: murder rate (0 to 20); y-axis: relative frequency (0 to 0.06)]

Collapsed Relative Frequency Distribution

• Murders per 100,000 population, by State

• What have we lost? What have we gained?

[Figure: collapsed relative frequency distribution; x-axis: murder rate in intervals 0-1.9, 2-3.9, 4-5.9, 6-7.9, 8-9.9, 10-11.9, 12-13.9, 14-15.9, 16-17.9, 18-19.9, 20-21.9; y-axis: relative frequency (0 to 0.3)]
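The collapsing step can be sketched in code. The following is a minimal Python sketch (Python is an assumption here; it is not part of the course's STATA materials) that sorts the 50 state murder rates from the table above into equal-width intervals of 2 and converts the counts to relative frequencies. The names `bin_counts`, `WIDTH`, and `N_BINS` are illustrative choices, not anything from the text.

```python
# Murder rates per 100,000 population for the 50 states in 1993,
# copied from the table above (order does not matter here).
rates = [
    11.6, 9.0, 8.6, 10.2, 13.1, 5.8, 6.3, 5.0, 8.9, 11.4,
    3.8, 3.5, 11.4, 7.5, 2.3, 6.4, 6.6, 20.3, 1.6, 12.7,
    3.9, 9.8, 3.4, 13.5, 11.3, 3.0, 3.9, 10.4, 2.0, 5.3,
    8.0, 13.3, 11.3, 1.7, 6.0, 8.4, 4.6, 6.8, 3.9, 10.3,
    3.4, 10.2, 11.9, 3.1, 3.6, 8.3, 5.2, 6.9, 4.4, 3.4,
]

WIDTH = 2    # equal-width intervals: 0-1.9, 2-3.9, ..., 20-21.9
N_BINS = 11  # enough intervals to cover the largest rate (20.3)


def bin_counts(values, width=WIDTH, n_bins=N_BINS):
    """Count how many values fall in each equal-width interval."""
    counts = [0] * n_bins
    for v in values:
        # Floor division puts each value in one and only one interval.
        counts[int(v // width)] += 1
    return counts


counts = bin_counts(rates)
relative = [c / len(rates) for c in counts]

for i, (c, r) in enumerate(zip(counts, relative)):
    low = i * WIDTH
    print(f"{low:>2}-{low + WIDTH - 0.1:<4.1f}  count={c:>2}  relative={r:.2f}")
```

With these intervals, the 2-3.9 category holds 13 of the 50 states (relative frequency 0.26). The collapsed chart no longer shows each state's exact rate, but the overall shape of the distribution is much easier to see.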

3.2: Measuring central tendency - mean

• Mean: sum of measurements divided by number of measurements.

• Equation for the mean of a sample:

$$\bar{Y} = \frac{\sum_{i=1}^{n} Y_i}{n}$$

• or, if you don’t have an equation editor, Ybar = SUM(Yi) / n

where…

Ybar is the sample mean

Yi is a measurement of Y for case i

n is the number of cases in the sample

Weighted means

• Weighted sample mean: the sum of each measurement multiplied by its weight (the number of cases that observation represents), divided by the sum of the weights.

– Example: we could weight the state murder rates by the number of persons in each state in 1993 to get the mean murder rate for persons in the US

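• The equation for the weighted mean, where n_j is the number of cases in observation j, is

$$\bar{Y}_{weighted} = \frac{\sum_j n_j Y_j}{\sum_j n_j}$$

• If there are only two observations, this becomes

$$\bar{Y}_{weighted} = \frac{n_1 Y_1 + n_2 Y_2}{n_1 + n_2}$$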
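To make the weighting concrete, here is a small Python sketch (Python is an assumption; the course itself uses STATA). The two rates and two populations are made-up numbers chosen for easy arithmetic, not real 1993 figures, and `weighted_mean` is a hypothetical helper name.

```python
def weighted_mean(values, weights):
    """Sum of weight * value, divided by the sum of the weights."""
    if len(values) != len(weights):
        raise ValueError("values and weights must have the same length")
    return sum(w * y for y, w in zip(values, weights)) / sum(weights)


# Hypothetical illustration only: these are NOT real 1993 figures.
rates = [5.0, 10.0]                    # murders per 100,000 in two made-up states
populations = [1_000_000, 3_000_000]   # made-up state populations

unweighted = sum(rates) / len(rates)   # treats each state as one case
weighted = weighted_mean(rates, populations)  # treats each person as one case
print(unweighted, weighted)
```

The unweighted mean describes the typical state; the weighted mean describes the rate experienced by the typical person, so the larger state pulls it toward its own rate.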

3.3 Other measures of central tendency

• Median: the measurement that falls in the middle of an ordered sample
  – the median is the value of the 50th percentile

• Percentile: the pth percentile is the number such that p% of scores fall below it and (100-p)% of scores fall above it

• Mode: the value that occurs most frequently
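These three measures can be sketched with Python's standard `statistics` module (Python is an assumption here, and the seven-value sample is hypothetical). Note that software packages use slightly different interpolation rules for percentiles, so quartiles from another program may differ a little for some samples.

```python
import statistics

# Hypothetical sample, shown in sorted order for readability.
sample = [2, 3, 3, 5, 8, 9, 13]

median = statistics.median(sample)  # middle value of the ordered sample
mode = statistics.mode(sample)      # value that occurs most frequently

# Quartiles are the 25th, 50th, and 75th percentiles.
q1, q2, q3 = statistics.quantiles(sample, n=4)

print("median:", median)
print("mode:", mode)
print("25th, 50th, 75th percentiles:", q1, q2, q3)
```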

3.4: Measures of variation

• range: the difference between the largest and smallest observations

• interquartile range: the difference between the 75th percentile and the 25th percentile

• deviation: for any observation, the difference between that observation and the sample mean: Di = Yi - Ybar

(one averaged measure of variation for a sample would be to take the mean of the absolute values of all the deviations for the sample)
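As a rough illustration of these definitions, the Python sketch below (Python and the sample values are assumptions, not course material) computes the range, the interquartile range, the deviations, and the mean of the absolute deviations for one small sample.

```python
import statistics

sample = [2, 3, 3, 5, 8, 9, 13]  # hypothetical sample

# Range: largest observation minus smallest observation.
value_range = max(sample) - min(sample)

# Interquartile range: 75th percentile minus 25th percentile.
q1, _, q3 = statistics.quantiles(sample, n=4)
iqr = q3 - q1

# Deviations: each observation minus the sample mean.
ybar = statistics.mean(sample)
deviations = [y - ybar for y in sample]

# The raw deviations always sum to (approximately) zero, which is why an
# averaged measure has to use absolute values or squares.
mean_abs_dev = sum(abs(d) for d in deviations) / len(sample)

print("range:", value_range)
print("IQR:", iqr)
print("sum of deviations:", round(sum(deviations), 10))
print("mean absolute deviation:", round(mean_abs_dev, 3))
```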

Variance and standard deviation: the most common measures of variation

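• variance: approximately the mean of the squared deviations for a sample (the sum is divided by n - 1 rather than n), labeled s^2.

$$s^2 = \frac{\sum (Y_i - \bar{Y})^2}{n - 1}$$

• standard deviation: the square root of the variance, or the root mean squared deviation, labeled s.

$$s = \sqrt{\frac{\sum (Y_i - \bar{Y})^2}{n - 1}}$$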

Practice: Calculate the mean, variance, and standard deviation.

yi    ybar    yi - ybar    (yi - ybar)^2    |    yi    ybar    yi - ybar    (yi - ybar)^2
1                                           |    1
2                                           |    2
3                                           |    3
3                                           |    3
4                                           |    4
4                                           |    4
7                                           |    7
8                                           |    48

Σyi:          Σ(yi - ybar)^2:               |    Σyi:          Σ(yi - ybar)^2:
ybar:         s^2:                          |    ybar:         s^2:
s:                                          |    s:
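After working the practice problem by hand, the results can be checked with a short script. This is a minimal Python sketch (Python is an assumption; the same check could be done in STATA) that follows the columns of the worksheet: compute ybar, then the squared deviations, then divide their sum by n - 1. The labels `sample_a`, `sample_b`, "left", and "right" are illustrative names for the two columns of data.

```python
import math
import statistics

sample_a = [1, 2, 3, 3, 4, 4, 7, 8]
sample_b = [1, 2, 3, 3, 4, 4, 7, 48]  # same values except the last one


def describe(sample):
    """Return (mean, variance, standard deviation) using the n - 1 divisor."""
    n = len(sample)
    ybar = sum(sample) / n
    sum_sq_dev = sum((y - ybar) ** 2 for y in sample)
    s2 = sum_sq_dev / (n - 1)
    return ybar, s2, math.sqrt(s2)


for name, sample in [("left", sample_a), ("right", sample_b)]:
    ybar, s2, s = describe(sample)
    # Cross-check against the standard library, which also divides by n - 1.
    assert math.isclose(s2, statistics.variance(sample))
    print(f"{name}: ybar={ybar:.3f}  s2={s2:.3f}  s={s:.3f}")
```

Comparing the two lines of output shows how much a single extreme observation changes the mean and, even more, the variance and standard deviation.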

Interpreting the standard deviation.

• s is (formally) the root mean squared deviation.

• s is one version of the typical distance of an observation from the sample mean.

• Because s accounts for squared deviations, it is affected by extreme scores.
  – Is this a desirable property?
  – Compare these samples: (-3,-3,+3,+3) vs (-2,-2,-2,+6)

• Generally, for a continuous quantitative variable Y with a roughly bell-shaped distribution, about 68% of scores fall between Ybar - s and Ybar + s.
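The comparison of the two samples (-3,-3,+3,+3) and (-2,-2,-2,+6) can be worked out in a few lines. The Python sketch below (Python is an assumption; `mean_abs_dev` and `sample_sd` are hypothetical helper names) computes the mean absolute deviation and the standard deviation s, with the n - 1 divisor, for each sample.

```python
import math


def mean_abs_dev(sample):
    """Mean of the absolute deviations from the sample mean."""
    ybar = sum(sample) / len(sample)
    return sum(abs(y - ybar) for y in sample) / len(sample)


def sample_sd(sample):
    """Sample standard deviation with the n - 1 divisor."""
    ybar = sum(sample) / len(sample)
    return math.sqrt(sum((y - ybar) ** 2 for y in sample) / (len(sample) - 1))


balanced = [-3, -3, 3, 3]
lopsided = [-2, -2, -2, 6]  # one extreme score

for name, sample in [("balanced", balanced), ("lopsided", lopsided)]:
    print(name, mean_abs_dev(sample), round(sample_sd(sample), 3))
```

Both samples have a mean of 0 and a mean absolute deviation of 3, but s is about 3.46 for the first sample and 4 for the second. Squaring gives the single extreme score (+6) more influence than it has on the mean absolute deviation.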

Interpreting sample statistics.

• Recall that…
  – A statistic is a single number calculated from a sample.
  – A parameter is a single number that summarizes some quality of a variable in a population.

• For means:
  – The population mean is μ (mu).
  – The sample mean Ybar is an estimator of μ.

• For standard deviations:
  – The population standard deviation is σ (sigma).
  – The sample standard deviation s is an estimator of σ.

A conceptual map of STATA

[Diagram: three columns labeled source, interface, and output, containing these elements: .do file, outside data set, command window, log file, data editor, results window, graphics, interactive data entry, pull-down menus, active data set, icons]

The STATA windows environment - icons

– Open (use)
– Save
– Print Results
– Begin Log
– Start viewer
– Bring results window to front
– Bring graph window to front
– Do-file editor
– Data editor
– Data browser
– Clear
– Break

The .do file: interface of choice for social research

• Icons within the .do file:
  – New
  – Open
  – Save
  – Print
  – Find
  – Cut
  – Copy
  – Paste
  – Undo
  – Do current file
  – Run current file

Sample commands in a .do file

use "I:\601Fall08\socy601data.dta", clear

summarize AGE

summarize AGE [weight=ADULTS]

tabulate AGE

tabulate AGE [weight=ADULTS]

clear

How to create a log file

• One approach is to use the log icon to start and stop a log.

• Another approach is to type the log-starting command into a .do file:

log using I:\601Fall08\week01hmwk.txt, replace

*. . . (your work here) . . .

log close
