Distributions of Variables

Posted on 19-Jan-2016


Anthony J Greene 1

Distributions of Variables

I. Properties of Variables

II. Nominal Data & Bar Charts

III. Ordinal Data

IV. Interval & Ratio Data, Histograms & Frequency Distributions

V. Cumulative Frequency Distributions & Percentile Ranks


Variables

Variable: A characteristic that takes on multiple values, i.e., that varies from one person or thing to another.


Variables

Cause and Effect

• The Independent Variable

• The Dependent Variable


Distributions

• The distribution of population data is called the population distribution or the distribution of the variable.

• The distribution of sample data is called a sample distribution.


Variables


Variables

Kinds of Variables (any of which can be an independent or dependent variable)

• Qualitative variable: A nonnumerically valued variable.

• Quantitative variable: A numerically valued variable.

• Discrete Variable: A quantitative variable whose possible values form a finite (or countably infinite) set of numbers.

• Continuous variable: A quantitative variable whose possible values form some interval of numbers.


Quantitative Variables

• Discrete data: Data obtained by observing values of a discrete variable.

• Continuous data: Data obtained by observing values of a continuous variable.


The Four Scales

• Nominal: Categories

• Ordinal: Sequence

• Interval: Mathematical scale without a true zero

• Ratio: Mathematical scale with a true zero


The Four Scales

• Nominal: Classes or Categories. Also called a Categorical scale. E.g., Catholic, Methodist, Jewish, Hindu, Buddhist, …

Qualitative Data


The Four Scales

• Ordinal: Sequential Categories. E.g., 1st, 2nd, 3rd, …, with no indication of the distance between classes.

Discrete Data


The Four Scales

• Interval: Data where equal spacing in the variable corresponds to equal spacing on the scale. E.g., 1940s, 1950s, 1960s, …, or SAT scores.

Discrete or Continuous


The Four Scales

• Ratio: An interval scale with a mathematically meaningful zero. E.g., latencies of 1252 ms, 1856 ms, …, or mg of Prozac.

Discrete or Continuous


The Four Scales

Nominal: No mathematical operations

Ordinal: <, >, =

Interval: +, -, and ordinal operations

Ratio: ×, ÷, and interval operations


Nominal Variables

• Classes: Categories for grouping data.

• Frequency: The number of observations that fall in a class.

• Frequency distribution: A listing of all classes along with their frequencies.

• Relative frequency: The ratio of the frequency of a class to the total number of observations.

• Relative-frequency distribution: A listing of all classes along with their relative frequencies.
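These definitions translate directly into code. A minimal Python sketch, using a small hypothetical sample of party labels (not data from the slides): `collections.Counter` tallies the frequency of each class, and dividing each count by the total number of observations gives the relative-frequency distribution.

```python
from collections import Counter

def frequency_tables(observations):
    """Return (frequency, relative_frequency) dicts for a list of category labels."""
    freq = Counter(observations)                            # class -> number of observations
    n = sum(freq.values())                                  # total number of observations
    rel = {cls: count / n for cls, count in freq.items()}   # class -> proportion of n
    return dict(freq), rel

# Hypothetical nominal data, for illustration only
sample = ["Democratic", "Republican", "Democratic", "Other",
          "Republican", "Democratic", "Republican", "Democratic"]
freq, rel = frequency_tables(sample)
for cls in freq:
    print(f"{cls:<12} f = {freq[cls]}  relative f = {rel[cls]:.3f}")
```

Each class appears once in each table, so the two dicts together are the frequency distribution and the relative-frequency distribution.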


Frequencies of Nominal Variables


Sample Pie Charts and Bar Charts of Nominal Data

[Figure: pie chart and bar chart of party affiliation (Democratic, Republican, Other); pie slices labeled 44%, 23%, 33%; bar-chart vertical axis 0 to 50]


Frequency Bar Charts

• Frequency bar chart: A graph that displays the independent variable (the categories) on the horizontal axis and the frequencies (the dependent variable) on the vertical axis. Each frequency is represented by a vertical bar whose height equals the number of cases that fall within a given class of the I.V.


Frequency Charts of Nominal Data

[Figure: frequency bar chart of party affiliation (Republican, Democratic, Other); bar labels 90, 180, 130; vertical axis 0 to 200]


Relative Frequency Bar Charts

• Relative-frequency bar chart: A graph that displays the I.V. (the categories) on the horizontal axis and the relative frequencies (the D.V.) on the vertical axis. The relative frequency of each class is represented by a vertical bar whose height is equal to the relative frequency of the class.

• The difference between this and a frequency bar chart is that the proportion (always between zero and one) or percentage (between 0 and 100) is plotted instead of the number of cases that fall into a given class.


Relative Frequency Charts of Nominal Data

[Figure: pie chart (slices labeled 44%, 23%, 33%) and relative-frequency bar chart of party affiliation (Democratic, Republican, Other); vertical axis 0 to 50%]


Probability Distribution and Probability Bar Chart

Frequency Distributions and Charts for a whole population

Probability distribution: A listing of the possible values and corresponding probabilities of a discrete random variable; or a formula for the probabilities.

Probability bar chart: A graph of the probability distribution that displays the possible values of a discrete random variable on the horizontal axis and the probabilities of those values on the vertical axis. The probability of each value is represented by a vertical bar whose height is equal to the probability.


Probability Charts of Nominal Data

[Figure: pie chart (slices labeled 44%, 23%, 33%) and probability bar chart of party affiliation (Republican, Democratic, Other); vertical axis 0 to 50%]


Bar Chart


The Bar Graph: Nominal Data


Sum of the Probabilities of a Discrete Random Variable

For any discrete random variable, X, the sum of the probabilities of its possible values equals 1; in symbols, ΣP(X = x) = 1.

For example, Republicans 32.5%, Democrats 45.0%, Other 22.5%:

0.325 + 0.450 + 0.225 = 1.00, or 100%
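The same check can be sketched in Python. The proportions are the ones listed above; the counts 130, 180 and 90 are an assumption read off the frequency bar chart (with n = 400 they reproduce exactly those proportions). `math.isclose` is used because floating-point sums of probabilities need not come out exactly 1.0 in general. The helper name `is_probability_distribution` is hypothetical.

```python
import math

def is_probability_distribution(probs, tol=1e-9):
    """True if every probability lies in [0, 1] and they sum to 1 (within tol)."""
    values = list(probs.values())
    in_range = all(0.0 <= p <= 1.0 for p in values)
    return in_range and math.isclose(sum(values), 1.0, abs_tol=tol)

# Assumed counts (read from the frequency bar chart); they reproduce the percentages above
counts = {"Republican": 130, "Democratic": 180, "Other": 90}
n = sum(counts.values())                         # 400 observations in total
probs = {party: c / n for party, c in counts.items()}

print(probs)                                     # proportion for each party
print(is_probability_distribution(probs))        # True
```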


Ordinal Variables

Note that “Rank” is the ordinal variable. “Mortality” is a ratio variable but can easily be downgraded to an ordinal variable, with a loss of information.


Distributions and Charts for Ordinal Data

• Frequency distributions, relative-frequency distributions, and probability distributions are constructed exactly as they were for nominal data.

• Bar charts are used.


Distribution of Education Level

Level          P(x)
Elementary     0.03
High School    0.45
Associates     0.12
Bachelors      0.28
Masters        0.10
Doctorate      0.02

[Figure: probability bar chart of education level, Elementary through Doctorate; vertical axis P(x) from 0 to 0.5]


Interval and Ratio Data

• Frequency: The number of observations that fall in a class.

• Frequency distribution: A listing of all classes along with their frequencies.

• Relative frequency: The ratio of the frequency of a class to the total number of observations.

• Relative-frequency distribution: A listing of all classes along with their relative frequencies.


Histograms

• Frequency histogram: A graph that displays the independent variable on the horizontal axis and the frequencies (the dependent variable) on the vertical axis. Each frequency is represented by a vertical bar whose height equals the number of cases that fall within a given range of the I.V.


Interval and Ratio Variables

[Figure: average income (in thousands; vertical axis 0 to 80) plotted against years of education (horizontal axis 0 to 20)]


Enrollment in Milwaukee Public Elementary Schools


Relative Frequency distribution of Enrollments in MPS


Probability distribution of a randomly selected elementary-school student


Probability distribution of the age of a randomly selected student


Probability Histogram

[Figure: probability histogram; horizontal axis 19 to 27, vertical axis probability 0 to 0.5]


Another Example


Frequency vs. Relative Frequency


Frequency vs. Relative Frequency

This is also the Probability Distribution


More Examples: Frequency Histogram


More Examples: Grouped Frequency Histogram


Grouped Frequency Histogram


Proportions and Frequency


Frequency Groupings

9 intervals, each 5 points wide. The frequency column (f) lists the number of individuals with scores in each of the class intervals.

Groupings: There had to be a catch

• What do we do with the in-betweens?

• This is only a concern for continuous variables.

• Real limits: scores in the “14” bar really run from 13.5 to 14.5.

• Upper and lower real limits: for whole-number scores, add 0.5 to the highest score and subtract 0.5 from the lowest score of an interval (these observed scores are the “apparent limits”).
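The real-limit rule is mechanical enough to sketch as a small function. This assumes whole-number scores by default; the `unit` parameter and the function name are hypothetical, generalizing the rule to "subtract and add half of the measurement unit".

```python
def real_limits(lower_apparent, upper_apparent, unit=1.0):
    """Return (lower_real, upper_real) for an interval with the given apparent limits.

    The real limits extend half a measurement unit beyond each apparent limit.
    """
    half = unit / 2
    return lower_apparent - half, upper_apparent + half

print(real_limits(14, 14))    # the "14" bar: (13.5, 14.5)
print(real_limits(85, 100))   # a grouped interval 85-100: (84.5, 100.5)
```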


Understanding Real Limits


Real Limits & Apparent Limits


Frequency & Cumulative Frequency

I.Q. Range   Real Limits    Frequency   Cuml. Freq.
≤ 52         0-52.5         1           1
53-67        52.5-67.5      4           5
68-84        67.5-84.5      11          16
85-100       84.5-100.5     34          50
101-116      100.5-116.5    34          84
117-132      116.5-132.5    11          95
133-148      132.5-148.5    4           99
> 148        148.5 and up   1           100
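Cumulative frequency is a running sum of the frequency column, so `itertools.accumulate` reproduces the last column of the table. A minimal sketch, with the frequencies copied from the table; each cumulative count applies at the upper real limit of its interval (the top class is open-ended).

```python
from itertools import accumulate

# Upper real limit of each I.Q. interval (None = open-ended top class) and its frequency
upper_real_limits = [52.5, 67.5, 84.5, 100.5, 116.5, 132.5, 148.5, None]
frequencies = [1, 4, 11, 34, 34, 11, 4, 1]

cumulative = list(accumulate(frequencies))   # running total of the frequency column
n = cumulative[-1]                           # total number of scores

for url, f, cf in zip(upper_real_limits, frequencies, cumulative):
    label = f"<= {url}" if url is not None else "all"
    print(f"{label:>8}  f = {f:>2}  cum f = {cf:>3}  cum % = {100 * cf / n:5.1f}")
```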

[Figure: frequency histogram of the I.Q. intervals, titled “Frequency (Normal Distribution)”, vertical axis 0 to 40; and cumulative frequency curve, titled “Cumulative Frequency (Ogive)”, vertical axis 0 to 100]


Computing Percentile Ranks

Pounds (x)   Real Limits   Freq. (f)   Relative Freq.   Cuml. Rel. Freq.   %ile
0            0-0.5         8           0.200            0.200              20.0
1            0.5-1.5       17          0.425            0.625              62.5
2            1.5-2.5       11          0.275            0.900              90.0
3            2.5-3.5       3           0.075            0.975              97.5
4            3.5-4.5       1           0.025            1.000              100.0
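Every column after f follows from the frequencies alone (n = 8 + 17 + 11 + 3 + 1 = 40). A Python sketch that rebuilds them; the helper name `percentile_table` is hypothetical. Each cumulative value is the percentile rank of the interval's upper real limit.

```python
from itertools import accumulate

def percentile_table(values, freqs):
    """Rows of (x, f, relative f, cumulative relative f, %ile at the upper real limit)."""
    n = sum(freqs)
    rows = []
    for x, f, cum_f in zip(values, freqs, accumulate(freqs)):
        # cum_f counts every score at or below this value's upper real limit
        rows.append((x, f, f / n, cum_f / n, 100 * cum_f / n))
    return rows

pounds = [0, 1, 2, 3, 4]
freqs = [8, 17, 11, 3, 1]
for x, f, rel, cum_rel, pct in percentile_table(pounds, freqs):
    print(f"{x}  f={f:>2}  rel={rel:.3f}  cum={cum_rel:.3f}  %ile={pct:.1f}")
```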


Computing Percentile Ranks

• Remember that each value has real limits.

• What is the 90th %ile? 2.5, because 90% of the scores are at or below “2”, and “2” covers everything from 1.5 to 2.5.

• What is the 20th %ile? 0.5, because 20% of the scores are at or below “0”, whose upper real limit is 0.5.


Computing Percentile Ranks

• What about the in-betweens?

• What is the 80th %ile?

• What %ile corresponds to 2 lbs?


Linear Interpolation


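Linear Interpolation and Percentiles

• What is the 80th %ile?

The 80th %ile falls in the “2” interval (real limits 1.5 to 2.5), where the cumulative percentage rises from 62.5% to 90.0%, a span of 27.5%. To reach 80% we need 80 - 62.5 = 17.5 of those 27.5 points, i.e., 17.5/27.5 ≈ 0.636 of the way into the interval. The interval is 1.0 lb wide, so the 80th %ile is 1.5 + 1.0(0.636) ≈ 2.14 lbs.

• What %ile corresponds to 2 lbs?

2 lbs is halfway (0.5) into the 1.5 to 2.5 interval, so its percentile rank is halfway between 62.5% and 90.0%. Since 27.5% of the scores are in this interval, we go up 0.5(27.5%) = 13.75%: 62.5% + 13.75% = 76.25%.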
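Both interpolation questions can be sketched in Python as a pair of hypothetical helpers: `percentile_point` finds the score at a given percentile, and `percentile_rank` finds the percentile of a given score. Like the worked answers, both assume the scores are spread evenly within each interval's real limits.

```python
from itertools import accumulate

# Real limits and frequencies for each weight (lbs), from the table
LOWER = [0.0, 0.5, 1.5, 2.5, 3.5]
UPPER = [0.5, 1.5, 2.5, 3.5, 4.5]
FREQS = [8, 17, 11, 3, 1]

def percentile_point(p, lower, upper, freqs):
    """Score at percentile p (0-100), interpolating linearly within real limits."""
    n = sum(freqs)
    cum = [0] + list(accumulate(freqs))        # cumulative count below each interval
    for i in range(len(freqs)):
        lo_pct = 100 * cum[i] / n              # %ile at the lower real limit
        hi_pct = 100 * cum[i + 1] / n          # %ile at the upper real limit
        if lo_pct <= p <= hi_pct and hi_pct > lo_pct:
            fraction = (p - lo_pct) / (hi_pct - lo_pct)   # how far into the interval
            return lower[i] + fraction * (upper[i] - lower[i])
    raise ValueError("p must be between 0 and 100")

def percentile_rank(x, lower, upper, freqs):
    """Percentile rank of score x, interpolating linearly within real limits."""
    n = sum(freqs)
    cum = [0] + list(accumulate(freqs))
    for i in range(len(freqs)):
        if lower[i] <= x <= upper[i]:
            fraction = (x - lower[i]) / (upper[i] - lower[i])
            return 100 * (cum[i] + fraction * freqs[i]) / n
    raise ValueError("x lies outside the real limits of the table")

print(round(percentile_point(80, LOWER, UPPER, FREQS), 2))  # 2.14
print(percentile_point(90, LOWER, UPPER, FREQS))            # 2.5
print(percentile_rank(2, LOWER, UPPER, FREQS))              # 76.25
```

The exact 80th %ile is 2.136…, which rounds to 2.14.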


The Stem & Leaf Diagram


Stem & Leaf Plots


Comparison of Frequency Histogram vs. Stem & Leaf Diagram


The Blocked Frequency Histogram


The Frequency Distribution Polygon –or– Line Graph


Grouped Frequency Polygon


The Normal Distribution


Variants on the Normal Distribution


Comparing Two Distributions

Number of Sentences recalled from each category


Comparing Distributions


Distributions


Variables and Distributions

In Class Exercise


The Math You’ll Need To Know

Calculate:

ΣX = 7

ΣX² = 21

(ΣX)² = 49

X: 1, 2, 0, 4
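The order of operations is the whole point of this exercise: ΣX² squares each score first and then adds, while (ΣX)² adds first and then squares the total. A short Python check of the answers, using X = 1, 2, 0, 4 from the slide:

```python
X = [1, 2, 0, 4]

sum_x = sum(X)                      # ΣX: add the scores
sum_x_sq = sum(x ** 2 for x in X)   # ΣX²: square each score, then add
sq_sum_x = sum(X) ** 2              # (ΣX)²: add the scores, then square the total

print(sum_x, sum_x_sq, sq_sum_x)    # 7 21 49
```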


The Math You’ll Need To Know

Calculate:

ΣX = 6
ΣY = -2
ΣX ΣY = -12
ΣXY = -2
ΣX² = 14
(ΣX)² = 36
ΣY² = 30
(ΣY)² = 4

X    Y
1    3
3    1
0   -2
2   -4
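As with ΣX² versus (ΣX)², ΣXY (multiply within each pair, then add) is generally not the same as ΣX ΣY (add each column, then multiply the totals): here -2 versus -12. A Python check of all eight answers, using the X and Y columns from the slide:

```python
X = [1, 3, 0, 2]
Y = [3, 1, -2, -4]

sum_x, sum_y = sum(X), sum(Y)                        # ΣX, ΣY
product_of_sums = sum_x * sum_y                      # ΣX ΣY: multiply the two totals
sum_of_products = sum(x * y for x, y in zip(X, Y))   # ΣXY: multiply each pair, then add
sum_x_sq, sq_sum_x = sum(x * x for x in X), sum_x ** 2   # ΣX², (ΣX)²
sum_y_sq, sq_sum_y = sum(y * y for y in Y), sum_y ** 2   # ΣY², (ΣY)²

print(sum_x, sum_y, product_of_sums, sum_of_products)  # 6 -2 -12 -2
print(sum_x_sq, sq_sum_x, sum_y_sq, sq_sum_y)           # 14 36 30 4
```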


The Math You’ll Need To Know

The Mean

M = ΣX/n, where n = sample size

X: 1, 4, 8, 3


The Math You’ll Need To Know

Calculate:

Σ(X - M) = 0

Σ(X - M)² = 26

Σ(X² - M²) = 26

X: 1, 4, 8, 3 (M = 4)
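A Python check of the mean and the three sums for X = 1, 4, 8, 3. The last two agree because of a standard identity: Σ(X - M)² = ΣX² - 2MΣX + nM² = ΣX² - nM² (using ΣX = nM), and ΣX² - nM² is exactly Σ(X² - M²).

```python
X = [1, 4, 8, 3]
n = len(X)
M = sum(X) / n                                           # mean: M = ΣX / n

sum_dev = sum(x - M for x in X)                          # Σ(X - M): always 0 around the mean
ss_definitional = sum((x - M) ** 2 for x in X)           # Σ(X - M)²: squared deviations
ss_computational = sum(x ** 2 for x in X) - n * M ** 2   # Σ(X² - M²) = ΣX² - nM²

print(M, sum_dev, ss_definitional, ss_computational)     # 4.0 0.0 26.0 26.0
```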


The Math You’ll Need To Know

Calculate, given sp = 13, n1 = 8, n2 = 10:

√(sp²/n1 + sp²/n2) = √(169/8 + 169/10) = √(21.125 + 16.9) = √38.025 ≈ 6.17
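A sketch of the same arithmetic in Python. The function name is hypothetical. The formula is the one on the slide; it matches the usual estimated standard error of the difference between two independent sample means when sp is a pooled standard deviation, but the slide itself only gives the formula, so treat that reading as an assumption.

```python
import math

def pooled_standard_error(s_p, n1, n2):
    """sqrt(s_p²/n1 + s_p²/n2) for a pooled standard deviation s_p and sample sizes n1, n2."""
    variance = s_p ** 2                        # 13² = 169
    return math.sqrt(variance / n1 + variance / n2)

se = pooled_standard_error(13, 8, 10)
print(round(se, 2))                            # 6.17
```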


What Type of Data?

Years Spent in the Military


What Type of Data?

Military Rank:

Lieutenant

Captain

Major

Lt. Colonel

Colonel

General


What Type of Data?

Branch of Service:

Army

Air Force

Navy

Marine Corps

Coast Guard


What Type of Data?

Time taken to complete a 30-mile bicycle race


What Type of Data?

Finishing place in a 30-mile bicycle race


Frequency Dist. & Percentile

Raw Scores:

15, 18, 21, 23, 27, 33, 33, 35, 36, 36, 39, 41, 44, 47, 49, 50


Frequency Dist. & Percentile

X f

10-19 2

20-29 3

30-39 6

40-49 4

50-59 1

Frequency Dist. & Percentile

Compute the 52nd %ile

X       f    Cum f   Cum prop.
10-19   2    2       0.125
20-29   3    5       0.3125
30-39   6    11      0.6875
40-49   4    15      0.9375
50-59   1    16      1.0

• The 52nd %ile is somewhere in the 30-39 interval.

• That interval runs from a cumulative proportion of 0.3125 to 0.6875, so it is 0.375 wide.

• To get from 0.3125 to 0.52 we go 0.2075 into the interval.

• That's 0.553 of the way into the interval (0.2075/0.375).

• The real limits of 30-39 are 29.5 to 39.5 (a range of 10), so 0.553 of the way is 0.553 × 10 = 5.53.

• The 52nd %ile is 29.5 + 5.53 = 35.03.

This process is called linear interpolation.
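The whole exercise, from raw scores to the 52nd %ile, can be sketched in Python. The helper names are hypothetical, and the sketch assumes whole-number scores no smaller than `start`. It works in counts instead of proportions (0.52 × 16 = 8.32 scores at or below the answer), which gives the same fraction of the class: (8.32 - 5)/6 ≈ 0.553.

```python
from itertools import accumulate

def grouped_frequencies(scores, start, width):
    """Tally whole-number scores (all >= start) into classes of the given width."""
    n_classes = (max(scores) - start) // width + 1
    freqs = [0] * n_classes
    for s in scores:
        freqs[(s - start) // width] += 1       # index of the class containing s
    return freqs

def grouped_percentile(p, freqs, start, width):
    """Score at percentile p (0 < p <= 100), interpolating within real limits."""
    n = sum(freqs)
    cum = [0] + list(accumulate(freqs))
    target = p / 100 * n                       # number of scores at or below the answer
    for i, f in enumerate(freqs):
        if f and cum[i] < target <= cum[i + 1]:
            lower_real = start + i * width - 0.5   # e.g. 29.5 for the 30-39 class
            return lower_real + (target - cum[i]) / f * width
    raise ValueError("p must satisfy 0 < p <= 100")

scores = [15, 18, 21, 23, 27, 33, 33, 35, 36, 36, 39, 41, 44, 47, 49, 50]
freqs = grouped_frequencies(scores, start=10, width=10)
print(freqs)                                              # [2, 3, 6, 4, 1]
print(round(grouped_percentile(52, freqs, 10, 10), 2))    # 35.03
```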