1 Association Variables –Response – an outcome variable whose values exhibit variability....


Page 1

Association

Variables
– Response – an outcome variable whose values exhibit variability.
– Explanatory – a variable that we use to try to explain the variability in the response.

Page 2

Association

There is an association between two variables if values of one variable are more likely to occur with certain values of a second variable.

Page 3

Picturing Association

Two Categorical (Qualitative)
– Cross-tabs table, mosaic plot.
Two Numerical (Quantitative)
– Scatter diagram.

Page 4

Categorical Data

Who?
– Students in a statistics class at Penn State University.
What?
– “With whom is it easiest to make friends?” Opposite sex, same sex, no difference.
– Gender. Male, female.

Page 5

Cross-tabs Table

With whom is it easiest to make friends?

          Same Sex   Opposite Sex   No Diff   Total
Female        16          58           63      137
Male          13          15           40       68
Total         29          73          103      205

Page 6

Bar Graph

With whom is it easiest to make friends?

[Figure: bar graph of Count (axis marked 25, 50, 75, 100) by Answer. Bar labels: No Diff 50.2, Opposite 35.6, Same 14.1.]

Distributions: Answer

Frequencies
Level       Count      Prob
No Diff       103   0.50244
Opposite       73   0.35610
Same           29   0.14146
Total         205   1.00000

N Missing 0
3 Levels

Page 7

Percentages

With whom is it easiest to make friends?

Count (Row %)   Same Sex     Opposite Sex   No Diff      Total
Female          16 (11.7%)   58 (42.3%)     63 (46.0%)   137 (100%)
Male            13 (19.1%)   15 (22.1%)     40 (58.8%)    68 (100%)
Total           29           73             103          205
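
The row percentages in the table above can be reproduced from the counts by dividing each cell by its row total. The following is a minimal Python sketch; the dictionary layout and the function name `row_percentages` are illustrative choices, not part of the original slides.

```python
# Counts from the cross-tabs table: rows are gender, columns are answers.
counts = {
    "Female": {"Same Sex": 16, "Opposite Sex": 58, "No Diff": 63},
    "Male": {"Same Sex": 13, "Opposite Sex": 15, "No Diff": 40},
}


def row_percentages(table):
    """Return each cell as a percentage of its row total."""
    result = {}
    for row_label, row in table.items():
        row_total = sum(row.values())
        result[row_label] = {
            col: 100.0 * n / row_total for col, n in row.items()
        }
    return result


row_pct = row_percentages(counts)
for gender, row in row_pct.items():
    cells = ", ".join(f"{answer}: {pct:.1f}%" for answer, pct in row.items())
    print(f"{gender}: {cells}")
```

Row percentages are used here because gender is the explanatory variable: each row then shows how the answers are distributed within one gender.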

Page 8

Mosaic Plot

[Figure: mosaic plot of Answer (No Diff, Opposite, Same) by Gender (Female, Male); vertical axis from 0.00 to 1.00.]

Page 9

Interpretation

More than 50% of males say no difference (58.8%), while less than 50% of females say no difference (46.0%).
Females are about twice as likely as males to say opposite sex (42.3% versus 22.1%).
Males are about 1.6 times as likely as females to say same sex (19.1% versus 11.7%).
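
The "times as likely" comparisons can be checked by dividing the row percentages from the Percentages table. This is a minimal Python sketch; the variable names are illustrative.

```python
# Row percentages from the Percentages table (percent within each gender).
female = {"Same Sex": 11.7, "Opposite Sex": 42.3, "No Diff": 46.0}
male = {"Same Sex": 19.1, "Opposite Sex": 22.1, "No Diff": 58.8}

# How many times as likely females are than males to answer "opposite sex".
opposite_ratio = female["Opposite Sex"] / male["Opposite Sex"]

# How many times as likely males are than females to answer "same sex".
same_ratio = male["Same Sex"] / female["Same Sex"]

print(f"Females vs males, opposite sex: {opposite_ratio:.2f} times as likely")
print(f"Males vs females, same sex: {same_ratio:.2f} times as likely")
```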

Page 10

Scatter Plot

Statistics is about … variation.
Recognize, quantify and try to explain variation.
Variation in two quantitative variables is displayed in a scatter plot.

Page 11

Scatter Plot

The numerical variable on the vertical axis, y, is the response variable.
The numerical variable on the horizontal axis, x, is the explanatory variable.

Page 12

Scatter Plot

Example: Body mass (kg) and bite force (N) for Canidae.
– y, Response: Bite force (N)
– x, Explanatory: Body mass (kg)
– Cases: 28 species of Canidae.

Page 13

[Figure: scatter plot, “Bivariate Fit of BFca (N) By Body Mass (kg)”. Vertical axis: BFca (N), 0 to 500. Horizontal axis: Body Mass (kg), 0 to 40.]

Page 14

Positive Association

– Above average values of bite force are associated with above average values of body mass.
– Below average values of bite force are associated with below average values of body mass.

Page 15

Scatter Plot

Example: Outside temperature and amount of natural gas used.
– Response: Natural gas used (1000 ft³).
– Explanatory: Outside temperature (°C).
– Cases: 26 days.

Page 16

[Figure: scatter plot of Gas (vertical axis, 0 to 10) versus Temp (horizontal axis, -5.0 to 15.0).]

Page 17

Negative Association

– Above average values of gas are associated with below average temperatures.
– Below average values of gas are associated with above average temperatures.

Page 18

Association

Positive
– As x goes up, y tends to go up.
Negative
– As x goes up, y tends to go down.

Page 19

Correlation

Linear Association
– How closely do the points on the scatter plot follow a straight line?
– The correlation coefficient gives the direction and quantifies the strength of the linear association between two quantitative variables.

Page 20

Correlation

Standardize y: $z_y = \frac{y - \bar{y}}{s_y}$
Standardize x: $z_x = \frac{x - \bar{x}}{s_x}$
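
Standardizing subtracts the mean and divides by the standard deviation. Below is a minimal Python sketch, assuming the sample standard deviation (divisor n - 1); the function name `standardize` and the data values are hypothetical and chosen only for illustration.

```python
from statistics import mean, stdev


def standardize(values):
    """Return z-scores: (value - mean) / sample standard deviation."""
    center = mean(values)
    spread = stdev(values)  # sample standard deviation (divides by n - 1)
    return [(v - center) / spread for v in values]


# Hypothetical body masses in kg, for illustration only.
body_mass = [2.0, 5.0, 8.0, 11.0, 14.0]
z_mass = standardize(body_mass)
print([round(z, 3) for z in z_mass])
```

Standardized values always have mean 0 and standard deviation 1, whatever the original units were.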

Page 21

[Figure: scatter plot, “Bite Force vs Body Mass of Canidae”. Vertical axis: Standardized Bite Force, -1 to 3. Horizontal axis: Standardized Body Mass, -1 to 3.]

Page 22

Correlation Coefficient

$r = \frac{\sum (x - \bar{x})(y - \bar{y})}{(n - 1)\, s_x s_y}$

$r = \frac{\sum z_x z_y}{n - 1}$
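
The correlation coefficient is the sum of the products of the z-scores divided by n - 1. Below is a minimal Python sketch of that calculation; the function name `correlation` and the data are hypothetical (the Canidae measurements are not reproduced in these slides).

```python
from statistics import mean, stdev


def correlation(xs, ys):
    """Pearson r as the sum of products of z-scores divided by n - 1."""
    n = len(xs)
    x_bar, y_bar = mean(xs), mean(ys)
    s_x, s_y = stdev(xs), stdev(ys)  # sample standard deviations
    z_products = [
        ((x - x_bar) / s_x) * ((y - y_bar) / s_y) for x, y in zip(xs, ys)
    ]
    return sum(z_products) / (n - 1)


# Hypothetical data, for illustration only.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.0, 4.0, 5.0, 4.0, 5.0]
r = correlation(x, y)
print(round(r, 4))
```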

Page 23

Correlation Coefficient

Body mass and bite force

$r = \frac{\sum z_x z_y}{n - 1} = \frac{26.4796}{27} = 0.9807$

Page 24

Correlation Coefficient

There is a very strong positive correlation (linear association) between body mass and bite force for the various species of Canidae.

Page 25

JMP

Analyze – Multivariate methods – Multivariate
Y, Columns
– Body mass
– BF ca (Bite force at the canine)

Page 26

Multivariate

Correlations
                 Body Mass (kg)   BFca (N)
Body Mass (kg)       1.0000        0.9807
BFca (N)             0.9807        1.0000

[Figure: scatterplot matrix of Body Mass (kg), 5 to 40, and BFca (N), 100 to 500.]

Page 27

Correlation Properties

The sign of r indicates the direction of the association.
The value of r is always between –1 and +1.
Correlation has no units.
Correlation is not affected by changes of center or scale.
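
The last property can be checked numerically: converting temperature from °C to °F changes both the center and the scale, yet r stays the same. Below is a minimal Python sketch; the function name `pearson_r` and the data values are hypothetical and are not the 26 days from the gas example.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation from sums of squared and cross deviations."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    s_yy = sum((y - y_bar) ** 2 for y in ys)
    return s_xy / sqrt(s_xx * s_yy)


# Hypothetical temperatures (deg C) and gas use (1000 cu ft), illustration only.
temp_c = [-2.0, 0.0, 3.0, 7.0, 10.0, 14.0]
gas = [7.5, 7.0, 5.6, 4.2, 2.9, 1.4]

r_celsius = pearson_r(temp_c, gas)

# Change of center and scale: convert deg C to deg F.
temp_f = [1.8 * t + 32.0 for t in temp_c]
r_fahrenheit = pearson_r(temp_f, gas)

print(round(r_celsius, 4), round(r_fahrenheit, 4))
```

The deviation form used here is algebraically the same as the z-score form, since the factors of n - 1 cancel.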

Page 28

Algebra Review

The equation of a straight line: y = mx + b
– m is the slope – the change in y over the change in x – or rise over run.
– b is the y-intercept – the value where the line cuts the y axis.

Page 29

[Figure: graph of the line y = 3x + 2. Horizontal axis: x, -5 to 5. Vertical axis: y, -15 to 15.]

Page 30

Review

y = 3x + 2
– x = 0, y = 2 (y-intercept)
– x = 3, y = 11
– Change in y (+9) divided by the change in x (+3) gives the slope, 3.
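
The rise-over-run calculation can be written out directly. This is a small Python sketch; the function name `line` is illustrative.

```python
def line(x, slope=3.0, intercept=2.0):
    """Evaluate y = slope * x + intercept (defaults give y = 3x + 2)."""
    return slope * x + intercept


x1, x2 = 0.0, 3.0
y1, y2 = line(x1), line(x2)

# Slope is the change in y divided by the change in x.
rise_over_run = (y2 - y1) / (x2 - x1)
print(y1, y2, rise_over_run)
```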

Page 31

Linear Regression

Example: Body mass (kg) and bite force (N) for Canidae.
– y, Response: Bite force (N)
– x, Explanatory: Body mass (kg)
– Cases: 28 species of Canidae.

Page 32

Correlation Coefficient

Body mass and bite force

$r = \frac{\sum z_x z_y}{n - 1} = \frac{26.4796}{27} = 0.9807$

Page 33

Correlation Coefficient

There is a strong correlation (linear association) between body mass and bite force for the various species of Canidae.

Page 34

Linear Model

The linear model is the equation of a straight line through the data.
A point on the straight line through the data gives a predicted value of y, denoted $\hat{y}$.

Page 35

Residual

The difference between the observed value of y and the predicted value of y, $\hat{y}$, is called the residual.

Residual = $y - \hat{y}$

Page 36

[Figure: “Regression Plot” of BFca (N), 0 to 500, versus Body mass (kg), 0 to 35, with a residual marked.]

Page 37

Line of “Best Fit”

There are lots of straight lines that go through the data.
The line of “best fit” is the line for which the sum of squared residuals is the smallest – the least squares line.

Page 38

Line of “Best Fit”

Some residuals are positive and some are negative, but they sum to zero.
The line passes through the point $(\bar{x}, \bar{y})$.
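
Both properties of the least squares line can be verified numerically. Below is a minimal Python sketch on hypothetical data; the function name `least_squares` is illustrative, and the slope is computed from deviation sums, which is algebraically the same as r times s_y over s_x.

```python
from statistics import mean


def least_squares(xs, ys):
    """Return (intercept a, slope b) of the least squares line y-hat = a + b*x."""
    x_bar, y_bar = mean(xs), mean(ys)
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    b = s_xy / s_xx
    a = y_bar - b * x_bar
    return a, b


# Hypothetical data, for illustration only.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.0, 4.0, 5.0, 4.0, 5.0]
a, b = least_squares(xs, ys)

# Residual = observed y minus predicted y.
residuals = [y - (a + b * x) for x, y in zip(xs, ys)]
residual_sum = sum(residuals)

# Value of the fitted line at the mean of x; should equal the mean of y.
line_at_x_bar = a + b * mean(xs)
print(round(residual_sum, 10), round(line_at_x_bar, 10), mean(ys))
```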

Page 39

Line of “Best Fit”

$\hat{y} = a + bx$

Least squares slope: $b = r \frac{s_y}{s_x}$
Intercept: $a = \bar{y} - b\bar{x}$

Page 40

Least Squares Estimates

Body mass, x: $\bar{x} = 9.207$ kg, $s_x = 8.016$ kg
Bite force, y: $\bar{y} = 154.029$ N, $s_y = 109.760$ N
$r = 0.9807$

Page 41

Least Squares Estimates

$b = 0.9807 \cdot \frac{109.760}{8.016} = 13.428$
$a = 154.029 - 13.428(9.207) = 30.397$
$\hat{y} = 30.397 + 13.428x$
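
The estimates can be reproduced from the summary statistics alone (r = 0.9807, mean body mass 9.207 kg with standard deviation 8.016 kg, mean bite force 154.029 N with standard deviation 109.760 N). Below is a minimal Python sketch; the function name `least_squares_from_summary` is hypothetical, and rounding the slope before computing the intercept is an assumption made to mirror a hand calculation.

```python
def least_squares_from_summary(r, x_bar, s_x, y_bar, s_y, digits=3):
    """Slope b = r * s_y / s_x and intercept a = y_bar - b * x_bar.

    The slope is rounded before the intercept is computed, mirroring a
    hand calculation that carries a rounded slope forward.
    """
    b = round(r * s_y / s_x, digits)
    a = round(y_bar - b * x_bar, digits)
    return a, b


a, b = least_squares_from_summary(
    r=0.9807, x_bar=9.207, s_x=8.016, y_bar=154.029, s_y=109.760
)
print(f"y-hat = {a} + {b} x")
```

Carrying the unrounded slope forward instead would shift the intercept slightly in the third decimal place.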

Page 42

Interpretation

Slope – for a 1 kg increase in body mass, the bite force increases, on average, by 13.428 N.
Intercept – there is no reasonable interpretation of the intercept in this context because one wouldn’t see a member of Canidae with a body mass of 0 kg.

Page 43

[Figure: “Bite Force vs Body Mass”, scatter plot of BFca (N), 0 to 500, versus Body mass (kg), 0 to 35, with the fitted line $\hat{y} = 30.397 + 13.428x$.]

Page 44

Prediction

Least squares line: $\hat{y} = 30.397 + 13.428x$
$x = 25$
$\hat{y} = 30.397 + 13.428(25) = 366.1$ N

Page 45

Residual

Body mass, x = 25 kg
Bite force, y = 351.5 N
Predicted, $\hat{y}$ = 366.1 N
Residual, $y - \hat{y} = 351.5 - 366.1 = -14.6$ N
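
The prediction and residual for a 25 kg species follow directly from the fitted line with intercept 30.397 and slope 13.428. This is a short Python sketch; the function name `predict_bite_force` is illustrative.

```python
def predict_bite_force(body_mass_kg, intercept=30.397, slope=13.428):
    """Predicted bite force (N) from the fitted least squares line."""
    return intercept + slope * body_mass_kg


observed = 351.5  # observed bite force (N) at a body mass of 25 kg
predicted = predict_bite_force(25.0)
residual = observed - predicted  # observed minus predicted

print(f"predicted = {predicted:.1f} N, residual = {residual:.1f} N")
```

A negative residual means the observed bite force falls below the line, so the model over-predicts for this case.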

Page 46

Residuals

Residuals help us see if the linear model makes sense.
Plot residuals versus the explanatory variable.
– If the plot is a random scatter of points, then the linear model is the best we can do.

Page 47

[Figure: “Plot of Residuals vs Body Mass”. Vertical axis: Residual, -30 to 60. Horizontal axis: Body mass (kg), 0 to 35.]

Page 48

Interpretation of the Plot

The residuals are scattered randomly. This indicates that the linear model is an appropriate model for the relationship between body mass and bite force for Canidae.

Page 49

$r^2$ or $R^2$

The square of the correlation coefficient gives the proportion of the variation in y that is accounted for, or explained, by the linear relationship with x.

Page 50

Body mass and Bite force

r = 0.9807
$r^2 = (0.9807)^2 = 0.962$ or 96.2%
96.2% of the variation in bite force can be explained by the linear relationship with body mass.
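
One way to see why the squared correlation measures explained variation: for a least squares line, 1 minus the ratio of the residual sum of squares to the total sum of squares equals the square of r. Below is a minimal Python sketch on hypothetical data; the last line repeats the calculation for r = 0.9807.

```python
from math import sqrt

# Hypothetical data, for illustration only.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.0, 4.0, 5.0, 4.0, 5.0]

n = len(xs)
x_bar = sum(xs) / n
y_bar = sum(ys) / n
s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
s_xx = sum((x - x_bar) ** 2 for x in xs)
s_yy = sum((y - y_bar) ** 2 for y in ys)  # total variation in y

r = s_xy / sqrt(s_xx * s_yy)

# Least squares line and its residual sum of squares.
b = s_xy / s_xx
a = y_bar - b * x_bar
sse = sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))

explained_fraction = 1.0 - sse / s_yy
print(round(r ** 2, 4), round(explained_fraction, 4))

# Squared correlation for body mass and bite force.
print(round(0.9807 ** 2, 3))
```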

Page 51

Regression Conditions

Quantitative variables – both variables should be quantitative.
Linear model – does the scatter diagram show a reasonably straight line?
Outliers – watch out for outliers as they can be very influential.

Page 52

Regression Cautions

Beware of extraordinary points.
Don’t extrapolate beyond the data.
Don’t infer x causes y just because there is a good linear model relating the two variables.

Page 53

Extraordinary Points

Page 54

Don’t Extrapolate

Explanatory (x) – Average outdoor temperature (°C).
Response (y) – Amount of natural gas used (1000 cu ft).

$\hat{y} = 6.85 - 0.393x$

Page 55

Don’t Extrapolate

[Figure: scatter plot of Gas (vertical axis, 0 to 10) versus Temp (horizontal axis, -5 to 20).]

Page 56

Don’t Extrapolate

Explanatory (x = 20) – Average outdoor temperature (°C).
Response (y) – Amount of natural gas used (1000 cu ft).

$\hat{y} = 6.85 - 0.393(20)$
$\hat{y} = -1.01$
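
Plugging x = 20 into the fitted line gives a negative amount of gas, which is impossible; that is the danger of extrapolating. Below is a minimal Python sketch that flags such predictions. It assumes the fitted line has intercept 6.85 and slope -0.393 (the negative slope matches the negative association between gas and temperature), and the observed temperature range of -5 to 15 °C is an assumption read off the earlier scatter plot axis, not a figure stated in the slides. The function names are hypothetical.

```python
# Assumed range of observed temperatures (deg C), read off the plot axis.
OBSERVED_RANGE = (-5.0, 15.0)


def predict_gas(temp_c, intercept=6.85, slope=-0.393):
    """Predicted gas use (1000 cu ft) from the line y-hat = 6.85 - 0.393x."""
    return intercept + slope * temp_c


def predict_with_check(temp_c, observed_range=OBSERVED_RANGE):
    """Return the prediction and a flag that is True when x is outside the data."""
    low, high = observed_range
    is_extrapolation = not (low <= temp_c <= high)
    return predict_gas(temp_c), is_extrapolation


prediction, extrapolated = predict_with_check(20.0)
print(f"{prediction:.2f}", extrapolated)
```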

Page 57

Correlation ≠ Causation

Don’t confuse correlation with causation.
– There is a strong positive correlation between the number of crimes committed in communities and the number of 2nd graders in those communities.
Beware of lurking variables.
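
A lurking variable can be illustrated with made-up numbers. In the Python sketch below, community population (an assumed lurking variable) drives both counts: the raw counts are strongly correlated, while the per-person rates are not. All data values are hypothetical and constructed for illustration; they are not real crime or enrollment figures.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation from sums of squared and cross deviations."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    s_yy = sum((y - y_bar) ** 2 for y in ys)
    return s_xy / sqrt(s_xx * s_yy)


# Hypothetical communities; population is the lurking variable.
population = [1000, 2000, 5000, 10000, 20000, 50000]

# Hypothetical per-person rates, built so that they are unrelated to each other.
crime_rate = [0.022, 0.018, 0.022, 0.018, 0.022, 0.018]
grader_rate = [0.013, 0.013, 0.011, 0.011, 0.012, 0.012]

# Raw counts scale with population, so bigger communities have more of both.
crimes = [rate * p for rate, p in zip(crime_rate, population)]
second_graders = [rate * p for rate, p in zip(grader_rate, population)]

r_counts = pearson_r(crimes, second_graders)
r_rates = pearson_r(crime_rate, grader_rate)
print(round(r_counts, 3), round(r_rates, 3))
```

The strong correlation between the counts comes entirely from community size, not from any link between 2nd graders and crime.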