Transcript of RESEARCH STATISTICS, Jobayer Hossain and Larry Holmes, Jr, November 6, 2008: Examining Relationship of Variables

Page 1: RESEARCH STATISTICS Jobayer Hossain Larry Holmes, Jr November 6, 2008 Examining Relationship of Variables.

RESEARCH STATISTICS

Jobayer Hossain

Larry Holmes, Jr

November 6, 2008

Examining Relationship of Variables

Page 2: RESEARCH STATISTICS Jobayer Hossain Larry Holmes, Jr November 6, 2008 Examining Relationship of Variables.

Examining Relationship of Variables

Response (dependent) variable - measures the outcome of a study.

Explanatory (Independent) variable - explains or influences the changes in a response variable.

Outlier - an observation that falls outside the overall pattern of the relationship.

Positive Association - An increase in an independent variable is associated with an increase in a dependent variable.

Negative Association - An increase in an independent variable is associated with a decrease in a dependent variable.

Page 3: RESEARCH STATISTICS Jobayer Hossain Larry Holmes, Jr November 6, 2008 Examining Relationship of Variables.

Scatterplot

Shows the relationship between two variables (age and height in this case).

Reveals the form, direction, and strength of the relationship.

Page 4: RESEARCH STATISTICS Jobayer Hossain Larry Holmes, Jr November 6, 2008 Examining Relationship of Variables.

Scatterplot - Demo

MS-Excel:

– Step 1: Select the Chart Wizard.
– Step 2: From the chart type, select XY (Scatter).
– Step 3: Select a chart sub-type and click Next.
– Step 4: Data range: the range for the variable on the x axis (age in our case) and the range for the variable on the y axis (hgt in our case), then click Next.
– Step 5: Click on Titles to write the title and axis labels of the chart, click on Axes and check Value (X) axis and Value (Y) axis, click on Gridlines and clear all the marks (if you want no gridlines), click on Legend and set the legend options (or turn it off if you don't want a legend), then click Finish.

Page 5: RESEARCH STATISTICS Jobayer Hossain Larry Holmes, Jr November 6, 2008 Examining Relationship of Variables.

Scatterplot - DemoScatterplot - Demo

SPSS:

– Step 1: Click on Graphs and select Scatter/Dot.
– Step 2: Select Simple Scatter.
– Step 3: Select variables for the Y axis (usually the response variable) and the X axis (explanatory variable), click on Titles to write a title in Line 1, then click Continue.
– Step 4: Click OK.

(A code-based sketch of the same plot follows.)
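Outside of Excel and SPSS, the same plot can be drawn in a few lines of Python. This is only a sketch, not part of the original demo; the file name growth_data.csv and the column names age and hgt are illustrative assumptions:

    # Minimal scatterplot sketch; assumes a CSV with columns 'age' and 'hgt'
    import pandas as pd
    import matplotlib.pyplot as plt

    data = pd.read_csv("growth_data.csv")   # hypothetical file name

    plt.scatter(data["age"], data["hgt"])
    plt.xlabel("Age (month)")
    plt.ylabel("Height (inch)")
    plt.title("Scatterplot of Age vs Height")
    plt.show()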

Page 6: RESEARCH STATISTICS Jobayer Hossain Larry Holmes, Jr November 6, 2008 Examining Relationship of Variables.

Scatterplots

[Figure: three scatterplots]
– Scatterplot of Age vs Height: strong positive association.
– Scatterplot of variables X and Y: points are scattered with a poor association.
– Scatterplot of variables X1 and Y1: strong negative association.

Page 7: RESEARCH STATISTICS Jobayer Hossain Larry Holmes, Jr November 6, 2008 Examining Relationship of Variables.

Correlation of two variables

Correlation measures the direction and strength of the relationship between two quantitative variables. Suppose that we have data on variables x and y for n individuals. Then the correlation coefficient r between x and y is defined as

r = \frac{1}{n-1} \sum_{i=1}^{n} \left( \frac{x_i - \bar{x}}{s_x} \right) \left( \frac{y_i - \bar{y}}{s_y} \right)

where s_x and s_y are the standard deviations of x and y.

r is always a number between -1 and 1. Values of r near 0 indicate little or no linear relationship. Values of r near -1 or 1 indicate a very strong linear relationship.

The extreme values r = 1 or r = -1 occur only in the case of a perfect linear relationship, when the points lie exactly along a straight line.
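For readers who want to check the definition directly, the formula translates line for line into Python. This is only a sketch, and the numbers below are made up for illustration, not taken from the presentation's data set:

    import numpy as np

    # Illustrative values only (not the presentation's data)
    x = np.array([62.0, 75.0, 88.0, 101.0, 120.0, 135.0])
    y = np.array([41.0, 44.0, 46.0, 49.0, 52.0, 55.0])

    n = len(x)
    sx, sy = x.std(ddof=1), y.std(ddof=1)   # sample standard deviations

    # r = 1/(n-1) * sum of the products of standardized x and y
    r = np.sum(((x - x.mean()) / sx) * ((y - y.mean()) / sy)) / (n - 1)

    print(round(r, 4))   # agrees with np.corrcoef(x, y)[0, 1]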

Page 8: RESEARCH STATISTICS Jobayer Hossain Larry Holmes, Jr November 6, 2008 Examining Relationship of Variables.

Correlation of two variables

Positive r indicates positive association, i.e., association between the two variables in the same direction, and negative r indicates negative association.

The scatterplot of Height and Age shows that these two variables possess a strong, positive linear relationship. The correlation coefficient of these two variables is 0.9829632, which is very close to 1.

Page 9: RESEARCH STATISTICS Jobayer Hossain Larry Holmes, Jr November 6, 2008 Examining Relationship of Variables.

Correlation of two variables

[Figure: the same three scatterplots, labeled with their correlation coefficients]
– Scatterplot of Age vs Height: r = 0.98
– Scatterplot of variables X and Y: r = 0.02
– Scatterplot of variables X1 and Y1: r = -0.96

Page 10: RESEARCH STATISTICS Jobayer Hossain Larry Holmes, Jr November 6, 2008 Examining Relationship of Variables.

Correlation of two variables

MS-Excel:

– Step 1: Click on the Function Wizard fx.
– Step 2: Select the function CORREL.
– Step 3: Input ranges for Array 1 and Array 2 (for our example, age and hgt) and click OK.

Another option:

Check whether the data for the two variables are in side-by-side columns. If not, copy and paste them side by side in a new worksheet and follow these steps:

– Step 1: Select Data Analysis from the Tools menu.
– Step 2: Select Correlation, type or select the Input and Output ranges, and click OK.

Page 11: RESEARCH STATISTICS Jobayer Hossain Larry Holmes, Jr November 6, 2008 Examining Relationship of Variables.

Correlation of two variables

SPSS:

– Step 1: Select Correlate from the Analyze menu.
– Step 2: Click on Bivariate, select the variables (age and hgt), and click OK.

Page 12: RESEARCH STATISTICS Jobayer Hossain Larry Holmes, Jr November 6, 2008 Examining Relationship of Variables.

Example 1: Calculating Correlation Coefficient of the variables Height and Age in our data set 1.

MS-Excel: Click on the Function Wizard fx > CORREL > Input range Array 1: age, Input range Array 2: hgt > OK. It gives the correlation coefficient, r = .98. Or: Tools > Data Analysis > Correlation > ranges for the variables age and hgt > OK. It gives you the following results.

        age     hgt
  age   1
  hgt   0.982   1
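A correlation matrix of the same shape can be produced in Python with pandas. This is a minimal sketch, again assuming a data frame with columns named age and hgt (the file name is hypothetical):

    import pandas as pd

    data = pd.read_csv("growth_data.csv")    # assumed file name

    # Pearson correlation matrix of the two variables
    print(data[["age", "hgt"]].corr())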

Page 13: RESEARCH STATISTICS Jobayer Hossain Larry Holmes, Jr November 6, 2008 Examining Relationship of Variables.

Example 1: Calculating Correlation Coefficient of the variables Height and Age in our data set 1

SPSS: Analyze > Correlate > Bivariate > select age and hgt > OK. It gives the following result:

                              age     hgt
  age   Pearson Correlation   1       .983
        Sig. (2-tailed)               .000
        N                     60      60
  hgt   Pearson Correlation   .983    1
        Sig. (2-tailed)       .000
        N                     60      60

Page 14: RESEARCH STATISTICS Jobayer Hossain Larry Holmes, Jr November 6, 2008 Examining Relationship of Variables.

Strong Association but no Correlation

[Figure: scatter plot of MPG versus Speed]

Gas mileage of an automobile first increases and then decreases as the speed increases, as in the following data:

  Speed   20   30   40   50   60
  MPG     24   28   30   28   24

The scatter plot shows a strong association, but the calculated r = 0. Why?

It is because the relationship is not linear, and r measures only the linear relationship between two variables. (The sketch below verifies this numerically.)
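A quick numeric check of this point, using the speed and MPG values from the slide:

    import numpy as np

    speed = np.array([20, 30, 40, 50, 60])
    mpg   = np.array([24, 28, 30, 28, 24])

    # The positive and negative deviations around the mean cancel exactly,
    # so the linear correlation is zero even though the inverted-U
    # association is strong.
    r = np.corrcoef(speed, mpg)[0, 1]
    print(round(r, 6))   # 0.0 (up to floating-point rounding)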

Page 15: RESEARCH STATISTICS Jobayer Hossain Larry Holmes, Jr November 6, 2008 Examining Relationship of Variables.

Influence of an outlier

Consider the following data set of two variables X and Y:

  X   20   30   40   50   60   80
  Y   24   28   30   34   37   15

r = -0.237. After dropping the last pair (80, 15):

  X   20   30   40   50   60
  Y   24   28   30   34   37

r = 0.996.
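The same effect is easy to reproduce in code; a short sketch using the values above:

    from scipy.stats import pearsonr

    x = [20, 30, 40, 50, 60, 80]
    y = [24, 28, 30, 34, 37, 15]

    r_all, _  = pearsonr(x, y)                # with the outlier (80, 15)
    r_trim, _ = pearsonr(x[:-1], y[:-1])      # after dropping the last pair

    print(round(r_all, 3), round(r_trim, 3))  # roughly -0.237 and 0.996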

Page 16: RESEARCH STATISTICS Jobayer Hossain Larry Holmes, Jr November 6, 2008 Examining Relationship of Variables.

Simple Linear Regression

Regression models the value of a response variable as a function of the value of an explanatory variable.

A regression model is a function that describes the relationship between response and explanatory variables.

A simple linear regression has one explanatory variable and the regression line is straight.

The linear relationship of variables Y and X can be written in the following regression model form:

Y = b0 + b1X + e

where Y is the response variable, X is the explanatory variable, e is the residual (error), and b0 and b1 are two parameters. Basically, b0 is the intercept and b1 is the slope of the straight line y = b0 + b1X.

Page 17: RESEARCH STATISTICS Jobayer Hossain Larry Holmes, Jr November 6, 2008 Examining Relationship of Variables.

Simple Linear Regression

A simple regression line is fitted for height on age. The intercept is 31.019 and the slope (regression coefficient) is .1877.

[Figure: Regression line of Height on Age, with fitted equation Height = 0.1877 Age + 31.019; x axis: Age (month), 0 to 200; y axis: Height (inch), 0 to 70.]

Page 18: RESEARCH STATISTICS Jobayer Hossain Larry Holmes, Jr November 6, 2008 Examining Relationship of Variables.

Simple Linear Regression Assumptions:

1. Response variable is normally distributed.

2. Relationship between the two variables is linear.

3. Observations of response variable are independent.

4. Residual error is normally distributed with mean 0 and constant standard deviation.

(A residual-plot sketch for informally checking these assumptions appears below.)
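These assumptions are usually examined by inspecting the residuals after a fit. The sketch below is an illustration only, reusing the hypothetical growth_data.csv file and the age/hgt column names assumed earlier:

    import pandas as pd
    import matplotlib.pyplot as plt
    import statsmodels.api as sm

    data = pd.read_csv("growth_data.csv")      # hypothetical file name
    X = sm.add_constant(data["age"])
    fit = sm.OLS(data["hgt"], X).fit()

    # Residuals vs fitted values: look for a roughly constant spread around zero
    plt.scatter(fit.fittedvalues, fit.resid)
    plt.axhline(0)
    plt.xlabel("Fitted values")
    plt.ylabel("Residuals")
    plt.show()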

Page 19: RESEARCH STATISTICS Jobayer Hossain Larry Holmes, Jr November 6, 2008 Examining Relationship of Variables.

Simple Linear Regression

Estimating Parameters b0 and b1

– The least squares method estimates b0 and b1 by fitting a straight line through the data points so that it minimizes the sum of squared deviations of the data points from the line.

– Formula:

\hat{b}_1 = \frac{\sum_{i=1}^{n} (X_i - \bar{X})(Y_i - \bar{Y})}{\sum_{i=1}^{n} (X_i - \bar{X})^2}, \qquad \hat{b}_0 = \bar{Y} - \hat{b}_1 \bar{X}
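These two formulas translate directly into code. The sketch below uses illustrative numbers only and checks the result against numpy's own line-fitting routine:

    import numpy as np

    # Illustrative values only (not the presentation's data)
    x = np.array([62.0, 75.0, 88.0, 101.0, 120.0, 135.0])
    y = np.array([41.0, 44.0, 46.0, 49.0, 52.0, 55.0])

    # Least squares estimates from the formulas above
    b1 = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
    b0 = y.mean() - b1 * x.mean()

    print(b0, b1)   # same values as np.polyfit(x, y, 1)[::-1]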

Page 20: RESEARCH STATISTICS Jobayer Hossain Larry Holmes, Jr November 6, 2008 Examining Relationship of Variables.

Simple Linear Regression

Fitted Least Squares Regression Line

– Fitted line: \hat{Y}_i = \hat{b}_0 + \hat{b}_1 X_i, where \hat{Y}_i is the fitted / predicted value of the ith observation Y_i of the response variable.

– Estimated residual: \hat{e}_i = Y_i - \hat{Y}_i

– The least squares method chooses \hat{b}_0 and \hat{b}_1 to minimize the sum of squared errors \sum_{i=1}^{n} \hat{e}_i^2.

Page 21: RESEARCH STATISTICS Jobayer Hossain Larry Holmes, Jr November 6, 2008 Examining Relationship of Variables.

Simple Linear Regression

[Figure: Average Income versus % with a College Degree (by State). x axis: Percentage of Population with College Degree or Higher; y axis: Average Income Level ($ per year). Residuals e1, e2, e3 are marked for a few points.]

In this example, a regression line (red line) has been fitted to a series of observations (blue diamonds), and residuals are shown for a few observations (arrows).

Fitted Least Squares Regression Line

Page 22: RESEARCH STATISTICS Jobayer Hossain Larry Holmes, Jr November 6, 2008 Examining Relationship of Variables.

Simple Linear Regression

Interpretation of the Regression Coefficient and Intercept

– The regression coefficient (b1) reflects the average change in the response variable Y for a unit change in the explanatory variable X; that is, the slope of the regression line. E.g., in the height-on-age fit above, height increases by about 0.1877 inch, on average, for each additional month of age.

– The intercept (b0) estimates the average value of the response variable Y without the influence of the explanatory variable X, that is, when the explanatory variable = 0.0.

Page 23: RESEARCH STATISTICS Jobayer Hossain Larry Holmes, Jr November 6, 2008 Examining Relationship of Variables.

Simple Linear Regression

MS-Excel:

– Step 1: Select Data Analysis from the Tools menu.
– Step 2: Input Y range: the range for the response (dependent) variable; Input X range: the range for the independent variable.
– Step 3: Make selections of the outputs you want and click OK.

SPSS:

– Step 1: Select Regression from the Analyze menu.
– Step 2: Select Linear, then for Dependent select the dependent variable and for Independent(s) select the independent variable.
– Step 3: Click on Statistics and select options, click on Plots and select variables for scatter plots (if you want), click Continue and then OK.

(A Python-based sketch of the same fit follows.)
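For completeness, a comparable fit can be run in Python with statsmodels. This is a sketch under the same assumptions as before (a CSV named growth_data.csv with columns age and hgt), not part of the original demo:

    import pandas as pd
    import statsmodels.api as sm

    data = pd.read_csv("growth_data.csv")    # hypothetical file name

    X = sm.add_constant(data["age"])         # adds the intercept term b0
    model = sm.OLS(data["hgt"], X).fit()     # regress height on age

    print(model.params)                      # intercept and slope
    print(model.summary())                   # regression table (R-square, ANOVA, coefficients)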

Page 24: RESEARCH STATISTICS Jobayer Hossain Larry Holmes, Jr November 6, 2008 Examining Relationship of Variables.

Example 2: MS-Excel Linear Regression output: Height on age

SUMMARY OUTPUT

Regression Statistics
  Multiple R           0.982963
  R Square             0.966217
  Adjusted R Square    0.965634
  Standard Error       5.604001
  Observations         60

ANOVA
              df    SS          MS          F          Significance F
  Regression   1    52095.1     52095.1     1658.825   2.28E-44
  Residual    58    1821.48     31.40483
  Total       59    53916.58

             Coefficients   Standard Error   t Stat     P-value    Lower 95%   Upper 95%   Lower 95.0%   Upper 95.0%
  Intercept  -156.589       6.107669         -25.6381   2.48E-33   -168.815    -144.363    -168.815      -144.363
  hgt        5.146693       0.126365         40.72867   2.28E-44   4.893746    5.399641    4.893746      5.399641

Page 25: RESEARCH STATISTICS Jobayer Hossain Larry Holmes, Jr November 6, 2008 Examining Relationship of Variables.

Example 2: SPSS Linear Regression output: Height on age

Model Summary

  Model   R         R Square   Adjusted R Square   Std. Error of the Estimate   R Square Change   F Change   df1   df2   Sig. F Change
  1       .983(a)   .966       .966                1.0703                       .966              1658.825   1     58    .000

  a Predictors: (Constant), age

Coefficients(a)

                        Unstandardized Coefficients   Standardized Coefficients
  Model                 B         Std. Error          Beta                        t        Sig.
  1     (Constant)      31.019    .439                                            70.645   .000
        age             .188      .005                .983                        40.729   .000

  a Dependent Variable: hgt

Page 26: RESEARCH STATISTICS Jobayer Hossain Larry Holmes, Jr November 6, 2008 Examining Relationship of Variables.

Multiple Regression

Two or more independent variables are used to predict a single dependent variable.

The multiple regression model of Y on p explanatory variables can be written as

Y = b0 + b1X1 + b2X2 + … + bpXp + e

where bi (i = 1, 2, …, p) is the regression coefficient of Xi.

Page 27: RESEARCH STATISTICS Jobayer Hossain Larry Holmes, Jr November 6, 2008 Examining Relationship of Variables.

Multiple Regression

Fitted Y is given by

\hat{Y} = \hat{b}_0 + \hat{b}_1 X_1 + \hat{b}_2 X_2 + \cdots + \hat{b}_p X_p

where \hat{b}_i is the estimate of b_i.

The estimated residual error is the same as that in the simple linear regression: \hat{e}_i = Y_i - \hat{Y}_i.

Page 28: RESEARCH STATISTICS Jobayer Hossain Larry Holmes, Jr November 6, 2008 Examining Relationship of Variables.

Multiple Regression

MS-Excel: Everything is the same as for simple regression except that we need to select multiple columns for the range of independent variables. If the columns of independent variables are not side by side, we need to put them side by side: insert a new Excel sheet, copy the dependent variable from the original sheet and paste it into one column, then copy the independent variables and paste them into side-by-side columns.

SPSS: Everything is the same as for simple regression except that we select more than one independent variable.

(A Python sketch of a two-predictor fit follows.)
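In Python the only change from the simple-regression sketch is passing more than one predictor column. The column names below mirror Example 3, but the file itself is still an assumption:

    import pandas as pd
    import statsmodels.api as sm

    data = pd.read_csv("growth_data.csv")            # hypothetical file name

    X = sm.add_constant(data[["age", "PLUC.post"]])  # two explanatory variables
    model = sm.OLS(data["hgt"], X).fit()

    print(model.params)                              # intercept and both coefficients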

Page 29: RESEARCH STATISTICS Jobayer Hossain Larry Holmes, Jr November 6, 2008 Examining Relationship of Variables.

Example 3: MS-Excel Multiple Regression output: Height on Age and PLUC.post

SUMMARY OUTPUT

Regression Statistics
  Multiple R           0.983103
  R Square             0.966491
  Adjusted R Square    0.965315
  Standard Error       5.629974
  Observations         60

ANOVA
              df    SS           MS          F          Significance F
  Regression   2    52109.88     26054.94    822.0105   9.27E-43
  Residual    57    1806.706     31.6966
  Total       59    53916.58

              Coefficients   Standard Error   t Stat     P-value    Lower 95%   Upper 95%   Lower 95.0%   Upper 95.0%
  Intercept   -158.474       6.728226         -23.5536   4.65E-31   -171.947    -145.001    -171.947      -145.001
  hgt         5.136708       0.127791         40.19623   1.6E-43    4.880811    5.392605    4.880811      5.392605
  PLUC.post   0.194634       0.28509          0.68271    0.497556   -0.37625    0.765517    -0.37625      0.765517

Page 30: RESEARCH STATISTICS Jobayer Hossain Larry Holmes, Jr November 6, 2008 Examining Relationship of Variables.

Example 3: SPSS Multiple Regression output: Height on Age and PLUC.post

Model Summary

  Model   R         R Square   Adjusted R Square   Std. Error of the Estimate   R Square Change   F Change   df1   df2   Sig. F Change
  1       .983(a)   .966       .965                1.077191687455185            .966              818.970    2     57    .000

  a Predictors: (Constant), PLUC.post, age

Coefficients(a)

                        Unstandardized Coefficients   Standardized Coefficients
  Model                 B         Std. Error          Beta                        t        Sig.
  1     (Constant)      31.330    .752                                            41.635   .000
        age             .188      .005                .985                        40.196   .000
        PLUC.post       -.028     .055                -.013                       -.511    .612

  a Dependent Variable: hgt

Page 31: RESEARCH STATISTICS Jobayer Hossain Larry Holmes, Jr November 6, 2008 Examining Relationship of Variables.

Coefficient of Determination (Multiple R-squared)

Total variation in the response variable Y is due to (i) the regression of all variables in the model and (ii) the residual (error).

Total variation of Y: SS(Y) = SS(Regression) + SS(Residual)

The coefficient of determination is

R^2 = \frac{SS(\text{Regression})}{SS(Y)} = 1 - \frac{SS(\text{Residual})}{SS(Y)}
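As a quick arithmetic check, the sums of squares reported in the Example 2 ANOVA table reproduce the R Square shown there:

    # Values taken from the Example 2 ANOVA table
    ss_regression = 52095.1
    ss_total = 53916.58

    r_squared = ss_regression / ss_total
    print(round(r_squared, 6))   # about 0.966217, matching the reported R Square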

Page 32: RESEARCH STATISTICS Jobayer Hossain Larry Holmes, Jr November 6, 2008 Examining Relationship of Variables.

Coefficient of Determination (Multiple R-squared)

R2 lies between 0 and 1.

R2 = 0.8 implies that 80% of the total variation in the response variable Y is due to the contribution of all explanatory variables in the model. That is, the fitted regression model explains 80% of the variance in the response variable.

Page 33: RESEARCH STATISTICS Jobayer Hossain Larry Holmes, Jr November 6, 2008 Examining Relationship of Variables.

Example 4: Calculating the coefficient of determination for the variables in Example 3.

MS-Excel: It’s in the MS-Excel output of Example 3.

SPSS: It’s in the SPSS output of Example 3.

Page 34: RESEARCH STATISTICS Jobayer Hossain Larry Holmes, Jr November 6, 2008 Examining Relationship of Variables.

Questions