Regression Corr

  • 8/2/2019 Regression Corr

    1/15

    Simple Linear Regression & Correlation

    We now consider bivariate data, that is, data connecting two variables:

    The independent or explanatory variable, and

    The dependent or response variable.

  • 8/2/2019 Regression Corr

    2/15

    A scatter diagram is used to display bivariate data.

    We are interested in whether there is a linear relationship between the two sets of data displayed. The relationship falls into one of 3 classes:

    1. POSITIVE LINEAR CORRELATION

    2. NEGATIVE LINEAR CORRELATION

    3. NO LINEAR CORRELATION

  • 8/2/2019 Regression Corr

    3/15

    Scatter diagrams show plots of ordered pairs $(x, y)$.

    Usually, $x$ represents the independent variable and $y$ represents the dependent variable. The following scatter diagrams show different classes of linear relationships:

  • 8/2/2019 Regression Corr

    4/15

    Graph showing positive correlation

  • 8/2/2019 Regression Corr

    5/15

    Graph showing negative correlation

  • 8/2/2019 Regression Corr

    6/15

    Graph showing no linear correlation

  • 8/2/2019 Regression Corr

    7/15

    A scatter diagram gives a good indication of whether the relationship between the two sets of data is linear, and so whether it can be represented by a mathematical equation.

    This mathematical equation is called a regression function.

    We can determine the equation of the regression line

    representing the proposed relationship.

  • 8/2/2019 Regression Corr

    8/15

    The least squares regression line of $y$ on $x$:

    The general equation of a straight line is

    $$y = mx + c$$

    where $m$ represents the gradient and $c$ represents the y-intercept. The regression equation is very similar:

    The regression equation takes the form

    $$y = a + bx$$

    where $b$ will represent the gradient and $a$ will represent the y-intercept. These values are called the regression coefficients.

  • 8/2/2019 Regression Corr

    9/15

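    These values for $a$ and $b$ are determined by the following formulae:

    $$b = \frac{n\sum xy - \sum x \sum y}{n\sum x^2 - \left(\sum x\right)^2} \quad\text{and}\quad a = \bar{y} - b\bar{x}$$

    where $\bar{x}$ and $\bar{y}$ are the mean of $x$ and the mean of $y$ respectively. The formula for $a$ is based on the fact that $(\bar{x}, \bar{y})$ MUST be a point on the regression line.

    N.B. $\bar{x} = \dfrac{\sum x}{n}$ and $\bar{y} = \dfrac{\sum y}{n}$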
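The formulae for the regression coefficients on this slide translate directly into code. The following is a minimal Python sketch, not part of the original slides: the function name `least_squares_line` and the four data points are hypothetical, chosen so that the points lie exactly on $y = 1 + 2x$ and the result is easy to check by hand.

```python
def least_squares_line(xs, ys):
    """Return (a, b) for the least squares line y = a + b*x of y on x."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length samples with at least 2 points")

    # The sums that appear in the slide's formulae.
    sum_x = sum(xs)
    sum_y = sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)

    denom = n * sum_x2 - sum_x ** 2
    if denom == 0:
        raise ValueError("all x values are equal, so the gradient is undefined")

    b = (n * sum_xy - sum_x * sum_y) / denom   # gradient
    a = sum_y / n - b * (sum_x / n)            # intercept: a = y_bar - b * x_bar
    return a, b


# Hypothetical data lying exactly on y = 1 + 2x.
xs = [1, 2, 3, 4]
ys = [3, 5, 7, 9]
a, b = least_squares_line(xs, ys)
print(a, b)  # 1.0 2.0
```

Because the intercept is computed as $\bar{y} - b\bar{x}$, the fitted line always passes through the point of means, which is the property the slide highlights.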

  • 8/2/2019 Regression Corr

    10/15

    A way to measure the strength of a linear relationship between two variables is using the product moment correlation coefficient $r$.

    This is a number which lies between -1 and +1. Consider 3 cases:

    If $r > 0$ then there is positive linear correlation.

    If $r < 0$ then there is negative linear correlation.

    If $r = 0$ then there is no linear correlation present.

  • 8/2/2019 Regression Corr

    11/15

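    The correlation coefficient is calculated using the formula:

    $$r = \frac{n\sum xy - \sum x \sum y}{\sqrt{\left[n\sum x^2 - \left(\sum x\right)^2\right]\left[n\sum y^2 - \left(\sum y\right)^2\right]}}$$

    From this formula we can also determine the coefficient of determination:

    $$cd = r^2 = \frac{\left(n\sum xy - \sum x \sum y\right)^2}{\left[n\sum x^2 - \left(\sum x\right)^2\right]\left[n\sum y^2 - \left(\sum y\right)^2\right]}$$

    This measure represents the ratio of explained variation to total variation.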
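The product moment formula can be sketched in Python in the same way. This is an illustrative sketch only: the function name `correlation` and the sample data are hypothetical and do not come from the slides. The data are chosen to lie exactly on a falling straight line, so the expected result is a perfect negative correlation.

```python
import math


def correlation(xs, ys):
    """Return the product moment correlation coefficient r of two samples."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length samples with at least 2 points")

    sum_x = sum(xs)
    sum_y = sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)
    sum_y2 = sum(y * y for y in ys)

    numerator = n * sum_xy - sum_x * sum_y
    x_term = n * sum_x2 - sum_x ** 2
    y_term = n * sum_y2 - sum_y ** 2
    if x_term == 0 or y_term == 0:
        raise ValueError("r is undefined when one variable is constant")

    return numerator / math.sqrt(x_term * y_term)


# Hypothetical data lying exactly on the falling line y = 11 - 2x.
xs = [1, 2, 3, 4]
ys = [9, 7, 5, 3]
r = correlation(xs, ys)
cd = r ** 2  # coefficient of determination
print(r, cd)  # -1.0 1.0
```

Squaring $r$ discards its sign, so the coefficient of determination measures only how much of the variation is explained, not the direction of the relationship.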

  • 8/2/2019 Regression Corr

    12/15

    A Practical Example

    Tabulated is the maintenance cost (in hundreds of dollars) against the age (in months) of 10 X-ray machines.

    i. Find the least squares regression line of maintenance cost on age and use it to

    predict the maintenance cost for a similar machine 40 mths old.

    ii. Calculate the product moment correlation coefficient between the age of the

    machine and the cost for maintenance.

    iii. Determine the percentage variation in the total maintenance cost that is explained

    by the variation in the machine age.

    Question adapted from Business Mathematics & Statistics

    [Table: Machine (1 to 10), Age ($x$), Cost ($y$); cell values not preserved in this transcript]

  • 8/2/2019 Regression Corr

    13/15

  • 8/2/2019 Regression Corr

    14/15

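    From the table on the previous slide we know that:

    $$n = 10;\quad \sum x = 300;\quad \sum y = 2970;\quad \sum xy = 97650;\quad \sum x^2 = 12050;\quad \sum y^2 = 913050$$

    i. Using the formulae from slide 9

    $$b = \frac{n\sum xy - \sum x \sum y}{n\sum x^2 - \left(\sum x\right)^2} = \frac{10(97650) - (300)(2970)}{10(12050) - (300)^2} = \frac{85500}{30500} \approx 2.803 \approx 2.8$$

    $$a = \bar{y} - b\bar{x} = \frac{\sum y}{n} - b\,\frac{\sum x}{n} = \frac{2970}{10} - 2.803\left(\frac{300}{10}\right) \approx 212.9$$

    The regression equation is

    $$y = a + bx \quad\Rightarrow\quad y = 212.9 + 2.8x$$

    Estimated cost $= y = 212.9 + 2.8(40) = 324.9$, in hundreds of dollars.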
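The arithmetic for part i can be checked from the summary totals alone. The sketch below is illustrative Python, with variable names and the helper `predict` chosen here rather than taken from the slides. It also shows a rounding detail: the slide's figure of 324.9 comes from the rounded coefficients 212.9 and 2.8, while the unrounded coefficients give roughly 325.0.

```python
# Summary totals quoted on the slide.
n = 10
sum_x, sum_y = 300, 2970
sum_xy, sum_x2 = 97650, 12050

# Regression coefficients of cost (y) on age (x).
b = (n * sum_xy - sum_x * sum_y) / (n * sum_x2 - sum_x ** 2)
a = sum_y / n - b * (sum_x / n)


def predict(x, a, b):
    """Estimated cost for a machine of age x on the line y = a + b*x."""
    return a + b * x


exact = predict(40, a, b)                          # unrounded coefficients
rounded = predict(40, round(a, 1), round(b, 1))    # coefficients as on the slide

print(round(b, 3), round(a, 1))          # 2.803 212.9
print(round(exact, 1), round(rounded, 1))  # 325.0 324.9
```

Rounding only at the final step is usually the safer habit, since rounding the gradient first shifts the prediction slightly.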

  • 8/2/2019 Regression Corr

    15/15

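    Recall

    $$n = 10;\quad \sum x = 300;\quad \sum y = 2970;\quad \sum xy = 97650;\quad \sum x^2 = 12050;\quad \sum y^2 = 913050$$

    Using the formulae from slide 11

    ii. The correlation coefficient =

    $$r = \frac{n\sum xy - \sum x \sum y}{\sqrt{\left[n\sum x^2 - \left(\sum x\right)^2\right]\left[n\sum y^2 - \left(\sum y\right)^2\right]}} = \frac{10(97650) - (300)(2970)}{\sqrt{\left[10(12050) - (300)^2\right]\left[10(913050) - (2970)^2\right]}} \approx 0.88$$

    iii. The coefficient of determination =

    $$cd = r^2 = (0.88)^2 \approx 0.77$$

    Thus 77% of the variation in maintenance costs is explained by the variation in machine ages.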
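As a check on parts ii and iii, the same summary totals can be substituted into the correlation formula. This is a short illustrative Python sketch; the variable names are chosen here and are not from the slides.

```python
import math

# Summary totals quoted on the slide.
n = 10
sum_x, sum_y = 300, 2970
sum_xy = 97650
sum_x2, sum_y2 = 12050, 913050

numerator = n * sum_xy - sum_x * sum_y
denominator = math.sqrt(
    (n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2)
)

r = numerator / denominator   # product moment correlation coefficient
cd = r ** 2                   # coefficient of determination

print(round(r, 2), round(cd, 2))  # 0.88 0.77
```

Both figures agree with the slide to two decimal places, which supports the conclusion that about 77% of the variation in maintenance cost is explained by machine age.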