chapter4ZICA4.6


  • 7/28/2019 chapter4ZICA4.6


    4.6 Regression and Correlation Analysis

    This section introduces regression analysis, a method used to describe a relationship between two variables, and then explains correlation analysis, which measures the strength of the relationship between two variables.

    This manual uses Spearman's rank correlation coefficient and Pearson's product moment coefficient of correlation as measures of the strength of the relationship between two variables.

    Regression analysis is concerned with estimating one variable (the dependent variable) on the basis of one or more other variables (the independent variables). If an analyst is, for instance, trying to predict the share price of a particular sector, there will be a whole range of independent variables to consider. In this manual, we restrict our attention to the particular case where a dependent variable y is related to a single independent variable x.

    The Regression Equation

    When only one independent variable is used in making a forecast, the technique used is called simple linear regression. The forecasts are made by means of a straight line using the equation

    $$y = a + bx$$

    where

    a = the y-intercept, i.e. the value of y when x = 0

    b = the slope, i.e. the amount that y changes with a unit change in x

    The linear function is useful because it is mathematically simple and it can be shown to be a reasonably close approximation in many situations.

    The first step in establishing whether there is a relationship between two variables is to draw a scatter diagram. This is a plot of the two variables on an x-y graph. Given that we believe there is a relationship between the two variables, the second step is to determine the form of this relationship.


    Example 1

    Consider the following data from a major appliance store. The table shows the daily high temperature and the number of air conditioning units sold for 8 randomly selected business days during the hot dry season.

    Daily High Temperature (x) °C    27  35  18  20  46  36  26  23
    Number of Units (y)               5   6   2   1   6   4   3   3

    Draw a Scatter diagram for the data.

    [Figure 1.0: Scatter diagram of number of units (y) against daily high temperature in °C (x)]


    The distribution of points in the scatter diagram suggests that a straight line roughly fits these points.

    The most straightforward method of fitting a straight line to the set of data points is by eye. The values of a and b can then be determined from the graph: a is the intercept on the vertical axis and b is the slope.

    The other method is that of semi-averages. This technique consists of splitting the data into two equal groups, plotting the mean point for each group and joining these points with a straight line.

    Example 2

    Using data of Example 1, fit a straight line using the method of semi-averages.

    The procedure for obtaining the y on x regression line is as follows:

    Step 1

    Sort the data into size order by x-value.

    x    18  20  23  26  27  35  36  46
    y     2   1   3   3   5   6   4   1

    Step 2

    Split the data into two equal groups, a lower half and an upper half (if there is an odd number of items, drop the central one).

                 Lower half of data      Upper half of data
                 x        y              x        y
                 18       2              27       5
                 20       1              35       6
                 23       3              36       4
                 26       3              46       1
    Totals       87       9              144      16
    Averages     21.75    2.25           36       4


    Table 1.0

    [Figure 2.0: Method of semi-averages for Example 1]

    Step 3

    Calculate the mean point for each group

    Step 4

    Plot the mean points from Step 3 on a graph with suitably scaled axes and join them with a straight line. This is the required y on x regression line.
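    The four steps above can be sketched in a few lines of Python. This is only an illustrative sketch: the function name semi_average_line is an assumption of this example, and the data are the sorted values from Step 1. Instead of reading the line off a graph, it computes the slope and intercept of the line through the two mean points.

```python
def semi_average_line(xs, ys):
    """Return (a, b) for the line y = a + b*x joining the two group mean points."""
    pairs = sorted(zip(xs, ys))            # Step 1: sort by x-value
    half = len(pairs) // 2
    lower = pairs[:half]                   # Step 2: lower half
    upper = pairs[len(pairs) - half:]      # upper half (central item dropped if n is odd)

    def mean_point(group):                 # Step 3: mean point of one group
        mean_x = sum(x for x, _ in group) / len(group)
        mean_y = sum(y for _, y in group) / len(group)
        return mean_x, mean_y

    (x1, y1), (x2, y2) = mean_point(lower), mean_point(upper)
    b = (y2 - y1) / (x2 - x1)              # Step 4: slope of the joining line
    a = y1 - b * x1                        # intercept on the vertical axis
    return a, b


xs = [18, 20, 23, 26, 27, 35, 36, 46]
ys = [2, 1, 3, 3, 5, 6, 4, 1]
a, b = semi_average_line(xs, ys)
print(f"semi-average line: y = {a:.3f} + {b:.4f}x")
```

    The line obtained this way depends only on the two group means, so it is quick to find but generally differs from the least squares line discussed next.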

    Least Square Line

    Let us consider a typical data point with coordinates $(x_i, y_i)$ (see Figure 3.0). The error in the forecast (the y-coordinate of the data point minus the forecasted y-coordinate given by the straight line) is denoted by $e_i$. The line which minimizes the sum of the squared errors, $\sum e_i^2$, is called the least squares line or the regression line. This can be shown by using calculus. Here we just give the best estimates of a and b by the following formulas.


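    $$b = \frac{\sum xy - \dfrac{\sum x \sum y}{n}}{\sum x^2 - \dfrac{\left(\sum x\right)^2}{n}}$$

    $$a = \bar{y} - b\bar{x}$$

    where n is the number of data points.

    The values of a and b are then substituted into the equation $y = a + bx$.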
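    For readers who want the calculus step, a brief sketch follows. The symbol S for the sum of squared errors is introduced only for this sketch and is not used elsewhere in the manual.

```latex
\begin{align*}
S(a,b) &= \sum_{i=1}^{n} e_i^2 = \sum_{i=1}^{n} \left(y_i - a - b x_i\right)^2 \\
\frac{\partial S}{\partial a} &= -2\sum \left(y_i - a - b x_i\right) = 0
  \;\Longrightarrow\; \sum y = na + b\sum x \\
\frac{\partial S}{\partial b} &= -2\sum x_i\left(y_i - a - b x_i\right) = 0
  \;\Longrightarrow\; \sum xy = a\sum x + b\sum x^2
\end{align*}
```

    Dividing the first equation by n gives $a = \bar{y} - b\bar{x}$; substituting this into the second and solving for b gives the formula for the slope.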

    [Figure 3.0: The least squares line with the error term $e_i$]

    Example 3

    Fit the least squares line to the data in Example 1.


    Table 2 shows the calculations for the estimates of a and b.

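    x      y     x²      y²     xy
    18     2     324     4      36
    20     1     400     1      20
    23     3     529     9      69
    26     3     676     9      78
    27     5     729     25     135
    35     6     1225    36     210
    36     4     1296    16     144
    46     1     2116    1      46

    $\sum x = 231$, $\sum y = 25$, $\sum x^2 = 7295$, $\sum y^2 = 101$, $\sum xy = 738$

    $n = 8$, $\bar{y} = 3.125$, $\bar{x} = 28.875$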

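    $$b = \frac{\sum xy - \dfrac{\sum x \sum y}{n}}{\sum x^2 - \dfrac{\left(\sum x\right)^2}{n}} = \frac{738 - \dfrac{(231)(25)}{8}}{7295 - \dfrac{(231)^2}{8}} = \frac{16.125}{624.875} = 0.0258$$

    $$a = \bar{y} - b\bar{x} = 3.125 - 0.0258(28.875) = 2.38$$

    giving the equation for the regression line of $y = 2.38 + 0.0258x$.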
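    The same calculation can be checked with a short Python sketch. The function name least_squares and the variable names are assumptions of this example; the data are the x and y columns used in the calculation above.

```python
def least_squares(xs, ys):
    """Return (a, b) for the least squares line y = a + b*x."""
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)
    # Slope: (sum xy - sum x * sum y / n) / (sum x^2 - (sum x)^2 / n)
    b = (sum_xy - sum_x * sum_y / n) / (sum_x2 - sum_x ** 2 / n)
    # Intercept: mean of y minus slope times mean of x
    a = sum_y / n - b * (sum_x / n)
    return a, b


temps = [18, 20, 23, 26, 27, 35, 36, 46]
units = [2, 1, 3, 3, 5, 6, 4, 1]
a, b = least_squares(temps, units)
print(f"y = {a:.2f} + {b:.4f}x")   # prints: y = 2.38 + 0.0258x
```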

    Forecasting Using the Regression Line


    Having obtained the regression line, it can be used to forecast the value of y for a given value of x. Suppose that we wish to determine the number of units sold if the daily high temperature is 42 °C.

    From the regression line, the forecasted value of y is $y = 2.38 + 0.0258(42) = 3.46$, i.e. the expected number of units sold is 3.

    Now suppose that we wish to determine the number of units sold if the temperature is 49 °C. The forecasted value of y is then given by $y = 2.38 + 0.0258(49) = 3.64$, i.e. the expected number of units sold is 4.

    The two examples differ in that the first y value was forecasted from an x value within the range of x values in the original data set, while the second was forecasted from an x value outside that range. The first example is a case of interpolation and the second is a case of extrapolation. With extrapolation, the assumption is that the relationship between the two variables continues to behave in the same way outside the given range of x values from which the least squares line was computed.
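    The distinction between interpolation and extrapolation can be made explicit when forecasting. The sketch below is illustrative only: the function name forecast and its parameters are assumptions of this example, and the coefficients and the x range (18 to 46) are those of the worked example above.

```python
def forecast(a, b, x, x_min, x_max):
    """Forecast y = a + b*x and say whether x lies inside the observed x range."""
    y = a + b * x
    kind = "interpolation" if x_min <= x <= x_max else "extrapolation"
    return y, kind


a, b = 2.38, 0.0258          # regression line y = 2.38 + 0.0258x
x_min, x_max = 18, 46        # smallest and largest temperature in the data

for temperature in (42, 49):
    y, kind = forecast(a, b, temperature, x_min, x_max)
    print(f"x = {temperature}: y = {y:.2f} ({kind})")
```

    Flagging extrapolated forecasts in this way is a reminder that they rest on the extra assumption that the linear relationship continues beyond the observed data.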

    Exercise 7

    1. For the following data

    x 2 5 6 8 10 11 13 16

    y 2 3 4 5 6 8 9 10

    a) Draw a scatter diagram

    b) By eye, fit a straight line to the data (ensuring it passes through the mean value)

    c) Fit the equation of the line by the method of semi-averages.

    2. The following data have been collected regarding sales and advertising expenditure.

    Sales (Kms)    Advertising expenditure (K'000s)
    10.5           230
    11.2           280
    9.9            310
    10.6           350
    11.4           400
    12.1           430

    a) Plot the above data on a scatter diagram.

    b) Fit the regression line using the method of least squares.

    c) Estimate the sales if K530 000 is spent on advertising expenditure.

    Note that advertising expenditure is the x variable and sales is the y variable.

    3. Fit a least square line to the data in the table below.

    x 5 7 8 10 11 13

    y 4 5 6 8 7 10

    4. The table below shows the final grades in Mathematics and Communication obtained by students selected at random from a large group of students.

    a) Graph the data

    b) Fit a least-squares line

    c) If a student receives a grade of 85 in Mathematics, what is her expected grade in Communication?

    d) If a student receives a grade of 65 in Communication, what is her expected grade in Mathematics?

    Mathematics (x)      80  86  97  70  89  75  99  69  87  78
    Communication (y)    75  65  80  65  80  70  79  45  70  80

    5. The table below shows the birth rate per 1000 population during 1999 to 2005.

    Year                   1999  2000  2001  2002  2003  2004  2005
    Birth rate per 1000    14.6  14.5  13.8  13.4  13.6  12.8  12.6


    a) Graph the data.

    b) Find the least squares line fitting the data. Code the years 1999 to 2005 as the whole numbers 1 through 7.

    c) Predict the birth rate in 2009, assuming the present trend continues.

    4.7 Correlation Analysis

    Correlation analysis is used to determine the degree of association between two variables. Having obtained the equation of the regression line, correlation analysis can be used to measure how well one variable is linearly related to another. The coefficient of correlation r can assume any value in the range -1 to +1 inclusive. When r is close to or equal to +1, we have a strong positive correlation; when r is close to or equal to -1, we have a strong negative correlation. The sign of the correlation coefficient is the same as the sign of the slope of the regression line.

    The following scatter diagrams illustrate certain values of the correlation coefficient.

    [Figure: three scatter diagrams illustrating perfect correlation (r = ±1) and no correlation (r = 0)]

    The method of investigating whether a linear relationship exists between two variables x and y is by calculating Pearson's product moment correlation coefficient (PPMCC), denoted by r, given by the formula


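    $$r = \frac{\sum xy - \dfrac{\sum x \sum y}{n}}{\sqrt{\left(\sum x^2 - \dfrac{\left(\sum x\right)^2}{n}\right)\left(\sum y^2 - \dfrac{\left(\sum y\right)^2}{n}\right)}}$$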

    Example 4

    By calculating the PPMCC, find the degree of association between weekly earnings and the amount of tax paid for each member of a group of 10 manual workers.

    Weekly Wage (K'000)    79  81  87  88  91  92  98  98  103  113

    Income Tax (K'000)     10   8  14  14  17  12  18  22   21   24

    The PPMCC is calculated in the Table below

    x      y     x²       y²     xy
    79     10    6241     100    790
    81     8     6561     64     648
    87     14    7569     196    1218
    88     14    7744     196    1232
    91     17    8281     289    1547
    92     12    8464     144    1104
    98     18    9604     324    1764
    98     22    9604     484    2156
    103    21    10609    441    2163
    113    24    12769    576    2712

    $\sum x = 930$, $\sum y = 160$, $\sum x^2 = 87446$, $\sum y^2 = 2814$, $\sum xy = 15334$


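    $$r = \frac{\sum xy - \dfrac{\sum x \sum y}{n}}{\sqrt{\left(\sum x^2 - \dfrac{\left(\sum x\right)^2}{n}\right)\left(\sum y^2 - \dfrac{\left(\sum y\right)^2}{n}\right)}} = \frac{15334 - \dfrac{(930)(160)}{10}}{\sqrt{\left(87446 - \dfrac{(930)^2}{10}\right)\left(2814 - \dfrac{(160)^2}{10}\right)}} = \frac{454}{\sqrt{(956)(254)}} = 0.921$$

    r is near 1 and indicates a strong positive linear correlation between the two variables.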
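    As a cross-check, the PPMCC formula can be written as a small Python function. This is a sketch under the assumption that the data are supplied as two equal-length lists; the name pearson_r is chosen for this example only.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson's product moment correlation coefficient from raw sums."""
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    s_xy = sum(x * y for x, y in zip(xs, ys)) - sum_x * sum_y / n
    s_xx = sum(x * x for x in xs) - sum_x ** 2 / n
    s_yy = sum(y * y for y in ys) - sum_y ** 2 / n
    return s_xy / sqrt(s_xx * s_yy)


wages = [79, 81, 87, 88, 91, 92, 98, 98, 103, 113]   # weekly wage (K'000)
tax = [10, 8, 14, 14, 17, 12, 18, 22, 21, 24]        # income tax (K'000)
r = pearson_r(wages, tax)
print(f"r = {r:.3f}")   # prints: r = 0.921
```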

    Example 5

    Evaluate the PPMCC for the following data.

    x 15 20 25 30 35

    y 143 141 144 149 148

    The PPMCC is calculated in the Table below.

    x     y      x²      y²       xy
    15    143    225     20449    2145
    20    141    400     19881    2820
    25    144    625     20736    3600
    30    149    900     22201    4470
    35    148    1225    21904    5180

    $\sum x = 125$, $\sum y = 725$, $\sum x^2 = 3375$, $\sum y^2 = 105171$, $\sum xy = 18215$


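    $$r = \frac{\sum xy - \dfrac{\sum x \sum y}{n}}{\sqrt{\left(\sum x^2 - \dfrac{\left(\sum x\right)^2}{n}\right)\left(\sum y^2 - \dfrac{\left(\sum y\right)^2}{n}\right)}} = \frac{18215 - \dfrac{(125)(725)}{5}}{\sqrt{\left(3375 - \dfrac{(125)^2}{5}\right)\left(105171 - \dfrac{(725)^2}{5}\right)}} = \frac{90}{\sqrt{(250)(46)}} = 0.839$$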

    The Coefficient of Determination

    The coefficient of determination is the square of the coefficient of correlation r. In words, it gives the proportion of the variation (in the y-values) that is explained (by the variation in the x-values).

    In Example 5, the correlation coefficient is r = 0.839. Therefore the coefficient of determination is

    $$r^2 = (0.839)^2 = 0.704 \quad \text{(3 decimals)}$$

    This means that 70.4% of the variation in the variable y is explained by the variation in the variable x. Note that the coefficient of determination $r^2$ is between 0 and +1 inclusive.
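    The coefficient of determination can also be computed directly from the sums, without first rounding r. The sketch below assumes the five data pairs with x = 15, ..., 35 and y = 143, ..., 148 used in the PPMCC example above; the function name r_squared is illustrative.

```python
def r_squared(xs, ys):
    """Coefficient of determination: the square of Pearson's r."""
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    s_xy = sum(x * y for x, y in zip(xs, ys)) - sum_x * sum_y / n
    s_xx = sum(x * x for x in xs) - sum_x ** 2 / n
    s_yy = sum(y * y for y in ys) - sum_y ** 2 / n
    # r^2 = S_xy^2 / (S_xx * S_yy), which always lies between 0 and 1
    return s_xy ** 2 / (s_xx * s_yy)


xs = [15, 20, 25, 30, 35]
ys = [143, 141, 144, 149, 148]
r2 = r_squared(xs, ys)
print(f"r^2 = {r2:.3f}  ({r2:.1%} of the variation in y explained)")
```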

    Spearman's Rank Correlation Coefficient

    An alternative method of measuring correlation is by means of Spearman's rank correlation coefficient, obtained by the formula

    $$r = 1 - \frac{6\sum d^2}{n(n^2 - 1)}$$

    where d = difference between rankings and n = number of ranked pairs.

    Example 6

    Two members of an interview panel have ranked seven applicants in order of preference for a specified post. Calculate the degree of agreement between the two members.

    Applicant A B C D E F G

    Interviewer X 1 2 3 4 5 6 7

    Interviewer Y 4 3 1 2 5 7 6

    The differences in rankings are shown below.

    d     -3  -1   2   2   0  -1   1

    d²     9   1   4   4   0   1   1

    $$\sum d = 0, \qquad \sum d^2 = 20$$

    $$r = 1 - \frac{6\sum d^2}{n(n^2 - 1)} = 1 - \frac{6(20)}{7(49 - 1)} = 1 - \frac{120}{336} = 1 - 0.3571 = 0.6429$$

    Example 7

    The results of two tests taken by 10 employees are shown below (figures in %)

    Employee   A   B   C   D   E   F   G   H   I   J

    Test X     50  52  58  66  70  74  77  86  92  94

    Test Y     56  51  53  65  64  81  76  78  80  92

    Rank each employee in order of performance in the two tests and calculate the rank correlation coefficient.

    Ranking the employees in each test we have

    Employee   A   B   C   D   E   F   G   H   I   J

    Test X     10  9   8   7   6   5   4   3   2   1

    Test Y     8   10  9   6   7   2   5   4   3   1

    d          2   -1  -1  1   -1  3   -1  -1  -1  0

    d²         4   1   1   1   1   9   1   1   1   0

    $$\sum d = 0, \qquad \sum d^2 = 20$$

    $$r = 1 - \frac{6\sum d^2}{n(n^2 - 1)} = 1 - \frac{6(20)}{10(100 - 1)} = 1 - \frac{120}{990} = 1 - 0.1212 = 0.8788$$
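    The ranking and the rank correlation formula can be sketched in Python as follows. The function names are assumptions of this example, and the simple ranking helper assumes there are no tied scores (tied values would need averaged ranks, which this sketch does not handle). Rank 1 is given to the highest score, as in the worked example.

```python
def rank_descending(values):
    """Rank values so that the largest gets rank 1 (assumes no ties)."""
    ordered = sorted(values, reverse=True)
    return [ordered.index(v) + 1 for v in values]


def spearman_from_ranks(ranks_x, ranks_y):
    """Spearman's rank correlation: 1 - 6*sum(d^2) / (n*(n^2 - 1))."""
    n = len(ranks_x)
    sum_d2 = sum((p - q) ** 2 for p, q in zip(ranks_x, ranks_y))
    return 1 - 6 * sum_d2 / (n * (n * n - 1))


test_x = [50, 52, 58, 66, 70, 74, 77, 86, 92, 94]
test_y = [56, 51, 53, 65, 64, 81, 76, 78, 80, 92]
r_tests = spearman_from_ranks(rank_descending(test_x), rank_descending(test_y))
print(f"rank correlation for the two tests = {r_tests:.4f}")

# Rankings that are already given can be passed in directly,
# e.g. the two interviewers' rankings of seven applicants.
r_panel = spearman_from_ranks([1, 2, 3, 4, 5, 6, 7], [4, 3, 1, 2, 5, 7, 6])
print(f"rank correlation for the interview panel = {r_panel:.4f}")
```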

    Exercise 8

    1. Draw a scatter diagram of each of the sets of values given below, and calculate the PPMCC in each case.

    a) x   6   7   8   9   10

       y   3   6   9   12  15

    b) x   1   3   5   7   9   11

       y   8   7   6   5   4   3

    c) x   2   4   6   8   10  12  14

       y   12  8   8   14  9   7   13

    2. The following table gives the percentage unemployment figures for males and females in 9 regions. Draw a scatter diagram of these data and calculate the PPMCC.

                  Unemployment %
    Region        Male    Female
    Luapula       3.4     3.2
    Northern      3.5     3.8
    Eastern       4.5     4.6
    Central       4.4     3.8
    Lusaka        12.5    11.8
    Copperbelt    12.8    11.5
    N.Western     3.2     4.0
    Western       4.2     3.8
    Southern      4.8     3.5

    3. In a job evaluation exercise an assessor ranks eight jobs in order of increasing health risk. The same jobs have also been ranked in decreasing order on the basis of the number of applicants attracted per advertised post.

    Job A B C D E F G H

    Health 1 2 3 4 5 6 7 8

    Applicant 4 3 2 1 6 5 8 7

    Calculate the rank correlation coefficient for this information.

    4. The table below gives the Shorthand and Typing speeds (words/min) of a sample of seven secretaries.

    Secretary    1    2    3    4    5     6    7

    Typing       42   44   47   47   50    54   57

    Shorthand    97   84   95   96   107   98   117

    Calculate the degree of correlation between the two skills by:


    a) the PPMCC, and

    b) the rank correlation coefficient.

    5. On ten different days (picked at random) the following values were obtained for the price of a share for a particular company together with the index on that day.

    Share price (K)    260  250  350  200  150  100  115  120  135  145

    Index              115  135  140  120  105  110  106  165  175  115

    Calculate Spearman's rank correlation coefficient and indicate whether the index is a reasonable indicator for the price of the company's share.

    EXAMINATION QUESTIONS WITH ANSWERS

    Multiple Choice Questions

    1.1 If $\sum d^2 = 10$ and $n = 8$, the Spearman's rank correlation coefficient to 3 decimal places will be?

    A. 0.188 B. 0.841 C. 0.821 D. 0.881

    (Natech , 1.2. Mathematics & Statistics, December 2004)

    1.2 The prices of the following items are to be ranked prior to the calculation of Spearman's rank correlation coefficient. What is the rank of item G?

    Item E F G H I J K L

    Price 18 24 23 19 25

    A. 5 B. 4 C. 3 D. 2.5

    (Natech , 1.2. Mathematics & Statistics, December 2003)

    1.3

    8x x

    6x x

    4 x xx

    2 xx x

    0 4 8 12 16

    On the basis of the scatter diagram above, which of the following equations would best represent the regression line of Y on X?

    A. y = x8 B. y = x + 8 C. y = x + 8

    D. y = x 8

    (Natech , 1.2. Mathematics & Statistics, December 2003)

    1.4 An investigation is being carried out regarding the hypothesis that factor X is a cause of ailment Y. Which coefficient of correlation between X and Y gives most support to the hypothesis?

    A. -0.9 B. -0.2 C. +0.8 D. 0

    (Natech , 1.2. /B1Mathematics & Statistics, December 1999 (Rescheduled))

    1.5 If $\sum x = 216$, $\sum y = 555$, $\sum x^2 = 10436$, $\sum y^2 = 46075$, $\sum xy = 19635$ and n = 8, then the value of r, the coefficient of correlation to two decimal places, is

    A. 0.79 B. 0.62 C. 1.01 D. 1.02

    (Natech , 1.2. Mathematics & Statistics, December 2001)

    1.6 The Scatter diagram below shows

    A. High positive correlation B. Very high correlation

    C. Very high negative correlation D. Perfect correlation.

    (Natech , 1.2. Mathematics & Statistics, June 2005)

    1.7 Find the value of a in a regression equation if $b = 7$, $\sum x = 150$, $\sum y = 400$ and n = 10.

    A. 145 B. -65 C. 25 D. -650

    (Natech , 1.2. Mathematics & Statistics, June 2005)

    1.8 In regression analysis, the variable whose value is estimated is referred to as the:

    A. Simple variable B. Independent variable

    C. Linear variable D. Dependent variable

    1.9 The value of the coefficient of determination is interpreted as indicating

    A. The proportion of unexplained variance
    B. The proportion of explained variance
    C. The extent of causation
    D. The extent of relationship

    1.10 Of the following coefficients of correlation, the one that is indicative of the greatest extent of relationship between the independent and dependent variables is

    A. 0 B. +.20 C. 95. D. +.70

    SECTION B

    QUESTION ONE

    a) Derive the product moment correlation coefficient from the following data and comment on your results.

    Pupil                   A   B   C   D   E   F   G   H   I   J   K

    Mathematics marks, x    41  37  38  39  49  47  42  34  36  48  29

    Physics marks, y        36  20  31  24  37  35  42  26  27  29  23

    b) Find the estimated line, by the method of least squares, fitting the following results from a Physics experiment.

    Load, x (Newtons)       0.1  0.1  0.2  0.2  0.3  0.3  0.4  0.4  0.5  0.5

    Extension, y (mm)       18   11   25   22   35   50   54   45   52   68

    (Natech , 1.2. Mathematics & Statistics, June 2001)

    c) A company has the following data on its profit (y) and advertising expenditure (x) over the last six years.

    Profits (K million)    Advertising (K million)
    11.3                   0.52
    12.1                   0.61
    14.1                   0.63
    14.6                   0.70
    15.1                   0.70
    15.2                   0.75

    i) Use two (2) methods to justify your assumption that there is a relationship between the two variables.

    ii) Forecast the profits for next year if an advertising budget of K800 000 is allocated.

    (Natech , 1.2. Mathematics & Statistics, December 2003)


    QUESTION TWO

    a) In the context of regression analysis explain what is meant by the following terms.

    i) Regression coefficient
    ii) Explanatory variable.

    b) The following data show the monthly imports (I) of apples and average prices (P) over a twelve-month period.

    Monthly Imports (I) ('000 tonnes)    Average Monthly Prices (P) (K/tonne)
    100                                  232
    120                                  220
    125                                  218
    130                                  210
    128                                  210
    126                                  212
    120                                  217
    100                                  240
    90                                   242
    90                                   238
    95                                   230
    98                                   230

    i) Determine the regression equation of imports (I) of apples on the price (P) and use it to forecast monthly imports when the average monthly price is K250 per tonne.

    ii) If the correlation coefficient of the data is 0.95, interpret the results.

    (Natech , 1.2. Mathematics & Statistics, December 2004)

    QUESTION THREE

    Hungry Lion is a major food retailing company which has recently decided to open several new restaurants. In order to assist with the choice of siting these restaurants, the management of fast foods limited wished to investigate the effect of income on eating habits. As part of their report, a marketing agency produced the following table showing the percentage of annual income spent on food, y, for a given annual family income, x (K).

    x (K'000,000)    y
    18               62
    27               48
    36               37
    45               31
    54               27
    72               22
    90               18

    a) Plot, on separate scatter diagrams:

    i) y against x

    ii) $\log_{10} y$ against $\log_{10} x$, and comment on the relationship between income and percentage of family income spent on food.

    b) Use the method of least squares to fit the relationship $y = ax^b$ to the data. Estimate a and b.

    c) Estimate the percentage of annual income spent on food by a family with an annual income of K64,800,000.

    (Natech , 1.2. Mathematics & Statistics, December 2001)

    QUESTION FOUR

    a) Sales of product A between 0 and 4 years were as follows:

    Year    Units sold ('000s)
    2000    20
    2001    18
    2002    15
    2003    14
    2004    11

    Required:

    i) Calculate the correlation coefficient r.
    ii) Comment on the result in (i) above.
    iii) Calculate the coefficient of determination and comment.
    iv) Use a regression equation to estimate the sales in the year 2005.

    b) The table below shows the respective masses X and Y of a sample of 12 fathers and their oldest sons.

    Mass X of father (kg)    65  63  67  64  68  62  70  66  68  67  69  71

    Mass Y of son (kg)       68  66  68  65  69  66  68  65  71  67  68  70

    From the data given above:

    i) Construct a scatter diagram
    ii) Calculate the rank correlation coefficient using Spearman's method.

    (Natech , 1.2. Mathematics & Statistics, June 2005)

    c) Find the degree of correlation between the Bank of Zambia base lending rate and the dollar exchange rate taken over the past six months using:

    i) The product moment coefficient of correlation.
    ii) The coefficient of rank correlation.

    Month                               Jan    Feb    Mar    Apr    May    Jun

    Base % as on 1st of each month      14     14     13.5   12.5   12     12

    Average rate ($)                    1.90   1.91   1.86   1.84   1.84   1.83

    (Natech , 1.2. Mathematics & Statistics, Nov/Dec 2000)

    QUESTION FIVE

    a) The following table shows the number of units of a good produced and the total costs incurred.

    Units Produced     100     200     300     400     500     600     700

    Total Costs (K)    40 000  45 000  50 000  65 000  70 000  70 000  80 000

    Draw a scatter diagram.

    b) Find the appropriate least squares regression line so that the costs can be predicted from production levels, and estimate the total costs when production is 250 units.

    c) State the fixed costs of production.

    d) Calculate r and explain how much of the variation in the dependent variable is explained by the variation of the independent variable.

    (Natech , 1.2. Mathematics & Statistics, June 2002)

    QUESTION SIX

    a) A sample of eight employees is taken from the Production Department of an electronics factory. The data below relate to the number of weeks of experience in the soldering of components, and the number of components which were rejected as unsatisfactory last week.

    Employee                   A   B   C   D   E   F   G   H

    Weeks of experience (x)    4   5   7   9   10  11  12  14

    No. of rejections (y)      21  22  15  18  14  14  11  13

    i) Draw a scatter diagram of the data.

    ii) Calculate a coefficient of correlation for these data and interpret its value.

    iii) Find the least squares regression equation of rejects on experience. Predict the number of rejects you would expect from an employee with one week of experience.

    (Natech , 1.2. Mathematics & Statistics, December, 1999 Rescheduled))

    b) i) Distinguish between regression and correlation.

    ii) An experiment was conducted on 8 children to determine how a child's reading ability varied with his/her ability to write. The points awarded were as follows:

    Child      A  B  C  D  E  F  G  H

    Writing    7  8  4  0  2  6  9  5

    Reading    8  9  4  2  3  7  6  5

    Calculate the coefficient of rank correlation and interpret the results.

    (Natech , 1.2. Mathematics & Statistics, December, 2002)

    c) The mass of a growing animal is measured, in g, on the same day each week for eight weeks. The results are given below.

    Week x        1    2    3    4    5    6    7    8

    Mass (g) y    480  504  560  616  666  702  759  801

    i) Using 2 cm to represent 1 week on the x-axis and 2 cm to represent 100 g on the y-axis, plot a scatter diagram of mass y against week x.

    ii) Find the equation of the regression line of y on x.

    (Natech , 1.2. Mathematics & Statistics, December, 1998)

    QUESTION SEVEN

    a) The following table gives the cost price and number of faults per annum experienced with seven brands of video recorders.

    Video Recorders

    Brand    Price (K'000)    No. of Faults per Annum
    A        492              2
    B        458              6
    C        435              7
    D        460              4
    E        505              3
    F        439              5
    G        477              1

    i) Determine Spearman's rank correlation coefficient.

    ii) Interpret your answer in (i) above.

    (Natech , 1.2. Mathematics & Statistics, December,1998)

    b) The following table gives a set of ten pairs of observations of inspection costs per thousand articles produced, recorded on a number of occasions at several factories controlled by a single group and producing comparable products.

    Observation    Inspection costs per thousand articles    Number of defective articles per thousand
    1              0.25                                      50
    2              0.30                                      35
    3              0.15                                      60
    4              0.75                                      15
    5              0.40                                      46
    6              0.65                                      20

    a) Draw a scatter plot of y against x

    b) Calculate the coefficient of correlation and interpret its value.

    c) Find the least squares regression equation of the number of defectives on experience.

    d) Estimate the number of defectives in a box inspected by a worker with 6 weeks of experience.

    (Natech , 1.2. Mathematics & Statistics, December 1996)
