Multivariate Ana


  • 7/29/2019 Multivariate Ana


    MULTIVARIATE ANALYSIS

    Statistical techniques that simultaneously analyze more than two variables.

    Multivariate techniques fall into two categories:

    1. Dependency techniques deal with one or more dependent variables.
       One dependent variable, metric data: Multiple Regression Analysis

       One dependent variable, nonmetric (categorical) data: Discriminant Analysis

    2. Interdependency techniques involve more than two variables.
       Variables are not segregated into dependent and independent variables.
       Interrelationships between the variables are analyzed.
       Metric data: Factor Analysis, Cluster Analysis


    MULTIVARIATE ANALYSIS

    Multiple Regression
    Used to analyse quantitative data.
    Used to study the cause-and-effect relationship between a single dependent variable and two or more independent variables.
    Used mainly for prediction/forecasting.


    Multiple Regression Analysis

    The general multiple regression with k independent variables is given by:

    $$Y' = a + b_1 X_1 + b_2 X_2 + \dots + b_k X_k$$

    X1 to Xk are the independent variables.
    a is the Y-intercept.
    bj is called a partial regression coefficient. It is the net change in Y for each unit change in Xj, holding all other variables constant (where j = 1 to k).

    Greek letters are used for a (α) and b (β) when denoting population parameters.
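The meaning of a partial regression coefficient can be illustrated with a minimal sketch. The coefficient values below are hypothetical and chosen only for illustration; `predict` is an illustrative helper name, not something from the slides.

```python
def predict(a, b, x):
    """Return Y' = a + b1*X1 + ... + bk*Xk for one observation."""
    if len(b) != len(x):
        raise ValueError("need one coefficient per independent variable")
    return a + sum(bj * xj for bj, xj in zip(b, x))


# Hypothetical intercept and partial regression coefficients (k = 3).
a = 10.0
b = [2.0, -3.0, 0.5]

base = predict(a, b, [4.0, 1.0, 8.0])
# Raise X2 by one unit while holding X1 and X3 constant.
bumped = predict(a, b, [4.0, 2.0, 8.0])
net_change = bumped - base  # equals b2 for a one-unit change in X2

print(base, bumped, net_change)
```

The net change equals b2 regardless of the values at which the other variables are held, which is what "holding all other variables constant" means for a linear equation.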


    ASSUMPTIONS IN MULTIPLE REGRESSION

    The independent variables and the dependent variable have a linear relationship.

    The dependent variable must be continuous and at least interval-scaled.

    The variation in (Y - Y'), the residual, must be the same for all values of Y. When this is the case, we say the differences exhibit homoscedasticity.

    The residuals should follow a normal distribution with mean 0.

    Successive values of the dependent variable must be uncorrelated, that is, not autocorrelated.
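Two of these assumptions can be checked numerically once residuals are available. The sketch below is only one possible check: the slides do not name a diagnostic, so the Durbin-Watson statistic is an assumption introduced here, and the residual values are hypothetical.

```python
def mean(values):
    """Arithmetic mean, used to check that residuals centre on 0."""
    return sum(values) / len(values)


def durbin_watson(residuals):
    """Sum of squared successive differences of the residuals,
    divided by the sum of squared residuals."""
    num = sum((residuals[t] - residuals[t - 1]) ** 2
              for t in range(1, len(residuals)))
    den = sum(e ** 2 for e in residuals)
    return num / den


# Hypothetical residuals (Y - Y'), listed in observation order.
residuals = [1.0, -1.0, 1.0, -1.0]

residual_mean = mean(residuals)
dw = durbin_watson(residuals)
print(residual_mean, dw)
```

The statistic ranges from 0 to 4, and values near 2 suggest little first-order autocorrelation. The alternating residuals used here give a value well above 2, which points toward negative autocorrelation.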


    Correlation Matrix

    A correlation matrix is used to show all possible simple correlation coefficients among the variables.
    The matrix is useful for locating correlated independent variables.
    It shows how strongly each independent variable is correlated with the dependent variable.

    Correlation
    Coefficients    Cars     Advertising    Sales force
    Cars            1.000
    Advertising     0.808    1.000
    Sales force     0.872    0.537          1.000


    Multiple Regression Analysis

    The least squares criterion is used to develop this equation.

    Because determining b1, b2, etc. is very tedious, a software package such as Excel or MINITAB may be used.


    ANOVA TABLE

    Source        df         SS                                     MS
    Regression    k          SSR = $\sum (Y' - \bar{Y})^2$          SSR/k
    Error         n-k-1      SSE = $\sum (Y - Y')^2$                SSE/(n-k-1)
    Total         n-1        SS Total = $\sum (Y - \bar{Y})^2$

    SS Total is the total variation.
    SSR is the explained variation: the variation accounted for by the set of independent variables.
    SSE is the unexplained or random variation: the variation not accounted for by the independent variables.
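As a rough sketch, the ANOVA quantities can be computed from observed values Y and fitted values Y'. It assumes the usual degrees of freedom for k independent variables and n observations (k, n-k-1, and n-1), which agree with the 3, 8, and 11 reported in the MINITAB output for Example 1. The data below are hypothetical, and `anova_table` is an illustrative name.

```python
def anova_table(y, y_hat, k):
    """Split total variation into explained (SSR) and unexplained (SSE) parts."""
    n = len(y)
    y_bar = sum(y) / n
    ssr = sum((f - y_bar) ** 2 for f in y_hat)           # explained variation
    sse = sum((yi - f) ** 2 for yi, f in zip(y, y_hat))  # unexplained variation
    ss_total = sum((yi - y_bar) ** 2 for yi in y)        # total variation
    msr = ssr / k
    mse = sse / (n - k - 1)
    return {
        "regression": {"df": k, "SS": ssr, "MS": msr},
        "error": {"df": n - k - 1, "SS": sse, "MS": mse},
        "total": {"df": n - 1, "SS": ss_total},
        "F": msr / mse,
    }


# Hypothetical data: Y observed at X = 1..4, with the least squares
# line Y' = 0.5 + 1.6 X for those four points (k = 1).
y = [2.0, 4.0, 5.0, 7.0]
y_hat = [0.5 + 1.6 * x for x in (1, 2, 3, 4)]

table = anova_table(y, y_hat, k=1)
for source in ("regression", "error", "total"):
    print(source, table[source])
print("F", table["F"])
```

Because the fitted values come from a least squares fit with an intercept, SSR and SSE add up to SS Total, apart from floating-point rounding.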


    EXAMPLE 1

    A market researcher for Super Markets is studying the yearly amount families of four or more spend on food. Three independent variables are thought to be related to yearly food expenditures (Food). Those variables are: total family income (Income) in hundreds of dollars ($00), size of family (Size), and whether the family has children in college (College).


    The variable College is called a dummy or indicator variable. It can take only one of two possible outcomes, i.e. a child is a college student or not.

    Examples of dummy variables: gender, the part is acceptable or not, the voter will or will not vote for the incumbent governor, etc.

    We usually code one value of the dummy variable as 1 and the other as 0.

    Expenditure = a + b1(Income) + b2(Size) + b3(College)
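A small sketch of dummy coding may help. The survey answers and coefficient values are hypothetical, and `encode_dummy` and `expenditure` are illustrative names. The point is that, with 0/1 coding, b3 is the gap between the two groups when income and size are equal.

```python
def encode_dummy(values, one_label):
    """Code a two-outcome variable as 1 for `one_label` and 0 otherwise."""
    return [1 if v == one_label else 0 for v in values]


def expenditure(a, b1, b2, b3, income, size, college):
    """Expenditure = a + b1(Income) + b2(Size) + b3(College)."""
    return a + b1 * income + b2 * size + b3 * college


# Hypothetical answers to "Does the family have a child in college?"
answers = ["no", "yes", "no", "yes"]
college = encode_dummy(answers, one_label="yes")

# Hypothetical coefficients; only the difference between the groups matters here.
a, b1, b2, b3 = 100.0, 2.0, 50.0, 30.0
gap = (expenditure(a, b1, b2, b3, 400, 4, 1)
       - expenditure(a, b1, b2, b3, 400, 4, 0))

print(college, gap)
```

Which outcome is coded 1 is a free choice. Swapping the coding flips the sign of b3 and shifts the intercept, but the fitted values stay the same.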


    Example 1 continued

    Family   Food   Income   Size   Student
    1        3900   376      4      0
    2        5300   515      5      1
    3        4300   516      4      0
    4        4900   468      5      0
    5        6400   538      6      1
    6        7300   626      7      1
    7        4900   543      5      0
    8        5300   437      4      0
    9        6100   608      5      1
    10       6400   513      6      1
    11       7400   493      6      1
    12       5800   563      5      0


    Example 1 continued

    From the analysis provided by MINITAB, the estimated multiple regression equation is:

    Y' = 954 + 1.09X1 + 748X2 + 565X3

    Food Expenditure = 954 + 1.09*income + 748*size + 565*college

    What food expenditure would you estimate for a family of 4, with no college student, and an income of $50,000 (which is input as 500)?
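For readers without MINITAB, the fit can be approximated in plain Python. This is a sketch, not the package's algorithm: it solves the normal equations by Gaussian elimination, which is an assumption made here for self-containment. The data are the twelve families from the Example 1 table, and the helper names are illustrative.

```python
# (Food, Income in $00, Size, Student) for the 12 families in Example 1.
DATA = [
    (3900, 376, 4, 0), (5300, 515, 5, 1), (4300, 516, 4, 0),
    (4900, 468, 5, 0), (6400, 538, 6, 1), (7300, 626, 7, 1),
    (4900, 543, 5, 0), (5300, 437, 4, 0), (6100, 608, 5, 1),
    (6400, 513, 6, 1), (7400, 493, 6, 1), (5800, 563, 5, 0),
]


def solve(matrix, rhs):
    """Solve a square linear system by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    aug = [row[:] + [r] for row, r in zip(matrix, rhs)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = sum(aug[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (aug[i][n] - s) / aug[i][i]
    return x


def fit_least_squares(rows):
    """Return [a, b1, ..., bk] that minimize the sum of squared residuals."""
    X = [[1.0] + [float(v) for v in row[1:]] for row in rows]  # leading 1 = intercept
    y = [float(row[0]) for row in rows]
    p = len(X[0])
    xtx = [[sum(r[i] * r[j] for r in X) for j in range(p)] for i in range(p)]
    xty = [sum(r[i] * yi for r, yi in zip(X, y)) for i in range(p)]
    return solve(xtx, xty)


a, b_income, b_size, b_student = fit_least_squares(DATA)
print(f"Food = {a:.0f} + {b_income:.3f} Income + {b_size:.1f} Size + {b_student:.1f} Student")
```

The coefficients should land close to the MINITAB values. For larger or badly conditioned problems a library routine based on a QR decomposition is generally preferable to solving the normal equations directly.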


    Example 1 continued

    Food Expend. = $954 + $1.09*income + $748*size + $565*college

    Each additional $100 of income per year (one unit of Income) will increase the amount spent on food by $1.09 per year.
    An additional family member will increase the amount spent per year on food by $748.
    A family with a college student will spend $565 more per year on food than those without a college student.

    So a family of 4, with no college students, and an income of $50,000 will spend an estimated $4,491.

    Food Expend. = $954 + $1.09*500 + $748*4 + $565*0 = $4,491
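The arithmetic can be checked with a few lines of Python. The function name is illustrative. Income is entered in hundreds of dollars, so one extra unit of income means an extra $100.

```python
def food_expenditure(income_hundreds, size, college):
    """Estimated yearly food spending from the rounded MINITAB coefficients."""
    return 954 + 1.09 * income_hundreds + 748 * size + 565 * college


# Family of 4, no college student, income of $50,000 (entered as 500).
estimate = food_expenditure(500, 4, 0)

# Effect of each variable, changing one at a time.
per_100_income = food_expenditure(501, 4, 0) - estimate
per_member = food_expenditure(500, 5, 0) - estimate
with_student = food_expenditure(500, 4, 1) - estimate

print(round(estimate), round(per_100_income, 2), round(per_member), round(with_student))
```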


    Example 1 continued

    The regression equation is
    Food = 954 + 1.09 Income + 748 Size + 565 Student

    Predictor    Coef     SE Coef    T       P
    Constant     954      1581       0.60    0.563
    Income       1.092    3.153      0.35    0.738
    Size         748.4    303.0      2.47    0.039
    Student      564.5    495.1      1.14    0.287

    S = 572.7    R-Sq = 80.4%    R-Sq(adj) = 73.1%

    Analysis of Variance
    Source            DF    SS          MS         F        P
    Regression        3     10762903    3587634    10.94    0.003
    Residual Error    8     2623764     327970
    Total             11    13386667
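The summary statistics in this output follow from the sums of squares and degrees of freedom. The sketch below recomputes them from the printed values using the standard definitions. The variable names are illustrative.

```python
import math

n, k = 12, 3                      # observations and independent variables
ss_regression = 10762903          # explained variation (SSR)
ss_error = 2623764                # unexplained variation (SSE)
ss_total = ss_regression + ss_error

ms_regression = ss_regression / k
ms_error = ss_error / (n - k - 1)

f_stat = ms_regression / ms_error              # F in the ANOVA table
s = math.sqrt(ms_error)                        # standard error of estimate, S
r_sq = ss_regression / ss_total                # R-Sq
r_sq_adj = 1 - ms_error / (ss_total / (n - 1)) # R-Sq(adj)

print(f"F = {f_stat:.2f}, S = {s:.1f}, "
      f"R-Sq = {100 * r_sq:.1f}%, R-Sq(adj) = {100 * r_sq_adj:.1f}%")
```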


    Correlation matrix

    The coefficient of determination is 80.4 percent. This means that more than 80 percent of the variation in the amount spent on food is accounted for by the variables income, family size, and student.

    The strongest correlation between the dependent variable and an independent variable is between family size and amount spent on food.

               Food     Income   Size     College
    Food       1.000
    Income     0.587    1.000
    Size       0.876    0.609    1.000
    College    0.773    0.491    0.743    1.000
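The correlation matrix can be reproduced from the twelve observations in the Example 1 data table. A minimal sketch using the Pearson correlation coefficient follows. The helper name `pearson_r` is illustrative.

```python
import math

# Columns of the Example 1 data table (Income in $00).
FOOD = [3900, 5300, 4300, 4900, 6400, 7300, 4900, 5300, 6100, 6400, 7400, 5800]
INCOME = [376, 515, 516, 468, 538, 626, 543, 437, 608, 513, 493, 563]
SIZE = [4, 5, 4, 5, 6, 7, 5, 4, 5, 6, 6, 5]
COLLEGE = [0, 1, 0, 0, 1, 1, 0, 0, 1, 1, 1, 0]


def pearson_r(x, y):
    """Simple (Pearson) correlation coefficient between two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


variables = {"Food": FOOD, "Income": INCOME, "Size": SIZE, "College": COLLEGE}
names = list(variables)

# Print the lower triangle, matching the layout of the slide.
for i, row in enumerate(names):
    cells = [f"{pearson_r(variables[row], variables[col]):.3f}"
             for col in names[: i + 1]]
    print(f"{row:<8}" + "  ".join(cells))
```

Besides the correlations with Food, the matrix shows a fairly strong correlation between the independent variables Size and College, which is the kind of overlap the matrix is meant to expose.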


    Example 1 continued

    Conduct an individual test to determine which coefficients are not zero. These are the hypotheses for the independent variable family size:

    $$H_0: \beta_2 = 0 \qquad H_1: \beta_2 \neq 0$$

    From the MINITAB output, the only significant variable is Size (family size), using the p-values. The other variables can be omitted from the model.
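The individual tests can be read off the MINITAB coefficient table. In the sketch below, each T value is recomputed as the coefficient divided by its standard error, and each printed p-value is compared with a 5% level of significance. The p-values are copied from the output rather than recomputed, and the names are illustrative.

```python
ALPHA = 0.05  # 5% level of significance

# Predictor: (Coef, SE Coef, P) as printed in the MINITAB output.
OUTPUT = {
    "Constant": (954.0, 1581.0, 0.563),
    "Income": (1.092, 3.153, 0.738),
    "Size": (748.4, 303.0, 0.039),
    "Student": (564.5, 495.1, 0.287),
}

# T is the coefficient divided by its standard error.
t_ratios = {name: coef / se for name, (coef, se, _) in OUTPUT.items()}

# A coefficient is significant when its p-value is below ALPHA.
significant = [name for name, (_, _, p) in OUTPUT.items()
               if name != "Constant" and p < ALPHA]

for name, t in t_ratios.items():
    print(f"{name:<9} T = {t:.2f}  P = {OUTPUT[name][2]:.3f}")
print("Significant at 5%:", significant)
```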

    Thus, using the 5% level of significance, reject H0 if the p-value