
PROJECT REPORT ON

REGRESSION ANALYSIS

[IMBA]

PRESENTED BY

NAME: D. SRIKANTH    ENROLL NO: 6NI14059


INTRODUCTION: Mr. Trevor Bull joined Tata AIG Life as Managing Director in January 2006. Prior to this, he was Senior Vice President and General Manager at American International Assurance in Korea.

Tata AIG Life Insurance Company Ltd. and Tata AIG General Insurance Company Ltd. (collectively "Tata AIG") are joint venture companies, formed from the Tata Group and American International Group, Inc. (AIG). Tata AIG combines the power and integrity of the Tata Group with AIG's international expertise and financial strength. Tata Group holds a 74 per cent stake in the two insurance ventures, with AIG holding the remaining 26 per cent stake.

Tata AIG Life Insurance Company Ltd. provides insurance solutions to individuals and corporates. Tata AIG Life Insurance Company was licensed to operate in India on February 12, 2001 and started operations in April 2001. Tata AIG Life offers a broad array of life insurance coverage to both individuals and


groups, providing various types of add-ons and options on basic life products to give consumers flexibility and choice.

Tata AIG Life Insurance Company offers products in Ahmedabad, Bangalore, Chandigarh, Chennai, Guwahati, Hyderabad, Jaipur, Jamshedpur, Jodhpur, Kochi, Kolkata, Mangalore, Mumbai, New Delhi, Pune, Rajkot, Trichy, Vijayawada and Lucknow.

Objective of the Study

The objective of this study is to demonstrate the regression analysis method using data from TATA AIG in the city of Hyderabad.

Questionnaire Development

For the purpose of this study, a structured questionnaire was developed. In this stage, an exploratory study was carried out using personal and focus group interviews.

Collection of Data

The above-mentioned questionnaire was used to collect the primary data. For secondary data, research papers, journals and magazines were referred to.


Regression analysis

In statistics, regression analysis is a collective name for techniques for the modeling and analysis of numerical data consisting of values of a dependent variable (also called response variable or measurement) and of one or more independent variables (also known as explanatory variables or predictors). The dependent variable in the regression equation is modeled as a function of the independent variables, corresponding parameters ("constants"), and an error term.

The error term is treated as a random variable. It represents unexplained variation in the dependent variable. The parameters are estimated so as to give a "best fit" of the data. Most commonly the best fit is evaluated by using the least squares method, but other criteria have also been used.

Regression can be used for prediction (including forecasting of time-series data), inference, hypothesis testing, and modeling of causal relationships. These uses of regression rely heavily on the underlying assumptions being satisfied. Regression analysis has been criticized as being misused for these purposes in many cases where the appropriate assumptions cannot be verified to hold. One factor contributing to the misuse of regression is that it can take considerably more skill to critique a model than to fit a model.


Underlying assumptions

Classical assumptions for regression analysis include:

The sample must be representative of the population for the inference prediction.

The error is assumed to be a random variable with a mean of zero conditional on the explanatory variables.

The independent variables are error-free. If this is not so, modeling may be done using errors-in-variables model techniques.

The predictors must be linearly independent, i.e. it must not be possible to express any predictor as a linear combination of the others. See Multicollinearity.

The errors are uncorrelated, that is, the variance-covariance matrix of the errors is diagonal and each non-zero element is the variance of the error.

The variance of the error is constant across observations (homoscedasticity). If not, weighted least squares or other methods might be used.

These are sufficient (but not all necessary) conditions for the least-squares estimator to possess desirable properties, in particular, these assumptions imply that the parameter estimates will be unbiased, consistent, and efficient in the class of linear unbiased estimators. Many of these assumptions may be relaxed in more advanced treatments.

Regression Analysis that involves two variables is termed bi-variate linear Regression Analysis. Regression Analysis that involves more than two variables is termed "Multiple Regression Analysis".


Bi-variate linear Regression Analysis involves analyzing the straight-line relationship between two continuous variables. The bi-variate linear regression can be expressed as:

Y = α + β X

Where,

Y represents the dependent variable

X represents the independent variable

α and β are two constants which are known as the regression coefficients.

β is the slope coefficient

β can be symbolically represented as ∆Y/∆X

α = Yi − βXi

β = (Yi − Yj) / (Xi − Xj)
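As a small numerical illustration of the two formulas above, the following Python sketch computes β from two hypothetical observations (the values here are made up for illustration, not taken from the study's data) and then recovers α:

```python
# Minimal sketch of the two-point formulas above (hypothetical values only).
xi, yi = 4.0, 10.0   # observation i
xj, yj = 8.0, 18.0   # observation j

beta = (yi - yj) / (xi - xj)   # slope: change in Y per unit change in X
alpha = yi - beta * xi         # intercept: alpha = Yi - beta * Xi

print(f"alpha = {alpha}, beta = {beta}")   # alpha = 2.0, beta = 2.0
```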

Least squares method

The method of least squares or ordinary least squares (OLS) is used to solve overdetermined systems. Least squares is often applied in statistical contexts, particularly regression analysis.

Least squares can be interpreted as a method of fitting data. The best fit in the least-squares sense is that instance of the model for which the sum of squared residuals has its least value, a residual being the difference between an observed value and the value given by the model. The method was first described by Carl Friedrich Gauss around 1794.[1] Least squares corresponds to the maximum likelihood criterion if the experimental errors have a normal distribution and can also be


derived as a method of moments estimator. Regression analysis is available in most statistical software packages.
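For example, a least-squares line can be fitted in a few lines of Python using NumPy's polynomial-fitting routine; this is only an illustrative sketch with made-up numbers, not the software or data used in this study:

```python
import numpy as np

# Small illustrative data set (hypothetical values).
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 8.1, 9.8])

# Fit Y = a + bX by ordinary least squares; np.polyfit returns the
# coefficients from the highest degree down, so the slope comes first.
b, a = np.polyfit(x, y, deg=1)

residuals = y - (a + b * x)   # observed minus fitted values
print(f"a = {a:.3f}, b = {b:.3f}")
print(f"sum of squared residuals = {np.sum(residuals ** 2):.4f}")
```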

Consider the relationship between the amount spent on advertisement per month and the number of customers who visited because of the advertisements given by TATA AIG Life Insurance Co.

The equation for the regression line assumed by least squares is shown below:

Y = a + bX + ei

Where,

Y is the dependent variable

X is the independent variable

a is the Y intercept

b is the slope of the line

The table below shows the amount spent on advertisement and the number of customers visited through advertisement.

MONTH    AMOUNT SPENT ON ADVERTISING (IN CRORES) [X]    NO. OF CUSTOMERS VISITED (IN 000'S) [Y]
JAN      3.6                                            9.3
FEB      4.8                                            10.2
MAR      2.4                                            9.7
APR      7.2                                            11.5
MAY      6.9                                            12
JUN      8.4                                            14.2
JUL      10.7                                           18.6
AUG      11.2                                           28.4
SEP      6.1                                            13.2
OCT      7.9                                            10.8
NOV      9.5                                            22.7
DEC      5.4                                            12.3

The constant b can be calculated using the formula:

b = [n Σ(XY) − ΣX ΣY] / [n Σ(X²) − (ΣX)²]

Where,

X is the independent variable

Y is the dependent variable

n is the number of observations

a is calculated as shown below:

a = Ῡ − bX̄

Where,

Ῡ = the mean value of the dependent variable

X̄ = the mean value of the independent variable

ei is the error, also called the residual.

The criterion for the least squares method is to minimize

Σ ei²  (summed over i = 1 to n)

Where ei = Yi − Ŷi

Yi is the actual value of the dependent variable

Ŷi is the value lying on the estimated regression line.


Let us solve the example previously discussed using the least squares method.

We need to determine the constants a and b to develop the regression equation. The required calculations for determining the constants are shown in the table below.

X (IN CRORES)    Y (IN 000'S)    XY           X²
3.6              9.3             33.48        12.96
4.8              10.2            48.96        23.04
2.4              9.7             23.28        5.76
7.2              11.5            82.8         51.84
6.9              12              82.8         47.61
8.4              14.2            119.28       70.56
10.7             18.6            199.02       114.49
11.2             28.4            318.08       125.44
6.1              13.2            80.52        37.21
7.9              10.8            85.32        62.41
9.5              22.7            215.65       90.25
5.4              12.3            66.42        29.16
ΣX = 84.1        ΣY = 172.9      ΣXY = 1355.61   ΣX² = 670.73

b = [12(1355.61) − (84.1)(172.9)] / [12(670.73) − (84.1)²]

= 1.768

The next step is to calculate a. To calculate the value of a, we first need to determine the means of the variables X and Y.

X̄ = 84.1/12 = 7.0    Ῡ = 172.9/12 = 14.40

Substituting these values in the equation:

a = 14.40 − (1.768)(84.1/12) = 14.40 − 12.39 = 2.01
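As a quick cross-check of the arithmetic above, the following Python sketch (an illustrative check only, not part of the original calculation) recomputes b and a directly from the twelve monthly figures in the table:

```python
# Monthly advertising spend in crores (X) and customers visited in 000's (Y),
# taken from the table above.
X = [3.6, 4.8, 2.4, 7.2, 6.9, 8.4, 10.7, 11.2, 6.1, 7.9, 9.5, 5.4]
Y = [9.3, 10.2, 9.7, 11.5, 12, 14.2, 18.6, 28.4, 13.2, 10.8, 22.7, 12.3]

n = len(X)
sum_x, sum_y = sum(X), sum(Y)
sum_xy = sum(x * y for x, y in zip(X, Y))
sum_x2 = sum(x * x for x in X)

# b = [n*Sum(XY) - Sum(X)*Sum(Y)] / [n*Sum(X^2) - (Sum(X))^2]
b = (n * sum_xy - sum_x * sum_y) / (n * sum_x2 - sum_x ** 2)
# a = mean(Y) - b * mean(X)
a = sum_y / n - b * (sum_x / n)

print(f"b = {b:.3f}")   # about 1.769, matching the 1.768 above up to rounding
print(f"a = {a:.2f}")   # about 2.01
```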

We now develop the estimated regression equation by substituting the values of a and b in the equation:

Ŷ = 2.01+1.768X

Ŷ represents the estimated value of the dependent variable for a given value of X.

The Strength of Association – R²

R² can be calculated using the following formula:


R² = explained variance / total variance

Total variance = explained variance + unexplained variance

Explained variance = total variance − unexplained variance

Therefore,

R² = (total variance − unexplained variance) / total variance

R² = 1 − unexplained variance / total variance

The unexplained variance is given by Σ(Yi − Ŷi)²

The total variance is given by Σ(Yi − Ῡ)²

R² = 1 − Σ(Yi − Ŷi)² / Σ(Yi − Ῡ)²

X       Y       XY        X²       Ŷ         Y-Ŷ       (Y-Ŷ)²         (Ŷ-Ῡ)²       (Y-Ῡ)²
3.6     9.3     33.48     12.96    8.3748    0.9252    0.85599504     36.30304     26.01
4.8     10.2    48.96     23.04    10.4964   -0.2964   0.08785296     15.23809     17.64
2.4     9.7     23.28     5.76     6.2532    3.4468    11.88043024    66.37035     22.09
7.2     11.5    82.8      51.84    14.7396   -3.2396   10.49500816    0.115328     8.41
6.9     12      82.8      47.61    14.2092   -2.2092   4.88056464     0.036405     5.76
8.4     14.2    119.28    70.56    16.8612   -2.6612   7.08198544     6.057505     0.04
10.7    18.6    199.02    114.49   20.9276   -2.3276   5.41772176     42.60956     17.64
11.2    28.4    318.08    125.44   21.8116   6.5884    43.40701456    54.93181     196
6.1     13.2    80.52     37.21    12.7948   0.4052    0.16418704     2.576667     1.44
7.9     10.8    85.32     62.41    15.9772   -5.1772   26.80339984    2.48756      12.96
9.5     22.7    215.65    90.25    18.806    3.894     15.163236      19.41284     68.89
5.4     12.3    66.42     29.16    11.514    0.786     0.617796       8.328996     4.41

ΣX = 84.1 (X̄ = 7.0)    ΣY = 172.9 (Ῡ = 14.40)    ΣXY = 1355.61    ΣX² = 670.73

Σ(Y-Ŷ)² = 126.855    Σ(Ŷ-Ῡ)² = 254.4682    Σ(Y-Ῡ)² = 381.29

Therefore,

R² = 1 − Σ(Yi − Ŷi)² / Σ(Yi − Ῡ)²

= 1 − 126.855/381.29

= 1 − 0.33 = 0.67

= 67%
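The same figure can be reproduced with a short Python sketch (again only an illustrative check), using the data from the table and the fitted line Ŷ = 2.01 + 1.768X obtained above:

```python
# Data from the table above.
X = [3.6, 4.8, 2.4, 7.2, 6.9, 8.4, 10.7, 11.2, 6.1, 7.9, 9.5, 5.4]
Y = [9.3, 10.2, 9.7, 11.5, 12, 14.2, 18.6, 28.4, 13.2, 10.8, 22.7, 12.3]

a, b = 2.01, 1.768                     # intercept and slope estimated above
y_mean = sum(Y) / len(Y)

y_hat = [a + b * x for x in X]         # fitted values on the regression line
ss_unexplained = sum((y - yh) ** 2 for y, yh in zip(Y, y_hat))   # sum of (Y - Yhat)^2
ss_total = sum((y - y_mean) ** 2 for y in Y)                     # sum of (Y - Ymean)^2

r_squared = 1 - ss_unexplained / ss_total
print(f"R^2 = {r_squared:.2f}")        # about 0.67, i.e. roughly 67%
```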

Conclusion


This implies that of the total variation of Y, nearly 67% is explained by the variation in X.

Hence there is a strong linear relationship between the two variables.