
A Multiple Regression Analysis on Financial Beta Coefficients in the Oil and Natural Gas Industry

William Lamberti


Introduction

The goal of this report was to find a model that best explains beta values for the oil and natural gas industry. We specifically looked at all active stocks on the New York Stock Exchange with SIC codes 4923, 4924, 4925, 4931, 4932, and the 2900s. We also wanted to identify which variables are the most important in predicting the beta value. In particular, we wished to know whether older companies had safer beta values than newer firms. The data was collected from www.mergentonline.com. The beta values were collected on the same day from Yahoo Finance, as Mergent does not have this information readily available. The general approach was a least squares multiple regression fit of the data. It should be noted that β denotes the financial beta statistic of interest and not a multiple regression parameter coefficient.

It was important to understand what factors might help explain a given beta value for a stock because betas are heavily relied upon in the financial industry. Firms and individuals depend on this statistic in one form or another. In particular, individual investors rely on it when building portfolios suited to their own needs and desires. Therefore, it was valuable to understand which variables can help to explain a given beta value.

Analysis of the beta statistic has been active since its introduction by Sharpe and Lintner.1 2 They defined the beta statistic to be

\[
\beta = \frac{\operatorname{Cov}(s_a, s_b)}{\operatorname{Var}(s_b)}
\]

where \(s_a\) and \(s_b\) denote the two stocks or markets of interest. The goal of the beta statistic is to measure the risk of an investment. A beta of exactly 1 indicates perfect alignment with the market's volatility. A beta below 1 indicates a stock that is less volatile than the market, while a beta above 1 indicates a stock that is more volatile than the market. These indicate safer and riskier stocks respectively. An investor could also compare two stocks directly to obtain a relative risk estimate. However, in modern practice, and for our purposes, beta is calculated by comparing a stock to the market. Betas are also used to help diversify portfolios so that investors can achieve the optimal investment options given a family of indifference curves.3

1 Lintner, John. "The Valuation of Risk Assets and the Selection of Risky Investments in Stock Portfolios and Capital Budgets." The Review of Economics and Statistics 47, no. 1 (1965): 13. Accessed May 23, 2015.
2 Sharpe, William F. "Capital Asset Prices: A Theory of Market Equilibrium under Conditions of Risk." The Journal of Finance XIX, no. 3 (1964): 425. Accessed May 23, 2015.
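The definition above can be illustrated with a minimal sketch. The return series below are hypothetical and are not data from this study; the function name `beta_coefficient` is likewise our own. The sketch only shows that a series moving 1.5 times as much as the market has a beta of about 1.5, and that the market measured against itself has a beta of 1.

```python
from statistics import mean

def beta_coefficient(stock_returns, market_returns):
    """Return sample Cov(stock, market) / sample Var(market)."""
    if len(stock_returns) != len(market_returns) or len(stock_returns) < 2:
        raise ValueError("need two return series of equal length (>= 2)")
    s_bar, m_bar = mean(stock_returns), mean(market_returns)
    # The (n - 1) divisors of the covariance and variance cancel in the ratio.
    cov = sum((s - s_bar) * (m - m_bar)
              for s, m in zip(stock_returns, market_returns))
    var = sum((m - m_bar) ** 2 for m in market_returns)
    if var == 0:
        raise ValueError("market returns have zero variance")
    return cov / var

# Hypothetical monthly returns, for illustration only (not data from the study).
market = [0.010, -0.020, 0.015, 0.030, -0.010, 0.005]
amplified = [1.5 * r for r in market]   # moves 1.5 times as much as the market
damped = [0.5 * r for r in market]      # moves half as much as the market

print(round(beta_coefficient(amplified, market), 6))  # 1.5
print(round(beta_coefficient(damped, market), 6))     # 0.5
print(round(beta_coefficient(market, market), 6))     # 1.0
```

In this toy setting the "riskier" series has a beta above 1 and the "safer" series a beta below 1, matching the interpretation given in the text.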

One of the first analyses of betas was done by Ronald W. Melicher. He performed a factor analysis and multiple regression on firms operating in the electric utility industry.4 He chose this sector because of the “implied homogeneity in terms of regulation and business risk”.5 He was motivated in particular by how few studies had been done on beta values up to that point. Additionally, some of the earlier regression models suffered severely from multicollinearity. Melicher used factor analysis to find variables on which to build his regression model. At the time, this was a helpful way to build a model, as computers were not nearly as efficient and powerful as they are today. However, factor analysis is not a preferable technique, as it can give biased results: it is built upon the assumption that latent variables will be found even if those variables do not exist in practical applications. Nonetheless, Melicher performed factor analysis and found that financial leverage, size, earnings trend and stability, operating efficiency, financing policy, return on investment, and market activity were able to explain about 85% of the variation in the data. He then built a few models utilizing multiple and stepwise regression techniques. The \(R^2\) values for the models ranged from about 33% to 41%.

3 Ibid.
4 Melicher, Ronald W. "Financial Factors Which Influence Beta Variations within an Homogeneous Industry Environment." The Journal of Financial and Quantitative Analysis, 1974, 231. Accessed May 25, 2015.
5 Ibid.

More recent studies, however, focus on whether the beta statistic should be replaced or altered. Furthermore, it was difficult to find easily accessible analyses exploring which variables help to explain beta values. Still, some time series analyses have been done on the nature of the beta value. For instance, Chang and Weiss found that beta acts more like a random variable than a fixed parameter value.6

For our analysis, the dependent variable was the beta value for the stock. The independent variables initially were the number of full time employees in 2013, the year that the company was incorporated, the long term debt, the 2013 revenue, the market cap, the earnings per share (EPS), the dividend per share (DPS), and the PE ratio. The general approach was a least squares multiple regression fit of the data. Our analysis differs from others in that it looks specifically at the variables that help to explain the value of the beta statistic. Furthermore, it covers a sector of the economy that is crucial to its daily operation. Without energy to keep businesses and individuals running, the economy would severely suffer. Therefore, it was desirable to understand what factors explain the beta statistics in this sector.

Methods:

The general approach entailed a least squares multiple regression fit to the data. We used SAS to perform the heavier computational analyses, and Excel and R for more elementary calculations. We first checked the correlations among the independent variables and their variance inflation factors (VIFs) to check for multicollinearity. This led to the removal of the market cap variable. None of the other variables needed to be removed, as the remaining variance inflation factors were all below 10. With the remaining regressors, multiple regression was performed using SAS Studio 3.2.

The analysis followed a careful procedure. The data was collected from Mergent's database and Yahoo Finance. Yahoo Finance calculated beta by comparing the monthly price change of a stock to the monthly price change of the S&P 500 over the past 3 years.7 Of the 159 records in Mergent's database, 64 were active. The final number of observations used in the analysis was 53, as 11 had missing values and could not be used. Next, a first order multiple regression model was proposed. This model had 7 regressors, but it was expected to be refined upon further analysis. Parameters were estimated throughout our analysis by least squares regression under the assumptions of the Gauss-Markov theorem. The errors were assumed to be normally distributed to allow for estimation of the variance and standard deviations. Following that, the F value and t values were considered for further analysis. The normality of the error distribution was then checked by residual visualization. This was conducted for every proposed model. Lastly, for the best model, the overall regression model and confidence intervals on parameter estimates were derived using the assumption of normally distributed errors.

6 Chang, Wei-Chien, and Donald E. Weiss. "An Examination of the Time Series Properties of Beta in the Market Model." Journal of the American Statistical Association, 1991, 883. Accessed May 25, 2015.

The first order model was refined by considering interactions. When all interactions were included at once, the model did not explain more of the variation in the data. Backward, forward, and stepwise selection methods likewise produced models that explained less of the variation in the data or that were not valid. We then checked each interaction individually and included those that helped to explain more variation in the data. Additionally, we believed that two higher order terms were needed. Therefore, the final model had 7 first order terms, 3 interactions, and 2 higher order terms. Analysis of residuals and confidence intervals provided additional support for the final model. We then concluded with an analysis of the missing observations.

7 "Key Statistics Definitions." Yahoo Finance. Accessed May 26, 2015. https://help.yahoo.com/kb/finance/SLN2347.html?impressions=true.

Results:

The variance inflation factor (VIF) was used to check for multicollinearity. The VIF identifies independent variables that are highly associated with one another. A VIF above 10 was taken to indicate a severe multicollinearity problem. The first VIF analysis is shown in Table 1. Since 2 variables, Revenue and Market Cap, had VIF statistics above 10, we removed the one with the highest VIF, Market Cap. After removing this variable, the VIF analysis was run again. At this point, all of the VIF statistics were below 10, so the rest of the variables were retained. The results are provided in Table 2. The first order model fit with the remaining regressors, Model I, is given after the tables.

Table 1: Initial VIF Analysis

Variable          VIF
Intercept         0
num_full_temp     9.08873
year_started      1.17284
long_term_debt    2.22526
Revenue           14.71564
Market_Cap        17.36815
EPS               3.68571
DPS               1.45508
PE                1.91887

Table 2: Final VIF Analysis

Variable          VIF
Intercept         0
num_full_temp     7.02489
year_started      1.15978
long_term_debt    2.07018
Revenue           5.92337
EPS               3.13654
DPS               1.36319
PE                1.90276
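The VIF screening described above can be sketched in a few lines. This is not the SAS output used in the study: the columns `x1`, `x2`, and `x3` are hypothetical, and the helper names are our own. The sketch relies on the standard identity that the VIF of each regressor is the corresponding diagonal entry of the inverse of the regressors' correlation matrix, and it applies the same threshold of 10 used in the text.

```python
from statistics import mean

def center(column):
    """Subtract the column mean from every value."""
    m = mean(column)
    return [v - m for v in column]

def correlation_matrix(columns):
    """Pearson correlation matrix of equal-length regressor columns."""
    centered = [center(col) for col in columns]
    norms = [sum(v * v for v in col) ** 0.5 for col in centered]
    return [[sum(a * b for a, b in zip(ci, cj)) / (ni * nj)
             for cj, nj in zip(centered, norms)]
            for ci, ni in zip(centered, norms)]

def invert(matrix):
    """Invert a square matrix by Gauss-Jordan elimination with partial pivoting."""
    n = len(matrix)
    aug = [list(row) + [1.0 if i == j else 0.0 for j in range(n)]
           for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("matrix is singular: perfect collinearity")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        scale = aug[col][col]
        aug[col] = [v / scale for v in aug[col]]
        for r in range(n):
            if r != col:
                factor = aug[r][col]
                aug[r] = [v - factor * p for v, p in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]

def vif(columns):
    """VIF of each regressor: the diagonal of the inverse correlation matrix."""
    inverse = invert(correlation_matrix(columns))
    return [inverse[j][j] for j in range(len(columns))]

# Hypothetical regressors, for illustration only (not the Mergent data).
x1 = [1, 2, 3, 4, 5, 6]
x2 = [2.1, 3.9, 6.2, 7.8, 10.1, 12.0]   # almost exactly 2 * x1
x3 = [5, 3, 6, 2, 7, 1]

for name, value in zip(["x1", "x2", "x3"], vif([x1, x2, x3])):
    flag = "above 10" if value > 10 else "ok"
    print(f"{name}: VIF = {value:.2f} ({flag})")
```

Because `x2` is nearly a multiple of `x1`, both receive very large VIFs. Dropping one of the pair and recomputing mirrors the step from Table 1 to Table 2.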

\[
\begin{aligned}
\text{Model I:}\quad \hat{\beta} = {} & 1.748 - 0.000036\,(\text{num\_full\_temp}) - 0.000989\,(\text{year\_started}) \\
& - 9.68 \times 10^{-12}\,(\text{long\_term\_debt}) + 5.32 \times 10^{-9}\,(\text{Revenue}) + 0.159\,(\text{EPS}) \\
& + 0.165\,(\text{DPS}) + 0.0201\,(\text{PE})
\end{aligned}
\]

The F value for this model was 7.95 with a corresponding p value of less than .0001. Therefore, the model was significant. Additionally, \(R^2\) and \(R_a^2\) were 0.553 and 0.4835 respectively. All variables had t values that were significant at the .05 level except the year and long term debt variables. However, these were retained as they helped to explain variation in the data. We found evidence that a multiple regression analysis is appropriate for this data.
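As a consistency check on the figures reported for Model I, the adjusted \(R^2\) and the F value can be recovered from \(R^2\) using the standard formulas, taking \(n = 53\) observations and \(k = 7\) regressors as stated in the Methods section. This is a worked check under those two counts, not additional output from the analysis.

```latex
\[
R_a^2 = 1 - (1 - R^2)\,\frac{n - 1}{n - k - 1}
      = 1 - (1 - 0.553)\,\frac{52}{45}
      \approx 0.4835
\]
\[
F = \frac{R^2 / k}{(1 - R^2) / (n - k - 1)}
  = \frac{0.553 / 7}{0.447 / 45}
  \approx 7.95
\]
```

Both values agree with those reported for Model I.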

The plot of the residuals versus the predicted values suggested that a transformation was necessary. This is shown in Figure 1. We attempted both a square root and a logarithm transformation, and the square root transformation appropriately transformed the residuals. The residuals appeared randomly distributed around 0 after this transformation. This is shown in Figure 2. We also assessed the QQ plot after the transformation to check that the residuals were approximately normally distributed. Overall, we found that the QQ plot was fairly linear. Therefore, the transformation appeared appropriate.

Figure 1: The residuals versus the predicted value plot for the first order model.

Figure 2: The residual versus predicted plot of the model after the square root transformation.

Figure 3: The QQ Plot of the transformed model.

Next, possible interactions were explored. Including all possible interactions did not produce meaningful results. Additionally, backward, forward, and stepwise methods for selecting variables did not produce meaningful results because the models produced were biased, had lower \(R_a^2\), or were not valid models. Therefore, we analyzed the residuals to try to find possible interactions or higher order terms. These are provided in Figure 3 (the residual plots of the 7 variables). Since some pinching was observed in the year variable, we believed that year could be interacting with other variables. Therefore, we tested interactions with the other variables individually to see if the results were meaningful. Additionally, we found that the residuals for the DPS variable showed curvature, suggesting a higher order term. The same was true for the revenue variable, but this is only evident when the plot is expanded.

We added each interaction term for the year variable with the others individually. Each time we assessed \(R_a^2\) to see if the term helped to explain the variation in the data. In the end, the year variable had meaningful interactions with the number of full time employees and revenue. The other interactions with year neither helped explain the variation in the data nor were significant. All other interactions were considered individually as well. However, only the interaction between long term debt and EPS was meaningful. Additionally, the higher order terms for revenue and DPS were explored and were found to improve \(R_a^2\). Therefore they were retained as well. This resulted in a model with an additional 3 interaction terms and 2 higher order terms. The resulting model was

\[
\begin{aligned}
\text{Model II:}\quad \sqrt{\hat{\beta}} = {} & -0.763 + 0.000475\,(\text{num\_full\_temp}) + 0.000649\,(\text{year\_started}) \\
& + 3.71 \times 10^{-12}\,(\text{long\_term\_debt}) + 6.67 \times 10^{-8}\,(\text{Revenue}) + 0.117\,(\text{EPS}) \\
& - 0.117\,(\text{DPS}) + 0.011\,(\text{PE}) - 2.97 \times 10^{-11}\,(\text{ynum}) - 2.97 \times 10^{-11}\,(\text{yr}) \\
& - 4.205 \times 10^{-12}\,(\text{lteps}) - 2.087 \times 10^{-17}\,(\text{r2}) + 0.06\,(\text{d2})
\end{aligned}
\]

Figure 3: The residual plots of the 7 variables. The pinching is visible in the 2nd plot of the first row, for the year variable. The curvature is visible in the 3rd plot of the 2nd row, for DPS.

Model II was significant with an F value of 8.46 and a corresponding p value of less than .0001. This model had \(R^2\) and \(R_a^2\) values of 0.717 and 0.633 respectively. A brief summary of the model is provided in Table 3. Furthermore, with a \(C_p\) of 13, we have evidence that the model is not biased.

The Cook's D plot can be seen in Figure 4. We note that the third observation was pulling the model heavily towards itself. Therefore, we performed the Jackknife procedure, removing influential observations until we were satisfied that the model was not being pulled significantly towards a single observation. This was performed on observations 3 and 9. The model refit without these observations, Model III, is given after Table 3 and Figure 4.

Table 3: Model II summary with transformation, interactions, and higher order terms

Source             DF    Sum of Squares    Mean Square    F Value    P value
Model              12    3.21035           0.26753        8.46       <.0001
Error              40    1.26471           0.03162
Corrected Total    52    4.47506

Figure 4: The Cook’s D plot of Model II. The 3rd observation has a very high Cook’s D statistic.
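The influence check described above can be sketched as follows. To keep the example short it uses a simple regression with a single predictor rather than the 12-term Model II, and the data are hypothetical. The function name `cooks_distances` is our own, and the closed-form leverage used here applies only to the one-predictor case.

```python
from statistics import mean

def cooks_distances(x, y):
    """Cook's D for each point of a simple linear regression y = b0 + b1*x."""
    n, p = len(x), 2                      # p counts the intercept and the slope
    x_bar, y_bar = mean(x), mean(y)
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    slope = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sxx
    intercept = y_bar - slope * x_bar
    residuals = [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]
    mse = sum(e * e for e in residuals) / (n - p)
    # Leverage of each point in the one-predictor case.
    leverages = [1 / n + (xi - x_bar) ** 2 / sxx for xi in x]
    return [e * e * h / (p * mse * (1 - h) ** 2)
            for e, h in zip(residuals, leverages)]

# Hypothetical data: the last point sits far from the others and off the trend.
x = [1, 2, 3, 4, 5, 6, 7, 20]
y = [1.1, 1.9, 3.2, 3.9, 5.1, 6.0, 7.2, 5.0]

distances = cooks_distances(x, y)
most_influential = max(range(len(x)), key=lambda i: distances[i])
print(most_influential)   # 7 (zero-based index of the last point)

# Refit without the most influential point, as in the removal step in the text.
x_kept = [v for i, v in enumerate(x) if i != most_influential]
y_kept = [v for i, v in enumerate(y) if i != most_influential]
print(round(max(cooks_distances(x_kept, y_kept)), 3))
```

After the removal, the largest remaining Cook's D can be inspected again, and the step repeated if a single observation still dominates the fit.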

\[
\begin{aligned}
\text{Model III:}\quad \sqrt{\hat{\beta}} = {} & 0.5879 + 0.00049\,(\text{num\_full\_temp}) - 0.00001\,(\text{year\_started}) \\
& - 1.99 \times 10^{-12}\,(\text{long\_term\_debt}) - 1.68 \times 10^{-7}\,(\text{Revenue}) + 0.094\,(\text{EPS}) \\
& - 0.220\,(\text{DPS}) + 0.0098\,(\text{PE}) - 2.65 \times 10^{-7}\,(\text{ynum}) + 9.52 \times 10^{-11}\,(\text{yr}) \\
& - 2.88 \times 10^{-12}\,(\text{lteps}) - 1.11 \times 10^{-16}\,(\text{r2}) + 0.0696\,(\text{d2})
\end{aligned}
\]

Table 4: Model III summary with transformation, interactions, and higher order terms after the Jackknife

Source             DF    Sum of Squares    Mean Square    F Value    P value
Model              12    3.17563           0.26464        8.62       <.0001
Error              38    1.16675           0.03070
Corrected Total    50    4.34238

Model III was significant with an F value of 8.62 and a corresponding p value of less than .0001. This model had \(R^2\) and \(R_a^2\) values of 0.731 and 0.647 respectively. A brief summary of the model is provided in Table 4. Furthermore, with a \(C_p\) of 13, we have evidence that the model is not biased.
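The summary statistics quoted for Models II and III follow directly from the sums of squares and degrees of freedom in Tables 3 and 4. The short sketch below recomputes them; the function name `anova_summary` is our own, and the only inputs are the table entries.

```python
def anova_summary(ss_model, ss_error, df_model, df_error):
    """Recompute F, R^2, adjusted R^2, and root MSE from an ANOVA table."""
    ss_total = ss_model + ss_error
    df_total = df_model + df_error
    ms_model = ss_model / df_model
    ms_error = ss_error / df_error
    r2 = ss_model / ss_total
    return {
        "F": ms_model / ms_error,
        "R2": r2,
        "adj_R2": 1 - (1 - r2) * df_total / df_error,
        "root_MSE": ms_error ** 0.5,
    }

model_ii = anova_summary(3.21035, 1.26471, 12, 40)    # Table 3
model_iii = anova_summary(3.17563, 1.16675, 12, 38)   # Table 4

for name, stats in [("Model II", model_ii), ("Model III", model_iii)]:
    print(name, {key: round(value, 4) for key, value in stats.items()})
```

The recomputed values match the reported F values of 8.46 and 8.62, the \(R^2\) values of 0.717 and 0.731, and the root MSE of about 0.175 for Model III.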

A residual analysis was performed on the final model. The errors appeared to be fairly normally distributed and centered at 0. There was minimal kurtosis present. The residual plot showed that most of the observations were within 2 standard deviations of the mean, and only a couple were beyond 2 standard deviations. The QQ plot was fairly linear. This satisfied the 3 conditions of the Gauss-Markov theorem and the 4th assumption of normally distributed errors. The predicted versus actual plot showed that the model did fairly well at predicting the higher values but was more scattered for values less than .75. This was reasonable given that \(R^2\) was .73. These plots are provided in Figure 5.

Figure 5: The residual plots for analysis of Model III.

Discussion:

The final model, Model III, included 7 regressors, 3 interactions, and 2 higher order terms. As stated previously, the final model was

\[
\begin{aligned}
\text{Model III:}\quad \sqrt{\hat{\beta}} = {} & 0.5879 + 0.00049\,(\text{num\_full\_temp}) - 0.00001\,(\text{year\_started}) \\
& - 1.99 \times 10^{-12}\,(\text{long\_term\_debt}) - 1.68 \times 10^{-7}\,(\text{Revenue}) + 0.094\,(\text{EPS}) \\
& - 0.220\,(\text{DPS}) + 0.0098\,(\text{PE}) - 2.65 \times 10^{-7}\,(\text{ynum}) + 9.52 \times 10^{-11}\,(\text{yr}) \\
& - 2.88 \times 10^{-12}\,(\text{lteps}) - 1.11 \times 10^{-16}\,(\text{r2}) + 0.0696\,(\text{d2})
\end{aligned}
\]

where ynum, yr, and lteps are the interactions between year and number of employees, year and revenue, and long term debt and EPS respectively. The terms r2 and d2 are the higher order terms, revenue squared and DPS squared.
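To make the structure of Model III concrete, the sketch below evaluates the fitted equation for one firm, building the interaction and squared terms from the 7 base variables. The coefficients are the estimates reported in Table 5. The firm's values are hypothetical, and the report does not state the units of revenue and long term debt, so the inputs here are assumptions chosen only for illustration. Because several large terms nearly cancel, the rounded coefficients make the output approximate.

```python
# Parameter estimates for Model III as reported in Table 5.
COEFFICIENTS = {
    "num_full_temp": 0.00049,
    "year_started": -0.000010,
    "long_term_debt": -1.9859e-12,
    "revenue": -1.67648e-7,
    "eps": 0.094,
    "dps": -0.22,
    "pe": 0.0098,
    "ynum": -2.65166e-7,
    "yr": 9.51586e-11,
    "lteps": -2.8852e-12,
    "r2": -1.1093e-16,
    "d2": 0.0696,
}
INTERCEPT = 0.588

def predict_sqrt_beta(num_full_temp, year_started, long_term_debt,
                      revenue, eps, dps, pe):
    """Evaluate Model III; the result is on the square-root-of-beta scale."""
    terms = {
        "num_full_temp": num_full_temp,
        "year_started": year_started,
        "long_term_debt": long_term_debt,
        "revenue": revenue,
        "eps": eps,
        "dps": dps,
        "pe": pe,
        "ynum": year_started * num_full_temp,   # year x employees
        "yr": year_started * revenue,           # year x revenue
        "lteps": long_term_debt * eps,          # long term debt x EPS
        "r2": revenue ** 2,                     # revenue squared
        "d2": dps ** 2,                         # DPS squared
    }
    return INTERCEPT + sum(COEFFICIENTS[name] * value
                           for name, value in terms.items())

# Hypothetical firm; units are assumed to match the study's data set.
sqrt_beta = predict_sqrt_beta(num_full_temp=5000, year_started=1950,
                              long_term_debt=2e9, revenue=5e6,
                              eps=2.0, dps=1.0, pe=15.0)
print(round(sqrt_beta, 3), round(sqrt_beta ** 2, 3))  # sqrt(beta), then beta
```

Squaring the result returns the prediction to the beta scale, which is only meaningful when the predicted square root is non-negative.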

Table 5: Parameter Estimates

Variable          Parameter Estimate    Standard Error    Type I SS    95% CL Lower    95% CL Upper
Intercept         0.588                 1.929             N/A          -3.317          4.49
num_full_temp     0.00049               0.00026           0.533        -0.00003        0.001
year_started      -0.000010             0.00095           0.0001       -0.00193        0.002
long_term_debt    -1.9859E-12           6.92841E-12       0.072        -1.6012E-11     1.20399E-11
Revenue           -1.67648E-7           1.712373E-7       1.197        -5.14299E-7     1.790042E-7
EPS               0.094                 0.035             0.103        0.024           0.164
DPS               -0.22                 0.118             0.154        -0.458          0.018
PE                0.0098                0.00405           0.279        0.0016          0.018
Ynum              -2.65166E-7           1.334934E-7       0.371        -5.35409E-7     5.077335E-9
Yr                9.51586E-11           8.99055E-11       0.033        -8.6846E-11     2.77163E-10
Lteps             -2.8852E-12           2.34561E-12       0.099        -7.6337E-12     1.86319E-12
r2                -1.1093E-16           5.58336E-17       0.095        -2.2396E-16     2.10264E-18
d2                0.0696                0.0249            0.239        0.019           0.12


Model III captured about 73% of the variation in the data, as indicated by the \(R^2\) of 0.731. With a root MSE of about 0.175, about 95% of the observations are expected to fall within 2(0.175), or .35, of the predicted value. While the year variable and its interactions did not contribute the most to \(R^2\), they did contribute in part. The variable that had the biggest impact on \(R^2\) was revenue, together with its interaction. This was deduced from the Type I SS statistic. These figures are summarized in Tables 5 and 6.

We then analyzed the observations with missing beta values. The three with missing values were ExxonMobil, Chevron, and ONE Gas, or observations 3, 9, and 60 respectively. The predicted values from Model III, which are on the square-root scale, were -15.9, -2.1, and 0.699 respectively. The 95% prediction intervals for an individual value were -38 to 6, -7 to 2.5, and 0.32 to 1.07 respectively. The 95% confidence intervals for the mean value were -38 to 6, -7 to 2.5, and 0.58 to 0.82 respectively. These are summarized in Table 7.
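Since the response in Model III is the square root of beta (the Appendix code fits `sqrty`), the Table 7 values can be squared to express them on the beta scale. The sketch below does this for observation 60 and compares the observed betas of ExxonMobil and Chevron on the square-root scale. It assumes the Table 7 figures are on the square-root scale, and the helper name `back_transform` is our own. An interval with a negative lower limit has no valid back-transform, so the helper returns `None` for it.

```python
from math import sqrt

def back_transform(lower, upper):
    """Square an interval for sqrt(beta); only valid when lower >= 0."""
    if lower < 0:
        return None
    return lower ** 2, upper ** 2

# Observation 60 (ONE Gas), values from Table 7.
predicted_sqrt = 0.6988
mean_interval = back_transform(0.5794, 0.8182)
prediction_interval = back_transform(0.3245, 1.0731)

print(round(predicted_sqrt ** 2, 3))                 # predicted beta
print(tuple(round(v, 3) for v in mean_interval))
print(tuple(round(v, 3) for v in prediction_interval))

# Observations 3 and 9 have negative lower limits, so squaring is not valid.
print(back_transform(-37.8811, 6.0978))              # None
print(back_transform(-6.6639, 2.5127))               # None

# Observed betas expressed on the square-root scale for comparison.
print(round(sqrt(1.13), 3), round(sqrt(1.2), 3))     # ExxonMobil, Chevron
```

On the square-root scale the observed values for ExxonMobil and Chevron are a little above 1, while their predicted values are negative, which no square root can be. This underlines how poorly the model extrapolates to these two firms.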

Table 6: Important Figures for Model III

Root MSE          0.175
Dependent Mean    0.817
\(R^2\)           0.7313
\(R_a^2\)         0.6465

Table 7: Estimates for Missing Values

Obs    Predicted Value    Std Error Mean Predict    95% CL Mean (Lower, Upper)    95% CL Predict (Lower, Upper)
3      -15.8917           10.8608                   -37.8782    6.0949            -37.8811    6.0978
9      -2.0756            2.2597                    -6.6501     2.4990            -6.6639     2.5127
60     0.6988             0.0590                    0.5794      0.8182            0.3245      1.0731

It was clear that the predictions for ExxonMobil and Chevron were far from their observed values. We have the observed values because these observations were removed during the Jackknife procedure. The observed beta values were 1.13 and 1.2 for ExxonMobil and Chevron respectively. However, we were not entirely surprised, as these firms are often cited as outliers in the market. They are consistently noted as among the best stocks to invest in. ExxonMobil was the largest publicly traded oil company. It was considered to have world class manufacturing efficiency and scale, and was highly recommended for an investment portfolio.8 Additionally, Chevron was the world's 4th largest oil firm. It was also considered one of the top stocks with above average dividend yields.9 As mentioned previously, we performed the Jackknife on these two observations. The model was being pulled extremely strongly by these outliers. When the Jackknife was performed, it made the model more accurate for typical observations but less accurate for outliers. Therefore, we were not surprised that the model was unable to estimate these companies correctly, as they are understood to be special and unique.

Conclusion:

The results of this analysis were somewhat comparable to Melicher's work. We found that very different measures were able to provide much more explanation than the variables used in his analysis. Additionally, we found that the year in which a company was incorporated does impact the beta value. Older companies would have larger beta values than newer ones, given that the other variables are held constant. However, since the year of incorporation interacts with other variables, it is difficult to say how precise this claim is. Specifically, the interaction between year and the number of employees suggests that the beta values will be smaller for newer companies. The interaction between year and revenue suggests that the beta value will be larger for newer companies. Nonetheless, the year a firm was incorporated was an important factor in helping to explain beta values. While we were able to produce a useful model for the oil and natural gas industry, we recognize that it may be problematic to apply this same model to other industries. However, an analysis comparing this model to models from other sectors, or even to oil and natural gas industries in other countries, would be enlightening.

8 The Value Line Investment Survey. 38th ed. Vol. LXX. New York: Arnold Bernhard, 2015. 504-505. Print.
9 Ibid.
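The competing effects of the year variable can be made explicit by differentiating Model III with respect to the year of incorporation. Using the rounded coefficients of Model III, and holding the other variables fixed, the marginal effect on the square-root scale is:

```latex
\[
\frac{\partial \sqrt{\hat{\beta}}}{\partial(\text{year\_started})}
  = -0.00001
    - 2.65 \times 10^{-7}\,(\text{num\_full\_temp})
    + 9.52 \times 10^{-11}\,(\text{Revenue})
\]
```

The first two terms are negative, so a later year of incorporation lowers the predicted value, more so for firms with many employees. The revenue term is positive, so for firms with large enough revenue relative to their number of employees the net effect of a later year can become positive. This is why the direction of the year effect cannot be stated without reference to the other variables.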


Appendix

/** final code for finance independent study **/

/** Import the XLSX file. **/
PROC IMPORT DATAFILE="/folders/myfolders/Finance/oil final.xlsx"
    OUT=WORK.finance
    DBMS=XLSX
    REPLACE;
RUN;

/** Print the imported data. **/
PROC PRINT DATA=WORK.finance;
RUN;

/** VIF on all regressors (num_full_temp through pe, in data set order). **/
title "vif on all variables";
proc reg data=finance;
    model beta = num_full_temp--pe / vif;
run;

/** VIF after dropping the market cap column, which sits between revenue and EPS. **/
title "vif on remaining variables";
proc reg data=finance;
    model beta = num_full_temp--revenue EPS--pe / vif;
run;

/** vif complete **/

/** try interactions **/
/** Build transformed responses, higher order terms, and interaction terms. **/
data final;
    set finance;
    /* candidate transformations of the response */
    logy  = log(beta);
    sqrty = sqrt(beta);
    asiny = ARSIN(beta);
    /* higher order terms */
    lt2  = long_term_debt*long_term_debt;
    r2   = revenue*revenue;
    eps2 = eps*eps;
    pe2  = pe*pe;
    /* interactions with year_started */
    ynum = year_started*num_full_temp;
    ylt  = year_started*long_term_debt;
    yr   = year_started*revenue;
    yeps = year_started*eps;
    ydps = year_started*dps;
    ype  = year_started*pe;
    /* interactions with long_term_debt */
    ltr   = long_term_debt*revenue;
    lteps = long_term_debt*eps;
    ltdps = long_term_debt*dps;
    ltpe  = long_term_debt*pe;
    /* interactions with revenue */
    rpe  = revenue*pe;
    reps = revenue*eps;
    rdps = revenue*dps;
    rps  = revenue*pe;   /* same product as rpe */
    /* remaining two-way interactions */
    epsdps = eps*dps;
    epspe  = eps*pe;
    dpspe  = dps*pe;
    /* three-way interactions */
    ltrpe  = long_term_debt*pe*revenue;
    yepspe = year_started*eps*pe;
    /* DPS squared */
    d2 = dps*dps;
run;

/** trying transformations **/
title "Finance asin(y)";
proc reg data=final;
    model asiny = num_full_temp--revenue EPS--pe;
run;

title "Finance Log(y)";
proc reg data=final;
    model logy = num_full_temp--revenue EPS--pe;
run;

title "Finance Sqrt(y)";
proc reg data=final;
    model sqrty = num_full_temp--revenue EPS--pe;
run;

/** selection techniques did not work, try brute force **/
/** Model II: first order terms plus the retained interactions and squares. **/
title "Finance Variable Selection";
proc reg data=final plots(label) = (cooksd RSTUDENTBYPREDICTED);
    model sqrty = num_full_temp--revenue EPS--pe ynum yr lteps r2 d2;
run;

/** Import the second workbook, used for the refit (Model III). **/
PROC IMPORT DATAFILE="/folders/myfolders/Finance/RE oil final.xlsx"
    OUT=WORK.refinance
    DBMS=XLSX
    REPLACE;
RUN;

/** Rebuild the derived variables on the second data set. **/
data refinal;
    set refinance;
    /* candidate transformations of the response */
    logy  = log(beta);
    sqrty = sqrt(beta);
    asiny = ARSIN(beta);
    /* higher order terms */
    lt2  = long_term_debt*long_term_debt;
    r2   = revenue*revenue;
    eps2 = eps*eps;
    pe2  = pe*pe;
    /* interactions with year_started */
    ynum = year_started*num_full_temp;
    ylt  = year_started*long_term_debt;
    yr   = year_started*revenue;
    yeps = year_started*eps;
    ydps = year_started*dps;
    ype  = year_started*pe;
    /* interactions with long_term_debt */
    ltr   = long_term_debt*revenue;
    lteps = long_term_debt*eps;
    ltdps = long_term_debt*dps;
    ltpe  = long_term_debt*pe;
    /* interactions with revenue */
    rpe  = revenue*pe;
    reps = revenue*eps;
    rdps = revenue*dps;
    rps  = revenue*pe;   /* same product as rpe */
    /* remaining two-way interactions */
    epsdps = eps*dps;
    epspe  = eps*pe;
    dpspe  = dps*pe;
    /* three-way interactions */
    ltrpe  = long_term_debt*pe*revenue;
    yepspe = year_started*eps*pe;
    /**yltepspe=long_term_debt*yepspe;**/
    /**ryepspe=revenue*year_started*eps*pe;**/
    /**ltyr=long_term_debt*year_started*revenue;**/
    /**ltrpe=long_term_debt*revenue*pe;**/
    yy = year_started*year_started;
    d2 = dps*dps;
    /**yepsdps=year_started*eps*dps; doesnt help**/
    ltepsy  = lteps*year_started;
    yepsper = yepspe*r2;
run;

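/** Model III: refit with individual (cli) and mean (clm) limits and **/
/** confidence limits on the parameter estimates (clb). **/
title "Finance Removed";
proc reg data=refinal plots(label) = (cooksd RSTUDENTBYPREDICTED);
    model sqrty = num_full_temp--revenue EPS--pe ynum yr lteps r2 d2 / cli clb clm;
run;

/** Same model in PROC GLM. **/
title "GLM model";
proc glm data=refinal;
    model sqrty = num_full_temp--revenue EPS--pe ynum yr lteps r2 d2;
run;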


Works Cited

Chang, Wei-Chien, and Donald E. Weiss. "An Examination of the Time Series Properties of Beta in the Market Model." Journal of the American Statistical Association, 1991, 883. Accessed May 25, 2015.

"Key Statistics Definitions." Yahoo Finance. Accessed May 26, 2015. https://help.yahoo.com/kb/finance/SLN2347.html?impressions=true.

Lintner, John. "The Valuation of Risk Assets and the Selection of Risky Investments in Stock Portfolios and Capital Budgets." The Review of Economics and Statistics 47, no. 1 (1965): 13. Accessed May 23, 2015.

Melicher, Ronald W. "Financial Factors Which Influence Beta Variations within an Homogeneous Industry Environment." The Journal of Financial and Quantitative Analysis, 1974, 231. Accessed May 25, 2015.

Sharpe, William F. "Capital Asset Prices: A Theory of Market Equilibrium under Conditions of Risk." The Journal of Finance XIX, no. 3 (1964): 425. Accessed May 23, 2015.

The Value Line Investment Survey. 38th ed. Vol. LXX. New York: Arnold Bernhard, 2015. 504-505.