Stat 682 Final Project: 60/40? Dynamic Portfolios for the
Retirement Investor
Yeshaya Adler, Zhibai Chen, Sara El Hakim, Deyu Fu, Binlei Gong
December 5, 2014
Abstract
We develop a dynamic two-asset portfolio for retirement investors, building on the conventional advice to place sixty percent of one's assets in stocks and the remaining forty percent in bonds. We rely on publicly available macroeconomic indicators to dynamically adjust the ratio of stocks to bonds, such that bonds are favored during periods of negative macroeconomic outlook. We use penalized linear models to identify the macro indicators of most concern, and then develop a simple rule aggregation based on the number of positive and negative indicators in a given period. We use this rule to adjust the 60/40 split higher in periods of economic expansion and lower in periods of contraction. The resulting dynamic portfolio reduces the volatility of the fixed 60/40 allocation while preserving most of the long-run returns. It is also shown to outperform the fixed allocation in periods of economic downturn.
Introduction
A quintessential heuristic for allocating portfolios is the 60/40 stock-to-bond ratio. It is so often proffered that we could not trace the source of this wisdom. It is a mildly risk-averse diversification strategy that has produced respectable returns at a lower volatility than the competing strategy of holding index funds over the longer term (15-year horizons). This underscores its success as a retirement or pension investment strategy.
The strategy is based on the assumption that in economic downturns holding bonds will lower your volatility while providing positive returns to hedge your stock losses. In figure 1 we plot Vanguard's largest mutual fund, VFINX, in red, and PIMCO's largest bond fund, PTTDX, in blue. These are the flagship funds of the largest mutual fund and bond fund companies, respectively. We have chosen them for their size and because they are actively managed. The retirement investor could merely hold these two funds and expect to be somewhat diversified against specific (idiosyncratic) risk. If we invest a 60/40 split into these two assets, we obtain similar returns with considerably less volatility. We note that, because of the all-stock portfolio's higher volatility, the less volatile 60/40 asset mix is worth about the same at the trough of the 2009 recession as the 100% VFINX large-cap mutual fund. This matters behaviorally: in times of economic downturn, many people who have experienced sharp stock losses are liable to sell their shares and not participate in the rally. By lowering the volatility, the 60/40 split helps the retirement investor hold on for the rally.
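To make the construction of figure 1 concrete, here is a minimal sketch of how the three value series can be built from daily simple returns; rv and rp are hypothetical stand-ins for the VFINX and PTTDX return series, and the blend follows the buy-and-hold 60/40 split used in the code appendix:

# value of a $10,000 initial investment under each strategy
rv <- c(0.004, -0.010, 0.006)   # hypothetical daily VFINX simple returns
rp <- c(0.001,  0.002, 0.000)   # hypothetical daily PTTDX simple returns
v_stock <- 10000 * cumprod(1 + rv)       # 100% VFINX
v_bond  <- 10000 * cumprod(1 + rp)       # 100% PTTDX
v_mix   <- 0.6 * v_stock + 0.4 * v_bond  # buy-and-hold 60/40 blend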
[Figure 1: The value of a $10,000 initial investment, 1990-2015, in 100% VFINX, 100% PTTDX, and the fixed 60/40 VFINX/PTTDX split, with the tech-bubble recession shaded.]
While the 60/40 split is very attractive as a rule of thumb, what if we could dynamically adjust this ratio based on macroeconomic indicators? In doing so, we would increase our exposure to the risk-averse bond category during times of poor macroeconomic outlook, and during periods of optimistic macroeconomic outlook we would buy an even greater percentage of stocks. We will backtest to identify leading macroeconomic indicators that correlate negatively with equity prices and positively with bond prices. We extracted 40 macroeconomic indicators. Suppose we put 40% into stocks and 60% into bonds during 2000-2003: at the end of this recession, we would have $52,689.30 instead of $36,100.24. If we could dynamically allocate for all economic climates, the gains would be greater.
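To see how such a comparison is computed, the following sketch compounds quarterly gross returns under the two fixed allocations; the return vectors here are hypothetical placeholders, not the actual fund data:

rs <- c(0.96, 0.92, 0.95, 1.01, 0.94, 0.90, 1.03)  # hypothetical quarterly gross stock returns
rb <- c(1.02, 1.03, 1.02, 1.01, 1.03, 1.02, 1.01)  # hypothetical quarterly gross bond returns
terminal <- function(ws, start = 10000) {
  # each quarter's portfolio gross return is the weighted average of the two funds
  start * prod(ws * rs + (1 - ws) * rb)
}
terminal(0.6)  # fixed 60/40 throughout
terminal(0.4)  # defensive 40/60 during the recession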
Choosing Macroeconomic Indicators
We downloaded a large list of 40 macroeconomic indicators. Although it is clear that more macroeconomic indicators would let machine learning algorithms better predict market behavior, we want to keep this as simple as possible, so that it is attainable by the retirement investor as he or she reads the daily newspaper. In table 1 in the appendix, we show the forty macroeconomic indicators that we obtained from the Federal Reserve Bank of St. Louis' FRED website.
To select which of the indicators the retirement investor should watch, we regress the 60/40 split on them. The resulting regression was mostly predictive, and significant for almost all the regressors. To pick out the most significant ones, we ran best subsets selection. This forward stepwise procedure builds the best subset of each size from 1 through 12, at each step adding the variable that most reduces the residual sum of squares. For each of the resulting 12 linear regression models, we measured the residual sum of squares, giving a picture of exactly how much each regressor improves the model. We note that the RSS curve in figure 2 begins to level off around 8 predictors, plus an intercept term. The least 'predictive' of the selected indicators, the Consumer Price Index, still shows up in 5 of the 12 models.
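A minimal sketch of this selection step, assuming the centered response Y (the 60/40 portfolio series) and predictor matrix X constructed in the code appendix:

library(leaps)
# forward stepwise best subsets of sizes 1 through 12;
# summary(fit)$rss holds the RSS of the best model of each size
fit <- regsubsets(x = X, y = Y, nvmax = 12, method = "forward")
rss <- summary(fit)$rss
plot(seq_along(rss), rss, type = "l",
     xlab = "Number of Variables", ylab = "Residual Sum of Squares")
abline(v = 8, col = "red", lty = 2)  # the elbow near 8 predictors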
[Figure 2 appears here: the RSS elbow plot, residual sum of squares against the number of variables, 1 through 12.]
Figure 2: The elbow plot of the residual sum of squares. The best subsets of sizes 1 through 12 show diminishing returns after about size 8.
Towards a more advanced selector
We also tried to select regressors using penalized methods. Perhaps we did not tune them well enough, but using 50-fold cross-validation we were unable to select a model that was structurally sound. In figure 4 we note that the regularization from the L1 penalty is so strict for this regression that all the coefficient paths tend to zero. This means that the lasso does not consider any of our 40 indicators predictive of stock market behavior. Perhaps this is the case, but we are more inclined to believe that our model was somehow misspecified. As such, we do not use any of the regularization-based methods that we attempted, such as LARS, L0, the lasso, the elastic net, or ridge regression.
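A sketch of the penalized fit we attempted, again assuming the centered X and Y from the appendix. Note that cv.glmnet with alpha = 1 is the lasso proper; the appendix code actually uses alpha = 0.5, an elastic net:

library(glmnet)
fit <- glmnet(X, Y, family = "gaussian", alpha = 1)  # lasso path
plot(fit, xvar = "lambda")  # coefficient paths against log(lambda)
cv <- cv.glmnet(X, Y, alpha = 1, nfolds = 50, standardize = FALSE)
coef(cv, s = "lambda.1se")  # here every coefficient is shrunk to zero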
[Figure 3 appears here: a bar chart of the number of times each predictor appeared across the 12 best-subsets models. The predictors shown are: All Employees Total Private; Compensation of Employees Wages; Consumer Price Index for All Urban Consumers All Items; Corporate Profits After Tax; Disposable Personal Income; Effective Federal Funds Rate (monthly); Federal Debt Total Public Debt; Gross Domestic Product; Gross Private Domestic Investment; Household Debt Service Payments as a Percent of Disposable Personal Income; Industrial Production (monthly); the intercept; Monetary Base Total; Personal Consumption Expenditures; Spot Oil Price West Texas Intermediate Crude; Trimmed Mean PCE Inflation Rate.]
Figure 3: For each predictor that appeared in the best subsets forward stepwise selection, the number of times it appeared. The legend is in alphabetical order from left to right. Note how the intercept appears in all 12 models, by design.
In short, the variables that we picked to determine our consensus of 'experts' are: Compensation of Employees, Disposable Personal Income, Total Monetary Base, Corporate Profits After Tax, Household Debt Service Payments as a Percent of Disposable Personal Income, West Texas Intermediate Crude Spot Prices, Industrial Production, and the Monthly Consumer Price Index.
[Figure 4 appears here: the lasso coefficient paths, coefficient value against log(lambda), for all 40 macroeconomic indicators (listed in table 1 in the appendix); every path shrinks to zero as the penalty grows.]
Figure 4: Note how the lasso path ultimately sends every coefficient in the model to zero. This suggests that we might have selected indicators that are not indicative of stock and bond performance. It might also suggest that we poorly tuned our lasso model.
An ‘expert’ system
At this point, we wanted a system that resembles how a retirement investor might digest the financial news they encounter on a daily basis. We therefore took the financial indicators and binarized them: if an indicator was dropping we scored it -1, if it was rising +1, and if it was steady 0. Summing over the 8 predictors gives a score ranging from -8 to +8 for each period. We then compute the total score between quarters, at which point the retirement investor chooses how to reallocate his investment. On a high score, the retirement investor shifts more than 60% of his money to stocks; on a low score, he shifts money to bonds.
To determine which of our 8 predictors were indicative of a stock or bond move in the economy, we ran a correlation analysis. To decide whether an indicator should be associated with stocks or with bonds, we took the difference between its correlation with stocks and its correlation with bonds. A plot of these results is found in figure 5.
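A sketch of this comparison, using the monthly data frame df1 from the appendix, whose columns 4 through 11 hold the 8 indicators:

# positive difference: the indicator tracks the stock fund more closely;
# negative difference: it tracks the bond fund more closely
dcor <- cor(df1$vfinx, df1[, 4:11]) - cor(df1$pttdx, df1[, 4:11])
barplot(as.vector(dcor), names.arg = colnames(df1)[4:11],
        las = 2, ylab = "Difference in Correlation")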
[Figure 5 appears here: a bar chart of the difference in correlation for each of the 8 indicators: Compensation of Employees Wages; Consumer Price Index for All Urban Consumers All Items; Corporate Profits After Tax; Disposable Personal Income; Household Debt Service Payments as a Percent of Disposable Personal Income; Industrial Production (monthly); Monetary Base Total; Spot Oil Price West Texas Intermediate Crude.]
Figure 5: The plot shows which variables are more positively correlated with stocks than with bonds. A negative difference in correlation indicates a greater affinity towards bonds.
The plot in figure 5 does not show the raw correlations between stocks or bonds and
the 8 variables. These were all over the map, with correlations between VFINX and our indicators ranging from -0.07524752 to 0.92609427, and correlations between PTTDX and our indicators ranging from -0.2526991 to 0.9511515. At first glance this might suggest that the indicators are overall more highly correlated with stocks; in fact we arrive at the opposite conclusion: of our 8 indicators, 5 are more highly correlated with bond performance. The indicators with the largest difference in correlation in favor of stocks are Household Debt Service Payments as a Percent of Disposable Personal Income and Industrial Production; those with the largest magnitude in favor of bonds are the Total Monetary Base and the spot price of WTI crude.
Calculating the ‘score’
We convert the daily data to monthly data by keeping only the last observation of each month. We used this monthly data to calculate the differences in correlation that tell us whether an indicator 'prefers' stocks or bonds. Next, we calculate the monthly change in each of the 8 variables. For the variables that "prefer" stocks: if the variable increases, we set the change to 1; if it decreases, to -1; if it remains the same, to 0. For the variables that "prefer" bonds, we simply multiply that score by -1, indicating a preference for bonds. We then sum the 8 monthly ternary indicators to get the "score", between -8 and 8. A higher score indicates the need to increase the allocation to stocks. A plot of the scores for each month is given in figure 6. Note the variability, as well as the tendency to favor bonds; this is an artifact of 5 of our 8 chosen indicators favoring bonds. We could rebalance these so that the score centers on zero, or we could select another indicator that favors stocks, making the score balanced by design. Either is a topic of future study. We also note that the score never once reaches -7 or +7. This might indicate that we chose indicators without the volatility to match stock and bond market behavior, resulting in 'safe' choices.
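A compact sketch of the scoring, assuming df1 and the correlation differences dcor from above; pref encodes each indicator's preference (+1 stocks, -1 bonds):

ternary <- sign(apply(df1[, 4:11], 2, diff))  # monthly changes mapped to {-1, 0, +1}
pref <- sign(as.vector(dcor))                 # +1 stock-preferring, -1 bond-preferring
score <- as.vector(ternary %*% pref)          # one score per month, in [-8, 8]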
[Figure 6 appears here: the monthly score plotted against date, ranging between -6 and 6, with a smoothed trend line.]
Figure 6: The plot shows the score over time, which can range from -8 to 8. We notice that the average score is about -1.5. This innately favors a portfolio that holds more than 40% bonds on average.
A Dynamic Portfolio
Once we have the monthly "score" based on the 8 selected variables, along with the returns on stocks and bonds, we can calculate the total return of our dynamic portfolio and of the benchmark 60/40 portfolio. Our dynamic portfolio is updated once a quarter, at the end of March, June, September, and December. The timing is worth noting: we use the "score" from February, May, August, and November to decide the new portfolio in the following month. That portfolio is then held for three months before we renew it. For example, in our monthly data, we use February's "score" to update the portfolio in March and receive three months of returns through June, before we renew the portfolio based on May's "score". Finally, we compound the quarterly returns of the dynamic portfolio to calculate the total return.
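A sketch of this timing alignment, again assuming the monthly data frame df1 with month and score columns, as in the appendix:

# decide at each quarter end using the prior month's score
score_q <- df1$score[df1$month %in% c(2, 5, 8, 11)]  # Feb, May, Aug, Nov scores
stock_q <- df1$vfinx[df1$month %in% c(3, 6, 9, 12)]  # quarter-end fund values
rs <- stock_q[-1] / head(stock_q, -1)                # quarterly gross stock returns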
The dynamic portfolio is a function of the "score":

$$P_1 = (S_1, B_1) = \left(60 + \alpha X_2,\; 40 - \alpha X_2\right)$$

$$P_i = (S_i, B_i) = \left(S_{i-1} + \alpha X_{3i-1},\; B_{i-1} - \alpha X_{3i-1}\right) \quad \forall\, i \ge 2$$
where $P_i$ is the portfolio of the $i$th quarter, $S_i$ is the percentage in stocks and $B_i$ the percentage in bonds in the $i$th quarter, $X_j$ is the "score" in the $j$th month (so $X_{3i-1}$ is the score one month before the $i$th quarter's rebalance), and $\alpha$ is a constant scaling factor that scales the effect of the "score". Higher values of $\alpha$ produce larger shifts in the dynamic allocation. Future work might make this scaling factor asymmetric, because downside realized volatility is known to be more harmful to stock returns than upside volatility.
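In code, the update rule is a simple recursion on the stock weight. Here the weight is expressed as a fraction and alpha = 0.005 (half a percentage point per score unit), the value used in the appendix; the score vector x is a hypothetical example:

x <- c(-2, 1, -3, 0, 4, -1)        # hypothetical quarterly scores
alpha <- 0.005
s <- numeric(length(x))            # stock weight per quarter, as a fraction
s[1] <- 0.60 + alpha * x[1]        # first quarter starts from the 60/40 split
for (i in 2:length(x)) s[i] <- s[i - 1] + alpha * x[i]
b <- 1 - s                         # bond weights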
[Figure 7 appears here: the value of a $10,000 initial investment, 1990-2015, under the fixed 60/40 VFINX/PTTDX benchmark and under the dynamically allocated portfolio.]
Figure 7: A comparison of a portfolio that sticks to the retirement benchmark and one that dynamically allocates based on 8 macroeconomic indicators. Note the much lower volatility of the dynamic portfolio, with almost the same returns.
Remarks and Conclusions
We note that the dynamically allocated portfolio, shown in figure 7 in red, has significantly lower volatility: its standard deviation was 0.08699514% lower while preserving almost all of the returns of the benchmark. We are certain that our dynamic selection favors bonds on average, which makes it less volatile. We also note that during periods of strong economic downturn our dynamic portfolio strongly outperforms the 60/40 split; this occurred roughly between 2009 and 2013, and warrants further investigation. It is reasonable for a retirement investor to keep track of whether 8 indicators have gone down or up more often in the past quarter and to adjust their holdings on that basis. So while it is a little early to recommend our strategy to retirement investors, we are optimistic that this strategy can be fine-tuned.
Future Work
Specifically, we would like to take more advantage of downside volatility by moving to bonds more aggressively. We would like to shift the score so that its average is closer to zero, either by changing one of our indicators or by scaling and standardizing the score so that it has a mean of 0 and the standard deviations of the scores for each indicator are the same. Additionally, we would like to adjust the cumulative score to account for all changes accumulated within a quarter, not just those observed at month's end. We would also like to use a weighted score that takes advantage of momentum in the macroeconomic indicators; a reason for accounting for this momentum is the greater variability in scores. Perhaps if a score changes at the end of the month, it should carry greater weight in the cumulative factor and have a greater impact on the accumulated benchmark.
Macroeconomic Variables
Table 1: The forty macroeconomic indicators obtained from the Federal Reserve Bank of St. Louis' FRED website.

1. 30-Year Conventional Mortgage Rate
2. 30-Year Fixed Rate Mortgage Average
3. All Employees Total Private
4. Average Hourly Earnings of Production and Nonsupervisory Employees Total Private
5. Average Monthly Sales Price for New Houses Sold in the United States
6. Bank Net Worth
7. Civilian Employment
8. Civilian Labor Force Participation Rate
9. Civilian Labor Force
10. Civilian Unemployment Rate
11. Compensation of Employees Wages
12. Consumer Confidence Survey (Monthly)
13. Consumer Price Index for All Urban Consumers All Items
14. Corporate Profits After Tax
15. Disposable Personal Income
16. Effective Federal Funds Rate (monthly)
17. Employment-Population Ratio
18. Federal Debt Total Public Debt
19. Gold Fixing Price in London Bullion Market, U.S. Dollars
20. Gross Domestic Product
21. Gross Private Domestic Investment
22. Homeownership Rate for the United States
23. Household Debt Service Payments as a Percent of Disposable Personal Income
24. Households and Nonprofit Liability Level
25. Housing Price Index USA
26. Industrial Production (Monthly)
27. M2 Money Stock Weekly
28. Monetary Base Total
29. New Car Average Finance Rate at Auto Finance Companies
30. New Housing Units Started
31. Personal Consumption Expenditures
32. Personal Income
33. Personal Saving Rate
34. Producer Price Index All Commodities
35. Real Estate Loans All Commercial Banks (seasonally adjusted)
36. Rental Vacancy Rate for the United States
37. Spot Oil Price West Texas Intermediate Crude
38. Trade Weighted U.S. Dollar Index Major Currencies
39. Trimmed Mean PCE Inflation Rate
40. Unemployed
Appendix: Code
##################################################
######### LOAD LIBRARIES #########
##################################################
pkg = c("quantmod", "lubridate", "xts", "timeSeries",
"ggplot2", "stringr", "reshape2","plyr", "dplyr",
"glmnet", "gplots", "MASS","leaps","lars","xts")
pkg.list = pkg %in% rownames(installed.packages())
if (sum(!pkg.list) !=0)
install.packages(pkg[!pkg.list])
sapply(pkg, require, character.only =T)
##################################################
########## PARSE ECONOMIC INDICATORS ##########
##################################################
# setwd("./prospectus/")
filenames = list.files("./Data_redo")
datasets = vector("list", length(filenames))
for (i in 1:length(filenames)){
datasets[[i]] = read.csv(paste(
"./Data_redo", sep = "/",
filenames[i]))
}
filenames <- sapply(str_split(filenames, ".csv"), "[[", 1)
names(datasets) <- filenames
#append the variable name to each list
datasets = lapply(seq_along(datasets), function(x)
cbind(datasets[[x]], predictor = filenames[x]))
merged.data.frame = Reduce(function(...) merge(..., all=T), datasets)
merged.data.frame = plyr::rename(merged.data.frame,
replace = c('DATE' = 'date','VALUE' = "value"))
casted = dcast(merged.data.frame, date ~ predictor)
casted[,-1] = apply(casted[,-1],2, as.numeric)
final = na.omit(na.locf(casted))
final[,1] <- ymd(final[,1])
# assetsLiab = read.csv('~/Dropbox/STAT_682_FF2014/prospectus/Data/Assets and Liabilities of Commercial Banks in the United States.csv')
# assetsLiab <- assetsLiab[-c(1:4),c(1,25:26)]
# names(assetsLiab) <- head(assetsLiab,1)
# assetsLiab <- assetsLiab[-1,]
# assetsLiab[,2:3] <- apply(assetsLiab[,2:3], 2, as.numeric)
# assetsLiab[,1] <- ymd(assetsLiab[,1])
# names(assetsLiab) <- c('date', 'assets', 'liab')
# netAssets = assetsLiab[,2] - assetsLiab[,3]
# bankNetAssets = cbind(assetsLiab[,1], netAssets)
# read in mutual fund data
invisible(getSymbols("VFINX", src="yahoo",
from = "1980-01-01"))
invisible(getSymbols("PTTDX", src="yahoo",
from = "1980-01-01"))
dat <- as.data.frame(VFINX)
dat <- data.frame(date = ymd(row.names(dat)),
vfinx = dat[,6])
dat2 <- as.data.frame(PTTDX)
dat2 <- data.frame(date = ymd(row.names(dat2)),
pttdx = dat2[,6])
tot = join(dat, dat2, by = "date", type = "full")
tot = na.omit(tot)
final = join(tot, final, by = "date", type = "left", match = 'first')
# write.csv(final, "final.shaya.csv", row.names=F)
##################################################
############# Variable Selection ##############
##################################################
#Assume previous code has been run, load final
final <- read.csv('final.shaya.csv')
# we use the 60/40 weighting as the benchmark
# Method_1 Create y = w*vf + (1-w)*ptt
# w = weights
w = 0.60
Y = w*final$vfinx + (1-w)*final$pttdx
X = final[,4:ncol(final)]
sum(is.na(X))
# center X and Y
Y = as.numeric(Y)
Y = Y - mean(Y)
X = scale(X,center=T,scale=F)
r1 <- regsubsets(Y ~ X,
data = final,
nvmax = 12,
nbest = 1)
# plot(summary(r1)$rss,
# xlab = "Number of Variables",
# ylab = "RSS", type = "l")
rssdf = data.frame(varnum = 1:12,
rss = summary(r1)$rss)
ggplot(data = rssdf, aes(x = varnum, y = rss)) +
geom_line() +
scale_x_discrete(breaks = seq(1,14, by =2)) +
labs(x = 'Number of Variables',y = "Residual Sum of Squares") +
geom_vline(xintercept = 8, color = 'red',linetype = "longdash")
# best subsets might be 8??
apply(summary(r1)$which,2,as.numeric)
par(mar = c(30,1,1,1))
plot(r1, scale = 'bic')
df= data.frame(variable = c("Intercept", colnames(X)),
num = unname(colSums(apply(summary(r1)$which,2,as.numeric))))
df[df[,2] == 0,] <-NA
str(df)
qplot(df$variable, df$num, geom = 'bar',stat = 'identity',fill = df$variable) +
theme(axis.text.x = element_blank(),
legend.title = element_blank(),
axis.ticks.x = element_blank()) +
labs(x = "Variable",
y = "Number of Times Appeared") +
geom_hline(yintercept = 5,
color = 'red', linetype = 'dashed')
# setwd("U:/prospectus")
# ### get rid of duplicated dates by choosing the first one
# df = read.csv("final.csv")
# date = unique(df$date)
# indx = tapply(1:nrow(df), df$date, function(x) x,simplify = F)
# #indx
# indx1 = sapply(indx, function(x) x[1])
# df2 = df[indx1,]
#
# final = df2
# missing data
final = na.omit(final)
Y = w*final$vfinx + (1-w)*final$pttdx
X = final[,4:ncol(final)]
sum(is.na(X))
# center X and Y
Y = as.numeric(Y)
Y = Y - mean(Y)
X = scale(X,center=T,scale=F)
# ################################
# ##Lasso Regression
# ################################
#
#
# fitl = glmnet(x=X,y=Y,family="gaussian",alpha=1)
# plot(fitl,col=1:dim(X)[2], main = "Lasso with 47 variables")
# legend(x=0, y = -3,legend=colnames(X),col=1:dim(X)[2],
# lty=rep(1,dim(X)[2]),cex=.75)
#
#
# ### select 6 most significant variables
# fitl$df
# index = which(fitl$df==6)[1];index
# # lambda
# lambda = fitl$lambda[index];lambda
# betal = fitl$beta
# vars = which(betal[,index] != 0)
# main.factors = unrowname(data.frame(Factor_Lasso =
# colnames(X)[vars],
# Orders_Lasso = vars))
# betal_6 = betal[vars,]
# # Penalty of betas
# X_l1 <- apply(betal, 2, function(x) sum(abs(x)))
# plot(range(X_l1), range(betal_6), type = "n",
# ylab="Beta Values",xlab="L1 Norm",main = "Lasso Paths")
# for(i in 1:6){
# lines(X_l1, betal_6[i,], col = i,lwd=1.7)
# }
# legend(x=-1, y = -2,legend=names(vars),col=1:6,lty=rep(1,6), cex = 0.7)
# ############# LASSO USING LARS
#
# fit <- lars(x = X, y = Y, type = "lar")
# beta <- scale(coef(fit),FALSE,1/fit$normx)
# arclength <- rowSums(abs(beta))
#
# path <- data.frame(melt(beta),arclength)
# names(path)[1:3] <- c("step","variable","standardized.coef")
#
# #how many variables to show
# numvars = 8
# vars = names(sort(abs(beta[nrow(beta),]))[(ncol(X)-(numvars-1)):ncol(X)])
#
#
# p <- ggplot(path[path$variable %in% vars,],
# aes(step,standardized.coef,colour=variable))+
# geom_line(aes(group=variable))+
# ggtitle("LASSO path calculated using Least Angle Regression")+
# xlim(0,77) +ylim(-2000, 2000)
# p
##############################################
####### compare ridge and lasso
##############################################
reg.l2 <- glmnet(x = X, y= Y, alpha=0)
reg.l1 <- glmnet(x = X,y = Y , alpha=1)
models.l2 <- data.frame(t(rbind(matrix(reg.l2$lambda, nrow=1),
as.matrix(reg.l2$beta))))
# models.l2 <- models.l2[,colnames(models.l2)%in%
# na.omit(df[order(-df$num),])[2:9,]$variable]
colnames(models.l2)[1] <- "lambda"
models.l2 <- melt(models.l2, c("lambda"))
coef_l2 <- ggplot(models.l2) + aes(x=log(lambda), y=value,
color=variable) +
geom_line()
#lasso
models.l1 <- data.frame(t(rbind(matrix(reg.l1$lambda, nrow=1),
as.matrix(reg.l1$beta))))
colnames(models.l1)[1] <- "lambda"
models.l1 <- melt(models.l1, c("lambda"))
coef_l1 <- ggplot(models.l1) + aes(x=log(lambda),
y=value, color=variable) +
geom_line()
cv<-cv.glmnet(X,Y,family="gaussian",alpha = .5, nfolds=50,standardize=FALSE)
coef(cv, s=cv$lambda.1se)
# ## Elastic net regression with weight = 0.5
# fitel = glmnet(x=X,y=Y,family="gaussian",alpha=0.9)
# plot(fitel,col=1:dim(X)[2],main = "ElasticNet (alpha=0.5) with 47 variables")
# legend(x=0, y = -3,legend=colnames(X),col=1:dim(X)[2],lty=rep(1,dim(X)[2]),cex=.75)
#
# fitel$df
# # choose the index giving the first 10
# index2 = which(fitel$df == 6)[1]
# beta2 = fitel$beta
# vars2 = which(beta2[,index2] != 0);vars2
# main.factors.el = data.frame(Factor_Elanet = colnames(X)[vars2], Orders_Elanet = vars2)
# main.factors.el
#
#
# betael = beta2[vars2,]
# ## sum of absolute of betas
# X_l1_2 = apply(betael, 2, function(x) sum(abs(x)))
# plot(range(X_l1_2), range(betael), type = "n",
# ylab = "Beta Values", xlab = "L1 Norm",
# main = "Elastic Paths 10 Variables")
# for (i in 1:6){
# lines(X_l1_2, betael[i,], col = i,lwd=1.7)
# }
# legend(x=0, y=0.45,legend=names(vars2),col=1:11,lty=rep(1,6), cex = .8)
#
#
# ### compare the two different
# mf.compare = cbind(main.factors, main.factors.el)
# mf.compare
#
#
# ### heat map
# my_palette <- colorRampPalette(c("green", "white", "red"))(n = 1000)
# heatmap.2(as.matrix(X), col=my_palette, scale="row", key=T, keysize=1.5,
# density.info="none", trace="none",cexCol=0.9, labRow=NA)
#
#
# ### lm testing
# sub.idx = main.factors.el[,2];sub.idx
# fit.ele = lm(Y~ X[,sub.idx])
# summary(fit.ele)
#############################
#### choose the 8 vars best subsets regression
#############################
names.slct = na.omit(df[order(-df$num),])[2:9,]$variable
# extract the index
index.slct = sapply(c(1:length(names.slct)), function(x)
which(colnames(final)== names.slct[x] ))
index.slct
index.slct = c(1:3, index.slct )
# construct a new data frame with the 8 selected vars plus date, VG and PTT indexes
final.slct = final[, index.slct]
#write.csv(final.slct, "final.8vars.csv", row.names=F)
#####################
#### +1 = risk
# adjust the direction
# from collinearity
#####################
df.8vars = read.csv("final.8vars.csv",header=T)
# decide the signs for correlation between VG~ X, and PTT ~~X
# because Vg means risk ; PTT means nonrisk
# +1 means risk so if corVG ~X > corPTT~x then it should be +1
comp = data.frame( Vars = colnames(df.8vars[,-c(1:3)]),
index = index.slct[-c(1:3)],
compare = ceiling( cor(df.8vars[,-c(1:3)],
df.8vars$vfinx)- cor(df.8vars[,-c(1:3)],
df.8vars$pttdx) ) )
View(comp)
# compsapply(c(1:dim(comp)[1]),
# function(x) if(comp$compare[x]>0){comp$compare[x] = 1}
# else if(comp$compare<0){comp$compare[x] = -1}else{comp$compare[x]=0} )
for (i in 1:8){
if (comp$compare[i] == 0){
comp$compare[i] = -1
}
}
##################################################
############# BINARIZE ##############
##################################################
final = df.8vars
# Then we binarize them
names = colnames(final[,-c(2:3)])
# first-difference every column from pttdx onward; column 1 of the result
# is immediately overwritten with the dates below
X.bi = as.data.frame(sapply(3:ncol(final), function(x) diff(final[, x])))
X.bi[,1] = final$date[-1]
## binarized X
X.bi.fact = as.data.frame(
sapply(c(2:dim(X.bi)[2]),
function(x) X.bi[,x] = sapply(c(1:dim(X.bi)[1]),
function(n) {
if(X.bi[,x][n]>0){X.bi[,x][n] = 1}
else if(X.bi[,x][n]<0){X.bi[,x][n]=-1}
else{X.bi[,x][n]=0}
}
)
)
)
View(X.bi.fact)
###########
### unify the direction of +1 and -1
###########
names[-1]
names.slct
X.bi.fact[,dim(X.bi.fact)[2]+1] = final[-1,1]
X.bi.fact.adj = X.bi.fact
# X.bi.fact.adj[] = sapply(c(1:dim(comp)[1])),
# function(x) X.bi.fact[,x] * comp$compare[x]
#
#
for(i in c(1:dim(comp)[1])){
X.bi.fact.adj[,i] = X.bi.fact.adj[,i]* comp$compare[i]
}
# final[,2] = VG; the other is PTT
# we need to append return rate VG and PTT
index.values = cbind(diff(df.8vars[,2])/df.8vars[-1,2], diff(df.8vars[,3])/df.8vars[-1,3])
X.bi.fact.adj = cbind(X.bi.fact.adj,index.values)
View(X.bi.fact.adj)
## try to add score
score = sapply(c(1:dim(X.bi.fact.adj)[1]), function(x) sum(X.bi.fact.adj[x, 1:8]))  # sum the 8 ternary indicator columns
X.bi.fact.adj[, dim(X.bi.fact.adj)[2]+1] = score
#write.csv(final.slct, "final.score.csv", row.names=F)
######################
#### convert daily data to monthly
######################
# df = X.bi.fact.adj
# #library(xts)
# time = strptime(df[,1], "%Y-%m-%d")
# df2 = xts(df[,-1], order.by=time)
# # pick the average data
# df3 = apply.monthly(df2,mean)
# View(df3)
#
# # main.factors.el[2]
# index = main.factors.el[,2]
#
# X.select = X.bi.fact[,index]
# score.select = sapply( c(1:dim(X.bi.fact)[1]), function(x) sum(X.select[x,]))
#
#
#
# X.bi.fact = cbind(final$date[-1], X.bi.fact)
# colnames(X.bi.fact) = c(names, "Total Score", "Selected Score")
# write.csv(X.bi.fact, file = "X.binerized.csv")
#
# X.bi.fact = sapply(c(2:dim(X.bi)[2]),
# function(x) X.bi[,x] if(X.bi[,x]>0){X.bi[,x] = 1} else if(X.bi[,x][n]<0){X.bi[,x]=-1}else{X.bi[,x]=0} )
#
# temp1 =c()
# for(i in c(2:dim(X.bi)[2])){
# temp1 = X.bi[,i]
# for(j in c(1:dim(X.bi)[1]) ){
# if(X.bi[,x]>0){X.bi[,x] = 1
# } if(X.bi[,x][n]<0){X.bi[,x]=-1}else{X.bi[,x]=0}
# }
# }
ret = tot
ret[,2:3] = apply(ret[,2:3], 2, returns)
# ggplot(tot) +
# geom_line(aes(date, vfinx)) +
# geom_line(aes(date, pttdx))
cum = cbind(date = ret[-1,1], cumprod(1+ret[-1,2:3])-1)
cum[,2:3] = apply(cum[,2:3], 2, "*", 10000)
g <- ggplot(cum) +
geom_rect(aes(xmin = ymd("2000-1-1"),
xmax = ymd("2003-1-1"),
ymin = -Inf,
ymax = Inf,
fill = "Tech Bubble Recession")) +
geom_line(aes(date, vfinx, color = 'black')) +
geom_line(aes(date, pttdx, color = 'blue')) +
geom_line(aes(date, .6*vfinx + .4*pttdx, color = "red")) +
scale_color_manual(name = "Portfolio Composition",
values = c("black" = "black",
"blue" = "blue",
"red" = "red"),
labels = c("100% VFINX",
"100 % PTTDX",
"60/40 VFINX/PTTDX")) +
scale_y_continuous(breaks= seq(10000, 80000, by = 20000),
labels = paste0("$", str_trim(format(
seq(10000, 80000, 20000),
big.mark= ",",
scientific=F)))) +
scale_fill_manual("", breaks = "Tech Bubble Recession",
values = "grey60") +
labs(x = "", y = "",
title = 'Value of a $10,000 initial investment')
g
# try to rebalance
recess <- cum[which(year(cum$date) <2004 & year(cum$date)>1999),]
recessStart = recess[1,]
# range(recess$date)
currVal <- sum(recessStart[,2:3])
cum2 = cbind(date = tot[tot$date%in%recess$date,][,1],
cumprod(1 +
tot[tot$date%in%recess$date,][,2:3])-1)
recessVal <- cum2[,2]*(.4*currVal) + cum2[,3]*(.6*currVal)
cumRecess <- recessVal + currVal
sellDate <- tail(recess,1)
after <- tot[tot$date > sellDate$date ,]
cum3 <- cbind(date = after$date,
cumprod(1 + after[,2:3])-1)
afterVal <- cum3[,2]*(.6*tail(cumRecess,1)) +
cum3[,3]*(.4*tail(cumRecess,1))
c(.6 * cum[cum$date < head(recess,1)$date,][,2] +
.4* cum[cum$date < head(recess,1)$date,][,3],
cumRecess, afterVal)
# setwd("C:/Users/E218373/Desktop/682")
#use the dataset under "code.shaya" file on dropbox
df = read.csv("final.8vars.csv")
#daily data to monthly data
df$month=month(df$date)
df$year=year(df$date)
n=length(df$date)
diff=month(df$date)[1:(n-1)]-month(df$date)[2:n]
df=df[1:(n-1),]
df$diff=diff
df1=df[which(df$diff!=0),]
#df1 is the data that includes only the last business date of very month.
#correlation
m=length(df1$date)
cor(df1$vfinx,df1[,4])-cor(df1$pttdx,df1[,4]) #prefer stock
cor(df1$vfinx,df1[,5])-cor(df1$pttdx,df1[,5]) #prefer bond
cor(df1$vfinx,df1[,6])-cor(df1$pttdx,df1[,6]) #prefer bond
cor(df1$vfinx,df1[,7])-cor(df1$pttdx,df1[,7]) #prefer bond
cor(df1$vfinx,df1[,8])-cor(df1$pttdx,df1[,8]) #prefer stock
cor(df1$vfinx,df1[,9])-cor(df1$pttdx,df1[,9]) #prefer bond
cor(df1$vfinx,df1[,10])-cor(df1$pttdx,df1[,10]) #prefer stock
cor(df1$vfinx,df1[,11])-cor(df1$pttdx,df1[,11]) #prefer bond
cordat = as.data.frame(t(cor(df1$vfinx, df1[,4:11]) -
cor(df1$pttdx, df1[,4:11])))
ggplot(cordat) +
geom_bar(aes(x = rownames(cordat),
y = cordat[,1],
fill = rownames(cordat)),
stat = 'identity') +
theme(legend.title = element_blank(),
axis.text.x = element_blank(),
axis.ticks.x = element_blank()) +
xlab('Economic Indicators') +
ylab('Difference in Correlation')
qplot(x = df1$date,y = df1$score) +
stat_smooth(aes(group = 1)) +
theme(legend.title = element_blank(),
axis.text.x = element_blank(),
axis.ticks.x = element_blank()) +
xlab('Date') +
ylab('Score')
v1=df1[2:m,4]-df1[1:(m-1),4]
v2=df1[2:m,5]-df1[1:(m-1),5]
v3=df1[2:m,6]-df1[1:(m-1),6]
v4=df1[2:m,7]-df1[1:(m-1),7]
v5=df1[2:m,8]-df1[1:(m-1),8]
v6=df1[2:m,9]-df1[1:(m-1),9]
v7=df1[2:m,10]-df1[1:(m-1),10]
v8=df1[2:m,11]-df1[1:(m-1),11]
df1=df1[2:m,]
df1=cbind(df1,v1,v2,v3,v4,v5,v6,v7,v8)
d1=ifelse(df1[,15:22] > 0,1,0)
d2=ifelse(df1[,15:22] < 0,-1,0)
df1[,15:22]=d1+d2
df1[,c(16,17,18,20,22)]=df1[,c(16,17,18,20,22)]*-1 # flip bond-preferring columns so +1 always favors stocks
df1$score=rowSums (df1[,15:22], na.rm = FALSE, dims = 1) #Score
#monthly to quarterly data
score=df1$score[which(df1$month==2|df1$month==5|df1$month==8|df1$month==11)]
stock=df1$vfinx[which(df1$month==3|df1$month==6|df1$month==9|df1$month==12)]
rs=stock[2:length(stock)]/stock[1:(length(stock)-1)]
bond=df1$pttdx[which(df1$month==3|df1$month==6|df1$month==9|df1$month==12)]
rb=bond[2:length(bond)]/bond[1:(length(bond)-1)]
final=df1$date[which(df1$month==3|df1$month==6|df1$month==9|df1$month==12)]
final=final[-c(1,2)]
clean=data.frame(final,score[1:(length(score)-2)],
rs[2:length(rs)],rb[2:length(rb)])
#first observation: use the 1987-08 score to change the
#portfolio at 1987-09-30 and get the first return at 1987-12-31
names(clean)=c("date","score","stockreturn","bondreturn")
#can change c to scale the effect of the score, higher
# c means we change more on the portfolio.
c=0.005
clean$benchmark=clean$stockreturn*0.6+clean$bondreturn*0.4
nn=length(clean$benchmark)
dynamic=matrix(0,nn,1)
dynamic[1]=0.6+c*clean$score[1]
for (i in 2:nn)
{
dynamic[i]=dynamic[i-1]+c*clean$score[i]
}
dynamic
clean$portfolio=clean$stockreturn*dynamic+clean$bondreturn*(1-dynamic)
accumulate_benchmark=matrix(0,nn,1)
accumulate_benchmark[1]=clean$benchmark[1]
accumulate_dynamic=matrix(0,nn,1)
accumulate_dynamic[1]=clean$portfolio[1]
for (i in 2:nn)
{
accumulate_benchmark[i]=accumulate_benchmark[i-1]*clean$benchmark[i]
accumulate_dynamic[i]=accumulate_dynamic[i-1]*clean$portfolio[i]
}
clean$accumulate_benchmark=accumulate_benchmark
clean$accumulate_dynamic=accumulate_dynamic
ggplot(clean) +
geom_path(aes(x = ymd(date),
y = 10000* accumulate_benchmark,
color = 'blue')) +
geom_path(aes(x = ymd(date),
y = 10000*accumulate_dynamic,
color = 'red'))+scale_color_manual(name = "Portfolio Composition",
values = c("blue" = "blue",
"red" = "red"),
labels = c("60/40 VFINX/PTTDX FIXED",
"Dynamically Allocated")) +
labs(x = "", y = "",
title = 'Value of a $10,000 initial investment')
plot(clean$accumulate_benchmark)
lines(accumulate_dynamic, type="o", pch=22, lty=2, col="red")