
BACHELOR THESIS IN ECONOMICS

Application of Hierarchical Structure in Stock Classification and Portfolio Construction

by

Sheila Farrahi and Amirhossein Heydarizadeh

Kandidatarbete i nationalekonomi

Supervisor: Johan Lindén

Division of Business, Society and Engineering
MÄLARDALEN UNIVERSITY

SE-721 23 VÄSTERÅS, SWEDEN

Bachelor Thesis in Economics (NAA303)

Date: 2013-05-29

Project name: Application of Hierarchical Structure in Stock Classification and Portfolio Construction

Author(s): Sheila Farrahi and Amirhossein Heydarizadeh

Supervisor: Johan Lindén

Comprising: 15 ECTS credits

Abstract

The aim of this report is to determine whether a portfolio can be constructed using cluster analysis on the daily returns of stocks, and how efficient such a portfolio is. Using the R programming language, we constructed portfolios by applying a cluster analysis method to the daily returns of stocks in three indices from the USA and Sweden.

We calculated the correlation matrix between the stocks in each of the three indices and converted these matrices to distance matrices. We then applied hierarchical cluster analysis to the distance matrices to classify the stocks into groups. Finally, we constructed our portfolios from the information extracted from this analysis, which describes how each stock behaves relative to the other stocks in its own group and in other groups. We also compared the constructed portfolios with the simple equal-weight portfolio and the optimal portfolio to see how efficient they are.

Contents

1 Introduction
  1.1 Classification and Clustering
  1.2 Detecting Clusters Graphically
  1.3 Methods of Clustering

2 Analysis
  2.1 Part I
    2.1.1 Data Description
    2.1.2 OMXS30 Index
    2.1.3 DJIA Index
    2.1.4 S&P500 Index
  2.2 Part II
    2.2.1 Markowitz Portfolio Theory
    2.2.2 Comparison with Equal-Weight Portfolio
    2.2.3 Comparison with Optimal Portfolio

3 Summary and Some Future Studies

A The source code

List of Figures

1.1 An example of exploratory data analysis
1.2 A simple example of dendrogram tree
1.3 Distance between two clusters
1.4 Dendrogram of five objects clustered together

2.1 OMXS30 Dendrogram
2.2 DJIA dendrogram
2.3 S&P500 Dendrogram
2.4 Weights of stocks of OMXS30 after introducing constraints
2.5 Weights of stocks of DJIA after introducing constraints
2.6 Weights of stocks of OMXS30 in Optimal Portfolio
2.7 Weights of stocks of DJIA in Optimal Portfolio

List of Tables

2.1 List of Stocks in OMXS30 and Their Sectors
2.2 List of Stocks in DJIA and Their Sectors

Acknowledgment

First of all, we would like to thank our supervisor Johan Lindén, Senior Lecturer, for his continuous support and patience in every part of writing this report.

We would also like to thank all our teachers at Mälardalen University for giving us the knowledge we needed throughout our studies.

Many thanks also to our families and friends for their support and encouragement during our studies in Sweden. We could not have finished our studies abroad without their support.

Chapter 1

Introduction

1.1 Classification and Clustering

"An intelligent being cannot treat every object it sees as a unique entity unlike anything else in the universe. It has to put objects in categories so that it may apply its hard-won knowledge about similar objects, encountered in the past, to the object at hand."¹

Classification, or the grouping of similar objects, is fundamental to most branches of science. Classification organizes a large amount of data so that it can be understood easily, and information is obtained more efficiently by summarizing the data into a smaller number of groups.

Cluster analysis, or clustering, is the task of placing objects in groups so that the objects in the same group or cluster are more similar to each other than to objects in other groups. Cluster analysis investigates whether the data can be summarized in a small number of groups of objects that share similarities which distinguish them from the objects in other groups.

In this report we apply cluster analysis to group financial data and see how well the method works in this area.

1.2 Detecting Clusters Graphically

A graphical view of the data is an important part of any data analysis. It gives the analyst an overview of the data and their structure. Since cluster analysis is a form of exploratory data analysis, graphical views of the data are one of its main tools. For example, a scatter plot displays the data as a set of points and can reveal relationships between variables. Consider the scatter plot shown in Figure 1.1, in which two clusters of points are clearly visible.

¹ Steven Pinker, How the Mind Works, 1997.

Figure 1.1: An example of exploratory data analysis

1.3 Methods of Clustering

Hierarchical Cluster Analysis (HCA)

There are two main methods of clustering:

• Hierarchical cluster analysis, which is divided into two sub groups:

– Agglomerative HCA

– Divisive HCA

• Non-hierarchical cluster analysis

In this report we used only agglomerative hierarchical cluster analysis, because of the following characteristics:

1. Sectioning the tree at a particular distance produces a partition into a number of disjoint groups.

2. If two groups are chosen from different partitions, they are either disjoint or one group contains the other.

3. A numerical value is associated with each point of the tree where branches join; this value measures the distance or dissimilarity between the two merged clusters.

4. Different distance measures give rise to different hierarchical structures.

In HCA, observation vectors are grouped on the basis of their mutual distances, and the result is usually visualized as a hierarchical tree called a dendrogram. The dendrogram shows the classification of clusters produced by hierarchical clustering. Figure 1.2 shows an example.

Figure 1.2: A simple example of dendrogram tree

In Figure 1.2 the x-axis represents the objects and the y-axis the threshold distance. There are five objects. Objects 1 and 3 are grouped together at the lowest distance, and objects 4 and 5 are grouped at a higher distance. Objects 1, 3, 4 and 5 are then grouped together, and finally object 2 joins this cluster at the highest distance.

Before starting our analysis, we briefly present the steps of agglomerative HCA and a simple example to make the method easier to follow.

Steps of Agglomerative HCA

The algorithm of agglomerative Hierarchical Cluster Analysis:

1. Begin with n clusters, each containing a single object.

2. Construct a symmetric distance matrix whose main diagonal is zero, since the distance from each object to itself is zero. Here i and j index the objects:

$$
D = (d_{ij}) = \begin{pmatrix} 0 & \cdots & d_{1n} \\ \vdots & \ddots & \vdots \\ d_{n1} & \cdots & 0 \end{pmatrix}
$$

3. Merge the two closest clusters into a new cluster.

4. Update the distance matrix.

5. Repeat steps 3 and 4 until a single cluster remains (n − 1 merges in total), recording which clusters are merged and the distance at which they merge. This information is used to build the dendrogram.
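The steps above can be sketched in a few lines of Python. This is only an illustration and not the R code used in the thesis; the function name `complete_linkage` is hypothetical, and complete linkage (the criterion adopted in this report) is assumed. Instead of rewriting the matrix in step 4, the sketch recomputes each cluster-to-cluster distance from the original matrix, which gives the same values under complete linkage. The matrix is the five-object matrix of the worked example in this section.

```python
from itertools import combinations


def complete_linkage(dist, labels):
    """Agglomerative clustering with complete linkage.

    dist   -- symmetric matrix (list of lists) with a zero diagonal
    labels -- one label per row/column of dist
    Returns the merges in order as (members_a, members_b, height) tuples.
    """
    # Step 1: every object starts in its own cluster (stored as row indices).
    clusters = [frozenset([i]) for i in range(len(labels))]
    merges = []

    def linkage(a, b):
        # Complete linkage: largest pairwise distance between the members.
        return max(dist[i][j] for i in a for j in b)

    # Steps 3 to 5: merge the closest pair until one cluster is left.
    while len(clusters) > 1:
        a, b = min(combinations(clusters, 2), key=lambda pair: linkage(*pair))
        height = linkage(a, b)
        clusters = [c for c in clusters if c not in (a, b)] + [a | b]
        merges.append((sorted(labels[i] for i in a),
                       sorted(labels[i] for i in b),
                       height))
    return merges


# Distance matrix of the five-object worked example.
D = [
    [0, 9, 3, 6, 11],
    [9, 0, 7, 5, 10],
    [3, 7, 0, 9, 2],
    [6, 5, 9, 0, 8],
    [11, 10, 2, 8, 0],
]
merges = complete_linkage(D, [1, 2, 3, 4, 5])
for members_a, members_b, height in merges:
    print(members_a, "+", members_b, "at distance", height)
```

The recorded heights are exactly the levels at which the branches of the dendrogram join.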

Measuring the Distance between Two Clusters and a Simple Example

There are different ways to measure the distance between two groups, such as single linkage, complete linkage, average linkage, centroid linkage and median linkage. In this report we used the complete linkage method.

In the complete linkage method, the distance between two groups is the distance between their most distant pair of members. For example, the distance between clusters I and II in Figure 1.3 is the maximum of d13, d14, d15, d23, d24 and d25. In the figure this maximum is d14, so the distance between clusters I and II is d14.
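To make the difference between the linkage criteria concrete, the following Python sketch computes the distance between two clusters under three of them. The function name `linkage_distance` and the six pairwise distances are hypothetical; the values were chosen only so that d14 is the largest, as in Figure 1.3.

```python
def linkage_distance(pair_dist, cluster_a, cluster_b, method="complete"):
    """Distance between two clusters from their pairwise object distances."""
    values = [pair_dist[frozenset((i, j))] for i in cluster_a for j in cluster_b]
    if method == "complete":  # most distant pair
        return max(values)
    if method == "single":    # closest pair
        return min(values)
    if method == "average":   # mean over all pairs
        return sum(values) / len(values)
    raise ValueError(f"unknown method: {method}")


# Hypothetical distances between cluster I = {1, 2} and cluster II = {3, 4, 5}.
pair_dist = {
    frozenset((1, 3)): 4.0, frozenset((1, 4)): 9.0, frozenset((1, 5)): 7.0,
    frozenset((2, 3)): 3.0, frozenset((2, 4)): 6.0, frozenset((2, 5)): 5.0,
}
complete = linkage_distance(pair_dist, [1, 2], [3, 4, 5], "complete")
single = linkage_distance(pair_dist, [1, 2], [3, 4, 5], "single")
average = linkage_distance(pair_dist, [1, 2], [3, 4, 5], "average")
print(complete, single, average)
```

With these made-up values the complete linkage distance is d14, while single linkage picks the closest pair, so the choice of criterion can change which clusters are merged first.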

Figure 1.3: Distance between two clusters

Example

Consider the distance matrix for 5 objects as below:

$$
D = \begin{array}{c|ccccc}
  & 1 & 2 & 3 & 4 & 5 \\ \hline
1 & 0 & 9 & 3 & 6 & 11 \\
2 & 9 & 0 & 7 & 5 & 10 \\
3 & 3 & 7 & 0 & 9 & 2 \\
4 & 6 & 5 & 9 & 0 & 8 \\
5 & 11 & 10 & 2 & 8 & 0
\end{array}
$$

We first search the distance matrix for the nearest pair of objects, that is, the smallest off-diagonal entry. Objects 3 and 5 have the shortest distance, 2 units, so we merge them into a new cluster called (3,5):

(3,5) → 2

The next step is to update the distance matrix by deleting the rows and columns of objects 3 and 5 and adding a row and a column for the new cluster (3,5), holding its distances to the other objects.

To update D we need the distance between the new cluster and each remaining object. Under complete linkage this is the maximum of the distances between the remaining object and the members of the new cluster:

$$d_{(3,5),1} = \max(d_{1,3}, d_{1,5}) = \max(3, 11) = 11$$

$$d_{(3,5),2} = \max(d_{2,3}, d_{2,5}) = \max(7, 10) = 10$$

$$d_{(3,5),4} = \max(d_{4,3}, d_{4,5}) = \max(9, 8) = 9$$

The updated matrix:

$$
D = \begin{array}{c|cccc}
      & (3,5) & 1 & 2 & 4 \\ \hline
(3,5) & 0 & 11 & 10 & 9 \\
1     & 11 & 0 & 9 & 6 \\
2     & 10 & 9 & 0 & 5 \\
4     & 9 & 6 & 5 & 0
\end{array}
$$

We repeat the same procedure until all objects in the matrix are grouped together. Searching the matrix above, the next lowest distance is between objects 2 and 4, at level 5:

(2,4)→ 5

We now calculate the new distances to update the matrix:

$$d_{(2,4),1} = \max(d_{2,1}, d_{4,1}) = \max(9, 6) = 9$$

$$d_{(3,5),(2,4)} = \max(d_{(3,5),2}, d_{(3,5),4}) = \max(10, 9) = 10$$

Update the matrix:

$$
D = \begin{array}{c|ccc}
      & (3,5) & (2,4) & 1 \\ \hline
(3,5) & 0 & 10 & 11 \\
(2,4) & 10 & 0 & 9 \\
1     & 11 & 9 & 0
\end{array}
$$

The next step is to merge the group (2,4) with object 1 at distance level 9:

((2,4),1)→ 9

The distance between the last two clusters is:

$$d_{((2,4),1),(3,5)} = \max(d_{1,(3,5)}, d_{(2,4),(3,5)}) = \max(11, 10) = 11$$

The distance matrix of the last two clusters:

$$
D = \begin{array}{c|cc}
          & (3,5) & (1,(2,4)) \\ \hline
(3,5)     & 0 & 11 \\
(1,(2,4)) & 11 & 0
\end{array}
$$

The last two groups merge at distance level 11:

((1,(2,4)),(3,5)) → 11

Using the information extracted from the distance matrices, we can now construct the dendrogram. Objects 3 and 5 form a group at the lowest distance, level 2 ((3,5) → 2). The next two objects to merge are 2 and 4, at distance level 5 ((2,4) → 5). Object 1 then joins the group (2,4) at level 9 (((2,4),1) → 9). Finally, the two groups (3,5) and (1,(2,4)) merge at level 11 (((1,(2,4)),(3,5)) → 11).

Figure 1.4 shows the resulting dendrogram of all five objects.

Figure 1.4: Dendrogram of five objects clustered together
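Sectioning the dendrogram at a chosen distance turns the merge history into a partition. The Python sketch below illustrates this for the five-object example; the function name `cut_tree` is hypothetical, and the merge list is simply the sequence of merges recorded in the example.

```python
def cut_tree(objects, merges, threshold):
    """Clusters obtained by cutting the dendrogram at `threshold`.

    merges -- (members_a, members_b, height) tuples in merge order
    Only merges whose height does not exceed the threshold are applied.
    """
    clusters = {obj: frozenset([obj]) for obj in objects}
    for members_a, members_b, height in merges:
        if height > threshold:
            break  # merge heights never decrease under complete linkage
        merged = clusters[members_a[0]] | clusters[members_b[0]]
        for obj in merged:
            clusters[obj] = merged
    return sorted(sorted(c) for c in set(clusters.values()))


# Merge history of the worked example: (3,5) at 2, (2,4) at 5,
# ((2,4),1) at 9 and the final merge at 11.
merges = [([3], [5], 2), ([2], [4], 5), ([1], [2, 4], 9), ([3, 5], [1, 2, 4], 11)]
objects = [1, 2, 3, 4, 5]

three_groups = cut_tree(objects, merges, 6)
two_groups = cut_tree(objects, merges, 10)
print(three_groups)
print(two_groups)
```

Cutting between levels 5 and 9 leaves three groups, and cutting between levels 9 and 11 leaves two, so the threshold controls how coarse the classification is.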

In the next part of the report we apply agglomerative HCA to financial securities traded in Sweden and the U.S., and then try to extract economic information from the resulting classification of stocks to support portfolio construction.

Chapter 2

Analysis

We divided the analysis into two main parts:

1. Detecting the hierarchical structure of stocks: The aim of this part is to find any possibleclassifications of stocks by using only the information of the daily adjusted stock prices.

2. Portfolio construction using the data obtained from part 1

2.1 Part I

In the first part of the analysis we detected the hierarchical structure of stocks traded in the financial markets of the USA and Sweden. To obtain the hierarchical structure of the stocks in a given index, we used the correlation coefficients of their daily returns.

2.1.1 Data Description

Our data cover the stocks in the Dow Jones Industrial Average (DJIA), the Standard and Poor's 500 (S&P500) and the OMX Stockholm 30 (OMXS30) indices, from 2009/04/01 to 2013/04/01.

First we downloaded the adjusted daily prices of the stocks and calculated the daily return of each stock in the three indices over this period, using the R programming language. Then we calculated the correlation coefficient between each pair of stocks within each index, because correlation is a measure of similarity or dissimilarity between objects. We stored these correlation coefficients in three matrices, one for each index.
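As a rough illustration of this step, the Python sketch below turns price series into daily returns and then into a correlation matrix. It is not the R code of the thesis. The text does not say whether simple or logarithmic returns were used, so simple returns are an assumption here, and the tickers and prices are made up.

```python
import math


def daily_returns(prices):
    """Simple daily returns r_t = P_t / P_(t-1) - 1 from a price series."""
    return [today / yesterday - 1.0 for yesterday, today in zip(prices, prices[1:])]


def correlation(x, y):
    """Pearson correlation coefficient of two equally long return series."""
    mean_x, mean_y = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


# Hypothetical adjusted closing prices for three made-up tickers.
prices = {
    "AAA": [100.0, 102.0, 101.0, 104.0, 103.0],
    "BBB": [50.0, 51.0, 50.5, 52.0, 51.5],
    "CCC": [20.0, 19.8, 20.1, 19.7, 19.9],
}
returns = {name: daily_returns(series) for name, series in prices.items()}
names = sorted(returns)
# One row and one column per stock, in the order given by `names`.
corr = [[correlation(returns[a], returns[b]) for b in names] for a in names]
for name, row in zip(names, corr):
    print(name, [round(value, 3) for value in row])
```

In this toy data AAA and BBB move together while CCC moves against them, which is the kind of pattern the correlation matrix is meant to capture.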

The correlation coefficient takes values between -1 and +1. A value of -1 means that the two stocks move in exactly opposite directions in response to a financial event or shock, a value of +1 means that they move in exactly the same direction, and a value of 0 means that the two stocks are uncorrelated.

As mentioned in the introduction to HCA, we need the distance between each pair of stocks in order to build the dendrogram and perform the clustering. We cannot use the correlation matrix directly, because the correlation coefficient does not satisfy the three requirements of a distance:

1. d(i, j) = 0 if and only if i = j

2. d(i, j) = d(j, i)

3. d(i, j) ≤ d(i, k) + d(k, j)

Therefore we converted the correlation matrix to a distance matrix using the following formula,¹ where ρij is the correlation coefficient between stocks i and j:

$$
d(i, j) = \sqrt{2(1 - \rho_{ij})}
$$
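A short Python sketch of this conversion follows. The function names and the 3×3 correlation matrix are hypothetical and serve only to show how the formula behaves at its extremes: a correlation of +1 maps to distance 0, a correlation of 0 to √2, and a correlation of -1 to the maximum distance 2.

```python
import math


def correlation_to_distance(rho):
    """Map a correlation coefficient to the distance d = sqrt(2 * (1 - rho))."""
    return math.sqrt(2.0 * (1.0 - rho))


def distance_matrix(corr):
    """Apply the conversion to every entry of a correlation matrix."""
    return [[correlation_to_distance(rho) for rho in row] for row in corr]


# Hypothetical correlation matrix for three stocks.
corr = [
    [1.0, 0.8, -0.2],
    [0.8, 1.0, 0.1],
    [-0.2, 0.1, 1.0],
]
dist = distance_matrix(corr)
for row in dist:
    print([round(value, 3) for value in row])
```

The diagonal of the result is zero and the matrix stays symmetric, so the first two requirements listed above hold by construction.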

Finally, we plotted the dendrogram of each index to see the possible clusters in it.

¹ G. Bonanno, F. Lillo and R. N. Mantegna, High-frequency cross-correlation in a set of stocks [5].

2.1.2 OMXS30 Index

OMXS30 is the Stockholm Stock Exchange index consisting of the 30 most actively traded stocks on the exchange. We chose this index to make sure that our analysis works on both US and non-US indices. The OMXS30 dendrogram gave interesting results: as Figure 2.1 shows, HCA groups most stocks in the same industry together.

Figure 2.1: OMXS30 Dendrogram

Table 2.1 lists all 30 stocks of OMXS30. Using this table, one can see that almost all stocks in the same sector are grouped together because of their similarities. For instance, Atlas Copco (class A) and Atlas Copco (class B), which are in the machinery sub-industry, are grouped together at the lowest height or distance. This means that these two stocks have a correlation coefficient close to +1 and move almost perfectly together. This cluster is then grouped with other stocks in the industrial sector. The financial stocks are clearly grouped together too: Swedbank (SWED-A), Nordea (NDA-SEK), SEB (SEB-A) and Svenska Handelsbanken (SHB-A).

Stock Symbol   Company                  Sector
ABB            ABB                      Industrial Machinery
ALFA           Alfa Laval               Industrial Machinery
ASSA B         Assa Abloy               Building Products
AZN            AstraZeneca              Pharmaceuticals
ATCO A         Atlas Copco (class A)    Industrial Machinery
ATCO B         Atlas Copco (class B)    Industrial Machinery
BOL            Boliden                  Diversified Metals & Mining
ELUX B         Electrolux               Household Appliances
ERIC B         Ericsson                 Communication Equipment
GETI B         Getinge                  Health Care
HM B           Hennes & Mauritz         Apparel Retail
INVE B         Investor                 Multi-Sector Holdings
LUPE           Lundin Petroleum         Oil & Gas Exploration & Production
MTG B          Modern Times Group       Broadcasting
NOKI SEK       Nokia                    Communication Equipment
NDA SEK        Nordea                   Diversified Banks
SAND           Sandvik                  Industrial Machinery
SCA B          SCA                      Paper Products
SCV B          Scania                   Construction & Farm Machinery; Heavy Trucks
SEB A          SEB                      Diversified Banks
SECU B         Securitas                Diversified Commercial & Professional Services
SKA B          Skanska                  Construction & Engineering
SKF B          SKF                      Industrial Machinery
SSAB           SSAB                     Steel
SHB A          Svenska Handelsbanken    Diversified Banks
SWED           Swedbank                 Diversified Banks
SWMA           Swedish Match            Tobacco
TEL2           Tele2                    Telecom
TLSN           TeliaSonera              Telecom
VOLV B         Volvo Group              Construction & Farm Machinery; Heavy Trucks

Table 2.1: List of Stocks in OMXS30 and Their Sectors

Stocks that are grouped together at lower heights have correlation coefficients closer to +1, and stocks that are grouped together at greater heights have correlation coefficients closer to -1, so the dendrogram shows how stocks are related to each other within their own sectors and to stocks in other sectors. When constructing a portfolio, it is preferable to choose stocks that join at a higher level, since they are negatively correlated or uncorrelated. The main purpose of portfolio construction is diversification, so holding highly correlated stocks is not optimal.
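The effect of correlation on diversification can be illustrated with the standard two-asset variance formula, σp² = w1²σ1² + w2²σ2² + 2·w1·w2·ρ·σ1·σ2. The Python sketch below uses hypothetical inputs (two stocks with a daily volatility of 2%, held in equal proportions) and is not taken from the thesis data.

```python
import math


def two_asset_risk(w1, sigma1, sigma2, rho):
    """Standard deviation of a portfolio with weight w1 in asset 1 and 1 - w1 in asset 2."""
    w2 = 1.0 - w1
    variance = ((w1 * sigma1) ** 2 + (w2 * sigma2) ** 2
                + 2.0 * w1 * w2 * rho * sigma1 * sigma2)
    return math.sqrt(variance)


# Hypothetical inputs: 50/50 split, both stocks with 2% daily volatility.
correlations = [1.0, 0.5, 0.0, -0.5]
risks = [two_asset_risk(0.5, 0.02, 0.02, rho) for rho in correlations]
for rho, risk in zip(correlations, risks):
    print(f"rho = {rho:+.1f}  portfolio risk = {risk:.4f}")
```

With perfectly correlated stocks the portfolio is as risky as each stock on its own, and the risk falls steadily as the correlation decreases, which is why stocks that join high up in the dendrogram are attractive for diversification.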

The classification of stocks obtained from their distance matrix carries interesting economic information: stocks in the same industry are mostly grouped together. HCA might even be used to check whether stocks belong to the same industry. As mentioned earlier, we used only the daily adjusted closing price of each stock and nothing more, which suggests that daily stock prices contain very useful information.

2.1.3 DJIA Index

The Dow Jones Industrial Average is one of several indices created by Wall Street Journal editor and Dow Jones and Company co-founder Charles Dow. The DJIA is an index that shows how 30 large publicly owned companies based in the United States have traded in the stock market.

Figure 2.2: DJIA dendrogram

The next dendrogram is built from the stocks in the DJIA index, and here the classification is even more accurate. Table 2.2 lists all DJIA stocks and the sectors they belong to. Comparing this table with Figure 2.2 shows that the stocks grouped together belong to the same industry. For example, four technology companies (Microsoft, Intel, IBM and Cisco) are grouped together, and the oil and gas stocks CVX and XOM are paired as well.

Stock Symbol   Company                           Sector
MCD            McDonald's                        Consumer Goods
KO             Coca-Cola                         Consumer Goods
PG             Procter & Gamble                  Consumer Goods
PFE            Pfizer Inc                        Health Care
MRK            Merck and Co Inc                  Health Care
JNJ            Johnson & Johnson                 Health Care
UNH            UnitedHealth Group                Health Care
VZ             Verizon Communications Inc        Telecom
T              AT&T                              Telecom
INTC           Intel Corp                        Technology
IBM            International Business Machines   Technology
MSFT           Microsoft                         Technology
CSCO           Cisco Systems                     Technology
HPQ            Hewlett-Packard                   Technology
UTX            United Technologies Corp          Industrial Goods
CAT            Caterpillar                       Industrial Goods
AA             Alcoa                             Basic Materials
DD             DuPont                            Basic Materials
MMM            3M                                Conglomerates
GE             General Electric                  Conglomerates
DIS            Walt Disney                       Entertainment
XOM            ExxonMobil                        Oil & Gas
CVX            Chevron Corp                      Oil & Gas
BA             Boeing                            Aerospace
HD             The Home Depot                    Services
WMT            Wal-Mart                          Services
JPM            JPMorgan Chase                    Financial
BAC            Bank of America                   Financial
AXP            American Express                  Financial
TRV            Travelers                         Financial

Table 2.2: List of Stocks in DJIA and Their Sectors

2.1.4 S&P500 Index

The Standard & Poor's 500 is a stock market index of 500 large companies publicly traded on the US stock market.

The same investigation was carried out for the stocks of the S&P500 index. The difference is that instead of only 30 stocks, as in the DJIA and OMXS30 indices, we have 500 stocks, which allows us to find a much richer hierarchical structure. Because of the size of the data and the large and complicated dendrogram, we cannot show the detailed graph; a rough sketch is given in Figure 2.3. Running the code in Appendix A produces a clearer view.

In the S&P500 dendrogram we identified 10 large sectors (health care, energy, financials, materials, industrials, consumer discretionary, consumer staples, information technology, telecommunication and utilities) together with their sub-sectors. In simple words, stocks that have a lot in common from an economic point of view are grouped together.


Figure 2.3: S&P500 Dendrogram

Up to this point, we have presented a classification method that links securities in financial markets together, and the resulting classifications all have clear economic explanations.

2.2 Part II

In the second part of the analysis, we study whether the dendrogram of any of the three indices gives useful information to investors. We begin with a short introduction to portfolio theory.

2.2.1 Markowitz Portfolio Theory

It is not optimal for an investor to hold a single asset; holding a group of assets is more efficient. A portfolio is a group of financial securities chosen according to the investor's objectives. The main objective of constructing a financial portfolio is to reduce risk by choosing securities based on their individual risk and return and on their correlation with each other.

The best-known portfolio theory is that of Markowitz. Its main idea is to define risk as the standard deviation of returns and, on that basis, to define an optimal portfolio as one that maximizes return for a specified level of risk or minimizes risk for a given level of return.

There are several optimization problems related to Markowitz theory; here we focus on the two most important ones.

The Minimum Risk Portfolio

$$
\min_{w} \; w^{T} \Sigma w \quad \text{s.t.} \quad w^{T} \mu = \bar{r}, \quad w^{T} \mathbf{1} = 1
$$

This problem minimizes the risk of the portfolio, where $\Sigma$ is the covariance matrix of the stock returns and $w$ is the vector of weights assigned to the stocks. The condition $w^{T} \mathbf{1} = 1$ states that all of the investor's capital is invested in the portfolio. The target return $\bar{r}$ is imposed through $w^{T} \mu = \bar{r}$, where $\mu$ is the vector of expected stock returns.

Maximum Return Portfolios

In contrast to the minimum risk portfolio, where we minimize risk for a target return, here we maximize return for a given level of risk:

$$
\max_{w} \; w^{T} \mu \quad \text{s.t.} \quad w^{T} \Sigma w = \sigma^{2}, \quad w^{T} \mathbf{1} = 1
$$

With these two optimization problems we can define a set of efficient portfolios that offer the maximum return for each level of risk, or the lowest risk for each level of return. Drawn in risk-return space, this set of optimal portfolios is called the efficient frontier.
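The quantities that appear in both problems, the expected return $w^{T}\mu$ and the variance $w^{T}\Sigma w$, are easy to evaluate for any candidate weight vector. The Python sketch below does this for two fully invested portfolios with the same expected return. The values of `mu` and `cov` are hypothetical, and the sketch only evaluates the objective and constraints; it does not solve the optimization, which the thesis carried out in R.

```python
def portfolio_return(weights, mu):
    """Expected portfolio return w^T mu."""
    return sum(w * m for w, m in zip(weights, mu))


def portfolio_variance(weights, cov):
    """Portfolio variance w^T Sigma w."""
    n = len(weights)
    return sum(weights[i] * cov[i][j] * weights[j]
               for i in range(n) for j in range(n))


# Hypothetical daily expected returns and covariance matrix for three stocks.
mu = [0.002, 0.004, 0.006]
cov = [
    [0.0004, 0.0001, 0.0000],
    [0.0001, 0.0025, 0.0002],
    [0.0000, 0.0002, 0.0016],
]

equal = [1 / 3, 1 / 3, 1 / 3]
# Same budget and same expected return, but less weight on the volatile stock.
tilted = [0.4, 0.2, 0.4]

for name, w in (("equal", equal), ("tilted", tilted)):
    print(name,
          "return =", round(portfolio_return(w, mu), 6),
          "variance =", round(portfolio_variance(w, cov), 6))
```

Both portfolios satisfy the two constraints of the minimum risk problem for the same target return, yet the second has a lower variance. The optimizer searches over all such feasible weight vectors for the one with the smallest variance.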

2.2.2 Comparison with Equal-Weight Portfolio

First we construct an equal-weight portfolio in R from the stocks of the OMX Stockholm 30 index. Equal weight means that the same amount of money is allocated to each stock. Since we have 30 stocks, the weight of each stock is 1/30 ≈ 0.0333. The return, risk and value at risk (VaR) of this portfolio are:

r̄p = 0.0043    σp = 0.0294    VaR = 0.041

We then construct another portfolio from the stocks of OMXS30, using only the information extracted from the OMXS30 dendrogram. The objective is to obtain the same return as the equal-weight portfolio (r̄p = 0.0043) while minimizing the risk with the help of the dendrogram. Based on the dendrogram, we added the following constraints to the portfolio:

1. The minimum weight of each stock is zero (this also holds for the equal-weight portfolio).

2. The weights of Atlas Copco A and B are zero.

3. The total weight of our three main industrial-sector stocks is at most 10%.

4. The total weight of all financial stocks is at most 20%.

5. The weight of each of the stocks HM-B.ST, AZN.ST, ERIC-B.ST, SWMA.ST and SCV-B.ST is at least 5%.
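The five constraints can be written down as a simple feasibility check on a vector of weights. The Python sketch below is illustrative only. The five tickers of constraint 5 are taken from the text; all other ticker spellings are assumptions, and the three industrial stocks are taken to be Volvo, SKF and Sandvik and the four financial stocks to be Swedbank, Nordea, SEB and Svenska Handelsbanken.

```python
# Ticker spellings other than the five named in constraint 5 are assumptions.
ATLAS = ["ATCO-A.ST", "ATCO-B.ST"]
INDUSTRIAL = ["VOLV-B.ST", "SKF-B.ST", "SAND.ST"]
FINANCIAL = ["SWED-A.ST", "NDA-SEK.ST", "SEB-A.ST", "SHB-A.ST"]
MIN_FIVE_PERCENT = ["HM-B.ST", "AZN.ST", "ERIC-B.ST", "SWMA.ST", "SCV-B.ST"]


def violated_constraints(weights, tol=1e-9):
    """Return the numbers (1 to 5) of the constraints that `weights` breaks.

    weights -- dict mapping ticker to portfolio weight; missing tickers count as 0
    """
    def w(ticker):
        return weights.get(ticker, 0.0)

    broken = []
    if any(value < -tol for value in weights.values()):
        broken.append(1)  # long only
    if any(abs(w(t)) > tol for t in ATLAS):
        broken.append(2)  # no money in Atlas Copco A and B
    if sum(w(t) for t in INDUSTRIAL) > 0.10 + tol:
        broken.append(3)  # industrial group capped at 10%
    if sum(w(t) for t in FINANCIAL) > 0.20 + tol:
        broken.append(4)  # financial group capped at 20%
    if any(w(t) < 0.05 - tol for t in MIN_FIVE_PERCENT):
        broken.append(5)  # at least 5% in each listed stock
    return broken


# Equal weights of 1/30 on the constrained stocks (the other 16 stocks of the
# index do not enter any of the five checks and are left out).
constrained = ATLAS + INDUSTRIAL + FINANCIAL + MIN_FIVE_PERCENT
equal_weight = {ticker: 1 / 30 for ticker in constrained}

# A made-up feasible allocation that sums to one.
candidate = {t: 0.15 for t in MIN_FIVE_PERCENT}
candidate.update({t: 0.03 for t in INDUSTRIAL})
candidate.update({t: 0.04 for t in FINANCIAL})

print(violated_constraints(equal_weight))
print(violated_constraints(candidate))
```

The equal-weight allocation breaks constraints 2 and 5, because it holds Atlas Copco and gives each of the five listed stocks less than 5%, while the made-up candidate satisfies all five.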

The reasons for these constraints are as follows.

The first constraint is the most common one, the long-only constraint. It means that short selling is not allowed.

The second constraint concerns the two stocks Atlas Copco A and Atlas Copco B. As the OMXS30 dendrogram shows, these two have the lowest distance from each other of any pair of stocks, and they are also very close to the other industrial-sector stocks. Since we construct the dendrogram from the correlation matrix of the stocks, a lower distance means a higher correlation, and high correlation is undesirable in portfolio construction, so we invest no money in Atlas Copco A and B.

The third constraint concerns the industrial group of Volvo, Svenska Kullagerfabriken (SKF) and Sandvik. These three are affected by the same economic events and tend to move together, so they contribute little to the diversification of the portfolio. For that reason we require that their total weight not exceed 10% of the investment.

The fourth constraint concerns the financial group of Swedbank, Nordea, SEB and Svenska Handelsbanken. The reasoning of the third constraint applies to the financial cluster too. We require that the total weight of this group not exceed 20% of the investment.

The last constraint concerns the stocks in the upper right corner of the dendrogram. These stocks are like single-member clusters and lie at a large distance from each other and from the other groups of stocks, which means that their correlations with them are close to zero or even negative. They should therefore have more weight than in the equal-weight portfolio. We require that each of the stocks HM-B.ST, AZN.ST, ERIC-B.ST, SWMA.ST and SCV-B.ST receive at least 5% of the total investment.

After running the program we obtained these values for the return, risk and value at risk of the portfolio:

r̄p = 0.0043    σp = 0.0196    VaR = 0.0247

The risk of the portfolio has been reduced substantially, which supports the usefulness of the information extracted from the dendrogram. For the same portfolio return, we reduced the risk by 33% and the value at risk by 40%.

The weights of the portfolio in Figure 2.4 satisfy all five constraints. The pie chart shows that most of the money is invested in stocks in the upper corners of the dendrogram. These stocks have the largest distances from other stocks and clusters of stocks.

Achieving the same return as the equal-weight portfolio at lower risk is something every investor would appreciate, whatever their investment strategy and risk appetite. We used only the dendrogram as the source of information about the stocks, and by analysing it carefully we constructed a better portfolio from the stocks of OMXS30.

Figure 2.4: Weights of stocks of OMXS30 after introducing constraints

We then construct another portfolio from the stocks of the Dow Jones index. First we construct an equal-weight portfolio of all 30 stocks, each with a weight of 0.0333. The return, risk and VaR of this portfolio are:

r̄p = 0.0043    σp = 0.0243    VaR = 0.0361

After the equal-weight portfolio, we construct another portfolio using the information drawn from the DJIA dendrogram. We fix the return at the return of the equal-weight portfolio, because we do not want a lower return, and try to reduce the risk by adding constraints. The following constraints, which we derived from the DJIA dendrogram, reflect our own analysis and may not be the best constraints for every investor:

1. The minimum weight of each stock is zero (this also holds for the equal-weight portfolio).

2. Each of the stocks MCD, KO, PG, WMT, UNH and TRV has a weight of at least 5%.

3. No money is invested in CVX and XOM.

4. The total weight of the technology sector is at most 20%.

5. The total weight of industrial goods and basic materials is at most 10%.

The first constraint is the long-only constraint: short selling is not allowed.

The second constraint concerns the stocks that have the largest distances from other stocks and form single-member clusters. We believe that they help considerably in diversifying the portfolio, so we require that at least 5% of the investment be placed in each of them.

The third constraint is to place no investment in Chevron Corporation (CVX) and Exxon Mobil Corporation (XOM). These two have the highest correlation (lowest distance) and are in exactly the same sector and sub-sector, so we omit them.

The fourth constraint puts an upper limit on four companies in the financial sector. Together, these financial stocks should not have more than 20% of the total investment.

The last constraint again defines an upper limit, this time for the stocks in industrial goods. They should not have more than 10% of the total investment.

The return, risk and VaR of the constructed portfolio are:

r̄p = 0.0043    σp = 0.0164    VaR = 0.0238

The return of the portfolio is the same, but the risk is reduced by 32% and the VaR by 34%. The pie chart below shows the weights of the portfolio, which satisfy all of our constraints.
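The reported percentage reductions follow directly from the figures given for the equal-weight and dendrogram-based portfolios. The Python sketch below recomputes them from the values stated in the text; the helper name `reduction` is hypothetical.

```python
def reduction(before, after):
    """Relative reduction from `before` to `after`, as a fraction."""
    return (before - after) / before


# Figures reported in the text: (equal-weight, dendrogram-based).
reported = {
    "OMXS30": {"risk": (0.0294, 0.0196), "VaR": (0.0410, 0.0247)},
    "DJIA": {"risk": (0.0243, 0.0164), "VaR": (0.0361, 0.0238)},
}
for index, measures in reported.items():
    for name, (before, after) in measures.items():
        print(f"{index} {name}: {reduction(before, after):.1%}")
```

The computed reductions agree with the rounded percentages quoted for both indices.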

Figure 2.5: Weights of stocks of DJIA after introducing constraints


2.2.3 Comparison with Optimal Portfolio

To assess the efficiency of our constructed portfolio, we also compared it with the optimal portfolio computed with the fPortfolio package in R.

Below you can see the result of the optimal portfolio of OMXS30:

r̄p = 0.0043 σp = 0.0196 VaR = 0.0250

The pie chart below shows the weight of Optimal portfolio that has been constructed by R.

Figure 2.6: Weights of stocks of OMXS30 in Optimal Portfolio

Compared with the portfolio that we built using HCA, the optimal portfolio has the same level of risk but a slightly higher value at risk, which indicates that our portfolio performs better in the worst-case scenario. Comparing Figure 2.4 and Figure 2.6 shows that the two portfolios also assign different weights to the stocks.

Next we construct the optimal portfolio for the DJIA stocks in the same way. The return, risk and value at risk of this portfolio are:

r̄p = 0.0043    σp = 0.0164    VaR = 0.0241

The pie chart below shows the weights of the optimal portfolio of DJIA stocks constructed in R.

Figure 2.7: Weights of stocks of DJIA in Optimal Portfolio

Comparing the portfolio constructed from the information extracted by HCA with the optimal portfolio constructed in R, we observe that although the two portfolios have different stock weights, they have the same risk for the same target return, and our portfolio has a lower VaR than the optimal one.

Chapter 3

Summary and Some Future Studies

In the first part of our analysis we showed a statistical data analysis method to classify the se-curities traded in DJIA, S&P500 and OMXS30 indices.We showed that the clusters of stocksthat we created has a meaningful economical informatioin. We can also get to the conclusionthat, it is possible to make a useful and understandable dendrogram tree of stocks with onlydistance matrix of securities from equation that covert our correlation matrix to distance mat-rix. Lastly we should mention that the dendrogram is obtained by using the time series ofstock prices only which means that stock prices have very important economical information.

In the second part of our analysis, we showed that hierarchical cluster analysis (HCA) gives a very good overall picture of all stocks, and that with the help of the dendrogram we can extract meaningful financial inter-relations between stocks. We then constructed a portfolio based on our own analysis of the dendrogram and showed that it outperforms the equal-weight portfolio in both markets. We also compared the constructed portfolio with the optimal portfolio constructed in R and obtained exactly the same amount of risk but a better VaR for both OMXS30 and DJIA stocks.

These results strongly indicate that hierarchical structure does help in stock classification and portfolio construction.

All in all, clustering helps achieve a better classification and gives investors some exploratory analysis, but much more study is needed before it can be used in the real world.

There are still some questions related to this topic that should be answered, for example:

- Here we grouped the stocks according to their correlations. What if we classify them according to the sector that each stock belongs to?
- Another direction is to cluster on profitability ratios such as return on equity, return on assets, cost of borrowing, etc.
- One may work on finding an optimal value for the distance level. How can we define the optimal distance level?
- How about other methods of cluster analysis? Here we used agglomerative hierarchical cluster analysis. Does divisive HCA give the same result?
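As a starting point for the question about the distance level, the sketch below shows how the number of clusters changes as the dendrogram is cut at different heights. It is only an exploratory aid under stated assumptions (simulated returns with three hidden groups, an arbitrary grid of heights), not a criterion for the optimal level.

```r
# Number of clusters obtained when cutting the tree at different distance levels.
# Simulated data: six "stocks" driven by three hidden factors, two stocks each.
set.seed(11)
n_obs <- 300
factors <- matrix(rnorm(n_obs * 3), ncol = 3)
returns <- factors[, rep(1:3, each = 2)] +
  matrix(rnorm(n_obs * 6, sd = 0.5), ncol = 6)
colnames(returns) <- paste0("S", 1:6)

tree <- hclust(as.dist(sqrt(2 * (1 - cor(returns)))))

heights <- seq(0.2, 2.0, by = 0.2)  # the distance is bounded by 0 and 2
n_clusters <- sapply(heights, function(h) length(unique(cutree(tree, h = h))))
print(data.frame(height = heights, clusters = n_clusters))
```

A wide range of heights over which the cluster count stays constant is one informal sign of a stable grouping.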



Appendix A

The source code

# Code for the economics thesis
## Dow Jones Industrial Average
stocks <- c("AA","AXP","BA","BAC","CAT","CSCO","CVX","DD","DIS","GE","HD","HPQ","IBM","INTC",
            "JNJ","JPM","KO","MCD","MMM","MRK","MSFT","PFE","PG","T","TRV","UNH","UTX","VZ","WMT","XOM")

# install the required packages (only needed once)
install.packages("fPortfolio")
install.packages("PerformanceAnalytics")
install.packages("tseries")
install.packages("quantmod")
install.packages("lubridate")
install.packages("timeSeries")

library("quantmod")
tickers <- getSymbols(stocks, from="2009-03-30", to="2013-03-30")
# combine the adjusted close values in one xts object
dataset <- Ad(get(tickers[1]))
for (i in 2:length(tickers)) { dataset <- merge(dataset, Ad(get(tickers[i]))) }

# handle missing values
data_omit   <- na.omit(dataset)    # drop rows containing NA values
data_locf   <- na.locf(dataset)    # last observation carried forward
data_approx <- na.approx(dataset)  # linear approximation
data_spline <- na.spline(dataset)  # cubic spline interpolation

# calculate returns using the function ROC (available once quantmod is loaded)
return_lag <- 5  # (crude) weekly returns
data <- na.omit(ROC(data_spline, n=return_lag, type="discrete"))
names(data) <- stocks  # the right-hand side was missing in the original listing; filled in as in the OMXS30 and S&P500 sections
library("timeSeries")
ts_datadow <- as.timeSeries(data)

# calculate the correlation matrix
risk_cordow <- cor(ts_datadow)

# convert the correlation matrix to a distance matrix
d <- sqrt(2*(1-risk_cordow))
distance <- as.dist(d)
# construct the hierarchical tree of DJIA
# (a stray title(main="test") call before plot() was dropped: it fails when no plot is open)
hclustering <- hclust(distance)
plot(hclustering, main="DJIA dendrogram")

# portfolio construction
library("fPortfolio")

# equal-weight portfolio of DJIA stocks
spec <- portfolioSpec()


nAssets <- ncol(ts_datadow)
setWeights(spec) <- rep(1/nAssets, times=nAssets)
constraints <- "LongOnly"
ewPortfolio <- feasiblePortfolio(ts_datadow, spec, constraints)
print(ewPortfolio)

# group constraints and box constraints based on the hierarchical tree of DJIA
data <- ts_datadow
conspec <- portfolioSpec()
setTargetReturn(conspec) <- 0.0043
box1 <- "minW[1:nAssets]=0"                       # long-only portfolio
box4 <- "minW[c(29,26,25,17,18,23)]=rep(0.05,6)"  # WMT, UNH, TRV, KO, MCD, PG
box2 <- "maxW[c(7,30)]=c(0,0)"                    # CVX, XOM (the original listing wrote "maxw"; the fPortfolio keyword is "maxW")
box6 <- "maxW[c(1:6,8:29)]=rep(0.2,28)"           # defined but not included in the constraints below
box3 <- "maxsumW[c(6,13,14,21)]=0.2"              # CSCO, IBM, INTC, MSFT
box5 <- "maxsumW[c(1,5,8,19,27)]=0.1"             # AA, CAT, DD, MMM, UTX
constraints <- c(box1, box2, box3, box5, box4)
gruportfolio <- efficientPortfolio(ts_datadow, conspec, constraints)
print(gruportfolio)
col <- divPalette(ncol(ts_datadow), "Spectral")
weightsPie(gruportfolio, box=FALSE, col=col, radius=0.5, labels=TRUE)
mtext(text="grouped stocks to minimize the risk", side=3, line=1.5, font=2, cex=0.7)

# test: efficient portfolio without the group and box constraints
data <- ts_datadow
conspec <- portfolioSpec()
setTargetReturn(conspec) <- 0.0043
efficient <- efficientPortfolio(ts_datadow, conspec)
print(efficient)
col <- divPalette(ncol(ts_datadow), "Spectral")
weightsPie(efficient, box=FALSE, col=col, radius=0.5, labels=TRUE)
mtext(text="efficient portfolio", side=3, line=1.5, font=2, cex=0.7)

###########################################################
## OMX STOCKHOLM
stocks <- c("ABB.ST","ALFA.ST","ASSA-B.ST","ATCO-A.ST","ATCO-B.ST","AZN.ST","BOL.ST","ELUX-B.ST",
            "ERIC-B.ST","GETI-B.ST","HM-B.ST","INVE-B.ST","LUPE.ST","MTG-B.ST","NDA-SEK.ST","NOKI-SEK.ST",
            "SAND.ST","SCA-B.ST","SCV-B.ST","SEB-A.ST","SECU-B.ST","SHB-A.ST","SKA-B.ST","SKF-B.ST",
            "SSAB-A.ST","SWED-A.ST","SWMA.ST","TEL2-B.ST","TLSN.ST","VOLV-B.ST")

# install the required packages (only needed once)
install.packages("fPortfolio")
install.packages("PerformanceAnalytics")
install.packages("tseries")
install.packages("quantmod")
install.packages("lubridate")

library("lubridate")
library("quantmod")
tickers <- getSymbols(stocks, from="2009-03-30", to="2013-03-30")

# combine the adjusted close values in one xts object
dataset <- Ad(get(tickers[1]))
for (i in 2:length(tickers)) { dataset <- merge(dataset, Ad(get(tickers[i]))) }

# handle missing values
data_omit   <- na.omit(dataset)    # drop rows containing NA values
data_locf   <- na.locf(dataset)    # last observation carried forward
data_approx <- na.approx(dataset)  # linear approximation
data_spline <- na.spline(dataset)  # cubic spline interpolation

# calculate returns using the function ROC (available once quantmod is loaded)
return_lag <- 5  # (crude) weekly returns
data <- na.omit(ROC(data_spline, n=return_lag, type="discrete"))


names(data) <- stocks
library("timeSeries")
ts_datast <- as.timeSeries(data)

# calculate the correlation matrix
risk_corst <- cor(ts_datast)

# convert the correlation matrix to a distance matrix
d <- sqrt(2*(1-risk_corst))
distance <- as.dist(d)
# plot the hierarchical tree
hclustering <- hclust(distance)
plot(hclustering, main="OMXS30 dendrogram")

# equal-weight portfolio of OMXS30 stocks
library("fPortfolio")

spec <- portfolioSpec()
nAssets <- ncol(ts_datast)
setWeights(spec) <- rep(1/nAssets, times=nAssets)
constraints <- "LongOnly"
ewPortfolio <- feasiblePortfolio(ts_datast, spec, constraints)
print(ewPortfolio)

# group constraints and box constraints
data <- ts_datast
conspec <- portfolioSpec()
#setWeights(conspec)<-rep(1/nAssets,times=nAssets)

setTargetReturn(conspec) <- 0.004365655
box1 <- "minW[1:nAssets]=0"                  # long-only portfolio
box4 <- "minW[c(6,9,11,18,27)]=rep(0.05,5)"  # AZN, ERIC-B, HM-B, SCA-B, SWMA (index 18 is SCA-B; the original comment named SCV-B)
box2 <- "maxW[c(4:5)]=c(0,0)"                # weight set to zero for Atlas Copco A and B (the original listing wrote "maxw"; the fPortfolio keyword is "maxW")
box3 <- "maxsumW[c(15,20,22,26)]=0.2"        # total weight of the financial industry not more than 0.2
box5 <- "maxsumW[c(17,24,30)]=0.1"           # sum of Sandvik, SKF and Volvo
constraints <- c(box1, box2, box3, box5, box4)
gruportfolio <- efficientPortfolio(ts_datast, conspec, constraints)
print(gruportfolio)
col <- divPalette(ncol(ts_datast), "Spectral")
weightsPie(gruportfolio, box=FALSE, col=col, radius=0.5)
mtext(text="grouped stocks to minimize the risk", side=3, line=1.5, font=2, cex=0.7)

# efficient portfolio without the group and box constraints
data <- ts_datast
conspec <- portfolioSpec()
setTargetReturn(conspec) <- 0.004365655
efficient <- efficientPortfolio(ts_datast, conspec)
print(efficient)

col <- divPalette(ncol(ts_datast), "Spectral")
weightsPie(efficient, box=FALSE, col=col, radius=0.5)
mtext(text="efficient portfolio", side=3, line=1.5, font=2, cex=0.7)

#####################################################
# S&P500
stocks <- c("MMM","ABT","ABBV","ANF","ACE","ACN","ACT","ADBE","ADT","AMD","AES","AET","AFL","A",
  "GAS","APD","ARG","AKAM","AA","ALXN","ATI","AGN","ALL","ALTR","MO","AMZN","AEE","AEP","AXP","AIG","AMT","AMP","ABC","AMGN","APH","APC","ADI","AON","APA","AIV","APOL","AAPL","AMAT","ADM","AIZ","T","ADSK","ADP","AN","AZO","AVB","AVY","AVP","BHI","BLL","BAC","BK","BCR","BAX","BBT","BEAM","BDX","BBBY","BMS","BRK.B","BBY","BIIB","BLK","HRB","BMC","BA","BWA","BXP","BSX","BMY","BRCM","BF.B","CHRW","CA","CVC","COG","CAM","CPB","COF","CAH","CFN","KMX","CCL","CAT","CBG","CBS","CELG","CNP","CTL","CERN","CF","SCHW","CHK","CVX","CMG","CB","CI","CINF","CTAS","CSCO","C","CTXS","CLF","CLX","CME","CMS","COH","KO","CCE","CTSH",
  "CL","CMCSA","CMA","CSC","CAG","COP","CNX","ED","STZ","GLW","COST","CVH","COV","CCI","CSX","CMI","CVS","DHI","DHR","DRI","DVA","DF","DE","DELL","DLPH","DNR","XRAY","DVN","DO","DTV","DFS","DISCA","DG","DLTR","D","DOV","DOW","DPS","DTE","DD","DUK","DNB","ETFC","EMN","ETN","EBAY","ECL","EIX","EW","EA","EMC","EMR","ESV","ETR","EOG","EQT","EFX","EQR","EL","EXC","EXPE","EXPD","ESRX","XOM","FFIV","FDO","FAST","FDX","FIS","FITB","FHN","FSLR","FE","FISV","FLIR","FLS","FLR","FMC","FTI","F","FRX","FOSL","BEN","FCX","FTR","GME","GCI","GPS","GRMN","GD","GE","GIS","GPC","GNW","GILD","GS","GT","GOOG","GWW","HAL","HOG","HAR","HRS","HIG","HAS","HCP","HCN","HNZ","HP","HES","HPQ","HD","HON","HRL","HSP","HST","HCBK","HUM","HBAN","ITW","IR","TEG","INTC","ICE","IBM","IFF","IGT","IP","IPG","INTU","ISRG","IVZ","IRM","JBL","JEC","JDSU","JNJ","JCI","JOY","JPM","JNPR","K","KEY","KMB","KIM","KMI","KLAC","KSS","KRFT","KR","LTD","LLL","LH","LRCX","LM","LEG","LEN","LUK","LIFE","LLY","LNC","LLTC","LMT","L","LO","LOW","LSI","LYB","MTB","M","MRO","MPC","MAR","MMC","MAS","MA","MAT","MKC","MCD","MHP","MCK","MJN","MWV","MDT","MRK","MET","PCS","MCHP","MU","MSFT","MOLX","TAP","MDLZ","MON","MNST","MCO","MS","MOS","MSI","MUR","MYL","NBR","NDAQ","NOV","NTAP","NFLX","NWL","NFX","NEM","NWSA","NEE","NKE","NI","NE","NBL","JWN","NSC","NTRS","NOC","NU","NRG","NUE","NVDA","NYX","ORLY","OXY","OMC","OKE","ORCL","OI","PCAR","PLL","PH","PDCO","PAYX","BTU","JCP","PNR","POM","PEP","PKI","PRGO","PETM","PFE","PCG","PM","PSX","PNW","PXD","PBI","PCL","PNC","RL","PPG","PPL","PX","PCP","PCLN","PFG","PG","PGR","PLD","PRU","PEG","PSA","PHM","PVH","QEP","PWR","QCOM","DGX","RRC","RTN","RHT","RF","RSG","RAI","RHI","ROK","COL","ROP","ROST","RDC","R","SWY","SAI","CRM","SNDK","SCG","SLB","SNI","STX","SEE","SRE","SHW","SIAL","SPG","SLM","SJM","SNA","SO","LUV","SWN","SE","S","STJ","SWK","SPLS","SBUX","HOT","STT","SRCL","SYK","STI","SYMC","SYY","TROW","TGT","TEL","TE","THC","TDC","TER","TSO","TXN","TXT","HSY","TRV","TMO","TIF","TWX","TWC",
  "TJX","TMK","TSS","TRIP","TSN","TYC","USB","UNP","UNH","UPS","X","UTX","UNM","URBN","VFC","VLO","VAR","VTR","VRSN","VZ","VIAB","V","VNO","VMC","WMT","WAG","DIS","WPO","WM","WAT","WLP","WFC","WDC","WU","WY","WHR","WFM","WMB","WIN","WEC","WPX","WYN","WYNN","XEL","XRX","XLNX","XL","XYL","YHOO","YUM","ZMH","ZION","PBCT")

library("lubridate")
library("quantmod")
library("timeSeries")  # needed for as.timeSeries below
tickers <- getSymbols(stocks, from="2009-03-30", to="2013-03-30")
# combine the adjusted close values in one xts object
dataset <- Ad(get(tickers[1]))
for (i in 2:length(tickers)) { dataset <- merge(dataset, Ad(get(tickers[i]))) }

# handle missing values
data_omit   <- na.omit(dataset)    # drop rows containing NA values
data_locf   <- na.locf(dataset)    # last observation carried forward
data_approx <- na.approx(dataset)  # linear approximation
data_spline <- na.spline(dataset)  # cubic spline interpolation

# calculate returns using the function ROC (available once quantmod is loaded)
return_lag <- 5  # (crude) weekly returns
data <- na.omit(ROC(data_spline, n=return_lag, type="discrete"))
names(data) <- stocks
ts_data <- as.timeSeries(data)

# calculate the correlation matrix
risk_cor <- cor(data)

# convert the correlation matrix to a distance matrix
d <- sqrt(2*(1-risk_cor))
distance <- as.dist(d)

# construct the hierarchical tree of S&P500
hclustering <- hclust(distance)
plot(hclustering, main="S&P500 dendrogram")

econcode.r
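The constraint strings in the listing address stocks by their position in the ticker vector, which is easy to get wrong when a comment and an index drift apart. The sketch below shows one way to derive the indices from ticker names instead. The helper box_constraint is hypothetical and not part of fPortfolio; it only assembles a string in the same format as the ones written by hand in the listing.

```r
# Build a box-constraint string from ticker names instead of hand-typed indices.
# box_constraint() is an illustrative helper, not an fPortfolio function.
stocks <- c("AA","AXP","BA","BAC","CAT","CSCO","CVX","DD","DIS","GE","HD","HPQ",
            "IBM","INTC","JNJ","JPM","KO","MCD","MMM","MRK","MSFT","PFE","PG","T",
            "TRV","UNH","UTX","VZ","WMT","XOM")

box_constraint <- function(keyword, tickers, values, universe) {
  idx <- match(tickers, universe)  # position of each ticker in the universe
  if (anyNA(idx)) {
    stop("unknown ticker: ", paste(tickers[is.na(idx)], collapse = ", "))
  }
  sprintf("%s[c(%s)]=c(%s)", keyword,
          paste(idx, collapse = ","), paste(values, collapse = ","))
}

energy_cap <- box_constraint("maxW", c("CVX", "XOM"), c(0, 0), stocks)
print(energy_cap)  # "maxW[c(7,30)]=c(0,0)"
```

A misspelled ticker stops with an error here, whereas a mistyped index would silently constrain a different stock.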
