
Introduction and review
A high dimensional plug-in covariance matrix estimator
A modified Markowitz framework
Conclusion

Mean-variance portfolio optimization when means and covariances are estimated

Zehao Chen

June 1, 2007

Joint work with Tze Leung Lai (Stanford Univ.) and Haipeng Xing (Columbia Univ.)

Zehao Chen M.V. optimization when means and covariances are estimated


Outline

1 Introduction and review
   The Markowitz framework
   Different efficient frontier definitions under stochastic setting

2 A high dimensional plug-in covariance matrix estimator
   The L2 boosting estimator
   Simulation and empirical study

3 A modified Markowitz framework
   The framework
   An example and simulation

4 Conclusion


The Markowitz framework
Different efficient frontier definitions under stochastic setting

The basic formulation

We denote the returns of p risky assets (e.g. stock returns) by a p × 1 vector R, and an unobserved future return by r, with

E(R) = µ,  Cov(R) = Σ

The mean-variance optimization solves for an asset allocation w which minimizes the portfolio risk σ²_w while achieving a certain target return µ∗, i.e.,

min_w w^T Σ w = min_w σ²_w

subject to

w^T µ ≥ µ∗,  w₁ + w₂ + ... + w_p = 1
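In the equality-constrained version of this problem (target return binding, no box constraints), the Lagrangian conditions reduce to a linear system. A minimal numpy sketch, not from the slides; the 3-asset data are made up for illustration:

```python
import numpy as np

def min_variance_weights(mu, Sigma, mu_star):
    """Markowitz weights with equality constraints w'mu = mu_star, sum(w) = 1.

    Stationarity of the Lagrangian gives
    w = Sigma^{-1} A' (A Sigma^{-1} A')^{-1} b
    with A = [mu'; 1'] and b = [mu_star; 1].
    """
    p = len(mu)
    A = np.vstack([mu, np.ones(p)])         # stacked constraint rows
    b = np.array([mu_star, 1.0])
    Sinv_At = np.linalg.solve(Sigma, A.T)   # Sigma^{-1} A'
    return Sinv_At @ np.linalg.solve(A @ Sinv_At, b)

# Hypothetical 3-asset universe
mu = np.array([0.05, 0.08, 0.12])
Sigma = np.diag([0.04, 0.09, 0.16])
w = min_variance_weights(mu, Sigma, mu_star=0.08)  # satisfies both constraints
```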


The basic formulation

This formulation has a closed-form solution for w,

w∗ = f(µ, µ∗, Σ⁻¹)

The weights sometimes have additional constraints, e.g.,

l_i ≤ w_i ≤ u_i,  i = 1, ..., p

If l_i ≥ 0, this is also called the no-short-selling constraint. The solution under these additional constraints requires quadratic programming.
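With box constraints there is no closed form, and a generic QP/NLP solver does the work. A hedged sketch using scipy's SLSQP on invented data (not the authors' implementation):

```python
import numpy as np
from scipy.optimize import minimize

# Hypothetical 3-asset example with a no-short-selling constraint (l_i = 0)
mu = np.array([0.05, 0.08, 0.12])
Sigma = np.array([[0.04, 0.01, 0.00],
                  [0.01, 0.09, 0.02],
                  [0.00, 0.02, 0.16]])
mu_star = 0.08

res = minimize(
    fun=lambda w: w @ Sigma @ w,                        # portfolio variance
    x0=np.full(3, 1 / 3),
    constraints=[{"type": "eq",   "fun": lambda w: w.sum() - 1},
                 {"type": "ineq", "fun": lambda w: w @ mu - mu_star}],
    bounds=[(0.0, 1.0)] * 3,                            # 0 <= w_i <= 1
    method="SLSQP",
)
w = res.x
```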


Efficient frontier


"Plug-in" efficient frontier

For w = w(X, µ∗) and reasonable estimates µ̂(X) and Σ̂(X) of µ and Σ,

min_w w^T Σ̂ w  subject to  w^T µ̂ ≥ µ∗

Define the "plug-in" efficient frontier, parametrized by µ∗, as

C(µ∗) = (w^T Σ̂ w, w^T µ̂)


"Plug-in" covariance estimates

Factor model (with domain knowledge):

R = α + BF + ε

Σ = W₁ + W₂ = BΩB^T + Cov(ε)

Shrinkage estimator (shrink the sample covariance Σ̂ toward a structured target F):

αF + (1 − α)Σ̂
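A minimal sketch of such a shrinkage estimator, assuming a diagonal target F and a hand-picked intensity α (data-driven choices such as Ledoit-Wolf exist but are not shown):

```python
import numpy as np

def shrinkage_cov(X, alpha):
    """Shrink the sample covariance toward a diagonal target F.

    alpha in [0, 1] is the shrinkage intensity, chosen by hand here.
    """
    S = np.cov(X, rowvar=False)        # sample covariance Sigma-hat
    F = np.diag(np.diag(S))            # structured target: diagonal of S
    return alpha * F + (1 - alpha) * S

rng = np.random.default_rng(0)
X = rng.normal(size=(50, 5))           # n = 50 observations, p = 5 assets
S_shrunk = shrinkage_cov(X, alpha=0.3)
```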


Random efficient frontier?

[Figure: four panels of simulated efficient frontiers; x-axis 0.04 to 0.16, y-axis on the order of 10⁻³]

Figure: n = 50, p = 5, i.i.d. multivariate normal


Random efficient frontier?

The data-dependent efficient frontier is not well defined, and is a random curve.

By plugging in the estimate µ̂ of the mean, one essentially constrains

w^T µ̂ ≥ µ∗

Conceptually, one should instead constrain

E(w^T r) ≥ µ∗


Resampled efficient frontier

For bootstrapped samples X*₁, ..., X*_B of X,

w_M(X, µ∗) = (1/B) Σ_{i=1}^B w(X*_i, µ∗)

Define the resampled efficient frontier as

C_M(µ∗) = (w_M^T Σ̂ w_M, w_M^T µ̂)
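A sketch of the resampling recipe, assuming a simple equality-constrained plug-in solver as the base weight function w(·, µ∗); the data here are simulated:

```python
import numpy as np

def markowitz_w(X, mu_star):
    """Plug-in weights from sample mean/covariance (target return binding)."""
    mu_hat = X.mean(axis=0)
    S = np.cov(X, rowvar=False)
    A = np.vstack([mu_hat, np.ones(X.shape[1])])
    b = np.array([mu_star, 1.0])
    Sinv_At = np.linalg.solve(S, A.T)
    return Sinv_At @ np.linalg.solve(A @ Sinv_At, b)

def resampled_w(X, mu_star, B=200, seed=0):
    """Resampled weights: average the plug-in weights over bootstrap samples."""
    rng = np.random.default_rng(seed)
    n = X.shape[0]
    ws = [markowitz_w(X[rng.integers(0, n, n)], mu_star) for _ in range(B)]
    return np.mean(ws, axis=0)

rng = np.random.default_rng(1)
X = rng.multivariate_normal(np.array([0.05, 0.08, 0.12]),
                            np.diag([0.04, 0.09, 0.16]), size=50)
w_M = resampled_w(X, mu_star=0.08)   # still sums to 1, since each w does
```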


Previous definitions of efficient frontiers

"Plug-in" efficient frontier

min_w w^T Σ̂ w,  w^T µ̂ ≥ µ∗
C(µ∗) = (w^T Σ̂ w, w^T µ̂)

Resampled efficient frontier

w_M(X, µ∗) = (1/B) Σ_{i=1}^B w(X*_i, µ∗)
C_M(µ∗) = (w_M^T Σ̂ w_M, w_M^T µ̂)


The L2 boosting estimator
Simulation and empirical study


If one really wants to use the plug-in

Estimate Σ or Σ⁻¹?

Employ a sparsity assumption (in practice, on residuals from some factor model) to reduce the number of parameters to estimate.

Impose proper weight constraints.


A high dimensional covariance estimator

Use the modified Cholesky decomposition Σ⁻¹ = T′D⁻¹T to reparametrize the covariance matrix, and enforce sparsity on T.

Use a coordinate-wise greedy algorithm (component-wise L2 boosting) with a modified BIC stopping criterion; it can be shown to give a consistent estimator.

Spectral-norm convergence holds for both the Σ and Σ⁻¹ estimates.


Modified Cholesky decomposition

For a random vector Y = (y₁, y₂, ..., y_p), one can write the components in "auto-regressive" form: for k ≥ 2,

y_k = µ_k + φ_{k,1}y₁ + φ_{k,2}y₂ + ... + φ_{k,k−1}y_{k−1} + ε_k

Let T be the p × p unit lower triangular matrix with −φ_{i,j} (j < i) in the lower triangle, and let D be the p × p diagonal matrix

D = diag(var(y₁), var(ε₂), var(ε₃), ..., var(ε_p))

The modified Cholesky decomposition is

Σ⁻¹ = T′D⁻¹T
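The decomposition can be computed directly from the sequential regressions above: row k of T holds the negated coefficients of regressing y_k on y₁, ..., y_{k−1}, and D holds the innovation variances. A numpy sketch:

```python
import numpy as np

def modified_cholesky(Sigma):
    """Return (T, D) with T unit lower triangular and D diagonal,
    such that inv(Sigma) = T.T @ inv(D) @ T.
    """
    p = Sigma.shape[0]
    T = np.eye(p)
    d = np.empty(p)
    d[0] = Sigma[0, 0]                       # var(y_1)
    for k in range(1, p):
        S11 = Sigma[:k, :k]
        s12 = Sigma[:k, k]
        phi = np.linalg.solve(S11, s12)      # regression coefficients phi_{k,.}
        T[k, :k] = -phi
        d[k] = Sigma[k, k] - s12 @ phi       # innovation variance var(eps_k)
    return T, np.diag(d)

# Check on a small covariance matrix
Sigma = np.array([[2.0, 0.5, 0.2],
                  [0.5, 1.5, 0.3],
                  [0.2, 0.3, 1.0]])
T, D = modified_cholesky(Sigma)
assert np.allclose(T.T @ np.linalg.inv(D) @ T, np.linalg.inv(Sigma))
```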


The coordinate-wise greedy algorithm

The main idea is to iteratively look for the predictor most correlated with the current model residual, include this predictor, and recalculate the residual.

This process generates a sequence of nested models M₁ ⊆ M₂ ⊆ ... ⊆ M_{m_n}.

BIC: log σ̂² + (log n / n) · #M_k

Proposed modified BIC: nσ̂² + σ² log p log n · #M_k
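A generic sketch of component-wise L2 boosting with a BIC-style stopping rule (the ordinary BIC penalty is used here for simplicity; the modified penalty above would be swapped in at the same place). Data and dimensions are invented:

```python
import numpy as np

def l2_boost(X, y, max_steps=50):
    """Greedy component-wise selection with BIC-style early stopping.

    Each step picks the column most correlated with the current residual,
    refits OLS on the selected set, and stops once
    log(sigma^2) + (log n / n) * |model| stops decreasing.
    """
    n, p = X.shape
    selected, resid = [], y.copy()
    best_score, coef = np.inf, np.zeros(p)
    for _ in range(max_steps):
        j = int(np.argmax(np.abs(X.T @ resid)))   # most correlated predictor
        if j not in selected:
            selected.append(j)
        beta, *_ = np.linalg.lstsq(X[:, selected], y, rcond=None)
        resid = y - X[:, selected] @ beta
        score = np.log(resid @ resid / n) + np.log(n) / n * len(selected)
        if score >= best_score:
            break                                 # BIC stopped improving
        best_score = score
        coef = np.zeros(p)
        coef[selected] = beta
    return coef

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 20))
y = 2.0 * X[:, 3] - 1.5 * X[:, 7] + rng.normal(scale=0.1, size=200)
coef = l2_boost(X, y)   # recovers a sparse model containing columns 3 and 7
```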


Convergence results

Definition of the spectral norm:

||A||₂ = √(λ_max(A′A))

Convergence:

||Σ̂ − Σ||₂ → 0,  ||Σ̂⁻¹ − Σ⁻¹||₂ → 0
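A quick numerical check of the definition on a toy matrix:

```python
import numpy as np

# The spectral norm ||A||_2 is the square root of the largest eigenvalue
# of A'A, i.e. the largest singular value of A.
A = np.array([[3.0, 0.0],
              [4.0, 5.0]])
spec = np.sqrt(np.linalg.eigvalsh(A.T @ A).max())
assert np.isclose(spec, np.linalg.norm(A, 2))
```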


An example

Setting: n = 300, p = 600; the number of nonzero parameters in T is 900 (i.e. 3 per row), and the elements of T are uniformly distributed in [0.5, 1].


MSE to true inverse covariance

[Figure: QE₁ (MSE to the true inverse covariance) versus p (p = 2n), two panels; curves: Boosting, PLM, Shrinkage]


MSE to true covariance

[Figure: QE₂ (MSE to the true covariance) versus p (p = 2n), two panels; top: Boosting and Shrinkage (QE₂/100), bottom: PLM (QE₂/1000); annotation "Around 8"]


An empirical example: NASDAQ-100, 1990-2006

[Figure: risk-return curves for the Boosting, Shrinkage, and Sample estimators]

Figure: Empirical performance curve with S&P500 index as market factor


The framework
An example and simulation

Efficient frontier when means and covariances are unknown

For w(X, µ∗), solve

min_w var(w^T r)  subject to  E(w^T r) ≥ µ∗

Define the expected efficient frontier as

C(µ∗) = E_X(w^T Σ w, w^T µ)


A comparison of efficient frontiers

"Plug-in" efficient frontier

min_w w^T Σ̂ w,  w^T µ̂ ≥ µ∗,  C(µ∗) = (w^T Σ̂ w, w^T µ̂)

Resampled efficient frontier

w_M(X, µ∗) = (1/B) Σ_{i=1}^B w(X*_i, µ∗),  C_M(µ∗) = (w_M^T Σ̂ w_M, w_M^T µ̂)

Expected efficient frontier

min_w var(w^T r),  E(w^T r) ≥ µ∗,  C(µ∗) = E_X(w^T Σ w, w^T µ)


Solving for the expected efficient frontier

Introduce a risk-aversion parameter λ and reformulate the problem as

min_w  −E(w^T r) + λ var(w^T r)


A not-so-Bayesian stochastic control approach

Solve for a portfolio weight w(X, λ):

−E(w^T r) + λ var(w^T r) = −E(w^T r) + λE[(w^T r)²] − λ[E(w^T r)]²

However, one needs the following conditional form,

E_X[−w^T E(r|X) + λ w^T E(rr^T|X) w − λ w^T E(??|X) w]


A not-so-Bayesian stochastic control approach

This conditional form allows us to plug in reasonable guesses for E(r|X), E(rr^T|X) and E(??|X).

Use a Bayesian approach to develop a procedure, then approximate it in a frequentist setting.


Embedding technique

Embedding technique¹: this Bayesian formulation can be shown to be equivalent to

E_X[−η w^T E(r|X) + λ w^T E(rr^T|X) w]

where η = 1 + 2λE((w*_λ)^T r) and w*_λ is the solution to the original optimization problem with risk aversion λ.

¹ X. Y. Zhou and D. Li (2000)


How to proceed? A simple example

Take the prior Σ ∼ IW(Φ, v), so

E(Σ|X) = αΦ + (1 − α)Σ̂
E(r|X) = E(µ|X) ≈ X̄
E(rr^T|X) = E(Σ|X) + var(µ|X, Σ) + E(r|X)E(r|X)^T
          ≈ ((n + 1)/n)[αΦ + (1 − α)Σ̂] + X̄X̄^T

"Semi-Empirical Bayes": estimate Φ by diag(Σ̂)
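A sketch of these "Semi-Empirical Bayes" moment guesses, using the sample mean as X̄, the sample covariance as Σ̂, and Φ = diag(Σ̂); the data and α are invented:

```python
import numpy as np

def conditional_moment_guesses(X, alpha):
    """Plug-in guesses for E(r|X) and E(r r'|X) with a diagonal target Phi."""
    n = X.shape[0]
    Xbar = X.mean(axis=0)
    S = np.cov(X, rowvar=False)                      # Sigma-hat
    Phi = np.diag(np.diag(S))                        # diagonal target
    E_Sigma = alpha * Phi + (1 - alpha) * S          # E(Sigma | X)
    E_r = Xbar                                       # E(r | X) ~ Xbar
    E_rr = (n + 1) / n * E_Sigma + np.outer(Xbar, Xbar)
    return E_r, E_rr

rng = np.random.default_rng(0)
X = rng.normal(size=(50, 5))
E_r, E_rr = conditional_moment_guesses(X, alpha=0.3)
```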


How to proceed? A simple example

Define a parametrized function w(X, λ, α, η) that solves

min_w  −η w^T E(r|X) + λ w^T E(rr^T|X) w

with E(r|X) and E(rr^T|X) replaced by the approximations.
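Ignoring the budget constraint for simplicity, this quadratic has a closed-form minimizer, which a short sketch makes explicit (the toy moments below are invented):

```python
import numpy as np

def w_lambda(E_r, E_rr, lam, eta):
    """Minimizer of -eta * w'E(r|X) + lam * w'E(rr'|X)w.

    Setting the gradient -eta*E_r + 2*lam*E_rr @ w to zero gives
    w = (eta / (2 lam)) * E(rr'|X)^{-1} E(r|X); the budget constraint
    is omitted here for simplicity.
    """
    return eta / (2 * lam) * np.linalg.solve(E_rr, E_r)

E_r = np.array([0.05, 0.08])
E_rr = np.array([[0.045, 0.010],
                 [0.010, 0.096]])
w = w_lambda(E_r, E_rr, lam=2.0, eta=1.0)
```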


How to proceed? A simple example

Having a parametrized procedure w(X, λ, α, η), we wish to evaluate

R(λ, α, η) = −E(w^T r) + λ var(w^T r)
           = E_X[−E(w^T r|X) + λ var(w^T r|X)] + λ var[E(w^T r|X)]

Bootstrap: X*₁, X*₂, ..., X*_B


How to proceed? A simple example

The above can be approximated by the bootstrapped empirical risk,

R(λ, α, η) = E_X[−E(w^T r|X)] + λE_X[var(w^T r|X)] + λ var[E(w^T r|X)]

E_X[−E(w^T r|X)] = (1/B) Σ_{i=1}^B [−w^T(X*_i, λ, α, η) µ̂(X*_i)]

E_X[var(w^T r|X)] = (1/B) Σ_{i=1}^B [w^T(X*_i, λ, α, η) Σ̂(X*_i) w(X*_i, λ, α, η)]

var[E(w^T r|X)] = var[w^T(X*_i, λ, α, η) µ̂(X*_i)]

where µ̂(X*_i) and Σ̂(X*_i) are reasonable estimates of the mean and covariance matrix of X*_i.
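A sketch of this bootstrapped empirical risk for a generic weight procedure; an equal-weight rule stands in for w(X, λ, α, η), and the data are simulated:

```python
import numpy as np

def empirical_risk(X, w_fn, lam, B=200, seed=0):
    """Bootstrapped empirical risk for a weight procedure w_fn(X).

    Approximates
      R = E_X[-E(w'r|X)] + lam*E_X[var(w'r|X)] + lam*var[E(w'r|X)]
    using bootstrap resamples with plug-in mean/covariance estimates.
    """
    rng = np.random.default_rng(seed)
    n = X.shape[0]
    means, variances = [], []
    for _ in range(B):
        Xb = X[rng.integers(0, n, n)]          # bootstrap resample X*_i
        w = w_fn(Xb)
        means.append(w @ Xb.mean(axis=0))      # w' mu-hat(X*_i)
        variances.append(w @ np.cov(Xb, rowvar=False) @ w)
    means, variances = np.array(means), np.array(variances)
    return -means.mean() + lam * variances.mean() + lam * means.var()

rng = np.random.default_rng(1)
X = rng.normal(loc=0.05, scale=0.2, size=(60, 4))
risk = empirical_risk(X, w_fn=lambda Xb: np.full(4, 0.25), lam=2.0)
```

In the slides' setting, minimizing this quantity over (α, η) for fixed λ tunes the procedure itself rather than a single portfolio.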


How to proceed? A simple example

Solve

min_{α,η} R(λ, α, η)

with reasonable initial values.


Example: n = 50, p = 5

[Figure: Std. (x-axis) vs. Return (y-axis) frontiers for SBF, Shrinkage, Resampling, Sample, and True]


Example: n = 50, p = 30

[Figure: Std. (x-axis) vs. Return (y-axis) frontiers for SBF, Shrinkage, Resampling, and Sample]


Example: n = 50, p = 50

[Figure: Std. (x-axis) vs. Return (y-axis) frontiers for SBF, Shrinkage, and HCD]


Conclusion

The proposed high-dimensional covariance estimator is appropriate in sparse settings and shows better performance.

The modified Markowitz framework optimizes on the expected efficient frontier. It is a general framework and performs better in many practical scenarios.
