
Journal Of Investment Management, Vol. 18, No. 2, (2020), pp. 1–16

© JOIM 2020

USING MACHINE LEARNING TO PREDICT REALIZED VARIANCE

Peter Carr∗,a, Liuren Wu†,b and Zhibai Zhang‡,a

A volatility index is a portfolio of options and represents the market's expectation of the underlying security's future realized volatility/variance. Traditionally the index weighting is based on a variance swap pricing formula. In this paper we propose a new method for building a volatility index by formulating a variance prediction problem using machine learning. We test algorithms including Ridge regression, Feedforward Neural Networks and Random Forest on S&P 500 Index option data. By conducting a time series validation we show that the new weighting method can achieve higher predictability of future return variance while requiring fewer options. It is also shown that a weighting method combining the traditional and the machine learning approaches performs best.

1 Introduction

Estimating future return variance is an essential part of investing. Like future returns, a security's return variance often exhibits stochastic behavior, which makes it challenging to forecast. For many derivative instruments, pricing is predominantly determined by modeling the underlying asset's volatility in the risk-neutral measure.

a Department of Finance and Risk Engineering, NYU Tandon School of Engineering, New York, USA.
b Zicklin School of Business, Baruch College, City University of New York, New York, USA.
∗ [email protected]
† [email protected]
‡ [email protected], corresponding author.

There is a broad literature on derivative pricing involving various volatility models. These range from the simple setup with constant volatility (Black and Scholes, 1973), to models that use deterministic volatility functions (Dupire, 1994), to more complex models that treat volatility as a stochastic process (Heston, 1993). On the other hand, the market prices of options and other derivative instruments on a security reflect a market-wide expectation of the underlying's future return variance. Therefore, it is reasonable to use option market prices and their implied volatilities to project future return variance even without a specific volatility model.1

Since there are numerous options written on a security and each of them has a different implied volatility, aggregating the options to build a single variance estimate is non-trivial. This can be viewed as a classic indexing problem that consists of security-selection rules and a weighting scheme. The most well-known volatility index in this category is the CBOE Volatility Index (VIX), which is composed of options with close to 30-day maturity on the S&P 500 index (SPX). It is one of the most commonly used estimators of SPX's 30-day future variance. In recent years there has been an increasing number of volatility indices on various equity indices and on major instruments in fixed income, currencies and commodities. For VIX-styled indices, the option weights are determined by a variance swap pricing formula such that the index's squared payoff replicates the underlying's variance in the risk-neutral measure. Despite its popularity, VIX-styled indexing has some caveats. First, as the weights are set in the risk-neutral measure, it is not certain that the weighting scheme has optimal forecasting power for future volatility, especially out-of-sample (OOS) in the market measure. Additionally, it involves a large number of out-of-the-money options of increasing illiquidity. This makes it expensive and impractical to hedge any tradable products associated with the index, as it requires one to hold many thinly traded options.

In this paper we propose a new volatility indexing method to improve predictability and liquidity. To do so, we formulate a regression problem that predicts realized variance using option prices as features, and we construct a weighting scheme from the loadings of the regression. We experiment with algorithms including linear and machine learning techniques such as Ridge, Feedforward Neural Networks (FNN) and Random Forest, and impose constraints on model selection to make sure that the prediction can be replicated by an option portfolio. We test the algorithms with a time series validation approach on SPX and its option data.

We discover that by combining the prediction model and the VIX-styled weighting scheme, one can achieve an index with improved predictability and liquidity. The best performing approach is to use machine learning regression to forecast the deviation between the realized volatility and the VIX-styled index's prediction, which is a proxy of the variance risk premium. Intuitively, this approach can be interpreted as applying a machine learning algorithm to minimize the deviation between a human model's prediction and the actual outcome. Therefore, our results represent a successful combination of human learning and machine learning. We also discuss the suitability of different regression algorithms for volatility indexing. As we will show, the tradability condition in fact imposes a strong constraint on algorithm selection, which most models do not satisfy except for piecewise-linear ones such as an FNN with a ReLU activation function. Additionally, we employ a machine learning feature importance method to test every option's contribution to the prediction. We find that an option's predictive power for realized variance increases with its out-of-the-moneyness, and that calls on average have higher predictive power than puts.

This paper joins a large literature on variance forecasting. In the past, there has been great progress on time series-based models such as ARCH/GARCH (Engle, 1982; Bollerslev, 1986) and HAR (Corsi, 2009). It has been shown that historical volatility measures exhibit predictive power for future realized volatility. To this end, we also experiment with historical volatility features as alternative tests to the main model, and we find that their contribution is limited in our framework. There has also been abundant study of the predictive content of option prices and implied volatility for realized volatility (Bandi et al., 2008; Andersen et al., 2007; Fleming, 1998; Busch et al., 2011). More recently, applications of machine learning techniques in volatility forecasting have emerged (Luong and Dokuchaev, 2018; Hamid, 2004). To the best of our knowledge, this paper is the first attempt to apply machine learning to predict volatility and build a volatility index at the same time.

The rest of this paper is organized as follows. In Section 2 we review the construction of VIX-styled volatility indices and their caveats. In Section 3 we show how to process option data to formulate a realized variance prediction problem; we also discuss model selection and evaluation. This is followed by Section 4, where we present the main results on prediction performance. Section 5 concludes and presents directions for future research.

2 Volatility index and variance prediction

Established in 1993, the CBOE Volatility Index (VIX) is one of the first equity volatility indices and it has been broadly used as the volatility/variance benchmark for the entire US equity market. Initially, VIX was built from only the at-the-money implied volatility of the underlying. In 2003, CBOE changed the pricing method to incorporate a variance swap pricing formula based on the prices of a large number of out-of-the-money options. This new method proved to be more robust as it covers a much wider range of the implied volatility surface. Since then, VIX and its tradable derivatives (i.e., options, futures and exchange-traded products) have grown considerably more popular. In recent years, CBOE has rolled out a series of VIX-styled volatility indices on other assets, including major equity indices, interest rates, FX and commodities. In this section, we review the details of the variance swap pricing-based index weighting method. Schematically, the index level is calculated as the square root of the value of a portfolio of out-of-the-money (OTM) SPX options with weights inversely proportional to the squared strike prices:

$$\mathrm{VIX}^2 \sim \sum_{K} \frac{2\Delta K}{K^2}\, O(K, \tau), \qquad (1)$$

where O(K, τ) is the price of an option with strike price K and time-to-maturity τ, and ΔK is the constant spacing between strikes. The 2ΔK/K² weighting scheme comes from the variance swap pricing formula in (Carr and Madan, 2001; Carr and Wu, 2009). The weights are determined such that the portfolio's payoff perfectly replicates the variance of the underlying in the risk-neutral measure.

As the same underlying security has numerous active options with different strikes and times-to-maturity, it is important to have selection criteria for the options that enter Equation (1). Since VIX is designed for a 30-day horizon, ideally one would choose options expiring in exactly 30 days. However, this is not realistic, as most of the time such options are not available in the market. CBOE applies a more general two-step procedure to select option tenors. Namely, one selects 'near-term' options with maturity τ1 closest to yet less than 30 days, and 'next-term' options with maturity τ2 closest to yet greater than 30 days. These two tenors are then linearly interpolated to formulate an exact 30-day horizon. Secondly, for each term, one selects as many out-of-the-money options as possible, with ascending (for calls) or descending (for puts) strikes K. For both terms, strikes are included until two consecutive strikes are missing a quote. As such, the number of strikes varies over time, depending heavily on the liquidity of the option market.


Once the options are selected, one computes the implied variance for each term:

$$\sigma_1^2 = \frac{2}{\tau_1}\sum_{h}\frac{\Delta K}{K_h^2}\, e^{R_1 \tau_1}\, O(K_h, \tau_1) \;-\; \frac{1}{\tau_1}\left(\frac{F}{K_0} - 1\right)^2,$$
$$\sigma_2^2 = \frac{2}{\tau_2}\sum_{h}\frac{\Delta K}{K_h^2}\, e^{R_2 \tau_2}\, O(K_h, \tau_2) \;-\; \frac{1}{\tau_2}\left(\frac{F}{K_0} - 1\right)^2, \qquad (2)$$

where ΔK = K_{h+1} − K_h is the spacing in strike prices, R_i is the interest rate for each term and F is the forward index value. The second term on the right-hand side of each equation is considerably smaller than the first. Finally, VIX is computed as the square root of a weighted sum of the right-hand sides of the above equations, up to scaling by 100:

$$\frac{\mathrm{VIX}}{100} = \sqrt{\tau_1 \sigma_1^2 \left(\frac{\tau_2 - 30}{\tau_2 - \tau_1}\right) + \tau_2 \sigma_2^2 \left(\frac{30 - \tau_1}{\tau_2 - \tau_1}\right)}. \qquad (3)$$
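For readers who want to reproduce the index construction, a minimal sketch of Equations (2) and (3) is given below. The array names and calling convention are hypothetical, and the day-count/annualization conventions of the official CBOE calculation are omitted; the code simply transcribes the two formulas as written.

```python
import numpy as np

def term_variance(strikes, prices, tau, rate, forward, k0, dK=5.0):
    """Implied variance of one option term, as in Equation (2).
    strikes/prices: selected OTM strikes and midquotes for this expiry;
    tau: time-to-maturity; rate: term interest rate; forward: forward index
    level; k0: at-the-money strike; dK: strike spacing."""
    strikes = np.asarray(strikes, dtype=float)
    prices = np.asarray(prices, dtype=float)
    first = (2.0 / tau) * np.sum(dK / strikes**2 * np.exp(rate * tau) * prices)
    second = (1.0 / tau) * (forward / k0 - 1.0) ** 2
    return first - second

def vix_like_level(tau1, var1, tau2, var2, horizon=30.0):
    """Blend the two term variances to the target horizon, Equation (3),
    and rescale by 100 to obtain the quoted index level."""
    w1 = (tau2 - horizon) / (tau2 - tau1)
    w2 = (horizon - tau1) / (tau2 - tau1)
    return 100.0 * np.sqrt(tau1 * var1 * w1 + tau2 * var2 * w2)
```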

When there are options with exactly 30 days to maturity, one can simply compute the variance term in Equation (2) for them without further interpolation. For more details, see the CBOE VIX White Paper. It is worth noting that, by its nature, this pricing formula-based weighting scheme overweights OTM put options and underweights OTM call options, as shown in Figure 1.

As mentioned above, the normalized2 level of VIX is commonly considered a benchmark forecast of the S&P's 30-day realized volatility. We will show that this measure indeed exhibits predictive power for realized volatility, measured by a positive out-of-sample R2. However, the weights determined in the risk-neutral measure may not have the most predictive power. Furthermore, the option selection criteria normally produce a large number of options. For instance, in the example in the CBOE VIX White Paper, with the spot price at $1960, the lowest put strike is $1370, the highest call strike is $2125, and the entire universe contains 149 options in total. Holding that many OTM options is extremely costly, as liquidity is very low for deep OTM options in general. This makes hedging challenging for sellers of VIX derivatives. For these reasons, we explore machine learning algorithms to improve the VIX weighting scheme's predictability and liquidity.

Figure 1 VIX weights as of January 2, 2019.


Figure 2 S&P 500 price (blue), its 30-day realized return volatility (orange) and VIX (red).


3 Formulating a machine learning regression problem

In this section we propose a new method to build volatility indices using a machine learning approach. More specifically, we set up a supervised machine learning regression problem that uses option prices to forecast future return volatility. By taking the regression loadings as weights, we then construct an option portfolio as the volatility index.

To begin with, we briefly review the basics of the supervised machine learning paradigm. Without loss of generality, a supervised ML algorithm can be summarized as a function approximation problem that aims to find a function f(·) such that

$$y = f(\vec{x}), \qquad f = \arg\min_{f}\ \mathrm{Err}\big(f(\vec{x}), y\big), \qquad (4)$$

where Err(f(x⃗), y) is a pre-defined objective function defined on the sample data set (y, x⃗). For many ML algorithms, f(·) is either semi-parametric (with a large number of parameters) or non-parametric (it can only be carried out operationally and does not have a closed-form expression). y is usually referred to as the target value and the entries of x⃗ are referred to as features. Next, let us specify the features and target value suitable for a volatility index.

3.1 Data processing and feature generation

We use daily data on options written on SPX spanning 1996 to 2016, which includes more than 5,000 unique trading days. The option market data are obtained from OptionMetrics. On each trading day, the data contain midquotes of options with multiple maturities and strikes. A sample data set is shown in Figure 3. Furthermore, interest rate data and SPX spot and forward prices are also used.

Generating features from option price time series proves to be a non-trivial task. There are two aspects of this data set that raise problems for a machine learning formulation. First, the actively traded options are those whose strikes are centered around the spot, which varies all the time. As a result, the set of OTM options needs to be re-selected daily. Moreover, the options' maturities continue to decrease until expiry.


Figure 3 A sample of the OptionMetrics option market data. The fields relevant to the analysis are trading date (Date), expiration date (expiry), spot price of SPX (S), short-term interest rate (R), call/put type (CP), bid price (Bid) and ask price (Ask).

In general, most machine learning algorithms require the features to be stationary. Since our goal is to use option prices to forecast realized volatility over a fixed horizon, it is also important to have the options' maturities in sync with the forecast horizon. Fortunately, the construction of VIX provides a solution to the varying-maturity issue: we can linearly interpolate the options with the maturities closest to the forecast horizon, as in Equation (3). Since regression problems require a fixed number of features, we also need to select a constant number of OTM options with strikes centered around the spot price. Put together, we generate raw option price features as follows:

$$O_t(K_h, T) = O_t(K_h, \tau_1)\left(\frac{\tau_2 - T}{\tau_2 - \tau_1}\right) + O_t(K_h, \tau_2)\left(\frac{T - \tau_1}{\tau_2 - \tau_1}\right), \qquad (5)$$

where T is the forecast horizon (in the case of VIX this is 30 days), and τ1 and τ2 are the two existing maturities closest to T among all available options on day t. For each day, we select N = 2n + 1 strikes Kh centered around the at-the-money strike K0:

$$\{K_{-n}, K_{-n+1}, \ldots, K_{-1}, K_0, K_1, \ldots, K_{n-1}, K_n\}, \qquad (6)$$

where ΔK = K_i − K_{i−1} (for SPX options, ΔK = $5). It should be understood that the at-the-money strike K0 changes every day, following the spot price S_t. Therefore, the strike set in Equation (6) is determined on a daily basis. Notice that as n becomes larger, the corresponding strike K_n is deeper out-of-the-money and the option tends to be less frequently traded. If the option at a specific K_h does not have a quote, we linearly interpolate its midquote from the midquotes of the options with the two closest strikes to K_h. This way, Equation (5) gives rise to a consistent feature generation procedure once the number of features N and the forecast horizon T are chosen for an option time series. The detailed feature generating procedure is shown in the code snippets at the end of this subsection.

For most machine learning algorithms, it is important to normalize features to meet stationarity conditions. For each feature, a common practice is to subtract its sample mean and divide by its sample standard deviation. We apply this technique and normalize the features using the training sets. In addition, there is a subtlety related to financial time series: a security's price is a non-stationary process, as otherwise there would be arbitrage. Therefore, the option price features in Equation (5) contain a specific non-stationarity that cannot be removed by the standard machine learning normalization procedure.


The rationale is that an SPX option worth $5 in 2010 is not the same as an option worth $5 in 2015, as the level of SPX has grown considerably over that period. For this reason, we apply the following pre-processing to the option price features in Equation (5), before the standard ML normalization:

$$O_t(K_h, T) \;\leftarrow\; \frac{O_t(K_h, T)}{K_h^2}. \qquad (7)$$
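A minimal sketch of this two-stage pre-processing, assuming a hypothetical feature matrix `raw_prices` (days by strikes) built from Equation (5) and a matching strike array: prices are first scaled by the squared strike as in Equation (7) and then z-scored using training-set statistics only, so no test-period information leaks into the features.

```python
import numpy as np

def preprocess_features(raw_prices, strikes, n_train):
    """raw_prices: (n_days, n_strikes) interpolated option prices from Equation (5);
    strikes: strike levels for the columns (treated as fixed here for simplicity,
    although in the data they are re-centered daily); n_train: training rows."""
    # Equation (7): remove the price-level non-stationarity by dividing by K^2.
    scaled = raw_prices / np.asarray(strikes, dtype=float) ** 2

    # Standard ML normalization using the training sample only.
    mu = scaled[:n_train].mean(axis=0)
    sigma = scaled[:n_train].std(axis=0)
    return (scaled - mu) / sigma
```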

In addition to the main option price features, we also experiment with returns-based idiosyncratic features of SPX as alternative tests. These include realized returns and variance with different look-back windows:

• Realized returns, r_{t,l} = (p_t − p_{t−l}) / p_{t−l}, with l ∈ {1, 5, 15, 30, 60, 90}
• Realized variance, var_{t,l} = (1/l) Σ_{i=1}^{l} (r_{t−i} − r̄)², with l ∈ {15, 30, 60, 90}

We only consider these features in combination with the option features, and not their predictive power individually. This is mainly because once these features are added, the prediction is no longer tradable, as they cannot be replicated by any portfolio.

3.2 Target value and two regression approaches

Though volatility and variance are virtually the same variable, we focus on forecasting future realized return variance. This avoids the extra step of taking the square root in Equation (1). The realized return variance between time t and t + T is given by

$$\mathrm{Var}^T_t = \frac{1}{T}\sum_{i=1}^{T}\left(r_{t+i} - \bar{r}\right)^2, \qquad (8)$$

where r_j = (p_j − p_{j−1}) / p_{j−1} is the security's daily return at time j and r̄ is the average return between t+1 and t+T. Sometimes the mean return r̄ is omitted, as it is close to zero in most cases. Volatility is the square root of the variance and is more often quoted in the derivative markets. In this paper, we use the terms volatility and variance interchangeably.

Our first approach is to directly model future realized variance as a function of option prices. Mathematically, this is

$$\mathrm{Var}^T_t = f\big(\{O_t(K_i, T'_j)\}\big) + \varepsilon_t, \qquad (9)$$

where {O_t(K_i, T'_j)} denotes all the options across the selected strikes and tenors and ε_t is a zero-mean noise term.

Algorithm 1. Strike selection

Input: {1, 2, ..., D} = timestamps; ATM_t = ATM strike on day t; n = number of selected OTM puts/calls
Output: KS = selected strikes

KS ← ∅
for each day t ∈ {1, 2, ..., D} do
    KS_p ← ∅, KS_c ← ∅
    for i ∈ {1, ..., n} do
        Append (ATM_t − i·ΔK) to KS_p
        Append (ATM_t + i·ΔK) to KS_c
    end
    KS_t ← KS_p ∪ KS_c ∪ {ATM_t}
    Append KS_t to KS
end


Algorithm 2. Data processing and feature engineering

Input: {1, 2, ..., D} = timestamps; KS = selected strikes; T = forecast horizon
Output: {O} = option features; VIX* = synthetic VIX index

for each day t ∈ {1, 2, ..., D} do
    if on day t there exist options with an exact 30-day maturity then
        Select all options with tenor τ = 30 and strike K ∈ KS_t
        if any selected option is missing a price then
            Fill missing prices by linear interpolation within the selected strike set
        end
        Collect the selected options' prices {O_t}; compute VIX*_t with τ = 30
    else
        Find near-term τ_1 ← argmin_{τ<30} |τ − 30|
        Find next-term τ_2 ← argmin_{τ>30} |τ − 30|
        for τ ∈ {τ_1, τ_2} do
            if any selected option is missing a price then
                Fill missing prices by linear interpolation within the selected strike set
            end
            Collect the selected options' prices {O_t}; compute VIX*_t from τ_1, τ_2 using Equation (3)
        end
        Generate features from the options' prices at τ_1 and τ_2 via Equation (5)
    end
end
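A Python sketch of Algorithms 1 and 2 for a single day is given below: strikes are selected around the ATM strike, missing quotes are filled by interpolation across strikes, and the two tenors nearest the forecast horizon are blended as in Equation (5). The quote container (a dict keyed by strike and tenor) is a hypothetical stand-in for the OptionMetrics data layout.

```python
import numpy as np

def select_strikes(atm_strike, n, dK=5.0):
    """Algorithm 1: n OTM put strikes below ATM, n OTM call strikes above, plus ATM."""
    return atm_strike + np.arange(-n, n + 1) * dK

def day_features(quotes, atm_strike, n, tau1, tau2, horizon=30.0, dK=5.0):
    """Equation (5): interpolate the selected options' prices from the two tenors
    nearest the forecast horizon. `quotes` maps (strike, tau) to a midquote."""
    strikes = select_strikes(atm_strike, n, dK)
    w1 = (tau2 - horizon) / (tau2 - tau1)
    w2 = (horizon - tau1) / (tau2 - tau1)

    per_tenor = []
    for tau in (tau1, tau2):
        prices = np.array([quotes.get((k, tau), np.nan) for k in strikes], dtype=float)
        missing = np.isnan(prices)
        if missing.any():
            # Algorithm 2: fill missing midquotes by linear interpolation in strike.
            prices[missing] = np.interp(strikes[missing], strikes[~missing],
                                        prices[~missing])
        per_tenor.append(prices)

    return per_tenor[0] * w1 + per_tenor[1] * w2
```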

As mentioned above, the function f is determined by fitting the algorithm on the training data set. We call this approach Regression I.

An ML-based function approximation such as Equation (9) can be useful for estimating functions that are highly non-linear or non-parametric. This makes Equation (9) desirable, as realized variance is expected to be strongly non-linear in the features. However, the large number of parameters and the complex optimization involved in ML algorithms may also be problematic, as they can lead to overfitting due to outliers in the training set (e.g. extremely high volatility events). Additionally, as the VIX-styled weighting scheme already contains some predictive power, in principle we would like to incorporate it into the regression approach too. This leads to our second regression approach:

$$\mathrm{Var}^T_t = \mathrm{VIX}^{*}\big(\{O_t(K_i, T'_j)\}\big)^2 + f\big(\{O_t(K_i, T'_j)\}\big) + \varepsilon_t, \qquad (10)$$

where VIX* is the synthetic VIX index built from the selected options. The only difference between VIX* and VIX is that VIX* contains a fixed number of options throughout time. We call the approach in Equation (10) Regression II.

Note that if one moves the VIX*2 term to the left-hand side, Equation (10) is equivalent to regressing a variance risk premium (VRP) proxy, which is

$$\mathrm{VRP}_t = \mathrm{VIX}^{*2}_t - \mathrm{Var}_t. \qquad (11)$$


So this approach trains the algorithm to forecast VRP_t and combines it with VIX* to forecast realized volatility. Though Equation (10) is a special case of Equation (9), the two approaches in general produce different results in non-linear cases. Regressing the difference between Var^T_t and VIX*2, rather than directly regressing Var^T_t, can be considered a normalization procedure. In non-linear models, proper normalization can often substantially improve an algorithm's fitting and performance. We will show that Regression II indeed outperforms Regression I.
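In implementation terms, Regression II only changes the target handed to the learner: the model is fitted on Var^T_t − VIX*2_t, the negative of the VRP proxy in Equation (11), and the synthetic index level is added back at prediction time. A minimal sketch with a generic scikit-learn style estimator (array names are assumptions):

```python
def fit_regression_ii(model, X_train, var_train, vix2_train):
    """Fit the learner on the gap between realized variance and the synthetic
    VIX*^2 level, i.e. minus the VRP proxy of Equation (11)."""
    model.fit(X_train, var_train - vix2_train)
    return model

def predict_regression_ii(model, X_test, vix2_test):
    """Add VIX*^2 back so the forecast is on the realized-variance scale and the
    OOS R^2 is directly comparable with Regression I."""
    return vix2_test + model.predict(X_test)
```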

3.3 Algorithms

One advantage of machine learning is that the algorithm can be highly adaptive. In the form of Equation (4), this means that f usually does not have a closed-form expression. Since we are focused on an indexing problem, this flexibility can cause trouble. In both regression approaches, Equations (9) and (10), if we require that the forecast can be replicated by an option portfolio, we need f(·) to be at least a piecewise-linear function. This is actually a strong constraint that renders many ML regression methods unsuitable. For instance, in K-nearest neighbors regression the forecast is the mean of a sub-sample of the training set and is clearly not piecewise linear. In that case, one obviously cannot build an option portfolio whose payout is the forecast.

In this paper, we consider four regression algorithms: linear regression, ridge regression, feedforward neural network regression with a ReLU activation function, and random forest regression. We start with linear regression for its simplicity. Compared with ML algorithms, linear regression has the advantage that it does not involve tunable hyperparameters, which makes it the least prone to overfitting. However, linear regression also has a few shortcomings, such as its lack of non-linearity. Another potential issue is that, since our main feature set is a basket of options whose prices are strongly correlated, the ordinary least squares fit may be unstable. For this reason, we consider ridge regression, which has the same functional form as linear regression but whose error function is L2-regularized.

A feedforward neural network (FNN) is one of the simplest neural network models. The graphical representation of a neural network is composed of a number of layers, including an input layer corresponding to the features and an output layer corresponding to the labels; the rest are referred to as hidden layers. Each layer contains several neurons, and different layers are connected by a certain topology. Mathematically, both neurons and connections between neurons correspond to variables of the prediction function. A feedforward neural network with one hidden layer has the following functional form:

$$y(\boldsymbol{x}, \boldsymbol{w}) = O\!\left(\sum_{j} w^{(2)}_{j}\, h\!\left(\sum_{i} w^{(1)}_{ji} x_i + w^{(1)}_{j0}\right) + w^{(2)}_{0}\right), \qquad (12)$$

where O and h are the so-called activation functions on the output and hidden layers, and w^(p)_{ji} is the connection weight from the i-th neuron on the (p−1)-th layer to the j-th neuron on the p-th layer. Activation functions filter the input information and determine the amount of output information, and are usually non-linear. For our purposes, we select the rectified linear unit (ReLU) as the activation function for both the hidden layer and the output layer, because among the most commonly used activation functions, only ReLU is both non-linear and piecewise linear. ReLU is defined as

$$\mathrm{ReLU}(x) = \max\{0, x\}. \qquad (13)$$


With a ReLU FNN, y is piecewise linear:

$$y(\boldsymbol{x}, \boldsymbol{w}) = \sum_{i,j} \mathbb{1}_{(2),j}\, w^{(2)}_{j} w^{(1)}_{ji}\, x_i + \mathrm{const.}, \qquad (14)$$

where 1_{(p),k} is an indicator function corresponding to the ReLU activation on the k-th neuron of the p-th layer. It is now evident that a ReLU FNN is piecewise linear and thus satisfies our tradability constraint. For model simplicity, we construct the FNN with only one hidden layer, whose number of neurons equals that of the input layer.
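One way to instantiate this network is scikit-learn's MLPRegressor, as sketched below; this is an assumption about the implementation rather than the authors' exact code, and MLPRegressor applies an identity output activation instead of the output-layer ReLU in Equation (12), which does not affect the piecewise linearity needed for tradability.

```python
from sklearn.neural_network import MLPRegressor

def build_relu_fnn(n_features, l2_strength=1e-2):
    """One hidden layer with as many neurons as input features and ReLU
    activation; `alpha` is the L2 regularization strength tuned in Section 3.4,
    and Adam is used as the solver, as in the paper."""
    return MLPRegressor(hidden_layer_sizes=(n_features,),
                        activation="relu",
                        solver="adam",
                        alpha=l2_strength,
                        max_iter=2000,
                        random_state=0)
```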

By contrast, some ML algorithms are not piecewise linear. Random forest is a very popular algorithm in this category. Random forest is an ensemble algorithm that consists of multiple base learners called decision trees. In decision tree regression, the algorithm iteratively splits the input data on the features such that the information gain from splitting a parent sample into child samples is optimized. After the algorithm is trained, each sample (whether from the training or the test set) can be run through the tree and lands in one of the many sub-samples obtained by splitting the original training sample. The prediction is then given as the mean value of the dependent variable in that specific sub-sample. In random forest regression, multiple decision tree regressions are fitted, each on a randomly sampled training set to increase robustness, and the final prediction is an average of the predictions of all decision trees. So, schematically, the forecast from random forest regression is

$$y(x) = \frac{1}{N_I}\sum_{i \in I} y_i, \qquad (15)$$

where I is the subset that (x, y) belongs to, N_I is the total number of samples in this subset and y_i is the value of the dependent variable of the i-th sample in the subset. In our setting, this means that when using random forest, the predicted variance is a function of the realized variances at selected past timestamps, which clearly cannot be the payoff of an option portfolio. More directly, Equation (15) is not a piecewise-linear function of the features. Nonetheless, we keep random forest in the tests for predictability comparison.

3.4 Validation, evaluation and model calibration

To evaluate all algorithms and feature combinations, we formulate an out-of-sample (OOS) test on the data set in a time series fashion. To do so, we reserve the first 1,000 observations for both hyperparameter optimization and initial training, and we make OOS predictions on the first 30 observations of the remaining set. We then re-train the model after every 30 observations with all available observations, on a rolling basis. Each time, the training set is purged to get rid of observations that have an informational overlap with the test set (see Lopez de Prado (2018) for similar techniques). Obviously, a different combination of training set size and model retraining frequency may change each model's performance. We do not test other validation settings, as this may cause overfitting given the limited amount of data. A graphic representation of the OOS test is shown in Figure 4.

As the prediction performance metric, we use the OOS R2,

$$R^2 = 1 - \frac{\sum_i (y_i - p_i)^2}{\sum_i (y_i - \bar{y})^2}, \qquad (16)$$

where y_i is the actual realized variance for sample i, ȳ is the mean realized variance of all samples and p_i is the model-predicted realized variance for sample i. For Regression II (Equation (10)), even though the direct prediction is the negative variance risk premium, we convert it back to realized variance by adding VIX*2 when computing the R2. This way, the R2 values of the two regression approaches are comparable.


Figure 4 An illustration of the rolling validation process.

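The rolling scheme of Figure 4 together with the OOS R2 of Equation (16) can be sketched as follows. The arrays X, y and (for Regression II) vix2 are assumed to be aligned by date, and the purging of overlapping observations is omitted for brevity.

```python
import numpy as np
from sklearn.base import clone

def rolling_oos_r2(model, X, y, vix2=None, init_train=1000, step=30):
    """Re-fit every `step` observations on all data seen so far and pool the
    out-of-sample predictions; passing `vix2` switches to Regression II."""
    preds, actuals = [], []
    for start in range(init_train, len(y), step):
        end = min(start + step, len(y))
        m = clone(model)
        target = y[:start] if vix2 is None else y[:start] - vix2[:start]
        m.fit(X[:start], target)
        p = m.predict(X[start:end])
        if vix2 is not None:
            p = p + vix2[start:end]      # convert back to realized variance
        preds.append(p)
        actuals.append(y[start:end])
    p, a = np.concatenate(preds), np.concatenate(actuals)
    return 1.0 - np.sum((a - p) ** 2) / np.sum((a - a.mean()) ** 2)
```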

Most, if not all, ML algorithms contain multiple hyperparameters that need to be tuned. A common practice is to evaluate a large number of hyperparameter combinations using cross-validation and choose the combination that performs best. It is worth noting that tuning hyperparameters unavoidably raises the likelihood of overfitting as a trade-off. This is especially true with financial time series data, which exhibit a very low signal-to-noise ratio. Therefore, we keep the number and kind of hyperparameters as low as possible. To achieve this, for each of the three selected algorithms, we only tune a single hyperparameter associated with the regularization strength. More specifically:

• for Ridge, the L2 regularization strength λ ∈ {10⁻³, 10⁻², 10⁻¹, 1, 10², 10³, 10⁴};
• for ReLU FNN, the L2 regularization strength λ ∈ {10⁻³, 10⁻², 10⁻¹, 1, 10², 10³, 10⁴};
• for Random Forest, the maximum tree depth ∈ {3, 5, 10, ∞}.

We note that the L2 regularization strength for both Ridge and the ReLU FNN in Regression II (Equation (10)) has a fairly clear interpretation: it controls how far the new weights deviate from the original VIX weights. For each algorithm, we optimize the hyperparameter on the initial 1,000-sample training set by taking the first 900 samples for training and the remaining 100 samples for validation. For the FNN, we use Adaptive Moment Estimation (Adam) as the solver. All algorithms are implemented in Python using the Scikit-learn package (Pedregosa et al., 2011).
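A sketch of this tuning step on the initial window, using the grids listed above; the model constructors and the 900/100 split are written out explicitly, and any setting not stated in the text (e.g. the number of trees, iteration limits) is an assumption.

```python
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.ensemble import RandomForestRegressor
from sklearn.neural_network import MLPRegressor

L2_GRID = [1e-3, 1e-2, 1e-1, 1.0, 1e2, 1e3, 1e4]
DEPTH_GRID = [3, 5, 10, None]                  # None = unbounded depth

def tune_hyperparameters(X, y, n_features):
    """Pick each algorithm's single regularization hyperparameter on the first
    1,000 samples: 900 for fitting, the remaining 100 for validation."""
    X_tr, y_tr = X[:900], y[:900]
    X_va, y_va = X[900:1000], y[900:1000]

    def best(make_model, grid):
        scores = [make_model(g).fit(X_tr, y_tr).score(X_va, y_va) for g in grid]
        return grid[int(np.argmax(scores))]    # highest validation R^2

    return {
        "ridge_alpha": best(lambda a: Ridge(alpha=a), L2_GRID),
        "fnn_alpha": best(lambda a: MLPRegressor(hidden_layer_sizes=(n_features,),
                                                 activation="relu", solver="adam",
                                                 alpha=a, max_iter=2000,
                                                 random_state=0), L2_GRID),
        "rf_max_depth": best(lambda d: RandomForestRegressor(max_depth=d,
                                                             random_state=0),
                             DEPTH_GRID),
    }
```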

4 Main results

In this section we present our main findings. First, we show the OOS R2 for both regression methods with all four algorithms. We conduct the experiment with a varying number of consecutive strike prices and two different forecast horizons, and compare performance across algorithms. Then we report the results of some alternative tests, including using non-consecutive strike prices and adding historical volatility and returns as extra features. Last but not least, we apply a robust machine learning feature importance analysis to the option price features to test each option's contribution to forecasting future realized volatility. To our surprise, it turns out that calls have higher importance than puts, in contrast to VIX's weighting scheme.

4.1 T = 30 days

The first forecast horizon we test is 30 days, which is the designated horizon for VIX. For each algorithm, we use 2k + 1 consecutive options as features (k OTM puts, k OTM calls and 1 ATM), with k ∈ {10, 20, 30, 40}. We present the performance of the two regression approaches and highlight the best performing algorithm for each number of options.


• OOS R2 for Regression I:

# of options   VIX*^2   Linear   Ridge    RF       FNN
21             0.169    0.313    0.276   -0.018    0.154
41             0.313    0.191    0.329   -0.021    0.114
61             0.366    0.163    0.339   -0.023    0.251
81             0.389    0.153    0.334   -0.026    0.339

• OOS R2 for Regression II:

# of options   VIX*^2   Linear   Ridge    RF       FNN
21             0.169    0.313    0.290    0.227    0.227
41             0.313    0.191    0.329    0.327    0.326
61             0.366    0.163    0.326    0.371    0.368
81             0.389    0.153    0.312    0.392    0.394

Combining the two regression methods' results and the best performance of each algorithm for each number of options, the overall model comparison is shown in Figure 5. First, it is apparent that Regression II performs better than Regression I, as the OOS R2 of the former is higher than that of the latter for all non-linear algorithms (for VIX* and linear regression, I and II are equivalent). It is also clear that including more options enhances the OOS R2, except for linear regression. This is because adding more options whose prices are highly correlated deteriorates the linear regression fit.

Figure 5 T = 30 days. OOS R2 for different algorithms, optimal performance combining Regressions I and II.

Nonetheless, when the number of options is small (21), linear regression actually outperforms the others. As the number of options increases, FNN with Regression II becomes the best, though its edge over the other methods is only incremental.3

4.2 T = 60 days

Next we conduct a similar test with T = 60 days. We present the performance of the two regression approaches and highlight the best performing algorithm for each number of options.

• OOS R2 for Regression I:

# of options   VIX*^2   Linear   Ridge    RF       FNN
21             0.300    0.264    0.226   -0.015    0.094
41             0.206    0.269    0.276   -0.023    0.099
61             0.014    0.247    0.303   -0.022    0.140
81            -0.155    0.216    0.313   -0.025    0.146

• OOS R2 for Regression II:

# of options   VIX*^2   Linear   Ridge    RF       FNN
21             0.300    0.264    0.247    0.296    0.297
41             0.206    0.269    0.288    0.339    0.339
61             0.014    0.247    0.299    0.299    0.297
81            -0.155    0.216    0.294    0.243    0.244

Combining the two regression methods' results and the best performance of each algorithm for each number of options, the overall model comparison is shown in Figure 6. Interestingly, as the forecast horizon grows, the benefit of including more options diminishes. In most cases, performance deteriorates once the number of options exceeds a certain value. For the 60-day horizon the optimal number of options is 41, with FNN, Ridge and Random Forest performing quite similarly.

4.3 Alternative tests

To further verify our models' predictability, we run a couple of alternative tests in addition to the results reported above.


Figure 6 T = 60 days. OOS R2 for different algorithms, optimal performance combining Regressions I and II.

First, as mentioned in Section 3.1, we add returns-based features to the main option-based features. Since a key feature of our approach is to maintain both predictability and tradability, it is important to note that these features are not tradable and cannot be included in an index. Nonetheless, we add them to see how much predictive power they contribute beyond the main option features. To do this, we re-run the OOS test using Regression II for T = 30 days, with 41 consecutive options and the 10 realized returns and variance features. The OOS R2 values are:

Algorithm       Options only   Options + realized returns/variance
Linear          0.191          0.124
Ridge           0.329          0.335
Random Forest   0.327          0.331
FNN             0.326          0.330

It shows that, except for linear regression, including the realized returns and variance features can improve performance, even though the change is incremental.

Since options on the same underlying tend to have strong correlations, it is interesting to see if one can increase the strike spacing when selecting options. By selecting options with larger distances between their strikes, one gets the benefit of using fewer options, which corresponds to greater liquidity. So far we have only tested the algorithms with consecutive strikes, namely a spacing of $5. Here we re-run the FNN using Regression II with 21 and 41 options and an increased spacing of $10. The results are quite interesting:

# of options   ΔK = 5   ΔK = 10
21             0.227    0.253
41             0.326    0.268

It shows that having consecutive strikes is in fact necessary to maintain predictability: when the number of options is 41, the performances with ΔK = 5 and ΔK = 10 diverge and the larger-spacing set performs less well.

4.4 Feature importance for calls and puts

To better understand the mechanism of the new approach, we investigate how the regression methods weight the options across strikes. A straightforward way to check this is to compare the new weights in Equations (9) and (10) with VIX's weights in Equation (3). However, this has a drawback: as the new weights are determined by regression loadings fitted on training data, larger weights do not necessarily correspond to higher importance for OOS prediction. For this reason, we apply a feature importance measure called mean-decreased-accuracy (MDA), a method extensively used in the machine learning literature.

The MDA procedure computes the importance of each feature as follows: (a) one trains an algorithm on the training set using all the original features; (b) predictions are made on the test set, and a performance measure (e.g. R2) is recorded as p0; (c) the values of one of the features, i, in the test set are randomly shuffled and predictions are re-made on the test set; and (d) the performance associated with the shuffled feature i is recorded as pi. The MDA feature importance for i is then

$$\mathrm{MDA}(i) = \frac{p_0 - p_i}{p_0}. \qquad (17)$$

We apply MDA to the data set with the following specification: Regression II with T = 30 days and 61 options, using Ridge regression. The feature importance is shown in Figure 7. We observe two interesting patterns: (a) for both puts and calls, the more OTM the option, the higher its feature importance; (b) calls are on average more important than puts. This is surprising, as the importances differ markedly from VIX's weighting scheme. It further highlights the distinction between results derived in the risk-neutral measure and OOS predictions.
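MDA is essentially permutation importance. A minimal version of the four steps above, for an already-fitted model and hypothetical test arrays:

```python
import numpy as np

def mda_importance(model, X_test, y_test, seed=0):
    """Relative drop in OOS R^2 when a single feature column is shuffled,
    as in Equation (17)."""
    rng = np.random.default_rng(seed)
    p0 = model.score(X_test, y_test)            # baseline OOS R^2
    importance = np.zeros(X_test.shape[1])
    for i in range(X_test.shape[1]):
        X_perm = X_test.copy()
        rng.shuffle(X_perm[:, i])               # destroy the information in feature i
        importance[i] = (p0 - model.score(X_perm, y_test)) / p0
    return importance
```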

5 Conclusion

In this paper we focused on a machine learning-based realized variance prediction and indexing problem. Inspired by the predictive power of VIX, as a volatility index, for SPX's 30-day realized variance, we proposed a new indexing method. More specifically, we formulated a regression problem to predict realized variance using option prices as features, based on SPX and its option data. We tested algorithms including linear and machine learning regression methods such as Ridge, Feedforward Neural Networks (FNN) and Random Forest. It was demonstrated that when the algorithm is piecewise linear, the prediction can be replicated by an option portfolio, which gives rise to a new volatility index.

We discovered that by combining the prediction model and the VIX-styled weighting scheme, the new index can achieve greater predictability and liquidity.

Figure 7 Feature importance across all options. The x-axis is the option's moneyness, e.g. −30 stands for the OTM put struck −30·ΔK away from ATM; hence the left half is for OTM puts and the right half is for OTM calls. The y-axis is the feature importance of each option, measured by the percentage improvement to the OOS volatility prediction.


In this approach, the machine learning algorithms are applied to correct the deviation between VIX's prediction and the actual realized volatility. It was shown that this approach outperformed the VIX-styled weighting scheme and machine learning used individually. Therefore, we argue that it represents a successful combination of human learning and machine learning. In addition, we employed a machine learning feature importance method to test every option's contribution to the prediction. While in VIX's weighting scheme the weights decrease monotonically with the strike price, we found that in the new approach the more out-of-the-money options have greater predictive power, and calls on average have higher importance than puts.

Before closing, we highlight a few future directions along this line. First, it is worth looking into the predictive power of other predictors, such as macro and cross-asset factors. If these features are tradable, namely they can be replicated by tradable securities, then the method can be generalized to a broader volatility index consisting of options and other instruments. Secondly, it will be useful to test the framework with higher-frequency data. With potentially much larger data sets, the power of machine learning may be enhanced. It will also be interesting to generalize the Regression II approach in this paper to other forecasting problems and to investigate whether ML can improve existing parametric models. As shown in this paper, the human + machine learning approach can outperform each of them individually. It will be intriguing to investigate whether this works in other areas.

Notes

1 Throughout this paper, we will use the terms volatility and variance interchangeably. For most cases, they are equivalent, as volatility is just the square root of variance. Nonetheless, our prediction is formulated on variance instead of volatility, as discussed in Section 3.
2 Normalization is to divide the level by 100 · √252.
3 In Gu et al. (2018), the authors report a similar incremental enhancement from ML in the case of asset pricing.

Acknowledgment

We would like to thank Vasant Dhar, Rajesh Krishnamachari, Dacheng Xiu and Haoxiang Zhu for their valuable comments and suggestions. They are not responsible for any errors.

References

Andersen, Torben G., Frederiksen, Per, and Staal, Arne D. (2007). "The Information Content of Realized Volatility Forecasts," Northwestern University, Nordea Bank, and Lehman Brothers.

Black, F. and Scholes, M. (1973). "The Pricing of Options and Corporate Liabilities," Journal of Political Economy 81(3), 637–654.

Busch, Thomas, Christensen, Bent Jesper, and Nielsen, Morten Ørregaard. (2011). "The Role of Implied Volatility in Forecasting Future Realized Volatility and Jumps in Foreign Exchange, Stock, and Bond Markets," Journal of Econometrics 160(1), 48–57.

Bollerslev, Tim. (1986). "Generalized Autoregressive Conditional Heteroskedasticity," Journal of Econometrics 31(3), 307–327.

Carr, Peter and Madan, Dilip. (2001). "Towards a Theory of Volatility Trading," Option Pricing, Interest Rates and Risk Management, Handbooks in Mathematical Finance, 458–476.

Carr, Peter and Wu, Liuren. (2009). "Variance Risk Premiums," The Review of Financial Studies 22(3), 1311–1341.

Dupire, Bruno. (1994). "Pricing with a Smile," Risk 7(1), 18–20.

Engle, Robert F. (1982). "Autoregressive Conditional Heteroscedasticity with Estimates of the Variance of United Kingdom Inflation," Econometrica 50(4), 987–1007.

Corsi, Fulvio. (2009). "A Simple Approximate Long-Memory Model of Realized Volatility," Journal of Financial Econometrics 7(2), 174–196.

Bandi, Federico M., Russell, Jeffrey R., and Yang, Chen. (2008). "Realized Volatility Forecasting and Option Pricing," Journal of Econometrics 147(1), 34–46.

Gu, S., Kelly, B., and Xiu, D. (2018). "Empirical Asset Pricing via Machine Learning."

Hamid, Shaikh A. (2004). "Primer on Using Neural Networks for Forecasting Market Variables."


Heston, Steven L. (1993). "A Closed-Form Solution for Options with Stochastic Volatility with Applications to Bond and Currency Options," The Review of Financial Studies 6(2), 327–343.

Fleming, Jeff. (1998). "The Quality of Market Volatility Forecasts Implied by S&P 100 Index Option Prices," Journal of Empirical Finance 5(4), 317–345.

Luong, Chuong and Dokuchaev, Nikolai. (2018). "Forecasting of Realised Volatility with the Random Forests Algorithm," Journal of Risk and Financial Management 11(4), 61.

Lopez de Prado, M. (2018). Advances in Financial Machine Learning. Wiley.

Pedregosa, F., Varoquaux, G., Gramfort, A., Michel, V., Thirion, B., Grisel, O., Blondel, M., Prettenhofer, P., Weiss, R., Dubourg, V., Vanderplas, J., Passos, A., Cournapeau, D., Brucher, M., Perrot, M., and Duchesnay, E. (2011). "Scikit-learn: Machine Learning in Python," Journal of Machine Learning Research 12, 2825–2830.

"VIX White Paper," CBOE.
