A Tale of Two Time Scales: Determining Integrated Volatility With Noisy High-Frequency Data

Lan ZHANG, Per A. MYKLAND, and Yacine AÏT-SAHALIA

It is a common practice in finance to estimate volatility from the sum of frequently sampled squared returns. However, market microstructure poses challenges to this estimation approach, as evidenced by recent empirical studies in finance. The present work attempts to lay out theoretical grounds that reconcile continuous-time modeling and discrete-time samples. We propose an estimation approach that takes advantage of the rich sources in tick-by-tick data while preserving the continuous-time assumption on the underlying returns. Under our framework, it becomes clear why and where the "usual" volatility estimator fails when the returns are sampled at the highest frequencies. If the noise is asymptotically small, our work provides a way of finding the optimal sampling frequency. A better approach, the "two-scales estimator," works for any size of the noise.

KEY WORDS: Bias-correction; Market microstructure; Martingale; Measurement error; Realized volatility; Subsampling.

1. INTRODUCTION

1.1 High-Frequency Financial Data With Noise

In the analysis of high-frequency financial data, a major problem concerns the nonparametric determination of the volatility of an asset return process. A common practice is to estimate volatility from the sum of the frequently sampled squared returns. Although this approach is justified under the assumption of a continuous stochastic model in an idealized world, it runs into the challenge from market microstructure in real applications. We argue that this customary way of estimating volatility is flawed, in that it overlooks the observation error. The usual mechanism for dealing with the problem is to throw away a large fraction of the available data by sampling less frequently, or constructing "time-aggregated" returns from the underlying high-frequency asset prices.
Here we propose a statistically sounder device. Our device is model-free, takes advantage of the rich sources of tick-by-tick data, and to a great extent corrects for the adverse effects of microstructure noise on volatility estimation. In the course of constructing our estimator, it becomes clear why and where the "usual" volatility estimator fails when returns are sampled at high frequencies.

To fix ideas, let S_t denote the price process of a security, and suppose that the process X_t = log S_t follows an Itô process,

    dX_t = µ_t dt + σ_t dB_t,    (1)

where B_t is a standard Brownian motion. Typically, µ_t, the drift coefficient, and σ_t², the instantaneous variance of the returns process X_t, will be (continuous) stochastic processes. The parameter of interest is the integrated (cumulative) volatility over one or successive time periods, ∫_0^{T_1} σ_t² dt, ∫_{T_1}^{T_2} σ_t² dt, .... A natural way to estimate this object over, say, a single time interval from 0 to T is to use the sum of squared returns,

    [X, X]_T = Σ_{t_i} (X_{t_{i+1}} − X_{t_i})²,    (2)

Lan Zhang is Assistant Professor, Department of Statistics, Carnegie Mellon University, Pittsburgh, PA 15213 (E-mail: lzhang@stat.cmu.edu). Per A. Mykland is Professor, Department of Statistics, The University of Chicago, Chicago, IL 60637 (E-mail: mykland@galton.uchicago.edu). Yacine Aït-Sahalia is Professor, Department of Economics and Bendheim Center for Finance, Princeton University and NBER, Princeton, NJ 08544 (E-mail: yacine@princeton.edu). This research was supported by National Science Foundation grants DMS-02-04639 (Zhang and Mykland) and SBR-0111140 (Aït-Sahalia). The authors are grateful for comments received at the IMS New Researchers' Conference at UC Davis (July 2003), the Joint Statistical Meetings in San Francisco (August 2003), and the CIRANO Conference on Realized Volatility in Montréal (November 2003). They also thank Peter Hansen for interesting discussions and the referees for their feedback and suggestions.
where the X_{t_i}'s are all of the observations of the return process in [0, T]. The estimator Σ_{t_i} (X_{t_{i+1}} − X_{t_i})² is commonly used and generally called "realized volatility" or "realized variance." A sample of the recent literature on realized and integrated volatilities includes works by Hull and White (1987), Jacod and Protter (1998), Gallant, Hsu, and Tauchen (1999), Chernov and Ghysels (2000), Gloter (2000), Andersen, Bollerslev, Diebold, and Labys (2001), Dacorogna, Gençay, Müller, Olsen, and Pictet (2001), Barndorff-Nielsen and Shephard (2002), and Mykland and Zhang (2002).

Under model (1), the approximation in (2) is justified by theoretical results in stochastic processes stating that

    plim Σ_{t_i} (X_{t_{i+1}} − X_{t_i})² = ∫_0^T σ_t² dt,    (3)

as the sampling frequency increases. In other words, the estimation error of the realized volatility diminishes. According to (3), the realized volatility computed from the highest-frequency data should provide the best possible estimate for the integrated volatility ∫_0^T σ_t² dt.

However, this is not the general viewpoint adopted in the empirical finance literature. It is generally held there that the returns process X_t should not be sampled too often, regardless of the fact that the asset prices can often be observed with extremely high frequency, such as several times per second in some instances. It has been found empirically that the realized volatility estimator is not robust when the sampling interval is small. Such issues as larger bias in the estimate and nonrobustness to changes in the sampling interval have been reported (see, e.g., Brown 1990). In pure probability terms, the observed log return process is not in fact a semimartingale. Compelling visual evidence of this can be found by comparing a plot of the realized volatility as a function of the sampling frequency with theorem I.4.47 of Jacod and Shiryaev (2003); the realized volatility does not converge as the sampling frequency increases.
In particular, this phenomenon can be observed for any of the 30 stocks composing the Dow Jones Industrial Average. The main explanation for this phenomenon is a vast array of issues collectively known as market microstructure, including, but not limited to, the existence of the bid–ask spread. When

© 2005 American Statistical Association, Journal of the American Statistical Association, December 2005, Vol. 100, No. 472, Theory and Methods, p. 1394. DOI 10.1198/016214505000000169



prices are sampled at finer intervals, microstructure issues become more pronounced. The empirical finance literature then suggests that the bias induced by market microstructure effects makes the most finely sampled data unusable, and many authors prefer to sample over longer time horizons to obtain more reasonable estimates. The sampling length of the typical choices in the literature is ad hoc and usually ranges from 5 to 30 minutes for exchange rate data, for instance. If the original data are sampled once every second, say, then retaining an observation every 5 minutes amounts to discarding 299 out of every 300 data points.

This approach to handling the data poses a conundrum from the statistical standpoint. It is difficult to accept that throwing away data, especially in such quantities, can be an optimal solution. We argue here that sampling over longer horizons merely reduces the impact of microstructure, rather than quantifying and correcting its effect for volatility estimation. Ultimately, we seek a solution that makes use of all of the data, yet results in an unbiased and consistent estimator of the integrated volatility. Our contention is that the contamination due to market microstructure is, to first order, the same as what statisticians usually call "observation error." In other words, the transaction will be seen as an observation device for the underlying price process. We incorporate the observation error into the estimating procedure for integrated volatility. That is, we suppose that the return process as observed at the sampling times is of the form

    Y_{t_i} = X_{t_i} + ε_{t_i}.    (4)

Here X_t is a latent true, or efficient, return process that follows (1). The ε_{t_i}'s are independent noise around the true return.

In the process of constructing our final estimator for the integrated volatility, which we call the "first-best" approach, we are led to develop and analyze a number of intermediary estimators, starting with a "fifth-best" approach that computes realized volatility using all of the data available. This fifth-best approach has some clearly undesirable properties. In fact, we show in Section 2.2 that ignoring microstructure noise would have a devastating effect on the use of the realized volatility. Instead of (2), one gets

    Σ_{t_i, t_{i+1} ∈ [0,T]} (Y_{t_{i+1}} − Y_{t_i})² = 2nEε² + O_p(n^{1/2}),    (5)

where n is the number of sampling intervals over [0, T]. Thus the realized volatility estimates not the true integrated volatility, but rather the variance of the contamination noise. In fact, we show that the true integrated volatility, which is O_p(1), is even dwarfed by the magnitude of the asymptotically Gaussian O_p(n^{1/2}) term. This section also discusses why it is natural to want the quadratic variation of the latent process X, as opposed to trying to construct, say, prices with the help of some form of quadratic variation for Y.
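To see the order of magnitude in (5) concretely, here is a minimal simulation sketch (not the authors' code; the volatility and noise levels are hypothetical choices) of the constant-σ special case of models (1) and (4):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 23_400                      # one 6.5-hour day sampled every second
T = 1.0 / 252                   # one trading day in years (an assumption)
sigma, noise_sd = 0.3, 0.0005   # hypothetical annualized vol and noise level

# Latent log price X: constant-sigma special case of model (1), with mu = 0
dX = sigma * np.sqrt(T / n) * rng.standard_normal(n)
X = np.concatenate([[0.0], np.cumsum(dX)])
# Observed log price Y = X + eps, model (4)
Y = X + noise_sd * rng.standard_normal(n + 1)

rv_all = float(np.sum(np.diff(Y) ** 2))   # "fifth-best": realized volatility on all data
integrated_vol = sigma ** 2 * T           # true <X,X>_T in this constant-sigma case
print(rv_all, integrated_vol, 2 * n * noise_sd ** 2)
```

On a run like this, `rv_all` sits near the 2nEε² level and dwarfs the integrated volatility, exactly as (5) predicts.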

Faced with such dire consequences, the usual empirical practice, where one does not use all of the data, is likely to produce an improvement. We show that this is indeed the case by analyzing the estimation approach where one selects an arbitrary sparse sampling frequency (such as one every 5 minutes, for instance). We call this the "fourth-best" approach. A natural question to ask then is how that sparse sampling frequency should be determined. We show how to determine it optimally, by minimizing the mean squared error (MSE) of the sparse realized volatility estimator, yielding what we term the "third-best" estimator. Similar results to our third-best approach have been discussed independently (when σ_t is conditionally nonrandom) by Bandi and Russell (2003), in a paper presented at the 2003 CIRANO conference.

The third-best answer may be to sample, say, every 6 minutes and 15 seconds instead of an arbitrary 5 minutes. Thus, even if one determines the sampling frequency optimally, it remains the case that one is not using a large fraction of the available data. Driven by one of statistics' first principles, "thou shall not throw data away," we go further by proposing two estimators that make use of the full data. Our next estimator, the "second-best," involves sampling sparsely over subgrids of observations. For example, one could subsample every 5 minutes, starting with the first observation, then starting with the second, and so on. We then average the results obtained across those subgrids. We show that subsampling and averaging result in a substantial decrease in the bias of the estimator.

Finally, we show how to bias-correct this estimator, resulting in a centered estimator for the integrated volatility despite the presence of market microstructure noise. Our final estimator, the "first-best" estimator, effectively combines the second-best estimator (an average of realized volatilities estimated over subgrids, on a slow time scale of, say, 5 minutes each) with the fifth-best estimator (realized volatility estimated using all of the data), providing the bias-correction device. This combination of slow and fast time scales gives our article its title.

Models akin to (4) have been studied in the constant-σ case by Zhou (1996), who proposed a bias-correcting approach based on autocovariances. The behavior of this estimator has been studied by Zumbach, Corsi, and Trapletti (2002). Other contributions in this direction have been made by Hansen and Lunde (2004a,b), who considered time-varying σ in the conditionally nonrandom case, and by Oomen (2004). Efficient likelihood estimation of σ has been considered by Aït-Sahalia, Mykland, and Zhang (2005).

The article is organized as follows. Section 1.2 provides a summary of the five estimators. The main theory for these estimators, including asymptotic distributions, is developed successively in Sections 2–4 for the case of one time period [0, T]. The multiperiod problem is treated in Section 5. Section 6 discusses how to estimate the asymptotic variance for equidistant sampling. Section 7 presents the results of Monte Carlo simulations for all five estimators. Section 8 concludes and provides a discussion for the case where the process contains jumps. The Appendix provides proofs.

1.2 The Five Estimators: A User's Guide

Here we provide a preview of our results in Sections 2–4. To describe the results, we proceed through five estimators, from the worst to the (so far) best candidate. In this section all limit results and optimal choices are stated in the equidistantly sampled case; the more general results are given in the subsequent sections.


1.2.1 The Fifth-Best Approach: Completely Ignoring the Noise. The naïve estimator of quadratic variation is [Y, Y]_T^(all), which is the realized volatility based on the entire sample. We show in Section 2 that (5) holds. We therefore conclude that realized volatility [Y, Y]_T^(all) is not a reliable estimator for the true variation ⟨X, X⟩_T of the returns. For large n, the realized volatility diverges to infinity linearly in n. Scaled by (2n)⁻¹, it estimates consistently the variance of the microstructure noise, Eε², rather than the object of interest, ⟨X, X⟩_T. Said differently, market microstructure noise totally swamps the variance of the price signal at the level of the realized volatility. Our simulations in Section 7 also document this effect.

1.2.2 The Fourth-Best Approach: Sampling Sparsely at Some Lower Frequency. Of course, completely ignoring the noise and sampling as prescribed by [Y, Y]_T^(all) is not what empirical researchers do in practice. Much, if not all, of the existing literature seems to have settled on sampling sparsely, that is, constructing lower-frequency returns from the available data. For example, using essentially the same exchange rate series, these somewhat ad hoc choices range from 5-minute intervals to as long as 30 minutes. That is, the literature typically uses the estimator [Y, Y]_T^(sparse) described in Section 2.3. This involves taking a subsample of n_sparse observations. For example, with T = 1 day, or 6.5 hours of open trading for stocks, and if we start with data sampled on average every Δt = 1 second, then for the full dataset n = T/Δt = 23,400; but once we sample sparsely every 5 minutes, then we sample every 300th observation, and n_sparse = 78.
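The subsampling arithmetic above, and the resulting sparse estimator, can be sketched directly (the price series here is a hypothetical stand-in; the counts are those used in the text):

```python
import numpy as np

# Sketch of the "fourth-best" sparse-sampling arithmetic from the text.
T_seconds = int(6.5 * 60 * 60)  # one day of open trading for stocks, in seconds
n = T_seconds // 1              # data sampled on average every 1 second: n = 23,400
step = 300                      # keep one observation every 5 minutes
n_sparse = n // step            # number of sparse intervals: 78

# Sparse realized volatility on stand-in observed log prices (values hypothetical)
rng = np.random.default_rng(1)
Y = np.cumsum(0.0005 * rng.standard_normal(n + 1))
rv_sparse = float(np.sum(np.diff(Y[::step]) ** 2))   # [Y,Y]^(sparse) on the subgrid
print(n, n_sparse)   # 23400, 78
```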

The distribution of [Y, Y]_T^(sparse) is described by Lemma 1 and Proposition 1, both in Section 2.3. To first approximation,

    [Y, Y]_T^(sparse) ≈_L ⟨X, X⟩_T + 2 n_sparse Eε² + [4 n_sparse Eε⁴ + (2T/n_sparse) ∫_0^T σ_t⁴ dt]^{1/2} Z_total.    (6)

Here 2 n_sparse Eε² is the bias due to noise; inside the square root, which gives the total variance, 4 n_sparse Eε⁴ is the part due to noise and (2T/n_sparse) ∫_0^T σ_t⁴ dt the part due to discretization. Z_total is a standard normal random variable. The symbol "≈_L" means that, when multiplied by a suitable factor, the convergence is in law. For precise statements, consult Sections 2–4.

1.2.3 The Third-Best Approach: Sampling Sparsely at an Optimally Determined Frequency. Our results in Section 2.3 show how, if one insists on sampling sparsely, it is possible to determine an optimal sampling frequency, instead of selecting the frequency in a somewhat ad hoc manner as in the fourth-best approach. In effect, this is similar to the fourth-best approach, except that the arbitrary n_sparse is replaced by an optimally determined n*_sparse. Section 2.3 shows how to minimize the MSE over n_sparse and calculate it for equidistant observations:

    n*_sparse = ( T/(4(Eε²)²) ∫_0^T σ_t⁴ dt )^{1/3}.

The more general version is given by (31).
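Plugging illustrative values into this display gives a concrete optimal frequency; the inputs below are hypothetical (in practice Eε² and the quarticity ∫σ⁴ must themselves be estimated):

```python
# Sketch: n*_sparse = ( T / (4 (E eps^2)^2) * int_0^T sigma_t^4 dt )^(1/3),
# evaluated for hypothetical inputs in the constant-sigma case.
T = 1.0 / 252                 # one trading day in years (assumption)
E_eps2 = 0.0005 ** 2          # hypothetical noise variance E(eps^2)
sigma = 0.3                   # hypothetical constant annualized volatility
quarticity = sigma ** 4 * T   # int_0^T sigma_t^4 dt when sigma_t is constant

n_star = (T / (4 * E_eps2 ** 2) * quarticity) ** (1.0 / 3)
print(round(n_star))          # about 80 intervals, i.e. close to 5-minute sampling
```

With these (made-up) inputs the optimum lands near the conventional 5-minute choice, which is one way to rationalize the empirical practice.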

We note that the third- and fourth-best estimators, which rely on comparatively small sample sizes, could benefit from higher-order adjustments. A suitable Edgeworth adjustment is currently under investigation.

1.2.4 The Second-Best Approach: Subsampling and Averaging. As discussed earlier, even if one determines the optimal sampling frequency as we have just described, it is still the case that the data are not used to their full extent. We therefore propose in Section 3 the estimator [Y, Y]_T^(avg), constructed by averaging the estimators [Y, Y]_T^(k) across K grids of average size n̄. We show in Section 3.5 that

    [Y, Y]_T^(avg) ≈_L ⟨X, X⟩_T + 2 n̄ Eε² + [4 (n̄/K) Eε⁴ + (4T/(3n̄)) ∫_0^T σ_t⁴ dt]^{1/2} Z_total,    (7)

where Z_total is a standard normal term; as in (6), 2 n̄ Eε² is the bias due to noise, and the total variance under the square root splits into a term due to noise, 4(n̄/K)Eε⁴, and a term due to discretization, (4T/(3n̄)) ∫_0^T σ_t⁴ dt. The above expression is in the equidistantly sampled case; the general expression is given by (51)–(52) in Section 3.5.

As can be seen from the foregoing equation, [Y, Y]_T^(avg) remains a biased estimator of the quadratic variation ⟨X, X⟩_T of the true return process. But the bias 2 n̄ Eε² now increases with the average size of the subsamples, and n̄ ≤ n. Thus, as far as the bias is concerned, [Y, Y]_T^(avg) is a better estimator than [Y, Y]_T^(all). The optimal trade-off between the bias and variance for the estimator [Y, Y]_T^(avg) is described in Section 3.6; we set K* ≈ n/n̄*, with n̄* determined in (53), namely (in the equidistantly sampled case)

    n̄* = ( T/(6(Eε²)²) ∫_0^T σ_t⁴ dt )^{1/3}.    (8)
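A minimal sketch of the subsample-and-average construction, on K regular subgrids offset by one observation each (the simulated inputs and the choice K = 300 are illustrative, not optimal):

```python
import numpy as np

def rv(y):
    """Realized volatility: sum of squared increments of a log-price array."""
    return float(np.sum(np.diff(y) ** 2))

def rv_avg(y, K):
    """[Y,Y]^(avg): average of realized volatilities over K offset subgrids."""
    return float(np.mean([rv(y[k::K]) for k in range(K)]))

# Toy check on simulated noisy prices (hypothetical scales)
rng = np.random.default_rng(2)
n = 23_400
X = np.cumsum(0.01 / np.sqrt(n) * rng.standard_normal(n + 1))  # latent price, <X,X>_T ~ 1e-4
Y = X + 0.0005 * rng.standard_normal(n + 1)                    # observed price, model (4)

K = 300
print(rv(Y), rv_avg(Y, K))
```

Averaging shrinks the noise bias from about 2nEε² for the all-data estimator to about 2n̄Eε² with n̄ ≈ n/K, so the second number printed is far smaller than the first.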

1.2.5 The First-Best Approach: Subsampling and Averaging, and Bias-Correction. Our final estimator is based on bias-correcting the estimator [Y, Y]_T^(avg). This is the subject of Section 4. We show that a bias-adjusted estimator for ⟨X, X⟩_T, denoted ⟨X, X⟩-hat_T, can be constructed as

    ⟨X, X⟩-hat_T = [Y, Y]_T^(avg) − (n̄/n) [Y, Y]_T^(all),

that is, by combining estimators obtained over the two time scales, "all" and "avg." A small-sample adjustment,

    ⟨X, X⟩-hat_T^(adj) = (1 − n̄/n)⁻¹ ⟨X, X⟩-hat_T,

is given in (64) in Section 4.2; it shares the same asymptotic distribution as ⟨X, X⟩-hat_T to the order considered.
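The two-scales combination is a one-liner once the two ingredient estimators are in hand; here is a sketch under the same toy setup as before (helper names and simulated scales are my own, and K = 300 is an arbitrary illustrative choice):

```python
import numpy as np

def rv(y):
    """Realized volatility: sum of squared increments."""
    return float(np.sum(np.diff(y) ** 2))

def tsrv(y, K):
    """Two-scales sketch: average of K subgrid RVs, bias-corrected by the all-data RV."""
    n = len(y) - 1
    avg = float(np.mean([rv(y[k::K]) for k in range(K)]))  # slow scale, [Y,Y]^(avg)
    n_bar = (n - K + 1) / K                                # average subgrid size
    raw = avg - (n_bar / n) * rv(y)                        # fast-scale bias correction
    return raw / (1.0 - n_bar / n)                         # small-sample adjustment

rng = np.random.default_rng(3)
n = 23_400
X = np.cumsum(0.01 / np.sqrt(n) * rng.standard_normal(n + 1))  # latent log price
Y = X + 0.0005 * rng.standard_normal(n + 1)                    # observed, model (4)

true_qv = float(np.sum(np.diff(X) ** 2))
print(true_qv, tsrv(Y, K=300))
```

Unlike the raw realized volatility, the combined estimate is centered near the true ⟨X, X⟩_T even though every observation is used.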

We show in Theorem 4 in Section 4.1 that, if the number of subgrids is selected as K = c n^{2/3}, then

    ⟨X, X⟩-hat_T ≈_L ⟨X, X⟩_T + n^{−1/6} [ (8/c²)(Eε²)² + c (4T/3) ∫_0^T σ_t⁴ dt ]^{1/2} Z_total,    (9)

where, inside the square root giving the total variance, (8/c²)(Eε²)² is due to noise, c(4T/3) ∫_0^T σ_t⁴ dt is due to discretization, and Z_total is a standard normal term.


Unlike all of the previously considered estimators, this estimator is now correctly centered at ⟨X, X⟩_T. It converges only at rate n^{−1/6}, but from an MSE perspective this is better than being (badly) biased. In particular, the prescription is to use all the available data, so there is no limit to how often one should sample; every few seconds or more often would be typical in tick-by-tick financial data, so n can be quite large. Based on Section 4.2, we use an estimate of the optimal value

    c* = ( T/(12(Eε²)²) ∫_0^T σ_t⁴ dt )^{−1/3}.    (10)
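For concreteness, one can evaluate (10) and the implied number of subgrids K = c* n^{2/3} on hypothetical inputs (the same made-up values as in the earlier sketches; none of these numbers come from the paper):

```python
# Sketch: picking the number of subgrids K = c* n^(2/3) via (10), hypothetical inputs.
T = 1.0 / 252                 # one trading day in years (assumption)
E_eps2 = 0.0005 ** 2          # hypothetical noise variance
quarticity = 0.3 ** 4 * T     # int_0^T sigma_t^4 dt, constant-sigma case
n = 23_400                    # one-second sampling over a 6.5-hour day

c_star = (T / (12 * E_eps2 ** 2) * quarticity) ** (-1.0 / 3)
K = c_star * n ** (2.0 / 3)
print(c_star, round(K))
```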

2. ANALYSIS OF REALIZED VOLATILITY UNDER MARKET MICROSTRUCTURE NOISE

2.1 Setup

To spell out the foregoing model, we let Y be the logarithm of the transaction price, which is observed at times 0 = t_0, t_1, ..., t_n = T. We assume that at these times, Y is related to a latent true price X (also in logarithmic scale) through (4). The latent price X is given in (1). The noise ε_{t_i} satisfies the assumption

    ε_{t_i} iid with Eε_{t_i} = 0 and var(ε_{t_i}) = Eε²;  also ε ⊥ X process,    (11)

where ⊥ denotes independence between two random quantities. Our modeling, as in (4), does not require that ε_t exists for every t; in other words, our interest in the noise is only at the observation times t_i.

For the moment, we focus on determining the integrated volatility of X for one time period [0, T]. This is also known as the continuous quadratic variation ⟨X, X⟩ of X. In other words,

    ⟨X, X⟩_T = ∫_0^T σ_t² dt.    (12)

To succinctly describe the realized volatility, we use the notions of grid and observed quadratic variation, as follows.

Definition 1. The full grid, containing all of the observation points, is given by

    G = {t_0, ..., t_n}.    (13)

Definition 2. We also consider arbitrary grids H ⊆ G. To denote successive elements in such grids, we proceed as follows. If t_i ∈ H, then t_{i,−} and t_{i,+} denote the preceding and following elements in H. t_i will always denote the ith point in the full grid G. Hence, when H = G, t_{i,−} = t_{i−1} and t_{i,+} = t_{i+1}. When H is a strict subgrid of G, it will normally be the case that t_{i,−} < t_{i−1} and t_{i,+} > t_{i+1}. Finally, we take

    |H| = (# points in grid H) − 1.    (14)

This is to say that |H| is the number of time increments (t_i, t_{i,+}] such that both endpoints are contained in H. In particular, |G| = n.

Definition 3. The observed quadratic variation [·, ·] for a generic process Z (such as Y or X) on an arbitrary grid H ⊆ G is given by

    [Z, Z]_t^H = Σ_{t_j, t_{j,+} ∈ H, t_{j,+} ≤ t} (Z_{t_{j,+}} − Z_{t_j})².    (15)

When the choice of grid follows from the context, [Z, Z]_t^H may be denoted as just [Z, Z]_t. On the full grid G, the quadratic variation is given by

    [Z, Z]_t^(all) = [Z, Z]_t^G = Σ_{t_i, t_{i+1} ∈ G, t_{i+1} ≤ t} (ΔZ_{t_i})²,    (16)

where ΔZ_{t_i} = Z_{t_{i+1}} − Z_{t_i}. Quadratic covariations are defined similarly (see, e.g., Karatzas and Shreve 1991; Jacod and Shiryaev 2003 for more details on quadratic variations).
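Definitions 2 and 3 translate directly into code; a minimal sketch, representing a grid H as sorted index positions into the full grid (function and variable names are my own):

```python
import numpy as np

def observed_qv(z, grid):
    """[Z,Z]^H of (15): sum of squared increments of z over successive grid points.

    z    -- process values on the full grid G = {t_0, ..., t_n}
    grid -- sorted indices into z forming the subgrid H; range(len(z)) gives G itself
    """
    zH = np.asarray(z, dtype=float)[np.asarray(grid)]
    return float(np.sum(np.diff(zH) ** 2))

z = [0.0, 1.0, 0.5, 1.5, 1.0]
full = observed_qv(z, range(5))     # [Z,Z]^(all) on G: 1 + 0.25 + 1 + 0.25 = 2.5
sub = observed_qv(z, [0, 2, 4])     # subgrid H = {t_0, t_2, t_4}: 0.25 + 0.25 = 0.5
print(full, sub)   # 2.5 0.5
```

Note that |H| of (14) is simply `len(grid) - 1`, the number of increments summed.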

Our first objective is to assess how well the realized volatility [Y, Y]_T^(all) approximates the integrated volatility ⟨X, X⟩_T of the true latent process. In our asymptotic considerations, we always assume that the number of observations in [0, T] goes to infinity, and also that the maximum distance in time between two consecutive observations goes to 0:

    max_i Δt_i → 0  as n → ∞.    (17)

For the sake of preciseness, it should be noted that when n → ∞, we are dealing with a sequence G_n of grids, G_n = {t_{0,n}, ..., t_{n,n}}, and similarly for subgrids. We have avoided using double subscripts so as to not complicate the notation, but this is how all our conditions and results should be interpreted.

Note, finally, that we have adhered to the time series convention that ΔZ_{t_i} = Z_{t_{i+1}} − Z_{t_i}. This is in contrast to the stochastic calculus convention ΔZ_{t_i} = Z_{t_i} − Z_{t_{i−1}}.

2.2 The Realized Volatility: An Estimator of the Variance of the Noise

Under the additive model Y_{t_i} = X_{t_i} + ε_{t_i}, the realized volatility based on the observed returns Y_{t_i} now has the form

    [Y, Y]_T^(all) = [X, X]_T^(all) + 2[X, ε]_T^(all) + [ε, ε]_T^(all).

This gives the conditional mean and variance of [Y, Y]_T^(all):

    E([Y, Y]_T^(all) | X process) = [X, X]_T^(all) + 2nEε²,    (18)

under assumption (11). Similarly,

    var([Y, Y]_T^(all) | X process) = 4nEε⁴ + O_p(1),    (19)

subject to condition (11) and Eε_{t_i}⁴ = Eε⁴ < ∞ for all i. Subject to slightly stronger conditions,

    var([Y, Y]_T^(all) | X process) = 4nEε⁴ + (8[X, X]_T^(all) Eε² − 2 var(ε²)) + O_p(n^{−1/2}).    (20)

It is also the case that, as n → ∞, conditionally on the X process, we have asymptotic normality:

    n^{−1/2}([Y, Y]_T^(all) − 2nEε²) →_L 2(Eε⁴)^{1/2} Z_noise.    (21)

Here Z_noise is standard normal, with the subscript "noise" indicating that the randomness comes from the noise ε, that is, the deviation of the observables Y from the true process X.

The derivations of (19)–(20), along with the conditions for the latter, are provided in Section A.1. The result (21) is derived in Section A.2, where it is part of Theorem A.1.


Equations (18) and (19) suggest that in the discrete world, where microstructure effects are unfortunately present, realized volatility [Y, Y]_T^(all) is not a reliable estimator for the true variation [X, X]_T^(all) of the returns. For large n, realized volatility could have little to do with the true returns. Instead, it relates to the noise term, Eε², in the first order and Eε⁴ in the second order. Also, one can see from (18) that [Y, Y]_T^(all) has a positive bias whose magnitude increases linearly with the sample size.

Interestingly, apart from revealing the biased nature of [Y, Y]_T^(all) at high frequency, our analysis also delivers a consistent estimator for the variance of the noise term. In other words, let

    Eε²-hat = (1/(2n)) [Y, Y]_T^(all).

We have, for a fixed true return process X,

    n^{1/2}(Eε²-hat − Eε²) → N(0, Eε⁴)  as n → ∞;    (22)

see Theorem A.1 in the Appendix.

By the same methods as in Section A.2, a consistent estimator of the asymptotic variance of Eε²-hat is then given by

    Eε⁴-hat = (1/(2n)) Σ_i (ΔY_{t_i})⁴ − 3(Eε²-hat)².    (23)

A question that naturally arises is why one is interested in thequadratic variation of X (either 〈XX〉 or [XX]) as opposed tothe quadratic variation of Y ([YY]) because ldquo〈YY〉rdquo wouldhave to be taken to be infinite in view of the foregoing Forexample in options pricing or hedging one could take the op-posite view that [YY] is the volatility that one actually faces

A main reason why we focus on the quadratic variation ofX is that the variation caused by the εrsquos is tied to each transac-tion as opposed to the price process of the underlying securityFrom the standpoint of trading the εrsquos represent trading costswhich are different from the costs created by the volatility ofthe underlying process Different market participants may evenface different types of trading costs depending on institutionalarrangements In any case for the purpose of integrating trad-ing costs into options prices it seems more natural to approachthe matter via the extensive literature on market microstructure(see OrsquoHara 1995 for a survey)

A related matter is that continuous finance would be diffi-cult to implement if one were to use the quadratic variationof Y If one were to use the quadratic variation [YY] then onewould also be using a quantity that would depend on the datafrequency

Finally, apart from the specific application, it is interesting to be able to say something about the underlying log return process, and to be able to separate this from the effects introduced by the mechanics of the trading process, where the idiosyncratics of the trading arrangement play a role.

2.3 The Optimal Sampling Frequency

We have just argued that the realized volatility estimates the wrong quantity. This problem only gets worse when observations are sampled more frequently. Its financial interpretation boils down to market microstructure, summarized by $\epsilon$ in (4). As the data record is sampled finely, the change in true returns gets smaller, while the microstructure noise, such as bid–ask spread and transaction cost, remains at the same magnitude. In other words, when the sampling frequency is extremely high, the observed fluctuation in the returns process is more heavily contaminated by microstructure noise and becomes less representative of the true variation $\langle X,X\rangle_T$ of the returns. Along this line of discussion, the broad opinion in financial applications is not to sample too often, at least when using realized volatility. We now discuss how this can be viewed in the context of the model (4) with stochastic volatility.

Formally, sparse sampling is implemented by sampling on a subgrid $\mathcal{H}$ of $\mathcal{G}$, with, as before,
$$[Y,Y]^{(\mathrm{sparse})}_T = [Y,Y]^{(\mathcal{H})}_T = \sum_{t_i, t_{i+} \in \mathcal{H}} \big(Y_{t_{i+}} - Y_{t_i}\big)^2.$$
For the moment we take the subgrid as given, but later we consider how to optimize the sampling. We call $n_{\mathrm{sparse}} = |\mathcal{H}|$.
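Computationally, sparse sampling amounts to subsampling the price series before summing squared increments. A minimal sketch (Python/NumPy, with assumed toy parameters for the constant-volatility price path and the noise, not the paper's calibration):

```python
import numpy as np

rng = np.random.default_rng(1)
n = 23_400                        # one price per second over a 6.5-hour day
sigma, T = 0.2, 1.0 / 252         # assumed constant volatility, one day (annualized)
dX = sigma * np.sqrt(T / n) * rng.normal(size=n)
X = np.concatenate(([0.0], np.cumsum(dX)))
Y = X + rng.normal(0.0, 5e-4, n + 1)        # observed prices: truth plus noise

def realized_vol(prices):
    """[Y,Y]_T on the grid represented by 'prices': sum of squared increments."""
    return float(np.sum(np.diff(prices) ** 2))

rv_all = realized_vol(Y)            # full grid G
rv_sparse = realized_vol(Y[::300])  # subgrid H: every 300th point (5-minute sampling)
true_qv = sigma ** 2 * T            # <X,X>_T for constant sigma
print(rv_all, rv_sparse, true_qv)
```

The full-grid estimate is dominated by the $2nE\epsilon^2$ bias, while the sparse estimate is far closer to the target.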

To give a rationale for sparse sampling, we propose an asymptotics in which the law of $\epsilon$, although still iid, is allowed to vary with $n$ and $n_{\mathrm{sparse}}$. Formally, we suppose that the distribution of $\epsilon$, $\mathcal{L}(\epsilon)$, is an element of the set $\mathcal{D}$ of all distributions for which $E(\epsilon) = 0$ and where $E(\epsilon^2)$ and $E(\epsilon^4)/E(\epsilon^2)^2$ are bounded by arbitrary constants. The asymptotic normality in Section 2.2 then takes the following, more nuanced, form.

Lemma 1. Suppose that $X$ is an Itô process of form (1), where $|\mu_t|$ and $\sigma_t$ are bounded above by a constant. Suppose that for given $n$ the grid $\mathcal{H}_n$ is given, with $n_{\mathrm{sparse}} \to \infty$ as $n \to \infty$, and that for each $n$, $Y$ is related to $X$ through model (4). Assume (11), where $\mathcal{L}(\epsilon) \in \mathcal{D}$. Like the grid $\mathcal{H}_n$, the law $\mathcal{L}(\epsilon)$ can depend on $n$ (whereas the process $X$ is fixed). Let (17) be satisfied for the sequence of grids $\mathcal{H}_n$. Then

$$[Y,Y]^{\mathcal{H}}_T = [X,X]^{\mathcal{H}}_T + 2n_{\mathrm{sparse}}E\epsilon^2 + \Big(4n_{\mathrm{sparse}}E\epsilon^4 + \big(8[X,X]^{\mathcal{H}}_T E\epsilon^2 - 2\operatorname{var}(\epsilon^2)\big)\Big)^{1/2} Z_{\mathrm{noise}} + O_p\big(n_{\mathrm{sparse}}^{-1/4}(E\epsilon^2)^{1/2}\big), \tag{24}$$
where $Z_{\mathrm{noise}}$ is a quantity that is asymptotically standard normal.

The lemma is proved at the end of Section A.2. Note now that the relative order of the terms in (24) depends on the quantities $n_{\mathrm{sparse}}$ and $E\epsilon^2$. Therefore, if $E\epsilon^2$ is small relative to $n_{\mathrm{sparse}}$, then $[Y,Y]^{\mathcal{H}}_T$ is, after all, not an entirely absurd substitute for $[X,X]^{\mathcal{H}}_T$.

One would be tempted to conclude that the optimal choice of $n_{\mathrm{sparse}}$ is to make it as small as possible. But that would overlook the fact that the larger $n_{\mathrm{sparse}}$ is, the closer $[X,X]^{\mathcal{H}}_T$ is to the target integrated volatility $\langle X,X\rangle_T$, from (12).

To quantify the overall error, we now combine Lemma 1 with the results on discretization error to study the total error $[Y,Y]^{\mathcal{H}}_T - \langle X,X\rangle_T$. Following Rootzén (1980), Jacod and Protter (1998), Barndorff-Nielsen and Shephard (2002), and Mykland and Zhang (2002), and under the conditions that these authors stated, we can show that
$$\left(\frac{n_{\mathrm{sparse}}}{T}\right)^{1/2}\big([X,X]^{\mathcal{H}}_T - \langle X,X\rangle_T\big) \xrightarrow{\mathcal{L}} \left(\int_0^T 2H'(t)\sigma_t^4\,dt\right)^{1/2} \times Z_{\mathrm{discrete}} \tag{25}$$

Zhang, Mykland, and Aït-Sahalia: Determining Integrated Volatility

stably in law. (We discuss this concept at the end of Sec. 3.4.) $Z_{\mathrm{discrete}}$ is a standard normal random variable, with the subscript "discrete" indicating that the randomness is due to the discretization effect in $[X,X]^{\mathcal{H}}_T$ when evaluating $\langle X,X\rangle_T$. $H(t)$ is the asymptotic quadratic variation of time, as discussed by Mykland and Zhang (2002):

$$H(t) = \lim_{n\to\infty} \frac{n_{\mathrm{sparse}}}{T} \sum_{t_i, t_{i+} \in \mathcal{H},\ t_{i+}\le t} (t_{i+} - t_i)^2.$$

In the case of equidistant observations, $\Delta t_0 = \cdots = \Delta t_{n_{\mathrm{sparse}}-1} = \Delta t = T/n_{\mathrm{sparse}}$, and $H'(t) = 1$. Because the $\epsilon$'s are independent of the $X$ process, $Z_{\mathrm{noise}}$ is independent of $Z_{\mathrm{discrete}}$.

For small $E\epsilon^2$, one now has an opportunity to estimate $\langle X,X\rangle_T$. It follows from our Lemma 1 and Proposition 1 of Mykland and Zhang (2002) that:

Proposition 1. Assume the conditions of Lemma 1, and also that $\max_{t_i, t_{i+} \in \mathcal{H}}(t_{i+} - t_i) = O(1/n_{\mathrm{sparse}})$. Also assume Condition E in Section A.3. Then $H$ is well defined, and

$$[Y,Y]^{\mathcal{H}}_T = \langle X,X\rangle_T + 2E\epsilon^2 n_{\mathrm{sparse}} + \Upsilon Z_{\mathrm{total}} + O_p\big(n_{\mathrm{sparse}}^{-1/4}(E\epsilon^2)^{1/2}\big) + o_p\big(n_{\mathrm{sparse}}^{-1/2}\big), \tag{26}$$

in the sense of stable convergence, where $Z_{\mathrm{total}}$ is asymptotically standard normal and where the variance has the form

$$\Upsilon^2 = \underbrace{4n_{\mathrm{sparse}}E\epsilon^4 + \big(8[X,X]^{\mathcal{H}}_T E\epsilon^2 - 2\operatorname{var}(\epsilon^2)\big)}_{\text{due to noise}} + \underbrace{\frac{T}{n_{\mathrm{sparse}}}\int_0^T 2H'(t)\sigma_t^4\,dt}_{\text{due to discretization}}. \tag{27}$$

Seen from this angle, there is scope for using the realized volatility $[Y,Y]^{\mathcal{H}}$ to estimate $\langle X,X\rangle$. There is a bias, $2E\epsilon^2 n_{\mathrm{sparse}}$, but the bias goes down if one uses fewer observations. This, then, is consistent with the practice in empirical finance, where the estimator used is not $[Y,Y]^{(\mathrm{all})}$, but instead $[Y,Y]^{(\mathrm{sparse})}$, obtained by sampling sparsely. For instance, faced with data sampled every few seconds, empirical researchers would typically use squared returns (i.e., differences of log-prices $Y$) over, say, 5-, 15-, or 30-minute time intervals. The intuitive rationale for using this estimator is to attempt to control the bias of the estimator; this can be assessed based on our formula (26), replacing the original sample size $n$ by the lower number reflecting the sparse sampling. But one should avoid sampling too sparsely, because formula (27) shows that decreasing $n_{\mathrm{sparse}}$ has the effect of increasing the variance of the estimator via the discretization effect, which is proportional to $n_{\mathrm{sparse}}^{-1}$. Based on our formulas, this trade-off between sampling too often and sampling too rarely can be formalized, and an optimal frequency at which to sample sparsely can be determined.

It is natural to minimize the MSE of $[Y,Y]^{(\mathrm{sparse})}$:

$$\mathrm{MSE} = \big(2n_{\mathrm{sparse}}E\epsilon^2\big)^2 + \left\{4n_{\mathrm{sparse}}E\epsilon^4 + \big(8[X,X]^{\mathcal{H}}_T E\epsilon^2 - 2\operatorname{var}(\epsilon^2)\big) + \frac{T}{n_{\mathrm{sparse}}}\int_0^T 2H'(t)\sigma_t^4\,dt\right\}. \tag{28}$$

One has to imagine that the original sample size $n$ is quite large, so that $n_{\mathrm{sparse}} < n$. In this case, minimizing the MSE (28) means that one should choose $n_{\mathrm{sparse}}$ to satisfy $\partial\mathrm{MSE}/\partial n_{\mathrm{sparse}} \approx 0$; in other words,

$$8n_{\mathrm{sparse}}(E\epsilon^2)^2 + 4E\epsilon^4 - \frac{T}{n_{\mathrm{sparse}}^2}\int_0^T 2H'(t)\sigma_t^4\,dt \approx 0, \tag{29}$$

or

$$n_{\mathrm{sparse}}^3 + \frac{1}{2}\,n_{\mathrm{sparse}}^2\,\frac{E\epsilon^4}{(E\epsilon^2)^2} - (E\epsilon^2)^{-2}\,\frac{T}{8}\int_0^T 2H'(t)\sigma_t^4\,dt \approx 0. \tag{30}$$

Finally, still under the conditions of Proposition 1, the optimum $n^*_{\mathrm{sparse}}$ becomes
$$n^*_{\mathrm{sparse}} = (E\epsilon^2)^{-2/3}\left(\frac{T}{8}\int_0^T 2H'(t)\sigma_t^4\,dt\right)^{1/3}\big(1 + o_p(1)\big), \qquad \text{as } E\epsilon^2 \to 0. \tag{31}$$

Of course, if the actual sample size $n$ were smaller than $n^*_{\mathrm{sparse}}$, then one would simply take $n^*_{\mathrm{sparse}} = n$, but this is unlikely to occur for a heavily traded stock. Equation (31) is the formal statement saying that one can sample more frequently when the error spread is small. Note from (28) that, to first order, the final trade-off is between the bias $2n_{\mathrm{sparse}}E\epsilon^2$ and the variance due to discretization; the effect of the variance associated with $Z_{\mathrm{noise}}$ is of lower order when comparing $n_{\mathrm{sparse}}$ and $E\epsilon^2$. It should be emphasized that (31) is a feasible way of choosing $n_{\mathrm{sparse}}$: one can estimate $E\epsilon^2$ using all of the data, following the procedure in Section 2.2, and the integral $\int_0^T 2H'(t)\sigma_t^4\,dt$ can be estimated by the methods discussed in Section 6.

Hence, if one decides to address the problem by selecting a lower frequency of observation, then one can do so by subsampling the full grid $\mathcal{G}$ at an arbitrary sparse frequency and using $[Y,Y]^{(\mathrm{sparse})}$; alternatively, one can use $n_{\mathrm{sparse}}$ as optimally determined by (31). We denote the corresponding estimator by $[Y,Y]^{(\mathrm{sparse,opt})}$. These are our fourth- and third-best estimators, respectively.
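As a numerical sanity check on the trade-off (28)–(31), one can compare the closed-form optimum (31) against a brute-force minimization of (28) in the constant-volatility, equidistant case, where $H'(t) = 1$ and $\int_0^T 2H'(t)\sigma_t^4\,dt = 2\sigma^4 T$. A sketch with assumed toy values (not the paper's calibration):

```python
import numpy as np

# Assumed toy values
sigma, T, E_eps2 = 0.3, 1.0 / 252, 2.5e-7
E_eps4 = 3 * E_eps2 ** 2                   # Gaussian noise: E eps^4 = 3 (E eps^2)^2
quarticity = 2 * sigma ** 4 * T            # int_0^T 2 H'(t) sigma_t^4 dt with H' = 1

def mse(ns):
    """Equation (28), with [X,X]^H_T approximated by sigma^2 T."""
    noise_var = (4 * ns * E_eps4 + 8 * sigma ** 2 * T * E_eps2
                 - 2 * (E_eps4 - E_eps2 ** 2))
    return (2 * ns * E_eps2) ** 2 + noise_var + T / ns * quarticity

# Closed-form optimum (31), valid as E eps^2 -> 0
n_star = E_eps2 ** (-2 / 3) * (T / 8 * quarticity) ** (1 / 3)

# Brute force over integer sparse sample sizes
grid = np.arange(10, 500)
n_best = int(grid[np.argmin([mse(k) for k in grid])])
print(n_star, n_best)
```

With these values the optimum is on the order of 80 intervals per day, i.e. roughly 5-minute sampling, and the brute-force minimizer of (28) agrees with (31) up to the stated $o_p(1)$ term.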

3. SAMPLING SPARSELY WHILE USING ALL OF THE DATA: SUBSAMPLING AND AVERAGING OVER MULTIPLE GRIDS

3.1 Multiple Grids and Sufficiency

We have argued in the previous section that one can indeed benefit from using infrequently sampled data. And yet one of the most basic lessons of statistics is that one should not do this. We present here two ways of tackling the problem. Instead of selecting (arbitrarily or optimally) a subsample, our methods are based on selecting a number of subgrids of the original grid of observation times, $\mathcal{G} = \{t_0, \ldots, t_n\}$, and then averaging the estimators derived from the subgrids. The principle is that, to the extent that there is a benefit to subsampling, this benefit can now be retained, whereas the variation of the estimator can be lessened by the averaging. The benefit of averaging is clear from sufficiency considerations, and many statisticians would say that subsampling without subsequent averaging is inferentially incorrect.

In what follows, we first introduce a set of notations, then turn to studying the realized volatility in the multigrid context. In Section 4, we show how to eliminate the bias of the estimator by using two time scales, a combination of the single grid and the multiple grids.

Journal of the American Statistical Association, December 2005

3.2 Notation for the Multiple Grids

We specifically suppose that the full grid $\mathcal{G}$, $\mathcal{G} = \{t_0, \ldots, t_n\}$ as in (13), is partitioned into $K$ nonoverlapping subgrids $\mathcal{G}^{(k)}$, $k = 1, \ldots, K$; in other words,
$$\mathcal{G} = \bigcup_{k=1}^{K} \mathcal{G}^{(k)}, \qquad \text{where } \mathcal{G}^{(k)} \cap \mathcal{G}^{(l)} = \emptyset \text{ when } k \ne l.$$

For most purposes, the natural way to select the $k$th subgrid $\mathcal{G}^{(k)}$ is to start with $t_{k-1}$ and then pick every $K$th sample point after that, until $T$. That is,
$$\mathcal{G}^{(k)} = \{t_{k-1},\ t_{k-1+K},\ t_{k-1+2K},\ \ldots,\ t_{k-1+n_k K}\},$$
for $k = 1, \ldots, K$, where $n_k$ is the integer making $t_{k-1+n_k K}$ the last element in $\mathcal{G}^{(k)}$. We refer to this as regular allocation of sample points to subgrids.

Whether the allocation is regular or not, we let $n_k = |\mathcal{G}^{(k)}|$. As in Definition 2, $n = |\mathcal{G}|$. Recall that the realized volatility based on all observation points $\mathcal{G}$ is written as $[Y,Y]^{(\mathrm{all})}_T$. Meanwhile, if one uses only the subsampled observations $Y_t$, $t \in \mathcal{G}^{(k)}$, then the realized volatility is denoted by $[Y,Y]^{(k)}_T$. It has the form
$$[Y,Y]^{(k)}_T = \sum_{t_j, t_{j+} \in \mathcal{G}^{(k)}} \big(Y_{t_{j+}} - Y_{t_j}\big)^2,$$
where, if $t_i \in \mathcal{G}^{(k)}$, then $t_{i+}$ denotes the following element in $\mathcal{G}^{(k)}$.

A natural competitor to $[Y,Y]^{(\mathrm{all})}_T$, $[Y,Y]^{(\mathrm{sparse})}_T$, and $[Y,Y]^{(\mathrm{sparse,opt})}_T$ is then given by
$$[Y,Y]^{(\mathrm{avg})}_T = \frac{1}{K}\sum_{k=1}^{K} [Y,Y]^{(k)}_T, \tag{32}$$

and this is the statistic we analyze in what follows. As before, we fix $T$ and use only the observations within the time period $[0,T]$. Asymptotics are still under (17) and under
$$\text{as } n \to \infty, \qquad \frac{n}{K} \to \infty. \tag{33}$$

In general, the $n_k$ need not be the same across $k$. We define
$$\bar n = \frac{1}{K}\sum_{k=1}^{K} n_k = \frac{n - K + 1}{K}. \tag{34}$$
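The regular allocation and the averaged estimator (32) are straightforward to express in code; a small worked sketch (Python/NumPy, with a hypothetical seven-point log-price series):

```python
import numpy as np

def rv_avg_over_subgrids(Y, K):
    """Equation (32): average of the K subgrid realized volatilities, where
    subgrid k (regular allocation) keeps indices k, k+K, k+2K, ... of the grid."""
    return float(np.mean([np.sum(np.diff(Y[k::K]) ** 2) for k in range(K)]))

# Hypothetical log-price series
Y = np.array([0.0, 0.1, -0.05, 0.2, 0.15, 0.0, 0.1])
n, K = len(Y) - 1, 3
rv_avg = rv_avg_over_subgrids(Y, K)
n_bar = (n - K + 1) / K     # eq. (34): average number of increments per subgrid
print(rv_avg, n_bar)
```

Here the three subgrids hold $2$, $1$, and $1$ increments, whose total $n - K + 1 = 4$ matches (34).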

3.3 Error Due to the Noise: $[Y,Y]^{(\mathrm{avg})}_T - [X,X]^{(\mathrm{avg})}_T$

Recall that we are interested in determining the integrated volatility, $\langle X,X\rangle_T$, or quadratic variation, of the true but unobservable returns. As an intermediate step, here we study how well the "pooled" realized volatility $[Y,Y]^{(\mathrm{avg})}_T$ approximates $[X,X]^{(\mathrm{avg})}_T$, where the latter is the "pooled" true integrated volatility when $X$ is considered only on the discrete time scale.

From (18) and (32),
$$E\big([Y,Y]^{(\mathrm{avg})}_T \,\big|\, X\ \text{process}\big) = [X,X]^{(\mathrm{avg})}_T + 2\bar n E\epsilon^2. \tag{35}$$

Also, because the $\{\epsilon_t, t \in \mathcal{G}^{(k)}\}$ are independent for different $k$,
$$\operatorname{var}\big([Y,Y]^{(\mathrm{avg})}_T \,\big|\, X\ \text{process}\big) = \frac{1}{K^2}\sum_{k=1}^{K}\operatorname{var}\big([Y,Y]^{(k)}_T \,\big|\, X\ \text{process}\big) = 4\frac{\bar n}{K}E\epsilon^4 + O_p\!\left(\frac{1}{K}\right), \tag{36}$$

in the same way as in (19). Incorporating the next-order term in the variance yields

$$\operatorname{var}\big([Y,Y]^{(\mathrm{avg})}_T \,\big|\, X\big) = 4\frac{\bar n}{K}E\epsilon^4 + \frac{1}{K}\big[8[X,X]^{(\mathrm{avg})}_T E\epsilon^2 - 2\operatorname{var}(\epsilon^2)\big] + o_p\!\left(\frac{1}{K}\right), \tag{37}$$

as in (20). By Theorem A.1 in Section A.2, the conditional asymptotics for the estimator $[Y,Y]^{(\mathrm{avg})}_T$ are as follows.

Theorem 1. Suppose that $X$ is an Itô process of form (1). Suppose that $Y$ is related to $X$ through model (4), and that (11) is satisfied with $E\epsilon^4 < \infty$. Also suppose that $t_i$ and $t_{i+1}$ are not in the same subgrid for any $i$. Under assumption (33), as $n \to \infty$,
$$\sqrt{\frac{K}{\bar n}}\big([Y,Y]^{(\mathrm{avg})}_T - [X,X]^{(\mathrm{avg})}_T - 2\bar n E\epsilon^2\big) \xrightarrow{\mathcal{L}} 2\sqrt{E\epsilon^4}\,Z^{(\mathrm{avg})}_{\mathrm{noise}}, \tag{38}$$
conditional on the $X$ process, where $Z^{(\mathrm{avg})}_{\mathrm{noise}}$ is standard normal.

This can be compared with the result stated in (24). Notice that $Z^{(\mathrm{avg})}_{\mathrm{noise}}$ in (38) is almost never the same as $Z_{\mathrm{noise}}$ in (24); in particular, $\operatorname{cov}\big(Z_{\mathrm{noise}}, Z^{(\mathrm{avg})}_{\mathrm{noise}}\big) = \operatorname{var}(\epsilon^2)/E\epsilon^4$, based on the proof of Theorem A.1 in the Appendix.

In comparison with the realized volatility using the full grid $\mathcal{G}$, the aggregated estimator $[Y,Y]^{(\mathrm{avg})}_T$ provides an improvement, in that both the asymptotic bias and variance are of smaller order in $n$; compare (18) and (19). We use this in Section 4.

3.4 Error Due to the Discretization Effect: $[X,X]^{(\mathrm{avg})}_T - \langle X,X\rangle_T$

In this section we study the impact of the time discretization. In other words, we investigate the deviation of $[X,X]^{(\mathrm{avg})}_T$ from the integrated volatility $\langle X,X\rangle_T$ of the true process. Denote the discretization effect by $D_T$, where
$$D_t = [X,X]^{(\mathrm{avg})}_t - \langle X,X\rangle_t = \frac{1}{K}\sum_{k=1}^{K}\big([X,X]^{(k)}_t - \langle X,X\rangle_t\big), \tag{39}$$

with
$$[X,X]^{(k)}_t = \sum_{t_i, t_{i+} \in \mathcal{G}^{(k)},\ t_{i+} \le t} \big(X_{t_{i+}} - X_{t_i}\big)^2. \tag{40}$$

In what follows, we consider the asymptotics of $D_T$. The problem is similar to that of finding the limit of $[X,X]^{(\mathrm{all})}_T - \langle X,X\rangle_T$


[cf. (25)]. However, the present case is more complicated, due to the multiple grids.

We suppose in the following that the sampling points are regularly allocated to subgrids; in other words, $\mathcal{G}^{(l)} = \{t_{l-1},\ t_{K+l-1},\ \ldots\}$. We also assume that
$$\max_i |\Delta t_i| = O\!\left(\frac{1}{n}\right) \tag{41}$$
and
$$\frac{K}{n} \to 0. \tag{42}$$

Define the weight function
$$h_i = \frac{4}{K\,\Delta t}\sum_{j=1}^{(K-1)\wedge i}\left(1 - \frac{j}{K}\right)^2 \Delta t_{i-j}. \tag{43}$$

In the case where the $t_i$ are equidistant, and under regular allocation of points to subgrids, $\Delta t_i = \Delta t$, and so all of the $h_i$'s (except the first $K-1$) are equal:

$$h_i = 4\cdot\frac{1}{K}\sum_{j=1}^{(K-1)\wedge i}\left(1-\frac{j}{K}\right)^2 = \begin{cases} \dfrac{4(K-1)(2K-1)}{6K^2} = \dfrac{4}{3} + o(1), & i \ge K-1,\\[2ex] \dfrac{4}{3}\,\dfrac{i}{K^3}\big(3K^2 - 3Ki + i^2\big) + o(1), & i < K-1. \end{cases} \tag{44}$$

More generally, assumptions (41) and (42) ensure that
$$\sup_i h_i = O(1). \tag{45}$$

We take $\langle D,D\rangle_T$ to be the quadratic variation of $D_t$ when viewed as a continuous-time process (39). This gives the best approximation to the variance of $D_T$. We show the following results in Section A.3.

Theorem 2. Suppose that $X$ is an Itô process of the form (1), with drift coefficient $\mu_t$ and diffusion coefficient $\sigma_t$, both continuous almost surely. Also suppose that $\sigma_t$ is bounded away from 0. Assume (41) and (42), and that sampling points are regularly allocated to grids. Then the quadratic variation of $D_T$ is approximately
$$\langle D,D\rangle_T = \frac{TK}{n}\,\eta_n^2 + o_p\!\left(\frac{K}{n}\right), \tag{46}$$

where
$$\eta_n^2 = \sum_i h_i\,\sigma^4_{t_i}\,\Delta t_i. \tag{47}$$

In particular, $D_T = O_p\big((K/n)^{1/2}\big)$. From this, we derive a variance–variance trade-off between the two effects that have been discussed: noise and discretization. First, however, we discuss the asymptotic law of $D_T$. (We discuss stable convergence at the end of this section.)

Theorem 3. Assume the conditions of Theorem 2, and also that
$$\eta_n^2 \xrightarrow{P} \eta^2, \tag{48}$$

where $\eta$ is random. Also assume Condition E in Section A.3. Then
$$\frac{D_T}{(K/n)^{1/2}} \xrightarrow{\mathcal{L}} \eta\sqrt{T}\,Z_{\mathrm{discrete}}, \tag{49}$$

where $Z_{\mathrm{discrete}}$ is standard normal and independent of the process $X$. The convergence in law is stable.

In other words, $D_T/(K/n)^{1/2}$ can be taken to be asymptotically mixed normal, "$N(0, \eta^2 T)$". For most of our discussion, it is most convenient to suppose (48), and this is satisfied in many cases. For example, when the $t_i$ are equidistant, and under regular allocation of points to subgrids,
$$\eta^2 = \frac{4}{3}\int_0^T \sigma_t^4\,dt, \tag{50}$$

following (44). One does not need to rely on (48); we argue in Section A.3 that, without this condition, one can take $D_T/(K/n)^{1/2}$ to be approximately $N(0, \eta_n^2 T)$. For estimation of $\eta^2$ or $\eta_n^2$, see Section 6. Finally, stable convergence (Rényi 1963; Aldous and Eagleson 1978; Hall and Heyde 1980, chap. 3) means, for our purposes, that the left side of (49) converges to the right side jointly with the $X$ process, and that $Z$ is independent of $X$. This is slightly weaker than convergence conditional on $X$, but serves the same function of permitting the incorporation of conditionality-type phenomena into arguments and conclusions.

3.5 Combining the Two Sources of Error

We can now combine the two error terms arising from discretization and from the observation noise. It follows from Theorems 1 and 3 that
$$[Y,Y]^{(\mathrm{avg})}_T - \langle X,X\rangle_T - 2\bar n E\epsilon^2 = \xi Z_{\mathrm{total}} + o_p(1), \tag{51}$$

where $Z_{\mathrm{total}}$ is an asymptotically standard normal random variable, independent of the $X$ process, and

$$\xi^2 = \underbrace{4\frac{\bar n}{K}E\epsilon^4}_{\text{due to noise}} + \underbrace{T\,\frac{1}{\bar n}\,\eta^2}_{\text{due to discretization}}. \tag{52}$$

It is easily seen that if one takes $K = cn^{2/3}$, then both components in $\xi^2$ will be present in the limit; otherwise, one of them will dominate. Based on (51), $[Y,Y]^{(\mathrm{avg})}_T$ is still a biased estimator of the quadratic variation $\langle X,X\rangle_T$ of the true return process. One can recognize that, as far as the asymptotic bias is concerned, $[Y,Y]^{(\mathrm{avg})}_T$ is a better estimator than $[Y,Y]^{(\mathrm{all})}_T$, because $\bar n \le n$, demonstrating that the bias in the subsampled estimator $[Y,Y]^{(\mathrm{avg})}_T$ increases at a slower pace than that of the full-grid estimator. One can also construct a bias-adjusted estimator from (51); this further development would involve the higher-order analysis between the bias and the subsampled estimator. We describe the methodology of bias correction in Section 4.


3.6 The Benefits of Sampling Sparsely: Optimal Sampling Frequency in the Multiple-Grid Case

As in Section 2.3, when the noise is negligible asymptotically, we can search for an optimal $\bar n$ for subsampling, to balance the coexistence of the bias and the variance in (51). To reduce the MSE of $[Y,Y]^{(\mathrm{avg})}_T$, we set $\partial\mathrm{MSE}/\partial\bar n = 0$. From (51) and (52), $\mathrm{bias} = 2\bar n E\epsilon^2$ and $\xi^2 = 4\frac{\bar n}{K}E\epsilon^4 + \frac{T}{\bar n}\eta^2$. Then
$$\mathrm{MSE} = \mathrm{bias}^2 + \xi^2 = 4(E\epsilon^2)^2\bar n^2 + 4\frac{\bar n}{K}E\epsilon^4 + \frac{T}{\bar n}\eta^2 = 4(E\epsilon^2)^2\bar n^2 + \frac{T}{\bar n}\eta^2, \quad\text{to first order;}$$

thus the optimal $\bar n^*$ satisfies
$$\bar n^* = \left(\frac{T\eta^2}{8(E\epsilon^2)^2}\right)^{1/3}. \tag{53}$$

Therefore, assuming that the estimator $[Y,Y]^{(\mathrm{avg})}_T$ is adopted, one could benefit from a minimum MSE if one subsamples $\bar n^*$ data in an equidistant fashion. In other words, all $n$ observations can be used if one uses $K^*$, $K^* \approx n/\bar n^*$, subgrids. This is in contrast to the drawback of using all of the data in the single-grid case. The subsampling, coupled with aggregation, brings out the advantage of using all of the data. Of course, for the asymptotics to work, we need $E\epsilon^2 \to 0$. Our recommendation, however, is to use the bias-corrected estimator given in Section 4.
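A quick numerical illustration of (53), with assumed toy values (constant volatility, so $\eta^2 = \frac{4}{3}\sigma^4 T$; these numbers are hypothetical, not the paper's calibration):

```python
def nbar_star(eta2, T, E_eps2):
    """Equation (53): optimal average subgrid size n_bar*."""
    return (T * eta2 / (8.0 * E_eps2 ** 2)) ** (1.0 / 3.0)

# Assumed toy values
sigma2, T, E_eps2 = 0.04, 1.0 / 252, 2.5e-7
eta2 = (4.0 / 3.0) * sigma2 ** 2 * T       # constant-sigma case: (4/3) sigma^4 T

n = 23_400                    # one observation per second over a 6.5-hour day
nb = nbar_star(eta2, T, E_eps2)
K_star = n / nb               # K* ~ n / n_bar* subgrids then use all n observations
print(nb, K_star)
```

Here $\bar n^*$ is about 40 increments per subgrid, so several hundred subgrids together exhaust the full one-second grid.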

4. THE FINAL ESTIMATOR: SUBSAMPLING, AVERAGING, AND BIAS CORRECTION OVER TWO TIME SCALES

4.1 The Final Estimator $\widehat{\langle X,X\rangle}_T$: Main Result

In the preceding sections, we have seen that the multigrid estimator $[Y,Y]^{(\mathrm{avg})}_T$ is yet another biased estimator of the true integrated volatility $\langle X,X\rangle_T$. In this section we improve the multigrid estimator by adopting bias adjustment. To assess the bias, one uses the full grid. As mentioned before, from (22) in the single-grid case (Sec. 2), $E\epsilon^2$ can be consistently approximated by
$$\widehat{E\epsilon^2} = \frac{1}{2n}[Y,Y]^{(\mathrm{all})}_T. \tag{54}$$

Hence the bias of $[Y,Y]^{(\mathrm{avg})}_T$ can be consistently estimated by $2\bar n\widehat{E\epsilon^2}$. A bias-adjusted estimator for $\langle X,X\rangle_T$ thus can be obtained as
$$\widehat{\langle X,X\rangle}_T = [Y,Y]^{(\mathrm{avg})}_T - \frac{\bar n}{n}[Y,Y]^{(\mathrm{all})}_T, \tag{55}$$

thereby combining the two time scales, (all) and (avg). To study the asymptotic behavior of $\widehat{\langle X,X\rangle}_T$, first note that,

under the conditions of Theorem A.1 in Section A.2,
$$\left(\frac{K}{\bar n}\right)^{1/2}\big(\widehat{\langle X,X\rangle}_T - [X,X]^{(\mathrm{avg})}_T\big) = \left(\frac{K}{\bar n}\right)^{1/2}\big([Y,Y]^{(\mathrm{avg})}_T - [X,X]^{(\mathrm{avg})}_T - 2\bar n E\epsilon^2\big) - 2(K\bar n)^{1/2}\big(\widehat{E\epsilon^2} - E\epsilon^2\big) \xrightarrow{\mathcal{L}} N\big(0, 8(E\epsilon^2)^2\big), \tag{56}$$

where the convergence in law is conditional on $X$. We can now combine this with the results of Section 3.4 to determine the optimal choice of $K$ as $n \to \infty$:

$$\widehat{\langle X,X\rangle}_T - \langle X,X\rangle_T = \big(\widehat{\langle X,X\rangle}_T - [X,X]^{(\mathrm{avg})}_T\big) + \big([X,X]^{(\mathrm{avg})}_T - \langle X,X\rangle_T\big) = O_p\!\left(\frac{\bar n^{1/2}}{K^{1/2}}\right) + O_p\big(\bar n^{-1/2}\big). \tag{57}$$

The error is minimized by equating the two terms on the right side of (57), yielding that the optimal sampling step for $[Y,Y]^{(\mathrm{avg})}_T$ is $K = O(n^{2/3})$. The right side of (57) is then of order $O_p(n^{-1/6})$.

In particular, if we take
$$K = cn^{2/3}, \tag{58}$$

then we find the limit in (57), as follows.

Theorem 4. Suppose that $X$ is an Itô process of form (1), and assume the conditions of Theorem 3 in Section 3.4. Suppose that $Y$ is related to $X$ through model (4), and that (11) is satisfied with $E\epsilon^2 < \infty$. Under assumption (58),

$$n^{1/6}\big(\widehat{\langle X,X\rangle}_T - \langle X,X\rangle_T\big) \xrightarrow{\mathcal{L}} N\big(0, 8c^{-2}(E\epsilon^2)^2\big) + \eta\sqrt{T}\,N(0,c) = \big(8c^{-2}(E\epsilon^2)^2 + c\eta^2 T\big)^{1/2} N(0,1), \tag{59}$$
where the convergence is stable in law (see Sec. 3.4).

Proof. Note that the first normal distribution comes from (56) and that the second comes from Theorem 3 in Section 3.4. The two normal distributions are independent because the convergence of the first term in (57) is conditional on the $X$ process, which is why they can be amalgamated as stated. The requirement that $E\epsilon^4 < \infty$ (Thm. A.1 in the App.) is not needed, because only a law of large numbers is required for $M^{(1)}_T$ (see the proof of Thm. A.1) when considering the difference in (56). This finishes the proof.

The estimation of the asymptotic spread $s^2 = 8c^{-2}(E\epsilon^2)^2 + c\eta^2 T$ of $\widehat{\langle X,X\rangle}_T$ is deferred to Section 6. It is seen in Section A.1 that, for $K = cn^{2/3}$, the second-order conditional variance of $\widehat{\langle X,X\rangle}_T$, given the $X$ process, is given by
$$\operatorname{var}\big(\widehat{\langle X,X\rangle}_T \,\big|\, X\big) = n^{-1/3}c^{-2}\,8(E\epsilon^2)^2 + n^{-2/3}c^{-1}\big[8[X,X]^{(\mathrm{avg})}_T E\epsilon^2 - 2\operatorname{var}(\epsilon^2)\big] + o_p\big(n^{-2/3}\big). \tag{60}$$

The evidence from our Monte Carlo simulations in Section 7 suggests that this correction may matter in small samples, and that the (random) variance of $\widehat{\langle X,X\rangle}_T$ is best estimated by

$$\hat s^2 + n^{-2/3}c^{-1}\big[8[X,X]^{(\mathrm{avg})}_T\,\widehat{E\epsilon^2} - 2\widehat{\operatorname{var}}(\epsilon^2)\big]. \tag{61}$$

For the correction term, $[X,X]^{(\mathrm{avg})}_T$ can be estimated by $\widehat{\langle X,X\rangle}_T$ itself. Meanwhile, by (23), a consistent estimator of $\operatorname{var}(\epsilon^2)$ is given by
$$\widehat{\operatorname{var}}(\epsilon^2) = \frac{1}{2n}\sum_i \big(\Delta Y_{t_i}\big)^4 - 4\big(\widehat{E\epsilon^2}\big)^2.$$


4.2 Properties of $\widehat{\langle X,X\rangle}_T$: Optimal Sampling and Bias Adjustment

To further pin down the optimal sampling frequency $K$, one can minimize the expected asymptotic variance in (59) to obtain

$$c^* = \left(\frac{16(E\epsilon^2)^2}{T\,E\eta^2}\right)^{1/3}, \tag{62}$$

which can be consistently estimated from data in past time periods (before time $t_0 = 0$), using $\widehat{E\epsilon^2}$ and an estimator of $\eta^2$ (cf. Sec. 6). As mentioned in Section 3.4, $\eta^2$ can be taken to be independent of $K$, as long as one allocates sampling points to grids regularly, as defined in Section 3.2. Hence one can choose $c$, and thus also $K$, based on past data.

Example 1. If $\sigma_t^2$ is constant, and for equidistant sampling and regular allocation to grids, $\eta^2 = \frac{4}{3}\sigma^4 T$; then the asymptotic variance in (59) is
$$8c^{-2}(E\epsilon^2)^2 + c\eta^2 T = 8c^{-2}(E\epsilon^2)^2 + \tfrac{4}{3}c\sigma^4 T^2,$$

and the optimal choice of $c$ becomes
$$c_{\mathrm{opt}} = \left(\frac{12(E\epsilon^2)^2}{T^2\sigma^4}\right)^{1/3}. \tag{63}$$

In this case, the asymptotic variance in (59) is
$$2\big(12(E\epsilon^2)^2\big)^{1/3}(\sigma^2 T)^{4/3}.$$

One can also, of course, estimate $c$ to minimize the actual asymptotic variance in (59) from data in the current time period ($0 \le t \le T$). It is beyond the scope of this article to consider whether such a device for selecting the frequency has any impact on our asymptotic results.
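The algebra of Example 1 is easy to verify numerically; the sketch below (assumed toy values) checks that (63) minimizes the asymptotic variance and that the minimized value matches the closed form:

```python
def c_opt(E_eps2, sigma2, T):
    """Equation (63): optimal c for constant volatility (eta^2 = (4/3) sigma^4 T)."""
    return (12.0 * E_eps2 ** 2 / (T ** 2 * sigma2 ** 2)) ** (1.0 / 3.0)

def avar(c, E_eps2, sigma2, T):
    """Asymptotic variance in (59) for the constant-volatility case."""
    return 8 * c ** -2 * E_eps2 ** 2 + (4.0 / 3.0) * c * sigma2 ** 2 * T ** 2

# Assumed toy values
E_eps2, sigma2, T = 2.5e-7, 0.04, 1.0 / 252
c = c_opt(E_eps2, sigma2, T)
v_min = avar(c, E_eps2, sigma2, T)
v_closed = 2 * (12 * E_eps2 ** 2) ** (1.0 / 3.0) * (sigma2 * T) ** (4.0 / 3.0)
print(c, v_min, v_closed)
```

Perturbing $c$ in either direction strictly increases the variance, confirming the first-order condition behind (63).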

In addition to the large-sample arguments, one can study $\widehat{\langle X,X\rangle}_T$ from a "smallish" sample standpoint. We argue in what follows that one can apply a bias-type adjustment to get

$$\widehat{\langle X,X\rangle}^{(\mathrm{adj})}_T = \left(1 - \frac{\bar n}{n}\right)^{-1}\widehat{\langle X,X\rangle}_T. \tag{64}$$

The difference from the estimator in (55) is of order $O_p(\bar n/n) = O_p(K^{-1})$, and thus the two estimators behave the same to the asymptotic order that we consider. The estimator (64), however, has the appeal of being, in a certain way, "unbiased," as follows. For arbitrary $(a, b)$, consider all estimators of the form

$$\widehat{\langle X,X\rangle}^{(\mathrm{adj})}_T = a\,[Y,Y]^{(\mathrm{avg})}_T - b\,\frac{\bar n}{n}[Y,Y]^{(\mathrm{all})}_T.$$

Then, from (18) and (35),
$$E\big(\widehat{\langle X,X\rangle}^{(\mathrm{adj})}_T \,\big|\, X\ \text{process}\big) = a\big([X,X]^{(\mathrm{avg})}_T + 2\bar n E\epsilon^2\big) - b\,\frac{\bar n}{n}\big([X,X]^{(\mathrm{all})}_T + 2nE\epsilon^2\big) = a\,[X,X]^{(\mathrm{avg})}_T - b\,\frac{\bar n}{n}[X,X]^{(\mathrm{all})}_T + 2(a-b)\bar n E\epsilon^2.$$

It is natural to choose $a = b$, to completely remove the effect of $E\epsilon^2$. Also, following Section 3.4, both $[X,X]^{(\mathrm{avg})}_T$ and $[X,X]^{(\mathrm{all})}_T$ are asymptotically unbiased estimators of $\langle X,X\rangle_T$. Hence one can argue that one should take $a(1 - \bar n/n) = 1$, yielding (64).

Similarly, an adjusted estimator of $E\epsilon^2$ is given by
$$\widehat{E\epsilon^2}^{(\mathrm{adj})} = \tfrac{1}{2}(n - \bar n)^{-1}\big([Y,Y]^{(\mathrm{all})}_T - [Y,Y]^{(\mathrm{avg})}_T\big), \tag{65}$$

which satisfies $E\big(\widehat{E\epsilon^2}^{(\mathrm{adj})} \,\big|\, X\ \text{process}\big) = E\epsilon^2 + \tfrac{1}{2}(n - \bar n)^{-1}\big([X,X]^{(\mathrm{all})}_T - [X,X]^{(\mathrm{avg})}_T\big)$, and thus is unbiased to high order. As for the asymptotic distribution, one can see from Theorem A.1 in the Appendix that

$$\widehat{E\epsilon^2}^{(\mathrm{adj})} - E\epsilon^2 = \big(\widehat{E\epsilon^2} - E\epsilon^2\big)\big(1 + O(K^{-1})\big) + O_p\big(Kn^{-3/2}\big) = \widehat{E\epsilon^2} - E\epsilon^2 + O_p\big(n^{-1/2}K^{-1}\big) + O_p\big(Kn^{-3/2}\big) = \widehat{E\epsilon^2} - E\epsilon^2 + O_p\big(n^{-5/6}\big),$$

from (58). It follows that $n^{1/2}\big(\widehat{E\epsilon^2} - E\epsilon^2\big)$ and $n^{1/2}\big(\widehat{E\epsilon^2}^{(\mathrm{adj})} - E\epsilon^2\big)$ have the same asymptotic distribution.
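The estimators (55), (64), and (65) combine into one short routine; a sketch (Python/NumPy, on a simulated constant-volatility path with assumed toy parameters, not the paper's calibration):

```python
import numpy as np

def two_scales(Y, K):
    """Two-scales estimator (55), its small-sample adjustment (64), and the
    adjusted noise-variance estimator (65)."""
    n = len(Y) - 1
    rv_all = np.sum(np.diff(Y) ** 2)                              # [Y,Y]^(all)_T
    rv_avg = np.mean([np.sum(np.diff(Y[k::K]) ** 2) for k in range(K)])
    n_bar = (n - K + 1) / K                                       # eq. (34)
    tsrv = rv_avg - (n_bar / n) * rv_all                          # eq. (55)
    tsrv_adj = tsrv / (1.0 - n_bar / n)                           # eq. (64)
    noise_var_adj = 0.5 * (rv_all - rv_avg) / (n - n_bar)         # eq. (65)
    return tsrv, tsrv_adj, noise_var_adj

rng = np.random.default_rng(3)
n, sigma, T, noise_sd = 23_400, 0.2, 1.0 / 252, 5e-4
X = np.concatenate(([0.0], np.cumsum(sigma * np.sqrt(T / n) * rng.normal(size=n))))
Y = X + rng.normal(0.0, noise_sd, n + 1)

K = int(2 * n ** (2 / 3))              # K = c n^(2/3), with c = 2 chosen arbitrarily
tsrv, tsrv_adj, noise_var_adj = two_scales(Y, K)
rv_naive = float(np.sum(np.diff(Y) ** 2))
true_qv = sigma ** 2 * T
print(tsrv, tsrv_adj, noise_var_adj, rv_naive, true_qv)
```

Both two-scales estimates land near $\langle X,X\rangle_T$ while the naive full-grid realized volatility is off by orders of magnitude, and (65) recovers the noise variance closely.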

5. EXTENSION TO MULTIPLE PERIOD INFERENCE

For a given family $\mathcal{A} = \{\mathcal{G}^{(k)}, k = 1, \ldots, K\}$, we denote
$$\widehat{\langle X,X\rangle}_t = [Y,Y]^{(\mathrm{avg})}_t - \frac{\bar n}{n}[Y,Y]^{(\mathrm{all})}_t, \tag{66}$$

where, as usual, $[Y,Y]^{(\mathrm{all})}_t = \sum_{t_{i+1}\le t}(\Delta Y_{t_i})^2$ and $[Y,Y]^{(\mathrm{avg})}_t = \frac{1}{K}\sum_{k=1}^{K}[Y,Y]^{(k)}_t$, with
$$[Y,Y]^{(k)}_t = \sum_{t_i, t_{i+} \in \mathcal{G}^{(k)},\ t_{i+} \le t}\big(Y_{t_{i+}} - Y_{t_i}\big)^2.$$

To estimate $\langle X,X\rangle$ for several discrete time periods, say $[0,T_1], [T_1,T_2], \ldots, [T_{M-1},T_M]$, where $M$ is fixed, this amounts to estimating $\langle X,X\rangle_{T_m} - \langle X,X\rangle_{T_{m-1}} = \int_{T_{m-1}}^{T_m}\sigma_u^2\,du$ for $m = 1, \ldots, M$, and the obvious estimator is $\widehat{\langle X,X\rangle}_{T_m} - \widehat{\langle X,X\rangle}_{T_{m-1}}$.

To carry out the asymptotics, let $n_m$ be the number of points in the $m$th time segment, and similarly let $K_m = c_m n_m^{2/3}$, where $c_m$ is a constant. Then the $n_m^{1/6}\big(\widehat{\langle X,X\rangle}_{T_m} - \widehat{\langle X,X\rangle}_{T_{m-1}} - \int_{T_{m-1}}^{T_m}\sigma_u^2\,du\big)$, $m = 1, \ldots, M$, converge stably to $\big(8c_m^{-2}(E\epsilon^2)^2 + c_m\eta_m^2(T_m - T_{m-1})\big)^{1/2}Z_m$, where the $Z_m$ are iid standard normals, independent of the underlying process, and $\eta_m^2$ is the limit $\eta^2$ (Thm. 3) for time period $m$. In the case of equidistant $t_i$ and regular allocation of sample points to grids, $\eta_m^2 = \frac{4}{3}\int_{T_{m-1}}^{T_m}\sigma_u^4\,du$.

In other words, the one-period asymptotics generalize straightforwardly to the multiperiod case. This is because $\widehat{\langle X,X\rangle}_{T_m} - \widehat{\langle X,X\rangle}_{T_{m-1}} - \int_{T_{m-1}}^{T_m}\sigma_u^2\,du$ has, to first order, a martingale structure. This can be seen in the Appendix.

An advantage of our proposed estimator is that if $\epsilon_{t_i}$ has a different variance in different time segments, say $\operatorname{var}(\epsilon_{t_i}) = (E\epsilon^2)_m$ for $t_i \in (T_{m-1}, T_m]$, then both consistency and asymptotic (mixed) normality continue to hold, provided that one replaces $E\epsilon^2$ by $(E\epsilon^2)_m$. This adds a measure of robustness to the procedure. If one were convinced that $E\epsilon^2$ were the same across time segments, then an alternative estimator would have the form
$$\widehat{\langle X,X\rangle}_t = [Y,Y]^{(\mathrm{avg})}_t - \left(\frac{1}{K}\#\{t_{i+1} \le t\} - 1\right)\frac{1}{n}[Y,Y]^{(\mathrm{all})}_T, \qquad \text{for } t = T_1, \ldots, T_M. \tag{67}$$


However, the errors $\widehat{\langle X,X\rangle}_{T_m} - \widehat{\langle X,X\rangle}_{T_{m-1}} - \int_{T_{m-1}}^{T_m}\sigma_u^2\,du$ in this case are not asymptotically independent. Note that for $T = T_m$, both candidates (66) and (67) for $\widehat{\langle X,X\rangle}_t$ coincide with the quantity in (55).

6. ESTIMATING THE ASYMPTOTIC VARIANCE OF $\widehat{\langle X,X\rangle}_T$

In the one-period case, the main goal is to estimate the asymptotic variance $s^2 = 8c^{-2}(E\epsilon^2)^2 + c\eta^2 T$ [cf. (58) and (59)]. The multigrid case is a straightforward generalization, as indicated in Section 5.

Here we are concerned only with the case where the points $t_i$ are equally spaced ($\Delta t_i = \Delta t$) and are regularly allocated to the grids $\mathcal{A}_1 = \{\mathcal{G}^{(k)}, k = 1, \ldots, K_1\}$. A richer set of ingredients is required to find the spread than to just estimate $\langle X,X\rangle_T$. To implement the estimator, we create an additional family $\mathcal{A}_2 = \{\mathcal{G}^{(k,i)}, k = 1, \ldots, K_1,\ i = 1, \ldots, I\}$ of grids, where $\mathcal{G}^{(k,i)}$ contains every $I$th point of $\mathcal{G}^{(k)}$, starting with the $i$th point. We assume that $K_1 \sim c_1 n^{2/3}$. The new family then consists of $K_2 \sim c_2 n^{2/3}$ grids, where $c_2 = c_1 I$.

In addition, we need to divide the time line into segments $(T_{m-1}, T_m]$, where $T_m = \frac{m}{M}T$. For the purposes of this discussion, $M$ is large but finite. We now get an initial estimator of the spread as

$$\hat s_0^2 = n^{1/3}\sum_{m=1}^{M}\Big(\widehat{\langle X,X\rangle}^{K_1}_{T_m} - \widehat{\langle X,X\rangle}^{K_1}_{T_{m-1}} - \big(\widehat{\langle X,X\rangle}^{K_2}_{T_m} - \widehat{\langle X,X\rangle}^{K_2}_{T_{m-1}}\big)\Big)^2,$$
where $\widehat{\langle X,X\rangle}^{K_i}_t$ is the estimator (66) using grid family $i$, $i = 1, 2$. Using the discussion in Section 5, we can show that
$$\hat s_0^2 \approx s_0^2, \tag{68}$$

where, for $c_1 \ne c_2$ ($I \ne 1$),
$$s_0^2 = 8(E\epsilon^2)^2\big(c_1^{-2} + c_2^{-2} - c_1^{-1}c_2^{-1}\big) + \big(c_1^{1/2} - c_2^{1/2}\big)^2 T\eta^2 = 8(E\epsilon^2)^2 c_1^{-2}\big(1 + I^{-2} - I^{-1}\big) + c_1\big(I^{1/2} - 1\big)^2 T\eta^2. \tag{69}$$

In (68), the symbol "$\approx$" denotes first convergence in law as $n \to \infty$, and then a limit in probability as $M \to \infty$. Because $E\epsilon^2$ can be estimated by $\widehat{E\epsilon^2} = [Y,Y]^{(\mathrm{all})}_T/2n$, we can put hats on $s_0^2$, $(E\epsilon^2)^2$, and $\eta^2$ in (69) to obtain an estimator of $\eta^2$. Similarly,

$$s^2 = 8(E\epsilon^2)^2\left(c^{-2} - \frac{c\big(c_1^{-2} + c_2^{-2} - c_1^{-1}c_2^{-1}\big)}{\big(c_1^{1/2} - c_2^{1/2}\big)^2}\right) + \frac{c}{\big(c_1^{1/2} - c_2^{1/2}\big)^2}\,s_0^2 = 8\left(c^{-2} - c\,c_1^{-3}\,\frac{I^{-2} - I^{-1} + 1}{\big(I^{1/2} - 1\big)^2}\right)(E\epsilon^2)^2 + \frac{c}{c_1}\,\frac{1}{\big(I^{1/2} - 1\big)^2}\,s_0^2, \tag{70}$$
where $c \sim Kn^{-2/3}$, with $K$ the number of grids used originally to estimate $\langle X,X\rangle_T$.

Table 1. Coefficients of $(E\epsilon^2)^2$ and $s_0^2$ when $c_1 = c$

$I$    coeff($s_0^2$)    coeff($(E\epsilon^2)^2$)
3      1.866             $-3.611c^{-2}$
4      1.000             $1.5000c^{-2}$

Normally, one would take $c_1 = c$. Hence an estimator $\hat s^2$ can be found from $\hat s_0^2$ and $\widehat{E\epsilon^2}$. When $c_1 = c$, we argue that the optimal choice is $I = 3$ or 4, as follows. The coefficients in (70) become

$$\mathrm{coeff}(s_0^2) = \big(I^{1/2} - 1\big)^{-2} \qquad\text{and}\qquad \mathrm{coeff}\big((E\epsilon^2)^2\big) = 8c^{-2}\big(I^{1/2} - 1\big)^{-2} f(I),$$

where $f(I) = I - 2I^{1/2} - I^{-2} + I^{-1}$. For $I \ge 2$, $f(I)$ is increasing, and $f(I)$ crosses 0 for $I$ between 3 and 4. These, therefore, are the two integer values of $I$ that give the lowest ratio of $\mathrm{coeff}\big((E\epsilon^2)^2\big)/\mathrm{coeff}(s_0^2)$. Using $I = 3$ or 4, therefore, would maximally insulate against $(E\epsilon^2)^2$ dominating over $s_0^2$. This is desirable, because $s_0^2$ is the estimator carrying the information about $\eta^2$. Numerical values for the coefficients are given in Table 1. If $c$ were such that $(E\epsilon^2)^2$ still overwhelmed $s_0^2$, then a choice of $c_1 \ne c$ should be considered.
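The claim about $f(I)$ is immediate to check numerically; a sketch:

```python
def f(I):
    """f(I) = I - 2 I^(1/2) - I^(-2) + I^(-1), from the text following (70)."""
    return I - 2 * I ** 0.5 - I ** -2 + I ** -1

vals = [f(I) for I in range(2, 7)]
print(vals)
```

The values are increasing on $I \ge 2$ and change sign between $I = 3$ and $I = 4$, the two candidates singled out in the text.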

7. MONTE CARLO EVIDENCE

In this article we have discussed five approaches to dealing with the impact of market microstructure noise on realized volatility. In this section we examine the performance of each approach in simulations and compare the results with those predicted by the aforementioned asymptotic theory.

7.1 Simulations Design

We use the stochastic volatility model of Heston (1993) as our data-generating process:
$$dX_t = (\mu - v_t/2)\,dt + \sigma_t\,dB_t \tag{71}$$
and
$$dv_t = \kappa(\alpha - v_t)\,dt + \gamma v_t^{1/2}\,dW_t, \tag{72}$$

where $v_t = \sigma_t^2$. We assume that the parameters $(\mu, \kappa, \alpha, \gamma)$ and $\rho$, the correlation coefficient between the two Brownian motions $B$ and $W$, are constant in this model. We also assume Feller's condition, $2\kappa\alpha \ge \gamma^2$, to make the zero boundary unattainable by the volatility process.

We simulate $M = 25{,}000$ sample paths of the process using the Euler scheme at a time interval $\Delta t = 1$ second, and set the parameter values at levels reasonable for a stock: $\mu = .05$, $\kappa = 5$, $\alpha = .04$, $\gamma = .5$, and $\rho = -.5$. As for the market microstructure noise $\epsilon$, we assume that it is Gaussian and small. Specifically, we set $(E\epsilon^2)^{1/2} = .0005$ (i.e., the standard deviation of the noise is .05% of the value of the asset price). We purposefully set this value to be smaller than the values calibrated using the empirical market microstructure literature.
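A sketch of this simulation design for a single path (Python/NumPy; flooring the variance at 0 inside the square root is a standard Euler-scheme safeguard, not part of the model, and under Feller's condition, $0.4 \ge 0.25$ here, any boundary hits are pure discretization artifacts):

```python
import numpy as np

def simulate_heston(n_steps, dt, mu=0.05, kappa=5.0, alpha=0.04,
                    gamma=0.5, rho=-0.5, x0=0.0, v0=0.04, seed=0):
    """Euler scheme for (71)-(72) with correlated Brownian increments."""
    rng = np.random.default_rng(seed)
    z1 = rng.normal(size=n_steps)
    z2 = rho * z1 + np.sqrt(1.0 - rho ** 2) * rng.normal(size=n_steps)
    x = np.empty(n_steps + 1)
    v = np.empty(n_steps + 1)
    x[0], v[0] = x0, v0
    sqdt = np.sqrt(dt)
    for i in range(n_steps):
        sv = np.sqrt(max(v[i], 0.0))       # volatility, with the variance floored at 0
        x[i + 1] = x[i] + (mu - v[i] / 2.0) * dt + sv * sqdt * z1[i]
        v[i + 1] = v[i] + kappa * (alpha - v[i]) * dt + gamma * sv * sqdt * z2[i]
    return x, v

# One trading day at one-second sampling: 6.5 hours = 23,400 seconds,
# with time measured in years (the parameters are annualized)
dt = 1.0 / (252 * 23_400)
x, v = simulate_heston(23_400, dt)
print(len(x), v.mean())
```

Over a single day the variance process barely moves from its starting value $\alpha = .04$, as expected from the slow mean reversion relative to the horizon.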

On each simulated sample path, we estimate $\langle X,X\rangle_T$ over $T = 1$ day (i.e., $T = 1/252$, because the parameter values are all annualized), using the five estimation strategies described earlier: $[Y,Y]^{(\mathrm{all})}_T$, $[Y,Y]^{(\mathrm{sparse})}_T$, $[Y,Y]^{(\mathrm{sparse,opt})}_T$, $[Y,Y]^{(\mathrm{avg})}_T$, and,


finally, $\widehat{\langle X,X\rangle}^{(\mathrm{adj})}_T$. We assume that a day consists of 6.5 hours of open trading, as is the case on the NYSE and NASDAQ. For the fourth-best estimator, we represent this approach using sparse sampling at a frequency of once every 5 minutes.

7.2 Simulations Results

Table 2 reports the simulation results for the five estimation strategies. For each estimator, the bias is calculated as $E[\text{estimator} - \langle X,X\rangle_T]$. The variance row reports the quantity

$$E\left[\operatorname{var}\left(\text{estimator} - \langle X,X\rangle_T \,\Big|\, \int_0^T \sigma_t^2\,dt,\ \int_0^T \sigma_t^4\,dt\right)\right]. \tag{73}$$

The rows marked "relative" report the corresponding results in percentage terms, that is, for
$$\frac{\text{estimator} - \langle X,X\rangle_T}{\langle X,X\rangle_T}.$$

Note that although the convergence of the discretization error is stable, as opposed to conditional, (73) seems like a reasonable stand-in for the theoretical quantity under consideration.

In each case, "small sample" represents the average value over the M simulated paths, and "asymptotic" refers to the value predicted by our theory. The noise parts of the asymptotic variances are taken to include the relevant second-order terms, as given by (20), (37), and (60), yielding spreads like (in the latter case) (61). The small-sample variance is computed by binning the results of the simulations according to the values of the pair (∫_0^T σ_t^2 dt, ∫_0^T σ_t^4 dt), calculating the variance within each bin, and then averaging across the bins, weighted by the number of elements in each bin.

Comparing the rows showing small-sample results to their asymptotic predictions, we see that the asymptotic theory provides a good approximation to the first two moments of the small-sample distribution of the five estimators. Comparing across columns, we see that the differences between the five estimators are quite large. As expected, the naive fifth-best strategy produces disastrous results. The first-best strategy results in a large decrease of root MSE (RMSE) relative to the others, including, in particular, the fourth-best strategy, which is the approach currently used in most of the empirical literature. But, at least for the parameter values used here, determining the optimal sampling frequency (the sparse optimal, or third-best, strategy) results in a modest improvement over the arbitrary selection of a sampling frequency (the sparse, or fourth-best, strategy). Using all of the data, as in the subsampled and averaged estimator (second-best), but especially the bias correction by combining two time scales, as in our first-best approach, is where the major practical improvements reside.

It may seem surprising that the variance of the estimator is reduced so much from the second-best to the first-best estimator. After all, if one sets c = Kn^{−2/3}, then the asymptotic normal term in (7) becomes

(1/n^{1/6}) [ (4/c^2) Eε^4 + c (4T/3) ∫_0^T σ_t^4 dt ]^{1/2} Z_total,

where the first term in brackets is due to noise, the second is due to discretization, and the bracketed sum gives the total variance.

This is quite similar to the error term in (9). However, the c for the second-best estimator is chosen by (8), whereas for the first-best case it is given by (10). The two optima are very different, because for the second-best estimator c is chosen by a bias–variance trade-off, whereas for the first-best estimator it is chosen to minimize the asymptotic variance. From this standpoint, it is quite reasonable for the first-best estimator to also have substantially better asymptotic variance. In fact, from (8) and (10), it is easy to see that, for the optimal choices,

c*_{second best} = n^{1/3} 2^{−1/3} ( Eε^4 / (Eε^2)^2 )^{1/3} c*_{first best},

whence the total variance is in fact quite different for the two estimators.

Figure 1 shows the small-sample and asymptotic RMSE of the [Y, Y]_T^(sparse) estimator as a function of the subsampling interval, illustrating the minimization problem studied in Section 2.3. Finally, Figure 2 shows the standardized distribution of the first-best estimator obtained from the simulations (histogram) and the corresponding asymptotic distribution predicted by our theory (solid line). Due to the effect of bias, the distributions of the fifth- to second-best estimators are irrelevant, because they cannot be used to set intervals for the volatility.

8. DISCUSSION

In this work we have quantified and corrected the effect of noise on the nonparametric assessment of integrated volatility. In the setting of high-frequency data, the usual financial practice is to use sparse sampling, in other words, throwing

Table 2. Monte Carlo Simulations for the Five Estimation Strategies

                                    Fifth-best       Fourth-best       Third-best             Second-best      First-best
                                    [Y,Y]_T^(all)    [Y,Y]_T^(sparse)  [Y,Y]_T^(sparse,opt)   [Y,Y]_T^(avg)    ⟨X,X⟩_T^(adj)
Small-sample bias                   1.1699 × 10^−2   3.89 × 10^−5      2.18 × 10^−5           1.926 × 10^−5    2 × 10^−8
Asymptotic bias                     1.1700 × 10^−2   3.90 × 10^−5      2.20 × 10^−5           1.927 × 10^−5    0
Small-sample variance               1.791 × 10^−8    1.4414 × 10^−9    1.59 × 10^−9           9.41 × 10^−10    9 × 10^−11
Asymptotic variance                 1.788 × 10^−8    1.4409 × 10^−9    1.58 × 10^−9           9.37 × 10^−10    8 × 10^−11
Small-sample RMSE                   1.1699 × 10^−2   5.437 × 10^−5     4.543 × 10^−5          3.622 × 10^−5    9.4 × 10^−6
Asymptotic RMSE                     1.1700 × 10^−2   5.442 × 10^−5     4.546 × 10^−5          3.618 × 10^−5    8.9 × 10^−6
Small-sample relative bias (%)      182              .61               .18                    .15              −.00045
Small-sample relative variance (%)  82,502           1.15              .11                    .053             .0043
Small-sample relative RMSE (%)      340              1.24              .37                    .28              .065
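As a quick arithmetic cross-check on the small-sample rows (our own verification, not part of the original article), the reported RMSE should agree with sqrt(bias² + variance):

```python
import math

# (bias, variance, RMSE) small-sample triples from Table 2, per estimator
rows = {
    "fifth":  (1.1699e-2, 1.791e-8,  1.1699e-2),
    "fourth": (3.89e-5,   1.4414e-9, 5.437e-5),
    "third":  (2.18e-5,   1.59e-9,   4.543e-5),
    "second": (1.926e-5,  9.41e-10,  3.622e-5),
    "first":  (2e-8,      9e-11,     9.4e-6),
}
for name, (bias, var, rmse) in rows.items():
    implied = math.sqrt(bias ** 2 + var)
    assert abs(implied - rmse) / rmse < 0.02, name  # agree within 2%
```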

1406 Journal of the American Statistical Association, December 2005

Figure 1. RMSE of the Sparse Estimator as a Function of the Subsampling Frequency (small-sample RMSE and asymptotic RMSE).

away most of the available data. We have argued that this practice is caused by not incorporating the noise in the model. Although it is statistically unsound to throw away data, we have shown that it is possible to build on this practice to construct estimators that make statistical sense.

Specifically, we have found that the usual realized volatility mainly estimates the magnitude of the noise term rather than anything to do with volatility. An approach built on separating the observations into multiple "grids" lessens this problem. We found that the best results can be obtained by combining the usual ("single-grid") realized volatility with the multiple-grid-based device. This gives an estimator that is approximately unbiased; we have also shown how to assess the (random) variance of this estimator. We also show that, for our most recommended procedure, the optimal number of multiple grids is of order O(n^{2/3}) (see Sec. 4). Most of the development is in the context of finding the integrated volatility over one time period; at the end, we extend this to multiple periods. Also, in the case where the noise can be taken to be almost negligible, we provide a way of optimizing the sampling frequency if one wishes to use the classical "realized volatility" or its multigrid extension.

One important message of the article is that any time one has an impulse to sample sparsely, one can always do better with a multigrid method, regardless of the model or the quantity being estimated.

Figure 2. Asymptotic and Small-Sample Standardized Distributions of the First-Best Estimator.


Our results in this article cover the case where the latent X follows the Itô process model (1). Theorem 1 remains true under the assumption that X is a general semimartingale, that is, a process X that can also have jumps. This follows because the proof of Lemma A.2 remains true for a general semimartingale X (Jacod and Shiryaev 2003, thm. I.4.47, p. 52). Also, apart from using Lemma A.2, Theorem A.1 is unrelated to the nature of the X process. As far as the effect of jumps on the discretization effect is concerned, it should be noted that consistency holds, as follows. When X is a general semimartingale, then for any sequence of grids H_n so that (41) holds, [X, X]^{H_n} converges to the continuous-time quadratic variation of X, that is,

∫_0^T σ_t^2 dt + Σ_{0<t≤T} ( jump of X at t )^2 (74)

(cf. Jacod and Shiryaev 2003, thm. I.4.47, p. 52). In particular, this also applies to subgrids, under the regularity conditions that we have discussed. Furthermore, by extension of the same proof, [X, X]^(avg) also converges to (74). From the same argument, so does the estimator ⟨X, X⟩-hat_T. In this sense, the two-scales estimator is robust to jumps.

APPENDIX: PROOFS OF RESULTS

When the total grid G is considered, we use Σ_{i=1}^{n−1}, Σ_{t_{i+1}≤T}, and Σ_{t_i∈G} interchangeably in the following proofs. Also, we write "|X" to indicate expressions that are conditional on the entire X process.

A.1 Variance of [Y, Y]_T Given the X Process

Here we calculate explicitly the variance in (19), from which the stated approximation follows. The explicit remainder term is also used for (36). Let a partition of [0, T] be 0 = t_0 ≤ t_1 ≤ ··· ≤ t_n = T. Under assumption (11),

var( [Y, Y]_T^(all) | X ) = var[ Σ_{t_{i+1}≤T} (ΔY_{t_i})^2 | X ]
    = Σ_{t_{i+1}≤T} var[ (ΔY_{t_i})^2 | X ] + 2 Σ_{t_{i+1}≤T} cov[ (ΔY_{t_{i−1}})^2, (ΔY_{t_i})^2 | X ]
    =: I_T + II_T,

because ΔY_{t_i} = ΔX_{t_i} + Δε_{t_i} is 1-dependent given the X process.

For the first term,

var[ (ΔY_{t_i})^2 | X ] = κ_4(ΔY_{t_i} | X) + 2[ var(ΔY_{t_i} | X) ]^2
                          + 4[ E(ΔY_{t_i} | X) ]^2 var(ΔY_{t_i} | X) + 4 E(ΔY_{t_i} | X) κ_3(ΔY_{t_i} | X)
                        = κ_4(Δε_{t_i}) + 2[ var(Δε_{t_i}) ]^2 + 4(ΔX_{t_i})^2 var(Δε_{t_i})
                          + 4(ΔX_{t_i}) κ_3(Δε_{t_i})   (under assumption (11))
                        = 2κ_4(ε) + 8(Eε^2)^2 + 8(ΔX_{t_i})^2 Eε^2,

because κ_3(Δε_{t_i}) = 0. The κ's are the cumulants of the relevant order:

κ_1(ε) = 0,   κ_2(ε) = var(ε) = E(ε^2),
κ_3(ε) = Eε^3,   and   κ_4(ε) = E(ε^4) − 3(Eε^2)^2.   (A.1)
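The formula var[(ΔY_{t_i})² | X] = 2κ_4(ε) + 8(Eε²)² + 8(ΔX_{t_i})² Eε² can be spot-checked by simulation for Gaussian noise, for which κ_4(ε) = 0 (a sketch with assumed illustrative values for the noise standard deviation and for the fixed increment ΔX):

```python
import numpy as np

rng = np.random.default_rng(0)
sd_eps, dX, N = 0.3, 0.5, 1_000_000      # assumed illustrative values
eps_i = rng.normal(0.0, sd_eps, N)       # eps at t_i
eps_next = rng.normal(0.0, sd_eps, N)    # eps at t_{i+1}
dY = dX + (eps_next - eps_i)             # Delta Y = Delta X + Delta eps

Ee2 = sd_eps ** 2                        # E eps^2
kappa4 = 0.0                             # fourth cumulant vanishes for Gaussian eps
predicted = 2 * kappa4 + 8 * Ee2 ** 2 + 8 * dX ** 2 * Ee2
empirical = float(np.var(dY ** 2))
print(empirical, predicted)              # the two should be close
```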

So I_T = n( 2κ_4(ε) + 8(Eε^2)^2 ) + 8[X, X]_T^(all) Eε^2. Similarly, for the covariance,

cov[ (ΔY_{t_{i−1}})^2, (ΔY_{t_i})^2 | X ]
    = cov[ (Δε_{t_{i−1}})^2, (Δε_{t_i})^2 ] + 4(ΔX_{t_{i−1}})(ΔX_{t_i}) cov(Δε_{t_{i−1}}, Δε_{t_i})
      + 2(ΔX_{t_{i−1}}) cov[ Δε_{t_{i−1}}, (Δε_{t_i})^2 ] + 2(ΔX_{t_i}) cov[ (Δε_{t_{i−1}})^2, Δε_{t_i} ]
    = κ_4(ε) + 2(Eε^2)^2 − 4(ΔX_{t_{i−1}})(ΔX_{t_i}) κ_2(ε)
      − 2(ΔX_{t_i}) κ_3(ε) + 2(ΔX_{t_{i−1}}) κ_3(ε),   (A.2)

because of (A.1). Thus, summing the terms in (A.2),

II_T = 2(n − 1)( κ_4(ε) + 2(Eε^2)^2 ) − 8Eε^2 Σ_{t_{i+1}≤T} (ΔX_{t_{i−1}})(ΔX_{t_i}) − 4κ_3(ε)( ΔX_{t_{n−1}} − ΔX_{t_0} ).

Amalgamating the two expressions, one obtains

var( [Y, Y]_T^(all) | X ) = n( 2κ_4(ε) + 8(Eε^2)^2 ) + 8[X, X]_T^(all) Eε^2
                            + 2(n − 1)( κ_4(ε) + 2(Eε^2)^2 )
                            − 8Eε^2 Σ (ΔX_{t_{i−1}})(ΔX_{t_i}) − 4κ_3(ε)( ΔX_{t_{n−1}} − ΔX_{t_0} )
                          = 4nEε^4 + R_n,   (A.3)
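To see the leading term in (A.3) explicitly, substitute κ_4(ε) = Eε^4 − 3(Eε^2)^2 and collect the O(n) contributions of I_T and II_T:

```latex
\begin{aligned}
I_T + II_T &\approx n\bigl(2\kappa_4(\varepsilon) + 8(E\varepsilon^2)^2\bigr)
             + 2n\bigl(\kappa_4(\varepsilon) + 2(E\varepsilon^2)^2\bigr)\\
           &= 4n\kappa_4(\varepsilon) + 12n(E\varepsilon^2)^2
            = 4n\bigl(E\varepsilon^4 - 3(E\varepsilon^2)^2\bigr) + 12n(E\varepsilon^2)^2
            = 4nE\varepsilon^4 .
\end{aligned}
```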

where the remainder term R_n satisfies

|R_n| ≤ 8Eε^2 [X, X]_T + 2( κ_4(ε) + 2(Eε^2)^2 )
        + 8Eε^2 Σ |ΔX_{t_{i−1}}||ΔX_{t_i}| + 4|κ_3(ε)|( |ΔX_{t_{n−1}}| + |ΔX_{t_0}| )
      ≤ 16Eε^2 [X, X]_T^(all) + 2( κ_4(ε) + 2(Eε^2)^2 ) + 2|κ_3(ε)|( 2 + [X, X]_T ),   (A.4)

by the Cauchy–Schwarz inequality and because |x| ≤ (1 + x^2)/2. Because [X, X]_T^(all) = O_p(1), (19) follows.

Under slightly stronger conditions (e.g., |μ_t| and σ_t bounded above by a constant), Σ (ΔX_{t_{i−1}})(ΔX_{t_i}) is a near-martingale and of order O_p(n^{−1/2}), and, similarly, ΔX_{t_{n−1}} − ΔX_{t_0} = O_p(n^{−1/2}), from which (20) follows.

Similarly, to obtain a higher-order approximation to the variance of the estimator ⟨X, X⟩-hat_T, note that

⟨X, X⟩-hat_T − ⟨X, X⟩_T = ( [Y, Y]_T^(avg) − (1/K)[Y, Y]_T^(all) − [X, X]_T^(avg) )
                          + ( [X, X]_T^(avg) − ⟨X, X⟩_T ).   (A.5)

Now, recalling (20) and (37), and, in addition,

cov( [Y, Y]^(all), [Y, Y]^(avg) | X ) = 2( Eε^4 − (Eε^2)^2 ) (2n − 1)/K + 8Eε^2 [X, X]_T^(all) (1/K) + o_p(1/K),

we see that

var( ⟨X, X⟩-hat_T | X )
    = var( [Y, Y]_T^(avg) | X ) + (1/K^2) var( [Y, Y]_T^(all) | X ) − (2/K) cov( [Y, Y]_T^(avg), [Y, Y]_T^(all) | X )
    = (4n̄/K) Eε^4 + (1/K)[ 8[X, X]_T^(avg) Eε^2 − 2( Eε^4 − (Eε^2)^2 ) ] + o_p(1/K)
      + (1/K^2)[ 4nEε^4 + ( 8[X, X]_T^(all) Eε^2 − 2( Eε^4 − (Eε^2)^2 ) ) + o_p(1) ]
      − (2/K)[ 2( Eε^4 − (Eε^2)^2 ) (2n − 1)/K + 8Eε^2 [X, X]_T^(all) (1/K) + o_p(1/K) ]
    = (n̄/K) 8(Eε^2)^2 + (1/K)[ 8[X, X]_T^(avg) Eε^2 − 2( Eε^4 − (Eε^2)^2 ) ] + o_p(1/K),

because, at the optimal choice, K = cn^{2/3} and n̄ = c^{−1}n^{1/3}. Therefore, (60) holds.

A.2 The Relevant Central Limit Theorem

Lemma A.2. Suppose that X is an Itô process. Suppose that Y is related to X through model (4). Then, under assumption (11) and definitions (16) and (32),

[Y, Y]_T^(all) = [ε, ε]_T^(all) + O_p(1)   and
[Y, Y]_T^(avg) = [ε, ε]_T^(avg) + [X, X]_T^(avg) + O_p(1/√K).

Proof. (a) The one-grid case:

[Y, Y]_T^(all) = [X, X]_T^(all) + [ε, ε]_T^(all) + 2[X, ε]_T^(all).   (A.6)

We show that

E( ([X, ε]_T^(all))^2 | X ) = O_p(1),   (A.7)

and, in particular,

[X, ε]_T^(all) = O_p(1).   (A.8)

To see (A.7), write

[X, ε]_T^(all) = Σ_{i=0}^{n−1} (ΔX_{t_i})(Δε_{t_i})
               = Σ_{i=0}^{n−1} (ΔX_{t_i}) ε_{t_{i+1}} − Σ_{i=0}^{n−1} (ΔX_{t_i}) ε_{t_i}
               = Σ_{i=1}^{n−1} ( ΔX_{t_{i−1}} − ΔX_{t_i} ) ε_{t_i} + ΔX_{t_{n−1}} ε_{t_n} − ΔX_{t_0} ε_{t_0}.   (A.9)

Because E( [X, ε]_T^(all) | X ) = 0 and the ε_{t_i} are iid for different t_i, we get

E( ([X, ε]_T^(all))^2 | X ) = var( [X, ε]_T^(all) | X )
    = Eε^2 [ Σ_{i=1}^{n−1} ( ΔX_{t_{i−1}} − ΔX_{t_i} )^2 + (ΔX_{t_{n−1}})^2 + (ΔX_{t_0})^2 ]
    = 2[X, X]_T Eε^2 − 2Eε^2 Σ_{i=1}^{n−1} (ΔX_{t_{i−1}})(ΔX_{t_i})
    ≤ 4[X, X]_T Eε^2,   (A.10)

by the Cauchy–Schwarz inequality, from which, and from [X, X]_T^(all) being of order O_p(1), (A.7) follows. Hence (A.8) follows by the Markov inequality.

(b) The multiple-grid case. Notice that

[Y, Y]^(avg) = [X, X]_T^(avg) + [ε, ε]_T^(avg) + 2[X, ε]_T^(avg).   (A.11)

Equation (A.11) follows strictly from model (4) and the definitions of the grids and of [·, ·]_t^(avg); see Section 3.2.

We need to show that

E( ([X, ε]_T^(avg))^2 | X ) = O_p(1/K),   (A.12)

and, in particular,

[X, ε]_T^(avg) = O_p(1/K^{1/2}),   (A.13)

where var( [X, ε]_T^(avg) | X ) = E[ ([X, ε]_T^(avg))^2 | X ]. To show (A.12), note that E( [X, ε]_T^(avg) | X ) = 0 and

E[ ([X, ε]_T^(avg))^2 | X ] = var( [X, ε]_T^(avg) | X )
    = (1/K^2) Σ_{k=1}^K var( [X, ε]_T^(k) | X )
    ≤ (4Eε^2/K) [X, X]_T^(avg) = O_p(1/K),

where the second equality follows from the disjointness of the different grids, as well as from the independence of ε and X. The inequality follows from the same argument as in (A.10). Then the order follows because [X, X]_T^(avg) = O_p(1); see the method of Mykland and Zhang (2002) for a rigorous development of the order of [X, X]_T^(avg).

Theorem A.1. Suppose that X is an Itô process of form (1). Suppose that Y is related to X through model (4), and that (11) is satisfied with Eε^4 < ∞. Also suppose that t_i and t_{i+1} are not in the same subgrid for any i. Under assumption (33), as n → ∞,

( √n( Êε^2 − Eε^2 ),  √(K/n̄) ( [Y, Y]_T^(avg) − [X, X]_T^(avg) − 2n̄Eε^2 ) )

converges in law to a bivariate normal with mean 0 and covariance matrix

( Eε^4         2 var(ε^2) )
( 2 var(ε^2)   4Eε^4      ),   (A.14)

conditional on the X process, where the limiting random variable is independent of the X process.

Proof. By Lemma A.2, we need the distribution of [ε, ε]^(avg) and [ε, ε]^(all). First, we explore the convergence of

(1/√n) ( [ε, ε]_T^(all) − 2nEε^2,  [ε, ε]_T^(avg) K − 2n̄KEε^2 ).   (A.15)

Recall that all of the sampling points t_0, t_1, …, t_n are within [0, T]. We use G to denote the time points in the full sampling, as in the single grid; G(k) denotes the subsampling from the kth grid. As before, if t_i ∈ G(k), then t_{i−} and t_{i+} are the previous and next elements in G(k); ε_{t_{i−}} = 0 for t_i = min G(k), and ε_{t_{i+}} = 0 for t_i = max G(k).

Set

M_T^(1) = (1/√n) Σ_{t_i∈G} ( ε_{t_i}^2 − Eε^2 ),
M_T^(2) = (1/√n) Σ_{t_i∈G} ε_{t_i} ε_{t_{i−1}},   (A.16)
M_T^(3) = (1/√n) Σ_{k=1}^K Σ_{t_i∈G(k)} ε_{t_i} ε_{t_{i−}}.


We first find the asymptotic distribution of (M_T^(1), M_T^(2), M_T^(3)) using the martingale central limit theorem; then we use the result to find the limit of (A.15).

Note that (M_T^(1), M_T^(2), M_T^(3)) are the end points of martingales with respect to the filtration F_i = σ(ε_{t_j}, j ≤ i; X_t, all t). We now derive their (discrete-time) predictable quadratic variations ⟨M^(l), M^(k)⟩, l, k = 1, 2, 3 [discrete-time predictable quadratic variations are used only in this proof and in the proof of Lemma 1, and are different from the continuous-time quadratic variations in (12)]:

⟨M^(1), M^(1)⟩_T = (1/n) Σ_{t_i∈G} var( ε_{t_i}^2 − Eε^2 | F_{t_{i−1}} ) = var(ε^2),

⟨M^(2), M^(2)⟩_T = (1/n) Σ_{t_i∈G} var( ε_{t_i} ε_{t_{i−1}} | F_{t_{i−1}} )
                 = (Eε^2/n) Σ_{t_i∈G} ε_{t_{i−1}}^2 = (Eε^2)^2 + o_p(1),   (A.17)

⟨M^(3), M^(3)⟩_T = (1/n) Σ_{k=1}^K Σ_{t_i∈G(k)} var( ε_{t_i} ε_{t_{i−}} | F_{t_{i−1}} )
                 = (Eε^2/n) Σ_{k=1}^K Σ_{t_i∈G(k)} ε_{t_{i−}}^2 = (Eε^2)^2 + o_p(1),

by the law of large numbers. Similarly, for the predictable quadratic covariations,

⟨M^(1), M^(2)⟩_T = (1/n) Σ_{t_i∈G} cov( ε_{t_i}^2 − Eε^2, ε_{t_i} ε_{t_{i−1}} | F_{t_{i−1}} )
                 = Eε^3 (1/n) Σ_{t_i∈G} ε_{t_{i−1}} = o_p(1),

⟨M^(1), M^(3)⟩_T = (1/n) Σ_{k=1}^K Σ_{t_i∈G(k)} cov( ε_{t_i}^2 − Eε^2, ε_{t_i} ε_{t_{i−}} | F_{t_{i−1}} )
                 = Eε^3 (1/n) Σ_{k=1}^K Σ_{t_i∈G(k)} ε_{t_{i−}} = o_p(1),   (A.18)

⟨M^(2), M^(3)⟩_T = (1/n) Σ_{k=1}^K Σ_{t_i∈G(k)} cov( ε_{t_i} ε_{t_{i−1}}, ε_{t_i} ε_{t_{i−}} | F_{t_{i−1}} )
                 = (Eε^2/n) Σ_{k=1}^K Σ_{t_i∈G(k)} ε_{t_{i−1}} ε_{t_{i−}} = o_p(1),

because t_{i+1} is not in the same grid as t_i.

Because the ε_{t_i} are iid and Eε_{t_i}^4 < ∞, the conditional Lindeberg conditions are satisfied. Hence, by the martingale central limit theorem (see Hall and Heyde 1980, condition 3.1, p. 58), (M^(1), M^(2), M^(3)) are asymptotically normal, with covariance matrix given by the asymptotic values of ⟨M^(l), M^(k)⟩. In other words, asymptotically, (M^(1), M^(2), M^(3)) are independent normal with respective variances var(ε^2), (Eε^2)^2, and (Eε^2)^2.

Returning to (A.15), we have that

[ε, ε]^(all) − 2nEε^2
    = 2 Σ_{i≠0,n} ( ε_{t_i}^2 − Eε^2 ) + ( ε_{t_0}^2 − Eε^2 ) + ( ε_{t_n}^2 − Eε^2 ) − 2 Σ_{t_i>0} ε_{t_i} ε_{t_{i−1}}
    = 2√n ( M^(1) − M^(2) ) + O_p(1).   (A.19)

Meanwhile,

[ε, ε]^(k) − 2n_k Eε^2 = Σ_{t_i∈G(k), t_i≠max G(k)} ( ε_{t_{i+}} − ε_{t_i} )^2 − 2n_k Eε^2
    = 2 Σ_{t_i∈G(k)} ( ε_{t_i}^2 − Eε^2 ) − ( ε_{min G(k)}^2 − Eε^2 ) − ( ε_{max G(k)}^2 − Eε^2 )
      − 2 Σ_{t_i∈G(k)} ε_{t_i} ε_{t_{i−}},   (A.20)

where n_k + 1 represents the total number of sampling points in G(k). Hence

[ε, ε]_T^(avg) K − 2n̄Eε^2 K = √n ( 2M^(1) − 2M^(3) ) − R
                             = 2√n ( M^(1) − M^(3) ) + O_p(K^{1/2}),   (A.21)

because R = Σ_{k=1}^K [ ( ε_{min G(k)}^2 − Eε^2 ) + ( ε_{max G(k)}^2 − Eε^2 ) ] satisfies ER^2 = var(R) ≤ 4K var(ε^2).

Because n^{−1}K → 0, and because the error terms in (A.19) and (A.20) are uniformly integrable, it follows that

(A.15) = 2( M^(1) − M^(2), M^(1) − M^(3) ) + o_p(1).   (A.22)

Hence (A.15) is also asymptotically normal, with covariance matrix

( 4Eε^4        4 var(ε^2) )
( 4 var(ε^2)   4Eε^4      ).

By Lemma A.2, and as n^{−1}K → 0,

(1/√n) ( [Y, Y]_T^(all) − 2nEε^2,  ( [Y, Y]_T^(avg) − [X, X]_T^(avg) − 2n̄Eε^2 )K ) | X

is asymptotically normal:

(1/√n) ( [Y, Y]_T^(all) − 2nEε^2,  [Y, Y]_T^(avg) K − 2n̄KEε^2 − [X, X]_T^(avg) K | X )
    = 2( M^(1) − M^(2), M^(1) − M^(3) ) + o_p(1)
    →^L  2 N( 0, ( Eε^4   var(ε^2) ; var(ε^2)   Eε^4 ) ).   (A.23)

Because

Êε^2 = (1/2n) [Y, Y]_T^(all)   and   K/√n = √(K/n̄) (1 + o(1)),   (A.24)

Theorem A.1 follows.

Proof of Lemma 1. For simplicity, and without loss of generality, we assume that H = G. The result follows the proofs of Lemma A.2 and Theorem A.1, but with a more exact representation of [Y, Y]_T^(all) − [X, X]_T^(all). Specifically, from (A.9) and (A.19), we have that

[Y, Y]_T^(all) − [X, X]_T^(all) − 2nEε^2
    = [ε, ε]_T − 2nEε^2 + 2[X, ε]_T^(all)
    = 2 Σ_{i≠0,n} ( ε_{t_i}^2 − Eε^2 ) + ( ε_{t_0}^2 − Eε^2 ) + ( ε_{t_n}^2 − Eε^2 ) − 2 Σ_{t_i>0} ε_{t_i} ε_{t_{i−1}}
      + 2 Σ_{i=1}^{n−1} ( ΔX_{t_{i−1}} − ΔX_{t_i} ) ε_{t_i} + 2ΔX_{t_{n−1}} ε_{t_n} − 2ΔX_{t_0} ε_{t_0}.

With the same filtration as in the proof of Theorem A.1, this is the sum of a martingale triangular array with increment 2( ε_{t_i}^2 − Eε^2 ) − 2ε_{t_i} ε_{t_{i−1}} + 2( ΔX_{t_{i−1}} − ΔX_{t_i} ) ε_{t_i} for i ≠ 0, n. Again, the conditions of the martingale central limit theorem (see Hall and Heyde 1980, chap. 3) are satisfied by (17), whence the term is asymptotically normal, conditionally on the X process. The conditional variance is given by (20) under the conditions mentioned in the statement of the lemma. Because Eε^2 can vary with n, the O_p(n_sparse^{−1/2}) term in (20) is actually of order O_p(n_sparse^{−1/2} Eε^2).

A.3 Asymptotics of D_T

For transparency of notation, we take Δt = T/n; in other words, Δt is the average of the Δt_i.

Proof of Theorem 2. Note first that, because we assume that μ_t and σ_t are continuous, they are locally bounded. Because here we seek to show convergence in probability, we can therefore assume, without loss of generality, that |μ_t| is bounded above by a nonrandom constant, and also that ∞ > σ^+ ≥ σ_t ≥ σ^− > 0, where σ^+ and σ^− are nonrandom. This is by the usual stopping-time argument.

Also note that, once μ_t and σ_t are bounded, as we have now assumed, by Girsanov's theorem (see, e.g., Karatzas and Shreve 1991, chap. 3.5; Jacod and Shiryaev 2003, chap. II.3b), we can, without loss of generality, further suppose that μ_t = 0 identically. (If the convergence in probability holds under the equivalent probability measure, then it also holds under the original one.)

Now for the derivation. First, consider

( X_{t_i} − X_{t_{i−}} )^2 = ( Σ_{t_j=t_{i−}}^{t_{i−1}} ΔX_{t_j} )^2
                           = Σ_{t_j=t_{i−}}^{t_{i−1}} ( ΔX_{t_j} )^2 + 2 Σ_{t_{i−1}≥t_k>t_l≥t_{i−}} ΔX_{t_k} ΔX_{t_l}.

It follows that

[X, X]_T^(avg) = [X, X]_T^(all) + 2 Σ_{i=1}^{n−1} ( ΔX_{t_i} ) Σ_{j=1}^{K∧i} ( 1 − j/K ) ( ΔX_{t_{i−j}} ) + O_p(K/n).

The remainder term incorporates end effects and has order O_p(K/n)

by (41) and (42), and due to the regular allocation of points to grids.

Following Jacod and Protter (1998) and Mykland and Zhang (2002), [X, X]_T^(all) − ⟨X, X⟩_T = O_p(1/√n) = o_p(√(K/n)), too, under assumption (17) [which is a special case of assumption (41)] and the condition that inf_{t∈[0,T]} σ_t^2 > 0 almost surely. In other words,

D_T = [X, X]_T^(avg) − ⟨X, X⟩_T
    = 2 Σ_{i=1}^{n−1} ( ΔX_{t_i} ) Σ_{j=1}^{K∧i} ( 1 − j/K ) ( ΔX_{t_{i−j}} ) + o_p( √(KΔt) ).   (A.25)

It follows that

⟨D, D⟩_T = 4 Σ_{i=1}^{n−1} ( Δ⟨X, X⟩_{t_i} ) ( Σ_{j=1}^{K∧i} ( 1 − j/K ) ( ΔX_{t_{i−j}} ) )^2 + o_p(KΔt)
         = (I) + (II) + o_p(KΔt),   (A.26)

where

(I) = 4 Σ_{i=1}^{n−1} ( Δ⟨X, X⟩_{t_i} ) ( Σ_{j=1}^{K∧i} ( 1 − j/K )^2 ( ΔX_{t_{i−j}} )^2 )
    = 4 Σ_{i=1}^{n−1} σ_{t_i}^4 Δt_i ( Σ_{j=1}^{K∧i} ( 1 − j/K )^2 Δt_{i−j} ) + o_p(KΔt)
    = KΔt Σ_{i=1}^{n−1} σ_{t_i}^4 h_i Δt_i + o_p(KΔt),

where h_i is given by (43). Thus, Theorem 2 will have been shown if we can show that

(II) = 8 Σ_{i=1}^{n−1} ( Δ⟨X, X⟩_{t_i} ) ζ_i   (A.27)

is of order o_p(KΔt), where

ζ_i = Σ_{i−1≥l>r≥0} ( ΔX_{t_l} )( ΔX_{t_r} ) ( 1 − (i − l)/K )^+ ( 1 − (i − r)/K )^+.   (A.28)

We do this as follows. Denote δ^+ = max_i Δt_i = O(1/n), and set

(II)′ = 8 Σ_{i=1}^{n−1} Δt_i σ_{t_{i−K}}^2 ζ_i.

Then, by Hölder's inequality,

E|(II) − (II)′| ≤ δ^+ n sup_{|t−s|≤(K+1)δ^+} ‖ σ_t^2 − σ_s^2 ‖_2 sup_i ‖ ζ_i ‖_2 = o(δ^+K),

because of the boundedness and continuity of σ_t, and because, applying the Burkholder–Davis–Gundy inequality twice, we have that

E( ζ_i^2 ) ≤ E Σ_{l=1}^{i−1} ( Δ⟨X, X⟩_{t_l} ) ( Σ_{r≥0}^{(l−1)∧(i−1)} ( ΔX_{t_r} ) ( 1 − (i − l)/K )^+ ( 1 − (i − r)/K )^+ )^2
           ≤ δ^+ (σ^+)^2 Σ_{l=1}^{i−1} E ( Σ_{r≥0}^{(l−1)∧(i−1)} ( ΔX_{t_r} ) ( 1 − (i − l)/K )^+ ( 1 − (i − r)/K )^+ )^2
           ≤ (δ^+)^2 (σ^+)^4 Σ_{l=1}^{i−1} Σ_{r≥0}^{(l−1)∧(i−1)} ( ( 1 − (i − l)/K )^+ ( 1 − (i − r)/K )^+ )^2
           = O( (δ^+K)^2 )   uniformly in i,   (A.29)

where the second-to-last transition does for the inner sum what the first two inequalities do for the outer sum. Now rewrite

(II)′ = 8 Σ_{n−1≥l>r≥0} ( ΔX_{t_l} )( ΔX_{t_r} ) Σ_{i=l+1}^{n−1} Δt_i σ_{t_{i−K}}^2 ( 1 − (i − l)/K )^+ ( 1 − (i − r)/K )^+.   (A.30)

In the same way as in (A.29), we then obtain

E( (II)′ )^2 ≤ 64 (δ^+)^2 (σ^+)^4 Σ_{n−1≥l>r≥0} E ( Σ_{i=l+1}^{n−1} Δt_i σ_{t_{i−K}}^2 ( 1 − (i − l)/K )^+ ( 1 − (i − r)/K )^+ )^2
             ≤ 64 (δ^+)^2 (σ^+)^4 × (δ^+)^2 (σ^+)^4 Σ_{n−1≥l>r≥0} ( Σ_{i=l+1}^{n−1} ( 1 − (i − l)/K )^+ ( 1 − (i − r)/K )^+ )^2
             ≤ 64 (δ^+)^4 (σ^+)^8 nK^3 = O( (δ^+K)^3 ) = o( (δ^+K)^2 ),   (A.31)

by assumptions (41) and (42). The transition from the second to the last line in (A.31) is due to the fact that Σ_{i=l+1}^{n−1} ( 1 − (i − l)/K )^+ ( 1 − (i − r)/K )^+ ≤ K, and = 0 when l − r ≥ K. Thus we have shown that (II) = o_p(KΔt), and hence Theorem 2 has been established.

We now proceed to the asymptotic distribution of D_T. We first state a technical condition on the filtration (F_t)_{0≤t≤T} to which X_t and μ_t (but not the ε's) are assumed to be adapted.

Condition E (description of the filtration). There is a continuous multidimensional P-local martingale X = (X^(1), …, X^(p)), for some p, so that F_t is the smallest sigma-field containing σ(X_s, s ≤ t) and N, where N contains all of the null sets in σ(X_s, s ≤ T).

For example, X can be a collection of Brownian motions.

Proof of Theorem 3. We can show, by methods similar to those in the proof of Theorem 2, that if L is any martingale adapted to the filtration generated by X, then

sup_t | (1/(ΔtK)) ⟨D, L⟩_t | →^p 0.   (A.32)

The stable convergence with respect to the filtration (F_t)_{0≤t≤T} then follows in view of Rootzén (1980) or Jacod and Protter (1998). This ends the proof of Theorem 3.

Finally, in the case where η_n^2 does not converge, one can still use the mixed normal with variance η_n^2. This is because every subsequence of η_n^2 has a further subsequence that does converge in probability to some η^2, and hence for which the assumption (48) in Theorem 3 is satisfied. The reason for this is that one can define the distribution function of a finite measure by

G_n(t) = Σ_{t_{i+1}≤t} h_i Δt_i.   (A.33)

Because G_n(t) ≤ T sup_i h_i, it follows from (45) that the sequence G_n is weakly compact in the sense of weak convergence; see Helly's theorem (e.g., Billingsley 1995, p. 336). For any convergent subsequence G_n → G, we then get that

η_n^2 = ∫_0^T σ_t^4 dG_n(t) → ∫_0^T σ_t^4 dG(t)   (A.34)

almost surely, because we have assumed that ⟨X, X⟩′_t is a continuous function of t. One then defines η^2 to be the (subsequence-dependent) right side of (A.34). To proceed further with the asymptotics, continue with the foregoing subsequence and note that

(1/(ΔtK)) ⟨D, D⟩_t ≈ ∫_0^t σ_s^4 dG_n(s) → ∫_0^t σ_s^4 dG(s),

completing the proof.

[Received October 2003. Revised December 2004.]

REFERENCES

Aït-Sahalia, Y., Mykland, P. A., and Zhang, L. (2005), "How Often to Sample a Continuous-Time Process in the Presence of Market Microstructure Noise," Review of Financial Studies, 18, 351–416.
Aldous, D. J., and Eagleson, G. K. (1978), "On Mixing and Stability of Limit Theorems," The Annals of Probability, 6, 325–331.
Andersen, T. G., Bollerslev, T., Diebold, F. X., and Labys, P. (2001), "The Distribution of Exchange Rate Realized Volatility," Journal of the American Statistical Association, 96, 42–55.
Bandi, F. M., and Russell, J. R. (2003), "Microstructure Noise, Realized Volatility, and Optimal Sampling," technical report, University of Chicago, Graduate School of Business.
Barndorff-Nielsen, O. E., and Shephard, N. (2002), "Econometric Analysis of Realized Volatility and Its Use in Estimating Stochastic Volatility Models," Journal of the Royal Statistical Society, Ser. B, 64, 253–280.
Billingsley, P. (1995), Probability and Measure (3rd ed.), New York: Wiley.
Brown, S. J. (1990), "Estimating Volatility," in Financial Options: From Theory to Practice, eds. S. Figlewski, W. Silber, and M. Subrahmanyam, Homewood, IL: Business One-Irwin, pp. 516–537.
Chernov, M., and Ghysels, E. (2000), "A Study Towards a Unified Approach to the Joint Estimation of Objective and Risk Neutral Measures for the Purpose of Options Valuation," Journal of Financial Economics, 57, 407–458.
Dacorogna, M. M., Gençay, R., Müller, U., Olsen, R. B., and Pictet, O. V. (2001), An Introduction to High-Frequency Finance, San Diego: Academic Press.
Gallant, A. R., Hsu, C.-T., and Tauchen, G. T. (1999), "Using Daily Range Data to Calibrate Volatility Diffusions and Extract the Forward Integrated Variance," The Review of Economics and Statistics, 81, 617–631.
Gloter, A. (2000), "Estimation des Paramètres d'une Diffusion Cachée," unpublished doctoral thesis, Université de Marne-la-Vallée.
Hall, P., and Heyde, C. C. (1980), Martingale Limit Theory and Its Application, Boston: Academic Press.
Hansen, P. R., and Lunde, A. (2004a), "Realized Variance and IID Market Microstructure Noise," technical report, Brown University, Dept. of Economics.
Hansen, P. R., and Lunde, A. (2004b), "An Unbiased Measure of Realized Variance," technical report, Brown University, Dept. of Economics.
Heston, S. (1993), "A Closed-Form Solution for Options With Stochastic Volatility, With Applications to Bonds and Currency Options," Review of Financial Studies, 6, 327–343.
Hull, J., and White, A. (1987), "The Pricing of Options on Assets With Stochastic Volatilities," Journal of Finance, 42, 281–300.
Jacod, J., and Protter, P. (1998), "Asymptotic Error Distributions for the Euler Method for Stochastic Differential Equations," The Annals of Probability, 26, 267–307.
Jacod, J., and Shiryaev, A. N. (2003), Limit Theorems for Stochastic Processes (2nd ed.), New York: Springer-Verlag.
Karatzas, I., and Shreve, S. E. (1991), Brownian Motion and Stochastic Calculus, New York: Springer-Verlag.
Mykland, P. A., and Zhang, L. (2002), "ANOVA for Diffusions," technical report, University of Chicago, Dept. of Statistics.
O'Hara, M. (1995), Market Microstructure Theory, Cambridge, MA: Blackwell.
Oomen, R. (2004), "Properties of Realized Variance for a Pure Jump Process: Calendar Time Sampling versus Business Time Sampling," technical report, University of Warwick, Warwick Business School.
Rényi, A. (1963), "On Stable Sequences of Events," Sankhyā, Ser. A, 25, 293–302.
Rootzén, H. (1980), "Limit Distributions for the Error in Approximations of Stochastic Integrals," The Annals of Probability, 8, 241–251.
Zhou, B. (1996), "High-Frequency Data and Volatility in Foreign-Exchange Rates," Journal of Business & Economic Statistics, 14, 45–52.
Zumbach, G., Corsi, F., and Trapletti, A. (2002), "Efficient Estimation of Volatility Using High-Frequency Data," technical report, Olsen & Associates.



prices are sampled at finer intervals, microstructure issues become more pronounced. The empirical finance literature then suggests that the bias induced by market microstructure effects makes the most finely sampled data unusable, and many authors prefer to sample over longer time horizons to obtain more reasonable estimates. The sampling length of the typical choices in the literature is ad hoc and usually ranges from 5 to 30 minutes, for exchange rate data, for instance. If the original data are sampled once every second, say, then retaining an observation every 5 minutes amounts to discarding 299 out of every 300 data points.

This approach to handling the data poses a conundrum from the statistical standpoint. It is difficult to accept that throwing away data, especially in such quantities, can be an optimal solution. We argue here that sampling over longer horizons merely reduces the impact of microstructure, rather than quantifying and correcting its effect for volatility estimation. Ultimately, we seek a solution that makes use of all of the data, yet results in an unbiased and consistent estimator of the integrated volatility. Our contention is that the contamination due to market microstructure is, to first order, the same as what statisticians usually call "observation error." In other words, the transaction price will be seen as an observation device for the underlying price process. We incorporate the observation error into the estimating procedure for integrated volatility. That is, we suppose that the return process as observed at the sampling times is of the form

Y_{t_i} = X_{t_i} + ε_{t_i}.   (4)

Here X_t is a latent true, or efficient, return process that follows (1). The ε_{t_i} are independent noise around the true return.

In the process of constructing our final estimator for the integrated volatility, which we call the "first-best" approach, we are led to develop and analyze a number of intermediary estimators, starting with a "fifth-best" approach that computes realized volatility using all of the data available. This fifth-best approach has some clearly undesirable properties. In fact, we show in Section 2.2 that ignoring microstructure noise would have a devastating effect on the use of the realized volatility. Instead of (2), one gets

Σ_{t_i, t_{i+1} ∈ [0,T]} ( Y_{t_{i+1}} − Y_{t_i} )^2 = 2nEε^2 + O_p(n^{1/2}),   (5)

where n is the number of sampling intervals over [0, T]. Thus the realized volatility estimates not the true integrated volatility, but rather the variance of the contamination noise. In fact, we show that the true integrated volatility, which is O_p(1), is even dwarfed by the magnitude of the asymptotically Gaussian O_p(n^{1/2}) term. This section also discusses why it is natural to want the quadratic variation of the latent process X, as opposed to trying to construct, say, prices with the help of some form of quadratic variation for Y.

Faced with such dire consequences, the usual empirical practice, where one does not use all of the data, is likely to produce an improvement. We show that this is indeed the case by analyzing the estimation approach where one selects an arbitrary sparse sampling frequency (such as once every 5 minutes, for instance). We call this the "fourth-best" approach. A natural question to ask, then, is how that sparse sampling frequency should be determined. We show how to determine it optimally, by minimizing the mean squared error (MSE) of the sparse realized volatility estimator, yielding what we term the "third-best" estimator. Similar results to our third-best approach have been discussed independently (when σ_t is conditionally nonrandom) by Bandi and Russell (2003) in a paper presented at the 2003 CIRANO conference.

The third-best answer may be to sample, say, every 6 minutes and 15 seconds instead of an arbitrary 5 minutes. Thus, even if one determines the sampling frequency optimally, it remains the case that one is not using a large fraction of the available data. Driven by one of statistics' first principles, "thou shall not throw data away," we go further by proposing two estimators that make use of the full data. Our next estimator, the "second-best," involves sampling sparsely over subgrids of observations. For example, one could subsample every 5 minutes, starting with the first observation, then starting with the second, and so on. We then average the results obtained across those subgrids. We show that subsampling and averaging result in a substantial decrease in the bias of the estimator.

Finally, we show how to bias-correct this estimator, resulting in a centered estimator for the integrated volatility despite the presence of market microstructure noise. Our final estimator, the "first-best" estimator, effectively combines the second-best estimator (an average of realized volatilities estimated over subgrids, on a slow time scale of, say, 5 minutes each) with the fifth-best estimator (realized volatility estimated using all of the data), providing the bias-correction device. This combination of slow and fast time scales gives our article its title.

Models akin to (4) have been studied in the constant σ case by Zhou (1996), who proposed a bias-correcting approach based on autocovariances. The behavior of this estimator has been studied by Zumbach, Corsi, and Trapletti (2002). Other contributions in this direction have been made by Hansen and Lunde (2004a,b), who considered time-varying σ in the conditionally nonrandom case, and by Oomen (2004). Efficient likelihood estimation of σ has been considered by Aït-Sahalia, Mykland, and Zhang (2005).

The article is organized as follows. Section 1.2 provides a summary of the five estimators. The main theory for these estimators, including asymptotic distributions, is developed successively in Sections 2-4 for the case of one time period [0,T]. The multiperiod problem is treated in Section 5. Section 6 discusses how to estimate the asymptotic variance for equidistant sampling. Section 7 presents the results of Monte Carlo simulations for all five estimators. Section 8 concludes and provides a discussion for the case where the process contains jumps. The Appendix provides proofs.

1.2 The Five Estimators: A User's Guide

Here we provide a preview of our results in Sections 2-4. To describe the results, we proceed through five estimators, from the worst to the (so far) best candidate. In this section all limit results and optimal choices are stated in the equidistantly sampled case; the more general results are given in the subsequent sections.

1396 Journal of the American Statistical Association December 2005

1.2.1 The Fifth-Best Approach: Completely Ignoring the Noise. The naïve estimator of the quadratic variation is [Y,Y]_T^(all), which is the realized volatility based on the entire sample. We show in Section 2 that (5) holds. We therefore conclude that realized volatility [Y,Y]_T^(all) is not a reliable estimator for the true variation ⟨X,X⟩_T of the returns. For large n, the realized volatility diverges to infinity linearly in n. Scaled by (2n)^{-1}, it estimates consistently the variance of the microstructure noise, Eε², rather than the object of interest, ⟨X,X⟩_T. Said differently, market microstructure noise totally swamps the variance of the price signal at the level of the realized volatility. Our simulations in Section 7 also document this effect.
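To see the force of (5) concretely, here is a minimal simulation sketch. The constant volatility σ, the Gaussian noise, and all numerical values are our illustrative assumptions, not part of the paper's framework:

```python
import numpy as np

rng = np.random.default_rng(42)

T = 1.0           # one period, in arbitrary time units
n = 23_400        # one observation per second over a 6.5-hour day
sigma = 0.3       # constant volatility of the latent price (illustrative)
noise_sd = 0.005  # standard deviation of the microstructure noise eps

# Latent efficient log-price X, with <X,X>_T = sigma^2 * T
X = sigma * np.sqrt(T / n) * np.concatenate([[0.0], rng.standard_normal(n)]).cumsum()
# Observed log-price: Y = X + eps, model (4)
Y = X + noise_sd * rng.standard_normal(n + 1)

def realized_volatility(prices):
    """Sum of squared increments: the naive quadratic-variation estimator."""
    return float(np.sum(np.diff(prices) ** 2))

rv_all = realized_volatility(Y)       # the fifth-best estimator
true_iv = sigma ** 2 * T              # target <X,X>_T = 0.09
noise_level = 2 * n * noise_sd ** 2   # 2 n E[eps^2] = 1.17, the dominant term in (5)

print(rv_all, true_iv, noise_level)
```

With these (illustrative) parameters, rv_all lands near 2nEε² = 1.17 rather than near ⟨X,X⟩_T = 0.09: the noise term swamps the price signal, exactly as (5) predicts.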

1.2.2 The Fourth-Best Approach: Sampling Sparsely at Some Lower Frequency. Of course, completely ignoring the noise and sampling as prescribed by [Y,Y]_T^(all) is not what empirical researchers do in practice. Much, if not all, of the existing literature seems to have settled on sampling sparsely, that is, constructing lower-frequency returns from the available data. For example, using essentially the same exchange rate series, these somewhat ad hoc choices range from 5-minute intervals to as long as 30 minutes. That is, the literature typically uses the estimator [Y,Y]_T^(sparse) described in Section 2.3. This involves taking a subsample of n_sparse observations. For example, with T = 1 day, or 6.5 hours of open trading for stocks, and if we start with data sampled on average every Δt = 1 second, then for the full dataset n = T/Δt = 23,400; but once we sample sparsely every 5 minutes, then we sample every 300th observation, and n_sparse = 78.

The distribution of [Y,Y]_T^(sparse) is described by Lemma 1 and Proposition 1, both in Section 2.3. To first approximation,

\[
[Y,Y]^{(\text{sparse})}_T \stackrel{\mathcal{L}}{\approx} \langle X,X\rangle_T
+ \underbrace{2 n_{\text{sparse}} E\varepsilon^2}_{\text{bias due to noise}}
+ \underbrace{\Big[\underbrace{4 n_{\text{sparse}} E\varepsilon^4}_{\text{due to noise}}
+ \underbrace{\frac{2T}{n_{\text{sparse}}} \int_0^T \sigma_t^4\,dt}_{\text{due to discretization}}\Big]^{1/2}}_{\text{total variance}} Z_{\text{total}}. \tag{6}
\]

Here Z_total is a standard normal random variable. The symbol "≈ over L" means that when multiplied by a suitable factor, the convergence is in law. For precise statements, consult Sections 2-4.
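The fourth-best recipe is a one-line subsample of the observed series. A self-contained sketch in the same illustrative setting as before (constant σ, Gaussian noise, and all parameter values are our assumptions):

```python
import numpy as np

rng = np.random.default_rng(42)

T, n = 1.0, 23_400
sigma, noise_sd = 0.3, 0.005

X = sigma * np.sqrt(T / n) * np.concatenate([[0.0], rng.standard_normal(n)]).cumsum()
Y = X + noise_sd * rng.standard_normal(n + 1)

def rv(prices):
    """Realized volatility: sum of squared increments."""
    return float(np.sum(np.diff(prices) ** 2))

# Sample sparsely: every 300th point, i.e. "5-minute returns" on 1-second data
step = 300
Y_sparse = Y[::step]
n_sparse = len(Y_sparse) - 1           # 78 intervals

rv_sparse = rv(Y_sparse)
bias = 2 * n_sparse * noise_sd ** 2    # remaining bias term in (6): 2 n_sparse E[eps^2]

print(n_sparse, rv_sparse, bias)
```

Here rv_sparse comes out near ⟨X,X⟩_T = 0.09 with only a small noise bias, in contrast to the full-sample estimator, which is dominated by 2nEε².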

1.2.3 The Third-Best Approach: Sampling Sparsely at an Optimally Determined Frequency. Our results in Section 2.3 show how, if one insists on sampling sparsely, it is possible to determine an optimal sampling frequency, instead of selecting the frequency in a somewhat ad hoc manner as in the fourth-best approach. In effect, this is similar to the fourth-best approach, except that the arbitrary n_sparse is replaced by an optimally determined n*_sparse. Section 2.3 shows how to minimize the MSE over n_sparse and calculate it for equidistant observations:

\[
n^{*}_{\text{sparse}} = \left( \frac{T}{4\big(E\varepsilon^2\big)^2} \int_0^T \sigma_t^4\, dt \right)^{1/3}.
\]

The more general version is given by (31).

We note that the third- and fourth-best estimators, which rely on comparatively small sample sizes, could benefit from higher-order adjustments. A suitable Edgeworth adjustment is currently under investigation.
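In the equidistant, constant-volatility case the integral ∫₀ᵀ σ_t⁴ dt reduces to σ⁴T, so the optimal frequency is a one-liner. All numbers below are illustrative assumptions:

```python
# Optimal sparse sampling frequency, equidistant case:
# n*_sparse = ( T / (4 (E eps^2)^2) * integral_0^T sigma_t^4 dt )^(1/3)
T = 1.0
sigma = 0.3        # constant volatility => the sigma^4 integral is sigma^4 * T
E_eps2 = 2.5e-5    # noise variance E[eps^2], e.g. noise with sd 0.005

integral_sigma4 = sigma ** 4 * T
n_star = (T * integral_sigma4 / (4 * E_eps2 ** 2)) ** (1 / 3)

print(round(n_star))  # ~148 intervals, i.e. one sample roughly every 2.6 minutes
```

So with this noise level, the "optimal" sparse grid is neither 1 second nor 30 minutes; it sits in between, and it moves with Eε² at the (Eε²)^{-2/3} rate visible in (31).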

1.2.4 The Second-Best Approach: Subsampling and Averaging. As discussed earlier, even if one determines the optimal sampling frequency as we have just described, it is still the case that the data are not used to their full extent. We therefore propose in Section 3 the estimator [Y,Y]_T^(avg), constructed by averaging the estimators [Y,Y]_T^(k) across K grids of average size n̄. We show in Section 3.5 that

\[
[Y,Y]^{(\text{avg})}_T \stackrel{\mathcal{L}}{\approx} \langle X,X\rangle_T
+ \underbrace{2 \bar{n} E\varepsilon^2}_{\text{bias due to noise}}
+ \underbrace{\Big[\underbrace{\frac{4\bar{n}}{K} E\varepsilon^4}_{\text{due to noise}}
+ \underbrace{\frac{4T}{3\bar{n}}\int_0^T \sigma_t^4\,dt}_{\text{due to discretization}}\Big]^{1/2}}_{\text{total variance}} Z_{\text{total}}, \tag{7}
\]

where Z_total is a standard normal term. The above expression is in the equidistantly sampled case; the general expression is given by (51)-(52) in Section 3.5.

As can be seen from the foregoing equation, [Y,Y]_T^(avg) remains a biased estimator of the quadratic variation ⟨X,X⟩_T of the true return process. But the bias 2n̄Eε² now increases with the average size of the subsamples, and n̄ ≤ n. Thus, as far as the bias is concerned, [Y,Y]_T^(avg) is a better estimator than [Y,Y]_T^(all). The optimal trade-off between the bias and variance for the estimator [Y,Y]_T^(avg) is described in Section 3.6; we set K* ≈ n/n̄*, with n̄* determined in (53), namely (in the equidistantly sampled case)

\[
\bar{n}^{*} = \left( \frac{T}{6\big(E\varepsilon^2\big)^2} \int_0^T \sigma_t^4 \, dt \right)^{1/3}. \tag{8}
\]
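A sketch of the second-best estimator in the same illustrative simulated setting (constant σ, Gaussian noise; the regular allocation `Y[k::K]` implements the subgrids):

```python
import numpy as np

rng = np.random.default_rng(42)

T, n = 1.0, 23_400
sigma, noise_sd = 0.3, 0.005

X = sigma * np.sqrt(T / n) * np.concatenate([[0.0], rng.standard_normal(n)]).cumsum()
Y = X + noise_sd * rng.standard_normal(n + 1)

def rv(prices):
    return float(np.sum(np.diff(prices) ** 2))

def rv_avg(prices, K):
    """[Y,Y]^(avg): average realized volatility over K regularly allocated subgrids."""
    return float(np.mean([rv(prices[k::K]) for k in range(K)]))

K = 300                              # each subgrid samples every 5 minutes
n_bar = (n - K + 1) / K              # average subgrid size, eq. (34)
estimate = rv_avg(Y, K)
bias = 2 * n_bar * noise_sd ** 2     # remaining bias 2 n_bar E[eps^2]

print(estimate, bias, sigma ** 2 * T)
```

The estimate keeps the small bias of a single sparse grid, but averaging over all K offsets uses every observation and shrinks the variance relative to one arbitrary subsample.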

1.2.5 The First-Best Approach: Subsampling and Averaging, and Bias-Correction. Our final estimator is based on bias-correcting the estimator [Y,Y]_T^(avg). This is the subject of Section 4. We show that a bias-adjusted estimator for ⟨X,X⟩_T can be constructed as

\[
\widehat{\langle X,X\rangle}_T = [Y,Y]^{(\text{avg})}_T - \frac{\bar{n}}{n}\,[Y,Y]^{(\text{all})}_T,
\]

that is, by combining estimators obtained over the two time scales, "all" and "avg." A small-sample adjustment,

\[
\widehat{\langle X,X\rangle}^{(\text{adj})}_T = \left(1 - \frac{\bar{n}}{n}\right)^{-1} \widehat{\langle X,X\rangle}_T,
\]

is given in (64) in Section 4.2; it shares the same asymptotic distribution as the unadjusted estimator to the order considered.

We show in Theorem 4 in Section 4.1 that if the number of subgrids is selected as K = cn^{2/3}, then

\[
\widehat{\langle X,X\rangle}_T \stackrel{\mathcal{L}}{\approx} \langle X,X\rangle_T
+ \frac{1}{n^{1/6}}\underbrace{\Big[\underbrace{\frac{8}{c^2}\big(E\varepsilon^2\big)^2}_{\text{due to noise}}
+ \underbrace{c\,\frac{4T}{3}\int_0^T \sigma_t^4\,dt}_{\text{due to discretization}}\Big]^{1/2}}_{\text{total variance}} Z_{\text{total}}. \tag{9}
\]

Zhang, Mykland, and Aït-Sahalia: Determining Integrated Volatility 1397

Unlike all of the previously considered estimators, this estimator is now correctly centered at ⟨X,X⟩_T. It converges only at rate n^{-1/6}, but from a MSE perspective, this is better than being (badly) biased. In particular, the prescription is to use all the available data, so there is no limit to how often one should sample; every few seconds or more often would be typical in tick-by-tick financial data, so n can be quite large. Based on Section 4.2, we use an estimate of the optimal value

\[
c^{*} = \left( \frac{T}{12\big(E\varepsilon^2\big)^2} \int_0^T \sigma_t^4\, dt \right)^{-1/3}. \tag{10}
\]
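Putting the pieces together, here is a sketch of the first-best (two-scales) estimator in our illustrative constant-σ setting, including the small-sample adjustment and the choice K = c*n^{2/3} from (10). For simplicity we plug the simulation's known Eε² and σ into c*; in practice both would be estimated:

```python
import numpy as np

rng = np.random.default_rng(42)

T, n = 1.0, 23_400
sigma, noise_sd = 0.3, 0.005

X = sigma * np.sqrt(T / n) * np.concatenate([[0.0], rng.standard_normal(n)]).cumsum()
Y = X + noise_sd * rng.standard_normal(n + 1)

def rv(prices):
    return float(np.sum(np.diff(prices) ** 2))

def tsrv(prices, K):
    """Two-scales estimator with the small-sample adjustment:
    (1 - n_bar/n)^(-1) * ( [Y,Y]^(avg) - (n_bar/n) [Y,Y]^(all) )."""
    m = len(prices) - 1
    n_bar = (m - K + 1) / K
    avg = np.mean([rv(prices[k::K]) for k in range(K)])
    return float((avg - (n_bar / m) * rv(prices)) / (1 - n_bar / m))

# K = c* n^(2/3), with c* from (10), constant-volatility case
E_eps2 = noise_sd ** 2
c_star = (T * sigma ** 4 * T / (12 * E_eps2 ** 2)) ** (-1 / 3)
K = max(1, round(c_star * n ** (2 / 3)))

estimate = tsrv(Y, K)
print(K, estimate, sigma ** 2 * T)
```

Unlike the fifth-best estimator (which lands near 2nEε²) and the averaged estimator (which retains a bias 2n̄Eε²), this combination of the two time scales comes out centered at ⟨X,X⟩_T = 0.09.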

2. ANALYSIS OF REALIZED VOLATILITY UNDER MARKET MICROSTRUCTURE NOISE

2.1 Setup

To spell out the foregoing model, we let Y be the logarithm of the transaction price, which is observed at times 0 = t_0, t_1, ..., t_n = T. We assume that at these times, Y is related to a latent true price X (also in logarithmic scale) through (4). The latent price X is given in (1). The noise ε_{t_i} satisfies the assumption

\[
\varepsilon_{t_i} \ \text{iid with}\ E\varepsilon_{t_i} = 0\ \text{and}\ \operatorname{var}(\varepsilon_{t_i}) = E\varepsilon^2;\quad \text{also}\ \varepsilon \perp X\ \text{process}, \tag{11}
\]

where ⊥ denotes independence between two random quantities. Our modeling, as in (4), does not require that ε_t exists for every t; in other words, our interest in the noise is only at the observation times t_i.

For the moment, we focus on determining the integrated volatility of X for one time period [0,T]. This is also known as the continuous quadratic variation ⟨X,X⟩ of X. In other words,

\[
\langle X,X\rangle_T = \int_0^T \sigma_t^2 \, dt. \tag{12}
\]

To succinctly describe the realized volatility, we use the notions of grid and observed quadratic variation, as follows.

Definition 1. The full grid containing all of the observation points is given by

\[
\mathcal{G} = \{t_0, \ldots, t_n\}. \tag{13}
\]

Definition 2. We also consider arbitrary grids H ⊆ G. To denote successive elements in such grids, we proceed as follows. If t_i ∈ H, then t_{i,-} and t_{i,+} denote the preceding and following elements in H. t_i will always denote the ith point in the full grid G. Hence, when H = G, t_{i,-} = t_{i-1} and t_{i,+} = t_{i+1}. When H is a strict subgrid of G, it will normally be the case that t_{i,-} < t_{i-1} and t_{i,+} > t_{i+1}. Finally, we take

\[
|\mathcal{H}| = (\#\ \text{points in grid}\ \mathcal{H}) - 1. \tag{14}
\]

This is to say that |H| is the number of time increments (t_i, t_{i,+}] such that both endpoints are contained in H. In particular, |G| = n.

Definition 3. The observed quadratic variation [·,·] for a generic process Z (such as Y or X) on an arbitrary grid H ⊆ G is given by

\[
[Z,Z]^{\mathcal{H}}_t = \sum_{t_j, t_{j,+} \in \mathcal{H},\ t_{j,+} \le t} \big(Z_{t_{j,+}} - Z_{t_j}\big)^2. \tag{15}
\]

When the choice of grid follows from the context, [Z,Z]_t^H may be denoted as just [Z,Z]_t. On the full grid G, the quadratic variation is given by

\[
[Z,Z]^{(\text{all})}_t = [Z,Z]^{\mathcal{G}}_t = \sum_{t_i, t_{i+1} \in \mathcal{G},\ t_{i+1} \le t} \big(\Delta Z_{t_i}\big)^2, \tag{16}
\]

where ΔZ_{t_i} = Z_{t_{i+1}} - Z_{t_i}. Quadratic covariations are defined similarly (see, e.g., Karatzas and Shreve 1991; Jacod and Shiryaev 2003 for more details on quadratic variations).
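In code, Definition 3 is simply a sum of squared increments over whichever grid one passes in. A minimal sketch; representing a grid as a list of indices into the observed path is our own convention:

```python
import numpy as np

def observed_qv(Z, grid):
    """[Z,Z]^H: sum of squared increments of Z over the grid H,
    where `grid` lists the indices (into Z) of the grid's time points."""
    Zh = np.asarray(Z)[np.asarray(grid)]
    return float(np.sum(np.diff(Zh) ** 2))

# Toy path observed on the full grid t_0,...,t_6
Z = [0.0, 1.0, 0.5, 1.5, 1.0, 2.0, 1.5]
full = list(range(7))
sub = [0, 2, 4, 6]   # a strict subgrid: each t_{i,+} skips over points of G

print(observed_qv(Z, full))  # 3.75: six increments of magnitude 1 or 0.5
print(observed_qv(Z, sub))   # 0.75: three increments of magnitude 0.5
```

Note how coarsening the grid changes the observed quadratic variation; for a noisy Y this is precisely the sparse-sampling effect analyzed in Section 2.3.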

Our first objective is to assess how well the realized volatility [Y,Y]_T^(all) approximates the integrated volatility ⟨X,X⟩_T of the true, latent process. In our asymptotic considerations, we always assume that the number of observations in [0,T] goes to infinity, and also that the maximum distance in time between two consecutive observations goes to 0:

\[
\max_i \Delta t_i \to 0 \quad \text{as}\ n \to \infty. \tag{17}
\]

For the sake of preciseness, it should be noted that when n → ∞, we are dealing with a sequence G_n of grids, G_n = {t_{0,n}, ..., t_{n,n}}, and similarly for subgrids. We have avoided using double subscripts so as to not complicate the notation, but this is how all our conditions and results should be interpreted.

Note, finally, that we have adhered to the time series convention that ΔZ_{t_i} = Z_{t_{i+1}} - Z_{t_i}. This is in contrast to the stochastic calculus convention ΔZ_{t_i} = Z_{t_i} - Z_{t_{i-1}}.

2.2 The Realized Volatility: An Estimator of the Variance of the Noise

Under the additive model Y_{t_i} = X_{t_i} + ε_{t_i}, the realized volatility based on the observed returns Y_{t_i} now has the form

\[
[Y,Y]^{(\text{all})}_T = [X,X]^{(\text{all})}_T + 2[X,\varepsilon]^{(\text{all})}_T + [\varepsilon,\varepsilon]^{(\text{all})}_T.
\]

This gives the conditional mean and variance of [Y,Y]_T^(all):

\[
E\big([Y,Y]^{(\text{all})}_T \mid X\ \text{process}\big) = [X,X]^{(\text{all})}_T + 2nE\varepsilon^2, \tag{18}
\]

under assumption (11). Similarly,

\[
\operatorname{var}\big([Y,Y]^{(\text{all})}_T \mid X\ \text{process}\big) = 4nE\varepsilon^4 + O_p(1), \tag{19}
\]

subject to condition (11) and Eε⁴_{t_i} = Eε⁴ < ∞ for all i. Subject to slightly stronger conditions,

\[
\operatorname{var}\big([Y,Y]^{(\text{all})}_T \mid X\ \text{process}\big) = 4nE\varepsilon^4 + \big(8[X,X]^{(\text{all})}_T E\varepsilon^2 - 2\operatorname{var}(\varepsilon^2)\big) + O_p\big(n^{-1/2}\big). \tag{20}
\]

It is also the case that as n → ∞, conditionally on the X process, we have asymptotic normality:

\[
n^{-1/2}\big([Y,Y]^{(\text{all})}_T - 2nE\varepsilon^2\big) \xrightarrow{\ \mathcal{L}\ } 2\big(E\varepsilon^4\big)^{1/2} Z_{\text{noise}}. \tag{21}
\]

Here Z_noise is standard normal, with the subscript "noise" indicating that the randomness comes from the noise ε, that is, the deviation of the observables Y from the true process X.

The derivations of (19)-(20), along with the conditions for the latter, are provided in Section A.1. The result (21) is derived in Section A.2, where it is part of Theorem A.1.


Equations (18) and (19) suggest that in the discrete world, where microstructure effects are unfortunately present, realized volatility [Y,Y]_T^(all) is not a reliable estimator for the true variation [X,X]_T^(all) of the returns. For large n, realized volatility could have little to do with the true returns. Instead, it relates to the noise term, Eε², in the first order and Eε⁴ in the second order. Also, one can see from (18) that [Y,Y]_T^(all) has a positive bias whose magnitude increases linearly with the sample size.

Interestingly, apart from revealing the biased nature of [Y,Y]_T^(all) at high frequency, our analysis also delivers a consistent estimator for the variance of the noise term. In other words, let

\[
\widehat{E\varepsilon^2} = \frac{1}{2n}\,[Y,Y]^{(\text{all})}_T.
\]

We have, for a fixed true return process X,

\[
n^{1/2}\big(\widehat{E\varepsilon^2} - E\varepsilon^2\big) \to N\big(0, E\varepsilon^4\big) \quad \text{as}\ n \to \infty; \tag{22}
\]

see Theorem A.1 in the Appendix.

By the same methods as in Section A.2, a consistent estimator of the asymptotic variance of the noise-variance estimator is then given by

\[
\widehat{E\varepsilon^4} = \frac{1}{2n}\sum_i \big(\Delta Y_{t_i}\big)^4 - 3\big(\widehat{E\varepsilon^2}\big)^2. \tag{23}
\]
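Both noise-moment estimators are one-liners on the observed increments. A sketch in a simulated setting (the constant σ, Gaussian ε, and all parameter values are illustrative assumptions):

```python
import numpy as np

rng = np.random.default_rng(7)

T, n = 1.0, 23_400
sigma, noise_sd = 0.3, 0.005

X = sigma * np.sqrt(T / n) * np.concatenate([[0.0], rng.standard_normal(n)]).cumsum()
Y = X + noise_sd * rng.standard_normal(n + 1)

dY = np.diff(Y)
E_eps2_hat = np.sum(dY ** 2) / (2 * n)                        # [Y,Y]^(all) / (2n)
E_eps4_hat = np.sum(dY ** 4) / (2 * n) - 3 * E_eps2_hat ** 2  # eq. (23)

print(E_eps2_hat, noise_sd ** 2)  # estimate vs true E[eps^2] = 2.5e-05
```

The divergence of realized volatility is thus put to work: scaled by (2n)^{-1}, the very quantity that fails as a volatility estimator becomes a consistent estimator of Eε².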

A question that naturally arises is why one is interested in the quadratic variation of X (either ⟨X,X⟩ or [X,X]), as opposed to the quadratic variation of Y ([Y,Y]), because "⟨Y,Y⟩" would have to be taken to be infinite in view of the foregoing. For example, in options pricing or hedging, one could take the opposite view that [Y,Y] is the volatility that one actually faces.

A main reason why we focus on the quadratic variation of X is that the variation caused by the ε's is tied to each transaction, as opposed to the price process of the underlying security. From the standpoint of trading, the ε's represent trading costs, which are different from the costs created by the volatility of the underlying process. Different market participants may even face different types of trading costs, depending on institutional arrangements. In any case, for the purpose of integrating trading costs into options prices, it seems more natural to approach the matter via the extensive literature on market microstructure (see O'Hara 1995 for a survey).

A related matter is that continuous finance would be difficult to implement if one were to use the quadratic variation of Y. If one were to use the quadratic variation [Y,Y], then one would also be using a quantity that would depend on the data frequency.

Finally, apart from the specific application, it is interesting to be able to say something about the underlying log return process, and to be able to separate this from the effects introduced by the mechanics of the trading process, where the idiosyncratics of the trading arrangement play a role.

2.3 The Optimal Sampling Frequency

We have just argued that the realized volatility estimates the wrong quantity. This problem only gets worse when observations are sampled more frequently. Its financial interpretation boils down to market microstructure, summarized by ε in (4). As the data record is sampled finely, the change in true returns gets smaller, while the microstructure noise, such as bid-ask spread and transaction cost, remains at the same magnitude. In other words, when the sampling frequency is extremely high, the observed fluctuation in the returns process is more heavily contaminated by microstructure noise and becomes less representative of the true variation ⟨X,X⟩_T of the returns. Along this line of discussion, the broad opinion in financial applications is not to sample too often, at least when using realized volatility. We now discuss how this can be viewed in the context of the model (4) with stochastic volatility.

Formally, sparse sampling is implemented by sampling on a subgrid H of G, with, as before,

\[
[Y,Y]^{(\text{sparse})} = [Y,Y]^{(\mathcal{H})} = \sum_{t_i, t_{i,+} \in \mathcal{H}} \big(Y_{t_{i,+}} - Y_{t_i}\big)^2.
\]

For the moment, we take the subgrid as given, but later we consider how to optimize the sampling. We call n_sparse = |H|.

To give a rationale for sparse sampling, we propose an asymptotic where the law of ε, although still iid, is allowed to vary with n and n_sparse. Formally, we suppose that the distribution of ε, L(ε), is an element of the set D of all distributions for which E(ε) = 0 and where E(ε²) and E(ε⁴)/E(ε²)² are bounded by arbitrary constants. The asymptotic normality in Section 2.2 then takes the following more nuanced form.

Lemma 1. Suppose that X is an Itô process of form (1), where |μ_t| and σ_t are bounded above by a constant. Suppose that for given n, the grid H_n is given, with n_sparse → ∞ as n → ∞, and that for each n, Y is related to X through model (4). Assume (11), where L(ε) ∈ D. Like the grid H_n, the law L(ε) can depend on n (whereas the process X is fixed). Let (17) be satisfied for the sequence of grids H_n. Then

\[
[Y,Y]^{\mathcal{H}}_T = [X,X]^{\mathcal{H}}_T + 2 n_{\text{sparse}} E\varepsilon^2
+ \big(4 n_{\text{sparse}} E\varepsilon^4 + \big(8[X,X]^{\mathcal{H}}_T E\varepsilon^2 - 2\operatorname{var}(\varepsilon^2)\big)\big)^{1/2} Z_{\text{noise}}
+ O_p\big(n_{\text{sparse}}^{-1/4}\big(E\varepsilon^2\big)^{1/2}\big), \tag{24}
\]

where Z_noise is a quantity that is asymptotically standard normal.

The lemma is proved at the end of Section A.2. Note now that the relative order of the terms in (24) depends on the quantities n_sparse and Eε². Therefore, if Eε² is small relative to n_sparse, then [Y,Y]_T^H is, after all, not an entirely absurd substitute for [X,X]_T^H.

One would be tempted to conclude that the optimal choice of n_sparse is to make it as small as possible. But that would overlook the fact that the bigger the n_sparse, the closer [X,X]_T^H is to the target integrated volatility ⟨X,X⟩_T from (12).

To quantify the overall error, we now combine Lemma 1 with the results on discretization error to study the total error [Y,Y]_T^H - ⟨X,X⟩_T. Following Rootzén (1980), Jacod and Protter (1998), Barndorff-Nielsen and Shephard (2002), and Mykland and Zhang (2002), and under the conditions that these authors stated, we can show that

\[
\left(\frac{n_{\text{sparse}}}{T}\right)^{1/2} \big([X,X]^{\mathcal{H}}_T - \langle X,X\rangle_T\big)
\xrightarrow{\ \mathcal{L}\ } \left(\int_0^T 2H'(t)\,\sigma_t^4\,dt\right)^{1/2} \times Z_{\text{discrete}} \tag{25}
\]


stably in law. (We discuss this concept at the end of Sec. 3.4.) Z_discrete is a standard normal random variable, with the subscript "discrete" indicating that the randomness is due to the discretization effect in [X,X]_T^H when evaluating ⟨X,X⟩_T. H(t) is the asymptotic quadratic variation of time, as discussed by Mykland and Zhang (2002):

\[
H(t) = \lim_{n\to\infty} \frac{n_{\text{sparse}}}{T} \sum_{t_i, t_{i,+} \in \mathcal{H},\ t_{i,+} \le t} \big(t_{i,+} - t_i\big)^2.
\]

In the case of equidistant observations, Δt_0 = ... = Δt_{n_sparse - 1} = Δt = T/n_sparse, and H′(t) = 1. Because the ε's are independent of the X process, Z_noise is independent of Z_discrete.

For small Eε², one now has an opportunity to estimate ⟨X,X⟩_T. It follows from our Lemma 1 and proposition 1 of Mykland and Zhang (2002) that:

Proposition 1. Assume the conditions of Lemma 1, and also that max_{t_i, t_{i,+} ∈ H}(t_{i,+} - t_i) = O(1/n_sparse). Also assume Condition E in Section A.3. Then H is well defined, and

\[
[Y,Y]^{\mathcal{H}}_T = \langle X,X\rangle_T + 2E\varepsilon^2 n_{\text{sparse}} + \Upsilon Z_{\text{total}}
+ O_p\big(n_{\text{sparse}}^{-1/4}\big(E\varepsilon^2\big)^{1/2}\big) + o_p\big(n_{\text{sparse}}^{-1/2}\big), \tag{26}
\]

in the sense of stable convergence, where Z_total is asymptotically standard normal, and where the variance has the form

\[
\Upsilon^2 = \underbrace{4 n_{\text{sparse}} E\varepsilon^4 + \big(8[X,X]^{\mathcal{H}}_T E\varepsilon^2 - 2\operatorname{var}(\varepsilon^2)\big)}_{\text{due to noise}}
+ \underbrace{\frac{T}{n_{\text{sparse}}}\int_0^T 2H'(t)\,\sigma_t^4\,dt}_{\text{due to discretization}}. \tag{27}
\]

Seen from this angle, there is scope for using the realized volatility [Y,Y]^H to estimate ⟨X,X⟩. There is a bias, 2Eε² n_sparse, but the bias goes down if one uses fewer observations. This, then, is consistent with the practice in empirical finance, where the estimator used is not [Y,Y]^(all), but instead [Y,Y]^(sparse), obtained by sampling sparsely. For instance, faced with data sampled every few seconds, empirical researchers would typically use squared returns (i.e., squared differences of log-prices Y) over, say, 5-, 15-, or 30-minute time intervals. The intuitive rationale for using this estimator is to attempt to control the bias of the estimator; this can be assessed based on our formula (26), replacing the original sample size n by the lower number reflecting the sparse sampling. But one should avoid sampling too sparsely, because formula (27) shows that decreasing n_sparse has the effect of increasing the variance of the estimator via the discretization effect, which is proportional to n_sparse^{-1}. Based on our formulas, this trade-off between sampling too often and sampling too rarely can be formalized, and an optimal frequency at which to sample sparsely can be determined.

It is natural to minimize the MSE of [Y,Y]^(sparse),

\[
\text{MSE} = \big(2 n_{\text{sparse}} E\varepsilon^2\big)^2
+ \Big\{ 4 n_{\text{sparse}} E\varepsilon^4 + \big(8[X,X]^{\mathcal{H}}_T E\varepsilon^2 - 2\operatorname{var}(\varepsilon^2)\big)
+ \frac{T}{n_{\text{sparse}}}\int_0^T 2H'(t)\,\sigma_t^4\,dt \Big\}. \tag{28}
\]

One has to imagine that the original sample size n is quite large, so that n_sparse < n. In this case, minimizing the MSE (28) means that one should choose n_sparse to satisfy ∂MSE/∂n_sparse ≈ 0; in other words,

\[
8 n_{\text{sparse}} \big(E\varepsilon^2\big)^2 + 4E\varepsilon^4 - \frac{T}{n_{\text{sparse}}^2}\int_0^T 2H'(t)\,\sigma_t^4\,dt \approx 0, \tag{29}
\]

or

\[
n_{\text{sparse}}^3 + \frac{1}{2}\, n_{\text{sparse}}^2\, \frac{E\varepsilon^4}{\big(E\varepsilon^2\big)^2}
- \big(E\varepsilon^2\big)^{-2}\, \frac{T}{8} \int_0^T 2H'(t)\,\sigma_t^4\,dt \approx 0. \tag{30}
\]

Finally, still under the conditions of Proposition 1, the optimum n*_sparse becomes

\[
n^{*}_{\text{sparse}} = \big(E\varepsilon^2\big)^{-2/3} \left( \frac{T}{8} \int_0^T 2H'(t)\,\sigma_t^4\,dt \right)^{1/3} \big(1 + o_p(1)\big)
\quad \text{as}\ E\varepsilon^2 \to 0. \tag{31}
\]

Of course, if the actual sample size n were smaller than n*_sparse, then one would simply take n*_sparse = n, but this is unlikely to occur for a heavily traded stock.

Equation (31) is the formal statement saying that one can sample more frequently when the error spread is small. Note from (28) that to first order, the final trade-off is between the bias 2n_sparse Eε² and the variance due to discretization. The effect of the variance associated with Z_noise is of lower order when comparing n_sparse and Eε². It should be emphasized that (31) is a feasible way of choosing n_sparse. One can estimate Eε² using all of the data, following the procedure in Section 2.2. The integral ∫₀ᵀ 2H′(t)σ_t⁴ dt can be estimated by the methods discussed in Section 6.

Hence, if one decides to address the problem by selecting a lower frequency of observation, then one can do so by subsampling the full grid G at an arbitrary sparse frequency and using [Y,Y]^(sparse). Alternatively, one can use n_sparse as optimally determined by (31). We denote the corresponding estimator as [Y,Y]^(sparse,opt). These are our fourth- and third-best estimators, respectively.

3. SAMPLING SPARSELY WHILE USING ALL OF THE DATA: SUBSAMPLING AND AVERAGING OVER MULTIPLE GRIDS

3.1 Multiple Grids and Sufficiency

We have argued in the previous section that one can indeed benefit from using infrequently sampled data. And yet, one of the most basic lessons of statistics is that one should not do this. We present here two ways of tackling the problem. Instead of selecting (arbitrarily or optimally) a subsample, our methods are based on selecting a number of subgrids of the original grid of observation times, G = {t_0, ..., t_n}, and then averaging the estimators derived from the subgrids. The principle is that to the extent that there is a benefit to subsampling, this benefit can now be retained, whereas the variation of the estimator can be lessened by the averaging. The benefit of averaging is clear from sufficiency considerations, and many statisticians would say that subsampling without subsequent averaging is inferentially incorrect.

In what follows, we first introduce a set of notations, then turn to studying the realized volatility in the multigrid context. In Section 4 we show how to eliminate the bias of the estimator by using two time scales, a combination of single grid and multiple grids.


3.2 Notation for the Multiple Grids

We specifically suppose that the full grid G, G = {t_0, ..., t_n}, as in (13), is partitioned into K nonoverlapping subgrids G^(k), k = 1, ..., K; in other words,

\[
\mathcal{G} = \bigcup_{k=1}^K \mathcal{G}^{(k)}, \quad \text{where}\ \mathcal{G}^{(k)} \cap \mathcal{G}^{(l)} = \varnothing\ \text{when}\ k \ne l.
\]

For most purposes, the natural way to select the kth subgrid G^(k) is to start with t_{k-1} and then pick every Kth sample point after that, until T. That is,

\[
\mathcal{G}^{(k)} = \{t_{k-1},\, t_{k-1+K},\, t_{k-1+2K},\, \ldots,\, t_{k-1+n_k K}\}
\]

for k = 1, ..., K, where n_k is the integer making t_{k-1+n_k K} the last element in G^(k). We refer to this as regular allocation of sample points to subgrids.

Whether the allocation is regular or not, we let n_k = |G^(k)|. As in Definition 2, n = |G|. Recall that the realized volatility based on all observation points G is written as [Y,Y]_T^(all). Meanwhile, if one uses only the subsampled observations Y_t, t ∈ G^(k), then the realized volatility is denoted by [Y,Y]_T^(k). It has the form

\[
[Y,Y]^{(k)}_T = \sum_{t_j, t_{j,+} \in \mathcal{G}^{(k)}} \big(Y_{t_{j,+}} - Y_{t_j}\big)^2,
\]

where, if t_i ∈ G^(k), then t_{i,+} denotes the following element in G^(k).

A natural competitor to [Y,Y]_T^(all), [Y,Y]_T^(sparse), and [Y,Y]_T^(sparse,opt) is then given by

\[
[Y,Y]^{(\text{avg})}_T = \frac{1}{K}\sum_{k=1}^K [Y,Y]^{(k)}_T, \tag{32}
\]

and this is the statistic we analyze in what follows. As before, we fix T and use only the observations within the time period [0,T]. Asymptotics are still under (17) and under

\[
n \to \infty \quad \text{and} \quad n/K \to \infty. \tag{33}
\]

In general, the n_k need not be the same across k. We define

\[
\bar{n} = \frac{1}{K}\sum_{k=1}^K n_k = \frac{n - K + 1}{K}. \tag{34}
\]
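Identity (34) can be checked directly under regular allocation. A small sketch with illustrative n and K, representing each subgrid by the indices of its time points:

```python
import numpy as np

n, K = 23_400, 300
grid = np.arange(n + 1)                    # indices of t_0,...,t_n

subgrids = [grid[k::K] for k in range(K)]  # regular allocation into K subgrids
n_k = [len(g) - 1 for g in subgrids]       # n_k = |G^(k)|: increments per subgrid

n_bar = sum(n_k) / K
print(n_bar, (n - K + 1) / K)              # both equal 77.0033...
```

The subgrids partition the full grid (every index appears exactly once), and the average subgrid size matches (34) exactly, since the K subgrids share (n + 1) points and thus (n + 1) - K increments in total.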

3.3 Error Due to the Noise: [Y,Y]_T^(avg) - [X,X]_T^(avg)

Recall that we are interested in determining the integrated volatility, ⟨X,X⟩_T, or quadratic variation, of the true but unobservable returns. As an intermediate step, here we study how well the "pooled" realized volatility [Y,Y]_T^(avg) approximates [X,X]_T^(avg), where the latter is the "pooled" true integrated volatility when X is considered only on the discrete time scale.

From (18) and (32),

\[
E\big([Y,Y]^{(\text{avg})}_T \mid X\ \text{process}\big) = [X,X]^{(\text{avg})}_T + 2\bar{n} E\varepsilon^2. \tag{35}
\]

Also, because {ε_t, t ∈ G^(k)} are independent for different k,

\[
\operatorname{var}\big([Y,Y]^{(\text{avg})}_T \mid X\ \text{process}\big)
= \frac{1}{K^2}\sum_{k=1}^K \operatorname{var}\big([Y,Y]^{(k)}_T \mid X\ \text{process}\big)
= 4\frac{\bar{n}}{K} E\varepsilon^4 + O_p\left(\frac{1}{K}\right), \tag{36}
\]

in the same way as in (19). Incorporating the next-order term in the variance yields

\[
\operatorname{var}\big([Y,Y]^{(\text{avg})}_T \mid X\big)
= 4\frac{\bar{n}}{K} E\varepsilon^4 + \frac{1}{K}\big[8[X,X]^{(\text{avg})}_T E\varepsilon^2 - 2\operatorname{var}(\varepsilon^2)\big] + o_p\left(\frac{1}{K}\right), \tag{37}
\]

as in (20).

By Theorem A.1 in Section A.2, the conditional asymptotics for the estimator [Y,Y]_T^(avg) are as follows.

Theorem 1. Suppose that X is an Itô process of form (1). Suppose that Y is related to X through model (4), and that (11) is satisfied with Eε⁴ < ∞. Also suppose that t_i and t_{i+1} are not in the same subgrid for any i. Under assumption (33), as n → ∞,

\[
\sqrt{\frac{K}{\bar{n}}}\,\big([Y,Y]^{(\text{avg})}_T - [X,X]^{(\text{avg})}_T - 2\bar{n} E\varepsilon^2\big)
\xrightarrow{\ \mathcal{L}\ } 2\sqrt{E\varepsilon^4}\; Z^{(\text{avg})}_{\text{noise}}, \tag{38}
\]

conditional on the X process, where Z^(avg)_noise is standard normal.

This can be compared with the result stated in (24). Notice that Z^(avg)_noise in (38) is almost never the same as Z_noise in (24); in particular, cov(Z_noise, Z^(avg)_noise) = var(ε²)/Eε⁴, based on the proof of Theorem A.1 in the Appendix.

In comparison with the realized volatility using the full grid G, the aggregated estimator [Y,Y]_T^(avg) provides an improvement in that both the asymptotic bias and variance are of smaller order in n; compare (18) and (19). We use this in Section 4.

3.4 Error Due to the Discretization Effect: [X,X]_T^(avg) - ⟨X,X⟩_T

In this section we study the impact of the time discretization. In other words, we investigate the deviation of [X,X]_T^(avg) from the integrated volatility ⟨X,X⟩_T of the true process. Denote the discretization effect by D_T, where

\[
D_t = [X,X]^{(\text{avg})}_t - \langle X,X\rangle_t = \frac{1}{K}\sum_{k=1}^K \big([X,X]^{(k)}_t - \langle X,X\rangle_t\big), \tag{39}
\]

with

\[
[X,X]^{(k)}_t = \sum_{t_i, t_{i,+} \in \mathcal{G}^{(k)},\ t_{i,+} \le t} \big(X_{t_{i,+}} - X_{t_i}\big)^2. \tag{40}
\]

In what follows, we consider the asymptotics of D_T. The problem is similar to that of finding the limit of [X,X]_T^(all) - ⟨X,X⟩_T


[cf. (25)]. However, the present case is more complicated, due to the multiple grids.

We suppose in the following that the sampling points are regularly allocated to subgrids; in other words, G^(l) = {t_{l-1}, t_{K+l-1}, ...}. We also assume that

\[
\max_i |\Delta t_i| = O\left(\frac{1}{n}\right) \tag{41}
\]

and

\[
K/n \to 0. \tag{42}
\]

Define the weight function

\[
h_i = \frac{4}{K\Delta t}\sum_{j=1}^{(K-1)\wedge i}\left(1 - \frac{j}{K}\right)^2 \Delta t_{i-j}. \tag{43}
\]

In the case where the t_i are equidistant, and under regular allocation of points to subgrids, Δt_i = Δt, and so all of the h_i's (except the first K - 1) are equal, and

\[
h_i = \frac{4}{K}\sum_{j=1}^{(K-1)\wedge i}\left(1 - \frac{j}{K}\right)^2
= \begin{cases}
\dfrac{4\big(2K^2 - 4K + 3\big)}{6K^2} = \dfrac{4}{3} + o(1), & i \ge K - 1,\\[6pt]
\dfrac{4}{3}\,\dfrac{i}{K^3}\big(3K^2 - 3Ki + i^2\big) + o(1), & i < K - 1.
\end{cases} \tag{44}
\]

More generally, assumptions (41) and (42) ensure that

\[
\sup_i h_i = O(1). \tag{45}
\]

We take ⟨D,D⟩_T to be the quadratic variation of D_t when viewed as a continuous-time process (39). This gives the best approximation to the variance of D_T. We show the following results in Section A.3.

Theorem 2. Suppose that X is an Itô process of the form (1), with drift coefficient μ_t and diffusion coefficient σ_t, both continuous almost surely. Also suppose that σ_t is bounded away from 0. Assume (41) and (42), and that sampling points are regularly allocated to grids. Then the quadratic variation of D_T is approximately

\[
\langle D,D\rangle_T = \frac{TK}{n}\,\eta_n^2 + o_p\left(\frac{K}{n}\right), \tag{46}
\]

where

\[
\eta_n^2 = \sum_i h_i\,\sigma_{t_i}^4\,\Delta t_i. \tag{47}
\]

In particular, D_T = O_p((K/n)^{1/2}). From this, we derive a variance-variance trade-off between the two effects that have been discussed, noise and discretization. First, however, we discuss the asymptotic law of D_T. We discuss stable convergence at the end of this section.

Theorem 3. Assume the conditions of Theorem 2, and also that

\[
\eta_n^2 \xrightarrow{\ P\ } \eta^2, \tag{48}
\]

where η is random. Also assume condition E in Section A.3. Then

\[
\frac{D_T}{(K/n)^{1/2}} \xrightarrow{\ \mathcal{L}\ } \eta\sqrt{T}\; Z_{\text{discrete}}, \tag{49}
\]

where Z_discrete is standard normal and independent of the process X. The convergence in law is stable.

In other words, D_T/(K/n)^{1/2} can be taken to be asymptotically mixed normal, "N(0, η²T)." For most of our discussion, it is most convenient to suppose (48), and this is satisfied in many cases. For example, when the t_i are equidistant, and under regular allocation of points to subgrids,

\[
\eta^2 = \frac{4}{3}\int_0^T \sigma_t^4\,dt, \tag{50}
\]

following (44). One does not need to rely on (48); we argue in Section A.3 that without this condition, one can take D_T/(K/n)^{1/2} to be approximately N(0, η_n²T). For estimation of η² or η_n², see Section 6.

Finally, stable convergence (Rényi 1963; Aldous and Eagleson 1978; Hall and Heyde 1980, chap. 3) means for our purposes that the left side of (49) converges to the right side jointly with the X process, and that Z is independent of X. This is slightly weaker than convergence conditional on X, but serves the same function of permitting the incorporation of conditionality-type phenomena into arguments and conclusions.

3.5 Combining the Two Sources of Error

We can now combine the two error terms arising from discretization and from the observation noise. It follows from Theorems 1 and 3 that

\[
[Y,Y]^{(\text{avg})}_T - \langle X,X\rangle_T - 2\bar{n} E\varepsilon^2 = \xi Z_{\text{total}} + o_p(1), \tag{51}
\]

where Z_total is an asymptotically standard normal random variable independent of the X process, and

\[
\xi^2 = \underbrace{4\frac{\bar{n}}{K} E\varepsilon^4}_{\text{due to noise}} + \underbrace{\frac{T}{\bar{n}}\,\eta^2}_{\text{due to discretization}}. \tag{52}
\]

It is easily seen that if one takes K = cn23 then both compo-nents in ξ2 will be present in the limit otherwise one of themwill dominate Based on (51) [YY](avg)

T is still a biased estima-tor of the quadratic variation 〈XX〉T of the true return processOne can recognize that as far as the asymptotic bias is con-cerned [YY](avg)

T is a better estimator than [YY](all)T because

n le n demonstrating that the bias in the subsampled estimator[YY](avg)

T increases in a slower pace than the full-grid estima-tor One can also construct a bias-adjusted estimator from (51)this further development would involve the higher-order analy-sis between the bias and the subsampled estimator We describethe methodology of bias correction in Section 4

1402 Journal of the American Statistical Association December 2005

3.6 The Benefits of Sampling Sparsely: Optimal Sampling Frequency in the Multiple-Grid Case

As in Section 2.3, when the noise is negligible asymptotically, we can search for an optimal n̄ for subsampling, to balance the coexistence of the bias and the variance in (51). To reduce the MSE of [Y,Y]^{(avg)}_T, we set ∂MSE/∂n̄ = 0. From (51)–(52), bias = 2n̄Eε² and ξ² = 4(n̄/K)Eε⁴ + (T/n̄)η². Then

  MSE = bias² + ξ² = 4(Eε²)²n̄² + 4(n̄/K)Eε⁴ + (T/n̄)η²
      = 4(Eε²)²n̄² + (T/n̄)η², to first order;

thus the optimal n̄* satisfies

  n̄* = ( Tη² / (8(Eε²)²) )^{1/3}. (53)

Therefore, assuming that the estimator [Y,Y]^{(avg)}_T is adopted, one could benefit from a minimum MSE if one subsamples n̄* data in an equidistant fashion. In other words, all n observations can be used if one uses K*, K* ≈ n/n̄*, subgrids. This is in contrast to the drawback of using all of the data in the single-grid case. The subsampling, coupled with aggregation, brings out the advantage of using all of the data. Of course, for the asymptotics to work, we need Eε² → 0. Our recommendation, however, is to use the bias-corrected estimator given in Section 4.
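As a quick numerical illustration, the optimal subsample size in (53) is a one-line computation. The sketch below assumes constant volatility, so that η² is given by (50); the particular values (one trading day, σ = 0.4, noise standard deviation 0.0005) are illustrative assumptions, not values taken from this article.

```python
import numpy as np

def optimal_subsample_size(T, eta2, noise_var):
    """Optimal mean number of observations per subgrid, eq. (53):
    nbar* = (T * eta^2 / (8 * (E eps^2)^2))^(1/3)."""
    return (T * eta2 / (8.0 * noise_var ** 2)) ** (1.0 / 3.0)

# Illustrative (assumed) values: one trading day, constant sigma = 0.4,
# noise standard deviation 0.0005, so eta^2 = (4/3) sigma^4 T by eq. (50).
T = 1.0 / 252.0
eta2 = (4.0 / 3.0) * 0.4 ** 4 * T
noise_var = 0.0005 ** 2
nbar_star = optimal_subsample_size(T, eta2, noise_var)
```

By construction, n̄* solves the first-order condition 8(Eε²)²n̄³ = Tη² of the first-order MSE above.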

4. THE FINAL ESTIMATOR: SUBSAMPLING, AVERAGING, AND BIAS CORRECTION OVER TWO TIME SCALES

4.1 The Final Estimator \widehat{⟨X,X⟩}_T: Main Result

In the preceding sections, we have seen that the multigrid estimator [Y,Y]^{(avg)} is yet another biased estimator of the true integrated volatility ⟨X,X⟩. In this section we improve the multigrid estimator by adopting bias adjustment. To assess the bias, one uses the full grid. As mentioned before, from (22), in the single-grid case (Sec. 2), Eε² can be consistently approximated by

  Êε² = (1/2n) [Y,Y]^{(all)}_T. (54)

Hence the bias of [Y,Y]^{(avg)} can be consistently estimated by 2n̄Êε². A bias-adjusted estimator for ⟨X,X⟩ thus can be obtained as

  \widehat{⟨X,X⟩}_T = [Y,Y]^{(avg)}_T − (n̄/n) [Y,Y]^{(all)}_T, (55)

thereby combining the two time scales, (all) and (avg).

To study the asymptotic behavior of \widehat{⟨X,X⟩}_T, first note that,

under the conditions of Theorem A.1 in Section A.2,

  (K/n̄)^{1/2} ( \widehat{⟨X,X⟩}_T − [X,X]^{(avg)}_T )
    = (K/n̄)^{1/2} ( [Y,Y]^{(avg)}_T − [X,X]^{(avg)}_T − 2n̄Eε² ) − 2(Kn̄)^{1/2} ( Êε² − Eε² )
    →^L N(0, 8(Eε²)²), (56)

where the convergence in law is conditional on X.

We can now combine this with the results of Section 3.4 to determine the optimal choice of K as n → ∞:

  \widehat{⟨X,X⟩}_T − ⟨X,X⟩_T
    = ( \widehat{⟨X,X⟩}_T − [X,X]^{(avg)}_T ) + ( [X,X]^{(avg)}_T − ⟨X,X⟩_T )
    = O_p( n̄^{1/2}/K^{1/2} ) + O_p( n̄^{−1/2} ). (57)

The error is minimized by equating the two terms on the right side of (57), yielding that the optimal sampling step for [Y,Y]^{(avg)}_T is K = O(n^{2/3}). The right side of (57) then has order O_p(n^{−1/6}).

In particular, if we take

  K = cn^{2/3}, (58)

then we find the limit in (57), as follows.

Theorem 4. Suppose that X is an Itô process of form (1), and assume the conditions of Theorem 3 in Section 3.4. Suppose that Y is related to X through model (4), and that (11) is satisfied with Eε⁴ < ∞. Under assumption (58),

  n^{1/6} ( \widehat{⟨X,X⟩}_T − ⟨X,X⟩_T ) →^L N(0, 8c^{−2}(Eε²)²) + η√T N(0, c)
    = ( 8c^{−2}(Eε²)² + cη²T )^{1/2} N(0,1), (59)

where the convergence is stable in law (see Sec. 3.4).

Proof. Note that the first normal distribution comes from (56) and that the second comes from Theorem 3 in Section 3.4. The two normal distributions are independent because the convergence of the first term in (57) is conditional on the X process, which is why they can be amalgamated as stated. The requirement that Eε⁴ < ∞ (Thm. A.1 in the App.) is not needed, because only a law of large numbers is required for M^{(1)}_T (see the proof of Thm. A.1) when considering the difference in (56). This finishes the proof.

The estimation of the asymptotic spread s² = 8c^{−2}(Eε²)² + cη²T of \widehat{⟨X,X⟩}_T is deferred to Section 6. It is seen in Section A.1 that for K = cn^{2/3}, the second-order conditional variance of \widehat{⟨X,X⟩}_T, given the X process, is given by

  var( \widehat{⟨X,X⟩}_T | X ) = n^{−1/3} c^{−2} 8(Eε²)²
    + n^{−2/3} c^{−1} [ 8[X,X]^{(avg)}_T Eε² − 2 var(ε²) ] + o_p(n^{−2/3}). (60)

The evidence from our Monte Carlo simulations in Section 7 suggests that this correction may matter in small samples, and that the (random) variance of \widehat{⟨X,X⟩}_T is best estimated by

  \hat s² + n^{−2/3} c^{−1} [ 8[X,X]^{(avg)}_T Êε² − 2 vâr(ε²) ]. (61)

For the correction term, [X,X]^{(avg)}_T can be estimated by \widehat{⟨X,X⟩}_T itself. Meanwhile, by (23), a consistent estimator of var(ε²) is given by

  vâr(ε²) = (1/2n) Σ_i (ΔY_{t_i})⁴ − 4(Êε²)².


4.2 Properties of \widehat{⟨X,X⟩}_T: Optimal Sampling and Bias Adjustment

To further pin down the optimal sampling frequency K, one can minimize the expected asymptotic variance in (59) to obtain

  c* = ( 16(Eε²)² / (T Eη²) )^{1/3}, (62)

which can be consistently estimated from data in past time periods (before time t_0 = 0), using Êε² and an estimator of η² (cf. Sec. 6). As mentioned in Section 3.4, η² can be taken to be independent of K as long as one allocates sampling points to grids regularly, as defined in Section 3.2. Hence one can choose c, and so also K, based on past data.

Example 1. If σ_t² is constant, and for equidistant sampling and regular allocation to grids, η² = (4/3)σ⁴T, then the asymptotic variance in (59) is

  8c^{−2}(Eε²)² + cη²T = 8c^{−2}(Eε²)² + (4/3) c σ⁴ T²,

and the optimal choice of c becomes

  c_opt = ( 12(Eε²)² / (T²σ⁴) )^{1/3}. (63)

In this case the asymptotic variance in (59) is

  2 ( 12(Eε²)² )^{1/3} (σ²T)^{4/3}.
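The closed-form optimum (63) is easy to check numerically against the variance expression it minimizes; the numerical inputs below are illustrative assumptions.

```python
import numpy as np

# Worked check of Example 1; the inputs are assumed values for illustration.
E_eps2 = 0.0005 ** 2          # noise variance E eps^2
sigma2 = 0.04                 # constant sigma_t^2
T = 1.0 / 252.0               # one trading day, annualized units

def avar(c):
    """Asymptotic variance in (59) for this example:
    8 c^-2 (E eps^2)^2 + (4/3) c sigma^4 T^2."""
    return 8 * E_eps2 ** 2 / c ** 2 + (4.0 / 3.0) * c * sigma2 ** 2 * T ** 2

c_opt = (12 * E_eps2 ** 2 / (T ** 2 * sigma2 ** 2)) ** (1.0 / 3.0)   # eq. (63)
min_avar = 2 * (12 * E_eps2 ** 2) ** (1.0 / 3.0) * (sigma2 * T) ** (4.0 / 3.0)
```

The closed-form minimum value agrees with avar evaluated at c_opt, and perturbing c in either direction increases the variance.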

One can also, of course, estimate c to minimize the actual asymptotic variance in (59) from data in the current time period (0 ≤ t ≤ T). It is beyond the scope of this article to consider whether such a device for selecting the frequency has any impact on our asymptotic results.

In addition to large-sample arguments, one can study \widehat{⟨X,X⟩}_T from a "smallish" sample standpoint. We argue in what follows that one can apply a bias-type adjustment to get

  \widehat{⟨X,X⟩}^{(adj)}_T = ( 1 − n̄/n )^{−1} \widehat{⟨X,X⟩}_T. (64)

The difference from the estimator in (55) is of order O_p(n̄/n) = O_p(K^{−1}), and thus the two estimators behave the same to the asymptotic order that we consider. The estimator (64), however, has the appeal of being, in a certain way, "unbiased," as follows. For arbitrary (a, b), consider all estimators of the form

  \widehat{⟨X,X⟩}^{(adj)}_T = a [Y,Y]^{(avg)}_T − b (n̄/n) [Y,Y]^{(all)}_T.

Then, from (18) and (35),

  E( \widehat{⟨X,X⟩}^{(adj)}_T | X process )
    = a ( [X,X]^{(avg)}_T + 2n̄Eε² ) − b (n̄/n) ( [X,X]^{(all)}_T + 2nEε² )
    = a [X,X]^{(avg)}_T − b (n̄/n) [X,X]^{(all)}_T + 2(a − b) n̄ Eε².

It is natural to choose a = b, to completely remove the effect of Eε². Also, following Section 3.4, both [X,X]^{(avg)}_T and [X,X]^{(all)}_T are asymptotically unbiased estimators of ⟨X,X⟩_T. Hence one can argue that one should take a(1 − n̄/n) = 1, yielding (64).

Similarly, an adjusted estimator of Eε² is given by

  Êε²^{(adj)} = (1/2)(n − n̄)^{−1} ( [Y,Y]^{(all)}_T − [Y,Y]^{(avg)}_T ), (65)

which satisfies E( Êε²^{(adj)} | X process ) = Eε² + (1/2)(n − n̄)^{−1}( [X,X]^{(all)}_T − [X,X]^{(avg)}_T ), and thus is unbiased to high order. As for the asymptotic distribution, one can see from Theorem A.1 in the Appendix that

  Êε²^{(adj)} − Eε² = ( Êε² − Eε² )(1 + O(K^{−1})) + O_p(Kn^{−3/2})
    = Êε² − Eε² + O_p(n^{−1/2}K^{−1}) + O_p(Kn^{−3/2})
    = Êε² − Eε² + O_p(n^{−5/6}),

from (58). It follows that n^{1/2}(Êε² − Eε²) and n^{1/2}(Êε²^{(adj)} − Eε²) have the same asymptotic distribution.

5. EXTENSION TO MULTIPLE PERIOD INFERENCE

For a given family A = {G^{(k)}, k = 1, …, K}, we denote

  \widehat{⟨X,X⟩}_t = [Y,Y]^{(avg)}_t − (n̄/n) [Y,Y]^{(all)}_t, (66)

where, as usual, [Y,Y]^{(all)}_t = Σ_{t_{i+1} ≤ t} (ΔY_{t_i})² and [Y,Y]^{(avg)}_t = (1/K) Σ_{k=1}^{K} [Y,Y]^{(k)}_t, with

  [Y,Y]^{(k)}_t = Σ_{t_i, t_{i+} ∈ G^{(k)}, t_{i+} ≤ t} ( Y_{t_{i+}} − Y_{t_i} )².

To estimate ⟨X,X⟩ for several discrete time periods, say [0, T_1], [T_1, T_2], …, [T_{M−1}, T_M], where M is fixed, this amounts to estimating ⟨X,X⟩_{T_m} − ⟨X,X⟩_{T_{m−1}} = ∫_{T_{m−1}}^{T_m} σ_u² du for m = 1, …, M, and the obvious estimator is \widehat{⟨X,X⟩}_{T_m} − \widehat{⟨X,X⟩}_{T_{m−1}}.

To carry out the asymptotics, let n_m be the number of points in the mth time segment, and similarly let K_m = c_m n_m^{2/3}, where c_m is a constant. Then the n_m^{1/6}( \widehat{⟨X,X⟩}_{T_m} − \widehat{⟨X,X⟩}_{T_{m−1}} − ∫_{T_{m−1}}^{T_m} σ_u² du ), m = 1, …, M, converge stably to ( 8c_m^{−2}(Eε²)² + c_m η_m² (T_m − T_{m−1}) )^{1/2} Z_m, where the Z_m are iid standard normals, independent of the underlying process, and η_m² is the limit η² (Thm. 3) for time period m. In the case of equidistant t_i and regular allocation of sample points to grids, η_m² = (4/3) ∫_{T_{m−1}}^{T_m} σ_u⁴ du.

In other words, the one-period asymptotics generalize straightforwardly to the multiperiod case. This is because \widehat{⟨X,X⟩}_{T_m} − \widehat{⟨X,X⟩}_{T_{m−1}} − ∫_{T_{m−1}}^{T_m} σ_u² du has, to first order, a martingale structure. This can be seen in the Appendix.

An advantage of our proposed estimator is that if ε_{t_i} has different variance in different time segments, say var(ε_{t_i}) = (Eε²)_m for t_i ∈ (T_{m−1}, T_m], then both consistency and asymptotic (mixed) normality continue to hold, provided that one replaces Eε² by (Eε²)_m. This adds a measure of robustness to the procedure. If one were convinced that Eε² were the same across time segments, then an alternative estimator would have the form

  \widehat{⟨X,X⟩}_t = [Y,Y]^{(avg)}_t − ( (1/K) #{i: t_{i+1} ≤ t} − 1 ) (1/n) [Y,Y]^{(all)}_T,
    for t = T_1, …, T_M. (67)


However, the errors \widehat{⟨X,X⟩}_{T_m} − \widehat{⟨X,X⟩}_{T_{m−1}} − ∫_{T_{m−1}}^{T_m} σ_u² du in this case are not asymptotically independent. Note that for T = T_m, both candidates (66) and (67) for \widehat{⟨X,X⟩}_t coincide with the quantity in (55).

6. ESTIMATING THE ASYMPTOTIC VARIANCE OF \widehat{⟨X,X⟩}_T

In the one-period case, the main goal is to estimate the asymptotic variance s² = 8c^{−2}(Eε²)² + cη²T [cf. (58) and (59)]. The multigrid case is a straightforward generalization, as indicated in Section 5.

Here we are concerned only with the case where the points t_i are equally spaced (Δt_i = Δt) and are regularly allocated to the grids A_1 = {G^{(k)}, k = 1, …, K_1}. A richer set of ingredients is required to find the spread than to just estimate ⟨X,X⟩_T. To implement the estimator, we create an additional family A_2 = {G^{(k,i)}, k = 1, …, K_1, i = 1, …, I} of grids, where G^{(k,i)} contains every Ith point of G^{(k)}, starting with the ith point. We assume that K_1 ∼ c_1 n^{2/3}. The new family then consists of K_2 ∼ c_2 n^{2/3} grids, where c_2 = c_1 I.

In addition, we need to divide the time line into segments (T_{m−1}, T_m], where T_m = (m/M)T. For the purposes of this discussion, M is large but finite. We now get an initial estimator of spread as

  \hat s_0² = n^{1/3} Σ_{m=1}^{M} ( \widehat{⟨X,X⟩}^{K_1}_{T_m} − \widehat{⟨X,X⟩}^{K_1}_{T_{m−1}}
    − ( \widehat{⟨X,X⟩}^{K_2}_{T_m} − \widehat{⟨X,X⟩}^{K_2}_{T_{m−1}} ) )²,

where \widehat{⟨X,X⟩}^{K_i}_t is the estimator (66) using grid family i, i = 1, 2.

Using the discussion in Section 5, we can show that

  \hat s_0² ≈ s_0², (68)

where, for c_1 ≠ c_2 (I ≠ 1),

  s_0² = 8(Eε²)² ( c_1^{−2} + c_2^{−2} − c_1^{−1}c_2^{−1} ) + ( c_1^{1/2} − c_2^{1/2} )² Tη²
       = 8(Eε²)² c_1^{−2} ( 1 + I^{−2} − I^{−1} ) + c_1 ( I^{1/2} − 1 )² Tη². (69)

In (68), the symbol "≈" denotes first convergence in law as n → ∞, and then a limit in probability as M → ∞. Because Eε² can be estimated by Êε² = [Y,Y]^{(all)}/2n, we can put hats on s_0², (Eε²)², and η² in (69) to obtain an estimator of η². Similarly,

  s² = 8(Eε²)² ( c^{−2} − c ( c_1^{−2} + c_2^{−2} − c_1^{−1}c_2^{−1} ) / ( c_1^{1/2} − c_2^{1/2} )² )
       + ( c / ( c_1^{1/2} − c_2^{1/2} )² ) s_0²
     = 8 ( c^{−2} − c c_1^{−3} ( I^{−2} − I^{−1} + 1 ) / ( I^{1/2} − 1 )² ) (Eε²)²
       + (c/c_1) ( I^{1/2} − 1 )^{−2} s_0², (70)

where c ∼ Kn^{−2/3}, with K the number of grids used originally to estimate \widehat{⟨X,X⟩}_T.

Table 1. Coefficients of (Eε²)² and \hat s_0² When c_1 = c

  I    coeff(\hat s_0²)    coeff((Eε²)²)
  3    1.866               −3.611c^{−2}
  4    1.000               1.5000c^{−2}

Normally, one would take c_1 = c. Hence an estimator \hat s² can be found from \hat s_0² and Êε². When c_1 = c, we argue that the optimal choice is I = 3 or 4, as follows. The coefficients in (70) become

  coeff(s_0²) = ( I^{1/2} − 1 )^{−2}

and

  coeff((Eε²)²) = 8c^{−2} ( I^{1/2} − 1 )^{−2} f(I),

where f(I) = I − 2I^{1/2} − I^{−2} + I^{−1}. For I ≥ 2, f(I) is increasing, and f(I) crosses 0 for I between 3 and 4. These, therefore, are the two integer values of I that give the lowest ratio of coeff((Eε²)²) to coeff(s_0²). Using I = 3 or 4 would therefore maximally insulate against (Eε²)² dominating over s_0². This is desirable, because s_0² is the estimator carrying the information about η². Numerical values for the coefficients are given in Table 1. If c were such that (Eε²)² still overwhelmed s_0², then a choice of c_1 ≠ c should be considered.

7. MONTE CARLO EVIDENCE

In this article we have discussed five approaches to dealing with the impact of market microstructure noise on realized volatility. In this section we examine the performance of each approach in simulations and compare the results with those predicted by the aforementioned asymptotic theory.

7.1 Simulations Design

We use the stochastic volatility model of Heston (1993) as our data-generating process:

  dX_t = (μ − v_t/2) dt + σ_t dB_t (71)

and

  dv_t = κ(α − v_t) dt + γ v_t^{1/2} dW_t, (72)

where v_t = σ_t². We assume that the parameters (μ, κ, α, γ) and ρ, the correlation coefficient between the two Brownian motions B and W, are constant in this model. We also assume Feller's condition, 2κα ≥ γ², to make the zero boundary unattainable by the volatility process.

We simulate M = 25,000 sample paths of the process using the Euler scheme at a time interval Δt = 1 second, and set the parameter values at levels reasonable for a stock: μ = .05, κ = 5, α = .04, γ = .5, and ρ = −.5. As for the market microstructure noise ε, we assume that it is Gaussian and small. Specifically, we set (Eε²)^{1/2} = .0005 (i.e., the standard deviation of the noise is .05% of the value of the asset price). We purposefully set this value to be smaller than the values calibrated using the empirical market microstructure literature.

On each simulated sample path, we estimate ⟨X,X⟩_T over T = 1 day (i.e., T = 1/252, because the parameter values are all annualized), using the five estimation strategies described earlier: [Y,Y]^{(all)}_T, [Y,Y]^{(sparse)}_T, [Y,Y]^{(sparse,opt)}_T, [Y,Y]^{(avg)}_T, and,


finally, \widehat{⟨X,X⟩}^{(adj)}_T. We assume that a day consists of 6.5 hours of open trading, as is the case on the NYSE and NASDAQ. For the fourth-best estimator, we represent this approach using sparse sampling at a frequency of once every 5 minutes.

7.2 Simulations Results

Table 2 reports the simulation results for the five estimation strategies. For each estimator, the bias is calculated as E[estimator − ⟨X,X⟩_T]. The variance row reports the quantity

  E[ var( estimator − ⟨X,X⟩_T | ∫_0^T σ_t² dt, ∫_0^T σ_t⁴ dt ) ]. (73)

The rows marked "relative" report the corresponding results in percentage terms, that is, for

  ( estimator − ⟨X,X⟩_T ) / ⟨X,X⟩_T.

Note that although the convergence of the discretization error is stable, as opposed to conditional, (73) seems like a reasonable stand-in for the theoretical quantity under consideration.

In each case, "small sample" represents the average value over the M simulated paths, and "asymptotic" refers to the value predicted by our theory. The noise parts of the asymptotic variances are taken to include the relevant second-order terms, as given by (20), (37), and (60), yielding spreads like (in the latter case) (61). The small-sample variance is computed by binning the results of the simulations according to the values of the pair (∫_0^T σ_t² dt, ∫_0^T σ_t⁴ dt), calculating the variance within each bin, and then averaging across the bins, weighted by the number of elements in each bin.

Comparing the rows showing small-sample results with their asymptotic predictions, we see that the asymptotic theory provides a good approximation to the first two moments of the small-sample distribution of the five estimators. Comparing across columns, we see that the differences between the five estimators are quite large. As expected, the naive fifth-best strategy produces disastrous results. The first-best strategy results in a large decrease of root MSE (RMSE) relative to the others, including, in particular, the fourth-best strategy, which is the approach currently used in most of the empirical literature. But, at least for the parameter values used here, determining the optimal sampling frequency (the sparse optimal, or third-best, strategy) results in only a modest improvement over the arbitrary selection of a sampling frequency (the sparse, or fourth-best, strategy). Using all of the data, as in the subsampled and averaged estimator (second-best), but especially the bias correction by combining two time scales, as in our first-best approach, is where the major practical improvements reside.

It may seem surprising that the variance of the estimator is reduced so much from the second-best to the first-best estimator. After all, if one sets c = Kn^{−2/3}, then the asymptotic normal term in (7) becomes

  n^{−1/6} [ 4c^{−2}Eε⁴ + (4T/3) c ∫_0^T σ_t⁴ dt ]^{1/2} Z_total,

where the first term in the brackets is due to noise, the second is due to discretization, and the bracketed sum is the total variance. This is quite similar to the error term in (9). However, the c for the second-best estimator is chosen by (8), whereas for the first-best case it is given by (10). The two optima are very different, because for the second-best estimator c is chosen by a bias–variance trade-off, whereas for the first-best estimator it is chosen to minimize the asymptotic variance. From this standpoint, it is quite reasonable for the first-best estimator to also have substantially better asymptotic variance. In fact, from (8) and (10), it is easy to see that for the optimal choices,

  c*_{second best} = n^{1/3} 2^{−1/3} ( Eε⁴ / (Eε²)² )^{1/3} c*_{first best},

whence the total variance is in fact quite different for the two estimators.

Figure 1 shows the small-sample and asymptotic RMSE of the [Y,Y]^{(sparse)}_T estimator as a function of the subsampling interval, illustrating the minimization problem studied in Section 2.3. Finally, Figure 2 shows the standardized distribution of the first-best estimator obtained from the simulations (histogram) and the corresponding asymptotic distribution predicted by our theory (solid line). Due to the effect of bias, the distributions of the fifth- to second-best estimators are irrelevant, because they cannot be used to set intervals for the volatility.

8. DISCUSSION

In this work we have quantified and corrected the effect of noise on the nonparametric assessment of integrated volatility. In the setting of high-frequency data, the usual financial practice is to use sparse sampling, in other words, throwing

Table 2. Monte Carlo Simulations for the Five Estimation Strategies

                                  Fifth-best       Fourth-best       Third-best            Second-best      First-best
                                  [Y,Y]^(all)_T    [Y,Y]^(sparse)_T  [Y,Y]^(sparse,opt)_T  [Y,Y]^(avg)_T    ⟨X,X⟩^(adj)_T
  Small-sample bias               1.1699 × 10^−2   3.89 × 10^−5      2.18 × 10^−5          1.926 × 10^−5    2 × 10^−8
  Asymptotic bias                 1.1700 × 10^−2   3.90 × 10^−5      2.20 × 10^−5          1.927 × 10^−5    0
  Small-sample variance           1.791 × 10^−8    1.4414 × 10^−9    1.59 × 10^−9          9.41 × 10^−10    9 × 10^−11
  Asymptotic variance             1.788 × 10^−8    1.4409 × 10^−9    1.58 × 10^−9          9.37 × 10^−10    8 × 10^−11
  Small-sample RMSE               1.1699 × 10^−2   5.437 × 10^−5     4.543 × 10^−5         3.622 × 10^−5    9.4 × 10^−6
  Asymptotic RMSE                 1.1700 × 10^−2   5.442 × 10^−5     4.546 × 10^−5         3.618 × 10^−5    8.9 × 10^−6
  Small-sample relative bias      182              6.1               1.8                   1.5              −.0045
  Small-sample relative variance  82,502           115               11                    .53              .043
  Small-sample relative RMSE      340              12.4              3.7                   2.8              .65


Figure 1. RMSE of the Sparse Estimator as a Function of the Subsampling Frequency (small-sample RMSE; asymptotic RMSE).

away most of the available data. We have argued that this practice is caused by not incorporating the noise in the model. Although it is statistically unsound to throw away data, we have shown that it is possible to build on this practice to construct estimators that make statistical sense.

Specifically, we have found that the usual realized volatility mainly estimates the magnitude of the noise term, rather than anything to do with volatility. An approach built on separating the observations into multiple "grids" lessens this problem. We found that the best results can be obtained by combining the usual ("single-grid") realized volatility with the multiple-grid–based device. This gives an estimator that is approximately unbiased; we have also shown how to assess the (random) variance of this estimator. We also show that for our most recommended procedure, the optimal number of multiple grids is of order O(n^{2/3}) (see Sec. 4). Most of the development is in the context of finding the integrated volatility over one time period; at the end, we extend this to multiple periods. Also, in the case where the noise can be taken to be almost negligible, we provide a way of optimizing the sampling frequency if one wishes to use the classical "realized volatility" or its multigrid extension.

One important message of the article is that any time one has an impulse to sample sparsely, one can always do better with a multigrid method, regardless of the model or the quantity being estimated.

Figure 2. Asymptotic and Small-Sample Standardized Distributions of the First-Best Estimator.


Our results in this article cover the case where the latent X follows the Itô process model (1). Theorem 1 remains true under the assumption that X is a general semimartingale, that is, a process X that can also have jumps. This follows because the proof of Lemma A.2 remains true for a general semimartingale X (Jacod and Shiryaev 2003, thm. I.4.47, p. 52). Also, apart from using Lemma A.2, Theorem A.1 is unrelated to the nature of the X process. As far as the effect of jumps on the discretization effect is concerned, it should be noted that consistency holds, as follows. When X is a general semimartingale, then for any sequence of grids H_n such that (41) holds, [X,X]^{H_n} converges to the continuous-time quadratic variation of X, that is,

  ∫_0^T σ_t² dt + Σ_{0 ≤ t ≤ T} ( jump of X at t )², (74)

(cf. Jacod and Shiryaev 2003, thm. I.4.47, p. 52). In particular, this also applies to subgrids, under the regularity conditions that we have discussed. Furthermore, by extension of the same proof, [X,X]^{(avg)} also converges to (74). By the same argument, so does \widehat{⟨X,X⟩}_T. In this sense, the two-scales estimator is robust to jumps.

APPENDIX: PROOFS OF RESULTS

When the total grid G is considered, we use Σ_{i=1}^{n−1}, Σ_{t_{i+1} ≤ T}, and Σ_{t_i ∈ G} interchangeably in the following proofs. Also, we write "|X" to indicate expressions that are conditional on the entire X process.

A.1 Variance of [Y,Y]_T, Given the X Process

Here we calculate explicitly the variance in (19), from which the stated approximation follows. The explicit remainder term is also used for (36). Let a partition of [0, T] be 0 = t_0 ≤ t_1 ≤ ··· ≤ t_n = T. Under assumption (11),

  var( [Y,Y]^{(all)}_T | X ) = var[ Σ_{t_{i+1} ≤ T} (ΔY_{t_i})² | X ]
    = Σ_{t_{i+1} ≤ T} var[ (ΔY_{t_i})² | X ]  (call this I_T)
      + 2 Σ_{t_{i+1} ≤ T} cov[ (ΔY_{t_{i−1}})², (ΔY_{t_i})² | X ]  (call this II_T),

because ΔY_{t_i} = ΔX_{t_i} + Δε_{t_i} is 1-dependent given the X process:

  var[ (ΔY_{t_i})² | X ] = κ_4(ΔY_{t_i} | X) + 2[ var(ΔY_{t_i} | X) ]²
      + 4[ E(ΔY_{t_i} | X) ]² var(ΔY_{t_i} | X) + 4 E(ΔY_{t_i} | X) κ_3(ΔY_{t_i} | X)
    = κ_4(Δε_{t_i}) + 2[ var(Δε_{t_i}) ]² + 4(ΔX_{t_i})² var(Δε_{t_i})
      + 4(ΔX_{t_i}) κ_3(Δε_{t_i})  (under assumption (11))
    = 2κ_4(ε) + 8(Eε²)² + 8(ΔX_{t_i})² Eε²,

because κ_3(Δε_{t_i}) = 0. The κ's are the cumulants of the relevant order:

  κ_1(ε) = 0, κ_2(ε) = var(ε) = E(ε²), κ_3(ε) = Eε³, and κ_4(ε) = E(ε⁴) − 3(Eε²)². (A.1)

So I_T = n( 2κ_4(ε) + 8(Eε²)² ) + 8[X,X]^{(all)}_T Eε². Similarly, for the covariance,

  cov[ (ΔY_{t_{i−1}})², (ΔY_{t_i})² | X ]
    = cov[ (Δε_{t_{i−1}})², (Δε_{t_i})² ] + 4(ΔX_{t_{i−1}})(ΔX_{t_i}) cov( Δε_{t_{i−1}}, Δε_{t_i} )
      + 2(ΔX_{t_{i−1}}) cov[ Δε_{t_{i−1}}, (Δε_{t_i})² ] + 2(ΔX_{t_i}) cov[ (Δε_{t_{i−1}})², Δε_{t_i} ]
    = κ_4(ε) + 2(Eε²)² − 4(ΔX_{t_{i−1}})(ΔX_{t_i}) κ_2(ε)
      − 2(ΔX_{t_i}) κ_3(ε) + 2(ΔX_{t_{i−1}}) κ_3(ε), (A.2)

because of (A.1). Thus, summing the coefficients in (A.2),

  II_T = 2(n − 1)( κ_4(ε) + 2(Eε²)² ) − 8Eε² Σ_{t_{i+1} ≤ T} (ΔX_{t_{i−1}})(ΔX_{t_i})
    − 4κ_3(ε)( ΔX_{t_{n−1}} − ΔX_{t_0} ).

Amalgamating the two expressions, one obtains

  var( [Y,Y]^{(all)}_T | X )
    = n( 2κ_4(ε) + 8(Eε²)² ) + 8[X,X]^{(all)}_T Eε² + 2(n − 1)( κ_4(ε) + 2(Eε²)² )
      − 8Eε² Σ (ΔX_{t_{i−1}})(ΔX_{t_i}) − 4κ_3(ε)( ΔX_{t_{n−1}} − ΔX_{t_0} )
    = 4nEε⁴ + R_n, (A.3)

where the remainder term R_n satisfies

  |R_n| ≤ 8Eε² [X,X]_T + 2( κ_4(ε) + 2(Eε²)² )
      + 8Eε² Σ |ΔX_{t_{i−1}}| |ΔX_{t_i}| + 4|κ_3(ε)|( |ΔX_{t_{n−1}}| + |ΔX_{t_0}| )
    ≤ 16Eε² [X,X]^{(all)}_T + 2( κ_4(ε) + 2(Eε²)² ) + 2|κ_3(ε)|( 2 + [X,X]_T ), (A.4)

by the Cauchy–Schwarz inequality and because |x| ≤ (1 + x²)/2. Because [X,X]^{(all)}_T = O_p(1), (19) follows.

Under slightly stronger conditions (e.g., |μ_t| and σ_t bounded above by a constant), Σ (ΔX_{t_{i−1}})(ΔX_{t_i}) is a near-martingale and of order O_p(n^{−1/2}), and similarly ΔX_{t_{n−1}} and ΔX_{t_0} are O_p(n^{−1/2}), from which (20) follows.

Similarly, to obtain a higher-order approximation to the variance of the estimator \widehat{⟨X,X⟩}_T, note that

  \widehat{⟨X,X⟩}_T − ⟨X,X⟩_T = ( [Y,Y]^{(avg)}_T − (1/K)[Y,Y]^{(all)}_T − [X,X]^{(avg)}_T )
    + ( [X,X]^{(avg)}_T − ⟨X,X⟩_T ). (A.5)

Now recalling (20) and (37), and, in addition,

  cov( [Y,Y]^{(all)}, [Y,Y]^{(avg)} | X )
    = 2( Eε⁴ − (Eε²)² )( 2n̄ − 1/K ) + 8Eε² [X,X]^{(all)}_T (1/K) + o_p(1/K),

we see that

  var( \widehat{⟨X,X⟩}_T | X )
    = var( [Y,Y]^{(avg)}_T | X ) + (1/K²) var( [Y,Y]^{(all)}_T | X )
      − (2/K) cov( [Y,Y]^{(avg)}_T, [Y,Y]^{(all)}_T | X )
    = 4(n̄/K) Eε⁴ + (1/K)[ 8[X,X]^{(avg)}_T Eε² − 2( Eε⁴ − (Eε²)² ) ] + o_p(1/K)
      + (1/K²)[ 4n̄KEε⁴ + ( 8[X,X]^{(all)}_T Eε² − 2( Eε⁴ − (Eε²)² ) ) + o_p(1) ]
      − (2/K)[ 2( Eε⁴ − (Eε²)² )( 2n̄ − 1/K ) + 8Eε² [X,X]^{(all)}_T (1/K) + o_p(1/K) ]
    = (n̄/K) 8(Eε²)² + (1/K)[ 8[X,X]^{(avg)}_T Eε² − 2( Eε⁴ − (Eε²)² ) ] + o_p(1/K),

because, at the optimal choice, K = cn^{2/3} and n̄ = c^{−1}n^{1/3}. Therefore, (60) holds.

A.2 The Relevant Central Limit Theorem

Lemma A.2. Suppose that X is an Itô process. Suppose that Y is related to X through model (4). Then, under assumption (11) and definitions (16) and (32),

  [Y,Y]^{(all)}_T = [ε,ε]^{(all)}_T + O_p(1)

and

  [Y,Y]^{(avg)}_T = [ε,ε]^{(avg)}_T + [X,X]^{(avg)}_T + O_p( 1/√K ).

Proof. (a) The one-grid case:

  [Y,Y]^{(all)}_T = [X,X]^{(all)}_T + [ε,ε]^{(all)}_T + 2[X,ε]^{(all)}_T. (A.6)

We show that

  E( ([X,ε]^{(all)}_T)² | X ) = O_p(1), (A.7)

and, in particular,

  [X,ε]^{(all)}_T = O_p(1). (A.8)

To see (A.7),

  [X,ε]^{(all)}_T = Σ_{i=0}^{n−1} (ΔX_{t_i})(Δε_{t_i})
    = Σ_{i=0}^{n−1} (ΔX_{t_i}) ε_{t_{i+1}} − Σ_{i=0}^{n−1} (ΔX_{t_i}) ε_{t_i}
    = Σ_{i=1}^{n−1} ( ΔX_{t_{i−1}} − ΔX_{t_i} ) ε_{t_i} + ΔX_{t_{n−1}} ε_{t_n} − ΔX_{t_0} ε_{t_0}. (A.9)

Because E( [X,ε]^{(all)}_T | X ) = 0 and the ε_{t_i} are iid for different t_i, we get

  E( ([X,ε]^{(all)}_T)² | X ) = var( [X,ε]^{(all)}_T | X )
    = Eε² [ Σ_{i=1}^{n−1} ( ΔX_{t_{i−1}} − ΔX_{t_i} )² + (ΔX_{t_{n−1}})² + (ΔX_{t_0})² ]
    = 2[X,X]_T Eε² − 2Eε² Σ_{i=1}^{n−1} (ΔX_{t_{i−1}})(ΔX_{t_i})
    ≤ 4[X,X]_T Eε², (A.10)

by the Cauchy–Schwarz inequality, from which, and from [X,X]^{(all)}_T being of order O_p(1), (A.7) follows. Hence (A.8) follows by the Markov inequality.

(b) The multiple-grid case. Notice that

  [Y,Y]^{(avg)}_T = [X,X]^{(avg)}_T + [ε,ε]^{(avg)}_T + 2[X,ε]^{(avg)}_T; (A.11)

(A.11) follows strictly from model (4) and the definitions of grids and [·,·]^{(avg)}_t; see Section 3.2.

We need to show that

  E( ([X,ε]^{(avg)}_T)² | X ) = O_p(1/K), (A.12)

in particular,

  [X,ε]^{(avg)}_T = O_p( 1/K^{1/2} ), (A.13)

and var( [X,ε]^{(avg)}_T | X ) = E[ ([X,ε]^{(avg)}_T)² | X ].

To show (A.12), note that E( [X,ε]^{(avg)}_T | X ) = 0 and

  E[ ([X,ε]^{(avg)}_T)² | X ] = var( [X,ε]^{(avg)}_T | X ) = (1/K²) Σ_{k=1}^{K} var( [X,ε]^{(k)}_T | X )
    ≤ (4Eε²/K) [X,X]^{(avg)}_T = O_p(1/K),

where the second equality follows from the disjointness of the different grids, as well as from the independence of ε and X. The inequality follows from the same argument as in (A.10). The order then follows because [X,X]^{(avg)}_T = O_p(1); see the method of Mykland and Zhang (2002) for a rigorous development of the order of [X,X]^{(avg)}_T.

Theorem A.1. Suppose that X is an Itô process of form (1). Suppose that Y is related to X through model (4), and that (11) is satisfied with Eε⁴ < ∞. Also suppose that t_i and t_{i+1} are not in the same subgrid for any i. Under assumption (33), as n → ∞,

  ( √n ( Êε² − Eε² ), √(K/n̄) ( [Y,Y]^{(avg)}_T − [X,X]^{(avg)}_T − 2n̄Eε² ) )

converges in law to a bivariate normal, with mean 0 and covariance matrix

  ( Eε⁴        2 var(ε²) )
  ( 2 var(ε²)  4Eε⁴      ), (A.14)

conditional on the X process, where the limiting random variable is independent of the X process.

Proof. By Lemma A.2, we need the distribution of [ε,ε]^{(avg)} and [ε,ε]^{(all)}. First, we explore the convergence of

  (1/√n) ( [ε,ε]^{(all)}_T − 2nEε², [ε,ε]^{(avg)}_T K − 2n̄KEε² ). (A.15)

Recall that all of the sampling points t_0, t_1, …, t_n are within [0, T]. We use G to denote the time points in the full sampling, as in the single grid; G^{(k)} denotes the subsampling from the kth grid. As before, if t_i ∈ G^{(k)}, then t_{i−} and t_{i+} are the previous and next elements in G^{(k)}; ε_{t_{i−}} = 0 for t_i = min G^{(k)}, and ε_{t_{i+}} = 0 for t_i = max G^{(k)}.

Set

  M^{(1)}_T = (1/√n) Σ_{t_i ∈ G} ( ε_{t_i}² − Eε² ),
  M^{(2)}_T = (1/√n) Σ_{t_i ∈ G} ε_{t_i} ε_{t_{i−1}}, (A.16)
  M^{(3)}_T = (1/√n) Σ_{k=1}^{K} Σ_{t_i ∈ G^{(k)}} ε_{t_i} ε_{t_{i−}}.

We first find the asymptotic distribution of (M^{(1)}_T, M^{(2)}_T, M^{(3)}_T), using the martingale central limit theorem. Then we use the result to find the limit of (A.15).

Note that (M^{(1)}_T, M^{(2)}_T, M^{(3)}_T) are the end points of martingales with respect to the filtration F_i = σ(ε_{t_j}, j ≤ i; X_t, all t). We now derive the (discrete-time) predictable quadratic variations ⟨M^{(l)}, M^{(k)}⟩, l, k = 1, 2, 3 [discrete-time predictable quadratic variations are used only in this proof and the proof of Lemma 1, and are different from the continuous-time quadratic variations in (12)]:

  ⟨M^{(1)}, M^{(1)}⟩_T = (1/n) Σ_{t_i ∈ G} var( ε_{t_i}² − Eε² | F_{i−1} ) = var(ε²);

  ⟨M^{(2)}, M^{(2)}⟩_T = (1/n) Σ_{t_i ∈ G} var( ε_{t_i} ε_{t_{i−1}} | F_{i−1} )
    = (Eε²/n) Σ_{t_i ∈ G} ε_{t_{i−1}}² = (Eε²)² + o_p(1); (A.17)

  ⟨M^{(3)}, M^{(3)}⟩_T = (1/n) Σ_{k=1}^{K} Σ_{t_i ∈ G^{(k)}} var( ε_{t_i} ε_{t_{i−}} | F_{i−1} )
    = (Eε²/n) Σ_{k=1}^{K} Σ_{t_i ∈ G^{(k)}} ε_{t_{i−}}² = (Eε²)² + o_p(1),

by the law of large numbers. Similarly, for the predictable quadratic covariations,

  ⟨M^{(1)}, M^{(2)}⟩_T = (1/n) Σ_{t_i ∈ G} cov( ε_{t_i}² − Eε², ε_{t_i} ε_{t_{i−1}} | F_{i−1} )
    = Eε³ (1/n) Σ_{t_i ∈ G} ε_{t_{i−1}} = o_p(1);

  ⟨M^{(1)}, M^{(3)}⟩_T = (1/n) Σ_{k=1}^{K} Σ_{t_i ∈ G^{(k)}} cov( ε_{t_i}² − Eε², ε_{t_i} ε_{t_{i−}} | F_{i−1} )
    = Eε³ (1/n) Σ_{k=1}^{K} Σ_{t_i ∈ G^{(k)}} ε_{t_{i−}} = o_p(1); (A.18)

  ⟨M^{(2)}, M^{(3)}⟩_T = (1/n) Σ_{k=1}^{K} Σ_{t_i ∈ G^{(k)}} cov( ε_{t_i} ε_{t_{i−1}}, ε_{t_i} ε_{t_{i−}} | F_{i−1} )
    = (Eε²/n) Σ_{k=1}^{K} Σ_{t_i ∈ G^{(k)}} ε_{t_{i−1}} ε_{t_{i−}} = o_p(1),

because t_{i+1} is not in the same grid as t_i.

Because the ε_{t_i} are iid and Eε_{t_i}⁴ < ∞, the conditional Lindeberg conditions are satisfied. Hence, by the martingale central limit theorem (see Hall and Heyde 1980, condition 3.1, p. 58), (M^{(1)}, M^{(2)}, M^{(3)}) are asymptotically normal, with covariance matrix the asymptotic value of ⟨M^{(l)}, M^{(k)}⟩. In other words, asymptotically, (M^{(1)}, M^{(2)}, M^{(3)}) are independent normal, with respective variances var(ε²), (Eε²)², and (Eε²)².

Returning to (A.15), we have that
\[
[\varepsilon, \varepsilon]^{(\mathrm{all})} - 2nE\varepsilon^2
= 2 \sum_{i \ne 0, n} (\varepsilon_{t_i}^2 - E\varepsilon^2) + (\varepsilon_{t_0}^2 - E\varepsilon^2) + (\varepsilon_{t_n}^2 - E\varepsilon^2) - 2 \sum_{t_i > 0} \varepsilon_{t_i}\varepsilon_{t_{i-1}}
= 2\sqrt{n}\,\bigl(M^{(1)} - M^{(2)}\bigr) + O_p(1). \tag{A.19}
\]
Meanwhile,
\[
[\varepsilon, \varepsilon]^{(k)} - 2n_k E\varepsilon^2
= \sum_{t_i \in \mathcal{G}^{(k)},\, t_i \ne \max \mathcal{G}^{(k)}} \bigl(\varepsilon_{t_{i+}} - \varepsilon_{t_i}\bigr)^2 - 2n_k E\varepsilon^2
\]
\[
= 2 \sum_{t_i \in \mathcal{G}^{(k)}} (\varepsilon_{t_i}^2 - E\varepsilon^2) - \bigl(\varepsilon_{\min \mathcal{G}^{(k)}}^2 - E\varepsilon^2\bigr) - \bigl(\varepsilon_{\max \mathcal{G}^{(k)}}^2 - E\varepsilon^2\bigr)
- 2 \sum_{t_i \in \mathcal{G}^{(k)}} \varepsilon_{t_i}\varepsilon_{t_{i-}}, \tag{A.20}
\]
where $n_k + 1$ represents the total number of sampling points in $\mathcal{G}^{(k)}$. Hence
\[
[\varepsilon, \varepsilon]^{(\mathrm{avg})}_T K - 2\bar{n}E\varepsilon^2 K = \sqrt{n}\,\bigl(2M^{(1)} - 2M^{(3)}\bigr) - R
= 2\sqrt{n}\,\bigl(M^{(1)} - M^{(3)}\bigr) + O_p(K^{1/2}), \tag{A.21}
\]
because $R = \sum_{k=1}^K \bigl[(\varepsilon_{\min \mathcal{G}^{(k)}}^2 - E\varepsilon^2) + (\varepsilon_{\max \mathcal{G}^{(k)}}^2 - E\varepsilon^2)\bigr]$ satisfies $ER^2 = \operatorname{var}(R) \le 4K \operatorname{var}(\varepsilon^2)$.

Because $n^{-1}K \to 0$, and because the error terms in (A.19) and (A.20) are uniformly integrable, it follows that
\[
\text{(A.15)} = 2\bigl(M^{(1)} - M^{(2)},\ M^{(1)} - M^{(3)}\bigr) + o_p(1). \tag{A.22}
\]
Hence (A.15) is also asymptotically normal, with covariance matrix
\[
\begin{pmatrix} 4E\varepsilon^4 & 4\operatorname{var}(\varepsilon^2) \\ 4\operatorname{var}(\varepsilon^2) & 4E\varepsilon^4 \end{pmatrix}.
\]

By Lemma A.2, and because $n^{-1}K \to 0$, the pair
\[
\frac{1}{\sqrt{n}} \Bigl( [Y,Y]^{(\mathrm{all})}_T - 2nE\varepsilon^2,\ \bigl([Y,Y]^{(\mathrm{avg})}_T - [X,X]^{(\mathrm{avg})}_T - 2\bar{n}E\varepsilon^2\bigr)K \Bigr) \Bigm| X
\]
is asymptotically normal:
\[
\frac{1}{\sqrt{n}} \begin{pmatrix} [Y,Y]^{(\mathrm{all})}_T - 2nE\varepsilon^2 \\[4pt] [Y,Y]^{(\mathrm{avg})}_T K - 2\bar{n}K E\varepsilon^2 - [X,X]^{(\mathrm{avg})}_T K \end{pmatrix} \Biggm| X
= 2 \begin{pmatrix} M^{(1)} - M^{(2)} \\ M^{(1)} - M^{(3)} \end{pmatrix} + o_p(1)
\stackrel{\mathcal{L}}{\longrightarrow} 2N\Biggl(0,\ \begin{pmatrix} E\varepsilon^4 & \operatorname{var}(\varepsilon^2) \\ \operatorname{var}(\varepsilon^2) & E\varepsilon^4 \end{pmatrix}\Biggr). \tag{A.23}
\]
Because
\[
\widehat{E\varepsilon^2} = \frac{1}{2n}[Y,Y]^{(\mathrm{all})}_T \quad \text{and} \quad
\frac{K}{\sqrt{n}} = \sqrt{\frac{K}{\bar{n}}}\,\bigl(1 + o(1)\bigr), \tag{A.24}
\]
Theorem A.1 follows.

Proof of Lemma 1. For simplicity, and without loss of generality, we assume that $\mathcal{H} = \mathcal{G}$. The result follows the proofs of Lemma A.2 and Theorem A.1, but with a more exact representation of $[Y,Y]^{(\mathrm{all})}_T - [X,X]^{(\mathrm{all})}_T$. Specifically, from (A.9) and (A.19), we have that
\[
[Y,Y]^{(\mathrm{all})}_T - [X,X]^{(\mathrm{all})}_T - 2nE\varepsilon^2
= [\varepsilon,\varepsilon]_T - 2nE\varepsilon^2 + 2[X,\varepsilon]^{(\mathrm{all})}_T
\]
\[
= 2\sum_{i \ne 0,n}(\varepsilon_{t_i}^2 - E\varepsilon^2) + (\varepsilon_{t_0}^2 - E\varepsilon^2) + (\varepsilon_{t_n}^2 - E\varepsilon^2) - 2\sum_{t_i > 0}\varepsilon_{t_i}\varepsilon_{t_{i-1}}
+ 2\sum_{i=1}^{n-1}\bigl(X_{t_{i-1}} - X_{t_i}\bigr)\varepsilon_{t_i} + 2X_{t_{n-1}}\varepsilon_{t_n} - 2X_{t_0}\varepsilon_{t_0}.
\]

1410 Journal of the American Statistical Association December 2005

With the same filtration as in the proof of Theorem A.1, this is the sum of a martingale triangular array with increment $2(\varepsilon_{t_i}^2 - E\varepsilon^2) - 2\varepsilon_{t_i}\varepsilon_{t_{i-1}} + 2(X_{t_{i-1}} - X_{t_i})\varepsilon_{t_i}$ for $i \ne 0, n$. Again, the conditions of the martingale central limit theorem (see Hall and Heyde 1980, chap. 3) are satisfied by (17), whence the term is asymptotically normal, conditionally on the $X$ process. The conditional variance is given by (20), under the conditions mentioned in the statement of the lemma. Because $E\varepsilon^2$ can vary with $n$, the $O_p(n_{\mathrm{sparse}}^{-1/2})$ term in (20) is actually of order $O_p(n_{\mathrm{sparse}}^{-1/2} E\varepsilon^2)$.

A.3 Asymptotics of $D_T$

For transparency of notation, we take $\Delta t = T/n$; in other words, $\Delta t$ is the average of the $\Delta t_i$.

Proof of Theorem 2. Note first that because we assume that $\mu_t$ and $\sigma_t$ are continuous, they are locally bounded. Because here we seek to show convergence in probability, we can therefore assume, without loss of generality, that $|\mu_t|$ is bounded above by a nonrandom constant, and also that $\infty > \sigma^+ \ge \sigma_t \ge \sigma^- > 0$, where $\sigma^+$ and $\sigma^-$ are nonrandom. This is by the usual stopping time argument.

Also note that once $\mu_t$ and $\sigma_t$ are bounded, as we have now assumed, then by Girsanov's theorem (see, e.g., Karatzas and Shreve 1991, chap. 3.5; Jacod and Shiryaev 2003, chap. II.3b), we can without loss of generality further suppose that $\mu_t = 0$ identically. (If the convergence in probability holds under the equivalent probability measure, then it also holds under the original one.)

Now for the derivation. First consider
\[
\bigl(X_{t_i} - X_{t_{i-}}\bigr)^2 = \Biggl(\ \sum_{t_j = t_{i-}}^{t_{i-1}} \Delta X_{t_j}\Biggr)^2
= \sum_{t_j = t_{i-}}^{t_{i-1}} \bigl(\Delta X_{t_j}\bigr)^2 + 2 \sum_{t_{i-1} \ge t_k > t_l \ge t_{i-}} \Delta X_{t_k}\,\Delta X_{t_l}.
\]
It follows that
\[
[X,X]^{(\mathrm{avg})}_T = [X,X]^{(\mathrm{all})}_T + 2 \sum_{i=1}^{n-1} \Delta X_{t_i} \sum_{j=1}^{K \wedge i} \Bigl(1 - \frac{j}{K}\Bigr) \Delta X_{t_{i-j}} + O_p(K/n).
\]
The remainder term incorporates end effects; it has order $O_p(K/n)$ by (41) and (42), and due to the regular allocation of points to grids.

Following Jacod and Protter (1998) and Mykland and Zhang (2002), $[X,X]^{(\mathrm{all})}_T - \langle X,X\rangle_T = O_p(1/\sqrt{n}\,) = o_p(\sqrt{K/n}\,)$, too, under the assumption (17) [which is a special case of assumption (41)] and under $\inf_{t \in [0,T]} \sigma_t^2 > 0$ almost surely. In other words,
\[
D_T = [X,X]^{(\mathrm{avg})}_T - \langle X,X\rangle_T
= 2 \sum_{i=1}^{n-1} \Delta X_{t_i} \sum_{j=1}^{K \wedge i} \Bigl(1 - \frac{j}{K}\Bigr) \Delta X_{t_{i-j}} + o_p\bigl(\sqrt{K\Delta t}\,\bigr). \tag{A.25}
\]

It follows that
\[
\langle D,D\rangle_T = 4 \sum_{i=1}^{n-1} \Delta\langle X,X\rangle_{t_i} \Biggl(\ \sum_{j=1}^{K \wedge i} \Bigl(1 - \frac{j}{K}\Bigr) \Delta X_{t_{i-j}}\Biggr)^2 + o_p(K\Delta t)
= \mathrm{(I)} + \mathrm{(II)} + o_p(K\Delta t), \tag{A.26}
\]
where
\[
\mathrm{(I)} = 4 \sum_{i=1}^{n-1} \Delta\langle X,X\rangle_{t_i} \Biggl(\ \sum_{j=1}^{K \wedge i} \Bigl(1 - \frac{j}{K}\Bigr)^2 \bigl(\Delta X_{t_{i-j}}\bigr)^2\Biggr)
= 4 \sum_{i=1}^{n-1} \sigma_{t_i}^4\,\Delta t_i \Biggl(\ \sum_{j=1}^{K \wedge i} \Bigl(1 - \frac{j}{K}\Bigr)^2 \Delta t_{i-j}\Biggr) + o_p(K\Delta t)
= K\Delta t \sum_{i=1}^{n-1} \sigma_{t_i}^4\, h_i\,\Delta t_i + o_p(K\Delta t),
\]

where $h_i$ is given by (43). Thus, Theorem 2 will have been shown if we can show that
\[
\mathrm{(II)} = 8 \sum_{i=1}^{n-1} \Delta\langle X,X\rangle_{t_i}\, \zeta_i \tag{A.27}
\]
is of order $o_p(K\Delta t)$, where
\[
\zeta_i = \sum_{i-1 \ge l > r \ge 0} \Delta X_{t_l}\,\Delta X_{t_r} \Bigl(1 - \frac{i-l}{K}\Bigr)^{+} \Bigl(1 - \frac{i-r}{K}\Bigr)^{+}. \tag{A.28}
\]

We do this as follows. Denote $\delta^+ = \max_i |\Delta t_i| = O(1/n)$, and set
\[
\mathrm{(II)}' = 8 \sum_{i=1}^{n-1} \Delta t_i\, \sigma_{t_{i-K}}^2\, \zeta_i.
\]
Then, by Hölder's inequality,
\[
E\bigl|\mathrm{(II)} - \mathrm{(II)}'\bigr| \le \delta^+ n \sup_{|t-s| \le (K+1)\delta^+} \bigl\|\sigma_t^2 - \sigma_s^2\bigr\|_2\ \sup_i \|\zeta_i\|_2 = o(\delta^+ K),
\]
because of the boundedness and continuity of $\sigma_t$, and because, applying the Burkholder–Davis–Gundy inequality twice, we have that
\[
E(\zeta_i^2) \le E \sum_{l=1}^{i-1} \Delta\langle X,X\rangle_{t_l} \Biggl(\ \sum_{r=0}^{(l-1)\wedge(i-1)} \Delta X_{t_r} \Bigl(1 - \frac{i-l}{K}\Bigr)^{+}\Bigl(1 - \frac{i-r}{K}\Bigr)^{+}\Biggr)^2
\]
\[
\le \delta^+(\sigma^+)^2 \sum_{l=1}^{i-1} E\Biggl(\ \sum_{r=0}^{(l-1)\wedge(i-1)} \Delta X_{t_r} \Bigl(1 - \frac{i-l}{K}\Bigr)^{+}\Bigl(1 - \frac{i-r}{K}\Bigr)^{+}\Biggr)^2
\]
\[
\le (\delta^+)^2(\sigma^+)^4 \sum_{l=1}^{i-1}\ \sum_{r=0}^{(l-1)\wedge(i-1)} \Biggl(\Bigl(1 - \frac{i-l}{K}\Bigr)^{+}\Bigl(1 - \frac{i-r}{K}\Bigr)^{+}\Biggr)^2
= O\bigl((\delta^+ K)^2\bigr) \quad \text{uniformly in } i, \tag{A.29}
\]
where the second-to-last transition does for the inner sum what the first two inequalities do for the outer sum. Now rewrite

\[
\mathrm{(II)}' = 8 \sum_{n-1 \ge l > r \ge 0} \Delta X_{t_l}\,\Delta X_{t_r} \sum_{i=l+1}^{n-1} \Delta t_i\, \sigma_{t_{i-K}}^2 \Bigl(1 - \frac{i-l}{K}\Bigr)^{+}\Bigl(1 - \frac{i-r}{K}\Bigr)^{+}. \tag{A.30}
\]
In the same way as in (A.29), we then obtain
\[
E\bigl(\mathrm{(II)}'\bigr)^2 \le 64 (\delta^+)^2(\sigma^+)^4 \sum_{n-1 \ge l > r \ge 0} E\Biggl(\ \sum_{i=l+1}^{n-1} \Delta t_i\, \sigma_{t_{i-K}}^2 \Bigl(1 - \frac{i-l}{K}\Bigr)^{+}\Bigl(1 - \frac{i-r}{K}\Bigr)^{+}\Biggr)^2
\]
\[
\le 64 (\delta^+)^2(\sigma^+)^4 \times (\delta^+)^2(\sigma^+)^4 \sum_{n-1 \ge l > r \ge 0} \Biggl(\ \sum_{i=l+1}^{n-1} \Bigl(1 - \frac{i-l}{K}\Bigr)^{+}\Bigl(1 - \frac{i-r}{K}\Bigr)^{+}\Biggr)^2
\]
\[
\le 64 (\delta^+)^4(\sigma^+)^8\, n K^3 = O\bigl((\delta^+ K)^3\bigr) = o\bigl((\delta^+ K)^2\bigr), \tag{A.31}
\]
by assumptions (41) and (42). The transition from the second to the last line in (A.31) is due to the fact that $\sum_{i=l+1}^{n-1} \bigl(1 - \frac{i-l}{K}\bigr)^{+}\bigl(1 - \frac{i-r}{K}\bigr)^{+} \le K$, and $= 0$ when $l - r \ge K$. Thus we have shown that $\mathrm{(II)} = o_p(K\Delta t)$, and hence Theorem 2 has been established.

We now proceed to the asymptotic distribution of $D_T$. We first state a technical condition on the filtration $(\mathcal{F}_t)_{0 \le t \le T}$, to which $X_t$ and $\mu_t$ (but not the $\varepsilon$'s) are assumed to be adapted.

Condition E (Description of the filtration). There is a continuous multidimensional $P$-local martingale $\mathcal{X} = (\mathcal{X}^{(1)}, \ldots, \mathcal{X}^{(p)})$, for some $p$, such that $\mathcal{F}_t$ is the smallest sigma-field containing $\sigma(\mathcal{X}_s, s \le t)$ and $\mathcal{N}$, where $\mathcal{N}$ contains all of the null sets in $\sigma(\mathcal{X}_s, s \le T)$.

For example, $\mathcal{X}$ can be a collection of Brownian motions.

Proof of Theorem 3. We can show, by methods similar to those in the proof of Theorem 2, that if $L$ is any martingale adapted to the filtration generated by $\mathcal{X}$, then
\[
\sup_t \Bigl| \frac{1}{\Delta t\, K} \langle D, L\rangle_t \Bigr| \stackrel{p}{\longrightarrow} 0. \tag{A.32}
\]
The stable convergence with respect to the filtration $(\mathcal{F}_t)_{0 \le t \le T}$ then follows in view of Rootzen (1980) or Jacod and Protter (1998). This ends the proof of Theorem 3.

Finally, in the case where $\eta_n^2$ does not converge, one can still use the mixed normal with variance $\eta_n^2$. This is because every subsequence of $\eta_n^2$ has a further subsequence that does converge in probability to some $\eta^2$, and hence for which the assumption (48) in Theorem 3 would be satisfied. The reason for this is that one can define the distribution function of a finite measure by
\[
G_n(t) = \sum_{t_{i+1} \le t} h_i\,\Delta t_i. \tag{A.33}
\]
Because $G_n(t) \le T \sup_i h_i$, it follows from (45) that the sequence $G_n$ is weakly compact in the sense of weak convergence; see Helly's theorem (e.g., Billingsley 1995, p. 336). For any convergent subsequence $G_n \to G$, we then get that
\[
\eta_n^2 = \int_0^T \sigma_t^4\, dG_n(t) \to \int_0^T \sigma_t^4\, dG(t) \tag{A.34}
\]
almost surely, because we have assumed that $\langle X,X\rangle'_t$ is a continuous function of $t$. One then defines $\eta^2$ to be the (subsequence-dependent) right-hand side of (A.34). To proceed further with the asymptotics, continue with the foregoing subsequence and note that
\[
\frac{1}{\Delta t\, K} \langle D,D\rangle_t \approx \int_0^t \sigma_s^4\, dG_n(s) \to \int_0^t \sigma_s^4\, dG(s),
\]
completing the proof.

[Received October 2003. Revised December 2004.]

REFERENCES

Aït-Sahalia, Y., Mykland, P. A., and Zhang, L. (2005), "How Often to Sample a Continuous-Time Process in the Presence of Market Microstructure Noise," Review of Financial Studies, 18, 351–416.

Aldous, D. J., and Eagleson, G. K. (1978), "On Mixing and Stability of Limit Theorems," The Annals of Probability, 6, 325–331.

Andersen, T. G., Bollerslev, T., Diebold, F. X., and Labys, P. (2001), "The Distribution of Exchange Rate Realized Volatility," Journal of the American Statistical Association, 96, 42–55.

Bandi, F. M., and Russell, J. R. (2003), "Microstructure Noise, Realized Volatility, and Optimal Sampling," technical report, University of Chicago, Graduate School of Business.

Barndorff-Nielsen, O. E., and Shephard, N. (2002), "Econometric Analysis of Realized Volatility and Its Use in Estimating Stochastic Volatility Models," Journal of the Royal Statistical Society, Ser. B, 64, 253–280.

Billingsley, P. (1995), Probability and Measure (3rd ed.), New York: Wiley.

Brown, S. J. (1990), "Estimating Volatility," in Financial Options: From Theory to Practice, eds. S. Figlewski, W. Silber, and M. Subrahmanyam, Homewood, IL: Business One-Irwin, pp. 516–537.

Chernov, M., and Ghysels, E. (2000), "A Study Towards a Unified Approach to the Joint Estimation of Objective and Risk Neutral Measures for the Purpose of Options Valuation," Journal of Financial Economics, 57, 407–458.

Dacorogna, M. M., Gençay, R., Müller, U., Olsen, R. B., and Pictet, O. V. (2001), An Introduction to High-Frequency Finance, San Diego: Academic Press.

Gallant, A. R., Hsu, C.-T., and Tauchen, G. T. (1999), "Using Daily Range Data to Calibrate Volatility Diffusions and Extract the Forward Integrated Variance," The Review of Economics and Statistics, 81, 617–631.

Gloter, A. (2000), "Estimation des Paramètres d'une Diffusion Cachée," unpublished doctoral thesis, Université de Marne-la-Vallée.

Hall, P., and Heyde, C. C. (1980), Martingale Limit Theory and Its Application, Boston: Academic Press.

Hansen, P. R., and Lunde, A. (2004a), "Realized Variance and IID Market Microstructure Noise," technical report, Brown University, Dept. of Economics.

——— (2004b), "An Unbiased Measure of Realized Variance," technical report, Brown University, Dept. of Economics.

Heston, S. (1993), "A Closed-Form Solution for Options With Stochastic Volatility, With Applications to Bonds and Currency Options," Review of Financial Studies, 6, 327–343.

Hull, J., and White, A. (1987), "The Pricing of Options on Assets With Stochastic Volatilities," Journal of Finance, 42, 281–300.

Jacod, J., and Protter, P. (1998), "Asymptotic Error Distributions for the Euler Method for Stochastic Differential Equations," The Annals of Probability, 26, 267–307.

Jacod, J., and Shiryaev, A. N. (2003), Limit Theorems for Stochastic Processes (2nd ed.), New York: Springer-Verlag.

Karatzas, I., and Shreve, S. E. (1991), Brownian Motion and Stochastic Calculus, New York: Springer-Verlag.

Mykland, P. A., and Zhang, L. (2002), "ANOVA for Diffusions," technical report, University of Chicago, Dept. of Statistics.

O'Hara, M. (1995), Market Microstructure Theory, Cambridge, MA: Blackwell.

Oomen, R. (2004), "Properties of Realized Variance for a Pure Jump Process: Calendar Time Sampling versus Business Time Sampling," technical report, University of Warwick, Warwick Business School.

Rényi, A. (1963), "On Stable Sequences of Events," Sankhyā, Ser. A, 25, 293–302.

Rootzen, H. (1980), "Limit Distributions for the Error in Approximations of Stochastic Integrals," The Annals of Probability, 8, 241–251.

Zhou, B. (1996), "High-Frequency Data and Volatility in Foreign-Exchange Rates," Journal of Business & Economic Statistics, 14, 45–52.

Zumbach, G., Corsi, F., and Trapletti, A. (2002), "Efficient Estimation of Volatility Using High-Frequency Data," technical report, Olsen & Associates.



1.2.1 The Fifth-Best Approach: Completely Ignoring the Noise. The naïve estimator of quadratic variation is $[Y,Y]^{(\mathrm{all})}_T$, which is the realized volatility based on the entire sample. We show in Section 2 that (5) holds. We therefore conclude that realized volatility $[Y,Y]^{(\mathrm{all})}_T$ is not a reliable estimator for the true variation $\langle X,X\rangle_T$ of the returns. For large $n$, the realized volatility diverges to infinity linearly in $n$. Scaled by $(2n)^{-1}$, it estimates consistently the variance of the microstructure noise, $E\varepsilon^2$, rather than the object of interest, $\langle X,X\rangle_T$. Said differently, market microstructure noise totally swamps the variance of the price signal at the level of the realized volatility. Our simulations in Section 7 also document this effect.

1.2.2 The Fourth-Best Approach: Sampling Sparsely at Some Lower Frequency. Of course, completely ignoring the noise and sampling as prescribed by $[Y,Y]^{(\mathrm{all})}_T$ is not what empirical researchers do in practice. Much, if not all, of the existing literature seems to have settled on sampling sparsely, that is, constructing lower-frequency returns from the available data. For example, using essentially the same exchange rate series, these somewhat ad hoc choices range from 5-minute intervals to as long as 30 minutes. That is, the literature typically uses the estimator $[Y,Y]^{(\mathrm{sparse})}_T$ described in Section 2.3. This involves taking a subsample of $n_{\mathrm{sparse}}$ observations. For example, with $T = 1$ day, or 6.5 hours of open trading for stocks, and data sampled on average every $\Delta t = 1$ second, the full dataset has $n = T/\Delta t = 23{,}400$ observations; but once we sample sparsely every 5 minutes, we sample every 300th observation, and $n_{\mathrm{sparse}} = 78$.
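The counts in this example can be reproduced with a short sketch (ours, not the authors' code; the function names are hypothetical). Subsampling every 300th observation of a 23,400-return day leaves $n_{\mathrm{sparse}} = 78$ sparse returns:

```python
import numpy as np

def realized_volatility(log_prices):
    """[Y,Y]: sum of squared log-returns over the given price series."""
    return np.sum(np.diff(log_prices) ** 2)

def sparse_grid(log_prices, step):
    """Keep every `step`-th observation (a sparse subgrid of the full grid)."""
    return log_prices[::step]

# 6.5 hours of trading sampled once per second, subsampled every 5 minutes.
n = 23_400            # T / (Delta t) = 23,400 one-second returns
step = 300            # 5 minutes = 300 seconds
n_sparse = n // step  # 78 five-minute returns
```

For a vector `y` of 23,401 log-prices, `realized_volatility(sparse_grid(y, 300))` would then be the sparse estimator $[Y,Y]^{(\mathrm{sparse})}_T$.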

The distribution of $[Y,Y]^{(\mathrm{sparse})}_T$ is described by Lemma 1 and Proposition 1, both in Section 2.3. To first approximation,
\[
[Y,Y]^{(\mathrm{sparse})}_T \stackrel{\mathcal{L}}{\approx} \langle X,X\rangle_T + \underbrace{2 n_{\mathrm{sparse}} E\varepsilon^2}_{\text{bias due to noise}}
+ \Bigl[\underbrace{4 n_{\mathrm{sparse}} E\varepsilon^4}_{\text{due to noise}} + \underbrace{\frac{2T}{n_{\mathrm{sparse}}} \int_0^T \sigma_t^4\, dt}_{\text{due to discretization}}\Bigr]^{1/2}_{\text{total variance}} Z_{\mathrm{total}}. \tag{6}
\]
Here $Z_{\mathrm{total}}$ is a standard normal random variable. The symbol "$\stackrel{\mathcal{L}}{\approx}$" means that, when multiplied by a suitable factor, the convergence is in law. For precise statements, consult Sections 2–4.

1.2.3 The Third-Best Approach: Sampling Sparsely at an Optimally Determined Frequency. Our results in Section 2.3 show how, if one insists on sampling sparsely, it is possible to determine an optimal sampling frequency, instead of selecting the frequency in a somewhat ad hoc manner as in the fourth-best approach. In effect, this is similar to the fourth-best approach, except that the arbitrary $n_{\mathrm{sparse}}$ is replaced by an optimally determined $n^*_{\mathrm{sparse}}$. Section 2.3 shows how to minimize the MSE over $n_{\mathrm{sparse}}$ and calculate it for equidistant observations,
\[
n^*_{\mathrm{sparse}} = \Bigl(\frac{T}{4(E\varepsilon^2)^2} \int_0^T \sigma_t^4\, dt\Bigr)^{1/3}.
\]
The more general version is given by (31).

We note that the third- and fourth-best estimators, which rely on comparatively small sample sizes, could benefit from higher-order adjustments. A suitable Edgeworth adjustment is currently under investigation.

1.2.4 The Second-Best Approach: Subsampling and Averaging. As discussed earlier, even if one determines the optimal sampling frequency as we have just described, it is still the case that the data are not used to their full extent. We therefore propose in Section 3 the estimator $[Y,Y]^{(\mathrm{avg})}_T$, constructed by averaging the estimators $[Y,Y]^{(k)}_T$ across $K$ grids of average size $\bar{n}$. We show in Section 3.5 that
\[
[Y,Y]^{(\mathrm{avg})}_T \stackrel{\mathcal{L}}{\approx} \langle X,X\rangle_T + \underbrace{2\bar{n} E\varepsilon^2}_{\text{bias due to noise}}
+ \Bigl[\underbrace{\frac{4\bar{n}}{K} E\varepsilon^4}_{\text{due to noise}} + \underbrace{\frac{4T}{3\bar{n}} \int_0^T \sigma_t^4\, dt}_{\text{due to discretization}}\Bigr]^{1/2}_{\text{total variance}} Z_{\mathrm{total}}, \tag{7}
\]
where $Z_{\mathrm{total}}$ is a standard normal term. The above expression is for the equidistantly sampled case; the general expression is given by (51)–(52) in Section 3.5.
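A minimal sketch of the subsample-and-average construction (our illustration; `rv_avg` is a hypothetical name, and subgrid $k$ is taken as every $K$-th observation starting at $k$, per the regular allocation used in the text):

```python
import numpy as np

def rv(log_prices):
    """Realized volatility: sum of squared increments on a grid."""
    return np.sum(np.diff(log_prices) ** 2)

def rv_avg(log_prices, K):
    """[Y,Y]^(avg): average of the K subgrid estimators [Y,Y]^(k),
    where subgrid k contains observations k, k+K, k+2K, ..."""
    return np.mean([rv(log_prices[k::K]) for k in range(K)])
```

With `K=1` this reduces to the full-grid realized volatility $[Y,Y]^{(\mathrm{all})}_T$.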

As can be seen from the foregoing equation, $[Y,Y]^{(\mathrm{avg})}_T$ remains a biased estimator of the quadratic variation $\langle X,X\rangle_T$ of the true return process. But the bias $2\bar{n}E\varepsilon^2$ now increases with the average size of the subsamples, and $\bar{n} \le n$. Thus, as far as the bias is concerned, $[Y,Y]^{(\mathrm{avg})}_T$ is a better estimator than $[Y,Y]^{(\mathrm{all})}_T$. The optimal trade-off between the bias and variance for the estimator $[Y,Y]^{(\mathrm{avg})}_T$ is described in Section 3.6; we set $K^* \approx n/\bar{n}^*$, with $\bar{n}^*$ determined in (53), namely (in the equidistantly sampled case)
\[
\bar{n}^* = \Bigl(\frac{T}{6(E\varepsilon^2)^2} \int_0^T \sigma_t^4\, dt\Bigr)^{1/3}. \tag{8}
\]

1.2.5 The First-Best Approach: Subsampling and Averaging, and Bias-Correction. Our final estimator is based on bias-correcting the estimator $[Y,Y]^{(\mathrm{avg})}_T$. This is the subject of Section 4. We show that a bias-adjusted estimator for $\langle X,X\rangle_T$ can be constructed as
\[
\widehat{\langle X,X\rangle}_T = [Y,Y]^{(\mathrm{avg})}_T - \frac{\bar{n}}{n}[Y,Y]^{(\mathrm{all})}_T,
\]
that is, by combining estimators obtained over the two time scales, "all" and "avg." A small-sample adjustment,
\[
\widehat{\langle X,X\rangle}^{(\mathrm{adj})}_T = \Bigl(1 - \frac{\bar{n}}{n}\Bigr)^{-1} \widehat{\langle X,X\rangle}_T,
\]
is given in (64) in Section 4.2; it shares the same asymptotic distribution as $\widehat{\langle X,X\rangle}_T$ to the order considered.
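The two-scales combination can be sketched as follows (our illustration, not the authors' code; taking $\bar{n} = (n - K + 1)/K$ as the average subsample size under regular allocation is an assumption of this sketch):

```python
import numpy as np

def rv(log_prices):
    """Realized volatility: sum of squared increments."""
    return np.sum(np.diff(log_prices) ** 2)

def two_scales(log_prices, K, adjust=True):
    """Bias-corrected estimator of <X,X>_T:
    [Y,Y]^(avg) - (n_bar/n) * [Y,Y]^(all),
    optionally with the small-sample adjustment (1 - n_bar/n)^(-1)."""
    n = len(log_prices) - 1
    rv_all = rv(log_prices)
    rv_avg = np.mean([rv(log_prices[k::K]) for k in range(K)])
    n_bar = (n - K + 1) / K          # assumed average subsample size
    est = rv_avg - (n_bar / n) * rv_all
    if adjust:
        est /= 1.0 - n_bar / n       # small-sample adjustment
    return est

# With pure noise (X constant), the estimate should be near zero even
# though the raw realized volatility is dominated by 2n*E[eps^2].
rng = np.random.default_rng(0)
y = rng.normal(0.0, 0.01, size=23_401)   # Y = eps, so <X,X>_T = 0
estimate = two_scales(y, K=300)
```

On this pure-noise path the uncorrected $[Y,Y]^{(\mathrm{all})}_T$ is about $2nE\varepsilon^2 \approx 4.7$, while `estimate` is close to 0, which is the point of the bias correction.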

We show in Theorem 4 in Section 4.1 that if the number of subgrids is selected as $K = cn^{2/3}$, then
\[
\widehat{\langle X,X\rangle}_T \stackrel{\mathcal{L}}{\approx} \langle X,X\rangle_T
+ \frac{1}{n^{1/6}} \Bigl[\underbrace{\frac{8}{c^2}(E\varepsilon^2)^2}_{\text{due to noise}} + \underbrace{c\,\frac{4T}{3} \int_0^T \sigma_t^4\, dt}_{\text{due to discretization}}\Bigr]^{1/2}_{\text{total variance}} Z_{\mathrm{total}}. \tag{9}
\]


Unlike all of the previously considered estimators, this estimator is now correctly centered at $\langle X,X\rangle_T$. It converges only at rate $n^{-1/6}$, but from an MSE perspective, this is better than being (badly) biased. In particular, the prescription is to use all the available data, so there is no limit to how often one should sample: every few seconds or more often would be typical in tick-by-tick financial data, so $n$ can be quite large. Based on Section 4.2, we use an estimate of the optimal value
\[
c^* = \Bigl(\frac{T}{12(E\varepsilon^2)^2} \int_0^T \sigma_t^4\, dt\Bigr)^{-1/3}. \tag{10}
\]

2. ANALYSIS OF REALIZED VOLATILITY UNDER MARKET MICROSTRUCTURE NOISE

2.1 Setup

To spell out the foregoing model, we let $Y$ be the logarithm of the transaction price, which is observed at times $0 = t_0, t_1, \ldots, t_n = T$. We assume that at these times, $Y$ is related to a latent true price $X$ (also in logarithmic scale) through (4). The latent price $X$ is given in (1). The noise $\varepsilon_{t_i}$ satisfies the assumption
\[
\varepsilon_{t_i}\ \text{iid, with}\ E\varepsilon_{t_i} = 0\ \text{and}\ \operatorname{var}(\varepsilon_{t_i}) = E\varepsilon^2;\quad \text{also}\ \varepsilon \perp\!\!\!\perp X\ \text{process}, \tag{11}
\]
where $\perp\!\!\!\perp$ denotes independence between two random quantities. Our modeling, as in (4), does not require that $\varepsilon_t$ exist for every $t$; in other words, our interest in the noise is only at the observation times $t_i$.

For the moment, we focus on determining the integrated volatility of $X$ for one time period $[0, T]$. This is also known as the continuous quadratic variation $\langle X,X\rangle$ of $X$. In other words,
\[
\langle X,X\rangle_T = \int_0^T \sigma_t^2\, dt. \tag{12}
\]
To succinctly describe the realized volatility, we use the notions of grid and observed quadratic variation, as follows.

Definition 1. The full grid, containing all of the observation points, is given by
\[
\mathcal{G} = \{t_0, \ldots, t_n\}. \tag{13}
\]

Definition 2. We also consider arbitrary grids $\mathcal{H} \subseteq \mathcal{G}$. To denote successive elements in such grids, we proceed as follows. If $t_i \in \mathcal{H}$, then $t_{i-}$ and $t_{i+}$ denote the preceding and following elements in $\mathcal{H}$; $t_i$ will always denote the $i$th point in the full grid $\mathcal{G}$. Hence, when $\mathcal{H} = \mathcal{G}$, $t_{i-} = t_{i-1}$ and $t_{i+} = t_{i+1}$. When $\mathcal{H}$ is a strict subgrid of $\mathcal{G}$, it will normally be the case that $t_{i-} < t_{i-1}$ and $t_{i+} > t_{i+1}$. Finally, we take
\[
|\mathcal{H}| = (\#\ \text{points in grid}\ \mathcal{H}) - 1. \tag{14}
\]
This is to say that $|\mathcal{H}|$ is the number of time increments $(t_i, t_{i+}]$ such that both endpoints are contained in $\mathcal{H}$. In particular, $|\mathcal{G}| = n$.

Definition 3. The observed quadratic variation $[\cdot,\cdot]$ for a generic process $Z$ (such as $Y$ or $X$), on an arbitrary grid $\mathcal{H} \subseteq \mathcal{G}$, is given by
\[
[Z,Z]^{\mathcal{H}}_t = \sum_{t_j, t_{j+} \in \mathcal{H},\ t_{j+} \le t} \bigl(Z_{t_{j+}} - Z_{t_j}\bigr)^2. \tag{15}
\]
When the choice of grid follows from the context, $[Z,Z]^{\mathcal{H}}_t$ may be denoted as just $[Z,Z]_t$. On the full grid $\mathcal{G}$, the quadratic variation is given by
\[
[Z,Z]^{(\mathrm{all})}_t = [Z,Z]^{\mathcal{G}}_t = \sum_{t_i, t_{i+1} \in \mathcal{G},\ t_{i+1} \le t} \bigl(\Delta Z_{t_i}\bigr)^2, \tag{16}
\]
where $\Delta Z_{t_i} = Z_{t_{i+1}} - Z_{t_i}$. Quadratic covariations are defined similarly. (See, e.g., Karatzas and Shreve 1991; Jacod and Shiryaev 2003 for more details on quadratic variations.)
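As an illustrative sketch of Definition 3 (ours; the function name is hypothetical), the observed quadratic variation on an arbitrary subgrid $\mathcal{H}$, specified by the indices of its points in the full grid, is:

```python
import numpy as np

def qv_on_grid(path, grid_indices):
    """[Z,Z]^H: sum of squared increments between successive points
    of the subgrid H; there are |H| = len(grid_indices) - 1 increments."""
    pts = np.asarray(path)[np.asarray(grid_indices)]
    return np.sum(np.diff(pts) ** 2)
```

On the full grid ($\mathcal{H} = \mathcal{G}$), this is $[Z,Z]^{(\mathrm{all})}$.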

Our first objective is to assess how well the realized volatility $[Y,Y]^{(\mathrm{all})}_T$ approximates the integrated volatility $\langle X,X\rangle_T$ of the true, latent process. In our asymptotic considerations, we always assume that the number of observations in $[0,T]$ goes to infinity, and also that the maximum distance in time between two consecutive observations goes to 0:
\[
\max_i \Delta t_i \to 0 \quad \text{as}\ n \to \infty. \tag{17}
\]
For the sake of preciseness, it should be noted that as $n \to \infty$, we are dealing with a sequence $\mathcal{G}_n$ of grids, $\mathcal{G}_n = \{t_{0,n}, \ldots, t_{n,n}\}$, and similarly for subgrids. We have avoided using double subscripts so as to not complicate the notation, but this is how all our conditions and results should be interpreted.

Note, finally, that we have adhered to the time series convention that $\Delta Z_{t_i} = Z_{t_{i+1}} - Z_{t_i}$. This is in contrast to the stochastic calculus convention $\Delta Z_{t_i} = Z_{t_i} - Z_{t_{i-1}}$.

2.2 The Realized Volatility: An Estimator of the Variance of the Noise

Under the additive model $Y_{t_i} = X_{t_i} + \varepsilon_{t_i}$, the realized volatility based on the observed returns $Y_{t_i}$ now has the form
\[
[Y,Y]^{(\mathrm{all})}_T = [X,X]^{(\mathrm{all})}_T + 2[X,\varepsilon]^{(\mathrm{all})}_T + [\varepsilon,\varepsilon]^{(\mathrm{all})}_T.
\]
This gives the conditional mean and variance of $[Y,Y]^{(\mathrm{all})}_T$:
\[
E\bigl([Y,Y]^{(\mathrm{all})}_T \bigm| X\ \text{process}\bigr) = [X,X]^{(\mathrm{all})}_T + 2nE\varepsilon^2, \tag{18}
\]
under assumption (11). Similarly,
\[
\operatorname{var}\bigl([Y,Y]^{(\mathrm{all})}_T \bigm| X\ \text{process}\bigr) = 4nE\varepsilon^4 + O_p(1), \tag{19}
\]
subject to condition (11) and $E\varepsilon_{t_i}^4 = E\varepsilon^4 < \infty$ for all $i$. Subject to slightly stronger conditions,
\[
\operatorname{var}\bigl([Y,Y]^{(\mathrm{all})}_T \bigm| X\ \text{process}\bigr)
= 4nE\varepsilon^4 + \bigl(8[X,X]^{(\mathrm{all})}_T E\varepsilon^2 - 2\operatorname{var}(\varepsilon^2)\bigr) + O_p(n^{-1/2}). \tag{20}
\]
It is also the case that, as $n \to \infty$, conditionally on the $X$ process, we have asymptotic normality:
\[
n^{-1/2}\bigl([Y,Y]^{(\mathrm{all})}_T - 2nE\varepsilon^2\bigr) \stackrel{\mathcal{L}}{\longrightarrow} 2\bigl(E\varepsilon^4\bigr)^{1/2} Z_{\mathrm{noise}}. \tag{21}
\]
Here $Z_{\mathrm{noise}}$ is standard normal, with the subscript "noise" indicating that the randomness comes from the noise $\varepsilon$, that is, the deviation of the observables $Y$ from the true process $X$.

The derivations of (19)–(20), along with the conditions for the latter, are provided in Section A.1. The result (21) is derived in Section A.2, where it is part of Theorem A.1.


Equations (18) and (19) suggest that in the discrete world, where microstructure effects are unfortunately present, realized volatility $[Y,Y]^{(\mathrm{all})}_T$ is not a reliable estimator for the true variation $[X,X]^{(\mathrm{all})}_T$ of the returns. For large $n$, realized volatility could have little to do with the true returns; instead, it relates to the noise term, $E\varepsilon^2$, in the first order, and to $E\varepsilon^4$ in the second order. Also, one can see from (18) that $[Y,Y]^{(\mathrm{all})}_T$ has a positive bias whose magnitude increases linearly with the sample size.

Interestingly, apart from revealing the biased nature of $[Y,Y]^{(\mathrm{all})}_T$ at high frequency, our analysis also delivers a consistent estimator for the variance of the noise term. In other words, let
\[
\widehat{E\varepsilon^2} = \frac{1}{2n}[Y,Y]^{(\mathrm{all})}_T.
\]
We have, for a fixed true return process $X$,
\[
n^{1/2}\bigl(\widehat{E\varepsilon^2} - E\varepsilon^2\bigr) \to N\bigl(0, E\varepsilon^4\bigr) \quad \text{as}\ n \to \infty; \tag{22}
\]
see Theorem A.1 in the Appendix.

By the same methods as in Section A.2, a consistent estimator of the asymptotic variance of $\widehat{E\varepsilon^2}$ is then given by
\[
\widehat{E\varepsilon^4} = \frac{1}{2n}\sum_i \bigl(\Delta Y_{t_i}\bigr)^4 - 3\bigl(\widehat{E\varepsilon^2}\bigr)^2. \tag{23}
\]
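A sketch of these two moment estimators (ours, not the authors' code), checked on pure-noise data where $Y = \varepsilon$ and the true moments are known:

```python
import numpy as np

def noise_moments(log_prices):
    """Estimators (22)-(23): hat{E eps^2} = [Y,Y]^(all) / (2n) and
    hat{E eps^4} = (1/(2n)) * sum((dY)^4) - 3 * hat{E eps^2}^2."""
    dY = np.diff(log_prices)
    n = len(dY)
    e2_hat = np.sum(dY ** 2) / (2 * n)
    e4_hat = np.sum(dY ** 4) / (2 * n) - 3 * e2_hat ** 2
    return e2_hat, e4_hat

# Pure noise: eps ~ N(0, 0.1^2), so E[eps^2] = 0.01 and E[eps^4] = 3e-4.
rng = np.random.default_rng(1)
y = rng.normal(0.0, 0.1, size=100_001)
e2_hat, e4_hat = noise_moments(y)
```

With $n = 10^5$ the sampling error of `e2_hat` is of order $(E\varepsilon^4/n)^{1/2} \approx 5{\times}10^{-5}$, so both estimates land close to the true moments.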

A question that naturally arises is why one is interested in the quadratic variation of $X$ (either $\langle X,X\rangle$ or $[X,X]$), as opposed to the quadratic variation of $Y$ ($[Y,Y]$), because "$\langle Y,Y\rangle$" would have to be taken to be infinite in view of the foregoing. For example, in options pricing or hedging, one could take the opposite view that $[Y,Y]$ is the volatility that one actually faces.

A main reason why we focus on the quadratic variation of $X$ is that the variation caused by the $\varepsilon$'s is tied to each transaction, as opposed to the price process of the underlying security. From the standpoint of trading, the $\varepsilon$'s represent trading costs, which are different from the costs created by the volatility of the underlying process. Different market participants may even face different types of trading costs, depending on institutional arrangements. In any case, for the purpose of integrating trading costs into options prices, it seems more natural to approach the matter via the extensive literature on market microstructure (see O'Hara 1995 for a survey).

A related matter is that continuous-time finance would be difficult to implement if one were to use the quadratic variation of $Y$: one would then be working with a quantity that depends on the data frequency.

Finally, apart from any specific application, it is interesting to be able to say something about the underlying log-return process, and to be able to separate this from the effects introduced by the mechanics of the trading process, where the idiosyncrasies of the trading arrangement play a role.

2.3 The Optimal Sampling Frequency

We have just argued that the realized volatility estimates the wrong quantity. This problem only gets worse when observations are sampled more frequently. Its financial interpretation boils down to market microstructure, summarized by $\varepsilon$ in (4). As the data record is sampled more finely, the change in true returns gets smaller, while the microstructure noise, such as the bid–ask spread and transaction cost, remains at the same magnitude. In other words, when the sampling frequency is extremely high, the observed fluctuation in the returns process is more heavily contaminated by microstructure noise and becomes less representative of the true variation $\langle X,X\rangle_T$ of the returns. Along this line of discussion, the broad opinion in financial applications is not to sample too often, at least when using realized volatility. We now discuss how this can be viewed in the context of the model (4) with stochastic volatility.

Formally, sparse sampling is implemented by sampling on a subgrid $\mathcal{H}$ of $\mathcal{G}$, with, as before,
\[
[Y,Y]^{(\mathrm{sparse})} = [Y,Y]^{(\mathcal{H})} = \sum_{t_i, t_{i+} \in \mathcal{H}} \bigl(Y_{t_{i+}} - Y_{t_i}\bigr)^2.
\]
For the moment, we take the subgrid as given, but later we consider how to optimize the sampling. We call $n_{\mathrm{sparse}} = |\mathcal{H}|$.

To give a rationale for sparse sampling, we propose an asymptotic where the law of $\varepsilon$, although still iid, is allowed to vary with $n$ and $n_{\mathrm{sparse}}$. Formally, we suppose that the distribution of $\varepsilon$, $\mathcal{L}(\varepsilon)$, is an element of the set $\mathcal{D}$ of all distributions for which $E(\varepsilon) = 0$ and where $E(\varepsilon^2)$ and $E(\varepsilon^4)/E(\varepsilon^2)^2$ are bounded by arbitrary constants. The asymptotic normality in Section 2.2 then takes the following, more nuanced form.

Lemma 1. Suppose that $X$ is an Itô process of form (1), where $|\mu_t|$ and $\sigma_t$ are bounded above by a constant. Suppose that for given $n$, the grid $\mathcal{H}_n$ is given, with $n_{\mathrm{sparse}} \to \infty$ as $n \to \infty$, and that for each $n$, $Y$ is related to $X$ through model (4). Assume (11), where $\mathcal{L}(\varepsilon) \in \mathcal{D}$. Like the grid $\mathcal{H}_n$, the law $\mathcal{L}(\varepsilon)$ can depend on $n$ (whereas the process $X$ is fixed). Let (17) be satisfied for the sequence of grids $\mathcal{H}_n$. Then
\[
[Y,Y]^{\mathcal{H}}_T = [X,X]^{\mathcal{H}}_T + 2n_{\mathrm{sparse}} E\varepsilon^2
+ \Bigl(4n_{\mathrm{sparse}} E\varepsilon^4 + \bigl(8[X,X]^{\mathcal{H}}_T E\varepsilon^2 - 2\operatorname{var}(\varepsilon^2)\bigr)\Bigr)^{1/2} Z_{\mathrm{noise}}
+ O_p\bigl(n_{\mathrm{sparse}}^{-1/4}(E\varepsilon^2)^{1/2}\bigr), \tag{24}
\]
where $Z_{\mathrm{noise}}$ is a quantity that is asymptotically standard normal.

The lemma is proved at the end of Section A.2. Note now that the relative order of the terms in (24) depends on the quantities $n_{\mathrm{sparse}}$ and $E\varepsilon^2$. Therefore, if $E\varepsilon^2$ is small relative to $n_{\mathrm{sparse}}$, then $[Y,Y]^{\mathcal{H}}_T$ is, after all, not an entirely absurd substitute for $[X,X]^{\mathcal{H}}_T$.

One would be tempted to conclude that the optimal choice of $n_{\mathrm{sparse}}$ is to make it as small as possible. But that would overlook the fact that the bigger $n_{\mathrm{sparse}}$ is, the closer $[X,X]^{\mathcal{H}}_T$ is to the target integrated volatility $\langle X,X\rangle_T$ from (12).

To quantify the overall error, we now combine Lemma 1 with the results on discretization error, to study the total error $[Y,Y]^{\mathcal{H}}_T - \langle X,X\rangle_T$. Following Rootzen (1980), Jacod and Protter (1998), Barndorff-Nielsen and Shephard (2002), and Mykland and Zhang (2002), and under the conditions that these authors stated, we can show that
\[
\Bigl(\frac{n_{\mathrm{sparse}}}{T}\Bigr)^{1/2}\bigl([X,X]^{\mathcal{H}}_T - \langle X,X\rangle_T\bigr)
\stackrel{\mathcal{L}}{\longrightarrow} \Bigl(\int_0^T 2H'(t)\,\sigma_t^4\, dt\Bigr)^{1/2} \times Z_{\mathrm{discrete}}, \tag{25}
\]
stably in law. (We discuss this concept at the end of Sec. 3.4.) $Z_{\mathrm{discrete}}$ is a standard normal random variable, with the subscript "discrete" indicating that the randomness is due to the discretization effect in $[X,X]^{\mathcal{H}}_T$ when evaluating $\langle X,X\rangle_T$. $H(t)$ is the asymptotic quadratic variation of time, as discussed by Mykland and Zhang (2002),
\[
H(t) = \lim_{n \to \infty} \frac{n_{\mathrm{sparse}}}{T} \sum_{t_i, t_{i+} \in \mathcal{H},\ t_{i+} \le t} (t_{i+} - t_i)^2.
\]
In the case of equidistant observations, $\Delta t_0 = \cdots = \Delta t_{n-1} = \Delta t = T/n_{\mathrm{sparse}}$ and $H'(t) = 1$. Because the $\varepsilon$'s are independent of the $X$ process, $Z_{\mathrm{noise}}$ is independent of $Z_{\mathrm{discrete}}$.

For small $E\varepsilon^2$, one now has an opportunity to estimate $\langle X,X\rangle_T$. It follows from our Lemma 1 and Proposition 1 of Mykland and Zhang (2002) that:

Proposition 1. Assume the conditions of Lemma 1, and also that $\max_{t_i, t_{i+} \in \mathcal{H}} (t_{i+} - t_i) = O(1/n_{\mathrm{sparse}})$. Also assume Condition E in Section A.3. Then $H$ is well defined, and
\[
[Y,Y]^{\mathcal{H}}_T = \langle X,X\rangle_T + 2E\varepsilon^2 n_{\mathrm{sparse}} + \Upsilon Z_{\mathrm{total}}
+ O_p\bigl(n_{\mathrm{sparse}}^{-1/4}(E\varepsilon^2)^{1/2}\bigr) + o_p\bigl(n_{\mathrm{sparse}}^{-1/2}\bigr), \tag{26}
\]
in the sense of stable convergence, where $Z_{\mathrm{total}}$ is asymptotically standard normal, and where the variance has the form
\[
\Upsilon^2 = \underbrace{4n_{\mathrm{sparse}} E\varepsilon^4 + \bigl(8[X,X]^{\mathcal{H}}_T E\varepsilon^2 - 2\operatorname{var}(\varepsilon^2)\bigr)}_{\text{due to noise}}
+ \underbrace{\frac{T}{n_{\mathrm{sparse}}} \int_0^T 2H'(t)\,\sigma_t^4\, dt}_{\text{due to discretization}}. \tag{27}
\]

Seen from this angle, there is scope for using the realized volatility $[Y,Y]^{\mathcal{H}}$ to estimate $\langle X,X\rangle$. There is a bias, $2E\varepsilon^2 n_{\mathrm{sparse}}$, but the bias goes down if one uses fewer observations. This, then, is consistent with the practice in empirical finance, where the estimator used is not $[Y,Y]^{(\mathrm{all})}$, but instead $[Y,Y]^{(\mathrm{sparse})}$, obtained by sampling sparsely. For instance, faced with data sampled every few seconds, empirical researchers would typically use squared returns (i.e., squared differences of log-prices $Y$) over, say, 5-, 15-, or 30-minute time intervals. The intuitive rationale for using this estimator is to attempt to control the bias of the estimator; this can be assessed based on our formula (26), replacing the original sample size $n$ by the lower number reflecting the sparse sampling. But one should avoid sampling too sparsely, because formula (27) shows that decreasing $n_{\mathrm{sparse}}$ has the effect of increasing the variance of the estimator via the discretization effect, which is proportional to $n_{\mathrm{sparse}}^{-1}$. Based on our formulas, this trade-off between sampling too often and sampling too rarely can be formalized, and an optimal frequency at which to sample sparsely can be determined.

It is natural to minimize the MSE of $[Y,Y]^{(\mathrm{sparse})}$:
\[
\mathrm{MSE} = \bigl(2n_{\mathrm{sparse}} E\varepsilon^2\bigr)^2
+ \Bigl\{ 4n_{\mathrm{sparse}} E\varepsilon^4 + \bigl(8[X,X]^{\mathcal{H}}_T E\varepsilon^2 - 2\operatorname{var}(\varepsilon^2)\bigr)
+ \frac{T}{n_{\mathrm{sparse}}} \int_0^T 2H'(t)\,\sigma_t^4\, dt \Bigr\}. \tag{28}
\]
One has to imagine that the original sample size $n$ is quite large, so that $n_{\mathrm{sparse}} < n$. In this case, minimizing the MSE (28) means that one should choose $n_{\mathrm{sparse}}$ to satisfy $\partial\,\mathrm{MSE}/\partial n_{\mathrm{sparse}} \approx 0$; in other words,
\[
8n_{\mathrm{sparse}} (E\varepsilon^2)^2 + 4E\varepsilon^4 - \frac{T}{n_{\mathrm{sparse}}^2} \int_0^T 2H'(t)\,\sigma_t^4\, dt \approx 0, \tag{29}
\]
or
\[
n_{\mathrm{sparse}}^3 + \frac{1}{2}\, n_{\mathrm{sparse}}^2\, \frac{E\varepsilon^4}{(E\varepsilon^2)^2} - (E\varepsilon^2)^{-2}\, \frac{T}{8} \int_0^T 2H'(t)\,\sigma_t^4\, dt \approx 0. \tag{30}
\]
Finally, still under the conditions of Proposition 1, the optimum $n^*_{\mathrm{sparse}}$ becomes
\[
n^*_{\mathrm{sparse}} = (E\varepsilon^2)^{-2/3} \Bigl(\frac{T}{8} \int_0^T 2H'(t)\,\sigma_t^4\, dt\Bigr)^{1/3} \bigl(1 + o_p(1)\bigr) \quad \text{as}\ E\varepsilon^2 \to 0. \tag{31}
\]
Of course, if the actual sample size $n$ were smaller than $n^*_{\mathrm{sparse}}$, then one would simply take $n^*_{\mathrm{sparse}} = n$, but this is unlikely to occur for a heavily traded stock.

Equation (31) is the formal statement saying that one can sample more frequently when the error spread is small. Note from (28) that, to first order, the final trade-off is between the bias $2n_{\mathrm{sparse}} E\varepsilon^2$ and the variance due to discretization; the effect of the variance associated with $Z_{\mathrm{noise}}$ is of lower order when comparing $n_{\mathrm{sparse}}$ and $E\varepsilon^2$. It should be emphasized that (31) gives a feasible way of choosing $n_{\mathrm{sparse}}$: one can estimate $E\varepsilon^2$ using all of the data, following the procedure in Section 2.2, and the integral $\int_0^T 2H'(t)\,\sigma_t^4\, dt$ can be estimated by the methods discussed in Section 6.

Hence if one decides to address the problem by selectinga lower frequency of observation then one can do so by sub-sampling the full grid G at an arbitrary sparse frequency anduse [YY](sparse) Alternatively one can use nsparse as optimallydetermined by (31) We denote the corresponding estimator as[YY](sparseopt) These are our fourth- and third-best estimators
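As a numerical illustration of the closed form (31), the short sketch below computes n*_sparse for the constant-volatility, equidistant case, where H'(t) = 1 and the integral reduces to 2σ⁴T. The code and the input values (σ², T, Eε²) are ours and purely illustrative, not part of the original.

```python
import numpy as np

def optimal_sparse_n(T, e2, int_2Hprime_sigma4):
    """Leading term of eq. (31): n*_sparse = (E eps^2)^(-2/3) * (T/8 * integral)^(1/3)."""
    return e2 ** (-2.0 / 3.0) * (T * int_2Hprime_sigma4 / 8.0) ** (1.0 / 3.0)

# Constant volatility sigma^2 and equidistant sampling, so H'(t) = 1 and the
# integral in (31) is 2 * sigma^4 * T.  Illustrative values only.
T = 1.0 / 252.0          # one trading day, annualized units
sigma2 = 0.04            # annualized variance
e2 = 0.0005 ** 2         # noise variance E eps^2
integral = 2.0 * sigma2 ** 2 * T
n_star = optimal_sparse_n(T, e2, integral)
```

By construction, n_star satisfies the first-order condition (29) to leading order: 8 n*³ (Eε²)² equals T times the integral.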

3. SAMPLING SPARSELY WHILE USING ALL OF THE DATA: SUBSAMPLING AND AVERAGING OVER MULTIPLE GRIDS

3.1 Multiple Grids and Sufficiency

We have argued in the previous section that one can indeed benefit from using infrequently sampled data. And yet, one of the most basic lessons of statistics is that one should not do this. We present here two ways of tackling the problem. Instead of selecting (arbitrarily or optimally) a subsample, our methods are based on selecting a number of subgrids of the original grid of observation times, G = {t_0, ..., t_n}, and then averaging the estimators derived from the subgrids. The principle is that, to the extent that there is a benefit to subsampling, this benefit can now be retained, whereas the variation of the estimator can be lessened by the averaging. The benefit of averaging is clear from sufficiency considerations, and many statisticians would say that subsampling without subsequent averaging is inferentially incorrect.

In what follows, we first introduce a set of notations, then turn to studying the realized volatility in the multigrid context. In Section 4, we show how to eliminate the bias of the estimator by using two time scales, a combination of the single grid and multiple grids.

Journal of the American Statistical Association, December 2005

32 Notation for the Multiple Grids

We specifically suppose that the full grid G, G = {t_0, ..., t_n} as in (13), is partitioned into K nonoverlapping subgrids G^(k), k = 1, ..., K; in other words,

\[
\mathcal{G} = \bigcup_{k=1}^{K} \mathcal{G}^{(k)}, \qquad \text{where } \mathcal{G}^{(k)} \cap \mathcal{G}^{(l)} = \emptyset \text{ when } k \neq l.
\]

For most purposes, the natural way to select the kth subgrid G^(k) is to start with t_{k-1} and then pick every Kth sample point after that, until T. That is,

\[
\mathcal{G}^{(k)} = \{ t_{k-1},\ t_{k-1+K},\ t_{k-1+2K},\ \ldots,\ t_{k-1+n_k K} \}
\]

for k = 1, ..., K, where n_k is the integer making t_{k-1+n_k K} the last element in G^(k). We refer to this as regular allocation of sample points to subgrids.

Whether the allocation is regular or not, we let n_k = |G^(k)|. As in Definition 2, n = |G|. Recall that the realized volatility based on all observation points G is written as [Y,Y]^(all)_T. Meanwhile, if one uses only the subsampled observations Y_t, t ∈ G^(k), then the realized volatility is denoted by [Y,Y]^(k)_T. It has the form

\[
[Y,Y]^{(k)}_T = \sum_{t_j,\, t_{j+} \in \mathcal{G}^{(k)}} \bigl( Y_{t_{j+}} - Y_{t_j} \bigr)^2,
\]

where, if t_i ∈ G^(k), then t_{i+} denotes the following element in G^(k).

A natural competitor to [Y,Y]^(all)_T, [Y,Y]^(sparse)_T, and [Y,Y]^(sparse,opt)_T is then given by

\[
[Y,Y]^{(\mathrm{avg})}_T = \frac{1}{K} \sum_{k=1}^{K} [Y,Y]^{(k)}_T, \tag{32}
\]

and this is the statistic we analyze in what follows. As before, we fix T and use only the observations within the time period [0, T]. Asymptotics are still under (17) and under

\[
n \to \infty \quad \text{and} \quad n/K \to \infty. \tag{33}
\]

In general, the n_k need not be the same across k. We define

\[
\bar{n} = \frac{1}{K} \sum_{k=1}^{K} n_k = \frac{n - K + 1}{K}. \tag{34}
\]
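In code, regular allocation and the averaged estimator (32) amount to strided slicing of the observed log-price series. The sketch below is a minimal illustration of ours (the toy random path is only for the usage example):

```python
import numpy as np

def subgrid_rv(y, K):
    """Realized volatility [Y,Y]^(k) on each of the K subgrids obtained by
    regular allocation (every K-th observation, offset k), and their
    average, eq. (32).  `y` holds log-prices on the full grid."""
    rv = np.empty(K)
    for k in range(K):
        yk = y[k::K]                      # subgrid G^(k): t_k, t_{k+K}, ...
        rv[k] = np.sum(np.diff(yk) ** 2)  # [Y,Y]^(k)
    return rv, rv.mean()                  # per-subgrid RVs and [Y,Y]^(avg)

# toy usage
rng = np.random.default_rng(0)
y = np.cumsum(rng.normal(0.0, 0.001, 1001))  # toy log-price path
rv_k, rv_avg = subgrid_rv(y, K=5)
```

With K = 1 the function reduces to the usual full-grid realized volatility [Y,Y]^(all).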

3.3 Error Due to the Noise: [Y,Y]^(avg)_T − [X,X]^(avg)_T

Recall that we are interested in determining the integrated volatility, ⟨X,X⟩_T, or quadratic variation, of the true but unobservable returns. As an intermediate step, here we study how well the "pooled" realized volatility [Y,Y]^(avg)_T approximates [X,X]^(avg)_T, where the latter is the "pooled" true integrated volatility when X is considered only on the discrete time scale.

From (18) and (32),

\[
E\bigl( [Y,Y]^{(\mathrm{avg})}_T \,\big|\, X \text{ process} \bigr) = [X,X]^{(\mathrm{avg})}_T + 2 \bar{n}\, E\varepsilon^2. \tag{35}
\]

Also, because the ε_t, t ∈ G^(k), are independent for different k,

\[
\mathrm{var}\bigl( [Y,Y]^{(\mathrm{avg})}_T \,\big|\, X \text{ process} \bigr)
= \frac{1}{K^2} \sum_{k=1}^{K} \mathrm{var}\bigl( [Y,Y]^{(k)}_T \,\big|\, X \text{ process} \bigr)
= 4 \frac{\bar{n}}{K} E\varepsilon^4 + O_p\!\left( \frac{1}{K} \right), \tag{36}
\]

in the same way as in (19). Incorporating the next-order term in the variance yields

\[
\mathrm{var}\bigl( [Y,Y]^{(\mathrm{avg})}_T \,\big|\, X \bigr)
= 4 \frac{\bar{n}}{K} E\varepsilon^4
+ \frac{1}{K} \bigl[ 8 [X,X]^{(\mathrm{avg})}_T E\varepsilon^2 - 2\,\mathrm{var}(\varepsilon^2) \bigr]
+ o_p\!\left( \frac{1}{K} \right), \tag{37}
\]

as in (20). By Theorem A.1 in Section A.2, the conditional asymptotics for the estimator [Y,Y]^(avg)_T are as follows.

Theorem 1. Suppose that X is an Itô process of form (1). Suppose that Y is related to X through model (4), and that (11) is satisfied with Eε⁴ < ∞. Also suppose that t_i and t_{i+1} are not in the same subgrid for any i. Under assumption (33), as n → ∞,

\[
\sqrt{\frac{K}{\bar{n}}} \bigl( [Y,Y]^{(\mathrm{avg})}_T - [X,X]^{(\mathrm{avg})}_T - 2 \bar{n} E\varepsilon^2 \bigr)
\xrightarrow{\mathcal{L}} 2 \sqrt{E\varepsilon^4}\; Z^{(\mathrm{avg})}_{\mathrm{noise}} \tag{38}
\]

conditional on the X process, where Z^(avg)_noise is standard normal.

This can be compared with the result stated in (24). Notice that Z^(avg)_noise in (38) is almost never the same as Z_noise in (24); in particular, cov(Z_noise, Z^(avg)_noise) = var(ε²)/Eε⁴, based on the proof of Theorem A.1 in the Appendix.

In comparison with the realized volatility using the full grid G, the aggregated estimator [Y,Y]^(avg)_T provides an improvement, in that both the asymptotic bias and variance are of smaller order in n; compare (18) and (19). We use this in Section 4.

3.4 Error Due to the Discretization Effect: [X,X]^(avg)_T − ⟨X,X⟩_T

In this section we study the impact of the time discretization. In other words, we investigate the deviation of [X,X]^(avg)_T from the integrated volatility ⟨X,X⟩_T of the true process. Denote the discretization effect by D_T, where

\[
D_t = [X,X]^{(\mathrm{avg})}_t - \langle X,X \rangle_t
= \frac{1}{K} \sum_{k=1}^{K} \bigl( [X,X]^{(k)}_t - \langle X,X \rangle_t \bigr), \tag{39}
\]

with

\[
[X,X]^{(k)}_t = \sum_{t_i,\, t_{i+} \in \mathcal{G}^{(k)},\ t_{i+} \le t} \bigl( X_{t_{i+}} - X_{t_i} \bigr)^2. \tag{40}
\]

In what follows, we consider the asymptotics of D_T. The problem is similar to that of finding the limit of [X,X]^(all)_T − ⟨X,X⟩_T


[cf. (25)]. However, this present case is more complicated, due to the multiple grids.

We suppose in the following that the sampling points are regularly allocated to subgrids; in other words, G^(l) = {t_{l−1}, t_{K+l−1}, ...}. We also assume that

\[
\max_i \Delta t_i = O\!\left( \frac{1}{n} \right) \tag{41}
\]

and

\[
K/n \to 0. \tag{42}
\]

Define the weight function

\[
h_i = \frac{4}{K \Delta t} \sum_{j=1}^{(K-1) \wedge i} \left( 1 - \frac{j}{K} \right)^2 \Delta t_{i-j}. \tag{43}
\]

In the case where the t_i are equidistant, and under regular allocation of points to subgrids, Δt_i = Δt, and so all of the h_i's (except the first K − 1) are equal, and

\[
h_i = 4 \frac{1}{K} \sum_{j=1}^{(K-1) \wedge i} \left( 1 - \frac{j}{K} \right)^2
= \begin{cases}
\dfrac{4(2K^2 - 3K + 1)}{6K^2} = \dfrac{4}{3} + o(1), & i \ge K - 1, \\[2ex]
\dfrac{4}{3}\, \dfrac{i}{K^3} \bigl( 3K^2 - 3Ki + i^2 \bigr) + o(1), & i < K - 1.
\end{cases} \tag{44}
\]

More generally, assumptions (41) and (42) ensure that

\[
\sup_i h_i = O(1). \tag{45}
\]
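The weights in (43) are easy to tabulate directly. The sketch below (ours, equidistant case, with illustrative K and n) evaluates the sum and checks it against the exact closed form of the flat part, which tends to 4/3:

```python
import numpy as np

def h_weights(K, n):
    """Weights h_i of eq. (43) for equidistant observations, where
    h_i = (4/K) * sum_{j=1}^{(K-1) ^ i} (1 - j/K)^2."""
    h = np.empty(n)
    for m, i in enumerate(range(1, n + 1)):
        j = np.arange(1, min(K - 1, i) + 1)
        h[m] = 4.0 / K * np.sum((1.0 - j / K) ** 2)
    return h

K, n = 20, 200
h = h_weights(K, n)
# Exact value of the constant part (i >= K-1), obtained by summing the
# squares in closed form; it converges to 4/3 as K grows.
h_flat = 4.0 * (K - 1) * (2 * K - 1) / (6.0 * K**2)
```

The early weights (i < K − 1) ramp up from below toward the constant value, matching the second branch of (44).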

We take ⟨D,D⟩_T to be the quadratic variation of D_t when viewed as a continuous-time process (39). This gives the best approximation to the variance of D_T. We show the following results in Section A.3.

Theorem 2. Suppose that X is an Itô process of the form (1), with drift coefficient μ_t and diffusion coefficient σ_t, both continuous almost surely. Also suppose that σ_t is bounded away from 0. Assume (41) and (42), and that sampling points are regularly allocated to grids. Then the quadratic variation of D_T is approximately

\[
\langle D,D \rangle_T = \frac{TK}{n}\, \eta_n^2 + o_p\!\left( \frac{K}{n} \right), \tag{46}
\]

where

\[
\eta_n^2 = \sum_i h_i\, \sigma^4_{t_i}\, \Delta t_i. \tag{47}
\]

In particular, D_T = O_p((K/n)^{1/2}). From this, we derive a variance–variance trade-off between the two effects that have been discussed: noise and discretization. First, however, we discuss the asymptotic law of D_T. We discuss stable convergence at the end of this section.

Theorem 3. Assume the conditions of Theorem 2, and also that

\[
\eta_n^2 \xrightarrow{P} \eta^2, \tag{48}
\]

where η is random. Also assume condition E in Section A.3. Then

\[
D_T / (K/n)^{1/2} \xrightarrow{\mathcal{L}} \eta \sqrt{T}\; Z_{\mathrm{discrete}}, \tag{49}
\]

where Z_discrete is standard normal and independent of the process X. The convergence in law is stable.

In other words, D_T/(K/n)^{1/2} can be taken to be asymptotically mixed normal, "N(0, η²T)." For most of our discussion, it is most convenient to suppose (48), and this is satisfied in many cases. For example, when the t_i are equidistant, and under regular allocation of points to subgrids,

\[
\eta^2 = \frac{4}{3} \int_0^T \sigma_t^4\, dt, \tag{50}
\]

following (44). One does not need to rely on (48); we argue in Section A.3 that, without this condition, one can take D_T/(K/n)^{1/2} to be approximately N(0, η²_n T). For estimation of η² or η²_n, see Section 6.

Finally, stable convergence (Rényi 1963; Aldous and Eagleson 1978; Hall and Heyde 1980, chap. 3) means, for our purposes, that the left side of (49) converges to the right side jointly with the X process, and that Z is independent of X. This is slightly weaker than convergence conditional on X, but serves the same function of permitting the incorporation of conditionality-type phenomena into arguments and conclusions.

3.5 Combining the Two Sources of Error

We can now combine the two error terms arising from discretization and from the observation noise. It follows from Theorems 1 and 3 that

\[
[Y,Y]^{(\mathrm{avg})}_T - \langle X,X \rangle_T - 2 \bar{n} E\varepsilon^2 = \xi Z_{\mathrm{total}} + o_p(1), \tag{51}
\]

where Z_total is an asymptotically standard normal random variable independent of the X process, and

\[
\xi^2 = \underbrace{4 \frac{\bar{n}}{K} E\varepsilon^4}_{\text{due to noise}}
+ \underbrace{T \frac{1}{\bar{n}}\, \eta^2}_{\text{due to discretization}}. \tag{52}
\]

It is easily seen that if one takes K = cn^{2/3}, then both components in ξ² will be present in the limit; otherwise, one of them will dominate. Based on (51), [Y,Y]^(avg)_T is still a biased estimator of the quadratic variation ⟨X,X⟩_T of the true return process. One can recognize that, as far as the asymptotic bias is concerned, [Y,Y]^(avg)_T is a better estimator than [Y,Y]^(all)_T, because n̄ ≤ n, demonstrating that the bias in the subsampled estimator [Y,Y]^(avg)_T increases at a slower pace than in the full-grid estimator. One can also construct a bias-adjusted estimator from (51); this further development would involve the higher-order analysis between the bias and the subsampled estimator. We describe the methodology of bias correction in Section 4.


3.6 The Benefits of Sampling Sparsely: Optimal Sampling Frequency in the Multiple-Grid Case

As in Section 2.3, when the noise is negligible asymptotically, we can search for an optimal n̄ for subsampling, to balance the coexistence of the bias and the variance in (51). To reduce the MSE of [Y,Y]^(avg)_T, we set ∂MSE/∂n̄ = 0. From (51) and (52), bias = 2n̄Eε² and ξ² = 4(n̄/K)Eε⁴ + (T/n̄)η². Then

\[
\text{MSE} = \text{bias}^2 + \xi^2
= 4 (E\varepsilon^2)^2 \bar{n}^2 + 4 \frac{\bar{n}}{K} E\varepsilon^4 + \frac{T}{\bar{n}} \eta^2
= 4 (E\varepsilon^2)^2 \bar{n}^2 + \frac{T}{\bar{n}} \eta^2 \quad \text{to first order;}
\]

thus the optimal n̄* satisfies

\[
\bar{n}^{*} = \left( \frac{T \eta^2}{8 (E\varepsilon^2)^2} \right)^{1/3}. \tag{53}
\]

Therefore, assuming that the estimator [Y,Y]^(avg)_T is adopted, one could benefit from a minimum MSE if one subsamples n̄* data in an equidistant fashion. In other words, all n observations can be used if one uses K*, K* ≈ n/n̄*, subgrids. This is in contrast to the drawback of using all of the data in the single-grid case. The subsampling coupled with aggregation brings out the advantage of using all of the data. Of course, for the asymptotics to work, we need Eε² → 0. Our recommendation, however, is to use the bias-corrected estimator given in Section 4.
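The optimum (53) is immediate to compute once Eε² and η² are available. A short sketch of ours, with illustrative constant-volatility inputs (so η² = (4/3)σ⁴T):

```python
import numpy as np

def nbar_star(T, eta2, e2):
    """Optimal average subgrid size, eq. (53): ((T*eta^2) / (8*(E eps^2)^2))^(1/3)."""
    return (T * eta2 / (8.0 * e2 ** 2)) ** (1.0 / 3.0)

# illustrative inputs: constant sigma^2 and equidistant sampling
T, sigma2, e2, n = 1.0 / 252.0, 0.04, 0.0005 ** 2, 23400
eta2 = 4.0 / 3.0 * sigma2 ** 2 * T     # eq. (50) with constant volatility
nbar = nbar_star(T, eta2, e2)
K_star = n / nbar                      # number of subgrids, K* ~ n / nbar*
```

At n̄*, the derivative of the first-order MSE vanishes, i.e., 8(Eε²)²n̄ equals Tη²/n̄².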

4. THE FINAL ESTIMATOR: SUBSAMPLING, AVERAGING, AND BIAS CORRECTION OVER TWO TIME SCALES

4.1 The Final Estimator ⟨X̂,X̂⟩_T: Main Result

In the preceding sections, we have seen that the multigrid estimator [Y,Y]^(avg)_T is yet another biased estimator of the true integrated volatility ⟨X,X⟩_T. In this section, we improve the multigrid estimator by adopting bias adjustment. To assess the bias, one uses the full grid. As mentioned before, from (22), in the single-grid case (Sec. 2), Eε² can be consistently approximated by

\[
\widehat{E\varepsilon^2} = \frac{1}{2n} [Y,Y]^{(\mathrm{all})}_T. \tag{54}
\]

Hence the bias of [Y,Y]^(avg)_T can be consistently estimated by 2n̄·Êε². A bias-adjusted estimator for ⟨X,X⟩_T thus can be obtained as

\[
\widehat{\langle X,X \rangle}_T = [Y,Y]^{(\mathrm{avg})}_T - \frac{\bar{n}}{n} [Y,Y]^{(\mathrm{all})}_T, \tag{55}
\]

thereby combining the two time scales, (all) and (avg).

To study the asymptotic behavior of ⟨X̂,X̂⟩_T, first note that, under the conditions of Theorem A.1 in Section A.2,

\[
\left( \frac{K}{\bar{n}} \right)^{1/2} \bigl( \widehat{\langle X,X \rangle}_T - [X,X]^{(\mathrm{avg})}_T \bigr)
= \left( \frac{K}{\bar{n}} \right)^{1/2} \bigl( [Y,Y]^{(\mathrm{avg})}_T - [X,X]^{(\mathrm{avg})}_T - 2 \bar{n} E\varepsilon^2 \bigr)
- 2 (K \bar{n})^{1/2} \bigl( \widehat{E\varepsilon^2} - E\varepsilon^2 \bigr)
\xrightarrow{\mathcal{L}} N\bigl( 0,\, 8 (E\varepsilon^2)^2 \bigr), \tag{56}
\]

where the convergence in law is conditional on X.

We can now combine this with the results of Section 3.4 to determine the optimal choice of K as n → ∞:

\[
\widehat{\langle X,X \rangle}_T - \langle X,X \rangle_T
= \bigl( \widehat{\langle X,X \rangle}_T - [X,X]^{(\mathrm{avg})}_T \bigr)
+ \bigl( [X,X]^{(\mathrm{avg})}_T - \langle X,X \rangle_T \bigr)
= O_p\!\left( \frac{\bar{n}^{1/2}}{K^{1/2}} \right) + O_p\bigl( \bar{n}^{-1/2} \bigr). \tag{57}
\]

The error is minimized by equating the two terms on the right side of (57), yielding that the optimal sampling step for [Y,Y]^(avg)_T is K = O(n^{2/3}). The right side of (57) then has order O_p(n^{-1/6}).

In particular, if we take

\[
K = c n^{2/3}, \tag{58}
\]

then we find the limit in (57) as follows.

Theorem 4. Suppose that X is an Itô process of form (1), and assume the conditions of Theorem 3 in Section 3.4. Suppose that Y is related to X through model (4), and that (11) is satisfied with Eε² < ∞. Under assumption (58),

\[
n^{1/6} \bigl( \widehat{\langle X,X \rangle}_T - \langle X,X \rangle_T \bigr)
\xrightarrow{\mathcal{L}} N\bigl( 0,\, 8 c^{-2} (E\varepsilon^2)^2 \bigr) + \eta \sqrt{T}\, N(0, c)
= \bigl( 8 c^{-2} (E\varepsilon^2)^2 + c \eta^2 T \bigr)^{1/2} N(0,1), \tag{59}
\]

where the convergence is stable in law (see Sec. 3.4).

Proof. Note that the first normal distribution comes from (56) and that the second comes from Theorem 3 in Section 3.4. The two normal distributions are independent because the convergence of the first term in (57) is conditional on the X process, which is why they can be amalgamated as stated. The requirement that Eε⁴ < ∞ (Thm. A.1 in the App.) is not needed, because only a law of large numbers is required for M^(1)_T (see the proof of Thm. A.1) when considering the difference in (56). This finishes the proof.

The estimation of the asymptotic spread s² = 8c^{-2}(Eε²)² + cη²T of ⟨X̂,X̂⟩_T is deferred to Section 6. It is seen in Section A.1 that, for K = cn^{2/3}, the second-order conditional variance of ⟨X̂,X̂⟩_T, given the X process, is given by

\[
\mathrm{var}\bigl( \widehat{\langle X,X \rangle}_T \,\big|\, X \bigr)
= n^{-1/3} c^{-2}\, 8 (E\varepsilon^2)^2
+ n^{-2/3} c^{-1} \bigl[ 8 [X,X]^{(\mathrm{avg})}_T E\varepsilon^2 - 2\,\mathrm{var}(\varepsilon^2) \bigr]
+ o_p\bigl( n^{-2/3} \bigr). \tag{60}
\]

The evidence from our Monte Carlo simulations in Section 7 suggests that this correction may matter in small samples, and that the (random) variance of ⟨X̂,X̂⟩_T is best estimated by

\[
\hat{s}^2 + n^{-2/3} c^{-1} \bigl[ 8 [X,X]^{(\mathrm{avg})}_T \widehat{E\varepsilon^2} - 2\,\widehat{\mathrm{var}}(\varepsilon^2) \bigr]. \tag{61}
\]

For the correction term, [X,X]^(avg)_T can be estimated by ⟨X̂,X̂⟩_T itself. Meanwhile, by (23), a consistent estimator of var(ε²) is given by

\[
\widehat{\mathrm{var}}(\varepsilon^2) = \frac{1}{2n} \sum_i \bigl( \Delta Y_{t_i} \bigr)^4 - 4 \bigl( \widehat{E\varepsilon^2} \bigr)^2.
\]
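The two noise-moment estimators, (54) and the display above, can be sketched as follows (our code; the pure-Gaussian-noise usage example is only a sanity check, for which Eε² = s² and var(ε²) = 2s⁴):

```python
import numpy as np

def noise_moments(y):
    """Estimators of E eps^2 (eq. (54)) and var(eps^2) (display above):
    Ehat = [Y,Y]^(all) / (2n),  varhat = (1/(2n)) * sum (dY)^4 - 4*Ehat^2."""
    dy = np.diff(y)
    n = len(dy)
    e2_hat = np.sum(dy ** 2) / (2.0 * n)
    var_e2_hat = np.sum(dy ** 4) / (2.0 * n) - 4.0 * e2_hat ** 2
    return e2_hat, var_e2_hat

# sanity check on pure Gaussian noise (no X component)
rng = np.random.default_rng(2)
s = 0.0005
e2_hat, var_e2_hat = noise_moments(rng.normal(0.0, s, 200000))
```

On noise-dominated data these converge to the true moments; with an X component present, (54) is still consistent because the noise term dominates at the highest frequency.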


4.2 Properties of ⟨X̂,X̂⟩_T: Optimal Sampling and Bias Adjustment

To further pin down the optimal sampling frequency K, one can minimize the expected asymptotic variance in (59) to obtain

\[
c^{*} = \left( \frac{16 (E\varepsilon^2)^2}{T\, E\eta^2} \right)^{1/3}, \tag{62}
\]

which can be consistently estimated from data in past time periods (before time t_0 = 0), using Êε² and an estimator of η² (cf. Sec. 6). As mentioned in Section 3.4, η² can be taken to be independent of K, as long as one allocates sampling points to grids regularly, as defined in Section 3.2. Hence one can choose c, and so also K, based on past data.

Example 1. If σ²_t is constant, then, for equidistant sampling and regular allocation to grids, η² = (4/3)σ⁴T, and the asymptotic variance in (59) is

\[
8 c^{-2} (E\varepsilon^2)^2 + c \eta^2 T = 8 c^{-2} (E\varepsilon^2)^2 + \tfrac{4}{3} c \sigma^4 T^2,
\]

and the optimal choice of c becomes

\[
c_{\mathrm{opt}} = \left( \frac{12 (E\varepsilon^2)^2}{T^2 \sigma^4} \right)^{1/3}. \tag{63}
\]

In this case, the asymptotic variance in (59) is

\[
2 \bigl( 12 (E\varepsilon^2)^2 \bigr)^{1/3} \bigl( \sigma^2 T \bigr)^{4/3}.
\]

One can also, of course, estimate c to minimize the actual asymptotic variance in (59) from data in the current time period (0 ≤ t ≤ T). It is beyond the scope of this article to consider whether such a device for selecting the frequency has any impact on our asymptotic results.
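Example 1 is easy to verify numerically. The sketch below (ours, with illustrative inputs) evaluates (63) and confirms both the minimized value 2(12(Eε²)²)^{1/3}(σ²T)^{4/3} and the minimality of c_opt:

```python
def c_opt_constant_vol(T, sigma2, e2):
    """Eq. (63): optimal c in K = c*n^(2/3) when sigma_t^2 = sigma2 is constant."""
    return (12.0 * e2 ** 2 / (T ** 2 * sigma2 ** 2)) ** (1.0 / 3.0)

def asym_var(c, T, sigma2, e2):
    """Asymptotic variance in (59) with eta^2 = (4/3) * sigma^4 * T."""
    return 8.0 * e2 ** 2 / c ** 2 + (4.0 / 3.0) * c * sigma2 ** 2 * T ** 2

T, sigma2, e2 = 1.0 / 252.0, 0.04, 0.0005 ** 2   # illustrative values
c_star = c_opt_constant_vol(T, sigma2, e2)
v_star = asym_var(c_star, T, sigma2, e2)
```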

In addition to large-sample arguments, one can study ⟨X̂,X̂⟩_T from a "smallish" sample standpoint. We argue in what follows that one can apply a bias-type adjustment to get

\[
\widehat{\langle X,X \rangle}^{(\mathrm{adj})}_T = \left( 1 - \frac{\bar{n}}{n} \right)^{-1} \widehat{\langle X,X \rangle}_T. \tag{64}
\]

The difference from the estimator in (55) is of order O_p(n̄/n) = O_p(K^{-1}), and thus the two estimators behave the same to the asymptotic order that we consider. The estimator (64), however, has the appeal of being in a certain way "unbiased," as follows. For arbitrary (a, b), consider all estimators of the form

\[
\widehat{\langle X,X \rangle}^{(\mathrm{adj})}_T = a [Y,Y]^{(\mathrm{avg})}_T - b \frac{\bar{n}}{n} [Y,Y]^{(\mathrm{all})}_T.
\]

Then, from (18) and (35),

\[
\begin{aligned}
E\bigl( \widehat{\langle X,X \rangle}^{(\mathrm{adj})}_T \,\big|\, X \text{ process} \bigr)
&= a \bigl( [X,X]^{(\mathrm{avg})}_T + 2 \bar{n} E\varepsilon^2 \bigr)
- b \frac{\bar{n}}{n} \bigl( [X,X]^{(\mathrm{all})}_T + 2 n E\varepsilon^2 \bigr) \\
&= a [X,X]^{(\mathrm{avg})}_T - b \frac{\bar{n}}{n} [X,X]^{(\mathrm{all})}_T + 2 (a - b) \bar{n} E\varepsilon^2.
\end{aligned}
\]

It is natural to choose a = b, to completely remove the effect of Eε². Also, following Section 3.4, both [X,X]^(avg)_T and [X,X]^(all)_T are asymptotically unbiased estimators of ⟨X,X⟩_T. Hence one can argue that one should take a(1 − n̄/n) = 1, yielding (64).

Similarly, an adjusted estimator of Eε² is given by

\[
\widehat{E\varepsilon^2}^{(\mathrm{adj})} = \tfrac{1}{2} (n - \bar{n})^{-1} \bigl( [Y,Y]^{(\mathrm{all})}_T - [Y,Y]^{(\mathrm{avg})}_T \bigr), \tag{65}
\]

which satisfies E(Êε²^(adj) | X process) = Eε² + (1/2)(n − n̄)^{-1}([X,X]^(all)_T − [X,X]^(avg)_T), and thus is unbiased to high order. As for the asymptotic distribution, one can see from Theorem A.1 in the Appendix that

\[
\begin{aligned}
\widehat{E\varepsilon^2}^{(\mathrm{adj})} - E\varepsilon^2
&= \bigl( \widehat{E\varepsilon^2} - E\varepsilon^2 \bigr) \bigl( 1 + O(K^{-1}) \bigr) + O_p\bigl( K n^{-3/2} \bigr) \\
&= \widehat{E\varepsilon^2} - E\varepsilon^2 + O_p\bigl( n^{-1/2} K^{-1} \bigr) + O_p\bigl( K n^{-3/2} \bigr) \\
&= \widehat{E\varepsilon^2} - E\varepsilon^2 + O_p\bigl( n^{-5/6} \bigr),
\end{aligned}
\]

from (58). It follows that n^{1/2}(Êε² − Eε²) and n^{1/2}(Êε²^(adj) − Eε²) have the same asymptotic distribution.
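The adjustments (64) and (65) are one-line formulas once the two realized volatilities are in hand. In the sketch below (ours), the synthetic inputs are set to their conditional expected values from (18) and (35), so the adjusted estimators recover the inputs exactly:

```python
def tsrv_adjusted(rv_avg, rv_all, n, K):
    """Small-sample adjusted two-scales estimator, eq. (64):
    (1 - nbar/n)^(-1) * ([Y,Y]^(avg) - (nbar/n) * [Y,Y]^(all))."""
    nbar = (n - K + 1) / K
    raw = rv_avg - (nbar / n) * rv_all
    return raw / (1.0 - nbar / n)

def e2_adjusted(rv_avg, rv_all, n, K):
    """Adjusted noise-variance estimator, eq. (65)."""
    nbar = (n - K + 1) / K
    return 0.5 * (rv_all - rv_avg) / (n - nbar)

# expected-value inputs: rv_all = [X,X] + 2*n*e2, rv_avg = [X,X] + 2*nbar*e2
n, K = 1000, 50
nbar = (n - K + 1) / K
xx, e2 = 1e-4, 2.5e-7
rv_all = xx + 2 * n * e2
rv_avg = xx + 2 * nbar * e2
xx_hat = tsrv_adjusted(rv_avg, rv_all, n, K)
e2_hat = e2_adjusted(rv_avg, rv_all, n, K)
```

The exact recovery illustrates the "unbiasedness" argument above: with a = b = (1 − n̄/n)^{-1}, the noise contribution cancels identically.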

5. EXTENSION TO MULTIPLE PERIOD INFERENCE

For a given family A = {G^(k), k = 1, ..., K}, we denote

\[
\widehat{\langle X,X \rangle}_t = [Y,Y]^{(\mathrm{avg})}_t - \frac{\bar{n}}{n} [Y,Y]^{(\mathrm{all})}_t, \tag{66}
\]

where, as usual, [Y,Y]^(all)_t = Σ_{t_{i+1} ≤ t} (ΔY_{t_i})² and [Y,Y]^(avg)_t = (1/K) Σ_{k=1}^K [Y,Y]^(k)_t, with

\[
[Y,Y]^{(k)}_t = \sum_{t_i,\, t_{i+} \in \mathcal{G}^{(k)},\ t_{i+} \le t} \bigl( Y_{t_{i+}} - Y_{t_i} \bigr)^2.
\]

To estimate ⟨X,X⟩ for several discrete time periods, say [0, T_1], [T_1, T_2], ..., [T_{M−1}, T_M], where M is fixed, this amounts to estimating ⟨X,X⟩_{T_m} − ⟨X,X⟩_{T_{m−1}} = ∫_{T_{m−1}}^{T_m} σ²_u du for m = 1, ..., M, and the obvious estimator is ⟨X̂,X̂⟩_{T_m} − ⟨X̂,X̂⟩_{T_{m−1}}.

To carry out the asymptotics, let n_m be the number of points in the mth time segment, and similarly let K_m = c_m n_m^{2/3}, where c_m is a constant. Then

\[
n_m^{1/6} \left( \widehat{\langle X,X \rangle}_{T_m} - \widehat{\langle X,X \rangle}_{T_{m-1}} - \int_{T_{m-1}}^{T_m} \sigma_u^2\, du \right), \qquad m = 1, \ldots, M,
\]

converge stably to (8c_m^{-2}(Eε²)² + c_m η²_m (T_m − T_{m−1}))^{1/2} Z_m, where the Z_m are iid standard normals, independent of the underlying process, and η²_m is the limit η² (Thm. 3) for time period m. In the case of equidistant t_i and regular allocation of sample points to grids, η²_m = (4/3) ∫_{T_{m−1}}^{T_m} σ⁴_u du.

In other words, the one-period asymptotics generalize straightforwardly to the multiperiod case. This is because ⟨X̂,X̂⟩_{T_m} − ⟨X̂,X̂⟩_{T_{m−1}} − ∫_{T_{m−1}}^{T_m} σ²_u du has, to first order, a martingale structure. This can be seen in the Appendix.

An advantage of our proposed estimator is that, if ε_{t_i} has different variance in different time segments, say var(ε_{t_i}) = (Eε²)_m for t_i ∈ (T_{m−1}, T_m], then both consistency and asymptotic (mixed) normality continue to hold, provided that one replaces Eε² by (Eε²)_m. This adds a measure of robustness to the procedure. If one were convinced that Eε² were the same across time segments, then an alternative estimator would have the form

\[
\widehat{\langle X,X \rangle}_t = [Y,Y]^{(\mathrm{avg})}_t
- \left( \frac{1}{K}\, \#\{ t_{i+1} \le t \} - 1 \right) \frac{1}{n} [Y,Y]^{(\mathrm{all})}_T,
\qquad \text{for } t = T_1, \ldots, T_M. \tag{67}
\]


However, the errors ⟨X̂,X̂⟩_{T_m} − ⟨X̂,X̂⟩_{T_{m−1}} − ∫_{T_{m−1}}^{T_m} σ²_u du in this case are not asymptotically independent. Note that, for T = T_m, both candidates (66) and (67) for ⟨X̂,X̂⟩_t coincide with the quantity in (55).

6. ESTIMATING THE ASYMPTOTIC VARIANCE OF ⟨X̂,X̂⟩_T

In the one-period case, the main goal is to estimate the asymptotic variance s² = 8c^{-2}(Eε²)² + cη²T [cf. (58) and (59)]. The multigrid case is a straightforward generalization, as indicated in Section 5.

Here we are concerned only with the case where the points t_i are equally spaced (Δt_i = Δt) and are regularly allocated to the grids A_1 = {G^(k), k = 1, ..., K_1}. A richer set of ingredients is required to find the spread than to just estimate ⟨X,X⟩_T. To implement the estimator, we create an additional family A_2 = {G^(k,i), k = 1, ..., K_1, i = 1, ..., I} of grids, where G^(k,i) contains every Ith point of G^(k), starting with the ith point. We assume that K_1 ∼ c_1 n^{2/3}. The new family then consists of K_2 ∼ c_2 n^{2/3} grids, where c_2 = c_1 I.

In addition, we need to divide the time line into segments (T_{m−1}, T_m], where T_m = (m/M)T. For the purposes of this discussion, M is large but finite. We now get an initial estimator of spread as

\[
\hat{s}_0^2 = n^{1/3} \sum_{m=1}^{M}
\Bigl( \widehat{\langle X,X \rangle}^{K_1}_{T_m} - \widehat{\langle X,X \rangle}^{K_1}_{T_{m-1}}
- \bigl( \widehat{\langle X,X \rangle}^{K_2}_{T_m} - \widehat{\langle X,X \rangle}^{K_2}_{T_{m-1}} \bigr) \Bigr)^2,
\]

where ⟨X̂,X̂⟩^{K_i}_t is the estimator (66) using the grid family i,

i = 1, 2.

Using the discussion in Section 5, we can show that

\[
\hat{s}_0^2 \approx s_0^2, \tag{68}
\]

where, for c_1 ≠ c_2 (I ≠ 1),

\[
s_0^2 = 8 (E\varepsilon^2)^2 \bigl( c_1^{-2} + c_2^{-2} - c_1^{-1} c_2^{-1} \bigr)
+ \bigl( c_1^{1/2} - c_2^{1/2} \bigr)^2 T \eta^2
= 8 (E\varepsilon^2)^2 c_1^{-2} \bigl( 1 + I^{-2} - I^{-1} \bigr)
+ c_1 \bigl( I^{1/2} - 1 \bigr)^2 T \eta^2. \tag{69}
\]

In (68), the symbol "≈" denotes first convergence in law as n → ∞, and then a limit in probability as M → ∞. Because Eε² can be estimated by Êε² = [Y,Y]^(all)_T / 2n, we can put hats on s_0², (Eε²)², and η² in (69) to obtain an estimator of η². Similarly,

\[
\begin{aligned}
s^2 &= 8 (E\varepsilon^2)^2 \left( c^{-2} - \frac{c \bigl( c_1^{-2} + c_2^{-2} - c_1^{-1} c_2^{-1} \bigr)}{\bigl( c_1^{1/2} - c_2^{1/2} \bigr)^2} \right)
+ \frac{c}{\bigl( c_1^{1/2} - c_2^{1/2} \bigr)^2}\, s_0^2 \\
&= 8 \left( c^{-2} - c\, c_1^{-3}\, \frac{I^{-2} - I^{-1} + 1}{\bigl( I^{1/2} - 1 \bigr)^2} \right) (E\varepsilon^2)^2
+ \frac{c}{c_1}\, \frac{1}{\bigl( I^{1/2} - 1 \bigr)^2}\, s_0^2, \tag{70}
\end{aligned}
\]

where c ∼ Kn^{-2/3}, and K is the number of grids used originally to estimate ⟨X,X⟩_T.

Table 1. Coefficients of (Êε²)² and ŝ₀² When c₁ = c

I     coeff(ŝ₀²)     coeff((Êε²)²)
3     1.866          −3.611c⁻²
4     1.000          1.5000c⁻²

Normally, one would take c_1 = c. Hence an estimator ŝ² can be found from ŝ_0² and Êε². When c_1 = c, we argue that the optimal choice is I = 3 or 4, as follows. The coefficients in (70) become

\[
\mathrm{coeff}\bigl( \hat{s}_0^2 \bigr) = \bigl( I^{1/2} - 1 \bigr)^{-2}
\]

and

\[
\mathrm{coeff}\bigl( (\widehat{E\varepsilon^2})^2 \bigr) = 8 c^{-2} \bigl( I^{1/2} - 1 \bigr)^{-2} f(I),
\]

where f(I) = I − 2I^{1/2} − I^{-2} + I^{-1}. For I ≥ 2, f(I) is increasing, and f(I) crosses 0 for I between 3 and 4. These, therefore, are the two integer values of I that give the lowest ratio of coeff((Êε²)²)/coeff(ŝ_0²). Using I = 3 or 4, therefore, would maximally insulate against (Êε²)² dominating over ŝ_0². This is desirable because ŝ_0² is the estimator carrying the information about η². Numerical values for the coefficients are given in Table 1. If c were such that (Êε²)² still overwhelmed ŝ_0², then a choice of c_1 ≠ c should be considered.
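The entries of Table 1 follow directly from the two coefficient formulas; a short check of ours:

```python
def spread_coeffs(I):
    """Coefficients in (70) when c1 = c: coeff(s0^2) = (sqrt(I)-1)^(-2) and
    coeff((E eps^2)^2) = 8 * (sqrt(I)-1)^(-2) * f(I) (times c^(-2), which is
    left out and reported separately), with f(I) = I - 2*sqrt(I) - I^(-2) + I^(-1)."""
    g = (I ** 0.5 - 1.0) ** (-2)
    f = I - 2.0 * I ** 0.5 - I ** (-2) + I ** (-1)
    return g, 8.0 * g * f

c_s0_3, c_e2_3 = spread_coeffs(3)   # Table 1, row I = 3
c_s0_4, c_e2_4 = spread_coeffs(4)   # Table 1, row I = 4
```

Note the sign change between I = 3 and I = 4, reflecting that f(I) crosses 0 between those two values.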

7. MONTE CARLO EVIDENCE

In this article, we have discussed five approaches to dealing with the impact of market microstructure noise on realized volatility. In this section, we examine the performance of each approach in simulations and compare the results with those predicted by the aforementioned asymptotic theory.

7.1 Simulations Design

We use the stochastic volatility model of Heston (1993) as our data-generating process:

\[
dX_t = \bigl( \mu - v_t/2 \bigr)\, dt + \sigma_t\, dB_t \tag{71}
\]

and

\[
dv_t = \kappa (\alpha - v_t)\, dt + \gamma v_t^{1/2}\, dW_t, \tag{72}
\]

where v_t = σ²_t. We assume that the parameters (μ, κ, α, γ) and ρ, the correlation coefficient between the two Brownian motions B and W, are constant in this model. We also assume Feller's condition, 2κα ≥ γ², to make the zero boundary unattainable by the volatility process.

We simulate M = 25,000 sample paths of the process using the Euler scheme at a time interval Δt = 1 second, and set the parameter values at levels reasonable for a stock: μ = 0.05, κ = 5, α = 0.04, γ = 0.5, and ρ = −0.5. As for the market microstructure noise ε, we assume that it is Gaussian and small. Specifically, we set (Eε²)^{1/2} = 0.0005 (i.e., the standard deviation of the noise is 0.05% of the value of the asset price). We purposefully set this value to be smaller than the values calibrated using the empirical market microstructure literature.
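A minimal sketch of this simulation design (our code, not the authors'): Euler steps for (71)–(72) with correlated Brownian increments, plus i.i.d. Gaussian noise added to the log-price. The reflection of v at 0 is a standard discretization safeguard of ours, not part of the model itself.

```python
import numpy as np

def simulate_heston_with_noise(T, dt, mu, kappa, alpha, gamma, rho,
                               noise_sd, v0, x0, rng):
    """Euler discretization of the Heston model (71)-(72), plus i.i.d.
    Gaussian microstructure noise: Y = X + eps."""
    n = int(round(T / dt))
    X = np.empty(n + 1)
    X[0], v = x0, v0
    for i in range(n):
        zb = rng.standard_normal()
        zw = rho * zb + np.sqrt(1.0 - rho ** 2) * rng.standard_normal()  # corr(B, W) = rho
        X[i + 1] = X[i] + (mu - v / 2.0) * dt + np.sqrt(v * dt) * zb
        v = abs(v + kappa * (alpha - v) * dt + gamma * np.sqrt(v * dt) * zw)  # reflect at 0
    Y = X + rng.normal(0.0, noise_sd, n + 1)     # observed log-prices
    return X, Y

# one day (6.5 hours) of one-second observations; parameter values from the text
rng = np.random.default_rng(3)
X, Y = simulate_heston_with_noise(T=1/252, dt=1/252/23400, mu=0.05, kappa=5.0,
                                  alpha=0.04, gamma=0.5, rho=-0.5,
                                  noise_sd=0.0005, v0=0.04, x0=np.log(100.0),
                                  rng=rng)
```

On such a path, the full-grid realized volatility of Y is dominated by the noise term 2nEε² ≈ 0.0117, while that of X stays near ∫σ²dt ≈ 1.6 × 10⁻⁴, which is exactly the failure mode of the fifth-best estimator.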

On each simulated sample path, we estimate ⟨X,X⟩_T over T = 1 day (i.e., T = 1/252, because the parameter values are all annualized) using the five estimation strategies described earlier: [Y,Y]^(all)_T, [Y,Y]^(sparse)_T, [Y,Y]^(sparse,opt)_T, [Y,Y]^(avg)_T, and, finally, ⟨X̂,X̂⟩^(adj)_T. We assume that a day consists of 6.5 hours of open trading, as is the case on the NYSE and NASDAQ. For the fourth-best estimator, we represent this approach using sparse sampling at a frequency of once every 5 minutes.

7.2 Simulations Results

Table 2 reports the simulation results for the five estimation strategies. For each estimator, the bias is calculated as E[estimator − ⟨X,X⟩_T]. The variance row reports the quantity

\[
E\left[ \mathrm{var}\left( \text{estimator} - \langle X,X \rangle_T \,\middle|\, \int_0^T \sigma_t^2\, dt,\ \int_0^T \sigma_t^4\, dt \right) \right]. \tag{73}
\]

The rows marked "relative" report the corresponding results in percentage terms, that is, for

\[
\frac{\text{estimator} - \langle X,X \rangle_T}{\langle X,X \rangle_T}.
\]

Note that, although the convergence of the discretization error is stable, as opposed to conditional, (73) seems like a reasonable stand-in for the theoretical quantity under consideration.

In each case, "small sample" represents the average value over the M simulated paths, and "asymptotic" refers to the value predicted by our theory. The noise parts of the asymptotic variances are taken to include the relevant second-order terms, as given by (20), (37), and (60), yielding spreads like (in the latter case) (61). The small-sample variance is computed by binning the results of the simulations according to the values of the pair (∫_0^T σ²_t dt, ∫_0^T σ⁴_t dt), calculating the variance within each bin, and then averaging across the bins, weighted by the number of elements in each bin.

Comparing the rows showing small-sample results to their asymptotic predictions, we see that the asymptotic theory provides a good approximation to the first two moments of the small-sample distribution of the five estimators. Comparing across columns, we see that the differences between the five estimators are quite large. As expected, the naive fifth-best strategy produces disastrous results. The first-best strategy results in a large decrease of root MSE (RMSE) relative to the others, including, in particular, the fourth-best strategy, which is the approach currently used in most of the empirical literature. But, at least for the parameter values used here, determining the optimal sampling frequency (the sparse optimal, or third-best, strategy) results in a modest improvement over the arbitrary selection of a sampling frequency (the sparse, or fourth-best, strategy). Using all of the data, as in the subsampled and averaged estimator (second-best), but especially the bias correction by combining two time scales, as in our first-best approach, is where the major practical improvements reside.

It may seem surprising that the variance of the estimator is reduced so much from the second-best to the first-best estimator. After all, if one sets c = Kn^{-2/3}, then the asymptotic normal term in (7) becomes

\[
\frac{1}{n^{1/6}} \Biggl[ \underbrace{\frac{4}{c^2}\, E\varepsilon^4}_{\text{due to noise}}
+ \underbrace{c\, \frac{4T}{3} \int_0^T \sigma_t^4\, dt}_{\text{due to discretization}} \Biggr]^{1/2} Z_{\mathrm{total}},
\]

where the bracketed sum is the total variance.

This is quite similar to the error term in (9). However, the c for the second-best estimator is chosen by (8), whereas for the first-best case it is given by (10). The two optima are very different, because for the second-best estimator, c is chosen by a bias–variance trade-off, whereas for the first-best estimator, it is chosen to minimize the asymptotic variance. From this standpoint, it is quite reasonable for the first-best estimator to also have substantially better asymptotic variance. In fact, from (8) and (10), it is easy to see that, for the optimal choices,

\[
c^{*}_{\text{second best}} = n^{1/3}\, 2^{-1/3} \left( \frac{E\varepsilon^4}{(E\varepsilon^2)^2} \right)^{1/3} c^{*}_{\text{first best}},
\]

whence the total variance is in fact quite different for the two estimators.

Figure 1 shows the small-sample and asymptotic RMSE of the [Y,Y]^(sparse)_T estimator as a function of the subsampling interval, illustrating the minimization problem studied in Section 2.3. Finally, Figure 2 shows the standardized distribution of the first-best estimator obtained from the simulations (histogram) and the corresponding asymptotic distribution predicted by our theory (solid line). Due to the effect of bias, the distributions of the fifth- to second-best estimators are irrelevant, because they cannot be used to set intervals for the volatility.

Table 2. Monte Carlo Simulations for the Five Estimation Strategies

                                    Fifth-best       Fourth-best       Third-best            Second-best     First-best
                                    [Y,Y]^(all)_T    [Y,Y]^(sparse)_T  [Y,Y]^(sparse,opt)_T  [Y,Y]^(avg)_T   ⟨X̂,X̂⟩^(adj)_T
Small-sample bias                   1.1699 × 10⁻²    3.89 × 10⁻⁵       2.18 × 10⁻⁵           1.926 × 10⁻⁵    2 × 10⁻⁸
Asymptotic bias                     1.1700 × 10⁻²    3.90 × 10⁻⁵       2.20 × 10⁻⁵           1.927 × 10⁻⁵    0
Small-sample variance               1.791 × 10⁻⁸     1.4414 × 10⁻⁹     1.59 × 10⁻⁹           9.41 × 10⁻¹⁰    9 × 10⁻¹¹
Asymptotic variance                 1.788 × 10⁻⁸     1.4409 × 10⁻⁹     1.58 × 10⁻⁹           9.37 × 10⁻¹⁰    8 × 10⁻¹¹
Small-sample RMSE                   1.1699 × 10⁻²    5.437 × 10⁻⁵      4.543 × 10⁻⁵          3.622 × 10⁻⁵    9.4 × 10⁻⁶
Asymptotic RMSE                     1.1700 × 10⁻²    5.442 × 10⁻⁵      4.546 × 10⁻⁵          3.618 × 10⁻⁵    8.9 × 10⁻⁶
Small-sample relative bias (%)      182              6.1               1.8                   1.5             −.0045
Small-sample relative variance      82,502           115               11                    5.3             .43
Small-sample relative RMSE (%)      340              12.4              3.7                   2.8             .65

Figure 1. RMSE of the Sparse Estimator as a Function of the Subsampling Frequency (small-sample RMSE vs. asymptotic RMSE).

8. DISCUSSION

In this work, we have quantified and corrected the effect of noise on the nonparametric assessment of integrated volatility. In the setting of high-frequency data, the usual financial practice is to use sparse sampling, in other words, throwing away most of the available data. We have argued that this is caused by not incorporating the noise in the model. Although it is statistically unsound to throw away data, we have shown that it is possible to build on this practice to construct estimators that make statistical sense.

Specifically, we have found that the usual realized volatility mainly estimates the magnitude of the noise term rather than anything to do with volatility. An approach built on separating the observations into multiple "grids" lessens this problem. We found that the best results can be obtained by combining the usual ("single-grid") realized volatility with the multiple-grid-based device. This gives an estimator that is approximately unbiased; we have also shown how to assess the (random) variance of this estimator. We also show that, for our most recommended procedure, the optimal number of multiple grids is of order O(n^{2/3}) (see Sec. 4). Most of the development is in the context of finding the integrated volatility over one time period; at the end, we extend this to multiple periods. Also, in the case where the noise can be taken to be almost negligible, we provide a way of optimizing the sampling frequency if one wishes to use the classical "realized volatility" or its multigrid extension.

One important message of the article is that, any time one has an impulse to sample sparsely, one can always do better with a multigrid method, regardless of the model or the quantity being estimated.

Figure 2. Asymptotic and Small-Sample Standardized Distributions of the First-Best Estimator.


Our results in this article cover the case where the latent X follows the Itô process model (1). Theorem 1 remains true under the assumption that X is a general semimartingale, that is, a process X that can also have jumps. This follows because the proof of Lemma A.2 remains true for general semimartingale X (Jacod and Shiryaev 2003, thm. I.4.47, p. 52). Also, apart from using Lemma A.2, Theorem A.1 is unrelated to the nature of the X process. As far as the effect of jumps on the discretization effect is concerned, it should be noted that consistency holds, as follows. When X is a general semimartingale, then, for any sequence of grids H_n so that (41) holds, [X,X]^{H_n} converges to the continuous-time quadratic variation of X, that is,

\[
\int_0^T \sigma_t^2\, dt + \sum_{0 \le t \le T} \bigl( \text{jump of } X \text{ at } t \bigr)^2 \tag{74}
\]

(cf. Jacod and Shiryaev 2003, thm. I.4.47, p. 52). In particular, this also applies to subgrids, under the regularity conditions that we have discussed. Furthermore, by extension of the same proof, [X,X]^(avg) also converges to (74). From the same argument, so does ⟨X̂,X̂⟩_T. In this sense, the two-scales estimator is robust to jumps.

APPENDIX: PROOFS OF RESULTS

When the total grid $\mathcal{G}$ is considered, we use $\sum_{i=1}^{n-1}$, $\sum_{t_{i+1}\le T}$, and $\sum_{t_i\in\mathcal{G}}$ interchangeably in the following proofs. Also, we write "$|X$" to indicate expressions that are conditional on the entire $X$ process.

A.1 Variance of $[Y,Y]_T$ Given the $X$ Process

Here we calculate explicitly the variance in (19), from which the stated approximation follows. The explicit remainder term is also used for (36). Let a partition of $[0,T]$ be $0 = t_0 \le t_1 \le \cdots \le t_n = T$. Under assumption (11),

$$\mathrm{var}\big([Y,Y]^{(\mathrm{all})}_T \,\big|\, X\big) = \mathrm{var}\Bigg[\sum_{t_{i+1}\le T} (\Delta Y_{t_i})^2 \,\Bigg|\, X\Bigg] = \underbrace{\sum_{t_{i+1}\le T} \mathrm{var}\big[(\Delta Y_{t_i})^2 \,\big|\, X\big]}_{I_T} + \underbrace{2\sum_{t_{i+1}\le T} \mathrm{cov}\big[(\Delta Y_{t_{i-1}})^2, (\Delta Y_{t_i})^2 \,\big|\, X\big]}_{II_T},$$

because $\Delta Y_{t_i} = \Delta X_{t_i} + \Delta\varepsilon_{t_i}$ is 1-dependent given the $X$ process:

$$\begin{aligned}
\mathrm{var}\big[(\Delta Y_{t_i})^2 \,\big|\, X\big]
&= \kappa_4(\Delta Y_{t_i}\,|\,X) + 2\big[\mathrm{var}(\Delta Y_{t_i}\,|\,X)\big]^2 + 4\big[E(\Delta Y_{t_i}\,|\,X)\big]^2\,\mathrm{var}(\Delta Y_{t_i}\,|\,X) + 4E(\Delta Y_{t_i}\,|\,X)\,\kappa_3(\Delta Y_{t_i}\,|\,X) \\
&= \kappa_4(\Delta\varepsilon_{t_i}) + 2\big[\mathrm{var}(\Delta\varepsilon_{t_i})\big]^2 + 4(\Delta X_{t_i})^2\,\mathrm{var}(\Delta\varepsilon_{t_i}) + 4(\Delta X_{t_i})\,\kappa_3(\Delta\varepsilon_{t_i}) \qquad \text{(under assumption (11))} \\
&= 2\kappa_4(\varepsilon) + 8(E\varepsilon^2)^2 + 8(\Delta X_{t_i})^2 E\varepsilon^2,
\end{aligned}$$

because $\kappa_3(\Delta\varepsilon_{t_i}) = 0$. The $\kappa$'s are the cumulants of the relevant order:

$$\kappa_1(\varepsilon) = 0, \qquad \kappa_2(\varepsilon) = \mathrm{var}(\varepsilon) = E(\varepsilon^2), \qquad \kappa_3(\varepsilon) = E\varepsilon^3, \qquad \kappa_4(\varepsilon) = E(\varepsilon^4) - 3(E\varepsilon^2)^2. \qquad \text{(A.1)}$$
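The cumulant relations in (A.1) can be written down directly; a small helper (our own naming, not from the paper) that maps the raw moments of a mean-zero noise to the cumulants used throughout Section A.1:

```python
def cumulants(m2, m3, m4):
    """Cumulants of a mean-zero noise eps from its raw moments (Eq. A.1).

    m2 = E(eps^2), m3 = E(eps^3), m4 = E(eps^4).
    """
    k1 = 0.0                  # kappa_1 = E(eps) = 0 by assumption (11)
    k2 = m2                   # kappa_2 = var(eps)
    k3 = m3                   # kappa_3
    k4 = m4 - 3.0 * m2 ** 2   # kappa_4; zero for Gaussian noise
    return k1, k2, k3, k4

# For N(0, s^2) noise: E eps^2 = s^2, E eps^3 = 0, E eps^4 = 3 s^4,
# so kappa_4 vanishes, as used repeatedly in the Gaussian case.
print(cumulants(1.0, 0.0, 3.0))
```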

So $I_T = n\big(2\kappa_4(\varepsilon) + 8(E\varepsilon^2)^2\big) + 8[X,X]^{(\mathrm{all})}_T E\varepsilon^2$. Similarly, for the covariance,

$$\begin{aligned}
\mathrm{cov}\big[(\Delta Y_{t_{i-1}})^2, (\Delta Y_{t_i})^2 \,\big|\, X\big]
&= \mathrm{cov}\big[(\Delta\varepsilon_{t_{i-1}})^2, (\Delta\varepsilon_{t_i})^2\big] + 4(\Delta X_{t_{i-1}})(\Delta X_{t_i})\,\mathrm{cov}(\Delta\varepsilon_{t_{i-1}}, \Delta\varepsilon_{t_i}) \\
&\quad + 2(\Delta X_{t_{i-1}})\,\mathrm{cov}\big[\Delta\varepsilon_{t_{i-1}}, (\Delta\varepsilon_{t_i})^2\big] + 2(\Delta X_{t_i})\,\mathrm{cov}\big[(\Delta\varepsilon_{t_{i-1}})^2, \Delta\varepsilon_{t_i}\big] \\
&= \kappa_4(\varepsilon) + 2(E\varepsilon^2)^2 - 4(\Delta X_{t_{i-1}})(\Delta X_{t_i})\,\kappa_2(\varepsilon) - 2(\Delta X_{t_i})\,\kappa_3(\varepsilon) + 2(\Delta X_{t_{i-1}})\,\kappa_3(\varepsilon), \qquad \text{(A.2)}
\end{aligned}$$

because of (A.1). Thus, summing the terms in (A.2) over $i$,

$$II_T = 2(n-1)\big(\kappa_4(\varepsilon) + 2(E\varepsilon^2)^2\big) - 8E\varepsilon^2 \sum_{t_{i+1}\le T} (\Delta X_{t_{i-1}})(\Delta X_{t_i}) - 4\kappa_3(\varepsilon)\big(X_{t_{n-1}} - X_{t_0}\big).$$

Amalgamating the two expressions, one obtains

$$\begin{aligned}
\mathrm{var}\big([Y,Y]^{(\mathrm{all})}_T \,\big|\, X\big)
&= n\big(2\kappa_4(\varepsilon) + 8(E\varepsilon^2)^2\big) + 8[X,X]^{(\mathrm{all})}_T E\varepsilon^2 + 2(n-1)\big(\kappa_4(\varepsilon) + 2(E\varepsilon^2)^2\big) \\
&\quad - 8E\varepsilon^2 \sum (\Delta X_{t_{i-1}})(\Delta X_{t_i}) - 4\kappa_3(\varepsilon)\big(X_{t_{n-1}} - X_{t_0}\big) \\
&= 4nE\varepsilon^4 + R_n, \qquad \text{(A.3)}
\end{aligned}$$

where the remainder term $R_n$ satisfies

$$\begin{aligned}
|R_n| &\le 8E\varepsilon^2 [X,X]_T + 2\big(\kappa_4(\varepsilon) + 2(E\varepsilon^2)^2\big) + 8E\varepsilon^2 \sum \big|\Delta X_{t_{i-1}}\big|\,\big|\Delta X_{t_i}\big| + 4|\kappa_3(\varepsilon)|\big(\big|X_{t_{n-1}}\big| + \big|X_{t_0}\big|\big) \\
&\le 16E\varepsilon^2 [X,X]^{(\mathrm{all})}_T + 2\big(\kappa_4(\varepsilon) + 2(E\varepsilon^2)^2\big) + 2|\kappa_3(\varepsilon)|\big(2 + [X,X]_T\big), \qquad \text{(A.4)}
\end{aligned}$$

by the Cauchy–Schwarz inequality and because $|x| \le (1 + x^2)/2$. Because $[X,X]^{(\mathrm{all})}_T = O_p(1)$, (19) follows.

Under slightly stronger conditions (e.g., $|\mu_t|$ and $\sigma_t$ are bounded above by a constant), $\sum (\Delta X_{t_{i-1}})(\Delta X_{t_i})$ is a near-martingale and of order $O_p(n^{-1/2})$, and similarly $X_{t_{n-1}} - X_{t_0} = O_p(n^{-1/2})$, from which (20) follows.
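The leading term in (A.3) is easy to verify by Monte Carlo. In the pure-noise case ($X \equiv 0$, so $[X,X]_T = 0$), the variance of the realized volatility should be close to $4nE\varepsilon^4$; the following simulation is our own construction, not from the paper:

```python
import numpy as np

rng = np.random.default_rng(1)
n, s, reps = 2000, 0.01, 2000
e4 = 3 * s ** 4                          # E eps^4 for Gaussian noise

# Realized volatility of pure noise, repeated over many sample paths
eps = s * rng.standard_normal((reps, n + 1))
rv = np.sum(np.diff(eps, axis=1) ** 2, axis=1)

print(rv.var())        # Monte Carlo variance of [Y,Y]^(all)
print(4 * n * e4)      # leading term 4 n E eps^4 from (A.3)
```

For Gaussian noise, $\kappa_4(\varepsilon) = 0$, and the exact conditional variance $12ns^4 - 4s^4$ agrees with $4nE\varepsilon^4 = 12ns^4$ up to the $O(1)$ remainder $R_n$.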

Similarly, to obtain a higher-order approximation to the variance of the estimator $\widehat{\langle X,X\rangle}_T$, note that

$$\widehat{\langle X,X\rangle}_T - \langle X,X\rangle_T = \Big([Y,Y]^{(\mathrm{avg})}_T - \tfrac{1}{K}[Y,Y]^{(\mathrm{all})}_T - [X,X]^{(\mathrm{avg})}_T\Big) + \Big([X,X]^{(\mathrm{avg})}_T - \langle X,X\rangle_T\Big). \qquad \text{(A.5)}$$

Now, recalling (20) and (37), and in addition

$$\mathrm{cov}\big([Y,Y]^{(\mathrm{all})}, [Y,Y]^{(\mathrm{avg})} \,\big|\, X\big) = 2\big(E\varepsilon^4 - (E\varepsilon^2)^2\big)\frac{2\bar n - 1}{K} + 8E\varepsilon^2 [X,X]^{(\mathrm{all})}_T\,\frac{1}{K} + o_p\Big(\frac{1}{K}\Big),$$

we see that

$$\mathrm{var}\big(\widehat{\langle X,X\rangle}_T \,\big|\, X\big) = \mathrm{var}\big([Y,Y]^{(\mathrm{avg})}_T \,\big|\, X\big) + \frac{1}{K^2}\,\mathrm{var}\big([Y,Y]^{(\mathrm{all})}_T \,\big|\, X\big) - \frac{2}{K}\,\mathrm{cov}\big([Y,Y]^{(\mathrm{avg})}_T, [Y,Y]^{(\mathrm{all})}_T \,\big|\, X\big)$$

1408 Journal of the American Statistical Association December 2005

$$\begin{aligned}
&= \frac{4\bar n}{K}E\varepsilon^4 + \frac{1}{K}\Big[8[X,X]^{(\mathrm{avg})}_T E\varepsilon^2 - 2\big(E\varepsilon^4 - (E\varepsilon^2)^2\big)\Big] + o_p\Big(\frac{1}{K}\Big) \\
&\quad + \frac{1}{K^2}\Big[4nE\varepsilon^4 + \Big(8[X,X]^{(\mathrm{all})}_T E\varepsilon^2 - 2\big(E\varepsilon^4 - (E\varepsilon^2)^2\big)\Big) + o_p(1)\Big] \\
&\quad - \frac{2}{K}\bigg[2\big(E\varepsilon^4 - (E\varepsilon^2)^2\big)\frac{2\bar n - 1}{K} + 8E\varepsilon^2 [X,X]^{(\mathrm{all})}_T\,\frac{1}{K} + o_p\Big(\frac{1}{K}\Big)\bigg] \\
&= \frac{\bar n}{K}\,8(E\varepsilon^2)^2 + \frac{1}{K}\Big[8[X,X]^{(\mathrm{avg})}_T E\varepsilon^2 - 2\big(E\varepsilon^4 - (E\varepsilon^2)^2\big)\Big] + o_p\Big(\frac{1}{K}\Big),
\end{aligned}$$

because at the optimal choice, $K = cn^{2/3}$ and $\bar n = c^{-1}n^{1/3}$. Therefore, (60) holds.

A.2 The Relevant Central Limit Theorem

Lemma A.2. Suppose that $X$ is an Itô process. Suppose that $Y$ is related to $X$ through model (4). Then, under assumption (11) and definitions (16) and (32),

$$[Y,Y]^{(\mathrm{all})}_T = [\varepsilon,\varepsilon]^{(\mathrm{all})}_T + O_p(1) \qquad \text{and} \qquad [Y,Y]^{(\mathrm{avg})}_T = [\varepsilon,\varepsilon]^{(\mathrm{avg})}_T + [X,X]^{(\mathrm{avg})}_T + O_p\Big(\frac{1}{\sqrt{K}}\Big).$$

Proof. (a) The one-grid case:

$$[Y,Y]^{(\mathrm{all})}_T = [X,X]^{(\mathrm{all})}_T + [\varepsilon,\varepsilon]^{(\mathrm{all})}_T + 2[X,\varepsilon]^{(\mathrm{all})}_T. \qquad \text{(A.6)}$$

We show that

$$E\Big(\big([X,\varepsilon]^{(\mathrm{all})}_T\big)^2 \,\Big|\, X\Big) = O_p(1), \qquad \text{(A.7)}$$

and in particular,

$$[X,\varepsilon]^{(\mathrm{all})}_T = O_p(1). \qquad \text{(A.8)}$$

To see (A.7),

$$[X,\varepsilon]^{(\mathrm{all})}_T = \sum_{i=0}^{n-1} (\Delta X_{t_i})(\Delta\varepsilon_{t_i}) = \sum_{i=0}^{n-1} (\Delta X_{t_i})\varepsilon_{t_{i+1}} - \sum_{i=0}^{n-1} (\Delta X_{t_i})\varepsilon_{t_i} = \sum_{i=1}^{n-1} \big(\Delta X_{t_{i-1}} - \Delta X_{t_i}\big)\varepsilon_{t_i} + \Delta X_{t_{n-1}}\varepsilon_{t_n} - \Delta X_{t_0}\varepsilon_{t_0}. \qquad \text{(A.9)}$$

Because $E\big([X,\varepsilon]^{(\mathrm{all})}_T \,\big|\, X\big) = 0$ and the $\varepsilon_{t_i}$ are iid for different $t_i$, we get

$$\begin{aligned}
E\Big(\big([X,\varepsilon]^{(\mathrm{all})}_T\big)^2 \,\Big|\, X\Big) &= \mathrm{var}\big([X,\varepsilon]^{(\mathrm{all})}_T \,\big|\, X\big) \\
&= E\varepsilon^2\Bigg[\sum_{i=1}^{n-1}\big(\Delta X_{t_{i-1}} - \Delta X_{t_i}\big)^2 + \big(\Delta X_{t_{n-1}}\big)^2 + \big(\Delta X_{t_0}\big)^2\Bigg] \\
&= 2[X,X]_T E\varepsilon^2 - 2E\varepsilon^2\sum_{i=1}^{n-1}\big(\Delta X_{t_{i-1}}\big)\big(\Delta X_{t_i}\big) \\
&\le 4[X,X]_T E\varepsilon^2, \qquad \text{(A.10)}
\end{aligned}$$

by the Cauchy–Schwarz inequality, from which, and from $[X,X]^{(\mathrm{all})}_T$ being of order $O_p(1)$, (A.7) follows. Hence (A.8) follows by the Markov inequality.

(b) The multiple-grid case. Notice that

$$[Y,Y]^{(\mathrm{avg})} = [X,X]^{(\mathrm{avg})}_T + [\varepsilon,\varepsilon]^{(\mathrm{avg})}_T + 2[X,\varepsilon]^{(\mathrm{avg})}_T; \qquad \text{(A.11)}$$

(A.11) strictly follows from model (4) and the definitions of grids and $[\cdot,\cdot]^{(\mathrm{avg})}_t$; see Section 3.2.

We need to show that

$$E\Big(\big([X,\varepsilon]^{(\mathrm{avg})}_T\big)^2 \,\Big|\, X\Big) = O_p\Big(\frac{1}{K}\Big), \qquad \text{(A.12)}$$

in particular,

$$[X,\varepsilon]^{(\mathrm{avg})}_T = O_p\Big(\frac{1}{K^{1/2}}\Big), \qquad \text{(A.13)}$$

and $\mathrm{var}\big([X,\varepsilon]^{(\mathrm{avg})}_T \,\big|\, X\big) = E\big[\big([X,\varepsilon]^{(\mathrm{avg})}_T\big)^2 \,\big|\, X\big]$. To show (A.12), note that $E\big([X,\varepsilon]^{(\mathrm{avg})}_T \,\big|\, X\big) = 0$ and

$$E\Big[\big([X,\varepsilon]^{(\mathrm{avg})}_T\big)^2 \,\Big|\, X\Big] = \mathrm{var}\big([X,\varepsilon]^{(\mathrm{avg})}_T \,\big|\, X\big) = \frac{1}{K^2}\sum_{k=1}^{K}\mathrm{var}\big([X,\varepsilon]^{(k)}_T \,\big|\, X\big) \le \frac{4E\varepsilon^2}{K}[X,X]^{(\mathrm{avg})}_T = O_p\Big(\frac{1}{K}\Big),$$

where the second equality follows from the disjointness of different grids, as well as $\varepsilon \perp X$. The inequality follows from the same argument as in (A.10). Then the order follows because $[X,X]^{(\mathrm{avg})}_T = O_p(1)$; see the method of Mykland and Zhang (2002) for a rigorous development for the order of $[X,X]^{(\mathrm{avg})}_T$.

Theorem A.1. Suppose that $X$ is an Itô process of form (1). Suppose that $Y$ is related to $X$ through model (4), and that (11) is satisfied with $E\varepsilon^4 < \infty$. Also suppose that $t_i$ and $t_{i+1}$ are not in the same subgrid for any $i$. Under assumption (33), as $n \to \infty$,

$$\Big(\sqrt{n}\big(\widehat{E\varepsilon^2} - E\varepsilon^2\big),\ \sqrt{\tfrac{K}{\bar n}}\big([Y,Y]^{(\mathrm{avg})}_T - [X,X]^{(\mathrm{avg})}_T - 2\bar n E\varepsilon^2\big)\Big)$$

converges in law to a bivariate normal, with mean 0 and covariance matrix

$$\begin{pmatrix} E\varepsilon^4 & 2\,\mathrm{var}(\varepsilon^2) \\ 2\,\mathrm{var}(\varepsilon^2) & 4E\varepsilon^4 \end{pmatrix}, \qquad \text{(A.14)}$$

conditional on the $X$ process, where the limiting random variable is independent of the $X$ process.

Proof. By Lemma A.2, we need the distribution of $[\varepsilon,\varepsilon]^{(\mathrm{avg})}$ and $[\varepsilon,\varepsilon]^{(\mathrm{all})}$. First, we explore the convergence of

$$\frac{1}{\sqrt{n}}\Big([\varepsilon,\varepsilon]^{(\mathrm{all})}_T - 2nE\varepsilon^2,\ [\varepsilon,\varepsilon]^{(\mathrm{avg})}_T K - 2\bar n K E\varepsilon^2\Big). \qquad \text{(A.15)}$$

Recall that all of the sampling points $t_0, t_1, \ldots, t_n$ are within $[0,T]$. We use $\mathcal{G}$ to denote the time points in the full sampling, as in the single grid; $\mathcal{G}^{(k)}$ denotes the subsampling from the $k$th grid. As before, if $t_i \in \mathcal{G}^{(k)}$, then $t_{i-}$ and $t_{i+}$ are the previous and next elements in $\mathcal{G}^{(k)}$; $\varepsilon_{t_{i-}} = 0$ for $t_i = \min\mathcal{G}^{(k)}$, and $\varepsilon_{t_{i+}} = 0$ for $t_i = \max\mathcal{G}^{(k)}$.

Set

$$M^{(1)}_T = \frac{1}{\sqrt{n}}\sum_{t_i\in\mathcal{G}}\big(\varepsilon^2_{t_i} - E\varepsilon^2\big), \qquad M^{(2)}_T = \frac{1}{\sqrt{n}}\sum_{t_i\in\mathcal{G}}\varepsilon_{t_i}\varepsilon_{t_{i-1}}, \qquad M^{(3)}_T = \frac{1}{\sqrt{n}}\sum_{k=1}^{K}\sum_{t_i\in\mathcal{G}^{(k)}}\varepsilon_{t_i}\varepsilon_{t_{i-}}. \qquad \text{(A.16)}$$


We first find the asymptotic distribution of $(M^{(1)}_T, M^{(2)}_T, M^{(3)}_T)$ using the martingale central limit theorem; then we use the result to find the limit of (A.15).

Note that $(M^{(1)}_T, M^{(2)}_T, M^{(3)}_T)$ are the end points of martingales with respect to the filtration $\mathcal{F}_i = \sigma(\varepsilon_{t_j}, j \le i; X_t, \text{all } t)$. We now derive its (discrete-time) predictable quadratic variation $\langle M^{(l)}, M^{(k)}\rangle$, $l,k = 1,2,3$ [discrete-time predictable quadratic variations are only used in this proof and the proof of Lemma 1, and are different from the continuous-time quadratic variations in (12)]:

$$\begin{aligned}
\big\langle M^{(1)}, M^{(1)}\big\rangle_T &= \frac{1}{n}\sum_{t_i\in\mathcal{G}}\mathrm{var}\big(\varepsilon^2_{t_i} - E\varepsilon^2 \,\big|\, \mathcal{F}_{i-1}\big) = \mathrm{var}(\varepsilon^2), \\
\big\langle M^{(2)}, M^{(2)}\big\rangle_T &= \frac{1}{n}\sum_{t_i\in\mathcal{G}}\mathrm{var}\big(\varepsilon_{t_i}\varepsilon_{t_{i-1}} \,\big|\, \mathcal{F}_{i-1}\big) = \frac{E\varepsilon^2}{n}\sum_{t_i\in\mathcal{G}}\varepsilon^2_{t_{i-1}} = (E\varepsilon^2)^2 + o_p(1), \qquad \text{(A.17)} \\
\big\langle M^{(3)}, M^{(3)}\big\rangle_T &= \frac{1}{n}\sum_{k=1}^{K}\sum_{t_i\in\mathcal{G}^{(k)}}\mathrm{var}\big(\varepsilon_{t_i}\varepsilon_{t_{i-}} \,\big|\, \mathcal{F}_{i-1}\big) = \frac{E\varepsilon^2}{n}\sum_{k=1}^{K}\sum_{t_i\in\mathcal{G}^{(k)}}\varepsilon^2_{t_{i-}} = (E\varepsilon^2)^2 + o_p(1),
\end{aligned}$$

by the law of large numbers. Similarly, for the predictable quadratic covariations,

$$\begin{aligned}
\big\langle M^{(1)}, M^{(2)}\big\rangle_T &= \frac{1}{n}\sum_{t_i\in\mathcal{G}}\mathrm{cov}\big(\varepsilon^2_{t_i} - E\varepsilon^2,\ \varepsilon_{t_i}\varepsilon_{t_{i-1}} \,\big|\, \mathcal{F}_{i-1}\big) = E\varepsilon^3\,\frac{1}{n}\sum_{t_i\in\mathcal{G}}\varepsilon_{t_{i-1}} = o_p(1), \\
\big\langle M^{(1)}, M^{(3)}\big\rangle_T &= \frac{1}{n}\sum_{k=1}^{K}\sum_{t_i\in\mathcal{G}^{(k)}}\mathrm{cov}\big(\varepsilon^2_{t_i} - E\varepsilon^2,\ \varepsilon_{t_i}\varepsilon_{t_{i-}} \,\big|\, \mathcal{F}_{i-1}\big) = E\varepsilon^3\,\frac{1}{n}\sum_{k=1}^{K}\sum_{t_i\in\mathcal{G}^{(k)}}\varepsilon_{t_{i-}} = o_p(1), \qquad \text{(A.18)} \\
\big\langle M^{(2)}, M^{(3)}\big\rangle_T &= \frac{1}{n}\sum_{k=1}^{K}\sum_{t_i\in\mathcal{G}^{(k)}}\mathrm{cov}\big(\varepsilon_{t_i}\varepsilon_{t_{i-1}},\ \varepsilon_{t_i}\varepsilon_{t_{i-}} \,\big|\, \mathcal{F}_{i-1}\big) = \frac{E\varepsilon^2}{n}\sum_{k=1}^{K}\sum_{t_i\in\mathcal{G}^{(k)}}\varepsilon_{t_{i-1}}\varepsilon_{t_{i-}} = o_p(1),
\end{aligned}$$

because $t_{i+1}$ is not in the same grid as $t_i$. Because the $\varepsilon_{t_i}$'s are iid and $E\varepsilon^4_{t_i} < \infty$, the conditional Lindeberg conditions are satisfied. Hence, by the martingale central limit theorem (see Hall and Heyde 1980, condition 3.1, p. 58), $(M^{(1)}, M^{(2)}, M^{(3)})$ are asymptotically normal, with covariance matrix given by the asymptotic values of $\langle M^{(l)}, M^{(k)}\rangle$. In other words, asymptotically, $(M^{(1)}, M^{(2)}, M^{(3)})$ are independent normal with respective variances $\mathrm{var}(\varepsilon^2)$, $(E\varepsilon^2)^2$, and $(E\varepsilon^2)^2$.

Returning to (A.15), we have that

$$\begin{aligned}
[\varepsilon,\varepsilon]^{(\mathrm{all})} - 2nE\varepsilon^2
&= 2\sum_{i\ne 0,n}\big(\varepsilon^2_{t_i} - E\varepsilon^2\big) + \big(\varepsilon^2_{t_0} - E\varepsilon^2\big) + \big(\varepsilon^2_{t_n} - E\varepsilon^2\big) - 2\sum_{t_i>0}\varepsilon_{t_i}\varepsilon_{t_{i-1}} \\
&= 2\sqrt{n}\big(M^{(1)} - M^{(2)}\big) + O_p(1). \qquad \text{(A.19)}
\end{aligned}$$

Meanwhile,

$$\begin{aligned}
[\varepsilon,\varepsilon]^{(k)} - 2n_k E\varepsilon^2
&= \sum_{t_i\in\mathcal{G}^{(k)},\, t_i\ne\max\mathcal{G}^{(k)}}\big(\varepsilon_{t_{i+}} - \varepsilon_{t_i}\big)^2 - 2n_k E\varepsilon^2 \\
&= 2\sum_{t_i\in\mathcal{G}^{(k)}}\big(\varepsilon^2_{t_i} - E\varepsilon^2\big) - \big(\varepsilon^2_{\min\mathcal{G}^{(k)}} - E\varepsilon^2\big) - \big(\varepsilon^2_{\max\mathcal{G}^{(k)}} - E\varepsilon^2\big) - 2\sum_{t_i\in\mathcal{G}^{(k)}}\varepsilon_{t_i}\varepsilon_{t_{i-}}, \qquad \text{(A.20)}
\end{aligned}$$

where $n_k + 1$ represents the total number of sampling points in $\mathcal{G}^{(k)}$. Hence

$$[\varepsilon,\varepsilon]^{(\mathrm{avg})}_T K - 2\bar n E\varepsilon^2 K = \sqrt{n}\big(2M^{(1)} - 2M^{(3)}\big) - R = 2\sqrt{n}\big(M^{(1)} - M^{(3)}\big) + O_p(K^{1/2}), \qquad \text{(A.21)}$$

because $R = \sum_{k=1}^{K}\big[\big(\varepsilon^2_{\min\mathcal{G}^{(k)}} - E\varepsilon^2\big) + \big(\varepsilon^2_{\max\mathcal{G}^{(k)}} - E\varepsilon^2\big)\big]$ satisfies $ER^2 = \mathrm{var}(R) \le 4K\,\mathrm{var}(\varepsilon^2)$.

Because $n^{-1}K \to 0$, and because the error terms in (A.19) and (A.20) are uniformly integrable, it follows that

$$\text{(A.15)} = 2\big(M^{(1)} - M^{(2)},\ M^{(1)} - M^{(3)}\big) + o_p(1). \qquad \text{(A.22)}$$

Hence (A.15) is also asymptotically normal, with covariance matrix

$$\begin{pmatrix} 4E\varepsilon^4 & 4\,\mathrm{var}(\varepsilon^2) \\ 4\,\mathrm{var}(\varepsilon^2) & 4E\varepsilon^4 \end{pmatrix}.$$

By Lemma A.2, and as $n^{-1}K \to 0$,

$$\frac{1}{\sqrt{n}}\Big([Y,Y]^{(\mathrm{all})}_T - 2nE\varepsilon^2,\ K\big([Y,Y]^{(\mathrm{avg})}_T - [X,X]^{(\mathrm{avg})}_T - 2\bar n E\varepsilon^2\big)\Big)\Big|\,X$$

is asymptotically normal:

$$\frac{1}{\sqrt{n}}\begin{pmatrix} [Y,Y]^{(\mathrm{all})}_T - 2nE\varepsilon^2 \\ [Y,Y]^{(\mathrm{avg})}_T K - 2\bar n K E\varepsilon^2 - [X,X]^{(\mathrm{avg})}_T K \end{pmatrix}\Bigg|\,X = 2\begin{pmatrix} M^{(1)} - M^{(2)} \\ M^{(1)} - M^{(3)} \end{pmatrix} + o_p(1) \stackrel{\mathcal{L}}{\longrightarrow} 2N\Bigg(0,\ \begin{pmatrix} E\varepsilon^4 & \mathrm{var}(\varepsilon^2) \\ \mathrm{var}(\varepsilon^2) & E\varepsilon^4 \end{pmatrix}\Bigg). \qquad \text{(A.23)}$$

Because

$$\widehat{E\varepsilon^2} = \frac{1}{2n}[Y,Y]^{(\mathrm{all})}_T \qquad \text{and} \qquad \frac{K}{\sqrt{n}} = \sqrt{\frac{K}{\bar n}}\,\big(1 + o(1)\big), \qquad \text{(A.24)}$$

Theorem A.1 follows.

Proof of Lemma 1. For simplicity, and without loss of generality, we assume that $\mathcal{H} = \mathcal{G}$. The result follows the proofs of Lemma A.2 and Theorem A.1, but with a more exact representation of $[Y,Y]^{(\mathrm{all})}_T - [X,X]^{(\mathrm{all})}_T$. Specifically, from (A.9) and (A.19), we have that

$$\begin{aligned}
[Y,Y]^{(\mathrm{all})}_T - [X,X]^{(\mathrm{all})}_T - 2nE\varepsilon^2
&= [\varepsilon,\varepsilon]_T - 2nE\varepsilon^2 + 2[X,\varepsilon]^{(\mathrm{all})}_T \\
&= 2\sum_{i\ne 0,n}\big(\varepsilon^2_{t_i} - E\varepsilon^2\big) + \big(\varepsilon^2_{t_0} - E\varepsilon^2\big) + \big(\varepsilon^2_{t_n} - E\varepsilon^2\big) - 2\sum_{t_i>0}\varepsilon_{t_i}\varepsilon_{t_{i-1}} \\
&\quad + 2\sum_{i=1}^{n-1}\big(\Delta X_{t_{i-1}} - \Delta X_{t_i}\big)\varepsilon_{t_i} + 2\Delta X_{t_{n-1}}\varepsilon_{t_n} - 2\Delta X_{t_0}\varepsilon_{t_0}.
\end{aligned}$$

With the same filtration as in the proof of Theorem A.1, this is the sum of a martingale triangular array with increment $2(\varepsilon^2_{t_i} - E\varepsilon^2) - 2\varepsilon_{t_i}\varepsilon_{t_{i-1}} + 2(\Delta X_{t_{i-1}} - \Delta X_{t_i})\varepsilon_{t_i}$ for $i \ne 0,n$. Again, the conditions of the martingale central limit theorem (see Hall and Heyde 1980, chap. 3) are satisfied by (17), whence the term is asymptotically normal, conditionally on the $X$ process. The conditional variance is given by (20), under the conditions mentioned in the statement of the lemma. Because $E\varepsilon^2$ can vary with $n$, the $O_p(n^{-1/2}_{\mathrm{sparse}})$ term in (20) is actually of order $O_p(n^{-1/2}_{\mathrm{sparse}}E\varepsilon^2)$.

A.3 Asymptotics of $D_T$

For transparency of notation, we take $\Delta t = T/n$; in other words, the average of the $\Delta t_i$.

Proof of Theorem 2. Note first that because we assume that $\mu_t$ and $\sigma_t$ are continuous, they are locally bounded. Because here we seek to show convergence in probability, we can therefore assume, without loss of generality, that $|\mu_t|$ is bounded above by a nonrandom constant, and also that $\infty > \sigma^+ \ge \sigma_t \ge \sigma^- > 0$, where $\sigma^+$ and $\sigma^-$ are nonrandom. This is by the usual stopping time argument.

Also note that once $\mu_t$ and $\sigma_t$ are bounded, as we have now assumed, by Girsanov's theorem (see, e.g., Karatzas and Shreve 1991, chap. 3.5; Jacod and Shiryaev 2003, chap. II.3b), we can, without loss of generality, further suppose that $\mu_t = 0$ identically. (If the convergence in probability holds under the equivalent probability measure, then it also will hold under the original one.)

Now for the derivation. First consider

$$\big(X_{t_i} - X_{t_{i-}}\big)^2 = \Bigg(\sum_{t_j=t_{i-}}^{t_{i-1}}\Delta X_{t_j}\Bigg)^2 = \sum_{t_j=t_{i-}}^{t_{i-1}}\big(\Delta X_{t_j}\big)^2 + 2\sum_{t_{i-1}\ge t_k > t_l \ge t_{i-}}\Delta X_{t_k}\,\Delta X_{t_l}.$$

It follows that

$$[X,X]^{(\mathrm{avg})}_T = [X,X]^{(\mathrm{all})}_T + 2\sum_{i=1}^{n-1}\big(\Delta X_{t_i}\big)\sum_{j=1}^{K\wedge i}\Big(1 - \frac{j}{K}\Big)\big(\Delta X_{t_{i-j}}\big) + O_p(K/n).$$

The remainder term incorporates end effects and has order $O_p(K/n)$ by (41) and (42), and due to the regular allocation of points to grids.

Following Jacod and Protter (1998) and Mykland and Zhang (2002), $[X,X]^{(\mathrm{all})}_T - \langle X,X\rangle_T = O_p(1/\sqrt{n}\,) = o_p(\sqrt{K/n}\,)$, too, under the assumption (17) [which is a special case of assumption (41)] and that $\inf_{t\in[0,T]}\sigma^2_t > 0$ almost surely. In other words,

$$D_T = \big([X,X]^{(\mathrm{avg})}_T - \langle X,X\rangle_T\big) = 2\sum_{i=1}^{n-1}\big(\Delta X_{t_i}\big)\sum_{j=1}^{K\wedge i}\Big(1 - \frac{j}{K}\Big)\big(\Delta X_{t_{i-j}}\big) + o_p\big(\sqrt{K\Delta t}\,\big). \qquad \text{(A.25)}$$

It follows that

$$\langle D,D\rangle_T = 4\sum_{i=1}^{n-1}\big(\Delta\langle X,X\rangle_{t_i}\big)\Bigg(\sum_{j=1}^{K\wedge i}\Big(1 - \frac{j}{K}\Big)\big(\Delta X_{t_{i-j}}\big)\Bigg)^2 + o_p(K\Delta t) = \mathrm{(I)} + \mathrm{(II)} + o_p(K\Delta t), \qquad \text{(A.26)}$$

where

$$\begin{aligned}
\mathrm{(I)} &= 4\sum_{i=1}^{n-1}\big(\Delta\langle X,X\rangle_{t_i}\big)\Bigg(\sum_{j=1}^{K\wedge i}\Big(1 - \frac{j}{K}\Big)^2\big(\Delta X_{t_{i-j}}\big)^2\Bigg) \\
&= 4\sum_{i=1}^{n-1}\sigma^4_{t_i}\,\Delta t_i\Bigg(\sum_{j=1}^{K\wedge i}\Big(1 - \frac{j}{K}\Big)^2\Delta t_{i-j}\Bigg) + o_p(K\Delta t) \\
&= K\Delta t\sum_{i=1}^{n-1}\sigma^4_{t_i}\,h_i\,\Delta t_i + o_p(K\Delta t),
\end{aligned}$$

where $h_i$ is given by (43). Thus, Theorem 2 will have been shown if we can show that

$$\mathrm{(II)} = 8\sum_{i=1}^{n-1}\Delta\langle X,X\rangle_{t_i}\,\zeta_i \qquad \text{(A.27)}$$

is of order $o_p(K\Delta t)$, where

$$\zeta_i = \sum_{i-1\ge l > r \ge 0}\big(\Delta X_{t_l}\big)\big(\Delta X_{t_r}\big)\Big(1 - \frac{i-l}{K}\Big)^+\Big(1 - \frac{i-r}{K}\Big)^+. \qquad \text{(A.28)}$$

We do this as follows. Denote $\delta^+ = \max_i|\Delta t_i| = O(\frac{1}{n})$, and set $\mathrm{(II)}' = 8\sum_{i=1}^{n-1}\Delta t_i\,\sigma^2_{t_{i-K}}\,\zeta_i$. Then, by Hölder's inequality,

$$E\big|\mathrm{(II)} - \mathrm{(II)}'\big| \le \delta^+ n \sup_{|t-s|\le(K+1)\delta^+}\big\|\sigma^2_t - \sigma^2_s\big\|_2\,\sup_i\|\zeta_i\|_2 = o(\delta^+ K),$$

because of the boundedness and continuity of $\sigma_t$, and because, applying the Burkholder–Davis–Gundy inequality twice, we have that

$$\begin{aligned}
E\big(\zeta^2_i\big) &\le E\sum_{l=1}^{i-1}\Delta\langle X,X\rangle_{t_l}\Bigg(\sum_{r\ge 0}^{(l-1)\wedge(i-1)}\big(\Delta X_{t_r}\big)\Big(1 - \frac{i-l}{K}\Big)^+\Big(1 - \frac{i-r}{K}\Big)^+\Bigg)^2 \\
&\le \delta^+(\sigma^+)^2\sum_{l=1}^{i-1}E\Bigg(\sum_{r\ge 0}^{(l-1)\wedge(i-1)}\big(\Delta X_{t_r}\big)\Big(1 - \frac{i-l}{K}\Big)^+\Big(1 - \frac{i-r}{K}\Big)^+\Bigg)^2 \\
&\le (\delta^+)^2(\sigma^+)^4\sum_{l=1}^{i-1}\sum_{r\ge 0}^{(l-1)\wedge(i-1)}\Bigg(\Big(1 - \frac{i-l}{K}\Big)^+\Big(1 - \frac{i-r}{K}\Big)^+\Bigg)^2 \\
&= O\big((\delta^+ K)^2\big) \quad \text{uniformly in } i, \qquad \text{(A.29)}
\end{aligned}$$

where the second-to-last transition does for the inner sum what the first two inequalities do for the outer sum. Now rewrite

$$\mathrm{(II)}' = 8\sum_{l>r\ge 0}^{n-1}\big(\Delta X_{t_l}\big)\big(\Delta X_{t_r}\big)\sum_{i=l+1}^{n-1}\Delta t_i\,\sigma^2_{t_{i-K}}\Big(1 - \frac{i-l}{K}\Big)^+\Big(1 - \frac{i-r}{K}\Big)^+. \qquad \text{(A.30)}$$

In the same way as in (A.29), we then obtain

$$\begin{aligned}
E\big(\mathrm{(II)}'\big)^2 &\le 64(\delta^+)^2(\sigma^+)^4\sum_{l>r\ge 0}^{n-1}E\Bigg(\sum_{i=l+1}^{n-1}\Delta t_i\,\sigma^2_{t_{i-K}}\Big(1 - \frac{i-l}{K}\Big)^+\Big(1 - \frac{i-r}{K}\Big)^+\Bigg)^2 \\
&\le 64(\delta^+)^2(\sigma^+)^4 \times (\delta^+)^2(\sigma^+)^4\sum_{l>r\ge 0}^{n-1}\Bigg(\sum_{i=l+1}^{n-1}\Big(1 - \frac{i-l}{K}\Big)^+\Big(1 - \frac{i-r}{K}\Big)^+\Bigg)^2 \\
&\le 64(\delta^+)^4(\sigma^+)^8 nK^3 = O\big((\delta^+ K)^3\big) = o\big((\delta^+ K)^2\big), \qquad \text{(A.31)}
\end{aligned}$$

by assumptions (41) and (42). The transition from the second to the last line in (A.31) is due to the fact that $\sum_{i=l+1}^{n-1}(1 - \frac{i-l}{K})^+(1 - \frac{i-r}{K})^+ \le K$, and $= 0$ when $l - r \ge K$. Thus we have shown that $\mathrm{(II)} = o_p(K\Delta t)$, and hence Theorem 2 has been established.

We now proceed to the asymptotic distribution of $D_T$. We first state a technical condition on the filtration $(\mathcal{F}_t)_{0\le t\le T}$, to which $X_t$ and $\mu_t$ (but not the $\varepsilon$'s) are assumed to be adapted.

We now proceed to the asymptotic distribution of DT We first statea technical condition on the filtration (Ft)0letleT to which Xt and microt(but not the εprimersquos) are assumed to be adapted

Condition E (Description of the filtration) There is a continuousmultidimensional P-local martingale X = (X (1) X (p)) any pso that Ft is the smallest sigma-field containing σ(Xs s le t) and N where N contains all of the null sets in σ(Xs s le T)

For example X can be a collection of Brownian motions

Proof of Theorem 3. We can show, by methods similar to those in the proof of Theorem 2, that if $L$ is any martingale adapted to the filtration generated by $\mathcal{X}$, then

$$\sup_t\bigg|\frac{1}{\Delta t K}\langle D,L\rangle_t\bigg| \stackrel{p}{\to} 0. \qquad \text{(A.32)}$$

The stable convergence with respect to the filtration $(\mathcal{F}_t)_{0\le t\le T}$ then follows in view of Rootzén (1980) or Jacod and Protter (1998). This ends the proof of Theorem 3.

Finally, in the case where $\eta^2_n$ does not converge, one can still use the mixed normal with variance $\eta^2_n$. This is because every subsequence of $\eta^2_n$ has a further subsequence that does converge in probability to some $\eta^2$, and hence for which the assumption (48) in Theorem 3 would be satisfied. The reason for this is that one can define the distribution function of a finite measure by

$$G_n(t) = \sum_{t_{i+1}\le t} h_i\,\Delta t_i. \qquad \text{(A.33)}$$

Because $G_n(t) \le T\sup_i h_i$, it follows from (45) that the sequence $G_n$ is weakly compact in the sense of weak convergence; see Helly's theorem (e.g., Billingsley 1995, p. 336). For any convergent subsequence $G_n \to G$, we then get that

$$\eta^2_n = \int_0^T \sigma^4_t\,dG_n(t) \to \int_0^T \sigma^4_t\,dG(t), \qquad \text{(A.34)}$$

almost surely, because we have assumed that $\langle X,X\rangle'_t$ is a continuous function of $t$. One then defines $\eta^2$ to be the (subsequence-dependent) right side of (A.34). To proceed further with the asymptotics, continue the foregoing subsequence, and note that

$$\frac{1}{\Delta t K}\langle D,D\rangle_t \approx \int_0^t \sigma^4_s\,dG_n(s) \to \int_0^t \sigma^4_s\,dG(s),$$

completing the proof.

[Received October 2003. Revised December 2004.]

REFERENCES

Aït-Sahalia, Y., Mykland, P. A., and Zhang, L. (2005), "How Often to Sample a Continuous-Time Process in the Presence of Market Microstructure Noise," Review of Financial Studies, 18, 351–416.

Aldous, D. J., and Eagleson, G. K. (1978), "On Mixing and Stability of Limit Theorems," The Annals of Probability, 6, 325–331.

Andersen, T. G., Bollerslev, T., Diebold, F. X., and Labys, P. (2001), "The Distribution of Exchange Rate Realized Volatility," Journal of the American Statistical Association, 96, 42–55.

Bandi, F. M., and Russell, J. R. (2003), "Microstructure Noise, Realized Volatility, and Optimal Sampling," technical report, University of Chicago, Graduate School of Business.

Barndorff-Nielsen, O. E., and Shephard, N. (2002), "Econometric Analysis of Realized Volatility and Its Use in Estimating Stochastic Volatility Models," Journal of the Royal Statistical Society, Ser. B, 64, 253–280.

Billingsley, P. (1995), Probability and Measure (3rd ed.), New York: Wiley.

Brown, S. J. (1990), "Estimating Volatility," in Financial Options: From Theory to Practice, eds. S. Figlewski, W. Silber, and M. Subrahmanyam, Homewood, IL: Business One-Irwin, pp. 516–537.

Chernov, M., and Ghysels, E. (2000), "A Study Towards a Unified Approach to the Joint Estimation of Objective and Risk Neutral Measures for the Purpose of Options Valuation," Journal of Financial Economics, 57, 407–458.

Dacorogna, M. M., Gençay, R., Müller, U., Olsen, R. B., and Pictet, O. V. (2001), An Introduction to High-Frequency Finance, San Diego: Academic Press.

Gallant, A. R., Hsu, C.-T., and Tauchen, G. T. (1999), "Using Daily Range Data to Calibrate Volatility Diffusions and Extract the Forward Integrated Variance," The Review of Economics and Statistics, 81, 617–631.

Gloter, A. (2000), "Estimation des Paramètres d'une Diffusion Cachée," unpublished doctoral thesis, Université de Marne-la-Vallée.

Hall, P., and Heyde, C. C. (1980), Martingale Limit Theory and Its Application, Boston: Academic Press.

Hansen, P. R., and Lunde, A. (2004a), "Realized Variance and IID Market Microstructure Noise," technical report, Brown University, Dept. of Economics.

——— (2004b), "An Unbiased Measure of Realized Variance," technical report, Brown University, Dept. of Economics.

Heston, S. (1993), "A Closed-Form Solution for Options With Stochastic Volatility, With Applications to Bonds and Currency Options," Review of Financial Studies, 6, 327–343.

Hull, J., and White, A. (1987), "The Pricing of Options on Assets With Stochastic Volatilities," Journal of Finance, 42, 281–300.

Jacod, J., and Protter, P. (1998), "Asymptotic Error Distributions for the Euler Method for Stochastic Differential Equations," The Annals of Probability, 26, 267–307.

Jacod, J., and Shiryaev, A. N. (2003), Limit Theorems for Stochastic Processes (2nd ed.), New York: Springer-Verlag.

Karatzas, I., and Shreve, S. E. (1991), Brownian Motion and Stochastic Calculus, New York: Springer-Verlag.

Mykland, P. A., and Zhang, L. (2002), "ANOVA for Diffusions," technical report, University of Chicago, Dept. of Statistics.

O'Hara, M. (1995), Market Microstructure Theory, Cambridge, MA: Blackwell.

Oomen, R. (2004), "Properties of Realized Variance for a Pure Jump Process: Calendar Time Sampling versus Business Time Sampling," technical report, University of Warwick, Warwick Business School.

Rényi, A. (1963), "On Stable Sequences of Events," Sankhyā, Ser. A, 25, 293–302.

Rootzén, H. (1980), "Limit Distributions for the Error in Approximations of Stochastic Integrals," The Annals of Probability, 8, 241–251.

Zhou, B. (1996), "High-Frequency Data and Volatility in Foreign-Exchange Rates," Journal of Business & Economic Statistics, 14, 45–52.

Zumbach, G., Corsi, F., and Trapletti, A. (2002), "Efficient Estimation of Volatility Using High-Frequency Data," technical report, Olsen & Associates.



Unlike all of the previously considered estimators, this estimator is now correctly centered at $\langle X,X\rangle_T$. It converges only at rate $n^{-1/6}$, but from an MSE perspective, this is better than being (badly) biased. In particular, the prescription is to use all the available data, so there is no limit to how often one should sample: every few seconds or more often would be typical in tick-by-tick financial data, so $n$ can be quite large. Based on Section 4.2, we use an estimate of the optimal value

$$c^* = \bigg(\frac{T}{12(E\varepsilon^2)^2}\int_0^T \sigma^4_t\,dt\bigg)^{-1/3}. \qquad \text{(10)}$$
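With Eq. (10) in hand, the optimal number of subgrids $K^* = c^* n^{2/3}$ is a one-line computation. A sketch with entirely hypothetical inputs (one trading day, constant volatility $\sigma = 0.3$, so integrated quarticity $\sigma^4 T$, and noise variance $10^{-4}$):

```python
def optimal_K(n, T, noise_var, integrated_quarticity):
    """Optimal number of subgrids K* = c* n^(2/3), with c* from Eq. (10)."""
    c_star = (T / (12 * noise_var ** 2) * integrated_quarticity) ** (-1 / 3)
    return c_star, max(1, round(c_star * n ** (2 / 3)))

c_star, K = optimal_K(n=23400, T=1.0, noise_var=1e-4,
                      integrated_quarticity=0.3 ** 4)
print(c_star, K)   # small constant c*, K on the order of a few dozen
```

In practice, $E\varepsilon^2$ and $\int_0^T \sigma^4_t\,dt$ are unknown and must themselves be estimated, as discussed in the text.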

2. ANALYSIS OF REALIZED VOLATILITY UNDER MARKET MICROSTRUCTURE NOISE

2.1 Setup

To spell out the foregoing model, we let $Y$ be the logarithm of the transaction price, which is observed at times $0 = t_0, t_1, \ldots, t_n = T$. We assume that at these times, $Y$ is related to a latent true price $X$ (also in logarithmic scale) through (4). The latent price $X$ is given in (1). The noise $\varepsilon_{t_i}$ satisfies the assumption

$$\varepsilon_{t_i} \ \text{iid with} \ E\varepsilon_{t_i} = 0 \ \text{and} \ \mathrm{var}(\varepsilon_{t_i}) = E\varepsilon^2; \ \text{also} \ \varepsilon \perp X \ \text{process}, \qquad \text{(11)}$$

where $\perp$ denotes independence between two random quantities. Our modeling, as in (4), does not require that $\varepsilon_t$ exists for every $t$; in other words, our interest in the noise is only at the observation times $t_i$.

For the moment, we focus on determining the integrated volatility of $X$ for one time period $[0,T]$. This is also known as the continuous quadratic variation $\langle X,X\rangle$ of $X$. In other words,

$$\langle X,X\rangle_T = \int_0^T \sigma^2_t\,dt. \qquad \text{(12)}$$

To succinctly describe the realized volatility, we use the notions of grid and observed quadratic variation, as follows.

Definition 1. The full grid containing all of the observation points is given by

$$\mathcal{G} = \{t_0, \ldots, t_n\}. \qquad \text{(13)}$$

Definition 2. We also consider arbitrary grids $\mathcal{H} \subseteq \mathcal{G}$. To denote successive elements in such grids, we proceed as follows. If $t_i \in \mathcal{H}$, then $t_{i-}$ and $t_{i+}$ denote the preceding and following elements in $\mathcal{H}$. $t_i$ will always denote the $i$th point in the full grid $\mathcal{G}$. Hence, when $\mathcal{H} = \mathcal{G}$, $t_{i-} = t_{i-1}$ and $t_{i+} = t_{i+1}$. When $\mathcal{H}$ is a strict subgrid of $\mathcal{G}$, it will normally be the case that $t_{i-} < t_{i-1}$ and $t_{i+} > t_{i+1}$. Finally, we take

$$|\mathcal{H}| = (\#\ \text{points in grid}\ \mathcal{H}) - 1. \qquad \text{(14)}$$

This is to say that $|\mathcal{H}|$ is the number of time increments $(t_i, t_{i+}]$ so that both endpoints are contained in $\mathcal{H}$. In particular, $|\mathcal{G}| = n$.

Definition 3. The observed quadratic variation $[\cdot,\cdot]$ for a generic process $Z$ (such as $Y$ or $X$) on an arbitrary grid $\mathcal{H} \subseteq \mathcal{G}$ is given by

$$[Z,Z]^{\mathcal{H}}_t = \sum_{t_j, t_{j+}\in\mathcal{H},\ t_{j+}\le t}\big(Z_{t_{j+}} - Z_{t_j}\big)^2. \qquad \text{(15)}$$

When the choice of grid follows from the context, $[Z,Z]^{\mathcal{H}}_t$ may be denoted as just $[Z,Z]_t$. On the full grid $\mathcal{G}$, the quadratic variation is given by

$$[Z,Z]^{(\mathrm{all})}_t = [Z,Z]^{\mathcal{G}}_t = \sum_{t_i, t_{i+1}\in\mathcal{G},\ t_{i+1}\le t}\big(\Delta Z_{t_i}\big)^2, \qquad \text{(16)}$$

where $\Delta Z_{t_i} = Z_{t_{i+1}} - Z_{t_i}$. Quadratic covariations are defined similarly (see, e.g., Karatzas and Shreve 1991; Jacod and Shiryaev 2003 for more details on quadratic variations).
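Definition 3 translates directly into code: with the full grid stored as an array of observations, a subgrid can be represented by the indices it keeps, and the observed quadratic variation is the sum of squared increments along that subgrid. A minimal sketch (our own naming):

```python
import numpy as np

def observed_qv(z, idx=None):
    """Observed quadratic variation [Z,Z]^H_T of Eq. (15).

    z   : array of observations Z_{t_0}, ..., Z_{t_n} on the full grid G
    idx : sorted indices defining the subgrid H (default: all of G, Eq. 16)
    """
    sub = z if idx is None else z[np.asarray(idx)]
    return np.sum(np.diff(sub) ** 2)

z = np.array([0.0, 0.1, -0.05, 0.2, 0.15])
print(observed_qv(z))               # full grid: 0.01 + 0.0225 + 0.0625 + 0.0025
print(observed_qv(z, [0, 2, 4]))    # subgrid {t_0, t_2, t_4}: 0.0025 + 0.04
```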

Our first objective is to assess how well the realized volatility $[Y,Y]^{(\mathrm{all})}_T$ approximates the integrated volatility $\langle X,X\rangle_T$ of the true, latent process. In our asymptotic considerations, we always assume that the number of observations in $[0,T]$ goes to infinity, and also that the maximum distance in time between two consecutive observations goes to 0:

$$\max_i \Delta t_i \to 0 \quad \text{as } n \to \infty. \qquad \text{(17)}$$

For the sake of preciseness, it should be noted that when $n \to \infty$, we are dealing with a sequence $\mathcal{G}_n$ of grids, $\mathcal{G}_n = \{t_{0,n}, \ldots, t_{n,n}\}$, and similarly for subgrids. We have avoided using double subscripts so as to not complicate the notation, but this is how all our conditions and results should be interpreted.

Note, finally, that we have adhered to the time series convention that $\Delta Z_{t_i} = Z_{t_{i+1}} - Z_{t_i}$. This is in contrast to the stochastic calculus convention $\Delta Z_{t_i} = Z_{t_i} - Z_{t_{i-1}}$.

2.2 The Realized Volatility: An Estimator of the Variance of the Noise

Under the additive model $Y_{t_i} = X_{t_i} + \varepsilon_{t_i}$, the realized volatility based on the observed returns $Y_{t_i}$ now has the form

$$[Y,Y]^{(\mathrm{all})}_T = [X,X]^{(\mathrm{all})}_T + 2[X,\varepsilon]^{(\mathrm{all})}_T + [\varepsilon,\varepsilon]^{(\mathrm{all})}_T.$$

This gives the conditional mean and variance of $[Y,Y]^{(\mathrm{all})}_T$:

$$E\big([Y,Y]^{(\mathrm{all})}_T \,\big|\, X \text{ process}\big) = [X,X]^{(\mathrm{all})}_T + 2nE\varepsilon^2, \qquad \text{(18)}$$

under assumption (11). Similarly,

$$\mathrm{var}\big([Y,Y]^{(\mathrm{all})}_T \,\big|\, X \text{ process}\big) = 4nE\varepsilon^4 + O_p(1), \qquad \text{(19)}$$

subject to condition (11) and $E\varepsilon^4_{t_i} = E\varepsilon^4 < \infty$ for all $i$. Subject to slightly stronger conditions,

$$\mathrm{var}\big([Y,Y]^{(\mathrm{all})}_T \,\big|\, X \text{ process}\big) = 4nE\varepsilon^4 + \big(8[X,X]^{(\mathrm{all})}_T E\varepsilon^2 - 2\,\mathrm{var}(\varepsilon^2)\big) + O_p(n^{-1/2}). \qquad \text{(20)}$$

It is also the case that, as $n \to \infty$, conditionally on the $X$ process, we have asymptotic normality:

$$n^{-1/2}\big([Y,Y]^{(\mathrm{all})}_T - 2nE\varepsilon^2\big) \stackrel{\mathcal{L}}{\longrightarrow} 2(E\varepsilon^4)^{1/2} Z_{\mathrm{noise}}. \qquad \text{(21)}$$

Here $Z_{\mathrm{noise}}$ is standard normal, with the subscript "noise" indicating that the randomness comes from the noise $\varepsilon$, that is, the deviation of the observables $Y$ from the true process $X$.
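The size of the bias in (18) is easy to see on simulated data. In this example (our own: Brownian $X$ with constant volatility plus Gaussian noise), one path suffices:

```python
import numpy as np

rng = np.random.default_rng(3)
n, T, sigma, s = 23400, 1.0, 0.3, 0.005        # s^2 = E eps^2 = 2.5e-5

x = np.cumsum(np.concatenate([[0.0],
                              sigma * np.sqrt(T / n) * rng.standard_normal(n)]))
y = x + s * rng.standard_normal(n + 1)

rv_y = np.sum(np.diff(y) ** 2)                 # realized volatility of observed Y
rv_x = np.sum(np.diff(x) ** 2)                 # [X,X]^(all), unobservable in practice
print(rv_y, rv_x + 2 * n * s ** 2)             # Eq. (18): nearly equal
```

Here $[X,X]^{(\mathrm{all})}_T \approx 0.09$, while the noise contributes $2nE\varepsilon^2 \approx 1.17$, so the realized volatility of $Y$ is an order of magnitude larger than the quantity of interest.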

The derivations of (19)–(20), along with the conditions for the latter, are provided in Section A.1. The result (21) is derived in Section A.2, where it is part of Theorem A.1.


Equations (18) and (19) suggest that in the discrete world where microstructure effects are unfortunately present, realized volatility $[Y,Y]^{(\mathrm{all})}_T$ is not a reliable estimator for the true variation $[X,X]^{(\mathrm{all})}_T$ of the returns. For large $n$, realized volatility could have little to do with the true returns. Instead, it relates to the noise term, $E\varepsilon^2$ in the first order and $E\varepsilon^4$ in the second order. Also, one can see from (18) that $[Y,Y]^{(\mathrm{all})}_T$ has a positive bias whose magnitude increases linearly with the sample size.

Interestingly, apart from revealing the biased nature of $[Y,Y]^{(\mathrm{all})}_T$ at high frequency, our analysis also delivers a consistent estimator for the variance of the noise term. In other words, let

$$\widehat{E\varepsilon^2} = \frac{1}{2n}[Y,Y]^{(\mathrm{all})}_T.$$

We have, for a fixed true return process $X$,

$$n^{1/2}\big(\widehat{E\varepsilon^2} - E\varepsilon^2\big) \to N\big(0, E\varepsilon^4\big) \quad \text{as } n \to \infty; \qquad \text{(22)}$$

see Theorem A.1 in the Appendix.

By the same methods as in Section A.2, a consistent estimator of the asymptotic variance of $\widehat{E\varepsilon^2}$ is then given by

$$\widehat{E\varepsilon^4} = \frac{1}{2n}\sum_i \big(\Delta Y_{t_i}\big)^4 - 3\big(\widehat{E\varepsilon^2}\big)^2. \qquad \text{(23)}$$
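Both noise-moment estimators are one-liners. A quick sketch (our own naming), checked on pure simulated noise, where the asymptotics above make the estimators essentially exact:

```python
import numpy as np

def noise_moments(y):
    """Estimate E(eps^2) and E(eps^4) from observed log-prices y.

    Uses hat{E eps^2} = [Y,Y]^(all) / (2n) and Eq. (23) for hat{E eps^4}.
    """
    dy = np.diff(y)
    n = len(dy)
    m2 = np.sum(dy ** 2) / (2 * n)
    m4 = np.sum(dy ** 4) / (2 * n) - 3 * m2 ** 2
    return m2, m4

rng = np.random.default_rng(4)
s = 0.01
y = s * rng.standard_normal(100_001)     # pure noise: X constant
m2, m4 = noise_moments(y)
print(m2, m4)    # close to s^2 = 1e-4 and 3 s^4 = 3e-8
```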

A question that naturally arises is why one is interested in the quadratic variation of $X$ (either $\langle X,X\rangle$ or $[X,X]$), as opposed to the quadratic variation of $Y$ ($[Y,Y]$), because "$\langle Y,Y\rangle$" would have to be taken to be infinite in view of the foregoing. For example, in options pricing or hedging, one could take the opposite view that $[Y,Y]$ is the volatility that one actually faces.

A main reason why we focus on the quadratic variation of $X$ is that the variation caused by the $\varepsilon$'s is tied to each transaction, as opposed to the price process of the underlying security. From the standpoint of trading, the $\varepsilon$'s represent trading costs, which are different from the costs created by the volatility of the underlying process. Different market participants may even face different types of trading costs, depending on institutional arrangements. In any case, for the purpose of integrating trading costs into options prices, it seems more natural to approach the matter via the extensive literature on market microstructure (see O'Hara 1995 for a survey).

A related matter is that continuous finance would be difficult to implement if one were to use the quadratic variation of $Y$. If one were to use the quadratic variation $[Y,Y]$, then one would also be using a quantity that would depend on the data frequency.

Finally, apart from the specific application, it is interesting to be able to say something about the underlying log return process, and to be able to separate this from the effects introduced by the mechanics of the trading process, where the idiosyncratics of the trading arrangement play a role.

2.3 The Optimal Sampling Frequency

We have just argued that the realized volatility estimates the wrong quantity. This problem only gets worse when observations are sampled more frequently. Its financial interpretation boils down to market microstructure, summarized by $\varepsilon$ in (4). As the data record is sampled finely, the change in true returns gets smaller, while the microstructure noise, such as bid–ask spread and transaction cost, remains at the same magnitude. In other words, when the sampling frequency is extremely high, the observed fluctuation in the returns process is more heavily contaminated by microstructure noise and becomes less representative of the true variation $\langle X,X\rangle_T$ of the returns. Along this line of discussion, the broad opinion in financial applications is not to sample too often, at least when using realized volatility. We now discuss how this can be viewed in the context of the model (4) with stochastic volatility.

Formally, sparse sampling is implemented by sampling on a subgrid $\mathcal{H}$ of $\mathcal{G}$, with, as before,

$$[Y,Y]^{(\mathrm{sparse})} = [Y,Y]^{(\mathcal{H})} = \sum_{t_i, t_{i+}\in\mathcal{H}}\big(Y_{t_{i+}} - Y_{t_i}\big)^2.$$

For the moment, we take the subgrid as given, but later we consider how to optimize the sampling. We call $n_{\mathrm{sparse}} = |\mathcal{H}|$.

To give a rationale for sparse sampling, we propose an asymptotic where the law of $\varepsilon$, although still iid, is allowed to vary with $n$ and $n_{\mathrm{sparse}}$. Formally, we suppose that the distribution of $\varepsilon$, $\mathcal{L}(\varepsilon)$, is an element of the set $\mathcal{D}$ of all distributions for which $E(\varepsilon) = 0$ and where $E(\varepsilon^2)$ and $E(\varepsilon^4)/E(\varepsilon^2)^2$ are bounded by arbitrary constants. The asymptotic normality in Section 2.2 then takes the following more nuanced form.

Lemma 1. Suppose that $X$ is an Itô process of form (1), where $|\mu_t|$ and $\sigma_t$ are bounded above by a constant. Suppose that for given $n$, the grid $\mathcal{H}_n$ is given, with $n_{\mathrm{sparse}} \to \infty$ as $n \to \infty$, and that for each $n$, $Y$ is related to $X$ through model (4). Assume (11), where $\mathcal{L}(\varepsilon) \in \mathcal{D}$. Like the grid $\mathcal{H}_n$, the law $\mathcal{L}(\varepsilon)$ can depend on $n$ (whereas the process $X$ is fixed). Let (17) be satisfied for the sequence of grids $\mathcal{H}_n$. Then

$$[Y,Y]^{\mathcal{H}}_T = [X,X]^{\mathcal{H}}_T + 2n_{\mathrm{sparse}}E\varepsilon^2 + \big(4n_{\mathrm{sparse}}E\varepsilon^4 + \big(8[X,X]^{\mathcal{H}}_T E\varepsilon^2 - 2\,\mathrm{var}(\varepsilon^2)\big)\big)^{1/2} Z_{\mathrm{noise}} + O_p\big(n^{-1/4}_{\mathrm{sparse}}(E\varepsilon^2)^{1/2}\big), \qquad \text{(24)}$$

where $Z_{\mathrm{noise}}$ is a quantity that is asymptotically standard normal.

The lemma is proved at the end of Section A.2. Note now that the relative order of the terms in (24) depends on the quantities $n_{\mathrm{sparse}}$ and $E\varepsilon^2$. Therefore, if $E\varepsilon^2$ is small relative to $n_{\mathrm{sparse}}$, then $[Y,Y]^{\mathcal{H}}_T$ is, after all, not an entirely absurd substitute for $[X,X]^{\mathcal{H}}_T$.

One would be tempted to conclude that the optimal choice of $n_{\mathrm{sparse}}$ is to make it as small as possible. But that would overlook the fact that the bigger the $n_{\mathrm{sparse}}$, the closer the $[X,X]^{\mathcal{H}}_T$ to the target integrated volatility $\langle X,X\rangle_T$ from (12).

To quantify the overall error, we now combine Lemma 1 with the results on discretization error to study the total error [Y,Y]^H_T - ⟨X,X⟩_T. Following Rootzén (1980), Jacod and Protter (1998), Barndorff-Nielsen and Shephard (2002), and Mykland and Zhang (2002), and under the conditions that these authors stated, we can show that

\[
\left(\frac{n_{\mathrm{sparse}}}{T}\right)^{1/2} \bigl([X,X]^{\mathcal H}_T - \langle X,X\rangle_T\bigr)
\xrightarrow{\;\mathcal L\;} \left(\int_0^T 2H'(t)\,\sigma_t^4\,dt\right)^{1/2} \times Z_{\mathrm{discrete}} \tag{25}
\]

Zhang, Mykland, and Aït-Sahalia: Determining Integrated Volatility

stably in law. (We discuss this concept at the end of Section 3.4.) Z_discrete is a standard normal random variable, with the subscript "discrete" indicating that the randomness is due to the discretization effect in [X,X]^H_T when evaluating ⟨X,X⟩_T. H(t) is the asymptotic quadratic variation of time, as discussed by Mykland and Zhang (2002),

\[
H(t) = \lim_{n\to\infty} \frac{n_{\mathrm{sparse}}}{T} \sum_{t_i, t_{i+} \in \mathcal H,\; t_{i+} \le t} (t_{i+} - t_i)^2.
\]

In the case of equidistant observations, Δt_0 = ··· = Δt_{n-1} = Δt = T/n_sparse and H'(t) ≡ 1. Because the ε's are independent of the X process, Z_noise is independent of Z_discrete.

For small Eε², one now has an opportunity to estimate ⟨X,X⟩_T. It follows from our Lemma 1 and Proposition 1 of Mykland and Zhang (2002) that:

Proposition 1. Assume the conditions of Lemma 1, and also that max_{t_i,t_{i+}∈H}(t_{i+} - t_i) = O(1/n_sparse). Also assume Condition E in Section A.3. Then H is well defined, and

\[
[Y,Y]^{\mathcal H}_T = \langle X,X\rangle_T + 2 E\varepsilon^2 n_{\mathrm{sparse}} + \Upsilon Z_{\mathrm{total}}
+ O_p\bigl(n_{\mathrm{sparse}}^{-1/4} (E\varepsilon^2)^{1/2}\bigr) + o_p\bigl(n_{\mathrm{sparse}}^{-1/2}\bigr), \tag{26}
\]

in the sense of stable convergence, where Z_total is asymptotically standard normal and where the variance has the form

\[
\Upsilon^2 = \underbrace{4 n_{\mathrm{sparse}} E\varepsilon^4 + \bigl(8 [X,X]^{\mathcal H}_T E\varepsilon^2 - 2\operatorname{var}(\varepsilon^2)\bigr)}_{\text{due to noise}}
+ \underbrace{\frac{T}{n_{\mathrm{sparse}}} \int_0^T 2H'(t)\,\sigma_t^4\,dt}_{\text{due to discretization}}. \tag{27}
\]

Seen from this angle, there is scope for using the realized volatility [Y,Y]^H to estimate ⟨X,X⟩. There is a bias, 2Eε²n_sparse, but the bias goes down if one uses fewer observations. This, then, is consistent with the practice in empirical finance, where the estimator used is not [Y,Y]^(all), but instead [Y,Y]^(sparse), obtained by sampling sparsely. For instance, faced with data sampled every few seconds, empirical researchers would typically use squared returns (i.e., differences of log-prices Y) over, say, 5-, 15-, or 30-minute time intervals. The intuitive rationale for using this estimator is to attempt to control the bias of the estimator; this can be assessed based on our formula (26), replacing the original sample size n by the lower number reflecting the sparse sampling. But one should avoid sampling too sparsely, because formula (27) shows that decreasing n_sparse has the effect of increasing the variance of the estimator via the discretization effect, which is proportional to n_sparse^{-1}. Based on our formulas, this trade-off between sampling too often and sampling too rarely can be formalized, and an optimal frequency at which to sample sparsely can be determined.

It is natural to minimize the MSE of [Y,Y]^(sparse),

\[
\mathrm{MSE} = \bigl(2 n_{\mathrm{sparse}} E\varepsilon^2\bigr)^2
+ \Bigl[4 n_{\mathrm{sparse}} E\varepsilon^4 + \bigl(8 [X,X]^{\mathcal H}_T E\varepsilon^2 - 2\operatorname{var}(\varepsilon^2)\bigr)
+ \frac{T}{n_{\mathrm{sparse}}} \int_0^T 2H'(t)\,\sigma_t^4\,dt\Bigr]. \tag{28}
\]

One has to imagine that the original sample size n is quite large, so that n_sparse < n. In this case, minimizing the MSE (28) means that one should choose n_sparse to satisfy ∂MSE/∂n_sparse ≈ 0; in other words,

\[
8 n_{\mathrm{sparse}} (E\varepsilon^2)^2 + 4 E\varepsilon^4 - \frac{T}{n_{\mathrm{sparse}}^2} \int_0^T 2H'(t)\,\sigma_t^4\,dt \approx 0, \tag{29}
\]

or

\[
n_{\mathrm{sparse}}^3 + \frac{1}{2}\, n_{\mathrm{sparse}}^2\, \frac{E\varepsilon^4}{(E\varepsilon^2)^2}
- (E\varepsilon^2)^{-2}\, \frac{T}{8} \int_0^T 2H'(t)\,\sigma_t^4\,dt \approx 0. \tag{30}
\]

Finally, still under the conditions of Proposition 1, the optimum n*_sparse becomes

\[
n^{*}_{\mathrm{sparse}} = (E\varepsilon^2)^{-2/3} \left(\frac{T}{8} \int_0^T 2H'(t)\,\sigma_t^4\,dt\right)^{1/3} \bigl(1 + o_p(1)\bigr)
\qquad\text{as } E\varepsilon^2 \to 0. \tag{31}
\]

Of course, if the actual sample size n were smaller than n*_sparse, then one would simply take n*_sparse = n, but this is unlikely to occur for a heavily traded stock.

Equation (31) is the formal statement saying that one can sample more frequently when the error spread is small. Note from (28) that, to first order, the final trade-off is between the bias 2n_sparse Eε² and the variance due to discretization; the effect of the variance associated with Z_noise is of lower order when comparing n_sparse and Eε². It should be emphasized that (31) is a feasible way of choosing n_sparse: one can estimate Eε² using all of the data, following the procedure in Section 2.2, and the integral ∫_0^T 2H'(t)σ_t⁴ dt can be estimated by the methods discussed in Section 6.

Hence, if one decides to address the problem by selecting a lower frequency of observation, one can do so either by subsampling the full grid G at an arbitrary sparse frequency and using [Y,Y]^(sparse), or by using n_sparse as optimally determined by (31); we denote the corresponding estimator by [Y,Y]^(sparse,opt). These are our fourth- and third-best estimators, respectively.

3. SAMPLING SPARSELY WHILE USING ALL OF THE DATA: SUBSAMPLING AND AVERAGING OVER MULTIPLE GRIDS

3.1 Multiple Grids and Sufficiency

We have argued in the previous section that one can indeed benefit from using infrequently sampled data. And yet, one of the most basic lessons of statistics is that one should not do this. We present here two ways of tackling the problem. Instead of selecting (arbitrarily or optimally) a subsample, our methods are based on selecting a number of subgrids of the original grid of observation times, G = {t_0, …, t_n}, and then averaging the estimators derived from the subgrids. The principle is that, to the extent that there is a benefit to subsampling, this benefit can now be retained, whereas the variation of the estimator can be lessened by the averaging. The benefit of averaging is clear from sufficiency considerations, and many statisticians would say that subsampling without subsequent averaging is inferentially incorrect.

In what follows, we first introduce a set of notations and then turn to studying the realized volatility in the multigrid context. In Section 4, we show how to eliminate the bias of the estimator by using two time scales, a combination of the single grid and multiple grids.

Journal of the American Statistical Association, December 2005

3.2 Notation for the Multiple Grids

We specifically suppose that the full grid G, G = {t_0, …, t_n} as in (13), is partitioned into K nonoverlapping subgrids G^(k), k = 1, …, K; in other words,

\[
\mathcal G = \bigcup_{k=1}^{K} \mathcal G^{(k)}, \qquad \text{where } \mathcal G^{(k)} \cap \mathcal G^{(l)} = \emptyset \text{ when } k \ne l.
\]

For most purposes, the natural way to select the kth subgrid G^(k) is to start with t_{k-1} and then pick every Kth sample point after that, until T. That is,

\[
\mathcal G^{(k)} = \{t_{k-1},\, t_{k-1+K},\, t_{k-1+2K},\, \dots,\, t_{k-1+n_k K}\}
\]

for k = 1, …, K, where n_k is the integer making t_{k-1+n_k K} the last element in G^(k). We refer to this as the regular allocation of sample points to subgrids.

Whether the allocation is regular or not, we let n_k = |G^(k)|. As in Definition 2, n = |G|. Recall that the realized volatility based on all observation points G is written as [Y,Y]^(all)_T. Meanwhile, if one uses only the subsampled observations Y_t, t ∈ G^(k), then the realized volatility is denoted by [Y,Y]^(k)_T. It has the form

\[
[Y,Y]^{(k)}_T = \sum_{t_j, t_{j+} \in \mathcal G^{(k)}} \bigl(Y_{t_{j+}} - Y_{t_j}\bigr)^2,
\]

where, if t_i ∈ G^(k), then t_{i+} denotes the following element in G^(k).

A natural competitor to [Y,Y]^(all)_T, [Y,Y]^(sparse)_T, and [Y,Y]^(sparse,opt)_T is then given by

\[
[Y,Y]^{(\mathrm{avg})}_T = \frac{1}{K} \sum_{k=1}^{K} [Y,Y]^{(k)}_T, \tag{32}
\]

and this is the statistic we analyze in what follows. As before, we fix T and use only the observations within the time period [0,T]. Asymptotics are still under (17) and under

\[
n \to \infty \quad\text{and}\quad n/K \to \infty. \tag{33}
\]

In general, the n_k need not be the same across k. We define

\[
\bar n = \frac{1}{K} \sum_{k=1}^{K} n_k = \frac{n - K + 1}{K}. \tag{34}
\]
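A minimal sketch of the regular allocation and of the averaged estimator (32); the synthetic data and function names are our own illustration, not code from the paper:

```python
import numpy as np

def averaged_realized_volatility(y, K):
    """[Y,Y]^(avg) of equation (32): the average of the realized volatilities
    over K subgrids under regular allocation, where subgrid k keeps the
    observation indices k, k + K, k + 2K, ..."""
    return sum(float(np.sum(np.diff(y[k::K]) ** 2)) for k in range(K)) / K

# Synthetic example (our own): n + 1 noisy observations, K = 300 subgrids.
rng = np.random.default_rng(1)
n, K = 23400, 300
x = np.cumsum(rng.normal(0.0, np.sqrt(1e-4 / n), n + 1))   # latent log-price
y = x + rng.normal(0.0, 0.0005, n + 1)                     # observed, with noise
n_bar = (n - K + 1) / K                       # average subgrid size, equation (34)
rv_avg = averaged_realized_volatility(y, K)   # bias is 2 * n_bar * E(eps^2)
rv_all = averaged_realized_volatility(y, 1)   # K = 1 recovers [Y,Y]^(all)
```

Note that K = 1 collapses the average to the full-grid estimator, whose bias 2nEε² is larger by roughly a factor of K.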

3.3 Error Due to the Noise: [Y,Y]^(avg)_T - [X,X]^(avg)_T

Recall that we are interested in determining the integrated volatility, ⟨X,X⟩_T, or quadratic variation, of the true but unobservable returns. As an intermediate step, we study here how well the "pooled" realized volatility [Y,Y]^(avg)_T approximates [X,X]^(avg)_T, where the latter is the "pooled" true integrated volatility when X is considered only on the discrete time scale.

From (18) and (32),

\[
E\bigl([Y,Y]^{(\mathrm{avg})}_T \,\big|\, X\ \text{process}\bigr) = [X,X]^{(\mathrm{avg})}_T + 2\bar n\, E\varepsilon^2. \tag{35}
\]

Also, because {ε_t, t ∈ G^(k)} are independent for different k,

\[
\operatorname{var}\bigl([Y,Y]^{(\mathrm{avg})}_T \,\big|\, X\ \text{process}\bigr)
= \frac{1}{K^2} \sum_{k=1}^{K} \operatorname{var}\bigl([Y,Y]^{(k)}_T \,\big|\, X\ \text{process}\bigr)
= 4\,\frac{\bar n}{K}\, E\varepsilon^4 + O_p\!\left(\frac{1}{K}\right), \tag{36}
\]

in the same way as in (19). Incorporating the next-order term in the variance yields

\[
\operatorname{var}\bigl([Y,Y]^{(\mathrm{avg})}_T \,\big|\, X\bigr)
= 4\,\frac{\bar n}{K}\, E\varepsilon^4 + \frac{1}{K} \bigl[8 [X,X]^{(\mathrm{avg})}_T E\varepsilon^2 - 2\operatorname{var}(\varepsilon^2)\bigr]
+ o_p\!\left(\frac{1}{K}\right), \tag{37}
\]

as in (20).

By Theorem A.1 in Section A.2, the conditional asymptotics for the estimator [Y,Y]^(avg)_T are as follows.

Theorem 1. Suppose that X is an Itô process of form (1). Suppose that Y is related to X through model (4) and that (11) is satisfied with Eε⁴ < ∞. Also suppose that t_i and t_{i+1} are not in the same subgrid for any i. Under assumption (33), as n → ∞,

\[
\sqrt{\frac{K}{\bar n}}\, \bigl([Y,Y]^{(\mathrm{avg})}_T - [X,X]^{(\mathrm{avg})}_T - 2\bar n\, E\varepsilon^2\bigr)
\xrightarrow{\;\mathcal L\;} 2\sqrt{E\varepsilon^4}\, Z^{(\mathrm{avg})}_{\mathrm{noise}}, \tag{38}
\]

conditional on the X process, where Z^(avg)_noise is standard normal.

This can be compared with the result stated in (24). Notice that Z^(avg)_noise in (38) is almost never the same as Z_noise in (24); in particular, cov(Z_noise, Z^(avg)_noise) = var(ε²)/Eε⁴, based on the proof of Theorem A.1 in the Appendix.

In comparison with the realized volatility using the full grid G, the aggregated estimator [Y,Y]^(avg)_T provides an improvement, in that both the asymptotic bias and variance are of smaller order in n; compare (18) and (19). We use this in Section 4.

3.4 Error Due to the Discretization Effect: [X,X]^(avg)_T - ⟨X,X⟩_T

In this section we study the impact of the time discretization; in other words, we investigate the deviation of [X,X]^(avg)_T from the integrated volatility ⟨X,X⟩_T of the true process. Denote the discretization effect by D_T, where

\[
D_t = [X,X]^{(\mathrm{avg})}_t - \langle X,X\rangle_t
= \frac{1}{K} \sum_{k=1}^{K} \bigl([X,X]^{(k)}_t - \langle X,X\rangle_t\bigr), \tag{39}
\]

with

\[
[X,X]^{(k)}_t = \sum_{t_i, t_{i+} \in \mathcal G^{(k)},\; t_{i+} \le t} \bigl(X_{t_{i+}} - X_{t_i}\bigr)^2. \tag{40}
\]

In what follows, we consider the asymptotics of D_T. The problem is similar to that of finding the limit of [X,X]^(all)_T - ⟨X,X⟩_T


[cf. (25)]. However, the present case is more complicated, due to the multiple grids.

We suppose in what follows that the sampling points are regularly allocated to subgrids; in other words, G^(l) = {t_{l-1}, t_{K+l-1}, …}. We also assume that

\[
\max_i |\Delta t_i| = O\!\left(\frac{1}{n}\right) \tag{41}
\]

and

\[
K/n \to 0. \tag{42}
\]

Define the weight function

\[
h_i = \frac{4}{K\,\Delta t_i} \sum_{j=1}^{(K-1)\wedge i} \left(1 - \frac{j}{K}\right)^2 \Delta t_{i-j}. \tag{43}
\]

In the case where the t_i are equidistant, and under regular allocation of points to subgrids, Δt_i = Δt, and so all of the h_i's (except the first K - 1) are equal, with

\[
h_i = 4\,\frac{1}{K} \sum_{j=1}^{(K-1)\wedge i} \left(1 - \frac{j}{K}\right)^2
= \begin{cases}
\dfrac{4(2K^2 - 4K + 3)}{6K^2} = \dfrac{4}{3} + o(1), & i \ge K - 1,\\[2ex]
\dfrac{4}{3}\,\dfrac{i}{K^3}\bigl(3K^2 - 3Ki + i^2\bigr) + o(1), & i < K - 1.
\end{cases} \tag{44}
\]

More generally, assumptions (41) and (42) ensure that

\[
\sup_i h_i = O(1). \tag{45}
\]

We take ⟨D,D⟩_T to be the quadratic variation of D_t when viewed as a continuous-time process (39). This gives the best approximation to the variance of D_T. We show the following results in Section A.3.

Theorem 2. Suppose that X is an Itô process of the form (1), with drift coefficient μ_t and diffusion coefficient σ_t, both continuous almost surely. Also suppose that σ_t is bounded away from 0. Assume (41) and (42), and that the sampling points are regularly allocated to grids. Then the quadratic variation of D_T is approximately

\[
\langle D,D\rangle_T = \frac{TK}{n}\,\eta_n^2 + o_p\!\left(\frac{K}{n}\right), \tag{46}
\]

where

\[
\eta_n^2 = \sum_i h_i\, \sigma^4_{t_i}\, \Delta t_i. \tag{47}
\]

In particular, D_T = O_p((K/n)^{1/2}). From this, we derive a variance-variance trade-off between the two effects that have been discussed: noise and discretization. First, however, we discuss the asymptotic law of D_T; we discuss stable convergence at the end of this section.

Theorem 3. Assume the conditions of Theorem 2, and also that

\[
\eta_n^2 \xrightarrow{\;P\;} \eta^2, \tag{48}
\]

where η is random. Also assume Condition E in Section A.3. Then

\[
\frac{D_T}{(K/n)^{1/2}} \xrightarrow{\;\mathcal L\;} \eta\sqrt T\, Z_{\mathrm{discrete}}, \tag{49}
\]

where Z_discrete is standard normal and independent of the process X. The convergence in law is stable.

In other words, D_T/(K/n)^{1/2} can be taken to be asymptotically mixed normal, "N(0, η²T)." For most of our discussion, it is most convenient to suppose (48), and this is satisfied in many cases. For example, when the t_i are equidistant, and under regular allocation of points to subgrids,

\[
\eta^2 = \frac{4}{3} \int_0^T \sigma_t^4\, dt, \tag{50}
\]

following (44). One does not need to rely on (48); we argue in Section A.3 that, without this condition, one can take D_T/(K/n)^{1/2} to be approximately N(0, η_n²T). For estimation of η² or η_n², see Section 6.

Finally, stable convergence (Rényi 1963; Aldous and Eagleson 1978; Hall and Heyde 1980, chap. 3) means, for our purposes, that the left side of (49) converges to the right side jointly with the X process, and that Z is independent of X. This is slightly weaker than convergence conditional on X, but it serves the same function of permitting the incorporation of conditionality-type phenomena into arguments and conclusions.

3.5 Combining the Two Sources of Error

We can now combine the two error terms arising from discretization and from the observation noise. It follows from Theorems 1 and 3 that

\[
[Y,Y]^{(\mathrm{avg})}_T - \langle X,X\rangle_T - 2\bar n\, E\varepsilon^2 = \xi Z_{\mathrm{total}} + o_p(1), \tag{51}
\]

where Z_total is an asymptotically standard normal random variable independent of the X process and

\[
\xi^2 = \underbrace{4\,\frac{\bar n}{K}\, E\varepsilon^4}_{\text{due to noise}} + \underbrace{T\,\frac{1}{\bar n}\,\eta^2}_{\text{due to discretization}}. \tag{52}
\]

It is easily seen that if one takes K = cn^{2/3}, then both components in ξ² will be present in the limit; otherwise, one of them will dominate. Based on (51), [Y,Y]^(avg)_T is still a biased estimator of the quadratic variation ⟨X,X⟩_T of the true return process. But as far as the asymptotic bias is concerned, [Y,Y]^(avg)_T is a better estimator than [Y,Y]^(all)_T, because n̄ ≤ n: the bias in the subsampled estimator [Y,Y]^(avg)_T increases at a slower pace than that of the full-grid estimator. One can also construct a bias-adjusted estimator from (51); this further development would involve the higher-order analysis of the interplay between the bias and the subsampled estimator. We describe the methodology of bias correction in Section 4.


3.6 The Benefits of Sampling Sparsely: Optimal Sampling Frequency in the Multiple-Grid Case

As in Section 2.3, when the noise is negligible asymptotically, we can search for an optimal n̄ for subsampling, to balance the coexistence of the bias and the variance in (51). To reduce the MSE of [Y,Y]^(avg)_T, we set ∂MSE/∂n̄ = 0. From (51) and (52), bias = 2n̄Eε² and ξ² = 4(n̄/K)Eε⁴ + (T/n̄)η². Then

\[
\mathrm{MSE} = \mathrm{bias}^2 + \xi^2
= 4(E\varepsilon^2)^2 \bar n^2 + 4\,\frac{\bar n}{K}\, E\varepsilon^4 + \frac{T}{\bar n}\,\eta^2
= 4(E\varepsilon^2)^2 \bar n^2 + \frac{T}{\bar n}\,\eta^2 \quad\text{to first order;}
\]

thus the optimal n̄* satisfies

\[
\bar n^{*} = \left(\frac{T\eta^2}{8(E\varepsilon^2)^2}\right)^{1/3}. \tag{53}
\]

Therefore, assuming that the estimator [Y,Y]^(avg)_T is adopted, one could benefit from a minimum MSE if one subsamples n̄* data in an equidistant fashion. In other words, all n observations can be used if one uses K*, K* ≈ n/n̄*, subgrids. This is in contrast to the drawback of using all of the data in the single-grid case: subsampling coupled with aggregation brings out the advantage of using all of the data. Of course, for the asymptotics to work, we need Eε² → 0. Our recommendation, however, is to use the bias-corrected estimator given in Section 4.

4. THE FINAL ESTIMATOR: SUBSAMPLING, AVERAGING, AND BIAS CORRECTION OVER TWO TIME SCALES

4.1 The Final Estimator $\widehat{\langle X,X\rangle}_T$: Main Result

In the preceding sections, we have seen that the multigrid estimator [Y,Y]^(avg)_T is yet another biased estimator of the true integrated volatility ⟨X,X⟩_T. In this section we improve the multigrid estimator by adopting a bias adjustment. To assess the bias, one uses the full grid: as mentioned before, from (22) in the single-grid case (Section 2), Eε² can be consistently approximated by

\[
\widehat{E\varepsilon^2} = \frac{1}{2n}\,[Y,Y]^{(\mathrm{all})}_T. \tag{54}
\]

Hence the bias of [Y,Y]^(avg)_T can be consistently estimated by $2\bar n\,\widehat{E\varepsilon^2}$. A bias-adjusted estimator for ⟨X,X⟩_T thus can be obtained as

\[
\widehat{\langle X,X\rangle}_T = [Y,Y]^{(\mathrm{avg})}_T - \frac{\bar n}{n}\,[Y,Y]^{(\mathrm{all})}_T, \tag{55}
\]

thereby combining the two time scales, (all) and (avg).

To study the asymptotic behavior of $\widehat{\langle X,X\rangle}_T$, first note that, under the conditions of Theorem A.1 in Section A.2,

\[
\left(\frac{K}{\bar n}\right)^{1/2} \bigl(\widehat{\langle X,X\rangle}_T - [X,X]^{(\mathrm{avg})}_T\bigr)
= \left(\frac{K}{\bar n}\right)^{1/2} \bigl([Y,Y]^{(\mathrm{avg})}_T - [X,X]^{(\mathrm{avg})}_T - 2\bar n\, E\varepsilon^2\bigr)
- 2 (K\bar n)^{1/2} \bigl(\widehat{E\varepsilon^2} - E\varepsilon^2\bigr)
\xrightarrow{\;\mathcal L\;} N\bigl(0,\, 8(E\varepsilon^2)^2\bigr), \tag{56}
\]

where the convergence in law is conditional on X.

We can now combine this with the results of Section 3.4 to determine the optimal choice of K as n → ∞:

\[
\widehat{\langle X,X\rangle}_T - \langle X,X\rangle_T
= \bigl(\widehat{\langle X,X\rangle}_T - [X,X]^{(\mathrm{avg})}_T\bigr) + \bigl([X,X]^{(\mathrm{avg})}_T - \langle X,X\rangle_T\bigr)
= O_p\!\left(\frac{\bar n^{1/2}}{K^{1/2}}\right) + O_p\bigl(\bar n^{-1/2}\bigr). \tag{57}
\]

The error is minimized by equating the two terms on the right side of (57), yielding that the optimal sampling step for [Y,Y]^(avg)_T is K = O(n^{2/3}). The right side of (57) is then of order O_p(n^{-1/6}).

In particular, if we take

\[
K = c n^{2/3}, \tag{58}
\]

then we find the limit in (57), as follows.

Theorem 4. Suppose that X is an Itô process of form (1), and assume the conditions of Theorem 3 in Section 3.4. Suppose that Y is related to X through model (4) and that (11) is satisfied with Eε² < ∞. Under assumption (58),

\[
n^{1/6} \bigl(\widehat{\langle X,X\rangle}_T - \langle X,X\rangle_T\bigr)
\xrightarrow{\;\mathcal L\;} N\bigl(0,\, 8 c^{-2} (E\varepsilon^2)^2\bigr) + \eta\sqrt T\, N(0, c)
= \bigl(8 c^{-2} (E\varepsilon^2)^2 + c\,\eta^2 T\bigr)^{1/2} N(0,1), \tag{59}
\]

where the convergence is stable in law (see Section 3.4).

Proof. Note that the first normal distribution comes from (56) and that the second comes from Theorem 3 in Section 3.4. The two normal distributions are independent because the convergence of the first term in (57) is conditional on the X process, which is why they can be amalgamated as stated. The requirement that Eε⁴ < ∞ (Theorem A.1 in the Appendix) is not needed, because only a law of large numbers is required for M^(1)_T (see the proof of Theorem A.1) when considering the difference in (56). This finishes the proof.

The estimation of the asymptotic spread s² = 8c^{-2}(Eε²)² + cη²T of $\widehat{\langle X,X\rangle}_T$ is deferred to Section 6. It is seen in Section A.1 that, for K = cn^{2/3}, the second-order conditional variance of $\widehat{\langle X,X\rangle}_T$, given the X process, is

\[
\operatorname{var}\bigl(\widehat{\langle X,X\rangle}_T \,\big|\, X\bigr)
= n^{-1/3} c^{-2}\, 8 (E\varepsilon^2)^2
+ n^{-2/3} c^{-1} \bigl[8 [X,X]^{(\mathrm{avg})}_T E\varepsilon^2 - 2\operatorname{var}(\varepsilon^2)\bigr]
+ o_p\bigl(n^{-2/3}\bigr). \tag{60}
\]

The evidence from our Monte Carlo simulations in Section 7 suggests that this correction may matter in small samples and that the (random) variance of $\widehat{\langle X,X\rangle}_T$ is best estimated by

\[
\hat s^2 + n^{-2/3} c^{-1} \bigl[8 [X,X]^{(\mathrm{avg})}_T E\varepsilon^2 - 2\operatorname{var}(\varepsilon^2)\bigr]. \tag{61}
\]

For the correction term, [X,X]^(avg)_T can be estimated by $\widehat{\langle X,X\rangle}_T$ itself. Meanwhile, by (23), a consistent estimator of var(ε²) is given by

\[
\widehat{\operatorname{var}}(\varepsilon^2) = \frac{1}{2n} \sum_i (\Delta Y_{t_i})^4 - 4\bigl(\widehat{E\varepsilon^2}\bigr)^2.
\]


4.2 Properties of $\widehat{\langle X,X\rangle}_T$: Optimal Sampling and Bias Adjustment

To further pin down the optimal sampling frequency K, one can minimize the expected asymptotic variance in (59) to obtain

\[
c^{*} = \left(\frac{16 (E\varepsilon^2)^2}{T\, E\eta^2}\right)^{1/3}, \tag{62}
\]

which can be consistently estimated from data in past time periods (before time t_0 = 0), using $\widehat{E\varepsilon^2}$ and an estimator of η² (cf. Section 6). As mentioned in Section 3.4, η² can be taken to be independent of K, as long as one allocates sampling points to grids regularly, as defined in Section 3.2. Hence one can choose c, and so also K, based on past data.

Example 1. If σ_t² is constant and, for equidistant sampling and regular allocation to grids, η² = (4/3)σ⁴T, then the asymptotic variance in (59) is

\[
8 c^{-2} (E\varepsilon^2)^2 + c\,\eta^2 T = 8 c^{-2} (E\varepsilon^2)^2 + \tfrac{4}{3}\, c\, \sigma^4 T^2,
\]

and the optimal choice of c becomes

\[
c_{\mathrm{opt}} = \left(\frac{12 (E\varepsilon^2)^2}{T^2 \sigma^4}\right)^{1/3}. \tag{63}
\]

In this case, the asymptotic variance in (59) is

\[
2\bigl(12 (E\varepsilon^2)^2\bigr)^{1/3} (\sigma^2 T)^{4/3}.
\]

One can also, of course, estimate c to minimize the actual asymptotic variance in (59) from data in the current time period (0 ≤ t ≤ T). It is beyond the scope of this article to consider whether such a device for selecting the frequency has any impact on our asymptotic results.
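The computations in Example 1 can be sketched as follows; the noise and volatility magnitudes are hypothetical placeholders, and the closed form is the asymptotic variance displayed above:

```python
def c_opt_constant_vol(noise_var, sigma2, T):
    """c_opt from equation (63) when sigma_t^2 = sigma2 is constant:
    (12 (E eps^2)^2 / (T^2 sigma^4))^(1/3)."""
    return (12.0 * noise_var ** 2 / (T ** 2 * sigma2 ** 2)) ** (1.0 / 3.0)

def asymptotic_variance(c, noise_var, sigma2, T):
    """Asymptotic variance in (59) for constant volatility:
    8 c^-2 (E eps^2)^2 + (4/3) c sigma^4 T^2."""
    return 8.0 * noise_var ** 2 / c ** 2 + (4.0 / 3.0) * c * sigma2 ** 2 * T ** 2

# Hypothetical magnitudes: one trading day, annualized variance 0.09, small noise.
T, sigma2, noise_var = 1.0 / 252.0, 0.09, 2.5e-7
c_star = c_opt_constant_vol(noise_var, sigma2, T)
v_star = asymptotic_variance(c_star, noise_var, sigma2, T)
# Closed form stated in the text: 2 (12 (E eps^2)^2)^(1/3) (sigma^2 T)^(4/3).
closed_form = 2.0 * (12.0 * noise_var ** 2) ** (1.0 / 3.0) * (sigma2 * T) ** (4.0 / 3.0)
```

The value at c_star matches the closed form and is smaller than the variance at any other c, confirming the minimization.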

In addition to large-sample arguments, one can study $\widehat{\langle X,X\rangle}_T$ from a "smallish" sample standpoint. We argue in what follows that one can apply a bias-type adjustment to get

\[
\widehat{\langle X,X\rangle}^{(\mathrm{adj})}_T = \left(1 - \frac{\bar n}{n}\right)^{-1} \widehat{\langle X,X\rangle}_T. \tag{64}
\]

The difference from the estimator in (55) is of order O_p(n̄/n) = O_p(K^{-1}), and thus the two estimators behave the same to the asymptotic order that we consider. The estimator (64), however, has the appeal of being, in a certain way, "unbiased," as follows. For arbitrary (a, b), consider all estimators of the form

\[
\widehat{\langle X,X\rangle}^{(\mathrm{adj})}_T = a\,[Y,Y]^{(\mathrm{avg})}_T - b\,\frac{\bar n}{n}\,[Y,Y]^{(\mathrm{all})}_T.
\]

Then, from (18) and (35),

\[
E\bigl(\widehat{\langle X,X\rangle}^{(\mathrm{adj})}_T \,\big|\, X\ \text{process}\bigr)
= a\bigl([X,X]^{(\mathrm{avg})}_T + 2\bar n\, E\varepsilon^2\bigr) - b\,\frac{\bar n}{n}\bigl([X,X]^{(\mathrm{all})}_T + 2n\, E\varepsilon^2\bigr)
= a\,[X,X]^{(\mathrm{avg})}_T - b\,\frac{\bar n}{n}\,[X,X]^{(\mathrm{all})}_T + 2(a - b)\bar n\, E\varepsilon^2.
\]

It is natural to choose a = b, to completely remove the effect of Eε². Also, following Section 3.4, both [X,X]^(avg)_T and [X,X]^(all)_T are asymptotically unbiased estimators of ⟨X,X⟩_T. Hence one can argue that one should take a(1 - n̄/n) = 1, yielding (64).

Similarly, an adjusted estimator of Eε² is given by

\[
\widehat{E\varepsilon^2}^{(\mathrm{adj})} = \tfrac{1}{2} (n - \bar n)^{-1} \bigl([Y,Y]^{(\mathrm{all})}_T - [Y,Y]^{(\mathrm{avg})}_T\bigr), \tag{65}
\]

which satisfies $E(\widehat{E\varepsilon^2}^{(\mathrm{adj})} \mid X\ \text{process}) = E\varepsilon^2 + \tfrac{1}{2}(n - \bar n)^{-1}([X,X]^{(\mathrm{all})}_T - [X,X]^{(\mathrm{avg})}_T)$ and thus is unbiased to high order. As for the asymptotic distribution, one can see from Theorem A.1 in the Appendix that

\[
\widehat{E\varepsilon^2}^{(\mathrm{adj})} - E\varepsilon^2
= \bigl(\widehat{E\varepsilon^2} - E\varepsilon^2\bigr)\bigl(1 + O(K^{-1})\bigr) + O_p\bigl(K n^{-3/2}\bigr)
= \widehat{E\varepsilon^2} - E\varepsilon^2 + O_p\bigl(n^{-1/2} K^{-1}\bigr) + O_p\bigl(K n^{-3/2}\bigr)
= \widehat{E\varepsilon^2} - E\varepsilon^2 + O_p\bigl(n^{-5/6}\bigr),
\]

from (58). It follows that $n^{1/2}(\widehat{E\varepsilon^2} - E\varepsilon^2)$ and $n^{1/2}(\widehat{E\varepsilon^2}^{(\mathrm{adj})} - E\varepsilon^2)$ have the same asymptotic distribution.
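Putting (54), (55), and (64) together, here is a minimal sketch of the two-scales estimator on simulated data; the function names and the synthetic path are our own, not code from the paper:

```python
import numpy as np

def tsrv(y, K, adjust=True):
    """Two-scales estimator of equation (55), optionally with the
    small-sample rescaling (64): (1 - n_bar/n)^(-1) times the estimate."""
    n = len(y) - 1
    rv_all = float(np.sum(np.diff(y) ** 2))
    rv_avg = sum(float(np.sum(np.diff(y[k::K]) ** 2)) for k in range(K)) / K
    n_bar = (n - K + 1) / K
    estimate = rv_avg - (n_bar / n) * rv_all
    return estimate / (1.0 - n_bar / n) if adjust else estimate

def noise_variance_estimate(y):
    """E(eps^2) estimated from the full grid, equation (54): [Y,Y]^(all) / (2n)."""
    return float(np.sum(np.diff(y) ** 2)) / (2 * (len(y) - 1))

# Synthetic check (our own): the estimator should land near the true <X,X>_T.
rng = np.random.default_rng(2)
n, true_qv = 23400, 1e-4
x = np.cumsum(rng.normal(0.0, np.sqrt(true_qv / n), n + 1))
y = x + rng.normal(0.0, 0.0005, n + 1)
qv_hat = tsrv(y, K=int(n ** (2.0 / 3.0)))   # K = c n^(2/3) with c = 1
eps2_hat = noise_variance_estimate(y)
```

Unlike the full-grid realized volatility, whose bias here would be about a hundred times the target, the two-scales estimate is centered on the true integrated volatility.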

5. EXTENSION TO MULTIPLE PERIOD INFERENCE

For a given family A = {G^(k), k = 1, …, K}, we denote

\[
\widehat{\langle X,X\rangle}_t = [Y,Y]^{(\mathrm{avg})}_t - \frac{\bar n}{n}\,[Y,Y]^{(\mathrm{all})}_t, \tag{66}
\]

where, as usual, $[Y,Y]^{(\mathrm{all})}_t = \sum_{t_{i+1} \le t} (\Delta Y_{t_i})^2$ and $[Y,Y]^{(\mathrm{avg})}_t = \frac{1}{K}\sum_{k=1}^{K} [Y,Y]^{(k)}_t$, with

\[
[Y,Y]^{(k)}_t = \sum_{t_i, t_{i+} \in \mathcal G^{(k)},\; t_{i+} \le t} \bigl(Y_{t_{i+}} - Y_{t_i}\bigr)^2.
\]

To estimate ⟨X,X⟩ for several discrete time periods, say [0, T_1], [T_1, T_2], …, [T_{M-1}, T_M], where M is fixed, amounts to estimating $\langle X,X\rangle_{T_m} - \langle X,X\rangle_{T_{m-1}} = \int_{T_{m-1}}^{T_m} \sigma_u^2\,du$ for m = 1, …, M, and the obvious estimator is $\widehat{\langle X,X\rangle}_{T_m} - \widehat{\langle X,X\rangle}_{T_{m-1}}$.

To carry out the asymptotics, let n_m be the number of points in the mth time segment, and similarly let $K_m = c_m n_m^{2/3}$, where c_m is a constant. Then $n_m^{1/6}(\widehat{\langle X,X\rangle}_{T_m} - \widehat{\langle X,X\rangle}_{T_{m-1}} - \int_{T_{m-1}}^{T_m} \sigma_u^2\,du)$, m = 1, …, M, converge stably to $(8 c_m^{-2} (E\varepsilon^2)^2 + c_m \eta_m^2 (T_m - T_{m-1}))^{1/2} Z_m$, where the Z_m are iid standard normals, independent of the underlying process, and $\eta_m^2$ is the limit η² (Theorem 3) for time period m. In the case of equidistant t_i and regular allocation of sample points to grids, $\eta_m^2 = \frac{4}{3}\int_{T_{m-1}}^{T_m} \sigma_u^4\,du$.

In other words, the one-period asymptotics generalize straightforwardly to the multiperiod case. This is because $\widehat{\langle X,X\rangle}_{T_m} - \widehat{\langle X,X\rangle}_{T_{m-1}} - \int_{T_{m-1}}^{T_m} \sigma_u^2\,du$ has, to first order, a martingale structure. This can be seen in the Appendix.

An advantage of our proposed estimator is that, if ε_{t_i} has a different variance in different time segments, say var(ε_{t_i}) = (Eε²)_m for t_i ∈ (T_{m-1}, T_m], then both consistency and asymptotic (mixed) normality continue to hold, provided that one replaces Eε² by (Eε²)_m. This adds a measure of robustness to the procedure. If one were convinced that Eε² were the same across time segments, then an alternative estimator would have the form

\[
\widehat{\langle X,X\rangle}_t = [Y,Y]^{(\mathrm{avg})}_t - \left(\frac{1}{K}\,\#\{t_{i+1} \le t\} - 1\right) \frac{1}{n}\,[Y,Y]^{(\mathrm{all})}_T
\qquad\text{for } t = T_1, \dots, T_M. \tag{67}
\]


However, the errors $\widehat{\langle X,X\rangle}_{T_m} - \widehat{\langle X,X\rangle}_{T_{m-1}} - \int_{T_{m-1}}^{T_m} \sigma_u^2\,du$ in this case are not asymptotically independent. Note that, for T = T_m, both candidates (66) and (67) for $\widehat{\langle X,X\rangle}_t$ coincide with the quantity in (55).

6. ESTIMATING THE ASYMPTOTIC VARIANCE OF $\widehat{\langle X,X\rangle}_T$

In the one-period case, the main goal is to estimate the asymptotic variance s² = 8c^{-2}(Eε²)² + cη²T [cf. (58) and (59)]. The multigrid case is a straightforward generalization, as indicated in Section 5.

Here we are concerned only with the case where the points t_i are equally spaced (Δt_i = Δt) and are regularly allocated to the grids A_1 = {G^(k), k = 1, …, K_1}. A richer set of ingredients is required to find the spread than to just estimate ⟨X,X⟩_T. To implement the estimator, we create an additional family A_2 = {G^(k,i), k = 1, …, K_1, i = 1, …, I} of grids, where G^(k,i) contains every Ith point of G^(k), starting with the ith point. We assume that K_1 ∼ c_1 n^{2/3}. The new family then consists of K_2 ∼ c_2 n^{2/3} grids, where c_2 = c_1 I.

In addition, we need to divide the time line into segments (T_{m-1}, T_m], where T_m = (m/M)T. For the purposes of this discussion, M is large but finite. We now get an initial estimator of spread as

\[
\hat s_0^2 = n^{1/3} \sum_{m=1}^{M} \Bigl(\widehat{\langle X,X\rangle}^{K_1}_{T_m} - \widehat{\langle X,X\rangle}^{K_1}_{T_{m-1}}
- \bigl(\widehat{\langle X,X\rangle}^{K_2}_{T_m} - \widehat{\langle X,X\rangle}^{K_2}_{T_{m-1}}\bigr)\Bigr)^2,
\]

where $\widehat{\langle X,X\rangle}^{K_i}_t$ is the estimator (66) using grid family i, i = 1, 2.

Using the discussion in Section 5, we can show that

\[
\hat s_0^2 \approx s_0^2, \tag{68}
\]

where, for c_1 ≠ c_2 (I ≠ 1),

\[
s_0^2 = 8 (E\varepsilon^2)^2 \bigl(c_1^{-2} + c_2^{-2} - c_1^{-1} c_2^{-1}\bigr) + \bigl(c_1^{1/2} - c_2^{1/2}\bigr)^2 T \eta^2
= 8 (E\varepsilon^2)^2 c_1^{-2} \bigl(1 + I^{-2} - I^{-1}\bigr) + c_1 \bigl(I^{1/2} - 1\bigr)^2 T \eta^2. \tag{69}
\]

In (68), the symbol "≈" denotes first convergence in law as n → ∞, and then a limit in probability as M → ∞. Because Eε² can be estimated by $\widehat{E\varepsilon^2} = [Y,Y]^{(\mathrm{all})}_T/2n$, we can put hats on s_0², (Eε²)², and η² in (69) to obtain an estimator of η². Similarly,

\[
s^2 = 8 (E\varepsilon^2)^2 \left(c^{-2} - \frac{c\,\bigl(c_1^{-2} + c_2^{-2} - c_1^{-1} c_2^{-1}\bigr)}{\bigl(c_1^{1/2} - c_2^{1/2}\bigr)^2}\right)
+ \frac{c}{\bigl(c_1^{1/2} - c_2^{1/2}\bigr)^2}\, s_0^2
= 8 \left(c^{-2} - c\, c_1^{-3}\, \frac{I^{-2} - I^{-1} + 1}{(I^{1/2} - 1)^2}\right) (E\varepsilon^2)^2
+ \frac{c}{c_1}\, \frac{1}{(I^{1/2} - 1)^2}\, s_0^2, \tag{70}
\]

where c ∼ Kn^{-2/3}, with K the number of grids used originally to estimate ⟨X,X⟩_T.

Table 1. Coefficients of (Eε²)² and ŝ_0² When c_1 = c

I    coeff(ŝ_0²)    coeff((Eε²)²)
3    1.866          -3.611c^{-2}
4    1.000          1.5000c^{-2}

Normally, one would take c_1 = c. Hence an estimator ŝ² can be found from ŝ_0² and $\widehat{E\varepsilon^2}$. When c_1 = c, we argue that the optimal choice is I = 3 or 4, as follows. The coefficients in (70) become

\[
\operatorname{coeff}(\hat s_0^2) = \bigl(I^{1/2} - 1\bigr)^{-2}
\]

and

\[
\operatorname{coeff}\bigl((\widehat{E\varepsilon^2})^2\bigr) = 8 c^{-2} \bigl(I^{1/2} - 1\bigr)^{-2} f(I),
\]

where f(I) = I - 2I^{1/2} - I^{-2} + I^{-1}. For I ≥ 2, f(I) is increasing, and f(I) crosses 0 for I between 3 and 4. These, therefore, are the two integer values of I that give the lowest ratio of coeff((Êε²)²)/coeff(ŝ_0²). Using I = 3 or 4 would therefore maximally insulate against (Êε²)² dominating over ŝ_0². This is desirable because ŝ_0² is the statistic carrying the information about η². Numerical values for the coefficients are given in Table 1. If c were such that (Êε²)² still overwhelmed ŝ_0², then a choice of c_1 ≠ c should be considered.

7. MONTE CARLO EVIDENCE

In this article we have discussed five approaches to dealing with the impact of market microstructure noise on realized volatility. In this section we examine the performance of each approach in simulations and compare the results with those predicted by the aforementioned asymptotic theory.

7.1 Simulations Design

We use the stochastic volatility model of Heston (1993) as our data-generating process,

\[
dX_t = (\mu - v_t/2)\,dt + \sigma_t\,dB_t \tag{71}
\]

and

\[
dv_t = \kappa(\alpha - v_t)\,dt + \gamma v_t^{1/2}\,dW_t, \tag{72}
\]

where v_t = σ_t². We assume that the parameters (μ, κ, α, γ) and ρ, the correlation coefficient between the two Brownian motions B and W, are constant in this model. We also assume Feller's condition, 2κα ≥ γ², to make the zero boundary unattainable by the volatility process.

We simulate M = 25,000 sample paths of the process using the Euler scheme at a time interval Δt = 1 second, and we set the parameter values at levels reasonable for a stock: μ = .05, κ = 5, α = .04, γ = .5, and ρ = -.5. As for the market microstructure noise ε, we assume that it is Gaussian and small; specifically, we set (Eε²)^{1/2} = .0005 (i.e., the standard deviation of the noise is .05% of the value of the asset price). We purposefully set this value to be smaller than the values calibrated using the empirical market microstructure literature.

On each simulated sample path, we estimate ⟨X,X⟩_T over T = 1 day (i.e., T = 1/252, because the parameter values are all annualized) using the five estimation strategies described earlier: [Y,Y]^(all)_T, [Y,Y]^(sparse)_T, [Y,Y]^(sparse,opt)_T, [Y,Y]^(avg)_T, and, finally, $\widehat{\langle X,X\rangle}^{(\mathrm{adj})}_T$. We assume that a day consists of 6.5 hours of open trading, as is the case on the NYSE and NASDAQ. For the fourth-best estimator, we represent this approach using sparse sampling at a frequency of once every 5 minutes.

7.2 Simulations Results

Table 2 reports the simulation results for the five estimation strategies. For each estimator, the bias is calculated as E[estimator - ⟨X,X⟩_T]. The variance row reports the quantity

\[
E\left[\operatorname{var}\left(\text{estimator} - \langle X,X\rangle_T \,\Big|\, \int_0^T \sigma_t^2\,dt,\ \int_0^T \sigma_t^4\,dt\right)\right]. \tag{73}
\]

The rows marked "relative" report the corresponding results in percentage terms, that is, for

\[
\frac{\text{estimator} - \langle X,X\rangle_T}{\langle X,X\rangle_T}.
\]

Note that, although the convergence of the discretization error is stable, as opposed to conditional, (73) seems like a reasonable stand-in for the theoretical quantity under consideration.

In each case, "small sample" represents the average value over the M simulated paths, and "asymptotic" refers to the value predicted by our theory. The noise parts of the asymptotic variances are taken to include the relevant second-order terms, as given by (20), (37), and (60), yielding spreads like (in the latter case) (61). The small-sample variance is computed by binning the results of the simulations according to the values of the pair (∫_0^T σ_t² dt, ∫_0^T σ_t⁴ dt), calculating the variance within each bin, and then averaging across the bins, weighted by the number of elements in each bin.

Comparing the rows showing small-sample results to their asymptotic predictions, we see that the asymptotic theory provides a good approximation to the first two moments of the small-sample distribution of the five estimators. Comparing across columns, we see that the differences between the five estimators are quite large. As expected, the naive fifth-best strategy produces disastrous results. The first-best strategy results in a large decrease of root MSE (RMSE) relative to the others, including, in particular, the fourth-best strategy, which is the approach currently used in most of the empirical literature. But, at least for the parameter values used here, determining the optimal sampling frequency (the sparse optimal, or third-best, strategy) results in only a modest improvement over the arbitrary selection of a sampling frequency (the sparse, or fourth-best, strategy). Using all of the data, as in the subsampled and averaged estimator (second-best), but especially the bias correction by combining two time scales, as in our first-best approach, is where the major practical improvements reside.

It may seem surprising that the variance of the estimator is reduced so much from the second-best to the first-best estimator. After all, if one sets c = Kn^{-2/3}, then the asymptotic normal term in (7) becomes

\[
\frac{1}{n^{1/6}}
\underbrace{\Bigl[\underbrace{\frac{4}{c^2}\,E\varepsilon^4}_{\text{due to noise}}
+ \underbrace{c\,\frac{4T}{3} \int_0^T \sigma_t^4\,dt}_{\text{due to discretization}}\Bigr]^{1/2}}_{\text{total variance}}
Z_{\mathrm{total}}.
\]

This is quite similar to the error term in (9). However, the c for the second-best estimator is chosen by (8), whereas for the first-best case it is given by (10). The two optima are very different, because for the second-best estimator c is chosen by a bias-variance trade-off, whereas for the first-best estimator it is chosen to minimize the asymptotic variance. From this standpoint, it is quite reasonable for the first-best estimator to also have a substantially better asymptotic variance. In fact, from (8) and (10), it is easy to see that, for the optimal choices,

\[
c^{*}_{\text{second best}} = n^{1/3}\, 2^{-1/3} \left(\frac{E\varepsilon^4}{(E\varepsilon^2)^2}\right)^{1/3} c^{*}_{\text{first best}},
\]

whence the total variance is in fact quite different for the two estimators.

Figure 1 shows the small-sample and asymptotic RMSE of the [Y,Y]^(sparse)_T estimator as a function of the subsampling interval, illustrating the minimization problem studied in Section 2.3. Finally, Figure 2 shows the standardized distribution of the first-best estimator obtained from the simulations (histogram) and the corresponding asymptotic distribution predicted by our theory (solid line). Due to the effect of the bias, the distributions of the fifth- to second-best estimators are irrelevant, because they cannot be used to set intervals for the volatility.

8. DISCUSSION

In this work we have quantified and corrected the effect of noise on the nonparametric assessment of integrated volatility. In the setting of high-frequency data, the usual financial practice is to use sparse sampling; in other words, throwing

Table 2. Monte Carlo Simulations for the Five Estimation Strategies

                                 Fifth-best        Fourth-best         Third-best              Second-best       First-best
                                 [Y,Y]_T^(all)     [Y,Y]_T^(sparse)    [Y,Y]_T^(sparse,opt)    [Y,Y]_T^(avg)     <X,X>_T^(adj)
Small-sample bias                1.1699 × 10^-2    3.89 × 10^-5        2.18 × 10^-5            1.926 × 10^-5     2 × 10^-8
Asymptotic bias                  1.1700 × 10^-2    3.90 × 10^-5        2.20 × 10^-5            1.927 × 10^-5     0
Small-sample variance            1.791 × 10^-8     1.4414 × 10^-9      1.59 × 10^-9            9.41 × 10^-10     9 × 10^-11
Asymptotic variance              1.788 × 10^-8     1.4409 × 10^-9      1.58 × 10^-9            9.37 × 10^-10     8 × 10^-11
Small-sample RMSE                1.1699 × 10^-2    5.437 × 10^-5       4.543 × 10^-5           3.622 × 10^-5     9.4 × 10^-6
Asymptotic RMSE                  1.1700 × 10^-2    5.442 × 10^-5       4.546 × 10^-5           3.618 × 10^-5     8.9 × 10^-6
Small-sample relative bias       182               6.1                 1.8                     1.5               -0.0045
Small-sample relative variance   82,502            115                 11                      5.3               0.43
Small-sample relative RMSE       340               12.4                3.7                     2.8               0.65

1406 Journal of the American Statistical Association December 2005

Figure 1. RMSE of the Sparse Estimator as a Function of the Subsampling Frequency (small-sample RMSE and asymptotic RMSE).

away most of the available data. We have argued that this practice is caused by not incorporating the noise in the model. Although it is statistically unsound to throw away data, we have shown that it is possible to build on this practice to construct estimators that make statistical sense.

Specifically, we have found that the usual realized volatility mainly estimates the magnitude of the noise term, rather than anything to do with volatility. An approach built on separating the observations into multiple "grids" lessens this problem. We found that the best results can be obtained by combining the usual ("single-grid") realized volatility with the multiple-grid-based device. This gives an estimator that is approximately unbiased; we have also shown how to assess the (random) variance of this estimator. We also show that, for our most recommended procedure, the optimal number of multiple grids is of order O(n^{2/3}) (see Sec. 4). Most of the development is in the context of finding the integrated volatility over one time period; at the end, we extend this to multiple periods. Also, in the case where the noise can be taken to be almost negligible, we provide a way of optimizing the sampling frequency if one wishes to use the classical "realized volatility" or its multigrid extension.

One important message of the article is that any time one has an impulse to sample sparsely, one can always do better with a multigrid method, regardless of the model or the quantity being estimated.
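This point can be checked directly by simulation. The following Python sketch is ours, not part of the original study: it uses constant volatility, Gaussian noise, and illustrative parameter values rather than the stochastic-volatility design behind Table 2, and compares the all-data realized volatility, a sparsely sampled version, the subsample-averaged estimator, and the bias-corrected two-scales estimator on one simulated day of one-second data.

```python
import numpy as np

def realized_var(y):
    # [Y,Y]_T: sum of squared increments of the observed log-price series
    return np.sum(np.diff(y) ** 2)

def avg_subsampled_rv(y, K):
    # [Y,Y]^(avg)_T: average realized variance over the K offset subgrids
    return np.mean([realized_var(y[k::K]) for k in range(K)])

def two_scales_rv(y, K):
    # <X,X>^(adj)_T: subsample average minus the (nbar/n)-weighted bias term
    n = len(y) - 1
    nbar = (n - K + 1) / K
    return avg_subsampled_rv(y, K) - (nbar / n) * realized_var(y)

rng = np.random.default_rng(0)
n, T, sigma, noise_sd = 23400, 1.0, 0.2, 0.001   # one "day" of 1-second data
x = np.cumsum(sigma * np.sqrt(T / n) * rng.standard_normal(n + 1))  # latent X
y = x + noise_sd * rng.standard_normal(n + 1)    # observed Y = X + noise

iv = sigma ** 2 * T                              # true integrated variance
est = {
    "all": realized_var(y),
    "sparse": realized_var(y[::300]),            # roughly 5-minute sampling
    "averaged": avg_subsampled_rv(y, 300),
    "two scales": two_scales_rv(y, 300),
}
for name, v in est.items():
    print(f"{name:11s} {v:.5f}  (true {iv:.5f})")
```

On paths like this one, the all-data estimator is inflated by the noise bias 2nEε², whereas the averaged and, especially, the two-scales estimators land close to the true integrated variance.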

Figure 2. Asymptotic and Small-Sample Standardized Distributions of the First-Best Estimator.

Zhang, Mykland, and Aït-Sahalia: Determining Integrated Volatility 1407

Our results in this article cover the case where the latent X follows the Itô process model (1). Theorem 1 remains true under the assumption that X is a general semimartingale, that is, a process X that can also have jumps. This follows because the proof of Lemma A.2 remains true for a general semimartingale X (Jacod and Shiryaev 2003, thm. I.4.47, p. 52). Also, apart from using Lemma A.2, Theorem A.1 is unrelated to the nature of the X process. As far as the effect of jumps on the discretization effect is concerned, it should be noted that consistency holds, as follows. When X is a general semimartingale, then for any sequence of grids $\mathcal H_n$ so that (41) holds, $[X,X]^{\mathcal H_n}$ converges to the continuous-time quadratic variation of X, that is,

$$\int_0^T\sigma_t^2\,dt + \sum_{0<t\le T}(\text{jump of }X\text{ at }t)^2\tag{74}$$

(cf. Jacod and Shiryaev 2003, thm. I.4.47, p. 52). In particular, this also applies to subgrids under the regularity conditions that we have discussed. Furthermore, by extension of the same proof, $[X,X]^{(\mathrm{avg})}$ also converges to (74). From the same argument, so does $\widehat{\langle X,X\rangle}_T$. In this sense, the two-scales estimator is robust to jumps.
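This robustness can be illustrated numerically. In the sketch below (ours, with an assumed constant σ, Gaussian noise, and a single deterministic jump), the two-scales estimator is compared with the limit (74), that is, the integrated variance plus the squared jump:

```python
import numpy as np

rng = np.random.default_rng(1)
n, T, sigma, noise_sd, K = 23400, 1.0, 0.2, 0.001, 300
jump = 0.05                                   # one jump in the latent price

x = np.cumsum(sigma * np.sqrt(T / n) * rng.standard_normal(n + 1))
x[n // 2:] += jump                            # jump at mid-sample
y = x + noise_sd * rng.standard_normal(n + 1)

def realized_var(z):
    return np.sum(np.diff(z) ** 2)

nbar = (n - K + 1) / K
tsrv = np.mean([realized_var(y[k::K]) for k in range(K)]) \
       - (nbar / n) * realized_var(y)

# limit (74): integrated variance plus the squared jump
target = sigma ** 2 * T + jump ** 2
print(tsrv, target)
```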

APPENDIX: PROOFS OF RESULTS

When the total grid G is considered, we use $\sum_{i=1}^{n-1}$, $\sum_{t_{i+1}\le T}$, and $\sum_{t_i\in G}$ interchangeably in the following proofs. Also, we write "|X" to indicate expressions that are conditional on the entire X process.

A.1 Variance of [Y, Y]_T Given the X Process

Here we calculate explicitly the variance in (19), from which the stated approximation follows. The explicit remainder term is also used for (36). Let a partition of [0, T] be 0 = t_0 ≤ t_1 ≤ ··· ≤ t_n = T. Under assumption (11),

$$\operatorname{var}\big([Y,Y]^{(\mathrm{all})}_T\,\big|\,X\big) = \operatorname{var}\Bigg[\sum_{t_{i+1}\le T}\big(\Delta Y_{t_i}\big)^2\,\Big|\,X\Bigg] = \underbrace{\sum_{t_{i+1}\le T}\operatorname{var}\big[\big(\Delta Y_{t_i}\big)^2\,\big|\,X\big]}_{I_T} + \underbrace{2\sum_{t_{i+1}\le T}\operatorname{cov}\big[\big(\Delta Y_{t_{i-1}}\big)^2,\big(\Delta Y_{t_i}\big)^2\,\big|\,X\big]}_{II_T},$$

because $\Delta Y_{t_i} = \Delta X_{t_i} + \Delta\varepsilon_{t_i}$ is 1-dependent given the X process:

$$\begin{aligned}
\operatorname{var}\big[(\Delta Y_{t_i})^2\,\big|\,X\big] &= \kappa_4(\Delta Y_{t_i}\,|\,X) + 2\big[\operatorname{var}(\Delta Y_{t_i}\,|\,X)\big]^2 + 4\big[E(\Delta Y_{t_i}\,|\,X)\big]^2\operatorname{var}(\Delta Y_{t_i}\,|\,X) + 4E(\Delta Y_{t_i}\,|\,X)\,\kappa_3(\Delta Y_{t_i}\,|\,X)\\
&= \kappa_4(\Delta\varepsilon_{t_i}) + 2\big[\operatorname{var}(\Delta\varepsilon_{t_i})\big]^2 + 4(\Delta X_{t_i})^2\operatorname{var}(\Delta\varepsilon_{t_i}) + 4(\Delta X_{t_i})\,\kappa_3(\Delta\varepsilon_{t_i})\quad\text{(under assumption (11))}\\
&= 2\kappa_4(\varepsilon) + 8(E\varepsilon^2)^2 + 8(\Delta X_{t_i})^2E\varepsilon^2,
\end{aligned}$$

because $\kappa_3(\Delta\varepsilon_{t_i}) = 0$. The κ's are the cumulants of the relevant order:

$$\kappa_1(\varepsilon)=0,\qquad \kappa_2(\varepsilon)=\operatorname{var}(\varepsilon)=E(\varepsilon^2),\qquad \kappa_3(\varepsilon)=E\varepsilon^3,\qquad \kappa_4(\varepsilon)=E(\varepsilon^4)-3(E\varepsilon^2)^2.\tag{A.1}$$

So $I_T = n\big(2\kappa_4(\varepsilon)+8(E\varepsilon^2)^2\big) + 8[X,X]^{(\mathrm{all})}_T E\varepsilon^2$. Similarly, for the covariance,

$$\begin{aligned}
\operatorname{cov}\big[(\Delta Y_{t_{i-1}})^2,(\Delta Y_{t_i})^2\,\big|\,X\big] &= \operatorname{cov}\big[(\Delta\varepsilon_{t_{i-1}})^2,(\Delta\varepsilon_{t_i})^2\big] + 4(\Delta X_{t_{i-1}})(\Delta X_{t_i})\operatorname{cov}(\Delta\varepsilon_{t_{i-1}},\Delta\varepsilon_{t_i})\\
&\quad + 2(\Delta X_{t_{i-1}})\operatorname{cov}\big[\Delta\varepsilon_{t_{i-1}},(\Delta\varepsilon_{t_i})^2\big] + 2(\Delta X_{t_i})\operatorname{cov}\big[(\Delta\varepsilon_{t_{i-1}})^2,\Delta\varepsilon_{t_i}\big]\\
&= \kappa_4(\varepsilon) + 2(E\varepsilon^2)^2 - 4(\Delta X_{t_{i-1}})(\Delta X_{t_i})\,\kappa_2(\varepsilon) - 2(\Delta X_{t_i})\,\kappa_3(\varepsilon) + 2(\Delta X_{t_{i-1}})\,\kappa_3(\varepsilon),
\end{aligned}\tag{A.2}$$

because of (A.1). Thus, summing (A.2) over i,

$$II_T = 2(n-1)\big(\kappa_4(\varepsilon)+2(E\varepsilon^2)^2\big) - 8E\varepsilon^2\sum_{t_{i+1}\le T}(\Delta X_{t_{i-1}})(\Delta X_{t_i}) - 4\kappa_3(\varepsilon)\big(\Delta X_{t_{n-1}}-\Delta X_{t_0}\big).$$

Amalgamating the two expressions, one obtains

$$\begin{aligned}
\operatorname{var}\big([Y,Y]^{(\mathrm{all})}_T\,\big|\,X\big) &= n\big(2\kappa_4(\varepsilon)+8(E\varepsilon^2)^2\big) + 8[X,X]^{(\mathrm{all})}_T E\varepsilon^2 + 2(n-1)\big(\kappa_4(\varepsilon)+2(E\varepsilon^2)^2\big)\\
&\quad - 8E\varepsilon^2\sum(\Delta X_{t_{i-1}})(\Delta X_{t_i}) - 4\kappa_3(\varepsilon)\big(\Delta X_{t_{n-1}}-\Delta X_{t_0}\big)\\
&= 4nE\varepsilon^4 + R_n,
\end{aligned}\tag{A.3}$$

where the remainder term R_n satisfies

$$\begin{aligned}
|R_n| &\le 8E\varepsilon^2[X,X]^{(\mathrm{all})}_T + 2\big(\kappa_4(\varepsilon)+2(E\varepsilon^2)^2\big) + 8E\varepsilon^2\sum\big|\Delta X_{t_{i-1}}\big|\big|\Delta X_{t_i}\big| + 4|\kappa_3(\varepsilon)|\big(\big|\Delta X_{t_{n-1}}\big|+\big|\Delta X_{t_0}\big|\big)\\
&\le 16E\varepsilon^2[X,X]^{(\mathrm{all})}_T + 2\big(\kappa_4(\varepsilon)+2(E\varepsilon^2)^2\big) + 2|\kappa_3(\varepsilon)|\big(2+[X,X]^{(\mathrm{all})}_T\big),
\end{aligned}\tag{A.4}$$

by the Cauchy–Schwarz inequality, and because $|x|\le(1+x^2)/2$. Because $[X,X]^{(\mathrm{all})}_T = O_p(1)$, (19) follows.

Under slightly stronger conditions (e.g., |μ_t| and σ_t are bounded above by a constant), $\sum(\Delta X_{t_{i-1}})(\Delta X_{t_i})$ is a near-martingale and of order $O_p(n^{-1/2})$, and similarly $\Delta X_{t_{n-1}} - \Delta X_{t_0} = O_p(n^{-1/2})$, from which (20) follows.
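The leading term $4nE\varepsilon^4$ in (A.3) can be checked by simulation, holding one realization of X fixed and regenerating the noise. The sketch below is ours, with assumed Gaussian noise, for which $\kappa_4(\varepsilon)=0$ and $E\varepsilon^4 = 3(E\varepsilon^2)^2$; the remainder $R_n$ still contributes a few percent at this sample size.

```python
import numpy as np

rng = np.random.default_rng(4)
n, noise_sd = 5000, 0.01
# one fixed realization of the latent process X; everything below is
# conditional on this path
x = np.cumsum(0.2 * np.sqrt(1.0 / n) * rng.standard_normal(n + 1))

def rv_with_fresh_noise():
    y = x + noise_sd * rng.standard_normal(n + 1)
    return np.sum(np.diff(y) ** 2)

samples = np.array([rv_with_fresh_noise() for _ in range(2000)])
e4 = 3 * noise_sd ** 4            # Gaussian noise: E(eps^4) = 3 (E eps^2)^2
print(samples.var(), 4 * n * e4)  # empirical variance vs leading term of (A.3)
```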

Similarly, to obtain a higher-order approximation to the variance of the estimator $\widehat{\langle X,X\rangle}_T$, note that

$$\widehat{\langle X,X\rangle}_T - \langle X,X\rangle_T = \Big([Y,Y]^{(\mathrm{avg})}_T - \tfrac{1}{K}[Y,Y]^{(\mathrm{all})}_T - [X,X]^{(\mathrm{avg})}_T\Big) + \Big([X,X]^{(\mathrm{avg})}_T - \langle X,X\rangle_T\Big).\tag{A.5}$$

Now, recalling (20) and (37), and, in addition,

$$\operatorname{cov}\big([Y,Y]^{(\mathrm{all})},[Y,Y]^{(\mathrm{avg})}\,\big|\,X\big) = 2\big(E\varepsilon^4-(E\varepsilon^2)^2\big)\,\frac{2(n-1)}{K} + 8E\varepsilon^2[X,X]^{(\mathrm{all})}_T\,\frac{1}{K} + o_p\!\Big(\frac{1}{K}\Big),$$

we see that

$$\operatorname{var}\big(\widehat{\langle X,X\rangle}_T\,\big|\,X\big) = \operatorname{var}\big([Y,Y]^{(\mathrm{avg})}_T\,\big|\,X\big) + \frac{1}{K^2}\operatorname{var}\big([Y,Y]^{(\mathrm{all})}_T\,\big|\,X\big) - \frac{2}{K}\operatorname{cov}\big([Y,Y]^{(\mathrm{avg})}_T,[Y,Y]^{(\mathrm{all})}_T\,\big|\,X\big)$$


$$\begin{aligned}
&= \frac{4\bar n}{K}E\varepsilon^4 + \frac{1}{K}\Big[8[X,X]^{(\mathrm{avg})}_T E\varepsilon^2 - 2\big(E\varepsilon^4-(E\varepsilon^2)^2\big)\Big] + o_p\!\Big(\frac{1}{K}\Big)\\
&\quad + \frac{1}{K^2}\Big[4nE\varepsilon^4 + \big(8[X,X]^{(\mathrm{all})}_T E\varepsilon^2 - 2\big(E\varepsilon^4-(E\varepsilon^2)^2\big)\big) + o_p(1)\Big]\\
&\quad - \frac{2}{K}\bigg[2\big(E\varepsilon^4-(E\varepsilon^2)^2\big)\,\frac{2(n-1)}{K} + 8E\varepsilon^2[X,X]^{(\mathrm{all})}_T\,\frac{1}{K} + o_p\!\Big(\frac{1}{K}\Big)\bigg]\\
&= \frac{\bar n}{K}\,8(E\varepsilon^2)^2 + \frac{1}{K}\Big[8[X,X]^{(\mathrm{avg})}_T E\varepsilon^2 - 2\big(E\varepsilon^4-(E\varepsilon^2)^2\big)\Big] + o_p\!\Big(\frac{1}{K}\Big),
\end{aligned}$$

because, at the optimal choice, $K = cn^{2/3}$ and $\bar n = c^{-1}n^{1/3}$. Therefore, (60) holds.

A.2 The Relevant Central Limit Theorem

Lemma A.2. Suppose that X is an Itô process. Suppose that Y is related to X through model (4). Then, under assumption (11) and definitions (16) and (32),

$$[Y,Y]^{(\mathrm{all})}_T = [\varepsilon,\varepsilon]^{(\mathrm{all})}_T + O_p(1)\qquad\text{and}\qquad [Y,Y]^{(\mathrm{avg})}_T = [\varepsilon,\varepsilon]^{(\mathrm{avg})}_T + [X,X]^{(\mathrm{avg})}_T + O_p\!\Big(\frac{1}{\sqrt K}\Big).$$

Proof. (a) The one-grid case:

$$[Y,Y]^{(\mathrm{all})}_T = [X,X]^{(\mathrm{all})}_T + [\varepsilon,\varepsilon]^{(\mathrm{all})}_T + 2[X,\varepsilon]^{(\mathrm{all})}_T.\tag{A.6}$$

We show that

$$E\big(([X,\varepsilon]^{(\mathrm{all})}_T)^2\,\big|\,X\big) = O_p(1),\tag{A.7}$$

and, in particular,

$$[X,\varepsilon]^{(\mathrm{all})}_T = O_p(1).\tag{A.8}$$

To see (A.7),

$$[X,\varepsilon]^{(\mathrm{all})}_T = \sum_{i=0}^{n-1}(\Delta X_{t_i})(\Delta\varepsilon_{t_i}) = \sum_{i=0}^{n-1}(\Delta X_{t_i})\varepsilon_{t_{i+1}} - \sum_{i=0}^{n-1}(\Delta X_{t_i})\varepsilon_{t_i} = \sum_{i=1}^{n-1}\big(\Delta X_{t_{i-1}}-\Delta X_{t_i}\big)\varepsilon_{t_i} + \Delta X_{t_{n-1}}\varepsilon_{t_n} - \Delta X_{t_0}\varepsilon_{t_0}.\tag{A.9}$$

Because $E([X,\varepsilon]^{(\mathrm{all})}_T\,|\,X) = 0$ and the $\varepsilon_{t_i}$ are iid for different $t_i$, we get

$$\begin{aligned}
E\big(([X,\varepsilon]^{(\mathrm{all})}_T)^2\,\big|\,X\big) &= \operatorname{var}\big([X,\varepsilon]^{(\mathrm{all})}_T\,\big|\,X\big)\\
&= E\varepsilon^2\Bigg[\sum_{i=1}^{n-1}\big(\Delta X_{t_{i-1}}-\Delta X_{t_i}\big)^2 + \big(\Delta X_{t_{n-1}}\big)^2 + \big(\Delta X_{t_0}\big)^2\Bigg]\\
&= 2[X,X]^{(\mathrm{all})}_T E\varepsilon^2 - 2E\varepsilon^2\sum_{i=1}^{n-1}\big(\Delta X_{t_{i-1}}\big)\big(\Delta X_{t_i}\big)\\
&\le 4[X,X]^{(\mathrm{all})}_T E\varepsilon^2,
\end{aligned}\tag{A.10}$$

by the Cauchy–Schwarz inequality, from which, and from $[X,X]^{(\mathrm{all})}_T$ being of order $O_p(1)$, (A.7) follows. Hence (A.8) follows by the Markov inequality.

(b) The multiple-grid case. Notice that

$$[Y,Y]^{(\mathrm{avg})} = [X,X]^{(\mathrm{avg})}_T + [\varepsilon,\varepsilon]^{(\mathrm{avg})}_T + 2[X,\varepsilon]^{(\mathrm{avg})}_T.\tag{A.11}$$

Equation (A.11) follows strictly from model (4) and the definitions of the grids and of $[\cdot,\cdot]^{(\mathrm{avg})}_t$; see Section 3.2.

We need to show that

$$E\big(([X,\varepsilon]^{(\mathrm{avg})}_T)^2\,\big|\,X\big) = O_p\!\Big(\frac{1}{K}\Big),\tag{A.12}$$

and, in particular,

$$[X,\varepsilon]^{(\mathrm{avg})}_T = O_p\!\Big(\frac{1}{K^{1/2}}\Big),\tag{A.13}$$

with $\operatorname{var}([X,\varepsilon]^{(\mathrm{avg})}_T\,|\,X) = E[([X,\varepsilon]^{(\mathrm{avg})}_T)^2\,|\,X]$. To show (A.12), note that $E([X,\varepsilon]^{(\mathrm{avg})}_T\,|\,X)=0$ and

$$E\big[([X,\varepsilon]^{(\mathrm{avg})}_T)^2\,\big|\,X\big] = \operatorname{var}\big([X,\varepsilon]^{(\mathrm{avg})}_T\,\big|\,X\big) = \frac{1}{K^2}\sum_{k=1}^{K}\operatorname{var}\big([X,\varepsilon]^{(k)}_T\,\big|\,X\big) \le \frac{4E\varepsilon^2}{K}\,[X,X]^{(\mathrm{avg})}_T = O_p\!\Big(\frac{1}{K}\Big),$$

where the second equality follows from the disjointness of the different grids, as well as from the independence of ε and X. The inequality follows from the same argument as in (A.10). The order then follows because $[X,X]^{(\mathrm{avg})}_T = O_p(1)$; see the method of Mykland and Zhang (2002) for a rigorous development of the order of $[X,X]^{(\mathrm{avg})}_T$.

Theorem A.1. Suppose that X is an Itô process of form (1). Suppose that Y is related to X through model (4) and that (11) is satisfied with $E\varepsilon^4 < \infty$. Also suppose that $t_i$ and $t_{i+1}$ are not in the same subgrid for any i. Under assumption (33), as $n\to\infty$,

$$\Big(\sqrt n\,\big(\widehat{E\varepsilon^2}-E\varepsilon^2\big),\ \sqrt{\tfrac{K}{\bar n}}\,\big([Y,Y]^{(\mathrm{avg})}_T - [X,X]^{(\mathrm{avg})}_T - 2\bar nE\varepsilon^2\big)\Big)$$

converges in law to a bivariate normal with mean 0 and covariance matrix

$$\begin{pmatrix} E\varepsilon^4 & 2\operatorname{var}(\varepsilon^2)\\ 2\operatorname{var}(\varepsilon^2) & 4E\varepsilon^4\end{pmatrix},\tag{A.14}$$

conditional on the X process, where the limiting random variable is independent of the X process.

Proof. By Lemma A.2, we need the distribution of $[\varepsilon,\varepsilon]^{(\mathrm{avg})}$ and $[\varepsilon,\varepsilon]^{(\mathrm{all})}$. First, we explore the convergence of

$$\frac{1}{\sqrt n}\Big([\varepsilon,\varepsilon]^{(\mathrm{all})}_T - 2nE\varepsilon^2,\ [\varepsilon,\varepsilon]^{(\mathrm{avg})}_T K - 2\bar n KE\varepsilon^2\Big).\tag{A.15}$$

Recall that all of the sampling points $t_0, t_1, \ldots, t_n$ are within [0, T]. We use G to denote the time points in the full sampling, as in the single grid; $G^{(k)}$ denotes the subsampling from the kth grid. As before, if $t_i\in G^{(k)}$, then $t_{i-}$ and $t_{i+}$ are the previous and next elements in $G^{(k)}$; $\varepsilon_{t_{i-}} = 0$ for $t_i = \min G^{(k)}$, and $\varepsilon_{t_{i+}} = 0$ for $t_i = \max G^{(k)}$.

Set

$$M^{(1)}_T = \frac{1}{\sqrt n}\sum_{t_i\in G}\big(\varepsilon_{t_i}^2 - E\varepsilon^2\big),\qquad M^{(2)}_T = \frac{1}{\sqrt n}\sum_{t_i\in G}\varepsilon_{t_i}\varepsilon_{t_{i-1}},\qquad M^{(3)}_T = \frac{1}{\sqrt n}\sum_{k=1}^{K}\sum_{t_i\in G^{(k)}}\varepsilon_{t_i}\varepsilon_{t_{i-}}.\tag{A.16}$$


We first find the asymptotic distribution of $(M^{(1)}_T, M^{(2)}_T, M^{(3)}_T)$ using the martingale central limit theorem; then we use the result to find the limit of (A.15).

Note that $(M^{(1)}_T, M^{(2)}_T, M^{(3)}_T)$ are the end points of martingales with respect to the filtration $\mathcal F_i = \sigma(\varepsilon_{t_j}, j\le i;\, X_t, \text{all } t)$. We now derive their (discrete-time) predictable quadratic variations and covariations $\langle M^{(l)}, M^{(k)}\rangle$, $l,k = 1,2,3$ [discrete-time predictable quadratic variations are only used in this proof and the proof of Lemma 1, and are different from the continuous-time quadratic variations in (12)]:

$$\begin{aligned}
\big\langle M^{(1)},M^{(1)}\big\rangle_T &= \frac{1}{n}\sum_{t_i\in G}\operatorname{var}\big(\varepsilon_{t_i}^2-E\varepsilon^2\,\big|\,\mathcal F_{t_{i-1}}\big) = \operatorname{var}(\varepsilon^2),\\
\big\langle M^{(2)},M^{(2)}\big\rangle_T &= \frac{1}{n}\sum_{t_i\in G}\operatorname{var}\big(\varepsilon_{t_i}\varepsilon_{t_{i-1}}\,\big|\,\mathcal F_{t_{i-1}}\big) = \frac{E\varepsilon^2}{n}\sum_{t_i\in G}\varepsilon_{t_{i-1}}^2 = (E\varepsilon^2)^2 + o_p(1),\\
\big\langle M^{(3)},M^{(3)}\big\rangle_T &= \frac{1}{n}\sum_{k=1}^{K}\sum_{t_i\in G^{(k)}}\operatorname{var}\big(\varepsilon_{t_i}\varepsilon_{t_{i-}}\,\big|\,\mathcal F_{t_{i-1}}\big) = \frac{E\varepsilon^2}{n}\sum_{k=1}^{K}\sum_{t_i\in G^{(k)}}\varepsilon_{t_{i-}}^2 = (E\varepsilon^2)^2 + o_p(1),
\end{aligned}\tag{A.17}$$

by the law of large numbers. Similarly, for the predictable quadratic covariations,

$$\begin{aligned}
\big\langle M^{(1)},M^{(2)}\big\rangle_T &= \frac{1}{n}\sum_{t_i\in G}\operatorname{cov}\big(\varepsilon_{t_i}^2-E\varepsilon^2,\ \varepsilon_{t_i}\varepsilon_{t_{i-1}}\,\big|\,\mathcal F_{t_{i-1}}\big) = E\varepsilon^3\,\frac{1}{n}\sum_{t_i\in G}\varepsilon_{t_{i-1}} = o_p(1),\\
\big\langle M^{(1)},M^{(3)}\big\rangle_T &= \frac{1}{n}\sum_{k=1}^{K}\sum_{t_i\in G^{(k)}}\operatorname{cov}\big(\varepsilon_{t_i}^2-E\varepsilon^2,\ \varepsilon_{t_i}\varepsilon_{t_{i-}}\,\big|\,\mathcal F_{t_{i-1}}\big) = E\varepsilon^3\,\frac{1}{n}\sum_{k=1}^{K}\sum_{t_i\in G^{(k)}}\varepsilon_{t_{i-}} = o_p(1),\\
\big\langle M^{(2)},M^{(3)}\big\rangle_T &= \frac{1}{n}\sum_{k=1}^{K}\sum_{t_i\in G^{(k)}}\operatorname{cov}\big(\varepsilon_{t_i}\varepsilon_{t_{i-1}},\ \varepsilon_{t_i}\varepsilon_{t_{i-}}\,\big|\,\mathcal F_{t_{i-1}}\big) = \frac{E\varepsilon^2}{n}\sum_{k=1}^{K}\sum_{t_i\in G^{(k)}}\varepsilon_{t_{i-1}}\varepsilon_{t_{i-}} = o_p(1),
\end{aligned}\tag{A.18}$$

because $t_{i+1}$ is not in the same grid as $t_i$. Because the $\varepsilon_{t_i}$'s are iid and $E\varepsilon^4 < \infty$, the conditional Lindeberg conditions are satisfied. Hence, by the martingale central limit theorem (see Hall and Heyde 1980, condition 3.1, p. 58), $(M^{(1)}, M^{(2)}, M^{(3)})$ are asymptotically normal, with covariance matrix given by the asymptotic values of $\langle M^{(l)}, M^{(k)}\rangle$. In other words, asymptotically, $(M^{(1)}, M^{(2)}, M^{(3)})$ are independent normal with respective variances $\operatorname{var}(\varepsilon^2)$, $(E\varepsilon^2)^2$, and $(E\varepsilon^2)^2$.

Returning to (A.15), we have that

$$\begin{aligned}
[\varepsilon,\varepsilon]^{(\mathrm{all})} - 2nE\varepsilon^2 &= 2\sum_{i\ne 0,n}\big(\varepsilon_{t_i}^2-E\varepsilon^2\big) + \big(\varepsilon_{t_0}^2-E\varepsilon^2\big) + \big(\varepsilon_{t_n}^2-E\varepsilon^2\big) - 2\sum_{t_i>0}\varepsilon_{t_i}\varepsilon_{t_{i-1}}\\
&= 2\sqrt n\big(M^{(1)}-M^{(2)}\big) + O_p(1).
\end{aligned}\tag{A.19}$$

Meanwhile,

$$\begin{aligned}
[\varepsilon,\varepsilon]^{(k)} - 2n_kE\varepsilon^2 &= \sum_{t_i\in G^{(k)},\,t_i\ne\max G^{(k)}}\big(\varepsilon_{t_{i+}}-\varepsilon_{t_i}\big)^2 - 2n_kE\varepsilon^2\\
&= 2\sum_{t_i\in G^{(k)}}\big(\varepsilon_{t_i}^2-E\varepsilon^2\big) - \big(\varepsilon_{\min G^{(k)}}^2-E\varepsilon^2\big) - \big(\varepsilon_{\max G^{(k)}}^2-E\varepsilon^2\big) - 2\sum_{t_i\in G^{(k)}}\varepsilon_{t_i}\varepsilon_{t_{i-}},
\end{aligned}\tag{A.20}$$

where $n_k + 1$ is the total number of sampling points in $G^{(k)}$. Hence

$$[\varepsilon,\varepsilon]^{(\mathrm{avg})}_T K - 2\bar nE\varepsilon^2 K = \sqrt n\big(2M^{(1)}-2M^{(3)}\big) - R = 2\sqrt n\big(M^{(1)}-M^{(3)}\big) + O_p(K^{1/2}),\tag{A.21}$$

because $R = \sum_{k=1}^{K}\big[(\varepsilon_{\min G^{(k)}}^2-E\varepsilon^2) + (\varepsilon_{\max G^{(k)}}^2-E\varepsilon^2)\big]$ satisfies $ER^2 = \operatorname{var}(R) \le 4K\operatorname{var}(\varepsilon^2)$.

Because $n^{-1}K\to 0$, and because the error terms in (A.19) and (A.20) are uniformly integrable, it follows that

$$(A.15) = 2\big(M^{(1)}-M^{(2)},\ M^{(1)}-M^{(3)}\big) + o_p(1).\tag{A.22}$$

Hence (A.15) is also asymptotically normal, with covariance matrix

$$\begin{pmatrix} 4E\varepsilon^4 & 4\operatorname{var}(\varepsilon^2)\\ 4\operatorname{var}(\varepsilon^2) & 4E\varepsilon^4\end{pmatrix}.$$

By Lemma A.2, and as $n^{-1}K\to 0$,

$$\frac{1}{\sqrt n}\Big([Y,Y]^{(\mathrm{all})}_T - 2nE\varepsilon^2,\ K\big([Y,Y]^{(\mathrm{avg})}_T - [X,X]^{(\mathrm{avg})}_T - 2\bar nE\varepsilon^2\big)\Big)\,\Big|\,X$$

is asymptotically normal:

$$\frac{1}{\sqrt n}\begin{pmatrix} [Y,Y]^{(\mathrm{all})}_T - 2nE\varepsilon^2\\[2pt] [Y,Y]^{(\mathrm{avg})}_T K - 2\bar nKE\varepsilon^2 - [X,X]^{(\mathrm{avg})}_T K\end{pmatrix}\Bigg|\,X = 2\begin{pmatrix} M^{(1)}-M^{(2)}\\[2pt] M^{(1)}-M^{(3)}\end{pmatrix} + o_p(1) \xrightarrow{\ \mathcal L\ } 2N\Bigg(0,\begin{pmatrix} E\varepsilon^4 & \operatorname{var}(\varepsilon^2)\\ \operatorname{var}(\varepsilon^2) & E\varepsilon^4\end{pmatrix}\Bigg).\tag{A.23}$$

Because

$$\widehat{E\varepsilon^2} = \frac{1}{2n}[Y,Y]^{(\mathrm{all})}_T \qquad\text{and}\qquad \frac{K}{\sqrt n} = \sqrt{\frac{K}{\bar n}}\,\big(1+o(1)\big),\tag{A.24}$$

Theorem A.1 follows.

Proof of Lemma 1. For simplicity, and without loss of generality, we assume that H = G. The result follows from the proofs of Lemma A.2 and Theorem A.1, but with a more exact representation of $[Y,Y]^{(\mathrm{all})}_T - [X,X]^{(\mathrm{all})}_T$. Specifically, from (A.9) and (A.19), we have that

$$\begin{aligned}
[Y,Y]^{(\mathrm{all})}_T - [X,X]^{(\mathrm{all})}_T - 2nE\varepsilon^2 &= [\varepsilon,\varepsilon]^{(\mathrm{all})}_T - 2nE\varepsilon^2 + 2[X,\varepsilon]^{(\mathrm{all})}_T\\
&= 2\sum_{i\ne 0,n}\big(\varepsilon_{t_i}^2-E\varepsilon^2\big) + \big(\varepsilon_{t_0}^2-E\varepsilon^2\big) + \big(\varepsilon_{t_n}^2-E\varepsilon^2\big) - 2\sum_{t_i>0}\varepsilon_{t_i}\varepsilon_{t_{i-1}}\\
&\quad + 2\sum_{i=1}^{n-1}\big(\Delta X_{t_{i-1}}-\Delta X_{t_i}\big)\varepsilon_{t_i} + 2\Delta X_{t_{n-1}}\varepsilon_{t_n} - 2\Delta X_{t_0}\varepsilon_{t_0}.
\end{aligned}$$


With the same filtration as in the proof of Theorem A.1, this is the sum of a martingale triangular array with increments $2(\varepsilon_{t_i}^2-E\varepsilon^2) - 2\varepsilon_{t_i}\varepsilon_{t_{i-1}} + 2(\Delta X_{t_{i-1}}-\Delta X_{t_i})\varepsilon_{t_i}$ for $i\ne 0, n$. Again, the conditions of the martingale central limit theorem (see Hall and Heyde 1980, chap. 3) are satisfied by (17), whence the term is asymptotically normal, conditionally on the X process. The conditional variance is given by (20) under the conditions mentioned in the statement of the lemma. Because $E\varepsilon^2$ can vary with n, the $O_p(n_{\mathrm{sparse}}^{-1/2})$ term in (20) is actually of order $O_p(n_{\mathrm{sparse}}^{-1/2}E\varepsilon^2)$.

A.3 Asymptotics of D_T

For transparency of notation, we take $\Delta t = T/n$; in other words, $\Delta t$ is the average of the $\Delta t_i$.

Proof of Theorem 2. Note first that because we assume that $\mu_t$ and $\sigma_t$ are continuous, they are locally bounded. Because here we seek to show convergence in probability, we can therefore assume, without loss of generality, that $|\mu_t|$ is bounded above by a nonrandom constant, and also that $\infty > \sigma_+ \ge \sigma_t \ge \sigma_- > 0$, where $\sigma_+$ and $\sigma_-$ are nonrandom. This is by the usual stopping-time argument.

Also note that once $\mu_t$ and $\sigma_t$ are bounded, as we have now assumed, by Girsanov's theorem (see, e.g., Karatzas and Shreve 1991, chap. 3.5; Jacod and Shiryaev 2003, chap. II.3b), we can, without loss of generality, further suppose that $\mu_t = 0$ identically. (If the convergence in probability holds under the equivalent probability, then it also holds under the original one.)

Now for the derivation. First, consider

$$\big(X_{t_i}-X_{t_{i-}}\big)^2 = \Bigg(\sum_{t_j=t_{i-}}^{t_{i-1}}\Delta X_{t_j}\Bigg)^2 = \sum_{t_j=t_{i-}}^{t_{i-1}}\big(\Delta X_{t_j}\big)^2 + 2\sum_{t_{i-1}\ge t_k>t_l\ge t_{i-}}\Delta X_{t_k}\,\Delta X_{t_l}.$$

It follows that

$$[X,X]^{(\mathrm{avg})}_T = [X,X]^{(\mathrm{all})}_T + 2\sum_{i=1}^{n-1}\big(\Delta X_{t_i}\big)\sum_{j=1}^{K\wedge i}\Big(1-\frac{j}{K}\Big)\big(\Delta X_{t_{i-j}}\big) + O_p\!\Big(\frac{K}{n}\Big).$$

The remainder term incorporates end effects and has order $O_p(K/n)$ by (41) and (42), and due to the regular allocation of points to grids.

Following Jacod and Protter (1998) and Mykland and Zhang (2002), $[X,X]^{(\mathrm{all})}_T - \langle X,X\rangle_T = O_p(1/\sqrt n) = o_p(\sqrt{K/n})$, too, under the assumption (17) [which is a special case of assumption (41)] and the assumption that $\inf_{t\in[0,T]}\sigma_t^2 > 0$ almost surely. In other words,

$$D_T = [X,X]^{(\mathrm{avg})}_T - \langle X,X\rangle_T = 2\sum_{i=1}^{n-1}\big(\Delta X_{t_i}\big)\sum_{j=1}^{K\wedge i}\Big(1-\frac{j}{K}\Big)\big(\Delta X_{t_{i-j}}\big) + o_p\big(\sqrt{K\Delta t}\big).\tag{A.25}$$

It follows that

$$\langle D,D\rangle_T = 4\sum_{i=1}^{n-1}\big(\Delta\langle X,X\rangle_{t_i}\big)\Bigg(\sum_{j=1}^{K\wedge i}\Big(1-\frac{j}{K}\Big)\big(\Delta X_{t_{i-j}}\big)\Bigg)^2 + o_p(K\Delta t) = (I) + (II) + o_p(K\Delta t),\tag{A.26}$$

where

$$\begin{aligned}
(I) &= 4\sum_{i=1}^{n-1}\big(\Delta\langle X,X\rangle_{t_i}\big)\Bigg(\sum_{j=1}^{K\wedge i}\Big(1-\frac{j}{K}\Big)^2\big(\Delta X_{t_{i-j}}\big)^2\Bigg)\\
&= 4\sum_{i=1}^{n-1}\sigma_{t_i}^4\,\Delta t_i\Bigg(\sum_{j=1}^{K\wedge i}\Big(1-\frac{j}{K}\Big)^2\Delta t_{i-j}\Bigg) + o_p(K\Delta t)\\
&= K\Delta t\sum_{i=1}^{n-1}\sigma_{t_i}^4\,h_i\,\Delta t_i + o_p(K\Delta t),
\end{aligned}$$

where $h_i$ is given by (43). Thus, Theorem 2 will have been shown if we can show that

$$(II) = 8\sum_{i=1}^{n-1}\Delta\langle X,X\rangle_{t_i}\,\zeta_i\tag{A.27}$$

is of order $o_p(K\Delta t)$, where

$$\zeta_i = \sum_{i-1\ge l>r\ge 0}\big(\Delta X_{t_l}\big)\big(\Delta X_{t_r}\big)\Big(1-\frac{i-l}{K}\Big)^{\!+}\Big(1-\frac{i-r}{K}\Big)^{\!+}.\tag{A.28}$$

We do this as follows. Denote $\delta_+ = \max_i|\Delta t_i| = O(1/n)$, and set $(II)' = 8\sum_{i=1}^{n-1}\Delta t_i\,\sigma_{t_{i-K}}^2\,\zeta_i$. Then, by Hölder's inequality,

$$E\big|(II)-(II)'\big| \le \delta_+\,n\sup_{|t-s|\le(K+1)\delta_+}\big\|\sigma_t^2-\sigma_s^2\big\|_2\,\sup_i\|\zeta_i\|_2 = o(\delta_+K),$$

because of the boundedness and continuity of $\sigma_t$, and because, applying the Burkholder–Davis–Gundy inequality twice, we have that

$$\begin{aligned}
E\big(\zeta_i^2\big) &\le E\sum_{l=1}^{i-1}\Delta\langle X,X\rangle_{t_l}\Bigg(\sum_{r=0}^{(l-1)\wedge(i-1)}\big(\Delta X_{t_r}\big)\Big(1-\frac{i-l}{K}\Big)^{\!+}\Big(1-\frac{i-r}{K}\Big)^{\!+}\Bigg)^2\\
&\le \delta_+(\sigma_+)^2\sum_{l=1}^{i-1}E\Bigg(\sum_{r=0}^{(l-1)\wedge(i-1)}\big(\Delta X_{t_r}\big)\Big(1-\frac{i-l}{K}\Big)^{\!+}\Big(1-\frac{i-r}{K}\Big)^{\!+}\Bigg)^2\\
&\le (\delta_+)^2(\sigma_+)^4\sum_{l=1}^{i-1}\sum_{r=0}^{(l-1)\wedge(i-1)}\Bigg(\Big(1-\frac{i-l}{K}\Big)^{\!+}\Big(1-\frac{i-r}{K}\Big)^{\!+}\Bigg)^2\\
&= O\big((\delta_+K)^2\big)\quad\text{uniformly in }i,
\end{aligned}\tag{A.29}$$

where the second-to-last transition does for the inner sum what the first two inequalities do for the outer sum. Now rewrite

$$(II)' = 8\sum_{n-1\ge l>r\ge 0}\big(\Delta X_{t_l}\big)\big(\Delta X_{t_r}\big)\sum_{i=l+1}^{n-1}\Delta t_i\,\sigma_{t_{i-K}}^2\Big(1-\frac{i-l}{K}\Big)^{\!+}\Big(1-\frac{i-r}{K}\Big)^{\!+}.\tag{A.30}$$

In the same way as in (A.29), we then obtain

$$\begin{aligned}
E\big((II)'\big)^2 &\le 64(\delta_+)^2(\sigma_+)^4\sum_{n-1\ge l>r\ge 0}E\Bigg(\sum_{i=l+1}^{n-1}\Delta t_i\,\sigma_{t_{i-K}}^2\Big(1-\frac{i-l}{K}\Big)^{\!+}\Big(1-\frac{i-r}{K}\Big)^{\!+}\Bigg)^2\\
&\le 64(\delta_+)^2(\sigma_+)^4\times(\delta_+)^2(\sigma_+)^4\sum_{n-1\ge l>r\ge 0}\Bigg(\sum_{i=l+1}^{n-1}\Big(1-\frac{i-l}{K}\Big)^{\!+}\Big(1-\frac{i-r}{K}\Big)^{\!+}\Bigg)^2\\
&\le 64(\delta_+)^4(\sigma_+)^8\,nK^3 = O\big((\delta_+K)^3\big) = o\big((\delta_+K)^2\big),
\end{aligned}\tag{A.31}$$

by assumptions (41) and (42). The transition from the second to the last line in (A.31) is due to the fact that $\sum_{i=l+1}^{n-1}(1-\frac{i-l}{K})^+(1-\frac{i-r}{K})^+ \le K$, and equals 0 when $l-r\ge K$. Thus we have shown that $(II) = o_p(K\Delta t)$, and hence Theorem 2 has been established.

We now proceed to the asymptotic distribution of $D_T$. We first state a technical condition on the filtration $(\mathcal F_t)_{0\le t\le T}$ to which $X_t$ and $\mu_t$ (but not the ε's) are assumed to be adapted.

Condition E (Description of the filtration). There is a continuous multidimensional P-local martingale $\mathcal X = (\mathcal X^{(1)}, \ldots, \mathcal X^{(p)})$, for some p, so that $\mathcal F_t$ is the smallest sigma-field containing $\sigma(\mathcal X_s, s\le t)$ and $\mathcal N$, where $\mathcal N$ contains all of the null sets in $\sigma(\mathcal X_s, s\le T)$.

For example, $\mathcal X$ can be a collection of Brownian motions.

Proof of Theorem 3. We can show, by methods similar to those used in the proof of Theorem 2, that if L is any martingale adapted to the filtration generated by $\mathcal X$, then

$$\sup_t\bigg|\frac{1}{\Delta t\,K}\langle D,L\rangle_t\bigg| \xrightarrow{\ p\ } 0.\tag{A.32}$$

The stable convergence with respect to the filtration $(\mathcal F_t)_{0\le t\le T}$ then follows, in view of Rootzén (1980) or Jacod and Protter (1998). This ends the proof of Theorem 3.

Finally, in the case where $\eta_n^2$ does not converge, one can still use the mixed normal with variance $\eta_n^2$. This is because every subsequence of $\eta_n^2$ has a further subsequence that does converge in probability to some $\eta^2$, and hence for which the assumption (48) in Theorem 3 would be satisfied. The reason is that one can define the distribution function of a finite measure by

$$G_n(t) = \sum_{t_{i+1}\le t}h_i\,\Delta t_i.\tag{A.33}$$

Because $G_n(t)\le T\sup_i h_i$, it follows from (45) that the sequence $G_n$ is weakly compact in the sense of weak convergence; see Helly's theorem (e.g., Billingsley 1995, p. 336). For any convergent subsequence $G_n\to G$, we then get that

$$\eta_n^2 = \int_0^T\sigma_t^4\,dG_n(t) \to \int_0^T\sigma_t^4\,dG(t),\tag{A.34}$$

almost surely, because we have assumed that $\langle X,X\rangle'_t$ is a continuous function of t. One then defines $\eta^2$ to be the (subsequence-dependent) right side of (A.34). To proceed further with the asymptotics, continue with the foregoing subsequence and note that

$$\frac{1}{\Delta t\,K}\langle D,D\rangle_t \approx \int_0^t\sigma_s^4\,dG_n(s) \to \int_0^t\sigma_s^4\,dG(s),$$

completing the proof.

[Received October 2003. Revised December 2004.]

REFERENCES

Aït-Sahalia, Y., Mykland, P. A., and Zhang, L. (2005), "How Often to Sample a Continuous-Time Process in the Presence of Market Microstructure Noise," Review of Financial Studies, 18, 351–416.
Aldous, D. J., and Eagleson, G. K. (1978), "On Mixing and Stability of Limit Theorems," The Annals of Probability, 6, 325–331.
Andersen, T. G., Bollerslev, T., Diebold, F. X., and Labys, P. (2001), "The Distribution of Exchange Rate Realized Volatility," Journal of the American Statistical Association, 96, 42–55.
Bandi, F. M., and Russell, J. R. (2003), "Microstructure Noise, Realized Volatility, and Optimal Sampling," technical report, University of Chicago Graduate School of Business.
Barndorff-Nielsen, O. E., and Shephard, N. (2002), "Econometric Analysis of Realized Volatility and Its Use in Estimating Stochastic Volatility Models," Journal of the Royal Statistical Society, Ser. B, 64, 253–280.
Billingsley, P. (1995), Probability and Measure (3rd ed.), New York: Wiley.
Brown, S. J. (1990), "Estimating Volatility," in Financial Options: From Theory to Practice, eds. S. Figlewski, W. Silber, and M. Subrahmanyam, Homewood, IL: Business One-Irwin, pp. 516–537.
Chernov, M., and Ghysels, E. (2000), "A Study Towards a Unified Approach to the Joint Estimation of Objective and Risk Neutral Measures for the Purpose of Options Valuation," Journal of Financial Economics, 57, 407–458.
Dacorogna, M. M., Gençay, R., Müller, U., Olsen, R. B., and Pictet, O. V. (2001), An Introduction to High-Frequency Finance, San Diego: Academic Press.
Gallant, A. R., Hsu, C.-T., and Tauchen, G. T. (1999), "Using Daily Range Data to Calibrate Volatility Diffusions and Extract the Forward Integrated Variance," The Review of Economics and Statistics, 81, 617–631.
Gloter, A. (2000), "Estimation des Paramètres d'une Diffusion Cachée," unpublished doctoral thesis, Université de Marne-la-Vallée.
Hall, P., and Heyde, C. C. (1980), Martingale Limit Theory and Its Application, Boston: Academic Press.
Hansen, P. R., and Lunde, A. (2004a), "Realized Variance and IID Market Microstructure Noise," technical report, Brown University, Dept. of Economics.
——— (2004b), "An Unbiased Measure of Realized Variance," technical report, Brown University, Dept. of Economics.
Heston, S. (1993), "A Closed-Form Solution for Options With Stochastic Volatility, With Applications to Bonds and Currency Options," Review of Financial Studies, 6, 327–343.
Hull, J., and White, A. (1987), "The Pricing of Options on Assets With Stochastic Volatilities," Journal of Finance, 42, 281–300.
Jacod, J., and Protter, P. (1998), "Asymptotic Error Distributions for the Euler Method for Stochastic Differential Equations," The Annals of Probability, 26, 267–307.
Jacod, J., and Shiryaev, A. N. (2003), Limit Theorems for Stochastic Processes (2nd ed.), New York: Springer-Verlag.
Karatzas, I., and Shreve, S. E. (1991), Brownian Motion and Stochastic Calculus, New York: Springer-Verlag.
Mykland, P. A., and Zhang, L. (2002), "ANOVA for Diffusions," technical report, University of Chicago, Dept. of Statistics.
O'Hara, M. (1995), Market Microstructure Theory, Cambridge, MA: Blackwell.
Oomen, R. (2004), "Properties of Realized Variance for a Pure Jump Process: Calendar Time Sampling versus Business Time Sampling," technical report, University of Warwick, Warwick Business School.
Rényi, A. (1963), "On Stable Sequences of Events," Sankhyā, Ser. A, 25, 293–302.
Rootzén, H. (1980), "Limit Distributions for the Error in Approximations of Stochastic Integrals," The Annals of Probability, 8, 241–251.
Zhou, B. (1996), "High-Frequency Data and Volatility in Foreign-Exchange Rates," Journal of Business & Economic Statistics, 14, 45–52.
Zumbach, G., Corsi, F., and Trapletti, A. (2002), "Efficient Estimation of Volatility Using High-Frequency Data," technical report, Olsen & Associates.



Equations (18) and (19) suggest that in the discrete world, where microstructure effects are unfortunately present, realized volatility $[Y,Y]^{(\mathrm{all})}_T$ is not a reliable estimator for the true variation $[X,X]^{(\mathrm{all})}_T$ of the returns. For large n, realized volatility could have little to do with the true returns; instead, it relates to the noise term, $E\varepsilon^2$, in the first order and $E\varepsilon^4$ in the second order. Also, one can see from (18) that $[Y,Y]^{(\mathrm{all})}_T$ has a positive bias whose magnitude increases linearly with the sample size.

Interestingly, apart from revealing the biased nature of $[Y,Y]^{(\mathrm{all})}_T$ at high frequency, our analysis also delivers a consistent estimator for the variance of the noise term. In other words, let

$$\widehat{E\varepsilon^2} = \frac{1}{2n}[Y,Y]^{(\mathrm{all})}_T.$$

We have, for a fixed true return process X,

$$n^{1/2}\big(\widehat{E\varepsilon^2}-E\varepsilon^2\big) \to N\big(0, E\varepsilon^4\big)\quad\text{as } n\to\infty;\tag{22}$$

see Theorem A.1 in the Appendix.

By the same methods as in Section A.2, a consistent estimator of the asymptotic variance of $\widehat{E\varepsilon^2}$ is then given by

$$\widehat{E\varepsilon^4} = \frac{1}{2n}\sum_i\big(\Delta Y_{t_i}\big)^4 - 3\big(\widehat{E\varepsilon^2}\big)^2.\tag{23}$$
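In code, the two moment estimators read as follows. This is a sketch under the iid noise assumption (11), tested on simulated Gaussian noise; all parameter values are illustrative.

```python
import numpy as np

def noise_moments(y):
    # Ehat(eps^2) = [Y,Y]^(all)_T / (2n), as in (22)
    # Ehat(eps^4) = (1/2n) sum (dY)^4 - 3 Ehat(eps^2)^2, as in (23)
    dy = np.diff(y)
    n = len(dy)
    e2 = np.sum(dy ** 2) / (2 * n)
    e4 = np.sum(dy ** 4) / (2 * n) - 3 * e2 ** 2
    return e2, e4

rng = np.random.default_rng(2)
n, sigma, noise_sd = 1_000_000, 0.2, 0.001
x = np.cumsum(sigma * np.sqrt(1.0 / n) * rng.standard_normal(n + 1))
y = x + noise_sd * rng.standard_normal(n + 1)

e2_hat, e4_hat = noise_moments(y)
print(e2_hat, noise_sd ** 2)      # estimate vs true E(eps^2)
print(e4_hat, 3 * noise_sd ** 4)  # estimate vs true E(eps^4) (Gaussian case)
```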

A question that naturally arises is why one is interested in the quadratic variation of X (either ⟨X,X⟩ or [X,X]), as opposed to the quadratic variation of Y ([Y,Y]), because "⟨Y,Y⟩" would have to be taken to be infinite in view of the foregoing. For example, in options pricing or hedging, one could take the opposite view that [Y,Y] is the volatility that one actually faces.

A main reason why we focus on the quadratic variation of X is that the variation caused by the ε's is tied to each transaction, as opposed to the price process of the underlying security. From the standpoint of trading, the ε's represent trading costs, which are different from the costs created by the volatility of the underlying process. Different market participants may even face different types of trading costs, depending on institutional arrangements. In any case, for the purpose of integrating trading costs into options prices, it seems more natural to approach the matter via the extensive literature on market microstructure (see O'Hara 1995 for a survey).

A related matter is that continuous finance would be difficult to implement if one were to use the quadratic variation of Y. If one were to use the quadratic variation [Y,Y], then one would also be using a quantity that would depend on the data frequency.

Finally, apart from the specific application, it is interesting to be able to say something about the underlying log-return process, and to be able to separate this from the effects introduced by the mechanics of the trading process, where the idiosyncratics of the trading arrangement play a role.

2.3 The Optimal Sampling Frequency

We have just argued that the realized volatility estimates the wrong quantity. This problem only gets worse when observations are sampled more frequently. Its financial interpretation boils down to market microstructure, summarized by ε in (4). As the data record is sampled more finely, the change in true returns gets smaller, while the microstructure noise, such as the bid–ask

spread and transaction cost, remains at the same magnitude. In other words, when the sampling frequency is extremely high, the observed fluctuation in the returns process is more heavily contaminated by microstructure noise and becomes less representative of the true variation ⟨X,X⟩_T of the returns. Along this line of discussion, the broad opinion in financial applications is not to sample too often, at least when using realized volatility. We now discuss how this can be viewed in the context of the model (4) with stochastic volatility.

Formally, sparse sampling is implemented by sampling on a subgrid H of G, with, as before,

$$[Y,Y]^{(\mathrm{sparse})} = [Y,Y]^{(H)} = \sum_{t_i,t_{i+}\in H}\big(Y_{t_{i+}}-Y_{t_i}\big)^2.$$

For the moment we take the subgrid as given, but later we consider how to optimize the sampling. We set $n_{\mathrm{sparse}} = |H|$.
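In code, sparse sampling amounts to computing the realized variance on a subgrid (a sketch; taking every 300th observation, roughly 5-minute sampling for one-second data, is an arbitrary illustrative choice):

```python
import numpy as np

def realized_var(y):
    return np.sum(np.diff(y) ** 2)

def sparse_rv(y, spacing):
    # [Y,Y]^(sparse): realized variance on an every-`spacing`-th-point subgrid
    return realized_var(y[::spacing])

rng = np.random.default_rng(3)
n = 23400
x = np.cumsum(0.2 * np.sqrt(1.0 / n) * rng.standard_normal(n + 1))
y = x + 0.001 * rng.standard_normal(n + 1)

rv_all = realized_var(y)        # biased upward by roughly 2 n E(eps^2)
rv_sparse = sparse_rv(y, 300)   # n_sparse = 78: far smaller noise bias
print(rv_all, rv_sparse)
```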

To give a rationale for sparse sampling, we propose an asymptotics where the law of ε, although still iid, is allowed to vary with n and $n_{\mathrm{sparse}}$. Formally, we suppose that the distribution of ε, $\mathcal L(\varepsilon)$, is an element of the set $\mathcal D$ of all distributions for which $E(\varepsilon) = 0$ and where $E(\varepsilon^2)$ and $E(\varepsilon^4)/E(\varepsilon^2)^2$ are bounded by arbitrary constants. The asymptotic normality in Section 2.2 then takes the following more nuanced form.

Lemma 1. Suppose that X is an Itô process of form (1), where $|\mu_t|$ and $\sigma_t$ are bounded above by a constant. Suppose that, for given n, the grid $H_n$ is given, with $n_{\mathrm{sparse}}\to\infty$ as $n\to\infty$, and that, for each n, Y is related to X through model (4). Assume (11), where $\mathcal L(\varepsilon)\in\mathcal D$. Like the grid $H_n$, the law $\mathcal L(\varepsilon)$ can depend on n (whereas the process X is fixed). Let (17) be satisfied for the sequence of grids $H_n$. Then

$$[Y,Y]^{H}_T = [X,X]^{H}_T + 2n_{\mathrm{sparse}}E\varepsilon^2 + \Big(4n_{\mathrm{sparse}}E\varepsilon^4 + \big(8[X,X]^{H}_T E\varepsilon^2 - 2\operatorname{var}(\varepsilon^2)\big)\Big)^{1/2}Z_{\mathrm{noise}} + O_p\big(n_{\mathrm{sparse}}^{-1/4}(E\varepsilon^2)^{1/2}\big),\tag{24}$$

where $Z_{\mathrm{noise}}$ is a quantity that is asymptotically standard normal.

The lemma is proved at the end of Section A.2. Note now that the relative order of the terms in (24) depends on the quantities $n_{\mathrm{sparse}}$ and $E\varepsilon^2$. Therefore, if $E\varepsilon^2$ is small relative to $n_{\mathrm{sparse}}$, then $[Y,Y]^{H}_T$ is, after all, not an entirely absurd substitute for $[X,X]^{H}_T$.

One would be tempted to conclude that the optimal choice of $n_{\mathrm{sparse}}$ is to make it as small as possible. But that would overlook the fact that the bigger $n_{\mathrm{sparse}}$, the closer $[X,X]^{H}_T$ is to the target integrated volatility $\langle X,X\rangle_T$ from (12).

To quantify the overall error, we now combine Lemma 1 with the results on discretization error to study the total error $[Y,Y]^{H}_T - \langle X,X\rangle_T$. Following Rootzén (1980), Jacod and Protter (1998), Barndorff-Nielsen and Shephard (2002), and Mykland and Zhang (2002), and under the conditions that these authors stated, we can show that

$$\Big(\frac{n_{\mathrm{sparse}}}{T}\Big)^{1/2}\big([X,X]^{H}_T - \langle X,X\rangle_T\big) \xrightarrow{\ \mathcal L\ } \Bigg(\int_0^T 2H'(t)\,\sigma_t^4\,dt\Bigg)^{1/2}\times Z_{\mathrm{discrete}},\tag{25}$$


stably in law. (We discuss this concept at the end of Sec. 3.4.) $Z_{\mathrm{discrete}}$ is a standard normal random variable, with the subscript "discrete" indicating that the randomness is due to the discretization effect in $[X,X]^{H}_T$ when evaluating $\langle X,X\rangle_T$. H(t) is the asymptotic quadratic variation of time, as discussed by Mykland and Zhang (2002):

$$H(t) = \lim_{n\to\infty}\frac{n_{\mathrm{sparse}}}{T}\sum_{t_i,t_{i+}\in H,\,t_{i+}\le t}\big(t_{i+}-t_i\big)^2.$$

In the case of equidistant observations, $\Delta t_0 = \cdots = \Delta t_{n-1} = \Delta t = T/n_{\mathrm{sparse}}$ and $H'(t) = 1$. Because the ε's are independent of the X process, $Z_{\mathrm{noise}}$ is independent of $Z_{\mathrm{discrete}}$.

For small $E\varepsilon^2$, one now has an opportunity to estimate $\langle X,X\rangle_T$. It follows from our Lemma 1 and proposition 1 of Mykland and Zhang (2002) that:

Proposition 1. Assume the conditions of Lemma 1, and also that $\max_{t_i, t_{i+} \in \mathcal{H}} (t_{i+} - t_i) = O(1/n_{\rm sparse})$. Also assume Condition E in Section A.3. Then $H$ is well defined, and
\[
[Y,Y]_T^{\mathcal{H}} = \langle X,X\rangle_T + 2 E\varepsilon^2 n_{\rm sparse} + \Upsilon Z_{\rm total}
+ O_p\bigl(n_{\rm sparse}^{-1/4} (E\varepsilon^2)^{1/2}\bigr) + o_p\bigl(n_{\rm sparse}^{-1/2}\bigr) \tag{26}
\]
in the sense of stable convergence, where $Z_{\rm total}$ is asymptotically standard normal and where the variance has the form
\[
\Upsilon^2 = \underbrace{4 n_{\rm sparse} E\varepsilon^4 + \bigl(8 [X,X]_T^{\mathcal{H}} E\varepsilon^2 - 2\,{\rm var}(\varepsilon^2)\bigr)}_{\text{due to noise}}
+ \underbrace{\frac{T}{n_{\rm sparse}} \int_0^T 2 H'(t)\,\sigma_t^4\,dt}_{\text{due to discretization}}. \tag{27}
\]

Seen from this angle, there is scope for using the realized volatility $[Y,Y]^{\mathcal{H}}$ to estimate $\langle X,X\rangle$. There is a bias, $2 E\varepsilon^2 n_{\rm sparse}$, but the bias goes down if one uses fewer observations. This, then, is consistent with the practice in empirical finance, where the estimator used is not $[Y,Y]^{(\rm all)}$ but instead $[Y,Y]^{(\rm sparse)}$, obtained by sampling sparsely. For instance, faced with data sampled every few seconds, empirical researchers would typically use squared returns (i.e., differences of log-prices $Y$) over, say, 5-, 15-, or 30-minute time intervals. The intuitive rationale for using this estimator is to attempt to control the bias of the estimator; this can be assessed based on our formula (26), replacing the original sample size $n$ by the lower number reflecting the sparse sampling. But one should avoid sampling too sparsely, because formula (27) shows that decreasing $n_{\rm sparse}$ has the effect of increasing the variance of the estimator via the discretization effect, which is proportional to $n_{\rm sparse}^{-1}$. Based on our formulas, this trade-off between sampling too often and sampling too rarely can be formalized, and an optimal frequency at which to sample sparsely can be determined.

It is natural to minimize the MSE of $[Y,Y]^{(\rm sparse)}$:
\[
{\rm MSE} = \bigl(2 n_{\rm sparse} E\varepsilon^2\bigr)^2
+ \Bigl[4 n_{\rm sparse} E\varepsilon^4 + \bigl(8 [X,X]_T^{\mathcal{H}} E\varepsilon^2 - 2\,{\rm var}(\varepsilon^2)\bigr)
+ \frac{T}{n_{\rm sparse}} \int_0^T 2 H'(t)\,\sigma_t^4\,dt\Bigr]. \tag{28}
\]

One has to imagine that the original sample size $n$ is quite large, so that $n_{\rm sparse} < n$. In this case, minimizing the MSE (28) means that one should choose $n_{\rm sparse}$ to satisfy $\partial\,{\rm MSE}/\partial n_{\rm sparse} \approx 0$; in other words,
\[
8 n_{\rm sparse} (E\varepsilon^2)^2 + 4 E\varepsilon^4 - \frac{T}{n_{\rm sparse}^2} \int_0^T 2 H'(t)\,\sigma_t^4\,dt \approx 0 \tag{29}
\]
or
\[
n_{\rm sparse}^3 + \frac{1}{2}\, n_{\rm sparse}^2\, \frac{E\varepsilon^4}{(E\varepsilon^2)^2} - (E\varepsilon^2)^{-2}\,\frac{T}{8} \int_0^T 2 H'(t)\,\sigma_t^4\,dt \approx 0. \tag{30}
\]

Finally, still under the conditions of Proposition 1, the optimum $n^*_{\rm sparse}$ becomes
\[
n^*_{\rm sparse} = (E\varepsilon^2)^{-2/3} \Bigl(\frac{T}{8} \int_0^T 2 H'(t)\,\sigma_t^4\,dt\Bigr)^{1/3} \bigl(1 + o_p(1)\bigr)
\qquad \text{as } E\varepsilon^2 \to 0. \tag{31}
\]

Of course, if the actual sample size $n$ were smaller than $n^*_{\rm sparse}$, then one would simply take $n^*_{\rm sparse} = n$, but this is unlikely to occur for heavily traded stocks. Equation (31) is the formal statement saying that one can sample more frequently when the error spread is small. Note from (28) that, to first order, the final trade-off is between the bias $2 n_{\rm sparse} E\varepsilon^2$ and the variance due to discretization; the effect of the variance associated with $Z_{\rm noise}$ is of lower order when comparing $n_{\rm sparse}$ and $E\varepsilon^2$. It should be emphasized that (31) gives a feasible way of choosing $n_{\rm sparse}$: one can estimate $E\varepsilon^2$ using all of the data, following the procedure in Section 2.2, and the integral $\int_0^T 2 H'(t)\,\sigma_t^4\,dt$ can be estimated by the methods discussed in Section 6.

Hence, if one decides to address the problem by selecting a lower frequency of observation, then one can do so by subsampling the full grid $\mathcal{G}$ at an arbitrary sparse frequency and using $[Y,Y]^{(\rm sparse)}$. Alternatively, one can use $n_{\rm sparse}$ as optimally determined by (31); we denote the corresponding estimator by $[Y,Y]^{(\rm sparse, opt)}$. These are our fourth- and third-best estimators.
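As a concrete illustration, the sparse estimator and the plug-in rule (31) can be sketched in Python. This is our own minimal sketch (all function names are ours), assuming equidistant observations and, purely for the example, constant volatility, so that $\int_0^T 2H'(t)\sigma_t^4\,dt = 2\sigma^4 T$:

```python
import numpy as np

def realized_variance(y):
    """[Y, Y]_T: sum of squared increments of the observed log-price."""
    return float(np.sum(np.diff(y) ** 2))

def sparse_rv(y, k):
    """[Y, Y]^(sparse)_T: realized variance keeping every k-th observation."""
    return realized_variance(np.asarray(y)[::k])

def optimal_n_sparse(e_eps2, int_2hprime_sigma4, T):
    """Plug-in rule (31): n*_sparse = (E eps^2)^(-2/3) (T/8 * integral)^(1/3)."""
    return e_eps2 ** (-2.0 / 3.0) * (T / 8.0 * int_2hprime_sigma4) ** (1.0 / 3.0)

# Illustration on simulated data (constant sigma, equidistant grid);
# the numeric values are hypothetical, chosen to resemble one trading day.
rng = np.random.default_rng(0)
n, T, sigma, noise_sd = 23_400, 1.0 / 252, 0.3, 0.0005
x = np.cumsum(rng.normal(0.0, sigma * np.sqrt(T / n), n + 1))  # efficient log-price X
y = x + rng.normal(0.0, noise_sd, n + 1)                       # observed Y = X + eps
# With constant sigma and H'(t) = 1, the integral in (31) is 2 sigma^4 T.
n_star = optimal_n_sparse(noise_sd ** 2, 2.0 * sigma ** 4 * T, T)
rv_sparse = sparse_rv(y, max(1, n // int(n_star)))
```

With these magnitudes, $n^*_{\rm sparse}$ comes out in the tens, so the rule retains only a small fraction of the 23,400 observations.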

3. SAMPLING SPARSELY WHILE USING ALL OF THE DATA: SUBSAMPLING AND AVERAGING OVER MULTIPLE GRIDS

3.1 Multiple Grids and Sufficiency

We have argued in the previous section that one can indeed benefit from using infrequently sampled data. And yet, one of the most basic lessons of statistics is that one should not do this. We present here two ways of tackling the problem. Instead of selecting (arbitrarily or optimally) a subsample, our methods are based on selecting a number of subgrids of the original grid of observation times, $\mathcal{G} = \{t_0, \dots, t_n\}$, and then averaging the estimators derived from the subgrids. The principle is that, to the extent that there is a benefit to subsampling, this benefit can now be retained, whereas the variation of the estimator can be lessened by the averaging. The benefit of averaging is clear from sufficiency considerations, and many statisticians would say that subsampling without subsequent averaging is inferentially incorrect.

In what follows, we first introduce a set of notations, then turn to studying the realized volatility in the multigrid context. In Section 4, we show how to eliminate the bias of the estimator by using two time scales, a combination of single grid and multiple grids.

1400 Journal of the American Statistical Association, December 2005

3.2 Notation for the Multiple Grids

We specifically suppose that the full grid $\mathcal{G}$, $\mathcal{G} = \{t_0, \dots, t_n\}$ as in (13), is partitioned into $K$ nonoverlapping subgrids $\mathcal{G}^{(k)}$, $k = 1, \dots, K$; in other words,
\[
\mathcal{G} = \bigcup_{k=1}^K \mathcal{G}^{(k)}, \qquad \text{where } \mathcal{G}^{(k)} \cap \mathcal{G}^{(l)} = \emptyset \text{ when } k \ne l.
\]
For most purposes, the natural way to select the $k$th subgrid $\mathcal{G}^{(k)}$ is to start with $t_{k-1}$ and then pick every $K$th sample point after that, until $T$. That is,
\[
\mathcal{G}^{(k)} = \{t_{k-1}, t_{k-1+K}, t_{k-1+2K}, \dots, t_{k-1+n_k K}\}
\]
for $k = 1, \dots, K$, where $n_k$ is the integer making $t_{k-1+n_k K}$ the last element in $\mathcal{G}^{(k)}$. We refer to this as regular allocation of sample points to subgrids.

Whether the allocation is regular or not, we let $n_k + 1 = |\mathcal{G}^{(k)}|$; as in Definition 2, $n + 1 = |\mathcal{G}|$. Recall that the realized volatility based on all observation points $\mathcal{G}$ is written as $[Y,Y]_T^{(\rm all)}$. Meanwhile, if one uses only the subsampled observations $Y_t$, $t \in \mathcal{G}^{(k)}$, then the realized volatility is denoted by $[Y,Y]_T^{(k)}$. It has the form
\[
[Y,Y]_T^{(k)} = \sum_{t_j, t_{j+} \in \mathcal{G}^{(k)}} \bigl(Y_{t_{j+}} - Y_{t_j}\bigr)^2,
\]
where, if $t_i \in \mathcal{G}^{(k)}$, then $t_{i+}$ denotes the following element in $\mathcal{G}^{(k)}$.

A natural competitor to $[Y,Y]_T^{(\rm all)}$, $[Y,Y]_T^{(\rm sparse)}$, and $[Y,Y]_T^{(\rm sparse, opt)}$ is then given by
\[
[Y,Y]_T^{(\rm avg)} = \frac{1}{K} \sum_{k=1}^K [Y,Y]_T^{(k)}, \tag{32}
\]
and this is the statistic we analyze in what follows. As before, we fix $T$ and use only the observations within the time period $[0,T]$. Asymptotics are still under (17) and under
\[
n \to \infty \quad \text{and} \quad n/K \to \infty. \tag{33}
\]
In general, the $n_k$ need not be the same across $k$. We define
\[
\bar n = \frac{1}{K} \sum_{k=1}^K n_k = \frac{n - K + 1}{K}. \tag{34}
\]
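In code, regular allocation and the averaged estimator (32) amount to strided slicing. The following minimal Python sketch (our own, with our own function names) computes $[Y,Y]^{(k)}_T$, $[Y,Y]^{(\rm avg)}_T$, and $\bar n$ from (34):

```python
import numpy as np

def subgrid_rv(y, k, K):
    """[Y, Y]^(k)_T: realized variance on the k-th subgrid (k = 0, ..., K-1),
    which under regular allocation holds every K-th observation from t_k on."""
    return float(np.sum(np.diff(np.asarray(y)[k::K]) ** 2))

def avg_rv(y, K):
    """[Y, Y]^(avg)_T of eq. (32): average of the K subgrid estimators."""
    return float(np.mean([subgrid_rv(y, k, K) for k in range(K)]))

def nbar(n, K):
    """Eq. (34): nbar = (n - K + 1)/K, with n the number of increments of G."""
    return (n - K + 1) / K
```

For instance, with 11 observations ($n = 10$ increments) and $K = 2$, the two subgrids carry 5 and 4 increments, so $\bar n = (10 - 2 + 1)/2 = 4.5$.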

3.3 Error Due to the Noise: $[Y,Y]_T^{(\rm avg)} - [X,X]_T^{(\rm avg)}$

Recall that we are interested in determining the integrated volatility, $\langle X,X\rangle_T$, or quadratic variation, of the true but unobservable returns. As an intermediate step, here we study how well the "pooled" realized volatility $[Y,Y]_T^{(\rm avg)}$ approximates $[X,X]_T^{(\rm avg)}$, where the latter is the "pooled" true integrated volatility when $X$ is considered only on the discrete time scale.

From (18) and (32),
\[
E\bigl([Y,Y]_T^{(\rm avg)} \,\big|\, X \text{ process}\bigr) = [X,X]_T^{(\rm avg)} + 2 \bar n E\varepsilon^2. \tag{35}
\]

Also, because $\{\varepsilon_t,\ t \in \mathcal{G}^{(k)}\}$ are independent for different $k$,
\[
{\rm var}\bigl([Y,Y]_T^{(\rm avg)} \,\big|\, X \text{ process}\bigr)
= \frac{1}{K^2} \sum_{k=1}^K {\rm var}\bigl([Y,Y]_T^{(k)} \,\big|\, X \text{ process}\bigr)
= 4 \frac{\bar n}{K} E\varepsilon^4 + O_p\Bigl(\frac{1}{K}\Bigr), \tag{36}
\]

in the same way as in (19). Incorporating the next-order term in the variance yields
\[
{\rm var}\bigl([Y,Y]_T^{(\rm avg)} \,\big|\, X\bigr)
= 4 \frac{\bar n}{K} E\varepsilon^4 + \frac{1}{K} \bigl[8 [X,X]_T^{(\rm avg)} E\varepsilon^2 - 2\,{\rm var}(\varepsilon^2)\bigr] + o_p\Bigl(\frac{1}{K}\Bigr), \tag{37}
\]

as in (20). By Theorem A.1 in Section A.2, the conditional asymptotics for the estimator $[Y,Y]_T^{(\rm avg)}$ are as follows.

Theorem 1. Suppose that $X$ is an Itô process of form (1). Suppose that $Y$ is related to $X$ through model (4) and that (11) is satisfied with $E\varepsilon^4 < \infty$. Also suppose that $t_i$ and $t_{i+1}$ are not in the same subgrid for any $i$. Under assumption (33), as $n \to \infty$,
\[
\sqrt{\frac{K}{\bar n}} \bigl([Y,Y]_T^{(\rm avg)} - [X,X]_T^{(\rm avg)} - 2 \bar n E\varepsilon^2\bigr)
\xrightarrow{\mathcal{L}} 2 \sqrt{E\varepsilon^4}\, Z_{\rm noise}^{(\rm avg)} \tag{38}
\]
conditional on the $X$ process, where $Z_{\rm noise}^{(\rm avg)}$ is standard normal.

This can be compared with the result stated in (24). Notice that $Z_{\rm noise}^{(\rm avg)}$ in (38) is almost never the same as $Z_{\rm noise}$ in (24); in particular, ${\rm cov}(Z_{\rm noise}, Z_{\rm noise}^{(\rm avg)}) = {\rm var}(\varepsilon^2)/E\varepsilon^4$, based on the proof of Theorem A.1 in the Appendix.

In comparison with the realized volatility using the full grid $\mathcal{G}$, the aggregated estimator $[Y,Y]_T^{(\rm avg)}$ provides an improvement, in that both the asymptotic bias and variance are of smaller order in $n$; compare (18) and (19). We use this in Section 4.

3.4 Error Due to the Discretization Effect: $[X,X]_T^{(\rm avg)} - \langle X,X\rangle_T$

In this section we study the impact of the time discretization. In other words, we investigate the deviation of $[X,X]_T^{(\rm avg)}$ from the integrated volatility $\langle X,X\rangle_T$ of the true process. Denote the discretization effect by $D_T$, where
\[
D_t = [X,X]_t^{(\rm avg)} - \langle X,X\rangle_t
= \frac{1}{K} \sum_{k=1}^K \bigl([X,X]_t^{(k)} - \langle X,X\rangle_t\bigr), \tag{39}
\]
with
\[
[X,X]_t^{(k)} = \sum_{t_i, t_{i+} \in \mathcal{G}^{(k)},\, t_{i+} \le t} \bigl(X_{t_{i+}} - X_{t_i}\bigr)^2. \tag{40}
\]
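The magnitude of $D_T$ can be checked numerically. In this sketch (our own; constant $\sigma$, so that $\langle X,X\rangle_T = \sigma^2 T$ exactly), the pooled discretization error is small, consistent with the $O_p((K/n)^{1/2})$ order established below:

```python
import numpy as np

def pooled_rv(x, K):
    """[X, X]^(avg)_T over K subgrids under regular allocation."""
    return float(np.mean([np.sum(np.diff(np.asarray(x)[k::K]) ** 2)
                          for k in range(K)]))

# Brownian path with constant volatility: <X, X>_T = sigma^2 T.
rng = np.random.default_rng(1)
n, T, sigma, K = 10_000, 1.0, 1.0, 10
x = np.cumsum(rng.normal(0.0, sigma * np.sqrt(T / n), n + 1))
D_T = pooled_rv(x, K) - sigma ** 2 * T   # discretization effect, O_p((K/n)^(1/2))
```

Here $(K/n)^{1/2} \approx 0.03$, and $|D_T|$ is of that order on typical draws.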

In what follows, we consider the asymptotics of $D_T$. The problem is similar to that of finding the limit of $[X,X]_T^{(\rm all)} - \langle X,X\rangle_T$ [cf. (25)]. However, this present case is more complicated, due to the multiple grids.

We suppose in the following that the sampling points are regularly allocated to subgrids; in other words, $\mathcal{G}^{(l)} = \{t_{l-1}, t_{K+l-1}, \dots\}$. We also assume that
\[
\max_i |\Delta t_i| = O\Bigl(\frac{1}{n}\Bigr) \tag{41}
\]
and
\[
K/n \to 0. \tag{42}
\]

Define the weight function
\[
h_i = \frac{4}{K \Delta t} \sum_{j=1}^{(K-1) \wedge i} \Bigl(1 - \frac{j}{K}\Bigr)^2 \Delta t_{i-j}. \tag{43}
\]

In the case where the $t_i$ are equidistant, and under regular allocation of points to subgrids, $\Delta t_i = \Delta t$, and so all of the $h_i$'s (except the first $K-1$) are equal, with
\[
h_i = 4 \frac{1}{K} \sum_{j=1}^{(K-1) \wedge i} \Bigl(1 - \frac{j}{K}\Bigr)^2
= \begin{cases}
\dfrac{4(2K^2 - 3K + 1)}{6K^2} = \dfrac{4}{3} + o(1), & i \ge K-1, \\[2ex]
\dfrac{4}{3}\, \dfrac{i}{K^3} \bigl(3K^2 - 3Ki + i^2\bigr) + o(1), & i < K-1.
\end{cases} \tag{44}
\]

More generally, assumptions (41) and (42) ensure that
\[
\sup_i h_i = O(1). \tag{45}
\]
We take $\langle D,D\rangle_T$ to be the quadratic variation of $D_t$ when viewed as a continuous-time process (39). This gives the best approximation to the variance of $D_T$. We show the following results in Section A.3.

Theorem 2. Suppose that $X$ is an Itô process of the form (1), with drift coefficient $\mu_t$ and diffusion coefficient $\sigma_t$, both continuous almost surely. Also suppose that $\sigma_t$ is bounded away from 0. Assume (41) and (42), and that sampling points are regularly allocated to grids. Then the quadratic variation of $D_T$ is approximately
\[
\langle D,D\rangle_T = \frac{TK}{n}\, \eta_n^2 + o_p\Bigl(\frac{K}{n}\Bigr), \tag{46}
\]
where
\[
\eta_n^2 = \sum_i h_i \sigma_{t_i}^4 \Delta t_i. \tag{47}
\]

In particular, $D_T = O_p((K/n)^{1/2})$. From this we derive a variance-variance trade-off between the two effects that have been discussed: noise and discretization. First, however, we discuss the asymptotic law of $D_T$. We discuss stable convergence at the end of this section.

Theorem 3. Assume the conditions of Theorem 2, and also that
\[
\eta_n^2 \xrightarrow{P} \eta^2, \tag{48}
\]
where $\eta$ is random. Also assume Condition E in Section A.3. Then
\[
D_T/(K/n)^{1/2} \xrightarrow{\mathcal{L}} \eta \sqrt{T}\, Z_{\rm discrete}, \tag{49}
\]
where $Z_{\rm discrete}$ is standard normal and independent of the process $X$. The convergence in law is stable.

In other words, $D_T/(K/n)^{1/2}$ can be taken to be asymptotically mixed normal, "$N(0, \eta^2 T)$." For most of our discussion, it is most convenient to suppose (48), and this is satisfied in many cases. For example, when the $t_i$ are equidistant, and under regular allocation of points to subgrids,
\[
\eta^2 = \frac{4}{3} \int_0^T \sigma_t^4\,dt, \tag{50}
\]
following (44). One does not need to rely on (48); we argue in Section A.3 that, without this condition, one can take $D_T/(K/n)^{1/2}$ to be approximately $N(0, \eta_n^2 T)$. For estimation of $\eta^2$ or $\eta_n^2$, see Section 6.

Finally, stable convergence (Rényi 1963; Aldous and Eagleson 1978; Hall and Heyde 1980, chap. 3) means, for our purposes, that the left side of (49) converges to the right side jointly with the $X$ process, and that $Z_{\rm discrete}$ is independent of $X$. This is slightly weaker than convergence conditional on $X$, but serves the same function of permitting the incorporation of conditionality-type phenomena into arguments and conclusions.

3.5 Combining the Two Sources of Error

We can now combine the two error terms arising from discretization and from the observation noise. It follows from Theorems 1 and 3 that
\[
[Y,Y]_T^{(\rm avg)} - \langle X,X\rangle_T - 2 \bar n E\varepsilon^2 = \xi Z_{\rm total} + o_p(1), \tag{51}
\]
where $Z_{\rm total}$ is an asymptotically standard normal random variable independent of the $X$ process, and
\[
\xi^2 = \underbrace{4 \frac{\bar n}{K} E\varepsilon^4}_{\text{due to noise}} + \underbrace{T \frac{1}{\bar n}\, \eta^2}_{\text{due to discretization}}. \tag{52}
\]
It is easily seen that if one takes $K = c n^{2/3}$, then both components in $\xi^2$ will be present in the limit; otherwise, one of them will dominate. Based on (51), $[Y,Y]_T^{(\rm avg)}$ is still a biased estimator of the quadratic variation $\langle X,X\rangle_T$ of the true return process. One can recognize that, as far as the asymptotic bias is concerned, $[Y,Y]_T^{(\rm avg)}$ is a better estimator than $[Y,Y]_T^{(\rm all)}$, because $\bar n \le n$, demonstrating that the bias in the subsampled estimator $[Y,Y]_T^{(\rm avg)}$ increases at a slower pace than in the full-grid estimator. One can also construct a bias-adjusted estimator from (51); this further development would involve the higher-order analysis between the bias and the subsampled estimator. We describe the methodology of bias correction in Section 4.


3.6 The Benefits of Sampling Sparsely: Optimal Sampling Frequency in the Multiple-Grid Case

As in Section 2.3, when the noise is negligible asymptotically, we can search for an optimal $\bar n$ for subsampling, to balance the coexistence of the bias and the variance in (51). To reduce the MSE of $[Y,Y]_T^{(\rm avg)}$, we set $\partial\,{\rm MSE}/\partial \bar n = 0$. From (51) and (52), ${\rm bias} = 2 \bar n E\varepsilon^2$ and $\xi^2 = 4 \frac{\bar n}{K} E\varepsilon^4 + \frac{T}{\bar n} \eta^2$. Then
\[
{\rm MSE} = {\rm bias}^2 + \xi^2 = 4 (E\varepsilon^2)^2 \bar n^2 + 4 \frac{\bar n}{K} E\varepsilon^4 + \frac{T}{\bar n} \eta^2
= 4 (E\varepsilon^2)^2 \bar n^2 + \frac{T}{\bar n} \eta^2 \quad \text{to first order;}
\]
thus the optimal $\bar n^*$ satisfies
\[
\bar n^* = \Bigl(\frac{T \eta^2}{8 (E\varepsilon^2)^2}\Bigr)^{1/3}. \tag{53}
\]
Therefore, assuming that the estimator $[Y,Y]_T^{(\rm avg)}$ is adopted, one could benefit from a minimum MSE if one subsamples $\bar n^*$ data in an equidistant fashion. In other words, all $n$ observations can be used if one uses $K^*$, $K^* \approx n/\bar n^*$, subgrids. This is in contrast to the drawback of using all of the data in the single-grid case. The subsampling coupled with aggregation brings out the advantage of using all of the data. Of course, for the asymptotics to work, we need $E\varepsilon^2 \to 0$. Our recommendation, however, is to use the bias-corrected estimator given in Section 4.

4. THE FINAL ESTIMATOR: SUBSAMPLING, AVERAGING, AND BIAS CORRECTION OVER TWO TIME SCALES

4.1 The Final Estimator $\widehat{\langle X,X\rangle}_T$: Main Result

In the preceding sections, we have seen that the multigrid estimator $[Y,Y]^{(\rm avg)}$ is yet another biased estimator of the true integrated volatility $\langle X,X\rangle$. In this section we improve the multigrid estimator by adopting bias adjustment. To assess the bias, one uses the full grid. As mentioned before, from (22) in the single-grid case (Sec. 2), $E\varepsilon^2$ can be consistently approximated by
\[
\widehat{E\varepsilon^2} = \frac{1}{2n} [Y,Y]_T^{(\rm all)}. \tag{54}
\]
Hence the bias of $[Y,Y]^{(\rm avg)}$ can be consistently estimated by $2 \bar n \widehat{E\varepsilon^2}$. A bias-adjusted estimator for $\langle X,X\rangle$ thus can be obtained as
\[
\widehat{\langle X,X\rangle}_T = [Y,Y]_T^{(\rm avg)} - \frac{\bar n}{n} [Y,Y]_T^{(\rm all)}, \tag{55}
\]

thereby combining the two time scales, (all) and (avg).

To study the asymptotic behavior of $\widehat{\langle X,X\rangle}_T$, first note that, under the conditions of Theorem A.1 in Section A.2,
\[
\Bigl(\frac{K}{\bar n}\Bigr)^{1/2} \bigl(\widehat{\langle X,X\rangle}_T - [X,X]_T^{(\rm avg)}\bigr)
= \Bigl(\frac{K}{\bar n}\Bigr)^{1/2} \bigl([Y,Y]_T^{(\rm avg)} - [X,X]_T^{(\rm avg)} - 2 \bar n E\varepsilon^2\bigr)
- 2 (K \bar n)^{1/2} \bigl(\widehat{E\varepsilon^2} - E\varepsilon^2\bigr)
\xrightarrow{\mathcal{L}} N\bigl(0, 8 (E\varepsilon^2)^2\bigr), \tag{56}
\]

where the convergence in law is conditional on $X$.

We can now combine this with the results of Section 3.4 to determine the optimal choice of $K$ as $n \to \infty$:
\[
\widehat{\langle X,X\rangle}_T - \langle X,X\rangle_T
= \bigl(\widehat{\langle X,X\rangle}_T - [X,X]_T^{(\rm avg)}\bigr) + \bigl([X,X]_T^{(\rm avg)} - \langle X,X\rangle_T\bigr)
= O_p\Bigl(\frac{\bar n^{1/2}}{K^{1/2}}\Bigr) + O_p\bigl(\bar n^{-1/2}\bigr). \tag{57}
\]
The error is minimized by equating the two terms on the right side of (57), yielding that the optimal sampling step for $[Y,Y]_T^{(\rm avg)}$ is $K = O(n^{2/3})$. The right side of (57) then has order $O_p(n^{-1/6})$.

In particular, if we take
\[
K = c n^{2/3}, \tag{58}
\]
then we find the limit in (57), as follows.

Theorem 4. Suppose that $X$ is an Itô process of form (1), and assume the conditions of Theorem 3 in Section 3.4. Suppose that $Y$ is related to $X$ through model (4) and that (11) is satisfied with $E\varepsilon^2 < \infty$. Under assumption (58),
\[
n^{1/6} \bigl(\widehat{\langle X,X\rangle}_T - \langle X,X\rangle_T\bigr)
\xrightarrow{\mathcal{L}} N\bigl(0, 8 c^{-2} (E\varepsilon^2)^2\bigr) + \eta \sqrt{T}\, N(0, c)
= \bigl(8 c^{-2} (E\varepsilon^2)^2 + c \eta^2 T\bigr)^{1/2} N(0,1), \tag{59}
\]
where the convergence is stable in law (see Sec. 3.4).

Proof. Note that the first normal distribution comes from (56) and that the second comes from Theorem 3 in Section 3.4. The two normal distributions are independent because the convergence of the first term in (57) is conditional on the $X$ process, which is why they can be amalgamated as stated. The requirement that $E\varepsilon^4 < \infty$ (Thm. A.1 in the App.) is not needed, because only a law of large numbers is required for $M^{(1)}_T$ (see the proof of Thm. A.1) when considering the difference in (56). This finishes the proof.

The estimation of the asymptotic spread $s^2 = 8 c^{-2} (E\varepsilon^2)^2 + c \eta^2 T$ of $\widehat{\langle X,X\rangle}_T$ is deferred to Section 6. It is seen in Section A.1 that, for $K = c n^{2/3}$, the second-order conditional variance of $\widehat{\langle X,X\rangle}_T$, given the $X$ process, is given by
\[
{\rm var}\bigl(\widehat{\langle X,X\rangle}_T \,\big|\, X\bigr)
= n^{-1/3} c^{-2}\, 8 (E\varepsilon^2)^2
+ n^{-2/3} c^{-1} \bigl[8 [X,X]_T^{(\rm avg)} E\varepsilon^2 - 2\,{\rm var}(\varepsilon^2)\bigr]
+ o_p\bigl(n^{-2/3}\bigr). \tag{60}
\]
The evidence from our Monte Carlo simulations in Section 7 suggests that this correction may matter in small samples, and that the (random) variance of $\widehat{\langle X,X\rangle}_T$ is best estimated by
\[
\hat s^2 + n^{-2/3} c^{-1} \bigl[8 [X,X]_T^{(\rm avg)} \widehat{E\varepsilon^2} - 2\,\widehat{{\rm var}}(\varepsilon^2)\bigr]. \tag{61}
\]
For the correction term, $[X,X]_T^{(\rm avg)}$ can be estimated by $\widehat{\langle X,X\rangle}_T$ itself. Meanwhile, by (23), a consistent estimator of ${\rm var}(\varepsilon^2)$ is given by
\[
\widehat{{\rm var}}(\varepsilon^2) = \frac{1}{2n} \sum_i \bigl(\Delta Y_{t_i}\bigr)^4 - 4 \bigl(\widehat{E\varepsilon^2}\bigr)^2.
\]


4.2 Properties of $\widehat{\langle X,X\rangle}_T$: Optimal Sampling and Bias Adjustment

To further pin down the optimal sampling frequency $K$, one can minimize the expected asymptotic variance in (59) to obtain
\[
c^* = \Bigl(\frac{16 (E\varepsilon^2)^2}{T\, E\eta^2}\Bigr)^{1/3}, \tag{62}
\]
which can be consistently estimated from data in past time periods (before time $t_0 = 0$), using $\widehat{E\varepsilon^2}$ and an estimator of $\eta^2$ (cf. Sec. 6). As mentioned in Section 3.4, $\eta^2$ can be taken to be independent of $K$, as long as one allocates sampling points to grids regularly, as defined in Section 3.2. Hence one can choose $c$, and so also $K$, based on past data.

Example 1. If $\sigma_t^2$ is constant, and for equidistant sampling and regular allocation to grids, $\eta^2 = \frac{4}{3} \sigma^4 T$, then the asymptotic variance in (59) is
\[
8 c^{-2} (E\varepsilon^2)^2 + c \eta^2 T = 8 c^{-2} (E\varepsilon^2)^2 + \tfrac{4}{3}\, c\, \sigma^4 T^2,
\]
and the optimal choice of $c$ becomes
\[
c_{\rm opt} = \Bigl(\frac{12 (E\varepsilon^2)^2}{T^2 \sigma^4}\Bigr)^{1/3}. \tag{63}
\]
In this case, the asymptotic variance in (59) is
\[
2 \bigl(12 (E\varepsilon^2)^2\bigr)^{1/3} \bigl(\sigma^2 T\bigr)^{4/3}.
\]

One can also, of course, estimate $c$ to minimize the actual asymptotic variance in (59) from data in the current time period ($0 \le t \le T$). It is beyond the scope of this article to consider whether such a device for selecting the frequency has any impact on our asymptotic results.
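For the constant-volatility setting of Example 1, the optimum (63) and the resulting minimal variance can be evaluated directly. The short sketch below (the noise level and $\sigma$ are our own illustrative choices) also verifies numerically that plugging $c_{\rm opt}$ into the variance reproduces the closed form $2(12(E\varepsilon^2)^2)^{1/3}(\sigma^2 T)^{4/3}$:

```python
def c_opt(e_eps2, sigma2, T):
    """Eq. (63): optimal c when sigma^2 is constant."""
    return (12.0 * e_eps2 ** 2 / (T ** 2 * sigma2 ** 2)) ** (1.0 / 3.0)

def asy_var(c, e_eps2, sigma2, T):
    """Asymptotic variance in (59) with eta^2 = (4/3) sigma^4 T:
    8 c^(-2) (E eps^2)^2 + (4/3) c sigma^4 T^2."""
    return 8.0 * e_eps2 ** 2 / c ** 2 + (4.0 / 3.0) * c * sigma2 ** 2 * T ** 2

# Hypothetical magnitudes: noise sd 0.05%, sigma = 30% annualized, T = 1 day.
e_eps2, sigma2, T = 0.0005 ** 2, 0.3 ** 2, 1.0 / 252
c_star = c_opt(e_eps2, sigma2, T)
v_min = 2.0 * (12.0 * e_eps2 ** 2) ** (1.0 / 3.0) * (sigma2 * T) ** (4.0 / 3.0)
```

Doubling or halving $c$ away from $c_{\rm opt}$ strictly increases the variance, as the bias-free trade-off in (59) dictates.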

In addition to large-sample arguments, one can study $\widehat{\langle X,X\rangle}_T$ from a "smallish" sample standpoint. We argue in what follows that one can apply a bias-type adjustment to get
\[
\widehat{\langle X,X\rangle}{}_T^{(\rm adj)} = \Bigl(1 - \frac{\bar n}{n}\Bigr)^{-1} \widehat{\langle X,X\rangle}_T. \tag{64}
\]
The difference from the estimator in (55) is of order $O_p(\bar n/n) = O_p(K^{-1})$, and thus the two estimators behave the same at the asymptotic order that we consider. The estimator (64), however, has the appeal of being in a certain way "unbiased," as follows. For arbitrary $(a, b)$, consider all estimators of the form
\[
\widehat{\langle X,X\rangle}{}_T^{(\rm adj)} = a [Y,Y]_T^{(\rm avg)} - b \frac{\bar n}{n} [Y,Y]_T^{(\rm all)}.
\]
Then, from (18) and (35),
\[
E\bigl(\widehat{\langle X,X\rangle}{}_T^{(\rm adj)} \,\big|\, X \text{ process}\bigr)
= a \bigl([X,X]_T^{(\rm avg)} + 2 \bar n E\varepsilon^2\bigr) - b \frac{\bar n}{n} \bigl([X,X]_T^{(\rm all)} + 2 n E\varepsilon^2\bigr)
= a [X,X]_T^{(\rm avg)} - b \frac{\bar n}{n} [X,X]_T^{(\rm all)} + 2 (a - b) \bar n E\varepsilon^2 .
\]
It is natural to choose $a = b$ to completely remove the effect of $E\varepsilon^2$. Also, following Section 3.4, both $[X,X]_T^{(\rm avg)}$ and $[X,X]_T^{(\rm all)}$ are asymptotically unbiased estimators of $\langle X,X\rangle_T$. Hence one can argue that one should take $a(1 - \bar n/n) = 1$, yielding (64).

Similarly, an adjusted estimator of $E\varepsilon^2$ is given by
\[
\widehat{E\varepsilon^2}{}^{(\rm adj)} = \tfrac{1}{2} (n - \bar n)^{-1} \bigl([Y,Y]_T^{(\rm all)} - [Y,Y]_T^{(\rm avg)}\bigr), \tag{65}
\]
which satisfies $E\bigl(\widehat{E\varepsilon^2}{}^{(\rm adj)} \,\big|\, X \text{ process}\bigr) = E\varepsilon^2 + \tfrac{1}{2} (n - \bar n)^{-1} \bigl([X,X]_T^{(\rm all)} - [X,X]_T^{(\rm avg)}\bigr)$, and thus is unbiased to high order. As for the asymptotic distribution, one can see from Theorem A.1 in the Appendix that
\[
\widehat{E\varepsilon^2}{}^{(\rm adj)} - E\varepsilon^2
= \bigl(\widehat{E\varepsilon^2} - E\varepsilon^2\bigr) \bigl(1 + O(K^{-1})\bigr) + O_p\bigl(K n^{-3/2}\bigr)
= \widehat{E\varepsilon^2} - E\varepsilon^2 + O_p\bigl(n^{-1/2} K^{-1}\bigr) + O_p\bigl(K n^{-3/2}\bigr)
= \widehat{E\varepsilon^2} - E\varepsilon^2 + O_p\bigl(n^{-5/6}\bigr),
\]
from (58). It follows that $n^{1/2} (\widehat{E\varepsilon^2} - E\varepsilon^2)$ and $n^{1/2} (\widehat{E\varepsilon^2}{}^{(\rm adj)} - E\varepsilon^2)$ have the same asymptotic distribution.
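The adjusted quantities (64) and (65) are one-line modifications of the machinery above. In this sketch (ours; the simulated magnitudes are hypothetical), we check (65) on pure-noise data, where $E\varepsilon^2$ is the only thing the realized variances see:

```python
import numpy as np

def _rvs(y, K):
    """Helper: ([Y,Y]^(all), [Y,Y]^(avg), n, nbar) for K subgrids."""
    y = np.asarray(y, dtype=float)
    n = len(y) - 1
    rv_all = float(np.sum(np.diff(y) ** 2))
    rv_avg = float(np.mean([np.sum(np.diff(y[k::K]) ** 2) for k in range(K)]))
    return rv_all, rv_avg, n, (n - K + 1) / K

def tsrv_adjusted(y, K):
    """Eq. (64): (1 - nbar/n)^(-1) times the two-scales estimator (55)."""
    rv_all, rv_avg, n, nbar = _rvs(y, K)
    return (rv_avg - (nbar / n) * rv_all) / (1.0 - nbar / n)

def noise_var_adjusted(y, K):
    """Eq. (65): (1/2)(n - nbar)^(-1) ([Y,Y]^(all) - [Y,Y]^(avg))."""
    rv_all, rv_avg, n, nbar = _rvs(y, K)
    return 0.5 * (rv_all - rv_avg) / (n - nbar)

# Pure-noise check: with X constant, E [Y,Y]^(all) = 2 n E eps^2, and (65)
# should recover E eps^2 = (0.001)^2 = 1e-6.
rng = np.random.default_rng(2)
y = rng.normal(0.0, 0.001, 20_001)          # Y = eps only
eps2_hat = noise_var_adjusted(y, K=500)
```

On such data the adjusted volatility estimator itself is centered at zero, which is the point of the bias correction.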

5. EXTENSION TO MULTIPLE-PERIOD INFERENCE

For a given family $\mathcal{A} = \{\mathcal{G}^{(k)},\ k = 1, \dots, K\}$, we denote
\[
\widehat{\langle X,X\rangle}_t = [Y,Y]_t^{(\rm avg)} - \frac{\bar n}{n} [Y,Y]_t^{(\rm all)}, \tag{66}
\]
where, as usual, $[Y,Y]_t^{(\rm all)} = \sum_{t_{i+1} \le t} (\Delta Y_{t_i})^2$ and $[Y,Y]_t^{(\rm avg)} = \frac{1}{K} \sum_{k=1}^K [Y,Y]_t^{(k)}$, with
\[
[Y,Y]_t^{(k)} = \sum_{t_i, t_{i+} \in \mathcal{G}^{(k)},\, t_{i+} \le t} \bigl(Y_{t_{i+}} - Y_{t_i}\bigr)^2 .
\]

To estimate $\langle X,X\rangle$ for several discrete time periods, say $[0, T_1], [T_1, T_2], \dots, [T_{M-1}, T_M]$, where $M$ is fixed, this amounts to estimating $\langle X,X\rangle_{T_m} - \langle X,X\rangle_{T_{m-1}} = \int_{T_{m-1}}^{T_m} \sigma_u^2\,du$ for $m = 1, \dots, M$, and the obvious estimator is $\widehat{\langle X,X\rangle}_{T_m} - \widehat{\langle X,X\rangle}_{T_{m-1}}$.

To carry out the asymptotics, let $n_m$ be the number of points in the $m$th time segment, and similarly let $K_m = c_m n_m^{2/3}$, where $c_m$ is a constant. Then $n_m^{1/6} \bigl(\widehat{\langle X,X\rangle}_{T_m} - \widehat{\langle X,X\rangle}_{T_{m-1}} - \int_{T_{m-1}}^{T_m} \sigma_u^2\,du\bigr)$, $m = 1, \dots, M$, converge stably to $\bigl(8 c_m^{-2} (E\varepsilon^2)^2 + c_m \eta_m^2 (T_m - T_{m-1})\bigr)^{1/2} Z_m$, where the $Z_m$ are iid standard normals, independent of the underlying process, and $\eta_m^2$ is the limit $\eta^2$ (Thm. 3) for time period $m$. In the case of equidistant $t_i$ and regular allocation of sample points to grids, $\eta_m^2 = \frac{4}{3} \int_{T_{m-1}}^{T_m} \sigma_u^4\,du$.

In other words, the one-period asymptotics generalize straightforwardly to the multiperiod case. This is because $\widehat{\langle X,X\rangle}_{T_m} - \widehat{\langle X,X\rangle}_{T_{m-1}} - \int_{T_{m-1}}^{T_m} \sigma_u^2\,du$ has, to first order, a martingale structure. This can be seen in the Appendix.

An advantage of our proposed estimator is that if $\varepsilon_{t_i}$ has different variance in different time segments, say ${\rm var}(\varepsilon_{t_i}) = (E\varepsilon^2)_m$ for $t_i \in (T_{m-1}, T_m]$, then both consistency and asymptotic (mixed) normality continue to hold, provided that one replaces $E\varepsilon^2$ by $(E\varepsilon^2)_m$. This adds a measure of robustness to the procedure. If one were convinced that $E\varepsilon^2$ were the same across time segments, then an alternative estimator would have the form
\[
\widehat{\langle X,X\rangle}_t = [Y,Y]_t^{(\rm avg)} - \Bigl(\frac{1}{K}\,\#\{t_{i+1} \le t\} - 1\Bigr) \frac{1}{n} [Y,Y]_T^{(\rm all)}
\qquad \text{for } t = T_1, \dots, T_M. \tag{67}
\]


However, the errors $\widehat{\langle X,X\rangle}_{T_m} - \widehat{\langle X,X\rangle}_{T_{m-1}} - \int_{T_{m-1}}^{T_m} \sigma_u^2\,du$ in this case are not asymptotically independent. Note that for $t = T = T_M$, both candidates (66) and (67) for $\widehat{\langle X,X\rangle}_t$ coincide with the quantity in (55).

6. ESTIMATING THE ASYMPTOTIC VARIANCE OF $\widehat{\langle X,X\rangle}_T$

In the one-period case, the main goal is to estimate the asymptotic variance $s^2 = 8 c^{-2} (E\varepsilon^2)^2 + c \eta^2 T$ [cf. (58) and (59)]. The multigrid case is a straightforward generalization, as indicated in Section 5.

Here we are concerned only with the case where the points $t_i$ are equally spaced ($\Delta t_i = \Delta t$) and are regularly allocated to the grids $\mathcal{A}_1 = \{\mathcal{G}^{(k)},\ k = 1, \dots, K_1\}$. A richer set of ingredients is required to find the spread than to just estimate $\langle X,X\rangle_T$. To implement the estimator, we create an additional family $\mathcal{A}_2 = \{\mathcal{G}^{(k,i)},\ k = 1, \dots, K_1,\ i = 1, \dots, I\}$ of grids, where $\mathcal{G}^{(k,i)}$ contains every $I$th point of $\mathcal{G}^{(k)}$, starting with the $i$th point. We assume that $K_1 \sim c_1 n^{2/3}$. The new family then consists of $K_2 \sim c_2 n^{2/3}$ grids, where $c_2 = c_1 I$.

In addition, we need to divide the time line into segments $(T_{m-1}, T_m]$, where $T_m = \frac{m}{M} T$. For the purposes of this discussion, $M$ is large but finite. We now get an initial estimator of spread as
\[
\hat s_0^2 = n^{1/3} \sum_{m=1}^M \Bigl(\widehat{\langle X,X\rangle}{}_{T_m}^{K_1} - \widehat{\langle X,X\rangle}{}_{T_{m-1}}^{K_1}
- \bigl(\widehat{\langle X,X\rangle}{}_{T_m}^{K_2} - \widehat{\langle X,X\rangle}{}_{T_{m-1}}^{K_2}\bigr)\Bigr)^2,
\]
where $\widehat{\langle X,X\rangle}{}_t^{K_i}$ is the estimator (66) using the grid family $i$, $i = 1, 2$.

Using the discussion in Section 5, we can show that
\[
\hat s_0^2 \approx s_0^2, \tag{68}
\]

where, for $c_1 \ne c_2$ ($I \ne 1$),
\[
s_0^2 = 8 (E\varepsilon^2)^2 \bigl(c_1^{-2} + c_2^{-2} - c_1^{-1} c_2^{-1}\bigr) + \bigl(c_1^{1/2} - c_2^{1/2}\bigr)^2 T \eta^2
= 8 (E\varepsilon^2)^2 c_1^{-2} \bigl(1 + I^{-2} - I^{-1}\bigr) + c_1 \bigl(I^{1/2} - 1\bigr)^2 T \eta^2 . \tag{69}
\]

In (68), the symbol "$\approx$" denotes first convergence in law as $n \to \infty$, and then a limit in probability as $M \to \infty$. Because $E\varepsilon^2$ can be estimated by $\widehat{E\varepsilon^2} = [Y,Y]^{(\rm all)}/2n$, we can put hats on $s_0^2$, $(E\varepsilon^2)^2$, and $\eta^2$ in (69) to obtain an estimator of $\eta^2$. Similarly,

\[
s^2 = 8 (E\varepsilon^2)^2 \Bigl(c^{-2} - \frac{c\,\bigl(c_1^{-2} + c_2^{-2} - c_1^{-1} c_2^{-1}\bigr)}{(c_1^{1/2} - c_2^{1/2})^2}\Bigr)
+ \frac{c}{(c_1^{1/2} - c_2^{1/2})^2}\, s_0^2
= 8 \Bigl(c^{-2} - c\, c_1^{-3}\, \frac{I^{-2} - I^{-1} + 1}{(I^{1/2} - 1)^2}\Bigr) (E\varepsilon^2)^2
+ \frac{c}{c_1}\, \frac{1}{(I^{1/2} - 1)^2}\, s_0^2, \tag{70}
\]
where $c \sim K n^{-2/3}$, with $K$ the number of grids used originally to estimate $\widehat{\langle X,X\rangle}_T$.

Table 1. Coefficients of $(E\varepsilon^2)^2$ and $s_0^2$ When $c_1 = c$

  I    coeff($s_0^2$)    coeff($(E\varepsilon^2)^2$)
  3    1.866             -3.611 $c^{-2}$
  4    1.000             1.5000 $c^{-2}$

Normally, one would take $c_1 = c$. Hence an estimator $\hat s^2$ can be found from $\hat s_0^2$ and $\widehat{E\varepsilon^2}$. When $c_1 = c$, we argue that the optimal choice is $I = 3$ or 4, as follows. The coefficients in (70) become
\[
{\rm coeff}\bigl(s_0^2\bigr) = \bigl(I^{1/2} - 1\bigr)^{-2}
\quad \text{and} \quad
{\rm coeff}\bigl((E\varepsilon^2)^2\bigr) = 8 c^{-2} \bigl(I^{1/2} - 1\bigr)^{-2} f(I),
\]
where $f(I) = I - 2 I^{1/2} - I^{-2} + I^{-1}$. For $I \ge 2$, $f(I)$ is increasing, and $f(I)$ crosses 0 for $I$ between 3 and 4. These, therefore, are the two integer values of $I$ that give the lowest ratio of ${\rm coeff}((E\varepsilon^2)^2)/{\rm coeff}(s_0^2)$. Using $I = 3$ or 4 would therefore maximally insulate against $(E\varepsilon^2)^2$ dominating over $s_0^2$. This is desirable, because $s_0^2$ is the estimator carrying the information about $\eta^2$. Numerical values for the coefficients are given in Table 1. If $c$ were such that $(E\varepsilon^2)^2$ still overwhelms $s_0^2$, then a choice of $c_1 \ne c$ should be considered.
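The coefficients in Table 1 follow directly from (70) with $c_1 = c$; this quick check (our own) reproduces them:

```python
def coeff_s0(I):
    """Coefficient of s_0^2 in (70) when c1 = c: (I^(1/2) - 1)^(-2)."""
    return (I ** 0.5 - 1.0) ** -2

def coeff_eps(I, c=1.0):
    """Coefficient of (E eps^2)^2 in (70) when c1 = c:
    8 c^(-2) (I^(1/2) - 1)^(-2) f(I), with f(I) = I - 2 I^(1/2) - I^(-2) + I^(-1)."""
    f = I - 2.0 * I ** 0.5 - I ** -2.0 + 1.0 / I
    return 8.0 / c ** 2 * (I ** 0.5 - 1.0) ** -2 * f
```

For $I = 3$ this gives 1.866 and $-3.611\,c^{-2}$; for $I = 4$, 1.000 and $1.500\,c^{-2}$, matching Table 1, with the sign change of $f$ between $I = 3$ and $I = 4$ explaining why those two values are the interesting ones.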

7. MONTE CARLO EVIDENCE

In this article we have discussed five approaches to dealing with the impact of market microstructure noise on realized volatility. In this section we examine the performance of each approach in simulations and compare the results with those predicted by the aforementioned asymptotic theory.

7.1 Simulations Design

We use the stochastic volatility model of Heston (1993) as our data-generating process:
\[
dX_t = \bigl(\mu - v_t/2\bigr)\,dt + \sigma_t\,dB_t \tag{71}
\]
and
\[
dv_t = \kappa (\alpha - v_t)\,dt + \gamma v_t^{1/2}\,dW_t, \tag{72}
\]
where $v_t = \sigma_t^2$. We assume that the parameters $(\mu, \kappa, \alpha, \gamma)$ and $\rho$, the correlation coefficient between the two Brownian motions $B$ and $W$, are constant in this model. We also assume Feller's condition, $2\kappa\alpha \ge \gamma^2$, to make the zero boundary unattainable by the volatility process.

We simulate $M = 25{,}000$ sample paths of the process using the Euler scheme at a time interval $\Delta t = 1$ second, and set the parameter values at levels reasonable for a stock: $\mu = .05$, $\kappa = 5$, $\alpha = .04$, $\gamma = .5$, and $\rho = -.5$. As for the market microstructure noise $\varepsilon$, we assume that it is Gaussian and small. Specifically, we set $(E\varepsilon^2)^{1/2} = .0005$ (i.e., the standard deviation of the noise is .05% of the value of the asset price). We purposefully set this value to be smaller than the values calibrated using the empirical market microstructure literature.
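The data-generating process (71)-(72) with additive noise can be sketched as follows. This is our own minimal Euler discretization (not the authors' simulation code); the floor on $v$ guards against the scheme stepping below zero, which Euler can do even when Feller's condition holds for the continuous-time process:

```python
import numpy as np

def simulate_heston(n_steps, T, mu=0.05, kappa=5.0, alpha=0.04, gamma=0.5,
                    rho=-0.5, x0=np.log(100.0), v0=0.04,
                    noise_sd=0.0005, seed=0):
    """Euler scheme for (71)-(72); returns (x, y), the efficient and the
    noise-contaminated log-price, sampled every T/n_steps."""
    rng = np.random.default_rng(seed)
    dt = T / n_steps
    z1 = rng.standard_normal(n_steps)
    z2 = rho * z1 + np.sqrt(1.0 - rho ** 2) * rng.standard_normal(n_steps)
    x = np.empty(n_steps + 1); x[0] = x0
    v = np.empty(n_steps + 1); v[0] = v0
    for i in range(n_steps):
        x[i + 1] = x[i] + (mu - v[i] / 2.0) * dt + np.sqrt(v[i] * dt) * z1[i]
        v[i + 1] = max(v[i] + kappa * (alpha - v[i]) * dt
                       + gamma * np.sqrt(v[i] * dt) * z2[i], 1e-12)  # keep v >= 0
    y = x + rng.normal(0.0, noise_sd, n_steps + 1)   # Y = X + eps, as in model (4)
    return x, y
```

The default parameter values are those listed above; on the noisy series $y$, the realized variance exceeds that of the latent $x$ by roughly $2 n E\varepsilon^2$, which is what the estimators of this article correct for.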

On each simulated sample path, we estimate $\langle X,X\rangle_T$ over $T = 1$ day (i.e., $T = 1/252$, because the parameter values are all annualized) using the five estimation strategies described earlier: $[Y,Y]_T^{(\rm all)}$, $[Y,Y]_T^{(\rm sparse)}$, $[Y,Y]_T^{(\rm sparse, opt)}$, $[Y,Y]_T^{(\rm avg)}$, and, finally, $\widehat{\langle X,X\rangle}{}_T^{(\rm adj)}$. We assume that a day consists of 6.5 hours of open trading, as is the case on the NYSE and NASDAQ. For the fourth-best estimator, we represent this approach using sparse sampling at a frequency of once every 5 minutes.

7.2 Simulations Results

Table 2 reports the simulation results for the five estimation strategies. For each estimator, the bias is calculated as $E[\text{estimator} - \langle X,X\rangle_T]$. The variance row reports the quantity
\[
E\Bigl[{\rm var}\Bigl(\text{estimator} - \langle X,X\rangle_T \,\Big|\, \int_0^T \sigma_t^2\,dt,\ \int_0^T \sigma_t^4\,dt\Bigr)\Bigr]. \tag{73}
\]
The rows marked "relative" report the corresponding results in percentage terms, that is, for
\[
\frac{\text{estimator} - \langle X,X\rangle_T}{\langle X,X\rangle_T}.
\]
Note that although the convergence of the discretization error is stable, as opposed to conditional, (73) seems like a reasonable stand-in for the theoretical quantity under consideration.

In each case, "small sample" represents the average value over the $M$ simulated paths, and "asymptotic" refers to the value predicted by our theory. The noise parts of the asymptotic variances are taken to include the relevant second-order terms, as given by (20), (37), and (60), yielding spreads like (in the latter case) (61). The small-sample variance is computed by binning the results of the simulations according to the values of the pair $(\int_0^T \sigma_t^2\,dt, \int_0^T \sigma_t^4\,dt)$, calculating the variance within each bin, and then averaging across the bins, weighted by the number of elements in each bin.

Comparing the rows showing small-sample results to their asymptotic predictions, we see that the asymptotic theory provides a good approximation to the first two moments of the small-sample distribution of the five estimators. Comparing across columns, we see that the differences between the five estimators are quite large. As expected, the naive fifth-best strategy produces disastrous results. The first-best strategy results in a large decrease of root MSE (RMSE) relative to the others, including, in particular, the fourth-best strategy, which is the approach currently used in most of the empirical literature. But, at least for the parameter values used here, determining the optimal sampling frequency (the sparse-optimal, or third-best, strategy) results in a modest improvement over the arbitrary selection of a sampling frequency (the sparse, or fourth-best, strategy). Using all of the data, as in the subsampled and averaged estimator (second-best), but especially the bias correction by combining two time scales, as in our first-best approach, is where the major practical improvements reside.

It may seem surprising that the variance of the estimator is reduced so much from the second-best to the first-best estimator. After all, if one sets $c = K n^{-2/3}$, then the asymptotic normal term in (7) becomes
\[
\frac{1}{n^{1/6}} \biggl[\underbrace{\frac{4}{c^2} E\varepsilon^4}_{\text{due to noise}} + \underbrace{c\,\frac{4T}{3} \int_0^T \sigma_t^4\,dt}_{\text{due to discretization}}\biggr]^{1/2} Z_{\rm total},
\]
where the bracketed quantity is the total variance. This is quite similar to the error term in (9). However, the $c$ for the second-best estimator is chosen by (8), whereas for the first-best case it is given by (10). The two optima are very different, because for the second-best estimator, $c$ is chosen by a bias-variance trade-off, whereas for the first-best estimator, it is chosen to minimize the asymptotic variance. From this standpoint, it is quite reasonable for the first-best estimator to also have substantially better asymptotic variance. In fact, from (8) and (10), it is easy to see that, for the optimal choices,
\[
c^*_{\text{second best}} = n^{1/3}\, 2^{-1/3} \Bigl(\frac{E\varepsilon^4}{(E\varepsilon^2)^2}\Bigr)^{1/3} c^*_{\text{first best}},
\]
whence the total variance is in fact quite different for the two estimators.

Figure 1 shows the small-sample and asymptotic RMSE of the $[Y,Y]_T^{(\rm sparse)}$ estimator as a function of the subsampling interval, illustrating the minimization problem studied in Section 2.3. Finally, Figure 2 shows the standardized distribution of the first-best estimator obtained from the simulations (histogram) and the corresponding asymptotic distribution predicted by our theory (solid line). Due to the effect of bias, the distributions of the fifth- to second-best estimators are irrelevant, because they cannot be used to set intervals for the volatility.

8. DISCUSSION

In this work we have quantified and corrected the effect of noise on the nonparametric assessment of integrated volatility. In the setting of high-frequency data, the usual financial practice is to use sparse sampling, in other words, throwing

Table 2. Monte Carlo Simulations for the Five Estimation Strategies

                                Fifth-best            Fourth-best           Third-best                 Second-best           First-best
                                $[Y,Y]_T^{(\rm all)}$  $[Y,Y]_T^{(\rm sparse)}$  $[Y,Y]_T^{(\rm sparse,opt)}$  $[Y,Y]_T^{(\rm avg)}$  $\widehat{\langle X,X\rangle}{}_T^{(\rm adj)}$
Small-sample bias               1.1699 x 10^-2        3.89 x 10^-5          2.18 x 10^-5               1.926 x 10^-5         2 x 10^-8
Asymptotic bias                 1.1700 x 10^-2        3.90 x 10^-5          2.20 x 10^-5               1.927 x 10^-5         0
Small-sample variance           1.791 x 10^-8         1.4414 x 10^-9        1.59 x 10^-9               9.41 x 10^-10         9 x 10^-11
Asymptotic variance             1.788 x 10^-8         1.4409 x 10^-9        1.58 x 10^-9               9.37 x 10^-10         8 x 10^-11
Small-sample RMSE               1.1699 x 10^-2        5.437 x 10^-5         4.543 x 10^-5              3.622 x 10^-5         9.4 x 10^-6
Asymptotic RMSE                 1.1700 x 10^-2        5.442 x 10^-5         4.546 x 10^-5              3.618 x 10^-5         8.9 x 10^-6
Small-sample relative bias      182%                  6.1%                  1.8%                       1.5%                  -.0045%
Small-sample relative variance  82,502                115                   11                         5.3                   .43
Small-sample relative RMSE      340%                  12.4%                 3.7%                       2.8%                  .65%

1406 Journal of the American Statistical Association December 2005

Figure 1. RMSE of the Sparse Estimator as a Function of the Subsampling Frequency (small-sample and asymptotic RMSE).

away most of the available data. We have argued that this practice is caused by not incorporating the noise in the model. Although it is statistically unsound to throw away data, we have shown that it is possible to build on this practice to construct estimators that make statistical sense.

Specifically, we have found that the usual realized volatility mainly estimates the magnitude of the noise term rather than anything to do with volatility. An approach built on separating the observations into multiple "grids" lessens this problem. We found that the best results can be obtained by combining the usual ("single-grid") realized volatility with the multiple-grid-based device. This gives an estimator that is approximately unbiased; we have also shown how to assess the (random) variance of this estimator. We also show that for our most recommended procedure, the optimal number of multiple grids is of order $O(n^{2/3})$ (see Sec. 4). Most of the development is in the context of finding the integrated volatility over one time period; at the end, we extend this to multiple periods. Also, in the case where the noise can be taken to be almost negligible, we provide a way of optimizing the sampling frequency if one wishes to use the classical "realized volatility" or its multigrid extension.

One important message of the article is that any time one has an impulse to sample sparsely, one can always do better with a multigrid method, regardless of the model or the quantity being estimated.
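The practical content of this message — subsample, average, then bias-correct with the full-grid estimator — can be illustrated in code. The sketch below is our illustration, not part of the article: it assumes equally spaced log prices `y`, i.i.d. Gaussian noise, and $K \approx n^{2/3}$ subgrids as recommended in Section 4; all names and magnitudes are hypothetical.

```python
import numpy as np

def rv(y, k=1):
    """Realized volatility from every k-th observation, averaged over the
    k distinct offsets (the "multiple grids" of Section 3)."""
    return np.mean([np.sum(np.diff(y[j::k]) ** 2) for j in range(k)])

def tsrv(y, k):
    """Two-scales estimator: subsampled-and-averaged realized volatility,
    minus a bias correction built from the full-grid realized volatility
    (which mainly reflects the noise)."""
    n = len(y) - 1
    nbar = (n - k + 1) / k          # average subgrid size, eq. (34)
    return rv(y, k) - (nbar / n) * rv(y, 1)

# Toy data: constant-volatility X plus i.i.d. microstructure noise.
rng = np.random.default_rng(0)
n, iv, eps2 = 23_400, 1e-4, 1e-6    # illustrative magnitudes
x = np.cumsum(rng.normal(0, np.sqrt(iv / n), n + 1))
y = x + rng.normal(0, np.sqrt(eps2), n + 1)

print(rv(y, 1))                     # ~ 2 n Eε² ≈ 0.047: mostly noise
print(tsrv(y, int(n ** (2 / 3))))   # close to the integrated volatility 1e-4
```

The full-grid estimator is off by orders of magnitude, while the two-scales combination recovers the target; this mirrors the fifth-best versus first-best columns of Table 2.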

Figure 2 Asymptotic and Small-Sample Standardized Distributions of the First-Best Estimator

Zhang, Mykland, and Aït-Sahalia: Determining Integrated Volatility 1407

Our results in this article cover the case where the latent X follows the Itô process model (1). Theorem 1 remains true under the assumption that X is a general semimartingale, that is, a process X that can also have jumps. This follows because the proof of Lemma A.2 remains true for a general semimartingale X (Jacod and Shiryaev 2003, thm. I.4.47, p. 52). Also, apart from using Lemma A.2, Theorem A.1 is unrelated to the nature of the X process. As far as the effect of jumps on the discretization effect is concerned, it should be noted that consistency holds, as follows. When X is a general semimartingale, then for any sequence of grids $\mathcal H_n$ so that (41) holds, $[X,X]^{\mathcal H_n}$ converges to the continuous-time quadratic variation of X, that is,

$$\int_0^T \sigma_t^2\,dt + \sum_{0 \le t \le T} (\text{jump of } X \text{ at } t)^2 \qquad(74)$$

(cf. Jacod and Shiryaev 2003, thm. I.4.47, p. 52). In particular, this also applies to subgrids under the regularity conditions that we have discussed. Furthermore, by extension of the same proof, $[X,X]^{(avg)}$ also converges to (74). From the same argument, so does $\widehat{\langle X,X\rangle}_T$. In this sense, the two-scales estimator is robust to jumps.
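The consistency statement (74) is easy to check by simulation. The sketch below is our illustration, not the authors': a constant-volatility martingale plus a single deterministic jump, with the realized quadratic variation computed on a fine grid; all parameter values are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(1)
n, T, sigma = 200_000, 1.0, 0.2                # illustrative parameters
dx = sigma * rng.normal(0, np.sqrt(T / n), n)  # continuous martingale increments
jump = 0.05
dx[n // 2] += jump                             # one jump at t = T/2

qv = np.sum(dx ** 2)                           # realized quadratic variation
target = sigma ** 2 * T + jump ** 2            # (74): integral of σ² plus squared jumps
print(qv, target)                              # agree up to sampling error
```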

APPENDIX: PROOFS OF RESULTS

When the total grid G is considered, we use $\sum_{i=1}^{n-1}$, $\sum_{t_{i+1} \le T}$, and $\sum_{t_i \in G}$ interchangeably in the following proofs. Also, we write "|X" to indicate expressions that are conditional on the entire X process.

A.1 Variance of $[Y,Y]_T$ Given the X Process

Here we calculate explicitly the variance in (19), from which the stated approximation follows. The explicit remainder term is also used for (36). Let a partition of $[0,T]$ be $0 = t_0 \le t_1 \le \cdots \le t_n = T$. Under assumption (11),

$$\operatorname{var}\bigl([Y,Y]^{(all)}_T \big| X\bigr) = \operatorname{var}\Bigl[\sum_{t_{i+1} \le T} (\Delta Y_{t_i})^2 \Big| X\Bigr] = \underbrace{\sum_{t_{i+1} \le T} \operatorname{var}\bigl[(\Delta Y_{t_i})^2 \big| X\bigr]}_{I_T} + \underbrace{2\sum_{t_{i+1} \le T} \operatorname{cov}\bigl[(\Delta Y_{t_{i-1}})^2, (\Delta Y_{t_i})^2 \big| X\bigr]}_{II_T},$$

because $\Delta Y_{t_i} = \Delta X_{t_i} + \Delta\varepsilon_{t_i}$ is 1-dependent given the X process. Now

$$\begin{aligned}
\operatorname{var}\bigl[(\Delta Y_{t_i})^2 \big| X\bigr] &= \kappa_4(\Delta Y_{t_i}|X) + 2\bigl[\operatorname{var}(\Delta Y_{t_i}|X)\bigr]^2 + 4\bigl[E(\Delta Y_{t_i}|X)\bigr]^2 \operatorname{var}(\Delta Y_{t_i}|X) + 4E(\Delta Y_{t_i}|X)\,\kappa_3(\Delta Y_{t_i}|X) \\
&= \kappa_4(\Delta\varepsilon_{t_i}) + 2\bigl[\operatorname{var}(\Delta\varepsilon_{t_i})\bigr]^2 + 4(\Delta X_{t_i})^2 \operatorname{var}(\Delta\varepsilon_{t_i}) + 4(\Delta X_{t_i})\,\kappa_3(\Delta\varepsilon_{t_i}) \qquad\text{(under Assumption 11)} \\
&= 2\kappa_4(\varepsilon) + 8(E\varepsilon^2)^2 + 8(\Delta X_{t_i})^2 E\varepsilon^2,
\end{aligned}$$

because $\kappa_3(\Delta\varepsilon_{t_i}) = 0$. The $\kappa$'s are the cumulants of the relevant order:

$$\kappa_1(\varepsilon) = 0, \qquad \kappa_2(\varepsilon) = \operatorname{var}(\varepsilon) = E(\varepsilon^2), \qquad \kappa_3(\varepsilon) = E\varepsilon^3, \qquad\text{and}\qquad \kappa_4(\varepsilon) = E(\varepsilon^4) - 3(E\varepsilon^2)^2. \qquad(A.1)$$
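The fourth-cumulant identity in (A.1) can be checked numerically from sample moments. A small sketch (our illustration; Gaussian and uniform noise distributions assumed):

```python
import numpy as np

def kappa4(e):
    """Sample fourth cumulant: κ₄(ε) = E(ε⁴) − 3(Eε²)², as in (A.1),
    for mean-zero noise e."""
    return np.mean(e ** 4) - 3 * np.mean(e ** 2) ** 2

rng = np.random.default_rng(2)
e_normal = rng.normal(0.0, 1.0, 500_000)     # Gaussian: κ₄ = 0
e_uniform = rng.uniform(-1.0, 1.0, 500_000)  # uniform: 1/5 − 3·(1/3)² = −2/15
print(kappa4(e_normal), kappa4(e_uniform))
```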

So $I_T = n\bigl(2\kappa_4(\varepsilon) + 8(E\varepsilon^2)^2\bigr) + 8[X,X]^{(all)}_T E\varepsilon^2$. Similarly, for the covariance,

$$\begin{aligned}
\operatorname{cov}\bigl[(\Delta Y_{t_{i-1}})^2, (\Delta Y_{t_i})^2 \big| X\bigr] &= \operatorname{cov}\bigl[(\Delta\varepsilon_{t_{i-1}})^2, (\Delta\varepsilon_{t_i})^2\bigr] + 4(\Delta X_{t_{i-1}})(\Delta X_{t_i})\operatorname{cov}(\Delta\varepsilon_{t_{i-1}}, \Delta\varepsilon_{t_i}) \\
&\quad + 2(\Delta X_{t_{i-1}})\operatorname{cov}\bigl[\Delta\varepsilon_{t_{i-1}}, (\Delta\varepsilon_{t_i})^2\bigr] + 2(\Delta X_{t_i})\operatorname{cov}\bigl[(\Delta\varepsilon_{t_{i-1}})^2, \Delta\varepsilon_{t_i}\bigr] \\
&= \kappa_4(\varepsilon) + 2(E\varepsilon^2)^2 - 4(\Delta X_{t_{i-1}})(\Delta X_{t_i})\kappa_2(\varepsilon) - 2(\Delta X_{t_i})\kappa_3(\varepsilon) + 2(\Delta X_{t_{i-1}})\kappa_3(\varepsilon), \qquad(A.2)
\end{aligned}$$

because of (A.1). Thus, summing the terms in (A.2) over i,

$$II_T = 2(n-1)\bigl(\kappa_4(\varepsilon) + 2(E\varepsilon^2)^2\bigr) - 8E\varepsilon^2 \sum_{t_{i+1} \le T} (\Delta X_{t_{i-1}})(\Delta X_{t_i}) - 4\kappa_3(\varepsilon)\bigl(\Delta X_{t_{n-1}} - \Delta X_{t_0}\bigr).$$

Amalgamating the two expressions, one obtains

$$\begin{aligned}
\operatorname{var}\bigl([Y,Y]^{(all)}_T \big| X\bigr) &= n\bigl(2\kappa_4(\varepsilon) + 8(E\varepsilon^2)^2\bigr) + 8[X,X]^{(all)}_T E\varepsilon^2 + 2(n-1)\bigl(\kappa_4(\varepsilon) + 2(E\varepsilon^2)^2\bigr) \\
&\quad - 8E\varepsilon^2 \sum (\Delta X_{t_{i-1}})(\Delta X_{t_i}) - 4\kappa_3(\varepsilon)\bigl(\Delta X_{t_{n-1}} - \Delta X_{t_0}\bigr) \\
&= 4nE\varepsilon^4 + R_n, \qquad(A.3)
\end{aligned}$$

where the remainder term $R_n$ satisfies

$$\begin{aligned}
|R_n| &\le 8E\varepsilon^2 [X,X]_T + 2\bigl(\kappa_4(\varepsilon) + 2(E\varepsilon^2)^2\bigr) + 8E\varepsilon^2 \sum \bigl|\Delta X_{t_{i-1}}\bigr|\bigl|\Delta X_{t_i}\bigr| + 4|\kappa_3(\varepsilon)|\bigl(\bigl|\Delta X_{t_{n-1}}\bigr| + \bigl|\Delta X_{t_0}\bigr|\bigr) \\
&\le 16E\varepsilon^2 [X,X]^{(all)}_T + 2\bigl(\kappa_4(\varepsilon) + 2(E\varepsilon^2)^2\bigr) + 2|\kappa_3(\varepsilon)|\bigl(2 + [X,X]_T\bigr), \qquad(A.4)
\end{aligned}$$

by the Cauchy–Schwarz inequality and because $|x| \le (1 + x^2)/2$. Because $[X,X]^{(all)}_T = O_p(1)$, (19) follows.

Under slightly stronger conditions (e.g., $|\mu_t|$ and $\sigma_t$ are bounded above by a constant), $\sum(\Delta X_{t_{i-1}})(\Delta X_{t_i})$ is a near-martingale and of order $O_p(n^{-1/2})$, and similarly $\Delta X_{t_{n-1}} - \Delta X_{t_0} = O_p(n^{-1/2})$, from which (20) follows.
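The leading term $4nE\varepsilon^4$ of (A.3) can be seen directly in a Monte Carlo experiment. In the sketch below (our illustration, with hypothetical sizes), we condition on X by taking $X \equiv 0$, so that $[Y,Y]^{(all)}_T = \sum(\Delta\varepsilon_{t_i})^2$:

```python
import numpy as np

rng = np.random.default_rng(3)
n, eps_sd, reps = 1_000, 0.01, 2_000     # illustrative sizes
eps4 = 3 * eps_sd ** 4                   # Eε⁴ for Gaussian noise

# With X ≡ 0, [Y,Y]^(all) = Σ (ε_{t_{i+1}} − ε_{t_i})², computed row by row.
eps = rng.normal(0, eps_sd, (reps, n + 1))
rv_all = np.sum(np.diff(eps, axis=1) ** 2, axis=1)

print(np.var(rv_all), 4 * n * eps4)      # Monte Carlo vs. leading term of (A.3)
```

For Gaussian noise, $\kappa_4(\varepsilon) = 0$ and $R_n$ is $O(1)$, so the two printed numbers agree up to Monte Carlo error.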

Similarly, to obtain a higher-order approximation to the variance of the estimator $\widehat{\langle X,X\rangle}_T$, note that

$$\widehat{\langle X,X\rangle}_T - \langle X,X\rangle_T = \Bigl([Y,Y]^{(avg)}_T - \frac{1}{K}[Y,Y]^{(all)}_T - [X,X]^{(avg)}_T\Bigr) + \bigl([X,X]^{(avg)}_T - \langle X,X\rangle_T\bigr). \qquad(A.5)$$

Now recalling (20) and (37), and in addition

$$\operatorname{cov}\bigl([Y,Y]^{(all)}, [Y,Y]^{(avg)} \big| X\bigr) = 2\bigl(E\varepsilon^4 - (E\varepsilon^2)^2\bigr)\Bigl(\frac{2n-1}{K}\Bigr) + 8E\varepsilon^2[X,X]^{(all)}_T \frac{1}{K} + o_p\Bigl(\frac{1}{K}\Bigr),$$

we see that

$$\begin{aligned}
\operatorname{var}\bigl(\widehat{\langle X,X\rangle}_T \big| X\bigr) &= \operatorname{var}\bigl([Y,Y]^{(avg)}_T \big| X\bigr) + \frac{1}{K^2}\operatorname{var}\bigl([Y,Y]^{(all)}_T \big| X\bigr) - \frac{2}{K}\operatorname{cov}\bigl([Y,Y]^{(avg)}_T, [Y,Y]^{(all)}_T \big| X\bigr) \\
&= 4\frac{\bar n}{K}E\varepsilon^4 + \frac{1}{K}\bigl[8[X,X]^{(avg)}_T E\varepsilon^2 - 2\bigl(E\varepsilon^4 - (E\varepsilon^2)^2\bigr)\bigr] + o_p\Bigl(\frac{1}{K}\Bigr) \\
&\quad + \frac{1}{K^2}\bigl[4nE\varepsilon^4 + \bigl(8[X,X]^{(all)}_T E\varepsilon^2 - 2\bigl(E\varepsilon^4 - (E\varepsilon^2)^2\bigr)\bigr) + o_p(1)\bigr] \\
&\quad - \frac{2}{K}\Bigl[2\bigl(E\varepsilon^4 - (E\varepsilon^2)^2\bigr)\Bigl(\frac{2n-1}{K}\Bigr) + 8E\varepsilon^2[X,X]^{(all)}_T \frac{1}{K} + o_p\Bigl(\frac{1}{K}\Bigr)\Bigr] \\
&= \frac{\bar n}{K}8(E\varepsilon^2)^2 + \frac{1}{K}\bigl[8[X,X]^{(avg)}_T E\varepsilon^2 - 2\bigl(E\varepsilon^4 - (E\varepsilon^2)^2\bigr)\bigr] + o_p\Bigl(\frac{1}{K}\Bigr),
\end{aligned}$$

because at the optimal choice, $K = cn^{2/3}$ and $\bar n = c^{-1}n^{1/3}$. Therefore, (60) holds.

A.2 The Relevant Central Limit Theorem

Lemma A.2. Suppose that X is an Itô process. Suppose that Y is related to X through model (4). Then, under assumption (11) and definitions (16) and (32),

$$[Y,Y]^{(all)}_T = [\varepsilon,\varepsilon]^{(all)}_T + O_p(1) \qquad\text{and}\qquad [Y,Y]^{(avg)}_T = [\varepsilon,\varepsilon]^{(avg)}_T + [X,X]^{(avg)}_T + O_p\Bigl(\frac{1}{\sqrt K}\Bigr).$$

Proof. (a) The one-grid case:

$$[Y,Y]^{(all)}_T = [X,X]^{(all)}_T + [\varepsilon,\varepsilon]^{(all)}_T + 2[X,\varepsilon]^{(all)}_T. \qquad(A.6)$$

We show that

$$E\bigl(\bigl([X,\varepsilon]^{(all)}_T\bigr)^2 \big| X\bigr) = O_p(1), \qquad(A.7)$$

and in particular,

$$[X,\varepsilon]^{(all)}_T = O_p(1). \qquad(A.8)$$

To see (A.7),

$$[X,\varepsilon]^{(all)}_T = \sum_{i=0}^{n-1} (\Delta X_{t_i})(\Delta\varepsilon_{t_i}) = \sum_{i=0}^{n-1} (\Delta X_{t_i})\varepsilon_{t_{i+1}} - \sum_{i=0}^{n-1} (\Delta X_{t_i})\varepsilon_{t_i} = \sum_{i=1}^{n-1} \bigl(\Delta X_{t_{i-1}} - \Delta X_{t_i}\bigr)\varepsilon_{t_i} + \Delta X_{t_{n-1}}\varepsilon_{t_n} - \Delta X_{t_0}\varepsilon_{t_0}. \qquad(A.9)$$

Because $E\bigl([X,\varepsilon]^{(all)}_T \big| X\bigr) = 0$ and the $\varepsilon_{t_i}$ are iid for different $t_i$, we get

$$\begin{aligned}
E\bigl(\bigl([X,\varepsilon]^{(all)}_T\bigr)^2 \big| X\bigr) &= \operatorname{var}\bigl([X,\varepsilon]^{(all)}_T \big| X\bigr) \\
&= E\varepsilon^2\Bigl[\sum_{i=1}^{n-1} \bigl(\Delta X_{t_{i-1}} - \Delta X_{t_i}\bigr)^2 + \bigl(\Delta X_{t_{n-1}}\bigr)^2 + \bigl(\Delta X_{t_0}\bigr)^2\Bigr] \\
&= 2[X,X]_T E\varepsilon^2 - 2E\varepsilon^2 \sum_{i=1}^{n-1} (\Delta X_{t_{i-1}})(\Delta X_{t_i}) \\
&\le 4[X,X]_T E\varepsilon^2, \qquad(A.10)
\end{aligned}$$

by the Cauchy–Schwarz inequality, from which, and from $[X,X]^{(all)}$ being of order $O_p(1)$, (A.7) follows. Hence (A.8) follows by the Markov inequality.

(b) The multiple-grid case. Notice that

$$[Y,Y]^{(avg)} = [X,X]^{(avg)}_T + [\varepsilon,\varepsilon]^{(avg)}_T + 2[X,\varepsilon]^{(avg)}_T. \qquad(A.11)$$

(A.11) strictly follows from model (4) and the definitions of the grids and of $[\cdot,\cdot]^{(avg)}_t$; see Section 3.2.

We need to show that

$$E\bigl(\bigl([X,\varepsilon]^{(avg)}_T\bigr)^2 \big| X\bigr) = O_p\Bigl(\frac{1}{K}\Bigr), \qquad(A.12)$$

in particular,

$$[X,\varepsilon]^{(avg)}_T = O_p\Bigl(\frac{1}{K^{1/2}}\Bigr), \qquad(A.13)$$

and $\operatorname{var}\bigl([X,\varepsilon]^{(avg)}_T \big| X\bigr) = E\bigl[\bigl([X,\varepsilon]^{(avg)}_T\bigr)^2 \big| X\bigr]$. To show (A.12), note that $E\bigl([X,\varepsilon]^{(avg)}_T \big| X\bigr) = 0$ and

$$E\bigl[\bigl([X,\varepsilon]^{(avg)}_T\bigr)^2 \big| X\bigr] = \operatorname{var}\bigl([X,\varepsilon]^{(avg)}_T \big| X\bigr) = \frac{1}{K^2}\sum_{k=1}^{K}\operatorname{var}\bigl([X,\varepsilon]^{(k)}_T \big| X\bigr) \le \frac{4E\varepsilon^2}{K}[X,X]^{(avg)}_T = O_p\Bigl(\frac{1}{K}\Bigr),$$

where the second equality follows from the disjointness of the different grids, as well as the independence of ε and X. The inequality follows from the same argument as in (A.10). Then the order follows because $[X,X]^{(avg)}_T = O_p(1)$; see the method of Mykland and Zhang (2002) for a rigorous development of the order of $[X,X]^{(avg)}_T$.

Theorem A.1. Suppose that X is an Itô process of form (1). Suppose that Y is related to X through model (4), and that (11) is satisfied with $E\varepsilon^4 < \infty$. Also suppose that $t_i$ and $t_{i+1}$ are not in the same subgrid for any i. Under assumption (33), as $n \to \infty$,

$$\Bigl(\sqrt n\,\bigl(\widehat{E\varepsilon^2} - E\varepsilon^2\bigr),\ \sqrt{\tfrac{K}{\bar n}}\,\bigl([Y,Y]^{(avg)}_T - [X,X]^{(avg)}_T - 2\bar n E\varepsilon^2\bigr)\Bigr)$$

converges in law to a bivariate normal with mean 0 and covariance matrix

$$\begin{pmatrix} E\varepsilon^4 & 2\operatorname{var}(\varepsilon^2) \\ 2\operatorname{var}(\varepsilon^2) & 4E\varepsilon^4 \end{pmatrix}, \qquad(A.14)$$

conditional on the X process, where the limiting random variable is independent of the X process.

Proof. By Lemma A.2, we need the distribution of $[\varepsilon,\varepsilon]^{(avg)}$ and $[\varepsilon,\varepsilon]^{(all)}$. First, we explore the convergence of

$$\frac{1}{\sqrt n}\bigl([\varepsilon,\varepsilon]^{(all)}_T - 2nE\varepsilon^2,\ K[\varepsilon,\varepsilon]^{(avg)}_T - 2\bar n K E\varepsilon^2\bigr). \qquad(A.15)$$

Recall that all of the sampling points $t_0, t_1, \ldots, t_n$ are within $[0,T]$. We use G to denote the time points in the full sampling, as in the single grid. $G^{(k)}$ denotes the subsample from the kth grid. As before, if $t_i \in G^{(k)}$, then $t_{i,-}$ and $t_{i,+}$ are the previous and next elements in $G^{(k)}$; $\varepsilon_{t_{i,-}} = 0$ for $t_i = \min G^{(k)}$, and $\varepsilon_{t_{i,+}} = 0$ for $t_i = \max G^{(k)}$.

Set

$$M^{(1)}_T = \frac{1}{\sqrt n}\sum_{t_i \in G}\bigl(\varepsilon^2_{t_i} - E\varepsilon^2\bigr), \qquad M^{(2)}_T = \frac{1}{\sqrt n}\sum_{t_i \in G}\varepsilon_{t_i}\varepsilon_{t_{i-1}}, \qquad(A.16)$$

$$M^{(3)}_T = \frac{1}{\sqrt n}\sum_{k=1}^{K}\sum_{t_i \in G^{(k)}}\varepsilon_{t_i}\varepsilon_{t_{i,-}}.$$


We first find the asymptotic distribution of $(M^{(1)}_T, M^{(2)}_T, M^{(3)}_T)$ using the martingale central limit theorem. Then we use the result to find the limit of (A.15).

Note that $(M^{(1)}_T, M^{(2)}_T, M^{(3)}_T)$ are the end points of martingales with respect to the filtration $\mathcal F_i = \sigma(\varepsilon_{t_j}, j \le i; X_t, \text{all } t)$. We now derive its (discrete-time) predictable quadratic variation $\langle M^{(l)}, M^{(k)}\rangle$, $l,k = 1,2,3$ [discrete-time predictable quadratic variations are only used in this proof and the proof of Lemma 1, and are different from the continuous-time quadratic variations in (12)]:

$$\begin{aligned}
\bigl\langle M^{(1)}, M^{(1)}\bigr\rangle_T &= \frac{1}{n}\sum_{t_i \in G}\operatorname{var}\bigl(\varepsilon^2_{t_i} - E\varepsilon^2 \big| \mathcal F_{t_{i-1}}\bigr) = \operatorname{var}(\varepsilon^2), \\
\bigl\langle M^{(2)}, M^{(2)}\bigr\rangle_T &= \frac{1}{n}\sum_{t_i \in G}\operatorname{var}\bigl(\varepsilon_{t_i}\varepsilon_{t_{i-1}} \big| \mathcal F_{t_{i-1}}\bigr) = \frac{E\varepsilon^2}{n}\sum_{t_i \in G}\varepsilon^2_{t_{i-1}} = (E\varepsilon^2)^2 + o_p(1), \qquad(A.17) \\
\bigl\langle M^{(3)}, M^{(3)}\bigr\rangle_T &= \frac{1}{n}\sum_{k=1}^{K}\sum_{t_i \in G^{(k)}}\operatorname{var}\bigl(\varepsilon_{t_i}\varepsilon_{t_{i,-}} \big| \mathcal F_{t_{i-1}}\bigr) = \frac{E\varepsilon^2}{n}\sum_{k=1}^{K}\sum_{t_i \in G^{(k)}}\varepsilon^2_{t_{i,-}} = (E\varepsilon^2)^2 + o_p(1),
\end{aligned}$$

by the law of large numbers. Similarly, for the predictable quadratic covariations,

$$\begin{aligned}
\bigl\langle M^{(1)}, M^{(2)}\bigr\rangle_T &= \frac{1}{n}\sum_{t_i \in G}\operatorname{cov}\bigl(\varepsilon^2_{t_i} - E\varepsilon^2,\ \varepsilon_{t_i}\varepsilon_{t_{i-1}} \big| \mathcal F_{t_{i-1}}\bigr) = E\varepsilon^3\,\frac{1}{n}\sum_{t_i \in G}\varepsilon_{t_{i-1}} = o_p(1), \\
\bigl\langle M^{(1)}, M^{(3)}\bigr\rangle_T &= \frac{1}{n}\sum_{k=1}^{K}\sum_{t_i \in G^{(k)}}\operatorname{cov}\bigl(\varepsilon^2_{t_i} - E\varepsilon^2,\ \varepsilon_{t_i}\varepsilon_{t_{i,-}} \big| \mathcal F_{t_{i-1}}\bigr) = E\varepsilon^3\,\frac{1}{n}\sum_{k=1}^{K}\sum_{t_i \in G^{(k)}}\varepsilon_{t_{i,-}} = o_p(1), \qquad(A.18) \\
\bigl\langle M^{(2)}, M^{(3)}\bigr\rangle_T &= \frac{1}{n}\sum_{k=1}^{K}\sum_{t_i \in G^{(k)}}\operatorname{cov}\bigl(\varepsilon_{t_i}\varepsilon_{t_{i-1}},\ \varepsilon_{t_i}\varepsilon_{t_{i,-}} \big| \mathcal F_{t_{i-1}}\bigr) = \frac{E\varepsilon^2}{n}\sum_{k=1}^{K}\sum_{t_i \in G^{(k)}}\varepsilon_{t_{i-1}}\varepsilon_{t_{i,-}} = o_p(1),
\end{aligned}$$

because $t_{i+1}$ is not in the same grid as $t_i$. Because the $\varepsilon_{t_i}$'s are iid and $E\varepsilon^4_{t_i} < \infty$, the conditional Lindeberg conditions are satisfied. Hence, by the martingale central limit theorem (see Hall and Heyde 1980, condition 3.1, p. 58), $(M^{(1)}, M^{(2)}, M^{(3)})$ are asymptotically normal, with covariance matrix the asymptotic value of $\langle M^{(l)}, M^{(k)}\rangle$. In other words, asymptotically, $(M^{(1)}, M^{(2)}, M^{(3)})$ are independent normal with respective variances $\operatorname{var}(\varepsilon^2)$, $(E\varepsilon^2)^2$, and $(E\varepsilon^2)^2$.

Returning to (A.15), we have that

$$\begin{aligned}
[\varepsilon,\varepsilon]^{(all)} - 2nE\varepsilon^2 &= 2\sum_{i \ne 0,n}\bigl(\varepsilon^2_{t_i} - E\varepsilon^2\bigr) + \bigl(\varepsilon^2_{t_0} - E\varepsilon^2\bigr) + \bigl(\varepsilon^2_{t_n} - E\varepsilon^2\bigr) - 2\sum_{t_i > 0}\varepsilon_{t_i}\varepsilon_{t_{i-1}} \\
&= 2\sqrt n\,\bigl(M^{(1)} - M^{(2)}\bigr) + O_p(1). \qquad(A.19)
\end{aligned}$$

Meanwhile,

$$\begin{aligned}
[\varepsilon,\varepsilon]^{(k)} - 2n_k E\varepsilon^2 &= \sum_{t_i \in G^{(k)},\, t_i \ne \max G^{(k)}}\bigl(\varepsilon_{t_{i,+}} - \varepsilon_{t_i}\bigr)^2 - 2n_k E\varepsilon^2 \\
&= 2\sum_{t_i \in G^{(k)}}\bigl(\varepsilon^2_{t_i} - E\varepsilon^2\bigr) - \bigl(\varepsilon^2_{\min G^{(k)}} - E\varepsilon^2\bigr) - \bigl(\varepsilon^2_{\max G^{(k)}} - E\varepsilon^2\bigr) - 2\sum_{t_i \in G^{(k)}}\varepsilon_{t_i}\varepsilon_{t_{i,-}}, \qquad(A.20)
\end{aligned}$$

where $n_k + 1$ represents the total number of sampling points in $G^{(k)}$. Hence

$$K[\varepsilon,\varepsilon]^{(avg)}_T - 2\bar n K E\varepsilon^2 = \sqrt n\,\bigl(2M^{(1)} - 2M^{(3)}\bigr) - R = 2\sqrt n\,\bigl(M^{(1)} - M^{(3)}\bigr) + O_p\bigl(K^{1/2}\bigr), \qquad(A.21)$$

because $R = \sum_{k=1}^{K}\bigl[(\varepsilon^2_{\min G^{(k)}} - E\varepsilon^2) + (\varepsilon^2_{\max G^{(k)}} - E\varepsilon^2)\bigr]$ satisfies $ER^2 = \operatorname{var}(R) \le 4K\operatorname{var}(\varepsilon^2)$.

Because $n^{-1}K \to 0$ and because the error terms in (A.19) and (A.20) are uniformly integrable, it follows that

$$(A.15) = 2\bigl(M^{(1)} - M^{(2)},\ M^{(1)} - M^{(3)}\bigr) + o_p(1). \qquad(A.22)$$

Hence (A.15) is also asymptotically normal, with covariance matrix

$$\begin{pmatrix} 4E\varepsilon^4 & 4\operatorname{var}(\varepsilon^2) \\ 4\operatorname{var}(\varepsilon^2) & 4E\varepsilon^4 \end{pmatrix}.$$

By Lemma A.2, and as $n^{-1}K \to 0$,

$$\frac{1}{\sqrt n}\bigl([Y,Y]^{(all)}_T - 2nE\varepsilon^2,\ K\bigl([Y,Y]^{(avg)}_T - [X,X]^{(avg)}_T - 2\bar n E\varepsilon^2\bigr)\bigr)\Big| X$$

is asymptotically normal:

$$\frac{1}{\sqrt n}\begin{pmatrix} [Y,Y]^{(all)}_T - 2nE\varepsilon^2 \\ K[Y,Y]^{(avg)}_T - 2\bar n K E\varepsilon^2 - K[X,X]^{(avg)}_T \end{pmatrix}\Bigg| X = 2\begin{pmatrix} M^{(1)} - M^{(2)} \\ M^{(1)} - M^{(3)} \end{pmatrix} + o_p(1) \xrightarrow{\ \mathcal L\ } 2N\Biggl(0,\ \begin{pmatrix} E\varepsilon^4 & \operatorname{var}(\varepsilon^2) \\ \operatorname{var}(\varepsilon^2) & E\varepsilon^4 \end{pmatrix}\Biggr). \qquad(A.23)$$

Because

$$\widehat{E\varepsilon^2} = \frac{1}{2n}[Y,Y]^{(all)}_T \qquad\text{and}\qquad \frac{K}{\sqrt n} = \sqrt{\frac{K}{\bar n}}\,\bigl(1 + o(1)\bigr), \qquad(A.24)$$

Theorem A.1 follows.
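The first relation in (A.24) is also the operational recipe for estimating the noise variance in practice: at the highest frequency, the realized volatility is dominated by $2nE\varepsilon^2$. A sketch (our illustration; magnitudes are hypothetical):

```python
import numpy as np

def noise_variance(y):
    """Estimate Eε² by [Y,Y]^(all)_T / (2n), as in (A.24)."""
    dy = np.diff(y)
    return np.sum(dy ** 2) / (2 * len(dy))

rng = np.random.default_rng(4)
n, eps2 = 100_000, 1e-6                  # illustrative sample size and noise level
x = np.cumsum(rng.normal(0, np.sqrt(1e-4 / n), n + 1))  # ⟨X,X⟩_T = 1e-4
y = x + rng.normal(0, np.sqrt(eps2), n + 1)
print(noise_variance(y))                 # ≈ 1e-6, with an O(1/n) bias from ⟨X,X⟩
```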

Proof of Lemma 1. For simplicity, and without loss of generality, we assume that $\mathcal H = G$. The result follows the proofs of Lemma A.2 and Theorem A.1, but with a more exact representation of $[Y,Y]^{(all)}_T - [X,X]^{(all)}_T$. Specifically, from (A.9) and (A.19), we have that

$$\begin{aligned}
[Y,Y]^{(all)}_T - [X,X]^{(all)}_T - 2nE\varepsilon^2 &= [\varepsilon,\varepsilon]_T - 2nE\varepsilon^2 + 2[X,\varepsilon]^{(all)}_T \\
&= 2\sum_{i \ne 0,n}\bigl(\varepsilon^2_{t_i} - E\varepsilon^2\bigr) + \bigl(\varepsilon^2_{t_0} - E\varepsilon^2\bigr) + \bigl(\varepsilon^2_{t_n} - E\varepsilon^2\bigr) - 2\sum_{t_i > 0}\varepsilon_{t_i}\varepsilon_{t_{i-1}} \\
&\quad + 2\sum_{i=1}^{n-1}\bigl(\Delta X_{t_{i-1}} - \Delta X_{t_i}\bigr)\varepsilon_{t_i} + 2\Delta X_{t_{n-1}}\varepsilon_{t_n} - 2\Delta X_{t_0}\varepsilon_{t_0}.
\end{aligned}$$

With the same filtration as in the proof of Theorem A.1, this is the sum of a martingale triangular array with increment $2(\varepsilon^2_{t_i} - E\varepsilon^2) - 2\varepsilon_{t_i}\varepsilon_{t_{i-1}} + 2(\Delta X_{t_{i-1}} - \Delta X_{t_i})\varepsilon_{t_i}$ for $i \ne 0, n$. Again, the conditions of the martingale central limit theorem (see Hall and Heyde 1980, chap. 3) are satisfied by (17), whence the term is asymptotically normal, conditionally on the X process. The conditional variance is given by (20), under the conditions mentioned in the statement of the lemma. Because $E\varepsilon^2$ can vary with n, the $O_p(n_{sparse}^{-1/2})$ term in (20) is actually of order $O_p(n_{sparse}^{-1/2}E\varepsilon^2)$.

A.3 Asymptotics of $D_T$

For transparency of notation, we take $\Delta t = T/n$; in other words, the average of the $\Delta t_i$.

Proof of Theorem 2. Note first that because we assume that $\mu_t$ and $\sigma_t$ are continuous, they are locally bounded. Because here we seek to show convergence in probability, we can therefore assume without loss of generality that $|\mu_t|$ is bounded above by a nonrandom constant, and also that $\infty > \sigma^+ \ge \sigma_t \ge \sigma^- > 0$, where $\sigma^+$ and $\sigma^-$ are nonrandom. This is by the usual stopping-time argument.

Also note that once $\mu_t$ and $\sigma_t$ are bounded, as we have now assumed, by Girsanov's theorem (see, e.g., Karatzas and Shreve 1991, chap. 3.5; Jacod and Shiryaev 2003, chap. II.3b), we can without loss of generality further suppose that $\mu_t = 0$ identically. (If the convergence in probability holds under the equivalent probability, then it also will hold under the original one.)

Now for the derivation. First consider

$$\bigl(X_{t_i} - X_{t_{i,-}}\bigr)^2 = \Biggl(\sum_{t_j = t_{i,-}}^{t_{i-1}} \Delta X_{t_j}\Biggr)^2 = \sum_{t_j = t_{i,-}}^{t_{i-1}} \bigl(\Delta X_{t_j}\bigr)^2 + 2\sum_{t_{i-1} \ge t_k > t_l \ge t_{i,-}} \Delta X_{t_k}\,\Delta X_{t_l}.$$

It follows that

$$[X,X]^{(avg)}_T = [X,X]^{(all)}_T + 2\sum_{i=1}^{n-1}\bigl(\Delta X_{t_i}\bigr)\sum_{j=1}^{K \wedge i}\Bigl(1 - \frac{j}{K}\Bigr)\bigl(\Delta X_{t_{i-j}}\bigr) + O_p(K/n).$$

The remainder term incorporates end effects and has order $O_p(K/n)$ by (41) and (42), and due to the regular allocation of points to grids.

Following Jacod and Protter (1998) and Mykland and Zhang (2002), $[X,X]^{(all)}_T - \langle X,X\rangle_T = O_p(1/\sqrt n\,) = o_p(\sqrt{K/n}\,)$, too, under the assumption (17) [which is a special case of assumption (41)] and that $\inf_{t \in [0,T]} \sigma^2_t > 0$ almost surely. In other words,

$$D_T = \bigl([X,X]^{(avg)}_T - \langle X,X\rangle_T\bigr) = 2\sum_{i=1}^{n-1}\bigl(\Delta X_{t_i}\bigr)\sum_{j=1}^{K \wedge i}\Bigl(1 - \frac{j}{K}\Bigr)\bigl(\Delta X_{t_{i-j}}\bigr) + o_p\bigl(\sqrt{K\Delta t}\,\bigr). \qquad(A.25)$$

It follows that

$$\langle D,D\rangle_T = 4\sum_{i=1}^{n-1}\bigl(\Delta\langle X,X\rangle_{t_i}\bigr)\Biggl(\sum_{j=1}^{K \wedge i}\Bigl(1 - \frac{j}{K}\Bigr)\bigl(\Delta X_{t_{i-j}}\bigr)\Biggr)^2 + o_p(K\Delta t) = (I) + (II) + o_p(K\Delta t), \qquad(A.26)$$

where

$$\begin{aligned}
(I) &= 4\sum_{i=1}^{n-1}\bigl(\Delta\langle X,X\rangle_{t_i}\bigr)\Biggl(\sum_{j=1}^{K \wedge i}\Bigl(1 - \frac{j}{K}\Bigr)^2\bigl(\Delta X_{t_{i-j}}\bigr)^2\Biggr) \\
&= 4\sum_{i=1}^{n-1}\sigma^4_{t_i}\,\Delta t_i\Biggl(\sum_{j=1}^{K \wedge i}\Bigl(1 - \frac{j}{K}\Bigr)^2\Delta t_{i-j}\Biggr) + o_p(K\Delta t) \\
&= K\Delta t\sum_{i=1}^{n-1}\sigma^4_{t_i}h_i\,\Delta t_i + o_p(K\Delta t),
\end{aligned}$$

where $h_i$ is given by (43). Thus Theorem 2 will have been shown if we can show that

$$(II) = 8\sum_{i=1}^{n-1}\bigl(\Delta\langle X,X\rangle_{t_i}\bigr)\zeta_i \qquad(A.27)$$

is of order $o_p(K\Delta t)$, where

$$\zeta_i = \sum_{i-1 \ge l > r \ge 0}\bigl(\Delta X_{t_l}\bigr)\bigl(\Delta X_{t_r}\bigr)\Bigl(1 - \frac{i-l}{K}\Bigr)^+\Bigl(1 - \frac{i-r}{K}\Bigr)^+. \qquad(A.28)$$

We do this as follows. Denote $\delta^+ = \max_i |\Delta t_i| = O(\frac{1}{n})$, and set $(II)' = 8\sum_{i=1}^{n-1}\Delta t_i\,\sigma^2_{t_{i-K}}\zeta_i$. Then, by Hölder's inequality,

$$E\bigl|(II) - (II)'\bigr| \le \delta^+ n \sup_{|t-s| \le (K+1)\delta^+}\bigl\|\sigma^2_t - \sigma^2_s\bigr\|_2 \sup_i \|\zeta_i\|_2 = o(\delta^+ K),$$

because of the boundedness and continuity of $\sigma_t$ and because, applying the Burkholder–Davis–Gundy inequality twice, we have that

$$\begin{aligned}
E\bigl(\zeta^2_i\bigr) &\le E\sum_{l=1}^{i-1}\bigl(\Delta\langle X,X\rangle_{t_l}\bigr)\Biggl(\sum_{r \ge 0}^{(l-1) \wedge (i-1)}\bigl(\Delta X_{t_r}\bigr)\Bigl(1 - \frac{i-l}{K}\Bigr)^+\Bigl(1 - \frac{i-r}{K}\Bigr)^+\Biggr)^2 \\
&\le \delta^+(\sigma^+)^2\sum_{l=1}^{i-1} E\Biggl(\sum_{r \ge 0}^{(l-1) \wedge (i-1)}\bigl(\Delta X_{t_r}\bigr)\Bigl(1 - \frac{i-l}{K}\Bigr)^+\Bigl(1 - \frac{i-r}{K}\Bigr)^+\Biggr)^2 \\
&\le (\delta^+)^2(\sigma^+)^4\sum_{l=1}^{i-1}\sum_{r \ge 0}^{(l-1) \wedge (i-1)}\Biggl(\Bigl(1 - \frac{i-l}{K}\Bigr)^+\Bigl(1 - \frac{i-r}{K}\Bigr)^+\Biggr)^2 \\
&= O\bigl((\delta^+ K)^2\bigr) \qquad\text{uniformly in } i, \qquad(A.29)
\end{aligned}$$

where the second-to-last transition does for the inner sum what the two first inequalities do for the outer sum. Now rewrite

$$(II)' = 8\sum_{l > r \ge 0}^{n-1}\bigl(\Delta X_{t_l}\bigr)\bigl(\Delta X_{t_r}\bigr)\sum_{i=l+1}^{n-1}\Delta t_i\,\sigma^2_{t_{i-K}}\Bigl(1 - \frac{i-l}{K}\Bigr)^+\Bigl(1 - \frac{i-r}{K}\Bigr)^+. \qquad(A.30)$$

In the same way as in (A.29), we then obtain

$$\begin{aligned}
E\bigl((II)'\bigr)^2 &\le 64(\delta^+)^2(\sigma^+)^4\sum_{l > r \ge 0}^{n-1} E\Biggl(\sum_{i=l+1}^{n-1}\Delta t_i\,\sigma^2_{t_{i-K}}\Bigl(1 - \frac{i-l}{K}\Bigr)^+\Bigl(1 - \frac{i-r}{K}\Bigr)^+\Biggr)^2 \\
&\le 64(\delta^+)^2(\sigma^+)^4 \times (\delta^+)^2(\sigma^+)^4\sum_{l > r \ge 0}^{n-1}\Biggl(\sum_{i=l+1}^{n-1}\Bigl(1 - \frac{i-l}{K}\Bigr)^+\Bigl(1 - \frac{i-r}{K}\Bigr)^+\Biggr)^2 \\
&\le 64(\delta^+)^4(\sigma^+)^8 nK^3 = O\bigl((\delta^+ K)^3\bigr) = o\bigl((\delta^+ K)^2\bigr), \qquad(A.31)
\end{aligned}$$

by assumptions (41) and (42). The transition from the second to the last line in (A.31) is due to the fact that $\sum_{i=l+1}^{n-1}(1 - \frac{i-l}{K})^+(1 - \frac{i-r}{K})^+ \le K$, and $= 0$ when $l - r \ge K$. Thus we have shown that $(II) = o_p(K\Delta t)$, and hence Theorem 2 has been established.

We now proceed to the asymptotic distribution of $D_T$. We first state a technical condition on the filtration $(\mathcal F_t)_{0 \le t \le T}$ to which $X_t$ and $\mu_t$ (but not the ε's) are assumed to be adapted.

Condition E (Description of the filtration). There is a continuous multidimensional P-local martingale $\mathcal X = (\mathcal X^{(1)}, \ldots, \mathcal X^{(p)})$, for some p, so that $\mathcal F_t$ is the smallest sigma-field containing $\sigma(\mathcal X_s, s \le t)$ and $\mathcal N$, where $\mathcal N$ contains all of the null sets in $\sigma(\mathcal X_s, s \le T)$.

For example, $\mathcal X$ can be a collection of Brownian motions.

Proof of Theorem 3. We can show, by methods similar to those in the proof of Theorem 2, that if L is any martingale adapted to the filtration generated by $\mathcal X$, then

$$\sup_t \Bigl|\frac{1}{\Delta t\,K}\langle D,L\rangle_t\Bigr| \xrightarrow{\ p\ } 0. \qquad(A.32)$$

The stable convergence with respect to the filtration $(\mathcal F_t)_{0 \le t \le T}$ then follows in view of Rootzén (1980) or Jacod and Protter (1998). This ends the proof of Theorem 3.

Finally, in the case where $\eta^2_n$ does not converge, one can still use the mixed normal with variance $\eta^2_n$. This is because every subsequence of $\eta^2_n$ has a further subsequence that does converge in probability to some $\eta^2$, and hence for which the assumption (48) in Theorem 3 would be satisfied. The reason for this is that one can define the distribution function of a finite measure by

$$G_n(t) = \sum_{t_{i+1} \le t} h_i\,\Delta t_i. \qquad(A.33)$$

Because $G_n(t) \le T \sup_i h_i$, it follows from (45) that the sequence $G_n$ is weakly compact in the sense of weak convergence; see Helly's theorem (e.g., Billingsley 1995, p. 336). For any convergent subsequence $G_n \to G$, we then get that

$$\eta^2_n = \int_0^T \sigma^4_t\,dG_n(t) \to \int_0^T \sigma^4_t\,dG(t) \qquad(A.34)$$

almost surely, because we have assumed that $\langle X,X\rangle'_t$ is a continuous function of t. One then defines $\eta^2$ to be the (subsequence-dependent) right side of (A.34). To proceed further with the asymptotics, continue the foregoing subsequence and note that

$$\frac{1}{\Delta t\,K}\langle D,D\rangle_t \approx \int_0^t \sigma^4_s\,dG_n(s) \to \int_0^t \sigma^4_s\,dG(s),$$

completing the proof.

[Received October 2003. Revised December 2004.]

REFERENCES

Aït-Sahalia, Y., Mykland, P. A., and Zhang, L. (2005), "How Often to Sample a Continuous-Time Process in the Presence of Market Microstructure Noise," Review of Financial Studies, 18, 351–416.
Aldous, D. J., and Eagleson, G. K. (1978), "On Mixing and Stability of Limit Theorems," The Annals of Probability, 6, 325–331.
Andersen, T. G., Bollerslev, T., Diebold, F. X., and Labys, P. (2001), "The Distribution of Exchange Rate Realized Volatility," Journal of the American Statistical Association, 96, 42–55.
Bandi, F. M., and Russell, J. R. (2003), "Microstructure Noise, Realized Volatility, and Optimal Sampling," technical report, University of Chicago, Graduate School of Business.
Barndorff-Nielsen, O. E., and Shephard, N. (2002), "Econometric Analysis of Realized Volatility and Its Use in Estimating Stochastic Volatility Models," Journal of the Royal Statistical Society, Ser. B, 64, 253–280.
Billingsley, P. (1995), Probability and Measure (3rd ed.), New York: Wiley.
Brown, S. J. (1990), "Estimating Volatility," in Financial Options: From Theory to Practice, eds. S. Figlewski, W. Silber, and M. Subrahmanyam, Homewood, IL: Business One-Irwin, pp. 516–537.
Chernov, M., and Ghysels, E. (2000), "A Study Towards a Unified Approach to the Joint Estimation of Objective and Risk Neutral Measures for the Purpose of Options Valuation," Journal of Financial Economics, 57, 407–458.
Dacorogna, M. M., Gençay, R., Müller, U., Olsen, R. B., and Pictet, O. V. (2001), An Introduction to High-Frequency Finance, San Diego: Academic Press.
Gallant, A. R., Hsu, C.-T., and Tauchen, G. T. (1999), "Using Daily Range Data to Calibrate Volatility Diffusions and Extract the Forward Integrated Variance," The Review of Economics and Statistics, 81, 617–631.
Gloter, A. (2000), "Estimation des Paramètres d'une Diffusion Cachée" [Estimation of the Parameters of a Hidden Diffusion], unpublished doctoral thesis, Université de Marne-la-Vallée.
Hall, P., and Heyde, C. C. (1980), Martingale Limit Theory and Its Application, Boston: Academic Press.
Hansen, P. R., and Lunde, A. (2004a), "Realized Variance and IID Market Microstructure Noise," technical report, Brown University, Dept. of Economics.
——— (2004b), "An Unbiased Measure of Realized Variance," technical report, Brown University, Dept. of Economics.
Heston, S. (1993), "A Closed-Form Solution for Options With Stochastic Volatility, With Applications to Bonds and Currency Options," Review of Financial Studies, 6, 327–343.
Hull, J., and White, A. (1987), "The Pricing of Options on Assets With Stochastic Volatilities," Journal of Finance, 42, 281–300.
Jacod, J., and Protter, P. (1998), "Asymptotic Error Distributions for the Euler Method for Stochastic Differential Equations," The Annals of Probability, 26, 267–307.
Jacod, J., and Shiryaev, A. N. (2003), Limit Theorems for Stochastic Processes (2nd ed.), New York: Springer-Verlag.
Karatzas, I., and Shreve, S. E. (1991), Brownian Motion and Stochastic Calculus, New York: Springer-Verlag.
Mykland, P. A., and Zhang, L. (2002), "ANOVA for Diffusions," technical report, University of Chicago, Dept. of Statistics.
O'Hara, M. (1995), Market Microstructure Theory, Cambridge, MA: Blackwell.
Oomen, R. (2004), "Properties of Realized Variance for a Pure Jump Process: Calendar Time Sampling versus Business Time Sampling," technical report, University of Warwick, Warwick Business School.
Rényi, A. (1963), "On Stable Sequences of Events," Sankhyā, Ser. A, 25, 293–302.
Rootzén, H. (1980), "Limit Distributions for the Error in Approximations of Stochastic Integrals," The Annals of Probability, 8, 241–251.
Zhou, B. (1996), "High-Frequency Data and Volatility in Foreign-Exchange Rates," Journal of Business & Economic Statistics, 14, 45–52.
Zumbach, G., Corsi, F., and Trapletti, A. (2002), "Efficient Estimation of Volatility Using High-Frequency Data," technical report, Olsen & Associates.



stably in law. (We discuss this concept at the end of Sec. 3.4.) $Z_{discrete}$ is a standard normal random variable, with the subscript "discrete" indicating that the randomness is due to the discretization effect in $[X,X]^{\mathcal H}_T$ when evaluating $\langle X,X\rangle_T$. $H(t)$ is the asymptotic quadratic variation of time, as discussed by Mykland and Zhang (2002),

$$H(t) = \lim_{n \to \infty} \frac{n_{sparse}}{T}\sum_{t_i, t_{i,+} \in \mathcal H,\, t_{i,+} \le t} (t_{i,+} - t_i)^2.$$

In the case of equidistant observations, $\Delta t_0 = \cdots = \Delta t_{n-1} = \Delta t = T/n_{sparse}$, and $H'(t) = 1$. Because the ε's are independent of the X process, $Z_{noise}$ is independent of $Z_{discrete}$.

For small $E\varepsilon^2$, one now has an opportunity to estimate $\langle X,X\rangle_T$. It follows from our Lemma 1 and proposition 1 of Mykland and Zhang (2002) that:

Proposition 1. Assume the conditions of Lemma 1, and also that $\max_{t_i, t_{i,+} \in \mathcal H}(t_{i,+} - t_i) = O(1/n_{sparse})$. Also assume Condition E in Section A.3. Then H is well defined, and

$$[Y,Y]^{\mathcal H}_T = \langle X,X\rangle_T + 2E\varepsilon^2 n_{sparse} + \Upsilon Z_{total} + O_p\bigl(n^{-1/4}_{sparse}(E\varepsilon^2)^{1/2}\bigr) + o_p\bigl(n^{-1/2}_{sparse}\bigr), \qquad(26)$$

in the sense of stable convergence, where $Z_{total}$ is asymptotically standard normal, and where the variance has the form

$$\Upsilon^2 = \underbrace{4n_{sparse}E\varepsilon^4 + \bigl(8[X,X]^{\mathcal H}_T E\varepsilon^2 - 2\operatorname{var}(\varepsilon^2)\bigr)}_{\text{due to noise}} + \underbrace{\frac{T}{n_{sparse}}\int_0^T 2H'(t)\sigma^4_t\,dt}_{\text{due to discretization}}. \qquad(27)$$

Seen from this angle, there is scope for using the realized volatility $[Y,Y]^{\mathcal H}$ to estimate $\langle X,X\rangle$. There is a bias, $2E\varepsilon^2 n_{sparse}$, but the bias goes down if one uses fewer observations. This, then, is consistent with the practice in empirical finance, where the estimator used is not $[Y,Y]^{(all)}$ but instead $[Y,Y]^{(sparse)}$, obtained by sampling sparsely. For instance, faced with data sampled every few seconds, empirical researchers would typically use squared returns (i.e., differences of log-prices Y) over, say, 5-, 15-, or 30-minute time intervals. The intuitive rationale for using this estimator is to attempt to control the bias of the estimator; this can be assessed based on our formula (26), replacing the original sample size n by the lower number reflecting the sparse sampling. But one should avoid sampling too sparsely, because formula (27) shows that decreasing $n_{sparse}$ has the effect of increasing the variance of the estimator via the discretization effect, which is proportional to $n^{-1}_{sparse}$. Based on our formulas, this trade-off between sampling too often and sampling too rarely can be formalized, and an optimal frequency at which to sample sparsely can be determined.

It is natural to minimize the MSE of $[Y,Y]^{(sparse)}$:

$$\text{MSE} = \bigl(2n_{sparse}E\varepsilon^2\bigr)^2 + \Bigl\{4n_{sparse}E\varepsilon^4 + \bigl(8[X,X]^{\mathcal H}_T E\varepsilon^2 - 2\operatorname{var}(\varepsilon^2)\bigr) + \frac{T}{n_{sparse}}\int_0^T 2H'(t)\sigma^4_t\,dt\Bigr\}. \qquad(28)$$

One has to imagine that the original sample size n is quite large, so that $n_{sparse} < n$. In this case, minimizing the MSE (28) means that one should choose $n_{sparse}$ to satisfy $\partial\text{MSE}/\partial n_{sparse} \approx 0$; in other words,

$$8n_{sparse}(E\varepsilon^2)^2 + 4E\varepsilon^4 - \frac{T}{n^2_{sparse}}\int_0^T 2H'(t)\sigma^4_t\,dt \approx 0, \qquad(29)$$

or

$$n^3_{sparse} + \frac{1}{2}n^2_{sparse}\frac{E\varepsilon^4}{(E\varepsilon^2)^2} - (E\varepsilon^2)^{-2}\frac{T}{8}\int_0^T 2H'(t)\sigma^4_t\,dt \approx 0. \qquad(30)$$

Finally, still under the conditions of Proposition 1, the optimum $n^*_{sparse}$ becomes

$$n^*_{sparse} = (E\varepsilon^2)^{-2/3}\biggl(\frac{T}{8}\int_0^T 2H'(t)\sigma^4_t\,dt\biggr)^{1/3}\bigl(1 + o_p(1)\bigr) \qquad\text{as } E\varepsilon^2 \to 0. \qquad(31)$$
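The optimal sparse frequency can be computed either by minimizing (28) numerically or from the closed form (31). The sketch below is our illustration: it takes equidistant sampling so that $H'(t) = 1$, constant volatility, $T = 1$, and Gaussian noise; all numerical values are hypothetical.

```python
import numpy as np

def mse_sparse(nsp, eps2, eps4, sigma2, T=1.0):
    """MSE of [Y,Y]^(sparse), eq. (28), with H'(t) = 1 and constant σ,
    so ∫ 2H'(t)σ⁴ dt = 2σ⁴T and [X,X]_T ≈ σ²T."""
    bias2 = (2 * nsp * eps2) ** 2
    variance = (4 * nsp * eps4
                + 8 * sigma2 * T * eps2 - 2 * (eps4 - eps2 ** 2)
                + (T / nsp) * 2 * sigma2 ** 2 * T)
    return bias2 + variance

eps2 = 1e-7                              # hypothetical noise variance
eps4 = 3 * eps2 ** 2                     # Gaussian noise: Eε⁴ = 3(Eε²)²
sigma2 = 1e-4                            # hypothetical integrated volatility

grid = np.arange(50, 5001)
n_star = grid[np.argmin(mse_sparse(grid, eps2, eps4, sigma2))]
rule = eps2 ** (-2 / 3) * (2 * sigma2 ** 2 / 8) ** (1 / 3)  # eq. (31)
print(n_star, rule)                      # both around 63 sparse observations
```

The numerical minimizer and the first-order rule (31) agree, confirming that the trade-off is essentially the bias (growing in $n_{sparse}$) against the discretization variance (shrinking in $n_{sparse}$).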

Of course, if the actual sample size n were smaller than $n^*_{sparse}$, then one would simply take $n^*_{sparse} = n$, but this is unlikely to occur for a heavily traded stock. Equation (31) is the formal statement saying that one can sample more frequently when the error spread is small. Note from (28) that to first order, the final trade-off is between the bias $2n_{sparse}E\varepsilon^2$ and the variance due to discretization. The effect of the variance associated with $Z_{noise}$ is of lower order when comparing $n_{sparse}$ and $E\varepsilon^2$. It should be emphasized that (31) is a feasible way of choosing $n_{sparse}$: one can estimate $E\varepsilon^2$ using all of the data, following the procedure in Section 2.2. The integral $\int_0^T 2H'(t)\sigma^4_t\,dt$ can be estimated by the methods discussed in Section 6.

Hence, if one decides to address the problem by selecting a lower frequency of observation, then one can do so by subsampling the full grid G at an arbitrary sparse frequency and using $[Y,Y]^{(sparse)}$. Alternatively, one can use $n_{sparse}$ as optimally determined by (31). We denote the corresponding estimator as $[Y,Y]^{(sparse,opt)}$. These are our fourth- and third-best estimators.

3. SAMPLING SPARSELY WHILE USING ALL OF THE DATA: SUBSAMPLING AND AVERAGING OVER MULTIPLE GRIDS

3.1 Multiple Grids and Sufficiency

We have argued in the previous section that one can indeed benefit from using infrequently sampled data. And yet, one of the most basic lessons of statistics is that one should not do this. We present here two ways of tackling the problem. Instead of selecting (arbitrarily or optimally) a subsample, our methods are based on selecting a number of subgrids of the original grid of observation times, $G = \{t_0, \ldots, t_n\}$, and then averaging the estimators derived from the subgrids. The principle is that, to the extent that there is a benefit to subsampling, this benefit can now be retained, whereas the variation of the estimator can be lessened by the averaging. The benefit of averaging is clear from sufficiency considerations, and many statisticians would say that subsampling without subsequent averaging is inferentially incorrect.

In what follows, we first introduce a set of notations, then turn to studying the realized volatility in the multigrid context. In Section 4, we show how to eliminate the bias of the estimator by using two time scales, a combination of the single grid and the multiple grids.


3.2 Notation for the Multiple Grids

We specifically suppose that the full grid G, $G = \{t_0, \ldots, t_n\}$ as in (13), is partitioned into K nonoverlapping subgrids $G^{(k)}$, $k = 1, \ldots, K$; in other words,

$$G = \bigcup_{k=1}^{K} G^{(k)}, \qquad\text{where } G^{(k)} \cap G^{(l)} = \emptyset \text{ when } k \ne l.$$

For most purposes, the natural way to select the kth subgrid $G^{(k)}$ is to start with $t_{k-1}$ and then pick every Kth sample point after that, until T. That is,

$$G^{(k)} = \{t_{k-1},\ t_{k-1+K},\ t_{k-1+2K},\ \ldots,\ t_{k-1+n_k K}\}$$

for $k = 1, \ldots, K$, where $n_k$ is the integer making $t_{k-1+n_k K}$ the last element in $G^{(k)}$. We refer to this as regular allocation of sample points to subgrids.

Whether the allocation is regular or not, we let $n_k + 1 = |G^{(k)}|$. As in Definition 2, $n + 1 = |G|$. Recall that the realized volatility based on all observation points G is written as $[Y,Y]^{(all)}_T$. Meanwhile, if one uses only the subsampled observations $Y_t$, $t \in G^{(k)}$, then the realized volatility is denoted by $[Y,Y]^{(k)}_T$. It has the form

$$[Y,Y]^{(k)}_T = \sum_{t_j, t_{j,+} \in G^{(k)}} \bigl(Y_{t_{j,+}} - Y_{t_j}\bigr)^2,$$

where, if $t_i \in G^{(k)}$, then $t_{i,+}$ denotes the following element in $G^{(k)}$.

A natural competitor to [YY](all)T [YY](sparse)

T and

[YY](sparseopt)T is then given by

[YY](avg)T = 1

K

Ksum

k=1

[YY](k)T (32)

and this is the statistic we analyze in the what following Asbefore we fix T and use only the observations within the timeperiod [0T] Asymptotics are still under (17) and under

as n rarr infin and nK rarr infin (33)

In general, the $n_k$ need not be the same across $k$. We define

$$\bar{n} = \frac{1}{K}\sum_{k=1}^{K} n_k = \frac{n - K + 1}{K}. \tag{34}$$
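To make the subgrid construction concrete, here is a minimal sketch (our own illustration, not from the article) of regular allocation and the averaged estimator (32), using NumPy slicing `y[k::K]` for the $k$th subgrid:

```python
import numpy as np

def avg_realized_volatility(y, K):
    """[Y,Y]^(avg)_T of (32): average the subgrid realized volatilities.

    y : observed log-prices at the full grid G = {t_0, ..., t_n}
    K : number of nonoverlapping subgrids; regular allocation puts
        indices k, k+K, k+2K, ... into subgrid G^(k+1)
    """
    rv_sum = 0.0
    for k in range(K):
        sub = y[k::K]                          # observations on subgrid k
        rv_sum += np.sum(np.diff(sub) ** 2)    # [Y,Y]^(k)_T
    return rv_sum / K
```

With $n = \mathrm{len}(y) - 1$, the $K$ subgrids together contain $n - K + 1$ increments, so the average subgrid sample size is $\bar{n} = (n - K + 1)/K$, as in (34).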

3.3 Error Due to the Noise: $[Y,Y]^{(\mathrm{avg})}_T - [X,X]^{(\mathrm{avg})}_T$

Recall that we are interested in determining the integrated volatility $\langle X,X\rangle_T$, or quadratic variation, of the true but unobservable returns. As an intermediate step, here we study how well the "pooled" realized volatility $[Y,Y]^{(\mathrm{avg})}_T$ approximates $[X,X]^{(\mathrm{avg})}_T$, where the latter is the "pooled" true integrated volatility when $X$ is considered only on the discrete time scale.

From (18) and (32),

$$E\left([Y,Y]^{(\mathrm{avg})}_T \,\middle|\, X \text{ process}\right) = [X,X]^{(\mathrm{avg})}_T + 2\bar{n}E\varepsilon^2. \tag{35}$$

Also, because the $\{\varepsilon_t, t \in G^{(k)}\}$ are independent for different $k$,

$$\mathrm{var}\left([Y,Y]^{(\mathrm{avg})}_T \,\middle|\, X \text{ process}\right) = \frac{1}{K^2}\sum_{k=1}^{K}\mathrm{var}\left([Y,Y]^{(k)}_T \,\middle|\, X \text{ process}\right) = 4\frac{\bar{n}}{K}E\varepsilon^4 + O_p\!\left(\frac{1}{K}\right), \tag{36}$$

in the same way as in (19). Incorporating the next-order term in the variance yields

$$\mathrm{var}\left([Y,Y]^{(\mathrm{avg})}_T \,\middle|\, X\right) = 4\frac{\bar{n}}{K}E\varepsilon^4 + \frac{1}{K}\left[8[X,X]^{(\mathrm{avg})}_T E\varepsilon^2 - 2\,\mathrm{var}(\varepsilon^2)\right] + o_p\!\left(\frac{1}{K}\right), \tag{37}$$

as in (20). By Theorem A.1 in Section A.2, the conditional asymptotics for the estimator $[Y,Y]^{(\mathrm{avg})}_T$ are as follows.

Theorem 1. Suppose that $X$ is an Itô process of form (1). Suppose that $Y$ is related to $X$ through model (4), and that (11) is satisfied with $E\varepsilon^4 < \infty$. Also suppose that $t_i$ and $t_{i+1}$ are not in the same subgrid for any $i$. Under assumption (33), as $n \to \infty$,

$$\sqrt{\frac{K}{\bar{n}}}\left([Y,Y]^{(\mathrm{avg})}_T - [X,X]^{(\mathrm{avg})}_T - 2\bar{n}E\varepsilon^2\right) \xrightarrow{\mathcal{L}} 2\sqrt{E\varepsilon^4}\,Z^{(\mathrm{avg})}_{\mathrm{noise}} \tag{38}$$

conditional on the $X$ process, where $Z^{(\mathrm{avg})}_{\mathrm{noise}}$ is standard normal.

This can be compared with the result stated in (24). Notice that $Z^{(\mathrm{avg})}_{\mathrm{noise}}$ in (38) is almost never the same as $Z_{\mathrm{noise}}$ in (24); in particular, $\mathrm{cov}(Z_{\mathrm{noise}}, Z^{(\mathrm{avg})}_{\mathrm{noise}}) = \mathrm{var}(\varepsilon^2)/E\varepsilon^4$, based on the proof of Theorem A.1 in the Appendix.

In comparison with the realized volatility using the full grid $G$, the aggregated estimator $[Y,Y]^{(\mathrm{avg})}_T$ provides an improvement, in that both the asymptotic bias and variance are of smaller order in $n$; compare (18) and (19). We use this in Section 4.

3.4 Error Due to the Discretization Effect: $[X,X]^{(\mathrm{avg})}_T - \langle X,X\rangle_T$

In this section we study the impact of the time discretization. In other words, we investigate the deviation of $[X,X]^{(\mathrm{avg})}_T$ from the integrated volatility $\langle X,X\rangle_T$ of the true process. Denote the discretization effect by $D_T$, where

$$D_t = [X,X]^{(\mathrm{avg})}_t - \langle X,X\rangle_t = \frac{1}{K}\sum_{k=1}^{K}\left([X,X]^{(k)}_t - \langle X,X\rangle_t\right), \tag{39}$$

with

$$[X,X]^{(k)}_t = \sum_{\substack{t_i,\, t_{i+} \in G^{(k)} \\ t_{i+} \le t}}\left(X_{t_{i+}} - X_{t_i}\right)^2. \tag{40}$$

In what follows, we consider the asymptotics of $D_T$. The problem is similar to that of finding the limit of $[X,X]^{(\mathrm{all})}_T - \langle X,X\rangle_T$


[cf. (25)]. However, this present case is more complicated, due to the multiple grids.

We suppose in the following that the sampling points are regularly allocated to subgrids; in other words, $G^{(l)} = \{t_{l-1}, t_{K+l-1}, \dots\}$. We also assume that

$$\max_i |\Delta t_i| = O\!\left(\frac{1}{n}\right) \tag{41}$$

and

$$K/n \to 0. \tag{42}$$

Define the weight function

$$h_i = \frac{4}{K\,\Delta t}\sum_{j=1}^{(K-1)\wedge i}\left(1 - \frac{j}{K}\right)^2 \Delta t_{i-j}. \tag{43}$$

In the case where the $t_i$ are equidistant, and under regular allocation of points to subgrids, $\Delta t_{i-j} = \Delta t$, and so all of the $h_i$'s (except the first $K-1$) are equal, and

$$h_i = 4\frac{1}{K}\sum_{j=1}^{(K-1)\wedge i}\left(1 - \frac{j}{K}\right)^2 = \begin{cases} \dfrac{4(2K^2 - 3K + 1)}{6K^2} = \dfrac{4}{3} + o(1), & i \ge K-1, \\[1.5ex] \dfrac{4}{3}\,\dfrac{i}{K^3}\left(3K^2 - 3Ki + i^2\right) + o(1), & i < K-1. \end{cases} \tag{44}$$
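As a numerical sanity check on (44) (a check of our own, not part of the article), the stationary value of $h_i$ can be compared with the direct sum from the equidistant case of (43); it approaches $4/3$ as $K$ grows:

```python
from fractions import Fraction

def h_direct(i, K):
    # equidistant case of (43): h_i = (4/K) * sum_{j=1}^{(K-1) ∧ i} (1 - j/K)^2
    return Fraction(4, K) * sum((1 - Fraction(j, K)) ** 2
                                for j in range(1, min(K - 1, i) + 1))

def h_stationary(K):
    # closed form of (44) for i >= K - 1
    return Fraction(4 * (2 * K * K - 3 * K + 1), 6 * K * K)
```

Exact rational arithmetic via `fractions` makes the comparison free of rounding issues.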

More generally, assumptions (41) and (42) ensure that

$$\sup_i h_i = O(1). \tag{45}$$

We take $\langle D,D\rangle_T$ to be the quadratic variation of $D_t$ when viewed as a continuous-time process (39). This gives the best approximation to the variance of $D_T$. We show the following results in Section A.3.

Theorem 2. Suppose that $X$ is an Itô process of the form (1), with drift coefficient $\mu_t$ and diffusion coefficient $\sigma_t$, both continuous almost surely. Also suppose that $\sigma_t$ is bounded away from 0. Assume (41) and (42), and that sampling points are regularly allocated to grids. Then the quadratic variation of $D_T$ is approximately

$$\langle D,D\rangle_T = \frac{TK}{n}\eta_n^2 + o_p\!\left(\frac{K}{n}\right), \tag{46}$$

where

$$\eta_n^2 = \sum_i h_i \sigma^4_{t_i}\,\Delta t_i. \tag{47}$$

In particular, $D_T = O_p((K/n)^{1/2})$. From this we derive a variance–variance trade-off between the two effects that have been discussed: noise and discretization. First, however, we discuss the asymptotic law of $D_T$. We discuss stable convergence at the end of this section.

Theorem 3. Assume the conditions of Theorem 2, and also that

$$\eta_n^2 \xrightarrow{P} \eta^2, \tag{48}$$

where $\eta$ is random. Also assume condition E in Section A.3. Then

$$\frac{D_T}{(K/n)^{1/2}} \xrightarrow{\mathcal{L}} \eta\sqrt{T}\,Z_{\mathrm{discrete}}, \tag{49}$$

where $Z_{\mathrm{discrete}}$ is standard normal and independent of the process $X$. The convergence in law is stable.

In other words, $D_T/(K/n)^{1/2}$ can be taken to be asymptotically mixed normal, "$N(0, \eta^2 T)$." For most of our discussion, it is most convenient to suppose (48), and this is satisfied in many cases. For example, when the $t_i$ are equidistant, and under regular allocation of points to subgrids,

$$\eta^2 = \frac{4}{3}\int_0^T \sigma_t^4\,dt, \tag{50}$$

following (44). One does not need to rely on (48); we argue in Section A.3 that, without this condition, one can take $D_T/(K/n)^{1/2}$ to be approximately $N(0, \eta_n^2 T)$. For estimation of $\eta^2$ or $\eta_n^2$, see Section 6.

Finally, stable convergence (Rényi 1963; Aldous and Eagleson 1978; Hall and Heyde 1980, chap. 3) means, for our purposes, that the left side of (49) converges to the right side jointly with the $X$ process, and that $Z$ is independent of $X$. This is slightly weaker than convergence conditional on $X$, but serves the same function of permitting the incorporation of conditionality-type phenomena into arguments and conclusions.

3.5 Combining the Two Sources of Error

We can now combine the two error terms arising from discretization and from the observation noise. It follows from Theorems 1 and 3 that

$$[Y,Y]^{(\mathrm{avg})}_T - \langle X,X\rangle_T - 2\bar{n}E\varepsilon^2 = \xi Z_{\mathrm{total}} + o_p(1), \tag{51}$$

where $Z_{\mathrm{total}}$ is an asymptotically standard normal random variable independent of the $X$ process, and

$$\xi^2 = \underbrace{4\frac{\bar{n}}{K}E\varepsilon^4}_{\text{due to noise}} + \underbrace{T\frac{1}{\bar{n}}\eta^2}_{\text{due to discretization}}. \tag{52}$$

It is easily seen that if one takes $K = cn^{2/3}$, then both components in $\xi^2$ will be present in the limit; otherwise, one of them will dominate. Based on (51), $[Y,Y]^{(\mathrm{avg})}_T$ is still a biased estimator of the quadratic variation $\langle X,X\rangle_T$ of the true return process. One can recognize that, as far as the asymptotic bias is concerned, $[Y,Y]^{(\mathrm{avg})}_T$ is a better estimator than $[Y,Y]^{(\mathrm{all})}_T$, because $\bar{n} \le n$, demonstrating that the bias in the subsampled estimator $[Y,Y]^{(\mathrm{avg})}_T$ increases at a slower pace than in the full-grid estimator. One can also construct a bias-adjusted estimator from (51); this further development would involve the higher-order analysis between the bias and the subsampled estimator. We describe the methodology of bias correction in Section 4.


3.6 The Benefits of Sampling Sparsely: Optimal Sampling Frequency in the Multiple-Grid Case

As in Section 2.3, when the noise is negligible asymptotically, we can search for an optimal $\bar{n}$ for subsampling, to balance the coexistence of the bias and the variance in (51). To reduce the MSE of $[Y,Y]^{(\mathrm{avg})}_T$, we set $\partial\mathrm{MSE}/\partial\bar{n} = 0$. From (51)–(52), $\mathrm{bias} = 2\bar{n}E\varepsilon^2$ and $\xi^2 = 4\frac{\bar{n}}{K}E\varepsilon^4 + \frac{T}{\bar{n}}\eta^2$. Then

$$\mathrm{MSE} = \mathrm{bias}^2 + \xi^2 = 4(E\varepsilon^2)^2\bar{n}^2 + 4\frac{\bar{n}}{K}E\varepsilon^4 + \frac{T}{\bar{n}}\eta^2 = 4(E\varepsilon^2)^2\bar{n}^2 + \frac{T}{\bar{n}}\eta^2, \quad \text{to first order;}$$

thus the optimal $\bar{n}^*$ satisfies

$$\bar{n}^* = \left(\frac{T\eta^2}{8(E\varepsilon^2)^2}\right)^{1/3}. \tag{53}$$

Therefore, assuming that the estimator $[Y,Y]^{(\mathrm{avg})}_T$ is adopted, one could benefit from a minimum MSE if one subsamples $\bar{n}^*$ data in an equidistant fashion. In other words, all $n$ observations can be used if one uses $K^*$, $K^* \approx n/\bar{n}^*$, subgrids. This is in contrast to the drawback of using all of the data in the single-grid case. The subsampling coupled with aggregation brings out the advantage of using all of the data. Of course, for the asymptotics to work, we need $E\varepsilon^2 \to 0$. Our recommendation, however, is to use the bias-corrected estimator given in Section 4.
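As a small illustration (our own hypothetical helpers; `noise_var` plays the role of $E\varepsilon^2$ and `eta2` of $\eta^2$, both of which must themselves be estimated in practice, cf. Section 6), (53) and $K^* \approx n/\bar{n}^*$ translate directly into code:

```python
import math

def optimal_subsample_size(T, eta2, noise_var):
    """nbar* of (53): balances the squared bias 4 (E eps^2)^2 nbar^2
    against the discretization variance T eta^2 / nbar."""
    return (T * eta2 / (8.0 * noise_var ** 2)) ** (1.0 / 3.0)

def optimal_num_subgrids(n, T, eta2, noise_var):
    """K* ~ n / nbar*: the number of subgrids that still uses all n observations."""
    return max(1, round(n / optimal_subsample_size(T, eta2, noise_var)))
```

For example, with $T\eta^2 = 54$ and $E\varepsilon^2 = 0.5$ (contrived values so the cube root is exact), $\bar{n}^* = 3$.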

4. THE FINAL ESTIMATOR: SUBSAMPLING, AVERAGING, AND BIAS CORRECTION OVER TWO TIME SCALES

4.1 The Final Estimator $\widehat{\langle X,X\rangle}_T$: Main Result

In the preceding sections, we have seen that the multigrid estimator $[Y,Y]^{(\mathrm{avg})}$ is yet another biased estimator of the true integrated volatility $\langle X,X\rangle$. In this section we improve the multigrid estimator by adopting bias adjustment. To assess the bias, one uses the full grid. As mentioned before, from (22), in the single-grid case (Sec. 2), $E\varepsilon^2$ can be consistently approximated by

$$\widehat{E\varepsilon^2} = \frac{1}{2n}[Y,Y]^{(\mathrm{all})}_T. \tag{54}$$

Hence the bias of $[Y,Y]^{(\mathrm{avg})}$ can be consistently estimated by $2\bar{n}\widehat{E\varepsilon^2}$. A bias-adjusted estimator for $\langle X,X\rangle$ thus can be obtained as

$$\widehat{\langle X,X\rangle}_T = [Y,Y]^{(\mathrm{avg})}_T - \frac{\bar{n}}{n}[Y,Y]^{(\mathrm{all})}_T, \tag{55}$$

thereby combining the two time scales, (all) and (avg).

To study the asymptotic behavior of $\widehat{\langle X,X\rangle}_T$, first note that, under the conditions of Theorem A.1 in Section A.2,

$$\left(\frac{K}{\bar{n}}\right)^{1/2}\left(\widehat{\langle X,X\rangle}_T - [X,X]^{(\mathrm{avg})}_T\right) = \left(\frac{K}{\bar{n}}\right)^{1/2}\left([Y,Y]^{(\mathrm{avg})}_T - [X,X]^{(\mathrm{avg})}_T - 2\bar{n}E\varepsilon^2\right) - 2(K\bar{n})^{1/2}\left(\widehat{E\varepsilon^2} - E\varepsilon^2\right) \xrightarrow{\mathcal{L}} N\left(0,\ 8(E\varepsilon^2)^2\right), \tag{56}$$

where the convergence in law is conditional on $X$.

We can now combine this with the results of Section 3.4 to determine the optimal choice of $K$ as $n \to \infty$:

$$\widehat{\langle X,X\rangle}_T - \langle X,X\rangle_T = \left(\widehat{\langle X,X\rangle}_T - [X,X]^{(\mathrm{avg})}_T\right) + \left([X,X]^{(\mathrm{avg})}_T - \langle X,X\rangle_T\right) = O_p\!\left(\frac{\bar{n}^{1/2}}{K^{1/2}}\right) + O_p\!\left(\bar{n}^{-1/2}\right). \tag{57}$$

The error is minimized by equating the two terms on the right side of (57), yielding that the optimal sampling step for $[Y,Y]^{(\mathrm{avg})}_T$ is $K = O(n^{2/3})$. The right side of (57) then has order $O_p(n^{-1/6})$.

In particular, if we take

$$K = cn^{2/3}, \tag{58}$$

then we find the limit in (57) as follows.

Theorem 4. Suppose that $X$ is an Itô process of form (1), and assume the conditions of Theorem 3 in Section 3.4. Suppose that $Y$ is related to $X$ through model (4), and that (11) is satisfied with $E\varepsilon^2 < \infty$. Under assumption (58),

$$n^{1/6}\left(\widehat{\langle X,X\rangle}_T - \langle X,X\rangle_T\right) \xrightarrow{\mathcal{L}} N\left(0,\ 8c^{-2}(E\varepsilon^2)^2\right) + \eta\sqrt{T}\,N(0, c) = \left(8c^{-2}(E\varepsilon^2)^2 + c\eta^2 T\right)^{1/2}N(0,1), \tag{59}$$

where the convergence is stable in law (see Sec. 3.4).

Proof. Note that the first normal distribution comes from (56) and that the second comes from Theorem 3 in Section 3.4. The two normal distributions are independent because the convergence of the first term in (57) is conditional on the $X$ process, which is why they can be amalgamated as stated. The requirement that $E\varepsilon^4 < \infty$ (Thm. A.1 in the App.) is not needed, because only a law of large numbers is required for $M^{(1)}_T$ (see the proof of Thm. A.1) when considering the difference in (56). This finishes the proof.

The estimation of the asymptotic spread $s^2 = 8c^{-2}(E\varepsilon^2)^2 + c\eta^2 T$ of $\widehat{\langle X,X\rangle}_T$ is deferred to Section 6. It is seen in Section A.1 that, for $K = cn^{2/3}$, the second-order conditional variance of $\widehat{\langle X,X\rangle}_T$, given the $X$ process, is given by

$$\mathrm{var}\left(\widehat{\langle X,X\rangle}_T \,\middle|\, X\right) = n^{-1/3}c^{-2}8(E\varepsilon^2)^2 + n^{-2/3}c^{-1}\left[8[X,X]^{(\mathrm{avg})}_T E\varepsilon^2 - 2\,\mathrm{var}(\varepsilon^2)\right] + o_p\!\left(n^{-2/3}\right). \tag{60}$$

The evidence from our Monte Carlo simulations in Section 7 suggests that this correction may matter in small samples, and that the (random) variance of $\widehat{\langle X,X\rangle}_T$ is best estimated by

$$\hat{s}^2 + n^{-2/3}c^{-1}\left[8[X,X]^{(\mathrm{avg})}_T E\varepsilon^2 - 2\,\mathrm{var}(\varepsilon^2)\right]. \tag{61}$$

For the correction term, $[X,X]^{(\mathrm{avg})}_T$ can be estimated by $\widehat{\langle X,X\rangle}_T$ itself. Meanwhile, by (23), a consistent estimator of $\mathrm{var}(\varepsilon^2)$ is given by

$$\widehat{\mathrm{var}}(\varepsilon^2) = \frac{1}{2n}\sum_i\left(\Delta Y_{t_i}\right)^4 - 4\left(\widehat{E\varepsilon^2}\right)^2.$$


4.2 Properties of $\widehat{\langle X,X\rangle}_T$: Optimal Sampling and Bias Adjustment

To further pin down the optimal sampling frequency $K$, one can minimize the expected asymptotic variance in (59) to obtain

$$c^* = \left(\frac{16(E\varepsilon^2)^2}{T\,E\eta^2}\right)^{1/3}, \tag{62}$$

which can be consistently estimated from data in past time periods (before time $t_0 = 0$), using $\widehat{E\varepsilon^2}$ and an estimator of $\eta^2$ (cf. Sec. 6). As mentioned in Section 3.4, $\eta^2$ can be taken to be independent of $K$, as long as one allocates sampling points to grids regularly, as defined in Section 3.2. Hence one can choose $c$, and so also $K$, based on past data.

Example 1. If $\sigma_t^2$ is constant, then, for equidistant sampling and regular allocation to grids, $\eta^2 = \frac{4}{3}\sigma^4 T$; the asymptotic variance in (59) is

$$8c^{-2}(E\varepsilon^2)^2 + c\eta^2 T = 8c^{-2}(E\varepsilon^2)^2 + \tfrac{4}{3}c\sigma^4 T^2,$$

and the optimal choice of $c$ becomes

$$c_{\mathrm{opt}} = \left(\frac{12(E\varepsilon^2)^2}{T^2\sigma^4}\right)^{1/3}. \tag{63}$$

In this case, the asymptotic variance in (59) is

$$2\left(12(E\varepsilon^2)^2\right)^{1/3}\left(\sigma^2 T\right)^{4/3}.$$

One can also, of course, estimate $c$ to minimize the actual asymptotic variance in (59) from data in the current time period ($0 \le t \le T$). It is beyond the scope of this article to consider whether such a device for selecting the frequency has any impact on our asymptotic results.

In addition to large-sample arguments, one can study $\widehat{\langle X,X\rangle}_T$ from a "smallish" sample standpoint. We argue in what follows that one can apply a bias-type adjustment to get

$$\widehat{\langle X,X\rangle}^{(\mathrm{adj})}_T = \left(1 - \frac{\bar{n}}{n}\right)^{-1}\widehat{\langle X,X\rangle}_T. \tag{64}$$

The difference from the estimator in (55) is of order $O_p(\bar{n}/n) = O_p(K^{-1})$, and thus the two estimators behave the same to the asymptotic order that we consider. The estimator (64), however, has the appeal of being, in a certain way, "unbiased," as follows. For arbitrary $(a,b)$, consider all estimators of the form

$$\widehat{\langle X,X\rangle}^{(\mathrm{adj})}_T = a[Y,Y]^{(\mathrm{avg})}_T - b\frac{\bar{n}}{n}[Y,Y]^{(\mathrm{all})}_T.$$

Then, from (18) and (35),

$$E\left(\widehat{\langle X,X\rangle}^{(\mathrm{adj})}_T \,\middle|\, X \text{ process}\right) = a\left([X,X]^{(\mathrm{avg})}_T + 2\bar{n}E\varepsilon^2\right) - b\frac{\bar{n}}{n}\left([X,X]^{(\mathrm{all})}_T + 2nE\varepsilon^2\right) = a[X,X]^{(\mathrm{avg})}_T - b\frac{\bar{n}}{n}[X,X]^{(\mathrm{all})}_T + 2(a-b)\bar{n}E\varepsilon^2.$$

It is natural to choose $a = b$, to completely remove the effect of $E\varepsilon^2$. Also, following Section 3.4, both $[X,X]^{(\mathrm{avg})}_T$ and $[X,X]^{(\mathrm{all})}_T$ are asymptotically unbiased estimators of $\langle X,X\rangle_T$. Hence one can argue that one should take $a(1 - \bar{n}/n) = 1$, yielding (64).

Similarly, an adjusted estimator of $E\varepsilon^2$ is given by

$$\widehat{E\varepsilon^2}^{(\mathrm{adj})} = \tfrac{1}{2}(n - \bar{n})^{-1}\left([Y,Y]^{(\mathrm{all})}_T - [Y,Y]^{(\mathrm{avg})}_T\right), \tag{65}$$

which satisfies $E\big(\widehat{E\varepsilon^2}^{(\mathrm{adj})} \,\big|\, X \text{ process}\big) = E\varepsilon^2 + \tfrac{1}{2}(n - \bar{n})^{-1}\big([X,X]^{(\mathrm{all})}_T - [X,X]^{(\mathrm{avg})}_T\big)$, and thus is unbiased to high order. As for the asymptotic distribution, one can see from Theorem A.1 in the Appendix that

$$\widehat{E\varepsilon^2}^{(\mathrm{adj})} - E\varepsilon^2 = \left(\widehat{E\varepsilon^2} - E\varepsilon^2\right)\left(1 + O(K^{-1})\right) + O_p\!\left(Kn^{-3/2}\right) = \widehat{E\varepsilon^2} - E\varepsilon^2 + O_p\!\left(n^{-1/2}K^{-1}\right) + O_p\!\left(Kn^{-3/2}\right) = \widehat{E\varepsilon^2} - E\varepsilon^2 + O_p\!\left(n^{-5/6}\right),$$

from (58). It follows that $n^{1/2}\big(\widehat{E\varepsilon^2} - E\varepsilon^2\big)$ and $n^{1/2}\big(\widehat{E\varepsilon^2}^{(\mathrm{adj})} - E\varepsilon^2\big)$ have the same asymptotic distribution.
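The small-sample adjustments (64) and (65) are then one-liners on top of the same ingredients (again a sketch of our own, with hypothetical names):

```python
import numpy as np

def adjusted_estimators(y, K):
    """Returns the adjusted volatility estimator (64) and the
    adjusted noise-variance estimator (65)."""
    n = len(y) - 1
    nbar = (n - K + 1) / K
    rv_all = np.sum(np.diff(y) ** 2)
    rv_avg = np.mean([np.sum(np.diff(y[k::K]) ** 2) for k in range(K)])
    tsrv = rv_avg - (nbar / n) * rv_all              # (55)
    tsrv_adj = tsrv / (1.0 - nbar / n)               # (64)
    eps2_adj = 0.5 * (rv_all - rv_avg) / (n - nbar)  # (65)
    return tsrv_adj, eps2_adj
```

The two adjusted quantities differ from (55) and (54) only at order $O_p(K^{-1})$, matching the asymptotic statements above.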

5. EXTENSION TO MULTIPLE PERIOD INFERENCE

For a given family $\mathcal{A} = \{G^{(k)}, k = 1, \dots, K\}$, we denote

$$\widehat{\langle X,X\rangle}_t = [Y,Y]^{(\mathrm{avg})}_t - \frac{\bar{n}}{n}[Y,Y]^{(\mathrm{all})}_t, \tag{66}$$

where, as usual, $[Y,Y]^{(\mathrm{all})}_t = \sum_{t_{i+1}\le t}(\Delta Y_{t_i})^2$ and $[Y,Y]^{(\mathrm{avg})}_t = \frac{1}{K}\sum_{k=1}^{K}[Y,Y]^{(k)}_t$, with

$$[Y,Y]^{(k)}_t = \sum_{\substack{t_i,\, t_{i+} \in G^{(k)} \\ t_{i+} \le t}}\left(Y_{t_{i+}} - Y_{t_i}\right)^2.$$

To estimate $\langle X,X\rangle$ for several discrete time periods, say $[0,T_1], [T_1,T_2], \dots, [T_{M-1},T_M]$, where $M$ is fixed, amounts to estimating $\langle X,X\rangle_{T_m} - \langle X,X\rangle_{T_{m-1}} = \int_{T_{m-1}}^{T_m}\sigma^2_u\,du$ for $m = 1, \dots, M$, and the obvious estimator is $\widehat{\langle X,X\rangle}_{T_m} - \widehat{\langle X,X\rangle}_{T_{m-1}}$.

To carry out the asymptotics, let $n_m$ be the number of points in the $m$th time segment, and similarly let $K_m = c_m n_m^{2/3}$, where $c_m$ is a constant. Then $n_m^{1/6}\big(\widehat{\langle X,X\rangle}_{T_m} - \widehat{\langle X,X\rangle}_{T_{m-1}} - \int_{T_{m-1}}^{T_m}\sigma^2_u\,du\big)$, $m = 1, \dots, M$, converge stably to $\big(8c_m^{-2}(E\varepsilon^2)^2 + c_m\eta^2_m(T_m - T_{m-1})\big)^{1/2}Z_m$, where the $Z_m$ are iid standard normals, independent of the underlying process, and $\eta^2_m$ is the limit $\eta^2$ (Thm. 3) for time period $m$. In the case of equidistant $t_i$ and regular allocation of sample points to grids, $\eta^2_m = \frac{4}{3}\int_{T_{m-1}}^{T_m}\sigma^4_u\,du$.

In other words, the one-period asymptotics generalize straightforwardly to the multiperiod case. This is because $\widehat{\langle X,X\rangle}_{T_m} - \widehat{\langle X,X\rangle}_{T_{m-1}} - \int_{T_{m-1}}^{T_m}\sigma^2_u\,du$ has, to first order, a martingale structure. This can be seen in the Appendix.

An advantage of our proposed estimator is that, if $\varepsilon_{t_i}$ has different variance in different time segments, say $\mathrm{var}(\varepsilon_{t_i}) = (E\varepsilon^2)_m$ for $t_i \in (T_{m-1},T_m]$, then both consistency and asymptotic (mixed) normality continue to hold, provided that one replaces $E\varepsilon^2$ by $(E\varepsilon^2)_m$. This adds a measure of robustness to the procedure. If one were convinced that $E\varepsilon^2$ were the same across time segments, then an alternative estimator would have the form

$$\widehat{\langle X,X\rangle}_t = [Y,Y]^{(\mathrm{avg})}_t - \left(\frac{1}{K}\#\{t_{i+1}\le t\} - 1\right)\frac{1}{n}[Y,Y]^{(\mathrm{all})}_T, \qquad \text{for } t = T_1, \dots, T_M. \tag{67}$$


However, the errors $\widehat{\langle X,X\rangle}_{T_m} - \widehat{\langle X,X\rangle}_{T_{m-1}} - \int_{T_{m-1}}^{T_m}\sigma^2_u\,du$ in this case are not asymptotically independent. Note that, for $T = T_m$, both candidates (66) and (67) for $\widehat{\langle X,X\rangle}_t$ coincide with the quantity in (55).

6. ESTIMATING THE ASYMPTOTIC VARIANCE OF $\widehat{\langle X,X\rangle}_T$

In the one-period case, the main goal is to estimate the asymptotic variance $s^2 = 8c^{-2}(E\varepsilon^2)^2 + c\eta^2 T$ [cf. (58) and (59)]. The multigrid case is a straightforward generalization, as indicated in Section 5.

Here we are concerned only with the case where the points $t_i$ are equally spaced ($\Delta t_i = \Delta t$) and are regularly allocated to the grids $\mathcal{A}_1 = \{G^{(k)}, k = 1, \dots, K_1\}$. A richer set of ingredients is required to find the spread than to just estimate $\widehat{\langle X,X\rangle}_T$. To implement the estimator, we create an additional family $\mathcal{A}_2 = \{G^{(k,i)}, k = 1, \dots, K_1,\ i = 1, \dots, I\}$ of grids, where $G^{(k,i)}$ contains every $I$th point of $G^{(k)}$, starting with the $i$th point. We assume that $K_1 \sim c_1 n^{2/3}$. The new family then consists of $K_2 \sim c_2 n^{2/3}$ grids, where $c_2 = c_1 I$.

In addition, we need to divide the time line into segments $(T_{m-1}, T_m]$, where $T_m = \frac{m}{M}T$. For the purposes of this discussion, $M$ is large but finite. We now get an initial estimator of spread as

$$\hat{s}_0^2 = n^{1/3}\sum_{m=1}^{M}\left(\widehat{\langle X,X\rangle}^{K_1}_{T_m} - \widehat{\langle X,X\rangle}^{K_1}_{T_{m-1}} - \left(\widehat{\langle X,X\rangle}^{K_2}_{T_m} - \widehat{\langle X,X\rangle}^{K_2}_{T_{m-1}}\right)\right)^2,$$

where $\widehat{\langle X,X\rangle}^{K_i}_t$ is the estimator (66) using grid family $i$,

$i = 1, 2$.

Using the discussion in Section 5, we can show that

$$\hat{s}_0^2 \approx s_0^2, \tag{68}$$

where, for $c_1 \neq c_2$ ($I \neq 1$),

$$s_0^2 = 8(E\varepsilon^2)^2\left(c_1^{-2} + c_2^{-2} - c_1^{-1}c_2^{-1}\right) + \left(c_1^{1/2} - c_2^{1/2}\right)^2 T\eta^2 = 8(E\varepsilon^2)^2 c_1^{-2}\left(1 + I^{-2} - I^{-1}\right) + c_1\left(I^{1/2} - 1\right)^2 T\eta^2. \tag{69}$$

In (68), the symbol "$\approx$" denotes first convergence in law as $n \to \infty$, and then a limit in probability as $M \to \infty$. Because $E\varepsilon^2$ can be estimated by $\widehat{E\varepsilon^2} = [Y,Y]^{(\mathrm{all})}/2n$, we can put hats on $s_0^2$, $(E\varepsilon^2)^2$, and $\eta^2$ in (69) to obtain an estimator of $\eta^2$. Similarly,

$$s^2 = 8(E\varepsilon^2)^2\left(c^{-2} - \frac{c\left(c_1^{-2} + c_2^{-2} - c_1^{-1}c_2^{-1}\right)}{\left(c_1^{1/2} - c_2^{1/2}\right)^2}\right) + \frac{c}{\left(c_1^{1/2} - c_2^{1/2}\right)^2}\,s_0^2 = 8\left(c^{-2} - cc_1^{-3}\,\frac{I^{-2} - I^{-1} + 1}{\left(I^{1/2} - 1\right)^2}\right)(E\varepsilon^2)^2 + \frac{c}{c_1}\,\frac{1}{\left(I^{1/2} - 1\right)^2}\,s_0^2, \tag{70}$$

where $c \sim Kn^{-2/3}$, with $K$ the number of grids used originally to estimate $\widehat{\langle X,X\rangle}_T$.

Table 1. Coefficients of $(E\varepsilon^2)^2$ and $\hat{s}_0^2$ When $c_1 = c$

| $I$ | coeff($\hat{s}_0^2$) | coeff($(E\varepsilon^2)^2$) |
|---|---|---|
| 3 | 1.866 | $-3.611c^{-2}$ |
| 4 | 1.000 | $1.5000c^{-2}$ |

Normally, one would take $c_1 = c$. Hence an estimator $\hat{s}^2$ can be found from $\hat{s}_0^2$ and $\widehat{E\varepsilon^2}$. When $c_1 = c$, we argue that the optimal choice is $I = 3$ or $4$, as follows. The coefficients in (70) become

$$\mathrm{coeff}(\hat{s}_0^2) = \left(I^{1/2} - 1\right)^{-2}$$

and

$$\mathrm{coeff}\left((E\varepsilon^2)^2\right) = 8c^{-2}\left(I^{1/2} - 1\right)^{-2} f(I),$$

where $f(I) = I - 2I^{1/2} - I^{-2} + I^{-1}$. For $I \ge 2$, $f(I)$ is increasing, and $f(I)$ crosses 0 for $I$ between 3 and 4. These, therefore, are the two integer values of $I$ that give the lowest ratio of $\mathrm{coeff}((E\varepsilon^2)^2)/\mathrm{coeff}(\hat{s}_0^2)$. Using $I = 3$ or $4$, therefore, would maximally insulate against $(E\varepsilon^2)^2$ dominating over $\hat{s}_0^2$. This is desirable, because $\hat{s}_0^2$ is the estimator carrying the information about $\eta^2$. Numerical values for the coefficients are given in Table 1. If $c$ were such that $(E\varepsilon^2)^2$ still overwhelmed $\hat{s}_0^2$, then a choice of $c_1 \neq c$ should be considered.

7. MONTE CARLO EVIDENCE

In this article we have discussed five approaches to dealing with the impact of market microstructure noise on realized volatility. In this section we examine the performance of each approach in simulations and compare the results with those predicted by the aforementioned asymptotic theory.

7.1 Simulations Design

We use the stochastic volatility model of Heston (1993) as our data-generating process,

$$dX_t = \left(\mu - v_t/2\right)dt + \sigma_t\,dB_t \tag{71}$$

and

$$dv_t = \kappa(\alpha - v_t)\,dt + \gamma v_t^{1/2}\,dW_t, \tag{72}$$

where $v_t = \sigma_t^2$. We assume that the parameters $(\mu, \kappa, \alpha, \gamma)$ and $\rho$, the correlation coefficient between the two Brownian motions $B$ and $W$, are constant in this model. We also assume Feller's condition $2\kappa\alpha \ge \gamma^2$, to make the zero boundary unattainable by the volatility process.

We simulate $M = 25{,}000$ sample paths of the process using the Euler scheme at a time interval $\Delta t = 1$ second, and set the parameter values at levels reasonable for a stock: $\mu = .05$, $\kappa = 5$, $\alpha = .04$, $\gamma = .5$, and $\rho = -.5$. As for the market microstructure noise $\varepsilon$, we assume that it is Gaussian and small. Specifically, we set $(E\varepsilon^2)^{1/2} = .0005$ (i.e., the standard deviation of the noise is .05% of the value of the asset price). We purposefully set this value to be smaller than the values calibrated using the empirical market microstructure literature.
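The data-generating step can be sketched as follows (illustrative code of our own; the article does not specify implementation details, so the nonnegativity guard on $v$ and the seeding are assumptions). One-second steps over a 6.5-hour trading day give $n = 23{,}400$ steps per day, and observed prices are then $Y_{t_i} = X_{t_i} + \varepsilon_{t_i}$ with $\varepsilon \sim N(0, .0005^2)$:

```python
import numpy as np

def simulate_heston_path(n=23_400, dt=1.0 / (252 * 23_400),
                         mu=.05, kappa=5.0, alpha=.04, gamma=.5, rho=-.5,
                         x0=np.log(100.0), v0=.04, seed=0):
    """Euler scheme for (71)-(72): n steps of length dt (annualized units)."""
    rng = np.random.default_rng(seed)
    x = np.empty(n + 1)
    v = np.empty(n + 1)
    x[0], v[0] = x0, v0
    for i in range(n):
        zb, zw = rng.standard_normal(2)
        zw = rho * zb + np.sqrt(1 - rho ** 2) * zw          # corr(B, W) = rho
        x[i + 1] = x[i] + (mu - v[i] / 2) * dt + np.sqrt(v[i] * dt) * zb
        v[i + 1] = max(v[i] + kappa * (alpha - v[i]) * dt
                       + gamma * np.sqrt(v[i] * dt) * zw, 0.0)  # guard Euler overshoot
    return x, v
```

Although Feller's condition holds for these parameter values, the discretized $v$ can still overshoot zero, hence the truncation in the Euler step.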

On each simulated sample path, we estimate $\langle X,X\rangle_T$ over $T = 1$ day (i.e., $T = 1/252$, because the parameter values are all annualized) using the five estimation strategies described earlier: $[Y,Y]^{(\mathrm{all})}_T$, $[Y,Y]^{(\mathrm{sparse})}_T$, $[Y,Y]^{(\mathrm{sparse,opt})}_T$, $[Y,Y]^{(\mathrm{avg})}_T$, and,


finally, $\widehat{\langle X,X\rangle}^{(\mathrm{adj})}_T$. We assume that a day consists of 6.5 hours of open trading, as is the case on the NYSE and NASDAQ. For the fourth-best estimator, we represent this approach using sparse sampling at a frequency of once every 5 minutes.

7.2 Simulations Results

Table 2 reports the simulation results for the five estimation strategies. For each estimator, the bias is calculated as $E[\text{estimator} - \langle X,X\rangle_T]$. The variance row reports the quantity

$$E\left[\mathrm{var}\left(\text{estimator} - \langle X,X\rangle_T \,\middle|\, \int_0^T\sigma_t^2\,dt,\ \int_0^T\sigma_t^4\,dt\right)\right]. \tag{73}$$

The rows marked "relative" report the corresponding results in percentage terms, that is, for

$$\frac{\text{estimator} - \langle X,X\rangle_T}{\langle X,X\rangle_T}.$$

Note that, although the convergence of the discretization error is stable, as opposed to conditional, (73) seems like a reasonable stand-in for the theoretical quantity under consideration.

In each case, "small sample" represents the average value over the $M$ simulated paths, and "asymptotic" refers to the value predicted by our theory. The noise parts of the asymptotic variances are taken to include the relevant second-order terms, as given by (20), (37), and (60), yielding spreads like (in the latter case) (61). The small-sample variance is computed by binning the results of the simulations according to the values of the pair $(\int_0^T\sigma_t^2\,dt, \int_0^T\sigma_t^4\,dt)$, calculating the variance within each bin, and then averaging across the bins, weighted by the number of elements in each bin.

Comparing the rows showing small-sample results with their asymptotic predictions, we see that the asymptotic theory provides a good approximation to the first two moments of the small-sample distribution of the five estimators. Comparing across columns, we see that the differences between the five estimators are quite large. As expected, the naive fifth-best strategy produces disastrous results. The first-best strategy results in a large decrease of root MSE (RMSE) relative to the others, including, in particular, the fourth-best strategy, which is the approach currently used in most of the empirical literature. But, at least for the parameter values used here, determining the optimal sampling frequency (the sparse optimal, or third-best, strategy) results in only a modest improvement over the arbitrary selection of a sampling frequency (the sparse, or fourth-best, strategy). Using all of the data, as in the subsampled and averaged estimator (second-best), but especially the bias correction by combining two time scales, as in our first-best approach, is where the major practical improvements reside.

It may seem surprising that the variance of the estimator is reduced so much from the second-best to the first-best estimator. After all, if one sets $c = Kn^{-2/3}$, then the asymptotic normal term in (7) becomes

$$\frac{1}{n^{1/6}}\Bigg[\underbrace{\frac{4}{c^2}E\varepsilon^4}_{\text{due to noise}} + \underbrace{c\,\frac{4T}{3}\int_0^T\sigma_t^4\,dt}_{\text{due to discretization}}\Bigg]^{1/2}Z_{\mathrm{total}},$$

where the bracketed sum is the total variance. This is quite similar to the error term in (9). However, the $c$ for the second-best estimator is chosen by (8), whereas for the first-best case it is given by (10). The two optima are very different, because for the second-best estimator $c$ is chosen by a bias–variance trade-off, whereas for the first-best estimator it is chosen to minimize the asymptotic variance. From this standpoint, it is quite reasonable for the first-best estimator to also have a substantially better asymptotic variance. In fact, from (8) and (10), it is easy to see that, for the optimal choices,

$$c^*_{\text{second best}} = n^{1/3}2^{-1/3}\left(\frac{E\varepsilon^4}{(E\varepsilon^2)^2}\right)^{1/3} c^*_{\text{first best}},$$

whence the total variance is in fact quite different for the two estimators.

Figure 1 shows the small-sample and asymptotic RMSE of the $[Y,Y]^{(\mathrm{sparse})}_T$ estimator as a function of the subsampling interval, illustrating the minimization problem studied in Section 2.3. Finally, Figure 2 shows the standardized distribution of the first-best estimator obtained from the simulations (histogram) and the corresponding asymptotic distribution predicted by our theory (solid line). Due to the effect of bias, the distributions of the fifth- to second-best estimators are irrelevant, because they cannot be used to set intervals for the volatility.

8. DISCUSSION

In this work we have quantified and corrected the effect of noise on the nonparametric assessment of integrated volatility. In the setting of high-frequency data, the usual financial practice is to use sparse sampling, in other words, throwing

Table 2. Monte Carlo Simulations for the Five Estimation Strategies

| Quantity | Fifth-best $[Y,Y]^{(\mathrm{all})}_T$ | Fourth-best $[Y,Y]^{(\mathrm{sparse})}_T$ | Third-best $[Y,Y]^{(\mathrm{sparse,opt})}_T$ | Second-best $[Y,Y]^{(\mathrm{avg})}_T$ | First-best $\widehat{\langle X,X\rangle}^{(\mathrm{adj})}_T$ |
|---|---|---|---|---|---|
| Small-sample bias | $1.1699\times10^{-2}$ | $3.89\times10^{-5}$ | $2.18\times10^{-5}$ | $1.926\times10^{-5}$ | $2\times10^{-8}$ |
| Asymptotic bias | $1.1700\times10^{-2}$ | $3.90\times10^{-5}$ | $2.20\times10^{-5}$ | $1.927\times10^{-5}$ | $0$ |
| Small-sample variance | $1.791\times10^{-8}$ | $1.4414\times10^{-9}$ | $1.59\times10^{-9}$ | $9.41\times10^{-10}$ | $9\times10^{-11}$ |
| Asymptotic variance | $1.788\times10^{-8}$ | $1.4409\times10^{-9}$ | $1.58\times10^{-9}$ | $9.37\times10^{-10}$ | $8\times10^{-11}$ |
| Small-sample RMSE | $1.1699\times10^{-2}$ | $5.437\times10^{-5}$ | $4.543\times10^{-5}$ | $3.622\times10^{-5}$ | $9.4\times10^{-6}$ |
| Asymptotic RMSE | $1.1700\times10^{-2}$ | $5.442\times10^{-5}$ | $4.546\times10^{-5}$ | $3.618\times10^{-5}$ | $8.9\times10^{-6}$ |
| Small-sample relative bias | 182 | 6.1 | 1.8 | 1.5 | $-.00045$ |
| Small-sample relative variance | 82,502 | 115 | 11 | 5.3 | .43 |
| Small-sample relative RMSE | 340 | 12.4 | 3.7 | 2.8 | .65 |


Figure 1. RMSE of the Sparse Estimator as a Function of the Subsampling Frequency (small-sample RMSE; asymptotic RMSE).

away most of the available data. We have argued that this is caused by not incorporating the noise in the model. Although it is statistically unsound to throw away data, we have shown that it is possible to build on this practice to construct estimators that make statistical sense.

Specifically, we have found that the usual realized volatility mainly estimates the magnitude of the noise term, rather than anything to do with volatility. An approach built on separating the observations into multiple "grids" lessens this problem. We found that the best results can be obtained by combining the usual ("single-grid") realized volatility with the multiple-grid-based device. This gives an estimator that is approximately unbiased; we have also shown how to assess the (random) variance of this estimator. We also show that, for our most recommended procedure, the optimal number of multiple grids is of order $O(n^{2/3})$ (see Sec. 4). Most of the development is in the context of finding the integrated volatility over one time period; at the end, we extend this to multiple periods. Also, in the case where the noise can be taken to be almost negligible, we provide a way of optimizing the sampling frequency, if one wishes to use the classical "realized volatility" or its multigrid extension.

One important message of the article is that, any time one has an impulse to sample sparsely, one can always do better with a multigrid method, regardless of the model or the quantity being estimated.

Figure 2 Asymptotic and Small-Sample Standardized Distributions of the First-Best Estimator


Our results in this article cover the case where the latent $X$ follows the Itô process model (1). Theorem 1 remains true under the assumption that $X$ is a general semimartingale, that is, a process $X$ that can also have jumps. This follows because the proof of Lemma A.2 remains true for a general semimartingale $X$ (Jacod and Shiryaev 2003, thm. I.4.47, p. 52). Also, apart from using Lemma A.2, Theorem A.1 is unrelated to the nature of the $X$ process. As far as the effect of jumps on the discretization effect is concerned, it should be noted that consistency holds, as follows. When $X$ is a general semimartingale, then, for any sequence of grids $\mathcal{H}_n$ so that (41) holds, $[X,X]^{\mathcal{H}_n}$ converges to the continuous-time quadratic variation of $X$, that is,

$$\int_0^T\sigma_t^2\,dt + \sum_{0\le t\le T}\left(\text{jump of } X \text{ at } t\right)^2 \tag{74}$$

(cf. Jacod and Shiryaev 2003, thm. I.4.47, p. 52). In particular, this also applies to subgrids, under the regularity conditions that we have discussed. Furthermore, by extension of the same proof, $[X,X]^{(\mathrm{avg})}$ also converges to (74). From the same argument, so does $\widehat{\langle X,X\rangle}_T$. In this sense, the two-scales estimator is robust to jumps.

APPENDIX: PROOFS OF RESULTS

When the total grid $G$ is considered, we use $\sum_{i=0}^{n-1}$, $\sum_{t_{i+1}\le T}$, and $\sum_{t_i\in G}$ interchangeably in the following proofs. Also, we write "$|X$" to indicate expressions that are conditional on the entire $X$ process.

A.1 Variance of $[Y,Y]_T$ Given the $X$ Process

Here we calculate explicitly the variance in (19), from which the stated approximation follows. The explicit remainder term is also used for (36). Let a partition of $[0,T]$ be $0 = t_0 \le t_1 \le \cdots \le t_n = T$. Under assumption (11),

$$\begin{aligned}
\mathrm{var}\left([Y,Y]^{(\mathrm{all})}_T \,\middle|\, X\right) &= \mathrm{var}\left[\sum_{t_{i+1}\le T}\left(\Delta Y_{t_i}\right)^2 \,\middle|\, X\right] \\
&= \underbrace{\sum_{t_{i+1}\le T}\mathrm{var}\left[\left(\Delta Y_{t_i}\right)^2 \,\middle|\, X\right]}_{I_T} + \underbrace{2\sum_{t_{i+1}\le T}\mathrm{cov}\left[\left(\Delta Y_{t_{i-1}}\right)^2, \left(\Delta Y_{t_i}\right)^2 \,\middle|\, X\right]}_{II_T},
\end{aligned}$$

because $\Delta Y_{t_i} = \Delta X_{t_i} + \Delta\varepsilon_{t_i}$ is 1-dependent given the $X$ process.

$$\begin{aligned}
\mathrm{var}\left[\left(\Delta Y_{t_i}\right)^2 \,\middle|\, X\right] &= \kappa_4\left(\Delta Y_{t_i} \,\middle|\, X\right) + 2\left[\mathrm{var}\left(\Delta Y_{t_i} \,\middle|\, X\right)\right]^2 \\
&\quad + 4\left[E\left(\Delta Y_{t_i} \,\middle|\, X\right)\right]^2\mathrm{var}\left(\Delta Y_{t_i} \,\middle|\, X\right) + 4E\left(\Delta Y_{t_i} \,\middle|\, X\right)\kappa_3\left(\Delta Y_{t_i} \,\middle|\, X\right) \\
&= \kappa_4\left(\Delta\varepsilon_{t_i}\right) + 2\left[\mathrm{var}\left(\Delta\varepsilon_{t_i}\right)\right]^2 + 4\left(\Delta X_{t_i}\right)^2\mathrm{var}\left(\Delta\varepsilon_{t_i}\right) + 4\left(\Delta X_{t_i}\right)\kappa_3\left(\Delta\varepsilon_{t_i}\right) \quad \text{(under assumption (11))} \\
&= 2\kappa_4(\varepsilon) + 8(E\varepsilon^2)^2 + 8\left(\Delta X_{t_i}\right)^2 E\varepsilon^2,
\end{aligned}$$

because $\kappa_3(\Delta\varepsilon_{t_i}) = 0$. The $\kappa$'s are the cumulants of the relevant order:

$$\kappa_1(\varepsilon) = 0, \qquad \kappa_2(\varepsilon) = \mathrm{var}(\varepsilon) = E(\varepsilon^2), \qquad \kappa_3(\varepsilon) = E\varepsilon^3, \qquad \kappa_4(\varepsilon) = E(\varepsilon^4) - 3(E\varepsilon^2)^2. \tag{A.1}$$

So $I_T = n\left(2\kappa_4(\varepsilon) + 8(E\varepsilon^2)^2\right) + 8[X,X]^{(\mathrm{all})}_T E\varepsilon^2$. Similarly, for the covariance,

$$\begin{aligned}
\mathrm{cov}\left[\left(\Delta Y_{t_{i-1}}\right)^2, \left(\Delta Y_{t_i}\right)^2 \,\middle|\, X\right] &= \mathrm{cov}\left[\left(\Delta\varepsilon_{t_{i-1}}\right)^2, \left(\Delta\varepsilon_{t_i}\right)^2\right] + 4\left(\Delta X_{t_{i-1}}\right)\left(\Delta X_{t_i}\right)\mathrm{cov}\left(\Delta\varepsilon_{t_{i-1}}, \Delta\varepsilon_{t_i}\right) \\
&\quad + 2\left(\Delta X_{t_{i-1}}\right)\mathrm{cov}\left[\Delta\varepsilon_{t_{i-1}}, \left(\Delta\varepsilon_{t_i}\right)^2\right] + 2\left(\Delta X_{t_i}\right)\mathrm{cov}\left[\left(\Delta\varepsilon_{t_{i-1}}\right)^2, \Delta\varepsilon_{t_i}\right] \\
&= \kappa_4(\varepsilon) + 2(E\varepsilon^2)^2 - 4\left(\Delta X_{t_{i-1}}\right)\left(\Delta X_{t_i}\right)\kappa_2(\varepsilon) - 2\left(\Delta X_{t_i}\right)\kappa_3(\varepsilon) + 2\left(\Delta X_{t_{i-1}}\right)\kappa_3(\varepsilon), \tag{A.2}
\end{aligned}$$

because of (A.1). Thus, summing the terms in (A.2),

$$II_T = 2(n-1)\left(\kappa_4(\varepsilon) + 2(E\varepsilon^2)^2\right) - 8E\varepsilon^2\sum_{t_{i+1}\le T}\left(\Delta X_{t_{i-1}}\right)\left(\Delta X_{t_i}\right) - 4\kappa_3(\varepsilon)\left(\Delta X_{t_{n-1}} - \Delta X_{t_0}\right).$$

Amalgamating the two expressions, one obtains

$$\begin{aligned}
\mathrm{var}\left([Y,Y]^{(\mathrm{all})}_T \,\middle|\, X\right) &= n\left(2\kappa_4(\varepsilon) + 8(E\varepsilon^2)^2\right) + 8[X,X]^{(\mathrm{all})}_T E\varepsilon^2 + 2(n-1)\left(\kappa_4(\varepsilon) + 2(E\varepsilon^2)^2\right) \\
&\quad - 8E\varepsilon^2\sum\left(\Delta X_{t_{i-1}}\right)\left(\Delta X_{t_i}\right) - 4\kappa_3(\varepsilon)\left(\Delta X_{t_{n-1}} - \Delta X_{t_0}\right) \\
&= 4nE\varepsilon^4 + R_n, \tag{A.3}
\end{aligned}$$

where the remainder term $R_n$ satisfies

$$\begin{aligned}
|R_n| &\le 8E\varepsilon^2[X,X]_T + 2\left(\kappa_4(\varepsilon) + 2(E\varepsilon^2)^2\right) + 8E\varepsilon^2\sum\left|\Delta X_{t_{i-1}}\right|\left|\Delta X_{t_i}\right| + 4|\kappa_3(\varepsilon)|\left(\left|\Delta X_{t_{n-1}}\right| + \left|\Delta X_{t_0}\right|\right) \\
&\le 16E\varepsilon^2[X,X]^{(\mathrm{all})}_T + 2\left(\kappa_4(\varepsilon) + 2(E\varepsilon^2)^2\right) + 2|\kappa_3(\varepsilon)|\left(2 + [X,X]_T\right), \tag{A.4}
\end{aligned}$$

by the Cauchy–Schwarz inequality and because $|x| \le (1 + x^2)/2$. Because $[X,X]^{(\mathrm{all})}_T = O_p(1)$, (19) follows.

Under slightly stronger conditions (e.g., $|\mu_t|$ and $\sigma_t$ bounded above by a constant), $\sum(\Delta X_{t_{i-1}})(\Delta X_{t_i})$ is a near-martingale and of order $O_p(n^{-1/2})$, and similarly $\Delta X_{t_{n-1}} - \Delta X_{t_0} = O_p(n^{-1/2})$, from which (20) follows.

Similarly, to obtain a higher-order approximation to the variance of the estimator $\widehat{\langle X,X\rangle}_T$, note that

$$\widehat{\langle X,X\rangle}_T - \langle X,X\rangle_T = \left([Y,Y]^{(\mathrm{avg})}_T - \frac{1}{K}[Y,Y]^{(\mathrm{all})}_T - [X,X]^{(\mathrm{avg})}_T\right) + \left([X,X]^{(\mathrm{avg})}_T - \langle X,X\rangle_T\right). \tag{A.5}$$

Now, recalling (20) and (37), and in addition

$$\mathrm{cov}\left([Y,Y]^{(\mathrm{all})}, [Y,Y]^{(\mathrm{avg})} \,\middle|\, X\right) = 2\left(E\varepsilon^4 - (E\varepsilon^2)^2\right)\left(2\bar{n} - \frac{1}{K}\right) + 8E\varepsilon^2[X,X]^{(\mathrm{all})}_T\frac{1}{K} + o_p\!\left(\frac{1}{K}\right),$$

we see that

$$\begin{aligned}
\mathrm{var}\left(\widehat{\langle X,X\rangle}_T \,\middle|\, X\right) &= \mathrm{var}\left([Y,Y]^{(\mathrm{avg})}_T \,\middle|\, X\right) + \frac{1}{K^2}\mathrm{var}\left([Y,Y]^{(\mathrm{all})}_T \,\middle|\, X\right) - \frac{2}{K}\mathrm{cov}\left([Y,Y]^{(\mathrm{avg})}_T, [Y,Y]^{(\mathrm{all})}_T \,\middle|\, X\right) \\
&= 4\frac{\bar{n}}{K}E\varepsilon^4 + \frac{1}{K}\left[8[X,X]^{(\mathrm{avg})}_T E\varepsilon^2 - 2\left(E\varepsilon^4 - (E\varepsilon^2)^2\right)\right] + o_p\!\left(\frac{1}{K}\right) \\
&\quad + \frac{1}{K^2}\left[4\bar{n}KE\varepsilon^4 + \left(8[X,X]^{(\mathrm{all})}_T E\varepsilon^2 - 2\left(E\varepsilon^4 - (E\varepsilon^2)^2\right)\right) + o_p(1)\right] \\
&\quad - \frac{2}{K}\left[2\left(E\varepsilon^4 - (E\varepsilon^2)^2\right)\left(2\bar{n} - \frac{1}{K}\right) + 8E\varepsilon^2[X,X]^{(\mathrm{all})}_T\frac{1}{K} + o_p\!\left(\frac{1}{K}\right)\right] \\
&= \frac{\bar{n}}{K}8(E\varepsilon^2)^2 + \frac{1}{K}\left[8[X,X]^{(\mathrm{avg})}_T E\varepsilon^2 - 2\left(E\varepsilon^4 - (E\varepsilon^2)^2\right)\right] + o_p\!\left(\frac{1}{K}\right),
\end{aligned}$$

because, at the optimal choice, $K = cn^{2/3}$ and $\bar{n} = c^{-1}n^{1/3}$. Therefore, (60) holds.

A.2 The Relevant Central Limit Theorem

Lemma A.2. Suppose that $X$ is an Itô process. Suppose that $Y$ is related to $X$ through model (4). Then, under assumption (11) and definitions (16) and (32),
\[
[Y,Y]^{(\mathrm{all})}_T = [\epsilon,\epsilon]^{(\mathrm{all})}_T + O_p(1)
\quad\text{and}\quad
[Y,Y]^{(\mathrm{avg})}_T = [\epsilon,\epsilon]^{(\mathrm{avg})}_T + [X,X]^{(\mathrm{avg})}_T + O_p\Bigl(\frac{1}{\sqrt K}\Bigr).
\]

Proof. (a) The one-grid case:
\[
[Y,Y]^{(\mathrm{all})}_T = [X,X]^{(\mathrm{all})}_T + [\epsilon,\epsilon]^{(\mathrm{all})}_T + 2[X,\epsilon]^{(\mathrm{all})}_T. \tag{A6}
\]
We show that
\[
E\bigl(\bigl([X,\epsilon]^{(\mathrm{all})}_T\bigr)^2 \,\big|\, X\bigr) = O_p(1), \tag{A7}
\]
and in particular,
\[
[X,\epsilon]^{(\mathrm{all})}_T = O_p(1). \tag{A8}
\]
To see (A7),
\[
\begin{aligned}
[X,\epsilon]^{(\mathrm{all})}_T
&= \sum_{i=0}^{n-1}(\Delta X_{t_i})(\Delta\epsilon_{t_i})
= \sum_{i=0}^{n-1}(\Delta X_{t_i})\epsilon_{t_{i+1}} - \sum_{i=0}^{n-1}(\Delta X_{t_i})\epsilon_{t_i} \\
&= \sum_{i=1}^{n-1}\bigl(\Delta X_{t_{i-1}} - \Delta X_{t_i}\bigr)\epsilon_{t_i}
+ \Delta X_{t_{n-1}}\epsilon_{t_n} - \Delta X_{t_0}\epsilon_{t_0}. \tag{A9}
\end{aligned}
\]
Because $E\bigl([X,\epsilon]^{(\mathrm{all})}_T \mid X\bigr) = 0$ and the $\epsilon_{t_i}$ are iid for different $t_i$, we get
\[
\begin{aligned}
E\bigl(\bigl([X,\epsilon]^{(\mathrm{all})}_T\bigr)^2 \,\big|\, X\bigr)
&= \operatorname{var}\bigl([X,\epsilon]^{(\mathrm{all})}_T \,\big|\, X\bigr) \\
&= E\epsilon^2 \Bigl[ \sum_{i=1}^{n-1}\bigl(\Delta X_{t_{i-1}} - \Delta X_{t_i}\bigr)^2
+ \bigl(\Delta X_{t_{n-1}}\bigr)^2 + \bigl(\Delta X_{t_0}\bigr)^2 \Bigr] \\
&= 2[X,X]^{(\mathrm{all})}_T E\epsilon^2 - 2E\epsilon^2 \sum_{i=1}^{n-1}(\Delta X_{t_{i-1}})(\Delta X_{t_i})
\le 4[X,X]^{(\mathrm{all})}_T E\epsilon^2, \tag{A10}
\end{aligned}
\]
by the Cauchy–Schwarz inequality, from which, and from $[X,X]^{(\mathrm{all})}_T$ being of order $O_p(1)$, (A7) follows. Hence (A8) follows by the Markov inequality.

(b) The multiple-grid case. Notice that
\[
[Y,Y]^{(\mathrm{avg})}_T = [X,X]^{(\mathrm{avg})}_T + [\epsilon,\epsilon]^{(\mathrm{avg})}_T + 2[X,\epsilon]^{(\mathrm{avg})}_T. \tag{A11}
\]
(A11) strictly follows from model (4) and the definitions of grids and $[\cdot,\cdot]^{(\mathrm{avg})}_t$; see Section 3.2.

We need to show that
\[
E\bigl(\bigl([X,\epsilon]^{(\mathrm{avg})}_T\bigr)^2 \,\big|\, X\bigr) = O_p\Bigl(\frac{1}{K}\Bigr); \tag{A12}
\]
in particular,
\[
[X,\epsilon]^{(\mathrm{avg})}_T = O_p\Bigl(\frac{1}{K^{1/2}}\Bigr), \tag{A13}
\]
and $\operatorname{var}\bigl([X,\epsilon]^{(\mathrm{avg})}_T \mid X\bigr) = E\bigl[([X,\epsilon]^{(\mathrm{avg})}_T)^2 \mid X\bigr]$.

To show (A12), note that $E\bigl([X,\epsilon]^{(\mathrm{avg})}_T \mid X\bigr) = 0$ and
\[
E\bigl[\bigl([X,\epsilon]^{(\mathrm{avg})}_T\bigr)^2 \,\big|\, X\bigr]
= \operatorname{var}\bigl([X,\epsilon]^{(\mathrm{avg})}_T \,\big|\, X\bigr)
= \frac{1}{K^2}\sum_{k=1}^K \operatorname{var}\bigl([X,\epsilon]^{(k)}_T \,\big|\, X\bigr)
\le \frac{4E\epsilon^2}{K}[X,X]^{(\mathrm{avg})}_T = O_p\Bigl(\frac{1}{K}\Bigr),
\]
where the second equality follows from the disjointness of different grids, as well as the independence of $\epsilon$ and $X$. The inequality follows from the same argument as in (A10). Then the order follows because $[X,X]^{(\mathrm{avg})}_T = O_p(1)$; see the method of Mykland and Zhang (2002) for a rigorous development for the order of $[X,X]^{(\mathrm{avg})}_T$.

Theorem A.1. Suppose that $X$ is an Itô process of form (1). Suppose that $Y$ is related to $X$ through model (4) and that (11) is satisfied with $E\epsilon^4 < \infty$. Also suppose that $t_i$ and $t_{i+1}$ are not in the same subgrid for any $i$. Under assumption (33), as $n \to \infty$,
\[
\Bigl( \sqrt{n}\,\bigl(\widehat{E\epsilon^2} - E\epsilon^2\bigr),\;
\sqrt{\tfrac{K}{\bar n}}\,\bigl([Y,Y]^{(\mathrm{avg})}_T - [X,X]^{(\mathrm{avg})}_T - 2\bar n E\epsilon^2\bigr) \Bigr)
\]
converges in law to a bivariate normal, with mean 0 and covariance matrix
\[
\begin{pmatrix} E\epsilon^4 & 2\operatorname{var}(\epsilon^2) \\ 2\operatorname{var}(\epsilon^2) & 4E\epsilon^4 \end{pmatrix}, \tag{A14}
\]
conditional on the $X$ process, where the limiting random variable is independent of the $X$ process.

Proof. By Lemma A.2, we need the distribution of $[\epsilon,\epsilon]^{(\mathrm{avg})}$ and $[\epsilon,\epsilon]^{(\mathrm{all})}$. First, we explore the convergence of
\[
\frac{1}{\sqrt n}\Bigl( [\epsilon,\epsilon]^{(\mathrm{all})}_T - 2nE\epsilon^2,\;
[\epsilon,\epsilon]^{(\mathrm{avg})}_T K - 2\bar n K E\epsilon^2 \Bigr). \tag{A15}
\]
Recall that all of the sampling points $t_0, t_1, \ldots, t_n$ are within $[0,T]$. We use $G$ to denote the time points in the full sampling, as in the single grid; $G^{(k)}$ denotes the subsampling from the $k$th grid. As before, if $t_i \in G^{(k)}$, then $t_{i,-}$ and $t_{i,+}$ are the previous and next elements in $G^{(k)}$; $\epsilon_{t_{i,-}} = 0$ for $t_i = \min G^{(k)}$, and $\epsilon_{t_{i,+}} = 0$ for $t_i = \max G^{(k)}$.

Set
\[
\begin{aligned}
M^{(1)}_T &= \frac{1}{\sqrt n}\sum_{t_i \in G}\bigl(\epsilon^2_{t_i} - E\epsilon^2\bigr), \\
M^{(2)}_T &= \frac{1}{\sqrt n}\sum_{t_i \in G}\epsilon_{t_i}\epsilon_{t_{i-1}}, \tag{A16}\\
M^{(3)}_T &= \frac{1}{\sqrt n}\sum_{k=1}^K \sum_{t_i \in G^{(k)}}\epsilon_{t_i}\epsilon_{t_{i,-}}.
\end{aligned}
\]

Zhang, Mykland, and Aït-Sahalia: Determining Integrated Volatility 1409

We first find the asymptotic distribution of $(M^{(1)}_T, M^{(2)}_T, M^{(3)}_T)$ using the martingale central limit theorem. Then we use the result to find the limit of (A15).

Note that $(M^{(1)}_T, M^{(2)}_T, M^{(3)}_T)$ are the end points of martingales with respect to the filtration $\mathcal F_i = \sigma(\epsilon_{t_j}, j \le i;\, X_t, \text{all } t)$. We now derive their (discrete-time) predictable quadratic variations $\langle M^{(l)}, M^{(k)}\rangle$, $l,k = 1,2,3$ [discrete-time predictable quadratic variations are only used in this proof and the proof of Lemma 1, and are different from the continuous-time quadratic variations in (12)]:
\[
\begin{aligned}
\bigl\langle M^{(1)}, M^{(1)}\bigr\rangle_T &= \frac{1}{n}\sum_{t_i\in G}\operatorname{var}\bigl(\epsilon^2_{t_i} - E\epsilon^2 \,\big|\, \mathcal F_{i-1}\bigr) = \operatorname{var}(\epsilon^2), \\
\bigl\langle M^{(2)}, M^{(2)}\bigr\rangle_T &= \frac{1}{n}\sum_{t_i\in G}\operatorname{var}\bigl(\epsilon_{t_i}\epsilon_{t_{i-1}} \,\big|\, \mathcal F_{i-1}\bigr)
= \frac{E\epsilon^2}{n}\sum_{t_i\in G}\epsilon^2_{t_{i-1}} = (E\epsilon^2)^2 + o_p(1), \tag{A17}\\
\bigl\langle M^{(3)}, M^{(3)}\bigr\rangle_T &= \frac{1}{n}\sum_{k=1}^K\sum_{t_i\in G^{(k)}}\operatorname{var}\bigl(\epsilon_{t_i}\epsilon_{t_{i,-}} \,\big|\, \mathcal F_{i-1}\bigr)
= \frac{E\epsilon^2}{n}\sum_{k=1}^K\sum_{t_i\in G^{(k)}}\epsilon^2_{t_{i,-}} = (E\epsilon^2)^2 + o_p(1),
\end{aligned}
\]
by the law of large numbers.

Similarly, for the predictable quadratic covariations,
\[
\begin{aligned}
\bigl\langle M^{(1)}, M^{(2)}\bigr\rangle_T &= \frac{1}{n}\sum_{t_i\in G}\operatorname{cov}\bigl(\epsilon^2_{t_i} - E\epsilon^2,\; \epsilon_{t_i}\epsilon_{t_{i-1}} \,\big|\, \mathcal F_{i-1}\bigr)
= E\epsilon^3\,\frac{1}{n}\sum_{t_i\in G}\epsilon_{t_{i-1}} = o_p(1), \\
\bigl\langle M^{(1)}, M^{(3)}\bigr\rangle_T &= \frac{1}{n}\sum_{k=1}^K\sum_{t_i\in G^{(k)}}\operatorname{cov}\bigl(\epsilon^2_{t_i} - E\epsilon^2,\; \epsilon_{t_i}\epsilon_{t_{i,-}} \,\big|\, \mathcal F_{i-1}\bigr)
= E\epsilon^3\,\frac{1}{n}\sum_{k=1}^K\sum_{t_i\in G^{(k)}}\epsilon_{t_{i,-}} = o_p(1), \tag{A18}\\
\bigl\langle M^{(2)}, M^{(3)}\bigr\rangle_T &= \frac{1}{n}\sum_{k=1}^K\sum_{t_i\in G^{(k)}}\operatorname{cov}\bigl(\epsilon_{t_i}\epsilon_{t_{i-1}},\; \epsilon_{t_i}\epsilon_{t_{i,-}} \,\big|\, \mathcal F_{i-1}\bigr)
= \frac{E\epsilon^2}{n}\sum_{k=1}^K\sum_{t_i\in G^{(k)}}\epsilon_{t_{i-1}}\epsilon_{t_{i,-}} = o_p(1),
\end{aligned}
\]
because $t_{i+1}$ is not in the same grid as $t_i$.

Because the $\epsilon_{t_i}$'s are iid and $E\epsilon^4_{t_i} < \infty$, the conditional Lindeberg conditions are satisfied. Hence, by the martingale central limit theorem (see Hall and Heyde 1980, condition 3.1, p. 58), $(M^{(1)}, M^{(2)}, M^{(3)})$ are asymptotically normal, with covariance matrix given by the asymptotic values of $\langle M^{(l)}, M^{(k)}\rangle$. In other words, asymptotically, $(M^{(1)}, M^{(2)}, M^{(3)})$ are independent normal with respective variances $\operatorname{var}(\epsilon^2)$, $(E\epsilon^2)^2$, and $(E\epsilon^2)^2$.

Returning to (A15), we have that
\[
\begin{aligned}
[\epsilon,\epsilon]^{(\mathrm{all})} - 2nE\epsilon^2
&= 2\sum_{i \ne 0,n}\bigl(\epsilon^2_{t_i} - E\epsilon^2\bigr) + \bigl(\epsilon^2_{t_0} - E\epsilon^2\bigr) + \bigl(\epsilon^2_{t_n} - E\epsilon^2\bigr)
- 2\sum_{t_i > 0}\epsilon_{t_i}\epsilon_{t_{i-1}} \\
&= 2\sqrt n\,\bigl(M^{(1)} - M^{(2)}\bigr) + O_p(1). \tag{A19}
\end{aligned}
\]
Meanwhile,
\[
\begin{aligned}
[\epsilon,\epsilon]^{(k)} - 2n_k E\epsilon^2
&= \sum_{t_i\in G^{(k)},\, t_i \ne \max G^{(k)}}\bigl(\epsilon_{t_{i,+}} - \epsilon_{t_i}\bigr)^2 - 2n_k E\epsilon^2 \\
&= 2\sum_{t_i\in G^{(k)}}\bigl(\epsilon^2_{t_i} - E\epsilon^2\bigr)
- \bigl(\epsilon^2_{\min G^{(k)}} - E\epsilon^2\bigr) - \bigl(\epsilon^2_{\max G^{(k)}} - E\epsilon^2\bigr)
- 2\sum_{t_i\in G^{(k)}}\epsilon_{t_i}\epsilon_{t_{i,-}}, \tag{A20}
\end{aligned}
\]
where $n_k + 1$ represents the total number of sampling points in $G^{(k)}$. Hence
\[
[\epsilon,\epsilon]^{(\mathrm{avg})}_T K - 2\bar n K E\epsilon^2
= \sqrt n\,\bigl(2M^{(1)} - 2M^{(3)}\bigr) - R
= 2\sqrt n\,\bigl(M^{(1)} - M^{(3)}\bigr) + O_p\bigl(K^{1/2}\bigr), \tag{A21}
\]
because $R = \sum_{k=1}^K\bigl[(\epsilon^2_{\min G^{(k)}} - E\epsilon^2) + (\epsilon^2_{\max G^{(k)}} - E\epsilon^2)\bigr]$ satisfies $ER^2 = \operatorname{var}(R) \le 4K\operatorname{var}(\epsilon^2)$.

Because $n^{-1}K \to 0$, and because the error terms in (A19) and (A20) are uniformly integrable, it follows that
\[
\text{(A15)} = 2\bigl(M^{(1)} - M^{(2)},\; M^{(1)} - M^{(3)}\bigr) + o_p(1). \tag{A22}
\]
Hence (A15) is also asymptotically normal, with covariance matrix
\[
\begin{pmatrix} 4E\epsilon^4 & 4\operatorname{var}(\epsilon^2) \\ 4\operatorname{var}(\epsilon^2) & 4E\epsilon^4 \end{pmatrix}.
\]

By Lemma A.2, and as $n^{-1}K \to 0$,
\[
\frac{1}{\sqrt n}\Bigl( [Y,Y]^{(\mathrm{all})}_T - 2nE\epsilon^2,\;
\bigl([Y,Y]^{(\mathrm{avg})}_T - [X,X]^{(\mathrm{avg})}_T - 2\bar n E\epsilon^2\bigr)K \Bigr)\Big|\,X
\]
is asymptotically normal:
\[
\begin{aligned}
\frac{1}{\sqrt n}
\begin{pmatrix} [Y,Y]^{(\mathrm{all})}_T - 2nE\epsilon^2 \\[4pt]
[Y,Y]^{(\mathrm{avg})}_T K - 2\bar n K E\epsilon^2 - [X,X]^{(\mathrm{avg})}_T K \end{pmatrix}\Bigg|\,X
&= 2\begin{pmatrix} M^{(1)} - M^{(2)} \\ M^{(1)} - M^{(3)} \end{pmatrix} + o_p(1) \\
&\stackrel{\mathcal L}{\longrightarrow} 2N\Biggl(0,\;
\begin{pmatrix} E\epsilon^4 & \operatorname{var}(\epsilon^2) \\ \operatorname{var}(\epsilon^2) & E\epsilon^4 \end{pmatrix}\Biggr). \tag{A23}
\end{aligned}
\]
Because
\[
\widehat{E\epsilon^2} = \frac{1}{2n}[Y,Y]^{(\mathrm{all})}_T
\quad\text{and}\quad
\frac{K}{\sqrt n} = \sqrt{\frac{K}{\bar n}}\,(1 + o(1)), \tag{A24}
\]
Theorem A.1 follows.
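The second relation in (A24) is elementary but worth recording (our verification, not in the text):

```latex
% Since \bar n = (n - K + 1)/K, we have n = \bar n K + K - 1, so
% n/K = \bar n (1 + o(1)) under (33).  Hence
\[
\frac{K}{\sqrt n} = \sqrt{\frac{K^2}{n}} = \sqrt{\frac{K}{n/K}}
  = \sqrt{\frac{K}{\bar n}}\,\bigl(1 + o(1)\bigr).
\]
```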

Proof of Lemma 1. For simplicity, and without loss of generality, we assume that $H = G$. The result follows the proofs of Lemma A.2 and Theorem A.1, but with a more exact representation of $[Y,Y]^{(\mathrm{all})}_T - [X,X]^{(\mathrm{all})}_T$. Specifically, from (A9) and (A19), we have that
\[
\begin{aligned}
[Y,Y]^{(\mathrm{all})}_T - [X,X]^{(\mathrm{all})}_T - 2nE\epsilon^2
&= [\epsilon,\epsilon]_T - 2nE\epsilon^2 + 2[X,\epsilon]^{(\mathrm{all})}_T \\
&= 2\sum_{i\ne 0,n}\bigl(\epsilon^2_{t_i} - E\epsilon^2\bigr) + \bigl(\epsilon^2_{t_0} - E\epsilon^2\bigr) + \bigl(\epsilon^2_{t_n} - E\epsilon^2\bigr)
- 2\sum_{t_i>0}\epsilon_{t_i}\epsilon_{t_{i-1}} \\
&\quad + 2\sum_{i=1}^{n-1}\bigl(\Delta X_{t_{i-1}} - \Delta X_{t_i}\bigr)\epsilon_{t_i}
+ 2\Delta X_{t_{n-1}}\epsilon_{t_n} - 2\Delta X_{t_0}\epsilon_{t_0}.
\end{aligned}
\]


With the same filtration as in the proof of Theorem A.1, this is the sum of a martingale triangular array with increment $2(\epsilon^2_{t_i} - E\epsilon^2) - 2\epsilon_{t_i}\epsilon_{t_{i-1}} + 2(\Delta X_{t_{i-1}} - \Delta X_{t_i})\epsilon_{t_i}$ for $i \ne 0, n$. Again, the conditions of the martingale central limit theorem (see Hall and Heyde 1980, chap. 3) are satisfied because of (17), whence the term is asymptotically normal, conditionally on the $X$ process. The conditional variance is given by (20), under the conditions mentioned in the statement of the lemma. Because $E\epsilon^2$ can vary with $n$, the $O_p(n^{-1/2}_{\mathrm{sparse}})$ term in (20) is actually of order $O_p(n^{-1/2}_{\mathrm{sparse}} E\epsilon^2)$.

A.3 Asymptotics of $D_T$

For transparency of notation, we take $\Delta t = T/n$; in other words, $\Delta t$ is the average of the $\Delta t_i$.

Proof of Theorem 2. Note first that because we assume that $\mu_t$ and $\sigma_t$ are continuous, they are locally bounded. Because here we seek to show convergence in probability, we can therefore assume, without loss of generality, that $|\mu_t|$ is bounded above by a nonrandom constant, and also that $\infty > \sigma^+ \ge \sigma_t \ge \sigma^- > 0$, where $\sigma^+$ and $\sigma^-$ are nonrandom. This is by the usual stopping time argument.

Also note that once $\mu_t$ and $\sigma_t$ are bounded, as we have now assumed, by Girsanov's theorem (see, e.g., Karatzas and Shreve 1991, chap. 3.5; Jacod and Shiryaev 2003, chap. II.3b) we can, without loss of generality, further suppose that $\mu_t = 0$ identically. (If the convergence in probability holds under the equivalent probability, then it also will hold under the original one.)

Now for the derivation. First, consider
\[
\bigl(X_{t_i} - X_{t_{i,-}}\bigr)^2
= \Biggl( \sum_{t_j = t_{i,-}}^{t_{i-1}} \Delta X_{t_j} \Biggr)^2
= \sum_{t_j = t_{i,-}}^{t_{i-1}} \bigl(\Delta X_{t_j}\bigr)^2
+ 2\sum_{t_{i-1} \ge t_k > t_l \ge t_{i,-}} \Delta X_{t_k}\,\Delta X_{t_l}.
\]
It follows that
\[
[X,X]^{(\mathrm{avg})}_T
= [X,X]^{(\mathrm{all})}_T
+ 2\sum_{i=1}^{n-1}(\Delta X_{t_i})\sum_{j=1}^{K\wedge i}\Bigl(1 - \frac{j}{K}\Bigr)(\Delta X_{t_{i-j}})
+ O_p\Bigl(\frac{K}{n}\Bigr).
\]
The remainder term incorporates end effects and has order $O_p(K/n)$ by (41) and (42), and due to the regular allocation of points to grids.

Following Jacod and Protter (1998) and Mykland and Zhang (2002), $[X,X]^{(\mathrm{all})}_T - \langle X,X\rangle_T = O_p(1/\sqrt n\,) = o_p(\sqrt{K/n}\,)$, too, under assumption (17) [which is a special case of assumption (41)] and given that $\inf_{t\in[0,T]}\sigma^2_t > 0$ almost surely. In other words,
\[
D_T = \bigl([X,X]^{(\mathrm{avg})}_T - \langle X,X\rangle_T\bigr)
= 2\sum_{i=1}^{n-1}(\Delta X_{t_i})\sum_{j=1}^{K\wedge i}\Bigl(1 - \frac{j}{K}\Bigr)(\Delta X_{t_{i-j}})
+ o_p\bigl(\sqrt{K\Delta t}\,\bigr). \tag{A25}
\]
It follows that
\[
\langle D,D\rangle_T
= 4\sum_{i=1}^{n-1}\bigl(\Delta\langle X,X\rangle_{t_i}\bigr)
\Biggl( \sum_{j=1}^{K\wedge i}\Bigl(1 - \frac{j}{K}\Bigr)(\Delta X_{t_{i-j}}) \Biggr)^2 + o_p(K\Delta t)
= \text{(I)} + \text{(II)} + o_p(K\Delta t), \tag{A26}
\]
where
\[
\begin{aligned}
\text{(I)} &= 4\sum_{i=1}^{n-1}\bigl(\Delta\langle X,X\rangle_{t_i}\bigr)
\Biggl( \sum_{j=1}^{K\wedge i}\Bigl(1 - \frac{j}{K}\Bigr)^2(\Delta X_{t_{i-j}})^2 \Biggr) \\
&= 4\sum_{i=1}^{n-1}\sigma^4_{t_i}\,\Delta t_i
\Biggl( \sum_{j=1}^{K\wedge i}\Bigl(1 - \frac{j}{K}\Bigr)^2 \Delta t_{i-j} \Biggr) + o_p(K\Delta t)
= K\Delta t \sum_{i=1}^{n-1}\sigma^4_{t_i}h_i\,\Delta t_i + o_p(K\Delta t),
\end{aligned}
\]

where $h_i$ is given by (43). Thus Theorem 2 will have been shown if we can show that
\[
\text{(II)} = 8\sum_{i=1}^{n-1}\Delta\langle X,X\rangle_{t_i}\,\zeta_i \tag{A27}
\]
is of order $o_p(K\Delta t)$, where
\[
\zeta_i = \sum_{i-1 \ge l > r \ge 0}(\Delta X_{t_l})(\Delta X_{t_r})
\Bigl(1 - \frac{i-l}{K}\Bigr)^{\!+}\Bigl(1 - \frac{i-r}{K}\Bigr)^{\!+}. \tag{A28}
\]
We do this as follows. Denote $\delta^+ = \max_i|\Delta t_i| = O(\frac{1}{n})$, and set $\text{(II)}' = 8\sum_{i=1}^{n-1}\Delta t_i\,\sigma^2_{t_{i-K}}\zeta_i$. Then, by Hölder's inequality,
\[
E\bigl|\text{(II)} - \text{(II)}'\bigr|
\le \delta^+ n \sup_{|t-s|\le(K+1)\delta^+}\bigl\| \sigma^2_t - \sigma^2_s \bigr\|_2\,\sup_i \|\zeta_i\|_2 = o(\delta^+ K),
\]
because of the boundedness and continuity of $\sigma_t$, and because, applying the Burkholder–Davis–Gundy inequality twice, we have that
\[
\begin{aligned}
E\bigl(\zeta^2_i\bigr)
&\le E\sum_{l=1}^{i-1}\Delta\langle X,X\rangle_{t_l}
\Biggl( \sum_{r\ge 0}^{(l-1)\wedge(i-1)}(\Delta X_{t_r})\Bigl(1 - \frac{i-l}{K}\Bigr)^{\!+}\Bigl(1 - \frac{i-r}{K}\Bigr)^{\!+} \Biggr)^2 \\
&\le \delta^+(\sigma^+)^2 \sum_{l=1}^{i-1}
E\Biggl( \sum_{r\ge 0}^{(l-1)\wedge(i-1)}(\Delta X_{t_r})\Bigl(1 - \frac{i-l}{K}\Bigr)^{\!+}\Bigl(1 - \frac{i-r}{K}\Bigr)^{\!+} \Biggr)^2 \\
&\le (\delta^+)^2(\sigma^+)^4 \sum_{l=1}^{i-1}\sum_{r\ge 0}^{(l-1)\wedge(i-1)}
\Biggl( \Bigl(1 - \frac{i-l}{K}\Bigr)^{\!+}\Bigl(1 - \frac{i-r}{K}\Bigr)^{\!+} \Biggr)^2 \\
&= O\bigl((\delta^+ K)^2\bigr), \quad\text{uniformly in } i, \tag{A29}
\end{aligned}
\]
where the second-to-last transition does for the inner sum what the first two inequalities do for the outer sum. Now rewrite
\[
\text{(II)}' = 8\sum_{n-1\ge l>r\ge 0}(\Delta X_{t_l})(\Delta X_{t_r})
\sum_{i=l+1}^{n-1}\Delta t_i\,\sigma^2_{t_{i-K}}
\Bigl(1 - \frac{i-l}{K}\Bigr)^{\!+}\Bigl(1 - \frac{i-r}{K}\Bigr)^{\!+}. \tag{A30}
\]
In the same way as in (A29), we then obtain
\[
E\bigl(\text{(II)}'\bigr)^2
\le 64(\delta^+)^2(\sigma^+)^4
\sum_{n-1\ge l>r\ge 0}
E\Biggl( \sum_{i=l+1}^{n-1}\Delta t_i\,\sigma^2_{t_{i-K}}
\Bigl(1 - \frac{i-l}{K}\Bigr)^{\!+}\Bigl(1 - \frac{i-r}{K}\Bigr)^{\!+} \Biggr)^2
\]


\[
\begin{aligned}
&\le 64(\delta^+)^2(\sigma^+)^4 \times (\delta^+)^2(\sigma^+)^4
\sum_{n-1\ge l>r\ge 0}\Biggl( \sum_{i=l+1}^{n-1}\Bigl(1 - \frac{i-l}{K}\Bigr)^{\!+}\Bigl(1 - \frac{i-r}{K}\Bigr)^{\!+} \Biggr)^2 \\
&\le 64(\delta^+)^4(\sigma^+)^8\,nK^3 = O\bigl((\delta^+K)^3\bigr) = o\bigl((\delta^+K)^2\bigr), \tag{A31}
\end{aligned}
\]
by assumptions (41) and (42). The transition from the second to the last line in (A31) is due to the fact that $\sum_{i=l+1}^{n-1}(1-\frac{i-l}{K})^+(1-\frac{i-r}{K})^+ \le K$, and $= 0$ when $l-r \ge K$. Thus we have shown that $\text{(II)} = o_p(K\Delta t)$, and hence Theorem 2 has been established.

We now proceed to the asymptotic distribution of $D_T$. We first state a technical condition on the filtration $(\mathcal F_t)_{0\le t\le T}$, to which $X_t$ and $\mu_t$ (but not the $\epsilon$'s) are assumed to be adapted.

Condition E (Description of the filtration). There is a continuous multidimensional $P$-local martingale $X = (X^{(1)}, \ldots, X^{(p)})$, for any $p$, so that $\mathcal F_t$ is the smallest sigma-field containing $\sigma(X_s, s \le t)$ and $\mathcal N$, where $\mathcal N$ contains all of the null sets in $\sigma(X_s, s \le T)$.

For example, $X$ can be a collection of Brownian motions.

Proof of Theorem 3. We can show, by methods similar to those in the proof of Theorem 2, that if $L$ is any martingale adapted to the filtration generated by $X$, then
\[
\sup_t \biggl| \frac{1}{\Delta t\,K}\langle D, L\rangle_t \biggr| \stackrel{p}{\longrightarrow} 0. \tag{A32}
\]
The stable convergence with respect to the filtration $(\mathcal F_t)_{0\le t\le T}$ then follows in view of Rootzén (1980) or Jacod and Protter (1998). This ends the proof of Theorem 3.

Finally, in the case where $\eta^2_n$ does not converge, one can still use the mixed normal with variance $\eta^2_n$. This is because every subsequence of $\eta^2_n$ has a further subsequence that does converge in probability to some $\eta^2$, and hence for which the assumption (48) in Theorem 3 would be satisfied. The reason for this is that one can define the distribution function of a finite measure by
\[
G_n(t) = \sum_{t_{i+1}\le t} h_i\,\Delta t_i. \tag{A33}
\]
Because $G_n(t) \le T\sup_i h_i$, it follows from (45) that the sequence $G_n$ is weakly compact in the sense of weak convergence; see Helly's theorem (e.g., Billingsley 1995, p. 336). For any convergent subsequence $G_n \to G$, we then get that
\[
\eta^2_n = \int_0^T \sigma^4_t\,dG_n(t) \to \int_0^T \sigma^4_t\,dG(t) \tag{A34}
\]
almost surely, because we have assumed that $\langle X,X\rangle'_t$ is a continuous function of $t$. One then defines $\eta^2$ to be the (subsequence-dependent) right side of (A34). To proceed further with the asymptotics, continue along the foregoing subsequence and note that
\[
\frac{1}{\Delta t\,K}\langle D,D\rangle_t \approx \int_0^t \sigma^4_s\,dG_n(s) \to \int_0^t \sigma^4_s\,dG(s),
\]
completing the proof.

[Received October 2003. Revised December 2004.]

REFERENCES

Aït-Sahalia, Y., Mykland, P. A., and Zhang, L. (2005), "How Often to Sample a Continuous-Time Process in the Presence of Market Microstructure Noise," Review of Financial Studies, 18, 351–416.
Aldous, D. J., and Eagleson, G. K. (1978), "On Mixing and Stability of Limit Theorems," The Annals of Probability, 6, 325–331.
Andersen, T. G., Bollerslev, T., Diebold, F. X., and Labys, P. (2001), "The Distribution of Exchange Rate Realized Volatility," Journal of the American Statistical Association, 96, 42–55.
Bandi, F. M., and Russell, J. R. (2003), "Microstructure Noise, Realized Volatility, and Optimal Sampling," technical report, University of Chicago Graduate School of Business.
Barndorff-Nielsen, O. E., and Shephard, N. (2002), "Econometric Analysis of Realized Volatility and Its Use in Estimating Stochastic Volatility Models," Journal of the Royal Statistical Society, Ser. B, 64, 253–280.
Billingsley, P. (1995), Probability and Measure (3rd ed.), New York: Wiley.
Brown, S. J. (1990), "Estimating Volatility," in Financial Options: From Theory to Practice, eds. S. Figlewski, W. Silber, and M. Subrahmanyam, Homewood, IL: Business One-Irwin, pp. 516–537.
Chernov, M., and Ghysels, E. (2000), "A Study Towards a Unified Approach to the Joint Estimation of Objective and Risk Neutral Measures for the Purpose of Options Valuation," Journal of Financial Economics, 57, 407–458.
Dacorogna, M. M., Gençay, R., Müller, U., Olsen, R. B., and Pictet, O. V. (2001), An Introduction to High-Frequency Finance, San Diego: Academic Press.
Gallant, A. R., Hsu, C.-T., and Tauchen, G. T. (1999), "Using Daily Range Data to Calibrate Volatility Diffusions and Extract the Forward Integrated Variance," The Review of Economics and Statistics, 81, 617–631.
Gloter, A. (2000), "Estimation des Paramètres d'une Diffusion Cachée," unpublished doctoral thesis, Université de Marne-la-Vallée.
Hall, P., and Heyde, C. C. (1980), Martingale Limit Theory and Its Application, Boston: Academic Press.
Hansen, P. R., and Lunde, A. (2004a), "Realized Variance and IID Market Microstructure Noise," technical report, Brown University, Dept. of Economics.
Hansen, P. R., and Lunde, A. (2004b), "An Unbiased Measure of Realized Variance," technical report, Brown University, Dept. of Economics.
Heston, S. (1993), "A Closed-Form Solution for Options With Stochastic Volatility, With Applications to Bonds and Currency Options," Review of Financial Studies, 6, 327–343.
Hull, J., and White, A. (1987), "The Pricing of Options on Assets With Stochastic Volatilities," Journal of Finance, 42, 281–300.
Jacod, J., and Protter, P. (1998), "Asymptotic Error Distributions for the Euler Method for Stochastic Differential Equations," The Annals of Probability, 26, 267–307.
Jacod, J., and Shiryaev, A. N. (2003), Limit Theorems for Stochastic Processes (2nd ed.), New York: Springer-Verlag.
Karatzas, I., and Shreve, S. E. (1991), Brownian Motion and Stochastic Calculus, New York: Springer-Verlag.
Mykland, P. A., and Zhang, L. (2002), "ANOVA for Diffusions," technical report, University of Chicago, Dept. of Statistics.
O'Hara, M. (1995), Market Microstructure Theory, Cambridge, MA: Blackwell.
Oomen, R. (2004), "Properties of Realized Variance for a Pure Jump Process: Calendar Time Sampling versus Business Time Sampling," technical report, University of Warwick, Warwick Business School.
Rényi, A. (1963), "On Stable Sequences of Events," Sankhyā, Ser. A, 25, 293–302.
Rootzén, H. (1980), "Limit Distributions for the Error in Approximations of Stochastic Integrals," The Annals of Probability, 8, 241–251.
Zhou, B. (1996), "High-Frequency Data and Volatility in Foreign-Exchange Rates," Journal of Business & Economic Statistics, 14, 45–52.
Zumbach, G., Corsi, F., and Trapletti, A. (2002), "Efficient Estimation of Volatility Using High-Frequency Data," technical report, Olsen & Associates.


3.2 Notation for the Multiple Grids

We specifically suppose that the full grid $G$, $G = \{t_0, \ldots, t_n\}$ as in (13), is partitioned into $K$ nonoverlapping subgrids $G^{(k)}$, $k = 1, \ldots, K$; in other words,
\[
G = \bigcup_{k=1}^K G^{(k)}, \quad\text{where } G^{(k)} \cap G^{(l)} = \varnothing \text{ when } k \ne l.
\]
For most purposes, the natural way to select the $k$th subgrid $G^{(k)}$ is to start with $t_{k-1}$ and then pick every $K$th sample point after that, until $T$. That is,
\[
G^{(k)} = \{ t_{k-1},\, t_{k-1+K},\, t_{k-1+2K}, \ldots, t_{k-1+n_kK} \},
\]
for $k = 1, \ldots, K$, where $n_k$ is the integer making $t_{k-1+n_kK}$ the last element in $G^{(k)}$. We refer to this as regular allocation of sample points to subgrids.

Whether the allocation is regular or not, we let $n_k = |G^{(k)}| - 1$; as in Definition 2, $n = |G| - 1$. Recall that the realized volatility based on all observation points $G$ is written as $[Y,Y]^{(\mathrm{all})}_T$. Meanwhile, if one uses only the subsampled observations $Y_t$, $t \in G^{(k)}$, then the realized volatility is denoted by $[Y,Y]^{(k)}_T$. It has the form
\[
[Y,Y]^{(k)}_T = \sum_{t_j,\, t_{j,+} \in G^{(k)}} \bigl( Y_{t_{j,+}} - Y_{t_j} \bigr)^2,
\]
where, if $t_i \in G^{(k)}$, then $t_{i,+}$ denotes the following element in $G^{(k)}$.

A natural competitor to $[Y,Y]^{(\mathrm{all})}_T$, $[Y,Y]^{(\mathrm{sparse})}_T$, and $[Y,Y]^{(\mathrm{sparse,opt})}_T$ is then given by
\[
[Y,Y]^{(\mathrm{avg})}_T = \frac{1}{K}\sum_{k=1}^K [Y,Y]^{(k)}_T, \tag{32}
\]
and this is the statistic we analyze in what follows. As before, we fix $T$ and use only the observations within the time period $[0,T]$. Asymptotics are still under (17) and under
\[
\text{as } n \to \infty,\quad n/K \to \infty. \tag{33}
\]
In general, the $n_k$ need not be the same across $k$. We define
\[
\bar n = \frac{1}{K}\sum_{k=1}^K n_k = \frac{n - K + 1}{K}. \tag{34}
\]
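The regular allocation just described can be sketched in a few lines; the code below is our illustration (integer indices stand in for the sampling times $t_i$), not from the paper.

```python
import numpy as np

# Sketch (ours): regular allocation of the full grid G = {t_0, ..., t_n}
# to K nonoverlapping subgrids G^(k), and the average number of
# increments n_bar = (n - K + 1)/K of eq. (34).
def subgrids(n, K):
    """Index sets of G^(k), k = 1, ..., K: start at t_{k-1}, step K."""
    return [np.arange(k, n + 1, K) for k in range(K)]

n, K = 23, 5
grids = subgrids(n, K)

# The subgrids partition the full grid {t_0, ..., t_n}.
all_idx = np.concatenate(grids)
assert np.array_equal(np.sort(all_idx), np.arange(n + 1))

# n_k = |G^(k)| - 1 increments per subgrid; their average is (n - K + 1)/K.
n_k = [len(g) - 1 for g in grids]
print(sum(n_k) / K, (n - K + 1) / K)  # both 3.8
```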

3.3 Error Due to the Noise: $[Y,Y]^{(\mathrm{avg})}_T - [X,X]^{(\mathrm{avg})}_T$

Recall that we are interested in determining the integrated volatility $\langle X,X\rangle_T$, or quadratic variation, of the true, but unobservable, returns. As an intermediate step, here we study how well the "pooled" realized volatility $[Y,Y]^{(\mathrm{avg})}_T$ approximates $[X,X]^{(\mathrm{avg})}_T$, where the latter is the "pooled" true integrated volatility when $X$ is considered only on the discrete time scale.

From (18) and (32),
\[
E\bigl([Y,Y]^{(\mathrm{avg})}_T \,\big|\, X \text{ process}\bigr) = [X,X]^{(\mathrm{avg})}_T + 2\bar n E\epsilon^2. \tag{35}
\]
Also, because $\{\epsilon_t, t\in G^{(k)}\}$ are independent for different $k$,
\[
\operatorname{var}\bigl([Y,Y]^{(\mathrm{avg})}_T \,\big|\, X \text{ process}\bigr)
= \frac{1}{K^2}\sum_{k=1}^K \operatorname{var}\bigl([Y,Y]^{(k)}_T \,\big|\, X \text{ process}\bigr)
= 4\frac{\bar n}{K}E\epsilon^4 + O_p\Bigl(\frac{1}{K}\Bigr), \tag{36}
\]
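The bias $2\bar n E\epsilon^2$ in (35) is easy to see numerically in the pure-noise case. The simulation below is our sketch, not the paper's: $X$ is held constant (so $Y = \epsilon$ and $[X,X]^{(\mathrm{avg})}_T = 0$), and all constants are illustrative.

```python
import numpy as np

# Pure-noise sanity check (ours): with X constant, Y = eps, so the
# conditional mean in (35) reduces to 2 * n_bar * E(eps^2).
rng = np.random.default_rng(1)
n, K, reps = 1200, 30, 400
nbar = (n - K + 1) / K

def rv_avg(y, K):
    # average of the K subsampled realized volatilities, eq. (32)
    return np.mean([np.sum(np.diff(y[k::K]) ** 2) for k in range(K)])

eps = rng.standard_normal((reps, n + 1))   # E(eps^2) = 1
est = np.mean([rv_avg(e, K) for e in eps])
print(est, 2 * nbar)                       # close to each other
```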

in the same way as in (19). Incorporating the next-order term in the variance yields
\[
\operatorname{var}\bigl([Y,Y]^{(\mathrm{avg})}_T \,\big|\, X\bigr)
= 4\frac{\bar n}{K}E\epsilon^4
+ \frac{1}{K}\bigl[ 8[X,X]^{(\mathrm{avg})}_T E\epsilon^2 - 2\operatorname{var}(\epsilon^2)\bigr]
+ o_p\Bigl(\frac{1}{K}\Bigr), \tag{37}
\]
as in (20).

By Theorem A.1 in Section A.2, the conditional asymptotics for the estimator $[Y,Y]^{(\mathrm{avg})}_T$ are as follows.

Theorem 1. Suppose that $X$ is an Itô process of form (1). Suppose that $Y$ is related to $X$ through model (4) and that (11) is satisfied with $E\epsilon^4 < \infty$. Also suppose that $t_i$ and $t_{i+1}$ are not in the same subgrid for any $i$. Under assumption (33), as $n \to \infty$,
\[
\sqrt{\frac{K}{\bar n}}\,\bigl( [Y,Y]^{(\mathrm{avg})}_T - [X,X]^{(\mathrm{avg})}_T - 2\bar n E\epsilon^2 \bigr)
\stackrel{\mathcal L}{\longrightarrow} 2\sqrt{E\epsilon^4}\,Z^{(\mathrm{avg})}_{\mathrm{noise}}, \tag{38}
\]
conditional on the $X$ process, where $Z^{(\mathrm{avg})}_{\mathrm{noise}}$ is standard normal.

This can be compared with the result stated in (24). Notice that $Z^{(\mathrm{avg})}_{\mathrm{noise}}$ in (38) is almost never the same as $Z_{\mathrm{noise}}$ in (24); in particular, $\operatorname{cov}(Z_{\mathrm{noise}}, Z^{(\mathrm{avg})}_{\mathrm{noise}}) = \operatorname{var}(\epsilon^2)/E\epsilon^4$, based on the proof of Theorem A.1 in the Appendix.

In comparison with the realized volatility using the full grid $G$, the aggregated estimator $[Y,Y]^{(\mathrm{avg})}_T$ provides an improvement, in that both the asymptotic bias and variance are of smaller order in $n$; compare (18) and (19). We use this in Section 4.

3.4 Error Due to the Discretization Effect: $[X,X]^{(\mathrm{avg})}_T - \langle X,X\rangle_T$

In this section we study the impact of the time discretization. In other words, we investigate the deviation of $[X,X]^{(\mathrm{avg})}_T$ from the integrated volatility $\langle X,X\rangle_T$ of the true process. Denote the discretization effect by $D_T$, where
\[
D_t = [X,X]^{(\mathrm{avg})}_t - \langle X,X\rangle_t
= \frac{1}{K}\sum_{k=1}^K\bigl( [X,X]^{(k)}_t - \langle X,X\rangle_t \bigr), \tag{39}
\]
with
\[
[X,X]^{(k)}_t = \sum_{t_i,\, t_{i,+}\in G^{(k)},\; t_{i,+}\le t}\bigl( X_{t_{i,+}} - X_{t_i} \bigr)^2. \tag{40}
\]
In what follows, we consider the asymptotics of $D_T$. The problem is similar to that of finding the limit of $[X,X]^{(\mathrm{all})}_T - \langle X,X\rangle_T$


[cf. (25)]. However, this present case is more complicated, due to the multiple grids.

We suppose in the following that the sampling points are regularly allocated to subgrids; in other words, $G^{(l)} = \{t_{l-1}, t_{K+l-1}, \ldots\}$. We also assume that
\[
\max_i \Delta t_i = O\Bigl(\frac{1}{n}\Bigr) \tag{41}
\]
and
\[
K/n \to 0. \tag{42}
\]
Define the weight function
\[
h_i = \frac{4}{K\Delta t}\sum_{j=1}^{(K-1)\wedge i}\Bigl(1 - \frac{j}{K}\Bigr)^2 \Delta t_{i-j}. \tag{43}
\]
In the case where the $t_i$ are equidistant, and under regular allocation of points to subgrids, $\Delta t_i = \Delta t$, and so all of the $h_i$'s (except the first $K-1$) are equal, and
\[
h_i = 4\,\frac{1}{K}\sum_{j=1}^{(K-1)\wedge i}\Bigl(1 - \frac{j}{K}\Bigr)^2
= \begin{cases}
\dfrac{2(K-1)(2K-1)}{3K^2} = \dfrac43 + o(1), & i \ge K-1, \\[8pt]
\dfrac43\,\dfrac{i}{K^3}\bigl(3K^2 - 3Ki + i^2\bigr) + o(1), & i < K-1.
\end{cases} \tag{44}
\]
More generally, assumptions (41) and (42) ensure that
\[
\sup_i h_i = O(1). \tag{45}
\]

We take $\langle D,D\rangle_T$ to be the quadratic variation of $D_t$ when viewed as a continuous-time process (39). This gives the best approximation to the variance of $D_T$. We show the following results in Section A.3.

Theorem 2. Suppose that $X$ is an Itô process of the form (1), with drift coefficient $\mu_t$ and diffusion coefficient $\sigma_t$, both continuous almost surely. Also suppose that $\sigma_t$ is bounded away from 0. Assume (41) and (42), and that sampling points are regularly allocated to grids. Then the quadratic variation of $D_T$ is approximately
\[
\langle D,D\rangle_T = \frac{TK}{n}\,\eta^2_n + o_p\Bigl(\frac{K}{n}\Bigr), \tag{46}
\]
where
\[
\eta^2_n = \sum_i h_i\,\sigma^4_{t_i}\,\Delta t_i. \tag{47}
\]

In particular, $D_T = O_p\bigl((K/n)^{1/2}\bigr)$. From this we derive a variance–variance trade-off between the two effects that have been discussed: noise and discretization. First, however, we discuss the asymptotic law of $D_T$. We discuss stable convergence at the end of this section.

Theorem 3. Assume the conditions of Theorem 2, and also that
\[
\eta^2_n \stackrel{P}{\longrightarrow} \eta^2, \tag{48}
\]
where $\eta$ is random. Also assume condition E in Section A.3. Then
\[
\frac{D_T}{(K/n)^{1/2}} \stackrel{\mathcal L}{\longrightarrow} \eta\sqrt T\,Z_{\mathrm{discrete}}, \tag{49}
\]
where $Z_{\mathrm{discrete}}$ is standard normal and independent of the process $X$. The convergence in law is stable.

In other words, $D_T/(K/n)^{1/2}$ can be taken to be asymptotically mixed normal, "$N(0, \eta^2 T)$." For most of our discussion, it is most convenient to suppose (48), and this is satisfied in many cases. For example, when the $t_i$ are equidistant, and under regular allocation of points to subgrids,
\[
\eta^2 = \frac43\int_0^T \sigma^4_t\,dt, \tag{50}
\]
following (44). One does not need to rely on (48); we argue in Section A.3 that, without this condition, one can take $D_T/(K/n)^{1/2}$ to be approximately $N(0, \eta^2_n T)$. For estimation of $\eta^2$ or $\eta^2_n$, see Section 6.

Finally, stable convergence (Rényi 1963; Aldous and Eagleson 1978; Hall and Heyde 1980, chap. 3) means, for our purposes, that the left side of (49) converges to the right side jointly with the $X$ process, and that $Z$ is independent of $X$. This is slightly weaker than convergence conditional on $X$, but serves the same function of permitting the incorporation of conditionality-type phenomena into arguments and conclusions.
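As a numerical illustration of (43), (47), and (50), the following is our sketch (an assumed constant $\sigma$, equidistant times; all values illustrative): the finite-sample $\eta^2_n$ is already close to $\frac43\sigma^4 T$.

```python
import numpy as np

# Sketch (ours): the weights h_i of eq. (43) and eta_n^2 of eq. (47),
# checked against the equidistant-case limit eta^2 = (4/3) sigma^4 T
# of eq. (50).  sigma is constant; parameter values are illustrative.
T, n, K = 1.0, 20000, 100
dt = T / n
sigma = 0.3

w = (1.0 - np.arange(1, K) / K) ** 2      # (1 - j/K)^2, j = 1..K-1
h = np.array([4.0 / K * w[: min(K - 1, i)].sum() for i in range(1, n)])

eta2_n = np.sum(h * sigma**4 * dt)        # eq. (47)
eta2 = (4.0 / 3.0) * sigma**4 * T         # eq. (50)
print(eta2_n / eta2)                      # close to 1
```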

3.5 Combining the Two Sources of Error

We can now combine the two error terms arising from discretization and from the observation noise. It follows from Theorems 1 and 3 that
\[
[Y,Y]^{(\mathrm{avg})}_T - \langle X,X\rangle_T - 2\bar n E\epsilon^2 = \xi Z_{\mathrm{total}} + o_p(1), \tag{51}
\]
where $Z_{\mathrm{total}}$ is an asymptotically standard normal random variable independent of the $X$ process, and
\[
\xi^2 = \underbrace{4\frac{\bar n}{K}E\epsilon^4}_{\text{due to noise}}
+ \underbrace{T\frac{1}{\bar n}\eta^2}_{\text{due to discretization}}. \tag{52}
\]
It is easily seen that if one takes $K = cn^{2/3}$, then both components in $\xi^2$ will be present in the limit; otherwise, one of them will dominate. Based on (51), $[Y,Y]^{(\mathrm{avg})}_T$ is still a biased estimator of the quadratic variation $\langle X,X\rangle_T$ of the true return process. One can recognize that, as far as the asymptotic bias is concerned, $[Y,Y]^{(\mathrm{avg})}_T$ is a better estimator than $[Y,Y]^{(\mathrm{all})}_T$, because $\bar n \le n$: the bias in the subsampled estimator $[Y,Y]^{(\mathrm{avg})}_T$ increases at a slower pace than that of the full-grid estimator. One can also construct a bias-adjusted estimator from (51); this further development would involve the higher-order analysis between the bias and the subsampled estimator. We describe the methodology of bias correction in Section 4.


3.6 The Benefits of Sampling Sparsely: Optimal Sampling Frequency in the Multiple-Grid Case

As in Section 2.3, when the noise is negligible asymptotically, we can search for an optimal $\bar n$ for subsampling, to balance the coexistence of the bias and the variance in (51). To reduce the MSE of $[Y,Y]^{(\mathrm{avg})}_T$, we set $\partial\mathrm{MSE}/\partial\bar n = 0$. From (51)–(52), $\mathrm{bias} = 2\bar n E\epsilon^2$ and $\xi^2 = 4\frac{\bar n}{K}E\epsilon^4 + \frac{T}{\bar n}\eta^2$. Then
\[
\mathrm{MSE} = \mathrm{bias}^2 + \xi^2
= 4(E\epsilon^2)^2\bar n^2 + 4\frac{\bar n}{K}E\epsilon^4 + \frac{T}{\bar n}\eta^2
= 4(E\epsilon^2)^2\bar n^2 + \frac{T}{\bar n}\eta^2, \quad\text{to first order;}
\]
thus the optimal $\bar n^*$ satisfies
\[
\bar n^* = \biggl( \frac{T\eta^2}{8(E\epsilon^2)^2} \biggr)^{1/3}. \tag{53}
\]
Therefore, assuming that the estimator $[Y,Y]^{(\mathrm{avg})}_T$ is adopted, one could benefit from a minimum MSE if one subsamples $\bar n^*$ data in an equidistant fashion. In other words, all $n$ observations can be used if one uses $K^*$, $K^* \approx n/\bar n^*$, subgrids. This is in contrast to the drawback of using all of the data in the single-grid case. The subsampling, coupled with aggregation, brings out the advantage of using all of the data. Of course, for the asymptotics to work, we need $E\epsilon^2 \to 0$. Our recommendation, however, is to use the bias-corrected estimator given in Section 4.
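The arithmetic of (53) can be illustrated as follows; every input value ($T$, $\eta^2$, $E\epsilon^2$, $n$) is a made-up example of ours, not an estimate from data.

```python
# Sketch (ours) of the optimal-subsampling arithmetic in eq. (53);
# all inputs are illustrative.
T = 1.0                     # one period, say a trading day
eta2 = 4 / 3 * 0.3**4       # eq. (50) with constant sigma = 0.3, T = 1
Ee2 = 1e-6                  # noise variance E(eps^2)
n = 23400                   # one observation per second

nbar_star = (T * eta2 / (8 * Ee2**2)) ** (1 / 3)   # eq. (53)
K_star = n / nbar_star                             # number of subgrids
print(round(nbar_star), round(K_star))
```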

4. THE FINAL ESTIMATOR: SUBSAMPLING, AVERAGING, AND BIAS CORRECTION OVER TWO TIME SCALES

4.1 The Final Estimator $\widehat{\langle X,X\rangle}_T$: Main Result

In the preceding sections, we have seen that the multigrid estimator $[Y,Y]^{(\mathrm{avg})}$ is yet another biased estimator of the true integrated volatility $\langle X,X\rangle$. In this section we improve the multigrid estimator by adopting bias adjustment. To assess the bias, one uses the full grid. As mentioned before, from (22), in the single-grid case (Sec. 2), $E\epsilon^2$ can be consistently approximated by
\[
\widehat{E\epsilon^2} = \frac{1}{2n}[Y,Y]^{(\mathrm{all})}_T. \tag{54}
\]
Hence the bias of $[Y,Y]^{(\mathrm{avg})}$ can be consistently estimated by $2\bar n\widehat{E\epsilon^2}$. A bias-adjusted estimator for $\langle X,X\rangle$ thus can be obtained as
\[
\widehat{\langle X,X\rangle}_T = [Y,Y]^{(\mathrm{avg})}_T - \frac{\bar n}{n}[Y,Y]^{(\mathrm{all})}_T, \tag{55}
\]
thereby combining the two time scales, (all) and (avg).

To study the asymptotic behavior of $\widehat{\langle X,X\rangle}_T$, first note that

under the conditions of Theorem A.1 in Section A.2,
\[
\begin{aligned}
\Bigl(\frac{K}{\bar n}\Bigr)^{1/2}\bigl( \widehat{\langle X,X\rangle}_T - [X,X]^{(\mathrm{avg})}_T \bigr)
&= \Bigl(\frac{K}{\bar n}\Bigr)^{1/2}\bigl( [Y,Y]^{(\mathrm{avg})}_T - [X,X]^{(\mathrm{avg})}_T - 2\bar n E\epsilon^2 \bigr)
- 2(K\bar n)^{1/2}\bigl( \widehat{E\epsilon^2} - E\epsilon^2 \bigr) \\
&\stackrel{\mathcal L}{\longrightarrow} N\bigl(0,\, 8(E\epsilon^2)^2\bigr), \tag{56}
\end{aligned}
\]
where the convergence in law is conditional on $X$.

We can now combine this with the results of Section 3.4 to determine the optimal choice of $K$ as $n \to \infty$:
\[
\widehat{\langle X,X\rangle}_T - \langle X,X\rangle_T
= \bigl( \widehat{\langle X,X\rangle}_T - [X,X]^{(\mathrm{avg})}_T \bigr)
+ \bigl( [X,X]^{(\mathrm{avg})}_T - \langle X,X\rangle_T \bigr)
= O_p\Bigl( \frac{\bar n^{1/2}}{K^{1/2}} \Bigr) + O_p\bigl( \bar n^{-1/2} \bigr). \tag{57}
\]
The error is minimized by equating the two terms on the right side of (57), yielding that the optimal sampling step for $[Y,Y]^{(\mathrm{avg})}_T$ is $K = O(n^{2/3})$. The right side of (57) then has order $O_p(n^{-1/6})$.
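As an illustration of (54), (55), and the rate discussion in (57), here is a minimal simulation sketch. It is ours, not the paper's: constant volatility, Gaussian noise, and all parameter values are assumptions.

```python
import numpy as np

# Sketch (ours) of the two-scales estimator (55) with K ~ n^(2/3) subgrids,
# on simulated data: X Brownian with volatility sigma, Y = X + eps.
rng = np.random.default_rng(2)
n, sigma, noise_sd = 23400, 0.3, 0.001
T = 1.0
dt = T / n

x = np.cumsum(sigma * np.sqrt(dt) * rng.standard_normal(n + 1))
y = x + noise_sd * rng.standard_normal(n + 1)

K = int(n ** (2 / 3))                    # c = 1 in K = c n^(2/3)
rv_all = np.sum(np.diff(y) ** 2)         # [Y,Y]^(all)
rv_avg = np.mean([np.sum(np.diff(y[k::K]) ** 2) for k in range(K)])
nbar = (n - K + 1) / K

tsrv = rv_avg - (nbar / n) * rv_all      # eq. (55)
iv = sigma**2 * T                        # true integrated volatility, 0.09
print(rv_all, tsrv)                      # rv_all carries the 2n*E(eps^2) bias
```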

T is K = O(n23) The right side of (57) then has orderOp(nminus16)

In particular if we take

K = cn23 (58)

then we find the limit in (57) as follows

Theorem 4 Suppose that X is an Itocirc process of form (1) andassume the conditions of Theorem 3 in Section 34 Supposethat Y is related to X through model (4) and that (11) is satisfiedwith Eε2 lt infin Under assumption (58)

n16(〈XX〉T minus 〈XX〉T)

Lminusrarr N(08cminus2(Eε2)2)+ η

radicTN(0 c)

= (8cminus2(Eε2)2 + cη2T)12N(01) (59)

where the convergence is stable in law (see Sec 34)

Proof Note that the first normal distribution comes from(56) and that the second comes from Theorem 3 in Section 34The two normal distributions are independent because the con-vergence of the first term in (57) is conditional on the X processwhich is why they can be amalgamated as stated The require-ment that Eε4 lt infin (Thm A1 in the App) is not needed be-cause only a law of large numbers is required for M(1)

T (see theproof of Thm A1) when considering the difference in (56)This finishes the proof

The estimation of the asymptotic spread $s^2 = 8c^{-2}(E\epsilon^2)^2 + c\eta^2 T$ of $\widehat{\langle X,X\rangle}_T$ is deferred to Section 6. It is seen in Section A.1 that, for $K = cn^{2/3}$, the second-order conditional variance of $\widehat{\langle X,X\rangle}_T$, given the $X$ process, is given by
\[
\operatorname{var}\bigl( \widehat{\langle X,X\rangle}_T \,\big|\, X \bigr)
= n^{-1/3}c^{-2}8(E\epsilon^2)^2
+ n^{-2/3}c^{-1}\bigl[ 8[X,X]^{(\mathrm{avg})}_T E\epsilon^2 - 2\operatorname{var}(\epsilon^2) \bigr]
+ o_p\bigl( n^{-2/3} \bigr). \tag{60}
\]
The evidence from our Monte Carlo simulations in Section 7 suggests that this correction may matter in small samples, and that the (random) variance of $\widehat{\langle X,X\rangle}_T$ is best estimated by
\[
\hat s^2 + n^{-2/3}c^{-1}\bigl[ 8[X,X]^{(\mathrm{avg})}_T \widehat{E\epsilon^2} - 2\widehat{\operatorname{var}}(\epsilon^2) \bigr]. \tag{61}
\]
For the correction term, $[X,X]^{(\mathrm{avg})}_T$ can be estimated by $\widehat{\langle X,X\rangle}_T$ itself. Meanwhile, by (23), a consistent estimator of $\operatorname{var}(\epsilon^2)$ is given by
\[
\widehat{\operatorname{var}}(\epsilon^2) = \frac{1}{2n}\sum_i \bigl( \Delta Y_{t_i} \bigr)^4 - 4\bigl( \widehat{E\epsilon^2} \bigr)^2.
\]


4.2 Properties of $\widehat{\langle X,X\rangle}_T$: Optimal Sampling and Bias Adjustment

To further pin down the optimal sampling frequency $K$, one can minimize the expected asymptotic variance in (59) to obtain
\[
c^* = \biggl( \frac{16(E\epsilon^2)^2}{T\,E\eta^2} \biggr)^{1/3}, \tag{62}
\]
which can be consistently estimated from data in past time periods (before time $t_0 = 0$), using $\widehat{E\epsilon^2}$ and an estimator of $\eta^2$ (cf. Sec. 6). As mentioned in Section 3.4, $\eta^2$ can be taken to be independent of $K$, as long as one allocates sampling points to grids regularly, as defined in Section 3.2. Hence one can choose $c$, and so also $K$, based on past data.

Example 1. If $\sigma_t^2$ is constant, and for equidistant sampling and regular allocation to grids, $\eta^2 = \frac{4}{3}\sigma^4 T$, then the asymptotic variance in (59) is
\[
8c^{-2}(E\epsilon^2)^2 + c\eta^2 T = 8c^{-2}(E\epsilon^2)^2 + \tfrac{4}{3}c\sigma^4 T^2,
\]
and the optimal choice of c becomes
\[
c_{\mathrm{opt}} = \left(\frac{12(E\epsilon^2)^2}{T^2\sigma^4}\right)^{1/3}. \tag{63}
\]
In this case the asymptotic variance in (59) is
\[
2\bigl(12(E\epsilon^2)^2\bigr)^{1/3}\bigl(\sigma^2 T\bigr)^{4/3}.
\]
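For concreteness, plugging the simulation-design values of Section 7.1 into (63) gives the optimal constant and number of subgrids (a worked instance, not part of the original example):

```python
# Worked instance of (63), using the Section 7.1 design values:
# noise std (E eps^2)^(1/2) = .0005, sigma^2 = .04 (annualized),
# T = 1/252, and n = 23,400 one-second returns in a 6.5-hour day.
e_eps2 = 0.0005 ** 2
sigma2 = 0.04
T = 1.0 / 252
n = 23400

c_opt = (12.0 * e_eps2 ** 2 / (T ** 2 * sigma2 ** 2)) ** (1.0 / 3)
K = c_opt * n ** (2.0 / 3)   # optimal number of subgrids, K ~ c_opt * n^(2/3)
```

With these values, $c_{\mathrm{opt}} \approx 0.031$ and $K \approx 25$ subgrids.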

One can also, of course, estimate c to minimize the actual asymptotic variance in (59) from data in the current time period ($0 \le t \le T$). It is beyond the scope of this article to consider whether such a device for selecting the frequency has any impact on our asymptotic results.

In addition to large-sample arguments, one can study $\widehat{\langle X,X\rangle}_T$ from a "smallish" sample standpoint. We argue in what follows that one can apply a bias-type adjustment to get
\[
\widehat{\langle X,X\rangle}^{(\mathrm{adj})}_T = \left(1 - \frac{\bar n}{n}\right)^{-1}\widehat{\langle X,X\rangle}_T. \tag{64}
\]

The difference from the estimator in (55) is of order $O_p(\bar n/n) = O_p(K^{-1})$, and thus the two estimators behave the same to the asymptotic order that we consider. The estimator (64), however, has the appeal of being in a certain way "unbiased," as follows. For arbitrary $(a,b)$, consider all estimators of the form
\[
\widehat{\langle X,X\rangle}^{(\mathrm{adj})}_T = a\,[Y,Y]^{(\mathrm{avg})}_T - b\,\frac{\bar n}{n}\,[Y,Y]^{(\mathrm{all})}_T.
\]
Then, from (18) and (35),
\begin{align*}
E\bigl(\widehat{\langle X,X\rangle}^{(\mathrm{adj})}_T \bigm| X\ \text{process}\bigr)
&= a\bigl([X,X]^{(\mathrm{avg})}_T + 2\bar n E\epsilon^2\bigr) - b\,\frac{\bar n}{n}\bigl([X,X]^{(\mathrm{all})}_T + 2n E\epsilon^2\bigr) \\
&= a\,[X,X]^{(\mathrm{avg})}_T - b\,\frac{\bar n}{n}\,[X,X]^{(\mathrm{all})}_T + 2(a-b)\bar n E\epsilon^2.
\end{align*}
It is natural to choose $a = b$ to completely remove the effect of $E\epsilon^2$. Also, following Section 3.4, both $[X,X]^{(\mathrm{avg})}_T$ and $[X,X]^{(\mathrm{all})}_T$ are asymptotically unbiased estimators of $\langle X,X\rangle_T$. Hence one can argue that one should take $a(1 - \bar n/n) = 1$, yielding (64).

Similarly, an adjusted estimator of $E\epsilon^2$ is given by
\[
\widehat{E\epsilon^2}^{(\mathrm{adj})} = \tfrac{1}{2}(n - \bar n)^{-1}\bigl([Y,Y]^{(\mathrm{all})}_T - [Y,Y]^{(\mathrm{avg})}_T\bigr), \tag{65}
\]
which satisfies $E\bigl(\widehat{E\epsilon^2}^{(\mathrm{adj})} \bigm| X\ \text{process}\bigr) = E\epsilon^2 + \tfrac{1}{2}(n - \bar n)^{-1}\bigl([X,X]^{(\mathrm{all})}_T - [X,X]^{(\mathrm{avg})}_T\bigr)$, and thus is unbiased to high order. As for the asymptotic distribution, one can see from Theorem A.1 in the Appendix that
\begin{align*}
\widehat{E\epsilon^2}^{(\mathrm{adj})} - E\epsilon^2
&= \bigl(\widehat{E\epsilon^2} - E\epsilon^2\bigr)\bigl(1 + O(K^{-1})\bigr) + O_p\bigl(Kn^{-3/2}\bigr) \\
&= \widehat{E\epsilon^2} - E\epsilon^2 + O_p\bigl(n^{-1/2}K^{-1}\bigr) + O_p\bigl(Kn^{-3/2}\bigr) \\
&= \widehat{E\epsilon^2} - E\epsilon^2 + O_p\bigl(n^{-5/6}\bigr),
\end{align*}
from (58). It follows that $n^{1/2}(\widehat{E\epsilon^2} - E\epsilon^2)$ and $n^{1/2}(\widehat{E\epsilon^2}^{(\mathrm{adj})} - E\epsilon^2)$ have the same asymptotic distribution.
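As a concrete illustration, the adjusted two-scales estimator (64) can be sketched in a few lines of NumPy, under the equidistant, regular-allocation setup (function name and interface are ours):

```python
import numpy as np

def tsrv_adj(y, K):
    """Two-scales estimator with the small-sample adjustment (64):
    raw  = [Y,Y]^(avg)_T - (nbar/n) * [Y,Y]^(all)_T,
    adj  = (1 - nbar/n)^(-1) * raw,
    where subgrid k = 0,...,K-1 takes every K-th observation and
    nbar = (n - K + 1)/K is the average subgrid sample size."""
    y = np.asarray(y, dtype=float)
    n = y.size - 1
    yy_all = np.sum(np.diff(y) ** 2)                   # [Y,Y]^(all)_T
    yy_avg = np.mean([np.sum(np.diff(y[k::K]) ** 2)    # [Y,Y]^(avg)_T
                      for k in range(K)])
    nbar = (n - K + 1) / K
    return (yy_avg - (nbar / n) * yy_all) / (1.0 - nbar / n)
```

On data where the noise bias $2nE\epsilon^2$ is comparable to $\langle X,X\rangle_T$, the naive realized volatility is dominated by the noise, while the adjusted estimator remains centered near the integrated volatility.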

5. EXTENSION TO MULTIPLE PERIOD INFERENCE

For a given family $\mathcal{A} = \{G^{(k)}, k = 1, \dots, K\}$, we denote
\[
\widehat{\langle X,X\rangle}_t = [Y,Y]^{(\mathrm{avg})}_t - \frac{\bar n}{n}\,[Y,Y]^{(\mathrm{all})}_t, \tag{66}
\]
where, as usual, $[Y,Y]^{(\mathrm{all})}_t = \sum_{t_{i+1}\le t}(\Delta Y_{t_i})^2$ and $[Y,Y]^{(\mathrm{avg})}_t = \frac{1}{K}\sum_{k=1}^K [Y,Y]^{(k)}_t$, with
\[
[Y,Y]^{(k)}_t = \sum_{t_i,\,t_{i+}\in G^{(k)},\ t_{i+}\le t}\bigl(Y_{t_{i+}} - Y_{t_i}\bigr)^2.
\]

To estimate $\langle X,X\rangle$ for several discrete time periods, say $[0,T_1], [T_1,T_2], \dots, [T_{M-1},T_M]$, where M is fixed, this amounts to estimating $\langle X,X\rangle_{T_m} - \langle X,X\rangle_{T_{m-1}} = \int_{T_{m-1}}^{T_m}\sigma_u^2\,du$ for $m = 1, \dots, M$, and the obvious estimator is $\widehat{\langle X,X\rangle}_{T_m} - \widehat{\langle X,X\rangle}_{T_{m-1}}$.

To carry out the asymptotics, let $n_m$ be the number of points in the mth time segment, and similarly let $K_m = c_m n_m^{2/3}$, where $c_m$ is a constant. Then $n_m^{1/6}\bigl(\widehat{\langle X,X\rangle}_{T_m} - \widehat{\langle X,X\rangle}_{T_{m-1}} - \int_{T_{m-1}}^{T_m}\sigma_u^2\,du\bigr)$, $m = 1, \dots, M$, converge stably to $\bigl(8c_m^{-2}(E\epsilon^2)^2 + c_m\eta_m^2(T_m - T_{m-1})\bigr)^{1/2} Z_m$, where the $Z_m$ are iid standard normals, independent of the underlying process, and $\eta_m^2$ is the limit $\eta^2$ (Thm. 3) for time period m. In the case of equidistant $t_i$ and regular allocation of sample points to grids, $\eta_m^2 = \frac{4}{3}\int_{T_{m-1}}^{T_m}\sigma_u^4\,du$.

In other words, the one-period asymptotics generalize straightforwardly to the multiperiod case. This is because $\widehat{\langle X,X\rangle}_{T_m} - \widehat{\langle X,X\rangle}_{T_{m-1}} - \int_{T_{m-1}}^{T_m}\sigma_u^2\,du$ has, to first order, a martingale structure. This can be seen in the Appendix.

An advantage of our proposed estimator is that if $\epsilon_{t_i}$ has different variance in different time segments, say $\operatorname{var}(\epsilon_{t_i}) = (E\epsilon^2)_m$ for $t_i \in (T_{m-1},T_m]$, then both consistency and asymptotic (mixed) normality continue to hold, provided that one replaces $E\epsilon^2$ by $(E\epsilon^2)_m$. This adds a measure of robustness to the procedure. If one were convinced that $E\epsilon^2$ were the same across time segments, then an alternative estimator would have the form
\[
\widehat{\langle X,X\rangle}_t = [Y,Y]^{(\mathrm{avg})}_t - \left(\frac{1}{K}\#\{t_{i+1}\le t\} - 1\right)\frac{1}{n}\,[Y,Y]^{(\mathrm{all})}_T \qquad \text{for } t = T_1, \dots, T_M. \tag{67}
\]

1404 Journal of the American Statistical Association December 2005

However, the errors $\widehat{\langle X,X\rangle}_{T_m} - \widehat{\langle X,X\rangle}_{T_{m-1}} - \int_{T_{m-1}}^{T_m}\sigma_u^2\,du$ in this case are not asymptotically independent. Note that for $T = T_m$, both candidates (66) and (67) for $\widehat{\langle X,X\rangle}_t$ coincide with the quantity in (55).

6. ESTIMATING THE ASYMPTOTIC VARIANCE OF $\widehat{\langle X,X\rangle}_T$

In the one-period case, the main goal is to estimate the asymptotic variance $s^2 = 8c^{-2}(E\epsilon^2)^2 + c\eta^2 T$ [cf. (58) and (59)]. The multigrid case is a straightforward generalization, as indicated in Section 5.

Here we are concerned only with the case where the points $t_i$ are equally spaced ($\Delta t_i = \Delta t$) and are regularly allocated to the grids $\mathcal{A}_1 = \{G^{(k)}, k = 1, \dots, K_1\}$. A richer set of ingredients is required to find the spread than to just estimate $\langle X,X\rangle_T$. To implement the estimator, we create an additional family $\mathcal{A}_2 = \{G^{(k,i)}, k = 1, \dots, K_1,\ i = 1, \dots, I\}$ of grids, where $G^{(k,i)}$ contains every Ith point of $G^{(k)}$, starting with the ith point. We assume that $K_1 \sim c_1 n^{2/3}$. The new family then consists of $K_2 \sim c_2 n^{2/3}$ grids, where $c_2 = c_1 I$.

In addition, we need to divide the time line into segments $(T_{m-1},T_m]$, where $T_m = \frac{m}{M}T$. For the purposes of this discussion, M is large but finite. We now get an initial estimator of spread as
\[
\hat s_0^2 = n^{1/3}\sum_{m=1}^M \Bigl(\widehat{\langle X,X\rangle}^{K_1}_{T_m} - \widehat{\langle X,X\rangle}^{K_1}_{T_{m-1}} - \bigl(\widehat{\langle X,X\rangle}^{K_2}_{T_m} - \widehat{\langle X,X\rangle}^{K_2}_{T_{m-1}}\bigr)\Bigr)^2,
\]
where $\widehat{\langle X,X\rangle}^{K_i}_t$ is the estimator (66) using grid family $i$, $i = 1, 2$. Using the discussion in Section 5, we can show that

\[
\hat s_0^2 \approx s_0^2, \tag{68}
\]
where, for $c_1 \ne c_2$ (i.e., $I \ne 1$),
\begin{align*}
s_0^2 &= 8(E\epsilon^2)^2\bigl(c_1^{-2} + c_2^{-2} - c_1^{-1}c_2^{-1}\bigr) + \bigl(c_1^{1/2} - c_2^{1/2}\bigr)^2 T\eta^2 \\
&= 8(E\epsilon^2)^2 c_1^{-2}\bigl(1 + I^{-2} - I^{-1}\bigr) + c_1\bigl(I^{1/2} - 1\bigr)^2 T\eta^2. \tag{69}
\end{align*}
In (68), the symbol "$\approx$" denotes first convergence in law as $n \to \infty$, and then a limit in probability as $M \to \infty$. Because $E\epsilon^2$ can be estimated by $\widehat{E\epsilon^2} = [Y,Y]^{(\mathrm{all})}/2n$, we can put hats on $s_0^2$, $(E\epsilon^2)^2$, and $\eta^2$ in (69) to obtain an estimator of $\eta^2$. Similarly,
\begin{align*}
s^2 &= 8(E\epsilon^2)^2\left(c^{-2} - \frac{c\bigl(c_1^{-2} + c_2^{-2} - c_1^{-1}c_2^{-1}\bigr)}{\bigl(c_1^{1/2} - c_2^{1/2}\bigr)^2}\right) + \frac{c}{\bigl(c_1^{1/2} - c_2^{1/2}\bigr)^2}\,s_0^2 \\
&= 8\left(c^{-2} - c\,c_1^{-3}\,\frac{I^{-2} - I^{-1} + 1}{\bigl(I^{1/2} - 1\bigr)^2}\right)(E\epsilon^2)^2 + \frac{c}{c_1}\,\frac{1}{\bigl(I^{1/2} - 1\bigr)^2}\,s_0^2, \tag{70}
\end{align*}
where $c \sim Kn^{-2/3}$, K being the number of grids used originally to estimate $\widehat{\langle X,X\rangle}_T$.

Table 1. Coefficients of $(E\epsilon^2)^2$ and $s_0^2$ When $c_1 = c$

  I   coeff($s_0^2$)   coeff($(E\epsilon^2)^2$)
  3      1.866           $-3.611c^{-2}$
  4      1.000           $1.5000c^{-2}$

Normally, one would take $c_1 = c$. Hence an estimator $\hat s^2$ can be found from $\hat s_0^2$ and $\widehat{E\epsilon^2}$. When $c_1 = c$, we argue that the optimal choice is $I = 3$ or $4$, as follows. The coefficients in (70) become
\[
\operatorname{coeff}\bigl(s_0^2\bigr) = \bigl(I^{1/2} - 1\bigr)^{-2}
\]
and
\[
\operatorname{coeff}\bigl((E\epsilon^2)^2\bigr) = 8c^{-2}\bigl(I^{1/2} - 1\bigr)^{-2} f(I),
\]
where $f(I) = I - 2I^{1/2} - I^{-2} + I^{-1}$. For $I \ge 2$, $f(I)$ is increasing, and $f(I)$ crosses 0 for I between 3 and 4. These, therefore, are the two integer values of I that give the lowest ratio of $\operatorname{coeff}((E\epsilon^2)^2)/\operatorname{coeff}(s_0^2)$. Using $I = 3$ or $4$, therefore, would maximally insulate against $(E\epsilon^2)^2$ dominating over $s_0^2$. This is desirable, because $\hat s_0^2$ is the estimator carrying the information about $\eta^2$. Numerical values for the coefficients are given in Table 1. If c were such that $(E\epsilon^2)^2$ still overwhelms $s_0^2$, then a choice of $c_1 \ne c$ should be considered.

7. MONTE CARLO EVIDENCE

In this article we have discussed five approaches to dealing with the impact of market microstructure noise on realized volatility. In this section we examine the performance of each approach in simulations and compare the results with those predicted by the aforementioned asymptotic theory.

7.1 Simulations Design

We use the stochastic volatility model of Heston (1993) as our data-generating process:
\[
dX_t = (\mu - v_t/2)\,dt + \sigma_t\,dB_t \tag{71}
\]
and
\[
dv_t = \kappa(\alpha - v_t)\,dt + \gamma v_t^{1/2}\,dW_t, \tag{72}
\]
where $v_t = \sigma_t^2$. We assume that the parameters $(\mu, \kappa, \alpha, \gamma)$ and $\rho$, the correlation coefficient between the two Brownian motions B and W, are constant in this model. We also assume Feller's condition $2\kappa\alpha \ge \gamma^2$, to make the zero boundary unattainable by the volatility process.

We simulate M = 25,000 sample paths of the process using the Euler scheme at a time interval $\Delta t = 1$ second, and set the parameter values at levels reasonable for a stock: $\mu = .05$, $\kappa = 5$, $\alpha = .04$, $\gamma = .5$, and $\rho = -.5$. As for the market microstructure noise $\epsilon$, we assume that it is Gaussian and small. Specifically, we set $(E\epsilon^2)^{1/2} = .0005$ (i.e., the standard deviation of the noise is .05% of the value of the asset price). We purposefully set this value to be smaller than the values calibrated using the empirical market microstructure literature.

On each simulated sample path, we estimate $\langle X,X\rangle_T$ over T = 1 day (i.e., T = 1/252, because the parameter values are all annualized), using the five estimation strategies described earlier: $[Y,Y]^{(\mathrm{all})}_T$, $[Y,Y]^{(\mathrm{sparse})}_T$, $[Y,Y]^{(\mathrm{sparse,opt})}_T$, $[Y,Y]^{(\mathrm{avg})}_T$, and, finally, $\widehat{\langle X,X\rangle}^{(\mathrm{adj})}_T$. We assume that a day consists of 6.5 hours of open trading, as is the case on the NYSE and NASDAQ. For the fourth-best estimator, we represent this approach using sparse sampling at a frequency of once every 5 minutes.
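A minimal sketch of this simulation design, using an Euler scheme for (71)-(72) with correlated Brownian increments (the starting value $v_0 = \alpha$ and the clipping of negative Euler variances are our choices, not specified above):

```python
import numpy as np

def heston_euler(T=1/252, dt=1/(252 * 23400), mu=0.05, kappa=5.0,
                 alpha=0.04, gamma=0.5, rho=-0.5, v0=0.04, rng=None):
    """Euler discretization of (71)-(72) with corr(B, W) = rho.
    Defaults follow the Section 7.1 design (dt = 1 second, annualized).
    Returns the log-price path x and the variance path v."""
    rng = np.random.default_rng() if rng is None else rng
    n = int(round(T / dt))
    x = np.zeros(n + 1)
    v = np.full(n + 1, v0)
    sqdt = np.sqrt(dt)
    for i in range(n):
        db = rng.normal(0.0, sqdt)
        dw = rho * db + np.sqrt(1.0 - rho ** 2) * rng.normal(0.0, sqdt)
        vi = max(v[i], 0.0)                    # guard against Euler negativity
        x[i + 1] = x[i] + (mu - vi / 2.0) * dt + np.sqrt(vi) * db
        v[i + 1] = v[i] + kappa * (alpha - vi) * dt + gamma * np.sqrt(vi) * dw
    return x, v
```

Note that Feller's condition holds for these defaults: $2\kappa\alpha = .4 \ge \gamma^2 = .25$.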

7.2 Simulations Results

Table 2 reports the simulation results for the five estimation strategies. For each estimator, the bias is calculated as $E[\text{estimator} - \langle X,X\rangle_T]$. The variance row reports the quantity
\[
E\left[\operatorname{var}\left(\text{estimator} - \langle X,X\rangle_T \Bigm| \int_0^T \sigma_t^2\,dt,\ \int_0^T \sigma_t^4\,dt\right)\right]. \tag{73}
\]
The rows marked "relative" report the corresponding results in percentage terms, that is, for
\[
\frac{\text{estimator} - \langle X,X\rangle_T}{\langle X,X\rangle_T}.
\]
Note that although the convergence of the discretization error is stable, as opposed to conditional, (73) seems like a reasonable stand-in for the theoretical quantity under consideration.

In each case, "small sample" represents the average value over the M simulated paths, and "asymptotic" refers to the value predicted by our theory. The noise parts of the asymptotic variances are taken to include the relevant second-order terms, as given by (20), (37), and (60), yielding spreads like (in the latter case) (61). The small-sample variance is computed by binning the results of the simulations according to the values of the pair $(\int_0^T \sigma_t^2\,dt, \int_0^T \sigma_t^4\,dt)$, calculating the variance within each bin, and then averaging across the bins, weighted by the number of elements in each bin.

Comparing the rows showing small-sample results to their asymptotic predictions, we see that the asymptotic theory provides a good approximation to the first two moments of the small-sample distribution of the five estimators. Comparing across columns, we see that the differences between the five estimators are quite large. As expected, the naive fifth-best strategy produces disastrous results. The first-best strategy results in a large decrease of root MSE (RMSE) relative to the others, including, in particular, the fourth-best strategy, which is the approach currently used in most of the empirical literature. But, at least for the parameter values used here, determining the optimal sampling frequency (the sparse optimal, or third-best, strategy) results in only a modest improvement over the arbitrary selection of a sampling frequency (the sparse, or fourth-best, strategy). Using all of the data, as in the subsampled and averaged estimator (second-best), but especially the bias correction by combining two time scales, as in our first-best approach, is where the major practical improvements reside.

It may seem surprising that the variance of the estimator is reduced so much from the second-best to the first-best estimator. After all, if one sets $c = Kn^{-2/3}$, then the asymptotic normal term in (7) becomes
\[
\frac{1}{n^{1/6}}\Biggl[\underbrace{\frac{4}{c^2}E\epsilon^4}_{\text{due to noise}} + \underbrace{c\,\frac{4T}{3}\int_0^T\sigma_t^4\,dt}_{\text{due to discretization}}\Biggr]^{1/2}_{\text{total variance}} Z_{\text{total}}.
\]

This is quite similar to the error term in (9). However, the c for the second-best estimator is chosen by (8), whereas for the first-best case it is given by (10). The two optima are very different, because for the second-best estimator c is chosen by a bias-variance trade-off, whereas for the first-best estimator it is chosen to minimize the asymptotic variance. From this standpoint, it is quite reasonable for the first-best estimator to also have substantially better asymptotic variance. In fact, from (8) and (10), it is easy to see that, for the optimal choices,
\[
c^*_{\text{second best}} = n^{1/3}\,2^{-1/3}\left(\frac{E\epsilon^4}{(E\epsilon^2)^2}\right)^{1/3} c^*_{\text{first best}},
\]
whence the total variance is in fact quite different for the two estimators.

Figure 1 shows the small-sample and asymptotic RMSE of the $[Y,Y]^{(\mathrm{sparse})}_T$ estimator as a function of the subsampling interval, illustrating the minimization problem studied in Section 2.3. Finally, Figure 2 shows the standardized distribution of the first-best estimator obtained from the simulations (histogram), together with the corresponding asymptotic distribution predicted by our theory (solid line). Due to the effect of bias, the distributions of the fifth- to second-best estimators are irrelevant, because they cannot be used to set intervals for the volatility.

8. DISCUSSION

In this work we have quantified and corrected the effect of noise on the nonparametric assessment of integrated volatility. In the setting of high-frequency data, the usual financial practice is to use sparse sampling, in other words, throwing

Table 2. Monte Carlo Simulations for the Five Estimation Strategies

                                     Fifth-best           Fourth-best            Third-best                   Second-best          First-best
                                     $[Y,Y]^{(all)}_T$    $[Y,Y]^{(sparse)}_T$   $[Y,Y]^{(sparse,opt)}_T$     $[Y,Y]^{(avg)}_T$    $\widehat{\langle X,X\rangle}^{(adj)}_T$
Small-sample bias                    1.1699e-2            3.89e-5                2.18e-5                      1.926e-5             2e-8
Asymptotic bias                      1.1700e-2            3.90e-5                2.20e-5                      1.927e-5             0
Small-sample variance                1.791e-8             1.4414e-9              1.59e-9                      9.41e-10             9e-11
Asymptotic variance                  1.788e-8             1.4409e-9              1.58e-9                      9.37e-10             8e-11
Small-sample RMSE                    1.1699e-2            5.437e-5               4.543e-5                     3.622e-5             9.4e-6
Asymptotic RMSE                      1.1700e-2            5.442e-5               4.546e-5                     3.618e-5             8.9e-6
Small-sample relative bias (%)       182                  6.1                    1.8                          1.5                  -.00045
Small-sample relative variance       82,502               115                    11                           5.3                  .43
Small-sample relative RMSE (%)       340                  12.4                   3.7                          2.8                  .65


Figure 1. RMSE of the Sparse Estimator as a Function of the Subsampling Frequency (small-sample and asymptotic RMSE).

away most of the available data. We have argued that this practice is caused by not incorporating the noise in the model. Although it is statistically unsound to throw away data, we have shown that it is possible to build on this practice to construct estimators that make statistical sense.

Specifically, we have found that the usual realized volatility mainly estimates the magnitude of the noise term rather than anything to do with volatility. An approach built on separating the observations into multiple "grids" lessens this problem. We found that the best results can be obtained by combining the usual ("single-grid") realized volatility with the multiple-grid-based device. This gives an estimator that is approximately unbiased; we have also shown how to assess the (random) variance of this estimator. We also show that, for our most recommended procedure, the optimal number of multiple grids is of order $O(n^{2/3})$ (see Sec. 4). Most of the development is in the context of finding the integrated volatility over one time period; at the end, we extend this to multiple periods. Also, in the case where the noise can be taken to be almost negligible, we provide a way of optimizing the sampling frequency if one wishes to use the classical "realized volatility" or its multigrid extension.

One important message of the article is that any time one has an impulse to sample sparsely, one can always do better with a multigrid method, regardless of the model or the quantity being estimated.

Figure 2 Asymptotic and Small-Sample Standardized Distributions of the First-Best Estimator


Our results in this article cover the case where the latent X follows the Itô process model (1). Theorem 1 remains true under the assumption that X is a general semimartingale, that is, a process X that can also have jumps. This follows because the proof of Lemma A.2 remains true for general semimartingale X (Jacod and Shiryaev 2003, thm. I.4.47, p. 52). Also, apart from using Lemma A.2, Theorem A.1 is unrelated to the nature of the X process. As far as the effect of jumps on the discretization effect is concerned, it should be noted that consistency holds, as follows. When X is a general semimartingale, then for any sequence of grids $\mathcal{H}_n$ such that (41) holds, $[X,X]^{\mathcal{H}_n}$ converges to the continuous-time quadratic variation of X, that is,
\[
\int_0^T \sigma_t^2\,dt + \sum_{0 \le t \le T}(\text{jump of } X \text{ at } t)^2 \tag{74}
\]
(cf. Jacod and Shiryaev 2003, thm. I.4.47, p. 52). In particular, this also applies to subgrids under the regularity conditions that we have discussed. Furthermore, by extension of the same proof, $[X,X]^{(\mathrm{avg})}$ also converges to (74). From the same argument, so does $\widehat{\langle X,X\rangle}_T$. In this sense, the two-scales estimator is robust to jumps.

APPENDIX: PROOFS OF RESULTS

When the total grid $\mathcal{G}$ is considered, we use $\sum_{i=1}^{n-1}$, $\sum_{t_{i+1}\le T}$, and $\sum_{t_i\in\mathcal{G}}$ interchangeably in the following proofs. Also, we write "$|X$" to indicate expressions that are conditional on the entire X process.

A.1 Variance of $[Y,Y]^{(\mathrm{all})}_T$ Given the X Process

Here we calculate explicitly the variance in (19), from which the stated approximation follows. The explicit remainder term is also used for (36). Let a partition of $[0,T]$ be $0 = t_0 \le t_1 \le \cdots \le t_n = T$. Under assumption (11),
\[
\operatorname{var}\bigl([Y,Y]^{(\mathrm{all})}_T \bigm| X\bigr)
= \operatorname{var}\Bigl[\sum_{t_{i+1}\le T}\bigl(\Delta Y_{t_i}\bigr)^2 \Bigm| X\Bigr]
= \underbrace{\sum_{t_{i+1}\le T}\operatorname{var}\bigl[\bigl(\Delta Y_{t_i}\bigr)^2 \bigm| X\bigr]}_{I_T}
+ 2\underbrace{\sum_{t_{i+1}\le T}\operatorname{cov}\bigl[\bigl(\Delta Y_{t_{i-1}}\bigr)^2, \bigl(\Delta Y_{t_i}\bigr)^2 \bigm| X\bigr]}_{II_T},
\]
because $\Delta Y_{t_i} = \Delta X_{t_i} + \Delta\epsilon_{t_i}$ is 1-dependent given the X process. Now
\begin{align*}
\operatorname{var}\bigl[(\Delta Y_{t_i})^2 \bigm| X\bigr]
&= \kappa_4\bigl(\Delta Y_{t_i} \bigm| X\bigr) + 2\bigl[\operatorname{var}(\Delta Y_{t_i} | X)\bigr]^2 + 4\bigl[E(\Delta Y_{t_i} | X)\bigr]^2\operatorname{var}(\Delta Y_{t_i} | X) + 4E(\Delta Y_{t_i} | X)\,\kappa_3(\Delta Y_{t_i} | X) \\
&= \kappa_4(\Delta\epsilon_{t_i}) + 2\bigl[\operatorname{var}(\Delta\epsilon_{t_i})\bigr]^2 + 4(\Delta X_{t_i})^2\operatorname{var}(\Delta\epsilon_{t_i}) + 4(\Delta X_{t_i})\kappa_3(\Delta\epsilon_{t_i}) \quad \text{(under assumption (11))} \\
&= 2\kappa_4(\epsilon) + 8(E\epsilon^2)^2 + 8(\Delta X_{t_i})^2 E\epsilon^2,
\end{align*}
because $\kappa_3(\Delta\epsilon_{t_i}) = 0$. The $\kappa$'s are the cumulants of the relevant order:
\[
\kappa_1(\epsilon) = 0, \quad \kappa_2(\epsilon) = \operatorname{var}(\epsilon) = E(\epsilon^2), \quad \kappa_3(\epsilon) = E\epsilon^3, \quad \text{and} \quad \kappa_4(\epsilon) = E(\epsilon^4) - 3(E\epsilon^2)^2. \tag{A.1}
\]
So $I_T = n\bigl(2\kappa_4(\epsilon) + 8(E\epsilon^2)^2\bigr) + 8[X,X]^{(\mathrm{all})}_T E\epsilon^2$. Similarly, for the covariance,
\begin{align*}
\operatorname{cov}\bigl[(\Delta Y_{t_{i-1}})^2, (\Delta Y_{t_i})^2 \bigm| X\bigr]
&= \operatorname{cov}\bigl[(\Delta\epsilon_{t_{i-1}})^2, (\Delta\epsilon_{t_i})^2\bigr] + 4\Delta X_{t_{i-1}}\Delta X_{t_i}\operatorname{cov}\bigl(\Delta\epsilon_{t_{i-1}}, \Delta\epsilon_{t_i}\bigr) \\
&\quad + 2\Delta X_{t_{i-1}}\operatorname{cov}\bigl[\Delta\epsilon_{t_{i-1}}, (\Delta\epsilon_{t_i})^2\bigr] + 2\Delta X_{t_i}\operatorname{cov}\bigl[(\Delta\epsilon_{t_{i-1}})^2, \Delta\epsilon_{t_i}\bigr] \\
&= \kappa_4(\epsilon) + 2(E\epsilon^2)^2 - 4\Delta X_{t_{i-1}}\Delta X_{t_i}\kappa_2(\epsilon) - 2\Delta X_{t_i}\kappa_3(\epsilon) + 2\Delta X_{t_{i-1}}\kappa_3(\epsilon), \tag{A.2}
\end{align*}
because of (A.1). Thus, summing the terms in (A.2),
\[
II_T = 2(n-1)\bigl(\kappa_4(\epsilon) + 2(E\epsilon^2)^2\bigr) - 8E\epsilon^2\sum_{t_{i+1}\le T}\Delta X_{t_{i-1}}\Delta X_{t_i} - 4\kappa_3(\epsilon)\bigl(\Delta X_{t_{n-1}} - \Delta X_{t_0}\bigr).
\]
Amalgamating the two expressions, one obtains
\begin{align*}
\operatorname{var}\bigl([Y,Y]^{(\mathrm{all})}_T \bigm| X\bigr)
&= n\bigl(2\kappa_4(\epsilon) + 8(E\epsilon^2)^2\bigr) + 8[X,X]^{(\mathrm{all})}_T E\epsilon^2 + 2(n-1)\bigl(\kappa_4(\epsilon) + 2(E\epsilon^2)^2\bigr) \\
&\quad - 8E\epsilon^2\sum\Delta X_{t_{i-1}}\Delta X_{t_i} - 4\kappa_3(\epsilon)\bigl(\Delta X_{t_{n-1}} - \Delta X_{t_0}\bigr) \\
&= 4nE\epsilon^4 + R_n, \tag{A.3}
\end{align*}
where the remainder term $R_n$ satisfies
\begin{align*}
|R_n| &\le 8E\epsilon^2[X,X]_T + 2\bigl(\kappa_4(\epsilon) + 2(E\epsilon^2)^2\bigr) + 8E\epsilon^2\sum\bigl|\Delta X_{t_{i-1}}\bigr|\bigl|\Delta X_{t_i}\bigr| + 4|\kappa_3(\epsilon)|\bigl(\bigl|\Delta X_{t_{n-1}}\bigr| + \bigl|\Delta X_{t_0}\bigr|\bigr) \\
&\le 16E\epsilon^2[X,X]^{(\mathrm{all})}_T + 2\bigl(\kappa_4(\epsilon) + 2(E\epsilon^2)^2\bigr) + 2|\kappa_3(\epsilon)|\bigl(2 + [X,X]_T\bigr), \tag{A.4}
\end{align*}
by the Cauchy-Schwarz inequality and because $|x| \le (1 + x^2)/2$. Because $[X,X]^{(\mathrm{all})}_T = O_p(1)$, (19) follows.

Under slightly stronger conditions (e.g., $|\mu_t|$ and $\sigma_t$ bounded above by a constant), $\sum\Delta X_{t_{i-1}}\Delta X_{t_i}$ is a near-martingale and of order $O_p(n^{-1/2})$, and similarly $\Delta X_{t_{n-1}} - \Delta X_{t_0} = O_p(n^{-1/2})$, from which (20) follows.

Similarly, to obtain a higher-order approximation to the variance of the estimator $\widehat{\langle X,X\rangle}_T$, note that
\[
\widehat{\langle X,X\rangle}_T - \langle X,X\rangle_T = \Bigl([Y,Y]^{(\mathrm{avg})}_T - \frac{1}{K}[Y,Y]^{(\mathrm{all})}_T - [X,X]^{(\mathrm{avg})}_T\Bigr) + \Bigl([X,X]^{(\mathrm{avg})}_T - \langle X,X\rangle_T\Bigr). \tag{A.5}
\]
Now, recalling (20) and (37), and, in addition,
\[
\operatorname{cov}\bigl([Y,Y]^{(\mathrm{all})}, [Y,Y]^{(\mathrm{avg})} \bigm| X\bigr)
= 2\bigl(E\epsilon^4 - (E\epsilon^2)^2\bigr)\Bigl(2\bar n - \frac{1}{K}\Bigr) + 8E\epsilon^2[X,X]^{(\mathrm{all})}_T\,\frac{1}{K} + o_p\Bigl(\frac{1}{K}\Bigr),
\]
we see that
\begin{align*}
\operatorname{var}\bigl(\widehat{\langle X,X\rangle}_T \bigm| X\bigr)
&= \operatorname{var}\bigl([Y,Y]^{(\mathrm{avg})}_T \bigm| X\bigr) + \frac{1}{K^2}\operatorname{var}\bigl([Y,Y]^{(\mathrm{all})}_T \bigm| X\bigr) - \frac{2}{K}\operatorname{cov}\bigl([Y,Y]^{(\mathrm{avg})}_T, [Y,Y]^{(\mathrm{all})}_T \bigm| X\bigr) \\
&= \frac{4\bar n}{K}E\epsilon^4 + \frac{1}{K}\bigl[8[X,X]^{(\mathrm{avg})}_T E\epsilon^2 - 2\bigl(E\epsilon^4 - (E\epsilon^2)^2\bigr)\bigr] + o_p\Bigl(\frac{1}{K}\Bigr) \\
&\quad + \frac{1}{K^2}\bigl[4nE\epsilon^4 + \bigl(8[X,X]^{(\mathrm{all})}_T E\epsilon^2 - 2(E\epsilon^4 - (E\epsilon^2)^2)\bigr) + o_p(1)\bigr] \\
&\quad - \frac{2}{K}\Bigl[2\bigl(E\epsilon^4 - (E\epsilon^2)^2\bigr)\Bigl(2\bar n - \frac{1}{K}\Bigr) + 8E\epsilon^2[X,X]^{(\mathrm{all})}_T\frac{1}{K} + o_p\Bigl(\frac{1}{K}\Bigr)\Bigr] \\
&= \frac{\bar n}{K}\,8(E\epsilon^2)^2 + \frac{1}{K}\bigl[8[X,X]^{(\mathrm{avg})}_T E\epsilon^2 - 2\bigl(E\epsilon^4 - (E\epsilon^2)^2\bigr)\bigr] + o_p\Bigl(\frac{1}{K}\Bigr),
\end{align*}
because, at the optimal choice, $K = cn^{2/3}$ and $\bar n = c^{-1}n^{1/3}$. Therefore, (60) holds.
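The leading term $4nE\epsilon^4$ in (A.3) can be checked by simulation in the degenerate case where X is constant, so that $Y = \epsilon$; a sketch (Gaussian noise assumed, so $\kappa_4(\epsilon) = 0$):

```python
import numpy as np

# Monte Carlo check of (A.3) with X constant: the exact conditional variance
# is n(2 k4 + 8 q^2) + 2(n-1)(k4 + 2 q^2), q = E[eps^2]. For Gaussian eps
# with q = 1, k4 = 0 and this equals 12n - 4, leading term 4 n E[eps^4] = 12n.
rng = np.random.default_rng(1)
n, sims = 400, 4000
eps = rng.standard_normal((sims, n + 1))              # q = 1, E[eps^4] = 3
yy_all = np.sum(np.diff(eps, axis=1) ** 2, axis=1)    # [Y,Y]^(all) per path
mc_mean = yy_all.mean()                               # should be near 2n
mc_var = yy_all.var()                                 # should be near 12n - 4
exact = 12 * n - 4
```

The simulated mean and variance agree with $2nE\epsilon^2$ and $12n - 4$ up to Monte Carlo error.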

A.2 The Relevant Central Limit Theorem

Lemma A.2. Suppose that X is an Itô process. Suppose that Y is related to X through model (4). Then, under assumption (11) and definitions (16) and (32),
\[
[Y,Y]^{(\mathrm{all})}_T = [\epsilon,\epsilon]^{(\mathrm{all})}_T + O_p(1) \quad \text{and} \quad
[Y,Y]^{(\mathrm{avg})}_T = [\epsilon,\epsilon]^{(\mathrm{avg})}_T + [X,X]^{(\mathrm{avg})}_T + O_p\Bigl(\frac{1}{\sqrt K}\Bigr).
\]

Proof. (a) The one-grid case:
\[
[Y,Y]^{(\mathrm{all})}_T = [X,X]^{(\mathrm{all})}_T + [\epsilon,\epsilon]^{(\mathrm{all})}_T + 2[X,\epsilon]^{(\mathrm{all})}_T. \tag{A.6}
\]

T (A6)

We show that

E(([X ε](all)

T

)2∣∣X)= Op(1) (A7)

and in particular

[X ε](all)T = Op(1) (A8)

To see (A7)

[X ε](all)T =

nminus1sum

i=0

(Xti

)(εti

)

=nminus1sum

i=0

(Xti

)εti+1 minus

nminus1sum

i=0

(Xti

)εti

=nminus1sum

i=1

(Xtiminus1 minus Xti

)εti + Xtnminus1εtn minus Xt0εt0 (A9)

Because E([X ε](all)T |X) = 0 and εti iid for different ti we get

E(([X ε](all)

T

)2∣∣X) = var

([X ε](all)T

∣∣X)

= Eε2

[ nminus1sum

i=1

(Xtiminus1 minus Xti

)2 + X2tnminus1

+ X2t0

]

= 2[XX]T Eε2 minus 2Eε2nminus1sum

i=1

(Xtiminus1

)(Xti

)

le 4[XX]T Eε2 (A10)

by the CauchyndashSchwarz inequality from which and from [XX](all)

being of order Op(1) (A7) follows Hence (A8) follows by theMarkov inequality

(b) The multiple-grid case. Notice that
\[
[Y,Y]^{(\mathrm{avg})} = [X,X]^{(\mathrm{avg})}_T + [\epsilon,\epsilon]^{(\mathrm{avg})}_T + 2[X,\epsilon]^{(\mathrm{avg})}_T; \tag{A.11}
\]
(A.11) follows strictly from model (4) and the definitions of the grids and of $[\cdot,\cdot]^{(\mathrm{avg})}_t$; see Section 3.2.

We need to show that
\[
E\bigl(([X,\epsilon]^{(\mathrm{avg})}_T)^2 \bigm| X\bigr) = O_p\Bigl(\frac{1}{K}\Bigr), \tag{A.12}
\]
and, in particular,
\[
[X,\epsilon]^{(\mathrm{avg})}_T = O_p\Bigl(\frac{1}{K^{1/2}}\Bigr), \tag{A.13}
\]
and $\operatorname{var}\bigl([X,\epsilon]^{(\mathrm{avg})}_T \bigm| X\bigr) = E\bigl[([X,\epsilon]^{(\mathrm{avg})}_T)^2 \bigm| X\bigr]$. To show (A.12), note that $E\bigl([X,\epsilon]^{(\mathrm{avg})}_T \bigm| X\bigr) = 0$ and
\begin{align*}
E\bigl[([X,\epsilon]^{(\mathrm{avg})}_T)^2 \bigm| X\bigr] &= \operatorname{var}\bigl([X,\epsilon]^{(\mathrm{avg})}_T \bigm| X\bigr) = \frac{1}{K^2}\sum_{k=1}^K\operatorname{var}\bigl([X,\epsilon]^{(k)}_T \bigm| X\bigr) \\
&\le \frac{4E\epsilon^2}{K}[X,X]^{(\mathrm{avg})}_T = O_p\Bigl(\frac{1}{K}\Bigr),
\end{align*}
where the second equality follows from the disjointness of the different grids, as well as the independence of $\epsilon$ and X. The inequality follows from the same argument as in (A.10). Then the order follows because $[X,X]^{(\mathrm{avg})}_T = O_p(1)$; see the method of Mykland and Zhang (2002) for a rigorous development of the order of $[X,X]^{(\mathrm{avg})}_T$.

Theorem A.1. Suppose that X is an Itô process of form (1). Suppose that Y is related to X through model (4), and that (11) is satisfied with $E\epsilon^4 < \infty$. Also suppose that $t_i$ and $t_{i+1}$ are not in the same subgrid for any i. Under assumption (33), as $n \to \infty$,
\[
\Bigl(\sqrt n\bigl(\widehat{E\epsilon^2} - E\epsilon^2\bigr),\ \frac{K}{\sqrt n}\bigl([Y,Y]^{(\mathrm{avg})}_T - [X,X]^{(\mathrm{avg})}_T - 2\bar n E\epsilon^2\bigr)\Bigr)
\]
converges in law to a bivariate normal, with mean 0 and covariance matrix
\[
\begin{pmatrix} E\epsilon^4 & 2\operatorname{var}(\epsilon^2) \\ 2\operatorname{var}(\epsilon^2) & 4E\epsilon^4 \end{pmatrix}, \tag{A.14}
\]
conditional on the X process, where the limiting random variable is independent of the X process.

Proof. By Lemma A.2, we need the distribution of $[\epsilon,\epsilon]^{(\mathrm{avg})}$ and $[\epsilon,\epsilon]^{(\mathrm{all})}$. First, we explore the convergence of
\[
\frac{1}{\sqrt n}\Bigl([\epsilon,\epsilon]^{(\mathrm{all})}_T - 2nE\epsilon^2,\ K[\epsilon,\epsilon]^{(\mathrm{avg})}_T - 2\bar n K E\epsilon^2\Bigr). \tag{A.15}
\]
Recall that all of the sampling points $t_0, t_1, \dots, t_n$ are within $[0,T]$. We use $\mathcal{G}$ to denote the time points in the full sampling, as in the single grid; $G^{(k)}$ denotes the subsampling from the kth grid. As before, if $t_i \in G^{(k)}$, then $t_{i-}$ and $t_{i+}$ are the previous and next elements in $G^{(k)}$; $\epsilon_{t_{i-}} = 0$ for $t_i = \min G^{(k)}$, and $\epsilon_{t_{i+}} = 0$ for $t_i = \max G^{(k)}$.

Set
\begin{align*}
M^{(1)}_T &= \frac{1}{\sqrt n}\sum_{t_i\in\mathcal{G}}\bigl(\epsilon_{t_i}^2 - E\epsilon^2\bigr), \\
M^{(2)}_T &= \frac{1}{\sqrt n}\sum_{t_i\in\mathcal{G}}\epsilon_{t_i}\epsilon_{t_{i-1}}, \tag{A.16} \\
M^{(3)}_T &= \frac{1}{\sqrt n}\sum_{k=1}^K\sum_{t_i\in G^{(k)}}\epsilon_{t_i}\epsilon_{t_{i-}}.
\end{align*}


We first find the asymptotic distribution of $(M^{(1)}_T, M^{(2)}_T, M^{(3)}_T)$, using the martingale central limit theorem. Then we use the result to find the limit of (A.15).

Note that $(M^{(1)}_T, M^{(2)}_T, M^{(3)}_T)$ are the end points of martingales with respect to the filtration $\mathcal{F}_i = \sigma(\epsilon_{t_j}, j \le i; X_t, \text{all } t)$. We now derive their (discrete-time) predictable quadratic variations $\langle M^{(l)}, M^{(k)}\rangle$, $l,k = 1,2,3$ [discrete-time predictable quadratic variations are only used in this proof and the proof of Lemma 1, and are different from the continuous-time quadratic variations in (12)]:
\begin{align*}
\langle M^{(1)}, M^{(1)}\rangle_T &= \frac{1}{n}\sum_{t_i\in\mathcal{G}}\operatorname{var}\bigl(\epsilon_{t_i}^2 - E\epsilon^2 \bigm| \mathcal{F}_{i-1}\bigr) = \operatorname{var}(\epsilon^2), \\
\langle M^{(2)}, M^{(2)}\rangle_T &= \frac{1}{n}\sum_{t_i\in\mathcal{G}}\operatorname{var}\bigl(\epsilon_{t_i}\epsilon_{t_{i-1}} \bigm| \mathcal{F}_{i-1}\bigr) = \frac{E\epsilon^2}{n}\sum_{t_i\in\mathcal{G}}\epsilon_{t_{i-1}}^2 = (E\epsilon^2)^2 + o_p(1), \tag{A.17} \\
\langle M^{(3)}, M^{(3)}\rangle_T &= \frac{1}{n}\sum_{k=1}^K\sum_{t_i\in G^{(k)}}\operatorname{var}\bigl(\epsilon_{t_i}\epsilon_{t_{i-}} \bigm| \mathcal{F}_{i-1}\bigr) = \frac{E\epsilon^2}{n}\sum_{k=1}^K\sum_{t_i\in G^{(k)}}\epsilon_{t_{i-}}^2 = (E\epsilon^2)^2 + o_p(1),
\end{align*}
by the law of large numbers.

Similarly, for the predictable quadratic covariations,
\begin{align*}
\langle M^{(1)}, M^{(2)}\rangle_T &= \frac{1}{n}\sum_{t_i\in\mathcal{G}}\operatorname{cov}\bigl(\epsilon_{t_i}^2 - E\epsilon^2,\ \epsilon_{t_i}\epsilon_{t_{i-1}} \bigm| \mathcal{F}_{i-1}\bigr) = E\epsilon^3\,\frac{1}{n}\sum_{t_i\in\mathcal{G}}\epsilon_{t_{i-1}} = o_p(1), \\
\langle M^{(1)}, M^{(3)}\rangle_T &= \frac{1}{n}\sum_{k=1}^K\sum_{t_i\in G^{(k)}}\operatorname{cov}\bigl(\epsilon_{t_i}^2 - E\epsilon^2,\ \epsilon_{t_i}\epsilon_{t_{i-}} \bigm| \mathcal{F}_{i-1}\bigr) = E\epsilon^3\,\frac{1}{n}\sum_{k=1}^K\sum_{t_i\in G^{(k)}}\epsilon_{t_{i-}} = o_p(1), \tag{A.18} \\
\langle M^{(2)}, M^{(3)}\rangle_T &= \frac{1}{n}\sum_{k=1}^K\sum_{t_i\in G^{(k)}}\operatorname{cov}\bigl(\epsilon_{t_i}\epsilon_{t_{i-1}},\ \epsilon_{t_i}\epsilon_{t_{i-}} \bigm| \mathcal{F}_{i-1}\bigr) = \frac{E\epsilon^2}{n}\sum_{k=1}^K\sum_{t_i\in G^{(k)}}\epsilon_{t_{i-1}}\epsilon_{t_{i-}} = o_p(1),
\end{align*}
because $t_{i+1}$ is not in the same grid as $t_i$.

Because the $\epsilon_{t_i}$'s are iid and $E\epsilon_{t_i}^4 < \infty$, the conditional Lindeberg conditions are satisfied. Hence, by the martingale central limit theorem (see Hall and Heyde 1980, condition 3.1, p. 58), $(M^{(1)}, M^{(2)}, M^{(3)})$ are asymptotically normal, with covariance matrix given by the asymptotic values of the $\langle M^{(l)}, M^{(k)}\rangle$. In other words, asymptotically, $(M^{(1)}, M^{(2)}, M^{(3)})$ are independent normal with respective variances $\operatorname{var}(\epsilon^2)$, $(E\epsilon^2)^2$, and $(E\epsilon^2)^2$.

Returning to (A.15), we have that
\begin{align*}
[\epsilon,\epsilon]^{(\mathrm{all})} - 2nE\epsilon^2
&= 2\sum_{i\ne 0,n}\bigl(\epsilon_{t_i}^2 - E\epsilon^2\bigr) + \bigl(\epsilon_{t_0}^2 - E\epsilon^2\bigr) + \bigl(\epsilon_{t_n}^2 - E\epsilon^2\bigr) - 2\sum_{t_i>0}\epsilon_{t_i}\epsilon_{t_{i-1}} \\
&= 2\sqrt n\bigl(M^{(1)} - M^{(2)}\bigr) + O_p(1). \tag{A.19}
\end{align*}
Meanwhile,
\begin{align*}
[\epsilon,\epsilon]^{(k)} - 2n_k E\epsilon^2
&= \sum_{t_i\in G^{(k)},\,t_i\ne\max G^{(k)}}\bigl(\epsilon_{t_{i+}} - \epsilon_{t_i}\bigr)^2 - 2n_k E\epsilon^2 \\
&= 2\sum_{t_i\in G^{(k)}}\bigl(\epsilon_{t_i}^2 - E\epsilon^2\bigr) - \bigl(\epsilon_{\min G^{(k)}}^2 - E\epsilon^2\bigr) - \bigl(\epsilon_{\max G^{(k)}}^2 - E\epsilon^2\bigr) - 2\sum_{t_i\in G^{(k)}}\epsilon_{t_i}\epsilon_{t_{i-}}, \tag{A.20}
\end{align*}
where $n_k + 1$ represents the total number of sampling points in $G^{(k)}$. Hence
\[
K[\epsilon,\epsilon]^{(\mathrm{avg})}_T - 2\bar n K E\epsilon^2 = \sqrt n\bigl(2M^{(1)} - 2M^{(3)}\bigr) - R = 2\sqrt n\bigl(M^{(1)} - M^{(3)}\bigr) + O_p\bigl(K^{1/2}\bigr), \tag{A.21}
\]
because $R = \sum_{k=1}^K\bigl[(\epsilon_{\min G^{(k)}}^2 - E\epsilon^2) + (\epsilon_{\max G^{(k)}}^2 - E\epsilon^2)\bigr]$ satisfies $ER^2 = \operatorname{var}(R) \le 4K\operatorname{var}(\epsilon^2)$.

Because $n^{-1}K \to 0$, and because the error terms in (A.19) and (A.20) are uniformly integrable, it follows that
\[
\text{(A.15)} = 2\bigl(M^{(1)} - M^{(2)},\ M^{(1)} - M^{(3)}\bigr) + o_p(1). \tag{A.22}
\]
Hence (A.15) is also asymptotically normal, with covariance matrix
\[
\begin{pmatrix} 4E\epsilon^4 & 4\operatorname{var}(\epsilon^2) \\ 4\operatorname{var}(\epsilon^2) & 4E\epsilon^4 \end{pmatrix}.
\]
By Lemma A.2, and as $n^{-1}K \to 0$,
\[
\frac{1}{\sqrt n}\Bigl([Y,Y]^{(\mathrm{all})}_T - 2nE\epsilon^2,\ K\bigl([Y,Y]^{(\mathrm{avg})}_T - [X,X]^{(\mathrm{avg})}_T - 2\bar n E\epsilon^2\bigr)\Bigr)\Bigm|X
\]
is asymptotically normal:
\[
\frac{1}{\sqrt n}\begin{pmatrix} [Y,Y]^{(\mathrm{all})}_T - 2nE\epsilon^2 \\ K[Y,Y]^{(\mathrm{avg})}_T - 2\bar n K E\epsilon^2 - K[X,X]^{(\mathrm{avg})}_T \end{pmatrix}\Bigm|X
= 2\begin{pmatrix} M^{(1)} - M^{(2)} \\ M^{(1)} - M^{(3)} \end{pmatrix} + o_p(1)
\xrightarrow{\ \mathcal{L}\ } 2N\left(0,\ \begin{pmatrix} E\epsilon^4 & \operatorname{var}(\epsilon^2) \\ \operatorname{var}(\epsilon^2) & E\epsilon^4 \end{pmatrix}\right). \tag{A.23}
\]
Because
\[
\widehat{E\epsilon^2} = \frac{1}{2n}[Y,Y]^{(\mathrm{all})}_T \quad \text{and} \quad \frac{K}{\sqrt n} = \sqrt{\frac{K}{\bar n}}\,(1 + o(1)), \tag{A.24}
\]
Theorem A.1 follows.

Proof of Lemma 1. For simplicity, and without loss of generality, we assume that $\mathcal{H} = \mathcal{G}$. The result follows the proofs of Lemma A.2 and Theorem A.1, but with a more exact representation of $[Y,Y]^{(\mathrm{all})}_T - [X,X]^{(\mathrm{all})}_T$. Specifically, from (A.9) and (A.19), we have that
\begin{align*}
[Y,Y]^{(\mathrm{all})}_T - [X,X]^{(\mathrm{all})}_T - 2nE\epsilon^2
&= [\epsilon,\epsilon]_T - 2nE\epsilon^2 + 2[X,\epsilon]^{(\mathrm{all})}_T \\
&= 2\sum_{i\ne 0,n}\bigl(\epsilon_{t_i}^2 - E\epsilon^2\bigr) + \bigl(\epsilon_{t_0}^2 - E\epsilon^2\bigr) + \bigl(\epsilon_{t_n}^2 - E\epsilon^2\bigr) - 2\sum_{t_i>0}\epsilon_{t_i}\epsilon_{t_{i-1}} \\
&\quad + 2\sum_{i=1}^{n-1}\bigl(\Delta X_{t_{i-1}} - \Delta X_{t_i}\bigr)\epsilon_{t_i} + 2\Delta X_{t_{n-1}}\epsilon_{t_n} - 2\Delta X_{t_0}\epsilon_{t_0}.
\end{align*}
With the same filtration as in the proof of Theorem A.1, this is the sum of a martingale triangular array with increment $2(\epsilon_{t_i}^2 - E\epsilon^2) - 2\epsilon_{t_i}\epsilon_{t_{i-1}} + 2(\Delta X_{t_{i-1}} - \Delta X_{t_i})\epsilon_{t_i}$ for $i \ne 0, n$. Again, the conditions of the martingale central limit theorem (see Hall and Heyde 1980, chap. 3) are satisfied by (17), whence the term is asymptotically normal, conditionally on the X process. The conditional variance is given by (20), under the conditions mentioned in the statement of the lemma. Because $E\epsilon^2$ can vary with n, the $O_p(n_{\mathrm{sparse}}^{-1/2})$ term in (20) is actually of order $O_p(n_{\mathrm{sparse}}^{-1/2}E\epsilon^2)$.

A3 Asymptotics of DT

For transparency of notation we take t = Tn in other words theaverage of the ti

Proof of Theorem 2 Note first that because we assume thatmicrot and σt are continuous they are locally bounded Because here weseek to show convergence in probability we can therefore assumewithout loss of generality that |microt| is bounded above by a nonrandomconstant and also that infin gt σ+ ge σt ge σminus gt 0 where σ+ and σminus arenonrandom This is by the usual stopping time argument

Also note that once microt and σt are bounded as we have now assumedby Girsanovrsquos theorem (see eg Karatzas and Shreve 1991 chap 35Jacod and Shiryaev 2003 chap II3b) we can without loss of gen-erality further suppose that microt = 0 identically (If the convergence inprobability holds under the equivalent probability then it also will holdunder the original one)

Now for the derivation First consider

(Xti minus Xtiminus

)2 =( timinus1sum

tj=timinusXtj

)2

=timinus1sum

tj=timinus

(Xtj

)2 + 2sum

timinus1getkgttlgetiminusXtkXtl

It follows that

[XX](avg)T

= [XX](all)T + 2

nminus1sum

i=1

(Xti

)Kandisum

j=1

(

1 minus j

K

)(Xtiminusj

)+ Op(Kn)

The remainder term incorporates end effects and has order Op(Kn)

by (41) and (42) and due to the regular allocation of points to gridsFollowing Jacod and Protter (1998) and Mykland and Zhang (2002)

[XX](all)T minus 〈XX〉T = Op(1

radicn ) = op(

radicKn ) too under the as-

sumption (17) [which is a special case of assumption (41)] and thatinftisin[0T] σ 2

t gt 0 almost surely In other words

DT = ([XX](avg)T minus 〈XX〉T

)

= 2nminus1sum

i=1

(Xti

)Kandisum

j=1

(

1 minus j

K

)(Xtiminusj

)+ op(radic

Kt) (A25)

It follows that

〈DD〉T = 4nminus1sum

j=1

(〈XX〉ti

)(Kandisum

j=1

(

1 minus j

K

)(Xtiminusj

))2

+ op(Kt)

= (I) + (II) + op(Kt) (A26)

where

(I) = 4nminus1sum

i=1

(〈XX〉ti

)(Kandisum

j=1

(

1 minus j

K

)2(Xtiminusj

)2)

= 4nminus1sum

i=1

σ 4ti ti

(Kandisum

j=1

(

1 minus j

K

)2timinusj

)

+ op(Kt)

= Ktnminus1sum

i=1

σ 4ti hiti + op(Kt)

where hi is given by (43) Thus Theorem 2 will have been shown if wecan show that

(II) = 8nminus1sum

i=1

〈XX〉tiζi (A27)

is of order op(Kt) where

ζi =iminus1sum

lgtrge0

(Xtl

)(Xtr

)(

1 minus i minus l

K

)+(1 minus i minus r

K

)+ (A28)

We do this as follows Denote δ+ = maxi |ti| = O( 1n ) and set

(II)prime = 8sumnminus1

i=1 tiσ 2timinusK

ζi Then by Houmllderrsquos inequality E|(II) minus(II)prime| le δ+n sup|tminuss|le(K+1)δ+ |σ 2

t minus σ 2s |2 supi ζi2 = o(δ+K) be-

cause of the boundedness and continuity of σt and because applyingthe BurkholderndashDavisndashGundy inequality twice we have that

E(ζ 2i ) le E

iminus1sum

l=1

〈XX〉tl

times(

(lminus1)and(iminus1)sum

rge0

(Xtr

)(

1 minus i minus l

K

)+(1 minus i minus r

K

)+)2

le δ+(σ+)2

timesiminus1sum

l=1

E

((lminus1)and(iminus1)sum

rge0

(Xtr

)(

1 minus i minus l

K

)+(1 minus i minus r

K

)+)2

le (δ+)2(σ+)4

timesiminus1sum

l=1

(lminus1)and(iminus1)sum

rge0

((

1 minus i minus l

K

)+(1 minus i minus r

K

)+)2

= O((δ+K)2) uniformly in i (A29)

where the second-to-last transition does for the inner sum what the twofirst inequalitities do for the outer sum Now rewrite

(II)prime = 8nminus1sum

lgtrge0

(Xtl

)(Xtr

)

timesnminus1sum

i=l+1

tiσ2timinusK

(

1 minus i minus l

K

)+(1 minus i minus r

K

)+ (A30)

In the same way as in (A29), we then obtain
\[
E\bigl((\mathrm{II})'\bigr)^2 \le 64(\delta^{+})^2(\sigma^{+})^4
 \sum_{l>r\ge 0}^{n-1} E\Biggl(\sum_{i=l+1}^{n-1}\Delta t_i\,\sigma^2_{t_{i-K}}
 \Bigl(1-\frac{i-l}{K}\Bigr)^{+}\Bigl(1-\frac{i-r}{K}\Bigr)^{+}\Biggr)^{2}
\]

Zhang, Mykland, and Aït-Sahalia: Determining Integrated Volatility 1411

\[
\le 64(\delta^{+})^2(\sigma^{+})^4 \times (\delta^{+})^2(\sigma^{+})^4
 \sum_{l>r\ge 0}^{n-1}\Biggl(\sum_{i=l+1}^{n-1}
 \Bigl(1-\frac{i-l}{K}\Bigr)^{+}\Bigl(1-\frac{i-r}{K}\Bigr)^{+}\Biggr)^{2}
 \le 64(\delta^{+})^4(\sigma^{+})^8 nK^3
 = O\bigl((\delta^{+}K)^3\bigr) = o\bigl((\delta^{+}K)^2\bigr), \tag{A31}
\]

by assumptions (41) and (42). The transition from the second to the last line in (A31) is due to the fact that \(\sum_{i=l+1}^{n-1}\bigl(1-\frac{i-l}{K}\bigr)^{+}\bigl(1-\frac{i-r}{K}\bigr)^{+} \le K\), and \(=0\) when \(l-r \ge K\). Thus we have shown that \((\mathrm{II}) = o_p(K\Delta t)\), and hence Theorem 2 has been established.

We now proceed to the asymptotic distribution of \(D_T\). We first state a technical condition on the filtration \((\mathcal{F}_t)_{0\le t\le T}\) to which \(X_t\) and \(\mu_t\) (but not the \(\epsilon\)'s) are assumed to be adapted.

Condition E (description of the filtration). There is a continuous multidimensional \(P\)-local martingale \(\mathcal{X} = (\mathcal{X}^{(1)}, \ldots, \mathcal{X}^{(p)})\), for some \(p\), such that \(\mathcal{F}_t\) is the smallest sigma-field containing \(\sigma(\mathcal{X}_s, s \le t)\) and \(\mathcal{N}\), where \(\mathcal{N}\) contains all of the null sets in \(\sigma(\mathcal{X}_s, s \le T)\).

For example, \(\mathcal{X}\) can be a collection of Brownian motions.

Proof of Theorem 3. We can show, by methods similar to those in the proof of Theorem 2, that if \(L\) is any martingale adapted to the filtration generated by \(\mathcal{X}\), then
\[
\sup_t \biggl|\frac{1}{\Delta t\,K}\langle D,L\rangle_t\biggr|
 \stackrel{p}{\longrightarrow} 0. \tag{A32}
\]
The stable convergence with respect to the filtration \((\mathcal{F}_t)_{0\le t\le T}\) then follows in view of Rootzén (1980) or Jacod and Protter (1998). This ends the proof of Theorem 3.

Finally, in the case where \(\eta_n^2\) does not converge, one can still use the mixed normal with variance \(\eta_n^2\). This is because every subsequence of \(\eta_n^2\) has a further subsequence that does converge in probability to some \(\eta^2\), and hence for which assumption (48) in Theorem 3 would be satisfied. The reason for this is that one can define the distribution function of a finite measure by
\[
G_n(t) = \sum_{t_{i+1}\le t} h_i\,\Delta t_i. \tag{A33}
\]
Because \(G_n(t) \le T\sup_i h_i\), it follows from (45) that the sequence \(G_n\) is weakly compact in the sense of weak convergence; see Helly's theorem (e.g., Billingsley 1995, p. 336). For any convergent subsequence \(G_n \to G\), we then get that
\[
\eta_n^2 = \int_0^T \sigma_t^4\,dG_n(t) \to \int_0^T \sigma_t^4\,dG(t) \tag{A34}
\]
almost surely, because we have assumed that \(\langle X,X\rangle'_t\) is a continuous function of \(t\). One then defines \(\eta^2\) to be the (subsequence-dependent) right side of (A34). To proceed further with the asymptotics, continue with the foregoing subsequence and note that
\[
\frac{1}{\Delta t\,K}\langle D,D\rangle_t \approx \int_0^t \sigma_s^4\,dG_n(s)
 \to \int_0^t \sigma_s^4\,dG(s),
\]
completing the proof.

[Received October 2003. Revised December 2004.]

REFERENCES

Aït-Sahalia, Y., Mykland, P. A., and Zhang, L. (2005), "How Often to Sample a Continuous-Time Process in the Presence of Market Microstructure Noise," Review of Financial Studies, 18, 351–416.

Aldous, D. J., and Eagleson, G. K. (1978), "On Mixing and Stability of Limit Theorems," The Annals of Probability, 6, 325–331.

Andersen, T. G., Bollerslev, T., Diebold, F. X., and Labys, P. (2001), "The Distribution of Exchange Rate Realized Volatility," Journal of the American Statistical Association, 96, 42–55.

Bandi, F. M., and Russell, J. R. (2003), "Microstructure Noise, Realized Volatility, and Optimal Sampling," technical report, University of Chicago, Graduate School of Business.

Barndorff-Nielsen, O. E., and Shephard, N. (2002), "Econometric Analysis of Realized Volatility and Its Use in Estimating Stochastic Volatility Models," Journal of the Royal Statistical Society, Ser. B, 64, 253–280.

Billingsley, P. (1995), Probability and Measure (3rd ed.), New York: Wiley.

Brown, S. J. (1990), "Estimating Volatility," in Financial Options: From Theory to Practice, eds. S. Figlewski, W. Silber, and M. Subrahmanyam, Homewood, IL: Business One-Irwin, pp. 516–537.

Chernov, M., and Ghysels, E. (2000), "A Study Towards a Unified Approach to the Joint Estimation of Objective and Risk Neutral Measures for the Purpose of Options Valuation," Journal of Financial Economics, 57, 407–458.

Dacorogna, M. M., Gençay, R., Müller, U., Olsen, R. B., and Pictet, O. V. (2001), An Introduction to High-Frequency Finance, San Diego: Academic Press.

Gallant, A. R., Hsu, C.-T., and Tauchen, G. T. (1999), "Using Daily Range Data to Calibrate Volatility Diffusions and Extract the Forward Integrated Variance," The Review of Economics and Statistics, 81, 617–631.

Gloter, A. (2000), "Estimation des Paramètres d'une Diffusion Cachée" [Estimation of the Parameters of a Hidden Diffusion], unpublished doctoral thesis, Université de Marne-la-Vallée.

Hall, P., and Heyde, C. C. (1980), Martingale Limit Theory and Its Application, Boston: Academic Press.

Hansen, P. R., and Lunde, A. (2004a), "Realized Variance and IID Market Microstructure Noise," technical report, Brown University, Dept. of Economics.

——— (2004b), "An Unbiased Measure of Realized Variance," technical report, Brown University, Dept. of Economics.

Heston, S. (1993), "A Closed-Form Solution for Options With Stochastic Volatility, With Applications to Bonds and Currency Options," Review of Financial Studies, 6, 327–343.

Hull, J., and White, A. (1987), "The Pricing of Options on Assets With Stochastic Volatilities," Journal of Finance, 42, 281–300.

Jacod, J., and Protter, P. (1998), "Asymptotic Error Distributions for the Euler Method for Stochastic Differential Equations," The Annals of Probability, 26, 267–307.

Jacod, J., and Shiryaev, A. N. (2003), Limit Theorems for Stochastic Processes (2nd ed.), New York: Springer-Verlag.

Karatzas, I., and Shreve, S. E. (1991), Brownian Motion and Stochastic Calculus, New York: Springer-Verlag.

Mykland, P. A., and Zhang, L. (2002), "ANOVA for Diffusions," technical report, University of Chicago, Dept. of Statistics.

O'Hara, M. (1995), Market Microstructure Theory, Cambridge, MA: Blackwell.

Oomen, R. (2004), "Properties of Realized Variance for a Pure Jump Process: Calendar Time Sampling versus Business Time Sampling," technical report, University of Warwick, Warwick Business School.

Rényi, A. (1963), "On Stable Sequences of Events," Sankhyā, Ser. A, 25, 293–302.

Rootzén, H. (1980), "Limit Distributions for the Error in Approximations of Stochastic Integrals," The Annals of Probability, 8, 241–251.

Zhou, B. (1996), "High-Frequency Data and Volatility in Foreign-Exchange Rates," Journal of Business & Economic Statistics, 14, 45–52.

Zumbach, G., Corsi, F., and Trapletti, A. (2002), "Efficient Estimation of Volatility Using High-Frequency Data," technical report, Olsen & Associates.



[cf. (25)]. However, the present case is more complicated, due to the multiple grids.

We suppose in the following that the sampling points are regularly allocated to subgrids; in other words, \(\mathcal{G}^{(l)} = \{t_{l-1}, t_{K+l-1}, \ldots\}\). We also assume that
\[
\max_i |\Delta t_i| = O\Bigl(\frac{1}{n}\Bigr) \tag{41}
\]
and
\[
\frac{K}{n} \to 0. \tag{42}
\]

Define the weight function
\[
h_i = \frac{4}{K\Delta t}\sum_{j=1}^{(K-1)\wedge i}\Bigl(1-\frac{j}{K}\Bigr)^{2}\Delta t_{i-j}. \tag{43}
\]

In the case where the \(t_i\) are equidistant, and under regular allocation of points to subgrids, \(\Delta t_i = \Delta t\), and so all of the \(h_i\)'s (except the first \(K-1\)) are equal, and
\[
h_i = 4\,\frac{1}{K}\sum_{j=1}^{(K-1)\wedge i}\Bigl(1-\frac{j}{K}\Bigr)^{2}
 = \begin{cases}
 \dfrac{4(2K^2-3K+1)}{6K^2} = \dfrac{4}{3} + o(1), & i \ge K-1,\\[2ex]
 \dfrac{4}{3}\,\dfrac{i}{K^3}\bigl(3K^2-3Ki+i^2\bigr) + o(1), & i < K-1.
 \end{cases} \tag{44}
\]
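As a quick numerical check of (44), the weights \(h_i\) can be computed directly and compared against the closed form and the \(4/3\) limit. This is an illustrative sketch for the equidistant case, not code from the paper:

```python
def h_weight(i: int, K: int) -> float:
    """h_i of (44), equidistant case: (4/K) * sum_{j=1}^{(K-1) ^ i} (1 - j/K)^2."""
    upper = min(K - 1, i)
    return (4.0 / K) * sum((1.0 - j / K) ** 2 for j in range(1, upper + 1))

K = 500
# Exact value for i >= K-1, which tends to 4/3 as K grows:
closed_form = 4 * (2 * K**2 - 3 * K + 1) / (6 * K**2)
assert abs(h_weight(K, K) - closed_form) < 1e-12
assert abs(h_weight(K, K) - 4.0 / 3.0) < 0.01   # = 4/3 + o(1)
```

The first \(K-1\) weights are smaller, reflecting the boundary effect captured by the second branch of (44).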

More generally, assumptions (41) and (42) ensure that
\[
\sup_i h_i = O(1). \tag{45}
\]

We take \(\langle D,D\rangle_T\) to be the quadratic variation of \(D_t\) when viewed as a continuous-time process [see (39)]. This gives the best approximation to the variance of \(D_T\). We show the following results in Section A.3.

Theorem 2. Suppose that \(X\) is an Itô process of the form (1), with drift coefficient \(\mu_t\) and diffusion coefficient \(\sigma_t\), both continuous almost surely. Also suppose that \(\sigma_t\) is bounded away from 0. Assume (41) and (42), and that sampling points are regularly allocated to grids. Then the quadratic variation of \(D_T\) is approximately
\[
\langle D,D\rangle_T = \frac{TK}{n}\,\eta_n^2 + o_p\Bigl(\frac{K}{n}\Bigr), \tag{46}
\]
where
\[
\eta_n^2 = \sum_i h_i\,\sigma_{t_i}^4\,\Delta t_i. \tag{47}
\]

In particular, \(D_T = O_p\bigl((K/n)^{1/2}\bigr)\). From this we derive a variance–variance trade-off between the two effects that have been discussed: noise and discretization. First, however, we discuss the asymptotic law of \(D_T\); we take up stable convergence at the end of this section.

Theorem 3. Assume the conditions of Theorem 2, and also that
\[
\eta_n^2 \stackrel{P}{\longrightarrow} \eta^2, \tag{48}
\]
where \(\eta\) is random. Also assume Condition E in Section A.3. Then
\[
\frac{D_T}{(K/n)^{1/2}} \stackrel{\mathcal{L}}{\longrightarrow} \eta\sqrt{T}\,Z_{\mathrm{discrete}}, \tag{49}
\]
where \(Z_{\mathrm{discrete}}\) is standard normal and independent of the process \(X\). The convergence in law is stable.

In other words, \(D_T/(K/n)^{1/2}\) can be taken to be asymptotically mixed normal, "\(N(0, \eta^2 T)\)." For most of our discussion it is most convenient to suppose (48), and this is satisfied in many cases. For example, when the \(t_i\) are equidistant, and under regular allocation of points to subgrids,
\[
\eta^2 = \frac{4}{3}\int_0^T \sigma_t^4\,dt, \tag{50}
\]
following (44). One does not need to rely on (48); we argue in Section A.3 that, without this condition, one can take \(D_T/(K/n)^{1/2}\) to be approximately \(N(0, \eta_n^2 T)\). For estimation of \(\eta^2\) or \(\eta_n^2\), see Section 6.

Finally, stable convergence (Rényi 1963; Aldous and Eagleson 1978; Hall and Heyde 1980, chap. 3) means, for our purposes, that the left side of (49) converges to the right side jointly with the \(X\) process, and that \(Z\) is independent of \(X\). This is slightly weaker than convergence conditional on \(X\), but it serves the same function of permitting the incorporation of conditionality-type phenomena into arguments and conclusions.

3.5 Combining the Two Sources of Error

We can now combine the two error terms arising from discretization and from the observation noise. It follows from Theorems 1 and 3 that
\[
[Y,Y]^{(\mathrm{avg})}_T - \langle X,X\rangle_T - 2\bar{n}E\epsilon^2 = \xi Z_{\mathrm{total}} + o_p(1), \tag{51}
\]
where \(Z_{\mathrm{total}}\) is an asymptotically standard normal random variable independent of the \(X\) process and
\[
\xi^2 = \underbrace{\frac{4\bar{n}}{K}E\epsilon^4}_{\text{due to noise}}
 + \underbrace{T\,\frac{1}{\bar{n}}\,\eta^2}_{\text{due to discretization}}. \tag{52}
\]
It is easily seen that if one takes \(K = cn^{2/3}\), then both components in \(\xi^2\) will be present in the limit; otherwise, one of them will dominate. Based on (51), \([Y,Y]^{(\mathrm{avg})}_T\) is still a biased estimator of the quadratic variation \(\langle X,X\rangle_T\) of the true return process. One can recognize that, as far as the asymptotic bias is concerned, \([Y,Y]^{(\mathrm{avg})}_T\) is a better estimator than \([Y,Y]^{(\mathrm{all})}_T\), because \(\bar{n} \le n\): the bias of the subsampled estimator \([Y,Y]^{(\mathrm{avg})}_T\) grows at a slower pace than that of the full-grid estimator. One can also construct a bias-adjusted estimator from (51); this further development would involve the higher-order analysis between the bias and the subsampled estimator. We describe the methodology of bias correction in Section 4.

1402 Journal of the American Statistical Association December 2005

3.6 The Benefits of Sampling Sparsely: Optimal Sampling Frequency in the Multiple-Grid Case

As in Section 2.3, when the noise is negligible asymptotically, we can search for an optimal \(\bar{n}\) for subsampling to balance the coexistence of the bias and the variance in (51). To reduce the MSE of \([Y,Y]^{(\mathrm{avg})}_T\), we set \(\partial\mathrm{MSE}/\partial\bar{n} = 0\). From (51)–(52), \(\mathrm{bias} = 2\bar{n}E\epsilon^2\) and \(\xi^2 = \frac{4\bar{n}}{K}E\epsilon^4 + \frac{T}{\bar{n}}\eta^2\). Then
\[
\mathrm{MSE} = \mathrm{bias}^2 + \xi^2
 = 4(E\epsilon^2)^2\bar{n}^2 + \frac{4\bar{n}}{K}E\epsilon^4 + \frac{T}{\bar{n}}\eta^2
 = 4(E\epsilon^2)^2\bar{n}^2 + \frac{T}{\bar{n}}\eta^2 \quad\text{to first order;}
\]
thus the optimal \(\bar{n}^{*}\) satisfies
\[
\bar{n}^{*} = \biggl(\frac{T\eta^2}{8(E\epsilon^2)^2}\biggr)^{1/3}. \tag{53}
\]
Therefore, assuming that the estimator \([Y,Y]^{(\mathrm{avg})}_T\) is adopted, one could benefit from a minimum MSE if one subsamples \(\bar{n}^{*}\) data in an equidistant fashion. In other words, all \(n\) observations can be used if one uses \(K^{*}\), \(K^{*} \approx n/\bar{n}^{*}\), subgrids. This is in contrast to the drawback of using all of the data in the single-grid case. The subsampling, coupled with aggregation, brings out the advantage of using all of the data. Of course, for the asymptotics to work, we need \(E\epsilon^2 \to 0\). Our recommendation, however, is to use the bias-corrected estimator given in Section 4.
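For concreteness, (53) can be evaluated numerically. The inputs below (one trading day, constant volatility so that \(\eta^2 = \frac{4}{3}\sigma^4 T\) by (50), a noise standard deviation of .05%, and \(n = 23{,}400\) one-second returns) are hypothetical illustration values, not estimates from data:

```python
def optimal_sparse(T, eta2, Eeps2, n):
    """MSE-optimal average subgrid size nbar* of (53), and the implied K* ~ n/nbar*."""
    nbar_star = (T * eta2 / (8.0 * Eeps2 ** 2)) ** (1.0 / 3.0)
    return nbar_star, max(1, round(n / nbar_star))

T = 1.0 / 252                          # one day, annualized
sigma2 = 0.04                          # constant sigma^2
eta2 = (4.0 / 3.0) * sigma2 ** 2 * T   # eta^2 = (4/3) sigma^4 T
nbar_star, K_star = optimal_sparse(T, eta2, Eeps2=0.0005 ** 2, n=23400)
# nbar* is about 41 observations per subgrid, so K* is about 576 subgrids.
```

Note that \(\bar{n}^{*}\) is a constant, so \(K^{*}\) grows like \(n\); this is the sense in which the sparse-optimal strategy still differs from the \(K = O(n^{2/3})\) choice of the two-scales estimator below.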

4. THE FINAL ESTIMATOR: SUBSAMPLING, AVERAGING, AND BIAS CORRECTION OVER TWO TIME SCALES

4.1 The Final Estimator \(\widehat{\langle X,X\rangle}_T\): Main Result

In the preceding sections, we have seen that the multigrid estimator \([Y,Y]^{(\mathrm{avg})}\) is yet another biased estimator of the true integrated volatility \(\langle X,X\rangle\). In this section we improve the multigrid estimator by adopting bias adjustment. To assess the bias, one uses the full grid. As mentioned before, from (22), in the single-grid case (Sec. 2), \(E\epsilon^2\) can be consistently approximated by
\[
\widehat{E\epsilon^2} = \frac{1}{2n}[Y,Y]^{(\mathrm{all})}_T. \tag{54}
\]
Hence the bias of \([Y,Y]^{(\mathrm{avg})}\) can be consistently estimated by \(2\bar{n}\widehat{E\epsilon^2}\). A bias-adjusted estimator for \(\langle X,X\rangle\) thus can be obtained as
\[
\widehat{\langle X,X\rangle}_T = [Y,Y]^{(\mathrm{avg})}_T - \frac{\bar{n}}{n}[Y,Y]^{(\mathrm{all})}_T, \tag{55}
\]

thereby combining the two time scales, (all) and (avg).

To study the asymptotic behavior of \(\widehat{\langle X,X\rangle}_T\), first note that, under the conditions of Theorem A.1 in Section A.2,
\[
\Bigl(\frac{K}{\bar{n}}\Bigr)^{1/2}\bigl(\widehat{\langle X,X\rangle}_T - [X,X]^{(\mathrm{avg})}_T\bigr)
 = \Bigl(\frac{K}{\bar{n}}\Bigr)^{1/2}\bigl([Y,Y]^{(\mathrm{avg})}_T - [X,X]^{(\mathrm{avg})}_T - 2\bar{n}E\epsilon^2\bigr)
 - 2(K\bar{n})^{1/2}\bigl(\widehat{E\epsilon^2} - E\epsilon^2\bigr)
 \stackrel{\mathcal{L}}{\longrightarrow} N\bigl(0,\,8(E\epsilon^2)^2\bigr), \tag{56}
\]

where the convergence in law is conditional on \(X\).

We can now combine this with the results of Section 3.4 to determine the optimal choice of \(K\) as \(n \to \infty\):
\[
\widehat{\langle X,X\rangle}_T - \langle X,X\rangle_T
 = \bigl(\widehat{\langle X,X\rangle}_T - [X,X]^{(\mathrm{avg})}_T\bigr)
 + \bigl([X,X]^{(\mathrm{avg})}_T - \langle X,X\rangle_T\bigr)
 = O_p\Bigl(\frac{\bar{n}^{1/2}}{K^{1/2}}\Bigr) + O_p\bigl(\bar{n}^{-1/2}\bigr). \tag{57}
\]
The error is minimized by equating the two terms on the right side of (57), yielding that the optimal sampling step for \([Y,Y]^{(\mathrm{avg})}_T\) is \(K = O(n^{2/3})\). The right side of (57) then has order \(O_p(n^{-1/6})\).

In particular, if we take
\[
K = cn^{2/3}, \tag{58}
\]
then we find the limit in (57) as follows.

Theorem 4. Suppose that \(X\) is an Itô process of form (1), and assume the conditions of Theorem 3 in Section 3.4. Suppose that \(Y\) is related to \(X\) through model (4), and that (11) is satisfied with \(E\epsilon^2 < \infty\). Under assumption (58),
\[
n^{1/6}\bigl(\widehat{\langle X,X\rangle}_T - \langle X,X\rangle_T\bigr)
 \stackrel{\mathcal{L}}{\longrightarrow} N\bigl(0,\,8c^{-2}(E\epsilon^2)^2\bigr) + \eta\sqrt{T}\,N(0,c)
 = \bigl(8c^{-2}(E\epsilon^2)^2 + c\eta^2 T\bigr)^{1/2} N(0,1), \tag{59}
\]
where the convergence is stable in law (see Sec. 3.4).

Proof. Note that the first normal distribution comes from (56) and that the second comes from Theorem 3 in Section 3.4. The two normal distributions are independent because the convergence of the first term in (57) is conditional on the \(X\) process, which is why they can be amalgamated as stated. The requirement that \(E\epsilon^4 < \infty\) (Thm. A.1 in the App.) is not needed, because only a law of large numbers is required for \(M^{(1)}_T\) (see the proof of Thm. A.1) when considering the difference in (56). This finishes the proof.

The estimation of the asymptotic spread \(s^2 = 8c^{-2}(E\epsilon^2)^2 + c\eta^2 T\) of \(\widehat{\langle X,X\rangle}_T\) is deferred to Section 6. It is seen in Section A.1 that, for \(K = cn^{2/3}\), the second-order conditional variance of \(\widehat{\langle X,X\rangle}_T\), given the \(X\) process, is given by
\[
\mathrm{var}\bigl(\widehat{\langle X,X\rangle}_T \,\big|\, X\bigr)
 = n^{-1/3}c^{-2}\,8(E\epsilon^2)^2
 + n^{-2/3}c^{-1}\bigl[8[X,X]^{(\mathrm{avg})}_T E\epsilon^2 - 2\,\mathrm{var}(\epsilon^2)\bigr]
 + o_p\bigl(n^{-2/3}\bigr). \tag{60}
\]
The evidence from our Monte Carlo simulations in Section 7 suggests that this correction may matter in small samples, and that the (random) variance of \(\widehat{\langle X,X\rangle}_T\) is best estimated by
\[
\hat{s}^2 + n^{-2/3}c^{-1}\bigl[8[X,X]^{(\mathrm{avg})}_T \widehat{E\epsilon^2} - 2\,\widehat{\mathrm{var}}(\epsilon^2)\bigr]. \tag{61}
\]
For the correction term, \([X,X]^{(\mathrm{avg})}_T\) can be estimated by \(\widehat{\langle X,X\rangle}_T\) itself. Meanwhile, by (23), a consistent estimator of \(\mathrm{var}(\epsilon^2)\) is given by
\[
\widehat{\mathrm{var}}(\epsilon^2) = \frac{1}{2n}\sum_i \bigl(\Delta Y_{t_i}\bigr)^4 - 4\bigl(\widehat{E\epsilon^2}\bigr)^2.
\]


4.2 Properties of \(\widehat{\langle X,X\rangle}_T\): Optimal Sampling and Bias Adjustment

To further pin down the optimal sampling frequency \(K\), one can minimize the expected asymptotic variance in (59) to obtain
\[
c^{*} = \biggl(\frac{16(E\epsilon^2)^2}{T\,E\eta^2}\biggr)^{1/3}, \tag{62}
\]
which can be consistently estimated from data in past time periods (before time \(t_0 = 0\)), using \(\widehat{E\epsilon^2}\) and an estimator of \(\eta^2\) (cf. Sec. 6). As mentioned in Section 3.4, \(\eta^2\) can be taken to be independent of \(K\), as long as one allocates sampling points to grids regularly, as defined in Section 3.2. Hence one can choose \(c\), and so also \(K\), based on past data.

Example 1. If \(\sigma_t^2\) is constant, and for equidistant sampling and regular allocation to grids, \(\eta^2 = \frac{4}{3}\sigma^4 T\); then the asymptotic variance in (59) is
\[
8c^{-2}(E\epsilon^2)^2 + c\eta^2 T = 8c^{-2}(E\epsilon^2)^2 + \tfrac{4}{3}c\sigma^4 T^2,
\]
and the optimal choice of \(c\) becomes
\[
c_{\mathrm{opt}} = \biggl(\frac{12(E\epsilon^2)^2}{T^2\sigma^4}\biggr)^{1/3}. \tag{63}
\]
In this case the asymptotic variance in (59) is
\[
2\bigl(12(E\epsilon^2)^2\bigr)^{1/3}\bigl(\sigma^2 T\bigr)^{4/3}.
\]
One can also, of course, estimate \(c\) to minimize the actual asymptotic variance in (59) from data in the current time period \((0 \le t \le T)\). It is beyond the scope of this article to consider whether such a device for selecting the frequency has any impact on our asymptotic results.

In addition to large-sample arguments, one can study \(\widehat{\langle X,X\rangle}_T\) from a "smallish" sample standpoint. We argue in what follows that one can apply a bias-type adjustment to get
\[
\widehat{\langle X,X\rangle}^{(\mathrm{adj})}_T = \Bigl(1 - \frac{\bar{n}}{n}\Bigr)^{-1}\widehat{\langle X,X\rangle}_T. \tag{64}
\]
The difference from the estimator in (55) is of order \(O_p(\bar{n}/n) = O_p(K^{-1})\), and thus the two estimators behave the same to the asymptotic order that we consider. The estimator (64), however, has the appeal of being, in a certain way, "unbiased," as follows. For arbitrary \((a,b)\), consider all estimators of the form
\[
\widehat{\langle X,X\rangle}^{(\mathrm{adj})}_T = a[Y,Y]^{(\mathrm{avg})}_T - b\frac{\bar{n}}{n}[Y,Y]^{(\mathrm{all})}_T.
\]
Then, from (18) and (35),
\[
E\bigl(\widehat{\langle X,X\rangle}^{(\mathrm{adj})}_T \,\big|\, X\ \text{process}\bigr)
 = a\bigl([X,X]^{(\mathrm{avg})}_T + 2\bar{n}E\epsilon^2\bigr)
 - b\frac{\bar{n}}{n}\bigl([X,X]^{(\mathrm{all})}_T + 2nE\epsilon^2\bigr)
 = a[X,X]^{(\mathrm{avg})}_T - b\frac{\bar{n}}{n}[X,X]^{(\mathrm{all})}_T + 2(a-b)\bar{n}E\epsilon^2.
\]
It is natural to choose \(a = b\), to completely remove the effect of \(E\epsilon^2\). Also, following Section 3.4, both \([X,X]^{(\mathrm{avg})}_T\) and \([X,X]^{(\mathrm{all})}_T\) are asymptotically unbiased estimators of \(\langle X,X\rangle_T\). Hence one can argue that one should take \(a(1-\bar{n}/n) = 1\), yielding (64).

Similarly, an adjusted estimator of \(E\epsilon^2\) is given by
\[
\widehat{E\epsilon^2}{}^{(\mathrm{adj})} = \tfrac{1}{2}(n-\bar{n})^{-1}\bigl([Y,Y]^{(\mathrm{all})}_T - [Y,Y]^{(\mathrm{avg})}_T\bigr), \tag{65}
\]
which satisfies \(E\bigl(\widehat{E\epsilon^2}{}^{(\mathrm{adj})} \,\big|\, X\ \text{process}\bigr) = E\epsilon^2 + \tfrac{1}{2}(n-\bar{n})^{-1}\bigl([X,X]^{(\mathrm{all})}_T - [X,X]^{(\mathrm{avg})}_T\bigr)\), and thus is unbiased to high order. As for the asymptotic distribution, one can see from Theorem A.1 in the Appendix that
\[
\widehat{E\epsilon^2}{}^{(\mathrm{adj})} - E\epsilon^2
 = \bigl(\widehat{E\epsilon^2} - E\epsilon^2\bigr)\bigl(1 + O(K^{-1})\bigr) + O_p\bigl(Kn^{-3/2}\bigr)
 = \widehat{E\epsilon^2} - E\epsilon^2 + O_p\bigl(n^{-1/2}K^{-1}\bigr) + O_p\bigl(Kn^{-3/2}\bigr)
 = \widehat{E\epsilon^2} - E\epsilon^2 + O_p\bigl(n^{-5/6}\bigr),
\]
from (58). It follows that \(n^{1/2}\bigl(\widehat{E\epsilon^2} - E\epsilon^2\bigr)\) and \(n^{1/2}\bigl(\widehat{E\epsilon^2}{}^{(\mathrm{adj})} - E\epsilon^2\bigr)\) have the same asymptotic distribution.
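A one-line sketch of (65) follows, with hypothetical inputs chosen so that the answer is known exactly: with pure iid noise of variance \(s^2\), \(E[Y,Y]^{(\mathrm{all})} \approx 2ns^2\) and \(E[Y,Y]^{(\mathrm{avg})} \approx 2\bar{n}s^2\), so the estimator recovers \(s^2\) in expectation:

```python
def noise_var_adjusted(rv_all, rv_avg, n, nbar):
    """Adjusted noise-variance estimator (65)."""
    return 0.5 * (rv_all - rv_avg) / (n - nbar)

# Feed in the exact expected values of the two realized variances under pure
# noise (illustrative numbers: n = 20000, nbar = 100, true noise variance 1e-4):
s2 = 1e-4
est = noise_var_adjusted(2 * 20000 * s2, 2 * 100 * s2, n=20000, nbar=100)
assert abs(est - s2) < 1e-15
```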

5. EXTENSION TO MULTIPLE PERIOD INFERENCE

For a given family \(\mathcal{A} = \{\mathcal{G}^{(k)}, k = 1, \ldots, K\}\), we denote
\[
\widehat{\langle X,X\rangle}_t = [Y,Y]^{(\mathrm{avg})}_t - \frac{\bar{n}}{n}[Y,Y]^{(\mathrm{all})}_t, \tag{66}
\]
where, as usual, \([Y,Y]^{(\mathrm{all})}_t = \sum_{t_{i+1}\le t}(\Delta Y_{t_i})^2\) and \([Y,Y]^{(\mathrm{avg})}_t = \frac{1}{K}\sum_{k=1}^{K}[Y,Y]^{(k)}_t\), with
\[
[Y,Y]^{(k)}_t = \sum_{\substack{t_i,\,t_{i+}\in\mathcal{G}^{(k)}\\ t_{i+}\le t}}\bigl(Y_{t_{i+}} - Y_{t_i}\bigr)^2.
\]
To estimate \(\langle X,X\rangle\) for several discrete time periods, say \([0,T_1], [T_1,T_2], \ldots, [T_{M-1},T_M]\), where \(M\) is fixed, amounts to estimating \(\langle X,X\rangle_{T_m} - \langle X,X\rangle_{T_{m-1}} = \int_{T_{m-1}}^{T_m}\sigma_u^2\,du\) for \(m = 1, \ldots, M\), and the obvious estimator is \(\widehat{\langle X,X\rangle}_{T_m} - \widehat{\langle X,X\rangle}_{T_{m-1}}\).

To carry out the asymptotics, let \(n_m\) be the number of points in the \(m\)th time segment, and similarly let \(K_m = c_m n_m^{2/3}\), where \(c_m\) is a constant. Then \(\{n_m^{1/6}(\widehat{\langle X,X\rangle}_{T_m} - \widehat{\langle X,X\rangle}_{T_{m-1}} - \int_{T_{m-1}}^{T_m}\sigma_u^2\,du)\}_{m=1,\ldots,M}\) converge stably to \(\{(8c_m^{-2}(E\epsilon^2)^2 + c_m\eta_m^2(T_m - T_{m-1}))^{1/2} Z_m\}\), where the \(Z_m\) are iid standard normals, independent of the underlying process, and \(\eta_m^2\) is the limit \(\eta^2\) (Thm. 3) for time period \(m\). In the case of equidistant \(t_i\) and regular allocation of sample points to grids, \(\eta_m^2 = \frac{4}{3}\int_{T_{m-1}}^{T_m}\sigma_u^4\,du\).

In other words, the one-period asymptotics generalize straightforwardly to the multiperiod case. This is because \(\widehat{\langle X,X\rangle}_{T_m} - \widehat{\langle X,X\rangle}_{T_{m-1}} - \int_{T_{m-1}}^{T_m}\sigma_u^2\,du\) has, to first order, a martingale structure. This can be seen in the Appendix.

An advantage of our proposed estimator is that if \(\epsilon_{t_i}\) has different variance in different time segments, say \(\mathrm{var}(\epsilon_{t_i}) = (E\epsilon^2)_m\) for \(t_i \in (T_{m-1}, T_m]\), then both consistency and asymptotic (mixed) normality continue to hold, provided that one replaces \(E\epsilon^2\) by \((E\epsilon^2)_m\). This adds a measure of robustness to the procedure. If one were convinced that \(E\epsilon^2\) were the same across time segments, then an alternative estimator would have the form
\[
\widehat{\langle X,X\rangle}_t = [Y,Y]^{(\mathrm{avg})}_t
 - \Bigl(\frac{1}{K}\#\{i : t_{i+1}\le t\} - 1\Bigr)\frac{1}{n}[Y,Y]^{(\mathrm{all})}_T
 \qquad\text{for } t = T_1, \ldots, T_M. \tag{67}
\]


However, the errors \(\widehat{\langle X,X\rangle}_{T_m} - \widehat{\langle X,X\rangle}_{T_{m-1}} - \int_{T_{m-1}}^{T_m}\sigma_u^2\,du\) in this case are not asymptotically independent. Note that, for \(t = T_M\), both candidates (66) and (67) for \(\widehat{\langle X,X\rangle}_t\) coincide with the quantity in (55).

6. ESTIMATING THE ASYMPTOTIC VARIANCE OF \(\widehat{\langle X,X\rangle}_T\)

In the one-period case, the main goal is to estimate the asymptotic variance \(s^2 = 8c^{-2}(E\epsilon^2)^2 + c\eta^2 T\) [cf. (58) and (59)]. The multigrid case is a straightforward generalization, as indicated in Section 5.

Here we are concerned only with the case where the points \(t_i\) are equally spaced (\(\Delta t_i = \Delta t\)) and are regularly allocated to the grids \(\mathcal{A}_1 = \{\mathcal{G}^{(k)}, k = 1, \ldots, K_1\}\). A richer set of ingredients is required to find the spread than to just estimate \(\langle X,X\rangle_T\). To implement the estimator, we create an additional family \(\mathcal{A}_2 = \{\mathcal{G}^{(k,i)}, k = 1, \ldots, K_1,\ i = 1, \ldots, I\}\) of grids, where \(\mathcal{G}^{(k,i)}\) contains every \(I\)th point of \(\mathcal{G}^{(k)}\), starting with the \(i\)th point. We assume that \(K_1 \sim c_1 n^{2/3}\). The new family then consists of \(K_2 \sim c_2 n^{2/3}\) grids, where \(c_2 = c_1 I\).

In addition, we need to divide the time line into segments \((T_{m-1}, T_m]\), where \(T_m = \frac{m}{M}T\). For the purposes of this discussion, \(M\) is large but finite. We now get an initial estimator of spread as
\[
\hat{s}_0^2 = n^{1/3}\sum_{m=1}^{M}\Bigl(\widehat{\langle X,X\rangle}^{K_1}_{T_m} - \widehat{\langle X,X\rangle}^{K_1}_{T_{m-1}}
 - \bigl(\widehat{\langle X,X\rangle}^{K_2}_{T_m} - \widehat{\langle X,X\rangle}^{K_2}_{T_{m-1}}\bigr)\Bigr)^2,
\]
where \(\widehat{\langle X,X\rangle}^{K_i}_t\) is the estimator (66) using the grid family \(i\), \(i = 1, 2\).

Using the discussion in Section 5, we can show that
\[
\hat{s}_0^2 \approx s_0^2, \tag{68}
\]
where, for \(c_1 \ne c_2\) (\(I \ne 1\)),
\[
s_0^2 = 8(E\epsilon^2)^2\bigl(c_1^{-2} + c_2^{-2} - c_1^{-1}c_2^{-1}\bigr) + \bigl(c_1^{1/2} - c_2^{1/2}\bigr)^2 T\eta^2
 = 8(E\epsilon^2)^2 c_1^{-2}\bigl(1 + I^{-2} - I^{-1}\bigr) + c_1\bigl(I^{1/2} - 1\bigr)^2 T\eta^2. \tag{69}
\]
In (68), the symbol "\(\approx\)" denotes first convergence in law as \(n \to \infty\), and then a limit in probability as \(M \to \infty\). Because \(E\epsilon^2\) can be estimated by \(\widehat{E\epsilon^2} = [Y,Y]^{(\mathrm{all})}/2n\), we can put hats on \(s_0^2\), \((E\epsilon^2)^2\), and \(\eta^2\) in (69) to obtain an estimator of \(\eta^2\). Similarly,
\[
s^2 = 8(E\epsilon^2)^2\biggl(c^{-2} - \frac{c\bigl(c_1^{-2} + c_2^{-2} - c_1^{-1}c_2^{-1}\bigr)}{(c_1^{1/2} - c_2^{1/2})^2}\biggr)
 + \frac{c}{(c_1^{1/2} - c_2^{1/2})^2}\,s_0^2
 = 8\biggl(c^{-2} - \frac{c\,c_1^{-3}\bigl(I^{-2} - I^{-1} + 1\bigr)}{(I^{1/2} - 1)^2}\biggr)(E\epsilon^2)^2
 + \frac{c}{c_1}\,\frac{1}{(I^{1/2} - 1)^2}\,s_0^2, \tag{70}
\]
where \(c \sim Kn^{-2/3}\), with \(K\) the number of grids used originally to estimate \(\langle X,X\rangle_T\).

Table 1. Coefficients of \(\hat{s}_0^2\) and \((\widehat{E\epsilon^2})^2\) When \(c_1 = c\)

I    coeff(\(\hat{s}_0^2\))    coeff(\((\widehat{E\epsilon^2})^2\))
3    1.866            \(-3.611c^{-2}\)
4    1.000            \(1.5000c^{-2}\)

Normally, one would take \(c_1 = c\). Hence an estimator \(\hat{s}^2\) can be found from \(\hat{s}_0^2\) and \(\widehat{E\epsilon^2}\). When \(c_1 = c\), we argue that the optimal choice is \(I = 3\) or 4, as follows. The coefficients in (70) become
\[
\mathrm{coeff}\bigl(s_0^2\bigr) = \bigl(I^{1/2} - 1\bigr)^{-2}
\quad\text{and}\quad
\mathrm{coeff}\bigl((E\epsilon^2)^2\bigr) = 8c^{-2}\bigl(I^{1/2} - 1\bigr)^{-2} f(I),
\]
where \(f(I) = I - 2I^{1/2} - I^{-2} + I^{-1}\). For \(I \ge 2\), \(f(I)\) is increasing, and \(f(I)\) crosses 0 for \(I\) between 3 and 4. These, therefore, are the two integer values of \(I\) that give the lowest ratio of \(\mathrm{coeff}((E\epsilon^2)^2)\) to \(\mathrm{coeff}(s_0^2)\). Using \(I = 3\) or 4 therefore would maximally insulate against \((\widehat{E\epsilon^2})^2\) dominating \(\hat{s}_0^2\). This is desirable, because \(\hat{s}_0^2\) is the estimator carrying the information about \(\eta^2\). Numerical values for the coefficients are given in Table 1. If \(c\) were such that \((\widehat{E\epsilon^2})^2\) still overwhelmed \(\hat{s}_0^2\), then a choice of \(c_1 \ne c\) should be considered.
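The sign change of \(f(I)\) between 3 and 4, and the Table 1 values, are easy to confirm numerically (an illustrative check, with \(c\) set to 1):

```python
def f(I):
    """f(I) = I - 2 I^{1/2} - I^{-2} + I^{-1}, as defined after (70)."""
    return I - 2 * I ** 0.5 - I ** -2 + I ** -1

def coeff_s0(I):
    """Coefficient of s_0^2 in (70) when c_1 = c."""
    return (I ** 0.5 - 1) ** -2

def coeff_eps2(I, c=1.0):
    """Coefficient of (E eps^2)^2 in (70) when c_1 = c."""
    return 8 * c ** -2 * coeff_s0(I) * f(I)

assert f(3) < 0 < f(4)                   # sign change between I = 3 and I = 4
assert abs(coeff_s0(3) - 1.866) < 1e-3   # Table 1, I = 3
assert abs(coeff_s0(4) - 1.000) < 1e-12  # Table 1, I = 4
assert abs(coeff_eps2(3) + 3.611) < 1e-2
assert abs(coeff_eps2(4) - 1.500) < 1e-3
```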

7. MONTE CARLO EVIDENCE

In this article we have discussed five approaches to dealing with the impact of market microstructure noise on realized volatility. In this section we examine the performance of each approach in simulations and compare the results with those predicted by the aforementioned asymptotic theory.

7.1 Simulations Design

We use the stochastic volatility model of Heston (1993) as our data-generating process,
\[
dX_t = \bigl(\mu - v_t/2\bigr)\,dt + \sigma_t\,dB_t \tag{71}
\]
and
\[
dv_t = \kappa(\alpha - v_t)\,dt + \gamma v_t^{1/2}\,dW_t, \tag{72}
\]
where \(v_t = \sigma_t^2\). We assume that the parameters \((\mu, \kappa, \alpha, \gamma)\) and \(\rho\), the correlation coefficient between the two Brownian motions \(B\) and \(W\), are constant in this model. We also assume Feller's condition, \(2\kappa\alpha \ge \gamma^2\), to make the zero boundary unattainable by the volatility process.

We simulate \(M = 25{,}000\) sample paths of the process using the Euler scheme at a time interval \(\Delta t = 1\) second, and set the parameter values at values reasonable for a stock: \(\mu = .05\), \(\kappa = 5\), \(\alpha = .04\), \(\gamma = .5\), and \(\rho = -.5\). As for the market microstructure noise \(\epsilon\), we assume that it is Gaussian and small. Specifically, we set \((E\epsilon^2)^{1/2} = .0005\) (i.e., the standard deviation of the noise is .05% of the value of the asset price). We purposefully set this value to be smaller than the values calibrated using the empirical market microstructure literature.
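The data-generating process of (71)–(72) can be simulated with a plain Euler scheme as described. The sketch below shortens the path for illustration; reflecting \(v_t\) at zero via `abs` is our safeguard against Euler-induced negative values (the paper relies on Feller's condition and does not specify this detail), and the initial price level is arbitrary:

```python
import math
import random

def simulate_heston_with_noise(n_steps, dt, mu=0.05, kappa=5.0, alpha=0.04,
                               gamma=0.5, rho=-0.5, noise_sd=0.0005, seed=0):
    """Euler scheme for (71)-(72), plus additive Gaussian noise: Y = X + eps."""
    rng = random.Random(seed)
    x, v = math.log(10.0), alpha                 # start v at its stationary mean
    ys = []
    for _ in range(n_steps + 1):
        ys.append(x + rng.gauss(0.0, noise_sd))  # observed (noisy) log price
        db = rng.gauss(0.0, math.sqrt(dt))
        dw = rho * db + math.sqrt(1.0 - rho ** 2) * rng.gauss(0.0, math.sqrt(dt))
        x += (mu - v / 2.0) * dt + math.sqrt(v) * db
        v = abs(v + kappa * (alpha - v) * dt + gamma * math.sqrt(v) * dw)
    return ys

# Feller's condition 2*kappa*alpha >= gamma^2 holds here: 0.4 >= 0.25.
dt = 1.0 / (252 * 6.5 * 60 * 60)                 # one second, annualized
y = simulate_heston_with_noise(2000, dt)
```

A full 6.5-hour day at one-second intervals corresponds to \(n = 23{,}400\) steps per path.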

On each simulated sample path, we estimate \(\langle X,X\rangle_T\) over \(T = 1\) day (i.e., \(T = 1/252\), because the parameter values are all annualized) using the five estimation strategies described earlier: \([Y,Y]^{(\mathrm{all})}_T\), \([Y,Y]^{(\mathrm{sparse})}_T\), \([Y,Y]^{(\mathrm{sparse,opt})}_T\), \([Y,Y]^{(\mathrm{avg})}_T\), and, finally, \(\widehat{\langle X,X\rangle}^{(\mathrm{adj})}_T\). We assume that a day consists of 6.5 hours of open trading, as is the case on the NYSE and NASDAQ. For the fourth-best estimator, we represent this approach using sparse sampling at a frequency of once every 5 minutes.

7.2 Simulations Results

Table 2 reports the simulation results for the five estimation strategies. For each estimator, the bias is calculated as \(E[\text{estimator} - \langle X,X\rangle_T]\). The variance row reports the quantity
\[
E\biggl[\mathrm{var}\biggl(\text{estimator} - \langle X,X\rangle_T \,\bigg|\, \int_0^T\sigma_t^2\,dt,\ \int_0^T\sigma_t^4\,dt\biggr)\biggr]. \tag{73}
\]
The rows marked "relative" report the corresponding results in percentage terms, that is, for
\[
\frac{\text{estimator} - \langle X,X\rangle_T}{\langle X,X\rangle_T}.
\]
Note that although the convergence of the discretization error is stable, as opposed to conditional, (73) seems like a reasonable stand-in for the theoretical quantity under consideration.

In each case, "small sample" represents the average value over the \(M\) simulated paths, and "asymptotic" refers to the value predicted by our theory. The noise parts of the asymptotic variances are taken to include the relevant second-order terms, as given by (20), (37), and (60), yielding spreads like (in the latter case) (61). The small-sample variance is computed by binning the results of the simulations according to the values of the pair \((\int_0^T\sigma_t^2\,dt, \int_0^T\sigma_t^4\,dt)\), calculating the variance within each bin, and then averaging across the bins, weighted by the number of elements in each bin.

Comparing the rows showing small-sample results to their asymptotic predictions, we see that the asymptotic theory provides a good approximation to the first two moments of the small-sample distribution of the five estimators. Comparing across columns, we see that the differences between the five estimators are quite large. As expected, the naive fifth-best strategy produces disastrous results. The first-best strategy results in a large decrease of root MSE (RMSE) relative to the others, including, in particular, the fourth-best strategy, which is the approach currently used in most of the empirical literature. But, at least for the parameter values used here, determining the optimal sampling frequency (the sparse-optimal, or third-best, strategy) results in only a modest improvement over the arbitrary selection of a sampling frequency (the sparse, or fourth-best, strategy). Using all of the data, as in the subsampled and averaged estimator (second-best), but especially the bias correction by combining two time scales, as in our first-best approach, is where the major practical improvements reside.

It may seem surprising that the variance of the estimator is reduced so much from the second-best to the first-best estimator. After all, if one sets \(c = Kn^{-2/3}\), then the asymptotic normal term in (7) becomes
\[
\frac{1}{n^{1/6}}\Biggl[\,\underbrace{\frac{4}{c^2}E\epsilon^4}_{\text{due to noise}}
 + \underbrace{c\,\frac{4T}{3}\int_0^T\sigma_t^4\,dt}_{\text{due to discretization}}\,\Biggr]^{1/2}_{\text{total variance}}
 Z_{\mathrm{total}}.
\]
This is quite similar to the error term in (9). However, the \(c\) for the second-best estimator is chosen by (8), whereas for the first-best case it is given by (10). The two optima are very different: for the second-best estimator, \(c\) is chosen by a bias–variance trade-off, whereas for the first-best estimator, it is chosen to minimize the asymptotic variance. From this standpoint, it is quite reasonable for the first-best estimator to also have a substantially better asymptotic variance. In fact, from (8) and (10), it is easy to see that, for the optimal choices,
\[
c^{*}_{\text{second best}} = n^{1/3}2^{-1/3}\biggl(\frac{E\epsilon^4}{\bigl(E(\epsilon^2)\bigr)^2}\biggr)^{1/3} c^{*}_{\text{first best}},
\]
whence the total variance is in fact quite different for the two estimators.

Figure 1 shows the small-sample and asymptotic RMSE of the \([Y,Y]^{(\mathrm{sparse})}_T\) estimator as a function of the subsampling interval, illustrating the minimization problem studied in Section 2.3. Finally, Figure 2 shows the standardized distribution of the first-best estimator obtained from the simulations (histogram) and the corresponding asymptotic distribution predicted by our theory (solid line). Due to the effect of bias, the distributions of the fifth- to second-best estimators are irrelevant, because they cannot be used to set intervals for the volatility.

8. DISCUSSION

In this work we have quantified and corrected the effect of noise on the nonparametric assessment of integrated volatility. In the setting of high-frequency data, the usual financial practice is to use sparse sampling, in other words, throwing

Table 2. Monte Carlo Simulations for the Five Estimation Strategies

                                Fifth-best        Fourth-best        Third-best             Second-best      First-best
                                [Y,Y]^(all)_T     [Y,Y]^(sparse)_T   [Y,Y]^(sparse,opt)_T   [Y,Y]^(avg)_T    <X,X>^(adj)_T
Small-sample bias               1.1699 × 10^-2    3.89 × 10^-5       2.18 × 10^-5           1.926 × 10^-5    2 × 10^-8
Asymptotic bias                 1.1700 × 10^-2    3.90 × 10^-5       2.20 × 10^-5           1.927 × 10^-5    0
Small-sample variance           1.791 × 10^-8     1.4414 × 10^-9     1.59 × 10^-9           9.41 × 10^-10    9 × 10^-11
Asymptotic variance             1.788 × 10^-8     1.4409 × 10^-9     1.58 × 10^-9           9.37 × 10^-10    8 × 10^-11
Small-sample RMSE               1.1699 × 10^-2    5.437 × 10^-5      4.543 × 10^-5          3.622 × 10^-5    9.4 × 10^-6
Asymptotic RMSE                 1.1700 × 10^-2    5.442 × 10^-5      4.546 × 10^-5          3.618 × 10^-5    8.9 × 10^-6
Small-sample relative bias      182%              6.1%               1.8%                   1.5%             -.00045%
Small-sample relative variance  82,502%           1.15%              .11%                   .053%            .0043%
Small-sample relative RMSE      340%              12.4%              3.7%                   2.8%             .65%


Figure 1. RMSE of the Sparse Estimator as a Function of the Subsampling Frequency (small-sample RMSE; asymptotic RMSE).

away most of the available data. We have argued that this practice is caused by not incorporating the noise in the model. Although it is statistically unsound to throw away data, we have shown that it is possible to build on this practice to construct estimators that make statistical sense.

Specifically, we have found that the usual realized volatility mainly estimates the magnitude of the noise term rather than anything to do with volatility. An approach built on separating the observations into multiple "grids" lessens this problem. We found that the best results can be obtained by combining the usual ("single-grid") realized volatility with the multiple-grid-based device. This gives an estimator that is approximately unbiased; we have also shown how to assess the (random) variance of this estimator. We also show that, for our most recommended procedure, the optimal number of multiple grids is of order \(O(n^{2/3})\) (see Sec. 4). Most of the development is in the context of finding the integrated volatility over one time period; at the end, we extend this to multiple periods. Also, in the case where the noise can be taken to be almost negligible, we provide a way of optimizing the sampling frequency if one wishes to use the classical "realized volatility" or its multigrid extension.

One important message of the article is that any time one has an impulse to sample sparsely, one can always do better with a multigrid method, regardless of the model or the quantity being estimated.

Figure 2. Asymptotic and Small-Sample Standardized Distributions of the First-Best Estimator.


Our results in this article cover the case where the latent \(X\) follows the Itô process model (1). Theorem 1 remains true under the assumption that \(X\) is a general semimartingale, that is, a process \(X\) that can also have jumps. This follows because the proof of Lemma A.2 remains true for a general semimartingale \(X\) (Jacod and Shiryaev 2003, thm. I.4.47, p. 52). Also, apart from using Lemma A.2, Theorem A.1 is unrelated to the nature of the \(X\) process. As far as the effect of jumps on the discretization effect is concerned, it should be noted that consistency holds, as follows. When \(X\) is a general semimartingale, then for any sequence of grids \(\mathcal{H}_n\) so that (41) holds, \([X,X]^{\mathcal{H}_n}\) converges to the continuous-time quadratic variation of \(X\), that is,
\[
\int_0^T \sigma_t^2\,dt + \sum_{0\le t\le T}(\text{jump of } X \text{ at } t)^2 \tag{74}
\]
(cf. Jacod and Shiryaev 2003, thm. I.4.47, p. 52). In particular, this also applies to subgrids under the regularity conditions that we have discussed. Furthermore, by extension of the same proof, \([X,X]^{(\mathrm{avg})}\) also converges to (74). From the same argument, so does \(\widehat{\langle X,X\rangle}_T\). In this sense, the two-scales estimator is robust to jumps.

APPENDIX: PROOFS OF RESULTS

When the total grid $\mathcal{G}$ is considered, we use $\sum_{i=1}^{n-1}$, $\sum_{t_{i+1} \le T}$, and $\sum_{t_i \in \mathcal{G}}$ interchangeably in the following proofs. Also, we write "$|X$" to indicate expressions that are conditional on the entire $X$ process.

A.1 Variance of $[Y,Y]_T$ Given the $X$ Process

Here we calculate explicitly the variance in (19), from which the stated approximation follows. The explicit remainder term is also used for (36). Let a partition of $[0,T]$ be $0 = t_0 \le t_1 \le \cdots \le t_n = T$. Under assumption (11),

\[
\operatorname{var}\bigl([Y,Y]^{(\mathrm{all})}_T \big| X\bigr)
= \operatorname{var}\Biggl[\sum_{t_{i+1} \le T} (\Delta Y_{t_i})^2 \Bigg| X\Biggr]
= \underbrace{\sum_{t_{i+1} \le T} \operatorname{var}\bigl[(\Delta Y_{t_i})^2 \big| X\bigr]}_{I_T}
+ \underbrace{2 \sum_{t_{i+1} \le T} \operatorname{cov}\bigl[(\Delta Y_{t_{i-1}})^2, (\Delta Y_{t_i})^2 \big| X\bigr]}_{II_T},
\]

because $\Delta Y_{t_i} = \Delta X_{t_i} + \Delta\varepsilon_{t_i}$ is 1-dependent given the $X$ process.

\[
\begin{aligned}
\operatorname{var}\bigl[(\Delta Y_{t_i})^2 \big| X\bigr]
&= \kappa_4(\Delta Y_{t_i} | X) + 2\bigl[\operatorname{var}(\Delta Y_{t_i} | X)\bigr]^2
+ 4\bigl[E(\Delta Y_{t_i} | X)\bigr]^2 \operatorname{var}(\Delta Y_{t_i} | X)
+ 4E(\Delta Y_{t_i} | X)\,\kappa_3(\Delta Y_{t_i} | X) \\
&= \kappa_4(\Delta\varepsilon_{t_i}) + 2\bigl[\operatorname{var}(\Delta\varepsilon_{t_i})\bigr]^2
+ 4(\Delta X_{t_i})^2 \operatorname{var}(\Delta\varepsilon_{t_i})
+ 4(\Delta X_{t_i})\,\kappa_3(\Delta\varepsilon_{t_i}) \qquad \text{(under Assumption 11)} \\
&= 2\kappa_4(\varepsilon) + 8(E\varepsilon^2)^2 + 8(\Delta X_{t_i})^2 E\varepsilon^2,
\end{aligned}
\]

because $\kappa_3(\Delta\varepsilon_{t_i}) = 0$. The $\kappa$'s are the cumulants of the relevant order:

\[
\kappa_1(\varepsilon) = 0, \qquad \kappa_2(\varepsilon) = \operatorname{var}(\varepsilon) = E(\varepsilon^2), \qquad
\kappa_3(\varepsilon) = E\varepsilon^3, \qquad \kappa_4(\varepsilon) = E(\varepsilon^4) - 3(E\varepsilon^2)^2. \tag{A1}
\]

So $I_T = n\bigl(2\kappa_4(\varepsilon) + 8(E\varepsilon^2)^2\bigr) + 8[X,X]^{(\mathrm{all})}_T E\varepsilon^2$. Similarly, for the covariance,

\[
\begin{aligned}
\operatorname{cov}\bigl[(\Delta Y_{t_{i-1}})^2, (\Delta Y_{t_i})^2 \big| X\bigr]
&= \operatorname{cov}\bigl[(\Delta\varepsilon_{t_{i-1}})^2, (\Delta\varepsilon_{t_i})^2\bigr]
+ 4(\Delta X_{t_{i-1}})(\Delta X_{t_i}) \operatorname{cov}(\Delta\varepsilon_{t_{i-1}}, \Delta\varepsilon_{t_i}) \\
&\quad + 2(\Delta X_{t_{i-1}}) \operatorname{cov}\bigl[\Delta\varepsilon_{t_{i-1}}, (\Delta\varepsilon_{t_i})^2\bigr]
+ 2(\Delta X_{t_i}) \operatorname{cov}\bigl[(\Delta\varepsilon_{t_{i-1}})^2, \Delta\varepsilon_{t_i}\bigr] \\
&= \kappa_4(\varepsilon) + 2(E\varepsilon^2)^2
- 4(\Delta X_{t_{i-1}})(\Delta X_{t_i})\kappa_2(\varepsilon)
- 2(\Delta X_{t_i})\kappa_3(\varepsilon) + 2(\Delta X_{t_{i-1}})\kappa_3(\varepsilon), \tag{A2}
\end{aligned}
\]

because of (A1). Thus, summing (A2) over $i$,

\[
II_T = 2(n-1)\bigl(\kappa_4(\varepsilon) + 2(E\varepsilon^2)^2\bigr)
- 8E\varepsilon^2 \sum_{t_{i+1} \le T} (\Delta X_{t_{i-1}})(\Delta X_{t_i})
- 4\kappa_3(\varepsilon)\bigl(\Delta X_{t_{n-1}} - \Delta X_{t_0}\bigr).
\]

Amalgamating the two expressions, one obtains

\[
\begin{aligned}
\operatorname{var}\bigl([Y,Y]^{(\mathrm{all})}_T \big| X\bigr)
&= n\bigl(2\kappa_4(\varepsilon) + 8(E\varepsilon^2)^2\bigr) + 8[X,X]^{(\mathrm{all})}_T E\varepsilon^2
+ 2(n-1)\bigl(\kappa_4(\varepsilon) + 2(E\varepsilon^2)^2\bigr) \\
&\quad - 8E\varepsilon^2 \sum (\Delta X_{t_{i-1}})(\Delta X_{t_i})
- 4\kappa_3(\varepsilon)\bigl(\Delta X_{t_{n-1}} - \Delta X_{t_0}\bigr) \\
&= 4nE\varepsilon^4 + R_n, \tag{A3}
\end{aligned}
\]

where the remainder term $R_n$ satisfies

\[
\begin{aligned}
|R_n| &\le 8E\varepsilon^2 [X,X]_T + 2\bigl(\kappa_4(\varepsilon) + 2(E\varepsilon^2)^2\bigr)
+ 8E\varepsilon^2 \Bigl|\sum (\Delta X_{t_{i-1}})\Bigr|\bigl|(\Delta X_{t_i})\bigr|
+ 4|\kappa_3(\varepsilon)|\bigl(\bigl|\Delta X_{t_{n-1}}\bigr| + \bigl|\Delta X_{t_0}\bigr|\bigr) \\
&\le 16E\varepsilon^2 [X,X]^{(\mathrm{all})}_T + 2\bigl(\kappa_4(\varepsilon) + 2(E\varepsilon^2)^2\bigr)
+ 2|\kappa_3(\varepsilon)|\bigl(2 + [X,X]_T\bigr), \tag{A4}
\end{aligned}
\]

by the Cauchy–Schwarz inequality and because $|x| \le (1 + x^2)/2$. Because $[X,X]^{(\mathrm{all})}_T = O_p(1)$, (19) follows.

Under slightly stronger conditions (e.g., $|\mu_t|$ and $\sigma_t$ are bounded above by a constant), $\sum (\Delta X_{t_{i-1}})(\Delta X_{t_i})$ is a near-martingale and of order $O_p(n^{-1/2})$, and similarly $\Delta X_{t_{n-1}} - \Delta X_{t_0} = O_p(n^{-1/2})$, from which (20) follows.

Similarly, to obtain a higher-order approximation to the variance of the estimator $\widehat{\langle X,X\rangle}_T$, note that

\[
\widehat{\langle X,X\rangle}_T - \langle X,X\rangle_T
= \Bigl([Y,Y]^{(\mathrm{avg})}_T - \frac{1}{K}[Y,Y]^{(\mathrm{all})}_T - [X,X]^{(\mathrm{avg})}_T\Bigr)
+ \bigl([X,X]^{(\mathrm{avg})}_T - \langle X,X\rangle_T\bigr). \tag{A5}
\]

Now, recalling (20) and (37) and, in addition,

\[
\operatorname{cov}\bigl([Y,Y]^{(\mathrm{all})}, [Y,Y]^{(\mathrm{avg})} \big| X\bigr)
= 2\bigl(E\varepsilon^4 - (E\varepsilon^2)^2\bigr)\Bigl(2\bar{n} - \frac{1}{K}\Bigr)
+ 8E\varepsilon^2 [X,X]^{(\mathrm{all})}_T \frac{1}{K} + o_p\Bigl(\frac{1}{K}\Bigr),
\]

we see that

\[
\operatorname{var}\bigl(\widehat{\langle X,X\rangle}_T \big| X\bigr)
= \operatorname{var}\bigl([Y,Y]^{(\mathrm{avg})}_T \big| X\bigr)
+ \frac{1}{K^2}\operatorname{var}\bigl([Y,Y]^{(\mathrm{all})}_T \big| X\bigr)
- \frac{2}{K}\operatorname{cov}\bigl([Y,Y]^{(\mathrm{avg})}_T, [Y,Y]^{(\mathrm{all})}_T \big| X\bigr)
\]

1408 Journal of the American Statistical Association, December 2005

\[
\begin{aligned}
&= 4\frac{\bar{n}}{K}E\varepsilon^4
+ \frac{1}{K}\bigl[8[X,X]^{(\mathrm{avg})}_T E\varepsilon^2 - 2\bigl(E\varepsilon^4 - (E\varepsilon^2)^2\bigr)\bigr]
+ o_p\Bigl(\frac{1}{K}\Bigr) \\
&\quad + \frac{1}{K^2}\Bigl[4\bar{n}K E\varepsilon^4
+ \bigl(8[X,X]^{(\mathrm{all})}_T E\varepsilon^2 - 2\bigl(E\varepsilon^4 - (E\varepsilon^2)^2\bigr)\bigr) + o_p(1)\Bigr] \\
&\quad - \frac{2}{K}\Bigl[2\bigl(E\varepsilon^4 - (E\varepsilon^2)^2\bigr)\Bigl(2\bar{n} - \frac{1}{K}\Bigr)
+ 8E\varepsilon^2 [X,X]^{(\mathrm{all})}_T \frac{1}{K} + o_p\Bigl(\frac{1}{K}\Bigr)\Bigr] \\
&= \frac{\bar{n}}{K}\,8(E\varepsilon^2)^2
+ \frac{1}{K}\bigl[8[X,X]^{(\mathrm{avg})}_T E\varepsilon^2 - 2\bigl(E\varepsilon^4 - (E\varepsilon^2)^2\bigr)\bigr]
+ o_p\Bigl(\frac{1}{K}\Bigr),
\end{aligned}
\]

because at the optimal choice, $K = cn^{2/3}$ and $\bar{n} = c^{-1}n^{1/3}$. Therefore, (60) holds.

A.2 The Relevant Central Limit Theorem

Lemma A.2. Suppose that $X$ is an Itô process. Suppose that $Y$ is related to $X$ through model (4). Then, under assumption (11) and definitions (16) and (32),

\[
[Y,Y]^{(\mathrm{all})}_T = [\varepsilon,\varepsilon]^{(\mathrm{all})}_T + O_p(1)
\quad\text{and}\quad
[Y,Y]^{(\mathrm{avg})}_T = [\varepsilon,\varepsilon]^{(\mathrm{avg})}_T + [X,X]^{(\mathrm{avg})}_T + O_p\Bigl(\frac{1}{\sqrt{K}}\Bigr).
\]

Proof. (a) The one-grid case:

\[
[Y,Y]^{(\mathrm{all})}_T = [X,X]^{(\mathrm{all})}_T + [\varepsilon,\varepsilon]^{(\mathrm{all})}_T + 2[X,\varepsilon]^{(\mathrm{all})}_T. \tag{A6}
\]

We show that

\[
E\bigl(\bigl([X,\varepsilon]^{(\mathrm{all})}_T\bigr)^2 \big| X\bigr) = O_p(1), \tag{A7}
\]

and, in particular,

\[
[X,\varepsilon]^{(\mathrm{all})}_T = O_p(1). \tag{A8}
\]

To see (A7),

\[
[X,\varepsilon]^{(\mathrm{all})}_T
= \sum_{i=0}^{n-1} (\Delta X_{t_i})(\Delta\varepsilon_{t_i})
= \sum_{i=0}^{n-1} (\Delta X_{t_i})\varepsilon_{t_{i+1}} - \sum_{i=0}^{n-1} (\Delta X_{t_i})\varepsilon_{t_i}
= \sum_{i=1}^{n-1} (\Delta X_{t_{i-1}} - \Delta X_{t_i})\varepsilon_{t_i}
+ \Delta X_{t_{n-1}}\varepsilon_{t_n} - \Delta X_{t_0}\varepsilon_{t_0}. \tag{A9}
\]

Because $E([X,\varepsilon]^{(\mathrm{all})}_T | X) = 0$ and the $\varepsilon_{t_i}$ are iid across the $t_i$, we get

\[
\begin{aligned}
E\bigl(\bigl([X,\varepsilon]^{(\mathrm{all})}_T\bigr)^2 \big| X\bigr)
&= \operatorname{var}\bigl([X,\varepsilon]^{(\mathrm{all})}_T \big| X\bigr) \\
&= E\varepsilon^2 \Biggl[\sum_{i=1}^{n-1} (\Delta X_{t_{i-1}} - \Delta X_{t_i})^2 + \Delta X_{t_{n-1}}^2 + \Delta X_{t_0}^2\Biggr] \\
&= 2[X,X]_T E\varepsilon^2 - 2E\varepsilon^2 \sum_{i=1}^{n-1} (\Delta X_{t_{i-1}})(\Delta X_{t_i}) \\
&\le 4[X,X]_T E\varepsilon^2, \tag{A10}
\end{aligned}
\]

by the Cauchy–Schwarz inequality, from which, and from $[X,X]^{(\mathrm{all})}$ being of order $O_p(1)$, (A7) follows. Hence (A8) follows by the Markov inequality.

(b) The multiple-grid case. Notice that

\[
[Y,Y]^{(\mathrm{avg})} = [X,X]^{(\mathrm{avg})}_T + [\varepsilon,\varepsilon]^{(\mathrm{avg})}_T + 2[X,\varepsilon]^{(\mathrm{avg})}_T; \tag{A11}
\]

(A11) follows directly from model (4) and the definitions of the grids and of $[\cdot,\cdot]^{(\mathrm{avg})}_t$; see Section 3.2.

We need to show that

\[
E\bigl(\bigl([X,\varepsilon]^{(\mathrm{avg})}_T\bigr)^2 \big| X\bigr) = O_p\Bigl(\frac{1}{K}\Bigr), \tag{A12}
\]

in particular,

\[
[X,\varepsilon]^{(\mathrm{avg})}_T = O_p\Bigl(\frac{1}{K^{1/2}}\Bigr), \tag{A13}
\]

and $\operatorname{var}([X,\varepsilon]^{(\mathrm{avg})}_T | X) = E[([X,\varepsilon]^{(\mathrm{avg})}_T)^2 | X]$.

To show (A12), note that $E([X,\varepsilon]^{(\mathrm{avg})}_T | X) = 0$ and

\[
E\bigl[\bigl([X,\varepsilon]^{(\mathrm{avg})}_T\bigr)^2 \big| X\bigr]
= \operatorname{var}\bigl([X,\varepsilon]^{(\mathrm{avg})}_T \big| X\bigr)
= \frac{1}{K^2}\sum_{k=1}^{K} \operatorname{var}\bigl([X,\varepsilon]^{(k)}_T \big| X\bigr)
\le \frac{4E\varepsilon^2}{K}[X,X]^{(\mathrm{avg})}_T = O_p\Bigl(\frac{1}{K}\Bigr),
\]

where the second equality follows from the disjointness of the different grids, as well as from the independence of $\varepsilon$ and $X$. The inequality follows from the same argument as in (A10). The order then follows because $[X,X]^{(\mathrm{avg})}_T = O_p(1)$; see the method of Mykland and Zhang (2002) for a rigorous development of the order of $[X,X]^{(\mathrm{avg})}_T$.

Theorem A.1. Suppose that $X$ is an Itô process of form (1). Suppose that $Y$ is related to $X$ through model (4), and that (11) is satisfied with $E\varepsilon^4 < \infty$. Also suppose that $t_i$ and $t_{i+1}$ are not in the same subgrid for any $i$. Under assumption (33), as $n \to \infty$,

\[
\Bigl(\sqrt{n}\bigl(\widehat{E\varepsilon^2} - E\varepsilon^2\bigr),\
\sqrt{\tfrac{K}{\bar{n}}}\bigl([Y,Y]^{(\mathrm{avg})}_T - [X,X]^{(\mathrm{avg})}_T - 2\bar{n}E\varepsilon^2\bigr)\Bigr)
\]

converges in law to a bivariate normal, with mean 0 and covariance matrix

\[
\begin{pmatrix} E\varepsilon^4 & 2\operatorname{var}(\varepsilon^2) \\ 2\operatorname{var}(\varepsilon^2) & 4E\varepsilon^4 \end{pmatrix}, \tag{A14}
\]

conditional on the $X$ process, where the limiting random variable is independent of the $X$ process.

Proof. By Lemma A.2, we need the distribution of $[\varepsilon,\varepsilon]^{(\mathrm{avg})}$ and $[\varepsilon,\varepsilon]^{(\mathrm{all})}$. First, we explore the convergence of

\[
\frac{1}{\sqrt{n}}\Bigl([\varepsilon,\varepsilon]^{(\mathrm{all})}_T - 2nE\varepsilon^2,\
[\varepsilon,\varepsilon]^{(\mathrm{avg})}_T K - 2\bar{n}K E\varepsilon^2\Bigr). \tag{A15}
\]

Recall that all of the sampling points $t_0, t_1, \ldots, t_n$ are within $[0,T]$. We use $\mathcal{G}$ to denote the time points in the full sampling, as in the single grid; $\mathcal{G}^{(k)}$ denotes the subsampling from the $k$th grid. As before, if $t_i \in \mathcal{G}^{(k)}$, then $t_{i,-}$ and $t_{i,+}$ are the previous and next elements in $\mathcal{G}^{(k)}$; $\varepsilon_{t_{i,-}} = 0$ for $t_i = \min \mathcal{G}^{(k)}$, and $\varepsilon_{t_{i,+}} = 0$ for $t_i = \max \mathcal{G}^{(k)}$.

Set

\[
M^{(1)}_T = \frac{1}{\sqrt{n}} \sum_{t_i \in \mathcal{G}} \bigl(\varepsilon_{t_i}^2 - E\varepsilon^2\bigr), \qquad
M^{(2)}_T = \frac{1}{\sqrt{n}} \sum_{t_i \in \mathcal{G}} \varepsilon_{t_i}\varepsilon_{t_{i-1}}, \qquad
M^{(3)}_T = \frac{1}{\sqrt{n}} \sum_{k=1}^{K} \sum_{t_i \in \mathcal{G}^{(k)}} \varepsilon_{t_i}\varepsilon_{t_{i,-}}. \tag{A16}
\]

We first find the asymptotic distribution of $(M^{(1)}_T, M^{(2)}_T, M^{(3)}_T)$ using the martingale central limit theorem; then we use the result to find the limit of (A15).

Note that $(M^{(1)}_T, M^{(2)}_T, M^{(3)}_T)$ are the end points of martingales with respect to the filtration $\mathcal{F}_i = \sigma(\varepsilon_{t_j}, j \le i; X_t, \text{all } t)$. We now derive their (discrete-time) predictable quadratic variations $\langle M^{(l)}, M^{(k)}\rangle$, $l,k = 1,2,3$ [discrete-time predictable quadratic variations are only used in this proof and in the proof of Lemma 1, and are different from the continuous-time quadratic variations in (12)]:

\[
\begin{aligned}
\bigl\langle M^{(1)}, M^{(1)}\bigr\rangle_T
&= \frac{1}{n}\sum_{t_i \in \mathcal{G}} \operatorname{var}\bigl(\varepsilon_{t_i}^2 - E\varepsilon^2 \big| \mathcal{F}_{t_{i-1}}\bigr)
= \operatorname{var}(\varepsilon^2), \\
\bigl\langle M^{(2)}, M^{(2)}\bigr\rangle_T
&= \frac{1}{n}\sum_{t_i \in \mathcal{G}} \operatorname{var}\bigl(\varepsilon_{t_i}\varepsilon_{t_{i-1}} \big| \mathcal{F}_{t_{i-1}}\bigr)
= \frac{E\varepsilon^2}{n}\sum_{t_i \in \mathcal{G}} \varepsilon_{t_{i-1}}^2
= (E\varepsilon^2)^2 + o_p(1), \tag{A17} \\
\bigl\langle M^{(3)}, M^{(3)}\bigr\rangle_T
&= \frac{1}{n}\sum_{k=1}^{K}\sum_{t_i \in \mathcal{G}^{(k)}} \operatorname{var}\bigl(\varepsilon_{t_i}\varepsilon_{t_{i,-}} \big| \mathcal{F}_{t_{i-1}}\bigr)
= \frac{E\varepsilon^2}{n}\sum_{k=1}^{K}\sum_{t_i \in \mathcal{G}^{(k)}} \varepsilon_{t_{i,-}}^2
= (E\varepsilon^2)^2 + o_p(1),
\end{aligned}
\]

by the law of large numbers. Similarly, for the predictable quadratic covariations,

\[
\begin{aligned}
\bigl\langle M^{(1)}, M^{(2)}\bigr\rangle_T
&= \frac{1}{n}\sum_{t_i \in \mathcal{G}} \operatorname{cov}\bigl(\varepsilon_{t_i}^2 - E\varepsilon^2,\ \varepsilon_{t_i}\varepsilon_{t_{i-1}} \big| \mathcal{F}_{t_{i-1}}\bigr)
= E\varepsilon^3\, \frac{1}{n}\sum_{t_i \in \mathcal{G}} \varepsilon_{t_{i-1}} = o_p(1), \\
\bigl\langle M^{(1)}, M^{(3)}\bigr\rangle_T
&= \frac{1}{n}\sum_{k=1}^{K}\sum_{t_i \in \mathcal{G}^{(k)}} \operatorname{cov}\bigl(\varepsilon_{t_i}^2 - E\varepsilon^2,\ \varepsilon_{t_i}\varepsilon_{t_{i,-}} \big| \mathcal{F}_{t_{i-1}}\bigr)
= E\varepsilon^3\, \frac{1}{n}\sum_{k=1}^{K}\sum_{t_i \in \mathcal{G}^{(k)}} \varepsilon_{t_{i,-}} = o_p(1), \tag{A18} \\
\bigl\langle M^{(2)}, M^{(3)}\bigr\rangle_T
&= \frac{1}{n}\sum_{k=1}^{K}\sum_{t_i \in \mathcal{G}^{(k)}} \operatorname{cov}\bigl(\varepsilon_{t_i}\varepsilon_{t_{i-1}},\ \varepsilon_{t_i}\varepsilon_{t_{i,-}} \big| \mathcal{F}_{t_{i-1}}\bigr)
= \frac{E\varepsilon^2}{n}\sum_{k=1}^{K}\sum_{t_i \in \mathcal{G}^{(k)}} \varepsilon_{t_{i-1}}\varepsilon_{t_{i,-}} = o_p(1),
\end{aligned}
\]

because $t_{i+1}$ is not in the same grid as $t_i$.

Because the $\varepsilon_{t_i}$ are iid and $E\varepsilon_{t_i}^4 < \infty$, the conditional Lindeberg conditions are satisfied. Hence, by the martingale central limit theorem (see Hall and Heyde 1980, condition 3.1, p. 58), $(M^{(1)}, M^{(2)}, M^{(3)})$ are asymptotically normal, with covariance matrix given by the asymptotic values of the $\langle M^{(l)}, M^{(k)}\rangle$. In other words, asymptotically, $(M^{(1)}, M^{(2)}, M^{(3)})$ are independent normal, with respective variances $\operatorname{var}(\varepsilon^2)$, $(E\varepsilon^2)^2$, and $(E\varepsilon^2)^2$.

Returning to (A15), we have that

\[
\begin{aligned}
[\varepsilon,\varepsilon]^{(\mathrm{all})} - 2nE\varepsilon^2
&= 2\sum_{i \ne 0,n} \bigl(\varepsilon_{t_i}^2 - E\varepsilon^2\bigr)
+ \bigl(\varepsilon_{t_0}^2 - E\varepsilon^2\bigr) + \bigl(\varepsilon_{t_n}^2 - E\varepsilon^2\bigr)
- 2\sum_{t_i > 0} \varepsilon_{t_i}\varepsilon_{t_{i-1}} \\
&= 2\sqrt{n}\bigl(M^{(1)} - M^{(2)}\bigr) + O_p(1). \tag{A19}
\end{aligned}
\]

Meanwhile,

\[
\begin{aligned}
[\varepsilon,\varepsilon]^{(k)} - 2n_k E\varepsilon^2
&= \sum_{t_i \in \mathcal{G}^{(k)},\, t_i \ne \max \mathcal{G}^{(k)}} \bigl(\varepsilon_{t_{i,+}} - \varepsilon_{t_i}\bigr)^2 - 2n_k E\varepsilon^2 \\
&= 2\sum_{t_i \in \mathcal{G}^{(k)}} \bigl(\varepsilon_{t_i}^2 - E\varepsilon^2\bigr)
- \bigl(\varepsilon_{\min \mathcal{G}^{(k)}}^2 - E\varepsilon^2\bigr)
- \bigl(\varepsilon_{\max \mathcal{G}^{(k)}}^2 - E\varepsilon^2\bigr)
- 2\sum_{t_i \in \mathcal{G}^{(k)}} \varepsilon_{t_i}\varepsilon_{t_{i,-}}, \tag{A20}
\end{aligned}
\]

where $n_k + 1$ is the total number of sampling points in $\mathcal{G}^{(k)}$. Hence

\[
[\varepsilon,\varepsilon]^{(\mathrm{avg})}_T K - 2\bar{n}E\varepsilon^2 K
= \sqrt{n}\bigl(2M^{(1)} - 2M^{(3)}\bigr) - R
= 2\sqrt{n}\bigl(M^{(1)} - M^{(3)}\bigr) + O_p(K^{1/2}), \tag{A21}
\]

because $R = \sum_{k=1}^{K} \bigl[\bigl(\varepsilon_{\min \mathcal{G}^{(k)}}^2 - E\varepsilon^2\bigr) + \bigl(\varepsilon_{\max \mathcal{G}^{(k)}}^2 - E\varepsilon^2\bigr)\bigr]$ satisfies $ER^2 = \operatorname{var}(R) \le 4K \operatorname{var}(\varepsilon^2)$.

Because $n^{-1}K \to 0$, and because the error terms in (A19) and (A20) are uniformly integrable, it follows that

\[
(\mathrm{A15}) = 2\bigl(M^{(1)} - M^{(2)},\ M^{(1)} - M^{(3)}\bigr) + o_p(1). \tag{A22}
\]

Hence (A15) is also asymptotically normal, with covariance matrix

\[
\begin{pmatrix} 4E\varepsilon^4 & 4\operatorname{var}(\varepsilon^2) \\ 4\operatorname{var}(\varepsilon^2) & 4E\varepsilon^4 \end{pmatrix}.
\]

By Lemma A.2, and because $n^{-1}K \to 0$,

\[
\frac{1}{\sqrt{n}}\Bigl([Y,Y]^{(\mathrm{all})}_T - 2nE\varepsilon^2,\
K\bigl([Y,Y]^{(\mathrm{avg})}_T - [X,X]^{(\mathrm{avg})}_T - 2\bar{n}E\varepsilon^2\bigr)\Bigr) \Big| X
\]

is asymptotically normal:

\[
\begin{aligned}
\frac{1}{\sqrt{n}}
\begin{pmatrix}
[Y,Y]^{(\mathrm{all})}_T - 2nE\varepsilon^2 \\[2pt]
[Y,Y]^{(\mathrm{avg})}_T K - 2\bar{n}K E\varepsilon^2 - [X,X]^{(\mathrm{avg})}_T K
\end{pmatrix}
\Bigg| X
&= 2\begin{pmatrix} M^{(1)} - M^{(2)} \\[2pt] M^{(1)} - M^{(3)} \end{pmatrix} + o_p(1) \\[4pt]
&\xrightarrow{\ \mathcal{L}\ } 2N\Biggl(0,\
\begin{pmatrix} E\varepsilon^4 & \operatorname{var}(\varepsilon^2) \\ \operatorname{var}(\varepsilon^2) & E\varepsilon^4 \end{pmatrix}\Biggr). \tag{A23}
\end{aligned}
\]

Because

\[
\widehat{E\varepsilon^2} = \frac{1}{2n}[Y,Y]^{(\mathrm{all})}_T
\quad\text{and}\quad
\frac{K}{\sqrt{n}} = \sqrt{\frac{K}{\bar{n}}}\,\bigl(1 + o(1)\bigr), \tag{A24}
\]

Theorem A.1 follows.

Proof of Lemma 1. For simplicity, and without loss of generality, we assume that $\mathcal{H} = \mathcal{G}$. The result follows the proofs of Lemma A.2 and Theorem A.1, but with a more exact representation of $[Y,Y]^{(\mathrm{all})}_T - [X,X]^{(\mathrm{all})}_T$. Specifically, from (A9) and (A19), we have that

\[
\begin{aligned}
[Y,Y]^{(\mathrm{all})}_T - [X,X]^{(\mathrm{all})}_T - 2nE\varepsilon^2
&= [\varepsilon,\varepsilon]_T - 2nE\varepsilon^2 + 2[X,\varepsilon]^{(\mathrm{all})}_T \\
&= 2\sum_{i \ne 0,n} \bigl(\varepsilon_{t_i}^2 - E\varepsilon^2\bigr)
+ \bigl(\varepsilon_{t_0}^2 - E\varepsilon^2\bigr) + \bigl(\varepsilon_{t_n}^2 - E\varepsilon^2\bigr)
- 2\sum_{t_i > 0} \varepsilon_{t_i}\varepsilon_{t_{i-1}} \\
&\quad + 2\sum_{i=1}^{n-1} (\Delta X_{t_{i-1}} - \Delta X_{t_i})\varepsilon_{t_i}
+ 2\Delta X_{t_{n-1}}\varepsilon_{t_n} - 2\Delta X_{t_0}\varepsilon_{t_0}.
\end{aligned}
\]

With the same filtration as in the proof of Theorem A.1, this is the sum of a martingale triangular array with increment $2(\varepsilon_{t_i}^2 - E\varepsilon^2) - 2\varepsilon_{t_i}\varepsilon_{t_{i-1}} + 2(\Delta X_{t_{i-1}} - \Delta X_{t_i})\varepsilon_{t_i}$ for $i \ne 0,n$. Again, the conditions of the martingale central limit theorem (see Hall and Heyde 1980, chap. 3) are satisfied by (17), whence the term is asymptotically normal, conditionally on the $X$ process. The conditional variance is given by (20), under the conditions mentioned in the statement of the lemma. Because $E\varepsilon^2$ can vary with $n$, the $O_p(n_{\mathrm{sparse}}^{-1/2})$ term in (20) is actually of order $O_p(n_{\mathrm{sparse}}^{-1/2} E\varepsilon^2)$.

A.3 Asymptotics of $D_T$

For transparency of notation, we take $\Delta t = T/n$; in other words, the average of the $\Delta t_i$.

Proof of Theorem 2. Note first that because we assume that $\mu_t$ and $\sigma_t$ are continuous, they are locally bounded. Because here we seek to show convergence in probability, we can therefore assume, without loss of generality, that $|\mu_t|$ is bounded above by a nonrandom constant, and also that $\infty > \sigma^+ \ge \sigma_t \ge \sigma^- > 0$, where $\sigma^+$ and $\sigma^-$ are nonrandom. This is by the usual stopping time argument.

Also note that once $\mu_t$ and $\sigma_t$ are bounded, as we have now assumed, by Girsanov's theorem (see, e.g., Karatzas and Shreve 1991, chap. 3.5; Jacod and Shiryaev 2003, chap. II.3b), we can without loss of generality further suppose that $\mu_t = 0$ identically. (If the convergence in probability holds under the equivalent probability measure, then it also holds under the original one.)

Now for the derivation. First consider

\[
\bigl(X_{t_i} - X_{t_{i,-}}\bigr)^2
= \Biggl(\sum_{t_j = t_{i,-}}^{t_{i-1}} \Delta X_{t_j}\Biggr)^2
= \sum_{t_j = t_{i,-}}^{t_{i-1}} (\Delta X_{t_j})^2
+ 2\sum_{t_{i-1} \ge t_k > t_l \ge t_{i,-}} \Delta X_{t_k}\,\Delta X_{t_l}.
\]

It follows that

\[
[X,X]^{(\mathrm{avg})}_T
= [X,X]^{(\mathrm{all})}_T
+ 2\sum_{i=1}^{n-1} (\Delta X_{t_i}) \sum_{j=1}^{K \wedge i} \Bigl(1 - \frac{j}{K}\Bigr)(\Delta X_{t_{i-j}})
+ O_p(K/n).
\]

The remainder term incorporates end effects and has order $O_p(K/n)$ by (41) and (42), and due to the regular allocation of points to grids.

Following Jacod and Protter (1998) and Mykland and Zhang (2002), $[X,X]^{(\mathrm{all})}_T - \langle X,X\rangle_T = O_p(1/\sqrt{n}\,) = o_p(\sqrt{K/n}\,)$, too, under assumption (17) [which is a special case of assumption (41)] and the assumption that $\inf_{t \in [0,T]} \sigma_t^2 > 0$ almost surely. In other words,

\[
D_T = \bigl([X,X]^{(\mathrm{avg})}_T - \langle X,X\rangle_T\bigr)
= 2\sum_{i=1}^{n-1} (\Delta X_{t_i}) \sum_{j=1}^{K \wedge i} \Bigl(1 - \frac{j}{K}\Bigr)(\Delta X_{t_{i-j}})
+ o_p\bigl(\sqrt{K\Delta t}\,\bigr). \tag{A25}
\]

It follows that

\[
\langle D,D\rangle_T
= 4\sum_{i=1}^{n-1} \bigl(\Delta\langle X,X\rangle_{t_i}\bigr)
\Biggl(\sum_{j=1}^{K \wedge i} \Bigl(1 - \frac{j}{K}\Bigr)(\Delta X_{t_{i-j}})\Biggr)^{\!2} + o_p(K\Delta t)
= (\mathrm{I}) + (\mathrm{II}) + o_p(K\Delta t), \tag{A26}
\]

where

\[
\begin{aligned}
(\mathrm{I}) &= 4\sum_{i=1}^{n-1} \bigl(\Delta\langle X,X\rangle_{t_i}\bigr)
\Biggl(\sum_{j=1}^{K \wedge i} \Bigl(1 - \frac{j}{K}\Bigr)^{2} (\Delta X_{t_{i-j}})^2\Biggr) \\
&= 4\sum_{i=1}^{n-1} \sigma_{t_i}^4 \,\Delta t_i
\Biggl(\sum_{j=1}^{K \wedge i} \Bigl(1 - \frac{j}{K}\Bigr)^{2} \Delta t_{i-j}\Biggr) + o_p(K\Delta t) \\
&= K\Delta t \sum_{i=1}^{n-1} \sigma_{t_i}^4\, h_i\, \Delta t_i + o_p(K\Delta t),
\end{aligned}
\]

where $h_i$ is given by (43). Thus, Theorem 2 will have been shown if we can show that

\[
(\mathrm{II}) = 8\sum_{i=1}^{n-1} \Delta\langle X,X\rangle_{t_i}\, \zeta_i \tag{A27}
\]

is of order $o_p(K\Delta t)$, where

\[
\zeta_i = \sum_{i-1 \ge l > r \ge 0} (\Delta X_{t_l})(\Delta X_{t_r})
\Bigl(1 - \frac{i-l}{K}\Bigr)^{\!+} \Bigl(1 - \frac{i-r}{K}\Bigr)^{\!+}. \tag{A28}
\]

We do this as follows. Denote $\delta^+ = \max_i \Delta t_i = O(1/n)$, and set $(\mathrm{II})' = 8\sum_{i=1}^{n-1} \Delta t_i\, \sigma^2_{t_{i-K}}\, \zeta_i$. Then, by Hölder's inequality,

\[
E\bigl|(\mathrm{II}) - (\mathrm{II})'\bigr|
\le \delta^+ n \sup_{|t-s| \le (K+1)\delta^+} \bigl\|\sigma_t^2 - \sigma_s^2\bigr\|_2\, \sup_i \|\zeta_i\|_2
= o(\delta^+ K),
\]

because of the boundedness and continuity of $\sigma_t$, and because, applying the Burkholder–Davis–Gundy inequality twice, we have that

\[
\begin{aligned}
E(\zeta_i^2)
&\le E\sum_{l=1}^{i-1} \Delta\langle X,X\rangle_{t_l}
\Biggl(\sum_{r \ge 0}^{(l-1)\wedge(i-1)} (\Delta X_{t_r})
\Bigl(1 - \frac{i-l}{K}\Bigr)^{\!+}\Bigl(1 - \frac{i-r}{K}\Bigr)^{\!+}\Biggr)^{\!2} \\
&\le \delta^+ (\sigma^+)^2 \sum_{l=1}^{i-1}
E\Biggl(\sum_{r \ge 0}^{(l-1)\wedge(i-1)} (\Delta X_{t_r})
\Bigl(1 - \frac{i-l}{K}\Bigr)^{\!+}\Bigl(1 - \frac{i-r}{K}\Bigr)^{\!+}\Biggr)^{\!2} \\
&\le (\delta^+)^2 (\sigma^+)^4 \sum_{l=1}^{i-1} \sum_{r \ge 0}^{(l-1)\wedge(i-1)}
\Biggl(\Bigl(1 - \frac{i-l}{K}\Bigr)^{\!+}\Bigl(1 - \frac{i-r}{K}\Bigr)^{\!+}\Biggr)^{\!2} \\
&= O\bigl((\delta^+ K)^2\bigr) \quad \text{uniformly in } i, \tag{A29}
\end{aligned}
\]

where the second-to-last transition does for the inner sum what the first two inequalities do for the outer sum. Now rewrite

\[
(\mathrm{II})' = 8\sum_{l > r \ge 0}^{n-1} (\Delta X_{t_l})(\Delta X_{t_r})
\sum_{i=l+1}^{n-1} \Delta t_i\, \sigma^2_{t_{i-K}}
\Bigl(1 - \frac{i-l}{K}\Bigr)^{\!+}\Bigl(1 - \frac{i-r}{K}\Bigr)^{\!+}. \tag{A30}
\]

In the same way as in (A29), we then obtain

\[
\begin{aligned}
E\bigl((\mathrm{II})'\bigr)^2
&\le 64(\delta^+)^2(\sigma^+)^4
\sum_{l > r \ge 0}^{n-1} E\Biggl(\sum_{i=l+1}^{n-1} \Delta t_i\, \sigma^2_{t_{i-K}}
\Bigl(1 - \frac{i-l}{K}\Bigr)^{\!+}\Bigl(1 - \frac{i-r}{K}\Bigr)^{\!+}\Biggr)^{\!2} \\
&\le 64(\delta^+)^2(\sigma^+)^4 \times (\delta^+)^2(\sigma^+)^4
\sum_{l > r \ge 0}^{n-1} \Biggl(\sum_{i=l+1}^{n-1}
\Bigl(1 - \frac{i-l}{K}\Bigr)^{\!+}\Bigl(1 - \frac{i-r}{K}\Bigr)^{\!+}\Biggr)^{\!2} \\
&\le 64(\delta^+)^4(\sigma^+)^8\, nK^3
= O\bigl((\delta^+ K)^3\bigr) = o\bigl((\delta^+ K)^2\bigr), \tag{A31}
\end{aligned}
\]

by assumptions (41) and (42). The transition from the second to the last line in (A31) is due to the fact that $\sum_{i=l+1}^{n-1} (1 - \frac{i-l}{K})^+ (1 - \frac{i-r}{K})^+ \le K$, and $= 0$ when $l - r \ge K$. Thus we have shown that $(\mathrm{II}) = o_p(K\Delta t)$, and hence Theorem 2 has been established.

We now proceed to the asymptotic distribution of $D_T$. We first state a technical condition on the filtration $(\mathcal{F}_t)_{0 \le t \le T}$, to which $X_t$ and $\mu_t$ (but not the $\varepsilon$'s) are assumed to be adapted.

Condition E (Description of the filtration). There is a continuous multidimensional $P$-local martingale $\mathcal{X} = (\mathcal{X}^{(1)}, \ldots, \mathcal{X}^{(p)})$, any $p$, so that $\mathcal{F}_t$ is the smallest sigma-field containing $\sigma(\mathcal{X}_s, s \le t)$ and $\mathcal{N}$, where $\mathcal{N}$ contains all of the null sets in $\sigma(\mathcal{X}_s, s \le T)$.

For example, $\mathcal{X}$ can be a collection of Brownian motions.

Proof of Theorem 3. We can show, by methods similar to those in the proof of Theorem 2, that if $L$ is any martingale adapted to the filtration generated by $\mathcal{X}$, then

\[
\sup_t \biggl| \frac{1}{\Delta t\, K} \langle D,L\rangle_t \biggr| \xrightarrow{\ p\ } 0. \tag{A32}
\]

The stable convergence with respect to the filtration $(\mathcal{F}_t)_{0 \le t \le T}$ then follows in view of Rootzén (1980) or Jacod and Protter (1998). This ends the proof of Theorem 3.

Finally, in the case where $\eta_n^2$ does not converge, one can still use the mixed normal with variance $\eta_n^2$. This is because every subsequence of $\eta_n^2$ has a further subsequence that does converge in probability to some $\eta^2$, and hence for which assumption (48) in Theorem 3 would be satisfied. The reason for this is that one can define the distribution function of a finite measure by

\[
G_n(t) = \sum_{t_{i+1} \le t} h_i\, \Delta t_i. \tag{A33}
\]

Because $G_n(t) \le T \sup_i h_i$, it follows from (45) that the sequence $G_n$ is weakly compact in the sense of weak convergence; see Helly's theorem (e.g., Billingsley 1995, p. 336). For any convergent subsequence $G_n \to G$, we then get that

\[
\eta_n^2 = \int_0^T \sigma_t^4\, dG_n(t) \to \int_0^T \sigma_t^4\, dG(t), \tag{A34}
\]

almost surely, because we have assumed that $\langle X,X\rangle'_t$ is a continuous function of $t$. One then defines $\eta^2$ to be the (subsequence-dependent) right side of (A34). To proceed further with the asymptotics, continue with the foregoing subsequence and note that

\[
\frac{1}{\Delta t\, K}\langle D,D\rangle_t \approx \int_0^t \sigma_s^4\, dG_n(s)
\to \int_0^t \sigma_s^4\, dG(s),
\]

completing the proof.

[Received October 2003. Revised December 2004.]

REFERENCES

Aït-Sahalia, Y., Mykland, P. A., and Zhang, L. (2005), "How Often to Sample a Continuous-Time Process in the Presence of Market Microstructure Noise," Review of Financial Studies, 18, 351–416.

Aldous, D. J., and Eagleson, G. K. (1978), "On Mixing and Stability of Limit Theorems," The Annals of Probability, 6, 325–331.

Andersen, T. G., Bollerslev, T., Diebold, F. X., and Labys, P. (2001), "The Distribution of Exchange Rate Realized Volatility," Journal of the American Statistical Association, 96, 42–55.

Bandi, F. M., and Russell, J. R. (2003), "Microstructure Noise, Realized Volatility, and Optimal Sampling," technical report, University of Chicago, Graduate School of Business.

Barndorff-Nielsen, O. E., and Shephard, N. (2002), "Econometric Analysis of Realized Volatility and Its Use in Estimating Stochastic Volatility Models," Journal of the Royal Statistical Society, Ser. B, 64, 253–280.

Billingsley, P. (1995), Probability and Measure (3rd ed.), New York: Wiley.

Brown, S. J. (1990), "Estimating Volatility," in Financial Options: From Theory to Practice, eds. S. Figlewski, W. Silber, and M. Subrahmanyam, Homewood, IL: Business One-Irwin, pp. 516–537.

Chernov, M., and Ghysels, E. (2000), "A Study Towards a Unified Approach to the Joint Estimation of Objective and Risk Neutral Measures for the Purpose of Options Valuation," Journal of Financial Economics, 57, 407–458.

Dacorogna, M. M., Gençay, R., Müller, U., Olsen, R. B., and Pictet, O. V. (2001), An Introduction to High-Frequency Finance, San Diego: Academic Press.

Gallant, A. R., Hsu, C.-T., and Tauchen, G. T. (1999), "Using Daily Range Data to Calibrate Volatility Diffusions and Extract the Forward Integrated Variance," The Review of Economics and Statistics, 81, 617–631.

Gloter, A. (2000), "Estimation des Paramètres d'une Diffusion Cachée," unpublished doctoral thesis, Université de Marne-la-Vallée.

Hall, P., and Heyde, C. C. (1980), Martingale Limit Theory and Its Application, Boston: Academic Press.

Hansen, P. R., and Lunde, A. (2004a), "Realized Variance and IID Market Microstructure Noise," technical report, Brown University, Dept. of Economics.

——— (2004b), "An Unbiased Measure of Realized Variance," technical report, Brown University, Dept. of Economics.

Heston, S. (1993), "A Closed-Form Solution for Options With Stochastic Volatility, With Applications to Bonds and Currency Options," Review of Financial Studies, 6, 327–343.

Hull, J., and White, A. (1987), "The Pricing of Options on Assets With Stochastic Volatilities," Journal of Finance, 42, 281–300.

Jacod, J., and Protter, P. (1998), "Asymptotic Error Distributions for the Euler Method for Stochastic Differential Equations," The Annals of Probability, 26, 267–307.

Jacod, J., and Shiryaev, A. N. (2003), Limit Theorems for Stochastic Processes (2nd ed.), New York: Springer-Verlag.

Karatzas, I., and Shreve, S. E. (1991), Brownian Motion and Stochastic Calculus, New York: Springer-Verlag.

Mykland, P. A., and Zhang, L. (2002), "ANOVA for Diffusions," technical report, University of Chicago, Dept. of Statistics.

O'Hara, M. (1995), Market Microstructure Theory, Cambridge, MA: Blackwell.

Oomen, R. (2004), "Properties of Realized Variance for a Pure Jump Process: Calendar Time Sampling versus Business Time Sampling," technical report, University of Warwick, Warwick Business School.

Rényi, A. (1963), "On Stable Sequences of Events," Sankhyā, Ser. A, 25, 293–302.

Rootzén, H. (1980), "Limit Distributions for the Error in Approximations of Stochastic Integrals," The Annals of Probability, 8, 241–251.

Zhou, B. (1996), "High-Frequency Data and Volatility in Foreign-Exchange Rates," Journal of Business & Economic Statistics, 14, 45–52.

Zumbach, G., Corsi, F., and Trapletti, A. (2002), "Efficient Estimation of Volatility Using High-Frequency Data," technical report, Olsen & Associates.



3.6 The Benefits of Sampling Sparsely: Optimal Sampling Frequency in the Multiple-Grid Case

As in Section 2.3, when the noise is negligible asymptotically, we can search for an optimal $\bar{n}$ for subsampling, to balance the coexistence of the bias and the variance in (51). To reduce the MSE of $[Y,Y]^{(\mathrm{avg})}_T$, we set $\partial\,\mathrm{MSE}/\partial\bar{n} = 0$. From (52)–(51), $\mathrm{bias} = 2\bar{n}E\varepsilon^2$ and $\xi^2 = 4\frac{\bar{n}}{K}E\varepsilon^4 + \frac{T}{\bar{n}}\eta^2$. Then

\[
\mathrm{MSE} = \mathrm{bias}^2 + \xi^2
= 4(E\varepsilon^2)^2\bar{n}^2 + 4\frac{\bar{n}}{K}E\varepsilon^4 + \frac{T}{\bar{n}}\eta^2
= 4(E\varepsilon^2)^2\bar{n}^2 + \frac{T}{\bar{n}}\eta^2, \quad \text{to first order;}
\]

thus the optimal $\bar{n}^*$ satisfies

\[
\bar{n}^* = \Bigl(\frac{T\eta^2}{8(E\varepsilon^2)^2}\Bigr)^{1/3}. \tag{53}
\]

Therefore, assuming that the estimator $[Y,Y]^{(\mathrm{avg})}_T$ is adopted, one could benefit from a minimum MSE if one subsamples $\bar{n}^*$ data in an equidistant fashion. In other words, all $n$ observations can be used if one uses $K^*$, $K^* \approx n/\bar{n}^*$, subgrids. This is in contrast to the drawback of using all of the data in the single-grid case. The subsampling coupled with aggregation brings out the advantage of using all of the data. Of course, for the asymptotics to work, we need $E\varepsilon^2 \to 0$. Our recommendation, however, is to use the bias-corrected estimator given in Section 4.

4. THE FINAL ESTIMATOR: SUBSAMPLING, AVERAGING, AND BIAS CORRECTION OVER TWO TIME SCALES

4.1 The Final Estimator $\widehat{\langle X,X\rangle}_T$: Main Result

In the preceding sections, we have seen that the multigrid estimator $[Y,Y]^{(\mathrm{avg})}$ is yet another biased estimator of the true integrated volatility $\langle X,X\rangle$. In this section we improve the multigrid estimator by adopting a bias adjustment. To assess the bias, one uses the full grid. As mentioned before, from (22), in the single-grid case (Sec. 2), $E\varepsilon^2$ can be consistently approximated by

\[
\widehat{E\varepsilon^2} = \frac{1}{2n}[Y,Y]^{(\mathrm{all})}_T. \tag{54}
\]

Hence the bias of $[Y,Y]^{(\mathrm{avg})}$ can be consistently estimated by $2\bar{n}\widehat{E\varepsilon^2}$. A bias-adjusted estimator for $\langle X,X\rangle$ thus can be obtained as

\[
\widehat{\langle X,X\rangle}_T = [Y,Y]^{(\mathrm{avg})}_T - \frac{\bar{n}}{n}[Y,Y]^{(\mathrm{all})}_T, \tag{55}
\]

thereby combining the two time scales, (all) and (avg).

To study the asymptotic behavior of $\widehat{\langle X,X\rangle}_T$, first note that

under the conditions of Theorem A.1 in Section A.2,

\[
\begin{aligned}
\Bigl(\frac{K}{\bar{n}}\Bigr)^{1/2}\bigl(\widehat{\langle X,X\rangle}_T - [X,X]^{(\mathrm{avg})}_T\bigr)
&= \Bigl(\frac{K}{\bar{n}}\Bigr)^{1/2}\bigl([Y,Y]^{(\mathrm{avg})}_T - [X,X]^{(\mathrm{avg})}_T - 2\bar{n}E\varepsilon^2\bigr)
- 2(K\bar{n})^{1/2}\bigl(\widehat{E\varepsilon^2} - E\varepsilon^2\bigr) \\
&\xrightarrow{\ \mathcal{L}\ } N\bigl(0,\ 8(E\varepsilon^2)^2\bigr), \tag{56}
\end{aligned}
\]

where the convergence in law is conditional on $X$.

We can now combine this with the results of Section 3.4 to determine the optimal choice of $K$ as $n \to \infty$:

\[
\begin{aligned}
\widehat{\langle X,X\rangle}_T - \langle X,X\rangle_T
&= \bigl(\widehat{\langle X,X\rangle}_T - [X,X]^{(\mathrm{avg})}_T\bigr)
+ \bigl([X,X]^{(\mathrm{avg})}_T - \langle X,X\rangle_T\bigr) \\
&= O_p\Bigl(\frac{\bar{n}^{1/2}}{K^{1/2}}\Bigr) + O_p\bigl(\bar{n}^{-1/2}\bigr). \tag{57}
\end{aligned}
\]

The error is minimized by equating the two terms on the right side of (57), yielding that the optimal sampling step for $[Y,Y]^{(\mathrm{avg})}_T$ is $K = O(n^{2/3})$. The right side of (57) then has order $O_p(n^{-1/6})$.
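A minimal sketch of the two-scales construction (55) (our own code, not from the article; for regular allocation, $\bar{n}$ is computed here as the average number of increments per subgrid):

```python
def two_scales_estimator(prices, K):
    # <X,X>-hat of (55): [Y,Y]^(avg) - (nbar / n) * [Y,Y]^(all).
    n = len(prices) - 1  # number of increments on the full grid
    rv_all = sum((b - a) ** 2 for a, b in zip(prices, prices[1:]))
    grids = [prices[k::K] for k in range(K)]
    rv_avg = sum(
        sum((b - a) ** 2 for a, b in zip(g, g[1:])) for g in grids
    ) / K
    n_bar = sum(len(g) - 1 for g in grids) / K  # average subgrid sample size
    return rv_avg - (n_bar / n) * rv_all
```

With $K = cn^{2/3}$, this is the estimator whose error is of order $O_p(n^{-1/6})$ in (57).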

In particular, if we take

\[
K = cn^{2/3}, \tag{58}
\]

then we find the limit in (57) as follows.

Theorem 4. Suppose that $X$ is an Itô process of form (1), and assume the conditions of Theorem 3 in Section 3.4. Suppose that $Y$ is related to $X$ through model (4), and that (11) is satisfied with $E\varepsilon^2 < \infty$. Under assumption (58),

\[
n^{1/6}\bigl(\widehat{\langle X,X\rangle}_T - \langle X,X\rangle_T\bigr)
\xrightarrow{\ \mathcal{L}\ } N\bigl(0,\ 8c^{-2}(E\varepsilon^2)^2\bigr) + \eta\sqrt{T}\,N(0,c)
= \bigl(8c^{-2}(E\varepsilon^2)^2 + c\eta^2 T\bigr)^{1/2}\, N(0,1), \tag{59}
\]

where the convergence is stable in law (see Sec. 3.4).

Proof. Note that the first normal distribution comes from (56), and that the second comes from Theorem 3 in Section 3.4. The two normal distributions are independent because the convergence of the first term in (57) is conditional on the $X$ process, which is why they can be amalgamated as stated. The requirement that $E\varepsilon^4 < \infty$ (Thm. A.1 in the App.) is not needed, because only a law of large numbers is required for $M^{(1)}_T$ (see the proof of Thm. A.1) when considering the difference in (56). This finishes the proof.

The estimation of the asymptotic spread $s^2 = 8c^{-2}(E\varepsilon^2)^2 + c\eta^2 T$ of $\widehat{\langle X,X\rangle}_T$ is deferred to Section 6. It is seen in Section A.1 that, for $K = cn^{2/3}$, the second-order conditional variance of $\widehat{\langle X,X\rangle}_T$, given the $X$ process, is given by

\[
\operatorname{var}\bigl(\widehat{\langle X,X\rangle}_T \big| X\bigr)
= n^{-1/3}c^{-2}\,8(E\varepsilon^2)^2
+ n^{-2/3}c^{-1}\bigl[8[X,X]^{(\mathrm{avg})}_T E\varepsilon^2 - 2\operatorname{var}(\varepsilon^2)\bigr]
+ o_p\bigl(n^{-2/3}\bigr). \tag{60}
\]

The evidence from our Monte Carlo simulations in Section 7 suggests that this correction may matter in small samples, and that the (random) variance of $\widehat{\langle X,X\rangle}_T$ is best estimated by

\[
\hat{s}^2 + n^{-2/3}c^{-1}\bigl[8[X,X]^{(\mathrm{avg})}_T \widehat{E\varepsilon^2} - 2\widehat{\operatorname{var}}(\varepsilon^2)\bigr]. \tag{61}
\]

For the correction term, $[X,X]^{(\mathrm{avg})}_T$ can be estimated by $\widehat{\langle X,X\rangle}_T$ itself. Meanwhile, by (23), a consistent estimator of $\operatorname{var}(\varepsilon^2)$ is given by

\[
\widehat{\operatorname{var}}(\varepsilon^2) = \frac{1}{2n}\sum_i (\Delta Y_{t_i})^4 - 4\bigl(\widehat{E\varepsilon^2}\bigr)^2.
\]
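The two noise-moment estimators — (54) for $E\varepsilon^2$ and the display above for $\operatorname{var}(\varepsilon^2)$ — can be sketched as follows (our own code; `prices` are the observed log-prices $Y$ on the full grid):

```python
def noise_moments(prices):
    # E-hat eps^2 = [Y,Y]^(all) / (2n), equation (54), and
    # var-hat(eps^2) = (1 / (2n)) * sum (dY)^4  -  4 * (E-hat eps^2)^2.
    diffs = [b - a for a, b in zip(prices, prices[1:])]
    n = len(diffs)
    e2 = sum(d ** 2 for d in diffs) / (2.0 * n)
    v2 = sum(d ** 4 for d in diffs) / (2.0 * n) - 4.0 * e2 ** 2
    return e2, v2
```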


4.2 Properties of $\widehat{\langle X,X\rangle}_T$: Optimal Sampling and Bias Adjustment

To further pin down the optimal sampling frequency $K$, one can minimize the expected asymptotic variance in (59) to obtain

\[
c^* = \Bigl(\frac{16(E\varepsilon^2)^2}{T\,E\eta^2}\Bigr)^{1/3}, \tag{62}
\]

which can be consistently estimated from data in past time periods (before time $t_0 = 0$), using $\widehat{E\varepsilon^2}$ and an estimator of $\eta^2$ (cf. Sec. 6). As mentioned in Section 3.4, $\eta^2$ can be taken to be independent of $K$ as long as one allocates sampling points to grids regularly, as defined in Section 3.2. Hence one can choose $c$, and so also $K$, based on past data.

Example 1. If $\sigma_t^2$ is constant, and for equidistant sampling and regular allocation to grids, $\eta^2 = \frac{4}{3}\sigma^4 T$; then the asymptotic variance in (59) is

\[
8c^{-2}(E\varepsilon^2)^2 + c\eta^2 T = 8c^{-2}(E\varepsilon^2)^2 + \tfrac{4}{3}c\sigma^4 T^2,
\]

and the optimal choice of $c$ becomes

\[
c_{\mathrm{opt}} = \Bigl(\frac{12(E\varepsilon^2)^2}{T^2\sigma^4}\Bigr)^{1/3}. \tag{63}
\]

In this case, the asymptotic variance in (59) is

\[
2\bigl(12(E\varepsilon^2)^2\bigr)^{1/3}\bigl(\sigma^2 T\bigr)^{4/3}.
\]

One can also, of course, estimate $c$ to minimize the actual asymptotic variance in (59) from data in the current time period ($0 \le t \le T$). It is beyond the scope of this article to consider whether such a device for selecting the frequency has any impact on our asymptotic results.
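The tuning constants (62) and (63) are easy to compute; a small numerical sketch (our own code, with the inputs assumed pre-estimated):

```python
def c_star(noise_var, T, eta2_mean):
    # Equation (62): c* = (16 (E eps^2)^2 / (T E eta^2))^(1/3),
    # the minimizer of 8 c^-2 (E eps^2)^2 + c eta^2 T over c.
    return (16.0 * noise_var ** 2 / (T * eta2_mean)) ** (1.0 / 3.0)

def c_opt_constant_vol(noise_var, T, sigma2):
    # Equation (63): the constant-volatility special case, eta^2 = (4/3) sigma^4 T.
    return (12.0 * noise_var ** 2 / (T ** 2 * sigma2 ** 2)) ** (1.0 / 3.0)
```

Plugging $\eta^2 = \frac{4}{3}\sigma^4 T$ into (62) reproduces (63), which serves as a consistency check on the two formulas.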

In addition to large-sample arguments, one can study $\widehat{\langle X,X\rangle}_T$ from a "smallish" sample standpoint. We argue in what follows that one can apply a bias-type adjustment to get

\[
\widehat{\langle X,X\rangle}^{(\mathrm{adj})}_T = \Bigl(1 - \frac{\bar{n}}{n}\Bigr)^{-1} \widehat{\langle X,X\rangle}_T. \tag{64}
\]

The difference from the estimator in (55) is of order $O_p(\bar{n}/n) = O_p(K^{-1})$, and thus the two estimators behave the same to the asymptotic order that we consider. The estimator (64), however, has the appeal of being, in a certain way, "unbiased," as follows. For arbitrary $(a,b)$, consider all estimators of the form

\[
\widehat{\langle X,X\rangle}^{(\mathrm{adj})}_T = a[Y,Y]^{(\mathrm{avg})}_T - b\frac{\bar{n}}{n}[Y,Y]^{(\mathrm{all})}_T.
\]

Then, from (18) and (35),

\[
\begin{aligned}
E\bigl(\widehat{\langle X,X\rangle}^{(\mathrm{adj})}_T \big| X \text{ process}\bigr)
&= a\bigl([X,X]^{(\mathrm{avg})}_T + 2\bar{n}E\varepsilon^2\bigr)
- b\frac{\bar{n}}{n}\bigl([X,X]^{(\mathrm{all})}_T + 2nE\varepsilon^2\bigr) \\
&= a[X,X]^{(\mathrm{avg})}_T - b\frac{\bar{n}}{n}[X,X]^{(\mathrm{all})}_T + 2(a-b)\bar{n}E\varepsilon^2.
\end{aligned}
\]

It is natural to choose $a = b$ to completely remove the effect of $E\varepsilon^2$. Also, following Section 3.4, both $[X,X]^{(\mathrm{avg})}_T$ and $[X,X]^{(\mathrm{all})}_T$ are asymptotically unbiased estimators of $\langle X,X\rangle_T$. Hence one can argue that one should take $a(1 - \bar{n}/n) = 1$, yielding (64).

Similarly, an adjusted estimator of $E\varepsilon^2$ is given by

\[
\widehat{E\varepsilon^2}^{(\mathrm{adj})} = \tfrac{1}{2}(n - \bar{n})^{-1}\bigl([Y,Y]^{(\mathrm{all})}_T - [Y,Y]^{(\mathrm{avg})}_T\bigr), \tag{65}
\]

which satisfies $E\bigl(\widehat{E\varepsilon^2}^{(\mathrm{adj})} \big| X \text{ process}\bigr) = E\varepsilon^2 + \tfrac{1}{2}(n - \bar{n})^{-1}\bigl([X,X]^{(\mathrm{all})}_T - [X,X]^{(\mathrm{avg})}_T\bigr)$, and thus is unbiased to high order. As for the asymptotic distribution, one can see from Theorem A.1 in the Appendix that

\[
\begin{aligned}
\widehat{E\varepsilon^2}^{(\mathrm{adj})} - E\varepsilon^2
&= \bigl(\widehat{E\varepsilon^2} - E\varepsilon^2\bigr)\bigl(1 + O(K^{-1})\bigr) + O_p\bigl(Kn^{-3/2}\bigr) \\
&= \widehat{E\varepsilon^2} - E\varepsilon^2 + O_p\bigl(n^{-1/2}K^{-1}\bigr) + O_p\bigl(Kn^{-3/2}\bigr) \\
&= \widehat{E\varepsilon^2} - E\varepsilon^2 + O_p\bigl(n^{-5/6}\bigr),
\end{aligned}
\]

from (58). It follows that $n^{1/2}\bigl(\widehat{E\varepsilon^2} - E\varepsilon^2\bigr)$ and $n^{1/2}\bigl(\widehat{E\varepsilon^2}^{(\mathrm{adj})} - E\varepsilon^2\bigr)$ have the same asymptotic distribution.
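The small-sample adjustments (64) and (65) amount to the following (our own sketch, taking the raw quantities $[Y,Y]^{(\mathrm{avg})}$, $[Y,Y]^{(\mathrm{all})}$, $n$, and $\bar{n}$ as inputs):

```python
def adjusted_estimators(rv_avg, rv_all, n, n_bar):
    # (64): <X,X>-hat^(adj) = (1 - nbar/n)^(-1) * ( rv_avg - (nbar/n) * rv_all ).
    qv_adj = (rv_avg - (n_bar / n) * rv_all) / (1.0 - n_bar / n)
    # (65): E-hat eps^2 (adj) = ( rv_all - rv_avg ) / ( 2 * (n - nbar) ).
    noise_adj = (rv_all - rv_avg) / (2.0 * (n - n_bar))
    return qv_adj, noise_adj
```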

5. EXTENSION TO MULTIPLE-PERIOD INFERENCE

For a given family $\mathcal{A} = \{\mathcal{G}^{(k)}, k = 1, \ldots, K\}$, we denote

\[
\widehat{\langle X,X\rangle}_t = [Y,Y]^{(\mathrm{avg})}_t - \frac{\bar{n}}{n}[Y,Y]^{(\mathrm{all})}_t, \tag{66}
\]

where, as usual, $[Y,Y]^{(\mathrm{all})}_t = \sum_{t_{i+1} \le t} (\Delta Y_{t_i})^2$ and $[Y,Y]^{(\mathrm{avg})}_t = \frac{1}{K}\sum_{k=1}^{K} [Y,Y]^{(k)}_t$, with

\[
[Y,Y]^{(k)}_t = \sum_{t_i, t_{i,+} \in \mathcal{G}^{(k)},\ t_{i,+} \le t} \bigl(Y_{t_{i,+}} - Y_{t_i}\bigr)^2.
\]

To estimate $\langle X,X\rangle$ for several discrete time periods, say $[0,T_1], [T_1,T_2], \ldots, [T_{M-1},T_M]$, where $M$ is fixed, this amounts to estimating $\langle X,X\rangle_{T_m} - \langle X,X\rangle_{T_{m-1}} = \int_{T_{m-1}}^{T_m} \sigma_u^2\,du$ for $m = 1, \ldots, M$, and the obvious estimator is $\widehat{\langle X,X\rangle}_{T_m} - \widehat{\langle X,X\rangle}_{T_{m-1}}$.

To carry out the asymptotics, let $n_m$ be the number of points in the $m$th time segment, and similarly let $K_m = c_m n_m^{2/3}$, where $c_m$ is a constant. Then $\{n_m^{1/6}(\widehat{\langle X,X\rangle}_{T_m} - \widehat{\langle X,X\rangle}_{T_{m-1}} - \int_{T_{m-1}}^{T_m} \sigma_u^2\,du)\}$, $m = 1, \ldots, M$, converge stably to $\bigl(8c_m^{-2}(E\varepsilon^2)^2 + c_m\eta_m^2(T_m - T_{m-1})\bigr)^{1/2} Z_m$, where the $Z_m$ are iid standard normals, independent of the underlying process, and $\eta_m^2$ is the limit $\eta^2$ (Thm. 3) for time period $m$. In the case of equidistant $t_i$ and regular allocation of sample points to grids, $\eta_m^2 = \frac{4}{3}\int_{T_{m-1}}^{T_m} \sigma_u^4\,du$.

In other words, the one-period asymptotics generalize straightforwardly to the multiperiod case. This is because $\widehat{\langle X,X\rangle}_{T_m} - \widehat{\langle X,X\rangle}_{T_{m-1}} - \int_{T_{m-1}}^{T_m} \sigma_u^2\,du$ has, to first order, a martingale structure. This can be seen in the Appendix.

An advantage of our proposed estimator is that if $\varepsilon_{t_i}$ has different variance in different time segments, say $\operatorname{var}(\varepsilon_{t_i}) = (E\varepsilon^2)_m$ for $t_i \in (T_{m-1}, T_m]$, then both consistency and asymptotic (mixed) normality continue to hold, provided that one replaces $E\varepsilon^2$ by $(E\varepsilon^2)_m$. This adds a measure of robustness to the procedure. If one were convinced that $E\varepsilon^2$ were the same across time segments, then an alternative estimator would have the form

\[
\widehat{\langle X,X\rangle}_t = [Y,Y]^{(\mathrm{avg})}_t
- \Bigl(\frac{1}{K}\#\{i : t_{i+1} \le t\} - 1\Bigr)\frac{1}{n}[Y,Y]^{(\mathrm{all})}_T,
\qquad \text{for } t = T_1, \ldots, T_M. \tag{67}
\]

However, the errors $\widehat{\langle X,X\rangle}_{T_m} - \widehat{\langle X,X\rangle}_{T_{m-1}} - \int_{T_{m-1}}^{T_m} \sigma_u^2\,du$ in this case are not asymptotically independent. Note that for $T = T_m$, both candidates (66) and (67) for $\widehat{\langle X,X\rangle}_t$ coincide with the quantity in (55).

6 ESTIMATING THE ASYMPTOTICVARIANCE OF 〈X X 〉T

In the one-period case the main goal is to estimate the as-ymptotic variance s2 = 8cminus2(Eε2)2 + cη2T [cf (58) and (59)]The multigrid case is a straightforward generalization as indi-cated in Section 5

Here we are concerned only with the case where the points $t_i$ are equally spaced ($\Delta t_i = \Delta t$) and are regularly allocated to the grids $\mathcal{A}_1 = \{\mathcal{G}^{(k)}, k = 1,\dots,K_1\}$. A richer set of ingredients is required to find the spread than just to estimate $\langle X,X\rangle_T$. To implement the estimator, we create an additional family $\mathcal{A}_2 = \{\mathcal{G}^{(k,i)}, k = 1,\dots,K_1,\ i = 1,\dots,I\}$ of grids, where $\mathcal{G}^{(k,i)}$ contains every $I$th point of $\mathcal{G}^{(k)}$, starting with the $i$th point. We assume that $K_1 \sim c_1 n^{2/3}$. The new family then consists of $K_2 \sim c_2 n^{2/3}$ grids, where $c_2 = c_1 I$.

In addition, we need to divide the time line into segments $(T_{m-1}, T_m]$, where $T_m = \frac{m}{M}T$. For the purposes of this discussion, $M$ is large but finite. We now get an initial estimator of spread as

$$\hat s_0^2 = n^{1/3}\sum_{m=1}^{M}\Bigl(\bigl(\widehat{\langle X,X\rangle}^{K_1}_{T_m} - \widehat{\langle X,X\rangle}^{K_1}_{T_{m-1}}\bigr) - \bigl(\widehat{\langle X,X\rangle}^{K_2}_{T_m} - \widehat{\langle X,X\rangle}^{K_2}_{T_{m-1}}\bigr)\Bigr)^2,$$
where $\widehat{\langle X,X\rangle}^{K_i}_t$ is the estimator (66) using the grid family $i$, $i = 1,2$. Using the discussion in Section 5, we can show that
$$\hat s_0^2 \approx s_0^2, \qquad (68)$$

where, for $c_1 \ne c_2$ ($I \ne 1$),
$$s_0^2 = 8(E\epsilon^2)^2\bigl(c_1^{-2} + c_2^{-2} - c_1^{-1}c_2^{-1}\bigr) + \bigl(c_1^{1/2} - c_2^{1/2}\bigr)^2 T\eta^2$$
$$= 8(E\epsilon^2)^2 c_1^{-2}\bigl(1 + I^{-2} - I^{-1}\bigr) + c_1\bigl(I^{1/2} - 1\bigr)^2 T\eta^2. \qquad (69)$$

In (68), the symbol "$\approx$" denotes first convergence in law as $n \to \infty$, and then a limit in probability as $M \to \infty$. Because $E\epsilon^2$ can be estimated by $\widehat{E\epsilon^2} = [Y,Y]^{(\mathrm{all})}_T/2n$, we can put hats on $s_0^2$, $(E\epsilon^2)^2$, and $\eta^2$ in (69) to obtain an estimator of $\eta^2$. Similarly,

$$\hat s^2 = 8(E\epsilon^2)^2\biggl(c^{-2} - c\,\frac{c_1^{-2} + c_2^{-2} - c_1^{-1}c_2^{-1}}{\bigl(c_1^{1/2} - c_2^{1/2}\bigr)^2}\biggr) + \frac{c}{\bigl(c_1^{1/2} - c_2^{1/2}\bigr)^2}\,\hat s_0^2$$
$$= 8\biggl(c^{-2} - c\,c_1^{-3}\,\frac{I^{-2} - I^{-1} + 1}{(I^{1/2} - 1)^2}\biggr)(E\epsilon^2)^2 + \frac{c}{c_1}\,\frac{1}{(I^{1/2} - 1)^2}\,\hat s_0^2, \qquad (70)$$

where $c \sim Kn^{-2/3}$, where $K$ is the number of grids used originally to estimate $\langle X,X\rangle_T$.

Table 1. Coefficients of $(E\epsilon^2)^2$ and $\hat s_0^2$ When $c_1 = c$

  $I$    coeff($\hat s_0^2$)    coeff($(E\epsilon^2)^2$)
  3      1.866                  $-3.611c^{-2}$
  4      1.000                  $1.5000c^{-2}$

Normally, one would take $c_1 = c$. Hence an estimator $\hat s^2$ can be found from $\hat s_0^2$ and $\widehat{E\epsilon^2}$. When $c_1 = c$, we argue that the optimal choice is $I = 3$ or $4$, as follows. The coefficients in (70) become
$$\mathrm{coeff}(\hat s_0^2) = \bigl(I^{1/2} - 1\bigr)^{-2}$$
and
$$\mathrm{coeff}\bigl((E\epsilon^2)^2\bigr) = 8c^{-2}\bigl(I^{1/2} - 1\bigr)^{-2} f(I),$$
where $f(I) = I - 2I^{1/2} - I^{-2} + I^{-1}$. For $I \ge 2$, $f(I)$ is increasing, and $f(I)$ crosses 0 for $I$ between 3 and 4. These, therefore, are the two integer values of $I$ that give the lowest ratio of $\mathrm{coeff}((E\epsilon^2)^2)$ to $\mathrm{coeff}(\hat s_0^2)$. Using $I = 3$ or $4$, therefore, would maximally insulate against $(E\epsilon^2)^2$ dominating over $\hat s_0^2$. This is desirable, because $\hat s_0^2$ is the statistic carrying the information about $\eta^2$. Numerical values for the coefficients are given in Table 1. If $c$ were such that $(E\epsilon^2)^2$ still overwhelms $\hat s_0^2$, then a choice of $c_1 \ne c$ should be considered.
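The coefficient calculations behind Table 1 are easy to check numerically. The sketch below simply re-evaluates the closed-form expressions from (70) with $c_1 = c$ (and $c = 1$, so the $c^{-2}$ factor in the table is 1); the function names are ours.

```python
import numpy as np

def f(I):
    # f(I) = I - 2*I^(1/2) - I^(-2) + I^(-1), from the coefficients in (70)
    return I - 2.0 * np.sqrt(I) - I**-2.0 + I**-1.0

def coeff_s0(I):
    # multiplier of s_0^2 in (70) when c1 = c
    return (np.sqrt(I) - 1.0) ** -2

def coeff_eps(I, c=1.0):
    # multiplier of (E eps^2)^2 in (70) when c1 = c
    return 8.0 * c**-2 * (np.sqrt(I) - 1.0) ** -2 * f(I)
```

Evaluating at $I = 3$ and $I = 4$ reproduces the entries of Table 1, and $f$ indeed changes sign between those two integers.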

7. MONTE CARLO EVIDENCE

In this article, we have discussed five approaches to dealing with the impact of market microstructure noise on realized volatility. In this section, we examine the performance of each approach in simulations and compare the results with those predicted by the aforementioned asymptotic theory.

7.1 Simulations Design

We use the stochastic volatility model of Heston (1993) as our data-generating process:
$$dX_t = (\mu - v_t/2)\,dt + \sigma_t\,dB_t \qquad (71)$$
and
$$dv_t = \kappa(\alpha - v_t)\,dt + \gamma v_t^{1/2}\,dW_t, \qquad (72)$$
where $v_t = \sigma_t^2$. We assume that the parameters $(\mu, \kappa, \alpha, \gamma)$ and $\rho$, the correlation coefficient between the two Brownian motions $B$ and $W$, are constant in this model. We also assume Feller's condition $2\kappa\alpha \ge \gamma^2$, to make the zero boundary unattainable by the volatility process.

We simulate $M = 25{,}000$ sample paths of the process using the Euler scheme at a time interval $\Delta t = 1$ second, and set the parameter values at levels reasonable for a stock: $\mu = .05$, $\kappa = 5$, $\alpha = .04$, $\gamma = .5$, and $\rho = -.5$. As for the market microstructure noise $\epsilon$, we assume that it is Gaussian and small. Specifically, we set $(E\epsilon^2)^{1/2} = .0005$ (i.e., the standard deviation of the noise is .05% of the value of the asset price). We purposefully set this value to be smaller than the values calibrated using the empirical market microstructure literature.
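The simulation design just described can be sketched as follows. This is an illustrative Euler discretization, not the authors' code; truncating the variance at zero after each step is one common way to keep the discretized $v_t$ nonnegative, and all names and defaults are ours.

```python
import numpy as np

def simulate_heston_with_noise(T=1.0 / 252, n=23400, mu=0.05, kappa=5.0,
                               alpha=0.04, gamma=0.5, rho=-0.5,
                               noise_sd=0.0005, seed=0):
    """Euler scheme for (71)-(72) on n steps of size dt = T/n, with iid
    Gaussian microstructure noise added to the observed log-prices.
    Returns (y, iv): the noisy observations and the integrated variance
    of the latent process X."""
    rng = np.random.default_rng(seed)
    dt = T / n
    x, v = 0.0, alpha                  # start the variance at its long-run mean
    xs = np.empty(n + 1)
    xs[0] = x
    iv = 0.0
    for i in range(n):
        db = rng.standard_normal()
        dw = rho * db + np.sqrt(1.0 - rho**2) * rng.standard_normal()
        iv += v * dt                   # accumulate integrated variance (pre-step v)
        x += (mu - v / 2.0) * dt + np.sqrt(v * dt) * db
        v = max(v + kappa * (alpha - v) * dt + gamma * np.sqrt(v * dt) * dw, 0.0)
        xs[i + 1] = x
    y = xs + noise_sd * rng.standard_normal(n + 1)
    return y, iv
```

With 6.5 trading hours sampled once per second, one day corresponds to $n = 23{,}400$ observations.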

On each simulated sample path, we estimate $\langle X,X\rangle_T$ over $T = 1$ day (i.e., $T = 1/252$, because the parameter values are all annualized) using the five estimation strategies described earlier: $[Y,Y]^{(\mathrm{all})}_T$, $[Y,Y]^{(\mathrm{sparse})}_T$, $[Y,Y]^{(\mathrm{sparse,opt})}_T$, $[Y,Y]^{(\mathrm{avg})}_T$, and,

Zhang, Mykland, and Aït-Sahalia: Determining Integrated Volatility 1405

finally, $\widehat{\langle X,X\rangle}^{(\mathrm{adj})}_T$. We assume that a day consists of 6.5 hours of open trading, as is the case on the NYSE and NASDAQ. For the fourth-best estimator, we represent this approach using sparse sampling at a frequency of once every 5 minutes.
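For the sparse strategies, subsampling is just a stride over the observation record. A minimal sketch, in our notation:

```python
import numpy as np

def realized_volatility(y):
    """[Y,Y]_T: sum of squared increments of the observed log-prices."""
    return float(np.sum(np.diff(y) ** 2))

def realized_volatility_sparse(y, step):
    """Sparse version: keep every `step`-th observation. With 1-second data,
    step = 300 corresponds to the 5-minute sampling used for the
    fourth-best estimator."""
    return realized_volatility(y[::step])
```

For a 6.5-hour day at one observation per second, $n = 23{,}400$, and a stride of 300 leaves 78 five-minute returns.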

7.2 Simulations Results

Table 2 reports the simulation results for the five estimation strategies. For each estimator, the bias is calculated as $E[\text{estimator} - \langle X,X\rangle_T]$. The variance row reports the quantity
$$E\biggl[\mathrm{var}\biggl(\text{estimator} - \langle X,X\rangle_T \biggm| \int_0^T\sigma_t^2\,dt,\ \int_0^T\sigma_t^4\,dt\biggr)\biggr]. \qquad (73)$$
The rows marked "relative" report the corresponding results in percentage terms, that is, for
$$\frac{\text{estimator} - \langle X,X\rangle_T}{\langle X,X\rangle_T}.$$
Note that although the convergence of the discretization error is stable, as opposed to conditional, (73) seems like a reasonable stand-in for the theoretical quantity under consideration.

In each case, "small sample" represents the average value over the $M$ simulated paths, and "asymptotic" refers to the value predicted by our theory. The noise parts of the asymptotic variances are taken to include the relevant second-order terms, as given by (20), (37), and (60), yielding spreads like (in the latter case) (61). The small-sample variance is computed by binning the results of the simulations according to the values of the pair $(\int_0^T\sigma_t^2\,dt, \int_0^T\sigma_t^4\,dt)$, calculating the variance within each bin, and then averaging across the bins, weighted by the number of elements in each bin.
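The binning procedure just described can be sketched as follows. For simplicity this illustration conditions on a one-dimensional statistic, whereas the article bins on the pair $(\int\sigma_t^2\,dt, \int\sigma_t^4\,dt)$; the quantile-based bin edges and all names are our choices.

```python
import numpy as np

def binned_conditional_variance(errors, stat, nbins=10):
    """Average within-bin variance of `errors`, weighted by bin counts.

    errors : estimator minus <X,X>_T, one value per simulated path
    stat   : conditioning statistic for each path (1-D here for simplicity)
    """
    edges = np.quantile(stat, np.linspace(0.0, 1.0, nbins + 1))
    idx = np.clip(np.searchsorted(edges, stat, side="right") - 1, 0, nbins - 1)
    total, count = 0.0, 0
    for b in range(nbins):
        sel = errors[idx == b]
        if len(sel) > 1:
            total += sel.var() * len(sel)   # within-bin variance, count-weighted
            count += len(sel)
    return total / count
```

Because the conditioning statistic explains part of the dispersion, the binned (conditional) variance is smaller than the unconditional variance of the errors.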

Comparing the rows showing small-sample results to their asymptotic predictions, we see that the asymptotic theory provides a good approximation to the first two moments of the small-sample distribution of the five estimators. Comparing across columns, we see that the differences between the five estimators are quite large. As expected, the naive fifth-best strategy produces disastrous results. The first-best strategy results in a large decrease of root MSE (RMSE) relative to the others, including, in particular, the fourth-best strategy, which is the approach currently used in most of the empirical literature. But, at least for the parameter values used here, determining the optimal sampling frequency (the sparse optimal, or third-best, strategy) results in a modest improvement over the arbitrary selection of a sampling frequency (the sparse, or fourth-best, strategy). Using all of the data, as in the subsampled and averaged estimator (second-best), but especially the bias correction by combining two time scales, as in our first-best approach, is where the major practical improvements reside.

It may seem surprising that the variance of the estimator is reduced so much from the second-best to the first-best estimator. After all, if one sets $c = Kn^{-2/3}$, then the asymptotic normal term in (7) becomes
$$\frac{1}{n^{1/6}}\Biggl[\underbrace{\underbrace{\frac{4}{c^2}\bigl(E\epsilon^4\bigr)}_{\text{due to noise}} + \underbrace{c\,\frac{4T}{3}\int_0^T\sigma_t^4\,dt}_{\text{due to discretization}}}_{\text{total variance}}\Biggr]^{1/2} Z_{\text{total}}.$$

This is quite similar to the error term in (9). However, the $c$ for the second-best estimator is chosen by (8), whereas for the first-best case it is given by (10). The two optima are very different, because for the second-best estimator it is chosen by a bias–variance trade-off, whereas for the first-best estimator it is chosen to minimize the asymptotic variance. From this standpoint, it is quite reasonable for the first-best estimator to also have substantially better asymptotic variance. In fact, from (8) and (10), it is easy to see that, for the optimal choices,
$$c^*_{\text{second best}} = n^{1/3}\,2^{-1/3}\biggl(\frac{E\epsilon^4}{(E\epsilon^2)^2}\biggr)^{1/3} c^*_{\text{first best}},$$
whence the total variance is in fact quite different for the two estimators.

Figure 1 shows the small-sample and asymptotic RMSE of the $[Y,Y]^{(\mathrm{sparse})}_T$ estimator as a function of the subsampling interval, illustrating the minimization problem studied in Section 2.3. Finally, Figure 2 shows the standardized distribution of the first-best estimator obtained from the simulations (histogram) and the corresponding asymptotic distribution predicted by our theory (solid line). Due to the effect of bias, the distributions of the fifth- to second-best estimators are irrelevant, because they cannot be used to set intervals for the volatility.

8. DISCUSSION

In this work, we have quantified and corrected the effect of noise on the nonparametric assessment of integrated volatility. In the setting of high-frequency data, the usual financial practice is to use sparse sampling; in other words, throwing

Table 2. Monte Carlo Simulations for the Five Estimation Strategies

                                  Fifth-best        Fourth-best          Third-best              Second-best         First-best
                                  $[Y,Y]^{(all)}_T$ $[Y,Y]^{(sparse)}_T$ $[Y,Y]^{(sparse,opt)}_T$ $[Y,Y]^{(avg)}_T$  $\widehat{\langle X,X\rangle}^{(adj)}_T$
Small-sample bias                 1.1699 × 10⁻²     3.89 × 10⁻⁵          2.18 × 10⁻⁵             1.926 × 10⁻⁵        2 × 10⁻⁸
Asymptotic bias                   1.1700 × 10⁻²     3.90 × 10⁻⁵          2.20 × 10⁻⁵             1.927 × 10⁻⁵        0
Small-sample variance             1.791 × 10⁻⁸      1.4414 × 10⁻⁹        1.59 × 10⁻⁹             9.41 × 10⁻¹⁰        9 × 10⁻¹¹
Asymptotic variance               1.788 × 10⁻⁸      1.4409 × 10⁻⁹        1.58 × 10⁻⁹             9.37 × 10⁻¹⁰        8 × 10⁻¹¹
Small-sample RMSE                 1.1699 × 10⁻²     5.437 × 10⁻⁵         4.543 × 10⁻⁵            3.622 × 10⁻⁵        9.4 × 10⁻⁶
Asymptotic RMSE                   1.1700 × 10⁻²     5.442 × 10⁻⁵         4.546 × 10⁻⁵            3.618 × 10⁻⁵        8.9 × 10⁻⁶
Small-sample relative bias        182               6.1                  1.8                     1.5                 −.0045
Small-sample relative variance    82,502            115                  11                      5.3                 .43
Small-sample relative RMSE        340               12.4                 3.7                     2.8                 .65


Figure 1. RMSE of the Sparse Estimator as a Function of the Subsampling Frequency (small-sample RMSE; asymptotic RMSE).

away most of the available data. We have argued that this is caused by not incorporating the noise in the model. Although it is statistically unsound to throw away data, we have shown that it is possible to build on this practice to construct estimators that make statistical sense.

Specifically, we have found that the usual realized volatility mainly estimates the magnitude of the noise term, rather than anything to do with volatility. An approach built on separating the observations into multiple "grids" lessens this problem. We found that the best results can be obtained by combining the usual ("single-grid") realized volatility with the multiple-grid-based device. This gives an estimator that is approximately unbiased; we have also shown how to assess the (random) variance of this estimator. We also show that, for our most recommended procedure, the optimal number of multiple grids is of order $O(n^{2/3})$ (see Sec. 4). Most of the development is in the context of finding the integrated volatility over one time period; at the end, we extend this to multiple periods. Also, in the case where the noise can be taken to be almost negligible, we provide a way of optimizing the sampling frequency if one wishes to use the classical "realized volatility" or its multigrid extension.

One important message of the article is that any time one has an impulse to sample sparsely, one can always do better with a multigrid method, regardless of the model or the quantity being estimated.

Figure 2. Asymptotic and Small-Sample Standardized Distributions of the First-Best Estimator.


Our results in this article cover the case where the latent $X$ follows the Itô process model (1). Theorem 1 remains true under the assumption that $X$ is a general semimartingale, that is, a process $X$ that can also have jumps. This follows because the proof of Lemma A.2 remains true for a general semimartingale $X$ (Jacod and Shiryaev 2003, thm. I.4.47, p. 52). Also, apart from using Lemma A.2, Theorem A.1 is unrelated to the nature of the $X$ process. As far as the effect of jumps on the discretization effect is concerned, it should be noted that consistency holds, as follows. When $X$ is a general semimartingale, then for any sequence of grids $\mathcal{H}_n$ so that (41) holds, $[X,X]^{\mathcal{H}_n}$ converges to the continuous-time quadratic variation of $X$, that is,
$$\int_0^T\sigma_t^2\,dt + \sum_{0<t\le T}(\text{jump of } X \text{ at } t)^2 \qquad (74)$$
(cf. Jacod and Shiryaev 2003, thm. I.4.47, p. 52). In particular, this also applies to subgrids, under the regularity conditions that we have discussed. Furthermore, by extension of the same proof, $[X,X]^{(\mathrm{avg})}$ also converges to (74). From the same argument, so does $\widehat{\langle X,X\rangle}_T$. In this sense, the two-scales estimator is robust to jumps.

APPENDIX: PROOFS OF RESULTS

When the total grid $\mathcal{G}$ is considered, we use $\sum_{i=1}^{n-1}$, $\sum_{t_{i+1}\le T}$, and $\sum_{t_i\in\mathcal{G}}$ interchangeably in the following proofs. Also, we write "$|X$" to indicate expressions that are conditional on the entire $X$ process.

A.1 Variance of $[Y,Y]_T$ Given the $X$ Process

Here we calculate explicitly the variance in (19), from which the stated approximation follows. The explicit remainder term is also used for (36). Let a partition of $[0,T]$ be $0 = t_0 \le t_1 \le \cdots \le t_n = T$. Under assumption (11),
$$\mathrm{var}\bigl([Y,Y]^{(\mathrm{all})}_T \bigm| X\bigr) = \mathrm{var}\Biggl[\sum_{t_{i+1}\le T}\bigl(\Delta Y_{t_i}\bigr)^2 \Biggm| X\Biggr]$$
$$= \underbrace{\sum_{t_{i+1}\le T}\mathrm{var}\bigl[\bigl(\Delta Y_{t_i}\bigr)^2 \bigm| X\bigr]}_{I_T} + 2\underbrace{\sum_{t_{i+1}\le T}\mathrm{cov}\bigl[\bigl(\Delta Y_{t_{i-1}}\bigr)^2, \bigl(\Delta Y_{t_i}\bigr)^2 \bigm| X\bigr]}_{II_T},$$
because $\Delta Y_{t_i} = \Delta X_{t_i} + \Delta\epsilon_{t_i}$ is 1-dependent given the $X$ process. Now,
$$\mathrm{var}\bigl[\bigl(\Delta Y_{t_i}\bigr)^2 \bigm| X\bigr] = \kappa_4\bigl(\Delta Y_{t_i} \bigm| X\bigr) + 2\bigl[\mathrm{var}\bigl(\Delta Y_{t_i} \bigm| X\bigr)\bigr]^2 + 4\bigl[E\bigl(\Delta Y_{t_i} \bigm| X\bigr)\bigr]^2\mathrm{var}\bigl(\Delta Y_{t_i} \bigm| X\bigr) + 4E\bigl(\Delta Y_{t_i} \bigm| X\bigr)\kappa_3\bigl(\Delta Y_{t_i} \bigm| X\bigr)$$
$$= \kappa_4\bigl(\Delta\epsilon_{t_i}\bigr) + 2\bigl[\mathrm{var}\bigl(\Delta\epsilon_{t_i}\bigr)\bigr]^2 + 4\bigl(\Delta X_{t_i}\bigr)^2\mathrm{var}\bigl(\Delta\epsilon_{t_i}\bigr) + 4\bigl(\Delta X_{t_i}\bigr)\kappa_3\bigl(\Delta\epsilon_{t_i}\bigr) \qquad\text{(under Assumption (11))}$$
$$= 2\kappa_4(\epsilon) + 8\bigl(E\epsilon^2\bigr)^2 + 8\bigl(\Delta X_{t_i}\bigr)^2 E\epsilon^2,$$
because $\kappa_3(\Delta\epsilon_{t_i}) = 0$. The $\kappa$'s are the cumulants of the relevant order:
$$\kappa_1(\epsilon) = 0, \qquad \kappa_2(\epsilon) = \mathrm{var}(\epsilon) = E\bigl(\epsilon^2\bigr), \qquad \kappa_3(\epsilon) = E\epsilon^3, \qquad\text{and}\qquad \kappa_4(\epsilon) = E\bigl(\epsilon^4\bigr) - 3\bigl(E\epsilon^2\bigr)^2. \qquad (A.1)$$
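As a quick sanity check on the display above (our example, not the article's): for Gaussian noise, $\kappa_3(\epsilon) = \kappa_4(\epsilon) = 0$, so with $\Delta X_{t_i} = 0$ the formula predicts $\mathrm{var}((\Delta\epsilon_{t_i})^2) = 2\kappa_4(\epsilon) + 8(E\epsilon^2)^2 = 8(E\epsilon^2)^2$, which is easy to verify by simulation.

```python
import numpy as np

def sample_var_sq_increment(eps):
    """Sample variance of the squared noise increments (eps_{t_{i+1}} - eps_{t_i})^2."""
    d2 = np.diff(eps) ** 2
    return float(d2.var())
```

For $\epsilon_{t_i}$ iid $N(0,1)$, $E\epsilon^2 = 1$ and the predicted value is 8; the sample version converges to this even though the increments are 1-dependent.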

So $I_T = n\bigl(2\kappa_4(\epsilon) + 8(E\epsilon^2)^2\bigr) + 8[X,X]^{(\mathrm{all})}_T E\epsilon^2$. Similarly, for the covariance,
$$\mathrm{cov}\bigl[\bigl(\Delta Y_{t_{i-1}}\bigr)^2, \bigl(\Delta Y_{t_i}\bigr)^2 \bigm| X\bigr]$$
$$= \mathrm{cov}\bigl[\bigl(\Delta\epsilon_{t_{i-1}}\bigr)^2, \bigl(\Delta\epsilon_{t_i}\bigr)^2\bigr] + 4\bigl(\Delta X_{t_{i-1}}\bigr)\bigl(\Delta X_{t_i}\bigr)\mathrm{cov}\bigl(\Delta\epsilon_{t_{i-1}}, \Delta\epsilon_{t_i}\bigr) + 2\bigl(\Delta X_{t_{i-1}}\bigr)\mathrm{cov}\bigl[\Delta\epsilon_{t_{i-1}}, \bigl(\Delta\epsilon_{t_i}\bigr)^2\bigr] + 2\bigl(\Delta X_{t_i}\bigr)\mathrm{cov}\bigl[\bigl(\Delta\epsilon_{t_{i-1}}\bigr)^2, \Delta\epsilon_{t_i}\bigr]$$
$$= \kappa_4(\epsilon) + 2\bigl(E\epsilon^2\bigr)^2 - 4\bigl(\Delta X_{t_{i-1}}\bigr)\bigl(\Delta X_{t_i}\bigr)\kappa_2(\epsilon) - 2\bigl(\Delta X_{t_i}\bigr)\kappa_3(\epsilon) + 2\bigl(\Delta X_{t_{i-1}}\bigr)\kappa_3(\epsilon), \qquad (A.2)$$
because of (A.1). Thus, summing the terms in (A.2),
$$II_T = 2(n-1)\bigl(\kappa_4(\epsilon) + 2\bigl(E\epsilon^2\bigr)^2\bigr) - 8E\epsilon^2\sum_{t_{i+1}\le T}\bigl(\Delta X_{t_{i-1}}\bigr)\bigl(\Delta X_{t_i}\bigr) - 4\kappa_3(\epsilon)\bigl(\Delta X_{t_{n-1}} - \Delta X_{t_0}\bigr).$$

Amalgamating the two expressions, one obtains
$$\mathrm{var}\bigl([Y,Y]^{(\mathrm{all})}_T \bigm| X\bigr) = n\bigl(2\kappa_4(\epsilon) + 8\bigl(E\epsilon^2\bigr)^2\bigr) + 8[X,X]^{(\mathrm{all})}_T E\epsilon^2 + 2(n-1)\bigl(\kappa_4(\epsilon) + 2\bigl(E\epsilon^2\bigr)^2\bigr) - 8E\epsilon^2\sum\bigl(\Delta X_{t_{i-1}}\bigr)\bigl(\Delta X_{t_i}\bigr) - 4\kappa_3(\epsilon)\bigl(\Delta X_{t_{n-1}} - \Delta X_{t_0}\bigr)$$
$$= 4nE\epsilon^4 + R_n, \qquad (A.3)$$
where the remainder term $R_n$ satisfies
$$|R_n| \le 8E\epsilon^2[X,X]_T + 2\bigl(\kappa_4(\epsilon) + 2\bigl(E\epsilon^2\bigr)^2\bigr) + 8E\epsilon^2\sum\bigl|\Delta X_{t_{i-1}}\bigr|\bigl|\Delta X_{t_i}\bigr| + 4|\kappa_3(\epsilon)|\bigl(\bigl|\Delta X_{t_{n-1}}\bigr| + \bigl|\Delta X_{t_0}\bigr|\bigr)$$
$$\le 16E\epsilon^2[X,X]^{(\mathrm{all})}_T + 2\bigl(\kappa_4(\epsilon) + 2\bigl(E\epsilon^2\bigr)^2\bigr) + 2|\kappa_3(\epsilon)|\bigl(2 + [X,X]_T\bigr), \qquad (A.4)$$
by the Cauchy–Schwarz inequality and because $|x| \le (1 + x^2)/2$. Because $[X,X]^{(\mathrm{all})}_T = O_p(1)$, (19) follows.

Under slightly stronger conditions (e.g., $|\mu_t|$ and $\sigma_t$ are bounded above by a constant), $\sum(\Delta X_{t_{i-1}})(\Delta X_{t_i})$ is a near-martingale and of order $O_p(n^{-1/2})$, and similarly $\Delta X_{t_{n-1}} - \Delta X_{t_0} = O_p(n^{-1/2})$, from which (20) follows.

Similarly, to obtain a higher-order approximation to the variance of the estimator $\widehat{\langle X,X\rangle}_T$, note that
$$\widehat{\langle X,X\rangle}_T - \langle X,X\rangle_T = \Bigl([Y,Y]^{(\mathrm{avg})}_T - \frac{1}{K}[Y,Y]^{(\mathrm{all})}_T - [X,X]^{(\mathrm{avg})}_T\Bigr) + \bigl([X,X]^{(\mathrm{avg})}_T - \langle X,X\rangle_T\bigr). \qquad (A.5)$$

Now, recalling (20) and (37), and, in addition,
$$\mathrm{cov}\bigl([Y,Y]^{(\mathrm{all})}, [Y,Y]^{(\mathrm{avg})} \bigm| X\bigr) = 2\bigl(E\epsilon^4 - \bigl(E\epsilon^2\bigr)^2\bigr)\frac{2n-1}{K} + 8E\epsilon^2[X,X]^{(\mathrm{all})}_T\frac{1}{K} + o_p\Bigl(\frac{1}{K}\Bigr),$$

we see that
$$\mathrm{var}\bigl(\widehat{\langle X,X\rangle}_T \bigm| X\bigr) = \mathrm{var}\bigl([Y,Y]^{(\mathrm{avg})}_T \bigm| X\bigr) + \frac{1}{K^2}\mathrm{var}\bigl([Y,Y]^{(\mathrm{all})}_T \bigm| X\bigr) - \frac{2}{K}\mathrm{cov}\bigl([Y,Y]^{(\mathrm{avg})}_T, [Y,Y]^{(\mathrm{all})}_T \bigm| X\bigr)$$
$$= 4\frac{\bar n}{K}E\epsilon^4 + \frac{1}{K}\bigl[8[X,X]^{(\mathrm{avg})}_T E\epsilon^2 - 2\bigl(E\epsilon^4 - \bigl(E\epsilon^2\bigr)^2\bigr)\bigr] + o_p\Bigl(\frac{1}{K}\Bigr) + \frac{1}{K^2}\bigl[4nE\epsilon^4 + \bigl(8[X,X]^{(\mathrm{all})}_T E\epsilon^2 - 2\bigl(E\epsilon^4 - \bigl(E\epsilon^2\bigr)^2\bigr)\bigr) + o_p(1)\bigr]$$
$$\quad - \frac{2}{K}\biggl[2\bigl(E\epsilon^4 - \bigl(E\epsilon^2\bigr)^2\bigr)\frac{2n-1}{K} + 8E\epsilon^2[X,X]^{(\mathrm{all})}_T\frac{1}{K} + o_p\Bigl(\frac{1}{K}\Bigr)\biggr]$$
$$= \frac{\bar n}{K}8\bigl(E\epsilon^2\bigr)^2 + \frac{1}{K}\bigl[8[X,X]^{(\mathrm{avg})}_T E\epsilon^2 - 2\bigl(E\epsilon^4 - \bigl(E\epsilon^2\bigr)^2\bigr)\bigr] + o_p\Bigl(\frac{1}{K}\Bigr),$$
because, at the optimal choice, $K = cn^{2/3}$ and $\bar n = c^{-1}n^{1/3}$. Therefore, (60) holds.

A.2 The Relevant Central Limit Theorem

Lemma A.2. Suppose that $X$ is an Itô process. Suppose that $Y$ is related to $X$ through model (4). Then, under assumption (11) and definitions (16) and (32),
$$[Y,Y]^{(\mathrm{all})}_T = [\epsilon,\epsilon]^{(\mathrm{all})}_T + O_p(1) \qquad\text{and}\qquad [Y,Y]^{(\mathrm{avg})}_T = [\epsilon,\epsilon]^{(\mathrm{avg})}_T + [X,X]^{(\mathrm{avg})}_T + O_p\Bigl(\frac{1}{\sqrt K}\Bigr).$$

Proof. (a) The one-grid case:
$$[Y,Y]^{(\mathrm{all})}_T = [X,X]^{(\mathrm{all})}_T + [\epsilon,\epsilon]^{(\mathrm{all})}_T + 2[X,\epsilon]^{(\mathrm{all})}_T. \qquad (A.6)$$
We show that
$$E\bigl(\bigl([X,\epsilon]^{(\mathrm{all})}_T\bigr)^2 \bigm| X\bigr) = O_p(1) \qquad (A.7)$$
and, in particular,
$$[X,\epsilon]^{(\mathrm{all})}_T = O_p(1). \qquad (A.8)$$
To see (A.7),
$$[X,\epsilon]^{(\mathrm{all})}_T = \sum_{i=0}^{n-1}\bigl(\Delta X_{t_i}\bigr)\bigl(\Delta\epsilon_{t_i}\bigr) = \sum_{i=0}^{n-1}\bigl(\Delta X_{t_i}\bigr)\epsilon_{t_{i+1}} - \sum_{i=0}^{n-1}\bigl(\Delta X_{t_i}\bigr)\epsilon_{t_i} = \sum_{i=1}^{n-1}\bigl(\Delta X_{t_{i-1}} - \Delta X_{t_i}\bigr)\epsilon_{t_i} + \Delta X_{t_{n-1}}\epsilon_{t_n} - \Delta X_{t_0}\epsilon_{t_0}. \qquad (A.9)$$
Because $E\bigl([X,\epsilon]^{(\mathrm{all})}_T \bigm| X\bigr) = 0$ and the $\epsilon_{t_i}$ are iid for different $t_i$, we get
$$E\bigl(\bigl([X,\epsilon]^{(\mathrm{all})}_T\bigr)^2 \bigm| X\bigr) = \mathrm{var}\bigl([X,\epsilon]^{(\mathrm{all})}_T \bigm| X\bigr) = E\epsilon^2\Biggl[\sum_{i=1}^{n-1}\bigl(\Delta X_{t_{i-1}} - \Delta X_{t_i}\bigr)^2 + \Delta X_{t_{n-1}}^2 + \Delta X_{t_0}^2\Biggr]$$
$$= 2[X,X]_T E\epsilon^2 - 2E\epsilon^2\sum_{i=1}^{n-1}\bigl(\Delta X_{t_{i-1}}\bigr)\bigl(\Delta X_{t_i}\bigr) \le 4[X,X]_T E\epsilon^2, \qquad (A.10)$$
by the Cauchy–Schwarz inequality, from which, and from $[X,X]^{(\mathrm{all})}_T$ being of order $O_p(1)$, (A.7) follows. Hence (A.8) follows by the Markov inequality.

(b) The multiple-grid case. Notice that
$$[Y,Y]^{(\mathrm{avg})} = [X,X]^{(\mathrm{avg})}_T + [\epsilon,\epsilon]^{(\mathrm{avg})}_T + 2[X,\epsilon]^{(\mathrm{avg})}_T. \qquad (A.11)$$
(A.11) strictly follows from model (4) and the definitions of grids and $[\cdot,\cdot]^{(\mathrm{avg})}_t$; see Section 3.2.

We need to show that
$$E\bigl(\bigl([X,\epsilon]^{(\mathrm{avg})}_T\bigr)^2 \bigm| X\bigr) = O_p\Bigl(\frac{1}{K}\Bigr), \qquad (A.12)$$
in particular,
$$[X,\epsilon]^{(\mathrm{avg})}_T = O_p\Bigl(\frac{1}{K^{1/2}}\Bigr), \qquad (A.13)$$
and $\mathrm{var}\bigl([X,\epsilon]^{(\mathrm{avg})}_T \bigm| X\bigr) = E\bigl[\bigl([X,\epsilon]^{(\mathrm{avg})}_T\bigr)^2 \bigm| X\bigr]$. To show (A.12), note that $E\bigl([X,\epsilon]^{(\mathrm{avg})}_T \bigm| X\bigr) = 0$ and
$$E\bigl[\bigl([X,\epsilon]^{(\mathrm{avg})}_T\bigr)^2 \bigm| X\bigr] = \mathrm{var}\bigl([X,\epsilon]^{(\mathrm{avg})}_T \bigm| X\bigr) = \frac{1}{K^2}\sum_{k=1}^{K}\mathrm{var}\bigl([X,\epsilon]^{(k)}_T \bigm| X\bigr) \le \frac{4E\epsilon^2}{K}[X,X]^{(\mathrm{avg})}_T = O_p\Bigl(\frac{1}{K}\Bigr),$$
where the second equality follows from the disjointness of the different grids, as well as $\epsilon \perp X$. The inequality follows from the same argument as in (A.10). Then the order follows because $[X,X]^{(\mathrm{avg})}_T = O_p(1)$; see the method of Mykland and Zhang (2002) for a rigorous development of the order of $[X,X]^{(\mathrm{avg})}_T$.

Theorem A.1. Suppose that $X$ is an Itô process of form (1). Suppose that $Y$ is related to $X$ through model (4), and that (11) is satisfied with $E\epsilon^4 < \infty$. Also suppose that $t_i$ and $t_{i+1}$ are not in the same subgrid for any $i$. Under assumption (33), as $n \to \infty$,
$$\Bigl(\sqrt n\bigl(\widehat{E\epsilon^2} - E\epsilon^2\bigr),\ \sqrt{\tfrac{K}{\bar n}}\bigl([Y,Y]^{(\mathrm{avg})}_T - [X,X]^{(\mathrm{avg})}_T - 2\bar nE\epsilon^2\bigr)\Bigr)$$
converges in law to a bivariate normal, with mean 0 and covariance matrix
$$\begin{pmatrix} E\epsilon^4 & 2\,\mathrm{var}(\epsilon^2) \\ 2\,\mathrm{var}(\epsilon^2) & 4E\epsilon^4 \end{pmatrix}, \qquad (A.14)$$
conditional on the $X$ process, where the limiting random variable is independent of the $X$ process.

Proof. By Lemma A.2, we need the distribution of $[\epsilon,\epsilon]^{(\mathrm{avg})}$ and $[\epsilon,\epsilon]^{(\mathrm{all})}$. First, we explore the convergence of
$$\frac{1}{\sqrt n}\Bigl([\epsilon,\epsilon]^{(\mathrm{all})}_T - 2nE\epsilon^2,\ \bigl([\epsilon,\epsilon]^{(\mathrm{avg})}_T - 2\bar nE\epsilon^2\bigr)K\Bigr). \qquad (A.15)$$
Recall that all of the sampling points $t_0, t_1, \dots, t_n$ are within $[0,T]$. We use $\mathcal{G}$ to denote the time points in the full sampling, as in the single-grid case; $\mathcal{G}^{(k)}$ denotes the subsamplings from the $k$th grid. As before, if $t_i \in \mathcal{G}^{(k)}$, then $t_{i,-}$ and $t_{i,+}$ are the previous and next elements in $\mathcal{G}^{(k)}$; $\epsilon_{t_{i,-}} = 0$ for $t_i = \min\mathcal{G}^{(k)}$, and $\epsilon_{t_{i,+}} = 0$ for $t_i = \max\mathcal{G}^{(k)}$.

Set
$$M^{(1)}_T = \frac{1}{\sqrt n}\sum_{t_i\in\mathcal{G}}\bigl(\epsilon^2_{t_i} - E\epsilon^2\bigr),$$
$$M^{(2)}_T = \frac{1}{\sqrt n}\sum_{t_i\in\mathcal{G}}\epsilon_{t_i}\epsilon_{t_{i-1}}, \qquad (A.16)$$
$$M^{(3)}_T = \frac{1}{\sqrt n}\sum_{k=1}^{K}\sum_{t_i\in\mathcal{G}^{(k)}}\epsilon_{t_i}\epsilon_{t_{i,-}}.$$
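For intuition, the martingale end points in (A.16) are easy to simulate: under iid noise, their variances should be close to $\mathrm{var}(\epsilon^2)$, $(E\epsilon^2)^2$, and $(E\epsilon^2)^2$. The sketch below is our code for the regular-allocation case, in which the previous point of the same subgrid, $t_{i,-}$, is $K$ steps back on the full grid.

```python
import numpy as np

def martingale_endpoints(eps, e_eps2, K):
    """End points M^(1)_T, M^(2)_T, M^(3)_T of (A.16), for a regular
    allocation where t_{i,-} is K steps back on the full grid."""
    n = len(eps)
    m1 = float(np.sum(eps**2 - e_eps2) / np.sqrt(n))
    m2 = float(np.sum(eps[1:] * eps[:-1]) / np.sqrt(n))
    m3 = float(np.sum(eps[K:] * eps[:-K]) / np.sqrt(n))
    return m1, m2, m3
```

For $N(0,1)$ noise, $\mathrm{var}(\epsilon^2) = 2$ and $(E\epsilon^2)^2 = 1$, so across many replications the three end points should have sample variances near 2, 1, and 1.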


We first find the asymptotic distribution of $(M^{(1)}_T, M^{(2)}_T, M^{(3)}_T)$ using the martingale central limit theorem. Then we use the result to find the limit of (A.15).

Note that $(M^{(1)}_T, M^{(2)}_T, M^{(3)}_T)$ are the end points of martingales with respect to the filtration $\mathcal{F}_i = \sigma(\epsilon_{t_j}, j \le i; X_t, \text{all } t)$. We now derive its (discrete-time) predictable quadratic variation $\langle M^{(l)}, M^{(k)}\rangle$, $l,k = 1,2,3$ [discrete-time predictable quadratic variations are only used in this proof and the proof of Lemma 1, and are different from the continuous-time quadratic variations in (12)]:
$$\langle M^{(1)}, M^{(1)}\rangle_T = \frac{1}{n}\sum_{t_i\in\mathcal{G}}\mathrm{var}\bigl(\epsilon^2_{t_i} - E\epsilon^2 \bigm| \mathcal{F}_{i-1}\bigr) = \mathrm{var}\bigl(\epsilon^2\bigr),$$
$$\langle M^{(2)}, M^{(2)}\rangle_T = \frac{1}{n}\sum_{t_i\in\mathcal{G}}\mathrm{var}\bigl(\epsilon_{t_i}\epsilon_{t_{i-1}} \bigm| \mathcal{F}_{i-1}\bigr) = \frac{E\epsilon^2}{n}\sum_{t_i\in\mathcal{G}}\epsilon^2_{t_{i-1}} = \bigl(E\epsilon^2\bigr)^2 + o_p(1), \qquad (A.17)$$
$$\langle M^{(3)}, M^{(3)}\rangle_T = \frac{1}{n}\sum_{k=1}^{K}\sum_{t_i\in\mathcal{G}^{(k)}}\mathrm{var}\bigl(\epsilon_{t_i}\epsilon_{t_{i,-}} \bigm| \mathcal{F}_{i-1}\bigr) = \frac{E\epsilon^2}{n}\sum_{k=1}^{K}\sum_{t_i\in\mathcal{G}^{(k)}}\epsilon^2_{t_{i,-}} = \bigl(E\epsilon^2\bigr)^2 + o_p(1),$$
by the law of large numbers.

Similarly, for the predictable quadratic covariations,

$$\langle M^{(1)}, M^{(2)}\rangle_T = \frac{1}{n}\sum_{t_i\in\mathcal{G}}\mathrm{cov}\bigl(\epsilon^2_{t_i} - E\epsilon^2,\ \epsilon_{t_i}\epsilon_{t_{i-1}} \bigm| \mathcal{F}_{i-1}\bigr) = E\epsilon^3\,\frac{1}{n}\sum_{t_i\in\mathcal{G}}\epsilon_{t_{i-1}} = o_p(1),$$
$$\langle M^{(1)}, M^{(3)}\rangle_T = \frac{1}{n}\sum_{k=1}^{K}\sum_{t_i\in\mathcal{G}^{(k)}}\mathrm{cov}\bigl(\epsilon^2_{t_i} - E\epsilon^2,\ \epsilon_{t_i}\epsilon_{t_{i,-}} \bigm| \mathcal{F}_{i-1}\bigr) = E\epsilon^3\,\frac{1}{n}\sum_{k=1}^{K}\sum_{t_i\in\mathcal{G}^{(k)}}\epsilon_{t_{i,-}} = o_p(1), \qquad (A.18)$$
$$\langle M^{(2)}, M^{(3)}\rangle_T = \frac{1}{n}\sum_{k=1}^{K}\sum_{t_i\in\mathcal{G}^{(k)}}\mathrm{cov}\bigl(\epsilon_{t_i}\epsilon_{t_{i-1}},\ \epsilon_{t_i}\epsilon_{t_{i,-}} \bigm| \mathcal{F}_{i-1}\bigr) = \frac{E\epsilon^2}{n}\sum_{k=1}^{K}\sum_{t_i\in\mathcal{G}^{(k)}}\epsilon_{t_{i-1}}\epsilon_{t_{i,-}} = o_p(1),$$
because $t_{i+1}$ is not in the same grid as $t_i$.

Because the $\epsilon_{t_i}$'s are iid and $E\epsilon^4_{t_i} < \infty$, the conditional Lindeberg conditions are satisfied. Hence, by the martingale central limit theorem (see Hall and Heyde 1980, condition 3.1, p. 58), $(M^{(1)}, M^{(2)}, M^{(3)})$ are asymptotically normal, with covariance matrix given by the asymptotic value of $\langle M^{(l)}, M^{(k)}\rangle$. In other words, asymptotically, $(M^{(1)}, M^{(2)}, M^{(3)})$ are independent normal, with respective variances $\mathrm{var}(\epsilon^2)$, $(E\epsilon^2)^2$, and $(E\epsilon^2)^2$.

Returning to (A.15), we have that
$$[\epsilon,\epsilon]^{(\mathrm{all})} - 2nE\epsilon^2 = 2\sum_{i\ne 0,n}\bigl(\epsilon^2_{t_i} - E\epsilon^2\bigr) + \bigl(\epsilon^2_{t_0} - E\epsilon^2\bigr) + \bigl(\epsilon^2_{t_n} - E\epsilon^2\bigr) - 2\sum_{t_i>0}\epsilon_{t_i}\epsilon_{t_{i-1}}$$
$$= 2\sqrt n\bigl(M^{(1)} - M^{(2)}\bigr) + O_p(1). \qquad (A.19)$$

Meanwhile,
$$[\epsilon,\epsilon]^{(k)} - 2n_kE\epsilon^2 = \sum_{t_i\in\mathcal{G}^{(k)},\ t_i\ne\max\mathcal{G}^{(k)}}\bigl(\epsilon_{t_{i,+}} - \epsilon_{t_i}\bigr)^2 - 2n_kE\epsilon^2$$
$$= 2\sum_{t_i\in\mathcal{G}^{(k)}}\bigl(\epsilon^2_{t_i} - E\epsilon^2\bigr) - \bigl(\epsilon^2_{\min\mathcal{G}^{(k)}} - E\epsilon^2\bigr) - \bigl(\epsilon^2_{\max\mathcal{G}^{(k)}} - E\epsilon^2\bigr) - 2\sum_{t_i\in\mathcal{G}^{(k)}}\epsilon_{t_i}\epsilon_{t_{i,-}}, \qquad (A.20)$$
where $n_k + 1$ represents the total number of sampling points in $\mathcal{G}^{(k)}$. Hence
$$[\epsilon,\epsilon]^{(\mathrm{avg})}_T K - 2\bar nE\epsilon^2 K = \sqrt n\bigl(2M^{(1)} - 2M^{(3)}\bigr) - R = 2\sqrt n\bigl(M^{(1)} - M^{(3)}\bigr) + O_p\bigl(K^{1/2}\bigr), \qquad (A.21)$$
because $R = \sum_{k=1}^{K}\bigl[\bigl(\epsilon^2_{\min\mathcal{G}^{(k)}} - E\epsilon^2\bigr) + \bigl(\epsilon^2_{\max\mathcal{G}^{(k)}} - E\epsilon^2\bigr)\bigr]$ satisfies $ER^2 = \mathrm{var}(R) \le 4K\,\mathrm{var}\bigl(\epsilon^2\bigr)$.

Because $n^{-1}K \to 0$, and because the error terms in (A.19) and (A.20) are uniformly integrable, it follows that
$$(A.15) = 2\bigl(M^{(1)} - M^{(2)},\ M^{(1)} - M^{(3)}\bigr) + o_p(1). \qquad (A.22)$$
Hence (A.15) is also asymptotically normal, with covariance matrix
$$\begin{pmatrix} 4E\epsilon^4 & 4\,\mathrm{var}(\epsilon^2) \\ 4\,\mathrm{var}(\epsilon^2) & 4E\epsilon^4 \end{pmatrix}.$$

By Lemma A.2, and as $n^{-1}K \to 0$,
$$\frac{1}{\sqrt n}\Bigl([Y,Y]^{(\mathrm{all})}_T - 2nE\epsilon^2,\ \bigl([Y,Y]^{(\mathrm{avg})}_T - [X,X]^{(\mathrm{avg})}_T - 2\bar nE\epsilon^2\bigr)K\Bigr)\Bigm| X$$
is asymptotically normal:
$$\frac{1}{\sqrt n}\begin{pmatrix} [Y,Y]^{(\mathrm{all})}_T - 2nE\epsilon^2 \\ [Y,Y]^{(\mathrm{avg})}_T K - 2\bar nKE\epsilon^2 - [X,X]^{(\mathrm{avg})}_T K \end{pmatrix}\Biggm| X = 2\begin{pmatrix} M^{(1)} - M^{(2)} \\ M^{(1)} - M^{(3)} \end{pmatrix} + o_p(1) \xrightarrow{\ \mathcal{L}\ } 2N\Biggl(0, \begin{pmatrix} E\epsilon^4 & \mathrm{var}(\epsilon^2) \\ \mathrm{var}(\epsilon^2) & E\epsilon^4 \end{pmatrix}\Biggr). \qquad (A.23)$$

Because
$$\widehat{E\epsilon^2} = \frac{1}{2n}[Y,Y]^{(\mathrm{all})}_T \qquad\text{and}\qquad \frac{K}{\sqrt n} = \sqrt{\frac{K}{\bar n}}\bigl(1 + o(1)\bigr), \qquad (A.24)$$
Theorem A.1 follows.

Proof of Lemma 1. For simplicity, and without loss of generality, we assume that $\mathcal{H} = \mathcal{G}$. The result follows from the proofs of Lemma A.2 and Theorem A.1, but with a more exact representation of $[Y,Y]^{(\mathrm{all})}_T - [X,X]^{(\mathrm{all})}_T$. Specifically, from (A.9) and (A.19), we have that
$$[Y,Y]^{(\mathrm{all})}_T - [X,X]^{(\mathrm{all})}_T - 2nE\epsilon^2 = [\epsilon,\epsilon]_T - 2nE\epsilon^2 + 2[X,\epsilon]^{(\mathrm{all})}_T$$
$$= 2\sum_{i\ne 0,n}\bigl(\epsilon^2_{t_i} - E\epsilon^2\bigr) + \bigl(\epsilon^2_{t_0} - E\epsilon^2\bigr) + \bigl(\epsilon^2_{t_n} - E\epsilon^2\bigr) - 2\sum_{t_i>0}\epsilon_{t_i}\epsilon_{t_{i-1}} + 2\sum_{i=1}^{n-1}\bigl(\Delta X_{t_{i-1}} - \Delta X_{t_i}\bigr)\epsilon_{t_i} + 2\Delta X_{t_{n-1}}\epsilon_{t_n} - 2\Delta X_{t_0}\epsilon_{t_0}.$$
With the same filtration as in the proof of Theorem A.1, this is the sum of a martingale triangular array with increment $2\bigl(\epsilon^2_{t_i} - E\epsilon^2\bigr) - 2\epsilon_{t_i}\epsilon_{t_{i-1}} + 2\bigl(\Delta X_{t_{i-1}} - \Delta X_{t_i}\bigr)\epsilon_{t_i}$ for $i \ne 0,n$. Again, the conditions of the martingale central limit theorem (see Hall and Heyde 1980, chap. 3) are satisfied by (17), whence the term is asymptotically normal, conditionally on the $X$ process. The conditional variance is given by (20), under the conditions mentioned in the statement of the lemma. Because $E\epsilon^2$ can vary with $n$, the $O_p\bigl(n_{\mathrm{sparse}}^{-1/2}\bigr)$ term in (20) is actually of order $O_p\bigl(n_{\mathrm{sparse}}^{-1/2}E\epsilon^2\bigr)$.

A.3 Asymptotics of $D_T$

For transparency of notation, we take $\Delta t = T/n$; in other words, $\Delta t$ is the average of the $\Delta t_i$.

Proof of Theorem 2. Note first that because we assume that $\mu_t$ and $\sigma_t$ are continuous, they are locally bounded. Because here we seek to show convergence in probability, we can therefore assume, without loss of generality, that $|\mu_t|$ is bounded above by a nonrandom constant, and also that $\infty > \sigma^+ \ge \sigma_t \ge \sigma^- > 0$, where $\sigma^+$ and $\sigma^-$ are nonrandom. This is by the usual stopping time argument.

Also note that once $\mu_t$ and $\sigma_t$ are bounded, as we have now assumed, by Girsanov's theorem (see, e.g., Karatzas and Shreve 1991, chap. 3.5; Jacod and Shiryaev 2003, chap. II.3b), we can, without loss of generality, further suppose that $\mu_t = 0$ identically. (If the convergence in probability holds under the equivalent probability, then it also will hold under the original one.)

Now for the derivation. First, consider
$$\bigl(X_{t_i} - X_{t_{i,-}}\bigr)^2 = \Biggl(\sum_{t_j=t_{i,-}}^{t_{i-1}}\Delta X_{t_j}\Biggr)^2 = \sum_{t_j=t_{i,-}}^{t_{i-1}}\bigl(\Delta X_{t_j}\bigr)^2 + 2\sum_{t_{i-1}\ge t_k>t_l\ge t_{i,-}}\Delta X_{t_k}\Delta X_{t_l}.$$
It follows that
$$[X,X]^{(\mathrm{avg})}_T = [X,X]^{(\mathrm{all})}_T + 2\sum_{i=1}^{n-1}\bigl(\Delta X_{t_i}\bigr)\sum_{j=1}^{K\wedge i}\Bigl(1 - \frac{j}{K}\Bigr)\bigl(\Delta X_{t_{i-j}}\bigr) + O_p(K/n).$$
The remainder term incorporates end effects and has order $O_p(K/n)$, by (41) and (42), and due to the regular allocation of points to grids. Following Jacod and Protter (1998) and Mykland and Zhang (2002),

$[X,X]^{(\mathrm{all})}_T - \langle X,X\rangle_T = O_p(1/\sqrt n) = o_p\bigl(\sqrt{K\Delta t}\bigr)$, too, under the assumption (17) [which is a special case of assumption (41)] and that $\inf_{t\in[0,T]}\sigma^2_t > 0$ almost surely. In other words,
$$D_T = \bigl([X,X]^{(\mathrm{avg})}_T - \langle X,X\rangle_T\bigr) = 2\sum_{i=1}^{n-1}\bigl(\Delta X_{t_i}\bigr)\sum_{j=1}^{K\wedge i}\Bigl(1 - \frac{j}{K}\Bigr)\bigl(\Delta X_{t_{i-j}}\bigr) + o_p\bigl(\sqrt{K\Delta t}\bigr). \qquad (A.25)$$

It follows that
$$\langle D,D\rangle_T = 4\sum_{i=1}^{n-1}\bigl(\Delta\langle X,X\rangle_{t_i}\bigr)\Biggl(\sum_{j=1}^{K\wedge i}\Bigl(1 - \frac{j}{K}\Bigr)\bigl(\Delta X_{t_{i-j}}\bigr)\Biggr)^2 + o_p(K\Delta t) = (I) + (II) + o_p(K\Delta t), \qquad (A.26)$$
where
$$(I) = 4\sum_{i=1}^{n-1}\bigl(\Delta\langle X,X\rangle_{t_i}\bigr)\Biggl(\sum_{j=1}^{K\wedge i}\Bigl(1 - \frac{j}{K}\Bigr)^2\bigl(\Delta X_{t_{i-j}}\bigr)^2\Biggr)$$
$$= 4\sum_{i=1}^{n-1}\sigma^4_{t_i}\Delta t_i\Biggl(\sum_{j=1}^{K\wedge i}\Bigl(1 - \frac{j}{K}\Bigr)^2\Delta t_{i-j}\Biggr) + o_p(K\Delta t) = K\Delta t\sum_{i=1}^{n-1}\sigma^4_{t_i}h_i\Delta t_i + o_p(K\Delta t),$$

where $h_i$ is given by (43). Thus, Theorem 2 will have been shown if we can show that
$$(II) = 8\sum_{i=1}^{n-1}\Delta\langle X,X\rangle_{t_i}\zeta_i \qquad (A.27)$$
is of order $o_p(K\Delta t)$, where
$$\zeta_i = \sum_{i-1\ge l>r\ge 0}\bigl(\Delta X_{t_l}\bigr)\bigl(\Delta X_{t_r}\bigr)\Bigl(1 - \frac{i-l}{K}\Bigr)^+\Bigl(1 - \frac{i-r}{K}\Bigr)^+. \qquad (A.28)$$
We do this as follows. Denote $\delta^+ = \max_i|\Delta t_i| = O\bigl(\frac{1}{n}\bigr)$, and set $(II)' = 8\sum_{i=1}^{n-1}\Delta t_i\,\sigma^2_{t_{i-K}}\zeta_i$. Then, by Hölder's inequality, $E|(II) - (II)'| \le \delta^+ n\sup_{|t-s|\le(K+1)\delta^+}\|\sigma^2_t - \sigma^2_s\|_2\sup_i\|\zeta_i\|_2 = o(\delta^+K)$, because of the boundedness and continuity of $\sigma_t$ and because, applying the Burkholder–Davis–Gundy inequality twice, we have that

cause of the boundedness and continuity of σt and because applyingthe BurkholderndashDavisndashGundy inequality twice we have that

E(ζ 2i ) le E

iminus1sum

l=1

〈XX〉tl

times(

(lminus1)and(iminus1)sum

rge0

(Xtr

)(

1 minus i minus l

K

)+(1 minus i minus r

K

)+)2

le δ+(σ+)2

timesiminus1sum

l=1

E

((lminus1)and(iminus1)sum

rge0

(Xtr

)(

1 minus i minus l

K

)+(1 minus i minus r

K

)+)2

le (δ+)2(σ+)4

timesiminus1sum

l=1

(lminus1)and(iminus1)sum

rge0

((

1 minus i minus l

K

)+(1 minus i minus r

K

)+)2

= O((δ+K)2) uniformly in i (A29)

where the second-to-last transition does for the inner sum what the twofirst inequalitities do for the outer sum Now rewrite

$$(II)' = 8\sum_{l>r\ge 0}^{n-1}\bigl(\Delta X_{t_l}\bigr)\bigl(\Delta X_{t_r}\bigr)\sum_{i=l+1}^{n-1}\Delta t_i\,\sigma^2_{t_{i-K}}\Bigl(1 - \frac{i-l}{K}\Bigr)^+\Bigl(1 - \frac{i-r}{K}\Bigr)^+. \qquad (A.30)$$

In the same way as in (A.29), we then obtain
$$E\bigl((II)'\bigr)^2 \le 64\bigl(\delta^+\bigr)^2\bigl(\sigma^+\bigr)^4\sum_{l>r\ge 0}^{n-1}E\Biggl(\sum_{i=l+1}^{n-1}\Delta t_i\,\sigma^2_{t_{i-K}}\Bigl(1 - \frac{i-l}{K}\Bigr)^+\Bigl(1 - \frac{i-r}{K}\Bigr)^+\Biggr)^2$$
$$\le 64\bigl(\delta^+\bigr)^2\bigl(\sigma^+\bigr)^4 \times \bigl(\delta^+\bigr)^2\bigl(\sigma^+\bigr)^4\sum_{l>r\ge 0}^{n-1}\Biggl(\sum_{i=l+1}^{n-1}\Bigl(1 - \frac{i-l}{K}\Bigr)^+\Bigl(1 - \frac{i-r}{K}\Bigr)^+\Biggr)^2$$
$$\le 64\bigl(\delta^+\bigr)^4\bigl(\sigma^+\bigr)^8 nK^3 = O\bigl(\bigl(\delta^+K\bigr)^3\bigr) = o\bigl(\bigl(\delta^+K\bigr)^2\bigr), \qquad (A.31)$$
by assumptions (41) and (42). The transition from the second to the last line in (A.31) is due to the fact that $\sum_{i=l+1}^{n-1}\bigl(1 - \frac{i-l}{K}\bigr)^+\bigl(1 - \frac{i-r}{K}\bigr)^+ \le K$ and $= 0$ when $l - r \ge K$. Thus we have shown that $(II) = o_p(K\Delta t)$, and hence Theorem 2 has been established.

and = 0 when lminusr ge K Thus we have shown that (II) = op(Kt) andhence Theorem 2 has been established

We now proceed to the asymptotic distribution of $D_T$. We first state a technical condition on the filtration $(\mathcal{F}_t)_{0\le t\le T}$ to which $X_t$ and $\mu_t$ (but not the $\epsilon$'s) are assumed to be adapted.

Condition E (Description of the filtration). There is a continuous multidimensional $P$-local martingale $\mathcal{X} = (\mathcal{X}^{(1)},\dots,\mathcal{X}^{(p)})$, any $p$, so that $\mathcal{F}_t$ is the smallest sigma-field containing $\sigma(\mathcal{X}_s, s\le t)$ and $\mathcal{N}$, where $\mathcal{N}$ contains all of the null sets in $\sigma(\mathcal{X}_s, s\le T)$.

For example, $\mathcal{X}$ can be a collection of Brownian motions.

Proof of Theorem 3. We can show, by methods similar to those in the proof of Theorem 2, that if $L$ is any martingale adapted to the filtration generated by $\mathcal{X}$, then
$$\sup_t\Bigl|\frac{1}{\Delta tK}\langle D,L\rangle_t\Bigr| \xrightarrow{\ p\ } 0. \qquad (A.32)$$
The stable convergence with respect to the filtration $(\mathcal{F}_t)_{0\le t\le T}$ then follows in view of Rootzén (1980) or Jacod and Protter (1998). This ends the proof of Theorem 3.

Finally, in the case where $\eta^2_n$ does not converge, one can still use the mixed normal with variance $\eta^2_n$. This is because every subsequence of $\eta^2_n$ has a further subsequence that does converge in probability to some $\eta^2$, and hence for which the assumption (48) in Theorem 3 would be satisfied. The reason for this is that one can define the distribution function of a finite measure by
$$G_n(t) = \sum_{t_{i+1}\le t}h_i\Delta t_i. \qquad (A.33)$$
Because $G_n(t) \le T\sup_i h_i$, it follows from (45) that the sequence $G_n$ is weakly compact in the sense of weak convergence; see Helly's theorem (e.g., Billingsley 1995, p. 336). For any convergent subsequence $G_n \to G$, we then get that
$$\eta^2_n = \int_0^T\sigma^4_t\,dG_n(t) \to \int_0^T\sigma^4_t\,dG(t), \qquad (A.34)$$
almost surely, because we have assumed that $\langle X,X\rangle'_t$ is a continuous function of $t$. One then defines $\eta^2$ to be the (subsequence-dependent) right side of (A.34). To proceed further with the asymptotics, continue the foregoing subsequence and note that
$$\frac{1}{\Delta tK}\langle D,D\rangle_t \approx \int_0^t\sigma^4_s\,dG_n(s) \to \int_0^t\sigma^4_s\,dG(s),$$
completing the proof.

[Received October 2003 Revised December 2004]

REFERENCES

Aït-Sahalia, Y., Mykland, P. A., and Zhang, L. (2005), "How Often to Sample a Continuous-Time Process in the Presence of Market Microstructure Noise," Review of Financial Studies, 18, 351–416.
Aldous, D. J., and Eagleson, G. K. (1978), "On Mixing and Stability of Limit Theorems," The Annals of Probability, 6, 325–331.
Andersen, T. G., Bollerslev, T., Diebold, F. X., and Labys, P. (2001), "The Distribution of Exchange Rate Realized Volatility," Journal of the American Statistical Association, 96, 42–55.
Bandi, F. M., and Russell, J. R. (2003), "Microstructure Noise, Realized Volatility, and Optimal Sampling," technical report, University of Chicago, Graduate School of Business.
Barndorff-Nielsen, O. E., and Shephard, N. (2002), "Econometric Analysis of Realized Volatility and Its Use in Estimating Stochastic Volatility Models," Journal of the Royal Statistical Society, Ser. B, 64, 253–280.
Billingsley, P. (1995), Probability and Measure (3rd ed.), New York: Wiley.
Brown, S. J. (1990), "Estimating Volatility," in Financial Options: From Theory to Practice, eds. S. Figlewski, W. Silber, and M. Subrahmanyam, Homewood, IL: Business One-Irwin, pp. 516–537.
Chernov, M., and Ghysels, E. (2000), "A Study Towards a Unified Approach to the Joint Estimation of Objective and Risk Neutral Measures for the Purpose of Options Valuation," Journal of Financial Economics, 57, 407–458.
Dacorogna, M. M., Gençay, R., Müller, U., Olsen, R. B., and Pictet, O. V. (2001), An Introduction to High-Frequency Finance, San Diego: Academic Press.
Gallant, A. R., Hsu, C.-T., and Tauchen, G. T. (1999), "Using Daily Range Data to Calibrate Volatility Diffusions and Extract the Forward Integrated Variance," The Review of Economics and Statistics, 81, 617–631.
Gloter, A. (2000), "Estimation des Paramètres d'une Diffusion Cachée," unpublished doctoral thesis, Université de Marne-la-Vallée.
Hall, P., and Heyde, C. C. (1980), Martingale Limit Theory and Its Application, Boston: Academic Press.
Hansen, P. R., and Lunde, A. (2004a), "Realized Variance and IID Market Microstructure Noise," technical report, Brown University, Dept. of Economics.
——— (2004b), "An Unbiased Measure of Realized Variance," technical report, Brown University, Dept. of Economics.
Heston, S. (1993), "A Closed-Form Solution for Options With Stochastic Volatility, With Applications to Bonds and Currency Options," Review of Financial Studies, 6, 327–343.
Hull, J., and White, A. (1987), "The Pricing of Options on Assets With Stochastic Volatilities," Journal of Finance, 42, 281–300.
Jacod, J., and Protter, P. (1998), "Asymptotic Error Distributions for the Euler Method for Stochastic Differential Equations," The Annals of Probability, 26, 267–307.
Jacod, J., and Shiryaev, A. N. (2003), Limit Theorems for Stochastic Processes (2nd ed.), New York: Springer-Verlag.
Karatzas, I., and Shreve, S. E. (1991), Brownian Motion and Stochastic Calculus, New York: Springer-Verlag.
Mykland, P. A., and Zhang, L. (2002), "ANOVA for Diffusions," technical report, University of Chicago, Dept. of Statistics.
O'Hara, M. (1995), Market Microstructure Theory, Cambridge, MA: Blackwell.
Oomen, R. (2004), "Properties of Realized Variance for a Pure Jump Process: Calendar Time Sampling versus Business Time Sampling," technical report, University of Warwick, Warwick Business School.
Rényi, A. (1963), "On Stable Sequences of Events," Sankhyā, Ser. A, 25, 293–302.
Rootzén, H. (1980), "Limit Distributions for the Error in Approximations of Stochastic Integrals," The Annals of Probability, 8, 241–251.
Zhou, B. (1996), "High-Frequency Data and Volatility in Foreign-Exchange Rates," Journal of Business & Economic Statistics, 14, 45–52.
Zumbach, G., Corsi, F., and Trapletti, A. (2002), "Efficient Estimation of Volatility Using High-Frequency Data," technical report, Olsen & Associates.

Zhang, Mykland, and Aït-Sahalia: Determining Integrated Volatility 1403

4.2 Properties of $\widehat{\langle X,X\rangle}_T$: Optimal Sampling and Bias Adjustment

To further pin down the optimal sampling frequency $K$, one can minimize the expected asymptotic variance in (59) to obtain

$$c^* = \left(\frac{16(E\epsilon^2)^2}{T\,E\eta^2}\right)^{1/3},\qquad(62)$$

which can be consistently estimated from data in past time periods (before time $t_0 = 0$), using $\widehat{E\epsilon^2}$ and an estimator of $\eta^2$ (cf. Sec. 6). As mentioned in Section 3.4, $\eta^2$ can be taken to be independent of $K$ as long as one allocates sampling points to grids regularly, as defined in Section 3.2. Hence one can choose $c$, and so also $K$, based on past data.

Example 1. If $\sigma_t^2$ is constant, and for equidistant sampling and regular allocation to grids, $\eta^2 = \frac{4}{3}\sigma^4 T$; then the asymptotic variance in (59) is

$$8c^{-2}(E\epsilon^2)^2 + c\eta^2 T = 8c^{-2}(E\epsilon^2)^2 + \tfrac{4}{3}c\sigma^4 T^2,$$

and the optimal choice of $c$ becomes

$$c_{\mathrm{opt}} = \left(\frac{12(E\epsilon^2)^2}{T^2\sigma^4}\right)^{1/3}.\qquad(63)$$

In this case, the asymptotic variance in (59) is

$$2\bigl(12(E\epsilon^2)^2\bigr)^{1/3}(\sigma^2 T)^{4/3}.$$
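The optimal constants in (62) and (63) are easy to compute directly. The sketch below does so for hypothetical illustration inputs (the noise size, volatility, horizon, and sample size are ours, not taken from the paper's calibration); it also checks that substituting $\eta^2 = \frac{4}{3}\sigma^4 T$ into (62) recovers (63).

```python
import numpy as np

def c_star(E_eps2, T, E_eta2):
    # (62): minimizer of the expected asymptotic variance
    # 8 c^-2 (E eps^2)^2 + c T E eta^2 over c.
    return (16.0 * E_eps2**2 / (T * E_eta2)) ** (1.0 / 3.0)

def c_opt_const_vol(E_eps2, sigma2, T):
    # (63): specialization when sigma_t^2 = sigma2 is constant,
    # in which case eta^2 = (4/3) sigma^4 T.
    return (12.0 * E_eps2**2 / (T**2 * sigma2**2)) ** (1.0 / 3.0)

# Hypothetical inputs: noise sd 0.05%, annualized sigma^2 = .04, T = one day,
# n = one-second sampling over a 6.5-hour trading day.
E_eps2 = 0.0005**2
sigma2 = 0.04
T = 1.0 / 252.0
n = 23400

c = c_opt_const_vol(E_eps2, sigma2, T)
K = round(c * n ** (2.0 / 3.0))   # number of subgrids, K = c n^(2/3)
```

Once $c$ is fixed this way, $K$ follows from $K = cn^{2/3}$, which is how past-period data would translate into a sampling design for the current period.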

One can also, of course, estimate $c$ to minimize the actual asymptotic variance in (59) from data in the current time period ($0 \le t \le T$). It is beyond the scope of this article to consider whether such a device for selecting the frequency has any impact on our asymptotic results.

In addition to large-sample arguments, one can study $\widehat{\langle X,X\rangle}_T$ from a "smallish" sample standpoint. We argue in what follows that one can apply a bias-type adjustment to get

$$\widehat{\langle X,X\rangle}{}^{(\mathrm{adj})}_T = \left(1 - \frac{\bar n}{n}\right)^{-1}\widehat{\langle X,X\rangle}_T.\qquad(64)$$

The difference from the estimator in (55) is of order $O_p(\bar n/n) = O_p(K^{-1})$, and thus the two estimators behave the same to the asymptotic order that we consider. The estimator (64), however, has the appeal of being, in a certain way, "unbiased," as follows. For arbitrary $(a,b)$, consider all estimators of the form

$$\widehat{\langle X,X\rangle}{}^{(\mathrm{adj})}_T = a[Y,Y]^{(\mathrm{avg})}_T - b\frac{\bar n}{n}[Y,Y]^{(\mathrm{all})}_T.$$

Then, from (18) and (35),

$$E\bigl(\widehat{\langle X,X\rangle}{}^{(\mathrm{adj})}_T \bigm| X\ \text{process}\bigr) = a\bigl([X,X]^{(\mathrm{avg})}_T + 2\bar nE\epsilon^2\bigr) - b\frac{\bar n}{n}\bigl([X,X]^{(\mathrm{all})}_T + 2nE\epsilon^2\bigr)$$
$$= a[X,X]^{(\mathrm{avg})}_T - b\frac{\bar n}{n}[X,X]^{(\mathrm{all})}_T + 2(a-b)\bar nE\epsilon^2.$$

It is natural to choose $a = b$ to completely remove the effect of $E\epsilon^2$. Also, following Section 3.4, both $[X,X]^{(\mathrm{avg})}_T$ and $[X,X]^{(\mathrm{all})}_T$ are asymptotically unbiased estimators of $\langle X,X\rangle_T$. Hence one can argue that one should take $a(1 - \bar n/n) = 1$, yielding (64).

Similarly, an adjusted estimator of $E\epsilon^2$ is given by

$$\widehat{E\epsilon^2}{}^{(\mathrm{adj})} = \tfrac{1}{2}(n - \bar n)^{-1}\bigl([Y,Y]^{(\mathrm{all})}_T - [Y,Y]^{(\mathrm{avg})}_T\bigr),\qquad(65)$$

which satisfies $E(\widehat{E\epsilon^2}{}^{(\mathrm{adj})}\,|\,X\ \text{process}) = E\epsilon^2 + \tfrac{1}{2}(n - \bar n)^{-1}([X,X]^{(\mathrm{all})}_T - [X,X]^{(\mathrm{avg})}_T)$, and thus is unbiased to high order. As for the asymptotic distribution, one can see from Theorem A.1 in the Appendix that

$$\widehat{E\epsilon^2}{}^{(\mathrm{adj})} - E\epsilon^2 = \bigl(\widehat{E\epsilon^2} - E\epsilon^2\bigr)\bigl(1 + O(K^{-1})\bigr) + O_p\bigl(Kn^{-3/2}\bigr)$$
$$= \widehat{E\epsilon^2} - E\epsilon^2 + O_p\bigl(n^{-1/2}K^{-1}\bigr) + O_p\bigl(Kn^{-3/2}\bigr)$$
$$= \widehat{E\epsilon^2} - E\epsilon^2 + O_p\bigl(n^{-5/6}\bigr),$$

from (58). It follows that $n^{1/2}(\widehat{E\epsilon^2} - E\epsilon^2)$ and $n^{1/2}(\widehat{E\epsilon^2}{}^{(\mathrm{adj})} - E\epsilon^2)$ have the same asymptotic distribution.
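As a concrete illustration of (55), (64), and (65), the sketch below implements the adjusted two-scales estimator on equidistant observations with regular allocation to $K$ subgrids. The function name and the assumption of equally spaced data are ours; the paper's more general grid families are not handled here.

```python
import numpy as np

def tsrv_adjusted(Y, K):
    # Y: observed log-prices at n+1 equidistant times; K: number of subgrids.
    n = len(Y) - 1
    rv_all = np.sum(np.diff(Y) ** 2)                      # [Y,Y]^(all)_T
    rv_avg = np.mean([np.sum(np.diff(Y[k::K]) ** 2)       # [Y,Y]^(avg)_T
                      for k in range(K)])
    nbar = (n - K + 1) / K                                # mean subgrid size
    tsrv = rv_avg - (nbar / n) * rv_all                   # estimator (55)
    tsrv_adj = tsrv / (1.0 - nbar / n)                    # adjustment (64)
    eps2_adj = 0.5 * (rv_all - rv_avg) / (n - nbar)       # noise variance (65)
    return tsrv_adj, eps2_adj
```

On simulated data with known quadratic variation and known noise variance, both outputs should land close to their targets, which is a quick sanity check on an implementation.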

5. EXTENSION TO MULTIPLE-PERIOD INFERENCE

For a given family $\mathcal{A} = \{G^{(k)}, k = 1,\dots,K\}$, we denote

$$\widehat{\langle X,X\rangle}_t = [Y,Y]^{(\mathrm{avg})}_t - \frac{\bar n}{n}[Y,Y]^{(\mathrm{all})}_t,\qquad(66)$$

where, as usual, $[Y,Y]^{(\mathrm{all})}_t = \sum_{t_{i+1}\le t}(\Delta Y_{t_i})^2$ and $[Y,Y]^{(\mathrm{avg})}_t = \frac{1}{K}\sum_{k=1}^{K}[Y,Y]^{(k)}_t$, with

$$[Y,Y]^{(k)}_t = \sum_{t_i,t_{i+}\in G^{(k)},\,t_{i+}\le t}\bigl(Y_{t_{i+}} - Y_{t_i}\bigr)^2.$$

To estimate $\langle X,X\rangle$ for several discrete time periods, say $[0,T_1],[T_1,T_2],\dots,[T_{M-1},T_M]$, where $M$ is fixed, this amounts to estimating $\langle X,X\rangle_{T_m} - \langle X,X\rangle_{T_{m-1}} = \int_{T_{m-1}}^{T_m}\sigma_u^2\,du$ for $m = 1,\dots,M$, and the obvious estimator is $\widehat{\langle X,X\rangle}_{T_m} - \widehat{\langle X,X\rangle}_{T_{m-1}}$.

To carry out the asymptotics, let $n_m$ be the number of points in the $m$th time segment, and similarly let $K_m = c_m n_m^{2/3}$, where $c_m$ is a constant. Then $\{n_m^{1/6}(\widehat{\langle X,X\rangle}_{T_m} - \widehat{\langle X,X\rangle}_{T_{m-1}} - \int_{T_{m-1}}^{T_m}\sigma_u^2\,du)\}$, $m = 1,\dots,M$, converge stably to $(8c_m^{-2}(E\epsilon^2)^2 + c_m\eta_m^2(T_m - T_{m-1}))^{1/2}Z_m$, where the $Z_m$ are iid standard normals, independent of the underlying process, and $\eta_m^2$ is the limit $\eta^2$ (Thm. 3) for time period $m$. In the case of equidistant $t_i$ and regular allocation of sample points to grids, $\eta_m^2 = \frac{4}{3}\int_{T_{m-1}}^{T_m}\sigma_u^4\,du$.

In other words, the one-period asymptotics generalize straightforwardly to the multiperiod case. This is because $\widehat{\langle X,X\rangle}_{T_m} - \widehat{\langle X,X\rangle}_{T_{m-1}} - \int_{T_{m-1}}^{T_m}\sigma_u^2\,du$ has, to first order, a martingale structure. This can be seen in the Appendix.

An advantage of our proposed estimator is that if $\epsilon_{t_i}$ has different variance in different time segments, say $\mathrm{var}(\epsilon_{t_i}) = (E\epsilon^2)_m$ for $t_i \in (T_{m-1},T_m]$, then both consistency and asymptotic (mixed) normality continue to hold, provided that one replaces $E\epsilon^2$ by $(E\epsilon^2)_m$. This adds a measure of robustness to the procedure. If one were convinced that $E\epsilon^2$ were the same across time segments, then an alternative estimator would have the form

$$\widetilde{\langle X,X\rangle}_t = [Y,Y]^{(\mathrm{avg})}_t - \frac{1}{K}\bigl(\#\{t_{i+1}\le t\} - 1\bigr)\frac{1}{n}[Y,Y]^{(\mathrm{all})}_T,\qquad\text{for } t = T_1,\dots,T_M.\qquad(67)$$

1404 Journal of the American Statistical Association, December 2005

However, the errors $\widetilde{\langle X,X\rangle}_{T_m} - \widetilde{\langle X,X\rangle}_{T_{m-1}} - \int_{T_{m-1}}^{T_m}\sigma_u^2\,du$ in this case are not asymptotically independent. Note that for $T = T_m$, both candidates (66) and (67) for $\widehat{\langle X,X\rangle}_t$ coincide with the quantity in (55).
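In code, the multiperiod extension is just the one-period estimator (66) applied segment by segment, with $K_m = c_m n_m^{2/3}$ chosen per segment. The sketch below (our notation, equidistant times assumed) makes that explicit; because each segment is processed on its own, a segment-specific noise level is handled automatically, which is the robustness point made above.

```python
import numpy as np

def tsrv(Y, K):
    # One-period two-scales estimator (66) on log-prices Y.
    n = len(Y) - 1
    rv_all = np.sum(np.diff(Y) ** 2)
    rv_avg = np.mean([np.sum(np.diff(Y[k::K]) ** 2) for k in range(K)])
    return rv_avg - ((n - K + 1) / (K * n)) * rv_all

def tsrv_by_period(Y, breakpoints, c=0.3):
    # Estimate int_{T_{m-1}}^{T_m} sigma_u^2 du for each segment,
    # with K_m = c * n_m^(2/3) recomputed per segment.
    est = []
    for lo, hi in zip(breakpoints[:-1], breakpoints[1:]):
        seg = Y[lo : hi + 1]                 # include the right endpoint
        n_m = len(seg) - 1
        K_m = max(2, round(c * n_m ** (2.0 / 3.0)))
        est.append(tsrv(seg, K_m))
    return est
```

The constant `c` here is an input; in practice it would come from (62) estimated on past data, as discussed in Section 4.2.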

6. ESTIMATING THE ASYMPTOTIC VARIANCE OF $\widehat{\langle X,X\rangle}_T$

In the one-period case, the main goal is to estimate the asymptotic variance $s^2 = 8c^{-2}(E\epsilon^2)^2 + c\eta^2 T$ [cf. (58) and (59)]. The multigrid case is a straightforward generalization, as indicated in Section 5.

Here we are concerned only with the case where the points $t_i$ are equally spaced ($\Delta t_i = \Delta t$) and are regularly allocated to the grids $\mathcal{A}_1 = \{G^{(k)}, k = 1,\dots,K_1\}$. A richer set of ingredients is required to find the spread than to just estimate $\langle X,X\rangle_T$. To implement the estimator, we create an additional family $\mathcal{A}_2 = \{G^{(k,i)}, k = 1,\dots,K_1, i = 1,\dots,I\}$ of grids, where $G^{(k,i)}$ contains every $I$th point of $G^{(k)}$, starting with the $i$th point. We assume that $K_1 \sim c_1 n^{2/3}$. The new family then consists of $K_2 \sim c_2 n^{2/3}$ grids, where $c_2 = c_1 I$.

In addition, we need to divide the time line into segments $(T_{m-1},T_m]$, where $T_m = \frac{m}{M}T$. For the purposes of this discussion, $M$ is large but finite. We now get an initial estimator of spread as

$$\hat s_0^2 = n^{1/3}\sum_{m=1}^{M}\Bigl(\widehat{\langle X,X\rangle}{}^{K_1}_{T_m} - \widehat{\langle X,X\rangle}{}^{K_1}_{T_{m-1}} - \bigl(\widehat{\langle X,X\rangle}{}^{K_2}_{T_m} - \widehat{\langle X,X\rangle}{}^{K_2}_{T_{m-1}}\bigr)\Bigr)^2,$$

where $\widehat{\langle X,X\rangle}{}^{K_i}_t$ is the estimator (66) using grid family $i$, $i = 1,2$. Using the discussion in Section 5, we can show that

$$\hat s_0^2 \approx s_0^2,\qquad(68)$$

where, for $c_1 \ne c_2$ ($I \ne 1$),

$$s_0^2 = 8(E\epsilon^2)^2\bigl(c_1^{-2} + c_2^{-2} - c_1^{-1}c_2^{-1}\bigr) + \bigl(c_1^{1/2} - c_2^{1/2}\bigr)^2 T\eta^2$$
$$= 8(E\epsilon^2)^2 c_1^{-2}\bigl(1 + I^{-2} - I^{-1}\bigr) + c_1\bigl(I^{1/2} - 1\bigr)^2 T\eta^2.\qquad(69)$$

In (68), the symbol "$\approx$" denotes first convergence in law as $n \to \infty$ and then a limit in probability as $M \to \infty$. Because $E\epsilon^2$ can be estimated by $\widehat{E\epsilon^2} = [Y,Y]^{(\mathrm{all})}/2n$, we can put hats on $s_0^2$, $(E\epsilon^2)^2$, and $\eta^2$ in (69) to obtain an estimator of $\eta^2$. Similarly,

$$s^2 = 8(E\epsilon^2)^2\left(c^{-2} - \frac{c\bigl(c_1^{-2} + c_2^{-2} - c_1^{-1}c_2^{-1}\bigr)}{(c_1^{1/2} - c_2^{1/2})^2}\right) + \frac{c}{(c_1^{1/2} - c_2^{1/2})^2}\,s_0^2$$
$$= 8\left(c^{-2} - cc_1^{-3}\,\frac{I^{-2} - I^{-1} + 1}{(I^{1/2} - 1)^2}\right)(E\epsilon^2)^2 + \frac{c}{c_1}\,\frac{1}{(I^{1/2} - 1)^2}\,s_0^2,\qquad(70)$$

where $c \sim Kn^{-2/3}$, with $K$ the number of grids used originally to estimate $\langle X,X\rangle_T$.

Table 1. Coefficients of $(E\epsilon^2)^2$ and $\hat s_0^2$ When $c_1 = c$

  I    coeff($\hat s_0^2$)    coeff($(E\epsilon^2)^2$)
  3    1.866                  $-3.611c^{-2}$
  4    1.000                  $1.5000c^{-2}$

Normally, one would take $c_1 = c$. Hence an estimator $\hat s^2$ can be found from $\hat s_0^2$ and $\widehat{E\epsilon^2}$. When $c_1 = c$, we argue that the optimal choice is $I = 3$ or $4$, as follows. The coefficients in (70) become

$$\mathrm{coeff}(s_0^2) = \bigl(I^{1/2} - 1\bigr)^{-2}$$

and

$$\mathrm{coeff}\bigl((E\epsilon^2)^2\bigr) = 8c^{-2}\bigl(I^{1/2} - 1\bigr)^{-2}f(I),$$

where $f(I) = I - 2I^{1/2} - I^{-2} + I^{-1}$. For $I \ge 2$, $f(I)$ is increasing, and $f(I)$ crosses 0 for $I$ between 3 and 4. These, therefore, are the two integer values of $I$ that give the lowest ratio of $\mathrm{coeff}((E\epsilon^2)^2)/\mathrm{coeff}(s_0^2)$. Using $I = 3$ or 4, therefore, would maximally insulate against $(E\epsilon^2)^2$ dominating over $s_0^2$. This is desirable because $s_0^2$ is the estimator carrying the information about $\eta^2$. Numerical values for the coefficients are given in Table 1. If $c$ were such that $(E\epsilon^2)^2$ still overwhelms $s_0^2$, then a choice of $c_1 \ne c$ should be considered.
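The claim about $I = 3$ or $4$ is easy to verify numerically. A minimal sketch (our function names) reproduces the Table 1 coefficients under $c_1 = c$:

```python
def f(I):
    # f(I) = I - 2*sqrt(I) - I^-2 + I^-1, the factor in the
    # (E eps^2)^2 coefficient of (70) when c1 = c.
    return I - 2 * I ** 0.5 - I ** -2 + I ** -1

def coeff_s0(I):
    # Coefficient of s0^2 in (70) when c1 = c.
    return (I ** 0.5 - 1) ** -2

def coeff_noise_times_c2(I):
    # Coefficient of (E eps^2)^2 in (70) when c1 = c, multiplied by c^2.
    return 8 * coeff_s0(I) * f(I)
```

With these definitions, `coeff_s0(3)` is about 1.866 and `coeff_s0(4)` equals 1.000, while the noise coefficients come out near $-3.611c^{-2}$ and $1.500c^{-2}$, matching Table 1; `f` changes sign between $I = 3$ and $I = 4$.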

7. MONTE CARLO EVIDENCE

In this article we have discussed five approaches to dealing with the impact of market microstructure noise on realized volatility. In this section we examine the performance of each approach in simulations and compare the results with those predicted by the aforementioned asymptotic theory.

7.1 Simulations Design

We use the stochastic volatility model of Heston (1993) as our data-generating process,

$$dX_t = (\mu - v_t/2)\,dt + \sigma_t\,dB_t\qquad(71)$$

and

$$dv_t = \kappa(\alpha - v_t)\,dt + \gamma v_t^{1/2}\,dW_t,\qquad(72)$$

where $v_t = \sigma_t^2$. We assume that the parameters $(\mu,\kappa,\alpha,\gamma)$ and $\rho$, the correlation coefficient between the two Brownian motions $B$ and $W$, are constant in this model. We also assume Feller's condition $2\kappa\alpha \ge \gamma^2$, to make the zero boundary unattainable by the volatility process.

We simulate $M = 25{,}000$ sample paths of the process using the Euler scheme at a time interval $\Delta t = 1$ second, and set the parameter values at values reasonable for a stock: $\mu = .05$, $\kappa = 5$, $\alpha = .04$, $\gamma = .5$, and $\rho = -.5$. As for the market microstructure noise $\epsilon$, we assume that it is Gaussian and small. Specifically, we set $(E\epsilon^2)^{1/2} = .0005$ (i.e., the standard deviation of the noise is .05% of the value of the asset price). We purposefully set this value to be smaller than the values calibrated using the empirical market microstructure literature.
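A sketch of this simulation design for a single path is below, under the stated parameters. The variable names are ours, and truncating $v$ at zero is one common numerical convention for the Euler scheme near the boundary, not something the paper specifies.

```python
import numpy as np

def simulate_heston_with_noise(T=1.0 / 252, n=23400, mu=0.05, kappa=5.0,
                               alpha=0.04, gamma=0.5, rho=-0.5, v0=0.04,
                               noise_sd=0.0005, rng=None):
    # Euler discretization of (71)-(72) at dt = T/n (one second when
    # n = 23400), returning noisy observations Y = X + eps and the
    # true integrated variance int_0^T sigma_t^2 dt.
    rng = np.random.default_rng() if rng is None else rng
    dt = T / n
    X = np.empty(n + 1)
    X[0], v, iv = 0.0, v0, 0.0
    for i in range(n):
        dB, dZ = rng.standard_normal(2) * np.sqrt(dt)
        dW = rho * dB + np.sqrt(1.0 - rho**2) * dZ   # Brownian correlation rho
        iv += v * dt                                  # accumulate int sigma^2
        X[i + 1] = X[i] + (mu - v / 2.0) * dt + np.sqrt(v) * dB
        v = max(v + kappa * (alpha - v) * dt + gamma * np.sqrt(v) * dW, 0.0)
    Y = X + noise_sd * rng.standard_normal(n + 1)     # model (4)
    return Y, iv
```

Repeating this $M$ times and applying each of the five estimators to $Y$ reproduces the design behind Table 2; note that Feller's condition $2\kappa\alpha = .4 \ge \gamma^2 = .25$ holds for these parameters.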

On each simulated sample path, we estimate $\langle X,X\rangle_T$ over $T = 1$ day (i.e., $T = 1/252$, because the parameter values are all annualized) using the five estimation strategies described earlier: $[Y,Y]^{(\mathrm{all})}_T$, $[Y,Y]^{(\mathrm{sparse})}_T$, $[Y,Y]^{(\mathrm{sparse,opt})}_T$, $[Y,Y]^{(\mathrm{avg})}_T$, and, finally, $\widehat{\langle X,X\rangle}{}^{(\mathrm{adj})}_T$. We assume that a day consists of 6.5 hours of open trading, as is the case on the NYSE and NASDAQ. For the fourth-best estimator, we represent this approach using sparse sampling at a frequency of once every 5 minutes.

7.2 Simulations Results

Table 2 reports the simulation results for the five estimation strategies. For each estimator, the bias is calculated as $E[\text{estimator} - \langle X,X\rangle_T]$. The variance row reports the quantity

$$E\left[\mathrm{var}\left(\text{estimator} - \langle X,X\rangle_T \,\Big|\, \int_0^T\sigma_t^2\,dt,\ \int_0^T\sigma_t^4\,dt\right)\right].\qquad(73)$$

The rows marked "relative" report the corresponding results in percentage terms, that is, for

$$\frac{\text{estimator} - \langle X,X\rangle_T}{\langle X,X\rangle_T}.$$

Note that although the convergence of the discretization error is stable, as opposed to conditional, (73) seems like a reasonable stand-in for the theoretical quantity under consideration.

In each case, "small sample" represents the average value over the $M$ simulated paths, and "asymptotic" refers to the value predicted by our theory. The noise parts of the asymptotic variances are taken to include the relevant second-order terms, as given by (20), (37), and (60), yielding spreads like (in the latter case) (61). The small-sample variance is computed by binning the results of the simulations according to the values of the pair $(\int_0^T\sigma_t^2\,dt, \int_0^T\sigma_t^4\,dt)$, calculating the variance within each bin, and then averaging across the bins, weighted by the number of elements in each bin.
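The binning computation just described can be sketched as follows; the implementation choices (marginal quantile bins on each integral, sample variance within bins of at least two paths) are ours, since the paper does not specify the bin construction.

```python
import numpy as np

def binned_conditional_variance(err, iv2, iv4, n_bins=20):
    # Approximate (73): variance of the estimation error conditional on
    # (int sigma^2 dt, int sigma^4 dt), via within-bin variances averaged
    # with weights proportional to bin counts.
    edges2 = np.quantile(iv2, np.linspace(0, 1, n_bins + 1))
    edges4 = np.quantile(iv4, np.linspace(0, 1, n_bins + 1))
    b2 = np.clip(np.searchsorted(edges2, iv2, side="right") - 1, 0, n_bins - 1)
    b4 = np.clip(np.searchsorted(edges4, iv4, side="right") - 1, 0, n_bins - 1)
    total, count = 0.0, 0
    for key in set(zip(b2, b4)):
        mask = (b2 == key[0]) & (b4 == key[1])
        m = int(mask.sum())
        if m > 1:                          # need at least 2 paths per bin
            total += m * err[mask].var(ddof=1)
            count += m
    return total / count
```

Binning removes the across-bin variation of the conditional mean, so the result should sit below the unconditional variance whenever the error depends systematically on the two integrals.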

Comparing the rows showing small-sample results to their asymptotic predictions, we see that the asymptotic theory provides a good approximation to the first two moments of the small-sample distribution of the five estimators. Comparing across columns, we see that the differences between the five estimators are quite large. As expected, the naive fifth-best strategy produces disastrous results. The first-best strategy results in a large decrease of root MSE (RMSE) relative to the others, including, in particular, the fourth-best strategy, which is the approach currently used in most of the empirical literature. But, at least for the parameter values used here, determining the optimal sampling frequency (the sparse optimal, or third-best, strategy) results in a modest improvement over the arbitrary selection of a sampling frequency (the sparse, or fourth-best, strategy). Using all of the data, as in the subsampled and averaged estimator (second-best), but especially the bias correction by combining two time scales, as in our first-best approach, is where the major practical improvements reside.

It may seem surprising that the variance of the estimator is reduced so much from the second-best to the first-best estimator. After all, if one sets $c = Kn^{-2/3}$, then the asymptotic normal term in (7) becomes

$$\frac{1}{n^{1/6}}\Biggl[\underbrace{\frac{4}{c^2}(E\epsilon^4)}_{\text{due to noise}} + \underbrace{c\,\frac{4T}{3}\int_0^T\sigma_t^4\,dt}_{\text{due to discretization}}\Biggr]^{1/2}_{\text{total variance}} Z_{\text{total}}.$$

This is quite similar to the error term in (9). However, the $c$ for the second-best estimator is chosen by (8), whereas for the first-best case it is given by (10). The two optima are very different, because for the second-best estimator it is chosen by a bias–variance trade-off, whereas for the first-best estimator it is chosen to minimize the asymptotic variance. From this standpoint, it is quite reasonable for the first-best estimator to also have substantially better asymptotic variance. In fact, from (8) and (10), it is easy to see that, for the optimal choices,

$$c^*_{\text{second best}} = n^{1/3}2^{-1/3}\left(\frac{E\epsilon^4}{(E\epsilon^2)^2}\right)^{1/3}c^*_{\text{first best}},$$

whence the total variance is in fact quite different for the two estimators.

Figure 1 shows the small-sample and asymptotic RMSE of the $[Y,Y]^{(\mathrm{sparse})}_T$ estimator as a function of the subsampling interval, illustrating the minimization problem studied in Section 2.3. Finally, Figure 2 shows the standardized distribution of the first-best estimator obtained from the simulations (histogram) and the corresponding asymptotic distribution predicted by our theory (solid line). Due to the effect of bias, the distributions of the fifth- to second-best estimators are irrelevant, because they cannot be used to set intervals for the volatility.

8. DISCUSSION

In this work we have quantified and corrected the effect of noise on the nonparametric assessment of integrated volatility. In the setting of high-frequency data, the usual financial practice is to use sparse sampling, in other words, throwing

Table 2. Monte Carlo Simulations for the Five Estimation Strategies

                                 Fifth-best        Fourth-best         Third-best              Second-best       First-best
                                 $[Y,Y]^{(all)}_T$ $[Y,Y]^{(sparse)}_T$ $[Y,Y]^{(sparse,opt)}_T$ $[Y,Y]^{(avg)}_T$ $\widehat{\langle X,X\rangle}^{(adj)}_T$
Small-sample bias                1.1699 x 10^-2    3.89 x 10^-5        2.18 x 10^-5            1.926 x 10^-5     2 x 10^-8
Asymptotic bias                  1.1700 x 10^-2    3.90 x 10^-5        2.20 x 10^-5            1.927 x 10^-5     0
Small-sample variance            1.791 x 10^-8     1.4414 x 10^-9      1.59 x 10^-9            9.41 x 10^-10     9 x 10^-11
Asymptotic variance              1.788 x 10^-8     1.4409 x 10^-9      1.58 x 10^-9            9.37 x 10^-10     8 x 10^-11
Small-sample RMSE                1.1699 x 10^-2    5.437 x 10^-5       4.543 x 10^-5           3.622 x 10^-5     9.4 x 10^-6
Asymptotic RMSE                  1.1700 x 10^-2    5.442 x 10^-5       4.546 x 10^-5           3.618 x 10^-5     8.9 x 10^-6
Small-sample relative bias       182               6.1                 1.8                     1.5               -.0045
Small-sample relative variance   82,502            115                 11                      5.3               .43
Small-sample relative RMSE       340               12.4                3.7                     2.8               .65


Figure 1. RMSE of the Sparse Estimator as a Function of the Subsampling Frequency (small-sample RMSE; asymptotic RMSE).

away most of the available data. We have argued that this is caused by not incorporating the noise in the model. Although it is statistically unsound to throw away data, we have shown that it is possible to build on this practice to construct estimators that make statistical sense.

Specifically, we have found that the usual realized volatility mainly estimates the magnitude of the noise term rather than anything to do with volatility. An approach built on separating the observations into multiple "grids" lessens this problem. We found that the best results can be obtained by combining the usual ("single-grid") realized volatility with the multiple-grid–based device. This gives an estimator that is approximately unbiased; we have also shown how to assess the (random) variance of this estimator. We also show that for our most recommended procedure, the optimal number of multiple grids is of order $O(n^{2/3})$ (see Sec. 4). Most of the development is in the context of finding the integrated volatility over one time period; at the end, we extend this to multiple periods. Also, in the case where the noise can be taken to be almost negligible, we provide a way of optimizing the sampling frequency if one wishes to use the classical "realized volatility" or its multigrid extension.

One important message of the article is that any time one has an impulse to sample sparsely, one can always do better with a multigrid method, regardless of the model or the quantity being estimated.

Figure 2. Asymptotic and Small-Sample Standardized Distributions of the First-Best Estimator.


Our results in this article cover the case where the latent $X$ follows the Itô process model (1). Theorem 1 remains true under the assumption that $X$ is a general semimartingale, that is, a process $X$ that can also have jumps. This follows because the proof of Lemma A.2 remains true for a general semimartingale $X$ (Jacod and Shiryaev 2003, thm. I.4.47, p. 52). Also, apart from using Lemma A.2, Theorem A.1 is unrelated to the nature of the $X$ process. As far as the effect of jumps on the discretization effect is concerned, it should be noted that consistency holds, as follows. When $X$ is a general semimartingale, then for any sequence of grids $\mathcal{H}_n$ so that (41) holds, $[X,X]^{\mathcal{H}_n}$ converges to the continuous-time quadratic variation of $X$, that is,

$$\int_0^T\sigma_t^2\,dt + \sum_{0<t\le T}(\text{jump of }X\text{ at }t)^2\qquad(74)$$

(cf. Jacod and Shiryaev 2003, thm. I.4.47, p. 52). In particular, this also applies to subgrids under the regularity conditions that we have discussed. Furthermore, by extension of the same proof, $[X,X]^{(\mathrm{avg})}$ also converges to (74). From the same argument, so does $\widehat{\langle X,X\rangle}_T$. In this sense, the two-scales estimator is robust to jumps.

APPENDIX: PROOFS OF RESULTS

When the total grid $\mathcal{G}$ is considered, we use $\sum_{i=0}^{n-1}$, $\sum_{t_{i+1}\le T}$, and $\sum_{t_i\in\mathcal{G}}$ interchangeably in the following proofs. Also, we write "$|X$" to indicate expressions that are conditional on the entire $X$ process.

A.1 Variance of $[Y,Y]_T$ Given the $X$ Process

Here we calculate explicitly the variance in (19), from which the stated approximation follows. The explicit remainder term is also used for (36). Let a partition of $[0,T]$ be $0 = t_0 \le t_1 \le \cdots \le t_n = T$. Under assumption (11),

$$\mathrm{var}\bigl([Y,Y]^{(\mathrm{all})}_T \bigm| X\bigr) = \mathrm{var}\Biggl[\sum_{t_{i+1}\le T}(\Delta Y_{t_i})^2 \Bigm| X\Biggr]$$
$$= \underbrace{\sum_{t_{i+1}\le T}\mathrm{var}\bigl[(\Delta Y_{t_i})^2 \bigm| X\bigr]}_{I_T} + 2\underbrace{\sum_{t_{i+1}\le T}\mathrm{cov}\bigl[(\Delta Y_{t_{i-1}})^2,(\Delta Y_{t_i})^2 \bigm| X\bigr]}_{II_T},$$

because $\Delta Y_{t_i} = \Delta X_{t_i} + \Delta\epsilon_{t_i}$ is 1-dependent given the $X$ process.

$$\mathrm{var}\bigl[(\Delta Y_{t_i})^2 \bigm| X\bigr] = \kappa_4(\Delta Y_{t_i}|X) + 2\bigl[\mathrm{var}(\Delta Y_{t_i}|X)\bigr]^2 + 4\bigl[E(\Delta Y_{t_i}|X)\bigr]^2\,\mathrm{var}(\Delta Y_{t_i}|X) + 4E(\Delta Y_{t_i}|X)\,\kappa_3(\Delta Y_{t_i}|X)$$
$$= \kappa_4(\Delta\epsilon_{t_i}) + 2\bigl[\mathrm{var}(\Delta\epsilon_{t_i})\bigr]^2 + 4(\Delta X_{t_i})^2\,\mathrm{var}(\Delta\epsilon_{t_i}) + 4(\Delta X_{t_i})\,\kappa_3(\Delta\epsilon_{t_i})\quad\text{(under Assumption 11)}$$
$$= 2\kappa_4(\epsilon) + 8(E\epsilon^2)^2 + 8(\Delta X_{t_i})^2E\epsilon^2,$$

because $\kappa_3(\Delta\epsilon_{t_i}) = 0$. The $\kappa$'s are the cumulants of the relevant order:

$$\kappa_1(\epsilon) = 0,\quad \kappa_2(\epsilon) = \mathrm{var}(\epsilon) = E(\epsilon^2),\quad \kappa_3(\epsilon) = E\epsilon^3,\quad \kappa_4(\epsilon) = E(\epsilon^4) - 3(E\epsilon^2)^2.\qquad(\mathrm{A.1})$$

So $I_T = n\bigl(2\kappa_4(\epsilon) + 8(E\epsilon^2)^2\bigr) + 8[X,X]^{(\mathrm{all})}_T E\epsilon^2$. Similarly, for the covariance,

$$\mathrm{cov}\bigl[(\Delta Y_{t_{i-1}})^2,(\Delta Y_{t_i})^2 \bigm| X\bigr] = \mathrm{cov}\bigl[(\Delta\epsilon_{t_{i-1}})^2,(\Delta\epsilon_{t_i})^2\bigr] + 4\Delta X_{t_{i-1}}\Delta X_{t_i}\,\mathrm{cov}\bigl(\Delta\epsilon_{t_{i-1}},\Delta\epsilon_{t_i}\bigr)$$
$$\quad + 2\Delta X_{t_{i-1}}\,\mathrm{cov}\bigl[\Delta\epsilon_{t_{i-1}},(\Delta\epsilon_{t_i})^2\bigr] + 2\Delta X_{t_i}\,\mathrm{cov}\bigl[(\Delta\epsilon_{t_{i-1}})^2,\Delta\epsilon_{t_i}\bigr]$$
$$= \kappa_4(\epsilon) + 2(E\epsilon^2)^2 - 4\Delta X_{t_{i-1}}\Delta X_{t_i}\kappa_2(\epsilon) - 2\Delta X_{t_i}\kappa_3(\epsilon) + 2\Delta X_{t_{i-1}}\kappa_3(\epsilon),\qquad(\mathrm{A.2})$$

because of (A.1). Thus, summing the terms in (A.2),

$$II_T = 2(n-1)\bigl(\kappa_4(\epsilon) + 2(E\epsilon^2)^2\bigr) - 8E\epsilon^2\sum_{t_{i+1}\le T}\Delta X_{t_{i-1}}\Delta X_{t_i} - 4\kappa_3(\epsilon)\bigl(\Delta X_{t_{n-1}} - \Delta X_{t_0}\bigr).$$

Amalgamating the two expressions, one obtains

$$\mathrm{var}\bigl([Y,Y]^{(\mathrm{all})}_T \bigm| X\bigr) = n\bigl(2\kappa_4(\epsilon) + 8(E\epsilon^2)^2\bigr) + 8[X,X]^{(\mathrm{all})}_T E\epsilon^2 + 2(n-1)\bigl(\kappa_4(\epsilon) + 2(E\epsilon^2)^2\bigr)$$
$$\quad - 8E\epsilon^2\sum\Delta X_{t_{i-1}}\Delta X_{t_i} - 4\kappa_3(\epsilon)\bigl(\Delta X_{t_{n-1}} - \Delta X_{t_0}\bigr)$$
$$= 4nE\epsilon^4 + R_n,\qquad(\mathrm{A.3})$$

where the remainder term $R_n$ satisfies

$$|R_n| \le 8E\epsilon^2[X,X]_T + 2\bigl(\kappa_4(\epsilon) + 2(E\epsilon^2)^2\bigr) + 8E\epsilon^2\sum\bigl|\Delta X_{t_{i-1}}\bigr|\bigl|\Delta X_{t_i}\bigr| + 4|\kappa_3(\epsilon)|\bigl(\bigl|\Delta X_{t_{n-1}}\bigr| + \bigl|\Delta X_{t_0}\bigr|\bigr)$$
$$\le 16E\epsilon^2[X,X]^{(\mathrm{all})}_T + 2\bigl(\kappa_4(\epsilon) + 2(E\epsilon^2)^2\bigr) + 2|\kappa_3(\epsilon)|\bigl(2 + [X,X]_T\bigr),\qquad(\mathrm{A.4})$$

by the Cauchy–Schwarz inequality and because $|x| \le (1 + x^2)/2$. Because $[X,X]^{(\mathrm{all})}_T = O_p(1)$, (19) follows.

Under slightly stronger conditions (e.g., $|\mu_t|$ and $\sigma_t$ bounded above by a constant), $\sum\Delta X_{t_{i-1}}\Delta X_{t_i}$ is a near-martingale and of order $O_p(n^{-1/2})$, and similarly $\Delta X_{t_{n-1}} - \Delta X_{t_0} = O_p(n^{-1/2})$, from which (20) follows.

Similarly, to obtain a higher-order approximation to the variance of the estimator $\widehat{\langle X,X\rangle}_T$, note that

$$\widehat{\langle X,X\rangle}_T - \langle X,X\rangle_T = \left([Y,Y]^{(\mathrm{avg})}_T - \frac{1}{K}[Y,Y]^{(\mathrm{all})}_T - [X,X]^{(\mathrm{avg})}_T\right) + \bigl([X,X]^{(\mathrm{avg})}_T - \langle X,X\rangle_T\bigr).\qquad(\mathrm{A.5})$$

Now, recalling (20) and (37), and in addition

$$\mathrm{cov}\bigl([Y,Y]^{(\mathrm{all})},[Y,Y]^{(\mathrm{avg})} \bigm| X\bigr) = 2\bigl(E\epsilon^4 - (E\epsilon^2)^2\bigr)\frac{2n-1}{K} + 8E\epsilon^2[X,X]^{(\mathrm{all})}_T\frac{1}{K} + o_p\!\left(\frac{1}{K}\right),$$

we see that

$$\mathrm{var}\bigl(\widehat{\langle X,X\rangle}_T \bigm| X\bigr) = \mathrm{var}\bigl([Y,Y]^{(\mathrm{avg})}_T \bigm| X\bigr) + \frac{1}{K^2}\mathrm{var}\bigl([Y,Y]^{(\mathrm{all})}_T \bigm| X\bigr) - \frac{2}{K}\mathrm{cov}\bigl([Y,Y]^{(\mathrm{avg})}_T,[Y,Y]^{(\mathrm{all})}_T \bigm| X\bigr)$$
$$= \frac{4\bar n}{K}E\epsilon^4 + \frac{1}{K}\bigl[8[X,X]^{(\mathrm{avg})}_T E\epsilon^2 - 2\bigl(E\epsilon^4 - (E\epsilon^2)^2\bigr)\bigr] + o_p\!\left(\frac{1}{K}\right)$$
$$\quad + \frac{1}{K^2}\bigl[4nE\epsilon^4 + \bigl(8[X,X]^{(\mathrm{all})}_T E\epsilon^2 - 2(E\epsilon^4 - (E\epsilon^2)^2)\bigr) + o_p(1)\bigr]$$
$$\quad - \frac{2}{K}\left[2\bigl(E\epsilon^4 - (E\epsilon^2)^2\bigr)\frac{2n-1}{K} + 8E\epsilon^2[X,X]^{(\mathrm{all})}_T\frac{1}{K} + o_p\!\left(\frac{1}{K}\right)\right]$$
$$= \frac{\bar n}{K}8(E\epsilon^2)^2 + \frac{1}{K}\bigl[8[X,X]^{(\mathrm{avg})}_T E\epsilon^2 - 2\bigl(E\epsilon^4 - (E\epsilon^2)^2\bigr)\bigr] + o_p\!\left(\frac{1}{K}\right),$$

because at the optimal choice, $K = cn^{2/3}$ and $\bar n = c^{-1}n^{1/3}$. Therefore, (60) holds.

A.2 The Relevant Central Limit Theorem

Lemma A.2. Suppose that $X$ is an Itô process. Suppose that $Y$ is related to $X$ through model (4). Then, under assumption (11) and definitions (16) and (32),

$$[Y,Y]^{(\mathrm{all})}_T = [\epsilon,\epsilon]^{(\mathrm{all})}_T + O_p(1)\quad\text{and}\quad[Y,Y]^{(\mathrm{avg})}_T = [\epsilon,\epsilon]^{(\mathrm{avg})}_T + [X,X]^{(\mathrm{avg})}_T + O_p\!\left(\frac{1}{\sqrt K}\right).$$

Proof. (a) The one-grid case:

$$[Y,Y]^{(\mathrm{all})}_T = [X,X]^{(\mathrm{all})}_T + [\epsilon,\epsilon]^{(\mathrm{all})}_T + 2[X,\epsilon]^{(\mathrm{all})}_T.\qquad(\mathrm{A.6})$$

We show that

$$E\bigl(([X,\epsilon]^{(\mathrm{all})}_T)^2 \bigm| X\bigr) = O_p(1),\qquad(\mathrm{A.7})$$

and, in particular,

$$[X,\epsilon]^{(\mathrm{all})}_T = O_p(1).\qquad(\mathrm{A.8})$$

To see (A.7),

$$[X,\epsilon]^{(\mathrm{all})}_T = \sum_{i=0}^{n-1}\Delta X_{t_i}\Delta\epsilon_{t_i} = \sum_{i=0}^{n-1}\Delta X_{t_i}\epsilon_{t_{i+1}} - \sum_{i=0}^{n-1}\Delta X_{t_i}\epsilon_{t_i}$$
$$= \sum_{i=1}^{n-1}\bigl(\Delta X_{t_{i-1}} - \Delta X_{t_i}\bigr)\epsilon_{t_i} + \Delta X_{t_{n-1}}\epsilon_{t_n} - \Delta X_{t_0}\epsilon_{t_0}.\qquad(\mathrm{A.9})$$

Because $E([X,\epsilon]^{(\mathrm{all})}_T | X) = 0$ and the $\epsilon_{t_i}$ are iid for different $t_i$, we get

$$E\bigl(([X,\epsilon]^{(\mathrm{all})}_T)^2 \bigm| X\bigr) = \mathrm{var}\bigl([X,\epsilon]^{(\mathrm{all})}_T \bigm| X\bigr) = E\epsilon^2\Biggl[\sum_{i=1}^{n-1}\bigl(\Delta X_{t_{i-1}} - \Delta X_{t_i}\bigr)^2 + \bigl(\Delta X_{t_{n-1}}\bigr)^2 + \bigl(\Delta X_{t_0}\bigr)^2\Biggr]$$
$$= 2[X,X]_T E\epsilon^2 - 2E\epsilon^2\sum_{i=1}^{n-1}\Delta X_{t_{i-1}}\Delta X_{t_i} \le 4[X,X]_T E\epsilon^2,\qquad(\mathrm{A.10})$$

by the Cauchy–Schwarz inequality, from which, and from $[X,X]^{(\mathrm{all})}_T$ being of order $O_p(1)$, (A.7) follows. Hence (A.8) follows by the Markov inequality.

(b) The multiple-grid case. Notice that

$$[Y,Y]^{(\mathrm{avg})} = [X,X]^{(\mathrm{avg})}_T + [\epsilon,\epsilon]^{(\mathrm{avg})}_T + 2[X,\epsilon]^{(\mathrm{avg})}_T.\qquad(\mathrm{A.11})$$

(A.11) strictly follows from model (4) and the definitions of grids and $[\cdot,\cdot]^{(\mathrm{avg})}_t$; see Section 3.2.

We need to show that

$$E\bigl(([X,\epsilon]^{(\mathrm{avg})}_T)^2 \bigm| X\bigr) = O_p\!\left(\frac{1}{K}\right),\qquad(\mathrm{A.12})$$

in particular,

$$[X,\epsilon]^{(\mathrm{avg})}_T = O_p\!\left(\frac{1}{K^{1/2}}\right),\qquad(\mathrm{A.13})$$

and $\mathrm{var}([X,\epsilon]^{(\mathrm{avg})}_T | X) = E[([X,\epsilon]^{(\mathrm{avg})}_T)^2 | X]$. To show (A.12), note that $E([X,\epsilon]^{(\mathrm{avg})}_T | X) = 0$ and

$$E\bigl[([X,\epsilon]^{(\mathrm{avg})}_T)^2 \bigm| X\bigr] = \mathrm{var}\bigl([X,\epsilon]^{(\mathrm{avg})}_T \bigm| X\bigr) = \frac{1}{K^2}\sum_{k=1}^{K}\mathrm{var}\bigl([X,\epsilon]^{(k)}_T \bigm| X\bigr) \le \frac{4E\epsilon^2}{K}[X,X]^{(\mathrm{avg})}_T = O_p\!\left(\frac{1}{K}\right),$$

where the second equality follows from the disjointness of the different grids, as well as $\epsilon \perp X$. The inequality follows from the same argument as in (A.10). Then the order follows because $[X,X]^{(\mathrm{avg})}_T = O_p(1)$; see the method of Mykland and Zhang (2002) for a rigorous development of the order of $[X,X]^{(\mathrm{avg})}_T$.

Theorem A.1. Suppose that $X$ is an Itô process of form (1). Suppose that $Y$ is related to $X$ through model (4), and that (11) is satisfied with $E\epsilon^4 < \infty$. Also suppose that $t_i$ and $t_{i+1}$ are not in the same subgrid for any $i$. Under assumption (33), as $n \to \infty$,

$$\left(\sqrt n\bigl(\widehat{E\epsilon^2} - E\epsilon^2\bigr),\ \sqrt{\frac{K}{\bar n}}\bigl([Y,Y]^{(\mathrm{avg})}_T - [X,X]^{(\mathrm{avg})}_T - 2\bar nE\epsilon^2\bigr)\right)$$

converges in law to a bivariate normal, with mean 0 and covariance matrix

$$\begin{pmatrix}E\epsilon^4 & 2\,\mathrm{var}(\epsilon^2)\\[2pt] 2\,\mathrm{var}(\epsilon^2) & 4E\epsilon^4\end{pmatrix},\qquad(\mathrm{A.14})$$

conditional on the $X$ process, where the limiting random variable is independent of the $X$ process.

Proof. By Lemma A.2, we need the distribution of $[\epsilon,\epsilon]^{(\mathrm{avg})}$ and $[\epsilon,\epsilon]^{(\mathrm{all})}$. First we explore the convergence of

$$\frac{1}{\sqrt n}\bigl([\epsilon,\epsilon]^{(\mathrm{all})}_T - 2nE\epsilon^2,\ [\epsilon,\epsilon]^{(\mathrm{avg})}_T K - 2\bar nKE\epsilon^2\bigr).\qquad(\mathrm{A.15})$$

Recall that all of the sampling points $t_0,t_1,\dots,t_n$ are within $[0,T]$. We use $\mathcal{G}$ to denote the time points in the full sampling, as in the single grid. $G^{(k)}$ denotes the subsampling from the $k$th grid. As before, if $t_i \in G^{(k)}$, then $t_{i-}$ and $t_{i+}$ are the previous and next elements in $G^{(k)}$; $\epsilon_{t_{i-}} = 0$ for $t_i = \min G^{(k)}$, and $\epsilon_{t_{i+}} = 0$ for $t_i = \max G^{(k)}$.

Set

$$M^{(1)}_T = \frac{1}{\sqrt n}\sum_{t_i\in\mathcal{G}}\bigl(\epsilon^2_{t_i} - E\epsilon^2\bigr),$$
$$M^{(2)}_T = \frac{1}{\sqrt n}\sum_{t_i\in\mathcal{G}}\epsilon_{t_i}\epsilon_{t_{i-1}},\qquad(\mathrm{A.16})$$
$$M^{(3)}_T = \frac{1}{\sqrt n}\sum_{k=1}^{K}\sum_{t_i\in G^{(k)}}\epsilon_{t_i}\epsilon_{t_{i-}}.$$

We first find the asymptotic distribution of $(M^{(1)}_T,M^{(2)}_T,M^{(3)}_T)$ using the martingale central limit theorem. Then we use the result to find the limit of (A.15).

Note that $(M^{(1)}_T,M^{(2)}_T,M^{(3)}_T)$ are the end points of martingales with respect to the filtration $\mathcal{F}_i = \sigma(\epsilon_{t_j}, j \le i; X_t, \text{all } t)$. We now derive its (discrete-time) predictable quadratic variation $\langle M^{(l)},M^{(k)}\rangle$, $l,k = 1,2,3$ [discrete-time predictable quadratic variations are only used in this proof and the proof of Lemma 1, and are different from the continuous-time quadratic variations in (12)]:

$$\langle M^{(1)},M^{(1)}\rangle_T = \frac{1}{n}\sum_{t_i\in\mathcal{G}}\mathrm{var}\bigl(\epsilon^2_{t_i} - E\epsilon^2 \bigm| \mathcal{F}_{i-1}\bigr) = \mathrm{var}(\epsilon^2),$$
$$\langle M^{(2)},M^{(2)}\rangle_T = \frac{1}{n}\sum_{t_i\in\mathcal{G}}\mathrm{var}\bigl(\epsilon_{t_i}\epsilon_{t_{i-1}} \bigm| \mathcal{F}_{i-1}\bigr) = \frac{E\epsilon^2}{n}\sum_{t_i\in\mathcal{G}}\epsilon^2_{t_{i-1}} = (E\epsilon^2)^2 + o_p(1),\qquad(\mathrm{A.17})$$
$$\langle M^{(3)},M^{(3)}\rangle_T = \frac{1}{n}\sum_{k=1}^{K}\sum_{t_i\in G^{(k)}}\mathrm{var}\bigl(\epsilon_{t_i}\epsilon_{t_{i-}} \bigm| \mathcal{F}_{i-1}\bigr) = \frac{E\epsilon^2}{n}\sum_{k=1}^{K}\sum_{t_i\in G^{(k)}}\epsilon^2_{t_{i-}} = (E\epsilon^2)^2 + o_p(1),$$

by the law of large numbers. Similarly, for the predictable quadratic covariations,

by the law of large numbersSimilarly for the predictable quadratic covariations

langM(1)M(2)

rangT = 1

n

sum

tiisinGcov(ε2

ti minus Eε2 εtiεtiminus1

∣∣Ftiminus1

)

= Eε3 1

n

sum

tiisinGεtiminus1 = op(1)

langM(1)M(3)

rangT = 1

n

Ksum

k=1

sum

tiisinG(k)

cov(ε2

ti minus Eε2 εtiεtiminus∣∣Ftiminus1

)

(A18)

= Eε3 1

n

Ksum

k=1

sum

tiisinG(k)

εtiminus = op(1)

langM(2)M(3)

rangT = 1

n

Ksum

k=1

sum

tiisinG(k)

cov(εtiεtiminus1 εtiεtiminus

∣∣Ftiminus1

)

= Eε2

n

Ksum

k=1

sum

tiisinG(k)

εtiminus1εtiminus = op(1)

because ti+1 is not in the same grid as tiBecause the εti rsquos are iid and Eε4

ti lt infin the conditional Lin-deberg conditions are satisfied Hence by the martingale centrallimit theorem (see Hall and Heyde 1980 condition 31 p 58)(M(1)M(2)M(3)) are asymptotically normal with covariance ma-trix as the asymptotic value of 〈M(l)M(k)〉 In other words asymp-totically (M(1)M(2)M(3)) are independent normal with respectivevariances var(ε) (Eε2)2 and (Eε2)2

Returning to (A.15), we have that

$$[\epsilon,\epsilon]^{(\mathrm{all})} - 2nE\epsilon^2 = 2\sum_{i\ne 0,n}\bigl(\epsilon^2_{t_i} - E\epsilon^2\bigr) + \bigl(\epsilon^2_{t_0} - E\epsilon^2\bigr) + \bigl(\epsilon^2_{t_n} - E\epsilon^2\bigr) - 2\sum_{t_i>0}\epsilon_{t_i}\epsilon_{t_{i-1}}$$
$$= 2\sqrt n\bigl(M^{(1)} - M^{(2)}\bigr) + O_p(1).\qquad(\mathrm{A.19})$$

Meanwhile,

$$[\epsilon,\epsilon]^{(k)} - 2n_kE\epsilon^2 = \sum_{t_i\in G^{(k)},\,t_i\ne\max G^{(k)}}\bigl(\epsilon_{t_{i+}} - \epsilon_{t_i}\bigr)^2 - 2n_kE\epsilon^2$$
$$= 2\sum_{t_i\in G^{(k)}}\bigl(\epsilon^2_{t_i} - E\epsilon^2\bigr) - \bigl(\epsilon^2_{\min G^{(k)}} - E\epsilon^2\bigr) - \bigl(\epsilon^2_{\max G^{(k)}} - E\epsilon^2\bigr) - 2\sum_{t_i\in G^{(k)}}\epsilon_{t_i}\epsilon_{t_{i-}},\qquad(\mathrm{A.20})$$

where $n_k + 1$ represents the total number of sampling points in $G^{(k)}$. Hence

$$[\epsilon,\epsilon]^{(\mathrm{avg})}_T K - 2\bar nKE\epsilon^2 = \sqrt n\bigl(2M^{(1)} - 2M^{(3)}\bigr) - R = 2\sqrt n\bigl(M^{(1)} - M^{(3)}\bigr) + O_p\bigl(K^{1/2}\bigr),\qquad(\mathrm{A.21})$$

because $R = \sum_{k=1}^{K}\bigl[(\epsilon^2_{\min G^{(k)}} - E\epsilon^2) + (\epsilon^2_{\max G^{(k)}} - E\epsilon^2)\bigr]$ satisfies $ER^2 = \mathrm{var}(R) \le 4K\,\mathrm{var}(\epsilon^2)$.

Because $n^{-1}K \to 0$, and because the error terms in (A.19) and (A.20) are uniformly integrable, it follows that

$$(\mathrm{A.15}) = 2\bigl(M^{(1)} - M^{(2)},\ M^{(1)} - M^{(3)}\bigr) + o_p(1).\qquad(\mathrm{A.22})$$

Hence (A.15) is also asymptotically normal, with covariance matrix

$$\begin{pmatrix}4E\epsilon^4 & 4\,\mathrm{var}(\epsilon^2)\\[2pt] 4\,\mathrm{var}(\epsilon^2) & 4E\epsilon^4\end{pmatrix}.$$

By Lemma A.2, and as $n^{-1}K \to 0$,

$$\frac{1}{\sqrt n}\Bigl([Y,Y]^{(\mathrm{all})}_T - 2nE\epsilon^2,\ K\bigl([Y,Y]^{(\mathrm{avg})}_T - [X,X]^{(\mathrm{avg})}_T - 2\bar nE\epsilon^2\bigr)\Bigr)\Bigm|X$$

is asymptotically normal:

$$\frac{1}{\sqrt n}\begin{pmatrix}[Y,Y]^{(\mathrm{all})}_T - 2nE\epsilon^2\\[2pt] [Y,Y]^{(\mathrm{avg})}_T K - 2\bar nKE\epsilon^2 - [X,X]^{(\mathrm{avg})}_T K\end{pmatrix}\Bigm|X = 2\begin{pmatrix}M^{(1)} - M^{(2)}\\[2pt] M^{(1)} - M^{(3)}\end{pmatrix} + o_p(1) \stackrel{\mathcal{L}}{\longrightarrow} 2N\!\left(0,\begin{pmatrix}E\epsilon^4 & \mathrm{var}(\epsilon^2)\\[2pt] \mathrm{var}(\epsilon^2) & E\epsilon^4\end{pmatrix}\right).\qquad(\mathrm{A.23})$$

Because

$$\widehat{E\epsilon^2} = \frac{1}{2n}[Y,Y]^{(\mathrm{all})}_T\quad\text{and}\quad\frac{K}{\sqrt n} = \sqrt{\frac{K}{\bar n}}\bigl(1 + o(1)\bigr),\qquad(\mathrm{A.24})$$

Theorem A.1 follows.

Proof of Lemma 1. For simplicity, and without loss of generality, we assume that $\mathcal{H} = \mathcal{G}$. The result follows the proofs of Lemma A.2 and Theorem A.1, but with a more exact representation of $[Y,Y]^{(\mathrm{all})}_T - [X,X]^{(\mathrm{all})}_T$. Specifically, from (A.9) and (A.19), we have that

$$[Y,Y]^{(\mathrm{all})}_T - [X,X]^{(\mathrm{all})}_T - 2nE\epsilon^2 = [\epsilon,\epsilon]_T - 2nE\epsilon^2 + 2[X,\epsilon]^{(\mathrm{all})}_T$$
$$= 2\sum_{i\ne 0,n}\bigl(\epsilon^2_{t_i} - E\epsilon^2\bigr) + \bigl(\epsilon^2_{t_0} - E\epsilon^2\bigr) + \bigl(\epsilon^2_{t_n} - E\epsilon^2\bigr) - 2\sum_{t_i>0}\epsilon_{t_i}\epsilon_{t_{i-1}}$$
$$\quad + 2\sum_{i=1}^{n-1}\bigl(\Delta X_{t_{i-1}} - \Delta X_{t_i}\bigr)\epsilon_{t_i} + 2\Delta X_{t_{n-1}}\epsilon_{t_n} - 2\Delta X_{t_0}\epsilon_{t_0}.$$

With the same filtration as in the proof of Theorem A.1, this is the sum of a martingale triangular array with increment $2(\epsilon^2_{t_i} - E\epsilon^2) - 2\epsilon_{t_i}\epsilon_{t_{i-1}} + 2(\Delta X_{t_{i-1}} - \Delta X_{t_i})\epsilon_{t_i}$ for $i \ne 0,n$. Again, the conditions of the martingale central limit theorem (see Hall and Heyde 1980, chap. 3) are satisfied by (17), whence the term is asymptotically normal, conditionally on the $X$ process. The conditional variance is given by (20), under the conditions mentioned in the statement of the lemma. Because $E\epsilon^2$ can vary with $n$, the $O_p(n^{-1/2}_{\mathrm{sparse}})$ term in (20) is actually of order $O_p(n^{-1/2}_{\mathrm{sparse}}E\epsilon^2)$.

A.3 Asymptotics of $D_T$

For transparency of notation, we take $\Delta t = T/n$; in other words, the average of the $\Delta t_i$.

Proof of Theorem 2. Note first that because we assume that $\mu_t$ and $\sigma_t$ are continuous, they are locally bounded. Because here we seek to show convergence in probability, we can therefore assume, without loss of generality, that $|\mu_t|$ is bounded above by a nonrandom constant, and also that $\infty > \sigma_+ \ge \sigma_t \ge \sigma_- > 0$, where $\sigma_+$ and $\sigma_-$ are nonrandom. This is by the usual stopping time argument.

Also note that once $\mu_t$ and $\sigma_t$ are bounded, as we have now assumed, by Girsanov's theorem (see, e.g., Karatzas and Shreve 1991, chap. 3.5; Jacod and Shiryaev 2003, chap. II.3b), we can, without loss of generality, further suppose that $\mu_t = 0$ identically. (If the convergence in probability holds under the equivalent probability, then it also will hold under the original one.)

Now for the derivation. First consider

\[
\bigl(X_{t_i} - X_{t_{i,-}}\bigr)^2 = \Bigl(\sum_{t_j = t_{i,-}}^{t_{i-1}} \Delta X_{t_j}\Bigr)^2 = \sum_{t_j = t_{i,-}}^{t_{i-1}} \bigl(\Delta X_{t_j}\bigr)^2 + 2\sum_{t_{i-1} \ge t_k > t_l \ge t_{i,-}} \Delta X_{t_k}\,\Delta X_{t_l}.
\]

It follows that

\[
[X,X]^{(\text{avg})}_T = [X,X]^{(\text{all})}_T + 2\sum_{i=1}^{n-1} \bigl(\Delta X_{t_i}\bigr) \sum_{j=1}^{K \wedge i} \Bigl(1 - \frac{j}{K}\Bigr)\bigl(\Delta X_{t_{i-j}}\bigr) + O_p(K/n).
\]

The remainder term incorporates end effects and has order $O_p(K/n)$

by (41) and (42), and due to the regular allocation of points to grids. Following Jacod and Protter (1998) and Mykland and Zhang (2002), $[X,X]^{(\text{all})}_T - \langle X,X\rangle_T = O_p(1/\sqrt{n}) = o_p(\sqrt{K/n})$, too, under assumption (17) [which is a special case of assumption (41)] and assuming that $\inf_{t\in[0,T]}\sigma_t^2 > 0$ almost surely. In other words,

\[
D_T = [X,X]^{(\text{avg})}_T - \langle X,X\rangle_T = 2\sum_{i=1}^{n-1} \bigl(\Delta X_{t_i}\bigr) \sum_{j=1}^{K \wedge i} \Bigl(1 - \frac{j}{K}\Bigr)\bigl(\Delta X_{t_{i-j}}\bigr) + o_p\bigl(\sqrt{K\Delta t}\bigr). \tag{A25}
\]

It follows that

\[
\langle D,D\rangle_T = 4\sum_{i=1}^{n-1} \bigl(\Delta\langle X,X\rangle_{t_i}\bigr) \Bigl(\sum_{j=1}^{K \wedge i} \Bigl(1 - \frac{j}{K}\Bigr)\bigl(\Delta X_{t_{i-j}}\bigr)\Bigr)^2 + o_p(K\Delta t) = \mathrm{(I)} + \mathrm{(II)} + o_p(K\Delta t), \tag{A26}
\]

where

\[
\mathrm{(I)} = 4\sum_{i=1}^{n-1} \bigl(\Delta\langle X,X\rangle_{t_i}\bigr) \Bigl(\sum_{j=1}^{K \wedge i} \Bigl(1 - \frac{j}{K}\Bigr)^2 \bigl(\Delta X_{t_{i-j}}\bigr)^2\Bigr)
= 4\sum_{i=1}^{n-1} \sigma_{t_i}^4\,\Delta t_i \Bigl(\sum_{j=1}^{K \wedge i} \Bigl(1 - \frac{j}{K}\Bigr)^2 \Delta t_{i-j}\Bigr) + o_p(K\Delta t)
= K\Delta t \sum_{i=1}^{n-1} \sigma_{t_i}^4\,h_i\,\Delta t_i + o_p(K\Delta t),
\]

where $h_i$ is given by (43). Thus, Theorem 2 will have been shown if we can show that

\[
\mathrm{(II)} = 8\sum_{i=1}^{n-1} \Delta\langle X,X\rangle_{t_i}\,\zeta_i \tag{A27}
\]

is of order $o_p(K\Delta t)$, where

\[
\zeta_i = \sum_{i-1 \ge l > r \ge 0} \bigl(\Delta X_{t_l}\bigr)\bigl(\Delta X_{t_r}\bigr) \Bigl(1 - \frac{i-l}{K}\Bigr)^{\!+} \Bigl(1 - \frac{i-r}{K}\Bigr)^{\!+}. \tag{A28}
\]

We do this as follows. Denote $\delta^+ = \max_i|\Delta t_i| = O(1/n)$, and set $\mathrm{(II)}' = 8\sum_{i=1}^{n-1} \Delta t_i\,\sigma_{t_{i-K}}^2\,\zeta_i$. Then, by Hölder's inequality,

\[
E\bigl|\mathrm{(II)} - \mathrm{(II)}'\bigr| \le \delta^+ n \sup_{|t-s| \le (K+1)\delta^+} \bigl\|\sigma_t^2 - \sigma_s^2\bigr\|_2\, \sup_i \|\zeta_i\|_2 = o(\delta^+ K),
\]

because of the boundedness and continuity of $\sigma_t$, and because, applying the Burkholder–Davis–Gundy inequality twice, we have that

\[
E(\zeta_i^2) \le E\sum_{l=1}^{i-1} \Delta\langle X,X\rangle_{t_l} \Bigl(\sum_{r \ge 0}^{(l-1)\wedge(i-1)} \bigl(\Delta X_{t_r}\bigr)\Bigl(1 - \frac{i-l}{K}\Bigr)^{\!+}\Bigl(1 - \frac{i-r}{K}\Bigr)^{\!+}\Bigr)^2
\]
\[
\le \delta^+(\sigma^+)^2 \sum_{l=1}^{i-1} E\Bigl(\sum_{r \ge 0}^{(l-1)\wedge(i-1)} \bigl(\Delta X_{t_r}\bigr)\Bigl(1 - \frac{i-l}{K}\Bigr)^{\!+}\Bigl(1 - \frac{i-r}{K}\Bigr)^{\!+}\Bigr)^2
\]
\[
\le (\delta^+)^2(\sigma^+)^4 \sum_{l=1}^{i-1} \sum_{r \ge 0}^{(l-1)\wedge(i-1)} \Bigl(\Bigl(1 - \frac{i-l}{K}\Bigr)^{\!+}\Bigl(1 - \frac{i-r}{K}\Bigr)^{\!+}\Bigr)^2 = O\bigl((\delta^+ K)^2\bigr) \quad\text{uniformly in } i, \tag{A29}
\]

where the second-to-last transition does for the inner sum what the first two inequalities do for the outer sum. Now rewrite

(II)prime = 8nminus1sum

lgtrge0

(Xtl

)(Xtr

)

timesnminus1sum

i=l+1

tiσ2timinusK

(

1 minus i minus l

K

)+(1 minus i minus r

K

)+ (A30)

In the same way as in (A29), we then obtain

\[
E\bigl(\mathrm{(II)}'\bigr)^2 \le 64(\delta^+)^2(\sigma^+)^4 \sum_{l > r \ge 0}^{n-1} E\Bigl(\sum_{i=l+1}^{n-1} \Delta t_i\,\sigma_{t_{i-K}}^2 \Bigl(1 - \frac{i-l}{K}\Bigr)^{\!+}\Bigl(1 - \frac{i-r}{K}\Bigr)^{\!+}\Bigr)^2
\]

Zhang, Mykland, and Aït-Sahalia: Determining Integrated Volatility 1411

\[
\le 64(\delta^+)^2(\sigma^+)^4 \times (\delta^+)^2(\sigma^+)^4 \sum_{l > r \ge 0}^{n-1} \Bigl(\sum_{i=l+1}^{n-1} \Bigl(1 - \frac{i-l}{K}\Bigr)^{\!+}\Bigl(1 - \frac{i-r}{K}\Bigr)^{\!+}\Bigr)^2
\le 64(\delta^+)^4(\sigma^+)^8 n K^3 = O\bigl((\delta^+ K)^3\bigr) = o\bigl((\delta^+ K)^2\bigr), \tag{A31}
\]

by assumptions (41) and (42). The transition from the second to the last line in (A31) is due to the fact that $\sum_{i=l+1}^{n-1}(1 - \frac{i-l}{K})^+(1 - \frac{i-r}{K})^+ \le K$, and $= 0$ when $l - r \ge K$. Thus we have shown that $\mathrm{(II)} = o_p(K\Delta t)$, and hence Theorem 2 has been established.

We now proceed to the asymptotic distribution of $D_T$. We first state a technical condition on the filtration $(\mathcal{F}_t)_{0 \le t \le T}$ to which $X_t$ and $\mu_t$ (but not the $\varepsilon$'s) are assumed to be adapted.

Condition E (Description of the filtration). There is a continuous multidimensional $P$-local martingale $\mathcal{X} = (\mathcal{X}^{(1)}, \ldots, \mathcal{X}^{(p)})$, for some $p$, so that $\mathcal{F}_t$ is the smallest sigma-field containing $\sigma(\mathcal{X}_s, s \le t)$ and $\mathcal{N}$, where $\mathcal{N}$ contains all of the null sets in $\sigma(\mathcal{X}_s, s \le T)$.

For example, $\mathcal{X}$ can be a collection of Brownian motions.

Proof of Theorem 3. We can show, by methods similar to those in the proof of Theorem 2, that if $L$ is any martingale adapted to the filtration generated by $\mathcal{X}$, then

\[
\sup_t \Bigl|\frac{1}{\Delta t\,K}\langle D,L\rangle_t\Bigr| \;\stackrel{p}{\to}\; 0. \tag{A32}
\]

The stable convergence with respect to the filtration $(\mathcal{F}_t)_{0 \le t \le T}$ then follows in view of Rootzén (1980) or Jacod and Protter (1998). This ends the proof of Theorem 3.

Finally, in the case where $\eta_n^2$ does not converge, one can still use the mixed normal with variance $\eta_n^2$. This is because every subsequence of $\eta_n^2$ has a further subsequence that does converge in probability to some $\eta^2$, and hence for which the assumption (48) in Theorem 3 would be satisfied. The reason for this is that one can define the distribution function of a finite measure by

\[
G_n(t) = \sum_{t_{i+1} \le t} h_i\,\Delta t_i. \tag{A33}
\]

Because $G_n(t) \le T\sup_i h_i$, it follows from (45) that the sequence $G_n$ is compact in the sense of weak convergence; see Helly's theorem (e.g., Billingsley 1995, p. 336). For any convergent subsequence $G_n \to G$, we then get that

\[
\eta_n^2 = \int_0^T \sigma_t^4\,dG_n(t) \to \int_0^T \sigma_t^4\,dG(t) \tag{A34}
\]

almost surely, because we have assumed that $\langle X,X\rangle'_t$ is a continuous function of $t$. One then defines $\eta^2$ to be the (subsequence-dependent) right side of (A34). To proceed further with the asymptotics, continue with the foregoing subsequence, and note that

\[
\frac{1}{\Delta t\,K}\langle D,D\rangle_t \approx \int_0^t \sigma_s^4\,dG_n(s) \to \int_0^t \sigma_s^4\,dG(s),
\]

completing the proof.

[Received October 2003. Revised December 2004.]

REFERENCES

Aït-Sahalia, Y., Mykland, P. A., and Zhang, L. (2005), "How Often to Sample a Continuous-Time Process in the Presence of Market Microstructure Noise," Review of Financial Studies, 18, 351–416.

Aldous, D. J., and Eagleson, G. K. (1978), "On Mixing and Stability of Limit Theorems," The Annals of Probability, 6, 325–331.

Andersen, T. G., Bollerslev, T., Diebold, F. X., and Labys, P. (2001), "The Distribution of Exchange Rate Realized Volatility," Journal of the American Statistical Association, 96, 42–55.

Bandi, F. M., and Russell, J. R. (2003), "Microstructure Noise, Realized Volatility, and Optimal Sampling," technical report, University of Chicago Graduate School of Business.

Barndorff-Nielsen, O. E., and Shephard, N. (2002), "Econometric Analysis of Realized Volatility and Its Use in Estimating Stochastic Volatility Models," Journal of the Royal Statistical Society, Ser. B, 64, 253–280.

Billingsley, P. (1995), Probability and Measure (3rd ed.), New York: Wiley.

Brown, S. J. (1990), "Estimating Volatility," in Financial Options: From Theory to Practice, eds. S. Figlewski, W. Silber, and M. Subrahmanyam, Homewood, IL: Business One-Irwin, pp. 516–537.

Chernov, M., and Ghysels, E. (2000), "A Study Towards a Unified Approach to the Joint Estimation of Objective and Risk Neutral Measures for the Purpose of Options Valuation," Journal of Financial Economics, 57, 407–458.

Dacorogna, M. M., Gençay, R., Müller, U., Olsen, R. B., and Pictet, O. V. (2001), An Introduction to High-Frequency Finance, San Diego: Academic Press.

Gallant, A. R., Hsu, C.-T., and Tauchen, G. T. (1999), "Using Daily Range Data to Calibrate Volatility Diffusions and Extract the Forward Integrated Variance," The Review of Economics and Statistics, 81, 617–631.

Gloter, A. (2000), "Estimation des Paramètres d'une Diffusion Cachée," unpublished doctoral thesis, Université de Marne-la-Vallée.

Hall, P., and Heyde, C. C. (1980), Martingale Limit Theory and Its Application, Boston: Academic Press.

Hansen, P. R., and Lunde, A. (2004a), "Realized Variance and IID Market Microstructure Noise," technical report, Brown University, Dept. of Economics.

——— (2004b), "An Unbiased Measure of Realized Variance," technical report, Brown University, Dept. of Economics.

Heston, S. (1993), "A Closed-Form Solution for Options With Stochastic Volatility, With Applications to Bonds and Currency Options," Review of Financial Studies, 6, 327–343.

Hull, J., and White, A. (1987), "The Pricing of Options on Assets With Stochastic Volatilities," Journal of Finance, 42, 281–300.

Jacod, J., and Protter, P. (1998), "Asymptotic Error Distributions for the Euler Method for Stochastic Differential Equations," The Annals of Probability, 26, 267–307.

Jacod, J., and Shiryaev, A. N. (2003), Limit Theorems for Stochastic Processes (2nd ed.), New York: Springer-Verlag.

Karatzas, I., and Shreve, S. E. (1991), Brownian Motion and Stochastic Calculus, New York: Springer-Verlag.

Mykland, P. A., and Zhang, L. (2002), "ANOVA for Diffusions," technical report, University of Chicago, Dept. of Statistics.

O'Hara, M. (1995), Market Microstructure Theory, Cambridge, MA: Blackwell.

Oomen, R. (2004), "Properties of Realized Variance for a Pure Jump Process: Calendar Time Sampling versus Business Time Sampling," technical report, University of Warwick, Warwick Business School.

Rényi, A. (1963), "On Stable Sequences of Events," Sankhyā, Ser. A, 25, 293–302.

Rootzén, H. (1980), "Limit Distributions for the Error in Approximations of Stochastic Integrals," The Annals of Probability, 8, 241–251.

Zhou, B. (1996), "High-Frequency Data and Volatility in Foreign-Exchange Rates," Journal of Business & Economic Statistics, 14, 45–52.

Zumbach, G., Corsi, F., and Trapletti, A. (2002), "Efficient Estimation of Volatility Using High-Frequency Data," technical report, Olsen & Associates.



However, the errors $\widehat{\langle X,X\rangle}_{T_m} - \widehat{\langle X,X\rangle}_{T_{m-1}} - \int_{T_{m-1}}^{T_m}\sigma_u^2\,du$ in this case are not asymptotically independent. Note that for $T = T_m$, both candidates (66) and (67) for $\widehat{\langle X,X\rangle}_t$ coincide with the quantity in (55).

6. ESTIMATING THE ASYMPTOTIC VARIANCE OF $\widehat{\langle X,X\rangle}_T$

In the one-period case, the main goal is to estimate the asymptotic variance $s^2 = 8c^{-2}(E\varepsilon^2)^2 + c\eta^2 T$ [cf. (58) and (59)]. The multigrid case is a straightforward generalization, as indicated in Section 5.

Here we are concerned only with the case where the points $t_i$ are equally spaced ($\Delta t_i = \Delta t$) and are regularly allocated to the grids $\mathcal{A}_1 = \{G^{(k)}, k = 1, \ldots, K_1\}$. A richer set of ingredients is required to find the spread than just to estimate $\langle X,X\rangle_T$. To implement the estimator, we create an additional family $\mathcal{A}_2 = \{G^{(k,i)}, k = 1, \ldots, K_1,\ i = 1, \ldots, I\}$ of grids, where $G^{(k,i)}$ contains every $I$th point of $G^{(k)}$, starting with the $i$th point. We assume that $K_1 \sim c_1 n^{2/3}$. The new family then consists of $K_2 \sim c_2 n^{2/3}$ grids, where $c_2 = c_1 I$.

In addition, we need to divide the time line into segments $(T_{m-1}, T_m]$, where $T_m = \frac{m}{M}T$. For the purposes of this discussion, $M$ is large but finite. We now get an initial estimator of spread as

\[
\hat{s}_0^2 = n^{1/3} \sum_{m=1}^{M} \Bigl(\widehat{\langle X,X\rangle}^{K_1}_{T_m} - \widehat{\langle X,X\rangle}^{K_1}_{T_{m-1}} - \bigl(\widehat{\langle X,X\rangle}^{K_2}_{T_m} - \widehat{\langle X,X\rangle}^{K_2}_{T_{m-1}}\bigr)\Bigr)^2,
\]

where $\widehat{\langle X,X\rangle}^{K_i}_t$ is the estimator (66) using grid family $i$, $i = 1, 2$. Using the discussion in Section 5, we can show that

\[
\hat{s}_0^2 \approx s_0^2, \tag{68}
\]

where, for $c_1 \ne c_2$ (i.e., $I \ne 1$),

\[
s_0^2 = 8(E\varepsilon^2)^2\bigl(c_1^{-2} + c_2^{-2} - c_1^{-1}c_2^{-1}\bigr) + \bigl(c_1^{1/2} - c_2^{1/2}\bigr)^2 T\eta^2
= 8(E\varepsilon^2)^2 c_1^{-2}\bigl(1 + I^{-2} - I^{-1}\bigr) + c_1\bigl(I^{1/2} - 1\bigr)^2 T\eta^2. \tag{69}
\]

In (68), the symbol "$\approx$" denotes first convergence in law as $n \to \infty$ and then a limit in probability as $M \to \infty$. Because $E\varepsilon^2$ can be estimated by $\widehat{E\varepsilon^2} = [Y,Y]^{(\text{all})}_T/2n$, we can put hats on $s_0^2$, $(E\varepsilon^2)^2$, and $\eta^2$ in (69) to obtain an estimator of $\eta^2$. Similarly,

\[
s^2 = 8(E\varepsilon^2)^2\Bigl(c^{-2} - c\,\frac{c_1^{-2} + c_2^{-2} - c_1^{-1}c_2^{-1}}{(c_1^{1/2} - c_2^{1/2})^2}\Bigr) + \frac{c}{(c_1^{1/2} - c_2^{1/2})^2}\,s_0^2
\]
\[
= 8\Bigl(c^{-2} - c\,c_1^{-3}\,\frac{I^{-2} - I^{-1} + 1}{(I^{1/2} - 1)^2}\Bigr)(E\varepsilon^2)^2 + \frac{c}{c_1}\,\frac{1}{(I^{1/2} - 1)^2}\,s_0^2, \tag{70}
\]

where $c \sim Kn^{-2/3}$, with $K$ the number of grids used originally to estimate $\langle X,X\rangle_T$.

Table 1. Coefficients of $(E\varepsilon^2)^2$ and $s_0^2$ when $c_1 = c$

I    coeff(s_0^2)    coeff((E eps^2)^2)
3    1.866           -3.611 c^{-2}
4    1.000           1.5000 c^{-2}

Normally, one would take $c_1 = c$. Hence an estimator $\hat{s}^2$ can be found from $\hat{s}_0^2$ and $\widehat{E\varepsilon^2}$. When $c_1 = c$, we argue that the optimal choice is $I = 3$ or 4, as follows. The coefficients in (70) become

\[
\text{coeff}(s_0^2) = \bigl(I^{1/2} - 1\bigr)^{-2}
\qquad\text{and}\qquad
\text{coeff}\bigl((E\varepsilon^2)^2\bigr) = 8c^{-2}\bigl(I^{1/2} - 1\bigr)^{-2} f(I),
\]

where $f(I) = I - 2I^{1/2} - I^{-2} + I^{-1}$. For $I \ge 2$, $f(I)$ is increasing, and $f(I)$ crosses 0 for $I$ between 3 and 4. These, therefore, are the two integer values of $I$ that give the lowest ratio of $\text{coeff}((E\varepsilon^2)^2)/\text{coeff}(s_0^2)$. Using $I = 3$ or 4 therefore would maximally insulate against $(E\varepsilon^2)^2$ dominating over $s_0^2$. This is desirable because $\hat{s}_0^2$ is the estimator carrying the information about $\eta^2$. Numerical values for the coefficients are given in Table 1. If $c$ were such that $(E\varepsilon^2)^2$ still overwhelmed $\hat{s}_0^2$, then a choice of $c_1 \ne c$ should be considered.

7. MONTE CARLO EVIDENCE

In this article, we have discussed five approaches to dealing with the impact of market microstructure noise on realized volatility. In this section, we examine the performance of each approach in simulations and compare the results with those predicted by the aforementioned asymptotic theory.

7.1 Simulations Design

We use the stochastic volatility model of Heston (1993) as our data-generating process:

\[
dX_t = \bigl(\mu - v_t/2\bigr)\,dt + \sigma_t\,dB_t \tag{71}
\]

and

\[
dv_t = \kappa(\alpha - v_t)\,dt + \gamma v_t^{1/2}\,dW_t, \tag{72}
\]

where $v_t = \sigma_t^2$. We assume that the parameters $(\mu, \kappa, \alpha, \gamma)$ and $\rho$, the correlation coefficient between the two Brownian motions $B$ and $W$, are constant in this model. We also assume Feller's condition $2\kappa\alpha \ge \gamma^2$, to make the zero boundary unattainable by the volatility process.

We simulate $M = 25{,}000$ sample paths of the process using the Euler scheme at a time interval $\Delta t = 1$ second, and set the parameter values at levels reasonable for a stock: $\mu = 0.05$, $\kappa = 5$, $\alpha = 0.04$, $\gamma = 0.5$, and $\rho = -0.5$. As for the market microstructure noise $\varepsilon$, we assume that it is Gaussian and small. Specifically, we set $(E\varepsilon^2)^{1/2} = 0.0005$ (i.e., the standard deviation of the noise is 0.05% of the value of the asset price). We purposefully set this value to be smaller than the values calibrated using the empirical market microstructure literature.
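A minimal sketch of this data-generating process follows (Euler discretization as in the text; the random seed, the single path, and the initial price of \$100 are illustrative assumptions, not specified in the paper):

```python
import math
import random

# Minimal sketch of the data-generating process (71)-(72): Euler scheme at
# a 1-second interval for the Heston model, with iid Gaussian microstructure
# noise added to the latent log-price.  Model parameters are those given in
# the text; the seed, single path, and $100 initial price are illustrative.

mu, kappa, alpha, gamma, rho = 0.05, 5.0, 0.04, 0.5, -0.5
T = 1.0 / 252.0                  # one trading day (annualized units)
n = 23400                        # 6.5 hours of 1-second intervals
dt = T / n
noise_sd = 0.0005                # (E eps^2)^{1/2}: 0.05% of the asset price

assert 2 * kappa * alpha >= gamma ** 2   # Feller's condition from the text

random.seed(1)
x, v = math.log(100.0), alpha    # start the variance at its mean level
X, Y = [x], [x + random.gauss(0.0, noise_sd)]
for _ in range(n):
    zb = random.gauss(0.0, 1.0)
    zw = rho * zb + math.sqrt(1.0 - rho ** 2) * random.gauss(0.0, 1.0)
    x += (mu - v / 2.0) * dt + math.sqrt(max(v, 0.0) * dt) * zb
    v += kappa * (alpha - v) * dt + gamma * math.sqrt(max(v, 0.0) * dt) * zw
    X.append(x)                                # latent efficient log-price
    Y.append(x + random.gauss(0.0, noise_sd))  # observed = latent + noise

assert len(X) == len(Y) == n + 1
```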

On each simulated sample path, we estimate $\langle X,X\rangle_T$ over $T = 1$ day (i.e., $T = 1/252$, because the parameter values are all annualized) using the five estimation strategies described earlier: $[Y,Y]^{(\text{all})}_T$, $[Y,Y]^{(\text{sparse})}_T$, $[Y,Y]^{(\text{sparse,opt})}_T$, $[Y,Y]^{(\text{avg})}_T$, and, finally, $\widehat{\langle X,X\rangle}^{(\text{adj})}_T$. We assume that a day consists of 6.5 hours of open trading, as is the case on the NYSE and NASDAQ. For the fourth-best estimator, we represent this approach using sparse sampling at a frequency of once every 5 minutes.

7.2 Simulations Results

Table 2 reports the simulation results for the five estimation strategies. For each estimator, the bias is calculated as $E[\text{estimator} - \langle X,X\rangle_T]$. The variance row reports the quantity

\[
E\Bigl[\operatorname{var}\Bigl(\text{estimator} - \langle X,X\rangle_T \,\Big|\, \int_0^T \sigma_t^2\,dt,\ \int_0^T \sigma_t^4\,dt\Bigr)\Bigr]. \tag{73}
\]

The rows marked "relative" report the corresponding results in percentage terms, that is, for

\[
\frac{\text{estimator} - \langle X,X\rangle_T}{\langle X,X\rangle_T}.
\]

Note that although the convergence of the discretization error is stable, as opposed to conditional, (73) seems like a reasonable stand-in for the theoretical quantity under consideration.

In each case, "small sample" represents the average value over the $M$ simulated paths, and "asymptotic" refers to the value predicted by our theory. The noise parts of the asymptotic variances are taken to include the relevant second-order terms, as given by (20), (37), and (60), yielding spreads like (in the latter case) (61). The small-sample variance is computed by binning the results of the simulations according to the values of the pair $(\int_0^T \sigma_t^2\,dt, \int_0^T \sigma_t^4\,dt)$, calculating the variance within each bin, and then averaging across the bins, weighted by the number of elements in each bin.

Comparing the rows showing small-sample results with their asymptotic predictions, we see that the asymptotic theory provides a good approximation to the first two moments of the small-sample distribution of the five estimators. Comparing across columns, we see that the differences between the five estimators are quite large. As expected, the naive fifth-best strategy produces disastrous results. The first-best strategy results in a large decrease of root mean squared error (RMSE) relative to the others, including, in particular, the fourth-best strategy, which is the approach currently used in most of the empirical literature. But, at least for the parameter values used here, determining the optimal sampling frequency (the sparse optimal, or third-best, strategy) results in only a modest improvement over the arbitrary selection of a sampling frequency (the sparse, or fourth-best, strategy). Using all of the data, as in the subsampled and averaged estimator (second-best), but especially the bias correction by combining two time scales, as in our first-best approach, is where the major practical improvements reside.

It may seem surprising that the variance of the estimator is reduced so much from the second-best to the first-best estimator. After all, if one sets $c = Kn^{-2/3}$, then the asymptotic normal term in (7) becomes

\[
\frac{1}{n^{1/6}}\biggl[\,\underbrace{\frac{4}{c^2}E\varepsilon^4}_{\text{due to noise}} + \underbrace{c\,\frac{4T}{3}\int_0^T\sigma_t^4\,dt}_{\text{due to discretization}}\,\biggr]^{1/2} Z_{\text{total}},
\]

where the bracketed quantity is the total variance. This is quite similar to the error term in (9). However, the $c$ for the second-best estimator is chosen by (8), whereas for the first-best case it is given by (10). The two optima are very different, because for the second-best estimator $c$ is chosen by a bias–variance trade-off, whereas for the first-best estimator it is chosen to minimize the asymptotic variance. From this standpoint, it is quite reasonable for the first-best estimator also to have substantially better asymptotic variance. In fact, from (8) and (10), it is easy to see that, for the optimal choices,

\[
c^*_{\text{second best}} = n^{1/3}\, 2^{-1/3} \Bigl(\frac{E\varepsilon^4}{(E\varepsilon^2)^2}\Bigr)^{1/3} c^*_{\text{first best}},
\]

whence the total variance is in fact quite different for the two estimators.

Figure 1 shows the small-sample and asymptotic RMSE of the $[Y,Y]^{(\text{sparse})}_T$ estimator as a function of the subsampling interval, illustrating the minimization problem studied in Section 2.3. Finally, Figure 2 shows the standardized distribution of the first-best estimator obtained from the simulations (histogram) and the corresponding asymptotic distribution predicted by our theory (solid line). Due to the effect of bias, the distributions of the fifth- to second-best estimators are irrelevant, because they cannot be used to set intervals for the volatility.

8. DISCUSSION

In this work, we have quantified and corrected the effect of noise on the nonparametric assessment of integrated volatility. In the setting of high-frequency data, the usual financial practice is to use sparse sampling; in other words, throwing

Table 2. Monte Carlo Simulations for the Five Estimation Strategies

                                 Fifth-best         Fourth-best        Third-best             Second-best        First-best
                                 [Y,Y]^(all)_T      [Y,Y]^(sparse)_T   [Y,Y]^(sparse,opt)_T   [Y,Y]^(avg)_T      <X,X>^(adj)_T
Small-sample bias                1.1699 x 10^-2     3.89 x 10^-5       2.18 x 10^-5           1.926 x 10^-5      2 x 10^-8
Asymptotic bias                  1.1700 x 10^-2     3.90 x 10^-5       2.20 x 10^-5           1.927 x 10^-5      0
Small-sample variance            1.791 x 10^-8      1.4414 x 10^-9     1.59 x 10^-9           9.41 x 10^-10      9 x 10^-11
Asymptotic variance              1.788 x 10^-8      1.4409 x 10^-9     1.58 x 10^-9           9.37 x 10^-10      8 x 10^-11
Small-sample RMSE                1.1699 x 10^-2     5.437 x 10^-5      4.543 x 10^-5          3.622 x 10^-5      9.4 x 10^-6
Asymptotic RMSE                  1.1700 x 10^-2     5.442 x 10^-5      4.546 x 10^-5          3.618 x 10^-5      8.9 x 10^-6
Small-sample relative bias       182%               6.1%               1.8%                   1.5%               -0.0045%
Small-sample relative variance   82,502%            115%               11%                    0.53%              0.043%
Small-sample relative RMSE       340%               12.4%              3.7%                   2.8%               0.65%
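As a quick arithmetic cross-check on the absolute rows of Table 2, the reported small-sample RMSEs agree with $\sqrt{\text{bias}^2 + \text{variance}}$ (values transcribed from the table; the 2% tolerance in this sketch absorbs the rounding in the reported figures):

```python
import math

# Arithmetic cross-check on the absolute rows of Table 2: the reported
# small-sample RMSE should equal sqrt(bias^2 + variance) up to rounding.
# Tuples are (bias, variance, rmse), transcribed from the table.

table2 = {
    "fifth":  (1.1699e-2, 1.791e-8,  1.1699e-2),
    "fourth": (3.89e-5,   1.4414e-9, 5.437e-5),
    "third":  (2.18e-5,   1.59e-9,   4.543e-5),
    "second": (1.926e-5,  9.41e-10,  3.622e-5),
    "first":  (2e-8,      9e-11,     9.4e-6),
}
for name, (bias, var, rmse) in table2.items():
    implied = math.sqrt(bias ** 2 + var)
    assert abs(implied - rmse) / rmse < 0.02, name
```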


Figure 1. RMSE of the Sparse Estimator as a Function of the Subsampling Frequency (legend: small-sample RMSE; asymptotic RMSE).

away most of the available data. We have argued that this practice is caused by not incorporating the noise in the model. Although it is statistically unsound to throw away data, we have shown that it is possible to build on this practice to construct estimators that make statistical sense.

Specifically, we have found that the usual realized volatility mainly estimates the magnitude of the noise term rather than anything to do with volatility. An approach built on separating the observations into multiple "grids" lessens this problem. We found that the best results can be obtained by combining the usual ("single-grid") realized volatility with the multiple-grid-based device. This gives an estimator that is approximately unbiased; we have also shown how to assess the (random) variance of this estimator. We also show that, for our most recommended procedure, the optimal number of multiple grids is of order $O(n^{2/3})$ (see Sec. 4). Most of the development is in the context of finding the integrated volatility over one time period; at the end, we extend this to multiple periods. Also, in the case where the noise can be taken to be almost negligible, we provide a way of optimizing the sampling frequency if one wishes to use the classical "realized volatility" or its multigrid extension.

One important message of the article is that any time one has an impulse to sample sparsely, one can always do better with a multigrid method, regardless of the model or the quantity being estimated.

Figure 2. Asymptotic and Small-Sample Standardized Distributions of the First-Best Estimator.


Our results in this article cover the case where the latent $X$ follows the Itô process model (1). Theorem 1 remains true under the assumption that $X$ is a general semimartingale, that is, a process $X$ that can also have jumps. This follows because the proof of Lemma A2 remains true for a general semimartingale $X$ (Jacod and Shiryaev 2003, thm. I.4.47, p. 52). Also, apart from using Lemma A2, Theorem A1 is unrelated to the nature of the $X$ process. As far as the effect of jumps on the discretization effect is concerned, it should be noted that consistency holds, as follows. When $X$ is a general semimartingale, then, for any sequence of grids $\mathcal{H}_n$ so that (41) holds, $[X,X]^{\mathcal{H}_n}$ converges to the continuous-time quadratic variation of $X$, that is, to

\[
\int_0^T \sigma_t^2\,dt + \sum_{0 \le t \le T} (\text{jump of } X \text{ at } t)^2 \tag{74}
\]

(cf. Jacod and Shiryaev 2003, thm. I.4.47, p. 52). In particular, this also applies to subgrids under the regularity conditions that we have discussed. Furthermore, by extension of the same proof, $[X,X]^{(\text{avg})}$ also converges to (74). From the same argument, so does $\widehat{\langle X,X\rangle}_T$. In this sense, the two-scales estimator is robust to jumps.

APPENDIX: PROOFS OF RESULTS

When the total grid $\mathcal{G}$ is considered, we use $\sum_{i=1}^{n-1}$, $\sum_{t_{i+1} \le T}$, and $\sum_{t_i \in \mathcal{G}}$ interchangeably in the following proofs. Also, we write "$|X$" to indicate expressions that are conditional on the entire $X$ process.

A1. Variance of $[Y,Y]_T$ Given the $X$ Process

Here we calculate explicitly the variance in (19), from which the stated approximation follows. The explicit remainder term is also used for (36). Let a partition of $[0,T]$ be $0 = t_0 \le t_1 \le \cdots \le t_n = T$. Under assumption (11),

\[
\operatorname{var}\bigl([Y,Y]^{(\text{all})}_T \,\big|\, X\bigr) = \operatorname{var}\Bigl[\sum_{t_{i+1} \le T} \bigl(\Delta Y_{t_i}\bigr)^2 \,\Big|\, X\Bigr]
= \underbrace{\sum_{t_{i+1} \le T} \operatorname{var}\bigl[\bigl(\Delta Y_{t_i}\bigr)^2 \,\big|\, X\bigr]}_{I_T} + 2\underbrace{\sum_{t_{i+1} \le T} \operatorname{cov}\bigl[\bigl(\Delta Y_{t_{i-1}}\bigr)^2, \bigl(\Delta Y_{t_i}\bigr)^2 \,\big|\, X\bigr]}_{II_T},
\]

because $\Delta Y_{t_i} = \Delta X_{t_i} + \Delta\varepsilon_{t_i}$ is 1-dependent given the $X$ process. Also,

\[
\operatorname{var}\bigl[(\Delta Y_{t_i})^2 \,\big|\, X\bigr] = \kappa_4\bigl(\Delta Y_{t_i} \,\big|\, X\bigr) + 2\bigl[\operatorname{var}\bigl(\Delta Y_{t_i} \,\big|\, X\bigr)\bigr]^2 + 4\bigl[E\bigl(\Delta Y_{t_i} \,\big|\, X\bigr)\bigr]^2 \operatorname{var}\bigl(\Delta Y_{t_i} \,\big|\, X\bigr) + 4E\bigl(\Delta Y_{t_i} \,\big|\, X\bigr)\kappa_3\bigl(\Delta Y_{t_i} \,\big|\, X\bigr)
\]
\[
= \kappa_4\bigl(\Delta\varepsilon_{t_i}\bigr) + 2\bigl[\operatorname{var}\bigl(\Delta\varepsilon_{t_i}\bigr)\bigr]^2 + 4\bigl(\Delta X_{t_i}\bigr)^2 \operatorname{var}\bigl(\Delta\varepsilon_{t_i}\bigr) + 4\bigl(\Delta X_{t_i}\bigr)\kappa_3\bigl(\Delta\varepsilon_{t_i}\bigr) \qquad\text{(under assumption (11))}
\]
\[
= 2\kappa_4(\varepsilon) + 8(E\varepsilon^2)^2 + 8\bigl(\Delta X_{t_i}\bigr)^2 E\varepsilon^2,
\]

because $\kappa_3(\Delta\varepsilon_{t_i}) = 0$. The $\kappa$'s are the cumulants of the relevant order:

\[
\kappa_1(\varepsilon) = 0, \qquad \kappa_2(\varepsilon) = \operatorname{var}(\varepsilon) = E(\varepsilon^2), \qquad \kappa_3(\varepsilon) = E\varepsilon^3, \qquad \kappa_4(\varepsilon) = E(\varepsilon^4) - 3(E\varepsilon^2)^2. \tag{A1}
\]

So $I_T = n\bigl(2\kappa_4(\varepsilon) + 8(E\varepsilon^2)^2\bigr) + 8[X,X]^{(\text{all})}_T E\varepsilon^2$. Similarly, for the covariance,

\[
\operatorname{cov}\bigl[(\Delta Y_{t_{i-1}})^2, (\Delta Y_{t_i})^2 \,\big|\, X\bigr] = \operatorname{cov}\bigl[(\Delta\varepsilon_{t_{i-1}})^2, (\Delta\varepsilon_{t_i})^2\bigr] + 4\bigl(\Delta X_{t_{i-1}}\bigr)\bigl(\Delta X_{t_i}\bigr)\operatorname{cov}\bigl(\Delta\varepsilon_{t_{i-1}}, \Delta\varepsilon_{t_i}\bigr) + 2\bigl(\Delta X_{t_{i-1}}\bigr)\operatorname{cov}\bigl[\Delta\varepsilon_{t_{i-1}}, (\Delta\varepsilon_{t_i})^2\bigr] + 2\bigl(\Delta X_{t_i}\bigr)\operatorname{cov}\bigl[(\Delta\varepsilon_{t_{i-1}})^2, \Delta\varepsilon_{t_i}\bigr]
\]
\[
= \kappa_4(\varepsilon) + 2(E\varepsilon^2)^2 - 4\bigl(\Delta X_{t_{i-1}}\bigr)\bigl(\Delta X_{t_i}\bigr)\kappa_2(\varepsilon) - 2\bigl(\Delta X_{t_i}\bigr)\kappa_3(\varepsilon) + 2\bigl(\Delta X_{t_{i-1}}\bigr)\kappa_3(\varepsilon), \tag{A2}
\]

because of (A1). Thus, summing the terms in (A2),

\[
II_T = 2(n-1)\bigl(\kappa_4(\varepsilon) + 2(E\varepsilon^2)^2\bigr) - 8E\varepsilon^2 \sum_{t_{i+1} \le T} \bigl(\Delta X_{t_{i-1}}\bigr)\bigl(\Delta X_{t_i}\bigr) - 4\kappa_3(\varepsilon)\bigl(\Delta X_{t_{n-1}} - \Delta X_{t_0}\bigr).
\]

Amalgamating the two expressions, one obtains

\[
\operatorname{var}\bigl([Y,Y]^{(\text{all})}_T \,\big|\, X\bigr) = n\bigl(2\kappa_4(\varepsilon) + 8(E\varepsilon^2)^2\bigr) + 8[X,X]^{(\text{all})}_T E\varepsilon^2 + 2(n-1)\bigl(\kappa_4(\varepsilon) + 2(E\varepsilon^2)^2\bigr) - 8E\varepsilon^2 \sum \bigl(\Delta X_{t_{i-1}}\bigr)\bigl(\Delta X_{t_i}\bigr) - 4\kappa_3(\varepsilon)\bigl(\Delta X_{t_{n-1}} - \Delta X_{t_0}\bigr)
= 4nE\varepsilon^4 + R_n, \tag{A3}
\]

where the remainder term $R_n$ satisfies

\[
|R_n| \le 8E\varepsilon^2 [X,X]_T + 2\bigl(\kappa_4(\varepsilon) + 2(E\varepsilon^2)^2\bigr) + 8E\varepsilon^2 \sum \bigl|\Delta X_{t_{i-1}}\bigr|\,\bigl|\Delta X_{t_i}\bigr| + 4|\kappa_3(\varepsilon)|\bigl(\bigl|\Delta X_{t_{n-1}}\bigr| + \bigl|\Delta X_{t_0}\bigr|\bigr)
\le 16E\varepsilon^2 [X,X]^{(\text{all})}_T + 2\bigl(\kappa_4(\varepsilon) + 2(E\varepsilon^2)^2\bigr) + 2|\kappa_3(\varepsilon)|\bigl(2 + [X,X]_T\bigr), \tag{A4}
\]

by the Cauchy–Schwarz inequality and because $|x| \le (1 + x^2)/2$. Because $[X,X]^{(\text{all})}_T = O_p(1)$, (19) follows.

Under slightly stronger conditions (e.g., $|\mu_t|$ and $\sigma_t$ bounded above by a constant), $\sum(\Delta X_{t_{i-1}})(\Delta X_{t_i})$ is a near-martingale and of order $O_p(n^{-1/2})$, and similarly $\Delta X_{t_{n-1}} - \Delta X_{t_0} = O_p(n^{-1/2})$, from which (20) follows.

Similarly, to obtain a higher-order approximation to the variance of the estimator $\widehat{\langle X,X\rangle}_T$, note that

\[
\widehat{\langle X,X\rangle}_T - \langle X,X\rangle_T = \Bigl([Y,Y]^{(\text{avg})}_T - \frac{1}{K}[Y,Y]^{(\text{all})}_T - [X,X]^{(\text{avg})}_T\Bigr) + \bigl([X,X]^{(\text{avg})}_T - \langle X,X\rangle_T\bigr). \tag{A5}
\]

Now, recalling (20) and (37), and noting in addition that

\[
\operatorname{cov}\bigl([Y,Y]^{(\text{all})}, [Y,Y]^{(\text{avg})} \,\big|\, X\bigr) = 2\bigl(E\varepsilon^4 - (E\varepsilon^2)^2\bigr)\frac{2n-1}{K} + 8E\varepsilon^2 [X,X]^{(\text{all})}_T \frac{1}{K} + o_p\Bigl(\frac{1}{K}\Bigr),
\]

we see that

\[
\operatorname{var}\bigl(\widehat{\langle X,X\rangle}_T \,\big|\, X\bigr) = \operatorname{var}\bigl([Y,Y]^{(\text{avg})}_T \,\big|\, X\bigr) + \frac{1}{K^2}\operatorname{var}\bigl([Y,Y]^{(\text{all})}_T \,\big|\, X\bigr) - \frac{2}{K}\operatorname{cov}\bigl([Y,Y]^{(\text{avg})}_T, [Y,Y]^{(\text{all})}_T \,\big|\, X\bigr)
\]
\[
= 4\frac{\bar{n}}{K}E\varepsilon^4 + \frac{1}{K}\bigl[8[X,X]^{(\text{avg})}_T E\varepsilon^2 - 2\bigl(E\varepsilon^4 - (E\varepsilon^2)^2\bigr)\bigr] + o_p\Bigl(\frac{1}{K}\Bigr)
+ \frac{1}{K^2}\Bigl[4\bar{n}KE\varepsilon^4 + \bigl(8[X,X]^{(\text{all})}_T E\varepsilon^2 - 2\bigl(E\varepsilon^4 - (E\varepsilon^2)^2\bigr)\bigr) + o_p(1)\Bigr]
- \frac{2}{K}\Bigl[2\bigl(E\varepsilon^4 - (E\varepsilon^2)^2\bigr)\frac{2n-1}{K} + 8E\varepsilon^2 [X,X]^{(\text{all})}_T \frac{1}{K} + o_p\Bigl(\frac{1}{K}\Bigr)\Bigr]
\]
\[
= \frac{\bar{n}}{K}\,8(E\varepsilon^2)^2 + \frac{1}{K}\bigl[8[X,X]^{(\text{avg})}_T E\varepsilon^2 - 2\bigl(E\varepsilon^4 - (E\varepsilon^2)^2\bigr)\bigr] + o_p\Bigl(\frac{1}{K}\Bigr),
\]

because, at the optimal choice, $K = cn^{2/3}$ and $\bar{n} = c^{-1}n^{1/3}$. Therefore, (60) holds.
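The leading term $4nE\varepsilon^4$ of (A3) is easy to check by simulation in the pure-noise case $X \equiv 0$, where $[Y,Y]^{(\text{all})} = [\varepsilon,\varepsilon]^{(\text{all})}$ (a sketch; the sample sizes and seed are illustrative):

```python
import random
import statistics

# Monte Carlo check of the leading term in (A3): in the pure-noise case
# (X identically 0, so [Y,Y]^(all) = [eps,eps]^(all)) with Gaussian eps of
# variance a^2, one should find E [Y,Y]^(all) = 2 n a^2 and
# var([Y,Y]^(all)) ~ 4 n E eps^4 = 12 n a^4.  Sizes and seed illustrative.

random.seed(7)
n, sims, a = 500, 1000, 1.0
vals = []
for _ in range(sims):
    eps = [random.gauss(0.0, a) for _ in range(n + 1)]
    vals.append(sum((eps[i + 1] - eps[i]) ** 2 for i in range(n)))

mean_hat = statistics.mean(vals)      # expect about 2 * n * a^2 = 1000
var_hat = statistics.variance(vals)   # expect about 12 * n * a^4 = 6000

assert abs(mean_hat - 2 * n * a ** 2) < 20
assert abs(var_hat - 12 * n * a ** 4) / (12 * n * a ** 4) < 0.2
```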

A2. The Relevant Central Limit Theorem

Lemma A2. Suppose that $X$ is an Itô process. Suppose that $Y$ is related to $X$ through model (4). Then, under assumption (11) and definitions (16) and (32),

\[
[Y,Y]^{(\text{all})}_T = [\varepsilon,\varepsilon]^{(\text{all})}_T + O_p(1) \qquad\text{and}\qquad [Y,Y]^{(\text{avg})}_T = [\varepsilon,\varepsilon]^{(\text{avg})}_T + [X,X]^{(\text{avg})}_T + O_p\Bigl(\frac{1}{\sqrt{K}}\Bigr).
\]

Proof. (a) The one-grid case. Here

\[
[Y,Y]^{(\text{all})}_T = [X,X]^{(\text{all})}_T + [\varepsilon,\varepsilon]^{(\text{all})}_T + 2[X,\varepsilon]^{(\text{all})}_T. \tag{A6}
\]

We show that

\[
E\bigl(\bigl([X,\varepsilon]^{(\text{all})}_T\bigr)^2 \,\big|\, X\bigr) = O_p(1), \tag{A7}
\]

and in particular

\[
[X,\varepsilon]^{(\text{all})}_T = O_p(1). \tag{A8}
\]

To see (A7), write

\[
[X,\varepsilon]^{(\text{all})}_T = \sum_{i=0}^{n-1} \bigl(\Delta X_{t_i}\bigr)\bigl(\Delta\varepsilon_{t_i}\bigr) = \sum_{i=0}^{n-1} \bigl(\Delta X_{t_i}\bigr)\varepsilon_{t_{i+1}} - \sum_{i=0}^{n-1} \bigl(\Delta X_{t_i}\bigr)\varepsilon_{t_i} = \sum_{i=1}^{n-1} \bigl(\Delta X_{t_{i-1}} - \Delta X_{t_i}\bigr)\varepsilon_{t_i} + \Delta X_{t_{n-1}}\varepsilon_{t_n} - \Delta X_{t_0}\varepsilon_{t_0}. \tag{A9}
\]

Because $E([X,\varepsilon]^{(\text{all})}_T \,|\, X) = 0$ and the $\varepsilon_{t_i}$ are iid for different $t_i$, we get

\[
E\bigl(\bigl([X,\varepsilon]^{(\text{all})}_T\bigr)^2 \,\big|\, X\bigr) = \operatorname{var}\bigl([X,\varepsilon]^{(\text{all})}_T \,\big|\, X\bigr) = E\varepsilon^2\Bigl[\sum_{i=1}^{n-1} \bigl(\Delta X_{t_{i-1}} - \Delta X_{t_i}\bigr)^2 + \bigl(\Delta X_{t_{n-1}}\bigr)^2 + \bigl(\Delta X_{t_0}\bigr)^2\Bigr]
= 2[X,X]_T E\varepsilon^2 - 2E\varepsilon^2 \sum_{i=1}^{n-1} \bigl(\Delta X_{t_{i-1}}\bigr)\bigl(\Delta X_{t_i}\bigr) \le 4[X,X]_T E\varepsilon^2, \tag{A10}
\]

by the Cauchy–Schwarz inequality, from which, and from $[X,X]^{(\text{all})}_T$ being of order $O_p(1)$, (A7) follows. Hence (A8) follows by the Markov inequality.

(b) The multiple-grid case. Notice that

\[
[Y,Y]^{(\text{avg})}_T = [X,X]^{(\text{avg})}_T + [\varepsilon,\varepsilon]^{(\text{avg})}_T + 2[X,\varepsilon]^{(\text{avg})}_T; \tag{A11}
\]

(A11) follows strictly from model (4) and the definitions of the grids and of $[\cdot,\cdot]^{(\text{avg})}_t$; see Section 3.2.

We need to show that

\[
E\bigl(\bigl([X,\varepsilon]^{(\text{avg})}_T\bigr)^2 \,\big|\, X\bigr) = O_p\Bigl(\frac{1}{K}\Bigr), \tag{A12}
\]

and in particular

\[
[X,\varepsilon]^{(\text{avg})}_T = O_p\Bigl(\frac{1}{K^{1/2}}\Bigr), \tag{A13}
\]

with $\operatorname{var}([X,\varepsilon]^{(\text{avg})}_T \,|\, X) = E[([X,\varepsilon]^{(\text{avg})}_T)^2 \,|\, X]$.

To show (A12), note that $E([X,\varepsilon]^{(\text{avg})}_T \,|\, X) = 0$ and

\[
E\bigl[\bigl([X,\varepsilon]^{(\text{avg})}_T\bigr)^2 \,\big|\, X\bigr] = \operatorname{var}\bigl([X,\varepsilon]^{(\text{avg})}_T \,\big|\, X\bigr) = \frac{1}{K^2}\sum_{k=1}^{K} \operatorname{var}\bigl([X,\varepsilon]^{(k)}_T \,\big|\, X\bigr) \le \frac{4E\varepsilon^2}{K}[X,X]^{(\text{avg})}_T = O_p\Bigl(\frac{1}{K}\Bigr),
\]

where the second equality follows from the disjointness of the different grids, as well as from $\varepsilon \perp\!\!\!\perp X$. The inequality follows from the same argument as in (A10). The order then follows because $[X,X]^{(\text{avg})}_T = O_p(1)$; see the method of Mykland and Zhang (2002) for a rigorous development of the order of $[X,X]^{(\text{avg})}_T$.

Theorem A1. Suppose that $X$ is an Itô process of form (1). Suppose that $Y$ is related to $X$ through model (4) and that (11) is satisfied with $E\varepsilon^4 < \infty$. Also suppose that $t_i$ and $t_{i+1}$ are not in the same subgrid for any $i$. Under assumption (33), as $n \to \infty$,

\[
\Bigl(\sqrt{n}\bigl(\widehat{E\varepsilon^2} - E\varepsilon^2\bigr),\; \sqrt{K/\bar{n}}\bigl([Y,Y]^{(\text{avg})}_T - [X,X]^{(\text{avg})}_T - 2\bar{n}E\varepsilon^2\bigr)\Bigr)
\]

converges in law to a bivariate normal with mean 0 and covariance matrix

\[
\begin{pmatrix} E\varepsilon^4 & 2\operatorname{var}(\varepsilon^2) \\ 2\operatorname{var}(\varepsilon^2) & 4E\varepsilon^4 \end{pmatrix}, \tag{A14}
\]

conditional on the $X$ process, where the limiting random variable is independent of the $X$ process.

Proof. By Lemma A2, we need the distribution of $[\varepsilon,\varepsilon]^{(\text{avg})}$ and $[\varepsilon,\varepsilon]^{(\text{all})}$. First, we explore the convergence of

\[
\frac{1}{\sqrt{n}}\Bigl([\varepsilon,\varepsilon]^{(\text{all})}_T - 2nE\varepsilon^2,\; [\varepsilon,\varepsilon]^{(\text{avg})}_T K - 2\bar{n}KE\varepsilon^2\Bigr). \tag{A15}
\]

Recall that all of the sampling points $t_0, t_1, \ldots, t_n$ are within $[0,T]$. We use $\mathcal{G}$ to denote the time points in the full sampling, as in the single grid; $\mathcal{G}^{(k)}$ denotes the subsamples from the $k$th grid. As before, if $t_i \in \mathcal{G}^{(k)}$, then $t_{i,-}$ and $t_{i,+}$ are the previous and next elements in $\mathcal{G}^{(k)}$; $\varepsilon_{t_{i,-}} = 0$ for $t_i = \min \mathcal{G}^{(k)}$, and $\varepsilon_{t_{i,+}} = 0$ for $t_i = \max \mathcal{G}^{(k)}$.

Set

\[
M^{(1)}_T = \frac{1}{\sqrt{n}}\sum_{t_i \in \mathcal{G}} \bigl(\varepsilon_{t_i}^2 - E\varepsilon^2\bigr), \qquad
M^{(2)}_T = \frac{1}{\sqrt{n}}\sum_{t_i \in \mathcal{G}} \varepsilon_{t_i}\varepsilon_{t_{i-1}}, \qquad
M^{(3)}_T = \frac{1}{\sqrt{n}}\sum_{k=1}^{K}\sum_{t_i \in \mathcal{G}^{(k)}} \varepsilon_{t_i}\varepsilon_{t_{i,-}}. \tag{A16}
\]


We first find the asymptotic distribution of $(M^{(1)}_T, M^{(2)}_T, M^{(3)}_T)$ using the martingale central limit theorem; then we use the result to find the limit of (A15).

Note that $(M^{(1)}_T, M^{(2)}_T, M^{(3)}_T)$ are the end points of martingales with respect to the filtration $\mathcal{F}_i = \sigma(\varepsilon_{t_j}, j \le i;\ X_t, \text{all } t)$. We now derive their (discrete-time) predictable quadratic variations and covariations $\langle M^{(l)}, M^{(k)}\rangle$, $l,k = 1,2,3$ [discrete-time predictable quadratic variations are used only in this proof and the proof of Lemma 1, and are different from the continuous-time quadratic variations in (12)]:

\[
\bigl\langle M^{(1)}, M^{(1)}\bigr\rangle_T = \frac{1}{n}\sum_{t_i \in \mathcal{G}} \operatorname{var}\bigl(\varepsilon_{t_i}^2 - E\varepsilon^2 \,\big|\, \mathcal{F}_{i-1}\bigr) = \operatorname{var}(\varepsilon^2),
\]
\[
\bigl\langle M^{(2)}, M^{(2)}\bigr\rangle_T = \frac{1}{n}\sum_{t_i \in \mathcal{G}} \operatorname{var}\bigl(\varepsilon_{t_i}\varepsilon_{t_{i-1}} \,\big|\, \mathcal{F}_{i-1}\bigr) = \frac{E\varepsilon^2}{n}\sum_{t_i \in \mathcal{G}} \varepsilon_{t_{i-1}}^2 = (E\varepsilon^2)^2 + o_p(1), \tag{A17}
\]
\[
\bigl\langle M^{(3)}, M^{(3)}\bigr\rangle_T = \frac{1}{n}\sum_{k=1}^{K}\sum_{t_i \in \mathcal{G}^{(k)}} \operatorname{var}\bigl(\varepsilon_{t_i}\varepsilon_{t_{i,-}} \,\big|\, \mathcal{F}_{i-1}\bigr) = \frac{E\varepsilon^2}{n}\sum_{k=1}^{K}\sum_{t_i \in \mathcal{G}^{(k)}} \varepsilon_{t_{i,-}}^2 = (E\varepsilon^2)^2 + o_p(1),
\]

by the law of large numbers. Similarly, for the predictable quadratic covariations,

\[
\bigl\langle M^{(1)}, M^{(2)}\bigr\rangle_T = \frac{1}{n}\sum_{t_i \in \mathcal{G}} \operatorname{cov}\bigl(\varepsilon_{t_i}^2 - E\varepsilon^2,\ \varepsilon_{t_i}\varepsilon_{t_{i-1}} \,\big|\, \mathcal{F}_{i-1}\bigr) = E\varepsilon^3\,\frac{1}{n}\sum_{t_i \in \mathcal{G}} \varepsilon_{t_{i-1}} = o_p(1),
\]
\[
\bigl\langle M^{(1)}, M^{(3)}\bigr\rangle_T = \frac{1}{n}\sum_{k=1}^{K}\sum_{t_i \in \mathcal{G}^{(k)}} \operatorname{cov}\bigl(\varepsilon_{t_i}^2 - E\varepsilon^2,\ \varepsilon_{t_i}\varepsilon_{t_{i,-}} \,\big|\, \mathcal{F}_{i-1}\bigr) = E\varepsilon^3\,\frac{1}{n}\sum_{k=1}^{K}\sum_{t_i \in \mathcal{G}^{(k)}} \varepsilon_{t_{i,-}} = o_p(1), \tag{A18}
\]
\[
\bigl\langle M^{(2)}, M^{(3)}\bigr\rangle_T = \frac{1}{n}\sum_{k=1}^{K}\sum_{t_i \in \mathcal{G}^{(k)}} \operatorname{cov}\bigl(\varepsilon_{t_i}\varepsilon_{t_{i-1}},\ \varepsilon_{t_i}\varepsilon_{t_{i,-}} \,\big|\, \mathcal{F}_{i-1}\bigr) = \frac{E\varepsilon^2}{n}\sum_{k=1}^{K}\sum_{t_i \in \mathcal{G}^{(k)}} \varepsilon_{t_{i-1}}\varepsilon_{t_{i,-}} = o_p(1),
\]

because $t_{i+1}$ is not in the same grid as $t_i$.

Because the $\varepsilon_{t_i}$ are iid and $E\varepsilon_{t_i}^4 < \infty$, the conditional Lindeberg conditions are satisfied. Hence, by the martingale central limit theorem (see Hall and Heyde 1980, condition 3.1, p. 58), $(M^{(1)}, M^{(2)}, M^{(3)})$ is asymptotically normal with covariance matrix given by the asymptotic values of the $\langle M^{(l)}, M^{(k)}\rangle$. In other words, asymptotically, $(M^{(1)}, M^{(2)}, M^{(3)})$ are independent normal with respective variances $\operatorname{var}(\varepsilon^2)$, $(E\varepsilon^2)^2$, and $(E\varepsilon^2)^2$.

Returning to (A15), we have that

\[
[\varepsilon,\varepsilon]^{(\text{all})} - 2nE\varepsilon^2 = 2\sum_{i \ne 0,n}\bigl(\varepsilon_{t_i}^2 - E\varepsilon^2\bigr) + \bigl(\varepsilon_{t_0}^2 - E\varepsilon^2\bigr) + \bigl(\varepsilon_{t_n}^2 - E\varepsilon^2\bigr) - 2\sum_{t_i > 0}\varepsilon_{t_i}\varepsilon_{t_{i-1}} = 2\sqrt{n}\bigl(M^{(1)} - M^{(2)}\bigr) + O_p(1). \tag{A19}
\]

Meanwhile,

\[
[\varepsilon,\varepsilon]^{(k)} - 2n_k E\varepsilon^2 = \sum_{t_i \in \mathcal{G}^{(k)},\, t_i \ne \max \mathcal{G}^{(k)}} \bigl(\varepsilon_{t_{i,+}} - \varepsilon_{t_i}\bigr)^2 - 2n_k E\varepsilon^2
= 2\sum_{t_i \in \mathcal{G}^{(k)}} \bigl(\varepsilon_{t_i}^2 - E\varepsilon^2\bigr) - \bigl(\varepsilon_{\min \mathcal{G}^{(k)}}^2 - E\varepsilon^2\bigr) - \bigl(\varepsilon_{\max \mathcal{G}^{(k)}}^2 - E\varepsilon^2\bigr) - 2\sum_{t_i \in \mathcal{G}^{(k)}} \varepsilon_{t_i}\varepsilon_{t_{i,-}}, \tag{A20}
\]

where $n_k + 1$ is the total number of sampling points in $\mathcal{G}^{(k)}$. Hence

\[
[\varepsilon,\varepsilon]^{(\text{avg})}_T K - 2\bar{n}E\varepsilon^2 K = \sqrt{n}\bigl(2M^{(1)} - 2M^{(3)}\bigr) - R = 2\sqrt{n}\bigl(M^{(1)} - M^{(3)}\bigr) + O_p(K^{1/2}), \tag{A21}
\]

because $R = \sum_{k=1}^{K}\bigl[(\varepsilon_{\min \mathcal{G}^{(k)}}^2 - E\varepsilon^2) + (\varepsilon_{\max \mathcal{G}^{(k)}}^2 - E\varepsilon^2)\bigr]$ satisfies $ER^2 = \operatorname{var}(R) \le 4K\operatorname{var}(\varepsilon^2)$.

Because nminus1K rarr 0 and because the error terms in (A19) and(A20) are uniformly integrable it follows that

(A15) = 2(M(1) minus M(2)M(1) minus M(3)

)+ op(1) (A22)

Hence (A15) is also asymptotically normal with covariance matrix(

4Eε4 4 var(ε2)

4 var(ε2) 4Eε4

)
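The entries of this matrix can be checked directly from the limiting variances of $M^{(1)}$, $M^{(2)}$, $M^{(3)}$ and their asymptotic independence; a short verification:

```latex
% Diagonal entry: by independence of M^{(1)} and M^{(2)},
%   var(2(M^{(1)} - M^{(2)})) = 4[var(M^{(1)}) + var(M^{(2)})]
\operatorname{var}\bigl(2(M^{(1)}-M^{(2)})\bigr)
  = 4\bigl[\operatorname{var}(\varepsilon^2) + (E\varepsilon^2)^2\bigr]
  = 4\bigl[E\varepsilon^4 - (E\varepsilon^2)^2 + (E\varepsilon^2)^2\bigr]
  = 4E\varepsilon^4.
% Off-diagonal entry: only the common term M^{(1)} contributes,
\operatorname{cov}\bigl(2(M^{(1)}-M^{(2)}),\,2(M^{(1)}-M^{(3)})\bigr)
  = 4\operatorname{var}(M^{(1)}) = 4\operatorname{var}(\varepsilon^2).
```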

By Lemma A.2, and as $n^{-1}K \to 0$,

$$\frac{1}{\sqrt{n}}\Bigl([Y,Y]^{(all)}_T - 2nE\varepsilon^2,\ \bigl([Y,Y]^{(avg)}_T - [X,X]^{(avg)}_T - 2\bar{n}E\varepsilon^2\bigr)K\Bigr)\Bigm| X$$

is asymptotically normal:

$$\frac{1}{\sqrt{n}}\begin{pmatrix} [Y,Y]^{(all)}_T - 2nE\varepsilon^2 \\ \bigl([Y,Y]^{(avg)}_T - 2\bar{n}E\varepsilon^2 - [X,X]^{(avg)}_T\bigr)K \end{pmatrix}\Bigg| X = 2\begin{pmatrix} M^{(1)} - M^{(2)} \\ M^{(1)} - M^{(3)} \end{pmatrix} + o_p(1) \xrightarrow{\ \mathcal{L}\ } 2N\Biggl(0,\ \begin{pmatrix} E\varepsilon^4 & \operatorname{var}(\varepsilon^2) \\ \operatorname{var}(\varepsilon^2) & E\varepsilon^4 \end{pmatrix}\Biggr). \tag{A23}$$

Because

$$\widehat{E\varepsilon^2} = \frac{1}{2n}[Y,Y]^{(all)}_T \quad\text{and}\quad \frac{K}{\sqrt{n}} = \sqrt{\frac{K}{\bar{n}}}\,(1 + o(1)), \tag{A24}$$

Theorem A.1 follows.

Proof of Lemma 1. For simplicity, and without loss of generality, we assume that $\mathcal{H} = \mathcal{G}$. The result follows from the proofs of Lemma A.2 and Theorem A.1, but with a more exact representation of $[Y,Y]^{(all)}_T - [X,X]^{(all)}_T$. Specifically, from (A9) and (A19), we have that

$$[Y,Y]^{(all)}_T - [X,X]^{(all)}_T - 2nE\varepsilon^2 = [\varepsilon,\varepsilon]_T - 2nE\varepsilon^2 + 2[X,\varepsilon]^{(all)}_T$$
$$= 2\sum_{i\neq 0,n}\bigl(\varepsilon_{t_i}^2 - E\varepsilon^2\bigr) + \bigl(\varepsilon_{t_0}^2 - E\varepsilon^2\bigr) + \bigl(\varepsilon_{t_n}^2 - E\varepsilon^2\bigr) - 2\sum_{t_i>0}\varepsilon_{t_i}\varepsilon_{t_{i-1}} + 2\sum_{i=1}^{n-1}\bigl(\Delta X_{t_{i-1}} - \Delta X_{t_i}\bigr)\varepsilon_{t_i} + 2\,\Delta X_{t_{n-1}}\varepsilon_{t_n} - 2\,\Delta X_{t_0}\varepsilon_{t_0}.$$

1410 Journal of the American Statistical Association December 2005

With the same filtration as in the proof of Theorem A.1, this is the sum of a martingale triangular array with increments $2(\varepsilon_{t_i}^2 - E\varepsilon^2) - 2\varepsilon_{t_i}\varepsilon_{t_{i-1}} + 2(\Delta X_{t_{i-1}} - \Delta X_{t_i})\varepsilon_{t_i}$ for $i \neq 0, n$. Again, the conditions of the martingale central limit theorem (see Hall and Heyde 1980, chap. 3) are satisfied by (17), whence the term is asymptotically normal, conditionally on the X process. The conditional variance is given by (20), under the conditions mentioned in the statement of the lemma. Because $E\varepsilon^2$ can vary with n, the $O_p(n_{\mathrm{sparse}}^{-1/2})$ term in (20) is actually of order $O_p(n_{\mathrm{sparse}}^{-1/2}E\varepsilon^2)$.

A.3 Asymptotics of $D_T$

For transparency of notation, we take $\overline{\Delta t} = T/n$; in other words, the average of the $\Delta t_i$.

Proof of Theorem 2. Note first that, because we assume that $\mu_t$ and $\sigma_t$ are continuous, they are locally bounded. Because here we seek to show convergence in probability, we can therefore assume, without loss of generality, that $|\mu_t|$ is bounded above by a nonrandom constant, and also that $\infty > \sigma^+ \ge \sigma_t \ge \sigma^- > 0$, where $\sigma^+$ and $\sigma^-$ are nonrandom. This is by the usual stopping time argument.

Also note that once $\mu_t$ and $\sigma_t$ are bounded, as we have now assumed, by Girsanov's theorem (see, e.g., Karatzas and Shreve 1991, chap. 3.5; Jacod and Shiryaev 2003, chap. II.3b), we can, without loss of generality, further suppose that $\mu_t = 0$ identically. (If the convergence in probability holds under the equivalent probability measure, then it also holds under the original one.)

Now for the derivation. First consider

$$\bigl(X_{t_i} - X_{t_i-}\bigr)^2 = \Biggl(\sum_{t_j=t_i-}^{t_{i-1}}\Delta X_{t_j}\Biggr)^2 = \sum_{t_j=t_i-}^{t_{i-1}}\bigl(\Delta X_{t_j}\bigr)^2 + 2\sum_{t_{i-1}\ge t_k>t_l\ge t_i-}\Delta X_{t_k}\,\Delta X_{t_l}.$$

It follows that

$$[X,X]^{(avg)}_T = [X,X]^{(all)}_T + 2\sum_{i=1}^{n-1}\Delta X_{t_i}\sum_{j=1}^{K\wedge i}\Bigl(1-\frac{j}{K}\Bigr)\Delta X_{t_{i-j}} + O_p(K/n).$$

The remainder term incorporates end effects; it has order $O_p(K/n)$ by (41) and (42), and due to the regular allocation of points to grids. Following Jacod and Protter (1998) and Mykland and Zhang (2002), $[X,X]^{(all)}_T - \langle X,X\rangle_T = O_p(1/\sqrt{n}\,) = o_p(\sqrt{K/n}\,)$, too, under the assumption (17) [which is a special case of assumption (41)] and that $\inf_{t\in[0,T]}\sigma_t^2 > 0$ almost surely. In other words,

$$D_T = [X,X]^{(avg)}_T - \langle X,X\rangle_T = 2\sum_{i=1}^{n-1}\Delta X_{t_i}\sum_{j=1}^{K\wedge i}\Bigl(1-\frac{j}{K}\Bigr)\Delta X_{t_{i-j}} + o_p\bigl(\sqrt{K\overline{\Delta t}}\,\bigr). \tag{A25}$$

It follows that

$$\langle D,D\rangle_T = 4\sum_{i=1}^{n-1}\Delta\langle X,X\rangle_{t_i}\Biggl(\sum_{j=1}^{K\wedge i}\Bigl(1-\frac{j}{K}\Bigr)\Delta X_{t_{i-j}}\Biggr)^2 + o_p\bigl(K\overline{\Delta t}\bigr) = (I) + (II) + o_p\bigl(K\overline{\Delta t}\bigr), \tag{A26}$$

where

$$(I) = 4\sum_{i=1}^{n-1}\Delta\langle X,X\rangle_{t_i}\sum_{j=1}^{K\wedge i}\Bigl(1-\frac{j}{K}\Bigr)^2\bigl(\Delta X_{t_{i-j}}\bigr)^2 = 4\sum_{i=1}^{n-1}\sigma_{t_i}^4\,\Delta t_i\sum_{j=1}^{K\wedge i}\Bigl(1-\frac{j}{K}\Bigr)^2\Delta t_{i-j} + o_p\bigl(K\overline{\Delta t}\bigr) = K\overline{\Delta t}\sum_{i=1}^{n-1}\sigma_{t_i}^4\,h_i\,\Delta t_i + o_p\bigl(K\overline{\Delta t}\bigr),$$

where $h_i$ is given by (43). Thus Theorem 2 will have been shown if we can show that

$$(II) = 8\sum_{i=1}^{n-1}\Delta\langle X,X\rangle_{t_i}\,\zeta_i \tag{A27}$$

is of order $o_p(K\overline{\Delta t})$, where

$$\zeta_i = \sum_{i-1\ge l>r\ge 0}\Delta X_{t_l}\,\Delta X_{t_r}\Bigl(1-\frac{i-l}{K}\Bigr)^{+}\Bigl(1-\frac{i-r}{K}\Bigr)^{+}. \tag{A28}$$

We do this as follows. Denote $\delta^+ = \max_i|\Delta t_i| = O(1/n)$, and set $(II)' = 8\sum_{i=1}^{n-1}\Delta t_i\,\sigma^2_{t_{i-K}}\zeta_i$. Then, by Hölder's inequality,

$$E\bigl|(II)-(II)'\bigr| \le \delta^+ n \sup_{|t-s|\le(K+1)\delta^+}\bigl\|\sigma_t^2-\sigma_s^2\bigr\|_2\,\sup_i\|\zeta_i\|_2 = o(\delta^+K),$$

because of the boundedness and continuity of $\sigma_t$, and because, applying the Burkholder–Davis–Gundy inequality twice, we have that

$$E\bigl(\zeta_i^2\bigr) \le E\sum_{l=1}^{i-1}\Delta\langle X,X\rangle_{t_l}\Biggl(\sum_{r=0}^{(l-1)\wedge(i-1)}\Delta X_{t_r}\Bigl(1-\frac{i-l}{K}\Bigr)^{+}\Bigl(1-\frac{i-r}{K}\Bigr)^{+}\Biggr)^2$$
$$\le \delta^+(\sigma^+)^2\sum_{l=1}^{i-1}E\Biggl(\sum_{r=0}^{(l-1)\wedge(i-1)}\Delta X_{t_r}\Bigl(1-\frac{i-l}{K}\Bigr)^{+}\Bigl(1-\frac{i-r}{K}\Bigr)^{+}\Biggr)^2$$
$$\le (\delta^+)^2(\sigma^+)^4\sum_{l=1}^{i-1}\sum_{r=0}^{(l-1)\wedge(i-1)}\Biggl(\Bigl(1-\frac{i-l}{K}\Bigr)^{+}\Bigl(1-\frac{i-r}{K}\Bigr)^{+}\Biggr)^2 = O\bigl((\delta^+K)^2\bigr) \quad\text{uniformly in } i, \tag{A29}$$

where the second-to-last transition does for the inner sum what the first two inequalities do for the outer sum. Now rewrite

$$(II)' = 8\sum_{l>r\ge 0}^{n-1}\Delta X_{t_l}\,\Delta X_{t_r}\sum_{i=l+1}^{n-1}\Delta t_i\,\sigma^2_{t_{i-K}}\Bigl(1-\frac{i-l}{K}\Bigr)^{+}\Bigl(1-\frac{i-r}{K}\Bigr)^{+}. \tag{A30}$$

In the same way as in (A29), we then obtain

$$E\bigl((II)'\bigr)^2 \le 64(\delta^+)^2(\sigma^+)^4\sum_{l>r\ge 0}^{n-1}E\Biggl(\sum_{i=l+1}^{n-1}\Delta t_i\,\sigma^2_{t_{i-K}}\Bigl(1-\frac{i-l}{K}\Bigr)^{+}\Bigl(1-\frac{i-r}{K}\Bigr)^{+}\Biggr)^2$$

Zhang, Mykland, and Aït-Sahalia: Determining Integrated Volatility 1411

$$\le 64(\delta^+)^2(\sigma^+)^4\,(\delta^+)^2(\sigma^+)^4\sum_{l>r\ge 0}^{n-1}\Biggl(\sum_{i=l+1}^{n-1}\Bigl(1-\frac{i-l}{K}\Bigr)^{+}\Bigl(1-\frac{i-r}{K}\Bigr)^{+}\Biggr)^2 \le 64(\delta^+)^4(\sigma^+)^8\,nK^3 = O\bigl((\delta^+K)^3\bigr) = o\bigl((\delta^+K)^2\bigr), \tag{A31}$$

by assumptions (41) and (42). The transition from the second to the last line in (A31) is due to the fact that $\sum_{i=l+1}^{n-1}(1-\frac{i-l}{K})^{+}(1-\frac{i-r}{K})^{+} \le K$, and $= 0$ when $l-r \ge K$. Thus we have shown that $(II) = o_p(K\overline{\Delta t})$, and hence Theorem 2 has been established.

We now proceed to the asymptotic distribution of $D_T$. We first state a technical condition on the filtration $(\mathcal{F}_t)_{0\le t\le T}$ to which $X_t$ and $\mu_t$ (but not the $\varepsilon$'s) are assumed to be adapted.

Condition E (Description of the filtration). There is a continuous multidimensional $P$-local martingale $\mathcal{X} = (\mathcal{X}^{(1)}, \ldots, \mathcal{X}^{(p)})$, for some $p$, so that $\mathcal{F}_t$ is the smallest sigma-field containing $\sigma(\mathcal{X}_s,\ s \le t)$ and $\mathcal{N}$, where $\mathcal{N}$ contains all of the null sets in $\sigma(\mathcal{X}_s,\ s \le T)$.

For example, $\mathcal{X}$ can be a collection of Brownian motions.

Proof of Theorem 3. We can show, by methods similar to those in the proof of Theorem 2, that if $L$ is any martingale adapted to the filtration generated by $\mathcal{X}$, then

$$\sup_t\ \biggl|\frac{1}{\overline{\Delta t}\,K}\langle D,L\rangle_t\biggr| \xrightarrow{\ p\ } 0. \tag{A32}$$

The stable convergence with respect to the filtration $(\mathcal{F}_t)_{0\le t\le T}$ then follows in view of Rootzén (1980) or Jacod and Protter (1998). This ends the proof of Theorem 3.

Finally, in the case where $\eta_n^2$ does not converge, one can still use the mixed normal with variance $\eta_n^2$. This is because every subsequence of $\eta_n^2$ has a further subsequence that does converge in probability to some $\eta^2$, and hence for which the assumption (48) in Theorem 3 would be satisfied. The reason for this is that one can define the distribution function of a finite measure by

$$G_n(t) = \sum_{t_{i+1}\le t} h_i\,\Delta t_i. \tag{A33}$$

Because $G_n(t) \le T\sup_i h_i$, it follows from (45) that the sequence $G_n$ is weakly compact in the sense of weak convergence; see Helly's theorem (e.g., Billingsley 1995, p. 336). For any convergent subsequence $G_n \to G$, we then get that

$$\eta_n^2 = \int_0^T \sigma_t^4\,dG_n(t) \to \int_0^T \sigma_t^4\,dG(t) \tag{A34}$$

almost surely, because we have assumed that $\langle X,X\rangle'_t$ is a continuous function of $t$. One then defines $\eta^2$ to be the (subsequence-dependent) right side of (A34). To proceed further with the asymptotics, continue with the foregoing subsequence, and note that

$$\frac{1}{\overline{\Delta t}\,K}\langle D,D\rangle_t \approx \int_0^t \sigma_s^4\,dG_n(s) \to \int_0^t \sigma_s^4\,dG(s),$$

completing the proof.

[Received October 2003. Revised December 2004.]

REFERENCES

Aït-Sahalia, Y., Mykland, P. A., and Zhang, L. (2005), "How Often to Sample a Continuous-Time Process in the Presence of Market Microstructure Noise," Review of Financial Studies, 18, 351–416.

Aldous, D. J., and Eagleson, G. K. (1978), "On Mixing and Stability of Limit Theorems," The Annals of Probability, 6, 325–331.

Andersen, T. G., Bollerslev, T., Diebold, F. X., and Labys, P. (2001), "The Distribution of Exchange Rate Realized Volatility," Journal of the American Statistical Association, 96, 42–55.

Bandi, F. M., and Russell, J. R. (2003), "Microstructure Noise, Realized Volatility, and Optimal Sampling," technical report, University of Chicago Graduate School of Business.

Barndorff-Nielsen, O. E., and Shephard, N. (2002), "Econometric Analysis of Realized Volatility and Its Use in Estimating Stochastic Volatility Models," Journal of the Royal Statistical Society, Ser. B, 64, 253–280.

Billingsley, P. (1995), Probability and Measure (3rd ed.), New York: Wiley.

Brown, S. J. (1990), "Estimating Volatility," in Financial Options: From Theory to Practice, eds. S. Figlewski, W. Silber, and M. Subrahmanyam, Homewood, IL: Business One-Irwin, pp. 516–537.

Chernov, M., and Ghysels, E. (2000), "A Study Towards a Unified Approach to the Joint Estimation of Objective and Risk Neutral Measures for the Purpose of Options Valuation," Journal of Financial Economics, 57, 407–458.

Dacorogna, M. M., Gençay, R., Müller, U., Olsen, R. B., and Pictet, O. V. (2001), An Introduction to High-Frequency Finance, San Diego: Academic Press.

Gallant, A. R., Hsu, C.-T., and Tauchen, G. T. (1999), "Using Daily Range Data to Calibrate Volatility Diffusions and Extract the Forward Integrated Variance," The Review of Economics and Statistics, 81, 617–631.

Gloter, A. (2000), "Estimation des Paramètres d'une Diffusion Cachée," unpublished doctoral thesis, Université de Marne-la-Vallée.

Hall, P., and Heyde, C. C. (1980), Martingale Limit Theory and Its Application, Boston: Academic Press.

Hansen, P. R., and Lunde, A. (2004a), "Realized Variance and IID Market Microstructure Noise," technical report, Brown University, Dept. of Economics.

——— (2004b), "An Unbiased Measure of Realized Variance," technical report, Brown University, Dept. of Economics.

Heston, S. (1993), "A Closed-Form Solution for Options With Stochastic Volatility, With Applications to Bonds and Currency Options," Review of Financial Studies, 6, 327–343.

Hull, J., and White, A. (1987), "The Pricing of Options on Assets With Stochastic Volatilities," Journal of Finance, 42, 281–300.

Jacod, J., and Protter, P. (1998), "Asymptotic Error Distributions for the Euler Method for Stochastic Differential Equations," The Annals of Probability, 26, 267–307.

Jacod, J., and Shiryaev, A. N. (2003), Limit Theorems for Stochastic Processes (2nd ed.), New York: Springer-Verlag.

Karatzas, I., and Shreve, S. E. (1991), Brownian Motion and Stochastic Calculus, New York: Springer-Verlag.

Mykland, P. A., and Zhang, L. (2002), "ANOVA for Diffusions," technical report, University of Chicago, Dept. of Statistics.

O'Hara, M. (1995), Market Microstructure Theory, Cambridge, MA: Blackwell.

Oomen, R. (2004), "Properties of Realized Variance for a Pure Jump Process: Calendar Time Sampling versus Business Time Sampling," technical report, University of Warwick, Warwick Business School.

Rényi, A. (1963), "On Stable Sequences of Events," Sankhyā, Ser. A, 25, 293–302.

Rootzén, H. (1980), "Limit Distributions for the Error in Approximations of Stochastic Integrals," The Annals of Probability, 8, 241–251.

Zhou, B. (1996), "High-Frequency Data and Volatility in Foreign-Exchange Rates," Journal of Business & Economic Statistics, 14, 45–52.

Zumbach, G., Corsi, F., and Trapletti, A. (2002), "Efficient Estimation of Volatility Using High-Frequency Data," technical report, Olsen & Associates.


and, finally, $\langle X,X\rangle^{(adj)}_T$. We assume that a day consists of 6.5 hours of open trading, as is the case on the NYSE and NASDAQ. For the fourth-best estimator, we represent this approach using sparse sampling at a frequency of once every 5 minutes.

7.2 Simulation Results

Table 2 reports the simulation results for the five estimation strategies. For each estimator, the bias is calculated as $E[\text{estimator} - \langle X,X\rangle_T]$. The variance row reports the quantity

$$E\Biggl[\operatorname{var}\biggl(\text{estimator} - \langle X,X\rangle_T \biggm| \int_0^T\sigma_t^2\,dt,\ \int_0^T\sigma_t^4\,dt\biggr)\Biggr]. \tag{73}$$

The rows marked "relative" report the corresponding results in percentage terms, that is, for

$$\frac{\text{estimator} - \langle X,X\rangle_T}{\langle X,X\rangle_T}.$$

Note that although the convergence of the discretization error is stable, as opposed to conditional, (73) seems like a reasonable stand-in for the theoretical quantity under consideration.

In each case, "small sample" represents the average value over the M simulated paths, and "asymptotic" refers to the value predicted by our theory. The noise parts of the asymptotic variances are taken to include the relevant second-order terms, as given by (20), (37), and (60), yielding spreads like (in the latter case) (61). The small-sample variance is computed by binning the results of the simulations according to the values of the pair $(\int_0^T\sigma_t^2\,dt, \int_0^T\sigma_t^4\,dt)$, calculating the variance within each bin, and then averaging across the bins, weighted by the number of elements in each bin.

Comparing the rows showing small-sample results to their asymptotic predictions, we see that the asymptotic theory provides a good approximation to the first two moments of the small-sample distribution of the five estimators. Comparing across columns, we see that the differences between the five estimators are quite large. As expected, the naive fifth-best strategy produces disastrous results. The first-best strategy results in a large decrease of root MSE (RMSE) relative to the others, including in particular the fourth-best strategy, which is the approach currently used in most of the empirical literature. But, at least for the parameter values used here, determining the optimal sampling frequency (the sparse optimal, or third-best, strategy) results in only a modest improvement over the arbitrary selection of a sampling frequency (the sparse, or fourth-best, strategy). Using all of the data, as in the subsampled and averaged estimator (second-best), but especially the bias correction obtained by combining two time scales, as in our first-best approach, is where the major practical improvements reside.

It may seem surprising that the variance of the estimator is reduced so much from the second-best to the first-best estimator. After all, if one sets $c = Kn^{-2/3}$, then the asymptotic normal term in (7) becomes

$$\frac{1}{n^{1/6}}\Biggl[\underbrace{\frac{4}{c^2}\bigl(E\varepsilon^4\bigr)}_{\text{due to noise}} + \underbrace{c\,\frac{4T}{3}\int_0^T\sigma_t^4\,dt}_{\text{due to discretization}}\Biggr]^{1/2} Z_{\text{total}},$$

where the bracketed quantity is the total variance. This is quite similar to the error term in (9). However, the c for the second-best estimator is chosen by (8), whereas for the first-best case it is given by (10). The two optima are very different, because for the second-best estimator c is chosen by a bias-variance trade-off, whereas for the first-best estimator it is chosen to minimize the asymptotic variance. From this standpoint, it is quite reasonable for the first-best estimator to also have a substantially better asymptotic variance. In fact, from (8) and (10), it is easy to see that for the optimal choices,

$$c^*_{\text{second best}} = n^{1/3}\,2^{-1/3}\Biggl(\frac{E\varepsilon^4}{(E\varepsilon^2)^2}\Biggr)^{1/3} c^*_{\text{first best}},$$

whence the total variance is in fact quite different for the two estimators.

Figure 1 shows the small-sample and asymptotic RMSE of the $[Y,Y]^{(sparse)}_T$ estimator as a function of the subsampling interval, illustrating the minimization problem studied in Section 2.3. Finally, Figure 2 shows the standardized distribution of the first-best estimator obtained from the simulations (histogram) and the corresponding asymptotic distribution predicted by our theory (solid line). Due to the effect of bias, the distributions of the fifth- to second-best estimators are irrelevant, because they cannot be used to set intervals for the volatility.

8. DISCUSSION

In this work, we have quantified and corrected the effect of noise on the nonparametric assessment of integrated volatility. In the setting of high-frequency data, the usual financial practice is to use sparse sampling, in other words, throwing

Table 2. Monte Carlo Simulations for the Five Estimation Strategies

                                 Fifth-best       Fourth-best        Third-best              Second-best      First-best
                                 [Y,Y]_T^(all)    [Y,Y]_T^(sparse)   [Y,Y]_T^(sparse,opt)    [Y,Y]_T^(avg)    <X,X>_T^(adj)
Small-sample bias                1.1699 x 10^-2   3.89 x 10^-5       2.18 x 10^-5            1.926 x 10^-5    2 x 10^-8
Asymptotic bias                  1.1700 x 10^-2   3.90 x 10^-5       2.20 x 10^-5            1.927 x 10^-5    0
Small-sample variance            1.791 x 10^-8    1.4414 x 10^-9     1.59 x 10^-9            9.41 x 10^-10    9 x 10^-11
Asymptotic variance              1.788 x 10^-8    1.4409 x 10^-9     1.58 x 10^-9            9.37 x 10^-10    8 x 10^-11
Small-sample RMSE                1.1699 x 10^-2   5.437 x 10^-5      4.543 x 10^-5           3.622 x 10^-5    9.4 x 10^-6
Asymptotic RMSE                  1.1700 x 10^-2   5.442 x 10^-5      4.546 x 10^-5           3.618 x 10^-5    8.9 x 10^-6
Small-sample relative bias       182              .61                .18                     .15              -.00045
Small-sample relative variance   82,502           1.15               .11                     .053             .0043
Small-sample relative RMSE       340              1.24               .37                     .28              .065
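As an internal consistency check on Table 2 (the numbers below are transcribed from the small-sample rows of the table itself), the reported RMSEs should agree with the identity RMSE = sqrt(bias^2 + variance), up to rounding of the reported digits. A minimal sketch:

```python
import math

# Small-sample (bias, variance, RMSE) triples transcribed from Table 2,
# for the fifth- through first-best estimators.
rows = [
    (1.1699e-2, 1.791e-8,  1.1699e-2),
    (3.89e-5,   1.4414e-9, 5.437e-5),
    (2.18e-5,   1.59e-9,   4.543e-5),
    (1.926e-5,  9.41e-10,  3.622e-5),
    (2e-8,      9e-11,     9.4e-6),
]

for bias, var, rmse in rows:
    implied = math.sqrt(bias ** 2 + var)
    # The tabulated RMSE matches sqrt(bias^2 + variance) to within
    # the rounding of the reported digits.
    assert abs(implied - rmse) / rmse < 0.02, (implied, rmse)
```

The same identity also holds for the relative rows (in percent), which is how the decimal scaling of those rows can be confirmed.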


Figure 1. RMSE of the Sparse Estimator as a Function of the Subsampling Frequency (RMSE, small sample; RMSE, asymptotic).

away most of the available data. We have argued that this is caused by not incorporating the noise in the model. Although it is statistically unsound to throw away data, we have shown that it is possible to build on this practice to construct estimators that make statistical sense.

Specifically, we have found that the usual realized volatility mainly estimates the magnitude of the noise term, rather than anything to do with volatility. An approach built on separating the observations into multiple "grids" lessens this problem. We found that the best results can be obtained by combining the usual ("single-grid") realized volatility with the multiple-grid-based device. This gives an estimator that is approximately unbiased; we have also shown how to assess the (random) variance of this estimator. We also show that, for our most recommended procedure, the optimal number of multiple grids is of order $O(n^{2/3})$ (see Sec. 4). Most of the development is in the context of finding the integrated volatility over one time period; at the end, we extend this to multiple periods. Also, in the case where the noise can be taken to be almost negligible, we provide a way of optimizing the sampling frequency if one wishes to use the classical "realized volatility" or its multigrid extension.

One important message of the article is that any time one has an impulse to sample sparsely, one can always do better with a multigrid method, regardless of the model or the quantity being estimated.
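The two-scales construction can be sketched in a few lines. The code below is a minimal illustration (without the small-sample adjustment factor): it forms the subsampled-and-averaged realized volatility over K grids and subtracts the bias estimate built from the full-grid realized volatility. The simulation parameters (sample size, volatility, noise level) are illustrative choices, not those of Section 7.

```python
import numpy as np

def realized_var(y):
    """[Y,Y]^(all): sum of squared increments of the observed log-price y."""
    return float(np.sum(np.diff(y) ** 2))

def two_scales_estimator(y, K):
    """<X,X>_T = [Y,Y]^(avg) - (nbar/n) [Y,Y]^(all), using K subgrids."""
    n = len(y) - 1
    avg = float(np.mean([realized_var(y[k::K]) for k in range(K)]))
    nbar = (n - K + 1) / K          # average number of increments per subgrid
    return avg - (nbar / n) * realized_var(y)

# Illustrative simulation: constant volatility plus iid observation noise.
rng = np.random.default_rng(0)
n, sigma, noise_sd = 23_400, 0.2, 1e-3        # e.g., one day sampled every second
x = np.cumsum(rng.normal(0.0, sigma / np.sqrt(n), n + 1))   # latent log-price
y = x + rng.normal(0.0, noise_sd, n + 1)                    # observed log-price
iv = sigma ** 2                                # integrated volatility, here 0.04

rv = realized_var(y)                  # biased upward by roughly 2*n*E(eps^2)
tsrv = two_scales_estimator(y, K=int(n ** (2 / 3)))   # K of order n^(2/3)
```

Here `tsrv` lands far closer to the true integrated volatility than the naive full-grid realized variance, whose bias $2nE\varepsilon^2$ dominates at this sampling frequency.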

Figure 2 Asymptotic and Small-Sample Standardized Distributions of the First-Best Estimator


Our results in this article cover the case where the latent X follows the Itô process model (1). Theorem 1 remains true under the assumption that X is a general semimartingale, that is, a process X that can also have jumps. This follows because the proof of Lemma A.2 remains true for a general semimartingale X (Jacod and Shiryaev 2003, Theorem I.4.47, p. 52). Also, apart from using Lemma A.2, Theorem A.1 is unrelated to the nature of the X process. As far as the effect of jumps on the discretization effect is concerned, it should be noted that consistency holds, as follows. When X is a general semimartingale, then for any sequence of grids $\mathcal{H}_n$ so that (41) holds, $[X,X]^{\mathcal{H}_n}$ converges to the continuous-time quadratic variation of X, that is,

$$\int_0^T\sigma_t^2\,dt + \sum_{0\le t\le T}\bigl(\text{jump of } X \text{ at } t\bigr)^2 \tag{74}$$

(cf. Jacod and Shiryaev 2003, Theorem I.4.47, p. 52). In particular, this also applies to subgrids, under the regularity conditions that we have discussed. Furthermore, by extension of the same proof, $[X,X]^{(avg)}$ also converges to (74). From the same argument, so does $\widehat{\langle X,X\rangle}_T$. In this sense, the two-scales estimator is robust to jumps.
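The convergence of realized variance to (74) is easy to see numerically. The sketch below (path length, volatility, and jump size are illustrative) adds a single jump to a noise-free diffusion path and compares the full-grid realized variance with the limit in (74):

```python
import numpy as np

# Illustrative check of (74): realized variance of a path with one jump
# approximates integrated variance plus the squared jump.
rng = np.random.default_rng(1)
n, sigma, jump = 100_000, 0.2, 0.05
x = np.cumsum(rng.normal(0.0, sigma / np.sqrt(n), n + 1))   # diffusion part
x[n // 2:] += jump                      # add a jump of size 0.05 at t = 1/2

qv = sigma ** 2 + jump ** 2             # limit (74): 0.04 + 0.0025 = 0.0425
rv = float(np.sum(np.diff(x) ** 2))     # [X,X]^(all) on the full grid
```

With n this large, the sampling error of `rv` around `qv` is of order $n^{-1/2}$ and is negligible next to the jump contribution.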

APPENDIX: PROOFS OF RESULTS

When the total grid $\mathcal{G}$ is considered, we use $\sum_{i=1}^{n-1}$, $\sum_{t_{i+1}\le T}$, and $\sum_{t_i\in\mathcal{G}}$ interchangeably in the following proofs. Also, we write "$|X$" to indicate expressions that are conditional on the entire X process.

A.1 Variance of $[Y,Y]_T$ Given the X Process

Here we calculate explicitly the variance in (19), from which the stated approximation follows. The explicit remainder term is also used for (36). Let a partition of $[0,T]$ be $0 = t_0 \le t_1 \le \cdots \le t_n = T$. Under assumption (11),

$$\operatorname{var}\bigl([Y,Y]^{(all)}_T\bigm|X\bigr) = \operatorname{var}\Biggl[\sum_{t_{i+1}\le T}\bigl(\Delta Y_{t_i}\bigr)^2\Bigm|X\Biggr] = \underbrace{\sum_{t_{i+1}\le T}\operatorname{var}\bigl[\bigl(\Delta Y_{t_i}\bigr)^2\bigm|X\bigr]}_{I_T} + 2\underbrace{\sum_{t_{i+1}\le T}\operatorname{cov}\bigl[\bigl(\Delta Y_{t_{i-1}}\bigr)^2,\bigl(\Delta Y_{t_i}\bigr)^2\bigm|X\bigr]}_{II_T},$$

because $\Delta Y_{t_i} = \Delta X_{t_i} + \Delta\varepsilon_{t_i}$ is 1-dependent given the X process. Also,

$$\operatorname{var}\bigl[\bigl(\Delta Y_{t_i}\bigr)^2\bigm|X\bigr] = \kappa_4\bigl(\Delta Y_{t_i}\bigm|X\bigr) + 2\bigl[\operatorname{var}\bigl(\Delta Y_{t_i}\bigm|X\bigr)\bigr]^2 + 4\bigl[E\bigl(\Delta Y_{t_i}\bigm|X\bigr)\bigr]^2\operatorname{var}\bigl(\Delta Y_{t_i}\bigm|X\bigr) + 4E\bigl(\Delta Y_{t_i}\bigm|X\bigr)\kappa_3\bigl(\Delta Y_{t_i}\bigm|X\bigr)$$
$$= \kappa_4\bigl(\Delta\varepsilon_{t_i}\bigr) + 2\bigl[\operatorname{var}\bigl(\Delta\varepsilon_{t_i}\bigr)\bigr]^2 + 4\bigl(\Delta X_{t_i}\bigr)^2\operatorname{var}\bigl(\Delta\varepsilon_{t_i}\bigr) + 4\,\Delta X_{t_i}\,\kappa_3\bigl(\Delta\varepsilon_{t_i}\bigr) \quad\text{(under Assumption (11))}$$
$$= 2\kappa_4(\varepsilon) + 8(E\varepsilon^2)^2 + 8\bigl(\Delta X_{t_i}\bigr)^2E\varepsilon^2,$$

because $\kappa_3(\Delta\varepsilon_{t_i}) = 0$. The κ's are the cumulants of the relevant order:

$$\kappa_1(\varepsilon) = 0, \qquad \kappa_2(\varepsilon) = \operatorname{var}(\varepsilon) = E\varepsilon^2, \qquad \kappa_3(\varepsilon) = E\varepsilon^3, \qquad \kappa_4(\varepsilon) = E\varepsilon^4 - 3(E\varepsilon^2)^2. \tag{A1}$$

So $I_T = n\bigl(2\kappa_4(\varepsilon) + 8(E\varepsilon^2)^2\bigr) + 8[X,X]^{(all)}_T E\varepsilon^2$. Similarly, for the covariance,

$$\operatorname{cov}\bigl[\bigl(\Delta Y_{t_{i-1}}\bigr)^2,\bigl(\Delta Y_{t_i}\bigr)^2\bigm|X\bigr]$$
$$= \operatorname{cov}\bigl[\bigl(\Delta\varepsilon_{t_{i-1}}\bigr)^2,\bigl(\Delta\varepsilon_{t_i}\bigr)^2\bigr] + 4\,\Delta X_{t_{i-1}}\Delta X_{t_i}\operatorname{cov}\bigl(\Delta\varepsilon_{t_{i-1}},\Delta\varepsilon_{t_i}\bigr) + 2\,\Delta X_{t_{i-1}}\operatorname{cov}\bigl[\Delta\varepsilon_{t_{i-1}},\bigl(\Delta\varepsilon_{t_i}\bigr)^2\bigr] + 2\,\Delta X_{t_i}\operatorname{cov}\bigl[\bigl(\Delta\varepsilon_{t_{i-1}}\bigr)^2,\Delta\varepsilon_{t_i}\bigr]$$
$$= \kappa_4(\varepsilon) + 2(E\varepsilon^2)^2 - 4\,\Delta X_{t_{i-1}}\Delta X_{t_i}\,\kappa_2(\varepsilon) - 2\,\Delta X_{t_i}\,\kappa_3(\varepsilon) + 2\,\Delta X_{t_{i-1}}\,\kappa_3(\varepsilon), \tag{A2}$$

because of (A1). Thus, summing the terms in (A2),

$$II_T = 2(n-1)\bigl(\kappa_4(\varepsilon) + 2(E\varepsilon^2)^2\bigr) - 8E\varepsilon^2\sum_{t_{i+1}\le T}\Delta X_{t_{i-1}}\Delta X_{t_i} - 4\kappa_3(\varepsilon)\bigl(\Delta X_{t_{n-1}} - \Delta X_{t_0}\bigr).$$

Amalgamating the two expressions, one obtains

$$\operatorname{var}\bigl([Y,Y]^{(all)}_T\bigm|X\bigr) = n\bigl(2\kappa_4(\varepsilon) + 8(E\varepsilon^2)^2\bigr) + 8[X,X]^{(all)}_T E\varepsilon^2 + 2(n-1)\bigl(\kappa_4(\varepsilon) + 2(E\varepsilon^2)^2\bigr)$$
$$- 8E\varepsilon^2\sum\Delta X_{t_{i-1}}\Delta X_{t_i} - 4\kappa_3(\varepsilon)\bigl(\Delta X_{t_{n-1}} - \Delta X_{t_0}\bigr) = 4nE\varepsilon^4 + R_n, \tag{A3}$$

where the remainder term $R_n$ satisfies

$$|R_n| \le 8E\varepsilon^2[X,X]_T + 2\bigl(\kappa_4(\varepsilon) + 2(E\varepsilon^2)^2\bigr) + 8E\varepsilon^2\sum\bigl|\Delta X_{t_{i-1}}\bigr|\bigl|\Delta X_{t_i}\bigr| + 4|\kappa_3(\varepsilon)|\bigl(\bigl|\Delta X_{t_{n-1}}\bigr| + \bigl|\Delta X_{t_0}\bigr|\bigr)$$
$$\le 16E\varepsilon^2[X,X]^{(all)}_T + 2\bigl(\kappa_4(\varepsilon) + 2(E\varepsilon^2)^2\bigr) + 2|\kappa_3(\varepsilon)|\bigl(2 + [X,X]_T\bigr), \tag{A4}$$

by the Cauchy–Schwarz inequality and because $|x| \le (1+x^2)/2$. Because $[X,X]^{(all)}_T = O_p(1)$, (19) follows.

Under slightly stronger conditions (e.g., $|\mu_t|$ and $\sigma_t$ bounded above by a constant), $\sum\Delta X_{t_{i-1}}\Delta X_{t_i}$ is a near-martingale and of order $O_p(n^{-1/2})$, and similarly $\Delta X_{t_{n-1}} - \Delta X_{t_0} = O_p(n^{-1/2})$, from which (20) follows.

Similarly, to obtain a higher-order approximation to the variance of the estimator $\widehat{\langle X,X\rangle}_T$, note that

$$\widehat{\langle X,X\rangle}_T - \langle X,X\rangle_T = \Bigl([Y,Y]^{(avg)}_T - \frac{1}{K}[Y,Y]^{(all)}_T - [X,X]^{(avg)}_T\Bigr) + \bigl([X,X]^{(avg)}_T - \langle X,X\rangle_T\bigr). \tag{A5}$$

Now, recalling (20) and (37), and in addition

$$\operatorname{cov}\bigl([Y,Y]^{(all)},[Y,Y]^{(avg)}\bigm|X\bigr) = 2\bigl(E\varepsilon^4 - (E\varepsilon^2)^2\bigr)\Bigl(\frac{2n-1}{K}\Bigr) + 8E\varepsilon^2[X,X]^{(all)}_T\frac{1}{K} + o_p\Bigl(\frac{1}{K}\Bigr),$$

we see that

$$\operatorname{var}\bigl(\widehat{\langle X,X\rangle}_T\bigm|X\bigr) = \operatorname{var}\bigl([Y,Y]^{(avg)}_T\bigm|X\bigr) + \frac{1}{K^2}\operatorname{var}\bigl([Y,Y]^{(all)}_T\bigm|X\bigr) - \frac{2}{K}\operatorname{cov}\bigl([Y,Y]^{(avg)}_T,[Y,Y]^{(all)}_T\bigm|X\bigr)$$
$$= \frac{4\bar{n}}{K}E\varepsilon^4 + \frac{1}{K}\bigl[8[X,X]^{(avg)}_T E\varepsilon^2 - 2\bigl(E\varepsilon^4 - (E\varepsilon^2)^2\bigr)\bigr] + o_p\Bigl(\frac{1}{K}\Bigr) + \frac{1}{K^2}\Bigl[4nE\varepsilon^4 + \bigl(8[X,X]^{(all)}_T E\varepsilon^2 - 2(E\varepsilon^4 - (E\varepsilon^2)^2)\bigr) + o_p(1)\Bigr]$$
$$- \frac{2}{K}\Bigl[2\bigl(E\varepsilon^4 - (E\varepsilon^2)^2\bigr)\Bigl(\frac{2n-1}{K}\Bigr) + 8E\varepsilon^2[X,X]^{(all)}_T\frac{1}{K} + o_p\Bigl(\frac{1}{K}\Bigr)\Bigr]$$
$$= \frac{\bar{n}}{K}\,8(E\varepsilon^2)^2 + \frac{1}{K}\bigl[8[X,X]^{(avg)}_T E\varepsilon^2 - 2\bigl(E\varepsilon^4 - (E\varepsilon^2)^2\bigr)\bigr] + o_p\Bigl(\frac{1}{K}\Bigr),$$

because at the optimal choice, $K = cn^{2/3}$ and $\bar{n} = c^{-1}n^{1/3}$. Therefore, (60) holds.

A2 The Relevant Central Limit Theorem

Lemma A2 Suppose that X is an Itocirc process Suppose that Y isrelated to X through model (4) Then under assumption (11) and defi-nitions (16) and (32)

[YY](all)T = [ε ε](all)

T + Op(1) and

[YY](avg)T = [ε ε](avg)

T + [XX](avg)T + Op

(1radicK

)

Proof (a) The one-grid case

[YY](all)T = [XX](all)

T + [ε ε](all)T + 2[X ε](all)

T (A6)

We show that

E(([X ε](all)

T

)2∣∣X)= Op(1) (A7)

and in particular

[X ε](all)T = Op(1) (A8)

To see (A7)

[X ε](all)T =

nminus1sum

i=0

(Xti

)(εti

)

=nminus1sum

i=0

(Xti

)εti+1 minus

nminus1sum

i=0

(Xti

)εti

=nminus1sum

i=1

(Xtiminus1 minus Xti

)εti + Xtnminus1εtn minus Xt0εt0 (A9)

Because E([X ε](all)T |X) = 0 and εti iid for different ti we get

E(([X ε](all)

T

)2∣∣X) = var

([X ε](all)T

∣∣X)

= Eε2

[ nminus1sum

i=1

(Xtiminus1 minus Xti

)2 + X2tnminus1

+ X2t0

]

= 2[XX]T Eε2 minus 2Eε2nminus1sum

i=1

(Xtiminus1

)(Xti

)

le 4[XX]T Eε2 (A10)

by the CauchyndashSchwarz inequality from which and from [XX](all)

being of order Op(1) (A7) follows Hence (A8) follows by theMarkov inequality

(b) The multiple-grid case. Notice that

$$[Y,Y]^{(avg)}_T = [X,X]^{(avg)}_T + [\varepsilon,\varepsilon]^{(avg)}_T + 2[X,\varepsilon]^{(avg)}_T. \tag{A11}$$

Equation (A11) follows from model (4) and the definitions of the grids and of $[\cdot,\cdot]^{(avg)}_t$; see Section 3.2.

We need to show that

$$E\Bigl(\bigl([X,\varepsilon]^{(avg)}_T\bigr)^2\Bigm|X\Bigr) = O_p\Bigl(\frac{1}{K}\Bigr), \tag{A12}$$

and in particular,

$$[X,\varepsilon]^{(avg)}_T = O_p\Bigl(\frac{1}{K^{1/2}}\Bigr), \tag{A13}$$

and $\operatorname{var}([X,\varepsilon]^{(avg)}_T\,|\,X) = E[([X,\varepsilon]^{(avg)}_T)^2\,|\,X]$. To show (A12), note that $E([X,\varepsilon]^{(avg)}_T\,|\,X) = 0$ and

$$E\bigl[([X,\varepsilon]^{(avg)}_T)^2\bigm|X\bigr] = \operatorname{var}\bigl([X,\varepsilon]^{(avg)}_T\bigm|X\bigr) = \frac{1}{K^2}\sum_{k=1}^{K}\operatorname{var}\bigl([X,\varepsilon]^{(k)}_T\bigm|X\bigr) \le \frac{4E\varepsilon^2}{K}[X,X]^{(avg)}_T = O_p\Bigl(\frac{1}{K}\Bigr),$$

where the second equality follows from the disjointness of the different grids, as well as from the independence of ε and X. The inequality follows from the same argument as in (A10). The order then follows because $[X,X]^{(avg)}_T = O_p(1)$; see the method of Mykland and Zhang (2002) for a rigorous development of the order of $[X,X]^{(avg)}_T$.

Theorem A.1. Suppose that X is an Itô process of form (1). Suppose that Y is related to X through model (4), and that (11) is satisfied with $E\varepsilon^4 < \infty$. Also suppose that $t_i$ and $t_{i+1}$ are not in the same subgrid for any $i$. Under assumption (33), as $n \to \infty$,

$$\Bigl(\sqrt{n}\bigl(\widehat{E\varepsilon^2} - E\varepsilon^2\bigr),\ \sqrt{\tfrac{K}{\bar{n}}}\bigl([Y,Y]^{(avg)}_T - [X,X]^{(avg)}_T - 2\bar{n}E\varepsilon^2\bigr)\Bigr)$$

converges in law to a bivariate normal, with mean 0 and covariance matrix

$$\begin{pmatrix} E\varepsilon^4 & 2\operatorname{var}(\varepsilon^2) \\ 2\operatorname{var}(\varepsilon^2) & 4E\varepsilon^4 \end{pmatrix}, \tag{A14}$$

conditional on the X process, where the limiting random variable is independent of the X process.

Proof. By Lemma A.2, we need the distribution of $[\varepsilon,\varepsilon]^{(avg)}$ and $[\varepsilon,\varepsilon]^{(all)}$. First, we explore the convergence of

$$\frac{1}{\sqrt{n}}\Bigl([\varepsilon,\varepsilon]^{(all)}_T - 2nE\varepsilon^2,\ \bigl([\varepsilon,\varepsilon]^{(avg)}_T - 2\bar{n}E\varepsilon^2\bigr)K\Bigr). \tag{A15}$$

Recall that all of the sampling points $t_0, t_1, \ldots, t_n$ are within $[0,T]$. We use $\mathcal{G}$ to denote the time points in the full sampling, as in the single-grid case; $\mathcal{G}^{(k)}$ denotes the subsamples from the kth grid. As before, if $t_i \in \mathcal{G}^{(k)}$, then $t_{i-}$ and $t_{i+}$ are the previous and next elements in $\mathcal{G}^{(k)}$; $\varepsilon_{t_i-} = 0$ for $t_i = \min\mathcal{G}^{(k)}$, and $\varepsilon_{t_i+} = 0$ for $t_i = \max\mathcal{G}^{(k)}$.

Set

$$M^{(1)}_T = \frac{1}{\sqrt{n}}\sum_{t_i\in\mathcal{G}}\bigl(\varepsilon_{t_i}^2 - E\varepsilon^2\bigr), \qquad M^{(2)}_T = \frac{1}{\sqrt{n}}\sum_{t_i\in\mathcal{G}}\varepsilon_{t_i}\varepsilon_{t_{i-1}}, \qquad M^{(3)}_T = \frac{1}{\sqrt{n}}\sum_{k=1}^{K}\sum_{t_i\in\mathcal{G}^{(k)}}\varepsilon_{t_i}\varepsilon_{t_i-}. \tag{A16}$$

We first find the asymptotic distribution of $(M^{(1)}_T, M^{(2)}_T, M^{(3)}_T)$ using the martingale central limit theorem. Then we use the result to find the limit of (A15).

Note that $(M^{(1)}_T, M^{(2)}_T, M^{(3)}_T)$ are the end points of martingales with respect to the filtration $\mathcal{F}_i = \sigma(\varepsilon_{t_j},\ j \le i;\ X_t,\ \text{all } t)$. We now derive their (discrete-time) predictable quadratic variations $\langle M^{(l)},M^{(k)}\rangle$, $l, k = 1, 2, 3$ [discrete-time predictable quadratic variations are only used in this proof and the proof of Lemma 1, and are different from the continuous-time quadratic variations in (12)]:

$$\langle M^{(1)},M^{(1)}\rangle_T = \frac{1}{n}\sum_{t_i\in\mathcal{G}}\operatorname{var}\bigl(\varepsilon_{t_i}^2 - E\varepsilon^2\bigm|\mathcal{F}_{t_{i-1}}\bigr) = \operatorname{var}(\varepsilon^2),$$
$$\langle M^{(2)},M^{(2)}\rangle_T = \frac{1}{n}\sum_{t_i\in\mathcal{G}}\operatorname{var}\bigl(\varepsilon_{t_i}\varepsilon_{t_{i-1}}\bigm|\mathcal{F}_{t_{i-1}}\bigr) = \frac{E\varepsilon^2}{n}\sum_{t_i\in\mathcal{G}}\varepsilon_{t_{i-1}}^2 = (E\varepsilon^2)^2 + o_p(1), \tag{A17}$$
$$\langle M^{(3)},M^{(3)}\rangle_T = \frac{1}{n}\sum_{k=1}^{K}\sum_{t_i\in\mathcal{G}^{(k)}}\operatorname{var}\bigl(\varepsilon_{t_i}\varepsilon_{t_i-}\bigm|\mathcal{F}_{t_{i-1}}\bigr) = \frac{E\varepsilon^2}{n}\sum_{k=1}^{K}\sum_{t_i\in\mathcal{G}^{(k)}}\varepsilon_{t_i-}^2 = (E\varepsilon^2)^2 + o_p(1),$$

by the law of large numbers.


t minus σ 2s |2 supi ζi2 = o(δ+K) be-

cause of the boundedness and continuity of σt and because applyingthe BurkholderndashDavisndashGundy inequality twice we have that

E(ζ 2i ) le E

iminus1sum

l=1

〈XX〉tl

times(

(lminus1)and(iminus1)sum

rge0

(Xtr

)(

1 minus i minus l

K

)+(1 minus i minus r

K

)+)2

le δ+(σ+)2

timesiminus1sum

l=1

E

((lminus1)and(iminus1)sum

rge0

(Xtr

)(

1 minus i minus l

K

)+(1 minus i minus r

K

)+)2

le (δ+)2(σ+)4

timesiminus1sum

l=1

(lminus1)and(iminus1)sum

rge0

((

1 minus i minus l

K

)+(1 minus i minus r

K

)+)2

= O((δ+K)2) uniformly in i (A29)

where the second-to-last transition does for the inner sum what the twofirst inequalitities do for the outer sum Now rewrite

(II)prime = 8nminus1sum

lgtrge0

(Xtl

)(Xtr

)

timesnminus1sum

i=l+1

tiσ2timinusK

(

1 minus i minus l

K

)+(1 minus i minus r

K

)+ (A30)

In the same way as in (A29) we then obtain

E((II)prime)2 le 64(δ+)2(σ+)4

timesnminus1sum

lgtrge0

E

( nminus1sum

i=l+1

tiσ2timinusK

times(

1 minus i minus l

K

)+(1 minus i minus r

K

)+)2

Zhang Mykland and Aiumlt-Sahalia Determining Integrated Volatility 1411

le 64(δ+)2(σ+)4 times (δ+)2(σ+)4

timesnminus1sum

lgtrge0

( nminus1sum

i=l+1

(

1 minus i minus l

K

)+(1 minus i minus r

K

)+)2

le 64(δ+)4(σ+)8nK3 = O((δ+K)3) = o((δ+K)2) (A31)

by assumptions (41) and (42) The transition from the second to the lastline in (A31) is due to the fact that

sumnminus1i=l+1(1minus iminusl

K )+(1minus iminusrK )+ le K

and = 0 when lminusr ge K Thus we have shown that (II) = op(Kt) andhence Theorem 2 has been established

We now proceed to the asymptotic distribution of DT We first statea technical condition on the filtration (Ft)0letleT to which Xt and microt(but not the εprimersquos) are assumed to be adapted

Condition E (Description of the filtration) There is a continuousmultidimensional P-local martingale X = (X (1) X (p)) any pso that Ft is the smallest sigma-field containing σ(Xs s le t) and N where N contains all of the null sets in σ(Xs s le T)

For example X can be a collection of Brownian motions

Proof of Theorem 3 We can show by methods similar to those inthe proof of Theorem 2 that if L is any martingale adapted to the filtra-tion generated by X then

supt

∣∣∣∣

1

tK〈DL〉t

∣∣∣∣

prarr 0 (A32)

The stable convergence with respect to the filtration (Ft)0letleT thenfollows in view of Rootzen (1980) or Jacod and Protter (1998) Thisends the proof of Theorem 3

Finally in the case where η2n does not converge one can still use the

mixed normal with variance η2n This is because every subsequence

of η2n has a further subsequence that does converge in probability to

some η2 and hence for which the assumption (48) in Theorem 3 wouldbe satisfied The reason for this is that one can define the distributionfunction of a finite measure by

Gn(t) =sum

ti+1let

hiti (A33)

Because Gn(t) le T supi hi it follows from (45) that the sequence Gnis weakly compact in the sense of weak convergence see Hellyrsquos the-orem (eg Billingsley 1995 p 336) For any convergent subsequenceGn rarr G we then get that

η2n =

int T

0σ 4

t dGn(t) rarrint T

0σ 4

t dG(t) (A34)

almost surely because we have assumed that 〈XX〉primet is a continuousfunction of t One then defines η2 to be the (subsequence-dependent)right side of (A34) To proceed further with the asymptotics continuethe foregoing subsequence and note that

1

tK〈DD〉t asymp

int t

0σ 4

s dGn(s)

rarrint t

0σ 4

s dG(s)

completing the proof

[Received October 2003 Revised December 2004]

REFERENCES

Aiumlt-Sahalia Y Mykland P A and Zhang L (2005) ldquoHow Often to Samplea Continuous-Time Process in the Presence of Market Microstructure NoiserdquoReview of Financial Studies 18 351ndash416

Aldous D J and Eagleson G K (1978) ldquoOn Mixing and Stability of LimitTheoremsrdquo The Annals of Probability 6 325ndash331

Andersen T G Bollerslev T Diebold F X and Labys P (2001) ldquoThe Dis-tribution of Exchange Rate Realized Volatilityrdquo Journal of the American Sta-tistical Association 96 42ndash55

Bandi F M and Russell J R (2003) ldquoMicrostructure Noise Realized Volatil-ity and Optimal Samplingrdquo technical report University of Chicago GraduateSchool of Business

Barndorff-Nielsen O E and Shephard N (2002) ldquoEconometric Analysis ofRealized Volatility and Its Use in Estimating Stochastic Volatility ModelsrdquoJournal of the Royal Statistical Society Ser B 64 253ndash280

Billingsley P (1995) Probability and Measure (3rd ed) New York WileyBrown S J (1990) ldquoEstimating Volatilityrdquo in Financial Options From Theory

to Practice eds S Figlewski W Silber and M Subrahmanyam HomewoodIL Business One-Irwin pp 516ndash537

Chernov M and Ghysels E (2000) ldquoA Study Towards a Unified Approach tothe Joint Estimation of Objective and Risk Neutral Measures for the Purposeof Options Valuationrdquo Journal of Financial Economics 57 407ndash458

Dacorogna M M Genccedilay R Muumlller U Olsen R B and Pictet O V(2001) An Introduction to High-Frequency Finance San Diego AcademicPress

Gallant A R Hsu C-T and Tauchen G T (1999) ldquoUsing Daily RangeData to Calibrate Volatility Diffusions and Extract the Forward IntegratedVariancerdquo The Review of Economics and Statistics 81 617ndash631

Gloter A (2000) ldquoEstimation des Paramegravetres drsquoune Diffusion Cacheacuteerdquo unpub-lished doctoral thesis Universiteacute de Marne-la-Valleacutee

Hall P and Heyde C C (1980) Martingale Limit Theory and Its ApplicationBoston Academic Press

Hansen P R and Lunde A (2004a) ldquoRealized Variance and IID Market Mi-crostructure Noiserdquo technical report Brown University Dept of Economics

(2004b) ldquoAn Unbiased Measure of Realized Variancerdquo technical re-port Brown University Dept of Economics

Heston S (1993) ldquoA Closed-Form Solution for Options With StochasticVolatility With Applications to Bonds and Currency Optionsrdquo Review of Fi-nancial Studies 6 327ndash343

Hull J and White A (1987) ldquoThe Pricing of Options on Assets With Sto-chastic Volatilitiesrdquo Journal of Finance 42 281ndash300

Jacod J and Protter P (1998) ldquoAsymptotic Error Distributions for the EulerMethod for Stochastic Differential Equationsrdquo The Annals of Probability 26267ndash307

Jacod J and Shiryaev A N (2003) Limit Theorems for Stochastic Processes(2nd ed) New York Springer-Verlag

Karatzas I and Shreve S E (1991) Brownian Motion and Stochastic Calcu-lus New York Springer-Verlag

Mykland P A and Zhang L (2002) ldquoANOVA for Diffusionsrdquo technical re-port University of Chicago Dept of Statistics

OrsquoHara M (1995) Market Microstructure Theorys Cambridge MA Black-well

Oomen R (2004) ldquoProperties of Realized Variance for a Pure Jump ProcessCalendar Time Sampling versus Business Time Samplingrdquo technical reportUniversity of Warwick Warwick Business School

Reacutenyi A (1963) ldquoOn Stable Sequences of Eventsrdquo Sankhya Ser A 25293ndash302

Rootzen H (1980) ldquoLimit Distributions for the Error in Approximations ofStochastic Integralsrdquo The Annals of Probability 8 241ndash251

Zhou B (1996) ldquoHigh-Frequency Data and Volatility in Foreign-ExchangeRatesrdquo Journal of Business amp Economic Statistics 14 45ndash52

Zumbach G Corsi F and Trapletti A (2002) ldquoEfficient Estimation ofVolatility Using High-Frequency Datardquo technical report Olsen amp Associates

Page 13: A Tale of Two Time Scales: Determining Integrated …galton.uchicago.edu/~mykland/paperlinks/p1394.pdfA Tale of Two Time Scales: Determining Integrated Volatility With Noisy High-Frequency

1406 Journal of the American Statistical Association December 2005

Figure 1. RMSE of the Sparse Estimator as a Function of the Subsampling Frequency (small-sample RMSE; asymptotic RMSE).

away most of the available data. We have argued that this is caused by not incorporating the noise in the model. Although it is statistically unsound to throw away data, we have shown that it is possible to build on this practice to construct estimators that make statistical sense.

Specifically, we have found that the usual realized volatility mainly estimates the magnitude of the noise term rather than anything to do with volatility. An approach built on separating the observations into multiple "grids" lessens this problem. We found that the best results can be obtained by combining the usual ("single-grid") realized volatility with the multiple-grid-based device. This gives an estimator that is approximately unbiased; we have also shown how to assess the (random) variance of this estimator. We also show that for our most recommended procedure, the optimal number of multiple grids is of order O(n^{2/3}) (see Sec. 4). Most of the development is in the context of finding the integrated volatility over one time period; at the end, we extend this to multiple periods. Also, in the case where the noise can be taken to be almost negligible, we provide a way of optimizing the sampling frequency if one wishes to use the classical "realized volatility" or its multigrid extension.

One important message of the article is that any time one has an impulse to sample sparsely, one can always do better with a multigrid method, regardless of the model or the quantity being estimated.
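As a concrete illustration of the procedure just described, the following numerical sketch (not part of the original article; all parameter values and helper names, such as `rv`, are illustrative) simulates noisy observations of an Itô process with constant volatility and compares the single-grid realized volatility, the average over $K$ subgrids, and the bias-corrected two-scales combination $[Y,Y]^{(\mathrm{avg})}_T - (\bar n/n)[Y,Y]^{(\mathrm{all})}_T$, with $K$ of order $n^{2/3}$:

```python
import numpy as np

rng = np.random.default_rng(0)

# Simulate Y_ti = X_ti + eps_ti (model (4)); for this sketch only,
# X is a Brownian motion with constant volatility.
n = 23400                       # one observation per second, 6.5 hours
sigma, T = 0.3, 1.0 / 252       # annualized volatility, one trading day
dt = T / n
X = np.cumsum(sigma * np.sqrt(dt) * rng.standard_normal(n + 1))
eps = 5e-4 * rng.standard_normal(n + 1)   # microstructure noise
Y = X + eps

def rv(y, k=1):
    """Realized volatility on every k-th observation, averaged over
    the k offset grids (the multiple-grid device)."""
    return np.mean([np.sum(np.diff(y[j::k]) ** 2) for j in range(k)])

K = int(n ** (2 / 3))           # number of subgrids, K = c * n^(2/3)
rv_all = rv(Y)                  # [Y,Y]^(all): dominated by 2n * E eps^2
rv_avg = rv(Y, K)               # [Y,Y]^(avg) over the K subgrids
nbar = (n - K + 1) / K          # average number of increments per subgrid
tsrv = rv_avg - (nbar / n) * rv_all   # bias-corrected two-scales estimator

print(rv_all, rv_avg, tsrv, sigma**2 * T)
```

On a run like this, the full-grid realized volatility is an order of magnitude larger than the true integrated volatility $\sigma^2 T$ (it mostly measures $2nE\varepsilon^2$), whereas the two-scales combination is close to the target.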

Figure 2. Asymptotic and Small-Sample Standardized Distributions of the First-Best Estimator.

Zhang Mykland and Aiumlt-Sahalia Determining Integrated Volatility 1407

Our results in this article cover the case where the latent X follows the Itô process model (1). Theorem 1 remains true under the assumption that X is a general semimartingale, that is, a process X that can also have jumps. This follows because the proof of Lemma A2 remains true for a general semimartingale X (Jacod and Shiryaev 2003, thm. I.4.47, p. 52). Also, apart from using Lemma A2, Theorem A1 is unrelated to the nature of the X process. As far as the effect of jumps on the discretization effect is concerned, it should be noted that consistency holds, as follows. When X is a general semimartingale, then for any sequence of grids $\mathcal H_n$ so that (41) holds, $[X,X]^{\mathcal H_n}$ converges to the continuous-time quadratic variation of X, that is,

$$\int_0^T \sigma_t^2\,dt + \sum_{0\le t\le T}(\text{jump of } X \text{ at } t)^2 \tag{74}$$

(cf. Jacod and Shiryaev 2003, thm. I.4.47, p. 52). In particular, this also applies to subgrids under the regularity conditions that we have discussed. Furthermore, by extension of the same proof, $[X,X]^{(\mathrm{avg})}$ also converges to (74). From the same argument, so does $\widehat{\langle X,X\rangle}_T$. In this sense, the two-scales estimator is robust to jumps.

APPENDIX: PROOFS OF RESULTS

When the total grid $\mathcal G$ is considered, we use $\sum_{i=1}^{n-1}$, $\sum_{t_{i+1}\le T}$, and $\sum_{t_i\in\mathcal G}$ interchangeably in the following proofs. Also, we write "$|X$" to indicate expressions that are conditional on the entire $X$ process.

A.1 Variance of $[Y,Y]_T$ Given the $X$ Process

Here we calculate explicitly the variance in (19), from which the stated approximation follows. The explicit remainder term is also used for (36). Let a partition of $[0,T]$ be $0=t_0\le t_1\le\cdots\le t_n=T$. Under assumption (11),

$$\operatorname{var}\bigl([Y,Y]^{(\mathrm{all})}_T \,\big|\, X\bigr)
= \operatorname{var}\Bigl[\,\sum_{t_{i+1}\le T}(\Delta Y_{t_i})^2 \,\Big|\, X\Bigr]
= \underbrace{\sum_{t_{i+1}\le T}\operatorname{var}\bigl[(\Delta Y_{t_i})^2\,\big|\,X\bigr]}_{I_T}
+ \underbrace{2\sum_{t_{i+1}\le T}\operatorname{cov}\bigl[(\Delta Y_{t_{i-1}})^2,(\Delta Y_{t_i})^2\,\big|\,X\bigr]}_{II_T},$$

because $\Delta Y_{t_i}=\Delta X_{t_i}+\Delta\varepsilon_{t_i}$ is 1-dependent given the $X$ process. Also,

$$\begin{aligned}
\operatorname{var}\bigl[(\Delta Y_{t_i})^2\,\big|\,X\bigr]
&= \kappa_4(\Delta Y_{t_i}|X) + 2\bigl[\operatorname{var}(\Delta Y_{t_i}|X)\bigr]^2
+ 4\bigl[E(\Delta Y_{t_i}|X)\bigr]^2\operatorname{var}(\Delta Y_{t_i}|X)
+ 4E(\Delta Y_{t_i}|X)\,\kappa_3(\Delta Y_{t_i}|X)\\
&= \kappa_4(\Delta\varepsilon_{t_i}) + 2\bigl[\operatorname{var}(\Delta\varepsilon_{t_i})\bigr]^2
+ 4(\Delta X_{t_i})^2\operatorname{var}(\Delta\varepsilon_{t_i})
+ 4(\Delta X_{t_i})\,\kappa_3(\Delta\varepsilon_{t_i})\qquad\text{(under Assumption 11)}\\
&= 2\kappa_4(\varepsilon) + 8(E\varepsilon^2)^2 + 8(\Delta X_{t_i})^2E\varepsilon^2,
\end{aligned}$$

because $\kappa_3(\Delta\varepsilon_{t_i})=0$. The $\kappa$'s are the cumulants of the relevant order:

$$\kappa_1(\varepsilon)=0,\qquad \kappa_2(\varepsilon)=\operatorname{var}(\varepsilon)=E(\varepsilon^2),\qquad \kappa_3(\varepsilon)=E\varepsilon^3,\qquad \kappa_4(\varepsilon)=E(\varepsilon^4)-3(E\varepsilon^2)^2. \tag{A1}$$

So $I_T = n\bigl(2\kappa_4(\varepsilon)+8(E\varepsilon^2)^2\bigr) + 8[X,X]^{(\mathrm{all})}_TE\varepsilon^2$. Similarly, for the covariance,

$$\begin{aligned}
\operatorname{cov}\bigl[(\Delta Y_{t_{i-1}})^2,(\Delta Y_{t_i})^2\,\big|\,X\bigr]
&= \operatorname{cov}\bigl[(\Delta\varepsilon_{t_{i-1}})^2,(\Delta\varepsilon_{t_i})^2\bigr]
+ 4(\Delta X_{t_{i-1}})(\Delta X_{t_i})\operatorname{cov}\bigl(\Delta\varepsilon_{t_{i-1}},\Delta\varepsilon_{t_i}\bigr)\\
&\quad+ 2(\Delta X_{t_{i-1}})\operatorname{cov}\bigl[\Delta\varepsilon_{t_{i-1}},(\Delta\varepsilon_{t_i})^2\bigr]
+ 2(\Delta X_{t_i})\operatorname{cov}\bigl[(\Delta\varepsilon_{t_{i-1}})^2,\Delta\varepsilon_{t_i}\bigr]\\
&= \kappa_4(\varepsilon) + 2(E\varepsilon^2)^2 - 4(\Delta X_{t_{i-1}})(\Delta X_{t_i})\kappa_2(\varepsilon)
- 2(\Delta X_{t_i})\kappa_3(\varepsilon) + 2(\Delta X_{t_{i-1}})\kappa_3(\varepsilon), \tag{A2}
\end{aligned}$$

because of (A1). Thus, summing the terms in (A2),

$$II_T = 2(n-1)\bigl(\kappa_4(\varepsilon)+2(E\varepsilon^2)^2\bigr)
- 8E\varepsilon^2\sum_{t_{i+1}\le T}(\Delta X_{t_{i-1}})(\Delta X_{t_i})
- 4\kappa_3(\varepsilon)\bigl(\Delta X_{t_{n-1}}-\Delta X_{t_0}\bigr).$$

Amalgamating the two expressions, one obtains

$$\begin{aligned}
\operatorname{var}\bigl([Y,Y]^{(\mathrm{all})}_T\,\big|\,X\bigr)
&= n\bigl(2\kappa_4(\varepsilon)+8(E\varepsilon^2)^2\bigr) + 8[X,X]^{(\mathrm{all})}_TE\varepsilon^2
+ 2(n-1)\bigl(\kappa_4(\varepsilon)+2(E\varepsilon^2)^2\bigr)\\
&\quad- 8E\varepsilon^2\sum(\Delta X_{t_{i-1}})(\Delta X_{t_i})
- 4\kappa_3(\varepsilon)\bigl(\Delta X_{t_{n-1}}-\Delta X_{t_0}\bigr)\\
&= 4nE\varepsilon^4 + R_n, \tag{A3}
\end{aligned}$$

where the remainder term $R_n$ satisfies

$$\begin{aligned}
|R_n| &\le 8E\varepsilon^2[X,X]_T + 2\bigl(\kappa_4(\varepsilon)+2(E\varepsilon^2)^2\bigr)
+ 8E\varepsilon^2\sum\bigl|\Delta X_{t_{i-1}}\bigr|\,\bigl|\Delta X_{t_i}\bigr|
+ 4|\kappa_3(\varepsilon)|\bigl(\bigl|\Delta X_{t_{n-1}}\bigr|+\bigl|\Delta X_{t_0}\bigr|\bigr)\\
&\le 16E\varepsilon^2[X,X]^{(\mathrm{all})}_T + 2\bigl(\kappa_4(\varepsilon)+2(E\varepsilon^2)^2\bigr)
+ 2|\kappa_3(\varepsilon)|\bigl(2+[X,X]_T\bigr), \tag{A4}
\end{aligned}$$

by the Cauchy–Schwarz inequality and because $|x|\le(1+x^2)/2$. Because $[X,X]^{(\mathrm{all})}_T=O_p(1)$, (19) follows.

Under slightly stronger conditions (e.g., $|\mu_t|$ and $\sigma_t$ bounded above by a constant), $\sum(\Delta X_{t_{i-1}})(\Delta X_{t_i})$ is a near-martingale and of order $O_p(n^{-1/2})$, and similarly $\Delta X_{t_{n-1}}-\Delta X_{t_0}=O_p(n^{-1/2})$, from which (20) follows.
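The leading-term algebra in (A3) can be sanity-checked numerically. The snippet below is a sketch with illustrative Gaussian noise moments, taking $X\equiv0$ so that the remainder contains no $X$ terms; it confirms that $I_T+II_T=4nE\varepsilon^4+O(1)$:

```python
# Check of the cumulant bookkeeping behind (A3), with X identically 0.
# The moments are those of centered Gaussian noise with variance s2,
# chosen purely for illustration.
s2 = 0.3                      # E eps^2
m4 = 3 * s2 ** 2              # E eps^4 for Gaussian noise
kappa4 = m4 - 3 * s2 ** 2     # fourth cumulant, (A1); zero here
n = 1000

I_T = n * (2 * kappa4 + 8 * s2 ** 2)          # sum of conditional variances
II_T = 2 * (n - 1) * (kappa4 + 2 * s2 ** 2)   # sum of lag-1 covariances
total = I_T + II_T

print(total, 4 * n * m4)      # differ only by the O(1) remainder R_n
```

Expanding $\kappa_4(\varepsilon)=E\varepsilon^4-3(E\varepsilon^2)^2$ shows the remainder equals $-2(\kappa_4(\varepsilon)+2(E\varepsilon^2)^2)$, free of $n$.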

Similarly, to obtain a higher-order approximation to the variance of the estimator $\widehat{\langle X,X\rangle}_T$, note that

$$\widehat{\langle X,X\rangle}_T - \langle X,X\rangle_T
= \Bigl([Y,Y]^{(\mathrm{avg})}_T - \frac1K[Y,Y]^{(\mathrm{all})}_T - [X,X]^{(\mathrm{avg})}_T\Bigr)
+ \bigl([X,X]^{(\mathrm{avg})}_T - \langle X,X\rangle_T\bigr). \tag{A5}$$

Now recalling (20) and (37), and in addition

$$\operatorname{cov}\bigl([Y,Y]^{(\mathrm{all})},[Y,Y]^{(\mathrm{avg})}\,\big|\,X\bigr)
= 2\bigl(E\varepsilon^4-(E\varepsilon^2)^2\bigr)\frac{2n-1}{K}
+ 8E\varepsilon^2[X,X]^{(\mathrm{all})}_T\frac1K + o_p\Bigl(\frac1K\Bigr),$$

we see that

$$\begin{aligned}
\operatorname{var}\bigl(\widehat{\langle X,X\rangle}_T\,\big|\,X\bigr)
&= \operatorname{var}\bigl([Y,Y]^{(\mathrm{avg})}_T\,\big|\,X\bigr)
+ \frac{1}{K^2}\operatorname{var}\bigl([Y,Y]^{(\mathrm{all})}_T\,\big|\,X\bigr)
- \frac2K\operatorname{cov}\bigl([Y,Y]^{(\mathrm{avg})}_T,[Y,Y]^{(\mathrm{all})}_T\,\big|\,X\bigr)\\
&= 4\frac{\bar n}{K}E\varepsilon^4
+ \frac1K\bigl[8[X,X]^{(\mathrm{avg})}_TE\varepsilon^2 - 2\bigl(E\varepsilon^4-(E\varepsilon^2)^2\bigr)\bigr]
+ o_p\Bigl(\frac1K\Bigr)\\
&\quad+ \frac{1}{K^2}\bigl[4\bar nKE\varepsilon^4
+ \bigl(8[X,X]^{(\mathrm{all})}_TE\varepsilon^2 - 2(E\varepsilon^4-(E\varepsilon^2)^2)\bigr) + o_p(1)\bigr]\\
&\quad- \frac2K\Bigl[2\bigl(E\varepsilon^4-(E\varepsilon^2)^2\bigr)\frac{2n-1}{K}
+ 8E\varepsilon^2[X,X]^{(\mathrm{all})}_T\frac1K + o_p\Bigl(\frac1K\Bigr)\Bigr]\\
&= \frac{\bar n}{K}\,8(E\varepsilon^2)^2
+ \frac1K\bigl[8[X,X]^{(\mathrm{avg})}_TE\varepsilon^2 - 2\bigl(E\varepsilon^4-(E\varepsilon^2)^2\bigr)\bigr]
+ o_p\Bigl(\frac1K\Bigr),
\end{aligned}$$

because at the optimal choice, $K=cn^{2/3}$ and $\bar n=c^{-1}n^{1/3}$. Therefore (60) holds.
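The cancellation of the $E\varepsilon^4$ terms in the variance assembly for $\widehat{\langle X,X\rangle}_T$ is pure bookkeeping. The exact-arithmetic sketch below illustrates it with stand-in moment values, using the leading terms $4(\bar n/K)E\varepsilon^4$, $4\bar nKE\varepsilon^4$, and $2\operatorname{var}(\varepsilon^2)(2n-1)/K$ and the relation $n=\bar nK$ (all choices here are illustrative, not the paper's notation):

```python
from fractions import Fraction as F

# Leading-term bookkeeping for var(<X,X>_hat | X): the E eps^4 pieces of
# var_avg + var_all/K^2 - (2/K) cov cancel, leaving ~ (nbar/K) * 8 (E eps^2)^2.
nbar, K = 40, 30
n = nbar * K
e2, e4 = F(1, 10), F(7, 100)       # stand-in moments E eps^2 and E eps^4

var_avg = 4 * F(nbar, K) * e4                  # leading term of var([Y,Y]^avg | X)
var_all = 4 * nbar * K * e4                    # leading term of var([Y,Y]^all | X)
cov = 2 * (e4 - e2 ** 2) * F(2 * n - 1, K)     # leading term of the covariance

combined = var_avg + var_all / K ** 2 - F(2, K) * cov
leading = F(nbar, K) * 8 * e2 ** 2
print(combined - leading)                      # only lower-order terms remain
```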

A.2 The Relevant Central Limit Theorem

Lemma A2. Suppose that X is an Itô process. Suppose that Y is related to X through model (4). Then, under assumption (11) and definitions (16) and (32),

$$[Y,Y]^{(\mathrm{all})}_T = [\varepsilon,\varepsilon]^{(\mathrm{all})}_T + O_p(1)
\qquad\text{and}\qquad
[Y,Y]^{(\mathrm{avg})}_T = [\varepsilon,\varepsilon]^{(\mathrm{avg})}_T + [X,X]^{(\mathrm{avg})}_T + O_p\Bigl(\frac{1}{\sqrt K}\Bigr).$$

Proof. (a) The one-grid case:

$$[Y,Y]^{(\mathrm{all})}_T = [X,X]^{(\mathrm{all})}_T + [\varepsilon,\varepsilon]^{(\mathrm{all})}_T + 2[X,\varepsilon]^{(\mathrm{all})}_T. \tag{A6}$$

We show that

$$E\bigl(([X,\varepsilon]^{(\mathrm{all})}_T)^2\,\big|\,X\bigr) = O_p(1), \tag{A7}$$

and in particular

$$[X,\varepsilon]^{(\mathrm{all})}_T = O_p(1). \tag{A8}$$

To see (A7),

$$\begin{aligned}
[X,\varepsilon]^{(\mathrm{all})}_T
&= \sum_{i=0}^{n-1}(\Delta X_{t_i})(\Delta\varepsilon_{t_i})
= \sum_{i=0}^{n-1}(\Delta X_{t_i})\varepsilon_{t_{i+1}} - \sum_{i=0}^{n-1}(\Delta X_{t_i})\varepsilon_{t_i}\\
&= \sum_{i=1}^{n-1}\bigl(\Delta X_{t_{i-1}}-\Delta X_{t_i}\bigr)\varepsilon_{t_i}
+ \Delta X_{t_{n-1}}\varepsilon_{t_n} - \Delta X_{t_0}\varepsilon_{t_0}. \tag{A9}
\end{aligned}$$

Because $E([X,\varepsilon]^{(\mathrm{all})}_T|X)=0$ and the $\varepsilon_{t_i}$ are iid for different $t_i$, we get

$$\begin{aligned}
E\bigl(([X,\varepsilon]^{(\mathrm{all})}_T)^2\,\big|\,X\bigr)
&= \operatorname{var}\bigl([X,\varepsilon]^{(\mathrm{all})}_T\,\big|\,X\bigr)\\
&= E\varepsilon^2\Bigl[\sum_{i=1}^{n-1}\bigl(\Delta X_{t_{i-1}}-\Delta X_{t_i}\bigr)^2
+ \bigl(\Delta X_{t_{n-1}}\bigr)^2 + \bigl(\Delta X_{t_0}\bigr)^2\Bigr]\\
&= 2[X,X]_TE\varepsilon^2 - 2E\varepsilon^2\sum_{i=1}^{n-1}(\Delta X_{t_{i-1}})(\Delta X_{t_i})\\
&\le 4[X,X]_TE\varepsilon^2, \tag{A10}
\end{aligned}$$

by the Cauchy–Schwarz inequality, from which, and from $[X,X]^{(\mathrm{all})}_T$ being of order $O_p(1)$, (A7) follows. Hence (A8) follows by the Markov inequality.

(b) The multiple-grid case. Notice that

$$[Y,Y]^{(\mathrm{avg})} = [X,X]^{(\mathrm{avg})}_T + [\varepsilon,\varepsilon]^{(\mathrm{avg})}_T + 2[X,\varepsilon]^{(\mathrm{avg})}_T; \tag{A11}$$

(A11) follows strictly from model (4) and the definitions of grids and $[\cdot,\cdot]^{(\mathrm{avg})}_t$; see Section 3.2.

We need to show that

$$E\bigl(([X,\varepsilon]^{(\mathrm{avg})}_T)^2\,\big|\,X\bigr) = O_p\Bigl(\frac1K\Bigr), \tag{A12}$$

and in particular

$$[X,\varepsilon]^{(\mathrm{avg})}_T = O_p\Bigl(\frac{1}{K^{1/2}}\Bigr), \tag{A13}$$

and $\operatorname{var}([X,\varepsilon]^{(\mathrm{avg})}_T|X) = E[([X,\varepsilon]^{(\mathrm{avg})}_T)^2|X]$. To show (A12), note that $E([X,\varepsilon]^{(\mathrm{avg})}_T|X)=0$ and

$$\begin{aligned}
E\bigl[([X,\varepsilon]^{(\mathrm{avg})}_T)^2\,\big|\,X\bigr]
&= \operatorname{var}\bigl([X,\varepsilon]^{(\mathrm{avg})}_T\,\big|\,X\bigr)
= \frac{1}{K^2}\sum_{k=1}^K\operatorname{var}\bigl([X,\varepsilon]^{(k)}_T\,\big|\,X\bigr)\\
&\le \frac{4E\varepsilon^2}{K}[X,X]^{(\mathrm{avg})}_T = O_p\Bigl(\frac1K\Bigr),
\end{aligned}$$

where the second equality follows from the disjointness of the different grids, as well as $\varepsilon\perp X$. The inequality follows from the same argument as in (A10). The order then follows because $[X,X]^{(\mathrm{avg})}_T = O_p(1)$; see the method of Mykland and Zhang (2002) for a rigorous development of the order of $[X,X]^{(\mathrm{avg})}_T$.
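The summation-by-parts step (A9) is a purely algebraic identity and can be checked on arbitrary vectors; the sketch below (illustrative random data standing in for the sampled process and noise) does exactly that:

```python
import numpy as np

# Numeric check of the summation-by-parts identity (A9).
rng = np.random.default_rng(1)
n = 200
X = rng.standard_normal(n + 1).cumsum()   # X_{t_0}, ..., X_{t_n}
eps = rng.standard_normal(n + 1)          # eps_{t_0}, ..., eps_{t_n}

dX, deps = np.diff(X), np.diff(eps)       # increments dX_{t_i}, deps_{t_i}
lhs = np.sum(dX * deps)                   # [X, eps]^(all)_T

# Right side of (A9): sum_{i=1}^{n-1} (dX_{t_{i-1}} - dX_{t_i}) eps_{t_i}
#                     + dX_{t_{n-1}} eps_{t_n} - dX_{t_0} eps_{t_0}
rhs = (np.sum((dX[:-1] - dX[1:]) * eps[1:-1])
       + dX[-1] * eps[-1] - dX[0] * eps[0])

print(lhs, rhs)   # equal up to floating-point rounding
```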

Theorem A1. Suppose that X is an Itô process of form (1). Suppose that Y is related to X through model (4) and that (11) is satisfied with $E\varepsilon^4<\infty$. Also suppose that $t_i$ and $t_{i+1}$ are not in the same subgrid for any i. Under assumption (33), as $n\to\infty$,

$$\Bigl(\sqrt n\,\bigl(\widehat{E\varepsilon^2}-E\varepsilon^2\bigr),\;
\sqrt{\tfrac{K}{\bar n}}\,\bigl([Y,Y]^{(\mathrm{avg})}_T - [X,X]^{(\mathrm{avg})}_T - 2\bar nE\varepsilon^2\bigr)\Bigr)$$

converges in law to a bivariate normal with mean 0 and covariance matrix

$$\begin{pmatrix} E\varepsilon^4 & 2\operatorname{var}(\varepsilon^2)\\ 2\operatorname{var}(\varepsilon^2) & 4E\varepsilon^4 \end{pmatrix}, \tag{A14}$$

conditional on the X process, where the limiting random variable is independent of the X process.

Proof. By Lemma A2, we need the distribution of $[\varepsilon,\varepsilon]^{(\mathrm{avg})}$ and $[\varepsilon,\varepsilon]^{(\mathrm{all})}$. First we explore the convergence of

$$\frac{1}{\sqrt n}\bigl([\varepsilon,\varepsilon]^{(\mathrm{all})}_T - 2nE\varepsilon^2,\;
[\varepsilon,\varepsilon]^{(\mathrm{avg})}_TK - 2\bar nKE\varepsilon^2\bigr). \tag{A15}$$

Recall that all of the sampling points $t_0,t_1,\ldots,t_n$ are within $[0,T]$. We use $\mathcal G$ to denote the time points in the full sampling, as in the single grid; $\mathcal G^{(k)}$ denotes the subsampling from the kth grid. As before, if $t_i\in\mathcal G^{(k)}$, then $t_{i,-}$ and $t_{i,+}$ are the previous and next elements in $\mathcal G^{(k)}$; $\varepsilon_{t_{i,-}}=0$ for $t_i=\min\mathcal G^{(k)}$, and $\varepsilon_{t_{i,+}}=0$ for $t_i=\max\mathcal G^{(k)}$.

Set

$$M^{(1)}_T = \frac{1}{\sqrt n}\sum_{t_i\in\mathcal G}\bigl(\varepsilon^2_{t_i}-E\varepsilon^2\bigr),\qquad
M^{(2)}_T = \frac{1}{\sqrt n}\sum_{t_i\in\mathcal G}\varepsilon_{t_i}\varepsilon_{t_{i-1}},\qquad
M^{(3)}_T = \frac{1}{\sqrt n}\sum_{k=1}^K\sum_{t_i\in\mathcal G^{(k)}}\varepsilon_{t_i}\varepsilon_{t_{i,-}}. \tag{A16}$$

We first find the asymptotic distribution of $(M^{(1)}_T,M^{(2)}_T,M^{(3)}_T)$ using the martingale central limit theorem; then we use the result to find the limit of (A15).

Note that $(M^{(1)}_T,M^{(2)}_T,M^{(3)}_T)$ are the end points of martingales with respect to the filtration $\mathcal F_i=\sigma(\varepsilon_{t_j},\,j\le i;\,X_t,\,\text{all }t)$. We now derive its (discrete-time) predictable quadratic variation $\langle M^{(l)},M^{(k)}\rangle$, $l,k=1,2,3$ [discrete-time predictable quadratic variations are only used in this proof and the proof of Lemma 1, and are different from the continuous-time quadratic variations in (12)]:

$$\begin{aligned}
\bigl\langle M^{(1)},M^{(1)}\bigr\rangle_T
&= \frac1n\sum_{t_i\in\mathcal G}\operatorname{var}\bigl(\varepsilon^2_{t_i}-E\varepsilon^2\,\big|\,\mathcal F_{i-1}\bigr)
= \operatorname{var}(\varepsilon^2),\\
\bigl\langle M^{(2)},M^{(2)}\bigr\rangle_T
&= \frac1n\sum_{t_i\in\mathcal G}\operatorname{var}\bigl(\varepsilon_{t_i}\varepsilon_{t_{i-1}}\,\big|\,\mathcal F_{i-1}\bigr)
= \frac{E\varepsilon^2}{n}\sum_{t_i\in\mathcal G}\varepsilon^2_{t_{i-1}}
= (E\varepsilon^2)^2 + o_p(1), \tag{A17}\\
\bigl\langle M^{(3)},M^{(3)}\bigr\rangle_T
&= \frac1n\sum_{k=1}^K\sum_{t_i\in\mathcal G^{(k)}}\operatorname{var}\bigl(\varepsilon_{t_i}\varepsilon_{t_{i,-}}\,\big|\,\mathcal F_{i-1}\bigr)
= \frac{E\varepsilon^2}{n}\sum_{k=1}^K\sum_{t_i\in\mathcal G^{(k)}}\varepsilon^2_{t_{i,-}}
= (E\varepsilon^2)^2 + o_p(1),
\end{aligned}$$

by the law of large numbers.

Similarly, for the predictable quadratic covariations,

$$\begin{aligned}
\bigl\langle M^{(1)},M^{(2)}\bigr\rangle_T
&= \frac1n\sum_{t_i\in\mathcal G}\operatorname{cov}\bigl(\varepsilon^2_{t_i}-E\varepsilon^2,\ \varepsilon_{t_i}\varepsilon_{t_{i-1}}\,\big|\,\mathcal F_{i-1}\bigr)
= E\varepsilon^3\,\frac1n\sum_{t_i\in\mathcal G}\varepsilon_{t_{i-1}} = o_p(1),\\
\bigl\langle M^{(1)},M^{(3)}\bigr\rangle_T
&= \frac1n\sum_{k=1}^K\sum_{t_i\in\mathcal G^{(k)}}\operatorname{cov}\bigl(\varepsilon^2_{t_i}-E\varepsilon^2,\ \varepsilon_{t_i}\varepsilon_{t_{i,-}}\,\big|\,\mathcal F_{i-1}\bigr)
= E\varepsilon^3\,\frac1n\sum_{k=1}^K\sum_{t_i\in\mathcal G^{(k)}}\varepsilon_{t_{i,-}} = o_p(1), \tag{A18}\\
\bigl\langle M^{(2)},M^{(3)}\bigr\rangle_T
&= \frac1n\sum_{k=1}^K\sum_{t_i\in\mathcal G^{(k)}}\operatorname{cov}\bigl(\varepsilon_{t_i}\varepsilon_{t_{i-1}},\ \varepsilon_{t_i}\varepsilon_{t_{i,-}}\,\big|\,\mathcal F_{i-1}\bigr)
= \frac{E\varepsilon^2}{n}\sum_{k=1}^K\sum_{t_i\in\mathcal G^{(k)}}\varepsilon_{t_{i-1}}\varepsilon_{t_{i,-}} = o_p(1),
\end{aligned}$$

because $t_{i+1}$ is not in the same grid as $t_i$.

Because the $\varepsilon_{t_i}$'s are iid and $E\varepsilon^4_{t_i}<\infty$, the conditional Lindeberg conditions are satisfied. Hence, by the martingale central limit theorem (see Hall and Heyde 1980, condition 3.1, p. 58), $(M^{(1)},M^{(2)},M^{(3)})$ are asymptotically normal, with covariance matrix the asymptotic value of $\langle M^{(l)},M^{(k)}\rangle$. In other words, asymptotically, $(M^{(1)},M^{(2)},M^{(3)})$ are independent normal with respective variances $\operatorname{var}(\varepsilon^2)$, $(E\varepsilon^2)^2$, and $(E\varepsilon^2)^2$.

Returning to (A15), we have that

$$\begin{aligned}
[\varepsilon,\varepsilon]^{(\mathrm{all})} - 2nE\varepsilon^2
&= 2\sum_{i\ne0,n}\bigl(\varepsilon^2_{t_i}-E\varepsilon^2\bigr)
+ \bigl(\varepsilon^2_{t_0}-E\varepsilon^2\bigr) + \bigl(\varepsilon^2_{t_n}-E\varepsilon^2\bigr)
- 2\sum_{t_i>0}\varepsilon_{t_i}\varepsilon_{t_{i-1}}\\
&= 2\sqrt n\bigl(M^{(1)}-M^{(2)}\bigr) + O_p(1). \tag{A19}
\end{aligned}$$

Meanwhile,

$$\begin{aligned}
[\varepsilon,\varepsilon]^{(k)} - 2n_kE\varepsilon^2
&= \sum_{t_i\in\mathcal G^{(k)},\,t_i\ne\max\mathcal G^{(k)}}\bigl(\varepsilon_{t_{i,+}}-\varepsilon_{t_i}\bigr)^2 - 2n_kE\varepsilon^2\\
&= 2\sum_{t_i\in\mathcal G^{(k)}}\bigl(\varepsilon^2_{t_i}-E\varepsilon^2\bigr)
- \bigl(\varepsilon^2_{\min\mathcal G^{(k)}}-E\varepsilon^2\bigr)
- \bigl(\varepsilon^2_{\max\mathcal G^{(k)}}-E\varepsilon^2\bigr)
- 2\sum_{t_i\in\mathcal G^{(k)}}\varepsilon_{t_i}\varepsilon_{t_{i,-}}, \tag{A20}
\end{aligned}$$

where $n_k+1$ represents the total number of sampling points in $\mathcal G^{(k)}$. Hence

$$[\varepsilon,\varepsilon]^{(\mathrm{avg})}_TK - 2\bar nKE\varepsilon^2
= \sqrt n\bigl(2M^{(1)}-2M^{(3)}\bigr) - R
= 2\sqrt n\bigl(M^{(1)}-M^{(3)}\bigr) + O_p\bigl(K^{1/2}\bigr), \tag{A21}$$

because $R=\sum_{k=1}^K\bigl[(\varepsilon^2_{\min\mathcal G^{(k)}}-E\varepsilon^2)+(\varepsilon^2_{\max\mathcal G^{(k)}}-E\varepsilon^2)\bigr]$ satisfies $ER^2=\operatorname{var}(R)\le 4K\operatorname{var}(\varepsilon^2)$.

Because $n^{-1}K\to0$ and because the error terms in (A19) and (A20) are uniformly integrable, it follows that

$$\text{(A15)} = 2\bigl(M^{(1)}-M^{(2)},\ M^{(1)}-M^{(3)}\bigr) + o_p(1). \tag{A22}$$

Hence (A15) is also asymptotically normal, with covariance matrix

$$\begin{pmatrix} 4E\varepsilon^4 & 4\operatorname{var}(\varepsilon^2)\\ 4\operatorname{var}(\varepsilon^2) & 4E\varepsilon^4 \end{pmatrix}.$$

By Lemma A2, and as $n^{-1}K\to0$, conditionally on the $X$ process,

$$\frac{1}{\sqrt n}\begin{pmatrix}
[Y,Y]^{(\mathrm{all})}_T - 2nE\varepsilon^2\\[2pt]
[Y,Y]^{(\mathrm{avg})}_TK - 2\bar nKE\varepsilon^2 - [X,X]^{(\mathrm{avg})}_TK
\end{pmatrix}
= 2\begin{pmatrix} M^{(1)}-M^{(2)}\\ M^{(1)}-M^{(3)} \end{pmatrix} + o_p(1)
\;\xrightarrow{\ \mathcal L\ }\;
2N\Biggl(0,\begin{pmatrix} E\varepsilon^4 & \operatorname{var}(\varepsilon^2)\\
\operatorname{var}(\varepsilon^2) & E\varepsilon^4 \end{pmatrix}\Biggr). \tag{A23}$$

Because

$$\widehat{E\varepsilon^2} = \frac{1}{2n}[Y,Y]^{(\mathrm{all})}_T
\qquad\text{and}\qquad
\frac{K}{\sqrt n} = \sqrt{\frac{K}{\bar n}}\,\bigl(1+o(1)\bigr), \tag{A24}$$

Theorem A1 follows.
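A small Monte Carlo illustration of the bias term isolated in (A19): for pure noise ($X\equiv0$, $E\varepsilon^2=1$), $[\varepsilon,\varepsilon]^{(\mathrm{all})}$ concentrates at $2nE\varepsilon^2$ with variance of order $4nE\varepsilon^4$. The simulation below is a sketch with illustrative parameters:

```python
import numpy as np

# [eps, eps]^(all) for pure Gaussian noise, across many replications.
rng = np.random.default_rng(2)
n, reps = 500, 2000
eps = rng.standard_normal((reps, n + 1))           # E eps^2 = 1, E eps^4 = 3
qv = (np.diff(eps, axis=1) ** 2).sum(axis=1)       # [eps,eps]^(all) per path

print(qv.mean() / (2 * n))        # approximately 1: bias term 2n * E eps^2
print(qv.var() / (4 * n * 3))     # approximately 1: variance 4n * E eps^4
```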

Proof of Lemma 1. For simplicity, and without loss of generality, we assume that $\mathcal H=\mathcal G$. The result follows the proofs of Lemma A2 and Theorem A1, but with a more exact representation of $[Y,Y]^{(\mathrm{all})}_T - [X,X]^{(\mathrm{all})}_T$. Specifically, from (A9) and (A19), we have that

$$\begin{aligned}
[Y,Y]^{(\mathrm{all})}_T - [X,X]^{(\mathrm{all})}_T - 2nE\varepsilon^2
&= [\varepsilon,\varepsilon]_T - 2nE\varepsilon^2 + 2[X,\varepsilon]^{(\mathrm{all})}_T\\
&= 2\sum_{i\ne0,n}\bigl(\varepsilon^2_{t_i}-E\varepsilon^2\bigr)
+ \bigl(\varepsilon^2_{t_0}-E\varepsilon^2\bigr) + \bigl(\varepsilon^2_{t_n}-E\varepsilon^2\bigr)
- 2\sum_{t_i>0}\varepsilon_{t_i}\varepsilon_{t_{i-1}}\\
&\quad+ 2\sum_{i=1}^{n-1}\bigl(\Delta X_{t_{i-1}}-\Delta X_{t_i}\bigr)\varepsilon_{t_i}
+ 2\Delta X_{t_{n-1}}\varepsilon_{t_n} - 2\Delta X_{t_0}\varepsilon_{t_0}.
\end{aligned}$$

With the same filtration as in the proof of Theorem A1, this is the sum of a martingale triangular array with increment $2(\varepsilon^2_{t_i}-E\varepsilon^2) - 2\varepsilon_{t_i}\varepsilon_{t_{i-1}} + 2(\Delta X_{t_{i-1}}-\Delta X_{t_i})\varepsilon_{t_i}$ for $i\ne0,n$. Again, the conditions of the martingale central limit theorem (see Hall and Heyde 1980, chap. 3) are satisfied by (17), whence the term is asymptotically normal conditionally on the $X$ process. The conditional variance is given by (20), under the conditions mentioned in the statement of the lemma. Because $E\varepsilon^2$ can vary with $n$, the $O_p(n^{-1/2}_{\mathrm{sparse}})$ term in (20) is actually of order $O_p(n^{-1/2}_{\mathrm{sparse}}E\varepsilon^2)$.

A.3 Asymptotics of $D_T$

For transparency of notation, we take $\Delta t = T/n$; in other words, the average of the $\Delta t_i$.

Proof of Theorem 2. Note first that because we assume that $\mu_t$ and $\sigma_t$ are continuous, they are locally bounded. Because here we seek to show convergence in probability, we can therefore assume, without loss of generality, that $|\mu_t|$ is bounded above by a nonrandom constant, and also that $\infty>\sigma^+\ge\sigma_t\ge\sigma^->0$, where $\sigma^+$ and $\sigma^-$ are nonrandom. This is by the usual stopping time argument.

Also note that once $\mu_t$ and $\sigma_t$ are bounded, as we have now assumed, by Girsanov's theorem (see, e.g., Karatzas and Shreve 1991, chap. 3.5; Jacod and Shiryaev 2003, chap. II.3b) we can, without loss of generality, further suppose that $\mu_t=0$ identically. (If the convergence in probability holds under the equivalent probability, then it also will hold under the original one.)

Now for the derivation. First consider

$$\bigl(X_{t_i}-X_{t_{i,-}}\bigr)^2
= \Biggl(\sum_{t_j=t_{i,-}}^{t_{i-1}}\Delta X_{t_j}\Biggr)^2
= \sum_{t_j=t_{i,-}}^{t_{i-1}}\bigl(\Delta X_{t_j}\bigr)^2
+ 2\sum_{t_{i-1}\ge t_k>t_l\ge t_{i,-}}\Delta X_{t_k}\,\Delta X_{t_l}.$$

It follows that

$$[X,X]^{(\mathrm{avg})}_T = [X,X]^{(\mathrm{all})}_T
+ 2\sum_{i=1}^{n-1}(\Delta X_{t_i})\sum_{j=1}^{K\wedge i}\Bigl(1-\frac jK\Bigr)(\Delta X_{t_{i-j}}) + O_p(K/n).$$

The remainder term incorporates end effects and has order $O_p(K/n)$ by (41) and (42), and due to the regular allocation of points to grids. Following Jacod and Protter (1998) and Mykland and Zhang (2002), $[X,X]^{(\mathrm{all})}_T - \langle X,X\rangle_T = O_p(1/\sqrt n) = o_p(\sqrt{K/n})$ too, under the assumption (17) [which is a special case of assumption (41)] and that $\inf_{t\in[0,T]}\sigma^2_t>0$ almost surely. In other words,

$$D_T = \bigl([X,X]^{(\mathrm{avg})}_T - \langle X,X\rangle_T\bigr)
= 2\sum_{i=1}^{n-1}(\Delta X_{t_i})\sum_{j=1}^{K\wedge i}\Bigl(1-\frac jK\Bigr)(\Delta X_{t_{i-j}})
+ o_p\bigl(\sqrt{K\Delta t}\bigr). \tag{A25}$$

It follows that

$$\langle D,D\rangle_T
= 4\sum_{i=1}^{n-1}\bigl(\Delta\langle X,X\rangle_{t_i}\bigr)
\Biggl(\sum_{j=1}^{K\wedge i}\Bigl(1-\frac jK\Bigr)(\Delta X_{t_{i-j}})\Biggr)^2 + o_p(K\Delta t)
= (\mathrm I) + (\mathrm{II}) + o_p(K\Delta t), \tag{A26}$$

where

$$\begin{aligned}
(\mathrm I) &= 4\sum_{i=1}^{n-1}\bigl(\Delta\langle X,X\rangle_{t_i}\bigr)
\Biggl(\sum_{j=1}^{K\wedge i}\Bigl(1-\frac jK\Bigr)^2(\Delta X_{t_{i-j}})^2\Biggr)\\
&= 4\sum_{i=1}^{n-1}\sigma^4_{t_i}\,\Delta t_i
\Biggl(\sum_{j=1}^{K\wedge i}\Bigl(1-\frac jK\Bigr)^2\Delta t_{i-j}\Biggr) + o_p(K\Delta t)\\
&= K\,\Delta t\sum_{i=1}^{n-1}\sigma^4_{t_i}h_i\,\Delta t_i + o_p(K\Delta t),
\end{aligned}$$

where $h_i$ is given by (43). Thus, Theorem 2 will have been shown if we can show that

$$(\mathrm{II}) = 8\sum_{i=1}^{n-1}\Delta\langle X,X\rangle_{t_i}\,\zeta_i \tag{A27}$$

is of order $o_p(K\Delta t)$, where

$$\zeta_i = \sum_{i-1\ge l>r\ge0}(\Delta X_{t_l})(\Delta X_{t_r})
\Bigl(1-\frac{i-l}{K}\Bigr)^+\Bigl(1-\frac{i-r}{K}\Bigr)^+. \tag{A28}$$

We do this as follows. Denote $\delta^+=\max_i|\Delta t_i|=O(1/n)$, and set $(\mathrm{II})'=8\sum_{i=1}^{n-1}\Delta t_i\,\sigma^2_{t_{i-K}}\zeta_i$. Then, by Hölder's inequality,

$$E\bigl|(\mathrm{II})-(\mathrm{II})'\bigr|
\le \delta^+ n \sup_{|t-s|\le(K+1)\delta^+}\bigl\|\sigma^2_t-\sigma^2_s\bigr\|_2\,\sup_i\bigl\|\zeta_i\bigr\|_2
= o(\delta^+K),$$

because of the boundedness and continuity of $\sigma_t$ and because, applying the Burkholder–Davis–Gundy inequality twice, we have that

$$\begin{aligned}
E(\zeta^2_i) &\le E\sum_{l=1}^{i-1}\Delta\langle X,X\rangle_{t_l}
\Biggl(\sum_{0\le r\le(l-1)\wedge(i-1)}(\Delta X_{t_r})
\Bigl(1-\frac{i-l}K\Bigr)^+\Bigl(1-\frac{i-r}K\Bigr)^+\Biggr)^2\\
&\le \delta^+(\sigma^+)^2\sum_{l=1}^{i-1}
E\Biggl(\sum_{0\le r\le(l-1)\wedge(i-1)}(\Delta X_{t_r})
\Bigl(1-\frac{i-l}K\Bigr)^+\Bigl(1-\frac{i-r}K\Bigr)^+\Biggr)^2\\
&\le (\delta^+)^2(\sigma^+)^4\sum_{l=1}^{i-1}\sum_{0\le r\le(l-1)\wedge(i-1)}
\Bigl(\Bigl(1-\frac{i-l}K\Bigr)^+\Bigl(1-\frac{i-r}K\Bigr)^+\Bigr)^2\\
&= O\bigl((\delta^+K)^2\bigr)\quad\text{uniformly in } i, \tag{A29}
\end{aligned}$$

where the second-to-last transition does for the inner sum what the first two inequalities do for the outer sum. Now rewrite

$$(\mathrm{II})' = 8\sum_{n-1\ge l>r\ge0}(\Delta X_{t_l})(\Delta X_{t_r})
\sum_{i=l+1}^{n-1}\Delta t_i\,\sigma^2_{t_{i-K}}
\Bigl(1-\frac{i-l}K\Bigr)^+\Bigl(1-\frac{i-r}K\Bigr)^+. \tag{A30}$$

In the same way as in (A29), we then obtain

$$\begin{aligned}
E\bigl((\mathrm{II})'\bigr)^2
&\le 64(\delta^+)^2(\sigma^+)^4\sum_{n-1\ge l>r\ge0}
E\Biggl(\sum_{i=l+1}^{n-1}\Delta t_i\,\sigma^2_{t_{i-K}}
\Bigl(1-\frac{i-l}K\Bigr)^+\Bigl(1-\frac{i-r}K\Bigr)^+\Biggr)^2\\
&\le 64(\delta^+)^2(\sigma^+)^4\times(\delta^+)^2(\sigma^+)^4
\sum_{n-1\ge l>r\ge0}\Biggl(\sum_{i=l+1}^{n-1}
\Bigl(1-\frac{i-l}K\Bigr)^+\Bigl(1-\frac{i-r}K\Bigr)^+\Biggr)^2\\
&\le 64(\delta^+)^4(\sigma^+)^8\,nK^3
= O\bigl((\delta^+K)^3\bigr) = o\bigl((\delta^+K)^2\bigr), \tag{A31}
\end{aligned}$$

by assumptions (41) and (42). The transition from the second to the last line in (A31) is due to the fact that $\sum_{i=l+1}^{n-1}\bigl(1-\frac{i-l}K\bigr)^+\bigl(1-\frac{i-r}K\bigr)^+\le K$, and $=0$ when $l-r\ge K$. Thus we have shown that $(\mathrm{II})=o_p(K\Delta t)$, and hence Theorem 2 has been established.
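The combinatorial bound invoked in (A31), namely that $\sum_{i=l+1}^{n-1}(1-\frac{i-l}K)^+(1-\frac{i-r}K)^+\le K$ with the sum vanishing when $l-r\ge K$, can be spot-checked directly. The helper name below is illustrative:

```python
def weight_sum(l, r, K, n):
    """The inner sum from (A31): sum over i > l of the product of the
    two truncated triangular weights (1-(i-l)/K)^+ and (1-(i-r)/K)^+."""
    return sum(max(1 - (i - l) / K, 0.0) * max(1 - (i - r) / K, 0.0)
               for i in range(l + 1, n))

# A nearby pair l - r < K gives a small positive sum; a distant pair
# l - r >= K gives exactly zero.
print(weight_sum(10, 5, 7, 60), weight_sum(20, 5, 7, 60))
```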

We now proceed to the asymptotic distribution of $D_T$. We first state a technical condition on the filtration $(\mathcal F_t)_{0\le t\le T}$, to which $X_t$ and $\mu_t$ (but not the $\varepsilon$'s) are assumed to be adapted.

Condition E (Description of the filtration). There is a continuous multidimensional $P$-local martingale $\mathcal X=(\mathcal X^{(1)},\ldots,\mathcal X^{(p)})$, any $p$, so that $\mathcal F_t$ is the smallest sigma-field containing $\sigma(\mathcal X_s,\,s\le t)$ and $\mathcal N$, where $\mathcal N$ contains all of the null sets in $\sigma(\mathcal X_s,\,s\le T)$.

For example, $\mathcal X$ can be a collection of Brownian motions.

Proof of Theorem 3. We can show, by methods similar to those in the proof of Theorem 2, that if $L$ is any martingale adapted to the filtration generated by $\mathcal X$, then

$$\sup_t\Bigl|\frac{1}{\Delta t\,K}\langle D,L\rangle_t\Bigr| \xrightarrow{\ p\ } 0. \tag{A32}$$

The stable convergence with respect to the filtration $(\mathcal F_t)_{0\le t\le T}$ then follows in view of Rootzén (1980) or Jacod and Protter (1998). This ends the proof of Theorem 3.

Finally, in the case where $\eta^2_n$ does not converge, one can still use the mixed normal with variance $\eta^2_n$. This is because every subsequence of $\eta^2_n$ has a further subsequence that does converge in probability to some $\eta^2$, and hence for which the assumption (48) in Theorem 3 would be satisfied. The reason for this is that one can define the distribution function of a finite measure by

$$G_n(t) = \sum_{t_{i+1}\le t}h_i\,\Delta t_i. \tag{A33}$$

Because $G_n(t)\le T\sup_i h_i$, it follows from (45) that the sequence $G_n$ is weakly compact in the sense of weak convergence; see Helly's theorem (e.g., Billingsley 1995, p. 336). For any convergent subsequence $G_n\to G$, we then get that

$$\eta^2_n = \int_0^T\sigma^4_t\,dG_n(t) \to \int_0^T\sigma^4_t\,dG(t) \tag{A34}$$

almost surely, because we have assumed that $\langle X,X\rangle'_t$ is a continuous function of $t$. One then defines $\eta^2$ to be the (subsequence-dependent) right side of (A34). To proceed further with the asymptotics, continue the foregoing subsequence and note that

$$\frac{1}{\Delta t\,K}\langle D,D\rangle_t \approx \int_0^t\sigma^4_s\,dG_n(s) \to \int_0^t\sigma^4_s\,dG(s),$$

completing the proof.

[Received October 2003. Revised December 2004.]

REFERENCES

Aït-Sahalia, Y., Mykland, P. A., and Zhang, L. (2005), "How Often to Sample a Continuous-Time Process in the Presence of Market Microstructure Noise," Review of Financial Studies, 18, 351–416.

Aldous, D. J., and Eagleson, G. K. (1978), "On Mixing and Stability of Limit Theorems," The Annals of Probability, 6, 325–331.

Andersen, T. G., Bollerslev, T., Diebold, F. X., and Labys, P. (2001), "The Distribution of Exchange Rate Realized Volatility," Journal of the American Statistical Association, 96, 42–55.

Bandi, F. M., and Russell, J. R. (2003), "Microstructure Noise, Realized Volatility, and Optimal Sampling," technical report, University of Chicago, Graduate School of Business.

Barndorff-Nielsen, O. E., and Shephard, N. (2002), "Econometric Analysis of Realized Volatility and Its Use in Estimating Stochastic Volatility Models," Journal of the Royal Statistical Society, Ser. B, 64, 253–280.

Billingsley, P. (1995), Probability and Measure (3rd ed.), New York: Wiley.

Brown, S. J. (1990), "Estimating Volatility," in Financial Options: From Theory to Practice, eds. S. Figlewski, W. Silber, and M. Subrahmanyam, Homewood, IL: Business One-Irwin, pp. 516–537.

Chernov, M., and Ghysels, E. (2000), "A Study Towards a Unified Approach to the Joint Estimation of Objective and Risk Neutral Measures for the Purpose of Options Valuation," Journal of Financial Economics, 57, 407–458.

Dacorogna, M. M., Gençay, R., Müller, U., Olsen, R. B., and Pictet, O. V. (2001), An Introduction to High-Frequency Finance, San Diego: Academic Press.

Gallant, A. R., Hsu, C.-T., and Tauchen, G. T. (1999), "Using Daily Range Data to Calibrate Volatility Diffusions and Extract the Forward Integrated Variance," The Review of Economics and Statistics, 81, 617–631.

Gloter, A. (2000), "Estimation des Paramètres d'une Diffusion Cachée," unpublished doctoral thesis, Université de Marne-la-Vallée.

Hall, P., and Heyde, C. C. (1980), Martingale Limit Theory and Its Application, Boston: Academic Press.

Hansen, P. R., and Lunde, A. (2004a), "Realized Variance and IID Market Microstructure Noise," technical report, Brown University, Dept. of Economics.

——— (2004b), "An Unbiased Measure of Realized Variance," technical report, Brown University, Dept. of Economics.

Heston, S. (1993), "A Closed-Form Solution for Options With Stochastic Volatility, With Applications to Bonds and Currency Options," Review of Financial Studies, 6, 327–343.

Hull, J., and White, A. (1987), "The Pricing of Options on Assets With Stochastic Volatilities," Journal of Finance, 42, 281–300.

Jacod, J., and Protter, P. (1998), "Asymptotic Error Distributions for the Euler Method for Stochastic Differential Equations," The Annals of Probability, 26, 267–307.

Jacod, J., and Shiryaev, A. N. (2003), Limit Theorems for Stochastic Processes (2nd ed.), New York: Springer-Verlag.

Karatzas, I., and Shreve, S. E. (1991), Brownian Motion and Stochastic Calculus, New York: Springer-Verlag.

Mykland, P. A., and Zhang, L. (2002), "ANOVA for Diffusions," technical report, University of Chicago, Dept. of Statistics.

O'Hara, M. (1995), Market Microstructure Theory, Cambridge, MA: Blackwell.

Oomen, R. (2004), "Properties of Realized Variance for a Pure Jump Process: Calendar Time Sampling versus Business Time Sampling," technical report, University of Warwick, Warwick Business School.

Rényi, A. (1963), "On Stable Sequences of Events," Sankhyā, Ser. A, 25, 293–302.

Rootzén, H. (1980), "Limit Distributions for the Error in Approximations of Stochastic Integrals," The Annals of Probability, 8, 241–251.

Zhou, B. (1996), "High-Frequency Data and Volatility in Foreign-Exchange Rates," Journal of Business & Economic Statistics, 14, 45–52.

Zumbach, G., Corsi, F., and Trapletti, A. (2002), "Efficient Estimation of Volatility Using High-Frequency Data," technical report, Olsen & Associates.

Page 14: A Tale of Two Time Scales: Determining Integrated …galton.uchicago.edu/~mykland/paperlinks/p1394.pdfA Tale of Two Time Scales: Determining Integrated Volatility With Noisy High-Frequency

Zhang, Mykland, and Aït-Sahalia: Determining Integrated Volatility 1407

Our results in this article cover the case where the latent X follows the Itô process model (1). Theorem 1 remains true under the assumption that X is a general semimartingale, that is, a process X that can also have jumps. This follows because the proof of Lemma A.2 remains true for a general semimartingale X (Jacod and Shiryaev 2003, Theorem I.4.47, p. 52). Also, apart from using Lemma A.2, Theorem A.1 is unrelated to the nature of the X process. As far as the effect of jumps on the discretization effect is concerned, it should be noted that consistency holds, as follows. When X is a general semimartingale, then for any sequence of grids H_n so that (41) holds, [X,X]^{H_n} converges to the continuous-time quadratic variation of X, that is, to

    ∫_0^T σ_t^2 dt + Σ_{0<t≤T} (jump of X at t)^2    (74)

(cf. Jacod and Shiryaev 2003, Theorem I.4.47, p. 52). In particular, this also applies to subgrids under the regularity conditions that we have discussed. Furthermore, by extension of the same proof, [X,X]^{(avg)} also converges to (74). From the same argument, so does ⟨X,X⟩_T. In this sense, the two-scales estimator is robust to jumps.
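As a quick illustration of this robustness (our own sketch, not part of the paper), a small seeded simulation shows the realized variance of a jump diffusion approaching the limit in (74), the integrated variance plus the sum of squared jumps; all parameter values below are made up.

```python
import numpy as np

# Sketch (not from the paper): for a jump diffusion with constant
# volatility, the realized variance on a fine grid approaches
#   integral of sigma^2 dt  +  sum of squared jumps     [eq. (74)]
rng = np.random.default_rng(0)
n, T, sigma = 200_000, 1.0, 0.2
dt = T / n
# Continuous part: Brownian increments with volatility sigma.
dX = sigma * np.sqrt(dt) * rng.standard_normal(n)
# Add two jumps at arbitrary (made-up) interior times.
jumps = {50_000: 0.3, 150_000: -0.2}
for idx, size in jumps.items():
    dX[idx] += size
realized_var = np.sum(dX ** 2)
target = sigma ** 2 * T + sum(s ** 2 for s in jumps.values())
print(realized_var, target)
```

With n this large, the discretization error is of order n^{-1/2} and the two printed numbers agree to a few decimal places.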

APPENDIX: PROOFS OF RESULTS

When the total grid G is considered, we use Σ_{i=1}^{n-1}, Σ_{t_{i+1}≤T}, and Σ_{t_i∈G} interchangeably in the following proofs. Also, we write "|X" to indicate expressions that are conditional on the entire X process.

A.1 Variance of [Y,Y]_T Given the X Process

Here we calculate explicitly the variance in (19), from which the stated approximation follows. The explicit remainder term is also used for (36). Let a partition of [0,T] be 0 = t_0 ≤ t_1 ≤ ··· ≤ t_n = T. Under assumption (11),

    var([Y,Y]_T^{(all)} | X) = var[ Σ_{t_{i+1}≤T} (ΔY_{t_i})^2 | X ]
                             = Σ_{t_{i+1}≤T} var[(ΔY_{t_i})^2 | X]      (= I_T)
                               + 2 Σ_{t_{i+1}≤T} cov[(ΔY_{t_{i-1}})^2, (ΔY_{t_i})^2 | X]      (= II_T),

because ΔY_{t_i} = ΔX_{t_i} + Δε_{t_i} is 1-dependent given the X process. Now,

    var[(ΔY_{t_i})^2 | X] = κ_4(ΔY_{t_i} | X) + 2[var(ΔY_{t_i} | X)]^2
                            + 4[E(ΔY_{t_i} | X)]^2 var(ΔY_{t_i} | X) + 4E(ΔY_{t_i} | X) κ_3(ΔY_{t_i} | X)
                          = κ_4(Δε_{t_i}) + 2[var(Δε_{t_i})]^2 + 4(ΔX_{t_i})^2 var(Δε_{t_i})
                            + 4(ΔX_{t_i}) κ_3(Δε_{t_i})      (under Assumption 11)
                          = 2κ_4(ε) + 8(Eε^2)^2 + 8(ΔX_{t_i})^2 Eε^2,

because κ_3(Δε_{t_i}) = 0. The κ's are the cumulants of the relevant order:

    κ_1(ε) = 0,   κ_2(ε) = var(ε) = E(ε^2),   κ_3(ε) = Eε^3,   and   κ_4(ε) = E(ε^4) − 3(Eε^2)^2.    (A.1)

So I_T = n(2κ_4(ε) + 8(Eε^2)^2) + 8[X,X]_T^{(all)} Eε^2. Similarly, for the covariance,

    cov[(ΔY_{t_{i-1}})^2, (ΔY_{t_i})^2 | X]
        = cov[(Δε_{t_{i-1}})^2, (Δε_{t_i})^2] + 4(ΔX_{t_{i-1}})(ΔX_{t_i}) cov(Δε_{t_{i-1}}, Δε_{t_i})
          + 2(ΔX_{t_{i-1}}) cov[Δε_{t_{i-1}}, (Δε_{t_i})^2] + 2(ΔX_{t_i}) cov[(Δε_{t_{i-1}})^2, Δε_{t_i}]
        = κ_4(ε) + 2(Eε^2)^2 − 4(ΔX_{t_{i-1}})(ΔX_{t_i}) κ_2(ε) − 2(ΔX_{t_i}) κ_3(ε) + 2(ΔX_{t_{i-1}}) κ_3(ε),    (A.2)

because of (A.1). Thus, summing (A.2) over i,

    II_T = 2(n−1)(κ_4(ε) + 2(Eε^2)^2) − 8Eε^2 Σ_{t_{i+1}≤T} (ΔX_{t_{i-1}})(ΔX_{t_i}) − 4κ_3(ε)(X_{t_{n-1}} − X_{t_0}).

Amalgamating the two expressions, one obtains

    var([Y,Y]_T^{(all)} | X) = n(2κ_4(ε) + 8(Eε^2)^2) + 8[X,X]_T^{(all)} Eε^2
                               + 2(n−1)(κ_4(ε) + 2(Eε^2)^2)
                               − 8Eε^2 Σ (ΔX_{t_{i-1}})(ΔX_{t_i}) − 4κ_3(ε)(X_{t_{n-1}} − X_{t_0})
                             = 4nEε^4 + R_n,    (A.3)

where the remainder term R_n satisfies

    |R_n| ≤ 8Eε^2 [X,X]_T + 2(κ_4(ε) + 2(Eε^2)^2)
            + 8Eε^2 Σ |ΔX_{t_{i-1}}| |ΔX_{t_i}| + 4|κ_3(ε)|(|X_{t_{n-1}}| + |X_{t_0}|)
          ≤ 16Eε^2 [X,X]_T^{(all)} + 2(κ_4(ε) + 2(Eε^2)^2) + 2|κ_3(ε)|(2 + [X,X]_T),    (A.4)

by the Cauchy–Schwarz inequality and because |x| ≤ (1 + x^2)/2. Because [X,X]_T^{(all)} = O_p(1), (19) follows.

Under slightly stronger conditions (e.g., |μ_t| and σ_t bounded above by a constant), Σ (ΔX_{t_{i-1}})(ΔX_{t_i}) is a near-martingale and of order O_p(n^{−1/2}), and similarly X_{t_{n-1}} − X_{t_0} = O_p(n^{−1/2}), from which (20) follows.
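The leading term 4nEε^4 in (A.3) is easy to check by simulation in the simplest setting. The following is our own sketch under the simplifying assumptions X ≡ 0 and standard Gaussian noise, so that κ_4(ε) = 0, Eε^2 = 1, and the exact conditional variance reduces to n(0 + 8) + 2(n−1)(0 + 2) = 12n − 4, which 4nEε^4 = 12n approximates.

```python
import numpy as np

# Monte Carlo sanity check of (A.3) (a sketch, not the authors' code):
# with X = 0 and eps ~ N(0, 1), Y_{t_i} = eps_{t_i}, and the exact
# conditional variance of [Y,Y]^(all) is 12n - 4 = 596 for n = 50.
rng = np.random.default_rng(1)
n, reps = 50, 20_000
eps = rng.standard_normal((reps, n + 1))            # noise at t_0, ..., t_n
yy_all = np.sum(np.diff(eps, axis=1) ** 2, axis=1)  # [Y,Y]^(all) per path
mc_var = yy_all.var()
exact = 12 * n - 4
print(mc_var, exact)
```

With 20,000 replications the Monte Carlo variance lands within a few percent of 596.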

Similarly, to obtain a higher-order approximation to the variance of the estimator \widehat{⟨X,X⟩}_T, note that

    \widehat{⟨X,X⟩}_T − ⟨X,X⟩_T = ( [Y,Y]_T^{(avg)} − (1/K)[Y,Y]_T^{(all)} − [X,X]_T^{(avg)} )
                                  + ( [X,X]_T^{(avg)} − ⟨X,X⟩_T ).    (A.5)

Now recalling (20) and (37), and in addition that

    cov([Y,Y]^{(all)}, [Y,Y]^{(avg)} | X) = 2(Eε^4 − (Eε^2)^2)(2n̄ − 1)/K + 8Eε^2 [X,X]_T^{(all)} (1/K) + o_p(1/K),

we see that

    var(\widehat{⟨X,X⟩}_T | X) = var([Y,Y]_T^{(avg)} | X) + (1/K^2) var([Y,Y]_T^{(all)} | X)
                                 − (2/K) cov([Y,Y]_T^{(avg)}, [Y,Y]_T^{(all)} | X)

1408 Journal of the American Statistical Association, December 2005

    = (4n̄/K)Eε^4 + (1/K)[8[X,X]_T^{(avg)} Eε^2 − 2(Eε^4 − (Eε^2)^2)] + o_p(1/K)
      + (1/K^2)[4nEε^4 + (8[X,X]_T^{(all)} Eε^2 − 2(Eε^4 − (Eε^2)^2)) + o_p(1)]
      − (2/K)[2(Eε^4 − (Eε^2)^2)(2n̄ − 1)/K + 8Eε^2 [X,X]_T^{(all)} (1/K) + o_p(1/K)]
    = (n̄/K) 8(Eε^2)^2 + (1/K)[8[X,X]_T^{(avg)} Eε^2 − 2(Eε^4 − (Eε^2)^2)] + o_p(1/K),

because, at the optimal choice, K = cn^{2/3} and n̄ = c^{−1}n^{1/3}. Therefore, (60) holds.
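The estimator whose variance is computed above combines the two time scales. A minimal implementation sketch follows; the function names and all parameter values are ours, with K set at the n^{2/3} rate of the optimal choice.

```python
import numpy as np

def rv(y):
    """Realized variance: sum of squared increments of the series y."""
    return float(np.sum(np.diff(y) ** 2))

def tsrv(y, K):
    """Two-scales estimator (sketch): average of K subsampled realized
    variances, bias-corrected by the full-grid realized variance."""
    n = len(y) - 1
    rv_avg = np.mean([rv(y[k::K]) for k in range(K)])
    n_bar = (n - K + 1) / K
    return rv_avg - (n_bar / n) * rv(y)

# Toy comparison: Brownian X with sigma = 0.2 over [0, 1], plus iid noise.
rng = np.random.default_rng(2)
n, sigma, noise_sd = 23_400, 0.2, 0.001
x = np.cumsum(sigma * np.sqrt(1.0 / n) * rng.standard_normal(n + 1))
y = x + noise_sd * rng.standard_normal(n + 1)
iv = sigma ** 2                    # integrated volatility over [0, 1]
K = int(n ** (2 / 3))
print(rv(y), tsrv(y, K), iv)       # naive RV carries the 2n*Eeps^2 bias
```

In this made-up setting the naive full-grid realized variance overshoots the integrated volatility by roughly 2nEε^2 ≈ 0.047, while the two-scales estimate stays near 0.04.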

A.2 The Relevant Central Limit Theorem

Lemma A.2. Suppose that X is an Itô process. Suppose that Y is related to X through model (4). Then, under assumption (11) and definitions (16) and (32),

    [Y,Y]_T^{(all)} = [ε,ε]_T^{(all)} + O_p(1)   and   [Y,Y]_T^{(avg)} = [ε,ε]_T^{(avg)} + [X,X]_T^{(avg)} + O_p(1/√K).

Proof. (a) The one-grid case. Here

    [Y,Y]_T^{(all)} = [X,X]_T^{(all)} + [ε,ε]_T^{(all)} + 2[X,ε]_T^{(all)}.    (A.6)

We show that

    E(([X,ε]_T^{(all)})^2 | X) = O_p(1),    (A.7)

and in particular,

    [X,ε]_T^{(all)} = O_p(1).    (A.8)

To see (A.7), write

    [X,ε]_T^{(all)} = Σ_{i=0}^{n-1} (ΔX_{t_i})(Δε_{t_i})
                    = Σ_{i=0}^{n-1} (ΔX_{t_i}) ε_{t_{i+1}} − Σ_{i=0}^{n-1} (ΔX_{t_i}) ε_{t_i}
                    = Σ_{i=1}^{n-1} (ΔX_{t_{i-1}} − ΔX_{t_i}) ε_{t_i} + ΔX_{t_{n-1}} ε_{t_n} − ΔX_{t_0} ε_{t_0}.    (A.9)

Because E([X,ε]_T^{(all)} | X) = 0 and the ε_{t_i} are iid for different t_i, we get

    E(([X,ε]_T^{(all)})^2 | X) = var([X,ε]_T^{(all)} | X)
        = Eε^2 [ Σ_{i=1}^{n-1} (ΔX_{t_{i-1}} − ΔX_{t_i})^2 + (ΔX_{t_{n-1}})^2 + (ΔX_{t_0})^2 ]
        = 2[X,X]_T Eε^2 − 2Eε^2 Σ_{i=1}^{n-1} (ΔX_{t_{i-1}})(ΔX_{t_i})
        ≤ 4[X,X]_T Eε^2,    (A.10)

by the Cauchy–Schwarz inequality, from which, and from [X,X]_T^{(all)} being of order O_p(1), (A.7) follows. Hence (A.8) follows by the Markov inequality.
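The summation-by-parts step in (A.9) is a purely algebraic identity, which can be checked numerically on arbitrary sequences (a throwaway sketch of ours; the arrays are made up):

```python
import numpy as np

# Numerical check of the summation-by-parts identity (A.9):
#   sum_{i=0}^{n-1} dX_i * d_eps_i
#     = sum_{i=1}^{n-1} (dX_{i-1} - dX_i) eps_i + dX_{n-1} eps_n - dX_0 eps_0,
# where dX_i = X_{t_{i+1}} - X_{t_i} and d_eps_i = eps_{t_{i+1}} - eps_{t_i}.
rng = np.random.default_rng(3)
n = 7
x = rng.standard_normal(n + 1)    # arbitrary path values X_{t_0}, ..., X_{t_n}
eps = rng.standard_normal(n + 1)  # arbitrary noise values
dx = np.diff(x)
lhs = np.sum(dx * np.diff(eps))
rhs = np.sum((dx[:-1] - dx[1:]) * eps[1:n]) + dx[-1] * eps[n] - dx[0] * eps[0]
print(lhs, rhs)
```

The two sides agree to machine precision for any inputs.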

(b) The multiple-grid case. Notice that

    [Y,Y]^{(avg)} = [X,X]_T^{(avg)} + [ε,ε]_T^{(avg)} + 2[X,ε]_T^{(avg)};    (A.11)

(A.11) follows directly from model (4) and the definitions of the grids and of [·,·]_t^{(avg)}; see Section 3.2.

We need to show that

    E(([X,ε]_T^{(avg)})^2 | X) = O_p(1/K),    (A.12)

and in particular,

    [X,ε]_T^{(avg)} = O_p(1/K^{1/2}),    (A.13)

and var([X,ε]_T^{(avg)} | X) = E[([X,ε]_T^{(avg)})^2 | X]. To show (A.12), note that E([X,ε]_T^{(avg)} | X) = 0 and

    E[([X,ε]_T^{(avg)})^2 | X] = var([X,ε]_T^{(avg)} | X)
        = (1/K^2) Σ_{k=1}^{K} var([X,ε]_T^{(k)} | X)
        ≤ (4Eε^2/K) [X,X]_T^{(avg)} = O_p(1/K),

where the second equality follows from the disjointness of the different grids, together with the independence of ε and X. The inequality follows from the same argument as in (A.10). The order then follows because [X,X]_T^{(avg)} = O_p(1); see the method of Mykland and Zhang (2002) for a rigorous development of the order of [X,X]_T^{(avg)}.
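The O_p(1/K) order in (A.12) reflects a variance reduction by roughly a factor of K from averaging the cross term over K disjoint subgrids. A seeded simulation sketch (ours, with made-up parameters) illustrates this, holding one X path fixed:

```python
import numpy as np

# Illustration of the O_p(1/K) order in (A.12): averaging [X, eps]
# over K disjoint subgrids shrinks its conditional variance by about
# a factor of K relative to the single full grid.
rng = np.random.default_rng(4)
n, K, reps = 2_000, 10, 4_000
x = np.cumsum(0.02 * rng.standard_normal(n + 1))   # one fixed X path

def cross_term(x, eps, K):
    # [X, eps]^(avg): average over the K subgrids x[k::K], eps[k::K].
    return np.mean([np.sum(np.diff(x[k::K]) * np.diff(eps[k::K]))
                    for k in range(K)])

all_grid = np.empty(reps)
avg_grid = np.empty(reps)
for r in range(reps):
    eps = 0.05 * rng.standard_normal(n + 1)        # fresh noise each rep
    all_grid[r] = cross_term(x, eps, 1)            # K = 1: the full grid
    avg_grid[r] = cross_term(x, eps, K)
print(all_grid.var(), avg_grid.var())
```

The second printed variance is close to one-tenth of the first, matching the 1/K rate.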

Theorem A.1. Suppose that X is an Itô process of form (1). Suppose that Y is related to X through model (4), and that (11) is satisfied with Eε^4 < ∞. Also suppose that t_i and t_{i+1} are never in the same subgrid for any i. Under assumption (33), as n → ∞,

    ( √n (Êε^2 − Eε^2),  √(K/n̄) ([Y,Y]_T^{(avg)} − [X,X]_T^{(avg)} − 2n̄Eε^2) )

converges in law to a bivariate normal with mean 0 and covariance matrix

    ( Eε^4          2 var(ε^2) )
    ( 2 var(ε^2)    4Eε^4      ),    (A.14)

conditional on the X process, where the limiting random variable is independent of the X process.

Proof. By Lemma A.2, we need the distribution of [ε,ε]^{(avg)} and [ε,ε]^{(all)}. First we explore the convergence of

    (1/√n) ( [ε,ε]_T^{(all)} − 2nEε^2,  [ε,ε]_T^{(avg)} K − 2n̄KEε^2 ).    (A.15)

Recall that all of the sampling points t_0, t_1, …, t_n are within [0,T]. We use G to denote the time points in the full sampling, as in the single-grid case; G^{(k)} denotes the subsampling from the kth grid. As before, if t_i ∈ G^{(k)}, then t_{i,−} and t_{i,+} are the previous and next elements in G^{(k)}; we take ε_{t_{i,−}} = 0 for t_i = min G^{(k)} and ε_{t_{i,+}} = 0 for t_i = max G^{(k)}.

Set

    M_T^{(1)} = (1/√n) Σ_{t_i∈G} (ε_{t_i}^2 − Eε^2),
    M_T^{(2)} = (1/√n) Σ_{t_i∈G} ε_{t_i} ε_{t_{i-1}},    (A.16)
    M_T^{(3)} = (1/√n) Σ_{k=1}^{K} Σ_{t_i∈G^{(k)}} ε_{t_i} ε_{t_{i,−}}.

We first find the asymptotic distribution of (M_T^{(1)}, M_T^{(2)}, M_T^{(3)}) using the martingale central limit theorem; then we use the result to find the limit of (A.15).

Note that (M_T^{(1)}, M_T^{(2)}, M_T^{(3)}) are the end points of martingales with respect to the filtration F_{t_i} = σ(ε_{t_j}, j ≤ i; X_t, all t). We now derive their (discrete-time) predictable quadratic variations and covariations ⟨M^{(l)}, M^{(k)}⟩, l, k = 1, 2, 3. [Discrete-time predictable quadratic variations are used only in this proof and the proof of Lemma 1, and are different from the continuous-time quadratic variations in (12).] We have

    ⟨M^{(1)}, M^{(1)}⟩_T = (1/n) Σ_{t_i∈G} var(ε_{t_i}^2 − Eε^2 | F_{t_{i-1}}) = var(ε^2),

    ⟨M^{(2)}, M^{(2)}⟩_T = (1/n) Σ_{t_i∈G} var(ε_{t_i} ε_{t_{i-1}} | F_{t_{i-1}})
                         = (Eε^2/n) Σ_{t_i∈G} ε_{t_{i-1}}^2 = (Eε^2)^2 + o_p(1),    (A.17)

    ⟨M^{(3)}, M^{(3)}⟩_T = (1/n) Σ_{k=1}^{K} Σ_{t_i∈G^{(k)}} var(ε_{t_i} ε_{t_{i,−}} | F_{t_{i-1}})
                         = (Eε^2/n) Σ_{k=1}^{K} Σ_{t_i∈G^{(k)}} ε_{t_{i,−}}^2 = (Eε^2)^2 + o_p(1),

by the law of large numbers. Similarly, for the predictable quadratic covariations,

    ⟨M^{(1)}, M^{(2)}⟩_T = (1/n) Σ_{t_i∈G} cov(ε_{t_i}^2 − Eε^2, ε_{t_i} ε_{t_{i-1}} | F_{t_{i-1}})
                         = Eε^3 (1/n) Σ_{t_i∈G} ε_{t_{i-1}} = o_p(1),

    ⟨M^{(1)}, M^{(3)}⟩_T = (1/n) Σ_{k=1}^{K} Σ_{t_i∈G^{(k)}} cov(ε_{t_i}^2 − Eε^2, ε_{t_i} ε_{t_{i,−}} | F_{t_{i-1}})    (A.18)
                         = Eε^3 (1/n) Σ_{k=1}^{K} Σ_{t_i∈G^{(k)}} ε_{t_{i,−}} = o_p(1),

    ⟨M^{(2)}, M^{(3)}⟩_T = (1/n) Σ_{k=1}^{K} Σ_{t_i∈G^{(k)}} cov(ε_{t_i} ε_{t_{i-1}}, ε_{t_i} ε_{t_{i,−}} | F_{t_{i-1}})
                         = (Eε^2/n) Σ_{k=1}^{K} Σ_{t_i∈G^{(k)}} ε_{t_{i-1}} ε_{t_{i,−}} = o_p(1),

because t_{i+1} is not in the same grid as t_i.

Because the ε_{t_i} are iid and Eε_{t_i}^4 < ∞, the conditional Lindeberg conditions are satisfied. Hence, by the martingale central limit theorem (see Hall and Heyde 1980, condition 3.1, p. 58), (M^{(1)}, M^{(2)}, M^{(3)}) are asymptotically normal, with covariance matrix given by the asymptotic values of the ⟨M^{(l)}, M^{(k)}⟩. In other words, asymptotically, (M^{(1)}, M^{(2)}, M^{(3)}) are independent normal with respective variances var(ε^2), (Eε^2)^2, and (Eε^2)^2.

Returning to (A.15), we have that

    [ε,ε]^{(all)} − 2nEε^2 = 2 Σ_{i≠0,n} (ε_{t_i}^2 − Eε^2) + (ε_{t_0}^2 − Eε^2) + (ε_{t_n}^2 − Eε^2)
                             − 2 Σ_{t_i>0} ε_{t_i} ε_{t_{i-1}}
                           = 2√n (M^{(1)} − M^{(2)}) + O_p(1).    (A.19)

Meanwhile,

    [ε,ε]^{(k)} − 2n_k Eε^2 = Σ_{t_i∈G^{(k)}, t_i≠max G^{(k)}} (ε_{t_{i,+}} − ε_{t_i})^2 − 2n_k Eε^2
                            = 2 Σ_{t_i∈G^{(k)}} (ε_{t_i}^2 − Eε^2) − (ε_{min G^{(k)}}^2 − Eε^2) − (ε_{max G^{(k)}}^2 − Eε^2)
                              − 2 Σ_{t_i∈G^{(k)}} ε_{t_i} ε_{t_{i,−}},    (A.20)

where n_k + 1 is the total number of sampling points in G^{(k)}. Hence

    [ε,ε]_T^{(avg)} K − 2n̄KEε^2 = √n (2M^{(1)} − 2M^{(3)}) − R = 2√n (M^{(1)} − M^{(3)}) + O_p(K^{1/2}),    (A.21)

because R = Σ_{k=1}^{K} [ (ε_{min G^{(k)}}^2 − Eε^2) + (ε_{max G^{(k)}}^2 − Eε^2) ] satisfies ER^2 = var(R) ≤ 4K var(ε^2).

Because n^{−1}K → 0, and because the error terms in (A.19) and (A.20) are uniformly integrable, it follows that

    (A.15) = 2 (M^{(1)} − M^{(2)}, M^{(1)} − M^{(3)}) + o_p(1).    (A.22)

Hence (A.15) is also asymptotically normal, with covariance matrix

    ( 4Eε^4          4 var(ε^2) )
    ( 4 var(ε^2)     4Eε^4      ).

By Lemma A.2, and because n^{−1}K → 0, the pair (1/√n)([Y,Y]_T^{(all)} − 2nEε^2, K([Y,Y]_T^{(avg)} − [X,X]_T^{(avg)} − 2n̄Eε^2)), conditional on X, is asymptotically normal:

    (1/√n) ( [Y,Y]_T^{(all)} − 2nEε^2,  [Y,Y]_T^{(avg)} K − 2n̄KEε^2 − [X,X]_T^{(avg)} K ) | X
        = 2 (M^{(1)} − M^{(2)}, M^{(1)} − M^{(3)}) + o_p(1)
        → (in law)  2 N( 0, ( Eε^4  var(ε^2) ; var(ε^2)  Eε^4 ) ).    (A.23)

Because

    Êε^2 = [Y,Y]_T^{(all)} / (2n)   and   K/√n = √(K/n̄) (1 + o(1)),    (A.24)

Theorem A.1 follows.
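The first component of Theorem A.1 can be checked directly in the pure-noise case. The following is our own sketch assuming X ≡ 0 and standard Gaussian noise, so Eε^4 = 3 and the asymptotic variance of √n(Êε^2 − Eε^2) should be close to 3.

```python
import numpy as np

# Pure-noise sanity check (a sketch) of Theorem A.1's first component:
# with hat-Eeps2 = [Y,Y]^(all)_T / (2n), the variance of
# sqrt(n) * (hat-Eeps2 - Eeps^2) approaches Eeps^4 (= 3 for N(0,1) noise).
rng = np.random.default_rng(5)
n, reps = 2_000, 4_000
eps = rng.standard_normal((reps, n + 1))              # take X = 0, so Y = eps
ee2_hat = np.sum(np.diff(eps, axis=1) ** 2, axis=1) / (2 * n)
scaled = np.sqrt(n) * (ee2_hat - 1.0)                 # true Eeps^2 = 1
print(scaled.var())                                   # close to 3 here
```

The exact finite-sample value is 3 − 1/n, so the Monte Carlo variance hovers near 3.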

Proof of Lemma 1. For simplicity, and without loss of generality, we assume that H = G. The result follows from the proofs of Lemma A.2 and Theorem A.1, but with a more exact representation of [Y,Y]_T^{(all)} − [X,X]_T^{(all)}. Specifically, from (A.9) and (A.19), we have that

    [Y,Y]_T^{(all)} − [X,X]_T^{(all)} − 2nEε^2
        = [ε,ε]_T − 2nEε^2 + 2[X,ε]_T^{(all)}
        = 2 Σ_{i≠0,n} (ε_{t_i}^2 − Eε^2) + (ε_{t_0}^2 − Eε^2) + (ε_{t_n}^2 − Eε^2) − 2 Σ_{t_i>0} ε_{t_i} ε_{t_{i-1}}
          + 2 Σ_{i=1}^{n-1} (ΔX_{t_{i-1}} − ΔX_{t_i}) ε_{t_i} + 2ΔX_{t_{n-1}} ε_{t_n} − 2ΔX_{t_0} ε_{t_0}.

With the same filtration as in the proof of Theorem A.1, this is the sum of a martingale triangular array with increment 2(ε_{t_i}^2 − Eε^2) − 2ε_{t_i} ε_{t_{i-1}} + 2(ΔX_{t_{i-1}} − ΔX_{t_i}) ε_{t_i} for i ≠ 0, n. Again, the conditions of the martingale central limit theorem (see Hall and Heyde 1980, chap. 3) are satisfied by (17), whence the term is asymptotically normal, conditionally on the X process. The conditional variance is given by (20), under the conditions mentioned in the statement of the lemma. Because Eε^2 can vary with n, the O_p(n_{sparse}^{−1/2}) term in (20) is actually of order O_p(n_{sparse}^{−1/2} Eε^2).

A.3 Asymptotics of D_T

For transparency of notation, we take Δt = T/n; in other words, Δt is the average of the Δt_i.

Proof of Theorem 2. Note first that, because we assume that μ_t and σ_t are continuous, they are locally bounded. Because here we seek to show convergence in probability, we can therefore assume, without loss of generality, that |μ_t| is bounded above by a nonrandom constant, and also that ∞ > σ^+ ≥ σ_t ≥ σ^− > 0, where σ^+ and σ^− are nonrandom. This is by the usual stopping-time argument.

Also note that once μ_t and σ_t are bounded, as we have now assumed, then, by Girsanov's theorem (see, e.g., Karatzas and Shreve 1991, chap. 3.5; Jacod and Shiryaev 2003, chap. II.3b), we can, without loss of generality, further suppose that μ_t = 0 identically. (If the convergence in probability holds under the equivalent probability, then it also holds under the original one.)

Now for the derivation. First consider

    (X_{t_i} − X_{t_{i,−}})^2 = ( Σ_{t_j=t_{i,−}}^{t_{i-1}} ΔX_{t_j} )^2
                              = Σ_{t_j=t_{i,−}}^{t_{i-1}} (ΔX_{t_j})^2 + 2 Σ_{t_{i-1}≥t_k>t_l≥t_{i,−}} ΔX_{t_k} ΔX_{t_l}.

It follows that

    [X,X]_T^{(avg)} = [X,X]_T^{(all)} + 2 Σ_{i=1}^{n-1} (ΔX_{t_i}) Σ_{j=1}^{K∧i} (1 − j/K)(ΔX_{t_{i-j}}) + O_p(K/n).

The remainder term incorporates end effects and has order O_p(K/n) by (41) and (42), and due to the regular allocation of points to grids.

Following Jacod and Protter (1998) and Mykland and Zhang (2002), [X,X]_T^{(all)} − ⟨X,X⟩_T = O_p(1/√n) = o_p(√(K/n)), too, under assumption (17) [which is a special case of assumption (41)] and under the condition that inf_{t∈[0,T]} σ_t^2 > 0 almost surely. In other words,

    D_T = [X,X]_T^{(avg)} − ⟨X,X⟩_T
        = 2 Σ_{i=1}^{n-1} (ΔX_{t_i}) Σ_{j=1}^{K∧i} (1 − j/K)(ΔX_{t_{i-j}}) + o_p(√(KΔt)).    (A.25)

It follows that

    ⟨D,D⟩_T = 4 Σ_{i=1}^{n-1} (Δ⟨X,X⟩_{t_i}) ( Σ_{j=1}^{K∧i} (1 − j/K)(ΔX_{t_{i-j}}) )^2 + o_p(KΔt)
            = (I) + (II) + o_p(KΔt),    (A.26)

where

    (I) = 4 Σ_{i=1}^{n-1} (Δ⟨X,X⟩_{t_i}) ( Σ_{j=1}^{K∧i} (1 − j/K)^2 (ΔX_{t_{i-j}})^2 )
        = 4 Σ_{i=1}^{n-1} σ_{t_i}^4 Δt_i ( Σ_{j=1}^{K∧i} (1 − j/K)^2 Δt_{i-j} ) + o_p(KΔt)
        = KΔt Σ_{i=1}^{n-1} σ_{t_i}^4 h_i Δt_i + o_p(KΔt),

where h_i is given by (43). Thus Theorem 2 will have been shown if we can show that

    (II) = 8 Σ_{i=1}^{n-1} Δ⟨X,X⟩_{t_i} ζ_i    (A.27)

is of order o_p(KΔt), where

    ζ_i = Σ_{i−1≥l>r≥0} (ΔX_{t_l})(ΔX_{t_r}) (1 − (i−l)/K)^+ (1 − (i−r)/K)^+.    (A.28)

We do this as follows. Denote δ^+ = max_i |Δt_i| = O(1/n), and set

    (II)′ = 8 Σ_{i=1}^{n-1} Δt_i σ_{t_{i-K}}^2 ζ_i.

Then, by Hölder's inequality,

    E|(II) − (II)′| ≤ δ^+ n sup_{|t−s|≤(K+1)δ^+} ‖σ_t^2 − σ_s^2‖_2 sup_i ‖ζ_i‖_2 = o(δ^+ K),

because of the boundedness and continuity of σ_t, and because, applying the Burkholder–Davis–Gundy inequality twice, we have that

    E(ζ_i^2) ≤ E [ Σ_{l=1}^{i-1} Δ⟨X,X⟩_{t_l} ( Σ_{r=0}^{(l-1)∧(i-1)} (ΔX_{t_r}) (1 − (i−l)/K)^+ (1 − (i−r)/K)^+ )^2 ]
             ≤ δ^+ (σ^+)^2 Σ_{l=1}^{i-1} E ( Σ_{r=0}^{(l-1)∧(i-1)} (ΔX_{t_r}) (1 − (i−l)/K)^+ (1 − (i−r)/K)^+ )^2
             ≤ (δ^+)^2 (σ^+)^4 Σ_{l=1}^{i-1} Σ_{r=0}^{(l-1)∧(i-1)} ( (1 − (i−l)/K)^+ (1 − (i−r)/K)^+ )^2
             = O((δ^+ K)^2)   uniformly in i,    (A.29)

where the second-to-last transition does for the inner sum what the first two inequalities do for the outer sum. Now rewrite

    (II)′ = 8 Σ_{n−1≥l>r≥0} (ΔX_{t_l})(ΔX_{t_r}) Σ_{i=l+1}^{n-1} Δt_i σ_{t_{i-K}}^2 (1 − (i−l)/K)^+ (1 − (i−r)/K)^+.    (A.30)

In the same way as in (A.29), we then obtain

    E((II)′)^2 ≤ 64 (δ^+)^2 (σ^+)^4 Σ_{n−1≥l>r≥0} E ( Σ_{i=l+1}^{n-1} Δt_i σ_{t_{i-K}}^2 (1 − (i−l)/K)^+ (1 − (i−r)/K)^+ )^2
               ≤ 64 (δ^+)^2 (σ^+)^4 × (δ^+)^2 (σ^+)^4 Σ_{n−1≥l>r≥0} ( Σ_{i=l+1}^{n-1} (1 − (i−l)/K)^+ (1 − (i−r)/K)^+ )^2
               ≤ 64 (δ^+)^4 (σ^+)^8 nK^3 = O((δ^+ K)^3) = o((δ^+ K)^2),    (A.31)

by assumptions (41) and (42). The transition from the second to the last line in (A.31) is due to the fact that Σ_{i=l+1}^{n-1} (1 − (i−l)/K)^+ (1 − (i−r)/K)^+ ≤ K, and equals 0 when l − r ≥ K. Thus we have shown that (II) = o_p(KΔt), and hence Theorem 2 has been established.
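The counting fact invoked in the last transition of (A.31) can be confirmed by brute force (a small sketch of ours, with made-up n and K):

```python
# Brute-force check of the fact behind the last step of (A.31):
# S(l, r) = sum_{i=l+1}^{n-1} (1-(i-l)/K)^+ (1-(i-r)/K)^+ is at most K,
# and equals 0 whenever l - r >= K (the second factor vanishes there).
n, K = 60, 7

def S(l, r):
    pos = lambda u: max(u, 0.0)
    return sum(pos(1 - (i - l) / K) * pos(1 - (i - r) / K)
               for i in range(l + 1, n))

vals = [(l, r, S(l, r)) for l in range(n) for r in range(l)]
print("max S =", max(s for *_, s in vals))
```

Each summand is at most 1 and at most K − 1 indices i contribute, which is why the bound S ≤ K holds for every pair.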

We now proceed to the asymptotic distribution of D_T. We first state a technical condition on the filtration (F_t)_{0≤t≤T} to which X_t and μ_t (but not the ε's) are assumed to be adapted.

Condition E (Description of the filtration). There is a continuous multidimensional P-local martingale 𝒳 = (𝒳^{(1)}, …, 𝒳^{(p)}), for some p, so that F_t is the smallest sigma-field containing σ(𝒳_s, s ≤ t) and N, where N contains all of the null sets in σ(𝒳_s, s ≤ T).

For example, 𝒳 can be a collection of Brownian motions.

Proof of Theorem 3. We can show, by methods similar to those in the proof of Theorem 2, that if L is any martingale adapted to the filtration generated by 𝒳, then

    sup_t | (1/(KΔt)) ⟨D,L⟩_t | → 0 in probability.    (A.32)

The stable convergence with respect to the filtration (F_t)_{0≤t≤T} then follows in view of Rootzén (1980) or Jacod and Protter (1998). This ends the proof of Theorem 3.

Finally, in the case where η_n^2 does not converge, one can still use the mixed normal with variance η_n^2. This is because every subsequence of η_n^2 has a further subsequence that does converge in probability to some η^2, and hence for which assumption (48) in Theorem 3 is satisfied. The reason for this is that one can define the distribution function of a finite measure by

    G_n(t) = Σ_{t_{i+1}≤t} h_i Δt_i.    (A.33)

Because G_n(t) ≤ T sup_i h_i, it follows from (45) that the sequence G_n is weakly compact in the sense of weak convergence; see Helly's theorem (e.g., Billingsley 1995, p. 336). For any convergent subsequence G_n → G, we then get that

    η_n^2 = ∫_0^T σ_t^4 dG_n(t) → ∫_0^T σ_t^4 dG(t)    (A.34)

almost surely, because we have assumed that ⟨X,X⟩_t′ is a continuous function of t. One then defines η^2 to be the (subsequence-dependent) right side of (A.34). To proceed further with the asymptotics, continue with the foregoing subsequence and note that

    (1/(KΔt)) ⟨D,D⟩_t ≈ ∫_0^t σ_s^4 dG_n(s) → ∫_0^t σ_s^4 dG(s),

completing the proof.

[Received October 2003. Revised December 2004.]

REFERENCES

Aït-Sahalia, Y., Mykland, P. A., and Zhang, L. (2005), "How Often to Sample a Continuous-Time Process in the Presence of Market Microstructure Noise," Review of Financial Studies, 18, 351–416.

Aldous, D. J., and Eagleson, G. K. (1978), "On Mixing and Stability of Limit Theorems," The Annals of Probability, 6, 325–331.

Andersen, T. G., Bollerslev, T., Diebold, F. X., and Labys, P. (2001), "The Distribution of Exchange Rate Realized Volatility," Journal of the American Statistical Association, 96, 42–55.

Bandi, F. M., and Russell, J. R. (2003), "Microstructure Noise, Realized Volatility, and Optimal Sampling," technical report, University of Chicago, Graduate School of Business.

Barndorff-Nielsen, O. E., and Shephard, N. (2002), "Econometric Analysis of Realized Volatility and Its Use in Estimating Stochastic Volatility Models," Journal of the Royal Statistical Society, Ser. B, 64, 253–280.

Billingsley, P. (1995), Probability and Measure (3rd ed.), New York: Wiley.

Brown, S. J. (1990), "Estimating Volatility," in Financial Options: From Theory to Practice, eds. S. Figlewski, W. Silber, and M. Subrahmanyam, Homewood, IL: Business One-Irwin, pp. 516–537.

Chernov, M., and Ghysels, E. (2000), "A Study Towards a Unified Approach to the Joint Estimation of Objective and Risk Neutral Measures for the Purpose of Options Valuation," Journal of Financial Economics, 57, 407–458.

Dacorogna, M. M., Gençay, R., Müller, U., Olsen, R. B., and Pictet, O. V. (2001), An Introduction to High-Frequency Finance, San Diego: Academic Press.

Gallant, A. R., Hsu, C.-T., and Tauchen, G. T. (1999), "Using Daily Range Data to Calibrate Volatility Diffusions and Extract the Forward Integrated Variance," The Review of Economics and Statistics, 81, 617–631.

Gloter, A. (2000), "Estimation des Paramètres d'une Diffusion Cachée," unpublished doctoral thesis, Université de Marne-la-Vallée.

Hall, P., and Heyde, C. C. (1980), Martingale Limit Theory and Its Application, Boston: Academic Press.

Hansen, P. R., and Lunde, A. (2004a), "Realized Variance and IID Market Microstructure Noise," technical report, Brown University, Dept. of Economics.

——— (2004b), "An Unbiased Measure of Realized Variance," technical report, Brown University, Dept. of Economics.

Heston, S. (1993), "A Closed-Form Solution for Options With Stochastic Volatility, With Applications to Bonds and Currency Options," Review of Financial Studies, 6, 327–343.

Hull, J., and White, A. (1987), "The Pricing of Options on Assets With Stochastic Volatilities," Journal of Finance, 42, 281–300.

Jacod, J., and Protter, P. (1998), "Asymptotic Error Distributions for the Euler Method for Stochastic Differential Equations," The Annals of Probability, 26, 267–307.

Jacod, J., and Shiryaev, A. N. (2003), Limit Theorems for Stochastic Processes (2nd ed.), New York: Springer-Verlag.

Karatzas, I., and Shreve, S. E. (1991), Brownian Motion and Stochastic Calculus, New York: Springer-Verlag.

Mykland, P. A., and Zhang, L. (2002), "ANOVA for Diffusions," technical report, University of Chicago, Dept. of Statistics.

O'Hara, M. (1995), Market Microstructure Theory, Cambridge, MA: Blackwell.

Oomen, R. (2004), "Properties of Realized Variance for a Pure Jump Process: Calendar Time Sampling versus Business Time Sampling," technical report, University of Warwick, Warwick Business School.

Rényi, A. (1963), "On Stable Sequences of Events," Sankhyā, Ser. A, 25, 293–302.

Rootzén, H. (1980), "Limit Distributions for the Error in Approximations of Stochastic Integrals," The Annals of Probability, 8, 241–251.

Zhou, B. (1996), "High-Frequency Data and Volatility in Foreign-Exchange Rates," Journal of Business & Economic Statistics, 14, 45–52.

Zumbach, G., Corsi, F., and Trapletti, A. (2002), "Efficient Estimation of Volatility Using High-Frequency Data," technical report, Olsen & Associates.

Page 15: A Tale of Two Time Scales: Determining Integrated …galton.uchicago.edu/~mykland/paperlinks/p1394.pdfA Tale of Two Time Scales: Determining Integrated Volatility With Noisy High-Frequency

1408 Journal of the American Statistical Association December 2005

= 4n

KEε4 + 1

K

[8[XX](avg)

T Eε2 minus 2(Eε4 minus (Eε2)2)]+ op

(1

K

)

+ 1

K2

[4nKEε4 + (8[XX](all)

T Eε2 minus 2(Eε4 minus (Eε2)2))+ op(1)

]

minus 2

K

[

2(Eε4 minus (Eε2)2)

(

2n minus 1

K

)

+ 8Eε2[XX](all)T

1

K+ op

(1

K

)]

= n

K8(Eε2)2 + 1

K

[8[XX](avg)

T Eε2 minus 2(Eε4 minus (Eε2)2)]

+ op

(1

K

)

because at the optimal choice K = cn23 and n = cminus1n13 Therefore(60) holds

A2 The Relevant Central Limit Theorem

Lemma A2 Suppose that X is an Itocirc process Suppose that Y isrelated to X through model (4) Then under assumption (11) and defi-nitions (16) and (32)

[YY](all)T = [ε ε](all)

T + Op(1) and

[YY](avg)T = [ε ε](avg)

T + [XX](avg)T + Op

(1radicK

)

Proof (a) The one-grid case

[YY](all)T = [XX](all)

T + [ε ε](all)T + 2[X ε](all)

T (A6)

We show that

E(([X ε](all)

T

)2∣∣X)= Op(1) (A7)

and in particular

[X ε](all)T = Op(1) (A8)

To see (A7)

[X ε](all)T =

nminus1sum

i=0

(Xti

)(εti

)

=nminus1sum

i=0

(Xti

)εti+1 minus

nminus1sum

i=0

(Xti

)εti

=nminus1sum

i=1

(Xtiminus1 minus Xti

)εti + Xtnminus1εtn minus Xt0εt0 (A9)

Because E([X ε](all)T |X) = 0 and εti iid for different ti we get

E(([X ε](all)

T

)2∣∣X) = var

([X ε](all)T

∣∣X)

= Eε2

[ nminus1sum

i=1

(Xtiminus1 minus Xti

)2 + X2tnminus1

+ X2t0

]

= 2[XX]T Eε2 minus 2Eε2nminus1sum

i=1

(Xtiminus1

)(Xti

)

le 4[XX]T Eε2 (A10)

by the CauchyndashSchwarz inequality from which and from [XX](all)

being of order Op(1) (A7) follows Hence (A8) follows by theMarkov inequality

(b) The multiple-grid case Notice that

[YY](avg) = [XX](avg)T + [ε ε](avg)

T + 2[X ε](avg)T (A11)

(A11) strictly follows from model (4) and the definitions of grids and

[middot middot](avg)t see Section 32

We need to show that

E(([X ε](avg)

T

)2∣∣X)= Op

(1

K

)

(A12)

in particular

[X ε](avg)T = Op

(1

K12

)

(A13)

and var([X ε](avg)T |X) = E[([X ε](avg)

T )2|X]To show (A12) note that E([X ε](avg)

T |X) = 0 and

E[([X ε](avg)

T

)2∣∣X] = var

([X ε](avg)T

∣∣X)

= 1

K2

Ksum

k=1

var([X ε](k)T

∣∣X)

le 4Eε2

K[XX](avg)

T = Op

(1

K

)

where the second equality follows from the disjointness of differentgrids as well as ε |= X The inequality follows from the same argumentas in (A10) Then the order follows because [XX](avg)

T = Op(1) seethe method of Mykland and Zhang (2002) for a rigorous development

for the order of [XX](avg)T

Theorem A1 Suppose that X is an Itocirc process of form (1) Sup-pose that Y is related to X through model (4) and that (11) is satis-fied with Eε4 lt infin Also suppose that ti and ti+1 is not in the same

subgrid for any i Under assumption (33) as n rarr infin (radic

n(Eε2 minusEε2)

radicKn ([YY](avg)

T minus [XX](avg)T minus 2nEε2)) converges in law to a

bivariate normal with mean 0 and covariance matrix(

Eε4 2 var(ε2)

2 var(ε2) 4Eε4

)

(A14)

conditional on the X process where the limiting random variable isindependent of the X process

Proof By Lemma A2 we need the distribution of [ε ε](avg) and[ε ε](all) First we explore the convergence of

1radicn

([ε ε](all)T minus 2nEε2 [ε ε](avg)

T K minus 2nKEε2) (A15)

Recall that all of the sampling points t0 t1 tn are within [0T] Weuse G to denote the time points in the full sampling as in the singlegrid G(k) denotes the subsamplings from the kth grid As before ifti isin G(k) then timinus and ti+ are the previous and next elements in G(k)εtiminus = 0 for ti = minG(k) and εti+ = 0 for ti = maxG(k)

Set

M(1)T = 1radic

n

sum

tiisinG

(ε2

ti minus Eε2)

M(2)T = 1radic

n

sum

tiisinGεtiεtiminus1 (A16)

M(3)T = 1radic

n

Ksum

k=1

sum

tiisinG(k)

εtiεtiminus

Zhang Mykland and Aiumlt-Sahalia Determining Integrated Volatility 1409

We first find the asymptotic distribution of (M(1)T M(2)

T M(3)T ) using

the martingale central limit theorem Then we use the result to find thelimit of (A15)

Note that (M(1)T M(2)

T M(3)T ) are the end points of martingales

with respect to filtration Fi = σ(εtj j le iXt all t) We now de-rive its (discrete-time) predictable quadratic variation 〈M(l)M(k)〉l k = 123 [discrete-time predictable quadratic variations are onlyused in this proof and the proof of Lemma 1 and are different fromthe continuous time quadratic variations in (12)]

langM(1)M(1)

rangT = 1

n

sum

tiisinGvar(ε2

ti minus Eε2∣∣Ftiminus1

)

= var(ε2)

langM(2)M(2)

rangT = 1

n

sum

tiisinGvar(εtiεtiminus1

∣∣Ftiminus1

)

(A17)= Eε2

n

sum

tiisinGε2

timinus1= (Eε2)2 + op(1)

langM(3)M(3)

rangT = 1

n

Ksum

k=1

sum

tiisinG(k)

var(εtiεtiminus

∣∣Ftiminus1

)

= Eε2

n

Ksum

k=1

sum

tiisinG(k)

ε2timinus = (Eε2)2 + op(1)

by the law of large numbersSimilarly for the predictable quadratic covariations

langM(1)M(2)

rangT = 1

n

sum

tiisinGcov(ε2

ti minus Eε2 εtiεtiminus1

∣∣Ftiminus1

)

= Eε3 1

n

sum

tiisinGεtiminus1 = op(1)

langM(1)M(3)

rangT = 1

n

Ksum

k=1

sum

tiisinG(k)

cov(ε2

ti minus Eε2 εtiεtiminus∣∣Ftiminus1

)

(A18)

= Eε3 1

n

Ksum

k=1

sum

tiisinG(k)

εtiminus = op(1)

langM(2)M(3)

rangT = 1

n

Ksum

k=1

sum

tiisinG(k)

cov(εtiεtiminus1 εtiεtiminus

∣∣Ftiminus1

)

= Eε2

n

Ksum

k=1

sum

tiisinG(k)

εtiminus1εtiminus = op(1)

because ti+1 is not in the same grid as tiBecause the εti rsquos are iid and Eε4

ti lt infin the conditional Lin-deberg conditions are satisfied Hence by the martingale centrallimit theorem (see Hall and Heyde 1980 condition 31 p 58)(M(1)M(2)M(3)) are asymptotically normal with covariance ma-trix as the asymptotic value of 〈M(l)M(k)〉 In other words asymp-totically (M(1)M(2)M(3)) are independent normal with respectivevariances var(ε) (Eε2)2 and (Eε2)2

Returning to (A15) we have that

[ε ε](all) minus 2nEε2

= 2sum

i 13=0n

(ε2ti minus Eε2) + (ε2

t0 minus Eε2) + (ε2tn minus Eε2) minus 2

sum

tigt0

εtiεtiminus1

= 2radic

n(M(1) minus M(2)

)+ Op(1) (A19)

Meanwhile

[ε ε](k) minus 2nkEε2

=sum

tiisinG(k)ti 13=maxG(k)

(εti+ minus εti

)2 minus 2nkEε2

= 2sum

tiisinG(k)

(ε2

ti minus Eε2)minus (ε2minG(k) minus Eε2)minus (ε2

maxG(k) minus Eε2)

minus 2sum

tiisinG(k)

εtiεtiminus (A20)

where nk + 1 represents the total number of sampling points in G(k)Hence

[ε ε](avg)T K minus 2nEε2K = radic

n(2M(1) minus 2M(3)

)minus R

= 2radic

n(M(1) minus M(3)

)+ Op(K12) (A21)

because R =sumKk=1[(ε2

minG(k) minus Eε2) + (ε2maxG(k) minus Eε2)] satisfying

ER2 = var(R) le 4K var(ε2)

Because nminus1K rarr 0 and because the error terms in (A19) and(A20) are uniformly integrable it follows that

(A15) = 2(M(1) minus M(2)M(1) minus M(3)

)+ op(1) (A22)

Hence (A15) is also asymptotically normal with covariance matrix(

4Eε4 4 var(ε2)

4 var(ε2) 4Eε4

)

By Lemma A2 and as nminus1K rarr 0

1radicn

([YY](all)T minus 2nEε2K

([YY](avg)T minus [XX](avg)

T minus 2nEε2))∣∣∣X

is asymptotically normal

1radicn

( [YY](all)T minus 2nEε2

[YY](avg)T K minus 2nKEε2 minus [XX](avg)

T K

∣∣∣∣X

)

= 2

(M(1) minus M(2)

M(1) minus M(3)

)

+ op(1)

Lminusrarr 2N

(

0

(Eε4 var(ε2)

var(ε2) Eε4

))

(A23)

Because

Eε2 = 1

2n[YY](all)

T and(A24)

Kradicn

=radic

K

n(1 + o(1))

Theorem A1 follows

Proof of Lemma 1. For simplicity, and without loss of generality, we assume that $\mathcal{H} = \mathcal{G}$. The result follows the proofs of Lemma A.2 and Theorem A.1, but with a more exact representation of $[Y,Y]^{(\mathrm{all})}_T - [X,X]^{(\mathrm{all})}_T$. Specifically, from (A.9) and (A.19), we have that

$$
[Y,Y]^{(\mathrm{all})}_T - [X,X]^{(\mathrm{all})}_T - 2nE\varepsilon^2
= [\varepsilon,\varepsilon]_T - 2nE\varepsilon^2 + 2[X,\varepsilon]^{(\mathrm{all})}_T
$$
$$
= 2\sum_{i \ne 0,n} (\varepsilon_{t_i}^2 - E\varepsilon^2)
+ (\varepsilon_{t_0}^2 - E\varepsilon^2) + (\varepsilon_{t_n}^2 - E\varepsilon^2)
- 2\sum_{t_i > 0} \varepsilon_{t_i}\varepsilon_{t_{i-1}}
+ 2\sum_{i=1}^{n-1} (X_{t_{i-1}} - X_{t_i})\varepsilon_{t_i}
+ 2X_{t_{n-1}}\varepsilon_{t_n} - 2X_{t_0}\varepsilon_{t_0}.
$$

1410 Journal of the American Statistical Association December 2005

With the same filtration as in the proof of Theorem A.1, this is the sum of a martingale triangular array with increment $2(\varepsilon_{t_i}^2 - E\varepsilon^2) - 2\varepsilon_{t_i}\varepsilon_{t_{i-1}} + (X_{t_{i-1}} - X_{t_i})\varepsilon_{t_i}$ for $i \ne 0, n$. Again, the conditions of the martingale central limit theorem (see Hall and Heyde 1980, chap. 3) are satisfied by (17), whence the term is asymptotically normal, conditionally on the $X$ process. The conditional variance is given by (20), under the conditions mentioned in the statement of the lemma. Because $E\varepsilon^2$ can vary with $n$, the $O_p(n_{\mathrm{sparse}}^{-1/2})$ term in (20) is actually of order $O_p(n_{\mathrm{sparse}}^{-1/2} E\varepsilon^2)$.

A.3 Asymptotics of $D_T$

For transparency of notation, we take $\overline{\Delta t} = T/n$; in other words, $\overline{\Delta t}$ is the average of the $\Delta t_i$.

Proof of Theorem 2. Note first that because we assume that $\mu_t$ and $\sigma_t$ are continuous, they are locally bounded. Because here we seek to show convergence in probability, we can therefore assume, without loss of generality, that $|\mu_t|$ is bounded above by a nonrandom constant, and also that $\infty > \sigma^+ \ge \sigma_t \ge \sigma^- > 0$, where $\sigma^+$ and $\sigma^-$ are nonrandom. This is by the usual stopping time argument.

Also note that once $\mu_t$ and $\sigma_t$ are bounded, as we have now assumed, then by Girsanov's theorem (see, e.g., Karatzas and Shreve 1991, chap. 3.5; Jacod and Shiryaev 2003, chap. II.3b) we can, without loss of generality, further suppose that $\mu_t = 0$ identically. (If the convergence in probability holds under the equivalent probability measure, then it also holds under the original one.)

Now for the derivation. First consider

$$
\big(X_{t_i} - X_{t_{i-}}\big)^2
= \Bigg( \sum_{t_j = t_{i-}}^{t_{i-1}} \Delta X_{t_j} \Bigg)^2
= \sum_{t_j = t_{i-}}^{t_{i-1}} (\Delta X_{t_j})^2
+ 2 \sum_{t_{i-1} \ge t_k > t_l \ge t_{i-}} \Delta X_{t_k}\, \Delta X_{t_l}.
$$

It follows that

$$
[X,X]^{(\mathrm{avg})}_T
= [X,X]^{(\mathrm{all})}_T
+ 2\sum_{i=1}^{n-1} (\Delta X_{t_i}) \sum_{j=1}^{K \wedge i} \Big(1 - \frac{j}{K}\Big) (\Delta X_{t_{i-j}})
+ O_p(K/n).
$$

The remainder term incorporates end effects; it has order $O_p(K/n)$ by (41) and (42), and due to the regular allocation of points to grids. Following Jacod and Protter (1998) and Mykland and Zhang (2002), $[X,X]^{(\mathrm{all})}_T - \langle X,X\rangle_T = O_p(1/\sqrt{n}) = o_p(\sqrt{K/n})$, too, under the assumption (17) [which is a special case of assumption (41)] and the assumption that $\inf_{t \in [0,T]} \sigma_t^2 > 0$ almost surely. In other words,

$$
D_T = [X,X]^{(\mathrm{avg})}_T - \langle X,X\rangle_T
= 2\sum_{i=1}^{n-1} (\Delta X_{t_i}) \sum_{j=1}^{K \wedge i} \Big(1 - \frac{j}{K}\Big) (\Delta X_{t_{i-j}})
+ o_p\big(\sqrt{K\overline{\Delta t}}\big). \tag{A.25}
$$
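The decomposition of $[X,X]^{(\mathrm{avg})}_T$ into $[X,X]^{(\mathrm{all})}_T$ plus the weighted cross term can be verified numerically; what remains is exactly the $O_p(K/n)$ end effect. A sketch on a simulated Brownian path (the grid conventions and the generous tolerance are assumptions of the example):

```python
import numpy as np

rng = np.random.default_rng(3)
n, K = 2000, 10
dX = rng.normal(scale=np.sqrt(1.0 / n), size=n)   # increments; <X,X>_T = 1
X = np.concatenate(([0.0], np.cumsum(dX)))

def rv(x):
    return np.sum(np.diff(x) ** 2)

rv_all = rv(X)
rv_avg = np.mean([rv(X[k::K]) for k in range(K)])

# weighted cross term of (A.25): 2 * sum_i dX_i * sum_{j<=K^i} (1 - j/K) dX_{i-j}
cross = 2 * sum(dX[i] * sum((1 - j / K) * dX[i - j]
                            for j in range(1, min(K, i) + 1))
                for i in range(1, n))

# the two sides differ only by boundary terms of order O_p(K/n)
assert abs(rv_avg - (rv_all + cross)) < 0.1
```

The $(1 - j/K)$ weight arises because an increment pair at lag $j < K$ is covered by $K - j$ of the $K$ subgrid blocks.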

It follows that

$$
\langle D,D\rangle_T
= 4\sum_{i=1}^{n-1} (\Delta\langle X,X\rangle_{t_i})
\Bigg( \sum_{j=1}^{K \wedge i} \Big(1 - \frac{j}{K}\Big)(\Delta X_{t_{i-j}}) \Bigg)^2
+ o_p(K\overline{\Delta t})
= (\mathrm{I}) + (\mathrm{II}) + o_p(K\overline{\Delta t}), \tag{A.26}
$$

where

$$
(\mathrm{I}) = 4\sum_{i=1}^{n-1} (\Delta\langle X,X\rangle_{t_i})
\Bigg( \sum_{j=1}^{K \wedge i} \Big(1 - \frac{j}{K}\Big)^2 (\Delta X_{t_{i-j}})^2 \Bigg)
= 4\sum_{i=1}^{n-1} \sigma_{t_i}^4\, \Delta t_i
\Bigg( \sum_{j=1}^{K \wedge i} \Big(1 - \frac{j}{K}\Big)^2 \Delta t_{i-j} \Bigg)
+ o_p(K\overline{\Delta t})
= K\overline{\Delta t} \sum_{i=1}^{n-1} \sigma_{t_i}^4 h_i\, \Delta t_i
+ o_p(K\overline{\Delta t}),
$$

where $h_i$ is given by (43). Thus Theorem 2 will have been shown if we can show that

$$(\mathrm{II}) = 8\sum_{i=1}^{n-1} \Delta\langle X,X\rangle_{t_i}\, \zeta_i \tag{A.27}$$

is of order $o_p(K\overline{\Delta t})$, where

$$
\zeta_i = \sum_{i-1 \ge l > r \ge 0} (\Delta X_{t_l})(\Delta X_{t_r})
\Big(1 - \frac{i-l}{K}\Big)^{\!+} \Big(1 - \frac{i-r}{K}\Big)^{\!+}. \tag{A.28}
$$

We do this as follows. Denote $\delta^+ = \max_i |\Delta t_i| = O(1/n)$, and set $(\mathrm{II})' = 8\sum_{i=1}^{n-1} \Delta t_i\, \sigma_{t_{i-K}}^2 \zeta_i$. Then, by Hölder's inequality,

$$
E|(\mathrm{II}) - (\mathrm{II})'|
\le \delta^+ n \sup_{|t-s| \le (K+1)\delta^+} \big\| \sigma_t^2 - \sigma_s^2 \big\|_2 \,\sup_i \|\zeta_i\|_2
= o(\delta^+ K),
$$

because of the boundedness and continuity of $\sigma_t$, and because, applying the Burkholder–Davis–Gundy inequality twice, we have that

$$
E(\zeta_i^2) \le E \sum_{l=1}^{i-1} \Delta\langle X,X\rangle_{t_l}
\Bigg( \sum_{r=0}^{(l-1)\wedge(i-1)} (\Delta X_{t_r})
\Big(1 - \frac{i-l}{K}\Big)^{\!+} \Big(1 - \frac{i-r}{K}\Big)^{\!+} \Bigg)^2
$$
$$
\le \delta^+ (\sigma^+)^2 \sum_{l=1}^{i-1} E
\Bigg( \sum_{r=0}^{(l-1)\wedge(i-1)} (\Delta X_{t_r})
\Big(1 - \frac{i-l}{K}\Big)^{\!+} \Big(1 - \frac{i-r}{K}\Big)^{\!+} \Bigg)^2
$$
$$
\le (\delta^+)^2 (\sigma^+)^4 \sum_{l=1}^{i-1} \sum_{r=0}^{(l-1)\wedge(i-1)}
\Bigg( \Big(1 - \frac{i-l}{K}\Big)^{\!+} \Big(1 - \frac{i-r}{K}\Big)^{\!+} \Bigg)^2
= O\big((\delta^+ K)^2\big), \quad \text{uniformly in } i, \tag{A.29}
$$

where the second-to-last transition does for the inner sum what the first two inequalities do for the outer sum. Now rewrite

$$
(\mathrm{II})' = 8 \sum_{n-1 \ge l > r \ge 0} (\Delta X_{t_l})(\Delta X_{t_r})
\sum_{i=l+1}^{n-1} \Delta t_i\, \sigma_{t_{i-K}}^2
\Big(1 - \frac{i-l}{K}\Big)^{\!+} \Big(1 - \frac{i-r}{K}\Big)^{\!+}. \tag{A.30}
$$

In the same way as in (A.29), we then obtain

$$
E\big((\mathrm{II})'\big)^2 \le 64 (\delta^+)^2 (\sigma^+)^4
\sum_{n-1 \ge l > r \ge 0} E
\Bigg( \sum_{i=l+1}^{n-1} \Delta t_i\, \sigma_{t_{i-K}}^2
\Big(1 - \frac{i-l}{K}\Big)^{\!+} \Big(1 - \frac{i-r}{K}\Big)^{\!+} \Bigg)^2
$$

Zhang, Mykland, and Aït-Sahalia: Determining Integrated Volatility 1411

$$
\le 64 (\delta^+)^2 (\sigma^+)^4 \times (\delta^+)^2 (\sigma^+)^4
\sum_{n-1 \ge l > r \ge 0}
\Bigg( \sum_{i=l+1}^{n-1} \Big(1 - \frac{i-l}{K}\Big)^{\!+} \Big(1 - \frac{i-r}{K}\Big)^{\!+} \Bigg)^2
\le 64 (\delta^+)^4 (\sigma^+)^8 n K^3
= O\big((\delta^+ K)^3\big) = o\big((\delta^+ K)^2\big), \tag{A.31}
$$

by assumptions (41) and (42). The transition from the second to the last line in (A.31) is due to the fact that $\sum_{i=l+1}^{n-1} \big(1 - \frac{i-l}{K}\big)^+ \big(1 - \frac{i-r}{K}\big)^+ \le K$, and $= 0$ when $l - r \ge K$. Thus we have shown that $(\mathrm{II}) = o_p(K\overline{\Delta t})$, and hence Theorem 2 has been established.
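Both facts about the weight sum used in the last transition of (A.31) are elementary and deterministic, so they can be checked exhaustively for small $n$ and $K$ (the values below are arbitrary):

```python
# Facts used after (A.31): with w_i = (1-(i-l)/K)^+ * (1-(i-r)/K)^+,
# sum_{i=l+1}^{n-1} w_i <= K, and the sum is exactly 0 when l - r >= K.
K, n = 7, 60

def weight_sum(l, r):
    return sum(max(0.0, 1 - (i - l) / K) * max(0.0, 1 - (i - r) / K)
               for i in range(l + 1, n))

for l in range(1, n):
    for r in range(l):
        s = weight_sum(l, r)
        assert s <= K
        if l - r >= K:
            assert s == 0.0
```

The vanishing case holds because the first factor needs $i \le l + K - 1$ while the second needs $i \le r + K - 1 \le l - 1$, which is incompatible with $i \ge l + 1$.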

We now proceed to the asymptotic distribution of $D_T$. We first state a technical condition on the filtration $(\mathcal{F}_t)_{0 \le t \le T}$, to which $X_t$ and $\mu_t$ (but not the $\varepsilon$'s) are assumed to be adapted.

Condition E (Description of the filtration). There is a continuous multidimensional $P$-local martingale $\mathcal{X} = (\mathcal{X}^{(1)}, \ldots, \mathcal{X}^{(p)})$, for some $p$, such that $\mathcal{F}_t$ is the smallest sigma-field containing $\sigma(\mathcal{X}_s,\, s \le t)$ and $\mathcal{N}$, where $\mathcal{N}$ contains all of the null sets in $\sigma(\mathcal{X}_s,\, s \le T)$.

For example, $\mathcal{X}$ can be a collection of Brownian motions.

Proof of Theorem 3. We can show, by methods similar to those in the proof of Theorem 2, that if $L$ is any martingale adapted to the filtration generated by $\mathcal{X}$, then

$$
\sup_t \bigg| \frac{1}{K\overline{\Delta t}}\, \langle D, L\rangle_t \bigg| \;\xrightarrow{\;p\;}\; 0. \tag{A.32}
$$

The stable convergence with respect to the filtration $(\mathcal{F}_t)_{0 \le t \le T}$ then follows in view of Rootzén (1980) or Jacod and Protter (1998). This ends the proof of Theorem 3.

Finally, in the case where $\eta_n^2$ does not converge, one can still use the mixed normal distribution with variance $\eta_n^2$. This is because every subsequence of $\eta_n^2$ has a further subsequence that does converge in probability to some $\eta^2$, and hence for which assumption (48) in Theorem 3 is satisfied. The reason for this is that one can define the distribution function of a finite measure by

$$G_n(t) = \sum_{t_{i+1} \le t} h_i\, \Delta t_i. \tag{A.33}$$

Because $G_n(t) \le T \sup_i h_i$, it follows from (45) that the sequence $G_n$ is compact in the sense of weak convergence; see Helly's theorem (e.g., Billingsley 1995, p. 336). For any convergent subsequence $G_n \to G$, we then get that

$$
\eta_n^2 = \int_0^T \sigma_t^4 \, dG_n(t) \;\to\; \int_0^T \sigma_t^4 \, dG(t) \tag{A.34}
$$

almost surely, because we have assumed that $\langle X,X\rangle'_t$ is a continuous function of $t$. One then defines $\eta^2$ to be the (subsequence-dependent) right-hand side of (A.34). To proceed further with the asymptotics, continue along the foregoing subsequence, and note that

$$
\frac{1}{K\overline{\Delta t}}\, \langle D,D\rangle_t \;\approx\; \int_0^t \sigma_s^4 \, dG_n(s) \;\to\; \int_0^t \sigma_s^4 \, dG(s),
$$

completing the proof.
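On a uniform grid the weight $h_i$ implicit in the display for (I) above (and given by (43) in the body of the paper) reduces, for interior $i \ge K$, to $h_i = \frac{4}{K}\sum_{j=1}^{K}(1 - j/K)^2 = \frac{4(K-1)(2K-1)}{6K^2} \to \frac{4}{3}$, and $G_n$ in (A.33) becomes a simple cumulative sum. A sketch (the spot-variance path $\sigma_t^2$ and the decision to ignore the $i < K$ boundary are assumptions of the example):

```python
import numpy as np

# Uniform grid on [0, T]; interior weight h_i implied by the display for (I).
T, n, K = 1.0, 2000, 20
dt = T / n
h_interior = (4.0 / (K * dt)) * dt * sum((1 - j / K) ** 2
                                         for j in range(1, K + 1))

# closed form, and convergence to the familiar 4/3 factor
assert np.isclose(h_interior, 4 * (K - 1) * (2 * K - 1) / (6 * K ** 2))
assert abs(h_interior - 4.0 / 3.0) < 4.0 / K

# G_n from (A.33), and eta_n^2 as the discrete integral of sigma^4 dG_n
t = np.linspace(0.0, T, n + 1)
sigma2 = 0.1 * (1 + 0.5 * np.sin(2 * np.pi * t))   # toy spot variance path
h = np.full(n, h_interior)                         # ignore i < K boundary terms
G = np.cumsum(h * dt)                              # G_n evaluated at t_{i+1}
eta2 = np.sum(sigma2[:n] ** 2 * h * dt)            # int_0^T sigma_t^4 dG_n(t)

assert G[-1] <= T * h.max() + 1e-12                # the bound G_n(t) <= T sup h_i
assert eta2 > 0
```

The bound $G_n(t) \le T\sup_i h_i$ is what makes Helly's theorem applicable; the quadrature here is the discrete counterpart of (A.34).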

[Received October 2003. Revised December 2004.]

REFERENCES

Aït-Sahalia, Y., Mykland, P. A., and Zhang, L. (2005), "How Often to Sample a Continuous-Time Process in the Presence of Market Microstructure Noise," Review of Financial Studies, 18, 351–416.

Aldous, D. J., and Eagleson, G. K. (1978), "On Mixing and Stability of Limit Theorems," The Annals of Probability, 6, 325–331.

Andersen, T. G., Bollerslev, T., Diebold, F. X., and Labys, P. (2001), "The Distribution of Exchange Rate Realized Volatility," Journal of the American Statistical Association, 96, 42–55.

Bandi, F. M., and Russell, J. R. (2003), "Microstructure Noise, Realized Volatility, and Optimal Sampling," technical report, University of Chicago, Graduate School of Business.

Barndorff-Nielsen, O. E., and Shephard, N. (2002), "Econometric Analysis of Realized Volatility and Its Use in Estimating Stochastic Volatility Models," Journal of the Royal Statistical Society, Ser. B, 64, 253–280.

Billingsley, P. (1995), Probability and Measure (3rd ed.), New York: Wiley.

Brown, S. J. (1990), "Estimating Volatility," in Financial Options: From Theory to Practice, eds. S. Figlewski, W. Silber, and M. Subrahmanyam, Homewood, IL: Business One-Irwin, pp. 516–537.

Chernov, M., and Ghysels, E. (2000), "A Study Towards a Unified Approach to the Joint Estimation of Objective and Risk Neutral Measures for the Purpose of Options Valuation," Journal of Financial Economics, 57, 407–458.

Dacorogna, M. M., Gençay, R., Müller, U., Olsen, R. B., and Pictet, O. V. (2001), An Introduction to High-Frequency Finance, San Diego: Academic Press.

Gallant, A. R., Hsu, C.-T., and Tauchen, G. T. (1999), "Using Daily Range Data to Calibrate Volatility Diffusions and Extract the Forward Integrated Variance," The Review of Economics and Statistics, 81, 617–631.

Gloter, A. (2000), "Estimation des Paramètres d'une Diffusion Cachée" [Parameter Estimation for a Hidden Diffusion], unpublished doctoral thesis, Université de Marne-la-Vallée.

Hall, P., and Heyde, C. C. (1980), Martingale Limit Theory and Its Application, Boston: Academic Press.

Hansen, P. R., and Lunde, A. (2004a), "Realized Variance and IID Market Microstructure Noise," technical report, Brown University, Dept. of Economics.

——— (2004b), "An Unbiased Measure of Realized Variance," technical report, Brown University, Dept. of Economics.

Heston, S. (1993), "A Closed-Form Solution for Options With Stochastic Volatility, With Applications to Bonds and Currency Options," Review of Financial Studies, 6, 327–343.

Hull, J., and White, A. (1987), "The Pricing of Options on Assets With Stochastic Volatilities," Journal of Finance, 42, 281–300.

Jacod, J., and Protter, P. (1998), "Asymptotic Error Distributions for the Euler Method for Stochastic Differential Equations," The Annals of Probability, 26, 267–307.

Jacod, J., and Shiryaev, A. N. (2003), Limit Theorems for Stochastic Processes (2nd ed.), New York: Springer-Verlag.

Karatzas, I., and Shreve, S. E. (1991), Brownian Motion and Stochastic Calculus, New York: Springer-Verlag.

Mykland, P. A., and Zhang, L. (2002), "ANOVA for Diffusions," technical report, University of Chicago, Dept. of Statistics.

O'Hara, M. (1995), Market Microstructure Theory, Cambridge, MA: Blackwell.

Oomen, R. (2004), "Properties of Realized Variance for a Pure Jump Process: Calendar Time Sampling versus Business Time Sampling," technical report, University of Warwick, Warwick Business School.

Rényi, A. (1963), "On Stable Sequences of Events," Sankhyā, Ser. A, 25, 293–302.

Rootzén, H. (1980), "Limit Distributions for the Error in Approximations of Stochastic Integrals," The Annals of Probability, 8, 241–251.

Zhou, B. (1996), "High-Frequency Data and Volatility in Foreign-Exchange Rates," Journal of Business & Economic Statistics, 14, 45–52.

Zumbach, G., Corsi, F., and Trapletti, A. (2002), "Efficient Estimation of Volatility Using High-Frequency Data," technical report, Olsen & Associates.

Page 16: A Tale of Two Time Scales: Determining Integrated …galton.uchicago.edu/~mykland/paperlinks/p1394.pdfA Tale of Two Time Scales: Determining Integrated Volatility With Noisy High-Frequency

Zhang Mykland and Aiumlt-Sahalia Determining Integrated Volatility 1409

We first find the asymptotic distribution of (M(1)T M(2)

T M(3)T ) using

the martingale central limit theorem Then we use the result to find thelimit of (A15)

Note that (M(1)T M(2)

T M(3)T ) are the end points of martingales

with respect to filtration Fi = σ(εtj j le iXt all t) We now de-rive its (discrete-time) predictable quadratic variation 〈M(l)M(k)〉l k = 123 [discrete-time predictable quadratic variations are onlyused in this proof and the proof of Lemma 1 and are different fromthe continuous time quadratic variations in (12)]

langM(1)M(1)

rangT = 1

n

sum

tiisinGvar(ε2

ti minus Eε2∣∣Ftiminus1

)

= var(ε2)

langM(2)M(2)

rangT = 1

n

sum

tiisinGvar(εtiεtiminus1

∣∣Ftiminus1

)

(A17)= Eε2

n

sum

tiisinGε2

timinus1= (Eε2)2 + op(1)

langM(3)M(3)

rangT = 1

n

Ksum

k=1

sum

tiisinG(k)

var(εtiεtiminus

∣∣Ftiminus1

)

= Eε2

n

Ksum

k=1

sum

tiisinG(k)

ε2timinus = (Eε2)2 + op(1)

by the law of large numbersSimilarly for the predictable quadratic covariations

langM(1)M(2)

rangT = 1

n

sum

tiisinGcov(ε2

ti minus Eε2 εtiεtiminus1

∣∣Ftiminus1

)

= Eε3 1

n

sum

tiisinGεtiminus1 = op(1)

langM(1)M(3)

rangT = 1

n

Ksum

k=1

sum

tiisinG(k)

cov(ε2

ti minus Eε2 εtiεtiminus∣∣Ftiminus1

)

(A18)

= Eε3 1

n

Ksum

k=1

sum

tiisinG(k)

εtiminus = op(1)

langM(2)M(3)

rangT = 1

n

Ksum

k=1

sum

tiisinG(k)

cov(εtiεtiminus1 εtiεtiminus

∣∣Ftiminus1

)

= Eε2

n

Ksum

k=1

sum

tiisinG(k)

εtiminus1εtiminus = op(1)

because ti+1 is not in the same grid as tiBecause the εti rsquos are iid and Eε4

ti lt infin the conditional Lin-deberg conditions are satisfied Hence by the martingale centrallimit theorem (see Hall and Heyde 1980 condition 31 p 58)(M(1)M(2)M(3)) are asymptotically normal with covariance ma-trix as the asymptotic value of 〈M(l)M(k)〉 In other words asymp-totically (M(1)M(2)M(3)) are independent normal with respectivevariances var(ε) (Eε2)2 and (Eε2)2

Returning to (A15) we have that

[ε ε](all) minus 2nEε2

= 2sum

i 13=0n

(ε2ti minus Eε2) + (ε2

t0 minus Eε2) + (ε2tn minus Eε2) minus 2

sum

tigt0

εtiεtiminus1

= 2radic

n(M(1) minus M(2)

)+ Op(1) (A19)

Meanwhile

[ε ε](k) minus 2nkEε2

=sum

tiisinG(k)ti 13=maxG(k)

(εti+ minus εti

)2 minus 2nkEε2

= 2sum

tiisinG(k)

(ε2

ti minus Eε2)minus (ε2minG(k) minus Eε2)minus (ε2

maxG(k) minus Eε2)

minus 2sum

tiisinG(k)

εtiεtiminus (A20)

where nk + 1 represents the total number of sampling points in G(k)Hence

[ε ε](avg)T K minus 2nEε2K = radic

n(2M(1) minus 2M(3)

)minus R

= 2radic

n(M(1) minus M(3)

)+ Op(K12) (A21)

because R =sumKk=1[(ε2

minG(k) minus Eε2) + (ε2maxG(k) minus Eε2)] satisfying

ER2 = var(R) le 4K var(ε2)

Because nminus1K rarr 0 and because the error terms in (A19) and(A20) are uniformly integrable it follows that

(A15) = 2(M(1) minus M(2)M(1) minus M(3)

)+ op(1) (A22)

Hence (A15) is also asymptotically normal with covariance matrix(

4Eε4 4 var(ε2)

4 var(ε2) 4Eε4

)

By Lemma A2 and as nminus1K rarr 0

1radicn

([YY](all)T minus 2nEε2K

([YY](avg)T minus [XX](avg)

T minus 2nEε2))∣∣∣X

is asymptotically normal

1radicn

( [YY](all)T minus 2nEε2

[YY](avg)T K minus 2nKEε2 minus [XX](avg)

T K

∣∣∣∣X

)

= 2

(M(1) minus M(2)

M(1) minus M(3)

)

+ op(1)

Lminusrarr 2N

(

0

(Eε4 var(ε2)

var(ε2) Eε4

))

(A23)

Because

Eε2 = 1

2n[YY](all)

T and(A24)

Kradicn

=radic

K

n(1 + o(1))

Theorem A1 follows

Proof of Lemma 1 For simplicity and without loss of generalitywe assume that H = G The result follows the proofs of Lemma A2

and Theorem A1 but with a more exact representation of [YY](all)T minus

[XX](all)T Specifically from (A9) and (A19) we have that

[YY](all)T minus [XX](all)

T minus 2nEε2

= [ε ε]T minus 2nEε2 + 2[X ε](all)T

= 2sum

i 13=0n

(ε2

ti minus Eε2)+ (ε2t0 minus Eε2)+ (ε2

tn minus Eε2)minus 2sum

tigt0

εtiεtiminus1

+ 2nminus1sum

i=1

(Xtiminus1 minus Xti

)εti + 2Xtnminus1εtn minus 2Xt0εt0

1410 Journal of the American Statistical Association December 2005

With the same filtration as in the proof of Theorem A1 this is thesum of a martingale triangular array with increment 2(ε2

ti minus Eε2) minus2εtiεtiminus1 + (Xtiminus1 minusXti)εti for i 13= 0 n Again the conditions of themartingale central limit theorem (see Hall and Heyde 1980 chap 3)are satisfied by (17) whence the term is asymptotically normal con-ditionally on the X process The conditional variance is given by (20)under the conditions mentioned in the statement of the lemma Be-cause Eε2 can vary with n the Op(nminus12

sparse) term in (20) is actually of

order Op(nminus12sparseEε2)

A3 Asymptotics of DT

For transparency of notation we take t = Tn in other words theaverage of the ti

Proof of Theorem 2 Note first that because we assume thatmicrot and σt are continuous they are locally bounded Because here weseek to show convergence in probability we can therefore assumewithout loss of generality that |microt| is bounded above by a nonrandomconstant and also that infin gt σ+ ge σt ge σminus gt 0 where σ+ and σminus arenonrandom This is by the usual stopping time argument

Also note that once microt and σt are bounded as we have now assumedby Girsanovrsquos theorem (see eg Karatzas and Shreve 1991 chap 35Jacod and Shiryaev 2003 chap II3b) we can without loss of gen-erality further suppose that microt = 0 identically (If the convergence inprobability holds under the equivalent probability then it also will holdunder the original one)

Now for the derivation First consider

(Xti minus Xtiminus

)2 =( timinus1sum

tj=timinusXtj

)2

=timinus1sum

tj=timinus

(Xtj

)2 + 2sum

timinus1getkgttlgetiminusXtkXtl

It follows that

[XX](avg)T

= [XX](all)T + 2

nminus1sum

i=1

(Xti

)Kandisum

j=1

(

1 minus j

K

)(Xtiminusj

)+ Op(Kn)

The remainder term incorporates end effects and has order Op(Kn)

by (41) and (42) and due to the regular allocation of points to gridsFollowing Jacod and Protter (1998) and Mykland and Zhang (2002)

[XX](all)T minus 〈XX〉T = Op(1

radicn ) = op(

radicKn ) too under the as-

sumption (17) [which is a special case of assumption (41)] and thatinftisin[0T] σ 2

t gt 0 almost surely In other words

DT = ([XX](avg)T minus 〈XX〉T

)

= 2nminus1sum

i=1

(Xti

)Kandisum

j=1

(

1 minus j

K

)(Xtiminusj

)+ op(radic

Kt) (A25)

It follows that

〈DD〉T = 4nminus1sum

j=1

(〈XX〉ti

)(Kandisum

j=1

(

1 minus j

K

)(Xtiminusj

))2

+ op(Kt)

= (I) + (II) + op(Kt) (A26)

where

(I) = 4nminus1sum

i=1

(〈XX〉ti

)(Kandisum

j=1

(

1 minus j

K

)2(Xtiminusj

)2)

= 4nminus1sum

i=1

σ 4ti ti

(Kandisum

j=1

(

1 minus j

K

)2timinusj

)

+ op(Kt)

= Ktnminus1sum

i=1

σ 4ti hiti + op(Kt)

where hi is given by (43) Thus Theorem 2 will have been shown if wecan show that

(II) = 8nminus1sum

i=1

〈XX〉tiζi (A27)

is of order op(Kt) where

ζi =iminus1sum

lgtrge0

(Xtl

)(Xtr

)(

1 minus i minus l

K

)+(1 minus i minus r

K

)+ (A28)

We do this as follows Denote δ+ = maxi |ti| = O( 1n ) and set

(II)prime = 8sumnminus1

i=1 tiσ 2timinusK

ζi Then by Houmllderrsquos inequality E|(II) minus(II)prime| le δ+n sup|tminuss|le(K+1)δ+ |σ 2

t minus σ 2s |2 supi ζi2 = o(δ+K) be-

cause of the boundedness and continuity of σt and because applyingthe BurkholderndashDavisndashGundy inequality twice we have that

E(ζ 2i ) le E

iminus1sum

l=1

〈XX〉tl

times(

(lminus1)and(iminus1)sum

rge0

(Xtr

)(

1 minus i minus l

K

)+(1 minus i minus r

K

)+)2

le δ+(σ+)2

timesiminus1sum

l=1

E

((lminus1)and(iminus1)sum

rge0

(Xtr

)(

1 minus i minus l

K

)+(1 minus i minus r

K

)+)2

le (δ+)2(σ+)4

timesiminus1sum

l=1

(lminus1)and(iminus1)sum

rge0

((

1 minus i minus l

K

)+(1 minus i minus r

K

)+)2

= O((δ+K)2) uniformly in i (A29)

where the second-to-last transition does for the inner sum what the twofirst inequalitities do for the outer sum Now rewrite

(II)prime = 8nminus1sum

lgtrge0

(Xtl

)(Xtr

)

timesnminus1sum

i=l+1

tiσ2timinusK

(

1 minus i minus l

K

)+(1 minus i minus r

K

)+ (A30)

In the same way as in (A29) we then obtain

E((II)prime)2 le 64(δ+)2(σ+)4

timesnminus1sum

lgtrge0

E

( nminus1sum

i=l+1

tiσ2timinusK

times(

1 minus i minus l

K

)+(1 minus i minus r

K

)+)2

Zhang Mykland and Aiumlt-Sahalia Determining Integrated Volatility 1411

le 64(δ+)2(σ+)4 times (δ+)2(σ+)4

timesnminus1sum

lgtrge0

( nminus1sum

i=l+1

(

1 minus i minus l

K

)+(1 minus i minus r

K

)+)2

le 64(δ+)4(σ+)8nK3 = O((δ+K)3) = o((δ+K)2) (A31)

by assumptions (41) and (42) The transition from the second to the lastline in (A31) is due to the fact that

sumnminus1i=l+1(1minus iminusl

K )+(1minus iminusrK )+ le K

and = 0 when lminusr ge K Thus we have shown that (II) = op(Kt) andhence Theorem 2 has been established

We now proceed to the asymptotic distribution of DT We first statea technical condition on the filtration (Ft)0letleT to which Xt and microt(but not the εprimersquos) are assumed to be adapted

Condition E (Description of the filtration) There is a continuousmultidimensional P-local martingale X = (X (1) X (p)) any pso that Ft is the smallest sigma-field containing σ(Xs s le t) and N where N contains all of the null sets in σ(Xs s le T)

For example X can be a collection of Brownian motions

Proof of Theorem 3 We can show by methods similar to those inthe proof of Theorem 2 that if L is any martingale adapted to the filtra-tion generated by X then

supt

∣∣∣∣

1

tK〈DL〉t

∣∣∣∣

prarr 0 (A32)

The stable convergence with respect to the filtration (Ft)0letleT thenfollows in view of Rootzen (1980) or Jacod and Protter (1998) Thisends the proof of Theorem 3

Finally in the case where η2n does not converge one can still use the

mixed normal with variance η2n This is because every subsequence

of η2n has a further subsequence that does converge in probability to

some η2 and hence for which the assumption (48) in Theorem 3 wouldbe satisfied The reason for this is that one can define the distributionfunction of a finite measure by

Gn(t) =sum

ti+1let

hiti (A33)

Because Gn(t) le T supi hi it follows from (45) that the sequence Gnis weakly compact in the sense of weak convergence see Hellyrsquos the-orem (eg Billingsley 1995 p 336) For any convergent subsequenceGn rarr G we then get that

η2n =

int T

0σ 4

t dGn(t) rarrint T

0σ 4

t dG(t) (A34)

almost surely because we have assumed that 〈XX〉primet is a continuousfunction of t One then defines η2 to be the (subsequence-dependent)right side of (A34) To proceed further with the asymptotics continuethe foregoing subsequence and note that

1

tK〈DD〉t asymp

int t

0σ 4

s dGn(s)

rarrint t

0σ 4

s dG(s)

completing the proof

[Received October 2003 Revised December 2004]

REFERENCES

Aiumlt-Sahalia Y Mykland P A and Zhang L (2005) ldquoHow Often to Samplea Continuous-Time Process in the Presence of Market Microstructure NoiserdquoReview of Financial Studies 18 351ndash416

Aldous D J and Eagleson G K (1978) ldquoOn Mixing and Stability of LimitTheoremsrdquo The Annals of Probability 6 325ndash331

Andersen T G Bollerslev T Diebold F X and Labys P (2001) ldquoThe Dis-tribution of Exchange Rate Realized Volatilityrdquo Journal of the American Sta-tistical Association 96 42ndash55

Bandi F M and Russell J R (2003) ldquoMicrostructure Noise Realized Volatil-ity and Optimal Samplingrdquo technical report University of Chicago GraduateSchool of Business

Barndorff-Nielsen O E and Shephard N (2002) ldquoEconometric Analysis ofRealized Volatility and Its Use in Estimating Stochastic Volatility ModelsrdquoJournal of the Royal Statistical Society Ser B 64 253ndash280

Billingsley P (1995) Probability and Measure (3rd ed) New York WileyBrown S J (1990) ldquoEstimating Volatilityrdquo in Financial Options From Theory

to Practice eds S Figlewski W Silber and M Subrahmanyam HomewoodIL Business One-Irwin pp 516ndash537

Chernov M and Ghysels E (2000) ldquoA Study Towards a Unified Approach tothe Joint Estimation of Objective and Risk Neutral Measures for the Purposeof Options Valuationrdquo Journal of Financial Economics 57 407ndash458

Dacorogna M M Genccedilay R Muumlller U Olsen R B and Pictet O V(2001) An Introduction to High-Frequency Finance San Diego AcademicPress

Gallant A R Hsu C-T and Tauchen G T (1999) ldquoUsing Daily RangeData to Calibrate Volatility Diffusions and Extract the Forward IntegratedVariancerdquo The Review of Economics and Statistics 81 617ndash631

Gloter A (2000) ldquoEstimation des Paramegravetres drsquoune Diffusion Cacheacuteerdquo unpub-lished doctoral thesis Universiteacute de Marne-la-Valleacutee

Hall P and Heyde C C (1980) Martingale Limit Theory and Its ApplicationBoston Academic Press

Hansen P R and Lunde A (2004a) ldquoRealized Variance and IID Market Mi-crostructure Noiserdquo technical report Brown University Dept of Economics

(2004b) ldquoAn Unbiased Measure of Realized Variancerdquo technical re-port Brown University Dept of Economics

Heston S (1993) ldquoA Closed-Form Solution for Options With StochasticVolatility With Applications to Bonds and Currency Optionsrdquo Review of Fi-nancial Studies 6 327ndash343

Hull J and White A (1987) ldquoThe Pricing of Options on Assets With Sto-chastic Volatilitiesrdquo Journal of Finance 42 281ndash300

Jacod J and Protter P (1998) ldquoAsymptotic Error Distributions for the EulerMethod for Stochastic Differential Equationsrdquo The Annals of Probability 26267ndash307

Jacod J and Shiryaev A N (2003) Limit Theorems for Stochastic Processes(2nd ed) New York Springer-Verlag

Karatzas I and Shreve S E (1991) Brownian Motion and Stochastic Calcu-lus New York Springer-Verlag

Mykland P A and Zhang L (2002) ldquoANOVA for Diffusionsrdquo technical re-port University of Chicago Dept of Statistics

OrsquoHara M (1995) Market Microstructure Theorys Cambridge MA Black-well

Oomen R (2004) ldquoProperties of Realized Variance for a Pure Jump ProcessCalendar Time Sampling versus Business Time Samplingrdquo technical reportUniversity of Warwick Warwick Business School

Reacutenyi A (1963) ldquoOn Stable Sequences of Eventsrdquo Sankhya Ser A 25293ndash302

Rootzen H (1980) ldquoLimit Distributions for the Error in Approximations ofStochastic Integralsrdquo The Annals of Probability 8 241ndash251

Zhou B (1996) ldquoHigh-Frequency Data and Volatility in Foreign-ExchangeRatesrdquo Journal of Business amp Economic Statistics 14 45ndash52

Zumbach G Corsi F and Trapletti A (2002) ldquoEfficient Estimation ofVolatility Using High-Frequency Datardquo technical report Olsen amp Associates

Page 17: A Tale of Two Time Scales: Determining Integrated …galton.uchicago.edu/~mykland/paperlinks/p1394.pdfA Tale of Two Time Scales: Determining Integrated Volatility With Noisy High-Frequency

1410 Journal of the American Statistical Association December 2005

With the same filtration as in the proof of Theorem A1 this is thesum of a martingale triangular array with increment 2(ε2

ti minus Eε2) minus2εtiεtiminus1 + (Xtiminus1 minusXti)εti for i 13= 0 n Again the conditions of themartingale central limit theorem (see Hall and Heyde 1980 chap 3)are satisfied by (17) whence the term is asymptotically normal con-ditionally on the X process The conditional variance is given by (20)under the conditions mentioned in the statement of the lemma Be-cause Eε2 can vary with n the Op(nminus12

sparse) term in (20) is actually of

order Op(nminus12sparseEε2)

A3 Asymptotics of DT

For transparency of notation we take t = Tn in other words theaverage of the ti

Proof of Theorem 2 Note first that because we assume thatmicrot and σt are continuous they are locally bounded Because here weseek to show convergence in probability we can therefore assumewithout loss of generality that |microt| is bounded above by a nonrandomconstant and also that infin gt σ+ ge σt ge σminus gt 0 where σ+ and σminus arenonrandom This is by the usual stopping time argument

Also note that once microt and σt are bounded as we have now assumedby Girsanovrsquos theorem (see eg Karatzas and Shreve 1991 chap 35Jacod and Shiryaev 2003 chap II3b) we can without loss of gen-erality further suppose that microt = 0 identically (If the convergence inprobability holds under the equivalent probability then it also will holdunder the original one)

Now for the derivation First consider

(Xti minus Xtiminus

)2 =( timinus1sum

tj=timinusXtj

)2

=timinus1sum

tj=timinus

(Xtj

)2 + 2sum

timinus1getkgttlgetiminusXtkXtl

It follows that

[XX](avg)T

= [XX](all)T + 2

nminus1sum

i=1

(Xti

)Kandisum

j=1

(

1 minus j

K

)(Xtiminusj

)+ Op(Kn)

The remainder term incorporates end effects and has order Op(Kn)

by (41) and (42) and due to the regular allocation of points to gridsFollowing Jacod and Protter (1998) and Mykland and Zhang (2002)

[XX](all)T minus 〈XX〉T = Op(1

radicn ) = op(

radicKn ) too under the as-

sumption (17) [which is a special case of assumption (41)] and thatinftisin[0T] σ 2

t gt 0 almost surely In other words

DT = ([XX](avg)T minus 〈XX〉T

)

= 2nminus1sum

i=1

(Xti

)Kandisum

j=1

(

1 minus j

K

)(Xtiminusj

)+ op(radic

Kt) (A25)

It follows that

〈DD〉T = 4nminus1sum

j=1

(〈XX〉ti

)(Kandisum

j=1

(

1 minus j

K

)(Xtiminusj

))2

+ op(Kt)

= (I) + (II) + op(Kt) (A26)

where

(I) = 4nminus1sum

i=1

(〈XX〉ti

)(Kandisum

j=1

(

1 minus j

K

)2(Xtiminusj

)2)

= 4nminus1sum

i=1

σ 4ti ti

(Kandisum

j=1

(

1 minus j

K

)2timinusj

)

+ op(Kt)

= Ktnminus1sum

i=1

σ 4ti hiti + op(Kt)

where hi is given by (43) Thus Theorem 2 will have been shown if wecan show that

(II) = 8nminus1sum

i=1

〈XX〉tiζi (A27)

is of order op(Kt) where

ζi =iminus1sum

lgtrge0

(Xtl

)(Xtr

)(

1 minus i minus l

K

)+(1 minus i minus r

K

)+ (A28)

We do this as follows Denote δ+ = maxi |ti| = O( 1n ) and set

(II)prime = 8sumnminus1

i=1 tiσ 2timinusK

ζi Then by Houmllderrsquos inequality E|(II) minus(II)prime| le δ+n sup|tminuss|le(K+1)δ+ |σ 2

t minus σ 2s |2 supi ζi2 = o(δ+K) be-

cause of the boundedness and continuity of σt and because applyingthe BurkholderndashDavisndashGundy inequality twice we have that

E(ζ 2i ) le E

iminus1sum

l=1

〈XX〉tl

times(

(lminus1)and(iminus1)sum

rge0

(Xtr

)(

1 minus i minus l

K

)+(1 minus i minus r

K

)+)2

le δ+(σ+)2

timesiminus1sum

l=1

E

((lminus1)and(iminus1)sum

rge0

(Xtr

)(

1 minus i minus l

K

)+(1 minus i minus r

K

)+)2

le (δ+)2(σ+)4

timesiminus1sum

l=1

(lminus1)and(iminus1)sum

rge0

((

1 minus i minus l

K

)+(1 minus i minus r

K

)+)2

= O((δ+K)2) uniformly in i (A29)

where the second-to-last transition does for the inner sum what the twofirst inequalitities do for the outer sum Now rewrite

(II)prime = 8nminus1sum

lgtrge0

(Xtl

)(Xtr

)

timesnminus1sum

i=l+1

tiσ2timinusK

(

1 minus i minus l

K

)+(1 minus i minus r

K

)+ (A30)

In the same way as in (A29) we then obtain

E((II)prime)2 le 64(δ+)2(σ+)4

timesnminus1sum

lgtrge0

E

( nminus1sum

i=l+1

tiσ2timinusK

times(

1 minus i minus l

K

)+(1 minus i minus r

K

)+)2

Zhang Mykland and Aiumlt-Sahalia Determining Integrated Volatility 1411

le 64(δ+)2(σ+)4 times (δ+)2(σ+)4

timesnminus1sum

lgtrge0

( nminus1sum

i=l+1

(

1 minus i minus l

K

)+(1 minus i minus r

K

)+)2

le 64(δ+)4(σ+)8nK3 = O((δ+K)3) = o((δ+K)2) (A31)

by assumptions (41) and (42) The transition from the second to the lastline in (A31) is due to the fact that

sumnminus1i=l+1(1minus iminusl

K )+(1minus iminusrK )+ le K

and = 0 when lminusr ge K Thus we have shown that (II) = op(Kt) andhence Theorem 2 has been established

We now proceed to the asymptotic distribution of DT We first statea technical condition on the filtration (Ft)0letleT to which Xt and microt(but not the εprimersquos) are assumed to be adapted

Condition E (Description of the filtration) There is a continuousmultidimensional P-local martingale X = (X (1) X (p)) any pso that Ft is the smallest sigma-field containing σ(Xs s le t) and N where N contains all of the null sets in σ(Xs s le T)

For example X can be a collection of Brownian motions

Proof of Theorem 3 We can show by methods similar to those inthe proof of Theorem 2 that if L is any martingale adapted to the filtra-tion generated by X then

supt

∣∣∣∣

1

tK〈DL〉t

∣∣∣∣

prarr 0 (A32)

The stable convergence with respect to the filtration (Ft)0letleT thenfollows in view of Rootzen (1980) or Jacod and Protter (1998) Thisends the proof of Theorem 3

Finally in the case where η2n does not converge one can still use the

mixed normal with variance η2n This is because every subsequence

of η2n has a further subsequence that does converge in probability to

some η2 and hence for which the assumption (48) in Theorem 3 wouldbe satisfied The reason for this is that one can define the distributionfunction of a finite measure by

Gn(t) =sum

ti+1let

hiti (A33)

Because Gn(t) le T supi hi it follows from (45) that the sequence Gnis weakly compact in the sense of weak convergence see Hellyrsquos the-orem (eg Billingsley 1995 p 336) For any convergent subsequenceGn rarr G we then get that

\[
\eta_n^2 = \int_0^T \sigma_t^4\,dG_n(t) \to \int_0^T \sigma_t^4\,dG(t) \tag{A.34}
\]

almost surely, because we have assumed that $\langle X, X\rangle'_t$ is a continuous function of $t$. One then defines $\eta^2$ to be the (subsequence-dependent) right side of (A.34). To proceed further with the asymptotics, continue with the foregoing subsequence and note that

\[
\frac{1}{tK}\,\langle D, D\rangle_t \approx \int_0^t \sigma_s^4\,dG_n(s) \to \int_0^t \sigma_s^4\,dG(s),
\]
completing the proof.
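The step from (A.33) to (A.34) is a Riemann-Stieltjes integral against the step function $G_n$, which reduces to a weighted sum. A minimal numerical sketch, assuming a hypothetical volatility path $\sigma_t$ and placeholder weights $h_i$ (neither is specified at this point in the text), illustrates the computation of $\eta_n^2$:

```python
import numpy as np

# Sketch of the Riemann-Stieltjes sum behind (A.33)-(A.34):
#   G_n(t) = sum_{t_{i+1} <= t} h_i * dt_i,   eta_n^2 = int_0^T sigma_t^4 dG_n(t).
# The volatility path and the weights h_i below are hypothetical placeholders,
# chosen only to illustrate the computation; they are not from the paper.

def eta_sq(sigma4, h, t):
    """eta_n^2 as the sum over i of sigma^4(t_{i+1}) * h_i * (t_{i+1} - t_i)."""
    dt = np.diff(t)                      # the increments dt_i
    return float(np.sum(sigma4(t[1:]) * h * dt))

T, n = 1.0, 1000
t = np.linspace(0.0, T, n + 1)           # sampling times t_0, ..., t_n
h = np.full(n, 1.0)                      # placeholder weights h_i
sigma4 = lambda s: (0.3 + 0.1 * np.sin(2 * np.pi * s)) ** 4  # placeholder sigma_t^4

print(eta_sq(sigma4, h, t))
```

Because $G_n$ places mass $h_i\,\Delta t_i$ at $t_{i+1}$, integrating $\sigma_t^4$ against it is exactly this weighted sum; the weak compactness of the $G_n$ then supplies the convergent subsequence used in (A.34).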

[Received October 2003. Revised December 2004.]

REFERENCES

Aït-Sahalia, Y., Mykland, P. A., and Zhang, L. (2005), "How Often to Sample a Continuous-Time Process in the Presence of Market Microstructure Noise," Review of Financial Studies, 18, 351–416.

Aldous, D. J., and Eagleson, G. K. (1978), "On Mixing and Stability of Limit Theorems," The Annals of Probability, 6, 325–331.

Andersen, T. G., Bollerslev, T., Diebold, F. X., and Labys, P. (2001), "The Distribution of Exchange Rate Realized Volatility," Journal of the American Statistical Association, 96, 42–55.

Bandi, F. M., and Russell, J. R. (2003), "Microstructure Noise, Realized Volatility, and Optimal Sampling," technical report, University of Chicago, Graduate School of Business.

Barndorff-Nielsen, O. E., and Shephard, N. (2002), "Econometric Analysis of Realized Volatility and Its Use in Estimating Stochastic Volatility Models," Journal of the Royal Statistical Society, Ser. B, 64, 253–280.

Billingsley, P. (1995), Probability and Measure (3rd ed.), New York: Wiley.

Brown, S. J. (1990), "Estimating Volatility," in Financial Options: From Theory to Practice, eds. S. Figlewski, W. Silber, and M. Subrahmanyam, Homewood, IL: Business One-Irwin, pp. 516–537.

Chernov, M., and Ghysels, E. (2000), "A Study Towards a Unified Approach to the Joint Estimation of Objective and Risk Neutral Measures for the Purpose of Options Valuation," Journal of Financial Economics, 57, 407–458.

Dacorogna, M. M., Gençay, R., Müller, U., Olsen, R. B., and Pictet, O. V. (2001), An Introduction to High-Frequency Finance, San Diego: Academic Press.

Gallant, A. R., Hsu, C.-T., and Tauchen, G. T. (1999), "Using Daily Range Data to Calibrate Volatility Diffusions and Extract the Forward Integrated Variance," The Review of Economics and Statistics, 81, 617–631.

Gloter, A. (2000), "Estimation des Paramètres d'une Diffusion Cachée," unpublished doctoral thesis, Université de Marne-la-Vallée.

Hall, P., and Heyde, C. C. (1980), Martingale Limit Theory and Its Application, Boston: Academic Press.

Hansen, P. R., and Lunde, A. (2004a), "Realized Variance and IID Market Microstructure Noise," technical report, Brown University, Dept. of Economics.

Hansen, P. R., and Lunde, A. (2004b), "An Unbiased Measure of Realized Variance," technical report, Brown University, Dept. of Economics.

Heston, S. (1993), "A Closed-Form Solution for Options With Stochastic Volatility, With Applications to Bonds and Currency Options," Review of Financial Studies, 6, 327–343.

Hull, J., and White, A. (1987), "The Pricing of Options on Assets With Stochastic Volatilities," Journal of Finance, 42, 281–300.

Jacod, J., and Protter, P. (1998), "Asymptotic Error Distributions for the Euler Method for Stochastic Differential Equations," The Annals of Probability, 26, 267–307.

Jacod, J., and Shiryaev, A. N. (2003), Limit Theorems for Stochastic Processes (2nd ed.), New York: Springer-Verlag.

Karatzas, I., and Shreve, S. E. (1991), Brownian Motion and Stochastic Calculus, New York: Springer-Verlag.

Mykland, P. A., and Zhang, L. (2002), "ANOVA for Diffusions," technical report, University of Chicago, Dept. of Statistics.

O'Hara, M. (1995), Market Microstructure Theory, Cambridge, MA: Blackwell.

Oomen, R. (2004), "Properties of Realized Variance for a Pure Jump Process: Calendar Time Sampling versus Business Time Sampling," technical report, University of Warwick, Warwick Business School.

Rényi, A. (1963), "On Stable Sequences of Events," Sankhyā, Ser. A, 25, 293–302.

Rootzén, H. (1980), "Limit Distributions for the Error in Approximations of Stochastic Integrals," The Annals of Probability, 8, 241–251.

Zhou, B. (1996), "High-Frequency Data and Volatility in Foreign-Exchange Rates," Journal of Business & Economic Statistics, 14, 45–52.

Zumbach, G., Corsi, F., and Trapletti, A. (2002), "Efficient Estimation of Volatility Using High-Frequency Data," technical report, Olsen & Associates.
