Chapter 12 Solutions


ISMA CENTRE, UNIVERSITY OF READING

Solutions to the Review Questions at the End of Chapter 12

1. (a) The scope of possible answers to this part of the question is limited only by the imagination! Simulation studies are useful in any situation where the conditions used need to be fully under the control of the researcher (so that an application to real data will not do) and where an analytical solution to the problem is also unavailable. In econometrics, simulations are particularly useful for examining the impact of model mis-specification on the properties of estimators and forecasts. For example, what is the impact of ignored structural breaks in a series upon GARCH model estimation and forecasting? What is the impact of several very large outliers occurring one after another on tests for ARCH? In finance, an obvious application of simulations, as well as those discussed in the chapter, is producing scenarios for stress-testing risk measurement models. For example, what would be the impact on bank portfolio volatility if the correlations between European stock indices rose to one? What would be the impact on the price discovery process or on market volatility if the number and size of index funds increased substantially?

(b) Pure simulation involves the construction of an entirely new dataset made from artificially constructed data, while bootstrapping involves resampling with replacement from a set of actual data.

Which technique of the two is the more appropriate would obviously depend on the situation at hand. Pure simulation is more useful when it is necessary to work in a completely controlled environment. For example, when examining the effect of a particular mis-specification on the behaviour of hypothesis tests, it would be inadvisable to use bootstrapping, because of course the bootstrapped samples could contain other forms of mis-specification. Consider an examination of the effect of autocorrelation on the power of the regression F-test. Use of bootstrapped data may be inappropriate because it violates one or more other assumptions: for example, the data may be heteroscedastic or non-normal as well. If the bootstrap were used in this case, the result would be a test of the effect of several mis-specifications on the F-test!

Bootstrapping is useful, however, when it is desirable to mimic some of the distributional properties of actual data series, even if we are not sure quite what they are. For example, when simulating future possible paths for price series as inputs to risk management models or option pricing exercises, bootstrapping is useful. In such instances, pure simulation would be less appropriate since it would bring with it a particular set of assumptions in order to simulate the data (e.g. that returns are normally distributed). To the extent that these assumptions are not supported by the real data, the simulated option price or risk assessment could be inaccurate.
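To make the contrast concrete, the following is a minimal Python sketch, not taken from the text: the returns series is a stand-in generated from a t-distribution purely so that the code runs, and all sizes and parameter values are illustrative.

import numpy as np

rng = np.random.default_rng(42)
returns = rng.standard_t(df=5, size=1000) * 0.01       # stand-in for a vector of actual returns

n_paths, horizon = 1000, 250

# Pure simulation: impose a distributional assumption (here, normality)
mu, sigma = returns.mean(), returns.std()
simulated = rng.normal(mu, sigma, size=(n_paths, horizon))

# Bootstrapping: resample with replacement from the actual returns, so that
# fat tails and other empirical features of the data are preserved
bootstrapped = rng.choice(returns, size=(n_paths, horizon), replace=True)

# Compare, say, the 1% quantile of cumulative returns under the two approaches
print(np.percentile(simulated.sum(axis=1), 1))
print(np.percentile(bootstrapped.sum(axis=1), 1))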

(c) Variance reduction techniques aim to reduce Monte Carlo sampling error. In other words, they seek to reduce the variability in the estimates of the quantity of interest across different experiments, rather like reducing the standard errors in a regression model. This either makes Monte Carlo simulation more accurate for a given number of replications, making the answers more robust, or it enables the same level of accuracy to be achieved using a considerably smaller number of replications. The two techniques that were discussed in the chapter were antithetic variates and control variates. Mathematical details were given in the chapter and will therefore not be repeated here.

Antithetic variates try to ensure that more of the probability space is covered by taking the opposite (usually the negative) of the selected random draws, and using those as another set of draws to compute the required statistics. Control variates use the known analytical solutions to a similar problem to improve accuracy. Obviously, the success of this latter technique will depend on how close the analytical problem is to the actual one under study. If the two are almost unrelated, the reduction in Monte Carlo sampling variation will be negligible or even negative (i.e. the variance will be higher than if control variates were not used).
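As a purely illustrative sketch of antithetic variates (not an example from the chapter), consider estimating E[exp(Z)] for Z ~ N(0,1), whose true value is exp(0.5) ≈ 1.6487. Reusing each draw together with its negation covers the probability space more evenly and, because exp is monotonic, reduces the Monte Carlo standard error for the same number of function evaluations.

import numpy as np

rng = np.random.default_rng(0)
n = 100_000
z = rng.standard_normal(n)

# Plain Monte Carlo: 2n independent draws
plain = np.exp(np.concatenate([z, rng.standard_normal(n)]))
plain_se = plain.std() / np.sqrt(2 * n)

# Antithetic variates: reuse z and -z and average each pair (same budget of 2n evaluations)
pairs = 0.5 * (np.exp(z) + np.exp(-z))
anti_se = pairs.std() / np.sqrt(n)

print(plain.mean(), plain_se)    # estimate of exp(0.5) and its Monte Carlo standard error
print(pairs.mean(), anti_se)     # same target, noticeably smaller standard error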

(d) Almost all statistical analysis is based on central limit theorems and laws of large numbers. These are used to determine analytically how an estimator will behave as the sample size tends to infinity, although the behaviour could be quite different for small samples. If too small a sample of actual data is used, there is a high probability that the sample will not be representative of the population as a whole. As the sample size is increased, the probability of obtaining a sample that is unrepresentative of the population is reduced. Exactly the same logic can be applied to the number of replications employed in a Monte Carlo study. If too small a number of replications is used, it is possible that odd combinations of random number draws will lead to results that do not accurately reflect the data generating process. This is increasingly unlikely to happen as the number of replications is increased. Put another way, the whole probability space will gradually be appropriately covered as the number of replications is increased.
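A quick way to see this is to run the same small Monte Carlo experiment many times and watch how the spread of the answers shrinks as the number of replications grows. The sketch below is my own illustration (estimating E[Z²] = 1 for Z ~ N(0,1)); the spread falls roughly in proportion to one over the square root of the number of replications.

import numpy as np

rng = np.random.default_rng(1)
for n_reps in (100, 1_000, 10_000):
    # 500 independent experiments, each estimating E[Z**2] = 1 from n_reps draws
    estimates = (rng.standard_normal((500, n_reps)) ** 2).mean(axis=1)
    print(n_reps, round(float(estimates.std()), 4))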

(e) Computer-generated (pseudo-) random numbers are not random at all, but are entirely deterministic since their generation exactly follows a formula. In intuitive terms, the way that this is done is to start with a number (a seed, usually chosen based on a numerical representation of the computer's clock time), and then this number is repeatedly updated using modular arithmetic. Provided that the seed and the other required parameters that control how the updating occurs are set carefully, the pseudo-random numbers will behave almost exactly as true random numbers would.
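A minimal sketch of one such updating rule, a linear congruential generator, is given below. It is my own illustration rather than anything from the chapter, and the parameter values (a = 16807, m = 2^31 - 1) are the classic "minimal standard" choices; it is not a production-quality generator.

def lcg(seed, n, a=16807, c=0, m=2**31 - 1):
    """Generate n pseudo-random numbers on (0, 1) from a linear congruential generator."""
    draws, x = [], seed
    for _ in range(n):
        x = (a * x + c) % m        # deterministic update by modular arithmetic
        draws.append(x / m)        # map the integer state to the unit interval
    return draws

# The same seed always reproduces exactly the same "random" sequence
print(lcg(seed=12345, n=5))
print(lcg(seed=12345, n=5))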

(f) Simulation methods are particularly useful when an analytical solution is unavailable: for example, for many problems in econometrics, or for pricing exotic options. In such cases, they may be the only approach available. However, in situations where analytical results are available, simulations have several disadvantages.

First, simulations may require a great deal of computer power. There are still many problems in econometrics that are unsolvable even with a brand new supercomputer! The problems seem to grow in dimension and complexity at the same rate as the power of computers!

Second, simulations may be inaccurate if an insufficient number of replications is used. Even if variance reduction techniques are employed, the number of replications required to achieve acceptable accuracy could be very large. This is especially true of simulations that require accurate estimation of extreme or tail events. For example, the pricing of deep out-of-the-money options is difficult to do accurately using simulation since most of the replications will give a zero value for the option. Equally, it is difficult to measure the probability of crashes, or to determine 1% critical values accurately.

Third, a corollary of the second point is that simulations by their very nature are difficult to replicate. The random draws used for simulations are usually calculated based on a random number seed that is set according to the clock time of the computer. So run two simulations, one ten minutes after the other, and you will get two completely different sets of random draws. Obviously, this problem would again disappear if enough replications were used, but the required number of replications may be infeasibly large.
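(In practice, the seed can also be fixed explicitly so that a given simulation is exactly reproducible; the two lines below, my own illustration, print identical draws.)

import numpy as np

print(np.random.default_rng(2024).standard_normal(3))
print(np.random.default_rng(2024).standard_normal(3))   # identical draws because the seed is fixed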

Finally, there is a real danger that the results of a Monte Carlo study will be specific to the particular sets of parameters investigated. An analytical result, on the other hand, if available, may be generally applicable. The answer given below for question 3 shows that, for a multi-dimensioned problem, a lot of work is required to conduct enough experiments to ensure that the results are sufficiently general.

2. Although this is a short question, a good answer would be quite involved. Recall the null and alternative hypotheses for the Ljung-Box (LB) test:

$H_0: \tau_1 = 0$ and $\tau_2 = 0$ and $\ldots$ and $\tau_m = 0$

$H_1: \tau_1 \neq 0$ or $\tau_2 \neq 0$ or $\ldots$ or $\tau_m \neq 0$

The question does not state the number of lags that should be used in the LB test, so it would be advisable to design a framework that examined the results using several different lag lengths. I assume that lag lengths m = 1, 5, and 10 are used, for sample sizes T = 100, 500, 1000.

Probably the easiest way to explain what would happen is using pseudo-code. I have written a single set of instructions that would do the whole simulation in two goes, although it is of course possible to separate it into several different experiments (e.g. one for size and a separate one for power, and separate experiments for each sample size etc).

Part 1: Simulation with no GARCH, as a benchmark.

1. Generate a sample of length T from a standard normal distribution; call these draws $u_t$. The size of a test is examined by using a DGP for which the null hypothesis is true; in this case, we want a series that is not autocorrelated. Thus the data for examination would be $u_t$. Set $x_t = u_t$ to avoid any later confusion.

The power of the test would be examined by using a DGP for which the null hypothesis is false. Obviously, the power of the test should increase as the DGP moves further from the null. To evaluate the power of the test, generate the following data:

$y_t = \phi y_{t-1} + u_t$

with $\phi$ = 0.1, 0.5, and 0.8 (assume $y_0 = 0$ in each case).

2. For each sample of $x_t$ and $y_t$, construct the Ljung-Box test statistic for each lag length $m$, and perform the test.

3. Repeat steps 1 and 2 N times, where N is the number of replications. Assume that N has been set to 10,000. The size of the test will be given by the percentage of times that the (true) null hypothesis on $x_t$ is incorrectly rejected. The power of the test will be given by the percentage of times that the (false) null hypothesis on $y_t$ is correctly rejected.
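A minimal Python sketch of this Part 1 design is given below. It assumes statsmodels' acorr_ljungbox function for the LB test (recent versions return a DataFrame of statistics and p-values), uses a single lag length and value of $\phi$, and keeps the number of replications smaller than 10,000 purely for speed.

import numpy as np
from statsmodels.stats.diagnostic import acorr_ljungbox

rng = np.random.default_rng(0)
N, T, m, phi = 1_000, 500, 5, 0.5
size_rejections, power_rejections = 0, 0

for _ in range(N):
    u = rng.standard_normal(T)
    x = u                                   # DGP for size: no autocorrelation, null is true
    y = np.zeros(T)                         # DGP for power: AR(1), null is false
    for t in range(1, T):
        y[t] = phi * y[t - 1] + u[t]

    p_x = acorr_ljungbox(x, lags=[m])["lb_pvalue"].iloc[-1]
    p_y = acorr_ljungbox(y, lags=[m])["lb_pvalue"].iloc[-1]
    size_rejections += p_x < 0.05
    power_rejections += p_y < 0.05

print("size  =", size_rejections / N)       # should be close to the nominal 5%
print("power =", power_rejections / N)      # should rise with phi and with T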

Part 2: Simulation with GARCH.

Repeat steps 1 to 3 above exactly, but generate the data so that the disturbances follow a GARCH(1,1) process. This can be achieved as follows, given that $u_t$ has been drawn; some parameter values for the GARCH process also have to be assumed. For $x_t$, use the equations:

$\sigma_t^2 = 0.0001 + 0.1 x_{t-1}^2 + 0.8 \sigma_{t-1}^2$, with $\sigma_0^2 = 0.001$

$x_t = \sigma_t u_t$

and for $y_t$:

$\sigma_t^2 = 0.0001 + 0.1 x_{t-1}^2 + 0.8 \sigma_{t-1}^2$, with $\sigma_0^2 = 0.001$

$y_t = \phi y_{t-1} + \sigma_t u_t$

Finally, the effect of GARCH would be investigated by comparing the size and power under Parts 1 and 2. If the size increases from its nominal value, or the power falls, when the GARCH is added, it would be concluded that GARCH does have an adverse effect on the LB test.
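The following is a minimal sketch, written from the equations above rather than taken from the chapter, of how the GARCH(1,1) data for Part 2 could be generated from the same standard normal draws $u_t$; the parameter values are the ones assumed in the equations.

import numpy as np

def garch_shocks(u, omega=0.0001, alpha=0.1, beta=0.8, sigma2_0=0.001):
    """Turn i.i.d. N(0,1) draws u_t into GARCH(1,1) shocks x_t = sigma_t * u_t."""
    x = np.zeros_like(u)
    sigma2 = sigma2_0
    for t in range(len(u)):
        if t > 0:
            sigma2 = omega + alpha * x[t - 1] ** 2 + beta * sigma2
        x[t] = np.sqrt(sigma2) * u[t]
    return x

rng = np.random.default_rng(0)
T, phi = 500, 0.5
u = rng.standard_normal(T)

x = garch_shocks(u)                  # series for the size experiment
y = np.zeros(T)                      # series for the power experiment: y_t = phi*y_{t-1} + sigma_t*u_t
for t in range(1, T):
    y[t] = phi * y[t - 1] + x[t]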

3. (a) This would be a very easy simulation to do. The first thing would be to choose the appropriate values of $\phi$ to use. It would be tempting to choose a spread of values, say from 0 to 1 in units of 0.2 (i.e. 0, 0.2, ..., 0.8, 1), but this would not be optimal since all of the interesting behaviour will occur when $\phi$ gets close to 1. Therefore the values are better skewed towards 1, e.g. 0, 0.5, 0.8, 0.9, 0.95, 0.99, and 1. This gives 7 values of $\phi$, which should give the flavour of what happens without the results being overwhelming or the exercise becoming too tedious.

Note that the question says nothing about the sample sizes to use, and in fact the impact of a unit root (i.e. $\phi = 1$) will not disappear asymptotically. A good researcher, however, would choose a range of sample sizes that are empirically relevant (e.g. 100, 500, and 2000 observations) and would conduct the simulation using all of them. Assuming that a sample size of 500 is used, the next step would be to generate the random draws. A good simulation would generate more than 500 draws for each replication, to allow for some start-up observations that are later discarded. Again, nothing is stated in the question about what distribution should be used to generate the random draws. Unless there is a particular reason to do otherwise (for example, if the impact of fat tails is of particular importance), it is common to use a standard normal distribution for the disturbances. We also need to select a starting value for $y$; call this $y_1$, and set this starting value to zero (note: the reason why we allow for start-up observations is to minimise the impact of this choice of initial value for $y$).

Once a set of disturbances is drawn, and the initial value for $y$ is chosen, the next stage is to recursively construct a series that follows the required AR(1) model:

$y_2 = \phi y_1 + u_2$
$y_3 = \phi y_2 + u_3$
...
$y_{500} = \phi y_{499} + u_{500}$

The next stage would be to estimate an AR(1), and to construct and save the t-ratios on the $\phi$ coefficient for each replication. Note that this estimated regression should include an intercept to allow for a simulated series with a non-zero mean, even though y has a zero mean under the DGP. It is probably sensible and desirable to use the same set of random draws for each experiment that has different values of $\phi$.

Producing the EViews or RATS code for this problem is left for readers and should not be difficult given the examples in the chapter! However, some pseudo-code could be:

SET NUMBER OF REPS

SET SAMPLE SIZE

SET VALUES OF PHI CALL THEM PHI1 TO PHI7

CLEAR ARRAYS FOR Y USED FOR EACH VALUE OF PHI Y1 TO Y7

SET INITIAL VALUES FOR THE YS TO ZERO: Y1(1)=0, Y2(1)=0, ETC

START REPLICATIONS LOOP

GENERATE RANDOM DRAWS U FOR SAMPLE SIZE PLUS 200 START-UP OBSERVATIONS

START RECURSIVE Y GENERATION LOOP WITH J=2

Y1(J)=PHI1*Y1(J-1)+U(J)

Y2(J)=PHI2*Y2(J-1)+U(J)

Y3(J)=PHI3*Y3(J-1)+U(J)

Y4(J)=PHI4*Y4(J-1)+U(J)

Y5(J)=PHI5*Y5(J-1)+U(J)

Y6(J)=PHI6*Y6(J-1)+U(J)

Y7(J)=PHI7*Y7(J-1)+U(J)

END Y GENERATION LOOP

DISCARD THE FIRST 200 START-UP OBSERVATIONS FROM EACH Y

RUN A LINEAR REGRESSION OF Y1 ON A CONSTANT AND THE LAGGED VALUE OF Y1

AND REPEAT THIS FOR EACH OF Y2, Y3 ETC

SAVE THE SLOPE COEFFICIENT T-RATIOS AND STORE THESE

END REPLICATIONS LOOP

This will yield a set of estimated t-ratios for each of the values of $\phi$. These could be plotted or summarised as desired.
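For readers who prefer a general-purpose language, the following is a minimal Python rendering of the pseudo-code above (a sketch, not the chapter's EViews or RATS code); the number of replications is kept small purely for speed, and the t-ratio is the usual OLS coefficient divided by its standard error.

import numpy as np

rng = np.random.default_rng(0)
n_reps, T, burn = 1_000, 500, 200
phis = [0.0, 0.5, 0.8, 0.9, 0.95, 0.99, 1.0]
t_ratios = {phi: [] for phi in phis}

for _ in range(n_reps):
    u = rng.standard_normal(T + burn)                # the same draws are reused for every phi
    for phi in phis:
        y = np.zeros(T + burn)
        for t in range(1, T + burn):
            y[t] = phi * y[t - 1] + u[t]
        y = y[burn:]                                 # discard the start-up observations

        # OLS of y_t on a constant and y_{t-1}
        X = np.column_stack([np.ones(T - 1), y[:-1]])
        yt = y[1:]
        beta = np.linalg.solve(X.T @ X, X.T @ yt)
        resid = yt - X @ beta
        s2 = resid @ resid / (len(yt) - 2)
        se = np.sqrt(s2 * np.linalg.inv(X.T @ X)[1, 1])
        t_ratios[phi].append(beta[1] / se)

for phi in phis:
    print(phi, np.mean(t_ratios[phi]), np.std(t_ratios[phi]))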

(b) The broad structure of the appropriate simulation for part (b) would be similar to that for part (a), but there is now the added dimension of the size of the sample for each replication. This is therefore a multi-dimensioned problem: we want to produce a simulated series for each value of $\phi$ and for each sample size. The easiest way to achieve this, if the answer to part (a) had already been constructed, would be to run separate simulations as above for different sample sizes. Note also that in part (b), it is the estimated values of the $\phi$ coefficients rather than their t-ratios that are of interest. What we would expect is that the average of the $\phi$ estimates across the replications would converge upon their actual value as the sample size increases.
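A minimal sketch of part (b) follows (again an illustration of my own, shown for a single value of $\phi$): the same DGP is used, but the estimated coefficient itself is stored and averaged for several sample sizes.

import numpy as np

rng = np.random.default_rng(0)
n_reps, burn, phi = 1_000, 200, 0.95

for T in (100, 500, 2000):
    estimates = []
    for _ in range(n_reps):
        u = rng.standard_normal(T + burn)
        y = np.zeros(T + burn)
        for t in range(1, T + burn):
            y[t] = phi * y[t - 1] + u[t]
        y = y[burn:]
        X = np.column_stack([np.ones(T - 1), y[:-1]])    # constant and lagged y
        beta = np.linalg.lstsq(X, y[1:], rcond=None)[0]
        estimates.append(beta[1])
    # the small-sample downward bias shrinks, so the average approaches 0.95 as T grows
    print(T, round(float(np.mean(estimates)), 4))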

4. Again, this is a fairly simple exercise given the code that was presented in the chapter. All that is required is to add a clause to the recursive generation of the path of the underlying asset to say that if the price at that time falls below the barrier, then the value of the knock-out call for that replication is zero. It would also save computational time if this clause halted the simulation of the path for that replication and went straight on to the next replication.
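The sketch below illustrates the idea in Python rather than the chapter's code: it assumes a simple geometric Brownian motion for the path of the underlying, and the initial price, strike, barrier level, interest rate and volatility are purely illustrative values.

import numpy as np

rng = np.random.default_rng(0)
S0, K, barrier = 100.0, 100.0, 90.0          # spot, strike and knock-out barrier (illustrative)
r, sigma, maturity = 0.05, 0.20, 1.0
n_steps, n_reps = 250, 10_000
dt = maturity / n_steps

payoffs = np.zeros(n_reps)
for i in range(n_reps):
    S = S0
    for _ in range(n_steps):
        S *= np.exp((r - 0.5 * sigma ** 2) * dt + sigma * np.sqrt(dt) * rng.standard_normal())
        if S <= barrier:                     # the extra clause: barrier breached ...
            break                            # ... payoff stays zero and the path is abandoned
    else:
        payoffs[i] = max(S - K, 0.0)         # barrier never hit: ordinary call payoff

print(np.exp(-r * maturity) * payoffs.mean())    # discounted average payoff = Monte Carlo price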

Introductory Econometrics for Finance, Chris Brooks 2008
